New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Escape regex special characters in FITS keyword values used for IFC filtering #770
Conversation
In `ImageFileCollection()`, the method `_find_keywords_by_values()` uses regular expressions to find matches for specified FITS keywords by value. In the case where a FITS keyword value has parentheses (whether for a mathematical or facility reason), the regex search interprets the parenthesis as a control character and does not correctly match the string as specified, returning none of the desired frames. As an example from the Lowell Observatory Discovery Telescope (LDT) DeVeny Optical Spectrograph, we specify order-blocking filters both by name and filter wheel position, e.g.: FILTREAR= 'Clear (C)' / Rear Filter There is logic in the `_find_keywords_by_values()` method to specify that one is supplying a regular expression for search, but the method also constructs a simple regex search based on the specified keyword value if `regex_match=False`. This commit adds escape characters to parentheses in the case where the user has NOT specified an input regular expression. The resulting pattern search is successful. modified: ccdproc/image_collection.py
modified: CHANGES.rst
The actual test would require test data that contains a FITS keyword whose value includes a parenthesis. An example is given in the comment. modified: ccdproc/tests/test_image_collection.py
Codecov Report
@@ Coverage Diff @@
## main #770 +/- ##
==========================================
+ Coverage 95.51% 95.94% +0.43%
==========================================
Files 30 30
Lines 3944 3922 -22
==========================================
- Hits 3767 3763 -4
+ Misses 177 159 -18
Continue to review full report at Codecov.
|
modified: CHANGES.rst
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for opening this @tbowers7. I think with one small change it would fix #769, which reports a similar issue. When the regex search was added ~2 years ago I didn't even think about the fact that the value
might have regex characters in it.
Do you have time to revise this week? I'm hoping to release v2.2 by Monday and would like to include this.
If you don't have time, let me know -- I can either delay the release or do the revision.
Thanks!
ccdproc/image_collection.py
Outdated
@@ -710,6 +710,8 @@ def _find_keywords_by_values(self, **kwd): | |||
pattern = re.compile(value, | |||
flags=re.IGNORECASE) | |||
else: | |||
# Add escape characters to parentheses: | |||
value = value.replace('(',r'\(').replace(')',r'\)') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would fix #769 it were this instead, so that all special characters were escaped:
value = value.replace('(',r'\(').replace(')',r'\)') | |
value = re.escape(value) |
def test_filter_with_parenthetical_keyword_values(self, triage_setup): | ||
# Would need test data containing FITS keywords whose values have () | ||
# e.g.: FILTREAR= 'Clear (C)' / Rear Filter | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the short example that @mkelley provided in #769 could be used as the body of the test here, or at least as the start of a test. It generates a FITS file with a special character in it then tries to filter by a value with the special character.
I can add a credit to @mkelley for the commit with the test body if you end up using a lot of his sample code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I will add this test this week. Change to use re.escape() already committed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent, thanks!
The original PR for this issue [astropy#770] manually added escapes for parentheses. As it turns out, python is smarter than I am and already has a function to find and escape all regex special characters in a string. This change will allow any regex special character to exist in keyword filtering requests to `ImageFileCollection()`. modified: ccdproc/image_collection.py
Update CHANGES.rst to reflect commit 42c1135 where, now, ALL regex special characters are escaped rather than just manually escaping parentheses. modified: CHANGES.rst
In ``ImageFileCollection()``, filtering by keyword is done with use of regular expressions. In PR astropy#770, added escaping of any regex special characters that might be present in the keyword used for filtering a collection. This test checks that keywords containing regex special characters are correctly escaped and therefore properly return the desired filtered collection. modified: CHANGES.rst modified: ccdproc/tests/test_image_collection.py
Test written and committed. The test was tested separately, passing when the |
modified: ccdproc/tests/test_image_collection.py
modified: ccdproc/tests/test_image_collection.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent, thanks for the quick turnaround -- just one line to delete in the test and this is good to go!
'regex_special_{:d}.fits'.format(i))) | ||
|
||
ic = ImageFileCollection(triage_setup.test_dir) | ||
#ic.summary.pprint() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please take this line out?
#ic.summary.pprint() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All done.
Remove commented-out vestigial pprint() statement from ``test_filter_on_regex_escape_characters()``. modified: ccdproc/tests/test_image_collection.py
Thanks for the fix! |
In
ImageFileCollection()
, the method_find_keywords_by_values()
uses regularexpressions to find matches for specified FITS keywords by value. In the case
where a FITS keyword value has parentheses (whether for a mathematical or
facility reason), the regex search interprets the parenthesis as a control
character and does not correctly match the string as specified, returning none
of the desired frames.
As an example from the Lowell Observatory Discovery Telescope (LDT) DeVeny
Optical Spectrograph, we specify order-blocking filters both by name and filter
wheel position, e.g.:
FILTREAR= 'Clear (C)' / Rear Filter
There is logic in the
_find_keywords_by_values()
method to specify that one issupplying a regular expression for search, but the method also constructs a
simple regex search based on the specified keyword value if
regex_match=False
.This commit adds escape characters to parentheses in the case where the user
has NOT specified an input regular expression. The resulting pattern search is
successful.
Also, add the shell of a test for parenthetical FITS keyword values. The actual
test would require test data that contains a FITS keyword whose value includes
a parenthesis. An example is given in the comment.
Please have a look at the following list and replace the "[ ]" with a "[x]" if
the answer to this question is yes.
For documentation changes:
Note that it should not if you changed any examples!
For bugfixes:
For new functionality:
Please note that the last point is not a requirement. It is meant as a check if
the pull request potentially breaks backwards-compatibility.