Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Escape regex special characters in FITS keyword values used for IFC filtering #770

Merged
merged 10 commits into from May 20, 2021

Conversation

tbowers7
Copy link
Contributor

@tbowers7 tbowers7 commented May 10, 2021

In ImageFileCollection(), the method _find_keywords_by_values() uses regular
expressions to find matches for specified FITS keywords by value. In the case
where a FITS keyword value has parentheses (whether for a mathematical or
facility reason), the regex search interprets the parenthesis as a control
character and does not correctly match the string as specified, returning none
of the desired frames.

As an example from the Lowell Observatory Discovery Telescope (LDT) DeVeny
Optical Spectrograph, we specify order-blocking filters both by name and filter
wheel position, e.g.:
FILTREAR= 'Clear (C)' / Rear Filter

There is logic in the _find_keywords_by_values() method to specify that one is
supplying a regular expression for search, but the method also constructs a
simple regex search based on the specified keyword value if regex_match=False.

This commit adds escape characters to parentheses in the case where the user
has NOT specified an input regular expression. The resulting pattern search is
successful.

Also, add the shell of a test for parenthetical FITS keyword values. The actual
test would require test data that contains a FITS keyword whose value includes
a parenthesis. An example is given in the comment.

modified:   ccdproc/image_collection.py
modified:   CHANGES.rst
modified:   ccdproc/tests/test_image_collection.py

Please have a look at the following list and replace the "[ ]" with a "[x]" if
the answer to this question is yes.

  • For new contributors: Did you add yourself to the "Authors.rst" file? N/A

For documentation changes:

  • For documentation changes: Does your commit message include a "[skip ci]"? N/A
    Note that it should not if you changed any examples!

For bugfixes:

  • Did you add an entry to the "Changes.rst" file? Yes.
  • Did you add a regression test? Yes-ish. Added a shell, but requires different test data.
  • Does the commit message include a "Fixes #issue_number" (replace "issue_number"). N/A
  • Does this PR add, rename, move or remove any existing functions or parameters? No.

For new functionality:

  • Did you add an entry to the "Changes.rst" file? N/A
  • Did you include a meaningful docstring with Parameters, Returns and Examples? N/A
  • Does the commit message include a "Fixes #issue_number" (replace "issue_number"). N/A
  • Did you include tests for the new functionality? N/A
  • Does this PR add, rename, move or remove any existing functions or parameters? N/A

Please note that the last point is not a requirement. It is meant as a check if
the pull request potentially breaks backwards-compatibility.


In `ImageFileCollection()`, the method `_find_keywords_by_values()` uses regular
expressions to find matches for specified FITS keywords by value.  In the case
where a FITS keyword value has parentheses (whether for a mathematical or
facility reason), the regex search interprets the parenthesis as a control
character and does not correctly match the string as specified, returning none
of the desired frames.

As an example from the Lowell Observatory Discovery Telescope (LDT) DeVeny
Optical Spectrograph, we specify order-blocking filters both by name and filter
wheel position, e.g.:
FILTREAR= 'Clear (C)'          / Rear Filter

There is logic in the `_find_keywords_by_values()` method to specify that one is
supplying a regular expression for search, but the method also constructs a
simple regex search based on the specified keyword value if `regex_match=False`.

This commit adds escape characters to parentheses in the case where the user
has NOT specified an input regular expression.  The resulting pattern search is
successful.

	modified:   ccdproc/image_collection.py
The actual test would require test data that contains a FITS keyword whose value
includes a parenthesis.  An example is given in the comment.

	modified:   ccdproc/tests/test_image_collection.py
@codecov
Copy link

codecov bot commented May 10, 2021

Codecov Report

Merging #770 (a147fbf) into main (40bafe7) will increase coverage by 0.43%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #770      +/-   ##
==========================================
+ Coverage   95.51%   95.94%   +0.43%     
==========================================
  Files          30       30              
  Lines        3944     3922      -22     
==========================================
- Hits         3767     3763       -4     
+ Misses        177      159      -18     
Impacted Files Coverage Δ
ccdproc/image_collection.py 99.19% <100.00%> (+<0.01%) ⬆️
ccdproc/tests/test_image_collection.py 98.89% <100.00%> (+0.02%) ⬆️
ccdproc/__init__.py 100.00% <0.00%> (+11.76%) ⬆️
ccdproc/_astropy_init.py 75.00% <0.00%> (+23.64%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 40bafe7...a147fbf. Read the comment docs.

	modified:   CHANGES.rst
@mwcraig mwcraig added this to the 2.2.0 milestone May 19, 2021
Copy link
Member

@mwcraig mwcraig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening this @tbowers7. I think with one small change it would fix #769, which reports a similar issue. When the regex search was added ~2 years ago I didn't even think about the fact that the value might have regex characters in it.

Do you have time to revise this week? I'm hoping to release v2.2 by Monday and would like to include this.

If you don't have time, let me know -- I can either delay the release or do the revision.

Thanks!

@@ -710,6 +710,8 @@ def _find_keywords_by_values(self, **kwd):
pattern = re.compile(value,
flags=re.IGNORECASE)
else:
# Add escape characters to parentheses:
value = value.replace('(',r'\(').replace(')',r'\)')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would fix #769 it were this instead, so that all special characters were escaped:

Suggested change
value = value.replace('(',r'\(').replace(')',r'\)')
value = re.escape(value)

def test_filter_with_parenthetical_keyword_values(self, triage_setup):
# Would need test data containing FITS keywords whose values have ()
# e.g.: FILTREAR= 'Clear (C)' / Rear Filter
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the short example that @mkelley provided in #769 could be used as the body of the test here, or at least as the start of a test. It generates a FITS file with a special character in it then tries to filter by a value with the special character.

I can add a credit to @mkelley for the commit with the test body if you end up using a lot of his sample code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will add this test this week. Change to use re.escape() already committed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent, thanks!

The original PR for this issue [astropy#770] manually added escapes for parentheses.
As it turns out, python is smarter than I am and already has a function to find
and escape all regex special characters in a string.

This change will allow any regex special character to exist in keyword filtering
requests to `ImageFileCollection()`.

	modified:   ccdproc/image_collection.py
Update CHANGES.rst to reflect commit 42c1135
where, now, ALL regex special characters are escaped rather than just manually
escaping parentheses.

	modified:   CHANGES.rst
@tbowers7 tbowers7 changed the title Escape parentheses in FITS keyword values used for IFC filtering Escape regex special characters in FITS keyword values used for IFC filtering May 20, 2021
In ``ImageFileCollection()``, filtering by keyword is done with use of regular
expressions.  In PR astropy#770, added escaping of any regex special characters that
might be present in the keyword used for filtering a collection.  This test
checks that keywords containing regex special characters are correctly escaped
and therefore properly return the desired filtered collection.

	modified:   CHANGES.rst
	modified:   ccdproc/tests/test_image_collection.py
@tbowers7
Copy link
Contributor Author

Test written and committed. The test was tested separately, passing when the value = re.escape(value) line is included in ccdproc/tests/test_image_collection.py, and failing when that line is commented out (i.e. pre-PR#770).

	modified:   ccdproc/tests/test_image_collection.py
	modified:   ccdproc/tests/test_image_collection.py
Copy link
Member

@mwcraig mwcraig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent, thanks for the quick turnaround -- just one line to delete in the test and this is good to go!

'regex_special_{:d}.fits'.format(i)))

ic = ImageFileCollection(triage_setup.test_dir)
#ic.summary.pprint()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please take this line out?

Suggested change
#ic.summary.pprint()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All done.

Remove commented-out vestigial pprint() statement from
``test_filter_on_regex_escape_characters()``.

	modified:   ccdproc/tests/test_image_collection.py
@mwcraig mwcraig merged commit 91f2c46 into astropy:main May 20, 2021
@mkelley
Copy link

mkelley commented May 20, 2021

Thanks for the fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants