Let fitsdiff compare files with lower case HIERARCH keyword #16357
Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.
Thanks! Would need a change log too.
p.s. Since this changes some fitsdiff behavior, maybe shouldn't backport? I'll let saimn decide.
docs/changes/16357.other.rst
I was thinking more like bugfix under docs/changes/io.fits, but let's wait to hear back from saimn first. Thanks!
Added a changelog; needed to create the PR first.
I don't think this change is important enough to backport, but I suppose it shouldn't be problematic either. Apparently no one cared about comparing case-sensitive HIERARCH keywords for 12 years. Just scratching my own itch here. Should I squash this first? |
Understood. But if you want to predict the number in the future, we have a little advertised tool at https://github.com/astropy/astropy-tools/blob/main/next_pr_number.py 😉
We have "squash and merge" enabled, but if you feel better squashing manually, you are also welcome to do so. Thanks! |
FWIW, I'm starting to agree with the sentiment in #3746 (comment) that perhaps automatically using HIERARCH keywords was a mistake. I only stumbled upon this issue because I accidentally created a
Put to draft because I realized this code might not work properly with
Now this code would have raised a |
At least I don't think this change has broken existing code that uses But I don't see how to fix the example above without possibly breaking existing code. In particular, this test would fail if we update astropy/astropy/io/fits/tests/test_diff.py Lines 286 to 287 in 0c06ac3
And maybe someone has used that in their code somewhere, so we should not willy-nilly break that. But it also seems silly to not allow ignoring of lower case HIERARCH keywords in fitsdiff, which is what this PR in its current state would accomplish. |
About backwards compatibility, this code would have passed before, but not in this PR:

    ha = Header([("HIERARCH HELLO", 1), ("HIERARCH hello", 2)])
    hb = Header([("HIERARCH HELLO", 1), ("HIERARCH hello", 12)])
    diff = HeaderDiff(ha, hb)
    assert diff.identical

But I guess that is a good thing, as these headers are obviously different. So this PR as-is might break someone's workflow somewhere, but I think the above should be considered a bug, not a feature, so we should not be concerned. But in this case, I suppose that one should be able to selectively ignore either of the keywords. Hmm, should this pass?:
That code currently passes both on main and with this PR. But maybe that is okay?
I'm going to un-draft this PR, because it is an improvement compared to the old code, and I don't really know how to improve the I thought about making the check whether a keyword should be ignored case-sensitive if the keyword is a header keyword. But that is a bit tricky, because then it is not really clear how to put that in the parameters. Maybe the best would be to make the to-ignore comparison entirely case-insensitive, by making everything uppercase at compare time. That is change astropy/astropy/io/fits/diff.py Lines 875 to 876 in 0c06ac3
to

    if keyword.upper() in self.ignore_keywords:
        continue

I think it requires a bit of rethinking what we want fitsdiff to actually do. E.g. the "HIERARCH HELLO" vs "HELLO" vs "hello" vs "HIERARCH hello". Otherwise it will become pretty hacky pretty quickly. Nevertheless, this PR is an improvement, because it prevents a
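The case-insensitive ignore check proposed in the comment above can be sketched without astropy. This is only an illustration of the idea, not the actual HeaderDiff code: `filter_ignored` is a hypothetical helper, plain strings stand in for header keywords, and the sketch uppercases both sides of the comparison rather than assuming the ignore list is already uppercase.

```python
def filter_ignored(keywords, ignore_keywords):
    """Return the keywords that are not ignored, comparing case-insensitively."""
    # Normalize the ignore list once, so each lookup is a set membership test.
    ignored = {name.upper() for name in ignore_keywords}
    # Uppercase only for the comparison; the original spelling is preserved.
    return [kw for kw in keywords if kw.upper() not in ignored]


remaining = filter_ignored(["HELLO", "hello", "OBJECT"], ["Hello"])
print(remaining)  # ['OBJECT']
```

The trade-off is visible in the output: with this approach "HELLO" and "hello" can no longer be ignored separately.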
I will note that cfitsio writes all HIERARCH keys as UPPERCASE, as of version 3380. That's consistent with the FITS standard (see 4.1.2.1 of the paper), which the cfitsio developers interpreted (correctly, as much as it pains me to say) to also apply to HIERARCH keys. And now that I check, the HIERARCH convention also explicitly only allows these characters: The idea behind this fix (be case-insensitive when comparing) seems reasonable, but it should probably also print some warnings about the lowercased keys, too. Rubin/LSST has had our own sets of problems with HIERARCH keys. See DM-21989, DM-21991, DM-43963, and this discussion (from 2016, and we're still limping along with it). |
Thanks for digging up the standard and sharing your problems @parejkoj. I don't have a strong personal opinion on allowing lower case keywords. I've grown to appreciate adhering to the standard, so perhaps the default of astropy should be to create only uppercase keywords. (But I do appreciate backwards compatibility. I still have muscle memory writing The FITS files I was comparing were actually generated with astropy, so fitsdiff should definitely have support for comparing lower-case keywords. And I think fitsdiff should complain if there is a difference in the casing; for example one might compare FITS files created with cfitsio 3380 with FITS created with an earlier version to figure out why some tool suddenly broke. So while this PR is not sufficient to correctly deal with lower-case HIERARCH keywords, it is a step forward. |
Just to put it on the table, an alternative given the FITS standard would be to add code to verify() to automatically fix these keywords to make them uppercase and emit appropriate warnings, which would prevent lowercase keywords from being written out subsequently. |
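For illustration, the fix-and-warn idea could look roughly like the sketch below. It is not astropy's verify() machinery: `uppercase_keywords` is a hypothetical helper that works on plain (keyword, value) tuples, and the warning text is made up for this example.

```python
import warnings


def uppercase_keywords(cards):
    """Return (keyword, value) pairs with keywords uppercased, warning on each change."""
    fixed = []
    for keyword, value in cards:
        upper = keyword.upper()
        if upper != keyword:
            # Report the non-standard spelling instead of fixing it silently.
            warnings.warn(f"Keyword {keyword!r} is not uppercase; fixed to {upper!r}.")
        fixed.append((upper, value))
    return fixed
```

Note that under this scheme two keywords that differ only in case, such as "HELLO" and "hello", end up with the same name after the fix.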
Yeah it would probably be better to always write upper case FITS headers, since that is in the standard. Changing the fits module itself is kinda perpendicular to this PR though, because fitsdiff should be usable to determine the difference between old non-standard FITS files and new standard FITS files (or at least not crash). |
Just to be clear, I meant to fix on read, which would also solve the fitsdiff case.
Changing the headers upon read would make it impossible to use fitsdiff to learn about those differences. So it would indeed prevent fitsdiff from crashing, at the cost of significantly reducing its power. My perfect version of fitsdiff would complain about any difference unless explicitly told to ignore something. So fitsdiff should work on non-standard FITS files to some extent. I think that at a minimum fitsdiff should accept most FITS files that past versions of astropy could produce, because one might use fitsdiff to figure out the differences between what different versions of astropy produce, or why one tool produces FITS-compliant files and another tool does not.
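A diff that reports casing differences, rather than hiding them, could be sketched as follows. This is a standalone illustration under stated assumptions: `case_only_differences` is a hypothetical helper, not part of fitsdiff, and it works on plain lists of keyword strings.

```python
def case_only_differences(keywords_a, keywords_b):
    """Return (a, b) pairs of keywords that differ only in letter case."""
    exact = set(keywords_b)
    # Group the second header's keywords by their uppercase form.
    by_upper = {}
    for kw in keywords_b:
        by_upper.setdefault(kw.upper(), []).append(kw)

    differences = []
    for kw in keywords_a:
        if kw in exact:
            continue  # Spelled identically in both headers; nothing to report.
        for other in by_upper.get(kw.upper(), []):
            differences.append((kw, other))
    return differences


print(case_only_differences(["HELLO", "OBJECT"], ["hello", "OBJECT"]))
# [('HELLO', 'hello')]
```

Such a report is only possible if the original spelling survives reading, which is the point made above about not normalizing on read.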
I was also digging into the standards yesterday, and yes, the registered convention (https://fits.gsfc.nasa.gov/registry/hierarch_keyword.html) mentions uppercase keywords:
Maybe it wasn't so clear in the past and we ended up having pyfits supporting lowercase HIERARCH keywords as well... Enforcing uppercase keywords makes sense (as for normal keywords) and it seems everybody agrees, cfitsio does it too, so I think we should modify The question that remains from the discussion above is whether astropy/astropy/io/fits/card.py Lines 1140 to 1152 in 0c06ac3
So we could use that as well for HIERARCH: a warning would be issued for lowercase keywords, and get/set/comparison/etc. would remain case insensitive? (fitsdiff would display the verification warning, I guess.)
I agree that forcing uppercase headers is the correct approach. It is also the least surprising I think. I'd say that astropy should then also write uppercase keywords by default. Forcing uppercase (reading or writing) will break some people's code (mine, LSST's). E.g. people that use those FITS headers as-is as dictionary keys, or class attributes etc. But arguably that code would be pretty brittle to start with. Explicit is better than implicit. I still would like fitsdiff to be able to detect such inconsistencies, because that's the kind of thing I like to use fitsdiff for. Perhaps fitsdiff could provide a comparison for the warnings that are produced? So e.g. a FITS file with |
It's a bit annoying that in https://heasarc.gsfc.nasa.gov/docs/fcg/common_dict.html HIERARCH is defined as supporting lower case header keys:
I assume somewhere it's stated that that definition is now obsolete and everything should be read back as upper case? (I assume if you ask an astropy header object for "hello" you get back the value for "HELLO"?) |
@timj - indeed it's a bit annoying, but this webpage seems very old: the parent page (https://heasarc.gsfc.nasa.gov/docs/fcg/) is from 2000 and refers to the Standard from 1999... The HIERARCH convention was registered in 2009 (https://fits.gsfc.nasa.gov/registry/hierarch_keyword.html). However, it mentions that keywords are uppercase "Under the ESO implementation of this convention", so I guess lowercase is not strictly forbidden... which might be the reason why astropy allows that currently. But the state in astropy is not really consistent: the key will be written as lowercase if created this way, but you can access / edit it with uppercase:
And if you create it with uppercase, you can still access / edit with lowercase but then it is written as uppercase...
So I guess within astropy we could choose to force uppercase when creating a new header/file, but we shouldn't raise a warning for a lowercase HIERARCH card. And we still need to fix fitsdiff. |
I think astropy definitely should warn on lowercase HIERARCH keys, and change them to uppercase on write (as we do in LSST). The ESO implementation of the HIERARCH convention effectively is the convention, especially since that is what cfitsio (the de facto FITS definition, given the number of conventions that aren't strictly part of the standard) implements. As I linked above, the fitsio docs specifically list the allowed characters: Is it silly that there is no way to have lowercase keys in FITS? Yes, yes it is. But we're stuck with it, until someone finally pushes through a FITS versioning system so we can finally bring it into the late 20th century. That said, @hugobuddel: fixing fitsdiff so it doesn't raise for this kind of comparison is good.
How do we proceed? As in, I'm not sure about the process. I propose to merge this PR, and tackle the behavior of astropy w.r.t. writing FITS headers in another PR. My purpose here was to make sure that fitsdiff works on files with lower case header keywords, because people might have such files, standard compliant or not. That goal has been achieved. Modifying astropy to produce only uppercase headers is beyond the commitment I can currently make. Maybe at some later date, as I am affected by it. Maybe we could backport this PR to v6, and change the behavior of astropy in v7? That way fitsdiff is fixed before we break backwards compatibility. The codecov check failed, because I removed two covered lines and thus reduced the coverage. There is an open comment from @pllim that I addressed, but I'm not sure who should mark that as resolved. |
@hugobuddel - agreed, let's first fix the issue with fitsdiff so that it doesn't crash in this case.
Looks good, thanks @hugobuddel
…357-on-v6.1.x Backport PR #16357 on branch v6.1.x (Let fitsdiff compare files with lower case HIERARCH keyword)
Description
This pull request is to address #16355. FITS files with lower case HIERARCH keywords can now be compared without raising a KeyError.
Fixes #16355.