Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Let fitsdiff compare files with lower case HIERARCH keyword #16357

Merged
merged 4 commits into from
May 17, 2024

Conversation

hugobuddel
Copy link
Contributor

Description

This pull request is to address #16355. Fits files with lower case HIERARCH keywords can now be compared without raising a KeyError.

Fixes #16355.

  • By checking this box, the PR author has requested that maintainers do NOT use the "Squash and Merge" button. Maintainers should respect this when possible; however, the final decision is at the discretion of the maintainer that merges the PR.

Copy link

Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.

  • Do the proposed changes actually accomplish desired goals?
  • Do the proposed changes follow the Astropy coding guidelines?
  • Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
  • Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
  • Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see instructions for rebase and squash.
  • Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
  • Is a change log needed? If yes, did the change log check pass? If no, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
  • Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
  • At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.

@pllim pllim added this to the v7.0.0 milestone Apr 29, 2024
@pllim
Copy link
Member

pllim commented Apr 29, 2024

Thanks! Would need a change log too.

@pllim pllim added the Bug label Apr 29, 2024
@pllim
Copy link
Member

pllim commented Apr 29, 2024

p.s. Since this changes some fitsdiff behavior, maybe shouldn't backport? I'll let saimn decide.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking more like bugfix under docs/changes/io.fits but let's wait to hear back from saimn first. Thanks!

@hugobuddel
Copy link
Contributor Author

Thanks! Would need a change log too.

Added a changelog; needed to create the PR first.

p.s. Since this changes some fitsdiff behavior, maybe shouldn't backport? I'll let saimn decide.

I don't think this change is important enough to backport, but I suppose it shouldn't be problematic either. Apparently no one cared about comparing case-sensitive HIERARCH keywords for 12 years. Just scratching my own itch here.

Should I squash this first?

@pllim
Copy link
Member

pllim commented Apr 29, 2024

needed to create the PR first

Understood. But if you want to predict the number in the future, we have a little advertised tool at https://github.com/astropy/astropy-tools/blob/main/next_pr_number.py 😉

Should I squash this first?

We have "squash and merge" enabled, but if you feel better squashing manually, you are also welcome to do so.

Thanks!

@hugobuddel
Copy link
Contributor Author

Apparently no one cared about comparing case-sensitive HIERARCH keywords for 12 years. Just scratching my own itch here.

FWIW, I'm starting to agree with the sentiment in #3746 (comment) that perhaps automatically automatically using HIERARCH keywords was a mistake. I only stumbled upon this issue because I accidentally created a HIERARCH pixel_size card. That should have been CDELT1 or so.

@hugobuddel hugobuddel marked this pull request as draft April 29, 2024 16:30
@hugobuddel
Copy link
Contributor Author

Put to draft because I realized this code might not work properly with ignore_keywords. And indeed, there doesn't seem to be anything that can be put in the ignore_keywords argument below to make the assertion hold:

ha = Header([("A", 1), ("B", 2), ("C", 3), ("HIERARCH low", 4)])
hb = ha.copy()
hb["B"] = 4
hb["low"] = 14
diff = HeaderDiff(ha, hb, ignore_keywords=["B", "low"])
assert diff.identical, diff.diff_keyword_values

Now this code would have raised a KeyError before. Should I try to fix this? Not sure what can of worms that will open

@hugobuddel
Copy link
Contributor Author

At least I don't think this change has broken existing code that uses ignore_keywords.

But I don't see how to fix the example above without possibly breaking existing code. In particular, this test would fail if we update ignore_keywords to treat lower case arguments differently from upper case:

# Test case-insensitivity
diff = HeaderDiff(ha, hb, ignore_keywords=["b"])

And maybe someone has used that in their code somewhere, so we should not willy-nilly break that.

But it also seems silly to not allow ignoring of lower case HIERARCH keywords in fitsdiff, which is what this PR in its current state would accomplish.

@hugobuddel
Copy link
Contributor Author

About backwards compatibility, this code would have passed before, but not in this PR:

ha = Header([("HIERARCH HELLO", 1), ("HIERARCH hello", 2)])
hb = Header([("HIERARCH HELLO", 1), ("HIERARCH hello", 12)])
diff = HeaderDiff(ha, hb)
assert diff.identical

but I guess that is a good thing, as these headers are obviously different. So this PR as-is might break someones workflow somewhere, but I think the above should be considered a bug, not a feature, so we should not be concerned.

But in this case, I suppose that one should be able to selectively ignore either of the keywords.

Hmm, should this pass?:

ha = Header([("HIERARCH HELLO", 1)])
hb = Header([("HELLO", 1)])
diff = HeaderDiff(ha, hb)
assert diff.identical

That code currently passes both on main and with this PR. But maybe that is okay?

@hugobuddel
Copy link
Contributor Author

I'm going to un-draft this PR, because it is an improvement compared to the old code, and I don't really know how to improve the ignore_keywords scenario.

I thought about making the check whether a keyword should be ignored case-sensitive if the keyword is a header keyword. But that is a bit tricky, because then it is not really clear how to put that in the parameters.

Maybe the best would be to make the to-ignore comparison entirely case-insensitive, by making everything uppercase at compare time. That is change

if keyword in self.ignore_keywords:
continue

to

 if keyword.upper() in self.ignore_keywords: 
     continue

I think it requires a bit rethinking what we want fitsdiff to actually do. E.g. the "HIERARCH HELLO" vs "HELLO" vs "hello" vs "HIERARCH hello". Otherwise it will become pretty hacky pretty quickly.

Nevertheless, this PR is an improvement, because it prevents a KeyError when comparing normal FITS files.

@hugobuddel hugobuddel marked this pull request as ready for review April 29, 2024 18:41
@parejkoj
Copy link
Contributor

parejkoj commented Apr 30, 2024

I will note that cfitsio writes all HIERARCH keys as UPPERCASE, as of version 3380. That's consistent with the FITS standard (see 4.1.2.1 of the paper), which the cfitsio developers interpreted (correctly, as much as it pains me to say) to also apply to HIERARCH keys. And now that I check, the HIERARCH convention also explicitly only allows these characters: digits 0-9, upper case letters A-Z, the dash '-' and the underscore '_'. So, any operation with lowercase HIERARCH keys is probably undefined behavior.

The idea behind this fix (be case-insensitive when comparing) seems reasonable, but it should probably also print some warnings about the lowercased keys, too.

Rubin/LSST has had our own sets of problems with HIERARCH keys. See DM-21989, DM-21991, DM-43963, and this discussion (from 2016, and we're still limping along with it).

@hugobuddel
Copy link
Contributor Author

Thanks for digging up the standard and sharing your problems @parejkoj. I don't have a strong personal opinion on allowing lower case keywords. I've grown to appreciate adhering to the standard, so perhaps the default of astropy should be to create only uppercase keywords.

(But I do appreciate backwards compatibility. I still have muscle memory writing clobber=True...)

The FITS files I was comparing were actually generated with astropy, so fitsdiff should definitely have support for comparing lower-case keywords. And I think fitsdiff should complain if there is a difference in the casing; for example one might compare FITS files created with cfitsio 3380 with FITS created with an earlier version to figure out why some tool suddenly broke.

So while this PR is not sufficient to correctly deal with lower-case HIERARCH keywords, it is a step forward.

@astrofrog
Copy link
Member

astrofrog commented Apr 30, 2024

Just to put it on the table, an alternative given the FITS standard would be to add code to verify() to automatically fix these keywords to make them uppercase and emit appropriate warnings, which would prevent lowercase keywords from being written out subsequently.

@hugobuddel
Copy link
Contributor Author

Yeah it would probably be better to always write upper case FITS headers, since that is in the standard.

Changing the fits module itself is kinda perpendicular to this PR though, because fitsdiff should be usable to determine the difference between old non-standard FITS files and new standard FITS files (or at least not crash).

@astrofrog
Copy link
Member

astrofrog commented Apr 30, 2024

Just to be clear, I meant to fix on read which would also solve the fitsdiff case

@hugobuddel
Copy link
Contributor Author

Just to be clear, I meant to fix on read which would also solve the fitsdiff case

Changing the headers upon read would make it impossible to use fitsdiff to learn about those differences. So it would indeed prevent fitsdiff from crashing, at the cost of significantly reducing its power.

My perfect version of fitsdiff would complain about any difference unless explicitly told to ignore something.

So hello, HELLO, HIERARCH hello, HIERACH HELLO should all be seen as different keys. Or as the same key but with different capitalization. So fitsdiff should flag those as different unless one sets the hypothetical ignore_key_casing parameter to True.

fitsdiff should work on non-standard FITS files to some extent. I think that at a minimum fitsdiff should accept most fits files that past versions of astropy could produce. Because one might use fitsdiff to figure out the differences between what different versions of astropy produce. Or why one tool produces FITS-compliant files and another tool does not.

@saimn
Copy link
Contributor

saimn commented Apr 30, 2024

I was also digging the standards yesterday and yes the registered convention (https://fits.gsfc.nasa.gov/registry/hierarch_keyword.html) mentions uppercase keywords:

Under the ESO implementation of this convention, each token string that precedes the
equals sign must only contain characters that are legal in formal FITS keywords, i.e., the
uppercase letters A through Z, the digits 0 through 9, and the hyphen and underscore
characters

Maybe it wasn't so clear in the past and we ended up having pyfits supporting lowercase HIERARCH keywords as well...

Enforcing uppercase keywords make sense (as for normal keywords) and it seems everybody agrees, cfitsio does it too, so I think we should modify io.fits.
Most of the operations on keywords in io.fits are case insensitive so that should not cause problems, and would be consistent with normal keyword so easier to handle (including in fitsdiff).

The question that remains from the discussion above is whether io.fits / fitsdiff should complain if some HIERARCH keyword is lowercase. We already have some verification warnings (and auto-fix) for lowercase keywords (which currently excludes HIERARCH keywords explicitly!):

if self._image:
# PyFITS will auto-uppercase any standard keyword, so lowercase
# keywords can only occur if they came from the wild
keyword = self._split()[0]
if keyword != keyword.upper():
# Keyword should be uppercase unless it's a HIERARCH card
errs.append(
{
"err_text": f"Card keyword {keyword!r} is not upper case.",
"fix_text": fix_text,
"fix": self._fix_keyword,
}
)

So we could use that as well for HIERARCH, a warning would be issued for lowercase keywords, and get/set/comparison/etc. would remain case insensitive ? (fitsdiff would display the verification warning I guess)

@hugobuddel
Copy link
Contributor Author

I agree that forcing uppercase headers is the correct approach. It is also the least surprising I think. I'd say that astropy should then also write uppercase keywords by default.

Forcing uppercase (reading or writing) will break some people's code (mine, LSST's). E.g. people that use those FITS headers as-is as dictionary keys, or class attributes etc. But arguably that code would be pretty brittle to start with. Explicit is better than implicit.

I still would like fitsdiff to be able to detect such inconsistencies, because that's the kind of thing I like to use fitsdiff for. Perhaps fitsdiff could provide a comparison for the warnings that are produced?

So e.g. a FITS file with HIERARCH HELLO = 1 and another with HIERARCH hello = 1 could be identical in their keywords and their values, but different in their warnings.

@timj
Copy link
Contributor

timj commented Apr 30, 2024

It's a bit annoying that in https://heasarc.gsfc.nasa.gov/docs/fcg/common_dict.html HIERARCH is defined as supporting lower case header keys:

KEYWORD:   HIERARCH
REFERENCE: [ESO ](http://arcdev.hq.eso.org/dicb/dicd/dic-1-1.4.html)    
HDU:       any
DATATYPE:  none
COMMENT:   denotes the HIERARCH keyword convention
DEFINITION: The HIERARCH keyword, when followed by spaces in columns 9
and 10 of the FITS card image, indicates that the ESO HIERARCH keyword
convention should be used to interpret the name and value of the
keyword. The HIERARCH keyword formally has no value because it is not
followed by an equals sign in column 9. Under the HIERARCH convention
the actual name of the keyword begins in column 11 of the card image and
is terminated by the equal sign ('=') character. The name can contain
any number of characters as long as it fits within columns 11 and 80 of
the card image and also leaves enough space for the equal sign separator
and the value field. The name can contain any printable ASCII text
character, including spaces and lower-case characters, except for the
equal sign character which serves as the terminator of the name field.
Leading and trailing spaces in the name field are not significant, but
embedded spaces within the name are significant. The value field follows
the equals sign and must conform to the syntax for a free-format value
field as defined in the FITS Standard. The value field may be null, in
which case it contains only space characters, otherwise it may contain
either a character string enclosed in single quotes, the logical
constant T or F, an integer number, a floating point number, a complex
integer number, or a complex floating point number. The value field may
be followed by an optional comment string. The comment field must be
separated from the value field by a slash character ('/'). It is
recommended that the slash character be preceeded and followed by a
space character. Example: "HIERARCH Filter Wheel = 12 / filter
position". In this example the logical name of the keyword is 'Filter
Wheel' and the value is 12.

I assume somewhere it's stated that that definition is now obsolete and everything should be read back as upper case? (I assume if you ask an astropy header object for "hello" you get back the value for "HELLO"?)

@saimn
Copy link
Contributor

saimn commented Apr 30, 2024

@timj - indeed it's a bit annoying, but this webpage seems very old, the parent page (https://heasarc.gsfc.nasa.gov/docs/fcg/) is from 2000 refers to the Standard from 1999...

The HIERARCH convention was registered in 2009 (https://fits.gsfc.nasa.gov/registry/hierarch_keyword.html). However it mentions that keywords are uppercase "Under the ESO implementation of this convention", so I guess lowercase is not strictly forbidden... which might be the reason why astropy allows that currently.

But the state in astropy is not really consistent, the key will be written as lowercase if created this way but you can access / edit it with uppercase:

In [3]: hdr['foo bar'] = 12
WARNING: VerifyWarning: Keyword name 'foo bar' is greater than 8 characters or contains characters not all
owed by the FITS standard; a HIERARCH card will be created. [astropy.io.fits.card]

In [4]: hdr['foo bar']
Out[4]: 12

In [5]: hdr['FOO BAR']
Out[5]: 12

In [6]: hdr
Out[6]: HIERARCH foo bar =                   12

And if you create it with uppercase, you can still access / edit with lowercase but then it is written as uppercase...

In [19]: hdr = fits.Header()

In [20]: hdr['FOO BAR'] = 23
WARNING: VerifyWarning: Keyword name 'FOO BAR' is greater than 8 characters or contains characters not allowed by the FITS standard; a HIERARCH card will be created. [astropy.io.fits.card]

In [21]: hdr['foo bar']
Out[21]: 23

In [22]: hdr
Out[22]: HIERARCH FOO BAR =                   23

So I guess within astropy we could choose to force uppercase when creating a new header/file, but we shouldn't raise a warning for a lowercase HIERARCH card. And we still need to fix fitsdiff.

@parejkoj
Copy link
Contributor

parejkoj commented Apr 30, 2024

So I guess within astropy we could choose to force uppercase when creating a new header/file, but we shouldn't raise a warning for a lowercase HIERARCH card.

I think astropy definitely should warn on lowercase HIERARCH keys, and change them to uppercase on write (as we do in LSST). The ESO implementation of the HIERARCH convention effectively is the convention, especially since that is what cfitsio (the defacto FITS definition, given the number of conventions that aren't strictly part of the standard) implements. As I linked above, the fitsio docs specifically list the allowed characters: and allows characters consisting of digits 0-9, upper case letters A-Z, the dash '-' and the underscore '_'. Going against the cfitisio definition is generally not recommended.

Is it silly that there is no way to have lowercase keys in FITS? Yes, yes it is. But we're stuck with it, until someone finally pushes through a FITS versioning system so we can finally bring it into the late 20th century.

That said, @hugobuddel : fixing fitsdiff so it doesn't raise for this kind of comparison is good.

@hugobuddel
Copy link
Contributor Author

How do we proceed? As in, I'm not sure about the process.

I propose to merge this PR, and tackle the behavior of astropy w.r.t. writing FITS headers in another PR.

My purpose here was to make sure that fitsdiff works on files with lower case header keywords, because people might have such files, standard compliant or not. That goal has been achieved.

Modifying astropy to produce only uppercase headers is beyond the commitment I can currently make. Maybe at some later date, as I am affected by it.

Maybe we could backport this PR to v6, and change the behavior of astropy in v7? That way fitsdiff is fixed before we break backwards compatibility.

The codecov check failed, because I removed two covered lines and thus reduced the coverage. There is an open comment from @pllim that I addressed, but I'm not sure who should mark that as resolved.

@saimn saimn modified the milestones: v7.0.0, v6.1.1 May 17, 2024
@saimn
Copy link
Contributor

saimn commented May 17, 2024

@hugobuddel - agreed, let's fix first the issue with fitsdiff so that it doesn't crash in this case.

@saimn saimn added the backport-v6.1.x on-merge: backport to v6.1.x label May 17, 2024
Copy link
Contributor

@saimn saimn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks @hugobuddel

@saimn saimn merged commit be2b16c into astropy:main May 17, 2024
32 of 33 checks passed
meeseeksmachine pushed a commit to meeseeksmachine/astropy that referenced this pull request May 17, 2024
pllim added a commit that referenced this pull request May 17, 2024
…357-on-v6.1.x

Backport PR #16357 on branch v6.1.x (Let fitsdiff compare files with lower case HIERARCH keyword)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-v6.1.x on-merge: backport to v6.1.x Bug io.fits
Projects
None yet
Development

Successfully merging this pull request may close these issues.

fitsdiff raises a KeyError for lowercase HIERARCH keywords
6 participants