Migrate unit validation to desiutil; support adding units to FITS files. #201

weaverba137 · 2023-09-15T22:25:48Z

This PR

Closes utilities to add units to tables #191.
Closes Migrate unit verification code from desidatamodel #200.
Restores some API documentation that had been commented out due to desiutil.setup may be incompatible with recent versions of setuptools #198.

Please test by running the script annotate_fits. Input column definitions can be in CSV format (like desidatamodel's column_descriptions.csv), YAML (like desitarget's units.yaml) or directly on the command line with a COLUMN=value:COLUMN=value notation. See annotate_fits --help for more information.

Additional questions that need to be answered before this PR can be completed:

I went slightly beyond the scope of utilities to add units to tables #191 to support adding units to images (BUNIT), but this gets into tricky territory, because then do we also have to support adding units to compressed images? I'm leaning toward only supporting adding units to tables, i.e. strictly fits.BinTableHDU.
Comments on table columns are currently added using the TTYPEnnn= 'COLUMN_NAME' / COMMENT convention, but we could also support TCOMMnnn= 'COMMENT'. The latter is not officially part of the FITS standard, but might be in the near future.
No matter which comment convention we choose, sometimes comments will be too long to fit. What do we do in that case? Just issue a warning? If we choose the TCOMMnnn convention, then a long comment can be split across multiple lines, but then the LONGSTRN keyword has to be present.
annotate_fits() adds a checksum to the HDU it is modifying, but should it only do so if a checksum is already present?
To what extent do we want to try to hide warnings about non-standard FITS units? As noted in Unit parsing errors produce misleading advice in certain cases astropy/astropy#15313, the message is misleading, but there are also inconsistencies in how the warning is issued that may make it difficult to hide the warning.

* main: update dev version after 3.4.0 tag update release notes and version for desiutil/3.4.0

coveralls · 2023-09-15T22:27:31Z

coverage: 76.165% (+1.8%) from 74.401% when pulling 66b3c72 on migrate-units into 681277b on main.

sbailey · 2023-09-19T23:32:05Z

Basic functionality appears to work as advertised to make it easier to add units/comments to a fits file. Thanks. I have several comments, followed by my opinions on the questions you posed in the original PR post. In approximate order of descending priority:

Given that desidatamodel column_descriptions.csv is really useful for units but not yet useful for comments (they are too long), let's add an option to use a csv file as input but only update the units but not the comments even if the file has a comments column.
annotate_fits ... --test INFILE OUTFILE writes outfile, despite the --test flag whose doc says "Test mode; show what would be done but do not change any files"
annotate_fits prints DEBUG-level messages regardless of $DESI_LOGLEVEL, even if --verbose is not set and $DESI_LOGLEVEL=info
the dictionary dump of the --test output is rather dizzying for a large file; it would be nicer to add a loop over key/value to make a more human-readable report over multiple lines.

I went slightly beyond the scope of #191 to support adding units to images (BUNIT), but this gets into tricky territory, because then do we also have to support adding units to compressed images? I'm leaning toward only supporting adding units to tables, i.e. strictly fits.BinTableHDU.

That sounds fine.

Comments on table columns are currently added using the TTYPEnnn= 'COLUMN_NAME' / COMMENT convention, but we could also support TCOMMnnn= 'COMMENT'. The latter is not officially part of the FITS standard, but might be in the near future.

For this PR, let's just stick with TTYPEnnn= 'COLUMN_NAME' / COMMENT and not expand the scope to TCOMMnnn.

No matter which comment convention we choose, sometimes comments will be too long to fit. What do we do in that case? Just issue a warning? If we choose the TCOMMnnn convention, then a long comment can be split across multiple lines, but then the LONGSTRN keyword has to be present.

My preference for this PR: if a comment is too long, log an error (or critical) message and stop, unless a new option --truncate-comments is specified by the user.

If that is too much of a hassle, just printing a warning message and moving on would be ok too, as long as there is some way to completely turn off adding comments while still adding units (see earlier bullet).

annotate_fits() adds a checksum to the HDU it is modifying, but should it only do so if a checksum is already present?

I think it is fine to add a CHECKSUM no matter what.

To what extent do we want to try to hide warnings about non-standard FITS units? As noted in astropy/astropy#15313, the message is misleading, but there are also inconsistencies in how the warning is issued that may make it difficult to hide the warning.

Let's do a best effort to hide specific units warnings about cases that we know about and still want to consider valid (nanomaggie, nannomaggy, nanomaggys, nanomaggies, possibly others) while still generically letting warnings through to catch common mistakes like 'deg' vs. 'degree' vs. 'degrees'.

weaverba137 · 2023-09-20T17:02:41Z

Thank you @sbailey, these are all good comments. I think that satisfies all the questions I had. It will probably take another day or so for final implementation.

weaverba137 · 2023-09-20T21:27:33Z

@sbailey, in the current implementation, the --test option actually does nothing, so that explains your observation about files being written out with that flag set.

I'm considering not supporting a --test mode at all, since ultimately the test is: does the output file have the correct headers? If the original file is not overwritten, the output file can be removed and regenerated as needed.

weaverba137 · 2023-09-20T21:29:34Z

annotate_fits prints DEBUG-level messages regardless of $DESI_LOGLEVEL, even if --verbose is not set and $DESI_LOGLEVEL=info.

When you made that observation, did you have --test set? Because despite what I said above, that will activate DEBUG messages.

weaverba137 · 2023-09-20T23:06:13Z

Additional detail on the last comment: As I mentioned, a warning about truncating a comment is already emitted, but the warning doesn't say which comment, so we could add that, or convert the warning to an exception while also providing more information about the comment that triggered it. I checked and the warning is a real warnings.warn(), so it can be caught.

sbailey · 2023-09-22T18:47:41Z

annotate_fits prints DEBUG-level messages regardless of $DESI_LOGLEVEL, even if --verbose is not set and $DESI_LOGLEVEL=info.
When you made that observation, did you have --test set? Because despite what I said above, that will activate DEBUG messages.

Indeed, I was using --test, and I verify that without --test it follows the expected $DESI_LOGLEVEL. I think that is fine, though perhaps the --test help string should mention something like "implies --verbose"

astropy.io.fits automatically truncates comments to 46 characters and it emits a warning when it does so. I'm not sure if that's a warnings.warn() message or a logging.warning() message, but I don't think we need to go to extra effort to warn about truncated comments if there's already such a warning.
Additional detail on the last comment: As I mentioned, a warning about truncating a comment is already emitted, but the warning doesn't say which comment, so we could add that, or convert the warning to an exception while also providing more information about the comment that triggered it. I checked and the warning is a real warnings.warn(), so it can be caught.

astropy appears to issue a single "WARNING: VerifyWarning: Card is too long, comment will be truncated. [astropy.io.fits.card]" even if it is truncating multiple comments, which implies that it is a warnings.warn() and is perhaps the motivation for why it doesn't include which comment is being truncated. In this case, I think it would be very helpful to know which comment is being truncated so let's add that info with our own warning message.

Resummarizing what I think we want regarding comments:

Already done (thanks!): --disable-comments option to update units but not comments
By default, do not truncate comments, and stop with an error saying which key+comment is too long.
Add option --truncate-comments that would override that default behavior and allow truncating comments, while still logging every key/comment that is too long so that you know which ones you might want to shorten and try again.

i.e. make truncating comments "opt-in", and augment the logging to say which keys/comments are being truncated.

weaverba137 · 2023-09-22T18:50:48Z

Thank you. As I noted above, it might be easier to eliminate --test altogether. Do you have thoughts on that?

weaverba137 · 2023-09-22T21:26:56Z

@sbailey, et al., this is ready for final review. I removed the --test option entirely for reasons mentioned above.

sbailey · 2023-09-22T22:38:27Z

Dropping --test is fine, but I added more logging to say what it is doing when --verbose is selected, as well as a final INFO log that it actually wrote the file. I see tests failed while I was writing this so I will also check on that after finishing this comment.

One remaining gotcha: I appreciate the concept of reporting all comments that are too long before exiting so that the user doesn't have to find,fix,rerun one-by-one, but as currently implemented the code will error+exit for long comments on columns that aren't even in the input file, i.e. long comments that don't matter. I think it should pre-select only the columns that are in the input file and check only those. Since this is a little more surgery than just adding verbose logging, I'm checking with you first. How about something like:

with fits.open(filename, mode='readonly') as hdulist:
    ...
    column_comments = dict()
    for i in range(1, 1000):
        ttype = f"TTYPE{i:d}"
        if ttype in hdu.header:
            colname = hdu.header[ttype]
            if comments and colname in comments:
                column_comments[colname] = comments[colname]
        else:
            break

        n_long = check_comment_length(column_comments, error=(not truncate))
        ...

weaverba137 · 2023-09-22T22:44:03Z

Yes, that's fair to check only the comments on columns that are actually in the file. However, both that change and the change to the logging in annotate_fits will require some moderately extensive changes to the test suite.

weaverba137 · 2023-09-22T22:45:53Z

Actually, I'm having second thoughts about that code snippet. Please do not attempt to insert that, I don't think it actually works with the way that function is planned out.

weaverba137 · 2023-09-22T22:47:23Z

Among other things, raising an exception inside a with block could result in memory not being freed. So if you're importing and running the function inside a larger pipeline, rather than as a stand-alone script, that could be a big problem.

weaverba137 · 2023-09-22T22:48:33Z

And yes, there are other places where exceptions can be raised inside the with block. This is making me rethink all of them.

sbailey · 2023-09-22T22:51:59Z

OK, I fixed the tests I broke by changing the details of the log messages, but I'll leave it to you @weaverba137 to investigate how to only check the columns that actually appear in the data file rather than all of them in the csv/yaml file whether they are needed or not.

weaverba137 · 2023-09-22T23:49:28Z

Ah, I just discovered another gotcha: independently of checking the comments, there is also nowhere where the units are validated, either in annotate_table() or annotate_fits(). I don't think I'll be able to get something out this afternoon.

* main: update dev version after 3.4.1 tag update release notes and version for desiutil/3.4.1 update changelog for PR #202 fix codestyle whitespace I think add support for removing dependencies

sbailey · 2023-09-26T23:39:07Z

Rechecking latest version:

Good: it only checks comment lengths for columns that are present in the input data file, correctly ignoring other comments in the input csv/yaml file if they aren't needed.
Good: it checks units and complains about invalid units, treating variations of nanomaggy as valid even if they technically aren't valid.
Todo: let's add an option --allow-invalid-units (or equivalent) so that if we find other cases like nanomaggy that don't pass the units checker but we want to force in anyway, we still have a way to do that without requiring a desiutil code update. This would be opt-in; having the current default of not allowing invalid units is good.

weaverba137 · 2023-09-26T23:41:43Z

Sounds reasonable. I'll try to get that out tomorrow.

weaverba137 · 2023-09-27T17:00:28Z

@sbailey, last suggested change is in. I will plan to merge later today.

weaverba137 added 16 commits September 1, 2023 14:10

add validate function

e6ab735

test for numeric factors

434a8ef

annotation of FITS files

b61069e

fix filenames

7325875

Set BUNIT for images

946240b

fix image path

c12309c

remove obsolete section

d068997

rename to annotate_fits

2972757

test some BUNIT corner cases

25cba69

use proper temp dir

3729dbb

pin Sphinx

729a692

fully restore api doc

080f55f

note places that still need clarification

b9c9c5c

tweak doc string and comments

4d935ea

Merge branch 'main' into migrate-units

5d796ea

* main: update dev version after 3.4.0 tag update release notes and version for desiutil/3.4.0

update change log

67114df

weaverba137 added the WIP Work in Progress label Sep 15, 2023

weaverba137 requested review from sbailey, moustakas and geordie666 September 15, 2023 22:25

weaverba137 self-assigned this Sep 15, 2023

weaverba137 mentioned this pull request Sep 19, 2023

Migrate units to desiutil desihub/desidatamodel#186

Merged

weaverba137 added 3 commits September 20, 2023 13:55

remove support for image HDU

b65279a

avoid overwriting header that already exists.

aeb2ec2

raise or warn instead of log unit errors.

d263ba7

weaverba137 added 3 commits September 22, 2023 13:35

add function to check comment length

caf3615

check comment length in annotate_fits

0683e7f

denote critical errors.

e339e65

weaverba137 removed the WIP Work in Progress label Sep 22, 2023

weaverba137 changed the title ~~[WIP] Migrate unit validation to desiutil; support adding units to FITS files.~~ Migrate unit validation to desiutil; support adding units to FITS files. Sep 22, 2023

more logging when verbose

604f509

fix tests

b71a19e

weaverba137 added 5 commits September 26, 2023 09:32

Merge branch 'main' into migrate-units

18349d4

* main: update dev version after 3.4.1 tag update release notes and version for desiutil/3.4.1 update changelog for PR #202 fix codestyle whitespace I think add support for removing dependencies

rework logic to verify units and comments

9a7183c

fix up tests

d6db42b

fix indentation

c3f04a4

fix next version in change log

de1fb78

add one more command-line option

66b3c72

weaverba137 merged commit 93aca73 into main Sep 27, 2023
24 checks passed

weaverba137 deleted the migrate-units branch September 27, 2023 20:31

moustakas mentioned this pull request Apr 25, 2024

Consider merging other FITS column and unit code into desiutil.annotate #204

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Migrate unit validation to desiutil; support adding units to FITS files. #201

Migrate unit validation to desiutil; support adding units to FITS files. #201

weaverba137 commented Sep 15, 2023 •

edited

Loading

coveralls commented Sep 15, 2023 •

edited

Loading

sbailey commented Sep 19, 2023

weaverba137 commented Sep 20, 2023

weaverba137 commented Sep 20, 2023

weaverba137 commented Sep 20, 2023

weaverba137 commented Sep 20, 2023

sbailey commented Sep 22, 2023

weaverba137 commented Sep 22, 2023

weaverba137 commented Sep 22, 2023

sbailey commented Sep 22, 2023

weaverba137 commented Sep 22, 2023

weaverba137 commented Sep 22, 2023

weaverba137 commented Sep 22, 2023

weaverba137 commented Sep 22, 2023

sbailey commented Sep 22, 2023

weaverba137 commented Sep 22, 2023

sbailey commented Sep 26, 2023

weaverba137 commented Sep 26, 2023

weaverba137 commented Sep 27, 2023

Migrate unit validation to desiutil; support adding units to FITS files. #201

Migrate unit validation to desiutil; support adding units to FITS files. #201

Conversation

weaverba137 commented Sep 15, 2023 • edited Loading

coveralls commented Sep 15, 2023 • edited Loading

sbailey commented Sep 19, 2023

weaverba137 commented Sep 20, 2023

weaverba137 commented Sep 20, 2023

weaverba137 commented Sep 20, 2023

weaverba137 commented Sep 20, 2023

sbailey commented Sep 22, 2023

weaverba137 commented Sep 22, 2023

weaverba137 commented Sep 22, 2023

sbailey commented Sep 22, 2023

weaverba137 commented Sep 22, 2023

weaverba137 commented Sep 22, 2023

weaverba137 commented Sep 22, 2023

weaverba137 commented Sep 22, 2023

sbailey commented Sep 22, 2023

weaverba137 commented Sep 22, 2023

sbailey commented Sep 26, 2023

weaverba137 commented Sep 26, 2023

weaverba137 commented Sep 27, 2023

weaverba137 commented Sep 15, 2023 •

edited

Loading

coveralls commented Sep 15, 2023 •

edited

Loading