auto update column units and descriptions #144

sbailey · 2023-03-03T05:47:51Z

This PR builds upon the work of @aureliocarnero, @stephjuneau, @angelaberti, and Becky Canning to:

add py/desidatamodel/data/column_descriptions.csv with standardized units and descriptions for each column
update bin/generate_model to use those units/descriptions when creating a new datamodel stub file
add bin/update_column_descriptions to update the column units and descriptions of a pre-existing data model file
- default only updates blank entries
- with --force, it will override pre-existing entries, even non-blank ones
- in both cases it prints log messages about what it is doing

I updated a coadd data model file with

update_column_descriptions -i coadd-SPECTROGRAPH-TILEID-GROUPID.rst -o coadd-SPECTROGRAPH-TILEID-GROUPID.rst

as an example of the output from non-force mode. Note that it handles the case of the Units column needing to get wider since the newly added units were wider than any of the previously existing units.

This includes unit tests with complete coverage for all new code except one log line for the case that the input fits file had a non-blank column description that it will override; I don't think any of the existing test files have non-blank column descriptions to test that on.

Implementation notes

desidatamodel.update.format_rst_table has a lot of conceptual overlap with table formatting code in desidatamodel.stub.Stub, but since those were implemented as member functions of the Stub class, I couldn't directly re-use them. We could update the Stub class to use this non-member function instead, but I didn't want to embark upon that surgery in this PR.
generate_model provides a way to use a non-default column description file, but not an easy way to not override descriptions at all (except by providing a description file that doesn't have any descriptions in it)
update_column_descriptions doesn't provide a way to override the description file; that woudl probably be good to do, but I'm running out of steam tonight and would like to get this into review tomorrow.

coveralls · 2023-03-03T05:48:37Z

Coverage: 100.0%. Remained the same when pulling 80b0b26 on autocolumndesc2 into 834841e on main.

sbailey · 2023-03-03T18:14:13Z

The code test failure was an astropy 4.x vs. 5.x thing: when creating a table from a list of dicts, 5.x uses the order of the keys in the dict (what I was expecting) while 4.x alphabetizes the columns resulting in output tables with columns like

 ===================================== =============== ======= =========
 Description                           Name            Type    Units
 ===================================== =============== ======= =========
 blah                                  TARGETID        int64
...

I added code to force the column order (Name Type Units Description) and now tests pass on both astropy 4.x and 5.x.

The style check is still failing due to whitespace complaints which I'd rather not spend time chasing.

weaverba137 · 2023-03-03T18:40:39Z

I'll take a look at this later today. I'm pretty sure I can easily fix the style & also add unit tests to bring the coverage back to 100%.

weaverba137 · 2023-03-03T18:43:26Z

What is the difference between doc/column_descriptions.csv and py/desidatamodel/data/column_descriptions.csv?

…ta/column_descriptions.csv

sbailey · 2023-03-03T19:18:32Z

What is the difference between doc/column_descriptions.csv and py/desidatamodel/data/column_descriptions.csv?

Whoops; doc/column_descriptions.csv was the original version from Aurelio's branch. I replaced it with py/desidatamodel/data/column_descriptions.csv so that it could be algorithmically loaded using resource_filename, but forgot to remove the original (now removed). I also updated py/desidatamodel/data/column_descriptions.csv to be a dump of the spreadsheet that Stephanie, Becky, Angela, (and others?) worked on during the last documentation sprint. The intention is that after this PR is merged, column_descriptions.csv will be the authoritative place to make further updates (albeit less user-friendly than a google sheet).

I will go check now if there are other recent updates from that google sheet that need to be incorporated here, as well as adding some missing units that I noticed.

weaverba137 · 2023-03-03T21:52:07Z

'Overriding header description %s with column description file description %s'

This warning doesn't quite parse right grammatically. Is the intention to list both the replacement text and the filename or just the filename?

weaverba137 · 2023-03-03T22:01:27Z

We're in good shape in terms of style and testing. It may still be in one of the unit tests, but do we need to continue to support Astropy 4?

sbailey · 2023-03-03T22:24:04Z

'Overriding header description %s with column description file description %s'

This warning doesn't quite parse right grammatically. Is the intention to list both the replacement text and the filename or just the filename?

A more verbose version of what I was trying to say is "This code is taking the action of overriding the description that comes from the fits file header with the description that comes from column_descriptions.csv, aka 'the column descriptions file'". I simplified the text to just list the filename instead of also calling it the "column description file".

The style check is failing again because apparently it disagrees with the default python indentation from vim, but it doesn't tell me what indentation it wants, just that it is underindented.

We're in good shape in terms of style and testing. It may still be in one of the unit tests, but do we need to continue to support Astropy 4?

Thanks for updates. We don't need to go to great lengths to support astropy 4, but I found it useful to know that it didn't work with astropy 4 and the fix was pretty easy so I went ahead and did it and restored support for astropy 4. In the future we may find some astropy 4 incompatibility that would be harder to support, in which case we might drop astropy 4 support at that time. i.e. keep the test in place for now, while recognizing that the existence of the test doesn't obligate us to support astropy 4 forever.

weaverba137 · 2023-03-03T22:29:52Z

I've fixed the style and updated the change log.

The style trick here is to align indented text with the text after the opening (, i.e.:

                    log.warning('Overriding header units "%s" with units "%s" from %s',
                                units, desc_units, self.description_file)

instead of

                    log.warning('Overriding header units "%s" with units "%s" from %s',
                            units, desc_units, self.description_file)

weaverba137

I think we've already fixed anything I would have commented on, so merge when ready.

sbailey · 2023-03-03T22:42:08Z

Thanks for the indentation fix and the accompanying explanation. I'll make one last update to the column_descriptions.txt file from the latest version of the google sheet, and then merge this and deprecate updates to the google sheet.

stephjuneau · 2023-03-06T02:42:09Z

@sbailey it's possible that the original csv file and the spreadsheet have different column order between units and description. Not sure if this would break the code? Might be worth having a unit test to audit the column description file. I haven't made changes to the spreadsheet in a little while but we're having new documentation sprints starting this week so I was going to revisit it then. Should I start working from the CSV file instead and make a new PR now that this one is merged?

sbailey · 2023-03-06T18:30:18Z

@stephjuneau thanks for pointing out the different column order. I basically took a dump of the spreadsheet as the latest version, and then adapted @aureliocarnero's code to be able to use that, even with the different column order than original (and extra columns like Units). There are tests that ensure that the csv file is still valid and can be used, e.g. if you add a Description that includes a comma but forget to add quotes around the description, the unit test will catch that you broke the csv format (run "pytest" at the top level to run tests before pushing to github).

Additionally, desidatamodel.stub.read_column_descriptions specifically checks that the columns are in the expected order before blindly using them, and then the returned structure is standardized such that the rest of the code is agnostic to the original input format.

I haven't made changes to the spreadsheet in a little while but we're having new documentation sprints starting this week so I was going to revisit it then. Should I start working from the CSV file instead and make a new PR now that this one is merged?

Yes, thanks. I made one more round of updates to the spreadsheet and propagated it to this PR just before merging, but future updates should be to the CSV file (admittedly less user-friendly than editing the spreadsheet, but easier to keep in sync with the code).

aureliocarnero and others added 4 commits March 2, 2023 21:15

Adding the possibility to read description and units from external file

1810efe

auto add/update column units and descriptions

b7e4eab

update tests

78d0cbf

update coadd file as example

d7f1b10

sbailey requested a review from weaverba137 March 3, 2023 05:47

Stephen Bailey added 2 commits March 3, 2023 10:00

astropy 4.x support

e34611b

style stuff

57c22e8

remove doc/column_descriptions.csv; superseded by py/desidatamodel/da…

5005ff9

…ta/column_descriptions.csv

fix style warnings

02c3596

weaverba137 added 2 commits March 3, 2023 14:55

Restore full test coverage.

82f04fe

update .gitignore

deb76c5

simplify/standardize warning text

94f51a2

fix style; update change log

cb3dc5d

weaverba137 approved these changes Mar 3, 2023

View reviewed changes

update column_descriptions.csv from google sheet

80b0b26

sbailey merged commit 180dc7e into main Mar 4, 2023

sbailey deleted the autocolumndesc2 branch March 4, 2023 00:17

weaverba137 mentioned this pull request Mar 22, 2023

Script to sync column descriptions #137

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

auto update column units and descriptions #144

auto update column units and descriptions #144

sbailey commented Mar 3, 2023

coveralls commented Mar 3, 2023 •

edited

sbailey commented Mar 3, 2023

weaverba137 commented Mar 3, 2023

weaverba137 commented Mar 3, 2023

sbailey commented Mar 3, 2023

weaverba137 commented Mar 3, 2023

weaverba137 commented Mar 3, 2023

sbailey commented Mar 3, 2023

weaverba137 commented Mar 3, 2023

weaverba137 left a comment

sbailey commented Mar 3, 2023

stephjuneau commented Mar 6, 2023 •

edited

sbailey commented Mar 6, 2023

auto update column units and descriptions #144

auto update column units and descriptions #144

Conversation

sbailey commented Mar 3, 2023

Implementation notes

coveralls commented Mar 3, 2023 • edited

sbailey commented Mar 3, 2023

weaverba137 commented Mar 3, 2023

weaverba137 commented Mar 3, 2023

sbailey commented Mar 3, 2023

weaverba137 commented Mar 3, 2023

weaverba137 commented Mar 3, 2023

sbailey commented Mar 3, 2023

weaverba137 commented Mar 3, 2023

weaverba137 left a comment

Choose a reason for hiding this comment

sbailey commented Mar 3, 2023

stephjuneau commented Mar 6, 2023 • edited

sbailey commented Mar 6, 2023

coveralls commented Mar 3, 2023 •

edited

stephjuneau commented Mar 6, 2023 •

edited