Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

auto update column units and descriptions #144

Merged
merged 13 commits into from Mar 4, 2023
Merged

auto update column units and descriptions #144

merged 13 commits into from Mar 4, 2023

Conversation

sbailey
Copy link
Contributor

@sbailey sbailey commented Mar 3, 2023

This PR builds upon the work of @aureliocarnero, @stephjuneau, @angelaberti, and Becky Canning to:

  • add py/desidatamodel/data/column_descriptions.csv with standardized units and descriptions for each column
  • update bin/generate_model to use those units/descriptions when creating a new datamodel stub file
  • add bin/update_column_descriptions to update the column units and descriptions of a pre-existing data model file
    • default only updates blank entries
    • with --force, it will override pre-existing entries, even non-blank ones
    • in both cases it prints log messages about what it is doing

I updated a coadd data model file with

update_column_descriptions -i coadd-SPECTROGRAPH-TILEID-GROUPID.rst -o coadd-SPECTROGRAPH-TILEID-GROUPID.rst

as an example of the output from non-force mode. Note that it handles the case of the Units column needing to get wider since the newly added units were wider than any of the previously existing units.

This includes unit tests with complete coverage for all new code except one log line for the case that the input fits file had a non-blank column description that it will override; I don't think any of the existing test files have non-blank column descriptions to test that on.

Implementation notes

  • desidatamodel.update.format_rst_table has a lot of conceptual overlap with table formatting code in desidatamodel.stub.Stub, but since those were implemented as member functions of the Stub class, I couldn't directly re-use them. We could update the Stub class to use this non-member function instead, but I didn't want to embark upon that surgery in this PR.
  • generate_model provides a way to use a non-default column description file, but not an easy way to not override descriptions at all (except by providing a description file that doesn't have any descriptions in it)
  • update_column_descriptions doesn't provide a way to override the description file; that woudl probably be good to do, but I'm running out of steam tonight and would like to get this into review tomorrow.

@sbailey sbailey requested a review from weaverba137 March 3, 2023 05:47
@coveralls
Copy link

coveralls commented Mar 3, 2023

Coverage Status

Coverage: 100.0%. Remained the same when pulling 80b0b26 on autocolumndesc2 into 834841e on main.

@sbailey
Copy link
Contributor Author

sbailey commented Mar 3, 2023

The code test failure was an astropy 4.x vs. 5.x thing: when creating a table from a list of dicts, 5.x uses the order of the keys in the dict (what I was expecting) while 4.x alphabetizes the columns resulting in output tables with columns like

 ===================================== =============== ======= =========
 Description                           Name            Type    Units
 ===================================== =============== ======= =========
 blah                                  TARGETID        int64
...

I added code to force the column order (Name Type Units Description) and now tests pass on both astropy 4.x and 5.x.

The style check is still failing due to whitespace complaints which I'd rather not spend time chasing.

@weaverba137
Copy link
Member

I'll take a look at this later today. I'm pretty sure I can easily fix the style & also add unit tests to bring the coverage back to 100%.

@weaverba137
Copy link
Member

What is the difference between doc/column_descriptions.csv and py/desidatamodel/data/column_descriptions.csv?

@sbailey
Copy link
Contributor Author

sbailey commented Mar 3, 2023

What is the difference between doc/column_descriptions.csv and py/desidatamodel/data/column_descriptions.csv?

Whoops; doc/column_descriptions.csv was the original version from Aurelio's branch. I replaced it with py/desidatamodel/data/column_descriptions.csv so that it could be algorithmically loaded using resource_filename, but forgot to remove the original (now removed). I also updated py/desidatamodel/data/column_descriptions.csv to be a dump of the spreadsheet that Stephanie, Becky, Angela, (and others?) worked on during the last documentation sprint. The intention is that after this PR is merged, column_descriptions.csv will be the authoritative place to make further updates (albeit less user-friendly than a google sheet).

I will go check now if there are other recent updates from that google sheet that need to be incorporated here, as well as adding some missing units that I noticed.

@weaverba137
Copy link
Member

'Overriding header description %s with column description file description %s'

This warning doesn't quite parse right grammatically. Is the intention to list both the replacement text and the filename or just the filename?

@weaverba137
Copy link
Member

We're in good shape in terms of style and testing. It may still be in one of the unit tests, but do we need to continue to support Astropy 4?

@sbailey
Copy link
Contributor Author

sbailey commented Mar 3, 2023

'Overriding header description %s with column description file description %s'

This warning doesn't quite parse right grammatically. Is the intention to list both the replacement text and the filename or just the filename?

A more verbose version of what I was trying to say is "This code is taking the action of overriding the description that comes from the fits file header with the description that comes from column_descriptions.csv, aka 'the column descriptions file'". I simplified the text to just list the filename instead of also calling it the "column description file".

The style check is failing again because apparently it disagrees with the default python indentation from vim, but it doesn't tell me what indentation it wants, just that it is underindented.

We're in good shape in terms of style and testing. It may still be in one of the unit tests, but do we need to continue to support Astropy 4?

Thanks for updates. We don't need to go to great lengths to support astropy 4, but I found it useful to know that it didn't work with astropy 4 and the fix was pretty easy so I went ahead and did it and restored support for astropy 4. In the future we may find some astropy 4 incompatibility that would be harder to support, in which case we might drop astropy 4 support at that time. i.e. keep the test in place for now, while recognizing that the existence of the test doesn't obligate us to support astropy 4 forever.

@weaverba137
Copy link
Member

I've fixed the style and updated the change log.

The style trick here is to align indented text with the text after the opening (, i.e.:

                    log.warning('Overriding header units "%s" with units "%s" from %s',
                                units, desc_units, self.description_file)

instead of

                    log.warning('Overriding header units "%s" with units "%s" from %s',
                            units, desc_units, self.description_file)

Copy link
Member

@weaverba137 weaverba137 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we've already fixed anything I would have commented on, so merge when ready.

@sbailey
Copy link
Contributor Author

sbailey commented Mar 3, 2023

Thanks for the indentation fix and the accompanying explanation. I'll make one last update to the column_descriptions.txt file from the latest version of the google sheet, and then merge this and deprecate updates to the google sheet.

@sbailey sbailey merged commit 180dc7e into main Mar 4, 2023
@sbailey sbailey deleted the autocolumndesc2 branch March 4, 2023 00:17
@stephjuneau
Copy link

stephjuneau commented Mar 6, 2023

@sbailey it's possible that the original csv file and the spreadsheet have different column order between units and description. Not sure if this would break the code? Might be worth having a unit test to audit the column description file. I haven't made changes to the spreadsheet in a little while but we're having new documentation sprints starting this week so I was going to revisit it then. Should I start working from the CSV file instead and make a new PR now that this one is merged?

@sbailey
Copy link
Contributor Author

sbailey commented Mar 6, 2023

@stephjuneau thanks for pointing out the different column order. I basically took a dump of the spreadsheet as the latest version, and then adapted @aureliocarnero's code to be able to use that, even with the different column order than original (and extra columns like Units). There are tests that ensure that the csv file is still valid and can be used, e.g. if you add a Description that includes a comma but forget to add quotes around the description, the unit test will catch that you broke the csv format (run "pytest" at the top level to run tests before pushing to github).

Additionally, desidatamodel.stub.read_column_descriptions specifically checks that the columns are in the expected order before blindly using them, and then the returned structure is standardized such that the rest of the code is agnostic to the original input format.

I haven't made changes to the spreadsheet in a little while but we're having new documentation sprints starting this week so I was going to revisit it then. Should I start working from the CSV file instead and make a new PR now that this one is merged?

Yes, thanks. I made one more round of updates to the spreadsheet and propagated it to this PR just before merging, but future updates should be to the CSV file (admittedly less user-friendly than editing the spreadsheet, but easier to keep in sync with the code).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants