New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
auto update column units and descriptions #144
Conversation
The code test failure was an astropy 4.x vs. 5.x thing: when creating a table from a list of dicts, 5.x uses the order of the keys in the dict (what I was expecting) while 4.x alphabetizes the columns resulting in output tables with columns like
I added code to force the column order (Name Type Units Description) and now tests pass on both astropy 4.x and 5.x. The style check is still failing due to whitespace complaints which I'd rather not spend time chasing. |
I'll take a look at this later today. I'm pretty sure I can easily fix the style & also add unit tests to bring the coverage back to 100%. |
What is the difference between |
…ta/column_descriptions.csv
Whoops; I will go check now if there are other recent updates from that google sheet that need to be incorporated here, as well as adding some missing units that I noticed. |
This warning doesn't quite parse right grammatically. Is the intention to list both the replacement text and the filename or just the filename? |
We're in good shape in terms of style and testing. It may still be in one of the unit tests, but do we need to continue to support Astropy 4? |
A more verbose version of what I was trying to say is "This code is taking the action of overriding the description that comes from the fits file header with the description that comes from column_descriptions.csv, aka 'the column descriptions file'". I simplified the text to just list the filename instead of also calling it the "column description file". The style check is failing again because apparently it disagrees with the default python indentation from vim, but it doesn't tell me what indentation it wants, just that it is underindented.
Thanks for updates. We don't need to go to great lengths to support astropy 4, but I found it useful to know that it didn't work with astropy 4 and the fix was pretty easy so I went ahead and did it and restored support for astropy 4. In the future we may find some astropy 4 incompatibility that would be harder to support, in which case we might drop astropy 4 support at that time. i.e. keep the test in place for now, while recognizing that the existence of the test doesn't obligate us to support astropy 4 forever. |
I've fixed the style and updated the change log. The style trick here is to align indented text with the text after the opening log.warning('Overriding header units "%s" with units "%s" from %s',
units, desc_units, self.description_file) instead of log.warning('Overriding header units "%s" with units "%s" from %s',
units, desc_units, self.description_file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we've already fixed anything I would have commented on, so merge when ready.
Thanks for the indentation fix and the accompanying explanation. I'll make one last update to the column_descriptions.txt file from the latest version of the google sheet, and then merge this and deprecate updates to the google sheet. |
@sbailey it's possible that the original csv file and the spreadsheet have different column order between units and description. Not sure if this would break the code? Might be worth having a unit test to audit the column description file. I haven't made changes to the spreadsheet in a little while but we're having new documentation sprints starting this week so I was going to revisit it then. Should I start working from the CSV file instead and make a new PR now that this one is merged? |
@stephjuneau thanks for pointing out the different column order. I basically took a dump of the spreadsheet as the latest version, and then adapted @aureliocarnero's code to be able to use that, even with the different column order than original (and extra columns like Units). There are tests that ensure that the csv file is still valid and can be used, e.g. if you add a Description that includes a comma but forget to add quotes around the description, the unit test will catch that you broke the csv format (run "pytest" at the top level to run tests before pushing to github). Additionally,
Yes, thanks. I made one more round of updates to the spreadsheet and propagated it to this PR just before merging, but future updates should be to the CSV file (admittedly less user-friendly than editing the spreadsheet, but easier to keep in sync with the code). |
This PR builds upon the work of @aureliocarnero, @stephjuneau, @angelaberti, and Becky Canning to:
py/desidatamodel/data/column_descriptions.csv
with standardized units and descriptions for each columnbin/generate_model
to use those units/descriptions when creating a new datamodel stub filebin/update_column_descriptions
to update the column units and descriptions of a pre-existing data model file--force
, it will override pre-existing entries, even non-blank onesI updated a coadd data model file with
as an example of the output from non-force mode. Note that it handles the case of the Units column needing to get wider since the newly added units were wider than any of the previously existing units.
This includes unit tests with complete coverage for all new code except one log line for the case that the input fits file had a non-blank column description that it will override; I don't think any of the existing test files have non-blank column descriptions to test that on.
Implementation notes
desidatamodel.update.format_rst_table
has a lot of conceptual overlap with table formatting code indesidatamodel.stub.Stub
, but since those were implemented as member functions of theStub
class, I couldn't directly re-use them. We could update the Stub class to use this non-member function instead, but I didn't want to embark upon that surgery in this PR.generate_model
provides a way to use a non-default column description file, but not an easy way to not override descriptions at all (except by providing a description file that doesn't have any descriptions in it)update_column_descriptions
doesn't provide a way to override the description file; that woudl probably be good to do, but I'm running out of steam tonight and would like to get this into review tomorrow.