Read units row from ascii files #9665
Comments
I would argue that ECSV is human-readable. What ECSV is NOT is human-writeable.
My question about these requests to read such files is whether data files with units in the second line are showing up "in the wild" (from collaborators or whatnot), or whether you want to purposely write such a file in a new, implicitly defined format.
I am hesitant to support random formats on a whim. It is a slippery slope. We already have options for you to write your own parser by subclassing
I am not suggesting creating or supporting new formats, really. I just want to be able to treat a specified header row as 'units' in the general case. Sure, my use case is mostly for human-readable formats, but the application is broad.
I have a lot of trauma from trying to parse human-written units in the past. For example, "angstroms" vs "angstrom" or "erg" vs "ergs" or "ct" vs "cts" vs "count" vs "counts". And when you get to flux or magnitude units, it is even worse. Where do we draw the line? Is it human-readable when the unit string in your header is very long compared to the actual data for that column?
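To make the spelling problem concrete, here is a minimal sketch of the kind of alias table a reader would need just for the examples above. It is plain Python, not astropy code; `UNIT_ALIASES` and `normalize_unit` are hypothetical names, and the choice of canonical spellings (for instance "ct" for counts) is an assumption made only for illustration.

```python
# Hypothetical alias table: maps human-written spellings to one canonical form.
# The canonical spellings chosen here are assumptions, not an astropy standard.
UNIT_ALIASES = {
    "angstroms": "angstrom",
    "ergs": "erg",
    "cts": "ct",
    "count": "ct",
    "counts": "ct",
}


def normalize_unit(raw):
    """Return a canonical spelling for a human-written unit string.

    Case is preserved on purpose, since unit prefixes can be case-sensitive
    (e.g. "mJy" vs "MJy"). Unknown strings are returned stripped but unchanged.
    """
    key = raw.strip()
    return UNIT_ALIASES.get(key, key)


if __name__ == "__main__":
    for spelling in ["angstroms", "ergs", " counts ", "Jy"]:
        print(repr(spelling), "->", repr(normalize_unit(spelling)))
```

Every new spelling needs a new entry, and compound flux or magnitude units would not fit a flat lookup like this at all, which is the "where do we draw the line" concern.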
You've highlighted plenty of corner cases, but corner cases always exist. I'm not asking for a fully flexible, always-solves-all-problems solution here, but I do think it should be possible to read something as simple as this:
more simply than:
which is what I have to do now to get the units extracted in the same format as the data.
My real pushback against supporting easy reading of this in astropy core is that it encourages people to write in this format, which is IMO a bad thing. It is basically a new ASCII data format that is neither standardized nor a legacy format (like having the column names in the first row). But I do recognize that a certain degree of pragmatism is needed, and this issue comes up every so often, so I'm happy to hear arguments for accepting this format for reading. In astropy core we could certainly add a subclass of the basic reader and a subclass of the fixed-width reader to do this. It might not be entirely trivial, just because there are many options and we aim to make any reader class that is in the core actually work correctly for all options. With #9671 (and a slight mod to accept a masked Row for input) the code to read your file could be 2 lines:
Sorry, I didn't get all the way back to your first response (about files in the wild), so ignore my first paragraph.
👍 on #9671, that's a nice step in the right direction. The fixed-width format with optionally included headers is, to me, an undefined legacy standard that has always existed. Fixed-width is the old FORTRAN standard, and adding headers to it is, imo, a strict improvement. I often encounter these as un-headered fixed-width files that I manually edit to include headers.
Related to #4743 and #756, and loosely related to #9639 (which provides the workaround): we should have a mechanism for reading the units from the second (or n-th) header row of an ascii table.
Presently the only (?) human-readable format supported by astropy.table is the IPAC format, which includes additional information and follows a very specific set of formatting rules, but it should be straightforward to add a units-parsing component to the generic ascii reader.
(if we have this feature somewhere, I was unable to find it by reading the docs...)
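For concreteness, here is a minimal sketch of the behaviour being asked for, written in plain Python rather than against the astropy reader classes. The function name `read_table_with_units`, the `units_row` parameter, and the sample data are all hypothetical; the sketch assumes a whitespace-delimited table with column names in the first non-blank row, purely numeric data, and unit strings that contain no spaces.

```python
def read_table_with_units(text, units_row=1):
    """Parse a whitespace-delimited table with a dedicated units row.

    Row 0 (counting non-blank lines only) holds the column names and row
    `units_row` holds the units. Returns (columns, units), where `columns`
    maps each name to a list of floats and `units` maps each name to its
    raw unit string.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    names = lines[0].split()
    units = lines[units_row].split()
    if len(units) != len(names):
        raise ValueError("units row does not match the number of columns")

    columns = {name: [] for name in names}
    for index, line in enumerate(lines):
        if index in (0, units_row):
            continue  # skip the names row and the units row
        values = line.split()
        if len(values) != len(names):
            raise ValueError(f"row {index} has {len(values)} fields, expected {len(names)}")
        for name, value in zip(names, values):
            columns[name].append(float(value))
    return columns, dict(zip(names, units))


# Hypothetical sample file contents, for illustration only.
SAMPLE = """\
wavelength flux
angstrom   Jy
5000.0     1.2
5100.0     1.4
"""

if __name__ == "__main__":
    columns, units = read_table_with_units(SAMPLE)
    print(units)
    print(columns)
```

The unit strings are returned as raw text here; turning them into real unit objects, and handling units that contain spaces, is where the hard cases would start.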