Read units row from ascii files #9665

keflavich · 2019-11-24T20:09:54Z

Related to #4743 and #756, and loosely related to #9639 (which provides the workaround): we should have a mechanism for reading the units from the second (or n'th) header row of an ascii table.

Presently the only (?) human-readable format supported by astropy.table is the IPAC format, which includes additional information and requires very specific formatting, but it should be straightforward to add a units-parsing component to the generic ascii reader.

(if we have this feature somewhere, I was unable to find it by reading the docs...)

taldcroft · 2019-11-25T11:46:31Z

I would argue that ECSV is human-readable. What ECSV is NOT is human-writeable.

# %ECSV 0.9
# ---
# datatype:
# - {name: a, datatype: int64}
# - {name: b, unit: m, datatype: float64}
# - {name: c, datatype: string}
# schema: astropy-2.0
a b c
1 1.0 c
2 2.0 d

My question related to these requests to read such files is whether the data files with units in the second line are showing up "in the wild" (from collaborators or whatnot), or if you want to purposely write such a file in a new, implicitly-defined format.

keflavich · 2019-11-25T13:57:31Z

I disagree: ECSV is not very human-readable. It's not entirely unreadable, but it is difficult to parse and doesn't generalize well as more fields are added (while fixed width does):

Re: questions:
(1) Yes, these show in the wild all the time
(2) Yes, I'd like to write to this format. Generally I want to use this for small tables, not gigantic ones, but it's useful
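For the writing side of this request, here is a minimal sketch of what producing such a small table could look like, using only the standard library. The function name `format_with_units_row` and its arguments are hypothetical, chosen for illustration; this is not an astropy API.

```python
def format_with_units_row(names, units, rows, delimiter=" | "):
    """Render a table as aligned text: names on row 0, units on row 1."""
    # Columns without a unit get an empty cell in the units row.
    unit_cells = [units.get(name) or "" for name in names]
    table = [list(names), unit_cells] + [[str(v) for v in row] for row in rows]
    # Each column is as wide as its widest cell, header cells included.
    widths = [max(len(row[i]) for row in table) for i in range(len(names))]
    lines = [
        delimiter.join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in table
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        format_with_units_row(
            ["Field", "B3_res", "B3_sens"],
            {"B3_res": "arcsec", "B3_sens": "mJy"},
            [["W51-E", 0.37, 0.03], ["W43-MM1", 0.37, 0.03]],
        )
    )
```

Padding every cell to the column width keeps the output readable for small tables, which is the use case described here; it would be a poor fit for very large ones.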

pllim · 2019-11-25T15:42:40Z

I am hesitant to support random formats on a whim. It is a slippery slope. We already have options for you to write your own parser by subclassing Table.

keflavich · 2019-11-25T15:54:39Z

I am not suggesting creating or supporting new formats, really. I just want to be able to treat a specified header row as 'units' in the general case. Sure, my use case is mostly for human-readable formats, but the application is broad.

pllim · 2019-11-25T16:12:34Z

I have a lot of trauma trying to parse human-written units in the past. For example, "angstroms" vs "angstrom" or "erg" vs "ergs" or "ct" vs "cts" vs "count" vs "counts". And when you get to flux or magnitude units, even worse. Where do we draw the line? Is it human readable when the unit string in your header is very very long compared to the actual data for that column?
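To make the spelling problem concrete, here is a rough sketch of the kind of alias table a reader would need before handing strings to a unit parser. The names `UNIT_ALIASES` and `normalize_unit` are hypothetical, and the table covers only the variants listed above; any real list would be much longer, which is exactly the concern.

```python
# Hypothetical alias table: maps human-written spellings to one canonical name.
UNIT_ALIASES = {
    "angstroms": "angstrom",
    "ergs": "erg",
    "cts": "ct",
    "count": "ct",
    "counts": "ct",
}


def normalize_unit(text):
    """Return a canonical unit string, or None for an empty cell."""
    cleaned = text.strip()
    if not cleaned:
        return None
    # Look up case-insensitively, but return unknown strings unchanged so
    # that case-sensitive units such as "mJy" are not corrupted.
    return UNIT_ALIASES.get(cleaned.lower(), cleaned)


if __name__ == "__main__":
    for raw in ["angstroms", "ergs", " counts ", "mJy", ""]:
        print(repr(raw), "->", normalize_unit(raw))
```

Strings that are not in the table pass through untouched, so whether they are valid units is still left to whatever parser consumes them.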

keflavich · 2019-11-25T17:46:35Z

But units already handles parsing. I'm perfectly fine with a reader that ignores units that it can't parse. If it's a human-readable file, it's also pretty editable.

You've highlighted plenty of corner cases, but corner cases always exist. I'm not asking for a fully flexible, always-solves-all-problems solution here, but I do think it should be possible to read something as simple as this:

Field    | B3_res | B3_sens | B6_res | B6_sens
         | arcsec | mJy     | arcsec | mJy
W51-E    | 0.37   | 0.03    | 0.37   | 0.1
W51-IRS2 | 0.37   | 0.03    | 0.37   | 0.1
W43-MM1  | 0.37   | 0.03    | 0.37   | 0.1
W43-MM2  | 0.37   | 0.03    | 0.37   | 0.1

more simply than:

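from astropy import units as u
from astropy.io import ascii
from astropy.table import Table

path = '/orange/adamginsburg/ALMA_IMF/analysis/requested.txt'
# Read only the data rows, skipping the column-name and units rows
tbl = ascii.read(path, data_start=2)
# Re-read as fixed width so the units row shows up as the first data row
tbl_ = Table.read(path, format='ascii.fixed_width')
for colname in tbl.colnames:
    try:
        tbl[colname].unit = u.Unit(tbl_[colname][0])
    except (ValueError, TypeError):
        # Leave the column unitless if the cell is empty or not a valid unit
        pass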

which is what I have to do now to get the units extracted in the same format as the data
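As an illustration of the behaviour being requested (treating one specified header row as units), here is a standard-library sketch that does not use astropy at all. The function `read_with_units_row` and its `units_row` parameter are hypothetical names, and all values are left as strings; converting them to numbers and to real unit objects would be the reader's job.

```python
SAMPLE = """\
Field    | B3_res | B3_sens | B6_res | B6_sens
         | arcsec | mJy     | arcsec | mJy
W51-E    | 0.37   | 0.03    | 0.37   | 0.1
W51-IRS2 | 0.37   | 0.03    | 0.37   | 0.1
"""


def read_with_units_row(text, units_row=1, delimiter="|"):
    """Split a delimited table into column names, units, and data rows.

    Row 0 holds the column names and `units_row` is the index of the row
    holding the units. Every other non-blank row is treated as data.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    rows = [[cell.strip() for cell in line.split(delimiter)] for line in lines]
    names = rows[0]
    # An empty cell means "no unit" for that column.
    units = {name: (unit or None) for name, unit in zip(names, rows[units_row])}
    data = [row for i, row in enumerate(rows) if i not in (0, units_row)]
    return names, units, data


if __name__ == "__main__":
    names, units, data = read_with_units_row(SAMPLE)
    print(names)
    print(units)
    print(data)
```

A unit string that cannot be parsed could simply be dropped at the conversion step, matching the "ignore units it can't parse" behaviour suggested above.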

taldcroft · 2019-11-25T19:21:01Z

My real pushback in supporting easily reading this in astropy core is that it encourages people to write in this format, which is IMO a bad thing. It is basically a new ASCII data format that is not a standardized or a legacy format (like having the column names in the first row). But I do recognize that a certain degree of pragmatism is needed, and this issue comes up every so often, so I'm happy to hear arguments for accepting this format for reading.

In astropy core we could certainly add a subclass of the basic reader and a subclass of the fixed-width reader to do this. It might not be entirely trivial, just because there are many options and we aim to make any reader class that is in the core actually work correctly for all options.

With #9671 (and a slight mod to accept a masked Row for input) the code to read your file could be 2 lines:

units = Table.read(filename, format='ascii', data_start=1, data_end=2)[0]
tbl = Table.read(filename, format='ascii', units=units)

taldcroft · 2019-11-25T19:22:32Z

Sorry, I didn't get all the way back to your first response (about files in the wild), so ignore my first paragraph.

keflavich · 2019-11-25T21:02:56Z

👍 on #9671, that's a nice step in the right direction.

The fixed-width, headers optionally included format is to me an undefined legacy standard that has always existed. Fixed-width is the old FORTRAN standard, and adding headers to it is - imo - a strict improvement. I often encounter these as un-headered fixed-width files that I manually edit to include headers.

keflavich added table io.ascii Effort-medium labels Nov 24, 2019

pllim added the Package-expert label Nov 25, 2019

taldcroft mentioned this issue Nov 26, 2019

Set units and descriptions in Table init #9671

Merged

taldcroft removed the table label Apr 2, 2020
