Support selecting multiple column names in FITS_rec #6820

adrn · 2017-11-08T16:57:32Z

FITS_rec generally behaves like a Numpy structured array, but doesn't support selecting multiple column names (tested with Numpy 1.13 and both Astropy 2.0.2 and master). See the example here:
https://gist.github.com/4f3d0684183da6f413539854f213b845

The workaround I'm using right now is:

import numpy as np
from astropy.io import fits

data = fits.getdata('/path/to/file.fits')
data = data.view(data.dtype, np.ndarray)
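
As a rough illustration of what the view to a plain ndarray makes possible, here is a minimal sketch using a small hand-built structured array as a stand-in for the FITS table data. The column names (`ra`, `dec`, `flux`) and values are made up for the example and are not taken from the gist.

```python
import numpy as np

# Hypothetical stand-in for table data; in practice this would be the
# result of fits.getdata(...) viewed as a plain np.ndarray.
data = np.array(
    [(1.0, 2.0, 10), (3.0, 4.0, 20), (5.0, 6.0, 30)],
    dtype=[("ra", "f8"), ("dec", "f8"), ("flux", "i4")],
)

# Indexing with a list of field names selects several columns at once
# on a plain structured ndarray.
subset = data[["ra", "dec"]]
print(subset.dtype.names)  # ('ra', 'dec')
```

Whether the result is a view or a copy depends on the NumPy version, so it is safer not to rely on writes to `subset` propagating back to `data`.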

pllim · 2020-10-21T15:19:09Z

@adrn, has this been solved with Unified I/O using Table?

adrn · 2020-10-21T15:24:20Z

This specific issue is still present with FITS_rec.

This might still be a relevant feature request: I believe Table loads all data into memory (?), whereas FITS_rec could in principle work with mmap=True, and then selecting just a few column names would be more memory efficient. But I could be wrong about memmap...

pllim · 2020-10-21T15:27:11Z

I thought Table added memmap at some point but I don't remember the details. @saimn mentioned that this is not trivial to implement in io.fits.

saimn · 2020-10-21T18:48:24Z

You can use memmap with Table now, e.g. Table.read(..., memmap=True).
However, this is mainly useful when reading rows; for columns I guess it will still load the whole file.

embray · 2020-10-23T13:26:37Z

It depends on the format of the table. For tables with only a few columns, selecting one column will typically mean loading the whole table into memory anyways since the strides will be less than one page size. For tables with many large columns, you can get some savings, depending on the stride size.

These are things that IIRC fitsio handles better anyways, because it can read an entire column (not sure about multiple columns simultaneously) without mapping the entire file into memory.

To be clear, though, if you take a view of a table consisting of one or a few columns it's not like it will be paged into memory all at once either--only as parts of the table are accessed.
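
The stride argument above can be made concrete with a little arithmetic. The sketch below counts how many memory pages get touched when reading a single column out of a row-major table (FITS binary tables store rows contiguously). It assumes a 4096-byte page size and a table that starts on a page boundary, which is a simplification; the function names and the example row sizes are hypothetical.

```python
def pages_touched(n_rows, row_bytes, col_offset, col_bytes, page_size=4096):
    """Count distinct pages hit when reading one column from every row."""
    pages = set()
    for i in range(n_rows):
        start = i * row_bytes + col_offset   # first byte of the field in row i
        end = start + col_bytes - 1          # last byte of the field in row i
        pages.update(range(start // page_size, end // page_size + 1))
    return len(pages)


def total_pages(n_rows, row_bytes, page_size=4096):
    """Number of pages spanned by the whole table (ceiling division)."""
    return -(-(n_rows * row_bytes) // page_size)


# Narrow table: 24-byte rows, one 8-byte column. Many rows share each page,
# so reading the column touches every page of the table.
narrow = (pages_touched(100_000, 24, 0, 8), total_pages(100_000, 24))

# Wide table: 16384-byte rows (e.g. large array-valued columns), one 8-byte
# column. Each row spans four pages but the field sits in only one of them.
wide = (pages_touched(1_000, 16_384, 0, 8), total_pages(1_000, 16_384))

print(narrow, wide)
```

With these made-up numbers the narrow table gets no benefit at all, while the wide table touches a quarter of its pages, which matches the point that savings depend on the stride relative to the page size.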

embray · 2020-10-23T13:27:41Z

The lack of support for selecting multiple column names simultaneously seems to just be a minor interface issue. I forgot you could even do that with Numpy structured arrays (is it a new feature for parity with Pandas or has it always been there?)

adrn added the io.fits label Nov 8, 2017

pllim added the Feature Request label Oct 21, 2020
