Support selecting multiple column names in FITS_rec #6820

Open
adrn opened this issue Nov 8, 2017 · 6 comments

@adrn
Member

adrn commented Nov 8, 2017

FITS_rec generally behaves like a Numpy structured array, but doesn't support selecting multiple column names (tested with Numpy 1.13 and both Astropy 2.0.2 and master). See the example here:
https://gist.github.com/4f3d0684183da6f413539854f213b845
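For reference, here is a minimal sketch of the plain NumPy behavior the report compares against. The array and its field names are made up for illustration and are not taken from the gist.

```python
import numpy as np

# Hypothetical three-column array standing in for FITS table data.
arr = np.zeros(4, dtype=[("ra", "f8"), ("dec", "f8"), ("flux", "f4")])
arr["ra"] = [10.0, 20.0, 30.0, 40.0]
arr["dec"] = [-1.0, 0.0, 1.0, 2.0]

# Indexing with a list of field names selects several columns at once.
sub = arr[["ra", "dec"]]
print(sub.dtype.names)
```

The result keeps only the requested fields; this is the indexing form that FITS_rec is reported not to support.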

The workaround I'm using right now is:

import numpy as np
from astropy.io import fits
data = fits.getdata('/path/to/file.fits')
data = data.view(data.dtype, np.ndarray)  # view as a plain structured ndarray
@adrn adrn added the io.fits label Nov 8, 2017
@pllim
Member

pllim commented Oct 21, 2020

@adrn, has this been solved with Unified I/O using Table?

@adrn
Member Author

adrn commented Oct 21, 2020

This specific issue is still present with FITS_rec.

This might still be a relevant feature request: I believe Table loads all data into memory (?), whereas FITS_rec could in principle work with memmap=True, and then selecting just a few column names would be more memory efficient. But I could be wrong about memmap...

@pllim
Member

pllim commented Oct 21, 2020

I thought Table added memmap at some point but I don't remember the details. @saimn mentioned that this is not trivial to implement in io.fits.

@saimn
Contributor

saimn commented Oct 21, 2020

You can use memmap with Table now, e.g. Table.read(..., memmap=True).
However, this is mainly useful when reading rows; for columns, I guess it will still load the whole file.

@embray
Member

embray commented Oct 23, 2020

It depends on the format of the table. For tables with only a few columns, selecting one column will typically mean loading the whole table into memory anyways since the strides will be less than one page size. For tables with many large columns, you can get some savings, depending on the stride size.

These are things that IIRC fitsio handles better anyways, because it can read an entire column (not sure about multiple columns simultaneously) without mapping the entire file into memory.

To be clear, though, if you take a view of a table consisting of one or a few columns it's not like it will be paged into memory all at once either--only as parts of the table are accessed.
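The page-size argument above can be made concrete with a rough count of which pages hold at least one byte of a column. This is only a back-of-the-envelope sketch: the helper `pages_touched`, the 4096-byte page size, and both row layouts are assumptions for illustration, and real paging also depends on OS read-ahead.

```python
import numpy as np

PAGE = 4096  # assumed page size in bytes; the real value is platform dependent


def pages_touched(n_rows, row_bytes, col_offset, col_bytes, page=PAGE):
    """Fraction of the table's pages holding at least one byte of one column.

    Assumes fixed-width rows stored back to back starting at byte 0, and
    col_bytes <= page so that each value spans at most two pages.
    """
    starts = np.arange(n_rows, dtype=np.int64) * row_bytes + col_offset
    first = starts // page
    last = (starts + col_bytes - 1) // page
    touched = np.union1d(first, last).size
    total = -(-(n_rows * row_bytes) // page)  # ceiling division
    return touched / total


# Narrow rows (24 bytes): the stride is far below one page, so every page is hit.
narrow = pages_touched(n_rows=100_000, row_bytes=24, col_offset=0, col_bytes=8)

# Wide rows (16 KiB, e.g. large array-valued columns): most pages are skipped.
wide = pages_touched(n_rows=1_000, row_bytes=16_384, col_offset=0, col_bytes=8)

print(narrow, wide)
```

With these assumed layouts the narrow table touches every page (fraction 1.0), while the wide table touches a quarter of them (0.25), which matches the point that savings depend on the stride.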

@embray
Member

embray commented Oct 23, 2020

The lack of support for selecting multiple column names simultaneously seems to just be a minor interface issue. I forgot you could even do that with Numpy structured arrays (is it a new feature for parity with Pandas or has it always been there?)
