Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option for fits table reader to read a subset of columns #6503

Closed
wants to merge 3 commits into from

Conversation

eteq
Copy link
Member

@eteq eteq commented Sep 1, 2017

This PR adds a keyword to the io.fits reader for fits tables that allows the user to ask for the resulting Table to have only a subset of the columns in the file

Of course, it's always possible to do something like this:

tab = Table.read('filename.fits')
newtab = tab['col1', 'col5', 'col20']

but testing revealed that it can be quite a bit faster to copy over only the desired columns, at least if it's a much smaller subset that's desired then in the whole tables. And for big tables this performance hit can really matter a lot.

cc @taldcroft

BTW, this idea was suggested by @yymao (so feedback from him on whether this satisfies what he was thinking would be helpful!)

@eteq eteq added this to the v3.0.0 milestone Sep 1, 2017
@astropy-bot
Copy link

astropy-bot bot commented Sep 1, 2017

Hi there @eteq 👋 - thanks for the pull request! I'm just a friendly 🤖 that checks for issues related to the changelog and making sure that this pull request is milestoned and labeled correctly. This is mainly intended for the maintainers, so if you are not a maintainer you can ignore this, and a maintainer will let you know if any action is required on your part 😃.

I noticed the following issue with this pull request:

  • The milestone has not been set (this can only be set by a maintainer)

Would it be possible to fix this? Thanks!

If there are any issues with this message, please report them here.

@saimn
Copy link
Contributor

saimn commented Sep 1, 2017

Is this motivated by #6491 ? It's basically what I was suggesting in #6491 (comment) 😉 . io.ascii has similar parameters but with a different name: include_names and exclude_names. May be better to use the same ?

@pllim
Copy link
Member

pllim commented Sep 1, 2017

This only solves memory management "horizontally". Would be nice to also have an option to read all columns but not all rows (in chunks "vertically") but that is a different PR.

@yymao
Copy link

yymao commented Sep 1, 2017

Thank you @eteq! This is exactly the feature I was looking for. Much appreciated. I agree with @saimn that since io.ascii already uses include_names and exclude_names parameters, it might be better to keep them consistent.

if include_columns is None:
t = Table(table.data, masked=masked)
else:
coldct = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be an OrderedDict so the output table names are in the same order as include_columns. (And I agree with @saimn that include_names would be consistent with io.ascii.

(You could also use a list and then init the Table with names=include_names).

hdu = BinTableHDU(self.data)

t = Table.read(hdu, include_columns=include_columns)
assert len(t.columns) == 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If order is preserved then it seems like assert t.colnames == include_columns would be sufficient?

@@ -91,6 +92,11 @@ def read_table_fits(input, hdu=None, astropy_native=False):
astropy_native : bool, optional
Read in FITS columns as native astropy objects where possible instead
of standard Table Column objects. Default is False.
include_columns : list of str or None, optional
A list of column names to include in the output table. If None, include
*all* columns. Note that this is best used for performance on a table
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a severe performance penalty for excluding just a few columns? Anyway, people may have different reasons for doing this, and for instance you may have a table where one column is an image with dims (2048, 2048) for each element and the user wants to exclude it. (Which is a good reason to also have the complementary exclude_names arg.)

@taldcroft
Copy link
Member

BTW here is the logic for include_names and exclude_names in io.ascii. For consistency it would make sense to use the same logic.

@astrofrog
Copy link
Member

Note that this may not really be needed if #6821 goes ahead - please check that PR first

@saimn
Copy link
Contributor

saimn commented Dec 4, 2017

Is this still needed now that #6821 is merged ? I think having the include_names and exclude_names options would still be useful to avoid loading columns, because just simply having a reference to the columns may trigger the loading later (even with memmap, as the fits code cannot know if the data is used or not it counts the references and may copy the data before closing the memmap).

@astrofrog
Copy link
Member

astrofrog commented Dec 4, 2017

I think this doesn't provide benefits if one uses Table.read with the memmap option (not the default at the moment) but @eteq should probably confirm.

@astrofrog astrofrog removed this from the v3.0.0 milestone Jan 23, 2018
@astropy-bot
Copy link

astropy-bot bot commented Feb 8, 2018

Hi humans 👋 - this pull request hasn't had any new commits for approximately 5 months. I plan to close this in a month if the pull request doesn't have any new commits by then.

In lieu of a stalled pull request, please consider closing this and open an issue instead if a reminder is needed to revisit in the future. Maintainers may also choose to add keep-open label to keep this PR open but it is discouraged unless absolutely necessary.

If this PR still needs to be reviewed, as an author, you can rebase it to reset the clock. You may also consider sending a reminder e-mail about it to the astropy-dev mailing list.

If you believe I commented on this pull request incorrectly, please report this here.

@astrofrog
Copy link
Member

As mentioned above, I think we should close this since we have memory mapping now for Table and FITS

@astropy-bot
Copy link

astropy-bot bot commented Mar 8, 2018

⏰ Time's up! ⏰

I'm going to close this pull request as per my previous message. If you think what is being added/fixed here is still important, please remember to open an issue to keep track of it. Thanks!

If this is the first time I am commenting on this issue, or if you believe I closed this issue incorrectly, please report this here.

@astropy-bot astropy-bot bot closed this Mar 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants