Add option for fits table reader to read a subset of columns #6503
Hi there @eteq 👋 - thanks for the pull request! I'm just a friendly 🤖 that checks for issues related to the changelog and making sure that this pull request is milestoned and labeled correctly. This is mainly intended for the maintainers, so if you are not a maintainer you can ignore this, and a maintainer will let you know if any action is required on your part 😃. I noticed the following issue with this pull request:
Would it be possible to fix this? Thanks! If there are any issues with this message, please report them here.
Is this motivated by #6491? It's basically what I was suggesting in #6491 (comment) 😉.
This only solves memory management "horizontally" (by columns). It would be nice to also have an option to read all columns but not all rows (in chunks, "vertically"), but that is a different PR.
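One way to picture the "vertical" option is a generator that yields fixed-size row slices, so downstream code handles one chunk at a time. This is a minimal sketch using a NumPy structured array as a stand-in for FITS table data; iter_row_chunks is a hypothetical helper, not an astropy API, and any memory savings assume the underlying array is memory-mapped rather than already fully loaded.

```python
import numpy as np


def iter_row_chunks(data, chunk_size):
    """Yield consecutive row slices of ``data`` (hypothetical helper).

    Slicing a NumPy array returns a view, so no row data is copied
    until the caller explicitly copies a chunk.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


# Stand-in for a FITS binary table: 10 rows, two columns.
data = np.zeros(10, dtype=[('flux', 'f8'), ('id', 'i4')])
data['id'] = np.arange(10)

chunk_lengths = [len(chunk) for chunk in iter_row_chunks(data, 4)]
print(chunk_lengths)  # [4, 4, 2]
```

The last chunk is simply shorter; callers that need equal-sized chunks would have to pad or handle the remainder themselves.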
if include_columns is None:
    t = Table(table.data, masked=masked)
else:
    coldct = {}
Should be an OrderedDict so the output table names are in the same order as include_columns. (And I agree with @saimn that include_names would be consistent with io.ascii.) (You could also use a list and then init the Table with names=include_names.)
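A minimal sketch of the suggested fix, assuming the table data behaves like a NumPy structured array (FITS_rec is a subclass of numpy.recarray): build the mapping by iterating over the requested names, so the output order follows the request rather than the file's column order. The helper select_columns is hypothetical and only illustrates the idea.

```python
from collections import OrderedDict

import numpy as np


def select_columns(data, include_names):
    """Return an OrderedDict of the requested columns, in request order.

    ``data`` is assumed to be a NumPy structured array; this helper is a
    hypothetical illustration, not astropy's implementation.
    """
    missing = [name for name in include_names if name not in data.dtype.names]
    if missing:
        raise KeyError("columns not found: {}".format(missing))
    cols = OrderedDict()
    for name in include_names:  # iterate in the caller's order
        cols[name] = data[name]
    return cols


# Stand-in table whose on-disk column order is a, b, c, d.
data = np.zeros(3, dtype=[('a', 'i4'), ('b', 'f8'), ('c', 'i4'), ('d', 'f8')])
cols = select_columns(data, ['c', 'a'])
print(list(cols))  # ['c', 'a']
```

With astropy, either Table(cols) or the list-based alternative, Table(list(cols.values()), names=list(cols)), should then produce colnames in the requested order.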
hdu = BinTableHDU(self.data)

t = Table.read(hdu, include_columns=include_columns)
assert len(t.columns) == 2
If order is preserved, then it seems like assert t.colnames == include_columns would be sufficient?
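Comparing names directly checks both the count and the order in one assertion. The sketch below shows that style of check, using NumPy multi-field indexing as a stand-in for the reader (NumPy returns the fields in the order they were indexed); in the actual test, t.colnames would take the place of subset.dtype.names.

```python
import numpy as np

data = np.zeros(2, dtype=[('x', 'f8'), ('y', 'f8'), ('z', 'i4')])
include_columns = ['z', 'x']

# Multi-field indexing: the resulting fields follow the requested order.
subset = data[include_columns]

# Stronger than checking the column count alone: this also verifies order.
assert list(subset.dtype.names) == include_columns
```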
@@ -91,6 +92,11 @@ def read_table_fits(input, hdu=None, astropy_native=False):
    astropy_native : bool, optional
        Read in FITS columns as native astropy objects where possible instead
        of standard Table Column objects. Default is False.
    include_columns : list of str or None, optional
        A list of column names to include in the output table. If None, include
        *all* columns. Note that this is best used for performance on a table
Is there a severe performance penalty for excluding just a few columns? Anyway, people may have different reasons for doing this; for instance, you may have a table where one column is an image with dimensions (2048, 2048) for each element, and the user wants to exclude it. (Which is a good reason to also have the complementary exclude_names arg.)
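A complementary exclude_names argument could share the same selection path. This is a minimal sketch assuming the convention io.ascii documents for include_names/exclude_names (exclusion applied after inclusion); resolve_names is a hypothetical helper, and raising KeyError on unknown names is a choice made here, not necessarily io.ascii's behavior.

```python
def resolve_names(all_names, include_names=None, exclude_names=None):
    """Return the column names to read (hypothetical helper).

    include_names=None means "all columns, in file order"; exclude_names
    is applied afterwards.
    """
    if include_names is None:
        names = list(all_names)
    else:
        unknown = set(include_names) - set(all_names)
        if unknown:
            raise KeyError("unknown columns: {}".format(sorted(unknown)))
        names = list(include_names)
    if exclude_names is not None:
        excluded = set(exclude_names)
        names = [name for name in names if name not in excluded]
    return names


file_order = ['ra', 'dec', 'image', 'flux']
# Drop a single large image-valued column, keep the rest in file order.
print(resolve_names(file_order, exclude_names=['image']))
# ['ra', 'dec', 'flux']
print(resolve_names(file_order, include_names=['flux', 'ra', 'image'],
                    exclude_names=['image']))
# ['flux', 'ra']
```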
BTW here is the logic for
Note that this may not really be needed if #6821 goes ahead; please check that PR first.
Is this still needed now that #6821 is merged? I think having the
I think this doesn't provide benefits if one uses Table.read with the
Hi humans 👋 - this pull request hasn't had any new commits for approximately 5 months. I plan to close this in a month if the pull request doesn't have any new commits by then. In lieu of a stalled pull request, please consider closing this and open an issue instead if a reminder is needed to revisit in the future. Maintainers may also choose to add If this PR still needs to be reviewed, as an author, you can rebase it to reset the clock. You may also consider sending a reminder e-mail about it to the astropy-dev mailing list. If you believe I commented on this pull request incorrectly, please report this here.
As mentioned above, I think we should close this since we have memory mapping now for Table and FITS.
⏰ Time's up! ⏰ I'm going to close this pull request as per my previous message. If you think what is being added/fixed here is still important, please remember to open an issue to keep track of it. Thanks! If this is the first time I am commenting on this issue, or if you believe I closed this issue incorrectly, please report this here.
This PR adds a keyword to the io.fits reader for FITS tables that allows the user to ask for the resulting Table to have only a subset of the columns in the file.
Of course, it's always possible to do something like this:
but testing revealed that it can be quite a bit faster to copy over only the desired columns, at least when the desired subset is much smaller than the whole table. And for big tables this performance hit can really matter.
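The difference between the two approaches is how much data gets copied. This NumPy sketch illustrates the idea; it is not the benchmark referred to above. Copying the whole record array and then selecting columns copies every byte, including a large per-row image column, while selecting first and copying only the requested fields copies just those. Both yield the same values.

```python
import numpy as np

# Stand-in for a wide table: two small columns plus a large per-row "image".
dtype = [('id', 'i8'), ('flux', 'f8'), ('image', 'f4', (64, 64))]
data = np.zeros(100, dtype=dtype)
data['id'] = np.arange(100)

wanted = ['id', 'flux']

# Approach 1: copy everything, then keep only the wanted columns.
full_copy = data.copy()
subset_after = {name: full_copy[name] for name in wanted}

# Approach 2: copy only the wanted columns.
subset_first = {name: data[name].copy() for name in wanted}

bytes_full = full_copy.nbytes
bytes_subset = sum(col.nbytes for col in subset_first.values())
print(bytes_full, bytes_subset)  # 1640000 1600
```

Here each row is 16400 bytes, of which the two wanted columns account for only 16, so the full copy moves roughly a thousand times more data than the selective one.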
cc @taldcroft
BTW, this idea was suggested by @yymao (so feedback from him on whether this satisfies what he was thinking would be helpful!)