Add option for fits table reader to read a subset of columns #6503

eteq · 2017-09-01T16:29:08Z

This PR adds a keyword to the io.fits reader for FITS tables that lets the user ask for the resulting Table to contain only a subset of the columns in the file.

Of course, it's always possible to do something like this:

from astropy.table import Table

tab = Table.read('filename.fits')
newtab = tab['col1', 'col5', 'col20']

but testing revealed that copying over only the desired columns can be quite a bit faster, at least when the desired subset is much smaller than the whole table. For big tables this performance difference can matter a lot.

cc @taldcroft

BTW, this idea was suggested by @yymao (so feedback from him on whether this does what he had in mind would be helpful!)

astropy-bot · 2017-09-01T16:29:09Z

Hi there @eteq 👋 - thanks for the pull request! I'm just a friendly 🤖 that checks for issues related to the changelog and making sure that this pull request is milestoned and labeled correctly. This is mainly intended for the maintainers, so if you are not a maintainer you can ignore this, and a maintainer will let you know if any action is required on your part 😃.

I noticed the following issue with this pull request:

The milestone has not been set (this can only be set by a maintainer)

Would it be possible to fix this? Thanks!

If there are any issues with this message, please report them here.

saimn · 2017-09-01T20:11:56Z

Is this motivated by #6491? It's basically what I was suggesting in #6491 (comment) 😉. io.ascii has similar parameters but with different names: include_names and exclude_names. Maybe it would be better to use the same names?

pllim · 2017-09-01T20:14:13Z

This only solves memory management "horizontally". Would be nice to also have an option to read all columns but not all rows (in chunks "vertically") but that is a different PR.
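The "vertical" chunking idea can be illustrated without astropy. Below is a minimal sketch of a hypothetical helper (`iter_row_chunks` is not an astropy function) that assumes each row is a fixed-width, big-endian binary record, as in a FITS binary table, that the file object is already positioned at the first row, and that the row count (`n_rows`, which a real reader would take from NAXIS2) is known. It ignores the heap used for variable-length array columns.

```python
import io
import struct
from typing import BinaryIO, Iterator, List, Tuple


def iter_row_chunks(fileobj: BinaryIO, row_format: str, n_rows: int,
                    chunk_rows: int) -> Iterator[List[Tuple]]:
    """Yield lists of at most ``chunk_rows`` decoded rows (hypothetical helper)."""
    if chunk_rows < 1:
        raise ValueError("chunk_rows must be >= 1")
    record = struct.Struct(row_format)
    remaining = n_rows
    while remaining > 0:
        count = min(chunk_rows, remaining)
        # Read only this chunk's bytes, never the whole table at once.
        buf = fileobj.read(record.size * count)
        if len(buf) != record.size * count:
            raise ValueError("unexpected end of table data")
        yield list(record.iter_unpack(buf))
        remaining -= count


# Five rows of (int32, float64), big-endian as in FITS binary tables.
fmt = ">id"
raw = b"".join(struct.pack(fmt, i, i * 0.5) for i in range(5))
chunks = list(iter_row_chunks(io.BytesIO(raw), fmt, n_rows=5, chunk_rows=2))
```

Stopping after `n_rows` rather than at end of file matters because FITS data units are padded, so trailing bytes after the last row are not rows.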

yymao · 2017-09-01T20:35:16Z

Thank you @eteq! This is exactly the feature I was looking for. Much appreciated. I agree with @saimn that since io.ascii already uses include_names and exclude_names parameters, it might be better to keep them consistent.

taldcroft · 2017-09-02T00:17:15Z

astropy/io/fits/connect.py

+    if include_columns is None:
+        t = Table(table.data, masked=masked)
+    else:
+        coldct = {}


Should be an OrderedDict so the output table names are in the same order as include_columns. (And I agree with @saimn that include_names would be consistent with io.ascii.)

(You could also use a list and then init the Table with names=include_names).
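A minimal sketch of the ordered selection, assuming a plain mapping of column name to column values as a stand-in for the FITS record array. `select_columns` is a hypothetical helper, not part of astropy.

```python
from collections import OrderedDict
from typing import Mapping, Sequence


def select_columns(columns: Mapping[str, Sequence],
                   include_names: Sequence[str]) -> "OrderedDict[str, Sequence]":
    """Return only ``include_names``, in the order the caller listed them."""
    missing = [name for name in include_names if name not in columns]
    if missing:
        raise KeyError(f"columns not found in table: {missing}")
    # Iterate over include_names, not over the file's columns, so the
    # output order follows the caller's request.
    return OrderedDict((name, columns[name]) for name in include_names)


file_columns = {"col1": [1, 2], "col5": [3, 4], "col20": [5, 6]}
selected = select_columns(file_columns, ["col20", "col1"])
# list(selected) -> ['col20', 'col1']
```

In the reader, the result could then be handed to the Table constructor; the list-based alternative would instead pass `list(selected.values())` together with `names=list(selected)`. Either way the output column order matches the request.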

taldcroft · 2017-09-02T00:18:38Z

astropy/io/fits/tests/test_connect.py

+        hdu = BinTableHDU(self.data)
+
+        t = Table.read(hdu, include_columns=include_columns)
+        assert len(t.columns) == 2


If order is preserved then it seems like assert t.colnames == include_columns would be sufficient?

taldcroft · 2017-09-02T00:27:29Z

astropy/io/fits/connect.py

@@ -91,6 +92,11 @@ def read_table_fits(input, hdu=None, astropy_native=False):
    astropy_native : bool, optional
        Read in FITS columns as native astropy objects where possible instead
        of standard Table Column objects. Default is False.
+    include_columns : list of str or None, optional
+        A list of column names to include in the output table. If None, include
+        *all* columns.  Note that this is best used for performance on a table


Is there a severe performance penalty for excluding just a few columns? In any case, people may have different reasons for doing this: for instance, you may have a table where one column is an image with dims (2048, 2048) for each element, and the user wants to exclude it. (Which is a good reason to also have the complementary exclude_names arg.)

taldcroft · 2017-09-02T00:33:24Z

BTW, here is the logic for include_names and exclude_names in io.ascii. For consistency it would make sense to use the same logic.
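For comparison, a minimal sketch of include/exclude semantics in the spirit of io.ascii, where `exclude_names` is applied after `include_names` and `None` means "no filtering". The function name is hypothetical. This sketch keeps the file's original column order; returning columns in `include_names` order instead, as suggested for the FITS reader, is a separate design choice.

```python
from typing import Iterable, List, Optional


def apply_include_exclude(colnames: Iterable[str],
                          include_names: Optional[Iterable[str]] = None,
                          exclude_names: Optional[Iterable[str]] = None) -> List[str]:
    """Return the column names to read, keeping the file's column order."""
    names = list(colnames)
    if include_names is not None:
        # None means "include everything"; otherwise keep only listed names.
        keep = set(include_names)
        names = [name for name in names if name in keep]
    if exclude_names is not None:
        # Exclusion is applied after inclusion.
        drop = set(exclude_names)
        names = [name for name in names if name not in drop]
    return names


# Skip a large per-row image column and keep everything else.
to_read = apply_include_exclude(["ra", "dec", "image", "flux"], exclude_names=["image"])
# to_read -> ['ra', 'dec', 'flux']
```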

astrofrog · 2017-11-08T19:34:16Z

Note that this may not really be needed if #6821 goes ahead; please check that PR first.

saimn · 2017-12-04T12:26:55Z

Is this still needed now that #6821 is merged? I think having the include_names and exclude_names options would still be useful to avoid loading columns, because simply holding a reference to the columns may trigger loading later. This happens even with memmap: since the fits code cannot know whether the data is used, it counts the references and may copy the data before closing the memmap.
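The reference problem has a close analogue in Python's standard library. This is not astropy code: it only uses the `mmap` module to show that a mapping cannot simply be closed while another object still holds a view into it, which is why a reader that cannot tell whether a column will be used may need to copy it before closing the memmap. The function name is hypothetical.

```python
import mmap


def close_with_lingering_view():
    """Show that a live reference into a memory map blocks closing it."""
    # An anonymous 16-byte mapping stands in for a memory-mapped FITS file.
    mm = mmap.mmap(-1, 16)
    mm[:4] = b"FITS"
    # A view that outlives the "read" call, like a user-held column reference.
    column_view = memoryview(mm)
    try:
        mm.close()
        closed_while_referenced = True
    except BufferError:
        # The mmap module refuses to unmap memory a buffer still points into.
        closed_while_referenced = False
    # Once the reference is released, the mapping can be closed normally.
    column_view.release()
    mm.close()
    return closed_while_referenced, mm.closed


result = close_with_lingering_view()
# result -> (False, True)
```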

astrofrog · 2017-12-04T12:32:25Z

I think this doesn't provide benefits if one uses Table.read with the memmap option (not the default at the moment), but @eteq should probably confirm.

astropy-bot · 2018-02-08T12:42:37Z

Hi humans 👋 - this pull request hasn't had any new commits for approximately 5 months. I plan to close this in a month if the pull request doesn't have any new commits by then.

In lieu of a stalled pull request, please consider closing this and opening an issue instead if a reminder is needed to revisit in the future. Maintainers may also choose to add keep-open label to keep this PR open but it is discouraged unless absolutely necessary.

If this PR still needs to be reviewed, as an author, you can rebase it to reset the clock. You may also consider sending a reminder e-mail about it to the astropy-dev mailing list.

If you believe I commented on this pull request incorrectly, please report this here.

astrofrog · 2018-02-08T14:23:45Z

As mentioned above, I think we should close this since we now have memory mapping for Table and FITS.

astropy-bot · 2018-03-08T12:44:31Z

⏰ Time's up! ⏰

I'm going to close this pull request as per my previous message. If you think what is being added/fixed here is still important, please remember to open an issue to keep track of it. Thanks!

If this is the first time I am commenting on this issue, or if you believe I closed this issue incorrectly, please report this here.

eteq added 3 commits August 27, 2017 22:33

add include_columns option to read_table_fits

a208eb1

Add test for include_columns

5c03e8a

Added changelog entry

d0d71d3

eteq added io.fits table labels Sep 1, 2017

eteq added this to the v3.0.0 milestone Sep 1, 2017

taldcroft reviewed Sep 2, 2017

astrofrog removed this from the v3.0.0 milestone Jan 23, 2018

astropy-bot bot closed this Mar 8, 2018

pllim added the closed-by-bot label May 25, 2018
