Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation read/write of VOTable with Parquet serialization #15281

Merged
merged 29 commits into from Oct 11, 2023

Conversation

afaisst
Copy link
Contributor

@afaisst afaisst commented Sep 5, 2023

Description

This pull request is to address reading and writing of VOTables with Parquet serialization. Currently this functionality only exists for FITS. In this PR, we modify the read functionality of "astropy.io.votable.parse" to read VOTables with embedded link to an existing Parquet file. In addition, we add functionality to write VOTables serialized with Parquet files using the ".write()" functionality. In addition, a test function "test_read_write_votable_parquet()" in "table_test.py" is added.

  • By checking this box, the PR author has requested that maintainers do NOT use the "Squash and Merge" button. Maintainers should respect this when possible; however, the final decision is at the discretion of the maintainer that merges the PR.

cc @stargaser @bsipocz

@github-actions
Copy link

github-actions bot commented Sep 5, 2023

Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.

  • Do the proposed changes actually accomplish desired goals?
  • Do the proposed changes follow the Astropy coding guidelines?
  • Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
  • Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
  • Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see instructions for rebase and squash.
  • Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
  • Is a change log needed? If yes, did the change log check pass? If no, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
  • Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
  • Is a milestone set? Milestone must be set but we cannot check for it on Actions; do not let the green checkmark fool you.
  • At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Welcome to Astropy 👋 and congratulations on your first pull request! 🎉

A project member will respond to you as soon as possible; in the meantime, please have a look over the Checklist for Contributed Code and make sure you've addressed as many of the questions there as possible.

If you feel that this pull request has not been responded to in a timely manner, please send a message directly to the development mailing list. If the issue is urgent or sensitive in nature (e.g., a security vulnerability) please send an e-mail directly to the private e-mail feedback@astropy.org.

@github-actions
Copy link

github-actions bot commented Sep 5, 2023

👋 Thank you for your draft pull request! Do you know that you can use [ci skip] or [skip ci] in your commit messages to skip running continuous integration tests until you are ready?

@bsipocz
Copy link
Member

bsipocz commented Sep 6, 2023

pre-commit.ci autofix

Copy link
Contributor

@stargaser stargaser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly minor comments in-line.

astropy/io/votable/exceptions.py Outdated Show resolved Hide resolved
astropy/io/votable/exceptions.py Outdated Show resolved Hide resolved
)

# Open created VOTable with Parquet serialization
votable = parse(filename)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The table write is done with the Table I/O registry but the read is done with the lower-level parse instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the parse parses the VOTable correctly and creates a VOTable object. Reading the tables only also works with Table.read(filename). It adds the units correctly, but does not seem to have the other quantities as a VOTable object.
What would you suggest?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, it is necessary to read the table back in as a VOTable to have access to the metadata. That justifies leaving the use of parse as-is.

astropy/io/votable/tests/test_table.py Outdated Show resolved Hide resolved
astropy/io/votable/tests/test_table.py Outdated Show resolved Hide resolved
Copy link
Member

@bsipocz bsipocz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @afaisst, this is coming along super nicely!

Some minor things are in inline comments. The big missing parts are:

  • narrative docs, a changelog entry and a what's new.
  • fixing the tests, I'll push some commits that do at least some of that
  • adding an example file to the test data and use that in tests

astropy/io/votable/connect.py Outdated Show resolved Hide resolved
astropy/io/votable/connect.py Outdated Show resolved Hide resolved
astropy/io/votable/tests/test_table.py Outdated Show resolved Hide resolved
astropy/io/votable/connect.py Outdated Show resolved Hide resolved
@bsipocz
Copy link
Member

bsipocz commented Sep 20, 2023

pre-commit.ci autofix

1 similar comment
@bsipocz
Copy link
Member

bsipocz commented Sep 20, 2023

pre-commit.ci autofix

@bsipocz
Copy link
Member

bsipocz commented Sep 20, 2023

@pllim - we're at the point where pre-commit is more annoying than useful. It either should autofix or just be quiet

@pllim
Copy link
Member

pllim commented Sep 20, 2023

You might have to rebase to pick up the RTD fix. FYI.

@bsipocz
Copy link
Member

bsipocz commented Sep 20, 2023

Hmm, CI was all good before the rebase 🤦‍♀️

@pllim
Copy link
Member

pllim commented Sep 21, 2023

Failure on Windows looks like a path issue

astropy\io\votable\tests\test_table.py::test_read_write_votable_parquet -
urllib.error.URLError: <urlopen error [WinError 3] The system cannot find the path specified: ''>

@eerovaher
Copy link
Member

This is a reply to #15281 (comment) by @bsipocz

@pllim - we're at the point where pre-commit is more annoying than useful. It either should autofix or just be quiet

pre-commit is capable of automatically applying fixes, but it needs to be installed locally. This is described in the astropy documentation.

@bsipocz
Copy link
Member

bsipocz commented Sep 21, 2023

@eerovaher - thanks for pointing out, but I'm well aware.
Not that I need to explain, but my comment is part of a longer discussion with Pey Lian.

Copy link
Member

@bsipocz bsipocz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two additional suggestions came to mind as I'm working in this file

Comment on lines +3119 to +3125
else: # in this case, there is no STREAM, hence no file linked.
href = ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will cause an exception in the next line and therefore should be handled somehow.

in VOTables.
"""
# looks like votable already has a "Table" imported.
from astropy.table import Table as Table2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid this awkward solution, I would suggest directly using the reader rather than via the convenience in Table.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, this wasn't a good idea as unlike ascii/votable/fits, as I see the parquet reader (and the others in misc) are not used directly even in their own tests. So I would leave this as is for now, and address is separately in the PR for #15358

@bsipocz
Copy link
Member

bsipocz commented Sep 21, 2023

@pllim - Indeed, the issue comes down to the relative vs absolute patch in a href=file://. I'm fairly sure we should see this same issue with the method (_parse_fits) as the affected part of the code is shared, but it looks like it doesn't have any code coverage.

@pllim
Copy link
Member

pllim commented Sep 21, 2023

Re: #15281 (comment)

Should this be a separate bug fix PR then?

pre-commit-ci bot and others added 22 commits October 11, 2023 12:19
…table rather to to code) and adding test data files
Updated index.rst (section "Data Serialization Formats") to include a bullet point about the new PARQUET serialization.
Added a section "VOTable now supports PARQUET serialization" with examples to present this new feature.
Co-authored-by: P. L. Lim <2090236+pllim@users.noreply.github.com>
Copy link
Contributor

@tomdonaldson tomdonaldson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this work, and the discussions and updates. It looks good to me now.

@tomdonaldson tomdonaldson merged commit ac9e9f7 into astropy:main Oct 11, 2023
25 of 26 checks passed
pllim added a commit to pllim/astropy that referenced this pull request Oct 13, 2023
in io.votable so ResourceWarning does not pop up on Windows.
This was added in astropy#15281
pllim added a commit to pllim/astropy that referenced this pull request Oct 13, 2023
in io.votable so ResourceWarning does not pop up on Windows.
This was added in astropy#15281
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants