Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-9027: [Python][Testing] Split parquet tests into multiple files + clean-up #8816

Closed
wants to merge 14 commits into from

Conversation

arw2019
Copy link
Contributor

@arw2019 arw2019 commented Dec 2, 2020

Only relocation - none of the tests are touched.

cc @jorisvandenbossche

@github-actions
Copy link

github-actions bot commented Dec 2, 2020

@arw2019 arw2019 force-pushed the ARROW-9027-test_parquet branch 4 times, most recently from 31f997a to 77b89ae Compare December 2, 2020 04:22
@github-actions github-actions bot added the needs-rebase A PR that needs to be rebased by the author label Dec 2, 2020
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for looking into this!

A recent merged PR (#8704) added a parquet test, so maybe ensure that your rebase correctly picked it up.
We probably shouldn't let this take too long, to avoid other conflicts.

The test_basic.py is still a big chunk, wondering if we can further split that. There are some tests specific to ParquetFile API, which could be split (but it's also not a big chunk I think)

python/pyarrow/tests/parquet/common.py Outdated Show resolved Hide resolved
python/pyarrow/tests/parquet/common.py Outdated Show resolved Hide resolved
LocalFileSystem._get_instance(),
fs.LocalFileSystem(),
])
def test_parquet_writer_filesystem_local(tempdir, filesystem):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and the tests below are a bunch of ParquetWriter related tests, which are not directly related to multi-file datasets, so can probably be moved elsewhere (either to test_basic, or to separate file)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I separated them out (into test_writer.py)

python/pyarrow/tests/parquet/test_multifile_ds.py Outdated Show resolved Hide resolved
@github-actions github-actions bot removed the needs-rebase A PR that needs to be rebased by the author label Dec 4, 2020
@arw2019 arw2019 marked this pull request as draft December 4, 2020 18:14
@arw2019 arw2019 marked this pull request as ready for review December 5, 2020 06:57
@@ -0,0 +1,759 @@
# Licensed to the Apache Software Foundation (ASF) under one
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these aren't pandas-dependent tests - rather they're the tests that test interop with pandas data structures

@github-actions github-actions bot added the needs-rebase A PR that needs to be rebased by the author label Dec 7, 2020
@github-actions github-actions bot removed the needs-rebase A PR that needs to be rebased by the author label Dec 7, 2020
@arw2019
Copy link
Contributor Author

arw2019 commented Dec 8, 2020

This is ready for re-review (modulopyarrow/tests/test_orc.py failing on Python / AMD64 MacOS 10.15 Python 3 for a reason I haven't figured out yet)

@jorisvandenbossche
Copy link
Member

Sorry, we merged another PR which added a test to to_parquet.py: #8861. Can you check this is included correctly?

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few more comments (and thanks again for working on this one, it's not the most rewarding issue ;))

python/pyarrow/tests/parquet/common.py Outdated Show resolved Hide resolved
python/pyarrow/tests/parquet/common.py Show resolved Hide resolved
python/pyarrow/tests/parquet/test_metadata.py Outdated Show resolved Hide resolved
python/pyarrow/tests/parquet/test_basic.py Outdated Show resolved Hide resolved

@parametrize_legacy_dataset
@pytest.mark.pandas
def test_filter_before_validate_schema(tempdir, use_legacy_dataset):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one can be moved to test_dataset.py, I think, since it's a dataset specific feature that is being tested (although it's using the read_table function in the test, that dispatches to ParquetDataset)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved it to test_dataset

@arw2019
Copy link
Contributor Author

arw2019 commented Dec 11, 2020

Sorry, we merged another PR which added a test to to_parquet.py: #8861. Can you check this is included correctly?

Yes, on rebasing (it's in test_metadata.py)

@jorisvandenbossche
Copy link
Member

@arw2019 updated this a bit further, and will merge now. Thanks!

With our workflow policy of rebasing / force pushing, it was basically impossible to review your additional changes ... (not your fault to be clear! Just a workflow for which the github interface is not made ..)
So I went through the files locally, and made a few additional changes.

GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
… + clean-up

Only relocation - none of the tests are touched.

cc @jorisvandenbossche

Closes apache#8816 from arw2019/ARROW-9027-test_parquet

Lead-authored-by: Andrew Wieteska <andrew.r.wieteska@gmail.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants