ARROW-9027: [Python][Testing] Split parquet tests into multiple files + clean-up #8816
Force-pushed from 31f997a to 77b89ae.
Thanks a lot for looking into this!
A recently merged PR (#8704) added a parquet test, so please make sure your rebase picked it up correctly.
We probably shouldn't let this take too long, to avoid other conflicts.
test_basic.py is still a big chunk; I'm wondering if we can split it further. There are some tests specific to the ParquetFile API, which could be split out (but that's also not a big chunk, I think).
    LocalFileSystem._get_instance(),
    fs.LocalFileSystem(),
])
def test_parquet_writer_filesystem_local(tempdir, filesystem):
This and the tests below are a bunch of ParquetWriter-related tests, which are not directly related to multi-file datasets, so they can probably be moved elsewhere (either to test_basic or to a separate file).
I separated them out (into test_writer.py)
Force-pushed from bab48bd to 1b421cf.
Force-pushed from 05f73a5 to b006b26.
@@ -0,0 +1,759 @@
# Licensed to the Apache Software Foundation (ASF) under one
these aren't pandas-dependent tests - rather they're the tests that test interop with pandas data structures
Force-pushed from b006b26 to 26fa390.
Force-pushed from 26fa390 to 532f33f.
This is ready for re-review (modulo |
Force-pushed from c2cff61 to 93f198e.
Sorry, we merged another PR which added a test to |
Few more comments (and thanks again for working on this one, it's not the most rewarding issue ;))
@parametrize_legacy_dataset
@pytest.mark.pandas
def test_filter_before_validate_schema(tempdir, use_legacy_dataset):
this one can be moved to test_dataset.py, I think, since it's a dataset specific feature that is being tested (although it's using the read_table function in the test, that dispatches to ParquetDataset)
Moved it to test_dataset
Force-pushed from 93f198e to 8534c7d.
Yes, on rebasing (it's in |
Force-pushed from d18cef7 to e12fb86.
Force-pushed from e12fb86 to 4af670a.
@arw2019 I updated this a bit further and will merge now. Thanks! With our workflow policy of rebasing / force-pushing, it was basically impossible to review your additional changes ... (not your fault, to be clear! It's just a workflow the GitHub interface is not made for ..)
… + clean-up

Only relocation - none of the tests are touched.

cc @jorisvandenbossche

Closes apache#8816 from arw2019/ARROW-9027-test_parquet

Lead-authored-by: Andrew Wieteska <andrew.r.wieteska@gmail.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>