ARROW-3160: [Python] Improve pathlib.Path support in parquet and filesystem modules #2506

Closed
wants to merge 7 commits into from

kszucs
Member

@kszucs kszucs commented Sep 2, 2018

  • organize parquet test files to be consistent with the recently added orc subdir
  • use pathlib paths in parquet and orc tests

I intend to consolidate the parquet test suite a bit in follow-up PRs.
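
For readers unfamiliar with what "accepting pathlib.Path" involves, here is a minimal sketch of the general idea. It is not code from this PR's diff: `to_path_string` is a hypothetical helper name, and the `tests/data/example.parquet` path is made up for illustration. The assumption is that string-based internals can stay unchanged if path-like arguments are normalized at the API boundary.

```python
import pathlib


def to_path_string(path):
    """Return `path` as a plain string, accepting str or pathlib paths.

    Hypothetical helper for illustration only; not taken from the PR.
    """
    if isinstance(path, str):
        # Strings pass through untouched.
        return path
    if isinstance(path, pathlib.PurePath):
        # Covers Path, PurePosixPath, PureWindowsPath and subclasses.
        return str(path)
    raise TypeError("not a path-like object: {!r}".format(path))


# Building a path with pathlib and handing it to string-based code.
# The directory and file names below are invented for the example.
datadir = pathlib.Path("tests") / "data"
example = to_path_string(datadir / "example.parquet")
```

Normalizing once at the boundary is only one possible design; the alternative raised later in this thread is to use pathlib objects inside the implementation as well.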

@kszucs kszucs changed the title from "[Python] ParquetManifest and ParquetDatasetPiece accept pathlib.Path" to "ARROW-3160: [Python] ParquetManifest and ParquetDatasetPiece accept pathlib.Path" Sep 2, 2018
@xhochy
Member

xhochy commented Sep 4, 2018

This PR still has test failures.

@kszucs
Member Author

kszucs commented Sep 4, 2018

Sorry, forgot to prepend with WIP. I'll request your review when it's done.

@kszucs kszucs changed the title from "ARROW-3160: [Python] ParquetManifest and ParquetDatasetPiece accept pathlib.Path" to "ARROW-3160: [Python] Improve pathlib.Path support in parquet and filesystem modules" Sep 4, 2018
@kszucs
Member Author

kszucs commented Sep 4, 2018

@xhochy What do you think about using pathlib in the implementation too?

@codecov-io

Codecov Report

Merging #2506 into master will increase coverage by 1.25%.
The diff coverage is 88.65%.

@@            Coverage Diff             @@
##           master    #2506      +/-   ##
==========================================
+ Coverage   86.24%   87.49%   +1.25%     
==========================================
  Files         308      246      -62     
  Lines       47094    43441    -3653     
==========================================
- Hits        40615    38010    -2605     
+ Misses       6405     5431     -974     
+ Partials       74        0      -74
| Impacted Files | Coverage Δ |
|---|---|
| python/pyarrow/io.pxi | 60.52% <0%> (-0.41%) ⬇️ |
| python/pyarrow/tests/test_orc.py | 100% <100%> (ø) ⬆️ |
| python/pyarrow/parquet.py | 93.56% <100%> (+2.22%) ⬆️ |
| python/pyarrow/tests/test_parquet.py | 97.41% <100%> (-0.02%) ⬇️ |
| python/pyarrow/filesystem.py | 50.27% <50%> (-1.2%) ⬇️ |
| python/pyarrow/util.py | 75% <85%> (+16.66%) ⬆️ |
| python/pyarrow/tests/conftest.py | 77.04% <91.66%> (+3.46%) ⬆️ |
| rust/src/record_batch.rs | |
| go/arrow/datatype_nested.go | |
| rust/src/array.rs | |

... and 59 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ad7e6c3...bf1fe1f.

@kszucs
Member Author

kszucs commented Sep 5, 2018

@xhochy green

Member

@wesm wesm left a comment

+1


@pytest.fixture(scope='session')
def datadir():
    return pathlib.Path(__file__).parent / 'data'
Member


I'd be in favor of moving all the test data files to https://github.com/apache/arrow-testing. It would cause less stress (for me at least) about adding binary files in the future. There's relatively little need to ship the files (or the unit tests, for that matter) with the library.

@wesm wesm closed this in 21ef18b Sep 5, 2018