ARROW-6403: [Python] Expose FileReader::ReadRowGroups() to Python #5241
ARF1 wants to merge 7 commits into apache:master from ARF1:ReadRowGroups
Expose ReadRowGroups to Python to allow efficient filtered-reading implementations, as suggested by @xhochy in #2491 (comment). Without this PR, users would have to re-implement threaded reads in Python.
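To illustrate the kind of filtered read this enables, here is a minimal sketch. It assumes the method is available as `ParquetFile.read_row_groups`, as in current pyarrow releases. The helper names `select_row_groups` and `read_filtered` are hypothetical and not part of this PR. The idea is to use per-row-group min/max statistics to decide which row groups may contain matching values, then fetch them all in one threaded call.

```python
from typing import List, Optional, Sequence, Tuple

# (min, max) statistics for one row group; None means statistics are missing.
Stats = Optional[Tuple[float, float]]


def select_row_groups(stats: Sequence[Stats], lo: float, hi: float) -> List[int]:
    """Return indices of row groups whose [min, max] range may overlap [lo, hi]."""
    selected = []
    for i, s in enumerate(stats):
        # Without statistics the group cannot be ruled out, so keep it.
        if s is None or (s[0] <= hi and s[1] >= lo):
            selected.append(i)
    return selected


def read_filtered(path: str, column: str, lo: float, hi: float):
    """Read only the row groups that may hold `column` values in [lo, hi].

    Hypothetical helper; requires pyarrow, which is imported lazily.
    """
    import pyarrow.parquet as pq

    pf = pq.ParquetFile(path)
    stats = []
    for i in range(pf.metadata.num_row_groups):
        rg = pf.metadata.row_group(i)
        entry = None
        # Locate the column chunk by its schema path.
        for j in range(rg.num_columns):
            chunk = rg.column(j)
            if chunk.path_in_schema == column:
                st = chunk.statistics
                if st is not None and st.has_min_max:
                    entry = (st.min, st.max)
                break
        stats.append(entry)

    wanted = select_row_groups(stats, lo, hi)
    if not wanted:
        # No candidate row groups: return an empty table with the file's schema.
        return pf.schema_arrow.empty_table()
    # A single call lets the C++ reader parallelize across the selected groups.
    return pf.read_row_groups(wanted, use_threads=True)
```

The selection is coarse: the returned table can still contain rows outside the requested range, so a row-level filter is needed afterwards.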
As it stands, this PR replicates the C++ interface of Parquet in Python. Namely it introduces the method It might be worth considering whether maintainers would prefer to avoid the near-duplication in the API and instead extend the functionality of the existing
@ARF1 Thank you for the PR. Could you open a JIRA and prefix the PR title with "ARROW-{JIRA #}:"? https://github.com/apache/arrow/blob/master/.github/CONTRIBUTING.md has more thorough documentation on the process.
It also looks like there are flake8 violations:
@emkornfield Done. Thanks for the guidance. The flake8 violations should now also be fixed. The checks are still running...
trailing whitespace in cython code Co-Authored-By: Uwe L. Korn <xhochy@users.noreply.github.com>
Codecov Report
@@             Coverage Diff              @@
##           master    #5241       +/-    ##
===========================================
- Coverage   87.64%   65.09%   -22.55%
===========================================
  Files        1033      500      -533
  Lines      148463    67929    -80534
  Branches     1437        0     -1437
===========================================
- Hits       130118    44219    -85899
- Misses      17983    23710     +5727
+ Partials      362        0      -362