Skip to content

ARROW-6403: [Python] Expose FileReader::ReadRowGroups() to Python #5241

Closed
ARF1 wants to merge 7 commits intoapache:masterfrom
ARF1:ReadRowGroups
Closed

ARROW-6403: [Python] Expose FileReader::ReadRowGroups() to Python #5241
ARF1 wants to merge 7 commits intoapache:masterfrom
ARF1:ReadRowGroups

Conversation

@ARF1
Copy link
Copy Markdown
Contributor

@ARF1 ARF1 commented Aug 30, 2019

Expose ReadRowGroups to Python to allow efficient filtered reading implementations as suggested @xhochy in #2491 (comment)_

Without this PR users would have to re-implement threaded reads in python.

ARF1 added 2 commits August 30, 2019 22:11
Expose ReadRowGroups to Python to allow efficient filtered reading
implementations as suggested @xhochy in
#2491 (comment)

Without this PR users would have to re-implement threaded reads in python.
@ARF1 ARF1 changed the title Read row groups ARF1 ARF1 [Python] Expose FileReader::ReadRowGroups() to Python Aug 30, 2019
@ARF1 ARF1 changed the title ARF1 ARF1 [Python] Expose FileReader::ReadRowGroups() to Python [Python] Expose FileReader::ReadRowGroups() to Python Aug 30, 2019
@ARF1
Copy link
Copy Markdown
Contributor Author

ARF1 commented Aug 30, 2019

As it stands, this PR replicates the C++ interface of parquet in python. Namely it introduces the method read_row_groups(row_groups, ...) in addition to the existing read_row_group(i, ...) in ParquetReader and ParquetFile.

It might be worth considering whether maintainers would prefer avoiding the almost-duplication in the API and instead extend the functionality of the existing read_row_group() to accept a list-like object in addition to the currently accepted integer.

@emkornfield
Copy link
Copy Markdown
Contributor

@ARF1 Thank you for the PR. Could you open a JIRA and prefix the PR title with: ARROW-{JIRA #}:

https://github.com/apache/arrow/blob/master/.github/CONTRIBUTING.md has more thorough documentation on the process.

@emkornfield
Copy link
Copy Markdown
Contributor

It also looks like there are flake8 violations:
+python3 -m flake8 --count /home/travis/build/apache/arrow/python
/home/travis/build/apache/arrow/python/pyarrow/parquet.py:193:24: E128 continuation line under-indented for visual indent
/home/travis/build/apache/arrow/python/pyarrow/parquet.py:218:80: E501 line too long (85 > 79 characters)

@ARF1 ARF1 changed the title [Python] Expose FileReader::ReadRowGroups() to Python ARROW-6403: [Python] Expose FileReader::ReadRowGroups() to Python Aug 31, 2019
@ARF1
Copy link
Copy Markdown
Contributor Author

ARF1 commented Aug 31, 2019

@ARF1 Thank you for the PR. Could you open a JIRA and prefix the PR title with: ARROW-{JIRA #}:

https://github.com/apache/arrow/blob/master/.github/CONTRIBUTING.md has more thorough documentation on the process.

@emkornfield Done. Thanks for the guidance. The flake8 violations should now also be fixed. The checks are still running...

trailing whitespace in cython code

Co-Authored-By: Uwe L. Korn <xhochy@users.noreply.github.com>
@codecov-io
Copy link
Copy Markdown

Codecov Report

Merging #5241 into master will decrease coverage by 22.54%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master    #5241       +/-   ##
===========================================
- Coverage   87.64%   65.09%   -22.55%     
===========================================
  Files        1033      500      -533     
  Lines      148463    67929    -80534     
  Branches     1437        0     -1437     
===========================================
- Hits       130118    44219    -85899     
- Misses      17983    23710     +5727     
+ Partials      362        0      -362
Impacted Files Coverage Δ
python/pyarrow/parquet.py 92.34% <100%> (+0.04%) ⬆️
python/pyarrow/_parquet.pyx 91.33% <100%> (+0.04%) ⬆️
python/pyarrow/tests/test_parquet.py 96.61% <100%> (+0.1%) ⬆️
cpp/src/arrow/util/memory.h 0% <0%> (-100%) ⬇️
cpp/src/gandiva/date_utils.h 0% <0%> (-100%) ⬇️
cpp/src/arrow/util/memory.cc 0% <0%> (-100%) ⬇️
cpp/src/gandiva/decimal_type_util.h 0% <0%> (-100%) ⬇️
cpp/src/arrow/filesystem/util_internal.cc 0% <0%> (-100%) ⬇️
cpp/src/arrow/compute/logical_type.h 0% <0%> (-100%) ⬇️
cpp/src/parquet/hasher.h 0% <0%> (-100%) ⬇️
... and 791 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update beea8f9...d4ab186. Read the comment docs.

Copy link
Copy Markdown
Member

@xhochy xhochy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants