ARROW-11279: [Rust][Parquet] ArrowWriter Definition Levels Memory Usage #9222
Writes leaves immediately after calculating array levels to reduce array level memory usage by the number of rows in a row group.
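A toy model of why writing each leaf right away saves memory. It assumes, hypothetically, that every leaf column needs one definition level and one repetition level per row. `LeafLevels`, `compute_levels` and `write_leaf` are illustrative names only, not the parquet crate's API. If the levels for every leaf are computed before any leaf is written, num_leaves × num_rows × 2 level slots are alive at once. If each leaf is written as soon as its levels exist, only one leaf's worth is alive at a time.

```rust
/// Hypothetical stand-in for the per-leaf levels the writer computes.
struct LeafLevels {
    definition: Vec<i16>,
    repetition: Vec<i16>,
}

impl LeafLevels {
    /// Number of i16 level slots held by this leaf.
    fn slots(&self) -> usize {
        self.definition.len() + self.repetition.len()
    }
}

/// Hypothetical level calculation: one definition and one repetition entry per row.
fn compute_levels(num_rows: usize) -> LeafLevels {
    LeafLevels {
        definition: vec![1; num_rows],
        repetition: vec![0; num_rows],
    }
}

/// Hypothetical writer: a real one would encode levels and values.
/// Here it just consumes (and therefore drops) the levels.
fn write_leaf(levels: LeafLevels) -> usize {
    levels.slots()
}

/// Strategy A: compute levels for every leaf first, then write them all.
/// Returns the peak number of level slots alive at once.
fn peak_slots_collect_then_write(num_leaves: usize, num_rows: usize) -> usize {
    let all: Vec<LeafLevels> = (0..num_leaves).map(|_| compute_levels(num_rows)).collect();
    let peak: usize = all.iter().map(LeafLevels::slots).sum();
    for levels in all {
        write_leaf(levels);
    }
    peak
}

/// Strategy B: write each leaf as soon as its levels are computed.
/// Returns the peak number of level slots alive at once.
fn peak_slots_write_immediately(num_leaves: usize, num_rows: usize) -> usize {
    let mut peak: usize = 0;
    for _ in 0..num_leaves {
        let levels = compute_levels(num_rows);
        peak = peak.max(levels.slots());
        write_leaf(levels); // `levels` is moved and dropped here
    }
    peak
}
```

For 3 leaves and 100 rows, strategy A peaks at 600 slots and strategy B at 200. Under this model the saving grows with the number of leaf columns.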
Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? Then could you also rename pull request title in the following format?
Codecov Report
@@            Coverage Diff            @@
##          master    #9222      +/-  ##
==========================================
- Coverage  81.61%   81.61%   -0.01%
==========================================
  Files        215      215
  Lines      51867    51860       -7
==========================================
- Hits       42329    42323       -6
+ Misses      9538     9537       -1
Thanks @TurnOfACard, good improvement.
Please run cargo +stable fmt to fix the rustfmt CI issue.
@alamb @sunchao I've reasoned about this change: it doesn't cause problems for deeply nested structs, and it does reduce memory usage. If possible, we can merge this one for 3.0.0 too.
That should have been updated now.
I apologize for the delay in merging Rust PRs. The 3.0 release is being finalized now, and we are minimizing entropy by postponing changes that are not critical for the release until the process is complete. I hope it will be complete in the next few days. There is more discussion on the mailing list.
Less code and less memory: double win. Thanks a lot @TurnOfACard for this contribution and welcome to the Rust implementation of the Arrow project!
LGTM. It seems that when calculating definition/repetition levels we always assume the input array is of a primitive type? (I need to catch up on the Parquet/Arrow writer work.)
Edit: ah I see, it starts from 1 but will be incremented when recursing down the nested structure.
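A simplified model of how definition levels grow while recursing into a nested structure. This follows the general Parquet rule that each nullable (optional) level on the path from the root to a leaf adds one to the maximum definition level, and that a value's definition level counts the nullable levels that are actually present. The `Level` type and both functions are hypothetical; this is not the Arrow writer's code, and its exact starting value is an implementation detail not modelled here.

```rust
/// One nesting level on the path from the root to a leaf column (hypothetical model).
#[derive(Clone, Copy)]
struct Level {
    nullable: bool, // optional in the Parquet schema
    present: bool,  // whether this level is non-null for the current row
}

/// Maximum definition level for the leaf: one per nullable level on the path.
fn max_definition_level(path: &[Level]) -> i16 {
    path.iter().filter(|l| l.nullable).count() as i16
}

/// Definition level for one row: walk down the path, adding one for each
/// nullable level that is present, and stop at the first null.
fn definition_level(path: &[Level]) -> i16 {
    let mut level: i16 = 0;
    for l in path {
        if !l.present {
            break;
        }
        if l.nullable {
            level += 1;
        }
    }
    level
}
```

For three nullable levels that are all present, both functions return 3. If the middle level is null, the row's definition level is 1, which tells a reader that only the outermost struct was defined.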
Writes leaves immediately after calculating array levels to reduce array level memory usage by the number of rows in a row group.
Closes #9222 from TurnOfACard/parquet-memory
Authored-by: Ryan Jennings <ryan@ryanj.net>
Signed-off-by: Neville Dipale <nevilledips@gmail.com>