Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-11279: [Rust][Parquet] ArrowWriter Definition Levels Memory Usage #9222

Closed
wants to merge 2 commits into from
Closed

ARROW-11279: [Rust][Parquet] ArrowWriter Definition Levels Memory Usage #9222

wants to merge 2 commits into from

Conversation

TurnOfACard
Copy link

Writes leaves immediately after calculating array levels to reduce array level memory usage by the number of rows in a row group.

Writes leaves immediately after calculating array levels to reduce array level memory usage by the number of rows in a row group.
@github-actions
Copy link

Thanks for opening a pull request!

Could you open an issue for this pull request on JIRA?
https://issues.apache.org/jira/browse/ARROW

Then could you also rename pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

See also:

@TurnOfACard TurnOfACard changed the title [ARROW-11279][Rust][Parquet] ArrowWriter Definition Levels Memory Usage ARROW-11279: [Rust][Parquet] ArrowWriter Definition Levels Memory Usage Jan 17, 2021
@github-actions
Copy link

@codecov-io
Copy link

codecov-io commented Jan 17, 2021

Codecov Report

Merging #9222 (daaf2a7) into master (1393188) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9222      +/-   ##
==========================================
- Coverage   81.61%   81.61%   -0.01%     
==========================================
  Files         215      215              
  Lines       51867    51860       -7     
==========================================
- Hits        42329    42323       -6     
+ Misses       9538     9537       -1     
Impacted Files Coverage Δ
rust/parquet/src/arrow/arrow_writer.rs 95.73% <100.00%> (+0.15%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update eaa7b7a...daaf2a7. Read the comment docs.

Copy link
Contributor

@nevi-me nevi-me left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @TurnOfACard, good improvement.

Please run cargo +stable fmt to fix the rustfmt CI issue.

@alamb @sunchao I've reasoned about this change, and it doesn't pose issues for deeply nested structs, and will indeed reduce memory usage. If possible, we can merge this one too for 3.0.0

@TurnOfACard
Copy link
Author

That should have been updated now

@alamb
Copy link
Contributor

alamb commented Jan 17, 2021

I apologize for the delay in merging Rust PRs -- the 3.0 release is being finalized now and are planning to minimize entropy by postponing merging changes not critical for the release until the process was complete. I hope the process is complete in the next few days. There is more discussion in the mailing list

Copy link
Member

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Less code and less memory: double win. Thanks a lot @TurnOfACard for this contribution and welcome to the Rust implementation of the Arrow project!

Copy link
Member

@sunchao sunchao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - seems when calculating definition/repetition levels we always assume the input array is of primitive type? (need to catch up on the parquet/arrow writer work)

Edit: ah I see, it starts from 1 but will be incremented when recursing down the nested structure.

@nevi-me nevi-me closed this in e7c69e6 Jan 20, 2021
kszucs pushed a commit that referenced this pull request Jan 25, 2021
Writes leaves immediately after calculating array levels to reduce array level memory usage by the number of rows in a row group.

Closes #9222 from TurnOfACard/parquet-memory

Authored-by: Ryan Jennings <ryan@ryanj.net>
Signed-off-by: Neville Dipale <nevilledips@gmail.com>
GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
Writes leaves immediately after calculating array levels to reduce array level memory usage by the number of rows in a row group.

Closes apache#9222 from TurnOfACard/parquet-memory

Authored-by: Ryan Jennings <ryan@ryanj.net>
Signed-off-by: Neville Dipale <nevilledips@gmail.com>
michalursa pushed a commit to michalursa/arrow that referenced this pull request Jun 13, 2021
Writes leaves immediately after calculating array levels to reduce array level memory usage by the number of rows in a row group.

Closes apache#9222 from TurnOfACard/parquet-memory

Authored-by: Ryan Jennings <ryan@ryanj.net>
Signed-off-by: Neville Dipale <nevilledips@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants