ARROW2: Optimize parquet read memory usage #1657

Closed
houqp opened this issue Jan 24, 2022 · 2 comments
Comments

Member

houqp commented Jan 24, 2022

Describe the bug

First reported by @ic4y in #1556 (comment).

This is also causing the TPC-H q7 benchmark to fail with an OOM error; see #1652 (comment).

To Reproduce

Compare peak memory usage between commits 2008b1d and c0c9c72 when processing a Parquet table.
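
The issue does not say how peak memory was measured. One possible way to make the comparison, sketched here as an assumption rather than the method actually used, is to build the same benchmark at each commit and record each child process's peak resident set size from a small Unix-only Python wrapper. The function name `peak_rss` and the binary names in the usage comment are hypothetical.

```python
import os
import subprocess
import sys


def peak_rss(argv):
    """Run argv to completion and return (exit_code, ru_maxrss) for that child.

    os.wait4 reports resource usage for the specific child it reaps, so two
    runs measured from the same wrapper process do not contaminate each other.
    ru_maxrss is in kilobytes on Linux and in bytes on macOS.
    Requires Python 3.9+ for os.waitstatus_to_exitcode.
    """
    proc = subprocess.Popen(argv)
    _, status, usage = os.wait4(proc.pid, 0)
    # The child was reaped by wait4, so record its exit code on the Popen
    # object ourselves instead of calling proc.wait().
    proc.returncode = os.waitstatus_to_exitcode(status)
    return proc.returncode, usage.ru_maxrss


# Hypothetical usage, with one benchmark binary built per commit:
#   _, before = peak_rss(["./bench-2008b1d", "--path", "data/"])
#   _, after = peak_rss(["./bench-c0c9c72", "--path", "data/"])
#   print(f"peak RSS before={before} after={after}")
```

Comparing the two numbers on the same machine and the same input gives a rough view of the regression; it does not show which allocation is responsible.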

Expected behavior

Memory usage should be on par with arrow-rs. Alternatively, arrow2 should offer an option that lets users trade off memory usage against array segmentation.

Additional context

Related upstream issue: jorgecarleitao/arrow2#768

@houqp houqp added the bug Something isn't working label Jan 24, 2022
@houqp houqp added this to the arrow2 milestone Jan 24, 2022

kosciej commented Feb 19, 2023

It looks like the related upstream issue is now closed: jorgecarleitao/arrow2#768

@Dandandan
Contributor

I think we can close this, as arrow2 and arrow will be merged in the future.
