Compressed Shuffle (Arrow-IPC compression) #4

Closed
yjshen opened this issue Jan 11, 2022 · 5 comments
Labels
performance related Potential optimization point

Comments

@yjshen
Contributor

yjshen commented Jan 11, 2022

Upstream issues:
[TODO] Rust side: apache/arrow-rs#1709
[Partly Finished?] Java side: https://issues.apache.org/jira/browse/ARROW-8672

@yjshen yjshen added the performance related Potential optimization point label Jan 11, 2022
@yjshen yjshen changed the title Considering shuffle data compression (Arrow-IPC compression) Compressed Shuffle (Arrow-IPC compression) May 18, 2022
@richox
Collaborator

richox commented Jun 2, 2022

IPC block-based compression is now supported. We can still switch to column-based compression if it turns out to give better compression and performance.
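To make the distinction concrete, here is a minimal, std-only Rust sketch contrasting the two strategies. Block-based compression serializes all column buffers of a batch into one byte block and compresses it once; buffer (column) based compression compresses each buffer separately. Assumptions: the run-length encoder only stands in for a real codec such as zstd or lz4, and the u32 length-prefixed framing and every function name (`compress_block`, `compress_buffers`, etc.) are hypothetical. This is not the Arrow IPC wire format and not the project's implementation.

```rust
// Stand-in codec: byte-level run-length encoding as (count, byte) pairs.
// ASSUMPTION: a real shuffle writer would use zstd or lz4; RLE keeps this std-only.
fn rle_compress(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < input.len() {
        let b = input[i];
        let mut run = 1;
        // Cap runs at 255 so the count fits in one byte.
        while i + run < input.len() && input[i + run] == b && run < 255 {
            run += 1;
        }
        out.push(run as u8);
        out.push(b);
        i += run;
    }
    out
}

fn rle_decompress(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    for pair in input.chunks_exact(2) {
        out.extend(std::iter::repeat(pair[1]).take(pair[0] as usize));
    }
    out
}

// Hypothetical framing: each part is preceded by its length as a little-endian u32.
fn split_length_prefixed(mut data: &[u8]) -> Vec<Vec<u8>> {
    let mut parts = Vec::new();
    while data.len() >= 4 {
        let len = u32::from_le_bytes([data[0], data[1], data[2], data[3]]) as usize;
        parts.push(data[4..4 + len].to_vec());
        data = &data[4 + len..];
    }
    parts
}

/// Block-based: length-prefix and concatenate every buffer, then compress the whole block once.
fn compress_block(buffers: &[Vec<u8>]) -> Vec<u8> {
    let mut raw = Vec::new();
    for buf in buffers {
        raw.extend_from_slice(&(buf.len() as u32).to_le_bytes());
        raw.extend_from_slice(buf);
    }
    rle_compress(&raw)
}

fn decompress_block(data: &[u8]) -> Vec<Vec<u8>> {
    split_length_prefixed(&rle_decompress(data))
}

/// Buffer-based: compress each buffer on its own, then length-prefix each compressed buffer.
fn compress_buffers(buffers: &[Vec<u8>]) -> Vec<u8> {
    let mut out = Vec::new();
    for buf in buffers {
        let c = rle_compress(buf);
        out.extend_from_slice(&(c.len() as u32).to_le_bytes());
        out.extend_from_slice(&c);
    }
    out
}

fn decompress_buffers(data: &[u8]) -> Vec<Vec<u8>> {
    split_length_prefixed(data).iter().map(|c| rle_decompress(c)).collect()
}

fn main() {
    // Three toy "column buffers" of one record batch.
    let buffers = vec![vec![0u8; 100], vec![1, 2, 3, 4], vec![7u8; 10]];
    let block = compress_block(&buffers);
    let per_buffer = compress_buffers(&buffers);
    assert_eq!(decompress_block(&block), buffers);
    assert_eq!(decompress_buffers(&per_buffer), buffers);
    println!("block: {} bytes, per-buffer: {} bytes", block.len(), per_buffer.len());
}
```

In this sketch the block-based variant pays framing and codec overhead once per batch and lets the codec see redundancy across columns, while the buffer-based variant lets a reader decompress one column without touching the others.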

@yjshen
Contributor Author

yjshen commented Jun 2, 2022

Great work! Would it be possible to report new benchmark results for the latest master? @richox

@yjshen
Contributor Author

yjshen commented Jun 2, 2022

We could always explore buffer-based compression later, once arrow-rs supports it directly.

@richox
Collaborator

richox commented Jun 8, 2022

> Great work! Is that possible to report new benchmark results for the latest master? @richox

We ran into some performance issues when running on STS with small memory and broadcast join enabled. I guess we have to implement a native BHJ (broadcast hash join) before we can get better benchmark results.

@richox
Collaborator

richox commented Sep 26, 2023

We implemented a custom-designed format for serializing record batches in the latest version. The Arrow IPC format is no longer used because we found some performance issues when compressing with low-level zstd.

@richox richox closed this as completed Sep 26, 2023