[SPARK-13361][SQL] Add benchmark codes for Encoder#compress() in CompressionSchemeBenchmark #11236
Conversation
@nongli This discussion comes from #10965.
Anyway, I'd like to make PRs to improve compression performance. In a second step, I have a plan to add code that applies general-purpose compression algorithms like LZ4 and Snappy in the final step. Could you give me some suggestions on this?
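The idea of a final general-purpose pass can be sketched as compressing the already-encoded column bytes with a block codec. A minimal, hypothetical sketch follows; the JDK's `Deflater` stands in for LZ4/Snappy (which require third-party libraries), and the class and method names are illustrative, not Spark's actual API.

```java
import java.util.Arrays;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class GeneralPurposeCompressDemo {
    // Hypothetical final-pass compression over already-encoded column bytes.
    // Deflater is a stand-in for LZ4/Snappy here.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(input);
        deflater.finish();
        byte[] buf = new byte[input.length * 2 + 64];
        int n = deflater.deflate(buf);
        deflater.end();
        return Arrays.copyOf(buf, n);
    }

    static byte[] decompress(byte[] input, int originalLength) throws Exception {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        byte[] out = new byte[originalLength];
        inflater.inflate(out);
        inflater.end();
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Highly repetitive data, like run-length-friendly column values.
        byte[] column = new byte[4096];
        for (int i = 0; i < column.length; i++) column[i] = (byte) (i % 4);
        byte[] compressed = compress(column);
        byte[] restored = decompress(compressed, column.length);
        System.out.println(compressed.length < column.length);
        System.out.println(Arrays.equals(column, restored));
    }
}
```

The final pass is orthogonal to the per-column encoding schemes: it trades extra CPU for a smaller in-memory footprint, which is exactly the trade-off the benchmark below is meant to measure.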
Jenkins, retest this please.
Test build #51484 has finished for PR 11236 at commit
I tried to implement it. The benchmark results are as follows:
Though the encoding/decoding speeds get a little worse, the compression ratios get much better.
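The speed-versus-ratio trade-off reported above can be reproduced in miniature with a toy measurement. This is a hypothetical sketch, not the PR's actual benchmark: JDK `Deflater` levels stand in for different compression schemes, and the class and method names are made up for illustration.

```java
import java.util.Random;
import java.util.zip.Deflater;

public class RatioVsSpeedBenchmark {
    // Compress the same column-like buffer at two effort levels and
    // return {compressedSize, elapsedNanos} for each, so the time/ratio
    // trade-off can be compared directly.
    static long[] run(byte[] input, int level) {
        long start = System.nanoTime();
        Deflater d = new Deflater(level);
        d.setInput(input);
        d.finish();
        byte[] buf = new byte[input.length * 2 + 64];
        int n = d.deflate(buf);
        d.end();
        long elapsed = System.nanoTime() - start;
        return new long[] { n, elapsed };
    }

    public static void main(String[] args) {
        // Low-entropy data, roughly like a dictionary-friendly column.
        byte[] column = new byte[1 << 20];
        Random rnd = new Random(42);
        for (int i = 0; i < column.length; i++) column[i] = (byte) rnd.nextInt(16);
        long[] fast = run(column, Deflater.BEST_SPEED);
        long[] best = run(column, Deflater.BEST_COMPRESSION);
        System.out.printf("fast: %d bytes in %d us%n", fast[0], fast[1] / 1000);
        System.out.printf("best: %d bytes in %d us%n", best[0], best[1] / 1000);
        // Typically the higher effort level yields smaller output but
        // takes longer, mirroring the trade-off described above.
    }
}
```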
@nongli ping
LGTM. Moving forward, I agree that ColumnVector is a natural data structure to decode into, but we should probably not add this logic directly into those classes, just from a code-maintenance point of view. I think exploring the Parquet encodings makes sense, but let's start by benchmarking them and see if they have the right performance characteristics.
Merging this into master. Thanks.
This PR adds benchmark code for `Encoder#compress()`.
It also replaces the benchmark results with new ones because the output format of `Benchmark` changed.
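The shape of such a benchmark can be illustrated with a minimal harness. This is only a sketch loosely modeled on the register-a-case-then-run pattern of Spark's `Benchmark` utility; the `MiniBenchmark` class and its method names are hypothetical, not Spark's actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

public class MiniBenchmark {
    // Minimal benchmark harness: register named cases, run each for a
    // fixed number of iterations, and print the average time per iteration.
    private final String name;
    private final int iters;
    private final List<String> caseNames = new ArrayList<>();
    private final List<IntConsumer> cases = new ArrayList<>();

    MiniBenchmark(String name, int iters) {
        this.name = name;
        this.iters = iters;
    }

    void addCase(String caseName, IntConsumer body) {
        caseNames.add(caseName);
        cases.add(body);
    }

    void run() {
        System.out.println(name + ":");
        for (int i = 0; i < cases.size(); i++) {
            long start = System.nanoTime();
            for (int it = 0; it < iters; it++) cases.get(i).accept(it);
            long avgNs = (System.nanoTime() - start) / iters;
            System.out.printf("  %-20s %10d ns/iter%n", caseNames.get(i), avgNs);
        }
    }

    public static void main(String[] args) {
        MiniBenchmark b = new MiniBenchmark("toy column scan benchmark", 100);
        byte[] data = new byte[1 << 16];
        // A trivial workload standing in for an encode/compress case.
        b.addCase("xor-fold", it -> {
            int acc = 0;
            for (byte v : data) acc ^= v;
        });
        b.run();
    }
}
```

A real harness would also warm up the JIT and report throughput relative to a baseline case, which is what makes an output-format change (as noted above) invalidate previously published numbers.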