WIP: test #2689
Conversation
This PR is a replacement for PR #2628 with no changes; the CI for the original PR has problems.
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8282/
Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/211/
retest this please
Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/231/
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8301/
Force-pushed from 10ccff8 to 343a57c
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8323/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/253/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/5/
1. Add zstd compressor for compressing column data.
2. Add zstd support in thrift.
3. Legacy store is not considered in this commit.
4. Since zstd does not support zero-copy while compressing, off-heap will not take effect for zstd.
5. Support lazy load for the compressor (see the sketch after this list).
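As a rough illustration of points 1, 4, and 5, a zstd-backed compressor could look like the sketch below. It assumes the zstd-jni library (com.github.luben.zstd.Zstd); the class name, compression level, and lazy-holder idiom are illustrative assumptions, not this PR's actual code.

```java
import com.github.luben.zstd.Zstd;

// Minimal sketch of a zstd-backed compressor, assuming zstd-jni.
public class ZstdCompressorSketch {
  private static final int COMPRESS_LEVEL = 3; // assumed default level

  // Lazy-load sketch (point 5): the instance, and with it the native
  // zstd binding, is created only on first use.
  private static final class Holder {
    static final ZstdCompressorSketch INSTANCE = new ZstdCompressorSketch();
  }

  public static ZstdCompressorSketch getInstance() {
    return Holder.INSTANCE;
  }

  public byte[] compressByte(byte[] input) {
    // zstd-jni's byte[] API copies into on-heap arrays; there is no
    // zero-copy path, which is why off-heap pages gain nothing (point 4).
    return Zstd.compress(input, COMPRESS_LEVEL);
  }

  public byte[] unCompressByte(byte[] input) {
    // The uncompressed size is recorded in the zstd frame header.
    long size = Zstd.decompressedSize(input);
    return Zstd.decompress(input, (int) size);
  }
}
```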
In the query procedure, we need to decompress the column pages. Previously we got the compressor from a system property; now that we support new compressors, we read the compressor information from the metadata in the data files. This PR also solves the related compatibility problems on the V1/V2 store, where only snappy is supported.
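The fallback described above might look like the small sketch below; the "snappy" default mirrors the compatibility rule stated here, while the surrounding names are assumptions rather than the PR's identifiers.

```java
// Sketch with assumed names: pick the decompressor from file metadata,
// falling back to snappy for V1/V2 stores that predate the field.
public final class DecompressorSelector {
  public static String chooseCompressorName(String nameFromFooter) {
    if (nameFromFooter == null || nameFromFooter.isEmpty()) {
      // V1/V2 stores carry no compressor field; they were always
      // written with snappy, so use it for backward compatibility.
      return "snappy";
    }
    return nameFromFooter;
  }
}
```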
We get the column compressor before data loading/compaction starts, so that all pages use the same compressor even if the configured compressor is modified concurrently during loading.
The column compressor is mandatory for the carbon load model; otherwise the load will fail.
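One way to realize these two rules, sketched with hypothetical names, is to resolve and validate the compressor once when the load model is built, so a concurrent configuration change cannot split a single load across two compressors:

```java
// Sketch with assumed names: the compressor is fixed at load-model
// creation and reused by every page of this load/compaction.
public final class LoadModelSketch {
  private final String columnCompressor;

  public LoadModelSketch(String columnCompressor) {
    if (columnCompressor == null || columnCompressor.isEmpty()) {
      // Mandatory per the PR: fail the load up front instead of letting
      // pages choose a compressor independently mid-flight.
      throw new IllegalArgumentException(
          "column compressor is required for the carbon load model");
    }
    this.columnCompressor = columnCompressor;
  }

  public String getColumnCompressor() {
    return columnCompressor;
  }
}
```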
Optimize the parameters for the column page: pass columnPageEncodeMeta instead of its individual members.
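The parameter cleanup might look like the following sketch (types and fields assumed): the meta object groups what was previously threaded through the factory one argument at a time, so signatures stay stable as fields such as the compressor name are added.

```java
// Sketch with assumed types, not the PR's actual classes.
final class EncodeMetaSketch {
  final String columnName;
  final String dataType;        // simplified; the real meta holds more
  final String compressorName;  // the per-load compressor from the model

  EncodeMetaSketch(String columnName, String dataType, String compressorName) {
    this.columnName = columnName;
    this.dataType = dataType;
    this.compressorName = compressorName;
  }
}

final class ColumnPageSketch {
  // After the refactor, callers pass the whole meta instead of its
  // individual members.
  static ColumnPageSketch newPage(EncodeMetaSketch meta, int pageSize) {
    return new ColumnPageSketch();
  }
}
```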
Force-pushed from 67cccb1 to 81dd2b5
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/103/
Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.3/8341/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/271/
A simple test with 1.2GB of raw CSV data shows the size (in MB) of the final store with different compressors:
Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:
Any interfaces changed?
Yes, only internally used interfaces are changed
Any backward compatibility impacted?
Yes, backward compatibility is handled
Document update required?
Yes
Testing done
Please provide details on
- Whether new unit test cases have been added or why no new tests are required?
Added tests
- How it is tested? Please attach test report.
Tested on a local machine
- Is it a performance related change? Please attach the performance test report.
The size of the final store decreased by 40% compared with the default snappy compressor
- Any additional information to help reviewers in testing this change.
NA
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
NA