
WIP: test #2689

Closed

Conversation

xuchuanyin (Contributor)

  1. Add a zstd compressor for compressing column data (a minimal sketch follows this list).
  2. Add zstd support in thrift.
  3. Since zstd does not support zero-copy while compressing, off-heap memory will not take effect for zstd.
  4. The column compressor is configured through a system property and can be changed for each load. Before a load starts, CarbonData resolves the compressor and uses it throughout that load; during querying, CarbonData reads the compressor information from the metadata in the data files (see the configuration sketch after the size table below).
  5. Also support compressing streaming tables with zstd; the compressor info is stored in the FileHeader of the streaming file.
  6. This PR was also considered and verified against the legacy store and compaction.
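
For context, a minimal sketch of what a zstd-backed compressor can look like on top of the zstd-jni library; the class and method names here are illustrative assumptions, not necessarily the interface this PR adds:

```java
import com.github.luben.zstd.Zstd;

// Minimal sketch of a zstd-backed column compressor using zstd-jni
// (https://github.com/luben/zstd-jni). Names are illustrative, not
// CarbonData's actual Compressor interface.
public final class ZstdColumnCompressor {

  private static final int COMPRESS_LEVEL = 3; // zstd's default level

  public byte[] compressByte(byte[] unCompInput) {
    // zstd-jni compresses heap byte arrays; there is no zero-copy
    // (off-heap) path, which is why point 3 above rules out off-heap.
    return Zstd.compress(unCompInput, COMPRESS_LEVEL);
  }

  public byte[] unCompressByte(byte[] compInput) {
    // The zstd frame header records the original size.
    int originalSize = (int) Zstd.decompressedSize(compInput);
    return Zstd.decompress(compInput, originalSize);
  }
}
```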

A simple test with 1.2 GB of raw CSV data shows the size (in MB) of the final store with each compressor:

| Local dictionary | snappy (MB) | zstd (MB) | Size reduced |
| ---------------- | ----------- | --------- | ------------ |
| enabled          | 335         | 207       | 38.2%        |
| disabled         | 375         | 225       | 40%          |
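
To illustrate point 4, a hedged configuration sketch; the property key `carbon.column.compressor` is an assumption here and should be verified against `CarbonCommonConstants`:

```java
import org.apache.carbondata.core.util.CarbonProperties;

public final class CompressorConfigExample {
  public static void main(String[] args) {
    // Assumed property key; verify the exact constant in CarbonCommonConstants.
    CarbonProperties.getInstance().addProperty("carbon.column.compressor", "zstd");
    // Loads started after this point compress column pages with zstd.
    // Queries do not consult this property: they read the compressor
    // name from the metadata stored in each data file.
  }
}
```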

Be sure to complete all of the following checklist items to help us incorporate
your contribution quickly and easily:

  • Any interfaces changed?
Yes, only internally used interfaces are changed

  • Any backward compatibility impacted?
    Yes, backward compatibility is handled

  • Document update required?
    Yes

  • Testing done
    Please provide details on
    - Whether new unit test cases have been added or why no new tests are required?
    Added tests
    - How it is tested? Please attach test report.
Tested on a local machine
    - Is it a performance related change? Please attach the performance test report.
The size of the final store decreased by about 40% compared with the default snappy compressor
    - Any additional information to help reviewers in testing this change.
    NA

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
    NA


xuchuanyin commented Sep 4, 2018

This PR is a replacement for PR #2628 with no changes; the CI for the original PR had problems.

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8282/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/211/

@xuchuanyin xuchuanyin changed the title [CARBONDATA-2851][CARBONDATA-2852] Support zstd as column compressor in final store WIP:[CARBONDATA-2851][CARBONDATA-2852] Support zstd as column compressor in final store Sep 4, 2018
@xuchuanyin (Contributor, Author)

retest this please

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/231/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8301/

@xuchuanyin xuchuanyin changed the title WIP:[CARBONDATA-2851][CARBONDATA-2852] Support zstd as column compressor in final store WIP: test Sep 5, 2018
@xuchuanyin xuchuanyin force-pushed the 0813_read_compressor_from_datafiles branch from 10ccff8 to 343a57c on September 5, 2018 07:54
@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8323/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/253/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/5/

1. Add a zstd compressor for compressing column data
2. Add zstd support in thrift
3. The legacy store is not considered in this commit
4. Since zstd does not support zero-copy while compressing, off-heap memory
   will not take effect for zstd
5. Support lazy loading of the compressor

During the query procedure we need to decompress the column pages. Previously
we got the compressor from a system property; now that we support new
compressors, we read the compressor information from the metadata in the data
files (a hedged lookup sketch follows this message). This PR also solves the
compatibility problems on the V1/V2 store, where only snappy is supported.

We resolve the column compressor before data loading/compaction starts, so
that all pages use the same compressor even if the configured compressor is
changed concurrently during the load. The column compressor is mandatory in
the carbon load model; otherwise the load will fail.

Also optimize the parameters for the column page: pass columnPageEncodeMeta
instead of its individual members.
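
A hedged sketch of the name-based, lazy compressor lookup described above; the registry shape below is an illustrative assumption, not CarbonData's actual factory:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a name-keyed compressor registry. At query time the compressor
// name comes from the data file's metadata rather than from the system
// property that was consulted at load time.
public final class CompressorRegistry {

  public interface Compressor {
    byte[] compressByte(byte[] input);
    byte[] unCompressByte(byte[] input);
  }

  private static final Map<String, Compressor> COMPRESSORS = new ConcurrentHashMap<>();

  public static void register(String name, Compressor compressor) {
    COMPRESSORS.put(name, compressor);
  }

  public static Compressor getCompressor(String name) {
    Compressor compressor = COMPRESSORS.get(name);
    if (compressor == null) {
      throw new IllegalArgumentException("Unsupported compressor: " + name);
    }
    return compressor;
  }
}
```

Resolving the compressor once before a load or compaction starts, and reusing that instance for every page, is what keeps a whole load on one compressor even if the property is changed concurrently.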
@xuchuanyin xuchuanyin force-pushed the 0813_read_compressor_from_datafiles branch from 67cccb1 to 81dd2b5 Compare September 6, 2018 01:35
@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/103/

@CarbonDataQA

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.3/8341/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/271/

@xuchuanyin xuchuanyin closed this Sep 11, 2018