[CARBONDATA-2896][Refactor] Adaptive Encoding for Primitive data types
Loading configurations and settings
(1) Parse the data like a measure, so change FieldEncoderFactory to take the measure flow
(2) While creating loading configurations, handle no-dictionary and sort columns in all the needed flows

Sort rows preparation
(1) Prepare the row to be sorted with the original data for no-dictionary columns
(2) Use data-type-based comparators for the no-dictionary sort columns in all the flows (intermediate sort, final sort, unsafe sort)
(3) Handle read/write of rows with no-dictionary primitive data types to intermediate files and in the final file merger, since we now read and write the original data
(4) Get the no-dictionary sort data types from the load configurations set in the LOAD step

Adding to the column page and applying adaptive encoding
(1) Add the no-dictionary primitive data as original data
(2) Apply adaptive encoding to the page
(3) Reuse the adaptive encoding techniques that already exist for measure columns

Writing the inverted index to an adaptive encoded page
(1) Prepare the inverted list based on data-type-based comparison
(2) Apply RLE on the inverted index
(3) Write the inverted index to the encoded page

Creating the decoder while querying
(1) Create the proper decoder for no-dictionary column pages
(2) Uncompress the column page and the inverted index

Filter flow changes
(1) Filter values will be in bytes, so convert the data to bytes for comparison
(2) Change isScanRequired to compare min/max values based on the data type

Filling the output row for queries
(1) Change noDictionaryKeys to Object; it can now hold data-type-based values for no-dictionary primitive data types

Bloom filter changes
(1) Change the bloom filter load
(2) While rebuilding the datamap, the load expects the original data, so a conversion is applied
(3) Fill the no-dictionary primitive data as original data

Compaction changes
Compaction gets its rows from the result collectors, but the collectors return no-dictionary columns as bytes, so a conversion from bytes back to the original data, based on the data type, is needed.
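The data-type-based comparison described under sort rows preparation, step (2), could be sketched roughly as below. This is a minimal illustration, not CarbonData's actual API; the class, enum, and method names here are all hypothetical stand-ins:

```java
import java.util.Comparator;

// Hypothetical sketch: pick a comparator for a no-dictionary sort column
// based on its data type, instead of always comparing raw bytes
// (lexicographic byte order sorts 10 before 3 for integers).
public class NoDictionarySortComparators {

  // Simplified stand-in for CarbonData's data type enum.
  enum DataType { INT, LONG, DOUBLE, STRING }

  // Returns a comparator that compares the original (decoded) values.
  static Comparator<Object> forDataType(DataType type) {
    switch (type) {
      case INT:
        return Comparator.comparingInt(o -> (Integer) o);
      case LONG:
        return Comparator.comparingLong(o -> (Long) o);
      case DOUBLE:
        return Comparator.comparingDouble(o -> (Double) o);
      default:
        // Strings (and other types) keep their natural ordering.
        return (a, b) -> ((String) a).compareTo((String) b);
    }
  }
}
```

With such a comparator, 3 sorts before 10 for INT columns, whereas the old byte-wise comparison of their string forms would order "10" before "3".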
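The isScanRequired change in the filter flow amounts to pruning pages by typed min/max rather than byte order. A tiny sketch under assumed names (not the real CarbonData signature), shown here for a long-typed column:

```java
// Hypothetical sketch of min/max-based page pruning: compare the filter
// value against the page's [min, max] range using the column's data type.
public class MinMaxPruning {

  // Returns true if a page whose values span [min, max] could contain
  // the filter value, i.e. the page must actually be scanned.
  static boolean isScanRequired(long min, long max, long filterValue) {
    return filterValue >= min && filterValue <= max;
  }
}
```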
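The bytes-to-original-data conversion needed by the bloom filter rebuild and by compaction could look like the following sketch. Again, the names and the fixed-width encoding assumed here are illustrative only, not CarbonData's actual layout:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: convert the byte[] handed back by result collectors
// into the original value, dispatching on the column's data type.
public class BytesToOriginal {

  // Simplified stand-in for CarbonData's data type enum.
  enum DataType { INT, LONG, STRING }

  static Object toOriginal(byte[] bytes, DataType type) {
    switch (type) {
      case INT:
        return ByteBuffer.wrap(bytes).getInt();   // 4-byte big-endian int
      case LONG:
        return ByteBuffer.wrap(bytes).getLong();  // 8-byte big-endian long
      default:
        return new String(bytes, StandardCharsets.UTF_8);
    }
  }
}
```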
1 parent 476e6b2 · commit d687986
Showing 81 changed files with 2,298 additions and 615 deletions.