
[HOTFIX] Fixed compilation issues and bloom clear issue #2428

Closed
wants to merge 1 commit into from

Conversation

ravipesala
Contributor

Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:

  • Any interfaces changed?

  • Any backward compatibility impacted?

  • Document update required?

  • Testing done
    Please provide details on
    - Whether new unit test cases have been added or why no new tests are required?
    - How it is tested? Please attach test report.
    - Is it a performance related change? Please attach the performance test report.
    - Any additional information to help reviewers in testing this change.

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5471/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6647/

@kumarvishal09
Contributor

retest this please

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6656/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5483/

@ravipesala
Contributor Author

SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5517/

@ravipesala
Contributor Author

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5518/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6660/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5487/

@kumarvishal09
Contributor

LGTM

@asfgit asfgit closed this in 589fe18 Jun 29, 2018
@ravipesala
Contributor Author

SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5520/

sv71294 added a commit to sv71294/carbondata that referenced this pull request Jul 11, 2018
[CARBONDATA-2587][CARBONDATA-2588] Local Dictionary Data Loading support

What changes are proposed in this PR

Added code to support Local Dictionary Data Loading for primitive type
Added code to support Local Dictionary Data Loading for complex type.
How this PR is tested
Manual testing is done in 3 Node setup.
UT will be raised in different PR

This closes apache#2402

[CARBONDATA-2647] [CARBONDATA-2648] Add support for COLUMN_META_CACHE and CACHE_LEVEL in create table and alter table properties

Things done as part of this PR

Support for configuring COLUMN_META_CACHE in create and alter table set properties DDL.
Support for configuring CACHE_LEVEL in create and alter table set properties DDL.
Describe formatted display support for COLUMN_META_CACHE and CACHE_LEVEL
  Any interfaces changed?
Create Table Syntax
CREATE TABLE [dbName].tableName (col1 String, col2 String, col3 int,…) STORED BY 'carbondata' TBLPROPERTIES ('COLUMN_META_CACHE'='col1,col2,…', 'CACHE_LEVEL'='BLOCKLET')

Alter Table set properties Syntax
ALTER TABLE [dbName].tableName SET TBLPROPERTIES ('COLUMN_META_CACHE'='col1,col2,…', 'CACHE_LEVEL'='BLOCKLET')

This closes apache#2418

[CARBONDATA-2549] Bloom remove guava cache and use CarbonCache

Currently, the bloom cache is implemented using a Guava cache, while carbon has its own LRU cache interfaces that control caching system-wide instead of feature by feature. So replace the Guava cache with the carbon LRU cache.

This closes apache#2327

[CARBONDATA-2608]Document update about Json Writer with examples.

Document update about Json Writer with examples.

This closes apache#2409

[CARBONDATA-2634][BloomDataMap] Add datamap properties in show datamap outputs

add datamap properties in show datamap outputs

This closes apache#2404

[CARBONDATA-2647] [CARBONDATA-2648] Fix cache level display in describe formatted command

1. Correct CACHE_LEVEL display in describe formatted command. It always displayed BLOCK
   even though the value was configured as BLOCKLET.
2. Correct the method arguments to pass dbName first and then tableName.
3. Added test case for blocking column_meta_cache and cache_level on child dataMaps.

This closes apache#2426

[CARBONDATA-2669] Local Dictionary Store Size optimisation and other function issues

Problems:
1. Local dictionary store size issue: when all column data is empty and the columns are not present in sort columns, the local dictionary store size was larger than the no-dictionary store size.
2. Page-level dictionary merging issue: while merging the dictionary values used by the pages in a blocklet, some dictionary values were missed, because an AND operation was done on the bitsets.
3. Local dictionary null values: the null value was not added in LV format, because of which new dictionary values were being generated for null values.
4. The local dictionary generator was thread specific.

Solution:
1. Added RLE for unsorted dictionary values to reduce the size.
2. An OR operation is now performed while merging the dictionary values.
3. Added LV for null values.
4. Made the local dictionary generator task specific.

This closes apache#2427
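The merging fix above can be sketched abstractly. The following is a hypothetical illustration (not CarbonData code) of why OR, not AND, is the right way to combine per-page used-dictionary bitsets into a blocklet-level bitset; Python sets stand in for the bitsets.

```python
# Hypothetical sketch: each page records which local-dictionary keys it used.
# A blocklet-level merge must take the union (OR) of the page bitsets;
# an intersection (AND) drops keys used by only some of the pages.

def merge_used_keys_or(pages):
    """Union of per-page used-dictionary keys (the fix)."""
    merged = set()
    for used in pages:
        merged |= used  # OR: keep every key any page used
    return merged

def merge_used_keys_and(pages):
    """Intersection of per-page keys (the bug): keys used in only one page are lost."""
    merged = set(pages[0])
    for used in pages[1:]:
        merged &= used  # AND: keeps only keys used by *every* page
    return merged

pages = [{1, 2, 3}, {2, 4}]
assert merge_used_keys_or(pages) == {1, 2, 3, 4}  # every used key kept
assert merge_used_keys_and(pages) == {2}          # keys 1, 3 and 4 dropped
```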

[CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]Local dictionary support for alter table, preaggregate, varchar datatype, alter set and unset

What changes were proposed in this pull request?
In this PR,

local dictionary support is added for alter table, preaggregate, varChar datatype, alter table set and unset command
UTs are added for local dictionary load support
All the validations related to above features are taken care in this PR
How was this patch tested?

All the tests were executed in 3 node cluster.
UTs and SDV test cases are added in the same PR

This closes apache#2401

[HOTFIX] Fixed compilation issues and bloom clear issue

Fixed test

This closes apache#2428

[CARBONDATA-2635][BloomDataMap] Support different index datamaps on same column

Users can create index datamaps based on different providers on one column;
for example, a user can create a bloomfilter datamap and a lucene datamap on
one column, but cannot create two bloomfilter datamaps on one
column.

This closes apache#2405

[CARBONDATA-2646][DataLoad] Change the log level while loading data into a table with the 'sort_column_bounds' property: change the 'ERROR' flag to 'WARN' for some expected tasks

Change the log level while loading data into a table with the 'sort_column_bounds' property: change the 'ERROR' flag to the 'WARN' flag for some expected tasks.

This closes apache#2407

[CARBONDATA-2545] Fix some spell error in CarbonData

This closes apache#2419

[CARBONDATA-2629] Support SDK carbon reader read data from HDFS and S3 with filter function

Currently the SDK carbon reader only supports reading data from the local filesystem with a filter function; it throws an exception when reading data from HDFS or S3 with a filter function.
This PR adds that support:
Support for the SDK carbon reader to read data from HDFS and S3 with a filter function.

This closes apache#2399

[CARBONDATA-2644][DataLoad]ADD carbon.load.sortMemory.spill.percentage parameter invalid value check

This closes apache#2397

[CARBONDATA-2653][BloomDataMap] Fix bugs in incorrect blocklet number in bloomfilter

In the non-deferred rebuild scenario, the last bloomfilter index file has already been written in onBlockletEnd, so there is no need to write it again; otherwise an extra blocklet number will be
generated in the bloom index file.

This closes apache#2408

[CARBONDATA-2674][Streaming]Streaming with merge index enabled does not consider the merge index file while pruning

This closes apache#2429

[CARBONDATA-2606][Complex DataType Enhancements]Fix for ComplexDataType Projection PushDown

Problem 1: ComplexDataType projection pushdown fails when the table schema contains
a column name in upper case.

Solution: Change the column name to lowercase.

Problem 2: If a Struct contains an Array, only the parent column should be pushed down.

Solution: Check for ArrayType or GetArrayItem in the complex column;
if any ArrayType is found, push down the parent column.

This closes apache#2421

[CARBONDATA-2633][BloomDataMap] Fix bugs in bloomfilter for dictionary/sort/date/TimeStamp column

For a dictionary column, carbon converts the literal value to a dict value, then
converts the dict value to an MDK value, and finally stores the MDK value as the
internal value in the carbon file.

For other columns, carbon converts the literal value to the internal value using
a field converter.

Since bloomfilter datamap stores the internal value, during query we
should convert the literal value in filter to internal value in order to
match the value stored in bloomfilter datamap.

Changes are made:

1. FieldConverters were refactored to extract common value-conversion methods.
2. BloomQueryModel was optimized to support converting a literal value to the
internal value.
3. Fixed bugs for int/float/date/timestamp as bloom index columns.
4. Fixed bugs for dictionary/sort columns as bloom index columns.
5. Added tests.
6. Blocked (deferred) rebuild for the bloom datamap (it contains bugs that are
not fixed in this commit).

This closes apache#2403
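The query-side conversion described above can be illustrated with a toy sketch. This is not the actual BloomQueryModel API: a plain set stands in for the bloom filter, and upper-casing plus UTF-8 encoding stands in for the literal-to-internal conversion. The point is that the index stores internal values, so a query literal must pass through the same converter before the membership check.

```python
# Toy sketch: the bloom index stores *internal* values, so a query literal
# must be converted with the same field converter before probing the index.

def to_internal(literal):
    """Hypothetical field converter (stand-in for carbon's converters)."""
    return literal.upper().encode("utf-8")

class ToyBloomIndex:
    def __init__(self):
        self.members = set()  # stand-in for the bloom filter's bit array

    def add(self, literal):
        self.members.add(to_internal(literal))  # stored as internal value

    def might_contain(self, literal):
        # The bug class fixed above: probing with the raw literal would miss.
        return to_internal(literal) in self.members

idx = ToyBloomIndex()
idx.add("shanghai")
assert idx.might_contain("shanghai")          # converted probe matches
assert "shanghai".encode() not in idx.members  # raw literal would not match
```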

[HOTFIX][32K]maintain proper mapping for varChar Columns and noDictionary Columns
for all the dimensions while creating sort data rows instance

Problem: when creating the column mapping for varChar columns and no-dictionary
columns for existing dimensions, the mapping is incorrect.

Solution: remove the unwanted variable counter and map the correct index to varChar
columns and no-dictionary columns based on the number of dimensions.

This closes apache#2395

[CARBONDATA-2650][Datamap] Fix bugs in negative number of skipped blocklets

Currently in carbondata, the default blocklet datamap is used to prune
blocklets first, and then the other index datamaps are used.
But the other index datamaps work at segment scope, so in some scenarios
the size of their pruned result is bigger than that of the default
datamap, causing a negative number of skipped blocklets in the explain
query output.

Here we add an intersection step after pruning. If the pruned result size is
zero, we finish the pruning.

This closes apache#2410
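The intersection step above can be sketched as follows; this is a hypothetical illustration (not carbon code) where pruned results are sets of blocklet ids, the default datamap runs first, and later datamaps can only narrow, never widen, the result.

```python
# Hypothetical sketch of intersection-based pruning: segment-scope datamaps
# may return supersets of the default datamap's result, so intersect each
# successive result instead of taking it as-is.

def prune(default_result, other_datamap_results):
    """Intersect successive pruned results; stop early once nothing is left."""
    result = set(default_result)
    for other in other_datamap_results:
        result &= set(other)  # later datamaps can only narrow the result
        if not result:        # nothing left to scan: finish pruning
            break
    return result

default = {1, 2, 3}           # blocklets kept by the default blocklet datamap
others = [{2, 3, 4, 5}, {3}]  # segment-scope datamaps may return supersets
assert prune(default, others) == {3}
# The result never exceeds the default, so the skipped count cannot go negative.
assert len(prune(default, others)) <= len(default)
```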

[CARBONDATA-2654][Datamap] Optimize output for explaining querying with datamap

Currently, if we have multiple datamaps and the query hits all of them,
the carbondata explain command shows only the first datamap, and the
rest are not shown. In this commit, we show all the datamaps that are
hit by the query.

This closes apache#2411

[CARBONDATA-2687][BloomDataMap][Doc] Update document for bloomfilter datamap

In a previous PR, the cache behaviour for the bloomfilter datamap was changed from a Guava cache to the carbon cache. This PR updates the document for the bloomfilter datamap and removes the description of the old cache.

This closes apache#2446

[CARBONDATA-2684] [PR-2442] Distinct count fails on complex columns

This PR fixes a Code Generator Error thrown when the select filter contains more than one count of distinct of a complex column with a group by clause.

This closes apache#2449

[CARBONDATA-2645] Segregate block and blocklet cache

Things done as part of this PR

Segregated the block and blocklet cache. With this, the driver will cache metadata based on CACHE_LEVEL.
If CACHE_LEVEL is set to BLOCK, only the carbondata file metadata will be cached in the driver.
If CACHE_LEVEL is set to BLOCKLET, metadata for (number of carbondata files × number of blocklets in each carbondata file) will be cached in the driver.

This closes apache#2437
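A back-of-the-envelope sketch (hypothetical helper, not carbon code) of how many cache entries the driver holds under each CACHE_LEVEL follows from the description above:

```python
# Hypothetical sketch: number of driver cache entries for a table,
# per the CACHE_LEVEL behaviour described above.

def driver_cache_entries(cache_level, num_files, blocklets_per_file):
    """Entries cached in the driver for a table under a given CACHE_LEVEL."""
    if cache_level == "BLOCK":
        return num_files                        # one entry per carbondata file
    if cache_level == "BLOCKLET":
        return num_files * blocklets_per_file   # one entry per blocklet
    raise ValueError("CACHE_LEVEL must be BLOCK or BLOCKLET")

assert driver_cache_entries("BLOCK", 100, 32) == 100
assert driver_cache_entries("BLOCKLET", 100, 32) == 3200
```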

[CARBONDATA-2675][32K] Support config long_string_columns when create datamap

Creating a datamap uses a select statement, but a long string column is defined with StringType in the result dataframe if that column is selected. This PR allows setting the long_string_columns property in dmproperties.

This closes apache#2432

[CARBONDATA-2683][32K] Fix data conversion problem for Varchar

Spark uses org.apache.spark.unsafe.types.UTF8String for the string datatype internally.
In carbon, the varchar datatype should do the same conversion as the string datatype, or it may throw an exception.

This closes apache#2438

[CARBONDATA-2657][BloomDataMap] Fix bugs in loading and querying on bloom column with empty values

Fix bugs in loading and querying with empty values on bloom index
columns. Convert null values to corresponding values.

This closes apache#2413

[CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]Added test cases for
 local dictionary support for alter table, set, unset and preaggregate

Added test cases for local dictionary support for alter table, set, unset and pre-aggregate
All the validations related to above features are taken care in this PR

This closes apache#2422

[CARBONDATA-2606][Complex DataType Enhancements] Fixed Projection Pushdown when
Select filter contains Struct column.

Problem:
If the select filter contains a Struct column which is not in the projection list,
then only a null value is stored for the struct column given in the filter, and the select query result is null.
Solution:
Push down the parent column of the corresponding struct type if any struct column is present in the filter list.

This closes apache#2439

[CARBONDATA-2642] Added configurable Lock path property

A new property, "carbon.lock.path", is exposed which allows the user to configure the lock path.
Refactored code to create a separate implementation for S3CarbonFile.

This closes apache#2642

[CARBONDATA-2686] Implement Left join on MV datamap

This closes apache#2444

[CARBONDATA-2660][BloomDataMap] Add test for querying on longstring bloom index column

Filtering on longstring bloom index column is already supported in PR apache#2403, here we only add test for it.

This closes apache#2416

[CARBONDATA-2689] Added validations for complex columns in alter set statements

Issue: Alter set statements were not validating complex dataType columns correctly.
Fix: Added a recursive method to validate string and varchar child columns of complex dataType columns.

This closes apache#2450

[CARBONDATA-2681][32K] Fix loading problem using global/batch sort fails when table has long string columns

In SortStepRowHandler, global/batch sort uses convertRawRowTo3Parts instead of convertIntermediateSortTempRowTo3Parted.

varcharDimCnt was not added to noDictArray, causing the error: "Problem while converting row to 3 parts".

This closes apache#2435

[CARBONDATA-2658][DataLoad] Fix bugs in spilling in-memory pages

The parameter carbon.load.sortMemory.spill.percentage is configured with a value in the range 0-100; according to this configuration, in-memory pages are merged and spilled to disk.

This closes apache#2414

[CARBONDATA-2666] updated rename command so that table directory is not renamed

Rename will not rename the table folder but only change the metadata.

This closes apache#2420

[CARBONDATA-2637][BloomDataMap] Fix bugs for deferred rebuild for bloomfilter datamap

Previously, when we implemented ISSUE-2633, deferred rebuild was
disabled for the bloomfilter datamap due to unhandled bugs.
In this commit, we fix those bugs and bring the feature back.

Since the bloomfilter datamap creates its index on the carbon native raw bytes, we have to convert the original literal value to carbon native bytes both in loading and in querying.

This closes apache#2425

[CARBONDATA-2701] Refactor code to store minimal required info in Block and Blocklet Cache

1. Refactored code to keep only minimal information in the block and blocklet cache.
2. Introduced a segment properties holder at the JVM level to hold the segment properties.
   As it is a heavy object, a new segment properties object will be created only when
   the schema or cardinality of a table changes.

This closes apache#2454

[CARBONDATA-2589][CARBONDATA-2590][CARBONDATA-2602]Local dictionary query Support

Supported Non filter query for local dictionary
Supported Filter query on local dictionary
Supported Query on complex column for primitive type local dictionary columns
Local Dictionary support on Varchar columns
Supported Vector reader on local dictionary

This closes apache#2447

[CARBONDATA-2585][CARBONDATA-2586]Fix local dictionary support for preagg and set
localdict info in column schema

This PR fixes local dictionary support for preaggregate tables and sets the local dict info
of each column in the column schema read and write paths for backward compatibility.

This closes apache#2451

[CARBONDATA-2711] carbonFileList is not initialized when updatetablelist is called

Bug fix: carbonFileList is not initialized within the updatetablelist method when we execute SELECT table_name FROM information_schema.tables WHERE table_schema = 'tmp_sbu_vadmdb' from the command line.

This closes apache#2468

[CARBONDATA-2685][DataMap] Parallelize datamap rebuild processing for segments

Currently in carbondata, while rebuilding a datamap, one spark job is
started for each segment and all the jobs are executed serially. If we
have many historical segments, the rebuild takes a lot of time.

Here we optimize the datamap rebuild procedure and start one task
for each segment, so all the tasks can run in parallel in one spark
job.

This closes apache#2443

[CARBONDATA-2706][BloomDataMap] clear bloom index files after segment is deleted

clear bloom index files after corresponding segment is deleted and
cleaned

This closes apache#2461

[CARBONDATA-2715][LuceneDataMap] Fix bug in search mode with lucene datamap in windows

While comparing two paths, the file separator is different on Windows,
causing empty pruned blocklets. This PR ignores the file
separator.

This closes apache#2470

[CARBONDATA-2703][Tests] Clear up env after tests

1. Reset session parameters after tests.
2. Clean up output after tests.

This closes apache#2458

[CARBONDATA-2607][Complex Column Enhancements] Complex Primitive DataType Adaptive Encoding

This PR improves how complex types are stored so that reading becomes more efficient.

The changes are:

Primitive types inside complex types are separate pages. Previously it was a single byte array column page for a complex column. Now all sub-levels inside the complex data types are stored as separate pages with their respective datatypes.

No Dictionary Primitive DataTypes inside Complex Columns will be processed through Adaptive Encoding. Previously only snappy compression was applied.

For all primitive datatypes inside complex columns, if the column is no-dictionary, only the value is saved, except for String and Varchar, which are saved as byte arrays. Previously all sub-levels were saved in length-and-value format inside a single byte array. Currently only Struct and Array type column pages are saved as byte arrays; all other primitives except String and Varchar are saved with their respective fixed datatype lengths.

Added support in the safe and unsafe fixed-length column pages for a growing dynamic array implementation. This is done to support the Array datatype.

Co-authored-by: sounakr <sounakr@gmail.com>

This closes apache#2417
sv71294 added a commit to sv71294/carbondata that referenced this pull request Jul 11, 2018
[CARBONDATA-2587][CARBONDATA-2588] Local Dictionary Data Loading support

What changes are proposed in this PR

Added code to support Local Dictionary Data Loading for primitive type
Added code to support Local Dictionary Data Loading for complex type.
How this PR is tested
Manual testing is done in 3 Node setup.
UT will be raised in different PR

This closes apache#2402

[CARBONDATA-2647] [CARBONDATA-2648] Add support for COLUMN_META_CACHE and CACHE_LEVEL in create table and alter table properties

Things done as part of this PR

Support for configuring COLUMN_META_CACHE in create and alter table set properties DDL.
Support for configuring CACHE_LEVEL in create and alter table set properties DDL.
Describe formatted display support for COLUMN_META_CACHE and CACHE_LEVEL
  Any interfaces changed?
Create Table Syntax
CREATE TABLE [dbName].tableName (col1 String, col2 String, col3 int,…) STORED BY ‘carbondata’ TBLPROPERTIES (‘COLUMN_META_CACHE’=’col1,col2,…’, 'CACHE_LEVEL'='BLOCKLET')

Alter Table set properties Syntax
ALTER TABLE [dbName].tableName SET TBLPROPERTIES (‘COLUMN_META_CACHE’=’col1,col2,…’, 'CACHE_LEVEL'='BLOCKLET')

This closs apache#2418

[CARBONDATA-2549] Bloom remove guava cache and use CarbonCache

Currently, bloom cache is implemented using guava cache, carbon has its own lru cache interfaces and complete sysytem it controls the cache intstead of controlling feature wise. So replace guava cache with carbon lru cache.

This closes apache#2327

[CARBONDATA-2608]Document update about Json Writer with examples.

Document update about Json Writer with examples.

This closes apache#2409

[CARBONDATA-2634][BloomDataMap] Add datamap properties in show datamap outputs

add datamap properties in show datamap outputs

This closes apache#2404

[CARBONDATA-2647] [CARBONDATA-2648] Fix cache level display in describe formatted command

1. Correct CACHE_LEVEL display in describe formatted command. It was always displays BLOCK
   even though val was configured BLOCKLET.
2. Correct the method arguments to pass dbName first and then tableName.
3. Added test case for blocking column_meta_cache and cache_level on child dataMaps.

This closes apache#2426

[CARBONDATA-2669] Local Dictionary Store Size optimisation and other function issues

Problems
Local dictionary store size issue.
When all column data is empty and columns are not present in sort columns local dictionary size was more than no dictionary dictionary store size.
Page level dictionary merging Issue
While merging the page used dictionary values in a blocklet it was missing some of the dictionary values, this is because, AND operation was done on bitset
Local Dictionary null values
Null value was not added in LV because of this new dictionary values was getting generated for null values
Local dictionary generator thread specific
Solution:
Added rle for unsorted dictionary values to reduce the size.
Now OR operation is performed while merging the dictionary values
Added LV for null values
Local dictionary generator task specific

This closes apache#2427

[CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]Local dictionary support for alter table, preaggregate, varchar datatype, alter set and unset

What changes were proposed in this pull request?
In this PR,

local dictionary support is added for alter table, preaggregate, varChar datatype, alter table set and unset command
UTs are added for local dictionary load support
All the validations related to above features are taken care in this PR
How was this patch tested?

All the tests were executed in 3 node cluster.
UTs and SDV test cases are added in the same PR

This closes apache#2401

[HOTFIX] Fixed compilation issues and bloom clear issue

Fixed test

This closes apache#2428

[CARBONDATA-2635][BloomDataMap] Support different index datamaps on same column

User can create different provider based index datamaps on one column,
for example user can create bloomfilter datamap and lucene datamap on
one column, but not able to create two bloomfilter datamap on one
column.

This closes apache#2405

[CARBONDATA-2646][DataLoad]change the log level while loading data into a table with 'sort_column_bounds' property,'ERROR' flag change to 'WARN' flag for some expected tasks.

change the log level while loading data into a table with 'sort_column_bounds' property,'ERROR' flag change to 'WARN' flag for some expected tasks.

This closes apache#2407

[CARBONDATA-2545] Fix some spell error in CarbonData

This closes apache#2419

[CARBONDATA-2629] Support SDK carbon reader read data from HDFS and S3 with filter function

Now SDK carbon reader only support read data from local with filter function, it will throw exception when read data from HDFS and S3 with filter function
This PR support it:
Support SDK carbon reader read data from HDFS and S3 with filter function

This closes apache#2399

[CARBONDATA-2644][DataLoad]ADD carbon.load.sortMemory.spill.percentage parameter invalid value check

This closes apache#2397

[CARBONDATA-2653][BloomDataMap] Fix bugs in incorrect blocklet number in bloomfilter

In non-deferred reuibuild scenario, the last bloomfilter index file has already been written onBlockletEnd, no need to write again, otherwise an extra blocklet number will be
generated in the bloom index file.

This closes apache#2408

[CARBONDATA-2674][Streaming]Streaming with merge index enabled does not consider the merge index file while pruning

This closes apache#2429

[CARBONDATA-2606][Complex DataType Enhancements]Fix for ComplexDataType Projection PushDown

Problem1: Fix for ComplexDataType Projection PushDown when Table Schema contains
ColumnName in UpperCase

Solution: Change ColumnName to Lowercase

Problem2: If Struct contains Array, pushdown only parent column

Solution: Check for ArrayType or GetArrayItem in the Complex Column,
if any ArrayType is found, then pushdown parent column

This closes apache#2421

[CARBONDATA-2633][BloomDataMap] Fix bugs in bloomfilter for dictionary/sort/date/TimeStamp column

for dictionary column, carbon convert literal value to dict value, then
convert dict value to mdk value, at last it stores the mdk value as
internal value in carbonfile.

for other columns, carbon convert literal value to internal value using
field-converter.

Since bloomfilter datamap stores the internal value, during query we
should convert the literal value in filter to internal value in order to
match the value stored in bloomfilter datamap.

Changes are made:

1.FieldConverters were refactored to extract common value convert methods.
2.BloomQueryModel was optimized to support converting literal value to
internal value.
3.fix bugs for int/float/date/timestamp as bloom index column
4.fix bugs in dictionary/sort column as bloom index column
5.add tests
6.block (deferred) rebuild for bloom datamap (contains bugs that does
not fix in this commit)

This closes apache#2403

[HOTFIX][32K]maintain proper mapping for varChar Columns and noDictionary Columns
for all the dimensions while creating sort data rows instance

Problem: when creating the column mapping for varChar columns and no dictionary
 columns for existing dimensions, the mapping is incorrect.

Solution: remove unwanted variable counter and map correct index to varChar columns
 and noDictionary columns based on the number of dimensions

This closes apache#2395

[CARBONDATA-2650][Datamap] Fix bugs in negative number of skipped blocklets

Currently in carbondata, default blocklet datamap will be used to prune
blocklets. Then other indexdatamap will be used.
But the other index datamap works for segment scope, which in some
scenarios, the size of pruned result will be bigger than that of default
datamap, thus causing negative number of skipped blocklets in explain
query output.

Here we add intersection after pruning. If the pruned result size is
zero, we will finish the pruning.

This closes apache#2410

[CARBONDATA-2654][Datamap] Optimize output for explaining querying with datamap

Currently if we have multiple datamaps and the query hits all the
datamaps, carbondata explain command will only show the first datamap
and all the datamaps are not shown. In this commit, we show all the
datamaps that are hitted in this query.

This closes apache#2411

[CARBONDATA-2687][BloomDataMap][Doc] Update document for bloomfilter datamap

In previous PR, cache behaviour for bloomfilter datamap has been changed: changed from guava-cache to carbon-cache. This PR update the document for bloomfilter datamap and remove the description for cache.

This closes apache#2446

Code Generator Error is thrown when Select filter contains more than one count of distinct of ComplexColumn with group by Clause

[CARBONDATA-2684] [PR-2442] Distinct count fails on complex columns

This PR fixes Code Generator Error thrown when Select filter contains more than one count of distinct of ComplexColumn with group by Clause

This closes apache#2449

[CARBONDATA-2645] Segregate block and blocklet cache

Things done as part of this PR

Segregate block and blocklet cache. In this driver will cache the metadata based on CACHE_LEVEL.
If CACHE_LEVEL is set to BLOCK then only carbondata files metadata will be cached in driver.
If CACHE_LEVEL is set to BLOCKLET thenmetadata for number of carbondata files * number of blocklets in each carbondata file will be cached in driver.

This closes apache#2437

[CARBONDATA-2675][32K] Support config long_string_columns when create datamap

Create datamap use select statement, but long string column is defined with StringType in the result dataframe if this column is selected. This PR allows to set long_string_columns property in dmproperties.

This closes apache#2432

[CARBONDATA-2683][32K] fix data convertion problem for Varchar

Spark uses org.apache.spark.unsafe.types.UTF8String for string datatype internally.
In carbon, varchar datatype should do the same convertion as string datatype. Or it may throw exception

This closes apache#2438

[CARBONDATA-2657][BloomDataMap] Fix bugs in loading and querying on bloom column with empty values

Fix bugs in loading and querying on bloom column …
Fix bugs in loading and querying with empty values on bloom index
columns. Convert null values to corresponding values.

This closes apache#2413

[CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]Added test cases for
 local dictionary support for alter table, set, unset and preaggregate

Added test cases for local dictionary support for alter table, set, unset and pre-aggregate
All the validations related to above features are taken care in this PR

This closes apache#2422

[CARBONDATA-2606][Complex DataType Enhancements] Fixed Projection Pushdown when
Select filter contains Struct column.

Problem:
If Select filter contains Struct Column which is not in Projection list,
then only null value is stored for struct column given in filter and select query result is null.
Solution:
Pushdown Parent column of corresponding struct type if any struct column is present in Filter list.

This closes apache#2439

[CARBONDATA-2642] Added configurable Lock path property

A new property is being exposed which will allow the user to configure the lock path "carbon.lock.path"
Refactored code to create a separate implementation for S3CarbonFile.

This closes apache#2642

[CARBONDATA-2686] Implement Left join on MV datamap

This closes apache#2444

[CARBONDATA-2660][BloomDataMap] Add test for querying on longstring bloom index column

Filtering on longstring bloom index column is already supported in PR apache#2403, here we only add test for it.

This closes apache#2416

[CARBONDATA-2689] Added validations for complex columns in alter set statements

Issue: Alter set statements were not validating complex dataType columns correctly.
Fix: Added a recursive method to validate string and varchar child columns of complex dataType columns.

This closes apache#2450

[CARBONDATA-2681][32K] Fix loading problem using global/batch sort fails when table has long string columns

In SortStepRowHandler, global/batch sort use convertRawRowTo3Parts instead of convertIntermediateSortTempRowTo3Parted.

varcharDimCnt was not add up to noDictArray cause error: Problem while converting row to 3 parts.

This closes apache#2435

[CARBONDATA-2658][DataLoad] Fix bugs in spilling in-memory pages

The parameter carbon.load.sortMemory.spill.percentage accepts values in the range 0-100; in-memory pages are merged and spilled to disk according to this configuration.
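
For example, in carbon.properties (the value is illustrative):

```properties
# Spill in-memory pages to disk once 75% of the configured sort memory is used
carbon.load.sortMemory.spill.percentage=75
```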

This closes apache#2414

[CARBONDATA-2666] updated rename command so that table directory is not renamed

Rename will not rename the table folder but will only change the metadata.

This closes apache#2420

[CARBONDATA-2637][BloomDataMap] Fix bugs for deferred rebuild for bloomfilter datamap

Previously, when we implemented ISSUE-2633, deferred rebuild was
disabled for the bloomfilter datamap due to unhandled bugs.
In this commit, we fix the bugs and bring this feature back.

Since the bloomfilter datamap creates its index on carbon native raw bytes, we have to convert the original literal value to carbon native bytes both during loading and querying.
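
A sketch of the deferred-rebuild flow this fix re-enables (the exact rebuild syntax, datamap, table, and column names here are assumptions for illustration):

```sql
CREATE DATAMAP dm_bloom ON TABLE t1
USING 'bloomfilter'
DMPROPERTIES ('INDEX_COLUMNS'='city')
WITH DEFERRED REBUILD

REBUILD DATAMAP dm_bloom ON TABLE t1
```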

This closes apache#2425

[CARBONDATA-2701] Refactor code to store minimal required info in Block and Blocklet Cache

1. Refactored code to keep only minimal information in block and blocklet cache.
2. Introduced a segment properties holder at JVM level to hold the segment properties.
   As it is a heavy object, a new segment properties object will be created only when
   the schema or cardinality of a table changes.

This closes apache#2454

[CARBONDATA-2589][CARBONDATA-2590][CARBONDATA-2602]Local dictionary query Support

Supported Non filter query for local dictionary
Supported Filter query on local dictionary
Supported Query on complex column for primitive type local dictionary columns
Local Dictionary support on Varchar columns
Supported Vector reader on local dictionary

This closes apache#2447

[CARBONDATA-2585][CARBONDATA-2586]Fix local dictionary support for preagg and set
localdict info in column schema

This PR fixes local dictionary support for preaggregate tables and sets the local dictionary info
of each column during column schema read and write for backward compatibility.

This closes apache#2451

[CARBONDATA-2711] carbonFileList is not initialized when updatetablelist is called

Bug fix: carbon is not initialized within the updatetablelist method when we execute SELECT table_name FROM information_schema.tables WHERE table_schema = 'tmp_sbu_vadmdb' from the command line.

This closes apache#2468

[CARBONDATA-2685][DataMap] Parallelize datamap rebuild processing for segments

Currently in carbondata, while rebuilding a datamap, one spark job is
started for each segment and all the jobs are executed serially. If we
have many historical segments, the rebuild takes a lot of time.

Here we optimize the procedure for datamap rebuild and start one task
for each segment, so all the tasks can be done in parallel in one spark
job.

This closes apache#2443

[CARBONDATA-2706][BloomDataMap] clear bloom index files after segment is deleted

clear bloom index files after corresponding segment is deleted and
cleaned

This closes apache#2461

[CARBONDATA-2715][LuceneDataMap] Fix bug in search mode with lucene datamap in windows

While comparing two paths, the file separator is different on windows,
thus causing empty pruned blocklets. This PR will ignore the file
separator.

This closes apache#2470

[CARBONDATA-2703][Tests] Clear up env after tests

1. Reset session parameters after tests
2. Clean up output after tests

This closes apache#2458

[CARBONDATA-2607][Complex Column Enhancements] Complex Primitive DataType Adaptive Encoding

In this PR the improvement was done to save the complex type more effectively so that reading becomes more efficient.

The changes are:

Primitive types inside complex types are separate pages. Previously it was a single byte array column page for a complex column. Now all sub-levels inside the complex data types are stored as separate pages with their respective datatypes.

No Dictionary Primitive DataTypes inside Complex Columns will be processed through Adaptive Encoding. Previously only snappy compression was applied.

For all primitive datatypes inside complex columns, if the column is no-dictionary, only the value is saved, except for String and Varchar, which are saved as byte arrays. Previously all sub-levels were saved in Length-Value format inside a single byte array. Currently only Struct and Array type column pages are saved as byte arrays; all other primitives except String and Varchar are saved with their respective fixed datatype length.

Support for the Safe and Unsafe Fixed length Column Page to support growing dynamic array implementation. This is done to support Array datatype.

Co-authored-by: sounakr <sounakr@gmail.com>

This closes apache#2417

sv71294 added a commit to sv71294/carbondata that referenced this pull request Jul 18, 2018

[CARBONDATA-2587][CARBONDATA-2588] Local Dictionary Data Loading support

What changes are proposed in this PR

Added code to support Local Dictionary Data Loading for primitive type
Added code to support Local Dictionary Data Loading for complex type.
How this PR is tested:
Manual testing is done in a 3 node setup.
UTs will be raised in a different PR.
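
A minimal sketch of a table with local dictionary enabled for data loading (table and column names are illustrative):

```sql
CREATE TABLE t1 (
  id INT,
  name STRING,
  addr STRUCT<city:STRING, pin:INT>)
STORED BY 'carbondata'
TBLPROPERTIES ('LOCAL_DICTIONARY_ENABLE'='true')
```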

This closes apache#2402

[CARBONDATA-2647] [CARBONDATA-2648] Add support for COLUMN_META_CACHE and CACHE_LEVEL in create table and alter table properties

Things done as part of this PR

Support for configuring COLUMN_META_CACHE in create and alter table set properties DDL.
Support for configuring CACHE_LEVEL in create and alter table set properties DDL.
Describe formatted display support for COLUMN_META_CACHE and CACHE_LEVEL
  Any interfaces changed?
Create Table Syntax
CREATE TABLE [dbName].tableName (col1 String, col2 String, col3 int,…) STORED BY 'carbondata' TBLPROPERTIES ('COLUMN_META_CACHE'='col1,col2,…', 'CACHE_LEVEL'='BLOCKLET')

Alter Table set properties Syntax
ALTER TABLE [dbName].tableName SET TBLPROPERTIES ('COLUMN_META_CACHE'='col1,col2,…', 'CACHE_LEVEL'='BLOCKLET')

This closes apache#2418

[CARBONDATA-2549] Bloom remove guava cache and use CarbonCache

Currently, the bloom cache is implemented using guava cache, while carbon has its own LRU cache interfaces so that the complete system controls the cache instead of each feature controlling it. So replace the guava cache with the carbon LRU cache.

This closes apache#2327

[CARBONDATA-2608]Document update about Json Writer with examples.

Document update about Json Writer with examples.

This closes apache#2409

[CARBONDATA-2634][BloomDataMap] Add datamap properties in show datamap outputs

add datamap properties in show datamap outputs
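
For example (the table name is illustrative), the output of the following now includes each datamap's DMPROPERTIES:

```sql
SHOW DATAMAP ON TABLE main_table
```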

This closes apache#2404

[CARBONDATA-2647] [CARBONDATA-2648] Fix cache level display in describe formatted command

1. Correct CACHE_LEVEL display in describe formatted command. It always displayed BLOCK
   even though the value was configured as BLOCKLET.
2. Correct the method arguments to pass dbName first and then tableName.
3. Added test case for blocking column_meta_cache and cache_level on child dataMaps.

This closes apache#2426

[CARBONDATA-2669] Local Dictionary Store Size optimisation and other function issues

Problems:
1. Local dictionary store size issue:
   when all column data is empty and the columns are not present in sort columns, the local dictionary store size was larger than the no-dictionary store size.
2. Page level dictionary merging issue:
   while merging the dictionary values used by the pages in a blocklet, some of the dictionary values were missed, because an AND operation was done on the bitset.
3. Local dictionary null values:
   the null value was not added in LV format, because of which new dictionary values were generated for null values.
4. The local dictionary generator was thread specific.

Solutions:
1. Added RLE for unsorted dictionary values to reduce the size.
2. An OR operation is now performed while merging the dictionary values.
3. Added LV for null values.
4. Made the local dictionary generator task specific.

This closes apache#2427

[CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]Local dictionary support for alter table, preaggregate, varchar datatype, alter set and unset

What changes were proposed in this pull request?
In this PR,

local dictionary support is added for alter table, preaggregate, varChar datatype, alter table set and unset command
UTs are added for local dictionary load support
All the validations related to above features are taken care in this PR
How was this patch tested?

All the tests were executed in 3 node cluster.
UTs and SDV test cases are added in the same PR

This closes apache#2401

[HOTFIX] Fixed compilation issues and bloom clear issue

Fixed test

This closes apache#2428

[CARBONDATA-2635][BloomDataMap] Support different index datamaps on same column

Users can create index datamaps based on different providers on one
column; for example, a user can create a bloomfilter datamap and a
lucene datamap on one column, but cannot create two bloomfilter
datamaps on one column.
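
A sketch with illustrative names (datamap, table, and property values are assumptions): both statements below are allowed, while a second bloomfilter datamap on the same column would be rejected.

```sql
CREATE DATAMAP dm_bloom ON TABLE t1
USING 'bloomfilter' DMPROPERTIES ('INDEX_COLUMNS'='name')

CREATE DATAMAP dm_lucene ON TABLE t1
USING 'lucene' DMPROPERTIES ('INDEX_COLUMNS'='name')
```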

This closes apache#2405

[CARBONDATA-2646][DataLoad] Change the log level from ERROR to WARN for some expected tasks while loading data into a table with the 'sort_column_bounds' property

Change the log level from ERROR to WARN for some expected tasks while loading data into a table with the 'sort_column_bounds' property.

This closes apache#2407

[CARBONDATA-2545] Fix some spelling errors in CarbonData

This closes apache#2419

[CARBONDATA-2629] Support SDK carbon reader read data from HDFS and S3 with filter function

Currently the SDK carbon reader only supports reading data from local storage with a filter function; it throws an exception when reading data from HDFS or S3 with a filter function.
This PR adds the support:
the SDK carbon reader can read data from HDFS and S3 with a filter function.

This closes apache#2399

[CARBONDATA-2644][DataLoad] Add an invalid-value check for the carbon.load.sortMemory.spill.percentage parameter

This closes apache#2397

[CARBONDATA-2653][BloomDataMap] Fix incorrect blocklet number in bloomfilter

In the non-deferred rebuild scenario, the last bloomfilter index file has already been
written in onBlockletEnd, so there is no need to write it again; otherwise an extra
blocklet number will be generated in the bloom index file.

This closes apache#2408

[CARBONDATA-2674][Streaming]Streaming with merge index enabled does not consider the merge index file while pruning

This closes apache#2429

[CARBONDATA-2606][Complex DataType Enhancements]Fix for ComplexDataType Projection PushDown

Problem1: Fix for ComplexDataType Projection PushDown when Table Schema contains
ColumnName in UpperCase

Solution: Change ColumnName to Lowercase

Problem2: If Struct contains Array, pushdown only parent column

Solution: Check for ArrayType or GetArrayItem in the Complex Column,
if any ArrayType is found, then pushdown parent column

This closes apache#2421

[CARBONDATA-2633][BloomDataMap] Fix bugs in bloomfilter for dictionary/sort/date/TimeStamp column

For a dictionary column, carbon converts the literal value to a dict value, then
converts the dict value to an MDK value, and at last stores the MDK value as the
internal value in the carbon file.

For other columns, carbon converts the literal value to the internal value using a
field converter.

Since the bloomfilter datamap stores the internal value, during query we
should convert the literal value in the filter to the internal value in order to
match the value stored in the bloomfilter datamap.

Changes are made:

1. FieldConverters were refactored to extract common value conversion methods.
2. BloomQueryModel was optimized to support converting literal values to
   internal values.
3. Fix bugs for int/float/date/timestamp as bloom index column.
4. Fix bugs for dictionary/sort column as bloom index column.
5. Add tests.
6. Block (deferred) rebuild for bloom datamap (contains bugs that are not
   fixed in this commit).

This closes apache#2403

[HOTFIX][32K]maintain proper mapping for varChar Columns and noDictionary Columns
for all the dimensions while creating sort data rows instance

Problem: when creating the column mapping for varChar columns and no dictionary
 columns for existing dimensions, the mapping is incorrect.

Solution: remove unwanted variable counter and map correct index to varChar columns
 and noDictionary columns based on the number of dimensions

This closes apache#2395

[CARBONDATA-2650][Datamap] Fix bugs in negative number of skipped blocklets

Currently in carbondata, the default blocklet datamap is used to prune
blocklets first, and then the other index datamaps are used.
But the other index datamaps work at segment scope, so in some
scenarios the size of their pruned result is bigger than that of the default
datamap, causing a negative number of skipped blocklets in the explain
query output.

Here we intersect the results after pruning. If the pruned result size is
zero, we finish the pruning.

This closes apache#2410

[CARBONDATA-2654][Datamap] Optimize output for explaining querying with datamap

Currently if we have multiple datamaps and the query hits all of them,
the carbondata explain command shows only the first datamap and the
rest are not shown. In this commit, we show all the datamaps that are
hit by the query.

This closes apache#2411

[CARBONDATA-2687][BloomDataMap][Doc] Update document for bloomfilter datamap

In a previous PR, the cache behaviour for the bloomfilter datamap was changed from guava cache to carbon cache. This PR updates the document for the bloomfilter datamap and removes the description of the old cache.

This closes apache#2446

[CARBONDATA-2684] [PR-2442] Distinct count fails on complex columns

This PR fixes the Code Generator Error thrown when a select filter contains more than one count distinct of a complex column with a group by clause.

This closes apache#2449

[CARBONDATA-2645] Segregate block and blocklet cache

Things done as part of this PR

Segregate block and blocklet cache. With this, the driver will cache the metadata based on CACHE_LEVEL.
If CACHE_LEVEL is set to BLOCK, then only the carbondata file metadata will be cached in the driver.
If CACHE_LEVEL is set to BLOCKLET, then metadata for (number of carbondata files * number of blocklets in each carbondata file) will be cached in the driver.

This closes apache#2437

[CARBONDATA-2675][32K] Support config long_string_columns when create datamap

A datamap is created using a select statement, but a long string column is defined with StringType in the result dataframe if that column is selected. This PR allows setting the long_string_columns property in dmproperties.
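
A sketch of the new capability (datamap, table, and column names are illustrative assumptions; the dmproperty name follows this PR):

```sql
CREATE DATAMAP dm_agg ON TABLE t1
USING 'preaggregate'
DMPROPERTIES ('LONG_STRING_COLUMNS'='description')
AS SELECT description, COUNT(*) FROM t1 GROUP BY description
```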

This closes apache#2432

[CARBONDATA-2683][32K] Fix data conversion problem for Varchar

Spark uses org.apache.spark.unsafe.types.UTF8String for the string datatype internally.
In carbon, the varchar datatype should do the same conversion as the string datatype, otherwise it may throw an exception.

This closes apache#2438

[CARBONDATA-2657][BloomDataMap] Fix bugs in loading and querying on bloom column with empty values

Fix bugs in loading and querying on bloom column …
Fix bugs in loading and querying with empty values on bloom index
columns. Convert null values to corresponding values.

This closes apache#2413

[CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]Added test cases for
 local dictionary support for alter table, set, unset and preaggregate

Added test cases for local dictionary support for alter table, set, unset and pre-aggregate
All the validations related to above features are taken care in this PR

This closes apache#2422

[CARBONDATA-2606][Complex DataType Enhancements] Fixed Projection Pushdown when
Select filter contains Struct column.

Problem:
If Select filter contains Struct Column which is not in Projection list,
then only null value is stored for struct column given in filter and select query result is null.
Solution:
Pushdown Parent column of corresponding struct type if any struct column is present in Filter list.

This closes apache#2439

[CARBONDATA-2642] Added configurable Lock path property

A new property is being exposed which will allow the user to configure the lock path "carbon.lock.path"
Refactored code to create a separate implementation for S3CarbonFile.

This closes apache#2642

[CARBONDATA-2686] Implement Left join on MV datamap

This closes apache#2444

[CARBONDATA-2660][BloomDataMap] Add test for querying on longstring bloom index column

Filtering on longstring bloom index column is already supported in PR apache#2403, here we only add test for it.

This closes apache#2416

[CARBONDATA-2689] Added validations for complex columns in alter set statements

Issue: Alter set statements were not validating complex dataType columns correctly.
Fix: Added a recursive method to validate string and varchar child columns of complex dataType columns.

This closes apache#2450

[CARBONDATA-2681][32K] Fix loading problem using global/batch sort fails when table has long string columns

In SortStepRowHandler, global/batch sort use convertRawRowTo3Parts instead of convertIntermediateSortTempRowTo3Parted.

varcharDimCnt was not add up to noDictArray cause error: Problem while converting row to 3 parts.

This closes apache#2435

[CARBONDATA-2658][DataLoad] Fix bugs in spilling in-memory pages

the parameter carbon.load.sortMemory.spill.percentage configured the value range 0-100,according to configuration merge and spill in-memory pages to disk

This closes apache#2414

[CARBONDATA-2666] updated rename command so that table directory is not renamed

rename will not rename table folder but only changes metadata

This closes apache#2420

[CARBONDATA-2637][BloomDataMap] Fix bugs for deferred rebuild for bloomfilter datamap

Previously when we implement ISSUE-2633, deferred rebuild for bloom
datamap is disabled for bloomfilter datamap due to unhandled bugs.
In this commit, we fixed the bugs and bring this feature back.

Since bloomfilter datamap create index on the carbon native raw bytes, we have to convert original literal value to carbon native bytes both in loading and querying.

This closes apache#2425

[CARBONDATA-2701] Refactor code to store minimal required info in Block and Blocklet Cache

1. Refactored code to keep only minimal information in block and blocklet cache.
2. Introduced segment properties holder at JVM level to hold the segment properties.
   As it is heavy object, new segment properties object will be created only when
   schema or cardinality is changed for a table.

This closes apache#2454

[CARBONDATA-2589][CARBONDATA-2590][CARBONDATA-2602]Local dictionary query Support

Supported Non filter query for local dictionary
Supported Filter query on local dictionary
Supported Query on complex column for primitive type local dictionary columns
Local Dictionary support on Varchar columns
Supported Vector reader on local dictionary

[CARBONDATA-2589][CARBONDATA-2590][CARBONDATA-2602]Local dictionary query Support

Supported Non filter query for local dictionary
Supported Filter query on local dictionary
Supported Query on complex column for primitive type local dictionary columns
Local Dictionary support on Varchar columns
Supported Vector reader on local dictionary

This closes apache#2447

[CARBONDATA-2585][CARBONDATA-2586]Fix local dictionary support for preagg and set
localdict info in column schema

This PR fixes local dictionary support for preaggregate and set the column dict info
of each column in column schema read and write for backward compatibility.

This closes apache#2451

[CARBONDATA-2711] carbonFileList is not initalized when updatetablelist call

bug fix: carbon is not initalized within updatetablelist method when we execute 'SELECT table_name FROM information_schema.tables WHERE table_schema = 'tmp_sbu_vadmdb' from command line

This closes apache#2468

[CARBONDATA-2685][DataMap] Parallize datamap rebuild processing for segments

Currently in carbondata, while rebuilding datamap, one spark job will be
started for each segment and all the jobs are executed serailly. If we
have many historical segments, the rebuild will takes a lot of time.

Here we optimize the procedure for datamap rebuild and start one start
for each segments, all the tasks can be done in parallel in one spark
job.

This closes apache#2443

[CARBONDATA-2706][BloomDataMap] clear bloom index files after segment is deleted

clear bloom index files after corresponding segment is deleted and
cleaned

This closes apache#2461

[CARBONDATA-2715][LuceneDataMap] Fix bug in search mode with lucene datamap in windows

While comparing two pathes, the file separator is different in windows,
thus causing empty pruned blocklets. This PR will ignore the file
separator

This closes apache#2470

[CARBONDATA-2703][Tests] Clear up env after tests

1.reset session parameters after test
2.clean up output after test

This closes apache#2458

[CARBONDATA-2607][Complex Column Enhancements] Complex Primitive DataType Adaptive Encoding

In this PR the improvement was done to save the complex type more effectively so that reading becomes more efficient.

The changes are:

Primitive types inside complex types are separate pages. Previously it was a single byte array column page for a complex column. Now all sub-levels inside the complex data types are stored as separate pages with their respective datatypes.

No Dictionary Primitive DataTypes inside Complex Columns will be processed through Adaptive Encoding. Previously only snappy compression was applied.

All Primitive datatypes inside complex if it is now dictionary, only value will be saved except String, Varchar which is saved as ByteArray. Previously all sub-levels are saved as Length And Value Format inside a single Byte Array. Currently only Struct And Array type column pages are saved in ByteArray. All other primitive except String and varchar are saved in respective fixed datatype length.

Support for the Safe and Unsafe Fixed length Column Page to support growing dynamic array implementation. This is done to support Array datatype.

Co-authored-by: sounakr <sounakr@gmail.com>

This closes apache#2417
sv71294 added a commit to sv71294/carbondata that referenced this pull request Jul 18, 2018
[CARBONDATA-2587][CARBONDATA-2588] Local Dictionary Data Loading support

What changes are proposed in this PR

Added code to support Local Dictionary Data Loading for primitive type
Added code to support Local Dictionary Data Loading for complex type.
How this PR is tested
Manual testing is done in 3 Node setup.
UT will be raised in different PR

This closes apache#2402

[CARBONDATA-2647] [CARBONDATA-2648] Add support for COLUMN_META_CACHE and CACHE_LEVEL in create table and alter table properties

Things done as part of this PR

Support for configuring COLUMN_META_CACHE in create and alter table set properties DDL.
Support for configuring CACHE_LEVEL in create and alter table set properties DDL.
Describe formatted display support for COLUMN_META_CACHE and CACHE_LEVEL
  Any interfaces changed?
Create Table Syntax
CREATE TABLE [dbName].tableName (col1 String, col2 String, col3 int,…) STORED BY ‘carbondata’ TBLPROPERTIES (‘COLUMN_META_CACHE’=’col1,col2,…’, 'CACHE_LEVEL'='BLOCKLET')

Alter Table set properties Syntax
ALTER TABLE [dbName].tableName SET TBLPROPERTIES (‘COLUMN_META_CACHE’=’col1,col2,…’, 'CACHE_LEVEL'='BLOCKLET')

This closs apache#2418

[CARBONDATA-2549] Bloom remove guava cache and use CarbonCache

Currently, bloom cache is implemented using guava cache, carbon has its own lru cache interfaces and complete sysytem it controls the cache intstead of controlling feature wise. So replace guava cache with carbon lru cache.

This closes apache#2327

[CARBONDATA-2608]Document update about Json Writer with examples.

Document update about Json Writer with examples.

This closes apache#2409

[CARBONDATA-2634][BloomDataMap] Add datamap properties in show datamap outputs

add datamap properties in show datamap outputs

This closes apache#2404

[CARBONDATA-2647] [CARBONDATA-2648] Fix cache level display in describe formatted command

1. Correct CACHE_LEVEL display in describe formatted command. It was always displays BLOCK
   even though val was configured BLOCKLET.
2. Correct the method arguments to pass dbName first and then tableName.
3. Added test case for blocking column_meta_cache and cache_level on child dataMaps.

This closes apache#2426

[CARBONDATA-2669] Local Dictionary Store Size optimisation and other function issues

Problems
Local dictionary store size issue.
When all column data is empty and columns are not present in sort columns local dictionary size was more than no dictionary dictionary store size.
Page level dictionary merging Issue
While merging the page used dictionary values in a blocklet it was missing some of the dictionary values, this is because, AND operation was done on bitset
Local Dictionary null values
Null value was not added in LV because of this new dictionary values was getting generated for null values
Local dictionary generator thread specific
Solution:
Added rle for unsorted dictionary values to reduce the size.
Now OR operation is performed while merging the dictionary values
Added LV for null values
Local dictionary generator task specific

This closes apache#2427

[CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]Local dictionary support for alter table, preaggregate, varchar datatype, alter set and unset

What changes were proposed in this pull request?
In this PR,

local dictionary support is added for alter table, preaggregate, varChar datatype, alter table set and unset command
UTs are added for local dictionary load support
All the validations related to above features are taken care in this PR
How was this patch tested?

All the tests were executed in 3 node cluster.
UTs and SDV test cases are added in the same PR

This closes apache#2401

[HOTFIX] Fixed compilation issues and bloom clear issue

Fixed test

This closes apache#2428

[CARBONDATA-2635][BloomDataMap] Support different index datamaps on same column

User can create different provider based index datamaps on one column,
for example user can create bloomfilter datamap and lucene datamap on
one column, but not able to create two bloomfilter datamap on one
column.

This closes apache#2405

[CARBONDATA-2646][DataLoad]change the log level while loading data into a table with 'sort_column_bounds' property,'ERROR' flag change to 'WARN' flag for some expected tasks.

change the log level while loading data into a table with 'sort_column_bounds' property,'ERROR' flag change to 'WARN' flag for some expected tasks.

This closes apache#2407

[CARBONDATA-2545] Fix some spell error in CarbonData

This closes apache#2419

[CARBONDATA-2629] Support SDK carbon reader read data from HDFS and S3 with filter function

Now SDK carbon reader only support read data from local with filter function, it will throw exception when read data from HDFS and S3 with filter function
This PR support it:
Support SDK carbon reader read data from HDFS and S3 with filter function

This closes apache#2399

[CARBONDATA-2644][DataLoad]ADD carbon.load.sortMemory.spill.percentage parameter invalid value check

This closes apache#2397

[CARBONDATA-2653][BloomDataMap] Fix bugs in incorrect blocklet number in bloomfilter

In non-deferred reuibuild scenario, the last bloomfilter index file has already been written onBlockletEnd, no need to write again, otherwise an extra blocklet number will be
generated in the bloom index file.

This closes apache#2408

[CARBONDATA-2674][Streaming]Streaming with merge index enabled does not consider the merge index file while pruning

This closes apache#2429

[CARBONDATA-2606][Complex DataType Enhancements]Fix for ComplexDataType Projection PushDown

Problem1: Fix for ComplexDataType Projection PushDown when Table Schema contains
ColumnName in UpperCase

Solution: Change ColumnName to Lowercase

Problem2: If Struct contains Array, pushdown only parent column

Solution: Check for ArrayType or GetArrayItem in the Complex Column,
if any ArrayType is found, then pushdown parent column

This closes apache#2421

[CARBONDATA-2633][BloomDataMap] Fix bugs in bloomfilter for dictionary/sort/date/TimeStamp column

for dictionary column, carbon convert literal value to dict value, then
convert dict value to mdk value, at last it stores the mdk value as
internal value in carbonfile.

for other columns, carbon convert literal value to internal value using
field-converter.

Since bloomfilter datamap stores the internal value, during query we
should convert the literal value in filter to internal value in order to
match the value stored in bloomfilter datamap.

Changes are made:

1.FieldConverters were refactored to extract common value convert methods.
2.BloomQueryModel was optimized to support converting literal value to
internal value.
3.fix bugs for int/float/date/timestamp as bloom index column
4.fix bugs in dictionary/sort column as bloom index column
5.add tests
6.block (deferred) rebuild for bloom datamap (contains bugs that does
not fix in this commit)

This closes apache#2403

[HOTFIX][32K]maintain proper mapping for varChar Columns and noDictionary Columns
for all the dimensions while creating sort data rows instance

Problem: when creating the column mapping for varChar columns and no dictionary
 columns for existing dimensions, the mapping is incorrect.

Solution: remove unwanted variable counter and map correct index to varChar columns
 and noDictionary columns based on the number of dimensions

This closes apache#2395

[CARBONDATA-2650][Datamap] Fix bugs in negative number of skipped blocklets

Currently in carbondata, default blocklet datamap will be used to prune
blocklets. Then other indexdatamap will be used.
But the other index datamap works for segment scope, which in some
scenarios, the size of pruned result will be bigger than that of default
datamap, thus causing negative number of skipped blocklets in explain
query output.

Here we add intersection after pruning. If the pruned result size is
zero, we will finish the pruning.

This closes apache#2410

[CARBONDATA-2654][Datamap] Optimize output for explaining querying with datamap

Currently if we have multiple datamaps and the query hits all the
datamaps, carbondata explain command will only show the first datamap
and all the datamaps are not shown. In this commit, we show all the
datamaps that are hitted in this query.

This closes apache#2411

[CARBONDATA-2687][BloomDataMap][Doc] Update document for bloomfilter datamap

In previous PR, cache behaviour for bloomfilter datamap has been changed: changed from guava-cache to carbon-cache. This PR update the document for bloomfilter datamap and remove the description for cache.

This closes apache#2446

Code Generator Error is thrown when Select filter contains more than one count of distinct of ComplexColumn with group by Clause

[CARBONDATA-2684] [PR-2442] Distinct count fails on complex columns

This PR fixes Code Generator Error thrown when Select filter contains more than one count of distinct of ComplexColumn with group by Clause

This closes apache#2449

[CARBONDATA-2645] Segregate block and blocklet cache

Things done as part of this PR

Segregate block and blocklet cache. In this driver will cache the metadata based on CACHE_LEVEL.
If CACHE_LEVEL is set to BLOCK then only carbondata files metadata will be cached in driver.
If CACHE_LEVEL is set to BLOCKLET thenmetadata for number of carbondata files * number of blocklets in each carbondata file will be cached in driver.

This closes apache#2437

[CARBONDATA-2675][32K] Support config long_string_columns when create datamap

Create datamap use select statement, but long string column is defined with StringType in the result dataframe if this column is selected. This PR allows to set long_string_columns property in dmproperties.

This closes apache#2432

[CARBONDATA-2683][32K] fix data convertion problem for Varchar

Spark internally uses org.apache.spark.unsafe.types.UTF8String for the string datatype.
In carbon, the varchar datatype should do the same conversion as the string datatype, otherwise it may throw an exception.

This closes apache#2438

[CARBONDATA-2657][BloomDataMap] Fix bugs in loading and querying on bloom column with empty values

Fix bugs in loading and querying on bloom column …
Fix bugs in loading and querying with empty values on bloom index
columns. Null values are converted to their corresponding values.

This closes apache#2413
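The core requirement behind this fix is that load time and query time must hash the same bytes for a null or empty value. A hedged sketch of that idea, with hypothetical names (this is not CarbonData's implementation):

```java
import java.nio.charset.StandardCharsets;

public class BloomValueConverter {
    // A fixed sentinel stands in for null/empty values, so the bytes
    // added to the bloom index at load time match the bytes looked up
    // at query time.
    private static final byte[] NULL_SENTINEL = new byte[0];

    static byte[] toIndexBytes(String value) {
        if (value == null || value.isEmpty()) {
            return NULL_SENTINEL;
        }
        return value.getBytes(StandardCharsets.UTF_8);
    }
}
```

If loading and querying disagree on this conversion, rows with empty values either pollute the filter or are wrongly pruned away.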

[CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]Added test cases for
 local dictionary support for alter table, set, unset and preaggregate

Added test cases for local dictionary support for alter table, set, unset and pre-aggregate
All the validations related to the above features are taken care of in this PR

This closes apache#2422

[CARBONDATA-2606][Complex DataType Enhancements] Fixed Projection Pushdown when
Select filter contains Struct column.

Problem:
If a select filter contains a struct column which is not in the projection list,
then only a null value is stored for the struct column given in the filter, and the select query result is null.
Solution:
Pushdown Parent column of corresponding struct type if any struct column is present in Filter list.

This closes apache#2439

[CARBONDATA-2642] Added configurable Lock path property

A new property, "carbon.lock.path", is exposed to allow the user to configure the lock path.
Refactored code to create a separate implementation for S3CarbonFile.

This closes apache#2642

[CARBONDATA-2686] Implement Left join on MV datamap

This closes apache#2444

[CARBONDATA-2660][BloomDataMap] Add test for querying on longstring bloom index column

Filtering on longstring bloom index columns is already supported in PR apache#2403; here we only add a test for it.

This closes apache#2416

[CARBONDATA-2689] Added validations for complex columns in alter set statements

Issue: Alter set statements were not validating complex dataType columns correctly.
Fix: Added a recursive method to validate string and varchar child columns of complex dataType columns.

This closes apache#2450

[CARBONDATA-2681][32K] Fix loading problem using global/batch sort fails when table has long string columns

In SortStepRowHandler, global/batch sort uses convertRawRowTo3Parts instead of convertIntermediateSortTempRowTo3Parted.

varcharDimCnt was not added to noDictArray, which caused the error: Problem while converting row to 3 parts.

This closes apache#2435

[CARBONDATA-2658][DataLoad] Fix bugs in spilling in-memory pages

The parameter carbon.load.sortMemory.spill.percentage accepts a value in the range 0-100; according to this configuration, in-memory pages are merged and spilled to disk.

This closes apache#2414
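One plausible way such a percentage maps onto the sort memory budget is sketched below. This is an assumption-laden illustration, not CarbonData's code; the names are hypothetical:

```java
public class SpillThreshold {
    // Translate a 0-100 percentage of the sort memory budget into a
    // byte threshold at which in-memory pages are merged and spilled.
    static long thresholdBytes(long sortMemoryBytes, int spillPercentage) {
        if (spillPercentage < 0 || spillPercentage > 100) {
            throw new IllegalArgumentException("spill percentage must be in 0-100");
        }
        return sortMemoryBytes * spillPercentage / 100;
    }

    static boolean shouldSpill(long usedBytes, long sortMemoryBytes, int spillPercentage) {
        return usedBytes >= thresholdBytes(sortMemoryBytes, spillPercentage);
    }
}
```

Validating the 0-100 range up front is exactly the kind of boundary handling the bug fix above concerns.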

[CARBONDATA-2666] updated rename command so that table directory is not renamed

Rename will no longer rename the table folder; it only changes the metadata.

This closes apache#2420

[CARBONDATA-2637][BloomDataMap] Fix bugs for deferred rebuild for bloomfilter datamap

Previously, when we implemented ISSUE-2633, deferred rebuild was
disabled for the bloomfilter datamap due to unhandled bugs.
In this commit, we fix the bugs and bring the feature back.

Since the bloomfilter datamap creates its index on the carbon native raw bytes, we have to convert the original literal value to carbon native bytes both in loading and querying.

This closes apache#2425

[CARBONDATA-2701] Refactor code to store minimal required info in Block and Blocklet Cache

1. Refactored code to keep only minimal information in block and blocklet cache.
2. Introduced segment properties holder at JVM level to hold the segment properties.
   As it is heavy object, new segment properties object will be created only when
   schema or cardinality is changed for a table.

This closes apache#2454
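The JVM-level holder described in point 2 can be sketched as a cache keyed by schema and cardinality, so the heavy object is rebuilt only when either changes. An illustrative sketch with hypothetical names, not CarbonData's actual classes:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SegmentPropertiesHolder {
    // One shared map per JVM. The heavy "segment properties" object
    // (represented here by a plain Object) is created once per
    // (table, schema, cardinality) key and reused until the schema or
    // cardinality fingerprint changes.
    private static final Map<String, Object> CACHE = new ConcurrentHashMap<>();

    static Object getOrCreate(String tableId, String schemaFingerprint, String cardinalityFingerprint) {
        String key = tableId + "|" + schemaFingerprint + "|" + cardinalityFingerprint;
        return CACHE.computeIfAbsent(key, k -> new Object());
    }
}
```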

[CARBONDATA-2589][CARBONDATA-2590][CARBONDATA-2602]Local dictionary query Support

Supported Non filter query for local dictionary
Supported Filter query on local dictionary
Supported Query on complex column for primitive type local dictionary columns
Local Dictionary support on Varchar columns
Supported Vector reader on local dictionary

This closes apache#2447

[CARBONDATA-2585][CARBONDATA-2586]Fix local dictionary support for preagg and set
localdict info in column schema

This PR fixes local dictionary support for preaggregate and sets the local dictionary info
of each column in the column schema during read and write, for backward compatibility.

This closes apache#2451

[CARBONDATA-2711] carbonFileList is not initialized when updatetablelist is called

Bug fix: carbonFileList is not initialized within the updatetablelist method when we execute 'SELECT table_name FROM information_schema.tables WHERE table_schema = 'tmp_sbu_vadmdb' from the command line

This closes apache#2468

[CARBONDATA-2685][DataMap] Parallelize datamap rebuild processing for segments

Currently in carbondata, while rebuilding a datamap, one spark job is
started for each segment and all the jobs are executed serially. If we
have many historical segments, the rebuild takes a lot of time.

Here we optimize the datamap rebuild procedure and start one task for
each segment, so all the tasks can be done in parallel in one spark
job.

This closes apache#2443
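The serial-to-parallel change can be illustrated in plain Java (the real change uses Spark tasks; this sketch with hypothetical names only shows the shape of fanning segments out as parallel units of one job):

```java
import java.util.*;
import java.util.stream.Collectors;

public class ParallelRebuild {
    // Treat each segment as one task of a single job: all segments are
    // processed by the parallel pipeline instead of one job per segment
    // run back to back.
    static List<String> rebuildAll(List<String> segments) {
        return segments.parallelStream()
                .map(seg -> "rebuilt:" + seg)  // stand-in for per-segment rebuild work
                .collect(Collectors.toList());
    }
}
```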

[CARBONDATA-2706][BloomDataMap] clear bloom index files after segment is deleted

clear bloom index files after corresponding segment is deleted and
cleaned

This closes apache#2461

[CARBONDATA-2715][LuceneDataMap] Fix bug in search mode with lucene datamap in windows

While comparing two paths, the file separator is different on windows,
thus causing empty pruned blocklets. This PR ignores the file
separator.

This closes apache#2470
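Ignoring the separator amounts to normalizing both paths before comparing, along these lines. A minimal sketch with hypothetical names, not the actual patch:

```java
public class PathCompare {
    // On Windows the same blocklet path may appear with '\' while the
    // pruned result uses '/'; normalize both to '/' before comparing.
    static String normalize(String path) {
        return path.replace('\\', '/');
    }

    static boolean samePath(String a, String b) {
        return normalize(a).equals(normalize(b));
    }
}
```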

[CARBONDATA-2703][Tests] Clear up env after tests

1. Reset session parameters after tests
2. Clean up output after tests

This closes apache#2458

[CARBONDATA-2607][Complex Column Enhancements] Complex Primitive DataType Adaptive Encoding

In this PR, improvements were made to store complex types more effectively so that reading becomes more efficient.

The changes are:

Primitive types inside complex types are separate pages. Previously it was a single byte array column page for a complex column. Now all sub-levels inside the complex data types are stored as separate pages with their respective datatypes.

No Dictionary Primitive DataTypes inside Complex Columns will be processed through Adaptive Encoding. Previously only snappy compression was applied.

If a primitive datatype inside a complex column is no-dictionary, only the value is saved, except for String and Varchar which are saved as byte arrays. Previously, all sub-levels were saved in length-and-value format inside a single byte array. Currently only Struct and Array type column pages are saved as byte arrays; all other primitives except String and Varchar are saved with their respective fixed datatype lengths.

Support for the Safe and Unsafe Fixed length Column Page to support growing dynamic array implementation. This is done to support Array datatype.

Co-authored-by: sounakr <sounakr@gmail.com>

This closes apache#2417
asfgit pushed a commit that referenced this pull request Jul 30, 2018