[CARBONDATA-3023] Alter add column issue with SORT_COLUMNS #2826
Conversation
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/831/
Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9096/
Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1028/
Force-pushed 8269397 to ff43c1f
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/838/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9103/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1035/
   * @param carbonDimensions
   * @return
   */
  private List<CarbonDimension> reArrangeColumnSchema(List<CarbonDimension> carbonDimensions) {
It's better to rewrite the schema instead of updating it in memory every time.
Force-pushed ff43c1f to 7843072
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9182/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/926/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1124/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9188/
Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9195/
Retest this please
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/943/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9207/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1149/
Force-pushed 7843072 to 7ca0321
Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1159/
Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9216/
Force-pushed 7ca0321 to f919430
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/951/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9222/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/958/
Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1166/
retest this please
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/963/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1176/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9231/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/970/
@@ -511,7 +511,7 @@ private BitSet setFilterdIndexToBitSetWithColumnIndex(
   // Binary Search cannot be done on '@NU#LL$!", so need to check and compare for null on
   // matching row.
-  if (dimensionColumnPage.isNoDicitionaryColumn()) {
+  if (dimensionColumnPage.isNoDicitionaryColumn() && !dimensionColumnPage.isAdaptiveEncoded()) {
Not related to this PR, but I think we had better have a function that returns the encoding type of the columnPage instead of having isAdaptiveEncoded, since we will add more encodings in the future.
@ravipesala @ajantha-bhat please check
@jackylk isAdaptiveEncoded() is already set/decided by the function org.apache.carbondata.core.datastore.chunk.reader.dimension.v3.CompressedDimensionChunkFileBasedReaderV3#isEncodedWithAdaptiveMeta. In the future we can change that function accordingly.
@jackylk: yes, we can keep a BitSet in the columnPage indexed by the Encoding enum and set its bits from the PageMetaData encodings. Then, to check whether a particular encoding is present, we just check whether that bit is set.
Currently we don't need this, but as you said, it will help future changes. I can make this change if you and @ravipesala agree.
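The idea discussed above could be sketched roughly as follows. This is a hypothetical illustration, not CarbonData's actual implementation: the `Encoding` enum here is a simplified stand-in for the real `org.apache.carbondata.format.Encoding`, and the class names are made up.

```java
import java.util.BitSet;

// Sketch: track all encodings applied to a column page in a BitSet indexed
// by the encoding's ordinal, instead of a single isAdaptiveEncoded flag.
class EncodingBitSetSketch {

  // Simplified stand-in for org.apache.carbondata.format.Encoding.
  enum Encoding { DICTIONARY, RLE, INVERTED_INDEX, ADAPTIVE_INTEGRAL, ADAPTIVE_FLOATING }

  private final BitSet encodings = new BitSet(Encoding.values().length);

  // Would be called once per encoding listed in the page metadata
  // while reading the chunk.
  void addEncoding(Encoding encoding) {
    encodings.set(encoding.ordinal());
  }

  // One generic membership check instead of per-encoding boolean getters.
  boolean isEncodedWith(Encoding encoding) {
    return encodings.get(encoding.ordinal());
  }

  public static void main(String[] args) {
    EncodingBitSetSketch page = new EncodingBitSetSketch();
    page.addEncoding(Encoding.ADAPTIVE_INTEGRAL);
    System.out.println(page.isEncodedWith(Encoding.ADAPTIVE_INTEGRAL)); // prints true
    System.out.println(page.isEncodedWith(Encoding.DICTIONARY));        // prints false
  }
}
```

With this shape, adding a new encoding later only extends the enum; the check sites stay unchanged.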
Force-pushed f919430 to 14517a4
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1019/
Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1232/
retest this please
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9284/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1023/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1235/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9288/
LGTM
This closes #2826
Problem-1:
With ALTER ADD COLUMNS, the newly added column is appended at the end of the schema. But if that column is a SORT_COLUMN, the load path expects all SORT_COLUMNS to come first.
Solution:
While getting the schema from the carbonTable, rearrange it so that all the SORT_COLUMNS come first.
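The rearrangement can be pictured as a stable partition that moves all sort columns to the front while keeping the relative order inside each group. The sketch below is hypothetical: `ColumnSketch` and its `sortColumn` flag stand in for the real `CarbonDimension` / sort-column metadata.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: stable partition of a column list so SORT_COLUMNS come first.
class SortColumnsFirstSketch {

  static class ColumnSketch {
    final String name;
    final boolean sortColumn;

    ColumnSketch(String name, boolean sortColumn) {
      this.name = name;
      this.sortColumn = sortColumn;
    }
  }

  static List<ColumnSketch> reArrange(List<ColumnSketch> columns) {
    List<ColumnSketch> rearranged = new ArrayList<>();
    for (ColumnSketch column : columns) {
      if (column.sortColumn) {
        rearranged.add(column);   // all SORT_COLUMNS first, in original order
      }
    }
    for (ColumnSketch column : columns) {
      if (!column.sortColumn) {
        rearranged.add(column);   // remaining columns keep their order too
      }
    }
    return rearranged;
  }

  public static void main(String[] args) {
    List<ColumnSketch> columns = new ArrayList<>();
    columns.add(new ColumnSketch("c1", false));
    columns.add(new ColumnSketch("c2", true)); // e.g. added last via ALTER ADD as a SORT_COLUMN
    System.out.println(reArrange(columns).get(0).name); // prints c2
  }
}
```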
Problem-2:
After an ALTER DROP followed by an ADD of a new column on a partition table, LOAD fails because the load path also considers the dropped columns.
Solution:
While loading a partition table, take only the columns that are still visible. A column becomes invisible after DROP COLUMN, so dropped columns should not be considered during load.
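The visibility filter amounts to the following. This is a hypothetical sketch: the `invisible` flag here stands in for the flag CarbonData keeps on a column schema after ALTER DROP.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: build the load schema from visible columns only, so dropped
// (invisible) columns never reach the partition-table load path.
class VisibleColumnsSketch {

  static class ColumnSketch {
    final String name;
    final boolean invisible;

    ColumnSketch(String name, boolean invisible) {
      this.name = name;
      this.invisible = invisible;
    }
  }

  static List<ColumnSketch> visibleColumns(List<ColumnSketch> allColumns) {
    List<ColumnSketch> visible = new ArrayList<>();
    for (ColumnSketch column : allColumns) {
      if (!column.invisible) {
        visible.add(column);  // keep only columns still present in the table
      }
    }
    return visible;
  }

  public static void main(String[] args) {
    List<ColumnSketch> all = new ArrayList<>();
    all.add(new ColumnSketch("c1", false));
    all.add(new ColumnSketch("c2", true));   // dropped via ALTER DROP
    all.add(new ColumnSketch("c3", false));  // re-added afterwards
    System.out.println(visibleColumns(all).size()); // prints 2
  }
}
```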
Problem-3:
(1) When checking the null bitsets for adaptive-encoded primitive types, the null bitsets are based on the actual rowId, but we were checking with the reverseInvertedIndex.
(2) For range filters, @nu#LL$!/EMPTY_BYTE_ARRAY values are removed for a noDictionary column before binary search. But an adaptive-encoded page does not use @nu#LL$!/EMPTY_BYTE_ARRAY for binary search.
Solution:
(1) Take the actual rowId for checking nullBitSets from the invertedIndex.
(2) Don't remove @nu#LL$! values for an adaptive-encoded page.
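The rowId mapping in fix (1) can be illustrated as follows. This is a hypothetical sketch of the principle, not CarbonData's actual filter code: with an inverted index, position i of the sorted page corresponds to actual row invertedIndex[i], so the null bitset (built on actual row ids) must be probed with the mapped id, not with the sorted position.

```java
import java.util.BitSet;

// Sketch: map a sorted-page position through the inverted index before
// probing the null bitset, which is keyed by actual row id.
class NullBitSetSketch {

  static boolean isNullAtSortedPosition(int sortedPos, int[] invertedIndex, BitSet nullBits) {
    int actualRowId = invertedIndex[sortedPos];  // sorted position -> actual row id
    return nullBits.get(actualRowId);
  }

  public static void main(String[] args) {
    int[] invertedIndex = {2, 0, 1};  // invertedIndex[i] = actual row id of sorted position i
    BitSet nullBits = new BitSet();
    nullBits.set(0);                  // actual row 0 holds a null
    System.out.println(isNullAtSortedPosition(1, invertedIndex, nullBits)); // prints true
    System.out.println(isNullAtSortedPosition(0, invertedIndex, nullBits)); // prints false
  }
}
```

Probing `nullBits` with the sorted position directly would wrongly report position 0 (actual row 2) as null-free or null depending on the permutation, which is exactly the bug described above.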
Any interfaces changed?
Any backward compatibility impacted?
Document update required?
Testing done
UT Added