[CARBONDATA-3023] Alter add column issue with SORT_COLUMNS #2826
Conversation
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/831/
Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9096/
Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1028/
Force-pushed 8269397 to ff43c1f
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/838/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9103/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1035/
   * @param carbonDimensions
   * @return
   */
  private List<CarbonDimension> reArrangeColumnSchema(List<CarbonDimension> carbonDimensions) {
It's better to rewrite the schema instead of updating it in memory every time.
Force-pushed ff43c1f to 7843072
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9182/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/926/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1124/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9188/
Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9195/
Retest this please
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/943/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9207/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1149/
Force-pushed 7843072 to 7ca0321
Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1159/
Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9216/
Force-pushed 7ca0321 to f919430
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/951/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9222/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/958/
Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1166/
retest this please
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/963/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1176/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9231/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/970/
@@ -511,7 +511,7 @@ private BitSet setFilterdIndexToBitSetWithColumnIndex(
   // Binary Search cannot be done on '@NU#LL$!", so need to check and compare for null on
   // matching row.
-  if (dimensionColumnPage.isNoDicitionaryColumn()) {
+  if (dimensionColumnPage.isNoDicitionaryColumn() && !dimensionColumnPage.isAdaptiveEncoded()) {
Not related to this PR, but I think we had better have a function that returns the encoding type of the columnPage instead of having isAdaptiveEncoded, since we will add more encodings in the future.
@ravipesala @ajantha-bhat please check
@jackylk isAdaptiveEncoded() is already set/decided by the function org.apache.carbondata.core.datastore.chunk.reader.dimension.v3.CompressedDimensionChunkFileBasedReaderV3#isEncodedWithAdaptiveMeta. In the future we can change that function accordingly.
@jackylk: yes, we can keep a BitSet in the columnPage indexed by the Encoding enum and set its bits from the PageMetaData encodings. Then, to check whether a particular encoding is present, we just check whether that bit is set.
Currently we don't need this, but as you said, it will help future changes. I can make this change if you and @ravipesala agree.
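The idea discussed above could be sketched roughly as follows. This is a hypothetical illustration, not CarbonData's actual implementation: the `Encoding` enum here is a simplified stand-in for the real `org.apache.carbondata.format.Encoding`, and the class names are made up.

```java
import java.util.BitSet;

// Sketch: track all encodings applied to a column page in a BitSet indexed
// by the encoding's ordinal, instead of a single isAdaptiveEncoded flag.
class EncodingBitSetSketch {

  // Simplified stand-in for org.apache.carbondata.format.Encoding.
  enum Encoding { DICTIONARY, RLE, INVERTED_INDEX, ADAPTIVE_INTEGRAL, ADAPTIVE_FLOATING }

  private final BitSet encodings = new BitSet(Encoding.values().length);

  // Would be called once per encoding listed in the page metadata
  // while reading the chunk.
  void addEncoding(Encoding encoding) {
    encodings.set(encoding.ordinal());
  }

  // One generic membership check instead of per-encoding boolean getters.
  boolean isEncodedWith(Encoding encoding) {
    return encodings.get(encoding.ordinal());
  }

  public static void main(String[] args) {
    EncodingBitSetSketch page = new EncodingBitSetSketch();
    page.addEncoding(Encoding.ADAPTIVE_INTEGRAL);
    System.out.println(page.isEncodedWith(Encoding.ADAPTIVE_INTEGRAL)); // prints true
    System.out.println(page.isEncodedWith(Encoding.DICTIONARY));        // prints false
  }
}
```

With this shape, adding a new encoding later only extends the enum; the check sites stay unchanged.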
Force-pushed f919430 to 14517a4
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1019/
Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1232/
retest this please
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9284/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1023/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1235/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9288/
LGTM
This closes #2826
Problem-1:
With ALTER ADD COLUMNS, the newly added column is appended at the end of the schema. But if that column is a SORT_COLUMN, the load path expects all SORT_COLUMNS to come first.
Solution:
While getting the schema from the carbonTable, rearrange it so that all the SORT_COLUMNS come first.
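The rearrangement can be pictured as a stable partition that moves all sort columns to the front while keeping the relative order inside each group. The sketch below is hypothetical: `ColumnSketch` and its `sortColumn` flag stand in for the real `CarbonDimension` / sort-column metadata.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: stable partition of a column list so SORT_COLUMNS come first.
class SortColumnsFirstSketch {

  static class ColumnSketch {
    final String name;
    final boolean sortColumn;

    ColumnSketch(String name, boolean sortColumn) {
      this.name = name;
      this.sortColumn = sortColumn;
    }
  }

  static List<ColumnSketch> reArrange(List<ColumnSketch> columns) {
    List<ColumnSketch> rearranged = new ArrayList<>();
    for (ColumnSketch column : columns) {
      if (column.sortColumn) {
        rearranged.add(column);   // all SORT_COLUMNS first, in original order
      }
    }
    for (ColumnSketch column : columns) {
      if (!column.sortColumn) {
        rearranged.add(column);   // remaining columns keep their order too
      }
    }
    return rearranged;
  }

  public static void main(String[] args) {
    List<ColumnSketch> columns = new ArrayList<>();
    columns.add(new ColumnSketch("c1", false));
    columns.add(new ColumnSketch("c2", true)); // e.g. added last via ALTER ADD as a SORT_COLUMN
    System.out.println(reArrange(columns).get(0).name); // prints c2
  }
}
```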
Problem-2:
After an ALTER DROP followed by an ADD of a new column on a partition table, LOAD fails because the load path also considers the dropped columns.
Solution:
While loading a partition table, take only the columns that are still visible. A column becomes invisible after DROP COLUMN, so dropped columns should not be considered during load.
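The visibility filter amounts to the following. This is a hypothetical sketch: the `invisible` flag here stands in for the flag CarbonData keeps on a column schema after ALTER DROP.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: build the load schema from visible columns only, so dropped
// (invisible) columns never reach the partition-table load path.
class VisibleColumnsSketch {

  static class ColumnSketch {
    final String name;
    final boolean invisible;

    ColumnSketch(String name, boolean invisible) {
      this.name = name;
      this.invisible = invisible;
    }
  }

  static List<ColumnSketch> visibleColumns(List<ColumnSketch> allColumns) {
    List<ColumnSketch> visible = new ArrayList<>();
    for (ColumnSketch column : allColumns) {
      if (!column.invisible) {
        visible.add(column);  // keep only columns still present in the table
      }
    }
    return visible;
  }

  public static void main(String[] args) {
    List<ColumnSketch> all = new ArrayList<>();
    all.add(new ColumnSketch("c1", false));
    all.add(new ColumnSketch("c2", true));   // dropped via ALTER DROP
    all.add(new ColumnSketch("c3", false));  // re-added afterwards
    System.out.println(visibleColumns(all).size()); // prints 2
  }
}
```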
Problem-3:
(1) When checking the null bitsets for adaptive-encoded primitive types, the null bitsets are based on the actual rowId, but we were checking with the reverseInvertedIndex.
(2) For range filters, @nu#LL$!/EMPTY_BYTE_ARRAY values are removed for a noDictionary column before binary search. But an adaptive-encoded page does not use @nu#LL$!/EMPTY_BYTE_ARRAY for binary search.
Solution:
(1) Take the actual rowId for checking nullBitSets from the invertedIndex.
(2) Don't remove @nu#LL$! values for an adaptive-encoded page.
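The rowId mapping in fix (1) can be illustrated as follows. This is a hypothetical sketch of the principle, not CarbonData's actual filter code: with an inverted index, position i of the sorted page corresponds to actual row invertedIndex[i], so the null bitset (built on actual row ids) must be probed with the mapped id, not with the sorted position.

```java
import java.util.BitSet;

// Sketch: map a sorted-page position through the inverted index before
// probing the null bitset, which is keyed by actual row id.
class NullBitSetSketch {

  static boolean isNullAtSortedPosition(int sortedPos, int[] invertedIndex, BitSet nullBits) {
    int actualRowId = invertedIndex[sortedPos];  // sorted position -> actual row id
    return nullBits.get(actualRowId);
  }

  public static void main(String[] args) {
    int[] invertedIndex = {2, 0, 1};  // invertedIndex[i] = actual row id of sorted position i
    BitSet nullBits = new BitSet();
    nullBits.set(0);                  // actual row 0 holds a null
    System.out.println(isNullAtSortedPosition(1, invertedIndex, nullBits)); // prints true
    System.out.println(isNullAtSortedPosition(0, invertedIndex, nullBits)); // prints false
  }
}
```

Probing `nullBits` with the sorted position directly would wrongly report position 0 (actual row 2) as null-free or null depending on the permutation, which is exactly the bug described above.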
Any interfaces changed?
Any backward compatibility impacted?
Document update required?
Testing done
UT Added