[CARBONDATA-3083] Fixed data mismatch issue after update #2902
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1298/
@@ -1734,7 +1734,7 @@ private CarbonCommonConstants() {
   public static final String CARBON_PUSH_ROW_FILTERS_FOR_VECTOR =
       "carbon.push.rowfilters.for.vector";

-  public static final String CARBON_PUSH_ROW_FILTERS_FOR_VECTOR_DEFAULT = "false";
+  public static final String CARBON_PUSH_ROW_FILTERS_FOR_VECTOR_DEFAULT = "true";
Any specific reason for changing the default value?
bbd3dc8 to e4584c7
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1512/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9559/
override def afterAll {
  sql("use default")
  sql("drop database if exists iud cascade")
  CarbonProperties.getInstance()
    .addProperty(CarbonCommonConstants.isHorizontalCompactionEnabled , "true")
  CarbonProperties.getInstance()
    .addProperty(CarbonCommonConstants.ENABLE_VECTOR_READER , "true")
  CarbonProperties.getInstance().addProperty(CarbonCommonConstants
    .CARBON_PUSH_ROW_FILTERS_FOR_VECTOR, "false")
Instead of hard-coding "false", use the default value from the constants.
  , Row(100), Row(-100), Row(null)))
sql("""drop table if exists iud.dest33_part""")
CarbonProperties.getInstance().addProperty(CarbonCommonConstants
  .CARBON_PUSH_ROW_FILTERS_FOR_VECTOR, "true")
After the test case completes, we should reset CARBON_PUSH_ROW_FILTERS_FOR_VECTOR to its default value.
The default value of the property is false, so I think there is no need to modify it at the start of the test case.
    vector.putShort(i, shortData[i]);
  }
} else {
  vector.putShorts(0, pageSize, shortData, 0);
I think using putShorts/putFloats is common and unavoidable. In the future, any new encoding class may use these methods, and the same problem can then occur again. Is it feasible to modify the implementations of these methods in the vector classes themselves, as in the example below?
public void putShorts(int rowId, int count, short[] src, int srcIndex) {
  // Copy count values starting at srcIndex, one row at a time.
  for (int i = srcIndex; i < srcIndex + count; i++) {
    putShort(rowId++, src[i]);
  }
}
That would be a better approach.
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1303/
Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1516/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9564/
Problem: When filling a columnPage directly into a vector, we skip the deleted rows based on the BitSet value. Consider a situation where the 6th row is null, i.e. BitSet(6), and the 3rd row is marked as deleted, i.e. BitSet(3).
While filling the vector we skip the 3rd row, so the final vector has one row fewer (5 rows in total) than the columnPage.
While reading we read only 5 rows, and when we then try to set the 6th row as null we end up marking the wrong row as null.
Solution: Check whether the vector has an inverted index or deleted rows. If it does, do not blindly copy the array using System.arraycopy; instead, iterate over the values, check for nulls, and insert the appropriate values.
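The scenario above can be illustrated with a small, self-contained sketch. It is not the code from this PR: `DeletedRowFillSketch`, `SimpleVector`, and `fill` are hypothetical names, and the vector is a plain array-backed stand-in rather than CarbonData's vector class. The sketch assumes 0-based indices, so the 3rd row is index 2 and the 6th row is index 5. It walks the page row by row, skips deleted rows, and writes each null at the vector position the surviving row actually lands on.

```java
import java.util.BitSet;

public class DeletedRowFillSketch {

  /** Hypothetical array-backed stand-in for a column vector (not CarbonData's class). */
  public static final class SimpleVector {
    public final short[] values;
    public final boolean[] isNull;

    public SimpleVector(int capacity) {
      values = new short[capacity];
      isNull = new boolean[capacity];
    }

    public void putShort(int rowId, short value) {
      values[rowId] = value;
    }

    public void putNull(int rowId) {
      isNull[rowId] = true;
    }
  }

  /**
   * Fills the vector row by row instead of bulk-copying the page.
   * Deleted page rows are skipped, so the vector row index is tracked
   * separately from the page row index. Returns the number of rows written.
   */
  public static int fill(short[] page, BitSet nullBits, BitSet deletedRows,
      SimpleVector vector) {
    int vectorRow = 0;
    for (int pageRow = 0; pageRow < page.length; pageRow++) {
      if (deletedRows.get(pageRow)) {
        continue; // deleted rows never reach the vector
      }
      if (nullBits.get(pageRow)) {
        vector.putNull(vectorRow); // null lands on the shifted position
      } else {
        vector.putShort(vectorRow, page[pageRow]);
      }
      vectorRow++;
    }
    return vectorRow;
  }

  public static void main(String[] args) {
    short[] page = {10, 20, 30, 40, 50, 0};
    BitSet nullBits = new BitSet();
    nullBits.set(5); // 6th row is null
    BitSet deletedRows = new BitSet();
    deletedRows.set(2); // 3rd row is deleted
    SimpleVector vector = new SimpleVector(page.length);
    int rows = fill(page, nullBits, deletedRows, vector);
    System.out.println("rows=" + rows + ", nullAt4=" + vector.isNull[4]);
  }
}
```

With a six-row page, the vector receives five rows and the null ends up at vector index 4. A bulk copy followed by marking index 5 as null would instead flag a position outside the five valid rows, which is the mismatch described in the problem statement.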
Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:
Any interfaces changed?
Any backward compatibility impacted?
Document update required?
Testing done
Please provide details on
- Whether new unit test cases have been added or why no new tests are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance test report.
- Any additional information to help reviewers in testing this change.
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.