
[CARBONDATA-3083] Fixed data mismatch issue after update #2902

Closed
wants to merge 1 commit into from

Conversation

kunal642
Contributor

@kunal642 kunal642 commented Nov 6, 2018

Problem: When filling a columnPage directly into a vector, we skip deleted rows based on the delete-delta BitSet. Now consider a page where the 6th row is null (i.e. BitSet(6)) and the 3rd row is marked as deleted (i.e. BitSet(3)).
While filling the vector we skip the 3rd row, so the final vector has one row fewer (5 rows in total) than the columnPage.
While reading we therefore read only 5 rows, and when we try to set the 6th row to null we end up nulling the wrong row.
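The off-by-one described above can be reproduced with a small standalone sketch. Plain Java collections are used here for illustration; the real code operates on CarbonData's ColumnPage and vector classes, and the class and method names below are hypothetical:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Minimal sketch of the problem: a 6-row page where row 2 (0-based) is
// deleted and row 5 is null. Skipping the deleted row while filling
// shifts every later row, so a page-relative null index hits the wrong row.
public class NullShiftDemo {

  // Fill the vector by skipping deleted rows, as the buggy path did.
  static List<Integer> fillSkippingDeleted(int[] page, BitSet deletedRows) {
    List<Integer> vector = new ArrayList<>();
    for (int i = 0; i < page.length; i++) {
      if (!deletedRows.get(i)) {
        vector.add(page[i]);
      }
    }
    return vector;
  }

  public static void main(String[] args) {
    int[] page = {10, 11, 12, 13, 14, 15};
    BitSet deletedRows = new BitSet();
    deletedRows.set(2);                       // 3rd row is deleted
    List<Integer> vector = fillSkippingDeleted(page, deletedRows);
    // The vector has 5 rows instead of 6; page row 5 (value 15) has
    // shifted down to vector index 4, so applying the page-relative
    // null bit for row 5 would target the wrong position.
    System.out.println(vector.size());        // prints 5
    System.out.println(vector.get(4));        // prints 15
  }
}
```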

Solution: Check whether the vector has an inverted index or deleted rows. If it does, do not blindly copy the array with System.arraycopy; instead, iterate over the values, check for nulls, and insert the appropriate values.
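The solution described above can be sketched as a conditional fill: bulk-copy only when row positions are guaranteed stable, otherwise fall back to a row-by-row copy. All names here (`fillVector`, `deletedRows`, the plain `short[]` standing in for the vector) are illustrative assumptions, not CarbonData's actual API:

```java
import java.util.BitSet;

// Sketch of the fix: use the fast bulk copy only when there are no
// deleted rows and no inverted index; otherwise copy row by row so
// each surviving value lands at its delete-adjusted position.
public class ConditionalFill {

  static short[] fillVector(short[] pageData, BitSet deletedRows,
                            boolean hasInvertedIndex) {
    if ((deletedRows == null || deletedRows.isEmpty()) && !hasInvertedIndex) {
      // Fast path: positions are stable, so System.arraycopy is safe.
      short[] out = new short[pageData.length];
      System.arraycopy(pageData, 0, out, 0, pageData.length);
      return out;
    }
    // Slow path: iterate and skip deleted rows explicitly.
    int deletedCount = (deletedRows == null) ? 0 : deletedRows.cardinality();
    short[] out = new short[pageData.length - deletedCount];
    int rowId = 0;
    for (int i = 0; i < pageData.length; i++) {
      if (deletedRows == null || !deletedRows.get(i)) {
        out[rowId++] = pageData[i];
      }
    }
    return out;
  }
}
```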

Be sure to complete the following checklist to help us incorporate your contribution quickly and easily:

  • Any interfaces changed?

  • Any backward compatibility impacted?

  • Document update required?

  • Testing done
    Please provide details on:
    - Whether new unit test cases have been added, or why no new tests are required.
    - How was it tested? Please attach the test report.
    - Is it a performance-related change? Please attach the performance test report.
    - Any additional information to help reviewers in testing this change.

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@kunal642 kunal642 changed the title [WIP] Fixed data mismatch issue after update [CARBONDATA-3083] Fixed data mismatch issue after update Nov 6, 2018
@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1298/

@@ -1734,7 +1734,7 @@ private CarbonCommonConstants() {
public static final String CARBON_PUSH_ROW_FILTERS_FOR_VECTOR =
"carbon.push.rowfilters.for.vector";

- public static final String CARBON_PUSH_ROW_FILTERS_FOR_VECTOR_DEFAULT = "false";
+ public static final String CARBON_PUSH_ROW_FILTERS_FOR_VECTOR_DEFAULT = "true";

Any specific reason for changing the default value?

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1512/

@CarbonDataQA

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9559/

override def afterAll {
sql("use default")
sql("drop database if exists iud cascade")
CarbonProperties.getInstance()
.addProperty(CarbonCommonConstants.isHorizontalCompactionEnabled , "true")
CarbonProperties.getInstance()
.addProperty(CarbonCommonConstants.ENABLE_VECTOR_READER , "true")
CarbonProperties.getInstance().addProperty(CarbonCommonConstants
.CARBON_PUSH_ROW_FILTERS_FOR_VECTOR, "false")

Instead of hard-coding "false", use the default value from the constants class.

, Row(100), Row(-100), Row(null)))
sql("""drop table if exists iud.dest33_part""")
CarbonProperties.getInstance().addProperty(CarbonCommonConstants
.CARBON_PUSH_ROW_FILTERS_FOR_VECTOR, "true")

After the test case completes, we should restore the default value for CARBON_PUSH_ROW_FILTERS_FOR_VECTOR. Since the default property value is false, I think there is no need to modify the property at the start of the test case.

vector.putShort(i, shortData[i]);
}
} else {
vector.putShorts(0, pageSize, shortData, 0);

I think using putShorts/putFloats is common and unavoidable. In the future, any new encoding class may also use these methods, and the same problem could occur again. Is it feasible to modify the vector classes' implementation methods themselves? For example:

public void putShorts(int rowId, int count, short[] src, int srcIndex) {
  for (int i = srcIndex; i < srcIndex + count; i++) {
    putShort(rowId++, src[i]);
  }
}

This way it would be better.

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1303/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1516/

@ravipesala
Contributor

@kunal642 Please check PR #2863. This issue should not happen there. Please verify once.

@CarbonDataQA

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9564/

@kunal642 kunal642 closed this Nov 15, 2018