[CARBONDATA-3134] fixed null values when cachelevel is set as blocklet #2956

Closed

kunal642
Contributor

Problem:
For each blocklet, an object of SegmentPropertiesAndSchemaHolder is created to store the schema used for the query. This object is created only if no other blocklet has the same schema. To check the schema we compare List<ColumnSchema>, but since the equals method in ColumnSchema does not take columnUniqueId into account, the comparison wrongly treats the schemas as equal and the newly restructured blocklet reuses the schema of the old blocklet. As a result the newly added column is ignored, because the old blocklet schema marks that column as deleted (alter drop).

Solution:
Instead of checking equality through the existing equals and hashCode, write a new implementation of both that compares based on columnUniqueId.
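
As a rough sketch of the idea (not the actual patch): hashCodeWithColumnId is the helper referenced in the diff below, while equalsWithColumnId is a hypothetical name used here only for illustration. The columnUniqueId-aware helpers on ColumnSchema could look like this:

// Sketch only. Assumes java.util.Objects is imported; the existing
// equals()/hashCode() of ColumnSchema are left unchanged.
public int hashCodeWithColumnId() {
  // include the unique id so a column that was dropped and re-added with the
  // same name (but a new id) no longer hashes like the old column
  return 31 * hashCode() + (getColumnUniqueId() == null ? 0 : getColumnUniqueId().hashCode());
}

public boolean equalsWithColumnId(ColumnSchema other) {  // hypothetical name
  return other != null && equals(other)
      && Objects.equals(getColumnUniqueId(), other.getColumnUniqueId());
}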

Be sure to complete the following checklist to help us incorporate your contribution quickly and easily:

  • Any interfaces changed?

  • Any backward compatibility impacted?

  • Document update required?

  • Testing done
    Please provide details on
    - Whether new unit test cases have been added or why no new tests are required?
    - How it is tested? Please attach test report.
    - Is it a performance related change? Please attach the performance test report.
    - Any additional information to help reviewers in testing this change.

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1547/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1759/

@CarbonDataQA

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9807/

@kunal642 force-pushed the bug/CARBONDATA-3134 branch 2 times, most recently from 5468f98 to ff20746 on November 27, 2018 at 11:25
@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1551/

@CarbonDataQA

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9811/

.equals(columnCardinality, other.columnCardinality);
}

private boolean checkColumnSchemaEquality(List<ColumnSchema> obj1, List<ColumnSchema> obj2) {
List<ColumnSchema> clonedObj1 = new ArrayList<>(obj1);
Contributor

You can add a length check in the first line of the method: if the lengths of the two lists are not the same, we can return false right away.

Contributor

I think the checkColumnSchemaEquality method needs to consider "obj2 == null".

Contributor Author

done
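
For reference, after addressing the two comments above the list comparison could look roughly like this (a sketch built from the hunk quoted in this thread, not the merged code; it assumes java.util.ArrayList, Collections, Comparator, List and Objects are available):

private boolean checkColumnSchemaEquality(List<ColumnSchema> obj1, List<ColumnSchema> obj2) {
  // early exits suggested above: null check and size check
  if (obj1 == null || obj2 == null || obj1.size() != obj2.size()) {
    return false;
  }
  // sort copies by columnUniqueId so the element-wise comparison is order independent
  List<ColumnSchema> clonedObj1 = new ArrayList<>(obj1);
  List<ColumnSchema> clonedObj2 = new ArrayList<>(obj2);
  Comparator<ColumnSchema> byColumnId = new Comparator<ColumnSchema>() {
    @Override public int compare(ColumnSchema o1, ColumnSchema o2) {
      return o1.getColumnUniqueId().compareTo(o2.getColumnUniqueId());
    }
  };
  Collections.sort(clonedObj1, byColumnId);
  Collections.sort(clonedObj2, byColumnId);
  for (int i = 0; i < clonedObj1.size(); i++) {
    // compare the schema fields and the columnUniqueId, not just the default equals()
    if (!clonedObj1.get(i).equals(clonedObj2.get(i)) || !Objects.equals(
        clonedObj1.get(i).getColumnUniqueId(), clonedObj2.get(i).getColumnUniqueId())) {
      return false;
    }
  }
  return true;
}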

-    return tableIdentifier.hashCode() + columnsInTable.hashCode() + Arrays
+    int hashCode = 0;
+    for (ColumnSchema columnSchema: columnsInTable) {
+      hashCode += columnSchema.hashCodeWithColumnId();
Contributor

Rename the variable to allColumnsHashCode.

Contributor Author

done
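
After the rename, the hash computation could look roughly like this (a sketch from the hunk above; the field names tableIdentifier, columnsInTable and columnCardinality come from the quoted lines, and the truncated "+ Arrays" term is assumed to be Arrays.hashCode(columnCardinality)):

public int hashCode() {
  // sum the per-column hashes that include columnUniqueId
  int allColumnsHashCode = 0;
  for (ColumnSchema columnSchema : columnsInTable) {
    allColumnsHashCode += columnSchema.hashCodeWithColumnId();
  }
  // assumption: the truncated "+ Arrays" term is Arrays.hashCode(columnCardinality)
  return tableIdentifier.hashCode() + allColumnsHashCode + Arrays.hashCode(columnCardinality);
}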

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1763/

return o1.getColumnUniqueId().compareTo(o2.getColumnUniqueId());
}
});
Collections.sort(clonedObj2, new Comparator<ColumnSchema>() {
Contributor

You can factor out the duplicated code of the two comparators, since they are identical.

Contributor Author

done
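
One way to remove the duplication flagged above is to define the comparator once and reuse it for both sorts, for example (sketch only; the constant name and its placement are hypothetical):

// defined once, e.g. as a constant in the holder class (hypothetical placement)
private static final Comparator<ColumnSchema> COLUMN_UNIQUE_ID_COMPARATOR =
    new Comparator<ColumnSchema>() {
      @Override public int compare(ColumnSchema o1, ColumnSchema o2) {
        return o1.getColumnUniqueId().compareTo(o2.getColumnUniqueId());
      }
    };

// then both sorts share it:
Collections.sort(clonedObj1, COLUMN_UNIQUE_ID_COMPARATOR);
Collections.sort(clonedObj2, COLUMN_UNIQUE_ID_COMPARATOR);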

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1557/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1769/

@CarbonDataQA

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9816/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1561/

@CarbonDataQA

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9819/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1772/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1562/

@CarbonDataQA

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9820/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1773/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1565/

@CarbonDataQA

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9823/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1776/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1567/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1778/

@CarbonDataQA

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9825/

@kunal642
Contributor Author

@manishgupta88 Please review

@manishgupta88
Contributor

LGTM

@asfgit closed this in a5f080b on Nov 28, 2018
asfgit pushed a commit that referenced this pull request Nov 30, 2018

This closes #2956