
[CARBONDATA-3013] Added support for pruning pages for vector direct fill. #2820

Closed
wants to merge 2 commits

Conversation

ravipesala
Contributor

This PR depends on PR #2819

First, apply page-level pruning using the min/max of each page to get the valid pages of the blocklet. Decompress only the valid pages and fill the vector directly, as in the full scan query scenario.
To prune the pages before decompressing the data, a new method is added to the FilterExecuter class:

BitSet prunePages(RawBlockletColumnChunks rawBlockletColumnChunks)
      throws FilterUnsupportedException, IOException;

The above method reads the necessary column chunk metadata and prunes the pages based on the min/max metadata. Based on the pruned pages, BlockletScannedResult decompresses and fills the column page data into the vector, as described for the full scan in the above-mentioned PR.
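
For illustration, a minimal sketch of how an implementation of this method could look, assuming a hypothetical pageMatchesMinMax helper that performs the per-page min/max comparison (this is a sketch, not the merged CarbonData code):

@Override
public BitSet prunePages(RawBlockletColumnChunks rawBlockletColumnChunks)
    throws FilterUnsupportedException, IOException {
  int numberOfPages = rawBlockletColumnChunks.getDataBlock().numberOfPages();
  BitSet prunedPages = new BitSet(numberOfPages);
  for (int page = 0; page < numberOfPages; page++) {
    // pageMatchesMinMax is a hypothetical helper standing in for the real
    // comparison of the filter value against this page's min/max metadata
    if (pageMatchesMinMax(rawBlockletColumnChunks, page)) {
      prunedPages.set(page);
    }
  }
  return prunedPages;
}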

Be sure to complete all of the following checklist items to help us incorporate
your contribution quickly and easily:

  • Any interfaces changed?

  • Any backward compatibility impacted?

  • Document update required?

  • Testing done
    Please provide details on
    - Whether new unit test cases have been added or why no new tests are required?
    - How it is tested? Please attach test report.
    - Is it a performance related change? Please attach the performance test report.
    - Any additional information to help reviewers in testing this change.

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/809/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1006/

@CarbonDataQA

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9074/

@ravipesala ravipesala force-pushed the perf-filter-scan1 branch 2 times, most recently from 70d467c to 5d20d55 on October 16, 2018 13:16
@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/815/

@CarbonDataQA

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9080/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1012/

try {
chunkReader.decodeColumnPageAndFillVector(this, pageNumber, vectorInfo);
} catch (Exception e) {
throw new RuntimeException(e);
Contributor

@jackylk jackylk Oct 18, 2018

Why not throw underlying exception

Contributor Author

Because it is a checked exception; that is why it is wrapped with an unchecked exception.
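
For context, a hedged sketch of the wrapping pattern under discussion; passing the caught exception as the cause preserves the underlying stack trace (this mirrors, but is not literally, the code in this PR):

try {
  chunkReader.decodeColumnPageAndFillVector(this, pageNumber, vectorInfo);
} catch (Exception e) {
  // decodeColumnPageAndFillVector declares checked exceptions; wrapping in a
  // RuntimeException keeps the calling interface unchanged, and passing 'e'
  // as the cause keeps the original exception visible for debugging
  throw new RuntimeException(e);
}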

@@ -94,7 +95,7 @@ public ColumnPage decodeColumnPage(int pageNumber) {
public ColumnPage convertToColumnPageWithOutCache(int index) {
assert index < pagesCount;
// in case of filter query filter columns blocklet pages will uncompressed
// so no need to decode again
// so no need to decodeAndFillVector again
Contributor

seems no need to modify

Contributor Author

ok

if (isDimensionPresentInCurrentBlock) {
int chunkIndex = segmentProperties.getDimensionOrdinalToChunkMapping()
.get(dimColEvaluatorInfo.getColumnIndex());
if (null == rawBlockletColumnChunks.getDimensionRawColumnChunks()[chunkIndex]) {
Contributor

For the exclude filter case there is no need to read the blocklet column data, as we always return true.

Contributor Author

It is read to get the page count.

Contributor
we get the page count from rawBlockletColumnChunks.getDataBlock().numberOfPages()

Contributor Author

ok
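
A hedged sketch of the suggested change for the exclude-filter case: take the page count from the data block instead of reading the raw column chunk, and mark every page as valid (variable names are illustrative):

// exclude filters keep all pages at this level, so no chunk read is needed;
// the page count is available directly from the blocklet's data block
int numberOfPages = rawBlockletColumnChunks.getDataBlock().numberOfPages();
BitSet pages = new BitSet(numberOfPages);
pages.set(0, numberOfPages);
return pages;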

@@ -179,6 +167,75 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
return null;
}

private boolean isScanRequired(DimensionRawColumnChunk dimensionRawColumnChunk, int i) {
Contributor

please change i to columnIndex

Contributor Author

ok

// false, in that scenario the default values of the column should be shown.
// select all rows if dimension does not exists in the current block
if (!isDimensionPresentInCurrentBlock) {
int i = blockChunkHolder.getDataBlock().numberOfPages();
Contributor

change i to numberOfPages

Contributor Author

ok

}
return bitSet;
}
return null;
Contributor

For a dimension/measure column which is not present in the current blocklet (alter case), is returning null OK? I think we need to return all the pages.

Contributor Author

This case is not supposed to happen; even applyFilter returns null here as well.

}
long dimensionReadTime = System.currentTimeMillis();
dimensionReadTime = System.currentTimeMillis() - dimensionReadTime;

Contributor

Please remove empty lines

Contributor Author

ok

@@ -148,6 +148,61 @@ private void ifDefaultValueMatchesFilter() {
return bitSet;
}

@Override
public BitSet prunePages(RawBlockletColumnChunks rawBlockletColumnChunks)
throws FilterUnsupportedException, IOException {
Contributor

For all the RowLevelRangeFilters, can we move some part of the code to their super class to remove code duplication?

Contributor Author

Yes, a lot of code is duplicated across all the range filters; maybe we should combine some of the classes. We can do this refactoring in another PR.
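
A hedged sketch of the kind of refactoring suggested here, assuming an illustrative base class (not the existing CarbonData hierarchy): the shared page-pruning loop lives in the super class and each range filter supplies only its min/max comparison:

// Illustrative only: class and method names are assumptions, not CarbonData APIs.
abstract class AbstractRangeFilterExecuter {
  // common page-pruning loop shared by all range filters
  BitSet prunePagesByMinMax(RawBlockletColumnChunks chunks, int numberOfPages) {
    BitSet pages = new BitSet(numberOfPages);
    for (int page = 0; page < numberOfPages; page++) {
      if (isPageInRange(chunks, page)) {  // subclass-specific min/max check
        pages.set(page);
      }
    }
    return pages;
  }

  // each range filter (>, >=, <, <=) implements only the comparison
  abstract boolean isPageInRange(RawBlockletColumnChunks chunks, int page);
}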

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/886/

@CarbonDataQA

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9151/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1084/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/899/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/903/

@CarbonDataQA

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9167/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/910/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1110/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/917/

@CarbonDataQA

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9177/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1117/

@Override
public BitSet prunePages(RawBlockletColumnChunks rawBlockletColumnChunks)
throws FilterUnsupportedException, IOException {
return new BitSet();
Contributor

I think for this operation we need to throw FilterUnsupportedException, similar to the applyFilter implementation.

Contributor Author

ok
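
A hedged sketch of the behaviour being asked for: report unsupported page pruning explicitly instead of returning an empty BitSet, mirroring how applyFilter handles the unsupported case (the message text is illustrative):

@Override
public BitSet prunePages(RawBlockletColumnChunks rawBlockletColumnChunks)
    throws FilterUnsupportedException, IOException {
  // an empty BitSet would silently read as "no pages match"; throwing makes
  // the unsupported operation explicit, as applyFilter does
  throw new FilterUnsupportedException("Page pruning is not supported by this filter executer");
}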

@@ -331,6 +319,80 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
}
}

private boolean isScanRequired(DimensionRawColumnChunk rawColumnChunk, int i) {
Contributor

Change i to columnIndex

Contributor Author

ok

if (blockExecutionInfo.isDirectVectorFill()) {
return executeFilterForPages(rawBlockletColumnChunks);
} else {
return executeFilter(rawBlockletColumnChunks);
Contributor

As per the design, I think we should follow the hierarchy below:
prune block -> prune blocklet -> prune pages -> prune rows (if row filtering is enabled)
With the current implementation we have two branches after prune blocklet: prune pages and prune rows in parallel, selected by the directVectorFill configuration. The effort to correct the design will be larger, so I think we can raise a JIRA to track the issue and correct it in the near future.

Contributor Author

OK, this would be a big refactoring; we can consider it in a future PR.

numberOfRows[i] = rawBlockletColumnChunks.getDataBlock().getPageRowCount(i);
}
long dimensionReadTime = System.currentTimeMillis();
dimensionReadTime = System.currentTimeMillis() - dimensionReadTime;
Contributor

dimensionReadTime is not measured at the correct place; please compute this time properly.

Contributor Author

ok
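
For illustration, a hedged sketch of the fix being requested: start the timer immediately before the dimension chunk reads and take the delta right after them, so the measured interval actually covers the I/O (the read step shown is a placeholder, not the exact call in this class):

long dimensionReadStart = System.currentTimeMillis();
// ... perform the dimension raw column chunk reads here (placeholder) ...
long dimensionReadTime = System.currentTimeMillis() - dimensionReadStart;
// record dimensionReadTime in the query statistics only after the reads finish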

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/929/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1127/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1227/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1014/

@CarbonDataQA

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9279/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1244/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1032/

@CarbonDataQA

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9297/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1034/

@CarbonDataQA

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9299/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1246/

public BitSet prunePages(RawBlockletColumnChunks rawBlockletColumnChunks)
throws FilterUnsupportedException, IOException {
BitSet bitSet = new BitSet(rawBlockletColumnChunks.getDataBlock().numberOfPages());
bitSet.set(0, rawBlockletColumnChunks.getDataBlock().numberOfPages());
Contributor

add a local variable for rawBlockletColumnChunks.getDataBlock().numberOfPages()

Contributor Author

ok
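
The suggested refactor of the snippet above, as a hedged sketch: extract the repeated numberOfPages() call into a local variable:

int pageCount = rawBlockletColumnChunks.getDataBlock().numberOfPages();
BitSet bitSet = new BitSet(pageCount);
// select all pages: sets bits [0, pageCount)
bitSet.set(0, pageCount);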

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1040/

@kumarvishal09
Contributor

LGTM

@asfgit asfgit closed this in e6d15da Oct 26, 2018
@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1253/

@CarbonDataQA

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9305/

asfgit pushed a commit that referenced this pull request Nov 21, 2018
…ill.


This closes #2820