
[CARBONDATA-3013] Added support for pruning pages for vector direct fill. #2820

Closed
wants to merge 2 commits

Conversation

ravipesala
Contributor

This PR depends on PR #2819

First, apply page-level pruning using the min/max of each page to get the valid pages of the blocklet. Decompress only the valid pages and fill the vector directly, as in the full scan query scenario.
To prune the pages before decompressing the data, a new method is added to the FilterExecuter class:

BitSet prunePages(RawBlockletColumnChunks rawBlockletColumnChunks)
      throws FilterUnsupportedException, IOException;

The above method reads the necessary column chunk metadata and prunes the pages based on the min/max metadata. Based on the pruned pages, BlockletScannedResult decompresses and fills the column page data into the vector, as described for the full scan in the above-mentioned PR.
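
For illustration, a minimal sketch of how an implementation of this method could look, assuming a hypothetical pageMatchesMinMax helper that performs the per-page min/max comparison (this is a sketch, not the merged CarbonData code):

@Override
public BitSet prunePages(RawBlockletColumnChunks rawBlockletColumnChunks)
    throws FilterUnsupportedException, IOException {
  int numberOfPages = rawBlockletColumnChunks.getDataBlock().numberOfPages();
  BitSet prunedPages = new BitSet(numberOfPages);
  for (int page = 0; page < numberOfPages; page++) {
    // pageMatchesMinMax is a hypothetical helper standing in for the real
    // comparison of the filter value against this page's min/max metadata
    if (pageMatchesMinMax(rawBlockletColumnChunks, page)) {
      prunedPages.set(page);
    }
  }
  return prunedPages;
}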

Be sure to complete all of the following checklist items to help us incorporate
your contribution quickly and easily:

  • Any interfaces changed?

  • Any backward compatibility impacted?

  • Document update required?

  • Testing done
    Please provide details on
    - Whether new unit test cases have been added or why no new tests are required?
    - How it is tested? Please attach test report.
    - Is it a performance related change? Please attach the performance test report.
    - Any additional information to help reviewers in testing this change.

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/809/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1006/

@CarbonDataQA

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9074/

@ravipesala ravipesala force-pushed the perf-filter-scan1 branch 2 times, most recently from 70d467c to 5d20d55 on October 16, 2018 13:16
@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/815/

@CarbonDataQA

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9080/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1012/

try {
chunkReader.decodeColumnPageAndFillVector(this, pageNumber, vectorInfo);
} catch (Exception e) {
throw new RuntimeException(e);
Contributor

@jackylk jackylk Oct 18, 2018

Why not throw underlying exception

Contributor Author

Because it is a checked exception; that is why it is wrapped with an unchecked exception.
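
For context, a hedged sketch of the wrapping pattern under discussion; passing the caught exception as the cause preserves the underlying stack trace (this mirrors, but is not literally, the code in this PR):

try {
  chunkReader.decodeColumnPageAndFillVector(this, pageNumber, vectorInfo);
} catch (Exception e) {
  // decodeColumnPageAndFillVector declares checked exceptions; wrapping in a
  // RuntimeException keeps the calling interface unchanged, and passing 'e'
  // as the cause keeps the original exception visible for debugging
  throw new RuntimeException(e);
}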

@@ -94,7 +95,7 @@ public ColumnPage decodeColumnPage(int pageNumber) {
public ColumnPage convertToColumnPageWithOutCache(int index) {
assert index < pagesCount;
// in case of filter query filter columns blocklet pages will uncompressed
// so no need to decode again
// so no need to decodeAndFillVector again
Contributor

seems no need to modify

Contributor Author

ok

if (isDimensionPresentInCurrentBlock) {
int chunkIndex = segmentProperties.getDimensionOrdinalToChunkMapping()
.get(dimColEvaluatorInfo.getColumnIndex());
if (null == rawBlockletColumnChunks.getDimensionRawColumnChunks()[chunkIndex]) {
Contributor

For the exclude filter case there is no need to read the blocklet column data, as we always return true.

Contributor Author

It is read to get the page count.

Contributor
we get the page count from rawBlockletColumnChunks.getDataBlock().numberOfPages()

Contributor Author

ok
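
A hedged sketch of the suggested change for the exclude-filter case: take the page count from the data block instead of reading the raw column chunk, and mark every page as valid (variable names are illustrative):

// exclude filters keep all pages at this level, so no chunk read is needed;
// the page count is available directly from the blocklet's data block
int numberOfPages = rawBlockletColumnChunks.getDataBlock().numberOfPages();
BitSet pages = new BitSet(numberOfPages);
pages.set(0, numberOfPages);
return pages;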

@@ -179,6 +167,75 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
return null;
}

private boolean isScanRequired(DimensionRawColumnChunk dimensionRawColumnChunk, int i) {
Contributor

please change i to columnIndex

Contributor Author

ok

// false, in that scenario the default values of the column should be shown.
// select all rows if dimension does not exists in the current block
if (!isDimensionPresentInCurrentBlock) {
int i = blockChunkHolder.getDataBlock().numberOfPages();
Contributor

change i to numberOfPages

Contributor Author

ok

}
return bitSet;
}
return null;
Contributor

For a dimension/measure column which is not present in the current blocklet (alter case), is returning null OK? I think we need to return all the pages.

Contributor Author

This case is not supposed to happen; even applyFilter returns null here as well.

}
long dimensionReadTime = System.currentTimeMillis();
dimensionReadTime = System.currentTimeMillis() - dimensionReadTime;

Contributor

Please remove empty lines

Contributor Author

ok

@@ -148,6 +148,61 @@ private void ifDefaultValueMatchesFilter() {
return bitSet;
}

@Override
public BitSet prunePages(RawBlockletColumnChunks rawBlockletColumnChunks)
throws FilterUnsupportedException, IOException {
Contributor

For all the RowLevelRangeFilters, can we move some part of the code to their super class to remove code duplication?

Contributor Author

Yes, a lot of code is duplicated across all the range filters; maybe we should combine some of the classes. We can do this refactoring in another PR.
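
A hedged sketch of the kind of refactoring suggested here, assuming an illustrative base class (not the existing CarbonData hierarchy): the shared page-pruning loop lives in the super class and each range filter supplies only its min/max comparison:

// Illustrative only: class and method names are assumptions, not CarbonData APIs.
abstract class AbstractRangeFilterExecuter {
  // common page-pruning loop shared by all range filters
  BitSet prunePagesByMinMax(RawBlockletColumnChunks chunks, int numberOfPages) {
    BitSet pages = new BitSet(numberOfPages);
    for (int page = 0; page < numberOfPages; page++) {
      if (isPageInRange(chunks, page)) {  // subclass-specific min/max check
        pages.set(page);
      }
    }
    return pages;
  }

  // each range filter (>, >=, <, <=) implements only the comparison
  abstract boolean isPageInRange(RawBlockletColumnChunks chunks, int page);
}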

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/886/

@CarbonDataQA

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9151/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1084/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/899/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/903/

@CarbonDataQA

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9167/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/910/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1110/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/917/

@CarbonDataQA

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9177/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1117/

@Override
public BitSet prunePages(RawBlockletColumnChunks rawBlockletColumnChunks)
throws FilterUnsupportedException, IOException {
return new BitSet();
Contributor

I think for this operation we need to throw FilterUnsupportedException, similar to the applyFilter implementation.

Contributor Author

ok
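
A hedged sketch of the behaviour being asked for: report unsupported page pruning explicitly instead of returning an empty BitSet, mirroring how applyFilter handles the unsupported case (the message text is illustrative):

@Override
public BitSet prunePages(RawBlockletColumnChunks rawBlockletColumnChunks)
    throws FilterUnsupportedException, IOException {
  // an empty BitSet would silently read as "no pages match"; throwing makes
  // the unsupported operation explicit, as applyFilter does
  throw new FilterUnsupportedException("Page pruning is not supported by this filter executer");
}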

@@ -331,6 +319,80 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
}
}

private boolean isScanRequired(DimensionRawColumnChunk rawColumnChunk, int i) {
Contributor

Change i to columnIndex

Contributor Author

ok

if (blockExecutionInfo.isDirectVectorFill()) {
return executeFilterForPages(rawBlockletColumnChunks);
} else {
return executeFilter(rawBlockletColumnChunks);
Contributor

As per the design, I think we should follow the hierarchy below:
prune block -> prune blocklet -> prune pages -> prune rows (if row filtering is enabled)
With the current implementation we have two branches after prune blocklet: prune pages and prune rows in parallel, selected by the directVectorFill configuration. The effort to correct the design will be larger, so I think we can raise a JIRA to track the issue and correct it in the near future.

Contributor Author

OK, this would be a big refactoring; we can consider it in a future PR.

numberOfRows[i] = rawBlockletColumnChunks.getDataBlock().getPageRowCount(i);
}
long dimensionReadTime = System.currentTimeMillis();
dimensionReadTime = System.currentTimeMillis() - dimensionReadTime;
Contributor

dimensionReadTime is not measured at the correct place; please compute this time properly.

Contributor Author

ok
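
For illustration, a hedged sketch of the fix being requested: start the timer immediately before the dimension chunk reads and take the delta right after them, so the measured interval actually covers the I/O (the read step shown is a placeholder, not the exact call in this class):

long dimensionReadStart = System.currentTimeMillis();
// ... perform the dimension raw column chunk reads here (placeholder) ...
long dimensionReadTime = System.currentTimeMillis() - dimensionReadStart;
// record dimensionReadTime in the query statistics only after the reads finish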

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/929/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1127/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1227/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1014/

@CarbonDataQA

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9279/

@CarbonDataQA

Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1244/

@CarbonDataQA

Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1032/

@CarbonDataQA

Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9297/

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1034/

@CarbonDataQA

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9299/

@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1246/

public BitSet prunePages(RawBlockletColumnChunks rawBlockletColumnChunks)
throws FilterUnsupportedException, IOException {
BitSet bitSet = new BitSet(rawBlockletColumnChunks.getDataBlock().numberOfPages());
bitSet.set(0, rawBlockletColumnChunks.getDataBlock().numberOfPages());
Contributor

add a local variable for rawBlockletColumnChunks.getDataBlock().numberOfPages()

Contributor Author

ok
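
The suggested refactor of the snippet above, as a hedged sketch: extract the repeated numberOfPages() call into a local variable:

int pageCount = rawBlockletColumnChunks.getDataBlock().numberOfPages();
BitSet bitSet = new BitSet(pageCount);
// select all pages: sets bits [0, pageCount)
bitSet.set(0, pageCount);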

@CarbonDataQA

Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1040/

@kumarvishal09
Contributor

LGTM

@asfgit asfgit closed this in e6d15da Oct 26, 2018
@CarbonDataQA

Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1253/

@CarbonDataQA

Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9305/

asfgit pushed a commit that referenced this pull request Nov 21, 2018
…ill.


This closes #2820