[CARBONDATA-3013] Added support for pruning pages for vector direct fill. #2820
Conversation
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/809/
Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1006/
Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9074/
Force-pushed from 70d467c to 5d20d55 (Compare)
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/815/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9080/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1012/
try {
  chunkReader.decodeColumnPageAndFillVector(this, pageNumber, vectorInfo);
} catch (Exception e) {
  throw new RuntimeException(e);
Why not throw the underlying exception?
Because it is a checked exception; that is why it is wrapped in an unchecked exception.
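A minimal, self-contained sketch of the wrap-checked-in-unchecked pattern being discussed (the class and method names below are illustrative, not the CarbonData ones):

import java.io.IOException;

public class CheckedWrapExample {
  // The callee declares a checked exception...
  static void decodePage(int pageNumber) throws IOException {
    if (pageNumber < 0) {
      throw new IOException("invalid page: " + pageNumber);
    }
  }

  // ...but the caller's signature does not allow one, so the checked
  // exception is wrapped in an unchecked RuntimeException. Passing it
  // as the cause keeps the original stack trace available.
  static void fillVector(int pageNumber) {
    try {
      decodePage(pageNumber);
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    fillVector(0);   // fine
    fillVector(-1);  // throws RuntimeException caused by IOException
  }
}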
@@ -94,7 +95,7 @@ public ColumnPage decodeColumnPage(int pageNumber) {
public ColumnPage convertToColumnPageWithOutCache(int index) {
  assert index < pagesCount;
  // in case of filter query filter columns blocklet pages will uncompressed
-  // so no need to decode again
+  // so no need to decodeAndFillVector again
Seems there is no need to modify this comment.
ok
if (isDimensionPresentInCurrentBlock) {
  int chunkIndex = segmentProperties.getDimensionOrdinalToChunkMapping()
      .get(dimColEvaluatorInfo.getColumnIndex());
  if (null == rawBlockletColumnChunks.getDimensionRawColumnChunks()[chunkIndex]) {
For the exclude filter case there is no need to read the blocklet column data, as we always return true.
It is read to get the page count.
We can get the page count from rawBlockletColumnChunks.getDataBlock().numberOfPages().
ok
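A sketch of the agreed approach: for an exclude filter, select every page using the data block's page count instead of reading the raw column chunk (type and accessor names follow those quoted in this thread; the actual implementation in the PR may differ):

// Exclude filters return true for every row, so every page survives
// pruning; take the page count from the data block rather than reading
// the dimension raw column chunk.
public BitSet prunePages(RawBlockletColumnChunks rawBlockletColumnChunks) {
  int numberOfPages = rawBlockletColumnChunks.getDataBlock().numberOfPages();
  BitSet bitSet = new BitSet(numberOfPages);
  bitSet.set(0, numberOfPages); // mark all pages as valid
  return bitSet;
}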
@@ -179,6 +167,75 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
    return null;
  }

private boolean isScanRequired(DimensionRawColumnChunk dimensionRawColumnChunk, int i) { |
Please change i to columnIndex.
ok
// false, in that scenario the default values of the column should be shown.
// select all rows if dimension does not exists in the current block
if (!isDimensionPresentInCurrentBlock) {
  int i = blockChunkHolder.getDataBlock().numberOfPages();
Change i to numberOfPages.
ok
  }
  return bitSet;
}
return null;
For a dimension/measure column which is not present in the current blocklet (alter scenario), is returning null OK? I think we need to return all the pages.
This case is not supposed to happen; even applyFilter returns null here.
}
long dimensionReadTime = System.currentTimeMillis();
dimensionReadTime = System.currentTimeMillis() - dimensionReadTime;

Please remove empty lines
ok
@@ -148,6 +148,61 @@ private void ifDefaultValueMatchesFilter() {
  return bitSet;
}

@Override
public BitSet prunePages(RawBlockletColumnChunks rawBlockletColumnChunks)
throws FilterUnsupportedException, IOException { |
For all the RowLevelRangeFilters, can we move some part of the code to their superclass to remove the code duplication?
Yes, a lot of code is duplicated across all the range filters; maybe we should combine some of the classes. We can do this refactoring in another PR.
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/886/
Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9151/
Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1084/
Force-pushed from 8bcc094 to af45aad (Compare)
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/899/
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/903/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9167/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/910/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1110/
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/917/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9177/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1117/
@Override
public BitSet prunePages(RawBlockletColumnChunks rawBlockletColumnChunks)
    throws FilterUnsupportedException, IOException {
  return new BitSet();
I think for this operation we need to throw FilterUnsupportedException, similar to the applyFilter implementation.
ok
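A sketch of what the agreed fix might look like, mirroring an applyFilter implementation that rejects evaluation (the message text is illustrative):

// Instead of silently returning an empty BitSet, fail fast the same
// way applyFilter does for executors that cannot evaluate the filter.
@Override
public BitSet prunePages(RawBlockletColumnChunks rawBlockletColumnChunks)
    throws FilterUnsupportedException, IOException {
  throw new FilterUnsupportedException(
      "Page pruning is not supported by this filter executor");
}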
@@ -331,6 +319,80 @@ public BitSetGroup applyFilter(RawBlockletColumnChunks rawBlockletColumnChunks,
  }
}

private boolean isScanRequired(DimensionRawColumnChunk rawColumnChunk, int i) { |
Change i to columnIndex.
ok
if (blockExecutionInfo.isDirectVectorFill()) {
  return executeFilterForPages(rawBlockletColumnChunks);
} else {
  return executeFilter(rawBlockletColumnChunks);
As per the design, I think we should follow the hierarchy below:
prune block -> prune blocklet -> prune pages -> prune rows (if row filtering is enabled)
With the current implementation we have two parallel branches after prune blocklet (prune pages and prune rows), chosen by the directVectorFill configuration. The effort to correct the design will be significant, so I think we can raise a JIRA to track the issue and correct it in the near future.
OK, this would be a big refactoring; we can consider it in a future PR.
  numberOfRows[i] = rawBlockletColumnChunks.getDataBlock().getPageRowCount(i);
}
long dimensionReadTime = System.currentTimeMillis();
dimensionReadTime = System.currentTimeMillis() - dimensionReadTime;
dimensionReadTime is not computed in the correct place; please measure this time properly.
ok
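For reference, the quoted code computes the elapsed time immediately after taking the start timestamp, so it always measures ~0 ms. A runnable sketch of the corrected placement (the sleep stands in for the dimension-read work being timed):

public class TimingExample {
  public static void main(String[] args) throws InterruptedException {
    // Wrong (as in the quoted diff): delta computed immediately, always ~0.
    long wrong = System.currentTimeMillis();
    wrong = System.currentTimeMillis() - wrong;

    // Right: capture the start, do the timed work, then compute the delta.
    long dimensionReadTime = System.currentTimeMillis();
    Thread.sleep(50); // stand-in for reading the dimension chunks
    dimensionReadTime = System.currentTimeMillis() - dimensionReadTime;

    System.out.println("wrong=" + wrong + " ms, right=" + dimensionReadTime + " ms");
  }
}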
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/929/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1127/
Force-pushed from af45aad to 50c9c4b (Compare)
Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1227/
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1014/
Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9279/
Force-pushed from 50c9c4b to d4833f7 (Compare)
Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1244/
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1032/
Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9297/
Force-pushed from d4833f7 to 81fb740 (Compare)
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1034/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9299/
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1246/
public BitSet prunePages(RawBlockletColumnChunks rawBlockletColumnChunks)
    throws FilterUnsupportedException, IOException {
  BitSet bitSet = new BitSet(rawBlockletColumnChunks.getDataBlock().numberOfPages());
  bitSet.set(0, rawBlockletColumnChunks.getDataBlock().numberOfPages());
Add a local variable for rawBlockletColumnChunks.getDataBlock().numberOfPages().
ok
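The suggested refactor, sketched against the quoted diff (same behavior, with the repeated call hoisted into a local variable):

public BitSet prunePages(RawBlockletColumnChunks rawBlockletColumnChunks)
    throws FilterUnsupportedException, IOException {
  // Hoist the repeated numberOfPages() call into a local variable.
  int numberOfPages = rawBlockletColumnChunks.getDataBlock().numberOfPages();
  BitSet bitSet = new BitSet(numberOfPages);
  bitSet.set(0, numberOfPages);
  return bitSet;
}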
Force-pushed from 81fb740 to 58288d6 (Compare)
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1040/
LGTM
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1253/
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9305/
…ill. First, apply page-level pruning using the min/max of each page to get the valid pages of the blocklet. Decompress only the valid pages and fill the vector directly, as in the full scan query scenario. To prune pages before decompressing the data, a new method was added to the FilterExecuter class: BitSet prunePages(RawBlockletColumnChunks rawBlockletColumnChunks) throws FilterUnsupportedException, IOException; This method reads the necessary column chunk metadata and prunes the pages per the min/max metadata. Based on the pruned pages, BlockletScannedResult decompresses and fills the column page data into the vector as described for full scan in the above-mentioned PR. This closes #2820
This PR depends on PR #2819
First, apply page-level pruning using the min/max of each page to get the valid pages of the blocklet. Decompress only the valid pages and fill the vector directly, as described in the full scan query scenario.
To prune pages before decompressing the data, a new method was added to the FilterExecuter class:

BitSet prunePages(RawBlockletColumnChunks rawBlockletColumnChunks) throws FilterUnsupportedException, IOException;

This method reads the necessary column chunk metadata and prunes the pages per the min/max metadata. Based on the pruned pages, BlockletScannedResult decompresses and fills the column page data into the vector, as described for full scan in the above-mentioned PR.
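As a rough illustration of the min/max pruning flow described above (a sketch only: readRawColumnChunk is a hypothetical helper, and the real per-executor implementations differ in how they read chunks and compare values; isScanRequired follows the methods quoted in the review threads):

// Mark only the pages whose [min, max] metadata can satisfy the filter.
public BitSet prunePages(RawBlockletColumnChunks rawBlockletColumnChunks)
    throws FilterUnsupportedException, IOException {
  int numberOfPages = rawBlockletColumnChunks.getDataBlock().numberOfPages();
  BitSet bitSet = new BitSet(numberOfPages);
  DimensionRawColumnChunk rawColumnChunk =
      readRawColumnChunk(rawBlockletColumnChunks); // hypothetical helper
  for (int pageIndex = 0; pageIndex < numberOfPages; pageIndex++) {
    // isScanRequired compares the filter values against the page's
    // min/max metadata; only matching pages are kept for decompression.
    if (isScanRequired(rawColumnChunk, pageIndex)) {
      bitSet.set(pageIndex);
    }
  }
  return bitSet;
}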
Be sure to complete all of the following checklist items to help us incorporate your contribution quickly and easily:
Any interfaces changed?
Any backward compatibility impacted?
Document update required?
Testing done
Please provide details on
- Whether new unit test cases have been added or why no new tests are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance test report.
- Any additional information to help reviewers in testing this change.
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.