Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRILL-7240: Catch runtime pruning filter-match exceptions and do not prune these rowgroups #1783

Closed
wants to merge 1 commit into from

Conversation

Ben-Zvi
Copy link
Contributor

@Ben-Zvi Ben-Zvi commented May 6, 2019

The basic change is simple: Catch exceptions thrown by the match() method checking the filter predicate over a rowgroup's statistics, and just avoid pruning that rowgroup.
The current known case of such exception has to do with type casting (Long-Integer). In that case an INFO message is logged (only once per fragment). Note the case of mixed rowgroups/files, where some have the "correct" type and some don't - in that case the rowgroups with the "correct" type (i.e. no exception) would be pruned as needed (I tested that).
In case other exception ever shows up, it would log a WARN message (so eventually this failure could be analyzed).

@Ben-Zvi Ben-Zvi requested a review from amansinha100 May 6, 2019 20:56
matchResult == RowsMatch.NONE ? "Excluded" : "Included", rowGroup.getPath(), rowGroupIndex, footerRowCount, timeToRead);
} catch (ClassCastException cce) {
if ( ! matchCastErrorNotified ) {
logger.info("Run-time pruning check failed due to type casting. Skipping pruning rowgroups starting from {}. (Error: {})", rowGroup.getPath(), cce.getMessage());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the current behavior with non-interesting columns, there could be thousands of row groups which have this same ClassCastException. It would be better to consolidate and generate a single INFO message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, here is a sample from the log (tested with 6 rowgroups/files; three of which fail, or the remaining three one was pruned):

2019-05-06 18:37:19,809 [232f1eb3-2cab-7d16-1222-c05eca0fdceb:frag:0:0] TRACE o.a.d.e.s.p.AbstractParquetScanBatchCreator - ParquetTrace,Read Footer,,/tmp/twofoo/sub/0_0_3.parquet,,0,0,0,170059
2019-05-06 18:37:19,828 [232f1eb3-2cab-7d16-1222-c05eca0fdceb:frag:0:0] TRACE o.a.d.e.s.p.AbstractParquetScanBatchCreator - ParquetTrace,Read Footer,,/tmp/twofoo/sub/0_0_2.parquet,,0,0,0,4691
2019-05-06 18:37:19,829 [232f1eb3-2cab-7d16-1222-c05eca0fdceb:frag:0:0] TRACE o.a.d.e.s.p.AbstractParquetScanBatchCreator - Run-time pruning: Included row-group /tmp/twofoo/sub/0_0_2.parquet (RG index: 0 row count: 2), took 466 usec
2019-05-06 18:37:19,869 [232f1eb3-2cab-7d16-1222-c05eca0fdceb:frag:0:0] TRACE o.a.d.e.s.p.AbstractParquetScanBatchCreator - ParquetTrace,Read Footer,,/tmp/twofoo/sub/0_0_0.parquet,,0,0,0,37316
2019-05-06 18:37:19,870 [232f1eb3-2cab-7d16-1222-c05eca0fdceb:frag:0:0] TRACE o.a.d.e.s.p.AbstractParquetScanBatchCreator - Run-time pruning: Excluded row-group /tmp/twofoo/sub/0_0_0.parquet (RG index: 0 row count: 2), took 683 usec
2019-05-06 18:37:19,909 [232f1eb3-2cab-7d16-1222-c05eca0fdceb:frag:0:0] TRACE o.a.d.e.s.p.AbstractParquetScanBatchCreator - ParquetTrace,Read Footer,,/tmp/twofoo/sub/0_0_1.parquet,,0,0,0,38874
2019-05-06 18:37:19,914 [232f1eb3-2cab-7d16-1222-c05eca0fdceb:frag:0:0] TRACE o.a.d.e.s.p.AbstractParquetScanBatchCreator - ParquetTrace,Read Footer,,/tmp/twofoo/sub/0_0_4.parquet,,0,0,0,2417
2019-05-06 18:37:19,915 [232f1eb3-2cab-7d16-1222-c05eca0fdceb:frag:0:0] TRACE o.a.d.e.s.p.AbstractParquetScanBatchCreator - Run-time pruning: Included row-group /tmp/twofoo/sub/0_0_4.parquet (RG index: 0 row count: 2), took 356 usec
2019-05-06 18:37:19,919 [232f1eb3-2cab-7d16-1222-c05eca0fdceb:frag:0:0] TRACE o.a.d.e.s.p.AbstractParquetScanBatchCreator - ParquetTrace,Read Footer,,/tmp/twofoo/sub/0_0_5.parquet,,0,0,0,2749
2019-05-06 18:37:19,922 [232f1eb3-2cab-7d16-1222-c05eca0fdceb:frag:0:0] INFO  o.a.d.e.s.p.AbstractParquetScanBatchCreator - Finished parquet_runtime_pruning in 1505 usec. Out of given 6 rowgroups, 1 were pruned. 
2019-05-06 18:37:19,922 [232f1eb3-2cab-7d16-1222-c05eca0fdceb:frag:0:0] INFO  o.a.d.e.s.p.AbstractParquetScanBatchCreator - Run-time pruning check skipped for 3 out of 6 rowgroups due to: java.lang.Integer cannot be cast to java.lang.Long

if ( totalPruneTime > 0 ) {
logger.info("Finished parquet_runtime_pruning in {} usec. Out of given {} rowgroups, {} were pruned. {}", totalPruneTime, totalRowgroups, rowgroupsPruned,
totalRowgroups == rowgroupsPruned ? "ALL_PRUNED !!" : "");
}
if ( countMatchClassCastExceptions > 0 ) {
logger.info("Run-time pruning check skipped for {} out of {} rowgroups due to: {}",countMatchClassCastExceptions, totalRowgroups, matchCastErrorMessage);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor point: The word 'check' should be removed since at least a partial check was performed and the failure happened during that process.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, done

Copy link

@amansinha100 amansinha100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A follow-up minor comment. Overall LGTM. +1

@Ben-Zvi Ben-Zvi closed this in e5e9b35 May 9, 2019
lushuifeng pushed a commit to lushuifeng/drill that referenced this pull request Jun 21, 2019
xiangt920 pushed a commit to xiangt920/drill that referenced this pull request Dec 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants