-
Notifications
You must be signed in to change notification settings - Fork 984
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DRILL-7240: Catch runtime pruning filter-match exceptions and do not prune these rowgroups #1783
Conversation
matchResult == RowsMatch.NONE ? "Excluded" : "Included", rowGroup.getPath(), rowGroupIndex, footerRowCount, timeToRead); | ||
} catch (ClassCastException cce) { | ||
if ( ! matchCastErrorNotified ) { | ||
logger.info("Run-time pruning check failed due to type casting. Skipping pruning rowgroups starting from {}. (Error: {})", rowGroup.getPath(), cce.getMessage()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the current behavior with non-interesting columns, there could be thousands of row groups which have this same ClassCastException. It would be better to consolidate and generate a single INFO message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, here is a sample from the log (tested with 6 rowgroups/files; three of which fail, or the remaining three one was pruned):
2019-05-06 18:37:19,809 [232f1eb3-2cab-7d16-1222-c05eca0fdceb:frag:0:0] TRACE o.a.d.e.s.p.AbstractParquetScanBatchCreator - ParquetTrace,Read Footer,,/tmp/twofoo/sub/0_0_3.parquet,,0,0,0,170059
2019-05-06 18:37:19,828 [232f1eb3-2cab-7d16-1222-c05eca0fdceb:frag:0:0] TRACE o.a.d.e.s.p.AbstractParquetScanBatchCreator - ParquetTrace,Read Footer,,/tmp/twofoo/sub/0_0_2.parquet,,0,0,0,4691
2019-05-06 18:37:19,829 [232f1eb3-2cab-7d16-1222-c05eca0fdceb:frag:0:0] TRACE o.a.d.e.s.p.AbstractParquetScanBatchCreator - Run-time pruning: Included row-group /tmp/twofoo/sub/0_0_2.parquet (RG index: 0 row count: 2), took 466 usec
2019-05-06 18:37:19,869 [232f1eb3-2cab-7d16-1222-c05eca0fdceb:frag:0:0] TRACE o.a.d.e.s.p.AbstractParquetScanBatchCreator - ParquetTrace,Read Footer,,/tmp/twofoo/sub/0_0_0.parquet,,0,0,0,37316
2019-05-06 18:37:19,870 [232f1eb3-2cab-7d16-1222-c05eca0fdceb:frag:0:0] TRACE o.a.d.e.s.p.AbstractParquetScanBatchCreator - Run-time pruning: Excluded row-group /tmp/twofoo/sub/0_0_0.parquet (RG index: 0 row count: 2), took 683 usec
2019-05-06 18:37:19,909 [232f1eb3-2cab-7d16-1222-c05eca0fdceb:frag:0:0] TRACE o.a.d.e.s.p.AbstractParquetScanBatchCreator - ParquetTrace,Read Footer,,/tmp/twofoo/sub/0_0_1.parquet,,0,0,0,38874
2019-05-06 18:37:19,914 [232f1eb3-2cab-7d16-1222-c05eca0fdceb:frag:0:0] TRACE o.a.d.e.s.p.AbstractParquetScanBatchCreator - ParquetTrace,Read Footer,,/tmp/twofoo/sub/0_0_4.parquet,,0,0,0,2417
2019-05-06 18:37:19,915 [232f1eb3-2cab-7d16-1222-c05eca0fdceb:frag:0:0] TRACE o.a.d.e.s.p.AbstractParquetScanBatchCreator - Run-time pruning: Included row-group /tmp/twofoo/sub/0_0_4.parquet (RG index: 0 row count: 2), took 356 usec
2019-05-06 18:37:19,919 [232f1eb3-2cab-7d16-1222-c05eca0fdceb:frag:0:0] TRACE o.a.d.e.s.p.AbstractParquetScanBatchCreator - ParquetTrace,Read Footer,,/tmp/twofoo/sub/0_0_5.parquet,,0,0,0,2749
2019-05-06 18:37:19,922 [232f1eb3-2cab-7d16-1222-c05eca0fdceb:frag:0:0] INFO o.a.d.e.s.p.AbstractParquetScanBatchCreator - Finished parquet_runtime_pruning in 1505 usec. Out of given 6 rowgroups, 1 were pruned.
2019-05-06 18:37:19,922 [232f1eb3-2cab-7d16-1222-c05eca0fdceb:frag:0:0] INFO o.a.d.e.s.p.AbstractParquetScanBatchCreator - Run-time pruning check skipped for 3 out of 6 rowgroups due to: java.lang.Integer cannot be cast to java.lang.Long
if ( totalPruneTime > 0 ) { | ||
logger.info("Finished parquet_runtime_pruning in {} usec. Out of given {} rowgroups, {} were pruned. {}", totalPruneTime, totalRowgroups, rowgroupsPruned, | ||
totalRowgroups == rowgroupsPruned ? "ALL_PRUNED !!" : ""); | ||
} | ||
if ( countMatchClassCastExceptions > 0 ) { | ||
logger.info("Run-time pruning check skipped for {} out of {} rowgroups due to: {}",countMatchClassCastExceptions, totalRowgroups, matchCastErrorMessage); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor point: The word 'check' should be removed since at least a partial check was performed and the failure happened during that process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A follow-up minor comment. Overall LGTM. +1
…prune these rowgroups
…prune these rowgroups closes apache#1783
…prune these rowgroups closes apache#1783
The basic change is simple: Catch exceptions thrown by the
match()
method checking the filter predicate over a rowgroup's statistics, and just avoid pruning that rowgroup.The current known case of such exception has to do with type casting (Long-Integer). In that case an INFO message is logged (only once per fragment). Note the case of mixed rowgroups/files, where some have the "correct" type and some don't - in that case the rowgroups with the "correct" type (i.e. no exception) would be pruned as needed (I tested that).
In case other exception ever shows up, it would log a WARN message (so eventually this failure could be analyzed).