[AURON #2369] fix incorrect ORC predicate pushdown with OR #2370
lyne7-sc
left a comment
Nice fix! One small test coverage thought: maybe consider adding a similar regression test with SCOrExpr(id = 1, id = age). The implementation updates both BinaryExpr::Or and SCOrExpr, while the new mixed convertible/unconvertible OR tests only cover the BinaryExpr::new(..., Operator::Or, ...) path.
While reviewing this, I also noticed a related existing case around NOT: NOT(id = 5 AND unsupported_expr) seems like it could become NOT(id = 5) if the inner AND is partially converted. That would be narrower than the original filter and could prune matching rows. This seems to be an existing issue, but it may be worth a follow-up issue/PR.
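To make the NOT concern concrete, here is a minimal sketch that checks the two filters against individual rows. Everything in it is hypothetical and independent of the Auron code: `Row`, `original_filter`, `pushed_filter`, and `pushdown_is_safe_for` are illustration-only names, and `id = age` stands in for `unsupported_expr`.

```rust
/// Hypothetical row with the two columns used in the discussion.
#[derive(Clone, Copy)]
struct Row {
    id: i64,
    age: i64,
}

/// The filter the query asked for: NOT(id = 5 AND id = age).
/// `id = age` plays the role of the unconvertible conjunct.
fn original_filter(r: Row) -> bool {
    !(r.id == 5 && r.id == r.age)
}

/// What would reach the ORC reader if the unconvertible conjunct were
/// dropped before the negation was applied: NOT(id = 5).
fn pushed_filter(r: Row) -> bool {
    !(r.id == 5)
}

/// A pushed-down filter is safe only if every row that matches the
/// original filter also passes the pushed-down one.
fn pushdown_is_safe_for(rows: &[Row]) -> bool {
    rows.iter().all(|&r| !original_filter(r) || pushed_filter(r))
}
```

A row with `id = 5, age = 7` satisfies the original filter but fails `NOT(id = 5)`, so `pushdown_is_safe_for` returns `false` for it. That is the row a reader could prune by mistake.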
d29a43c to fc4377a
Thanks for the review! I have added the corresponding unit test for SCOrExpr as you suggested. Regarding your second point about the NOT issue, I agree that it should be tracked and fixed in a follow-up issue/PR. That being said, in a typical production environment, the |
// Not an OR expression, convert the whole expression as a single disjunct
// (could be AND, comparison, IS NULL, etc.). If it cannot be converted, the
// entire OR is unpushable.
match convert_expr_to_orc(expr, schema) {
Could this still have a similar implication issue through NOT? NotExpr converts its child with convert_expr_to_orc and then wraps it in Predicate::not(...), but child conversion can be partial for AND by dropping unconvertible conjuncts. That is safe before negation, but maybe not after it.
For example, NOT(id = 1 AND id = age) could become NOT(id = 1) if id = age is not convertible. That seems narrower than the original predicate and might let ORC pruning skip row groups with valid rows.
Would it be worth making NOT conversion all-or-nothing, or adding a regression test for this shape?
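One possible shape for an all-or-nothing NOT, sketched on a toy expression tree rather than the real Auron or ORC types. `Expr`, `Pred`, `Mode`, and this `convert` function are all hypothetical. The idea is that AND may drop unconvertible conjuncts only in a "widening" mode, and NOT always converts its child in an "exact" mode.

```rust
/// Toy source expression (hypothetical, not the DataFusion types).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Eq(&'static str, i64), // column = literal, convertible
    Unsupported,           // e.g. column = column, not convertible
    And(Box<Expr>, Box<Expr>),
    Not(Box<Expr>),
}

/// Toy pushed-down predicate (hypothetical, not the ORC predicate type).
#[derive(Debug, Clone, PartialEq)]
enum Pred {
    Eq(&'static str, i64),
    And(Vec<Pred>),
    Not(Box<Pred>),
}

/// Widening: the result may match more rows than the input expression.
/// Exact: the result must be equivalent to the input, or conversion fails.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Widening,
    Exact,
}

fn convert(e: &Expr, mode: Mode) -> Option<Pred> {
    match e {
        Expr::Eq(col, v) => Some(Pred::Eq(*col, *v)),
        Expr::Unsupported => None,
        Expr::And(l, r) => match (convert(l, mode), convert(r, mode)) {
            (Some(a), Some(b)) => Some(Pred::And(vec![a, b])),
            // Dropping a conjunct only widens the filter, so it is
            // allowed in widening mode and rejected in exact mode.
            (Some(a), None) | (None, Some(a)) if mode == Mode::Widening => Some(a),
            _ => None,
        },
        // Negation turns "wider" into "narrower", so the child must be
        // converted exactly or the whole NOT is not pushed down.
        Expr::Not(child) => convert(child, Mode::Exact).map(|p| Pred::Not(Box::new(p))),
    }
}
```

With this sketch, `NOT(id = 1 AND <unsupported>)` converts to `None`, while `id = 1 AND <unsupported>` at the top level still pushes down `id = 1`. The approach is deliberately conservative: a double negation over a partially convertible AND is also rejected, even though a smarter converter could push part of it.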
Which issue does this PR close?
Closes #2369
Rationale for this change
Fix cases where ORC predicate pushdown with OR loses data.
What changes are included in this PR?
around the OR still push down safely.
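As a rough illustration of the behavior described in this PR, the sketch below uses a toy expression tree, not the actual implementation: `Expr`, `Pred`, `collect_disjuncts`, and this `convert` are hypothetical names. It assumes the rule stated in the code comment under review: every disjunct of an OR must convert, otherwise the whole OR is unpushable, while an enclosing AND may still push down its other conjuncts.

```rust
/// Toy source expression (hypothetical, not the DataFusion types).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Eq(&'static str, i64), // column = literal, convertible
    Unsupported,           // e.g. column = column, not convertible
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

/// Toy pushed-down predicate (hypothetical, not the ORC predicate type).
#[derive(Debug, Clone, PartialEq)]
enum Pred {
    Eq(&'static str, i64),
    And(Vec<Pred>),
    Or(Vec<Pred>),
}

/// Flatten nested ORs into a list of disjuncts.
fn collect_disjuncts<'a>(e: &'a Expr, out: &mut Vec<&'a Expr>) {
    match e {
        Expr::Or(l, r) => {
            collect_disjuncts(l, out);
            collect_disjuncts(r, out);
        }
        other => out.push(other),
    }
}

fn convert(e: &Expr) -> Option<Pred> {
    match e {
        Expr::Eq(col, v) => Some(Pred::Eq(*col, *v)),
        Expr::Unsupported => None,
        // AND: dropping an unconvertible conjunct only widens the filter.
        Expr::And(l, r) => match (convert(l), convert(r)) {
            (Some(a), Some(b)) => Some(Pred::And(vec![a, b])),
            (Some(a), None) | (None, Some(a)) => Some(a),
            (None, None) => None,
        },
        // OR: dropping a disjunct would narrow the filter, so a single
        // unconvertible disjunct makes the whole OR unpushable.
        Expr::Or(_, _) => {
            let mut parts = Vec::new();
            collect_disjuncts(e, &mut parts);
            let converted: Option<Vec<Pred>> = parts.into_iter().map(convert).collect();
            converted.map(Pred::Or)
        }
    }
}
```

In this model, `id = 1 OR <unsupported>` yields `None`, but `age = 3 AND (id = 1 OR <unsupported>)` still pushes down `age = 3`.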
Are there any user-facing changes?
no
How was this patch tested?
unit tests