[AURON #2369] fix incorrect ORC predicate pushdown with OR #2370

Open
Flyangz wants to merge 1 commit into apache:master from Flyangz:bugfix/open-orc-or-pushdown
@Flyangz Flyangz commented Jul 1, 2026

Contributor

Which issue does this PR close?

Closes #2369

Rationale for this change

Fix ORC predicate pushdown losing data when an OR contains a disjunct that cannot be converted to an ORC predicate.

What changes are included in this PR?

  • Made OR pushdown all-or-nothing: collect_or_predicates now returns bool, and if any disjunct fails to convert, the whole OR is not pushed down (convert_expr_to_orc returns None). Convertible AND conjuncts
    around the OR still push down safely.
  • Added unit tests covering: OR with an unconvertible disjunct, OR whose disjunct is a fully-unconvertible AND, and an unconvertible OR nested in an AND where the sibling conjunct still pushes down.
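The all-or-nothing rule described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `Expr`, `OrcPred`, `collect_or`, and `convert` are hypothetical, simplified stand-ins for the engine's expression types, `collect_or_predicates`, and `convert_expr_to_orc`, and the sketch assumes that `column = literal` is convertible while `column = column` is not.

```rust
// Hypothetical, simplified stand-ins for the engine's expression and ORC
// predicate types. The real types touched by the PR are richer.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// `column = literal`; treated as convertible in this sketch.
    EqLit(String, i64),
    /// `column = column`; treated as NOT convertible in this sketch.
    EqCol(String, String),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

#[derive(Debug, Clone, PartialEq)]
enum OrcPred {
    Eq(String, i64),
    And(Vec<OrcPred>),
    Or(Vec<OrcPred>),
}

/// Flattens nested ORs into `out`. Returns false as soon as one disjunct
/// cannot be converted, which tells the caller to skip the whole OR.
fn collect_or(expr: &Expr, out: &mut Vec<OrcPred>) -> bool {
    match expr {
        Expr::Or(l, r) => collect_or(l, out) && collect_or(r, out),
        other => match convert(other) {
            Some(p) => {
                out.push(p);
                true
            }
            None => false,
        },
    }
}

/// Returns a predicate that is never narrower than `expr`, or None when
/// nothing can be pushed down safely.
fn convert(expr: &Expr) -> Option<OrcPred> {
    match expr {
        Expr::EqLit(col, value) => Some(OrcPred::Eq(col.clone(), *value)),
        Expr::EqCol(_, _) => None,
        Expr::And(l, r) => {
            // Dropping an unconvertible conjunct only widens the filter,
            // so a partial AND is still safe to push down.
            let mut parts = Vec::new();
            for child in [l, r] {
                if let Some(p) = convert(child) {
                    parts.push(p);
                }
            }
            match parts.len() {
                0 => None,
                1 => parts.pop(),
                _ => Some(OrcPred::And(parts)),
            }
        }
        Expr::Or(_, _) => {
            // Dropping a disjunct would narrow the filter, so any failure
            // makes the whole OR unpushable.
            let mut parts = Vec::new();
            if collect_or(expr, &mut parts) {
                Some(OrcPred::Or(parts))
            } else {
                None
            }
        }
    }
}
```

Under these assumptions, `id = 1 OR id = age` converts to `None`, while `(id = 1 OR id = age) AND age = 30` still pushes down `age = 30`, matching the test shapes listed in the description.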

Are there any user-facing changes?

no

How was this patch tested?

unit tests

@lyne7-sc lyne7-sc left a comment

Contributor

Nice fix! One small test coverage thought: maybe consider adding a similar regression test with SCOrExpr(id = 1, id = age). The implementation updates both BinaryExpr::Or and SCOrExpr, while the new mixed convertible/unconvertible OR tests only cover the BinaryExpr::new(..., Operator::Or, ...) path.

While reviewing this, I also noticed a related existing case around NOT: NOT(id = 5 AND unsupported_expr) seems like it could become NOT(id = 5) if the inner AND is partially converted. That would be narrower than the original filter and could prune matching rows. This seems to be an existing issue, but it may be worth a follow-up issue/PR.
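To make the narrowing concrete, here is a small row-level sketch. It is illustrative only: the `Row` struct and both functions are hypothetical, and `id = age` is used as an assumed example of an unsupported expression.

```rust
/// A hypothetical two-column row used only for this illustration.
struct Row {
    id: i64,
    age: i64,
}

/// The original filter: NOT(id = 5 AND id = age).
fn original_filter(row: &Row) -> bool {
    !(row.id == 5 && row.id == row.age)
}

/// What would be pushed down if the unconvertible conjunct were dropped
/// before the negation is applied: NOT(id = 5).
fn pushed_filter(row: &Row) -> bool {
    !(row.id == 5)
}
```

For a row with `id = 5` and `age = 7`, `original_filter` returns true but `pushed_filter` returns false, so pruning based on the pushed predicate could skip a row the query should return.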

@Flyangz Flyangz force-pushed the bugfix/open-orc-or-pushdown branch from d29a43c to fc4377a Compare July 3, 2026 03:59
Flyangz commented Jul 3, 2026

Contributor Author

> Nice fix! One small test coverage thought: maybe consider adding a similar regression test with SCOrExpr(id = 1, id = age). The implementation updates both BinaryExpr::Or and SCOrExpr, while the new mixed convertible/unconvertible OR tests only cover the BinaryExpr::new(..., Operator::Or, ...) path.
>
> While reviewing this, I also noticed a related existing case around NOT: NOT(id = 5 AND unsupported_expr) seems like it could become NOT(id = 5) if the inner AND is partially converted. That would be narrower than the original filter and could prune matching rows. This seems to be an existing issue, but it may be worth a follow-up issue/PR.

Thanks for the review! I have added the corresponding unit test for SCOrExpr as you suggested.

Regarding your second point about the NOT issue, I agree that it should be tracked and fixed in a follow-up issue/PR. That being said, in a typical production environment, the BooleanSimplification rule usually optimizes this away, so the practical impact is likely limited.
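One way to picture why an upstream simplification would hide the problem: if the optimizer pushes NOT inward with De Morgan's laws, the converter never sees a NOT wrapped around an AND. The sketch below is an assumption about the kind of rewrite involved, not Spark's or Auron's actual code; `Expr` and `push_not_inward` are hypothetical names.

```rust
// Hypothetical expression tree. `Leaf` is an opaque predicate such as
// `id = 5`, identified here only by its text.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Leaf(String),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
    Not(Box<Expr>),
}

/// Pushes NOT toward the leaves using De Morgan's laws and removes
/// double negation, so NOT only ever wraps a leaf in the result.
fn push_not_inward(expr: Expr) -> Expr {
    match expr {
        Expr::Not(inner) => match *inner {
            // NOT(a AND b) => NOT a OR NOT b
            Expr::And(l, r) => Expr::Or(
                Box::new(push_not_inward(Expr::Not(l))),
                Box::new(push_not_inward(Expr::Not(r))),
            ),
            // NOT(a OR b) => NOT a AND NOT b
            Expr::Or(l, r) => Expr::And(
                Box::new(push_not_inward(Expr::Not(l))),
                Box::new(push_not_inward(Expr::Not(r))),
            ),
            // NOT(NOT a) => a
            Expr::Not(e) => push_not_inward(*e),
            leaf => Expr::Not(Box::new(leaf)),
        },
        Expr::And(l, r) => Expr::And(
            Box::new(push_not_inward(*l)),
            Box::new(push_not_inward(*r)),
        ),
        Expr::Or(l, r) => Expr::Or(
            Box::new(push_not_inward(*l)),
            Box::new(push_not_inward(*r)),
        ),
        leaf => leaf,
    }
}
```

After such a rewrite, NOT(id = 5 AND unsupported_expr) would arrive as NOT(id = 5) OR NOT(unsupported_expr), which the all-or-nothing OR handling in this PR would decline to push down. Plans that bypass the rewrite would still hit the NOT case directly.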

// Not an OR expression, convert the whole expression as a single disjunct
// (could be AND, comparison, IS NULL, etc.). If it cannot be converted, the
// entire OR is unpushable.
match convert_expr_to_orc(expr, schema) {

Contributor

Could this still have a similar implication issue through NOT? NotExpr converts its child with convert_expr_to_orc and then wraps it in Predicate::not(...), but child conversion can be partial for AND by dropping unconvertible conjuncts. That is safe before negation, but maybe not after it.

For example, NOT(id = 1 AND id = age) could become NOT(id = 1) if id = age is not convertible. That seems narrower than the original predicate and might let ORC pruning skip row groups with valid rows.

Would it be worth making NOT conversion all-or-nothing, or adding a regression test for this shape?
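A possible shape for an all-or-nothing NOT is sketched below. It is not part of this PR: `Expr`, `OrcPred`, `convert`, and `convert_exact` are hypothetical names, and `column = column` again stands in for an unconvertible expression. The idea is to separate a lossy conversion, which may drop conjuncts, from an exact one, and to allow negation only over the exact result.

```rust
// Hypothetical, simplified stand-ins for the engine's types.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// `column = literal`; convertible in this sketch.
    EqLit(String, i64),
    /// `column = column`; NOT convertible in this sketch.
    EqCol(String, String),
    And(Box<Expr>, Box<Expr>),
    Not(Box<Expr>),
}

#[derive(Debug, Clone, PartialEq)]
enum OrcPred {
    Eq(String, i64),
    And(Vec<OrcPred>),
    Not(Box<OrcPred>),
}

/// Exact conversion: Some only when every part of `expr` converts, so the
/// result is equivalent to `expr` and can be negated safely.
fn convert_exact(expr: &Expr) -> Option<OrcPred> {
    match expr {
        Expr::EqLit(col, value) => Some(OrcPred::Eq(col.clone(), *value)),
        Expr::EqCol(_, _) => None,
        Expr::And(l, r) => Some(OrcPred::And(vec![convert_exact(l)?, convert_exact(r)?])),
        Expr::Not(child) => Some(OrcPred::Not(Box::new(convert_exact(child)?))),
    }
}

/// Lossy conversion: may drop unconvertible conjuncts, which only widens
/// the filter. NOT is converted only when its child converts exactly.
fn convert(expr: &Expr) -> Option<OrcPred> {
    match expr {
        Expr::And(l, r) => match (convert(l), convert(r)) {
            (Some(a), Some(b)) => Some(OrcPred::And(vec![a, b])),
            (Some(p), None) | (None, Some(p)) => Some(p),
            (None, None) => None,
        },
        Expr::Not(child) => convert_exact(child).map(|p| OrcPred::Not(Box::new(p))),
        leaf => convert_exact(leaf),
    }
}
```

With this split, NOT(id = 1 AND id = age) is not pushed down at all, NOT(id = 1) still is, and a convertible sibling conjunct around an unpushable NOT keeps pushing down, mirroring the behaviour this PR gives OR.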

Development

Successfully merging this pull request may close these issues.

bug: incorrect ORC predicate pushdown with OR

3 participants