Skip to content

feat: rebalance associative bitwise/Add/Multiply chains to avoid protobuf recursion limit#4588

Open
schenksj wants to merge 1 commit into
apache:mainfrom
schenksj:fix/4577-rebalance-associative-binary
Open

feat: rebalance associative bitwise/Add/Multiply chains to avoid protobuf recursion limit#4588
schenksj wants to merge 1 commit into
apache:mainfrom
schenksj:fix/4577-rebalance-associative-binary

Conversation

@schenksj
Copy link
Copy Markdown
Contributor

@schenksj schenksj commented Jun 4, 2026

Closes #4577.

Follow-up to #4531 (deep And/Or chains). Protobuf's recursion limit (100) applies to
any deeply nested BinaryExpr, so a long left-deep chain of other associative operators
overflows the same way when the serialized plan is re-parsed.

What

Extend the rebalancing (flattenAssociative + a balanced O(log n)-depth tree) to:

  • BitwiseAnd / BitwiseOr / BitwiseXor — always integral and exactly associative, so
    they reuse the existing createBalancedBinaryExpr directly.
  • Add / Multiply — gated via isAssociativeAndRebalanceable to integral types in
    LEGACY (wrapping, modular) eval mode, the only exactly-associative case. Float isn't
    associative (Spark's ReorderAssociativeOperator excludes it too); ANSI/TRY make
    integer-overflow position observable and grouping changes it; decimal precision grows
    per op. Those keep the existing left-deep serialization. Add/Multiply emit a
    MathExpr (eval_mode + return_type) rather than a BinaryExpr, so a new
    createBalancedMathExpr builds the balanced tree with the chain's uniform type and
    eval mode at every inner node.

Tests

Mirror #4531: project 200-deep chains and assert Comet runs them natively with results
matching Spark (which also verifies the associativity guarantee).

)

Follow-up to apache#4531 (deep And/Or chains). Protobuf's recursion limit (100) applies
to any deeply nested BinaryExpr, so a long left-deep chain of other associative
operators overflows the same way when the serialized plan is re-parsed.

Extend the rebalancing (flattenAssociative + a balanced O(log n)-depth tree) to:
  - BitwiseAnd / BitwiseOr / BitwiseXor: always integral and exactly associative,
    so they reuse the existing createBalancedBinaryExpr directly.
  - Add / Multiply: gated via isAssociativeAndRebalanceable to integral types in
    LEGACY (wrapping, modular) eval mode -- the only exactly-associative case. Float
    isn't associative (Spark's ReorderAssociativeOperator excludes it too); ANSI/TRY
    make integer overflow position (which the grouping changes) observable; decimal
    precision grows per op. Those keep the existing left-deep serialization. Add and
    Multiply emit a MathExpr (eval_mode + return_type) rather than a BinaryExpr, so a
    new createBalancedMathExpr builds the balanced tree with the chain's uniform type
    and eval mode at every inner node.

Tests mirror apache#4531: project 200-deep chains and assert Comet runs them natively with
results matching Spark (which also verifies the associativity guarantee).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Rebalance deep associative binary expression chains (Add, Multiply, bitwise) to avoid protobuf recursion limit

1 participant