[SPARK-49618][SQL]: Union & UnionExec nodes equality not take into account unaligned positions of branches causing cache miss and non reuse of exchange #48094

Closed
ahshahid wants to merge 11 commits into apache:master from ahshahid:SPARK-49618

@ahshahid
Contributor

What changes were proposed in this pull request?

This PR introduces a trait, UnionEquality, which is implemented by the Union and UnionExec nodes. It provides an equality check that compares the legs of a Union in an order-agnostic manner, and a hashCode that is independent of the order of the legs. The equality check also verifies that the output attributes of the head nodes match in name, data type, metadata, nullability, etc. (but not in exprIDs).
Converting the sequence of legs into a Set to get an order-agnostic hashCode does mean that Seq(leg1, leg2) and Seq(leg1, leg2, leg2) produce the same hashCode. That should not cause a logical problem, because equality also checks the number of legs.
If we want to avoid the hash collision in that situation, the code can be changed to:
Objects.hashCode(Seq(leg1, leg2).map(_.hashCode).sorted: _*)
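
The trade-off between the two hashing approaches can be illustrated with a minimal, Spark-free sketch. Everything here is hypothetical and for illustration only: `Leg` is a stand-in for a child plan, and `setHash`, `sortedHash`, and `sameLegs` are not names from this PR. The multiset comparison in `sameLegs` is an assumption about one reasonable way to do an order-agnostic check, not necessarily how UnionEquality implements it.

```scala
// Hypothetical stand-in for a Union child plan; not a Spark class.
final case class Leg(name: String)

object UnionHashSketch {
  // Set-based hash: independent of leg order, but duplicate legs collapse,
  // so Seq(l1, l2) and Seq(l1, l2, l2) hash to the same value.
  def setHash(legs: Seq[Leg]): Int = legs.toSet.hashCode()

  // Sorted-hash variant: still independent of leg order, but each duplicate
  // leg contributes its own element to the hashed sequence.
  def sortedHash(legs: Seq[Leg]): Int = legs.map(_.hashCode).sorted.hashCode()

  // Order-agnostic equality sketched as a multiset comparison
  // (assumption: the PR's actual check may be structured differently).
  def sameLegs(a: Seq[Leg], b: Seq[Leg]): Boolean = {
    def counts(xs: Seq[Leg]): Map[Leg, Int] =
      xs.groupBy(identity).map { case (leg, group) => leg -> group.size }
    a.length == b.length && counts(a) == counts(b)
  }
}
```

With `l1 = Leg("leg1")` and `l2 = Leg("leg2")`, `setHash` returns the same value for `Seq(l1, l2)` and `Seq(l1, l2, l2)`, while `sameLegs` still reports them as different because the lengths differ. A hash collision of this kind only costs an extra equality check; it does not make unequal plans compare equal.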

Why are the changes needed?

Equality of Union nodes is currently sensitive to the order of the legs. Changing the order of the legs therefore causes cache misses and prevents reuse of exchanges, because the canonicalized plans do not match.
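
As a rough illustration of the problem, the following sketch uses plain Scala collections rather than Spark classes. `Branch`, `NaiveUnion`, and the `Map` used as a cache are hypothetical stand-ins; the only point is that default case class equality over a `Seq` of children is order sensitive, so a lookup keyed on the reordered node misses.

```scala
// Hypothetical stand-ins; these are not Spark classes.
final case class Branch(name: String)

// Default case class equality compares `children` as an ordered Seq.
final case class NaiveUnion(children: Seq[Branch])

object CacheMissSketch {
  val a: Branch = Branch("a")
  val b: Branch = Branch("b")

  // A plain Map standing in for a cache keyed by plan.
  val cache: Map[NaiveUnion, String] =
    Map(NaiveUnion(Seq(a, b)) -> "cached result")

  // Same legs in the same order: hit.
  val hit: Option[String] = cache.get(NaiveUnion(Seq(a, b)))

  // Same legs in a different order: miss, because Seq equality is ordered.
  val miss: Option[String] = cache.get(NaiveUnion(Seq(b, a)))
}
```

Here `hit` is `Some("cached result")` and `miss` is `None`, even though both keys describe a union of the same two legs.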

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added tests that check the equality of Union and UnionExec nodes whose legs are in a different order.
Added a test that verifies cache lookup of InMemoryRelation and reuse of exchange.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Sep 12, 2024
@ahshahid ahshahid closed this by deleting the head repository Jun 4, 2025