[SPARK-32693][SQL][2.4] Compare two dataframes with same schema except nullable property #29576
What changes were proposed in this pull request?
This PR changes the key data type check in `HashJoin` to use `sameType`. This backports #29555 to branch-2.4.

Why are the changes needed?
The resolution condition of `SetOperation` requires only that each left data type be `sameType` as the corresponding right one. Logically, the `EqualTo` expression in an equi-join likewise requires only that the left data type be `sameType` as the right data type. It therefore seems unreasonable for `HashJoin` to require the left key data types to be exactly the same as the right key data types.

This produces inconsistent results when doing `except` between two dataframes.

If the two dataframes have no nested fields, `HashJoin` passes the key type check even when the nullable property of their fields differs, because it checks each field individually and so the nullable property is ignored.

If the two dataframes have nested fields such as a struct, `HashJoin` fails the key type check, because it now compares two struct types and the nullable property of the nested fields affects the comparison.

Does this PR introduce any user-facing change?
Yes. This makes the `except` operation consistent between dataframes.

How was this patch tested?
Unit test.
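
For readers who want to see why flat keys pass the old check while struct keys fail it, here is a minimal toy model in plain Python. It is not Spark code: `Atomic`, `Field`, `Struct`, `exact`, `same_type`, and `keys_match` are hypothetical names invented for this sketch, and `same_type` only approximates the idea of a comparison that ignores the nullable property.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Atomic:
    """A leaf type such as "int" (hypothetical stand-in for a real data type)."""
    name: str


@dataclass(frozen=True)
class Field:
    """A named struct field; the nullable flag lives here, not on the leaf type."""
    name: str
    dtype: "DataType"
    nullable: bool


@dataclass(frozen=True)
class Struct:
    fields: Tuple[Field, ...]


DataType = Union[Atomic, Struct]


def exact(left: DataType, right: DataType) -> bool:
    """Strict equality: nested nullable flags take part in the comparison."""
    return left == right


def same_type(left: DataType, right: DataType) -> bool:
    """Structural comparison that ignores the nullable flag at every level."""
    if isinstance(left, Struct) and isinstance(right, Struct):
        return len(left.fields) == len(right.fields) and all(
            l.name == r.name and same_type(l.dtype, r.dtype)
            for l, r in zip(left.fields, right.fields)
        )
    return left == right


def keys_match(left_keys, right_keys, compare) -> bool:
    """Compare join-key types pairwise with the given comparison function."""
    return len(left_keys) == len(right_keys) and all(
        compare(l, r) for l, r in zip(left_keys, right_keys)
    )


INT = Atomic("int")

# Flat keys: only the leaf types are compared, so column nullability never appears.
flat_left, flat_right = [INT], [INT]

# Struct keys: the nested field's nullable flag is part of the struct type itself.
nested_left = [Struct((Field("a", INT, nullable=False),))]
nested_right = [Struct((Field("a", INT, nullable=True),))]

print(keys_match(flat_left, flat_right, exact))          # True
print(keys_match(nested_left, nested_right, exact))      # False
print(keys_match(nested_left, nested_right, same_type))  # True
```

In this model the strict check rejects the struct keys solely because of the nested nullable flag, while the nullability-insensitive check accepts them, which mirrors the inconsistency described above.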