[SPARK-36352][SQL][3.0] Spark should check result plan's output schema name #33764
FYI @cloud-fan
Kubernetes integration test starting
Kubernetes integration test status success
Test build #142560 has finished for PR 33764 at commit
retest this please
Kubernetes integration test starting
Kubernetes integration test status success
Test build #142737 has finished for PR 33764 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #142792 has finished for PR 33764 at commit
ping @cloud-fan
thanks, merging to 3.0!
Closes #33764 from AngersZhuuuu/SPARK-36352-3.0.
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
Spark should check the result plan's output schema names.
Why are the changes needed?
In the current code, some optimizer rules may change a plan's output schema. The output is only checked for semantic equality, which ignores attribute names, so a rule can change the output column names without the check noticing.
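A minimal sketch of why a semantic-equality check misses a renamed column, written in Python rather than Spark's Scala. `Attribute`, `semantic_equals`, `same_output_semantically` and `same_output_names` are hypothetical stand-ins, not Catalyst APIs; they model only the assumption that an attribute's identity is its expression ID and that semantic equality ignores its name.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Attribute:
    """Hypothetical, simplified output column: a name plus a unique expression ID."""
    name: str
    expr_id: int

    def semantic_equals(self, other: "Attribute") -> bool:
        # Models semantic equality: identity is the expression ID, the name is ignored.
        return self.expr_id == other.expr_id


def same_output_semantically(before: List[Attribute], after: List[Attribute]) -> bool:
    """Models the existing check: same columns, position by position, names ignored."""
    return len(before) == len(after) and all(
        b.semantic_equals(a) for b, a in zip(before, after)
    )


def same_output_names(before: List[Attribute], after: List[Attribute]) -> bool:
    """The extra check argued for here: the user-visible names must match too."""
    return [b.name for b in before] == [a.name for a in after]


# Output of Project[a, B] before SchemaPruning and of Project[A, b] after it.
# Each column keeps its expression ID (1 and 2); only the spelling of the name changes.
before = [Attribute("a", 1), Attribute("B", 2)]
after = [Attribute("A", 1), Attribute("b", 2)]

print(same_output_semantically(before, after))  # True
print(same_output_names(before, after))         # False
```

The semantic check passes while the name check fails, which is exactly the gap that lets a renamed schema slip through.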
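For example, with SchemaPruning, given the plan

```
Project[a, B]
|--Scan[A, b, c]
```

the original output schema is `a, B`. After SchemaPruning it becomes

```
Project[A, b]
|--Scan[A, b]
```

This changes the plan's output schema. With CTAS, the created table's schema is the same as the query plan's output.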
Because the optimizer changed the schema, the table is no longer consistent with the original SQL. So we need to check the final result plan's schema against the original plan's schema.
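The problem and the check can be sketched end to end in the same kind of simplified Python model, mirroring the `Project[a, B]` over `Scan[A, b, c]` example. Everything here is hypothetical: `prune_and_rebuild` stands in for a pruning rule that rebuilds the projection from the pruned scan's attributes (matched by expression ID, so the scan's spelling of each name wins), and `check_output_names` shows one possible reaction to a mismatch, re-aliasing each column to its original name. It illustrates the idea, not the code this PR actually changed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Attribute:
    """Hypothetical, simplified output column: a name plus a unique expression ID."""
    name: str
    expr_id: int


def prune_and_rebuild(
    project_list: List[Attribute], scan_output: List[Attribute]
) -> Tuple[List[Attribute], List[Attribute]]:
    """Hypothetical pruning step: drop unreferenced scan columns, then rebuild the
    projection from the pruned scan's attributes, matched by expression ID."""
    needed = {a.expr_id for a in project_list}
    pruned_scan = [s for s in scan_output if s.expr_id in needed]
    by_id = {s.expr_id: s for s in pruned_scan}
    # The rebuilt projection takes each attribute, including its name, from the scan.
    new_project = [by_id[a.expr_id] for a in project_list]
    return new_project, pruned_scan


def check_output_names(
    original: List[Attribute], optimized: List[Attribute]
) -> Tuple[List[Attribute], List[str]]:
    """Hypothetical final check: if the optimized output names differ from the
    original ones, build a projection that re-aliases them. Returns the final
    output and the projection list (empty when no projection is needed)."""
    if len(original) != len(optimized):
        raise ValueError("optimized plan changed the number of output columns")
    if [a.name for a in original] == [a.name for a in optimized]:
        return optimized, []
    projection = [
        p.name if p.name == o.name else f"{p.name} AS {o.name}"
        for o, p in zip(original, optimized)
    ]
    # Keep each expression ID, restore the name the query asked for.
    final_output = [Attribute(o.name, p.expr_id) for o, p in zip(original, optimized)]
    return final_output, projection


# Project[a, B] over Scan[A, b, c]; a/A share ID 1, B/b share ID 2.
scan = [Attribute("A", 1), Attribute("b", 2), Attribute("c", 3)]
project = [Attribute("a", 1), Attribute("B", 2)]

new_project, pruned_scan = prune_and_rebuild(project, scan)
print([a.name for a in pruned_scan])   # ['A', 'b']
print([a.name for a in new_project])   # ['A', 'b'], the names changed

final_output, projection = check_output_names(project, new_project)
print(projection)                      # ['A AS a', 'b AS B']
print([a.name for a in final_output])  # ['a', 'B'], the schema a CTAS table should get
```

Re-aliasing is only one option; a stricter variant could raise an error on mismatch instead, which would surface the offending rule during testing rather than silently fixing its output.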
Does this PR introduce any user-facing change?
No
How was this patch tested?
Existing unit tests.