New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-18058][SQL] Comparing column types ignoring Nullability in Union and SetOperation #15595
Conversation
Test build #67364 has finished for PR 15595 at commit
|
Test build #67367 has finished for PR 15595 at commit
|
@CodingCat I think we need a test maybe (at least the one specified in the JIRA) to confirm this PR fixes that problem and avoid the regression in the future. |
+1 on adding a test. |
Test build #67397 has finished for PR 15595 at commit
|
@@ -377,4 +377,14 @@ class AnalysisSuite extends AnalysisTest { | |||
assertExpressionType(sum(Divide(Decimal(1), 2.0)), DoubleType) | |||
assertExpressionType(sum(Divide(1.0, Decimal(2.0))), DoubleType) | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: delete extra newline
@@ -377,4 +377,14 @@ class AnalysisSuite extends AnalysisTest { | |||
assertExpressionType(sum(Divide(Decimal(1), 2.0)), DoubleType) | |||
assertExpressionType(sum(Divide(1.0, Decimal(2.0))), DoubleType) | |||
} | |||
|
|||
|
|||
test("SPARK-18058: union operations shall not care about the nullability of columns") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR also affects SetOperation
. Could you please also add tests for that ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 (actually, it'd be nicer if it has a end-to-end test as well).
Test build #67416 has finished for PR 15595 at commit
|
LGTM - merging to master. Thanks! |
@CodingCat could you open a backport for branch-2.0? |
sure, doing that now |
Thanks! |
val unionPlan = Union(firstTable, secondTable) | ||
assertAnalysisSuccess(unionPlan) | ||
|
||
val r1 = Except(firstTable, secondTable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this var could have been named better.
Actually, you could have gotten rid of the vars by doing :
assertAnalysisSuccess(Union(firstTable, secondTable))
assertAnalysisSuccess(Except(firstTable, secondTable))
assertAnalysisSuccess(Intersect(firstTable, secondTable))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch, I have followed this in the backport version of the patch
Just saw that this got merged. I had a tiny nit but its ok without it. |
…on and SetOperation ## What changes were proposed in this pull request? The PR tries to fix [SPARK-18058](https://issues.apache.org/jira/browse/SPARK-18058) which refers to a bug that the column types are compared with the extra care about Nullability in Union and SetOperation. This PR converts the columns types by setting all fields as nullable before comparison ## How was this patch tested? regular unit test cases Author: CodingCat <zhunansjtu@gmail.com> Closes apache#15595 from CodingCat/SPARK-18058.
…lity on asNullable datatypes ## What changes were proposed in this pull request? This is absolutely minor. PR #15595 uses `dt1.asNullable == dt2.asNullable` expressions in a few places. It is however more efficient to call `dt1.sameType(dt2)`. I have replaced every instance of the first pattern with the second pattern (3/5 were introduced by #15595). ## How was this patch tested? Existing tests. Author: Herman van Hovell <hvanhovell@databricks.com> Closes #16041 from hvanhovell/SPARK-18058.
…lity on asNullable datatypes ## What changes were proposed in this pull request? This is absolutely minor. PR #15595 uses `dt1.asNullable == dt2.asNullable` expressions in a few places. It is however more efficient to call `dt1.sameType(dt2)`. I have replaced every instance of the first pattern with the second pattern (3/5 were introduced by #15595). ## How was this patch tested? Existing tests. Author: Herman van Hovell <hvanhovell@databricks.com> Closes #16041 from hvanhovell/SPARK-18058. (cherry picked from commit d449988) Signed-off-by: Reynold Xin <rxin@databricks.com>
…lity on asNullable datatypes ## What changes were proposed in this pull request? This is absolutely minor. PR apache#15595 uses `dt1.asNullable == dt2.asNullable` expressions in a few places. It is however more efficient to call `dt1.sameType(dt2)`. I have replaced every instance of the first pattern with the second pattern (3/5 were introduced by apache#15595). ## How was this patch tested? Existing tests. Author: Herman van Hovell <hvanhovell@databricks.com> Closes apache#16041 from hvanhovell/SPARK-18058.
…on and SetOperation ## What changes were proposed in this pull request? The PR tries to fix [SPARK-18058](https://issues.apache.org/jira/browse/SPARK-18058) which refers to a bug that the column types are compared with the extra care about Nullability in Union and SetOperation. This PR converts the columns types by setting all fields as nullable before comparison ## How was this patch tested? regular unit test cases Author: CodingCat <zhunansjtu@gmail.com> Closes apache#15595 from CodingCat/SPARK-18058.
…lity on asNullable datatypes ## What changes were proposed in this pull request? This is absolutely minor. PR apache#15595 uses `dt1.asNullable == dt2.asNullable` expressions in a few places. It is however more efficient to call `dt1.sameType(dt2)`. I have replaced every instance of the first pattern with the second pattern (3/5 were introduced by apache#15595). ## How was this patch tested? Existing tests. Author: Herman van Hovell <hvanhovell@databricks.com> Closes apache#16041 from hvanhovell/SPARK-18058.
What changes were proposed in this pull request?
The PR tries to fix SPARK-18058 which refers to a bug that the column types are compared with the extra care about Nullability in Union and SetOperation.
This PR converts the columns types by setting all fields as nullable before comparison
How was this patch tested?
regular unit test cases