Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-18058][SQL] Comparing column types ignoring Nullability in Union and SetOperation #15595

Closed
wants to merge 6 commits into from

Conversation

CodingCat
Copy link
Contributor

@CodingCat CodingCat commented Oct 22, 2016

What changes were proposed in this pull request?

The PR tries to fix SPARK-18058 which refers to a bug that the column types are compared with the extra care about Nullability in Union and SetOperation.

This PR converts the columns types by setting all fields as nullable before comparison

How was this patch tested?

regular unit test cases

@CodingCat CodingCat changed the title [SPARK-18058][SQL] Comparing column types ignoring Nullability in BinaryOperator [SPARK-18058][SQL] Comparing column types ignoring Nullability in Union and SetOperation Oct 22, 2016
@SparkQA
Copy link

SparkQA commented Oct 22, 2016

Test build #67364 has finished for PR 15595 at commit c62f84f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 22, 2016

Test build #67367 has finished for PR 15595 at commit 4a79782.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

@CodingCat I think we need a test maybe (at least the one specified in the JIRA) to confirm this PR fixes that problem and avoid the regression in the future.

@hvanhovell
Copy link
Contributor

+1 on adding a test.

@SparkQA
Copy link

SparkQA commented Oct 23, 2016

Test build #67397 has finished for PR 15595 at commit e7b5a9b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -377,4 +377,14 @@ class AnalysisSuite extends AnalysisTest {
assertExpressionType(sum(Divide(Decimal(1), 2.0)), DoubleType)
assertExpressionType(sum(Divide(1.0, Decimal(2.0))), DoubleType)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: delete extra newline

@@ -377,4 +377,14 @@ class AnalysisSuite extends AnalysisTest {
assertExpressionType(sum(Divide(Decimal(1), 2.0)), DoubleType)
assertExpressionType(sum(Divide(1.0, Decimal(2.0))), DoubleType)
}


test("SPARK-18058: union operations shall not care about the nullability of columns") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR also affects SetOperation. Could you please also add tests for that ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 (actually, it'd be nicer if it has a end-to-end test as well).

@SparkQA
Copy link

SparkQA commented Oct 23, 2016

Test build #67416 has finished for PR 15595 at commit e691714.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Copy link
Contributor

LGTM - merging to master. Thanks!

@asfgit asfgit closed this in a81fba0 Oct 23, 2016
@hvanhovell
Copy link
Contributor

@CodingCat could you open a backport for branch-2.0?

@CodingCat
Copy link
Contributor Author

sure, doing that now

@hvanhovell
Copy link
Contributor

Thanks!

val unionPlan = Union(firstTable, secondTable)
assertAnalysisSuccess(unionPlan)

val r1 = Except(firstTable, secondTable)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this var could have been named better.

Actually, you could have gotten rid of the vars by doing :

assertAnalysisSuccess(Union(firstTable, secondTable))
assertAnalysisSuccess(Except(firstTable, secondTable))
assertAnalysisSuccess(Intersect(firstTable, secondTable))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch, I have followed this in the backport version of the patch

@tejasapatil
Copy link
Contributor

Just saw that this got merged. I had a tiny nit but its ok without it.

robert3005 pushed a commit to palantir/spark that referenced this pull request Nov 1, 2016
…on and SetOperation

## What changes were proposed in this pull request?

The PR tries to fix [SPARK-18058](https://issues.apache.org/jira/browse/SPARK-18058) which refers to a bug that the column types are compared with the extra care about Nullability in Union and SetOperation.

This PR converts the columns types by setting all fields as nullable before comparison

## How was this patch tested?

regular unit test cases

Author: CodingCat <zhunansjtu@gmail.com>

Closes apache#15595 from CodingCat/SPARK-18058.
asfgit pushed a commit that referenced this pull request Nov 29, 2016
…lity on asNullable datatypes

## What changes were proposed in this pull request?
This is absolutely minor. PR #15595 uses `dt1.asNullable == dt2.asNullable` expressions in a few places. It is however more efficient to call `dt1.sameType(dt2)`. I have replaced every instance of the first pattern with the second pattern (3/5 were introduced by #15595).

## How was this patch tested?
Existing tests.

Author: Herman van Hovell <hvanhovell@databricks.com>

Closes #16041 from hvanhovell/SPARK-18058.
asfgit pushed a commit that referenced this pull request Nov 29, 2016
…lity on asNullable datatypes

## What changes were proposed in this pull request?
This is absolutely minor. PR #15595 uses `dt1.asNullable == dt2.asNullable` expressions in a few places. It is however more efficient to call `dt1.sameType(dt2)`. I have replaced every instance of the first pattern with the second pattern (3/5 were introduced by #15595).

## How was this patch tested?
Existing tests.

Author: Herman van Hovell <hvanhovell@databricks.com>

Closes #16041 from hvanhovell/SPARK-18058.

(cherry picked from commit d449988)
Signed-off-by: Reynold Xin <rxin@databricks.com>
robert3005 pushed a commit to palantir/spark that referenced this pull request Dec 2, 2016
…lity on asNullable datatypes

## What changes were proposed in this pull request?
This is absolutely minor. PR apache#15595 uses `dt1.asNullable == dt2.asNullable` expressions in a few places. It is however more efficient to call `dt1.sameType(dt2)`. I have replaced every instance of the first pattern with the second pattern (3/5 were introduced by apache#15595).

## How was this patch tested?
Existing tests.

Author: Herman van Hovell <hvanhovell@databricks.com>

Closes apache#16041 from hvanhovell/SPARK-18058.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…on and SetOperation

## What changes were proposed in this pull request?

The PR tries to fix [SPARK-18058](https://issues.apache.org/jira/browse/SPARK-18058) which refers to a bug that the column types are compared with the extra care about Nullability in Union and SetOperation.

This PR converts the columns types by setting all fields as nullable before comparison

## How was this patch tested?

regular unit test cases

Author: CodingCat <zhunansjtu@gmail.com>

Closes apache#15595 from CodingCat/SPARK-18058.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…lity on asNullable datatypes

## What changes were proposed in this pull request?
This is absolutely minor. PR apache#15595 uses `dt1.asNullable == dt2.asNullable` expressions in a few places. It is however more efficient to call `dt1.sameType(dt2)`. I have replaced every instance of the first pattern with the second pattern (3/5 were introduced by apache#15595).

## How was this patch tested?
Existing tests.

Author: Herman van Hovell <hvanhovell@databricks.com>

Closes apache#16041 from hvanhovell/SPARK-18058.
@CodingCat CodingCat deleted the SPARK-18058 branch December 30, 2020 08:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants