[SPARK-15620][SQL] Fix transformed dataset attributes resolve failure #13399

jerryshao · 2016-05-30T11:20:10Z

What changes were proposed in this pull request?

A join on a transformed Dataset runs into attribute conflicts that make query execution fail. For example:

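// spark-shell already imports spark.implicits._; elsewhere, add `import spark.implicits._` first
val dataset = Seq(1, 2, 3).toDS()
val mappedDs = dataset.map(_ + 1)

mappedDs.as("t1").joinWith(mappedDs.as("t2"), $"t1.value" === $"t2.value").show()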

throws an exception:

org.apache.spark.sql.AnalysisException: cannot resolve '`t1.value`' given input columns: [value];
  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:62)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:59)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:287)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:287)
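To make the failure mode concrete, here is a deliberately simplified model in plain Scala (no Spark dependency) of why both sides of a self-join need distinct attribute IDs before a condition like `t1.value === t2.value` can be bound. Everything in it (`Attr`, `SelfJoinSketch`, `freshenRight`, `bindCondition`, the ID counter) is hypothetical and only mirrors the general idea, under the stated assumption that binding is refused while the two sides expose the same expression IDs. It is not Catalyst's code and not the patch in this PR.

```scala
// A simplified, hypothetical model of self-join attribute conflicts.
// None of these names are Spark or Catalyst APIs.
final case class Attr(name: String, exprId: Long, qualifier: Option[String])

object SelfJoinSketch {
  private var counter = 0L

  // Hand out a new, previously unused expression ID.
  private def newExprId(): Long = { counter += 1; 1000L + counter }

  // Tag every output attribute with an alias, roughly what `ds.as("t1")` does.
  def aliased(output: Seq[Attr], alias: String): Seq[Attr] =
    output.map(_.copy(qualifier = Some(alias)))

  // True when no expression ID appears on both sides of the join.
  def conflictFree(left: Seq[Attr], right: Seq[Attr]): Boolean =
    left.map(_.exprId).toSet.intersect(right.map(_.exprId).toSet).isEmpty

  // If the sides conflict, rewrite the right side with fresh expression IDs.
  def freshenRight(left: Seq[Attr], right: Seq[Attr]): Seq[Attr] =
    if (conflictFree(left, right)) right
    else right.map(_.copy(exprId = newExprId()))

  private def cannotResolve(input: Seq[Attr], q: String, n: String): String =
    s"cannot resolve '$q.$n' given input columns: [${input.map(_.name).distinct.mkString(", ")}]"

  // Find the single attribute named `q.n`, or report it as unresolvable.
  private def bind(input: Seq[Attr], q: String, n: String): Either[String, Long] =
    input.filter(a => a.qualifier.contains(q) && a.name == n) match {
      case Seq(a) => Right(a.exprId)
      case _      => Left(cannotResolve(input, q, n))
    }

  // Bind `t1.value === t2.value`. In this model, binding is refused while
  // the two sides still share expression IDs.
  def bindCondition(left: Seq[Attr], right: Seq[Attr]): Either[String, (Long, Long)] =
    if (!conflictFree(left, right)) {
      Left(cannotResolve(left ++ right, "t1", "value"))
    } else {
      (bind(left, "t1", "value"), bind(right, "t2", "value")) match {
        case (Right(l), Right(r)) => Right((l, r))
        case (Left(e), _)         => Left(e)
        case (_, Left(e))         => Left(e)
      }
    }
}
```

For example, with `mapped = Seq(Attr("value", 7L, None))` aliased as `t1` and `t2`, `bindCondition(t1, t2)` returns a `Left` carrying `cannot resolve 't1.value' given input columns: [value]`, while `bindCondition(t1, freshenRight(t1, t2))` returns a `Right` whose two IDs differ. Renaming only the right side keeps the left side's attributes, and therefore anything already bound to them, untouched.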

How was this patch tested?

Unit test.

SparkQA · 2016-05-30T12:44:24Z

Test build #59614 has finished for PR 13399 at commit 00864db.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

jerryshao · 2016-06-01T02:10:07Z

@cloud-fan, could you please help review this patch? I'm not sure it is the right fix. Thanks a lot.

cloud-fan · 2016-06-01T06:22:26Z

sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala

@@ -783,6 +783,16 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
      ds.filter(_.b > 1).collect().toSeq
    }
  }
+
+  test("transformed dataset correctly solve the attributes") {


Can you double-check this test? I tried it on the latest master branch and it passed...

Thanks a lot for your review, @cloud-fan. I think the issue still exists; the problem is that the unit test doesn't actually reproduce it. Let me change the unit test.

You can try the following to see the issue:

val dataset = Seq(1, 2, 3).toDS
val mappedDataset = dataset.map(_ + 1)
mappedDataset.as("t1").joinWith(mappedDataset.as("t2"), $"t1.value" === $"t2.value").show()

jerryshao · 2016-06-01T07:02:14Z

Unit test updated; please review again. Thanks a lot :)

SparkQA · 2016-06-01T08:49:12Z

Test build #59727 has finished for PR 13399 at commit 2e436c1.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2016-06-01T22:23:25Z

sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala

+
+  test("transformed dataset correctly solve the attributes") {
+    val dataset = Seq(1, 2, 3).toDS()
+    val ds1 = dataset.map(_ + 1)


nit: I'd like it to be:

val ds = Seq(1, 2, 3).toDS().map(_ + 1)
val d1 = ds.as("d1")
val d2 = ds.as("d2")
...

And the test name could be: "mapped dataset should resolve duplicated attributes for self join and set operations".

Sure, let me update it.
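To run the scenario outside the test suite, here is a standalone sketch in the shape suggested above, assuming Spark 2.0 or later on the classpath and a local-mode `SparkSession`. The object name `MappedSelfJoinCheck`, the use of `intersect` as the set operation, and the result sets described after the code are our own illustration, not the test that was merged.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical standalone check; assumes Spark 2.0+ and runs in local mode.
object MappedSelfJoinCheck {
  // Returns the self-join result and the self-intersect result of a mapped Dataset.
  def run(): (Set[(Int, Int)], Set[Int]) = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("mapped-self-join-check")
      .getOrCreate()
    import spark.implicits._
    try {
      // A typed map() produces the "transformed dataset" from the report.
      val ds = Seq(1, 2, 3).toDS().map(_ + 1)
      val d1 = ds.as("d1")
      val d2 = ds.as("d2")

      // Self join through the aliases: the query shape that failed analysis in the report.
      val joined = d1.joinWith(d2, $"d1.value" === $"d2.value").collect().toSet

      // A set operation with the same mapped Dataset on both sides.
      val common = d1.intersect(d2).collect().toSet

      (joined, common)
    } finally {
      spark.stop()
    }
  }
}
```

On a build that resolves the conflict, `MappedSelfJoinCheck.run()` should return `Set((2, 2), (3, 3), (4, 4))` for the join and `Set(2, 3, 4)` for the intersection; per the report, a build without the fix fails the join with the `AnalysisException` shown earlier.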

cloud-fan · 2016-06-01T22:23:55Z

LGTM except a minor style comment, thanks for working on it!

SparkQA · 2016-06-02T04:24:40Z

Test build #59802 has finished for PR 13399 at commit dcca095.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

## What changes were proposed in this pull request?

A join on a transformed Dataset runs into attribute conflicts that make query execution fail. For example:

```
val dataset = Seq(1, 2, 3).toDS()
val mappedDs = dataset.map(_ + 1)

mappedDs.as("t1").joinWith(mappedDs.as("t2"), $"t1.value" === $"t2.value").show()
```

throws an exception:

```
org.apache.spark.sql.AnalysisException: cannot resolve '`t1.value`' given input columns: [value];
  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:62)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:59)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:287)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:287)
```

## How was this patch tested?

Unit test.

Author: jerryshao <sshao@hortonworks.com>

Closes #13399 from jerryshao/SPARK-15620.

(cherry picked from commit 8288e16)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

cloud-fan · 2016-06-02T04:59:06Z

thanks, merging to master and 2.0!


cloud-fan reviewed Jun 1, 2016

jerryshao force-pushed the SPARK-15620 branch from 00864db to 2e436c1 on June 1, 2016 06:59

cloud-fan reviewed Jun 1, 2016

jerryshao added 4 commits June 2, 2016 10:14

Fix transformed Dataset attributes resolve failure

4f4d7db

Style fix

e3d0893

Update the unittest

4f51791

Address the comments

dcca095

jerryshao force-pushed the SPARK-15620 branch from 2e436c1 to dcca095 on June 2, 2016 02:35

asfgit closed this in 8288e16 Jun 2, 2016
