Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-15620][SQL] Fix transformed dataset attributes revolve failure #13399

Closed
wants to merge 4 commits into from

Conversation

jerryshao
Copy link
Contributor

What changes were proposed in this pull request?

Join on transformed dataset has attributes conflicts, which make query execution failure, for example:

val dataset = Seq(1, 2, 3).toDs
val mappedDs = dataset.map(_ + 1)

mappedDs.as("t1").joinWith(mappedDs.as("t2"), $"t1.value" === $"t2.value").show()

will throw exception:

org.apache.spark.sql.AnalysisException: cannot resolve '`t1.value`' given input columns: [value];
  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:62)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:59)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:287)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:287)

How was this patch tested?

Unit test.

@SparkQA
Copy link

SparkQA commented May 30, 2016

Test build #59614 has finished for PR 13399 at commit 00864db.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jerryshao
Copy link
Contributor Author

@cloud-fan , would you please help to review this patch, not sure it is the right fix. Thanks a lot.

@@ -783,6 +783,16 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
ds.filter(_.b > 1).collect().toSeq
}
}

test("transformed dataset correctly solve the attributes") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you double check this test? I tried this on latest master branch and it passed...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @cloud-fan for your review. I think the issue still exists, problem is that unit test actually doesn't reflect this issue, let me change the unit test.

You could try the below to see the issue.

val dataset = Seq(1, 2, 3).toDS
val mappedDataset = dataset.map(_ + 1)
mappedDataset.as("t1").joinWith(mappedDataset.as("t2"), $"t1.value" === $"t2.value").show()

@jerryshao
Copy link
Contributor Author

Unit test updated, please review again, thanks a lot :)

@SparkQA
Copy link

SparkQA commented Jun 1, 2016

Test build #59727 has finished for PR 13399 at commit 2e436c1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


test("transformed dataset correctly solve the attributes") {
val dataset = Seq(1, 2, 3).toDS()
val ds1 = dataset.map(_ + 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd like it to be:

val ds = Seq(1, 2, 3).toDS().map(_ + 1)
val d1 = ds.as("d1")
val d2 = ds.as("d2")
...

and the test name can be: mapped dataset should resolve duplicated attributes for self join and set operations

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let me update it.

@cloud-fan
Copy link
Contributor

LGTM except a minor style comment, thanks for working on it!

@SparkQA
Copy link

SparkQA commented Jun 2, 2016

Test build #59802 has finished for PR 13399 at commit dcca095.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Jun 2, 2016
## What changes were proposed in this pull request?

Join on transformed dataset has attributes conflicts, which make query execution failure, for example:

```
val dataset = Seq(1, 2, 3).toDs
val mappedDs = dataset.map(_ + 1)

mappedDs.as("t1").joinWith(mappedDs.as("t2"), $"t1.value" === $"t2.value").show()
```

will throw exception:

```
org.apache.spark.sql.AnalysisException: cannot resolve '`t1.value`' given input columns: [value];
  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:62)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:59)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:287)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:287)
```

## How was this patch tested?

Unit test.

Author: jerryshao <sshao@hortonworks.com>

Closes #13399 from jerryshao/SPARK-15620.

(cherry picked from commit 8288e16)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan
Copy link
Contributor

thanks, merging to master and 2.0!

@asfgit asfgit closed this in 8288e16 Jun 2, 2016
dongjoon-hyun pushed a commit that referenced this pull request Jun 28, 2023
### What changes were proposed in this pull request?
This pr aims to upgrade netty from 4.1.92 to 4.1.93.

### Why are the changes needed?
1.v4.1.92 VS v4.1.93
netty/netty@netty-4.1.92.Final...netty-4.1.93.Final

2.The new version brings some bug fix, eg:
- Reset byte buffer in loop for AbstractDiskHttpData.setContent ([#13320](netty/netty#13320))
- OpenSSL MAX_CERTIFICATE_LIST_BYTES option supported ([#13365](netty/netty#13365))
- Adapt to DirectByteBuffer constructor in Java 21 ([#13366](netty/netty#13366))
- HTTP/2 encoder: allow HEADER_TABLE_SIZE greater than Integer.MAX_VALUE ([#13368](netty/netty#13368))
- Upgrade to latest netty-tcnative to fix memory leak ([#13375](netty/netty#13375))
- H2/H2C server stream channels deactivated while write still in progress ([#13388](netty/netty#13388))
- Channel#bytesBefore(un)writable off by 1 ([#13389](netty/netty#13389))
- HTTP/2 should forward shutdown user events to active streams ([#13394](netty/netty#13394))
- Respect the number of bytes read per datagram when using recvmmsg ([#13399](netty/netty#13399))

3.The release notes as follows:
- https://netty.io/news/2023/05/25/4-1-93-Final.html

4.Why not upgrade to `4-1-94-Final` version?
Because the return value of the 'threadCache()' (from `PoolThreadCache` to `PoolArenasCache`) method of the netty Inner class used in the 'arrow memory netty' version '12.0.1' has changed and belongs to break change, let's wait for the upgrade of the 'arrow memory netty' before upgrading to the '4-1-94-Final' version.

The reference is as follows:
https://github.com/apache/arrow/blob/6af660f48472b8b45a5e01b7136b9b040b185eb1/java/memory/memory-netty/src/main/java/io/netty/buffer/PooledByteBufAllocatorL.java#L164
https://github.com/netty/netty/blob/da1a448d5bc4f36cc1744db93fcaf64e198db2bd/buffer/src/main/java/io/netty/buffer/PooledByteBufAllocator.java#L732-L736

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GA.

Closes #41681 from panbingkun/upgrade_netty.

Authored-by: panbingkun <pbk1982@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants