
[SPARK-42667][CONNECT][FOLLOW-UP] SparkSession created by newSession should not share the channel #40346

Closed

Conversation

amaliujia
Contributor

What changes were proposed in this pull request?

A SparkSession created by `newSession` should not share the gRPC channel of the session it was created from. When `stop` is called on a SparkSession, the channel it uses is shut down; if that channel is shared, every other still-active SparkSession on the same channel breaks.

Why are the changes needed?

This fixes the issue in Spark Connect where stopping one SparkSession leaves other active SparkSessions unusable (see the sketch below).

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing UT
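
For illustration, here is a minimal sketch of the failure mode described above, written in spark-shell style against the Spark Connect Scala client. The builder calls (`remote(...)`, `build()`) and the connection string are assumptions for the sketch, not code from this PR or its tests:

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: two sessions that, before this fix, shared a single gRPC channel.
val parent = SparkSession.builder().remote("sc://localhost:15002").build()
val child  = parent.newSession() // with this PR, child gets its own channel

parent.stop() // shuts down only parent's channel

// child keeps working because its channel was created independently;
// with a shared channel, this call would fail after parent.stop().
child.range(10).collect()
```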

@amaliujia
Contributor Author

@hvanhovell

@@ -164,7 +165,7 @@ private[sql] class SparkConnectClient(
   }
 
   def copy(): SparkConnectClient = {
-    new SparkConnectClient(userContext, channel, userAgent)
+    SparkConnectClient.builder().connectionString(connectString).build()
Contributor

Only works if you only use a connection string right? So what about SSL, or when the host and port are set manually? Can't we just pass the ChannelBuilder around?

Contributor Author

If you look at the current SparkSession builder, it only has two ways to build a SparkSession:

  1. with a connection string, or
  2. the default build with nothing set.

That is why I only kept the connection string. But if you think keeping the builder so it preserves everything is better, I can update.

Contributor

I don't buy that argument. The problem is that the client builder supports all these things (for good reason); now we are assuming that it will only ever be used to create a SparkSession, while there are also other use cases.

Please keep the builder.

Contributor Author

Updated to use the builder so that the entire channel setup is copied to the new client.
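
Roughly, the resulting shape is sketched below: the client keeps its full channel configuration around, and copy() rebuilds an independent channel from it, so SSL, host/port, and other settings carry over while shutting one client down cannot affect another. The class and method names here are illustrative stand-ins, not the actual ones in SparkConnectClient.scala:

```scala
import io.grpc.{ManagedChannel, ManagedChannelBuilder}

// Illustrative stand-in for the real client: remember the configuration, not just the channel.
final case class ChannelConfig(host: String, port: Int) {
  // Build a brand-new channel from the stored settings (plaintext only to keep the sketch short).
  def newChannel(): ManagedChannel =
    ManagedChannelBuilder.forAddress(host, port).usePlaintext().build()
}

class ClientSketch private (config: ChannelConfig, channel: ManagedChannel) {
  // copy() goes back through the configuration instead of sharing `channel`,
  // so a later shutdown() on this instance leaves the copy fully usable.
  def copy(): ClientSketch = ClientSketch(config)

  def shutdown(): Unit = channel.shutdownNow()
}

object ClientSketch {
  def apply(config: ChannelConfig): ClientSketch =
    new ClientSketch(config, config.newChannel())
}
```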

@amaliujia
Contributor Author

@hvanhovell do you know how to fix this?

Error: ] /home/runner/work/spark/spark/connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/connect/client/SparkConnectClient.scala:395: inferred existential type io.grpc.ManagedChannelBuilder[?0]( forSome { type ?0 <: io.grpc.ManagedChannelBuilder[?0] }), which cannot be expressed by wildcards,  should be enabled
by making the implicit value scala.language.existentials visible.

@amaliujia
Contributor Author

In fact, the complete error message says how to fix it....
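
For reference, the two usual ways to satisfy that message are sketched below; whether the actual fix imported the language feature or annotated the type explicitly is not shown in this thread, so treat both variants as assumptions:

```scala
import scala.language.existentials // makes the inferred existential type acceptable to the compiler

import io.grpc.ManagedChannelBuilder

object ExistentialFixSketch {
  // forAddress returns ManagedChannelBuilder<?> over an F-bounded type parameter
  // (T <: ManagedChannelBuilder[T]), so the inferred type of this val is an
  // existential that cannot be written with plain wildcards.
  val inferred = ManagedChannelBuilder.forAddress("localhost", 15002)

  // Alternative that sidesteps the inference entirely: annotate with a wildcard type.
  val annotated: ManagedChannelBuilder[_] =
    ManagedChannelBuilder.forAddress("localhost", 15002)
}
```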

@amaliujia
Contributor Author

@hvanhovell can you take another look?

@hvanhovell
Contributor

Merging.

hvanhovell pushed a commit that referenced this pull request Mar 10, 2023
…should not share the channel

### What changes were proposed in this pull request?

A SparkSession created by `newSession` should not share the gRPC channel of the session it was created from. When `stop` is called on a SparkSession, the channel it uses is shut down; if that channel is shared, every other still-active SparkSession on the same channel breaks.

### Why are the changes needed?

This fixes the issue in Spark Connect where stopping one SparkSession leaves other active SparkSessions unusable.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing UT

Closes #40346 from amaliujia/rw-session-do-not-share-channel.

Authored-by: Rui Wang <rui.wang@databricks.com>
Signed-off-by: Herman van Hovell <herman@databricks.com>
(cherry picked from commit e5f56e5)
Signed-off-by: Herman van Hovell <herman@databricks.com>
snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023
…should not share the channel
