[SPARK-40536][CONNECT] Make Spark Connect port configurable #38006

Closed
amaliujia wants to merge 5 commits into apache:master from amaliujia:SPARK-40536

Contributor

@amaliujia commented Sep 26, 2022

What changes were proposed in this pull request?

Add a Connect config object and one Connect gRPC config key, spark.connect.grpc.binding.port, of Int type.

Why are the changes needed?

Currently the Spark Connect gRPC port is hardcoded; this change makes it configurable.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing unit tests.

@amaliujia changed the title from "[SPARK-40536] Make Spark Connect port configurable" to "[SPARK-40536][CONNECT] Make Spark Connect port configurable" on Sep 26, 2022

package org.apache.spark.internal.config

private[spark] object Connect {
Contributor

As Spark Connect is built as a plugin, its configuration should ideally not be located in core. Is there a way we can move this to the connect module?

Contributor Author

@amaliujia Sep 26, 2022

I agree. I am not sure if there is a way to have a config that belongs only to a plugin. Let's see the other reviewers' suggestions on this.

Member

Yeah, it should better be placed within connect if possible. How do we use this plugin? I suspect the jar should be provided anyway (?)

Contributor

Does the Spark Connect module depend on Spark Core? If yes, we can move this object to the Spark Connect module.

Contributor Author

Yes, it depends on core. Let me move this config and also address the other comments.

Contributor Author

@amaliujia Sep 27, 2022

> Yeah, it should better be placed within connect if possible. How do we use this plugin? I suspect the jar should be provided anyway (?)

I believe there is a way to control whether the plugin is loaded, and a default for that. Let me check. I am not sure whether the default behavior is to load the jar or not.

Contributor Author

Moved to the connect module.

  private[spark] val CONNECT_GRPC_DEBUG_MODE =
    ConfigBuilder("spark.connect.grpc.debug.enabled")
      .version("3.4.0")
      .booleanConf
Member

Can we add docs?

Contributor Author

@amaliujia Sep 27, 2022

I deleted this config so that this PR focuses on one thing only: the configurable port.

This config, and what it tries to enable, is worth a separate effort to fully finalize.

private[spark] object Connect {

  private[spark] val CONNECT_GRPC_DEBUG_MODE =
    ConfigBuilder("spark.connect.grpc.debug.enabled")
Contributor

Is it not used now?

Contributor Author

  def startGRPCService(): Unit = {
    val debugMode = SparkEnv.get.conf.getBoolean("spark.connect.grpc.debug.enabled", true)

This is being called in the gRPC service. Given that it is already defined there, I thought I would add a config entry for it as well.

Contributor Author

It's used already, but I guess this flag and what it tries to enable are worth extra work and a separate PR.

I deleted this config to make this PR only focus on one thing.

@AmplabJenkins

Can one of the admins verify this patch?

  def startGRPCService(): Unit = {
    val debugMode = SparkEnv.get.conf.getBoolean("spark.connect.grpc.debug.enabled", true)
-   val port = 15002
+   val port = SparkEnv.get.conf.getInt("spark.connect.grpc.binding.port", 15002)
Contributor

shall we call SparkEnv.get.conf.get(CONNECT_GRPC_BINDING_PORT)?

Contributor Author

Makes sense! Done!

  def startGRPCService(): Unit = {
    val debugMode = SparkEnv.get.conf.getBoolean("spark.connect.grpc.debug.enabled", true)
-   val port = 15002
+   val port = SparkEnv.get.conf.getInt(CONNECT_GRPC_BINDING_PORT.key, 15002)
Contributor

@cloud-fan Sep 28, 2022

This means we are duplicating the default value 15002 in two places. SparkConf.get is a better API to use:

  def get[T](entry: ConfigEntry[T]): T

Contributor Author

Yes, this is what confused me: the getInt API always asks for a default value, while the plain get returns a string.

This get overload is distinguished by its ConfigEntry parameter, which I overlooked when I searched the API.

Done.
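The point about duplicated defaults can be illustrated with a minimal, self-contained sketch. The `ConfigEntry` and `Conf` classes below are simplified stand-ins written for this example, not Spark's actual classes; only the key name and the default 15002 come from the discussion above. With `getInt`, each caller has to repeat the default; with a typed entry, the default is declared once next to the key.

```scala
// Simplified stand-ins for illustration only; these are NOT Spark's real classes.
final case class ConfigEntry[T](key: String, default: T, parse: String => T)

final class Conf(settings: Map[String, String]) {
  // Untyped accessor: every call site must supply (and thus repeat) the default.
  def getInt(key: String, defaultValue: Int): Int =
    settings.get(key).map(_.toInt).getOrElse(defaultValue)

  // Typed accessor: the default and the parser both live in the entry itself.
  def get[T](entry: ConfigEntry[T]): T =
    settings.get(entry.key).map(entry.parse).getOrElse(entry.default)
}

object ConnectConfigSketch {
  // Hypothetical entry mirroring the key and default discussed in this PR.
  val CONNECT_GRPC_BINDING_PORT: ConfigEntry[Int] =
    ConfigEntry("spark.connect.grpc.binding.port", 15002, _.toInt)

  def main(args: Array[String]): Unit = {
    val defaults = new Conf(Map.empty)
    val custom = new Conf(Map("spark.connect.grpc.binding.port" -> "15010"))

    // The default 15002 appears a second time here, which is the duplication flagged in review.
    println(defaults.getInt(CONNECT_GRPC_BINDING_PORT.key, 15002)) // prints 15002

    // The typed accessor needs no default at the call site.
    println(defaults.get(CONNECT_GRPC_BINDING_PORT)) // prints 15002
    println(custom.get(CONNECT_GRPC_BINDING_PORT))   // prints 15010
  }
}
```

In this sketch, changing the default means editing one line in the entry definition, whereas the `getInt` style would require finding every call site that hardcodes 15002.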

@HyukjinKwon
Member

Merged to master.

@amaliujia deleted the SPARK-40536 branch September 29, 2022 16:52