[SPARK-42764][K8S] Parameterize the max number of attempts for driver props fetcher in KubernetesExecutorBackend #40387

Closed
wants to merge 3 commits into from

dongjoon-hyun (Member) commented Mar 13, 2023

What changes were proposed in this pull request?

This PR parameterizes the maximum number of attempts for the driver props fetcher in KubernetesExecutorBackend by introducing a new environment variable, EXECUTOR_DRIVER_PROPS_FETCHER_MAX_ATTEMPTS, for testing purposes.

Note that this is proposed as an environment variable rather than a Spark configuration because the fetch happens before the executor gets SparkConf from the driver.
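
To illustrate reading the attempt count from the environment rather than from SparkConf, here is a minimal sketch. The object `FetcherAttempts` and the helper `maxAttempts` are hypothetical names that do not appear in this PR, and the fallback to the default on missing, non-numeric, or non-positive values is an assumption made for the example; the diff in this PR simply calls `.toInt` on the value.

```scala
object FetcherAttempts {
  // Variable name taken from this PR; 3 is the previously hard-coded value.
  val EnvName = "EXECUTOR_DRIVER_PROPS_FETCHER_MAX_ATTEMPTS"
  val DefaultAttempts = 3

  // Hypothetical helper (not part of the PR): resolve the attempt count
  // from an environment map. Taking the map as a parameter keeps the
  // function testable; real code would pass sys.env.
  def maxAttempts(env: Map[String, String]): Int =
    env.get(EnvName)
      .flatMap(v => scala.util.Try(v.trim.toInt).toOption) // ignore non-numeric values
      .filter(_ > 0)                                       // ignore zero and negatives
      .getOrElse(DefaultAttempts)
}
```

In an executor this would presumably be called as `FetcherAttempts.maxAttempts(sys.env)`. Since `sys.env` is a `Map[String, String]`, any default passed directly to `getOrElse` has to be a string such as `"3"` for a following `.toInt` to type-check.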

Why are the changes needed?

When the K8s network is unstable, executor pods can fail with UnknownHostException while connecting to the driver. Raising EXECUTOR_DRIVER_PROPS_FETCHER_MAX_ATTEMPTS gives the fetcher more chances to connect in that case.

Caused by: java.io.IOException: Failed to connect to pi-....svc/<unresolved>:7078
Caused by: java.net.UnknownHostException: pi-....svc
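
As a rough illustration of how a configurable attempt count can absorb transient failures like the one above, here is a minimal retry sketch. `RetrySketch.retry` is a hypothetical helper, not the code in KubernetesExecutorBackend; it retries immediately without any backoff, which is an assumption made to keep the example short.

```scala
import scala.util.control.NonFatal

object RetrySketch {
  // Hypothetical helper: run `op` up to `maxAttempts` times.
  // `op` receives the 1-based attempt number. The failure from the
  // last attempt is rethrown to the caller unchanged.
  def retry[T](maxAttempts: Int)(op: Int => T): T = {
    require(maxAttempts > 0, "maxAttempts must be positive")
    var attempt = 1
    var result: Option[T] = None
    while (result.isEmpty) {
      try {
        result = Some(op(attempt))
      } catch {
        // Swallow non-fatal errors only while attempts remain.
        case NonFatal(_) if attempt < maxAttempts =>
          attempt += 1
      }
    }
    result.get
  }
}
```

With an attempt count of 3, an operation that fails twice with `UnknownHostException` and then succeeds returns normally, while one that fails on every attempt surfaces the exception from the final attempt.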

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs.

@@ -82,7 +82,7 @@ private[spark] object KubernetesExecutorBackend extends Logging {
 clientMode = true)

 var driver: RpcEndpointRef = null
-val nTries = 3
+val nTries = sys.env.getOrElse("EXECUTOR_DRIVER_PROPS_FETCHER_MAX_ATTEMPTS", 3)
@dongjoon-hyun

Could you review this PR, @viirya?

@@ -82,7 +82,7 @@ private[spark] object KubernetesExecutorBackend extends Logging {
 clientMode = true)

 var driver: RpcEndpointRef = null
-val nTries = 3
+val nTries = sys.env.getOrElse("EXECUTOR_DRIVER_PROPS_FETCHER_MAX_ATTEMPTS", "3").toInt
Member

Do we need to document this parameter? Or is it just an internal parameter for testing purposes only?

Member Author

Thank you for the review. This is an internal parameter, like an `.internal` config. However, this code path runs in order to fetch SparkConf, so we cannot read the value from SparkConf. We may document this environment variable later in a debugging section.

@dongjoon-hyun

All tests passed. Merged to master.

snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023
… props fetcher in KubernetesExecutorBackend

Closes apache#40387 from dongjoon-hyun/SPARK-42764.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>