[SPARK-42764][K8S] Parameterize the max number of attempts for driver props fetcher in KubernetesExecutorBackend #40387
@@ -82,7 +82,7 @@ private[spark] object KubernetesExecutorBackend extends Logging {
 clientMode = true)

 var driver: RpcEndpointRef = null
-val nTries = 3
+val nTries = sys.env.getOrElse("EXECUTOR_DRIVER_PROPS_FETCHER_MAX_ATTEMPTS", 3)
This env name came from the name of this RPCEnv.
Line 75 in 71a54f0
"driverPropsFetcher",
Could you review this PR, @viirya?
@@ -82,7 +82,7 @@ private[spark] object KubernetesExecutorBackend extends Logging {
 clientMode = true)

 var driver: RpcEndpointRef = null
-val nTries = 3
+val nTries = sys.env.getOrElse("EXECUTOR_DRIVER_PROPS_FETCHER_MAX_ATTEMPTS", 3).toInt
Do we need to document this parameter? Or is it just an internal parameter for testing purposes only?
Thank you for the review. This is an internal parameter, like an `.internal` config. However, this code path is invoked to get `SparkConf`, so we cannot use a `SparkConf` value here. We may document this env variable later in some debug section.
All tests passed. Merged to master.
… props fetcher in KubernetesExecutorBackend

### What changes were proposed in this pull request?

This PR aims to parameterize the max number of attempts for driver props fetcher in `KubernetesExecutorBackend` by introducing a new environment variable, `EXECUTOR_DRIVER_PROPS_FETCHER_MAX_ATTEMPTS`, for testing purpose.

Note that this feature is proposed as a new environment variable because this happens before getting `SparkConf` from the driver.

### Why are the changes needed?

In case of K8s network issues, the executor pods could fail with `UnknownHostException`. `EXECUTOR_DRIVER_PROPS_FETCHER_MAX_ATTEMPTS` could be helpful in that case.

```
Caused by: java.io.IOException: Failed to connect to pi-....svc/<unresolved>:7078
Caused by: java.net.UnknownHostException: pi-....svc
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes apache#40387 from dongjoon-hyun/SPARK-42764.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
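
For readers who want to see the idea in isolation, here is a minimal, standalone sketch of reading an attempt count from an environment variable and using it to bound a retry loop. It is not the code in `KubernetesExecutorBackend`: the object name `DriverPropsFetcherRetrySketch` and the helpers `maxAttempts` and `retry` are hypothetical, and the fallback to the default on a missing, non-numeric, or non-positive value is an assumption of this sketch rather than behavior described in the PR. Only the variable name and the default of 3 come from the diff above.

```scala
import scala.util.{Failure, Success, Try}

// Illustrative sketch only; names other than the environment variable are hypothetical.
object DriverPropsFetcherRetrySketch {
  val EnvName = "EXECUTOR_DRIVER_PROPS_FETCHER_MAX_ATTEMPTS"

  // Read the attempt count from an environment map.
  // Assumption of this sketch: fall back to the default when the variable
  // is missing, not an integer, or not positive.
  def maxAttempts(env: Map[String, String], default: Int = 3): Int =
    env.get(EnvName)
      .flatMap(v => Try(v.trim.toInt).toOption)
      .filter(_ > 0)
      .getOrElse(default)

  // Call `connect` with the 1-based attempt number until it succeeds
  // or `nTries` attempts have failed; the last failure is rethrown.
  @annotation.tailrec
  def retry[T](nTries: Int, attempt: Int = 1)(connect: Int => T): T =
    Try(connect(attempt)) match {
      case Success(value) => value
      case Failure(e) if attempt >= nTries => throw e
      case Failure(_) => retry(nTries, attempt + 1)(connect)
    }

  def main(args: Array[String]): Unit = {
    val nTries = maxAttempts(sys.env)
    // Simulated connection that fails until the final allowed attempt.
    val result = retry(nTries) { attempt =>
      if (attempt < nTries) throw new java.net.UnknownHostException("placeholder-host")
      else s"connected on attempt $attempt"
    }
    println(s"nTries=$nTries, result=$result")
  }
}
```

Because the value has to be known before `SparkConf` is fetched from the driver, the sketch takes its input from a plain `Map[String, String]` such as `sys.env`, which also makes the parsing easy to exercise without setting real environment variables.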