[SPARK-25876][k8s] Simplify kubernetes configuration types. #22959
Conversation
There are a few issues with the current configuration types used in the Kubernetes backend:
- they use type parameters for role-specific specialization, which makes type signatures really noisy throughout the code base.
- they break encapsulation by forcing the code that creates the config object to remove the configuration from SparkConf before creating the k8s-specific wrapper.
- they don't provide an easy way for tests to have default values for fields they do not use.

This change fixes those problems by:
- creating a base config type with role-specific specialization using inheritance
- encapsulating the logic of parsing SparkConf into k8s-specific views inside the k8s config classes
- providing some helper code for tests to easily override just the part of the configs they want.

Most of the change relates to the above, especially cleaning up the tests. While doing that, I also made some smaller changes elsewhere:
- removed unnecessary type parameters in KubernetesVolumeSpec
- simplified the error detection logic in KubernetesVolumeUtils; all the call sites would just throw the first exception collected by that class, since they all called "get" on the "Try" object. Now the unnecessary wrapping is gone and the exception is just thrown where it occurs.
- removed a lot of unnecessary mocking from tests.
- changed the kerberos-related code so that less logic needs to live in the driver builder. In spirit it should be part of the upcoming work in this series of cleanups, but it made parts of this change simpler.

Tested with existing unit tests and integration tests.
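The inheritance-based layout described above can be sketched roughly as follows. This is an illustrative, self-contained sketch only: `MiniSparkConf`, `BaseK8sConf`, and the config keys are hypothetical stand-ins, not Spark's actual classes or API.

```scala
// Hypothetical stand-in for SparkConf so the sketch is self-contained.
class MiniSparkConf(entries: Map[String, String]) {
  def get(key: String, default: String = null): String =
    entries.getOrElse(key, default)
}

// Base config type holding role-agnostic views over the conf.
// A delegating getter keeps call sites short (conf.get(k) instead of
// conf.sparkConf.get(k)), which is the shortcut suggested in the review.
abstract class BaseK8sConf(val sparkConf: MiniSparkConf) {
  def get(key: String, default: String = null): String = sparkConf.get(key, default)
  def namespace: String = get("spark.kubernetes.namespace", "default")
}

// Role-specific specialization via inheritance rather than type parameters,
// so signatures stay as plain BaseK8sConf / DriverK8sConf.
class DriverK8sConf(conf: MiniSparkConf) extends BaseK8sConf(conf) {
  def podNamePrefix: String = get("spark.kubernetes.driver.pod.name.prefix", "spark")
}

class ExecutorK8sConf(conf: MiniSparkConf, val executorId: String)
  extends BaseK8sConf(conf)
```

With this shape, tests can build a config from a small map and rely on defaults for everything they do not care about, instead of mocking the whole conf.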
Kubernetes integration test starting

Test build #98526 has finished for PR 22959 at commit

Kubernetes integration test status success

First glance this looks like a lot of nice simplification, will take a proper look over this tomorrow
LGTM
Thanks Rob. @mccheah @liyinan926
@@ -85,7 +83,7 @@ private[spark] class DriverCommandFeatureStep(conf: KubernetesConf[KubernetesDri
     val pythonEnvs =
       Seq(new EnvVarBuilder()
         .withName(ENV_PYSPARK_MAJOR_PYTHON_VERSION)
-        .withValue(conf.sparkConf.get(PYSPARK_MAJOR_PYTHON_VERSION))
+        .withValue(driverConf.sparkConf.get(PYSPARK_MAJOR_PYTHON_VERSION))
Looks like you can simply do `driverConf.get`.
@@ -124,7 +122,7 @@ private[spark] class DriverCommandFeatureStep(conf: KubernetesConf[KubernetesDri
   }

   private def mergeFileList(key: String, filesToAdd: Seq[String]): Map[String, String] = {
-    val existing = Utils.stringToSeq(conf.sparkConf.get(key, ""))
+    val existing = Utils.stringToSeq(driverConf.sparkConf.get(key, ""))
Ditto.
 extends KubernetesFeatureConfigStep with Logging {

   override def configurePod(pod: SparkPod): SparkPod = {
-    val sparkConf = kubernetesConf.sparkConf
-    val hadoopConfDirCMapName = sparkConf.getOption(HADOOP_CONFIG_MAP_NAME)
+    val hadoopConfDirCMapName = conf.sparkConf.getOption(HADOOP_CONFIG_MAP_NAME)
Looks like you can do `conf.getOption`.
   override def configurePod(pod: SparkPod): SparkPod = {
-    val sparkUserName = kubernetesConf.sparkConf.get(KERBEROS_SPARK_USER_NAME)
+    val sparkUserName = conf.sparkConf.get(KERBEROS_SPARK_USER_NAME)
Ditto.
   private val conf = kubernetesConf.sparkConf

   private val hadoopConfDir = Option(kubernetesConf.sparkConf.getenv(ENV_HADOOP_CONF_DIR))
   private val hadoopConfigMapName = kubernetesConf.sparkConf.get(KUBERNETES_HADOOP_CONF_CONFIG_MAP)
Ditto.
   require(kubernetesConf.hadoopConfSpec.isDefined,
     "Ensure that HADOOP_CONF_DIR is defined either via env or a pre-defined ConfigMap")
   private val hadoopConfDirSpec = kubernetesConf.hadoopConfSpec.get
   private val conf = kubernetesConf.sparkConf
Is `conf` still needed?
LGTM
Test build #99297 has finished for PR 22959 at commit

Kubernetes integration test starting

Kubernetes integration test status success

@mccheah do you have any comment?
Looks fine, think we can merge soon! Only want some feedback on a small point as follows.
 extends KubernetesConf(sparkConf) {

   override val resourceNamePrefix: String = {
     val custom = if (Utils.isTesting) get(KUBERNETES_DRIVER_POD_NAME_PREFIX) else None
Possibly inject this in the test so that we don't have to use `Utils.isTesting`? Preference against using test flags to override behavior.
I'm trying to avoid creating custom test classes here (that's what I understand by "inject", since there's no way to "inject" this otherwise). There's really a single test that needs this functionality, IIRC, and this pattern is way more common in Spark than what you're suggesting.
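The `Utils.isTesting` pattern under discussion can be sketched in a few lines. This is a minimal, hedged illustration, assuming a stand-in flag and class (`MiniUtils`, `MiniDriverConf` are hypothetical names, not Spark's actual API); Spark's real `Utils.isTesting` similarly checks a property set by the test harness:

```scala
// Hypothetical stand-in for Spark's Utils.isTesting: true only when the
// test harness has set the "spark.testing" system property.
object MiniUtils {
  def isTesting: Boolean = sys.props.contains("spark.testing")
}

// Hypothetical config class: the prefix override is honored only under
// tests, so production user configs cannot influence the generated name.
class MiniDriverConf(testPrefixOverride: Option[String]) {
  val resourceNamePrefix: String = {
    val custom = if (MiniUtils.isTesting) testPrefixOverride else None
    custom.getOrElse(s"spark-${System.currentTimeMillis()}")
  }
}
```

The trade-off debated above is exactly this: a test flag keeps production code free of injected test-only subclasses, at the cost of a runtime check on an ambient flag.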
Given that
Ah I forgot to merge, sorry! Merging into master.
Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes apache#22959 from vanzin/SPARK-25876.
…riverFeatureStep`

### What changes were proposed in this pull request?
This PR removes the variable `hadoopConf` from `KerberosConfDriverFeatureStep`.

### Why are the changes needed?
#22959 added the variable `hadoopConf` to generate `tokenManager`. Then #22911 removed `tokenManager` and `buildKerberosSpec`, so `hadoopConf` is now unused.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass the GA.

Closes #36283 from dcoliversun/SPARK-38968.

Authored-by: Qian.Sun <qian.sun2020@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
### What changes were proposed in this pull request?
This PR aims to promote `KubernetesVolumeUtils` to `DeveloperApi` from Apache Spark 4.0.0 for Apache Spark Kubernetes Operator.

### Why are the changes needed?
This API was added by the following at Apache Spark 3.0.0 and has been stable.
- #22959

Since `Apache Spark Kubernetes Operator` requires this, we had better maintain it as a developer API officially from Apache Spark 4.0.0.
- apache/spark-kubernetes-operator#10

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46326 from dongjoon-hyun/SPARK-48076.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>