
[SPARK-25876][k8s] Simplify kubernetes configuration types. #22959

Closed
wants to merge 5 commits into from

Conversation

vanzin
Contributor

@vanzin vanzin commented Nov 6, 2018

There are a few issues with the current configuration types used in
the kubernetes backend:

  • they use type parameters for role-specific specialization, which makes
    type signatures really noisy throughout the code base.

  • they break encapsulation by forcing the code that creates the config
    object to remove the configuration from SparkConf before creating the
    k8s-specific wrapper.

  • they don't provide an easy way for tests to have default values for
    fields they do not use.

This change fixes those problems by:

  • creating a base config type with role-specific specialization using
    inheritance

  • encapsulating the logic of parsing SparkConf into k8s-specific views
    inside the k8s config classes

  • providing some helper code for tests to easily override just the part
    of the configs they want.
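
The inheritance-based design and the test helper can be sketched roughly as below. This is a hypothetical, simplified sketch, not the actual Spark code: `SparkConf` is stubbed out, and the class, method, and key names are illustrative.

```scala
// Hypothetical stand-in for Spark's SparkConf, kept minimal for this sketch.
class SparkConf(entries: Map[String, String] = Map.empty) {
  def get(key: String): Option[String] = entries.get(key)
}

// Base config type: parsing SparkConf into k8s-specific views happens inside
// the config class, so callers no longer pre-process SparkConf themselves.
abstract class KubernetesConf(val sparkConf: SparkConf) {
  def resourceNamePrefix: String
  def namespace: String =
    sparkConf.get("spark.kubernetes.namespace").getOrElse("default")
}

// Role-specific specialization via inheritance instead of type parameters,
// so signatures can just say `KubernetesDriverConf`.
class KubernetesDriverConf(sparkConf: SparkConf, val appId: String)
    extends KubernetesConf(sparkConf) {
  override val resourceNamePrefix: String = s"spark-$appId-driver"
}

class KubernetesExecutorConf(sparkConf: SparkConf, val executorId: String)
    extends KubernetesConf(sparkConf) {
  override val resourceNamePrefix: String = s"spark-exec-$executorId"
}

// Test helper: defaults for every field, so a test overrides only the part
// of the config it actually cares about.
object KubernetesTestConf {
  def createDriverConf(
      sparkConf: SparkConf = new SparkConf(),
      appId: String = "appId"): KubernetesDriverConf =
    new KubernetesDriverConf(sparkConf, appId)
}
```

A test that only cares about, say, the namespace can call `KubernetesTestConf.createDriverConf()` and never mention the app id.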

Most of the change relates to the above, especially cleaning up the
tests. While doing that, I also made some smaller changes elsewhere:

  • removed unnecessary type parameters in KubernetesVolumeSpec

  • simplified the error detection logic in KubernetesVolumeUtils; all
    the call sites would just throw the first exception collected by
    that class, since they all called "get" on the "Try" object. Now
    the unnecessary wrapping is gone and the exception is just thrown
    where it occurs.

  • removed a lot of unnecessary mocking from tests.

  • changed the kerberos-related code so that less logic needs to live
    in the driver builder. In spirit it should be part of the upcoming
    work in this series of cleanups, but it made parts of this change
    simpler.
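
The `Try` simplification described above can be sketched roughly as follows. The signatures are hypothetical and simplified, not the real `KubernetesVolumeUtils` API:

```scala
import scala.util.Try

object KubernetesVolumeUtilsSketch {
  // Before (roughly): each parse result was wrapped in a Try, but every call
  // site immediately called .get, so the first failure was thrown anyway.
  def parseVolumeNamesWithTry(names: Seq[String]): Seq[Try[String]] =
    names.map(n => Try {
      require(n.nonEmpty, "volume name must not be empty")
      n
    })
  // callers: parseVolumeNamesWithTry(names).map(_.get)  // throws first failure

  // After: no wrapping; the exception is thrown right where it occurs.
  def parseVolumeNames(names: Seq[String]): Seq[String] =
    names.map { n =>
      require(n.nonEmpty, "volume name must not be empty")
      n
    }
}
```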

Tested with existing unit tests and integration tests.

@SparkQA

SparkQA commented Nov 6, 2018

Test build #98526 has finished for PR 22959 at commit ea0f8bc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 6, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4799/

@rvesse
Copy link
Member

rvesse commented Nov 7, 2018

At first glance this looks like a lot of nice simplification; I will take a proper look over this tomorrow.

Member

@rvesse rvesse left a comment


LGTM

@vanzin
Contributor Author

vanzin commented Nov 8, 2018

Thanks Rob. @mccheah @liyinan926

@@ -85,7 +83,7 @@ private[spark] class DriverCommandFeatureStep(conf: KubernetesConf[KubernetesDri
val pythonEnvs =
Seq(new EnvVarBuilder()
.withName(ENV_PYSPARK_MAJOR_PYTHON_VERSION)
- .withValue(conf.sparkConf.get(PYSPARK_MAJOR_PYTHON_VERSION))
+ .withValue(driverConf.sparkConf.get(PYSPARK_MAJOR_PYTHON_VERSION))
Contributor

Looks like you can simply do driverConf.get.

@@ -124,7 +122,7 @@ private[spark] class DriverCommandFeatureStep(conf: KubernetesConf[KubernetesDri
}

private def mergeFileList(key: String, filesToAdd: Seq[String]): Map[String, String] = {
- val existing = Utils.stringToSeq(conf.sparkConf.get(key, ""))
+ val existing = Utils.stringToSeq(driverConf.sparkConf.get(key, ""))
Contributor

Ditto.

extends KubernetesFeatureConfigStep with Logging {

override def configurePod(pod: SparkPod): SparkPod = {
val sparkConf = kubernetesConf.sparkConf
val hadoopConfDirCMapName = sparkConf.getOption(HADOOP_CONFIG_MAP_NAME)
val hadoopConfDirCMapName = conf.sparkConf.getOption(HADOOP_CONFIG_MAP_NAME)
Contributor

Looks like you can do conf.getOption.


override def configurePod(pod: SparkPod): SparkPod = {
val sparkUserName = kubernetesConf.sparkConf.get(KERBEROS_SPARK_USER_NAME)
val sparkUserName = conf.sparkConf.get(KERBEROS_SPARK_USER_NAME)
Contributor

Ditto.

private val conf = kubernetesConf.sparkConf

private val hadoopConfDir = Option(kubernetesConf.sparkConf.getenv(ENV_HADOOP_CONF_DIR))
private val hadoopConfigMapName = kubernetesConf.sparkConf.get(KUBERNETES_HADOOP_CONF_CONFIG_MAP)
Contributor

Ditto.


require(kubernetesConf.hadoopConfSpec.isDefined,
"Ensure that HADOOP_CONF_DIR is defined either via env or a pre-defined ConfigMap")
private val hadoopConfDirSpec = kubernetesConf.hadoopConfSpec.get
private val conf = kubernetesConf.sparkConf
Contributor

Is conf still needed?

Contributor

@liyinan926 liyinan926 left a comment

LGTM

@SparkQA

SparkQA commented Nov 26, 2018

Test build #99297 has finished for PR 22959 at commit 516ae68.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 26, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/5380/

@liyinan926
Contributor

@mccheah do you have any comment?

Contributor

@mccheah mccheah left a comment

Looks fine, think we can merge soon! Only want some feedback on a small point as follows.

extends KubernetesConf(sparkConf) {

override val resourceNamePrefix: String = {
val custom = if (Utils.isTesting) get(KUBERNETES_DRIVER_POD_NAME_PREFIX) else None
Contributor

Possibly inject this in the test so that we don't have to use Utils.isTesting? Preference against using test flags to override behavior.

Contributor Author

I'm trying to avoid creating custom test classes here (that's what I understand by "inject", since there's no way to "inject" this otherwise). There's really a single test that needs this functionality, IIRC, and this pattern is way more common in Spark than what you're suggesting.
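
The pattern under discussion, a test-only override guarded by a global testing flag, could be sketched like this. The names here (`TestFlags`, `PrefixedConf`) are hypothetical stand-ins, not Spark's actual classes:

```scala
import java.util.UUID

// Hypothetical stand-in for Spark's Utils.isTesting, which keys off a
// system property set by the test harness.
object TestFlags {
  def isTesting: Boolean = sys.props.contains("spark.testing")
}

class PrefixedConf(customPrefix: Option[String]) {
  // Honor a configured prefix only when running under test; in production
  // a fresh prefix is always generated.
  val resourceNamePrefix: String =
    (if (TestFlags.isTesting) customPrefix else None)
      .getOrElse(s"spark-${UUID.randomUUID().toString.takeWhile(_ != '-')}")
}
```

The alternative mccheah suggests would pass the prefix in explicitly (e.g. via a constructor parameter or subclass used only by tests), avoiding the global flag at the cost of an extra test-only seam.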

@mccheah
Contributor

mccheah commented Nov 29, 2018

Given that Utils.isTesting is used elsewhere, I'm fine with merging this. Going to merge in a few hours if there are no further comments.

@mccheah
Contributor

mccheah commented Dec 1, 2018

Ah I forgot to merge, sorry! Merging into master.

@asfgit asfgit closed this in 6be272b Dec 1, 2018
@vanzin vanzin deleted the SPARK-25876 branch December 3, 2018 21:55
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes apache#22959 from vanzin/SPARK-25876.
srowen pushed a commit that referenced this pull request Apr 22, 2022
…riverFeatureStep`

### What changes were proposed in this pull request?

This PR removes a variable `hadoopConf` from `KerberosConfDriverFeatureStep`.

### Why are the changes needed?

#22959 added a variable `hadoopConf` to generate `tokenManager`. And, #22911 removed `tokenManager` and `buildKerberosSpec`, so `hadoopConf` is no-use.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the GA.

Closes #36283 from dcoliversun/SPARK-38968.

Authored-by: Qian.Sun <qian.sun2020@gmail.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
dongjoon-hyun added a commit that referenced this pull request May 2, 2024
### What changes were proposed in this pull request?

This PR aims to promote `KubernetesVolumeUtils` to `DeveloperApi` from Apache Spark 4.0.0 for Apache Spark Kubernetes Operator.

### Why are the changes needed?

This API was added by the following at `Apache Spark 3.0.0` and has been stable.
- #22959

Since `Apache Spark Kubernetes Operator` requires this, we had better maintain it as a developer API officially from `Apache Spark 4.0.0`.
- apache/spark-kubernetes-operator#10

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46326 from dongjoon-hyun/SPARK-48076.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
JacobZheng0927 pushed a commit to JacobZheng0927/spark that referenced this pull request May 11, 2024
szehon-ho pushed a commit to szehon-ho/spark that referenced this pull request Aug 7, 2024
5 participants