Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-25295][K8S]Fix executor names collision #22405

Closed

Conversation

skonto
Copy link
Contributor

@skonto skonto commented Sep 12, 2018

What changes were proposed in this pull request?

Fixes the collision issue with spark executor names in client mode, see SPARK-25295 for the details.
It follows the cluster name convention as app-name will be used as the prefix and if that is not defined we use "spark" as the default prefix. Eg. spark-pi-1536781360723-exec-1 where spark-pi is the name of the app passed at the config side or transformed if it contains illegal characters.

Also fixes the issue with spark app name having spaces in cluster mode.
If you run the Spark Pi test in client mode it passes.
The tricky part is the user may set the app name:


If i do:

./bin/spark-submit 
...
 --deploy-mode cluster --name "spark pi"
...

it will fail as the app name is used for the prefix of driver's pod name and it cannot have spaces (according to k8s conventions).

How was this patch tested?

Manually by running spark job in client mode.
To reproduce do:

kubectl create -f service.yaml
kubectl create -f pod.yaml

service.yaml :

kind: Service
apiVersion: v1
metadata:
  name: spark-test-app-1-svc
spec:
  clusterIP: None
  selector:
    spark-app-selector: spark-test-app-1
  ports:
  - protocol: TCP
    name: driver-port
    port: 7077
    targetPort: 7077
  - protocol: TCP
    name: block-manager
    port: 10000
    targetPort: 10000

pod.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: spark-test-app-1
  labels:
    spark-app-selector: spark-test-app-1
spec:
  containers:
  - name: spark-test
    image: skonto/spark:k8s-client-fix
    imagePullPolicy: Always
    command:
      - 'sh'
      - '-c'
      -  "/opt/spark/bin/spark-submit
              --verbose
              --master k8s://https://kubernetes.default.svc
              --deploy-mode client
              --class org.apache.spark.examples.SparkPi
              --conf spark.app.name=spark
              --conf spark.executor.instances=1
              --conf spark.kubernetes.container.image=skonto/spark:k8s-client-fix
              --conf spark.kubernetes.container.image.pullPolicy=Always
              --conf spark.kubernetes.authenticate.oauthTokenFile=/var/run/secrets/kubernetes.io/serviceaccount/token
              --conf spark.kubernetes.authenticate.caCertFile=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              --conf spark.executor.memory=500m
              --conf spark.executor.cores=1
              --conf spark.executor.instances=1
              --conf spark.driver.host=spark-test-app-1-svc.default.svc
              --conf spark.driver.port=7077
              --conf spark.driver.blockManager.port=10000
              local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0-SNAPSHOT.jar 1000000"

@skonto
Copy link
Contributor Author

skonto commented Sep 12, 2018

@liyinan926 @mccheah pls review.

@SparkQA
Copy link

SparkQA commented Sep 12, 2018

Test build #96003 has finished for PR 22405 at commit d6b4a9f.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

.toLowerCase
.replaceAll(" +", " ")
.replaceAll("\\s", "-")
.replaceAll("[^A-Za-z0-9\\-]", "")
Copy link
Contributor Author

@skonto skonto Sep 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be a bit strict but if people want weird names then they should know k8s does not accept it and we use the appname for their convenience when they list pods. If they want a weird appname then they should set the pod name explicitly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

K8s resource names by convention are all lower cases, and can have - and .. So I would suggest adding . here and add another call to toLowerCase() after. See https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct will update.

@SparkQA
Copy link

SparkQA commented Sep 12, 2018

@skonto skonto force-pushed the fix-k8s-client-mode-executor-names branch from d6b4a9f to 245d1ec Compare September 12, 2018 20:19
@SparkQA
Copy link

SparkQA commented Sep 12, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/3057/

s"$appName-$launchTime"
.toLowerCase
.replaceAll(" +", " ")
.replaceAll("\\s", "-")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just do replaceAll("\\s+", "-") instead of the two above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah sure

.toLowerCase
.replaceAll(" +", " ")
.replaceAll("\\s", "-")
.replaceAll("[^A-Za-z0-9\\-]", "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

K8s resource names by convention are all lower cases, and can have - and .. So I would suggest adding . here and add another call to toLowerCase() after. See https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names.

@SparkQA
Copy link

SparkQA commented Sep 12, 2018

@SparkQA
Copy link

SparkQA commented Sep 12, 2018

Test build #96004 has finished for PR 22405 at commit 245d1ec.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 12, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/3058/

@skonto
Copy link
Contributor Author

skonto commented Sep 12, 2018

@liyinan926 the prefix before this PR is defined as: val kubernetesResourceNamePrefix = { s"$appName-$launchTime".toLowerCase.replaceAll("\\.", "-") }
So it looks better to replace . with - will do that.

@skonto skonto force-pushed the fix-k8s-client-mode-executor-names branch from 245d1ec to 62729b6 Compare September 12, 2018 22:49
@skonto
Copy link
Contributor Author

skonto commented Sep 12, 2018

scalastyle &tests fixed...

@skonto
Copy link
Contributor Author

skonto commented Sep 12, 2018

jenkins, test this please.

@skonto skonto force-pushed the fix-k8s-client-mode-executor-names branch from 62729b6 to 689f09b Compare September 12, 2018 22:54
// string for the prefix, based on the app name, and this comparison here will fail.
val k8sConfCopy = k8sConf
.copy(appResourceNamePrefix = "")
.copy(sparkConf = conf)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indention here seems off.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

.replaceAll("\\.", "-")
.replaceAll("[^a-z0-9\\-]", "")
.replaceAll("-+", "-")
}
Copy link
Contributor Author

@skonto skonto Sep 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reduce dashes. for example:

scala> " Spark   #$ d #.Pi"
.trim
.toLowerCase
.replaceAll("\\s+", "-")
.replaceAll("\\.", "-")
.replaceAll("[^a-z0-9\\-]", "")
.replaceAll("-+", "-")
res15: String = spark-d-pi

@skonto skonto force-pushed the fix-k8s-client-mode-executor-names branch from 689f09b to 6975d69 Compare September 12, 2018 22:57
@SparkQA
Copy link

SparkQA commented Sep 12, 2018

@skonto skonto force-pushed the fix-k8s-client-mode-executor-names branch from 6975d69 to e28c2da Compare September 12, 2018 23:05
@SparkQA
Copy link

SparkQA commented Sep 12, 2018

Test build #96010 has finished for PR 22405 at commit 62729b6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 12, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/3063/

@liyinan926
Copy link
Contributor

Will merge later today if there's no more comments.

@SparkQA
Copy link

SparkQA commented Sep 12, 2018

Test build #96011 has finished for PR 22405 at commit 689f09b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 12, 2018

@SparkQA
Copy link

SparkQA commented Sep 12, 2018

Test build #96012 has finished for PR 22405 at commit 6975d69.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 12, 2018

Test build #96013 has finished for PR 22405 at commit e28c2da.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@skonto
Copy link
Contributor Author

skonto commented Sep 12, 2018

Thanks @liyinan926!

@SparkQA
Copy link

SparkQA commented Sep 12, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/3064/

@SparkQA
Copy link

SparkQA commented Sep 12, 2018

@SparkQA
Copy link

SparkQA commented Sep 12, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/3065/

@SparkQA
Copy link

SparkQA commented Sep 13, 2018

@SparkQA
Copy link

SparkQA commented Sep 13, 2018

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/3066/

asfgit pushed a commit that referenced this pull request Sep 13, 2018
## What changes were proposed in this pull request?
Fixes the collision issue with spark executor names in client mode, see SPARK-25295 for the details.
It follows the cluster name convention as app-name will be used as the prefix and if that is not defined we use "spark" as the default prefix. Eg. `spark-pi-1536781360723-exec-1` where spark-pi is the name of the app passed at the config side or transformed if it contains illegal characters.

Also fixes the issue with spark app name having spaces in cluster mode.
If you run the Spark Pi test in client mode it passes.
The tricky part is the user may set the app name:
https://github.com/apache/spark/blob/3030b82c89d3e45a2e361c469fbc667a1e43b854/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala#L30
If i do:

```
./bin/spark-submit
...
 --deploy-mode cluster --name "spark pi"
...
```
it will fail as the app name is used for the prefix of driver's pod name and it cannot have spaces (according to k8s conventions).

## How was this patch tested?

Manually by running spark job in client mode.
To reproduce do:
```
kubectl create -f service.yaml
kubectl create -f pod.yaml
```
 service.yaml :
```
kind: Service
apiVersion: v1
metadata:
  name: spark-test-app-1-svc
spec:
  clusterIP: None
  selector:
    spark-app-selector: spark-test-app-1
  ports:
  - protocol: TCP
    name: driver-port
    port: 7077
    targetPort: 7077
  - protocol: TCP
    name: block-manager
    port: 10000
    targetPort: 10000
```
pod.yaml:

```
apiVersion: v1
kind: Pod
metadata:
  name: spark-test-app-1
  labels:
    spark-app-selector: spark-test-app-1
spec:
  containers:
  - name: spark-test
    image: skonto/spark:k8s-client-fix
    imagePullPolicy: Always
    command:
      - 'sh'
      - '-c'
      -  "/opt/spark/bin/spark-submit
              --verbose
              --master k8s://https://kubernetes.default.svc
              --deploy-mode client
              --class org.apache.spark.examples.SparkPi
              --conf spark.app.name=spark
              --conf spark.executor.instances=1
              --conf spark.kubernetes.container.image=skonto/spark:k8s-client-fix
              --conf spark.kubernetes.container.image.pullPolicy=Always
              --conf spark.kubernetes.authenticate.oauthTokenFile=/var/run/secrets/kubernetes.io/serviceaccount/token
              --conf spark.kubernetes.authenticate.caCertFile=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              --conf spark.executor.memory=500m
              --conf spark.executor.cores=1
              --conf spark.executor.instances=1
              --conf spark.driver.host=spark-test-app-1-svc.default.svc
              --conf spark.driver.port=7077
              --conf spark.driver.blockManager.port=10000
              local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0-SNAPSHOT.jar 1000000"
```

Closes #22405 from skonto/fix-k8s-client-mode-executor-names.

Authored-by: Stavros Kontopoulos <stavros.kontopoulos@lightbend.com>
Signed-off-by: Yinan Li <ynli@google.com>
(cherry picked from commit 3e75a9f)
Signed-off-by: Yinan Li <ynli@google.com>
@asfgit asfgit closed this in 3e75a9f Sep 13, 2018
fjh100456 pushed a commit to fjh100456/spark that referenced this pull request Sep 13, 2018
## What changes were proposed in this pull request?
Fixes the collision issue with spark executor names in client mode, see SPARK-25295 for the details.
It follows the cluster name convention as app-name will be used as the prefix and if that is not defined we use "spark" as the default prefix. Eg. `spark-pi-1536781360723-exec-1` where spark-pi is the name of the app passed at the config side or transformed if it contains illegal characters.

Also fixes the issue with spark app name having spaces in cluster mode.
If you run the Spark Pi test in client mode it passes.
The tricky part is the user may set the app name:
https://github.com/apache/spark/blob/3030b82c89d3e45a2e361c469fbc667a1e43b854/examples/src/main/scala/org/apache/spark/examples/SparkPi.scala#L30
If i do:

```
./bin/spark-submit
...
 --deploy-mode cluster --name "spark pi"
...
```
it will fail as the app name is used for the prefix of driver's pod name and it cannot have spaces (according to k8s conventions).

## How was this patch tested?

Manually by running spark job in client mode.
To reproduce do:
```
kubectl create -f service.yaml
kubectl create -f pod.yaml
```
 service.yaml :
```
kind: Service
apiVersion: v1
metadata:
  name: spark-test-app-1-svc
spec:
  clusterIP: None
  selector:
    spark-app-selector: spark-test-app-1
  ports:
  - protocol: TCP
    name: driver-port
    port: 7077
    targetPort: 7077
  - protocol: TCP
    name: block-manager
    port: 10000
    targetPort: 10000
```
pod.yaml:

```
apiVersion: v1
kind: Pod
metadata:
  name: spark-test-app-1
  labels:
    spark-app-selector: spark-test-app-1
spec:
  containers:
  - name: spark-test
    image: skonto/spark:k8s-client-fix
    imagePullPolicy: Always
    command:
      - 'sh'
      - '-c'
      -  "/opt/spark/bin/spark-submit
              --verbose
              --master k8s://https://kubernetes.default.svc
              --deploy-mode client
              --class org.apache.spark.examples.SparkPi
              --conf spark.app.name=spark
              --conf spark.executor.instances=1
              --conf spark.kubernetes.container.image=skonto/spark:k8s-client-fix
              --conf spark.kubernetes.container.image.pullPolicy=Always
              --conf spark.kubernetes.authenticate.oauthTokenFile=/var/run/secrets/kubernetes.io/serviceaccount/token
              --conf spark.kubernetes.authenticate.caCertFile=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
              --conf spark.executor.memory=500m
              --conf spark.executor.cores=1
              --conf spark.executor.instances=1
              --conf spark.driver.host=spark-test-app-1-svc.default.svc
              --conf spark.driver.port=7077
              --conf spark.driver.blockManager.port=10000
              local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0-SNAPSHOT.jar 1000000"
```

Closes apache#22405 from skonto/fix-k8s-client-mode-executor-names.

Authored-by: Stavros Kontopoulos <stavros.kontopoulos@lightbend.com>
Signed-off-by: Yinan Li <ynli@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants