@jshmchenxi
Contributor

@jshmchenxi jshmchenxi commented Jun 20, 2024

What changes were proposed in this pull request?

This PR aims to support jobs with a long spark.app.name on K8s. The resource name prefix should be truncated so that the generated resource names remain valid DNS Subdomain Names.

The resource suffixes currently in use are as follows:

  • -driver
  • -driver-podspec-conf-map
  • -driver-pvc-$i
  • -driver-svc
  • -exec-${executorId}
  • -exec-${executorId}-pvc-$i
  • -hadoop-config
  • -kubernetes-credentials
  • -delegation-tokens
  • -kerberos-keytab
  • -krb5-file

Among them, the longest fixed suffix is -driver-podspec-conf-map, at 24 characters.
The maximum length of -exec-${executorId}-pvc-$i is also 24: executorId is at most 10 characters (the length of Integer.MAX_VALUE) and the maximum number of allowed PVC specs is 128, which is 3 characters, so the total is 6 ("-exec-") + 10 + 5 ("-pvc-") + 3 = 24.
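
The length arithmetic above can be checked with a short sketch. This is Python for illustration only, not the Scala code in this PR. The names `SUFFIXES`, `longest_suffix_length`, and `truncate_prefix` are hypothetical, and the truncation shown (a plain cut followed by stripping trailing `-` and `.`) is an assumption about one reasonable approach, not necessarily what the patch does.

```python
# Illustrative sketch only; names and truncation strategy are assumptions.
MAX_NAME_LENGTH = 253             # DNS subdomain name limit enforced by K8s
MAX_EXECUTOR_ID = str(2**31 - 1)  # Integer.MAX_VALUE, 10 digits
MAX_PVC_INDEX = "128"             # 3 digits, per the description above

# Suffixes from the list above, with variables set to their longest values.
SUFFIXES = [
    "-driver",
    "-driver-podspec-conf-map",
    f"-driver-pvc-{MAX_PVC_INDEX}",
    "-driver-svc",
    f"-exec-{MAX_EXECUTOR_ID}",
    f"-exec-{MAX_EXECUTOR_ID}-pvc-{MAX_PVC_INDEX}",
    "-hadoop-config",
    "-kubernetes-credentials",
    "-delegation-tokens",
    "-kerberos-keytab",
    "-krb5-file",
]


def longest_suffix_length(suffixes=SUFFIXES):
    """Return the length of the longest suffix that may be appended."""
    return max(len(s) for s in suffixes)


def truncate_prefix(prefix, max_name_length=MAX_NAME_LENGTH):
    """Cut the prefix so that prefix + any suffix fits in max_name_length."""
    budget = max_name_length - longest_suffix_length(SUFFIXES)
    # Strip trailing '-' and '.' so the cut does not end on a separator.
    return prefix[:budget].rstrip("-.")
```

With these values the longest suffix is 24 characters, which leaves a budget of 253 - 24 = 229 characters for the prefix.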

Why are the changes needed?

Currently, when a job with a long spark.app.name is submitted, K8s rejects the creation of the driver pod because the pod name exceeds 253 characters.

Error example:

Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://kubernetes.default.svc.cluster.local:443/api/v1/namespaces/foo/pods. Message: Pod "some-super-long-spark-pod-name-exceeded-length-253-driver" is invalid: metadata.name: Invalid value: "some-super-long-spark-pod-name-exceeded-length-253-driver": must be no more than 253 characters. 
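
A rejection like the one above can be anticipated on the client side by checking a candidate name against the DNS subdomain rules before submitting. The sketch below is Python for illustration only; `is_valid_subdomain_name` is a hypothetical helper, and the pattern is written from the documented rule (lowercase alphanumerics, `-` and `.`, starting and ending with an alphanumeric, at most 253 characters) rather than copied from Spark or the K8s client.

```python
import re

# Illustrative sketch only; helper name and constants are assumptions.
DNS1123_LABEL = r"[a-z0-9]([-a-z0-9]*[a-z0-9])?"
DNS1123_SUBDOMAIN = re.compile(rf"{DNS1123_LABEL}(\.{DNS1123_LABEL})*")
MAX_SUBDOMAIN_LENGTH = 253


def is_valid_subdomain_name(name):
    """Return True if name fits the DNS subdomain length and character rules."""
    if len(name) > MAX_SUBDOMAIN_LENGTH:
        return False
    return DNS1123_SUBDOMAIN.fullmatch(name) is not None
```

For example, a 230-character prefix followed by `-driver-podspec-conf-map` gives a 254-character name, which this check rejects on length alone.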

Does this PR introduce any user-facing change?

Yes, users can run jobs on K8s with a longer spark.app.name.

How was this patch tested?

Pass the CIs with the updated unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@jshmchenxi jshmchenxi force-pushed the SPARK-48669/limit-k8s-pod-name-length branch 3 times, most recently from 2a96cc9 to 1d51788 Compare June 20, 2024 13:08
@jshmchenxi jshmchenxi force-pushed the SPARK-48669/limit-k8s-pod-name-length branch from 1d51788 to 3954302 Compare June 20, 2024 13:17
@jshmchenxi
Contributor Author

Kindly ping @dongjoon-hyun as this is a continuation of SPARK-39614

@jshmchenxi
Contributor Author

cc @pan3793 @yaooqinn @LuciferYang
Please take a look, thanks!

@LuciferYang
Contributor

cc @dongjoon-hyun FYI

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Oct 10, 2024
@github-actions github-actions bot closed this Oct 11, 2024