
[SPARK-42060][K8S][WIP] add new config to override driver/executor k8s containers names #39563

Closed
hussein-awala wants to merge 1 commit into apache:master from hussein-awala:SPARK-42060/containers_names

Conversation

@hussein-awala
Member

What changes were proposed in this pull request?

Adding two new configs, spark.kubernetes.driver.container.name and spark.kubernetes.executor.container.name, to override the default container names.
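As proposed, the new settings would be supplied like any other Spark conf. A hypothetical spark-submit invocation (note that this PR was closed, so these two keys are not part of released Spark; the cluster address, image, and container names below are made up for illustration):

```shell
spark-submit \
  --master k8s://https://example-cluster:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=my-registry/spark:3.4.0 \
  --conf spark.kubernetes.driver.container.name=etl-job-driver \
  --conf spark.kubernetes.executor.container.name=etl-job-executor \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/jars/spark-examples.jar
```

With per-job values for the two `container.name` confs, each job's logs could then be grouped by container name in a collector such as CloudWatch without maintaining a pod template per job.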

Why are the changes needed?

We are using CloudWatch to collect pod logs, and we partition/group the logs by container name. Providing a pod template for each job just to override the container name is complicated when we have more than 500 different jobs, so the best solution is to override the default container names when no pod template is provided, or when one is provided but does not force a container name.

Does this PR introduce any user-facing change?

How was this patch tested?

@hussein-awala hussein-awala changed the title [SPARK-42060] add new config to override driver/executor k8s containers names [SPARK-42060][K8S][WIP] add new config to override driver/executor k8s containers names Jan 14, 2023
@dongjoon-hyun dongjoon-hyun marked this pull request as ready for review January 14, 2023 05:29
@dongjoon-hyun dongjoon-hyun marked this pull request as draft January 14, 2023 05:29
Member

@dongjoon-hyun dongjoon-hyun left a comment


Thank you for making a PR, but Apache Spark already supports container name overriding via a more general mechanism: spark.kubernetes.driver.podTemplateFile and spark.kubernetes.executor.podTemplateFile. Please use the existing ones.
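For context, the existing route works roughly like this: a pod template file is passed via `--conf spark.kubernetes.driver.podTemplateFile=<path>`, and Spark builds the driver on top of a container from that template (the first one by default), keeping its name. A minimal sketch (the file name and container name here are illustrative, not from this PR):

```yaml
# driver-template.yaml (illustrative name)
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: etl-job-driver   # Spark keeps this name for the driver container
```

Because the name lives in the template file, giving each job a unique container name means maintaining one such file per job, which is the overhead the PR author wants to avoid.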

@hussein-awala
Member Author

@dongjoon-hyun as I mentioned in my PR, we have more than 500 jobs, and providing a different pod template for each job just to override the container name is very complicated.
I think the pod template file is meant to define common configuration (we create fewer than 10 templates per project), but since the container name can be used for monitoring and log collection, it sometimes needs to be unique per job (like the pod name), and I believe we should support overriding it with a new conf.
What do you think?

@dongjoon-hyun
Member

Are annotations and labels not enough for your goals? That's the recommended way in the K8s ecosystem. We want to keep a single, simple mechanism as much as possible instead of duplicating the entire K8s spec.

@hussein-awala
Member Author

We have other services running on the same cluster, and some of their pods have multiple containers. Using annotations and labels in this case is not enough to separate container logs.
I will try to use different configurations for the log collector in the Spark namespace. But I will also finish the change I started and tested on my fork; if you ever see value in adding these configurations, you can re-open the JIRA ticket and I will re-create the PR.
Thank you!
