
[SPARK-38455][SPARK-38187][K8S] Support driver/executor PodGroup templates #35776

Closed · wants to merge 4 commits into from

Conversation

dongjoon-hyun (Member) commented Mar 8, 2022

What changes were proposed in this pull request?

This PR aims to support driver/executor PodGroup templates like the following.

apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
spec:
  minMember: 1000
  minResources:
    cpu: "4"
    memory: "16Gi"
  priorityClassName: executor-priority
  queue: executor-queue
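
For context, such a template file would be wired in through Spark configuration. Below is a minimal, hypothetical usage sketch in Scala; the exact config key names are an assumption based on the KUBERNETES_DRIVER_PODGROUP_TEMPLATE_FILE / KUBERNETES_EXECUTOR_PODGROUP_TEMPLATE_FILE entries quoted later in this review, not something this page confirms:

    // Hypothetical sketch; the podGroupTemplateFile key names are assumed, not confirmed here.
    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.kubernetes.scheduler.name", "volcano")
      // assumed binding of KUBERNETES_DRIVER_PODGROUP_TEMPLATE_FILE
      .set("spark.kubernetes.driver.podGroupTemplateFile", "/path/to/driver-podgroup-template.yaml")
      // assumed binding of KUBERNETES_EXECUTOR_PODGROUP_TEMPLATE_FILE
      .set("spark.kubernetes.executor.podGroupTemplateFile", "/path/to/executor-podgroup-template.yaml")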

Why are the changes needed?

This is a simpler, more extensible, and more robust way to support Volcano features, because we don't need to add new configurations like #35640 for every Volcano feature.

Does this PR introduce any user-facing change?

No, because this is a new feature.

How was this patch tested?

Pass the CIs.

@dongjoon-hyun changed the title from [SPARK-38455][K8S] Support driver/executor PodGroup templates to [SPARK-38455][SPARK-38187][K8S] Support driver/executor PodGroup templates on Mar 8, 2022
dongjoon-hyun (Author)

Could you review this, @viirya and @Yikun?

priorityClassName.foreach(podGroup.editOrNewSpec().withPriorityClassName(_).endSpec())
var spec = pg.getSpec
if (spec == null) spec = new PodGroupSpec
queue.foreach(spec.setQueue(_))
Member:

I saw there is queue in the PodGroup template. So will this overwrite it?

Member Author:

Yes, currently the spark.kubernetes.job.queue configuration will overwrite it. That is the behavior for non-Volcano driver/executor pod templates, too.
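
A minimal sketch of the "configuration over template" rule being described, using hypothetical names rather than Spark's actual code:

    // The user-given configuration, when set, unconditionally replaces the template value.
    def resolve[T](fromConf: Option[T], fromTemplate: Option[T]): Option[T] =
      fromConf.orElse(fromTemplate)

    // e.g. resolve(Some("config-queue"), Some("template-queue")) == Some("config-queue")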

Member:

Got it. Thanks.

viirya (Member) left a comment:

Yea, I like this direction instead of adding many individual configurations.

Yikun (Member) commented Mar 8, 2022

I took a rough look; this is a better choice because it avoids introducing new configurations. I will take a deeper look and test this today.

@dongjoon-hyun Thanks!

dongjoon-hyun (Author)

Thanks, @viirya and @Yikun .


priorityClassName.foreach(podGroup.editOrNewSpec().withPriorityClassName(_).endSpec())
var spec = pg.getSpec
if (spec == null) spec = new PodGroupSpec
Member:

If no template is given, do we need some default PodGroup spec?

Member Author:

Yes, it's the existing code's behavior at PodGroupBuilder.

viirya (Member) left a comment:

The idea looks good to me.

val template = if (kubernetesConf.isInstanceOf[KubernetesDriverConf]) {
  kubernetesConf.get(KUBERNETES_DRIVER_PODGROUP_TEMPLATE_FILE)
} else {
  kubernetesConf.get(KUBERNETES_EXECUTOR_PODGROUP_TEMPLATE_FILE)
}
Yikun (Member), Mar 8, 2022:

Note that we don't support a separate PodGroup for executors yet; currently there is no ability to create pre-resources for executors in Spark on K8s.

We only support a driver-side PodGroup or a job-level PodGroup (shared with the driver) for now.

Member Author:

Yes, this is added for feature parity with your existing contribution.
getAdditionalPreKubernetesResources is used only by KubernetesDriverBuilder.
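
A rough sketch of the shape being described here, assuming fabric8's HasMetadata type; the trait and method body below are hypothetical, not the PR's actual code:

    import io.fabric8.kubernetes.api.model.HasMetadata

    // Resources returned from this hook are created as pre-resources before the
    // driver pod, which is why only the driver side can get a PodGroup this way today.
    trait PreResourceStepSketch {
      def buildPodGroup(): HasMetadata
      def getAdditionalPreKubernetesResources(): Seq[HasMetadata] = Seq(buildPodGroup())
    }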

dongjoon-hyun (Author)

To fix the license linter error, license headers were added.

BTW, do you have any other concerns, @Yikun ?

var spec = pg.getSpec
if (spec == null) spec = new PodGroupSpec
queue.foreach(spec.setQueue(_))
priorityClassName.foreach(spec.setPriorityClassName(_))
Yikun (Member), Mar 9, 2022:

I have only one remaining concern, about the configuration overwrite priority for priorityClassName.

Compared to pod templates, this seems like a reasonable overwrite order to me:

  • Top 1: configuration value (like queue)
  • Top 2: template value (like the driver/executor template)
  • Top 3: default value (like priorityClassName, which reuses the pod priority as its default)

So do you think we should change it to:

    // Overwrite if queue configuration specified
    queue.foreach(spec.setQueue(_))
    // Reuse Pod priority if priority is not set in template
    if (spec.getPriorityClassName == null) priorityClassName.foreach(spec.setPriorityClassName(_))

Then users could set priorityClassName by specifying the priority in the template. WDYT?

Member Author:

I understand you are exploring all the possibilities. However, Apache Spark doesn't work like that. :) We prefer user-given configurations over any template files, and that is always simpler. So the current behavior is intentionally consistent with the existing one.

Member:

OK, I can understand that. This is not only about exploring all possibilities; many users also want to specify priority more flexibly at the job level.

As an alternative way to help users set priority conveniently, while still keeping your current implementation: considering that priority scheduling is a very common case, do you think we could introduce a configuration like spark.kubernetes.driver.priorityClassName to simplify the config template? This would also help both the native default scheduler and custom schedulers specify Spark job priority easily.
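
For illustration, the proposed configuration (hypothetical — only suggested in this thread, not merged in this PR) would let a job set its priority without a template:

    // Hypothetical: spark.kubernetes.driver.priorityClassName is only a proposal here.
    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.kubernetes.driver.priorityClassName", "high-priority")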

cpu: "4"
memory: "16Gi"
priorityClassName: executor-priority
queue: executor-queue
Member:

nit: new line

Member Author:

Sure.

Member Author:

Fixed now, @Yikun .

dongjoon-hyun (Author)

BTW, @Yikun, we can take advantage of these PodGroup templates more actively in order to isolate Volcano from the other custom schedulers, the default K8s scheduler, or Apache Spark itself. I'm going to propose that after this PR.

Yikun (Member) left a comment:

[info] VolcanoSuite:
[info] - Run SparkPi with volcano scheduler (10 seconds, 10 milliseconds)
[info] - SPARK-38188: Run SparkPi jobs with 2 queues (only 1 enabled) (14 seconds, 304 milliseconds)
[info] - SPARK-38188: Run SparkPi jobs with 2 queues (all enabled) (13 seconds, 252 milliseconds)
[info] - SPARK-38423: Run SparkPi Jobs with priorityClassName (15 seconds, 264 milliseconds)
[info] - SPARK-38423: Run driver job to validate priority order (16 seconds, 315 milliseconds)
- (coming soon)SPARK-38187: Run SparkPi Jobs with minCPU (28 seconds, 525 milliseconds)
- (coming soon)SPARK-38187: Run SparkPi Jobs with minMemory (26 seconds, 497 milliseconds)

Except for the priority concern, LGTM. Thanks! I also tested it in my env for the existing cases and the coming minResources cases.

dongjoon-hyun (Author)

Thank you, @viirya and @Yikun. Merged to master for Apache Spark 3.3.0.

Yikun (Member) commented Mar 9, 2022

> We can take advantage of these PodGroup templates more actively in order to isolate Volcano from the other custom schedulers, the default K8s scheduler, or Apache Spark itself. I'm going to propose that after this PR.

Looking forward to it!
