
MultiKueue dispatcher API #5141

@mwielgus

Description

@mwielgus
Contributor

What would you like to be added:

An API through which an external controller can specify, per workload, which of the worker clusters (out of all those configured for the given ClusterQueue) should be tried for that specific workload.

Why is this needed:

Currently MK tries all clusters configured for a ClusterQueue at the very same time, leading to problems when the number of clusters is large (for example 40).

Completion requirements:

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

The artifacts should be linked in subsequent comments.

Activity

added kind/feature (Categorizes issue or PR as related to a new feature.) on Apr 28, 2025
tenzen-y

tenzen-y commented on May 1, 2025

@tenzen-y
Member

Currently MK tries all clusters configured for a ClusterQueue at the very same time, leading to problems when the number of clusters is large (for example 40).

Is the problem about performance, or something else?

mwielgus

mwielgus commented on May 27, 2025

@mwielgus
Contributor, Author

Both performance (distributing and keeping 40 copies of a workload in cluster informers can be expensive) and practical (trying all 40 clusters at the very same time can lead to lots of unnecessary preemptions).

tenzen-y

tenzen-y commented on May 27, 2025

@tenzen-y
Member

Both performance (distributing and keeping 40 copies of a workload in cluster informers can be expensive) and practical (trying all 40 clusters at the very same time can lead to lots of unnecessary preemptions).

That makes sense. I can imagine that a large number of workload copies would delay the informer sync.
As for the practical problem, what are the differences between this proposal and #3757?
#3757 proposes the dispatching mechanism as well.

mszadkow

mszadkow commented on May 27, 2025

@mszadkow
Contributor

/assign

mimowo

mimowo commented on May 29, 2025

@mimowo
Contributor

Yeah, I would probably call one of them a duplicate. I'm OK to close mine, wdyt @mwielgus?

mimowo

mimowo commented on Jun 2, 2025

@mimowo
Contributor

Actually, when I thought about the proposed KEP #5410, I realized there is a difference, @tenzen-y.

Both issues mitigate the issue with too many preemptions, and performance, but:

  1. in the issue https://github.com/kubernetes-sigs/kueue/issues/3757, the admin only configures the strategy as sequential. Kueue could use the order of declaration of the clusters.
  2. in this issue the admin needs to provide an external controller for the load-balancing.

I think both are useful, and would prefer keeping them in mind when designing the solution.

Maybe we could extend the MultiKueue AC with MultiKueueConfig, analogous to ProvisioningRequestConfig. Inside we could set

dispatcherMode: default / sequential
sequentialDispatcherTimeout: 5min
sequentialDispatcherName: example.com/mydispatcher

  • If dispatcherMode is default or empty, we get the current behavior.
  • If dispatcherMode is sequential, we use the built-in sequential dispatching, one cluster at a time.
  • If sequentialDispatcherName is set, an external dispatcher can be used.

For the needs of the dispatcher, we will probably need a new field on the Workload: workload.status.multiKueueNominatedCluster.
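
To make the three options concrete, here is a minimal Go sketch of how such a config could be resolved into a dispatcher choice. It is only an illustration: the type and function names (MultiKueueConfigSpec, EffectiveDispatcher) are hypothetical, and the precedence of sequentialDispatcherName over dispatcherMode is an assumption, not something decided in this thread.

```go
package main

import (
	"fmt"
	"time"
)

// DispatcherMode mirrors the proposed dispatcherMode values (hypothetical type).
type DispatcherMode string

const (
	DispatcherModeDefault    DispatcherMode = "default"
	DispatcherModeSequential DispatcherMode = "sequential"
)

// MultiKueueConfigSpec is a hypothetical shape for the proposed config.
type MultiKueueConfigSpec struct {
	DispatcherMode              DispatcherMode
	SequentialDispatcherTimeout time.Duration
	SequentialDispatcherName    string
}

// EffectiveDispatcher reports which dispatcher would handle workloads.
// Assumption: a non-empty SequentialDispatcherName wins over DispatcherMode.
func EffectiveDispatcher(spec MultiKueueConfigSpec) string {
	switch {
	case spec.SequentialDispatcherName != "":
		return "external:" + spec.SequentialDispatcherName
	case spec.DispatcherMode == DispatcherModeSequential:
		return "builtin:sequential"
	default:
		// "default" or empty keeps the current try-all-clusters behavior.
		return "builtin:all-at-once"
	}
}

func main() {
	spec := MultiKueueConfigSpec{
		DispatcherMode:              DispatcherModeSequential,
		SequentialDispatcherTimeout: 5 * time.Minute,
	}
	fmt.Println(EffectiveDispatcher(spec))
}
```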

wdyt @tenzen-y @mwielgus @mszadkow ?

mszadkow

mszadkow commented on Jun 2, 2025

@mszadkow
Contributor

IIUC

Whatever the WorkloadReconciler did so far gets divided into 2 steps.

  1. a) Determine which remote cluster the local workload should be cloned to.
     b) Skip this step if it is done by an external controller.
  2. Clone the workload from the local cluster to the remote one based on workload.status.multiKueueNominatedCluster.
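
A rough Go sketch of that split, as I understand it. Everything here is a stand-in: the Workload struct, nominate and cloneToNominated are hypothetical names, the built-in nomination simply picks the first configured cluster, and the remote clusters are modeled as an in-memory map rather than real API clients.

```go
package main

import "fmt"

// Workload is a stripped-down stand-in for the real Workload object.
// NominatedCluster plays the role of workload.status.multiKueueNominatedCluster.
type Workload struct {
	Name             string
	NominatedCluster string
}

// nominate is step 1: pick a target cluster, unless an external controller
// owns that decision or a cluster has already been nominated.
func nominate(wl *Workload, clusters []string, externalDispatcher bool) {
	if externalDispatcher || wl.NominatedCluster != "" || len(clusters) == 0 {
		return
	}
	wl.NominatedCluster = clusters[0] // assumption: first declared cluster
}

// cloneToNominated is step 2: copy the workload to the nominated cluster.
// It returns false when nothing has been nominated yet.
func cloneToNominated(wl *Workload, remotes map[string][]string) bool {
	if wl.NominatedCluster == "" {
		return false
	}
	remotes[wl.NominatedCluster] = append(remotes[wl.NominatedCluster], wl.Name)
	return true
}

func main() {
	remotes := map[string][]string{}
	wl := &Workload{Name: "job-a"}
	nominate(wl, []string{"worker-1", "worker-2"}, false)
	fmt.Println(cloneToNominated(wl, remotes), remotes)
}
```

With externalDispatcher set to true, step 1 is a no-op and step 2 waits until something else fills in the nominated cluster.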

mimowo

mimowo commented on Jun 2, 2025

@mimowo
Contributor

Whatever the WorkloadReconciler did so far gets divided into 2 steps.

Yes, assuming you mean the MultiKueue Workload Controller.

tenzen-y

tenzen-y commented on Jun 27, 2025

@tenzen-y
Member

Actually, when I thought about the proposed KEP #5410, I realized there is a difference, @tenzen-y.

Both issues mitigate the issue with too many preemptions, and performance, but:

  1. in the issue https://github.com/kubernetes-sigs/kueue/issues/3757, the admin only configures the strategy as sequential. Kueue could use the order of declaration of the clusters.
  2. in this issue the admin needs to provide an external controller for the load-balancing.

I think both are useful, and would prefer keeping them in mind when designing the solution.

Maybe we could extend the MultiKueue AC with MultiKueueConfig, analogous to ProvisioningRequestConfig. Inside we could set

dispatcherMode: default / sequential
sequentialDispatcherTimeout: 5min
sequentialDispatcherName: example.com/mydispatcher

  • If dispatcherMode is default or empty, we get the current behavior.
  • If dispatcherMode is sequential, we use the built-in sequential dispatching, one cluster at a time.
  • If sequentialDispatcherName is set, an external dispatcher can be used.

For the needs of the dispatcher, we will probably need a new field on the Workload: workload.status.multiKueueNominatedCluster.

wdyt @tenzen-y @mwielgus @mszadkow ?

Thank you for describing that. IIUC, this issue aims to provide an interface for plugging in a dispatching algorithm, right?
And #3757 will provide the sequential strategy on top of this dispatcher API, right?

mimowo

mimowo commented on Jun 27, 2025

@mimowo
Contributor

Correct. We will introduce the new sequential strategy with hardcoded params for now, for simplicity, on top of the generic dispatching algorithm. The current strategy will also be selectable as a special-case dispatcher.
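
As a sketch of what "on top of the generic dispatching algorithm" could look like, both strategies can be expressed behind one small interface. The Dispatcher interface, the round counter and the wrap-around after the last cluster are all assumptions made for illustration; the actual design is left to the KEP.

```go
package main

import "fmt"

// Dispatcher is a hypothetical interface: given the clusters configured for
// a ClusterQueue and the number of elapsed rounds (for example, timeouts),
// it returns the clusters that should be tried now.
type Dispatcher interface {
	Nominate(clusters []string, round int) []string
}

// AllAtOnce models the current behavior: every cluster is tried at once.
type AllAtOnce struct{}

func (AllAtOnce) Nominate(clusters []string, _ int) []string {
	return clusters
}

// Sequential tries one cluster per round, in declaration order.
// Assumption: after the last cluster it wraps around to the first.
type Sequential struct{}

func (Sequential) Nominate(clusters []string, round int) []string {
	if len(clusters) == 0 || round < 0 {
		return nil
	}
	i := round % len(clusters)
	return clusters[i : i+1]
}

func main() {
	clusters := []string{"worker-1", "worker-2", "worker-3"}
	for _, d := range []Dispatcher{AllAtOnce{}, Sequential{}} {
		for round := 0; round < 2; round++ {
			fmt.Printf("%T round %d: %v\n", d, round, d.Nominate(clusters, round))
		}
	}
}
```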

tenzen-y

tenzen-y commented on Jun 27, 2025

@tenzen-y
Member

Correct. We will introduce the new sequential strategy with hardcoded params for now, for simplicity, on top of the generic dispatching algorithm. The current strategy will also be selectable as a special-case dispatcher.

That makes sense. In that case, let us keep both issues.

pramodbindal

pramodbindal commented on Jul 9, 2025

@pramodbindal

+1
There are certain use cases where we want a workload to be picked up by a specific cluster only.

mimowo

mimowo commented on Jul 9, 2025

@mimowo
Contributor

@pramodbindal thank you for the input. IIUC, with the Workload's new nominatedClusterNames field you can restrict the set of clusters to only a subset. Does it cover your use case?
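
For illustration, the restriction could behave like a simple intersection. This is a sketch under assumptions: it treats nominatedClusterNames as a plain list of names, ignores names that are not configured for the ClusterQueue, and restrictToNominated is a hypothetical helper, not a Kueue function.

```go
package main

import "fmt"

// restrictToNominated keeps only the configured clusters that also appear in
// the nominated list, preserving the configured order.
func restrictToNominated(configured, nominated []string) []string {
	allowed := make(map[string]struct{}, len(nominated))
	for _, name := range nominated {
		allowed[name] = struct{}{}
	}
	var out []string
	for _, name := range configured {
		if _, ok := allowed[name]; ok {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	configured := []string{"worker-1", "worker-2", "worker-3"}
	// Pinning a workload to one cluster is the single-name case.
	fmt.Println(restrictToNominated(configured, []string{"worker-2"}))
}
```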

pramodbindal

pramodbindal commented on Jul 9, 2025

@pramodbindal

@mimowo, I will check this one.
I have a different challenge right now.

I am not able to admit my workload into MultiKueue :-(

Mine is a custom workload, tekton/PipelineRun, which I want to manage via MultiKueue, but I did not find any support for external frameworks in MultiKueue.

Am I missing something?

mimowo

mimowo commented on Jul 9, 2025

@mimowo
Contributor

Mine is a custom workload, tekton/PipelineRun, which I want to manage via MultiKueue, but I did not find any support for external frameworks in MultiKueue.

External frameworks aren't currently supported in MultiKueue :(. We have an issue open about it: #2349.

As a workaround, you may try using AppWrapper or PodGroups, which are both supported.

cc @dgrove-oss

tenzen-y

tenzen-y commented on Jul 17, 2025

@tenzen-y
Member

/reopen

k8s-ci-robot

k8s-ci-robot commented on Jul 17, 2025

@k8s-ci-robot
Contributor

@tenzen-y: Reopened this issue.

In response to this:

/reopen


