How to reduce unnecessary preemptions in target clusters? #97
Comments
Excellent question. I think there may be a way to coordinate preemptions by waiting in the PostFilter plugin in the candidate schedulers, and, in the proxy scheduler, requesting one of the candidate schedulers to proceed with preemption if and when all of them are waiting (in the Filter plugin or a different channel, not sure yet). The waiting-to-preempt status and preemption request could use annotations, like the rest of the algorithm. Would you like to prototype this solution?
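Roughly, the handshake could look something like this. The annotation keys and helper functions below are made up for illustration, and how the annotations would propagate between clusters is left to the existing mechanisms:

```go
package preemptioncoordination

import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// Hypothetical annotation keys for the proposed handshake (not part of Admiralty today).
const (
	// Set by a candidate scheduler's PostFilter plugin when it could preempt
	// but is waiting for permission from the proxy scheduler.
	waitingToPreemptKey = "example.admiralty.io/waiting-to-preempt"
	// Set on exactly one candidate pod to let its scheduler proceed with preemption.
	preemptionAllowedKey = "example.admiralty.io/preemption-allowed"
)

// annotate patches a single annotation onto a pod, using the client of the
// cluster where that pod lives.
func annotate(ctx context.Context, client kubernetes.Interface, pod *v1.Pod, key, value string) error {
	patch := []byte(fmt.Sprintf(`{"metadata":{"annotations":{%q:%q}}}`, key, value))
	_, err := client.CoreV1().Pods(pod.Namespace).Patch(
		ctx, pod.Name, types.MergePatchType, patch, metav1.PatchOptions{})
	return err
}

// markWaitingToPreempt would be called by a candidate scheduler from PostFilter
// before actually running preemption.
func markWaitingToPreempt(ctx context.Context, client kubernetes.Interface, candidate *v1.Pod) error {
	return annotate(ctx, client, candidate, waitingToPreemptKey, "true")
}

// allowOnePreemption would be called by the proxy scheduler once all candidate
// pods for a proxy pod are waiting: pick one and let it preempt.
func allowOnePreemption(ctx context.Context, client kubernetes.Interface, candidates []*v1.Pod) error {
	if len(candidates) == 0 {
		return nil
	}
	chosen := candidates[0] // a smarter choice (cheapest preemption, round-robin, ...) could go here
	return annotate(ctx, client, chosen, preemptionAllowedKey, "true")
}
```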
Ah, nice idea. But I can see one thing that we need to discuss in this algorithm. As you may know, preemption is performed in the scheduling cycle, and the scheduling cycle is single-threaded. So waiting for something in the scheduling cycle has a huge impact on scheduling throughput (please consider the case where cluster X is targeted by a bunch of source clusters and a bunch of high-priority pods are created in those source clusters).
You're right. Here's another suggestion. Instead of waiting, the candidate scheduler PostFilter plugin could "succeed" to bypass preemption, unless the candidate pod is annotated to allow preemption; either way, the pod would be requeued for scheduling. The proxy scheduler would ensure that only one candidate pod for a given proxy pod is allowed to preempt (and would somehow cycle through them).

Another option, readily available but poorly documented, is to use the alternate scheduling algorithm, enabled per pod by the no-reservation annotation. See also: https://github.com/admiraltyio/admiralty/blob/master/pkg/scheduler_plugins/proxy/plugin.go

I'd actually have to think more to see if my suggestion above is any better than the existing alternate algorithm, which was originally designed to work with custom third-party schedulers like Fargate, cf. the "caution" box in the documentation: https://admiralty.io/docs/concepts/scheduling
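To make the bypass suggestion above a bit more concrete, a candidate-scheduler PostFilter plugin along these lines could do it. This is only a sketch: the plugin and annotation names are made up, it assumes the plugin is configured before the default preemption plugin, and the PostFilter signature shown matches scheduler framework versions up to roughly v1.30:

```go
package candidatepostfilter

import (
	"context"

	v1 "k8s.io/api/core/v1"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

// Hypothetical annotation set on the one candidate pod that is allowed to
// trigger preemption in its target cluster.
const preemptionAllowedKey = "example.admiralty.io/preemption-allowed"

// BypassPreemption sketches a PostFilter plugin for the candidate scheduler.
// PostFilter only runs after a pod failed scheduling, i.e. exactly where the
// default preemption plugin would otherwise start evicting victims.
type BypassPreemption struct{}

var _ framework.PostFilterPlugin = &BypassPreemption{}

func (pl *BypassPreemption) Name() string { return "BypassPreemption" }

func (pl *BypassPreemption) PostFilter(
	ctx context.Context,
	state *framework.CycleState,
	pod *v1.Pod,
	_ framework.NodeToStatusMap,
) (*framework.PostFilterResult, *framework.Status) {
	if pod.Annotations[preemptionAllowedKey] == "true" {
		// Fall through: returning Unschedulable lets the next PostFilter
		// plugin (the default preemption plugin) run for this candidate pod.
		return nil, framework.NewStatus(framework.Unschedulable)
	}
	// "Succeed" without nominating a node: later PostFilter plugins are
	// skipped, so no preemption happens, and the candidate pod simply goes
	// back to the scheduling queue.
	return nil, framework.NewStatus(framework.Success)
}
```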
Thank you for your further suggestions and nice ideas. I think both of your plans would work to some extent without reducing scheduling throughput. I'm a little uneasy about scheduling latency, though, because both plans try to schedule candidate pods one by one.

Re: bypassing preemption in candidate schedulers and having the proxy scheduler control which candidate pod may preempt

This idea avoids blocking the scheduling cycle and uses the binding cycle instead. It sounds nice to me overall. I would propose to introduce a …

Re: bypassing the candidate scheduler

I didn't know this is already supported. This idea also uses the binding cycle: it selects one target cluster at a time and similarly waits in the PreBind phase (rough sketch after this comment). But I can see a difficult situation with this algorithm. If I understood correctly, it's the case where candidate schedulers and third-party schedulers are mixed, right? For example: source cluster X, target cluster Y where the candidate scheduler can run, and target cluster Z (e.g. Fargate) where the candidate scheduler cannot run. In this case, users must select the …
Thank you very much. I will also try to think about other ideas.
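Here is a rough sketch of the one-target-at-a-time waiting in PreBind that I have in mind, just to check my understanding. The type name, client wiring, delegate pod naming, and timeout are all made up; this is not what the actual plugin.go does:

```go
package proxyprebind

import (
	"context"
	"fmt"
	"time"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

// WaitForDelegate sketches a proxy-side PreBind step that, for the one target
// cluster selected in this attempt, waits for the delegate (candidate) pod to
// be scheduled there before binding the proxy pod.
type WaitForDelegate struct {
	targetClient kubernetes.Interface // client for the currently selected target cluster
}

var _ framework.PreBindPlugin = &WaitForDelegate{}

func (pl *WaitForDelegate) Name() string { return "WaitForDelegate" }

func (pl *WaitForDelegate) PreBind(
	ctx context.Context, _ *framework.CycleState, p *v1.Pod, _ string,
) *framework.Status {
	// Assume (for the sketch) that the delegate pod in the target cluster
	// shares the proxy pod's namespace and name.
	err := wait.PollUntilContextTimeout(ctx, time.Second, 30*time.Second, true,
		func(ctx context.Context) (bool, error) {
			delegate, err := pl.targetClient.CoreV1().Pods(p.Namespace).Get(ctx, p.Name, metav1.GetOptions{})
			if err != nil {
				return false, nil // not created yet; keep polling
			}
			return delegate.Spec.NodeName != "", nil // done once it is scheduled in the target
		})
	if err != nil {
		// Give up on this target for now; the proxy pod is requeued and another
		// target can be selected on the next attempt.
		return framework.NewStatus(framework.Unschedulable,
			fmt.Sprintf("delegate for %s/%s not scheduled in time", p.Namespace, p.Name))
	}
	return framework.NewStatus(framework.Success)
}
```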
Great idea!
Right. You bring up a very good point. I wonder if… Actually, no-reservation per target rather than per pod could be inferred via the ClusterSummaries (given by targets), rather than specified on Targets, and the feature would be transparent for users on the source side.
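Very roughly, and purely to illustrate the idea, the proxy side could infer it per target like this. The GroupVersionResource and the marker annotation below are assumptions, not something Admiralty defines today:

```go
package targetinfo

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

// Assumed group/version for Admiralty's ClusterSummary objects; check the CRDs
// actually installed in the source cluster.
var clusterSummaryGVR = schema.GroupVersionResource{
	Group:    "multicluster.admiralty.io",
	Version:  "v1alpha1",
	Resource: "clustersummaries",
}

// Hypothetical marker a target could publish on its ClusterSummary to tell the
// source side that candidate pods should use the no-reservation path.
const noReservationKey = "example.admiralty.io/no-reservation"

// targetWantsNoReservation sketches how the proxy scheduler could decide
// no-reservation per target, instead of requiring it on every pod.
func targetWantsNoReservation(ctx context.Context, client dynamic.Interface, summaryName string) (bool, error) {
	summary, err := client.Resource(clusterSummaryGVR).Get(ctx, summaryName, metav1.GetOptions{})
	if err != nil {
		return false, err
	}
	return summary.GetAnnotations()[noReservationKey] == "true", nil
}
```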
As described in the official documentation, the Filter phase in the proxy scheduler waits until candidate pods reach the Reserve phase in the candidate schedulers. This can trigger pod preemptions in the candidate schedulers.
However, at most one candidate pod can become the delegate pod, so unnecessary preemptions could happen in many target clusters.
For example, if I have 10 Targets and all the targets are very full, and I create a source pod with high priority, then preemptions might happen in all 10 target clusters, so the preemptions in 9 of them will turn out to have been unnecessary. I don't have a specific solution, but how can we reduce this? Any ideas?