Support Elastic Jobs via WorkloadSlices #5510
Hi @ichekrygin. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
Force-pushed from d9f4768 to fe05a9f.
@ichekrygin I think this is very close to being mergeable. I left a bunch of comments, mostly renames to use the "replacing" terminology consistently rather than "preemption", because the mechanism only marginally relies on preemptions. It would also be great to add integration tests for the happy path. The release is on Friday, so I think we still have a bit of time to address the comments. Feel free to also squash the commits. There are 33 of them; I highly doubt anyone would like to traverse them :)
LGTM, but please address the remaining comments.
Let's make the note a bit more user-oriented; I think the workload-slice replacement is more of a technical detail. A link to KEP-77 is probably enough for interested readers.
/lgtm
FYI @tenzen-y: Since the release is approaching and all of my comments have been addressed, I am merging this now to avoid potential conflicts with other PRs. I've taken extra care to ensure all new code is behind the alpha feature gate. Please feel free to add any further comments or open a new issue for follow-up items. I'm confident we can address them.
LGTM label has been added. Git tree hash: b632836c18466b71df5bac3e1328c769844678ef
/approve
[APPROVALNOTIFIER] This PR is APPROVED.
This pull request has been approved by: ichekrygin, mimowo
Approvers can indicate their approval by writing `/approve` in a comment.
* Add workload slice condition constants to track workload slice aggregation and deactivation.
* Add support for generating unique workload names based on owner object generation.
* Add and enhance workload PodSet count manipulation functions.
  This commit introduces new functions and enhances existing ones for manipulating a Workload's PodSet counts, including:
  - Retrieving PodSet counts
  - Detecting PodSet count reduction
  - Checking for equality of PodSet counts
  - Updating PodSet counts
  These functions will be used in the upcoming workload-slice implementation. They also replace existing, similar functionality that is now marked for deprecation, promoting code consolidation and maintainability.
* Add initial support for WorkloadSlices to support dynamically scaled jobs.
  This change introduces core support for WorkloadSlices as outlined in KEP-77, enabling the scheduler to handle dynamically sized jobs through fine-grained workload subdivision.
* Post-rebase update adding "WorkloadReference".
* Refactor prepareWorkload to extract workload slice handling into a separate function and add unit test coverage.
* Rename WorkloadSlice feature gate and job annotation to follow the DynamicallySizedJob naming convention.
* Refactor workload slice scheduling: move capacity calculation to flavor assignment and assert flavor persistence between slices.
* Post-rebase update changing "WorkloadReference" -> "Reference".
* Restore accidentally removed blank imports.
* Relax workload PodSet validation only when the "DynamicallySizedJob" feature is enabled, and update the associated unit and integration tests.
* Update DynamicallySizedJobs feature check.
* Rebrand DynamicallySizedJobs -> ElasticJobs[ViaWorkloadSlices].
* Address PR review feedback.
* PR: address kubernetes-sigs#5510 (comment)
* PR: address kubernetes-sigs#5510 (review)
* Refactor workload-slice eviction/preemption, replacing it with workload slice aggregation.
* Revert change to `ensureWorkload` merging with `ensureOneWorkload` as per PR review.
* Update `WorkloadPreemptibleSliceNameKey` annotation key per PR feedback.
* Update apis/kueue/v1beta1/constants.go
* Update `WorkloadSlice` related constants and address additional comments per PR feedback.
* Update apis/kueue/v1beta1/workload_types.go
* Update to address PR feedback.
* Convert replaceable slice target name to use workload.Reference for consistency.
* Update workload slice deactivation.
* Update feature gate activation test.
* Update per PR feedback.
* Update flavor assignment after rebase.
* Update per PR review.
* Remove obsolete WorkloadSliceReplacementReason constant.
* Fix linter errors.
* Update workload slice related integration tests.
* Rename WorkloadSliceReplacementForKey -> WorkloadSliceReplacementFor for clarity.
* Update pkg/scheduler/scheduler.go
* Remove check for old workload-slice name collision.
* Update pkg/scheduler/scheduler.go
* Update preemptable workload slice naming to use "Replace" terminology.
* Refactor workload-slice replacement/preemption target functionality from the "scheduler" package to the "workloadslicing" package.
* Add integration test for a job with the ElasticJobsViaWorkloadSlices feature enabled.
---------
Signed-off-by: ichekrygin <illya.chekrygin@gmail.com>
Co-authored-by: Michał Woźniak <mimowo@users.noreply.github.com>
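The four PodSet count operations listed in the commit message (retrieving, detecting a reduction, comparing, and updating counts) can be illustrated with a minimal Go sketch. It is hypothetical: the `PodSet`, `Workload`, and `PodSetCounts` types and every function name below are simplified stand-ins invented for illustration, not Kueue's actual API, and only a PodSet's name and pod count are modelled.

```go
package main

import "fmt"

// PodSet is a hypothetical, simplified stand-in for a Workload PodSet:
// only the name and the number of pods are modelled.
type PodSet struct {
	Name  string
	Count int32
}

// Workload is a hypothetical, simplified workload holding its PodSets.
type Workload struct {
	Name    string
	PodSets []PodSet
}

// PodSetCounts maps a PodSet name to its pod count.
type PodSetCounts map[string]int32

// ExtractPodSetCounts returns the pod count of every PodSet in the workload.
func ExtractPodSetCounts(wl *Workload) PodSetCounts {
	counts := make(PodSetCounts, len(wl.PodSets))
	for _, ps := range wl.PodSets {
		counts[ps.Name] = ps.Count
	}
	return counts
}

// HasCountReduction reports whether any PodSet present in both maps has
// fewer pods in next than in prev.
func HasCountReduction(prev, next PodSetCounts) bool {
	for name, prevCount := range prev {
		if nextCount, ok := next[name]; ok && nextCount < prevCount {
			return true
		}
	}
	return false
}

// EqualPodSetCounts reports whether both maps hold exactly the same entries.
func EqualPodSetCounts(a, b PodSetCounts) bool {
	if len(a) != len(b) {
		return false
	}
	for name, ca := range a {
		if cb, ok := b[name]; !ok || ca != cb {
			return false
		}
	}
	return true
}

// ApplyPodSetCounts overwrites the Count of each PodSet that has an entry in
// counts; PodSets without an entry are left unchanged.
func ApplyPodSetCounts(wl *Workload, counts PodSetCounts) {
	for i := range wl.PodSets {
		if c, ok := counts[wl.PodSets[i].Name]; ok {
			wl.PodSets[i].Count = c
		}
	}
}

func main() {
	wl := &Workload{Name: "job-a", PodSets: []PodSet{{Name: "workers", Count: 4}}}
	before := ExtractPodSetCounts(wl)
	after := PodSetCounts{"workers": 2}

	fmt.Println(HasCountReduction(before, after)) // true
	fmt.Println(EqualPodSetCounts(before, after)) // false

	ApplyPodSetCounts(wl, after)
	fmt.Println(wl.PodSets[0].Count) // 2
}
```

Keeping counts in a name-keyed map makes the reduction and equality checks independent of PodSet ordering.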
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR introduces the foundational implementation of WorkloadSlices in Kueue, as proposed in KEP-77. WorkloadSlices enable controlled scaling of admitted workloads (e.g., scale-up) while preserving Kueue's scheduling guarantees and resource tracking semantics.
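How a resize of an admitted workload might be classified can be shown with a small Go sketch. It rests on stated assumptions, not on this PR's code: growing any PodSet requires a new slice that records which slice it replaces, while pure shrinkage is applied to the already admitted slice, and the set of PodSet names never changes. The `Slice` type, its `ReplacementFor` field, the `decide`/`resize` functions, and the `<owner>-gen<N>` naming scheme are all hypothetical.

```go
package main

import "fmt"

// SliceAction is the hypothetical outcome of comparing an admitted slice's
// PodSet counts with the counts the owning job now requests.
type SliceAction int

const (
	NoChange      SliceAction = iota // counts are identical
	UpdateInPlace                    // only reductions: shrink the admitted slice
	NewSlice                         // any increase: admit a new, larger slice
)

func (a SliceAction) String() string {
	switch a {
	case NoChange:
		return "NoChange"
	case UpdateInPlace:
		return "UpdateInPlace"
	default:
		return "NewSlice"
	}
}

// Slice is a hypothetical, simplified workload slice.
type Slice struct {
	Name           string
	ReplacementFor string // slice this one replaces once admitted (hypothetical)
	Counts         map[string]int32
}

// decide classifies a resize. Assumption: any growth needs a new slice,
// while pure shrinkage can be applied to the admitted slice.
func decide(admitted, desired map[string]int32) SliceAction {
	changed := false
	for name, want := range desired {
		have := admitted[name]
		if want > have {
			return NewSlice
		}
		if want != have {
			changed = true
		}
	}
	if changed {
		return UpdateInPlace
	}
	return NoChange
}

func copyCounts(in map[string]int32) map[string]int32 {
	out := make(map[string]int32, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

// resize applies the decision. The "<owner>-gen<N>" name is a hypothetical
// scheme deriving a unique slice name from the owner's generation.
func resize(owner string, generation int64, old Slice, desired map[string]int32) (Slice, SliceAction) {
	action := decide(old.Counts, desired)
	switch action {
	case UpdateInPlace:
		old.Counts = copyCounts(desired) // old is a copy; the caller's value is untouched
		return old, action
	case NewSlice:
		return Slice{
			Name:           fmt.Sprintf("%s-gen%d", owner, generation),
			ReplacementFor: old.Name,
			Counts:         copyCounts(desired),
		}, action
	default:
		return old, action
	}
}

func main() {
	first := Slice{Name: "job-a-gen1", Counts: map[string]int32{"workers": 2}}

	up, action := resize("job-a", 2, first, map[string]int32{"workers": 4})
	fmt.Println(action, up.Name, up.ReplacementFor) // NewSlice job-a-gen2 job-a-gen1

	down, action := resize("job-a", 3, up, map[string]int32{"workers": 3})
	fmt.Println(action, down.Name, down.Counts["workers"]) // UpdateInPlace job-a-gen2 3
}
```

Creating a separate slice for growth means the added capacity can go through admission on its own, while the previously admitted slice keeps its resources until it is replaced.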
📌 Summary
📎 Additional Notes
Which issue(s) this PR fixes:
Fixes #5528
Special notes for your reviewer:
Does this PR introduce a user-facing change?