scheduler: add 'none' type that bypasses PodGroup creation (#936) #975
jiaenren wants to merge 1 commit into
Adds BackendSchedulerType.NONE so backends can run without kai-scheduler. The new NoneK8sObjectFactory skips PodGroup CR creation, leaves schedulerName unset (so the cluster's default scheduler is used), and emits no kai/runai labels or scheduler-side resources (Queue/Topology). This sacrifices gang scheduling and priority/topology constraints in exchange for removing the kai-scheduler dependency.

Tests:
- 14 new unit tests in src/utils/job/tests/test_kb_objects.py covering enum parsing, factory dispatch, pod spec absence-of-fields, no PodGroup in output, no kai/runai labels, empty scheduler resources, and priority/topology unsupported.
- New run/tests/test_scheduler_none_kind.py e2e harness that, on a kind cluster with no kai-scheduler installed, asserts (a) the kai factory's PodGroup is rejected by the API server (proving the negative) and (b) the none factory's pod applies cleanly and reaches Running on default-scheduler.
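The dispatch described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the PR: the `KAI` enum member, the `build_objects` method name, the simplified pod dicts, and the `'kai-scheduler'` scheduler name are all assumptions made for the example. Only `BackendSchedulerType.NONE = 'none'`, `NoneK8sObjectFactory`, `KaiK8sObjectFactory`, `get_k8s_object_factory()`, `priority_supported()`, and `topology_supported()` come from the description.

```python
import enum
from typing import Any, Dict, List

Pod = Dict[str, Any]


class BackendSchedulerType(enum.Enum):
    KAI = 'kai'    # assumed member name and value
    NONE = 'none'  # value taken from the PR description


class KaiK8sObjectFactory:
    """Simplified stand-in for the kai-backed factory."""

    def priority_supported(self) -> bool:
        return True

    def topology_supported(self) -> bool:
        return True

    def build_objects(self, group_uuid: str, pods: List[Pod]) -> List[Dict[str, Any]]:
        # Hypothetical method: emits a PodGroup plus pods bound to kai.
        pod_group = {'kind': 'PodGroup', 'metadata': {'name': group_uuid}}
        for pod in pods:
            pod['spec']['schedulerName'] = 'kai-scheduler'  # assumed value
        return [pod_group, *pods]


class NoneK8sObjectFactory:
    """Simplified stand-in: pods only, no scheduler-specific fields."""

    def priority_supported(self) -> bool:
        return False

    def topology_supported(self) -> bool:
        return False

    def build_objects(self, group_uuid: str, pods: List[Pod]) -> List[Dict[str, Any]]:
        # No PodGroup and no schedulerName: the default scheduler places the pods.
        return list(pods)


_FACTORIES = {
    BackendSchedulerType.KAI: KaiK8sObjectFactory,
    BackendSchedulerType.NONE: NoneK8sObjectFactory,
}


def get_k8s_object_factory(scheduler_type: BackendSchedulerType):
    """Return a factory instance for the given scheduler type."""
    return _FACTORIES[scheduler_type]()
```

Because the `none` factory reports priority and topology as unsupported, callers that already check those two methods before submitting should reject such requests without further changes.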
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #975      +/-   ##
==========================================
+ Coverage   44.53%   44.59%   +0.06%
==========================================
  Files         218      218
  Lines       28537    28545       +8
  Branches     4260     4261       +1
==========================================
+ Hits        12708    12730      +22
+ Misses      15208    15194      -14
  Partials      621      621
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@run/tests/test_scheduler_none_kind.py`:
- Around line 1-2: The SPDX header in this new test file exceeds 100 characters
and needs the same pylint waiver used elsewhere; edit the file-level header (the
top triple-quoted string containing the SPDX line) to append the inline pylint
disable for line-too-long (i.e., add the comment "# pylint:
disable=line-too-long" alongside the SPDX copyright statement) so the long
copyright line is exempted from the linter.
In `@src/utils/job/kb_objects.py`:
- Around line 620-623: The override update_pod_k8s_resource currently
intentionally ignores all parameters but doesn't suppress pylint
unused-argument; modify the method to silence the linter by either renaming
unused parameters with a leading underscore (e.g., _pod, _group_uuid,
_pool_name, _priority) or add a pylint disable comment (e.g., # pylint:
disable=unused-argument) on the method signature, keeping the no-op pass body
and the method name unchanged so it still overrides the base implementation.
In `@src/utils/job/tests/test_kb_objects.py`:
- Around line 1-2: Add the inline pylint waiver to the existing SPDX copyright
header line (the long SPDX-FileCopyrightText header string) so it won't trigger
line-too-long checks; specifically append the comment "# pylint:
disable=line-too-long" to that SPDX header line in test_kb_objects.py (the
top-of-file SPDX header) without splitting the copyright text across multiple
lines.
📒 Files selected for processing (6)
- run/tests/BUILD
- run/tests/test_scheduler_none_kind.py
- src/utils/connectors/postgres.py
- src/utils/job/kb_objects.py
- src/utils/job/tests/BUILD
- src/utils/job/tests/test_kb_objects.py
💤 Files with no reviewable changes (1)
- src/utils/job/tests/BUILD
In `run/tests/test_scheduler_none_kind.py`, lines 1-2:
"""
SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Add the long-line pylint waiver to this SPDX header too.
The new file has the same over-100-character copyright line, so it needs the inline line-too-long disable to stay consistent with repo policy.
As per coding guidelines, "If copyright lines exceed 100 characters, add # pylint: disable=line-too-long comment instead of breaking into multiple lines".
In `src/utils/job/kb_objects.py`, lines 620-623:
def update_pod_k8s_resource(self, pod: Dict, group_uuid: str, pool_name: str,
                            priority: wf_priority.WorkflowPriority):
    # No-op: leave schedulerName unset so the default scheduler picks it up.
    pass
Suppress the intentional unused parameters in the no-op override.
This override ignores every argument, but unlike the similar base-class methods it does not disable unused-argument, so pylint will flag it.
Proposed fix
 def update_pod_k8s_resource(self, pod: Dict, group_uuid: str, pool_name: str,
                             priority: wf_priority.WorkflowPriority):
+    # pylint: disable=unused-argument
     # No-op: leave schedulerName unset so the default scheduler picks it up.
     pass
In `src/utils/job/tests/test_kb_objects.py`, lines 1-2:
 """
-SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Add the long-line pylint waiver to the SPDX header.
This copyright line is over the repo’s 100-character limit, so it should carry the same inline waiver used in the other Python files touched by this PR.
As per coding guidelines, "If copyright lines exceed 100 characters, add # pylint: disable=line-too-long comment instead of breaking into multiple lines".
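The guideline quoted above lends itself to a quick local check before pushing. The sketch below is only an illustration under assumptions: the 100-character limit and the waiver text are taken from the review comment, while the function name `unwaived_long_lines` and the idea of scanning source text directly are hypothetical and not part of the repository's tooling.

```python
from typing import List, Tuple

MAX_LINE_LENGTH = 100  # limit quoted in the review comment
WAIVER = '# pylint: disable=line-too-long'  # waiver text quoted in the review comment


def unwaived_long_lines(source: str, limit: int = MAX_LINE_LENGTH) -> List[Tuple[int, str]]:
    """Return (line_number, text) for lines over the limit that lack the waiver."""
    offenders = []
    for number, line in enumerate(source.splitlines(), start=1):
        # A long line is acceptable only if it carries the inline waiver.
        if len(line) > limit and WAIVER not in line:
            offenders.append((number, line))
    return offenders
```

Running it over a file's text and failing when the returned list is non-empty would flag a header that still needs the waiver; pylint itself remains the authority on what actually passes.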
Summary
Resolves #936 by adopting Option C (Add 'none'): a new `BackendSchedulerType.NONE` that bypasses PodGroup CR creation entirely and lets the cluster's default scheduler place pods. Removes the hard dependency on kai-scheduler for backends that don't need gang scheduling.
- `NoneK8sObjectFactory` in `src/utils/job/kb_objects.py`: returns pods only (no PodGroup), leaves `schedulerName` unset, emits no kai/runai labels, returns empty scheduler-resource specs.
- `BackendSchedulerType.NONE = 'none'` in `src/utils/connectors/postgres.py`.
- `get_k8s_object_factory()` dispatches `NONE` → `NoneK8sObjectFactory`.
- `priority_supported()` and `topology_supported()` correctly return `False` for `NONE`, so the existing validation in `Pool.validate` and `Workflow.submit` already produces clear errors when users try to use those features against a `none` backend (no extra plumbing needed).
Trade-offs
Test plan
- 14 new unit tests in `src/utils/job/tests/test_kb_objects.py` covering enum parsing, factory dispatch, absence of `schedulerName`, absence of `kai.scheduler/*` and `runai/*` labels, no PodGroup in output, no `pod-group-name` annotation, empty scheduler/cleanup specs, `priority_supported()`/`topology_supported()` returning `False`, and topology-keys-ignored.
- Removed `tags = ["manual"]` on `test_kb_objects` so the suite runs in CI; fixed 2 pre-existing tests using deprecated `assertEquals`.
- `bazel test //src/utils/job/...`: 9/9 passing.
- `bazel test //src/utils/connectors/...`: passing.
- `bazel test //src/service/core/config:config-pylint //src/service/core/config/tests:test_config_history`: passing.
- e2e harness on a kind cluster with no kai-scheduler installed (`run/tests/test_scheduler_none_kind.py`):
  - `KaiK8sObjectFactory`'s PodGroup is rejected by the API server with `ensure CRDs are installed first`, which proves the cluster genuinely lacks kai and the test can fail.
  - `NoneK8sObjectFactory`'s pod applies cleanly, reaches `Running`, and is scheduled by `default-scheduler` on the kind node.

The e2e harness is wired as `osmo_py_binary //run/tests:test_scheduler_none_kind` and runs against a user-supplied `KIND_CONTEXT` env var, so it can be invoked locally or hooked into a kind-only CI lane later.
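The unit-test checks listed in the test plan (no `schedulerName`, no `kai.scheduler/*` or `runai/*` labels, no `pod-group-name` annotation) could be expressed as one small helper over a pod manifest. The following is a hedged sketch, not the PR's test code: the function name `scheduler_artifacts`, the dict-shaped manifest, and matching the annotation by key suffix are assumptions made for illustration.

```python
from typing import Any, Dict, List

# Label prefixes named in the test plan.
SCHEDULER_LABEL_PREFIXES = ('kai.scheduler/', 'runai/')


def scheduler_artifacts(pod: Dict[str, Any]) -> List[str]:
    """List scheduler-specific fields found on a pod manifest (empty if clean)."""
    findings = []
    if 'schedulerName' in pod.get('spec', {}):
        findings.append('spec.schedulerName')
    metadata = pod.get('metadata', {})
    for key in metadata.get('labels') or {}:
        if key.startswith(SCHEDULER_LABEL_PREFIXES):
            findings.append(f'label {key}')
    for key in metadata.get('annotations') or {}:
        # Assumption: the annotation key ends with 'pod-group-name'.
        if key.endswith('pod-group-name'):
            findings.append(f'annotation {key}')
    return findings
```

A test for the `none` factory would then assert that the helper returns an empty list for every pod the factory emits, which keeps the individual absence checks in one place.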