fix: jobs tests and repo config yaml by mikeknep · Pull Request #58 · NVIDIA-NeMo/nemo-platform

mikeknep · 2026-05-26T19:02:37Z

What was happening

If you set NMP_CONFIG_FILE_PATH="packages/nmp_platform/config/local.yaml" and have docker running when you spin up nemo services run, the Jobs service registers the docker executor but not the subprocess executor, and all jobs you attempt to create try to go through the docker backed only to fail with an Image Not Found error (or, if you have a stale version of nmp-cpu-tasks laying around, some other error since that image hasn't been rebuilt).

What's here

This PR registers the subprocess executor on that YAML config so that its executors match the bundled YAML config that gets shipped in the wheel and that serves as the default when the env var is unset: packages/nmp_platform_runner/src/nmp/platform_runner/config/local.yaml

Longer term

It'd be nice to consolidate these two YAML files into a single source of truth, but this is a larger effort.

Unit test cleanup

Fixes some tests failing on main by:

Deduping a bunch of constants appearing in two places
Ensuring the test executor sets an entrypoint and command (required by the subprocess executor)

Summary by CodeRabbit

Refactor
- Job creation now converts CPU container execution steps into subprocess-compatible steps prior to storage.
- Default executor selection temporarily uses a subprocess-only default.
- Test helpers and fixtures reorganized; test container specs now include explicit entrypoint and command.
Tests
- Updated tests and assertions to match the translation behavior and subprocess-only defaults, including GPU/Docker scenarios.

coderabbitai · 2026-05-26T19:12:45Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Forces subprocess-only executor defaults, changes endpoint translation to read subprocess profiles from merged profiles, consolidates test fixtures and expands container specs, and updates tests to assert CPU→subprocess translation results.

Changes

CPU-to-Subprocess Execution Translation

Layer / File(s)	Summary
Backend executor defaults to subprocess-only `services/core/jobs/src/nmp/core/jobs/controllers/backends/config.py`	`get_default_executor_profiles_for_runtime()` now forces subprocess-only defaults across runtimes instead of selecting Docker/Kubernetes/Volcano backends.
Endpoint CPU→subprocess translation wiring `services/core/jobs/src/nmp/core/jobs/api/v2/jobs/endpoints.py`	`configured_subprocess_translation_profiles()` derives subprocess names from merged `profiles` list instead of `config.executors`; module now imports `profiles` from controllers config.
Test fixture consolidation and container spec updates `services/core/jobs/src/nmp/core/jobs/app/test_helpers.py`, `services/core/jobs/tests/conftest.py`	`TestConstants` moved to shared `test_helpers`, `WORKSPACE` added, `TEST_EXECUTOR` container spec expanded with explicit `entrypoint` and `command`, and conftest imports updated.
Config tests updated for subprocess-only defaults `services/core/jobs/tests/test_config.py`	Default profile tests rewritten to expect subprocess-only for Docker and Kubernetes; `test_merged_profiles` now builds representative `default_executors` explicitly (including subprocess).
Jobs API tests expect translated subprocess executors `services/core/jobs/tests/test_jobs_api.py`	Adds `expected_translated_executor_dump()` helper; SDK job creation tests include container entrypoint/command; lifecycle assertions now expect translated `SubprocessExecutionProvider` dumps; GPU fail-fast test patches docker profile registration.
Integration & evaluator test container spec updates `services/core/jobs/tests/integration/test_jobs_auth_propagation.py`, `services/core/jobs/tests/integration/test_jobs_secrets_access.py`, `services/evaluator/tests/integration/tasks/conftest.py`, `services/evaluator/tests/integration/tasks/test_results_handler.py`	Integration and evaluator tests/helpers expand `executor.container` objects to include `entrypoint` and `command` fields so CPU→subprocess translation produces concrete subprocess commands.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 52.94% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check	❓ Inconclusive	Title is vague and generic. 'fix: jobs tests and repo config yaml' uses non-descriptive terms that don't convey the main change: switching to subprocess-only backend registration and fixing test fixtures.	Revise title to specify the primary change, e.g., 'fix: register subprocess backend only for jobs service' or 'fix: switch jobs to subprocess-only backend'.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix-job-executors/mknepper

Warning

Review ran into problems

🔥 Problems

Stopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a @coderabbit review after the pipeline has finished.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@services/core/jobs/tests/test_config.py`:
- Line 375: The variable default_executors is annotated with a bare list type;
change its annotation to a concrete element type (e.g. list[ExecutorConfig] or
list[dict] depending on the actual items) so the declaration becomes something
like default_executors: list[ExecutorConfig] = [...] (use the actual element
type used in the list), updating any necessary imports for the referenced type.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c0ada19c-5f89-4f68-888d-3c7f6f5f9d06

📥 Commits

Reviewing files that changed from the base of the PR and between cea5857 and bca45a5.

📒 Files selected for processing (6)

services/core/jobs/src/nmp/core/jobs/api/v2/jobs/endpoints.py
services/core/jobs/src/nmp/core/jobs/app/test_helpers.py
services/core/jobs/src/nmp/core/jobs/controllers/backends/config.py
services/core/jobs/tests/conftest.py
services/core/jobs/tests/test_config.py
services/core/jobs/tests/test_jobs_api.py

github-actions · 2026-05-26T19:15:27Z

Suite	Lines Covered	Line Rate	Branch Rate
Unit Tests	18246/24195	75.4%	61.9%
Integration Tests	11665/22977	50.8%	25.9%

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@services/core/jobs/tests/integration/test_jobs_secrets_access.py`:
- Around line 55-56: The test sets "entrypoint" as a string but the job endpoint
code expects executor.container.entrypoint to be a list (it builds command via
[*executor.container.entrypoint, *executor.container.command] in
jobs/endpoints.py), so update the test in test_jobs_secrets_access.py to use a
list form for entrypoint (e.g., ["entrypoint"]) to match list[str] and prevent
unintended unpacking; ensure any other fixtures in that test that provide
container.entrypoint follow the same list format.

In `@services/evaluator/tests/integration/tasks/test_results_handler.py`:
- Around line 80-81: The test sets "entrypoint" as a string but
ContainerSpec.entrypoint expects list[str], causing Pydantic validation to fail;
update the test data in test_results_handler.py so the entrypoint value is a
list (e.g., ["entrypoint"]) wherever the ContainerSpec/Job creation payload is
assembled (look for the dict literal with keys "entrypoint" and "command" in the
test), ensuring the test constructs a list[str] to match
ContainerSpec.entrypoint.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 00279bd4-5421-4d7c-9353-5707e30fb74a

📥 Commits

Reviewing files that changed from the base of the PR and between bca45a5 and d384049.

📒 Files selected for processing (4)

services/core/jobs/tests/integration/test_jobs_auth_propagation.py
services/core/jobs/tests/integration/test_jobs_secrets_access.py
services/evaluator/tests/integration/tasks/conftest.py
services/evaluator/tests/integration/tasks/test_results_handler.py

coderabbitai

♻️ Duplicate comments (1)

services/core/jobs/tests/test_config.py (1)
375-375: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Missing concrete type hint.

Line 375 still has bare list assignment. Annotate as list[KubernetesJobExecutionProfile | VolcanoJobExecutionProfile | SubprocessJobExecutionProfile] or import/use the common base type if one exists.
Proposed fix
-    default_executors = [
+    default_executors: list[KubernetesJobExecutionProfile | VolcanoJobExecutionProfile | SubprocessJobExecutionProfile] = [
As per coding guidelines, always prefer concrete type hints over string based ones.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/core/jobs/tests/test_config.py` at line 375, The variable
default_executors is currently untyped; add a concrete type annotation such as
list[KubernetesJobExecutionProfile | VolcanoJobExecutionProfile |
SubprocessJobExecutionProfile] (or the shared base type if one exists) to the
default_executors declaration, and add any necessary imports for
KubernetesJobExecutionProfile, VolcanoJobExecutionProfile,
SubprocessJobExecutionProfile (or the base type) so the annotation resolves.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@services/core/jobs/tests/test_config.py`:
- Line 375: The variable default_executors is currently untyped; add a concrete
type annotation such as list[KubernetesJobExecutionProfile |
VolcanoJobExecutionProfile | SubprocessJobExecutionProfile] (or the shared base
type if one exists) to the default_executors declaration, and add any necessary
imports for KubernetesJobExecutionProfile, VolcanoJobExecutionProfile,
SubprocessJobExecutionProfile (or the base type) so the annotation resolves.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: cfe22434-2559-43fa-85b7-5d32c06c9c7d

📥 Commits

Reviewing files that changed from the base of the PR and between d384049 and cc02ab6.

📒 Files selected for processing (4)

services/core/jobs/tests/integration/test_jobs_auth_propagation.py
services/core/jobs/tests/test_config.py
services/evaluator/tests/integration/tasks/conftest.py
services/evaluator/tests/integration/tasks/test_results_handler.py

Signed-off-by: Mike Knepper <mknepper@nvidia.com>

* Only support subprocess backend for now Signed-off-by: Mike Knepper <mknepper@nvidia.com> * Fix TEST_EXECUTOR Signed-off-by: Mike Knepper <mknepper@nvidia.com> * Dedupe test helpers Signed-off-by: Mike Knepper <mknepper@nvidia.com> * Fix remaining unit tests Signed-off-by: Mike Knepper <mknepper@nvidia.com> * More executor container fixes Signed-off-by: Mike Knepper <mknepper@nvidia.com> * entrypoint is a seq Signed-off-by: Mike Knepper <mknepper@nvidia.com> * Drop superfluous type hint Signed-off-by: Mike Knepper <mknepper@nvidia.com> * And one more Signed-off-by: Mike Knepper <mknepper@nvidia.com> * Ok seriously this is the last one Signed-off-by: Mike Knepper <mknepper@nvidia.com> * Pivot per Sam's changes Signed-off-by: Mike Knepper <mknepper@nvidia.com> * Revert some more, closer to main Signed-off-by: Mike Knepper <mknepper@nvidia.com> --------- Signed-off-by: Mike Knepper <mknepper@nvidia.com> Signed-off-by: Alex Ray <alray@nvidia.com>

mikeknep requested review from a team as code owners May 26, 2026 19:02

mikeknep requested a review from matthewgrossman May 26, 2026 19:04

coderabbitai Bot reviewed May 26, 2026

View reviewed changes

Comment thread services/core/jobs/tests/test_config.py Outdated

mikeknep commented May 26, 2026

View reviewed changes

Comment thread services/evaluator/tests/integration/tasks/conftest.py

coderabbitai Bot reviewed May 26, 2026

View reviewed changes

Comment thread services/core/jobs/tests/integration/test_jobs_secrets_access.py Outdated

Comment thread services/evaluator/tests/integration/tasks/test_results_handler.py Outdated

coderabbitai Bot reviewed May 26, 2026

View reviewed changes

mikeknep added 11 commits May 27, 2026 09:13

Only support subprocess backend for now

bc091ec

Signed-off-by: Mike Knepper <mknepper@nvidia.com>

Fix TEST_EXECUTOR

b5f2d07

Signed-off-by: Mike Knepper <mknepper@nvidia.com>

Dedupe test helpers

7063607

Signed-off-by: Mike Knepper <mknepper@nvidia.com>

Fix remaining unit tests

e0ec325

Signed-off-by: Mike Knepper <mknepper@nvidia.com>

More executor container fixes

83840e3

Signed-off-by: Mike Knepper <mknepper@nvidia.com>

entrypoint is a seq

7d22019

Signed-off-by: Mike Knepper <mknepper@nvidia.com>

Drop superfluous type hint

cccdb90

Signed-off-by: Mike Knepper <mknepper@nvidia.com>

And one more

840f790

Signed-off-by: Mike Knepper <mknepper@nvidia.com>

Ok seriously this is the last one

c0b190b

Signed-off-by: Mike Knepper <mknepper@nvidia.com>

Pivot per Sam's changes

dea65e1

Signed-off-by: Mike Knepper <mknepper@nvidia.com>

Revert some more, closer to main

8e656bd

Signed-off-by: Mike Knepper <mknepper@nvidia.com>

mikeknep force-pushed the fix-job-executors/mknepper branch from 29002a3 to 8e656bd Compare May 27, 2026 14:13

mikeknep changed the title ~~fix: job executors~~ fix: jobs tests and repo config yaml May 27, 2026

benmccown approved these changes May 27, 2026

View reviewed changes

mikeknep added this pull request to the merge queue May 27, 2026

Merged via the queue into main with commit f3d87f7 May 27, 2026
13 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: jobs tests and repo config yaml#58

fix: jobs tests and repo config yaml#58
mikeknep merged 11 commits into
mainfrom
fix-job-executors/mknepper

mikeknep commented May 26, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented May 26, 2026 •

edited

Loading

Reviews paused

Walkthrough

Changes

❌ Failed checks (1 warning, 1 inconclusive)

Review ran into problems

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

github-actions Bot commented May 26, 2026 •

edited

Loading

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mikeknep commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What was happening

What's here

Longer term

Unit test cleanup

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

❌ Failed checks (1 warning, 1 inconclusive)

Review ran into problems

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mikeknep commented May 26, 2026 •

edited

Loading

coderabbitai Bot commented May 26, 2026 •

edited

Loading

github-actions Bot commented May 26, 2026 •

edited

Loading