Skip to content

fix: jobs tests and repo config yaml#58

Merged
mikeknep merged 11 commits into
mainfrom
fix-job-executors/mknepper
May 27, 2026
Merged

fix: jobs tests and repo config yaml#58
mikeknep merged 11 commits into
mainfrom
fix-job-executors/mknepper

Conversation

@mikeknep
Copy link
Copy Markdown
Contributor

@mikeknep mikeknep commented May 26, 2026

What was happening

If you set NMP_CONFIG_FILE_PATH="packages/nmp_platform/config/local.yaml" and have docker running when you spin up nemo services run, the Jobs service registers the docker executor but not the subprocess executor, and all jobs you attempt to create try to go through the docker backed only to fail with an Image Not Found error (or, if you have a stale version of nmp-cpu-tasks laying around, some other error since that image hasn't been rebuilt).

What's here

This PR registers the subprocess executor on that YAML config so that its executors match the bundled YAML config that gets shipped in the wheel and that serves as the default when the env var is unset: packages/nmp_platform_runner/src/nmp/platform_runner/config/local.yaml

Longer term

It'd be nice to consolidate these two YAML files into a single source of truth, but this is a larger effort.

Unit test cleanup

Fixes some tests failing on main by:

  • Deduping a bunch of constants appearing in two places
  • Ensuring the test executor sets an entrypoint and command (required by the subprocess executor)

Summary by CodeRabbit

  • Refactor

    • Job creation now converts CPU container execution steps into subprocess-compatible steps prior to storage.
    • Default executor selection temporarily uses a subprocess-only default.
    • Test helpers and fixtures reorganized; test container specs now include explicit entrypoint and command.
  • Tests

    • Updated tests and assertions to match the translation behavior and subprocess-only defaults, including GPU/Docker scenarios.

Review Change Stack

@mikeknep mikeknep requested review from a team as code owners May 26, 2026 19:02
@mikeknep mikeknep requested a review from matthewgrossman May 26, 2026 19:04
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 26, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Forces subprocess-only executor defaults, changes endpoint translation to read subprocess profiles from merged profiles, consolidates test fixtures and expands container specs, and updates tests to assert CPU→subprocess translation results.

Changes

CPU-to-Subprocess Execution Translation

Layer / File(s) Summary
Backend executor defaults to subprocess-only
services/core/jobs/src/nmp/core/jobs/controllers/backends/config.py
get_default_executor_profiles_for_runtime() now forces subprocess-only defaults across runtimes instead of selecting Docker/Kubernetes/Volcano backends.
Endpoint CPU→subprocess translation wiring
services/core/jobs/src/nmp/core/jobs/api/v2/jobs/endpoints.py
configured_subprocess_translation_profiles() derives subprocess names from merged profiles list instead of config.executors; module now imports profiles from controllers config.
Test fixture consolidation and container spec updates
services/core/jobs/src/nmp/core/jobs/app/test_helpers.py, services/core/jobs/tests/conftest.py
TestConstants moved to shared test_helpers, WORKSPACE added, TEST_EXECUTOR container spec expanded with explicit entrypoint and command, and conftest imports updated.
Config tests updated for subprocess-only defaults
services/core/jobs/tests/test_config.py
Default profile tests rewritten to expect subprocess-only for Docker and Kubernetes; test_merged_profiles now builds representative default_executors explicitly (including subprocess).
Jobs API tests expect translated subprocess executors
services/core/jobs/tests/test_jobs_api.py
Adds expected_translated_executor_dump() helper; SDK job creation tests include container entrypoint/command; lifecycle assertions now expect translated SubprocessExecutionProvider dumps; GPU fail-fast test patches docker profile registration.
Integration & evaluator test container spec updates
services/core/jobs/tests/integration/test_jobs_auth_propagation.py, services/core/jobs/tests/integration/test_jobs_secrets_access.py, services/evaluator/tests/integration/tasks/conftest.py, services/evaluator/tests/integration/tasks/test_results_handler.py
Integration and evaluator tests/helpers expand executor.container objects to include entrypoint and command fields so CPU→subprocess translation produces concrete subprocess commands.
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 52.94% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive Title is vague and generic. 'fix: jobs tests and repo config yaml' uses non-descriptive terms that don't convey the main change: switching to subprocess-only backend registration and fixing test fixtures. Revise title to specify the primary change, e.g., 'fix: register subprocess backend only for jobs service' or 'fix: switch jobs to subprocess-only backend'.
✅ Passed checks (3 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix-job-executors/mknepper

Warning

Review ran into problems

🔥 Problems

Stopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a @coderabbit review after the pipeline has finished.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@services/core/jobs/tests/test_config.py`:
- Line 375: The variable default_executors is annotated with a bare list type;
change its annotation to a concrete element type (e.g. list[ExecutorConfig] or
list[dict] depending on the actual items) so the declaration becomes something
like default_executors: list[ExecutorConfig] = [...] (use the actual element
type used in the list), updating any necessary imports for the referenced type.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c0ada19c-5f89-4f68-888d-3c7f6f5f9d06

📥 Commits

Reviewing files that changed from the base of the PR and between cea5857 and bca45a5.

📒 Files selected for processing (6)
  • services/core/jobs/src/nmp/core/jobs/api/v2/jobs/endpoints.py
  • services/core/jobs/src/nmp/core/jobs/app/test_helpers.py
  • services/core/jobs/src/nmp/core/jobs/controllers/backends/config.py
  • services/core/jobs/tests/conftest.py
  • services/core/jobs/tests/test_config.py
  • services/core/jobs/tests/test_jobs_api.py

Comment thread services/core/jobs/tests/test_config.py Outdated
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 26, 2026

Suite Lines Covered Line Rate Branch Rate
Unit Tests 18246/24195 75.4% 61.9%
Integration Tests 11665/22977 50.8% 25.9%

Comment thread services/evaluator/tests/integration/tasks/conftest.py
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@services/core/jobs/tests/integration/test_jobs_secrets_access.py`:
- Around line 55-56: The test sets "entrypoint" as a string but the job endpoint
code expects executor.container.entrypoint to be a list (it builds command via
[*executor.container.entrypoint, *executor.container.command] in
jobs/endpoints.py), so update the test in test_jobs_secrets_access.py to use a
list form for entrypoint (e.g., ["entrypoint"]) to match list[str] and prevent
unintended unpacking; ensure any other fixtures in that test that provide
container.entrypoint follow the same list format.

In `@services/evaluator/tests/integration/tasks/test_results_handler.py`:
- Around line 80-81: The test sets "entrypoint" as a string but
ContainerSpec.entrypoint expects list[str], causing Pydantic validation to fail;
update the test data in test_results_handler.py so the entrypoint value is a
list (e.g., ["entrypoint"]) wherever the ContainerSpec/Job creation payload is
assembled (look for the dict literal with keys "entrypoint" and "command" in the
test), ensuring the test constructs a list[str] to match
ContainerSpec.entrypoint.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 00279bd4-5421-4d7c-9353-5707e30fb74a

📥 Commits

Reviewing files that changed from the base of the PR and between bca45a5 and d384049.

📒 Files selected for processing (4)
  • services/core/jobs/tests/integration/test_jobs_auth_propagation.py
  • services/core/jobs/tests/integration/test_jobs_secrets_access.py
  • services/evaluator/tests/integration/tasks/conftest.py
  • services/evaluator/tests/integration/tasks/test_results_handler.py

Comment thread services/core/jobs/tests/integration/test_jobs_secrets_access.py Outdated
Comment thread services/evaluator/tests/integration/tasks/test_results_handler.py Outdated
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
services/core/jobs/tests/test_config.py (1)

375-375: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Missing concrete type hint.

Line 375 still has bare list assignment. Annotate as list[KubernetesJobExecutionProfile | VolcanoJobExecutionProfile | SubprocessJobExecutionProfile] or import/use the common base type if one exists.

Proposed fix
-    default_executors = [
+    default_executors: list[KubernetesJobExecutionProfile | VolcanoJobExecutionProfile | SubprocessJobExecutionProfile] = [

As per coding guidelines, always prefer concrete type hints over string based ones.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/core/jobs/tests/test_config.py` at line 375, The variable
default_executors is currently untyped; add a concrete type annotation such as
list[KubernetesJobExecutionProfile | VolcanoJobExecutionProfile |
SubprocessJobExecutionProfile] (or the shared base type if one exists) to the
default_executors declaration, and add any necessary imports for
KubernetesJobExecutionProfile, VolcanoJobExecutionProfile,
SubprocessJobExecutionProfile (or the base type) so the annotation resolves.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@services/core/jobs/tests/test_config.py`:
- Line 375: The variable default_executors is currently untyped; add a concrete
type annotation such as list[KubernetesJobExecutionProfile |
VolcanoJobExecutionProfile | SubprocessJobExecutionProfile] (or the shared base
type if one exists) to the default_executors declaration, and add any necessary
imports for KubernetesJobExecutionProfile, VolcanoJobExecutionProfile,
SubprocessJobExecutionProfile (or the base type) so the annotation resolves.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: cfe22434-2559-43fa-85b7-5d32c06c9c7d

📥 Commits

Reviewing files that changed from the base of the PR and between d384049 and cc02ab6.

📒 Files selected for processing (4)
  • services/core/jobs/tests/integration/test_jobs_auth_propagation.py
  • services/core/jobs/tests/test_config.py
  • services/evaluator/tests/integration/tasks/conftest.py
  • services/evaluator/tests/integration/tasks/test_results_handler.py

mikeknep added 11 commits May 27, 2026 09:13
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Mike Knepper <mknepper@nvidia.com>
@mikeknep mikeknep force-pushed the fix-job-executors/mknepper branch from 29002a3 to 8e656bd Compare May 27, 2026 14:13
@mikeknep mikeknep changed the title fix: job executors fix: jobs tests and repo config yaml May 27, 2026
@mikeknep mikeknep added this pull request to the merge queue May 27, 2026
Merged via the queue into main with commit f3d87f7 May 27, 2026
13 checks passed
aray12 pushed a commit that referenced this pull request May 28, 2026
* Only support subprocess backend for now

Signed-off-by: Mike Knepper <mknepper@nvidia.com>

* Fix TEST_EXECUTOR

Signed-off-by: Mike Knepper <mknepper@nvidia.com>

* Dedupe test helpers

Signed-off-by: Mike Knepper <mknepper@nvidia.com>

* Fix remaining unit tests

Signed-off-by: Mike Knepper <mknepper@nvidia.com>

* More executor container fixes

Signed-off-by: Mike Knepper <mknepper@nvidia.com>

* entrypoint is a seq

Signed-off-by: Mike Knepper <mknepper@nvidia.com>

* Drop superfluous type hint

Signed-off-by: Mike Knepper <mknepper@nvidia.com>

* And one more

Signed-off-by: Mike Knepper <mknepper@nvidia.com>

* Ok seriously this is the last one

Signed-off-by: Mike Knepper <mknepper@nvidia.com>

* Pivot per Sam's changes

Signed-off-by: Mike Knepper <mknepper@nvidia.com>

* Revert some more, closer to main

Signed-off-by: Mike Knepper <mknepper@nvidia.com>

---------

Signed-off-by: Mike Knepper <mknepper@nvidia.com>
Signed-off-by: Alex Ray <alray@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants