Tune single-GPU embed resource heuristic by jioffe502 · Pull Request #2089 · NVIDIA/NeMo-Retriever

jioffe502 · 2026-05-21T19:07:39Z

NOTE: general perf bump, not too worried about this functionality for 26.05 release. PR against main

Description

Tune the default local embedding resource plan for single-GPU batch ingest.
Use one embedding actor with a 0.2 GPU Ray reservation on single-GPU systems.
Keep multi-GPU defaults unchanged.
Preserve explicit --embed-workers and --embed-gpus-per-actor overrides.

Motivation

On a single H100 NVL, the previous default local embedding plan used two embedding actors at 0.5 GPU each. In retriever ingest --run-mode batch, that shape can over-reserve the single GPU for embedding and reduce scheduler room for the rest of the local pipeline.

Empirically, one local vLLM embedding actor with a 0.2 GPU reservation kept the GPU busier and improved end-to-end ingest time while producing the same LanceDB output row count.
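The reservation arithmetic behind this motivation can be sketched as follows. This is a minimal illustration, not code from the PR: `reserved_gpu` and `headroom` are hypothetical helper names, and the "prior" shape is taken from the description above (two actors at 0.5 GPU each).

```python
def reserved_gpu(actors: int, gpus_per_actor: float) -> float:
    """Total GPU fraction reserved by the embedding stage (hypothetical helper)."""
    return actors * gpus_per_actor


def headroom(total_gpus: float, actors: int, gpus_per_actor: float) -> float:
    """GPU fraction left for the scheduler to place other GPU stages."""
    return total_gpus - reserved_gpu(actors, gpus_per_actor)


# Prior default as described in this PR: 2 actors x 0.5 GPU on one GPU.
prior_headroom = headroom(1.0, actors=2, gpus_per_actor=0.5)
# Tuned default from this PR: 1 actor x 0.2 GPU on one GPU.
tuned_headroom = headroom(1.0, actors=1, gpus_per_actor=0.2)

print(f"prior headroom: {prior_headroom:.1f} GPU")
print(f"tuned headroom: {tuned_headroom:.1f} GPU")
```

Under these numbers the prior plan reserves the whole GPU for embedding, while the tuned plan leaves most of it schedulable. Note that a fractional GPU reservation in Ray is a scheduling quantity and does not by itself cap the memory or compute an actor uses.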

Measured on the same machine:

| Dataset | Prior default | Tuned setting | Result |
|---|---|---|---|
| jp20 | 168.4s | 150.8s | ~10.4% faster |
| bo767 | 3631.8s | 2069.7s | ~43.0% faster |

For bo767, LanceDB row count remained 79978, with dropped_no_embedding=0 and the same dropped_bad_length=64 pattern as the prior run. Average GPU utilization increased from ~38.8% to ~65.6%.
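For readers who want to re-derive the "faster" column, here is a small sketch of the calculation from the rounded timings in the table. The helper name `percent_faster` is an assumption for illustration. The rounded inputs give values close to the reported figures, and small differences are expected if the reported percentages came from unrounded timings.

```python
def percent_faster(before_s: float, after_s: float) -> float:
    """Reduction in wall-clock time, as a percentage of the earlier run."""
    return (before_s - after_s) / before_s * 100.0


# (prior default, tuned setting) in seconds, copied from the table above.
timings = {
    "jp20": (168.4, 150.8),
    "bo767": (3631.8, 2069.7),
}

for name, (before, after) in timings.items():
    reduction = percent_faster(before, after)
    speedup = before / after
    print(f"{name}: {reduction:.2f}% less time, {speedup:.2f}x throughput")
```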

Implementation

This keeps the existing heuristic model intact:

the change is isolated to resolve_requested_plan();
it only applies when available_gpu_count == 1;
multi-GPU defaults continue to use the existing per-GPU scaling;
explicit embed worker/GPU overrides still win.
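The shape of the rule in the list above can be sketched in isolation. This is a simplified stand-in, not the actual `resolve_requested_plan()`: the function `resolve_embed_plan`, the `EmbedPlan` type, and the `ASSUMED_*` multi-GPU defaults are hypothetical and only illustrate "single-GPU special case, overrides win, multi-GPU unchanged". Only the two `EMBED_SINGLE_GPU_*` constants mirror values stated in this PR.

```python
from dataclasses import dataclass
from typing import Optional

# Values stated in this PR.
EMBED_SINGLE_GPU_ACTORS = 1
EMBED_SINGLE_GPU_GPUS_PER_ACTOR = 0.2

# Assumptions for illustration only; the real per-GPU scaling may differ.
ASSUMED_ACTORS_PER_GPU = 2
ASSUMED_GPUS_PER_ACTOR = 0.5


@dataclass(frozen=True)
class EmbedPlan:
    actors: int
    gpus_per_actor: float


def resolve_embed_plan(
    available_gpu_count: int,
    override_actors: Optional[int] = None,
    override_gpus_per_actor: Optional[float] = None,
) -> EmbedPlan:
    # Step 1: resolve with the generic (multi-GPU style) defaults.
    actors = (
        override_actors
        if override_actors is not None
        else ASSUMED_ACTORS_PER_GPU * available_gpu_count
    )
    gpus = (
        override_gpus_per_actor
        if override_gpus_per_actor is not None
        else ASSUMED_GPUS_PER_ACTOR
    )
    # Step 2: on exactly one GPU, replace only the values the caller left unset.
    if available_gpu_count == 1:
        if override_actors is None:
            actors = EMBED_SINGLE_GPU_ACTORS
        if override_gpus_per_actor is None:
            gpus = EMBED_SINGLE_GPU_GPUS_PER_ACTOR
    return EmbedPlan(actors=actors, gpus_per_actor=gpus)


print(resolve_embed_plan(1))                     # single-GPU defaults
print(resolve_embed_plan(1, override_actors=3))  # explicit override wins
print(resolve_embed_plan(4))                     # multi-GPU path untouched
```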

Validation

uv run --frozen --extra local --extra dev pytest tests/test_resource_heuristics.py

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.
If adjusting docker-compose.yaml environment variables, have you ensured those are mirrored in the Helm values.yaml file?

greptile-apps · 2026-05-21T19:10:50Z

Greptile Summary

This PR tunes the default Ray resource plan for single-GPU batch ingest: on a system with exactly one GPU, the embedder is now allocated 1 actor at 0.2 GPU rather than the prior scaled default (1 actor at 0.5 GPU), leaving more headroom for OCR and page-element actors. Multi-GPU paths and all explicit overrides are unchanged.

Adds EMBED_SINGLE_GPU_ACTORS = 1 and EMBED_SINGLE_GPU_GPUS_PER_ACTOR = 0.2 module-level constants, then applies them in resolve_requested_plan only when available_gpu_count == 1 and no explicit override is present.
Test file updates the 1-GPU default assertions, plugs the previously missing embed_min_actors assertion, and adds a new test confirming explicit overrides take precedence over the heuristic.

Confidence Score: 5/5

Safe to merge — the change is narrowly scoped to a single post-resolution override block that only fires on single-GPU systems, multi-GPU paths are unaffected, and the new tests verify both the heuristic defaults and the override-wins path.

The heuristic change is well-contained: it touches one function, fires only when available_gpu_count == 1, and all explicit override parameters still take precedence. The updated and new tests cover the primary cases. No new correctness defects were found in the changed code paths beyond what has already been discussed in existing review threads.

No files require special attention; both changed files are small and self-contained.

Important Files Changed

| Filename | Overview |
|---|---|
| nemo_retriever/src/nemo_retriever/utils/ray_resource_hueristics.py | Adds two single-GPU constants and a post-resolution override block in `resolve_requested_plan` that applies them when `available_gpu_count == 1` and no explicit override is set; multi-GPU paths are untouched. |
| nemo_retriever/tests/test_resource_heuristics.py | Updates the 1-GPU default assertion to match the new constants, adds `embed_min_actors` coverage that was previously absent, and adds a new test verifying that explicit overrides win over the single-GPU heuristic. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[resolve_requested_plan called] --> B[Compute available_gpu_count]
    B --> C{available_gpu_count == 0\nand not allow_no_gpu?}
    C -- Yes --> D[Raise ValueError]
    C -- No --> E[_resolve_int/float_actors\nfor all actors using\nmulti-GPU defaults]
    E --> F{available_gpu_count == 1?}
    F -- No --> G[Keep multi-GPU resolved values]
    F -- Yes --> H{override_embed_initial_actors\nis None?}
    H -- Yes --> I[embed_initial_actors =\nEMBED_SINGLE_GPU_ACTORS=1]
    H -- No --> J[Keep override value]
    I --> K{override_embed_min_actors\nis None?}
    J --> K
    K -- Yes --> L[embed_min_actors =\nEMBED_SINGLE_GPU_ACTORS=1]
    K -- No --> M[Keep override value]
    L --> N{override_embed_max_actors\nis None?}
    M --> N
    N -- Yes --> O[embed_max_actors =\nEMBED_SINGLE_GPU_ACTORS=1]
    N -- No --> P[Keep override value]
    O --> Q{override_embed_gpus_per_actor\nis None?}
    P --> Q
    Q -- Yes --> R[embed_gpus_per_actor =\nEMBED_SINGLE_GPU_GPUS_PER_ACTOR=0.2]
    Q -- No --> S[Keep override value]
    R --> T[Continue resolving\nall other actor types]
    S --> T
    G --> T
    T --> U[Return RequestedPlan]


greptile-apps · 2026-05-21T19:10:57Z

+        if override_embed_initial_actors is None:
+            embed_initial_actors = EMBED_SINGLE_GPU_ACTORS
+        if override_embed_min_actors is None:
+            embed_min_actors = EMBED_SINGLE_GPU_ACTORS
+        if override_embed_max_actors is None:
+            embed_max_actors = EMBED_SINGLE_GPU_ACTORS
+        if override_embed_gpus_per_actor is None:
+            embed_gpus_per_actor = EMBED_SINGLE_GPU_GPUS_PER_ACTOR


Override sentinel mismatch between heuristic block and resolvers

The single-GPU block guards on `override is None`, but the `_resolve_int_actors` / `_resolve_float_actors` helpers treat any value `<= 0` the same as `None` (i.e., they fall through to the default). If a caller passes `override_embed_gpus_per_actor=0.0`, `_resolve_float_actors` ignores it and returns `0.5`, but the single-GPU block also skips the heuristic (since `0.0 is not None`), so the final value is `0.5`: neither the caller's intended value nor the single-GPU default of `0.2`. A guard of `override is None or override <= 0` (matching the resolver semantics) would make the two layers consistent.
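The mismatch described in this comment can be reproduced with a small stand-alone sketch. Everything here is a hypothetical reconstruction for illustration: `is_unset`, `resolve_float`, and the two `gpus_per_actor_*` functions are not the repository's helpers, and the `0.5` default is taken from the comment rather than from the source file.

```python
from typing import Optional

EMBED_SINGLE_GPU_GPUS_PER_ACTOR = 0.2
ASSUMED_DEFAULT_GPUS_PER_ACTOR = 0.5  # default value quoted in the review comment


def is_unset(override: Optional[float]) -> bool:
    """Resolver-style sentinel check: None and values <= 0 both mean 'not set'."""
    return override is None or override <= 0


def resolve_float(override: Optional[float], default: float) -> float:
    return default if is_unset(override) else override


def gpus_per_actor_none_guard(override: Optional[float]) -> float:
    """Single-GPU path that guards only on `override is None`."""
    value = resolve_float(override, ASSUMED_DEFAULT_GPUS_PER_ACTOR)
    if override is None:
        value = EMBED_SINGLE_GPU_GPUS_PER_ACTOR
    return value


def gpus_per_actor_consistent_guard(override: Optional[float]) -> float:
    """Single-GPU path whose guard matches the resolver's sentinel semantics."""
    value = resolve_float(override, ASSUMED_DEFAULT_GPUS_PER_ACTOR)
    if is_unset(override):
        value = EMBED_SINGLE_GPU_GPUS_PER_ACTOR
    return value


for override in (None, 0.0, 0.3):
    print(
        override,
        gpus_per_actor_none_guard(override),
        gpus_per_actor_consistent_guard(override),
    )
```

With `override=0.0`, the `is None` guard ends at the generic default, while the consistent guard falls back to the single-GPU default. Both variants agree for `None` and for a positive explicit value.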

Path: nemo_retriever/src/nemo_retriever/utils/ray_resource_hueristics.py, lines 612-619


Tune single-GPU embed resource heuristic

476f857

jioffe502 requested review from a team as code owners May 21, 2026 19:07

jioffe502 requested a review from ChrisJar May 21, 2026 19:07

jdye64 approved these changes May 21, 2026


greptile-apps Bot reviewed May 21, 2026


Update nemo_retriever/src/nemo_retriever/utils/ray_resource_hueristic…

18964e5

…s.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

jioffe502 merged commit 03cb241 into NVIDIA:main May 21, 2026
7 of 8 checks passed

jioffe502 mentioned this pull request May 22, 2026

Replace ingest input-type routing with manifest branches #2095

Open

4 tasks


jioffe502 merged 2 commits into NVIDIA:main from jioffe502:codex/single-gpu-embed-heuristic
