Tune single-GPU embed resource heuristic #2089

Merged
jioffe502 merged 2 commits into NVIDIA:main from jioffe502:codex/single-gpu-embed-heuristic on May 21, 2026

@jioffe502 (Collaborator)

NOTE: This is a general performance improvement; this functionality is not critical for the 26.05 release. This PR targets main.

Description

  • Tune the default local embedding resource plan for single-GPU batch ingest.
  • Use one embedding actor with a 0.2 GPU Ray reservation on single-GPU systems.
  • Keep multi-GPU defaults unchanged.
  • Preserve explicit --embed-workers and --embed-gpus-per-actor overrides.

Motivation

On a single H100 NVL, the previous default local embedding plan used two embedding actors at 0.5 GPU each. In retriever ingest --run-mode batch, that shape can over-reserve the single GPU for embedding and reduce scheduler room for the rest of the local pipeline.

Empirically, one local vLLM embedding actor with a 0.2 GPU reservation kept the GPU busier and improved end-to-end ingest time while producing the same LanceDB output row count.
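To make "over-reserve" concrete: Ray treats fractional GPU requests as logical scheduling units, so the embedding actors' reservations on a single GPU add up against a budget of 1.0. The minimal sketch below compares the two shapes; the helper names and the 0.25-GPU "other stage" request are hypothetical illustrations, not values from the pipeline.

```python
TOTAL_GPUS = 1.0  # one physical GPU, as on the single H100 NVL host


def embed_reservation(num_actors: int, gpus_per_actor: float) -> float:
    """Logical GPU units reserved by the embedding actors."""
    return num_actors * gpus_per_actor


def headroom(num_actors: int, gpus_per_actor: float, total: float = TOTAL_GPUS) -> float:
    """Logical GPU units left for every other actor in the local pipeline."""
    return total - embed_reservation(num_actors, gpus_per_actor)


def fits(extra_request: float, free: float) -> bool:
    # Small tolerance so float rounding does not reject an exact fit.
    return extra_request <= free + 1e-9


prior_free = headroom(num_actors=2, gpus_per_actor=0.5)  # previous default
tuned_free = headroom(num_actors=1, gpus_per_actor=0.2)  # tuned single-GPU default

HYPOTHETICAL_OTHER_ACTOR = 0.25  # illustrative request from another stage
print(f"prior headroom: {prior_free:.1f}, extra actor fits: {fits(HYPOTHETICAL_OTHER_ACTOR, prior_free)}")
print(f"tuned headroom: {tuned_free:.1f}, extra actor fits: {fits(HYPOTHETICAL_OTHER_ACTOR, tuned_free)}")
```

Because these reservations are logical, they constrain scheduling rather than actual GPU memory or compute; the higher utilization reported below is an empirical result, not something this arithmetic predicts.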

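End-to-end ingest time, measured on the same machine:

| Dataset | Prior default (2 actors × 0.5 GPU) | Tuned setting (1 actor × 0.2 GPU) | Result |
| --- | --- | --- | --- |
| jp20 | 168.4 s | 150.8 s | ~10.4% faster |
| bo767 | 3631.8 s | 2069.7 s | ~43.0% faster |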
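The "Result" column is the relative reduction in wall time, (prior - tuned) / prior. A short check using only the timings from the table, with the throughput ratio alongside for comparison:

```python
# Timings in seconds from the table: (prior default, tuned setting).
RUNS = {
    "jp20": (168.4, 150.8),
    "bo767": (3631.8, 2069.7),
}


def wall_time_reduction(prior_s: float, tuned_s: float) -> float:
    """Fraction of end-to-end ingest time saved by the tuned setting."""
    return (prior_s - tuned_s) / prior_s


def speedup(prior_s: float, tuned_s: float) -> float:
    """Throughput ratio: how many times faster the tuned run completes."""
    return prior_s / tuned_s


for name, (prior_s, tuned_s) in RUNS.items():
    print(
        f"{name}: {wall_time_reduction(prior_s, tuned_s):.2%} less wall time, "
        f"{speedup(prior_s, tuned_s):.2f}x speedup"
    )
```

This prints roughly 10.45% (1.12x) for jp20 and 43.01% (1.75x) for bo767, consistent with the approximate figures in the table.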

For bo767, LanceDB row count remained 79978, with dropped_no_embedding=0 and the same dropped_bad_length=64 pattern as the prior run. Average GPU utilization increased from ~38.8% to ~65.6%.

Implementation

This keeps the existing heuristic model intact:

  • the change is isolated to resolve_requested_plan();
  • it only applies when available_gpu_count == 1;
  • multi-GPU defaults continue to use the existing per-GPU scaling;
  • explicit embed worker/GPU overrides still win.
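A minimal sketch of the precedence rule described above, assuming the embedding values have already been resolved by the existing multi-GPU path and that `None` means "no explicit override". Only the two constant names and values come from this PR; `EmbedPlan`, `EmbedOverrides`, `_pick`, and `apply_single_gpu_embed_heuristic` are hypothetical names, not the actual code in `resolve_requested_plan()`.

```python
from dataclasses import dataclass
from typing import Optional, TypeVar

# Constant names and values as described in this PR.
EMBED_SINGLE_GPU_ACTORS = 1
EMBED_SINGLE_GPU_GPUS_PER_ACTOR = 0.2

T = TypeVar("T")


@dataclass(frozen=True)
class EmbedPlan:
    """Hypothetical container for the resolved embedding actor settings."""
    initial_actors: int
    min_actors: int
    max_actors: int
    gpus_per_actor: float


@dataclass(frozen=True)
class EmbedOverrides:
    """Hypothetical explicit overrides; None means the caller did not set one."""
    initial_actors: Optional[int] = None
    min_actors: Optional[int] = None
    max_actors: Optional[int] = None
    gpus_per_actor: Optional[float] = None


def _pick(resolved: T, override: Optional[T], single_gpu_default: T) -> T:
    # An explicit override already shaped `resolved`, so keep it;
    # otherwise swap in the single-GPU default.
    return resolved if override is not None else single_gpu_default


def apply_single_gpu_embed_heuristic(
    resolved: EmbedPlan, overrides: EmbedOverrides, available_gpu_count: int
) -> EmbedPlan:
    if available_gpu_count != 1:
        return resolved  # multi-GPU (and zero-GPU) values are left untouched
    return EmbedPlan(
        initial_actors=_pick(resolved.initial_actors, overrides.initial_actors, EMBED_SINGLE_GPU_ACTORS),
        min_actors=_pick(resolved.min_actors, overrides.min_actors, EMBED_SINGLE_GPU_ACTORS),
        max_actors=_pick(resolved.max_actors, overrides.max_actors, EMBED_SINGLE_GPU_ACTORS),
        gpus_per_actor=_pick(
            resolved.gpus_per_actor, overrides.gpus_per_actor, EMBED_SINGLE_GPU_GPUS_PER_ACTOR
        ),
    )


# Hypothetical values standing in for the previous two-actor, 0.5-GPU default.
prior = EmbedPlan(initial_actors=2, min_actors=2, max_actors=2, gpus_per_actor=0.5)
print(apply_single_gpu_embed_heuristic(prior, EmbedOverrides(), available_gpu_count=1))
print(apply_single_gpu_embed_heuristic(prior, EmbedOverrides(gpus_per_actor=0.5), available_gpu_count=1))
```

The second call shows the per-field behavior: an explicit GPU-per-actor override survives while the unset actor counts still collapse to the single-GPU default.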

Validation

  • uv run --frozen --extra local --extra dev pytest tests/test_resource_heuristics.py

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables, have you ensured those are mirrored in the Helm values.yaml file?

jioffe502 requested review from a team as code owners on May 21, 2026 at 19:07
jioffe502 requested a review from ChrisJar on May 21, 2026 at 19:07
greptile-apps Bot (Contributor) commented on May 21, 2026

Greptile Summary

This PR tunes the default Ray resource plan for single-GPU batch ingest: on a system with exactly one GPU, the embedder is now allocated 1 actor at 0.2 GPU rather than the prior scaled default (2 actors at 0.5 GPU each), leaving more headroom for OCR and page-element actors. Multi-GPU paths and all explicit overrides are unchanged.

  • Adds EMBED_SINGLE_GPU_ACTORS = 1 and EMBED_SINGLE_GPU_GPUS_PER_ACTOR = 0.2 module-level constants, then applies them in resolve_requested_plan only when available_gpu_count == 1 and no explicit override is present.
  • Test file updates the 1-GPU default assertions, plugs the previously missing embed_min_actors assertion, and adds a new test confirming explicit overrides take precedence over the heuristic.

Confidence Score: 5/5

Safe to merge — the change is narrowly scoped to a single post-resolution override block that only fires on single-GPU systems, multi-GPU paths are unaffected, and the new tests verify both the heuristic defaults and the override-wins path.

The heuristic change is well-contained: it touches one function, fires only when available_gpu_count == 1, and all explicit override parameters still take precedence. The updated and new tests cover the primary cases. No new correctness defects were found in the changed code paths beyond what has already been discussed in existing review threads.

No files require special attention; both changed files are small and self-contained.

Important Files Changed

| Filename | Overview |
| --- | --- |
| nemo_retriever/src/nemo_retriever/utils/ray_resource_hueristics.py | Adds two single-GPU constants and a post-resolution override block in resolve_requested_plan that applies them when available_gpu_count == 1 and no explicit override is set; multi-GPU paths are untouched. |
| nemo_retriever/tests/test_resource_heuristics.py | Updates the 1-GPU default assertion to match the new constants, adds embed_min_actors coverage that was previously absent, and adds a new test verifying that explicit overrides win over the single-GPU heuristic. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[resolve_requested_plan called] --> B[Compute available_gpu_count]
    B --> C{available_gpu_count == 0\nand not allow_no_gpu?}
    C -- Yes --> D[Raise ValueError]
    C -- No --> E[_resolve_int/float_actors\nfor all actors using\nmulti-GPU defaults]
    E --> F{available_gpu_count == 1?}
    F -- No --> G[Keep multi-GPU resolved values]
    F -- Yes --> H{override_embed_initial_actors\nis None?}
    H -- Yes --> I[embed_initial_actors =\nEMBED_SINGLE_GPU_ACTORS=1]
    H -- No --> J[Keep override value]
    I --> K{override_embed_min_actors\nis None?}
    J --> K
    K -- Yes --> L[embed_min_actors =\nEMBED_SINGLE_GPU_ACTORS=1]
    K -- No --> M[Keep override value]
    L --> N{override_embed_max_actors\nis None?}
    M --> N
    N -- Yes --> O[embed_max_actors =\nEMBED_SINGLE_GPU_ACTORS=1]
    N -- No --> P[Keep override value]
    O --> Q{override_embed_gpus_per_actor\nis None?}
    P --> Q
    Q -- Yes --> R[embed_gpus_per_actor =\nEMBED_SINGLE_GPU_GPUS_PER_ACTOR=0.2]
    Q -- No --> S[Keep override value]
    R --> T[Continue resolving\nall other actor types]
    S --> T
    G --> T
    T --> U[Return RequestedPlan]

Reviews (2): Last reviewed commit: "Update nemo_retriever/src/nemo_retriever..."

Comment thread on nemo_retriever/src/nemo_retriever/utils/ray_resource_hueristics.py (outdated)
Comment on lines +612 to +619
if override_embed_initial_actors is None:
    embed_initial_actors = EMBED_SINGLE_GPU_ACTORS
if override_embed_min_actors is None:
    embed_min_actors = EMBED_SINGLE_GPU_ACTORS
if override_embed_max_actors is None:
    embed_max_actors = EMBED_SINGLE_GPU_ACTORS
if override_embed_gpus_per_actor is None:
    embed_gpus_per_actor = EMBED_SINGLE_GPU_GPUS_PER_ACTOR

P2: Override sentinel mismatch between heuristic block and resolvers

The single-GPU block guards on `override is None`, but the `_resolve_int_actors` / `_resolve_float_actors` helpers treat any value `<= 0` the same as `None` (i.e., they fall through to the default). If a caller passes `override_embed_gpus_per_actor=0.0`, `_resolve_float_actors` ignores it and returns `0.5`, but the single-GPU block also skips the heuristic (since `0.0 is not None`), so the final value is `0.5`: neither the caller's intended value nor the single-GPU default of `0.2`. A guard of `override is None or override <= 0` (matching the resolver semantics) would make the two layers consistent.
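The mismatch can be reproduced with a small stand-in for the two layers. This is a sketch based on the review's description, not the repository code: `resolve_float_override` is a hypothetical substitute for `_resolve_float_actors` that treats `None` and values `<= 0` as unset and falls back to the 0.5 default, and `is_unset`, `original_guard`, and `final_gpus_per_actor` are likewise illustrative names.

```python
from typing import Callable, Optional

MULTI_GPU_DEFAULT_GPUS_PER_ACTOR = 0.5  # default quoted in the review comment
EMBED_SINGLE_GPU_GPUS_PER_ACTOR = 0.2


def is_unset(override: Optional[float]) -> bool:
    """Shared sentinel rule: None and non-positive values both mean 'not set'."""
    return override is None or override <= 0


def resolve_float_override(override: Optional[float], default: float) -> float:
    # Hypothetical stand-in for _resolve_float_actors as the review describes it.
    return default if is_unset(override) else override


def original_guard(override: Optional[float]) -> bool:
    return override is None  # guard in the reviewed version of the block


def final_gpus_per_actor(
    override: Optional[float],
    available_gpu_count: int,
    heuristic_guard: Callable[[Optional[float]], bool],
) -> float:
    value = resolve_float_override(override, MULTI_GPU_DEFAULT_GPUS_PER_ACTOR)
    if available_gpu_count == 1 and heuristic_guard(override):
        value = EMBED_SINGLE_GPU_GPUS_PER_ACTOR
    return value


print(final_gpus_per_actor(0.0, 1, original_guard))  # 0.5: resolver ignored 0.0, heuristic skipped
print(final_gpus_per_actor(0.0, 1, is_unset))        # 0.2: both layers agree 0.0 means unset
```

Routing both layers through one predicate such as `is_unset` keeps them from drifting apart; the same reasoning applies to the integer actor-count overrides handled by `_resolve_int_actors`.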


…s.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
jioffe502 merged commit 03cb241 into NVIDIA:main on May 21, 2026
7 of 8 checks passed
2 participants