Helm nemotron parse#2092
Conversation
Greptile SummaryThis PR updates Helm and library-mode configurations to support the Nemotron Parse v1.2 dual-path (legacy v1.0/v1.1 vs. newer) remote API and promotes the VL embedding model to the default across the stack.
|
| Filename | Overview |
|---|---|
| ci/scripts/validate_deployment_configs.py | Removes VLM service mapping — VLM config parity between docker-compose and Helm values will no longer be validated if the service still exists in docker-compose.yaml. |
| nemo_retriever/src/nemo_retriever/model/init.py | Changes _DEFAULT_EMBED_MODEL from the text-only embed-1b-v2 to VL_EMBED_MODEL (embed-vl-1b-v2), a breaking default that silently switches callers passing model_name=None to a heavier VL model with different dimensions and dependency requirements. |
| nemo_retriever/src/nemo_retriever/parse/nemotron_parse.py | Adds version-based branching (legacy v1.0/v1.1 vs newer) for chat-completions payload construction: legacy models receive tools+max_tokens and task_prompt=None; non-legacy models receive plain max_tokens with task_prompt forwarded. Detection uses substring matching on the model name. |
| nemo_retriever/src/nemo_retriever/service/client.py | Adds an early-return guard in _on_sse_event to discard document events with statuses other than "completed" or "failed", preventing unexpected statuses from reaching the event queue. |
| nemo_retriever/src/nemo_retriever/service_ingestor.py | Adds a symmetric status guard in the document_complete event handler — functionally unreachable in practice since client.py already filters non-completed/failed statuses before enqueuing. |
| nemo_retriever/src/nemo_retriever/text_embed/runtime.py | Updates the secondary fallback in _embed_group to VL_EMBED_MODEL, consistent with the init.py default change; carries the same silent model-switch impact for any code path where resolved_model_name is falsy. |
| nemo_retriever/tests/test_create_local_embedder.py | Test suite updated to match the new VL default: mock fixtures and assertions repointed from text-model mocks to VL-model mocks; normalize/max_length kwarg assertions removed since the VL path doesn't forward them. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[nemotron_parse_pages called] --> B{use_remote?}
B -- No --> C[Local vLLM model\ntask_prompt forwarded]
B -- Yes --> D{invoke_url contains\n/v1/chat/completions?}
D -- No --> E[invoke_image_inference_batches\nlegacy image API]
D -- Yes --> F[Resolve model name]
F --> G{_is_legacy?\nsubstring v1.0/v1.1/v1_0/v1_1}
G -- Yes --> H[extra_body: tools + max_tokens\ntask_prompt = None\nused_v1_api = True]
G -- No --> I[extra_body: max_tokens only\ntask_prompt forwarded]
H --> J[invoke_chat_completions_images]
I --> J
J --> K{nim_client provided?}
K -- Yes --> L[nim_client.invoke_chat_completions_images]
K -- No --> M[standalone invoke_chat_completions_images\nwith thread pool]
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
ci/scripts/validate_deployment_configs.py:48-58
**VLM service silently removed from deployment validation**
The `"vlm": "nemotron_nano_12b_v2_vl"` entry has been removed from `SERVICE_MAPPING`. On line 149 the script uses `if service_name not in SERVICE_MAPPING: continue`, so any `vlm` service still present in `docker-compose.yaml` will now be silently skipped — image-tag or environment-variable mismatches between Compose and Helm for that service will no longer be reported. Is VLM intentionally being dropped from the deployment, or should the mapping remain for coverage?
Reviews (3): Last reviewed commit: "test fixes" | Re-trigger Greptile
| } | ||
|
|
||
| _DEFAULT_EMBED_MODEL = "nvidia/llama-nemotron-embed-1b-v2" | ||
| _DEFAULT_EMBED_MODEL = VL_EMBED_MODEL |
There was a problem hiding this comment.
Silent breaking change to default embed model
_DEFAULT_EMBED_MODEL is now VL_EMBED_MODEL (nvidia/llama-nemotron-embed-vl-1b-v2), replacing the previous default of nvidia/llama-nemotron-embed-1b-v2. Any caller that passes model_name=None — including resolve_embed_model(None) and create_local_embedder(model_name=None) — will silently get the VL model, which has different dependency requirements (VLM packages), different GPU memory footprint, and different embedding dimensions than the text-only model. Existing pipelines that relied on the default will break or produce incompatible embeddings without any warning. Per the public-api-contract rule, this is a breaking change that requires a deprecation cycle or at minimum a clear release-note callout. The same implicit break is carried into text_embed/runtime.py line 78 where the fallback also changes to VL_EMBED_MODEL.
Prompt To Fix With AI
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/model/__init__.py
Line: 39
Comment:
**Silent breaking change to default embed model**
`_DEFAULT_EMBED_MODEL` is now `VL_EMBED_MODEL` (`nvidia/llama-nemotron-embed-vl-1b-v2`), replacing the previous default of `nvidia/llama-nemotron-embed-1b-v2`. Any caller that passes `model_name=None` — including `resolve_embed_model(None)` and `create_local_embedder(model_name=None)` — will silently get the VL model, which has different dependency requirements (VLM packages), different GPU memory footprint, and different embedding dimensions than the text-only model. Existing pipelines that relied on the default will break or produce incompatible embeddings without any warning. Per the `public-api-contract` rule, this is a breaking change that requires a deprecation cycle or at minimum a clear release-note callout. The same implicit break is carried into `text_embed/runtime.py` line 78 where the fallback also changes to `VL_EMBED_MODEL`.
How can I resolve this? If you propose a fix, please make it concise.| if "/v1/chat/completions" in invoke_url: | ||
| used_v1_api = True | ||
| _model_name = nemotron_parse_model or NEMOTRON_PARSE_REMOTE_DEFAULT_MODEL | ||
| _is_legacy = any(v in _model_name.lower() for v in ("v1.0", "v1.1", "v1_0", "v1_1")) |
There was a problem hiding this comment.
Fragile model version detection via substring match
_is_legacy is determined by checking whether the lowercased model name contains "v1.0", "v1.1", "v1_0", or "v1_1" as a substring. A future model named, for example, "nvidia/nemotron-parse-v2-v1.1-finetuned" or a path like /models/v1.0-backup/nemotron-parse-v2 would be incorrectly classified as legacy, silently injecting the tool-call extra_body and suppressing task_prompt. A structured version comparison — parsing the version token from the model name, or using an explicit allowlist of known legacy model IDs — would be more robust.
Prompt To Fix With AI
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/parse/nemotron_parse.py
Line: 317
Comment:
**Fragile model version detection via substring match**
`_is_legacy` is determined by checking whether the lowercased model name contains `"v1.0"`, `"v1.1"`, `"v1_0"`, or `"v1_1"` as a substring. A future model named, for example, `"nvidia/nemotron-parse-v2-v1.1-finetuned"` or a path like `/models/v1.0-backup/nemotron-parse-v2` would be incorrectly classified as legacy, silently injecting the tool-call `extra_body` and suppressing `task_prompt`. A structured version comparison — parsing the version token from the model name, or using an explicit allowlist of known legacy model IDs — would be more robust.
How can I resolve this? If you propose a fix, please make it concise.| } | ||
|
|
||
| _DEFAULT_EMBED_MODEL = "nvidia/llama-nemotron-embed-1b-v2" | ||
| _DEFAULT_EMBED_MODEL = VL_EMBED_MODEL |
There was a problem hiding this comment.
Silent breaking change to the default embedding model
Every caller that omits model_name (i.e. passes None or "") now silently receives nvidia/llama-nemotron-embed-vl-1b-v2 instead of nvidia/llama-nemotron-embed-1b-v2. The VL model may produce embeddings with a different hidden dimension and requires different GPU memory, so any existing index built against the text-only default will silently produce mismatched vectors on lookup. This change lacks a deprecation warning and migration path, which violates the backward-compatibility contract for library users who relied on the previous default.
Rule Used: Changes to public API surfaces (FastAPI endpoints,... (source)
Prompt To Fix With AI
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/model/__init__.py
Line: 39
Comment:
**Silent breaking change to the default embedding model**
Every caller that omits `model_name` (i.e. passes `None` or `""`) now silently receives `nvidia/llama-nemotron-embed-vl-1b-v2` instead of `nvidia/llama-nemotron-embed-1b-v2`. The VL model may produce embeddings with a different hidden dimension and requires different GPU memory, so any existing index built against the text-only default will silently produce mismatched vectors on lookup. This change lacks a deprecation warning and migration path, which violates the backward-compatibility contract for library users who relied on the previous default.
**Rule Used:** Changes to public API surfaces (FastAPI endpoints,... ([source](https://app.greptile.com/review/custom-context?memory=api-backward-compatibility))
How can I resolve this? If you propose a fix, please make it concise.| if status not in ("completed", "failed"): | ||
| return |
There was a problem hiding this comment.
Unexpected document statuses dropped without logging
When the server emits a document event with a status value that is neither "completed" nor "failed", the event is silently discarded. If the server introduces a new intermediate status (e.g. "processing", "partial") or emits an unexpected value due to a bug, the document will disappear from the client's accounting with no diagnostic trace. A logger.debug or logger.warning here would preserve observability without changing the filtering behaviour.
Prompt To Fix With AI
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/service/client.py
Line: 620-621
Comment:
**Unexpected document statuses dropped without logging**
When the server emits a document event with a status value that is neither `"completed"` nor `"failed"`, the event is silently discarded. If the server introduces a new intermediate status (e.g. `"processing"`, `"partial"`) or emits an unexpected value due to a bug, the document will disappear from the client's accounting with no diagnostic trace. A `logger.debug` or `logger.warning` here would preserve observability without changing the filtering behaviour.
How can I resolve this? If you propose a fix, please make it concise.| if status not in ("completed", "failed"): | ||
| continue |
There was a problem hiding this comment.
Redundant status guard — unreachable in practice
client.py's _on_sse_event already returns early (and never calls event_queue.put_nowait) when the status is not "completed" or "failed", so no document_complete event with another status can reach this branch. The guard is dead code in the current architecture. If the intent is purely defensive, a comment explaining that would help future readers; otherwise it can be removed to keep the event loop clear.
Prompt To Fix With AI
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/service_ingestor.py
Line: 974-975
Comment:
**Redundant status guard — unreachable in practice**
`client.py`'s `_on_sse_event` already returns early (and never calls `event_queue.put_nowait`) when the status is not `"completed"` or `"failed"`, so no `document_complete` event with another status can reach this branch. The guard is dead code in the current architecture. If the intent is purely defensive, a comment explaining that would help future readers; otherwise it can be removed to keep the event loop clear.
How can I resolve this? If you propose a fix, please make it concise.| if "/v1/chat/completions" in invoke_url: | ||
| used_v1_api = True | ||
| _model_name = nemotron_parse_model or NEMOTRON_PARSE_REMOTE_DEFAULT_MODEL | ||
| _is_legacy = any(v in _model_name.lower() for v in ("v1.0", "v1.1", "v1_0", "v1_1")) |
There was a problem hiding this comment.
Version detection relies on substring matching
_is_legacy is determined by whether the model name contains "v1.0", "v1.1", "v1_0", or "v1_1" as a substring. A model named e.g. "my-org/nemotron-parse-v1.0-finetuned" would be correctly flagged, but so would any future model whose path or organisation name happens to contain those strings. An explicit allowlist of known legacy model IDs (or a version-prefixed constant) would be more robust and easier to document.
Prompt To Fix With AI
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/parse/nemotron_parse.py
Line: 317
Comment:
**Version detection relies on substring matching**
`_is_legacy` is determined by whether the model name contains `"v1.0"`, `"v1.1"`, `"v1_0"`, or `"v1_1"` as a substring. A model named e.g. `"my-org/nemotron-parse-v1.0-finetuned"` would be correctly flagged, but so would any future model whose path or organisation name happens to contain those strings. An explicit allowlist of known legacy model IDs (or a version-prefixed constant) would be more robust and easier to document.
How can I resolve this? If you propose a fix, please make it concise.
Description
Checklist