Update ASR model to use batch mode and auto-select using batch/streaming by jdye64 · Pull Request #2153 · NVIDIA/NeMo-Retriever

jdye64 · 2026-05-28T18:18:36Z

Description

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.
If adjusting docker-compose.yaml environment variables have you ensured those are mimicked in the Helm values.yaml file.

copy-pr-bot · 2026-05-28T18:18:40Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

greptile-apps · 2026-05-28T18:21:18Z

Greptile Summary

This PR switches the Parakeet ASR NIM from streaming (mode=str) to offline/batch (mode=ofl) as the default deployment profile, updating both docker-compose and Helm values in lock-step. The auto logic in resolve_audio_infer_mode is updated to match: non-NVCF endpoints now resolve to offline instead of online, while NVCF (grpc.nvcf.nvidia.com) remains streaming.

docker-compose.yaml and helm/values.yaml updated together to mode=ofl (docker-Helm parity maintained).
resolve_audio_infer_mode auto branch now returns offline for all non-NVCF gRPC endpoints; tests updated accordingly.
No migration path or warning is provided for users who deployed a streaming NIM on-prem and relied on auto defaulting to online.

Confidence Score: 3/5

Safe for new deployments and NVCF users; existing self-hosted streaming NIM deployments using the default auto mode will silently break at runtime without any warning.

The auto-mode resolution for non-NVCF endpoints flips from streaming to offline with no deprecation warning. Any user who had a self-hosted Parakeet NIM registered with a streaming profile and was relying on the auto default will get a runtime gRPC failure the moment they update, with no signal at configuration time. The change itself is correct and well-tested for the intended deployment pattern, but the silent behavioral shift for the previous default path is a real regression risk for existing users.

parakeet.py — the resolve_audio_infer_mode auto branch and the surrounding migration story; test_parakeet_infer_mode.py — missing coverage for explicit online mode against a local endpoint.

Important Files Changed

Filename	Overview
nemo_retriever/src/nemo_retriever/api/internal/primitives/nim/model_interface/parakeet.py	Core logic change: resolve_audio_infer_mode in auto mode now returns 'offline' for all non-NVCF endpoints (previously returned 'online'). This silently changes the behavior for self-hosted streaming NIMs using the default auto mode.
nemo_retriever/tests/test_parakeet_infer_mode.py	Tests updated to match the new default offline behavior; includes parametrized cases for the new auto-selection logic and a new named test case for self-hosted offline. A test for explicit online mode targeting a local endpoint is missing.
docker-compose.yaml	NIM_TAGS_SELECTOR updated from mode=str (streaming) to mode=ofl (offline) for the Parakeet service.
nemo_retriever/helm/values.yaml	NIM_TAGS_SELECTOR updated from mode=str to mode=ofl, mirroring the docker-compose.yaml change. Docker/Helm parity is maintained.
nemo_retriever/src/nemo_retriever/params/models.py	Inline comment on audio_infer_mode updated to reflect the new auto-selection semantics. No code changes.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["resolve_audio_infer_mode(mode, endpoint)"] --> B{normalized mode?}
    B -->|online| C["return 'online'"]
    B -->|offline| D["return 'offline'"]
    B -->|other, not 'auto'| E["raise ValueError"]
    B -->|auto| F{"'nvcf.nvidia.com'\nin endpoint?"}
    F -->|yes| G["return 'online'\n(streaming RPC)"]
    F -->|no| H["return 'offline'\n(batch recognize RPC)"]
    H --> I["self-hosted NIM\n(mode=ofl default)"]
    G --> J["NVCF cloud endpoint\ngrpc.nvcf.nvidia.com"]

Comments Outside Diff (1)

nemo_retriever/tests/test_parakeet_infer_mode.py, line 57-72 (link)

Missing test for explicit online mode against a local endpoint

test_parakeet_client_transcribe_uses_offline_when_explicit tests explicit offline mode on a local endpoint, but there is no corresponding test for infer_mode="online" against a local (non-NVCF) endpoint, which is now the only way to reach the streaming path for self-hosted NIMs. Adding this case would confirm that the _streaming_transcribe path is still reachable and exercised after this change.

Rule Used: New functionality must include corresponding unit ... (source)

Prompt To Fix With AI

This is a comment left during a code review.
Path: nemo_retriever/tests/test_parakeet_infer_mode.py
Line: 57-72

Comment:
**Missing test for explicit `online` mode against a local endpoint**

`test_parakeet_client_transcribe_uses_offline_when_explicit` tests explicit `offline` mode on a local endpoint, but there is no corresponding test for `infer_mode="online"` against a local (non-NVCF) endpoint, which is now the only way to reach the streaming path for self-hosted NIMs. Adding this case would confirm that the `_streaming_transcribe` path is still reachable and exercised after this change.

**Rule Used:** New functionality must include corresponding unit ... ([source](https://app.greptile.com/review/custom-context?memory=test-coverage-new-code))

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix All With AI

Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
nemo_retriever/src/nemo_retriever/api/internal/primitives/nim/model_interface/parakeet.py:71-73
**Silent breaking change for self-hosted streaming NIM deployments**

Any user who deployed a Parakeet NIM with a streaming profile (`mode=str`) on-prem and relied on the default `audio_infer_mode='auto'` will now silently route to `offline_recognize` instead of the streaming RPC. If that endpoint does not have an offline model registered, the call will fail at runtime with a gRPC error. There is no deprecation warning or migration note.

Per the `api-backward-compatibility` standard, the semantic change to what `auto` resolves to for non-NVCF endpoints should be accompanied by a note in the CHANGELOG/release notes or at minimum a `warnings.warn(...)` when `auto` selects `offline` for a non-NVCF endpoint, so users who relied on the old default are alerted.

### Issue 2 of 2
nemo_retriever/tests/test_parakeet_infer_mode.py:57-72
**Missing test for explicit `online` mode against a local endpoint**

`test_parakeet_client_transcribe_uses_offline_when_explicit` tests explicit `offline` mode on a local endpoint, but there is no corresponding test for `infer_mode="online"` against a local (non-NVCF) endpoint, which is now the only way to reach the streaming path for self-hosted NIMs. Adding this case would confirm that the `_streaming_transcribe` path is still reachable and exercised after this change.

_{Reviews (1): Last reviewed commit: "Update ASR model to use batch mode and a..." | Re-trigger Greptile}

greptile-apps · 2026-05-28T18:21:22Z

+    if "nvcf.nvidia.com" in (endpoint or "").lower():
+        return "online"
+    return "offline"


Silent breaking change for self-hosted streaming NIM deployments

Any user who deployed a Parakeet NIM with a streaming profile (mode=str) on-prem and relied on the default audio_infer_mode='auto' will now silently route to offline_recognize instead of the streaming RPC. If that endpoint does not have an offline model registered, the call will fail at runtime with a gRPC error. There is no deprecation warning or migration note.

Per the api-backward-compatibility standard, the semantic change to what auto resolves to for non-NVCF endpoints should be accompanied by a note in the CHANGELOG/release notes or at minimum a warnings.warn(...) when auto selects offline for a non-NVCF endpoint, so users who relied on the old default are alerted.

Rule Used: Changes to public API surfaces (FastAPI endpoints,... (source)

Prompt To Fix With AI

This is a comment left during a code review. Path: nemo_retriever/src/nemo_retriever/api/internal/primitives/nim/model_interface/parakeet.py Line: 71-73 Comment: **Silent breaking change for self-hosted streaming NIM deployments** Any user who deployed a Parakeet NIM with a streaming profile (`mode=str`) on-prem and relied on the default `audio_infer_mode='auto'` will now silently route to `offline_recognize` instead of the streaming RPC. If that endpoint does not have an offline model registered, the call will fail at runtime with a gRPC error. There is no deprecation warning or migration note. Per the `api-backward-compatibility` standard, the semantic change to what `auto` resolves to for non-NVCF endpoints should be accompanied by a note in the CHANGELOG/release notes or at minimum a `warnings.warn(...)` when `auto` selects `offline` for a non-NVCF endpoint, so users who relied on the old default are alerted. **Rule Used:** Changes to public API surfaces (FastAPI endpoints,... ([source](https://app.greptile.com/review/custom-context?memory=api-backward-compatibility)) How can I resolve this? If you propose a fix, please make it concise.

Update ASR model to use batch mode and auto-select using batch/streaming

0b8c9a2

jdye64 requested review from a team as code owners May 28, 2026 18:18

jdye64 requested review from edknv and removed request for a team May 28, 2026 18:18

greptile-apps Bot reviewed May 28, 2026

View reviewed changes

jdye64 merged commit 8856a87 into NVIDIA:26.05 May 28, 2026
9 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update ASR model to use batch mode and auto-select using batch/streaming#2153

Update ASR model to use batch mode and auto-select using batch/streaming#2153
jdye64 merged 1 commit into
NVIDIA:26.05from
jdye64:asr-fixes-2

jdye64 commented May 28, 2026

Uh oh!

copy-pr-bot Bot commented May 28, 2026

Uh oh!

greptile-apps Bot commented May 28, 2026 •

edited

Loading

Confidence Score: 3/5

Important Files Changed

Flowchart

Comments Outside Diff (1)

Uh oh!

greptile-apps Bot May 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jdye64 commented May 28, 2026

Description

Checklist

Uh oh!

copy-pr-bot Bot commented May 28, 2026

Uh oh!

greptile-apps Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Flowchart

Comments Outside Diff (1)

Uh oh!

greptile-apps Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

greptile-apps Bot commented May 28, 2026 •

edited

Loading