Skip to content

Update ASR model to use batch mode and auto-select using batch/streaming#2153

Merged
jdye64 merged 1 commit into
NVIDIA:26.05from
jdye64:asr-fixes-2
May 28, 2026
Merged

Update ASR model to use batch mode and auto-select using batch/streaming#2153
jdye64 merged 1 commit into
NVIDIA:26.05from
jdye64:asr-fixes-2

Conversation

@jdye64
Copy link
Copy Markdown
Collaborator

@jdye64 jdye64 commented May 28, 2026

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables have you ensured those are mimicked in the Helm values.yaml file.

@jdye64 jdye64 requested review from a team as code owners May 28, 2026 18:18
@jdye64 jdye64 requested review from edknv and removed request for a team May 28, 2026 18:18
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 28, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 28, 2026

Greptile Summary

This PR switches the Parakeet ASR NIM from streaming (mode=str) to offline/batch (mode=ofl) as the default deployment profile, updating both docker-compose and Helm values in lock-step. The auto logic in resolve_audio_infer_mode is updated to match: non-NVCF endpoints now resolve to offline instead of online, while NVCF (grpc.nvcf.nvidia.com) remains streaming.

  • docker-compose.yaml and helm/values.yaml updated together to mode=ofl (docker-Helm parity maintained).
  • resolve_audio_infer_mode auto branch now returns offline for all non-NVCF gRPC endpoints; tests updated accordingly.
  • No migration path or warning is provided for users who deployed a streaming NIM on-prem and relied on auto defaulting to online.

Confidence Score: 3/5

Safe for new deployments and NVCF users; existing self-hosted streaming NIM deployments using the default auto mode will silently break at runtime without any warning.

The auto-mode resolution for non-NVCF endpoints flips from streaming to offline with no deprecation warning. Any user who had a self-hosted Parakeet NIM registered with a streaming profile and was relying on the auto default will get a runtime gRPC failure the moment they update, with no signal at configuration time. The change itself is correct and well-tested for the intended deployment pattern, but the silent behavioral shift for the previous default path is a real regression risk for existing users.

parakeet.py — the resolve_audio_infer_mode auto branch and the surrounding migration story; test_parakeet_infer_mode.py — missing coverage for explicit online mode against a local endpoint.

Important Files Changed

Filename Overview
nemo_retriever/src/nemo_retriever/api/internal/primitives/nim/model_interface/parakeet.py Core logic change: resolve_audio_infer_mode in auto mode now returns 'offline' for all non-NVCF endpoints (previously returned 'online'). This silently changes the behavior for self-hosted streaming NIMs using the default auto mode.
nemo_retriever/tests/test_parakeet_infer_mode.py Tests updated to match the new default offline behavior; includes parametrized cases for the new auto-selection logic and a new named test case for self-hosted offline. A test for explicit online mode targeting a local endpoint is missing.
docker-compose.yaml NIM_TAGS_SELECTOR updated from mode=str (streaming) to mode=ofl (offline) for the Parakeet service.
nemo_retriever/helm/values.yaml NIM_TAGS_SELECTOR updated from mode=str to mode=ofl, mirroring the docker-compose.yaml change. Docker/Helm parity is maintained.
nemo_retriever/src/nemo_retriever/params/models.py Inline comment on audio_infer_mode updated to reflect the new auto-selection semantics. No code changes.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["resolve_audio_infer_mode(mode, endpoint)"] --> B{normalized mode?}
    B -->|online| C["return 'online'"]
    B -->|offline| D["return 'offline'"]
    B -->|other, not 'auto'| E["raise ValueError"]
    B -->|auto| F{"'nvcf.nvidia.com'\nin endpoint?"}
    F -->|yes| G["return 'online'\n(streaming RPC)"]
    F -->|no| H["return 'offline'\n(batch recognize RPC)"]
    H --> I["self-hosted NIM\n(mode=ofl default)"]
    G --> J["NVCF cloud endpoint\ngrpc.nvcf.nvidia.com"]
Loading

Comments Outside Diff (1)

  1. nemo_retriever/tests/test_parakeet_infer_mode.py, line 57-72 (link)

    P2 Missing test for explicit online mode against a local endpoint

    test_parakeet_client_transcribe_uses_offline_when_explicit tests explicit offline mode on a local endpoint, but there is no corresponding test for infer_mode="online" against a local (non-NVCF) endpoint, which is now the only way to reach the streaming path for self-hosted NIMs. Adding this case would confirm that the _streaming_transcribe path is still reachable and exercised after this change.

    Rule Used: New functionality must include corresponding unit ... (source)

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: nemo_retriever/tests/test_parakeet_infer_mode.py
    Line: 57-72
    
    Comment:
    **Missing test for explicit `online` mode against a local endpoint**
    
    `test_parakeet_client_transcribe_uses_offline_when_explicit` tests explicit `offline` mode on a local endpoint, but there is no corresponding test for `infer_mode="online"` against a local (non-NVCF) endpoint, which is now the only way to reach the streaming path for self-hosted NIMs. Adding this case would confirm that the `_streaming_transcribe` path is still reachable and exercised after this change.
    
    **Rule Used:** New functionality must include corresponding unit ... ([source](https://app.greptile.com/review/custom-context?memory=test-coverage-new-code))
    
    How can I resolve this? If you propose a fix, please make it concise.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
nemo_retriever/src/nemo_retriever/api/internal/primitives/nim/model_interface/parakeet.py:71-73
**Silent breaking change for self-hosted streaming NIM deployments**

Any user who deployed a Parakeet NIM with a streaming profile (`mode=str`) on-prem and relied on the default `audio_infer_mode='auto'` will now silently route to `offline_recognize` instead of the streaming RPC. If that endpoint does not have an offline model registered, the call will fail at runtime with a gRPC error. There is no deprecation warning or migration note.

Per the `api-backward-compatibility` standard, the semantic change to what `auto` resolves to for non-NVCF endpoints should be accompanied by a note in the CHANGELOG/release notes or at minimum a `warnings.warn(...)` when `auto` selects `offline` for a non-NVCF endpoint, so users who relied on the old default are alerted.

### Issue 2 of 2
nemo_retriever/tests/test_parakeet_infer_mode.py:57-72
**Missing test for explicit `online` mode against a local endpoint**

`test_parakeet_client_transcribe_uses_offline_when_explicit` tests explicit `offline` mode on a local endpoint, but there is no corresponding test for `infer_mode="online"` against a local (non-NVCF) endpoint, which is now the only way to reach the streaming path for self-hosted NIMs. Adding this case would confirm that the `_streaming_transcribe` path is still reachable and exercised after this change.

Reviews (1): Last reviewed commit: "Update ASR model to use batch mode and a..." | Re-trigger Greptile

Comment on lines +71 to +73
if "nvcf.nvidia.com" in (endpoint or "").lower():
return "online"
return "offline"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Silent breaking change for self-hosted streaming NIM deployments

Any user who deployed a Parakeet NIM with a streaming profile (mode=str) on-prem and relied on the default audio_infer_mode='auto' will now silently route to offline_recognize instead of the streaming RPC. If that endpoint does not have an offline model registered, the call will fail at runtime with a gRPC error. There is no deprecation warning or migration note.

Per the api-backward-compatibility standard, the semantic change to what auto resolves to for non-NVCF endpoints should be accompanied by a note in the CHANGELOG/release notes or at minimum a warnings.warn(...) when auto selects offline for a non-NVCF endpoint, so users who relied on the old default are alerted.

Rule Used: Changes to public API surfaces (FastAPI endpoints,... (source)

Prompt To Fix With AI
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/api/internal/primitives/nim/model_interface/parakeet.py
Line: 71-73

Comment:
**Silent breaking change for self-hosted streaming NIM deployments**

Any user who deployed a Parakeet NIM with a streaming profile (`mode=str`) on-prem and relied on the default `audio_infer_mode='auto'` will now silently route to `offline_recognize` instead of the streaming RPC. If that endpoint does not have an offline model registered, the call will fail at runtime with a gRPC error. There is no deprecation warning or migration note.

Per the `api-backward-compatibility` standard, the semantic change to what `auto` resolves to for non-NVCF endpoints should be accompanied by a note in the CHANGELOG/release notes or at minimum a `warnings.warn(...)` when `auto` selects `offline` for a non-NVCF endpoint, so users who relied on the old default are alerted.

**Rule Used:** Changes to public API surfaces (FastAPI endpoints,... ([source](https://app.greptile.com/review/custom-context?memory=api-backward-compatibility))

How can I resolve this? If you propose a fix, please make it concise.

@jdye64 jdye64 merged commit 8856a87 into NVIDIA:26.05 May 28, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant