Update ASR model to use batch mode and auto-select using batch/streaming#2153
Conversation
Greptile SummaryThis PR switches the Parakeet ASR NIM from streaming (
|
| Filename | Overview |
|---|---|
| nemo_retriever/src/nemo_retriever/api/internal/primitives/nim/model_interface/parakeet.py | Core logic change: resolve_audio_infer_mode in auto mode now returns 'offline' for all non-NVCF endpoints (previously returned 'online'). This silently changes the behavior for self-hosted streaming NIMs using the default auto mode. |
| nemo_retriever/tests/test_parakeet_infer_mode.py | Tests updated to match the new default offline behavior; includes parametrized cases for the new auto-selection logic and a new named test case for self-hosted offline. A test for explicit online mode targeting a local endpoint is missing. |
| docker-compose.yaml | NIM_TAGS_SELECTOR updated from mode=str (streaming) to mode=ofl (offline) for the Parakeet service. |
| nemo_retriever/helm/values.yaml | NIM_TAGS_SELECTOR updated from mode=str to mode=ofl, mirroring the docker-compose.yaml change. Docker/Helm parity is maintained. |
| nemo_retriever/src/nemo_retriever/params/models.py | Inline comment on audio_infer_mode updated to reflect the new auto-selection semantics. No code changes. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["resolve_audio_infer_mode(mode, endpoint)"] --> B{normalized mode?}
B -->|online| C["return 'online'"]
B -->|offline| D["return 'offline'"]
B -->|other, not 'auto'| E["raise ValueError"]
B -->|auto| F{"'nvcf.nvidia.com'\nin endpoint?"}
F -->|yes| G["return 'online'\n(streaming RPC)"]
F -->|no| H["return 'offline'\n(batch recognize RPC)"]
H --> I["self-hosted NIM\n(mode=ofl default)"]
G --> J["NVCF cloud endpoint\ngrpc.nvcf.nvidia.com"]
Comments Outside Diff (1)
-
nemo_retriever/tests/test_parakeet_infer_mode.py, line 57-72 (link)Missing test for explicit
onlinemode against a local endpointtest_parakeet_client_transcribe_uses_offline_when_explicittests explicitofflinemode on a local endpoint, but there is no corresponding test forinfer_mode="online"against a local (non-NVCF) endpoint, which is now the only way to reach the streaming path for self-hosted NIMs. Adding this case would confirm that the_streaming_transcribepath is still reachable and exercised after this change.Rule Used: New functionality must include corresponding unit ... (source)
Prompt To Fix With AI
This is a comment left during a code review. Path: nemo_retriever/tests/test_parakeet_infer_mode.py Line: 57-72 Comment: **Missing test for explicit `online` mode against a local endpoint** `test_parakeet_client_transcribe_uses_offline_when_explicit` tests explicit `offline` mode on a local endpoint, but there is no corresponding test for `infer_mode="online"` against a local (non-NVCF) endpoint, which is now the only way to reach the streaming path for self-hosted NIMs. Adding this case would confirm that the `_streaming_transcribe` path is still reachable and exercised after this change. **Rule Used:** New functionality must include corresponding unit ... ([source](https://app.greptile.com/review/custom-context?memory=test-coverage-new-code)) How can I resolve this? If you propose a fix, please make it concise.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
nemo_retriever/src/nemo_retriever/api/internal/primitives/nim/model_interface/parakeet.py:71-73
**Silent breaking change for self-hosted streaming NIM deployments**
Any user who deployed a Parakeet NIM with a streaming profile (`mode=str`) on-prem and relied on the default `audio_infer_mode='auto'` will now silently route to `offline_recognize` instead of the streaming RPC. If that endpoint does not have an offline model registered, the call will fail at runtime with a gRPC error. There is no deprecation warning or migration note.
Per the `api-backward-compatibility` standard, the semantic change to what `auto` resolves to for non-NVCF endpoints should be accompanied by a note in the CHANGELOG/release notes or at minimum a `warnings.warn(...)` when `auto` selects `offline` for a non-NVCF endpoint, so users who relied on the old default are alerted.
### Issue 2 of 2
nemo_retriever/tests/test_parakeet_infer_mode.py:57-72
**Missing test for explicit `online` mode against a local endpoint**
`test_parakeet_client_transcribe_uses_offline_when_explicit` tests explicit `offline` mode on a local endpoint, but there is no corresponding test for `infer_mode="online"` against a local (non-NVCF) endpoint, which is now the only way to reach the streaming path for self-hosted NIMs. Adding this case would confirm that the `_streaming_transcribe` path is still reachable and exercised after this change.
Reviews (1): Last reviewed commit: "Update ASR model to use batch mode and a..." | Re-trigger Greptile
| if "nvcf.nvidia.com" in (endpoint or "").lower(): | ||
| return "online" | ||
| return "offline" |
There was a problem hiding this comment.
Silent breaking change for self-hosted streaming NIM deployments
Any user who deployed a Parakeet NIM with a streaming profile (mode=str) on-prem and relied on the default audio_infer_mode='auto' will now silently route to offline_recognize instead of the streaming RPC. If that endpoint does not have an offline model registered, the call will fail at runtime with a gRPC error. There is no deprecation warning or migration note.
Per the api-backward-compatibility standard, the semantic change to what auto resolves to for non-NVCF endpoints should be accompanied by a note in the CHANGELOG/release notes or at minimum a warnings.warn(...) when auto selects offline for a non-NVCF endpoint, so users who relied on the old default are alerted.
Rule Used: Changes to public API surfaces (FastAPI endpoints,... (source)
Prompt To Fix With AI
This is a comment left during a code review.
Path: nemo_retriever/src/nemo_retriever/api/internal/primitives/nim/model_interface/parakeet.py
Line: 71-73
Comment:
**Silent breaking change for self-hosted streaming NIM deployments**
Any user who deployed a Parakeet NIM with a streaming profile (`mode=str`) on-prem and relied on the default `audio_infer_mode='auto'` will now silently route to `offline_recognize` instead of the streaming RPC. If that endpoint does not have an offline model registered, the call will fail at runtime with a gRPC error. There is no deprecation warning or migration note.
Per the `api-backward-compatibility` standard, the semantic change to what `auto` resolves to for non-NVCF endpoints should be accompanied by a note in the CHANGELOG/release notes or at minimum a `warnings.warn(...)` when `auto` selects `offline` for a non-NVCF endpoint, so users who relied on the old default are alerted.
**Rule Used:** Changes to public API surfaces (FastAPI endpoints,... ([source](https://app.greptile.com/review/custom-context?memory=api-backward-compatibility))
How can I resolve this? If you propose a fix, please make it concise.
Description
Checklist