upgrade deepgram to the latest models - nova-3 20251210; enable custom vocabulary #3746

Merged
beastoin merged 3 commits into main from 9skhd_dg_nova3_20251210 on Dec 14, 2025
Conversation

@beastoin
Collaborator

@beastoin beastoin commented Dec 13, 2025

Upgrade Deepgram to the latest models (nova-3 20251210), which bring 33 languages, keyword/keyterm support, and improved transcription and diarization quality.

nova-2 now handles only the 4 languages that nova-3 lacks; everything else defaults to nova-3.

Enable custom vocabulary on STT.

demo:

omi     | connect_to_deepgram_with_backoff
omi     | {                                                                                               
omi     |     "channels": 1,
omi     |     "diarize": true,                                                                            
omi     |     "encoding": "linear16",
omi     |     "endpointing": 300,
omi     |     "filler_words": false,
omi     |     "interim_results": false,
omi     |     "keyterm": [
omi     |         "omi",
omi     |         "openai",
omi     |         "Omi"
omi     |     ],
omi     |     "language": "en",
omi     |     "model": "nova-3",                                                                          
omi     |     "multichannel": false,
omi     |     "no_delay": true,
omi     |     "punctuate": true,
omi     |     "profanity_filter": false,
omi     |     "sample_rate": 16000,                                                                       
omi     |     "smart_format": true
omi     | }
[Screenshot 2025-12-13 at 14:50:20]
omi     | connect_to_deepgram_with_backoff
omi     | {
omi     |     "channels": 1,
omi     |     "diarize": true,
omi     |     "encoding": "linear16",
omi     |     "endpointing": 300,
omi     |     "filler_words": false,
omi     |     "interim_results": false,
omi     |     "keyterm": [
omi     |         "omi",
omi     |         "openai",
omi     |         "Omi"
omi     |     ],
omi     |     "language": "multi",
omi     |     "model": "nova-3",
omi     |     "multichannel": false,
omi     |     "no_delay": true,
omi     |     "punctuate": true,
omi     |     "profanity_filter": false,
omi     |     "sample_rate": 16000,
omi     |     "smart_format": true
omi     | }
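
The two logged payloads differ only in `language` (`en` versus `multi`). A minimal sketch of how such an options dictionary might be assembled is shown below. `build_deepgram_options` is a hypothetical helper, not the function from this PR, and it only builds a plain dict that mirrors the logged keys; it does not call the Deepgram SDK.

```python
import json
from typing import Dict, List, Optional


def build_deepgram_options(
    language: str,
    model: str = "nova-3",
    keyterms: Optional[List[str]] = None,
) -> Dict[str, object]:
    """Hypothetical helper: return a dict with the option keys seen in the log."""
    options: Dict[str, object] = {
        "channels": 1,
        "diarize": True,
        "encoding": "linear16",
        "endpointing": 300,
        "filler_words": False,
        "interim_results": False,
        "language": language,
        "model": model,
        "multichannel": False,
        "no_delay": True,
        "punctuate": True,
        "profanity_filter": False,
        "sample_rate": 16000,
        "smart_format": True,
    }
    # Only attach custom vocabulary when the caller supplied some.
    if keyterms:
        options["keyterm"] = list(keyterms)
    return options


if __name__ == "__main__":
    demo = build_deepgram_options("multi", keyterms=["omi", "openai", "Omi"])
    print(json.dumps(demo, indent=4))
```

Leaving `keyterm` out when no vocabulary is given is a design choice of this sketch; the thread does not say how the real code treats an empty list.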

deploy:

#3720

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the Deepgram STT integration. In backend/routers/transcribe.py, it limits the vocabulary passed as keywords to the first 100 items.

In backend/utils/stt/streaming.py, it refactors Deepgram language support: it removes the generic deepgram_languages list, introduces deepgram_nova2_languages for specific non-multi-language support, expands deepgram_nova2_multi_languages and deepgram_nova3_multi_languages to include more English variants, and populates deepgram_nova3_languages with a comprehensive set of supported languages. It also updates the language check for nova-2 models, changes the fallback STT service from Deepgram Nova-2 to Nova-3, and re-enables passing keywords to Deepgram connections.

The review comments suggest converting deepgram_nova2_multi_languages and deepgram_nova3_languages from lists to sets for O(1) lookups in get_stt_service_for_language, which is on a hot path. They also recommend importing Set from typing and considering the same change for the other language lists.
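
The 100-item cap on vocabulary mentioned in the review could look roughly like the sketch below. `MAX_KEYTERMS` and `limit_vocabulary` are hypothetical names; the actual code in backend/routers/transcribe.py is not shown in this thread, so this only illustrates taking the first 100 items in order.

```python
from itertools import islice
from typing import Iterable, List

# The value 100 comes from the review summary; the constant name is an assumption.
MAX_KEYTERMS = 100


def limit_vocabulary(vocabulary: Iterable[str], limit: int = MAX_KEYTERMS) -> List[str]:
    """Return at most `limit` items, preserving the caller's order."""
    return list(islice(vocabulary, limit))
```

Using `islice` keeps the helper usable with any iterable, not only lists, and avoids materialising a large vocabulary before truncating it.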

Comment thread backend/utils/stt/streaming.py Outdated
Comment thread backend/utils/stt/streaming.py Outdated
@beastoin beastoin changed the title from "9skhd dg nova3 20251210" to "upgrade deepgram to the latest models - nova-3 20251210; enable custom vocabulary" Dec 13, 2025
@beastoin
Collaborator Author

--

nova-2 + nova-3 support:

Count  Description
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
37     Unique languages
49     Unique language codes

Nova-3 only: 33 languages
Nova-2 adds: 4 additional languages (Chinese Simplified, Chinese Traditional, Cantonese, Thai)

Here are the 4 languages unique to nova-2:

• Chinese (Mandarin, Simplified)
• Chinese (Mandarin, Traditional)
• Chinese (Cantonese, Traditional)
• Thai
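
The split described above (nova-2 for the four languages listed, nova-3 for everything else) can be sketched as a set lookup. This is an illustration under stated assumptions, not the PR's `get_stt_service_for_language`: `NOVA2_ONLY_LANGUAGES` and `pick_deepgram_model` are hypothetical names, and the language codes are illustrative rather than the exact codes used in backend/utils/stt/streaming.py.

```python
from typing import FrozenSet, Tuple

# Assumption: illustrative codes for Chinese (Simplified, Traditional),
# Cantonese, and Thai; the real sets in the repository may differ.
NOVA2_ONLY_LANGUAGES: FrozenSet[str] = frozenset(
    {"zh", "zh-CN", "zh-TW", "zh-HK", "th"}
)


def pick_deepgram_model(language: str) -> Tuple[str, str]:
    """Return (language, model): nova-2 for nova-2-only codes, else nova-3."""
    if language in NOVA2_ONLY_LANGUAGES:  # set membership, O(1) on average
        return language, "nova-2"
    return language, "nova-3"
```

For example, `pick_deepgram_model("th")` selects nova-2, while `pick_deepgram_model("en")` selects nova-3. How the real code handles a language that neither model supports is not shown in this thread, so the sketch simply defaults to nova-3.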

@beastoin beastoin marked this pull request as ready for review December 13, 2025 08:17
@beastoin beastoin merged commit 9480885 into main Dec 14, 2025
1 check passed
@beastoin beastoin deleted the 9skhd_dg_nova3_20251210 branch December 14, 2025 03:24
Glucksberg pushed a commit to Glucksberg/omi-local that referenced this pull request Apr 28, 2026
…m vocabulary (BasedHardware#3746)

* Add 34 nova-3 languages; prioritise nova-3 over nova-2; unlock keywords/keyterm supports

* Limit 100 keywords/keyterms with deepgram

* Use set{} instead of list[] for pre-defined supported langs