upgrade deepgram to the latest models - nova-3 20251210; enable custom vocabulary #3746
Code Review
This pull request updates the Deepgram STT integration by limiting the vocabulary passed as keywords to the first 100 items in backend/routers/transcribe.py. In backend/utils/stt/streaming.py, it refactors Deepgram language support by removing a generic deepgram_languages list, introducing deepgram_nova2_languages for specific non-multi-language support, expanding deepgram_nova2_multi_languages and deepgram_nova3_multi_languages to include more English variants, and populating the deepgram_nova3_languages list with a comprehensive set of supported languages. The changes also include updating the language check for nova-2 models, changing the fallback STT service from Deepgram Nova-2 to Nova-3, and re-enabling the passing of keywords to Deepgram connections.

Review comments suggest converting deepgram_nova2_multi_languages and deepgram_nova3_languages from lists to sets for improved O(1) lookup performance in the get_stt_service_for_language function, which is on a hot path, and also recommend importing Set from typing and considering similar changes for other language lists.
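The set-based lookup suggested in the review can be sketched roughly as follows. This is a minimal illustration, not the code from backend/utils/stt/streaming.py: the names `NOVA3_LANGUAGES`, `NOVA2_ONLY_LANGUAGES`, and `pick_deepgram_model` are hypothetical, and the language codes are placeholder subsets rather than the real supported lists.

```python
from typing import Set

# Placeholder subsets for illustration only; the real lists are longer
# and live in backend/utils/stt/streaming.py.
NOVA3_LANGUAGES: Set[str] = {"en", "en-US", "es", "fr", "de", "ja"}
NOVA2_ONLY_LANGUAGES: Set[str] = {"zh-CN", "zh-TW"}


def pick_deepgram_model(language: str) -> str:
    """Pick a Deepgram model for a language code (hypothetical helper).

    Membership tests on a set are O(1) on average, which matters when
    this runs on a hot path for every new transcription session.
    """
    if language in NOVA3_LANGUAGES:
        return "nova-3"
    if language in NOVA2_ONLY_LANGUAGES:
        # Languages nova-3 does not cover are routed to nova-2.
        return "nova-2"
    # Assumed default: unknown languages fall back to nova-3.
    return "nova-3"
```

With lists, each `in` check scans the elements one by one; with sets, the check is a hash lookup, so the cost does not grow as more languages are added.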
nova-2 + nova-3 support:
Nova-3 only: 33 languages
Nova-2 adds: 4 additional languages (Chinese Simplified, Chinese Traditional, …
Here are the 4 languages unique to nova-2:
• Chinese (Mandarin, Simplified)
…m vocabulary (BasedHardware#3746)
* Add 34 nova-3 languages; prioritise nova-3 over nova-2; unlock keywords/keyterm supports
* Limit 100 keywords/keyterms with deepgram
* Use set{} instead of list[] for pre-defined supported langs
upgrade deepgram to the latest models - nova-3 20251210, including 33 languages, keyword/keyterm support, and enhanced transcription/diarization quality
nova-2 now handles only the 4 languages that are missing from nova-3; all other languages default to nova-3
enable custom vocabulary on stt
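A minimal sketch of the 100-item cap on custom vocabulary, assuming the vocabulary arrives as a sequence of strings. The helper name `cap_keywords` and the constant `MAX_KEYWORDS` are hypothetical, and dropping blanks and duplicates before truncating is an added assumption; the PR itself only states that the first 100 items are kept.

```python
from typing import Iterable, List

MAX_KEYWORDS = 100  # cap described in the PR; the constant name is hypothetical


def cap_keywords(vocabulary: Iterable[str], limit: int = MAX_KEYWORDS) -> List[str]:
    """Return at most `limit` keywords, preserving the original order.

    Blank entries and duplicates are skipped first (an assumption, not
    confirmed by the PR) so they do not use up slots in the cap.
    """
    if limit <= 0:
        return []
    seen = set()
    result: List[str] = []
    for term in vocabulary:
        term = term.strip()
        if not term or term in seen:
            continue
        seen.add(term)
        result.append(term)
        if len(result) == limit:
            break
    return result
```

The capped list would then be passed along when opening the Deepgram connection; how it is passed depends on the model and SDK version in use.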
demo:
deploy:
#3720