SenseVoice v1.0.0
The first official release of SenseVoice, a speech foundation model for multilingual speech understanding.
Highlights
- Multilingual ASR — 50+ languages, superior to Whisper on Chinese and Cantonese
- Speech Emotion Recognition — Happy, Sad, Angry, Neutral detection
- Audio Event Detection — Background music, applause, laughter, crying, coughing
- Ultra-fast inference — Non-autoregressive, 70ms for 10 seconds of audio (15x faster than Whisper)
- Speaker Diarization — Works with FunASR's VAD + SPK pipeline for who-said-what
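The emotion and audio-event labels above typically show up in the raw transcript as inline tags ahead of the text, for example `<|en|><|HAPPY|><|Speech|><|withitn|>...`. The following is a minimal sketch of separating those tags from the transcript, not the project's own post-processing. The tag spellings in `LANGUAGES`, `EMOTIONS`, and `EVENTS` are assumptions based on that convention, and `ParsedResult` and `parse_tagged_text` are hypothetical names introduced here.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed tag convention: <|name|> markers embedded in the raw text.
TAG_RE = re.compile(r"<\|([^|<>]+)\|>")

# Assumed tag vocabularies; adjust them to match the tags your model emits.
LANGUAGES = {"zh", "en", "ja", "ko", "yue", "nospeech"}
EMOTIONS = {"HAPPY", "SAD", "ANGRY", "NEUTRAL"}
EVENTS = {"Speech", "BGM", "Applause", "Laughter", "Cry", "Cough"}


@dataclass
class ParsedResult:
    """Hypothetical container for a transcript split into text and labels."""
    text: str
    language: Optional[str] = None
    emotion: Optional[str] = None
    events: List[str] = field(default_factory=list)
    other: List[str] = field(default_factory=list)


def parse_tagged_text(raw: str) -> ParsedResult:
    """Strip <|...|> tags from raw output and sort them into categories."""
    tags = TAG_RE.findall(raw)
    parsed = ParsedResult(text=TAG_RE.sub("", raw).strip())
    for tag in tags:
        if tag in LANGUAGES and parsed.language is None:
            parsed.language = tag      # keep the first language tag
        elif tag in EMOTIONS and parsed.emotion is None:
            parsed.emotion = tag       # keep the first emotion tag
        elif tag in EVENTS:
            parsed.events.append(tag)  # keep every event tag, in order
        else:
            parsed.other.append(tag)   # unrecognized or repeated tags
    return parsed


parsed = parse_tagged_text("<|en|><|HAPPY|><|Laughter|><|withitn|>That was great.")
print(parsed.language, parsed.emotion, parsed.events, parsed.text)
```

If you only need display text, FunASR also provides `rich_transcription_postprocess` in `funasr.utils.postprocess_utils`, which is likely the better choice than a hand-rolled parser.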
Quick Start
from funasr import AutoModel
model = AutoModel(model="iic/SenseVoiceSmall", device="cuda")
result = model.generate(input="audio.wav")
print(result[0]["text"])
Models
| Model | Languages | Parameters | Download |
|---|---|---|---|
| SenseVoice-Small | 5 (zh/en/ja/ko/yue) | 234M | ModelScope · HuggingFace |
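Since SenseVoice-Small covers only the five languages in the table, it can help to validate a language code before calling the model. The sketch below assumes that `model.generate` accepts `language` and `use_itn` keyword arguments and that `"auto"` enables language detection. Check the FunASR documentation for your installed version. `build_generate_kwargs` and `transcribe` are hypothetical helpers, not part of FunASR.

```python
from typing import Any, Dict

# Language codes taken from the Models table above.
SUPPORTED_LANGUAGES = {
    "zh": "Chinese",
    "en": "English",
    "ja": "Japanese",
    "ko": "Korean",
    "yue": "Cantonese",
}


def build_generate_kwargs(
    audio_path: str, language: str = "auto", use_itn: bool = True
) -> Dict[str, Any]:
    """Validate the language code and assemble keyword arguments.

    Assumption: generate() accepts `language` and `use_itn`;
    "auto" is assumed to let the model detect the language itself.
    """
    if language != "auto" and language not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(
            f"Unsupported language {language!r}; expected 'auto' or one of: {supported}"
        )
    return {"input": audio_path, "language": language, "use_itn": use_itn}


def transcribe(audio_path: str, language: str = "auto", device: str = "cuda") -> str:
    """Hypothetical wrapper around the Quick Start call."""
    # Imported lazily so the validation helper works without FunASR installed.
    from funasr import AutoModel

    model = AutoModel(model="iic/SenseVoiceSmall", device=device)
    result = model.generate(**build_generate_kwargs(audio_path, language))
    return result[0]["text"]
```

Calling `transcribe("audio.wav", language="yue")` would then request Cantonese explicitly, while an unsupported code such as `"fr"` fails fast with a `ValueError` before any model is loaded.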
Links
- Paper: FunAudioLLM
- Demo: ModelScope · HuggingFace
- Toolkit: github.com/modelscope/FunASR
- Website: www.funasr.com