v1.0.0: SenseVoice — Multilingual Speech Understanding

Released by @LauraGPT on 25 May at 16:50 (commit 05ecb6e)

SenseVoice v1.0.0

The first official release of SenseVoice, a speech foundation model for multilingual speech understanding.

Highlights

  • Multilingual ASR: 50+ languages, outperforming Whisper on Chinese and Cantonese
  • Speech Emotion Recognition: detects Happy, Sad, Angry, and Neutral
  • Audio Event Detection: background music, applause, laughter, crying, and coughing
  • Ultra-fast inference: non-autoregressive, about 70 ms for 10 seconds of audio (15x faster than Whisper-Large)
  • Speaker Diarization: works with FunASR's VAD + SPK pipeline to attribute who said what
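
To put the latency figure in context, the snippet below works out the real-time factor (processing time divided by audio duration) implied by 70 ms for 10 seconds of audio, along with the baseline latency implied by the 15x claim. It is plain arithmetic on the numbers quoted above; the function name `real_time_factor` is illustrative and not part of FunASR.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; values below 1.0 are faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Figures quoted in the highlights: 70 ms of compute for a 10 s clip.
rtf = real_time_factor(0.070, 10.0)

# A model that is 15x slower would need 15 * 70 ms for the same clip.
implied_baseline_seconds = 0.070 * 15

print(f"real-time factor: {rtf:.3f}")
print(f"implied baseline latency: {implied_baseline_seconds:.2f} s")
```

Actual latency depends on hardware, batch size, and audio length, so treat these numbers as a rough guide rather than a benchmark.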

Quick Start

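from funasr import AutoModel

# Load SenseVoice-Small; pass device="cpu" if no GPU is available.
model = AutoModel(model="iic/SenseVoiceSmall", device="cuda")
# generate() returns a list with one result dict per input.
result = model.generate(input="audio.wav")
print(result[0]["text"])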
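
The `text` field returned above may carry inline tags for language, emotion, and audio event alongside the transcript. The sketch below separates those tags from the plain text using only the standard library. It assumes tags take the form `<|...|>`; the sample string is made up for illustration, and `split_rich_text` is a hypothetical helper rather than a FunASR function. FunASR also ships its own post-processing helper, `rich_transcription_postprocess`, which is likely the better choice in practice.

```python
import re

# Assumed tag format: <|name|>, e.g. <|en|> or <|HAPPY|>.
TAG_PATTERN = re.compile(r"<\|([^|]*)\|>")


def split_rich_text(raw: str) -> tuple[list[str], str]:
    """Return (tags, plain_text) from a tagged transcript string."""
    tags = TAG_PATTERN.findall(raw)
    text = TAG_PATTERN.sub("", raw).strip()
    return tags, text


# Illustrative sample, not real model output.
sample = "<|en|><|HAPPY|><|Speech|><|withitn|>Thanks for joining us today."
tags, text = split_rich_text(sample)

print(tags)
print(text)
```

In a real pipeline, `sample` would be replaced by `result[0]["text"]` from the Quick Start call.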

Models

| Model | Languages | Parameters | Download |
|---|---|---|---|
| SenseVoice-Small | 5 (zh/en/ja/ko/yue) | 234M | ModelScope · HuggingFace |

Links