Release faster-whisper 0.4.0 · SYSTRAN/faster-whisper

Integration of Silero VAD

The Silero VAD model is integrated to ignore parts of the audio without speech:

model.transcribe(..., vad_filter=True)

The default behavior is conservative and only removes silence longer than 2 seconds. See the README for how to customize the VAD parameters.
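
As a minimal sketch of that customization, the helper below forwards a shorter silence threshold through the `vad_parameters` dictionary described in the README. The helper name `transcribe_with_vad` and the 500 ms default are illustrative assumptions and not part of the library; `model` is expected to be a `WhisperModel` instance.

```python
def transcribe_with_vad(model, audio, min_silence_ms=500):
    """Transcribe `audio` with the VAD filter enabled.

    `model` is assumed to be a faster_whisper.WhisperModel (or any object
    with a compatible `transcribe` method). The helper name and the 500 ms
    default are illustrative, not part of faster-whisper.
    """
    # Per the release notes, only silence longer than this threshold is
    # removed; the library default corresponds to 2000 ms.
    vad_parameters = dict(min_silence_duration_ms=min_silence_ms)
    segments, info = model.transcribe(
        audio,
        vad_filter=True,
        vad_parameters=vad_parameters,
    )
    # `segments` is a lazy iterable; consuming it runs the transcription.
    return list(segments), info
```

Lowering the threshold removes more pauses from the audio before transcription, at the cost of possibly cutting short hesitations that belong to an utterance.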

Note: the Silero model is executed with onnxruntime, which is currently not released for Python 3.11. The dependency is therefore excluded for this Python version, so the VAD features cannot be used there.
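
One way to keep the same script working on Python 3.11 is to enable the filter only when onnxruntime is importable. The sketch below uses only the standard library; the function names `vad_supported` and `vad_kwargs` are hypothetical helpers, not part of faster-whisper.

```python
import importlib.util


def vad_supported():
    """Return True when the onnxruntime package can be found in this environment."""
    return importlib.util.find_spec("onnxruntime") is not None


def vad_kwargs():
    """Build keyword arguments for model.transcribe, enabling VAD only if available."""
    return {"vad_filter": vad_supported()}
```

The result can then be passed as `model.transcribe(..., **vad_kwargs())`, so the call falls back to unfiltered transcription when the dependency is missing.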

Speaker diarization using stereo channels

The function decode_audio has a new argument split_stereo to split stereo audio into separate left and right channels:

left, right = decode_audio(audio_file, split_stereo=True)

# model.transcribe(left)
# model.transcribe(right)
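
Once each channel has been transcribed, the two segment lists can be interleaved into a single speaker-labeled transcript. The following is a minimal sketch under the assumption that each segment exposes `start`, `end`, and `text`; the `Turn` record, the `merge_channels` helper, and the default labels are hypothetical names introduced for illustration.

```python
from collections import namedtuple

# Hypothetical record type for the merged transcript (not part of faster-whisper).
Turn = namedtuple("Turn", ["speaker", "start", "end", "text"])


def merge_channels(left_segments, right_segments, labels=("left", "right")):
    """Interleave the segments of two channels in chronological order.

    Each segment is assumed to have `start`, `end`, and `text` attributes.
    """
    turns = [Turn(labels[0], s.start, s.end, s.text.strip()) for s in left_segments]
    turns += [Turn(labels[1], s.start, s.end, s.text.strip()) for s in right_segments]
    # Both channels come from the same file, so their timestamps share one timeline.
    return sorted(turns, key=lambda t: (t.start, t.end))
```

This only works as diarization when each speaker is recorded on a dedicated channel, for example in a two-party phone call.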

Other changes

Add Segment attributes avg_log_prob and no_speech_prob (same definition as openai/whisper)
Ignore audio frames raising an av.error.InvalidDataError exception during decoding
Fix the prefix option so that it is passed only to the first 30-second window
Extend suppress_tokens with some special tokens that should always be suppressed (unless suppress_tokens is None)
Raise a more helpful error message when the selected model size is invalid
Disable the progress bar when the model to download is already in the cache
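
The new Segment attributes can be used to drop segments that are likely silence. The sketch below assumes the attribute names given above (avg_log_prob and no_speech_prob) and borrows the default thresholds of openai/whisper (0.6 and -1.0); the helper names are hypothetical and the thresholds may need tuning for other audio.

```python
def is_probably_silence(segment, no_speech_threshold=0.6, log_prob_threshold=-1.0):
    """Flag a segment that the model considers likely non-speech.

    Mirrors the openai/whisper rule: high no-speech probability combined
    with a low average log probability for the decoded tokens.
    """
    return (
        segment.no_speech_prob > no_speech_threshold
        and segment.avg_log_prob < log_prob_threshold
    )


def keep_confident(segments, **thresholds):
    """Return only the segments that are not flagged as probable silence."""
    return [s for s in segments if not is_probably_silence(s, **thresholds)]
```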
