Whisper: Robust Speech Recognition via Large-Scale Weak Supervision

Input

Audio file

demo.mov

Output

Recognized speech text

He hoped there would be stew for dinner, turnips and carrots and bruised potatoes and fat mutton pieces to be ladled out in thick, peppered, flour-fattened sauce.

Requirements

This model requires additional modules.

pip3 install librosa
pip3 install pyaudio  # for microphone input mode

If you use the --disable_ailia_tokenizer option, this model requires an additional module.

pip3 install transformers
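
Before the first run it can help to confirm which of these modules are already installed. The following is a minimal sketch, not part of whisper.py: the module names come from the pip commands above, while the split into required and optional groups and the helper name `missing_modules` are assumptions made for illustration.

```python
import importlib.util

# Module names taken from the pip commands above.
# The required/optional grouping is an assumption for this sketch.
REQUIRED = ["librosa"]
OPTIONAL = {
    "pyaudio": "microphone input mode",
    "transformers": "the --disable_ailia_tokenizer option",
}


def missing_modules(names):
    """Return the top-level module names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for name in missing_modules(REQUIRED):
        print(f"missing required module: {name} (pip3 install {name})")
    for name in missing_modules(OPTIONAL):
        print(f"missing optional module: {name}, needed for {OPTIONAL[name]}")
```

The check uses `importlib.util.find_spec`, so nothing is actually imported and no audio device is touched.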

Usage

The ONNX and prototxt files are downloaded automatically on the first run, so an Internet connection is required at that time.

To run on the sample WAV file:

$ python3 whisper.py

If you want to specify the audio, put the file path after the --input option.

$ python3 whisper.py --input AUDIO_FILE

By adding the --model_type option, you can specify the model type, chosen from "tiny", "base", "small", or "medium" (default: base).

$ python3 whisper.py --model_type small

With the --task translate option, the recognized speech is translated into English.

$ python3 whisper.py --task translate
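
The options above can also be combined when the script is driven from another Python program. The sketch below only assembles the argument list from the flags documented in this section; `build_command` is a hypothetical helper, and it assumes that whisper.py accepts these flags together in a single invocation.

```python
import shlex

# Model types listed in this section.
MODEL_TYPES = ("tiny", "base", "small", "medium")


def build_command(audio_path=None, model_type=None, translate=False,
                  script="whisper.py"):
    """Assemble a whisper.py command line from the documented options."""
    cmd = ["python3", script]
    if audio_path is not None:
        cmd += ["--input", audio_path]
    if model_type is not None:
        if model_type not in MODEL_TYPES:
            raise ValueError(f"unknown model type: {model_type!r}")
        cmd += ["--model_type", model_type]
    if translate:
        cmd += ["--task", "translate"]
    return cmd


if __name__ == "__main__":
    cmd = build_command("AUDIO_FILE", model_type="small", translate=True)
    # Print the command instead of running it; pass `cmd` to
    # subprocess.run(cmd, check=True) to execute it for real.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems when the audio path contains spaces.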

If you specify the -V option, the script takes its input from the microphone.

$ python3 whisper.py -V
  1. Speak into the microphone when "Please speak something." is displayed.
  2. Recording ends after about 0.5 seconds of silence, and speech recognition runs.
  3. After the recognition results are displayed, the script returns to step 1.
  4. Press Ctrl+C to exit.
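
The "stop after about 0.5 seconds of silence" rule in step 2 can be illustrated with a small energy-based detector. This is a sketch of the general idea only, not the logic used by whisper.py: the RMS measure, the `threshold` value, and the function names are all assumptions, and frames are taken to be lists of float samples in the range -1.0 to 1.0.

```python
import math


def rms(frame):
    """Root-mean-square amplitude of one frame of float samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def find_end_of_speech(frames, frame_sec, silence_sec=0.5, threshold=0.01):
    """Return the index of the frame at which recording should stop, or None.

    Recording stops once some speech has been heard and the frames after it
    have stayed below `threshold` for at least `silence_sec` seconds.
    """
    needed = math.ceil(silence_sec / frame_sec)  # quiet frames required
    heard_speech = False
    quiet = 0
    for i, frame in enumerate(frames):
        if rms(frame) >= threshold:
            heard_speech = True
            quiet = 0  # any loud frame restarts the silence count
        elif heard_speech:
            quiet += 1
            if quiet >= needed:
                return i
    return None


if __name__ == "__main__":
    loud, silent = [0.5] * 4, [0.0] * 4
    frames = [silent, loud, loud, silent, silent, silent]
    print(find_end_of_speech(frames, frame_sec=0.25))
```

Silence before the first loud frame is ignored, so the detector keeps waiting until the speaker has actually started.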

Reference

Framework

PyTorch

Model Format

ONNX opset=11

Netron

Normal models (opset 11)

encoder_tiny.onnx.prototxt
encoder_base.onnx.prototxt
encoder_small.onnx.prototxt
encoder_medium.onnx.prototxt
encoder_large.onnx.prototxt

decoder_tiny_fix_kv_cache.onnx.prototxt
decoder_base_fix_kv_cache.onnx.prototxt
decoder_small_fix_kv_cache.onnx.prototxt
decoder_medium_fix_kv_cache.onnx.prototxt
decoder_large_fix_kv_cache.onnx.prototxt

Mean Variance Normalization models (opset 11)

encoder_tiny.opt.onnx.prototxt
encoder_base.opt.onnx.prototxt
encoder_small.opt.onnx.prototxt
encoder_medium.opt.onnx.prototxt

decoder_tiny_fix_kv_cache.opt.onnx.prototxt
decoder_base_fix_kv_cache.opt.onnx.prototxt
decoder_small_fix_kv_cache.opt.onnx.prototxt
decoder_medium_fix_kv_cache.opt.onnx.prototxt

Fuse ScatterND models (opset 11)

decoder_tiny_fix_kv_cache.opt2.onnx.prototxt
decoder_base_fix_kv_cache.opt2.onnx.prototxt
decoder_small_fix_kv_cache.opt2.onnx.prototxt
decoder_medium_fix_kv_cache.opt2.onnx.prototxt

Layer Normalization models (opset 17)

encoder_tiny.opt3.onnx.prototxt
encoder_base.opt3.onnx.prototxt
encoder_small.opt3.onnx.prototxt
encoder_medium.opt3.onnx.prototxt

decoder_tiny_fix_kv_cache.opt3.onnx.prototxt
decoder_base_fix_kv_cache.opt3.onnx.prototxt
decoder_small_fix_kv_cache.opt3.onnx.prototxt
decoder_medium_fix_kv_cache.opt3.onnx.prototxt

Dynamic shape models (opset 11)

decoder_tiny.onnx.prototxt
decoder_base.onnx.prototxt
decoder_small.onnx.prototxt
decoder_medium.onnx.prototxt
decoder_large.onnx.prototxt
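
The file names listed above follow a regular pattern, which the sketch below reproduces. The variant keys ("normal", "opt", "opt2", "opt3", "dynamic") and the helper `prototxt_names` are hypothetical names chosen for this example; only the resulting file names and the sizes available for each group are taken from the lists above.

```python
SIZES = ("tiny", "base", "small", "medium")

# variant key: (encoder suffix or None if no encoder is listed,
#               decoder suffix, model sizes listed for that group)
VARIANTS = {
    "normal":  ("",      "_fix_kv_cache",      SIZES + ("large",)),
    "opt":     (".opt",  "_fix_kv_cache.opt",  SIZES),  # Mean Variance Normalization
    "opt2":    (None,    "_fix_kv_cache.opt2", SIZES),  # Fuse ScatterND
    "opt3":    (".opt3", "_fix_kv_cache.opt3", SIZES),  # Layer Normalization
    "dynamic": (None,    "",                   SIZES + ("large",)),
}


def prototxt_names(model_type, variant="normal"):
    """Return (encoder, decoder) prototxt names; encoder is None if not listed."""
    enc_suffix, dec_suffix, sizes = VARIANTS[variant]
    if model_type not in sizes:
        raise ValueError(f"{model_type!r} is not listed for variant {variant!r}")
    encoder = None
    if enc_suffix is not None:
        encoder = f"encoder_{model_type}{enc_suffix}.onnx.prototxt"
    decoder = f"decoder_{model_type}{dec_suffix}.onnx.prototxt"
    return encoder, decoder


if __name__ == "__main__":
    for variant in VARIANTS:
        print(variant, prototxt_names("base", variant))
```

Groups that list only decoder files return `None` for the encoder, leaving the choice of matching encoder to the caller.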