Whisper : Robust Speech Recognition via Large-Scale Weak Supervision

Input

Audio file

demo.mov

Output

Recognized speech text

He hoped there would be stew for dinner, turnips and carrots and bruised potatoes and fat mutton pieces to be ladled out in thick, peppered, flour-fattened sauce.

Requirements

This model requires additional modules.

pip3 install librosa
pip3 install pyaudio  # for microphone input mode

If you use the --disable_ailia_tokenizer option, this model requires one more module.

pip3 install transformers

Usage

The onnx and prototxt files are downloaded automatically on the first run, so an Internet connection is required at that time.
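The first-run download can be pictured as a "fetch only if missing" check. The following is a minimal sketch of that pattern, not the code whisper.py actually uses; the function name `ensure_model_file`, its `fetch` parameter, and any URL passed to it are assumptions made for illustration.

```python
import os
import urllib.request


def ensure_model_file(path, url, fetch=urllib.request.urlretrieve):
    """Download `url` to `path` only if `path` does not exist yet.

    Hypothetical helper: returns True when a download happened and
    False when an already-downloaded file was reused.
    """
    if os.path.exists(path):
        # File is already present from an earlier run: no network access needed.
        return False
    print(f"Downloading {os.path.basename(path)} ...")
    fetch(url, path)  # by default this performs a real HTTP download
    return True
```

With this shape, only the first call needs network access; later runs find the files on disk and skip the download.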

For the sample wav,

$ python3 whisper.py

To run recognition on your own audio, put the file path after the --input option.

$ python3 whisper.py --input AUDIO_FILE

With the --model_type option, you can select the model type from "tiny", "base", "small", and "medium". (The default is base.)

$ python3 whisper.py --model_type small

With the --task translate option, the recognized speech is translated into English.

$ python3 whisper.py --task translate
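The options above can be combined in a single invocation. As a rough sketch, a small helper can assemble the command line from Python; the function `build_whisper_command` is hypothetical and only uses the flags documented here (--input, --model_type, --task translate).

```python
MODEL_TYPES = ("tiny", "base", "small", "medium")


def build_whisper_command(input_path=None, model_type=None, translate=False):
    """Assemble an argument list for whisper.py (hypothetical helper)."""
    cmd = ["python3", "whisper.py"]
    if input_path is not None:
        cmd += ["--input", input_path]
    if model_type is not None:
        if model_type not in MODEL_TYPES:
            raise ValueError(f"model_type must be one of {MODEL_TYPES}")
        cmd += ["--model_type", model_type]
    if translate:
        cmd += ["--task", "translate"]
    return cmd


# Example: build (but do not run) a command for a small model with translation.
print(" ".join(build_whisper_command("AUDIO_FILE", "small", translate=True)))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains whisper.py.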

If you specify the -V option, the script takes its input from the microphone.

$ python3 whisper.py -V

1. Speak into the microphone when "Please speak something." is displayed.
2. Recording ends after about 0.5 seconds of silence, and speech recognition runs.
3. After the recognition results are displayed, the program returns to step 1.
4. Press Ctrl+C to exit.
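The "stop after about 0.5 seconds of silence" behaviour can be illustrated with a simple energy-based check. This is only a sketch under stated assumptions, not the logic in whisper.py: the function name `find_end_of_speech`, the 20 ms frame size, and the RMS threshold of 0.01 are all hypothetical, and samples are assumed to be floats in the range [-1, 1].

```python
import numpy as np


def find_end_of_speech(samples, sample_rate, silence_sec=0.5,
                       frame_sec=0.02, threshold=0.01):
    """Return the sample index at which recording would stop, or None.

    Recording stops once speech has been heard and is then followed by
    `silence_sec` of consecutive frames whose RMS is below `threshold`.
    """
    frame_len = int(sample_rate * frame_sec)
    frames_needed = int(round(silence_sec / frame_sec))
    silent_run = 0
    heard_speech = False
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = float(np.sqrt(np.mean(np.square(frame))))
        if rms >= threshold:
            heard_speech = True
            silent_run = 0          # any loud frame resets the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= frames_needed:
                return start + frame_len
    return None


# Example: 1 second of a 440 Hz tone followed by 1 second of silence at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
audio = np.concatenate([0.5 * np.sin(2 * np.pi * 440 * t), np.zeros(sr)])
print(find_end_of_speech(audio, sr))
```

Silence before the first loud frame is ignored, so the check does not fire while the user is still waiting to speak.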

Reference

Framework

PyTorch

Model Format

ONNX opset=11

Netron

Normal models (opset 11)

Mean Variance Normalization models (opset 11)

Fuse ScatterND models (opset 11)

Layer Normalization models (opset 17)

Dynamic shape models (opset 11)