
PhoWhisper: Automatic Speech Recognition for Vietnamese (2024)


VinAIResearch/PhoWhisper


PhoWhisper: Automatic Speech Recognition for Vietnamese

We introduce PhoWhisper, available in five model sizes, for Vietnamese automatic speech recognition (ASR). PhoWhisper's robustness comes from fine-tuning the multilingual Whisper model on an 844-hour dataset that covers diverse Vietnamese accents. Our experiments show that PhoWhisper achieves state-of-the-art performance on benchmark Vietnamese ASR datasets. Please cite our PhoWhisper paper when PhoWhisper is used to help produce published results or is incorporated into other software:

@inproceedings{PhoWhisper,
  title     = {{PhoWhisper: Automatic Speech Recognition for Vietnamese}},
  author    = {Thanh-Thien Le and Linh The Nguyen and Dat Quoc Nguyen},
  booktitle = {Proceedings of the ICLR 2024 Tiny Papers track},
  year      = {2024}
}

Model download & WER results

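Word error rates (WER, %) on each benchmark test set; lower is better.

| Model | #params | CMV–Vi | VIVOS | VLSP 2020 Task-1 | VLSP 2020 Task-2 |
|---|---|---|---|---|---|
| vinai/PhoWhisper-tiny | 39M | 19.05 | 10.41 | 20.74 | 49.85 |
| vinai/PhoWhisper-base | 74M | 16.19 | 8.46 | 19.70 | 43.01 |
| vinai/PhoWhisper-small | 244M | 11.08 | 6.33 | 15.93 | 32.96 |
| vinai/PhoWhisper-medium | 769M | 8.27 | 4.97 | 14.12 | 26.85 |
| vinai/PhoWhisper-large | 1.55B | 8.14 | 4.67 | 13.75 | 26.68 |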
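The scores above are word error rates. As a reminder of what the metric measures, WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in a minimum-edit alignment of the hypothesis against the reference, and N is the number of reference words. The minimal sketch below computes it with a word-level Levenshtein distance. It assumes whitespace tokenization (for Vietnamese, whose orthography separates syllables with spaces, this counts syllables) and applies no text normalization. It is not the authors' evaluation script, and the reported numbers may depend on normalization steps not described here; the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j];
    # row 0 compares an empty reference with hyp[:j], which costs j insertions.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # ref[:i] vs empty hypothesis: i deletions
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference word r
                curr[j - 1] + 1,         # insert hypothesis word h
                prev[j - 1] + (r != h),  # substitute (cost 0 if the words match)
            )
        prev = curr

    return prev[len(hyp)] / len(ref)


# One substituted syllable out of three reference syllables.
print(word_error_rate("tôi đi học", "tôi đi hộc"))
```

Multiply the result by 100 to compare it with the percentages in the table.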

Run the model

Example usage with transformers

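from transformers import pipeline
# Any model ID from the table above works here; "small" is used as an example
transcriber = pipeline("automatic-speech-recognition", model="vinai/PhoWhisper-small")
path_to_audio_with_sampling_rate_16kHz = "audio.wav"  # hypothetical path to a 16 kHz audio file
output = transcriber(path_to_audio_with_sampling_rate_16kHz)["text"]
print(output)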
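The example above expects 16 kHz audio. If your recordings use another sampling rate or have several channels, one option is to convert them in memory and pass the samples to the pipeline as a dictionary of the form `{"raw": array, "sampling_rate": int}`, an input format the transformers ASR pipeline accepts. The sketch below assumes NumPy and SciPy are installed, resamples with SciPy's polyphase `scipy.signal.resample_poly`, and assumes multichannel arrays have shape `(num_samples, num_channels)`, as returned by `soundfile.read`. The helper names `to_16khz_mono` and `transcribe` are hypothetical, not part of PhoWhisper.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 16_000  # the README asks for 16 kHz input audio


def to_16khz_mono(audio: np.ndarray, sr: int) -> np.ndarray:
    """Downmix to mono and resample to 16 kHz with a polyphase filter."""
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 2:  # assumed shape: (num_samples, num_channels)
        audio = audio.mean(axis=1)
    if sr != TARGET_SR:
        g = gcd(sr, TARGET_SR)
        # Upsample by TARGET_SR/g and downsample by sr/g, e.g. 160/441 for 44.1 kHz.
        audio = resample_poly(audio, TARGET_SR // g, sr // g)
    return audio.astype(np.float32)


def transcribe(audio: np.ndarray, sr: int, model: str = "vinai/PhoWhisper-small") -> str:
    # Imported lazily so to_16khz_mono can be used without transformers installed.
    from transformers import pipeline

    transcriber = pipeline("automatic-speech-recognition", model=model)
    inputs = {"raw": to_16khz_mono(audio, sr), "sampling_rate": TARGET_SR}
    return transcriber(inputs)["text"]
```

For example, `transcribe(*soundfile.read("clip.flac"))` would transcribe a hypothetical file at its native sampling rate. Building the pipeline inside `transcribe` keeps the sketch short; for many files you would create it once and reuse it.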

License

Copyright (c) 2024 VinAI

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
