We introduce PhoWhisper in five versions (tiny, base, small, medium and large) for Vietnamese automatic speech recognition (ASR). PhoWhisper's robustness is achieved by fine-tuning the multilingual Whisper model on an 844-hour dataset that covers diverse Vietnamese accents. Our experimental study demonstrates state-of-the-art performance of PhoWhisper on benchmark Vietnamese ASR datasets.

Please cite our paper when PhoWhisper is used to help produce published results or is incorporated into other software:
@inproceedings{PhoWhisper,
title = {{PhoWhisper: Automatic Speech Recognition for Vietnamese}},
author = {Thanh-Thien Le and Linh The Nguyen and Dat Quoc Nguyen},
booktitle = {Proceedings of the ICLR 2024 Tiny Papers track},
year = {2024}
}
Model | #params | CMV–Vi | VIVOS | VLSP 2020 Task-1 | VLSP 2020 Task-2 |
---|---|---|---|---|---|
vinai/PhoWhisper-tiny | 39M | 19.05 | 10.41 | 20.74 | 49.85 |
vinai/PhoWhisper-base | 74M | 16.19 | 8.46 | 19.70 | 43.01 |
vinai/PhoWhisper-small | 244M | 11.08 | 6.33 | 15.93 | 32.96 |
vinai/PhoWhisper-medium | 769M | 8.27 | 4.97 | 14.12 | 26.85 |
vinai/PhoWhisper-large | 1.55B | 8.14 | 4.67 | 13.75 | 26.68 |
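
The table does not label its metric. Assuming the scores are word error rates in percent (the usual ASR metric, lower is better), the following minimal sketch shows how such a score could be computed for one reference/hypothesis pair. The function name `word_error_rate` is hypothetical, and any text normalization (casing, punctuation) applied in the original evaluation is not reproduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
wer = word_error_rate("xin chào việt nam", "xin chào việc nam")
print(f"{100 * wer:.2f}")
```

Corpus-level scores are normally obtained by summing edit counts and reference lengths over all utterances rather than by averaging per-utterance rates.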
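from transformers import pipeline

# Load a fine-tuned checkpoint; any of the five model IDs in the table can be used.
transcriber = pipeline("automatic-speech-recognition", model="vinai/PhoWhisper-small")
# path_to_audio_with_sampling_rate_16kHz is a placeholder for the path to a 16 kHz audio file.
output = transcriber(path_to_audio_with_sampling_rate_16kHz)["text"]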
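
The placeholder name in the snippet above indicates that the audio is expected at a 16 kHz sampling rate. As a minimal sketch, assuming the input is an uncompressed WAV file, the rate can be checked with Python's standard `wave` module before transcription. The helper names `wav_sample_rate` and `is_16khz` are hypothetical and not part of PhoWhisper or `transformers`.

```python
import wave

EXPECTED_RATE_HZ = 16000  # sampling rate implied by the usage snippet


def wav_sample_rate(path: str) -> int:
    """Return the sampling rate (Hz) stored in a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_16khz(path: str) -> bool:
    """True if the WAV file at `path` is sampled at 16 kHz."""
    return wav_sample_rate(path) == EXPECTED_RATE_HZ
```

Files in other formats, or at other rates, would need to be converted with an audio tool of your choice first.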
Copyright (c) 2024 VinAI
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.