Thonburian Whisper

🤖 Model | 📔 Jupyter Notebook | 🤗 Huggingface Space Demo | 📃 Medium Blog (Thai)

Thonburian Whisper is an Automatic Speech Recognition (ASR) model for Thai, fine-tuned from OpenAI's Whisper model. The model was released as part of Huggingface's Whisper fine-tuning event (December 2022). We fine-tuned Whisper models for Thai using Common Voice 13, the Gowajee corpus, Thai Elderly Speech, and Thai Dialect datasets. Our models are robust under environmental noise and adapt well to domain-specific audio such as the financial and medical domains. We release the models and distilled models on the Huggingface model hub (see below).

Usage

Open in Colab

Use the model with Huggingface's transformers as follows:

import torch
from transformers import pipeline

MODEL_NAME = "biodatlab/whisper-th-medium-combined"  # see alternative model names below
lang = "th"

# Use the first GPU if available, otherwise fall back to CPU.
device = 0 if torch.cuda.is_available() else "cpu"

pipe = pipeline(
    task="automatic-speech-recognition",
    model=MODEL_NAME,
    chunk_length_s=30,  # split long audio into 30-second chunks
    device=device,
)

# Perform ASR with the created pipe.
text = pipe(
    "audio.mp3",
    generate_kwargs={"language": lang, "task": "transcribe"},
    batch_size=16,
)["text"]
print(text)

Requirements

Install the requirements as follows:

!pip install git+https://github.com/huggingface/transformers
!pip install librosa
!sudo apt install ffmpeg

Model checkpoint and performance

We measure the word error rate (WER) of the models with the deepcut tokenizer, after normalizing special tokens (▁ to _ and — to -) and applying simple text post-processing (เเ to แ and ํา to ำ).

| Model | WER (Common Voice 13) |
|-------|-----------------------|
| Thonburian Whisper (small) Link | 13.14 |
| Thonburian Whisper (medium) Link | 7.42 |
| Thonburian Whisper (large-v2) Link | 7.69 |
| Thonburian Whisper (large-v3) Link | 6.59 |
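
For reference, below is a minimal sketch of this evaluation, assuming the deepcut and jiwer packages; the exact script used for the reported numbers may differ.

import deepcut  # Thai word tokenizer
from jiwer import wer

def normalize(text: str) -> str:
    # Normalize special tokens and apply the simple post-processing above.
    return (
        text.replace("▁", "_")
        .replace("—", "-")
        .replace("เเ", "แ")
        .replace("ํา", "ำ")
    )

def thai_wer(reference: str, hypothesis: str) -> float:
    # Tokenize into Thai words with deepcut so WER is computed at the word level.
    ref = " ".join(deepcut.tokenize(normalize(reference)))
    hyp = " ".join(deepcut.tokenize(normalize(hypothesis)))
    return wer(ref, hyp)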

Thonburian Whisper is fine-tuned on a combined dataset of Thai speech including Common Voice, Google Fleurs, and curated datasets. The Common Voice test split follows the original split from the datasets library.

Inference time

We benchmarked the average inference speed on 1-minute audio clips with different model sizes (small, medium, and large) on an NVIDIA A100 at fp32 precision with a batch size of 32. The medium model offers a balanced trade-off between WER and computational cost.

| Model | Memory usage (MB) | Inference time (sec / 1 min audio) | Number of parameters |
|-------|-------------------|------------------------------------|----------------------|
| Thonburian Whisper (small) Link | 7,194 | 4.83 | 242M |
| Thonburian Whisper (medium) Link | 10,878 | 7.11 | 764M |
| Thonburian Whisper (large) Link | 18,246 | 9.61 | 1540M |
| Distilled Thonburian Whisper (small) Link | 4,944 | TBA | 166M |
| Distilled Thonburian Whisper (medium) Link | 7,084 | TBA | 428M |
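
As a rough illustration, inference time can be measured with the pipe from the usage section above; this is not the exact benchmark script, and audio_1min.mp3 is a placeholder file name.

import time
import torch

if torch.cuda.is_available():
    torch.cuda.synchronize()  # make sure prior GPU work has finished
start = time.perf_counter()
result = pipe("audio_1min.mp3", batch_size=32)
if torch.cuda.is_available():
    torch.cuda.synchronize()
print(f"Inference time: {time.perf_counter() - start:.2f} sec")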

Long-form Inference

Thonburian Whisper can be used for long-form audio transcription by combining voice activity detection (VAD), a Thai word tokenizer, and chunking for word-level alignment. We found this approach more robust, producing a lower insertion error rate (IER) than using Whisper with timestamps; a sketch of that baseline follows below. See the README.md in the longform_transcription folder for detailed usage.
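
For comparison, the "Whisper with timestamps" baseline looks roughly like the sketch below, reusing the pipe from the usage section; the VAD-based pipeline in longform_transcription replaces this approach, and long_audio.mp3 is a placeholder file name.

result = pipe(
    "long_audio.mp3",
    return_timestamps=True,  # ask Whisper for segment-level timestamps
    generate_kwargs={"language": "th", "task": "transcribe"},
)
# Each chunk carries a (start, end) timestamp tuple and its transcribed text.
for chunk in result["chunks"]:
    print(chunk["timestamp"], chunk["text"])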

Developers

See the author list in the citation below.

Citation

If you use the model, please cite it with the following BibTeX entry.

@misc{thonburian_whisper_med,
    author       = {Zaw Htet Aung and Thanachot Thavornmongkol and Atirut Boribalburephan and Vittavas Tangsriworakan and Knot Pipatsrisawat and Titipat Achakulvisut},
    title        = {Thonburian Whisper: A fine-tuned Whisper model for Thai automatic speech recognition},
    year         = {2022},
    url          = {https://huggingface.co/biodatlab/whisper-th-medium-combined},
    doi          = {10.57967/hf/0226},
    publisher    = {Hugging Face}
}