Create and activate the conda environment:

```bash
conda env create -f environment.yml
conda activate lstt
```
Audio files must be in WAV format with a 16 kHz sample rate, and they must be organized in the following directory structure:
```
BASE_PATH
├── train_files_dir
│   ├── file1.wav
│   ├── file2.wav
│   ├── ...
│   └── metadata.csv
├── test_files_dir
│   ├── file1.wav
│   ├── file2.wav
│   ├── ...
│   └── metadata.csv
└── ...
```
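Before training, you may want to confirm that every file in a split folder really is 16 kHz. The following is a minimal sketch using only Python's standard `wave` module. `find_bad_wavs` is a hypothetical helper, not part of this repository, and it assumes integer PCM WAV files (other encodings, such as 32-bit float WAV, make `wave.open` raise `wave.Error`).

```python
import wave
from pathlib import Path


def find_bad_wavs(split_dir, expected_rate=16000):
    """Return (file name, sample rate) for every .wav file in split_dir
    whose sample rate differs from expected_rate.

    Hypothetical helper: assumes integer PCM WAV files readable by `wave`.
    """
    bad = []
    for path in sorted(Path(split_dir).glob("*.wav")):
        # Only the header is needed to read the sample rate
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
        if rate != expected_rate:
            bad.append((path.name, rate))
    return bad
```

For example, `find_bad_wavs("BASE_PATH/train_files_dir")` returns an empty list when every WAV file in that folder is already at 16 kHz.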
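If your audio is not already sampled at 16 kHz, resample it with the `resample_audio` helper in `utils`:

```python
import os

from utils import resample_audio

AUDIO_PATH = 'path/to/audio/files'
RESAMPLED_PATH = 'path/to/resampled/audio/files'
AUDIO_FILES = os.listdir(AUDIO_PATH)

# Create the output directory if it does not exist yet
os.makedirs(RESAMPLED_PATH, exist_ok=True)

# Resample every file in AUDIO_PATH to 16 kHz and write it to RESAMPLED_PATH
for file in AUDIO_FILES:
    resample_audio(sample_rate=16000, input_file_path=os.path.join(AUDIO_PATH, file), output_dir=RESAMPLED_PATH)
```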
Each `metadata.csv` file has exactly two columns: `file_name` and `transcript`. The `file_name` column contains the name of the audio file, and the `transcript` column contains the transcript of that audio file. A sample metadata file is available in the `sample csv` folder.
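A quick validation pass can catch formatting mistakes before a long training run. The sketch below uses a hypothetical helper (`check_metadata`, not part of this repository) and assumes that `metadata.csv` is comma-separated, UTF-8 encoded, has a header row, and that each `file_name` is relative to the split folder containing the CSV.

```python
import csv
from pathlib import Path

REQUIRED_COLUMNS = ["file_name", "transcript"]


def check_metadata(split_dir):
    """Validate split_dir/metadata.csv and return a list of problems found.

    Hypothetical helper: assumes a comma-separated, UTF-8 file with a header.
    """
    split_dir = Path(split_dir)
    problems = []
    with open(split_dir / "metadata.csv", newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # The header must contain exactly the two expected columns
        if sorted(reader.fieldnames or []) != sorted(REQUIRED_COLUMNS):
            return [f"expected columns {REQUIRED_COLUMNS}, got {reader.fieldnames}"]
        # Rows are numbered from 1, not counting the header
        for row_no, row in enumerate(reader, start=1):
            file_name = row["file_name"] or ""
            if not (split_dir / file_name).is_file():
                problems.append(f"row {row_no}: missing audio file {file_name}")
            if not (row["transcript"] or "").strip():
                problems.append(f"row {row_no}: empty transcript")
    return problems
```

An empty return value means every listed audio file exists and every transcript is non-empty.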
Use the following command to start fine-tuning:

```bash
python3 ft.py --train_dir path/to/train/files --valid_dir path/to/valid/files --audio_dir base/directory/containing/split/folders --repo_name hfrepo/you/want/to/finetune/from --generate_vocab
```
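This README does not describe what `--generate_vocab` produces. Assuming it builds a character-level vocabulary from the training transcripts, as is common when fine-tuning CTC speech models from the Hugging Face Hub, the idea can be sketched as follows. `build_char_vocab`, `write_vocab`, the lowercasing, and the special tokens `|`, `[UNK]`, and `[PAD]` are all illustrative assumptions, not this repository's implementation.

```python
import csv
import json
from pathlib import Path


def build_char_vocab(metadata_paths, lowercase=True):
    """Map every character seen in the transcripts to an integer id.

    Hypothetical sketch: spaces are represented by the word delimiter "|",
    and "[UNK]"/"[PAD]" are appended as special tokens.
    """
    chars = set()
    for path in metadata_paths:
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                text = row["transcript"]
                chars.update(text.lower() if lowercase else text)
    # Space and "|" are both covered by the word-delimiter token
    chars.discard(" ")
    chars.discard("|")
    tokens = ["|"] + sorted(chars) + ["[UNK]", "[PAD]"]
    return {tok: i for i, tok in enumerate(tokens)}


def write_vocab(vocab, out_path):
    """Save the vocabulary as JSON, keeping non-ASCII characters readable."""
    Path(out_path).write_text(json.dumps(vocab, ensure_ascii=False, indent=2), encoding="utf-8")
```

For example, `write_vocab(build_char_vocab(["BASE_PATH/train_files_dir/metadata.csv"]), "vocab.json")` writes one entry per distinct character. Lowercasing is a design choice that keeps the vocabulary small; drop it if case matters for your transcripts.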
The training arguments are defined in the `args.json` file. To change the run parameters, edit this file.
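If you prefer to change settings from a script rather than by hand, a small helper can overwrite existing keys. This is a minimal sketch that assumes `args.json` is a single flat JSON object; `update_args` and any key names you pass to it are hypothetical, so check the actual keys in your copy of `args.json`.

```python
import json
from pathlib import Path


def update_args(path, **overrides):
    """Overwrite existing keys in a flat JSON config file and return the result.

    Unknown keys raise KeyError, so a typo cannot silently add a new,
    ignored setting.
    """
    path = Path(path)
    args = json.loads(path.read_text(encoding="utf-8"))
    unknown = sorted(set(overrides) - set(args))
    if unknown:
        raise KeyError(f"not in {path.name}: {unknown}")
    args.update(overrides)
    path.write_text(json.dumps(args, indent=2) + "\n", encoding="utf-8")
    return args
```

Rejecting unknown keys is deliberate: the training script presumably ignores keys it does not expect, so a misspelled name would otherwise fail silently.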
Inference uses a command similar to the fine-tuning one:

```bash
python inference.py --test_dir path/to/test/files --audio_dir base/directory/containing/split/folders --ckpt_dir your/saved/checkpoint/dir --result_dir path/to/save/results
```