The collection of scripts in this repository represents a template for training neural networks via Multi-Task Learning in Kaldi. This repo is heavily based on the existing Kaldi multilingual Babel example directory.
multi-task-kaldi offers functionality similar to the multilingual Babel scripts, but with more easily extendable code. Adding a new language with multi-task-kaldi is as easy as creating a new input_lang dir. Running multiple tasks on the same corpus is not possible in the multilingual Babel setup, but in multi-task-kaldi it is possible by creating a new input_task dir. The code here aims to be easily readable and extensible, and makes few assumptions about the kind of data you have and where it's located on disk.
To get started, clone multi-task-kaldi and move it into the egs dir of your local copy of the latest Kaldi branch.
If you're used to typical Kaldi egs, you should know that all the scripts here in steps exist in this repo. That is, they do not link back to the wsj example. This was done so that the scripts could be customized and made more readable.
In order to run multi-task-kaldi, you need to make a new input_task dir. This is the only place you need to make changes for your new task (or new language).
This directory contains information about the location of your data, lexicon, and language model.
Here is an example of the structure of my input_task directory for the task called my-task:

    input_my-task/
    ├── lexicon_nosil.txt -> /data/my-task/lexicon/lexicon_nosil.txt
    ├── lexicon.txt -> /data/my-task/lexicon/lexicon.txt
    ├── task.arpabo -> /data/my-task/lm/task.arpabo
    ├── test_audio_path -> /data/my-task/audio/test_audio_path
    ├── train_audio_path -> /data/my-task/audio/train_audio_path
    ├── transcripts.test -> /data/my-task/audio/transcripts.test
    └── transcripts.train -> /data/my-task/audio/transcripts.train

    0 directories, 7 files
Most of these files are in standard Kaldi format, and more detailed descriptions of them can be found in the official Kaldi docs.
lexicon_nosil.txt // Standard Kaldi // phonetic dictionary without silence phonemes
lexicon.txt // Standard Kaldi // phonetic dictionary with silence phonemes
task.arpabo // Standard Kaldi // language model in ARPA back-off format
test_audio_path // Custom file! // one-line text file containing absolute path to dir of audio files (e.g. WAV) for testing
train_audio_path // Custom file! // one-line text file containing absolute path to dir of audio files (e.g. WAV) for training
transcripts.test // Custom file! // a typical Kaldi transcript file, but with only the test utterances
transcripts.train // Custom file! // a typical Kaldi transcript file, but with only the train utterances
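
The seven entries described above can be assembled with symbolic links. The following is a minimal sketch, not part of multi-task-kaldi: the helper name make_input_dir is hypothetical, and the demo builds empty placeholder files in a temporary directory (standing in for /data/my-task) so that it runs anywhere without real data.

```shell
#!/bin/sh
set -eu

# Hypothetical helper (not part of multi-task-kaldi): link the seven expected
# files from a data root into <dest>/input_<task> and print the new dir.
make_input_dir() {
    data_root=$1; task=$2; dest=$3
    new_dir="$dest/input_$task"
    mkdir -p "$new_dir"
    for rel in lexicon/lexicon_nosil.txt lexicon/lexicon.txt lm/task.arpabo \
               audio/test_audio_path audio/train_audio_path \
               audio/transcripts.test audio/transcripts.train; do
        if [ ! -e "$data_root/$rel" ]; then
            echo "missing: $data_root/$rel" >&2
            return 1
        fi
        ln -sf "$data_root/$rel" "$new_dir/$(basename "$rel")"
    done
    printf '%s\n' "$new_dir"
}

# Demo with placeholder data in a temp dir (stand-in for /data/my-task).
work=$(mktemp -d)
data="$work/data/my-task"
mkdir -p "$data/lexicon" "$data/lm" "$data/audio/train_wav" "$data/audio/test_wav"
touch "$data/lexicon/lexicon_nosil.txt" "$data/lexicon/lexicon.txt" "$data/lm/task.arpabo"
touch "$data/audio/transcripts.train" "$data/audio/transcripts.test"
# Each *_audio_path file holds one line: the absolute path to an audio dir.
echo "$data/audio/train_wav" > "$data/audio/train_audio_path"
echo "$data/audio/test_wav" > "$data/audio/test_audio_path"

input_dir=$(make_input_dir "$data" my-task "$work")
ls -l "$input_dir"
```

Because the helper refuses to link a file that does not exist, a misspelled lexicon or transcript path shows up here rather than partway through a training run.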
Running the scripts
The scripts will name files and directories dynamically. You will define the name of your input data (i.e. task or language) in the initial input_ dir, and then the rest of the generated dirs and files will be named accordingly. For instance, if you have input_your-task, then the GMM alignment stage will create
Force Align Training Data (GMM)
$ ./run_gmm.sh your-task test001
your-task should correspond exactly to input_your-task. In multilingual training, this will be input_lang2, etc. In monolingual Multi-Task Learning, this will be
test001 is any character string, and is written to the name of the WER file:
Format data from GMM --> DNN
$ ./utils/setup_multitask.sh to_dir from_dir "your-task1 your-task2 your-task3"
nnet3 log files and experimental data will be written to to_dir (absolute path). This dir must already exist.
The output dirs from GMM alignment should exist at
The task names "your-task1 your-task2 your-task3" must correspond to input dir names as such: input_your-task2, etc. However, do not include the initial
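
One way to build the task-names string is to derive it from the input dirs themselves. The sketch below is an illustration only: it assumes the part to leave out is the literal input_ prefix (our reading of the naming scheme), the helper name list_tasks is hypothetical, and the demo uses empty dummy dirs in a temporary directory. It prints the setup command instead of running it.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: print, space-separated, the name of every
# input_<task> dir under a root, with the "input_" prefix removed.
list_tasks() {
    root=$1
    found=""
    for d in "$root"/input_*/; do
        [ -d "$d" ] || continue              # the glob matched nothing
        d=${d%/}                             # drop the trailing slash
        found="${found:+$found }${d##*/input_}"
    done
    printf '%s\n' "$found"
}

# Demo with dummy dirs; to_dir and from_dir are placeholder paths.
work=$(mktemp -d)
mkdir -p "$work/input_your-task1" "$work/input_your-task2" "$work/input_your-task3"
to_dir="$work/to_dir"
from_dir="$work/from_dir"
mkdir -p "$to_dir"                           # to_dir must exist beforehand

tasks=$(list_tasks "$work")
echo "./utils/setup_multitask.sh $to_dir $from_dir \"$tasks\""
```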
Multi-Task Learning (DNN)
$ ./run_nnet3_multitask.sh "your-task1 your-task2" "gmm-typo1 gmm-typo2" "weight-task1,weight-task2" hidden-dim num-epochs main-dir
The first argument is a space-delimited string of task names (must correspond to
The second argument is a space-delimited string of GMM model typologies. These are either "mono" or "tri", and determine whether you want to use monophone alignments or triphone alignments for each task.
The third argument is a comma-delimited list of weights, one for each task. Should be probably equal to or less than
hidden-dim is the number of nodes in your hidden layer
num-epochs is the number of epochs for each task. This is not task-specific.
main-dir is the dir you moved your GMM alignments into. Above we used
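
Since the first three arguments are parallel lists, it can help to check that they line up before launching a long training run. The following pre-flight check is a sketch under assumptions, not something shipped with multi-task-kaldi: the function name check_mtl_args is hypothetical, and the weights, hidden dimension, epoch count, and main-dir path in the demo are illustrative placeholders. The command is printed, not executed.

```shell
#!/bin/sh
set -eu

# Hypothetical pre-flight check: one typology and one weight per task,
# and every typology is either "mono" or "tri".
check_mtl_args() {
    n_tasks=$(printf '%s\n' "$1" | wc -w | tr -d ' ')
    n_typos=$(printf '%s\n' "$2" | wc -w | tr -d ' ')
    n_weights=$(printf '%s\n' "$3" | tr ',' ' ' | wc -w | tr -d ' ')
    if [ "$n_tasks" -eq 0 ]; then
        echo "no task names given" >&2
        return 1
    fi
    if [ "$n_tasks" -ne "$n_typos" ] || [ "$n_tasks" -ne "$n_weights" ]; then
        echo "need one typology and one weight per task" >&2
        return 1
    fi
    for t in $2; do
        case $t in
            mono|tri) ;;
            *) echo "typology must be mono or tri, got: $t" >&2; return 1 ;;
        esac
    done
}

# Illustrative values only; none of these come from the repo.
tasks="your-task1 your-task2"
typos="tri mono"
weights="1.0,0.5"
hidden_dim=512
num_epochs=5
main_dir=/path/to/main-dir

if check_mtl_args "$tasks" "$typos" "$weights"; then
    echo "./run_nnet3_multitask.sh \"$tasks\" \"$typos\" \"$weights\" $hidden_dim $num_epochs $main_dir"
fi
```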