This is an implementation of an LSTM Seq2Seq model that predicts characters from waveforms. It was completed as part of the 10-618 course at CMU. The model is trained on a subset of the TIMIT dataset, and three training algorithms are tested: MLE, Scheduled Sampling, and OCD.
- Create a virtual environment with `venv` or a Conda environment: `conda create --name myenv python=3.10 pip`, then `conda activate myenv`
- Run the following command to install dependencies from `requirements.txt`: `pip install -r requirements.txt`

To select the desired training mode, use the following command line arguments:
- Maximum likelihood estimation (MLE): `--mode mle`

  The ground truth is fed into the decoder, and we minimize the cross entropy with respect to the ground truth.
- Scheduled Sampling (SS): `--mode ss` with the desired `--beta`, or `--mode ss_linear` for SS with beta on a linear decay schedule

  The decoder is fed the ground truth with probability beta, or the model prediction with probability 1 - beta.
- Optimal Completion Distillation (OCD) [DAgger training]: `--mode ocd` with the desired `--beta`, or `--mode ocd_linear` for OCD with beta on a linear decay schedule

  The decoder is fed a dynamic oracle completion with probability beta, or the model prediction with probability 1 - beta.
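
The three modes differ mainly in which token is fed to the decoder at each step. The sketch below illustrates that choice in plain Python; it is not taken from this repository. The names `linear_beta`, `choose_decoder_input`, and `optimal_next_tokens` are hypothetical, the decay endpoints (1.0 to 0.0) are an assumption, and the oracle follows the edit-distance formulation of OCD as generally described, which may differ from what the training script does.

```python
import random


def linear_beta(step, total_steps, beta_start=1.0, beta_end=0.0):
    """Hypothetical linear decay of beta from beta_start to beta_end."""
    if total_steps <= 1:
        return beta_end
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return beta_start + frac * (beta_end - beta_start)


def choose_decoder_input(reference_token, predicted_token, beta, rng=random):
    """Feed the reference token with probability beta, else the model prediction.

    For SS the reference is the ground-truth token; for OCD it is an
    oracle completion token. beta = 1.0 reduces to MLE-style teacher forcing.
    """
    return reference_token if rng.random() < beta else predicted_token


def optimal_next_tokens(prefix, target, eos="<eos>"):
    """Assumed dynamic oracle: tokens that keep the final edit distance minimal.

    d[j] holds the edit distance between the generated prefix and target[:j].
    Every j with minimal d[j] marks a best alignment, and target[j] (or eos
    once the target is exhausted) is an optimal next token.
    """
    d = list(range(len(target) + 1))
    for i, p in enumerate(prefix, start=1):
        prev_diag, d[0] = d[0], i
        for j in range(1, len(target) + 1):
            cur = min(d[j] + 1, d[j - 1] + 1, prev_diag + (p != target[j - 1]))
            prev_diag, d[j] = d[j], cur
    best = min(d)
    return {
        target[j] if j < len(target) else eos
        for j in range(len(d))
        if d[j] == best
    }


if __name__ == "__main__":
    rng = random.Random(0)
    # Halfway through a linear schedule, the reference and the prediction
    # are each fed about half of the time.
    beta = linear_beta(step=5, total_steps=11)
    oracle = sorted(optimal_next_tokens("x", "ab"))
    print(beta, oracle, choose_decoder_input(oracle[0], "z", beta, rng))
```

When the generated prefix has already diverged from the target, the oracle can return several equally good tokens (here both `a` and `b` after the prefix `x`), which is what distinguishes it from simply feeding the ground truth at the same position.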