# Kaldi for Dysarthric Speech Recognition
- Part 1: Installation
- **Part 2: Data Preparation**
- Part 3: Training
- Part 4: Evaluation

# Part 2: Data Preparation
### We will use the TORGO Database from the University of Toronto
- http://www.cs.toronto.edu/~complingweb/data/TORGO/torgo.html
- Speech from patients with **Cerebral Palsy** or **Amyotrophic Lateral Sclerosis**.
- Four archives are available for free, totalling 18 GB uncompressed.

**Dataset**
- F.tar.bz2. Females (F01, F03, F04) with dysarthria.
- FC.tar.bz2. Female controls (FC01, FC02, FC03) without dysarthria.
- M.tar.bz2. Males (M01, M02, M03, M04, M05) with dysarthria.
- MC.tar.bz2. Male controls (MC01, MC02, MC03, MC04) without dysarthria.


### Extract data into some directory
- e.g. `tar -xf FC.tar.bz2`

### Renaming files
- The data as distributed is not perfectly organized, so I have already created some of the necessary files.
- However, you should rename all the audio files for consistency.
- Currently the files are labeled with numbers that are not unique across speakers or sessions, but Kaldi requires a unique ID for every utterance, and here the IDs are taken from the file names.

In [49]:
!ls /data2/TORGO/M01/Session1/wav_arrayMic/ # Change to your directory

0001.wav  0014.wav  0027.wav  0040.wav	0053.wav  0066.wav  0079.wav  0092.wav
0002.wav  0015.wav  0028.wav  0041.wav	0054.wav  0067.wav  0080.wav  0093.wav
0003.wav  0016.wav  0029.wav  0042.wav	0055.wav  0068.wav  0081.wav  0094.wav
0004.wav  0017.wav  0030.wav  0043.wav	0056.wav  0069.wav  0082.wav  0095.wav
0005.wav  0018.wav  0031.wav  0044.wav	0057.wav  0070.wav  0083.wav  0096.wav
0006.wav  0019.wav  0032.wav  0045.wav	0058.wav  0071.wav  0084.wav  0097.wav
0007.wav  0020.wav  0033.wav  0046.wav	0059.wav  0072.wav  0085.wav  0098.wav
0008.wav  0021.wav  0034.wav  0047.wav	0060.wav  0073.wav  0086.wav  0099.wav
0009.wav  0022.wav  0035.wav  0048.wav	0061.wav  0074.wav  0087.wav  0100.wav
0010.wav  0023.wav  0036.wav  0049.wav	0062.wav  0075.wav  0088.wav
0011.wav  0024.wav  0037.wav  0050.wav	0063.wav  0076.wav  0089.wav
0012.wav  0025.wav  0038.wav  0051.wav	0064.wav  0077.wav  0090.wav
0013.wav  0026.wav  0039.wav  0052.wav	0065.wav  0078.wav  0091.wav


In [50]:
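import os

mydir = "/data2/TORGO/"  # change to your TORGO directory location

for subdir, dirs, files in os.walk(mydir):
    for file in files:
        if not file.endswith(".wav"):
            continue
        # Path layout: .../<speaker>/Session<N>/wav_<mic>Mic/<number>.wav
        parts = os.path.normpath(subdir).split(os.sep)
        ID = parts[-3]                       # speaker, e.g. M01
        SESS = parts[-2][-1]                 # last character of the session folder, e.g. 1
        MIC = parts[-1].split('_')[1][:-3]   # wav_arrayMic -> array
        prefix = ID + '_' + SESS + '_' + MIC + '_'

        # Skip files that already carry the prefix, so the cell can be re-run safely
        if file.startswith(prefix):
            continue
        os.rename(os.path.join(subdir, file), os.path.join(subdir, prefix + file))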

In [51]:
!ls /data2/TORGO/M01/Session1/wav_arrayMic/ # Change to your directory

M01_1_array_0001.wav  M01_1_array_0035.wav  M01_1_array_0069.wav
M01_1_array_0002.wav  M01_1_array_0036.wav  M01_1_array_0070.wav
M01_1_array_0003.wav  M01_1_array_0037.wav  M01_1_array_0071.wav
M01_1_array_0004.wav  M01_1_array_0038.wav  M01_1_array_0072.wav
M01_1_array_0005.wav  M01_1_array_0039.wav  M01_1_array_0073.wav
M01_1_array_0006.wav  M01_1_array_0040.wav  M01_1_array_0074.wav
M01_1_array_0007.wav  M01_1_array_0041.wav  M01_1_array_0075.wav
M01_1_array_0008.wav  M01_1_array_0042.wav  M01_1_array_0076.wav
M01_1_array_0009.wav  M01_1_array_0043.wav  M01_1_array_0077.wav
M01_1_array_0010.wav  M01_1_array_0044.wav  M01_1_array_0078.wav
M01_1_array_0011.wav  M01_1_array_0045.wav  M01_1_array_0079.wav
M01_1_array_0012.wav  M01_1_array_0046.wav  M01_1_array_0080.wav
M01_1_array_0013.wav  M01_1_array_0047.wav  M01_1_array_0081.wav
M01_1_array_0014.wav  M01_1_array_0048.wav  M01_1_array_0082.wav
M01_1_array_0015.wav  M01_1_array_0049.wav  M01_1_array_0083.wav
M01_1_arra

### We will train our GMM-HMM acoustic model on all speakers but one (F03)
- The held-out speaker, F03, will be used to evaluate our model.
- This speaker has moderate dysarthria.
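A minimal sketch of this leave-one-speaker-out split, assuming every utterance ID starts with the speaker ID, as in the renamed files above (e.g. `M01_1_array_0001`). The function name `split_by_speaker` is hypothetical and not part of Kaldi.

```python
def split_by_speaker(utt_ids, held_out="F03"):
    """Split utterance IDs into (train, test) lists by their speaker prefix."""
    train, test = [], []
    for utt in utt_ids:
        # Assumes IDs look like <speaker>_<session>_<mic>_<number>
        speaker = utt.split("_")[0]
        (test if speaker == held_out else train).append(utt)
    return train, test


utts = ["F01_1_array_0006", "F03_1_array_0001", "MC04_2_array_1016"]
train, test = split_by_speaker(utts)
print(train)  # ['F01_1_array_0006', 'MC04_2_array_1016']
print(test)   # ['F03_1_array_0001']
```

Comparing the full speaker prefix rather than using `startswith("F03")` keeps the split robust if speaker IDs ever share a common prefix.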


### Kaldi requires 4 main files that we need to create
1. wav.scp
2. utt2spk
3. text
4. spk2utt 

#### 1. wav.scp
- This file gives the location of the audio file for each utterance
- One utterance per line
- utterance ID (the file name without `.wav`) \<space> file location
- e.g.: <br/>

F01_1_array_0006 /data2/TORGO/F01/Session1/wav_arrayMic/F01_1_array_0006.wav <br/>
F01_1_array_0007 /data2/TORGO/F01/Session1/wav_arrayMic/F01_1_array_0007.wav <br/>
F01_1_array_0008 /data2/TORGO/F01/Session1/wav_arrayMic/F01_1_array_0008.wav <br/>
F01_1_array_0009 /data2/TORGO/F01/Session1/wav_arrayMic/F01_1_array_0009.wav <br/>
...
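One way to generate these lines is sketched below, assuming the audio files have already been renamed so that each file name (without `.wav`) is a unique utterance ID. The helper names `wav_scp_lines` and `find_wavs`, and the output path in the usage comment, are hypothetical.

```python
import os


def wav_scp_lines(wav_paths):
    """Return sorted 'utterance_id path' lines for the given .wav paths."""
    entries = {}
    for path in wav_paths:
        utt_id, ext = os.path.splitext(os.path.basename(path))
        if ext.lower() != ".wav":
            continue
        if utt_id in entries:
            raise ValueError("duplicate utterance ID: " + utt_id)
        entries[utt_id] = path
    # Kaldi expects these files to be sorted by utterance ID
    return [utt_id + " " + entries[utt_id] for utt_id in sorted(entries)]


def find_wavs(root):
    """Yield the path of every .wav file under root."""
    for subdir, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(".wav"):
                yield os.path.join(subdir, name)


# Usage (paths are placeholders; change them to your own locations):
# with open("data/train/wav.scp", "w") as f:
#     f.write("\n".join(wav_scp_lines(find_wavs("/data2/TORGO/"))) + "\n")
```

The duplicate check fails loudly if any file was missed during renaming, because unrenamed files such as `0001.wav` collide across speakers and sessions.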



#### 2. utt2spk
- A file containing the speaker ID for each utterance
- One utterance per line
- utterance ID \<space> speaker ID
- e.g.: <br/>

F01_1_array_0006 F01 <br/>
F01_1_array_0007 F01 <br/>
F01_1_array_0008 F01 <br/>
MC04_2_array_1016 MC04 <br/>
MC04_2_array_1017 MC04 <br/>
MC04_2_array_1018 MC04 <br/>
...
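Because the speaker ID is the first underscore-separated field of each utterance ID, these lines can be derived from the IDs alone. A small sketch under that assumption (the function name `utt2spk_lines` is hypothetical):

```python
def utt2spk_lines(utt_ids):
    """Return sorted 'utterance_id speaker_id' lines."""
    return [utt + " " + utt.split("_")[0] for utt in sorted(utt_ids)]


for line in utt2spk_lines(["MC04_2_array_1016", "F01_1_array_0006"]):
    print(line)
# F01_1_array_0006 F01
# MC04_2_array_1016 MC04
```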

#### 3. text
- Transcriptions for each utterance 
- One utterance per line
- utterance ID \<space> transcription
- e.g.: <br/>

F01_1_array_0008 EXCEPT IN THE WINTER WHEN THE OOZE OR SNOW OR ICE PREVENTS <br/> 
F01_1_array_0009 PAT <br/> 
F01_1_array_0010 UP <br/> 
F01_1_array_0013 KNOW <br/> 
F01_1_array_0014 HE SLOWLY TAKES A SHORT WALK IN THE OPEN AIR EACH DAY <br/> 
MC04_2_array_1019 UPGRADE YOUR STATUS TO REFLECT YOUR WEALTH <br/> 
MC04_2_array_1021 EAT YOUR RAISINS OUTDOORS ON THE PORCH STEPS <br/> 
MC04_2_array_1023 THE FAMILY REQUESTS THAT FLOWERS BE OMITTED <br/> 



#### 4. spk2utt
- All utterances for each speaker
- One speaker per line
- speaker ID \<space> utterance IDs, separated by spaces
- Can be created with a script provided by Kaldi (e.g. `utils/utt2spk_to_spk2utt.pl data/train/utt2spk > data/train/spk2utt`)
- e.g.: <br/>
M05 M05_1_array_0005 M05_1_array_0006 M05_1_array_0007 M05_1_array_0008 M05_1_array_0009 ... ... ...
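For illustration, the grouping that produces this file can be sketched in Python. This is only a simplified picture of the mapping, not a replacement for the Kaldi script, and the function name `spk2utt_lines` is hypothetical.

```python
from collections import defaultdict


def spk2utt_lines(utt2spk):
    """Group 'utterance speaker' lines into sorted 'speaker utt1 utt2 ...' lines."""
    by_speaker = defaultdict(list)
    for line in utt2spk:
        if not line.strip():
            continue  # ignore blank lines
        utt, spk = line.split()
        by_speaker[spk].append(utt)
    return [spk + " " + " ".join(sorted(by_speaker[spk])) for spk in sorted(by_speaker)]


utt2spk = ["F01_1_array_0006 F01", "MC04_2_array_1016 MC04", "F01_1_array_0007 F01"]
for line in spk2utt_lines(utt2spk):
    print(line)
# F01 F01_1_array_0006 F01_1_array_0007
# MC04 MC04_2_array_1016
```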

### All files except wav.scp have been provided
- Create wav.scp after renaming all the audio files, using the other provided files as a reference.
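Before training, it can help to check that the utterance IDs in your new wav.scp match those in a provided file. The sketch below compares the first column of two files given as lists of lines. The names `read_ids` and `compare_ids` are hypothetical, and Kaldi's own `utils/validate_data_dir.sh` performs a more thorough check.

```python
def read_ids(lines):
    """Return the first whitespace-separated field of every non-empty line."""
    return {line.split()[0] for line in lines if line.strip()}


def compare_ids(wav_scp, reference):
    """Return (IDs only in wav.scp, IDs only in the reference file), both sorted."""
    wav_ids, ref_ids = read_ids(wav_scp), read_ids(reference)
    return sorted(wav_ids - ref_ids), sorted(ref_ids - wav_ids)


wav_scp = [
    "F01_1_array_0006 /data2/TORGO/F01/Session1/wav_arrayMic/F01_1_array_0006.wav",
    "F01_1_array_0008 /data2/TORGO/F01/Session1/wav_arrayMic/F01_1_array_0008.wav",
]
text = [
    "F01_1_array_0008 EXCEPT IN THE WINTER WHEN THE OOZE OR SNOW OR ICE PREVENTS",
    "F01_1_array_0009 PAT",
]
print(compare_ids(wav_scp, text))
# (['F01_1_array_0006'], ['F01_1_array_0009'])
```

An ID that appears only in wav.scp is audio without an entry in the reference file, and an ID that appears only in the reference file usually points to a file that was not renamed or not found.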

### Once your data directory looks like below, we can start training