# Forced alignment using FAVE #

June 2018 Liberty Hamilton

This forced aligner is a little more sophisticated than the original Penn Forced Aligner, and can handle transcriptions with multiple talkers at once (exported as tab-delimited text from ELAN).

Run each cell with Shift-Enter.

## 1. Show the transcriptions ##

In this first cell, we import the Python libraries we need and list all the transcription files in the transcripts directory. If your transcript file isn't listed, you'll need to upload it to GitHub and relaunch this notebook.

In [1]:
# First let's import the libraries we need
import sys
import os

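# We have to add FAVE to the Python path so Python knows where to find the code
sys.path.append('FAVE/FAVE-align')
cwd = os.getcwd()  # remember the directory we started in
# Work from inside the FAVE-align directory; edit this path to match your own machine
os.chdir('C:\\Users\\Jade\\Documents\\College stuff\\HamLab\\transcription-master\\FAVE\\FAVE-align')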

# Now we import the libraries for FAVE alignment
import FAAValign
import optparse

# List all the available transcriptions
os.listdir('Z:\\Hamilton\\summer_interns\\stimuli\\trailers\\transcriptions\\ELAN')

['.DS_Store',
 'angrybirds-tlr1_a720p_CV.eaf',
 'angrybirds-tlr1_a720p_CV.pfsx',
 'angrybirds-tlr1_a720p_CV.txt',
 'bighero6-tlr2_a720p_JH.eaf',
 'bighero6-tlr2_a720p_JH.pfsx',
 'bighero6-tlr2_a720p_JH.txt',
 'bighero6-tlr3_a720p_JH.eaf',
 'bighero6-tlr3_a720p_JH.pfsx',
 'bighero6-tlr3_a720p_JH.txt',
 'boss-baby-trailer-1_a720p_CV.txt',
 'boss-baby-trailer-1_a720p_CV.txt.eaf',
 'boss-baby-trailer-1_a720p_CV.txt.pfsx',
 'boss-baby-trailer-1_a720p_final.eaf',
 'boss-baby-trailer-1_a720p_final.pfsx',
 'boss-baby-trailer-1_a720p_NC.txt',
 'boss-baby-trailer-2_a720p_CV.eaf',
 'boss-baby-trailer-2_a720p_CV.pfsx',
 'boss-baby-trailer-2_a720p_CV.txt',
 'ferdinand-trailer-2_a720p_final.eaf',
 'ferdinand-trailer-2_a720p_final.pfsx',
 'ferdinand-trailer-2_a720p_NC.txt',
 'ferdinand-trailer3_a720p_JH.eaf',
 'ferdinand-trailer3_a720p_JH.pfsx',
 'ferdinand-trailer3_a720p_JH.txt',
 'Frozen-tlr2_a720p_CV.eaf',
 'Frozen-tlr2_a720p_CV.pfsx',
 'Frozen-tlr2_a720p_CV.txt',
 'ice-dragon-trailer-1_a720p_JH 2
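The listing above mixes ELAN project files (`.eaf`, `.pfsx`) with the exported text files. As a small sketch, the helper below (a hypothetical name, not part of FAVE) keeps only the `.txt` exports and skips hidden files such as `.DS_Store`.

```python
import os

def list_text_transcripts(directory):
    """Return the sorted .txt files in `directory`, skipping hidden files.

    Hypothetical helper for this notebook; it is not part of FAVE.
    """
    return sorted(
        name for name in os.listdir(directory)
        if name.lower().endswith('.txt') and not name.startswith('.')
    )
```

Calling `list_text_transcripts(...)` with the same directory passed to `os.listdir` in the cell above should give a shorter list that is easier to scan.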

## 2. Change the name of the transcription file and wav file ##

Use the name of whichever 5-column transcription file you exported from ELAN, along with the matching wav file in the wavs directory.
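Before aligning, it can help to confirm that the export really has five tab-delimited columns on every line. The sketch below assumes the file is UTF-8 text; `check_columns` is a hypothetical helper, not something FAVE provides.

```python
def check_columns(path, expected=5):
    """Return (line_number, n_columns) for every non-empty line that does
    not have `expected` tab-separated fields.

    Assumes the transcript is UTF-8 encoded (an assumption, adjust if needed).
    """
    bad = []
    with open(path, encoding='utf-8') as f:
        for lineno, line in enumerate(f, start=1):
            line = line.rstrip('\r\n')  # keep trailing tabs, drop the newline
            if not line.strip():
                continue  # ignore blank lines
            n = len(line.split('\t'))
            if n != expected:
                bad.append((lineno, n))
    return bad
```

An empty list means every line has the expected shape; otherwise the line numbers point at the rows to fix in ELAN or in the text file.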

In [2]:
# First we need to change the transcription file and the wave file so that we load the
# ones we're interested in aligning
transcription_file = 'Z:\\Hamilton\\summer_interns\\stimuli\\trailers\\transcriptions\\ELAN\\insideout-usca-tlr2_a720p_JH.txt'
wav_file = 'C:\\Users\\Jade\\Documents\\College stuff\\HamLab\\transcription-master\\wavs\\insideout-usca-tlr2_a720p.wav'
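Since both paths are typed by hand, a quick existence check can catch typos before the aligner runs. This is a minimal sketch; `missing_files` is a hypothetical helper name.

```python
import os

def missing_files(*paths):
    """Return the subset of `paths` that do not point at an existing file."""
    return [p for p in paths if not os.path.isfile(p)]
```

For example, `missing_files(transcription_file, wav_file)` should return an empty list when both files are where the cell above says they are.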

In [3]:
# Build the arguments for checking that all words are in the dictionary.
# The empty string stands in for the wav file, which this check does not need.
# No need to edit these lines
parser = FAAValign.define_options_and_arguments()
dictionary_path = os.path.join('FAVE','FAVE-align','model','dict')
sys.argv = ['FAAValign','-v', '--check=unknown.txt', '', transcription_file]
(opts, args) = parser.parse_args()
print(opts)
print(args)

{'check': 'unknown.txt', 'importfile': None, 'verbose': True, 'dict': 'model/dict', 'noprompt': False, 'htktoolspath': ''}
['', 'Z:\\Hamilton\\summer_interns\\stimuli\\trailers\\transcriptions\\ELAN\\insideout-usca-tlr2_a720p_JH.txt']


## 3. Run the aligner without the wav file to check that all words exist in the dictionary ##

If there are any errors at this step because a word is missing, add the appropriate entry to the `dict` file at `FAVE/FAVE-align/model/dict`.  If the error came from a spelling mistake instead, correct the text file.
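To illustrate what a dictionary check involves, here is a rough sketch. It is not FAVE's own implementation. It assumes the dictionary is CMU-style (one entry per line, the word first, then its phones) and uses a deliberately simple tokenizer; both function names are hypothetical.

```python
import re

def load_dictionary_words(lines):
    """Collect the first whitespace-separated token of each line, uppercased.

    Assumes a CMU-style layout such as 'HELLO  HH AH0 L OW1'.
    """
    words = set()
    for line in lines:
        parts = line.split()
        if parts:
            words.add(parts[0].upper())
    return words

def unknown_words(text, dictionary_words):
    """Return the sorted, uppercased words in `text` that are not in the dictionary.

    Tokenization (letters and apostrophes only) is a simplification.
    """
    tokens = re.findall(r"[A-Za-z']+", text)
    return sorted({t.upper() for t in tokens} - dictionary_words)
```

A word that turns up here is either genuinely missing from the dictionary or misspelled in the transcript, which are the two cases described above.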

In [4]:
# This runs FAAValign in check mode to look for words missing from the dictionary
FAAValign.FAAValign(opts, args)

Temp dir is C:\Users\Jade\Documents\College stuff\HamLab\transcription-master\FAVE\FAVE-align\tmp\


FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\Jade\\Documents\\College stuff\\HamLab\\transcription-master\\FAVE\\FAVE-align\\tmp\\model/dict'
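The error above shows the relative default `model/dict` being looked up inside the `tmp` directory. One possible workaround is to pass an absolute dictionary path. The sketch below only builds the argument list; it assumes the option is spelled `--dict` (inferred from the `dict` key in the printed options) and that the dictionary lives in `model/dict` under the FAVE-align directory. Whether this resolves the error depends on how FAVE handles the path, so treat it as something to try rather than a confirmed fix.

```python
import os

def build_check_argv(fave_align_dir, transcription_file, unknown_file='unknown.txt'):
    """Build a sys.argv-style list for the dictionary check with an absolute dict path.

    Hypothetical helper. The '--dict' flag name is an assumption based on the
    'dict' entry shown when the parsed options are printed.
    """
    dict_path = os.path.abspath(os.path.join(fave_align_dir, 'model', 'dict'))
    return ['FAAValign', '-v', '--check=' + unknown_file,
            '--dict=' + dict_path, '', transcription_file]
```

The result could be assigned to `sys.argv` before calling `parser.parse_args()`, in place of the hand-written list in the earlier cell.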

## 4. Run the aligner with the wav file ##

In [5]:
# Next we'll get the arguments to run the aligner!
sys.argv = ['FAAValign','-v', '--import=unknown.txt', '--htktoolspath=/usr/local/bin', 
            wav_file, transcription_file]
(opts, args) = parser.parse_args()
print(opts)
print(args)


{'check': None, 'importfile': 'unknown.txt', 'verbose': True, 'dict': 'model/dict', 'noprompt': False, 'htktoolspath': '/usr/local/bin'}
['C:\\Users\\Jade\\Documents\\College stuff\\HamLab\\transcription-master\\wavs\\insideout-usca-tlr2_a720p.wav', 'Z:\\Hamilton\\summer_interns\\stimuli\\trailers\\transcriptions\\ELAN\\insideout-usca-tlr2_a720p_JH.txt']


In [6]:
# This step actually runs the forced aligner.  Read the messages carefully to make sure
# there are no errors here!
FAAValign.FAAValign(opts, args)

Temp dir is C:\Users\Jade\Documents\College stuff\HamLab\transcription-master\FAVE\FAVE-align\tmp\


FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\Jade\\Documents\\College stuff\\HamLab\\transcription-master\\FAVE\\FAVE-align\\tmp\\model/dict'

## 5. Inspect the TextGrid file ##

If you get a message like "`Successfully written TextGrid trolls-tlr1_a720p.TextGrid to file.`", then that's a good sign! Check it along with the wav file in Praat to see whether it was correctly generated (and look for errors in the output above).  You will have to adjust boundaries in the TextGrid itself, but as long as the file is not corrupted and appears to have the tiers you set up, you should be good to go!
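As a quick check before opening Praat, the tier names can be pulled out of the TextGrid text. This sketch assumes the file is in Praat's long text format, where each tier has a line of the form `name = "..."`; `textgrid_tier_names` is a hypothetical helper.

```python
import re

def textgrid_tier_names(text):
    """Return the tier names found in the contents of a long-format TextGrid.

    Assumes each tier is declared with a line like:  name = "tier name"
    """
    return re.findall(r'^[ \t]*name[ \t]*=[ \t]*"(.*)"[ \t]*$', text, flags=re.MULTILINE)
```

To use it, read the TextGrid into a string first, for example with `open(path, encoding='utf-8').read()`. The encoding is an assumption, since Praat can also write other encodings such as UTF-16.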