# Using Gentle Forced Aligner (Built on Kaldi) for Forced Audio-Transcript Alignment 

_Written by Sushmita Sadhukha_

This notebook contains code to take the thought segment data (the timestamp data that mTurkers will generate by completing HITs on [Eshin Jolly's Svelte Thought Tagger app](https://github.com/cosanlab/svelte-psiturk)) and _align_ the text, specifically the first and last word of each thought segment, to a larger dataframe in which _all_ words of that text have been aligned. That larger dataframe is produced by forced-alignment software, which assigns _each word_ a start and end timestamp. In this notebook, I use the [Gentle Beta Forced Aligner](https://lowerquality.com/gentle/) to generate the aligned-word timestamps.

#### I. What is forced alignment?

Forced alignment is a method that takes an orthographic transcription of an audio file and generates a time-aligned version, using a pronunciation dictionary to look up the phones for each word together with a pretrained acoustic model [(Montreal Forced Aligner, 2018)](https://montreal-forced-aligner.readthedocs.io/en/latest/introduction.html#what-is-forced-alignment). Forced alignment is useful for examining the temporal dynamics of spoken text data: it gives us timestamps for the text, which we can then use in further analyses such as sentiment analysis, topic analysis, text extraction, and named entity recognition.

#### II. What do we need for this?
1. Audio files 
2. Transcripts of the audio files (text files)
3. Gentle forced-alignment transcript dataframe 
4. Thought segments dataframe (output of a single transcript's thought timestamps)
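
Item 3 in the list above comes from Gentle itself. As a rough sketch of how Gentle's output might be reshaped into the four columns used later in this notebook (`Word`, `alignedWord`, `startTime`, `endTime`), the snippet below parses a small hand-written JSON string. The JSON field names (`words`, `word`, `alignedWord`, `start`, `end`, `case`) reflect my understanding of Gentle's output and should be treated as assumptions; the helper `gentle_to_rows` and the unaligned word `um` are hypothetical.

```python
import csv
import io
import json

# Hand-written stand-in for Gentle's JSON output. The field names are
# assumptions; the third entry ("um") is invented to show an unaligned word.
gentle_json = """
{"transcript": "Tim Riggins um",
 "words": [
   {"word": "Tim", "alignedWord": "tim", "case": "success", "start": 2.01, "end": 2.3},
   {"word": "Riggins", "alignedWord": "<unk>", "case": "success", "start": 2.31, "end": 2.89},
   {"word": "um", "case": "not-found-in-audio"}
 ]}
"""


def gentle_to_rows(raw):
    """Return one [Word, alignedWord, startTime, endTime] row per word."""
    rows = []
    for entry in json.loads(raw)["words"]:
        # Words without timestamps get empty cells instead of raising KeyError.
        rows.append([
            entry["word"],
            entry.get("alignedWord", ""),
            entry.get("start", ""),
            entry.get("end", ""),
        ])
    return rows


rows = gentle_to_rows(gentle_json)

# Write the rows as header-less CSV text, the layout that
# pd.read_csv(..., header=None) expects.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

Words without timestamps are kept as rows with empty cells here so that the word order of the transcript is preserved; dropping them instead would be an equally reasonable choice.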

#### III. What does the data look like?
- Left: _Thought segments dataframe_
- Right: _Forced-alignment transcript dataframe_

<table> <tr>
    <td> <img src="thoughts.png" width="500"/> </td>
    <td> <img src="aligned_df.png" width="500"/> </td>
</tr></table>

#### IV. What does the output look like?

<img src="output_df.png" width="800"/>

## Code

In [2]:
# import packages 
import os, subprocess
import numpy as np
import pandas as pd
import csv
import networkx as nx
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib as mpl

from nltools.data import Adjacency
from scipy.linalg import block_diag
import scipy.spatial.distance as scipy

import json 
from glob import glob
from aeneas.executetask import ExecuteTask
from aeneas.task import Task



In [75]:
base_dir = os.path.abspath(os.path.curdir) 

## Functions

In [78]:
def check_alignment(aligned_df, segments_df, offset = 1.0):
    
    """ 
    Given two dataframes, the Gentle alignment dataframe (where each word is aligned) and the thoughts dataframe, 
    create a search window dataframe using the start and end timestamps for each thought segment (text) in the 
    segments dataframe from the aligned df. Then, match the first and last words of the text (thought segment) to 
    the words in the 'Word' column of the alignment search window, and get the start times of the first word and 
    the end times of the last word. Append to output dataframe and create timestamp offsets.
        
        Args:
            aligned_df(df): Gentle aligned words 
            segments_df(df): manual thought segments 
            offset(float): the offset in seconds to set the search window  
            
        Returns: 
            output(df): dataframe with aligned words and start and end timestamp differences
    """
     
    output = segments_df.copy() # make a duplicate of the segments df, make changes to this dataframe
    aligned_start = [] # append the aligned 'start' times of a thought segment's first word 
    aligned_end = [] # append the aligned 'end' times of a thought segment's last word 
    
    for first_word, last_word, start_time, end_time in zip(output['firstWord'], output['lastWord'], output['startTime'], output['endTime']):
        
        # make search window wider than the manual start and end times of the thoughts to account for alignment deviations 
        # default offset is 1.0 sec, but can change the value of this parameter
        # the wider the search window, the more likely it is that words match
        
        search_window = aligned_df[(aligned_df['startTime'] >= start_time - offset) & (aligned_df['endTime'] <= end_time + offset)]
        match_first = search_window[search_window['Word'].str.match(first_word)]
        match_last = search_window[search_window['Word'].str.match(last_word)]
        
        # In cases where either the match_first or match_last dataframes are empty, need conditionals here to append correctly to segments_df  
        if len(match_last) == 0 and len(match_first) == 0:
            aligned_end.append('****')
            aligned_start.append('****')
        elif len(match_first) == 0 and len(match_last) > 0:
            aligned_start.append('****')
            aligned_end.append(match_last['endTime'].iloc[0])
        elif len(match_first) > 0 and len(match_last) == 0:
            aligned_end.append('****')
            aligned_start.append(match_first['startTime'].iloc[0])
        else:
            aligned_start.append(match_first['startTime'].iloc[0])
            aligned_end.append(match_last['endTime'].iloc[0])
    
    output['alignedStart'] = aligned_start
    output['alignedEnd'] = aligned_end
    
    # create offset columns by computing the difference between the manual start/end times with the aligned start/end times 
    output['start_off'] = output.apply(lambda row: '****' if row.alignedStart == '****' else abs(row.startTime - row.alignedStart), axis = 1) 
    output['end_off'] = output.apply(lambda row: '****' if row.alignedEnd == '****' else abs(row.endTime - row.alignedEnd), axis = 1) 

    return output
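
One detail of `check_alignment` worth keeping in mind: pandas documents `Series.str.match` as testing whether each string starts with a match of a regular expression, in the manner of `re.match`. The sketch below uses plain `re` on made-up words to show two consequences: a short word such as `He` also matches longer words that begin with it, and a word containing regex metacharacters may fail to match itself. The `re.fullmatch` plus `re.escape` variant is only one possible stricter alternative, not what the function above does.

```python
import re

# Made-up contents of a search window's 'Word' column.
window_words = ["Her", "He", "Hear", "he"]

# re.match anchors only at the start, so this is a case-sensitive prefix test.
prefix_hits = [w for w in window_words if re.match("He", w)]
print(prefix_hits)  # ['Her', 'He', 'Hear']

# A stricter alternative: escape the word and require the whole string to match.
exact_hits = [w for w in window_words if re.fullmatch(re.escape("He"), w)]
print(exact_hits)  # ['He']

# Regex metacharacters: the parentheses are read as a group, so the
# unescaped pattern looks for "laughs" at position 0 and finds "(" instead.
bracketed = "(laughs)"
unescaped = re.match(bracketed, bracketed)
escaped = re.match(re.escape(bracketed), bracketed)
print(unescaped is None, escaped is not None)  # True True
```

Because the function takes the first row of each match, a prefix hit that occurs earlier in the window than the intended word would determine the reported timestamp.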

In [70]:
# load both dataframes
aligned = pd.read_csv('align.csv', header=None)
thoughts = pd.read_csv('sample_output_segments.csv')

# some formatting 
thoughts['text'] = thoughts['text'].astype(str)
thoughts['firstWord'] = thoughts['text'].str.replace(',','').str.strip('.').str.split(' ').str[0]
thoughts['lastWord'] = thoughts['text'].str.replace(',','').str.strip('.').str.split(' ').str[-1]

aligned.columns = ['Word', 'alignedWord', 'startTime', 'endTime']
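
To make the `firstWord` / `lastWord` formatting step easier to reason about, here is a plain-Python sketch that applies the same three string operations to single strings. The sample sentences and both helper names are hypothetical; the second helper, based on `re.findall`, is only one possible way to handle the edge cases and is not part of the notebook.

```python
import re


def first_last_word(text):
    """Mirror the pandas chain: drop commas, strip periods at both ends, split on spaces."""
    tokens = text.replace(",", "").strip(".").split(" ")
    return tokens[0], tokens[-1]


def first_last_word_robust(text):
    """Hypothetical alternative: keep only runs of word characters and apostrophes."""
    tokens = re.findall(r"[\w'’]+", text)
    return (tokens[0], tokens[-1]) if tokens else ("", "")


print(first_last_word("Oddly, they end up crying."))  # ('Oddly', 'crying')

# Other punctuation is not stripped, so the last word keeps its question mark.
print(first_last_word("Is that right?"))  # ('Is', 'right?')

# A trailing space hides the period from strip('.') and yields an empty last word.
print(first_last_word("He left. "))  # ('He', '')

print(first_last_word_robust("Is that right?"))  # ('Is', 'right')
print(first_last_word_robust("He left. "))  # ('He', 'left')
```

An empty or punctuated `lastWord` matters downstream because it is later used as the pattern that is matched against the aligned words.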

In [71]:
thoughts

Unnamed: 0,startTime,endTime,text,firstWord,lastWord
0,1.0,7.0,Tim Riggins is one of the few characters who’s...,Tim,show
1,7.0,21.0,"He is tall, has long blonde hair and kind of h...",He,team
2,22.0,35.0,"But critical to this show, he’s best friends w...",But,paralyzed
3,36.0,49.0,Tim has a lot of trouble processing this and a...,Tim,head
4,50.0,62.0,You usually see him at at parties or other typ...,You,life
5,62.0,70.0,And he appears to live with his brother but th...,And,dynamics
6,71.0,88.0,He has a girlfriend - her name is Tyra. They ...,He,relationship
7,89.0,108.0,Kind of the biggest scandal that happens in th...,Kind,friend
8,109.0,120.0,"Oddly, they end of crying together and then st...",Oddly,Jason


In [72]:
aligned.head(n=10)

Unnamed: 0,Word,alignedWord,startTime,endTime
0,Tim,tim,2.01,2.3
1,Riggins,<unk>,2.31,2.89
2,is,is,2.89,3.11
3,one,one,3.15,3.31
4,of,of,3.31,3.39
5,the,the,3.4,3.51
6,few,few,3.51,3.7
7,characters,characters,3.7,4.28
8,who’s,who's,4.29,4.72
9,really,really,4.72,5.44


In [73]:
output = check_alignment(aligned, thoughts, offset = 1.0)

In [74]:
output

Unnamed: 0,startTime,endTime,text,firstWord,lastWord,alignedStart,alignedEnd,start_off,end_off
0,1.0,7.0,Tim Riggins is one of the few characters who’s...,Tim,show,2.01,7.4,1.01,0.4
1,7.0,21.0,"He is tall, has long blonde hair and kind of h...",He,team,8.83,20.97,1.83,0.03
2,22.0,35.0,"But critical to this show, he’s best friends w...",But,paralyzed,22.59,34.08,0.59,0.92
3,36.0,49.0,Tim has a lot of trouble processing this and a...,Tim,head,36.449999,48.42,0.449999,0.58
4,50.0,62.0,You usually see him at at parties or other typ...,You,life,50.87,62.07,0.87,0.07
5,62.0,70.0,And he appears to live with his brother but th...,And,dynamics,63.65,69.76,1.65,0.240001
6,71.0,88.0,He has a girlfriend - her name is Tyra. They ...,He,relationship,72.36,78.36,1.36,9.64
7,89.0,108.0,Kind of the biggest scandal that happens in th...,Kind,friend,89.95,****,0.95,****
8,109.0,120.0,"Oddly, they end of crying together and then st...",Oddly,Jason,110.329999,119.23,1.329999,0.77


In [76]:
# set offset to 2.0 
# widening the search window increases the likelihood that the words in the thoughts df will match the aligned words in the aligned df 
output_2 = check_alignment(aligned, thoughts, offset = 2.0)
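
The effect of the search window can be sketched on toy data without pandas. In the `offset = 1.0` output above, segment 7 (`friend`) has no aligned end time, and segment 6 (`relationship`) has an end offset of 9.64 seconds. The snippet below illustrates two mechanisms that could produce results like these: a word whose end time falls just outside the window, and a last word that occurs more than once inside the window. The helper `aligned_end` is hypothetical and uses exact word equality for simplicity. The end times 78.36 and 109.25 echo the output above; every other number, including the second `relationship`, is invented and has not been checked against the transcript.

```python
# (word, start, end) triples; see the note above on which values are invented.
aligned_words = [
    ("relationship", 77.90, 78.36),
    ("relationship", 87.10, 87.80),
    ("friend", 108.90, 109.25),
]


def aligned_end(words, last_word, start_time, end_time, offset=1.0, pick="first"):
    """Return the end time of last_word inside the widened window, or None."""
    window = [
        (word, start, end)
        for word, start, end in words
        if start >= start_time - offset and end <= end_time + offset
    ]
    ends = [end for word, start, end in window if word == last_word]
    if not ends:
        return None
    # pick="first" mimics .iloc[0]; pick="last" takes the final occurrence.
    return ends[0] if pick == "first" else ends[-1]


# The word ends at 109.25, which is past 108 + 1.0 but within 108 + 2.0.
print(aligned_end(aligned_words, "friend", 89.0, 108.0, offset=1.0))  # None
print(aligned_end(aligned_words, "friend", 89.0, 108.0, offset=2.0))  # 109.25

# With a repeated last word, the first occurrence can be far from the segment end.
print(aligned_end(aligned_words, "relationship", 71.0, 88.0))               # 78.36
print(aligned_end(aligned_words, "relationship", 71.0, 88.0, pick="last"))  # 87.8
```

Widening the window trades missed matches for a higher chance of picking up an unintended occurrence, so taking the last occurrence for `lastWord` may be worth considering if large end offsets turn out to come from repeated words.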

In [77]:
output_2

Unnamed: 0,startTime,endTime,text,firstWord,lastWord,alignedStart,alignedEnd,start_off,end_off
0,1.0,7.0,Tim Riggins is one of the few characters who’s...,Tim,show,2.01,7.4,1.01,0.4
1,7.0,21.0,"He is tall, has long blonde hair and kind of h...",He,team,8.83,20.97,1.83,0.03
2,22.0,35.0,"But critical to this show, he’s best friends w...",But,paralyzed,22.59,34.08,0.59,0.92
3,36.0,49.0,Tim has a lot of trouble processing this and a...,Tim,head,36.449999,48.42,0.449999,0.58
4,50.0,62.0,You usually see him at at parties or other typ...,You,life,50.87,62.07,0.87,0.07
5,62.0,70.0,And he appears to live with his brother but th...,And,dynamics,63.65,69.759999,1.65,0.240001
6,71.0,88.0,He has a girlfriend - her name is Tyra. They ...,He,relationship,72.36,78.36,1.36,9.64
7,89.0,108.0,Kind of the biggest scandal that happens in th...,Kind,friend,89.95,109.249999,0.95,1.249999
8,109.0,120.0,"Oddly, they end of crying together and then st...",Oddly,Jason,110.329999,119.23,1.329999,0.77
