# Introduction

- This notebook processes data from the TransVoices project for the purpose of analyzing acoustic change of vowels over time.
- Two trans women YouTube vloggers' videos over a period of seven years have been transcribed as TextGrids.
- The TextGrids were force aligned using a multi-tier version of FAVE, and formant measurements were made using ifcformant.
- The files are named "[name].multi_align.TextGrid" in the folder "multi_align" and "[name].ifc" in the folder "ifc_files".
- The .TextGrid files are read as separate 'phone' and 'word' tiers, then merged.
- The relevant rows of each .ifc file are extracted, then concatenated to the TextGrid dataframe.
- Formant values are z-scored by speaker, and vowel durations are normalized by local speech rate.

# Section 1: Initialize and read in files

In [None]:
import pandas as pd
pd.options.mode.chained_assignment = None
import os
import matplotlib.pyplot as plt
import numpy as np
from audiolabel import read_label

The relevant files (.ifc files and TextGrids) are stored in two directories. Store each directory path in a variable so that files can be read from it.

In [None]:
wdifc = './ifc_files/'
wdtg = './multi_align/'

Next, we'll create a dataframe with a column of .ifc filenames, and a dataframe with a column of .TextGrid filenames. We're reading these from wdifc and wdtg. Because the filenames contain metadata split by underscores (e.g., JV_013110_Title.ifc or .multi_align.TextGrid), we'll use regular expressions to extract this data and put it into new columns.

In [None]:
ifc_list = pd.DataFrame(os.listdir(wdifc), columns=['ifcname'])
codeifc = ifc_list.ifcname.str.extract(r'^(?P<speaker>.+)_(?P<date>\d+)_(?P<title>.+)\.ifc$',expand=True)
ifc_list = pd.concat([ifc_list, codeifc], axis=1) # axis=1 concatenates columns

tg_list = pd.DataFrame(os.listdir(wdtg), columns=['tgname'])
codetg = tg_list.tgname.str.extract(r'^(?P<speaker>.+)_(?P<date>\d+)_(?P<title>.+)\.multi_align\.TextGrid$',expand=True)
tg_list = pd.concat([tg_list, codetg], axis=1) # axis=1 concatenates columns

# Sanity check
#tg_list.head()
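
To see what the named groups in the regular expression produce, here is a minimal sketch on a few made-up filenames. Only the speaker_date_title pattern comes from this notebook; the names themselves are hypothetical.

```python
import pandas as pd

# Hypothetical filenames; only the speaker_date_title pattern comes from the notebook
names = pd.Series(['JV_013110_Title.ifc', 'GN_120512_My_Video.ifc', 'notes.txt'])
pattern = r'^(?P<speaker>.+)_(?P<date>\d+)_(?P<title>.+)\.ifc$'

# One output column per named group; rows that do not match become NaN
parts = names.str.extract(pattern, expand=True)
print(parts)
```

Titles that contain underscores stay intact, because the date group only matches digits. A file that does not follow the pattern yields a row of NaN rather than an error.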

Sometimes there won't be the same number of .ifc and .TextGrid files. We'll merge the two list dataframes using pd.merge, and the final product is matchdf, showing which .ifc files have a .TextGrid match.

In [None]:
code = ['speaker','date','title']
matchdf = pd.merge(ifc_list, tg_list, on=code, how='inner')

# Sanity check
count = matchdf.shape[0]
print("matchdf has",count,"matched textgrids and ifc files.")
matchdf.head()
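
The inner merge silently drops files that have no partner. If you want to know which ones were dropped, an outer merge with `indicator=True` is one way to audit this. The sketch below uses made-up rows and is not part of the original pipeline.

```python
import pandas as pd

keys = ['speaker', 'date', 'title']

# Hypothetical file lists: 'B' has no TextGrid, 'D' has no .ifc file
ifc = pd.DataFrame({'speaker': ['JV', 'JV', 'GN'],
                    'date': ['013110', '020210', '120512'],
                    'title': ['A', 'B', 'C'],
                    'ifcname': ['JV_013110_A.ifc', 'JV_020210_B.ifc', 'GN_120512_C.ifc']})
tg = pd.DataFrame({'speaker': ['JV', 'GN', 'GN'],
                   'date': ['013110', '120512', '010113'],
                   'title': ['A', 'C', 'D'],
                   'tgname': ['JV_013110_A.multi_align.TextGrid',
                              'GN_120512_C.multi_align.TextGrid',
                              'GN_010113_D.multi_align.TextGrid']})

# how='outer' keeps every row; the '_merge' column says where each row came from
audit = pd.merge(ifc, tg, on=keys, how='outer', indicator=True)
unmatched = audit[audit['_merge'] != 'both']
print(unmatched[keys + ['_merge']])
```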

The function defined below is used to find the 'previous' and 'following' phone for any given phone, based on the values in the TextGrid's phone tier that come just above or below it. At the beginnings and ends of a column, it will fill in the space with 'val'.

In [None]:
def roll_1d_with_constant(a, shift, val):
    '''
    Roll a list of values 'a' by amount 'shift' in a way similar to np.roll(),
    but instead of wrapping values, replace wrapped elements with a constant 'val'
    '''
    if shift >= 0:
        index = np.arange(len(a))
    else:
        index = np.arange(len(a) * -1, 0, 1)
    return np.pad(a, np.abs(shift), 'constant', constant_values=val)[index]
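
As a quick illustration of how the function yields previous and following phones, the sketch below repeats the definition and applies it to a short, made-up phone sequence.

```python
import numpy as np

def roll_1d_with_constant(a, shift, val):
    '''Same definition as in the notebook, repeated so this example runs on its own.'''
    if shift >= 0:
        index = np.arange(len(a))
    else:
        index = np.arange(len(a) * -1, 0, 1)
    return np.pad(a, np.abs(shift), 'constant', constant_values=val)[index]

# Hypothetical phone tier: a pause followed by the word "cat"
phone = np.array(['sp', 'K', 'AE1', 'T'], dtype=object)

prev = roll_1d_with_constant(phone, 1, 'sp')   # shift down: each row sees the phone before it
foll = roll_1d_with_constant(phone, -1, 'sp')  # shift up: each row sees the phone after it
print(list(prev))
print(list(foll))
```

The first row has no previous phone and the last row has no following phone, so both are filled with 'sp'.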

Getting ready to read in all the labels from the phone and word tiers of the actual TextGrids. As soon as we read in the labels, we're going to add some new columns. One of them will be 'vcontext', a vowel that is specific to a phonological context; so we define certain phonological classes as lists below.

In [None]:
nasals = ['N','M']
lateral = ['L']
coronal = ['T','D']
lowfront = ['AE','AE1','AE2','AE0'] # to separate AE from AEN
hiback = ['UW','UW1','UW2','UW0'] # to separate UW from pre-lateral UWL
midback = ['OW','OW1','OW2','OW0'] # to separate OW from pre-lateral OWL

Now, we will use audiolabel's read_label function to compile a dataframe from all of the .TextGrid files. This loops over each aligned TextGrid in matchdf and extracts the labels from 'phone' tier and 'word' tier. The read_label function adds a column called 'fname' that reports what file all the labels came from.

It also creates the 'prev' and 'foll' phone columns using roll_1d_with_constant; a 'vowel' and 'stress' column using regular expressions and the 'phone' column; an 'ipa' column using a conversion table of ARPABET and IPA symbols; and the 'vcontext' column mentioned previously.

The products are two dataframes, phonedf and worddf, that contain all the phones/words in all the TextGrids in matchdf, and they will be merged next.

In [None]:
phones = []
words = []
ph2ipa = pd.read_table('arpabet2ipa.txt', names=('phone','ipa')) # Read in conversion table for ARPA to ipa

for tgname in matchdf.tgname: # For debugging, replace matchdf.tgname with matchdf.head(2).tgname
    [phdf, wddf] = read_label(os.path.join(wdtg, tgname), 'praat', addcols=['barename'], tiers=['phone','word'])
    phdf=phdf.assign(prev=roll_1d_with_constant(phdf.phone,1,'sp'),
                     foll=roll_1d_with_constant(phdf.phone,-1,'sp'),
                     # Create two new columns, 'vowel' and 'stress', based on regular expressions in 'phone' column
                     vowel=phdf.phone.str.extract(r'^(?P<vowel>.+)(?P<stress>\d+)$', expand=True).iloc[:,0],
                     stress=phdf.phone.str.extract(r'^(?P<vowel>.+)(?P<stress>\d+)$', expand=True).iloc[:,1],
                    )
    phdf=phdf.merge(ph2ipa,how='left',on='phone') # merge ipa df with phones
    phdf=phdf.assign(vcontext=phdf.vowel)
    phdf.loc[(phdf.phone.isin(lowfront)) & (phdf.foll.isin(nasals)),'vcontext'] = 'AEN'
    phdf.loc[(phdf.phone.isin(hiback)) & (phdf.foll.isin(lateral)),'vcontext'] = 'UWL'
    phdf.loc[(phdf.phone.isin(midback)) & (phdf.foll.isin(lateral)),'vcontext'] = 'OWL'
    phones.append(phdf)
    words.append(wddf)
phonedf = pd.concat(phones)
worddf = pd.concat(words)
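
The column logic in the loop can be tried on a toy phone tier without any TextGrid files. This is a sketch under assumptions: the phone sequence is made up, and `Series.shift` stands in for roll_1d_with_constant to build 'foll'.

```python
import pandas as pd

nasals = ['N', 'M']
lateral = ['L']
lowfront = ['AE', 'AE1', 'AE2', 'AE0']
hiback = ['UW', 'UW1', 'UW2', 'UW0']

# Hypothetical phone tier: "man", pause, "cat", "pool"
toy = pd.DataFrame({'phone': ['M', 'AE1', 'N', 'sp', 'K', 'AE1', 'T', 'P', 'UW1', 'L']})
toy['foll'] = toy['phone'].shift(-1, fill_value='sp')

# Split ARPABET vowels into the vowel label and its stress digit; consonants get NaN
toy = toy.join(toy['phone'].str.extract(r'^(?P<vowel>.+)(?P<stress>\d+)$', expand=True))

# Start from the plain vowel label, then overwrite the context-specific cases
toy['vcontext'] = toy['vowel']
toy.loc[toy.phone.isin(lowfront) & toy.foll.isin(nasals), 'vcontext'] = 'AEN'
toy.loc[toy.phone.isin(hiback) & toy.foll.isin(lateral), 'vcontext'] = 'UWL'
print(toy)
```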

In [None]:
# Sanity checks
#phonedf['stress'].count() # this is how many syllables are in the entire phonedf
#phonedf.head(20)
#worddf.tail()

# Section 2: Get local speech rate from textgrid

Thanks to Geoff Bacon for this section. The first function, make_rows_stress_count, estimates local speech rate as the number of syllables per 20 TextGrid rows. A syllable is any vowel that carries an ARPABET stress digit (0, 1, or 2), not only stressed vowels. The window of twenty rows is arbitrary; it spans about seven words (twenty phones/segments, including pauses).

In [None]:
def make_rows_stress_count(df, window_size):
    """
    Returns `df` with an extra column of the moving count of the stress column.
    `df` must have a column called `stress` and `fname` (identifying each video, for groupby)
    Counts all values in `stress` as syllables in `binary_stress`.
    Uses `.rolling` to count `binary_stress` values in a window size of X rows.
    Fills in edge cases (beginning and end of window) using `bfill` and `ffill`.
    """
    df['binary_stress'] = df['stress'].notnull().astype(int)  # 1 for any vowel with a stress digit, else 0
    grouped = df.groupby('fname')
    rolling_count = grouped['binary_stress'].rolling(window_size, center=True).sum()
    ### moved ['binary_stress'] to after grouped instead of after .sum() to fix an error ###
    rolling_count = rolling_count.groupby(level='fname').bfill()
    rolling_count = rolling_count.groupby(level='fname').ffill()
    multiindexed_df = df.set_index('fname', append=True).swaplevel()
    renamed_rolling_count = rolling_count.rename('rows_stress_count').to_frame()
    merged = pd.merge(multiindexed_df, renamed_rolling_count, left_index=True, right_index=True)
    df_with_rolling_count = merged.reset_index().drop('level_1', axis=1)
    return df_with_rolling_count

In [None]:
# Apply function to phonedf with a window size of 20 rows
df = make_rows_stress_count(phonedf, 20)
phonedf = df

# Sanity checks
#phonedf.head() # df should have columns "binary_stress" and "rows_stress_count"
#print(phonedf.binary_stress.unique()) # should get [0 1]
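
To make the row-based count concrete, here is a minimal sketch of a centered rolling sum with edge filling on one made-up file. It uses a window of 3 rows rather than 20 so the arithmetic is easy to follow, and it skips the groupby step.

```python
import pandas as pd

# Hypothetical 'stress' column: digits on vowels, None on consonants and pauses
stress = pd.Series(['1', None, None, '0', None, '1', None, None])

# Any stress digit (including '0') counts as one syllable
is_syllable = stress.notnull().astype(int)

# A centered window leaves NaN at both edges; fill them from the nearest valid row
rolling = is_syllable.rolling(3, center=True).sum().bfill().ffill()
print(rolling.tolist())
```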

The second and third functions, process and make_temporal_rolling_stress_count, estimate local speech rate as the number of syllables (vowels with a stress digit) in the 30 seconds up to and including each phone. Using clock time requires reading the 't1' column of phonedf, but it is a more reliable measure of speech rate than counting dataframe rows.

In [None]:
from datetime import timedelta

def process(dataframe, offset):
    """
    Converts `t1` (in seconds) to datetime and counts the `binary_stress` values in the trailing 30-second window.
    Rows within `offset` of the file start have incomplete windows, so they are backfilled from the first complete one.
    """
    dataframe['t1_as_datetime'] = pd.to_datetime(dataframe['t1'], unit='s')
    dataframe['time_stress_count'] = dataframe.rolling('30s', on='t1_as_datetime')['binary_stress'].sum()
    start = dataframe['t1_as_datetime'].iloc[0]
    dataframe['offset'] = dataframe['t1_as_datetime'] - start
    dataframe['beginning'] = dataframe['offset'] < offset
    dataframe.loc[dataframe['beginning'], 'time_stress_count'] = np.nan
    dataframe['time_stress_count'] = dataframe['time_stress_count'].bfill()
    return dataframe

In [None]:
def make_temporal_rolling_stress_count(df, time_in_seconds):
    """
    Returns `df` with an extra column, `time_stress_count`: the moving count of the stress column.
    `df` must have columns `binary_stress`, `t1`, and `fname`.
    Uses `.rolling` (in `process`) to count `binary_stress` values per window of 30 seconds;
    `time_in_seconds` sets how much of the start of each file is backfilled.
    """
    offset = timedelta(seconds=time_in_seconds)

    fnames = df['fname'].unique()  # use the argument, not the global phonedf
    result = []
    for fname in fnames:
        tmp = df[df['fname'] == fname].copy()  # copy so process() does not write to a slice of df
        tmp = process(tmp, offset)
        result.append(tmp)
    df = pd.concat(result, ignore_index=True)
    return df.drop(['offset', 't1_as_datetime', 'beginning'], axis=1)

In [None]:
# Apply function to phonedf with a window size of 30 seconds
df = make_temporal_rolling_stress_count(phonedf, 30)
phonedf = df

# Sanity check
phonedf.head() # should now have final column 'time_stress_count'
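
A time-based rolling window looks backward from each row. The sketch below shows this on five made-up phone onsets; the column names mirror the ones used above, and the values are hypothetical.

```python
import pandas as pd

# Hypothetical phone onsets (seconds) and syllable flags
toy = pd.DataFrame({'t1': [0.0, 10.0, 20.0, 35.0, 50.0],
                    'binary_stress': [1, 1, 1, 0, 1]})

# rolling() with an offset string needs a datetime-like column to roll on
toy['t1_as_datetime'] = pd.to_datetime(toy['t1'], unit='s')

# Each row sums the flags in the 30 seconds up to and including its own onset
toy['time_stress_count'] = toy.rolling('30s', on='t1_as_datetime')['binary_stress'].sum()
print(toy[['t1', 'time_stress_count']])
```

The first rows have fewer than 30 seconds of history, which is why the process function overwrites the counts at the start of each file.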

# Section 3: Merge phone df & word df to create full textgrid df

Now we have all the TextGrid information we need. We will merge the phone dataframe with the word dataframe to create fulltgdf, the dataframe that includes both phone and word information. We'll merge on 't1' with pd.merge_asof, which pairs each phone with the most recent word whose start time is at or before the phone's start time, so every phone in a word receives that word's label and times.

In [None]:
worddf = worddf.assign(t1_w=worddf.t1) # Duplicate the 't1' column and assign it to 't1_w'

def mergepw(p,w):
    b=p['barename'].values[0] # find barename value of argument p
    locw=w.loc[w.barename==b,['t1','t1_w','t2','word']] # select words that have barename value 'b'
    return pd.merge_asof(p,locw,on='t1',suffixes=['_p','_w']) # merge, create suffixes for duplicated columns

In [None]:
# Use the function to merge phone and word textgrids and create the full textgrid dataframe
fulltgdf = phonedf.groupby('barename').apply(mergepw,w=worddf)

# Reset multilevel index, ditch .multi_align extension, rename time column, drop fname
fulltgdf = fulltgdf.reset_index(level='barename',drop=True)
fulltgdf = fulltgdf.assign(barename=fulltgdf.barename.str.replace('.multi_align','',regex=False)) # literal match, so '.' is not a wildcard
fulltgdf = fulltgdf.rename(columns={'t1':'t1_p'}).drop(columns='fname')

# replace all empty 'phone' cells with NaN, then drop those rows
fulltgdf['phone'] = fulltgdf['phone'].replace('', np.nan)
fulltgdf = fulltgdf.dropna(subset=['phone'])

# Sanity checks
#print(fulltgdf.columns)
#fulltgdf.head()
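
Here is a small, self-contained sketch of what pd.merge_asof does with phone and word tiers. The two toy tiers are made up; the column names follow the ones used in mergepw.

```python
import pandas as pd

# Hypothetical tiers for the words "cat" and "see"
phones = pd.DataFrame({'t1': [0.00, 0.10, 0.25, 0.40, 0.55],
                       't2': [0.10, 0.25, 0.40, 0.55, 0.70],
                       'phone': ['K', 'AE1', 'T', 'S', 'IY1']})
words = pd.DataFrame({'t1': [0.00, 0.40],
                      't2': [0.40, 0.70],
                      'word': ['CAT', 'SEE']})
words = words.assign(t1_w=words.t1)  # keep the word onset; 't1' itself is used as the merge key

# Both frames must be sorted on 't1'; each phone takes the last word starting at or before it
merged = pd.merge_asof(phones, words, on='t1', suffixes=('_p', '_w'))
print(merged)
```

The shared column 't2' is split into 't2_p' (phone offset) and 't2_w' (word offset), while 't1' remains the phone onset.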

We're going to subset fulltgdf so that it contains only vowels. To work on consonants, just change what is subsetted.

In [None]:
vowels = ['ɔ','ɑ','i','u','ɛ','ɪ','ʊ','ʌ','æ','ə','eɪ','aɪ','oʊ','aʊ','ɔɪ','ɚ']
vowels_df = fulltgdf[fulltgdf.ipa.isin(vowels)]

# Section 4: Merge fulltgdf with ifc files

Now we will read in the .ifc files. But we will only read in certain rows of each .ifc file, to prevent our final dataframe from being too large. The functions defined below will figure out which row of each .ifc file to read in, based on which value of 'sec' is closest to the 25%, 50%, and 75% timepoint in the duration of each vowel.

In [None]:
def get_row_nearest_time(df, t, timecol):
    '''Get the row nearest in time to specified value.
    In case of tie, first row is returned.
    ***NOTE***
    df[timecol] must be sorted!!!
    '''
    return df.loc[(df[timecol] - t).abs().idxmin()]

In [None]:
def get_timepoints(t,ifc):
    dur = t.t2_p-t.t1_p # Calculate the duration of the vowel
    # Create series; one for the 25% values, one for the 50% values, and one for the 75% values
    idx25 = get_row_nearest_time(ifc,(t.t1_p+(0.25*dur)),'sec') \
        .rename({'f1':'f1_25','f2':'f2_25','f3':'f3_25','f0':'f0_25'})
    idx50 = get_row_nearest_time(ifc,(t.t1_p+(0.50*dur)),'sec') \
        .rename({'f1':'f1_50','f2':'f2_50','f3':'f3_50','f0':'f0_50'})
    idx75 = get_row_nearest_time(ifc,(t.t1_p+(0.75*dur)),'sec') \
        .rename({'f1':'f1_75','f2':'f2_75','f3':'f3_75','f0':'f0_75'})
    # Concatenate the series-converted-to-dataframes and drop extra columns
    return pd.concat(
        [
            pd.DataFrame.from_records([t]),
            pd.DataFrame.from_records([idx25]),
            pd.DataFrame.from_records([idx50]).drop(['sec','rms'],axis='columns'),
            pd.DataFrame.from_records([idx75]).drop(['sec','rms'],axis='columns')
        ],
        axis='columns'
    )
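
The row selection can be checked by hand on a tiny table. In the sketch below the formant track and the vowel times are made up; get_row_nearest_time is repeated so the example runs on its own.

```python
import pandas as pd

def get_row_nearest_time(df, t, timecol):
    '''Same definition as in the notebook.'''
    return df.loc[(df[timecol] - t).abs().idxmin()]

# Hypothetical formant track sampled every 10 ms
ifc = pd.DataFrame({'sec': [0.00, 0.01, 0.02, 0.03, 0.04],
                    'f1': [500.0, 510.0, 520.0, 530.0, 540.0]})

# Hypothetical vowel from 4 ms to 36 ms
t1_p, t2_p = 0.004, 0.036
dur = t2_p - t1_p

# Look up the row closest to the 25%, 50%, and 75% points of the vowel
f1_at = {pct: get_row_nearest_time(ifc, t1_p + pct / 100 * dur, 'sec')['f1']
         for pct in (25, 50, 75)}
print(f1_at)
```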

Finally, the function below will open up the .ifc file using read_table, read the columns defined in usecols, and apply the function get_timepoints to read only the desired rows.

In [None]:
def mergeifc(df,wdifc):
    usecols = ['sec','rms','f0', 'f1', 'f2', 'f3']
    ifcname = df.barename.iloc[0]+'.ifc'
    ifc = pd.read_table(os.path.join(wdifc,ifcname),usecols=usecols)
    vowelsdf = pd.concat(
        df.apply(get_timepoints,axis=1,ifc=ifc) \
        .values.tolist() # concatenates the list of dataframes into a dataframe
    )
    return vowelsdf

We will apply the function to vowels_df, but group it by video (barename). The cell below took about 10 minutes to run for 50 .ifc files (~72MB of data).

In [None]:
merged_df = vowels_df.groupby('barename').apply(mergeifc,wdifc=wdifc) # This will take some time! Go make some tea.

In [None]:
### CHECK POINT ###
# This cell saves merged_df as a .csv; to save time during future debugging, just read in mergeddf.csv

merged_df.to_csv('mergeddf.csv', encoding='utf-8')
merged_df = pd.read_csv('mergeddf.csv') 

In [None]:
full_df = merged_df.reset_index(drop=True) # reset hierarchical index

# Sanity check
full_df.head()

# Section 5: Clean up full_df

The dataframe full_df now contains every vowel from every video that has been analyzed. Each vowel has measurements for f0, F1, F2, and F3 at the 25/50/75% timepoints, as well as meta information about the word, previous/following segments, etc. The last few lines will clean up the dataframe a little bit.

Although vowel duration was computed inside get_timepoints, it was not saved, so we'll compute it again here from t1_p and t2_p. We'll also provide vowel duration normalized by the number of syllables in the phone's 30-second window.

In [None]:
full_df['duration'] = full_df['t2_p'] - full_df['t1_p']
full_df['norm_duration'] = full_df['duration']/full_df['time_stress_count']*100

Finally, we'll use regular expressions to add the metadata columns for speaker, date, and title, all taken from the values in the column 'barename'. We'll have to use pd.to_datetime to change the values in 'date' from the MMDDYY format to something more usable.

In [None]:
sdt = full_df.barename.str.extract(r'^(?P<speaker>.+)_(?P<date>\d+)_(?P<title>.+)', expand=True)
full_df = pd.concat([full_df, sdt], axis=1)

# Convert dates to useable formats
full_df['olddatetime'] = pd.to_datetime(full_df['date'], format='%m%d%y', errors='coerce') # datetime format
full_df['datetime'] = full_df.olddatetime.astype(str).str.strip() # convert to string

# Sanity check
full_df.head()
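
As a minimal sketch of the date conversion, the example below parses a few made-up MMDDYY strings. With errors='coerce', anything that does not fit the format becomes NaT instead of raising an error.

```python
import pandas as pd

# Hypothetical date codes in MMDDYY form, plus one malformed value
dates = pd.Series(['013110', '120512', 'notadate'])

parsed = pd.to_datetime(dates, format='%m%d%y', errors='coerce')
as_text = parsed.astype(str)  # string form, as in the 'datetime' column
print(parsed)
```

It may be worth checking for NaT values after this step, since a coerced date would otherwise go unnoticed.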

# Section 6: Bin videos by time

In this section, we want to see if there are any formant patterns based on point of time within a video. We'll bin videos in two ways. First, 'bins' that are set at 20 seconds long, so that each video can have between 1 and 31 bins (31x20=620 seconds, and the longest video in the corpus is less than 10 minutes long).

In [None]:
# Create up to 31 bins of 20 seconds each for each video, grouping by 'barename'
binseq = full_df.groupby('barename').apply(
            lambda y: np.arange(0, 621, 20) + y.t1_p.min() - 0.01) # bin edges every 20 secs, starting just before the first vowel

Second, 'ebins' that splits every video into 30 bins of equal length. The bin size will depend on the length of the video, such that a 300-second (5-minute) video will have 30 bins of 10 seconds each, and a 120-second (2-minute) video will have 30 bins of 4 seconds each, etc. The cell below creates 'ebins' and 'bins'.

In [None]:
# Create a dataframe that cuts full_df into 30 bins for each video, grouping by 'barename'
ebins = pd.DataFrame(full_df.groupby('barename').apply(
    lambda q: pd.cut(q['t1_p'], bins=30, labels=False)))
ebins = ebins.rename(columns={'t1_p':'binidx'}) #ebins should have 30 bins per video

# Create a dataframe that cuts full_df into 20sec bins for each video, grouping by 'barename'
bins = pd.DataFrame(full_df.groupby('barename').apply(
    lambda r: pd.cut(r['t1_p'], bins=binseq.loc[r.name], labels=False)))
bins = bins.rename(columns={'t1_p':'binidx'}) #binseq should have variable bins per video

Because we've used .groupby, we'll have to reset the indexes (removing the 'barename' level), and then we'll assign 'bins' and 'ebins' to full_df as new columns.

In [None]:
bins = bins.reset_index(level=0)
ebins = ebins.reset_index(level=0)

full_df = full_df.assign(bins=bins.binidx)
full_df = full_df.assign(ebins=ebins.binidx)
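
The difference between the two binning schemes shows up clearly on a handful of values. The sketch below uses made-up vowel onsets from a single video and 3 equal-width bins instead of 30 to keep the output short.

```python
import numpy as np
import pandas as pd

# Hypothetical vowel onsets (seconds) within one video
t = pd.Series([0.5, 12.0, 19.9, 20.6, 45.0, 61.0])

# Fixed-width scheme: edges every 20 s, starting just before the first vowel
edges = np.arange(0, 621, 20) + t.min() - 0.01
fixed = pd.cut(t, bins=edges, labels=False)

# Equal-count-of-bins scheme: pandas picks the width from the range of the data
equal = pd.cut(t, bins=3, labels=False)

print(fixed.tolist())
print(equal.tolist())
```

Subtracting 0.01 from the first edge keeps the earliest vowel inside the first bin, because pd.cut intervals are open on the left by default.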

# Section 7: Normalize vowel formants
The scipy.stats module provides a zscore function. Called with axis=None, it pools all the selected columns and scores them together, so f0 at all three timepoints (25, 50, and 75%) is z-scored against one shared mean and standard deviation rather than column by column. We define a function that creates three new dataframes of z-scores for f0, F1, and F2, then concatenates them to the input dataframe (df).

In [None]:
from scipy import stats

def myzscore(df):
    z0 = pd.DataFrame(
        stats.zscore(
            df.loc[:,['f0_25','f0_50','f0_75']],
            axis=None),
    columns = ['f0_25_z','f0_50_z','f0_75_z'],
        index = df.index
    )
    z1 = pd.DataFrame(
        stats.zscore(
            df.loc[:,['f1_25','f1_50','f1_75']],
            axis=None),
    columns = ['f1_25_z','f1_50_z','f1_75_z'],
        index = df.index
    )
    z2 = pd.DataFrame(
        stats.zscore(
            df.loc[:,['f2_25','f2_50','f2_75']],
            axis=None),
    columns = ['f2_25_z','f2_50_z','f2_75_z'],
        index = df.index
    )
    df = pd.concat([df,z0,z1,z2],axis=1)
    return df

Apply the function to full_df, grouped by speaker. Only run this once, as .concat will throw an error if you run it multiple times. Also, .groupby creates a hierarchical index that will be dropped in the next cell.

In [None]:
full_df = full_df.groupby('speaker').apply(myzscore)

In [None]:
full_df = full_df.reset_index(level=0,drop=True)
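
To show what pooling across timepoints changes, the sketch below compares pooled z-scores with column-by-column z-scores on made-up f0 values. It re-implements the pooled calculation in NumPy as an assumption-labeled stand-in for stats.zscore with axis=None, using the population standard deviation (ddof=0), which is scipy's default.

```python
import numpy as np
import pandas as pd

# Hypothetical f0 values (Hz) for two vowel tokens at three timepoints
df = pd.DataFrame({'f0_25': [100.0, 110.0],
                   'f0_50': [120.0, 130.0],
                   'f0_75': [140.0, 150.0]})

# Pooled: one mean and one standard deviation over all six values
vals = df.to_numpy()
pooled = (vals - vals.mean()) / vals.std()

# Column by column: each timepoint is scored against its own mean and SD
percol = (df - df.mean()) / df.std(ddof=0)

print(pooled.round(3))
print(percol)
```

With pooled scores, the rise in f0 across the vowel is preserved; with per-column scores, every timepoint is recentered and that trajectory is lost.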

# Section 8: Calculate by-speaker and by-video means, standard deviations
Group full_df by speaker and vowel (either by phoneme ('ipa') or by phoneme in phonological context ('vcontext')) and calculate the mean and standard deviation of every numeric column. This will produce extraneous columns, such as the mean of 't1_p'.

In [None]:
# Calculate by-speaker means and standard deviations for each vowel
meanvowels_df = full_df.groupby(['speaker','ipa']).agg([np.mean,np.std])
meanvowels_df.columns = meanvowels_df.columns.map('_'.join)
#meanvowels_df.head()

# Same, but by vcontext (to separate AE from AEN, etc.)
meanvowels_context_df = full_df.groupby(['speaker','vcontext']).agg([np.mean,np.std])
meanvowels_context_df.columns = meanvowels_context_df.columns.map('_'.join)
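
Aggregating with two functions produces two-level column labels, which the `'_'.join` step flattens. Here is a minimal sketch on made-up F1 values; it selects a single numeric column explicitly and uses the string names 'mean' and 'std', which is an assumption of this example rather than the notebook's exact call.

```python
import pandas as pd

# Hypothetical midpoint F1 values for one vowel from two speakers
df = pd.DataFrame({'speaker': ['GN', 'GN', 'JV', 'JV'],
                   'ipa': ['æ', 'æ', 'æ', 'æ'],
                   'f1_50': [700.0, 740.0, 650.0, 670.0]})

summary = df.groupby(['speaker', 'ipa'])[['f1_50']].agg(['mean', 'std'])

# Columns are tuples such as ('f1_50', 'mean'); join each tuple into one name
summary.columns = summary.columns.map('_'.join)
print(summary)
```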

As a sanity check, the cell below calculates F2 and f0 values for some vowels for each speaker. Check to see if these all look reasonable!

In [None]:
print(meanvowels_context_df.loc[('GN',['UW','UWL','IY','AE','AEN','AA']),'f2_50_mean'])
print('*'*40)
print(meanvowels_context_df.loc[('JV',['UW','UWL','IY','AE','AEN','AA']),'f2_50_mean'])
print('*'*40)
print('GN average AE F2 at midpoint is', meanvowels_context_df.loc[('GN','AE'),'f2_50_mean'],
      'while average AEN F2 at midpoint is', meanvowels_context_df.loc[('GN','AEN'),'f2_50_mean'])
print('*'*40)

print('GN average f0 is', full_df.loc[full_df.speaker=='GN','f0_50'].mean()) # 'speaker' exists in full_df, not vowels_df
print('JV average f0 is', full_df.loc[full_df.speaker=='JV','f0_50'].mean())

Finally, we create a few "summary" dataframes from mean/standard deviation calculations on different groupings of the data in full_df. All of these will be exported, along with full_df.

In [None]:
# Calculate per-video means of all variables
video_mean = full_df.groupby(['datetime','speaker']).agg([np.mean, np.std])
video_mean.columns = video_mean.columns.map('_'.join)
# Drop rows in which every value is NaN (no tokens for that video)
video_mean = video_mean.dropna(axis=0, how='all')

video_mean_sepvowels = full_df.groupby(['datetime','speaker','ipa']).agg([np.mean, np.std])
video_mean_sepvowels.columns = video_mean_sepvowels.columns.map('_'.join)

video_mean_contextvowels = full_df.groupby(['datetime','speaker','vcontext']).agg([np.mean, np.std])
video_mean_contextvowels.columns = video_mean_contextvowels.columns.map('_'.join)

bin_mean = full_df.groupby(['datetime','speaker','bins']).agg([np.mean, np.std])
bin_mean.columns = bin_mean.columns.map('_'.join)

equal_bin_mean = full_df.groupby(['datetime','speaker','ebins']).agg([np.mean, np.std])
equal_bin_mean.columns = equal_bin_mean.columns.map('_'.join)

tword_mean = full_df.groupby(['datetime','speaker','t1_w']).agg([np.mean, np.std])
tword_mean.columns = tword_mean.columns.map('_'.join)

# Section 9: Export dataframes as csv files
UTF-8 encoding is important due to the use of IPA symbols in the dataframe.

In [None]:
tword_mean.to_csv('twordmean.csv', encoding='utf-8')
equal_bin_mean.to_csv('equalbinmean.csv', encoding='utf-8')
bin_mean.to_csv('binmean.csv', encoding='utf-8')
video_mean.to_csv('videomean.csv', encoding='utf-8')
video_mean_sepvowels.to_csv('videomean_sepvowels.csv', encoding='utf-8')
video_mean_contextvowels.to_csv('videomean_convowels.csv', encoding='utf-8')
meanvowels_df.to_csv('meanvowels.csv', encoding='utf-8')
meanvowels_context_df.to_csv('meanvowelscontext.csv', encoding='utf-8')
full_df.to_csv('full_df.csv', encoding='utf-8')
full_df.to_csv('vowels_df.csv', encoding='utf-8') # full_df.csv and vowels_df.csv will be the same

# Deprecated sections

In [None]:
### THIS SECTION IS NO LONGER NECESSARY ###
### get_timepoints function defined earlier already does all of this ###

# def get_timepoints(df):
#     ### To do later: Assert that all values in 'duration' are the same in df
#     vt50 = (df.duration.iloc[0] * 0.50) + df.t1_p.iloc[0]
#     vt25 = (df.duration.iloc[0] * 0.25) + df.t1_p.iloc[0]
#     vt75 = (df.duration.iloc[0] * 0.75) + df.t1_p.iloc[0]
#     # Get abs values of times - 25/50/75% times, choose index of minimum value (closest row) w/ argmin
#     vidx50 = df.iloc[(df.sec - vt50).abs().values.argmin()]
#     vidx25 = df.iloc[(df.sec - vt25).abs().values.argmin()] # If 2 minima, takes 1st idx/row
#     vidx75 = df.iloc[(df.sec - vt75).abs().values.argmin()] # If 2 minima, takes 1st idx/row
#     # Create a df from dictionary, keys from vidx.column name
#     dictionary = {'barename':df.barename.iloc[0],'speaker':df.speaker.iloc[0],
#                  'phone':df.phone.iloc[0],'ipa':df.ipa.iloc[0],'vcontext':df.vcontext.iloc[0],
#                  't1_p':df.t1_p.iloc[0],'t2_p':df.t2_p.iloc[0],
#                  'duration':df.duration.iloc[0],'norm_duration':df.norm_duration.iloc[0],
#                  'foll':df.foll.iloc[0],'prev':df.prev.iloc[0],
#                  'stress':df.stress.iloc[0],'rolling_stress_count':df.rolling_stress_count.iloc[0],
#                  'rolling_count':df.rolling_count.iloc[0],
#                  'word':df.word.iloc[0],'t1_w':df.t1_w.iloc[0],'t2_w':df.t2_w.iloc[0],
#                  'f1idx25':vidx25.f1,'f1idx50':vidx50.f1,'f1idx75':vidx75.f1,
#                  'f2idx25':vidx25.f2,'f2idx50':vidx50.f2,'f2idx75':vidx75.f2,
#                  'f0idx25':vidx25.f0,'f0idx50':vidx50.f0,'f0idx75':vidx75.f0,
#                  'f1idx25_z':vidx25.f1_z,'f1idx50_z':vidx50.f1_z,'f1idx75_z':vidx75.f1_z,
#                  'f2idx25_z':vidx25.f2_z,'f2idx50_z':vidx50.f2_z,'f2idx75_z':vidx75.f2_z,
#                  'f0idx25_z':vidx25.f0_z,'f0idx50_z':vidx50.f0_z,'f0idx75_z':vidx75.f0_z,
#                  'olddatetime':df.olddatetime.iloc[0],'datetime':df.datetime.iloc[0],
#                  'bins':df.bins.iloc[0],'ebins':df.ebins.iloc[0]}
#     newdf = pd.DataFrame([dictionary])
#     return newdf

In [None]:
### THIS SECTION IS NO LONGER NECESSARY ###
### z-scoring using scipy in section 7 ###

# # Calculate formant zscores for groups defined by num_part, speaker, and vowel.
# normcols = ['speaker']
# zscorecols = ['f1', 'f2', 'f0']
# zscore = lambda x: (x - x.mean()) / x.std()

# # Select columns of zscore interest, group, and calculate zscore for each group.
# zdf = full_df.loc[:, normcols + zscorecols].groupby(normcols).transform(zscore)
# zdf = zdf.rename(columns={'f1': 'f1_z', 'f2': 'f2_z', 'f0': 'f0_z'})

# # Verify that observations in zscored match observations in vowels_df.
# (zdf.index == full_df.index).all()

In [None]:
# Combine zscores with original formant measurements. (If you use .concat, only do it once!)
# full_df = pd.concat([v_df, zdf], axis=1)

In [None]:
### THIS SECTION IS NO LONGER NECESSARY ###
### We've moved sibilants to a different notebook. ###

# sibilants = ['s','ʃ','z','ʒ']
# sibilants_df = full_df[full_df.ipa.isin(sibilants)]

In [None]:
# s_video_mean = sibilants_df.groupby(['datetime','speaker','ipa']).agg([np.mean, np.std])
# s_video_mean.columns = s_video_mean.columns.map('_'.join)

# s_bin_mean = sibilants_df.groupby(['datetime','speaker','bins','ipa']).agg([np.mean, np.std])
# s_bin_mean.columns = s_bin_mean.columns.map('_'.join)

# s_equal_bin_mean = sibilants_df.groupby(['datetime','speaker','ebins','ipa']).agg([np.mean, np.std])
# s_equal_bin_mean.columns = s_equal_bin_mean.columns.map('_'.join)

# groupcols = ['speaker' ,'ipa']
# meansibilants_df = sibilants_df.groupby(groupcols).agg([np.mean, np.std])
# meansibilants_df.columns = meansibilants_df.columns.map('_'.join)

In [None]:
# # Export sibilant dfs as csv
# s_equal_bin_mean.to_csv('s_equalbinmean.csv', encoding='utf-8')
# s_bin_mean.to_csv('s_binmean.csv', encoding='utf-8')
# s_video_mean.to_csv('s_videomean.csv', encoding='utf-8')
# meansibilants_df.to_csv('meansibilants_df.csv', encoding='utf-8')
# sibilants_df.to_csv('sibilants_df.csv', encoding='utf-8')

In [None]:
### THIS SECTION IS NO LONGER NECESSARY ###
### But if you ever want to merge the entire ifc file with the entire tg files ###
### you should use this. Keeping for future reference. ###

# This reads in all of the ifc files in their entirety
# Columns to read from .ifc files
#usecols = ['sec','rms','f0', 'f1', 'f2', 'f3', 'f4']
# Compile a dataframe from all of the .ifc files.
#dfs = []
#counter = 0
#for ifcname in matchdf.ifcname:
#    if counter == 100: # For debugging, run only two iterations.
#        break
#    df = pd.read_table(os.path.join(wdifc, ifcname), usecols=usecols)
#    df = df.assign(ifcname=ifcname)
#    dfs.append(df)
#    counter += 1
#fullifcdf = pd.concat(dfs)
#fullifcdf = fullifcdf.assign(barename = fullifcdf.ifcname.str.replace('.ifc','')) # take out the extension

# Sanity check
#fullifcdf.tail()

In [None]:
# Read column from each .ifc file and merge on t1 with corresponding .textgrid file
#def mergeit(i, pw):
#    bn = i['barename'].values[0] # find barename value of argument 'i', assign it to 'b'
#    locb = pw.loc[pw.barename==bn,:] # select the words that have barename value 'b'
#    return pd.merge_asof(i, locb, left_on='sec', right_on='t1_p')

In [None]:
# Group by barename, apply the merge function
#full_df = fullifcdf.groupby('barename').apply(mergeit, pw=fulltgdf)
#full_df = full_df.drop('barename_x',axis=1).rename(columns={'barename_y':'barename'}) # how to prevent _x/_xy???

# Sanity check
#full_df.head()