# Working with phonetic dataframes

This notebook illustrates some commonly used operations on dataframes that contain phonetic labels. See the ['Using audiolabel' notebook](using_audiolabel.ipynb) for instructions on reading label files, such as Praat textgrids, into dataframes.

In [1]:
import os
import pandas as pd
from audiolabel import read_label

Load the tiers with `read_label`. The source textgrid has three tiers, as shown in the image; a few labels are not visible.

![Image of label tiers](this_is_a_label_file.png)

In [2]:
[phdf, wddf, ctxdf] = read_label(
    '../test/this_is_a_label_file.TextGrid',
    ftype='praat',
    tiers=['phone', 'word', 'context']
)

## Saving to a `.csv` file

If you need to work with your labels in a spreadsheet or in R, you can save your dataframe to a `.csv` file with `to_csv`. The index is rarely useful as a column, so pass `index=False` to omit it.

In [3]:
ctxdf.to_csv('context.csv', index=False)
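
As a quick check on what `index=False` does, the sketch below writes a small made-up dataframe (`toy_ctx`, not the notebook's `ctxdf`) to an in-memory buffer and reads it back. The buffer stands in for a file on disk, and the values are illustrative only.

```python
import io
import pandas as pd

# Toy stand-in for a 'context' tier dataframe (values are illustrative only).
toy_ctx = pd.DataFrame({
    't1': [0.0, 0.5],
    't2': [0.5, 1.0],
    'context': ['happy', 'sad'],
})

# Write to an in-memory buffer instead of a file so nothing is left on disk.
buf = io.StringIO()
toy_ctx.to_csv(buf, index=False)
csv_text = buf.getvalue()

# The header row contains only the data columns; no index column is written.
header = csv_text.splitlines()[0]

# Reading the text back recovers the same columns.
roundtrip = pd.read_csv(io.StringIO(csv_text))
print(csv_text)
```

Without `index=False` the header would start with an empty field for the unnamed index, which usually shows up as a stray first column in a spreadsheet.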

## Add a 'duration' column

Label durations are simply the difference between the `t1` and `t2` columns.

In [4]:
phdf['dur'] = phdf.t2 - phdf.t1
wddf['dur'] = wddf.t2 - wddf.t1
phdf

Unnamed: 0,t1,t2,phone,fname,dur
0,0.012472,0.192063,DH,../test/this_is_a_label_file.TextGrid,0.179592
1,0.192063,0.291837,IH1,../test/this_is_a_label_file.TextGrid,0.099773
2,0.291837,0.441497,S,../test/this_is_a_label_file.TextGrid,0.14966
3,0.441497,0.501361,IH1,../test/this_is_a_label_file.TextGrid,0.059864
4,0.501361,0.611111,Z,../test/this_is_a_label_file.TextGrid,0.109751
5,0.611111,0.660998,AH0,../test/this_is_a_label_file.TextGrid,0.049887
6,0.660998,0.80068,L,../test/this_is_a_label_file.TextGrid,0.139683
7,0.80068,0.970295,EY1,../test/this_is_a_label_file.TextGrid,0.169615
8,0.970295,1.000227,B,../test/this_is_a_label_file.TextGrid,0.029932
9,1.000227,1.030159,AH0,../test/this_is_a_label_file.TextGrid,0.029932
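
A duration column also makes it easy to sanity-check a tier. The following is a minimal sketch on an invented dataframe (`toy_ph`; the times are not taken from the textgrid above) that converts durations to milliseconds and confirms that every label has a positive duration.

```python
import pandas as pd

# Invented phone labels; t1 and t2 are start and end times in seconds.
toy_ph = pd.DataFrame({
    't1': [0.0, 0.25, 0.5],
    't2': [0.25, 0.5, 1.0],
    'phone': ['DH', 'IH1', 'S'],
})

# Duration in seconds, then in milliseconds.
toy_ph['dur'] = toy_ph.t2 - toy_ph.t1
toy_ph['dur_ms'] = toy_ph.dur * 1000

# A label whose end time is not after its start time would indicate a problem.
all_positive = (toy_ph.dur > 0).all()

# Boolean indexing selects labels longer than a chosen threshold (0.3 s here).
long_phones = toy_ph[toy_ph.dur > 0.3]
print(toy_ph)
```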


## Extracting columns from a string column

String columns can be parsed into additional variables with the [`str.extract` method](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.extract.html). In our 'phone' column the labels identify individual phones with an optional stress value, which we extract into 'barephone' and 'stress' columns. For convenience we also use `fillna` to ensure cells with missing values contain an empty string instead of NaN.

The names of the capture groups in the regular expression become the corresponding column names in the output.

In [5]:
phdf.phone.str.extract(r'(?P<barephone>[^\d]+)(?P<stress>\d*)').fillna('')

Unnamed: 0,barephone,stress
0,DH,
1,IH,1.0
2,S,
3,IH,1.0
4,Z,
5,AH,0.0
6,L,
7,EY,1.0
8,B,
9,AH,0.0
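
The extracted 'stress' values are strings. If you would rather treat stress as a number, one option is sketched below on a small invented series (`toy_labels`). The `stress_num` column name is an assumption made for this illustration; the conversion uses `pd.to_numeric` and the nullable `Int64` dtype so that unstressed phones remain missing rather than becoming 0.

```python
import pandas as pd

# Invented labels in the same style as the 'phone' column.
toy_labels = pd.Series(['DH', 'IH1', 'S', 'AH0'])

# Named capture groups become the column names of the result.
parts = toy_labels.str.extract(r'(?P<barephone>[^\d]+)(?P<stress>\d*)')

# Coerce empty strings to missing values, then store as nullable integers.
parts['stress_num'] = (
    pd.to_numeric(parts['stress'], errors='coerce').astype('Int64')
)
print(parts)
```

Keeping the original string column alongside the numeric one leaves both forms available for grouping and for arithmetic.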


Use [`pd.concat`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html) to add the extracted columns to `phdf`. The `axis='columns'` argument indicates that we are adding columns rather than rows; without it, `pd.concat` stacks rows by default.

In [6]:
phdf = pd.concat(
    [
        phdf,
        phdf.phone.str.extract(r'(?P<barephone>[^\d]+)(?P<stress>\d*)').fillna('')
    ],
    axis='columns'
)
phdf

Unnamed: 0,t1,t2,phone,fname,dur,barephone,stress
0,0.012472,0.192063,DH,../test/this_is_a_label_file.TextGrid,0.179592,DH,
1,0.192063,0.291837,IH1,../test/this_is_a_label_file.TextGrid,0.099773,IH,1.0
2,0.291837,0.441497,S,../test/this_is_a_label_file.TextGrid,0.14966,S,
3,0.441497,0.501361,IH1,../test/this_is_a_label_file.TextGrid,0.059864,IH,1.0
4,0.501361,0.611111,Z,../test/this_is_a_label_file.TextGrid,0.109751,Z,
5,0.611111,0.660998,AH0,../test/this_is_a_label_file.TextGrid,0.049887,AH,0.0
6,0.660998,0.80068,L,../test/this_is_a_label_file.TextGrid,0.139683,L,
7,0.80068,0.970295,EY1,../test/this_is_a_label_file.TextGrid,0.169615,EY,1.0
8,0.970295,1.000227,B,../test/this_is_a_label_file.TextGrid,0.029932,B,
9,1.000227,1.030159,AH0,../test/this_is_a_label_file.TextGrid,0.029932,AH,0.0


## Including preceding/following labels

The [`shift` method](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.shift.html) can be used to shift label values by one or more rows. Use this to add surrounding context (e.g. previous/next phone) to the labels.

The following cell shifts the phones down one row. The shift inserts NaN into the first row, which we fill with an empty string.

In [7]:
phdf.barephone.shift(1).fillna('')

0       
1     DH
2     IH
3      S
4     IH
5      Z
6     AH
7      L
8     EY
9      B
10    AH
11     L
12     F
13    AY
14     L
Name: barephone, dtype: object

Negative shifts move the values up. Now the last row is filled with an empty string.

In [8]:
phdf.barephone.shift(-1).fillna('')

0     IH
1      S
2     IH
3      Z
4     AH
5      L
6     EY
7      B
8     AH
9      L
10     F
11    AY
12     L
13    sp
14      
Name: barephone, dtype: object

Assign the `shift` values to new columns to create the phone context.

In [9]:
phdf['next'] = phdf.barephone.shift(-1).fillna('')
phdf['prev'] = phdf.barephone.shift(1).fillna('')
phdf

Unnamed: 0,t1,t2,phone,fname,dur,barephone,stress,next,prev
0,0.012472,0.192063,DH,../test/this_is_a_label_file.TextGrid,0.179592,DH,,IH,
1,0.192063,0.291837,IH1,../test/this_is_a_label_file.TextGrid,0.099773,IH,1.0,S,DH
2,0.291837,0.441497,S,../test/this_is_a_label_file.TextGrid,0.14966,S,,IH,IH
3,0.441497,0.501361,IH1,../test/this_is_a_label_file.TextGrid,0.059864,IH,1.0,Z,S
4,0.501361,0.611111,Z,../test/this_is_a_label_file.TextGrid,0.109751,Z,,AH,IH
5,0.611111,0.660998,AH0,../test/this_is_a_label_file.TextGrid,0.049887,AH,0.0,L,Z
6,0.660998,0.80068,L,../test/this_is_a_label_file.TextGrid,0.139683,L,,EY,AH
7,0.80068,0.970295,EY1,../test/this_is_a_label_file.TextGrid,0.169615,EY,1.0,B,L
8,0.970295,1.000227,B,../test/this_is_a_label_file.TextGrid,0.029932,B,,AH,EY
9,1.000227,1.030159,AH0,../test/this_is_a_label_file.TextGrid,0.029932,AH,0.0,L,B
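
A plain `shift` treats the whole dataframe as one sequence. If a dataframe holds labels from more than one file, the first phone of one file would pick up the last phone of the previous file as its 'prev' value. A possible way to avoid this, sketched here on invented data (`toy_multi` and its file names are hypothetical), is to shift within groups defined by the 'fname' column.

```python
import pandas as pd

# Invented labels from two hypothetical files, already stacked in one dataframe.
toy_multi = pd.DataFrame({
    'fname': ['a.TextGrid', 'a.TextGrid', 'b.TextGrid', 'b.TextGrid'],
    'barephone': ['DH', 'IH', 'S', 'Z'],
})

# Shifting inside each 'fname' group keeps context from crossing file boundaries.
grouped = toy_multi.groupby('fname')['barephone']
toy_multi['prev'] = grouped.shift(1).fillna('')
toy_multi['next'] = grouped.shift(-1).fillna('')
print(toy_multi)
```

Here 'S' gets an empty 'prev' value because it is the first label in its file, even though 'IH' precedes it in the stacked dataframe.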


<a name="merging-tiers"></a>
## Merging tiers

It can be useful to merge tiers based on the starting time of the labels (`t1`). For instance, you can add the 'word' metadata to the 'phone' with [`merge_asof`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge_asof.html).

Phonetic tiers often have labels of different sizes, and `merge_asof` works best when the left dataframe holds the shorter labels. Here several 'phone' labels make up each 'word' label, so `phdf` is the left dataframe, that is, the first argument to `merge_asof`.

**The examples shown here assume the tiers are strictly hierarchical, meaning each 'phone' belongs to one 'word' only, and each 'word' to one 'context' only.** If you need to merge non-hierarchical tiers check the `merge_asof` documentation to determine how to handle your data.

Use tests like the following to ensure that a strict hierarchy exists. The `assert` statements check that every start and end time in a containing tier matches a start or end time, respectively, in the contained tier. If either test fails, your tiers are not strictly hierarchical.

In [10]:
# words contain phones
assert(
    ((wddf.t1.isin(phdf.t1)) & (wddf.t2.isin(phdf.t2))).all()
)
# contexts contain words
assert(
    ((ctxdf.t1.isin(wddf.t1)) & (ctxdf.t2.isin(wddf.t2))).all()
)
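
When an assertion like these fails, it helps to know which labels are responsible. The helper below, `boundary_mismatches`, is a hypothetical function written for this illustration (it is not part of `audiolabel`); it applies the same `isin` test and returns the offending rows. The toy tiers are invented.

```python
import pandas as pd

def boundary_mismatches(outer, inner):
    """Return rows of `outer` whose t1 or t2 is not a boundary in `inner`."""
    ok = outer.t1.isin(inner.t1) & outer.t2.isin(inner.t2)
    return outer[~ok]

# Invented tiers: the phones tile the interval from 0.0 to 1.0.
toy_phones = pd.DataFrame({'t1': [0.0, 0.25, 0.5], 't2': [0.25, 0.5, 1.0]})
good_words = pd.DataFrame({'t1': [0.0, 0.5], 't2': [0.5, 1.0]})
# The second word ends at 0.9, which is not a phone boundary.
bad_words = pd.DataFrame({'t1': [0.0, 0.5], 't2': [0.5, 0.9]})

good_result = boundary_mismatches(good_words, toy_phones)
bad_result = boundary_mismatches(bad_words, toy_phones)
print(bad_result)
```

Note that `isin` compares floating-point times exactly, so boundaries that differ by even a tiny amount are reported as mismatches.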

In [11]:
phwddf = pd.merge_asof(
    phdf,
    wddf.drop('fname', axis='columns'),  # Do not need multiple 'fname' columns in the result
    on='t1',
    # Add appropriate suffixes to repeated column names (t2, dur)
    suffixes=('_ph', '_wd')
)
phwddf

Unnamed: 0,t1,t2_ph,phone,fname,dur_ph,barephone,stress,next,prev,t2_wd,word,dur_wd
0,0.012472,0.192063,DH,../test/this_is_a_label_file.TextGrid,0.179592,DH,,IH,,0.441497,THIS,0.429025
1,0.192063,0.291837,IH1,../test/this_is_a_label_file.TextGrid,0.099773,IH,1.0,S,DH,0.441497,THIS,0.429025
2,0.291837,0.441497,S,../test/this_is_a_label_file.TextGrid,0.14966,S,,IH,IH,0.441497,THIS,0.429025
3,0.441497,0.501361,IH1,../test/this_is_a_label_file.TextGrid,0.059864,IH,1.0,Z,S,0.611111,IS,0.169615
4,0.501361,0.611111,Z,../test/this_is_a_label_file.TextGrid,0.109751,Z,,AH,IH,0.611111,IS,0.169615
5,0.611111,0.660998,AH0,../test/this_is_a_label_file.TextGrid,0.049887,AH,0.0,L,Z,0.660998,A,0.049887
6,0.660998,0.80068,L,../test/this_is_a_label_file.TextGrid,0.139683,L,,EY,AH,1.139909,LABEL,0.478912
7,0.80068,0.970295,EY1,../test/this_is_a_label_file.TextGrid,0.169615,EY,1.0,B,L,1.139909,LABEL,0.478912
8,0.970295,1.000227,B,../test/this_is_a_label_file.TextGrid,0.029932,B,,AH,EY,1.139909,LABEL,0.478912
9,1.000227,1.030159,AH0,../test/this_is_a_label_file.TextGrid,0.029932,AH,0.0,L,B,1.139909,LABEL,0.478912
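
To see how the matching works on a smaller scale, here is a minimal sketch with invented tiers (`toy_phones` and `toy_words` are not the notebook's data). With its default settings `merge_asof` matches each left row to the last right row whose `t1` is less than or equal to the left row's `t1`, so every phone picks up the word that starts at or before it.

```python
import pandas as pd

# Invented tiers; both must be sorted by the merge key 't1'.
toy_phones = pd.DataFrame({
    't1': [0.0, 0.25, 0.5, 0.75],
    't2': [0.25, 0.5, 0.75, 1.0],
    'phone': ['DH', 'IH1', 'S', 'Z'],
})
toy_words = pd.DataFrame({
    't1': [0.0, 0.5],
    't2': [0.5, 1.0],
    'word': ['THIS', 'IS'],
})

# The shorter labels (phones) go on the left; 't2' appears in both inputs,
# so it receives the suffixes.
toy_merged = pd.merge_asof(
    toy_phones, toy_words, on='t1', suffixes=('_ph', '_wd')
)

# In a strict hierarchy no phone ends after the word it was matched to.
phones_fit = (toy_merged.t2_ph <= toy_merged.t2_wd).all()
print(toy_merged)
```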


Finally, we merge the labels from the 'context' tier. Since the 't2' column name from `ctxdf` does not match any column name in `phwddf`, it won't get an automatic suffix, so we rename the 't2' column to add the '\_ctx' suffix before merging.

In [12]:
pwcdf = pd.merge_asof(
    phwddf,
    ctxdf.drop('fname', axis='columns').rename({'t2': 't2_ctx'}, axis='columns'),
    on='t1'
)
pwcdf

Unnamed: 0,t1,t2_ph,phone,fname,dur_ph,barephone,stress,next,prev,t2_wd,word,dur_wd,t2_ctx,context
0,0.012472,0.192063,DH,../test/this_is_a_label_file.TextGrid,0.179592,DH,,IH,,0.441497,THIS,0.429025,0.611111,happy
1,0.192063,0.291837,IH1,../test/this_is_a_label_file.TextGrid,0.099773,IH,1.0,S,DH,0.441497,THIS,0.429025,0.611111,happy
2,0.291837,0.441497,S,../test/this_is_a_label_file.TextGrid,0.14966,S,,IH,IH,0.441497,THIS,0.429025,0.611111,happy
3,0.441497,0.501361,IH1,../test/this_is_a_label_file.TextGrid,0.059864,IH,1.0,Z,S,0.611111,IS,0.169615,0.611111,happy
4,0.501361,0.611111,Z,../test/this_is_a_label_file.TextGrid,0.109751,Z,,AH,IH,0.611111,IS,0.169615,0.611111,happy
5,0.611111,0.660998,AH0,../test/this_is_a_label_file.TextGrid,0.049887,AH,0.0,L,Z,0.660998,A,0.049887,1.139909,sad
6,0.660998,0.80068,L,../test/this_is_a_label_file.TextGrid,0.139683,L,,EY,AH,1.139909,LABEL,0.478912,1.139909,sad
7,0.80068,0.970295,EY1,../test/this_is_a_label_file.TextGrid,0.169615,EY,1.0,B,L,1.139909,LABEL,0.478912,1.139909,sad
8,0.970295,1.000227,B,../test/this_is_a_label_file.TextGrid,0.029932,B,,AH,EY,1.139909,LABEL,0.478912,1.139909,sad
9,1.000227,1.030159,AH0,../test/this_is_a_label_file.TextGrid,0.029932,AH,0.0,L,B,1.139909,LABEL,0.478912,1.139909,sad


## Getting summary statistics

Pandas has several features for calculating [summary statistics](https://pandas.pydata.org/docs/getting_started/intro_tutorials/06_calculate_statistics.html). Some of them are illustrated below.

### Aggregating statistics

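#### Mean durations

Passing `numeric_only=True` restricts the calculation to numeric columns. String columns such as 'stress' have no meaningful mean, and recent versions of pandas raise an error if they are included.

In [13]:
pwcdf.mean(numeric_only=True)

t1        0.802011
t2_ph     0.911096
dur_ph    0.109085
t2_wd     1.029494
dur_wd    0.370491
t2_ctx    1.099335
dtype: float64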

#### Median durations

In [14]:
pwcdf.median(numeric_only=True)

t1        0.800680
t2_ph     0.970295
dur_ph    0.109751
t2_wd     1.139909
dur_wd    0.478912
t2_ctx    1.139909
dtype: float64

#### `describe`

In [15]:
pwcdf.describe()

Unnamed: 0,t1,t2_ph,dur_ph,t2_wd,dur_wd,t2_ctx
count,15.0,15.0,15.0,15.0,15.0,15.0
mean,0.802011,0.911096,0.109085,1.029494,0.370491,1.099335
std,0.472933,0.466478,0.061674,0.466372,0.172683,0.414826
min,0.012472,0.192063,0.019955,0.441497,0.019955,0.611111
25%,0.471429,0.556236,0.054875,0.611111,0.29932,0.611111
50%,0.80068,0.970295,0.109751,1.139909,0.478912,1.139909
75%,1.085034,1.199773,0.144671,1.384354,0.478912,1.394331
max,1.628798,1.648753,0.229478,1.648753,0.488889,1.648753


### Aggregating statistics by group

#### Mean durations by context

In [16]:
pwcdf.groupby('context').mean(numeric_only=True)

Unnamed: 0_level_0,t1,t2_ph,dur_ph,t2_wd,dur_wd,t2_ctx
context,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
happy,0.772965,0.896019,0.123054,1.009095,0.345881,1.072285
sad,0.845578,0.933711,0.088133,1.060091,0.407407,1.139909


#### Mean durations by context and phone

In [17]:
pwcdf.groupby(['context', 'phone']).mean(numeric_only=True)

Unnamed: 0_level_0,Unnamed: 1_level_0,t1,t2_ph,dur_ph,t2_wd,dur_wd,t2_ctx
context,phone,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
happy,AY1,1.259637,1.489116,0.229478,1.628798,0.488889,1.648753
happy,DH,0.012472,0.192063,0.179592,0.441497,0.429025,0.611111
happy,F,1.139909,1.259637,0.119728,1.628798,0.488889,1.648753
happy,IH1,0.31678,0.396599,0.079819,0.526304,0.29932,0.611111
happy,L,1.489116,1.628798,0.139683,1.628798,0.488889,1.648753
happy,S,0.291837,0.441497,0.14966,0.441497,0.429025,0.611111
happy,Z,0.501361,0.611111,0.109751,0.611111,0.169615,0.611111
happy,sp,1.628798,1.648753,0.019955,1.648753,0.019955,1.648753
sad,AH0,0.805669,0.845578,0.039909,0.900454,0.264399,1.139909
sad,B,0.970295,1.000227,0.029932,1.139909,0.478912,1.139909


#### `describe` by categories

In [18]:
pwcdf.groupby(['context', 'phone']).describe()

Unnamed: 0_level_0,Unnamed: 1_level_0,t1,t1,t1,t1,t1,t1,t1,t1,t2_ph,t2_ph,...,dur_wd,dur_wd,t2_ctx,t2_ctx,t2_ctx,t2_ctx,t2_ctx,t2_ctx,t2_ctx,t2_ctx
Unnamed: 0_level_1,Unnamed: 1_level_1,count,mean,std,min,25%,50%,75%,max,count,mean,...,75%,max,count,mean,std,min,25%,50%,75%,max
context,phone,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2,Unnamed: 22_level_2
happy,AY1,1.0,1.259637,,1.259637,1.259637,1.259637,1.259637,1.259637,1.0,1.489116,...,0.488889,0.488889,1.0,1.648753,,1.648753,1.648753,1.648753,1.648753,1.648753
happy,DH,1.0,0.012472,,0.012472,0.012472,0.012472,0.012472,0.012472,1.0,0.192063,...,0.429025,0.429025,1.0,0.611111,,0.611111,0.611111,0.611111,0.611111,0.611111
happy,F,1.0,1.139909,,1.139909,1.139909,1.139909,1.139909,1.139909,1.0,1.259637,...,0.488889,0.488889,1.0,1.648753,,1.648753,1.648753,1.648753,1.648753,1.648753
happy,IH1,2.0,0.31678,0.176376,0.192063,0.254422,0.31678,0.379138,0.441497,2.0,0.396599,...,0.364172,0.429025,2.0,0.611111,0.0,0.611111,0.611111,0.611111,0.611111,0.611111
happy,L,1.0,1.489116,,1.489116,1.489116,1.489116,1.489116,1.489116,1.0,1.628798,...,0.488889,0.488889,1.0,1.648753,,1.648753,1.648753,1.648753,1.648753,1.648753
happy,S,1.0,0.291837,,0.291837,0.291837,0.291837,0.291837,0.291837,1.0,0.441497,...,0.429025,0.429025,1.0,0.611111,,0.611111,0.611111,0.611111,0.611111,0.611111
happy,Z,1.0,0.501361,,0.501361,0.501361,0.501361,0.501361,0.501361,1.0,0.611111,...,0.169615,0.169615,1.0,0.611111,,0.611111,0.611111,0.611111,0.611111,0.611111
happy,sp,1.0,1.628798,,1.628798,1.628798,1.628798,1.628798,1.628798,1.0,1.648753,...,0.019955,0.019955,1.0,1.648753,,1.648753,1.648753,1.648753,1.648753,1.648753
sad,AH0,2.0,0.805669,0.275146,0.611111,0.70839,0.805669,0.902948,1.000227,2.0,0.845578,...,0.371655,0.478912,2.0,1.139909,0.0,1.139909,1.139909,1.139909,1.139909,1.139909
sad,B,1.0,0.970295,,0.970295,0.970295,0.970295,0.970295,0.970295,1.0,1.000227,...,0.478912,0.478912,1.0,1.139909,,1.139909,1.139909,1.139909,1.139909,1.139909


### Count records by category

In [19]:
pwcdf['phone'].value_counts()

L      3
IH1    2
AH0    2
sp     1
AY1    1
DH     1
S      1
B      1
F      1
EY1    1
Z      1
Name: phone, dtype: int64

## Combining multiple label files

Another common way to combine dataframes is to concatenate the labels from multiple files. Normally you will do this after you have merged the tiers from each label file.

We'll start by observing what happens when you use `pd.concat` to add the `ctxdf` dataframe to itself. By default the rows of the input dataframes are stacked. Note that the index has repeated values.

In [20]:
pd.concat([ctxdf, ctxdf])

Unnamed: 0,t1,t2,context,fname
0,0.012472,0.611111,happy,../test/this_is_a_label_file.TextGrid
1,0.611111,1.139909,sad,../test/this_is_a_label_file.TextGrid
2,1.139909,1.648753,happy,../test/this_is_a_label_file.TextGrid
0,0.012472,0.611111,happy,../test/this_is_a_label_file.TextGrid
1,0.611111,1.139909,sad,../test/this_is_a_label_file.TextGrid
2,1.139909,1.648753,happy,../test/this_is_a_label_file.TextGrid


To clean up the index we add `ignore_index=True`. Now the index values are consecutive integers.

In [21]:
pd.concat([ctxdf, ctxdf], ignore_index=True)

Unnamed: 0,t1,t2,context,fname
0,0.012472,0.611111,happy,../test/this_is_a_label_file.TextGrid
1,0.611111,1.139909,sad,../test/this_is_a_label_file.TextGrid
2,1.139909,1.648753,happy,../test/this_is_a_label_file.TextGrid
3,0.012472,0.611111,happy,../test/this_is_a_label_file.TextGrid
4,0.611111,1.139909,sad,../test/this_is_a_label_file.TextGrid
5,1.139909,1.648753,happy,../test/this_is_a_label_file.TextGrid


### Add label files by iteration

`pd.concat` can combine dataframes from a list of arbitrary length. An efficient way to construct a master dataframe from a large set of input label files is to iterate over the input filenames, build a list containing one dataframe per textgrid, and then stack them with a single call to `pd.concat`. (This is faster than growing the master dataframe incrementally by calling `pd.concat` every time a new textgrid is loaded.)

We start with a set of label file names in the form of a dataframe, as defined in the next cell. An easy way to construct a similar dataframe from a directory tree is provided by the [`dir2df` function](https://github.com/rsprouse/phonlab/blob/master/doc/Retrieving%20filenames%20in%20a%20directory%20tree%20with%20%60dir2df()%60.ipynb).

In [22]:
tgdf = pd.DataFrame({
    'relpath': '../test',
    'fname': ['this_is_a_label_file.TextGrid', 'this_is_a_label_file_scaled.TextGrid']
})
tgdf

Unnamed: 0,relpath,fname
0,../test,this_is_a_label_file.TextGrid
1,../test,this_is_a_label_file_scaled.TextGrid


Next we define a function that reads a textgrid from a row of `tgdf` and returns a dataframe that was merged using the techniques described in the ['Merging tiers' section](#merging-tiers) above.

If you wish, you can add further metadata, such as duration, as you load the textgrid tiers in the `tg2df` function. The function below adds phone durations.

In [23]:
def tg2df(row):
    '''Load 'phone', 'word', and 'context' tiers from a textgrid and merge them.

    Parameters
    ----------
    row : namedtuple
        A namedtuple as provided by `itertuples` that can be used to load a Praat
        textgrid from a path identified by row.relpath and row.fname. The textgrid
        is expected to have 'phone', 'word', and 'context' tiers.

    Returns
    -------
    mergedf : pandas.DataFrame
        The merged dataframe.
    '''
    [phdf, wddf, ctxdf] = read_label(
        os.path.join(row.relpath, row.fname),
        ftype='praat',
        tiers=['phone', 'word', 'context']
    )
    # Throw an error if tiers are not strictly hierarchical.
    # words contain phones
    assert(
        ((wddf.t1.isin(phdf.t1)) & (wddf.t2.isin(phdf.t2))).all()
    )
    # contexts contain words
    assert(
        ((ctxdf.t1.isin(wddf.t1)) & (ctxdf.t2.isin(wddf.t2))).all()
    )
    phdf['dur_ph'] = phdf.t2 - phdf.t1
    # Merge phone and word tiers.
    phwddf = pd.merge_asof(
        phdf,
        wddf.drop('fname', axis='columns'),
        on='t1',
        suffixes=('_ph', '_wd')
    )
    # Merge context tier and return the result.
    return pd.merge_asof(
        phwddf,
        ctxdf.drop('fname', axis='columns').rename({'t2': 't2_ctx'}, axis='columns'),
        on='t1'
    )

The [`itertuples` method](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.itertuples.html) iterates over the rows of a dataframe. We can use it to apply the `tg2df` function to each textgrid in `tgdf` and compile the results into a list of dataframes. 

In [24]:
dflist = [tg2df(row) for row in tgdf.itertuples()]
dflist

[          t1     t2_ph phone                                  fname    dur_ph  \
 0   0.012472  0.192063    DH  ../test/this_is_a_label_file.TextGrid  0.179592   
 1   0.192063  0.291837   IH1  ../test/this_is_a_label_file.TextGrid  0.099773   
 2   0.291837  0.441497     S  ../test/this_is_a_label_file.TextGrid  0.149660   
 3   0.441497  0.501361   IH1  ../test/this_is_a_label_file.TextGrid  0.059864   
 4   0.501361  0.611111     Z  ../test/this_is_a_label_file.TextGrid  0.109751   
 5   0.611111  0.660998   AH0  ../test/this_is_a_label_file.TextGrid  0.049887   
 6   0.660998  0.800680     L  ../test/this_is_a_label_file.TextGrid  0.139683   
 7   0.800680  0.970295   EY1  ../test/this_is_a_label_file.TextGrid  0.169615   
 8   0.970295  1.000227     B  ../test/this_is_a_label_file.TextGrid  0.029932   
 9   1.000227  1.030159   AH0  ../test/this_is_a_label_file.TextGrid  0.029932   
 10  1.030159  1.139909     L  ../test/this_is_a_label_file.TextGrid  0.109751   
 11  1.139909  1

`pd.concat` stacks the dataframes from each of the textgrids. The `ignore_index=True` argument ensures that the index of the combined dataframe has no repeated values. Otherwise the index would have repetitions starting with 0 for each input textgrid.

In [25]:
alldf = pd.concat(dflist, ignore_index=True)
alldf

Unnamed: 0,t1,t2_ph,phone,fname,dur_ph,t2_wd,word,t2_ctx,context
0,0.012472,0.192063,DH,../test/this_is_a_label_file.TextGrid,0.179592,0.441497,THIS,0.611111,happy
1,0.192063,0.291837,IH1,../test/this_is_a_label_file.TextGrid,0.099773,0.441497,THIS,0.611111,happy
2,0.291837,0.441497,S,../test/this_is_a_label_file.TextGrid,0.14966,0.441497,THIS,0.611111,happy
3,0.441497,0.501361,IH1,../test/this_is_a_label_file.TextGrid,0.059864,0.611111,IS,0.611111,happy
4,0.501361,0.611111,Z,../test/this_is_a_label_file.TextGrid,0.109751,0.611111,IS,0.611111,happy
5,0.611111,0.660998,AH0,../test/this_is_a_label_file.TextGrid,0.049887,0.660998,A,1.139909,sad
6,0.660998,0.80068,L,../test/this_is_a_label_file.TextGrid,0.139683,1.139909,LABEL,1.139909,sad
7,0.80068,0.970295,EY1,../test/this_is_a_label_file.TextGrid,0.169615,1.139909,LABEL,1.139909,sad
8,0.970295,1.000227,B,../test/this_is_a_label_file.TextGrid,0.029932,1.139909,LABEL,1.139909,sad
9,1.000227,1.030159,AH0,../test/this_is_a_label_file.TextGrid,0.029932,1.139909,LABEL,1.139909,sad
