# Prepare .phn files for conversion to TextGrids

The DIMEx100 corpus provides segmented and labeled .phn files for the phone and word tiers of each .wav file. However, times are given in milliseconds rather than seconds, and the files contain headers that the Praat scripts cannot read when creating a TextGrid from each .phn file. This notebook processes the .phn files so that TextGrids can be generated.

In [28]:
import pandas as pd
import os
import csv

In [44]:
# find current working directory
os.getcwd()

'E:\\cbas_dime\\male'

In [43]:
# change cwd to folder containing .phn files
os.chdir("E:\\cbas_dime\\male")

In [45]:
# loop over each folder of .phn files (word tier and phone tier),
# then over every file in that folder
# note: each file is overwritten in place, so run this cell only once

base_dir = "E:\\cbas_dime\\male"
folders = ["phn_word", "phn_phone"]

for f in folders:
    folder = os.path.join(base_dir, f)
    for file in os.listdir(folder):
        path = os.path.join(folder, file)

        # read the file: whitespace-delimited, first 2 rows treated as header
        # (sep=r"\s+" replaces the deprecated delim_whitespace=True)
        phn = pd.read_csv(path, sep=r"\s+", header=[0, 1])

        # move the index back into a regular column
        phn = phn.reset_index()

        # rename cols: "0" = start time, "1" = end time, "2" = label
        phn.columns = ["0", "1", "2"]

        # convert start and end times from milliseconds to seconds
        phn["0"] = phn["0"].apply(lambda x: int(x) / 1000)
        phn["1"] = phn["1"].apply(lambda x: int(x) / 1000)

        # save over the original file, without header or index
        phn.to_csv(path, sep=" ", quoting=csv.QUOTE_NONE, escapechar="\\", index=False, header=False)
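
To make the transformation easier to check by eye, here is a minimal sketch of the same idea on an in-memory string, using only the standard library. It assumes a .phn file consists of two header lines followed by rows of the form `start end label`. The function name `phn_ms_to_seconds`, the `header_lines` parameter, and the sample text are all hypothetical, and the real DIMEx100 headers and labels may look different.

```python
def phn_ms_to_seconds(text, header_lines=2):
    """Drop the header lines and convert start/end times from ms to s.

    Assumes each data row is 'start end label', separated by whitespace.
    """
    rows = []
    for line in text.splitlines()[header_lines:]:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        start_ms, end_ms = parts[0], parts[1]
        label = " ".join(parts[2:])
        rows.append(f"{float(start_ms) / 1000} {float(end_ms) / 1000} {label}")
    return "\n".join(rows)


# Hypothetical sample; not taken from the corpus.
sample = """HEADER LINE 1
HEADER LINE 2
0 125 s
125 310 a
"""

converted = phn_ms_to_seconds(sample)
print(converted)
```

Unlike the cell above, this sketch returns the converted text instead of overwriting a file, so it can be run repeatedly on the same input without dividing the times twice.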

These files are then converted to TextGrids using a Praat script by Daniel Hirst (2010), adapted by Annie Helms (2020). A second Praat script (Lennes, 2013) is then used to extract the formants F1, F2, and F3 for each phone in the TextGrids.