# Five Minutes with AI: Speech Analysis with Praat

This week, we will use Praat to analyze speech data. [Praat](https://www.fon.hum.uva.nl/praat/) is software for extracting phonetic information from speech recordings. It comes as a desktop app with a graphical user interface and many features.


## What do we like about Praat?

* It is very easy to use. One can download the app from the [Praat website](https://www.fon.hum.uva.nl/praat/) and analyze audio files within minutes.
* It provides basic features such as pitch (how high or low the voice is), intensity (loudness), speaking rate (syllables per second), and pauses, along with advanced features such as labeling and segmentation.

## What don't we like about Praat?

* Analyzing a large number of audio files with Praat requires you to choose some parameters.

With the [Parselmouth](https://parselmouth.readthedocs.io/en/stable/) package, we can call Praat from Python, which makes it practical to analyze a large number of audio files.

## First, let's install the package

In [1]:
!pip install praat-parselmouth

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting praat-parselmouth
  Downloading praat_parselmouth-0.4.3-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (10.7 MB)
Installing collected packages: praat-parselmouth
Successfully installed praat-parselmouth-0.4.3


In [2]:
import sys
!{sys.executable} -m pip install praat-parselmouth
import os
import csv
import subprocess
import parselmouth
from glob import glob
from parselmouth.praat import call
import math
import pandas

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
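
Before the full script, a minimal sketch of what Parselmouth exposes may be useful. It assumes a hypothetical file `example.wav`; the helper name `pitch_and_intensity` is ours and not part of the library. It illustrates that pitch (in Hz) and intensity (in dB) are two separate measurements.

```python
def pitch_and_intensity(path):
    """Return duration (s), mean pitch (Hz), and mean intensity (dB) of one file."""
    # Imported inside the function so that the definition loads even
    # before praat-parselmouth has been installed.
    import parselmouth
    from parselmouth.praat import call

    sound = parselmouth.Sound(path)
    pitch = sound.to_pitch()          # fundamental frequency contour
    intensity = sound.to_intensity()  # intensity contour in dB (relates to loudness)

    return {
        "duration_s": sound.get_total_duration(),
        # In Praat commands, a time range of 0, 0 selects the whole recording.
        "mean_pitch_hz": call(pitch, "Get mean", 0, 0, "Hertz"),
        "mean_intensity_db": call(intensity, "Get mean", 0, 0, "energy"),
    }

# Usage (hypothetical path):
# print(pitch_and_intensity("example.wav"))
```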


## This is a code block we have written to extract speaking-rate information from an audio file, so it can later be exported to a table for analysis.

In [3]:
def speech_rate(filename, silencedb = -25, mindip = 2, minpause = 0.3):
    '''
    The following code has been adapted from https://osf.io/r8jau to use Parselmouth to extract the speaking rate.
    '''
    sound = parselmouth.Sound(filename)
    originaldur = sound.get_total_duration()
    intensity = sound.to_intensity(50)
    start = call(intensity, "Get time from frame number", 1)
    nframes = call(intensity, "Get number of frames")
    end = call(intensity, "Get time from frame number", nframes)
    min_intensity = call(intensity, "Get minimum", 0, 0, "Parabolic")
    max_intensity = call(intensity, "Get maximum", 0, 0, "Parabolic")

    # get .99 quantile to get maximum (without influence of non-speech sound bursts)
    max_99_intensity = call(intensity, "Get quantile", 0, 0, 0.99)

    # estimate Intensity threshold
    threshold = max_99_intensity + silencedb
    threshold2 = max_intensity - max_99_intensity
    threshold3 = silencedb - threshold2
    if threshold < min_intensity:
        threshold = min_intensity

    # get pauses (silences) and speakingtime
    textgrid = call(intensity, "To TextGrid (silences)", threshold3, minpause, 0.1, "silent", "sounding")
    silencetier = call(textgrid, "Extract tier", 1)
    silencetable = call(silencetier, "Down to TableOfReal", "sounding")
    npauses = call(silencetable, "Get number of rows")
    speakingtot = 0
    for ipause in range(npauses):
        pause = ipause + 1
        beginsound = call(silencetable, "Get value", pause, 1)
        endsound = call(silencetable, "Get value", pause, 2)
        speakingdur = endsound - beginsound
        speakingtot += speakingdur

    intensity_matrix = call(intensity, "Down to Matrix")
    sound_from_intensity_matrix = call(intensity_matrix, "To Sound (slice)", 1)
    # use total duration, not end time, to find out duration of intdur (intensity_duration)
    # in order to allow nonzero starting times.
    intensity_duration = call(sound_from_intensity_matrix, "Get total duration")
    intensity_max = call(sound_from_intensity_matrix, "Get maximum", 0, 0, "Parabolic")
    point_process = call(sound_from_intensity_matrix, "To PointProcess (extrema)", "Left", "yes", "no", "Sinc70")
    # estimate peak positions (all peaks)
    numpeaks = call(point_process, "Get number of points")
    t = [call(point_process, "Get time from index", i + 1) for i in range(numpeaks)]

    # fill array with intensity values
    timepeaks = []
    peakcount = 0
    intensities = []
    for i in range(numpeaks):
        value = call(sound_from_intensity_matrix, "Get value at time", t[i], "Cubic")
        if value > threshold:
            peakcount += 1
            intensities.append(value)
            timepeaks.append(t[i])

    # fill array with valid peaks: keep a peak only if the dip in intensity
    # between it and the next peak is greater than mindip
    validpeakcount = 0
    currenttime = timepeaks[0]
    currentint = intensities[0]
    validtime = []

    for p in range(peakcount - 1):
        following = p + 1
        followingtime = timepeaks[p + 1]
        dip = call(intensity, "Get minimum", currenttime, timepeaks[p + 1], "None")
        diffint = abs(currentint - dip)
        if diffint > mindip:
            validpeakcount += 1
            validtime.append(timepeaks[p])
        currenttime = timepeaks[following]
        currentint = call(intensity, "Get value at time", timepeaks[following], "Cubic")

    # Look for only voiced parts
    pitch = sound.to_pitch_ac(0.02, 30, 4, False, 0.03, 0.25, 0.01, 0.35, 0.25, 450)
    voicedcount = 0
    voicedpeak = []

    for time in range(validpeakcount):
        querytime = validtime[time]
        whichinterval = call(textgrid, "Get interval at time", 1, querytime)
        whichlabel = call(textgrid, "Get label of interval", 1, whichinterval)
        value = pitch.get_value_at_time(querytime) 
        if not math.isnan(value):
            if whichlabel == "sounding":
                voicedcount += 1
                voicedpeak.append(validtime[time])

    # calculate time correction due to shift in time for Sound object versus
    # intensity object
    timecorrection = originaldur / intensity_duration

    # Insert voiced peaks in TextGrid
    call(textgrid, "Insert point tier", 1, "syllables")
    for i in range(len(voicedpeak)):
        position = (voicedpeak[i] * timecorrection)
        call(textgrid, "Insert point", 1, position, "")

    # return results
    speakingrate = voicedcount / originaldur
    articulationrate = voicedcount / speakingtot
    npause = npauses - 1
    speechrate_dictionary = {'soundname':filename, 
                             'nsyll':voicedcount,
                             'npause': npause,
                             'dur(s)':originaldur,
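                             # note: this is the duration of the intensity contour, not the
                             # summed "sounding" time (speakingtot) used for the articulation rate
                             'phonationtime(s)':intensity_duration,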
                             'speechrate(nsyll / dur)': speakingrate,
                             "articulation rate(nsyll / phonationtime)":articulationrate}
    return speechrate_dictionary
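
The two decision rules inside `speech_rate` can be sketched in plain Python, without Praat. This is an illustration only: the helper names and the example numbers are hypothetical, and the arithmetic is meant to mirror the threshold and dip checks in the function above.

```python
def silence_threshold(max_99, max_int, min_int, silencedb=-25):
    """Mirror the threshold arithmetic in speech_rate (all values in dB)."""
    # Peaks must exceed this absolute level to count as syllable candidates;
    # it is never allowed to fall below the minimum intensity.
    threshold = max(max_99 + silencedb, min_int)
    # The silence detection receives a level adjusted by the gap between
    # the true maximum and the 0.99 quantile.
    relative_threshold = silencedb - (max_int - max_99)
    return threshold, relative_threshold


def count_peaks_with_dip(peaks, dips, mindip=2):
    """Count peaks followed by a dip of more than `mindip` dB.

    peaks -- intensity at each candidate peak
    dips  -- minimum intensity between peak i and peak i + 1
    """
    # zip stops at the shorter list, so the last peak is never counted,
    # just as in the loop over range(peakcount - 1).
    return sum(1 for peak, dip in zip(peaks, dips) if abs(peak - dip) > mindip)


# Hypothetical values: 0.99 quantile 70 dB, maximum 72 dB, minimum 30 dB
print(silence_threshold(70, 72, 30))
print(count_peaks_with_dip([60, 58, 61, 59], [55, 57.5, 50]))
```

With these made-up numbers, the second peak is rejected because the intensity only drops by 0.5 dB before the next peak, which is less than `mindip`.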

## Let's run this on a .wav file.

#### Write the name of your file below. For example, if your file is called "relaxing_rainsound.wav", then the code should be `file = './drive/MyDrive/relaxing_rainsound.wav'`

#### We are using a short snippet from the British TV game show "Would I Lie to You?", where celebrities compete to guess the truthfulness of each other's statements.

* Upload your file to a Google Drive folder.

In [9]:
file = './drive/MyDrive/Colab Notebooks/s09e08_3.wav'

#give your notebook access to google drive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## Then run the following code, which will:
* analyze the audio file with Praat through Parselmouth
* print the extracted information


In [13]:
# store the output under a name that does not shadow Python's built-in dict
result = speech_rate(file)
#print("The raw output of the function is: \n")
#print(result , '\n')
#print("Let's break that down: \n")
print("The name of the file / soundname is :", result["soundname"])
print("The number of syllables is: ", result["nsyll"])
print("The number of pauses in the file is: ", result["npause"])
print("The duration of the audio file in seconds is: ", result["dur(s)"])
print("The amount of time where there is speaking in seconds is: ", result["phonationtime(s)"])
print("The speaking rate, or the number of syllables divided by the duration is:" , result["speechrate(nsyll / dur)"])
print("The articulation rate, or the number of syllables divided by the amount of time with speech is: ", result["articulation rate(nsyll / phonationtime)"])

The name of the file / soundname is : ./drive/MyDrive/Colab Notebooks/s09e08_3.wav
The number of syllables is:  48
The number of pauses in the file is:  2
The duration of the audio file in seconds is:  13.795
The amount of time where there is speaking in seconds is:  13.795
The speaking rate, or the number of syllables divided by the duration is: 3.4795215657847045
The articulation rate, or the number of syllables divided by the amount of time with speech is:  4.092246046293533
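
The two rates can be checked by hand from the printed numbers. The short sketch below reuses the values above (the variable names are ours). Note that the two rates differ even though the printed duration and speaking time are identical, which suggests that the articulation rate was computed from a shorter speaking time than the one displayed.

```python
nsyll = 48                    # syllables detected
dur = 13.795                  # total duration in seconds
art_rate = 4.092246046293533  # reported articulation rate

# Speech rate: syllables divided by the total duration.
speech_rate_value = nsyll / dur

# Articulation rate = nsyll / speaking time, so the speaking time it implies is:
implied_speaking_time = nsyll / art_rate

print(f"speech rate: {speech_rate_value:.3f} syll/s")
print(f"implied speaking time: {implied_speaking_time:.2f} s")
print(f"implied pause time: {dur - implied_speaking_time:.2f} s")
```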


## If you have a large set of audio files you want to analyze together:

### The following function, given the name of the folder containing the files, will provide a summary table with all the information extracted

In [14]:
def parsel_ext(folder_in, file_out):
    '''
    Takes in a folder of .wav files, uses Parselmouth and Praat to extract the speaking rate of each file, and writes the results to a csv file
    
    Inputs:
    folder_in -- the name of the folder to read
    file_out -- name of csv file to write
    
    Outputs: 
    files_lost -- the list of files that the script could not be called on and therefore did not make it to the csv file
    writes to file file_out with columns for the associated file name (minus the ".wav"), speech_rate
    '''

    with open(file_out+".csv", 'w', newline='') as file:

        writer = csv.writer(file)
        writer.writerow(["file", "nsyll", "npause", "dur", "phone_time", "speech_rate", "art_rate"])

        files_lost = []

        #iterating through all files in a folder
        for filename in os.listdir(folder_in):
            f = os.path.join(folder_in, filename)
            if filename[-4:] == ".wav" and os.path.isfile(f):

                # call the Parselmouth script
                try:
                    temp_dict = speech_rate(f)

                    writer.writerow([filename[:-4], temp_dict["nsyll"], temp_dict["npause"], temp_dict["dur(s)"],
                            temp_dict["phonationtime(s)"], temp_dict["speechrate(nsyll / dur)"], temp_dict["articulation rate(nsyll / phonationtime)"]])
                except Exception:
                    # the file could not be analyzed; record it and move on
                    files_lost.append(f)
        
    return files_lost
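
`os.listdir` returns files in arbitrary order and the check `filename[-4:] == ".wav"` skips files named `.WAV`. If a stable order or a case-insensitive match matters, one possible variant is sketched below with `pathlib`; the helper name `list_wav_files` is hypothetical and not part of the function above.

```python
from pathlib import Path


def list_wav_files(folder):
    """Return the .wav files in `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        # suffix.lower() also accepts ".WAV" or ".Wav"
        if p.is_file() and p.suffix.lower() == ".wav"
    )

# Usage (hypothetical folder):
# for path in list_wav_files("./drive/MyDrive/test_folder"):
#     print(path.stem)
```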

## Let's pass a folder with our files. 

### You can enter the name of a folder to run the code on. For example, if the folder is called test_folder, then type `folder = './drive/MyDrive/test_folder'`.

### We will use a folder with 4 audio snippets from 'Would I Lie to You' season 9 episode 8.

In [15]:
folder = "./drive/MyDrive/Colab Notebooks/s09e08"

Now, we'll call the code to extract the features from all the .wav files in the folder. Run the code below:

In [21]:
files_lost = parsel_ext(folder, "summary")

## Let's look at the result file now as a table.

In [22]:
print(pandas.read_csv("summary.csv").to_markdown(index=False))

| file     |   nsyll |   npause |     dur |   phone_time |   speech_rate |   art_rate |
|:---------|--------:|---------:|--------:|-------------:|--------------:|-----------:|
| s09e08_1 |      52 |        2 | 18.5233 |      18.5233 |       2.80728 |    3.13298 |
| s09e08_3 |      48 |        2 | 13.795  |      13.795  |       3.47952 |    4.09225 |
| s09e08_4 |      45 |        3 | 13.9    |      13.9    |       3.23741 |    4.32443 |
| s09e08_2 |      26 |        1 |  6.926  |       6.926  |       3.75397 |    4.44672 |
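
Once the summary exists, simple statistics can be computed from it. The sketch below uses only the standard library and, to stay self-contained, embeds the rows from the table above as text; in practice one would read `summary.csv` instead.

```python
import csv
import io
from statistics import mean

# Rows copied from the table above; in practice, use open("summary.csv").
summary_text = """file,nsyll,npause,dur,phone_time,speech_rate,art_rate
s09e08_1,52,2,18.5233,18.5233,2.80728,3.13298
s09e08_3,48,2,13.795,13.795,3.47952,4.09225
s09e08_4,45,3,13.9,13.9,3.23741,4.32443
s09e08_2,26,1,6.926,6.926,3.75397,4.44672
"""

rows = list(csv.DictReader(io.StringIO(summary_text)))

# csv gives strings, so convert to float before doing arithmetic.
mean_speech_rate = mean(float(r["speech_rate"]) for r in rows)
fastest = max(rows, key=lambda r: float(r["speech_rate"]))

print(f"mean speech rate: {mean_speech_rate:.2f} syll/s")
print(f"fastest clip: {fastest['file']}")
```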


## Debugging
If Praat is unable to analyze some of the audio files in our collection, the list returned by `parsel_ext` tells us which files, if any, the program could not run on. This usually happens when a file is too short to be analyzed. In that case, you may want to keep track of which files were lost in the analysis and remove them later.

The following files were lost when you ran the code on your folder:

In [23]:
print(files_lost)

[]


### Which means, in this case all the files were analyzed with Praat!
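
When files do get lost, it helps to know why. For example, `speech_rate` reads `timepeaks[0]`, which raises an `IndexError` if no intensity peak exceeds the threshold. A possible way to keep the reason next to each lost file is sketched below; `run_on_files` is a hypothetical helper, not part of the code above.

```python
def run_on_files(func, paths):
    """Apply `func` to each path and return (results, failures).

    failures maps each path that raised an exception to a short
    description of the error.
    """
    results, failures = {}, {}
    for path in paths:
        try:
            results[path] = func(path)
        except Exception as err:  # keep going, but remember why it failed
            failures[path] = f"{type(err).__name__}: {err}"
    return results, failures

# Usage (hypothetical paths):
# results, failures = run_on_files(speech_rate, ["a.wav", "b.wav"])
# for path, reason in failures.items():
#     print(path, "->", reason)
```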

## How are we using Praat?

We are using Praat to analyse speech data during conversations for:
* deception detection
* alignment during a conversation
* understanding non-verbal cues when interacting with neuro-divergent children.

## I am curious to know more!

If you like Praat-Parselmouth, we have other great resources coming up.
* If you have audio, we can introduce you to [Whisper](https://colab.research.google.com/drive/1yAHySBUs6W5GRrJfg4IrSrDn-tAeMCE1?usp=sharing) to get the transcript.
* If you have *face video*, we can introduce you to facial expression and landmark analysis with [Pyfeat](https://colab.research.google.com/drive/1lCiTDUp8YHUB9g6UHN4w7W3nZJOCOLUa?usp=sharing).
* If you have *full-body video*, we can introduce you to pose detection and person tracking with [OpenPose](https://colab.research.google.com/drive/1PB6sa3PFwT2ag7_7n7KaaUdf7Fj2HxBy?usp=sharing).
* For the *audio*, we can add extraction of acoustic/prosodic features.
* Once you have a *transcript*, we can add NLP to identify sentiment, named entities, and more.

## End Note
If you use OpenPose, PyFeat, Whisper or Praat, please let us know. We want to work with you! 
We want to know what works and what doesn't! We want to understand your joys and your concerns.

## Acknowledgement

The credit for this tutorial goes to DavisAI RA Meredith Green ('24).