# OVERVIEW - LENA Foundation inspired audio project

## Count spoken words in audio clips

Given an audio clip, we want to count the number of spoken words it contains.
Audio clips are split at "silences" using several parameters, which we intend to optimize.
We will use machine learning models to see whether any features affect the accuracy of our counter (for example, if our counter is more accurate for females than for males, we will account for that).

Later, we want to make a model that can associate word counts with people
(e.g. 100 words by a child, 300 words by the mother, 500 words by a teacher on a given day).
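
One way to check whether the counter is more accurate for some groups than for others is to compute the error separately per group. The sketch below is illustrative only: `mae_by_group` is a hypothetical helper that is not part of this project, and the records are made-up placeholder values, not results.

```python
from collections import defaultdict

def mae_by_group(records):
    """Return {group: mean absolute error} for (group, predicted, actual) tuples."""
    abs_errors = defaultdict(list)
    for group, predicted, actual in records:
        abs_errors[group].append(abs(predicted - actual))
    return {g: sum(errs) / len(errs) for g, errs in abs_errors.items()}

# Placeholder data (NOT real results): (gender, predicted count, true count).
records = [
    ("male", 4, 9),
    ("male", 8, 8),
    ("female", 10, 8),
    ("female", 9, 10),
]
print(mae_by_group(records))
```

A large gap between groups would suggest tuning the parameters per group or adding the group as a feature.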





>### Most helpful link:

>*Split audio files using silence detection - StackOverflow*
<br>
>*https://stackoverflow.com/questions/45526996/split-audio-files-using-silence-detection*



>>### Other links to consider:

>>*Audio signal split at word level boundary - StackOverflow*
<br>
*https://stackoverflow.com/questions/64153590/audio-signal-split-at-word-level-boundary*

>>*Split speech audio file on words in python - StackOverflow*
<br>
*https://stackoverflow.com/questions/36458214/split-speech-audio-file-on-words-in-python*

>>*Using pyDub to chop up a long audio file - StackOverflow*
<br>
*https://stackoverflow.com/questions/23730796/using-pydub-to-chop-up-a-long-audio-file*


### Data source: 

*common-voice2 - Kaggle*
<br>
*https://www.kaggle.com/datasets/danielgraham1997/commonvoice2*

## Libraries and functions

Import the `AudioSegment` class for loading and processing audio, and the
`split_on_silence` function for splitting a clip wherever it dips into silence.



>*Split audio files using silence detection - StackOverflow* 
<br>
*https://stackoverflow.com/questions/45526996/split-audio-files-using-silence-detection*

In [1]:
import pydub
from pydub import AudioSegment
from pydub.silence import split_on_silence
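
To build intuition for what `split_on_silence` does, here is a simplified pure-Python sketch of the idea: a run of at least `min_silence_len` quiet values is treated as a boundary, and the non-silent chunks between boundaries are returned. This is not pydub's implementation (pydub works on windows of RMS loudness measured in dBFS, with lengths in milliseconds). The function name and the toy signal are assumptions made for illustration.

```python
def split_on_quiet_runs(levels, min_silence_len, silence_thresh):
    """Split a list of loudness values into chunks separated by quiet runs.

    A quiet run is min_silence_len or more consecutive values <= silence_thresh.
    Shorter quiet runs stay inside the surrounding chunk; leading and trailing
    quiet values are dropped.
    """
    chunks, current, quiet_run = [], [], []
    for level in levels:
        if level <= silence_thresh:
            quiet_run.append(level)
            continue
        # A loud value ends the quiet run: decide whether it was a boundary.
        if len(quiet_run) >= min_silence_len:
            if current:
                chunks.append(current)
            current = []
        elif current:
            current.extend(quiet_run)  # too short to count as silence
        quiet_run = []
        current.append(level)
    if current:
        chunks.append(current)
    return chunks

# Toy loudness values (made up): three loud bursts separated by long quiet runs.
levels = [-5, -4, -40, -6, -50, -50, -50, -3, -2, -45, -45, -45, -45, -7]
chunks = split_on_quiet_runs(levels, min_silence_len=3, silence_thresh=-25)
print(len(chunks), chunks)
```

The number of chunks, `len(chunks)`, plays the role of the word-count estimate used in this notebook.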

Import IPython so that we can display audio players within the notebook.


>*How to insert Audio File in Jupyter Notebook Python - YouTube*
<br>
*https://www.youtube.com/watch?v=2RhvasAXxH4*

In [2]:
import IPython

Standard imports.

In [3]:
import pandas as pd
from sklearn.metrics import mean_absolute_error
import random

# Suppress superficial/redundant warnings.
# ('ignore' hides them; the 'error' action would instead raise them as exceptions.)
import warnings
warnings.simplefilter('ignore', UserWarning)

# Navigate directories (folders) and files.
import os

Display the directory paths so we can see where the folders containing our data are.
E.g. folder "commonvoice" contains folders "test", "train", and "validation".



>*Getting a list of all subdirectories in the current directory - StackOverflow*
<br>
*https://stackoverflow.com/questions/973473/getting-a-list-of-all-subdirectories-in-the-current-directory*

>*How to iterate over files in directory using Python - GeeksforGeeks*
<br>
*https://www.geeksforgeeks.org/how-to-iterate-over-files-in-directory-using-python/*

In [4]:
# Remove unwanted items from a list.
# E.g. remove_items( [1,2,2,3,3,3] , 2 ) outputs [1,3,3,3].
def remove_items(test_list, item):
    res = [i for i in test_list if i != item] # Using list comprehension to perform the task.
    return res

# Create a folder directory tree.
def OrganizeDirectory(in_list,in_main_dir=r'C:\Users\kitsu\Documents\GitHub\Erdos_2022_05_Audio_Project'):
    out_list = in_list.copy()
    for i in range(len(out_list)-1):
        if out_list[i] in out_list[i+1]:
            out_list[i] = 0
        else:
            pass
    out_list = remove_items(out_list,0)
    for i in range(len(out_list)):
        out_list[i] = out_list[i][len(in_main_dir):]
    return out_list

# Simply print all of the items in a list.
def Display(in_list):
    for i in in_list:
        print(i)
    return in_list
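
As a cross-check on the helpers above, a similar listing (deepest folders only, shown relative to the project root) can be sketched with `os.walk` directly. The function name `leaf_directories` and the temporary folder layout are assumptions made for the example; the layout only mimics the `commonvoice` structure described earlier. Unlike the helpers above, this sketch skips every hidden folder, not just `.git`.

```python
import os
import tempfile

def leaf_directories(root):
    """Return sorted paths, relative to root, of folders with no visible subfolders."""
    leaves = []
    for current, subdirs, _files in os.walk(root):
        # Prune hidden folders such as .git in place so os.walk skips them.
        subdirs[:] = [d for d in subdirs if not d.startswith('.')]
        if not subdirs and current != root:
            leaves.append(os.path.relpath(current, root))
    return sorted(leaves)

# Build a throwaway folder tree so the example runs anywhere.
with tempfile.TemporaryDirectory() as root:
    for split in ('train', 'test', 'validation'):
        os.makedirs(os.path.join(root, 'commonvoice', split, 'clips'))
    os.makedirs(os.path.join(root, '.git', 'objects'))
    listing = leaf_directories(root)
    for path in listing:
        print(path)
```

Using `os.path.relpath` avoids slicing strings by the length of a hard-coded root path.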

## Local directories and data

In [5]:
# Be sure to UPDATE with your USER
dir_path = r'C:\Users\kitsu\Documents\GitHub\Erdos_2022_05_Audio_Project'


dir_list = [x[0] for x in os.walk(dir_path)]
dir_list = [x for x in dir_list if '.git' not in x] # There are a lot of unnecessary hidden folders to remove
Display(OrganizeDirectory(dir_list))


# Build every data path from dir_path so only one line needs updating per user.
data_dir = os.path.join(dir_path, 'commonvoice')

TR_dir = os.path.join(data_dir, 'train', 'clips')
TR_aud = os.listdir(TR_dir)
TE_dir = os.path.join(data_dir, 'test', 'clips')
TE_aud = os.listdir(TE_dir)
VA_dir = os.path.join(data_dir, 'validation', 'clips')
VA_aud = os.listdir(VA_dir)

TR_df = pd.read_csv(os.path.join(data_dir, 'train', 'train1.csv'),
                    sep='\t',
                    lineterminator='\r')
TE_df = pd.read_csv(os.path.join(data_dir, 'test', 'test1.csv'),
                    sep='\t',
                    lineterminator='\r')
VA_df = pd.read_csv(os.path.join(data_dir, 'validation', 'validation1.csv'),
                    sep='\t',
                    lineterminator='\r')


print('\n')
# Show audio players for a few sample training clips.
for clip_name in ['common_voice_en_10110.wav',
                  'common_voice_en_10153.wav',
                  'common_voice_en_101622.wav',
                  'common_voice_en_10187.wav',
                  'common_voice_en_10199.wav']:
    IPython.display.display(IPython.display.Audio(os.path.join(TR_dir, clip_name)))


print('\n')
TR_df.head()

\.ipynb_checkpoints
\commonvoice\test\clips
\commonvoice\train\clips
\commonvoice\validation\clips
\FunctionsGraphics
\Plots








| Unnamed: 0 | client_id | path | sentence | up_votes | down_votes | age | gender | accent |
|---|---|---|---|---|---|---|---|---|
| 0 | \nac5fea9cacdfa4a2d6291c780b0a0ee1c0f2c5d2389c... | common_voice_en_10110 | I really liked the film we saw last week. | 4.0 | 0.0 | sixties | male | us |
| 1 | \nac5fea9cacdfa4a2d6291c780b0a0ee1c0f2c5d2389c... | common_voice_en_10153 | Please put maimi yajima's song onto Operación ... | 3.0 | 0.0 | sixties | male | us |
| 2 | \n0e7bca7f3243636599bd8e7bbe03b4f09ae8898bb0e1... | common_voice_en_101622 | Three men are painting a metal wall white. | 3.0 | 0.0 | twenties | male | indian |
| 3 | \nac5fea9cacdfa4a2d6291c780b0a0ee1c0f2c5d2389c... | common_voice_en_10187 | Though this be madness, yet there is method in it | 4.0 | 0.0 | sixties | male | us |
| 4 | \nac5fea9cacdfa4a2d6291c780b0a0ee1c0f2c5d2389c... | common_voice_en_10199 | As she watched, the cat washed his ears and th... | 4.0 | 0.0 | sixties | male | us |


## FUNCTIONS FOR TASK

In [6]:
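# Count the words in a transcript. split() with no argument splits on any run
# of whitespace, so repeated or trailing spaces do not create empty "words".
def CountWordsText(in_str):
    return len(in_str.split())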


# Define a function to normalize audio to a target amplitude.
def match_target_amplitude(aChunk, target_dBFS):
    change_in_dBFS = target_dBFS - aChunk.dBFS # Normalize given audio chunk
    return aChunk.apply_gain(change_in_dBFS)

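# Estimate the word count as the number of non-silent chunks in the clip.
# in_msl: minimum silence length (ms); in_st: silence threshold (dBFS);
# in_targ: loudness the clip is normalized to before splitting (dBFS).
def CountWordsAudio(in_aud, in_msl=10, in_st=-25, in_targ=-15):
    return len(split_on_silence(match_target_amplitude(in_aud,in_targ), min_silence_len = in_msl, silence_thresh = in_st))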
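
The normalization step relies on decibel arithmetic: a gain of `g` dB multiplies amplitude by `10 ** (g / 20)`, so applying a gain of `target_dBFS - current_dBFS` moves the loudness to the target. The sketch below reproduces that arithmetic on a plain list of samples in the range [-1, 1]. It is a simplified stand-in rather than pydub's code: the helper names are hypothetical, the toy samples are made up, and loudness is taken as RMS relative to a full scale of 1.0.

```python
import math

def rms_dbfs(samples, full_scale=1.0):
    """RMS loudness of the samples in dB relative to full scale."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / full_scale)

def apply_gain_db(samples, gain_db):
    """Scale samples by the amplitude ratio that corresponds to gain_db decibels."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]

def match_target_dbfs(samples, target_dbfs):
    """Shift the RMS loudness of the samples to target_dbfs."""
    return apply_gain_db(samples, target_dbfs - rms_dbfs(samples))

quiet = [0.01, -0.02, 0.015, -0.01]  # toy signal, made-up values
normalized = match_target_dbfs(quiet, -15)
print(round(rms_dbfs(quiet), 1), '->', round(rms_dbfs(normalized), 1))
```

In this idealized float version any target is reachable. With fixed-width audio samples, a target above 0 dBFS (such as the 15 used later in this notebook) asks for an RMS level beyond full scale, so some clipping is to be expected.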

In [7]:
# QUICK TEST: audio-based count, transcript, transcript word count.

for song in range(len(TR_aud[:5])):
    sentence = TR_df.loc[song, 'sentence']
    clip = AudioSegment.from_wav(os.path.join(TR_dir, TR_aud[song]))
    print(CountWordsAudio(clip), ' ', sentence, ' ', CountWordsText(sentence))

5   I really liked the film we saw last week.   9
11   Please put maimi yajima's song onto Operación Bikini.   8
11   Three men are painting a metal wall white.   8
10   Though this be madness, yet there is method in it   10
14   As she watched, the cat washed his ears and then settled down to sleep.   14
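
To put a single number on the quick test above, the mean absolute error (the metric computed with `mean_absolute_error` in the next section) averages the absolute differences between the audio-based counts and the transcript counts. A small pure-Python sketch using the five rows printed above; `mean_abs_error` is a hypothetical helper name:

```python
# Counts copied from the quick-test output above.
audio_counts = [5, 11, 11, 10, 14]
text_counts = [9, 8, 8, 10, 14]

def mean_abs_error(predicted, actual):
    """Average of |predicted - actual| over paired values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

print(mean_abs_error(audio_counts, text_counts))  # prints 2.0
```

With the default parameters, the counter is off by two words per clip on average for these five clips.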


<br>

## EDA (EXPLORATORY DATA ANALYSIS)

Let's make our counter more accurate.
To do this, we try out different values for `msl` (minimum silence length), `st` (silence threshold), and `targ` (normalization target)
to see which combination produces the lowest mean absolute error.

In [8]:
# Let's use some random combinations of parameters to get an intuition of what works best.
ran_msl = range(14,16,1)
ran_st = range(-3,-1,1)
ran_targ = range(14,16,1)

def RandomTestRun(in_ran1=ran_msl, in_ran2=ran_st, in_ran3=ran_targ):
    mae = []  # Mean absolute error of each trial.
    for i in range(10):
        r1 = random.choice(in_ran1)  # min_silence_len (ms)
        r2 = random.choice(in_ran2)  # silence_thresh (dBFS)
        r3 = random.choice(in_ran3)  # normalization target (dBFS)
        l1 = []  # Audio-based word counts.
        l2 = []  # Transcript word counts.
        for song in range(len(TR_aud[:10])):
            l1.append(CountWordsAudio(AudioSegment.from_wav(os.path.join(TR_dir, TR_aud[song])),r1,r2,r3))
            l2.append(CountWordsText(TR_df.loc[song, 'sentence']))
        err = mean_absolute_error(l1,l2)
        print(l1,l2,err, 'with:', 'r1',r1,'r2',r2,'r3',r3)
        mae.append(err)
    return mae

RandomTestRun()

[4, 10, 9, 13, 10, 5, 13, 9, 8, 11] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 2.7 with: r1 14 r2 -2 r3 14
[3, 8, 10, 5, 14, 6, 11, 8, 9, 10] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 2.3 with: r1 14 r2 -3 r3 14
[4, 9, 10, 9, 11, 5, 10, 7, 8, 11] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 1.9 with: r1 14 r2 -2 r3 15
[4, 11, 9, 3, 10, 4, 10, 7, 9, 10] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 2.6 with: r1 14 r2 -3 r3 15
[4, 10, 9, 13, 10, 5, 13, 9, 8, 11] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 2.7 with: r1 14 r2 -2 r3 14
[4, 10, 9, 13, 10, 5, 13, 9, 8, 11] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 2.7 with: r1 14 r2 -2 r3 14
[4, 10, 9, 13, 10, 5, 13, 9, 8, 11] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 2.7 with: r1 14 r2 -2 r3 14
[4, 10, 7, 3, 10, 4, 10, 7, 9, 10] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 2.5 with: r1 15 r2 -3 r3 15
[3, 8, 10, 5, 14, 6, 11, 8, 9, 10] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 2.3 with: r1 14 r2 -3 r3 14
[4, 10, 7, 3, 10, 4, 10, 7, 9, 10] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 2.5 with: r1 15 r2 -3 r3 15


[2.7, 2.3, 1.9, 2.6, 2.7, 2.7, 2.7, 2.5, 2.3, 2.5]
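
The run above draws 10 random triples from only 2 x 2 x 2 = 8 possible combinations, which is why several rows repeat. With ranges this small, an exhaustive sweep is an alternative. The sketch below uses `itertools.product`; `score` is a hypothetical stand-in for the mean absolute error of `CountWordsAudio` at a given setting, filled with a made-up formula (unrelated to any real result) so the example runs without audio files.

```python
from itertools import product

ran_msl = range(14, 16)
ran_st = range(-3, -1)
ran_targ = range(14, 16)

def score(msl, st, targ):
    """Hypothetical error function; replace with the real MAE computation."""
    return (msl - 14) ** 2 + (st + 3) ** 2 + (targ - 14) ** 2

def grid_search(score_fn, *ranges):
    """Evaluate score_fn on every combination; return (best_params, best_score)."""
    results = {params: score_fn(*params) for params in product(*ranges)}
    best = min(results, key=results.get)
    return best, results[best]

best_params, best_score = grid_search(score, ran_msl, ran_st, ran_targ)
print(len(list(product(ran_msl, ran_st, ran_targ))), 'combinations')
print('best:', best_params, 'score:', best_score)
```

Each combination is evaluated exactly once, so no audio is processed twice for the same setting.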

<br>

## NOTABLE RECORDS


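I ran the above test multiple times with different parameter ranges, and here are some of the results that allowed me to improve CountWordsAudio. Each row lists the audio-based counts, the transcript counts, and the mean absolute error, followed by the parameters used. Rows where every count is 0 all have r2 (the silence threshold) at or above 0: loudness in dBFS never exceeds 0, so such a threshold classifies the whole clip as silence.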

**[3, 9, 8, 3, 9, 4, 9, 7, 8, 10] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 2.3 with: r1 16 r2 -9 r3 8**<br>
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 8.3 with: r1 8 r2 1 r3 12<br>
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 8.3 with: r1 -7 r2 0 r3 -8<br>
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 8.3 with: r1 -6 r2 18 r3 14<br>
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 8.3 with: r1 -10 r2 10 r3 6<br>
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 8.3 with: r1 0 r2 8 r3 4<br>
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 8.3 with: r1 -6 r2 12 r3 7<br>
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 8.3 with: r1 18 r2 18 r3 5<br>
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 8.3 with: r1 0 r2 19 r3 13<br>
[6, 9, 10, 4, 10, 4, 6, 8, 8, 10] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 2.4 with: r1 11 r2 -4 r3 19<br>
[2.3, 8.3, 8.3, 8.3, 8.3, 8.3, 8.3, 8.3, 8.3, 2.4]

[9, 6, 12, 3, 5, 4, 6, 7, 4, 10] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 3.3 with: r1 17 r2 -17 r3 7<br>
[9, 6, 12, 3, 5, 4, 5, 9, 4, 11] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 3.7 with: r1 15 r2 -7 r3 17<br>
[13, 14, 41, 7, 8, 6, 7, 33, 8, 15] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 9.1 with: r1 5 r2 -10 r3 15<br>
[6, 3, 1, 11, 10, 10, 6, 6, 3, 7] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 3.6 with: r1 9 r2 -19 r3 13<br>
[9, 11, 11, 4, 8, 4, 6, 6, 5, 11] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 2.8 with: r1 11 r2 -7 r3 16<br>
[4, 8, 10, 8, 10, 6, 9, 8, 9, 11] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 2.2 with: r1 16 r2 -5 r3 8<br>
[9, 6, 13, 3, 5, 4, 5, 8, 4, 10] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 3.6 with: r1 16 r2 -9 r3 15<br>
**[4, 7, 8, 4, 12, 5, 5, 6, 7, 9] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 2.2 with: r1 17 r2 -5 r3 15**<br>
[9, 6, 12, 3, 5, 4, 6, 6, 4, 10] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 3.2 with: r1 18 r2 -18 r3 6<br>
[3, 12, 10, 4, 11, 5, 10, 7, 8, 10] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 2.7 with: r1 14 r2 -8 r3 8<br>
[3.3, 3.7, 9.1, 3.6, 2.8, 2.2, 3.6, 2.2, 3.2, 2.7]

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 8.3 with: r1 16 r2 1 r3 10<br>
[5, 11, 9, 7, 10, 3, 12, 6, 9, 10] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 2.3 with: r1 17 r2 -3 r3 8<br>
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 8.3 with: r1 16 r2 2 r3 9<br>
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 8.3 with: r1 15 r2 3 r3 12<br>
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 8.3 with: r1 17 r2 1 r3 13<br>
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 8.3 with: r1 17 r2 3 r3 9<br>
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 8.3 with: r1 17 r2 1 r3 11<br>
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 8.3 with: r1 16 r2 1 r3 8<br>
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 8.3 with: r1 16 r2 2 r3 10<br>
[6, 10, 9, 12, 9, 5, 11, 7, 8, 10] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 2.0 with: r1 15 r2 -2 r3 13<br>
**[8.3, 2.3, 8.3, 8.3, 8.3, 8.3, 8.3, 8.3, 8.3, 2.0]**

[4, 7, 8, 3, 10, 5, 6, 7, 8, 9] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 2.4 with: r1 15 r2 -5 r3 14<br>
[4, 7, 9, 4, 11, 5, 5, 6, 7, 9] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 2.4 with: r1 15 r2 -5 r3 15<br>
**[4, 8, 10, 8, 14, 5, 10, 8, 9, 10] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 1.7 with: r1 14 r2 -3 r3 13**<br>
[8, 13, 11, 11, 12, 4, 11, 7, 9, 10] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 1.9 with: r1 14 r2 -2 r3 12<br>
[4, 8, 10, 9, 13, 5, 10, 7, 8, 11] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 1.6 with: r1 15 r2 -2 r3 15<br>
[4, 8, 10, 8, 14, 5, 10, 8, 9, 10] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 1.7 with: r1 14 r2 -3 r3 13<br>
[4, 8, 11, 8, 13, 4, 10, 8, 8, 11] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 1.8 with: r1 15 r2 -3 r3 12<br>
[3, 9, 7, 3, 10, 4, 10, 7, 8, 10] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 2.4 with: r1 14 r2 -4 r3 14<br>
[4, 11, 9, 3, 10, 4, 10, 7, 9, 10] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 2.6 with: r1 14 r2 -3 r3 15<br>
[7, 19, 10, 9, 20, 5, 15, 6, 9, 10] [9, 8, 8, 10, 14, 4, 9, 5, 8, 8] 3.3 with: r1 14 r2 -1 r3 15<br>
[2.4, 2.4, 1.7, 1.9, 1.6, 1.7, 1.8, 2.4, 2.6, 3.3]

**Best parameters found (mean absolute error 1.6): r1 15 r2 -2 r3 15**

<br>

## Optimized CountWordsAudio function

In [9]:
def CountWordsAudio(in_aud, in_msl=15, in_st=-2, in_targ=15):
    return len(split_on_silence(match_target_amplitude(in_aud,in_targ), min_silence_len = in_msl, silence_thresh = in_st))

for song in range(len(TR_aud[:50])):
    sentence = TR_df.loc[song, 'sentence']
    clip = AudioSegment.from_wav(os.path.join(TR_dir, TR_aud[song]))
    print(CountWordsAudio(clip), ' ', sentence, ' ', CountWordsText(sentence))

l1 = []  # Audio-based word counts.
l2 = []  # Transcript word counts.
for song in range(len(TR_aud[:100])):
    l1.append(CountWordsAudio(AudioSegment.from_wav(os.path.join(TR_dir, TR_aud[song])),15,-2,15))
    l2.append(CountWordsText(TR_df.loc[song, 'sentence']))

print('\n')
print("Mean absolute error:", mean_absolute_error(l1,l2))

4   I really liked the film we saw last week.   9
8   Please put maimi yajima's song onto Operación Bikini.   8
10   Three men are painting a metal wall white.   8
9   Though this be madness, yet there is method in it   10
13   As she watched, the cat washed his ears and then settled down to sleep.   14
5   add artist in my   4
10   There's a thread in their forums complaining about this.   9
7   Let me talk to him.   5
8   Ice skater taking a hit by another player   8
11   A girl smiles while interacting with medical personnel   8
6   Man doing a skateboard trick   5
12   A man sitting behind a typewriter on the street selling poetry   11
18   A group of men women carrying carpentry tools stand around the framework of a house   15
12   Two girls and a boy are getting ready to jump into a backyard pool.   14
6   A tractor is driving in a field.   7
11   A lady cooking a cake for her family.   8
5   Add the tune to the Rage Radio playlist.   8
11   An African American man is standing on