# 2 Extract speech and laughter from audio files

For speech recognition we try the [SpeechBrain](https://github.com/speechbrain/speechbrain) project and OpenAI's [Whisper](https://github.com/openai/whisper) model.

We also try identifying laughter with the [Laughter Detection](https://github.com/jrgillick/laughter-detection) model by jrgillick.

This code is based on prototypes developed at the Sage IDEMS hackathon in 2023:
https://github.com/chilledgeek/ethical_ai_hackathon_2023


In [26]:
import os
import time
import json
import pandas as pd



## 2.1 Speech-to-text - SpeechBrain

Let's try [SpeechBrain](https://github.com/speechbrain/speechbrain). It isn't available on Anaconda, so we install it with pip.

`pip install speechbrain`

SpeechBrain depends on PyTorch and torchaudio, which we install with conda; note that we need to specify the CUDA version. We also need a sound-processing backend library: on Windows this is `soundfile`, on macOS/Linux it is `sox`. Finally, `ffmpeg` is used by Whisper and MoviePy to decode mp3/mp4 files.

```
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
conda install -c conda-forge pysoundfile
conda install -c conda-forge ffmpeg
```
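Before loading any models it can help to confirm that everything installed above is importable and that `ffmpeg` is on the `PATH` (Whisper runs the `ffmpeg` executable to decode audio). A minimal sketch using only the standard library; the list of module names and the helper name are our own assumptions about what this chapter needs:

```python
import importlib.util
import shutil

# Module names we assume this chapter needs; adjust to your environment
REQUIRED_MODULES = ["torch", "torchaudio", "speechbrain", "whisper", "soundfile", "moviepy"]


def check_environment(modules=REQUIRED_MODULES, executables=("ffmpeg",)):
    """Return {name: True/False} for each importable module and executable on PATH."""
    status = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be found
        status[name] = importlib.util.find_spec(name) is not None
    for exe in executables:
        # shutil.which returns None when the executable is not on PATH
        status[exe] = shutil.which(exe) is not None
    return status


for name, ok in check_environment().items():
    print(f"{name:12s} {'OK' if ok else 'MISSING'}")
```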

Windows users: if you encounter `Backend not found.` or similar errors, try restarting the PC.  
Windows users: if you see `UserWarning: huggingface_hub cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in`, try running VS Code as Administrator (right-click its icon in the Start menu and look under More >). See [speechbrain issue 1155](https://github.com/speechbrain/speechbrain/issues/1155).
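The symlink warning appears because creating symbolic links on Windows normally requires administrator rights or Developer Mode. A quick standard-library check of whether the current process can create symlinks, to tell whether the workaround above is needed (a sketch; the helper name is ours):

```python
import os
import tempfile


def can_create_symlinks():
    """Try to create a symlink in a temporary directory and report whether it worked."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target.txt")
        link = os.path.join(tmp, "link.txt")
        with open(target, "w") as f:
            f.write("x")
        try:
            os.symlink(target, link)
        except (OSError, NotImplementedError):
            # Windows raises OSError (WinError 1314) when the privilege is missing
            return False
        return os.path.islink(link)


print("symlinks supported:", can_create_symlinks())
```

If symlinks are unavailable, the Hugging Face cache still works but stores duplicate copies; setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable silences the warning.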


*Initially we tried Google Cloud Speech-to-Text, but it's a closed model and kept crashing my ipykernel. Then we tried the [Speech Recognition](https://github.com/Uberi/speech_recognition) project to access the [Sphinx](https://github.com/cmusphinx/pocketsphinx) speech model, but pocketsphinx is no longer maintained on Anaconda and compiling it from source is a bit beyond me :)*

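In [2]:
try:
    # SpeechBrain >= 1.0 moved the pretrained interfaces to speechbrain.inference
    from speechbrain.inference.ASR import EncoderDecoderASR
except ImportError:
    # older releases, as used when this notebook was written
    from speechbrain.pretrained import EncoderDecoderASR

# CRDNN encoder-decoder with an RNN language model, trained on LibriSpeech
source = "speechbrain/asr-crdnn-rnnlm-librispeech"
savedir = "pretrained_models/asr-crdnn-rnnlm-librispeech"

# downloads the model on first use and caches it in savedir
asr_model = EncoderDecoderASR.from_hparams(source=source, savedir=savedir)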

The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.


In [23]:
videos_in = "..\\LookitLaughter.test\\"
videos_out = "..\\data\\1_interim\\"

AUDIO_FILE = "..\\data\\1_interim\\2UWdXP.joke1.rep2.take1.Peekaboo.mp3"
AUDIO_FILE2 = "..\\data\\1_interim\\2UWdXP.joke2.rep1.take1.NomNomNom.mp3"

testset = [AUDIO_FILE, AUDIO_FILE2] 
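Building paths by string concatenation is fragile: `videos_out` already ends in a backslash, so `videos_out + "\\processedvideos.xlsx"` produces a doubled separator (visible in the `PermissionError` message further down). Windows usually tolerates this, but `os.path.join` only adds a separator when one is missing. The sketch below uses `ntpath`, the Windows flavour of `os.path`, so it behaves the same on any OS; in the notebook itself plain `os.path.join` is enough:

```python
import ntpath  # Windows path rules, usable on any OS

videos_out = "..\\data\\1_interim\\"

# string concatenation doubles the trailing separator
concatenated = videos_out + "\\processedvideos.xlsx"
# join only inserts a separator when the first part lacks one
joined = ntpath.join(videos_out, "processedvideos.xlsx")

print(concatenated)
print(joined)
```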

In [5]:
for audio_file in testset:
    results = asr_model.transcribe_file(audio_file)
    print(results)

DRINK DRINK DRINK SAID D'ARTAGNAN
HE MURMURED WON'T YOU RUDDY


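In [3]:
import os
import moviepy.editor as mp  # MoviePy 1.x; in MoviePy 2.x use `from moviepy import AudioFileClip`

def convert_mp3_to_wav_moviepy(audio_file, output_ext="wav"):
    """Convert an audio file (e.g. mp3) to another format, wav by default,
    using MoviePy, which calls `ffmpeg` under the hood.
    Returns the path of the new file, written next to the original."""
    filename, ext = os.path.splitext(audio_file)
    out_file = f"{filename}.{output_ext}"
    clip = mp.AudioFileClip(audio_file)
    clip.write_audiofile(out_file)
    clip.close()  # release the underlying ffmpeg reader
    return out_file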
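Since `ffmpeg` is already installed, the conversion can also be done by calling it directly, which avoids the MoviePy dependency. A minimal sketch; `-ac 1 -ar 16000` downmixes to mono at 16 kHz, the sample rate Whisper works at internally and the rate of the LibriSpeech recordings the SpeechBrain model was trained on. The helper names are hypothetical:

```python
import os
import subprocess


def ffmpeg_wav_command(audio_file, sample_rate=16000, channels=1):
    """Build an ffmpeg command that writes a .wav next to the input file."""
    filename, _ = os.path.splitext(audio_file)
    out_file = f"{filename}.wav"
    cmd = ["ffmpeg", "-y",           # overwrite an existing output file
           "-i", audio_file,         # input file (mp3, mp4, ...)
           "-ac", str(channels),     # number of output audio channels
           "-ar", str(sample_rate),  # output sample rate in Hz
           out_file]
    return cmd, out_file


def convert_to_wav_ffmpeg(audio_file, **kwargs):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    cmd, out_file = ffmpeg_wav_command(audio_file, **kwargs)
    subprocess.run(cmd, check=True, capture_output=True)
    return out_file
```

Usage: `convert_to_wav_ffmpeg(AUDIO_FILE)` returns the path of the new `.wav` file.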


In [5]:

#convert_mp3_to_wav_moviepy(AUDIO_FILE)


## 2.2 Speech-to-text using OpenAI Whisper



See also this tutorial: https://analyzingalpha.com/openai-whisper-python-tutorial

In [1]:
import whisper
# "base" is small and fast; "small", "medium" and "large" are slower but more accurate
model = whisper.load_model("base")

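In [21]:
def whisper_transcribe(audio_file, save_path, saveJSON=True):
    """Transcribe audio_file with the Whisper model loaded above.

    verbose=True prints each timed segment as it is decoded.
    Returns (jsonfile, result); jsonfile is None when saveJSON is False.
    """
    result = model.transcribe(audio_file, verbose=True)
    jsonfile = None
    if saveJSON:
        basename = os.path.basename(audio_file)
        filename, ext = os.path.splitext(basename)
        # os.path.join adds a separator only if save_path lacks one
        jsonfile = os.path.join(save_path, f"{filename}.json")
        with open(jsonfile, "w") as f:
            json.dump(result, f)
    return jsonfile, result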
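`model.transcribe` returns a dictionary whose `"text"` entry holds the full transcript and whose `"segments"` list holds timed chunks, each with `"start"` and `"end"` (in seconds) and `"text"`; this is what ends up in the JSON file. A small sketch that reads such a file back and prints the segments in the same `[mm:ss.mmm --> mm:ss.mmm]` layout Whisper uses with `verbose=True` (the helper names are ours):

```python
import json


def format_ts(seconds):
    """Format a time in seconds as mm:ss.mmm."""
    ms = int(round(seconds * 1000))
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_from_result(result):
    """Return (start, end, text) tuples from a Whisper result dict."""
    # Whisper segment texts start with a space, so strip them
    return [(s["start"], s["end"], s["text"].strip()) for s in result.get("segments", [])]


def print_transcript(jsonfile):
    """Load a saved Whisper JSON file and print its timed segments."""
    with open(jsonfile) as f:
        result = json.load(f)
    for start, end, text in segments_from_result(result):
        print(f"[{format_ts(start)} --> {format_ts(end)}]  {text}")
```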

In [11]:
def getprocessedvideos(videos_out):
    """Load the log of processed videos, or create an empty one."""
    xlsx = os.path.join(videos_out, "processedvideos.xlsx")
    # check if we have already processed some videos
    if os.path.exists(xlsx):
        print("found existing processedvideos.xlsx")
        processedvideos = pd.read_excel(xlsx)
    else:
        # create a new dataframe for info about processed videos
        print("creating new processedvideos.xlsx")
        cols = ["VideoID", "ChildID", "JokeType", "JokeNum", "JokeRep", "JokeTake",
                "HowFunny", "LaughYesNo", "Frames", "FPS", "Width", "Height", "Duration",
                "Keypoints.when", "Keypoints.file", "Audio.when", "Audio.file",
                "Cleaned.when", "Cleaned.file", "LastError",
                "Speech.file", "Speech.when"]  # filled in by the Whisper step below
        processedvideos = pd.DataFrame(columns=cols)
    return processedvideos

In [17]:
processedvideos = getprocessedvideos(videos_out)
processedvideos.head()

found existing processedvideos.xlsx


| Unnamed: 0.1 | Unnamed: 0 | VideoID | ChildID | JokeType | JokeNum | JokeRep | JokeTake | HowFunny | LaughYesNo | Frames | ... | Duration | Keypoints.when | Keypoints.file | Audio.when | Audio.file | Cleaned.when | Cleaned.file | LastError | Speech.file | Speech.when |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2UWdXP.joke1.rep2.take1.Peekaboo.mp4 | 2UWdXP | Peekaboo | 1 | 2 | 1 | Slightly funny | No | 217 | ... | 15.175982 | 2023-09-19 11:39:38 | ..\data\1_interim\\2UWdXP.joke1.rep2.take1.Pee... | 2023-09-19 12:31:14 | ..\data\1_interim\\2UWdXP.joke1.rep2.take1.Pee... | | | | | |
| 1 | 1 | 2UWdXP.joke1.rep3.take1.Peekaboo.mp4 | 2UWdXP | Peekaboo | 1 | 3 | 1 | Slightly funny | No | 152 | ... | 10.58563 | 2023-09-19 11:40:12 | ..\data\1_interim\\2UWdXP.joke1.rep3.take1.Pee... | 2023-09-19 12:31:15 | ..\data\1_interim\\2UWdXP.joke1.rep3.take1.Pee... | | | | | |
| 2 | 2 | 2UWdXP.joke2.rep1.take1.NomNomNom.mp4 | 2UWdXP | NomNomNom | 2 | 1 | 1 | Funny | No | 95 | ... | 7.174514 | 2023-09-19 11:40:36 | ..\data\1_interim\\2UWdXP.joke2.rep1.take1.Nom... | 2023-09-19 12:31:16 | ..\data\1_interim\\2UWdXP.joke2.rep1.take1.Nom... | | | | | |
| 3 | 3 | 2UWdXP.joke2.rep2.take1.NomNomNom.mp4 | 2UWdXP | NomNomNom | 2 | 2 | 1 | Slightly funny | No | 97 | ... | 6.824348 | 2023-09-19 11:41:00 | ..\data\1_interim\\2UWdXP.joke2.rep2.take1.Nom... | 2023-09-19 12:31:16 | ..\data\1_interim\\2UWdXP.joke2.rep2.take1.Nom... | | | | | |
| 4 | 4 | 2UWdXP.joke2.rep3.take1.NomNomNom.mp4 | 2UWdXP | NomNomNom | 2 | 3 | 1 | Slightly funny | No | 133 | ... | 9.350991 | 2023-09-19 11:41:34 | ..\data\1_interim\\2UWdXP.joke2.rep3.take1.Nom... | 2023-09-19 12:31:17 | ..\data\1_interim\\2UWdXP.joke2.rep3.take1.Nom... | | | | | |
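The `Unnamed: 0` and `Unnamed: 0.1` columns are an artefact of the save/load cycle: `DataFrame.to_excel` writes the index as an extra column by default, and `pd.read_excel` reads it back as an ordinary column named `Unnamed: 0`, so every round trip adds one more. Saving with `index=False` stops new ones appearing. A sketch of a helper (our own name) that picks out the genuine columns so the existing artefacts can be dropped:

```python
def real_columns(columns):
    """Return the column names that are not pandas index artefacts ('Unnamed: ...')."""
    return [c for c in columns if not str(c).startswith("Unnamed:")]


# Usage with the dataframe loaded above:
# processedvideos = processedvideos[real_columns(processedvideos.columns)]
```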


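In [27]:
xlsx = os.path.join(videos_out, "processedvideos.xlsx")
# make sure the transcript columns can hold strings (they start out all-NaN)
for col in ["Speech.file", "Speech.when"]:
    processedvideos[col] = processedvideos[col].astype(object)

for index, r in processedvideos.iterrows():
    # transcribe only videos that have extracted audio but no transcript yet
    if pd.isnull(r["Speech.file"]) and not pd.isnull(r["Audio.file"]):
        speechpath, _ = whisper_transcribe(r["Audio.file"], save_path=videos_out)
        # write straight into the dataframe; `r` is only a copy of the row
        processedvideos.at[index, "Speech.file"] = speechpath
        processedvideos.at[index, "Speech.when"] = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime())

# index=False stops pandas adding another "Unnamed: 0" column on every save
processedvideos.to_excel(xlsx, index=False)
processedvideos.head()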

Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Detected language: English
[00:00.000 --> 00:04.000]  Hey, excuse me. Look.
[00:04.000 --> 00:07.000]  Ah, I can't handle this.
[00:07.000 --> 00:09.000]  I'm just going to put it on.
[00:09.000 --> 00:11.000]  You know, peek-a-boo!
[00:12.000 --> 00:13.000]  Hey.
Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Detected language: English
[00:00.000 --> 00:02.000]  Ready baby girl?
[00:02.000 --> 00:04.000]  Good morning girl.
[00:04.000 --> 00:07.000]  Take a move!
[00:09.000 --> 00:11.000]  Go to smile, huh?
Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Detected language: English
[00:00.000 --> 00:02.000]  Ready?
[00:02.000 --> 00:04.000]  Ah, no, no, no, no, no.
Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Detected language: English
[00:00.000 --> 00:

PermissionError: [Errno 13] Permission denied: '..\\data\\1_interim\\\\processedvideos.xlsx'
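On Windows, `PermissionError: [Errno 13]` when writing an `.xlsx` file usually means the workbook is open in Excel, which locks it; closing it and re-running the save (as in the next cell) fixes the problem. A sketch of a save helper that, rather than losing the results, falls back to a timestamped copy. `save_fn` stands in for something like `lambda p: processedvideos.to_excel(p, index=False)`, and all names here are hypothetical:

```python
import os
import time


def save_with_fallback(save_fn, path):
    """Call save_fn(path); if the file is locked, save to a timestamped copy instead.

    Returns the path that was actually written.
    """
    try:
        save_fn(path)
        return path
    except PermissionError:
        root, ext = os.path.splitext(path)
        fallback = f"{root}.{time.strftime('%Y%m%d-%H%M%S')}{ext}"
        print(f"{path} is locked (open in Excel?); saving to {fallback}")
        save_fn(fallback)
        return fallback
```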

In [28]:

processedvideos.to_excel(os.path.join(videos_out, "processedvideos.xlsx"), index=False)
processedvideos.head()

| Unnamed: 0.1 | Unnamed: 0 | VideoID | ChildID | JokeType | JokeNum | JokeRep | JokeTake | HowFunny | LaughYesNo | Frames | ... | Duration | Keypoints.when | Keypoints.file | Audio.when | Audio.file | Cleaned.when | Cleaned.file | LastError | Speech.file | Speech.when |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2UWdXP.joke1.rep2.take1.Peekaboo.mp4 | 2UWdXP | Peekaboo | 1 | 2 | 1 | Slightly funny | No | 217 | ... | 15.175982 | 2023-09-19 11:39:38 | ..\data\1_interim\\2UWdXP.joke1.rep2.take1.Pee... | 2023-09-19 12:31:14 | ..\data\1_interim\\2UWdXP.joke1.rep2.take1.Pee... | | | | ..\data\1_interim\2UWdXP.joke1.rep2.take1.Peek... | 2023-09-20 16:58:38 |
| 1 | 1 | 2UWdXP.joke1.rep3.take1.Peekaboo.mp4 | 2UWdXP | Peekaboo | 1 | 3 | 1 | Slightly funny | No | 152 | ... | 10.58563 | 2023-09-19 11:40:12 | ..\data\1_interim\\2UWdXP.joke1.rep3.take1.Pee... | 2023-09-19 12:31:15 | ..\data\1_interim\\2UWdXP.joke1.rep3.take1.Pee... | | | | ..\data\1_interim\2UWdXP.joke1.rep3.take1.Peek... | 2023-09-20 16:58:39 |
| 2 | 2 | 2UWdXP.joke2.rep1.take1.NomNomNom.mp4 | 2UWdXP | NomNomNom | 2 | 1 | 1 | Funny | No | 95 | ... | 7.174514 | 2023-09-19 11:40:36 | ..\data\1_interim\\2UWdXP.joke2.rep1.take1.Nom... | 2023-09-19 12:31:16 | ..\data\1_interim\\2UWdXP.joke2.rep1.take1.Nom... | | | | ..\data\1_interim\2UWdXP.joke2.rep1.take1.NomN... | 2023-09-20 16:58:40 |
| 3 | 3 | 2UWdXP.joke2.rep2.take1.NomNomNom.mp4 | 2UWdXP | NomNomNom | 2 | 2 | 1 | Slightly funny | No | 97 | ... | 6.824348 | 2023-09-19 11:41:00 | ..\data\1_interim\\2UWdXP.joke2.rep2.take1.Nom... | 2023-09-19 12:31:16 | ..\data\1_interim\\2UWdXP.joke2.rep2.take1.Nom... | | | | ..\data\1_interim\2UWdXP.joke2.rep2.take1.NomN... | 2023-09-20 16:58:40 |
| 4 | 4 | 2UWdXP.joke2.rep3.take1.NomNomNom.mp4 | 2UWdXP | NomNomNom | 2 | 3 | 1 | Slightly funny | No | 133 | ... | 9.350991 | 2023-09-19 11:41:34 | ..\data\1_interim\\2UWdXP.joke2.rep3.take1.Nom... | 2023-09-19 12:31:17 | ..\data\1_interim\\2UWdXP.joke2.rep3.take1.Nom... | | | | ..\data\1_interim\2UWdXP.joke2.rep3.take1.NomN... | 2023-09-20 16:58:48 |


In [21]:
testset = processedvideos["Audio.file"][0:3].tolist()
#just filenames, no path
testset = [x.split("\\")[-1] for x in testset]
print(testset)

['2UWdXP.joke1.rep2.take1.Peekaboo.mp3', '2UWdXP.joke1.rep3.take1.Peekaboo.mp3', '2UWdXP.joke2.rep1.take1.NomNomNom.mp3']
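`x.split("\\")[-1]` works for these Windows-style paths; `ntpath.basename` does the same job and also copes with forward slashes. The filenames themselves follow a `ChildID.jokeN.repN.takeN.JokeType.ext` pattern that matches the ChildID, JokeNum, JokeRep, JokeTake and JokeType columns above. A sketch that recovers those fields from a path; the regex is our reading of the naming scheme, not code from the original pipeline:

```python
import ntpath
import re

# Our reading of the naming scheme, e.g. 2UWdXP.joke1.rep2.take1.Peekaboo.mp3
NAME_PATTERN = re.compile(
    r"(?P<ChildID>[^.]+)\.joke(?P<JokeNum>\d+)\.rep(?P<JokeRep>\d+)"
    r"\.take(?P<JokeTake>\d+)\.(?P<JokeType>[^.]+)\.\w+$"
)


def parse_clip_name(path):
    """Split a clip path into its metadata fields; return None if it doesn't match."""
    name = ntpath.basename(path)  # handles both \ and / separators
    m = NAME_PATTERN.match(name)
    if m is None:
        return None
    fields = m.groupdict()
    for key in ("JokeNum", "JokeRep", "JokeTake"):
        fields[key] = int(fields[key])
    return fields
```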



## 2.3 Laughter detection

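In [4]:
import os
import subprocess
import sys

# `from laughter-detection import ...` is a SyntaxError (hyphens are not allowed in module
# names). Sketch: run the repo's script instead, assuming it is cloned to ./laughter-detection
# (hypothetical path) and that segment_laughter.py takes the flags shown in its README.
LAUGHTER_REPO = "laughter-detection"

def segment_laughter(wav_filename, output_dir, threshold=0.5, min_length=0.2):
    """Run segment_laughter.py on one file and return what it prints."""
    cmd = [sys.executable, "segment_laughter.py",
           f"--input_audio_file={os.path.abspath(wav_filename)}",
           f"--output_dir={os.path.abspath(output_dir)}",
           f"--threshold={threshold}", f"--min_length={min_length}"]
    # run from inside the repo so its relative model/config paths resolve
    done = subprocess.run(cmd, cwd=LAUGHTER_REPO, check=True, capture_output=True, text=True)
    return done.stdout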