## Whisper tutorial notebook: Transcript

This notebook provides an introduction to [Whisper](https://github.com/openai/whisper), an open-source speech recognition model from OpenAI, which we use here to generate text transcripts from speech audio.

Pros:
* Transcribes speech in many languages, including English.
* It is free.
* Transcripts generated from conversation recordings can be used for applications related to affective computing and other behavioral research.

Cons:
* Cannot differentiate between speakers: the transcript does not indicate who said what, even when multiple speakers are talking.

To learn more about the audio features extracted with [OpenSMILE](https://audeering.github.io/opensmile/) in tutorial2, please refer to [the book](https://link.springer.com/book/10.1007/978-3-319-27299-3).

## Step 1: Install packages and libraries

In [9]:
# processing libraries
import os 
import pandas as pd
import numpy as np
import csv

In **tutorial2 on opensmile**, we walked through how to extract audio files (.wav) from
your .mp4 video recordings. If you have already converted your videos to audio files, you can skip the conversion step below.

### Specify path settings

In [18]:
# get audio files from conversation video files in .wav format 
BASE_PATH = os.getcwd()

# where our input videos are stored
INPUT_VIDEOS = os.path.join(BASE_PATH,'conversation_data/PID1101_ZR') 

# where our output audio files will be stored
OUTPUT_AUDIOS = os.path.join(BASE_PATH, 'conversation_audios')

# where the final transcript files will be stored
# (despite the variable name, these are plain-text .txt files)
OUTPUT_CSVS = os.path.join(BASE_PATH, 'transcripts')

# create the output directories if they do not already exist
# os.makedirs(OUTPUT_AUDIOS, exist_ok=True)  # uncomment this if you skipped tutorial2
os.makedirs(OUTPUT_CSVS, exist_ok=True)

### Convert video files to audio

In [3]:
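# list the .mp4 videos (skipping hidden files) and strip the file extension
input_videos = sorted(x for x in os.listdir(INPUT_VIDEOS) if not x.startswith(".") and x.endswith(".mp4"))
input_videos = [os.path.splitext(i)[0] for i in input_videos]

# extract 16 kHz, 16-bit PCM, 2-channel audio from each video
for file in input_videos:
    !echo y | ffmpeg -i "$INPUT_VIDEOS"/"$file".mp4 -acodec pcm_s16le -ar 16000 -ac 2 "$OUTPUT_AUDIOS"/"$file".wav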

ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --e
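
The `!` shell escape used above only works inside Jupyter/IPython. As a rough sketch of the same conversion in a plain Python script, the command can be built as a list and passed to `subprocess.run`. The helper names `build_ffmpeg_command` and `convert_videos` are hypothetical (they are not part of this tutorial's code), and the sketch assumes `ffmpeg` is available on your `PATH`.

```python
import os
import shutil
import subprocess


def build_ffmpeg_command(video_path, wav_path):
    """Return the ffmpeg arguments used in this tutorial (16 kHz, 16-bit PCM, 2 channels)."""
    return [
        "ffmpeg", "-y",          # -y: overwrite an existing output file without asking
        "-i", video_path,
        "-acodec", "pcm_s16le",
        "-ar", "16000",
        "-ac", "2",
        wav_path,
    ]


def convert_videos(input_dir, output_dir):
    """Convert every .mp4 in input_dir to a .wav in output_dir (hypothetical helper)."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    os.makedirs(output_dir, exist_ok=True)
    for name in sorted(os.listdir(input_dir)):
        # skip hidden files and anything that is not an .mp4 video
        if name.startswith(".") or not name.lower().endswith(".mp4"):
            continue
        stem = os.path.splitext(name)[0]
        cmd = build_ffmpeg_command(
            os.path.join(input_dir, name),
            os.path.join(output_dir, stem + ".wav"),
        )
        # check=True raises CalledProcessError if ffmpeg exits with an error
        subprocess.run(cmd, check=True)
```

With these definitions, `convert_videos(INPUT_VIDEOS, OUTPUT_AUDIOS)` would play the role of the loop in the cell above.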

## Step 2: Install Whisper

**Note:** You may need to run `!pip install numpy==1.23.4` to resolve a package version mismatch.
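
Before pinning NumPy, it can help to see which versions are currently installed. The following is a minimal sketch using the standard-library `importlib.metadata`; the helper name `installed_version` is hypothetical, and `openai-whisper` is the distribution name under which the Whisper package is installed.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# print the versions of the packages this notebook relies on
for name in ("numpy", "torch", "openai-whisper"):
    print(name, installed_version(name))
```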

In [4]:
#installing whisper from its github repository
!pip install git+https://github.com/openai/whisper.git 

Collecting git+https://github.com/openai/whisper.git
  Cloning https://github.com/openai/whisper.git to /tmp/pip-req-build-ccig7ded
  Running command git clone --filter=blob:none --quiet https://github.com/openai/whisper.git /tmp/pip-req-build-ccig7ded
  Resolved https://github.com/openai/whisper.git to commit b91c907694f96a3fb9da03d4bbdc83fbcd3a40a4
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done

### Import additional libraries

Whisper supports many languages, including English; the full list can be found in the [Whisper repository](https://github.com/openai/whisper).

In [13]:
#import libraries

import torch
import whisper

# define parameters for whisper

# use the GPU if one is available, otherwise fall back to the CPU
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# load the English-only "base" model
model = whisper.load_model("base.en", device=DEVICE)
print(
    f"Model is {'multilingual' if model.is_multilingual else 'English-only'} "
    f"and has {sum(np.prod(p.shape) for p in model.parameters()):,} parameters.")

Model is English-only and has 71,825,408 parameters.
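
If your recordings are not in English, a multilingual model can be loaded instead of `base.en`. The sketch below is illustrative only: `pick_model_name` and `transcribe_in_language` are hypothetical helpers, and the set of sizes with an English-only `.en` variant is taken from the Whisper README as an assumption. The `language` argument is passed through `model.transcribe` to Whisper's decoding options.

```python
# Sizes that have an English-only ".en" variant (assumption based on the Whisper README)
ENGLISH_ONLY_SIZES = {"tiny", "base", "small", "medium"}


def pick_model_name(size="base", english_only=True):
    """Return a Whisper model name such as 'base.en' or 'base' (hypothetical helper)."""
    if english_only and size in ENGLISH_ONLY_SIZES:
        return f"{size}.en"
    return size


def transcribe_in_language(wav_path, language, size="base", device="cpu"):
    """Transcribe with a multilingual model, stating the spoken language explicitly."""
    import whisper  # imported here so pick_model_name works without whisper installed

    model = whisper.load_model(pick_model_name(size, english_only=False), device=device)
    return model.transcribe(wav_path, language=language)["text"]
```

For example, `transcribe_in_language("clip.wav", language="es")` would transcribe a (hypothetical) Spanish recording with the multilingual `base` model.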


## Step 3: Run Whisper on .wav file to get transcription

In [21]:
wav_files = sorted(x for x in os.listdir(OUTPUT_AUDIOS) if not x.startswith(".") and x.endswith(".wav"))

for file in wav_files:
    # transcribe the audio; the result is a dict whose 'text' entry holds the full transcript
    result = model.transcribe(os.path.join(OUTPUT_AUDIOS, file))
    print('Transcription for ', file)
    print('=========================')
    print(result['text'])

    # save the transcription; the with-block closes the file automatically
    out_path = os.path.join(OUTPUT_CSVS, os.path.splitext(file)[0] + '.txt')
    with open(out_path, mode="wt") as f:
        f.write(result['text'])

Transcription for  1101_ZR_1_Aff_Video_left.wav
 So, like, what's your favorite book or like movie? Um, so I don't, I haven't read an actual story or like a book book in a while. Um, but recently I've read like the alchemist. I don't know if you've heard of it. Yeah. Yeah. Uh, what do you think of it? I haven't finished it yet. I'm still like in the middle of, but. I mean, I think I read it a few years ago. So, like, I don't remember every detail, but I remember that like it was philosophical. And like, I, I'm like found some coast that like resonated. Um, I remember that I took some notes from that book, like from a, like, from a not from my personal book. I mean, um, so like life lessons kind of thing. Okay. That's cool. That's dope. Um, yeah. Um, it seems very philosophical. I mean, I've, I got it from old high school teacher. And he said that in the mail. So it was very nice to see that. Um, but yeah, I'm not. Don't really read as many books. I do watch a lot of movies and TV shows
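
Besides the full text, the dict returned by `model.transcribe` also contains a `segments` list with start and end times for each chunk of speech. As a hedged sketch, these could be saved to a CSV file for later analysis. The helper names and the column names below are hypothetical, and `fake_result` is a hand-made stand-in for real Whisper output.

```python
import csv


def segments_to_rows(result):
    """Turn a Whisper result dict into (start, end, text) rows, one per segment."""
    return [
        (round(seg["start"], 2), round(seg["end"], 2), seg["text"].strip())
        for seg in result.get("segments", [])
    ]


def write_segments_csv(result, csv_path):
    """Write the segment rows to a CSV file with a header line."""
    with open(csv_path, mode="w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["start_sec", "end_sec", "text"])
        writer.writerows(segments_to_rows(result))


# A hand-made stand-in for the dict returned by model.transcribe (not real output)
fake_result = {
    "text": " Hello there. How are you?",
    "segments": [
        {"start": 0.0, "end": 1.5, "text": " Hello there."},
        {"start": 1.5, "end": 3.0, "text": " How are you?"},
    ],
}
print(segments_to_rows(fake_result))
```

Inside the transcription loop, a call such as `write_segments_csv(result, out_path.replace('.txt', '.csv'))` would store the timed segments next to each transcript, assuming `result` and `out_path` are the transcription dict and output path for the current file.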

### Congratulations! You now have your transcripts stored in the `transcripts` folder.

### You can use analysis tools such as [Align](https://github.com/nickduran/align-linguistic-alignment/tree/master) to perform linguistic analysis using the generated transcripts.

**Note:** We recommend manually checking these transcripts to correct word errors.
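
Once a transcript has been corrected by hand, one way to quantify how many word errors the automatic version contained is the word error rate: the word-level edit distance between the two texts divided by the number of words in the corrected one. The following is a minimal sketch, not part of Whisper; the function name is hypothetical, the comparison is case-insensitive, and punctuation is left attached to words. The two example sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between two transcripts, divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # previous[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# made-up example: manually corrected text vs. automatic transcript
corrected = "the cat sat on the mat"
automatic = "the cat sat on mat"
print(word_error_rate(corrected, automatic))
```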