## OpenSMILE tutorial notebook: AUDIO

This notebook provides an introduction to [OpenSMILE](https://audeering.github.io/opensmile/), a toolkit for extracting acoustic features from speech audio that is widely used for emotion recognition.

Pros:
* Computes low-level features such as loudness, voice quality, and pitch.
* It is free.
* Features extracted from audio with OpenSMILE can be used for applications in affective computing and other behavioral research.

Cons:
* Cannot differentiate between vocal signals if multiple speakers are talking simultaneously.

To learn more about the extracted audio features, please refer to [this book](https://link.springer.com/book/10.1007/978-3-319-27299-3).

## Data preparation

OpenSMILE takes .wav files as input and analyzes them to extract its acoustic and prosodic features. Because our data are videos, we first extract the audio track of each video into a .wav file.
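Before passing a file to OpenSMILE, it can help to confirm that it really is a PCM .wav file and to check its sample rate and channel count. Below is a minimal sketch using Python's standard-library `wave` module; the helper name `describe_wav` is hypothetical and not part of OpenSMILE.

```python
import wave


def describe_wav(path):
    """Return basic header information for a PCM .wav file.

    Raises wave.Error if the file is not a valid RIFF/WAVE file.
    """
    with wave.open(path, "rb") as wf:
        n_frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": n_frames / rate,
        }
```

For example, `describe_wav(os.path.join(OUTPUT_AUDIOS, name))` on a file produced by the ffmpeg conversion in this notebook should report 2 channels, 2-byte samples, and 16000 Hz, matching the `-ac 2`, `pcm_s16le`, and `-ar 16000` settings.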

In [1]:
# processing libraries
import os 
import pandas as pd
import numpy as np
import csv

### Specify path settings

In [2]:
# all paths below are relative to the current working directory
BASE_PATH = os.getcwd()

# where our input videos are stored
INPUT_VIDEOS = os.path.join(BASE_PATH,'conversation_data/PID1101_ZR') 

# where our output audio files will be stored
OUTPUT_AUDIOS = os.path.join(BASE_PATH, 'conversation_audios')

#where final csv files containing audio features will be stored
OUTPUT_CSVS = os.path.join(BASE_PATH,'audio_csvs')

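# create the output directories (no error if they already exist);
# a bare try/except around both mkdir calls would skip OUTPUT_CSVS whenever OUTPUT_AUDIOS already exists
os.makedirs(OUTPUT_AUDIOS, exist_ok=True)
os.makedirs(OUTPUT_CSVS, exist_ok=True)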

### Converting video files to audio files

In [11]:
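# list the .mp4 videos (skipping hidden files) and strip the extension
input_videos = sorted([x for x in os.listdir(INPUT_VIDEOS) if x.endswith('.mp4') and not x.startswith(".")])
input_videos = [os.path.splitext(i)[0] for i in input_videos]

# extract 16-bit PCM audio at 16 kHz with 2 channels; -y overwrites existing .wav files without prompting
for file in input_videos:
    !ffmpeg -y -i "$INPUT_VIDEOS"/"$file".mp4 -acodec pcm_s16le -ar 16000 -ac 2 "$OUTPUT_AUDIOS"/"$file".wav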

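The `!ffmpeg` line in the conversion cell relies on IPython shell syntax, so it only works inside Jupyter. Outside a notebook, the same conversion can be scripted with `subprocess`. This is a sketch assuming `ffmpeg` is on your `PATH` and the videos are `.mp4` files; the function names `ffmpeg_wav_command` and `convert_videos` are illustrative, and `dry_run=True` only returns the commands without running them.

```python
import os
import subprocess


def ffmpeg_wav_command(src, dst, sample_rate=16000, channels=2):
    """Build an ffmpeg command that extracts 16-bit PCM audio from a video."""
    return [
        "ffmpeg", "-y",          # overwrite dst without prompting
        "-i", src,
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # 16-bit little-endian PCM (standard .wav)
        "-ar", str(sample_rate),
        "-ac", str(channels),
        dst,
    ]


def convert_videos(input_dir, output_dir, dry_run=False):
    """Convert every .mp4 in input_dir to a .wav with the same base name."""
    os.makedirs(output_dir, exist_ok=True)
    commands = []
    for name in sorted(os.listdir(input_dir)):
        if name.startswith(".") or not name.lower().endswith(".mp4"):
            continue  # skip hidden files and non-video files
        stem = os.path.splitext(name)[0]
        cmd = ffmpeg_wav_command(os.path.join(input_dir, name),
                                 os.path.join(output_dir, stem + ".wav"))
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # raise if ffmpeg fails
    return commands
```

With the paths defined earlier, `convert_videos(INPUT_VIDEOS, OUTPUT_AUDIOS)` would produce the same files as the notebook cell. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.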

## Step 1: Install OpenSMILE

In [12]:
#installing opensmile
!pip install opensmile


## Step 2: Run OpenSMILE on your audio files

**Note:** The converted audio files contain multiple speakers and many conversation turns. If you want audio features for each turn of each speaker so that you can compare them, you will need to either:
1. Manually annotate speaker changes and create a separate audio (.wav) file for each turn, or
2. Use a speaker diarization model such as [Pyannote](https://huggingface.co/pyannote/speaker-segmentation). Speaker diarization and segmentation are still very imperfect technologies, so use them with caution.
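If you take the manual-annotation route, the per-turn files can be cut from each converted .wav with the standard-library `wave` module. The sketch below assumes your annotations are a list of hypothetical `(speaker, start_s, end_s)` tuples with times in seconds; the function name `split_wav_by_turns` and the output naming scheme are illustrative only.

```python
import os
import wave


def split_wav_by_turns(src, turns, out_dir):
    """Write one .wav per annotated turn and return the output paths.

    turns: iterable of (speaker, start_s, end_s) tuples, times in seconds.
    """
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(src))[0]
    paths = []
    with wave.open(src, "rb") as wf:
        params = wf.getparams()
        rate = wf.getframerate()
        total = wf.getnframes()
        for i, (speaker, start_s, end_s) in enumerate(turns):
            # convert seconds to frame indices, clipped to the file length
            start = max(0, int(round(start_s * rate)))
            end = min(total, int(round(end_s * rate)))
            if end <= start:
                continue  # skip empty or out-of-range turns
            wf.setpos(start)
            frames = wf.readframes(end - start)
            path = os.path.join(out_dir, f"{stem}_turn{i:03d}_{speaker}.wav")
            with wave.open(path, "wb") as out:
                out.setparams(params)    # same channels, sample width and rate
                out.writeframes(frames)  # frame count in the header is corrected on close
            paths.append(path)
    return paths
```

The resulting per-turn files can then be passed to OpenSMILE exactly like the full-conversation files.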

### Specify parameters

1. OpenSMILE offers several feature sets; we use ComParE_2016, developed for computational paralinguistics. You can see the others [here](https://audeering.github.io/opensmile-python/api/opensmile.FeatureSet.html#opensmile.FeatureSet).
2. OpenSMILE offers three feature levels; we use low-level descriptors (LLDs), which are computed frame by frame and therefore form a time series for each feature. You can see the others [here](https://audeering.github.io/opensmile-python/api/opensmile.FeatureLevel.html).
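To make the difference between levels concrete: LLDs give one value per frame, while the `LowLevelDescriptors_Deltas` level adds the frame-to-frame change of each LLD. A common way to compute such deltas is a regression over a window of neighboring frames. The pure-Python sketch below applies that standard formula to a single LLD series; it is an illustration only, and OpenSMILE's own window length and edge handling may differ.

```python
def delta_regression(series, n_win=2):
    """Regression-based delta of a 1-D series (one value per frame).

    d_t = sum_{n=1..N} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..N} n^2)
    Frames beyond either edge are replaced by the first/last value (edge padding).
    """
    if not series:
        return []
    last = len(series) - 1
    denom = 2 * sum(n * n for n in range(1, n_win + 1))
    deltas = []
    for t in range(len(series)):
        num = 0.0
        for n in range(1, n_win + 1):
            ahead = series[min(t + n, last)]
            behind = series[max(t - n, 0)]
            num += n * (ahead - behind)
        deltas.append(num / denom)
    return deltas
```

For a series that rises by one unit per frame, the interior deltas are exactly 1.0, while the edge frames are damped by the padding.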

In [3]:
#import opensmile
import opensmile

#set feature name and type
feature_set_name = opensmile.FeatureSet.ComParE_2016

feature_level_name=opensmile.FeatureLevel.LowLevelDescriptors

#define csv file name
csv_name = 'opensmile_lowlevelFeatures.csv'

# we save file names to keep track of which file has been analyzed
file_names_columns = ['File_name']

### Run OpenSMILE on each file and save the results

In [4]:
# define the feature extractor
smile = opensmile.Smile(
        feature_set=feature_set_name,
        feature_level=feature_level_name
)

# feature column names for the csv file
feature_names = smile.feature_names

# header row: the file name column followed by one column per feature
# (without file_names_columns, the header has one column fewer than each data row)
with open(os.path.join(OUTPUT_CSVS, csv_name), 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(file_names_columns + list(feature_names))
    
    wav_files = sorted([x for x in os.listdir(OUTPUT_AUDIOS) if x.endswith('.wav') and not x.startswith(".")]) 
    
    for file in wav_files:
        
        file_id = [str(file)]
        # frame-level LLDs: a DataFrame with one row per frame and one column per feature
        feature = smile.process_file(os.path.join(OUTPUT_AUDIOS, file))
        
        # summarize each feature over all frames; here we store the mean
        mean_feature = feature.mean(axis=0).tolist()
        
        # file name followed by its feature values
        id_and_feature = file_id + mean_feature
        
        writer.writerow(id_and_feature)
        print('data written for ', file)

data written for  1101_ZR_1_Aff_Video_left.wav
data written for  1101_ZR_1_Aff_Video_right.wav
data written for  1101_ZR_1_Arg_Video_left.wav
data written for  1101_ZR_1_Arg_Video_right.wav
data written for  1101_ZR_1_Coop_Video_left.wav
data written for  1101_ZR_1_Coop_Video_right.wav
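The cell above reduces each LLD to its mean over all frames. If you also want to keep information about variability, you could store several statistics per feature. Below is a minimal standard-library sketch; the helper `summarize_frames` and the `_mean`/`_std` column suffixes are hypothetical, and OpenSMILE's `Functionals` feature level computes its own, much larger set of statistics.

```python
import statistics


def summarize_frames(frames, feature_names):
    """Collapse a frames x features table into one row of summary statistics.

    frames: list of rows, one per analysis frame, each with len(feature_names) values.
    Returns (header, row), both suitable for csv.writer.writerow.
    """
    header, row = [], []
    for j, name in enumerate(feature_names):
        column = [frame[j] for frame in frames]
        header += [f"{name}_mean", f"{name}_std"]
        # population standard deviation; 0.0 when there is a single frame
        row += [statistics.fmean(column), statistics.pstdev(column)]
    return header, row
```

With the DataFrame returned by `smile.process_file`, a call such as `summarize_frames(feature.values.tolist(), list(feature.columns))` would give a header and a row to write in place of the means alone.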


### Congratulations! You should now have the audio features for each .wav file stored in a single CSV file in the `audio_csvs` folder (`OUTPUT_CSVS`).
You can also view it below.

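### You can use other analysis tools, such as [PyRQA](https://pypi.org/project/PyRQA/), to analyze these features as time series.
Note that the CSV stores only the mean of each feature per file. For time-series analysis, use the frame-level DataFrame returned by `smile.process_file` (one row per frame) instead.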

In [5]:
df = pd.read_csv(OUTPUT_CSVS + '/' + csv_name) 
df

File_name,F0final_sma,voicingFinalUnclipped_sma,jitterLocal_sma,jitterDDP_sma,shimmerLocal_sma,logHNR_sma,audspec_lengthL1norm_sma,audspecRasta_lengthL1norm_sma,pcm_RMSenergy_sma,pcm_zcr_sma,...,mfcc_sma[5],mfcc_sma[6],mfcc_sma[7],mfcc_sma[8],mfcc_sma[9],mfcc_sma[10],mfcc_sma[11],mfcc_sma[12],mfcc_sma[13],mfcc_sma[14]
1101_ZR_1_Aff_Video_left.wav,67.573875,0.516951,0.012903,0.010247,0.080977,-50.087257,0.294172,1.873871,0.024732,0.104864,...,-14.224147,-18.252089,-8.890177,-1.768037,3.363763,1.097402,-2.860701,-2.299195,-1.433254,-4.291696
1101_ZR_1_Aff_Video_right.wav,67.573875,0.516951,0.012903,0.010247,0.080977,-50.087257,0.294172,1.873871,0.024732,0.104864,...,-14.224147,-18.252089,-8.890177,-1.768037,3.363763,1.097402,-2.860701,-2.299195,-1.433254,-4.291696
1101_ZR_1_Arg_Video_left.wav,62.69566,0.491056,0.012314,0.009728,0.075418,-53.347172,0.294623,1.841782,0.025335,0.107453,...,-14.447663,-17.851768,-7.809336,-2.176062,2.851089,1.735189,-2.054115,-2.273605,-1.692027,-3.767252
1101_ZR_1_Arg_Video_right.wav,62.69566,0.491056,0.012314,0.009728,0.075418,-53.347172,0.294623,1.841782,0.025335,0.107453,...,-14.447663,-17.851768,-7.809336,-2.176062,2.851089,1.735189,-2.054115,-2.273605,-1.692027,-3.767252
1101_ZR_1_Coop_Video_left.wav,43.808903,0.37604,0.009803,0.008043,0.057201,-65.081177,0.196341,1.728786,0.016421,0.091056,...,-8.565414,-11.341309,-5.938381,-1.141728,4.210069,-0.078988,-0.038832,-2.210269,-0.278161,-1.972976
1101_ZR_1_Coop_Video_right.wav,43.808903,0.37604,0.009803,0.008043,0.057201,-65.081177,0.196341,1.728786,0.016421,0.091056,...,-8.565414,-11.341309,-5.938381,-1.141728,4.210069,-0.078988,-0.038832,-2.210269,-0.278161,-1.972976
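As a rough illustration of what recurrence-based tools such as PyRQA measure, the sketch below computes the recurrence rate of a 1-D frame-level series (for example an F0 contour): the fraction of frame pairs whose values lie within a radius `eps` of each other. It uses no time-delay embedding (dimension 1) and counts the main diagonal; PyRQA's own settings, metrics, and API differ, so treat this as a conceptual sketch, and note that the function names here are hypothetical.

```python
def recurrence_matrix(series, eps):
    """R[i][j] = 1 if |x_i - x_j| <= eps else 0 (1-D series, no embedding)."""
    return [[1 if abs(a - b) <= eps else 0 for b in series] for a in series]


def recurrence_rate(series, eps):
    """Fraction of recurrent points in the recurrence matrix (main diagonal included)."""
    n = len(series)
    if n == 0:
        return 0.0
    r = recurrence_matrix(series, eps)
    return sum(map(sum, r)) / (n * n)
```

The matrix grows quadratically with the number of frames, so for long recordings a dedicated tool such as PyRQA is the better choice; the radius `eps` should be chosen relative to the spread of the series being analyzed.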
