## OpenSMILE tutorial notebook: AUDIO

This notebook provides an introduction to [OpenSMILE](https://audeering.github.io/opensmile/), a tool for detecting signals and features for emotion recognition from speech audio.

Pros:
* Computes low level features such as loudness, voice quality, pitch, etc.
* It is free.
* Features detected from audio using OpenSMILE can be used for applications related to affective computing and other behavioral reserach, etc.

Cons:
* Cannot differentiate between vocal signals if multiple speakers are talking simultaneously.

To learn more about the audio features extracted, please refer to [the book](https://link.springer.com/book/10.1007/978-3-319-27299-3).

## Data preparation

OpenSMILE only takes .wav files as input and analyzes such file to extract acoustic and prosodic features available in OpenSMILE.

In [1]:
# processing libraries
import os 
import pandas as pd
import numpy as np
import csv

### Specify path settings

In [8]:
# get audio files from conversation video files in .wav format 
BASE_PATH = os.getcwd()

# where our input videos are stored
INPUT_VIDEOS = os.path.join(BASE_PATH,'conversation_data/ZR') 

# where our output audio files will be stored
OUTPUT_AUDIOS = os.path.join(BASE_PATH, 'conversation_audios')

#where final csv files containing audio features will be stored
OUTPUT_CSVS = os.path.join(BASE_PATH,'opensmile_csvs')

#create a directory to save the csv files
try:
    os.mkdir(OUTPUT_AUDIOS)
    os.mkdir(OUTPUT_CSVS)
    
             
except:
    pass

### Converting video files to audio files

Here we are converting the video files to .wav audio files.

If you already have audio files in .wav format, you can directly upload to your input audio folder and skip the next step.

In [9]:
input_videos = sorted([x for x in os.listdir(INPUT_VIDEOS) if not x.startswith(".")])
input_videos = [i.split('.')[0] for i in input_videos] 

for file in input_videos:
    !echo y | ffmpeg -i "$INPUT_VIDEOS"/"$file".mp4 -acodec pcm_s16le -ar 16000 -ac 2 "$OUTPUT_AUDIOS"/"$file".wav

ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --e

## Step 1: Install OpenSMILE

In [10]:
#installing opensmile
!pip install opensmile

[0m

## Step 2: run OpenSMILE on your audio file

**Note** Converted audio files involve multiple speakers and many conversation turns. If you are interested in getting audio features 
from each turns of each speakers and compare them, you will need to:
1. Manually annotate speaker change and create separate audio (.wav) files for each.
2. Alternately, you can use speaker diariazation models such as [Pyannote](https://huggingface.co/pyannote/speaker-segmentation). Sperker diarization and segmenttaion is a very imperefect technology, so please use with caution.

### Specify paramaters

1. OpenSMILE offers several feature sets, we use ComParE_2016 developed for compulational paralinguistics. You can see others [here](https://audeering.github.io/opensmile-python/api/opensmile.FeatureSet.html#opensmile.FeatureSet).
2. OpenSMILE offers 3 types of feature. we use low-level descriptors developed for basic usage. You can see others [here](https://audeering.github.io/opensmile-python/api/opensmile.FeatureLevel.html)

In [11]:
#import opensmile
import opensmile

#set feature name and type
feature_set_name = opensmile.FeatureSet.ComParE_2016

feature_level_name=opensmile.FeatureLevel.LowLevelDescriptors

#define csv file name
csv_name = 'opensmile_lowlevelFeatures.csv'

# we want to save file names to keep track of what file has been analyszed
file_names_columns = ['File_name']

## Run OpenSMILE on each file and save results

In [12]:
#define smile 
smile = opensmile.Smile(
        feature_set=feature_set_name,
        feature_level=feature_level_name
)

#feature column names for csv file
feature_names = smile.feature_names


# complete final column names
with open(OUTPUT_CSVS + '/' + csv_name, 'w', newline = '') as f:
    writer = csv.writer(f)
    writer.writerow(feature_names)
    
    wav_files = sorted([x for x in os.listdir(OUTPUT_AUDIOS) if not x.startswith(".")]) 
    
    for file in wav_files:
        
        file_id = [str(file)]
        feature = smile.process_file(OUTPUT_AUDIOS + '/' + file)
        
        #storing the statistical values of the features, we store mean here
        mean_feature = np.mean(feature, axis = 0).tolist()
        
        #file and feature
        id_and_feature = file_id + mean_feature
        
        writer.writerow(id_and_feature)
        print('data written for ', file)
        

data written for  1031_ZT_1_Aff_Video_left.wav
data written for  1031_ZT_1_Aff_Video_right.wav
data written for  1031_ZT_1_Arg_Video_left.wav
data written for  1031_ZT_1_Arg_Video_right.wav
data written for  1031_ZT_1_Coop_Video_left.wav
data written for  1031_ZT_1_Coop_Video_right.wav
data written for  1063_ZR_3_Aff_Video_left.wav
data written for  1063_ZR_3_Aff_Video_right.wav
data written for  1063_ZR_3_Arg_Video_left.wav
data written for  1063_ZR_3_Arg_Video_right.wav
data written for  1063_ZR_3_Coop_Video_left.wav
data written for  1063_ZR_3_Coop_Video_right.wav


### Congratulations! You should have your audio features for each .wav file stored in a single csv in `opensmile_csv` folder.
You can also see it below.

In [13]:
df = pd.read_csv(OUTPUT_CSVS + '/' + csv_name) 
df

Unnamed: 0,F0final_sma,voicingFinalUnclipped_sma,jitterLocal_sma,jitterDDP_sma,shimmerLocal_sma,logHNR_sma,audspec_lengthL1norm_sma,audspecRasta_lengthL1norm_sma,pcm_RMSenergy_sma,pcm_zcr_sma,...,mfcc_sma[5],mfcc_sma[6],mfcc_sma[7],mfcc_sma[8],mfcc_sma[9],mfcc_sma[10],mfcc_sma[11],mfcc_sma[12],mfcc_sma[13],mfcc_sma[14]
1031_ZT_1_Aff_Video_left.wav,113.610283,0.519943,0.015362,0.012714,0.082543,-44.294159,0.385587,2.388262,0.032365,0.177119,...,-7.901967,-13.627111,-8.927453,-1.297027,-7.614359,-6.009823,-8.000062,-3.931658,-4.512427,-6.198941
1031_ZT_1_Aff_Video_right.wav,113.610283,0.519943,0.015362,0.012714,0.082543,-44.294159,0.385587,2.388262,0.032365,0.177119,...,-7.901967,-13.627111,-8.927453,-1.297027,-7.614359,-6.009823,-8.000062,-3.931658,-4.512427,-6.198941
1031_ZT_1_Arg_Video_left.wav,111.107224,0.568702,0.016094,0.013103,0.090077,-38.928047,0.39188,2.340198,0.033977,0.172328,...,-9.621688,-15.773398,-9.746632,-1.306573,-5.943861,-4.645528,-5.546753,-3.26999,-4.561475,-7.247764
1031_ZT_1_Arg_Video_right.wav,111.107224,0.568702,0.016094,0.013103,0.090077,-38.928047,0.39188,2.340198,0.033977,0.172328,...,-9.621688,-15.773398,-9.746632,-1.306573,-5.943861,-4.645528,-5.546753,-3.26999,-4.561475,-7.247764
1031_ZT_1_Coop_Video_left.wav,75.484306,0.348333,0.01082,0.008875,0.056721,-62.720348,0.231993,2.079747,0.019614,0.278737,...,-4.813778,-9.914968,-5.976103,-2.469086,-5.938532,-3.604149,-4.32554,-2.925473,-3.059956,-4.115152
1031_ZT_1_Coop_Video_right.wav,75.484306,0.348333,0.01082,0.008875,0.056721,-62.720348,0.231993,2.079747,0.019614,0.278737,...,-4.813778,-9.914968,-5.976103,-2.469086,-5.938532,-3.604149,-4.32554,-2.925473,-3.059956,-4.115152
1063_ZR_3_Aff_Video_left.wav,79.603432,0.414417,0.015106,0.013148,0.073671,-58.924927,0.306179,2.25087,0.017078,0.192753,...,-4.695473,-12.717813,-16.757681,-4.520012,-7.331759,-2.861531,-4.515655,-6.582701,4.339488,-3.036617
1063_ZR_3_Aff_Video_right.wav,79.603432,0.414417,0.015106,0.013148,0.073671,-58.924927,0.306179,2.25087,0.017078,0.192753,...,-4.695473,-12.717813,-16.757681,-4.520012,-7.331759,-2.861531,-4.515655,-6.582701,4.339488,-3.036617
1063_ZR_3_Arg_Video_left.wav,91.064049,0.455293,0.016655,0.014777,0.081882,-51.521107,0.32111,2.343179,0.019053,0.194824,...,-6.783048,-11.780149,-17.408401,-6.479145,-8.506731,-3.508029,-3.999152,-4.121421,3.260093,-3.071137
1063_ZR_3_Arg_Video_right.wav,91.064049,0.455293,0.016655,0.014777,0.081882,-51.521107,0.32111,2.343179,0.019053,0.194824,...,-6.783048,-11.780149,-17.408401,-6.479145,-8.506731,-3.508029,-3.999152,-4.121421,3.260093,-3.071137


### We can use different analysis tools such as [PyRQA](https://pypi.org/project/PyRQA/) to analyze this time series.