# Speech Emotion Recognition
This Jupyter notebook performs emotion recognition from a user's voice, with a particular focus on concert settings.  
The following paper was used as a reference:

- CHATTERJEE, Rajdeep, et al. Real-time speech emotion analysis for smart home assistants. IEEE Transactions on Consumer Electronics, 2021, 67.1: 68-76.

In [145]:
import os
import glob
import numpy as np     
import pandas as pd
import matplotlib.pyplot as plt 
import tensorflow as tf
from tensorflow.keras.utils import to_categorical


# Live Audio
import librosa                                      
import librosa.display                         
from scipy.fftpack import fft                  
from scipy.io.wavfile import write
from sklearn.decomposition import FastICA
from sklearn.preprocessing import normalize, scale, LabelEncoder


In [17]:
# Run this cell if you are working on a local machine
file_root = "/Users/sebinlee/Desktop/Github/SERPractice"

In [30]:
TRAIN_DATA = os.path.join(file_root, "audio-dataset")
TRAIN_DATA_OUTPUT = os.path.join(TRAIN_DATA, "output")

## Preprocessing


In [157]:
max_pad_len = 174  # number of MFCC frames kept per clip

def pre_processing(audio_file_path):
    # Load the audio file (librosa resamples to 22050 Hz by default)
    audio_timeseries, sampling_rate = librosa.load(audio_file_path)

    # Trim leading and trailing silence
    audio_timeseries, _ = librosa.effects.trim(audio_timeseries)

    # STFT parameters
    n_fft = 2048
    hop_length = 512

    # Compute 13 MFCCs; passing `y` by keyword works on all librosa versions (newer ones require it)
    mfcc = librosa.feature.mfcc(y=audio_timeseries, sr=sampling_rate, hop_length=hop_length, n_fft=n_fft, dct_type=3, n_mfcc=13)
    # librosa.display.specshow(mfcc, sr=sampling_rate, x_axis='time')

    # Truncate long clips, then zero-pad short ones to exactly max_pad_len frames
    mfcc = mfcc[:, :max_pad_len]
    pad_width = max_pad_len - mfcc.shape[1]
    mfcc = np.pad(mfcc, pad_width=((0, 0), (0, pad_width)), mode="constant")

    return mfcc
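
The length normalization that gives every clip the same number of frames can be checked in isolation, without loading any audio. The following is a minimal sketch: `pad_or_truncate` is a hypothetical helper, the arrays are synthetic stand-ins for MFCC matrices, and the truncation of long clips is an assumed safeguard rather than something the dataset is known to need.

```python
import numpy as np

def pad_or_truncate(mfcc, max_len=174):
    """Return `mfcc` with exactly `max_len` columns (frames)."""
    # Drop frames beyond max_len (assumed safeguard for long clips)
    mfcc = mfcc[:, :max_len]
    # Zero-pad on the right so that short clips reach max_len
    pad_width = max_len - mfcc.shape[1]
    return np.pad(mfcc, pad_width=((0, 0), (0, pad_width)), mode="constant")

short_clip = np.ones((13, 100))   # fewer frames than max_len
long_clip = np.ones((13, 300))    # more frames than max_len

print(pad_or_truncate(short_clip).shape)  # (13, 174)
print(pad_or_truncate(long_clip).shape)   # (13, 174)
```

Without the truncation line, a clip with more than 174 frames would yield a negative pad width, which `np.pad` rejects.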

In [158]:
valence_dict = {
    "achievement" : 1,
    "pleasure" : 2,
    "surprise" : 3,
    "anger" : -1,
    "fear" : -2,
    "pain" : -3,
}

arousal_dict = {
    "low" : 1,
    "moderate" : 2,
    "strong" : 3,
    "peak" : 4
}

def vivae_fetch_label(file_path):
    # File names are expected in the form <prefix>_<emotion>_<intensity>_...
    file_name = os.path.basename(file_path)
    parts = file_name.split("_")
    return [valence_dict[parts[1]], arousal_dict[parts[2]]]
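
As an illustration of the label lookup, the sketch below runs the same underscore split on a made-up path, taking the file name with `os.path.basename`. The path `audio-dataset/S01_pleasure_strong_02.wav` and the function name `fetch_label` are hypothetical; the only assumption about real file names is the `<prefix>_<emotion>_<intensity>_...` layout that the indexing implies. The dictionaries are copied from the cell above.

```python
import os

valence_dict = {"achievement": 1, "pleasure": 2, "surprise": 3,
                "anger": -1, "fear": -2, "pain": -3}
arousal_dict = {"low": 1, "moderate": 2, "strong": 3, "peak": 4}

def fetch_label(file_path):
    # Use only the file name so the result does not depend on the directory
    parts = os.path.basename(file_path).split("_")
    return [valence_dict[parts[1]], arousal_dict[parts[2]]]

# Hypothetical path, for illustration only
print(fetch_label("audio-dataset/S01_pleasure_strong_02.wav"))  # [2, 3]
```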


In [159]:
# Collect files only; skip sub-directories such as the output folder
train_file_list = [p for p in glob.glob(os.path.join(TRAIN_DATA, "*")) if os.path.isfile(p)]
features = []

# Append [mfcc, valence, arousal] for every file
for file_path in train_file_list:
    valence, arousal = vivae_fetch_label(file_path)
    features.append([pre_processing(file_path), valence, arousal])

# Convert features variable to Pandas DataFrame
featuresDF = pd.DataFrame(features, columns=['features', 'valence', 'arousal'])

## Training

In [149]:
model = tf.keras.Sequential()
# Conv1D reads the input as (steps, channels): the 13 MFCC coefficients act as steps, the 174 frames as channels
model.add(tf.keras.layers.Conv1D(128, 5, padding='causal', activation='relu', input_shape=(13, max_pad_len)))
model.add(tf.keras.layers.Conv1D(128, 5, padding='causal', activation='relu'))
model.add(tf.keras.layers.MaxPooling1D(pool_size=8))
model.add(tf.keras.layers.Conv1D(128, 5, padding='causal', activation='relu'))
model.add(tf.keras.layers.Conv1D(128, 5, padding='causal', activation='relu'))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(10, activation="relu"))
# Softmax output layer with 10 units
model.add(tf.keras.layers.Dense(10, activation="softmax"))

model.compile(loss=tf.losses.MeanAbsoluteError(), optimizer='adam', metrics=['accuracy'])

In [160]:
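# LabelEncoder maps the sorted valence codes to class indices starting at 0
le = LabelEncoder()
features_list = np.array(featuresDF.features.tolist())
valence_list = np.array(featuresDF.valence.tolist())

train_data = features_list
# One-hot encode the class indices for training
train_result = tf.keras.utils.to_categorical(le.fit_transform(valence_list))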

print(train_data.shape)
print(train_result.shape)


(1085, 13, 174)
(1085, 6)
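
The second shape shows that the six distinct valence codes become six one-hot columns. The mapping can be reproduced with NumPy alone. The sketch below mimics `LabelEncoder` followed by `to_categorical` on a small hypothetical label array; it is an illustration of the idea, not the scikit-learn or Keras implementation.

```python
import numpy as np

# Hypothetical sample of valence codes, one per clip
valence = np.array([1, -3, 2, -1, 3, -2, 1])

# np.unique sorts the distinct codes; `indices` holds each clip's class index
classes, indices = np.unique(valence, return_inverse=True)

# One-hot encode: row i has a 1 in column indices[i]
one_hot = np.eye(len(classes))[indices]

print(classes.tolist())  # [-3, -2, -1, 1, 2, 3]
print(one_hot.shape)     # (7, 6)
```

Because the codes are sorted, class index 0 corresponds to "pain" (-3) and class index 5 to "surprise" (3), which matters when decoding predictions back to emotion names.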


In [161]:
model.fit(train_data, train_result, epochs=10)

Epoch 1/10


ValueError: in user code:

    File "/Users/sebinlee/miniforge3/envs/tf38/lib/python3.8/site-packages/keras/engine/training.py", line 1021, in train_function  *
        return step_function(self, iterator)
    File "/Users/sebinlee/miniforge3/envs/tf38/lib/python3.8/site-packages/keras/engine/training.py", line 1010, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/Users/sebinlee/miniforge3/envs/tf38/lib/python3.8/site-packages/keras/engine/training.py", line 1000, in run_step  **
        outputs = model.train_step(data)
    File "/Users/sebinlee/miniforge3/envs/tf38/lib/python3.8/site-packages/keras/engine/training.py", line 860, in train_step
        loss = self.compute_loss(x, y, y_pred, sample_weight)
    File "/Users/sebinlee/miniforge3/envs/tf38/lib/python3.8/site-packages/keras/engine/training.py", line 918, in compute_loss
        return self.compiled_loss(
    File "/Users/sebinlee/miniforge3/envs/tf38/lib/python3.8/site-packages/keras/engine/compile_utils.py", line 201, in __call__
        loss_value = loss_obj(y_t, y_p, sample_weight=sw)
    File "/Users/sebinlee/miniforge3/envs/tf38/lib/python3.8/site-packages/keras/losses.py", line 141, in __call__
        losses = call_fn(y_true, y_pred)
    File "/Users/sebinlee/miniforge3/envs/tf38/lib/python3.8/site-packages/keras/losses.py", line 245, in call  **
        return ag_fn(y_true, y_pred, **self._fn_kwargs)
    File "/Users/sebinlee/miniforge3/envs/tf38/lib/python3.8/site-packages/keras/losses.py", line 1457, in mean_absolute_error
        return backend.mean(tf.abs(y_pred - y_true), axis=-1)

    ValueError: Dimensions must be equal, but are 10 and 6 for '{{node mean_absolute_error/sub}} = Sub[T=DT_FLOAT](sequential_32/dense_41/Softmax, IteratorGetNext:1)' with input shapes: [?,10], [?,6].
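
The error arises because the final `Dense(10, activation="softmax")` layer emits 10 values per sample while the one-hot labels have 6 columns, so the loss cannot subtract one from the other. One possible way to resolve it, sketched here under the assumption that the task is six-way valence classification, is to size the output layer from the labels. Replacing mean absolute error with categorical cross-entropy is a further assumption: it is a common pairing with softmax outputs, not something the original run used. The constant `num_classes` is introduced here for illustration.

```python
import tensorflow as tf

max_pad_len = 174
num_classes = 6  # in the notebook this could be taken from train_result.shape[1]

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(13, max_pad_len)),
    tf.keras.layers.Conv1D(128, 5, padding="causal", activation="relu"),
    tf.keras.layers.Conv1D(128, 5, padding="causal", activation="relu"),
    tf.keras.layers.MaxPooling1D(pool_size=8),
    tf.keras.layers.Conv1D(128, 5, padding="causal", activation="relu"),
    tf.keras.layers.Conv1D(128, 5, padding="causal", activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="relu"),
    # Output width now matches the number of one-hot label columns
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])

# Assumed loss: categorical cross-entropy instead of mean absolute error
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

print(model.output_shape)  # (None, 6)
```

With the output shape at `(None, 6)`, a call such as `model.fit(train_data, train_result, epochs=10)` should no longer hit the dimension mismatch, although this sketch has not been run against the dataset.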
