# Emotion Detection

<div class="alert alert-info">

This tutorial is available as an IPython notebook at [malaya-speech/example/emotion](https://github.com/huseinzol05/malaya-speech/tree/master/example/emotion).
    
</div>

<div class="alert alert-info">

This module is language independent, so it is safe to use on different languages.
    
</div>

<div class="alert alert-warning">

This is an application of the malaya-speech Pipeline; read more about it at [malaya-speech/example/pipeline](https://github.com/huseinzol05/malaya-speech/tree/master/example/pipeline).
    
</div>

### Dataset

Trained on the Toronto emotional speech set (TESS) with augmented noise, https://tspace.library.utoronto.ca/handle/1807/24487

In [None]:
!pip install malaya_speech
!pip install networkx
!pip install graphviz


In [None]:
import subprocess
import os
import malaya_speech
import numpy as np
from malaya_speech import Pipeline
import IPython.display as ipd

os.environ["CUDA_VISIBLE_DEVICES"] = "-1"  # stop TensorFlow from loading CUDA

In [None]:
command = "ffmpeg -y -i ./Center.mp4 -ab 160k -ac 2 -ar 44100 -vn audio.wav"
# pass a list of arguments so the call also works without a shell
subprocess.call(command.split())

In [None]:
command = "ffmpeg -y -ss 174 -i ./audio.wav -t 00:00:9 trimed.wav"
# pass a list of arguments so the call also works without a shell
subprocess.call(command.split())
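
A minimal sketch of building the ffmpeg trim arguments as a list and checking the exit status. The helper names `ffmpeg_trim_args` and `run_checked` are hypothetical and not part of malaya-speech; only the ffmpeg options already used in this notebook (`-y`, `-ss`, `-i`, `-t`) appear.

```python
import shlex
import subprocess


def ffmpeg_trim_args(src, dst, start_seconds, duration_seconds):
    """Build an ffmpeg argument list that trims `src` into `dst` (hypothetical helper)."""
    return [
        "ffmpeg", "-y",
        "-ss", str(start_seconds),
        "-i", src,
        "-t", str(duration_seconds),
        dst,
    ]


def run_checked(args):
    # check=True raises CalledProcessError when the command exits with a non-zero status
    return subprocess.run(args, check=True)


trim_args = ffmpeg_trim_args("./audio.wav", "trimed.wav", 174, 9)
print(shlex.join(trim_args))  # show the command that would be executed
# run_checked(trim_args)  # uncomment once ffmpeg is installed and ./audio.wav exists
```

Passing a list avoids shell quoting issues, and `check=True` surfaces a failed conversion immediately instead of leaving a missing or empty output file.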

In [None]:
y, sr = malaya_speech.load('./trimed.wav')
len(y), sr

In [None]:
# just going to take 30 seconds
#y = y[:sr* 30]

This audio was extracted from https://www.youtube.com/watch?v=HylaY5e1awo&t=2s

### Supported emotions

In [None]:
malaya_speech.emotion.labels

### List available deep models

In [None]:
malaya_speech.emotion.available_model()

### Note

Loading the deep-speaker model raises an error, so you have to download it manually.
 
Download the model:

* [deep-speaker](https://f000.backblazeb2.com/file/malaya-speech-model/emotion/deep-speaker/model.pb)
  to:
  `<home>\Malaya-Speech\emotion\deep-speaker`
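
The manual download can also be scripted. The sketch below makes two assumptions: that `<home>` refers to the current user's home directory, and that the file should keep the name `model.pb` from the URL. The function names `target_path` and `download_model` are hypothetical.

```python
from pathlib import Path
import urllib.request

MODEL_URL = (
    "https://f000.backblazeb2.com/file/malaya-speech-model/"
    "emotion/deep-speaker/model.pb"
)


def target_path(home=None):
    """Return the assumed destination of the model file under the home directory."""
    base = Path(home) if home is not None else Path.home()  # assumption: <home> is the user's home
    return base / "Malaya-Speech" / "emotion" / "deep-speaker" / "model.pb"


def download_model(home=None):
    """Download the model file into the assumed cache folder and return its path."""
    dest = target_path(home)
    dest.parent.mkdir(parents=True, exist_ok=True)  # create missing folders
    urllib.request.urlretrieve(MODEL_URL, str(dest))
    return dest


print(target_path())
# download_model()  # uncomment to fetch the file
```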
  


### Models

In [None]:
vggvox_v2 = malaya_speech.emotion.deep_model(model = 'vggvox-v2')
deep_speaker = malaya_speech.emotion.deep_model(model = 'deep-speaker', validate = False)
quantized_vggvox_v2 = malaya_speech.emotion.deep_model(model = 'vggvox-v2', quantized = True)

### How to classify emotions in an audio sample

We use voice activity detection (VAD) to help us. Instead of classifying the whole sample at once, we split it into multiple small samples and classify each one.
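
To illustrate the chunking idea on its own, here is a minimal NumPy sketch that cuts a signal into consecutive 30 ms frames. It is not the malaya-speech implementation; `split_into_frames`, `sr_demo` and the silent stand-in signal are made up for this example.

```python
import numpy as np


def split_into_frames(signal, frame_ms, sample_rate):
    """Split a 1-D signal into consecutive, non-overlapping frames of `frame_ms` milliseconds."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len  # a trailing partial frame is dropped
    return [signal[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


sr_demo = 16000
signal_demo = np.zeros(sr_demo, dtype=np.float32)  # one second of silence as stand-in audio
frames_demo = split_into_frames(signal_demo, 30, sr_demo)
print(len(frames_demo), len(frames_demo[0]))
```

Each short frame can then be labelled as speech or non-speech by a VAD model, and only the speech regions are passed on to the emotion classifier.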

In [None]:
vad = malaya_speech.vad.deep_model(model = 'vggvox-v2')

In [None]:
%%time
frames = list(malaya_speech.utils.generator.frames(y, 30, sr))

In [None]:
p = Pipeline()
pipeline = (
    p.batching(5)
    .foreach_map(vad.predict)
    .flatten()
)

In [None]:
%%time
result = p.emit(frames)
result.keys()

In [None]:
frames_vad = [(frame, result['flatten'][no]) for no, frame in enumerate(frames)]
grouped_vad = malaya_speech.utils.group.group_frames(frames_vad)
grouped_vad = malaya_speech.utils.group.group_frames_threshold(grouped_vad, threshold_to_stop = 0.3)
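
The idea behind grouping can be sketched in plain Python: consecutive frames that share the same VAD label are merged into one longer segment. This is an illustration only and assumes frames are plain NumPy arrays; the internals of `malaya_speech.utils.group.group_frames` may differ, and `group_by_label` is a hypothetical name.

```python
from itertools import groupby

import numpy as np


def group_by_label(labelled_frames):
    """Merge runs of consecutive (array, label) pairs that share a label."""
    grouped = []
    for label, run in groupby(labelled_frames, key=lambda pair: pair[1]):
        arrays = [frame for frame, _ in run]
        grouped.append((np.concatenate(arrays), label))
    return grouped


demo_frames = [
    (np.zeros(480), False),  # non-speech
    (np.ones(480), True),    # speech
    (np.ones(480), True),    # speech, merged with the previous frame
    (np.zeros(480), False),  # non-speech
]
merged = group_by_label(demo_frames)
print([(len(array), label) for array, label in merged])
```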

In [None]:
p_vggvox_v2 = Pipeline()
pipeline = (
    p_vggvox_v2.foreach_map(vggvox_v2)
    .flatten()
)

In [None]:
p_deep_speaker = Pipeline()
pipeline = (
    p_deep_speaker.foreach_map(deep_speaker)
    .flatten()
)

In [None]:
p_quantized_vggvox_v2 = Pipeline()
pipeline = (
    p_quantized_vggvox_v2.foreach_map(quantized_vggvox_v2)
    .flatten()
)

In [None]:
%%time
samples_vad = [g[0] for g in grouped_vad]
result_vggvox_v2 = p_vggvox_v2.emit(samples_vad)
result_vggvox_v2.keys()

In [None]:
%%time
samples_vad = [g[0] for g in grouped_vad]
result_deep_speaker = p_deep_speaker.emit(samples_vad)
result_deep_speaker.keys()

In [None]:
%%time
samples_vad = [g[0] for g in grouped_vad]
result_quantized_vggvox_v2 = p_quantized_vggvox_v2.emit(samples_vad)
result_quantized_vggvox_v2.keys()

In [None]:
samples_vad_vggvox_v2 = [(frame, result_vggvox_v2['flatten'][no]) for no, frame in enumerate(samples_vad)]
samples_vad_vggvox_v2

In [None]:
samples_vad_deep_speaker = [(frame, result_deep_speaker['flatten'][no]) for no, frame in enumerate(samples_vad)]
samples_vad_deep_speaker

In [None]:
samples_vad_quantized_vggvox_v2 = [(frame, result_quantized_vggvox_v2['flatten'][no]) for no, frame in enumerate(samples_vad)]
samples_vad_quantized_vggvox_v2
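
To compare the three models, the per-segment labels can be tallied. The sketch below assumes each result is a list of `(segment, label)` pairs with hashable labels, as built above; `summarize_labels` is a hypothetical helper and the label strings are placeholders, not necessarily values from `malaya_speech.emotion.labels`.

```python
from collections import Counter


def summarize_labels(labelled_segments):
    """Count how often each label occurs in a list of (segment, label) pairs."""
    return Counter(label for _, label in labelled_segments)


# placeholder data standing in for one model's output
demo_segments = [("seg-0", "label-a"), ("seg-1", "label-b"), ("seg-2", "label-a")]
counts = summarize_labels(demo_segments)
print(counts.most_common(1))
```

Running the same tally on `samples_vad_vggvox_v2`, `samples_vad_deep_speaker` and `samples_vad_quantized_vggvox_v2` gives a quick view of where the models agree.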

### References

1. Toronto emotional speech set (TESS), https://tspace.library.utoronto.ca/handle/1807/24487
2. The Singaporean White Boy - The Shan and Rozz Show: EP7, https://www.youtube.com/watch?v=HylaY5e1awo&t=2s&ab_channel=Clicknetwork