General Instructions
* This notebook requires some manual intervention to download the TESS dataset. Please run the cells one at a time and read the instructions before each one.


# Solution Design - Downloading Datasets and Combining Them

For this task, the dataset is built using 5252 samples from:

> the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset (https://zenodo.org/record/1188976)

> the Toronto emotional speech set (TESS) dataset (https://www.kaggle.com/datasets/ejlok1/toronto-emotional-speech-set-tess)

Both datasets can be downloaded free of charge from the links above.

In [None]:
# Creating DATASET directory
!mkdir DATASET

### Download RAVDESS dataset

In [None]:
# Downloading RAVDESS dataset
!wget https://zenodo.org/record/1188976/files/Audio_Song_Actors_01-24.zip
!wget https://zenodo.org/record/1188976/files/Audio_Speech_Actors_01-24.zip

#Extracting RAVDESS dataset and adding it to DATASET directory
!unzip Audio_Song_Actors_01-24.zip -d DATASET
!unzip Audio_Speech_Actors_01-24.zip -d DATASET

### IMPORTANT NOTE: Manual Download of TESS dataset
* The TESS dataset does not have a direct download link due to licensing issues
* Visit the link (https://borealisdata.ca/dataset.xhtml?persistentId=doi%3A10.5683%2FSP2%2FE8H2MF)
* Click __"Access Dataset"__, then __"Download ZIP (268.3 MB)"__, and click the __"Accept"__ button to accept the license and download the dataset to your computer
* The download should now start and save a compressed file called __dataverse_files.zip__ to your computer
* Upload the file __dataverse_files.zip__ to Intel Developer Cloud
* Run the cell below to prepare the dataset for the TESS pipeline
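
Before extracting the archive, it can help to confirm that the upload completed and that the file is a readable ZIP. The following is a minimal sketch using only the Python standard library; the helper name `check_zip` is hypothetical and not part of the original notebook, and the path assumes the archive was uploaded to the notebook's working directory.

```python
import zipfile
from pathlib import Path

def check_zip(path):
    """Return (ok, message) describing whether `path` is a usable ZIP archive."""
    archive = Path(path)
    if not archive.is_file():
        return False, f"{archive} not found; upload it to the working directory first"
    if not zipfile.is_zipfile(archive):
        return False, f"{archive} is not a ZIP archive; the upload may be incomplete"
    with zipfile.ZipFile(archive) as zf:
        # testzip() returns the name of the first corrupt member, or None
        bad_member = zf.testzip()
        if bad_member is not None:
            return False, f"{archive} contains a corrupt member: {bad_member}"
        return True, f"{archive} looks fine ({len(zf.namelist())} entries)"

# Assumed location: the notebook's working directory
ok, message = check_zip("dataverse_files.zip")
print(message)
```

If the message reports a problem, uploading the file again before running the extraction cell is the simplest remedy.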

In [None]:
# Creating TESS_Toronto_emotional_speech_set_data directory
!mkdir TESS_Toronto_emotional_speech_set_data

# Creating the Actor_26 and Actor_28 directories for TESS pipeline
!mkdir DATASET/Actor_26
!mkdir DATASET/Actor_28

# Extracting TESS dataset and adding it to the TESS_Toronto_emotional_speech_set_data directory
!unzip dataverse_files.zip -d TESS_Toronto_emotional_speech_set_data

### TESS Pipeline

The RAVDESS dataset gives each file a unique filename consisting of a 7-part numerical identifier (e.g., 02-01-06-01-02-01-12.mp4). The seven parts define the stimulus characteristics:

Filename identifiers
* Modality (01 = full-AV, 02 = video-only, 03 = audio-only).
* Vocal channel (01 = speech, 02 = song).
* Emotion (01 = neutral, 02 = calm, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised).
* Emotional intensity (01 = normal, 02 = strong). NOTE: There is no strong intensity for the ‘neutral’ emotion.
* Statement (01 = “Kids are talking by the door”, 02 = “Dogs are sitting by the door”).
* Repetition (01 = 1st repetition, 02 = 2nd repetition).
* Actor (01 to 24. Odd numbered actors are male, even numbered actors are female).

Filename example: 02-01-06-01-02-01-12.mp4
* Video-only (02)
* Speech (01)
* Fearful (06)
* Normal intensity (01)
* Statement “dogs” (02)
* 1st Repetition (01)
* 12th Actor (12)
* Female, as the actor ID number is even.
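
The decoding described above can be sketched as a small function. This is an illustration only, not part of the original pipeline: the function name `parse_ravdess_filename` and the dictionary names are hypothetical, while the code tables are copied from the identifier list above.

```python
from pathlib import Path

# Code tables taken from the RAVDESS filename identifier list
MODALITIES = {"01": "full-AV", "02": "video-only", "03": "audio-only"}
VOCAL_CHANNELS = {"01": "speech", "02": "song"}
EMOTIONS = {"01": "neutral", "02": "calm", "03": "happy", "04": "sad",
            "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised"}
INTENSITIES = {"01": "normal", "02": "strong"}

def parse_ravdess_filename(filename):
    """Decode a 7-part RAVDESS identifier into a dictionary of fields."""
    parts = Path(filename).stem.split("-")
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}: {filename}")
    modality, channel, emotion, intensity, statement, repetition, actor = parts
    return {
        "modality": MODALITIES[modality],
        "vocal_channel": VOCAL_CHANNELS[channel],
        "emotion": EMOTIONS[emotion],
        "intensity": INTENSITIES[intensity],
        "statement": int(statement),
        "repetition": int(repetition),
        "actor": int(actor),
        # Odd-numbered actors are male, even-numbered actors are female
        "gender": "female" if int(actor) % 2 == 0 else "male",
    }

info = parse_ravdess_filename("02-01-06-01-02-01-12.mp4")
print(info["modality"], info["emotion"], info["actor"], info["gender"])
# video-only fearful 12 female
```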


The TESS dataset uses a directory hierarchy with two actors, named OAF and YAF. Each directory name combines the actor with the emotion conveyed, as shown below:

TESS_Toronto_emotional_speech_set_data
> --OAF_angry

> --OAF_disgust

> --..

> --YAF_angry

> --YAF_disgust

> --..

To facilitate feature creation, the TESS files are renamed using the same naming convention as the RAVDESS dataset. The model only uses the emotion field, so that field is set from the TESS label and every other field is filled with a random integer:
- 03 (Random)
- 01 (Random)
- 01 (Emotion code, set per file using the RAVDESS codes: 01 = neutral, 03 = happy, 04 = sad, 05 = angry, 06 = fearful, 07 = disgust, 08 = surprised. The code 02 = calm is not used because the TESS dataset does not contain a calm class).
- 01 (Random)
- 03 (Random)
- 01 (Random)
- 01 (Random)
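
As a worked illustration of this renaming, the sketch below converts a single TESS-style filename into a RAVDESS-style one. It is a simplified stand-in, not the notebook's pipeline: the function name `ravdess_style_name` is hypothetical, the input filename is an illustrative example, and the fixed placeholder values from the list above are used where the actual pipeline draws random integers.

```python
import os

# TESS filename label -> RAVDESS emotion code (TESS has no 'calm' class)
TESS_LABEL_TO_CODE = {"neutral": "01", "happy": "03", "sad": "04", "angry": "05",
                      "fear": "06", "disgust": "07", "ps": "08"}

def ravdess_style_name(tess_filename,
                       placeholders=("03", "01", "01", "03", "01", "01")):
    """Build a 7-part RAVDESS-style name; only field 3 (emotion) is meaningful."""
    base, extension = os.path.splitext(tess_filename)
    # The emotion label is the text after the last underscore
    label = base.rsplit("_", 1)[-1]
    if label not in TESS_LABEL_TO_CODE:
        raise ValueError(f"unrecognised emotion label in {tess_filename!r}")
    first, second, intensity, statement, repetition, actor = placeholders
    fields = [first, second, TESS_LABEL_TO_CODE[label],
              intensity, statement, repetition, actor]
    return "-".join(fields) + extension

print(ravdess_style_name("OAF_back_angry.wav"))  # 03-01-05-01-03-01-01.wav
```

Fixed placeholders make the output reproducible for demonstration, but they would give every file of the same emotion the same name, which is presumably why random values are used when whole directories are renamed.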

The cell below combines the RAVDESS and TESS datasets in the DATASET directory.

In [2]:
# TESS Pipeline - to combine RAVDESS and TESS dataset. This can take some time.

import os
import shutil
import random

TRAINING_FILES_PATH = './DATASET/'  # Path of the DATASET folder containing the RAVDESS dataset
TESS_ORIGINAL_FOLDER_PATH = './TESS_Toronto_emotional_speech_set_data/'  # Path of the TESS dataset

class TESSPipeline:
    @staticmethod
    def create_tess_folders(path):
        # RAVDESS emotion code -> label that ends each TESS filename ('ps' = pleasant surprise)
        label_conversion = {'01': 'neutral',
                            '03': 'happy',
                            '04': 'sad',
                            '05': 'angry',
                            '06': 'fear',
                            '07': 'disgust',
                            '08': 'ps'}

        for subdir, dirs, files in os.walk(path):
            for filename in files:
                # OAF files are copied to Actor_28 with '-26' as the last filename field;
                # every other file (YAF) is copied to Actor_26 with '-25'.
                if filename.startswith('OAF'):
                    destination_path = TRAINING_FILES_PATH + 'Actor_28/'
                    actor_suffix = '-26'
                else:
                    destination_path = TRAINING_FILES_PATH + 'Actor_26/'
                    actor_suffix = '-25'

                old_file_path = os.path.join(os.path.abspath(subdir), filename)
                base, extension = os.path.splitext(filename)

                for key, value in label_conversion.items():
                    if base.endswith(value):
                        # Six random two-digit fields joined by '-', e.g. '12-34-56-78-90-11'
                        random_list = random.sample(range(10, 99), 6)
                        file_name = '-'.join([str(i) for i in random_list])
                        # Keep fields 1-2, write the emotion code as field 3,
                        # keep fields 4-6 and append the actor field
                        file_name_with_correct_emotion = file_name[:6] + key + '-' + file_name[9:] + actor_suffix + extension
                        new_file_path = destination_path + file_name_with_correct_emotion
                        shutil.copy(old_file_path, new_file_path)
print('TESS Pipeline Started')
TESSPipeline.create_tess_folders(TESS_ORIGINAL_FOLDER_PATH)
print('TESS Pipeline Completed')

TESS Pipeline Started
TESS Pipeline Completed


In [21]:
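# Flattening the RAVDESS directory hierarchy: copy every .wav file from the Actor_* subdirectories into DATASET/
# -mindepth 2 skips files already at the top level of DATASET/, so cp never copies a file onto itself
!find DATASET/ -mindepth 2 -name \*.wav -exec cp {} DATASET/ \;
!rm -rf DATASET/Actor_*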

In [23]:
# Confirming that the pipeline was successful
import subprocess
count = subprocess.check_output('find DATASET/ -name "*.wav" | wc -l', shell=True)
# Compare as an integer so that whitespace around the wc output does not matter
if int(count.decode().strip()) == 5252:
    print("Combining RAVDESS and TESS dataset SUCCESSFUL")
else:
    print("Combining RAVDESS and TESS dataset FAILED")

Combining RAVDESS and TESS dataset SUCCESSFUL
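
Beyond the total file count, a per-emotion breakdown can help confirm that the emotion field survived the renaming. The sketch below is an optional extra check, not part of the original notebook; the function name `emotion_counts` is hypothetical, and it assumes every .wav file sits directly in the dataset directory with a 7-part name whose third field is the emotion code.

```python
from collections import Counter
from pathlib import Path

def emotion_counts(dataset_dir):
    """Count .wav files per emotion code (third field of the 7-part filename)."""
    counts = Counter()
    for wav in Path(dataset_dir).glob("*.wav"):
        parts = wav.stem.split("-")
        if len(parts) == 7:  # skip files that do not follow the convention
            counts[parts[2]] += 1
    return counts

# Only report if the DATASET directory exists in the working directory
if Path("DATASET").is_dir():
    for code, n in sorted(emotion_counts("DATASET").items()):
        print(code, n)
```

Code 02 (calm) should receive files from RAVDESS only, since TESS has no calm class.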


#### END OF NOTEBOOK

#### _Citations_

###### RAVDESS
* @article{10.1371/journal.pone.0196391,
    doi = {10.1371/journal.pone.0196391},
    author = {Livingstone, Steven R. AND Russo, Frank A.},
    journal = {PLOS ONE},
    publisher = {Public Library of Science},
    title = {The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English},
    year = {2018},
    month = {05},
    volume = {13},
    url = {https://doi.org/10.1371/journal.pone.0196391},
}

###### TESS
* @data{SP2/E8H2MF_2020,
author = {Pichora-Fuller, M. Kathleen and Dupuis, Kate},
publisher = {Borealis},
title = {{Toronto emotional speech set (TESS)}},
year = {2020},
version = {DRAFT VERSION},
doi = {10.5683/SP2/E8H2MF},
url = {https://doi.org/10.5683/SP2/E8H2MF}
}

###### TESS Pipeline
* https://github.com/marcogdepinto/emotion-classification-from-audio-files/blob/master/tess_pipeline.py