<a href="https://colab.research.google.com/github/DorAzaria/Voice-Emotion-Recognition/blob/main/preprocess.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Data Preprocessing for Sentiment Analysis Recognition**

By Dolev Abuhazira and Dor Azaria.


For this project, we collected several datasets around the network.
Finding those kinds of datasets was challenging because sentiment speech recognition isn’t popular research.
Every dataset has a different representation for the same class labels.
After a long search on Huggingface and Kaggle, we’ve found the following:

|Dataset | Files | Speech Emotions |
| -- | -- | -- |
| RAVDESS |  60 trials per actor x 24 actors = 1440 files. |calm, happy, sad, angry, fearful, surprised, and disgusted. |
| TESS | 200 trials per emotion x7 per gender x2 = 2800 files. |  anger, disgust, fear, happiness, pleasant surprise, sadness, and neutral|
| URDU | 100 trials per emotion x4 = 400 files. | Angry, Happy, Neutral, and Sad. |

In [15]:
from google.colab import drive
drive.mount('/content/data/')

Drive already mounted at /content/data/; to attempt to forcibly remount, call drive.mount("/content/data/", force_remount=True).


**Imports**

In [16]:
import numpy as np
import pandas as pd
import os
import random
import torch
import pickle
import torchaudio

**Pandas DataFrame**

create a dataframe 'data' - contains the following pattern:

| Emotion | Path |
| --- | --- |
| 0 | '/path/to/wavfile0.wav' |
| 1 | '/path/to/wavfile1.wav' |
| 2 | '/path/to/wavfile2.wav' |
| .. | ... |

In [31]:
data = pd.DataFrame(columns=['Emotion', 'Path'])

#### ***Class Distribution***

For better accuracy results, we decided to focus on 3 different class labels.
Each class is a combination of common emotions.

```distributeEmotion``` - For each audio file, we classified its label to one of the three main classes (Positive, Neutral, Negative) and added it into the Panda’s Dataframe by the following form: (Label, File_path).

1. **Positive** - a mixture of Happy and Surprise.
2. **Neutral** - a mixture of Neutral and Calm.
3. **Negative** - a mixture of Anger, Fear, Sad, and Disgust.

Each dataset represents different names for the same emotion, for example "ang" or "anger" or "a" represents the same sentiment - Anger.

In [26]:
POSITIVE = 0
NEUTRAL = 1
NEGATIVE = 2

def distributeEmotion(emotion):

    if isinstance(emotion, str):
      emotion = emotion.lower()

    if emotion in {'ang', 'dis', 'fea', 'sad','angry' , 'anger', 'disgust', 'fear', 'fearful', 'sad', 'sadness', 4, 5, 6, 7, 'negative', 's', 'a', 'f'}:
      return NEGATIVE

    if emotion in {'neu','neutral', 'calm', 1, 2, 'n'}:
      return NEUTRAL

    if emotion in {'hap', 'happy', 'hapiness', 'ps', 'surprised', 'excited', 'encouraging', 3, 8, 'positive', 'h', 'w'}:
      return POSITIVE

    return -1
        

#### ***IMPORT DATASETS AUDIO PATH***


In [28]:
datasets_path = ['/content/data/MyDrive/dl/ravdess', '/content/data/MyDrive/dl/tess', '/content/data/MyDrive/dl/urdu']
emotion = -1

For each dataset (RAVDESS / TESS / URDU) we import the audio file path and attach its appropriate label.

In [32]:
for ds_path in datasets_path:
  for dirname, _, filenames in os.walk(ds_path):
      for filename in filenames:
          file_path = os.path.join('\\', dirname, filename)
          
          if ds_path == '/content/data/MyDrive/dl/ravdess':
              identifiers = filename.split('.')[0].split('-')
              emotion = distributeEmotion(int(identifiers[2]))

          if ds_path == '/content/data/MyDrive/dl/tess':
              identifiers = filename.split('.')[0].split('_')
              emotion = distributeEmotion(identifiers[2])

          if ds_path == '/content/data/MyDrive/dl/urdu':
              identifiers = filename.split('.')[0].split('_')
              emotion = distributeEmotion(dirname[10:])

          if emotion != -1:
              data = data.append( {"Emotion": emotion, "Path": file_path } , ignore_index=True)

#### ***IMPORT SUMMARY***

*   TOTAL - 4,240
*   0) POSITIVE - 1184
*   1) NEUTRAL - 688
*   2) NEGATIVE - 2368





In [33]:
data['Emotion'].value_counts()

2    2368
0    1184
1     688
Name: Emotion, dtype: int64

## ***SAMPLE & NORMALIZATION***
---

* Using ``manual_seed(0)`` to control sources of randomness that can cause multiple executions of your application to behave differently. And also so that multiple calls to those operations, given the same inputs, will produce the same result.

* A ``torch.device`` is an object representing the device on which a torch.Tensor is or will be allocated.

* ``WAV2VEC2_ASR_BASE_960H`` to access the model with pretrained weights, and information/helper functions associated the pretrained weights.

In [None]:
torch.random.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H
model = bundle.get_model().to(device)
SAMPLE_RATE = 16000

In [9]:
def normalize_features(features):
    for i in range(len(features[0])):
        mlist = features[0][i]
        features[0][i] = 2 * (mlist - np.max(mlist)) / (np.max(mlist) - np.min(mlist)) + 1

In [10]:
def resample_padding(path):
    signal = np.zeros((int(SAMPLE_RATE*3 ,)))
  
    waveform, sampling_rate = torchaudio.load(filepath=path, num_frames=SAMPLE_RATE * 3)

    waveform = waveform.to(device)
    waveform = waveform.detach().cpu().numpy()[0]

    if len(waveform) >= 32000 and len(waveform) <= 48000:
        signal[:len(waveform)] = waveform

        if sampling_rate < 48000: # if there is more to fill
          rest = len(signal) - len(waveform) # get the "rest length"
          filled_list = signal[:len(waveform)] # we don't want to choose zero values, so this list contains non-zero values only.
          signal[len(waveform):] = random.choices(filled_list, k=rest) # choose k values from the filled_list
          
        signal_final = np.array([np.array(signal)])
        signal_final = torch.from_numpy(signal_final).to(device)
        signal_final = signal_final.type(torch.cuda.FloatTensor).to(device)

        return signal_final

    return -1

# **SAMPLE DATA**
---
EACH SAMPLE SHAPE IS (1, 149, 32)



In [12]:
signals = []

total_data = len(data)
with torch.inference_mode():
    for i, file_path in enumerate(data.Path):
        tor = resample_padding(file_path)

        if isinstance(tor, torch.Tensor):
            emission, _ = model(tor)
            features = emission.detach().cpu().numpy()
            normalize_features(features)
            row = (file_path, features, data.iloc[i]['Emotion'])
            signals.append(row)

        percent = (len(signals) / total_data) * 100
        print("\r Processed {}/{} files. ({}%) ".format(len(signals), total_data, int(percent)), end='')


 Processed 4235/4240 files. (99%) 

# **SAVE DATA**

---



In [14]:
counter = [0, 0, 0]

for tup in signals:
  counter[tup[2]] += 1

print(counter)

[1184, 688, 2363]


In [None]:
file_pth = open('/content/dataset444.pth', 'wb')
pickle.dump(signals, file_pth)