<a href="https://colab.research.google.com/github/arunkumar120/speech_emotion_recognition_project/blob/main/Speech_Emotion_Recognition.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [3]:
import kagglehub
import os
import librosa
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import classification_report, accuracy_score
from tqdm import tqdm

Step 1: Download the Toronto Emotional Speech Set (TESS) with kagglehub

In [4]:
print("Downloading dataset...")
path = kagglehub.dataset_download("ejlok1/toronto-emotional-speech-set-tess")
print("Path to dataset files:", path)

Downloading dataset...
Path to dataset files: /root/.cache/kagglehub/datasets/ejlok1/toronto-emotional-speech-set-tess/versions/1
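Before extracting features, it can help to check how the downloaded files are laid out. The helper below is a hypothetical sketch (`count_wavs_per_folder` is not part of kagglehub). It counts `.wav` files per directory under `path`. If the same emotion folders appear under two different parent directories, every recording would be processed twice in Step 3.

```python
import os
from collections import Counter

def count_wavs_per_folder(root_dir):
    """Map each directory (relative to root_dir) to the number of .wav files it holds."""
    counts = Counter()
    for root, _dirs, files in os.walk(root_dir):
        n = sum(1 for f in files if f.lower().endswith(".wav"))
        if n:  # skip directories with no audio, e.g. top-level containers
            counts[os.path.relpath(root, root_dir)] = n
    return counts

# Usage in the notebook (assumes `path` from the download cell):
# for folder, n in sorted(count_wavs_per_folder(path).items()):
#     print(f"{n:4d}  {folder}")
```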


Step 2: Initialize variables for storing features and labels

In [5]:
audio_data = []
labels = []

Step 3: Traverse the dataset and extract features (the mean of 13 MFCCs per file), labeling each file with the name of the folder that contains it

In [6]:
print("Processing dataset and extracting features...")
for root, dirs, files in os.walk(path):
    for file in tqdm(files, desc="Processing files"):
        if file.endswith(".wav"):
            file_path = os.path.join(root, file)
            try:
                # Load audio file
                y, sr = librosa.load(file_path, sr=None)

                # Extract MFCC features
                mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
                mfccs_mean = np.mean(mfccs.T, axis=0)  # Calculate mean for fixed-length features

                # Append features and corresponding label
                audio_data.append(mfccs_mean)
                labels.append(os.path.basename(root))  # Use folder name as label
            except Exception as e:
                print(f"Error processing file {file_path}: {e}")

Processing dataset and extracting features...


Processing files: 0it [00:00, ?it/s]
Processing files: 0it [00:00, ?it/s]
Processing files: 100%|██████████| 200/200 [00:03<00:00, 54.68it/s]
Processing files: 100%|██████████| 200/200 [00:02<00:00, 80.64it/s]
Processing files: 100%|██████████| 200/200 [00:02<00:00, 75.09it/s]
Processing files: 100%|██████████| 200/200 [00:04<00:00, 49.60it/s]
Processing files: 100%|██████████| 200/200 [00:02<00:00, 75.50it/s]
Processing files: 100%|██████████| 200/200 [00:02<00:00, 82.07it/s]
Processing files: 100%|██████████| 200/200 [00:02<00:00, 82.37it/s]
Processing files: 100%|██████████| 200/200 [00:02<00:00, 76.80it/s]
Processing files: 100%|██████████| 200/200 [00:04<00:00, 43.36it/s]
Processing files: 100%|██████████| 200/200 [00:02<00:00, 67.42it/s]
Processing files: 100%|██████████| 200/200 [00:03<00:00, 61.15it/s]
Processing files: 100%|██████████| 200/200 [00:02<00:00, 72.18it/s]
Processing files: 100%|██████████| 200/200 [00:04<00:00, 49.64it/s]
Processing files: 100%|██████████| 200/200
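`librosa.feature.mfcc` returns a matrix of shape `(n_mfcc, n_frames)`, and the number of frames depends on the clip's length. Averaging over the time axis collapses every clip to a fixed 13-dimensional vector, which is the input shape a scikit-learn classifier expects. The sketch below uses a random array in place of real MFCCs (87 frames is an arbitrary choice) to show the shapes. It also shows that the notebook's `np.mean(mfccs.T, axis=0)` equals `mfccs.mean(axis=1)`. This pooling discards how the coefficients change over time.

```python
import numpy as np

# Stand-in for librosa.feature.mfcc output: shape (n_mfcc, n_frames).
# Random values are used only to illustrate the shapes.
rng = np.random.default_rng(0)
mfccs = rng.normal(size=(13, 87))

# The notebook's pooling: transpose to (n_frames, n_mfcc), then average over frames.
pooled = np.mean(mfccs.T, axis=0)

# Equivalent without the transpose: average along the time axis directly.
pooled_direct = mfccs.mean(axis=1)

print(mfccs.shape, "->", pooled.shape)  # (13, 87) -> (13,)
print(np.allclose(pooled, pooled_direct))  # True
```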

Step 4: Verify data collection

In [7]:
print(f"Number of audio samples processed: {len(audio_data)}")
print(f"Number of labels collected: {len(labels)}")

if len(audio_data) == 0:
    raise ValueError("No audio data was processed. Check the dataset structure and file formats.")

Number of audio samples processed: 5600
Number of labels collected: 5600
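TESS consists of 2,800 recordings: two speakers each say 200 target words in 7 emotions. A count of 5,600 therefore suggests that the download contains two copies of each file and that every recording was processed twice. If so, identical clips can land in both the training and test sets, which would inflate the accuracy reported in Step 9.

The sketch below keeps only the first path found for each file name. It rests on one assumption: that each recording's file name is unique, so duplicate copies share the same name. `unique_wav_paths` is a hypothetical helper.

```python
import os

def unique_wav_paths(root_dir):
    """Return one path per distinct .wav file name under root_dir.

    Assumes duplicate copies share the same file name; the first path found
    for each name is kept and later ones are skipped.
    """
    seen = {}
    for root, dirs, files in os.walk(root_dir):
        dirs.sort()  # deterministic traversal order, so the kept copy is reproducible
        for name in sorted(files):
            if name.endswith(".wav") and name not in seen:
                seen[name] = os.path.join(root, name)
    return list(seen.values())

# Usage in the notebook (assumes `path` from Step 1):
# wav_paths = unique_wav_paths(path)
# print(len(wav_paths))
# Loop over wav_paths instead of os.walk in Step 3 and take each label from
# os.path.basename(os.path.dirname(file_path)).
```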


Step 5: Convert data to NumPy arrays

In [8]:
X = np.array(audio_data)
y = np.array(labels)

Step 6: Encode labels using LabelEncoder

In [9]:
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y)
print("Classes detected:", encoder.classes_)

Classes detected: ['OAF_Fear' 'OAF_Pleasant_surprise' 'OAF_Sad' 'OAF_angry' 'OAF_disgust'
 'OAF_happy' 'OAF_neutral' 'YAF_angry' 'YAF_disgust' 'YAF_fear'
 'YAF_happy' 'YAF_neutral' 'YAF_pleasant_surprised' 'YAF_sad']
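The labels are folder names, so each class combines a speaker prefix (`OAF` or `YAF`, one per TESS speaker) with an emotion. This yields 14 classes instead of TESS's 7 emotions. The spelling is also inconsistent across speakers (`OAF_Fear` vs `YAF_fear`, `OAF_Pleasant_surprise` vs `YAF_pleasant_surprised`).

If the goal is to recognize emotion regardless of speaker, one option is to normalize the folder names before Step 5, for example `labels = [emotion_from_folder(l) for l in labels]`. This is a sketch: the helper name and the canonical spelling chosen for "pleasant surprise" are assumptions.

```python
def emotion_from_folder(folder_name):
    """Map a TESS folder name such as 'OAF_Fear' to a speaker-independent emotion label."""
    # Drop the speaker prefix (everything up to the first underscore).
    _speaker, _, emotion = folder_name.partition("_")
    emotion = emotion.lower()
    # Assumed canonical spelling: the two speakers' folders use different forms.
    if emotion == "pleasant_surprised":
        emotion = "pleasant_surprise"
    return emotion

folders = ["OAF_Fear", "OAF_Pleasant_surprise", "YAF_fear", "YAF_pleasant_surprised", "YAF_sad"]
print(sorted({emotion_from_folder(f) for f in folders}))
# ['fear', 'pleasant_surprise', 'sad']
```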


Step 7: Split the data into training (80%) and testing (20%) sets

In [10]:
X_train, X_test, y_train, y_test = train_test_split(X, y_encoded, test_size=0.2, random_state=42)
print(f"Training samples: {len(X_train)}, Testing samples: {len(X_test)}")

Training samples: 4480, Testing samples: 1120
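`train_test_split` shuffles rows at random, so the class proportions in the test set can drift: the visible support values in Step 9 range from 70 to 89 per class. Passing `stratify` keeps each class's share the same in both sets. In the notebook the call would be `train_test_split(X, y_encoded, test_size=0.2, random_state=42, stratify=y_encoded)`. The sketch below uses synthetic balanced labels so the effect is easy to see.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 7 balanced classes, 100 samples each, 13 features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(700, 13))
y_demo = np.repeat(np.arange(7), 100)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# With stratification, every class contributes 20% of its samples to the test set.
print(np.bincount(y_te))  # [20 20 20 20 20 20 20]
```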


Step 8: Train a Support Vector Machine (SVM) classifier

In [11]:
print("Training the SVM classifier...")
# Linear-kernel SVM; random_state only has an effect when probability=True
svm = SVC(kernel='linear', C=1.0, random_state=42)
svm.fit(X_train, y_train)

Training the SVM classifier...
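SVMs are sensitive to feature scale, and MFCC coefficients vary widely in magnitude. The first coefficient, which roughly tracks overall energy, is typically much larger than the rest. A common remedy is to standardize features inside a scikit-learn `Pipeline`, so the scaler is fit on the training split only.

The sketch below uses synthetic blobs with one feature's scale exaggerated. It shows the mechanics only; the effect on TESS would need to be measured. In the notebook, fit the pipeline on `X_train, y_train` and score it on `X_test, y_test`.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 13-dimensional MFCC features.
X_demo, y_demo = make_blobs(n_samples=300, centers=3, n_features=13,
                            cluster_std=0.5, random_state=0)
X_demo[:, 0] *= 100  # exaggerate one feature's scale, loosely mimicking a dominant coefficient

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2,
                                          random_state=42, stratify=y_demo)

# StandardScaler is fit inside the pipeline on the training split only,
# so no statistics from the test split leak into training.
model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
model.fit(X_tr, y_tr)
print(f"Test accuracy: {model.score(X_te, y_te):.2f}")
```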


Step 9: Evaluate the model on the test set

In [12]:
print("Evaluating the model...")
y_pred = svm.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
print("Classification Report:")
print(classification_report(y_test, y_pred, target_names=encoder.classes_))


Evaluating the model...
Accuracy: 97.41%
Classification Report:
                        precision    recall  f1-score   support

              OAF_Fear       0.99      0.93      0.95        80
 OAF_Pleasant_surprise       0.94      0.94      0.94        70
               OAF_Sad       0.97      0.96      0.97        77
             OAF_angry       1.00      1.00      1.00        73
           OAF_disgust       0.98      0.98      0.98        87
             OAF_happy       0.91      0.97      0.94        87
           OAF_neutral       0.97      0.98      0.97        89
             YAF_angry       0.93      0.97      0.95        70
           YAF_disgust       1.00      1.00      1.00        78
              YAF_fear       0.98      0.94      0.96        88
             YAF_happy       0.97      1.00      0.99        70
           YAF_neutral       1.00      1.00      1.00        72
YAF_pleasant_surprised       1.00      0.98      0.99        83
               YAF_sad       1.00      
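The per-class report shows where recall drops (for example, 0.93 for `OAF_Fear`) but not which classes absorb the errors. A confusion matrix answers that question. The helper below is hypothetical: it lists the most frequent off-diagonal entries. In the notebook it would be called as `top_confusions(y_test, y_pred, encoder.classes_)`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def top_confusions(y_true, y_pred, class_names, k=3):
    """Return up to k (true_class, predicted_class, count) tuples, most frequent errors first."""
    cm = confusion_matrix(y_true, y_pred, labels=np.arange(len(class_names)))
    np.fill_diagonal(cm, 0)  # keep only misclassifications
    order = np.argsort(cm, axis=None, kind="stable")[::-1][:k]  # largest counts first
    pairs = []
    for flat_index in order:
        i, j = np.unravel_index(flat_index, cm.shape)
        if cm[i, j] > 0:
            pairs.append((class_names[i], class_names[j], int(cm[i, j])))
    return pairs

# Toy example with three classes.
names = ["a", "b", "c"]
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 0, 2, 2, 0]
print(top_confusions(y_true, y_pred, names))
# [('a', 'b', 2), ('c', 'a', 1), ('b', 'a', 1)]
```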

Step 10: Save the trained model for future use

In [13]:
import pickle
model_path = "emotion_svm_model.pkl"
with open(model_path, "wb") as model_file:
    pickle.dump(svm, model_file)
print(f"Model saved as {model_path}")

Model saved as emotion_svm_model.pkl
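The saved file contains only the SVM, so predictions from a reloaded model come back as integers with no mapping to class names. One option is to pickle the classifier and the `LabelEncoder` together. The sketch below does this with hypothetical helper names; scikit-learn's documentation also describes `joblib` for model persistence. Loading a pickle can execute arbitrary code, so only load files from trusted sources.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC

def save_bundle(model, encoder, path):
    """Pickle the classifier together with the LabelEncoder that decodes its outputs."""
    with open(path, "wb") as f:
        pickle.dump({"model": model, "encoder": encoder}, f)

def load_bundle(path):
    """Load a bundle written by save_bundle. Only unpickle files you trust."""
    with open(path, "rb") as f:
        bundle = pickle.load(f)
    return bundle["model"], bundle["encoder"]

# Demo with a tiny synthetic model (stand-in for the notebook's svm and encoder).
rng = np.random.default_rng(0)
enc = LabelEncoder().fit(["angry", "sad"])
X_demo = np.vstack([rng.normal(0, 1, (10, 13)), rng.normal(5, 1, (10, 13))])
y_demo = enc.transform(["angry"] * 10 + ["sad"] * 10)
clf = SVC(kernel="linear").fit(X_demo, y_demo)

bundle_path = os.path.join(tempfile.gettempdir(), "emotion_svm_bundle.pkl")
save_bundle(clf, enc, bundle_path)
loaded_model, loaded_encoder = load_bundle(bundle_path)
print(loaded_encoder.inverse_transform(loaded_model.predict(X_demo[:1])))  # ['angry']
```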


Step 11: Example Prediction (Optional)

In [14]:
print("Testing with a sample prediction...")
sample_index = 0
sample_features = X_test[sample_index].reshape(1, -1)
predicted_label = encoder.inverse_transform(svm.predict(sample_features))
true_label = encoder.inverse_transform([y_test[sample_index]])
print(f"Predicted Emotion: {predicted_label[0]}, True Emotion: {true_label[0]}")

Testing with a sample prediction...
Predicted Emotion: YAF_neutral, True Emotion: YAF_neutral
