# Random Forest Classifier for Speech Emotion Recognition
This notebook trains a Random Forest classifier on pre-extracted openSMILE features to classify speech emotions, then evaluates it on held-out validation and test sets. It includes:
- Data loading
- Label encoding
- Model training and validation
- Model evaluation on test set

## 1. Import Libraries

In [1]:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import classification_report

## 2. Load and Prepare Data

In [2]:
# Load CSVs
train_df = pd.read_csv('../data/features/train_final.csv')
val_df = pd.read_csv('../data/features/val_final.csv')
test_df = pd.read_csv('../data/features/test_final.csv')

# Drop metadata and label columns so only feature columns remain
# (errors='ignore' skips any listed column that is absent from a split)
non_feature_cols = ['Filepath', 'Id', 'Dataset', 'Filename', 'Ext', 'Duration', 'Emotion']
X_train = train_df.drop(columns=non_feature_cols, errors='ignore')
y_train = train_df['Emotion']
X_val = val_df.drop(columns=non_feature_cols, errors='ignore')
y_val = val_df['Emotion']
X_test = test_df.drop(columns=non_feature_cols, errors='ignore')
y_test = test_df['Emotion']
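
Because `errors='ignore'` silently skips any listed column that is missing, a quick sanity check on the resulting feature frames can catch leftover non-numeric columns or mismatched splits before training. The following is a minimal sketch: `check_feature_frames` is a hypothetical helper that is not part of the original notebook, and the toy frames and their column names are assumptions standing in for the real CSVs.

```python
import pandas as pd


def check_feature_frames(reference: pd.DataFrame, *others: pd.DataFrame) -> list:
    """Return a list of problems; an empty list means the frames look consistent."""
    problems = []
    # Non-numeric columns cannot be passed to the classifier as-is.
    non_numeric = reference.select_dtypes(exclude="number").columns.tolist()
    if non_numeric:
        problems.append(f"non-numeric columns in reference frame: {non_numeric}")
    # Every split must expose the same columns in the same order.
    for i, other in enumerate(others, start=1):
        if list(other.columns) != list(reference.columns):
            problems.append(f"frame {i} has different columns than the reference frame")
    return problems


# Toy frames standing in for X_train / X_val (hypothetical feature names).
toy_train = pd.DataFrame({"f0": [0.1, 0.2], "f1": [1.0, 2.0]})
toy_val = pd.DataFrame({"f0": [0.3], "f1": [3.0]})
toy_bad = pd.DataFrame({"f0": [0.3], "Speaker": ["a"]})

print(check_feature_frames(toy_train, toy_val))
print(check_feature_frames(toy_train, toy_bad))
```

In this notebook the equivalent call would be `check_feature_frames(X_train, X_val, X_test)`.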

## 3. Encode Emotion Labels

In [3]:
le = LabelEncoder()
y_train_enc = le.fit_transform(y_train)
y_val_enc = le.transform(y_val)
y_test_enc = le.transform(y_test)
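
To make the encoding step concrete, the sketch below shows how `LabelEncoder` assigns integer codes on a small toy label list (an assumption for illustration, not the notebook's data). The encoder sorts the class names, so each code is the index of the label in `classes_`, and `transform` rejects labels that were not seen during `fit`.

```python
from sklearn.preprocessing import LabelEncoder

# Toy labels standing in for the Emotion column.
toy_labels = ["Sad", "Anger", "Happy", "Anger"]

toy_le = LabelEncoder()
toy_encoded = toy_le.fit_transform(toy_labels)

# classes_ is sorted, so the integer code is the index in that sorted array.
toy_mapping = {label: int(code) for code, label in enumerate(toy_le.classes_)}
print(toy_mapping)

# inverse_transform maps integer codes (e.g. predictions) back to emotion names.
toy_decoded = toy_le.inverse_transform(toy_encoded)
print(list(toy_decoded))

# A label that was not seen during fit is rejected with a ValueError.
try:
    toy_le.transform(["Calm"])
    unseen_rejected = False
except ValueError:
    unseen_rejected = True
print(unseen_rejected)
```

This is why the encoder is fitted on the training labels only and then reused for the validation and test splits: all three splits share one consistent mapping.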

## 4. Train Random Forest Classifier

In [4]:
clf = RandomForestClassifier(
    n_estimators=100,        # number of trees
    class_weight='balanced', # weight classes inversely to their frequency
    random_state=42,
    n_jobs=-1                # use all available CPU cores
)
clf.fit(X_train, y_train_enc)
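
For reference, scikit-learn documents the `'balanced'` setting as weighting each class by `n_samples / (n_classes * np.bincount(y))`. The sketch below reproduces that formula on toy labels (an assumption for illustration, not the notebook's data) and compares it with `compute_class_weight`.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy encoded labels: class 0 appears four times, classes 1 and 2 twice each.
toy_y = np.array([0, 0, 0, 0, 1, 1, 2, 2])

# Manual formula: n_samples / (n_classes * count_per_class).
counts = np.bincount(toy_y)
manual_weights = len(toy_y) / (len(counts) * counts)

# The same weights as computed by scikit-learn's utility function.
sk_weights = compute_class_weight("balanced", classes=np.unique(toy_y), y=toy_y)

print(manual_weights)
print(sk_weights)
```

Rare classes receive weights above 1 and frequent classes below 1, so with this dataset the smaller classes such as Disgust and Fear would count for more per sample during training than Neutral or Happy.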

## 5. Evaluate Model on Validation Set

In [5]:
print("📈 Evaluating on validation set...")
val_preds = clf.predict(X_val)
print("🧾 Validation Results:")
print(classification_report(y_val_enc, val_preds, target_names=le.classes_))

📈 Evaluating on validation set...
🧾 Validation Results:
              precision    recall  f1-score   support

       Anger       0.63      0.64      0.63       892
       Bored       0.66      0.85      0.75      1098
     Disgust       0.63      0.42      0.50       270
        Fear       0.60      0.38      0.46       285
       Happy       0.63      0.55      0.59      1886
     Neutral       0.56      0.70      0.62      2205
    Question       0.77      0.64      0.70      1138
         Sad       0.65      0.56      0.60       835
    Surprise       0.72      0.56      0.63       728

    accuracy                           0.63      9337
   macro avg       0.65      0.59      0.61      9337
weighted avg       0.64      0.63      0.63      9337
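
The two summary rows differ only in how the per-class scores are averaged: `macro avg` is the unweighted mean over classes, while `weighted avg` weights each class by its support. As a worked check, the sketch below recomputes both for recall from the per-class values printed above. Because the inputs are already rounded to two decimals, the results should only be expected to agree with the report after rounding.

```python
# (recall, support) per class, copied from the validation report above.
val_rows = {
    "Anger": (0.64, 892),
    "Bored": (0.85, 1098),
    "Disgust": (0.42, 270),
    "Fear": (0.38, 285),
    "Happy": (0.55, 1886),
    "Neutral": (0.70, 2205),
    "Question": (0.64, 1138),
    "Sad": (0.56, 835),
    "Surprise": (0.56, 728),
}

total_support = sum(support for _, support in val_rows.values())

# Macro average: every class counts equally, regardless of size.
macro_recall = sum(recall for recall, _ in val_rows.values()) / len(val_rows)

# Weighted average: each class counts in proportion to its support.
weighted_recall = sum(recall * support for recall, support in val_rows.values()) / total_support

print(f"total support:   {total_support}")
print(f"macro recall:    {macro_recall:.2f}")
print(f"weighted recall: {weighted_recall:.2f}")
```

The support-weighted recall equals overall accuracy, which is why both appear as 0.63 in the report. The lower macro recall reflects the weak recall on the small Disgust and Fear classes.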



## 6. Evaluate Model on Test Set

In [6]:
print("📈 Evaluating on test set...")
test_preds = clf.predict(X_test)
print("🧾 Test Results:")
print(classification_report(y_test_enc, test_preds, target_names=le.classes_))

📈 Evaluating on test set...
🧾 Test Results:
              precision    recall  f1-score   support

       Anger       0.64      0.64      0.64       891
       Bored       0.65      0.84      0.73      1098
     Disgust       0.56      0.42      0.48       273
        Fear       0.62      0.40      0.49       285
       Happy       0.61      0.52      0.56      1885
     Neutral       0.56      0.68      0.61      2203
    Question       0.73      0.65      0.69      1139
         Sad       0.65      0.60      0.62       830
    Surprise       0.68      0.54      0.60       728

    accuracy                           0.62      9332
   macro avg       0.63      0.59      0.60      9332
weighted avg       0.63      0.62      0.62      9332
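
The classification report shows how well each class is recovered but not which emotions are confused with each other. A row-normalized confusion matrix is one way to inspect that. The sketch below uses toy labels and predictions (assumptions for illustration, not the notebook's results) to show the call.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Toy class names, true labels, and predictions.
toy_names = ["Anger", "Happy", "Sad"]
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 1, 1, 1, 2, 0]

# normalize="true" divides each row by the number of true samples in that
# class, so the diagonal holds the per-class recall.
cm = confusion_matrix(toy_true, toy_pred, labels=[0, 1, 2], normalize="true")
cm_df = pd.DataFrame(cm, index=toy_names, columns=toy_names)

print(cm_df.round(2))
```

Rows are true classes and columns are predicted classes. In this notebook the corresponding call would be `confusion_matrix(y_test_enc, test_preds, normalize="true")`, with `le.classes_` supplying the row and column names.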

