# 🤖 KNN Classifier for Speech Emotion Recognition
This notebook trains and evaluates a **K-Nearest Neighbors (KNN)** classifier on extracted openSMILE features to classify speech emotions.

It includes:
- Data loading
- Label encoding
- Model training and validation
- Model evaluation on the test set

# 1. Imports

In [1]:
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import classification_report, accuracy_score
import time

# 2. Load and Prepare Data

In [2]:
# Load CSVs
train_df = pd.read_csv('../data/features/train_final.csv')
val_df = pd.read_csv('../data/features/val_final.csv')
test_df = pd.read_csv('../data/features/test_final.csv')

# Drop metadata columns and the label so that only feature columns remain
non_feature_cols = ['Filepath', 'Id', 'Dataset', 'Filename', 'Ext', 'Duration', 'Emotion']

X_train = train_df.drop(columns=non_feature_cols, errors='ignore')
y_train = train_df['Emotion']

X_val = val_df.drop(columns=non_feature_cols, errors='ignore')
y_val = val_df['Emotion']

X_test = test_df.drop(columns=non_feature_cols, errors='ignore')
y_test = test_df['Emotion']

# 3. Encode Labels

In [3]:
le = LabelEncoder()
y_train_enc = le.fit_transform(y_train)
y_val_enc = le.transform(y_val)
y_test_enc = le.transform(y_test)
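
The encoding step above can be illustrated on a few toy labels. This is a minimal sketch, not the notebook's data: `toy_train`, `toy_val`, and `toy_le` are hypothetical names used only here. It shows that `LabelEncoder` assigns integer codes in sorted class order when fitted on the training labels, and that `transform` reuses the same mapping for the other splits.

```python
from sklearn.preprocessing import LabelEncoder

# Toy labels standing in for the 'Emotion' column (illustrative values only).
toy_train = ["Happy", "Sad", "Anger", "Happy", "Neutral"]
toy_val = ["Sad", "Neutral", "Anger"]

toy_le = LabelEncoder()
toy_train_enc = toy_le.fit_transform(toy_train)  # learns the sorted class list
toy_val_enc = toy_le.transform(toy_val)          # reuses the same mapping

# classes_ is sorted; the position of each class is its integer code.
mapping = {label: code for code, label in enumerate(toy_le.classes_)}
print(mapping)

# inverse_transform recovers the original string labels from the codes.
print(list(toy_le.inverse_transform(toy_val_enc)))
```

Because the encoder is fitted on the training labels only, `transform` raises a `ValueError` if a validation or test split contains a label that never appeared in training.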

# 4. Train KNN Model

In [4]:
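# KNN is a lazy learner: fit() only stores (and optionally indexes) the
# training data, so this step is fast. Most of the cost is paid in predict().
start_time = time.time()
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train_enc)
print(f"✅ Training completed in {time.time() - start_time:.2f} seconds")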

✅ Training completed in 0.01 seconds
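
KNN compares samples by distance, so features with large numeric ranges can dominate the neighbor search. The notebook does not say whether the CSV features are already standardized; if they are not, one option is to put a `StandardScaler` in front of the classifier. The sketch below uses synthetic data (`X_demo`, `y_demo` are stand-ins, not the notebook's features) and only shows how such a pipeline could be wired up, not what effect it would have on the reported scores.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; one column is inflated to mimic mixed feature scales.
X_demo, y_demo = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_demo[:, 0] *= 1000.0

X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=0
)

# Same classifier as in the notebook, fitted on unscaled features.
raw_knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)

# Scaler statistics are learned from the training split only, then reused
# automatically whenever the pipeline predicts on new data.
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
scaled_knn.fit(X_tr, y_tr)

raw_acc = raw_knn.score(X_va, y_va)
scaled_acc = scaled_knn.score(X_va, y_va)
print(f"unscaled: {raw_acc:.3f}  scaled: {scaled_acc:.3f}")
```

Wrapping both steps in one pipeline keeps validation and test data out of the scaler's statistics, which avoids a subtle form of leakage.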


# 5. Evaluate on Validation Set

In [5]:
val_preds = clf.predict(X_val)
print("Validation Results:")
print(classification_report(y_val_enc, val_preds, target_names=le.classes_))

Validation Results:
              precision    recall  f1-score   support

       Anger       0.51      0.57      0.54       892
       Bored       0.68      0.83      0.75      1098
     Disgust       0.44      0.42      0.43       270
        Fear       0.48      0.40      0.44       285
       Happy       0.57      0.56      0.57      1886
     Neutral       0.56      0.62      0.59      2205
    Question       0.78      0.65      0.71      1138
         Sad       0.60      0.49      0.54       835
    Surprise       0.63      0.49      0.55       728

    accuracy                           0.60      9337
   macro avg       0.58      0.56      0.57      9337
weighted avg       0.60      0.60      0.60      9337
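
The notebook fixes `n_neighbors=5` and uses the validation set only for reporting. A validation split can also be used to compare several values of `k` before touching the test set. The following is a minimal sketch on synthetic data; `candidate_ks` is a hypothetical grid, and in the notebook `X_train`, `y_train_enc`, `X_val`, and `y_val_enc` would take the place of the stand-in arrays.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (not the notebook's features).
X_demo, y_demo = make_classification(
    n_samples=600, n_features=20, n_informative=8, n_classes=3, random_state=0
)
X_tr, X_va, y_tr, y_va = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=0
)

candidate_ks = [1, 3, 5, 11, 21]  # hypothetical grid, chosen for illustration
val_scores = {}
for k in candidate_ks:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    val_scores[k] = accuracy_score(y_va, model.predict(X_va))

# Keep the k with the highest validation accuracy (ties go to the smallest k,
# because max() returns the first maximal key in insertion order).
best_k = max(val_scores, key=val_scores.get)
print(val_scores)
print("best k on validation:", best_k)
```

With imbalanced classes such as those in the report above, macro-averaged F1 could be used as the selection metric instead of accuracy.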



# 6. Evaluate on Test Set

In [6]:
test_preds = clf.predict(X_test)
print("Test Results:")
print(classification_report(y_test_enc, test_preds, target_names=le.classes_))

Test Results:
              precision    recall  f1-score   support

       Anger       0.53      0.62      0.57       891
       Bored       0.68      0.83      0.75      1098
     Disgust       0.44      0.42      0.43       273
        Fear       0.44      0.44      0.44       285
       Happy       0.58      0.56      0.57      1885
     Neutral       0.56      0.61      0.58      2203
    Question       0.78      0.66      0.72      1139
         Sad       0.60      0.51      0.55       830
    Surprise       0.67      0.46      0.54       728

    accuracy                           0.60      9332
   macro avg       0.59      0.57      0.57      9332
weighted avg       0.61      0.60      0.60      9332
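
The `macro avg` and `weighted avg` rows in the reports differ only in how the per-class scores are combined: the macro average treats every class equally, while the weighted average weights each class by its support. The sketch below reproduces both for F1 on a small made-up label set (`y_true` and `y_pred` are illustrative, not taken from the notebook) and checks them against scikit-learn's own averaging.

```python
import numpy as np
from sklearn.metrics import f1_score

# Made-up, imbalanced labels: class 0 has 6 samples, classes 1 and 2 have 2 each.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 0, 0, 1, 1, 2, 2, 2])

per_class_f1 = f1_score(y_true, y_pred, average=None)  # one F1 value per class
support = np.bincount(y_true)                          # samples per true class

macro_f1 = per_class_f1.mean()                             # every class counts equally
weighted_f1 = np.average(per_class_f1, weights=support)    # weighted by support

print("per-class F1:", per_class_f1)
print("macro F1:", macro_f1)
print("weighted F1:", weighted_f1)
```

When the two averages diverge, the minority classes are scoring differently from the majority classes. In the reports above they stay close (0.57 macro versus 0.60 weighted F1).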

