<a href="https://colab.research.google.com/github/Blueorchid1711/DRLBP/blob/main/Final1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Task
Build and compare SVM, Decision Tree, and KNN models on the "zoya77/face-emotion-insight-recognition-dataset" dataset.

## Load and prepare data

### Subtask:
Load the dataset and perform any necessary preprocessing steps, such as handling missing values, encoding categorical features, and splitting the data into training and testing sets.


**Reasoning**:
The dataset is already loaded as `df` and has been split into `X`, `y`, `X_train`, `X_test`, `y_train`, and `y_test`. The original `df` still contains the target variable `emotion_label`. I will display the first few rows and the `info()` summary of `df` to understand its structure and data types, then check for missing values.



In [None]:
display(df.head())
# df.info() prints its summary and returns None, so display() also shows "None"
display(df.info())
display(df.isnull().sum())

Unnamed: 0,eda_mean,eda_std,eda_peaks,hr_mean,hr_std,hr_skewness,face_au01,face_au06,face_au12,face_landmark_distance1,face_landmark_distance2,face_landmark_ratio1,emotion_label
0,1.074507,0.276585,3,69.585207,6.579213,0.076743,0.076526,0.132384,0.176151,30.54256,44.073165,0.953427,Neutral
1,1.036294,0.198717,3,64.260159,4.091976,-0.14123,0.173282,0.013754,0.071886,29.774224,45.135056,0.857525,Neutral
2,0.918343,0.239936,3,70.332768,4.398293,0.185228,0.099325,0.04245,0.118785,28.942289,46.64509,0.877916,Neutral
3,1.03133,0.373847,3,64.12099,4.884352,-0.03011,0.026074,0.033591,0.109843,29.280156,44.078722,1.105712,Neutral
4,1.051543,0.232308,4,64.71088,6.031,0.093128,0.058039,0.116204,0.080746,29.690788,45.662527,1.097555,Neutral


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4998 entries, 0 to 4997
Data columns (total 13 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   eda_mean                 4998 non-null   float64
 1   eda_std                  4998 non-null   float64
 2   eda_peaks                4998 non-null   int64  
 3   hr_mean                  4998 non-null   float64
 4   hr_std                   4998 non-null   float64
 5   hr_skewness              4998 non-null   float64
 6   face_au01                4998 non-null   float64
 7   face_au06                4998 non-null   float64
 8   face_au12                4998 non-null   float64
 9   face_landmark_distance1  4998 non-null   float64
 10  face_landmark_distance2  4998 non-null   float64
 11  face_landmark_ratio1     4998 non-null   float64
 12  emotion_label            4998 non-null   object 
dtypes: float64(11), int64(1), object(1)
memory usage: 507.7+ KB


None

Unnamed: 0,0
eda_mean,0
eda_std,0
eda_peaks,0
hr_mean,0
hr_std,0
hr_skewness,0
face_au01,0
face_au06,0
face_au12,0
face_landmark_distance1,0
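
Beyond missing values, two other quick checks are often worth running before modelling: the class balance of the target and the number of duplicated rows. The sketch below uses a small hypothetical frame (`toy_df`, with made-up values, not the real dataset) that shares the `emotion_label` column name. On the real data the same calls would be applied to `df`.

```python
import pandas as pd

# Hypothetical stand-in for `df`; the values are illustrative only.
toy_df = pd.DataFrame({
    "eda_mean": [1.0, 1.1, 3.2, 3.2, 2.5],
    "hr_mean": [69.0, 64.0, 97.0, 97.0, 80.0],
    "emotion_label": ["Neutral", "Neutral", "Angry", "Angry", "Happy"],
})

# Number of rows per class; a strong imbalance would argue for a stratified split.
class_counts = toy_df["emotion_label"].value_counts()

# Number of rows that exactly repeat an earlier row.
n_duplicates = int(toy_df.duplicated().sum())

print(class_counts)
print("duplicate rows:", n_duplicates)
```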


**Reasoning**:
There are no missing values. The target variable `emotion_label` is categorical (stored as strings) and needs to be encoded. I will use one-hot encoding to convert it into numerical format, which produces one column per emotion class. Then I will separate the features (X) from the target (y) and split both into training and testing sets.



In [None]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
import pandas as pd

# Identify categorical features and apply one-hot encoding
categorical_features = ['emotion_label']
encoder = OneHotEncoder(sparse_output=False)
encoded_features = encoder.fit_transform(df[categorical_features])
encoded_df = pd.DataFrame(encoded_features, columns=encoder.get_feature_names_out(categorical_features))

# Separate features (X) and target (y)
X = df.drop(columns=categorical_features)
y = encoded_df

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

display(X_train.head())
display(y_train.head())

Unnamed: 0,eda_mean,eda_std,eda_peaks,hr_mean,hr_std,hr_skewness,face_au01,face_au06,face_au12,face_landmark_distance1,face_landmark_distance2,face_landmark_ratio1
1738,0.732046,0.297202,2,56.77937,4.165485,-0.172699,0.090115,0.094802,0.19976,27.379906,44.683661,0.989466
3986,2.867443,0.304028,1,86.848215,7.470028,0.298343,0.281826,0.539549,0.328279,30.797588,49.443998,0.864107
2915,3.697553,0.467669,3,97.466697,6.460534,0.336279,0.289388,0.34355,0.397884,31.183519,43.009973,1.146913
1595,3.326075,0.506257,5,81.275369,6.074852,0.127352,0.075896,0.680294,1.0,31.157039,46.142463,1.009586
3213,3.28167,0.789887,0,92.160571,8.629637,0.065824,0.024431,0.288232,0.489034,32.517018,48.038547,1.268013


Unnamed: 0,emotion_label_Angry,emotion_label_Fear,emotion_label_Happy,emotion_label_Neutral,emotion_label_Sad,emotion_label_Surprise
1738,0.0,0.0,0.0,0.0,1.0,0.0
3986,0.0,1.0,0.0,0.0,0.0,0.0
2915,1.0,0.0,0.0,0.0,0.0,0.0
1595,0.0,0.0,1.0,0.0,0.0,0.0
3213,1.0,0.0,0.0,0.0,0.0,0.0
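
One-hot encoding is mainly intended for input features. scikit-learn classifiers generally expect the target as a 1-D array of class labels, and they also accept string labels directly. A minimal sketch of an alternative, assuming `LabelEncoder` is used for the target and the split is stratified by class, is shown below. All `toy_*` names and values are hypothetical stand-ins, not the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in data: 12 rows, 3 classes, 4 rows per class.
toy_X = pd.DataFrame({
    "feature_a": np.arange(12.0),
    "feature_b": np.arange(12.0) * 2,
})
toy_labels = pd.Series(["Angry", "Happy", "Neutral"] * 4, name="emotion_label")

# LabelEncoder maps each class name to an integer (classes sorted alphabetically).
label_encoder = LabelEncoder()
toy_y = label_encoder.fit_transform(toy_labels)

# stratify=toy_y keeps the class proportions the same in both splits.
toy_X_train, toy_X_test, toy_y_train, toy_y_test = train_test_split(
    toy_X, toy_y, test_size=0.25, random_state=42, stratify=toy_y
)

print(toy_y.shape)             # shape of the 1-D target
print(label_encoder.classes_)  # class names in encoded order
```

`label_encoder.inverse_transform` converts integer predictions back to the original class names when they are needed for reporting.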


## Train models

### Subtask:
Train SVM, Decision Tree, and KNN models on the training data.


**Reasoning**:
Import the necessary model classes and instantiate and train each model using the training data.



In [None]:
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

# Instantiate models with default parameters
svm_model = SVC()
dt_model = DecisionTreeClassifier()
knn_model = KNeighborsClassifier()

# Train each model
svm_model.fit(X_train, y_train)
dt_model.fit(X_train, y_train)
knn_model.fit(X_train, y_train)

ValueError: y should be a 1d array, got an array of shape (3998, 6) instead.

**Reasoning**:
The error is raised by `SVC`, the first model fitted: `y_train` is one-hot encoded with shape (3998, 6), and `SVC` requires a 1-D array of class labels. `DecisionTreeClassifier` and `KNeighborsClassifier` would accept a 2-D target, but they would treat it as a multi-output problem rather than a single six-class problem. I need to convert the one-hot encoded `y_train` and `y_test` back to a single column of class labels.



In [None]:
import numpy as np

# Convert the one-hot encoded targets back to 1-D integer class labels;
# each label is the column index of the 1 in that row
y_train_labels = np.argmax(y_train.values, axis=1)
y_test_labels = np.argmax(y_test.values, axis=1)

# Instantiate models with default parameters
svm_model = SVC()
dt_model = DecisionTreeClassifier()
knn_model = KNeighborsClassifier()

# Train each model on the 1-D integer class labels
svm_model.fit(X_train, y_train_labels)
dt_model.fit(X_train, y_train_labels)
knn_model.fit(X_train, y_train_labels)
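
`SVC` (RBF kernel by default) and `KNeighborsClassifier` both rely on distances between feature vectors, so features on larger numeric scales (here `hr_mean` and the landmark distances are in the tens, while the action-unit features are at most 1) can dominate the distance. Decision trees split on one feature at a time and are not affected by feature scale. A minimal sketch of standardizing features inside a pipeline is given below. It uses synthetic data from `make_classification` as a stand-in, so the printed scores say nothing about the emotion dataset, and the `toy_*` and `scaled_*` names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (not the emotion dataset).
toy_X, toy_y = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_classes=3, random_state=42
)
toy_X[:, 0] *= 100.0  # put one feature on a much larger scale

toy_X_train, toy_X_test, toy_y_train, toy_y_test = train_test_split(
    toy_X, toy_y, test_size=0.2, random_state=42, stratify=toy_y
)

# The scaler is fitted on the training split only, inside each pipeline.
scaled_svm = make_pipeline(StandardScaler(), SVC())
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier())

scaled_svm.fit(toy_X_train, toy_y_train)
scaled_knn.fit(toy_X_train, toy_y_train)

svm_score = scaled_svm.score(toy_X_test, toy_y_test)
knn_score = scaled_knn.score(toy_X_test, toy_y_test)
print("scaled SVM accuracy:", svm_score)
print("scaled KNN accuracy:", knn_score)
```

Keeping the scaler in the pipeline means the test split is transformed with statistics learned from the training split, which avoids leaking test information into preprocessing.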

## Evaluate models

### Subtask:
Evaluate the performance of each model using appropriate metrics, such as accuracy, precision, recall, and F1-score.


**Reasoning**:
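Import the necessary metrics and evaluate the performance of each model on the test set. Precision, recall, and F1-score are averaged across the six classes with `average='weighted'`. With this weighting, recall is mathematically equal to accuracy, which is why those two values coincide in the output.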



In [None]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Make predictions on the test set
svm_pred = svm_model.predict(X_test)
dt_pred = dt_model.predict(X_test)
knn_pred = knn_model.predict(X_test)

# Evaluate each model
svm_accuracy = accuracy_score(y_test_labels, svm_pred)
svm_precision = precision_score(y_test_labels, svm_pred, average='weighted')
svm_recall = recall_score(y_test_labels, svm_pred, average='weighted')
svm_f1 = f1_score(y_test_labels, svm_pred, average='weighted')

dt_accuracy = accuracy_score(y_test_labels, dt_pred)
dt_precision = precision_score(y_test_labels, dt_pred, average='weighted')
dt_recall = recall_score(y_test_labels, dt_pred, average='weighted')
dt_f1 = f1_score(y_test_labels, dt_pred, average='weighted')

knn_accuracy = accuracy_score(y_test_labels, knn_pred)
knn_precision = precision_score(y_test_labels, knn_pred, average='weighted')
knn_recall = recall_score(y_test_labels, knn_pred, average='weighted')
knn_f1 = f1_score(y_test_labels, knn_pred, average='weighted')

# Print the evaluation metrics
print("SVM Model Performance:")
print(f"  Accuracy: {svm_accuracy:.4f}")
print(f"  Precision: {svm_precision:.4f}")
print(f"  Recall: {svm_recall:.4f}")
print(f"  F1-score: {svm_f1:.4f}")

print("\nDecision Tree Model Performance:")
print(f"  Accuracy: {dt_accuracy:.4f}")
print(f"  Precision: {dt_precision:.4f}")
print(f"  Recall: {dt_recall:.4f}")
print(f"  F1-score: {dt_f1:.4f}")

print("\nKNN Model Performance:")
print(f"  Accuracy: {knn_accuracy:.4f}")
print(f"  Precision: {knn_precision:.4f}")
print(f"  Recall: {knn_recall:.4f}")
print(f"  F1-score: {knn_f1:.4f}")

SVM Model Performance:
  Accuracy: 0.7640
  Precision: 0.7613
  Recall: 0.7640
  F1-score: 0.7617

Decision Tree Model Performance:
  Accuracy: 0.9370
  Precision: 0.9375
  Recall: 0.9370
  F1-score: 0.9369

KNN Model Performance:
  Accuracy: 0.8070
  Precision: 0.8112
  Recall: 0.8070
  F1-score: 0.8081
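
Weighted averages can hide differences between individual classes. A per-class view is available from `classification_report` and `confusion_matrix`. The sketch below uses small hypothetical label arrays (`toy_true`, `toy_pred`) and made-up class names. On the real data the inputs would be `y_test_labels` and a model's predictions, and the class names in `np.argmax` order would presumably come from `encoder.categories_[0]`.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical true and predicted labels for three classes (illustrative only).
toy_class_names = ["Angry", "Happy", "Neutral"]
toy_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])
toy_pred = np.array([0, 1, 1, 1, 2, 2, 0, 1])

# Rows are true classes, columns are predicted classes.
toy_cm = confusion_matrix(toy_true, toy_pred)
print(toy_cm)

# Per-class precision, recall, and F1 alongside the averaged values.
toy_report = classification_report(toy_true, toy_pred, target_names=toy_class_names)
print(toy_report)
```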


## Compare models

### Subtask:
Compare the performance of the three models and identify the best-performing model.


**Reasoning**:
Compare the performance metrics of the three models and summarize the findings to identify the best-performing model.



In [None]:
# Collect the metrics computed in the evaluation cell, keyed by model name
results = {
    "SVM": (svm_accuracy, svm_precision, svm_recall, svm_f1),
    "Decision Tree": (dt_accuracy, dt_precision, dt_recall, dt_f1),
    "KNN": (knn_accuracy, knn_precision, knn_recall, knn_f1),
}

print("Model Performance Comparison:")
print("-" * 30)
for name, (accuracy, precision, recall, f1) in results.items():
    print(f"{name}:")
    print(f"  Accuracy: {accuracy:.4f}")
    print(f"  Precision: {precision:.4f}")
    print(f"  Recall: {recall:.4f}")
    print(f"  F1-score: {f1:.4f}")
    print("-" * 30)

# Identify the best-performing model based on accuracy
best_model = max(results, key=lambda name: results[name][0])

print(f"\nBased on accuracy, the best performing model is: {best_model}")

Model Performance Comparison:
------------------------------
SVM:
  Accuracy: 0.7640
  Precision: 0.7613
  Recall: 0.7640
  F1-score: 0.7617
------------------------------
Decision Tree:
  Accuracy: 0.9370
  Precision: 0.9375
  Recall: 0.9370
  F1-score: 0.9369
------------------------------
KNN:
  Accuracy: 0.8070
  Precision: 0.8112
  Recall: 0.8070
  F1-score: 0.8081
------------------------------

Based on accuracy, the best performing model is: Decision Tree
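
A single 80/20 split gives one accuracy estimate per model, so the ranking can depend on which rows land in the test set. Cross-validation gives a mean and a spread instead. The sketch below assumes 5-fold stratified cross-validation and uses synthetic data from `make_classification` as a stand-in, so its scores do not describe the emotion dataset. The `toy_*` names and the fixed `random_state` values are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (not the emotion dataset).
toy_X, toy_y = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_classes=3, random_state=42
)

toy_models = {
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "KNN": KNeighborsClassifier(),
}

# Five stratified folds, shuffled once with a fixed seed for repeatability.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# One array of five accuracy values per model.
cv_scores = {
    name: cross_val_score(model, toy_X, toy_y, cv=cv, scoring="accuracy")
    for name, model in toy_models.items()
}

for name, scores in cv_scores.items():
    print(f"{name}: mean={scores.mean():.4f}, std={scores.std():.4f}")
```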
