## Part A
### Support Vector Machine

Train a support vector machine using the images from fer2013.csv. Use the Training set for training, and the PrivateTest test set for testing. Report precision, recall, accuracy, F1 score, and create a confusion matrix on the test set, showing the confusions between emotion labels.

In [None]:
import pandas as pd
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
import numpy as np

# Load the CSV file into a DataFrame
data = pd.read_csv('fer2013.csv')
scaler = StandardScaler()

# Split the data for training and testing
train_data = data[data['Usage'] == 'Training']
test_data = data[data['Usage'] == 'PrivateTest']

# format the pixel column into numpy array
X_train_pixels = train_data['pixels'].apply(lambda x: np.fromstring(x, dtype=int, sep=' '))
X_train = np.vstack(X_train_pixels.values)
X_train_scaled = scaler.fit_transform(X_train)

y_train = train_data['emotion']

# format the pixel column into numpy array
X_test_pixels = test_data['pixels'].apply(lambda x: np.fromstring(x, dtype=int, sep=' '))
X_test = np.vstack(X_test_pixels.values)
X_test_scaled = scaler.transform(X_test)

y_test = test_data['emotion']

# Initialize SVM classifier
svm = SVC()

# Train the SVM on the raw pixel values
# (the scaled arrays above are computed but not used by this model)
svm.fit(X_train, y_train)

In [None]:
from sklearn.metrics import f1_score, accuracy_score, recall_score, precision_score, confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

# Get Scores
predicted = svm.predict(X_test)

# Use distinct variable names so the imported metric functions are not overwritten
f1 = f1_score(y_test, predicted, average='weighted')
accuracy = accuracy_score(y_test, predicted)
recall = recall_score(y_test, predicted, average='weighted')
precision = precision_score(y_test, predicted, average='weighted')
cm = confusion_matrix(y_test, predicted)
print(f"f1 Score: {f1}\nAccuracy Score:{accuracy}\nRecall Score:{recall}\nPrecision Score:{precision}\n")
print("Confusion Matrix:\n")
cm_display = ConfusionMatrixDisplay(confusion_matrix=cm)
cm_display.plot()
plt.show()

f1 Score: 0.42950209418514934

Accuracy Score:0.4471997770966843

Recall Score:0.4471997770966843

Precision Score:0.45357166337232335
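
The accuracy and weighted recall above agree to every digit. This is expected rather than a coincidence: with `average='weighted'`, each class's recall is weighted by its support, and that weighted sum reduces to the overall fraction of correct predictions. A minimal sketch on made-up toy labels (the arrays below are illustrative and not taken from fer2013.csv):

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Toy labels for illustration only (not FER2013 data)
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 2])
y_pred = np.array([0, 1, 0, 1, 2, 2, 2, 0, 2, 2])

acc = accuracy_score(y_true, y_pred)
weighted_recall = recall_score(y_true, y_pred, average='weighted')

# Support-weighted per-class recall reduces to correct / total
print("Accuracy:", acc)
print("Weighted recall:", weighted_recall)
```

Weighted precision and F1 do not collapse in the same way, which is why they differ from the accuracy in the scores reported above.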

Train a support vector machine using the Action Units of labeled samples from phoebe_AU.csv. Use 5-fold cross-validation on this training set to report the performance. Report your perceived qualitative performance on the unknown labels (e.g. How many appear correct? Provide your own labels as unknown groundtruth to help quantify your results.)

In [None]:
from sklearn.model_selection import cross_val_score

# Load the Phoebe dataset
phoebe_data = pd.read_csv('phoebe_AU.csv')

# Load Training Data (all records with labels)
train_data = phoebe_data[phoebe_data['label'] != 'unknown']
labelled_data = train_data['label']
train_data = train_data.drop(['file_name', 'label'], axis=1)

# Load Test Data (all records without label)
test_data = phoebe_data[phoebe_data['label'] == 'unknown']
test_data = test_data.drop(['file_name', 'label'], axis=1)

# Fit the scaler on the training data (the same scaler is reused for the test data below)
scaler = StandardScaler()
train_data_scaled = scaler.fit_transform(train_data)

# Initialize and train SVM model
svm = SVC()
svm.fit(train_data_scaled, labelled_data)

cv_scores = cross_val_score(svm, train_data_scaled, labelled_data, cv=5)
print("CV Scores:", cv_scores)
print("Average CV Score:", cv_scores.mean())

test_data_scaled = scaler.transform(test_data)

# Test the model
test_predictions = svm.predict(test_data_scaled)

# Attach the predicted label to each unknown record
test_data['label'] = test_predictions

# Print the estimated labels for test data
print("Estimated labels for records with label value 'unknown':")
print(test_data)
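
The cross-validation above scales the whole labelled set before the folds are split, so each held-out fold has already influenced the scaler. One common alternative is to wrap the scaler and the SVM in a pipeline so that scaling is refit inside every fold. The sketch below assumes synthetic stand-in data from `make_classification` (the 17 feature columns and 3 classes are placeholders, not the real Action Unit data):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the Action Unit features (assumed shape, illustration only)
X_demo, y_demo = make_classification(
    n_samples=100, n_features=17, n_informative=5, n_classes=3, random_state=0
)

# The scaler is fit on the training part of each fold only
pipeline = make_pipeline(StandardScaler(), SVC())
demo_scores = cross_val_score(pipeline, X_demo, y_demo, cv=5)

print("CV Scores:", demo_scores)
print("Average CV Score:", demo_scores.mean())
```

With a small dataset such as this one the difference may be minor, but the pipeline keeps the reported cross-validation score free of that leakage.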

The predicted labels appear to be limited by the small amount of labelled data available for training. However, for the images that the model assigned to a basic emotion, the predictions seem mostly accurate. Below are the labels I would assign to the unknown-labelled images:

0. surprise (same as model)

4. disgust 

5. sad (same as model)

11. happy (same as model)

13. happy (same as model)

47. angry 

62. surprise 

74. happy (same as model)

78. sad (same as model)

81. happy (same as model)

84. disgust 

93. surprise (same as model)

Comparing my own labels with the model's labels, the model classified 8/12 (about 67%) of the images correctly. This would likely improve if the model could better separate surprise, happy, and sad from angry and disgust. Most of the discrepancies came from the model failing to distinguish between similar emotions (disgust vs. surprise, and angry vs. sad).

## Part B
### Neural Network

Neural Network. Train a neural network using the images from fer2013.csv using Keras. Your first layer should be a Conv2D layer, and the last layers should be a Dense layer followed by a Softmax. Use the Training set for training, PublicTest validation set to avoid overfitting, and the PrivateTest test set for testing. Aim for a minimum validation accuracy of 40% on the Fer2013 validation set. To enhance your model's performance, experiment with various batch sizes and epochs. Incorporate dropout and normalization techniques to further mitigate overfitting and improve generalization. Report precision, recall, accuracy, F1 score, and create a confusion matrix on the test set.

Test. Use your trained neural network from Part B.1 and classify the Phoebe unknown image data. Report your perceived performance on the unknown labels, comparing it to the SVM in Part A.2.

Fine-tune the Neural Network, and re-classify. Fine-tune your neural network on the Phoebe-face image dataset provided (Hints: use imread() in grayscale to read the images, and freeze early layer weights during fine-tuning). Then, reclassify the images in unknown. Do you think the results improved compared to Part B.2?

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.utils import to_categorical
import matplotlib.pyplot as plt

df = pd.read_csv("fer2013.csv")
df['pixels'] = df['pixels'].apply(lambda x: np.array(x.split(), dtype=np.uint8))

# convert pixels to input shape and split into separate sets
X = np.stack(df['pixels'].values).reshape(-1, 48, 48, 1) / 255.0
y = to_categorical(df['emotion'])

# split into training, validation, and test sets using the dataset's Usage column
usage = df['Usage'].values
X_train, y_train = X[usage == 'Training'], y[usage == 'Training']
X_val, y_val = X[usage == 'PublicTest'], y[usage == 'PublicTest']
X_test, y_test = X[usage == 'PrivateTest'], y[usage == 'PrivateTest']

model = Sequential([
    Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(48, 48, 1)),
    MaxPooling2D(pool_size=(2, 2)), # downsample the feature maps
    Conv2D(64, kernel_size=(3, 3), activation='relu'), # additional convolution stage
    MaxPooling2D(pool_size=(2, 2)), # downsample the feature maps
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5), # dropout to reduce overfitting
    Dense(7, activation='softmax') # softmax over the 7 emotion classes
])

# train for 20 epochs with batch size 64, using the categorical cross-entropy loss
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(X_train, y_train, batch_size=64, epochs=20, validation_data=(X_val, y_val), verbose=1)

test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print("Loss:", test_loss)
print("Accuracy:", test_acc) # test-set accuracy (the validation accuracy is in history)

In [None]:
# testing model and reporting results
predicted = np.argmax(model.predict(X_test), axis=-1)

# convert the one-hot test labels back to integer class labels
true_labels = np.argmax(y_test, axis=-1)

print(classification_report(true_labels, predicted)) # all scores

print("Confusion Matrix:")
# use a distinct name so the imported confusion_matrix function is not overwritten
cm = confusion_matrix(true_labels, predicted)
cm_disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=range(7))
cm_disp.plot()
plt.show()

### B.2

In [None]:
import os
import cv2

phoebe_df = pd.read_csv("phoebe_AU.csv")

# get image names that are unknown labelled (copy so a column can be added later)
unknown_df = phoebe_df[phoebe_df['label'] == 'unknown'].copy()

X_unknown = [] # stores the image data for each unknown labelled image
for file_name in unknown_df['file_name']:
    image = cv2.imread(os.path.join("images", "unknown", file_name), cv2.IMREAD_GRAYSCALE)
    image = cv2.resize(image, (48, 48))
    image = image / 255.0 # normalizes pixel values
    X_unknown.append(image)

X_unknown = np.array(X_unknown)

# reshape for model inputting
X_unknown = X_unknown.reshape(-1, 48, 48, 1)

# test the unknown dataset on the model
unknown_predictions = model.predict(X_unknown)

# converts the softmax probabilities to an integer class label (0 to 6)
predicted_emotions = np.argmax(unknown_predictions, axis=-1)

# update the images with the predicted label
unknown_df['predicted_label'] = predicted_emotions
print(unknown_df[['file_name', 'predicted_label']])

The neural network built with Keras produced the following labels:

     file_name  predicted_label
0     1_01.jpg                4

4     4_01.jpg                4

5     4_20.jpg                4

11    8_01.jpg                3

13    9_41.jpg                3

47  26_123.jpg                3

62   35_42.jpg                5

74   41_06.jpg                3

78   44_01.jpg                6

81   46_03.jpg                3

84   48_01.jpg                0

93   52_31.jpg                3

Most of the images are classified as 3 (happy). Because this model was trained on fer2013.csv, it outputs labels from 0 to 6, so some predictions have no counterpart among the labels I used for the Phoebe images, for instance 6 (neutral) for 44_01.jpg.

The mapping for the fer2013 labels is as follows:

0 -> angry

1 -> disgust

2 -> fear

3 -> happy

4 -> sad

5 -> surprise

6 -> neutral

Overall, this results in 6/12 = 50% of the images being correctly classified. This is worse than the 8/12 (about 67%) observed in A.2. This NN model was not as consistent at distinguishing features, particularly between happy and surprise.
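
The 6/12 figure can be re-derived from the prediction table and my own labels from Part A.2. The sketch below assumes the standard FER2013 integer encoding (0 = angry through 6 = neutral); the dictionary names are introduced here for illustration only:

```python
# FER2013 integer labels (standard dataset encoding, assumed here)
FER_LABELS = {0: 'angry', 1: 'disgust', 2: 'fear', 3: 'happy',
              4: 'sad', 5: 'surprise', 6: 'neutral'}

# Row index -> predicted label, copied from the table above
nn_predictions = {0: 4, 4: 4, 5: 4, 11: 3, 13: 3, 47: 3,
                  62: 5, 74: 3, 78: 6, 81: 3, 84: 0, 93: 3}

# Row index -> my own label from Part A.2
my_labels = {0: 'surprise', 4: 'disgust', 5: 'sad', 11: 'happy',
             13: 'happy', 47: 'angry', 62: 'surprise', 74: 'happy',
             78: 'sad', 81: 'happy', 84: 'disgust', 93: 'surprise'}

# Rows where the network's label matches my label
matches = [idx for idx, pred in nn_predictions.items()
           if FER_LABELS[pred] == my_labels[idx]]
agreement = len(matches) / len(nn_predictions)

print(f"{len(matches)}/{len(nn_predictions)} = {agreement:.0%}")
```

The matching rows are 5, 11, 13, 62, 74, and 81, which is where the 50% comes from.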

### B.3

In [None]:
# Function to load and preprocess images into model input shape
def load_images(directory='images/', img_height=48, img_width=48):
    image_data = []
    labels = []
    for label in os.listdir(directory):
        label_dir = os.path.join(directory, label)
        for image_file in os.listdir(label_dir):
            image_path = os.path.join(label_dir, image_file)
            image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
            image = cv2.resize(image, (img_width, img_height))
            image = image / 255.0 
            image_data.append(image)
            labels.append(label)
    return np.array(image_data), np.array(labels)

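image_data, labels = load_images()

# add the channel dimension expected by the Conv2D input: (N, 48, 48, 1)
image_data = image_data.reshape(-1, 48, 48, 1)

# convert labels to one-hot encoding (cast to float, since newer pandas returns booleans)
labels = pd.get_dummies(labels).values.astype('float32')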

# split data into training and validation sets
x_train, x_val, y_train, y_val = train_test_split(image_data, labels, test_size=0.2, random_state=42)

# freeze the original model's layers, then stack additional trainable layers on top of its output
for layer in model.layers:
    layer.trainable = False
model = Sequential([
    model,
    Flatten(), 
    Dense(128, activation='relu'),
    Dense(5, activation='softmax')
])

# Compile the model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=20, batch_size=64, validation_data=(x_val, y_val))

# Evaluate the fine-tuned model on the held-out validation split
_, val_acc = model.evaluate(x_val, y_val)
print('Validation accuracy:', val_acc)

In [None]:
phoebe_df = pd.read_csv("phoebe_AU.csv")

# get image names that are unknown labelled (copy so a column can be added later)
unknown_df = phoebe_df[phoebe_df['label'] == 'unknown'].copy()

X_unknown = [] # stores the image data for each unknown labelled image
for file_name in unknown_df['file_name']:
    image = cv2.imread(os.path.join("images", "unknown", file_name), cv2.IMREAD_GRAYSCALE)
    image = cv2.resize(image, (48, 48))
    image = image / 255.0 # normalizes pixel values
    X_unknown.append(image)

X_unknown = np.array(X_unknown)

# reshape for model inputting
X_unknown = X_unknown.reshape(-1, 48, 48, 1)

# test the unknown dataset on the model
unknown_predictions = model.predict(X_unknown)

# converts the softmax output to a class index; the index follows the sorted
# order of the label folder names that pd.get_dummies used during fine-tuning
predicted_emotions = np.argmax(unknown_predictions, axis=-1)

# update the images with the predicted label
unknown_df['predicted_label'] = predicted_emotions
print(unknown_df[['file_name', 'predicted_label']])

The results in B.3 show that fine-tuning the original model did not improve performance: the predictions matched my own labels for only 42% of the images. This drop may be a result of how the new layers were attached. They were appended after the original softmax output, so the new layers only see the seven class probabilities rather than the learned features. A better approach would likely be to remove the original softmax layer, add the additional ReLU layer on top of the remaining layers, and then add a new softmax output layer that classifies the images into the five classes they should belong to. Unfortunately, due to time and computation constraints, I was not able to run this variant.
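
The alternative described above (dropping the old softmax layer before adding a new head) could look roughly like the following. This is a sketch under stated assumptions, not the code that produced the B.3 results: `base_model` is an untrained stand-in with the same layer layout as the Part B.1 network, and the 64-unit ReLU layer in the new head is a hypothetical choice.

```python
from keras import Input
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.models import Sequential

# Stand-in for the trained FER2013 network (same layout as Part B.1, untrained here)
base_model = Sequential([
    Input(shape=(48, 48, 1)),
    Conv2D(32, kernel_size=(3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(64, kernel_size=(3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(7, activation='softmax'),
])

# Keep every layer except the old 7-way softmax, and freeze the kept layers
feature_layers = base_model.layers[:-1]
for layer in feature_layers:
    layer.trainable = False

# Hypothetical new head: one ReLU layer followed by a 5-way softmax
fine_tuned = Sequential(
    [Input(shape=(48, 48, 1))]
    + feature_layers
    + [Dense(64, activation='relu'), Dense(5, activation='softmax')]
)
fine_tuned.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
```

In this layout only the two new Dense layers are trainable, and they receive the 128-dimensional feature vector instead of the seven class probabilities.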

## Part C
### Comparison

Compare. Compare the results from the 4 models (SVM-Fer2013, SVM-OpenFace, NN-Fer2013, NN-FineTuned) on the Phoebe unknown dataset. Specifically compare the approach with hand-crafted features (SVM-OpenFace) versus neural network extracted features (NN-FineTuned). Choose the one that you think worked best with this dataset. Justify your answer based on the results from Part A and Part B and discuss limitations.


Comparing the results from all the models, and specifically the hand-crafted-feature approach (SVM-OpenFace) against the neural-network-extracted features (NN-FineTuned), I observe a better result from the SVM. The SVM had a final accuracy of about 67% (8/12) on the unknown images, whereas the fine-tuned neural network reached only 42%. A limitation of the neural network approach is the number of variants that need to be tested to reach a good result, as well as its strong dependence on the activation functions and the number of layers, both of which greatly affect the outcome. With more time spent fine-tuning the NN model, it might have been possible to achieve better prediction accuracy on this dataset, given the control the designer has over the model.

The results from Part A, particularly training on fer2013, showed better scores overall, especially when viewing the confusion matrix. There were more cases along the diagonal (true positives) of the confusion matrix for the SVM than for the NN. The SVM's performance also carried over to training and testing on the Phoebe dataset: it reached about 67% accuracy (8/12) on the unknown images, whereas the NN implementation got 42% of them right.
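
Reading the diagonal of a confusion matrix can be made quantitative: the sum of the diagonal divided by the total gives the accuracy, and each diagonal entry divided by its row sum gives that class's recall. A minimal sketch with a made-up 3-class matrix (the numbers are illustrative and are not results from this assignment):

```python
import numpy as np

# Made-up confusion matrix: rows = true class, columns = predicted class
cm_demo = np.array([
    [50, 5, 5],
    [10, 30, 10],
    [5, 5, 40],
])

# Overall accuracy is the diagonal mass over all predictions
accuracy_demo = np.trace(cm_demo) / cm_demo.sum()

# Per-class recall is each diagonal entry over its row total
per_class_recall = np.diag(cm_demo) / cm_demo.sum(axis=1)

print("Accuracy:", accuracy_demo)
print("Per-class recall:", per_class_recall)
```

Applying the same two lines to the SVM and NN confusion matrices would give a direct numerical comparison of their diagonals.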

Now that we've discussed the results and some limitations of the models, it's important to also mention limitations from the labeller's perspective, that is, the person who labels the unknown data, which for this assignment is me. I may perceive an emotion differently than another individual would. This is especially true for emotions that share similar action units, such as surprise and happy. The same confusion was a common occurrence in the model predictions, which mixed up happy and surprise; similarly, angry and disgust were frequently confused. Ultimately, these model predictions should be a small part of a larger system that can account for these nuances, especially when these distinctions affect how a system behaves towards the user.