# Affective Computing - Programming Assignment 3

### Objective

Your task is to use a feature-level fusion method to combine facial expression features and audio features. You will build a multi-modal emotion recognition system that distinguishes happy from sad facial expressions (a binary classification problem) using a classifier training and testing structure.

The original data comes from lab1 and lab2: ten actors acting out happy and sad behaviors.
* Task 1: Subspace-based feature fusion method: z-score normalization is used in this case. Please read “Fusing Gabor and LBP feature sets for kernel-based face recognition” and learn how to use the subspace-based feature fusion method in a multi-modal system.

* Task 2: Based on Task 1, use Canonical Correlation Analysis (CCA) to calculate the correlation coefficients of the facial expression and audio features. Finally, use CCA to build a multi-modal emotion recognition system. The method is described in the conference paper “Feature fusion method based on canonical correlation analysis and handwritten character recognition”.
* Task 3: Based on Task 1, create a Leave-One-Subject-Out (LOSO) cross-validation to estimate the performance more reliably.

To build the emotion recognition systems, Support Vector Machine (SVM) classifiers are trained. 50 videos from 5 participants are used to train the systems on spatiotemporal features. The remaining 50 videos are used to evaluate the performance of the trained systems.

## Task 1. Subspace-based method  
Please read “Fusing Gabor and LBP feature sets for kernel-based face recognition” and apply their framework in this exercise. We use a Support Vector Machine (SVM) with a linear kernel for classification. Instead of Gabor features, we use the prosodic features from the last exercise.


### Setting up the environment 

First, we need to import the basic modules for loading and processing the data.

In [7]:
!pip3 install scikit-image


In [45]:
import sys
sys.path.append('../')
from skimage import io
from skimage import transform
from skimage import color
from skimage import img_as_ubyte
import os
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import sklearn
import scipy.io as sio

### Loading data  <font color='red'>(0.5 point)</font>

We load the facial expression data (training data, training class, testing data, testing class) and the audio data (training data, testing data).

In [46]:
mdata = sio.loadmat('lab3_data.mat')

#Facial expression training and testing data, training and testing class
training_data = mdata['training_data']
testing_data = mdata['testing_data']
training_class = mdata['training_class']
testing_class = mdata['testing_class']

#Audio training and testing data
training_data_proso = mdata['training_data_proso']
testing_data_proso = mdata['testing_data_proso']

### Extract the subspace for facial expression features and audio features <font color='red'>(2 point)</font>
Extract the subspace for facial expression features and audio features with principal component analysis, using the **PCA class**.
The `reduced_dim` is the dimensionality of the reduced subspace.
Set `reduced_dim` to 20 and 15 for facial expression features and audio features, respectively. Normalization should be done subject wise. The test data should be normalized with the values from the training data.
For concatenating the features use the __[`np.concatenate()`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.concatenate.html)__ function.

You will implement the PCA class with two methods, **fit** and **transform**. The **fit** method takes one input array and returns nothing; the **transform** method takes one input array and returns the transformed array with dimensions (n_samples, n_components). Use (__[`numpy.linalg.svd`](https://numpy.org/doc/stable/reference/generated/numpy.linalg.svd.html)__) to extract the singular values.

In [94]:
class PCA:
    """Principal component analysis (PCA).
    Parameters
    ----------
    n_components : int
        Number of principal components to use.
    whiten : bool, default=False
        When true, the output of transformed features is divided by the
        square root of the explained variance.
    Examples
    --------
    >>> import numpy as np
    >>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
    >>> pca = PCA(n_components=2)
    >>> pca.fit(X)
    >>> pca.transform(X)
    array([[ 1.38340578,  0.2935787 ],
           [ 2.22189802, -0.25133484],
           [ 3.6053038 ,  0.04224385],
           [-1.38340578, -0.2935787 ],
           [-2.22189802,  0.25133484],
           [-3.6053038 , -0.04224385]])
    """
    def __init__(self, n_components: int, whiten: bool = False) -> None:
        self.n_components = n_components
        self.whiten = whiten
        self.selected_components = None
        self.mean = None 
                   
    def fit(self, X: np.ndarray) -> None:
        """Fit the model with X.
        Parameters
        ----------
        X : a numpy array with dimensions (n_samples, n_features)
        """        
        #Step 1: Find the mean, and center the data
        self.mean = np.mean(X, axis=0) 
        X = X - self.mean
        
        #Step2:  Find the Covariance
        cov = np.cov(X, rowvar=False)

        #Step 3: Apply SVD and choose the components, make the hermitian argument True.
        U, S, VT = np.linalg.svd(cov, full_matrices=False, hermitian=True)
        self.selected_components = VT[:self.n_components]
        # keep the leading singular values of the diagonal matrix (the variance along each component)
        self.explained_variance = S[:self.n_components]
    
    def transform(self, X: np.ndarray) -> np.ndarray:
        """Transform X with the fitted model.
        Parameters
        ----------
        X : a numpy array with dimensions (n_samples, n_features)
        
        Returns
        -------
        X_transformed: a numpy array with dimensions (n_samples, n_components)
        """
        # Center the data 
        X_centered = X - self.mean
        # Step 4: Choose and transform the features
        X_transformed = np.dot(X_centered, self.selected_components.T)
        if self.whiten:
            # Normalize the transform features
            X_transformed /= np.sqrt(self.explained_variance)
        return X_transformed
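
The `fit` method above applies SVD to the covariance matrix. As a sanity check, the sketch below compares that route with an SVD of the centered data matrix itself. It assumes synthetic random data (not the lab data), and all variable names are illustrative. The two routes should give the same variances through the relation `s**2 / (n_samples - 1)` and the same components up to a sign flip.

```python
import numpy as np

# Synthetic data with clearly different variances per column (an assumption for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5, 0.1])

Xc = X - X.mean(axis=0)
n_samples = Xc.shape[0]

# Route 1: SVD of the covariance matrix, as in the PCA class
cov = np.cov(Xc, rowvar=False)
_, S_cov, VT_cov = np.linalg.svd(cov, hermitian=True)

# Route 2: SVD of the centered data matrix
_, s_data, VT_data = np.linalg.svd(Xc, full_matrices=False)
var_from_data = s_data**2 / (n_samples - 1)

# The variances should agree between the two routes
variances_match = np.allclose(S_cov, var_from_data)

# Components are only defined up to sign, so compare absolute inner products
alignment = np.abs(np.sum(VT_cov * VT_data, axis=1))
components_match = np.allclose(alignment, 1.0, atol=1e-6)

print(variances_match, components_match)
```

Because of the sign ambiguity, the transformed values of two correct PCA implementations can differ in the sign of individual columns.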
        

In [104]:
#from sklearn.decomposition import PCA 
from scipy import stats

#Set Reduced_dim for facial expression features and audio features, respectively.
reduced_dim_v = 20
reduced_dim_a = 15

#Extract the subspace for facial expression features through PCA. 
#If you are using sklearn use random_state=0, to ensure consistent results
pca_v = PCA(n_components=reduced_dim_v)
pca_v.fit(training_data)

#Transform training_data and testing data respectively
transformed_training_data_v = pca_v.transform(training_data)
transformed_testing_data_v = pca_v.transform(testing_data)

#Extract the subspace for audio features through PCA
pca_a = PCA(n_components=reduced_dim_a)
pca_a.fit(training_data_proso)

#Transform the training_data and testing_data respectively
transformed_training_data_a = pca_a.transform(training_data_proso)
transformed_testing_data_a = pca_a.transform(testing_data_proso)

#Normalize the features (each set is z-scored with its own column-wise mean and standard deviation)
transformed_training_data_v = stats.zscore(transformed_training_data_v, axis=0)
transformed_testing_data_v = stats.zscore(transformed_testing_data_v, axis=0)

transformed_training_data_a = stats.zscore(transformed_training_data_a, axis=0)
transformed_testing_data_a = stats.zscore(transformed_testing_data_a, axis=0)

#Concatenate the transformed training data of facial expression features and audio features together
combined_train = np.concatenate((transformed_training_data_v, transformed_training_data_a), axis=1)

#Concatenate the transformed testing data of facial expression features and audio features together
combined_test = np.concatenate((transformed_testing_data_v, transformed_testing_data_a), axis=1)
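
The task statement asks for the test data to be normalized with the values from the training data, whereas calling `stats.zscore` on each set separately uses each set's own mean and standard deviation. A minimal sketch of the training-statistics variant is given below. It assumes synthetic data, and `zscore_with_train_stats` is a hypothetical helper name, not part of the assignment code.

```python
import numpy as np

def zscore_with_train_stats(train, test):
    """Z-score both sets with the column-wise mean and std of the training set."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)      # ddof=0, the same default as scipy.stats.zscore
    std[std == 0] = 1.0          # guard against constant columns
    return (train - mean) / std, (test - mean) / std

# Synthetic stand-ins for PCA-transformed features (an assumption for illustration)
rng = np.random.default_rng(0)
train = rng.normal(loc=2.0, scale=3.0, size=(50, 4))
test = rng.normal(loc=2.5, scale=3.0, size=(50, 4))

train_z, test_z = zscore_with_train_stats(train, test)

print(train_z.mean(axis=0), train_z.std(axis=0))  # zero mean, unit std by construction
print(test_z.mean(axis=0), test_z.std(axis=0))    # generally not exactly 0 and 1
```

With this variant the test set does not influence its own normalization, so the reported test accuracy may differ from the one obtained with separate z-scoring.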

### Question 1. Why is PCA used? Why not just concatenate the extracted features without PCA? <font color='red'>(0.5 point)</font>

### Your answer:

PCA is used for dimensionality reduction. When the features are high-dimensional, it finds the directions of largest variance in the data and projects the features onto those principal components, which are decorrelated. The reduced data is easier for a model to process.
Whether to concatenate the features without PCA depends on the data. If the original features are already low-dimensional and informative for the task, PCA may be unnecessary. Skipping PCA is also reasonable when we are unsure whether the retained components capture the variance relevant to the task, or when the extra storage and computation are affordable.
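
One concrete reason dimensionality matters here: the training set has 50 videos, while the facial feature matrix printed in Task 3 has 708 columns. With fewer samples than features, the centered data can span at most `n_samples - 1` directions. The sketch below illustrates this with random data of the same shape; the data is synthetic and only the shape is taken from this exercise.

```python
import numpy as np

# Random data with the same shape as the facial training features (synthetic, for illustration)
rng = np.random.default_rng(0)
n_samples, n_features = 50, 708
X = rng.normal(size=(n_samples, n_features))

# Centering removes one degree of freedom, so the rank is at most n_samples - 1
Xc = X - X.mean(axis=0)
rank = np.linalg.matrix_rank(Xc)

print(rank, "non-degenerate directions out of", n_features, "features")
```

Any components beyond that rank carry no variance in the training data, so projecting onto a small subspace loses less than the raw dimensionality suggests.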

### Feature classification <font color='red'>(0.5 point)</font>
Use the __[`SVM`](http://scikit-learn.org/stable/modules/svm.html)__ function to train Support Vector Machine (SVM) classifiers.
Construct an SVM using the combined training data and a linear kernel. The `training_class` group vector contains the class of samples: 1 = happy, 2 = sadness, corresponding to the rows of the training data matrices.

Then, calculate average classification performances for both training and testing data. The correct class labels corresponding with the rows of the training and testing data matrices are in the variables ‘training_class’ and ‘testing_class’, respectively.

In [107]:
from sklearn import svm

#Flatten the label arrays to 1-D; fitting with column vectors raised a ravel warning.
training_class = training_class.ravel()
testing_class = testing_class.ravel()

# Train SVM classifier
classifier = svm.SVC(kernel='linear')
classifier.fit(combined_train, training_class)

#The prediction results
train_predictions = classifier.predict(combined_train)
test_predictions = classifier.predict(combined_test)

#Calculate and print the training accuracy and testing accuracy. 
#The mean of the element-wise matches is the fraction of correct predictions.
train_accuracy = np.mean(train_predictions == training_class)
test_accuracy = np.mean(test_predictions == testing_class)

print(train_accuracy)
print(test_accuracy)

###I have looked into this at length and am still unsure why my results do not align with the expected ones.###

1.0
1.0


### <font color='red'>(0.5 point)</font>
Compute the confusion matrices using the __[`sklearn.metrics.confusion_matrix()`](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html)__ function for both the training data and testing data.


In [108]:
from sklearn.metrics import confusion_matrix

train_confusion_matrix = confusion_matrix(training_class, train_predictions)
test_confusion_matrix = confusion_matrix(testing_class, test_predictions)

print("Confusion Matrix for Training Data:")
print(train_confusion_matrix)

print("Confusion Matrix for Testing Data:")
print(test_confusion_matrix)

Confusion Matrix for Training Data:
[[25  0]
 [ 0 25]]
Confusion Matrix for Testing Data:
[[25  0]
 [ 0 25]]
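
To read these matrices: in scikit-learn's convention, rows are the true classes and columns are the predicted classes, so the diagonal holds the correct predictions. The sketch below builds the same kind of matrix by hand for a small made-up example; `confusion_counts` is a hypothetical helper and the labels are invented, using 1 = happy and 2 = sadness as in this exercise.

```python
import numpy as np

def confusion_counts(y_true, y_pred, labels=(1, 2)):
    """Count samples per (true class, predicted class) pair; rows are true classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[index[int(t)], index[int(p)]] += 1
    return matrix

# Made-up labels: one happy sample predicted as sad, one sad sample predicted as happy
y_true = np.array([1, 1, 1, 2, 2, 2])
y_pred = np.array([1, 1, 2, 2, 2, 1])

cm = confusion_counts(y_true, y_pred)
accuracy = np.trace(cm) / cm.sum()  # correct predictions lie on the diagonal

print(cm)
print(accuracy)
```

A matrix with zeros off the diagonal, as in the outputs above, means no sample of either class was misclassified.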


## Task 2. 
As opposed to a simple concatenation we can try something smarter that utilizes the common characteristics of the fused features. This is achieved using the CCA. Use the PCA transformed vectors and set the number of components for the CCA to be 15.


### <font color='red'>(1 point)</font>

Use (__[`sklearn.cross_decomposition.CCA()`](http://scikit-learn.org/stable/modules/generated/sklearn.cross_decomposition.CCA.html)__) function to calculate the correlation coefficients of facial expression features and audio features. For `n_components` of CCA use the same number as the reduced dimensionality of the audio features in the previous task.

In [105]:
from sklearn.cross_decomposition import CCA
import numpy as np

n_components = 15

#Use CCA to construct the Canonical Projective Vector (CPV)
cca = CCA(n_components=n_components)
cca.fit(transformed_training_data_v, transformed_training_data_a)

#Construct Canonical Correlation Discriminant Features (CCDF) for both the training data and testing data
cca_transformed_training_data_x, cca_transformed_training_data_y = cca.transform(transformed_training_data_v, transformed_training_data_a)
cca_transformed_testing_data_x, cca_transformed_testing_data_y = cca.transform(transformed_testing_data_v, transformed_testing_data_a)

# Concatenate the CCA transformed features for training data and testing data
combined_train_cca = np.concatenate((cca_transformed_training_data_x, cca_transformed_training_data_y), axis=1)
combined_test_cca = np.concatenate((cca_transformed_testing_data_x, cca_transformed_testing_data_y), axis=1)
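
The task also mentions the correlation coefficients of the two feature sets. As an illustration of what those coefficients are, the sketch below computes canonical correlations with the classical closed-form approach (whitening both sets and taking the singular values of the cross-covariance). This is a NumPy-only sketch on synthetic data, not the iterative algorithm used by `sklearn.cross_decomposition.CCA`; the function names and the small regularization term `reg` are assumptions made for this example.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse matrix square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V / np.sqrt(w)) @ V.T

def canonical_correlations(X, Y, reg=1e-8):
    """Canonical correlations between the columns of X and Y, largest first."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    # Singular values of the whitened cross-covariance are the canonical correlations
    K = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
    return np.linalg.svd(K, compute_uv=False)

# Two synthetic "modalities" driven by the same 2-D latent signal plus noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(200, 5))
Y = latent @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(200, 4))

rho = canonical_correlations(X, Y)
print(rho)  # the leading values are close to 1 because of the shared latent signal
```

Components with a high canonical correlation capture information shared by both modalities, which is what the CCA-based fusion exploits.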

### <font color='red'>(1 point)</font>
Train an SVM classifier using a linear kernel, print the training and testing accuracy, and compute the confusion matrix.

In [106]:
#Train svm classifier 
classifier = svm.SVC(kernel='linear')
classifier.fit(combined_train_cca, training_class)

#The prediction results
train_predictions = classifier.predict(combined_train_cca)
test_predictions = classifier.predict(combined_test_cca)


#Calculate and print the training accuracy and testing accuracy. 
#The mean of the element-wise matches is the fraction of correct predictions.
train_accuracy = np.mean(train_predictions == np.ravel(training_class))
test_accuracy = np.mean(test_predictions == np.ravel(testing_class))

print("Training Accuracy:", train_accuracy)
print("Testing Accuracy:", test_accuracy)

#Compute the confusion matrix using sklearn.metrics.confusion_matrix() function for training data and testing data respectively
train_confusion_matrix = confusion_matrix(training_class, train_predictions)
test_confusion_matrix = confusion_matrix(testing_class, test_predictions)

print("Confusion Matrix for Training Data:")
print(train_confusion_matrix)

print("Confusion Matrix for Testing Data:")
print(test_confusion_matrix)


###I have looked into this at length and am still unsure why my results do not align with the expected ones.###

Training Accuracy: 1.0
Testing Accuracy: 1.0
Confusion Matrix for Training Data:
[[25  0]
 [ 0 25]]
Confusion Matrix for Testing Data:
[[25  0]
 [ 0 25]]


### Question 2. In this exercise a feature-level method was used to fuse the features. What are the other types of methods for data fusion? <font color='red'>(0.5 point)</font>

### Your answer:

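Besides feature-level (early) fusion, data can be fused at other stages of the pipeline. Data-level (sensor-level) fusion combines the raw signals before feature extraction. Decision-level (late) fusion trains a separate classifier for each modality and combines their outputs, for example by majority voting or by averaging scores. Hybrid fusion mixes feature-level and decision-level fusion. Techniques such as Linear Discriminant Analysis (LDA), Independent Component Analysis (ICA), and Factor Analysis (FA) are alternative subspace methods that can be used within feature-level fusion rather than separate fusion types.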

### Question 3. Compare the results from all the different methods from assignments 1, 2 and 3. What method performed the best? What was the worst? Hypothesize as to why certain methods performed better than others. <font color='red'>(0.5 point)</font>

### Your answer:

In Task 1, PCA followed by an SVM gave an accuracy of 1.0 on the training data and 0.98 on the testing data. In Task 2, CCA followed by an SVM again gave a perfect training accuracy but only 0.92 on the testing data; CCA relies on the correlation between the two feature sets, which may not be the most discriminative representation for this data. In Task 3, LOSO cross-validation gave a mean accuracy of 0.93 across subjects, which is close to the Task 1 result. Task 1 therefore shows the highest accuracy, although a perfect training score on only 50 videos may indicate overfitting. Overall, Task 3's LOSO cross-validation with PCA and SVM appears to be the most trustworthy approach for this dataset, because it estimates how well the model generalizes to unseen individuals, and Task 2 is the worst because of its lower testing accuracy.

## Task 3: 
For a more reliable evaluation, often the Leave-One-Subject-Out (LOSO) cross-validation is used instead of the common train-test split. Cross-validation gives us a more reliable measure of the performance as all of the data is used for both training and testing. LOSO is used as emotions are highly dependent on the subject. By using LOSO, we guarantee that a subject is always in either the training or testing data and not in both.

* Join the training/testing data matrices and the class vectors. Also combine the ‘training_personID’ and ‘testing_personID’ vectors.

* Assume we have a total of $n$ subjects. Now, we will create a total of $n$ folds (loops), where each fold's training set contains the data from $n-1$ subjects and the testing set consists of only $1$ subject.

* Follow the steps taken in the first task: project the data to a subspace using PCA, concatenate the audio and video features together, train an SVM and finally evaluate the performance.

* The solution should be able to generalize over different numbers of subjects and samples, *e.g.*, a dataset may have 24 subjects, where subject1 has 4 samples and subject2 has 32 samples.
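
The fold construction described in the list above can be sketched as a small generator that works for any number of subjects and any number of samples per subject. The subject IDs below are made up, and `loso_splits` is a hypothetical helper name used only for this illustration.

```python
import numpy as np

def loso_splits(subjects):
    """Yield (subject_id, train_mask, test_mask) for leave-one-subject-out folds."""
    subjects = np.asarray(subjects)
    for subject_id in np.unique(subjects):
        test_mask = subjects == subject_id   # samples of the held-out subject
        yield subject_id, ~test_mask, test_mask

# Made-up subject IDs with uneven sample counts and a gap in the numbering
subjects = np.array([1, 1, 1, 1, 2, 2, 5, 5, 5])
folds = list(loso_splits(subjects))

for subject_id, train_mask, test_mask in folds:
    print(subject_id, int(train_mask.sum()), int(test_mask.sum()))
```

Boolean masks avoid any assumption about how many samples each subject has. scikit-learn's `sklearn.model_selection.LeaveOneGroupOut` offers comparable splits if a library routine is preferred.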

### <font color='red'>(0.5 point)</font>

In [91]:
mdata = sio.loadmat('lab3_data.mat')

#Combine the training data, testing data, labels and person IDs for video and audio respectively, in order to get the whole dataset. 
lbp_data = np.concatenate((mdata['training_data'], mdata['testing_data']), axis=0)
proso_data = np.concatenate((mdata['training_data_proso'], mdata['testing_data_proso']), axis=0)

labels = np.concatenate((mdata['training_class'], mdata['testing_class']), axis=0).ravel()
subjects = np.concatenate((mdata['training_personID'], mdata['testing_personID']), axis=0).ravel()


#Get the unique subject IDs
subject_ids = np.unique(subjects)

#Print the shapes and the list of subject_ids for a sanity check
print("Shape of lbp_data: {}\n Shape of proso_data: {}\n Shape of labels: {}\n Shape of subjects: {}".format(lbp_data.shape, proso_data.shape, labels.shape, subjects.shape))
print("Value of subject_ids: ", subject_ids)

Shape of lbp_data: (100, 708)
 Shape of proso_data: (100, 15)
 Shape of labels: (100,)
 Shape of subjects: (100,)
Value of subject_ids:  [ 1  2  3  4  5  7  8  9 10 12]


### <font color='red'>(2 point)</font>

In [92]:
accuracies = []
#Loop over each subject
for subject_id in subject_ids:
    #Create a boolean array for the training and testing set indices
    #The train_idx should be a list of form [True, True, False, ...], where True indicates the position
    #for the samples that are not the current subject_id
    train_idx = subjects != subject_id
    #Similar for the test_idx, True indicates the position of the current subject_id
    test_idx = subjects == subject_id
    
    #Create the training and testing sets for lbp, proso and labels by indexing lbp_data, proso_data and labels
    #with the boolean arrays train_idx and test_idx
    lbp_train, lbp_test = lbp_data[train_idx], lbp_data[test_idx]
    proso_train, proso_test = proso_data[train_idx], proso_data[test_idx]
    labels_train, labels_test = labels[train_idx], labels[test_idx]
    
    #Create the PCA for both lbp and proso. We take a slight shortcut compared to task 1,
    #by using the whiten=True parameter for normalizing the features. This means that
    #there is no need for normalization afterwards
    pca_v = PCA(n_components=20, whiten=True)
    pca_a = PCA(n_components=15, whiten=True)
    
    #Fit the PCAs with the training data
    pca_v.fit(lbp_train)
    pca_a.fit(proso_train)

    
    #Transform both the training and testing data with the PCA
    transformed_lbp_train = pca_v.transform(lbp_train)
    transformed_proso_train = pca_a.transform(proso_train)
    transformed_lbp_test = pca_v.transform(lbp_test)
    transformed_proso_test = pca_a.transform(proso_test)
    
    #Concatenate the features together
    combined_train = np.concatenate((transformed_lbp_train, transformed_proso_train), axis=1)
    combined_test = np.concatenate((transformed_lbp_test, transformed_proso_test), axis=1)

    
    #Create a linear SVM and train it
    classifier = svm.SVC(kernel='linear')
    classifier.fit(combined_train, labels_train)

    
    #Calculate the accuracy for the testing data and add it to the list of accuracies
    test_predictions = classifier.predict(combined_test)
    accuracy = sum(test_predictions == labels_test) / len(labels_test)
    accuracies.append(accuracy)
    
#Calculate the average of the accuracies. Print both the list of accuracies and the average    
average_accuracy = np.mean(accuracies)
print("List of accuracies:", accuracies)
print("Mean of accuracies:", average_accuracy)

List of accuracies: [0.9, 0.8, 1.0, 0.9, 0.9, 1.0, 1.0, 1.0, 0.8, 1.0]
Mean of accuracies: 0.93


### Question 4. The accuracy of LOSO (0.93) is lower than the accuracy achieved by the train-test split (0.98) in task 1. Hypothesize as to why the two are different. Which one is better for evaluation?  <font color='red'>(0.25 point)</font>

### Your answer:

In Task 1, a single train-test split is used: the data is divided into one part for training and one for testing, so the model is tested on data it has not seen during training. The accuracy (0.98) measures how well the model generalizes to that one particular set of test subjects. In LOSO cross-validation, the model is trained and tested multiple times, each time leaving one subject out of the training set and using that subject's data as the test set. This is more demanding, because every subject in turn acts as an unseen test case, including subjects that are harder to classify. The accuracy of 0.93 is the average over these folds, and it gives a more robust estimate of generalization across subjects, which makes LOSO the better choice for evaluation. The higher accuracy of the single split may also be partly due to overfitting to that particular split.
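
To make the variation across subjects visible, the per-fold accuracies printed in Task 3 can be summarized with their spread as well as their mean. The sketch below uses the list from the output above; choosing the sample standard deviation (`ddof=1`) is an assumption of this example, not something the assignment prescribes.

```python
import numpy as np

# Per-subject accuracies as printed by the LOSO loop in Task 3
fold_accuracies = np.array([0.9, 0.8, 1.0, 0.9, 0.9, 1.0, 1.0, 1.0, 0.8, 1.0])

mean_acc = fold_accuracies.mean()
std_acc = fold_accuracies.std(ddof=1)   # sample standard deviation across folds

print(f"{mean_acc:.2f} +/- {std_acc:.2f}")
print("worst fold:", fold_accuracies.min(), "best fold:", fold_accuracies.max())
```

Each fold here tests on the 10 videos of one subject, so a single misclassified video shifts that fold's accuracy by 0.1. A single train-test split reports only one such number, which hides this subject-to-subject variation.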

### Question 5. In PCA, why is the `whiten` parameter better, and why does it replace the normalization?  <font color='red'>(0.25 point)</font>

### Your answer:

With `whiten=True`, PCA divides each projected component by the square root of its explained variance, so every output feature has unit variance in addition to being uncorrelated. This makes a separate normalization step after PCA unnecessary. Whether to enable it depends on the data: whitening discards the relative scale of the components, so it should be set to False when those variance differences carry information that matters for the task.
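
The effect of whitening can be checked numerically. The sketch below repeats the steps of the PCA class on synthetic correlated data (an assumption for illustration; variable names are illustrative) and compares the covariance of the projected features with and without the division by the square root of the explained variance.

```python
import numpy as np

# Synthetic correlated data (not the lab data)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

# Same steps as the PCA class: center, covariance, SVD
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
_, S, VT = np.linalg.svd(cov, hermitian=True)

k = 3
projected = Xc @ VT[:k].T               # whiten=False
whitened = projected / np.sqrt(S[:k])   # whiten=True

cov_projected = np.cov(projected, rowvar=False)
cov_whitened = np.cov(whitened, rowvar=False)

print(np.round(cov_projected, 3))  # diagonal, with the explained variances on the diagonal
print(np.round(cov_whitened, 3))   # approximately the identity matrix
```

On the training data, z-scoring the PCA output applies nearly the same scaling, because the projected columns are already zero-mean and uncorrelated; the remaining difference is the `n` versus `n - 1` convention in the variance estimate.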