## **SVM emotion classification**

### Why Support Vector Machine?

Inspiration: [Emotional Expression Recognition using Support Vector Machines](http://cseweb.ucsd.edu/~elkan/254spring01/mdumasrep.pdf) by Melanie Dumas from the University of California.

The article reports a mean accuracy of 88.1% in emotion recognition using the SVM method. The large, high-quality [POFA](https://www.paulekman.com/product/pictures-of-facial-affect-pofa/) dataset contributed to this high accuracy.

The model created in this notebook uses the SVM parameters recommended in the article (linear kernel and one-against-one multiclass classification).


In [1]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

from skimage.feature import hog

from sklearn import svm, metrics
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score

In [None]:
from google.colab import files

uploaded = files.upload()

### Load data from the file
After the data is preprocessed with data-preprocessing.ipynb, a pickle file can be downloaded that contains the emotion label, the usage, and an array of image pixels.


In [None]:
# The file is loaded as a pickle even though its name ends in .csv.
df = pd.read_pickle("../data/icml_face_data_procc.csv")
df.head()

| Unnamed: 0 | emotion | usage | pixels |
|---|---|---|---|
| 0 | 0 | Training | [[0.27450982, 0.3137255, 0.32156864, 0.2823529... |
| 1 | 0 | Training | [[0.5921569, 0.5882353, 0.5764706, 0.60784316,... |
| 2 | 2 | Training | [[0.90588236, 0.83137256, 0.6117647, 0.6431373... |
| 3 | 4 | Training | [[0.09411765, 0.1254902, 0.14117648, 0.1176470... |
| 4 | 6 | Training | [[0.015686275, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0... |


### Reduce dataset and split features and labels
The dataset contains a 'usage' column. It is more convenient to work with labels and features only, so the 'usage' column is left out.

The features and labels are then split into X and y. In this case X is also one-dimensional, because there is only one feature column ('pixels'); each of its entries holds a whole image array.

In [None]:
X = df['pixels']
y = df['emotion']

### Feature extraction and transformation

The dataset contains 35887 pictures, each a 48x48 pixel array, which gives 2304 features per picture.
According to the [OpenCV-Python Documentation](https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_ml/py_svm/py_svm_opencv/py_svm_opencv.html#svm-opencv), a Histogram of Oriented Gradients (HOG) representation of an image is easier for the model to learn from.

The HOG features are collected in a NumPy array so that StandardScaler (standardization to zero mean and unit variance) and PCA (projection onto a smaller set of principal components) can be applied.

In [None]:
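def hog_transformation(img: np.ndarray) -> np.ndarray:
    """Return the flattened HOG descriptor of a single grayscale image."""
    # 4x4-pixel cells; all other parameters keep the skimage defaults.
    return hog(img,
               block_norm='L2-Hys',
               pixels_per_cell=(4, 4))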

In [None]:
X_hog = np.array([hog_transformation(x) for x in X])
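
To see how many features HOG produces for these images, the descriptor length can be worked out by hand. The helper below is a hypothetical sketch, not part of the notebook; it assumes the `skimage.feature.hog` defaults that `hog_transformation` leaves unchanged (9 orientations, 3x3 cells per block) and blocks that slide one cell at a time.

```python
def hog_feature_length(image_shape, pixels_per_cell,
                       cells_per_block=(3, 3), orientations=9):
    """Length of the flattened HOG descriptor for a 2-D image.

    Defaults mirror skimage.feature.hog (assumed, see lead-in).
    """
    # Number of whole cells along each axis.
    cells_y = image_shape[0] // pixels_per_cell[0]
    cells_x = image_shape[1] // pixels_per_cell[1]
    # Blocks move one cell at a time, so neighbouring blocks overlap.
    blocks_y = cells_y - cells_per_block[0] + 1
    blocks_x = cells_x - cells_per_block[1] + 1
    values_per_block = cells_per_block[0] * cells_per_block[1] * orientations
    return blocks_y * blocks_x * values_per_block


n_features = hog_feature_length((48, 48), pixels_per_cell=(4, 4))
print(n_features)  # 8100
```

Under these assumptions each 48x48 picture becomes 8100 HOG values, more than the 2304 raw pixels, which is one reason a dimensionality reduction step is useful afterwards.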

In [None]:
standard_scaler = StandardScaler()
x_stand = standard_scaler.fit_transform(X_hog)

pca = PCA(n_components=100)
x_pca = pca.fit_transform(x_stand)
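
The scaler and PCA above are fitted on all rows, including those that later end up in the test set. A possible alternative, sketched here on synthetic data, is to fit both on the training rows only and merely apply them to the test rows. The array sizes, the variable names, and the use of `make_pipeline` are assumptions for illustration, not the notebook's own method.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X_hog and y: 200 samples, 50 features (assumed sizes).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 50))
labels = rng.integers(0, 7, size=200)

train_x, test_x, train_y, test_y = train_test_split(
    features, labels, test_size=0.3, random_state=0)

# Scaling statistics and principal components are learned from training rows only.
reducer = make_pipeline(StandardScaler(), PCA(n_components=10))
train_reduced = reducer.fit_transform(train_x)
# The test rows are transformed with the already fitted steps.
test_reduced = reducer.transform(test_x)

print(train_reduced.shape, test_reduced.shape)  # (140, 10) (60, 10)
```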

### Building the model

First, the data must be divided into training and test sets. This could be done with the 'usage' column provided by the dataset author, but this notebook uses train_test_split instead.

In [42]:
X = pd.DataFrame(x_pca)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
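
For comparison, a split based on the 'usage' column could look roughly like the sketch below. It runs on a toy frame with the same column names as `df`; only the 'Training' value appears in the output shown earlier, so the 'PublicTest' value is an assumption.

```python
import pandas as pd

# Toy frame with the same columns as df; 'PublicTest' is an assumed value.
toy = pd.DataFrame({
    "emotion": [0, 2, 4, 6, 3],
    "usage": ["Training", "Training", "Training", "PublicTest", "PublicTest"],
    "pixels": [[0.1], [0.2], [0.3], [0.4], [0.5]],
})

# Rows marked 'Training' form the training set, everything else the test set.
is_train = toy["usage"] == "Training"
train_part = toy[is_train]
test_part = toy[~is_train]

print(len(train_part), len(test_part))  # 3 2
```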

The model is defined using the parameters given in the article mentioned above.

In [None]:
# Note: gamma is ignored by the linear kernel.
model = svm.SVC(kernel='linear', decision_function_shape='ovo', gamma=0.001)
model.fit(X_train, y_train)

SVC(C=1.0, break_ties=False, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovo', degree=3, gamma=0.001, kernel='linear',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)
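
One-against-one classification trains a separate binary classifier for every pair of classes. The short sketch below counts those pairs, assuming seven emotion classes encoded 0 to 6; the label set is an assumption based on the values visible in the 'emotion' column.

```python
from itertools import combinations

# Assumed label set: seven emotion classes encoded 0..6.
emotion_labels = list(range(7))

# One binary classifier per unordered pair of classes.
class_pairs = list(combinations(emotion_labels, 2))

print(len(class_pairs))  # 21
print(class_pairs[:3])   # [(0, 1), (0, 2), (0, 3)]
```

With k classes this gives k(k-1)/2 classifiers, so 21 for seven classes.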

### Validation
After training the model, its accuracy is measured.

In [45]:
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print('SVM accuracy: ', accuracy)

SVM accuracy:  0.35333333333333333
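
A single accuracy figure is easier to judge next to a trivial baseline, such as always predicting the most frequent class. The helper below is a hypothetical sketch using toy labels, not values from the dataset.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(labels)


# Toy labels, not taken from the dataset.
toy_labels = [3, 3, 3, 0, 2, 4, 6, 3, 5, 1]
print(majority_baseline_accuracy(toy_labels))  # 0.4
```

Applied to `y_test`, the same function would show how far the SVM is above a classifier that ignores the images entirely.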


### Summary
As the accuracy shows, something is wrong. Displaying samples from the training set with 'imshow' makes it clear that there are bugs in the data transformation.
The model needs considerable improvement and further research to reach the accuracy reported in the article.
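
One way to start looking for such transformation bugs is to check each image array before it reaches HOG. The function below is a hypothetical sketch. It assumes images should be 48x48 and scaled to [0, 1], as the values in the loaded data suggest, and it does not claim to identify the actual bug.

```python
import numpy as np


def check_image(img, expected_shape=(48, 48)):
    """Return a list of problems found in one image array (empty if none)."""
    problems = []
    arr = np.asarray(img, dtype=float)
    if arr.shape != expected_shape:
        problems.append(f"shape {arr.shape}, expected {expected_shape}")
    if not np.all(np.isfinite(arr)):
        problems.append("contains NaN or infinite values")
    elif arr.min() < 0.0 or arr.max() > 1.0:
        problems.append("values outside [0, 1]")
    return problems


good = np.full((48, 48), 0.5)
flat = np.full(2304, 0.5)  # same number of pixels, but not reshaped to 48x48

print(check_image(good))  # []
print(check_image(flat))  # ['shape (2304,), expected (48, 48)']
```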