## Task 4 (Bonus Task). Emotion Recognition using multiple signals with deep learning
### Objective

**This task asks you to design a deep-learning-based emotion recognition system that uses multiple signals.**
Specifically, you are asked to predict the emotion from the facial expression and audio features in Task 1.
In this task, you need to use fully connected layers, as in the bonus tasks of Exercises 1 and 2.

You can use the facial expression and audio features from Task 2. Unlike in Exercises 1 and 2, you do not need to extract features from images and audio with convolutional layers. You can feed the facial expression and audio features directly into the network.

To make use of both the facial expression and audio features, you can either combine them into a single new feature vector or train a two-path network and fuse the scores.
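The two fusion options can be sketched as follows. This is only an illustrative sketch: the class name `TwoPathFusionNet`, the feature sizes, and the choice of averaging the two score vectors are assumptions, not part of the exercise material.

```python
import torch
from torch import nn


class TwoPathFusionNet(nn.Module):
    """Hypothetical two-path network: one branch per modality, fused at score level."""

    def __init__(self, face_dim, audio_dim, n_hidden, out_dim):
        super().__init__()
        self.face_path = nn.Sequential(
            nn.Linear(face_dim, n_hidden), nn.ReLU(), nn.Linear(n_hidden, out_dim)
        )
        self.audio_path = nn.Sequential(
            nn.Linear(audio_dim, n_hidden), nn.ReLU(), nn.Linear(n_hidden, out_dim)
        )

    def forward(self, face_x, audio_x):
        # Score-level fusion (assumed here): average the class scores of both paths.
        return 0.5 * (self.face_path(face_x) + self.audio_path(audio_x))


# Illustrative random inputs: 4 samples with 15 features per modality.
face_x = torch.randn(4, 15)
audio_x = torch.randn(4, 15)

# Option 1, feature-level fusion: concatenate and feed a single network.
combined = torch.cat((face_x, audio_x), dim=1)  # shape (4, 30)

# Option 2, score-level fusion: one path per modality, scores averaged.
net = TwoPathFusionNet(face_dim=15, audio_dim=15, n_hidden=10, out_dim=2)
scores = net(face_x, audio_x)  # shape (4, 2)
```

Averaging is only one possible fusion rule; a weighted sum or a further linear layer on the concatenated scores would fit the same structure.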

In this bonus task, you need to define the network architecture using PyTorch and invoke it in your training and evaluation code.

### Suggested procedures

We provide the following procedure to help you complete this exercise, but you are free to achieve the exercise goal with your own implementation.

1. Load the facial expression and audio features. Map the features from the different modalities to equal length through CCA, following Task 2.

2. Combine the facial expression and audio features and feed the combined feature to the network. Alternatively, train on the facial expression and audio features separately and combine the scores in the network.

3. Initialize the network and perform the training.

4. Evaluate the trained model.



### Code snippet of the network architecture
You need to define your whole network structure in the `__init__()` function and the `forward()` function, following the bonus tasks in Exercises 1 and 2. For the network design, since the inputs are already extracted features, only fully connected layers are needed.


In [78]:
import torch
from torch import nn, optim
import torch.nn.functional as F
import os

import numpy as np

class FusionNet(nn.Module):

    def __init__(self, in_dim, n_hidden, out_dim):
       
        super().__init__()

        self.layer1 = nn.Linear(in_dim, n_hidden)
        self.layer2 = nn.Linear(n_hidden, out_dim)

        # Alternative definition with batch normalization:
        # self.layer1 = nn.Sequential(nn.Linear(in_dim, n_hidden), nn.BatchNorm1d(n_hidden), nn.ReLU(True))
        # self.layer2 = nn.Sequential(nn.Linear(n_hidden, out_dim))
    
    
    def forward(self, x):
        
        lay_out1 = F.relu(self.layer1(x))        
        lay_out2 = self.layer2(lay_out1)
        
        return lay_out2


### Your implementation
Please write your code below to complete the exercise


#### Load and reorganize the data

1. Load the facial and audio data from Task 2. Unlike the SVM, the deep learning model requires class labels that start at 0, so you need to remap the class labels to 0 and 1.

2. Combine the facial and audio features. Because the features have different lengths, first map them to the same length with CCA, following Task 2. Alternatively, you can train on them separately and fuse the scores in the network.
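As a small illustration of the label remapping in step 1, the sketch below turns arbitrary class labels into 0-based class indices with NumPy. The label values and the array `raw_labels` are made up for the example and do not come from the lab data.

```python
import numpy as np

# Hypothetical labels as they might come from a .mat file: shape (N, 1), values 1 and 2.
raw_labels = np.array([[1], [2], [2], [1], [2]])

# np.unique returns the sorted distinct labels; return_inverse gives, for each
# sample, the index of its label in that sorted list, i.e. a 0-based class id.
classes, zero_based = np.unique(raw_labels.ravel(), return_inverse=True)

print(classes)     # [1 2]
print(zero_based)  # [0 1 1 0 1]
```

This approach also works for more than two classes, since every distinct label simply receives the next index.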




In [6]:
from skimage import io
from skimage import transform
from skimage import color
from skimage import img_as_ubyte
import os
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import sklearn
import scipy.io as sio



mdata = sio.loadmat('lab3_data.mat')

# Facial expression training and testing data, plus the training and testing class labels

training_data = mdata['training_data']
testing_data = mdata['testing_data']
training_class =  mdata['training_class']
testing_class = mdata['testing_class']


#audio training and testing data
training_data_proso = mdata['training_data_proso']
testing_data_proso = mdata['testing_data_proso']

# Remap the class labels: class 1 becomes 0, every other class becomes 1
retraining_class = np.where(training_class == 1, 0, 1).astype(np.int64)
retesting_class = np.where(testing_class == 1, 0, 1).astype(np.int64)


In [7]:
from sklearn.cross_decomposition import CCA
import numpy as np

#Use CCA to construct the Canonical Projective Vector (CPV) and calculate CCA features
cca = CCA(n_components=15)
cca.fit(training_data, training_data_proso)

v_train_cca , a_train_cca = cca.transform(training_data ,training_data_proso)
v_test_cca , a_test_cca = cca.transform(testing_data, testing_data_proso)

# Concatenate the projected features of both modalities for the training and testing data respectively

training_CCDF = np.concatenate((v_train_cca ,a_train_cca),axis=1)
testing_CCDF = np.concatenate((v_test_cca ,a_test_cca),axis=1)

### Perform the network training

Please write your code below for the network training. Please accumulate the loss and classification accuracy for each epoch and output them at the end of the epoch.
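One way to accumulate the loss and accuracy over an epoch is sketched below for the mini-batch case. The data is synthetic, and the batch size, number of epochs, and the `history` list are assumptions chosen only for illustration.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in data (not the lab data): 50 samples, 30 features, 2 classes.
features = torch.randn(50, 30)
targets = torch.randint(0, 2, (50,))
loader = DataLoader(TensorDataset(features, targets), batch_size=10, shuffle=True)

model = nn.Sequential(nn.Linear(30, 10), nn.ReLU(), nn.Linear(10, 2))
criterion = nn.CrossEntropyLoss()  # returns the mean loss over a batch
optimizer = optim.SGD(model.parameters(), lr=0.1)

history = []  # (mean loss, accuracy in percent) per epoch
for epoch in range(5):
    model.train()
    loss_sum, correct, seen = 0.0, 0, 0
    for x, y in loader:
        optimizer.zero_grad()
        outputs = model(x)
        loss = criterion(outputs, y)
        loss.backward()
        optimizer.step()
        # Weight the batch mean by the batch size so the epoch value is a per-sample mean.
        loss_sum += loss.item() * y.size(0)
        correct += (outputs.argmax(dim=1) == y).sum().item()
        seen += y.size(0)
    history.append((loss_sum / seen, 100.0 * correct / seen))
    print(f"Epoch {epoch + 1}/5 | loss: {history[-1][0]:.3f} | accuracy: {history[-1][1]:.1f}%")
```

With a data set as small as the one in this exercise, feeding all samples as a single batch per epoch is also a reasonable choice; the accumulation then reduces to one update of each counter.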




In [39]:
# np.float was removed from recent NumPy releases; use an explicit dtype instead
inputs = torch.from_numpy(training_CCDF.astype(np.float32))
labels = torch.LongTensor(retraining_class.reshape(-1))

In [87]:
trainingEpoch=20
LEARNING_RATE=0.1

inDim = training_CCDF.shape[1]
outDim = 2
hidden = 10

batch_loss = 0

train_losses, test_losses = [], []

#Initialize the network and optimizer
model = FusionNet(inDim, hidden, outDim)

criterion = nn.CrossEntropyLoss()

# Declare the optimizer for the network
optimizer = optim.SGD(model.parameters(), lr=LEARNING_RATE)

##########
# Define the training inputs and labels here so that this cell does not depend on
# whichever cell last assigned `inputs` and `labels`
inputs = torch.from_numpy(training_CCDF.astype(np.float32))
labels = torch.LongTensor(retraining_class.reshape(-1))

# Train on the full training set as a single batch in each epoch
for epoch in range(trainingEpoch):
    model.train()  # make sure the model is in training mode
    train_len = 0
    running_loss = 0
    running_corrects = 0

    optimizer.zero_grad()
    outputs = model(inputs.float())
    _, preds = torch.max(outputs, 1)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    running_loss += loss.item()
    running_corrects += torch.sum(preds == labels)
    train_len += labels.size(0)
    # Note: criterion already returns the batch mean, so the printed loss is that mean divided by the batch size
    print(f"Epoch {epoch+1}/{trainingEpoch} | "
      f"Train loss: {running_loss/len(preds):.3f} | "
      f"Train accuracy: {100*running_corrects.double()/train_len:.3f}")



Epoch 1/20 | Train loss: 0.020 | Train accuracy: 38.000
Epoch 2/20 | Train loss: 0.017 | Train accuracy: 52.000
Epoch 3/20 | Train loss: 0.014 | Train accuracy: 64.000
Epoch 4/20 | Train loss: 0.013 | Train accuracy: 72.000
Epoch 5/20 | Train loss: 0.012 | Train accuracy: 76.000
Epoch 6/20 | Train loss: 0.011 | Train accuracy: 78.000
Epoch 7/20 | Train loss: 0.010 | Train accuracy: 78.000
Epoch 8/20 | Train loss: 0.009 | Train accuracy: 82.000
Epoch 9/20 | Train loss: 0.008 | Train accuracy: 84.000
Epoch 10/20 | Train loss: 0.008 | Train accuracy: 86.000
Epoch 11/20 | Train loss: 0.007 | Train accuracy: 88.000
Epoch 12/20 | Train loss: 0.006 | Train accuracy: 94.000
Epoch 13/20 | Train loss: 0.006 | Train accuracy: 98.000
Epoch 14/20 | Train loss: 0.005 | Train accuracy: 100.000
Epoch 15/20 | Train loss: 0.005 | Train accuracy: 100.000
Epoch 16/20 | Train loss: 0.005 | Train accuracy: 100.000
Epoch 17/20 | Train loss: 0.004 | Train accuracy: 100.000
Epoch 18/20 | Train loss: 0.004 | Train accuracy: 100.000
Epoch 19/20 | Train loss: 0.004 | Train accuracy: 100.000
Epoch 20/20 | Train loss: 0.004 | Train 

#### Conduct the network evaluation

Please write your code below to evaluate the trained model using your testing data. Please output the loss and accuracy on the testing data.
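A possible shape for the evaluation step is sketched below. The helper name `evaluate` and the demo model and data are hypothetical; the point is only to show evaluation mode and disabled gradient tracking.

```python
import torch
from torch import nn


def evaluate(model, criterion, inputs, labels):
    """Return (mean loss, accuracy in percent) without updating the model."""
    model.eval()              # use inference behaviour for BatchNorm/Dropout layers
    with torch.no_grad():     # no gradients are needed for evaluation
        outputs = model(inputs)
        loss = criterion(outputs, labels).item()
        preds = outputs.argmax(dim=1)
        accuracy = 100.0 * (preds == labels).sum().item() / labels.size(0)
    return loss, accuracy


# Usage with synthetic stand-in data (not the lab data).
torch.manual_seed(0)
demo_model = nn.Sequential(nn.Linear(30, 10), nn.ReLU(), nn.Linear(10, 2))
demo_inputs = torch.randn(8, 30)
demo_labels = torch.randint(0, 2, (8,))
demo_loss, demo_acc = evaluate(demo_model, nn.CrossEntropyLoss(), demo_inputs, demo_labels)
```

For a network made only of linear layers and ReLU, `model.eval()` does not change the outputs, but it matters as soon as batch normalization or dropout is added.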

In [85]:
# np.float was removed from recent NumPy releases; use an explicit dtype instead
inputs = torch.from_numpy(testing_CCDF.astype(np.float32))
labels = torch.LongTensor(retesting_class.reshape(-1))

In [88]:
test_loss = 0
test_corrects = 0
test_len = 0

model.eval()  # switch to evaluation mode
with torch.no_grad():  # gradients are not needed for evaluation
    outputs = model(inputs.float())
    _, preds = torch.max(outputs, 1)
    batch_loss = criterion(outputs, labels)
test_loss = batch_loss.item()
test_corrects = torch.sum(preds == labels)
test_len = labels.size(0)
test_losses.append(test_loss/len(preds))

print(f"Test loss: {test_loss/len(preds):.3f} | "
    f"Test accuracy: {100*test_corrects.double()/test_len:.3f}")

Test loss: 0.003 | Test accuracy: 100.000
