# Multimodal Data Fusion - Project Work: Multi-Modal Physical Exercise Classification


In this project, real multi-modal data is studied by utilizing different techniques presented during the course. In addition, there is an optional task to try some different approaches to identify persons from the same dataset. Open MEx dataset from UCI machine learning repository is used. Idea is to apply different techniques to recognize physical exercises from wearable sensors and depth camera, user-independently.

## Author(s)
Add your information here

Name:

Student number:

## Description 

The goal of this project is to develop user-independent pre-processing and classification models to recognize 7 different physical exercises measured by accelerometer (attached to subject's thigh) and depth camera (above the subject facing downwards recording an aerial view). All the exercises were performed subject lying down on the mat. Original dataset have also another acceleration sensor and pressure-sensitive mat, but those two modalities are ommited in this project. There are totally 30 subjects in the original dataset, and in this work subset of 10 person is utilized. Detailed description of the dataset and original data can be access in [MEx dataset @ UCI machine learning repository](https://archive.ics.uci.edu/ml/datasets/MEx#). We are providing the subset of dataset in Moodle.

The project work is divided on following phases:

1. Data preparation, exploration, and visualization
2. Feature extraction and unimodal fusion for classification
3. Feature extraction and feature-level fusion for multimodal classification
4. Decision-level fusion for multimodal classification
5. Bonus task: Multimodal biometric identification of persons

where 1-4 are compulsory (max. 10 points each), and 5 is optional to get bonus points (max. 5+5 points). In each phase, you should visualize and analyse the results and document the work and findings properly by text blocks and figures between the code. <b> Nice looking </b> and <b> informative </b> notebook representing your results and analysis will be part of the grading in addition to actual implementation.

The results are validated using confusion matrices and F1 scores. F1 macro score is given as 
<br>
<br>
$
\begin{equation}
F1_{macro} = \frac{1}{N} \sum_i^N F1_i,
\end{equation}
$
<br>
<br>
where $F1_i = 2  \frac{precision_i * recall_i}{precision_i + recall_i}$, and $N$ is the number of classes.
<br>

## Learning goals 

After the project work, you should  

- be able to study real world multi-modal data
- be able to apply different data fusion techniques to real-world problem
- be able to evaluate the results
- be able to analyse the outcome
- be able to document your work properly

## Relevant lectures

Lectures 1-8

## Relevant exercises

Exercises 0-6

## Relevant chapters in course book

Chapter 1-14

## Additional Material 

* Original dataset [MEx dataset @ UCI machine learning repository](https://archive.ics.uci.edu/ml/datasets/MEx#)
* Related scientific article [MEx: Multi-modal Exercises Dataset for Human Activity Recognition](https://arxiv.org/pdf/1908.08992.pdf)

# 1. Data preparation, exploration, and visualization

<a id='task1'></a>
<div class=" alert alert-warning">
    <b>Assigment.</b> <b>Task 1.</b>

Download data from the Moodle's Project section. Get yourself familiar with the folder structure and data. You can read the data files using the function given below. Each file consists one exercise type performed by single user. Data are divided on multiple folders. Note that, in each folder there is one long sequence of single exercise, except exercise 4 which is performed two times in different ways. Those two sequences belongs to same class. Do the following subtasks to pre-analyse data examples and to prepare the training and testing data for next tasks:
<br>
<br> 
<p> <b>1.1</b> Read raw data from the files. Prepare and divide each data file to shorter sequences using windowing method. Similar to related article "MEx: Multi-modal Exercises Dataset for Human Activity Recognition", use 5 second window and 3 second overlapping between windows, producing several example sequences from one exercise file for classification purposes. Windowing is working so that starting from the beginning of each long exercise sequence, take 5 seconds of data points (from synchronized acceleration data and depth images) based on the time stamps. Next, move the window 2 seconds forward and take another 5 seconds of data. Then continue this until your are at the end of sequence. Each window will consists 500x3 matrix of acceleration data and 5x192 matrix of depth image data.</p>
<br>  
<p> <b>1.2</b> Plot few examples of prepared data for each modalities (accelometer and depth camera). Plot acceleration sensor as multi-dimensional time-series and depth camera data as 2D image. Plot 5 second acceleration sensor and depth image sequences of person 1 and 5 performing exercises 2, 5, and 6. Take the first windowed example from the long exercise sequence. </p>
<br>
<p> <b>1.3</b> Split the prepared dataset to training and testing datasets so that data of persons 1-7 are used for training and data of persons 8-10 are used for testing. In next tasks, training dataset could be further divided on (multiple) validation data folds to tune the models parameters, when needed.<br>
<br> 
Document your work, calculate the indicator statistics of training and testing datasets (number of examples, dimensions of each example) and visualize prepared examples.

</div>

In [1]:
import numpy as np
import pandas as pd
from os import listdir,getcwd
    
#Reads data from the folders

def read_data_from_folders(folder_name, ID, outputform):
    output = np.empty(outputform, dtype=object)
    
    for n in range(len(ID)):
        #check files in folder.
        currentDir = getcwd()
        filepath = "%s/%s/%02d" %(currentDir, "MEx/" + folder_name, ID[n])
        files = sorted(listdir(filepath))
        count = 0
        
        for f in files:
            file = filepath + "/" + f
            data = pd.read_csv(file, delimiter=',', header=None)
            output[n,count] = data 
            count += 1
            
    return output

folders = ['act', 'dc_0.05_0.05']

allData = np.empty((2,10,8),dtype=object)
num = 0
ids = np.arange(1,11)

for idx,fol in enumerate(folders):
    allData[idx,:,:]=read_data_from_folders(fol, ids, (10,8))
print(allData.shape)

(2, 10, 8)


In [2]:
def window_split_file(data,miliseconds=5000):
    step_size=int(data.iloc[1,0]-data.iloc[0,0])
#     print("The last timestamp",data.iloc[-1,0])
#     print("Step size",step_size)
    nr_points=int(miliseconds/step_size)
    new_data=[]
    
    stop_time=int(data.iloc[-1,0]-miliseconds)#get last one
#     print("Stop time",stop_time/2000)
    out=np.empty((int(stop_time/2000)+1,int(miliseconds/step_size),data.shape[1]-1))
#     print("Out",out.shape)
    count=0
    for start_time in range(0,stop_time,2000):
        start_index=int(start_time/step_size)
        stop_index=start_index+nr_points
#         print("Start",start_index)
#         print("End point",stop_index)
        out[count]=data.iloc[start_index:stop_index,1:]
        count+=1
    return out
def apply_windowing(data,outputform):
    outData=np.empty((2,10,8),dtype=object)
    count=0
    for sensor in data:
        print("Done")
        output = np.empty(outputform, dtype=object)
      
        for p_count,person in enumerate(sensor):
            
            for f_count,file in enumerate(person):
                
                output[p_count,f_count]=window_split_file(sensor[p_count,f_count])
        outData[count]=output
        count+=1
    return outData
windowed_data=apply_windowing(allData,(10,8))

print(windowed_data[0,0,0].shape)
# print(r[1][0,0].shape)
##########

Done
Done
(30, 500, 3)


1.2 Plot few examples of prepared data for each modalities (accelometer and depth camera). Plot acceleration sensor as multi-dimensional time-series and depth camera data as 2D image. Plot 5 second acceleration sensor and depth image sequences of person 1 and 5 performing exercises 2, 5, and 6. Take the first windowed example from the long exercise sequence.

In [3]:
#############    windowed_data[0]                              [0,1]                       [0]   
##############        0 for acc,1 for camera        person 1, exercise 2                  first frame of the exercise
import matplotlib.pyplot as plt
t=range(0,500)
f = plt.figure(figsize=(30,10))
f.add_subplot(3, 3, 1)
plt.plot(t,windowed_data[0][0,1][0], 'g')
plt.plot(t,windowed_data[0][4,1][0], 'r')

f.add_subplot(3, 3, 2)
plt.plot(t,windowed_data[0][0,4][0], 'g')
plt.plot(t,windowed_data[0][4,4][0], 'r')

f.add_subplot(3, 3, 3)
plt.plot(t,windowed_data[0][0,5][0], 'g')
plt.plot(t,windowed_data[0][4,5][0], 'r')

f.add_subplot(3, 3, 4)
# print(windowed_data[1][4,5][0,0].shape)
image=windowed_data[1][0,1][0,0].reshape((12,16))
plt.imshow(image)
f.add_subplot(3, 3, 7)
image=windowed_data[1][4,1][0,0].reshape((12,16))
plt.imshow(image)

f.add_subplot(3, 3, 5)
# print(windowed_data[1][4,5][0,0].shape)
image=windowed_data[1][0,4][0,0].reshape((12,16))
plt.imshow(image)
f.add_subplot(3, 3, 8)
image=windowed_data[1][4,4][0,0].reshape((12,16))
plt.imshow(image)

f.add_subplot(3, 3, 6)
print(windowed_data[1][0,5].shape)
image=windowed_data[1][0,5][0,0].reshape((12,16))
plt.imshow(image)
f.add_subplot(3, 3, 9)
image=windowed_data[1][4,5][0,0].reshape((12,16))
plt.imshow(image)

(29, 5, 192)


<matplotlib.image.AxesImage at 0x7f17bc730f90>

1.3 Split the prepared dataset to training and testing datasets so that data of persons 1-7 are used for training and data of persons 8-10 are used for testing. In next tasks, training dataset could be further divided on (multiple) validation data folds to tune the models parameters, when needed.


In [4]:
def create_data_set(data):
    out_data=np.empty((0,data[0,0].shape[1],data[0,0].shape[2]))
    out_labels=np.empty(0,dtype=int)
    for idx,person in enumerate(data):
        for idy,exercise in enumerate(person):
#             print(exercise.shape)
            out_data=np.concatenate((out_data,exercise))
            labels=np.full(exercise.shape[0],idy+1)### create labels array with the id of the exercise
            out_labels=np.concatenate((out_labels,labels))
    print("Out shape:",out_data.shape," Label shape:",out_labels.shape)
    return out_data, out_labels
training_data_slice=windowed_data[:,:7,:]
testing_data_slice=windowed_data[:,7:,:]

training_data_acc=training_data_slice[0,:,:]

x_train_acc,y_train_acc=create_data_set(training_data_slice[0])
x_test_acc,y_test_acc=create_data_set(testing_data_slice[0])
x_train_cam,y_train_cam=create_data_set(training_data_slice[1])
x_test_cam,y_test_cam=create_data_set(testing_data_slice[1])
# print(x_train_acc,"====",x_test_acc)
# print("!!!!!!!!!!!!!!!!!!!")
# print(x_train_cam,"====",x_test_cam)


Out shape: (1487, 500, 3)  Label shape: (1487,)
Out shape: (598, 500, 3)  Label shape: (598,)
Out shape: (1486, 5, 192)  Label shape: (1486,)
Out shape: (598, 5, 192)  Label shape: (598,)


# 2. Feature extraction and fusion for unimodal classification

<a id='task2'></a>
<div class=" alert alert-warning">
    <b>Assigment.</b> <b>Task 2.</b>

Use the training dataset prepared in task 1. to build models based on the combination of principal component analysis (PCA), linear discriminant analysis (LDA), and nearest neighbour (NN) classifier for each modality separately and evaluate the model on test dataset. Do the subtasks given as
<br>
<br>
<p> <b>2.1</b> Calculate PCA and LDA transformations to reduce the dimensionality of accelerometer data (e.g., using scikit-learn implementations). Before transformations downsample data from 100 Hz to 25 Hz (using scipy.signal.resample) to get 125x3 matrix of data for each 5 sec window. You should also standardize the values to zero mean and unit variance before the transformations. Using training dataset, fit PCA with 5-dimensional subspace (i.e., choosing the 5 largest principal components) and fit LDA with 5-dimensional subspace. Transform both train and test examples to this low-dimensional feature representation. Concatenate each sequence to single vector size of 3x(5+5). Perform the fusion of PCA and LDA similar manner as presented in Lecture 3 (pages-19-20) using NN method. Evaluate the performance on testset. Show confusion matrix and F1 scores of the results. </p>
<br>
<p> <b>2.2</b> Use PCA and LDA transformations to reduce the dimensionality of depth images. You should also standardize the values to zero mean and unit variance before the transformations. Fit PCA and LDA for all training images (12x16, 192-dimensional in vectorized form) by choosing 5-dimensional subspace for both PCA and LDA. Transform both train and test examples to this low-dimensional feature representation. Concatenate each sequence to single vector size of 5x1x(5+5). Similar to task 2.1, do the PCA and LDA fusion using NN and evaluate the performance on testset. Show confusion matrix and F1 scores of the results. </p>
<br> 
Document your work, evaluate the results, and analyse the outcomes in each subtasks 2.1-2.2.
    
</div>

In [5]:

from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from scipy import signal
def downsample(data):
    
    m=signal.resample(data,125)
   
    return m
def standardise(data):
    mean=np.mean(data,axis=0)
    std=np.std(data,axis=0)
    print(mean.shape)
    print(std.shape)
    res=(data-mean)/std
    print("After standar",res.shape)
    
    return res
def preprocess(data,resample=True):
    aux=standardise(data)
    if(resample):
        aux=np.apply_along_axis(downsample, 1, aux)
    return aux

In [6]:


x_train_acc_p=preprocess (x_train_acc)
x_test_acc_p=preprocess (x_test_acc)


pca_acc_1 = PCA(n_components=5)
pca_acc_2 = PCA(n_components=5)
pca_acc_3 = PCA(n_components=5)

lda_acc_1=LinearDiscriminantAnalysis(n_components=5)
lda_acc_2=LinearDiscriminantAnalysis(n_components=5)
lda_acc_3=LinearDiscriminantAnalysis(n_components=5)

x_acc_1_train_pca=pca_acc_1.fit_transform(x_train_acc_p[:,:,0])
x_acc_2_train_pca=pca_acc_2.fit_transform(x_train_acc_p[:,:,1])
x_acc_3_train_pca=pca_acc_3.fit_transform(x_train_acc_p[:,:,2])

x_acc_1_test_pca=pca_acc_1.transform(x_test_acc_p[:,:,0])
x_acc_2_test_pca=pca_acc_2.transform(x_test_acc_p[:,:,1])
x_acc_3_test_pca=pca_acc_3.transform(x_test_acc_p[:,:,2])

x_acc_1_train_lda=lda_acc_1.fit_transform(x_train_acc_p[:,:,0],y_train_acc)
x_acc_2_train_lda=lda_acc_2.fit_transform(x_train_acc_p[:,:,1],y_train_acc)
x_acc_3_train_lda=lda_acc_3.fit_transform(x_train_acc_p[:,:,2],y_train_acc)

x_acc_1_test_lda=lda_acc_1.transform(x_test_acc_p[:,:,0])
x_acc_2_test_lda=lda_acc_2.transform(x_test_acc_p[:,:,1])
x_acc_3_test_lda=lda_acc_3.transform(x_test_acc_p[:,:,2])

pca_train_acc=np.concatenate((x_acc_1_train_pca,x_acc_2_train_pca,x_acc_3_train_pca),axis=1)
lda_train_acc=np.concatenate((x_acc_1_train_lda,x_acc_2_train_lda,x_acc_3_train_lda),axis=1)

pca_test_acc=np.concatenate((x_acc_1_test_pca,x_acc_2_test_pca,x_acc_3_test_pca),axis=1)
lda_test_acc=np.concatenate((x_acc_1_test_lda,x_acc_2_test_lda,x_acc_3_test_lda),axis=1)



# pca_test_acc=np.concatenate((x_acc_1_test,x_acc_2_test,x_acc_3_test),axis=1)


# x_test_pca_acc=pca_acc.transform(x_test_acc.reshape(-1,3))

print(pca_train_acc.shape)
print(lda_train_acc.shape)

print(pca_test_acc.shape)
print(lda_test_acc.shape)
pca_lda_train_acc=np.concatenate((pca_train_acc,lda_train_acc),axis=1)
pca_lda_test_acc=np.concatenate((pca_test_acc,lda_test_acc),axis=1)
print(pca_lda_train_acc.shape)
print(pca_lda_test_acc.shape)
# print(x_train_acc[0,0])


(500, 3)
(500, 3)
After standar (1487, 500, 3)
(500, 3)
(500, 3)
After standar (598, 500, 3)
(1487, 15)
(1487, 15)
(598, 15)
(598, 15)
(1487, 30)
(598, 30)


2.1 Calculate PCA and LDA transformations to reduce the dimensionality of accelerometer data (e.g., using scikit-learn implementations). Before transformations downsample data from 100 Hz to 25 Hz (using scipy.signal.resample) to get 125x3 matrix of data for each 5 sec window. You should also standardize the values to zero mean and unit variance before the transformations. Using training dataset, fit PCA with 5-dimensional subspace (i.e., choosing the 5 largest principal components) and fit LDA with 5-dimensional subspace. Transform both train and test examples to this low-dimensional feature representation. Concatenate each sequence to single vector size of 3x(5+5). Perform the fusion of PCA and LDA similar manner as presented in Lecture 3 (pages-19-20) using NN method. Evaluate the performance on testset. Show confusion matrix and F1 scores of the results.

In [7]:
from sklearn.metrics import accuracy_score,f1_score,confusion_matrix
def PCA_LDA_fusion(train,test,y_train):
    predicted_labels=[]
    for row in test:
#         print(train.shape)
#         print(train[:,0:3].shape)
        d=train[:,0:3]-row[0:3]
        D=train[:,3:6]-row[3:6]
#         print(d.shape)
        d=np.sum(d**2,axis=1)
        D=np.sum(D**2,axis=1)
        d=(d-np.amin(d))/(np.amax(d)-np.amin(d))
        D=(D-np.amin(D))/(np.amax(D)-np.amin(D))
        F=(d+D)/2
#         print(F)
        label_index=np.argmin(F)
        predicted_label=int(y_train[label_index])
        predicted_labels.append(predicted_label)
    return predicted_labels
        


In [8]:
        
predicted=PCA_LDA_fusion(pca_lda_train_acc,pca_lda_test_acc,y_train_acc)
# print(predicted)
# print(y_test_acc)
print("ACC:",accuracy_score(y_test_acc,predicted))
print("F1 score:",f1_score(y_test_acc,predicted,average=None))
confusion_matrix(y_test_acc,predicted)
# print(predicted)

ACC: 0.4782608695652174
F1 score: [0.20967742 0.4526749  0.36708861 0.72340426 0.51612903 0.69444444
 0.6056338  0.40174672]


array([[13,  7, 37,  0,  0,  0,  0, 28],
       [ 0, 55,  0,  0,  0,  0,  5, 24],
       [ 0, 28, 29,  0,  0,  0,  0, 29],
       [ 0,  0,  0, 34, 13,  0,  0,  0],
       [ 0,  0,  0, 13, 16,  3,  0,  1],
       [11, 29,  0,  0,  0, 50,  0,  0],
       [ 0, 29,  0,  0,  0,  0, 43, 16],
       [15, 11,  6,  0,  0,  1,  6, 46]])

In [9]:

print(x_train_cam.shape)

x_train_cam_p=preprocess(x_train_cam,resample=False)
x_test_cam_p=preprocess(x_test_cam,resample=False)
print("After processing",x_train_cam_p.shape)
pca_cam_1 = PCA(n_components=5)
pca_cam_2 = PCA(n_components=5)
pca_cam_3 = PCA(n_components=5)
pca_cam_4 = PCA(n_components=5)
pca_cam_5 = PCA(n_components=5)


lda_cam_1=LinearDiscriminantAnalysis(n_components=5)
lda_cam_2=LinearDiscriminantAnalysis(n_components=5)
lda_cam_3=LinearDiscriminantAnalysis(n_components=5)
lda_cam_4=LinearDiscriminantAnalysis(n_components=5)
lda_cam_5=LinearDiscriminantAnalysis(n_components=5)

x_cam_1_train_pca=pca_cam_1.fit_transform(x_train_cam_p[:,0,:])
x_cam_2_train_pca=pca_cam_2.fit_transform(x_train_cam_p[:,1,:])
x_cam_3_train_pca=pca_cam_3.fit_transform(x_train_cam_p[:,2,:])
x_cam_4_train_pca=pca_cam_4.fit_transform(x_train_cam_p[:,3,:])
x_cam_5_train_pca=pca_cam_5.fit_transform(x_train_cam_p[:,4,:])


x_cam_1_test_pca=pca_cam_1.transform(x_test_cam_p[:,0,:])
x_cam_2_test_pca=pca_cam_2.transform(x_test_cam_p[:,1,:])
x_cam_3_test_pca=pca_cam_3.transform(x_test_cam_p[:,2,:])
x_cam_4_test_pca=pca_cam_4.transform(x_test_cam_p[:,3,:])
x_cam_5_test_pca=pca_cam_5.transform(x_test_cam_p[:,4,:])

x_cam_1_train_lda=lda_cam_1.fit_transform(x_train_cam_p[:,0,:],y_train_cam)
x_cam_2_train_lda=lda_cam_2.fit_transform(x_train_cam_p[:,1,:],y_train_cam)
x_cam_3_train_lda=lda_cam_3.fit_transform(x_train_cam_p[:,2,:],y_train_cam)
x_cam_4_train_lda=lda_cam_4.fit_transform(x_train_cam_p[:,3,:],y_train_cam)
x_cam_5_train_lda=lda_cam_5.fit_transform(x_train_cam_p[:,4,:],y_train_cam)

x_cam_1_test_lda=lda_cam_1.transform(x_test_cam_p[:,0,:])
x_cam_2_test_lda=lda_cam_2.transform(x_test_cam_p[:,1,:])
x_cam_3_test_lda=lda_cam_3.transform(x_test_cam_p[:,2,:])
x_cam_4_test_lda=lda_cam_4.transform(x_test_cam_p[:,3,:])
x_cam_5_test_lda=lda_cam_5.transform(x_test_cam_p[:,4,:])

pca_train_cam=np.concatenate((x_cam_1_train_pca,x_cam_2_train_pca,x_cam_3_train_pca,x_cam_4_train_pca,x_cam_5_train_pca),axis=1)
lda_train_cam=np.concatenate((x_cam_1_train_lda,x_cam_2_train_lda,x_cam_3_train_lda,x_cam_4_train_lda,x_cam_5_train_lda),axis=1)

pca_test_cam=np.concatenate((x_cam_1_test_pca,x_cam_2_test_pca,x_cam_3_test_pca,x_cam_4_test_pca,x_cam_5_test_pca),axis=1)
lda_test_cam=np.concatenate((x_cam_1_test_lda,x_cam_2_test_lda,x_cam_3_test_lda,x_cam_4_test_lda,x_cam_5_test_lda),axis=1)

print(pca_train_cam.shape)
print(lda_train_cam.shape)

print(pca_test_cam.shape)
print(lda_test_cam.shape)
pca_lda_train_cam=np.concatenate((pca_train_cam,lda_train_cam),axis=1)
pca_lda_test_cam=np.concatenate((pca_test_cam,pca_test_cam),axis=1)
print(pca_lda_train_cam)
print(pca_lda_test_cam.shape)


(1486, 5, 192)
(5, 192)
(5, 192)
After standar (1486, 5, 192)
(5, 192)
(5, 192)
After standar (598, 5, 192)
After processing (1486, 5, 192)
(1486, 25)
(1486, 25)
(598, 25)
(598, 25)
[[-3.33369095  1.25878072  2.6474211  ... -3.85231192  2.6240902
  -1.62688376]
 [-2.74286065  2.79827172 -1.8444897  ... -2.22968309  0.83410659
  -0.72416775]
 [-2.34070718  2.84802888  0.50248912 ... -3.5223537   2.05621398
  -2.12087323]
 ...
 [ 7.61241439 -5.46706471 -0.09681949 ... -0.53231533  1.96970874
  -0.67511362]
 [ 9.03406911 -5.68977299 -0.91177264 ...  0.30374386  1.83254539
  -0.96454372]
 [ 8.97726327 -5.31921837 -1.81708139 ... -0.13217389  1.40475224
   0.11161582]]
(598, 50)


In [10]:
predicted=PCA_LDA_fusion(pca_lda_train_cam,pca_lda_test_cam,y_train_cam)
print("ACC:",accuracy_score(y_test_cam,predicted))
print("F1 score:",f1_score(y_test_cam,predicted,average=None))
confusion_matrix(y_test_cam,predicted)
print(predicted)

ACC: 0.5317725752508361
F1 score: [0.53140097 0.48951049 0.64285714 0.3877551  0.14583333 0.83229814
 0.5505618  0.44137931]
[1, 3, 3, 1, 1, 3, 5, 5, 4, 1, 1, 4, 3, 1, 1, 1, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1, 2, 3, 1, 3, 3, 1, 5, 2, 1, 5, 1, 1, 5, 1, 1, 3, 1, 5, 2, 1, 5, 1, 1, 5, 1, 1, 3, 3, 3, 1, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 4, 4, 5, 5, 1, 4, 4, 5, 4, 4, 6, 4, 1, 1, 5, 1, 4, 4, 4, 1, 1, 4, 5, 4, 5, 4, 4, 4, 5, 5, 6, 6, 4, 4, 6, 6, 4, 6, 4, 6, 6, 6, 5, 6, 6, 6, 6, 6, 4, 6, 6, 6, 6, 6, 6, 6, 2, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 5, 7, 5, 5, 7, 7, 5, 6, 5, 7, 5, 5, 5, 5, 2, 7, 5, 5, 7, 7, 5, 7, 7, 7, 7, 5, 7, 7, 7, 7, 5, 7, 7, 7, 7, 5, 5, 8, 7, 6, 7, 5, 5, 7, 2, 5, 5, 5, 1, 1, 5, 1, 1, 5, 5, 5, 5, 5, 1, 5, 5, 5, 5, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 3, 3, 1, 3, 2, 2, 2, 1, 2, 2, 2, 2, 1, 3, 2, 3, 2, 3, 1, 2, 2, 2, 5, 2, 3, 3, 3, 2, 3, 3, 1, 2, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 3, 3, 1, 1, 3, 1, 3, 1, 1, 3, 3, 3, 3, 1, 3, 4, 1, 5, 5, 1, 5

# 3. Feature extraction and feature-level fusion for multimodal classification

<a id='task3'></a>
<div class=" alert alert-warning">
    <b>Assigment.</b> <b>Task 3.</b>

Prepare new feature sets for each modality and combine them to single feature representation. Compare two classifiers from scikit-learn. Train classifiers using joint feature presentation. Evaluate and compare the result using testing dataset. Do the subtasks given as
<br>   
<br> 
<p> <b>3.1</b> Similar to task 2.1, calculate PCA for accelerometer, but choose now the 10 largest principal components as 10-dim feature vector for each window. In addition, for each window calculate mean and standard deviation of each three acc channels as statistical features, resulting 6-dimensional vector. Combine these to 36-dimensional final feature vector.</p>
<br>  
<p> <b>3.2</b> Similar to task 2.2, calculate the PCA for depth images using same setup, but now choose the 10 largest principal components as feature vector. Concatenate the image sequence forming 50-dimensional feature vector from each windowed example.</p>
<br> 
<p> <b>3.3</b> Form a joint feature presentation of features extracted in 3.1 and 3.2, resulting 86-dimensional feature vector for each example. Normalize data between 0-1 using the training dataset. Use support vector machine (SVM) with RBF-kernel and Gaussian naiveBayes classifier (use default parameter values for both classifiers). Train the classifiers and evaluate and compare classifiers on testset using confusion matrices and F1 scores.</p>
<br> 
Document your work, evaluate the results, and analyse the outcomes in each subtasks 3.1-3.3.
    
</div>

In [18]:

x_train_acc_p=preprocess (x_train_acc)
x_test_acc_p=preprocess (x_test_acc)
x_train_cam_p=preprocess(x_train_cam,resample=False)
x_test_cam_p=preprocess(x_test_cam,resample=False)

pca_acc_1 = PCA(n_components=10)
pca_acc_2 = PCA(n_components=10)
pca_acc_3 = PCA(n_components=10)

x_acc_1_train_pca=pca_acc_1.fit_transform(x_train_acc_p[:,:,0])
x_acc_2_train_pca=pca_acc_2.fit_transform(x_train_acc_p[:,:,1])
x_acc_3_train_pca=pca_acc_3.fit_transform(x_train_acc_p[:,:,2])

x_acc_1_test_pca=pca_acc_1.transform(x_test_acc_p[:,:,0])
x_acc_2_test_pca=pca_acc_2.transform(x_test_acc_p[:,:,1])
x_acc_3_test_pca=pca_acc_3.transform(x_test_acc_p[:,:,2])

pca_mean=np.mean(x_train_acc_p,axis=1)
pca_std=np.std(x_train_acc_p,axis=1)
print("pca_mean:",pca_mean.shape)
print("pca_std:",pca_std.shape)

pca_train_acc=np.concatenate((x_acc_1_train_pca,x_acc_2_train_pca,x_acc_3_train_pca),axis=1)
pca_test_acc=np.concatenate((x_acc_1_test_pca,x_acc_2_test_pca,x_acc_3_test_pca),axis=1)





# pca_test_acc=np.concatenate((x_acc_1_test,x_acc_2_test,x_acc_3_test),axis=1)


# x_test_pca_acc=pca_acc.transform(x_test_acc.reshape(-1,3))

print("pca_train_acc:",pca_train_acc.shape)

print("pca_test_acc:",pca_test_acc.shape)



(500, 3)
(500, 3)
After standar (1487, 500, 3)
(500, 3)
(500, 3)
After standar (598, 500, 3)
(5, 192)
(5, 192)
After standar (1486, 5, 192)
(5, 192)
(5, 192)
After standar (598, 5, 192)
pca_mean: (1487, 3)
pca_std: (1487, 3)
pca_train_acc: (1487, 30)
pca_test_acc: (598, 30)


# 4. Decision-level fusion for multimodal classification

<a id='task4'></a>
<div class=" alert alert-warning">
    <b>Assigment.</b> <b>Task 4.</b>

Use features calculated for each modality in task 3. Choose base classifier for each modality from scikit-learn. Train classifiers for each modality feature presentations separately and combine the outputs in decision level. Evaluate and compare the result on testing dataset. Do the subtasks given as
<br>
<br> 
<p> <b>4.1</b> Use base classifiers of support vector machine (SVM) with RBF-kernel and AdaBoost classifier (with random_state=0). 
Normalize data between 0-1 using the training dataset. Train the base classifiers by tuning the model parameters (<i>C</i> parameter and RBF-kernel <i>gamma</i> in SVM as well as <i>n_estimators</i> and <i>learning_rate</i> in Adaboost) using 10-fold cross-validation on training dataset to find optimal set of parameters (hint: use GridSearchCV from scikit-learn). For grid search use the following values $C = [0.1, 1.0, 10.0, 100.0]$, $gamma=[0.1, 0.25, 0.5, 0.75, 1.0, 2.0]$, $n\_estimators = [50, 100, 500, 1000]$, and $learning\_rate = [0.1, 0.25, 0.5, 0.75,1.0]$. Choose the best parameters and train the classifiers for each modality on whole training dataset. Is there a possibility that classifiers will overfit to training data using this parameter selection strategy? If so, why? </p>
<br>
<p> <b>4.2</b> Predict probabilistic outputs of each trained classifier for both modalities using the test set. </p>
<br>
<p> <b>4.3</b> Combine the probabilistic outputs of different modalities by fixed classification rules: max, min, prod, and sum. Evaluate, compare, and analyse the final combined results using confusion matrices and F1 scores. Show results for each base classifier combinations (i.e., $SVM_{acc}+SVM_{depth}$, $AdaBoost_{acc}+AdaBoost_{depth}$, $SVM_{acc}+AdaBoost_{depth}$, $AdaBoost_{acc}+SVM_{depth}$)</p>
<br>
Document your work, evaluate the results, and analyse the outcomes in each subtasks 4.1-4.3.
    
</div>

# 5. Bonus task: Multimodal biometric identification of persons (optional)

<a id='task5'></a>
<div class=" alert alert-warning">
    <b>Assigment.</b> <b>Task 5.</b>

Can you build a classifier that recognizes the person who is performing the exercise? Use same 10 person dataset and split it so that first 25% of each long exercise sequence is used for training and rest 75% of each sequence is used for testing the classifier. Use same 5 second windowing with 3 seconds overlap to prepare the examples. Note that, now the person identity is the class label instead of exercise type. Max. 10 points are given but you can earn points from partial solution, as well.
<br> 
<br> 
<p> <b>5.1</b> Build a classifier to identify persons based on the features and one of the models given in task 4 (max. 5 points).</p>
<br> 
<p> <b>5.2</b> Can you build your own solution (using new features, new classification model or different fusion approaches) to beat the approach in Task 5.1 ? (max. 5 points) </p>
<br>  
Document your work. Evaluate and compare the results using confusion matrix and F1 score.

</div>