## Force-based human intent recognition: Feature-based classification
We propose a two-stage machine learning approach to infer the human operator's intentions from force signals. First, we reduce the dimensionality of the data using an unsupervised method, the Gaussian Process Latent Variable Model (GPLVM) [1]. Then, we train a Support Vector Machine (SVM) classifier on the lower-dimensional representation of the data.
GPLVM is a non-linear dimensionality reduction method that can be viewed as a multiple-output Gaussian process (GP) regression model in which only the output data are given. The inputs are unobserved and treated as latent variables; however, instead of being integrated out, they are optimised. This makes the model more tractable, and the approach has some theoretical grounding because the model can be seen as a non-linear extension of linear probabilistic PCA (PPCA) [2]. Note that here the temporal sequences are treated simply as long feature vectors, so the temporal relation between subsequent signal measurements is not modelled explicitly.
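
To make the optimisation target concrete, the GPLVM objective from [1] can be written as follows. The symbols are notation introduced here for illustration and do not appear elsewhere in this notebook: $Y \in \mathbb{R}^{N \times D}$ holds the $N$ observed feature vectors, $X \in \mathbb{R}^{N \times Q}$ the latent points (with $Q \ll D$), $K$ is the $N \times N$ kernel matrix evaluated on $X$, and $\theta$ collects the kernel hyperparameters.

$$
\log p(Y \mid X, \theta) = -\frac{DN}{2}\log 2\pi \;-\; \frac{D}{2}\log |K| \;-\; \frac{1}{2}\,\mathrm{tr}\!\left(K^{-1} Y Y^{\top}\right),
\qquad
(\hat{X}, \hat{\theta}) = \arg\max_{X,\,\theta}\; \log p(Y \mid X, \theta)
$$

With a linear kernel, $K = \alpha X X^{\top} + \beta^{-1} I$, maximising this expression recovers the dual form of PPCA; replacing it with a non-linear kernel gives the non-linear extension described above.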

### Implementation details
The implementation of the proposed method, GPLVM+SVM, relies on two existing libraries: *GPy* for the dimensionality reduction and *scikit-learn* for the SVM classifier. For the latter, we used the default values for the parameters (the evaluation code in this notebook sets `gamma='auto'` and `probability=True` explicitly). For GPLVM, however, some parameters had to be set: the kernel, the optimiser and the maximum number of optimisation steps.
Firstly, we chose a kernel that combines the Radial Basis Function (RBF) kernel with a *bias* kernel. The RBF kernel was selected because it is one of the best-known kernels for non-linear problems. We added the *bias* kernel, which contributes a constant term to the covariance, so that the model can account for an offset of the data from the origin of coordinates.
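
As a rough illustration of what the combined kernel computes, here is a minimal, dependency-free sketch. The function names and the parameters `variance`, `lengthscale` and `bias` are hypothetical and chosen for this example only; this is not the *GPy* implementation.

```python
import math


def rbf_plus_bias(x, y, variance=1.0, lengthscale=1.0, bias=1.0):
    """k(x, y) = variance * exp(-||x - y||^2 / (2 * lengthscale^2)) + bias."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return variance * math.exp(-0.5 * sq_dist / lengthscale ** 2) + bias


def kernel_matrix(points, **params):
    """Evaluate the kernel on every pair of points (illustrative helper)."""
    return [[rbf_plus_bias(p, q, **params) for q in points] for p in points]


# Three toy latent points: two close together, one far away.
latent_points = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]]
K = kernel_matrix(latent_points, variance=2.0, lengthscale=1.0, bias=0.5)
```

For distant points the RBF term vanishes and only the constant `bias` remains, so the bias kernel sets a floor on the covariance between any two points.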

Secondly, for the optimisation process, we used one of the optimisers already implemented in *GPy*: limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS). We chose this optimiser because, unlike others included in the library, it was quite stable with respect to the number of optimisation steps needed to converge.
Finally, the maximum number of optimisation steps is set to 5000, which in most cases was enough for the optimisation to converge.

The implementation of the GPLVM algorithm supports two types of latent variable inference: with an optimisation step (GPLVM-op) and without an optimisation step (GPLVM). For our purposes, the most relevant difference between them is that inference with optimisation takes more time; in theory, however, it is the more correct procedure and is expected to give more accurate results.


#### Note
In this notebook, you can train your classifier using cross-validation without replacement and evaluate the results. If you want to train on a pre-defined dataset and generate a final model, please use the other notebook in this folder.
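
As a hedged illustration of what a split "without replacement" means, the sketch below draws a random training/test partition in which no sample appears twice. The helper `split_without_replacement` is hypothetical and is not the notebook's `pick_training_dataset_randomly_`, whose internals are not shown here.

```python
import random


def split_without_replacement(sample_ids, training_portion, seed=None):
    """Shuffle the ids once and cut the list, so the two parts are disjoint."""
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    n_train = int(round(training_portion * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]


# 20 toy sample ids, 75 % of them used for training.
train_ids, test_ids = split_without_replacement(range(20), training_portion=0.75, seed=0)
```

Repeating such a draw with a different seed in every cross-validation iteration gives independent random splits, each of which keeps training and test samples separate.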

In [None]:
# imports
import numpy as np  # used later for np.mean / np.std
import pandas as pd
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot

from scripts.utils_general import *
from scripts.utils_evaluation import *
from scripts.utils_visualization import *
from scripts.utils_data_process import *
from scripts.utils_evaluation_global_variables import *

In [None]:
# variables for processing the data
type_of_dataset = 'natural' # natural or mechanical
labels = {0:'grab', 1:'move', 2:'polish'}
dataset_folder = '../data/'

training_portion = 0.75
cross_validation_iterations = 1


# parameters for data length
number_of_measurements = 350 # size of the window
step = 1 # for subsampling


# GPLVM parameters
latent_dimensionality = 10
max_iterations = 5000

In [None]:
processed_data = read_dataset_(dataset_folder, type_of_dataset, labels)

## Evaluation
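
The evaluation cells in this section collect one result per cross-validation iteration and then report the mean and standard deviation of each metric, averaging the confusion matrix element-wise. A minimal, dependency-free sketch of that aggregation is given here; the helper `summarise` and the `'accuracy'` key are hypothetical, and only `'confusion_matrix'` is a key taken from this notebook. `statistics.pstdev` is used because `np.std` computes the population standard deviation by default.

```python
import statistics


def summarise(results):
    """Return (mean, std) dictionaries for a dict of per-iteration results."""
    mean, std = {}, {}
    for key, values in results.items():
        if key == 'confusion_matrix':
            # Gather each matrix cell across iterations, then reduce per cell.
            n_rows, n_cols = len(values[0]), len(values[0][0])
            cells = [[[m[r][c] for m in values] for c in range(n_cols)]
                     for r in range(n_rows)]
            mean[key] = [[statistics.fmean(cell) for cell in row] for row in cells]
            std[key] = [[statistics.pstdev(cell) for cell in row] for row in cells]
        else:
            mean[key] = statistics.fmean(values)
            std[key] = statistics.pstdev(values)
    return mean, std


# Toy results from two iterations of a two-class problem.
toy_results = {
    'accuracy': [0.8, 0.9],
    'confusion_matrix': [[[4, 1], [0, 5]], [[5, 0], [1, 4]]],
}
toy_mean, toy_std = summarise(toy_results)
```

With `cross_validation_iterations = 1`, as configured above, every standard deviation is zero, so more iterations are needed for the spread to be informative.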

In [None]:
# sample ids used for training in each cross-validation iteration
data_training_ids_list = list()

# NOTE: gplvm_nop_results and gplvm_op_results are not defined in this notebook;
# they are presumably provided by the star-import of
# scripts.utils_evaluation_global_variables.
for i in range(cross_validation_iterations):
    # draw a fresh random training/test split for this iteration
    data_training, data_test = pick_training_dataset_randomly_(processed_data, training_portion,
                                                              number_of_measurements, step, normalize=False)

    data_training_ids_list.append(data_training['sample_ids_training'])

    # gplvm: initialise and optimise the model on the training data
    gplvm = dualPPCA_init_and_optimization_(type_='gplvm', latent_dimensionality=latent_dimensionality,
                                            data_training=data_training, max_iterations=max_iterations)

    # latent inference for the test data, without (nop) and with (op) the optimisation step
    inferred_gplvm_nop, gplvm_nop_results = dualPPCA_inference_(gplvm, data_test, gplvm_nop_results, optimize_=False)
    inferred_gplvm_op, gplvm_op_results = dualPPCA_inference_(gplvm, data_test, gplvm_op_results, optimize_=True)

    # SVM classifier trained in the lower-dimensional space
    gplvm_classifier_init = SVC(gamma='auto', probability=True)

    gplvm_classifier = training_classifier_lower_dim_space_dualPPCA_(gplvm, gplvm_classifier_init, data_training)

    # predictions for both inference variants
    predictions_gplvm_nop = predict_with_classifier_lower_dim_space_dualPPCA_(gplvm, gplvm_classifier, data_test,
                                                                              inferred_gplvm_nop)
    predictions_gplvm_op = predict_with_classifier_lower_dim_space_dualPPCA_(gplvm, gplvm_classifier, data_test,
                                                                             inferred_gplvm_op)

    # accumulate the classification metrics of this iteration
    gplvm_nop_results = evaluate_classification_performance_dualPPCA(data_test['test_labels'],
                                                                     predictions_gplvm_nop, gplvm_nop_results)
    gplvm_op_results = evaluate_classification_performance_dualPPCA(data_test['test_labels'],
                                                                    predictions_gplvm_op, gplvm_op_results)

In [None]:
gplvm_op_mean = dict()
gplvm_op_std = dict()
gplvm_nop_mean = dict()
gplvm_nop_std = dict()

for key, value in gplvm_op_results.items():
    if key == 'confusion_matrix':
        gplvm_op_mean[key] = np.mean(value, axis=0)
        gplvm_op_std[key] = np.std(value, axis=0)
    else: 
        gplvm_op_mean[key] = np.mean(value)
        gplvm_op_std[key] = np.std(value)
for key, value in gplvm_nop_results.items():
    if key == 'confusion_matrix':
        gplvm_nop_mean[key] = np.mean(value, axis=0)
        gplvm_nop_std[key] = np.std(value, axis=0)
    else: 
        gplvm_nop_mean[key] = np.mean(value)
        gplvm_nop_std[key] = np.std(value)

In [None]:
add_to_name = '_latent_' # string containing the variable we are tuning

all_results['gplvm_opt_mean'+add_to_name+str(latent_dimensionality)] = gplvm_op_mean
all_results['gplvm_opt_std'+add_to_name+str(latent_dimensionality)] = gplvm_op_std
all_results['gplvm_no_opt_mean'+add_to_name+str(latent_dimensionality)] = gplvm_nop_mean
all_results['gplvm_no_opt_std'+add_to_name+str(latent_dimensionality)] = gplvm_nop_std

In [None]:
df_results = pd.DataFrame(all_results)
df_results.style

## References

[1] Lawrence, N. D. (2004). Gaussian process latent variable models for visualisation of high dimensional data. In Advances in neural information processing systems (pp. 329-336).

[2] Tipping, M. E., & Bishop, C. M. (1999). Probabilistic principal component analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 61(3), 611-622.