# A **Random Forest** Classifier for HoloLens 2 Gaze Features
### Training a classifier on gaze features to predict different activities

This notebook trains a `RandomForestClassifier` on gaze features and the corresponding activity labels.\
The features and labels are read from a CSV file.

## Read data from a CSV file

Note that the CSV file used here is generated by the FeatureCalculation notebook.

In [None]:
import os
import pandas as pd

# CHANGE these locations to where you stored the feature files ⬇️
recording_location = './'
all_features_csv = os.path.join(recording_location, 'Data', 'FeatureFiles', 'feature_list_all.csv')
df = pd.read_csv(all_features_csv)

In [None]:
# Print all columns of the CSV file (i.e., the features and labels)
print("The CSV file has 19 feature columns, the activity label, the duration (timespan) of the activity, and the participant ID:")
list(df.columns)

In [None]:
# Split the data by activity (these subsets are not used further below)
read_df = df[df.label == 'Reading']
inspect_df = df[df.label == 'Inspection']
search_df = df[df.label == 'Search']
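The three subsets above hold the rows of each activity. A quick way to see how many rows each activity has, and therefore whether the classes are balanced (relevant for the stratified split further down), is `value_counts`. A minimal sketch on a made-up toy frame; in the notebook, `toy_df` would be replaced by `df`:

```python
import pandas as pd

# Made-up toy frame: only the 'label' column matters here.
# In the notebook, use df (read from feature_list_all.csv) instead.
toy_df = pd.DataFrame(
    {'label': ['Reading', 'Reading', 'Inspection', 'Search', 'Search', 'Search']}
)

# Number of rows per activity
counts = toy_df['label'].value_counts()
# Share of rows per activity (the shares sum to 1.0)
shares = toy_df['label'].value_counts(normalize=True)

# The count for one activity equals the size of the corresponding subset
reading_subset = toy_df[toy_df.label == 'Reading']

print(counts)
print(shares)
```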

In [None]:
# Import the necessary packages and libraries
from sklearn.metrics import ConfusionMatrixDisplay
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MaxAbsScaler
from sklearn.ensemble import RandomForestClassifier

from IPython.display import display

## Use all columns except `label` as features and extract the labels

In [None]:
features = df.drop(columns='label')  # every column except the label
labels = df['label']
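According to the column listing printed earlier, the CSV file also contains the activity duration and the participant ID, so `features` as defined above includes those two columns as well. If only the gaze features should be used, the non-feature columns can be dropped explicitly. A minimal sketch on a hypothetical toy frame; the column names `duration` and `participant_id`, and the two feature names, are assumptions and not taken from the actual feature file (check `list(df.columns)` for the real names):

```python
import pandas as pd

# Hypothetical toy frame: all column names except 'label' are ASSUMED
# for illustration; they are not the real names in feature_list_all.csv.
toy_df = pd.DataFrame({
    'fixation_count': [12, 7, 9],           # hypothetical gaze feature
    'mean_saccade_amp': [2.1, 3.4, 1.8],    # hypothetical gaze feature
    'label': ['Reading', 'Search', 'Inspection'],
    'duration': [30.0, 30.0, 30.0],         # assumed name of the duration column
    'participant_id': ['P01', 'P02', 'P01'],  # assumed name of the ID column
})

# Drop the label and the two non-gaze columns; keep only the gaze features
non_feature_columns = ['label', 'duration', 'participant_id']
gaze_features = toy_df.drop(columns=non_feature_columns)
gaze_labels = toy_df['label']

print(list(gaze_features.columns))
```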

## This is how the features and labels look

In [None]:
print("Features:")
display(features.head(10))
print("Labels:")
display(labels.head(10))

## Let's normalize the features (i.e., each column individually)

In [None]:
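scaler = MaxAbsScaler()
scaled = scaler.fit_transform(features)  # fit and transform in one step
# Keep the original row index so the scaled rows stay aligned with the labels
scaled_features = pd.DataFrame(scaled, columns=features.columns, index=features.index)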
print("Normalized Features:")
display(scaled_features.head(10))
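`MaxAbsScaler` divides each column by that column's largest absolute value, so every feature ends up in the range [-1, 1] while signs are preserved and zeros stay zero. The toy example below (made-up numbers) checks this against the manual computation:

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Two made-up feature columns with very different ranges
X = np.array([[ 2.0, -100.0],
              [-4.0,   50.0],
              [ 1.0,   25.0]])

scaler = MaxAbsScaler()
X_scaled = scaler.fit_transform(X)

# Manual version: divide each column by its maximum absolute value
manual = X / np.abs(X).max(axis=0)

print(scaler.max_abs_)  # per-column maxima: 4 and 100
print(X_scaled)         # every value now lies in [-1, 1]
```

Note that the scaler in the notebook is fitted on all rows before the train/test split, so the test rows also contribute to the column maxima; fitting the scaler on the training split only avoids this. Random forests split on per-feature thresholds, so dividing a column by a positive constant should not materially change which splits the trees can make.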

## Let's split the data into a training set (80%) for fitting the model and a test set (20%) for evaluating it

In [None]:
feature_train, feature_test, label_train, label_test = train_test_split(
    scaled_features, labels, train_size=0.8, random_state=0, stratify=labels
)
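`train_size=0.8` keeps 80% of the rows for training and 20% for testing, `random_state=0` makes the split reproducible, and `stratify=labels` keeps the share of each activity the same in both sets. A small check on made-up, imbalanced labels:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up, imbalanced labels: 50 Reading, 30 Inspection, 20 Search
toy_labels = pd.Series(['Reading'] * 50 + ['Inspection'] * 30 + ['Search'] * 20)
toy_features = pd.DataFrame({'x': np.arange(len(toy_labels), dtype=float)})

f_train, f_test, l_train, l_test = train_test_split(
    toy_features, toy_labels, train_size=0.8, random_state=0, stratify=toy_labels
)

# With stratify, both splits keep the class shares of the full data (50/30/20 %)
print(l_train.value_counts(normalize=True))
print(l_test.value_counts(normalize=True))
```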

## Train a random forest classifier

In [None]:
rf = RandomForestClassifier(random_state=0)  # fixed seed for reproducible results
rf_fitted = rf.fit(feature_train,label_train)
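A random forest fits many decision trees on bootstrap samples of the training data. In scikit-learn, `RandomForestClassifier` predicts by averaging the class probabilities of its fitted trees (available as `estimators_`) and choosing the most probable class. The sketch below reproduces this on synthetic data from `make_classification`, which only stands in for the gaze features:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic 3-class data (not the real gaze features)
X, y = make_classification(n_samples=200, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)

rf_demo = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Each fitted tree gives class probabilities; the forest averages them ...
tree_probas = np.mean([tree.predict_proba(X) for tree in rf_demo.estimators_], axis=0)
# ... and predicts the class with the highest average probability
tree_votes = rf_demo.classes_[np.argmax(tree_probas, axis=1)]

print(np.allclose(tree_probas, rf_demo.predict_proba(X)))
print((tree_votes == rf_demo.predict(X)).all())
```

Because the trees depend on random bootstrap samples and random feature subsets, a forest created without a fixed `random_state` can give slightly different predictions and accuracy on each run.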

## Let's collect the predictions on the test data . . .

In [None]:
predictions = rf_fitted.predict(feature_test)
predictions

## . . . and have a look at the accuracy:

In [None]:
accuracy = rf.score(feature_test, label_test)
print("Accuracy:", accuracy)
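`rf.score` returns the mean accuracy, i.e. the fraction of test rows whose predicted label equals the true label, so it matches `accuracy_score(label_test, predictions)`. Since a single accuracy value can hide a poorly recognized activity, a per-activity breakdown can be more informative. A sketch on made-up labels:

```python
from sklearn.metrics import accuracy_score, classification_report

# Made-up true and predicted activity labels for six test rows
y_true = ['Reading', 'Reading', 'Inspection', 'Search', 'Search', 'Inspection']
y_pred = ['Reading', 'Search',  'Inspection', 'Search', 'Search', 'Reading']

# Accuracy is the fraction of rows whose predicted label matches the true one
acc = accuracy_score(y_true, y_pred)
print("Accuracy:", acc)  # 4 of 6 rows are correct

# Per-activity precision, recall and F1 score
print(classification_report(y_true, y_pred, zero_division=0))
```

In the notebook, the same report would be obtained with `classification_report(label_test, predictions)`.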

## Confusion Matrix

In [None]:
ConfusionMatrixDisplay.from_estimator(rf, feature_test, label_test)
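In the confusion matrix, rows are the true activities and columns the predicted ones: the diagonal counts correct predictions, and the off-diagonal cells show which activities are confused with each other. With imbalanced classes, row-normalized values are often easier to compare. A sketch with made-up labels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

labels_order = ['Inspection', 'Reading', 'Search']
# Made-up true and predicted labels for six test rows
y_true = ['Reading', 'Reading', 'Inspection', 'Search', 'Search', 'Inspection']
y_pred = ['Reading', 'Search',  'Inspection', 'Search', 'Search', 'Reading']

# Rows = true activity, columns = predicted activity (raw counts)
cm = confusion_matrix(y_true, y_pred, labels=labels_order)
# normalize='true' divides each row by its total, so the diagonal holds per-class recall
cm_norm = confusion_matrix(y_true, y_pred, labels=labels_order, normalize='true')

print(cm)
print(cm_norm)
```

The plot accepts the same option: `ConfusionMatrixDisplay.from_estimator(rf, feature_test, label_test, normalize='true')`.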
