# Feedback Classification Analysis
*Ronald Marrero*
---

This notebook analyzes sample data from a force-sensing resistor (FSR) on an Arduino and compares how well several machine learning models classify each measurement as one of four force types: [None, Low, Medium, High].


## Data Acquisition
Measurements for the dataset come from the arduino-fsr.ino module. Every second, 5 samples are taken (2 per millisecond). Each row also records the known class of the measurement, the mean of the samples, and the normalized mean (the `TightMean` column). Each row is a comma-delimited list that is copied from the serial monitor into a CSV file.
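
As a rough illustration of the row format described above, the sketch below parses one comma-delimited line into named fields. The column order (class, five samples, mean, tight mean) follows the dataset columns shown in this notebook; the exact text that arduino-fsr.ino prints is not shown here, so the input line and the `parse_row` helper are assumptions.

```python
import csv
import io

# Column order assumed to match the dataset: class label, five raw samples,
# the mean, and the tight mean.
FIELDS = ["Class", "Sample1", "Sample2", "Sample3", "Sample4", "Sample5",
          "Mean", "TightMean"]


def parse_row(line):
    """Parse one comma-delimited serial-monitor line into a dict (hypothetical helper)."""
    values = next(csv.reader(io.StringIO(line)))
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    row = dict(zip(FIELDS, values))
    row["Class"] = int(row["Class"])      # class label is an integer 0..3
    for name in FIELDS[1:]:
        row[name] = float(row[name])      # samples and means are numeric
    return row


# Example line in the assumed format
row = parse_row("3,893,888,891,892,881,889.0,890.33")
print(row["Class"], row["Mean"], row["TightMean"])
```

Rejecting rows with the wrong number of fields is a cheap guard against lines that were truncated while being copied out of the serial monitor.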

In [15]:
import pandas as pd 
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split, cross_val_score

measurements = pd.read_csv("forces-dataset.csv")
measurements

Unnamed: 0,Class,Sample1,Sample2,Sample3,Sample4,Sample5,Mean,TightMean
0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.00
1,0,0.0,0.0,0.0,0.0,0.0,0.0,0.00
2,0,0.0,0.0,0.0,0.0,0.0,0.0,0.00
3,0,0.0,0.0,0.0,0.0,0.0,0.0,0.00
4,0,0.0,0.0,0.0,0.0,0.0,0.0,0.00
...,...,...,...,...,...,...,...,...
195,3,893.0,888.0,891.0,892.0,881.0,889.0,890.33
196,3,883.0,886.0,889.0,902.0,903.0,892.6,892.33
197,3,908.0,911.0,912.0,909.0,878.0,903.6,909.33
198,3,852.0,847.0,867.0,881.0,885.0,866.4,866.67


In [16]:
Y = measurements["Class"].to_numpy()
Y

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3])

## Feature Extraction
From the data, you can see that each row holds 5 samples along with their mean and their tight mean. The tight mean is found by dropping the minimum and maximum of the set and then taking the mean of the remaining samples.

While the model is already very accurate under 5-fold cross-validation with the plain mean, using the tight mean consistently gave slightly better accuracy on the classification model.
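
The tight mean described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code from arduino-fsr.ino (which is not shown in this notebook); the `tight_mean` name and the rounding to two decimals are assumptions. The sample values are taken from row 195 of the table above.

```python
def tight_mean(samples):
    """Mean of the samples after dropping one minimum and one maximum."""
    if len(samples) < 3:
        raise ValueError("need at least 3 samples")
    trimmed = sorted(samples)[1:-1]   # drop the smallest and largest value
    return sum(trimmed) / len(trimmed)


# Samples from row 195 of the dataset
samples = [893.0, 888.0, 891.0, 892.0, 881.0]
print(sum(samples) / len(samples))      # plain mean: 889.0
print(round(tight_mean(samples), 2))    # tight mean: 890.33
```

Both values agree with the `Mean` and `TightMean` columns for that row. Dropping the extremes makes the feature less sensitive to a single outlying reading, such as the 881.0 sample here.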

In [17]:
svclassifier = SVC(kernel="linear")

# Using just the mean from 5 samples
X_mean = measurements["Mean"].to_numpy()
X_mean = X_mean.reshape(-1, 1)  # scikit-learn expects shape (n_samples, n_features)
x_train, x_test, y_train, y_test = train_test_split(
    X_mean, Y, test_size=0.20
)
svclassifier.fit(x_train, y_train)
# cross_val_score clones and refits the classifier on each of the 5 folds,
# so the scores below do not depend on the split or the fit above
scores_mean = cross_val_score(svclassifier, X_mean, Y, cv=5)
print("SVM on mean data: %0.2f accuracy with a standard deviation of %0.2f" % (scores_mean.mean(), scores_mean.std()))

# Using the means where min and max are removed
X_tight = measurements["TightMean"].to_numpy()
X_tight = X_tight.reshape(-1, 1)
x_train, x_test, y_train, y_test = train_test_split(
    X_tight, Y, test_size=0.20
)
svclassifier.fit(x_train, y_train)
scores_tight = cross_val_score(svclassifier, X_tight, Y, cv=5)
print("SVM on tight mean data: %0.2f accuracy with a standard deviation of %0.2f" % (scores_tight.mean(), scores_tight.std()))

SVM on mean data: 0.97 accuracy with a standard deviation of 0.03
SVM on tight mean data: 0.98 accuracy with a standard deviation of 0.02
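
The cell above creates a train/test split, but the held-out rows (`x_test`, `y_test`) are never scored, and `classification_report` is imported without being used. The sketch below shows one way the held-out split could be evaluated per class. It runs on synthetic data because forces-dataset.csv is not bundled here: the cluster centers and spread are made up for illustration and are not taken from the dataset. `stratify` and `random_state` are additions that the original cell does not use.

```python
import numpy as np
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the TightMean column: four well-separated clusters,
# 50 readings per class (values are illustrative, not from the dataset).
rng = np.random.default_rng(0)
centers = [0.0, 300.0, 600.0, 900.0]
X = np.concatenate([rng.normal(c, 20.0, size=50) for c in centers]).reshape(-1, 1)
y = np.repeat(np.arange(4), 50)

# stratify keeps the class proportions equal in both splits;
# random_state makes the split reproducible.
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0
)

clf = SVC(kernel="linear")
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)

# Per-class precision, recall, and F1 on the 40 held-out rows
print(classification_report(y_test, y_pred))
```

A per-class report shows which force levels get confused with each other, which a single cross-validated accuracy figure hides.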


## Comparing Classification Models
The tests above used a linear SVM classifier. Here is a closer look at the different SVM kernels, K Nearest Neighbors, and Linear Regression:


In [19]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LinearRegression

# SVM
for kernel in ["linear","poly","rbf"]:
    clf = SVC(kernel=kernel)
    clf.fit(x_train, y_train)
    scores = cross_val_score(clf, X_tight, Y, cv=5)
    print("SVM %s on tight mean data: %0.2f accuracy with a standard deviation of %0.2f" % (kernel, scores.mean(), scores.std()))

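# K Nearest Neighbors
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(x_train, y_train)
# Cross-validate the KNN estimator itself, not `clf` (the last SVC from the loop above)
scores_knn = cross_val_score(knn, X_tight, Y, cv=5)
print("KNN on tight mean data: %0.2f accuracy with a standard deviation of %0.2f" % (scores_knn.mean(), scores_knn.std()))

SVM linear on tight mean data: 0.98 accuracy with a standard deviation of 0.02
SVM poly on tight mean data: 0.77 accuracy with a standard deviation of 0.03
SVM rbf on tight mean data: 0.94 accuracy with a standard deviation of 0.05
KNN on tight mean data: 0.94 accuracy with a standard deviation of 0.05

Note: the KNN line in the recorded output is identical to the rbf SVM line because that run passed `clf` (the rbf SVC left over from the loop) to `cross_val_score`. With `knn` passed instead, the KNN score has to be regenerated by rerunning the cell.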
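
One way to keep each score tied to the estimator that produced it is to store the models in a dictionary keyed by display name and loop over it. The sketch below is a minimal version of that pattern, again on synthetic data (the cluster centers and spread are illustrative assumptions, not values from forces-dataset.csv), so its printed scores say nothing about the real dataset.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the TightMean column (illustrative values only)
rng = np.random.default_rng(0)
centers = [0.0, 300.0, 600.0, 900.0]
X = np.concatenate([rng.normal(c, 20.0, size=50) for c in centers]).reshape(-1, 1)
y = np.repeat(np.arange(4), 50)

# Each display name maps to its own estimator, so a score cannot be
# attributed to the wrong model through a reused variable.
models = {
    "SVM linear": SVC(kernel="linear"),
    "SVM poly": SVC(kernel="poly"),
    "SVM rbf": SVC(kernel="rbf"),
    "KNN (k=5)": KNeighborsClassifier(n_neighbors=5),
}

results = {}
for name, model in models.items():
    # cross_val_score clones the estimator and fits it on each fold,
    # so no separate fit() call is needed beforehand
    scores = cross_val_score(model, X, y, cv=5)
    results[name] = (scores.mean(), scores.std())
    print("%s: %0.2f accuracy with a standard deviation of %0.2f" % (name, *results[name]))
```

Swapping `X` and `y` for `X_tight` and `Y` from the cells above would apply the same loop to the real measurements.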
