# Hand Gesture Classification Using XGBoost
This data set was recorded with 8 sensors measuring muscle activity over 40 ms intervals, giving a total of 64 measurements for each sample. The hand gestures are classified as digits 0-3, which represent "rock", "scissors", "paper", and "okay" respectively. The objective of this project was to classify these gestures accurately from the measurements of muscle activity over time. After training and testing the model with 10-fold cross-validation, the mean accuracy was found to be approximately 93.4%.

In [56]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import xgboost as xgb


Load all the CSV files into data frames. The files do not contain column labels, so header = None is passed to read_csv. Each hand gesture is loaded into a separate data frame.

In [57]:
df0 = pd.read_csv('0.csv', header = None)
df1 = pd.read_csv('1.csv', header = None)
df2 = pd.read_csv('2.csv', header = None)
df3 = pd.read_csv('3.csv', header = None)

Concatenate all four data frames into one large data frame for analysis.

In [58]:
df = pd.concat([df0, df1, df2, df3])
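The four reads and the concatenation can also be written as one loop. The following is a minimal sketch, not the notebook's own code: load_gestures is a hypothetical helper name, ignore_index=True is an added choice, and the demo writes small synthetic files to a temporary directory rather than reading the real 0.csv to 3.csv.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def load_gestures(paths):
    """Read header-less CSV files and stack them into one data frame."""
    frames = [pd.read_csv(path, header=None) for path in paths]
    # ignore_index=True (an assumption, not used in the notebook) gives a
    # fresh 0..n-1 index instead of repeating each file's own row numbers.
    return pd.concat(frames, ignore_index=True)


# Demo on synthetic stand-in files: 5 rows per gesture, 64 readings + 1 label.
rng = np.random.default_rng(0)
with tempfile.TemporaryDirectory() as tmp:
    demo_paths = []
    for label in range(4):
        path = os.path.join(tmp, f"{label}.csv")
        readings = rng.integers(-128, 128, size=(5, 64)).astype(float)
        labels = np.full((5, 1), float(label))
        pd.DataFrame(np.hstack([readings, labels])).to_csv(
            path, header=False, index=False
        )
        demo_paths.append(path)
    demo_df = load_gestures(demo_paths)

print(demo_df.shape)  # (20, 65)
```

With ignore_index=True the combined frame would report a single continuous index, whereas the notebook's df keeps the per-file row numbers, which is why its info() output lists entries "0 to 2921".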

First, it is important to inspect the data and make sure there are not any missing values.

In [59]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 11678 entries, 0 to 2921
Data columns (total 65 columns):
0     11678 non-null float64
1     11678 non-null float64
2     11678 non-null float64
3     11678 non-null float64
4     11678 non-null float64
5     11678 non-null float64
6     11678 non-null float64
7     11678 non-null float64
8     11678 non-null float64
9     11678 non-null float64
10    11678 non-null float64
11    11678 non-null float64
12    11678 non-null float64
13    11678 non-null float64
14    11678 non-null float64
15    11678 non-null float64
16    11678 non-null float64
17    11678 non-null float64
18    11678 non-null float64
19    11678 non-null float64
20    11678 non-null float64
21    11678 non-null float64
22    11678 non-null float64
23    11678 non-null float64
24    11678 non-null float64
25    11678 non-null float64
26    11678 non-null float64
27    11678 non-null float64
28    11678 non-null float64
29    11678 non-null float64
30    11678 non-null f
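
Reading 65 rows of info() output is one way to look for gaps; counting missing cells directly is a more compact check. A minimal sketch, assuming only pandas; count_missing is a hypothetical helper name and the small frame below is made-up demo data.

```python
import numpy as np
import pandas as pd


def count_missing(frame):
    """Return the total number of NaN cells in a data frame."""
    return int(frame.isna().sum().sum())


# Made-up demo frame with two missing cells.
demo = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})
print(count_missing(demo))           # 2
print(count_missing(demo.dropna()))  # 0
```

Applied to the notebook's df, a result of 0 would agree with the non-null counts listed above.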

There do not appear to be any missing measurements in the data frame. Using describe gives an idea of the distribution of the data.

In [60]:
df.describe()

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 11678.0 | 11678.0 | 11678.0 | 11678.0 | 11678.0 | 11678.0 | 11678.0 | 11678.0 | 11678.0 | 11678.0 | ... | 11678.0 | 11678.0 | 11678.0 | 11678.0 | 11678.0 | 11678.0 | 11678.0 | 11678.0 | 11678.0 | 11678.0 |
| mean | -0.52038 | -0.726837 | -0.739082 | -0.729748 | -0.159103 | -0.55489 | -1.272649 | -0.661843 | -0.665953 | -0.654222 | ... | -0.932694 | -0.836958 | -0.740623 | -0.76871 | -0.705343 | -0.146686 | -0.374807 | -1.449306 | -0.609094 | 1.503254 |
| std | 18.566709 | 11.766878 | 4.989944 | 7.441675 | 17.850402 | 25.809528 | 25.089972 | 15.408896 | 18.123854 | 11.84126 | ... | 15.158993 | 18.204465 | 12.005206 | 4.969758 | 7.38441 | 17.841479 | 25.551082 | 25.259736 | 15.530091 | 1.117541 |
| min | -116.0 | -104.0 | -33.0 | -75.0 | -121.0 | -122.0 | -128.0 | -128.0 | -110.0 | -128.0 | ... | -128.0 | -116.0 | -128.0 | -46.0 | -74.0 | -103.0 | -128.0 | -128.0 | -124.0 | 0.0 |
| 25% | -9.0 | -4.0 | -3.0 | -4.0 | -10.0 | -15.0 | -6.0 | -8.0 | -9.0 | -4.0 | ... | -8.0 | -9.0 | -4.0 | -3.0 | -4.0 | -10.0 | -14.0 | -6.0 | -8.0 | 1.0 |
| 50% | -1.0 | -1.0 | -1.0 | -1.0 | 0.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | ... | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | 0.0 | -1.0 | -1.0 | -1.0 | 2.0 |
| 75% | 7.0 | 3.0 | 2.0 | 3.0 | 10.0 | 13.0 | 4.0 | 6.0 | 6.0 | 3.0 | ... | 6.0 | 6.0 | 3.0 | 2.0 | 3.0 | 10.0 | 13.0 | 3.0 | 6.0 | 3.0 |
| max | 111.0 | 90.0 | 34.0 | 55.0 | 92.0 | 127.0 | 127.0 | 126.0 | 127.0 | 106.0 | ... | 114.0 | 127.0 | 105.0 | 29.0 | 51.0 | 110.0 | 127.0 | 127.0 | 127.0 | 3.0 |


In [61]:
df.shape

(11678, 65)

Next, the features and target data need to be separated into two arrays.

In [62]:
features = df.iloc[:, :-1].values
target = df.iloc[:, -1].values

In [63]:
target.shape

(11678,)

In [64]:
features.shape

(11678, 64)
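
Since the model is later scored on accuracy, it can be useful to confirm that the four gesture labels are about equally represented in target. A minimal sketch; class_counts is a hypothetical helper name, and the array below is made-up demo data, not the notebook's target.

```python
import numpy as np


def class_counts(labels):
    """Map each distinct label to the number of samples carrying it."""
    values, counts = np.unique(labels, return_counts=True)
    return {int(v): int(c) for v, c in zip(values, counts)}


# Made-up labels standing in for the notebook's target array.
demo_target = np.array([0.0, 1.0, 1.0, 2.0, 3.0, 3.0, 3.0])
print(class_counts(demo_target))  # {0: 1, 1: 2, 2: 1, 3: 3}
```

Calling class_counts(target) on the real data would show how many of the 11678 samples belong to each gesture.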

Use StandardScaler so that every column has a mean of 0 and a standard deviation of 1, which puts all measurements on a similar scale.

In [65]:
from sklearn.preprocessing import StandardScaler

In [66]:
sc = StandardScaler()

In [67]:
features = sc.fit_transform(features)

Check that the fit_transform method worked on all features by showing the standard deviation of each column.

In [68]:
features.std(0)

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
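
To make the transformation concrete: StandardScaler subtracts each column's mean and divides by its population standard deviation. The sketch below reproduces that by hand on synthetic data (the array X is made up for illustration, not taken from the notebook) and compares it with the library result.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic data: three columns on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=[1.0, 10.0, 100.0], size=(200, 3))

scaled = StandardScaler().fit_transform(X)

# The same z-score computed by hand: (x - column mean) / column std.
manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(scaled, manual))             # True
print(np.allclose(scaled.mean(axis=0), 0.0))   # True
print(np.allclose(scaled.std(axis=0), 1.0))    # True
```

The same two checks, mean close to 0 and standard deviation close to 1, apply to the notebook's features array after scaling.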

Use the features and target data to train an XGBoost model. 

Initially I split the data into training and testing sets to build and verify an XGBoost classifier, but after learning more about k-fold cross-validation in sklearn I decided that cross_val_score would give a more realistic picture of how well the model generalizes to data outside the given sample.


In [69]:
model = xgb.XGBClassifier(max_depth=3, n_estimators=300, learning_rate=0.05)
model.fit(features, target)

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
       colsample_bytree=1, gamma=0, learning_rate=0.05, max_delta_step=0,
       max_depth=3, min_child_weight=1, missing=None, n_estimators=300,
       n_jobs=1, nthread=None, objective='multi:softprob', random_state=0,
       reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
       silent=True, subsample=1)

In [44]:
from sklearn.model_selection import cross_val_score

In [45]:
scores = cross_val_score(model, features, target, cv = 10, scoring = 'accuracy')



In [46]:
print(scores.mean())

0.9338781688767066
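
One possible refinement, sketched here under stated assumptions rather than as the method used above: in this notebook the scaler was fit on all samples before cross-validation, so each test fold contributed to the scaling statistics. Wrapping the scaler and the classifier in a pipeline refits the scaler on the training part of every fold. The helper name cv_accuracy, the shuffled StratifiedKFold, the synthetic data, and the use of sklearn's GradientBoostingClassifier as a stand-in model are all assumptions made to keep the example self-contained; the notebook's xgb.XGBClassifier follows the same estimator interface and could be passed in its place.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def cv_accuracy(estimator, X, y, n_splits=10, seed=0):
    """Return per-fold accuracy, refitting the scaler inside each fold."""
    pipeline = make_pipeline(StandardScaler(), estimator)
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_score(pipeline, X, y, cv=folds, scoring="accuracy")


# Synthetic 4-class data standing in for the gesture features and target.
X_demo, y_demo = make_classification(
    n_samples=400, n_features=16, n_informative=8, n_classes=4, random_state=0
)

# Stand-in model; a small ensemble keeps the demo fast.
stand_in = GradientBoostingClassifier(n_estimators=20, max_depth=3, random_state=0)
demo_scores = cv_accuracy(stand_in, X_demo, y_demo, n_splits=5)

# Report the spread across folds as well as the mean.
print(f"{demo_scores.mean():.3f} +/- {demo_scores.std():.3f}")
```

Reporting the standard deviation next to the mean shows how much the accuracy varies from fold to fold, which a single averaged number hides.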
