<font size="6"><strong><center> Classify Kinematic Data using Naive Bayes</center></strong></font> 

<center><span style="font-family: Arial;font-size:1.2em"></center>
    

**DESCRIPTION**

The task is to detect whether a person is running or walking from sensor data collected on an iOS device. The dataset is a single file of accelerometer and gyroscope samples recorded on an iPhone 5c in 10-second intervals at a frequency of about 5.4 samples per second.


**Objective**: Practice classification with the Naive Bayes algorithm and identify which predictors are most influential.

**Actions to Perform**:

1. Load the kinematics dataset, as measured by mobile sensors, from the file “run_or_walk.csv”.
2. List the columns in the dataset.
3. Let the target variable “y” be the activity, and assign all the columns after it to “x”.
4. Using Scikit-learn, fit a Gaussian Naive Bayes model and observe the accuracy.
5. Generate a classification report using Scikit-learn.
6. Refit the model twice: once using only the acceleration values as predictors, and once using only the gyro values.
7. Comment on the difference in accuracy between the two models.

In [2]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

In [4]:
df = pd.read_csv('run_or_walk.csv')
df.head()

| | date | time | username | wrist | activity | acceleration_x | acceleration_y | acceleration_z | gyro_x | gyro_y | gyro_z |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017-6-30 | 13:51:15:847724020 | viktor | 0 | 0 | 0.265 | -0.7814 | -0.0076 | -0.059 | 0.0325 | -2.9296 |
| 1 | 2017-6-30 | 13:51:16:246945023 | viktor | 0 | 0 | 0.6722 | -1.1233 | -0.2344 | -0.1757 | 0.0208 | 0.1269 |
| 2 | 2017-6-30 | 13:51:16:446233987 | viktor | 0 | 0 | 0.4399 | -1.4817 | 0.0722 | -0.9105 | 0.1063 | -2.4367 |
| 3 | 2017-6-30 | 13:51:16:646117985 | viktor | 0 | 0 | 0.3031 | -0.8125 | 0.0888 | 0.1199 | -0.4099 | -2.9336 |
| 4 | 2017-6-30 | 13:51:16:846738994 | viktor | 0 | 0 | 0.4814 | -0.9312 | 0.0359 | 0.0527 | 0.4379 | 2.4922 |


In [12]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 88588 entries, 0 to 88587
Data columns (total 11 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   date            88588 non-null  object 
 1   time            88588 non-null  object 
 2   username        88588 non-null  object 
 3   wrist           88588 non-null  int64  
 4   activity        88588 non-null  int64  
 5   acceleration_x  88588 non-null  float64
 6   acceleration_y  88588 non-null  float64
 7   acceleration_z  88588 non-null  float64
 8   gyro_x          88588 non-null  float64
 9   gyro_y          88588 non-null  float64
 10  gyro_z          88588 non-null  float64
dtypes: float64(6), int64(2), object(3)
memory usage: 7.4+ MB
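
Step 2 asks for an explicit column list. The following is a minimal sketch of one way to produce it, assuming the column layout reported by `df.info()` above. The two inline rows are copied from the `df.head()` output so the snippet runs without the CSV file; `df_sample`, `accel_cols`, and `gyro_cols` are illustrative names that do not appear in the original notebook.

```python
import io
import pandas as pd

# Two rows copied from the df.head() output, standing in for run_or_walk.csv.
sample = io.StringIO(
    "date,time,username,wrist,activity,"
    "acceleration_x,acceleration_y,acceleration_z,gyro_x,gyro_y,gyro_z\n"
    "2017-6-30,13:51:15:847724020,viktor,0,0,0.265,-0.7814,-0.0076,-0.059,0.0325,-2.9296\n"
    "2017-6-30,13:51:16:246945023,viktor,0,0,0.6722,-1.1233,-0.2344,-0.1757,0.0208,0.1269\n"
)
df_sample = pd.read_csv(sample)

# List every column, then group the sensor columns by their name prefix.
columns = df_sample.columns.tolist()
accel_cols = [c for c in columns if c.startswith("acceleration_")]
gyro_cols = [c for c in columns if c.startswith("gyro_")]

print(columns)
print(accel_cols)
print(gyro_cols)
```

On the full dataset the same calls would be made on `df` directly. Grouping by prefix gives name-based column lists that can replace positional `iloc` indices in the later steps.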


In [8]:
from sklearn.model_selection import train_test_split

# Features: the six sensor columns (index 5 onward); target: activity (index 4).
X, y = df.iloc[:, 5:].values, df.iloc[:, 4].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [9]:
print(X_train.shape)
print(y_test[0:10])

(70870, 6)
[1 1 1 1 0 0 1 1 1 1]


In [10]:
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train, y_train)

GaussianNB()
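
To make the fitted model less of a black box, here is a rough NumPy sketch of the idea behind Gaussian Naive Bayes: estimate a prior, a mean, and a variance per class and feature, then pick the class with the highest log-posterior. This is not scikit-learn's implementation. The function names, the `1e-9` variance floor, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate per-class priors, feature means, and feature variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Small constant keeps variances strictly positive (an assumption of this
    # sketch; scikit-learn applies its own smoothing scheme).
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + 1e-9
    return classes, priors, means, variances

def predict_gaussian_nb(model, X):
    """Pick the class with the highest log-posterior for each row of X."""
    classes, priors, means, variances = model
    diff = X[:, None, :] - means[None, :, :]          # shape (n, classes, features)
    # Log of the normal density, summed over features: the "naive" step that
    # treats features as independent given the class.
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None, :, :]
                      + diff ** 2 / variances[None, :, :]).sum(axis=2)
    log_post = np.log(priors)[None, :] + log_lik
    return classes[np.argmax(log_post, axis=1)]

# Synthetic two-class data with three features (not the run_or_walk dataset).
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0.0, 1.0, size=(200, 3)),
                    rng.normal(4.0, 1.0, size=(200, 3))])
y_demo = np.array([0] * 200 + [1] * 200)

model = fit_gaussian_nb(X_demo, y_demo)
demo_accuracy = np.mean(predict_gaussian_nb(model, X_demo) == y_demo)
print(demo_accuracy)
```

Working in log space avoids numerical underflow when the per-feature densities are multiplied together.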

In [11]:
y_pred = classifier.predict(X_test)

#### Result

In [13]:
from sklearn.metrics import accuracy_score

In [14]:
accuracy = accuracy_score(y_test, y_pred)  # signature is (y_true, y_pred)
accuracy

0.958008804605486

In [15]:
from sklearn.metrics import confusion_matrix

In [16]:
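# Note: scikit-learn's signature is confusion_matrix(y_true, y_pred). With the
# arguments in this order, rows are predicted labels and columns are true labels.
conf_mat = confusion_matrix(y_pred,y_test)
conf_mat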

array([[8828,  659],
       [  85, 8146]])

In [18]:
from sklearn.metrics import classification_report
target_names = ["Walk","Run"]
print(classification_report(y_test,y_pred,target_names=target_names))

              precision    recall  f1-score   support

        Walk       0.93      0.99      0.96      8913
         Run       0.99      0.93      0.96      8805

    accuracy                           0.96     17718
   macro avg       0.96      0.96      0.96     17718
weighted avg       0.96      0.96      0.96     17718
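
The accuracy and the per-class figures in the report can be recovered from the confusion matrix printed earlier. The sketch below assumes that matrix has predicted labels on the rows, since the notebook passes `y_pred` first, and transposes it to the usual layout with true labels on the rows before computing the metrics. The variable names are illustrative.

```python
import numpy as np

# Matrix as printed above: rows are predicted labels, columns are true labels,
# because the notebook calls confusion_matrix(y_pred, y_test).
printed = np.array([[8828, 659],
                    [85, 8146]])

# Transpose to the usual layout: rows = true labels, columns = predicted labels.
cm = printed.T

accuracy = np.trace(cm) / cm.sum()
recall = np.diag(cm) / cm.sum(axis=1)     # correct / actual count per class
precision = np.diag(cm) / cm.sum(axis=0)  # correct / predicted count per class

for name, p, r in zip(["Walk", "Run"], precision, recall):
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
print(f"accuracy={accuracy:.4f}")
```

The row sums of `cm` (8913 and 8805) match the support column of the report, which is a quick check that the orientation is right.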



### Repeat the model using only acceleration values

In [21]:
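from sklearn.model_selection import train_test_split
# Acceleration columns only (indices 5-7). Note that random_state=1 differs from
# the first split (42), so the two models are scored on different test rows.
X, y = df.iloc[:, [5,6,7]].values,df.iloc[:, 4].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)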

In [22]:
classifier.fit(X_train,y_train)
y_predict = classifier.predict(X_test)
accuracy_score(y_test, y_predict)

0.9565978101365843

In [23]:
# Stale: conf_mat was computed for the full-feature model, not recomputed from y_predict.
print(conf_mat)

[[8828  659]
 [  85 8146]]
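
The gyro-only model from step 6 is not run above. One possible way to compare feature subsets on an equal footing is sketched below: a helper (the hypothetical `subset_accuracy`) fits a fresh `GaussianNB` per subset and uses the same split seed each time. The data here is synthetic and built so that acceleration is informative and gyro is noise, purely so the snippet runs without `run_or_walk.csv`. It says nothing about how the real sensors compare.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

ACCEL_COLS = ["acceleration_x", "acceleration_y", "acceleration_z"]
GYRO_COLS = ["gyro_x", "gyro_y", "gyro_z"]

def subset_accuracy(frame, feature_cols, target_col="activity", seed=42):
    """Fit GaussianNB on the chosen columns and return held-out accuracy."""
    X = frame[feature_cols].to_numpy()
    y = frame[target_col].to_numpy()
    # Same seed for every subset, so all models are scored on the same rows.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = GaussianNB().fit(X_train, y_train)  # fresh model per subset
    return accuracy_score(y_test, model.predict(X_test))

# Synthetic stand-in data (an assumption of this sketch, not the real dataset).
rng = np.random.default_rng(0)
n = 1000
activity = rng.integers(0, 2, size=n)
demo = pd.DataFrame({"activity": activity})
for col in ACCEL_COLS:
    demo[col] = rng.normal(loc=2.0 * activity, scale=1.0)  # informative by construction
for col in GYRO_COLS:
    demo[col] = rng.normal(loc=0.0, scale=1.0, size=n)     # pure noise by construction

results = {
    "all": subset_accuracy(demo, ACCEL_COLS + GYRO_COLS),
    "acceleration": subset_accuracy(demo, ACCEL_COLS),
    "gyro": subset_accuracy(demo, GYRO_COLS),
}
print(results)
```

With the real data, passing `df` in place of `demo` would give the three accuracies needed for the comment in step 7.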
