# Predict Human Activity using Smartphone Sensor Data

By Niladri Ghosh

## 1. Identify Problem

“The advance of technology is based on making it fit in so that you don't really even notice it, so it's part of everyday life.” This quote from Bill Gates captures how technology keeps making daily life easier. A side effect is that we move less, and a sedentary lifestyle raises the risk of obesity and heart disease. Fitness trackers, now built into smartwatches and phones, let us monitor our daily routine: steps walked, running, sleep cycles, and so on. To support a healthy lifestyle, these devices first need to identify what the body is actually doing.


### 1.1 Expected Outcome

The dataset from UCI Machine Learning provides accelerometer and gyroscope readings. Since this is a __classification problem__, our __final output is the activity predicted by our model__.


### 1.2 Objective

We will work with the Human Activity Recognition with Smartphones dataset, which was built from recordings of study participants performing activities of daily living (ADL) while carrying a smartphone with embedded inertial sensors.

> The objective is to classify each observation into one of six activities: walking, walking upstairs, walking downstairs, sitting, standing, or laying.


### 1.3 Identify Data Sources

The [Human Activity Recognition with Smartphones](https://www.kaggle.com/uciml/human-activity-recognition-with-smartphones) dataset is available on Kaggle from [UCI Machine Learning](https://www.kaggle.com/uciml). The dataset consists of:

* Triaxial acceleration from the accelerometer (total acceleration) and the estimated body acceleration.
* Triaxial angular velocity from the gyroscope.
* 561-feature vector with time and frequency domain variables.
* Activity label.

__Getting Started : Load libraries and set options__

In [1]:
# import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

__Load Dataset__

First, load the supplied CSV file using the pandas `read_csv` function.

In [3]:
# load dataset
df_primary = pd.read_csv("data/smartphone-data.csv", delimiter=",")

In [4]:
# create copy of dataframe
df = df_primary.copy()
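Working on a copy keeps the originally loaded frame untouched, so it can serve as a reference later. The sketch below illustrates this on a small made-up frame; the names `original` and `working` and the values are hypothetical, not taken from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for the loaded dataset.
original = pd.DataFrame({"x": [1.0, 2.0, 3.0]})

# DataFrame.copy() makes a deep copy by default, so the two frames
# do not share data.
working = original.copy()

# Modify the copy only.
working.loc[0, "x"] = -99.0

print(original.loc[0, "x"])  # value in the original frame
print(working.loc[0, "x"])   # value in the modified copy
```

The original frame keeps its first value while the copy carries the change.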

__Inspecting the data__

The first step is to visually inspect the new dataset. There are multiple ways to achieve this:
* The easiest way is to fetch the first 5 rows using `DataFrame.head()`, here `df.head()`.
* Alternatively, we can fetch the last 5 rows using `DataFrame.tail()`, here `df.tail()`.

__NOTE:__ 

Both methods accept an optional integer argument inside the parentheses that sets how many rows to display, for example `df.head(10)`.
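As a minimal sketch of how the row-count argument behaves, the example below uses a small made-up frame (`toy` and its values are hypothetical) rather than the real dataset:

```python
import pandas as pd

# Hypothetical 20-row frame standing in for the real dataset.
toy = pd.DataFrame({"feature": range(20), "Activity": ["STANDING"] * 20})

first_three = toy.head(3)   # first 3 rows
last_two = toy.tail(2)      # last 2 rows
default_head = toy.head()   # no argument: first 5 rows

print(first_three)
print(last_two)
print(len(default_head))
```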

In [5]:
df.head(10)

| | tBodyAcc-mean()-X | tBodyAcc-mean()-Y | tBodyAcc-mean()-Z | tBodyAcc-std()-X | tBodyAcc-std()-Y | tBodyAcc-std()-Z | tBodyAcc-mad()-X | tBodyAcc-mad()-Y | tBodyAcc-mad()-Z | tBodyAcc-max()-X | ... | fBodyBodyGyroJerkMag-skewness() | fBodyBodyGyroJerkMag-kurtosis() | angle(tBodyAccMean,gravity) | angle(tBodyAccJerkMean),gravityMean) | angle(tBodyGyroMean,gravityMean) | angle(tBodyGyroJerkMean,gravityMean) | angle(X,gravityMean) | angle(Y,gravityMean) | angle(Z,gravityMean) | Activity |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0.288585 | -0.020294 | -0.132905 | -0.995279 | -0.983111 | -0.913526 | -0.995112 | -0.983185 | -0.923527 | -0.934724 | ... | -0.298676 | -0.710304 | -0.112754 | 0.0304 | -0.464761 | -0.018446 | -0.841247 | 0.179941 | -0.058627 | STANDING |
| 1 | 0.278419 | -0.016411 | -0.12352 | -0.998245 | -0.9753 | -0.960322 | -0.998807 | -0.974914 | -0.957686 | -0.943068 | ... | -0.595051 | -0.861499 | 0.053477 | -0.007435 | -0.732626 | 0.703511 | -0.844788 | 0.180289 | -0.054317 | STANDING |
| 2 | 0.279653 | -0.019467 | -0.113462 | -0.99538 | -0.967187 | -0.978944 | -0.99652 | -0.963668 | -0.977469 | -0.938692 | ... | -0.390748 | -0.760104 | -0.118559 | 0.177899 | 0.100699 | 0.808529 | -0.848933 | 0.180637 | -0.049118 | STANDING |
| 3 | 0.279174 | -0.026201 | -0.123283 | -0.996091 | -0.983403 | -0.990675 | -0.997099 | -0.98275 | -0.989302 | -0.938692 | ... | -0.11729 | -0.482845 | -0.036788 | -0.012892 | 0.640011 | -0.485366 | -0.848649 | 0.181935 | -0.047663 | STANDING |
| 4 | 0.276629 | -0.01657 | -0.115362 | -0.998139 | -0.980817 | -0.990482 | -0.998321 | -0.979672 | -0.990441 | -0.942469 | ... | -0.351471 | -0.699205 | 0.12332 | 0.122542 | 0.693578 | -0.615971 | -0.847865 | 0.185151 | -0.043892 | STANDING |
| 5 | 0.277199 | -0.010098 | -0.105137 | -0.997335 | -0.990487 | -0.99542 | -0.997627 | -0.990218 | -0.995549 | -0.942469 | ... | -0.54541 | -0.844619 | 0.082632 | -0.143439 | 0.275041 | -0.368224 | -0.849632 | 0.184823 | -0.042126 | STANDING |
| 6 | 0.279454 | -0.019641 | -0.110022 | -0.996921 | -0.967186 | -0.983118 | -0.997003 | -0.966097 | -0.983116 | -0.940987 | ... | -0.217198 | -0.56443 | -0.212754 | -0.230622 | 0.014637 | -0.189512 | -0.85215 | 0.18217 | -0.04301 | STANDING |
| 7 | 0.277432 | -0.030488 | -0.12536 | -0.996559 | -0.966728 | -0.981585 | -0.996485 | -0.966313 | -0.982982 | -0.940987 | ... | -0.082307 | -0.421715 | -0.020888 | 0.593996 | -0.561871 | 0.467383 | -0.851017 | 0.183779 | -0.041976 | STANDING |
| 8 | 0.277293 | -0.021751 | -0.120751 | -0.997328 | -0.961245 | -0.983672 | -0.997596 | -0.957236 | -0.984379 | -0.940598 | ... | -0.269401 | -0.572995 | 0.012954 | 0.080936 | -0.234313 | 0.117797 | -0.847971 | 0.188982 | -0.037364 | STANDING |
| 9 | 0.280586 | -0.00996 | -0.106065 | -0.994803 | -0.972758 | -0.986244 | -0.995405 | -0.973663 | -0.985642 | -0.940028 | ... | 0.339526 | 0.140452 | -0.02059 | -0.12773 | -0.482871 | -0.07067 | -0.848294 | 0.19031 | -0.034417 | STANDING |


In [6]:
# check shape of the given data
df.shape

(10299, 562)

The dataset has 10,299 rows and 562 columns.

Alternatively, we can use the `info()` method provided by pandas to generate a concise summary of the data. For a frame this wide, pandas prints a condensed summary: the index range (number of rows), the number of columns, the count of each data type, and the memory usage.

In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10299 entries, 0 to 10298
Columns: 562 entries, tBodyAcc-mean()-X to Activity
dtypes: float64(561), object(1)
memory usage: 44.2+ MB


All of the data types are as expected: 561 float columns (the features) and one object column (the target).
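The same dtype check can also be done programmatically instead of reading the `info()` output. The sketch below is one possible way, shown on a small made-up frame (`toy` and its values are hypothetical); on the real data the same calls would be applied to `df`.

```python
import pandas as pd

# Hypothetical miniature of the dataset: two numeric features and a label.
toy = pd.DataFrame({
    "f1": [0.1, 0.2],
    "f2": [-0.3, 0.4],
    "Activity": ["STANDING", "SITTING"],
})

# Count how many columns have each dtype.
dtype_counts = toy.dtypes.value_counts()

# List the columns that are not numeric; only the target should appear here.
non_numeric = toy.select_dtypes(exclude="number").columns.tolist()

print(dtype_counts)
print(non_numeric)
```

If any feature had been read as text (for example because of a stray non-numeric entry), it would show up in `non_numeric` alongside the target.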

Check for null and duplicate values.

In [12]:
# check null values
df.isnull().values.any()

False

In [15]:
# check duplicate values
df.duplicated().values.any()

False

No null or duplicated values are present in the dataset.
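The boolean checks above only say whether a problem exists. When one does, counts are more useful. The sketch below shows one way to get them, using a small made-up frame (`toy` and its values are hypothetical) that deliberately contains one missing value and one duplicated row.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one NaN and one exact duplicate row.
toy = pd.DataFrame({
    "f1": [0.1, 0.1, np.nan, 0.4],
    "Activity": ["SITTING", "SITTING", "LAYING", "WALKING"],
})

# Number of missing values in each column, and overall.
nulls_per_column = toy.isnull().sum()
total_nulls = int(nulls_per_column.sum())

# Number of rows that repeat an earlier row exactly.
duplicate_rows = int(toy.duplicated().sum())

print(nulls_per_column)
print(total_nulls, duplicate_rows)
```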

`Check whether all the columns are scaled - minimum value: -1 and maximum value: 1`

In [16]:
print(df.iloc[:,:-1].min().value_counts())
print(df.iloc[:,:-1].max().value_counts())

-1.0    561
dtype: int64
1.0    561
dtype: int64


Yes: every one of the 561 feature columns has a minimum of -1 and a maximum of 1, so all of them are scaled to the range [-1, 1].
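The same check can be reduced to a single True/False answer, which is convenient in an automated pipeline. The sketch below is one possible formulation on a small made-up frame (`toy` and its values are hypothetical); it assumes, as in the dataset, that the target column is named `Activity`.

```python
import pandas as pd

# Hypothetical frame whose features already span [-1, 1].
toy = pd.DataFrame({
    "f1": [-1.0, 0.2, 1.0],
    "f2": [-1.0, -0.5, 1.0],
    "Activity": ["SITTING", "LAYING", "WALKING"],
})

features = toy.drop(columns="Activity")
col_min = features.min()
col_max = features.max()

# True if no feature value falls outside [-1, 1].
within_range = bool(col_min.ge(-1).all() and col_max.le(1).all())

# True if every feature actually reaches both -1 and 1.
spans_full_range = bool(col_min.eq(-1).all() and col_max.eq(1).all())

print(within_range, spans_full_range)
```

The first flag is the weaker condition (nothing out of bounds); the second matches what the `value_counts()` output above shows for the real data.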

`Now we have to check whether our target column is balanced across the classes`

In [17]:
df.Activity.value_counts()

LAYING                1944
STANDING              1906
SITTING               1777
WALKING               1722
WALKING_UPSTAIRS      1544
WALKING_DOWNSTAIRS    1406
Name: Activity, dtype: int64

The classes are reasonably balanced: the counts range from 1,406 (WALKING_DOWNSTAIRS) to 1,944 (LAYING).
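To put a number on "balanced", the class counts can be turned into shares and a largest-to-smallest ratio. The sketch below re-enters the counts printed above by hand so that it runs on its own; with the real frame, `df.Activity.value_counts(normalize=True)` would give the shares directly.

```python
import pandas as pd

# Class counts copied from the value_counts() output above.
counts = pd.Series({
    "LAYING": 1944,
    "STANDING": 1906,
    "SITTING": 1777,
    "WALKING": 1722,
    "WALKING_UPSTAIRS": 1544,
    "WALKING_DOWNSTAIRS": 1406,
}, name="Activity")

# Share of each class in the whole dataset.
shares = (counts / counts.sum()).round(3)

# Ratio between the most and least frequent class.
imbalance_ratio = counts.max() / counts.min()

print(shares)
print(round(imbalance_ratio, 2))
```

The ratio comes out at about 1.38, so no class dominates. What counts as "balanced enough" is a judgment call that depends on the model and metric used later.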

> The inspection is complete: the dataset has no missing or duplicated values, the features are scaled, and the classes are reasonably balanced. In the next notebook we proceed with Exploratory Data Analysis to understand the data with the help of summary statistics and visualizations.