# Human Activity Prediction

<b>The current scenario has the following problems:</b>

The company has collected a large amount of smartphone sensor data but cannot yet use it efficiently.
This data could support health-related goals, such as detecting a person's activity and monitoring signs of fatigue.

The company has hired you as data science consultants to automate activity prediction and to draw further insights from the smartphone sensor data.

Your Role
You are given a dataset containing the details about the participants/users.
Your task is to build a classification model for predicting the activity type.
Because there was no machine learning model for this problem in the company, you don’t have a quantifiable win condition. You need to build the best possible model.

Project Deliverables<br>
Deliverable: Human Activity Prediction.<br>
Machine Learning Task: Classification<br>
Target Variable: <b>activity</b><br>

Evaluation Metric
The model evaluation will be based on the Accuracy Score.
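
As a small illustration of the metric, accuracy is the fraction of predictions that match the true labels. The labels below are invented for the example and are not taken from the dataset.

```python
from sklearn.metrics import accuracy_score

# Hypothetical labels, made up for illustration only
y_true = ["WALKING", "SITTING", "LAYING", "STANDING"]
y_pred = ["WALKING", "SITTING", "LAYING", "SITTING"]

# accuracy_score takes (y_true, y_pred) and returns correct / total
acc = accuracy_score(y_true, y_pred)

# The same quantity computed by hand
manual = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(acc, manual)
```

Accuracy treats every class equally per sample, which is reasonable here because the six activity classes are fairly balanced.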


In [58]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

import warnings
warnings.filterwarnings('ignore')

from sklearn.model_selection import train_test_split,cross_val_score,GridSearchCV
from sklearn.metrics import accuracy_score,f1_score
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectFromModel

from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier,AdaBoostClassifier,GradientBoostingClassifier

In [2]:
train_data = pd.read_csv('hacr_train.csv')
train_data.shape

(2887, 563)

In [7]:
test_data = pd.read_csv('hacr_test.csv')
test_data.shape

(722, 563)

In [8]:
train_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2887 entries, 0 to 2886
Columns: 563 entries, rn to angle.Z.gravityMean
dtypes: float64(561), int64(1), object(1)
memory usage: 12.4+ MB


In [10]:
train_data.head()

Unnamed: 0,rn,activity,tBodyAcc.mean.X,tBodyAcc.mean.Y,tBodyAcc.mean.Z,tBodyAcc.std.X,tBodyAcc.std.Y,tBodyAcc.std.Z,tBodyAcc.mad.X,tBodyAcc.mad.Y,...,fBodyBodyGyroJerkMag.meanFreq,fBodyBodyGyroJerkMag.skewness,fBodyBodyGyroJerkMag.kurtosis,angle.tBodyAccMean.gravity,angle.tBodyAccJerkMean.gravityMean,angle.tBodyGyroMean.gravityMean,angle.tBodyGyroJerkMean.gravityMean,angle.X.gravityMean,angle.Y.gravityMean,angle.Z.gravityMean
0,9020,WALKING_UPSTAIRS,0.33,-0.00449,-0.0481,-0.395,-0.152,-0.196,-0.483,-0.131,...,0.395,-0.26,-0.526,-0.0342,-0.633,-0.171,0.654,-0.556,0.294,0.257
1,2646,WALKING,0.208,0.00554,-0.115,-0.432,-0.122,-0.431,-0.47,-0.114,...,-0.119,0.112,-0.171,0.725,0.388,0.942,-0.588,-0.742,0.264,-0.0505
2,5516,SITTING,-0.413,0.253,0.223,-0.779,-0.569,-0.699,-0.797,-0.572,...,-0.268,-0.608,-0.891,0.0843,0.917,-0.0414,0.0721,-0.434,-0.143,-0.292
3,5499,STANDING,0.272,-0.026,-0.103,-0.997,-0.982,-0.983,-0.998,-0.981,...,0.354,-0.735,-0.926,0.0526,0.121,-0.338,0.29,-0.854,0.17,-0.0555
4,4689,WALKING_UPSTAIRS,0.275,-0.0384,-0.0556,0.126,0.102,-0.044,0.104,0.101,...,0.43,-0.00277,-0.263,0.00825,-0.854,0.773,-0.83,-0.621,0.325,0.148


In [12]:
train_data.isnull().sum().any()

False

In [30]:
train_data1 = train_data.copy()

In [31]:
train_data1['activity'].unique()

array(['WALKING_UPSTAIRS', 'WALKING', 'SITTING', 'STANDING',
       'WALKING_DOWNSTAIRS', 'LAYING'], dtype=object)

In [32]:
train_data['activity'].value_counts()

STANDING              550
LAYING                529
SITTING               491
WALKING               477
WALKING_UPSTAIRS      444
WALKING_DOWNSTAIRS    396
Name: activity, dtype: int64

In [33]:
mapop = {'WALKING_UPSTAIRS':0, 'WALKING':1, 'SITTING':2, 'STANDING':3,
       'WALKING_DOWNSTAIRS':4, 'LAYING':5}

In [34]:
train_data1['activity'] = train_data1['activity'].map(mapop)

In [35]:
train_data1['activity'].value_counts()

3    550
5    529
2    491
1    477
0    444
4    396
Name: activity, dtype: int64

In [41]:
# Check for constant columns (zero standard deviation), which carry no information
cols = []

for i in train_data1.columns:
    if train_data1[i].std() == 0:
        cols.append(i)  # list.append is a method call, so it needs parentheses
print("Number of constant columns to be dropped: ", len(cols))
print(cols)
    


Number of constant columns to be dropped:  0
[]
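
The same check can be written without an explicit loop. The sketch below uses a toy frame as a stand-in for `train_data1` (the column names and values are invented) and counts distinct values per column instead of comparing standard deviations.

```python
import pandas as pd

# Toy frame standing in for train_data1 (values are invented)
df = pd.DataFrame({
    "a": [0.1, 0.2, 0.3],
    "b": [1.0, 1.0, 1.0],   # constant column
    "c": [5, 6, 7],
})

# nunique() counts distinct values per column; 1 means the column is constant
n_distinct = df.nunique()
constant_cols = n_distinct[n_distinct <= 1].index.tolist()
print(constant_cols)
```

Counting distinct values also works for non-numeric columns, where `std()` is not defined.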


In [42]:
import dtale
d =dtale.show(train_data1)
d.open_browser()

2021-11-17 16:34:06,885 - INFO     - NumExpr defaulting to 4 threads.


In [46]:
train_data1.drop_duplicates(keep='first',inplace=True)

In [47]:
train_data1.shape

(2887, 563)

In [49]:
X=train_data1.drop(['activity','rn'],axis=1)
y=train_data1['activity']
print(X.shape,y.shape)

(2887, 561) (2887,)


In [51]:
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=42,stratify=y)
print(X_train.shape,y_train.shape,X_test.shape,y_test.shape)

(2309, 561) (2309,) (578, 561) (578,)


In [52]:
SS=StandardScaler()
X_train[X_train.columns] = SS.fit_transform(X_train)
X_test[X_test.columns] = SS.transform(X_test)
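
One caveat with scaling up front: when `X_train` is later passed to `cross_val_score`, each validation fold has already influenced the scaler's mean and variance. A minimal sketch of one way to avoid this, assuming a scikit-learn `Pipeline` is acceptable, is shown below on synthetic data (`X_demo` and `y_demo` are stand-ins, not the sensor data).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the sensor features (not the real data)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=8,
    n_classes=3, random_state=42,
)

# The scaler is re-fit on the training part of every fold,
# so held-out folds never influence the scaling statistics
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X_demo, y_demo, cv=5)
print(scores.mean())
```

With standardization the effect on the scores is usually small, but the pipeline keeps the cross-validation estimate honest and simplifies applying the same preprocessing to new data.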


# Model 1 - Logistic Regression

In [62]:
LR = LogisticRegression()
LR.fit(X_train,y_train)
LR_Train = LR.predict(X_train)
LR_Test = LR.predict(X_test)
# accuracy_score expects (y_true, y_pred); train accuracy first, then hold-out accuracy
print(accuracy_score(y_train,LR_Train))
print(accuracy_score(y_test,LR_Test))
# 10-fold cross-validation on the training split
cv_score =cross_val_score(estimator=LogisticRegression(),X=X_train,y=y_train,cv=10)
print(cv_score)
print(np.mean(cv_score))

0.9987007362494587
0.9688581314878892
[0.96969697 0.94805195 0.93506494 0.96536797 0.96103896 0.96536797
 0.98701299 0.97402597 0.96969697 0.97826087]
0.9653585544889891
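
A single accuracy figure does not show which activities get confused with each other. The sketch below, on synthetic data rather than the sensor data, shows how a confusion matrix and a macro-averaged F1 score could be computed for a fitted classifier; the variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import train_test_split

# Synthetic three-class problem standing in for the activity data
X_demo, y_demo = make_classification(
    n_samples=400, n_features=20, n_informative=8,
    n_classes=3, random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0, stratify=y_demo
)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
y_hat = clf.predict(X_te)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_te, y_hat)
# Macro F1 averages the per-class F1 scores with equal weight
macro_f1 = f1_score(y_te, y_hat, average="macro")
print(cm)
print(macro_f1)
```

In the notebook, the equivalent calls would take `y_test` and `LR_Test`; off-diagonal cells would then show, for example, how often SITTING is predicted as STANDING.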


# Model 2 - Decision Tree Classifier

In [67]:
'''GSV1 = GridSearchCV(estimator=DecisionTreeClassifier(),cv=10,scoring='accuracy',param_grid=dict(max_depth=[1,2,3,4,5,6,7,8,9,10,11,12,13]))
GSV1.fit(X_train,y_train)
print(GSV1.best_params_)'''
DT = DecisionTreeClassifier(max_depth=8)
DT.fit(X_train,y_train)
DT_Train = DT.predict(X_train)
DT_Test = DT.predict(X_test)
# accuracy_score expects (y_true, y_pred)
print(accuracy_score(y_train,DT_Train))
print(accuracy_score(y_test,DT_Test))
# 10-fold CV; note the estimator here uses default settings, not max_depth=8
cv_score =cross_val_score(estimator=DecisionTreeClassifier(),X=X_train,y=y_train,cv=10)
print(cv_score)
print(np.mean(cv_score))

0.9818103074924209
0.9083044982698962
[0.87878788 0.88311688 0.91774892 0.87445887 0.89177489 0.94372294
 0.9004329  0.89177489 0.91341991 0.89565217]
0.8990890269151139
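
The commented-out grid search can also report a cross-validated score for the selected depth directly, through `best_score_`. The following is a sketch on synthetic data with a shorter, illustrative depth grid; it is not the search that produced `max_depth=8`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (not the sensor data)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, n_informative=8,
    n_classes=3, random_state=42,
)

# Illustrative grid; the notebook's commented-out search used depths 1 to 13
search = GridSearchCV(
    estimator=DecisionTreeClassifier(random_state=42),
    param_grid={"max_depth": [2, 4, 6, 8]},
    scoring="accuracy",
    cv=5,
)
search.fit(X_demo, y_demo)

# best_score_ is the mean cross-validated accuracy of the best setting
print(search.best_params_, search.best_score_)
```

Fixing `random_state` on the tree makes the search repeatable, since ties between equally good splits are otherwise broken at random.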


# Model 3 - Random Forest Classifier

In [79]:
'''GSV2 = GridSearchCV(estimator=RandomForestClassifier(),cv=10,scoring='accuracy',param_grid=dict(n_estimators=np.arange(20,100,10)))
GSV2.fit(X_train,y_train)
print(GSV2.best_params_)'''
RF = RandomForestClassifier(n_estimators=80,class_weight='balanced',random_state=42)
RF.fit(X_train,y_train)
RF_Train = RF.predict(X_train)
RF_Test = RF.predict(X_test)
# accuracy_score expects (y_true, y_pred)
print(accuracy_score(y_train,RF_Train))
print(accuracy_score(y_test,RF_Test))
# Cross-validation is disabled here (kept as a string literal, which the cell echoes as its output)
'''cv_score =cross_val_score(estimator=RandomForestClassifier(),X=X_train,y=y_train,cv=10)
print(cv_score)
print(np.mean(cv_score))'''

1.0
0.9567474048442907


'cv_score =cross_val_score(estimator=RandomForestClassifier(),X=X_train,y=y_train,cv=10)\nprint(cv_score)\nprint(np.mean(cv_score))'
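
`SelectFromModel` is imported at the top of the notebook but never used. With 561 features, one possible use is to keep only the features a random forest ranks as most important. The sketch below shows the idea on synthetic data; the `"median"` threshold is an assumption chosen for illustration, not a setting from this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in: 30 features, only a few of them informative
X_demo, y_demo = make_classification(
    n_samples=300, n_features=30, n_informative=6,
    n_classes=3, random_state=42,
)

# Keep features whose forest importance is at least the median importance
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=80, random_state=42),
    threshold="median",
)
selector.fit(X_demo, y_demo)
X_reduced = selector.transform(X_demo)
print(X_demo.shape, X_reduced.shape)
```

The selector should be fitted on the training split only and then applied to the validation and test data with `transform`, in the same way as the scaler.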

In [80]:
mapop1 = {0:'WALKING_UPSTAIRS', 1:'WALKING', 2:'SITTING', 3:'STANDING',
       4:'WALKING_DOWNSTAIRS', 5:'LAYING'}
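
Because the decoding dictionary is typed by hand, it can drift from the encoding one. A small sketch of deriving it from `mapop` instead, so that both mappings always agree:

```python
# Encoding used earlier in the notebook
mapop = {'WALKING_UPSTAIRS': 0, 'WALKING': 1, 'SITTING': 2, 'STANDING': 3,
         'WALKING_DOWNSTAIRS': 4, 'LAYING': 5}

# Invert the mapping: integer code -> activity name
mapop1 = {code: name for name, code in mapop.items()}
print(mapop1)
```

The inversion is safe here because every activity maps to a distinct code.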

In [95]:
test_data1 = test_data.drop(['rn','activity'],axis=1).copy()

In [96]:
test_data1[test_data1.columns] = SS.transform(test_data1)

In [109]:
pred_values = LR.predict(test_data1)

In [110]:
pred_values

array([4, 1, 1, 2, 5, 0, 5, 1, 4, 3, 5, 2, 3, 1, 0, 2, 1, 0, 3, 1, 2, 3,
       4, 5, 5, 4, 2, 1, 5, 4, 1, 0, 5, 3, 2, 4, 2, 5, 1, 5, 3, 3, 0, 5,
       4, 5, 1, 3, 2, 1, 2, 3, 5, 5, 4, 4, 3, 1, 2, 5, 4, 3, 0, 4, 1, 5,
       1, 2, 3, 2, 3, 2, 5, 1, 2, 1, 5, 5, 0, 0, 3, 0, 0, 5, 0, 4, 4, 1,
       3, 1, 0, 5, 0, 1, 1, 5, 5, 2, 5, 3, 1, 1, 3, 3, 2, 5, 2, 1, 4, 3,
       4, 0, 5, 1, 5, 1, 2, 4, 5, 3, 2, 4, 2, 5, 3, 4, 3, 2, 0, 4, 5, 5,
       5, 1, 5, 5, 2, 1, 3, 2, 5, 3, 4, 2, 3, 4, 2, 5, 1, 3, 4, 4, 2, 2,
       2, 2, 0, 3, 0, 0, 5, 5, 5, 3, 3, 1, 1, 5, 3, 5, 0, 3, 2, 0, 4, 1,
       3, 5, 3, 2, 4, 4, 1, 5, 2, 2, 3, 5, 4, 1, 2, 0, 2, 2, 3, 0, 0, 2,
       4, 3, 5, 5, 2, 5, 2, 4, 2, 2, 2, 3, 0, 5, 5, 5, 5, 1, 0, 0, 2, 2,
       1, 2, 4, 1, 4, 5, 4, 4, 0, 3, 1, 0, 5, 5, 1, 2, 5, 0, 3, 0, 3, 2,
       3, 2, 5, 0, 1, 5, 5, 5, 1, 2, 3, 5, 1, 4, 5, 1, 5, 2, 4, 1, 2, 5,
       3, 5, 0, 2, 5, 3, 1, 5, 2, 0, 1, 1, 1, 3, 3, 5, 1, 5, 0, 1, 3, 1,
       0, 2, 4, 0, 1, 2, 3, 4, 5, 3, 2, 2, 1, 1, 5,

In [111]:
finalop = pd.DataFrame(test_data['rn'])

In [112]:
finalop['activity'] = pred_values

In [113]:
finalop['activity'] = finalop['activity'].map(mapop1)

In [114]:
finalop

Unnamed: 0,rn,activity
0,811,WALKING_DOWNSTAIRS
1,8965,WALKING
2,5000,WALKING
3,1200,SITTING
4,9812,LAYING
...,...,...
717,2992,WALKING
718,3487,SITTING
719,3905,WALKING
720,3955,WALKING_DOWNSTAIRS


In [115]:
finalop.to_csv('submission.csv',header=False,index=False)
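
Since the file is written without a header row, it is worth reading it back once to confirm the layout. The sketch below does a round trip through an in-memory buffer using the first two rows shown above; the column names passed to `read_csv` are supplied by hand because the file itself carries none.

```python
import io

import pandas as pd

# Two rows copied from the submission preview above
demo = pd.DataFrame({
    "rn": [811, 8965],
    "activity": ["WALKING_DOWNSTAIRS", "WALKING"],
})

# Write exactly as the notebook does, but to memory instead of disk
buf = io.StringIO()
demo.to_csv(buf, header=False, index=False)
buf.seek(0)

# header=None tells pandas that the first line is data, not column names
check = pd.read_csv(buf, header=None, names=["rn", "activity"])
print(check)
```

Reading a headerless file without `header=None` would silently turn the first prediction into column names, so the row count would be one short.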