## Activity Detection Using ML

### About the data set
The raw sensor data is stored as one file per subject; each file is tagged with the subject id, the sensor, and the device. Within a data file, each line has the form:

Subject-id, Activity Label, Timestamp, x, y, z

The fields are defined as follows:

- subject-id: identifies the subject; an integer value between 1600 and 1650.
- activity-label: one of 18 characters identifying the activity; see Table 2 for the mapping from character to activity name.
- timestamp: time at which the reading was taken (Unix time).
- x: x sensor value (real valued).
- y: y sensor value (real valued).
- z: z sensor value (real valued).
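To make the line format concrete, here is a minimal sketch that parses one line into typed fields. The `Reading` type and `parse_line` helper are hypothetical names introduced for illustration, and the trailing `;` on the sample line is an assumption based on the clean-up step applied to the z column later in this notebook.

```python
from typing import NamedTuple


class Reading(NamedTuple):
    subject_id: int
    activity: str
    timestamp: int
    x: float
    y: float
    z: float


def parse_line(line: str) -> Reading:
    """Split one raw line into typed fields (hypothetical helper)."""
    # Assumption: a line may end with ';', which is dropped before parsing.
    fields = [f.strip() for f in line.strip().rstrip(';').split(',')]
    subject, activity, timestamp, x, y, z = fields
    return Reading(int(subject), activity, int(timestamp),
                   float(x), float(y), float(z))


reading = parse_line("1600,A,252207666810782,-0.364761,8.793503,1.055084;")
print(reading.activity, reading.z)
```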


### Importing libraries

In [45]:
import pandas as pd
import numpy
import os
from datetime import datetime

#### Combining the data from a particular sensor

In [46]:
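def combinedata(path):
    # Read every file under `path` and stack the rows into a single DataFrame.
    columns = ['subject', 'activity', 'time', 'x', 'y', 'z']
    frames = []
    for dirname, _, filenames in os.walk(path):
        for filename in filenames:
            # Join with dirname (not path) so files in nested folders resolve correctly.
            df = pd.read_csv(os.path.join(dirname, filename), sep=",", header=None)
            frames.append(pd.DataFrame(data=df.values, columns=columns))

    # Concatenate once at the end; growing a DataFrame inside the loop copies it each time.
    return pd.concat(frames) if frames else pd.DataFrame(columns=columns)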
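As a rough illustration of the loading step, the sketch below runs a similar routine on two tiny made-up files in a temporary directory. `combine_files` and `COLUMNS` are hypothetical names, and unlike the notebook's function this variant names the columns at read time, so numeric columns keep numeric dtypes and only `z` stays as text because of its trailing `;`.

```python
import os
import tempfile

import pandas as pd

COLUMNS = ['subject', 'activity', 'time', 'x', 'y', 'z']


def combine_files(path):
    """Hypothetical variant of combinedata that names the columns at read time."""
    frames = []
    for dirname, _, filenames in os.walk(path):
        for filename in sorted(filenames):
            full_path = os.path.join(dirname, filename)
            frames.append(pd.read_csv(full_path, header=None, names=COLUMNS))
    if not frames:
        return pd.DataFrame(columns=COLUMNS)
    # ignore_index gives the combined frame a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)


# Two tiny made-up files stand in for the per-subject sensor files.
with tempfile.TemporaryDirectory() as tmp:
    nested = os.path.join(tmp, 'nested')
    os.makedirs(nested)
    with open(os.path.join(tmp, 'a.txt'), 'w') as fh:
        fh.write('1600,A,1,0.1,0.2,0.3;\n1600,A,2,0.4,0.5,0.6;\n')
    with open(os.path.join(nested, 'b.txt'), 'w') as fh:
        fh.write('1601,B,3,0.7,0.8,0.9;\n')
    combined = combine_files(tmp)

print(combined.shape)
print(combined.dtypes)
```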

In [47]:
trainphoneaccel = combinedata('C:/Users/Ajit/Desktop/pingala/Data/raw/train/phone/accel')
trainphoneaccel.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1580245 entries, 0 to 81458
Data columns (total 6 columns):
 #   Column    Non-Null Count    Dtype 
---  ------    --------------    ----- 
 0   subject   1580245 non-null  object
 1   activity  1580245 non-null  object
 2   time      1580245 non-null  object
 3   x         1580245 non-null  object
 4   y         1580245 non-null  object
 5   z         1580245 non-null  object
dtypes: object(6)
memory usage: 84.4+ MB


In [48]:
trainphonegyro=combinedata('C:/Users/Ajit/Desktop/pingala/Data/raw/train/phone/gyro')
trainphonegyro.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1387312 entries, 0 to 81198
Data columns (total 6 columns):
 #   Column    Non-Null Count    Dtype 
---  ------    --------------    ----- 
 0   subject   1387312 non-null  object
 1   activity  1387312 non-null  object
 2   time      1387312 non-null  object
 3   x         1387312 non-null  object
 4   y         1387312 non-null  object
 5   z         1387312 non-null  object
dtypes: object(6)
memory usage: 74.1+ MB


In [49]:
trainwatchaccel = combinedata('C:/Users/Ajit/Desktop/pingala/Data/raw/train/watch/accel')

In [50]:
trainwatchgyro = combinedata('C:/Users/Ajit/Desktop/pingala/Data/raw/train/watch/gyro')

#### Function to change the data type of the x, y and z columns

In [51]:
def change_type(data):
    # The z values carry a trailing ';' from the raw files; strip it before casting.
    data['z'] = data['z'].str.replace(';', '')
    # Cast the three sensor axes from object to float.
    for col in ('x', 'y', 'z'):
        data[col] = data[col].astype('float')

    return data
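The conversion can be tried on a small made-up frame. This is only a sketch: `to_float_columns` is a hypothetical variant that works on a copy and uses `pd.to_numeric`, so the input frame is left unchanged.

```python
import pandas as pd

# Made-up rows shaped like the loaded data: sensor values stored as text,
# with a trailing ';' on z.
raw = pd.DataFrame({
    'x': ['-0.364761', '2.001495'],
    'y': ['8.793503', '11.109070'],
    'z': ['1.055084;', '2.619156;'],
})


def to_float_columns(data):
    """Hypothetical non-mutating variant of change_type."""
    out = data.copy()
    out['z'] = out['z'].str.rstrip(';')
    for col in ('x', 'y', 'z'):
        out[col] = pd.to_numeric(out[col])
    return out


clean = to_float_columns(raw)
print(clean.dtypes)
```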

In [52]:
trainphoneaccel=change_type(trainphoneaccel)
trainphonegyro=change_type(trainphonegyro)
trainwatchaccel=change_type(trainwatchaccel)
trainwatchgyro=change_type(trainwatchgyro)

In [53]:
trainwatchgyro.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1303438 entries, 0 to 64832
Data columns (total 6 columns):
 #   Column    Non-Null Count    Dtype  
---  ------    --------------    -----  
 0   subject   1303438 non-null  object 
 1   activity  1303438 non-null  object 
 2   time      1303438 non-null  object 
 3   x         1303438 non-null  float64
 4   y         1303438 non-null  float64
 5   z         1303438 non-null  float64
dtypes: float64(3), object(3)
memory usage: 69.6+ MB


#### Concatenation of all training data

In [76]:
train_data=pd.concat([trainphoneaccel,trainphonegyro,trainwatchaccel,trainwatchgyro])
train_data.info()
train_data

<class 'pandas.core.frame.DataFrame'>
Int64Index: 5575010 entries, 0 to 64832
Data columns (total 6 columns):
 #   Column    Dtype  
---  ------    -----  
 0   subject   object 
 1   activity  object 
 2   time      object 
 3   x         float64
 4   y         float64
 5   z         float64
dtypes: float64(3), object(3)
memory usage: 297.7+ MB


      subject activity             time         x          y         z
0        1600        A  252207666810782 -0.364761   8.793503  1.055084
1        1600        A  252207717164786 -0.879730   9.768784  1.016998
2        1600        A  252207767518790  2.001495  11.109070  2.619156
3        1600        A  252207817872794  0.450623  12.651642  0.184555
4        1600        A  252207868226798 -2.164352  13.928436 -4.422485
...       ...      ...              ...       ...        ...       ...
64828    1619        S  350830146796403 -1.051321   0.374564  0.203493
64829    1619        S  350830196752313 -1.363450   0.417187  0.263148
64830    1619        S  350830246708223 -1.190877   0.570585  0.314280
64831    1619        S  350830296664133 -0.356775   0.516256  0.253560
64832    1619        S  350830346620043 -0.049979   0.343683  0.067139

[5575010 rows x 6 columns]
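Concatenating the four frames discards which device and sensor each row came from, and the row index repeats across the source files. If that information is wanted later, one possible approach (a sketch on made-up frames, not what this notebook does) is to tag each frame before stacking and to renumber the index. The `device` and `sensor` column names are assumptions.

```python
import pandas as pd

# Made-up stand-ins for two of the per-sensor frames.
phone_accel = pd.DataFrame({'activity': ['A', 'A'], 'x': [0.1, 0.2]})
watch_gyro = pd.DataFrame({'activity': ['S'], 'x': [0.3]})

frames = {('phone', 'accel'): phone_accel, ('watch', 'gyro'): watch_gyro}

# assign() adds constant columns to a copy; ignore_index renumbers rows 0..n-1.
tagged = pd.concat(
    [df.assign(device=device, sensor=sensor)
     for (device, sensor), df in frames.items()],
    ignore_index=True,
)
print(tagged)
```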

#### Test data: combining the data for each sensor, then concatenating

In [55]:
testphoneaccel = combinedata('C:/Users/Ajit/Desktop/pingala/Data/raw/test/phone/accel')
testphonegyro=combinedata('C:/Users/Ajit/Desktop/pingala/Data/raw/test/phone/gyro')
testwatchaccel = combinedata('C:/Users/Ajit/Desktop/pingala/Data/raw/test/watch/accel')
testwatchgyro = combinedata('C:/Users/Ajit/Desktop/pingala/Data/raw/test/watch/gyro')

In [56]:
testphoneaccel=change_type(testphoneaccel)
testphonegyro=change_type(testphonegyro)
testwatchaccel=change_type(testwatchaccel)
testwatchgyro=change_type(testwatchgyro)


In [57]:
testwatchgyro.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 997751 entries, 0 to 64853
Data columns (total 6 columns):
 #   Column    Non-Null Count   Dtype  
---  ------    --------------   -----  
 0   subject   997751 non-null  object 
 1   activity  997751 non-null  object 
 2   time      997751 non-null  object 
 3   x         997751 non-null  float64
 4   y         997751 non-null  float64
 5   z         997751 non-null  float64
dtypes: float64(3), object(3)
memory usage: 53.3+ MB


In [58]:
test_data=pd.concat([testphoneaccel,testphonegyro,testwatchaccel,testwatchgyro])
test_data.info()
test_data

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4266331 entries, 0 to 64853
Data columns (total 6 columns):
 #   Column    Dtype  
---  ------    -----  
 0   subject   object 
 1   activity  object 
 2   time      object 
 3   x         float64
 4   y         float64
 5   z         float64
dtypes: float64(3), object(3)
memory usage: 227.8+ MB


      subject activity             time         x          y         z
0        1620        A  938034649865917  6.130188   5.923340 -0.791870
1        1620        A  938034700219921  9.979980  10.531952 -2.961151
2        1620        A  938034717004589  7.232987  11.211700 -0.783981
3        1620        A  938034767358593  6.013000   9.329330 -1.070175
4        1620        A  938034784143261  5.563065   7.343353 -4.254425
...       ...      ...              ...       ...        ...       ...
64849    1633        S  697019338580377 -1.073295  -0.204933 -0.385017
64850    1633        S  697019388505837  0.467077  -0.545817 -0.487282
64851    1633        S  697019438431297  0.801570  -0.868592 -0.810057
64852    1633        S  697019488356757  1.715567  -1.226521 -1.232967
64853    1633        S  697019538282217  1.811441  -1.753827 -1.964804

[4266331 rows x 6 columns]

#### Changing the data type of the time column

In [63]:
train_data['time'] = pd.to_numeric(train_data['time'])
test_data['time'] = pd.to_numeric(test_data['time'])

test_data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4266331 entries, 0 to 64853
Data columns (total 6 columns):
 #   Column    Dtype  
---  ------    -----  
 0   subject   object 
 1   activity  object 
 2   time      int64  
 3   x         float64
 4   y         float64
 5   z         float64
dtypes: float64(3), int64(1), object(2)
memory usage: 227.8+ MB


#### Splitting the time value into year, month, week, day, hour, minute and second columns

In [64]:
# Scale the raw integer timestamps down by 1,000,000, then convert them to
# datetime objects (datetime.fromtimestamp returns local time).
train_data['time'] = (train_data['time'] // 1000000).apply(datetime.fromtimestamp)
test_data['time'] = (test_data['time'] // 1000000).apply(datetime.fromtimestamp)
# The subject id is not used as a feature.
train_data.drop(columns="subject", inplace=True)
test_data.drop(columns="subject", inplace=True)

In [65]:
for df in (train_data, test_data):
    for part in ('year', 'month', 'week', 'day', 'hour', 'minute', 'second'):
        if part == 'week':
            # Series.dt.week was removed in pandas 2.0; isocalendar().week replaces it.
            df[part] = df['time'].dt.isocalendar().week.astype('int64')
        else:
            df[part] = getattr(df['time'].dt, part)
    df.drop(columns="time", inplace=True)
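The same decomposition can be sketched in vectorized form on two timestamps taken from the tables above. Two assumptions are worth flagging: the division by 1,000,000 is carried over from the notebook as the step that yields seconds, and `pd.to_datetime(..., unit='s')` interprets those seconds as UTC, whereas `datetime.fromtimestamp` uses the machine's local time zone, so the `hour` and `minute` values may differ from the notebook's output.

```python
import pandas as pd

# First training timestamp and last training timestamp shown in the tables above.
times = pd.Series([252207666810782, 350830346620043])

# Assumption carried over from the notebook: dividing by 1,000,000 yields seconds.
stamps = pd.to_datetime(times // 1_000_000, unit='s')

parts = pd.DataFrame({
    'year': stamps.dt.year,
    'month': stamps.dt.month,
    # isocalendar() returns UInt32 columns; cast to a plain integer dtype.
    'week': stamps.dt.isocalendar().week.astype('int64'),
    'day': stamps.dt.day,
    'hour': stamps.dt.hour,
    'minute': stamps.dt.minute,
    'second': stamps.dt.second,
})
print(parts)
```

The resulting years (1977 and 1981) match the notebook's output, which suggests these timestamps may not be wall-clock recording times; that is worth keeping in mind when treating the date parts as features.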

  
  


In [66]:
train_data

      activity         x          y         z  year  month  week  day  hour  \
0            A -0.364761   8.793503  1.055084  1977     12    52   29     7   
1            A -0.879730   9.768784  1.016998  1977     12    52   29     7   
2            A  2.001495  11.109070  2.619156  1977     12    52   29     7   
3            A  0.450623  12.651642  0.184555  1977     12    52   29     7   
4            A -2.164352  13.928436 -4.422485  1977     12    52   29     7   
...        ...       ...        ...       ...   ...    ...   ...  ...   ...   
64828        S -1.051321   0.374564  0.203493  1981      2     7   12    18   
64829        S -1.363450   0.417187  0.263148  1981      2     7   12    18   
64830        S -1.190877   0.570585  0.314280  1981      2     7   12    18   
64831        S -0.356775   0.516256  0.253560  1981      2     7   12    18   
64832        S -0.049979   0.343683  0.067139  1981      2     7   12    18   

       minute  second

### Importing libraries for model building and model evaluation

In [67]:
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

#### Encoding the 'activity' labels as integers

In [69]:
label = LabelEncoder()
train_data['activity'] = label.fit_transform(train_data['activity'])
# Reuse the mapping fitted on the training labels so both sets share one encoding.
test_data['activity'] = label.transform(test_data['activity'])
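A minimal sketch of how `LabelEncoder` behaves when it is fitted once and then reused, shown on made-up label lists. Fitting once keeps the letter-to-integer mapping identical across both sets, and `inverse_transform` recovers the original letters.

```python
from sklearn.preprocessing import LabelEncoder

# Made-up activity codes standing in for the 'activity' column.
train_labels = ['A', 'B', 'S', 'A']
test_labels = ['S', 'A']

encoder = LabelEncoder()
train_codes = encoder.fit_transform(train_labels)  # learns the sorted classes
test_codes = encoder.transform(test_labels)        # reuses the learned mapping

print(list(encoder.classes_))
print(list(encoder.inverse_transform(test_codes)))
```

Note that `transform` raises an error for a label that was not seen during fitting, which is usually the desired signal that the two sets disagree.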

#### Combining the training and test data

In [72]:
# Shuffle each set, then stack them into one frame.
train_data = train_data.sample(frac=1).reset_index(drop=True)
test_data = test_data.sample(frac=1).reset_index(drop=True)
data = pd.concat([train_data, test_data])
data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9841341 entries, 0 to 4266330
Data columns (total 11 columns):
 #   Column    Dtype  
---  ------    -----  
 0   activity  int64  
 1   x         float64
 2   y         float64
 3   z         float64
 4   year      int64  
 5   month     int64  
 6   week      int64  
 7   day       int64  
 8   hour      int64  
 9   minute    int64  
 10  second    int64  
dtypes: float64(3), int64(8)
memory usage: 901.0 MB


#### Splitting the data into training and test sets

In [73]:
y=data["activity"]
x=data.drop(columns="activity")
x_train, x_test, y_train, y_test = train_test_split(x,y,train_size = 0.7, random_state =  42)
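A row-wise random split places readings from the same subject in both the training and the test set. If the goal is to estimate performance on unseen subjects, one possible alternative is a group-aware split. The sketch below uses scikit-learn's `GroupShuffleSplit` on synthetic data and assumes a subject id is still available for each row, which is not the case in this notebook once the `subject` column has been dropped.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Synthetic stand-in data: 200 rows, 3 features, 10 hypothetical subject ids.
rng = np.random.default_rng(42)
n_rows = 200
subjects = rng.integers(1600, 1610, size=n_rows)
features = rng.normal(size=(n_rows, 3))
labels = rng.integers(0, 3, size=n_rows)

# train_size here is the fraction of groups (subjects), not of rows.
splitter = GroupShuffleSplit(n_splits=1, train_size=0.7, random_state=42)
train_idx, test_idx = next(splitter.split(features, labels, groups=subjects))

# No subject should appear on both sides of the split.
shared = set(subjects[train_idx]) & set(subjects[test_idx])
print(len(train_idx), len(test_idx), shared)
```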

#### Initializing and training the model

In [74]:
model = RandomForestClassifier()
model.fit(x_train, y_train)


RandomForestClassifier()
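For reference, a small self-contained sketch of the same fit step on synthetic data. Setting `random_state` (an addition not present in the cell above) makes repeated runs produce the same forest, and `feature_importances_` gives a first look at which columns the model relies on. The data comes from `make_classification` and has nothing to do with the sensor readings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic three-class problem standing in for the activity data.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=42)

# A fixed random_state makes the trained forest reproducible.
forest = RandomForestClassifier(n_estimators=50, random_state=42)
forest.fit(X_train, y_train)

# One importance value per feature; the values sum to 1.
print(forest.feature_importances_.round(3))
```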

#### Predicting the output and calculating the accuracy score

In [75]:
y_pred=model.predict(x_test)
acc=accuracy_score(y_test,y_pred)
acc

0.975614778876732
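Overall accuracy hides how individual activities fare. As a sketch on toy labels (not the notebook's predictions), a confusion matrix and per-class recall can be computed alongside the accuracy score:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Toy labels for three classes; one class-1 sample is misclassified as class 2.
y_true = [0, 0, 1, 1, 2, 2]
y_hat = [0, 0, 1, 2, 2, 2]

acc = accuracy_score(y_true, y_hat)
matrix = confusion_matrix(y_true, y_hat)  # rows: true class, columns: predicted

# Recall per class: correct predictions divided by the number of true samples.
per_class_recall = matrix.diagonal() / matrix.sum(axis=1)

print(acc)
print(matrix)
print(per_class_recall)
```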

## Thank you