# Steps prediction


About the Fitbit dataset:
    Fitbit Inc. is an American company known for its products of the same name, which are activity trackers, 
    wireless-enabled wearable technology devices that measure data such as the number of steps walked, heart rate, 
    quality of sleep, steps climbed, and other personal metrics. The first of these was the Fitbit Tracker.

For this example, Fitbit data from a CSI member (Vivek Rai :P) was collected. Machine learning is used here to relate the number of steps he walked
(count), the distance covered, and the calories burned. In the code below, calories burned (calorie) is the predicted value, and step count, distance, and speed are the inputs.
Since we want to predict a missing numeric attribute, regression is used for this example.

### Import libraries
Import all the required libraries at once.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

### Read CSV File (Containing fitbit dataset)

In [2]:
fitbit = pd.read_csv("fitbit_dataset.csv")

In [3]:
fitbit.head()

Unnamed: 0,time_offset,end_time,speed,pkg_name,start_time,count,sample_position_type,calorie,distance,datauuid,deviceuuid,update_time,create_time
0,19800000,57:00.0,1.638889,com.sec.android.app.shealth,56:00.0,5,,0.2,3.2,7ba6b7d2-519c-41f9-8706-d3edafa3b0fc,MdS75U+XxL,23:57.0,23:55.8
1,19800000,57:00.0,1.638889,com.sec.android.app.shealth,56:00.0,6,,0.24,4.78,7ba6b7d2-519c-41f9-8706-d3edafa3b0fc,MdS75U+XxL,23:57.0,23:55.8
2,19800000,57:00.0,0.916667,com.sec.android.app.shealth,56:00.0,1,,0.04,0.58,9568572f-f33d-43ed-8dbd-aa7033744b3d,MdS75U+XxL,23:55.8,23:55.8
3,19800000,53:00.0,2.888889,com.sec.android.app.shealth,52:00.0,25,,2.74,22.26,a144cb82-f4b1-4011-ae2c-7c8754273558,MdS75U+XxL,23:55.8,23:55.8
4,19800000,57:00.0,1.694444,com.sec.android.app.shealth,56:00.0,12,,0.48,8.85,88278acc-964a-44ca-a4d7-bf696804582f,MdS75U+XxL,23:55.8,23:55.8


### Select input and output features for the dataset (X is the input, y is the output)


In [4]:
# Here we predict calories burned (calorie) from the step count (count), the distance covered, and the speed.
features = ['count','distance','speed']
X = fitbit[features]
y = fitbit['calorie']
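To make the shapes of X and y concrete, here is a minimal sketch that repeats the same selection on a small hand-built DataFrame. The five rows are copied from the head() output above; the names sample, X_sample and y_sample are made up for this illustration and are not part of the notebook.

```python
import pandas as pd

# First five rows of the relevant columns, copied from the fitbit.head() output
sample = pd.DataFrame({
    "speed":    [1.638889, 1.638889, 0.916667, 2.888889, 1.694444],
    "count":    [5, 6, 1, 25, 12],
    "calorie":  [0.2, 0.24, 0.04, 2.74, 0.48],
    "distance": [3.2, 4.78, 0.58, 22.26, 8.85],
})

features = ["count", "distance", "speed"]
X_sample = sample[features]    # 2-D input: one column per feature
y_sample = sample["calorie"]   # 1-D target: the value to be predicted

print(X_sample.shape)  # (5, 3)
print(y_sample.name)   # calorie
```

Selecting with a list of column names returns a DataFrame (2-D), while selecting a single name returns a Series (1-D), which is the layout scikit-learn estimators expect for inputs and targets.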

### Split our dataset into training set and testing set
train_test_split is a predefined scikit-learn function that randomly splits data into a training set and a testing set.
It takes the input data to be split along with the output data.
test_size: if a float, it should be between 0.0 and 1.0 and represents the proportion of the dataset to include in the test split.
If an int, it represents the absolute number of test samples.

In [5]:

from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.4,random_state=5)
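The difference between a float and an int test_size can be seen in a small sketch. It uses made-up arrays (X_demo, y_demo) of 10 samples rather than the Fitbit data, so the sizes are easy to check by hand.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data for illustration only: 10 samples, 2 features
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

# Float test_size: 40% of the 10 samples go to the test set
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.4, random_state=5
)
print(len(X_tr), len(X_te))  # 6 4

# Int test_size: exactly 3 samples go to the test set
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    X_demo, y_demo, test_size=3, random_state=5
)
print(len(X_tr2), len(X_te2))  # 7 3
```

Fixing random_state makes the shuffle reproducible, so the same rows land in the test set on every run.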

### Choose an appropriate model and create an instance of it


In [6]:
# For this regression task we use a linear regression model.

from sklearn.linear_model import LinearRegression
fit_ln_model = LinearRegression()

### Fit the model

In [7]:
# fit() estimates the intercept and one coefficient per feature by ordinary least squares on the training data.
# Note that plain LinearRegression applies no regularization, so it does not guard against overfitting or underfitting,
# and a good fit is not assured when the data is small or ambiguous.

fit_ln_model.fit(X_train,y_train)
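As a rough illustration of what fitting a linear regression means, the sketch below compares LinearRegression with a direct least-squares solve in NumPy. The data is synthetic and the names (X_syn, y_syn, true_weights) are invented for this example; it is not the Fitbit dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption for illustration): 50 samples, 3 features
rng = np.random.default_rng(0)
X_syn = rng.normal(size=(50, 3))
true_weights = np.array([1.0, -2.0, 0.3])
y_syn = 0.5 + X_syn @ true_weights + rng.normal(scale=0.1, size=50)

# Fit with scikit-learn
model = LinearRegression().fit(X_syn, y_syn)

# Solve the same least-squares problem directly:
# prepend a column of ones so the first weight acts as the intercept
A = np.column_stack([np.ones(len(X_syn)), X_syn])
beta, *_ = np.linalg.lstsq(A, y_syn, rcond=None)

print(np.allclose(beta[0], model.intercept_))  # True
print(np.allclose(beta[1:], model.coef_))      # True
```

Both approaches minimize the sum of squared differences between the predictions and the targets, which is why the fitted values agree.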

In [8]:
# To inspect the intercept and the weight (coefficient) associated with each feature:

print(fit_ln_model.intercept_)
print(fit_ln_model.coef_)

0.1989281503514455
[-0.01193946  0.06714398  0.04723289]


### Predict on test data

In [9]:
ypred = fit_ln_model.predict(X_test)

Compare the predicted values with the actual values printed by the next two cells.

In [10]:
print(ypred)

[5.31117222 0.89684617 5.24400829 ... 0.39504294 5.22938227 4.93660227]


In [11]:
print(y_test)

1795     4.98
9972     0.45
9525     4.88
7808     4.29
3625     1.16
         ... 
8692     0.04
10459    7.94
7732     0.12
1298     6.17
3699     5.26
Name: calorie, Length: 5561, dtype: float64
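Comparing two long printouts by eye is hard, so a side-by-side table can help. The sketch below uses only the first three predicted and actual values shown in the outputs above; the DataFrame name comparison and the error column are additions for illustration.

```python
import pandas as pd

# First three values copied from the printed ypred and y_test outputs
comparison = pd.DataFrame(
    {
        "actual": [4.98, 0.45, 4.88],
        "predicted": [5.31117222, 0.89684617, 5.24400829],
    },
    index=[1795, 9972, 9525],  # row labels as shown in y_test
)

# Signed error per row and the mean absolute error over these three rows
comparison["error"] = comparison["predicted"] - comparison["actual"]
mae = comparison["error"].abs().mean()

print(comparison)
print(round(mae, 4))  # 0.3807
```

On the full test set, the same table could be built from y_test and ypred directly, since predict returns values in the same row order as X_test.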


### Making a prediction for an arbitrary value of count, distance and speed

In [12]:
i = [[3,27,1.6]]
test = fit_ln_model.predict(i)
print(test)

[2.05156978]
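The prediction above can be reproduced by hand, because a linear model computes the intercept plus the weighted sum of the inputs. The sketch below plugs in the intercept and coefficients printed earlier; the variable names are chosen for this illustration only.

```python
import numpy as np

# Values copied from the printed intercept_ and coef_ outputs
intercept = 0.1989281503514455
coef = np.array([-0.01193946, 0.06714398, 0.04723289])

# Same input as the cell above, in the order [count, distance, speed]
x = np.array([3, 27, 1.6])

# prediction = intercept + sum(coef_i * x_i)
manual_prediction = intercept + coef @ x
print(round(manual_prediction, 4))  # 2.0516
```

The result matches the model's output of 2.05156978 up to the rounding of the printed coefficients.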




### Accuracy of prediction

In [13]:
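# Fit quality is measured with the predefined explained_variance_score function (a regression metric, not a classification accuracy).
# Note: scikit-learn documents the argument order as (y_true, y_pred); the call below passes the predictions first,
# and the value shown is the result for that order.

from sklearn.metrics import explained_variance_score
100*explained_variance_score(ypred,y_test)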

83.17620876000885
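The order of the arguments matters for this metric, since scikit-learn documents the signature as explained_variance_score(y_true, y_pred). The sketch below shows the effect on a tiny made-up example (the arrays y_true_demo and y_pred_demo are not from the Fitbit data) and also computes r2_score, another common regression metric.

```python
import numpy as np
from sklearn.metrics import explained_variance_score, r2_score

# Toy values for illustration only
y_true_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred_demo = np.array([1.5, 2.0, 2.5, 4.5, 4.5])

# Documented order: true values first, predictions second
ev = explained_variance_score(y_true_demo, y_pred_demo)

# Swapped order: the residual variance is divided by the variance of the
# predictions instead of the variance of the true values
ev_swapped = explained_variance_score(y_pred_demo, y_true_demo)

r2 = r2_score(y_true_demo, y_pred_demo)

print(round(ev, 3), round(ev_swapped, 3), round(r2, 3))  # 0.9 0.875 0.9
```

Here the residuals have zero mean, so the explained variance and R² coincide; in general they differ when the predictions are biased.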