<a href="https://colab.research.google.com/github/Kirtiwardhan01/Deep-Learning-/blob/master/Supervised%20Regression%20Problem%20using%20tensorflow.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Bike Sharing Dataset**

## In this assignment we predict hourly bike-rental demand in Washington, D.C.

**Upload the data file from your local machine using the command below**

In [30]:
from google.colab import files
files.upload()

{}

Attribute Information:

Both hour.csv and day.csv have the following fields, except hr, which is not available in day.csv:

- instant: record index
- dteday : date
- season : season (1:winter, 2:spring, 3:summer, 4:fall)
- yr : year (0: 2011, 1:2012)
- mnth : month ( 1 to 12)
- hr : hour (0 to 23)
- holiday : whether the day is a holiday or not
- weekday : day of the week
- workingday : 1 if the day is neither a weekend nor a holiday, otherwise 0
- weathersit :
  - 1: Clear, Few clouds, Partly cloudy
  - 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
  - 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
  - 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
- temp : Normalized temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-8, t_max=+39 (only in hourly scale)
- atemp: Normalized feeling temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-16, t_max=+50 (only in hourly scale)
- hum: Normalized humidity. The values are divided by 100 (max)
- windspeed: Normalized wind speed. The values are divided by 67 (max)
- casual: count of casual users
- registered: count of registered users
- cnt: count of total rental bikes including both casual and registered
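
The normalized temperature fields can be mapped back to degrees Celsius by inverting the min-max formula given in the list above. This is a minimal sketch that assumes the stated bounds. The helper name `denormalize` is hypothetical and not part of the notebook.

```python
def denormalize(value, t_min, t_max):
    """Invert min-max scaling: value = (t - t_min) / (t_max - t_min)."""
    return value * (t_max - t_min) + t_min

# Bounds are taken from the attribute list; the inputs are the first-row
# values of temp and atemp shown in the data preview.
temp_c = denormalize(0.24, t_min=-8, t_max=39)
atemp_c = denormalize(0.2879, t_min=-16, t_max=50)

print(round(temp_c, 2), round(atemp_c, 2))
```

Both values come out at roughly 3 °C, which is plausible for an early January hour.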

In [0]:
import numpy as np
import tensorflow as tf
import os
import pandas as pd

pd.options.display.max_columns=None

In [2]:
bike_df = pd.read_csv('/content/attachment_attachment_hour_lyst8188_lyst2724.csv')
bike_df.head()

Unnamed: 0,instant,dteday,season,yr,mnth,hr,holiday,weekday,workingday,weathersit,temp,atemp,hum,windspeed,casual,registered,cnt
0,1,2011-01-01,1,0,1,0,0,6,0,1,0.24,0.2879,0.81,0.0,3,13,16
1,2,2011-01-01,1,0,1,1,0,6,0,1,0.22,0.2727,0.8,0.0,8,32,40
2,3,2011-01-01,1,0,1,2,0,6,0,1,0.22,0.2727,0.8,0.0,5,27,32
3,4,2011-01-01,1,0,1,3,0,6,0,1,0.24,0.2879,0.75,0.0,3,10,13
4,5,2011-01-01,1,0,1,4,0,6,0,1,0.24,0.2879,0.75,0.0,0,1,1


### Dummy encode the categorical variables

In [0]:
season_dummies=pd.get_dummies(bike_df.season, prefix='season',drop_first=True)
mnth_dummies=pd.get_dummies(bike_df.mnth, prefix='month',drop_first=True)
weather_dummies=pd.get_dummies(bike_df.weathersit,prefix='weather',drop_first=True)
weekday_dummies=pd.get_dummies(bike_df.weekday,prefix='weekday',drop_first=True)
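
To see what `drop_first=True` does, here is a small sketch on a toy column coded like `season`. The toy values are an assumption for illustration only.

```python
import pandas as pd

# Toy column with the same coding as `season` (values 1 to 4).
season = pd.Series([1, 2, 3, 4, 1], name="season")

# drop_first=True omits the first level (season_1), which becomes the baseline.
season_dummies = pd.get_dummies(season, prefix="season", drop_first=True)

print(list(season_dummies.columns))
# A row with season == 1 is encoded as all zeros.
print(season_dummies.iloc[0].astype(int).tolist())
```

Dropping one level per variable avoids perfectly collinear dummy columns, which is why four seasons produce only three columns.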

**Append the dummy-encoded variables to the original DataFrame**

In [4]:
bike_df = pd.concat([bike_df,season_dummies,mnth_dummies,weather_dummies,weekday_dummies],axis=1)
bike_df.shape

(17379, 40)

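**Drop the original columns that were dummy-encoded, along with the record index, the date, and the casual/registered counts (which together make up the target `cnt`)**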

In [0]:
del bike_df['season']
del bike_df['mnth']
del bike_df['weekday']
del bike_df['weathersit']
del bike_df['instant']
del bike_df['dteday']
del bike_df['casual']
del bike_df['registered']

In [6]:
bike_df.shape

(17379, 32)
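
The two reported shapes can be checked by hand. This sketch assumes the level counts implied by the dummy column names (4 seasons, 12 months, 4 weather codes, 7 weekdays).

```python
original_columns = 17   # columns in the hourly file, as shown by head()
levels = {"season": 4, "mnth": 12, "weathersit": 4, "weekday": 7}

# drop_first=True yields (levels - 1) dummy columns per variable.
dummy_columns = sum(n - 1 for n in levels.values())

after_concat = original_columns + dummy_columns
after_drop = after_concat - 8   # eight columns removed with `del`

print(dummy_columns, after_concat, after_drop)
```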

In [7]:
bike_df.head()

Unnamed: 0,yr,hr,holiday,workingday,temp,atemp,hum,windspeed,cnt,season_2,season_3,season_4,month_2,month_3,month_4,month_5,month_6,month_7,month_8,month_9,month_10,month_11,month_12,weather_2,weather_3,weather_4,weekday_1,weekday_2,weekday_3,weekday_4,weekday_5,weekday_6
0,0,0,0,0,0.24,0.2879,0.81,0.0,16,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
1,0,1,0,0,0.22,0.2727,0.8,0.0,40,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
2,0,2,0,0,0.22,0.2727,0.8,0.0,32,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
3,0,3,0,0,0.24,0.2879,0.75,0.0,13,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
4,0,4,0,0,0.24,0.2879,0.75,0.0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1


**Convert the target variable into float64 data type and define input features and output variable**

In [0]:
#Let's convert dtype of cnt to float
bike_df['cnt'] = bike_df['cnt'].astype('float64')

In [9]:
target = bike_df['cnt']
bike_df.drop(['cnt'],axis=1,inplace=True)
features = bike_df
print(features.shape)
print(target.shape)

(17379, 31)
(17379,)


In [10]:
features.describe()


Unnamed: 0,yr,hr,holiday,workingday,temp,atemp,hum,windspeed,season_2,season_3,season_4,month_2,month_3,month_4,month_5,month_6,month_7,month_8,month_9,month_10,month_11,month_12,weather_2,weather_3,weather_4,weekday_1,weekday_2,weekday_3,weekday_4,weekday_5,weekday_6
count,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0,17379.0
mean,0.502561,11.546752,0.02877,0.682721,0.496987,0.475775,0.627229,0.190098,0.253697,0.258703,0.243512,0.077162,0.084757,0.082686,0.085621,0.082859,0.085621,0.084873,0.082686,0.083492,0.082686,0.085333,0.261465,0.08165,0.000173,0.142643,0.141147,0.142413,0.142183,0.143104,0.144542
std,0.500008,6.914405,0.167165,0.465431,0.192556,0.17185,0.19293,0.12234,0.435139,0.437935,0.429214,0.266856,0.278528,0.275415,0.279811,0.275676,0.279811,0.2787,0.275415,0.276632,0.275415,0.279384,0.439445,0.273839,0.013138,0.349719,0.348184,0.349484,0.349248,0.350189,0.351649
min,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.0,6.0,0.0,0.0,0.34,0.3333,0.48,0.1045,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,1.0,12.0,0.0,1.0,0.5,0.4848,0.63,0.194,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,1.0,18.0,0.0,1.0,0.66,0.6212,0.78,0.2537,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
max,1.0,23.0,1.0,1.0,1.0,1.0,1.0,0.8507,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


### Input features are on different scales and should be standardized before being fed into the neural network model

In [0]:
from sklearn.preprocessing import StandardScaler
ss = StandardScaler()

In [0]:
features_ss = ss.fit_transform(features)
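
Note that `fit_transform` here sees every row, including those later held out for evaluation. A common alternative is to compute the scaling statistics on the training rows only and reuse them for the held-out rows. The sketch below illustrates this with NumPy on toy data. The arrays and variable names are illustrative assumptions, not part of the notebook. `StandardScaler` applies the same formula, (x - mean) / std, using the population standard deviation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(100, 4))   # toy feature matrix
X_train, X_test = X[:80], X[80:]                     # simple 80/20 split

# Statistics come from the training rows only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)          # ddof=0, population standard deviation

X_train_ss = (X_train - mean) / std
X_test_ss = (X_test - mean) / std  # reuse the training statistics

print(X_train_ss.mean(axis=0).round(6), X_train_ss.std(axis=0).round(6))
```

With scikit-learn the equivalent pattern would be `fit` on the training split followed by `transform` on both splits.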

**Split the inputs and outputs into training and validation sets (80/20)**

In [13]:
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(features_ss,target,test_size=0.2,random_state=12)
print(X_train.shape)
print(X_test.shape)
print(Y_train.shape)
print(Y_test.shape) 

(13903, 31)
(3476, 31)
(13903,)
(3476,)
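
The split sizes follow from `test_size=0.2`. This sketch assumes the held-out count is rounded up and the remainder goes to training, which is consistent with the shapes printed above.

```python
import math

n_samples = 17379
test_size = 0.2

# Assumption: with a float test_size, the held-out count is rounded up
# and the remaining rows form the training set.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
```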


## Build model
**Import the Keras classes needed to build the model**

In [0]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense,Dropout

In [0]:
model = Sequential()

In [0]:
model.add(Dense(150,activation='relu',input_shape=(31,)))      # 31 input features; 150 neurons in the first hidden layer

In [0]:
model.add(Dense(90,activation='relu'))
model.add(Dense(60,activation='relu'))
model.add(Dense(30,activation='relu'))
model.add(Dense(1,activation='relu'))       # Regression: a single output neuron (ReLU keeps predictions non-negative)
model.compile(optimizer='adam',loss='mse')  # Train on MSE for now; RMSE is computed later on the held-out data

In [18]:
model.summary()    # Use summary to find parameters used across different layers of the NN model

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 150)               4800      
_________________________________________________________________
dense_1 (Dense)              (None, 90)                13590     
_________________________________________________________________
dense_2 (Dense)              (None, 60)                5460      
_________________________________________________________________
dense_3 (Dense)              (None, 30)                1830      
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 31        
Total params: 25,711
Trainable params: 25,711
Non-trainable params: 0
_________________________________________________________________
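
The parameter counts in the summary can be reproduced by hand: each `Dense` layer has one weight per input-unit pair plus one bias per unit. A short sketch:

```python
layer_sizes = [31, 150, 90, 60, 30, 1]   # input features followed by each Dense layer

# A Dense layer has (inputs * units) weights plus one bias per unit.
params = [n_in * n_out + n_out
          for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

print(params, sum(params))
```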


**Set up TensorBoard to inspect how the parameters are distributed across the layers and how the optimizer behaves over the epochs**

In [0]:
# Save the model whenever the training loss (monitor='loss') reaches a new minimum
filepath = 'weight.bike.preprocess.best.hdf5'
checkpoint = tf.keras.callbacks.ModelCheckpoint(filepath,monitor='loss',verbose=1,save_best_only=True,mode='auto')
log_dir = './tf-log/bike_v4'
tb_cb = tf.keras.callbacks.TensorBoard(log_dir=log_dir)  

In [20]:
!wget -q https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
!unzip ngrok-stable-linux-amd64.zip

Archive:  ngrok-stable-linux-amd64.zip
  inflating: ngrok                   


In [0]:
get_ipython().system_raw(
    'tensorboard --logdir {} --host 0.0.0.0 --port 6006 &'
    .format(log_dir)
)

In [0]:
get_ipython().system_raw('./ngrok http 6006 &')

In [23]:
! curl -s http://localhost:4040/api/tunnels | python3 -c \
    "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"

http://ed9e0d4a.ngrok.io


**Open the link above to view the TensorBoard dashboard for the network**

In [24]:
'''from keras.callbacks import EarlyStopping
stop=EarlyStopping(monitor='loss', min_delta=0, patience=5, verbose=1, mode='auto')'''

"from keras.callbacks import EarlyStopping\nstop=EarlyStopping(monitor='loss', min_delta=0, patience=5, verbose=1, mode='auto')"

## Fit the model

In [25]:
model.fit(X_train,Y_train,epochs=100,batch_size=32,callbacks=[tb_cb,checkpoint])

Epoch 1/100
Epoch 00001: loss improved from inf to 25096.20898, saving model to weight.bike.preprocess.best.hdf5
Epoch 2/100
Epoch 00002: loss improved from 25096.20898 to 18337.32031, saving model to weight.bike.preprocess.best.hdf5
Epoch 3/100
Epoch 00003: loss improved from 18337.32031 to 15318.35938, saving model to weight.bike.preprocess.best.hdf5
Epoch 4/100
Epoch 00004: loss improved from 15318.35938 to 11242.10449, saving model to weight.bike.preprocess.best.hdf5
Epoch 5/100
Epoch 00005: loss improved from 11242.10449 to 9519.27246, saving model to weight.bike.preprocess.best.hdf5
Epoch 6/100
Epoch 00006: loss improved from 9519.27246 to 8096.86328, saving model to weight.bike.preprocess.best.hdf5
Epoch 7/100
Epoch 00007: loss improved from 8096.86328 to 6605.11377, saving model to weight.bike.preprocess.best.hdf5
Epoch 8/100
Epoch 00008: loss improved from 6605.11377 to 5274.40332, saving model to weight.bike.preprocess.best.hdf5
Epoch 9/100
Epoch 00009: loss improved from 527

<tensorflow.python.keras.callbacks.History at 0x7f928001e3c8>

## Validate the model on the validation data by computing *RMSE*

In [0]:
pred = model(X_test)

In [0]:
from sklearn.metrics import mean_squared_error

In [0]:
import numpy as np
y_test = np.array(Y_test)    # Convert Y_test (a pandas Series) to a NumPy array so it can be compared with the predictions

### RMSE

In [29]:
RMSE = np.sqrt(mean_squared_error(y_test,pred))
print('RMSE of the model\n',RMSE)

RMSE of the model
 40.71235987423827
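
RMSE is the square root of the mean squared difference between targets and predictions, so it is expressed in the same unit as `cnt` (bikes per hour). The sketch below computes it by hand on toy values, which are assumptions for illustration. It also shows a shape pitfall to keep in mind when doing this manually: the targets are one-dimensional, while the model output has a trailing dimension of size 1.

```python
import numpy as np

y_true = np.array([16.0, 40.0, 32.0])            # shape (3,), like y_test
y_pred = np.array([[18.0], [37.0], [32.0]])      # shape (3, 1), like the model output

# Flatten the predictions first: subtracting (3, 1) from (3,) directly
# would broadcast to a (3, 3) matrix and give a wrong result.
errors = y_true - y_pred.ravel()
rmse = np.sqrt(np.mean(errors ** 2))

print(rmse)
```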
