# The Regular Neural Network for the Prediction of Bike Sharing

The goal of this project is to predict cnt (the count of new bike shares) for each hour from the given factors. To achieve this, a regular, fully connected neural network is built with TensorFlow.

First of all, several essential packages and training data are imported. The data is from https://www.kaggle.com/c/cee-498-project1-london-bike-sharing 

In [16]:
import numpy as np
import tensorflow as tf
import pandas as pd

In [17]:
df = pd.read_csv("../input/cee-498-project1-london-bike-sharing/train.csv")
df

Unnamed: 0,timestamp,cnt,t1,t2,hum,wind_speed,weather_code,is_holiday,is_weekend,season
0,2015-01-04 02:00:00,134,2.5,2.5,96.5,0.0,1.0,0.0,1.0,3.0
1,2015-01-04 07:00:00,75,1.0,-1.0,100.0,7.0,4.0,0.0,1.0,3.0
2,2015-01-04 08:00:00,131,1.5,-1.0,96.5,8.0,4.0,0.0,1.0,3.0
3,2015-01-04 09:00:00,301,2.0,-0.5,100.0,9.0,3.0,0.0,1.0,3.0
4,2015-01-04 10:00:00,528,3.0,-0.5,93.0,12.0,3.0,0.0,1.0,3.0
...,...,...,...,...,...,...,...,...,...,...
12218,2017-01-03 19:00:00,1042,5.0,1.0,81.0,19.0,3.0,0.0,0.0,3.0
12219,2017-01-03 20:00:00,541,5.0,1.0,81.0,21.0,4.0,0.0,0.0,3.0
12220,2017-01-03 21:00:00,337,5.5,1.5,78.5,24.0,4.0,0.0,0.0,3.0
12221,2017-01-03 22:00:00,224,5.5,1.5,76.0,23.0,4.0,0.0,0.0,3.0


## Data Preprocessing

During preprocessing, "timestamp" is split into "year", "month", "day" and "hour". In addition, "t1" is dropped because "t1" and "t2" are highly correlated.

In [18]:
df = df.dropna(axis=0, how='any')  # dropna returns a new frame, so keep the result
df['timestamp'] = pd.to_datetime(df['timestamp'])
df['year'] = df['timestamp'].dt.year
df['month'] = df['timestamp'].dt.month
df['day'] = df['timestamp'].dt.day
df['hour'] = df['timestamp'].dt.hour
df = df.drop(['timestamp','t1'], axis=1)
df

Unnamed: 0,cnt,t2,hum,wind_speed,weather_code,is_holiday,is_weekend,season,year,month,day,hour
0,134,2.5,96.5,0.0,1.0,0.0,1.0,3.0,2015,1,4,2
1,75,-1.0,100.0,7.0,4.0,0.0,1.0,3.0,2015,1,4,7
2,131,-1.0,96.5,8.0,4.0,0.0,1.0,3.0,2015,1,4,8
3,301,-0.5,100.0,9.0,3.0,0.0,1.0,3.0,2015,1,4,9
4,528,-0.5,93.0,12.0,3.0,0.0,1.0,3.0,2015,1,4,10
...,...,...,...,...,...,...,...,...,...,...,...,...
12218,1042,1.0,81.0,19.0,3.0,0.0,0.0,3.0,2017,1,3,19
12219,541,1.0,81.0,21.0,4.0,0.0,0.0,3.0,2017,1,3,20
12220,337,1.5,78.5,24.0,4.0,0.0,0.0,3.0,2017,1,3,21
12221,224,1.5,76.0,23.0,4.0,0.0,0.0,3.0,2017,1,3,22
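The reason given for dropping "t1" is its correlation with "t2". A minimal sketch of how that could be checked is shown here, using a hand-written Pearson coefficient so that it runs without the data file. The function name `pearson` is introduced for illustration only, and the nine value pairs are copied from the training rows displayed earlier, which are far too few to draw any conclusion about the full data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# t1 and t2 values copied from the nine training rows displayed earlier;
# this tiny excerpt only illustrates the call and is not the full data.
t1 = [2.5, 1.0, 1.5, 2.0, 3.0, 5.0, 5.0, 5.5, 5.5]
t2 = [2.5, -1.0, -1.0, -0.5, -0.5, 1.0, 1.0, 1.5, 1.5]
r_excerpt = pearson(t1, t2)
print(f"Pearson r on the excerpt: {r_excerpt:.3f}")
```

In the notebook itself, `df['t1'].corr(df['t2'])` computes the same coefficient over the full columns, provided it is called before "t1" is dropped.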


The columns are converted to a TensorFlow dataset. "cnt" is the target and the remaining columns are the input features.

In [19]:
cnt = df.pop('cnt')
dataset = tf.data.Dataset.from_tensor_slices((df.values, cnt.values))

In [20]:
for feat, targ in dataset.take(5):
  print ('Features: {}, Target: {}'.format(feat, targ))

Features: [2.500e+00 9.650e+01 0.000e+00 1.000e+00 0.000e+00 1.000e+00 3.000e+00
 2.015e+03 1.000e+00 4.000e+00 2.000e+00], Target: 134
Features: [-1.000e+00  1.000e+02  7.000e+00  4.000e+00  0.000e+00  1.000e+00
  3.000e+00  2.015e+03  1.000e+00  4.000e+00  7.000e+00], Target: 75
Features: [-1.000e+00  9.650e+01  8.000e+00  4.000e+00  0.000e+00  1.000e+00
  3.000e+00  2.015e+03  1.000e+00  4.000e+00  8.000e+00], Target: 131
Features: [-5.000e-01  1.000e+02  9.000e+00  3.000e+00  0.000e+00  1.000e+00
  3.000e+00  2.015e+03  1.000e+00  4.000e+00  9.000e+00], Target: 301
Features: [-5.000e-01  9.300e+01  1.200e+01  3.000e+00  0.000e+00  1.000e+00
  3.000e+00  2.015e+03  1.000e+00  4.000e+00  1.000e+01], Target: 528


The training dataset is shuffled, split into mini-batches of 150 samples, and repeated 20 times.

In [21]:
train_dataset = dataset.shuffle(len(df)).batch(batch_size=150).repeat(20)

In [22]:
for feat, targ in train_dataset.take(1):
  print ('Features: {}, Target: {}'.format(feat, targ))

Features: [[14.  88.  24.  ... 10.  22.   2. ]
 [ 3.5 87.   6.5 ... 11.  11.   7. ]
 [17.5 91.   2.  ...  6.  20.  18. ]
 ...
 [14.  72.  17.  ...  5.  17.  22. ]
 [ 8.  88.  12.  ... 10.  13.   8. ]
 [ 3.  76.   7.  ...  2.  11.  12. ]], Target: [  68 1967 1039 2520  466 1455   68  594  152   35 4034  594 1263  562
  408 1154  171 1322  617  720  109   74  650 1733  186 1824 2553 1788
  579  105   98 1104 1735  121 2289 1805 2636 3913 1896 5008 2828  387
  120  853  204 1739 1721  585 2059 1228  623  121  351  840 3916  150
 1363 1499  493  129  176 1911  918  514  334  302  982   58 1709 2524
   52 1129  453  563 1555 1485   39  739 4953  852 4470 2754 2876  780
  120 3680 1708 1160 1530 3027  592 1544  154   62  572  964   46  992
  307  607 2244 2037 5117 1086  169  727  738  295  146  966 1010  101
  511  208 1208  210   93 1071 1771  166 2488 1076 2761 2013  381 4133
 2710 1751  310 2675 1469  282 1060 1482  975   66   60 4323  143 1399
  872 2053  478 1578  269  822 2437  640 41
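The effect of `batch(batch_size=150)` followed by `repeat(20)` can be worked out by hand. The sketch below assumes the training frame keeps all 12222 displayed rows (index 0 to 12221), that is, that no rows are removed by `dropna`. The variable names are chosen for illustration.

```python
import math

n_rows = 12222       # rows in the training frame as displayed (index 0 to 12221)
batch_size = 150     # from .batch(batch_size=150)
repeats = 20         # from .repeat(20)

# Batching happens before repeating, so every pass ends with a partial batch.
batches_per_pass = math.ceil(n_rows / batch_size)
last_batch_size = n_rows - (batches_per_pass - 1) * batch_size

# Keras runs a finite dataset to exhaustion in each epoch, so one Keras
# epoch here covers all repeated passes.
steps_per_keras_epoch = batches_per_pass * repeats

print(batches_per_pass, last_batch_size, steps_per_keras_epoch)  # 82 72 1640
```

Under these assumptions, each epoch reported by `model.fit` corresponds to 20 passes over the training data rather than one.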

## The Architecture of the Model

The layers are shown below. The loss is the mean squared error. The learning rate is adjusted from epoch to epoch during training.
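To give a sense of the model size, the parameter count of the layer stack used in this section (11 inputs, Dense layers of 32, 64, 128, 64 and 1 units, with batch normalization after each of the first four) can be computed by hand. This is a sketch: the helper names are hypothetical, and it assumes the Keras defaults of one bias per Dense unit and four values per feature in `BatchNormalization`, two of them non-trainable.

```python
def dense_params(n_in, n_out):
    # weight matrix plus one bias per output unit
    return n_in * n_out + n_out

def batchnorm_params(n_features):
    # gamma and beta (trainable) plus moving mean and variance (non-trainable)
    return 4 * n_features

widths = [11, 32, 64, 128, 64, 1]  # input size followed by the Dense layer widths

total = 0
non_trainable = 0
for i, (n_in, n_out) in enumerate(zip(widths[:-1], widths[1:])):
    total += dense_params(n_in, n_out)
    has_batchnorm = i < len(widths) - 2  # the final Dense(1) has no BatchNormalization
    if has_batchnorm:
        total += batchnorm_params(n_out)
        non_trainable += 2 * n_out

print(total, total - non_trainable)  # 20289 19713
```

Calling `model.summary()` after the model is built should report matching totals if these assumptions hold.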

In [23]:
model = tf.keras.Sequential(
    [tf.keras.layers.Dense(units=32, input_shape=(11,)),
     tf.keras.layers.BatchNormalization(),
     tf.keras.layers.Dense(64, activation='relu'),
     tf.keras.layers.BatchNormalization(),
     tf.keras.layers.Dense(128, activation='relu'),
     tf.keras.layers.BatchNormalization(),
     tf.keras.layers.Dense(64, activation='relu'),
     tf.keras.layers.BatchNormalization(),
     tf.keras.layers.Dense(1)
    ])

model.compile(
     optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
     loss='mean_squared_error',
    )

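def scheduler(epoch, lr):
    # Keep the initial rate for the first 10 epochs, then multiply the
    # current rate by exp(-0.1) ** (epoch // 4 - 2) at every epoch.
    if epoch < 10:
        return lr
    else:
        # return a plain float, which LearningRateScheduler expects
        return float(lr * np.exp(-0.1) ** (epoch // 4 - 2))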

callback = [tf.keras.callbacks.LearningRateScheduler(scheduler),
            tf.keras.callbacks.EarlyStopping(monitor='loss',
                                             min_delta=0, patience=32,
                                             mode='min', restore_best_weights=True)]

history = model.fit(train_dataset, epochs=150, callbacks=callback)  # callback is already a list

Epoch 1/150
...
Epoch 78
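The learning-rate rule used above compounds, because `LearningRateScheduler` passes the current learning rate into the schedule function at the start of each epoch and stores whatever the function returns. The sketch below replays that loop in plain Python for the first 20 epochs. It is an idealized trace in double precision and uses `math.exp` in place of the TensorFlow call; epoch indices are zero-based, so index 0 corresponds to "Epoch 1" in the log.

```python
import math

def scheduler(epoch, lr):
    # same rule as in the training cell, written with math.exp
    if epoch < 10:
        return lr
    return lr * math.exp(-0.1) ** (epoch // 4 - 2)

lr = 0.001  # initial Adam learning rate
rates = []
for epoch in range(20):
    lr = scheduler(epoch, lr)  # the rate left by the previous epoch is passed back in
    rates.append(lr)

for epoch in (0, 9, 11, 12, 15, 16, 19):
    print(f"epoch {epoch:2d}: lr = {rates[epoch]:.6f}")
```

The rate stays at 0.001 through epoch index 11, then shrinks by a factor of exp(-0.1) per epoch for indices 12 to 15, by exp(-0.2) per epoch for indices 16 to 19, and so on, so the decay accelerates as training proceeds.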

## The Test of Model

The test data is imported and preprocessed in the same way as the training data so that the model can be evaluated on it.

The performance of the neural network is evaluated by the root-mean-square error (RMSE) of its predictions.
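For reference, RMSE is the square root of the mean squared error, the same quantity used as the training loss, so it is expressed in the units of cnt. A minimal sketch of the computation follows; the function name `rmse` and the sample values are hypothetical and serve only to illustrate the call.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical counts, chosen only to illustrate the call.
actual = [100, 200, 300]
predicted = [110, 190, 330]
print(rmse(actual, predicted))
```

Because the test file shown here carries no "cnt" column, a local estimate would require holding out part of the training data, which is an assumption beyond what this notebook does.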

In [24]:
df2 = pd.read_csv("../input/cee-498-project1-london-bike-sharing/test.csv")
df2 = df2.dropna(axis=0, how='any')  # dropna returns a new frame, so keep the result
df2['timestamp'] = pd.to_datetime(df2['timestamp'])
df2['year'] = df2['timestamp'].dt.year
df2['month'] = df2['timestamp'].dt.month
df2['day'] = df2['timestamp'].dt.day
df2['hour'] = df2['timestamp'].dt.hour
df_test = df2.drop(['timestamp','t1'], axis=1)
df_test

Unnamed: 0,t2,hum,wind_speed,weather_code,is_holiday,is_weekend,season,year,month,day,hour
0,2.0,93.0,6.0,3.0,0.0,1.0,3.0,2015,1,4,0
1,2.5,93.0,5.0,1.0,0.0,1.0,3.0,2015,1,4,1
2,2.0,100.0,0.0,1.0,0.0,1.0,3.0,2015,1,4,3
3,0.0,93.0,6.5,1.0,0.0,1.0,3.0,2015,1,4,4
4,2.0,93.0,4.0,1.0,0.0,1.0,3.0,2015,1,4,5
...,...,...,...,...,...,...,...,...,...,...,...
5186,2.5,76.0,11.0,1.0,1.0,0.0,3.0,2017,1,2,16
5187,0.0,81.0,11.0,1.0,1.0,0.0,3.0,2017,1,2,19
5188,-1.5,81.0,14.0,1.0,0.0,0.0,3.0,2017,1,3,9
5189,0.0,78.0,21.0,1.0,0.0,0.0,3.0,2017,1,3,11


In [25]:
test_dataset = tf.data.Dataset.from_tensor_slices(df_test.values).batch(1)
for feat in test_dataset.take(5):
  print ('Features: {}'.format(feat))

Features: [[2.000e+00 9.300e+01 6.000e+00 3.000e+00 0.000e+00 1.000e+00 3.000e+00
  2.015e+03 1.000e+00 4.000e+00 0.000e+00]]
Features: [[2.500e+00 9.300e+01 5.000e+00 1.000e+00 0.000e+00 1.000e+00 3.000e+00
  2.015e+03 1.000e+00 4.000e+00 1.000e+00]]
Features: [[2.000e+00 1.000e+02 0.000e+00 1.000e+00 0.000e+00 1.000e+00 3.000e+00
  2.015e+03 1.000e+00 4.000e+00 3.000e+00]]
Features: [[0.000e+00 9.300e+01 6.500e+00 1.000e+00 0.000e+00 1.000e+00 3.000e+00
  2.015e+03 1.000e+00 4.000e+00 4.000e+00]]
Features: [[2.000e+00 9.300e+01 4.000e+00 1.000e+00 0.000e+00 1.000e+00 3.000e+00
  2.015e+03 1.000e+00 4.000e+00 5.000e+00]]


In [26]:
prediction = model.predict(test_dataset)

In [27]:
df2['cnt'] = prediction.flatten()  # model.predict returns an array of shape (n, 1)
df2

Unnamed: 0,timestamp,t1,t2,hum,wind_speed,weather_code,is_holiday,is_weekend,season,year,month,day,hour,cnt
0,2015-01-04 00:00:00,3.0,2.0,93.0,6.0,3.0,0.0,1.0,3.0,2015,1,4,0,405.224823
1,2015-01-04 01:00:00,3.0,2.5,93.0,5.0,1.0,0.0,1.0,3.0,2015,1,4,1,309.454895
2,2015-01-04 03:00:00,2.0,2.0,100.0,0.0,1.0,0.0,1.0,3.0,2015,1,4,3,79.884674
3,2015-01-04 04:00:00,2.0,0.0,93.0,6.5,1.0,0.0,1.0,3.0,2015,1,4,4,125.285057
4,2015-01-04 05:00:00,2.0,2.0,93.0,4.0,1.0,0.0,1.0,3.0,2015,1,4,5,101.978416
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5186,2017-01-02 16:00:00,5.0,2.5,76.0,11.0,1.0,1.0,0.0,3.0,2017,1,2,16,1102.733032
5187,2017-01-02 19:00:00,3.0,0.0,81.0,11.0,1.0,1.0,0.0,3.0,2017,1,2,19,393.207184
5188,2017-01-03 09:00:00,2.5,-1.5,81.0,14.0,1.0,0.0,0.0,3.0,2017,1,3,9,1754.050659
5189,2017-01-03 11:00:00,4.0,0.0,78.0,21.0,1.0,0.0,0.0,3.0,2017,1,3,11,666.676514


In [28]:
output = df2.drop(['t1','t2','hum','wind_speed','weather_code','is_holiday','is_weekend','season','year','month','day','hour'], axis=1)

The CSV file of results is uploaded to Kaggle for evaluation.

In [29]:
output.to_csv('output_32_128_128_64_1_001xe0.1xepoch4-2_min_loss_32.csv',index=False)

The model is saved in h5 format.

In [30]:
model.save("32_128_128_64_1_001xe0.1xepoch4-2_min_loss_32.h5")