
# Predicting bike sharing data with tune for hyperparameter tuning


This notebook is based on the Udacity Deep Learning Nanodegree project for predicting bike-sharing data, which can be found here:

https://github.com/udacity/deep-learning-v2-pytorch/blob/master/project-bikesharing/Predicting_bike_sharing_data.ipynb

I use the Udacity implementation as the starting point for a self-learning project: a variation in which the model is a fully connected network (FCN) and the hyperparameters are tuned with Tune [1].

[1] https://ray.readthedocs.io/en/latest/tune.html

## Loading dataset from GitHub
The original dataset is located here:

https://raw.githubusercontent.com/udacity/deep-learning-v2-pytorch/master/project-bikesharing/Bike-Sharing-Dataset/hour.csv

which originally came from [1].

[1] Fanaee-T, Hadi, and Gama, Joao, "Event labeling combining ensemble detectors and background knowledge", Progress in Artificial Intelligence (2013): pp. 1-15, Springer Berlin Heidelberg, doi:10.1007/s13748-013-0040-3.

In [1]:
# Fetch a single file using the raw GitHub URL.
!curl --remote-name \
     -H 'Accept: application/vnd.github.v3.raw' \
     --location https://raw.githubusercontent.com/udacity/deep-learning-v2-pytorch/master/project-bikesharing/Bike-Sharing-Dataset/hour.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1129k  100 1129k    0     0  3679k      0 --:--:-- --:--:-- --:--:-- 3679k


In [2]:
import pandas as pd

rides = pd.read_csv("/content/hour.csv")
rides_origin = rides
rides.head()

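|   | instant | dteday | season | yr | mnth | hr | holiday | weekday | workingday | weathersit | temp | atemp | hum | windspeed | casual | registered | cnt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2011-01-01 | 1 | 0 | 1 | 0 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.81 | 0.0 | 3 | 13 | 16 |
| 1 | 2 | 2011-01-01 | 1 | 0 | 1 | 1 | 0 | 6 | 0 | 1 | 0.22 | 0.2727 | 0.8 | 0.0 | 8 | 32 | 40 |
| 2 | 3 | 2011-01-01 | 1 | 0 | 1 | 2 | 0 | 6 | 0 | 1 | 0.22 | 0.2727 | 0.8 | 0.0 | 5 | 27 | 32 |
| 3 | 4 | 2011-01-01 | 1 | 0 | 1 | 3 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.75 | 0.0 | 3 | 10 | 13 |
| 4 | 5 | 2011-01-01 | 1 | 0 | 1 | 4 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.75 | 0.0 | 0 | 1 | 1 |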


### Convert data
- one-hot encode the categorical features *season*, *weathersit*, *mnth*, *hr* and *weekday*, then drop the original columns
- drop the fields *instant*, *dteday*, *atemp* and *workingday*, as in the Udacity project
- additionally drop the fields *casual* and *registered* to focus on the overall output *cnt*
- shift and scale the continuous features *cnt*, *temp*, *hum* and *windspeed* so that they have zero mean and a standard deviation of 1
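
The shift-and-scale step can be illustrated in isolation. The following is a minimal sketch, assuming only the first five *cnt* values from the preview above, so the mean and standard deviation differ from those of the full dataset. It uses the sample standard deviation (n - 1), which is also the default of pandas' `Series.std()`, and shows that keeping the mean and standard deviation is enough to map scaled values back to bike counts.

```python
import statistics

# First five cnt values from the preview above (not the full dataset).
counts = [16, 40, 32, 13, 1]

mean = statistics.mean(counts)
std = statistics.stdev(counts)  # sample standard deviation (n - 1)

# Standardize: subtract the mean, divide by the standard deviation.
scaled = [(c - mean) / std for c in counts]

# Invert the scaling to get back the original counts.
restored = [s * std + mean for s in scaled]

print(scaled)
print(restored)
```

This round trip is why the mean and standard deviation of each feature are worth storing: predictions made in scaled units can be converted back to rental counts later.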

In [3]:
for feature in ['season', 'weathersit', 'mnth', 'hr', 'weekday']:
  hot_encoded_features = pd.get_dummies(rides[feature], prefix=feature, drop_first=False)
  rides = pd.concat([rides, hot_encoded_features], axis=1)
  rides = rides.drop(feature, axis=1)
rides = rides.drop(['instant', 'dteday', 'atemp', 'workingday', 'casual', 'registered'], axis=1)

feature_scaling_store = {}

for feature in ['cnt', 'temp', 'hum', 'windspeed']:
  mean, std = rides[feature].mean(), rides[feature].std()
  feature_scaling_store[feature] = [mean, std]
  rides.loc[:, feature] = (rides[feature] - mean) / std

rides.head()

Unnamed: 0,yr,holiday,temp,hum,windspeed,cnt,season_1,season_2,season_3,season_4,weathersit_1,weathersit_2,weathersit_3,weathersit_4,mnth_1,mnth_2,mnth_3,mnth_4,mnth_5,mnth_6,mnth_7,mnth_8,mnth_9,mnth_10,mnth_11,mnth_12,hr_0,hr_1,hr_2,hr_3,hr_4,hr_5,hr_6,hr_7,hr_8,hr_9,hr_10,hr_11,hr_12,hr_13,hr_14,hr_15,hr_16,hr_17,hr_18,hr_19,hr_20,hr_21,hr_22,hr_23,weekday_0,weekday_1,weekday_2,weekday_3,weekday_4,weekday_5,weekday_6
0,0,0,-1.334609,0.947345,-1.553844,-0.956312,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
1,0,0,-1.438475,0.895513,-1.553844,-0.823998,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
2,0,0,-1.438475,0.895513,-1.553844,-0.868103,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
3,0,0,-1.334609,0.636351,-1.553844,-0.972851,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
4,0,0,-1.334609,0.636351,-1.553844,-1.039008,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1


### Split into training, testing and validation sets
Each entry in the data records how many bikes were rented during one specific hour of a day. hour.csv contains 17,379 entries in total, which, divided by 24, corresponds to data points for approximately 724 days.

The last 21 days (3%) are used as testing data.

Of the remaining days, 60 days (8.5%) are used as validation data.

The training data consists of 643 days.
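
The numbers above can be checked with a few lines of arithmetic. This is a small sketch that assumes every day contributes exactly 24 rows, which is the same simplification the text makes when dividing by 24.

```python
HOURS_PER_DAY = 24
total_rows = 17379  # number of entries in hour.csv, as stated above

total_days = total_rows / HOURS_PER_DAY
test_days, validation_days = 21, 60

# Row counts of each split, taken from the end of the data.
test_rows = test_days * HOURS_PER_DAY
validation_rows = validation_days * HOURS_PER_DAY
train_rows = total_rows - test_rows - validation_rows

test_share = 100 * test_days / total_days
validation_share = 100 * validation_days / (total_days - test_days)
train_days = train_rows // HOURS_PER_DAY

print(round(total_days, 1))        # 724.1
print(round(test_share, 1))        # 2.9
print(round(validation_share, 1))  # 8.5
print(train_days)                  # 643
```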

In [0]:
test_data = rides[-21*24:]
rides = rides[:-21*24]

validation_data = rides[-60*24:]
rides = rides[:-60*24]

train_data = rides

target_fields = ['cnt']

features_train, targets_train = train_data.drop(target_fields, axis=1), train_data[target_fields]
features_validation, targets_validation = validation_data.drop(target_fields, axis=1), validation_data[target_fields]
features_test, targets_test = test_data.drop(target_fields, axis=1), test_data[target_fields]

## Network architecture
I define the necessary parts of the architecture and try them out before handing over to tune.

### Hyperparameter

In [0]:
hyperparameter = {'learning_rate': 0.01, 'hidden_nodes': 25, 'epochs': 4000, 'batch_size': 1}

### Model

In [0]:
import torch.nn as nn
import torch

class BikeSharingModel(nn.Module):
  def __init__(self, input_nodes, hidden_nodes):
    super(BikeSharingModel, self).__init__()
    self.fc_1 = nn.Linear(input_nodes, hidden_nodes)
    self.fc_2 = nn.Linear(hidden_nodes, 1)

  def forward(self, x):
    x = self.fc_1(x)
    x = torch.sigmoid(x)
    x = self.fc_2(x)

    return x  

In [7]:
input_nodes = features_train.shape[1]

bikeSharingModel = BikeSharingModel(input_nodes, hyperparameter['hidden_nodes'])
bikeSharingModel

BikeSharingModel(
  (fc_1): Linear(in_features=56, out_features=25, bias=True)
  (fc_2): Linear(in_features=25, out_features=1, bias=True)
)
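
The layer sizes printed above can be reproduced by hand. The sketch below counts the 56 input features from the conversion step (one column per category value plus the five columns that are kept as they are) and then the trainable parameters of the two linear layers, assuming each layer has a bias term as shown in the printout.

```python
# Number of one-hot columns created per categorical feature.
one_hot_widths = {'season': 4, 'weathersit': 4, 'mnth': 12, 'hr': 24, 'weekday': 7}
# Columns that remain as single features (cnt is the target, so it is excluded).
passthrough = ['yr', 'holiday', 'temp', 'hum', 'windspeed']

input_nodes = sum(one_hot_widths.values()) + len(passthrough)
hidden_nodes = 25

def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

total_params = linear_params(input_nodes, hidden_nodes) + linear_params(hidden_nodes, 1)

print(input_nodes)   # 56
print(total_params)  # 1451
```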

### Loss-function

In [0]:
criterion = nn.MSELoss()
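
For reference, `nn.MSELoss()` with its default settings averages the squared differences between predictions and targets. The following plain-Python sketch computes the same quantity on made-up numbers; the values are purely illustrative and are not taken from the dataset.

```python
def mean_squared_error(predictions, targets):
    """Average of the squared differences between predictions and targets."""
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared_errors) / len(squared_errors)

# Hypothetical scaled predictions and targets.
predictions = [0.5, -1.0, 2.0]
targets = [0.0, -1.0, 1.0]

mse = mean_squared_error(predictions, targets)
print(mse)  # (0.25 + 0.0 + 1.0) / 3
```

Because *cnt* is standardized, a loss of 1.0 roughly corresponds to always predicting the mean count.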

### Optimizer

In [0]:
from torch import optim

optimizer = optim.Adam(bikeSharingModel.parameters(), lr=hyperparameter['learning_rate'])

## Train, validate and test/inference
Each training step (counted as an epoch in the code below) uses one batch of `batch_size` rows sampled at random from the training data.
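
The batch sampling can be sketched without any dependencies. This is an illustration only: the helper name `sample_batch_indices` is hypothetical, and it draws with replacement, which matches the default behaviour of `np.random.choice`.

```python
import random

def sample_batch_indices(n_rows, batch_size, rng):
    """Draw batch_size row positions uniformly at random, with replacement."""
    return rng.choices(range(n_rows), k=batch_size)

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
n_rows = 17379 - 21 * 24 - 60 * 24  # training rows left after the split

batch = sample_batch_indices(n_rows, batch_size=128, rng=rng)
print(len(batch), min(batch), max(batch))
```

One consequence of this scheme: with a batch size of 1 and 4000 steps, the model sees at most 4000 of the 15,435 training rows, so larger batches expose it to considerably more data per run.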

In [10]:
# preparing for using tune
!pip install ray
!pip uninstall -y pyarrow
# cleanup tune log dir if exists
!rm -rf tune_logs

Collecting ray
  Downloading https://files.pythonhosted.org/packages/a7/21/de080b8458de41fd8ac20dd96547418c0ed94eab4359e5885f4041d61337/ray-0.7.3-cp36-cp36m-manylinux1_x86_64.whl (48.5MB)
Collecting protobuf>=3.8.0 (from ray)
  Downloading https://files.pythonhosted.org/packages/eb/f4/a27952733796330cd17c17ea1f974459f5fefbbad119c0f296a6d807fec3/protobuf-3.9.1-cp36-cp36m-manylinux1_x86_64.whl (1.2MB)
Collecting colorama (from ray)
  Downloading https://files.pythonhosted.org/packages/4f/a6/728666f39bfff1719fc94c481890b2106837da9318031f71a8424b662e12/colorama-0.4.1-py2.py3-none-any.whl
Collecting redis (from ray)
  Downloading https://files.pythonhosted.org/packages/bd/64/b1e90af9bf0c7f6ef55e46b81ab527b33b785824d65300bb65636534b530/redis-3.3.8-py2.py3-none-any.whl (66kB)
Collecting funcsigs 

Uninstalling pyarrow-0.14.1:
  Successfully uninstalled pyarrow-0.14.1


In [11]:
import numpy as np
import torch
import sys
from ray import tune

def train_validate_and_test_with_tune_tracking(features_train, targets_train, features_validation, targets_validation, features_test, targets_test, hyperparameter, with_tune_tracking=False):
  input_nodes = features_train.shape[1]
  bikeSharingModel = BikeSharingModel(input_nodes, hyperparameter['hidden_nodes'])
  optimizer = optim.Adam(bikeSharingModel.parameters(), lr=hyperparameter['learning_rate'])
  criterion = nn.MSELoss()

  train_losses, validation_losses = [], []
  for epoch in range(1, hyperparameter['epochs']+1):
    batch_indices = np.random.choice(features_train.index, size=hyperparameter['batch_size'])
    features = torch.tensor(features_train.iloc[batch_indices].values).float()
    targets = torch.tensor(targets_train.iloc[batch_indices].values).float()

    optimizer.zero_grad()

    output = bikeSharingModel(features)

    loss = criterion(output, targets)

    loss.backward()

    optimizer.step()

    train_losses.append(loss.item())

    # validate
    with torch.no_grad():
      bikeSharingModel.eval()
      output_validation = bikeSharingModel(torch.tensor(features_validation.values).float())
      loss_validation = criterion(output_validation, torch.tensor(targets_validation.values).float())

    bikeSharingModel.train()

    validation_losses.append(loss_validation.item())

    # tune tracking
    if with_tune_tracking:
      tune.track.log(validation_loss_metric=loss_validation.item())

    # output
    if not with_tune_tracking:
      # .item() prints plain floats instead of tensor representations
      sys.stdout.write("\rEpoch: {} Training loss: {} Validation loss: {}".format(epoch, loss.item(), loss_validation.item()))
      sys.stdout.flush()

  # test
  with torch.no_grad():
    bikeSharingModel.eval()
    output_test = bikeSharingModel(torch.tensor(features_test.values).float())
    test_loss = criterion(output_test, torch.tensor(targets_test.values).float())
    bikeSharingModel.train()

  # tune tracking: train_losses is a plain list and has no .item(),
  # so report the scalar test loss instead
  if with_tune_tracking:
    tune.track.log(test_loss_metric=test_loss.item())

  # output
  if not with_tune_tracking:
    print("Test loss ", test_loss.item())

  return train_losses, validation_losses, test_loss, output_test

ImportError: ignored

In [0]:
import matplotlib.pyplot as plt

def plot(train_losses, validation_losses, test_loss, output_test, targets_test):
  plt.plot(train_losses, label='Training loss')
  plt.plot(validation_losses, label='Validation loss')
  plt.legend()
  _ = plt.ylim(0, 0.75)

  fig, ax = plt.subplots(figsize=(8,4))

  mean, std = feature_scaling_store['cnt']
  ax.plot(output_test.numpy()*std + mean, label='Prediction')
  ax.plot((targets_test['cnt']*std + mean).values, label='Data')
  ax.set_xlim(right=len(output_test))
  ax.legend()
  dates = pd.to_datetime(rides_origin.iloc[test_data.index]['dteday'])
  dates = dates.apply(lambda d: d.strftime('%b %d'))
  ax.set_xticks(np.arange(len(dates))[12::24])
  _ = ax.set_xticklabels(dates[12::24], rotation=45)

## Manual hyperparameter tuning

In [0]:
hyperparameter = {'learning_rate': 0.01, 'hidden_nodes': 25, 'epochs': 4000, 'batch_size': 1}
train_losses, validation_losses, test_loss, output_test = train_validate_and_test_with_tune_tracking(features_train, targets_train, features_validation, targets_validation, features_test, targets_test, hyperparameter)
plot(train_losses, validation_losses, test_loss, output_test, targets_test)

In [0]:
# try other hyperparameter
hyperparameter = {'learning_rate': 0.5, 'hidden_nodes': 25, 'epochs': 4000, 'batch_size': 128}
train_losses, validation_losses, test_loss, output_test = train_validate_and_test_with_tune_tracking(features_train, targets_train, features_validation, targets_validation, features_test, targets_test, hyperparameter)
plot(train_losses, validation_losses, test_loss, output_test, targets_test)

## Hyperparameter tuning with Tune

In [0]:
def tune_it(config):
  train_validate_and_test_with_tune_tracking(features_train, targets_train, features_validation, targets_validation, features_test, targets_test, config, with_tune_tracking=True)

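tune_result = tune.run(tune_it,
                   config={'learning_rate': tune.grid_search([0.01, 0.5, 0.4]),
                            'hidden_nodes': tune.grid_search([15, 25, 30]),
                            'epochs': 4,  # reduced from 4000 for a quick trial run
                            'batch_size': tune.grid_search([1, 128])
                            },
                   local_dir='tune_logs',
                   verbose=0)

# Use the metric name that is actually logged via tune.track.log.
# Lower loss is better; mode='min' assumes a Tune version whose
# get_best_config accepts a mode argument.
print("Best config: ", tune_result.get_best_config(metric='validation_loss_metric', mode='min'))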
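
To make clear what the grid search above enumerates, here is a dependency-free sketch that expands the same search space into individual trials and picks the one with the lowest score. It does not call Tune; `expand_grid` and `fake_validation_loss` are hypothetical helpers, and the scoring function is an arbitrary stand-in for a real training run.

```python
import itertools

# Same values as in the tune.grid_search calls above.
search_space = {
    'learning_rate': [0.01, 0.5, 0.4],
    'hidden_nodes': [15, 25, 30],
    'batch_size': [1, 128],
}

def expand_grid(space):
    """Return one config dict per combination of the listed values."""
    names = list(space)
    combinations = itertools.product(*(space[name] for name in names))
    return [dict(zip(names, values)) for values in combinations]

def fake_validation_loss(config):
    """Arbitrary stand-in score, NOT a real training result."""
    penalty = 0.0 if config['batch_size'] == 128 else 0.1
    return abs(config['learning_rate'] - 0.01) + abs(config['hidden_nodes'] - 25) / 100 + penalty

trials = expand_grid(search_space)
best = min(trials, key=fake_validation_loss)

print(len(trials))  # 18
print(best)
```

The grid grows multiplicatively (3 x 3 x 2 = 18 trials here), which is one reason to consider other search strategies once more hyperparameters are added.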

In [0]:
%load_ext tensorboard
%tensorboard --logdir tune_logs/tune_it

## Further steps/TODO

- test with adding field *workingday*
- test with making field *holiday* categorical
- test with removing field *yr*
- test with adding field *atemp*


- check standard deviation
- drop header in data
- use GPU
- add own dataloader
- add dropout
- transformation of tensor/dataframe for data sets is not dry


- losses of this model differ from losses of original implementation with same hyperparameter
- in the original project, the weights of the network are initialized


- change Tune from grid search to Bayesian optimization and write comments
- try different bag sizes for test/validation/train sets
- build the FCN following the best practices by Andrej Karpathy