# Sales Volume Prediction: Outsmarting Bike Sharing with Machine Learning

<img src="../assets/ac-logo.png" style="height: 150px">

### About me:
Alex Divivi

alex.divivi@artificial-connect.com

Data Scientist and Developer

# What even is Machine Learning?

![title](../assets/machine-learning1.png)

## Linear Regression

![title](../assets/regression-ml-requirements.jpg)

## Difference between ML and DL

![title](../assets/main-qimg-6c1dc5666bd31bf16120d332957b4059.png)

## Data Science Workflow

![title](../assets/crisp-dm-4-problems-fig1.png)


# Let's apply Machine Learning

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
import torch
import torch.nn as nn

## The data

Data source: https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset

Download the data here: https://drive.google.com/file/d/14IeffIWwjWnWkOQy5vG33kuVLYuuq8Ni/view?usp=sharing

Place the .csv in prototyping/data

A critical step in Machine Learning is preparing the data correctly. Variables on different scales make it difficult for the model to efficiently learn the correct weights. Below, I've written the code to load and prepare the data. 

In [None]:
data_path = '../data/hour.csv'

data = pd.read_csv(data_path)

- instant: record index
- dteday : date
- season : season (1:spring, 2:summer, 3:fall, 4:winter)
- yr : year (0: 2011, 1:2012)
- mnth : month ( 1 to 12)
- hr : hour (0 to 23)
- holiday : whether the day is a holiday or not
- weekday : day of the week
- workingday : 1 if the day is neither a weekend nor a holiday, otherwise 0
- weathersit : weather situation
    - 1: Clear, Few clouds, Partly cloudy
    - 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
    - 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
    - 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
- temp : Normalized temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-8, t_max=+39 (only in hourly scale)
- atemp: Normalized feeling temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-16, t_max=+50 (only in hourly scale)
- hum: Normalized humidity. The values are divided by 100 (max)
- windspeed: Normalized wind speed. The values are divided by 67 (max)
- casual: count of casual users
- registered: count of registered users
- cnt: count of total rental bikes including both casual and registered
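
The normalization formulas in the column descriptions can be inverted to recover physical units. The following is a minimal sketch, assuming the constants quoted in the list; the helper name `denormalize` and the sample values are made up for illustration and are not part of the notebook.

```python
def denormalize(value, v_min, v_max):
    """Invert (v - v_min) / (v_max - v_min) to recover the original value."""
    return value * (v_max - v_min) + v_min

# Constants quoted in the column descriptions; input values are made up
temp_celsius = denormalize(0.5, v_min=-8, v_max=39)      # 'temp' column
atemp_celsius = denormalize(0.5, v_min=-16, v_max=50)    # 'atemp' column
humidity_percent = 0.8 * 100                             # 'hum' was divided by 100
windspeed_original = 0.25 * 67                           # 'windspeed' was divided by 67

print(temp_celsius, atemp_celsius, humidity_percent, windspeed_original)
```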

In [None]:
data.head()

This dataset has the number of riders for each hour of each day from January 1 2011 to December 31 2012. The number of riders is split between casual and registered, summed up in the cnt column. You can see the first few rows of the data above.

In [None]:
data.info()

In [None]:
data.nunique()

Below is a plot showing the number of bike riders over the first 10 days in the data set, at hourly resolution. This data is pretty complicated! The weekends have lower overall ridership, and there are spikes when people are biking to and from work during the week. Looking at the data above, we also have information about temperature, humidity, and windspeed, all of which likely affect the number of riders. You'll be trying to capture all this with your model.

In [None]:
data[:24*10].plot(x='dteday', y='cnt')

## Data preprocessing

Here we have some categorical variables like season, weather, month. To include these in our model, we'll need to make binary dummy variables. This is simple to do with Pandas thanks to get_dummies().

In [None]:
categorical_variables = # TODO: Implement categorical variable names as array containing strings
for each in categorical_variables:
    cat = pd.get_dummies(data[each], prefix=each, drop_first=False)
    data = pd.concat([data, cat], axis=1)
data.head()
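
To see what `get_dummies()` produces, here is a small sketch on a made-up frame (the values in `toy` are invented; only the `season` and `cnt` column names are borrowed from the dataset).

```python
import pandas as pd

# Toy frame with one categorical column (values mimic 'season')
toy = pd.DataFrame({'season': [1, 2, 4, 2], 'cnt': [16, 40, 32, 13]})

# One binary column per distinct category value
dummies = pd.get_dummies(toy['season'], prefix='season', drop_first=False)
encoded = pd.concat([toy, dummies], axis=1)

print(encoded.columns.tolist())
```

Only categories that actually occur get a column, so this toy frame has no `season_3`. Note that the original `season` column is still present after the concatenation.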

For our model to learn something, we'll need to get rid of the old columns containing the categorical variables, the dteday (object) and the instant (index).

Do we also need to drop 'casual' and 'registered'? Try to explain why or why not.
You may try to keep / drop 'atemp' and 'workingday' and see how the model reacts.

In [None]:
fields_to_drop = # TODO: Implement variable names to drop (remove) from the dataset as array containing strings
encoded_data = data.drop(fields_to_drop, axis=1)
encoded_data.head()

We'll save the last 31 days of the data to use as a test set after we've trained the model. We'll use this set to make predictions and compare them with the actual number of riders.

In [None]:
# Save the last 31 days for final testing
test_data = encoded_data[-31*24:]
train_data = encoded_data[:-31*24]
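
The negative slices above can be checked on a stand-in list. This sketch assumes every day contributes exactly 24 hourly rows; if some hours are missing from the data, the last `31*24` rows span somewhat more than 31 calendar days.

```python
hours_per_day = 24
n_test = 31 * hours_per_day          # 744 hourly rows

# Stand-in for the hourly rows, in chronological order
rows = list(range(1000))

test_rows = rows[-n_test:]           # the most recent 744 rows
train_rows = rows[:-n_test]          # everything before them

print(len(train_rows), len(test_rows))
```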

In [None]:
# Split the data into X (features) and y (target)
X, X_test = np.array(train_data.drop(['cnt'], axis=1)), np.array(test_data.drop(['cnt'], axis=1))
y, y_test = np.array(train_data['cnt']), np.array(test_data['cnt'])

In [None]:
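# Scale the target to the range [0, 1]; the scaler is kept so that
# predictions can later be converted back to rider counts
target_scaler = preprocessing.MinMaxScaler()
y = target_scaler.fit_transform(y.reshape(-1, 1))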
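
For intuition, the scaling applied to the target can be written out by hand. This is a sketch of min-max scaling to the default range [0, 1] on made-up counts, not a replacement for `MinMaxScaler`.

```python
import numpy as np

y_toy = np.array([10.0, 60.0, 110.0]).reshape(-1, 1)  # made-up rider counts

# Min-max scaling to [0, 1]
y_min, y_max = y_toy.min(), y_toy.max()
y_scaled = (y_toy - y_min) / (y_max - y_min)

# Inverse transform back to rider counts
y_restored = y_scaled * (y_max - y_min) + y_min

print(y_scaled.ravel(), y_restored.ravel())
```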

We'll split the data into two sets, one for training and one for validating as the model is being trained. It's important to split the data randomly so all cases are represented in both sets.

In [None]:
# Split X and y into a training set and a validation set
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
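
The idea behind a random split can be sketched with NumPy alone. This illustrates the principle on toy arrays; it does not reproduce the exact shuffling that `train_test_split` performs.

```python
import numpy as np

rng = np.random.default_rng(42)        # fixed seed for reproducibility
X_toy = np.arange(20).reshape(10, 2)   # 10 samples, 2 features
y_toy = np.arange(10)

perm = rng.permutation(len(X_toy))     # random order of row indices
n_val = int(0.2 * len(X_toy))          # 20 % of the rows for validation
val_idx, train_idx = perm[:n_val], perm[n_val:]

X_tr, X_va = X_toy[train_idx], X_toy[val_idx]
y_tr, y_va = y_toy[train_idx], y_toy[val_idx]

print(X_tr.shape, X_va.shape)
```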

## Training Linear Regression Model with PyTorch

$$ \tilde{f}(\boldsymbol{x}) = \boldsymbol{w}^T \cdot\boldsymbol{x} + b $$
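
As a quick numerical illustration of the formula, here is a sketch with made-up weights, bias and inputs (none of these values come from the dataset).

```python
import numpy as np

w = np.array([0.5, -1.0, 2.0])   # one weight per feature (made-up values)
b = 0.1                          # bias (made-up value)

X_toy = np.array([[1.0, 2.0, 3.0],
                  [0.0, 1.0, 0.0]])

# f(x) = w^T x + b, evaluated for every row of X_toy
y_hat = X_toy @ w + b
print(y_hat)
```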

In [None]:
# Initialize PyTorch and Tensors
device = torch.device("cpu")
dtype = torch.float
X_train = torch.as_tensor(X_train, device=device, dtype=dtype)
X_val = torch.as_tensor(X_val, device=device, dtype=dtype)
y_train = torch.as_tensor(y_train, device=device, dtype=dtype)
y_val = torch.as_tensor(y_val, device=device, dtype=dtype)

### Choose the number of epochs
This is the number of times the dataset will pass through the model, each time updating the weights. As the number of epochs increases, the model becomes better and better at predicting the targets in the training set. However, it can become too specific to the training set and will fail to generalize to the validation set. This is called overfitting. You'll need to choose enough epochs to train the model well but not too many or you'll be overfitting.
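
One way to spot the point where overfitting begins is to look for the epoch at which the validation loss stops improving while the training loss keeps falling. A minimal sketch with made-up loss values:

```python
# Made-up loss values per epoch, for illustration only
train_losses = [0.90, 0.50, 0.30, 0.20, 0.15, 0.12]
val_losses = [0.95, 0.60, 0.40, 0.35, 0.38, 0.45]

# Epoch (0-based) with the lowest validation loss
best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)

print(best_epoch, val_losses[best_epoch])
```

In this toy history the training loss still decreases after the best epoch, but the validation loss rises again, which is the typical overfitting pattern.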

### Choose the learning rate
This scales the size of weight updates. If this is too big, the weights tend to explode and the model fails to fit the data. A good choice to start at is 0.01. If the model has problems fitting the data, try reducing the learning rate. Note that the lower the learning rate, the smaller the steps are in the weight updates and the longer it takes for the model to converge.
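
The effect of the learning rate can be seen on a one-dimensional toy problem. The sketch below uses plain gradient descent on f(w) = w², which is an assumption made for illustration and not the optimizer used in this notebook.

```python
def gradient_descent(lr, steps=50, w=1.0):
    """Minimize f(w) = w**2 with plain gradient descent; returns the final w."""
    for _ in range(steps):
        grad = 2 * w          # derivative of w**2
        w = w - lr * grad     # update scaled by the learning rate
    return w

w_small = gradient_descent(lr=0.01)   # moves slowly towards the minimum at 0
w_good = gradient_descent(lr=0.3)     # converges quickly
w_large = gradient_descent(lr=1.1)    # overshoots more on every step and diverges

print(w_small, w_good, w_large)
```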

### Choose a batch size
This defines the number of samples that are propagated through the model before each weight update. For example, with a batch size of 32 the weights are updated after every 32 samples. Updating on small batches instead of the whole dataset gives many more updates per epoch, which usually helps the model converge faster. Common values are 16, 32, 64 and 128.
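
To make the relation between batch size and weight updates concrete, here is a small sketch. It assumes that only full batches are used and a trailing partial batch is skipped; the sample count is made up.

```python
n_samples = 100          # made-up number of training samples
toy_batch_size = 32

# Number of full batches, i.e. weight updates, per epoch
n_batches = n_samples // toy_batch_size

# (start, end) row indices of each batch
batch_bounds = [(i * toy_batch_size, (i + 1) * toy_batch_size)
                for i in range(n_batches)]

print(n_batches, batch_bounds)
```

With these numbers the last 4 samples never reach the model, which is worth keeping in mind when the batch size is large relative to the dataset.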

In [None]:
# TODO: Hyperparameter
epoch = 
learning_rate = 
batch_size = 

In [None]:
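# Initialize the model: one linear layer followed by a sigmoid,
# which keeps predictions in (0, 1) like the min-max scaled target
model = nn.Sequential(
    nn.Linear(X_train.shape[1], 1),
    nn.Sigmoid())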

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

In [None]:
# Train the model
loss_hist = []
loss_hist_val = []
ix = max(1, epoch // 2)  # print interval; at least 1 to avoid a modulo by zero

for t in range(epoch):
    # Iterate over full mini-batches (a trailing partial batch is skipped)
    for batch in range(X_train.shape[0] // batch_size):

        batch_x = X_train[batch * batch_size : (batch + 1) * batch_size, :]
        batch_y = y_train[batch * batch_size : (batch + 1) * batch_size, :]

        outputs = model(batch_x)
        loss = criterion(outputs, batch_y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Evaluate on the validation set once per epoch, without tracking gradients
    with torch.no_grad():
        val_outputs = model(X_val)
        loss_v = criterion(val_outputs, y_val)

    # Save the errors every epoch and print them every ix epochs
    loss_hist.append(loss.item())
    loss_hist_val.append(loss_v.item())
    if t % ix == 0:
        print(t, f'Error: {loss.item()}, Val_Error: {loss_v.item()}')

$$ \mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(\tilde{f}(\boldsymbol{x}_i) - y_i \right)^2 $$
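
The mean squared error can be computed by hand on a few made-up numbers. This sketch mirrors what `nn.MSELoss()` returns with its default mean reduction.

```python
preds_toy = [2.0, 4.0, 3.0, 0.0]     # made-up model outputs
targets_toy = [1.0, 2.0, 3.0, 1.0]   # made-up true values

# Average of the squared differences between prediction and target
squared_errors = [(p - y) ** 2 for p, y in zip(preds_toy, targets_toy)]
mse = sum(squared_errors) / len(squared_errors)

print(squared_errors, mse)
```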


Below you see a plot of the Mean Squared Error of the model. While training the model, the error (loss) should steadily decrease. Training loss and validation loss should stay close together: a validation loss well above the training loss suggests overfitting, while two losses that both stay high suggest underfitting.

In [None]:
plt.plot(loss_hist, label='Training loss')
plt.plot(loss_hist_val, label='Validation loss')
plt.legend()

Here, use the test data to check that the model is making accurate predictions. If your predictions don't match the data, try adjusting the hyperparameters and / or experiment with the features. (You may also just upgrade to a Neural Network...)

In [None]:
X_test = torch.as_tensor(X_test, device=device, dtype=dtype)

fig, ax = plt.subplots(figsize=(20,10))

predictions = model(X_test)
ax.plot(target_scaler.inverse_transform(predictions.detach().numpy()), label='Prediction')
ax.plot(y_test, label='Data')
ax.set_xlim(right=len(predictions))
ax.legend()

dates = pd.to_datetime(data.loc[test_data.index]['dteday'])
dates = dates.apply(lambda d: d.strftime('%b %d'))
ax.set_xticks(np.arange(len(dates))[12::24])
_ = ax.set_xticklabels(dates[12::24], rotation=45)

How would you describe the performance of the model? Does it show weaknesses? If yes, why?

Bonus Question: What do we need to do to upgrade our simple Linear Regression Model to a Neural Network (ANN)? 
How about a Deep Neural Network (DNN), Deep Learning?