<a href="https://colab.research.google.com/github/Daivar/Deep_Learning_Models/blob/main/Pytorch_regresion_bike_sharing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Multiple regression with PyTorch

## Problem statement
Given data counting bike sharing statistics, predict future demand for bikes.

Bike sharing systems are a new generation of traditional bike rentals in which the whole process, from membership to rental and return, has become automatic. Through these systems, a user can easily rent a bike at one location and return it at another. Currently, there are over 500 bike-sharing programs around the world, comprising over 500 thousand bicycles. Today, there is great interest in these systems due to their important role in traffic, environmental and health issues.

Apart from the interesting real-world applications of bike sharing systems, the characteristics of the data they generate make them attractive for research. Unlike other transport services such as bus or subway, the duration of travel and the departure and arrival positions are explicitly recorded in these systems. This feature turns a bike sharing system into a virtual sensor network that can be used for sensing mobility in the city. Hence, it is expected that most important events in the city could be detected by monitoring these data.

This dataset contains the hourly and daily counts of rental bikes in 2011 and 2012 in the Capital Bikeshare system in Washington, DC, with the corresponding weather and seasonal information.

https://www.kaggle.com/marklvl/bike-sharing-dataset

In [None]:
!pip install graphviz
!pip install hiddenlayer

In [None]:
import torch
import hiddenlayer as hl

import pandas as pd
import numpy as np

import seaborn as sns
import matplotlib.pyplot as plt

import sklearn

In [None]:
# https://www.kaggle.com/marklvl/bike-sharing-dataset
# !wget https://www.kaggle.com/marklvl/bike-sharing-dataset/download -O data.zip # need to add auth data
!wget https://github.com/MindaugasBernatavicius/DeepLearningCourse/raw/master/06_Regression_with_Neural_Networks_and_Tabular_Data/28865_36778_bundle_archive.zip 


!unzip -n *archive.zip
# !rm -rf bike-sharing-dataset/
!rm -f *.txt *.csv
!rm -f my_model

In [None]:
data = pd.read_csv('./bike-sharing-dataset/hour.csv', index_col=0)

In [None]:
data.head()

### Reversing the ordinal encoding

In [None]:
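# Assign the result back instead of calling replace(..., inplace=True) on a column selection,
# which newer pandas versions warn about and may not apply to the original DataFrame.
data["season"] = data["season"].replace({1: "spring", 2: "summer", 3: "fall", 4: "winter"})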
data.head()

In [None]:
data.shape

### Visualizing relationships
We see that in both years fall is the most popular season for bike rentals.
Bike rentals also increased from the first year to the second.

In [None]:
plt.figure(figsize=(8, 6))

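# Pass x and y as keyword arguments; recent seaborn versions no longer accept them positionally.
sns.barplot(x = 'yr', y = 'cnt', hue = 'season', data = data, ci=None)
plt.legend(loc = 'upper right', bbox_to_anchor=(1.2,0.5))

plt.xlabel('Year')
# barplot shows the mean of cnt per group, i.e. the average hourly count, not a total.
plt.ylabel('Average number of bikes rented per hour')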

plt.title('Number of bikes rented per season')

workingday: 1 if the day is neither a weekend nor a holiday, otherwise 0.
We see that more bikes are rented on non-working days in all months.

In [None]:
plt.figure(figsize=(8, 6))

sns.barplot(x = 'mnth', y = 'cnt', hue = 'workingday', data = data)

plt.title('Number of bikes rented per month')
plt.show()

The number of bikes rented is higher at higher temperatures.

In [None]:
plt.figure(figsize=(12, 12))

fig = sns.scatterplot(x = 'temp', y = 'cnt', data = data)

plt.xlabel('Temperature')
plt.ylabel('Total number of bikes rented')

In [None]:
plt.figure(figsize=(10, 10))
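# Correlate numeric columns only; season (now text) and the date column cannot be correlated.
sns.heatmap(data.select_dtypes(include='number').corr(), annot=True, linewidths=0.05, fmt= '.2f')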
plt.show()

### One-hot encoding on season

In [None]:
data.sample(5)

In [None]:
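# dtype=float keeps the dummy columns numeric so they convert cleanly to a float tensor later.
data = pd.get_dummies(data, columns= ['season'], dtype=float)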

In [None]:
data.sample(5)

### Choose features

In [None]:
columns = ['registered', 'holiday', 'weekday', 
           'weathersit', 'temp', 'atemp',
           'season_fall', 'season_spring', 
           'season_summer', 'season_winter']

features = data[columns]

In [None]:
features.head()

In [None]:
target = data[['cnt']]

In [None]:
target.head()

In [None]:
from sklearn.model_selection import train_test_split

X_train, x_test, Y_train, y_test = train_test_split(features, target, test_size=0.2)

In [None]:
X_train_tensor = torch.tensor(X_train.values, dtype = torch.float)
x_test_tensor = torch.tensor(x_test.values, dtype = torch.float)

Y_train_tensor = torch.tensor(Y_train.values, dtype = torch.float)
y_test_tensor = torch.tensor(y_test.values, dtype = torch.float)

In [None]:
X_train_tensor.shape

In [None]:
Y_train_tensor.shape

In [None]:
import torch.utils.data as data_utils 

### torch.utils.data.TensorDataset(*tensors)
Dataset wrapping tensors. Each sample will be retrieved by indexing tensors along the first dimension.
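
To make the indexing behaviour concrete, here is a minimal plain-Python sketch. `PairedColumns` is a hypothetical stand-in written for illustration, not the PyTorch class; it only mimics how one sample is assembled by taking the same index from every wrapped sequence. The toy values are made up.

```python
class PairedColumns:
    """Hypothetical stand-in that mimics how TensorDataset indexes its tensors."""

    def __init__(self, *columns):
        # All wrapped sequences must agree on the size of the first dimension.
        if any(len(column) != len(columns[0]) for column in columns):
            raise ValueError("all columns must have the same length")
        self.columns = columns

    def __len__(self):
        return len(self.columns[0])

    def __getitem__(self, index):
        # One sample = the index-th entry of every wrapped sequence.
        return tuple(column[index] for column in self.columns)


# Toy data: three samples with two features each, plus one target per sample.
toy_features = [[0.2, 1.0], [0.5, 0.0], [0.9, 1.0]]
toy_targets = [[16.0], [40.0], [32.0]]

toy_dataset = PairedColumns(toy_features, toy_targets)
sample_features, sample_target = toy_dataset[1]
print(len(toy_dataset), sample_features, sample_target)  # 3 [0.5, 0.0] [40.0]
```

In the notebook, indexing the real dataset should behave the same way, returning one row of features together with its matching target.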

In [None]:
train_data = data_utils.TensorDataset(X_train_tensor, Y_train_tensor)

## torch.utils.data.DataLoader
Combines a dataset and a sampler, and provides single- or multi-process iterators over the dataset.

torch.utils.data.DataLoader provides:

- Batching the data
- Shuffling the data
- Loading the data in parallel using multiprocessing workers
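
The batching and shuffling parts can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how DataLoader is implemented: `iterate_batches` is a hypothetical helper, it runs in a single process, and it leaves out parallel loading entirely.

```python
import math
import random


def iterate_batches(samples, batch_size, shuffle=True, seed=0):
    """Yield lists of at most batch_size samples, optionally in shuffled order."""
    order = list(range(len(samples)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [samples[i] for i in order[start:start + batch_size]]


toy_samples = list(range(10))
batches = list(iterate_batches(toy_samples, batch_size=4))

# The last batch is smaller when the data does not divide evenly.
print([len(batch) for batch in batches])                 # [4, 4, 2]
print(len(batches) == math.ceil(len(toy_samples) / 4))   # True
```

The same arithmetic explains the length of a loader: with a batch size of 1000, the number of batches is the number of training rows divided by 1000, rounded up.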

In [None]:
train_loader = data_utils.DataLoader(train_data, batch_size=1000, shuffle=True)

In [None]:
len(train_loader)

In [None]:
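# Use the built-in next(); the iterator returned by iter(train_loader) has no .next() method in Python 3.
features_batch, target_batch = next(iter(train_loader))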

In [None]:
features_batch.shape

In [None]:
target_batch.shape

## If it's time series data, why can we just split it like that?
We know that train_test_split() produces a random split. Splitting this way simply subdivides the data randomly, so the pattern of bike shares increasing "as the days go by" is still present in both subsets.

More concretely, we would be training a time series model with data missing. This is not ideal (that is why we omitted the date dimension) and is a topic worthy of a separate discussion centered around "dealing with time series data", "splitting time series data", "time-based cross-validation", "rolling window analysis", etc.

https://stats.stackexchange.com/questions/117350/how-to-split-dataset-for-time-series-prediction

https://towardsdatascience.com/time-based-cross-validation-d259b13d42b8
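
For comparison, the simplest time-aware alternative is a chronological split, where the test set contains only the most recent records. The sketch below is not what this notebook does; `chronological_split` is a hypothetical helper, and it assumes the rows are already sorted by time.

```python
def chronological_split(rows, test_fraction=0.2):
    """Split time-ordered rows so that the test set is strictly later in time."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    n_test = max(1, round(len(rows) * test_fraction))
    split_at = len(rows) - n_test
    return rows[:split_at], rows[split_at:]


# Toy stand-in for hourly counts that are already sorted by time.
hourly_counts = [16, 40, 32, 13, 1, 1, 2, 3, 8, 14]
train_rows, test_rows = chronological_split(hourly_counts, test_fraction=0.2)

print(train_rows)  # [16, 40, 32, 13, 1, 1, 2, 3]
print(test_rows)   # [8, 14]
```

Unlike a random split, no future observation ends up in the training set, so the evaluation is closer to the real forecasting task.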

### NN definition
Define the parameters for the neural network:

- inp sets the input size, matching the number of columns of X_train_tensor.
- out sets the size of the output of the neural network. We predict a single count for each hourly record, so this is 1.
- hid sets the number of hidden neurons in our neural network.
- loss_fn is MSELoss, since we're performing regression.
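
As a worked example of what the loss measures, here is the mean squared error written out in plain Python. It is a sketch of the formula that MSELoss applies with its default mean reduction, not the PyTorch implementation, and the numbers are made up.

```python
def mean_squared_error(predictions, targets):
    """Average of the squared differences between predictions and targets."""
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared_errors) / len(squared_errors)


toy_predictions = [2.0, 4.0, 9.0, 5.0]
toy_targets = [1.0, 6.0, 9.0, 4.0]

# Errors are 1, -2, 0 and 1, so the squared errors are 1, 4, 0 and 1.
print(mean_squared_error(toy_predictions, toy_targets))  # 1.5
```

Because the errors are squared, a few large misses on busy hours weigh much more than many small ones.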

In [None]:
inp = X_train_tensor.shape[1]
out = 1

hid = 10

loss_fn = torch.nn.MSELoss()

### nn.Sequential
Use the nn package to define our model as a sequence of layers. nn.Sequential is a Module which contains other Modules, and applies them in sequence to produce its output. Each Linear Module computes output from input using a linear function, and holds internal Tensors for its weight and bias.

nn.Linear: Applies a linear transformation to the incoming data: y = xA^T + b

Parameters:

- in_features – size of each input sample
- out_features – size of each output sample
- bias – if set to False, the layer will not learn an additive bias. Default: True
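
To see the linear transformation as arithmetic, the sketch below computes a single output row by hand. `linear_forward` is a hypothetical helper and the weights are made-up values. It assumes the weight is laid out with one row per output feature, which matches the (out_features, in_features) shape of an nn.Linear weight.

```python
def linear_forward(x, weight, bias):
    """Compute the weighted sum of x plus the bias for every output feature."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
            for row, b in zip(weight, bias)]


x = [1.0, 2.0, 3.0]                  # in_features = 3
weight = [[0.5, 0.0, -1.0],          # out_features = 2, one row per output
          [1.0, 1.0, 1.0]]
bias = [0.5, -1.0]

# First output:  0.5*1 + 0.0*2 - 1.0*3 + 0.5 = -2.0
# Second output: 1.0*1 + 1.0*2 + 1.0*3 - 1.0 =  5.0
print(linear_forward(x, weight, bias))  # [-2.0, 5.0]
```

In the model below, the first layer does this with inp inputs and hid outputs, and the second with hid inputs and a single output.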

Sigmoid : Applies the element-wise function Sigmoid(x)= 1 / (1+exp(−x))

Dropout : During training, randomly zeroes some of the elements of the input tensor with probability p using samples from a Bernoulli distribution. Each channel will be zeroed out independently on every forward call.
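
A rough plain-Python sketch of the same idea follows. It is not the PyTorch implementation; `dropout` here is a hypothetical helper. It does reproduce two documented behaviours of nn.Dropout: surviving values are scaled by 1 / (1 - p) during training, and the layer passes values through unchanged in evaluation mode.

```python
import random


def dropout(values, p=0.2, training=True, rng=None):
    """Zero each value with probability p and rescale the survivors."""
    if not training or p == 0.0:
        return list(values)          # evaluation mode: identity
    if rng is None:
        rng = random.Random()
    keep_scale = 1.0 / (1.0 - p)     # keeps the expected activation unchanged
    return [0.0 if rng.random() < p else v * keep_scale for v in values]


activations = [1.0, 2.0, 3.0, 4.0, 5.0]

# Training mode: some entries become 0.0, the rest are multiplied by 1.25.
print(dropout(activations, p=0.2, training=True, rng=random.Random(0)))

# Evaluation mode: the input comes back untouched.
print(dropout(activations, p=0.2, training=False))
```

This is why model.eval() is called before predicting: it switches dropout off so that predictions are deterministic.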

### Creating model using nn.Sequential
Steps:

- first, run with only two linear layers
- then add a ReLU between the linear layers
- then add dropout to regularise the model

### Linear layers are enough for this problem.

In [None]:
# model = torch.nn.Sequential(torch.nn.Linear(inp, hid),
#                             torch.nn.Linear(hid, out))

# model = torch.nn.Sequential(torch.nn.Linear(inp, hid),
#                             torch.nn.ReLU(),
#                             torch.nn.Linear(hid, out))

model = torch.nn.Sequential(torch.nn.Linear(inp, hid),
                            torch.nn.ReLU(),
                            torch.nn.Dropout(p=0.2),
                            torch.nn.Linear(hid, out))

# model = torch.nn.Sequential(torch.nn.Linear(inp, hid),
#                             torch.nn.Sigmoid(),
#                             torch.nn.Linear(hid, out))

In [None]:
# model = torch.nn.Sequential(torch.nn.Linear(inp, hid),
#                             torch.nn.Linear(hid, out))

model = torch.nn.Sequential(torch.nn.Linear(inp, hid),
                            torch.nn.ReLU(),
                            torch.nn.Linear(hid, out))

# model = torch.nn.Sequential(torch.nn.Linear(inp, hid),
#                            torch.nn.ReLU(),
#                           torch.nn.Dropout(p=0.2),
#                            torch.nn.Linear(hid, out))

# model = torch.nn.Sequential(torch.nn.Linear(inp, hid),
#                             torch.nn.Sigmoid(),
#                             torch.nn.Linear(hid, out))

In [None]:
hl.build_graph(model, torch.zeros([10, inp]))

In [None]:
optimizer = torch.optim.Adam(model.parameters(), lr = 0.001)

### Training our model
Forward pass:

- predicting Y with input data X

Finding loss:

- finding the difference between Y_train_tensor (target) and the output using the MSELoss function defined above

Back propagation:

- starting with zero gradients before back propagation
- back propagation is done simply by calling loss.backward()

Optimizer step:

- all optimizers implement a step() method that updates the parameters
- conceptually, each weight is reduced by a multiple of the learning rate and the gradient (Adam additionally rescales this step per parameter)
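
The four steps can be traced by hand on a model with a single weight. The sketch below deliberately uses plain gradient descent rather than the Adam optimizer used in this notebook, and it derives the gradient manually instead of calling loss.backward(). The data and learning rate are toy values chosen for illustration.

```python
# Toy data generated by y = 2x, so the ideal weight is 2.0.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

weight = 0.0
learning_rate = 0.01
losses = []

for step in range(200):
    # Forward pass: predict with the current weight.
    predictions = [weight * x for x in xs]

    # Loss: mean squared error between predictions and targets.
    loss = sum((p - y) ** 2 for p, y in zip(predictions, ys)) / len(xs)
    losses.append(loss)

    # Backward pass: d(loss)/d(weight), derived by hand for this tiny model.
    gradient = sum(2 * (p - y) * x for p, y, x in zip(predictions, ys, xs)) / len(xs)

    # Update: move the weight against the gradient, scaled by the learning rate.
    weight -= learning_rate * gradient

print(round(weight, 4))        # 2.0
print(losses[0], losses[-1])   # the loss starts at 30.0 and shrinks towards 0
```

The gradient is recomputed from scratch on every step here. PyTorch accumulates gradients instead, which is why optimizer.zero_grad() has to be called before each loss.backward().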

In [None]:
total_step = len(train_loader)
print(total_step)

In [None]:
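num_epochs = 100

model.train()  # make sure layers such as Dropout are in training mode

for epoch in range(num_epochs):
    # Batch variables are named so they do not shadow the `features` and `target` DataFrames.
    for i, (batch_features, batch_target) in enumerate(train_loader):
        output = model(batch_features)        # forward pass
        loss = loss_fn(output, batch_target)  # mean squared error on this batch
        optimizer.zero_grad()                 # clear gradients from the previous step
        loss.backward()                       # back propagation
        optimizer.step()                      # update the parameters

        # Log every 10th epoch (epochs are reported 1-based).
        if (epoch + 1) % 10 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'.format(epoch + 1, num_epochs, i + 1, total_step, loss.item()))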

In [None]:
model.eval()

with torch.no_grad():
    y_pred = model(x_test_tensor)

In [None]:
sample = x_test.iloc[41]
sample

In [None]:
sample_tensor = torch.tensor(sample.values, dtype = torch.float)
sample_tensor

In [None]:
with torch.no_grad():
    y_pred = model(sample_tensor)

print("Predicted count : ", (y_pred.item()))
print("Actual count : ", (y_test.iloc[41]))

In [None]:
with torch.no_grad():
    y_pred_tensor = model(x_test_tensor)

In [None]:
y_pred = y_pred_tensor.detach().numpy()
y_pred.shape

In [None]:
y_test.values.shape

In [None]:
compare_df = pd.DataFrame({'actual': np.squeeze(y_test.values), 'predicted': np.squeeze(y_pred)})

compare_df.sample(20)

In [None]:
from sklearn.metrics import mean_absolute_error, r2_score

print(r2_score(y_test, y_pred))
print(mean_absolute_error(y_test, y_pred))

# L -> L
# 0.947522541648828
# 24.4707706184464

# L w/ Relu -> L
# 0.9489632905556128
# 23.006869639502572

# L w/ Relu -> L
# 0.939177669114976
# 22.81379715507149

In [None]:
plt.figure(figsize=(8, 8))

# Actual values go on the x-axis and predictions on the y-axis, matching the axis labels below.
plt.scatter(y_test.values, y_pred, s=20)

plt.xlabel("Actual count")
plt.ylabel("Predicted count")

plt.show()

In [None]:
plt.figure(figsize=(20, 6))

# Plot only the first 200 test samples so the two curves remain readable.
plt.plot(y_pred[:200], label='Predicted count')
plt.plot(y_test.values[:200], label='Actual count')

plt.legend()
plt.show()

In [None]:
torch.save(model, 'my_model')

In [None]:
!ls 

In [None]:
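# The whole model object was pickled, so PyTorch 2.6 and later need weights_only=False to load it.
# Only do this for files you trust.
saved_model = torch.load('my_model', weights_only=False)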

In [None]:
y_pred_tensor = saved_model(x_test_tensor)

In [None]:
y_pred = y_pred_tensor.detach().numpy()
y_pred

In [None]:
!wget https://github.com/MindaugasBernatavicius/DeepLearningCourse/raw/master/06_Regression_with_Neural_Networks_and_Tabular_Data/28865_36778_bundle_archive.zip 
!unzip -n *archive.zip
# !rm -rf bike-sharing-dataset/
!rm -f *.txt *.csv
!rm -f my_model

In [None]:
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
import pandas as pd

data = pd.read_csv('./bike-sharing-dataset/hour.csv', index_col=0)

columns = ['registered', 'holiday', 'weekday', 'weathersit', 'temp', 'atemp', 'season']

features = data[columns]
target = data[['cnt']]

X_train, x_test, Y_train, y_test = train_test_split(features, target, test_size=0.2)

# clf = DecisionTreeRegressor(max_depth=4)
clf = DecisionTreeRegressor()
clf.fit(X_train, Y_train)

In [None]:
clf.feature_importances_
# columns = ['registered', 'holiday', 'weekday', 'weathersit', 'temp', 'atemp', 'season']

### Pytorch model in one place