# Foundation of Modern Machine Learning
## Module 9: Neural Networks
## Lab 5: MLP for regression
#### Module Coordinator: Shantanu Agrawal


You might think of MLPs mainly as classifiers. But as the TensorFlow Playground shows, an MLP can be used for regression problems as well.

We also saw regression problems in an earlier lab. In this lab, we will implement an MLP for this kind of problem.

# Demonstration on simple datasets

In [None]:
import torch
from torch.autograd import Variable
import torch.nn.functional as F
import torch.utils.data as Data

import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')

import numpy as np

# MLP for Regression

We create a simple synthetic dataset and attempt to perform regression.

In [None]:
torch.manual_seed(1)    

# We generate data with a simple function
x = torch.unsqueeze(torch.linspace(-1, 1, 100), dim=1)  
y = x.pow(2) + 0.2*torch.rand(x.size())                 

# Variable is a deprecated no-op wrapper since PyTorch 0.4; plain tensors behave the same way
x, y = Variable(x), Variable(y)

plt.figure(figsize=(10,4))
plt.scatter(x.data.numpy(), y.data.numpy(), color = "orange")
plt.title('Regression Analysis')
plt.xlabel('Independent variable')
plt.ylabel('Dependent variable')
plt.show()

In [None]:
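def train(x,y,model,epochs,optimizer,loss_func):
  """ Train model on the full dataset (x, y) for the given number of epochs.

  Returns the per-epoch losses and the predictions from the last forward pass.
  """
  losses = []

  for t in range(epochs):

    prediction = model(x)     # forward pass on the whole dataset

    loss = loss_func(prediction, y)     # compare predictions with targets
    losses.append(loss.item())

    optimizer.zero_grad()   # clear gradients left over from the previous step
    loss.backward()         # backpropagation: compute gradients
    optimizer.step()        # update the weights

  return losses, prediction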

def plot_training(x,y,prediction,losses,num_epochs):

  epochs = np.arange(num_epochs)

  fig = plt.figure(figsize=(10,4))
  plt.scatter(x.data.numpy(),y.data.numpy(),color="orange")
  plt.plot(x.data.numpy(),prediction.data.numpy(),color="green")
  plt.show()

  fig = plt.figure(figsize=(10,4))
  plt.title("Loss vs Epochs")
  plt.plot(epochs,losses,color="red")
  plt.show()

  print("Final loss: {}".format(losses[-1]))


We create a simple MLP and train it on our synthetic dataset.

In [None]:
# Simple MLP using PyTorch
net1 = torch.nn.Sequential(
    torch.nn.Linear(1,10),
    torch.nn.ReLU(),
    torch.nn.Linear(10,1)
)

optimizer = torch.optim.SGD(net1.parameters(), lr=0.2)
loss_func = torch.nn.MSELoss()

losses1,prediction1 = train(x,y,net1,200,optimizer,loss_func)
plot_training(x,y,prediction1,losses1,200)
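
When comparing architectures, it can help to know how many trainable parameters each one has. The sketch below counts them for a network with the same layers as `net1`; `count_parameters` and `small_net` are names introduced here for illustration and are not part of PyTorch.

```python
import torch

def count_parameters(model):
    """Return the number of trainable parameters in a model (hypothetical helper)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Same layers as net1 above
small_net = torch.nn.Sequential(
    torch.nn.Linear(1, 10),
    torch.nn.ReLU(),
    torch.nn.Linear(10, 1),
)

# Linear(1, 10): 10 weights + 10 biases; Linear(10, 1): 10 weights + 1 bias
print(count_parameters(small_net))  # 31
```

Calling the same helper on a larger network gives a rough sense of how much more capacity it has.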

We can try other architectures as well.

In [None]:
# Another architecture
net2 =  torch.nn.Sequential(
        torch.nn.Linear(1, 200),
        torch.nn.LeakyReLU(),
        torch.nn.Linear(200, 100),
        torch.nn.LeakyReLU(),
        torch.nn.Linear(100, 1),
    )

optimizer = torch.optim.Adam(net2.parameters(), lr=0.05)
loss_func = torch.nn.MSELoss()

losses2,prediction2 = train(x,y,net2,2000,optimizer,loss_func)
plot_training(x,y,prediction2,losses2,2000)

We can try training our model with some other synthetic datasets.

In [None]:
# Sine wave
x = torch.unsqueeze(torch.linspace(-10, 10, 1000), dim=1)  
y = torch.sin(x) + 0.2*torch.rand(x.size())                 

x, y = Variable(x), Variable(y)
plt.figure(figsize=(10,4))
plt.scatter(x.data.numpy(), y.data.numpy(), color = "orange")
plt.title('Sine wave')
plt.xlabel('Independent variable')
plt.ylabel('Dependent variable')
plt.show()

In [None]:
net3 = torch.nn.Sequential(
        torch.nn.Linear(1, 200),
        torch.nn.LeakyReLU(),
        torch.nn.Linear(200, 100),
        torch.nn.LeakyReLU(),
        torch.nn.Linear(100, 1),
    )

optimizer = torch.optim.Adam(net3.parameters(), lr=0.05)
loss_func = torch.nn.MSELoss()

losses3,prediction3 = train(x,y,net3,2000,optimizer,loss_func)
plot_training(x,y,prediction3,losses3,2000)

# Using a real-world dataset (Boston housing prices dataset)

In [None]:
import numpy as np

import matplotlib.pyplot as plt

# scikit-learn related imports
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell needs scikit-learn < 1.2.
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# PyTorch related imports
import torch
import torch.nn as nn
import torch.optim as optim



## Data loading and pre-processing

Let's load the Boston house prices dataset and its target values from the scikit-learn library.

In [None]:
boston = load_boston()

# feature_names -> ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']
feature_names = boston.feature_names

X = boston.data
y = boston.target


To keep the results reproducible, let's fix the random seeds.

In [None]:
torch.manual_seed(1234)
np.random.seed(1234)
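
As a quick check of what seeding does, the sketch below seeds both libraries twice with the same value and confirms that the random draws repeat. The helper name `draw_samples` is made up for this illustration.

```python
import torch
import numpy as np

def draw_samples(seed):
    """Seed both libraries, then draw a few random numbers from each (hypothetical helper)."""
    torch.manual_seed(seed)
    np.random.seed(seed)
    return torch.rand(3), np.random.rand(3)

t1, n1 = draw_samples(1234)
t2, n2 = draw_samples(1234)

print(torch.equal(t1, t2))     # same seed, same torch samples
print(np.array_equal(n1, n2))  # same seed, same NumPy samples
```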


Let's use 70% of our data for training and the remaining 30% for testing.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)


# Tensorizing inputs and creating batches

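Below we convert the input features and corresponding labels to tensors, then wrap the training set in a `DataLoader` that yields shuffled mini-batches of 10 samples.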


In [None]:
X_train = torch.tensor(X_train).float()
y_train = torch.tensor(y_train).view(-1, 1).float()

X_test = torch.tensor(X_test).float()
y_test = torch.tensor(y_test).view(-1, 1).float()

datasets = torch.utils.data.TensorDataset(X_train, y_train)
train_iter = torch.utils.data.DataLoader(datasets, batch_size=10, shuffle=True)
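
To see what the `DataLoader` yields, here is a minimal sketch on made-up data. The tensor sizes (35 samples, 13 features) and the names `features`, `targets` and `loader` are assumptions chosen for illustration, not the Boston data itself.

```python
import math
import torch
import torch.utils.data

# Synthetic stand-in data: 35 samples with 13 features each (sizes chosen arbitrarily)
features = torch.randn(35, 13)
targets = torch.randn(35, 1)

dataset = torch.utils.data.TensorDataset(features, targets)
loader = torch.utils.data.DataLoader(dataset, batch_size=10, shuffle=True)

# Each iteration yields one (inputs, labels) mini-batch
batch_shapes = [(xb.shape, yb.shape) for xb, yb in loader]
for xb_shape, yb_shape in batch_shapes:
    print(tuple(xb_shape), tuple(yb_shape))

# The last batch is smaller because 35 is not a multiple of 10
print(len(loader) == math.ceil(35 / 10))
```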


Next we define the default hyperparameters for the model.


In [None]:
batch_size = 50  # not used below: the DataLoader above was created with batch_size=10
num_epochs = 200
learning_rate = 0.0001
size_hidden1 = 100
size_hidden2 = 50
size_hidden3 = 10
size_hidden4 = 1

We define a network with four linear layers and a ReLU between each pair of consecutive layers. Unlike standard linear regression, this network can model nonlinear relationships between the features and the target, which may give a lower error on this dataset.

In [None]:
class BostonModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin1 = nn.Linear(13, size_hidden1)
        self.relu1 = nn.ReLU()
        self.lin2 = nn.Linear(size_hidden1, size_hidden2)
        self.relu2 = nn.ReLU()
        self.lin3 = nn.Linear(size_hidden2, size_hidden3)
        self.relu3 = nn.ReLU()
        self.lin4 = nn.Linear(size_hidden3, size_hidden4)

    def forward(self, input):
        return self.lin4(self.relu3(self.lin3(self.relu2(self.lin2(self.relu1(self.lin1(input)))))))
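
A quick way to sanity-check an architecture like this is to pass a dummy batch through it and inspect the output shape. The sketch below uses the same layer sizes, written with `nn.Sequential` for brevity; `sketch_model` and `dummy_batch` are illustrative names, and the batch size of 8 is arbitrary.

```python
import torch
import torch.nn as nn

# Same layer sizes as the model above, written with nn.Sequential for brevity
sketch_model = nn.Sequential(
    nn.Linear(13, 100), nn.ReLU(),
    nn.Linear(100, 50), nn.ReLU(),
    nn.Linear(50, 10), nn.ReLU(),
    nn.Linear(10, 1),
)

dummy_batch = torch.randn(8, 13)  # 8 made-up samples, 13 features each
with torch.no_grad():
    out = sketch_model(dummy_batch)

print(out.shape)  # torch.Size([8, 1]): one predicted value per sample
```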


In [None]:
model = BostonModel()
model.train()


## Train Boston Model

We define the loss function used for optimization. With `reduction='sum'`, the squared errors are summed over the batch rather than averaged.

In [None]:
criterion = nn.MSELoss(reduction='sum')
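
The effect of the `reduction` argument is easiest to see on a tiny hand-made example. The values of `pred` and `target` below are made up for illustration.

```python
import torch
import torch.nn as nn

pred = torch.tensor([[2.0], [4.0], [6.0]])
target = torch.tensor([[1.0], [4.0], [8.0]])

# Squared errors are 1, 0 and 4
loss_sum = nn.MSELoss(reduction='sum')(pred, target)
loss_mean = nn.MSELoss(reduction='mean')(pred, target)

print(loss_sum.item())   # 5.0
print(loss_mean.item())  # 5/3, about 1.667
```

Because the summed loss grows with the number of samples in a batch, so do its gradients, which means the learning rate and the batch size interact more strongly than with `reduction='mean'`.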

The training function below contains the training loop. It uses RMSprop and the hyperparameters defined above to train the model it is given. Note that it reuses the name `train`, so it replaces the `train` function from the first part of the lab.

In [None]:
def train(model_inp, num_epochs = num_epochs):
    optimizer = torch.optim.RMSprop(model_inp.parameters(), lr=learning_rate)
    for epoch in range(num_epochs):  # loop over the dataset multiple times
        running_loss = 0.0  # reset at the start of every epoch
        for inputs, labels in train_iter:
            # forward pass
            outputs = model_inp(inputs)
            # compute the loss
            loss = criterion(outputs, labels)
            # zero the parameter gradients
            optimizer.zero_grad()
            # compute gradients
            loss.backward()
            # accumulate the running loss
            running_loss += loss.item()
            # update weights based on the computed gradients
            optimizer.step()
        if epoch % 20 == 0:
            print('Epoch [%d]/[%d] running accumulative loss across all batches: %.3f' %
                  (epoch + 1, num_epochs, running_loss))


In [None]:
train(model, 200)

# Evaluating the model

In [None]:
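model.eval()
with torch.no_grad():  # no gradients are needed for evaluation
    outputs = model(X_test)
# root mean squared error (RMSE) on the test set
err = np.sqrt(mean_squared_error(y_test.numpy(), outputs.numpy()))

print('Model error (RMSE): ', err)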
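
The same metric can also be computed directly in PyTorch. The following is a minimal sketch on made-up numbers; `rmse` is a helper name introduced here for illustration, not a PyTorch function.

```python
import torch

def rmse(pred, target):
    """Root mean squared error between two tensors of the same shape (hypothetical helper)."""
    return torch.sqrt(torch.mean((pred - target) ** 2)).item()

pred = torch.tensor([[3.0], [5.0]])
target = torch.tensor([[0.0], [1.0]])

# Squared errors are 9 and 16, so their mean is 12.5
print(rmse(pred, target))  # sqrt(12.5), about 3.536
```

RMSE is expressed in the same units as the target, which makes it easier to interpret than the summed squared error used during training.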

# Further experiments

1. Try experimenting with the architecture of the model. What kind of results can you obtain?
  - Try to explain why a particular change in the architecture produces the change you observe in the result.
2. Try using a different dataset suitable for regression and training on this dataset. Can you compare performance with a simple linear or polynomial regression based model?
  - Datasets are available in the *sklearn.datasets* module as well.
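
For the comparison suggested in item 2, a simple baseline could look like the sketch below. It assumes a noisy quadratic similar to the first synthetic dataset in this lab and fits polynomials with NumPy's least-squares `polyfit`; `polyfit_mse` is a helper name made up for this illustration, and the error reported is on the training data only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same kind of noisy quadratic as the first synthetic dataset in this lab
x = np.linspace(-1, 1, 100)
y = x**2 + 0.2 * rng.random(100)

def polyfit_mse(x, y, degree):
    """Fit a polynomial of the given degree by least squares and return its training MSE."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

mse_linear = polyfit_mse(x, y, 1)
mse_quadratic = polyfit_mse(x, y, 2)
print(mse_linear, mse_quadratic)
```

The straight line cannot follow the curvature of the data, so its error is much larger than that of the quadratic fit. An MLP trained on the same data can then be judged against both numbers.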

# Checking your progress

1. By this lab, you should be able to write the code for a basic MLP model.
2. You should understand the basic components needed to implement a basic MLP architecture:
  - loss functions, optimizers, training and testing functions, model classes, etc.
  - how data flows through the model (as given in the **forward()** function).
3. You should also look up the commonly used ways of implementing the components mentioned above.