# Basic Neural Network Model


## Artificial neuron

Recall the concept of a [neuron](https://en.wikipedia.org/wiki/Artificial_neuron) based on its mathematical formula.

$$ y_k = \varphi \left( \sum_{j=1}^{m}{w_{kj}x_j} + b_k \right) $$

This is a simple **linear** neuron. If you look closely, you will see the formula for multiple linear regression (if $\varphi$ is removed)! If $\varphi$ is a sigmoid function, it becomes the formula for logistic regression.
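As a minimal sketch of the formula above, the snippet below evaluates a single neuron in NumPy. The weights, bias, and input are made-up values chosen for illustration, and the `neuron` helper is hypothetical rather than part of any library.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b, phi=None):
    """Compute phi(w . x + b); with phi=None the neuron is purely linear."""
    z = np.dot(w, x) + b
    return z if phi is None else phi(z)

# Illustrative values only (not taken from any dataset)
x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -0.25, 0.1])
b = 0.2

linear_out = neuron(x, w, b)                 # multiple linear regression form
logistic_out = neuron(x, w, b, phi=sigmoid)  # logistic regression form
print(linear_out, logistic_out)
```

With the identity activation the output is an unbounded real number, while the sigmoid squashes the same weighted sum into the interval (0, 1) so it can be read as a probability.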

PyTorch, like other NN packages, supports numerous types of neurons. Typically, neurons are composed into layers, and a single layer contains only one type of neuron.

In this lab, we develop linear regression and logistic regression models with neural networks.


In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

import os, sys
import itertools
import numpy as np
import pandas as pd

from sklearn.preprocessing import scale, LabelBinarizer, StandardScaler, Normalizer
from sklearn.metrics import f1_score, confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn import datasets

# Random seed for numpy
np.random.seed(18937)

## Developing a Multiple Regression Model using Neural Network

Let's explore the California housing dataset, which is used in a regression setting.

In [None]:
# dataset = datasets.load_boston()
dataset = datasets.fetch_california_housing()
dataset.keys()

In [None]:
df = pd.DataFrame(dataset.data, columns=dataset.feature_names)
df['Price'] = dataset.target
df.head()

In [None]:
df.describe()

## Standardization/Normalization of Data
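The two terms in this heading refer to different operations in scikit-learn, so a quick comparison may help. The sketch below, using a small made-up array, shows that `Normalizer` rescales each row (sample) to unit L2 norm, whereas `StandardScaler` rescales each column (feature) to zero mean and unit variance.

```python
import numpy as np
from sklearn.preprocessing import Normalizer, StandardScaler

# Made-up data: 3 samples, 2 features on very different scales
X_demo = np.array([[3.0, 4.0],
                   [6.0, 8.0],
                   [1.0, 100.0]])

row_normalized = Normalizer().fit_transform(X_demo)        # per-row scaling
col_standardized = StandardScaler().fit_transform(X_demo)  # per-column scaling

print(np.linalg.norm(row_normalized, axis=1))  # each row has L2 norm 1
print(col_standardized.mean(axis=0))           # each column has mean ~0
print(col_standardized.std(axis=0))            # each column has std ~1
```

Note that the first two rows become identical after row normalization, because they point in the same direction and differ only in length.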

In [None]:
scaler = Normalizer()
data_scaled = scaler.fit_transform(df)
df_scaled = pd.DataFrame(data_scaled, columns=list(dataset.feature_names) + ['Price'])
df_scaled.head()

## Split training data and testing data

In [None]:
X = df_scaled.drop('Price', axis=1).to_numpy()
y = df_scaled['Price'].to_numpy()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=23)

## Construct a neural network

Now we will construct a basic neural network with
 * One hidden layer with a single neuron fed by 8 input values (as there are 8 features)
 * One output layer 
 
##### Note: The summary will show that we have 10 total learnable parameters:
  * 9 for the hidden layer (8 feature weights and 1 bias)
  * 1 for the output layer (the weight on the hidden output $H_0$, without bias) 
  

<figure>
  <img src="../images/reg_as_nn.jpg" width=600 height=400 alt="figure alt text">
  <figcaption>
      <b>Fig. A neural network for solving a multiple regression problem.</b> <!-- can also use <div>, <p>, etc. tags within <figcaption> -->
  </figcaption>
</figure>
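
Because neither layer in this architecture applies a nonlinear activation, the two layers compose into a single linear function of the inputs, which is why the network is equivalent to multiple regression. The NumPy sketch below checks this with randomly generated weights (illustrative values, not trained parameters).

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 8                       # assumed number of input features

w1 = rng.normal(size=n_features)     # hidden-layer weights
b1 = rng.normal()                    # hidden-layer bias
w2 = rng.normal()                    # output-layer weight (no bias)
x = rng.normal(size=n_features)      # one input sample

# Two-step forward pass: input -> hidden -> output
h = np.dot(w1, x) + b1
y_two_layers = w2 * h

# Equivalent single linear model: coefficients w2*w1, intercept w2*b1
coef = w2 * w1
intercept = w2 * b1
y_single = np.dot(coef, x) + intercept

print(np.isclose(y_two_layers, y_single))
```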

In [None]:
# load necessary pytorch modules
import torch
from torch import nn
from torch import optim
from torch.utils.data import TensorDataset, DataLoader

### Defining the Model

One way to define a neural network in PyTorch is to subclass the `nn.Module` class. 


In [None]:
class MyRegNN(nn.Module):
    
    def __init__(self, D_in, H, D_out):
        """
        D_in: number of input features
        H: number of neurons in the hidden layer
        D_out: number of outputs
        """
        super(MyRegNN, self).__init__()
        self.layer1 = nn.Linear(D_in, H) # input to hidden layer
        self.layer2 = nn.Linear(H, D_out, bias=False) # hidden to output layer (no bias)
        
    def forward(self, x):
        h_pred = self.layer1(x)        # h = dot(x, w1) + b1
        y_pred = self.layer2(h_pred)   # y = dot(h, w2)
        return y_pred


Now, we create an instance of the network class we have created. 

In [None]:
# here is a network with 8 inputs feeding 1 hidden neuron, which feeds 1 output neuron

D_in, H, D_out = X_train.shape[1], 1, 1    

net = MyRegNN(D_in, H, D_out)

We can summarize this model using the `summary` function from the `torchsummary` package.

In [None]:
!pip install torchsummary

In [None]:
from torchsummary import summary

In [None]:
summary(net, (X_train.shape[1],))

In [None]:
X_train.shape[1]

The first layer has 9 parameters to be learned: 8 coefficients (one per input feature) and the intercept $b_0$.
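
The same count can be checked directly in PyTorch. The sketch below rebuilds the two layers with `nn.Linear` (assuming 8 input features and one hidden neuron) and lists each parameter tensor with its size.

```python
from torch import nn

# Assumed sizes: 8 inputs, 1 hidden neuron, 1 output
layer1 = nn.Linear(8, 1)              # 8 weights + 1 bias
layer2 = nn.Linear(1, 1, bias=False)  # 1 weight, no bias
model = nn.Sequential(layer1, layer2)

# Print the name, shape, and element count of every learnable tensor
for name, param in model.named_parameters():
    print(name, tuple(param.shape), param.numel())

total_params = sum(p.numel() for p in model.parameters())
print("total:", total_params)
```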

### Define Loss Function and Optimizer

In [None]:
loss = nn.MSELoss()   # Mean Squared error loss
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.3)  
# optimizer = optim.SGD(net.parameters(), lr=0.001)  

### Training the Model

Before training the model, we need to convert the pandas/NumPy datasets to PyTorch's tensor data structure.

In [None]:
# create tensors from the train/test set
X_train_tensor = torch.tensor(X_train, dtype=torch.float)
X_test_tensor = torch.tensor(X_test, dtype=torch.float)
y_train_tensor = torch.tensor(y_train, dtype=torch.float).view(-1, 1)
y_test_tensor = torch.tensor(y_test, dtype=torch.float).view(-1, 1)

For easier iteration over the train/test sets, there are two handy classes: `TensorDataset` and `DataLoader`.

In [None]:
BATCH_SIZE = 1  # It is possible to feed more than one instance to the model at a time.
# Such a set of instances is called a batch. For simplicity, let's keep one instance per batch.

train_data = TensorDataset(X_train_tensor, y_train_tensor)
test_data = TensorDataset(X_test_tensor, y_test_tensor)

train_loader = DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)
test_loader = DataLoader(dataset=test_data, batch_size=1)

Now, we train the model for 100 epochs. The number of epochs is a hyperparameter that defines how many times the learning algorithm works through the entire training dataset. Within one epoch, each sample in the training dataset has had an opportunity to update the model parameters. An epoch consists of one or more batches.
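
To make the relationship between samples, batches, and epochs concrete, the sketch below counts the parameter updates for a few batch sizes. The dataset size of 1,000 is a made-up number, and `batches_per_epoch` is a hypothetical helper; it mirrors what `len()` of a `DataLoader` reports when the last partial batch is kept (the default `drop_last=False`).

```python
import math

def batches_per_epoch(n_samples, batch_size):
    """Number of batches needed to cover every sample once."""
    return math.ceil(n_samples / batch_size)

n_samples = 1000   # illustrative dataset size
n_epochs = 100

for batch_size in (1, 32, 1000):
    per_epoch = batches_per_epoch(n_samples, batch_size)
    print(f"batch_size={batch_size}: {per_epoch} updates per epoch, "
          f"{per_epoch * n_epochs} updates in total")
```

With one instance per batch, every sample triggers its own parameter update, which is the slowest but simplest setting.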

Note: For simplicity we are skipping k-fold cross validation. 

In [None]:
N_EPOCHS = 100  # In each epoch, the model iterates over all the instances 

for epoch in range(N_EPOCHS):
    epoch_loss = 0
    
    for x, y in train_loader:
        output = net(x)        # Forward pass: get the network output for this batch
        l = loss(output, y)    # Compute the error for this batch
        epoch_loss += l.item() # Accumulate the error over the epoch
        optimizer.zero_grad()  # backward() accumulates gradients, so reset them to 0 first
        l.backward()           # Backward pass: compute gradients
        optimizer.step()       # Update the parameters using the gradients

    if (epoch%5)==0:
        print(f'Epoch {epoch+0:03}: | Total Loss: {epoch_loss:.5f} | Avg Loss: {epoch_loss/len(train_loader):.5f}')

## Prediction with the model

In [None]:
net.eval()  # notify all the layers that we are in eval mode

with torch.no_grad(): 
    y_test_pred = net(X_test_tensor)


In [None]:
from sklearn.metrics import r2_score, mean_squared_error

print(f"R^2: {r2_score(y_test, y_test_pred.numpy())}")
print(f"MSE:{mean_squared_error(y_test, y_test_pred.numpy())}" )

In terms of MSE and $R^2$, the neural network performed better than a baseline that always predicts the mean of the target.
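
The mean-predicting baseline mentioned above can be scored with the same metrics. The sketch below uses synthetic targets (randomly generated, not the housing data) and always predicts the training mean. Such a constant baseline gets an $R^2$ no greater than zero on held-out data, so any positive $R^2$ from the network is an improvement over it.

```python
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(0)
y_train_demo = rng.normal(loc=2.0, scale=1.0, size=200)  # synthetic targets
y_test_demo = rng.normal(loc=2.0, scale=1.0, size=50)

# Baseline: predict the training mean for every test sample
baseline_pred = np.full_like(y_test_demo, y_train_demo.mean())

baseline_r2 = r2_score(y_test_demo, baseline_pred)
baseline_mse = mean_squared_error(y_test_demo, baseline_pred)
print(f"Baseline R^2: {baseline_r2:.4f}")
print(f"Baseline MSE: {baseline_mse:.4f}")
```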

## Developing a Logistic Regression Model using Neural Network

For this part of the lab, we will use scikit-learn's breast cancer dataset.

In [None]:
cancer = datasets.load_breast_cancer()
cancer.keys()

In [None]:
df = pd.DataFrame(cancer.data, columns=cancer.feature_names)
df['class'] = cancer.target
df.head()

## Standardization/Normalization of Data

In [None]:
X = df.drop('class', axis=1).to_numpy()
y = df['class'].to_numpy()

scaler = Normalizer()
X_scaled = scaler.fit_transform(X)
df_scaled = pd.DataFrame(X_scaled, columns=list(cancer.feature_names))
df_scaled['class'] = y
df_scaled.head()

In [None]:
# class distribution
df_scaled['class'].value_counts()

## Split training data and testing data

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.15, random_state=23)

In [None]:
X.shape

## Construct a neural network

Now we will construct a basic Neural Network with
 * One hidden layer fed by 30 input values (as there are 30 features)
 * One output layer 


In [None]:
class MyLogitNN(nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        D_in: number of input features
        H: number of neurons in the hidden layer
        D_out: number of outputs
        """
        super(MyLogitNN, self).__init__()
        self.layer1 = nn.Linear(D_in, H) # input to hidden layer
        self.layer2 = nn.Linear(H, D_out, bias=False) # hidden to output layer (no bias)
        
    def forward(self, x):
        h_pred = self.layer1(x)        
        y_pred = torch.sigmoid(self.layer2(h_pred))   # squash the output to a probability in (0, 1)
        return y_pred


Now, we create an instance of the network class we have created. 

In [None]:
# here is a network with 30 inputs feeding 1 hidden neuron, which feeds 1 output neuron

D_in, H, D_out = X_train.shape[1], 1, 1    

net = MyLogitNN(D_in, H, D_out)

We can summarize this model using the `summary` function from the `torchsummary` package.

In [None]:
summary(net, (X_train.shape[1],))

### Define Loss Function and Optimizer
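
When choosing a binary classification loss, it matters whether the model outputs probabilities or raw scores (logits). As a sketch with made-up values: `nn.BCELoss` expects probabilities (for example, the output of `torch.sigmoid`), whereas `nn.BCEWithLogitsLoss` applies the sigmoid internally and therefore expects logits. Fed appropriately, the two give the same value.

```python
import torch
from torch import nn

logits = torch.tensor([[1.5], [-0.3], [0.0]])   # made-up raw scores
targets = torch.tensor([[1.0], [0.0], [1.0]])   # made-up labels

# Option 1: model outputs probabilities -> use BCELoss
loss_from_probs = nn.BCELoss()(torch.sigmoid(logits), targets)

# Option 2: model outputs logits -> use BCEWithLogitsLoss
loss_from_logits = nn.BCEWithLogitsLoss()(logits, targets)

print(loss_from_probs.item(), loss_from_logits.item())
print(torch.allclose(loss_from_probs, loss_from_logits))
```

Applying a sigmoid inside the model and then using `nn.BCEWithLogitsLoss` would squash the output twice, so the loss should match what `forward` returns.
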

In [None]:
# loss = nn.MSELoss()   
# forward() already applies a sigmoid, so use BCELoss; BCEWithLogitsLoss would apply it twice
loss = nn.BCELoss()
optimizer = optim.Adam(net.parameters(), lr=0.01)
# optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.3)  
# optimizer = optim.SGD(net.parameters(), lr=0.001)  



### Training the Model

Before training the model, we need to convert the pandas/NumPy datasets to PyTorch's tensor data structure.

In [None]:
# create tensors from the train/test set
X_train_tensor = torch.tensor(X_train, dtype=torch.float)
X_test_tensor = torch.tensor(X_test, dtype=torch.float)
y_train_tensor = torch.tensor(y_train, dtype=torch.float).view(-1, 1)
y_test_tensor = torch.tensor(y_test, dtype=torch.float).view(-1, 1)

For easier iteration over the train/test sets, there are two handy classes: `TensorDataset` and `DataLoader`.

In [None]:
BATCH_SIZE = 1  # It is possible to feed more than one instance to the model at a time.
# Such a set of instances is called a batch. For simplicity, let's keep one instance per batch.

train_data = TensorDataset(X_train_tensor, y_train_tensor)
test_data = TensorDataset(X_test_tensor, y_test_tensor)

train_loader = DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)
test_loader = DataLoader(dataset=test_data, batch_size=1)

Now, we train the model for 100 epochs. The number of epochs is a hyperparameter that defines how many times the learning algorithm works through the entire training dataset. Within one epoch, each sample in the training dataset has had an opportunity to update the model parameters. An epoch consists of one or more batches.

Note: For simplicity we are skipping k-fold cross validation. 

In [None]:
N_EPOCHS = 100  # In each epoch, the model iterates over all the instances 

for epoch in range(N_EPOCHS):
    epoch_loss = 0
    
    for x, y in train_loader:
        output = net(x)        # Forward pass: get the network output for this batch
        l = loss(output, y)    # Compute the error for this batch
        epoch_loss += l.item() # Accumulate the error over the epoch
        optimizer.zero_grad()  # backward() accumulates gradients, so reset them to 0 first
        l.backward()           # Backward pass: compute gradients
        optimizer.step()       # Update the parameters using the gradients

    if (epoch%5)==0:
        print(f'Epoch {epoch+0:03}: | Total Loss: {epoch_loss:.5f} | Avg Loss: {epoch_loss/len(train_loader):.5f}')

## Prediction with the model

In [None]:
net.eval()  # notify all the layers that we are in eval mode

with torch.no_grad(): 
    y_test_pred = net(X_test_tensor)
    
y_test_pred[:5]

In [None]:
y_test_pred_class = torch.round(y_test_pred)

In [None]:
from sklearn.metrics import classification_report, confusion_matrix

print(f"Confusion Matrix:\n {confusion_matrix(y_test, y_test_pred_class.numpy())}")
print(f"Classification Report:\n {classification_report(y_test, y_test_pred_class.numpy())}" )

In this lab, we learned a step-by-step process for developing neural networks for solving regression and classification problems. These are elementary neural networks, but the process is similar even if our network architecture has more layers/neurons.  
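
As an illustration of that last point, here is a hedged sketch of a slightly deeper regression network. The class name `DeeperRegNN`, the layer sizes (8 inputs, 16 and 8 hidden units), and the ReLU activations are arbitrary choices for illustration, not part of this lab; the loss, optimizer, and training loop would be set up the same way as before.

```python
import torch
from torch import nn

class DeeperRegNN(nn.Module):
    """Hypothetical network with two hidden layers and ReLU activations."""

    def __init__(self, d_in, h1, h2, d_out):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(d_in, h1),   # input to first hidden layer
            nn.ReLU(),
            nn.Linear(h1, h2),     # first to second hidden layer
            nn.ReLU(),
            nn.Linear(h2, d_out),  # second hidden layer to output
        )

    def forward(self, x):
        return self.layers(x)

deep_net = DeeperRegNN(d_in=8, h1=16, h2=8, d_out=1)
dummy_batch = torch.randn(4, 8)   # 4 random samples with 8 features each
out = deep_net(dummy_batch)
print(out.shape)
```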

---
# PyTorch API and helpful links

 * Layers: https://pytorch.org/docs/stable/nn.html
 * Loss / Loss Functions : [link1](https://medium.com/udacity-pytorch-challengers/a-brief-overview-of-loss-functions-in-pytorch-c0ddb78068f7) [link2](https://neptune.ai/blog/pytorch-loss-functions)
 * Optimizers (learning algorithm) : https://pytorch.org/docs/stable/optim.html
 * Neuron Activation Functions : https://towardsdatascience.com/understanding-pytorch-activation-functions-the-maths-and-algorithms-part-1-7d8ade494cee
 

### Please restart the kernel and clear all output, then play around with parameters or add cells and create additional notebooks

# Save your notebook