# Basic Neural Network Model


## Artificial neuron

Recall the concept of a [neuron](https://en.wikipedia.org/wiki/Artificial_neuron) based on its mathematical formula.

$$ y_k = \varphi \left( \sum_{j=0}^{m}{w_{kj}x_j} +b_k \right) $$

This is a simple **linear** neuron. If you look closely, you will see the formula for multiple linear regression (if $\varphi$ is removed)! If $\varphi$ is a sigmoid function, then it becomes the formula for logistic regression.
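
The formula can be sketched in a few lines of NumPy. The inputs, weights, and bias below are made-up illustrative values, and `neuron` is a hypothetical helper rather than a PyTorch function.

```python
import numpy as np

def neuron(x, w, b, activation=None):
    """Weighted sum of the inputs plus a bias, optionally passed through an activation."""
    z = np.dot(w, x) + b
    return z if activation is None else activation(z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 2.0, 3.0])      # illustrative inputs
w = np.array([0.5, -0.25, 0.1])    # illustrative weights
b = 0.2                            # illustrative bias

linear_out = neuron(x, w, b)               # multiple linear regression form
logistic_out = neuron(x, w, b, sigmoid)    # logistic regression form
print(linear_out, logistic_out)
```

Without an activation the neuron returns the weighted sum itself; with the sigmoid the same sum is squashed into the interval (0, 1).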

PyTorch, like other NN packages, supports numerous types of neurons. Typically, neurons are composed into layers, and a single layer has only a single type of neuron.

In this lab, we develop linear regression and logistic regression models with neural networks.


In [1]:
%matplotlib inline
import matplotlib.pyplot as plt

import os, sys
import itertools
import numpy as np
import pandas as pd

from sklearn.preprocessing import scale, LabelBinarizer, StandardScaler, Normalizer
from sklearn.metrics import f1_score, confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn import datasets

# Random seed for numpy
np.random.seed(18937)

## Developing a Multiple Regression Model using Neural Network

Let's explore the California housing dataset, which is used in a regression setting. We use it instead of the Boston housing dataset, which apparently has an ethical problem.

In [32]:
# dataset = datasets.load_boston()
dataset = datasets.fetch_california_housing()

dataset.keys()

dict_keys(['data', 'target', 'frame', 'target_names', 'feature_names', 'DESCR'])

In [33]:
df = pd.DataFrame(dataset.data, columns=dataset.feature_names)
df['Price'] = dataset.target
df.head()

| | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | Price |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 | 4.526 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 | 3.585 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 | 3.521 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 | 3.413 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 | 3.422 |


In [34]:
df.describe()

| | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | Price |
|---|---|---|---|---|---|---|---|---|---|
| count | 20640.0 | 20640.0 | 20640.0 | 20640.0 | 20640.0 | 20640.0 | 20640.0 | 20640.0 | 20640.0 |
| mean | 3.870671 | 28.639486 | 5.429 | 1.096675 | 1425.476744 | 3.070655 | 35.631861 | -119.569704 | 2.068558 |
| std | 1.899822 | 12.585558 | 2.474173 | 0.473911 | 1132.462122 | 10.38605 | 2.135952 | 2.003532 | 1.153956 |
| min | 0.4999 | 1.0 | 0.846154 | 0.333333 | 3.0 | 0.692308 | 32.54 | -124.35 | 0.14999 |
| 25% | 2.5634 | 18.0 | 4.440716 | 1.006079 | 787.0 | 2.429741 | 33.93 | -121.8 | 1.196 |
| 50% | 3.5348 | 29.0 | 5.229129 | 1.04878 | 1166.0 | 2.818116 | 34.26 | -118.49 | 1.797 |
| 75% | 4.74325 | 37.0 | 6.052381 | 1.099526 | 1725.0 | 3.282261 | 37.71 | -118.01 | 2.64725 |
| max | 15.0001 | 52.0 | 141.909091 | 34.066667 | 35682.0 | 1243.333333 | 41.95 | -114.31 | 5.00001 |


In [35]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20640 entries, 0 to 20639
Data columns (total 9 columns):
MedInc        20640 non-null float64
HouseAge      20640 non-null float64
AveRooms      20640 non-null float64
AveBedrms     20640 non-null float64
Population    20640 non-null float64
AveOccup      20640 non-null float64
Latitude      20640 non-null float64
Longitude     20640 non-null float64
Price         20640 non-null float64
dtypes: float64(9)
memory usage: 1.4 MB


## Standardization/Normalization of Data

In [36]:
scaler = Normalizer()
data_scaled = scaler.fit_transform(df)
df_scaled = pd.DataFrame(data_scaled, columns=list(dataset.feature_names) + ['Price'])
df_scaled.head()

| | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | Price |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.023846 | 0.117437 | 0.020005 | 0.002933 | 0.922314 | 0.00732 | 0.108501 | -0.350107 | 0.012964 |
| 1 | 0.003452 | 0.008734 | 0.002594 | 0.000404 | 0.998534 | 0.000877 | 0.015745 | -0.050829 | 0.001491 |
| 2 | 0.014092 | 0.100968 | 0.016093 | 0.002084 | 0.963083 | 0.005441 | 0.073493 | -0.237353 | 0.006837 |
| 3 | 0.009815 | 0.090448 | 0.010119 | 0.001866 | 0.970573 | 0.004432 | 0.065835 | -0.212639 | 0.005936 |
| 4 | 0.006612 | 0.089393 | 0.010799 | 0.001858 | 0.971286 | 0.00375 | 0.065068 | -0.210159 | 0.005883 |
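
It may help to see what `Normalizer` does to a single row. With its default L2 norm it rescales each sample (row) to unit length, which differs from column-wise standardization such as `StandardScaler`. The sketch below reproduces the first scaled row with plain NumPy; the `toy` array is made-up data used only to contrast the two ideas.

```python
import numpy as np

# First row of the unscaled df.head() output (8 features followed by Price).
row = np.array([8.3252, 41.0, 6.984127, 1.02381, 322.0,
                2.555556, 37.88, -122.23, 4.526])

# Row-wise L2 normalization: divide the row by its Euclidean length.
row_unit = row / np.linalg.norm(row)
print(row_unit.round(6))

# Column-wise standardization, by contrast, uses per-feature statistics
# computed over many rows: z = (x - column_mean) / column_std.
toy = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])  # made-up data
toy_std = (toy - toy.mean(axis=0)) / toy.std(axis=0)
print(toy_std)
```

Because the `Population` value dominates the row's length, it ends up close to 1 after row normalization while the other entries, including `Price`, shrink toward 0.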


## Split training data and testing data

In [37]:
X = df_scaled.drop('Price', axis=1).to_numpy()
y = df_scaled['Price'].to_numpy()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=17)

## Construct a neural network

Now we will construct a basic neural network with
 * One hidden layer with a single neuron, fed by 8 input values (as there are 8 features)
 * One output layer 
 
##### Note: The summary will show that we have 10 total learnable parameters:
  * 9 for the hidden layer (8 feature weights and a bias)
  * 1 for the output layer (the weight on the hidden output $H_0$, with no bias)
  

<figure>
  <img src="../images/reg_as_nn.jpg" width=600 height=400 alt="figure alt text">
  <figcaption>
      <b>Fig. A neural network for solving a multiple regression problem.</b> <!-- can also use <div>, <p>, etc. tags within <figcaption> -->
  </figcaption>
</figure>

In [38]:
# load necessary pytorch modules
import torch
from torch import nn
from torch import optim
from torch.utils.data import TensorDataset, DataLoader

### Defining the Model

One way to define a neural network in PyTorch is to subclass the `nn.Module` class. 


In [39]:
class MyRegNN(nn.Module):
    
    def __init__(self, D_in, H, D_out):
        """
        D_in: number of input features
        H: number of neurons in the hidden layer
        D_out: number of outputs
        """
        super(MyRegNN, self).__init__()
        self.layer1 = nn.Linear(D_in, H)                # input to hidden layer
        self.layer2 = nn.Linear(H, D_out, bias=False)   # hidden to output layer (no bias)
        
    def forward(self, x):
        h_pred = self.layer1(x)        # h = dot(x, w1) + b1
        y_pred = self.layer2(h_pred)   # y = dot(h, w2); no activation, so the model stays linear
        return y_pred
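
Because neither layer applies an activation, the two stacked linear layers compose into a single linear map, which is why this network behaves like multiple linear regression. The NumPy sketch below checks this with randomly drawn stand-ins for the learned parameters; the names `W1`, `b1`, and `W2` are illustrative and not attributes of the model.

```python
import numpy as np

rng = np.random.default_rng(0)
D_in, H, D_out = 8, 1, 1

# Random stand-ins for the learned parameters.
W1 = rng.normal(size=(H, D_in))    # layer1 weights
b1 = rng.normal(size=H)            # layer1 bias
W2 = rng.normal(size=(D_out, H))   # layer2 weights (no bias)

x = rng.normal(size=D_in)          # one made-up input sample

# Two-step forward pass, mirroring forward()
two_layer = W2 @ (W1 @ x + b1)

# Equivalent single linear model: coefficients W2 W1, intercept W2 b1
coef = W2 @ W1
intercept = W2 @ b1
one_layer = coef @ x + intercept

print(np.allclose(two_layer, one_layer))
```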


Now, we create an instance of the network class we have created. 

In [40]:
# here is a network with 8 inputs feeding 1 hidden neuron, which feeds one output neuron

D_in, H, D_out = X_train.shape[1], 1, 1    

net = MyRegNN(D_in, H, D_out)

We can summarize this model using the `summary` function from the `torchsummary` package.

In [41]:
!pip install torchsummary



In [42]:
from torchsummary import summary

In [43]:
summary(net, (X_train.shape[1],))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                    [-1, 1]               9
            Linear-2                    [-1, 1]               1
Total params: 10
Trainable params: 10
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
----------------------------------------------------------------


In [44]:
X_train.shape[1]

8

The first layer has 9 parameters to be learned: 8 coefficients (one per input feature) and the intercept $b_0$.
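
The counts in the summary can be reproduced with simple arithmetic. The helper `linear_param_count` below is a hypothetical function written for this illustration, not part of PyTorch or `torchsummary`.

```python
def linear_param_count(in_features, out_features, bias=True):
    """Number of learnable values in a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)

D_in, H, D_out = 8, 1, 1
layer1 = linear_param_count(D_in, H)                # 8 weights + 1 bias
layer2 = linear_param_count(H, D_out, bias=False)   # 1 weight, no bias
total = layer1 + layer2
print(layer1, layer2, total)  # 9 1 10
```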

### Define Loss Function and Optimizer

In [45]:
loss = nn.SmoothL1Loss()     # Smooth L1 Loss (Huber Loss)
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.3)  
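
As a rough guide to what these two objects compute, here is a NumPy sketch of the Smooth L1 loss and of one SGD-with-momentum update. It assumes the documented PyTorch defaults (`beta=1.0`, mean reduction, no dampening); `smooth_l1` and `sgd_momentum_step` are hypothetical helpers, not the library's implementation.

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """Mean Smooth L1 loss: quadratic for small errors, linear for large ones."""
    diff = np.abs(pred - target)
    per_item = np.where(diff < beta, 0.5 * diff**2 / beta, diff - 0.5 * beta)
    return per_item.mean()

def sgd_momentum_step(param, grad, velocity, lr=0.001, momentum=0.3):
    """One update: the velocity is a decaying running sum of past gradients."""
    velocity = momentum * velocity + grad
    return param - lr * velocity, velocity

pred = np.array([0.5, 3.0])      # made-up predictions
target = np.array([0.0, 0.0])    # made-up targets
print(smooth_l1(pred, target))   # (0.125 + 2.5) / 2 = 1.3125

p, v = 1.0, 0.0                  # made-up parameter and initial velocity
p, v = sgd_momentum_step(p, grad=2.0, velocity=v)
p, v = sgd_momentum_step(p, grad=2.0, velocity=v)
print(p, v)
```

The second step moves further than the first even though the gradient is unchanged, because the velocity carries over 30% of the previous step.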


### Training the Model

Before training the model, we need to convert the pandas/NumPy datasets to PyTorch's tensor data structure.

In [46]:
# create tensors from the train/test set
X_train_tensor = torch.tensor(X_train, dtype=torch.float)
X_test_tensor = torch.tensor(X_test, dtype=torch.float)
y_train_tensor = torch.tensor(y_train, dtype=torch.float).view(-1, 1)
y_test_tensor = torch.tensor(y_test, dtype=torch.float).view(-1, 1)

For easier iteration over the train/test sets, there are two handy classes: `TensorDataset` and `DataLoader`.
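
Conceptually, iterating over a shuffled loader amounts to shuffling the sample indices once per pass and slicing them into fixed-size chunks. The NumPy sketch below illustrates that idea only; `iterate_minibatches` is a hypothetical helper and the demo arrays are made up.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, shuffle=True, seed=0):
    """Yield (X_batch, y_batch) pairs that together cover the data once."""
    idx = np.arange(len(X))
    if shuffle:
        np.random.default_rng(seed).shuffle(idx)
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

X_demo = np.arange(24, dtype=float).reshape(12, 2)   # 12 made-up samples
y_demo = np.arange(12, dtype=float).reshape(-1, 1)

batches = list(iterate_minibatches(X_demo, y_demo, batch_size=5))
print([len(xb) for xb, _ in batches])  # [5, 5, 2]
```

The last batch is smaller whenever the number of samples is not a multiple of the batch size.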

In [47]:
BATCH_SIZE = 5  # It is possible to feed more than one instance to the model at a time.
# Such a set of instances is called a batch. Here, each batch holds 5 instances.

train_data = TensorDataset(X_train_tensor, y_train_tensor)
test_data = TensorDataset(X_test_tensor, y_test_tensor)

train_loader = DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)
test_loader = DataLoader(dataset=test_data, batch_size=1)

Now, we train the model for 100 epochs. The number of epochs is a hyperparameter that defines the number of times the learning algorithm works through the entire training dataset. Within an epoch, each sample in the training dataset has an opportunity to update the internal model parameters. An epoch consists of one or more batches.
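
The number of batches per epoch follows from the split and the batch size. The arithmetic below is a sketch for the housing data, assuming `train_test_split` rounds the test share up and that the last, smaller batch is kept.

```python
import math

n_samples = 20640      # rows reported by df.info()
test_size = 0.15
batch_size = 5

n_test = math.ceil(test_size * n_samples)             # test share, rounded up
n_train = n_samples - n_test
batches_per_epoch = math.ceil(n_train / batch_size)   # last batch may be smaller
print(n_train, n_test, batches_per_epoch)  # 17544 3096 3509
```

This is the value that `len(train_loader)` would report, and it is the divisor behind the average loss printed during training.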

Note: For simplicity we are skipping k-fold cross validation. 

In [48]:
N_EPOCHS = 100  # In each epoch, the model iterates over all the training instances

for epoch in range(N_EPOCHS):
    epoch_loss = 0
    
    for x, y in train_loader:
        output = net(x)        # Forward pass: get the network output for this batch
        l = loss(output, y)    # Mean loss over this batch
        epoch_loss += l.item() # Accumulate the batch losses
        optimizer.zero_grad()  # backward() accumulates gradients, so reset them to 0 first
        l.backward()           # Backward pass: compute gradients
        optimizer.step()       # Update the parameters

    if (epoch%5)==0:
        print(f'Epoch {epoch:03}: | Total Loss: {epoch_loss:.5f} | Avg Loss: {epoch_loss/len(train_loader):.5f}')

Epoch 000: | Total Loss: 3.06682 | Avg Loss: 0.00087
Epoch 005: | Total Loss: 0.16502 | Avg Loss: 0.00005
Epoch 010: | Total Loss: 0.13566 | Avg Loss: 0.00004
Epoch 015: | Total Loss: 0.11498 | Avg Loss: 0.00003
Epoch 020: | Total Loss: 0.10028 | Avg Loss: 0.00003
Epoch 025: | Total Loss: 0.08971 | Avg Loss: 0.00003
Epoch 030: | Total Loss: 0.08203 | Avg Loss: 0.00002
Epoch 035: | Total Loss: 0.07641 | Avg Loss: 0.00002
Epoch 040: | Total Loss: 0.07219 | Avg Loss: 0.00002
Epoch 045: | Total Loss: 0.06904 | Avg Loss: 0.00002
Epoch 050: | Total Loss: 0.06658 | Avg Loss: 0.00002
Epoch 055: | Total Loss: 0.06465 | Avg Loss: 0.00002
Epoch 060: | Total Loss: 0.06312 | Avg Loss: 0.00002
Epoch 065: | Total Loss: 0.06184 | Avg Loss: 0.00002
Epoch 070: | Total Loss: 0.06076 | Avg Loss: 0.00002
Epoch 075: | Total Loss: 0.05983 | Avg Loss: 0.00002
Epoch 080: | Total Loss: 0.05900 | Avg Loss: 0.00002
Epoch 085: | Total Loss: 0.05826 | Avg Loss: 0.00002
Epoch 090: | Total Loss: 0.05757 | Avg Loss: 0

### Prediction with the Model

In [49]:
net.eval()  # notify all the layers that we are in eval mode

with torch.no_grad(): 
    y_test_pred = net(X_test_tensor)


In [50]:
from sklearn.metrics import r2_score, mean_squared_error

print(f"R^2: {r2_score(y_test, y_test_pred.numpy())}")
print(f"MSE:{mean_squared_error(y_test, y_test_pred.numpy())}" )

R^2: -3.1266068756788705
MSE:3.2643763011573e-05


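The MSE looks small only because `Normalizer` rescaled the target to very small values. The negative R^2 shows that, with these settings, the neural network performed worse than a baseline that always predicts the mean of the target.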
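
For reference when reading these scores, R^2 compares the model's squared error with that of a constant predictor that always outputs the mean of the target: 0 means "no better than the mean", and negative values mean "worse than the mean". The sketch below computes it by hand on made-up numbers; `r2` is a hypothetical helper that follows the standard definition.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0])              # made-up targets
mean_baseline = np.full_like(y_true, y_true.mean())  # always predict the mean
good_pred = np.array([1.1, 1.9, 3.2, 3.9])
bad_pred = np.array([4.0, 3.0, 2.0, 1.0])

print(r2(y_true, mean_baseline))  # 0.0
print(r2(y_true, good_pred))      # close to 1
print(r2(y_true, bad_pred))       # negative: worse than the mean baseline
```

MSE, by contrast, depends on the scale of the target, so a tiny MSE on a target with tiny values says little on its own.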

## Developing a Logistic Regression Model using Neural Network

For this part of the lab, we will use scikit-learn's breast cancer dataset.

In [52]:
cancer = datasets.load_breast_cancer()
cancer.keys()

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])

In [53]:
df = pd.DataFrame(cancer.data, columns=cancer.feature_names)
df['class'] = cancer.target
df.head()

| | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension | class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17.99 | 10.38 | 122.8 | 1001.0 | 0.1184 | 0.2776 | 0.3001 | 0.1471 | 0.2419 | 0.07871 | ... | 17.33 | 184.6 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.1189 | 0 |
| 1 | 20.57 | 17.77 | 132.9 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | 0.1812 | 0.05667 | ... | 23.41 | 158.8 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.186 | 0.275 | 0.08902 | 0 |
| 2 | 19.69 | 21.25 | 130.0 | 1203.0 | 0.1096 | 0.1599 | 0.1974 | 0.1279 | 0.2069 | 0.05999 | ... | 25.53 | 152.5 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.243 | 0.3613 | 0.08758 | 0 |
| 3 | 11.42 | 20.38 | 77.58 | 386.1 | 0.1425 | 0.2839 | 0.2414 | 0.1052 | 0.2597 | 0.09744 | ... | 26.5 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.173 | 0 |
| 4 | 20.29 | 14.34 | 135.1 | 1297.0 | 0.1003 | 0.1328 | 0.198 | 0.1043 | 0.1809 | 0.05883 | ... | 16.67 | 152.2 | 1575.0 | 0.1374 | 0.205 | 0.4 | 0.1625 | 0.2364 | 0.07678 | 0 |


## Standardization/Normalization of Data

In [54]:
X = df.drop('class', axis=1).to_numpy()
y = df['class'].to_numpy()

scaler = Normalizer()
X_scaled = scaler.fit_transform(X)
df_scaled = pd.DataFrame(X_scaled, columns=list(cancer.feature_names))
df_scaled['class'] = y
df_scaled.head()

| | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension | class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.007925 | 0.004573 | 0.054099 | 0.440986 | 5.2e-05 | 0.000122 | 0.000132 | 6.5e-05 | 0.000107 | 3.5e-05 | ... | 0.007635 | 0.081325 | 0.889462 | 7.1e-05 | 0.000293 | 0.000314 | 0.000117 | 0.000203 | 5.2e-05 | 0 |
| 1 | 0.008666 | 0.007486 | 0.055988 | 0.558619 | 3.6e-05 | 3.3e-05 | 3.7e-05 | 3e-05 | 7.6e-05 | 2.4e-05 | ... | 0.009862 | 0.066899 | 0.824026 | 5.2e-05 | 7.9e-05 | 0.000102 | 7.8e-05 | 0.000116 | 3.8e-05 | 0 |
| 2 | 0.009367 | 0.010109 | 0.061842 | 0.572276 | 5.2e-05 | 7.6e-05 | 9.4e-05 | 6.1e-05 | 9.8e-05 | 2.9e-05 | ... | 0.012145 | 0.072545 | 0.812984 | 6.9e-05 | 0.000202 | 0.000214 | 0.000116 | 0.000172 | 4.2e-05 | 0 |
| 3 | 0.016325 | 0.029133 | 0.110899 | 0.551922 | 0.000204 | 0.000406 | 0.000345 | 0.00015 | 0.000371 | 0.000139 | ... | 0.037881 | 0.141333 | 0.811515 | 0.0003 | 0.001238 | 0.000982 | 0.000368 | 0.000949 | 0.000247 | 0 |
| 4 | 0.009883 | 0.006985 | 0.065808 | 0.631774 | 4.9e-05 | 6.5e-05 | 9.6e-05 | 5.1e-05 | 8.8e-05 | 2.9e-05 | ... | 0.00812 | 0.074137 | 0.767189 | 6.7e-05 | 0.0001 | 0.000195 | 7.9e-05 | 0.000115 | 3.7e-05 | 0 |


In [55]:
# class distribution
df_scaled['class'].value_counts()

1    357
0    212
Name: class, dtype: int64
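
The class counts give a useful reference point: a classifier that always predicts the more frequent class already reaches a certain accuracy on the full dataset. A small sketch of that calculation, using the counts shown above:

```python
counts = {1: 357, 0: 212}   # class counts from value_counts()
total = sum(counts.values())

majority_class = max(counts, key=counts.get)
baseline_accuracy = counts[majority_class] / total
print(majority_class, round(baseline_accuracy, 3))  # 1 0.627
```

A trained model is only informative if it clearly beats this majority-class baseline.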

## Split training data and testing data

In [56]:
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.15, random_state=17)

In [57]:
X.shape

(569, 30)

## Construct a neural network

Now we will construct a basic Neural Network with
 * One hidden layer fed by 30 input values (as there are 30 features)
 * One output layer 


In [58]:
class MyLogitNN(nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        D_in: number of input features
        H: number of neurons in the hidden layer
        D_out: number of outputs
        """
        super(MyLogitNN, self).__init__()
        self.layer1 = nn.Linear(D_in, H)                # input to hidden layer
        self.layer2 = nn.Linear(H, D_out, bias=False)   # hidden to output layer (no bias)
        
    def forward(self, x):
        h_pred = self.layer1(x)        
        y_pred = torch.sigmoid(self.layer2(h_pred))   # sigmoid maps the score to (0, 1)
        return y_pred


Now, we create an instance of the network class we have created. 

In [59]:
# here is a network with 30 inputs feeding 1 hidden neuron, which feeds one output neuron

D_in, H, D_out = X_train.shape[1], 1, 1    

net = MyLogitNN(D_in, H, D_out)

We can summarize this model using the `summary` function from the `torchsummary` package.

In [60]:
summary(net, (X_train.shape[1],))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                    [-1, 1]              31
            Linear-2                    [-1, 1]               1
Total params: 32
Trainable params: 32
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
----------------------------------------------------------------


### Define Loss Function and Optimizer

In [61]:
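# loss = nn.MSELoss()   
# Note: BCEWithLogitsLoss applies a sigmoid internally and expects raw scores (logits),
# whereas MyLogitNN.forward already returns sigmoid outputs; nn.BCELoss() is the
# variant that expects probabilities.
loss = nn.BCEWithLogitsLoss()
#optimizer = optim.Adam(net.parameters(), lr=0.01)
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.3)  
# optimizer = optim.SGD(net.parameters(), lr=0.001)  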
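
One detail worth checking in this setup is how the loss and the model's output fit together: `BCEWithLogitsLoss` applies a sigmoid itself, while `forward` above already returns a sigmoid output. The NumPy sketch below, with hypothetical helper functions and a made-up raw score, shows what happens to the loss when a probability is passed where a logit is expected. This may be one reason the training loss reported further down barely moves, although other factors (learning rate, input scaling) could contribute as well.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(prob, target):
    """Binary cross-entropy on a probability in (0, 1)."""
    return -(target * np.log(prob) + (1 - target) * np.log(1 - prob))

def bce_with_logits(logit, target):
    """Binary cross-entropy on a raw score: the sigmoid is applied inside."""
    return bce(sigmoid(logit), target)

z, y = 10.0, 1.0      # a confident, correct raw score (made up)
p = sigmoid(z)        # what a forward() ending in sigmoid would return

matched = bce(p, y)              # probability fed to a probability-based loss
double = bce_with_logits(p, y)   # probability fed to a logits-based loss
print(matched, double)
```

With the double sigmoid, the value seen by the loss is confined to roughly (0.5, 0.73), so even a perfect prediction for class 1 cannot push the loss below about 0.31.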



### Training the Model

Before training the model, we need to convert the pandas/NumPy datasets to PyTorch's tensor data structure.

In [62]:
# create tensors from the train/test set
X_train_tensor = torch.tensor(X_train, dtype=torch.float)
X_test_tensor = torch.tensor(X_test, dtype=torch.float)
y_train_tensor = torch.tensor(y_train, dtype=torch.float).view(-1, 1)
y_test_tensor = torch.tensor(y_test, dtype=torch.float).view(-1, 1)

For easier iteration over the train/test sets, there are two handy classes: `TensorDataset` and `DataLoader`.

In [63]:
BATCH_SIZE = 5  # It is possible to feed more than one instance to the model at a time.
# Such a set of instances is called a batch. Here, each batch holds 5 instances.

train_data = TensorDataset(X_train_tensor, y_train_tensor)
test_data = TensorDataset(X_test_tensor, y_test_tensor)

train_loader = DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)
test_loader = DataLoader(dataset=test_data, batch_size=1)

Now, we train the model for 100 epochs. The number of epochs is a hyperparameter that defines the number of times the learning algorithm works through the entire training dataset. Within an epoch, each sample in the training dataset has an opportunity to update the internal model parameters. An epoch consists of one or more batches.

Note: For simplicity we are skipping k-fold cross validation. 

In [64]:
N_EPOCHS = 100  # In each epoch, the model iterates over all the training instances

for epoch in range(N_EPOCHS):
    epoch_loss = 0
    
    for x, y in train_loader:
        output = net(x)        # Forward pass: get the network output for this batch
        l = loss(output, y)    # Mean loss over this batch
        epoch_loss += l.item() # Accumulate the batch losses
        optimizer.zero_grad()  # backward() accumulates gradients, so reset them to 0 first
        l.backward()           # Backward pass: compute gradients
        optimizer.step()       # Update the parameters

    if (epoch%5)==0:
        print(f'Epoch {epoch:03}: | Total Loss: {epoch_loss:.5f} | Avg Loss: {epoch_loss/len(train_loader):.5f}')

Epoch 000: | Total Loss: 64.67423 | Avg Loss: 0.66674
Epoch 005: | Total Loss: 64.73846 | Avg Loss: 0.66741
Epoch 010: | Total Loss: 64.73683 | Avg Loss: 0.66739
Epoch 015: | Total Loss: 64.66881 | Avg Loss: 0.66669
Epoch 020: | Total Loss: 64.66696 | Avg Loss: 0.66667
Epoch 025: | Total Loss: 64.66525 | Avg Loss: 0.66665
Epoch 030: | Total Loss: 64.66334 | Avg Loss: 0.66663
Epoch 035: | Total Loss: 64.72757 | Avg Loss: 0.66729
Epoch 040: | Total Loss: 64.72577 | Avg Loss: 0.66728
Epoch 045: | Total Loss: 64.72389 | Avg Loss: 0.66726
Epoch 050: | Total Loss: 64.65639 | Avg Loss: 0.66656
Epoch 055: | Total Loss: 64.58883 | Avg Loss: 0.66586
Epoch 060: | Total Loss: 64.71857 | Avg Loss: 0.66720
Epoch 065: | Total Loss: 64.65112 | Avg Loss: 0.66651
Epoch 070: | Total Loss: 64.64934 | Avg Loss: 0.66649
Epoch 075: | Total Loss: 64.71316 | Avg Loss: 0.66715
Epoch 080: | Total Loss: 64.64605 | Avg Loss: 0.66645
Epoch 085: | Total Loss: 64.57871 | Avg Loss: 0.66576
Epoch 090: | Total Loss: 64.

### Prediction with the Model

In [65]:
net.eval()  # notify all the layers that we are in eval mode

with torch.no_grad(): 
    y_test_pred = net(X_test_tensor)
    
y_test_pred[:5]

tensor([[0.4908],
        [0.4897],
        [0.4903],
        [0.4904],
        [0.4910]])

In [66]:
y_test_pred_class = torch.round(y_test_pred)
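
Rounding a probability is equivalent to thresholding it at 0.5. Applied to the five outputs printed above, a NumPy sketch of that step looks like this:

```python
import numpy as np

# First five predicted probabilities shown in the output above
probs = np.array([0.4908, 0.4897, 0.4903, 0.4904, 0.4910])

labels = (probs > 0.5).astype(int)   # values above 0.5 become class 1
print(labels)  # [0 0 0 0 0]
```

Since every output sits just below 0.5, all of these samples are assigned to class 0.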

In [68]:
from sklearn.metrics import classification_report, confusion_matrix

print(f"Confusion Matrix:\n {confusion_matrix(y_test, y_test_pred_class.numpy())}")
print(f"\nClassification Report:\n {classification_report(y_test, y_test_pred_class.numpy())}" )

Confusion Matrix:
 [[26  0]
 [60  0]]

Classification Report:
               precision    recall  f1-score   support

           0       0.30      1.00      0.46        26
           1       0.00      0.00      0.00        60

    accuracy                           0.30        86
   macro avg       0.15      0.50      0.23        86
weighted avg       0.09      0.30      0.14        86
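
The figures in the report can be recomputed directly from the confusion matrix. The sketch below does so for accuracy and for class 0, using plain Python and the matrix printed above.

```python
cm = [[26, 0],
      [60, 0]]   # rows: true class, columns: predicted class

total = sum(sum(r) for r in cm)
accuracy = (cm[0][0] + cm[1][1]) / total

# Treat class 0 as the class of interest
tp, fn, fp = cm[0][0], cm[0][1], cm[1][0]
precision0 = tp / (tp + fp)
recall0 = tp / (tp + fn)
f1_0 = 2 * precision0 * recall0 / (precision0 + recall0)

print(round(accuracy, 2), round(precision0, 2), round(recall0, 2), round(f1_0, 2))
# 0.3 0.3 1.0 0.46
```

For class 1 the model never predicts that label, so its precision would be 0/0; the report shows it as 0.00.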





In this lab, we learned a step-by-step process for developing neural networks for solving regression and classification problems. These are elementary neural networks, but the process is similar even if our network architecture has more layers/neurons.  

---
# PyTorch API and helpful links

 * Layers: https://pytorch.org/docs/stable/nn.html
 * Loss / Loss Functions : [link1](https://medium.com/udacity-pytorch-challengers/a-brief-overview-of-loss-functions-in-pytorch-c0ddb78068f7) [link2](https://neptune.ai/blog/pytorch-loss-functions)
 * Optimizers (learning algorithm) : https://pytorch.org/docs/stable/optim.html
 * Neuron Activation Functions : https://towardsdatascience.com/understanding-pytorch-activation-functions-the-maths-and-algorithms-part-1-7d8ade494cee
 

### Please restart the kernel and clear all output, then play around with parameters or add cells and create additional notebooks

# Save your notebook