# Neural Network Examples

Since the homework is due the same week we discuss neural networks, I've created this short file to demonstrate how to create, train, and predict with simple neural networks. We will discuss a lot of the "what" and "why" behind them in class, but I wanted to send out a file with some examples you can (hopefully) make use of when implementing them in your homework.

## Overview

In this notebook, I will give one example for classification and one for regression. I won't go into much detail on the generic pre-processing (such as loading or encoding the datasets) and will instead focus on the neural network implementation. I will use `PyTorch` to create my neural networks. You're welcome to use `TensorFlow` in your code instead; a helpful tutorial for creating networks in TensorFlow can be found [here](https://www.tensorflow.org/tutorials/quickstart/beginner).

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import pandas as pd
from ucimlrepo import fetch_ucirepo 
from sklearn.model_selection import train_test_split

## Classification

We will perform classification on the Breast Cancer Wisconsin (Original) dataset (source: UCI), predicting whether a tumor is benign or malignant.

In [2]:
# fetch dataset 
breast_cancer_wisconsin = fetch_ucirepo(id=15) 
  
# data (as pandas dataframes) 
X = breast_cancer_wisconsin.data.features
y = breast_cancer_wisconsin.data.targets
  
# variable information 
print(breast_cancer_wisconsin.variables) 

                           name     role         type demographic  \
0            Sample_code_number       ID  Categorical        None   
1               Clump_thickness  Feature      Integer        None   
2       Uniformity_of_cell_size  Feature      Integer        None   
3      Uniformity_of_cell_shape  Feature      Integer        None   
4             Marginal_adhesion  Feature      Integer        None   
5   Single_epithelial_cell_size  Feature      Integer        None   
6                   Bare_nuclei  Feature      Integer        None   
7               Bland_chromatin  Feature      Integer        None   
8               Normal_nucleoli  Feature      Integer        None   
9                       Mitoses  Feature      Integer        None   
10                        Class   Target       Binary        None   

                  description units missing_values  
0                        None  None             no  
1                        None  None             no  
2           

In [3]:
# X contains missing values (in the Bare_nuclei column), so drop those rows from both X and y:
na_mask = X.isna().any(axis=1)
not_na = np.logical_not(na_mask)
X = X[not_na]
y = y[not_na]
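The masking step above can be mimicked in plain Python to show why the same mask is applied to both `X` and `y`: dropping rows from the features alone would leave labels paired with the wrong samples. This is a toy sketch with made-up rows; `float('nan')` stands in for pandas' missing-value marker, and all names here are hypothetical.

```python
import math

# Made-up rows standing in for X and y; float('nan') marks a missing feature value
X_rows = [[5, 1, 1.0], [5, 4, float('nan')], [3, 1, 2.0]]
y_rows = [2, 4, 2]

def is_missing(value):
    # math.isnan detects the NaN placeholder pandas uses for missing numbers
    return isinstance(value, float) and math.isnan(value)

# One flag per row: True if ANY feature is missing (like X.isna().any(axis=1))
na_mask = [any(is_missing(v) for v in row) for row in X_rows]

# Keep a row only where the mask is False, in BOTH lists, so each label stays with its features
X_clean = [row for row, bad in zip(X_rows, na_mask) if not bad]
y_clean = [label for label, bad in zip(y_rows, na_mask) if not bad]

print(na_mask)   # [False, True, False]
print(y_clean)   # [2, 2]
```

In pandas, `X = X.dropna()` followed by `y = y.loc[X.index]` is another way to keep the two aligned, since `dropna` keeps the original index labels.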

In [4]:
print(X.shape)
X.head()

(683, 9)


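|   | Clump_thickness | Uniformity_of_cell_size | Uniformity_of_cell_shape | Marginal_adhesion | Single_epithelial_cell_size | Bare_nuclei | Bland_chromatin | Normal_nucleoli | Mitoses |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 5 | 1 | 1 | 1 | 2 | 1.0 | 3 | 1 | 1 |
| 1 | 5 | 4 | 4 | 5 | 7 | 10.0 | 3 | 2 | 1 |
| 2 | 3 | 1 | 1 | 1 | 2 | 2.0 | 3 | 1 | 1 |
| 3 | 6 | 8 | 8 | 1 | 3 | 4.0 | 3 | 7 | 1 |
| 4 | 4 | 1 | 1 | 3 | 2 | 1.0 | 3 | 1 | 1 |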


In [5]:
y.head()

|   | Class |
|---|---|
| 0 | 2 |
| 1 | 2 |
| 2 | 2 |
| 3 | 2 |
| 4 | 2 |


In [6]:
# Recode the labels: 2 (benign) -> 0 and 4 (malignant) -> 1, since BCELoss expects targets between 0 and 1
y = y.replace(2, 0).replace(4, 1)
y

|     | Class |
|-----|-------|
| 0   | 0 |
| 1   | 0 |
| 2   | 0 |
| 3   | 0 |
| 4   | 0 |
| ... | ... |
| 694 | 0 |
| 695 | 0 |
| 696 | 1 |
| 697 | 1 |
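The two chained `.replace` calls work here because the new codes (0, 1) never overlap the old ones (2, 4). The plain-Python sketch below uses a hypothetical `chained_replace` helper that mimics a single `.replace(old, new)` call, to show when chaining would go wrong and why a single dictionary lookup is safer. In pandas, `y['Class'].map({2: 0, 4: 1})` performs such a one-pass lookup (values missing from the dict become NaN).

```python
def chained_replace(values, old, new):
    # Mimics one .replace(old, new) call: every occurrence of old becomes new
    return [new if v == old else v for v in values]

labels = [2, 2, 4, 2, 4]

# The notebook's two-step recode is safe: the new codes never collide with the old ones
recoded = chained_replace(chained_replace(labels, 2, 0), 4, 1)
print(recoded)  # [0, 0, 1, 0, 1]

# Chaining breaks when codes overlap, e.g. when swapping 0 and 1:
swapped_chained = chained_replace(chained_replace([0, 1, 1], 0, 1), 1, 0)
print(swapped_chained)  # [0, 0, 0]: every value ends up as 0

# A single dict lookup maps each original value exactly once
swap = {0: 1, 1: 0}
swapped_dict = [swap[v] for v in [0, 1, 1]]
print(swapped_dict)  # [1, 0, 0]
```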


In [7]:
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8)
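It helps to know how many rows end up in each split, because the batch-size check later depends on it. As an assumption about scikit-learn's internals (consistent with the shapes and results in this notebook), a float `train_size` is turned into a row count by flooring `train_size * n_samples`, and the remaining rows go to the test set. The hypothetical helper below reproduces that arithmetic.

```python
import math

def split_sizes(n_samples, train_size=0.8):
    # Assumed rule: floor the training share, give the remaining rows to the test set
    n_train = math.floor(train_size * n_samples)
    return n_train, n_samples - n_train

print(split_sizes(683))  # (546, 137): breast cancer rows after dropping missing values
print(split_sizes(345))  # (276, 69): BUPA rows
```

Note that 546 = 13 * 42 and 276 = 12 * 23, which is why the batch sizes 13 and 12 used later divide the training sets evenly. Passing an integer `random_state` to `train_test_split` fixes the shuffle, making the split (and therefore the reported scores) reproducible.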

In [8]:
# We need to convert our data to PyTorch Tensor objects using pandas' to_numpy and torch.from_numpy.
#  We call .to(torch.float32) because to_numpy gives 64-bit values (int64 or float64), while
#  nn.Linear's weights are 32-bit floats by default, and PyTorch will not mix the two dtypes.
X_train = torch.from_numpy(X_train.to_numpy()).to(torch.float32)
y_train = torch.from_numpy(y_train.to_numpy()).to(torch.float32)
X_test = torch.from_numpy(X_test.to_numpy()).to(torch.float32)
y_test = torch.from_numpy(y_test.to_numpy()).to(torch.float32)
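The dtype issue described in the comment above can be observed directly. This sketch uses a small made-up integer array (typed explicitly as `int64`) and assumes a recent PyTorch, where feeding a layer an input whose dtype differs from its weights raises a `RuntimeError`.

```python
import numpy as np
import torch
import torch.nn as nn

# Made-up integer features, typed explicitly as int64 (pandas' to_numpy() usually gives int64 or float64)
features = np.array([[5, 1, 1], [3, 1, 2]], dtype=np.int64)
as_tensor = torch.from_numpy(features)      # keeps NumPy's dtype: torch.int64

layer = nn.Linear(3, 2)                     # weights are torch.float32 by default

try:
    layer(as_tensor)                        # int64 input vs float32 weights
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True                  # PyTorch refuses to mix the two dtypes

out = layer(as_tensor.to(torch.float32))    # after the cast, the forward pass works
print(as_tensor.dtype, layer.weight.dtype, mismatch_raised, out.shape)
```

An equivalent one-step conversion is `torch.tensor(X_train.to_numpy(), dtype=torch.float32)`, which copies the data and sets the dtype at the same time.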

In [9]:
class BreastCancerClassifier(nn.Module):
    def __init__(self):
        # Call super().__init__() always (it does behind-the-scenes work for the model)
        super().__init__()
        
        # We now create the layers of our model:
        
        # The first argument is the number of input features in the data (9).
        # The second is a hyperparameter: the number of neurons in the first hidden layer. I chose 18.
        self.hidden_1 = nn.Linear(9, 18)
        self.activation_1 = nn.ReLU()  # Activation layer between each hidden layer
        
        # The first argument must match the output size of the previous layer (18). The second is again a hyperparameter.
        self.hidden_2 = nn.Linear(18, 9)
        self.activation_2 = nn.ReLU()  # Activation layer between each hidden layer
        
        # Finally, we map the output of the second hidden layer to 1 output neuron (our predictor).
        # Sigmoid then squashes that raw score into the range (0, 1), which we read as the predicted
        # probability of the positive class (label 1, malignant).
        self.output = nn.Linear(9, 1)
        self.predictions = nn.Sigmoid()  # Maps any real number into (0, 1)
        # In the end, we chained fully connected layers, interleaving activation layers between them.
        # We went from 9 features, to 18, back to 9, then to 1.
        
        
    def forward(self, x):
        # The forward method is required: it tells PyTorch what to compute when features are passed in.
        # Here, we chain the layers we created above. (Call the model as model(x); PyTorch invokes forward for you.)
        x = self.activation_1(self.hidden_1(x))  # 9 -> 18
        x = self.activation_2(self.hidden_2(x))  # 18 -> 9
        x = self.predictions(self.output(x))     # 9 -> 1
        return x
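A quick way to sanity-check an architecture is to count its trainable parameters: each `nn.Linear(n_in, n_out)` holds an `n_in x n_out` weight matrix plus `n_out` biases, while `ReLU` and `Sigmoid` have none. The hypothetical helper below does that arithmetic for the 9 -> 18 -> 9 -> 1 network defined above.

```python
def linear_params(n_in, n_out):
    # nn.Linear(n_in, n_out) stores an n_in x n_out weight matrix plus one bias per output unit
    return n_in * n_out + n_out

# Layer widths of BreastCancerClassifier: 9 -> 18 -> 9 -> 1 (ReLU and Sigmoid add no parameters)
widths = [9, 18, 9, 1]
per_layer = [linear_params(a, b) for a, b in zip(widths, widths[1:])]
total = sum(per_layer)
print(per_layer, total)  # [180, 171, 10] 361
```

For the model itself, `sum(p.numel() for p in classifier_model.parameters())` should report the same total.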

In [10]:
classifier_model = BreastCancerClassifier()  # initialize a model
loss_fn = nn.BCELoss()  # Objective to minimize: Binary Cross Entropy (binary classification with probability outputs)
optimizer = optim.Adam(classifier_model.parameters(), lr=0.001)  # Adam updates the weights from their gradients; lr is the learning rate
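To make the sigmoid and the BCE objective concrete, here is a plain-Python sketch of both formulas. The `sigmoid` and `bce` helpers and the probabilities fed to them are made up for illustration; `bce` averages over samples, mirroring `nn.BCELoss`'s default `reduction='mean'`.

```python
import math

def sigmoid(z):
    # Squashes any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def bce(probs, targets):
    # Binary cross entropy: -[y*log(p) + (1-y)*log(1-p)], averaged over samples
    total = 0.0
    for p, y in zip(probs, targets):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(targets)

print(sigmoid(0.0))              # 0.5
print(bce([0.5, 0.5], [1, 0]))   # log(2), about 0.693: an uninformed guess
print(bce([0.9, 0.2], [1, 0]))   # about 0.164: confident and correct, so much lower
```

For reference, the first-epoch loss reported during training (about 0.73) is close to this log 2 baseline, i.e. the model starts out barely better than always predicting 0.5. PyTorch also offers `nn.BCEWithLogitsLoss`, which fuses the sigmoid and BCE for better numerical stability; if you switch to it, drop the `nn.Sigmoid` layer from the model and apply `torch.sigmoid` only when you need probabilities.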

In [11]:
n_epochs = 100  # How long to train for (how many passes through the training set)
batch_size = 13  # How many training samples to process before each weight update

# An optional check. Slicing past the end of a tensor does not raise an error; it simply stops at
#   the last element. So if len(X_train) is not a multiple of batch_size, the final batch in the
#   loop below is just smaller. This check only enforces equal-sized batches: here
#   len(X_train) = 546 = 13 * 42, so it passes. If you keep it, pick a divisor of your
#   training-set size that is close to the batch size you want.
if len(X_train) % batch_size != 0:
    raise ValueError('Select a batch size that evenly divides your training data '
                     '(or remove this check: the final batch will then just be smaller)')
    

# Two for loops: one for every epoch, and one for every batch in the training set.
#    This means we train on every sample on the training set once for every epoch.
for epoch in range(n_epochs):
    for batch in range(0, len(X_train), batch_size):  # range(start, stop, step)
        X_batch = X_train[batch:batch+batch_size]     # from our start index to our start + step size
        y_pred = classifier_model(X_batch)            # create predictions based on the current weights
        true_batch = y_train[batch:batch+batch_size]  # take the actual values from our y_train data
        loss = loss_fn(y_pred, true_batch)            # calculate how far off we were using our BCE loss function
        
        # Update our weights
        optimizer.zero_grad()                         # Reset gradients (PyTorch accumulates them across backward() calls)
        loss.backward()                               # Backpropagate: compute the gradient of the loss for every weight
        optimizer.step()                              # Adjust the weights using those gradients
    print(f'Epoch {epoch+1}: loss {loss}')            # Loss of the last batch in this epoch

Epoch 1: loss 0.7303469181060791
Epoch 2: loss 0.6985543966293335
Epoch 3: loss 0.6678364872932434
Epoch 4: loss 0.6256605982780457
Epoch 5: loss 0.5562031269073486
Epoch 6: loss 0.46026185154914856
Epoch 7: loss 0.35547372698783875
Epoch 8: loss 0.26384228467941284
Epoch 9: loss 0.19551783800125122
Epoch 10: loss 0.1549387127161026
Epoch 11: loss 0.12886878848075867
Epoch 12: loss 0.11097212135791779
Epoch 13: loss 0.09781638532876968
Epoch 14: loss 0.08776700496673584
Epoch 15: loss 0.08072865754365921
Epoch 16: loss 0.07564664632081985
Epoch 17: loss 0.0719011127948761
Epoch 18: loss 0.06922096759080887
Epoch 19: loss 0.06754116714000702
Epoch 20: loss 0.0664314478635788
Epoch 21: loss 0.06633865833282471
Epoch 22: loss 0.06633733958005905
Epoch 23: loss 0.06753381341695786
Epoch 24: loss 0.0689278319478035
Epoch 25: loss 0.06996837258338928
Epoch 26: loss 0.07180933654308319
Epoch 27: loss 0.07332177460193634
Epoch 28: loss 0.07532535493373871
Epoch 29: loss 0.07652923464775085
...
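The batching loop relies on the fact that slicing past the end of a sequence stops at the last element instead of raising an error. The plain-Python sketch below (made-up sizes: 10 samples, batch size 4) shows the resulting batch sizes and adds an optional per-epoch shuffle, which the notebook's loop does not do; in PyTorch, indexing with `torch.randperm(len(X_train))` would play the same role.

```python
import random

n_samples, batch_size = 10, 4   # made-up sizes for illustration
indices = list(range(n_samples))

# Optional: reshuffle the sample order once per epoch (the notebook's loop does not do this)
random.Random(0).shuffle(indices)

# Same range(start, stop, step) pattern as the training loop; a slice that runs past the end
# simply stops at the last element, so the final batch is just smaller
batches = [indices[start:start + batch_size] for start in range(0, n_samples, batch_size)]
sizes = [len(b) for b in batches]
print(sizes)  # [4, 4, 2]
```

Every sample still appears exactly once per epoch; only the grouping changes.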

In [12]:
# torch.no_grad() turns off gradient tracking. It is optional for prediction, but it saves memory and compute, and it is good practice during validation and testing.
with torch.no_grad():
    predictions = classifier_model(X_test)  # Predict outputs with our trained weights on our test data

# Calculate accuracy (a handy trick: the sigmoid outputs lie between 0 and 1, so predictions.round() picks the
#    closer of 0 or 1 for each sample. Comparing that to y_test with == gives a boolean tensor
#    [True, False, True, True, ...], which we convert back to 0s and 1s and average.)
accuracy = (predictions.round() == y_test).float().mean()
print(f'Trained classifier achieved accuracy {accuracy} on the testing set')

Trained classifier achieved accuracy 0.9416058659553528 on the testing set
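The round-compare-average trick can be traced by hand on a few made-up probabilities and labels; this plain-Python sketch mirrors `(predictions.round() == y_test).float().mean()`.

```python
probs = [0.93, 0.12, 0.51, 0.40]   # made-up sigmoid outputs
labels = [1, 0, 0, 0]              # made-up true labels

# round() picks the nearer of 0 and 1, i.e. it thresholds at 0.5
predicted = [round(p) for p in probs]

# Element-wise comparison gives booleans: [True, True, False, True]
correct = [pred == y for pred, y in zip(predicted, labels)]

# Booleans count as 0/1, so their mean is the accuracy
accuracy = sum(correct) / len(correct)
print(predicted, accuracy)  # [1, 0, 1, 0] 0.75
```

With the 137-sample test set implied by the split, the reported 0.9416 is consistent with 129 correct predictions. One edge case: both Python's `round` and `torch.round` round exact halves to the nearest even number, so an output of exactly 0.5 is classified as 0.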


## Regression

Regression does not change very much from classification: we will use a regression-specific loss function and modify our output layer. We will use the BUPA Liver Disorders dataset from UCI, predicting the `drinks` column from the five blood-test features.

In [13]:
# fetch dataset 
liver_disorders = fetch_ucirepo(id=60) 
  
# data (as pandas dataframes) 
X = liver_disorders.data.features 
y = liver_disorders.data.targets 
  
# variable information 
print(liver_disorders.variables) 

       name     role         type demographic  \
0       mcv  Feature   Continuous        None   
1   alkphos  Feature   Continuous        None   
2      sgpt  Feature   Continuous        None   
3      sgot  Feature   Continuous        None   
4   gammagt  Feature   Continuous        None   
5    drinks   Target   Continuous        None   
6  selector    Other  Categorical        None   

                                         description units missing_values  
0                            mean corpuscular volume  None             no  
1                               alkaline phosphotase  None             no  
2                           alanine aminotransferase  None             no  
3                         aspartate aminotransferase  None             no  
4                      gamma-glutamyl transpeptidase  None             no  
5  number of half-pint equivalents of alcoholic b...  None             no  
6  field created by the BUPA researchers to split...  None             no  

In [14]:
X

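|     | mcv | alkphos | sgpt | sgot | gammagt |
|-----|-----|---------|------|------|---------|
| 0   | 85 | 92 | 45 | 27 | 31 |
| 1   | 85 | 64 | 59 | 32 | 23 |
| 2   | 86 | 54 | 33 | 16 | 54 |
| 3   | 91 | 78 | 34 | 24 | 36 |
| 4   | 87 | 70 | 12 | 28 | 10 |
| ... | ... | ... | ... | ... | ... |
| 340 | 99 | 75 | 26 | 24 | 41 |
| 341 | 96 | 69 | 53 | 43 | 203 |
| 342 | 98 | 77 | 55 | 35 | 89 |
| 343 | 91 | 68 | 27 | 26 | 14 |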


In [15]:
y

|     | drinks |
|-----|--------|
| 0   | 0.0 |
| 1   | 0.0 |
| 2   | 0.0 |
| 3   | 0.0 |
| 4   | 0.0 |
| ... | ... |
| 340 | 12.0 |
| 341 | 12.0 |
| 342 | 15.0 |
| 343 | 16.0 |


In [16]:
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8)

In [17]:
# Again, we convert to PyTorch Tensor objects using pandas' to_numpy and torch.from_numpy, then cast
#  to 32-bit floats so the data match the dtype of the model's weights (nn.Linear uses float32 by default).
X_train = torch.from_numpy(X_train.to_numpy()).to(torch.float32)
y_train = torch.from_numpy(y_train.to_numpy()).to(torch.float32)
X_test = torch.from_numpy(X_test.to_numpy()).to(torch.float32)
y_test = torch.from_numpy(y_test.to_numpy()).to(torch.float32)

In [18]:
class BupaRegressor(nn.Module):
    def __init__(self):
        # Call super().__init__() always (it does behind-the-scenes work for the model)
        super().__init__()
        
        # We now create the layers of our model:
        
        # The first argument is still the number of input features: 5.
        # The second is still a hyperparameter. This time I chose 15.
        self.hidden_1 = nn.Linear(5, 15)
        self.activation_1 = nn.ReLU()  # Activation layer between each hidden layer
        
        # The first argument must match the output size of the previous layer (15). The second is again a hyperparameter.
        self.hidden_2 = nn.Linear(15, 8)
        self.activation_2 = nn.ReLU()  # Activation layer between each hidden layer
        
        # Finally, we map the output of the second hidden layer to 1 output neuron (our prediction).
        # Unlike the classifier, there is no sigmoid here: the raw output is the predicted number of
        # drinks, which is not a probability and can be larger than 1.
        self.output = nn.Linear(8, 1)

        # In the end, we chained fully connected layers, interleaving activation layers between them.
        # We went from 5 features, to 15, to 8, then to 1.
        
        
    def forward(self, x):
        # The forward method is required: it tells PyTorch what to compute when features are passed in.
        # Here, we chain the layers we created above. (Call the model as model(x); PyTorch invokes forward for you.)
        x = self.activation_1(self.hidden_1(x))  # 5 -> 15
        x = self.activation_2(self.hidden_2(x))  # 15 -> 8
        x = self.output(x)                       # 8 -> 1
        return x
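For a network that is just a straight chain of layers, the same architecture can also be written more compactly with `nn.Sequential`. The sketch below is a hypothetical equivalent of `BupaRegressor` (same layers, same order) and runs it on a made-up random batch to confirm the output shape and parameter count.

```python
import torch
import torch.nn as nn

# Hypothetical nn.Sequential version of BupaRegressor: 5 -> 15 -> 8 -> 1
sequential_regressor = nn.Sequential(
    nn.Linear(5, 15),
    nn.ReLU(),
    nn.Linear(15, 8),
    nn.ReLU(),
    nn.Linear(8, 1),  # no activation: the raw output is the regression prediction
)

dummy_batch = torch.randn(4, 5)  # 4 made-up samples with 5 features each
out = sequential_regressor(dummy_batch)
print(out.shape)  # torch.Size([4, 1])

n_params = sum(p.numel() for p in sequential_regressor.parameters())
print(n_params)  # 227
```

The count checks out by hand: (5 * 15 + 15) + (15 * 8 + 8) + (8 * 1 + 1) = 90 + 128 + 9 = 227. Subclassing `nn.Module`, as the notebook does, stays more flexible if `forward` ever needs branching or multiple inputs; `nn.Sequential` is simply shorter for a plain chain.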

In [19]:
regressor_model = BupaRegressor()  # Initialize our model
loss_fn = nn.MSELoss()  # A regression loss: mean squared error between predictions and targets
optimizer = optim.Adam(regressor_model.parameters(), lr=0.001)  # Adam updates the weights from their gradients; lr is the learning rate

In [20]:
n_epochs = 100
batch_size = 12

# Same optional check as before (here len(X_train) = 276 = 12 * 23, so it passes)
if len(X_train) % batch_size != 0:
    raise ValueError('Select a batch size that evenly divides your training data '
                     '(or remove this check: the final batch will then just be smaller)')

# This loop is the same as before - we just change our model.
for epoch in range(n_epochs):
    for batch in range(0, len(X_train), batch_size):
        X_batch = X_train[batch:batch+batch_size]
        y_pred = regressor_model(X_batch)
        true_batch = y_train[batch:batch+batch_size]
        loss = loss_fn(y_pred, true_batch)
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f'Epoch {epoch+1}: loss {loss}')

Epoch 1: loss 8.95711612701416
Epoch 2: loss 6.778415203094482
Epoch 3: loss 6.246530055999756
Epoch 4: loss 5.6820149421691895
Epoch 5: loss 5.2181477546691895
Epoch 6: loss 4.806542873382568
Epoch 7: loss 4.64403772354126
Epoch 8: loss 4.45141077041626
Epoch 9: loss 4.3446736335754395
Epoch 10: loss 4.262369632720947
Epoch 11: loss 4.284063339233398
Epoch 12: loss 4.1806793212890625
Epoch 13: loss 4.1005682945251465
Epoch 14: loss 4.059868812561035
Epoch 15: loss 3.9484708309173584
Epoch 16: loss 4.017343044281006
Epoch 17: loss 4.013096332550049
Epoch 18: loss 3.9766674041748047
Epoch 19: loss 3.967500925064087
Epoch 20: loss 3.9358203411102295
Epoch 21: loss 3.9204752445220947
Epoch 22: loss 3.84529185295105
Epoch 23: loss 3.8084707260131836
Epoch 24: loss 3.8243472576141357
Epoch 25: loss 3.8625853061676025
Epoch 26: loss 3.783918619155884
Epoch 27: loss 3.767948865890503
Epoch 28: loss 3.7610976696014404
Epoch 29: loss 3.7294747829437256
Epoch 30: loss 3.7525370121002197
...

In [21]:
# torch.no_grad() turns off gradient tracking. It is optional for prediction, but it saves memory and compute, and it is good practice during validation and testing.
with torch.no_grad():
    predictions = regressor_model(X_test)

# Compute MSE by hand (this matches nn.MSELoss()(predictions, y_test))
errors = predictions - y_test
mse = (errors**2).mean()
print(f'Trained regressor achieved MSE {mse} on the testing set')

Trained regressor achieved MSE 8.681280136108398 on the testing set
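A small worked example of the MSE formula helps, along with one way to make the test score easier to interpret. The `mse` helper and its inputs below are made up for illustration; taking the square root (RMSE) puts the error back into the target's own units, here half-pint drink equivalents.

```python
import math

def mse(predictions, targets):
    # Mean of squared errors: the quantity nn.MSELoss computes with reduction='mean'
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

print(mse([1.0, 3.0, 5.0], [0.0, 3.0, 7.0]))  # (1 + 0 + 4) / 3, about 1.667

# Square root of the test MSE reported above: the typical error in drinks
test_mse = 8.681280136108398
rmse = math.sqrt(test_mse)
print(rmse)  # about 2.95
```

So the regressor's predictions are off by roughly three drinks on a typical test sample, which is worth comparing against the spread of the `drinks` column itself.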
