## Your (My) First Neural Network with PyTorch

Based on [this article](https://machinelearningmastery.com/develop-your-first-neural-network-with-pytorch-step-by-step/) by Jason Brownlee

### Description

You will use the Pima Indians onset of diabetes dataset. This has been a standard machine learning dataset since the early days of the field. It describes patient medical record data for Pima Indians and whether they had an onset of diabetes within five years.

It is a binary classification problem (1 for onset of diabetes, 0 for no onset). All the input variables that describe each patient are numerical, which makes them easy to use directly with neural networks that expect numerical input and output values. That makes the dataset an ideal choice for our first neural network in PyTorch.

In [1]:
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim

In [3]:
# load the dataset, split into input (X) and output (y) variables
dataset = np.loadtxt('data/pima-indians-diabetes-data.csv', delimiter=',')
# All rows of input variables, as matrix X
X = dataset[:,0:8]
# All rows of output, as vector y
y = dataset[:,8]
print(X)
print(len(X))
print(y)
print(len(y))
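# convert to float32 tensors on the GPU if one is available, otherwise on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
X = torch.tensor(X, dtype=torch.float32).to(device)
# reshape y from a flat vector to an (n, 1) column so it matches the model output
y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1).to(device)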

[[  6.    148.     72.    ...  33.6     0.627  50.   ]
 [  1.     85.     66.    ...  26.6     0.351  31.   ]
 [  8.    183.     64.    ...  23.3     0.672  32.   ]
 ...
 [  5.    121.     72.    ...  26.2     0.245  30.   ]
 [  1.    126.     60.    ...  30.1     0.349  47.   ]
 [  1.     93.     70.    ...  30.4     0.315  23.   ]]
768
[1. 0. 1. 0. 1. 0. 1. 0. 1. 1. 0. 1. 0. 1. 1. 1. 1. 1. 0. 1. 0. 0. 1. 1.
 1. 1. 1. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1. 1. 1. 0. 0. 0. 1. 0. 1. 0. 0.
 1. 0. 0. 0. 0. 1. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0. 1. 0. 1. 0. 0. 0. 1. 0.
 1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 0.
 0. 0. 0. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 1. 0. 0. 1. 1. 1. 0. 0. 0.
 1. 0. 0. 0. 1. 1. 0. 0. 1. 1. 1. 1. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.
 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 1. 1. 0. 0. 0. 1. 0. 0. 0. 0. 1. 1. 0. 0.
 0. 0. 1. 1. 0. 0. 0. 1. 0. 1. 0. 1. 0. 0. 0. 0. 0. 1. 1. 1. 1. 1. 0. 0.
 1. 1. 0. 1. 0. 1. 1. 1. 0. 0. 0. 0. 0. 0. 1. 1. 0. 1. 0. 0. 0. 1. 1. 1.
 1.

### Description of Model

The model has three layers, including the output layer.

* The model expects rows of data with 8 variables (the first argument of the first layer is set to 8)
* The first hidden layer has 12 neurons, followed by a ReLU activation function
* The second hidden layer has 8 neurons, followed by another ReLU activation function
* The output layer has one neuron, followed by a sigmoid activation function
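As a rough check on the size of this network, the number of trainable parameters can be worked out from the layer widths alone: each fully connected layer has one weight per input-output pair plus one bias per output. The sketch below is plain Python, and `count_parameters` is a hypothetical helper introduced only for illustration.

```python
def count_parameters(layer_sizes):
    """Weights plus biases for a stack of fully connected layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix + bias vector
    return total

# 8 inputs -> 12 -> 8 -> 1 output, as in the list above
small = count_parameters([8, 12, 8, 1])
# 8 inputs -> 16 -> 10 -> 1 output, the widths used in the class version
larger = count_parameters([8, 16, 10, 1])
print(small, larger)  # 221 325
```

The activation functions add no parameters, so only the `nn.Linear` layers contribute to these totals.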


In [4]:
model = nn.Sequential(
    nn.Linear(8, 12),
    nn.ReLU(),
    nn.Linear(12, 8),
    nn.ReLU(),
    nn.Linear(8, 1),
    nn.Sigmoid())
print(model)

Sequential(
  (0): Linear(in_features=8, out_features=12, bias=True)
  (1): ReLU()
  (2): Linear(in_features=12, out_features=8, bias=True)
  (3): ReLU()
  (4): Linear(in_features=8, out_features=1, bias=True)
  (5): Sigmoid()
)


### Never Mind that Model, Let's Do It As a Class

The same thing as above, but with more control.

_(Notice how I've changed the number of neurons in the layers)_

In [5]:
class PimaClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden1 = nn.Linear(8, 16)
        self.act1 = nn.ReLU()
        self.hidden2 = nn.Linear(16, 10)
        self.act2 = nn.ReLU()
        self.output = nn.Linear(10, 1)
        self.act_output = nn.Sigmoid()
 
    def forward(self, x):
        x = self.act1(self.hidden1(x))
        x = self.act2(self.hidden2(x))
        x = self.act_output(self.output(x))
        return x
 
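# use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")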
model = PimaClassifier()
model.to(device)
print(model)

PimaClassifier(
  (hidden1): Linear(in_features=8, out_features=16, bias=True)
  (act1): ReLU()
  (hidden2): Linear(in_features=16, out_features=10, bias=True)
  (act2): ReLU()
  (output): Linear(in_features=10, out_features=1, bias=True)
  (act_output): Sigmoid()
)


### Preparation for Training

In [6]:
loss_fn = nn.BCELoss()  # binary cross entropy
optimizer = optim.Adam(model.parameters(), lr=0.001)
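To make the loss less of a black box: with its default settings, `nn.BCELoss` averages `-[y*log(p) + (1-y)*log(1-p)]` over the batch, where `p` is the predicted probability and `y` the 0/1 label. The following is a minimal plain-Python sketch of that formula; the function name `binary_cross_entropy` is an illustrative assumption, and unlike the PyTorch version it does not guard against `p` being exactly 0 or 1.

```python
import math

def binary_cross_entropy(probs, targets):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    losses = [
        -(y * math.log(p) + (1 - y) * math.log(1 - p))
        for p, y in zip(probs, targets)
    ]
    return sum(losses) / len(losses)

targets = [1, 0]
confident = binary_cross_entropy([0.9, 0.1], targets)  # right and sure
unsure = binary_cross_entropy([0.5, 0.5], targets)     # coin flip
wrong = binary_cross_entropy([0.1, 0.9], targets)      # sure but wrong
print(f"{confident:.4f} {unsure:.4f} {wrong:.4f}")
```

The loss grows quickly as the model becomes confidently wrong, which is why it pairs naturally with a sigmoid output.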

### Training a Model

**Epoch**: One full pass through the entire training set.    
**Batch**: A subset of the training set that is passed through the model before each gradient descent update.    

More epochs usually lower the training loss but take longer, and too many can lead to overfitting. Larger batches use more memory per step but need fewer updates per epoch.
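The relationship between epochs, batches and weight updates is simple arithmetic. The sketch below works it out for the 768 rows of this dataset, using the `n_epochs` and `batch_size` values from the training cell that follows; the other variable names are illustrative.

```python
import math

n_samples = 768   # rows in the dataset, as printed by len(X)
batch_size = 10
n_epochs = 200

# The final batch is smaller when batch_size does not divide n_samples evenly.
batches_per_epoch = math.ceil(n_samples / batch_size)
last_batch_size = n_samples - (batches_per_epoch - 1) * batch_size
total_updates = batches_per_epoch * n_epochs
print(batches_per_epoch, last_batch_size, total_updates)  # 77 8 15400
```

So each epoch performs 77 optimizer steps, and the last of them sees only 8 samples.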

In [9]:
n_epochs = 200
batch_size = 10
 
for epoch in range(n_epochs):
    for i in range(0, len(X), batch_size):
        Xbatch = X[i:i+batch_size]
        y_pred = model(Xbatch)
        ybatch = y[i:i+batch_size]
        loss = loss_fn(y_pred, ybatch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f'Finished epoch {epoch}, latest loss {loss}')

Finished epoch 0, latest loss 0.30724889039993286
Finished epoch 1, latest loss 0.30124783515930176
Finished epoch 2, latest loss 0.3001868724822998
Finished epoch 3, latest loss 0.29513218998908997
Finished epoch 4, latest loss 0.30115658044815063
Finished epoch 5, latest loss 0.29950520396232605
Finished epoch 6, latest loss 0.2959151566028595
Finished epoch 7, latest loss 0.29114043712615967
Finished epoch 8, latest loss 0.29491347074508667
Finished epoch 9, latest loss 0.29699355363845825
Finished epoch 10, latest loss 0.30004462599754333
Finished epoch 11, latest loss 0.3000401258468628
Finished epoch 12, latest loss 0.2990044355392456
Finished epoch 13, latest loss 0.30238819122314453
Finished epoch 14, latest loss 0.2956019639968872
Finished epoch 15, latest loss 0.29407888650894165
Finished epoch 16, latest loss 0.2962999939918518
Finished epoch 17, latest loss 0.29914093017578125
Finished epoch 18, latest loss 0.29775315523147583
Finished epoch 19, latest loss 0.29859873652458

Finished epoch 162, latest loss 0.19389747083187103
Finished epoch 163, latest loss 0.20041495561599731
Finished epoch 164, latest loss 0.20014788210391998
Finished epoch 165, latest loss 0.19989190995693207
Finished epoch 166, latest loss 0.19117200374603271
Finished epoch 167, latest loss 0.20413589477539062
Finished epoch 168, latest loss 0.19960837066173553
Finished epoch 169, latest loss 0.19967180490493774
Finished epoch 170, latest loss 0.20119434595108032
Finished epoch 171, latest loss 0.19694431126117706
Finished epoch 172, latest loss 0.20209386944770813
Finished epoch 173, latest loss 0.1908007264137268
Finished epoch 174, latest loss 0.19370698928833008
Finished epoch 175, latest loss 0.19892464578151703
Finished epoch 176, latest loss 0.18906790018081665
Finished epoch 177, latest loss 0.19236239790916443
Finished epoch 178, latest loss 0.19604504108428955
Finished epoch 179, latest loss 0.19457471370697021
Finished epoch 180, latest loss 0.19232258200645447
Finished epoc
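The loss printed above is the loss of the last batch in each epoch only, and because the rows are visited in the same order every time, it is always the same 8 samples. That helps explain why the numbers jump around. One possible variation, sketched here on synthetic stand-in data rather than the real dataset, shuffles the rows each epoch and reports the mean loss over all samples. The names prefixed with `demo_` and the small two-layer network are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Synthetic stand-in data: 64 samples, 8 features, binary labels.
demo_X = torch.rand(64, 8)
demo_y = (demo_X.sum(dim=1, keepdim=True) > 4).float()

demo_model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
demo_loss_fn = nn.BCELoss()
demo_optimizer = optim.Adam(demo_model.parameters(), lr=0.001)

batch_size = 10
epoch_losses = []
for epoch in range(5):
    order = torch.randperm(len(demo_X))  # new row order every epoch
    running = 0.0
    for i in range(0, len(demo_X), batch_size):
        idx = order[i:i + batch_size]
        y_pred = demo_model(demo_X[idx])
        loss = demo_loss_fn(y_pred, demo_y[idx])
        demo_optimizer.zero_grad()
        loss.backward()
        demo_optimizer.step()
        running += loss.item() * len(idx)  # weight each batch by its size
    epoch_losses.append(running / len(demo_X))
    print(f"Finished epoch {epoch}, mean loss {epoch_losses[-1]:.4f}")
```

Weighting each batch loss by its size keeps the smaller final batch from counting as much as a full one.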

### Make Predictions

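We're reusing the training data as test data, so the accuracy below shows how well the model fits data it has already seen, not how well it generalizes to new patients. Ideally, the predictions would be exactly the same as the real-life output, but the model isn't perfect, of course.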
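For a more honest estimate, part of the data could be held out before training and used only for evaluation. The following is a minimal sketch of such a split in plain Python; `train_test_indices`, the 20% test fraction and the fixed seed are all assumptions chosen for illustration, not something this notebook does.

```python
import random

def train_test_indices(n_samples, test_fraction=0.2, seed=0):
    """Return two disjoint, shuffled lists of row indices (train, test)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    n_test = int(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = train_test_indices(768)
print(len(train_idx), len(test_idx))  # 615 153
```

The index lists could then select rows, for example `X[train_idx]` for training and `X[test_idx]` for the accuracy check.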

In [10]:
# compute accuracy (no_grad is optional, but it skips gradient tracking during inference)
with torch.no_grad():
    y_pred = model(X)
 
accuracy = (y_pred.round() == y).float().mean()
print(f"Accuracy {accuracy}")

Accuracy 0.8307291865348816


### Extra Fun

I experimented with running on CUDA, increasing the number of epochs, and changing the number of neurons.