<a href="https://colab.research.google.com/github/MJMortensonWarwick/AI-DL/blob/main/1_1_neural_network_from_scratch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Neural Network from Scratch (with TensorFlow)
This first notebook takes us through building our first neural network from the ground up (as much as possible). If you haven't already, be sure to switch to GPU processing by clicking Runtime > Change runtime type and selecting GPU. We can test that this has worked with the following code:

In [None]:
import tensorflow as tf

print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

Num GPUs Available:  1


Hopefully your code shows you have 1 GPU available! Next let's get some data. We'll start with an old favourite:

In [None]:
# the code from the last module
from sklearn.datasets import load_diabetes

import pandas as pd
import numpy as np

# import the data
data = load_diabetes()
data

{'DESCR': '.. _diabetes_dataset:\n\nDiabetes dataset\n----------------\n\nTen baseline variables, age, sex, body mass index, average blood\npressure, and six blood serum measurements were obtained for each of n =\n442 diabetes patients, as well as the response of interest, a\nquantitative measure of disease progression one year after baseline.\n\n**Data Set Characteristics:**\n\n  :Number of Instances: 442\n\n  :Number of Attributes: First 10 columns are numeric predictive values\n\n  :Target: Column 11 is a quantitative measure of disease progression one year after baseline\n\n  :Attribute Information:\n      - age     age in years\n      - sex\n      - bmi     body mass index\n      - bp      average blood pressure\n      - s1      tc, total serum cholesterol\n      - s2      ldl, low-density lipoproteins\n      - s3      hdl, high-density lipoproteins\n      - s4      tch, total cholesterol / HDL\n      - s5      ltg, possibly log of serum triglycerides level\n      - s6      glu, b

We are working on a regression problem, with "structured" data which has already been cleaned and standardised. We can skip the usual cleaning/engineering steps. However, we do need to get the data into TensorFlow:

In [None]:
tf_dataset = tf.data.Dataset.from_tensor_slices((data.data, data.target))

Now that our data is stored in tensors we can do train/test splitting as before. However, as we will care about batch size, let's pick an easy number to work with:

In [None]:
data_size = len(data.data)
print(data_size)

442


Given 442 records we can take 400 as training (roughly 90% ... it is a small dataset) and keep 42 for test. In TF we use _take_ to subset the first $n$ values and _skip_ to ignore these and subset the rest:

In [None]:
train_dataset = tf_dataset.take(400)
test_dataset = tf_dataset.skip(400)
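
As a rough mental model (a plain-Python analogue, not how `tf.data` is implemented), _from_tensor_slices_ pairs each feature row with its label, _take_ keeps the first $n$ pairs, and _skip_ drops those and keeps the rest. The toy lists below are made up for illustration:

```python
from itertools import islice

# Toy stand-ins for data.data and data.target (hypothetical values)
toy_features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]
toy_targets = [10.0, 20.0, 30.0, 40.0, 50.0]

# Analogue of from_tensor_slices: one (features, target) pair per record
pairs = list(zip(toy_features, toy_targets))

n_train = 3
train = list(islice(pairs, n_train))        # analogue of .take(3)
test = list(islice(pairs, n_train, None))   # analogue of .skip(3)

print(len(train), len(test))
print(test[0])
```

Because the split is positional, nothing is shuffled: in the notebook the test set is simply the last 42 records.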

Now we can set up our batches for training. As we have a nice round 400 let's go with batches of 50 (8 batches in total). We'll also separate the features and labels:

In [None]:
train_batch = train_dataset.batch(50)
features, labels = next(iter(train_batch))

print(features)
print(labels)

tf.Tensor(
[[ 3.80759064e-02  5.06801187e-02  6.16962065e-02  2.18723550e-02
  -4.42234984e-02 -3.48207628e-02 -4.34008457e-02 -2.59226200e-03
   1.99084209e-02 -1.76461252e-02]
 [-1.88201653e-03 -4.46416365e-02 -5.14740612e-02 -2.63278347e-02
  -8.44872411e-03 -1.91633397e-02  7.44115641e-02 -3.94933829e-02
  -6.83297436e-02 -9.22040496e-02]
 [ 8.52989063e-02  5.06801187e-02  4.44512133e-02 -5.67061055e-03
  -4.55994513e-02 -3.41944659e-02 -3.23559322e-02 -2.59226200e-03
   2.86377052e-03 -2.59303390e-02]
 [-8.90629394e-02 -4.46416365e-02 -1.15950145e-02 -3.66564468e-02
   1.21905688e-02  2.49905934e-02 -3.60375700e-02  3.43088589e-02
   2.26920226e-02 -9.36191133e-03]
 [ 5.38306037e-03 -4.46416365e-02 -3.63846922e-02  2.18723550e-02
   3.93485161e-03  1.55961395e-02  8.14208361e-03 -2.59226200e-03
  -3.19914449e-02 -4.66408736e-02]
 [-9.26954778e-02 -4.46416365e-02 -4.06959405e-02 -1.94420933e-02
  -6.89906499e-02 -7.92878444e-02  4.12768238e-02 -7.63945038e-02
  -4.11803852e-02 -9.6
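
The batch arithmetic can be checked with a small plain-Python sketch. The helper `batch_sizes` is hypothetical (it is not part of TensorFlow); it mirrors the default behaviour of _batch_, which keeps a smaller final batch rather than dropping it:

```python
import math

def batch_sizes(n_records, batch_size):
    """Sizes of the batches produced when n_records are chunked in order."""
    n_batches = math.ceil(n_records / batch_size)
    return [min(batch_size, n_records - i * batch_size) for i in range(n_batches)]

print(batch_sizes(400, 50))   # the 400 training records
print(batch_sizes(442, 50))   # all 442 records, for comparison
```

With 400 records every batch is full, which is why 400 is the easier number to work with; batching all 442 would leave a ragged final batch of 42.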

Now it's time to build our model. We'll keep it simple ... a model with an input layer of 10 features and then 2x _Dense_ (fully connected) layers each with 20 neurons and ReLU activation. Our output layer will be size=1 given this is a regression problem and we want a single value output per prediction.

In [None]:
model = tf.keras.Sequential([
  tf.keras.layers.Dense(20, activation=tf.nn.relu, input_shape=(10, )),  # 10 features
  tf.keras.layers.Dense(20, activation=tf.nn.relu),
  tf.keras.layers.Dense(1)
])
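
To make the shapes concrete, here is a minimal NumPy sketch of the same 10 -> 20 -> 20 -> 1 architecture. The names `relu` and `forward` and the random initialisation are assumptions for illustration (Keras uses its own default initialiser), so the numbers will not match the Keras model; only the shapes and the parameter count carry over:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    """Element-wise ReLU: negative values become zero."""
    return np.maximum(z, 0.0)

# Layer sizes: 10 input features, two hidden layers of 20, one output
sizes = [10, 20, 20, 1]

# Illustrative random weights and zero biases (not Keras's default scheme)
weights = [rng.normal(scale=0.1, size=(n_in, n_out))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

def forward(x):
    """Dense layers with ReLU on the hidden layers and a linear output."""
    h = x
    for i, (W, b) in enumerate(zip(weights, biases)):
        h = h @ W + b
        if i < len(weights) - 1:  # no activation on the output layer
            h = relu(h)
    return h

x = rng.normal(size=(50, 10))     # one batch of 50 records, 10 features each
out = forward(x)
n_params = sum(W.size + b.size for W, b in zip(weights, biases))

print(out.shape)
print(n_params)
```

The parameter count works out as (10×20 + 20) + (20×20 + 20) + (20×1 + 1) = 661 trainable values.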

With our model built we can define a loss function ... MSE in this case. We can also make a prediction on a single batch:

In [None]:
loss_object = tf.keras.losses.MeanSquaredError()

def loss(model, x, y):
  y_ = model(x)
  return loss_object(y_true=y, y_pred=y_)

l = loss(model, features, labels)
print("Loss test: {}".format(l))

Loss test: 25657.927734375


Not a great MSE ... but the model hasn't been trained yet 😸
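
A likely reason the starting loss is so large: the targets are in the tens to hundreds, while an untrained network fed small standardised inputs tends to output values near zero, so the MSE is close to the mean of the squared targets. The plain-Python sketch below shows the calculation; the `mse` helper and the example values are illustrative assumptions, not taken from the batch above:

```python
def mse(y_true, y_pred):
    """Mean squared error over paired lists of numbers."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Illustrative targets on a similar scale to this dataset, with all-zero predictions
y_true = [151.0, 75.0, 141.0]
y_pred = [0.0, 0.0, 0.0]
print(mse(y_true, y_pred))

# A small hand-checkable case: ((3-1)^2 + (5-2)^2) / 2
print(mse([3.0, 5.0], [1.0, 2.0]))
```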

Let's get on with the training:

In [None]:
def grad(model, inputs, targets):
  with tf.GradientTape() as tape:
    loss_value = loss(model, inputs, targets)
  return loss_value, tape.gradient(loss_value, model.trainable_variables)

optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)

loss_value, grads = grad(model, features, labels)

print("Step: {}, Initial Loss: {}".format(optimizer.iterations.numpy(),
                                          loss_value.numpy()))

optimizer.apply_gradients(zip(grads, model.trainable_variables))

print("Step: {}, Loss: {}".format(optimizer.iterations.numpy(),
                                  loss(model, features, labels).numpy()))

Step: 0, Initial Loss: 25657.927734375
Step: 1, Loss: 25635.466796875


Here we have defined another function to manage the backpropagation. We use _GradientTape_ to record the operations in the forward pass, so that the gradient of the loss with respect to each trainable variable can be computed. We use Adam (with a learning rate of 0.01) as our optimiser.

Lastly we run a single gradient update to our model. A modest improvement but we can do better.
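
One way to see why a single update is so modest: with Adam's bias correction, the very first step moves each weight by roughly the learning rate (0.01 here), almost regardless of how large the gradient is, as long as the gradient is not vanishingly small. The sketch below is the textbook Adam rule for a single scalar in plain Python; it is an illustration rather than Keras's exact implementation, and `adam_step` and the gradient values are made up for the example:

```python
import math

def adam_step(param, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-7):
    """One textbook Adam update for a single scalar parameter."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)              # bias-corrected second moment
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

w = 0.5
deltas = []
for g in (-300.0, -0.003):                    # a huge and a tiny gradient
    new_w, _, _ = adam_step(w, g, m=0.0, v=0.0, t=1)
    deltas.append(new_w - w)
    print(f"gradient {g}: weight change {new_w - w:.6f}")
```

Both gradients produce a first step of about 0.01, so one update can only nudge the weights; the large gains come from repeating the update many times.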

Next we will train our model. We'll run for 600 epochs (it's a small dataset), so with 8 batches per epoch that is 4,800 iterations. To keep an eye on things and track progress we will print the loss to screen every 75 epochs:

In [None]:
# Keep results for plotting
train_loss_results = []

num_epochs = 600

for epoch in range(num_epochs):
  epoch_loss_avg = tf.keras.metrics.Mean()

  # Training loop - using batches of 50
  for x, y in train_batch:
    # Optimize the model
    loss_value, grads = grad(model, x, y)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

    # Track progress
    epoch_loss_avg.update_state(loss_value)  # Add current batch loss

  # End epoch: record the mean loss once per epoch (not once per batch)
  train_loss_results.append(epoch_loss_avg.result())

  if epoch % 75 == 0:
    print(f"Epoch {epoch}: Loss: {epoch_loss_avg.result()}")

# Print the final epoch loss
print(f"Epoch {epoch}: Loss: {epoch_loss_avg.result()}")

Epoch 0: Loss: 29102.8515625
Epoch 75: Loss: 2999.344970703125
Epoch 150: Loss: 2911.3115234375
Epoch 225: Loss: 2818.41455078125
Epoch 300: Loss: 2765.358642578125
Epoch 375: Loss: 2713.15185546875
Epoch 450: Loss: 2665.478759765625
Epoch 525: Loss: 2582.97607421875
Epoch 599: Loss: 2513.35302734375
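
The per-epoch figure printed above is a running mean of the batch losses seen during that epoch. A minimal plain-Python stand-in for that kind of streaming metric might look like this (the `RunningMean` class and the batch losses are hypothetical, chosen only to show the mechanics):

```python
class RunningMean:
    """Minimal stand-in for a streaming mean metric (hypothetical helper)."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update_state(self, value):
        """Add one observation to the running total."""
        self.total += float(value)
        self.count += 1

    def result(self):
        """Mean of everything seen so far (0.0 if nothing has been added)."""
        return self.total / self.count if self.count else 0.0

epoch_loss_avg = RunningMean()
for batch_loss in [3000.0, 2800.0, 3100.0, 2900.0]:  # made-up batch losses
    epoch_loss_avg.update_state(batch_loss)

print(epoch_loss_avg.result())
```

Since every training batch here holds 50 records, each batch contributes equally to the epoch average; with unequal batch sizes a small final batch would be over-weighted.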


Great - a working model. We can see significant improvement early on: from an MSE of roughly 29,100 on epoch zero to roughly 3,000 by epoch 75. After that, as we would expect, progress is slower but we keep improving. Of course we could run for many more epochs but this is just for fun. Let's see how it does on the test data:

In [None]:
test_batch = test_dataset.batch(10)

for (x, y) in test_batch:
  y_pred = model(x)
  for pred, real in zip(y_pred, y):
    print(f"Predicted: {pred[0]};    Real: {real}")

Predicted: 151.572998046875;    Real: 175.0
Predicted: 77.31839752197266;    Real: 93.0
Predicted: 173.27821350097656;    Real: 168.0
Predicted: 252.98672485351562;    Real: 275.0
Predicted: 169.44113159179688;    Real: 293.0
Predicted: 302.7404479980469;    Real: 281.0
Predicted: 108.14032745361328;    Real: 72.0
Predicted: 169.71478271484375;    Real: 140.0
Predicted: 201.80247497558594;    Real: 189.0
Predicted: 170.5944366455078;    Real: 181.0
Predicted: 147.5943145751953;    Real: 209.0
Predicted: 106.42693328857422;    Real: 136.0
Predicted: 255.49354553222656;    Real: 261.0
Predicted: 113.81620025634766;    Real: 113.0
Predicted: 177.01800537109375;    Real: 131.0
Predicted: 163.92845153808594;    Real: 174.0
Predicted: 220.10789489746094;    Real: 257.0
Predicted: 144.87799072265625;    Real: 55.0
Predicted: 109.49939727783203;    Real: 84.0
Predicted: 88.90261840820312;    Real: 42.0
Predicted: 114.27027130126953;    Real: 146.0
Predicted: 211.2976531982422;    Real: 212.0
P
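
To put a number on how close these are, we could compute the mean absolute error over a few rows. The sketch below uses only the first five pairs from the output above (rounded to two decimals), so it is a spot check rather than a full evaluation:

```python
# First five (predicted, real) pairs copied from the output above, rounded
pairs = [
    (151.57, 175.0),
    (77.32, 93.0),
    (173.28, 168.0),
    (252.99, 275.0),
    (169.44, 293.0),
]

abs_errors = [abs(pred - real) for pred, real in pairs]
mae = sum(abs_errors) / len(abs_errors)

# The single row the model gets most wrong
worst_pred, worst_real = max(pairs, key=lambda pr: abs(pr[0] - pr[1]))

print(f"MAE over these rows: {mae:.2f}")
print(f"Largest miss: predicted {worst_pred}, real {worst_real}")
```

A single large miss (the row with a real value of 293) accounts for most of the error in this small sample.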

Mostly these seem to be pretty good predictions ... well done you. To end the tutorial let's run this again but using some of the simplified Keras tools:

In [None]:
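# Keras implementation
# Note: this is the same (already trained) model, so fit() continues from the current weights
model.compile(optimizer='adam',
              loss=tf.keras.losses.MeanSquaredError(),
              metrics=['mse'])

# Train on the batched training set rather than only the first batch of 50
model.fit(train_batch, epochs=300)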

Epoch 1/300
Epoch 2/300
...

<keras.callbacks.History at 0x7fa5f3b8a410>

Testing ...

In [None]:
test_batch = test_dataset.batch(42)  # the whole test set in one batch
test_features, test_labels = next(iter(test_batch))

test_loss, test_mse = model.evaluate(test_features, test_labels, verbose=2)
print('\nTest MSE:', test_mse)

2/2 - 0s - loss: 1653.4844 - mse: 1653.4844 - 159ms/epoch - 80ms/step

Test MSE: 1653.484375
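
MSE is in squared units, so it can be easier to read as a root mean squared error (RMSE), which is on the same scale as the target. A quick conversion of the figure reported above:

```python
import math

test_mse = 1653.484375            # value reported in the evaluation output
test_rmse = math.sqrt(test_mse)   # same units as the target

print(f"Test RMSE: {test_rmse:.2f}")
```

That is a typical error of a little over 40 on targets that range from the tens to the hundreds.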


Predictions ...

In [None]:
y_pred = model.predict(test_features)
for pred, real in zip(y_pred, test_labels):
    print(f"Predicted: {pred[0]};    Real: {real}")

_, mse = model.evaluate(test_features, test_labels, verbose=0)  # second value is the MSE metric, not RMSE

Predicted: 165.4960479736328;    Real: 175.0
Predicted: 81.77456665039062;    Real: 93.0
Predicted: 144.0576171875;    Real: 168.0
Predicted: 267.83673095703125;    Real: 275.0
Predicted: 182.22000122070312;    Real: 293.0
Predicted: 313.6932067871094;    Real: 281.0
Predicted: 108.27365112304688;    Real: 72.0
Predicted: 149.18771362304688;    Real: 140.0
Predicted: 200.7175750732422;    Real: 189.0
Predicted: 169.13662719726562;    Real: 181.0
Predicted: 154.9176025390625;    Real: 209.0
Predicted: 110.21495056152344;    Real: 136.0
Predicted: 239.6842041015625;    Real: 261.0
Predicted: 112.23800659179688;    Real: 113.0
Predicted: 163.11038208007812;    Real: 131.0
Predicted: 178.48025512695312;    Real: 174.0
Predicted: 209.123291015625;    Real: 257.0
Predicted: 148.43150329589844;    Real: 55.0
Predicted: 109.24411010742188;    Real: 84.0
Predicted: 77.59857940673828;    Real: 42.0
Predicted: 75.47003936767578;    Real: 146.0
Predicted: 192.97250366210938;    Real: 212.0
Predict

Overall very comparable results. One neural network down ... well done 👊