# Deep Learning with Python (2nd ed.)
[Website](https://www.manning.com/books/deep-learning-with-python-second-edition)

# 2.5 Looking Back at Our First Example

## 5.1 Reimplementing our first example from scratch in TensorFlow

### A Simple Dense Class

You learned earlier that the `Dense` layer implements the following input transformation, where `W` and `b` are model parameters, and `activation` is an element-wise function (usually `relu`, but it would be `softmax` for the last layer).

`output = activation(dot(input, W) + b)`

Let’s implement a simple Python class, `NaiveDense`, that creates two TensorFlow variables, `W` and `b`, and exposes a `__call__()` method that applies the preceding transformation.

In [5]:
import tensorflow as tf
  
class NaiveDense:
    def __init__(self, input_size, output_size, activation):
        self.activation = activation
         
        # Create a matrix, W, of shape (input_size, output_size)
        w_shape = (input_size, output_size)
        # initialize with random values
        w_initial_value = tf.random.uniform(w_shape, minval=0, maxval=1e-1)
        self.W = tf.Variable(w_initial_value)
        # Create a vector, b, of shape (output_size,)
        b_shape = (output_size,)
        # initialize with zeros
        b_initial_value = tf.zeros(b_shape)
        self.b = tf.Variable(b_initial_value)
        
    # Apply the forward pass
    def __call__(self, inputs):
        return self.activation(tf.matmul(inputs, self.W) + self.b)
    
    # Convenience method for retrieving the layer’s weights
    @property
    def weights(self):
        return [self.W, self.b]
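
To make the shapes concrete, here is a minimal NumPy sketch of the same transformation. It is a stand-in for illustration, not the `NaiveDense` class itself: the `relu` helper, the toy sizes, and the random seed are all assumptions chosen for this example.

```python
import numpy as np

def relu(x):
    # Element-wise max(x, 0); hypothetical helper standing in for tf.nn.relu
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
input_size, output_size, batch = 4, 3, 2  # toy sizes, assumed for illustration

# Same shapes as in NaiveDense: W is (input_size, output_size), b is (output_size,)
W = rng.uniform(0.0, 0.1, size=(input_size, output_size))
b = np.zeros(output_size)

inputs = rng.uniform(size=(batch, input_size))  # one row per sample
outputs = relu(inputs @ W + b)                  # b broadcasts across the batch axis

print(outputs.shape)  # (2, 3)
```

Each row of `inputs` is mapped to a row of `outputs`, which is why the matrix product is taken as `inputs @ W` with `W` of shape `(input_size, output_size)`.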

### A Simple Sequential Class

Now, let’s create a `NaiveSequential` class to chain these layers. It wraps a list of layers and exposes a `__call__()` method that simply calls the underlying layers on the inputs, in order. It also features a `weights` property to easily keep track of the layers’ parameters.

In [6]:
class NaiveSequential:
    def __init__(self, layers):
        self.layers = layers
  
    def __call__(self, inputs):
        x = inputs
        for layer in self.layers:
           x = layer(x)
        return x
  
    @property 
    def weights(self):
       weights = []
       for layer in self.layers:
           weights += layer.weights
       return weights

Using this `NaiveDense` class and this `NaiveSequential` class, we can create a mock Keras model.

In [7]:
model = NaiveSequential([
    NaiveDense(input_size=28 * 28, output_size=512, activation=tf.nn.relu),
    NaiveDense(input_size=512, output_size=10, activation=tf.nn.softmax)
]) 
assert len(model.weights) == 4
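
The four weight tensors hold a fair number of scalar parameters. As a rough sanity check, the count can be worked out from the layer sizes alone; `dense_param_count` below is a hypothetical helper written for this sketch, not part of the model code.

```python
def dense_param_count(input_size, output_size):
    # W contributes input_size * output_size entries, b contributes output_size
    return input_size * output_size + output_size

# (input_size, output_size) for the two layers of the mock model
layer_sizes = [(28 * 28, 512), (512, 10)]
counts = [dense_param_count(i, o) for i, o in layer_sizes]
total = sum(counts)

print(counts, total)  # [401920, 5130] 407050
```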

### A Batch Generator

Next, we need a way to iterate over the MNIST data in mini-batches. This is easy.

In [8]:
import math
  
class BatchGenerator:
    def __init__(self, images, labels, batch_size=128):
        assert len(images) == len(labels)
        self.index = 0
        self.images = images
        self.labels = labels
        self.batch_size = batch_size
        self.num_batches = math.ceil(len(images) / batch_size)
 
    def next(self):
        images = self.images[self.index : self.index + self.batch_size]
        labels = self.labels[self.index : self.index + self.batch_size]
        self.index += self.batch_size
        return images, labels
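
The slicing logic is easiest to see on a tiny dataset. The sketch below assumes 10 toy samples and a batch size of 4 (both invented for illustration) and uses a plain loop rather than the `BatchGenerator` class.

```python
import math
import numpy as np

images = np.arange(10).reshape(10, 1)  # 10 toy "images" with one feature each
labels = np.arange(10)
batch_size = 4

num_batches = math.ceil(len(images) / batch_size)
batch_lengths = []
for i in range(num_batches):
    start = i * batch_size
    batch_images = images[start : start + batch_size]
    batch_labels = labels[start : start + batch_size]
    batch_lengths.append(len(batch_images))

print(num_batches, batch_lengths)  # 3 [4, 4, 2]
```

Because `math.ceil` rounds up, the last batch is simply shorter than the others. Note that slicing a NumPy array past its end returns an empty array rather than raising, so calling `next()` more than `num_batches` times would silently yield empty batches.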

## 5.2 Running one training step

The most difficult part of the process is the “training step”: updating the weights of the model after running it on one batch of data. We need to

1. Compute the predictions of the model for the images in the batch.
2. Compute the loss value for these predictions, given the actual labels.
3. Compute the gradient of the loss with regard to the model’s weights.
4. Move the weights by a small amount in the direction opposite to the gradient.
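
The four steps above can be sketched on a one-parameter model where the gradient is easy to derive by hand. Everything here is an assumption made for illustration: the model `w * x`, the mean squared error loss, the toy data, and the learning rate of 0.01.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x            # targets generated by the "true" weight 2.0
w = 0.0                # initial weight
learning_rate = 0.01

def loss_fn(w):
    # Mean squared error of the model w * x on the toy data
    return np.mean((w * x - y) ** 2)

loss_before = loss_fn(w)

predictions = w * x                              # step 1: predictions
loss = np.mean((predictions - y) ** 2)           # step 2: loss value
gradient = np.mean(2.0 * (predictions - y) * x)  # step 3: d(loss)/dw, derived by hand
w = w - learning_rate * gradient                 # step 4: move against the gradient

loss_after = loss_fn(w)
print(gradient, w, loss_before, loss_after)
```

The gradient is negative here (-30.0), so moving in the opposite direction increases `w` toward 2.0 and the loss drops.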

To compute the gradient, we will use the TensorFlow `GradientTape` object we introduced in section 2.4.4.

In [9]:
def one_training_step(model, images_batch, labels_batch):
    with tf.GradientTape() as tape:
        predictions = model(images_batch)
        per_sample_losses = tf.keras.losses.sparse_categorical_crossentropy(
            labels_batch, predictions)
        average_loss = tf.reduce_mean(per_sample_losses)
    gradients = tape.gradient(average_loss, model.weights)
    update_weights(gradients, model.weights)
    return average_loss
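
To see what the loss call computes, here is a hand-rolled NumPy sketch of sparse categorical crossentropy: for each sample, take the negative log of the probability assigned to the true class. The toy predictions and labels are invented, and the sketch ignores the numerical edge cases (such as probabilities of exactly zero) that a library implementation has to handle.

```python
import numpy as np

# Each row is a probability distribution over 3 classes (toy values)
predictions = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
])
labels = np.array([0, 1, 0])  # integer class indices, hence "sparse"

# Pick the probability assigned to the true class of each sample
true_class_probs = predictions[np.arange(len(labels)), labels]
per_sample_losses = -np.log(true_class_probs)
average_loss = per_sample_losses.mean()

print(true_class_probs)   # the values 0.7, 0.8, 0.3
print(per_sample_losses)  # largest for the third sample, where the true class got only 0.3
print(average_loss)
```

For reference, a model that spreads its probability uniformly over 10 classes would score `-log(1/10)`, about 2.30, on every sample.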

As you already know, the purpose of the “weight update” step (represented by the `update_weights` call in the preceding code) is to move the weights by “a bit” in a direction that will reduce the loss on this batch.

The magnitude of the move is determined by the “learning rate,” typically a small quantity.

The simplest way to implement this `update_weights` function is to subtract `gradient * learning_rate` from each weight:

In [10]:
learning_rate = 1e-3 
  
def update_weights(gradients, weights):
    for g, w in zip(gradients, weights):
        w.assign_sub(g * learning_rate)
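
One detail worth noticing is that the update happens in place: the function returns nothing, yet the model sees the new values because it holds references to the same variables. The NumPy sketch below mimics this with `-=` standing in for `assign_sub`; the arrays and gradient values are made up for illustration.

```python
import numpy as np

learning_rate = 1e-3

def update_weights(gradients, weights):
    for g, w in zip(gradients, weights):
        # In-place subtraction on the array object, analogous to assign_sub
        w -= g * learning_rate

W = np.ones((2, 2))
b = np.zeros(2)
weights = [W, b]  # the list holds references, not copies
gradients = [np.full((2, 2), 10.0), np.array([1.0, -1.0])]

update_weights(gradients, weights)

print(W)  # every entry is now 1 - 10 * 1e-3 = 0.99
print(b)  # entries are now -0.001 and 0.001
```

Writing `w = w - g * learning_rate` inside the loop would instead rebind the local name and leave `W` and `b` unchanged.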

In practice, you would almost never implement a weight update step like this by hand. Instead, you would use an `Optimizer` instance from Keras, like this.

In [11]:
from tensorflow.keras import optimizers
  
optimizer = optimizers.SGD(learning_rate=1e-3)
  
def update_weights(gradients, weights):
    optimizer.apply_gradients(zip(gradients, weights))

Now that our per-batch training step is ready, we can move on to implementing an entire epoch of training.

## 5.3 The Full Training Loop

An epoch of training consists of repeating the training step for each batch in the training data, and the full training loop repeats that epoch a given number of times.

In [12]:
def fit(model, images, labels, epochs, batch_size=128):
    for epoch_counter in range(epochs):
        print(f"Epoch {epoch_counter}")
        batch_generator = BatchGenerator(images, labels, batch_size)
        for batch_counter in range(batch_generator.num_batches):
            images_batch, labels_batch = batch_generator.next()
            loss = one_training_step(model, images_batch, labels_batch)
            if batch_counter % 100 == 0:
                print(f"loss at batch {batch_counter}: {loss:.2f}")
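
It helps to know how many training steps this loop performs. The arithmetic below assumes the 60,000 MNIST training images, a batch size of 128, and 10 epochs; the variable names are chosen for this sketch.

```python
import math

num_samples, batch_size, epochs = 60000, 128, 10

batches_per_epoch = math.ceil(num_samples / batch_size)
last_batch_size = num_samples - (batches_per_epoch - 1) * batch_size
total_steps = batches_per_epoch * epochs

# Batch indices at which the loop prints the loss (every 100th batch)
logged_batches = [b for b in range(batches_per_epoch) if b % 100 == 0]

print(batches_per_epoch, last_batch_size, total_steps)  # 469 96 4690
print(logged_batches)  # [0, 100, 200, 300, 400]
```

So each epoch logs five loss values, and the final batch of every epoch contains 96 samples instead of 128.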

Let’s test drive it.

In [13]:
from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
  
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype("float32") / 255  
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype("float32") / 255 
  
fit(model, train_images, train_labels, epochs=10, batch_size=128)

Epoch 0
loss at batch 0: 4.41
loss at batch 100: 2.21
loss at batch 200: 2.19
loss at batch 300: 2.11
loss at batch 400: 2.23
Epoch 1
loss at batch 0: 1.91
loss at batch 100: 1.86
loss at batch 200: 1.82
loss at batch 300: 1.73
loss at batch 400: 1.85
Epoch 2
loss at batch 0: 1.60
loss at batch 100: 1.57
loss at batch 200: 1.50
loss at batch 300: 1.44
loss at batch 400: 1.53
Epoch 3
loss at batch 0: 1.34
loss at batch 100: 1.33
loss at batch 200: 1.24
loss at batch 300: 1.22
loss at batch 400: 1.30
Epoch 4
loss at batch 0: 1.14
loss at batch 100: 1.15
loss at batch 200: 1.05
loss at batch 300: 1.06
loss at batch 400: 1.13
Epoch 5
loss at batch 0: 1.00
loss at batch 100: 1.01
loss at batch 200: 0.91
loss at batch 300: 0.94
loss at batch 400: 1.01
Epoch 6
loss at batch 0: 0.89
loss at batch 100: 0.90
loss at batch 200: 0.81
loss at batch 300: 0.85
loss at batch 400: 0.92
Epoch 7
loss at batch 0: 0.80
loss at batch 100: 0.81
loss at batch 200: 0.73
loss at batch 300: 0.77
loss at batch 40

## 5.4 Evaluating the Model

We can evaluate the model by taking the argmax of its predictions over the test images, and comparing it to the expected labels.

In [16]:
import numpy as np

predictions = model(test_images)
predictions = predictions.numpy()
predicted_labels = np.argmax(predictions, axis=1)
matches = predicted_labels == test_labels
print(f"accuracy: {matches.mean():.2f}")

accuracy: 0.82
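
The same accuracy computation can be traced by hand on a few samples. The predictions and labels below are toy values invented for this sketch.

```python
import numpy as np

# Each row holds the predicted class probabilities for one sample
predictions = np.array([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.2, 0.3, 0.5],
    [0.4, 0.5, 0.1],
])
true_labels = np.array([1, 0, 2, 0])

predicted_labels = np.argmax(predictions, axis=1)  # index of the largest value per row
matches = predicted_labels == true_labels          # boolean array, one entry per sample
accuracy = matches.mean()                          # True counts as 1, False as 0

print(predicted_labels, accuracy)
```

Three of the four predictions match, so the accuracy is 0.75. Taking the mean of a boolean array works because NumPy treats `True` as 1 and `False` as 0.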


All done! As you can see, it’s quite a bit of work to do “by hand” what you can do in a few lines of Keras code. But because you’ve gone through these steps, you should now have a crystal clear understanding of what goes on inside a neural network when you call `fit()`. Having this low-level mental model of what your code is doing behind the scenes will make you better able to leverage the high-level features of the Keras API.