# Create an ANN from Scratch
In this project, we will build a neural network from scratch (without using any Keras functionality) by applying the mathematics of deep learning. Finally, we'll implement it in image classification task on MNIST dataset.

## The Dense Layer
The Dense layer implements the following input transformation: 

**output = activation(dot(input, W) + b)**

where W and b are model parameters - W is the connection matrix and b the bias added to each neuron, and activation is an element-wise function (usually relu, but it would be softmax for the last layer). Let’s implement a simple Python class, NaiveDense, that creates two TensorFlow variables, W and b, and exposes a call() method that applies the preceding transformation.

In [1]:
import tensorflow as tf

class NaiveDense:
    def __init__(self, input_size, output_size, activaton):
        self.activaton = activaton
        
        #Create a matrix, W
        w_shape = (input_size, output_size)
        w_initial_value = tf.random.uniform(w_shape, minval=0, maxval=1e-1)
        self.W = tf.Variable(w_initial_value)
        
        #Create a vector, b
        b_shape = (output_size, )
        b_initial_value = tf.zeros(b_shape)
        self.b = tf.Variable(b_initial_value)
        
    #forward pass
    def __call__(self, inputs):
        return self.activaton(tf.matmul(inputs, self.W) + self.b)
    
    @property
    def weights(self):
        return [self.W, self.b]

create a NaiveSequential class to chain these layers

In [2]:
class NaiveSequential:
    def __init__(self, layers):
        self.layers = layers
    
    def __call__(self, inputs):
        x = inputs
        for layer in self.layers:
            x = layer(x)
        return x
    
    @property
    def weights(self):
        weights = []
        for layer in self.layers:
            weights += layer.weights
        return weights

Now simulate a ANN model with activation functions:

In [3]:
model = NaiveSequential([
    NaiveDense(input_size=28*28, output_size=512, activaton=tf.nn.relu),
    NaiveDense(input_size=512, output_size=10, activaton=tf.nn.softmax)
])

assert len(model.weights) == 4

2023-01-04 16:48:54.766851: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


Create a batch generator

In [4]:
import math

class BatchGenerator:
    def __init__(self, images, labels, batch_size=128):
        assert len(images) == len(labels)
        self.index = 0
        self.images = images
        self.labels = labels
        self.batch_size = batch_size
        self.num_batches = math.ceil(len(images) / batch_size)
        
    def next(self):
        images = self.images[self.index : self.index + self.batch_size]
        labels = self.labels[self.index : self.index + self.batch_size]
        self.index += self.batch_size
        return images, labels

Let's create one single training step including:
1. Compute the predictions of the model for the images in the batch.
2. Compute the loss value for these predictions, given the actual labels.
3. Compute the gradient of the loss with regard to the model’s weights.
4. Move the weights by a small amount in the direction opposite to the gradient.

In [5]:
learning_rate = 1e-3

#move the weights by “a bit” in a direction that will reduce the loss on this batch
def update_weights(gradients, weights): 
    for g, w in zip(gradients, weights):
        w.assign_sub(g * learning_rate)
        
def one_training_step(model, images_batch, labels_batch):
    with tf.GradientTape() as tape:
        predictions = model(images_batch)
        per_sample_losses = tf.keras.losses.sparse_categorical_crossentropy(labels_batch, predictions)
        average_loss = tf.reduce_mean(per_sample_losses)
        
    gradients = tape.gradient(average_loss, model.weights)
    update_weights(gradients, model.weights)
    return average_loss

Define the full training loop:

In [6]:
def fit(model, images, labels, epochs, batch_size=128): 
    for epoch_counter in range(epochs):
        print(f"Epoch {epoch_counter}")
    batch_generator = BatchGenerator(images, labels)
    for batch_counter in range(batch_generator.num_batches):
        images_batch, labels_batch = batch_generator.next()
        loss = one_training_step(model, images_batch, labels_batch) 
        if batch_counter % 100 == 0:
            print(f"loss at batch {batch_counter}: {loss:.2f}")

## Test the Model:

In [7]:
from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype("float32") / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype("float32") / 255

fit(model, train_images, train_labels, epochs=10, batch_size=128)

Epoch 0
Epoch 1
Epoch 2
Epoch 3
Epoch 4
Epoch 5
Epoch 6
Epoch 7
Epoch 8
Epoch 9
loss at batch 0: 4.36
loss at batch 100: 2.28
loss at batch 200: 2.24
loss at batch 300: 2.11
loss at batch 400: 2.26


Evaluate the model:

In [8]:
import numpy as np 

predictions = model(test_images)
predictions = predictions.numpy() 
predicted_labels = np.argmax(predictions, axis=1) 
matches = predicted_labels == test_labels 
print(f"accuracy: {matches.mean():.2f}")

accuracy: 0.46


## Exkurs
If we utilize the Keras Layer subclass, we do not have to explicitly specify the input and output shapes of each layer. The layer itself will infer the shape. We would build a simple dense model like this:

In [1]:
from tensorflow import keras

In [None]:
class SimpleDense(keras.layers.Layer):
    
    def __init__(self, units, activation=None):
        super().__init__()
        self.units = units
        self.activation = activation
        
    #weight creation
    def build(self, input_shape):
        input_dim = input_shape[-1]
        self.W = self.add_weight(shape=(input_dim, self.units), initializer='random_normal')
        self.b = self.add_weight(shape=(self.units,), initializer='zeros')
        
    #forward pass
    def call(self, inputs):
        y = tf.matmul(inputs, self.W) + self.b
        if self.activation is not None:
            y = self.activation(y)
        return y