## Create NaiveDense Model

The `NaiveDense` class is a simple implementation of a dense (fully connected) neural network layer. It allows you to create a basic dense layer with specified input and output sizes, along with an activation function of your choice. This implementation includes the following functionalities:

- **Initialization:** The class constructor (`__init__`) takes `input_size`, `output_size`, and `activation` as parameters. It initializes the layer's weights and bias. The weights (`W`), of shape `(input_size, output_size)`, are drawn from a uniform distribution over [0, 0.1), and the bias (`b`), of shape `(output_size,)`, is initialized to zeros.

- **Forward Pass:** The `__call__` method performs the forward pass of the layer. It takes `inputs` as input and computes the output of the layer by applying the activation function to the matrix multiplication of the inputs and the weights, followed by the addition of the bias.

- **Weight Access:** The `weights` property returns the layer's weights and bias as the list `[W, b]`. The training step uses this list to compute gradients and apply updates; it is also handy for inspection or visualization.

This `NaiveDense` class offers a basic building block for constructing neural network architectures and is suitable for educational purposes or for implementing simple models. Keep in mind that this implementation lacks some features and optimizations found in production-level deep learning frameworks.

To use the `NaiveDense` class, you can create an instance with the desired input size, output size, and activation function. Then, you can call the instance on input data to get the layer's output.

In [1]:
import tensorflow as tf

class NaiveDense:
    def __init__(self, input_size, output_size, activation):
        self.activation = activation
        w_shape = (input_size, output_size)
        w_initial_value = tf.random.uniform(w_shape, minval=0, maxval=1e-1)
        self.W = tf.Variable(w_initial_value)
        b_shape = (output_size, )
        b_initial_value = tf.zeros(b_shape)
        self.b = tf.Variable(b_initial_value)

    def __call__(self, inputs):
        return self.activation(tf.matmul(inputs, self.W) + self.b)
    
    @property
    def weights(self):
        return [self.W, self.b]
    
    

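
The forward pass described above reduces to `activation(inputs @ W + b)`. As a minimal sketch, the snippet below reproduces that computation in NumPy rather than TensorFlow to make the shapes explicit. The sizes, the fixed seed, and the `relu` helper are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

batch_size, input_size, output_size = 4, 3, 2  # illustrative sizes

# Same initialization scheme as NaiveDense: small uniform weights, zero bias.
W = rng.uniform(0.0, 0.1, size=(input_size, output_size))
b = np.zeros(output_size)

inputs = rng.uniform(0.0, 1.0, size=(batch_size, input_size))


def relu(x):
    """Element-wise max(x, 0)."""
    return np.maximum(x, 0.0)


# (batch_size, input_size) @ (input_size, output_size) -> (batch_size, output_size).
# The bias b is broadcast across the batch dimension.
outputs = relu(inputs @ W + b)

print(outputs.shape)
```

Each row of `outputs` corresponds to one input sample, and each column to one unit of the layer.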


## Create NaiveSequential Model

The `NaiveSequential` class represents a simple sequential neural network model. It is designed to stack multiple layers sequentially to create a neural network architecture. This class offers the following features:

- **Initialization:** The constructor (`__init__`) takes a list of `layers` as input. These layers are added sequentially to the model.

- **Forward Pass:** The `__call__` method performs a forward pass through the model. It takes `inputs` as input and applies each layer in the sequence to the inputs, building the network's output step by step.

- **Weight Access:** The `weights` property provides a way to access the weights and biases of all layers in the model. It aggregates the weights of each layer and returns them as a list.

The `NaiveSequential` class simplifies the process of creating and using a basic neural network model by allowing you to define a sequential architecture and perform forward passes with ease. While this class lacks some advanced functionalities found in more sophisticated deep learning libraries, it serves as a useful starting point for educational purposes or simple model building.

In [2]:
class NaiveSequential:
    def __init__(self, layers):
        self.layers = layers

    def __call__(self, inputs):
        x = inputs
        for layer in self.layers:
            x = layer(x)
        return x

    @property
    def weights(self):
        weights = []
        for layer in self.layers:
            weights += layer.weights
        return weights
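
To see the chaining and weight aggregation in isolation, here is a small sketch that does not depend on TensorFlow. `ScaleLayer` is a hypothetical stand-in for a real layer (it is not part of the notebook); it only needs to be callable and expose a `weights` list, which is all `NaiveSequential` relies on.

```python
class ScaleLayer:
    """Hypothetical stand-in layer that multiplies its input by one weight."""

    def __init__(self, factor):
        self.factor = factor

    def __call__(self, x):
        return x * self.factor

    @property
    def weights(self):
        return [self.factor]


layers = [ScaleLayer(2.0), ScaleLayer(3.0)]

# Forward pass: feed the output of each layer into the next one.
x = 1.5
for layer in layers:
    x = layer(x)

# Weight access: concatenate the per-layer weight lists in layer order.
all_weights = []
for layer in layers:
    all_weights += layer.weights

print(x, all_weights)
```

The order of `all_weights` follows the order of the layers, which is what lets a training step pair each gradient with its weight.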

## Create NaiveSequential Model with Two NaiveDense Layers

The provided code snippet demonstrates the creation of a neural network model using the `NaiveSequential` class. This model is comprised of two `NaiveDense` layers, each with distinct input and output sizes, as well as activation functions. The layers are organized sequentially to establish the architecture of the neural network.

The model's construction can be summarized as follows:

1. **First Layer (`NaiveDense`):**
   - Input size: 28 x 28 (784 pixels)
   - Output size: 512
   - Activation function: ReLU (Rectified Linear Unit)

2. **Second Layer (`NaiveDense`):**
   - Input size: 512
   - Output size: 10
   - Activation function: Softmax

This architecture is designed to transform input data, perform intermediate computations, and generate outputs suitable for classification tasks. By using the `NaiveSequential` class, you can conveniently construct and configure neural network models with sequential layers. The `weights` property allows you to access the weights associated with the layers in the model for analysis and evaluation purposes.


In [3]:
model = NaiveSequential([
    NaiveDense(input_size=28 * 28, output_size=512, activation=tf.nn.relu),
    NaiveDense(input_size=512, output_size=10, activation=tf.nn.softmax)
])

assert len(model.weights) == 4
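
The assertion holds because each of the two layers contributes one weight matrix and one bias vector. As a quick arithmetic check, the sketch below counts the trainable scalars in this architecture from the layer sizes alone; the variable names are illustrative.

```python
# (input_size, output_size) for the two NaiveDense layers defined above.
layer_sizes = [(28 * 28, 512), (512, 10)]

total = 0
for input_size, output_size in layer_sizes:
    kernel_params = input_size * output_size  # entries in W
    bias_params = output_size                 # entries in b
    total += kernel_params + bias_params

print(total)
```

The first layer accounts for 401,920 of the 407,050 parameters, so almost all of the model's capacity sits in the 784-to-512 projection.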

## Create a Batch Generator

The `BatchGenerator` class splits a dataset into consecutive mini-batches so that the model can be trained on one batch at a time. Here's an overview of the `BatchGenerator` class:

- **Initialization (`__init__`):**
  The constructor takes `images` and `labels` as input, representing the dataset. It also includes an optional parameter, `batch_size`, to determine the size of each batch. The constructor asserts that the length of the input images matches the length of the labels.

- **Batch Retrieval (`next`):**
  The `next` method retrieves the next batch of images and labels from the dataset. It uses the `index` attribute to keep track of the current position in the dataset. The batch is extracted by slicing the dataset from `index` to `index + batch_size`, so the final batch is shorter when the dataset size is not a multiple of `batch_size`. After retrieval, the `index` is advanced by `batch_size` to point to the next batch.

- **Batch Generation (`__call__`):**
  The `__call__` method is provided for a more convenient way to generate batches. It simply calls the `next` method to retrieve and return the next batch.

Iterating in mini-batches lets the model update its weights after every `batch_size` samples instead of once per pass over the data. Note that this generator slices arrays that are already held in memory, so it does not by itself reduce memory use, and it does not shuffle the data between epochs. The `num_batches` attribute, computed as `ceil(len(images) / batch_size)`, tells the caller how many times to call `next` per epoch.

In [4]:
import math

class BatchGenerator:
    def __init__(self, images, labels, batch_size=128):
        assert len(images) == len(labels)
        self.index = 0
        self.images = images
        self.labels = labels
        self.batch_size = batch_size
        self.num_batches = math.ceil(len(images) / batch_size)

    def next(self):
        images = self.images[self.index : self.index + self.batch_size]
        labels = self.labels[self.index : self.index + self.batch_size]
        self.index += self.batch_size
        return images, labels

    def __call__(self):
        return self.next()
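
The slicing logic can be checked on a toy dataset. The sketch below repeats the same index arithmetic on a plain Python list; the dataset of 10 items and the batch size of 4 are illustrative assumptions chosen so that the last batch comes out short.

```python
import math

samples = list(range(10))  # toy dataset of 10 items
batch_size = 4

num_batches = math.ceil(len(samples) / batch_size)

batches = []
index = 0
for _ in range(num_batches):
    # Slicing past the end of a sequence is safe: it returns what is left.
    batches.append(samples[index : index + batch_size])
    index += batch_size

batch_lengths = [len(batch) for batch in batches]
print(num_batches, batch_lengths)
```

Using `math.ceil` rather than integer division is what keeps the two leftover samples from being dropped.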

## One Training Step with Gradient Descent
The provided code snippet illustrates the implementation of a single training step utilizing the gradient descent optimization technique. This process is a pivotal element in training neural network models to enhance their predictive capabilities. Here's a breakdown of the code:

### `one_training_step` Function
- **Forward Pass and Loss Calculation:**
  - The function takes the neural network `model`, `images_batch`, and `labels_batch` as inputs.
  - Within a `tf.GradientTape` context, it computes predictions by passing the `images_batch` through the model.
  - Per-sample losses are computed using the sparse categorical cross-entropy loss function.
  - The average loss is derived by calculating the mean of the per-sample losses.
  
- **Gradient Computation and Weight Update:**
  - Gradients of the average loss with respect to the model's weights are determined using the `tape.gradient` method.
  - The `update_weights` function is called to update the model's weights using the computed gradients and a predefined learning rate.
  - The learning rate (`learning_rate`) determines the magnitude of weight adjustments.
  
- **Return Value:**
  - The function returns the calculated average loss.

### `update_weights` Function and Learning Rate
- The `update_weights` function iterates through gradients and weights, applying weight updates based on the gradients and the learning rate.
- The learning rate (`learning_rate`) determines the step size for weight adjustments in the gradient descent process.

This iterative process seeks to minimize the loss and improve the model's accuracy and performance.
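
For intuition about the loss, sparse categorical cross-entropy for one sample is the negative log of the probability the model assigns to the true class. The sketch below computes it by hand on made-up probabilities, ignoring the numerical safeguards a framework implementation would add; the values are illustrative, not taken from the notebook.

```python
import math

# Toy predicted class probabilities for 2 samples over 3 classes (rows sum to 1).
predictions = [
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
]
labels = [0, 1]  # integer class indices, one per sample

# Per-sample loss: -log of the probability given to the true class.
per_sample_losses = [-math.log(p[y]) for p, y in zip(predictions, labels)]

# The training step minimizes the mean over the batch.
average_loss = sum(per_sample_losses) / len(per_sample_losses)

print(per_sample_losses, average_loss)
```

The first sample is classified confidently and correctly, so its loss is small; the second assigns only 0.1 to the true class and is penalized far more heavily.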

In [5]:
def one_training_step(model, images_batch, labels_batch):
    with tf.GradientTape() as tape:
        predictions = model(images_batch)
        per_sample_losses = tf.keras.losses.sparse_categorical_crossentropy(labels_batch, predictions)
        average_loss = tf.reduce_mean(per_sample_losses)
    gradients = tape.gradient(average_loss, model.weights)
    update_weights(gradients, model.weights)
    return average_loss

learning_rate = 1e-3

def update_weights(gradients, weights):
    # Plain gradient descent: move each weight against its gradient.
    for g, w in zip(gradients, weights):
        w.assign_sub(g * learning_rate)
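
The update rule `w -= learning_rate * gradient` is easier to follow on a one-dimensional problem. As a sketch under simplifying assumptions, the snippet below minimizes the toy loss `(w - 3)^2` with a hand-derived gradient; the loss function, the starting point, and the larger learning rate of 0.1 are all illustrative choices, not the notebook's.

```python
learning_rate = 0.1  # illustrative value, larger than the notebook's 1e-3

w = 0.0  # single scalar weight, arbitrary starting point
for step in range(50):
    gradient = 2.0 * (w - 3.0)     # derivative of the loss (w - 3)^2
    w -= learning_rate * gradient  # same rule as w.assign_sub(g * learning_rate)

print(w)
```

Each step shrinks the distance to the minimum at `w = 3` by a constant factor, which is why `w` ends up very close to 3 after 50 steps.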

## The Full Training Loop

The code snippet provides an overview of the complete training loop for training a neural network model. This loop encompasses multiple epochs and batch-wise training steps. The key components of the training loop are as follows:

### `fit` Function: Full Training Loop

- **Function Description:**
  The `fit` function performs the entire training process for a neural network model.

- **Input Arguments:**
  - `model`: The neural network model to be trained.
  - `images`: The training images dataset.
  - `labels`: The corresponding labels for the training images.
  - `epochs`: The number of training epochs.
  - `batch_size`: The size of each training batch (default is 128).

- **Training Loop:**
  The outer loop iterates over the specified number of epochs (`epochs`).
  - For each epoch, the current epoch counter is displayed.

- **Batch Generation and Training Steps:**
  - A `BatchGenerator` is initialized with the training images, labels, and the specified batch size.
  - The inner loop iterates over the batches within the current epoch.
  - For each batch, the `images_batch` and `labels_batch` are extracted from the `batch_generator`.
  - A training step is performed using the `one_training_step` function, updating the model's weights and calculating the loss.

- **Loss Logging:**
  - After each training step (batch), if the batch counter is a multiple of 100, the current loss is displayed for monitoring the training progress.

This loop facilitates the iterative improvement of the model's performance by adjusting its weights based on computed gradients and minimizing the loss.

In [6]:
def fit(model, images, labels, epochs, batch_size=128):
    for epoch_counter in range(epochs):
        print('Epoch %d' % epoch_counter)
        # Pass batch_size through so the argument to fit() is actually used.
        batch_generator = BatchGenerator(images, labels, batch_size=batch_size)
        for batch_counter in range(batch_generator.num_batches):
            images_batch, labels_batch = batch_generator.next()
            loss = one_training_step(model, images_batch, labels_batch)
            if batch_counter % 100 == 0:
                print('loss at batch %d: %.2f' % (batch_counter, loss))
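
As a quick check on how often the loss is printed, the sketch below works out the number of batches per epoch for the 60,000 MNIST training images used later in the notebook with the default batch size of 128. The variable names are illustrative.

```python
import math

num_samples = 60000  # MNIST training set size
batch_size = 128

num_batches = math.ceil(num_samples / batch_size)
last_batch_size = num_samples - (num_batches - 1) * batch_size

# Batch counters at which fit() prints the loss.
logged_batches = [i for i in range(num_batches) if i % 100 == 0]

print(num_batches, last_batch_size, logged_batches)
```

This gives 469 batches per epoch, the last one holding 96 samples, and five logged losses per epoch (at batches 0, 100, 200, 300, and 400).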

## Data Preparation and Training Initialization

The provided code snippet outlines the process of preparing data and initializing the training for a neural network model utilizing Keras. Here's an overview of the steps performed:

### Data Preparation

- **Data Loading:** The MNIST dataset is loaded using the `mnist.load_data()` function, resulting in training and testing sets of images and labels.

- **Data Reshaping:** The images are reshaped into a flattened format `(num_samples, 28 * 28)` to match the model's input requirements.

- **Data Normalization:** The pixel values of the images are normalized to a range between 0 and 1 by dividing by 255, which is a common practice in data preprocessing.

### Training Initialization

- **Model Training:** The `fit` function is used to initiate the training process for the neural network model. It takes the model, training images, training labels, number of epochs, and optionally the batch size as inputs.

By following these steps, the code sets the stage for training a neural network model on image classification tasks using the MNIST dataset. The preparation of data and training initialization are crucial components in the overall training process.
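
The reshaping and normalization steps can be tried without downloading MNIST. The sketch below applies the same two operations to a small synthetic array; the five random images and the fixed seed are assumptions made for illustration only.

```python
import numpy as np

# Synthetic stand-in for MNIST: 5 grayscale images of 28 x 28 uint8 pixels.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

flat = images.reshape((len(images), 28 * 28))  # flatten each image to 784 values
scaled = flat.astype("float32") / 255          # map [0, 255] to [0, 1]

print(scaled.shape, scaled.dtype)
```

Casting to `float32` before dividing matters: the raw pixels are unsigned 8-bit integers, while the model's weights are floating-point tensors.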

In [7]:
from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

train_images = train_images.reshape((60000, 28 * 28))

train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28 * 28))

test_images = test_images.astype('float32') / 255

fit(model, train_images, train_labels, epochs=10, batch_size=128)

Epoch 0
loss at batch 0: 4.04
loss at batch 100: 2.20
loss at batch 200: 2.19
loss at batch 300: 2.06
loss at batch 400: 2.19
Epoch 1
loss at batch 0: 1.88
loss at batch 100: 1.85
loss at batch 200: 1.82
loss at batch 300: 1.69
loss at batch 400: 1.81
Epoch 2
loss at batch 0: 1.56
loss at batch 100: 1.56
loss at batch 200: 1.50
loss at batch 300: 1.41
loss at batch 400: 1.50
Epoch 3
loss at batch 0: 1.31
loss at batch 100: 1.32
loss at batch 200: 1.23
loss at batch 300: 1.20
loss at batch 400: 1.27
Epoch 4
loss at batch 0: 1.11
loss at batch 100: 1.14
loss at batch 200: 1.04
loss at batch 300: 1.04
loss at batch 400: 1.11
Epoch 5
loss at batch 0: 0.97
loss at batch 100: 1.01
loss at batch 200: 0.90
loss at batch 300: 0.92
loss at batch 400: 0.99
Epoch 6
loss at batch 0: 0.86
loss at batch 100: 0.90
loss at batch 200: 0.79
loss at batch 300: 0.83
loss at batch 400: 0.90
Epoch 7
loss at batch 0: 0.78
loss at batch 100: 0.82
loss at batch 200: 0.71
loss at batch 300: 0.76
loss at batch 40

## Evaluating the Model

The provided code snippet focuses on evaluating the performance of a trained neural network model using a test dataset. The key steps involved in this evaluation process are as follows:

### Prediction and Evaluation

- **Prediction:** The trained `model` is called on `test_images`. The resulting tensor of class probabilities is converted to a NumPy array with `.numpy()` and stored in `predictions`.

- **Predicted Labels:** Taking `argmax` along axis 1 (the class axis) of `predictions` yields `predicted_labels`, the index of the most probable class for each input image.

- **Accuracy Calculation:** By comparing the `predicted_labels` with the actual `test_labels`, the code calculates the accuracy of the model's predictions. This is achieved by creating the `matches` array, which contains Boolean values indicating whether the predictions match the ground truth labels.

### Displaying Results

- **Accuracy Display:** The accuracy of the model's predictions is computed by calculating the mean of the `matches` array. This value reflects the proportion of correct predictions made by the model on the test dataset.

The code's objective is to quantitatively assess the model's effectiveness in classifying new, unseen data. The accuracy metric provides valuable insights into the model's performance and its ability to generalize well to new examples. This evaluation step is vital for understanding the real-world applicability and robustness of the trained model.

In [8]:
import numpy as np

predictions = model(test_images)
predictions = predictions.numpy()

predicted_labels = np.argmax(predictions, axis=1)

matches = predicted_labels == test_labels

print('accuracy: %.2f' % matches.mean())

accuracy: 0.81
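
The accuracy computation can be traced on a handful of made-up predictions. The sketch below uses four samples and three classes; the probabilities and labels are illustrative, not model output.

```python
import numpy as np

# Toy class probabilities: one row per sample, one column per class.
predictions = np.array([
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
    [0.5, 0.4, 0.1],
])
true_labels = np.array([1, 0, 2, 1])

predicted_labels = np.argmax(predictions, axis=1)  # most probable class per row
matches = predicted_labels == true_labels          # element-wise Boolean comparison
accuracy = matches.mean()                          # True counts as 1, False as 0

print(predicted_labels, accuracy)
```

Three of the four predictions match, so the mean of the Boolean array is 0.75.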
