# DeMystifying Machine Learning

## This Notebook

This notebook consists of 4 parts:


### Part 1 - NumPy DNN

This part walks through an example of constructing a DNN (an MLP) from scratch using NumPy.

It demonstrates how forward and back propagation work when training on a simple, controlled dataset.

### Part 2 - PyTorch DNN

This part repeats the process of building an MLP, this time using the PyTorch API.

Hopefully this demonstrates the advantage of working with frameworks that do the heavy lifting for us, such as generating the back-propagation automatically rather than us writing it from scratch.

If in doubt, the PyTorch docs are available here: https://pytorch.org/docs/stable/nn.html

### Part 3 - PyTorch Classifier

This part demonstrates constructing a classifier model to classify images from the MNIST handwritten-digit dataset.

We will also use an independent subset of the data to estimate the model accuracy after training.

### Part 4 - Projecting beyond the Training Window (Bonus)

This part of the notebook is a bonus if you've been able to finish all of the work above.

It sets out to answer the question: what happens when we use our sinusoid models to project beyond the training window?

## Marking

You will get marks for completing the different tasks within this notebook:

Any code that you are expected to complete will contain `## FINISH_ME ##`, indicating that the code isn't expected to run until you have completed it.

I would recommend tackling the notebook in order: Part 1 -> Part 2 -> Part 3 -> Part 4.


| Title | Parts | Number of marks |
| :--- | :--- | :--- |
| 1. Completing the NumPy DNN model | 2 | 2 |
| 2. Training the NumPy DNN model & verifying by prediction | 2 | 1 |
| 3. Construct a PyTorch DNN model | 1 | 1 |
| 4. Train the PyTorch DNN model & verify using evaluate | 2 | 1 |
| 5. Examine the MNIST dataset | 1 | 1 |
| 6. Evaluate pre-trained model accuracy | 1 | 1 |
| 7. Build PyTorch Classifier | 1 | 1 |
| 8. Train the PyTorch Classifier | 1 | 1 |
| 9. Estimate the PyTorch model Classifier Accuracy | 1 | 1 |
| **Bonus 1:** Projecting both DNN models beyond the training window | 2 | 1 |
| **Total** | | max **10** |

***
***
# Part 1 - NumPy DNN
This part of the notebook walks you through building a DNN from scratch using nothing but numpy
***
***

***
## Part 1 - Imports and Globals

First we're going to import the NumPy and pyplot modules so that we can manipulate and plot data.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

In [None]:
# Reproducibility in science is critical; in computing it's often just a convenience
_FIXED_SEED=12345
np.random.seed(_FIXED_SEED)

In [None]:
# We haven't made any attempt to optimize our simple model so let's give it plenty of 'time' to train
epochs = 100000
# We know that the model can potentially be unstable, so let's take small steps toward the 'minima'
learning_rate = 0.0001
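The comment above says the model can be unstable, so we take small steps. Here is a minimal sketch on the toy function f(w) = w**2 (a hypothetical example, not part of the exercise) showing what "too big a step" means: each gradient-descent update multiplies w by (1 - 2 * lr), so the iterates shrink towards the minimum only when that factor has a magnitude below 1. The function name `gradient_descent` and the two learning rates are illustrative choices.

```python
def gradient_descent(lr, steps=50, w0=1.0):
    """Minimise the toy function f(w) = w**2 with plain gradient descent."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w   # df/dw for f(w) = w**2
        w -= lr * grad   # same form of update used later in optimize(): w <- w - lr * grad
    return w

small_step = gradient_descent(lr=0.1)  # each step multiplies w by (1 - 2 * 0.1) = 0.8
big_step = gradient_descent(lr=1.1)    # each step multiplies w by (1 - 2 * 1.1) = -1.2
print(small_step)  # about 0.8**50, roughly 1.4e-05: converging towards the minimum at 0
print(big_step)    # about 1.2**50, roughly 9.1e+03: overshooting further on every step
```

A real network's loss surface is far more complicated than w**2, but the same overshooting is why a conservative learning rate is the safer default here.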

***
## Part 1 - Build our model using NumPy

You will need to complete this class, which builds a short DNN, or MLP: a network of fully connected layers.

Nodes in consecutive layers are connected by passing information from one layer to the next: multiplying the output of layer 1 by the layer 2 weight matrix 'connects' every node of layer 1 to every node of layer 2.

You will have to complete the constructor for this class as well as layer 2 within this model for the forward and backward propagation.
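To make the "connecting by multiplication" idea concrete, here is a small sketch with made-up sizes (not the notebook's `hidden_size`) showing that one `np.matmul` computes exactly the weighted sums you would get by looping over every (input node, output node) pair. The names `a_prev`, `W_demo` and `b_demo` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n_in, n_out = 4, 3, 2
a_prev = rng.standard_normal((batch, n_in))   # outputs of the previous layer, one row per sample
W_demo = rng.standard_normal((n_in, n_out))   # one weight per (input node, output node) pair
b_demo = np.zeros(n_out)                      # one bias per output node

# Vectorised: every output node receives a weighted sum over every input node
z_matmul = np.matmul(a_prev, W_demo) + b_demo  # shape (batch, n_out)

# The same weighted sums written out as explicit loops
z_loop = np.zeros((batch, n_out))
for s in range(batch):
    for j in range(n_out):
        z_loop[s, j] = sum(a_prev[s, i] * W_demo[i, j] for i in range(n_in)) + b_demo[j]

print(z_matmul.shape)                 # (4, 2)
print(np.allclose(z_matmul, z_loop))  # True
```

The matmul version is what the model uses: it is shorter and much faster than Python loops, but the arithmetic is the same.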

In [None]:
# Define the SimpleDNN_NP architecture
class SimpleDNN_NP:
    def __init__(self, hidden_size):
        # We need to have a constructor here as our model has some parameters which need to be initialized & stored

        # This example only works for input and output elements of dimension 1
        # Formally you can extend this to work with batches with elements larger than 1
        # But that is not the focus of this example
        input_size = 1
        output_size = 1

        # We want to build a simple 3 layer network
        # The first layer has input_size nodes, the second hidden_size nodes, and the third output_size nodes
        # Weights need to be constructed to connect the nodes in each layer
        # Weights should be initialized randomly, biases are initialized to zero

        # Initialize weights and biases
        self.W1 = np.random.randn(input_size, hidden_size) # Weights Layer 1
        self.b1 = np.zeros(hidden_size) # Biases Layer 1
        self.W2 = ## FINISH_ME ##
        self.b2 = ## FINISH_ME ##
        self.W3 = ## FINISH_ME ##
        self.b3 = ## FINISH_ME ##


        # These are used for tracking the model states during the Forward Pass
        self.z1 = self.z2 = self.z3 = None # pre-activation values
        self.a1 = self.a2 = self.a3 = None # activation values

        # These are used for tracking the model states during the Backward Pass
        self.gradient_W3 = self.gradient_W2 = self.gradient_W1 = None # gradients of weights
        self.gradient_b3 = self.gradient_b2 = self.gradient_b1 = None # gradients of biases

    def forward(self, x):
        # This method will be the 'evaluation' of the model

        # Forward pass through the network

        # We expect our data to be batched with shape (?, 1)
        # Layer 1 multiplies each input element by the weights:
        #     (?, 1) @ (1, hidden_dim) + b1 = (?, hidden_dim)
        # Layer 2 then gives:
        #     Activation((?, hidden_dim)) @ (hidden_dim, hidden_dim) + b2 = (?, hidden_dim)
        # Finally, layer 3 gives:
        #     Activation((?, hidden_dim)) @ (hidden_dim, 1) + b3 = (?, 1)

        # From our lecture we know that the forward pass is just a series of matrix multiplications and activations
        # In order to perform back-propagation we need to store the intermediate values of the forward pass
        # This means we need to store the pre-activation (z) and activation (a) values of each layer

        # Pass the input data x through the first layer of the network
        # Apply Weights and biases, Layer1
        self.z1 = np.matmul(x, self.W1) + self.b1 # pre-activation of layer1
        self.a1 = np.tanh(self.z1)  # activation function gives activated layer1 output

        # Pass the output from layer 1 through the second layer of the network
        # Apply Weights and biases, Layer2

        self.z2 = ## FINISH_ME ##
        self.a2 = ## FINISH_ME ##

        # Pass the output from layer 2 through the third layer of the network
        # Apply Weights and biases, Layer3
        self.z3 = np.matmul(self.a2, self.W3) + self.b3 # pre-activation of layer3
        # No activation function for Layer3

        # 'formally' some models need to 'project' their internal state to the output
        # For us this is done in Layer3
        y_pred = self.z3
        return y_pred

    def backward(self, x, loss_prime):
        # Backward pass (back-propagation): compute the gradient of the loss w.r.t. every weight and bias

        # The backward pass is the 'reverse' of the forward pass
        
        # Initial gradient for the whole graph is given as the derivative of the loss function
        # aka. loss_prime

        # Going from the loss back through layer 3
        # Gradient for W3 = dz3/dW3 * dL/dz3 = Layer2_output^T @ loss_prime = a2.T @ loss_prime
        self.gradient_W3 = np.matmul(self.a2.T, loss_prime) # No activation on layer 3, so dL/dz3 = loss_prime
        self.gradient_b3 = np.sum(loss_prime, axis=0)       # Bias gradient: db3 = dL/dz3 = loss_prime, summed over the batch

        # Step from Layer3 -> Layer2
        # Stepping back from Layer3 to Layer2 => undo the effect of W3 on the gradient, then the activation on the Layer2 output
        gradient_a2 = np.matmul(loss_prime, self.W3.T)  # 'Undo' the effect of W3 on the gradient
        gradient_z2 = gradient_a2 * (1 - self.a2 ** 2)  # 'Undo' the activation: d tanh(z2)/dz2 = 1 - tanh(z2)^2 = 1 - a2^2

        # Gradients for Layer2
        # Gradients for Layer2 = a1 * gradient_z2 = a1 * dL/dz3 * dz3/da2 * da2/dz2 = a1 * loss_prime * W3 * (1 - a2^2)

        self.gradient_W2 = ## FINISH_ME ##                   # Apply Gradient at layer2 to graph for full gradient
        self.gradient_b2 = ## FINISH_ME ##                   # Calculate bias gradient here db2 = dL/dz2 = loss_prime * W3 * (1 - a2^2)

        # Step from Layer2 -> Layer1
        # Stepping back from Layer2 to Layer1 => Undo the effect of W2 on the gradient and the activation on Layer1 output
        gradient_a1 = ## FINISH_ME ##                        # 'Undo' the effect of W2 on the gradient
        gradient_z1 = ## FINISH_ME ##                        # 'Undo' the effect of the Activation on Layer1 output

        # Gradients for Layer1
        # Gradients for Layer1 = x * gradient_z1 = x * dL/dz3 * dz3/da2 * da2/dz2 * dz2/da1 * da1/dz1 = x * loss_prime * W3 * (1 - a2^2) * W2 * (1 - a1^2)
        self.gradient_W1 = np.matmul(x.T, gradient_z1) # Apply Gradient at Layer1 onto the input data
        self.gradient_b1 = np.sum(gradient_z1, axis=0) # Calculate the bias gradient here

        # Reached the end of the graph
        # Store the gradients for the weights and biases

    def calculate_loss(self, y, y_pred):
        # Calculate the loss of the whole 'graph'

        # For simplicity
        diff = y_pred - y

        # Loss for whole graph is (y_pred-y)^2/2
        loss = diff**2 / 2.0

        # Initial gradient for whole graph
        # dL/dy = 2.0 * diff / 2.0
        loss_prime = diff # derivative of loss w.r.t. prediction

        return loss, loss_prime

    def optimize(self, learning_rate):
        # Perform the optimization step of the training loop, i.e. update weights
        # Very similar to plain Stochastic Gradient Descent (SGD), but using the whole dataset at every step (full-batch gradient descent)
        
        # Update weights and biases using gradients of each component and LR
        self.W1 -= learning_rate * self.gradient_W1
        self.b1 -= learning_rate * self.gradient_b1

        self.W2 -= ## FINISH_ME ##
        self.b2 -= ## FINISH_ME ##

        self.W3 -= learning_rate * self.gradient_W3
        self.b3 -= learning_rate * self.gradient_b3

        # For completeness, but shouldn't matter
        self.a1 = self.a2 = self.a3 = None
        self.z1 = self.z2 = self.z3 = None
        self.gradient_W3 = self.gradient_W2 = self.gradient_W1 = None
        self.gradient_b3 = self.gradient_b2 = self.gradient_b1 = None  
    
    def train(self, x, y, epochs, learning_rate):

        # History to store the evolution of the loss function vs epoch
        loss_history = []

        # Loop through x Epochs
        for epoch in range(epochs):

            ## Formally there should be some batching of data that is done here
            ## This example explicitly evaluates the whole dataset in each forward/backward pass
            ## For training on a simple sinusoid this is OK
            ## For training on 'real data' this approach will kill your performance
            
            # Take a forward step through our model
            y_pred = self.forward(x)

            loss, loss_prime = self.calculate_loss(y, y_pred)

            # Now take a backward step through our model
            self.backward(x, loss_prime)

            # Now update our weights based on the gradients at each point in the graph
            self.optimize(learning_rate)

            avg_loss = ## FINISH_ME ##
            # Some code to give output during training 
            if epoch%5000 == 0:
                print(f"epoch: {epoch}, loss: {avg_loss}")

            loss_history.append(avg_loss)

        return loss_history
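Before training, it's worth checking a hand-written backward pass against a numerical estimate. Here is a minimal sketch of a central-difference gradient check, (L(w + eps) - L(w - eps)) / (2 * eps) for each parameter element, applied to a hypothetical one-hidden-layer tanh network that is smaller than `SimpleDNN_NP`. The names `numerical_gradient`, `loss_fn`, `W_chk`, `b_chk` and `V_chk` are illustrative; to reuse the idea on your own model you would wrap its `forward` and the summed `calculate_loss` in a zero-argument function.

```python
import numpy as np

def numerical_gradient(loss_fn, param, eps=1e-6):
    """Central-difference estimate of d(loss)/d(param); perturbs param in place, then restores it."""
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        original = param[idx]
        param[idx] = original + eps
        loss_plus = loss_fn()
        param[idx] = original - eps
        loss_minus = loss_fn()
        param[idx] = original
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad

# Hypothetical one-hidden-layer tanh network: y = tanh(x @ W + b) @ V
rng = np.random.default_rng(0)
x_chk = rng.standard_normal((5, 1))
t_chk = np.sin(x_chk)
W_chk = rng.standard_normal((1, 3))
b_chk = np.zeros(3)
V_chk = rng.standard_normal((3, 1))

def loss_fn():
    a = np.tanh(x_chk @ W_chk + b_chk)
    return np.sum((a @ V_chk - t_chk) ** 2) / 2.0  # calculate_loss's per-sample loss, summed over the batch

# Analytic gradients via the chain rule, in the same style as backward()
a_chk = np.tanh(x_chk @ W_chk + b_chk)
dy = a_chk @ V_chk - t_chk               # dL/dy (loss_prime)
grad_V = a_chk.T @ dy                    # dL/dV
dz = (dy @ V_chk.T) * (1 - a_chk ** 2)   # tanh'(z) = 1 - tanh(z)^2 = 1 - a^2
grad_W = x_chk.T @ dz                    # dL/dW
grad_b = np.sum(dz, axis=0)              # dL/db

err_W = np.max(np.abs(grad_W - numerical_gradient(loss_fn, W_chk)))
err_b = np.max(np.abs(grad_b - numerical_gradient(loss_fn, b_chk)))
err_V = np.max(np.abs(grad_V - numerical_gradient(loss_fn, V_chk)))
print(err_W, err_b, err_V)               # all should be very small
```

If an analytic gradient disagrees with the numerical estimate by much more than rounding noise, the backward pass most likely has a bug; a classic slip is differentiating an activation with respect to the wrong variable.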

***
## Part 1 - Construct our input dataset

In [None]:
# Generate sinusoidal data
timesteps = 100  # number of timesteps in the data

# It's up to you to populate x with an ndarray of 'timesteps' linearly spaced samples between 0 and 2*np.pi using the np.linspace function
x = np.linspace( ## FINISH_ME ## )
print(f"x type: {type(x)}")
print(f"x shape: {x.shape}")
# Now we want to fill y with the sin of the above parameters giving us 1 full sinusoid waveform
# If you've constructed x correctly you can just use np.sin(x)
y = ## FINISH_ME ##
print(f"y type: {type(y)}")
print(f"y shape: {y.shape}")

In [None]:
# Reshape x and y for training (our model is explicitly designed to take inputs of (1,) in shape and make an output the same)
x_train = x.reshape(-1, 1)
y_train = y.reshape(-1, 1)
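`np.linspace` and `reshape(-1, 1)` are easy to mix up, so here is a tiny sketch on toy values (deliberately not the 0 to 2π range the exercise asks for) showing what each one produces. The names `samples` and `column` are illustrative.

```python
import numpy as np

samples = np.linspace(0.0, 1.0, 5)  # 5 evenly spaced points; both endpoints are included by default
print(samples)                      # values 0.0, 0.25, 0.5, 0.75, 1.0
print(samples.shape)                # (5,): a 1-D array

column = samples.reshape(-1, 1)     # -1 asks NumPy to infer the number of rows
print(column.shape)                 # (5, 1): one sample per row, one feature per column
```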

***
## Part 1 - Construct Our Model

In [None]:
# Training the DNN
hidden_size = 10  # number of neurons in the hidden layers which we will pass to our model

# Initialize the model (training happens in a later step)
part1_model = SimpleDNN_NP(hidden_size)

***
## Part 1 - Make prediction using our un-trained model

This step is important for 2 reasons:

1) It shows that the forward part of our model and the constructor appear consistent and run correctly. (This reduces the possible code errors in training)
2) It allows us to visualize our dataset and compare what the un-trained model evaluates to

In [None]:
# Predictions before training this is achieved by calling model.forward(data)
y_pred_before = part1_model.forward( ## FINISH_ME ## )

In [None]:
# Plotting the results before training
# We can use plt.scatter to construct a scatter plot of x,y coordinates with a label
plt.scatter(x, y, label='Original')
plt.scatter( ## FINISH_ME ## , label='Model Prediction (Before Training)')

# Define some important labels that make the graph mean something
plt.title("DNN Predictions vs Original Data (Before Training)")
plt.xlabel("x")
plt.ylabel("y")

# Let's add a legend and plot our graph
plt.legend()
plt.show()

***
## Part 1 - Now let's train our model

In [None]:
# Our model is trained by calling model.train( ... )

# The parameters we need to pass to this model are:
#    input_data, input_labels, how-long-to-train, learning-rate
#    x_train,    y_train,      epochs,            learning-rate

history = part1_model.train( ## FINISH_ME ## )

***
## Part 1 - Let's plot the loss function through this training 

In [None]:
# Now use matplotlib to plot the loss function vs epoch
# This can be achieved simply by using the plt.plot( ... ) method, which takes a list of values to plot
# plt.plot uses each value's position in the list (here, the epoch number) as its x-coordinate, so unlike a scatter plot we don't need to construct the x values

plt.plot(history, label='Average Model Loss')

# Add labels, legend, make log and plot
plt.title("DNN Model Loss vs Epoch")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()
plt.yscale('log')
plt.show()

***
## Part 1 - Make prediction after model has been trained

In [None]:
# Predictions after training
# As before, let's make a prediction, but with our trained model
y_pred_after = part1_model.forward(x_train)

In [None]:
# Plotting the results after training
plt.scatter(x, y, label='Original')
plt.scatter( ## FINISH_ME ##, label='Model Prediction (After Training)')
plt.title("DNN Predictions vs Original Data (After Training)")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()

***
***

***
***
# Part 2 - PyTorch DNN
This part of the notebook walks through building a DNN using the PyTorch API
***
***

***
## Part 2 - Imports and Globals

Here we want to make sure we have the relevant parts of the PyTorch framework loaded for us to use later on.

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim

In [None]:
# PyTorch supports 'accelerator' devices such as CUDA (NVIDIA GPUs) and MPS (on macOS)

# If you're lucky enough to have access to one, let's take advantage of it

# Select the best available device
if torch.cuda.is_available():
    device = torch.device("cuda")  # NVIDIA GPU
elif torch.backends.mps.is_available():
    device = torch.device("mps")   # Apple Metal (MPS)
else:
    device = torch.device("cpu")   # CPU fallback

# Report the device that we're using
print(f"Using device: {device}")

In [None]:
# This is a collection of globals needed to make everything reproducible

torch.manual_seed(_FIXED_SEED)  # PyTorch CPU

# Ensure reproducibility on Metal (MPS)
if torch.backends.mps.is_available():
    torch.mps.manual_seed(_FIXED_SEED)  # Fix seed for MPS backend

if torch.cuda.is_available():
    torch.cuda.manual_seed(_FIXED_SEED)  # PyTorch GPU (if used)
    torch.cuda.manual_seed_all(_FIXED_SEED)  # If using multi-GPU

    # Ensure deterministic behavior in CUDA operations (if available)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False  # Disable auto-tuner for determinism

***
## Part 2 - Convert the NumPy dataset to use with PyTorch

PyTorch uses objects called "Tensors" for passing around data.

PyTorch also expects these Tensor objects to be sent to a device if the data needs to be there.

e.g. before a model can run on a GPU we need to send the model and the data to the GPU
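Moving data between NumPy, PyTorch and a device has a couple of subtleties worth seeing once. Here is a small sketch on made-up values: `torch.tensor` always copies its input, `torch.from_numpy` shares memory with the NumPy array (and keeps its dtype), and `.to(device)` / `.cpu()` move a tensor between devices. The device selection here is simplified to CUDA-or-CPU and stored in a separate `demo_device` so it doesn't overwrite the notebook's `device`.

```python
import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0])         # NumPy defaults to float64

copied = torch.tensor(arr)              # torch.tensor always copies the data
shared = torch.from_numpy(arr)          # torch.from_numpy shares memory with arr

arr[0] = 100.0                          # modify the original NumPy array
print(copied[0].item())                 # 1.0: the copy is unaffected
print(shared[0].item())                 # 100.0: the shared tensor sees the change
print(shared.dtype)                     # torch.float64: the NumPy dtype is kept

# Simplified device choice (CUDA or CPU only) for this sketch
demo_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
on_device = copied.to(demo_device)      # copy the tensor to the chosen device
back_on_host = on_device.cpu().numpy()  # copy back to the CPU, then view it as a NumPy array
print(back_on_host)                     # [1. 2. 3.]
```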

In [None]:
# Convert to PyTorch tensors
# The reshape(-1, 1) guarantees the (N, 1) shape that nn.Linear(1, ...) expects, even if a 1-D array were passed in
#
# Why do you think we're using torch floats here?
#
X_tensor = torch.tensor(x_train, dtype=torch.float32).reshape(-1,1).to(device)
Y_tensor = torch.tensor(y_train, dtype=torch.float32).reshape(-1,1).to(device)

***
## Part 2 - Construct a simple DNN in PyTorch

In [None]:
# Define a simple DNN model
class SimpleDNN(nn.Module):

    def __init__(self, hidden_size):
        # This is our model's constructor

        # SimpleDNN inherits from nn.Module, PyTorch's base class for models
        # Calling the parent constructor sets up the nn.Module machinery (e.g. registering parameters)
        super(SimpleDNN, self).__init__()

        # The object self.model will contain the important part of the model in PyTorch

        ## nn.Sequential lets us simply pass, in order, the layers that we want PyTorch to construct
        ## Every fully connected layer in a DNN (MLP) is an nn.Linear in PyTorch, which needs to know its input and output dims
        ## Between each layer (but not at the output!) we need to add an activation
        ## As above we're going to use the Tanh function which is accessed via nn.Tanh()

        ## We want a 3 layer DNN which has an input dim of 1, a (hidden_size, hidden_size) middle layer
        ## and an output dim of 1
        self.model = nn.Sequential(
            nn.Linear(1, hidden_size),
            nn.Tanh(),
            ## FINISH_ME ##
            ## FINISH_ME ##
            nn.Linear(hidden_size, 1)
        )

    ## We also want a forward pass method to know how to evaluate our model
    def forward(self, x):
        # This simply calls the internal self.model object
        return self.model(x)

    ## We don't need to explicitly define a backwards method, we get that free from PyTorch :)

In [None]:
# Construct our model and pass it to any accelerator we have access to
part2_model = SimpleDNN(hidden_size).to(device)
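A quick way to sanity-check an `nn.Sequential` stack is to count its parameters: each `nn.Linear(in, out)` holds in * out weights plus out biases, and activations hold none. Here is a small sketch with hypothetical layer sizes (not the ones the exercise asks for), stored in a separate `demo_model`:

```python
import torch.nn as nn

# Hypothetical layer sizes, chosen only for illustration
demo_model = nn.Sequential(
    nn.Linear(2, 5),  # 2*5 weights + 5 biases = 15 parameters
    nn.Tanh(),        # activations have no trainable parameters
    nn.Linear(5, 3),  # 5*3 weights + 3 biases = 18 parameters
)

n_params = sum(p.numel() for p in demo_model.parameters())
print(n_params)                    # 33
print(demo_model[0].weight.shape)  # torch.Size([5, 2]): stored as (out_features, in_features)
```

Note the weight layout: `nn.Linear` computes `x @ weight.T + bias`, so its weight is the transpose of the (input, output) matrices we built with NumPy in Part 1.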

***
## Part 2 - Make predictions using our un-trained model

In [None]:
# Evaluate model

## Make sure the model is in evaluate mode
part2_model.eval()

## Make a prediction by calling the model on our input tensor
prediction_Tensor = part2_model(X_tensor)

print(f"Prediction type: {type(prediction_Tensor)}")
print(f"Input Data Shape: {X_tensor.shape}")
print(f"Output Data Shape: {prediction_Tensor.shape}")

## Make sure that our prediction has been copied back to the CPU
## Then detach it from the autograd graph and convert it to NumPy so we can use it elsewhere
predictions = prediction_Tensor.cpu().detach().numpy()

In [None]:
# Plot results
## Construct the canvas
plt.figure(figsize=(8, 5))

## As with the NumPY model use plt.scatter
plt.scatter(x_train, y_train, label='True')

## We can also use plt.plot, which draws the same points joined by a line
plt.plot( ## FINISH_ME ##, label='Predicted', color='red')

## Make the graph so we can see it
plt.legend()
plt.show()

***
## Part 2 - Construct some other objects needed to work with the PyTorch API

In [None]:
# Initialize model, loss function, and optimizer

## We need some way of defining the loss function
## For our example we will use the nn.L1Loss class as our criterion
## This class gives the mean of |y - y_pred| over all elements, which is good for our example
criterion = nn.L1Loss()

## We also want to use a built-in optimizer for PyTorch
## There are many different possible optimizers, but we want to use the optim.SGD class
## The optimizer needs to know 2 things:
##   what it's optimizing  - part2_model.parameters()
##   how fast to train     - learning-rate (lr)
optimizer = optim.SGD(part2_model.parameters(), lr=0.001)

## As with training our NumPy class we want to keep track of our Loss function values
losses = []
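`nn.L1Loss` averages the absolute errors, whereas the NumPy model in Part 1 minimised half the squared error, (y_pred - y)^2 / 2, so the two parts are not training on the same loss. Here is a small sketch on made-up numbers comparing `nn.L1Loss` with `nn.MSELoss` (the mean squared error); `demo_pred` and `demo_true` are illustrative names that leave the notebook's `criterion` untouched.

```python
import torch
import torch.nn as nn

demo_pred = torch.tensor([1.0, 2.0, 3.0])
demo_true = torch.tensor([1.0, 0.0, 6.0])

l1 = nn.L1Loss()(demo_pred, demo_true)    # mean(|pred - true|)   = (0 + 2 + 3) / 3
mse = nn.MSELoss()(demo_pred, demo_true)  # mean((pred - true)^2) = (0 + 4 + 9) / 3

print(f"L1:  {l1.item():.4f}")   # L1:  1.6667
print(f"MSE: {mse.item():.4f}")  # MSE: 4.3333
```

Because L1 grows linearly with the error, it is less dominated by a few large errors than MSE; which one suits a problem better is a design choice.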

***
## Part 2 - Train our PyTorch model

In [None]:
# Training loop
epochs = 75000

## Iterate through all of the epochs
for epoch in range(epochs):

    ## If we were using batches we would loop through batches here
    ## Put the model into 'training' mode
    part2_model.train()

    ## Reset the gradients accumulated during the previous step
    optimizer.zero_grad()

    ## Evaluate a Forward Pass of our model
    outputs = part2_model(X_tensor)

    ## Calculate our loss by comparing the outputs to the truth (Y_tensor)
    ## using the criterion function
    loss =  ## FINISH_ME ##

    ## Evaluate the Backward Pass of our model
    loss.backward()
    ## There we go, nothing else needed :)

    ## Now use the Optimizer to tune our model based on our recent loss
    optimizer.step()


    losses.append(loss.item())  # Store loss for plotting

    ## Report how far through the training we are
    if epoch % 5000 == 0:
        print(f"Epoch [{epoch}/{epochs}], Loss: {loss.item():.4f}")
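The loop calls `optimizer.zero_grad()` every epoch because PyTorch accumulates gradients: each `backward()` call adds to the existing `.grad` rather than replacing it. Here is a minimal sketch with a single scalar parameter (the names `w_demo`, `x_demo` and `recorded` are illustrative). In recent PyTorch versions `zero_grad()` resets gradients to `None` by default, which is what the sketch does by hand; the effect on the next backward pass is the same as filling them with zeros.

```python
import torch

w_demo = torch.tensor(2.0, requires_grad=True)
x_demo = torch.tensor(3.0)
recorded = []

# First backward pass: d(w*x)/dw = x = 3
(w_demo * x_demo).backward()
recorded.append(w_demo.grad.item())

# Second backward pass WITHOUT resetting: the new gradient is added to the old one
(w_demo * x_demo).backward()
recorded.append(w_demo.grad.item())

# Reset by hand (like optimizer.zero_grad() does for every parameter), then backward again
w_demo.grad = None
(w_demo * x_demo).backward()
recorded.append(w_demo.grad.item())

print(recorded)  # [3.0, 6.0, 3.0]
```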

***
## Part 2 - Evaluate our model and plot the results

In [None]:
# Evaluate model
# As with above put the model in evaluate mode
part2_model.eval()

## As with above we want to evaluate our model using our dataset
## Then we need to pass it back to the CPU and convert it to NumPy
predictions = part2_model(X_tensor)
predictions = predictions.cpu().detach().numpy()

In [None]:
# Plot results
## Construct the Canvas
plt.figure(figsize=(8, 5))

## Plot the input data and the Model 'Prediction' (Output)
plt.scatter(x_train, y_train, label='True', alpha=0.6)
plt.scatter( ## FINISH_ME ##, label='Predicted', color='red')

## Plot the Graph
plt.legend()
plt.show()

***
## Part 2 - Let's examine our loss

In [None]:
# Plot results
## Construct the Canvas
plt.figure(figsize=(8, 5))

## Plot the losses collected during training
plt.plot(losses, label="Loss vs. Epoch", color="blue")

## Add labels and Plot
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.title("Training Loss Over Epochs")
plt.legend()
plt.yscale('log')
plt.show()

***
***
# Part 3 - Building a Classifier
This part of the notebook walks through building and training a Classifier using PyTorch
***
***

***
## Part 3 - Load the modules needed to Build, Train and examine a Classifier

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
import numpy as np
import matplotlib.pyplot as plt

***
## Part 3 - Set some Globals to make everything reproducible

In [None]:
# Set manual seed for reproducibility
# This is a collection of globals needed to make everything reproducible

torch.manual_seed(_FIXED_SEED)  # PyTorch CPU

# Ensure reproducibility on Metal (MPS)
if torch.backends.mps.is_available():
    torch.mps.manual_seed(_FIXED_SEED)  # Fix seed for MPS backend

if torch.cuda.is_available():
    torch.cuda.manual_seed(_FIXED_SEED)  # PyTorch GPU (if used)
    torch.cuda.manual_seed_all(_FIXED_SEED)  # If using multi-GPU

    # Ensure deterministic behavior in CUDA operations (if available)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False  # Disable auto-tuner for determinism

np.random.seed(_FIXED_SEED)

In [None]:
# As in Part 2 if you have a supported 'accelerator' it's nice to use it

# Select the best available device
if torch.cuda.is_available():
    device = torch.device("cuda")  # NVIDIA GPU
elif torch.backends.mps.is_available():
    device = torch.device("mps")   # Apple Metal (MPS)
else:
    device = torch.device("cpu")   # CPU fallback

print(f"Using device: {device}")

In [None]:
epochs = 10

***
## Part 3 - Loading the Dataset so that we can use it to build a Classifier

In [None]:
# Load one split of the MNIST dataset as NumPy arrays (train=True for the training set, train=False for the test set)
def load_mnist_as_numpy(train=True):
    dataset = torchvision.datasets.MNIST(root="./data", train=train, download=True)

    # Convert images & labels to NumPy arrays
    images = np.array([np.array(img, dtype=np.float32) for img, _ in dataset])
    labels = np.array([label for _, label in dataset], dtype=np.int64)

    # Normalize manually: Convert [0, 255] → [-1, 1]
    images = (images / 127.5) - 1.0

    # Reshape to (N, 28, 28) for experimenting
    images = images.reshape(-1, 28, 28)

    # Convert labels to one-hot encoding (equivalent to Keras' `to_categorical`)
    labels = np.eye(10)[labels]

    return images, labels
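The two NumPy tricks in `load_mnist_as_numpy`, indexing an identity matrix to build one-hot labels and rescaling pixels from [0, 255] to [-1, 1], can be checked on a few made-up values. The names `demo_labels`, `one_hot`, `recovered`, `pixels` and `scaled` are illustrative.

```python
import numpy as np

demo_labels = np.array([3, 0, 9], dtype=np.int64)

# Row k of the 10x10 identity matrix is the one-hot vector for class k
one_hot = np.eye(10)[demo_labels]
print(one_hot.shape)         # (3, 10)
print(one_hot[0].argmax())   # 3

# argmax along each row recovers the original integer labels
recovered = one_hot.argmax(axis=1)
print(recovered)             # [3 0 9]

# Same rescaling as in load_mnist_as_numpy: [0, 255] -> [-1, 1]
pixels = np.array([0.0, 127.5, 255.0], dtype=np.float32)
scaled = pixels / 127.5 - 1.0
print(scaled)                # values -1.0, 0.0, 1.0
```

The same `argmax` round trip is what the training loop later uses to compare one-hot labels with model outputs.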

### First, load the dataset

This returns NumPy arrays, which you can use to examine the data.

In [None]:
# Load datasets
X_train, Y_train = load_mnist_as_numpy(train=True)
X_test, Y_test = load_mnist_as_numpy(train=False)

***
## Part 3 - Examine the dataset

What is the size and shape of the data we're working with?

In [None]:
print(f"X shape: {X_train.shape}")
print(f"Y shape: {Y_train.shape}")

In [None]:
fig = plt.figure(tight_layout=True)
## Plot a single image from the dataset
plt.imshow(## FINISH_ME ##, cmap='gray')
plt.show()

### Now we need to convert this to Tensors for PyTorch

In [None]:
# Convert NumPy arrays to PyTorch tensors
## Use torch.from_numpy to construct Tensors from NumPy arrays
X_train, Y_train = torch.from_numpy(X_train).to(device), torch.from_numpy(Y_train).float().to(device)
X_test, Y_test = torch.from_numpy(X_test).to(device), torch.from_numpy(Y_test).float().to(device)

# Create DataLoaders
## Our dataset is constructed from the Data and Labels
## We are choosing to use batches of 64 images with this model
train_dataset = torch.utils.data.DataLoader(list(zip(X_train, Y_train)), batch_size=64, shuffle=True)
test_dataset = torch.utils.data.DataLoader(list(zip(X_test, Y_test)), batch_size=64, shuffle=False)
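A `DataLoader` groups consecutive samples into batches and stacks them into tensors. Here is a small sketch with a hypothetical toy dataset of 10 samples and `batch_size=4` (the names `X_demo`, `Y_demo` and `loader` are illustrative, so the notebook's loaders are untouched):

```python
import torch
from torch.utils.data import DataLoader

X_demo = torch.arange(40, dtype=torch.float32).reshape(10, 2, 2)  # 10 tiny 2x2 "images"
Y_demo = torch.eye(3)[torch.arange(10) % 3]                         # one-hot labels, 3 classes

loader = DataLoader(list(zip(X_demo, Y_demo)), batch_size=4, shuffle=False)
batch_sizes = [images.shape[0] for images, labels in loader]
print(len(loader), batch_sizes)                 # 3 [4, 4, 2]

first_images, first_labels = next(iter(loader))
print(first_images.shape, first_labels.shape)   # torch.Size([4, 2, 2]) torch.Size([4, 3])
```

With the default `drop_last=False` the final, smaller batch is kept: for the 60,000 MNIST training images and `batch_size=64` that gives 938 batches, the last one holding 32 images.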

***
## Part 3 - Build a Classifier DNN

In [None]:
# Define the model using PyTorch's `Sequential`
## We just want to use the nn.Sequential directly here no wrapper classes
part3_model = nn.Sequential(
    nn.Flatten(),                # Used to make sure the data from each batch is a flat numerical array
    nn.Linear(28 * 28, 128),     # Fully connected layer input -> 128 dim
    nn.ReLU(),                   # Activation   (I like ReLU)
    ## FINISH_ME ##              # Hidden layer (128 -> 64 dim)
    ## FINISH_ME ##              # Activation   (I like ReLU)
    nn.Linear(64, 10),           # Output layer (64-dim -> logits output)
    nn.Sigmoid()                 # Final output
)
# Why do you think we're using a Sigmoid activator on this model but not for the earlier one?

# Move model to device
part3_model = part3_model.to(device)

***
## Part 3 - Make Some predictions before Training

In [None]:
# To pass a single entry to our model for evaluation/prediction it needs a leading batch dimension, i.e. shape (1, ...), rather than just the single element
# This is the same as 'passing a batch of 1' to the model
prediction = part3_model(X_train[0].view(1, -1))
print(f"Prediction Type: {type(prediction)}")

truth = Y_train[0].cpu().detach().numpy()
prediction = prediction.cpu().detach().numpy()


In [None]:
# plotting the predictions
fig = plt.figure()
digits = list(range(len(prediction[0])))  # one bar per digit class, 0-9
plt.bar(digits, prediction[0])
plt.title('Prediction Distribution')
plt.yscale('log')
plt.ylabel('Probability')
plt.xlabel('Number')
plt.show()

In [None]:
print(f"Truth: {np.argmax( ## FINISH_ME ## )}")
print(f"Prediction: {np.argmax( ## FINISH_ME ## )}")

***
## Part 3 - Define some objects needed for training

In [None]:
# Training parameters
epochs = 10

# We want to track losses and the accuracy of our model during training
train_losses = []
train_accuracies = []

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()  # Applies log-softmax internally; accepts class indices or, as here, one-hot (probability) targets
optimizer = optim.SGD(part3_model.parameters(), lr=0.001, momentum=0.9)
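`nn.CrossEntropyLoss` combines a log-softmax with the negative log-likelihood, so it expects one unnormalised score per class and does the normalisation itself. Since PyTorch 1.10 the target may be either integer class indices or class probabilities such as our one-hot labels. Here is a small sketch with made-up scores showing that both target forms, and a hand-written log-softmax version, give the same loss (`ce`, `logits` and `class_idx` are illustrative names, so the notebook's `criterion` is untouched):

```python
import torch
import torch.nn as nn

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])  # raw scores: 2 samples x 3 classes
class_idx = torch.tensor([0, 2])          # integer class labels
one_hot = torch.eye(3)[class_idx]         # the same labels as one-hot probabilities

ce = nn.CrossEntropyLoss()                # mean over the batch by default
loss_idx = ce(logits, class_idx)
loss_onehot = ce(logits, one_hot)

# By hand: take -log_softmax at the true class for each sample, then average
manual = -torch.log_softmax(logits, dim=1)[torch.arange(2), class_idx].mean()

print(loss_idx.item(), loss_onehot.item(), manual.item())  # all three agree
```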

***
## Part 3 - Now train our classifier

In [None]:
# Training loop
## Loop through n-epochs
for epoch in range(epochs):

    ## Reset some counters we're going to use
    total_loss, correct, total = 0, 0, 0

    ## Looping through all batches in our dataset here
    for images, labels in train_dataset:

        ## Put our model into training mode
        part3_model.train()

        ## Make sure that the data we're interested in evaluating is on the correct device
        images, labels = images.to(device), labels.to(device)

        ## Reset our Optimizer
        optimizer.zero_grad()

        ## Evaluate this batch of images with our model
        outputs = part3_model(images)

        ## Our model outputs an array of values, lets compare this to truth to get our loss
        loss = criterion(outputs, labels)

        ## Evaluate the back-propagation of our model
        loss.backward()

        ## Optimize our model based on evaluating this batch
        optimizer.step()

        
        ## We can put the model back into non-training mode here 
        part3_model.eval()
        with torch.no_grad():

            ## Add the loss from this batch to the total loss for this epoch
            ## .item() copies the scalar back to the CPU as a plain Python number, so no explicit .cpu() is needed
            total_loss += loss.item()

            ## Calculate how many times the model predicted the correct class
            ## .item() again returns a plain Python number
            correct += (outputs.argmax(dim=1) == labels.argmax(dim=1)).sum().item()

            ## What was the size of this batch? i.e. how many datapoints processed?
            total += len(labels)

    # Back into non-training mode
    part3_model.eval()
    with torch.no_grad():

        ## Calculate the average loss of the dataset over the whole epoch
        avg_train_loss = ## FINISH_ME ##
        ## Store the average loss per epoch
        train_losses.append(avg_train_loss)
        ## Calculate the average accuracy per epoch
        train_accuracy = ## FINISH_ME ##
        ## Store the average accuracy per epoch
        train_accuracies.append(train_accuracy)

    ## Report the average loss and Accuracy per epoch during fitting
    print(f"Epoch [{epoch+1}/{epochs}], Train Loss: {avg_train_loss:.4f}, Train Acc: {train_accuracy:.4f}")


***
## Part 3 - Let's examine our training history

In [None]:
# Plot loss and accuracy
## Make a canvas
plt.figure(figsize=(12, 5))

# Plot the training loss
## Make the left-hand panel of a 1x2 grid
plt.subplot(1, 2, 1)
plt.plot( ## FINISH_ME ##, label="Train Loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.title("Training Loss Over Epochs")
plt.legend()

# Plot the training accuracy (right-hand panel)
plt.subplot(1, 2, 2)
plt.plot( ## FINISH_ME ##, label="Train Accuracy")
plt.xlabel("Epochs")
plt.ylabel("Accuracy")
plt.title("Training Accuracy Over Epochs")
plt.legend()

plt.show()

***
## Part 3 - Estimate model accuracy

Now use the test dataset to estimate how accurate the model is.

In [None]:
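correct = 0
part3_model.eval()
## No gradients are needed for evaluation, so skip building the autograd graph
with torch.no_grad():
    test_prediction = part3_model(X_test.to(device))
print(test_prediction.shape)
print(Y_test.shape)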

In [None]:
## .item() turns the count of correct predictions into a plain Python integer
correct += (test_prediction.argmax(dim=1) == Y_test.argmax(dim=1)).sum().item()
print(f"Test Accuracy = {correct / len(test_prediction) * 100:.2f}%")

***
***
# Part 4 - Projecting a DNN beyond the training window
***
***

***
## Part 4 - Now generate a new dataset

This dataset should cover 5x the x-range of the data used in Parts 1 and 2, with 5x as many samples, so that it contains 5 full waveforms instead of 1.

In [None]:
# Generate sinusoidal data
timesteps = ## FINISH_ME ##
x = ## FINISH_ME ##
y = ## FINISH_ME ##
# Reshape x and y for training (our model is explicitly designed to take inputs of (1,) in shape and make an output the same)
x_beyond = x.reshape(-1, 1)
y_beyond = y.reshape(-1, 1)

***
## Part 4 - Now evaluate our NumPY DNN and plot what we see

In [None]:
# Evaluate model
predictions = part1_model.forward(x_beyond)

In [None]:
# Plot results
plt.figure(figsize=(8, 5))
plt.scatter(x_beyond, y_beyond, label='True', alpha=0.6)
plt.scatter( ## FINISH_ME ## , label='Predicted', color='red')
plt.legend()
plt.show()

In [None]:
# Convert the extended dataset into a float32 tensor on the same device as part2_model
X_tensor = torch.tensor(x_beyond, dtype=torch.float32).reshape(-1,1).to(device)

***
## Part 4 - Now evaluate our PyTorch DNN and plot the result

In [None]:
# Evaluate model
part2_model.eval()
predictions = part2_model(X_tensor).cpu().detach().numpy()

In [None]:
# Plot results
plt.figure(figsize=(8, 5))
plt.scatter(x_beyond, y_beyond, label='True', alpha=0.6)
plt.scatter( ## FINISH_ME ##, label='Predicted', color='red')
plt.legend()
plt.show()

***
## Part 4 - Do the DNN/MLP models extrapolate as you expected them to?

What do you see, and can you think why this is the case?

### Answer

`## FINISH_ME ##`