# Neural Numpy
This notebook walks through how to use the neural numpy module we developed, found in `../src/neural_numpy`. The project is managed with the awesome package manager [uv](https://docs.astral.sh/uv/). First, make sure uv is installed, then run `uv sync` to create the virtual environment. Next, select that venv as the kernel for this Jupyter notebook. You should now be able to import the required libraries, including the neural-numpy module.

## The network config
The network has a top-level class, creatively called `network`. A network consists of layers, either dense or activation layers, and it orchestrates them when running a forward pass, backpropagating or training. The network can be built manually or, preferably, with the builder class. The builder takes a config and builds a network from the parameters it contains. Since we mostly used wandb for logging in this project, the builder accepts a wandb config dictionary (a method that takes plain Python parameters also exists). Let's start by importing some of the libraries needed to run the code:

In [None]:
import wandb
from neural_numpy.builder import NetworkBuilder
from neural_numpy.loss import CategoricalCrossEntropy
from neural_numpy.optimizer import ADAM, SGD
from data import DataLoader
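To make the layer and builder idea concrete, here is a minimal, self-contained NumPy sketch. `Dense`, `ReLU`, `Network` and `build_from_config` are hypothetical stand-ins written for illustration, not the neural_numpy API. We assume the builder stacks `hidden_layers` blocks of a dense layer followed by an activation, then adds a final dense output layer; any output activation and all of backpropagation are left out.

```python
import numpy as np

# Hypothetical stand-ins, NOT the neural_numpy classes: a network as an
# ordered list of dense and activation layers.
class Dense:
    def __init__(self, n_in, n_out, rng):
        # He initialization: normal with std = sqrt(2 / fan_in)
        self.W = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
        self.b = np.zeros(n_out)

    def forward(self, x):
        return x @ self.W + self.b


class ReLU:
    def forward(self, x):
        return np.maximum(x, 0.0)


class Network:
    def __init__(self, layers):
        self.layers = layers

    def forward(self, x):
        # The network only orchestrates: each layer's output feeds the next layer
        for layer in self.layers:
            x = layer.forward(x)
        return x


def build_from_config(input_size, output_size, config, seed=0):
    """Hypothetical builder: hidden_layers x (Dense + ReLU), then an output Dense."""
    rng = np.random.default_rng(seed)
    layers, n_in = [], input_size
    for _ in range(config["hidden_layers"]):
        layers += [Dense(n_in, config["hidden_units"], rng), ReLU()]
        n_in = config["hidden_units"]
    layers.append(Dense(n_in, output_size, rng))
    return Network(layers)


# 3072 = 32 * 32 * 3, a flattened CIFAR-10 image; 10 output classes
net = build_from_config(3072, 10, {"hidden_layers": 3, "hidden_units": 512})
print(len(net.layers))                         # 3 * (Dense + ReLU) + output Dense = 7
print(net.forward(np.zeros((2, 3072))).shape)  # (2, 10)
```

Keeping layers as plain objects with a shared `forward` method is what lets a builder assemble any depth from a config without the network needing to know layer types.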

## Hyperparameter sweeps
As a starting point, we needed to figure out which hyperparameters would be optimal for our network. An obvious way to do this was with wandb hyperparameter sweeps, so we defined the sweep configuration below. We used Bayesian optimization rather than, for example, random search to speed up the tuning process. Each run trains for a fixed 50 epochs, which was enough to tell which models yielded good results without taking too long to train. We tested a variety of batch sizes and hidden-unit counts, capping the latter at 1024, since training took an extreme amount of time at, for instance, 2048 units.

In [None]:
sweep_configuration = {
    "method": "bayes",
    "metric": {
        "name": "val_acc",
        "goal": "maximize",
    },
    "parameters": {
        "epochs": {"value": 50},  # Fixed value
        "batch_size": {"values": [32, 64, 128]},
        "learning_rate": {
            "max": 0.01,
            "min": 0.0001,
            "distribution": "log_uniform_values",
        },
        "hidden_layers": {"values": [2, 3, 4, 5]},
        "hidden_units": {"values": [64, 128, 256, 512, 1024]},
        "activation": {"values": ["ReLU", "Tanh", "Sigmoid"]},
        "weight_initializer": {"values": ["He", "Xavier"]},
        "optimizer": {"values": ["adam", "sgd"]},
        "weight_decay": {"values": [0.0, 1e-3, 1e-4]},
    },
}
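Two properties of this search space explain the choices above. The discrete parameters alone already give thousands of combinations, far more than a 70-run sweep could try exhaustively, which is why a guided method such as Bayesian optimization is attractive. And `log_uniform_values` samples the learning rate uniformly in log space between `min` and `max`, so the range 1e-4 to 1e-3 is sampled as often as 1e-3 to 1e-2. The sketch below (plain Python, not wandb's own sampler) illustrates both:

```python
import math
import random

# Discrete choices copied from sweep_configuration
# (learning_rate is continuous and epochs is fixed, so both are left out)
discrete_choices = {
    "batch_size": [32, 64, 128],
    "hidden_layers": [2, 3, 4, 5],
    "hidden_units": [64, 128, 256, 512, 1024],
    "activation": ["ReLU", "Tanh", "Sigmoid"],
    "weight_initializer": ["He", "Xavier"],
    "optimizer": ["adam", "sgd"],
    "weight_decay": [0.0, 1e-3, 1e-4],
}
n_combinations = math.prod(len(v) for v in discrete_choices.values())
print(n_combinations)  # 2160, before the learning rate is even varied


def sample_log_uniform(low, high, rng=random):
    # Uniform in log space, then exponentiate: every decade is equally likely
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
samples = [sample_log_uniform(1e-4, 1e-2, rng) for _ in range(10_000)]
frac_low_decade = sum(s < 1e-3 for s in samples) / len(samples)
print(frac_low_decade)  # close to 0.5
```

With a plain uniform distribution over the same range, only about 9% of samples would fall below 1e-3, so small learning rates would barely be explored.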


After this, we imported a dataset, in this case CIFAR-10, and split the training data into a training and a validation set. To get our MNIST results, `DataLoader.load_cifar10` can simply be replaced with `DataLoader.load_mnist`.

In [None]:
# Import data
X_train, y_train, X_test, y_test = DataLoader.load_cifar10(
    normalize=True, flatten=True, one_hot=True
)

# Split data
val_split = 0.2
split_idx = int(X_train.shape[0] * (1 - val_split))
X_val = X_train[split_idx:]
y_val = y_train[split_idx:]
X_train_split = X_train[:split_idx]
y_train_split = y_train[:split_idx]

# Determine input dimensions
input_dim = X_train.shape[1]
num_classes = y_train.shape[1]
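With CIFAR-10's 50,000 training images, `val_split = 0.2` gives 40,000 training and 10,000 validation samples (48,000 and 12,000 for MNIST's 60,000). The cell above simply takes the last 20% without shuffling. The helper below is a hypothetical sketch, `holdout_split` is our own name, that reproduces that slicing by default and adds an optional shuffled variant; the shuffle is an assumption, not what the notebook does.

```python
import numpy as np


def holdout_split(X, y, val_split=0.2, shuffle=False, seed=None):
    """Return (X_train, y_train, X_val, y_val) with the last val_split fraction held out.

    shuffle=False matches the slicing in the notebook; shuffle=True permutes
    X and y with the same permutation first, so rows stay paired with labels.
    """
    if shuffle:
        perm = np.random.default_rng(seed).permutation(X.shape[0])
        X, y = X[perm], y[perm]
    split_idx = int(X.shape[0] * (1 - val_split))
    return X[:split_idx], y[:split_idx], X[split_idx:], y[split_idx:]


# Toy data: 10 samples with 4 features, one-hot labels over 3 classes
X_toy = np.arange(40, dtype=float).reshape(10, 4)
y_toy = np.eye(3)[np.arange(10) % 3]
X_tr, y_tr, X_va, y_va = holdout_split(X_toy, y_toy, val_split=0.2)
print(X_tr.shape, X_va.shape)  # (8, 4) (2, 4)
```

Shuffling only matters if the stored order is not already random (for example, if samples were sorted by class); otherwise the unshuffled slice is fine.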

We load and split the data once, outside the training function, so that it is not re-imported for every training run. This leads us to the actual training function, which is called once for every run in the sweep:


In [None]:
def train_sweep():
    # Initialize a wandb run; the sweep agent supplies its config
    with wandb.init() as run:
        # Use the config chosen by the sweep for this run
        config = wandb.config

        # Initialize builder
        builder = NetworkBuilder()

        # Build network from the wandb config
        network = builder.build_from_wandb(
            input_size=input_dim, output_size=num_classes, config=config
        )

        # The optimizer is passed to the training run, so it has to be defined here
        weight_decay = getattr(config, "weight_decay", 0.0)
        if config.optimizer.lower() == "adam":
            optimizer = ADAM(
                learning_rate=config.learning_rate,
                weight_decay=weight_decay,
            )
        elif config.optimizer.lower() == "sgd":
            optimizer = SGD(
                learning_rate=config.learning_rate,
                weight_decay=weight_decay,
            )
        else:
            raise ValueError(f"Unknown optimizer: {config.optimizer}")
        # As does the loss function
        loss_fn = CategoricalCrossEntropy()

        # Finally, pass the parameters to the training run
        network.train(
            X=X_train_split,
            y=y_train_split,
            X_val=X_val,
            y_val=y_val,
            loss_function=loss_fn,
            epochs=config.epochs,
            optimizer=optimizer,
            batch_size=config.batch_size,
        )
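The `batch_size` passed to `network.train` controls how many samples go into each gradient update. With 40,000 training samples and `batch_size=64`, one epoch is 625 updates (1,250 with `batch_size=32`). The generator below is a hypothetical sketch of how a training loop typically produces those batches; we have not assumed that neural_numpy's `train` is written this way, for instance whether it reshuffles every epoch.

```python
import numpy as np


def iterate_minibatches(X, y, batch_size, rng):
    """Yield (X_batch, y_batch) pairs covering one epoch in a random order."""
    idx = rng.permutation(X.shape[0])
    for start in range(0, X.shape[0], batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]


rng = np.random.default_rng(0)
X_toy = np.arange(100, dtype=float).reshape(100, 1)
y_toy = np.zeros((100, 3))
sizes = [xb.shape[0] for xb, _ in iterate_minibatches(X_toy, y_toy, 32, rng)]
print(sizes)  # [32, 32, 32, 4]: the last batch holds the remainder
```

Smaller batches mean more, noisier updates per epoch, which is one reason batch size and learning rate were tuned together in the sweep.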

Now we can finally run the sweep, which we left running overnight on one of our computers with a powerful CPU (no GPU acceleration, sadly). Be warned: if you actually run this, it will take quite a while.

In [None]:
# Login to WandB
wandb.login()

# Initialize the sweep with the config and a project name
sweep_id = wandb.sweep(sweep=sweep_configuration, project="sweepalicious")

# Start the sweep agent with the train_sweep as the callback. This is the one we ran overnight
wandb.agent(sweep_id, function=train_sweep, count=70)

## Training and measuring the network
Now that we had found some promising hyperparameters using the hyperparameter sweep, we went on to test a candidate. This was done much like the sweep, but this time with a specific config. As before, we start by importing the data:

In [None]:
import numpy as np
from typing import List
from rich import box, print
from rich.table import Table

import wandb
from data import DataLoader
from neural_numpy.builder import NetworkBuilder
from neural_numpy.confusion_matrix import confusion_matrix
from neural_numpy.loss import CategoricalCrossEntropy
from neural_numpy.optimizer import ADAM, SGD

X_train, y_train, X_test, y_test = DataLoader.load_cifar10(  # Can be replaced by DataLoader.load_mnist
    normalize=True, flatten=True, one_hot=True
)
print("[bold green]Data Loaded:[/bold green] CIFAR-10 Dataset")

val_split = 0.2
split_idx = int(X_train.shape[0] * (1 - val_split))

# Validation data
X_val = X_train[split_idx:]
y_val = y_train[split_idx:]

# Training data
X = X_train[:split_idx]
y = y_train[:split_idx]

print(f"[bold green]Training Set:[/bold green] {X.shape[0]} samples")
print(f"[bold green]Validation Set:[/bold green] {X_val.shape[0]} samples")

Next, we define the config using the hyperparameters from one of the promising sweep runs. Note that for the MNIST dataset, this config was used instead:
```python
config = {
    "epochs": 100,
    "learning_rate": 0.000992898461755694,
    "batch_size": 32,
    # Architecture
    "hidden_layers": 4,
    "hidden_units": 512,
    "weight_decay": 0.0001,
    "activation": "ReLU",
    "optimizer": "sgd",
    "weight_initializer": "He",
}
```

In [None]:
wandb.login()
run = wandb.init(
    project="azure",
    config={
        "epochs": 4,
        "learning_rate": 0.00909273148274152,
        "batch_size": 64,
        "hidden_layers": 3,
        "hidden_units": 512,
        "weight_decay": 0.001,
        "activation": "ReLU",
        "optimizer": "sgd",
        "weight_initializer": "Xavier",
    },
)
config = wandb.config

We build the network with the config:

In [None]:
input_dim = X.shape[1]
num_classes = y.shape[1]

builder = NetworkBuilder()
network = builder.build_from_wandb(
    input_size=input_dim, output_size=num_classes, config=config
)

# Create the optimizer named in the config
weight_decay = getattr(config, "weight_decay", 0.0)
if config.optimizer.lower() == "adam":
    optimizer = ADAM(
        learning_rate=config.learning_rate,
        weight_decay=weight_decay,
    )
elif config.optimizer.lower() == "sgd":
    optimizer = SGD(
        learning_rate=config.learning_rate,
        weight_decay=weight_decay,
    )
else:
    raise ValueError(f"Unknown optimizer: {config.optimizer}")
loss_fn = CategoricalCrossEntropy()
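Both optimizers take a `learning_rate` and a `weight_decay`. The sketch below shows the textbook update rules for SGD and Adam in plain NumPy. It assumes weight decay is applied as an L2 term added to the gradient (`grad + weight_decay * w`); neural_numpy may instead use decoupled weight decay, and `sgd_step` and `AdamState` are hypothetical names, not the module's classes.

```python
import numpy as np

# Assumption: weight decay is coupled L2 regularization, g = grad + weight_decay * w


def sgd_step(w, grad, learning_rate, weight_decay=0.0):
    g = grad + weight_decay * w
    return w - learning_rate * g


class AdamState:
    def __init__(self, shape, beta1=0.9, beta2=0.999, eps=1e-8):
        self.m = np.zeros(shape)  # running mean of gradients
        self.v = np.zeros(shape)  # running mean of squared gradients
        self.t = 0
        self.beta1, self.beta2, self.eps = beta1, beta2, eps

    def step(self, w, grad, learning_rate, weight_decay=0.0):
        g = grad + weight_decay * w
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * g
        self.v = self.beta2 * self.v + (1 - self.beta2) * g**2
        # Bias correction, since m and v start at zero
        m_hat = self.m / (1 - self.beta1**self.t)
        v_hat = self.v / (1 - self.beta2**self.t)
        return w - learning_rate * m_hat / (np.sqrt(v_hat) + self.eps)


w = np.array([1.0, -2.0])
grad = np.array([0.5, 0.5])
print(sgd_step(w, grad, learning_rate=0.1))  # approximately [0.95, -2.05]
adam = AdamState(w.shape)
print(adam.step(w, grad, learning_rate=0.1))  # approximately [0.9, -2.1]
```

Note the difference in scale: SGD's step is proportional to the gradient, while Adam's first step moves each weight by roughly the learning rate regardless of gradient size. That is one reason the two optimizers prefer different learning rates in a sweep.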

We now train the network with the config from before. This might take a while. (We mostly ran training from the terminal, but the progress output should also print interactively in the Jupyter notebook.)

In [None]:
network.train(
    # Train only on the training split; training on X_train would leak the validation data
    X=X,
    y=y,
    X_val=X_val,
    y_val=y_val,
    loss_function=loss_fn,
    epochs=config.epochs,
    optimizer=optimizer,
    batch_size=config.batch_size,
)
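The training call above minimizes categorical cross-entropy. As a hedged illustration, here is that computation in plain NumPy, assuming the network ends in a softmax and the loss receives probabilities (neural_numpy's `CategoricalCrossEntropy` may differ, for example by taking raw logits). A useful sanity check: an untrained network that predicts all ten classes equally should start at a loss of about ln(10) ≈ 2.30.

```python
import numpy as np


def softmax(logits):
    # Subtracting the row max avoids overflow in exp and does not change the result
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def categorical_cross_entropy(probs, y_onehot, eps=1e-12):
    # Batch mean of -sum_k y_k * log(p_k); eps guards against log(0)
    return -np.mean(np.sum(y_onehot * np.log(probs + eps), axis=1))


logits = np.zeros((2, 10))       # all ten classes equally likely
y_onehot = np.eye(10)[[3, 7]]    # true classes 3 and 7
loss = categorical_cross_entropy(softmax(logits), y_onehot)
print(round(float(loss), 4))     # 2.3026, i.e. ln(10)
```

If the first logged loss is far above ln(10), the initialization or learning rate is usually worth checking before anything else.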

We can now finally determine the network's accuracy on the test data and print the confusion matrix:

In [None]:
# Compare the network's predictions on the test set against the true labels
y_true = np.argmax(y_test, axis=1)
test_pred = network.forward(X_test)
y_pred = np.argmax(test_pred, axis=1)
test_acc = np.mean(y_pred == y_true)
print(f"[bold green]Test Accuracy: {test_acc:.1%}[/bold green]")

# CIFAR-10 class names in label order, used to label the confusion matrix
# (for MNIST, use the digits "0" to "9"); this could probably be done in a better way
cm = confusion_matrix(y_true, y_pred)
class_names = [
    "airplane",
    "automobile",
    "bird",
    "cat",
    "deer",
    "dog",
    "frog",
    "horse",
    "ship",
    "truck",
]
num_classes = cm.shape[0]

# Make a rich table
table = Table(title="Confusion Matrix", box=box.ROUNDED, show_lines=True)
table.add_column("True\\Pred", style="dim", width=12)

# Add one column per predicted class, then one row per true class
for i in range(num_classes):
    table.add_column(class_names[i], justify="right")

for i in range(num_classes):
    row_data: List = []
    row_data.append(class_names[i])
    for j, val in enumerate(cm[i]):
        if j == i:
            row_data.append("[bold green]" + str(int(val)))
        else:
            row_data.append(str(int(val)))
    table.add_row(*row_data)
print(table)

# Finish run :)
run.finish()
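The `confusion_matrix` helper from neural_numpy isn't shown in this notebook. As a rough sketch of what such a function computes (`confusion_matrix_sketch` is our own hypothetical version, not the module's code), each (true, predicted) pair increments one cell, with rows as true classes and columns as predicted classes like the table above. Correct predictions land on the diagonal, so the trace divided by the total is the accuracy, and each diagonal entry divided by its row sum is that class's recall.

```python
import numpy as np


def confusion_matrix_sketch(y_true, y_pred, num_classes):
    """Count matrix with rows = true class and columns = predicted class."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at is unbuffered, so repeated (true, pred) pairs are all counted
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


y_true_toy = np.array([0, 0, 1, 1, 2, 2])
y_pred_toy = np.array([0, 1, 1, 1, 2, 0])
cm_toy = confusion_matrix_sketch(y_true_toy, y_pred_toy, 3)
print(cm_toy)

accuracy = np.trace(cm_toy) / cm_toy.sum()            # 4 correct out of 6
per_class_recall = np.diag(cm_toy) / cm_toy.sum(axis=1)
print(accuracy, per_class_recall)                     # recall per class: 0.5, 1.0, 0.5
```

Per-class recall makes it easy to spot which CIFAR-10 classes (often the visually similar animal classes) drag the overall accuracy down.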