<a href="https://colab.research.google.com/github/AugusGuarna/Feedforward_NN/blob/main/NN_multiclass_pred.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Neural Network for Multiclass Prediction

This notebook implements a **feedforward neural network** for multiclass classification using the **Iris dataset** and **PyTorch**.

---

## 1. Architecture

We define the network layout as follows:

| Layer | Description |
|-------|-------------|
| **Input** | Vector in $\mathbb{R}^{4}$ (4 features per sample) |
| **Hidden** | 3 layers with **6**, **8**, and **10** neurons respectively |
| **Output** | Vector in $\mathbb{R}^{3}$ (3 classes) |
| **Activations** | **ReLU** in hidden layers; the output layer is linear (raw logits), with **Softmax** applied implicitly by the cross-entropy loss |

The input size matches the 4 Iris attributes (sepal length, sepal width, petal length, petal width), and the output size matches the 3 species: `setosa`, `versicolor`, and `virginica`.
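For comparison, the same 4 → 6 → 8 → 10 → 3 layout can be written with PyTorch's built-in `nn.Linear` layers. This is only a reference sketch, not the notebook's model (which stores its own weight matrices); the name `reference_model` is hypothetical. Each layer contributes $(\text{fan\_in} + 1)\cdot\text{fan\_out}$ parameters, so the total is $30 + 56 + 90 + 33 = 209$.

```python
import torch
from torch import nn

# Hypothetical reference model with the layout from the table above:
# 4 inputs -> hidden layers of 6, 8, 10 neurons (ReLU) -> 3 output logits.
reference_model = nn.Sequential(
    nn.Linear(4, 6), nn.ReLU(),
    nn.Linear(6, 8), nn.ReLU(),
    nn.Linear(8, 10), nn.ReLU(),
    nn.Linear(10, 3),  # linear output: raw logits, no softmax here
)

# Each Linear layer holds fan_in * fan_out weights plus fan_out biases.
n_params = sum(p.numel() for p in reference_model.parameters())
print(n_params)  # 209

# A batch of 2 samples with 4 features each yields 2 rows of 3 logits.
logits = reference_model(torch.randn(2, 4))
print(logits.shape)  # torch.Size([2, 3])
```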

In [113]:
# --- Imports ---
import torch
from torch import nn
from torch.utils.data import Dataset, TensorDataset, DataLoader, random_split
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.datasets import load_iris
import numpy as np
import matplotlib.pyplot as plt
import torch.nn.functional as F

In [114]:
# --- Load Iris and build DataLoaders ---
dataset = load_iris()

X = dataset["data"]
y = dataset["target"]
y = LabelEncoder().fit_transform(y)

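# CrossEntropyLoss expects float features and int64 (long) class indices
full_dataset = TensorDataset(torch.from_numpy(X).float(), torch.from_numpy(y).long())

# 90% train / 10% test (small dataset): 135 / 15 of the 150 samples
train_size = int(0.9 * len(full_dataset))
test_size = len(full_dataset) - train_size
# Seeded generator so the split is identical on every run
train_dataset, test_dataset = random_split(
    full_dataset, [train_size, test_size], generator=torch.Generator().manual_seed(42)
)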

batch_size = 5
train_loader = DataLoader(
    dataset=train_dataset,
    batch_size=batch_size,
    shuffle=True,
    num_workers=0,
)
test_loader = DataLoader(
    dataset=test_dataset,
    batch_size=batch_size,
    shuffle=False,
    num_workers=0,
)
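With 150 samples, the 90/10 split gives 135 training and 15 test samples, i.e. 27 training batches and 3 test batches of size 5. The sketch below checks this arithmetic on a synthetic stand-in dataset (random features, not the Iris data) and shows one way to make `random_split` reproducible by passing a seeded `torch.Generator`; the seed value 42 is an arbitrary assumption.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader, random_split

# Synthetic stand-in with the Iris shape: 150 samples, 4 features, 3 classes.
X = torch.randn(150, 4)
y = torch.arange(150) % 3
full_dataset = TensorDataset(X, y)

train_size = int(0.9 * len(full_dataset))    # 135
test_size = len(full_dataset) - train_size   # 15

# A seeded generator makes the split identical across runs.
train_dataset, test_dataset = random_split(
    full_dataset, [train_size, test_size],
    generator=torch.Generator().manual_seed(42),
)

train_loader = DataLoader(train_dataset, batch_size=5, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=5, shuffle=False)

print(len(train_dataset), len(test_dataset))  # 135 15
print(len(train_loader), len(test_loader))    # 27 3

# Re-running the split with the same seed selects the same test indices.
_, test_again = random_split(
    full_dataset, [train_size, test_size],
    generator=torch.Generator().manual_seed(42),
)
print(test_dataset.indices == test_again.indices)  # True
```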


In [115]:
class FeedForwardNeuralNetwork(nn.Module):
    """Feedforward NN with configurable layers; ReLU in hidden layers, linear output (softmax in loss)."""

    def __init__(self, input_size, output_size, number_layers, size_layers):
        super().__init__()
        self.weights = nn.ParameterList()
        self.bias = nn.ParameterList()
        self.number_layers = number_layers
        self.size_layers = size_layers
        # Layer widths: input -> hidden layers -> output, e.g. [4, 6, 8, 10, 3]
        dims = [input_size, *size_layers[:number_layers], output_size]
        for fan_in, fan_out in zip(dims[:-1], dims[1:]):
            # W has shape (fan_in, fan_out) so that forward() computes Z @ W + b
            self.weights.append(nn.Parameter(torch.randn(fan_in, fan_out)))
            self.bias.append(nn.Parameter(torch.randn(fan_out)))

    def forward(self, data):
        Z = data
        for i in range(len(self.weights)):
            Z = torch.matmul(Z, self.weights[i]) + self.bias[i]
            if i < len(self.weights) - 1:
                Z = F.relu(Z)
        return Z
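Written out, `forward` applies the following recursion to a batch $X \in \mathbb{R}^{B \times 4}$ (rows are samples, matching `torch.matmul(Z, W)`), with $L = 3$ hidden layers and widths $d_0 = 4$, $(d_1, d_2, d_3) = (6, 8, 10)$, $d_4 = 3$. The notation is ours, not the notebook's: $W^{(l)}$ and $b^{(l)}$ correspond to `self.weights[l-1]` and `self.bias[l-1]`, and $\mathbf{1} \in \mathbb{R}^{B}$ makes the broadcast bias addition explicit.

$$
\begin{aligned}
Z^{(0)} &= X, \\
Z^{(l)} &= \operatorname{ReLU}\!\left(Z^{(l-1)} W^{(l)} + \mathbf{1}\, b^{(l)\top}\right), \quad W^{(l)} \in \mathbb{R}^{d_{l-1} \times d_l},\; b^{(l)} \in \mathbb{R}^{d_l},\; l = 1, \dots, L, \\
Z^{(L+1)} &= Z^{(L)} W^{(L+1)} + \mathbf{1}\, b^{(L+1)\top} \in \mathbb{R}^{B \times 3}.
\end{aligned}
$$

The last layer has no ReLU, so $Z^{(L+1)}$ holds the raw logits passed to the loss.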

In [116]:
# --- Instantiate model: 4 inputs, 3 outputs, 3 hidden layers (6, 8, 10) ---
torch.manual_seed(42)
model0 = FeedForwardNeuralNetwork(4, 3, 3, [6, 8, 10])
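The custom layers compute `Z @ W + b` with `W` of shape `(fan_in, fan_out)`. PyTorch's `nn.Linear` stores its weight as `(out_features, in_features)` and computes `x @ weight.T + bias`, so one of the notebook's layers corresponds to an `nn.Linear` whose weight is the transpose. The check below uses a standalone 4 → 6 layer drawn the way the notebook draws its parameters (standard-normal entries); the tensor names are illustrative. One practical difference worth keeping in mind: `nn.Linear` initializes weights and biases uniformly in $[-1/\sqrt{\text{in\_features}},\, 1/\sqrt{\text{in\_features}}]$, whereas `torch.randn` draws unit-variance entries regardless of layer width.

```python
import torch
from torch import nn

torch.manual_seed(0)

# A layer in the notebook's style: W has shape (fan_in, fan_out), b has shape (fan_out,).
W = torch.randn(4, 6)
b = torch.randn(6)

# Equivalent nn.Linear: its weight has shape (out_features, in_features) = (6, 4).
linear = nn.Linear(4, 6)
with torch.no_grad():
    linear.weight.copy_(W.T)
    linear.bias.copy_(b)

x = torch.randn(5, 4)  # a batch of 5 samples
manual = torch.matmul(x, W) + b
builtin = linear(x)

print(linear.weight.shape)                         # torch.Size([6, 4])
print(torch.allclose(manual, builtin, atol=1e-5))  # True
```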


In [117]:
# Sanity check: inspect initialized parameters
model0.state_dict()

OrderedDict([('weights.0',
              tensor([[ 1.9269,  1.4873,  0.9007, -2.1055,  0.6784, -1.2345],
                      [-0.0431, -1.6047,  0.3559, -0.6866, -0.4934,  0.2415],
                      [-1.1109,  0.0915, -2.3169, -0.2168, -0.3097, -0.3957],
                      [ 0.8034, -0.6216, -0.5920, -0.0631, -0.8286,  0.3309]])),
             ('weights.1',
              tensor([[-0.4880,  1.1914, -0.8140, -0.7360, -0.8371, -0.9224,  1.8113,  0.1606],
                      [-0.0978,  1.8446, -1.1845,  1.3835, -1.2024,  0.7078, -1.0759,  0.5357],
                      [ 0.3466, -0.1973, -1.0546,  1.2780,  0.1453,  0.2311,  0.0087, -0.1423],
                      [ 0.5750, -0.6417, -2.2064, -0.7508,  2.8140,  0.3598, -0.0898,  0.4584],
                      [ 0.5362,  0.5246,  1.1412,  0.0516,  0.7281, -0.7106, -0.6021,  0.9604],
                      [-1.7223, -0.8278,  1.3347,  0.4835, -0.1976,  1.2683,  1.2243,  0.0981]])),
             ('weights.2',
              tensor([[ 0

---

## 2. Training and Testing

We train the network and evaluate it on the held-out test set.

**Setup:**

- **Loss:** Cross-entropy (`nn.CrossEntropyLoss`, which applies log-softmax to the raw logits internally).
- **Optimizer:** `torch.optim.SGD` with learning rate `0.01`.
- **Epochs:** 200.
- **Metrics:** Train/test loss and accuracy (logged every 10 epochs).
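For a logit vector $z \in \mathbb{R}^3$ and true class $y$, `nn.CrossEntropyLoss` computes $-\log \operatorname{softmax}(z)_y = -z_y + \log \sum_k e^{z_k}$, averaged over the batch by default. The sketch below checks this against `F.log_softmax` followed by `F.nll_loss`, and against the formula written out by hand; the logits and labels are made-up values.

```python
import torch
from torch import nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])  # 2 samples, 3 classes
targets = torch.tensor([0, 2])            # integer class indices

ce = nn.CrossEntropyLoss()(logits, targets)

# Same value via explicit log-softmax + negative log-likelihood.
via_nll = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# Same value by hand: -z_y + logsumexp(z), averaged over the batch.
by_hand = (-logits[torch.arange(2), targets]
           + torch.logsumexp(logits, dim=1)).mean()

print(torch.allclose(ce, via_nll), torch.allclose(ce, by_hand))  # True True
```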

In [118]:
# Loss (CrossEntropyLoss applies log-softmax to the raw logits internally) and optimizer
loss_f = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(params=model0.parameters(), lr=0.01)
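With its defaults (no momentum, no weight decay), SGD updates each parameter as $\theta \leftarrow \theta - \eta\, \nabla_\theta \mathcal{L}$, here with $\eta = 0.01$. The sketch below checks that one `optimizer.step()` matches this rule on a single toy parameter; the tensor and the loss are illustrative only.

```python
import torch

theta = torch.nn.Parameter(torch.tensor([1.0, -2.0, 3.0]))
optimizer = torch.optim.SGD([theta], lr=0.01)

loss = (theta ** 2).sum()  # gradient is 2 * theta
optimizer.zero_grad()
loss.backward()

# Manual SGD update computed before the step (a new tensor, not a view).
expected = theta.detach() - 0.01 * theta.grad
optimizer.step()

print(theta.detach())  # approximately [0.98, -1.96, 2.94]
print(torch.allclose(theta.detach(), expected))  # True
```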

In [127]:
# --- Training loop (200 epochs) ---
torch.manual_seed(42)

epochs = 200
train_loss_values = []
test_loss_values = []
epoch_count = []

for epoch in range(epochs):
    # Training
    model0.train()
    train_loss = 0
    train_acc = 0
    train_total = 0

    # Loop through training batches
    for X_batch, y_batch in train_loader:
        # 1. Forward pass
        y_pred = model0(X_batch)

        # 2. Calculate loss
        loss = loss_f(y_pred, y_batch)

        # 3. Zero gradients
        optimizer.zero_grad()

        # 4. Backpropagation
        loss.backward()

        # 5. Update parameters
        optimizer.step()

        # Accumulate batch loss
        train_loss += loss.item()

        # Predicted class = argmax of the logits (same as argmax of the softmax probabilities)
        predicted = y_pred.argmax(dim=1)
        train_acc += (predicted == y_batch).sum().item()
        train_total += y_batch.size(0)

    # Average training loss (over batches) and accuracy (over samples) for the epoch
    train_loss = train_loss / len(train_loader)
    train_acc /= train_total

    # Evaluation on test set
    model0.eval()
    test_loss = 0
    test_acc = 0
    test_total = 0
    with torch.inference_mode():
        # Loop through test batches
        for X_batch, y_batch in test_loader:
            # 1. Forward pass on test data
            test_pred = model0(X_batch)

            # 2. Calculate test loss
            batch_test_loss = loss_f(test_pred, y_batch)

            # Accumulate batch test loss
            test_loss += batch_test_loss.item()

            # Predicted class = argmax of the logits (same as argmax of the softmax probabilities)
            predicted = test_pred.argmax(dim=1)
            test_acc += (predicted == y_batch).sum().item()
            test_total += y_batch.size(0)

        # Average test loss (over batches) and accuracy (over samples) for the epoch
        test_loss = test_loss / len(test_loader)
        test_acc /= test_total

    # Record losses and print progress every 10 epochs
    if epoch % 10 == 0:
        epoch_count.append(epoch)
        train_loss_values.append(train_loss)
        test_loss_values.append(test_loss)
        print(f"Epoch: {epoch} | Train Loss: {train_loss:.5f}| Test Loss: {test_loss:.5f}")
        print(f"Train acc: {train_acc} | Test acc: {test_acc}")

Epoch: 0 | Train Loss: 0.06046| Test Loss: 0.07228
Train acc: 0.9703703703703703 | Test acc: 0.9333333333333333
Epoch: 10 | Train Loss: 0.06365| Test Loss: 0.01664
Train acc: 0.9703703703703703 | Test acc: 1.0
Epoch: 20 | Train Loss: 0.06075| Test Loss: 0.03326
Train acc: 0.9703703703703703 | Test acc: 1.0
Epoch: 30 | Train Loss: 0.05966| Test Loss: 0.02613
Train acc: 0.9777777777777777 | Test acc: 1.0
Epoch: 40 | Train Loss: 0.05768| Test Loss: 0.01355
Train acc: 0.9777777777777777 | Test acc: 1.0
Epoch: 50 | Train Loss: 0.08195| Test Loss: 0.03703
Train acc: 0.9555555555555556 | Test acc: 1.0
Epoch: 60 | Train Loss: 0.05170| Test Loss: 0.01182
Train acc: 0.9851851851851852 | Test acc: 1.0
Epoch: 70 | Train Loss: 0.06519| Test Loss: 0.03497
Train acc: 0.9703703703703703 | Test acc: 1.0
Epoch: 80 | Train Loss: 0.06684| Test Loss: 0.02449
Train acc: 0.9629629629629629 | Test acc: 1.0
Epoch: 90 | Train Loss: 0.05581| Test Loss: 0.02706
Train acc: 0.9851851851851852 | Test acc: 1.0
Epoch:
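The accuracy code takes `argmax` directly over the logits. Every softmax output shares the denominator $\sum_k e^{z_k}$ and $e^{z}$ is strictly increasing, so softmax preserves the ordering of the logits and `argmax(softmax(z)) == argmax(z)`. A quick check on random logits and labels (illustrative values only):

```python
import torch

torch.manual_seed(0)
logits = torch.randn(1000, 3)          # 1000 samples, 3 classes
labels = torch.randint(0, 3, (1000,))  # random "true" classes

pred_from_logits = logits.argmax(dim=1)
pred_from_probs = torch.softmax(logits, dim=1).argmax(dim=1)

print(torch.equal(pred_from_logits, pred_from_probs))  # True

# Accuracy as in the training loop: correct predictions / number of samples.
accuracy = (pred_from_logits == labels).sum().item() / labels.size(0)
print(0.0 <= accuracy <= 1.0)  # True
```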

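**Summary:** Over the logged epochs the network reaches between 95.6% and 98.5% training accuracy and between 93.3% and 100% test accuracy. These figures should be read with care. The test set holds only 15 samples, so a single misclassification moves test accuracy by about 6.7 percentage points. In addition, the low loss and high accuracy already at epoch 0, together with the execution counter (`In [127]`, several executions after the optimizer cell `In [118]`), suggest the training cell was re-run on an already-trained model, so the log likely reflects continued training rather than training from scratch. A cross-validation scheme such as (stratified) k-fold would give a clearer picture of generalization than a single 15-sample test split.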
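Two numbers make the caveat about dataset size concrete: the 10% test split holds 15 of the 150 samples, so attainable test accuracies are multiples of $1/15 \approx 0.0667$ (e.g. $14/15 \approx 0.9333$, the epoch-0 value above). One common validation strategy is k-fold cross-validation, where every sample is used for testing exactly once. The sketch below only builds shuffled fold indices in plain Python; the helper name `k_fold_indices`, the seed, and $k = 5$ are assumptions, it does not stratify by class (scikit-learn's `StratifiedKFold` does), and training the network on each fold is left out.

```python
import random


def k_fold_indices(n_samples, k, seed=42):
    """Hypothetical helper: yield (train_idx, test_idx) for k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Distribute samples round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Attainable accuracies on a 15-sample test set (step 1/15).
steps = [c / 15 for c in range(16)]
print(round(steps[14], 4))  # 0.9333

splits = list(k_fold_indices(150, k=5))
print([len(test) for _, test in splits])  # [30, 30, 30, 30, 30]
```

With 5 folds, each evaluation uses 30 test samples, and the 5 fold accuracies can be averaged.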