## (A) 🧮 Optional: A single neuron

- A1: Neuron implementation: for loops and Python lists (as on the whiteboard in lesson 1), ***alternatively***
- A2: Neuron implementation: NumPy vector multiplication inside each Neuron object

In [1]:
def neuron(inputs, weights, bias):

    # Initialize output
    output = 0

    # Calculate output
    for i, w in zip(inputs, weights):
        output += i * w

    # Add bias  
    output += bias

    # Apply activation function (ReLU)
    output = max(0, output)

    return output

In [2]:
import numpy as np

def neuron(inputs, weights, bias):

    # Weighted sum of the inputs plus bias, computed as a single dot product
    output = np.dot(inputs, weights) + bias

    # Apply activation function (ReLU)
    output = np.maximum(0, output)

    return output
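
As a quick sanity check, the two variants (A1 and A2) can be compared on a small hand-computed example. This is only a sketch: the names `neuron_loop` and `neuron_numpy` and the input values are made up for illustration, so that both variants can live side by side in one snippet.

```python
import numpy as np

def neuron_loop(inputs, weights, bias):
    # A1: plain Python loop over (input, weight) pairs
    total = bias
    for x, w in zip(inputs, weights):
        total += x * w
    return max(0, total)  # ReLU

def neuron_numpy(inputs, weights, bias):
    # A2: the same weighted sum as one dot product
    return np.maximum(0, np.dot(inputs, weights) + bias)

inputs = [1.0, 2.0, 3.0]
weights = [0.5, -1.0, 0.25]

# Weighted sum: 1.0*0.5 + 2.0*(-1.0) + 3.0*0.25 = -0.75
# With bias 1.0 the pre-activation is 0.25, so ReLU passes it through.
# With bias 0.5 the pre-activation is -0.25, so ReLU clips it to 0.
for bias in (1.0, 0.5):
    print(bias, neuron_loop(inputs, weights, bias), neuron_numpy(inputs, weights, bias))
```

Both variants should agree on every input; the NumPy version simply moves the loop into compiled code.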

## (B) ✅ ANN layer: NumPy version

This means we no longer need a Neuron class, because we compute an entire layer as a single large matrix multiplication:

- All inputs to a layer = one NumPy vector
- All weights for all neurons in a layer = one NumPy matrix
- Note that we will not train the network implemented as a NumPy computation, because training becomes much easier in (C), where we switch to PyTorch.

In [3]:
import numpy as np

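def layer(inputs, weights, bias):
    # weights has shape (n_neurons, n_inputs); bias has shape (n_neurons,)
    # One matrix-vector product computes the weighted sums of all neurons at once
    outputs = np.dot(weights, inputs) + bias

    # Apply activation function (ReLU) element-wise
    return np.maximum(0, outputs)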
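
The claim that a whole layer reduces to one matrix multiplication can be checked with a small sketch. The names `layer_loop` and `layer_matmul`, the layer size (3 neurons, 4 inputs) and the random values are assumptions chosen for illustration; the weight matrix is assumed to hold one row per neuron.

```python
import numpy as np

def layer_loop(inputs, weights, bias):
    # One dot product per neuron, i.e. per row of the weight matrix
    return [max(0.0, float(np.dot(inputs, w) + b)) for w, b in zip(weights, bias)]

def layer_matmul(inputs, weights, bias):
    # All neurons at once: (n_neurons, n_inputs) @ (n_inputs,) -> (n_neurons,)
    return np.maximum(0, weights @ inputs + bias)

rng = np.random.default_rng(0)
inputs = rng.normal(size=4)        # 4 input values
weights = rng.normal(size=(3, 4))  # 3 neurons with 4 weights each
bias = rng.normal(size=3)          # one bias per neuron

out_loop = layer_loop(inputs, weights, bias)
out_matmul = layer_matmul(inputs, weights, bias)
print(np.allclose(out_loop, out_matmul))
```

Row `i` of the matrix product is exactly the dot product that neuron `i` would compute on its own, which is why the per-neuron loop becomes unnecessary.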

## (C) ✅ ANN layer: PyTorch version

- Use PyTorch 2.1 (or later), preferably with Python 3.10 (or later).
- First connect all the layers of the perceptron so that you get a PyTorch model (a.k.a. module). The model defines the compute graph of your perceptron in detail.
- Then use your perceptron through PyTorch. Search the web yourself for how this works in practice; there is plenty of information about PyTorch online!
- This version must also train the network, i.e. we loop over epochs and apply back-prop. The gradients from back-prop are usually fed to an optimizer called Adam, a refinement of plain gradient descent that is both fast and less prone to getting stuck in poor local minima.
- See the section “Tips for (C)” below.
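
To make the role of Adam more concrete, here is a minimal sketch of its update rule in plain Python, applied to a one-dimensional toy problem. The function `adam_minimize`, the toy objective f(x) = (x - 3)^2 and the step count are assumptions for illustration. The values of `beta1`, `beta2` and `eps` are the commonly used defaults, while `lr=0.1` is larger than the usual default so that the toy problem converges quickly. In the real task, `torch.optim.Adam` does this for every weight in the network.

```python
import math

def adam_minimize(grad, x, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    m = 0.0  # running average of gradients (momentum)
    v = 0.0  # running average of squared gradients (per-parameter scaling)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero initialisation
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Toy objective f(x) = (x - 3)^2 has gradient 2 * (x - 3) and its minimum at x = 3
x_min = adam_minimize(lambda x: 2 * (x - 3), x=0.0)
print(x_min)
```

Back-prop only supplies the gradient `g`; the optimizer decides how to turn that gradient into a weight update.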

In [4]:
# Define the perceptron neural-network model

import torch
import torch.nn as nn

class Perceptron(nn.Module):
    # Define the constructor
    def __init__(self):
        super().__init__()

        # Flatten the input
        self.flatten = nn.Flatten()

        # Define the layers with ReLU activation function
        self.linear_relu_stack = nn.Sequential(
            # Input layer   
            nn.Linear(28*28, 512),
            nn.ReLU(),

            # Hidden layer
            nn.Linear(512, 512),
            nn.ReLU(),

            # Output layer
            nn.Linear(512, 10),
        )

    # Define the forward pass
    def forward(self, x):
        # Flatten the input
        x = self.flatten(x)

        # Pass through the layers
        logits = self.linear_relu_stack(x)
        return logits
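
A quick way to check such a model before training is to push a fake batch through it and inspect the shapes. The sketch below uses a compact `nn.Sequential` stand-in with the same layer sizes, so that it runs on its own; the stand-in, the batch size of 4 and the random input are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Stand-in with the same layer sizes as the Perceptron class (illustration only)
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
)

x = torch.rand(4, 1, 28, 28)           # fake batch: 4 grayscale 28x28 images
logits = model(x)                      # raw scores, one row per image
probs = torch.softmax(logits, dim=1)   # turn scores into probabilities per class
predicted = probs.argmax(dim=1)        # most likely digit for each image

n_params = sum(p.numel() for p in model.parameters())
print(logits.shape, predicted.shape, n_params)
```

The model returns logits rather than probabilities; softmax is only needed when the scores should be read as probabilities.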

In [None]:
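# Set PyTorch device
# Note: torch.accelerator only exists in recent PyTorch releases; on older versions,
# `"cuda" if torch.cuda.is_available() else "cpu"` is the usual fallback.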

device = torch.accelerator.current_accelerator().type if torch.accelerator.is_available() else "cpu"
print(f"Using {device} device\n")

model = Perceptron().to(device)
print(model)

Using cuda device

Perceptron(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)


In [7]:
# Download MNIST dataset and create dataloaders

from torchvision import datasets
from torchvision.transforms import ToTensor
from torch.utils.data import DataLoader

train_data = datasets.MNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor()
)

test_data = datasets.MNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor()
)

train_dataloader = DataLoader(train_data, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=64, shuffle=False)



In [8]:
# Train the network using back-propagation with the Adam optimizer

import torch.optim as optim

# Define hyperparameters
learning_rate = 0.001
num_epochs = 5

# Define loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# Training loop
def train():
    model.train()
    train_loss = 0
    correct = 0
    total = 0
    
    for batch, (X, y) in enumerate(train_dataloader):
        X, y = X.to(device), y.to(device)
        
        # Forward pass
        pred = model(X)
        loss = loss_fn(pred, y)
        
        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        train_loss += loss.item()
        predicted = pred.argmax(dim=1)
        total += y.size(0)
        correct += (predicted == y).sum().item()
        
        if batch % 100 == 0:
            current = (batch + 1) * len(X)
            print(f"loss: {loss.item():>7f}  [{current:>5d}/{len(train_dataloader.dataset):>5d}]")
    
    print(f"Train Accuracy: {100 * correct / total:>0.1f}%")
    return train_loss / len(train_dataloader)

# Test loop
def test():
    model.eval()
    test_loss = 0
    correct = 0
    total = 0
    
    with torch.no_grad():
        for X, y in test_dataloader:
            X, y = X.to(device), y.to(device)
            pred = model(X)
            
            test_loss += loss_fn(pred, y).item()
            predicted = pred.argmax(dim=1)
            total += y.size(0)
            correct += (predicted == y).sum().item()
    
    test_loss /= len(test_dataloader)
    print(f"Test Error: \n Accuracy: {(100 * correct / total):>0.1f}%, Avg loss: {test_loss:>8f}")
    return test_loss

# Train the model
for epoch in range(num_epochs):
    print(f"Epoch {epoch+1}\n-------------------------------")
    train_loss = train()
    test_loss = test()
    
print("Training complete!")

Epoch 1
-------------------------------
loss: 2.306261  [   64/60000]
loss: 0.420665  [ 6464/60000]
loss: 0.183054  [12864/60000]
loss: 0.250991  [19264/60000]
loss: 0.082767  [25664/60000]
loss: 0.144676  [32064/60000]
loss: 0.171033  [38464/60000]
loss: 0.453433  [44864/60000]
loss: 0.123790  [51264/60000]
loss: 0.130486  [57664/60000]
Train Accuracy: 93.4%
Test Error: 
 Accuracy: 96.9%, Avg loss: 0.104069
Epoch 2
-------------------------------
loss: 0.092852  [   64/60000]
loss: 0.035435  [ 6464/60000]
loss: 0.040061  [12864/60000]
loss: 0.020085  [19264/60000]
loss: 0.147745  [25664/60000]
loss: 0.013643  [32064/60000]
loss: 0.021708  [38464/60000]
loss: 0.012838  [44864/60000]
loss: 0.094302  [51264/60000]
loss: 0.047970  [57664/60000]
Train Accuracy: 97.3%
Test Error: 
 Accuracy: 97.3%, Avg loss: 0.085493
Epoch 3
-------------------------------
loss: 0.068880  [   64/60000]
loss: 0.040552  [ 6464/60000]
loss: 0.052918  [12864/60000]
loss: 0.149939  [19264/60000]
loss: 0.049010  
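
The loss values printed during training come from `nn.CrossEntropyLoss`, which takes raw logits and applies log-softmax internally. The following sketch reproduces the same number by hand; the random logits and the labels are stand-ins chosen for illustration, not real model output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 10)            # stand-in for model output: 4 samples, 10 classes
targets = torch.tensor([3, 0, 9, 1])   # stand-in class labels

# Built-in loss, as used in the training loop
loss_builtin = nn.CrossEntropyLoss()(logits, targets)

# Same value by hand: mean negative log-probability of the correct class
log_probs = torch.log_softmax(logits, dim=1)
loss_manual = -log_probs[torch.arange(4), targets].mean()

print(loss_builtin.item(), loss_manual.item())
```

This is why the output layer of the model has no softmax: adding one before `nn.CrossEntropyLoss` would apply the normalisation twice.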

## (D) ✅ Same as (C), but executed on a CUDA GPU

- The GPU must support CUDA v11.6 or later, which corresponds to a GPU from NVIDIA's Pascal generation or later (examples of Pascal cards: GeForce GTX 1080, Quadro P5000, Tesla P100; later generations: Volta, Turing, Ampere, Ada, Hopper, Blackwell).
- Google Colab offers cheap/free notebook instances with an NVIDIA T4 GPU, a basic type of Turing GPU. It works very well for this task, but if you have a modern NVIDIA GPU in your own computer, it is probably faster than a T4.

In [None]:
# Set device to cuda if available (already done previously for the sake of speedy training).

device = torch.accelerator.current_accelerator().type if torch.accelerator.is_available() else "cpu"
print(f"Using {device} device\n")
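
To see which GPU PyTorch actually found, and whether it meets the Pascal-or-later requirement, a small check along these lines may help. The helper `cuda_status` is hypothetical; the threshold `(6, 0)` assumes that Pascal corresponds to CUDA compute capability 6.x.

```python
import torch

def cuda_status(min_capability=(6, 0)):
    """Describe the visible CUDA GPU, if any (hypothetical helper)."""
    if not torch.cuda.is_available():
        return "No CUDA GPU visible, falling back to CPU"
    name = torch.cuda.get_device_name(0)
    capability = torch.cuda.get_device_capability(0)  # (major, minor), e.g. (7, 5) on a T4
    verdict = "OK" if capability >= min_capability else "older than Pascal"
    return f"{name}: compute capability {capability[0]}.{capability[1]} ({verdict})"

status = cuda_status()
print(status)
```

On a machine without an NVIDIA GPU the function simply reports the CPU fallback, so it is safe to run anywhere.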