### Lecture 7 - Automatic differentiation and optimization
Assignment: Gradients and optimizers

Instructions:

- Use PyTorch or JAX
- Keep examples small and explain with comments

### Task 1: Autodiff basics
Use automatic differentiation and compare with an analytic derivative.

In [1]:
# TODO: Define f(x) = x**3 + 2*x
def f(x):
    return (x**3 + 2*x)

print(f(3))

33


In [2]:
# TODO: Use autodiff to compute df/dx at x=3
import torch

x = torch.tensor(3.0, requires_grad=True)
# Reuse f from the previous cell; it works on tensors as well as on plain numbers.
g = f(x)
g.backward()
print("Autodiff df/dx at x=3:", x.grad.item())

Autodiff df/dx at x=3: 29.0
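
One detail worth seeing next to the `backward()` call above: PyTorch adds each new gradient to `x.grad` instead of overwriting it, which is why training loops reset gradients on every step. A minimal sketch, reusing the same `f`; the variable names `first`, `second` and `third` are chosen only for illustration:

```python
import torch

def f(x):
    return x**3 + 2 * x

x = torch.tensor(3.0, requires_grad=True)

f(x).backward()
first = x.grad.item()   # gradient from one backward pass

f(x).backward()
second = x.grad.item()  # the second pass was added on top of the first

x.grad.zero_()          # reset the accumulated gradient in place
f(x).backward()
third = x.grad.item()   # back to the gradient of a single pass

print(first, second, third)
```

In a training loop, `optimizer.zero_grad()` plays the role of `x.grad.zero_()` for all model parameters.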


In [3]:
# TODO: Compare with the analytic derivative

def f_prim(x):
    return 3 * x**2 + 2

print(f_prim(3))

29
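
A third, independent check is a numerical estimate. The sketch below uses a central finite difference; the helper name `numeric_derivative` and the step size `h=1e-5` are assumptions chosen for illustration and are not part of the assignment.

```python
def f(x):
    return x**3 + 2 * x

def f_prim(x):
    # Analytic derivative of f
    return 3 * x**2 + 2

def numeric_derivative(func, x, h=1e-5):
    # Central difference: (f(x + h) - f(x - h)) / (2h)
    return (func(x + h) - func(x - h)) / (2 * h)

approx = numeric_derivative(f, 3.0)
exact = f_prim(3.0)
print(f"numeric: {approx:.6f}, analytic: {exact:.6f}, abs diff: {abs(approx - exact):.2e}")
```

The numerical value only approximates the derivative, so it should be compared with a tolerance rather than with `==`.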


### Task 2: Optimizer comparison
Train a small model and compare optimizers.
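
Before training a real model, it can help to see what the two optimizers do on a single parameter. The sketch below writes out the plain SGD step and the standard Adam step by hand for a toy loss `(w - 3)**2`. The toy loss, the function names `sgd` and `adam`, and the step count are assumptions made for illustration; the Adam constants are the commonly used defaults.

```python
import math

def grad(w):
    # Gradient of the toy loss L(w) = (w - 3)**2, minimized at w = 3
    return 2.0 * (w - 3.0)

def sgd(w, lr=0.1, steps=200):
    # Plain gradient descent: step against the gradient, scaled by lr
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def adam(w, lr=0.1, steps=200, beta1=0.9, beta2=0.999, eps=1e-8):
    m, v = 0.0, 0.0  # running averages of the gradient and the squared gradient
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1**t)  # bias correction for the zero initialization
        v_hat = v / (1 - beta2**t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

w_sgd = sgd(0.0)
w_adam = adam(0.0)
print(f"SGD  -> w = {w_sgd:.4f}")
print(f"Adam -> w = {w_adam:.4f}")
```

The SGD step size depends directly on the gradient's magnitude, while Adam divides by a running estimate of that magnitude. This is one reason the same learning rate rarely suits both optimizers.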

In [None]:
# TODO: Train a small model (e.g., logistic regression)
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

import torch
import torch.nn as nn
import torch.optim as optim

# We start by creating a small synthetic classification dataset. 
# For that, we use sklearn’s make_classification
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=4, n_redundant=0, random_state=42
)

# Split before scaling. If the scaler is fitted on the full dataset, the test
# set's mean and variance leak into training. This is called data leakage:
# the test scores look better than the model's real ability to generalize.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

X_train_t = torch.tensor(X_train, dtype=torch.float32)
X_test_t = torch.tensor(X_test, dtype=torch.float32)
y_train_t = torch.tensor(y_train, dtype=torch.float32).view(-1, 1)
y_test_t = torch.tensor(y_test, dtype=torch.float32).view(-1, 1)

In [None]:
# TODO: Compare SGD vs Adam for 20-50 epochs

def train_with_optimizer(optimizer_cls, lr=0.01, epochs=30):
    model = nn.Sequential(nn.Linear(X_train.shape[1], 1), nn.Sigmoid())
    # Binary cross-entropy, because there are only two classes; with more
    # classes we would use categorical cross-entropy.
    loss_fn = nn.BCELoss()
    optimizer = optimizer_cls(model.parameters(), lr=lr)

    # The loop variable is named _ because its value is never used.
    # This is only a naming convention; it does not save memory.
    for _ in range(epochs):
        model.train()
        preds = model(X_train_t)
        loss = loss_fn(preds, y_train_t)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    model.eval()
    with torch.no_grad():
        test_probs = model(X_test_t)  # forward pass once, reused below
        test_preds = (test_probs > 0.5).float()
        acc = (test_preds == y_test_t).float().mean().item()
        test_loss = loss_fn(test_probs, y_test_t).item()
    return test_loss, acc

sgd_loss, sgd_acc = train_with_optimizer(optim.SGD, lr=0.1, epochs=40)
adam_loss, adam_acc = train_with_optimizer(optim.Adam, lr=0.01, epochs=40)

In [None]:
# TODO: Record final loss and accuracy

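# How do we interpret the results?
# Accuracy: the proportion of test samples classified correctly (a value between 0 and 1, i.e. 0–100%).
# Loss: how far the predicted probabilities are from the true labels (0 to ∞, lower is better; useful for comparing models on the same problem).
# Note: the test set has 100 samples, so a 0.01 difference in accuracy is a single sample.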
print(f"SGD -> loss: {sgd_loss:.4f}, acc: {sgd_acc:.4f}")
print(f"Adam -> loss: {adam_loss:.4f}, acc: {adam_acc:.4f}")

SGD -> loss: 0.6475, acc: 0.6300
Adam -> loss: 0.6635, acc: 0.6200
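
To make the two metrics concrete, here is a small hand computation of binary cross-entropy and accuracy. The four probabilities and labels are made-up illustration values, not outputs of the model above.

```python
import math

probs = [0.9, 0.6, 0.4, 0.2]  # hypothetical predicted probabilities of class 1
labels = [1, 1, 1, 0]         # hypothetical true labels

# Binary cross-entropy: mean of -[y*log(p) + (1-y)*log(1-p)]
bce = -sum(
    y * math.log(p) + (1 - y) * math.log(1 - p)
    for p, y in zip(probs, labels)
) / len(labels)

# Accuracy: threshold at 0.5, then count matches
preds = [1 if p > 0.5 else 0 for p in probs]
acc = sum(int(pred == y) for pred, y in zip(preds, labels)) / len(labels)

print(f"loss: {bce:.4f}, acc: {acc:.2f}")
```

For reference, a model that always predicts 0.5 has a binary cross-entropy of ln 2 ≈ 0.693, which is a useful baseline when reading loss values.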
