# Lab 4 (Student Version)

Work in groups of up to **2** students, but each student should submit their own work.

Put the full names of everyone in your group here.

**Names**:
- 


## Part A: Train an XOR network using PyTorch

**Goal:** Use PyTorch to train a neural network that correctly predicts XOR for all 4 inputs (**4 out of 4 correct**).

**Requirements**
- Use a **Binary Cross Entropy** loss (e.g., `nn.BCELoss()` or `nn.BCEWithLogitsLoss()`).
- You must **train** the network (do not manually set weights).
- Show the final predictions and the final accuracy.

**What to turn in**
- Your code.
- A short answer (2–6 sentences) explaining what architecture + training setup you used, and what finally worked.


In [None]:
# Imports
import torch
import torch.nn as nn


### A1) Create the XOR dataset

Fill in the XOR input points and labels.


In [None]:
# TODO: define the XOR inputs (shape: 4 x 2) as float tensors
X = torch.tensor([
    # [0., 0.],
    # [1., 0.],
    # [0., 1.],
    # [1., 1.],
])

# TODO: define the XOR labels (shape: 4 x 1) as float tensors
y_true = torch.tensor([
    # [0.],
    # [1.],
    # [1.],
    # [0.],
])


### A2) Define your model

Hints:
- XOR is not linearly separable, so you need **at least one hidden layer** and a **nonlinearity** (ReLU, Tanh, etc.).
- If you use `nn.BCELoss()`, the model output must be a probability in (0, 1), so end the model with `nn.Sigmoid()`.
- If you use `nn.BCEWithLogitsLoss()`, **do not** put `nn.Sigmoid()` in the model: this loss applies the sigmoid internally and expects raw logits.
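
To see why the two loss options are interchangeable, here is a small sketch that compares them on made-up values. The `logits` and `targets` tensors are hypothetical and are not outputs of your model.

```python
import torch
import torch.nn as nn

# Hypothetical raw model outputs (logits) and targets, for illustration only.
logits = torch.tensor([[-2.0], [0.5], [3.0], [-0.1]])
targets = torch.tensor([[0.0], [1.0], [1.0], [0.0]])

# Option 1: apply the sigmoid yourself, then use BCELoss on probabilities.
probs = torch.sigmoid(logits)
loss_from_probs = nn.BCELoss()(probs, targets)

# Option 2: pass raw logits to BCEWithLogitsLoss, which applies the sigmoid internally.
loss_from_logits = nn.BCEWithLogitsLoss()(logits, targets)

print(loss_from_probs.item(), loss_from_logits.item())

# The two values should agree up to floating-point error.
same_loss = torch.allclose(loss_from_probs, loss_from_logits, atol=1e-6)
print("same loss:", same_loss)
```

Applying a sigmoid in the model *and* using `nn.BCEWithLogitsLoss()` would squash the output twice, which is why the two options should not be mixed.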


In [None]:
# TODO: build a small network for XOR
# Example structure (edit it): Linear -> activation -> Linear -> (Sigmoid)

d_in = 2
d_hidden = None  # TODO: choose a hidden size (e.g., 2, 4, 8, ...)
d_out = 1

model = nn.Sequential(
    # TODO: add layers
)

model


### A3) Train the model

Requirements:
- Choose an optimizer (SGD or Adam).
- Train long enough to get 4/4 correct.
- Print (or store) the loss occasionally so we can see training progress.

Tip: If SGD is unstable, try Adam, or change the learning rate.
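
As a rough illustration of swapping optimizers, the sketch below fits a toy line (not XOR) with both `torch.optim.SGD` and `torch.optim.Adam`. The helper `fit_line`, the toy data, and the learning rate of 0.1 are assumptions chosen for the example, not recommended settings for this lab.

```python
import torch
import torch.nn as nn

def fit_line(make_optimizer, num_steps=200):
    """Fit y = 2x + 1 with one linear layer; return (first_loss, last_loss)."""
    torch.manual_seed(0)  # same initial weights for every optimizer
    x = torch.linspace(0.0, 1.0, 20).unsqueeze(1)  # shape (20, 1)
    y = 2.0 * x + 1.0
    toy_model = nn.Linear(1, 1)
    optimizer = make_optimizer(toy_model.parameters())
    loss_fn = nn.MSELoss()
    losses = []
    for _ in range(num_steps):
        loss = loss_fn(toy_model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses[0], losses[-1]

# Only the optimizer constructor changes; the training loop stays the same.
sgd_first, sgd_last = fit_line(lambda params: torch.optim.SGD(params, lr=0.1))
adam_first, adam_last = fit_line(lambda params: torch.optim.Adam(params, lr=0.1))

print(f"SGD:  first loss {sgd_first:.4f}, last loss {sgd_last:.6f}")
print(f"Adam: first loss {adam_first:.4f}, last loss {adam_last:.6f}")
```

Because the loop itself does not depend on the optimizer, trying a different optimizer or learning rate is a one-line change.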


In [None]:
# TODO: choose a loss function
# Option 1 (probability output): nn.BCELoss()
# Option 2 (logits output): nn.BCEWithLogitsLoss()
loss_fn = None

# TODO: choose an optimizer
optimizer = None

# TODO: training loop
num_steps = None  # TODO: choose number of steps/iterations

for step in range(num_steps):
    # Forward pass
    y_hat = model(X)

    # Loss
    loss = loss_fn(y_hat, y_true)

    # Backprop
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Optional: print progress
    if step % 100 == 0:
        print(step, float(loss))


### A4) Evaluate XOR accuracy

Compute:
- predicted probabilities
- predicted labels (use threshold 0.5)
- accuracy out of 4
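
The three quantities above can be sketched on made-up numbers as follows. The `logits` and `labels` tensors here are hypothetical stand-ins, not your model's output; if your model already ends in a sigmoid, skip the `torch.sigmoid` step.

```python
import torch

# Hypothetical logits and labels, for illustration only.
logits = torch.tensor([[-1.2], [2.3], [0.4], [-0.7]])
labels = torch.tensor([[0.0], [1.0], [1.0], [0.0]])

# Logits -> probabilities -> hard 0/1 labels using a 0.5 threshold.
probs = torch.sigmoid(logits)
pred_labels = (probs > 0.5).float()

# Count matches and divide by the number of examples.
num_correct = int((pred_labels == labels).sum().item())
accuracy = num_correct / labels.shape[0]

print("probs =", probs.flatten().tolist())
print("pred_labels =", pred_labels.flatten().tolist())
print(f"accuracy = {num_correct}/{labels.shape[0]} = {accuracy}")
```

Since `sigmoid(0) = 0.5`, thresholding probabilities at 0.5 gives the same labels as thresholding logits at 0.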


In [None]:
# TODO: compute probabilities (or logits) and convert to predicted labels
with torch.no_grad():
    y_hat = model(X)

# TODO: if y_hat holds probabilities, threshold at 0.5
# TODO: if y_hat holds logits, apply sigmoid first, then threshold at 0.5
y_pred = None

# TODO: compute accuracy
accuracy = None

print("y_hat =", y_hat)
print("y_pred =", y_pred)
print("y_true =", y_true)
print("Accuracy =", accuracy)


**Short answers (write in markdown below):**
1. What model architecture did you use (layers/activations)?
2. What optimizer + learning rate did you use?
3. Why does XOR need a nonlinearity?


## Part B: Train an MNIST network using PyTorch

**Goal:** Adapt the PyTorch tutorial approach to train an MNIST classifier and reach **at least 91% test accuracy**.

Use `datasets.MNIST`, not the `datasets.FashionMNIST` dataset used in the tutorial.

Recommended reference:
- PyTorch quickstart tutorial (structure for loaders/train/test loops).

**What to turn in**
- Your code.
- Final test accuracy (as a number).
- A short answer (4–8 sentences) describing what you changed to reach ≥91%.


In [None]:
from torchvision import datasets
from torchvision.transforms import ToTensor
from torch.utils.data import DataLoader


### B1) Load MNIST

Create:
- `training_data` (train=True)
- `test_data` (train=False)

Use `transform=ToTensor()`.


In [None]:
# TODO: load MNIST train and test datasets
training_data = None
test_data = None


### B2) Create DataLoaders

Choose a batch size and create:
- `train_dataloader`
- `test_dataloader`
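
To get a feel for what a `DataLoader` yields without downloading anything, the sketch below wraps random tensors shaped like MNIST images. The names `fake_images`, `fake_labels`, and `loader`, and the batch size of 32, are assumptions made for the example.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Random stand-ins shaped like MNIST after ToTensor(): (1, 28, 28) floats in [0, 1].
fake_images = torch.rand(100, 1, 28, 28)
fake_labels = torch.randint(0, 10, (100,))  # integer class labels 0..9
fake_dataset = TensorDataset(fake_images, fake_labels)

# The DataLoader groups examples into batches; shuffle=True reorders them each epoch.
loader = DataLoader(fake_dataset, batch_size=32, shuffle=True)

batch_shapes = [(tuple(xb.shape), tuple(yb.shape)) for xb, yb in loader]
for image_shape, label_shape in batch_shapes:
    print(image_shape, label_shape)
```

With 100 examples and a batch size of 32, the last batch is smaller than the rest, so code that counts examples should use the actual batch length rather than assume `batch_size`.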


In [None]:
batch_size = None  # TODO

train_dataloader = None  # TODO
test_dataloader = None   # TODO

# Optional: inspect one batch
# for X, y in test_dataloader:
#     print(X.shape, y.shape)
#     break


### B3) Define your model

Hints:
- A simple start is: `Flatten -> Linear -> ReLU -> Linear -> ReLU -> Linear(10)`.
- You can also try other architectures, but keep it reasonably small.
- For MNIST classification with `nn.CrossEntropyLoss()`, the model should output **raw scores** (logits) of shape `(batch, 10)` and you should **not** apply softmax in the model.
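
The last hint can be checked numerically. This sketch uses random logits and made-up labels (both hypothetical) to show that `nn.CrossEntropyLoss()` already includes the log-softmax step.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical raw scores for a batch of 4 examples and 10 classes.
logits = torch.randn(4, 10)
# Targets are integer class indices, one per example (not one-hot vectors).
labels = torch.tensor([3, 0, 7, 9])

# CrossEntropyLoss takes the logits directly.
loss_ce = nn.CrossEntropyLoss()(logits, labels)

# Equivalent manual computation: log-softmax followed by negative log-likelihood.
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

matches = torch.allclose(loss_ce, loss_manual, atol=1e-6)
print(loss_ce.item(), loss_manual.item(), matches)
```

Adding a softmax layer to the model would therefore normalize the scores twice and distort the loss.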


In [None]:
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()

        # TODO: define layers
        self.layers = nn.Sequential(
            # nn.Linear(..., ...),
            # nn.ReLU(),
            # ...
            # nn.Linear(..., 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.layers(x)
        return logits

model = NeuralNetwork()
model


### B4) Loss + optimizer

Use `nn.CrossEntropyLoss()`.

Pick an optimizer (SGD or Adam) and a learning rate.


In [None]:
loss_fn = nn.CrossEntropyLoss()

# TODO: choose optimizer
optimizer = None


### B5) Training and testing loops

Fill in `train()` and `test()`.

Hints:
- `model.train()` during training, `model.eval()` during testing.
- Wrap testing in `torch.no_grad()`.
- Accuracy: compare `pred.argmax(1)` to labels `y`.
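
The accuracy hint can be sketched on a tiny made-up batch. The `pred` and `y` values below are hypothetical and use 3 classes instead of 10 to keep the example short.

```python
import torch

# Hypothetical logits for 4 examples and 3 classes.
pred = torch.tensor([
    [2.0, 0.1, -1.0],
    [0.3, 0.2, 1.5],
    [0.0, 4.0, 0.5],
    [1.0, 0.9, 0.8],
])
y = torch.tensor([0, 2, 1, 2])

# argmax over dimension 1 picks the highest-scoring class for each example.
predicted_classes = pred.argmax(1)

# Count how many predictions match the labels, then divide by the batch size.
correct = (predicted_classes == y).sum().item()
accuracy = correct / y.shape[0]

print(predicted_classes.tolist(), correct, accuracy)
```

In a full test loop, `correct` and the number of examples would be accumulated across batches, and the division done once at the end.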


In [None]:
def train(dataloader, model, loss_fn, optimizer):
    model.train()
    for X, y in dataloader:
        # TODO: forward pass
        pred = None

        # TODO: compute loss
        loss = None

        # TODO: backprop + step (uncomment the two commented lines below once `loss` is computed)
        optimizer.zero_grad()
        # loss.backward()
        # optimizer.step()

    # Optional: print last loss
    # print("train loss:", float(loss))


def test(dataloader, model, loss_fn):
    model.eval()
    test_loss = 0.0
    correct = 0
    num_examples = 0

    with torch.no_grad():
        for X, y in dataloader:
            # TODO: forward pass
            pred = None

            # TODO: accumulate loss
            # test_loss += ...

            # TODO: accumulate accuracy counts
            # correct += ...
            # num_examples += ...

    # TODO: compute average loss and accuracy
    avg_loss = None
    accuracy = None

    print(f"Test accuracy: {accuracy}")
    print(f"Test avg loss: {avg_loss}")
    return accuracy, avg_loss


### B6) Train until you reach at least 91% test accuracy

Tune:
- model size
- learning rate
- optimizer choice
- number of epochs
- batch size


In [None]:
epochs = None  # TODO

for epoch in range(epochs):
    print(f"Epoch {epoch+1}")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)


**Short answers (write in markdown below):**
1. What final test accuracy did you achieve?
2. What settings did you use (batch size, epochs, optimizer, learning rate, model layers)?
3. What change had the biggest impact on accuracy?
