# Lab 4 - Math 178, Spring 2024

You are encouraged to work in groups of up to 2 total students, but each student should make a submission on Canvas. (It's fine for everyone in the group to submit the same link.)

Put the full names of everyone in your group (even if you're working alone) here. This makes grading easier.

**Names**:

## Train an XOR network using PyTorch

* Use PyTorch to train a neural network that produces a perfect (4 out of 4) prediction rate for XOR. (This is similar to what you did "by hand" in Question 2d of Homework 3. Do not set the weights manually; instead, use PyTorch to find them. Use a binary cross-entropy (BCE) loss function. Feel free to use a more complex neural network architecture than the one you built by hand in the homework. I eventually got a small architecture to work, but I had to re-run the code numerous times.)
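Binary cross-entropy compares each predicted probability p with its 0/1 target y and averages -[y·log(p) + (1-y)·log(1-p)] over the examples. The sketch below computes this by hand and checks it against `nn.BCELoss`, which averages by default. The predicted probabilities are made up for illustration only.

```python
import torch
import torch.nn as nn

# XOR targets as floats with shape (4, 1), matching the shape of the model output
y_true = torch.tensor([[0.], [1.], [1.], [0.]])
# Hypothetical predicted probabilities, made up for illustration only
y_hat = torch.tensor([[0.1], [0.8], [0.7], [0.3]])

# Binary cross-entropy by hand: mean of -[y*log(p) + (1-y)*log(1-p)]
bce_manual = -(y_true * torch.log(y_hat) + (1 - y_true) * torch.log(1 - y_hat)).mean()

# nn.BCELoss uses reduction="mean" by default, so the two values should agree
bce_builtin = nn.BCELoss()(y_hat, y_true)

print(bce_manual.item(), bce_builtin.item())
```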

Recommended references:
1. I primarily used the attached University of Washington notebook, which I downloaded from [Google Colab](https://colab.research.google.com/drive/1up-BwDyjNLISMtXKCMjNomyIOW96JlJh?usp=sharing).
2. I personally solved this exercise before reading through the [PyTorch tutorial](https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html), which is used below. If I had started with that tutorial, I might have used a different approach. However, that tutorial is fancier than what we need here because of the data loaders, etc.

Comment:
1. The MNIST portion below is probably easier in terms of what you need to do, but I'm putting this part first because the resulting neural network is conceptually simpler.

In [1]:
import torch
import torch.nn as nn
from torchvision import datasets
from torchvision.transforms import ToTensor
from torch.utils.data import DataLoader

In [2]:
# The four XOR inputs, one per row (writing 0. makes the whole tensor float)
X = torch.tensor(
    [
        [0.,0],
        [1,0],
        [0,1],
        [1,1]
    ]
)

In [3]:
# XOR targets with shape (4, 1) and float dtype, matching what nn.BCELoss expects
y_true = torch.tensor(
    [
        [0.],
        [1],
        [1],
        [0]
    ]
)

In [4]:
# Layer sizes: 2 inputs -> 2 hidden ReLU units -> 1 sigmoid output (probability of class 1)
d_in = 2
d_hidden = 2
d_out = 1
model = torch.nn.Sequential(
    nn.Linear(d_in, d_hidden),
    nn.ReLU(),
    nn.Linear(d_hidden, d_out),
    nn.Sigmoid()
)

optim = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.BCELoss()

for i in range(2000):
    y_hat = model(X)
    loss = loss_fn(y_hat, y_true)
    optim.zero_grad()
    loss.backward()
    optim.step()

    if i%100 == 0:
        print(i, loss.item())

0 0.6954530477523804
100 0.6921792030334473
200 0.6901718974113464
300 0.6860920190811157
400 0.676921010017395
500 0.6580075621604919
600 0.6295149922370911
700 0.5954500436782837
800 0.5621861219406128
900 0.5356307029724121
1000 0.5170131921768188
1100 0.5053509473800659
1200 0.49796003103256226
1300 0.49292001128196716
1400 0.489668071269989
1500 0.487196683883667
1600 0.48557594418525696
1700 0.48441213369369507
1800 0.4832879304885864
1900 0.48266875743865967
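The loss above levels off around 0.48 instead of continuing toward 0, which suggests this run got stuck. The task mentions re-running the code until it works. The sketch below automates that: it trains from several random seeds and stops at the first run that classifies all four points correctly. Several choices here are assumptions for illustration and differ from the setup used above: the wider hidden layer (8 units), the `Tanh` activation, the Adam optimizer and its learning rate, and the hypothetical helper name `train_xor`.

```python
import torch
import torch.nn as nn

X = torch.tensor([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
y_true = torch.tensor([[0.], [1.], [1.], [0.]])

def train_xor(seed, d_hidden=8, steps=2000, lr=0.05):
    """Hypothetical helper: train one XOR network from a given seed and
    return the model together with its accuracy on the four points."""
    torch.manual_seed(seed)  # makes each attempt reproducible
    model = nn.Sequential(
        nn.Linear(2, d_hidden),
        nn.Tanh(),  # Tanh's gradient is never exactly zero, unlike a ReLU with negative input
        nn.Linear(d_hidden, 1),
        nn.Sigmoid(),
    )
    optim = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCELoss()
    for _ in range(steps):
        optim.zero_grad()
        loss = loss_fn(model(X), y_true)
        loss.backward()
        optim.step()
    with torch.no_grad():
        preds = (model(X) > 0.5).float()  # threshold probabilities at 0.5
    acc = (preds == y_true).float().mean().item()
    return model, acc

# Try seeds until one run gets 4 out of 4 correct
for seed in range(10):
    model, acc = train_xor(seed)
    print(f"seed {seed}: accuracy {acc:.2f}")
    if acc == 1.0:
        break
```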


In [5]:
model(X)

tensor([[0.0177],
        [0.6596],
        [0.6596],
        [0.6596]], grad_fn=<SigmoidBackward0>)
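To turn these probabilities into class predictions, threshold them at 0.5. The sketch below copies the four printed outputs into a tensor and counts how many match the XOR targets. This particular run gets only 3 of the 4 points right: the input [1, 1] gets probability 0.6596, so it is predicted as class 1 even though its XOR value is 0.

```python
import torch

# The four probabilities printed above, copied by hand
y_hat = torch.tensor([[0.0177], [0.6596], [0.6596], [0.6596]])
y_true = torch.tensor([[0.], [1.], [1.], [0.]])

# Predict class 1 when the probability exceeds 0.5
preds = (y_hat > 0.5).float()
num_correct = int((preds == y_true).sum().item())
accuracy = num_correct / len(y_true)
print(f"{num_correct} out of {len(y_true)} correct ({accuracy:.0%})")  # 3 out of 4 correct (75%)
```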

In [6]:
for param in model.parameters():
    print(param)

Parameter containing:
tensor([[ 0.1311, -0.1602],
        [-1.6684, -1.6603]], requires_grad=True)
Parameter containing:
tensor([-0.1365,  1.6591], requires_grad=True)
Parameter containing:
tensor([[-0.4692, -2.8178]], requires_grad=True)
Parameter containing:
tensor([0.6615], requires_grad=True)
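Plugging the printed parameters back in by hand helps explain the 3-out-of-4 result. The sketch below uses the weights copied from the output above, which are rounded to four decimals. It shows that the first hidden ReLU unit has a negative pre-activation on all four inputs, so it outputs 0 everywhere and receives no gradient. The second unit is active only on [0, 0]. As a result, the network produces just two distinct output values on these inputs, which is not enough to represent XOR.

```python
import torch

X = torch.tensor([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])

# Parameters copied from the printout above
W1 = torch.tensor([[0.1311, -0.1602], [-1.6684, -1.6603]])
b1 = torch.tensor([-0.1365, 1.6591])
W2 = torch.tensor([[-0.4692, -2.8178]])
b2 = torch.tensor([0.6615])

# nn.Linear computes x @ W.T + b
hidden = torch.relu(X @ W1.T + b1)
output = torch.sigmoid(hidden @ W2.T + b2)

print(hidden)  # first column is all zeros: that ReLU unit is "dead"
print(output)  # approximately 0.0177, 0.6596, 0.6596, 0.6596
```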


## Train an MNIST network using PyTorch

* Adapt the code from [the PyTorch tutorial](https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html) to train an MNIST neural network. Adjust parameters as necessary to reach at least 91% test accuracy. (Be sure you're using `datasets.MNIST` rather than what's in the tutorial, `datasets.FashionMNIST`. Most other parts of the tutorial should adapt easily. I deleted the GPU parts, such as `if torch.cuda.is_available()`, because I don't think they will work on Deepnote, but they may still be useful here.)
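If you do keep the GPU logic, the basic pattern is a `torch.cuda.is_available()` check. It falls back to the CPU, so it should also run where no GPU is present. In this sketch, the single `Linear` layer and the random batch are stand-ins for illustration, not the MNIST network or data.

```python
import torch
import torch.nn as nn

# Use the GPU when PyTorch can see one; otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using {device} device")

# The model and each batch must live on the same device
model = nn.Linear(784, 10).to(device)  # stand-in model for illustration
X = torch.rand(64, 784).to(device)     # stand-in batch of 64 flattened images
logits = model(X)
print(logits.shape)  # torch.Size([64, 10])
```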

In [7]:
# Load the data
training_data = datasets.MNIST(
    root="data",
    train=True,
    download=True,
    transform=ToTensor(),
)

test_data = datasets.MNIST(
    root="data",
    train=False,
    download=True,
    transform=ToTensor(),
)



In [8]:
batch_size = 64

# Create data loaders.
train_dataloader = DataLoader(training_data, batch_size=batch_size)
test_dataloader = DataLoader(test_data, batch_size=batch_size)

for X, y in test_dataloader:
    print(f"Shape of X [N, C, H, W]: {X.shape}")
    print(f"Shape of y: {y.shape} {y.dtype}")
    break

Shape of X [N, C, H, W]: torch.Size([64, 1, 28, 28])
Shape of y: torch.Size([64]) torch.int64
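Because 60,000 and 10,000 are not multiples of 64, each DataLoader ends with a smaller final batch unless `drop_last=True` is passed. The sketch below counts batches using placeholder `TensorDataset`s of the same sizes rather than the real MNIST data. The short final batch matters slightly in the `test` function. That function averages per-batch mean losses, so the final batch of 16 test images counts as much as a full batch of 64.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

batch_size = 64
# Placeholder datasets with as many examples as MNIST's training and test splits
train_like = TensorDataset(torch.zeros(60000, 1))
test_like = TensorDataset(torch.zeros(10000, 1))

train_loader = DataLoader(train_like, batch_size=batch_size)
test_loader = DataLoader(test_like, batch_size=batch_size)

# len() of a DataLoader is the number of batches, rounded up
print(len(train_loader), len(test_loader))  # 938 157

# Each batch contains one stacked tensor; b[0] is that tensor
train_sizes = [b[0].shape[0] for b in train_loader]
test_sizes = [b[0].shape[0] for b in test_loader]
print(train_sizes[-1], test_sizes[-1])  # 32 16
```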


In [9]:
# Define model
class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.layers = nn.Sequential(
            nn.Linear(784, 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.layers(x)
        return logits

model = NeuralNetwork()
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (layers): Sequential(
    (0): Linear(in_features=784, out_features=256, bias=True)
    (1): ReLU()
    (2): Linear(in_features=256, out_features=128, bias=True)
    (3): ReLU()
    (4): Linear(in_features=128, out_features=10, bias=True)
  )
)
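`nn.Flatten()` keeps the batch dimension (`start_dim=1`, as the printout shows) and merges the remaining dimensions. Each 1×28×28 image therefore becomes a vector of 1·28·28 = 784 values, which is why the first `Linear` layer has `in_features=784`. Here is a quick check on a random placeholder batch:

```python
import torch
import torch.nn as nn

flatten = nn.Flatten()  # defaults: start_dim=1, end_dim=-1
batch = torch.rand(64, 1, 28, 28)  # placeholder batch with the MNIST shape [N, C, H, W]
flat = flatten(batch)
print(flat.shape)  # torch.Size([64, 784])
```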


In [10]:
model = NeuralNetwork()  # fresh instance, so training starts from newly initialized weights

In [11]:
sum([p.numel() for p in model.parameters()])

235146
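The total 235146 can be checked by hand. A `Linear(n_in, n_out)` layer has an n_in × n_out weight matrix plus n_out biases. The small helper below (`count_linear_params`, a hypothetical name) does this arithmetic for the layer sizes used above. The same formula gives 9 parameters for the 2-2-1 XOR network, matching the four parameter tensors printed earlier (4 + 2 + 2 + 1).

```python
def count_linear_params(sizes):
    """Hypothetical helper: number of weights and biases in a stack of
    fully connected layers with the given layer widths."""
    total = 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix + bias vector
    return total

print(count_linear_params([784, 256, 128, 10]))  # 235146
print(count_linear_params([2, 2, 1]))            # 9
```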

In [12]:
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
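Unlike the XOR network, this model has no final activation layer. `nn.CrossEntropyLoss` expects raw logits and integer class labels, and it applies `log_softmax` internally. The sketch below checks this on placeholder logits and labels by comparing it with `F.log_softmax` followed by `F.nll_loss`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 10)             # placeholder scores for 5 images, 10 classes
labels = torch.tensor([3, 0, 9, 1, 7])  # placeholder integer class labels

loss_builtin = nn.CrossEntropyLoss()(logits, labels)
# The same computation in two explicit steps
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(loss_builtin.item(), loss_manual.item())
```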

In [13]:
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    model.train()
    for batch, (X, y) in enumerate(dataloader):

        # Compute prediction error
        pred = model(X)
        loss = loss_fn(pred, y)

        # Backpropagation
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        if batch % 100 == 0:
            loss, current = loss.item(), (batch + 1) * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")
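Both training loops clear the gradients once per step. The XOR loop does it before `backward()`, and `train` does it after `step()`; either order works. The call is needed because `backward()` adds to the existing `.grad` instead of overwriting it, as this small example shows:

```python
import torch

w = torch.tensor(3.0, requires_grad=True)

(w ** 2).backward()
print(w.grad)  # tensor(6.): d(w^2)/dw = 2w = 6

(w ** 2).backward()
print(w.grad)  # tensor(12.): the new gradient was added to the old one

# Manual reset; optimizer.zero_grad() clears each parameter's gradient in a similar way
w.grad.zero_()
print(w.grad)  # tensor(0.)
```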

In [14]:
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()
    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
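For multi-class output, `test` takes the `argmax` of each row of logits as the predicted digit and compares it with the label. Here is the same computation on a tiny hand-made example with 3 classes. The numbers are placeholders.

```python
import torch

# Placeholder logits for 4 examples and 3 classes
pred = torch.tensor([
    [2.0, 0.1, -1.0],  # largest score at index 0
    [0.3, 1.5, 0.2],   # index 1
    [0.0, 0.0, 3.0],   # index 2
    [1.0, 2.0, 0.5],   # index 1
])
y = torch.tensor([0, 1, 2, 0])  # true labels; the last example is misclassified

predicted = pred.argmax(1)  # index of the largest logit in each row
correct = (predicted == y).type(torch.float).sum().item()
print(predicted, correct / len(y))  # tensor([0, 1, 2, 1]) 0.75
```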

In [15]:
epochs = 5
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(train_dataloader, model, loss_fn, optimizer)
    test(test_dataloader, model, loss_fn)
print("Done!")

Epoch 1
-------------------------------
loss: 2.301928  [   64/60000]
loss: 2.267008  [ 6464/60000]
loss: 2.226871  [12864/60000]
loss: 2.064916  [19264/60000]
loss: 1.949013  [25664/60000]
loss: 1.725028  [32064/60000]
loss: 1.385321  [38464/60000]
loss: 1.241456  [44864/60000]
loss: 0.975897  [51264/60000]
loss: 0.851995  [57664/60000]
Test Error: 
 Accuracy: 80.6%, Avg loss: 0.771274 

Epoch 2
-------------------------------
loss: 0.841805  [   64/60000]
loss: 0.636083  [ 6464/60000]
loss: 0.662229  [12864/60000]
loss: 0.578300  [19264/60000]
loss: 0.548512  [25664/60000]
loss: 0.497509  [32064/60000]
loss: 0.403166  [38464/60000]
loss: 0.536237  [44864/60000]
loss: 0.485979  [51264/60000]
loss: 0.516598  [57664/60000]
Test Error: 
 Accuracy: 87.8%, Avg loss: 0.427428 

Epoch 3
-------------------------------
loss: 0.479756  [   64/60000]
loss: 0.340990  [ 6464/60000]
loss: 0.383078  [12864/60000]
loss: 0.423388  [19264/60000]
loss: 0.358532  [25664/60000]
loss: 0.409016  [32064/600