[![Dataflowr](https://raw.githubusercontent.com/dataflowr/website/master/_assets/dataflowr_logo.png)](https://dataflowr.github.io/website/)

# [Module 5](https://dataflowr.github.io/website/modules/5-stacking-layers/): overfitting an MLP on CIFAR10

Training loop over CIFAR10 (50,000 train images, 10,000 test images). What happens if you
- Switch the training to a GPU? Is it faster?
- Remove the `ReLU()`? 
- Increase the learning rate?
- Stack more layers? 
- Perform more epochs?

Can you completely overfit the training set (i.e. reach 100% training accuracy)?
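
For the `ReLU()` question above, the sketch below may help build intuition: without a non-linearity between them, two `nn.Linear` layers compose into a single affine map, so the network cannot represent more than one linear layer could. The layer sizes (8, 5, 3) and the names `lin1`, `lin2`, `merged` are assumptions chosen for illustration, not the sizes used in this notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two stacked linear layers with no activation in between (illustrative sizes).
lin1, lin2 = nn.Linear(8, 5), nn.Linear(5, 3)
no_relu = nn.Sequential(lin1, lin2)

# Fold both layers into one: W = W2 @ W1 and b = W2 @ b1 + b2.
merged = nn.Linear(8, 3)
with torch.no_grad():
    merged.weight.copy_(lin2.weight @ lin1.weight)
    merged.bias.copy_(lin2.weight @ lin1.bias + lin2.bias)

x = torch.rand(4, 8)
same_output = torch.allclose(no_relu(x), merged(x), atol=1e-6)
print(same_output)
```

With the `ReLU()` in place this folding is no longer possible, which is what gives the hidden layer its extra expressive power.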

This code is not modular at all. Refactor it into one function per task
(hint: see [this PyTorch MNIST example](https://github.com/pytorch/examples/blob/master/mnist/main.py)).

Your training went well. Good. Why not save the weights of the network (`net.state_dict()`) using `torch.save()`?
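
A minimal sketch of the save/restore round trip, assuming the same architecture as the network in this notebook. The helper `make_net`, the file name `cifar_mlp.pt`, and the temporary directory are hypothetical choices for illustration; in practice you would save the trained `net` to a path of your choice.

```python
import os
import tempfile

import torch
import torch.nn as nn


def make_net():
    # Hypothetical helper: rebuilds the same architecture as `net` in this notebook.
    return nn.Sequential(nn.Linear(3 * 32 * 32, 1000), nn.ReLU(), nn.Linear(1000, 10))


net = make_net()

# Save only the parameters (the state dict), not the whole Python object.
path = os.path.join(tempfile.mkdtemp(), "cifar_mlp.pt")  # hypothetical file name
torch.save(net.state_dict(), path)

# To restore, build the architecture first, then load the saved parameters into it.
restored = make_net()
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()

# Both networks should now produce the same outputs on the same input.
x = torch.rand(2, 3 * 32 * 32)
with torch.no_grad():
    outputs_match = torch.allclose(net(x), restored(x))
print(outputs_match)
```

Saving the state dict rather than the module itself keeps the file independent of how the class is defined, at the cost of having to rebuild the architecture before loading.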

In [None]:
import torch
import torchvision
import torch.nn as nn
import torchvision.transforms as t

# define network structure 
net = nn.Sequential(nn.Linear(3 * 32 * 32, 1000), nn.ReLU(), nn.Linear(1000, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

# load data
to_tensor =  t.ToTensor()
normalize = t.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
flatten =  t.Lambda(lambda x:x.view(-1))

transform_list = t.Compose([to_tensor, normalize, flatten])
train_set = torchvision.datasets.CIFAR10(root='.', train=True, transform=transform_list, download=True)
test_set = torchvision.datasets.CIFAR10(root='.', train=False, transform=transform_list, download=True)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=64)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=64)

# === Train === ###
net.train()

# train loop
for epoch in range(3):
    train_correct = 0
    train_loss = 0
    print('Epoch {}'.format(epoch))
    
    # loop per epoch 
    for i, (batch, targets) in enumerate(train_loader):
        output = net(batch)
        loss = criterion(output, targets)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        pred = output.max(1, keepdim=True)[1]
        train_correct += pred.eq(targets.view_as(pred)).sum().item()
        train_loss += loss.item()  # .item() keeps a Python float, not a tensor tied to the graph

        if i % 100 == 10: print('Train loss {:.4f}, Train accuracy {:.2f}%'.format(
            train_loss / ((i+1) * 64), 100 * train_correct / ((i+1) * 64)))
        
print('End of training.\n')
    
# === Test === ###
test_correct = 0
net.eval()

# loop over the whole test set; gradients are not needed for evaluation
with torch.no_grad():
    for batch, targets in test_loader:
        output = net(batch)
        pred = output.max(1, keepdim=True)[1]
        test_correct += pred.eq(targets.view_as(pred)).sum().item()

# divide by the true number of test images: the last batch holds fewer than 64
print('End of testing. Test accuracy {:.2f}%'.format(
    100 * test_correct / len(test_set)))

Files already downloaded and verified
Files already downloaded and verified
Epoch 0
Train loss 0.0338, Train accuracy 20.74%
Train loss 0.0294, Train accuracy 32.05%
Train loss 0.0284, Train accuracy 35.09%
Train loss 0.0278, Train accuracy 36.54%
Train loss 0.0272, Train accuracy 38.04%
Train loss 0.0268, Train accuracy 38.97%
Train loss 0.0265, Train accuracy 39.79%
Train loss 0.0262, Train accuracy 40.40%
Epoch 1
Train loss 0.0236, Train accuracy 47.44%
Train loss 0.0235, Train accuracy 46.31%
Train loss 0.0234, Train accuracy 47.17%
Train loss 0.0232, Train accuracy 47.81%
Train loss 0.0229, Train accuracy 48.24%
Train loss 0.0228, Train accuracy 48.57%
Train loss 0.0227, Train accuracy 48.72%
Train loss 0.0226, Train accuracy 49.10%
Epoch 2
Train loss 0.0211, Train accuracy 51.70%
Train loss 0.0212, Train accuracy 51.68%
Train loss 0.0212, Train accuracy 51.95%
Train loss 0.0211, Train accuracy 52.17%
Train loss 0.0209, Train accuracy 52.55%
Train loss 0.0207, Train accuracy 52.83%

In [None]:
class CIFAR_MODEL(nn.Module):
  def __init__(self) -> None:
      super(CIFAR_MODEL, self).__init__()
      self.layers = nn.Sequential(nn.Linear(3 * 32 * 32, 1000),
                                  nn.ReLU(),
                                  nn.Linear(1000, 10))
  def forward(self, x):
    x = self.layers(x)
    return x


def train_loop(model, train_loader, device, optimizer, epochs):
  # === Train === ###
  model.train()

  # train loop
  for epoch in range(epochs):  # use the `epochs` argument instead of a hard-coded 3
      train_correct = 0
      train_loss = 0
      print('Epoch {}'.format(epoch))
      
      # loop per epoch 
      for i, (batch, targets) in enumerate(train_loader):
        batch, targets = batch.to(device), targets.to(device)
        output = model(batch)
        loss = criterion(output, targets)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        pred = output.max(1, keepdim=True)[1]
        train_correct += pred.eq(targets.view_as(pred)).sum().item()
        train_loss += loss.item()

        if i % 100 == 10: print('Train loss {:.4f}, Train accuracy {:.2f}%'.format(
              train_loss / ((i+1) * 64), 100 * train_correct / ((i+1) * 64)))


def test_loop(model, test_loader, device):
  # === Test === ###
  test_correct = 0
  model.eval()
  with torch.inference_mode():
    # loop, over whole test set
    for i, (batch, targets) in enumerate(test_loader):
        batch, targets = batch.to(device), targets.to(device)
        output = model(batch)
        pred = output.max(1, keepdim=True)[1]
        test_correct += pred.eq(targets.view_as(pred)).sum().item()

    # divide by the true number of test images: the last batch may hold fewer than 64
    print('End of testing. Test accuracy {:.2f}%'.format(
          100 * test_correct / len(test_loader.dataset)))

# device
device = 'cuda' if torch.cuda.is_available() else 'cpu'
# instantiating the model 
model_0 = CIFAR_MODEL()
model_0 = model_0.to(device)
# loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model_0.parameters(), lr=0.01, momentum=0.9)
# training
train_loop(model_0, train_loader, device, optimizer, 10)
test_loop(model_0, test_loader, device)

Epoch 0
Train loss 0.0346, Train accuracy 18.04%
Train loss 0.0295, Train accuracy 32.93%
Train loss 0.0282, Train accuracy 36.60%
Train loss 0.0274, Train accuracy 38.31%
Train loss 0.0267, Train accuracy 39.91%
Train loss 0.0263, Train accuracy 40.72%
Train loss 0.0260, Train accuracy 41.38%
Train loss 0.0258, Train accuracy 41.91%
Epoch 1
Train loss 0.0230, Train accuracy 49.86%
Train loss 0.0230, Train accuracy 49.37%
Train loss 0.0229, Train accuracy 49.04%
Train loss 0.0228, Train accuracy 49.33%
Train loss 0.0225, Train accuracy 49.65%
Train loss 0.0225, Train accuracy 49.82%
Train loss 0.0224, Train accuracy 49.93%
Train loss 0.0224, Train accuracy 50.07%
Epoch 2
Train loss 0.0209, Train accuracy 54.26%
Train loss 0.0212, Train accuracy 53.32%
Train loss 0.0212, Train accuracy 52.98%
Train loss 0.0211, Train accuracy 53.51%
Train loss 0.0209, Train accuracy 53.74%
Train loss 0.0208, Train accuracy 53.92%
Train loss 0.0208, Train accuracy 53.84%
Train loss 0.0208, Train accuracy

## Autograd tips and tricks

Pointers are everywhere!

In [None]:
net = nn.Linear(2, 2)
w = net.weight
print(w)

x = torch.rand(1, 2)
y = net(x).sum()
y.backward()
net.weight.data -= 0.01 * net.weight.grad # <--- What is this?
print(w)

Parameter containing:
tensor([[-0.4018,  0.0834],
        [ 0.2269,  0.2882]], requires_grad=True)
Parameter containing:
tensor([[-0.4020,  0.0823],
        [ 0.2267,  0.2871]], requires_grad=True)


In [None]:
net = nn.Linear(2, 2)
w = net.weight.clone()
print(w)

x = torch.rand(1, 2)
y = net(x).sum()
y.backward()
net.weight.data -= 0.01 * net.weight.grad # <--- What is this?
print(w)

tensor([[ 0.5300,  0.3895],
        [-0.1400, -0.5465]], grad_fn=<CloneBackward0>)
tensor([[ 0.5300,  0.3895],
        [-0.1400, -0.5465]], grad_fn=<CloneBackward0>)
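
The two cells above can be checked more directly. The sketch below is one way to do it, assuming that comparing `data_ptr()` values is an acceptable test for shared storage. The names `alias` and `snapshot` are illustrative, and the input is set to ones (an assumption, instead of `torch.rand`) so that the gradient is known to be non-zero. It also shows the `torch.no_grad()` form of the update, which is commonly preferred over writing to `.data`.

```python
import torch
import torch.nn as nn

net = nn.Linear(2, 2)
alias = net.weight             # a second name for the same Parameter object
snapshot = net.weight.clone()  # a new tensor with its own memory

same_object = alias is net.weight
alias_shares_memory = alias.data_ptr() == net.weight.data_ptr()
snapshot_shares_memory = snapshot.data_ptr() == net.weight.data_ptr()

# Input of ones, so every entry of the weight gradient is exactly 1.
net(torch.ones(1, 2)).sum().backward()

# In-place gradient step without recording it in the autograd graph.
with torch.no_grad():
    net.weight -= 0.01 * net.weight.grad

alias_follows_update = torch.equal(alias, net.weight)
snapshot_differs = not torch.equal(snapshot, net.weight)
print(same_object, alias_shares_memory, snapshot_shares_memory)
print(alias_follows_update, snapshot_differs)
```

The alias sees the update because it is the same object; the clone keeps the values from before the step.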


Sharing weights 

In [None]:
net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
net[0].weight = net[1].weight  # weight sharing

x = torch.rand(1, 2)
y = net(x).sum()
y.backward()
print(net[0].weight.grad)
print(net[1].weight.grad)

tensor([[ 1.2128,  0.3815],
        [ 0.8211, -0.0821]])
tensor([[ 1.2128,  0.3815],
        [ 0.8211, -0.0821]])
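
Both prints show the same tensor because the two layers hold the same `Parameter`, and its `.grad` accumulates the contributions from both places where it is used. The sketch below checks this against a version with two independent copies of the weight; `w_a`, `w_b`, and the use of `torch.nn.functional.linear` are choices made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
net[0].weight = net[1].weight  # both layers now hold the same Parameter object

x = torch.rand(1, 2)
net(x).sum().backward()
shared_grad = net[0].weight.grad.clone()

# Same forward pass, but with two independent copies of the weight.
w_a = net[0].weight.detach().clone().requires_grad_()
w_b = net[0].weight.detach().clone().requires_grad_()
h = F.linear(x, w_a, net[0].bias.detach())
F.linear(h, w_b, net[1].bias.detach()).sum().backward()

same_object = net[0].weight is net[1].weight
# parameters() yields a shared parameter once: shared weight + two biases.
n_params = len(list(net.parameters()))
grads_add_up = torch.allclose(shared_grad, w_a.grad + w_b.grad)
print(same_object, n_params, grads_add_up)
```

Because `net.parameters()` lists the shared weight only once, an optimizer built from it updates that weight a single time per step, using the summed gradient.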

