<a href="https://colab.research.google.com/github/asia281/dnn2022/blob/main/Asia_of_DNN_Lab_4_MNIST_in_Pytorch_student_version.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<center><img src='https://drive.google.com/uc?id=1_utx_ZGclmCwNttSe40kYA6VHzNocdET' height="60"></center>

AI TECH - Akademia Innowacyjnych Zastosowań Technologii Cyfrowych (Academy of Innovative Applications of Digital Technologies). Operational Programme Digital Poland 2014-2020
<hr>

<center><img src='https://drive.google.com/uc?id=1BXZ0u3562N_MqCLcekI-Ens77Kk4LpPm'></center>

<center>
Project co-financed by the European Union under the European Regional Development Fund,
Operational Programme Digital Poland 2014-2020,
Priority Axis 3 "Digital competences of society", Measure 3.2 "Innovative solutions for digital activation".
Project title: "Akademia Innowacyjnych Zastosowań Technologii Cyfrowych (AI Tech)"
    </center>

In [11]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

In [12]:
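from typing import List

class Net(nn.Module):
    def __init__(self, sizes: List[int]):
        super().__init__()
        # sizes lists the layer widths, input first, e.g. [784, 128, 128, 10].
        # After flattening an image of size 28x28 we have 784 inputs.
        # nn.ModuleList registers every layer's parameters with the module.
        self.layers = nn.ModuleList([nn.Linear(in_f, out_f) for in_f, out_f in zip(sizes, sizes[1:])])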


    def forward(self, x):
        x = torch.flatten(x, 1)
        for layer in self.layers[:-1]:
            x = layer(x)
            x = F.relu(x)
        x = self.layers[-1](x)
        output = F.log_softmax(x, dim=1)
        return output

In [13]:
import matplotlib.pyplot as plt

def plot_learning_curve(epochs, losses):
    # Plot one loss value per entry in `epochs`.
    plt.subplots(1, figsize=(10, 10))
    plt.plot(epochs, losses, color="#111111", label="Loss")

    plt.title("Learning Curve")
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.legend(loc="best")
    plt.tight_layout()
    plt.show()

In [20]:
def train(model, device, train_loader, optimizer, epoch, log_interval):
    model.train()
    losses = []
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        losses.append(loss.item())
        optimizer.step()
        if batch_idx % log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
            
    return losses


def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
    return test_loss



In [15]:
def update_kwargs(use_cuda, seed, batch_size, test_batch_size):
  # Use the GPU only when it is both requested and available.
  use_cuda = use_cuda and torch.cuda.is_available()

  torch.manual_seed(seed)
  device = torch.device("cuda" if use_cuda else "cpu")

  train_kwargs = {'batch_size': batch_size}
  test_kwargs = {'batch_size': test_batch_size}
  if use_cuda:
      cuda_kwargs = {'num_workers': 1,
                      'pin_memory': True,
                      'shuffle': True}
      train_kwargs.update(cuda_kwargs)
      test_kwargs.update(cuda_kwargs)

  return train_kwargs, test_kwargs, device

In [16]:
def transform(train_kwargs, test_kwargs):
  # 0.1307 and 0.3081 are the mean and standard deviation of the MNIST training pixels.
  # The local name differs from the function name to avoid shadowing it.
  mnist_transform = transforms.Compose([
      transforms.ToTensor(),
      transforms.Normalize((0.1307,), (0.3081,))
      ])
  train_dataset = datasets.MNIST('../data', train=True, download=True,
                      transform=mnist_transform)
  test_dataset = datasets.MNIST('../data', train=False,
                      transform=mnist_transform)

  train_loader = torch.utils.data.DataLoader(train_dataset, **train_kwargs)
  test_loader = torch.utils.data.DataLoader(test_dataset, **test_kwargs)
  return train_loader, test_loader

In [21]:
def run_experiment(batch_size, test_batch_size, epochs, lr, log_interval, sizes, MomentOptimizer = False):
  train_kwargs, test_kwargs, device = update_kwargs(use_cuda = True, seed=1, batch_size=batch_size, test_batch_size=test_batch_size)
  model = Net(sizes).to(device)

  # Adam by default; SGD with momentum when MomentOptimizer is set.
  if MomentOptimizer:
    optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.9)
  else:
    optimizer = optim.Adam(model.parameters(), lr=lr)

  train_loader, test_loader = transform(train_kwargs, test_kwargs)
  losses_test = []
  for epoch in range(1, epochs + 1):
      train(model, device, train_loader, optimizer, epoch, log_interval)
      # test() returns a single float (the average test loss), so append rather than extend.
      losses_test.append(test(model, device, test_loader))

  # One x-axis point per recorded test loss.
  epoch_list = list(range(1, len(losses_test) + 1))
  plot_learning_curve(epoch_list, losses_test)

In [22]:
sizes = [784, 128, 128, 10]

In [23]:
run_experiment(batch_size = 256, test_batch_size = 1000, epochs = 5, lr = 1e-2, sizes=sizes, log_interval = 10)


Test set: Average loss: 0.2275, Accuracy: 9311/10000 (93%)




In [None]:
run_experiment(batch_size = 256, test_batch_size = 1000, epochs = 5, lr = 1e-3, sizes=sizes, log_interval = 10)

In [None]:
run_experiment(batch_size = 256, test_batch_size = 1000, epochs = 5, lr = 6e-4, sizes=sizes, log_interval = 10)

In [None]:
run_experiment(batch_size = 256, test_batch_size = 1000, epochs = 5, lr = 1e-2, log_interval = 10, sizes=sizes, MomentOptimizer=True)

In [None]:
run_experiment(batch_size = 256, test_batch_size = 1000, epochs = 5, lr = 2e-2, log_interval = 10, sizes=sizes, MomentOptimizer=True)

In [None]:
run_experiment(batch_size = 256, test_batch_size = 1000, epochs = 5, lr = 3e-2, log_interval = 10, sizes=sizes, MomentOptimizer=True)

Code based on https://github.com/pytorch/examples/blob/master/mnist/main.py

In this exercise we use high-level abstractions from torch.nn, such as nn.Linear.
Note: during the next lab session we will go one level deeper and implement more of this
by hand.
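
As a rough illustration of what the nn.Linear abstraction hides, the sketch below compares the output of a layer with the affine map computed by hand from its own weight and bias. The batch of random inputs is made up for illustration and is not MNIST data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A layer shaped like the first one in the input-128-128-10 architecture.
layer = nn.Linear(784, 128)

# A fake batch of 4 flattened 28x28 images (random values, illustration only).
x = torch.randn(4, 784)

# nn.Linear stores weight with shape (out_features, in_features)
# and bias with shape (out_features,), and computes x @ W^T + b.
manual = x @ layer.weight.T + layer.bias

linear_matches = torch.allclose(layer(x), manual, atol=1e-5)
print(tuple(layer.weight.shape), tuple(layer.bias.shape), linear_matches)
```

The layer also registers its weight and bias as trainable parameters, which is what lets an optimizer find them through model.parameters().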

Tasks:

    1. Read the code.

    2. Check that the given implementation reaches 95% test accuracy for the architecture input-128-128-10 after a few epochs.

    3. Add the option to use SGD with momentum instead of Adam.

    4. Experiment with different learning rates and plot the learning curves for different
    learning rates for both Adam and SGD with momentum.

    5. Parameterize the constructor by a list of sizes of hidden layers of the MLP.
    Note that this requires creating a list of layers as an attribute of the Net class,
    and one can't use a standard Python list containing nn.Modules (why?).
    Check torch.nn.ModuleList.
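
One way to investigate the question in task 5 is to count the parameters that a module reports in each case. The sketch below is only an illustration: the class names PlainListNet and ModuleListNet are hypothetical and do not appear in the lab code, and neither class defines forward because only parameter registration is inspected.

```python
import torch.nn as nn

class PlainListNet(nn.Module):
    # Hypothetical class: keeps its layers in a plain Python list.
    def __init__(self, sizes):
        super().__init__()
        # nn.Module does not look inside ordinary lists, so these layers
        # are not registered as submodules.
        self.layers = [nn.Linear(i, o) for i, o in zip(sizes, sizes[1:])]

class ModuleListNet(nn.Module):
    # Hypothetical class: keeps its layers in an nn.ModuleList.
    def __init__(self, sizes):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(i, o) for i, o in zip(sizes, sizes[1:])])

sizes = [784, 128, 128, 10]
plain_count = len(list(PlainListNet(sizes).parameters()))
registered_count = len(list(ModuleListNet(sizes).parameters()))
print(plain_count, registered_count)  # 0 6 (three layers, each with weight and bias)
```

With the plain list, model.parameters() is empty, so an optimizer has nothing to update, and calls such as model.to(device) do not reach the layers either.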
