
# Introduction to Neural Networks
Author: Bethany Lusch, adapting materials from Marieme Ngom, Asad Khan, Prasanna Balaprakash, Taylor Childers, Corey Adams, Kyle Felker, and Tanwi Mallick.

This tutorial will serve as a gentle introduction to neural networks and deep learning through a hands-on classification problem using the MNIST dataset.

In particular, we will introduce neural networks and show how to train them and improve their learning capabilities. We will use the PyTorch Python library.

The [MNIST dataset](http://yann.lecun.com/exdb/mnist/) contains 70,000 images of handwritten digits, each labeled with the digit (0-9) it shows.
<img src="https://github.com/am2145/ai-science-training-series/blob/homeworks/02_intro_neural_networks/images/mnist_task.png?raw=1"  align="left"/>



In [None]:
%matplotlib inline

import torch
import torchvision
from torch import nn

import numpy
import matplotlib.pyplot as plt
import time

## The MNIST dataset

We will now download the dataset of handwritten digits. MNIST is a popular dataset, so we can download it directly through torchvision, PyTorch's companion library for computer vision. Note:
- x is for the inputs (images of handwritten digits) and y is for the labels or outputs (digits 0-9)
- We are given "training" and "test" datasets. Training datasets are used to fit the model. Test datasets are saved until the end, when we are satisfied with our model, to estimate how well our model generalizes to new data.

The first download may take a while.
The data is organized as follows:
- 60,000 training examples, 10,000 test examples
- inputs: 1 x 28 x 28 (one grayscale channel of 28 x 28 pixels)
- outputs (labels): one integer (0-9) per example

In [None]:
training_data = torchvision.datasets.MNIST(
    root="data",
    train=True,
    download=True,
    transform=torchvision.transforms.ToTensor()
)

test_data = torchvision.datasets.MNIST(
    root="data",
    train=False,
    download=True,
    transform=torchvision.transforms.ToTensor()
)

In [None]:
training_data, validation_data = torch.utils.data.random_split(training_data, [0.8, 0.2], generator=torch.Generator().manual_seed(55))
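
The fractions `[0.8, 0.2]` passed to `random_split` have to become whole numbers of examples. A minimal plain-Python sketch of that conversion, assuming the rule recent PyTorch versions apply to fractional lengths (floor each share, then give any leftover examples to the splits one at a time, starting with the first); `split_lengths` is a hypothetical helper, not part of PyTorch:

```python
import math

def split_lengths(n, fractions):
    """Convert fractional split sizes into integer lengths that sum to n.

    Assumed rule: floor each share, then hand out the leftover items
    round-robin, starting with the first split.
    """
    if not math.isclose(sum(fractions), 1.0):
        raise ValueError("fractions must sum to 1")
    lengths = [math.floor(n * f) for f in fractions]
    remainder = n - sum(lengths)
    for i in range(remainder):
        lengths[i % len(lengths)] += 1
    return lengths

print(split_lengths(60000, [0.8, 0.2]))  # the MNIST training set split used here
print(split_lengths(7, [0.5, 0.5]))      # uneven case: the leftover item goes to the first split
```

With 60,000 training images the shares divide evenly, which matches the 48,000 / 12,000 counts printed below.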

In [None]:
print('MNIST data loaded: train:',len(training_data),' examples, validation: ', len(validation_data), 'examples, test:',len(test_data), 'examples')
print('Input shape', training_data[0][0].shape)

MNIST data loaded: train: 48000  examples, validation:  12000 examples, test: 10000 examples
Input shape torch.Size([1, 28, 28])
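
The `[1, 28, 28]` shape comes from the `ToTensor()` transform used when loading the data: for 8-bit grayscale images like MNIST, it turns an H x W array of 0-255 pixel values into a float tensor of shape 1 x H x W (channel first) with values scaled to [0, 1]. A plain-Python sketch of that conversion on a hypothetical 2 x 3 image, using nested lists in place of tensors:

```python
def to_tensor_like(image_rows):
    """Scale an H x W grid of 0-255 grayscale values to floats in [0, 1]
    and add a leading channel axis, giving a 1 x H x W nested list."""
    return [[[pixel / 255.0 for pixel in row] for row in image_rows]]

# Hypothetical 2 x 3 grayscale image (0 = black, 255 = white)
image = [[0, 128, 255],
         [64, 0, 255]]
tensor_like = to_tensor_like(image)

# channels, height, width
print(len(tensor_like), len(tensor_like[0]), len(tensor_like[0][0]))
print(tensor_like[0][0])
```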


In [None]:
batch_size = 128

# Each DataLoader yields its dataset in mini-batches of batch_size examples (in a fixed order, since shuffle is off by default)
train_dataloader = torch.utils.data.DataLoader(training_data, batch_size=batch_size)
val_dataloader = torch.utils.data.DataLoader(validation_data, batch_size=batch_size)
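
With `batch_size = 128` and the DataLoader default `drop_last=False`, each pass over a dataset yields ceil(N / 128) mini-batches, and the last one may be smaller. A quick check of the batch counts for the split above; `batch_sizes` is a hypothetical helper that only does the arithmetic:

```python
def batch_sizes(n_examples, batch_size, drop_last=False):
    """Sizes of the mini-batches produced when iterating over n_examples in order."""
    full, leftover = divmod(n_examples, batch_size)
    sizes = [batch_size] * full
    if leftover and not drop_last:
        sizes.append(leftover)  # final, partial batch
    return sizes

train_sizes = batch_sizes(48000, 128)
val_sizes = batch_sizes(12000, 128)
print(len(train_sizes), train_sizes[-1])  # training: every batch is full
print(len(val_sizes), val_sizes[-1])      # validation: the last batch is partial
```

Because `evaluate` below averages the per-batch mean losses, the 96-example final validation batch carries the same weight as a full batch; the effect on the reported loss is small.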

In [None]:
def train_one_epoch(dataloader, model, loss_fn, optimizer):
    model.train()
    for batch, (X, y) in enumerate(dataloader):
        # forward pass
        pred = model(X)
        loss = loss_fn(pred, y)

        # backward pass calculates gradients
        loss.backward()

        # take one step with these gradients
        optimizer.step()

        # reset the gradients; PyTorch accumulates them across backward() calls
        optimizer.zero_grad()
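
`train_one_epoch` repeats the same steps for every batch: forward pass, loss, backward pass, optimizer step. For `torch.optim.SGD` with its default settings (no momentum, no weight decay), the step amounts to `p -= lr * p.grad` for every parameter `p`. The same loop in plain Python for a hypothetical one-weight model y ≈ w·x, with the gradient written out by hand and one example per step instead of a batch of 128:

```python
# Hypothetical toy data generated by y = 3 * x; we fit a single weight w.
data = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]

def train_one_epoch_toy(w, lr):
    """One pass over the data with squared-error loss, one update per example."""
    for x, y in data:
        pred = w * x                 # forward pass
        grad = 2.0 * (pred - y) * x  # backward pass: d/dw of (w*x - y)**2
        w = w - lr * grad            # optimizer step: plain SGD update
        # The gradient is recomputed from scratch here; PyTorch instead adds
        # into each parameter's .grad, which is why optimizer.zero_grad() is needed.
    return w

w = 0.0
for epoch in range(20):
    w = train_one_epoch_toy(w, lr=0.01)
print(round(w, 4))  # close to the true slope 3.0
```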

In [None]:
def evaluate(dataloader, model, loss_fn):
    # Set the model to evaluation mode: some layers behave differently during training.
    # This matters here because NonlinearClassifier (below) uses Dropout, which is disabled in eval mode
    model.eval()
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    loss, correct = 0, 0

    # We can save computation and memory by not calculating gradients here - we aren't optimizing
    with torch.no_grad():
        # loop over all of the batches
        for X, y in dataloader:
            pred = model(X)
            loss += loss_fn(pred, y).item()
            # how many are correct in this batch? Tracking for accuracy
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    loss /= num_batches
    correct /= size

    accuracy = 100*correct
    return accuracy, loss
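
`model.eval()` in `evaluate` matters whenever the model contains `nn.Dropout`, as `NonlinearClassifier` does. In training mode, dropout zeroes each activation with probability p and scales the survivors by 1 / (1 - p) so the expected value is unchanged; in evaluation mode it passes activations through untouched. A plain-Python sketch of that behavior (`dropout` here is a hypothetical stand-in, not the PyTorch module):

```python
import random

def dropout(values, p, training, rng):
    """Inverted dropout: in training mode, zero each value with probability p
    and scale the rest by 1 / (1 - p); in evaluation mode, return values unchanged."""
    if not training or p == 0.0:
        return list(values)
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]

rng = random.Random(0)  # seeded so the sketch is repeatable
activations = [1.0] * 1000
train_out = dropout(activations, p=0.2, training=True, rng=rng)
eval_out = dropout(activations, p=0.2, training=False, rng=rng)

print(sum(v == 0.0 for v in train_out) / len(train_out))  # fraction zeroed: roughly p = 0.2
print(sum(train_out) / len(train_out))                    # mean stays roughly 1.0
print(eval_out == activations)                            # evaluation mode: unchanged
```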

In [None]:
def show_failures(model, dataloader, maxtoshow=10):
    model.eval()
    # Only the first batch from the dataloader is inspected
    batch = next(iter(dataloader))
    with torch.no_grad():
        predictions = model(batch[0])

    rounded = predictions.argmax(1)
    errors = rounded != batch[1]
    print('Showing up to', maxtoshow, 'failures from the first batch. '
          'The predicted class is shown first and the correct class in parentheses.')
    ii = 0
    plt.figure(figsize=(maxtoshow, 1))
    for i in range(batch[0].shape[0]):
        if ii>=maxtoshow:
            break
        if errors[i]:
            plt.subplot(1, maxtoshow, ii+1)
            plt.axis('off')
            plt.imshow(batch[0][i,0,:,:], cmap="gray")
            plt.title("%d (%d)" % (rounded[i], batch[1][i]))
            ii = ii + 1
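
`show_failures` looks only at the first batch from the dataloader and plots the first `maxtoshow` images whose predicted class (the argmax of the model output) differs from the label. The selection step on its own, as a plain-Python sketch on hypothetical labels (`first_failures` is an illustrative name):

```python
def first_failures(predicted, correct, maxtoshow=10):
    """Return (index, predicted, correct) for up to maxtoshow mismatches, in batch order."""
    failures = []
    for i, (p, c) in enumerate(zip(predicted, correct)):
        if len(failures) >= maxtoshow:
            break
        if p != c:
            failures.append((i, p, c))
    return failures

# Hypothetical predicted and true digits for a batch of six images
predicted = [7, 2, 1, 0, 4, 1]
correct = [7, 2, 1, 0, 9, 7]
print(first_failures(predicted, correct))               # [(4, 4, 9), (5, 1, 7)]
print(first_failures(predicted, correct, maxtoshow=1))  # [(4, 4, 9)]
```

Because only one batch of 128 images is examined, an accurate model may show fewer than `maxtoshow` failures.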


In [None]:
class NonlinearClassifier(nn.Module):

    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.layers_stack = nn.Sequential(
            nn.Linear(28*28, 50),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(50, 50),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(50, 10)
        )

    def forward(self, x):
        x = self.flatten(x)
        x = self.layers_stack(x)

        return x
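
As a sanity check on the architecture, the trainable parameters can be counted by hand: each `nn.Linear(n_in, n_out)` holds an n_out x n_in weight matrix plus n_out biases, while `Flatten`, `ReLU`, and `Dropout` have none. The arithmetic for the 784 -> 50 -> 50 -> 10 stack:

```python
# Layer widths of NonlinearClassifier: 28*28 inputs -> 50 -> 50 -> 10 outputs
layer_sizes = [28 * 28, 50, 50, 10]

def count_linear_params(sizes):
    """Weights (n_in * n_out) plus biases (n_out) for each consecutive pair of widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

print(count_linear_params(layer_sizes))  # 39250 + 2550 + 510
```

In PyTorch, `sum(p.numel() for p in NonlinearClassifier().parameters())` should report the same total.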

# Homework: train a Nonlinear Classifier

1. Write some code to train the NonlinearClassifier.
2. Create a data loader for the test data and check your model's accuracy on the test data.

If you have time, experiment with how to improve the model. Note: training and validation data can be used to compare models, but test data should be saved until the end as a final check of generalization.

In [None]:
def dataloader(dataset, batch_size):
    # Wrap a dataset in a DataLoader that yields mini-batches of batch_size examples
    return torch.utils.data.DataLoader(dataset, batch_size=batch_size)


In [None]:
def train_val_n_epochs(train_dataloader, val_dataloader, model, loss_fn, num_epochs, lr, verbose=True):
    # The optimizer is fixed to plain SGD; only the learning rate is configurable
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)

    for n in range(num_epochs):
        train_one_epoch(train_dataloader, model, loss_fn, optimizer)

        # evaluate() switches the model to eval mode; train_one_epoch() switches it back
        train_acc, train_loss = evaluate(train_dataloader, model, loss_fn)
        val_acc, val_loss = evaluate(val_dataloader, model, loss_fn)

        if verbose:
            # checking on the training loss and accuracy once per epoch
            print(f"Epoch {n}: training loss: {train_loss}, accuracy: {train_acc}")

            # checking on the validation loss and accuracy once per epoch
            print(f"Epoch {n}: validation loss: {val_loss}, accuracy: {val_acc}")

In [None]:
# Wrap the training and validation sets in dataloaders
my_batch_size = 128
train_dataloader = dataloader(training_data, my_batch_size)
val_dataloader = dataloader(validation_data, my_batch_size)

In [None]:
# Initialize the model and the loss function
nonlinear_model = NonlinearClassifier()
loss_fn = nn.CrossEntropyLoss()
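
`nn.CrossEntropyLoss` takes the raw scores (logits) produced by the final `nn.Linear` layer, which is why `NonlinearClassifier` has no softmax at the end: the loss applies log-softmax itself, takes the negative log-probability of the correct class, and by default averages over the batch. A plain-Python sketch of that computation (helper names are illustrative):

```python
import math

def cross_entropy(logits, target):
    """-log(softmax(logits)[target]), computed as logsumexp(logits) - logits[target].
    Subtracting the max before exponentiating avoids overflow."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def mean_cross_entropy(batch_logits, targets):
    """Average the per-example losses, like the default reduction='mean'."""
    losses = [cross_entropy(row, t) for row, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)

# Equal scores for all 10 digits give a loss of log(10), about 2.303
print(cross_entropy([0.0] * 10, 3))
# Confident, correct predictions give a loss near zero
print(mean_cross_entropy([[5.0, 0.0, 0.0], [0.0, 0.0, 5.0]], [0, 2]))
```

The log(10) value is a useful reference point: a 10-class model whose initial scores are roughly equal should start with a loss near it.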

In [None]:
# Select the number of epochs and the learning rate (the optimizer is fixed to SGD inside train_val_n_epochs)
n_epochs = 5
lr = 0.1


In [None]:
train_val_n_epochs(train_dataloader, val_dataloader, nonlinear_model, loss_fn, n_epochs, lr)  # train and validate with the settings above

Epoch 0: training loss: 0.3620865137974421, accuracy: 89.63958333333333
Epoch 0: validation loss: 0.3583734010128265, accuracy: 89.55
Epoch 1: training loss: 0.2535987759629885, accuracy: 92.58958333333334
Epoch 1: validation loss: 0.2500910741534639, accuracy: 92.45833333333333
Epoch 2: training loss: 0.21031810522079467, accuracy: 93.72708333333334
Epoch 2: validation loss: 0.21044380891513317, accuracy: 93.63333333333334
Epoch 3: training loss: 0.1797093785703182, accuracy: 94.66041666666666
Epoch 3: validation loss: 0.18371592065755357, accuracy: 94.27499999999999
Epoch 4: training loss: 0.16030347616473833, accuracy: 95.1875
Epoch 4: validation loss: 0.16458108561470153, accuracy: 95.05


In [None]:
# load the test data and evaluate on it.
test_dataloader = dataloader(test_data, my_batch_size)
test_acc, test_loss = evaluate(test_dataloader, nonlinear_model, loss_fn)
print(f"Final model testing loss: {test_loss}, accuracy: {test_acc}")

Final model testing loss: 0.1644941335514518, accuracy: 94.91000000000001


In [None]:
import itertools as it

def get_dict_combinations(input_dict):
    # Cartesian product of all parameter values; tuples follow the sorted key order
    all_names = sorted(input_dict)
    combinations = it.product(*(input_dict[name] for name in all_names))
    comb_list = list(combinations)
    return comb_list

In [None]:
# Some basic hyperparameter searching:

def tune_train_nonlinear(param_dict):
    test_accs = []
    test_losses = []
    param_combs = get_dict_combinations(param_dict)

    for combination in param_combs:
        # Keys are sorted, so 'lr' comes before 'n_epochs' in each tuple
        lr = combination[0]
        n_epochs = combination[1]

        # A fresh model for every combination, so runs do not share weights
        nonlinear_model = NonlinearClassifier()
        loss_fn = nn.CrossEntropyLoss()

        train_val_n_epochs(train_dataloader, val_dataloader, nonlinear_model, loss_fn, n_epochs, lr, verbose=False)
        # Note: this scores each combination on the test set; validation data is the better basis for comparing them
        test_acc, test_loss = evaluate(test_dataloader, nonlinear_model, loss_fn)
        print(f"Final model testing loss: {test_loss}, accuracy: {test_acc}")
        test_accs.append(test_acc)
        test_losses.append(test_loss)

    return test_accs, test_losses



In [None]:
param_dict = {'n_epochs': [5,10], 'lr' : [0.05, 0.1]}

In [None]:
param_combs = get_dict_combinations(param_dict)
param_combs

[(0.05, 5), (0.05, 10), (0.1, 5), (0.1, 10)]
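
The tuples above follow the sorted parameter names, so `'lr'` comes before `'n_epochs'`; this is why `tune_train_nonlinear` reads the learning rate from `combination[0]` and the epoch count from `combination[1]`. Adding a parameter whose name sorts between them would silently shift those positions. A sketch of a variant that keeps the names attached (`get_named_combinations` is a hypothetical helper, not used elsewhere in the notebook):

```python
import itertools as it

def get_named_combinations(input_dict):
    """Like get_dict_combinations, but each combination is a dict keyed by parameter name."""
    names = sorted(input_dict)
    return [dict(zip(names, values))
            for values in it.product(*(input_dict[name] for name in names))]

param_dict = {'n_epochs': [5, 10], 'lr': [0.05, 0.1]}
for combo in get_named_combinations(param_dict):
    print(combo['lr'], combo['n_epochs'])
```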

In [None]:
tune_train_nonlinear(param_dict)

Final model testing loss: 0.22767401788430877, accuracy: 93.03
Final model testing loss: 0.15926099219064735, accuracy: 94.99
Final model testing loss: 0.1722397496776443, accuracy: 94.75
Final model testing loss: 0.11935077312319906, accuracy: 96.46000000000001


([93.03, 94.99, 94.75, 96.46000000000001],
 [0.22767401788430877,
  0.15926099219064735,
  0.1722397496776443,
  0.11935077312319906])
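
Picking the best configuration from these test-set scores would make the test set part of model selection, whereas the homework note reserves it for a final check of generalization. A sketch of the alternative: score every candidate on the validation set, keep the best, and evaluate only that model on the test set once. `select_by_validation` is a hypothetical helper, and `toy_validate` is a placeholder scoring function for illustration only, not a real training run:

```python
def select_by_validation(candidates, validate):
    """Return the candidate with the highest validation score, plus that score.

    `validate` stands in for: train a fresh model with these hyperparameters
    and return its validation accuracy.
    """
    best, best_score = None, float('-inf')
    for params in candidates:
        score = validate(params)
        if score > best_score:
            best, best_score = params, score
    return best, best_score

def toy_validate(params):
    # Placeholder score for illustration only; not derived from MNIST
    return params['lr'] * params['n_epochs']

candidates = [{'lr': 0.05, 'n_epochs': 5}, {'lr': 0.05, 'n_epochs': 10},
              {'lr': 0.1, 'n_epochs': 5}, {'lr': 0.1, 'n_epochs': 10}]
best, best_score = select_by_validation(candidates, toy_validate)
print(best, best_score)
```

In this notebook, `validate` could build a `NonlinearClassifier`, train it with `train_val_n_epochs(..., verbose=False)`, and return `evaluate(val_dataloader, model, loss_fn)[0]`; only the winning model would then be scored with `test_dataloader`.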

# JupyterHub Reminder

From [Homework 0](https://github.com/argonne-lcf/ai-science-training-series/blob/main/00_introToAlcf/02_jupyterNotebooks.md): "If you simply close your browser window, or logout without shutting down the jupyter server, your job will continue to occupy the worker node. Be considerate and shutdown your job when you finish."

File --> Hub Control Panel --> Stop my server