<a href="https://colab.research.google.com/github/ayulockin/debugNNwithWandB/blob/master/MNIST_pytorch_wandb_Overfit_Small.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Imports and Setup

In [1]:
!pip install wandb -q


In [0]:
import wandb

In [3]:
!wandb login

wandb: You can find your API key in your browser here: https://app.wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter: <API key redacted>
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
Successfully logged in to Weights & Biases!


In [0]:
import torch
from torch import nn
from torch import optim
from torch.nn import functional as F
import torchvision
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

import matplotlib.pyplot as plt
import numpy as np

#### For GPU

In [5]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

cuda:0


## MNIST Handwritten Digit Dataset

In [6]:
# Convert images to tensors, then normalize with the MNIST mean and std
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),
])

trainset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
trainloader = DataLoader(trainset, batch_size=64, shuffle=True)

testset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)
testloader = DataLoader(testset, batch_size=64, shuffle=False)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz
Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST/raw/train-labels-idx1-ubyte.gz
Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz
Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw
Processing...
Done!

In [0]:
classes = ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9')
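
The `Normalize((0.1307,), (0.3081,))` transform used when loading the data rescales each pixel as `(x - mean) / std`. The values 0.1307 and 0.3081 are the commonly quoted mean and standard deviation of the MNIST training pixels after `ToTensor()` scales them to [0, 1]. A minimal pure-Python sketch of that arithmetic follows; the helper name `normalize_pixel` is hypothetical and not part of torchvision.

```python
MNIST_MEAN = 0.1307  # mean passed to transforms.Normalize in this notebook
MNIST_STD = 0.3081   # std passed to transforms.Normalize in this notebook


def normalize_pixel(value, mean=MNIST_MEAN, std=MNIST_STD):
    """Apply the per-pixel formula used by transforms.Normalize: (x - mean) / std."""
    return (value - mean) / std


# ToTensor() maps black to 0.0 and white to 1.0 before normalization
black = normalize_pixel(0.0)
white = normalize_pixel(1.0)
print(f"black -> {black:.4f}, white -> {white:.4f}")
```

After normalization the mostly black background sits slightly below zero, while bright strokes map to values well above one.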

## Small Dataset 

In [0]:
# Grab a single batch of 64 images; next(iterator) replaces the removed .next() method
x_small, y_small = next(iter(trainloader))

In [28]:
x_small.shape, y_small.shape

(torch.Size([64, 1, 28, 28]), torch.Size([64]))

## Model

In [0]:
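class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Two 3x3 convolutions (stride 1, no padding, no bias)
        self.conv1 = nn.Conv2d(1, 32, 3, 1, bias=False)
        self.conv2 = nn.Conv2d(32, 64, 3, 1, bias=False)

        # 9216 = 64 channels * 12 * 12 after the 2x2 max pool
        self.fc1 = nn.Linear(9216, 128, bias=False)
        self.fc2 = nn.Linear(128, 10, bias=False)

    def forward(self, x):
        # Convolutional block: (N, 1, 28, 28) -> (N, 64, 12, 12)
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)

        # Classifier head: flatten to (N, 9216), then two linear layers
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        # Log-probabilities over the 10 digit classes (pairs with F.nll_loss)
        output = F.log_softmax(x, dim=1)
        return output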
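
The `9216` input size of `fc1` is not arbitrary: it follows from how the two unpadded 3x3 convolutions and the 2x2 max pool shrink a 28x28 image. The sketch below works through that arithmetic in plain Python; `conv_out` is a hypothetical helper written for illustration, not a PyTorch function.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a square convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


side = 28                                  # MNIST images are 28x28
side = conv_out(side, kernel=3)            # conv1: 28 -> 26
side = conv_out(side, kernel=3)            # conv2: 26 -> 24
side = conv_out(side, kernel=2, stride=2)  # max_pool2d(x, 2): 24 -> 12

channels = 64                              # out_channels of conv2
flat_features = channels * side * side
print(flat_features)  # 9216
```

If the kernel sizes, padding, or input resolution change, the same calculation gives the new `in_features` value for `fc1`.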

#### Training step modified for a single batch

In [0]:
def train(model, device, data, target, optimizer, epoch):
  # Switch the model to training mode. This matters for layers such as dropout and batchnorm, which behave differently in training and evaluation mode
  model.train()

  # Move the input batch and its labels to the target device
  data, target = data.to(device), target.to(device)

  # Reset the gradients of all learnable parameters to zero
  optimizer.zero_grad()

  # Forward pass: predict log-probabilities for the 10 digit classes (0-9)
  output = model(data)

  # Negative log-likelihood loss (the model already applies log_softmax)
  loss = F.nll_loss(output, target)

  # Predicted class = index of the largest log-probability; count correct predictions
  predictions = output.detach().argmax(dim=1)
  train_total = target.size(0)
  train_correct = (predictions == target).sum().item()

  # Backward pass: compute the gradients of the loss w.r.t. the model's parameters
  loss.backward()

  # Update the neural network weights
  optimizer.step()

  acc = round((train_correct / train_total) * 100, 2)
  print('Epoch [{}], Loss: {}, Accuracy: {}, '.format(epoch, loss.item(), acc))
  wandb.log({'Train Loss': loss.item(), 'Train Accuracy': acc})
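
The training step pairs `F.nll_loss` with a model that ends in `F.log_softmax`. As a small check of why that pairing works, the sketch below compares it with `F.cross_entropy` applied to raw scores. The logits and labels are made-up values for illustration, not MNIST data.

```python
import torch
from torch.nn import functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)          # made-up raw scores: 4 samples, 10 classes
target = torch.tensor([3, 0, 9, 1])  # made-up class labels

# Two-step form used in this notebook: log_softmax in the model, nll_loss in train()
loss_two_step = F.nll_loss(F.log_softmax(logits, dim=1), target)

# One-step form: cross_entropy applies log_softmax internally
loss_one_step = F.cross_entropy(logits, target)

print(torch.allclose(loss_two_step, loss_one_step))
```

Either form should give the same loss. The thing to avoid is mixing them up, for example feeding raw logits to `F.nll_loss`.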


In [33]:
net = Net().to(device)
print(net)

optimizer = optim.Adam(net.parameters())

Net(
  (conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), bias=False)
  (conv2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), bias=False)
  (fc1): Linear(in_features=9216, out_features=128, bias=False)
  (fc2): Linear(in_features=128, out_features=10, bias=False)
)


In [34]:
wandb.init(project='overfitsmall')
wandb.watch(net, log='all')

for epoch in range(5):
  train(net, device, x_small, y_small, optimizer, epoch)

print('Finished Training')

Epoch [0], Loss: 2.295197010040283, Accuracy: 17.19, 
Epoch [1], Loss: 1.8784966468811035, Accuracy: 23.44, 
Epoch [2], Loss: 1.437056541442871, Accuracy: 54.69, 
Epoch [3], Loss: 0.9664705991744995, Accuracy: 89.06, 
Epoch [4], Loss: 0.6278926134109497, Accuracy: 92.19, 
Finished Training


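> The model quickly fits the small batch: over five steps on the same 64 images, the loss falls from 2.30 to 0.63 and accuracy rises from 17.19% to 92.19%. This suggests the model and training step are wired up correctly, so we can move on to training on the full dataset.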
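
The single-batch check can also be wrapped in a small reusable helper. The sketch below is one possible shape for it, not the notebook's own code: `overfit_one_batch` is a hypothetical name, the data is random noise standing in for an MNIST batch, and the toy model outputs raw logits, so it uses `F.cross_entropy` rather than `F.nll_loss`.

```python
import torch
from torch import nn, optim
from torch.nn import functional as F


def overfit_one_batch(model, data, target, steps=50, lr=1e-2):
    """Train on one fixed batch and return the loss recorded at each step."""
    optimizer = optim.Adam(model.parameters(), lr=lr)
    model.train()
    losses = []
    for _ in range(steps):
        optimizer.zero_grad()
        loss = F.cross_entropy(model(data), target)  # model must output raw logits
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses


torch.manual_seed(0)
x = torch.randn(16, 1, 28, 28)   # random stand-in for a batch of 16 images
y = torch.randint(0, 10, (16,))  # random stand-in labels
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

losses = overfit_one_batch(toy_model, x, y)
print(losses[0], losses[-1])
```

A loss that stays flat or grows on a single batch usually points to a bug in the model, the loss, or the optimizer setup rather than a lack of data. For the `Net` defined in this notebook, which already applies `log_softmax`, the loss inside the helper would be swapped for `F.nll_loss`.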

