# First tutorial on PyTorch

This tutorial teaches you how to **create, train, and test (evaluate)** a PyTorch model.

## step 1: create your PyTorch model

In [2]:
import torch
from torch import nn

In the first step, you need to know how to create a PyTorch deep-learning model (namely, an **nn.Module**):

You simply create a class that inherits from nn.Module.

PyTorch itself only requires forward(). In this tutorial the class also carries its own loss and optimizer, so it defines:
- self.loss
- self.optimizer
- self.forward()

For example:

In [None]:
class NeuralNetwork(nn.Module):

    def __init__(self):
        super(NeuralNetwork, self).__init__()
        
        self.loss = None  # often named self.criterion instead
        self.optimizer = None
        
    def forward(self, x):
        pass

1. The loss is easy: take one from the library, e.g. nn.CrossEntropyLoss().
2. The optimizer is also easy: take one from the library, e.g. torch.optim.SGD(),
    but you need to pass it self.parameters() yourself:
    > self.optimizer = torch.optim.SGD(self.parameters(), lr=1e-3)

In [None]:
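class NeuralNetwork(nn.Module):

    def __init__(self):
        super(NeuralNetwork, self).__init__()

        # Layers must be defined before the optimizer: with no layers,
        # self.parameters() is empty and torch.optim.SGD raises a ValueError.
        self.linear = nn.Linear(28 * 28, 10)  # placeholder layer for this skeleton

        self.loss = nn.CrossEntropyLoss()
        # the optimizer keeps references to the parameter tensors, not copies
        self.optimizer = torch.optim.SGD(self.parameters(), lr=1e-3)

    def forward(self, x):
        pass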
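
To see why the layers have to exist before the optimizer is created, here is a minimal sketch. The class name `TinyNet` and the layer sizes are made up for illustration. It only checks that the optimizer ends up holding the very same tensors that the module reports through `parameters()`.

```python
import torch
from torch import nn


class TinyNet(nn.Module):  # hypothetical toy model, not part of the tutorial
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)  # defined first, so it is registered
        self.loss = nn.CrossEntropyLoss()  # has no trainable parameters
        self.optimizer = torch.optim.SGD(self.parameters(), lr=1e-3)

    def forward(self, x):
        return self.linear(x)


net = TinyNet()

# number of scalar weights known to the module and to the optimizer
n_model = sum(p.numel() for p in net.parameters())
n_optim = sum(p.numel() for g in net.optimizer.param_groups for p in g["params"])

# the optimizer stores the same tensor objects, not copies
same_objects = all(
    a is b
    for a, b in zip(net.parameters(), net.optimizer.param_groups[0]["params"])
)
print(n_model, n_optim, same_objects)
```

Because the optimizer shares the tensors with the module, a later `optimizer.step()` changes the model's weights in place.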

3. Now let's introduce the forward() method.

This is where you define your model **architecture**:

In [None]:
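def forward(self, x):
    # NOTE: written inline only to show the architecture. Layers built here are
    # re-created with fresh random weights on every call and are never
    # registered as parameters, so real code should build them in __init__().
    y = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
            nn.ReLU(),
        )(x)  # no sigmoid/softmax here: nn.CrossEntropyLoss expects raw scores
    return y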
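
To make the architecture concrete, the following sketch pushes one dummy batch through the same kind of stack and records the shape after each layer. The batch size of 64 and the random input are assumptions chosen to match a 28x28 single-channel image batch.

```python
import torch
from torch import nn

stack = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512),
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

x = torch.rand(64, 1, 28, 28)  # dummy batch in [N, C, H, W] layout

shapes = []
h = x
for layer in stack:  # nn.Sequential can be iterated layer by layer
    h = layer(h)
    shapes.append((type(layer).__name__, tuple(h.shape)))

for name, shape in shapes:
    print(f"{name:8s} -> {shape}")
```

Flatten turns each image into a vector of 784 values, and the last linear layer leaves one score per class.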

In practice, define your blocks in __init__() (or in helper methods called from it), so that they are created once and their parameters are registered with the module, like:

In [1]:
import torch
import torch.nn.functional as F
from torch import nn

class NeuralNetwork(nn.Module):

    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28 * 28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
            nn.ReLU()
        )

        # loss and optimizer (created after the layers so that
        # self.parameters() already contains their weights)
        self.loss = nn.CrossEntropyLoss()
        self.optimizer = torch.optim.SGD(self.parameters(), lr=1e-3)

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        # return raw scores: nn.CrossEntropyLoss applies log_softmax itself;
        # use F.softmax(logits, dim=1) only when you need probabilities
        return logits

P.S. In PyTorch, if you use nn.CrossEntropyLoss() (or nn.BCEWithLogitsLoss()), the model must output raw logits **without** log_softmax() or nn.Sigmoid(): these losses apply that step internally.
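
The note above can be checked numerically. This is a small sketch with random logits and targets (all values are made up) comparing each "with logits" loss against its manual two-step equivalent.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 10)          # raw scores for 4 samples, 10 classes
target = torch.tensor([1, 0, 9, 3])  # integer class labels

# CrossEntropyLoss on logits == NLL loss on log_softmax(logits)
ce = nn.CrossEntropyLoss()(logits, target)
manual_ce = F.nll_loss(F.log_softmax(logits, dim=1), target)

# BCEWithLogitsLoss on logits == BCELoss on sigmoid(logits)
binary_target = (torch.rand(4, 10) > 0.5).float()
bce_logits = nn.BCEWithLogitsLoss()(logits, binary_target)
manual_bce = nn.BCELoss()(torch.sigmoid(logits), binary_target)

print(torch.allclose(ce, manual_ce), torch.allclose(bce_logits, manual_bce, atol=1e-5))
```

Applying softmax or sigmoid in forward() and then using one of these losses would therefore apply the normalization twice.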

## step 2: train it

Now that you have your PyTorch model, to train it you should
define your **train_step()** first.

train_step() describes how you handle each batch of data (x_batch, y_batch).
Normally, you compute the gradients and let the optimizer apply them:

- 1) clear the gradient information left over from the previous step.
- 2) compute the loss and backpropagate it, which stores a gradient on each parameter.
- 3) the optimizer (which holds references to the parameters) updates them based on those gradients.

In [None]:
def train_step(self, x, y):
    
    # 1) clear previous gradient:
    self.optimizer.zero_grad()
    
    # 2) calculate the gradient:
    y_pred = self(x)
    loss = self.loss(y_pred, y)  # (y_pred, y) are also commonly called (output, target)
    loss.backward()
    
    # 3) update the parameters:
    self.optimizer.step()
    
    # return the loss tensor as a Python float (self.loss is the criterion, not the value)
    return loss.item()
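
Why clearing the gradient comes first: backward() adds to whatever is already stored in `.grad`. The sketch below uses a hand-made two-element parameter (not the tutorial's model) to show the accumulation and the effect of `optimizer.zero_grad()`.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)  # toy parameter
x = torch.tensor([3.0, 4.0])
optimizer = torch.optim.SGD([w], lr=0.1)

(w * x).sum().backward()
first = w.grad.clone()        # gradient of sum(w * x) w.r.t. w is x

(w * x).sum().backward()      # no reset in between
accumulated = w.grad.clone()  # the second gradient was added to the first

optimizer.zero_grad()         # reset before the next backward pass
(w * x).sum().backward()
after_reset = w.grad.clone()

print(first, accumulated, after_reset)
```

Without the reset, each step would update the weights with the sum of all gradients seen so far rather than the gradient of the current batch.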

Or you could split the logic into the following two methods:
- step_function()
- backprop()

In [None]:
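def step_function(self, x, y):
    # forward pass only, so it can be reused for evaluation
    output = self(x)
    loss = self.loss(output, y)
    return output, loss

def backprop(self, loss):
    # `loss` is the tensor returned by step_function(); self.loss is the
    # criterion module and has no .backward()
    self.optimizer.zero_grad()
    loss.backward()
    self.optimizer.step()


def train_step(self, x, y):
    output, loss = self.step_function(x, y)
    self.backprop(loss)
    return loss.item()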



With this helper method, we can train the model:

In [None]:
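# train the PyTorch model
from tqdm import tqdm

model = NeuralNetwork()
epochs = 3  # number of passes over the data (example value)

# first of all, switch to train mode
model.train()

# datagen is assumed to be a torch.utils.data.DataLoader yielding (x, y) batches
for t in range(epochs):
    for x, y in tqdm(datagen, total=len(datagen)):
        model.train_step(x, y)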
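
For a version of this loop that runs on its own, here is a rough sketch on synthetic data. The dataset, the small two-layer model, the learning rate, and the number of epochs are all assumptions made for the example, and tqdm is left out to keep it dependency-free.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# synthetic two-class data: the label depends only on the sign of feature 0
features = torch.randn(256, 8)
labels = (features[:, 0] > 0).long()
datagen = DataLoader(TensorDataset(features, labels), batch_size=32, shuffle=True)

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

epochs = 5
model.train()
epoch_losses = []
for epoch in range(epochs):
    running = 0.0
    for x, y in datagen:
        optimizer.zero_grad()          # 1) clear old gradients
        loss = criterion(model(x), y)  # 2) forward pass and loss
        loss.backward()                #    backpropagate
        optimizer.step()               # 3) update the parameters
        running += loss.item()
    epoch_losses.append(running / len(datagen))  # mean batch loss per epoch

print(epoch_losses)
```

Tracking the mean loss per epoch is a cheap way to confirm that training is actually making progress.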

## step 3: evaluate it

To evaluate the model, you need to:

- switch to evaluation mode
- turn off gradient calculation
- use **.item()** to get plain Python scalars instead of tensors

In [None]:
model.eval()  # switch to evaluation mode

test_loss, correct = 0, 0
with torch.no_grad():
    for X, y in datagen:
        y_pred = model(X)
        test_loss += model.loss(y_pred, y).item()
        correct += (y_pred.argmax(1) == y).type(torch.float).sum().item()

size = len(datagen.dataset)
test_loss_avg = test_loss / len(datagen)  # each batch loss is already a mean, so average over batches
acc = correct / size  # accuracy
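
As a self-contained illustration of the evaluation loop, the sketch below uses random data and an untrained linear model (both assumptions, so the accuracy itself is meaningless). It weights each batch loss by the batch size, which gives the exact per-sample average even when the last batch is smaller than the others.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
features = torch.randn(100, 8)
labels = torch.randint(0, 3, (100,))
datagen = DataLoader(TensorDataset(features, labels), batch_size=32)  # last batch has 4 samples

model = nn.Linear(8, 3)
criterion = nn.CrossEntropyLoss()  # returns the mean loss of a batch

model.eval()
total_loss, correct = 0.0, 0
with torch.no_grad():
    for x, y in datagen:
        y_pred = model(x)
        total_loss += criterion(y_pred, y).item() * len(y)  # undo the batch mean
        correct += (y_pred.argmax(1) == y).sum().item()

size = len(datagen.dataset)
avg_loss = total_loss / size
acc = correct / size

# reference value: the loss computed on the whole dataset at once
with torch.no_grad():
    full_loss = criterion(model(features), labels).item()

print(avg_loss, full_loss, acc)
```

Dividing the sum of batch means by the number of batches is a close approximation, but it over-weights a short final batch.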

Using the model is simple: you just call y = model(x).

However, for inference you should also switch to eval mode (so that layers such as dropout and batch normalization behave deterministically) and turn off gradient calculation, which saves memory and time.

In [None]:
model.eval()

with torch.no_grad():
    y_pred = model(x)  # predictions, computed without recording a graph
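
What the two switches change can be seen on a toy model. This sketch assumes a small network with a dropout layer (not the tutorial's model) and compares outputs in eval mode under `torch.no_grad()` with an output in train mode.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(2, 4)

model.eval()
with torch.no_grad():
    a = model(x)
    b = model(x)

deterministic = torch.equal(a, b)  # dropout is disabled in eval mode
tracks_grad_eval = a.requires_grad  # no graph is recorded under no_grad

model.train()
c = model(x)
tracks_grad_train = c.requires_grad  # the graph is recorded again

print(deterministic, tracks_grad_eval, tracks_grad_train)
```

`model.eval()` changes layer behavior, while `torch.no_grad()` stops autograd bookkeeping. They are independent, so inference code usually needs both.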

# Using the model from a .py file

In [1]:
%%capture
from tqdm import tqdm

import torch
import torch.nn as nn
import torch.nn.functional as F

from material.data import mnist_datagen

from i_nn import *


In [2]:
train_gen, test_gen = fmnist_datagen()
for x, y in test_gen:
    x_sample, y_sample = x[0], y[0]
    break

Shape of X [N, C, H, W]:  torch.Size([64, 1, 28, 28])
Shape of y:  torch.Size([64]) torch.int64 tensor([9, 2, 1, 1, 6, 1, 4, 6, 5, 7])


In [3]:
net = NeuralNetwork()

In [4]:
net.train()
for batch_idx, (x, y) in tqdm(enumerate(train_gen), total=len(train_gen)):
    net.optimizer.zero_grad()
    y_pred = net(x)
    print(y.shape)
    print(y_pred.shape)
    net.loss(y_pred, y).backward()
    net.optimizer.step()
    break

  0%|          | 0/938 [00:00<?, ?it/s]

torch.Size([64])
torch.Size([64, 10])




In [5]:
model = PytorchNN()

In [7]:
model.fit_datagen(train_gen, epochs=3)
model.eval_datagen(test_gen)

  1%|▏         | 12/938 [00:00<00:07, 116.98it/s]

Epoch 1
-------------------------------


100%|██████████| 938/938 [00:06<00:00, 136.80it/s]
  1%|▏         | 14/938 [00:00<00:06, 132.70it/s]

Test Error: 
 Accuracy: 62.1%, Avg loss: 0.016695 

Epoch 2
-------------------------------


100%|██████████| 938/938 [00:06<00:00, 138.21it/s]
  1%|▏         | 14/938 [00:00<00:06, 132.45it/s]

Test Error: 
 Accuracy: 63.4%, Avg loss: 0.016114 

Epoch 3
-------------------------------


100%|██████████| 938/938 [00:06<00:00, 137.14it/s]


Test Error: 
 Accuracy: 64.4%, Avg loss: 0.015623 

Done!
Test Error: 
 Accuracy: 63.2%, Avg loss: 0.015886 



In [8]:
# use the trained model on one sample
pred = model.predict(x_sample)
print('Predicted: {}, Actual: {}'.format(
    fmnist_classes[pred[0].argmax(0)], fmnist_classes[y_sample]
))


Predicted: Ankle boot, Actual: Ankle boot
