# PyTorch and Deep Learning, without a PhD

In [5]:
import torch
import numpy as np

![](./images/bdd8f1f362889583.png)

#### Deep Learning Recipe
Training digits => updates to weights and biases => better recognition (loop)

![](./images/d5222c6e3d15770a.png)

## Softmax

![](./images/softmax.png)
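
As a minimal sketch, assuming the slide above shows the usual softmax definition (exponentiate every score, then divide by the sum), the computation could look like this. The `logits` values are made up for illustration.

```python
import torch

def softmax(logits):
    # subtracting the row maximum avoids overflow and does not change the result
    shifted = logits - logits.max(dim=-1, keepdim=True).values
    exps = torch.exp(shifted)
    # normalise so that every row sums to 1
    return exps / exps.sum(dim=-1, keepdim=True)

logits = torch.tensor([[2.0, 1.0, 0.1]])  # hypothetical scores for 3 classes
probs = softmax(logits)
```

The largest score gets the largest probability, and the output should match the built-in `torch.softmax(logits, dim=-1)`.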

## Weights and Biases

![](./images/weights.png)

## Mixing it all!

![](./images/softmax_weights.png)
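
A sketch of the single-layer model, assuming the digits are 28x28 images flattened to 784 values and sorted into 10 classes (neither number is stated in the text). The names `X`, `W`, `b` and `Y` are chosen here for illustration.

```python
import torch

torch.manual_seed(0)
X = torch.rand(100, 784)   # stand-in mini-batch: 100 flattened 28x28 images
W = torch.zeros(784, 10)   # one weight per pixel and per class
b = torch.zeros(10)        # one bias per class

# weighted sum of pixels plus bias, then softmax across the 10 classes
Y = torch.softmax(X @ W + b, dim=1)
```

Each row of `Y` is a probability distribution over the classes. With all-zero weights and biases every class gets the same probability, which is the starting point that training then improves.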

# But how do we learn?

![](./images/cross_entropy.png)
![](./images/cross_entropy_vis.png)
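
A small sketch of cross-entropy, assuming the usual definition: minus the sum of the one-hot label times the log of the predicted probability. It is averaged over the batch here, which is an assumption; the slide may use a plain sum. The `logits` and `labels` values are invented.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 2.5, 0.3]])   # hypothetical scores, 2 samples x 3 classes
labels = torch.tensor([0, 1])              # correct class index for each sample

one_hot = F.one_hot(labels, num_classes=3).float()
probs = torch.softmax(logits, dim=1)

# cross-entropy written out by hand, averaged over the batch
manual = -(one_hot * torch.log(probs)).sum(dim=1).mean()

# the built-in version takes raw logits and applies log-softmax internally
builtin = F.cross_entropy(logits, labels)
```

Both values should agree, so in practice the built-in is the more convenient and numerically safer choice.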

# Wrap up

Training digits and labels => loss function => gradient (partial derivatives) => steepest descent => update weights and biases => repeat with next mini-batch of training images and labels
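
The loop above can be sketched as follows. This is an illustration under stated assumptions, not the notebook's actual training code: the data is random stand-in data, the model is a single linear layer, and the same mini-batch is reused at every step instead of fetching the next one.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(784, 10)                   # assumed 784 pixels -> 10 classes
optimizer = torch.optim.SGD(model.parameters(), lr=0.005)
loss_fn = torch.nn.CrossEntropyLoss()              # softmax + cross-entropy in one step

# stand-in mini-batch; real code would load digit images and labels
images = torch.rand(100, 784)
labels = torch.randint(0, 10, (100,))

losses = []
for step in range(20):
    optimizer.zero_grad()                  # clear gradients from the previous step
    loss = loss_fn(model(images), labels)  # loss function
    loss.backward()                        # gradient (partial derivatives)
    optimizer.step()                       # steepest descent: update weights and biases
    losses.append(loss.item())
```

In a real run, each pass through the loop would draw the next mini-batch of training images and labels.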

# More layers!!!

![](./images/go_deep.png)

![](./images/multi_layer.png)

![](./images/sigmoid.png)
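
A sketch of a deeper network with sigmoid activations between the layers. The layer sizes (200 and 100) are hypothetical and not taken from the slides.

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(784, 200), torch.nn.Sigmoid(),
    torch.nn.Linear(200, 100), torch.nn.Sigmoid(),
    torch.nn.Linear(100, 10),   # final layer outputs raw scores (logits)
)

x = torch.rand(5, 784)          # stand-in batch of 5 flattened images
logits = model(x)
```

The last layer has no softmax because `torch.nn.CrossEntropyLoss` applies it internally during training.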

# Special care for deep nets

![](./images/relu.png)
![](./images/relu_func.png)
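
To see why ReLU tends to help deep nets, the sketch below compares its gradient with the sigmoid's. It assumes the slides make the usual argument that sigmoid gradients shrink towards zero away from the origin; the input values are chosen for illustration.

```python
import torch

x = torch.tensor([-3.0, -0.5, 0.5, 3.0], requires_grad=True)

# ReLU: max(0, x); its gradient is 1 for positive inputs and 0 for negative ones
relu_out = torch.relu(x)
relu_out.sum().backward()
relu_grad = x.grad.clone()

# sigmoid: its gradient never exceeds 0.25 and fades for large |x|
x.grad.zero_()
torch.sigmoid(x).sum().backward()
sigmoid_grad = x.grad.clone()
```

Multiplying many small sigmoid gradients across layers shrinks the training signal, whereas ReLU passes it through unchanged wherever the unit is active.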

# Initialisation!

In [None]:
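def weight_init(model):
    """Zero the biases and draw the weights of every Linear layer from N(0, 0.1)."""
    for m in model.modules():
        if isinstance(m, torch.nn.Linear):
            # torch.nn.init works in place without tracking gradients,
            # which replaces the older `.data` idiom
            torch.nn.init.zeros_(m.bias)
            torch.nn.init.normal_(m.weight, mean=0.0, std=0.1)

weight_init(model)  # initialize the weights; `model` must already be defined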

# Slow down, man!
![](./images/lr_decay.png)

## Too fast
![](./images/bad.png)

In [7]:
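def adjust_lr(optimizer, i):
    """Set the learning rate for iteration i, decaying exponentially from max to min."""
    max_learning_rate = 0.003
    min_learning_rate = 0.0001
    # max=0.003, min=0.0001, decay_speed=2000 reached 0.9826
    # (presumably test accuracy) in 5000 iterations
    decay_speed = 2000.0
    lr = min_learning_rate + (max_learning_rate - min_learning_rate) * np.exp(-i / decay_speed)

    # apply the new rate to every parameter group of the optimizer
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr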

![](./images/better.png)
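
A sketch of how an exponential decay like the one above might be wired into a training loop. The helper `decayed_lr` is a hypothetical standalone version of the same formula, and the choice of Adam as optimizer is an assumption.

```python
import math
import torch

def decayed_lr(i, max_lr=0.003, min_lr=0.0001, decay_speed=2000.0):
    # starts at max_lr for i = 0 and approaches min_lr as i grows
    return min_lr + (max_lr - min_lr) * math.exp(-i / decay_speed)

model = torch.nn.Linear(784, 10)   # stand-in model
optimizer = torch.optim.Adam(model.parameters(), lr=decayed_lr(0))

for i in range(3):
    # update the learning rate before each step
    for group in optimizer.param_groups:
        group['lr'] = decayed_lr(i)
    # the forward pass, loss.backward() and optimizer.step() would go here
```

With these settings the rate starts at 0.003 and has fallen to roughly a tenth of that after 5000 iterations.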

# All convolutional

![](./images/all_convolutional.png)
![](./images/convolution.png)
![](./images/factor.png)
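
A sketch of convolutional layers in PyTorch, to show how the shapes behave. The channel counts, kernel sizes and strides are illustrative assumptions, not values read from the slides.

```python
import torch

x = torch.rand(8, 1, 28, 28)   # stand-in batch: 8 grayscale 28x28 images

# 4 filters of size 5x5; padding=2 keeps the 28x28 spatial size
conv = torch.nn.Conv2d(in_channels=1, out_channels=4, kernel_size=5, stride=1, padding=2)
y = conv(x)

# a stride of 2 halves the spatial size: 28x28 -> 14x14
strided = torch.nn.Conv2d(in_channels=4, out_channels=8, kernel_size=4, stride=2, padding=1)
z = strided(y)
```

The same small set of filter weights (`conv.weight` has shape 4x1x5x5) is reused at every image position, which is why a convolutional layer needs far fewer parameters than a fully connected one.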

## Max Pool to save space

![](./images/max_pool.png)
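
A tiny worked example of 2x2 max pooling on a made-up 4x4 input: every non-overlapping 2x2 block is replaced by its largest value.

```python
import torch
import torch.nn.functional as F

# shape (batch=1, channels=1, height=4, width=4)
x = torch.tensor([[[[ 1.,  2.,  5.,  6.],
                    [ 3.,  4.,  7.,  8.],
                    [ 9., 10., 13., 14.],
                    [11., 12., 15., 16.]]]])

# the stride defaults to the kernel size, so the blocks do not overlap
pooled = F.max_pool2d(x, kernel_size=2)
# pooled is [[4, 8], [12, 16]] with shape (1, 1, 2, 2)
```

Height and width are halved, so the next layer has four times fewer values to process.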

## Put everything together

![](./images/conv_architecture.png)
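
A sketch of how the pieces could be combined into one network: convolutions with ReLU, max pooling, then fully connected layers. The class name `ConvNet` and all layer sizes are hypothetical, so the architecture in the picture above may differ.

```python
import torch

class ConvNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.features = torch.nn.Sequential(
            torch.nn.Conv2d(1, 4, kernel_size=5, padding=2), torch.nn.ReLU(),
            torch.nn.MaxPool2d(2),                      # 28x28 -> 14x14
            torch.nn.Conv2d(4, 8, kernel_size=5, padding=2), torch.nn.ReLU(),
            torch.nn.MaxPool2d(2),                      # 14x14 -> 7x7
        )
        self.classifier = torch.nn.Sequential(
            torch.nn.Linear(8 * 7 * 7, 200), torch.nn.ReLU(),
            torch.nn.Linear(200, 10),                   # logits for 10 classes
        )

    def forward(self, x):
        x = self.features(x)
        # flatten everything except the batch dimension
        return self.classifier(x.flatten(start_dim=1))

net = ConvNet()
out = net(torch.rand(2, 1, 28, 28))   # stand-in batch of 2 images
```

The output has one row of 10 scores per image, ready to be fed to `torch.nn.CrossEntropyLoss`.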