# Building and training your first Network with PyTorch

### Hello nn.Module world

<div class="alert alert-block alert-success">
<b>__init__</b> is a reserved method in Python classes, known as the constructor in object-oriented terminology. <br>
    This method is called when an object is created from the class, and it allows the class to initialize its attributes. <br>
    We will define how our network is created in this method!
</div>

In [6]:
%%writefile mycoolnetwork.py
import torch
import torch.nn as nn
import torch.nn.functional as F

class MyCoolNetwork(nn.Module):

    def __init__(self):
        super(MyCoolNetwork, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square you can specify just a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

Overwriting mycoolnetwork.py


In [7]:
from mycoolnetwork import MyCoolNetwork

net = MyCoolNetwork()
print(net)

MyCoolNetwork(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=576, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
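
The `16 * 6 * 6` input size of `fc1` follows from how each layer shrinks the image. The sketch below traces the shapes, assuming a 32x32 single-channel input (the size used in the next cell); the standalone `conv1` and `conv2` layers only mirror the shapes used in `MyCoolNetwork` and are not the trained network itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-alone layers with the same shapes as conv1 and conv2 in MyCoolNetwork
conv1 = nn.Conv2d(1, 6, 3)
conv2 = nn.Conv2d(6, 16, 3)

x = torch.randn(1, 1, 32, 32)  # assumed input: batch of 1, 1 channel, 32x32
shapes = [tuple(x.shape)]

x = conv1(x)                    # 3x3 kernel, no padding: 32 -> 30
shapes.append(tuple(x.shape))
x = F.max_pool2d(F.relu(x), 2)  # 2x2 pooling: 30 -> 15
shapes.append(tuple(x.shape))
x = conv2(x)                    # 3x3 kernel, no padding: 15 -> 13
shapes.append(tuple(x.shape))
x = F.max_pool2d(F.relu(x), 2)  # 2x2 pooling, rounded down: 13 -> 6
shapes.append(tuple(x.shape))

# Number of features once everything except the batch dimension is flattened
flat_features = x[0].numel()

for shape in shapes:
    print(shape)
print(flat_features)
```

A different input size would change the final spatial dimensions, so `fc1` would need a matching `in_features` value.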


In [13]:
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)

tensor([[ 0.0086,  0.0913,  0.1293,  0.0327, -0.0482,  0.0070,  0.0132, -0.0926,
         -0.1540,  0.1185]], grad_fn=<AddmmBackward>)


<div class="alert alert-block alert-success">
<b>PyTorch has your back!</b> You just have to define the <b>forward function</b>, and the backward function (where gradients are computed) is automatically defined for you using <b>autograd</b>. You can use any of the Tensor operations in the forward function.
</div>
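
To see autograd at work on something small enough to check by hand, here is a minimal sketch using a single `nn.Linear` layer (a stand-in chosen for illustration, not the network above). For `out = sum(Wx + b)`, the gradient with respect to `W` is the sum of the input rows, and the gradient with respect to `b` is the batch size.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(3, 1)   # small illustrative layer
x = torch.randn(4, 3)     # batch of 4 inputs with 3 features each

out = layer(x).sum()      # only the forward computation is written by hand
grad_before = layer.weight.grad  # no gradient exists before backward()

out.backward()            # autograd derives and runs the backward pass

# Gradient worked out by hand: d(out)/dW is the sum of the input rows
expected_weight_grad = x.sum(dim=0, keepdim=True)

print(grad_before)
print(torch.allclose(layer.weight.grad, expected_weight_grad))
print(layer.bias.grad)
```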


### Where are the weights?
The weights, or learnable parameters, of a model are returned by net.parameters(), or by mylayer.parameters() for a single layer.

In [11]:
params = list(net.parameters())
print(len(params))
print(params[0].size()) 
print(list(net.fc2.parameters()))

10
torch.Size([6, 1, 3, 3])
[Parameter containing:
tensor([[ 0.0537,  0.0095,  0.0531,  ...,  0.0594,  0.0349, -0.0253],
        [-0.0561,  0.0631, -0.0290,  ..., -0.0359,  0.0074,  0.0511],
        [-0.0383, -0.0544, -0.0094,  ..., -0.0169, -0.0293,  0.0564],
        ...,
        [ 0.0787, -0.0835, -0.0592,  ...,  0.0427, -0.0752, -0.0084],
        [ 0.0024, -0.0393, -0.0572,  ..., -0.0271, -0.0412,  0.0745],
        [-0.0036, -0.0815,  0.0035,  ...,  0.0289, -0.0773, -0.0057]],
       requires_grad=True), Parameter containing:
tensor([-0.0902, -0.0867, -0.0007, -0.0181, -0.0582,  0.0492, -0.0409,  0.0528,
        -0.0617,  0.0339,  0.0341,  0.0260,  0.0502,  0.0836,  0.0894,  0.0681,
        -0.0774,  0.0040,  0.0683, -0.0839, -0.0050,  0.0124, -0.0876,  0.0096,
        -0.0811, -0.0563,  0.0619,  0.0429,  0.0851,  0.0669, -0.0385, -0.0299,
         0.0205,  0.0350, -0.0760, -0.0733,  0.0163, -0.0456,  0.0054,  0.0879,
         0.0712,  0.0660, -0.0361, -0.0025, -0.0414, -0.0635, -0.
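
Printing whole tensors gets unwieldy quickly. A more compact view, sketched below, uses `named_parameters()` to list each parameter's name and shape and to count the learnable values. The `LayersOnly` class is a hypothetical stand-in that holds the same five layers as `MyCoolNetwork`, so the block runs on its own.

```python
import torch.nn as nn

class LayersOnly(nn.Module):
    """Hypothetical stand-in holding the same layers as MyCoolNetwork."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        self.fc1 = nn.Linear(16 * 6 * 6, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

net = LayersOnly()

# Each layer contributes a weight and a bias tensor
names = []
for name, param in net.named_parameters():
    names.append(name)
    print(name, tuple(param.shape))

# Total number of learnable scalar values in the model
total = sum(p.numel() for p in net.parameters() if p.requires_grad)
print(total)
```

With five layers carrying a weight and a bias each, this gives the 10 parameter tensors reported by `len(params)` above.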

<div class="alert alert-block alert-info">
<b>Exercise:</b> Make a network that combines an input x with another input z through separate fully-connected layers.</div>

## Writing a supervised learning training loop

In [None]:
# how do we fit this thing?


### Loss Functions
A loss function takes an (output, target) pair and computes a value that estimates how far the output is from the target.
A simple loss is nn.MSELoss, which computes the mean squared error between the output and the target.

In [15]:
# There are several different loss functions under the nn package 
[x for x in dir(nn) if x.endswith('Loss')]

['AdaptiveLogSoftmaxWithLoss',
 'BCELoss',
 'BCEWithLogitsLoss',
 'CTCLoss',
 'CosineEmbeddingLoss',
 'CrossEntropyLoss',
 'HingeEmbeddingLoss',
 'KLDivLoss',
 'L1Loss',
 'MSELoss',
 'MarginRankingLoss',
 'MultiLabelMarginLoss',
 'MultiLabelSoftMarginLoss',
 'MultiMarginLoss',
 'NLLLoss',
 'PoissonNLLLoss',
 'SmoothL1Loss',
 'SoftMarginLoss',
 'TripletMarginLoss']

In [22]:
output = net(input)
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

tensor(0.5063, grad_fn=<MseLossBackward>)
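
As a sanity check on what `nn.MSELoss` computes, the sketch below compares it with the mean of squared differences written out by hand. The random `output` and `target` tensors are placeholders for illustration, not values from the network above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder tensors with the same shape as the network output above
output = torch.randn(1, 10)
target = torch.randn(1, 10)

builtin = nn.MSELoss()(output, target)

# Mean squared error by hand: average of the squared element-wise differences
manual = ((output - target) ** 2).mean()

print(torch.allclose(builtin, manual))
```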


### Backprop
To backpropagate the error, all we have to do is call loss.backward().
You need to clear the existing gradients first, though; otherwise the new gradients are accumulated into the existing ones.

Now we shall call loss.backward() and look at conv1's bias gradients before and after the backward pass.

In [23]:
print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

conv1.bias.grad before backward
tensor([ 0.0006, -0.0211, -0.0094, -0.0042,  0.0065,  0.0231])
conv1.bias.grad after backward
tensor([-0.0029, -0.0203,  0.0004, -0.0038,  0.0192,  0.0168])
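
The accumulation behaviour is easy to demonstrate in isolation. In this minimal sketch (using a small `nn.Linear` layer and random data as stand-ins), calling `backward()` twice without clearing doubles the stored gradient, while clearing with `zero_grad()` first gives a fresh one.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(4, 2)       # small illustrative layer
x = torch.randn(3, 4)
target = torch.randn(3, 2)
criterion = nn.MSELoss()

# First backward pass: gradients are created
criterion(layer(x), target).backward()
first = layer.bias.grad.clone()

# Second backward pass without clearing: gradients are added to the old ones
criterion(layer(x), target).backward()
accumulated = layer.bias.grad.clone()

# Clearing first gives the gradient of a single pass again
layer.zero_grad()
criterion(layer(x), target).backward()
fresh = layer.bias.grad.clone()

print(torch.allclose(accumulated, 2 * first))
print(torch.allclose(fresh, first))
```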


### Updating the weights
To put this search for the minimum error into practice we use **Stochastic Gradient Descent**. It consists of feeding the network the input vectors of a subset of the training data, computing the outputs and their errors, calculating the gradient for those examples, and adjusting the weights accordingly. This process is repeated over many subsets of examples until the average of the objective function stops decreasing.

![alt text](https://github.com/celiacintas/star_wars_hackathon/raw/8d46effee4e4a82429eb989f017a31c03a6bc2fd/images/saddle_point_evaluation_optimizers.gif)

Check out http://sebastianruder.com/optimizing-gradient-descent/ for an overview of gradient descent optimization algorithms.

We can do it with simple Python code ...

In [25]:
learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)
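
The same update rule, `weight = weight - learning_rate * gradient`, can also be written inside a `torch.no_grad()` block, which avoids going through `.data`. The following is a sketch on a small stand-in layer with random data (both assumptions for illustration), checking that each parameter moved by exactly one gradient step.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(4, 2)       # small illustrative layer
x = torch.randn(8, 4)
target = torch.randn(8, 2)
criterion = nn.MSELoss()

loss = criterion(layer(x), target)
loss.backward()

learning_rate = 0.01
before = [p.detach().clone() for p in layer.parameters()]

# In-place update without recording the operation in the autograd graph
with torch.no_grad():
    for p in layer.parameters():
        p.sub_(learning_rate * p.grad)

# Loss on the same data after one update step
new_loss = criterion(layer(x), target)
print(loss.item(), new_loss.item())
```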


<div class="alert alert-block alert-success">
<b>PyTorch has your back!</b> However, as you use neural networks, you will want to try different update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc. To enable this, PyTorch provides a small package, <b>torch.optim</b>, that implements all these methods.
</div>



In [26]:
from torch import optim
[x for x in dir(optim) if '__' not in x]

['ASGD',
 'Adadelta',
 'Adagrad',
 'Adam',
 'AdamW',
 'Adamax',
 'LBFGS',
 'Optimizer',
 'RMSprop',
 'Rprop',
 'SGD',
 'SparseAdam',
 'lr_scheduler']

Or ... with PyTorch

In [27]:
# create your optimizer with PyTorch
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update
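
Putting the pieces together, a full supervised training loop repeats those five lines over mini-batches for several epochs, as described in the Stochastic Gradient Descent paragraph. The sketch below is one possible arrangement, not a definitive recipe: the synthetic data (`y = 3x + 1` plus noise), the single-layer model, the batch size, and the learning rate are all assumptions chosen so the example runs quickly.

```python
import torch
import torch.nn as nn
from torch import optim

torch.manual_seed(0)

# Synthetic regression data (assumed for illustration): y = 3x + 1 plus noise
inputs = torch.randn(256, 1)
targets = 3 * inputs + 1 + 0.1 * torch.randn(256, 1)

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

batch_size = 32
epoch_losses = []
for epoch in range(20):
    # Visit the examples in a new random order every epoch
    permutation = torch.randperm(inputs.size(0))
    running_loss = 0.0
    for start in range(0, inputs.size(0), batch_size):
        idx = permutation[start:start + batch_size]
        optimizer.zero_grad()                 # clear old gradients
        loss = criterion(model(inputs[idx]), targets[idx])
        loss.backward()                       # compute new gradients
        optimizer.step()                      # update the weights
        running_loss += loss.item() * idx.numel()
    # Average loss per example over the epoch
    epoch_losses.append(running_loss / inputs.size(0))

print(epoch_losses[0], epoch_losses[-1])
print(model.weight.item(), model.bias.item())
```

For real datasets, `torch.utils.data.DataLoader` can take over the shuffling and batching done by hand here.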


## References & Interesting links
- [PyTorch Tutorials](https://pytorch.org/tutorials/)
- [NIPS 2016 Tutorial: Generative Adversarial Networks](https://arxiv.org/abs/1701.00160)