WHAT IS PYTORCH?
It’s a Python-based scientific computing package targeted at two sets of audiences:

A replacement for NumPy to use the power of GPUs
A deep learning research platform that provides maximum flexibility and speed
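
As a rough illustration of the first point, the sketch below creates a tensor on a GPU when PyTorch can see one and falls back to the CPU otherwise. The variable names are illustrative and not part of the original notebook.

```python
import torch

# Pick a GPU if PyTorch can see one; otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.ones(2, 3, device=device)   # allocated directly on the chosen device
y = (x + x).to("cpu")                 # compute there, then copy the result back
print(y)
```

The same code runs unchanged on machines with and without a GPU, which is the usual reason for selecting the device once at the top.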

In [14]:
from __future__ import print_function
import torch

An uninitialized matrix is allocated but holds no defined values until you write to it. Whatever happened to be in the allocated memory at that moment appears as its initial contents, so the numbers printed below are arbitrary and will differ from run to run.

In [22]:
#Construct a 5x3 matrix, uninitialized:

x = torch.empty(5, 3)
print(x)

tensor([[9.2755e-39, 1.0561e-38, 6.3368e-39],
        [9.5511e-39, 8.9082e-39, 8.4490e-39],
        [9.6428e-39, 1.1112e-38, 9.5511e-39],
        [1.0102e-38, 1.0286e-38, 1.0194e-38],
        [9.6429e-39, 9.2755e-39, 9.1837e-39]])
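
Because the contents of torch.empty are arbitrary, one common pattern is to overwrite every element before reading any of them. A minimal sketch, assuming an arbitrary fill value of 7.0 and a hypothetical variable name buf:

```python
import torch

buf = torch.empty(5, 3)   # arbitrary contents: do not read these values
buf.fill_(7.0)            # in-place write gives every element a defined value
print(buf.sum())          # tensor(105.)
```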


In [24]:
#Construct a randomly initialized matrix:

x = torch.rand(5, 3)
print(x)

tensor([[0.1387, 0.0616, 0.4566],
        [0.2519, 0.7142, 0.9082],
        [0.2300, 0.0722, 0.7007],
        [0.6438, 0.7686, 0.7629],
        [0.3744, 0.5876, 0.5389]])


In [25]:
#Construct a matrix filled with zeros and of dtype long:

x = torch.zeros(5, 3, dtype=torch.long)
print(x)

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])


In [26]:
#Construct a tensor directly from data:

x = torch.tensor([5.5, 3])
print(x)

tensor([5.5000, 3.0000])


In [27]:
#Or create a tensor based on an existing tensor. These methods reuse properties of the input tensor, e.g. dtype, unless the user provides new values.

x = x.new_ones(5, 3, dtype=torch.double)      # new_* methods take in sizes
print(x)

x = torch.randn_like(x, dtype=torch.float)    # override dtype!
print(x)                                      # result has the same size

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]], dtype=torch.float64)
tensor([[ 3.0182e-01, -1.1315e+00,  2.2732e-01],
        [-5.0184e-01,  5.2509e-01,  4.0408e-01],
        [-1.7382e-01,  1.7076e-01,  2.3399e+00],
        [ 1.5807e-03,  1.0610e+00,  1.2487e+00],
        [ 1.1443e+00,  8.6503e-01, -9.8409e-01]])


In [28]:
#Get its size: torch.Size is in fact a tuple, so it supports all tuple operations.

print(x.size())

torch.Size([5, 3])
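
Since torch.Size is a tuple subclass, the usual tuple idioms apply. A short sketch (the names rows and cols are illustrative):

```python
import torch

x = torch.zeros(5, 3)
size = x.size()

rows, cols = size                 # tuple unpacking
print(isinstance(size, tuple))    # True
print(size[0], len(size))         # 5 2
print(rows * cols == x.numel())   # True
```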


In [29]:
#Operations
#There are multiple syntaxes for operations. In the following example, we will take a look at the addition operation.

#Addition: syntax 1

y = torch.rand(5, 3)
print(x + y)

tensor([[ 0.3537, -0.4866,  0.9244],
        [-0.0338,  0.5402,  1.0974],
        [-0.0682,  0.4901,  2.5330],
        [ 0.5077,  1.7489,  2.0591],
        [ 1.5089,  1.0169, -0.0398]])


In [30]:
#Addition: syntax 2

print(torch.add(x, y))

tensor([[ 0.3537, -0.4866,  0.9244],
        [-0.0338,  0.5402,  1.0974],
        [-0.0682,  0.4901,  2.5330],
        [ 0.5077,  1.7489,  2.0591],
        [ 1.5089,  1.0169, -0.0398]])


In [31]:
# Addition: providing an output tensor as argument

result = torch.empty(5, 3)
torch.add(x, y, out=result)
print(result)

tensor([[ 0.3537, -0.4866,  0.9244],
        [-0.0338,  0.5402,  1.0974],
        [-0.0682,  0.4901,  2.5330],
        [ 0.5077,  1.7489,  2.0591],
        [ 1.5089,  1.0169, -0.0398]])


In [33]:
# Addition: in-place
# adds x to y
y.add_(x)
print(y)

tensor([[ 0.3537, -0.4866,  0.9244],
        [-0.0338,  0.5402,  1.0974],
        [-0.0682,  0.4901,  2.5330],
        [ 0.5077,  1.7489,  2.0591],
        [ 1.5089,  1.0169, -0.0398]])
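
By convention, tensor methods that end in an underscore (add_, copy_, t_ and so on) modify the tensor in place, while the version without the underscore returns a new tensor. The sketch below contrasts the two on small, fixed tensors chosen only for illustration:

```python
import torch

x = torch.ones(2, 2)
y = torch.zeros(2, 2)

out_of_place = y.add(x)   # new tensor; y is still all zeros
print(y.sum())            # tensor(0.)

in_place = y.add_(x)      # y itself now holds the sum
print(y.sum())            # tensor(4.)

# The in-place result refers to the same storage as y.
print(in_place.data_ptr() == y.data_ptr())   # True
```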


In [34]:
#You can use standard NumPy-like indexing with all bells and whistles!

print(x[:, 1])

tensor([-1.1315,  0.5251,  0.1708,  1.0610,  0.8650])
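
A few more indexing forms, sketched on a small tensor with known values so the results are easy to check (the tensor here is illustrative, not the x used above):

```python
import torch

x = torch.arange(15).view(5, 3)   # rows are [0, 1, 2], [3, 4, 5], ...

col = x[:, 1]        # second column: tensor([ 1,  4,  7, 10, 13])
block = x[1:3, :2]   # rows 1-2, first two columns: tensor([[3, 4], [6, 7]])
big = x[x > 10]      # boolean mask: tensor([11, 12, 13, 14])

print(col, block, big, sep="\n")
```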


In [35]:
# Resizing: If you want to resize/reshape a tensor, you can use its view method (Tensor.view):

x = torch.randn(4, 4)
y = x.view(16)
z = x.view(-1, 8)  # the size -1 is inferred from other dimensions
print(x.size(), y.size(), z.size())

torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])
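
One detail worth knowing is that view does not copy: the result shares memory with the original tensor. The sketch below shows this, and also shows reshape, which falls back to a copy when the data layout does not allow a view. The transposed tensor is just an example of such a layout.

```python
import torch

x = torch.zeros(4, 4)
y = x.view(16)
y[0] = 1.0                 # writing through the view...
print(x[0, 0])             # ...changes x as well: tensor(1.)

t = x.t()                  # transposing makes the layout non-contiguous
print(t.is_contiguous())   # False
flat = t.reshape(16)       # reshape copies when a view is not possible
print(flat.size())         # torch.Size([16])
```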


In [36]:
#Converting NumPy Array to Torch Tensor
#See how changing the np array changed the Torch Tensor automatically

import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b)

[2. 2. 2. 2. 2.]
tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
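
The bridge also works in the other direction. For a CPU tensor, calling .numpy() returns an array that shares memory with the tensor, so an in-place change to one is visible in the other. A minimal sketch:

```python
import torch

a = torch.ones(5)
b = a.numpy()    # NumPy array backed by the same memory (CPU tensors only)

a.add_(1)        # in-place update of the tensor
print(a)         # tensor([2., 2., 2., 2., 2.])
print(b)         # [2. 2. 2. 2. 2.]
```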


NEURAL NETWORKS             
Neural networks can be constructed using the torch.nn package.

Now that you have had a glimpse of autograd, note that nn depends on autograd to define models and differentiate them. An nn.Module contains layers and a method forward(input) that returns the output.

For example, look at this network that classifies digit images:

[Figure: convnet]

It is a simple feed-forward network. It takes the input, feeds it through several layers one after the other, and then finally gives the output.

A typical training procedure for a neural network is as follows:

Define the neural network that has some learnable parameters (or weights)
Iterate over a dataset of inputs
Process input through the network
Compute the loss (how far the output is from being correct)
Propagate gradients back into the network’s parameters
Update the weights of the network, typically using a simple update rule: weight = weight - learning_rate * gradient
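
The steps above can be sketched end to end on a toy problem. This is a minimal illustration under stated assumptions, not the network defined in this notebook: the model is a single nn.Linear layer, the data is random, and the learning rate of 0.01 and the five iterations are arbitrary choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Linear(3, 1)        # 1. a network with learnable parameters
inputs = torch.randn(8, 3)     # toy dataset (hypothetical)
targets = torch.randn(8, 1)
learning_rate = 0.01

losses = []
for step in range(5):          # 2. iterate over the data
    model.zero_grad()          # clear gradients from the previous step
    output = model(inputs)     # 3. process input through the network
    loss = ((output - targets) ** 2).mean()   # 4. compute the loss
    loss.backward()            # 5. propagate gradients back
    with torch.no_grad():      # 6. weight = weight - learning_rate * gradient
        for param in model.parameters():
            param -= learning_rate * param.grad
    losses.append(loss.item())

print(losses)
```

In practice the manual update in step 6 is usually delegated to an optimizer from torch.optim, but writing it out makes the update rule explicit.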

Define the network
Let’s define this network:

In [39]:
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square, you can specify just a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()
print(net)

Net(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=576, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
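
The 16 * 6 * 6 input size of fc1 follows from the shapes: an unpadded 3x3 convolution shrinks each side by 2, and 2x2 max pooling halves it (rounding down), so 32 -> 30 -> 15 -> 13 -> 6. The sketch below traces this with standalone layers that mirror conv1 and conv2; it is a check of the arithmetic, not part of the original notebook.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(1, 6, 3)
conv2 = nn.Conv2d(6, 16, 3)

x = torch.randn(1, 1, 32, 32)
x = F.max_pool2d(F.relu(conv1(x)), 2)
print(x.size())                # torch.Size([1, 6, 15, 15])
x = F.max_pool2d(F.relu(conv2(x)), 2)
print(x.size())                # torch.Size([1, 16, 6, 6])

flat = torch.flatten(x, 1)     # keep the batch dimension, flatten the rest
print(flat.size())             # torch.Size([1, 576])
```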


In [40]:
#You just have to define the forward function, and the backward function (where gradients are computed) is
#automatically defined for you using autograd. You can use any of the Tensor operations in the forward function.

#The learnable parameters of a model are returned by net.parameters()

params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight

10
torch.Size([6, 1, 3, 3])
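
The count of 10 above comes from five layers with a weight and a bias each. To see which parameter is which, named_parameters() yields (name, tensor) pairs. The sketch below uses a small stand-in model (an nn.Sequential assumed only for illustration) rather than Net, so the names are positional:

```python
import torch.nn as nn

# Stand-in model: the two layers are never run together, only inspected.
model = nn.Sequential(nn.Conv2d(1, 6, 3), nn.Linear(4, 2))

names = []
total = 0
for name, param in model.named_parameters():
    print(name, tuple(param.size()))
    names.append(name)
    total += param.numel()

print(total)   # 54 + 6 + 8 + 2 = 70
```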


In [41]:
# Let's try a random 32x32 input. Note: the expected input size of this net (LeNet) is 32x32.
# To use this net on the MNIST dataset, resize the images from the dataset to 32x32.

input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)

tensor([[-0.1232,  0.0129,  0.0108, -0.0157, -0.0834,  0.0720,  0.0276,  0.0119,
          0.0313,  0.0198]], grad_fn=<AddmmBackward>)


In [42]:
# Zero the gradient buffers of all parameters and backpropagate with random gradients:

net.zero_grad()
out.backward(torch.randn(1, 10))
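
The argument passed to backward is needed here because out is not a scalar: it supplies the vector that is multiplied with the Jacobian. A small sketch with known values makes the effect visible (the vector v is an arbitrary choice for illustration):

```python
import torch

x = torch.ones(3, requires_grad=True)
y = x * 2                            # non-scalar output, dy_i/dx_i = 2

v = torch.tensor([1.0, 0.1, 0.01])   # weights for each output element
y.backward(v)

print(x.grad)                        # equals 2 * v
```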

Loss Function         
A loss function takes the (output, target) pair of inputs, and computes a value that estimates how far away the output is from the target.

There are several different loss functions under the nn package. A simple one is nn.MSELoss, which computes the mean-squared error between the input and the target.

For example: 

In [43]:
output = net(input)
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

tensor(0.6996, grad_fn=<MseLossBackward>)
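
With its default settings, nn.MSELoss averages the squared differences over all elements. The sketch below checks this against a hand computation on small fixed values (chosen only for illustration):

```python
import torch
import torch.nn as nn

output = torch.tensor([[1.0, 2.0, 3.0]])
target = torch.tensor([[1.5, 2.0, 2.0]])

by_module = nn.MSELoss()(output, target)
by_hand = ((output - target) ** 2).mean()   # (0.25 + 0 + 1) / 3

print(by_module, by_hand)
```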


In [44]:
#So, when we call loss.backward(), the loss is differentiated through the whole graph, and all leaf Tensors
# in the graph that have requires_grad=True (such as the network's parameters) will have their .grad Tensor accumulated with the gradient.

#For illustration, let us follow a few steps backward:

print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # AccumulateGrad (leaf for fc3's bias)


<MseLossBackward object at 0x000002150274E6D8>
<AddmmBackward object at 0x000002150274E6D8>
<AccumulateGrad object at 0x0000021502736978>
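
Following next_functions by hand gets tedious, so a small recursive helper can print the first few levels of the graph. This is a sketch: graph_node_names is a hypothetical helper, the model is a small stand-in rather than Net, and the exact class names (for example a trailing 0 on the backward nodes) depend on the PyTorch version.

```python
import torch
import torch.nn as nn


def graph_node_names(fn, depth=0, max_depth=3):
    """Collect indented class names of grad_fn nodes, depth first."""
    if fn is None or depth > max_depth:
        return []
    names = ["  " * depth + type(fn).__name__]
    for next_fn, _ in fn.next_functions:
        names.extend(graph_node_names(next_fn, depth + 1, max_depth))
    return names


model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
loss = nn.MSELoss()(model(torch.randn(1, 4)), torch.randn(1, 2))

names = graph_node_names(loss.grad_fn)
print("\n".join(names))
```

AccumulateGrad nodes mark the leaves of the graph, that is, the parameters whose .grad will be filled in by backward.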


Backprop     
To backpropagate the error, all we have to do is call loss.backward(). You need to clear the existing gradients first, though; otherwise the new gradients are accumulated into the existing ones.

Now we shall call loss.backward() and have a look at conv1’s bias gradients before and after the backward pass.

In [45]:
net.zero_grad()     # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([ 7.3512e-03,  8.7762e-03,  1.5997e-05, -1.3845e-03, -5.6112e-03,
        -2.8729e-03])
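
The accumulation behaviour is easy to see on a tiny example. In this sketch (a single hypothetical parameter w, not the network above) the gradient of (w * 3).sum() with respect to each element is 3, and a second backward call without clearing adds to the first:

```python
import torch

w = torch.ones(2, requires_grad=True)

(w * 3).sum().backward()
first = w.grad.clone()     # tensor([3., 3.])

(w * 3).sum().backward()   # no clearing in between
second = w.grad.clone()    # tensor([6., 6.]): gradients were accumulated

w.grad.zero_()             # clear, as net.zero_grad() does for a module
(w * 3).sum().backward()
third = w.grad.clone()     # tensor([3., 3.]) again

print(first, second, third)
```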
