Set up the environment.

In [1]:
from __future__ import print_function

import torch as th
import numpy as np

## 1. Tensors

Tensors are multi-dimensional arrays, similar to matrices, and can be used in GPU-accelerated computations.

The following cell creates an uninitialized `5x3` tensor (its contents are whatever happened to be in memory):

In [2]:
x = th.Tensor(5, 3)
s = x.size()  # returns a torch.Size, which is a tuple subclass
print(x)
print(s)


 0.0000e+00  4.6566e-10  1.8563e+37
 2.8657e-42  5.6052e-45  0.0000e+00
 0.0000e+00  0.0000e+00  4.6127e-33
 1.4013e-45  2.8855e-33  1.4013e-45
 5.6052e-45  0.0000e+00  0.0000e+00
[torch.FloatTensor of size 5x3]

torch.Size([5, 3])


There are many other ways to create tensors:

In [3]:
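x1 = th.rand(5, 3)   # uniform samples from [0, 1)
x2 = th.randn(5, 3)  # samples from a standard normal distribution
x3 = th.eye(5, 3)    # ones on the main diagonal, zeros elsewhere
x4 = th.zeros(5, 3)  # all zeros
x5 = th.ones(5, 3)   # all ones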

### Operations

Operations can be expressed in several ways. Let's take addition as an example. The simplest form is the `+` operator:

In [4]:
x = th.rand(2, 2)
y = th.rand(2, 2)
print(x + y)


 0.8858  0.7849
 1.5983  0.3860
[torch.FloatTensor of size 2x2]



There is an equivalent form using the `add` function:

In [5]:
print(th.add(x, y))


 0.8858  0.7849
 1.5983  0.3860
[torch.FloatTensor of size 2x2]



The same `add` function can write its result into an existing tensor passed as `out`:

In [6]:
result = th.Tensor(2, 2)
th.add(x, y, out=result)
print(result)


 0.8858  0.7849
 1.5983  0.3860
[torch.FloatTensor of size 2x2]



`add` can also be called as a method on tensor instances:

In [7]:
print(x.add(y))


 0.8858  0.7849
 1.5983  0.3860
[torch.FloatTensor of size 2x2]



There is one more method, `add_`, which modifies its operand in place:

In [8]:
x.add_(y) # this will modify x
print(x)


 0.8858  0.7849
 1.5983  0.3860
[torch.FloatTensor of size 2x2]



By PyTorch convention, methods whose names end with `_` mutate the tensor they are called on.
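
A small check of this convention, as a sketch. It uses the `add` and `add_` methods shown above plus `th.equal`, which compares two tensors for equal size and contents; the tensors of ones are chosen only for illustration.

```python
import torch as th

x = th.ones(2, 2)
y = th.ones(2, 2)

# Out-of-place: `add` returns a new tensor and leaves `x` untouched.
z = x.add(y)
x_unchanged = th.equal(x, th.ones(2, 2))

# In-place: `add_` writes the sum back into `x`.
x.add_(y)
x_matches_z = th.equal(x, z)

print(x_unchanged, x_matches_z)
```

Both flags should come out true: `x` keeps its values after `add`, and only `add_` changes it.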

### Indexing

Indexing PyTorch tensors works much like indexing NumPy arrays.

In [9]:
print(x[:1])
print(x[:,:1])
# etc.


 0.8858  0.7849
[torch.FloatTensor of size 1x2]


 0.8858
 1.5983
[torch.FloatTensor of size 2x1]



### CUDA Tensors

If CUDA is available, you can move tensors to the GPU:

In [10]:
if th.cuda.is_available():
    x = x.cuda()
    y = y.cuda()
    print(x + y)

## 2. Autograd

The `autograd` package automatically computes derivatives of a computation graph with respect to the graph's "leaves".

In [11]:
from torch.autograd import Variable

Create a variable:

In [12]:
x = Variable(th.ones(2, 2), requires_grad=True)
print(x)
print("Is leaf?", x.is_leaf)
print("Grad function:", x.grad_fn)
print("Grad:", x.grad)

Variable containing:
 1  1
 1  1
[torch.FloatTensor of size 2x2]

Is leaf? True
Grad function: None
Grad: None


As you can see, it's a "leaf" node, and both its `grad_fn` and `grad` properties are `None`. For a leaf created by the user, `grad_fn` is always `None`, while `grad` is populated once `backward` is called on a computation graph that depends on it.

Let's do some more calculations:

In [13]:
y = x + 2
z = y * y * 3
out = z.mean()
print(z)

Variable containing:
 27  27
 27  27
[torch.FloatTensor of size 2x2]



The results of these calculations are themselves (non-leaf) `Variable`s. We can backpropagate through the whole calculation:

In [14]:
out.backward()

In [15]:
x.grad

Variable containing:
 4.5000  4.5000
 4.5000  4.5000
[torch.FloatTensor of size 2x2]

Let's make sure the result is correct. `out` is the mean of the four entries of `z`, and every $x_i = 1$, so:

$
\frac{\partial o}{\partial x_i}=
\frac{1}{4}\frac{\partial 3(x_i + 2)^2}{\partial x_i}=
\frac{3}{2}(x_i + 2)=4.5
$
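
The same value can be cross-checked numerically without autograd. The sketch below is plain Python: it re-implements `out` as a function of a flat list and estimates each partial derivative with a central difference. The helper name `out_fn` and the step size `eps` are illustrative choices, not part of the notebook.

```python
def out_fn(x):
    """Mean of 3 * (x_i + 2)^2 over the entries of x (a flat list)."""
    return sum(3.0 * (xi + 2.0) ** 2 for xi in x) / len(x)

x = [1.0, 1.0, 1.0, 1.0]  # the four entries of the 2x2 tensor of ones
eps = 1e-6                # finite-difference step (illustrative choice)

grads = []
for i in range(len(x)):
    plus = list(x)
    plus[i] += eps
    minus = list(x)
    minus[i] -= eps
    # Central difference approximates d(out)/d(x_i).
    grads.append((out_fn(plus) - out_fn(minus)) / (2 * eps))

print([round(g, 4) for g in grads])
```

Each estimate should land very close to the 4.5 reported by `x.grad`.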

We can do much more complex things with Autograd:

In [16]:
x = th.randn(3)
x = Variable(x, requires_grad=True)
y = x * 2
while y.data.norm() < 1000:
    y = y * 2
    
print(y)

Variable containing:
-1311.6152
-1020.5250
 -280.0303
[torch.FloatTensor of size 3]



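Calling `backward()` with no arguments only works when the output is a scalar. Because `y` is a vector here, we have to pass a tensor of the same shape, which is treated as the gradient of some downstream scalar with respect to `y`: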

In [17]:
gradients = th.FloatTensor([0.1, 1.0, 0.0001])
y.backward(gradients)
print(x.grad)

Variable containing:
  102.4000
 1024.0000
    0.1024
[torch.FloatTensor of size 3]
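
To see where these numbers come from: every step of the loop doubles `y`, so in the end `y = 2**k * x` for some `k`, and the Jacobian of `y` with respect to `x` is diagonal with entries `2**k`. The sketch below redoes that product in plain Python. The value `k = 10` is inferred from the printed result (102.4 / 0.1 = 1024) and depends on the random draw of `x`.

```python
gradients = [0.1, 1.0, 0.0001]  # the vector passed to backward()
k = 10                          # inferred from the output above: 1024 == 2**10

# y_i = 2**k * x_i, so d(y_i)/d(x_i) = 2**k and all cross terms are zero.
scale = 2 ** k

# Vector-Jacobian product: each incoming gradient is scaled by 2**k.
x_grad = [g * scale for g in gradients]
print(x_grad)
```

The result matches the `x.grad` values printed above, up to floating-point formatting.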



## 3. Neural Networks

As an example of using `torch.nn`, let's build a LeNet-style network:

In [129]:
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        
    def num_flat_features(self, x):
        size = x.size()[1:] # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features
        
    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

In [130]:
net = Net()
print(net)

Net (
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear (400 -> 120)
  (fc2): Linear (120 -> 84)
  (fc3): Linear (84 -> 10)
)
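
The `16 * 5 * 5` input size of `fc1` follows from how the spatial size shrinks through the layers. A minimal sketch of that arithmetic in plain Python, assuming a 32x32 input, no padding, stride 1 for the convolutions, and a pooling stride equal to the pooling window (the default for `F.max_pool2d` when no stride is given). The helper `conv_out` is hypothetical, not a PyTorch function.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling window slides over the input."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32                                  # input height and width
size = conv_out(size, kernel=5)            # conv1: 32 -> 28
size = conv_out(size, kernel=2, stride=2)  # 2x2 max pool: 28 -> 14
size = conv_out(size, kernel=5)            # conv2: 14 -> 10
size = conv_out(size, kernel=2, stride=2)  # 2x2 max pool: 10 -> 5

flat_features = 16 * size * size           # conv2 produces 16 channels
print(size, flat_features)                 # 5 400
```

This is the same number that `num_flat_features` computes at run time from the tensor's actual shape.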


All learnable parameters of the network are returned by `net.parameters()`. Notice that each layer's weight tensor is followed by its bias vector.

In [131]:
params = list(net.parameters())
for i, p in enumerate(params):
    print(i, p.size())

0 torch.Size([6, 1, 5, 5])
1 torch.Size([6])
2 torch.Size([16, 6, 5, 5])
3 torch.Size([16])
4 torch.Size([120, 400])
5 torch.Size([120])
6 torch.Size([84, 120])
7 torch.Size([84])
8 torch.Size([10, 84])
9 torch.Size([10])
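
From these shapes the total number of learnable values can be worked out by hand. The sketch below copies the printed shapes into plain Python lists and multiplies out each one; nothing here calls PyTorch.

```python
from functools import reduce
from operator import mul

# Shapes copied from the output above: weight, bias, weight, bias, ...
shapes = [
    (6, 1, 5, 5), (6,),
    (16, 6, 5, 5), (16,),
    (120, 400), (120,),
    (84, 120), (84,),
    (10, 84), (10,),
]

# Number of elements in each tensor is the product of its dimensions.
counts = [reduce(mul, shape, 1) for shape in shapes]
total = sum(counts)

print(counts)
print(total)  # 61706
```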


You can also access parameters of individual layers:

In [132]:
conv1 = net.conv1
print(conv1.bias.size())
print(conv1.weight.size())

torch.Size([6])
torch.Size([6, 1, 5, 5])


Let's feed a random 32x32 single-channel input through the network:

In [133]:
input = Variable(th.randn(1, 1, 32, 32))
out = net(input)
print(out)

Variable containing:
-0.1425  0.0286  0.0422 -0.0031 -0.0008  0.0015 -0.0779 -0.0956 -0.0330 -0.0185
[torch.FloatTensor of size 1x10]



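To compute gradients, we first reset any previously accumulated gradients and then run a backward pass. Because `out` is not a scalar, `backward` needs a gradient tensor of the same shape: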

In [134]:
net.zero_grad()
out.backward(th.ones(1, 10))

Each entry of `params` now carries its gradient in `.grad`. Displaying an entry shows the parameter values themselves, here the bias of `conv1`:

In [135]:
params[1]

Parameter containing:
 0.0553
 0.0656
 0.0288
-0.0731
 0.1723
 0.0692
[torch.FloatTensor of size 6]
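
The reset before the backward pass matters because gradients accumulate across calls to `backward`. A minimal sketch of that behaviour on a single `Variable`, written with the same `Variable` wrapper as the rest of this notebook. The variable `w` and the toy expression `(w * 2).sum()` are made up for illustration, and the gradient is zeroed by hand instead of through `net.zero_grad()`.

```python
import torch as th
from torch.autograd import Variable

w = Variable(th.ones(3), requires_grad=True)

# First backward pass: d(sum(2 * w)) / dw = 2 for every entry.
(w * 2).sum().backward()
first = w.grad.data.clone()

# Second backward pass without a reset: the new gradient is added on top.
(w * 2).sum().backward()
accumulated = w.grad.data.clone()

# Zero the stored gradient by hand, then run backward again.
w.grad.data.zero_()
(w * 2).sum().backward()
after_reset = w.grad.data.clone()

print(first, accumulated, after_reset)
```

Without the reset the second pass reports 4 instead of 2 for each entry, which is why the cells above clear gradients before calling `backward`.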

### Loss function

A loss function takes the network output and a target and returns a value measuring how far apart they are. Here we use the mean squared error:

In [136]:
output = net(input)
target = Variable(th.arange(1, 11))
criterion = nn.MSELoss()
loss = criterion(output, target)
print(loss)

Variable containing:
 38.8556
[torch.FloatTensor of size 1]
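
The loss value can be reproduced by hand, since the mean squared error here is the average of the squared differences over all ten entries. The sketch below is plain Python and uses the output values as printed earlier (rounded to four decimals), so the result only agrees with the loss above up to that rounding.

```python
# Network output as printed earlier in the notebook (rounded values).
output = [-0.1425, 0.0286, 0.0422, -0.0031, -0.0008,
          0.0015, -0.0779, -0.0956, -0.0330, -0.0185]
# Same target as above: 1, 2, ..., 10.
target = [float(t) for t in range(1, 11)]

# Mean of squared differences over all elements.
mse = sum((o - t) ** 2 for o, t in zip(output, target)) / len(target)
print(round(mse, 4))
```

The result should come out close to the 38.8556 reported by `nn.MSELoss`.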



In [137]:
loss.backward()

In [138]:
net.conv1.bias.grad

Variable containing:
1.00000e-02 *
  2.9592
 -3.6716
 -4.9503
 -3.3395
 -5.3026
 -0.7026
[torch.FloatTensor of size 6]

### How to train network

The simplest way to train the network is to compute the gradient of the loss for every parameter and apply a plain gradient-descent update, subtracting the gradient scaled by a learning rate:

In [145]:
learning_rate = 0.01
target = Variable(th.arange(1, 11))
criterion = nn.MSELoss()

for step in range(100):
    output = net(input)
    loss = criterion(output, target)
    net.zero_grad()
    loss.backward()
    # plain gradient descent: p <- p - learning_rate * grad
    for p in net.parameters():
        p.data.sub_(p.grad.data * learning_rate)
    if step % 10 == 0:
        print(step // 10, loss.data[0])

0 1.2960269657469325e-07
1 8.108693805297662e-08
2 5.0439876275731876e-08
3 3.160358730269763e-08
4 1.9684899044136728e-08
5 1.2347288524949818e-08
6 7.851657990443073e-09
7 4.974533673873793e-09
8 2.8731732548692435e-09
9 1.8219793451734745e-09
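
The update rule in that loop can be seen in isolation on a toy problem. The following is a sketch in plain Python with a single parameter and a hand-written derivative; the loss `(p - target)^2`, the starting value, and the learning rate of 0.1 are all illustrative and not taken from the network above.

```python
learning_rate = 0.1  # illustrative; the loop above uses 0.01
target = 3.0
p = 0.0              # single parameter, hypothetical starting value

losses = []
for step in range(50):
    loss = (p - target) ** 2    # toy loss
    grad = 2.0 * (p - target)   # its derivative with respect to p
    p -= learning_rate * grad   # same rule as p.data.sub_(p.grad.data * learning_rate)
    losses.append(loss)

print(p, losses[0], losses[-1])
```

The parameter moves toward the target and the loss shrinks at every step, which is the same pattern as in the printed losses above.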


### Using optim package

A more powerful way to train the network is to use one of the optimizers from the `torch.optim` package:

In [180]:
import torch.optim as optim

optimizer = optim.Adagrad(net.parameters())

In [194]:
# training loop
for i in range(0, 10):
    optimizer.zero_grad()
    output = net(input)
    loss = criterion(output, target)
    print(loss.data[0])
    loss.backward()
    optimizer.step()

2.421217004666687e-09
2.1802861738251522e-09
1.9369124082402323e-09
1.7485518588600257e-09
1.5261718555592552e-09
1.3743957083534042e-09
1.2064618193363685e-09
1.0899341429393417e-09
9.65225455118457e-10
8.638266768556946e-10
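
For intuition about what `optimizer.step()` does here, the core Adagrad rule divides each gradient by the square root of the running sum of that parameter's squared gradients. The sketch below is a simplified plain-Python version on a one-parameter toy loss. It leaves out options such as learning-rate decay and weight decay, and the constants `lr` and `eps` and the loss itself are illustrative assumptions, so it should not be read as the library's implementation.

```python
import math

lr, eps = 0.01, 1e-10  # illustrative constants
target = 3.0
p = 0.0                # single parameter, hypothetical starting value
sum_sq = 0.0           # running sum of squared gradients

history = []
for step in range(5):
    grad = 2.0 * (p - target)                   # gradient of the toy loss (p - target)^2
    sum_sq += grad ** 2                         # accumulate squared gradients
    p -= lr * grad / (math.sqrt(sum_sq) + eps)  # step scaled by gradient history
    history.append(p)

print(history)
```

The first step has size close to `lr` regardless of the gradient's magnitude, and later steps get smaller as `sum_sq` grows.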
