# PyTorch basics

Working through the tutorial available at: http://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html.

In [1]:
import torch

#### Initialising matrices

In [2]:
x = torch.Tensor(5, 3) # uninitialised matrix
print(x)


1.00000e-04 *
 -2.4954  0.0000 -2.4951
  0.0000 -2.4952  0.0000
 -2.4952  0.0000 -2.4952
  0.0000  0.0000  0.0000
 -0.1522  0.0000  8.4913
[torch.FloatTensor of size 5x3]



In [3]:
x = torch.rand(5, 3) # randomly initialised matrix
print(x)


 0.4607  0.8523  0.3392
 0.6836  0.8309  0.3948
 0.3958  0.3686  0.0884
 0.9978  0.9993  0.1058
 0.7097  0.9369  0.2855
[torch.FloatTensor of size 5x3]



In [5]:
x.size() # returns an object that supports all tuple operations

torch.Size([5, 3])

#### Operation syntax

In [8]:
y = torch.rand(5, 3)

In [10]:
y + x


 0.9680  1.4975  0.5266
 1.0555  0.8871  0.7507
 1.3617  0.9848  0.0913
 1.1082  1.0515  1.0949
 1.5429  0.9503  0.7589
[torch.FloatTensor of size 5x3]

In [11]:
torch.add(x, y)


 0.9680  1.4975  0.5266
 1.0555  0.8871  0.7507
 1.3617  0.9848  0.0913
 1.1082  1.0515  1.0949
 1.5429  0.9503  0.7589
[torch.FloatTensor of size 5x3]

In [12]:
result = torch.Tensor(5, 3)
torch.add(x, y, out=result) # output directly into another tensor


 0.9680  1.4975  0.5266
 1.0555  0.8871  0.7507
 1.3617  0.9848  0.0913
 1.1082  1.0515  1.0949
 1.5429  0.9503  0.7589
[torch.FloatTensor of size 5x3]

In [13]:
y.add_(x) # inplace addition


 0.9680  1.4975  0.5266
 1.0555  0.8871  0.7507
 1.3617  0.9848  0.0913
 1.1082  1.0515  1.0949
 1.5429  0.9503  0.7589
[torch.FloatTensor of size 5x3]

In [14]:
y


 0.9680  1.4975  0.5266
 1.0555  0.8871  0.7507
 1.3617  0.9848  0.0913
 1.1082  1.0515  1.0949
 1.5429  0.9503  0.7589
[torch.FloatTensor of size 5x3]

Any operation that mutates a tensor in place is post-fixed with an underscore, e.g. add_ above.

Torch tensors support standard NumPy-style indexing:
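As a small sketch of this naming convention (assuming a reasonably recent PyTorch release; the variable names are illustrative only), the out-of-place add returns a new tensor while add_ overwrites its receiver:

```python
import torch

a = torch.ones(3)
a_before = a.clone()   # keep a copy so the two results can be compared

b = a.add(1)           # out-of-place: returns a new tensor, a is unchanged
unchanged = torch.equal(a, a_before)

a.add_(1)              # in-place: a itself is overwritten
print(unchanged, b, a)
```

After the last line, a and b hold the same values, but only a was modified in place.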

In [15]:
y[1,1]

0.8871132135391235

In [16]:
y[1:,1:]


 0.8871  0.7507
 0.9848  0.0913
 1.0515  1.0949
 0.9503  0.7589
[torch.FloatTensor of size 4x2]

In [17]:
x = torch.randn(4, 4)
y = x.view(16) # view is the reshaping operation
z = x.view(-1, 8)  # the size -1 is inferred from other dimensions
print(x.size(), y.size(), z.size())

torch.Size([4, 4]) torch.Size([16]) torch.Size([2, 8])
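The arithmetic behind the -1 placeholder can be sketched in plain Python. The helper infer_view_shape below is hypothetical (it is not part of PyTorch) and only mirrors the rule that the missing dimension is the total element count divided by the product of the known dimensions:

```python
def infer_view_shape(numel, shape):
    """Resolve at most one -1 in `shape` so that the sizes multiply to `numel`."""
    shape = tuple(shape)
    if shape.count(-1) > 1:
        raise ValueError("only one dimension can be inferred")

    # Product of all explicitly given dimensions
    known = 1
    for s in shape:
        if s != -1:
            known *= s

    if -1 in shape:
        if known == 0 or numel % known != 0:
            raise ValueError("shape is incompatible with the number of elements")
        return tuple(numel // known if s == -1 else s for s in shape)

    if known != numel:
        raise ValueError("shape is incompatible with the number of elements")
    return shape


# A 4x4 tensor has 16 elements, so (-1, 8) resolves to (2, 8)
print(infer_view_shape(16, (-1, 8)))
```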


Tensor operation docs: http://pytorch.org/docs/master/torch.html

#### Numpy-Torch Bridge

The Torch Tensor and NumPy array will share their underlying memory locations, and changing one will change the other.

In [22]:
a = torch.ones(5)
a


 1
 1
 1
 1
 1
[torch.FloatTensor of size 5]

In [23]:
b = a.numpy()
b

array([ 1.,  1.,  1.,  1.,  1.], dtype=float32)

In [24]:
a.add_(1)
print(a)
print(b)


 2
 2
 2
 2
 2
[torch.FloatTensor of size 5]

[ 2.  2.  2.  2.  2.]


In [25]:
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b)

[ 2.  2.  2.  2.  2.]

 2
 2
 2
 2
 2
[torch.DoubleTensor of size 5]



All the Tensors on the CPU except a CharTensor support converting to NumPy and back.
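If the shared memory is not wanted, one option is to take an explicit copy on the NumPy side. This is a minimal sketch assuming a recent PyTorch and NumPy; the names shared and independent are illustrative:

```python
import numpy as np
import torch

a = torch.ones(5)
shared = a.numpy()              # shares memory with the tensor a
independent = a.numpy().copy()  # the NumPy copy owns its own buffer

a.add_(1)                       # in-place update of the tensor

print(shared)       # follows the change made to a
print(independent)  # keeps the original values
```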

#### Cuda tensors

Tensors can be moved onto the GPU using the .cuda() method.

In [29]:
torch.cuda.is_available()

True

In [30]:
x = torch.rand(5, 3)
y = torch.rand(5, 3)

In [31]:
x = x.cuda()
y = y.cuda()

x+y


 0.7733  0.0530  1.0625
 1.9898  0.4817  1.3803
 0.6299  1.2278  1.1665
 1.2909  0.3667  0.5617
 1.7041  0.8302  0.9406
[torch.cuda.FloatTensor of size 5x3 (GPU 0)]
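The cells above assume a GPU is present. A device-agnostic variant might look like the sketch below. It assumes PyTorch 0.4 or later, where torch.device and Tensor.to are available (newer than the version used in these notes), and falls back to the CPU when CUDA is not available:

```python
import torch

# Pick the GPU when one is available, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.rand(5, 3).to(device)
y = torch.rand(5, 3).to(device)

# Both operands live on the same device, so the addition runs there;
# .cpu() brings the result back, e.g. before converting to NumPy
total = (x + y).cpu()
print(total.size())
```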

#### Torch variables

In [32]:
from torch.autograd import Variable

autograd.Variable is the central class of the package. It wraps a Tensor and supports nearly all of the operations defined on it. Once you finish your computation, you can call .backward() and have all the gradients computed automatically.

In [33]:
x = Variable(torch.ones(2, 2), requires_grad=True)
print(x)

Variable containing:
 1  1
 1  1
[torch.FloatTensor of size 2x2]



You can access the raw tensor through the .data attribute, while the gradient w.r.t. this variable is accumulated into .grad.

In [36]:
x.data


 1  1
 1  1
[torch.FloatTensor of size 2x2]

In [34]:
y = x + 2
y

Variable containing:
 3  3
 3  3
[torch.FloatTensor of size 2x2]

There is one more class that is very important for the autograd implementation: Function.

Variable and Function are interconnected and build up an acyclic graph that encodes the complete history of the computation. Each Variable has a .grad_fn attribute that references the Function that created it (except for Variables created by the user, whose grad_fn is None).

In [35]:
y.grad_fn

<AddBackward0 at 0x232b988bb38>

In [38]:
x.grad_fn # returns None because defined by user

If you want to compute the derivatives, you can call .backward() on a Variable. If the Variable is a scalar, you don’t need to pass any arguments to backward(); if it has more than one element, you need to pass a gradient argument that is a tensor of matching shape.
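The non-scalar case can be sketched as follows. This assumes PyTorch 0.4 or later, where a plain tensor created with requires_grad=True plays the role of Variable; passing a tensor of ones as the gradient argument amounts to differentiating the sum of all elements:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
z = (x + 2) * (x + 2) * 3        # non-scalar result, shape 2x2

# z is not a scalar, so backward() needs a gradient of matching shape
z.backward(torch.ones_like(z))

# Each element has derivative 6 * (x_i + 2), i.e. 18 at x_i = 1
print(x.grad)
```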

In [39]:
z = y * y * 3
out = z.mean()

print(z, out)

Variable containing:
 27  27
 27  27
[torch.FloatTensor of size 2x2]
 Variable containing:
 27
[torch.FloatTensor of size 1]



out.backward() is equivalent to doing out.backward(torch.Tensor([1.0]))

In [41]:
out.backward()

In [42]:
x.grad

Variable containing:
 4.5000  4.5000
 4.5000  4.5000
[torch.FloatTensor of size 2x2]
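The value 4.5 can be checked by hand. Since y = x + 2, z = 3y² and out is the mean of the four elements of z:

$$
\text{out} = \frac{1}{4}\sum_{i} 3\,(x_i + 2)^2,
\qquad
\frac{\partial\,\text{out}}{\partial x_i} = \frac{3}{2}\,(x_i + 2),
\qquad
\left.\frac{\partial\,\text{out}}{\partial x_i}\right|_{x_i = 1} = \frac{9}{2} = 4.5
$$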

Function and Variable documentation: http://pytorch.org/docs/master/autograd.html

#### Creating a neural network 

Neural networks can be constructed using the torch.nn package.

nn depends on autograd to define models and differentiate them. An nn.Module contains layers and a method forward(input) that returns the output.

Creating a convnet: 

In [45]:
import torch.nn as nn
import torch.nn.functional as F

In [46]:
class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the window is square you can specify just a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()
print(net)

Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
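The in_features=400 of fc1 follows from the layer shapes. The plain-Python sketch below traces the spatial size through the network, assuming a 32x32 single-channel input, stride-1 convolutions without padding, and non-overlapping 2x2 pooling; conv_out and pool_out are hypothetical helpers written for this illustration:

```python
def conv_out(size, kernel):
    # Unpadded convolution with stride 1 shrinks each side by kernel - 1
    return size - kernel + 1


def pool_out(size, window):
    # Non-overlapping pooling (stride equal to the window) divides each side
    return size // window


size = 32
size = pool_out(conv_out(size, 5), 2)   # conv1 then pool: 32 -> 28 -> 14
size = pool_out(conv_out(size, 5), 2)   # conv2 then pool: 14 -> 10 -> 5

flat = 16 * size * size                 # 16 channels from conv2
print(size, flat)
```

The result, 16 * 5 * 5 = 400, matches the input size of the first linear layer.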


In [47]:
params = list(net.parameters())
print(len(params))
print(params[0].size())

10
torch.Size([6, 1, 5, 5])


In [48]:
params

[Parameter containing:
 (0 ,0 ,.,.) = 
  -0.0783 -0.1181  0.0529  0.1727 -0.0359
   0.1588 -0.0661  0.1984  0.1448 -0.1018
   0.0077  0.1965 -0.0743  0.1693  0.0878
  -0.0440  0.0827  0.0833  0.1307  0.1294
   0.0388 -0.0748  0.0360  0.0542  0.1102
 
 (1 ,0 ,.,.) = 
  -0.0529  0.0670 -0.1865  0.0160  0.1115
  -0.0064  0.1919  0.0058  0.1634 -0.1577
  -0.1486  0.0105  0.1566 -0.0309 -0.1847
  -0.1655  0.1145  0.0934 -0.1488  0.0590
  -0.0228  0.0251 -0.0228 -0.0406  0.1407
 
 (2 ,0 ,.,.) = 
   0.0794 -0.1414 -0.0676 -0.0095  0.0038
  -0.0505 -0.0623 -0.1938 -0.1157  0.0462
  -0.0177  0.0431 -0.1573 -0.1721  0.0525
  -0.0614  0.1292  0.0296 -0.1541 -0.1539
  -0.0400 -0.0323 -0.0706 -0.0742  0.1712
 
 (3 ,0 ,.,.) = 
   0.0782 -0.1220  0.1012  0.0810  0.1773
   0.1694  0.0641 -0.0329 -0.1339 -0.1371
  -0.0062 -0.1343  0.0608 -0.1506  0.0833
   0.0882  0.0891 -0.0692  0.1241  0.0836
  -0.0858 -0.0642 -0.0141  0.1131 -0.0705
 
 (4 ,0 ,.,.) = 
  -0.0152 -0.0155 -0.0272 -0.1815 -0.0815
  -0.00

torch.nn only supports mini-batches: every layer in the package expects its input to be a mini-batch of samples, not a single sample.

For example, nn.Conv2d will take in a 4D Tensor of nSamples x nChannels x Height x Width.  

If you have a single sample, just use input.unsqueeze(0) to add a fake batch dimension.
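A minimal sketch of adding the batch dimension, assuming a single-channel 32x32 image stored as a channels x height x width tensor:

```python
import torch

sample = torch.randn(1, 32, 32)   # one image: nChannels x Height x Width
batch = sample.unsqueeze(0)       # insert a batch dimension of size 1 at position 0

print(sample.size(), batch.size())
```

The resulting tensor has the nSamples x nChannels x Height x Width layout that nn.Conv2d expects.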

In [49]:
input = Variable(torch.randn(1, 1, 32, 32))
out = net(input)
print(out)

Variable containing:
1.00000e-02 *
 -3.6797  2.4595  0.1389  3.6198  7.1146  3.3592  2.9009 -6.2261 -8.4600 -2.8673
[torch.FloatTensor of size 1x10]



In [50]:
net.zero_grad()
out.backward(torch.randn(1, 10))

#### Defining the loss function

In [51]:
output = net(input)
target = Variable(torch.arange(1, 11))  # a dummy target, for example
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

Variable containing:
 38.6271
[torch.FloatTensor of size 1]



In [56]:
loss.grad_fn

<MseLossBackward at 0x232b9874ef0>

If you follow loss in the backward direction, using its .grad_fn attribute, you will see a graph of computations that looks like this:

input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
      -> view -> linear -> relu -> linear -> relu -> linear
      -> MSELoss
      -> loss

When we call loss.backward(), the whole graph is differentiated w.r.t. the loss, and every Variable in the graph that requires gradients has the gradient accumulated into its .grad attribute.
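The backward walk through .grad_fn can be sketched on a much smaller graph. This assumes PyTorch 0.4 or later and uses the next_functions attribute of each graph node; the exact class names that get printed vary between versions, so treat them as indicative only:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
out = ((x + 2) * 3).mean()

# Start at the node that produced `out` and follow the first parent each time
node = out.grad_fn
names = []
while node is not None:
    names.append(type(node).__name__)
    # next_functions holds (function, index) pairs; constants appear as None
    parents = [fn for fn, _ in node.next_functions if fn is not None]
    node = parents[0] if parents else None

print(" <- ".join(names))
```

The walk ends at the leaf tensor x, which was created by the user and therefore has no grad_fn of its own.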

#### Backprop

To backpropagate the error, all we have to do is call loss.backward(). You need to clear the existing gradients first, though; otherwise the new gradients are added to the ones already stored.
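The accumulation behaviour can be seen on a toy example. This is a sketch assuming PyTorch 0.4 or later; w and the snapshot names are illustrative:

```python
import torch

w = torch.ones(3, requires_grad=True)

(w * 2).sum().backward()
first = w.grad.clone()     # gradient of sum(2 * w) is 2 per element

(w * 2).sum().backward()   # no zeroing: the new gradient is added on top
second = w.grad.clone()    # now 4 per element

w.grad.zero_()             # clear the buffer, as net.zero_grad() does per parameter
(w * 2).sum().backward()
third = w.grad.clone()     # back to 2 per element

print(first, second, third)
```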

In [57]:
net.zero_grad()     # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

conv1.bias.grad before backward
Variable containing:
 0
 0
 0
 0
 0
 0
[torch.FloatTensor of size 6]

conv1.bias.grad after backward
Variable containing:
 0.1002
-0.0215
 0.0067
 0.0306
 0.0532
 0.0024
[torch.FloatTensor of size 6]



nn module documentation: http://pytorch.org/docs/master/nn.html

Updating the weights with the rule weight = weight - learning_rate * gradient:

In [60]:
learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate) # Basic stochastic gradient descent

The torch.optim package provides ready-made optimisers such as SGD and Adam:

In [59]:
import torch.optim as optim

In [61]:
optimizer = optim.SGD(net.parameters(), lr=0.01)

In [62]:
# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update
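Putting the steps above into an actual loop, here is a minimal self-contained sketch. It assumes PyTorch 0.4 or later and swaps the convnet for a tiny linear model with synthetic data so that it runs in well under a second; the model, data, and hyperparameters are illustrative and not taken from the tutorial:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # make the synthetic data and initial weights repeatable

# Synthetic regression problem: targets are a fixed linear function of the inputs
inputs = torch.randn(16, 3)
targets = inputs @ torch.tensor([[1.0], [-2.0], [0.5]])

model = nn.Linear(3, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.05)

losses = []
for step in range(50):
    optimizer.zero_grad()                    # clear gradients from the previous step
    loss = criterion(model(inputs), targets)
    loss.backward()                          # compute gradients for all parameters
    optimizer.step()                         # apply the update
    losses.append(loss.item())

print(losses[0], losses[-1])
```

Replacing optim.SGD with optim.Adam (typically with a smaller learning rate) leaves the rest of the loop unchanged, since all optimisers share the zero_grad() and step() interface.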