# PyTorch Tutorial

PyTorch is a Python framework for machine learning. It provides:

- GPU-accelerated computations
- automatic differentiation
- modules for neural networks

This tutorial teaches the fundamentals of operating on PyTorch tensors. For a worked example of how to build and train a PyTorch network, see `pytorch-example.py`.

For additional tutorials, see http://pytorch.org/tutorials/

In [26]:
import torch
import numpy as np
from torch.autograd import Variable

## Tensors

Tensors are the fundamental object for array data. The most common types you will use are `IntTensor` and `FloatTensor`.

In [27]:
# Create uninitialized tensor
x = torch.FloatTensor(2,3)
print(x)
# Initialize to zeros
x.zero_()
print(x)


 1.4406e+06  4.5734e-41 -6.3731e-27
 4.5733e-41  1.3011e-37  0.0000e+00
[torch.FloatTensor of size 2x3]


 0  0  0
 0  0  0
[torch.FloatTensor of size 2x3]



In [28]:
# Create from numpy array (seed for repeatability)
np.random.seed(123)
np_array = np.random.random((2,3))
print(torch.FloatTensor(np_array))
print(torch.from_numpy(np_array))


 0.6965  0.2861  0.2269
 0.5513  0.7195  0.4231
[torch.FloatTensor of size 2x3]


 0.6965  0.2861  0.2269
 0.5513  0.7195  0.4231
[torch.DoubleTensor of size 2x3]
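The outputs above show that the two conversions produce different tensor types. A minimal sketch of a second difference, assuming a recent PyTorch release where `torch.tensor` is available (it is used here as a stand-in for a copying constructor): `torch.from_numpy` keeps the array's `float64` dtype and shares memory with the array, while `torch.tensor` makes an independent copy.

```python
import numpy as np
import torch

np_array = np.zeros((2, 3))  # float64 by default

shared = torch.from_numpy(np_array)                   # shares memory, stays float64
copied = torch.tensor(np_array, dtype=torch.float32)  # independent float32 copy

np_array[0, 0] = 5.0  # modify the NumPy array in place

print(shared.dtype, shared[0, 0].item())  # the change is visible through `shared`
print(copied.dtype, copied[0, 0].item())  # the copy is unaffected
```

Because of the shared memory, in-place edits to either the array or the `from_numpy` tensor affect both, which is worth keeping in mind when reusing buffers.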



In [29]:
# Create random tensor (seed for repeatability)
torch.manual_seed(123)
x=torch.randn(2,3)
print(x)


-0.5214 -1.4914 -0.2381
 1.0306  0.2221  1.5162
[torch.FloatTensor of size 2x3]



In [30]:
# special tensors (see documentation)
print(torch.eye(3))
print(torch.ones(2,3))
print(torch.zeros(2,3))
print(torch.arange(0,3))


 1  0  0
 0  1  0
 0  0  1
[torch.FloatTensor of size 3x3]


 1  1  1
 1  1  1
[torch.FloatTensor of size 2x3]


 0  0  0
 0  0  0
[torch.FloatTensor of size 2x3]


 0
 1
 2
[torch.FloatTensor of size 3]



All tensors have a `size` and a `type`:

In [31]:
x=torch.FloatTensor(3,4)
print(x.size())
print(x.type())

torch.Size([3, 4])
torch.FloatTensor
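As a small sketch of how this metadata can be inspected, assuming PyTorch 0.4 or later (where the `shape` and `dtype` attributes exist) and the default `float32` dtype:

```python
import torch

x = torch.zeros(3, 4)

print(x.size())   # torch.Size([3, 4])
print(x.shape)    # same information, as an attribute
print(x.type())   # torch.FloatTensor
print(x.dtype)    # torch.float32

rows, cols = x.size()  # torch.Size unpacks like a tuple
print(rows, cols)
```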


## CPU and GPU

Tensors can be copied between CPU and GPU. It is important that everything involved in a calculation is on the same device. 

This portion of the tutorial may not work for you if you do not have a GPU available.

In [32]:
# create a tensor
x = torch.rand(3,2)
print(x)
# copy to GPU
y = x.cuda()
print(y)
# copy back to CPU
z = y.cpu()
print(z)
# get CPU tensor as numpy array
print(z.numpy())
# cannot get GPU tensor as numpy array directly
try:
  y.numpy()
except RuntimeError as e:
  print(e)


 0.5513  0.7192
 0.7195  0.4911
 0.4231  0.7800
[torch.FloatTensor of size 3x2]


 0.5513  0.7192
 0.7195  0.4911
 0.4231  0.7800
[torch.cuda.FloatTensor of size 3x2 (GPU 0)]


 0.5513  0.7192
 0.7195  0.4911
 0.4231  0.7800
[torch.FloatTensor of size 3x2]

[[ 0.55131477  0.7191503 ]
 [ 0.71946895  0.49111894]
 [ 0.42310646  0.78002775]]
can't convert CUDA tensor to numpy (it doesn't support GPU arrays). Use .cpu() to move the tensor to host memory first.


Operations between GPU and CPU tensors will fail. Operations require all arguments to be on the same device.

In [33]:
x = torch.rand(3,5)  # CPU tensor
y = torch.rand(5,4).cuda()  # GPU tensor
try:
  torch.mm(x,y)  # Operation between CPU and GPU fails
except TypeError as e:
  print(e)

torch.mm received an invalid combination of arguments - got (torch.FloatTensor, torch.cuda.FloatTensor), but expected one of:
 * (torch.FloatTensor source, torch.FloatTensor mat2)
      didn't match because some of the arguments have invalid types: (torch.FloatTensor, torch.cuda.FloatTensor)
 * (torch.SparseFloatTensor source, torch.FloatTensor mat2)
      didn't match because some of the arguments have invalid types: (torch.FloatTensor, torch.cuda.FloatTensor)



Typical code should include `if` statements or utilize helper functions so it can operate with or without the GPU.

In [34]:
# Put tensor on CUDA if available
x = torch.rand(3,2)
if torch.cuda.is_available():
  x = x.cuda()

# Do some calculations
y = x ** 2 

# Copy to CPU if on GPU
if y.is_cuda:
  y = y.cpu()
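One possible way to avoid scattering `if` statements, sketched here under the assumption of PyTorch 0.4 or later (which provides `torch.device` and `Tensor.to`), is to choose a device once and move tensors to it:

```python
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.rand(3, 2).to(device)  # no-op if x is already on `device`
y = x ** 2                       # computed on whichever device holds x
y = y.cpu()                      # no-op if y is already on the CPU

print(device, y.device)
```

The same script then runs unchanged on machines with and without a GPU.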

A convenient method is `new`, which creates a new tensor on the same device as another tensor. It should be used for creating tensors whenever possible.

In [35]:
x1 = torch.rand(3,2)
x2 = x1.new(1,2)  # create cpu tensor
print(x2)
x1 = torch.rand(3,2).cuda()
x2 = x1.new(1,2)  # create cuda tensor
print(x2)


 1.4406e+06  4.5734e-41
[torch.FloatTensor of size 1x2]


 0.1280  0.5219
[torch.cuda.FloatTensor of size 1x2 (GPU 0)]
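Note that `new` returns uninitialized memory, as the arbitrary values in the output above suggest. A hedged sketch of initialized alternatives, assuming PyTorch 0.4 or later where `new_zeros` and `ones_like` are available:

```python
import torch

x1 = torch.rand(3, 2)      # could equally be a CUDA tensor
x2 = x1.new_zeros(1, 2)    # zeros with x1's dtype and device
x3 = torch.ones_like(x1)   # ones with x1's shape, dtype, and device

print(x2)
print(x3.shape, x3.dtype, x3.device)
```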



Calculations executed on the GPU can be many times faster than NumPy on the CPU. NumPy is nevertheless well optimized for the CPU and many times faster than Python `for` loops. For small arrays, NumPy may even be faster than the GPU because of the cost of interfacing with the GPU.

In [36]:
from timeit import timeit
# Create random data
x = torch.rand(1000,64)
y = torch.rand(64,32)
number = 10000  # number of iterations

def matmul():
  z=torch.mm(x, y) # matrix product (mm=matrix multiplication)

# Time CPU
print('CPU: {}ms'.format(timeit(matmul, number=number)*1000))
# Time GPU (requires a CUDA device; CUDA calls are asynchronous,
# so this measurement can understate the real GPU time)
x, y = x.cuda(), y.cuda()
print('GPU: {}ms'.format(timeit(matmul, number=number)*1000))

CPU: 185.08557113818824ms
GPU: 52.751455921679735ms
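CUDA operations are launched asynchronously, so a timing loop can return before the GPU has finished its work. The following is a sketch of one way to get a fairer comparison, not the measurement used above: it calls `torch.cuda.synchronize()` after each GPU product and excludes a warm-up call. The helper name `timed_mm` and the reduced iteration count are illustrative choices.

```python
import torch
from timeit import timeit

x = torch.rand(1000, 64)
y = torch.rand(64, 32)
number = 100  # fewer iterations than above to keep the sketch quick

def timed_mm(a, b):
    """Return the total time in ms for `number` matrix products of a and b."""
    def run():
        torch.mm(a, b)
        if a.is_cuda:
            torch.cuda.synchronize()  # wait for the GPU to finish
    run()  # warm-up call, excluded from the measurement
    return timeit(run, number=number) * 1000

cpu_ms = timed_mm(x, y)
print('CPU: {:.2f}ms'.format(cpu_ms))
if torch.cuda.is_available():
    print('GPU: {:.2f}ms'.format(timed_mm(x.cuda(), y.cuda())))
```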


## Math, Linear Algebra, and Indexing

PyTorch math and linear algebra functions are similar to NumPy's. Operators are overloaded, so you can use standard math operators (`+`, `-`, etc.) and get a tensor as the result. See the PyTorch documentation for a complete list of available functions.

In [37]:
x = torch.arange(0,5)
print(torch.sum(x))
print(torch.sum(torch.exp(x)))
print(torch.mean(x))

10.0
85.79102325439453
2.0
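To illustrate the similarity, here is a short side-by-side sketch. It assumes PyTorch 0.4 or later (for `.item()` and the `dtype` argument); the values are chosen so the results are easy to check by hand.

```python
import numpy as np
import torch

x = torch.arange(5, dtype=torch.float32)  # 0, 1, 2, 3, 4
a = np.arange(5, dtype=np.float32)

y = 2 * x + 1  # elementwise arithmetic, as in NumPy
print(y)
print(torch.sum(x).item(), np.sum(a))        # 10.0 in both libraries
print(torch.mean(x).item(), np.mean(a))      # 2.0 in both libraries
print(torch.dot(x, x).item(), np.dot(a, a))  # 0 + 1 + 4 + 9 + 16 = 30.0
```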


PyTorch indexing is similar to NumPy indexing. See the PyTorch documentation for details.

In [38]:
x = torch.rand(3,2)
print(x)
print(x[1,:])


 0.7526  0.5557
 0.6445  0.7588
 0.4765  0.2728
[torch.FloatTensor of size 3x2]


 0.6445
 0.7588
[torch.FloatTensor of size 2]
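A few more indexing patterns that carry over from NumPy, sketched on a small integer tensor so the results can be read off directly (assumes PyTorch 0.4 or later for `reshape`):

```python
import torch

x = torch.arange(6).reshape(3, 2)  # [[0, 1], [2, 3], [4, 5]]

row = x[1, :]     # second row
col = x[:, 0]     # first column
big = x[x > 2]    # boolean mask, returns a flat tensor
last = x[-1, -1]  # negative indices count from the end

print(row, col, big, last)
```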



## Variables and Differentiation

Variables are used similarly to tensors but actually wrap tensors and provide automatic differentiation.

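- Variables you are differentiating with respect to must have `requires_grad=True`
- Call `.backward()` on the variable you are differentiating (typically a scalar such as a loss); the gradients are stored in the `.grad` attribute of the variables it depends on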

In [39]:
# Create variable
x = Variable(torch.arange(0,4), requires_grad=True)
# Calculate y=sum(x**2)
y = torch.sum(x**2)
# Calculate gradient (dy/dx=2x)
y.backward()
# Print values
print(x)
print(y)
print(x.grad)

Variable containing:
 0
 1
 2
 3
[torch.FloatTensor of size 4]

Variable containing:
 14
[torch.FloatTensor of size 1]

Variable containing:
 0
 2
 4
 6
[torch.FloatTensor of size 4]
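In PyTorch 0.4 and later, `Variable` was merged into the tensor class, so the wrapper is no longer needed. A sketch of the same calculation under that assumption, with `requires_grad` set directly on the tensor:

```python
import torch

# Tensor that tracks gradients directly (no Variable wrapper)
x = torch.arange(4, dtype=torch.float32, requires_grad=True)
# Calculate y=sum(x**2)
y = torch.sum(x ** 2)
# Calculate gradient (dy/dx=2x)
y.backward()

print(y.item())  # 0 + 1 + 4 + 9 = 14.0
print(x.grad)    # 2x = 0, 2, 4, 6
```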



Variables and tensors cannot be mixed in a single operation. Wrap all tensors in Variables to use them in automatic differentiation.

In [40]:
x=torch.rand(3,5)  # tensor
y=torch.rand(5,4)  # tensor
xv=Variable(x)  # variable
yv=Variable(y)  # variable
print(torch.mm(x,y))  # dot between two tensors OK
print(torch.mm(xv,yv))  # dot between two variables OK
try:
  fail=torch.mm(x,yv)  # dot between tensor and variable FAIL
except TypeError as e:
  print(e)


 1.3053  1.4458  0.8122  1.8480
 2.2392  2.2687  0.8424  2.3457
 1.9050  2.0709  1.0581  2.4536
[torch.FloatTensor of size 3x4]

Variable containing:
 1.3053  1.4458  0.8122  1.8480
 2.2392  2.2687  0.8424  2.3457
 1.9050  2.0709  1.0581  2.4536
[torch.FloatTensor of size 3x4]

torch.mm received an invalid combination of arguments - got (torch.FloatTensor, Variable), but expected one of:
 * (torch.FloatTensor source, torch.FloatTensor mat2)
      didn't match because some of the arguments have invalid types: (torch.FloatTensor, Variable)
 * (torch.SparseFloatTensor source, torch.FloatTensor mat2)
      didn't match because some of the arguments have invalid types: (torch.FloatTensor, Variable)



Differentiation accumulates gradients. This is sometimes what you want and sometimes not. **Make sure to zero gradients between batches if performing SGD or you will get strange results!**

In [41]:
# Create a variable
x=Variable(torch.arange(0,4), requires_grad=True)
# Differentiate
torch.sum(x**2).backward()
print(x.grad)
# Differentiate again (accumulates gradient)
torch.sum(x**2).backward()
print(x.grad)
# Zero gradient before differentiating
x.grad.data.zero_()
torch.sum(x**2).backward()
print(x.grad)

Variable containing:
 0
 2
 4
 6
[torch.FloatTensor of size 4]

Variable containing:
  0
  4
  8
 12
[torch.FloatTensor of size 4]

Variable containing:
 0
 2
 4
 6
[torch.FloatTensor of size 4]
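In a training loop, zeroing is usually done through the optimizer. The following is a minimal sketch, assuming PyTorch 0.4 or later; the toy data, learning rate, and step count are illustrative choices rather than part of this tutorial.

```python
import torch

# Toy data for y = 2 * x (hypothetical example)
x = torch.tensor([1.0, 2.0, 3.0])
y = 2 * x

w = torch.zeros(1, requires_grad=True)  # parameter to learn
optimizer = torch.optim.SGD([w], lr=0.1)

for step in range(50):
    optimizer.zero_grad()                # clear gradients from the previous step
    loss = torch.mean((w * x - y) ** 2)  # mean squared error
    loss.backward()                      # accumulate d(loss)/dw into w.grad
    optimizer.step()                     # w <- w - lr * w.grad

print(w.item(), loss.item())  # w should end up close to 2
```

Removing the `zero_grad()` call makes each step use the sum of all previous gradients, which is the "strange results" case described above.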



## Neural Network Modules

PyTorch provides a framework for developing neural network modules that takes care of things like tracking a list of parameters for you.

- `nn.Module` objects are reusable components such as dense layers and activation functions
- You can write custom modules for any experimental layers
- You can combine modules into larger module classes

In [42]:
# create a simple sequential network (`nn.Module` object) from layers (other `nn.Module` objects)
net = torch.nn.Sequential(
    torch.nn.Linear(28*28,256),
    torch.nn.Sigmoid(),
    torch.nn.Linear(256,10))

In [43]:
# create a more customizable network module
class MyNetwork(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = torch.nn.Linear(28*28,256)
        self.layer2 = torch.nn.Sigmoid()
        self.layer3 = torch.nn.Linear(256,10)

    def forward(self, input_val):
        h = input_val
        h = self.layer1(h)
        h = self.layer2(h)
        h = self.layer3(h)
        return h

net = MyNetwork()
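To see what a module does once it is built, the sketch below runs a forward pass on a random batch and counts the tracked parameters. It rebuilds the sequential network so that it is self-contained; the batch size of 4 is an arbitrary choice.

```python
import torch

net = torch.nn.Sequential(
    torch.nn.Linear(28*28, 256),
    torch.nn.Sigmoid(),
    torch.nn.Linear(256, 10))

batch = torch.rand(4, 28*28)  # 4 fake flattened 28x28 inputs
out = net(batch)              # calling the module runs its forward pass
n_params = sum(p.numel() for p in net.parameters())

print(out.shape)  # torch.Size([4, 10])
print(n_params)   # (784*256 + 256) + (256*10 + 10) = 203530
```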

## Saving and Loading

In [44]:
# get dictionary of keys to weights using `state_dict`
net = torch.nn.Sequential(
    torch.nn.Linear(28*28,256),
    torch.nn.Sigmoid(),
    torch.nn.Linear(256,10))
print(net.state_dict().keys())

odict_keys(['0.weight', '0.bias', '2.weight', '2.bias'])


In [45]:
# save a dictionary
torch.save(net.state_dict(),'test.t7')
# load a dictionary
net.load_state_dict(torch.load('test.t7'))
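A hedged sketch of a full round trip, checking that a second network with the same architecture ends up with identical weights after loading. The small `Linear` layer and the file name `weights.pt` are illustrative; a temporary directory is used so no file is left behind.

```python
import os
import tempfile
import torch

net = torch.nn.Linear(4, 2)
restored = torch.nn.Linear(4, 2)  # same architecture, different random weights

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'weights.pt')  # hypothetical file name
    torch.save(net.state_dict(), path)
    restored.load_state_dict(torch.load(path))

# Compare every saved tensor with its restored counterpart
same = all(torch.equal(v, restored.state_dict()[k])
           for k, v in net.state_dict().items())
print(same)
```

Saving the `state_dict` rather than the module object means the network class must be constructed first and the weights loaded into it, as done with `restored` here.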

## Building a Neural Network

For a worked example of how to build and train a PyTorch network, see `pytorch-example.py`.

Good luck!