## Introduction to PyTorch

### PyTorch's tensor library

Most PyTorch operations run on tensors. A tensor is a multidimensional array. Let's look at some basic tensor operations. But first, let's import some important PyTorch libraries:

- torch: a tensor library similar to NumPy, with strong GPU support
- torch.autograd: a "tape-based" automatic differentiation library (more on this later)
- torch.nn: a neural network library deeply integrated with autograd
- torch.optim: an optimization package to be used with torch.nn, with standard optimization methods such as SGD, RMSProp, LBFGS, Adam etc.

We also set a seed so that we can reproduce the same results later.

In [7]:
# Display the graphs within the notebook
%matplotlib inline

In [1]:
!pip install http://download.pytorch.org/whl/cpu/torch-0.3.1-cp27-cp27mu-linux_x86_64.whl 
!pip install torchvision

Collecting torch==0.3.1 from http://download.pytorch.org/whl/cpu/torch-0.3.1-cp27-cp27mu-linux_x86_64.whl
  Downloading http://download.pytorch.org/whl/cpu/torch-0.3.1-cp27-cp27mu-linux_x86_64.whl (47.2MB)
Installing collected packages: torch
Successfully installed torch-0.3.1
Collecting torchvision
  Downloading torchvision-0.2.0-py2.py3-none-any.whl (48kB)
Installing collected packages: torchvision
Successfully installed torchvision-0.2.0


In [2]:
import torch
import torch.autograd as autograd
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(123)

<torch._C.Generator at 0x7fe935760350>
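To illustrate what the seed buys us, here is a minimal sketch. It assumes a recent PyTorch release rather than the 0.3.1 wheel installed above, and the variable names first_draw and second_draw are made up for this example. Re-seeding with the same value should restart the same random sequence:

```python
import torch

# Seed the global random number generator, then draw some numbers.
torch.manual_seed(123)
first_draw = torch.randn(3)

# Re-seeding with the same value restarts the same random sequence.
torch.manual_seed(123)
second_draw = torch.randn(3)

# Both draws contain exactly the same values.
print(torch.equal(first_draw, second_draw))
```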

### Creating Tensors

In [6]:
# Create a torch.Tensor object from a Python list
v = [1, 2, 3]
print(type(v))
v_tensor = torch.Tensor(v)
print(v_tensor)

# Create a torch.Tensor object of size 2x3 from a 2x3 nested list
m2x3 = [[1, 2, 3], [4, 5, 6]]
m2x3_tensor = torch.Tensor(m2x3)
print(m2x3_tensor)

# Create a 3D torch.Tensor object of size 3x3x3
m3x3x3 = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
          [[10, 11, 12], [13, 14, 15], [16, 17, 18]],
          [[19, 20, 21], [22, 23, 24], [25, 26, 27]]]
m3x3x3_tensor = torch.Tensor(m3x3x3)
print(m3x3x3_tensor)

# Create a 4D tensor of random data with the given dimensions (here 4x3x3x3) using torch.randn()
m4x3x3x3_tensor = torch.randn((4, 3, 3, 3))
m4x3x3x3_tensor.shape
print(m4x3x3x3_tensor)

<type 'list'>

 1
 2
 3
[torch.FloatTensor of size 3]


 1  2  3
 4  5  6
[torch.FloatTensor of size 2x3]


(0 ,.,.) = 
   1   2   3
   4   5   6
   7   8   9

(1 ,.,.) = 
  10  11  12
  13  14  15
  16  17  18

(2 ,.,.) = 
  19  20  21
  22  23  24
  25  26  27
[torch.FloatTensor of size 3x3x3]


(0 ,0 ,.,.) = 
 -0.1115  0.1204 -0.3696
 -0.2404 -1.1969  0.2093
 -0.9724 -0.7550  0.3239

(0 ,1 ,.,.) = 
 -0.1085  0.2103 -0.3908
  0.2350  0.6653  0.3528
  0.9728 -0.0386 -0.8861

(0 ,2 ,.,.) = 
 -0.4709 -0.4269 -0.0283
  1.4220 -0.3886 -0.8903
 -0.9601 -0.4087  1.0764

(1 ,0 ,.,.) = 
 -0.4015 -0.7291 -0.1218
 -0.4796 -0.5166 -0.3107
  0.2057  0.9657  0.7057

(1 ,1 ,.,.) = 
  0.7290  1.2775 -1.0815
 -1.3027  1.0827 -1.3841
  0.4033 -1.2239  0.7017

(1 ,2 ,.,.) = 
  2.2139 -0.0276  1.0541
  0.5661 -0.3820  0.8807
  0.2710  0.7694  0.3453

(2 ,0 ,.,.) = 
  1.8979 -0.2357  0.7885
  0.3208  0.8456 -0.3621
  0.1027 -3.5310  0.5485

(2 ,1 ,.,.) = 
 -1.6063  0.7281  0.6609
  0.2391  0.0340  0.1164

### What is a multidimensional tensor?

Since we frequently deal with tensors of more than 3 dimensions, it is important to understand them well. The best way to think of an n-dimensional object (and a tensor in particular) is as a container that holds a series of (n-1)-dimensional objects. We can "pull out" these inner objects by indexing into the higher-dimensional container. Let's look at some examples:

For a vector v (dim(v)=1), indexing into it returns its "slice": a scalar s (dim(s)=0).

For a matrix, indexing into it returns its "slice": a row vector (or a column vector, if we index along the second dimension).

A 3D tensor can be seen as a cube or 3D box made of stacked matrices. So if we index into such a tensor, we get its slice, which is a matrix!

We can't easily visualize 4D, 5D (or n-D) tensors, but the idea is the same. If we index into them, we pull out an object of dimension n-1.

E.g. a 4D tensor can be seen as a list of cubes or 3D boxes. If we index into a 4D tensor, we get a 3D box.

In [8]:
# Index into v_tensor and get a scalar
print(v_tensor[0])

# Index into m2x3_tensor and get a vector
print(m2x3_tensor[0])

# Index into m3x3x3_tensor and get a matrix
print(m3x3x3_tensor[0])

# Index into m4x3x3x3_tensor and get a 3D tensor of size 3x3x3
print(m4x3x3x3_tensor[0])

1.0

 1
 2
 3
[torch.FloatTensor of size 3]


 1  2  3
 4  5  6
 7  8  9
[torch.FloatTensor of size 3x3]


(0 ,.,.) = 
 -0.1115  0.1204 -0.3696
 -0.2404 -1.1969  0.2093
 -0.9724 -0.7550  0.3239

(1 ,.,.) = 
 -0.1085  0.2103 -0.3908
  0.2350  0.6653  0.3528
  0.9728 -0.0386 -0.8861

(2 ,.,.) = 
 -0.4709 -0.4269 -0.0283
  1.4220 -0.3886 -0.8903
 -0.9601 -0.4087  1.0764
[torch.FloatTensor of size 3x3x3]
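The "each index removes one dimension" rule can also be checked numerically. The following is a small sketch, assuming a recent PyTorch release; the names t4, cube, matrix and vector are chosen only for this illustration:

```python
import torch

# A 4D tensor: a list of 4 "cubes", each of size 3x3x3.
t4 = torch.zeros(4, 3, 3, 3)

cube = t4[0]           # one index  -> 3D tensor
matrix = t4[0, 0]      # two indices -> 2D tensor (a matrix)
vector = t4[0, 0, 0]   # three indices -> 1D tensor (a vector)

# Each additional index lowers the number of dimensions by one.
print(t4.dim(), cube.dim(), matrix.dim(), vector.dim())
print(cube.shape, matrix.shape, vector.shape)
```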



### Operations with Tensors

You can operate on tensors in the ways you would expect. See the documentation http://pytorch.org/docs/torch.html for a complete list of operations.

A simple mathematical operation: multiplying two vectors with torch.matmul, which for two 1D tensors computes their dot product:

In [9]:
x = torch.Tensor([1, 2, 3])
y = torch.Tensor([4, 5, 6])
print(x)
print(y)

w = torch.matmul(x, y)
print(w)


 1
 2
 3
[torch.FloatTensor of size 3]


 4
 5
 6
[torch.FloatTensor of size 3]

32.0
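For comparison with the dot product above, here is a short sketch of element-wise addition and multiplication on the same values. It assumes a recent PyTorch release (torch.tensor and .tolist() are used instead of the 0.3.1-era torch.Tensor constructor); the names total, product and dot are illustrative:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([4.0, 5.0, 6.0])

# Element-wise addition: each entry of x is added to the matching entry of y.
total = x + y

# Element-wise multiplication: entries are multiplied pairwise, no summation.
product = x * y

# Dot product: the element-wise products summed into a single number.
dot = torch.matmul(x, y)

print(total.tolist())    # [5.0, 7.0, 9.0]
print(product.tolist())  # [4.0, 10.0, 18.0]
print(dot.item())        # 32.0
```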


In [10]:
# By default, torch.cat concatenates along dimension 0 (rows): it "stacks" the rows.

x_1 = torch.randn(2, 5)
print(x_1)
y_1 = torch.randn(3, 5)
print(y_1)
z_1 = torch.cat([x_1, y_1])
print(z_1)

# Concatenate columns: the tensors must have the same number of rows.
x_2 = torch.randn(2, 3)
print(x_2)
y_2 = torch.randn(3, 5)
print(y_2)
# The second argument specifies the dimension to concatenate along. Here we select 1 (columns).
# x_2 has 2 rows but y_2 has 3, so the tensors are not compatible and torch complains
# with the RuntimeError shown below. With y_2 = torch.randn(2, 5) the result would be 2x8.
z_2 = torch.cat([x_2, y_2], 1)
print(z_2)

# Tensors with different column counts cannot be stacked as rows either. Uncomment to see the error:
# torch.cat([x_1, x_2])


 0.4934 -0.2766  0.2439 -1.2116 -0.1520
 0.1509 -0.6251 -0.4416  0.3208 -0.3273
[torch.FloatTensor of size 2x5]


-0.5305 -0.0172  0.4719  0.5671  2.7930
 0.3229  0.8552  0.7492 -1.7119  0.6025
-0.7018 -1.3130  0.1574  2.0114  0.1004
[torch.FloatTensor of size 3x5]


 0.4934 -0.2766  0.2439 -1.2116 -0.1520
 0.1509 -0.6251 -0.4416  0.3208 -0.3273
-0.5305 -0.0172  0.4719  0.5671  2.7930
 0.3229  0.8552  0.7492 -1.7119  0.6025
-0.7018 -1.3130  0.1574  2.0114  0.1004
[torch.FloatTensor of size 5x5]


 0.8222 -0.0176  1.2481
-0.0710  2.1627  1.5215
[torch.FloatTensor of size 2x3]


-1.0547  1.7822  1.9736 -0.3101 -0.8211
 0.1315 -0.6948 -0.5823  1.0035 -1.4613
 0.8985  0.6210 -0.9679  0.6740 -1.2828
[torch.FloatTensor of size 3x5]



RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 2 and 3 in dimension 0 at /pytorch/torch/lib/TH/generic/THTensorMath.c:2897
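As a sketch of a column-wise concatenation that does succeed, the tensors below share the same number of rows. This assumes a recent PyTorch release; the failed flag is introduced only for this illustration:

```python
import torch

x_2 = torch.randn(2, 3)
y_2 = torch.randn(2, 5)  # same number of rows as x_2

# Concatenating along dimension 1 attaches the columns: 3 + 5 = 8.
z_2 = torch.cat([x_2, y_2], dim=1)
print(z_2.shape)

# With mismatched row counts the same call raises a RuntimeError.
failed = False
try:
    torch.cat([x_2, torch.randn(3, 5)], dim=1)
except RuntimeError as err:
    failed = True
    print("cat failed:", err)
```

The rule of thumb is that all dimensions except the one being concatenated must match.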

### Reshaping Tensors

We can use the .view() method to reshape a tensor. Often we need to reshape our data before passing it to a neural network.

Let's assume we have 64000 RGB images of 28x28 pixels. We can define a tensor of shape (64000, 3, 28, 28) to hold them, where 3 is the number of color channels:

In [11]:
x = torch.randn(64000, 3, 28, 28)
# Now we add a leading dimension of size 32, splitting the images into 32 groups.
# Placing -1 lets PyTorch infer the size of the second dimension (64000 / 32 = 2000):
x_reshaped = x.view(32, -1, 3, 28, 28)
print(x_reshaped.shape)

torch.Size([32, 2000, 3, 28, 28])
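Another common reshape is flattening each image into a single vector, for example before a fully connected layer. The sketch below assumes a recent PyTorch release and uses a much smaller, hypothetical batch of 64 images to keep memory use low:

```python
import torch

# A small batch of 64 RGB images of 28x28 pixels (smaller than the 64000 above).
images = torch.randn(64, 3, 28, 28)

# Keep the first dimension and let -1 infer the rest: 3 * 28 * 28 = 2352.
flat = images.view(images.size(0), -1)
print(flat.shape)

# view() never changes the number of elements, only how they are arranged.
print(images.numel() == flat.numel())

# Group the same images into 2 groups of 32, analogous to the cell above.
grouped = images.view(2, -1, 3, 28, 28)
print(grouped.shape)
```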


### Computation Graphs and Automatic Differentiation

A computation graph specifies which parameters and which operations are combined to produce the output.

autograd.Variable, the fundamental class of PyTorch's autograd package, keeps track of how it was created.



In [12]:
# Variables wrap tensor objects
x = autograd.Variable(torch.Tensor([1, 2, 3]), requires_grad=True)
# You can access the data with the .data attribute
print(x.data)

y = autograd.Variable(torch.Tensor([4, 5, 6]), requires_grad=True)

# With autograd.Variable you can also perform all the same operations you did with tensors
z = x + y
print(z.data)

# z also knows that it is the result of an addition (AddBackward)
operation = z.grad_fn
print(operation)


 1
 2
 3
[torch.FloatTensor of size 3]


 5
 7
 9
[torch.FloatTensor of size 3]

<AddBackward1 object at 0x7fe90ae7ee10>
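For readers on a newer PyTorch release, the same idea can be sketched without autograd.Variable. This assumes PyTorch 0.4 or later, where tensors created with requires_grad=True record their own history; it is not the API of the 0.3.1 version used in this notebook:

```python
import torch

# Leaf tensors: created directly by the user, so they have no grad_fn.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = torch.tensor([4.0, 5.0, 6.0], requires_grad=True)

# z is the result of an operation, so it remembers how it was created.
z = x + y

print(x.grad_fn)        # None, because x is a leaf of the graph
print(z.grad_fn)        # an addition node of the computation graph
print(z.requires_grad)  # True, inherited from x and y
```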


In [21]:
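# An autograd.Variable knows which operation created it. But how does that help compute a gradient?
# Let's sum up all the entries in z. Here z must still be the Variable from the previous cell:
# on a plain tensor, sum() returns a Python float, which has no grad_fn.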
s = z.sum()
print(s)
print(s.grad_fn)

1.28212755919


AttributeError: 'float' object has no attribute 'grad_fn'

### Gradient

So now, what is the derivative of this sum with respect to the first component of x? Remember that x is a tensor of 3 elements: $x = (x_0, x_1, x_2)$

In math terms, we want the partial derivative of $s$ with respect to $x_0$: $\frac{\partial s}{\partial x_0}$

Well, $s$ knows that it was created as the sum of the elements $(z_0, z_1, z_2)$ of the tensor $z$. $z$ knows that it was the sum $x + y$. So

$$s = \overbrace{x_0 + y_0}^\text{$z_0$} + \overbrace{x_1 + y_1}^\text{$z_1$} + \overbrace{x_2 + y_2}^\text{$z_2$}$$

And so $s$ contains enough information to determine that the derivative of $s$ with respect to $x_0$ is 1!

Reminder: when you compute the partial derivative with respect to one variable, you treat all other variables as constants. The terms $x_1, x_2, y_0, y_1, y_2$ therefore contribute zero, and the derivative of $f(x_0) = x_0$ is 1.

First we need to run backpropagation to calculate the gradients with respect to every variable. Note: if you run backward multiple times, the gradients add up. That is because PyTorch accumulates the gradient into the .grad property, which is convenient for many models. Let's now have PyTorch compute the gradient, and see that we were right with our guess of 1:

In [19]:
# calling .backward() on any variable will run backprop, starting from it.
s.backward(retain_graph=True)

In [20]:
# x and y must still be the Variables defined above; a plain tensor has no .grad attribute
print(x)
print(x.grad)
print(y.grad)


-0.3503  1.4220
-0.0637  0.3308
[torch.FloatTensor of size 2x2]



AttributeError: 'torch.FloatTensor' object has no attribute 'grad'
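The expected gradients, and the accumulation behaviour described above, can be sketched as follows. This assumes PyTorch 0.4 or later (tensors with requires_grad=True instead of autograd.Variable); the names first and second are illustrative:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = torch.tensor([4.0, 5.0, 6.0], requires_grad=True)
s = (x + y).sum()

# First backward pass: ds/dx_i = 1 for every component of x.
s.backward(retain_graph=True)
first = x.grad.clone()
print(first.tolist())   # [1.0, 1.0, 1.0]

# Second backward pass: the new gradient is added to the stored one.
s.backward()
second = x.grad.clone()
print(second.tolist())  # [2.0, 2.0, 2.0]

# Reset the accumulated gradient in place before the next computation.
x.grad.zero_()
print(x.grad.tolist())  # [0.0, 0.0, 0.0]
```

This accumulation is the reason training loops usually reset gradients before each backward pass.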

### How NOT to break the computational graph

Let's create two torch tensors and add them up:

In [16]:
x = torch.randn((2, 2))
y = torch.randn((2, 2))
z = x + y  # These are plain tensors, so backprop is not possible

print(z)


-0.0693  1.8363
 1.4892 -1.9741
[torch.FloatTensor of size 2x2]



In [17]:
# Now we wrap the torch tensors in autograd.Variable. The var_z contains the information for backpropagation
var_x = autograd.Variable(x, requires_grad=True)
var_y = autograd.Variable(y, requires_grad=True)
# var_z contains enough information to compute gradients, as we saw above
var_z = var_x + var_y
print(var_z.grad_fn)

<AddBackward1 object at 0x7fe90ae65910>


In [22]:
# But what happens if we extract the wrapped tensor from var_z and re-wrap it in a new autograd.Variable?
var_z_data = var_z.data
new_var_z = autograd.Variable(var_z_data)
print(new_var_z.grad_fn)

None


In [23]:
# The chain of operations no longer exists: we extracted only the data, so the whole history was lost.
# If we now try to call backward on new_var_z, it throws an error:
new_var_z.backward(retain_graph=True)

RuntimeError: element 0 of variables does not require grad and does not have a grad_fn
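In newer PyTorch releases the same effect can be reproduced with .detach(), which returns a tensor that shares the data but carries no history. The sketch below assumes PyTorch 0.4 or later and is not the Variable-based API of this notebook:

```python
import torch

x = torch.randn(2, 2, requires_grad=True)
y = torch.randn(2, 2, requires_grad=True)
z = x + y

# detach() keeps the values but cuts the link to the computation graph.
detached = z.detach()
print(detached.grad_fn)        # None
print(detached.requires_grad)  # False

# Going through z itself keeps the graph intact, so backprop works.
z.sum().backward()
print(x.grad)  # a 2x2 tensor of ones
```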

### CUDA

Check whether GPU acceleration with CUDA is available:

In [24]:
# Let us run this cell only if CUDA is available
if torch.cuda.is_available():
    # Create a LongTensor and transfer it to the GPU,
    # where it becomes a torch.cuda.LongTensor
    a = torch.LongTensor(10).fill_(3).cuda()
    print(type(a))
    # Transfer it back to the CPU, where it is a torch.LongTensor again
    b = a.cpu()
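A device-agnostic variant, which runs with or without a GPU, might look like the sketch below. It assumes PyTorch 0.4 or later, where torch.device and the device keyword argument are available:

```python
import torch

# Pick the GPU when CUDA is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create a tensor of ten 3s of integer type directly on the chosen device.
a = torch.full((10,), 3, dtype=torch.long, device=device)
print(a.device)

# Move it (back) to the CPU; this is a no-op if it already lives there.
b = a.cpu()
print(b.device)
```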

## Linear Model

In [25]:
import torch
import torch.nn as nn
from torch.autograd import Variable
import numpy as np

In [26]:
x = [i for i in range(20)]  # list comprehension
x_train = np.array(x, dtype=np.float32)
x_train = x_train.reshape(-1, 1)
print(x)
print(x_train.shape)

y = [(5*i + 2) for i in x]  # list comprehension
y_train = np.array(y, dtype=np.float32)
y_train = y_train.reshape(-1, 1)
print(y)
print(y_train.shape)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
(20, 1)
[2, 7, 12, 17, 22, 27, 32, 37, 42, 47, 52, 57, 62, 67, 72, 77, 82, 87, 92, 97]
(20, 1)


## Create Model Class

In [27]:
class LinearRegressor(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(LinearRegressor, self).__init__()
        self.linear = nn.Linear(input_dim, output_dim)  
    
    def forward(self, x):
        out = self.linear(x)
        return out

input_dim = 1
output_dim = 1

model = LinearRegressor(input_dim, output_dim)

model

LinearRegressor(
  (linear): Linear(in_features=1, out_features=1, bias=True)
)
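To see what the linear layer inside such a model computes, here is a small sketch that compares its output with w * x + b written out by hand. It assumes a recent PyTorch release and uses nn.Linear directly; the names batch, out and expected are illustrative:

```python
import torch
import torch.nn as nn

# One input feature and one output feature, as in the model above.
linear = nn.Linear(1, 1)

# 20 samples with one feature each: shape (20, 1).
batch = torch.arange(20, dtype=torch.float32).reshape(-1, 1)

# Forward pass through the layer; no gradients are needed for this check.
with torch.no_grad():
    out = linear(batch)

# The same computation by hand: weight times input plus bias.
expected = batch * linear.weight.item() + linear.bias.item()

print(out.shape)
print(torch.allclose(out, expected, atol=1e-5))
```

The weight and bias start out randomly initialized; training adjusts them so that the output matches the targets.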

## Loss & Optimizer

In [28]:
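# Mean squared error between predictions and targets
loss_function = nn.MSELoss()

# Stochastic gradient descent over all parameters of the model
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
loss_function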

MSELoss(
)
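As a sanity check on what MSELoss computes, the sketch below compares it with the mean of squared differences written out by hand. It assumes a recent PyTorch release; the values of pred and target are made up for this example:

```python
import torch
import torch.nn as nn

pred = torch.tensor([[1.0], [2.0], [3.0]])
target = torch.tensor([[2.0], [2.0], [5.0]])

# Built-in loss: the mean of the squared differences.
loss = nn.MSELoss()(pred, target)

# The same by hand: ((-1)^2 + 0^2 + (-2)^2) / 3 = 5 / 3.
manual = ((pred - target) ** 2).mean()

print(loss.item(), manual.item())
```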

In [29]:
epochs = 500

# Convert inputs and targets to torch Variables once, outside the loop
inputs = Variable(torch.from_numpy(x_train))
real_outputs = Variable(torch.from_numpy(y_train))

for epoch in range(1, epochs + 1):
    # Reset gradients, since PyTorch accumulates them across backward calls
    optimizer.zero_grad()

    # Forward: compute the output
    pred_outputs = model(inputs)

    # Loss
    loss = loss_function(pred_outputs, real_outputs)

    # Backward: compute gradients
    loss.backward()

    # Update parameters
    optimizer.step()

    print('epoch {}, loss {}'.format(epoch, loss.data[0]))

epoch 1, loss 2709.36669922
epoch 2, loss 1530.92407227
epoch 3, loss 865.333129883
epoch 4, loss 489.403076172
epoch 5, loss 277.07510376
epoch 6, loss 157.150436401
epoch 7, loss 89.4155883789
epoch 8, loss 51.157913208
epoch 9, loss 29.5491027832
epoch 10, loss 17.3436183929
epoch 11, loss 10.4491834641
epoch 12, loss 6.55447530746
epoch 13, loss 4.35402727127
epoch 14, loss 3.11049962044
epoch 15, loss 2.40744829178
epoch 16, loss 2.00967669487
epoch 17, loss 1.78431284428
epoch 18, loss 1.65633547306
epoch 19, loss 1.58335328102
epoch 20, loss 1.54144215584
epoch 21, loss 1.51707673073
epoch 22, loss 1.5026242733
epoch 23, loss 1.49377059937
epoch 24, loss 1.48807883263
epoch 25, loss 1.48417425156
epoch 26, loss 1.48128032684
epoch 27, loss 1.47895789146
epoch 28, loss 1.47695767879
epoch 29, loss 1.47514283657
epoch 30, loss 1.47343087196
epoch 31, loss 1.47177815437
epoch 32, loss 1.47015976906
epoch 33, loss 1.46856331825
epoch 34, loss 1.46697747707
epoch 35, loss 1.465401172
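Once training has finished, the learned weight and bias can be compared with the values used to generate the data (5 and 2). The following is a self-contained sketch of the same training setup, assuming PyTorch 0.4 or later (plain tensors and .item() instead of Variable and .data[0]); it uses nn.Linear directly rather than the LinearRegressor class, and the losses list is added only for this illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(123)

# Same data as above: y = 5 * x + 2 for x = 0..19, each of shape (20, 1).
x_train = torch.arange(20, dtype=torch.float32).reshape(-1, 1)
y_train = 5 * x_train + 2

model = nn.Linear(1, 1)
loss_function = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

losses = []
for epoch in range(1, 501):
    optimizer.zero_grad()                          # reset accumulated gradients
    loss = loss_function(model(x_train), y_train)  # forward pass and loss
    loss.backward()                                # compute gradients
    optimizer.step()                               # update parameters
    losses.append(loss.item())

weight = model.weight.item()
bias = model.bias.item()
print("first loss {:.4f}, last loss {:.4f}".format(losses[0], losses[-1]))
print("weight {:.3f} (target 5), bias {:.3f} (target 2)".format(weight, bias))
```

With this learning rate the weight tends to approach 5 quickly while the bias moves toward 2 much more slowly, which may be why the loss in the output above levels off instead of dropping to zero.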