https://pytorch.org/

Install configuration: Linux, pip, Python 3.6, no CUDA (CPU-only wheels)

In [2]:
!pip install https://download.pytorch.org/whl/cpu/torch-1.1.0-cp36-cp36m-linux_x86_64.whl
!pip install https://download.pytorch.org/whl/cpu/torchvision-0.3.0-cp36-cp36m-linux_x86_64.whl

Collecting torch==1.1.0 from https://download.pytorch.org/whl/cpu/torch-1.1.0-cp36-cp36m-linux_x86_64.whl
  Downloading https://download.pytorch.org/whl/cpu/torch-1.1.0-cp36-cp36m-linux_x86_64.whl (101.2MB)
    100% |################################| 101.2MB 200kB/s eta 0:00:01
Collecting numpy (from torch==1.1.0)
  Downloading https://files.pythonhosted.org/packages/87/2d/e4656149cbadd3a8a0369fcd1a9c7d61cc7b87b3903b85389c70c989a696/numpy-1.16.4-cp36-cp36m-manylinux1_x86_64.whl (17.3MB)
    100% |################################| 17.3MB 949kB/s eta 0:00:01
tensorflow 1.13.1 requires tensorboard<1.14.0,>=1.13.0, which is not installed.
Installing collected packages: numpy, torch
Successfully installed numpy-1.16.4 torch-1.1.0
Collecting torchvision==0.3.0 from https://download.pytorch.org/whl/cpu/torchvision-0.3.0-cp36-cp36m-linux_x86_64.whl
  Downloading https://download.pytorch.org/whl/cpu/torchvision-0.3.0-cp36-cp36m-linux_x86_64.whl (2.5M

In [3]:
import torch # a Tensor library similar to NumPy, with strong GPU support
import torch.autograd as autograd # a "tape-based" automatic differentiation library
import torch.nn as nn # a neural network library with deep integrations with autograd
import torch.optim # an optimization package to be used with torch.nn with standard optimization methods such as SGD, RMSProp, LBFGS, Adam etc.

torch.manual_seed(123)

<torch._C.Generator at 0x7fea438b8c50>

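# Creating Tensors

Tensors can be created from Python lists with the ***torch.Tensor()*** function. It produces 32-bit floating point tensors by default, which is why the integer lists below print as `1., 2., 3.`.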

In [4]:
# Create a torch.Tensor object from python list
v = [1, 2, 3]
print(type(v))
v_tensor = torch.Tensor(v)
print(v_tensor)

<class 'list'>
tensor([1., 2., 3.])


In [5]:
# Create a torch.Tensor object of size 2x3 from 2x3 matrix
m2x3 = [[1, 2, 3], [4, 5, 6]]
m2x3_tensor = torch.Tensor(m2x3)
print(m2x3_tensor)

tensor([[1., 2., 3.],
        [4., 5., 6.]])


In [6]:
# Create a 3D torch.Tensor object of size 3x3x3.
m3x3x3 = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
          [[10, 11, 12],[13, 14, 15], [16, 17, 18]],
            [[19, 20, 21],[22, 23, 24], [25, 26, 27]]]
m3x3x3_tensor = torch.Tensor(m3x3x3)
print(m3x3x3_tensor)

tensor([[[ 1.,  2.,  3.],
         [ 4.,  5.,  6.],
         [ 7.,  8.,  9.]],

        [[10., 11., 12.],
         [13., 14., 15.],
         [16., 17., 18.]],

        [[19., 20., 21.],
         [22., 23., 24.],
         [25., 26., 27.]]])


In [7]:
# Create a 4D tensor of random data with the given dimensions (here 4x3x3x3) using torch.randn()
m4x3x3x3_tensor = torch.randn((4, 3, 3, 3))
m4x3x3x3_tensor.shape
print(m4x3x3x3_tensor)

tensor([[[[ 3.3737e-01, -1.7778e-01, -3.0353e-01],
          [-5.8801e-01,  3.4861e-01,  6.6034e-01],
          [-2.1964e-01, -3.7917e-01,  7.6711e-01]],

         [[-1.1925e+00,  6.9835e-01, -1.4097e+00],
          [ 1.7938e-01,  1.8951e+00,  4.9545e-01],
          [ 2.6920e-01, -7.7020e-02, -1.0205e+00]],

         [[-1.6896e-01,  9.1776e-01,  1.5810e+00],
          [ 1.3010e+00,  1.2753e+00, -2.0095e-01],
          [ 4.9647e-01, -1.5723e+00,  9.6657e-01]]],


        [[[-1.1481e+00, -1.1589e+00,  3.2547e-01],
          [-6.3151e-01, -2.8400e+00, -1.3250e+00],
          [ 1.7843e-01, -2.1338e+00,  1.0524e+00]],

         [[-3.8848e-01, -9.3435e-01, -4.9914e-01],
          [-1.0867e+00,  8.8054e-01,  1.5542e+00],
          [ 6.2662e-01, -1.7549e-01,  9.8284e-02]],

         [[-9.3507e-02,  2.6621e-01, -5.8504e-01],
          [ 8.7684e-01,  1.6221e+00, -1.4779e+00],
          [ 1.1331e+00, -1.2203e+00,  1.3139e+00]]],


        [[[ 1.0533e+00,  1.3881e-01,  2.2473e+00],
          [-8.0

## What is a multidimensional tensor?

Since we frequently deal with tensors of more than 3 dimensions, it is important to understand them well. The best way to think of an n-dimensional object (and a tensor in particular) is as a container that holds a series of (n-1)-dimensional objects "inside" it. We can "pull out" these "inner" objects by indexing into the higher-dimensional container. Let's have a look at some examples:

For a vector v (dim(v)=1), indexing into it ("pulling out of it") returns its "slice" - a scalar s (dim(s)=0).

For a matrix, indexing into it returns its "slice" - a (row or column) vector.

A 3D tensor can be seen as a cube or cuboid made of "stacked" matrices. So if we index into such a tensor, it gives us a slice that is a matrix!

We can't easily visualize 5D (or n-D) tensors, but the idea is the same. If we index into them, we pull out an object of dimension n-1.

E.g. a 4D tensor can be seen as a list of cubes or cuboids. If we index into a 4D tensor, we get a 3D tensor.

In [8]:
# Index into v_tensor and get a scalar
print(v_tensor[0])

# Index into m2x3_tensor and get a vector
print(m2x3_tensor[0])

# Index into m3x3x3_tensor and get a matrix
print(m3x3x3_tensor[0])

# Index into m4x3x3x3_tensor and get a 3D tensor of size 3x3x3
print(m4x3x3x3_tensor[0])

tensor(1.)
tensor([1., 2., 3.])
tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]])
tensor([[[ 0.3374, -0.1778, -0.3035],
         [-0.5880,  0.3486,  0.6603],
         [-0.2196, -0.3792,  0.7671]],

        [[-1.1925,  0.6984, -1.4097],
         [ 0.1794,  1.8951,  0.4954],
         [ 0.2692, -0.0770, -1.0205]],

        [[-0.1690,  0.9178,  1.5810],
         [ 1.3010,  1.2753, -0.2010],
         [ 0.4965, -1.5723,  0.9666]]])
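
The same "container" picture holds for the plain nested Python lists the first tensors were built from. A minimal sketch, assuming regularly shaped, non-empty lists; `ndim` is a hypothetical helper written for this illustration, not a PyTorch function:

```python
def ndim(obj):
    """Count the nesting depth of a regular nested list (0 for a scalar)."""
    depth = 0
    while isinstance(obj, list):
        depth += 1
        obj = obj[0]  # assumes non-empty, regularly shaped lists
    return depth


v = [1, 2, 3]
m2x3 = [[1, 2, 3], [4, 5, 6]]
m3x3x3 = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
          [[10, 11, 12], [13, 14, 15], [16, 17, 18]],
          [[19, 20, 21], [22, 23, 24], [25, 26, 27]]]

for name, obj in [("v", v), ("m2x3", m2x3), ("m3x3x3", m3x3x3)]:
    # Indexing once removes exactly one level of nesting
    print(name, ndim(obj), "->", ndim(obj[0]))
```

Each line prints the number of dimensions before and after indexing, and the second number is always one less than the first.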


## Operations with Tensors

You can operate on tensors in the ways you would expect. See the documentation http://pytorch.org/docs/torch.html for a complete list of operations.

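Simple mathematical operations: multiplying two vectors with torch.matmul, which for 1-D tensors is the dot product (element-wise addition and multiplication are written `x + y` and `x * y`)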

In [9]:
x = torch.Tensor([1, 2, 3])
y = torch.Tensor([4, 5, 6])
print(x)
print(y)

w = torch.matmul(x, y)
print(w)

tensor([1., 2., 3.])
tensor([4., 5., 6.])
tensor(32.)
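
For two 1-D tensors, `torch.matmul` computes the dot product, which is why the result above is a single number rather than a vector. A plain-Python sketch of that arithmetic next to the element-wise operations; the names `add`, `mul` and `dot` are illustrative variables, not PyTorch calls:

```python
x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]

# Element-wise operations keep the shape of the inputs
add = [a + b for a, b in zip(x, y)]
mul = [a * b for a, b in zip(x, y)]

# The dot product multiplies element-wise and then sums to one scalar
dot = sum(a * b for a, b in zip(x, y))

print(add)  # [5.0, 7.0, 9.0]
print(mul)  # [4.0, 10.0, 18.0]
print(dot)  # 32.0
```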


Helpful operation: **Concatenation**

In [10]:
# By default, torch.cat concatenates along axis 0 (rows): it "stacks" the rows of its inputs.

x_1 = torch.randn(2, 5)
print(x_1)
y_1 = torch.randn(3, 5)
print(y_1)
z_1 = torch.cat([x_1, y_1])
print(z_1)

tensor([[ 0.5485, -1.6063,  0.7281,  0.6609,  0.2391],
        [ 0.0340,  0.1164, -0.9905,  0.5646,  0.0686]])
tensor([[-1.0035, -0.7874,  0.9840,  0.2045, -0.3604],
        [ 1.2101, -1.0814,  0.0789,  0.2913, -0.5023],
        [-0.9306,  0.9086, -0.7788, -1.4453,  0.7636]])
tensor([[ 0.5485, -1.6063,  0.7281,  0.6609,  0.2391],
        [ 0.0340,  0.1164, -0.9905,  0.5646,  0.0686],
        [-1.0035, -0.7874,  0.9840,  0.2045, -0.3604],
        [ 1.2101, -1.0814,  0.0789,  0.2913, -0.5023],
        [-0.9306,  0.9086, -0.7788, -1.4453,  0.7636]])


In [11]:
# Concatenate columns:
x_2 = torch.randn(2, 3)
print(x_2)
y_2 = torch.randn(2, 5)
print(y_2)
# The second argument specifies the axis to concatenate along. Here we select 1 (columns), so the columns are attached side by side.
z_2 = torch.cat([x_2, y_2], 1)
print(z_2)

tensor([[-0.2469,  0.5857,  0.9906],
        [ 0.0417, -1.1668,  1.3251]])
tensor([[-0.7990,  0.6292, -1.2097, -2.1362, -0.1212],
        [-0.1443,  0.9969,  0.5697, -0.4930,  0.3155]])
tensor([[-0.2469,  0.5857,  0.9906, -0.7990,  0.6292, -1.2097, -2.1362, -0.1212],
        [ 0.0417, -1.1668,  1.3251, -0.1443,  0.9969,  0.5697, -0.4930,  0.3155]])


In [12]:
# If your tensors are not compatible, torch will complain. Uncomment to see the error:
# x_1 is 2x5 and x_2 is 2x3, so they cannot be stacked along axis 0
#torch.cat([x_1, x_2])
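
The compatibility rule for concatenation can be stated precisely: all inputs must agree on every axis except the one being concatenated along. A minimal sketch in plain Python that works on shape tuples only; `cat_shape` is a hypothetical helper for illustration, not a PyTorch function, and its error messages are not the ones PyTorch prints:

```python
def cat_shape(shapes, dim=0):
    """Return the shape that results from concatenating along `dim`.

    Raises ValueError when the shapes disagree on any other axis.
    """
    first = shapes[0]
    total = 0
    for shape in shapes:
        if len(shape) != len(first):
            raise ValueError("all inputs need the same number of dimensions")
        for axis, (a, b) in enumerate(zip(first, shape)):
            if axis != dim and a != b:
                raise ValueError(f"sizes differ on axis {axis}: {a} vs {b}")
        total += shape[dim]
    result = list(first)
    result[dim] = total
    return tuple(result)


print(cat_shape([(2, 5), (3, 5)]))         # same shapes as x_1 and y_1
print(cat_shape([(2, 3), (2, 5)], dim=1))  # same shapes as x_2 and y_2
try:
    cat_shape([(2, 5), (2, 3)])            # 5 columns vs 3 columns
except ValueError as err:
    print("error:", err)
```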

## Reshaping Tensors

We can use the .view() method to reshape a tensor. Often we will need to reshape our data before passing it to a neural network.

Let's assume we have 64000 RGB images of 28x28 pixels. We can define a tensor of shape (64000, 3, 28, 28) to hold them, where 3 is the number of color channels:

In [13]:
x = torch.randn(64000, 3, 28, 28)
# Now we want to add a leading dimension of size 32. Passing -1 lets PyTorch infer the second dimension (64000 / 32 = 2000):
x_reshaped = x.view(32, -1, 3, 28, 28)
print(x_reshaped.shape)

torch.Size([32, 2000, 3, 28, 28])
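
The `-1` works because the total number of elements must stay the same, so one missing dimension is fully determined by the others. A minimal sketch of that arithmetic in plain Python; `infer_shape` is a hypothetical helper written for this illustration and does not reproduce PyTorch's own checks or error messages:

```python
def infer_shape(numel, shape):
    """Resolve a single -1 in `shape` so the element count equals `numel`."""
    if shape.count(-1) > 1:
        raise ValueError("only one dimension can be inferred")
    known = 1
    for d in shape:
        if d != -1:
            known *= d
    if -1 not in shape:
        if known != numel:
            raise ValueError(f"shape {shape} is invalid for {numel} elements")
        return tuple(shape)
    if numel % known != 0:
        raise ValueError(f"shape {shape} is invalid for {numel} elements")
    # The inferred size is whatever is left after dividing by the known sizes
    return tuple(numel // known if d == -1 else d for d in shape)


numel = 64000 * 3 * 28 * 28
print(infer_shape(numel, (32, -1, 3, 28, 28)))  # (32, 2000, 3, 28, 28)
```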


## Computation Graphs and Automatic Differentiation

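A computation graph records which parameters and which operations were involved in computing an output.

The autograd.Variable class keeps track of how it was created. Note that since PyTorch 0.4 Variable has been merged into Tensor: the wrapper still works in the version 1.1.0 installed above, but it simply returns a tensor, and a tensor created with requires_grad=True behaves the same way.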

In [14]:
# Variables wrap tensor objects
x = autograd.Variable(torch.Tensor([1, 2, 3]), requires_grad=True)
# You can access the data with the .data attribute
print(x.data)

y = autograd.Variable(torch.Tensor([4, 5, 6]), requires_grad=True)
print(y.data)

# With autograd.Variable you can also perform all the same operations you did with tensors
z = x + y
print(z.data)

tensor([1., 2., 3.])
tensor([4., 5., 6.])
tensor([5., 7., 9.])


In [15]:
# z also knows that it is the result of adding x and y (AddBackward)
operation = z.grad_fn
print(operation)

<AddBackward0 object at 0x7fea32240a20>


The autograd.Variable knows which operation has created it. But how does that help compute a gradient?

In [16]:
# Let's sum up all the entries in z
s = z.sum()
print(s)
print(s.grad_fn)

tensor(21., grad_fn=<SumBackward0>)
<SumBackward0 object at 0x7fea32240d68>


## Gradient

So now, what is the derivative of this sum with respect to the first component of x? Remember that x is a tensor of 3 elements: $x = (x_0, x_1, x_2)$

In math, we want a partial derivative of $s$ with respect to $x_0$: $\frac{\partial s}{\partial x_0}$

Well, $s$ knows that it was created as a $sum$ of the tensor $z$ elements $(z_0, z_1, z_2)$. $z$ knows that it was the sum $x + y$. So

$$\begin{align}s = \overbrace{x_0 + y_0}^\text{$z_0$} + \overbrace{x_1 + y_1}^\text{$z_1$} + \overbrace{x_2 + y_2}^\text{$z_2$}\end{align}$$
And so $s$ contains enough information to determine that the derivative of $s$ with respect to $x_0$ is 1!

Reminder: when you compute the partial derivative with respect to one variable, you treat all other variables as constants. The terms $x_1, x_2, y_0, y_1, y_2$ therefore contribute zero, and the derivative of $f(x_0) = x_0$ is 1.
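
The claim that the derivative is 1 can also be checked numerically, without any autograd machinery. A minimal sketch using a central finite difference; the helper names `s` and `partial` and the step size `eps` are choices made for this illustration:

```python
def s(x, y):
    """s = sum(z) with z = x + y, mirroring the cells above."""
    return sum(a + b for a, b in zip(x, y))


def partial(f, x, y, i, eps=1e-3):
    """Estimate df/dx_i by nudging only x_i and holding everything else fixed."""
    x_plus = list(x)
    x_plus[i] += eps
    x_minus = list(x)
    x_minus[i] -= eps
    return (f(x_plus, y) - f(x_minus, y)) / (2 * eps)


x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]
for i in range(3):
    # Each estimate should be very close to 1
    print(i, round(partial(s, x, y, i), 6))
```

Because $s$ is linear in every $x_i$, the estimate is exact up to floating point rounding.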

First we need to run backpropagation, which calculates the gradient with respect to every variable that requires one. Note: if you run backward multiple times, the gradients add up. That is because PyTorch accumulates the gradient into the .grad property, which is very convenient for many models. Let's now have PyTorch compute the gradient and check that our guess of 1 was right:

In [17]:
# calling .backward() on any variable will run backprop, starting from it.
s.backward(retain_graph=True)

In [18]:
print(x)
print(x.grad)
print(y.grad)

tensor([1., 2., 3.], requires_grad=True)
tensor([1., 1., 1.])
tensor([1., 1., 1.])


In [19]:
# every time you call backward, the gradient is accumulated
s.backward(retain_graph=True)
print(x)
print(x.grad)
print(y.grad)

tensor([1., 2., 3.], requires_grad=True)
tensor([2., 2., 2.])
tensor([2., 2., 2.])


In [20]:
s.backward(retain_graph=True)
print(x)
print(x.grad)
print(y.grad)

tensor([1., 2., 3.], requires_grad=True)
tensor([3., 3., 3.])
tensor([3., 3., 3.])
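
The accumulation seen above (1, then 2, then 3) follows from how a tape-based system writes gradients: it adds to `.grad` instead of overwriting it. A toy sketch of that idea in plain Python; the `Value` class is hypothetical, supports only addition, and propagates path by path, which is enough for this small example but is not how PyTorch orders the work internally:

```python
class Value:
    """A scalar that remembers how it was created (toy example, not PyTorch)."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents  # pairs of (parent Value, local derivative)

    def __add__(self, other):
        # d(a + b)/da = 1 and d(a + b)/db = 1
        return Value(self.data + other.data, ((self, 1.0), (other, 1.0)))

    def backward(self, upstream=1.0):
        # Accumulate with +=, never overwrite
        self.grad += upstream
        for parent, local in self.parents:
            parent.backward(upstream * local)

    def zero_grad(self):
        # Reset this node and everything it was computed from
        self.grad = 0.0
        for parent, _ in self.parents:
            parent.zero_grad()


x = [Value(1.0), Value(2.0), Value(3.0)]
y = [Value(4.0), Value(5.0), Value(6.0)]
z = [a + b for a, b in zip(x, y)]
s = z[0] + z[1] + z[2]

for _ in range(3):
    s.backward()
    print([v.grad for v in x])  # grows by 1 on every call

s.zero_grad()
print([v.grad for v in x])      # back to zeros
```

This is also why a training loop resets the gradients before each backward pass: without the reset, every step would use the sum of all gradients computed so far.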


## How NOT to break the computational graph

Let's create two torch tensors and add them up:

In [21]:
x = torch.randn((2, 2))
y = torch.randn((2, 2))
z = x + y  # These are plain tensors without requires_grad, so backprop through z is not possible

print(z)

tensor([[-0.8157,  1.5286],
        [-1.1737, -0.5775]])


Now we wrap the torch tensors in autograd.Variable. The result var_z then contains the information needed for backpropagation:

In [22]:
var_x = autograd.Variable(x, requires_grad=True)
var_y = autograd.Variable(y, requires_grad=True)
# var_z contains enough information to compute gradients, as we saw above
var_z = var_x + var_y
print(var_z.grad_fn)

<AddBackward0 object at 0x7fea321d9278>


But what happens if we extract the wrapped tensor object out of var_z and re-wrap the tensor in a new autograd.Variable?

In [23]:
var_z_data = var_z.data
new_var_z = autograd.Variable(var_z_data)
print(new_var_z.grad_fn)

None


The chain of operations no longer exists: we extracted only the data, so the history of how var_z was computed is lost. If we now try to call backward on new_var_z, it will throw an error:

In [26]:
#new_var_z.backward(retain_graph=True)

# CUDA

Check whether GPU acceleration with CUDA is available

In [25]:
# let us run this cell only if CUDA is available
if torch.cuda.is_available():
    # creates a LongTensor and transfers it
    # to GPU as torch.cuda.LongTensor
    a = torch.LongTensor(10).fill_(3).cuda() #remove .cuda() to make it use cpu
    print(type(a))
    # transfers it to CPU, back to
    # being a torch.LongTensor
    b = a.cpu()

# Linear Model

In [27]:
import torch
import torch.nn as nn
from torch.autograd import Variable
import numpy as np

In [28]:
x = [i for i in range(20)] # list comprehension
x_train = np.array(x, dtype=np.float32)
x_train = x_train.reshape(-1, 1)
print(x)
print(x_train.shape)

y = [(5*i + 2) for i in x] # list comprehension
y_train = np.array(y, dtype=np.float32)
y_train = y_train.reshape(-1, 1)
print(y)
print(y_train.shape)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
(20, 1)
[2, 7, 12, 17, 22, 27, 32, 37, 42, 47, 52, 57, 62, 67, 72, 77, 82, 87, 92, 97]
(20, 1)


# Create Model Class

In [29]:
class LinearRegressor(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(LinearRegressor, self).__init__()
        self.linear = nn.Linear(input_dim, output_dim)  
    
    def forward(self, x):
        out = self.linear(x)
        return out

input_dim = 1
output_dim = 1

model = LinearRegressor(input_dim, output_dim)

model

LinearRegressor(
  (linear): Linear(in_features=1, out_features=1, bias=True)
)

# Loss & Optimizer

In [34]:
loss_function = nn.MSELoss()

optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
optimizer

SGD (
Parameter Group 0
    dampening: 0
    lr: 0.001
    momentum: 0
    nesterov: False
    weight_decay: 0
)

In [44]:
epochs = 10000

for epoch in range(epochs):
    epoch += 1
    # Convert inputs and outputs to torch Variables
    inputs = Variable(torch.from_numpy(x_train))
    
    real_outputs = Variable(torch.from_numpy(y_train))
    
    # Reset Gradients
    optimizer.zero_grad()
    
    # Forward - compute the output
    pred_outputs = model(inputs)
    
    # Loss
    loss = loss_function(pred_outputs, real_outputs)
    
    # Backward - compute gradients
    loss.backward()
    
    # Update parameters
    optimizer.step()
    
    print('epoch {}, loss {}'.format(epoch, loss.data))

epoch 1, loss 0.34414684772491455
epoch 2, loss 0.34377866983413696
epoch 3, loss 0.3434096574783325
epoch 4, loss 0.34304389357566833
epoch 5, loss 0.3426758348941803
epoch 6, loss 0.3423088490962982
epoch 7, loss 0.3419426381587982
epoch 8, loss 0.34157654643058777
epoch 9, loss 0.34121087193489075
epoch 10, loss 0.3408455550670624
epoch 11, loss 0.3404814302921295
epoch 12, loss 0.3401169180870056
epoch 13, loss 0.33975279331207275
epoch 14, loss 0.33938971161842346
epoch 15, loss 0.33902493119239807
epoch 16, loss 0.3386630117893219
epoch 17, loss 0.3383008539676666
epoch 18, loss 0.3379392921924591
epoch 19, loss 0.33757686614990234
epoch 20, loss 0.33721596002578735
epoch 21, loss 0.336854487657547
epoch 22, loss 0.33649441599845886
epoch 23, loss 0.3361341655254364
epoch 24, loss 0.3357733488082886
epoch 25, loss 0.3354148268699646
epoch 26, loss 0.3350555896759033
epoch 27, loss 0.33469632267951965
epoch 28, loss 0.3343394696712494
epoch 29, loss 0.3339814245700836
epoch 30, lo

In [45]:
loss.data

tensor(7.6999e-06)

In [46]:
print(pred_outputs, real_outputs)

tensor([[ 1.9947],
        [ 6.9951],
        [11.9955],
        [16.9959],
        [21.9963],
        [26.9967],
        [31.9971],
        [36.9975],
        [41.9980],
        [46.9984],
        [51.9988],
        [56.9992],
        [61.9996],
        [67.0000],
        [72.0004],
        [77.0008],
        [82.0013],
        [87.0017],
        [92.0021],
        [97.0025]], grad_fn=<AddmmBackward>) tensor([[ 2.],
        [ 7.],
        [12.],
        [17.],
        [22.],
        [27.],
        [32.],
        [37.],
        [42.],
        [47.],
        [52.],
        [57.],
        [62.],
        [67.],
        [72.],
        [77.],
        [82.],
        [87.],
        [92.],
        [97.]])
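
The training loop above (reset gradients, forward, loss, backward, update) can be sketched in plain Python to show what each step computes for this one-weight, one-bias model. This is a sketch under stated assumptions, not what `nn.Linear` and `torch.optim.SGD` do internally: the starting point `w = 0, b = 0` is chosen here for reproducibility (nn.Linear initializes randomly), and the gradients of the mean squared error are written out by hand:

```python
# Same data as above: y = 5x + 2 for x = 0..19
xs = [float(i) for i in range(20)]
ys = [5 * x + 2 for x in xs]

w, b = 0.0, 0.0  # assumed starting point for this sketch
lr = 0.001       # same learning rate as the optimizer above
n = len(xs)


def mse(w, b):
    """Mean squared error of the line w*x + b on the training data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n


initial_loss = mse(w, b)
for epoch in range(10000):
    # Forward: residuals of the current predictions
    errors = [(w * x + b) - y for x, y in zip(xs, ys)]
    # Backward: gradients of the mean squared error w.r.t. w and b
    grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
    grad_b = 2 * sum(errors) / n
    # Update: one gradient descent step on the full data set
    w -= lr * grad_w
    b -= lr * grad_b

final_loss = mse(w, b)
print(w, b, final_loss)
```

The gradients are recomputed from scratch in every iteration, so no explicit reset is needed here. The weight approaches 5 quickly, while the bias approaches 2 more slowly, which matches the small remaining offsets in the predictions printed above.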
