
Introduction to PyTorch
============

PyTorch's tensor library
------------------------------------

Most PyTorch operations run on <b>tensors</b>.
A tensor is a multidimensional array.
Let's have a look at some basic tensor operations.
But first, let's import some important PyTorch libraries:
- <b>torch</b> - a Tensor library similar to NumPy, with strong GPU support
- <b>torch.autograd</b> - a "tape-based" automatic differentiation library (more on this later)
- <b>torch.nn</b> - a neural networks library deeply integrated with autograd 
- <b>torch.optim</b> - an optimization package to be used with torch.nn with standard optimization methods such as SGD, RMSProp, LBFGS, Adam etc.


We also set a seed to be able to reproduce the same results later.

In [1]:
import torch
import torch.autograd as autograd
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(123)

<torch._C.Generator at 0x7fc863709e10>
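
To illustrate why the seed matters, here is a minimal sketch (the names `first` and `second` are ours, not part of the notebook). It resets the generator twice with the same seed and checks that the same random numbers come out. This is expected to hold for the CPU generator within one PyTorch installation; identical values across versions or devices are not guaranteed.

```python
import torch

# Seed the global generator and draw three random numbers
torch.manual_seed(123)
first = torch.randn(3)

# Re-seeding resets the generator to the same state,
# so the next draw repeats the previous one
torch.manual_seed(123)
second = torch.randn(3)

print(torch.equal(first, second))  # True
```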

Creating Tensors
----------------

Tensors can be created from Python lists with the <b>torch.tensor()</b> function.

In [2]:
# Create a torch.Tensor object from python list
v=[1,2,3]
print(type(v))

v_tensor=torch.tensor(v)
print(type(v_tensor))
print(v_tensor)

<class 'list'>
<class 'torch.Tensor'>
tensor([1, 2, 3])
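
One detail worth noting here is the element type. The sketch below (variable names are illustrative) compares <code>torch.tensor()</code>, which infers the dtype from the data, with the <code>torch.Tensor()</code> constructor, which uses the default floating-point type. This would explain why the autograd cells later in this notebook print values such as <code>1.</code> instead of <code>1</code>.

```python
import torch

v = [1, 2, 3]

# torch.tensor() infers the dtype from the Python integers
inferred = torch.tensor(v)

# The torch.Tensor() constructor uses the default floating-point type
as_float = torch.Tensor(v)

# The dtype can also be requested explicitly
explicit = torch.tensor(v, dtype=torch.float32)

print(inferred.dtype)  # torch.int64
print(as_float.dtype)  # torch.float32
print(explicit.dtype)  # torch.float32
```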


In [3]:
# Create a torch.Tensor object of size 2x3 from 2x3 matrix
m2x3=[[1,2,3], [4,5,6]]
m2x3_tensor=torch.tensor(m2x3)
print(m2x3_tensor)

tensor([[1, 2, 3],
        [4, 5, 6]])


In [4]:
# Create a 3D torch.Tensor object of size 3x3x3.
m3x3x3=[[[1,2,3],[4,5,6],[7,8,9]],
       [[10,11,12],[13,14,15],[16,17,18]],
       [[19,20,21],[22,23,24],[25,26,27]]]
m3x3x3_tensor=torch.tensor(m3x3x3)
print(m3x3x3_tensor)

tensor([[[ 1,  2,  3],
         [ 4,  5,  6],
         [ 7,  8,  9]],

        [[10, 11, 12],
         [13, 14, 15],
         [16, 17, 18]],

        [[19, 20, 21],
         [22, 23, 24],
         [25, 26, 27]]])


In [5]:
# Create a 4D tensor from random data with given dimensions (in this case 4x3x3x3) using torch.randn()
m4x3x3x3_tensor=torch.randn(4,3,3,3)
m4x3x3x3_tensor.shape
print(m4x3x3x3_tensor)

tensor([[[[ 3.3737e-01, -1.7778e-01, -3.0353e-01],
          [-5.8801e-01,  3.4861e-01,  6.6034e-01],
          [-2.1964e-01, -3.7917e-01,  7.6711e-01]],

         [[-1.1925e+00,  6.9835e-01, -1.4097e+00],
          [ 1.7938e-01,  1.8951e+00,  4.9545e-01],
          [ 2.6920e-01, -7.7020e-02, -1.0205e+00]],

         [[-1.6896e-01,  9.1776e-01,  1.5810e+00],
          [ 1.3010e+00,  1.2753e+00, -2.0095e-01],
          [ 4.9647e-01, -1.5723e+00,  9.6657e-01]]],


        [[[-1.1481e+00, -1.1589e+00,  3.2547e-01],
          [-6.3151e-01, -2.8400e+00, -1.3250e+00],
          [ 1.7843e-01, -2.1338e+00,  1.0524e+00]],

         [[-3.8848e-01, -9.3435e-01, -4.9914e-01],
          [-1.0867e+00,  8.8054e-01,  1.5542e+00],
          [ 6.2662e-01, -1.7549e-01,  9.8284e-02]],

         [[-9.3507e-02,  2.6621e-01, -5.8504e-01],
          [ 8.7684e-01,  1.6221e+00, -1.4779e+00],
          [ 1.1331e+00, -1.2203e+00,  1.3139e+00]]],


        [[[ 1.0533e+00,  1.3881e-01,  2.2473e+00],
          [-8.0

What is a multidimensional tensor?
-------------------
Since we frequently deal with tensors of more than three dimensions, understanding them is very important.
The best way to think of an n-dimensional object (and a tensor in particular) is as a container that holds a series of (n-1)-dimensional objects "inside" it. We can "pull out" these "inner" objects by indexing into the higher-dimensional container.
Let's have a look at some examples:

- For a vector v (dim(v)=1), indexing into it ("pulling out of it") returns its "slice" - a scalar s (dim(s)=0).

- For a matrix, indexing into it returns its "slice" - a (row or column) vector.

- A 3D tensor can be seen as a cube or cuboid made of "stacked" matrices. So if we index into such a tensor, we get a slice that is a matrix!

- We can't easily visualize 5D (or n-D) tensors, but the idea is the same. If we index into them, we pull out an object of dimension n-1.

- E.g. a 4D tensor can be seen as a list of cubes or cuboids. If we index into a 4D tensor, we get a cuboid (a 3D tensor).

![IMG_0120.png](attachment:IMG_0120.png)

In [6]:
# Index into v_tensor and get a scalar
print(v_tensor[0])

# Index into m2x3_tensor and get a vector
print(m2x3_tensor[0])

# Index into m3x3x3_tensor and get a matrix
print(m3x3x3_tensor[0])

# Index into m4x3x3x3_tensor and get a 3D tensor of size 3x3x3
print(m4x3x3x3_tensor[0])

tensor(1)
tensor([1, 2, 3])
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
tensor([[[ 0.3374, -0.1778, -0.3035],
         [-0.5880,  0.3486,  0.6603],
         [-0.2196, -0.3792,  0.7671]],

        [[-1.1925,  0.6984, -1.4097],
         [ 0.1794,  1.8951,  0.4954],
         [ 0.2692, -0.0770, -1.0205]],

        [[-0.1690,  0.9178,  1.5810],
         [ 1.3010,  1.2753, -0.2010],
         [ 0.4965, -1.5723,  0.9666]]])
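
The "container" view above can be checked directly. In this small sketch (the tensor <code>t</code> and the list <code>dims</code> are illustrative names), we index repeatedly into a 4D tensor and record how the number of dimensions drops by one each time.

```python
import torch

t = torch.zeros(4, 3, 3, 3)

# Index with [0] repeatedly and record the number of dimensions at each step
sub = t
dims = [sub.dim()]
while sub.dim() > 0:
    sub = sub[0]
    dims.append(sub.dim())

print(dims)        # [4, 3, 2, 1, 0]
print(t[0].shape)  # torch.Size([3, 3, 3])
```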


Operations with Tensors
----------------------

You can operate on tensors in the ways you would expect.
See the documentation <http://pytorch.org/docs/torch.html> for a complete list of operations.

Simple mathematical operations: <b>Addition, Multiplication</b>
                                     

In [7]:
x=torch.tensor([1,2,3])
y=torch.tensor([4,5,6])

print(x)
print(y)

w=torch.matmul(x,y)
print(w)

tensor([1, 2, 3])
tensor([4, 5, 6])
tensor(32)
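
The cell above uses <code>torch.matmul</code>, which for two 1D tensors computes the dot product. For comparison, here is a short sketch of element-wise addition and multiplication on the same values (the result names are ours):

```python
import torch

x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5, 6])

# Element-wise addition and multiplication keep the shape of the inputs
added = x + y
multiplied = x * y

# For 1D tensors, matmul is the dot product: the sum of the element-wise products
dot = torch.matmul(x, y)

print(added)       # tensor([5, 7, 9])
print(multiplied)  # tensor([ 4, 10, 18])
print(dot)         # tensor(32)
```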


Helpful operation: <b>Concatenation</b>

In [8]:
# By default, torch.cat concatenates along axis 0 (rows), i.e. it "stacks" the rows.
x_1=torch.randn(2,5)
print(x_1)

y_1=torch.randn(3,5)
print(y_1)

z_1=torch.cat([x_1,y_1])
print(z_1)

tensor([[ 0.5485, -1.6063,  0.7281,  0.6609,  0.2391],
        [ 0.0340,  0.1164, -0.9905,  0.5646,  0.0686]])
tensor([[-1.0035, -0.7874,  0.9840,  0.2045, -0.3604],
        [ 1.2101, -1.0814,  0.0789,  0.2913, -0.5023],
        [-0.9306,  0.9086, -0.7788, -1.4453,  0.7636]])
tensor([[ 0.5485, -1.6063,  0.7281,  0.6609,  0.2391],
        [ 0.0340,  0.1164, -0.9905,  0.5646,  0.0686],
        [-1.0035, -0.7874,  0.9840,  0.2045, -0.3604],
        [ 1.2101, -1.0814,  0.0789,  0.2913, -0.5023],
        [-0.9306,  0.9086, -0.7788, -1.4453,  0.7636]])


In [9]:
# Concatenate columns:
x_2=torch.randn(3,3)
print(x_2)

y_2=torch.randn(3,5)
print(y_2)

# second arg specifies which axis to concat along. Here we select 1 (columns). It's attaching the columns.
z_2 = torch.cat([x_2, y_2], 1)
print(z_2)

tensor([[-0.2469,  0.5857,  0.9906],
        [ 0.0417, -1.1668,  1.3251],
        [-0.7990,  0.6292, -1.2097]])
tensor([[-2.1362, -0.1212, -0.1443,  0.9969,  0.5697],
        [-0.4930,  0.3155, -0.2275, -1.7942,  1.0417],
        [-0.2358, -0.3030,  0.4934, -0.2766,  0.2439]])
tensor([[-0.2469,  0.5857,  0.9906, -2.1362, -0.1212, -0.1443,  0.9969,  0.5697],
        [ 0.0417, -1.1668,  1.3251, -0.4930,  0.3155, -0.2275, -1.7942,  1.0417],
        [-0.7990,  0.6292, -1.2097, -0.2358, -0.3030,  0.4934, -0.2766,  0.2439]])
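
The shape rule behind both cells: all dimensions except the one being concatenated along must match. The sketch below uses small placeholder tensors (names are illustrative) to show the resulting shapes and what happens when the rule is violated.

```python
import torch

a = torch.zeros(2, 5)
b = torch.ones(3, 5)
c = torch.zeros(3, 3)
d = torch.ones(3, 5)

# Along dim 0 the column counts must agree: (2, 5) + (3, 5) -> (5, 5)
rows = torch.cat([a, b], dim=0)

# Along dim 1 the row counts must agree: (3, 3) + (3, 5) -> (3, 8)
cols = torch.cat([c, d], dim=1)

print(rows.shape)  # torch.Size([5, 5])
print(cols.shape)  # torch.Size([3, 8])

# (2, 5) and (3, 3) differ in the non-concatenated dimension, so this fails
try:
    torch.cat([a, c], dim=0)
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True

print(mismatch_raised)  # True
```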


Reshaping Tensors
----------------

We can use the <code>.view()</code> method to reshape a tensor. Often we will need to reshape our data before passing it
to a neural network.

Let's assume we have 64000 RGB images with a size of 28x28 pixels.
We can define a tensor of shape (64000, 3, 28, 28) to hold them, where 3 is the number of color channels:

In [10]:
x=torch.randn(64000,3,28,28)

# Now we group the images by adding a leading dimension of size 32. Placing -1 lets PyTorch infer the second dimension (64000 / 32 = 2000):
x_reshaped=x.view(32,-1,3,28,28)

print(x_reshaped.shape)

torch.Size([32, 2000, 3, 28, 28])
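
To see how <code>-1</code> is resolved without allocating 64000 images, here is a smaller sketch (the names <code>x_small</code> and <code>grouped</code> are ours). It also shows that <code>.view()</code> returns a view onto the same memory rather than a copy.

```python
import torch

x_small = torch.arange(24)

# -1 is inferred from the total element count: 24 / (2 * 4) = 3
grouped = x_small.view(2, -1, 4)
print(grouped.shape)  # torch.Size([2, 3, 4])

# A view shares storage with the original tensor,
# so writing through the view changes the original as well
grouped[0, 0, 0] = 100
print(x_small[0].item())  # 100
```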


Computation Graphs and Automatic Differentiation
---------------------------------------------

A computation graph specifies which parameters and which operations are involved in computing the output.

The fundamental PyTorch class <code>autograd.Variable</code> keeps track of how it was created.

In [11]:
# Variables wrap tensor objects
x = autograd.Variable(torch.Tensor([1, 2, 3]), requires_grad=True)

# You can access the data with the .data attribute
print(x.data)

y=autograd.Variable(torch.Tensor([4,5,6]), requires_grad=True)

# With autograd.Variable you can also perform all the same operations you did with tensors
z=x+y
print(z.data)

tensor([1., 2., 3.])
tensor([5., 7., 9.])


In [12]:
# z also knows that it is the result of adding x and y (AddBackward)
operation = z.grad_fn
print(operation)

<AddBackward0 object at 0x7fc81e491c30>
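
Since PyTorch 0.4, <code>Variable</code> has been merged into <code>Tensor</code>, so the same graph can be built without the wrapper. The following is a minimal sketch of that newer style, assuming PyTorch 0.4 or later; it is an alternative to the cells above, not a replacement for them.

```python
import torch

# requires_grad=True asks autograd to record operations on these tensors
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = torch.tensor([4.0, 5.0, 6.0], requires_grad=True)

z = x + y

# Tensors created by the user are leaves and have no grad_fn
print(x.is_leaf, x.grad_fn)  # True None

# z was produced by an operation, so it carries a grad_fn and requires grad
print(z.requires_grad)       # True
print(z.grad_fn is not None) # True
```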


The autograd.Variable knows which operation has created it. But how does that help <b>compute a gradient</b>?

In [13]:
# Let's sum up all the entries in z
s=z.sum()
print(s)

print(s.grad_fn)

tensor(21., grad_fn=<SumBackward0>)
<SumBackward0 object at 0x7fc81e65bfd0>


Gradient
-------
So now, what is the derivative of this sum with respect to the first component of x? Remember that x is a tensor of 3 elements: $x = (x_0, x_1, x_2)$

In math, we want a partial derivative of $s$ with respect to $x_0$: $\frac{\partial s}{\partial x_0}$

Well, $s$ knows that it was created as a $sum$ of the tensor $z$ elements $(z_0, z_1, z_2)$. $z$ knows
that it was the sum $x + y$. So

\begin{align}s = \overbrace{x_0 + y_0}^\text{$z_0$} + \overbrace{x_1 + y_1}^\text{$z_1$} + \overbrace{x_2 + y_2}^\text{$z_2$}\end{align}

And so $s$ contains enough information to determine that the derivative of $s$ with respect to $x_0$ is 1!

*Reminder:* If you compute the partial derivative with respect to one variable, you treat all other variables as constants. The terms $x_1, x_2, y_0, y_1, y_2$ therefore contribute zero, and the derivative of $f(x_0) = x_0$ is 1.

First we need to run <b>backpropagation</b> and calculate gradients with respect to every variable.
*Note:* if you run <code>backward</code> multiple times, the gradients add up.
That is because PyTorch *accumulates* the gradient into the <b>.grad
property</b>, since for many models this is very convenient.
Let's now have PyTorch compute the gradient and check that our guess of 1 was right:

In [14]:
# calling .backward() on any variable will run backprop, starting from it.
s.backward(retain_graph=True)

In [15]:
print(x)
print(y)
print(x.grad)
print(y.grad)

tensor([1., 2., 3.], requires_grad=True)
tensor([4., 5., 6.], requires_grad=True)
tensor([1., 1., 1.])
tensor([1., 1., 1.])
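
The accumulation mentioned in the note can be observed directly. This sketch rebuilds the same small graph with plain tensors (an assumption: PyTorch 0.4 or later, where <code>requires_grad</code> can be set on tensors), calls <code>backward</code> twice, and then resets the gradient by hand.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = torch.tensor([4.0, 5.0, 6.0], requires_grad=True)
s = (x + y).sum()

# First backward pass: ds/dx_i = 1 for every component
s.backward(retain_graph=True)
after_first = x.grad.clone()

# Second backward pass: the new gradient is added to the stored one
s.backward()
after_second = x.grad.clone()

# Gradients stay in .grad until they are reset explicitly
x.grad.zero_()

print(after_first)   # tensor([1., 1., 1.])
print(after_second)  # tensor([2., 2., 2.])
print(x.grad)        # tensor([0., 0., 0.])
```

This is the reason the training loop later in this notebook calls <code>optimizer.zero_grad()</code> at the start of every iteration.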


How NOT to break the computational graph
----------------------------------

Let's create two torch tensors and add them up:

In [16]:
x=torch.randn((2,2))
y=torch.randn((2,2))

print(x)
print(y)

# These are plain tensors without requires_grad, so backprop through z is not possible
z=x+y

print(z)

tensor([[-1.2116, -0.7564],
        [-0.3584, -0.9658]])
tensor([[ 1.0298,  0.3542],
        [ 0.0929, -0.5416]])
tensor([[-0.1818, -0.4022],
        [-0.2654, -1.5074]])


Now we wrap the torch tensors in <code>autograd.Variable</code>. The <code>var_z</code> contains the information for backpropagation:

In [17]:
var_x=autograd.Variable(x, requires_grad=True)
var_y=autograd.Variable(y, requires_grad=True)

# var_z contains enough information to compute gradients, as we saw above
var_z=var_x+var_y
print(var_z.grad_fn)

<AddBackward0 object at 0x7fc81e490970>


But what happens if we extract the wrapped tensor object out of <code>var_z</code> and re-wrap the tensor in a new <code>autograd.Variable</code>?

In [18]:
var_z_data=var_z.data
new_var_z=autograd.Variable(var_z_data)
print(new_var_z.grad_fn)

None


The chain of operations no longer exists: we extracted only the data, so the history of operations was lost.
If we now call <code>backward</code> on <code>new_var_z</code>, it throws an error:

In [19]:
new_var_z.backward(retain_graph=True)

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
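
When cutting the graph is intended, <code>.detach()</code> states that intent explicitly; when gradients are needed, keep working with the original result. A minimal sketch with plain tensors, assuming PyTorch 0.4 or later (variable names are illustrative):

```python
import torch

x = torch.randn(2, 2, requires_grad=True)
y = torch.randn(2, 2, requires_grad=True)
z = x + y

# detach() returns a tensor with the same values but no history
detached = z.detach()
print(detached.requires_grad, detached.grad_fn)  # False None

# The original z still holds the graph, so backprop works from it
z.sum().backward()
print(x.grad)  # a 2x2 tensor of ones
```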

CUDA
----
Check whether GPU acceleration with **CUDA** is available:

In [20]:
# let us run this cell only if CUDA is available

if torch.cuda.is_available():
    # creates a LongTensor and transfers it
    # to GPU as torch.cuda.LongTensor
    a=torch.LongTensor(10).fill_(3).cuda()
    print(type(a))
    
    # transfers it back to the CPU, where it is
    # a torch.LongTensor again
    b=a.cpu()
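
A common alternative is to pick the device once and pass it around, so the same code runs with or without a GPU. This is a sketch of that pattern, assuming PyTorch 0.4 or later; the name <code>device</code> is a convention, not something the notebook defines.

```python
import torch

# Fall back to the CPU when CUDA is not available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create a tensor of ten 3s directly on the chosen device
a = torch.full((10,), 3, dtype=torch.long, device=device)
print(a.device.type)  # "cuda" or "cpu", depending on the machine

# .cpu() is a no-op for CPU tensors and a copy for GPU tensors
b = a.cpu()
print(b.tolist())  # [3, 3, 3, 3, 3, 3, 3, 3, 3, 3]
```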

Linear Model
=======

In [21]:
import torch
import torch.nn as nn
from torch.autograd import Variable
import numpy as np

In [22]:
x = [i for i in range(20)] # list comprehension
x_train = np.array(x, dtype=np.float32)
x_train = x_train.reshape(-1, 1)
print(x)
print(x_train.shape)

y = [(5*i + 2) for i in x] # list comprehension
y_train = np.array(y, dtype=np.float32)
y_train = y_train.reshape(-1, 1)
print(y)
print(y_train.shape)


[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
(20, 1)
[2, 7, 12, 17, 22, 27, 32, 37, 42, 47, 52, 57, 62, 67, 72, 77, 82, 87, 92, 97]
(20, 1)


Create Model Class
-----------------

In [23]:

class LinearRegressor(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(LinearRegressor, self).__init__()
        self.linear = nn.Linear(input_dim, output_dim)  
    
    def forward(self, x):
        out = self.linear(x)
        return out

input_dim = 1
output_dim = 1

model = LinearRegressor(input_dim, output_dim)

model

LinearRegressor(
  (linear): Linear(in_features=1, out_features=1, bias=True)
)
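
To see what this model will learn, we can list its parameters. The sketch below repeats the class definition so that it runs on its own; the <code>shapes</code> dictionary and the zero-filled <code>batch</code> are illustrative.

```python
import torch
import torch.nn as nn

class LinearRegressor(nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.linear = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        return self.linear(x)

model = LinearRegressor(1, 1)

# named_parameters() lists the learnable tensors: one weight matrix and one bias vector
shapes = {name: tuple(p.shape) for name, p in model.named_parameters()}
print(shapes)  # {'linear.weight': (1, 1), 'linear.bias': (1,)}

# A batch of 20 one-dimensional inputs gives 20 one-dimensional outputs
batch = torch.zeros(20, 1)
out = model(batch)
print(out.shape)  # torch.Size([20, 1])
```

With one input and one output feature, the model computes y = w*x + b, which matches the form of the training data (y = 5x + 2).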

Loss & Optimizer
---------------

In [24]:
loss_function = nn.MSELoss()


optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
optimizer
loss_function

MSELoss()
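
What these two objects compute can be checked by hand on tiny inputs. The sketch below uses made-up values: it compares <code>nn.MSELoss</code> with the mean of squared differences and verifies that one plain SGD step (no momentum or weight decay) performs the update w ← w − lr · grad.

```python
import torch
import torch.nn as nn

# Mean squared error: the average of the squared differences
pred = torch.tensor([[1.0], [2.0], [3.0]])
target = torch.tensor([[1.0], [2.0], [5.0]])

by_module = nn.MSELoss()(pred, target)
by_hand = ((pred - target) ** 2).mean()  # (0 + 0 + 4) / 3
print(torch.allclose(by_module, by_hand))  # True

# One SGD step on a single parameter with loss = w^2
w = torch.tensor([2.0], requires_grad=True)
sgd = torch.optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()  # d(loss)/dw = 2 * w = 4
loss.backward()
sgd.step()             # w <- 2 - 0.1 * 4 = 1.6

print(torch.allclose(w.detach(), torch.tensor([1.6])))  # True
```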

In [25]:
epochs = 500

for epoch in range(epochs):
    epoch += 1
    #Convert inputs and outputs to torch variable
    inputs = Variable(torch.from_numpy(x_train))
    
    real_outputs = Variable(torch.from_numpy(y_train))
    
    # Reset Gradients
    optimizer.zero_grad()
    
    # Forward - compute the output
    pred_outputs = model(inputs)
    
    # Loss
    loss = loss_function(pred_outputs, real_outputs)
    
    # Backward - compute gradients
    loss.backward()
    
    # Update parameters
    optimizer.step()
    
    print('epoch {}, loss {}'.format(epoch, loss.item()))
    
    

epoch 1, loss 2877.345947265625
epoch 2, loss 1625.35107421875
epoch 3, loss 918.2175903320312
epoch 4, loss 518.8246459960938
epoch 5, loss 293.2452392578125
epoch 6, loss 165.83644104003906
epoch 7, loss 93.87519836425781
epoch 8, loss 53.2308235168457
epoch 9, loss 30.27443504333496
epoch 10, loss 17.308353424072266
epoch 11, loss 9.984807968139648
epoch 12, loss 5.848206043243408
epoch 13, loss 3.511596202850342
epoch 14, loss 2.1916496753692627
epoch 15, loss 1.4459121227264404
epoch 16, loss 1.0244970321655273
epoch 17, loss 0.7862521409988403
epoch 18, loss 0.6514715552330017
epoch 19, loss 0.5751251578330994
epoch 20, loss 0.5317803621292114
epoch 21, loss 0.5070769190788269
epoch 22, loss 0.49290379881858826
epoch 23, loss 0.4846774935722351
epoch 24, loss 0.4798111319541931
epoch 25, loss 0.47684136033058167
epoch 26, loss 0.47494402527809143
epoch 27, loss 0.473651260137558
epoch 28, loss 0.47270339727401733
epoch 29, loss 0.4719468057155609
epoch 30, loss 0.4713005423545837