# Lesson 1: Tensors
**PyTorch:** Optimized tensor library for deep learning using GPUs and CPUs.
The basic building block of PyTorch is the torch.Tensor (similar to a NumPy array).

In [3]:
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
import time

## Comparison of NumPy and PyTorch
Arrays and tensors are created in the same way

In [4]:
n = np.linspace(0, 1, 5)
t = torch.linspace(0, 1, 5)

tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])
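As a small sketch of how close the two objects are, the snippet below converts between them with torch.from_numpy and Tensor.numpy. It assumes PyTorch's default dtype has not been changed, so torch.linspace returns float32 while np.linspace returns float64; the variable names t_from_n and n_from_t are made up for illustration.

```python
import numpy as np
import torch

n = np.linspace(0, 1, 5)      # NumPy defaults to float64
t = torch.linspace(0, 1, 5)   # PyTorch defaults to float32 (assumed default dtype)

# Convert in both directions; on the CPU both conversions share memory
t_from_n = torch.from_numpy(n)  # keeps float64
n_from_t = t.numpy()            # keeps float32

same_values = np.allclose(n, n_from_t)
print(n.dtype, t.dtype, t_from_n.dtype, same_values)
```

The values agree, but the default precision differs, which is worth remembering when mixing the two libraries.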

They can be reshaped in the same way

In [7]:
n = np.arange(0, 9)
n.reshape(3, 3)

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [8]:
t = torch.arange(0, 9)
t.reshape(3, 3)

tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])
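Both libraries also accept -1 for one dimension of reshape, which asks them to infer that size from the total number of elements. A minimal sketch, using 12 elements instead of 9 so the inferred size is visible:

```python
import numpy as np
import torch

n = np.arange(0, 12)
t = torch.arange(0, 12)

# -1 means "work out this dimension from the number of elements"
n2 = n.reshape(3, -1)
t2 = t.reshape(3, -1)
print(n2.shape, t2.shape)
```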

## General broadcasting rules

In [10]:
x = np.linspace(1, 5, 10)
y = np.arange(0, 10)
x == y

array([False, False, False, False, False, False, False, False, False,
       False])

In [11]:
a = np.array([2, 3])
b = np.array([5, 10])
a * b

array([10, 30])

**Rules for higher dimensions**

When the dimensions aren't the same it gets a bit more complicated.\
When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing (i.e. rightmost) dimensions and works its way left. Two dimensions are compatible when
1. they are equal, or
2. one of them is 1

To apply these rules we check the shape of the arrays (a.shape), for example:

Shape 1: (1, 3)\
Shape 2: (5, 3)

In this case the rules hold: the trailing dimensions are both 3, and of the leading dimensions (1 and 5) one is 1. The shapes are therefore compatible and we can operate (addition, multiplication, ...) as we expect.
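The compatibility check can be sketched as a few lines of Python. The function shapes_are_compatible is a hypothetical helper written for this lesson, not part of NumPy; np.broadcast_shapes is NumPy's own function (assuming NumPy 1.20 or newer) and returns the resulting shape.

```python
import numpy as np

def shapes_are_compatible(shape_a, shape_b):
    """Hypothetical helper: apply the two rules from right to left."""
    for dim_a, dim_b in zip(reversed(shape_a), reversed(shape_b)):
        if dim_a != dim_b and dim_a != 1 and dim_b != 1:
            return False
    # Extra leading dimensions of the longer shape have no partner and never conflict
    return True

ok = shapes_are_compatible((1, 3), (5, 3))
bad = shapes_are_compatible((6, 5), (5, 6))
result_shape = np.broadcast_shapes((1, 3), (5, 3))
print(ok, bad, result_shape)
```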

In [19]:
a = np.ones((6, 5))
print(a.shape)
a

(6, 5)


array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [20]:
b = np.arange(5).reshape((1, 5))
print(b.shape)
b

(1, 5)


array([[0, 1, 2, 3, 4]])

According to the rules they are compatible, so if we compute a + b we expect the b array to be added element-wise to every row of a

In [21]:
a + b

array([[1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.]])

In [27]:
c = np.ones((5, 6))
print(c.shape)
c

(5, 6)


array([[1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.]])

In [26]:
d = np.arange(5).reshape((5, 1))
print(d.shape)
d

(5, 1)


array([[0],
       [1],
       [2],
       [3],
       [4]])

In this case the dimensions are switched, so we expect d to be added to every column of c

In [28]:
c + d

array([[1., 1., 1., 1., 1., 1.],
       [2., 2., 2., 2., 2., 2.],
       [3., 3., 3., 3., 3., 3.],
       [4., 4., 4., 4., 4., 4.],
       [5., 5., 5., 5., 5., 5.]])
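One way to picture what broadcasting does here is to repeat d explicitly until it has the same shape as c. The sketch below uses np.tile for that and checks that the result is identical; the name d_tiled is only for illustration, and NumPy does not need to build such a copy to compute c + d.

```python
import numpy as np

c = np.ones((5, 6))
d = np.arange(5).reshape((5, 1))

# Repeat the single column of d six times, giving shape (5, 6)
d_tiled = np.tile(d, (1, 6))
same = np.array_equal(c + d, c + d_tiled)
print(d_tiled.shape, same)
```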

**The same rules apply to PyTorch**

In [29]:
at = torch.ones((6, 5))
print(at.shape)
at

torch.Size([6, 5])


tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]])

In [30]:
bt = torch.arange(5).reshape((1, 5))
print(bt.shape)
bt

torch.Size([1, 5])


tensor([[0, 1, 2, 3, 4]])

In [31]:
at + bt

tensor([[1., 2., 3., 4., 5.],
        [1., 2., 3., 4., 5.],
        [1., 2., 3., 4., 5.],
        [1., 2., 3., 4., 5.],
        [1., 2., 3., 4., 5.],
        [1., 2., 3., 4., 5.]])
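When the rules do not hold, both libraries refuse to operate. A short sketch with shapes (6, 5) and (5, 6), where neither pair of dimensions is equal or 1; the flags torch_failed and numpy_failed are only there to record what happened:

```python
import numpy as np
import torch

at = torch.ones((6, 5))
ct = torch.ones((5, 6))

try:
    at + ct
    torch_failed = False
except RuntimeError as err:   # PyTorch reports shape mismatches as RuntimeError
    torch_failed = True
    print("PyTorch:", err)

try:
    np.ones((6, 5)) + np.ones((5, 6))
    numpy_failed = False
except ValueError as err:     # NumPy reports them as ValueError
    numpy_failed = True
    print("NumPy:", err)
```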

### What if we don't have the same number of dimensions?

In the examples above we had two dims per array/tensor:\
(5, 6) and (5, 1)\
But having the same number of dimensions is not a condition for broadcasting to be possible

Image  (3d array): 256 x 256 x 3\
Scale  (1d array):             3\
Result (3d array): 256 x 256 x 3

If so, NumPy and PyTorch automatically pad the shorter shape on the left with dimensions of size 1, and the rules apply as before

In [32]:
scale = torch.tensor([0.5, 1.5, 1]) # RGB format
scale

tensor([0.5000, 1.5000, 1.0000])

In [33]:
image = torch.rand((256, 256, 3))

In [40]:
result = image * scale
result.shape

torch.Size([256, 256, 3])

The result is also an image with the appropriate dimensions

In [43]:
image[0,0,:]

tensor([0.1264, 0.0715, 0.7159])

If we check, for example, the first pixel, we can see that the scale factor has worked as expected

In [45]:
result[0,0,:]

tensor([0.0632, 0.1072, 0.7159])
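The automatic padding can be made explicit by reshaping the scale to (1, 1, 3) by hand. The sketch below checks that both versions give the same result and asks torch.broadcast_shapes (assuming PyTorch 1.8 or newer) for the resulting shape; the names implicit and explicit are only for illustration.

```python
import torch

torch.manual_seed(0)
image = torch.rand((256, 256, 3))
scale = torch.tensor([0.5, 1.5, 1])

implicit = image * scale                    # shape (3,) is treated as (1, 1, 3)
explicit = image * scale.reshape(1, 1, 3)   # the same padding written out by hand

print(torch.equal(implicit, explicit))
print(torch.broadcast_shapes(image.shape, scale.shape))
```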

**Example of an array with 2 images**

Image  (4d array): 2 x 256 x 256 x 3\
Scale  (4d array): 2 x 1 x 1 x 3\
Result (4d array): 2 x 256 x 256 x 3

Now we want to apply a different filter to each image

In [47]:
image = torch.rand((2, 256, 256, 3))
scale = torch.tensor([1, 1, 1, 0.5, 0.5, 0.5]).reshape(2, 1, 1, 3)
scale

tensor([[[[1.0000, 1.0000, 1.0000]]],


        [[[0.5000, 0.5000, 0.5000]]]])

As we can see, we are leaving the first image untouched and scaling every color channel (RGB) of the second one by 0.5

In [48]:
result = image * scale
result.shape

torch.Size([2, 256, 256, 3])

In [49]:
# first pixel of the first image
image[0,0,0,:]

tensor([0.1902, 0.6314, 0.5439])

In [50]:
# first pixel of the second image
image[1,0,0,:]

tensor([0.5144, 0.5427, 0.7108])

In [51]:
result[0,0,0,:]

tensor([0.1902, 0.6314, 0.5439])

As we wanted, the first image remains unchanged

In [52]:
result[1,0,0,:]

tensor([0.2572, 0.2714, 0.3554])

while the second image is scaled by 0.5
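The (2, 1, 1, 3) scale can also be built by indexing with None, which inserts a dimension of size 1 at that position. A minimal sketch, where factors is a made-up name for the per-image RGB factors:

```python
import torch

image = torch.rand((2, 256, 256, 3))

# One row of RGB factors per image, shape (2, 3)
factors = torch.tensor([[1.0, 1.0, 1.0],
                        [0.5, 0.5, 0.5]])

# None inserts size-1 dimensions for rows and columns: shape (2, 1, 1, 3)
scale = factors[:, None, None, :]
result = image * scale
print(scale.shape, result.shape)
```

Writing the factors as one row per image may be easier to read than a flat list followed by reshape, since the layout mirrors the intent.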

### Operations across dimensions 
Basic operations can be done as usual on a 1d tensor:

In [4]:
t = torch.tensor([0.5,1,3,4])
t.mean(), t.std(), t.max(), t.min()

(tensor(2.1250), tensor(1.6520), tensor(4.), tensor(0.5000))

But suppose we have a 2d tensor and want to compute the mean value of the columns

**Note:** Taking the mean of each column means taking the mean **across the rows**, which are the first dimension.

In [5]:
t = torch.arange(20, dtype = float).reshape(5, 4)
t

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.],
        [12., 13., 14., 15.],
        [16., 17., 18., 19.]], dtype=torch.float64)

In [6]:
# Mean across the rows
torch.mean(t, axis = 0)

tensor([ 8.,  9., 10., 11.], dtype=torch.float64)
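Reductions combine naturally with broadcasting. As a sketch, keepdim=True keeps the reduced dimension with size 1, so the column means have shape (1, 4) and can be subtracted from every row directly; col_mean and centered are names chosen for this example.

```python
import torch

t = torch.arange(20, dtype=torch.float64).reshape(5, 4)

# keepdim=True keeps the reduced dimension, giving shape (1, 4) instead of (4,)
col_mean = t.mean(dim=0, keepdim=True)

# (5, 4) - (1, 4) broadcasts the means over all rows
centered = t - col_mean
print(col_mean.shape)
print(centered.mean(dim=0))
```

After centering, every column has mean zero.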

The same works for tensors of higher dimensionality

In [8]:
t = torch.rand(5, 256, 256, 3) # five images
# shape = image x rows x columns x RGB

* Mean across the batch (number of images):

In [9]:
torch.mean(t, axis = 0).shape

torch.Size([256, 256, 3])

* Mean across the color channels:

In [10]:
torch.mean(t, axis = -1).shape

torch.Size([5, 256, 256])

* Taking only the **maximum** color channel value:\
Commonly used in image segmentation models

In [15]:
values, indices = torch.max(t, axis = -1)
values.shape

torch.Size([5, 256, 256])
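torch.max with a dimension returns two tensors: the maximum values and the positions where they occur. The sketch below uses a tiny tensor with made-up numbers so both outputs are easy to read, and uses torch.gather to confirm that the indices point at the returned values.

```python
import torch

# Two pixels with three channel values each (made-up numbers)
pixels = torch.tensor([[0.2, 0.9, 0.4],
                       [0.7, 0.1, 0.3]])

values, indices = torch.max(pixels, dim=-1)
print(values)    # largest value of each pixel
print(indices)   # channel in which that value occurs

# Pick the value at each index along the last dimension
picked = torch.gather(pixels, -1, indices.unsqueeze(-1)).squeeze(-1)
print(torch.equal(picked, values))
```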

### Where do PyTorch and NumPy differ?

The main advantage of PyTorch is its capacity to automatically compute gradients of operations, for example

$$
\begin{equation}
y = \sum_i x^{3}_i
\end{equation}
$$
has a gradient
$$
\begin{equation}
\frac{\partial y}{\partial x_i} = 3 x^{2}_i
\end{equation}
$$

In [16]:
x = torch.tensor([[5., 8.], [4., 6.]], requires_grad = True)
x

tensor([[5., 8.],
        [4., 6.]], requires_grad=True)

In [18]:
y = x.pow(3).sum()
y

tensor(917., grad_fn=<SumBackward0>)

Compute the gradient:

In [19]:
y.backward() # method of torch tensors
x.grad # the gradient is stored in the original tensor and can be accessed like this

tensor([[ 75., 192.],
        [ 48., 108.]])

Double check with the analytical formula: 

In [20]:
3 * x * x

tensor([[ 75., 192.],
        [ 48., 108.]], grad_fn=<MulBackward0>)
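The gradient can also be checked numerically, without using the analytical formula. The sketch below approximates each partial derivative with a central finite difference; the function f, the step size eps and the float64 dtype are choices made for this example, not part of the lesson's cells.

```python
import torch

def f(x):
    return x.pow(3).sum()

# float64 keeps the finite-difference rounding error small
x = torch.tensor([[5., 8.], [4., 6.]], dtype=torch.float64, requires_grad=True)
f(x).backward()

eps = 1e-6  # assumed step size
numeric = torch.zeros_like(x)
with torch.no_grad():
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            step = torch.zeros_like(x)
            step[i, j] = eps
            # Central difference: (f(x + h) - f(x - h)) / (2h)
            numeric[i, j] = (f(x + step) - f(x - step)) / (2 * eps)

print(torch.allclose(x.grad, numeric))
```

This needs two function evaluations per element, which is why automatic differentiation is preferred for anything beyond a sanity check.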

### Additional benefits

Large matrix multiplications are often faster with torch tensors than with NumPy, especially when running on a GPU

In [2]:
A = torch.randn((1000, 1000))
B = torch.randn((1000, 1000))

t1 = time.perf_counter()
torch.matmul(A, B)
t2 = time.perf_counter()
print(t2-t1)

0.003537200012942776


Using NumPy

In [None]:
# Same shape as the torch example so the two timings are comparable
A = np.random.randn(1000, 1000)
B = np.random.randn(1000, 1000)

t1 = time.perf_counter()
A@B
t2 = time.perf_counter()
print(t2-t1)

# THE KERNEL DIES IF YOU RUN THIS
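A single timed call is noisy, so a fairer comparison uses the same shape and dtype for both libraries, one warm-up call, and an average over several runs. The following is only a sketch under those assumptions: average_time is a hypothetical helper, the size 256 is chosen so it runs quickly, and it only measures the CPU. Which library comes out ahead depends on the machine and on how each was built.

```python
import time
import numpy as np
import torch

def average_time(fn, repeats=10):
    """Hypothetical helper: one warm-up call, then the mean of several timed calls."""
    fn()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats

size = 256  # assumed size, kept small so the sketch runs quickly
A_np = np.random.randn(size, size).astype(np.float32)
B_np = np.random.randn(size, size).astype(np.float32)
A_t = torch.from_numpy(A_np)   # same data and dtype as the NumPy arrays
B_t = torch.from_numpy(B_np)

numpy_time = average_time(lambda: A_np @ B_np)
torch_time = average_time(lambda: torch.matmul(A_t, B_t))
print(f"NumPy: {numpy_time:.6f} s, PyTorch (CPU): {torch_time:.6f} s")
```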