https://youtu.be/v43SlgBcZ5Y

In [2]:
import torch
import torch.nn
import numpy as np
import matplotlib.pyplot as plt
import time

## NumPy vs. Torch

NumPy arrays and PyTorch tensors can be created in the same way.

In [3]:
n = np.linspace(0,1,5)
t = torch.linspace(0,1,5)

In [4]:
n

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [5]:
t

tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])
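One difference hides behind the similar syntax: the default floating-point types differ (float64 in NumPy, float32 in PyTorch). The following is a minimal sketch of that, along with converting between the two; the names `t64`, `t_from_n` and `n_from_t` are only illustrative.

```python
import numpy as np
import torch

n = np.linspace(0, 1, 5)
t = torch.linspace(0, 1, 5)

print(n.dtype)  # float64
print(t.dtype)  # torch.float32

# Ask PyTorch for float64 explicitly to match NumPy's default.
t64 = torch.linspace(0, 1, 5, dtype=torch.float64)

# Convert between the two libraries (on CPU these share the same memory).
t_from_n = torch.from_numpy(n)  # keeps NumPy's float64
n_from_t = t.numpy()            # keeps PyTorch's float32
```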

In [6]:
n = np.arange(48).reshape(3,4,4)
t = torch.arange(48).reshape(3,4,4)

In [7]:
n

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15]],

       [[16, 17, 18, 19],
        [20, 21, 22, 23],
        [24, 25, 26, 27],
        [28, 29, 30, 31]],

       [[32, 33, 34, 35],
        [36, 37, 38, 39],
        [40, 41, 42, 43],
        [44, 45, 46, 47]]])

In [8]:
t

tensor([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11],
         [12, 13, 14, 15]],

        [[16, 17, 18, 19],
         [20, 21, 22, 23],
         [24, 25, 26, 27],
         [28, 29, 30, 31]],

        [[32, 33, 34, 35],
         [36, 37, 38, 39],
         [40, 41, 42, 43],
         [44, 45, 46, 47]]])

Most importantly, they follow the same broadcasting rules. To use PyTorch (and even NumPy) efficiently, one needs a strong grasp of these rules.

## General Broadcasting rules

When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing (i.e. rightmost) dimensions and works its way left. Two dimensions are compatible when

1. they are equal, or
2. one of them is 1

Example: The following are compatible (both 6 dimensional)

- Shape1: (1,6,4,1,7,2)
- Shape2: (5,6,1,3,1,2)

Observe that in every position, the dimensions of Shape1 and Shape2 are either equal or one of them is 1.
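The two shapes above can be checked directly. This is a small sketch using arrays of ones (the variable names are illustrative); it also shows what happens when the rules are violated.

```python
import numpy as np
import torch

shape1 = (1, 6, 4, 1, 7, 2)
shape2 = (5, 6, 1, 3, 1, 2)

# In each position the result takes the larger of the two sizes.
result_np = np.ones(shape1) + np.ones(shape2)
result_t = torch.ones(shape1) + torch.ones(shape2)
print(result_np.shape)        # (5, 6, 4, 3, 7, 2)
print(tuple(result_t.shape))  # (5, 6, 4, 3, 7, 2)

# Incompatible shapes: 6 vs 4 in the first position, and neither is 1.
try:
    np.ones((6, 5)) + np.ones((4, 5))
    incompatible_raised = False
except ValueError:
    incompatible_raised = True
print(incompatible_raised)    # True
```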

In [18]:
a = np.ones((6,5))
b = np.arange(5).reshape((1,5))
print(a.shape)
print(b.shape)

(6, 5)
(1, 5)


In [19]:
a

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [20]:
b

array([[0, 1, 2, 3, 4]])

In [21]:
a+b

array([[1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.]])

In [22]:
a = torch.ones((6,5))
b = torch.arange(5).reshape((1,5))
print(a.shape)
print(b.shape)

torch.Size([6, 5])
torch.Size([1, 5])


In [23]:
a

tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]])

In [24]:
b

tensor([[0, 1, 2, 3, 4]])

In [25]:
a+b

tensor([[1., 2., 3., 4., 5.],
        [1., 2., 3., 4., 5.],
        [1., 2., 3., 4., 5.],
        [1., 2., 3., 4., 5.],
        [1., 2., 3., 4., 5.],
        [1., 2., 3., 4., 5.]])

The arrays/tensors don't need to have the same number of dimensions. If one of them has fewer dimensions than the other, its shape is treated as if it were padded with 1s on the left, and then the rules above are applied.

Example: Scaling each of the color channels of an image by a different amount:<br>
&emsp; &emsp; Image (3d array): 256 x 256 x 3 <br>
&emsp; &emsp; Scale (1d array):             3 <br>
&emsp; &emsp; Result (3d array):256 x 256 x 3 <br>

Here only the last dimension (3 and 3) needs to be checked against the rules above; the missing leading dimensions of Scale are treated as 1.

In [27]:
Image = torch.randn((256,256,3))
Scale = torch.tensor([0.5,1.5,1])

In [31]:
Result = Scale * Image
print(Result.shape)

torch.Size([256, 256, 3])
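To make the implicit left-padding visible, the sketch below compares the 1D scale with an explicitly reshaped `(1, 1, 3)` version. A small 4 x 4 image stands in for the 256 x 256 one, and the lowercase names are illustrative.

```python
import torch

image = torch.randn((4, 4, 3))          # small stand-in for a 256 x 256 x 3 image
scale = torch.tensor([0.5, 1.5, 1.0])   # shape (3,)

implicit = scale * image                     # (3,) is treated as (1, 1, 3)
explicit = scale.reshape((1, 1, 3)) * image  # same thing, written out
same = torch.equal(implicit, explicit)
print(same)  # True

# Each channel is scaled by its own factor.
channel0_ok = torch.equal(implicit[..., 0], 0.5 * image[..., 0])

# A scale of length 4 does not match the trailing dimension of size 3.
try:
    torch.ones(4) * image
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True
```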


In [32]:
Result

tensor([[[ 0.4876, -1.2668,  0.9319],
         [-0.2365, -0.7776,  1.7210],
         [-1.2332,  0.4431, -1.4029],
         ...,
         [-0.3949,  1.2514,  0.8014],
         [-0.3161,  2.7752, -1.9957],
         [ 0.0271, -0.5577,  0.4232]],

        [[-1.2474,  2.8289, -0.6902],
         [-0.5624, -0.2220,  0.0614],
         [-1.0093,  1.3674, -2.0899],
         ...,
         [-0.3086, -0.2575,  0.1295],
         [ 0.9888, -2.3733,  0.9128],
         [ 0.6741,  1.5198,  0.9000]],

        [[ 0.6294, -0.5966, -0.5844],
         [ 0.6573, -0.2683,  0.3732],
         [ 0.6097, -2.4371, -1.5944],
         ...,
         [ 0.0132, -0.0130, -0.1795],
         [ 0.4778,  0.7356,  0.9148],
         [-0.1930, -0.1843, -0.3399]],

        ...,

        [[ 0.4235,  0.6058, -0.1965],
         [-0.8552,  0.0714, -1.0518],
         [ 0.8501,  0.2962, -0.7398],
         ...,
         [-0.6483,  0.4685, -0.4696],
         [-0.4783,  1.6980,  1.3606],
         [ 0.0914,  4.6713, -0.4690]],

        [[

Example: One has an array of 2 images and wants to scale the color channels of each image by a slightly different amount: <br>
&emsp; &emsp; Images (4d array):   2 x 256 x 256 x 3 <br>
&emsp; &emsp; Scales (4d array):   2 x 1 x 1 x 3 <br>
&emsp; &emsp; Results (4d array):  2 x 256 x 256 x 3 <br>

The leading 2 is the number of images.

In [33]:
Images = torch.randn((2,256,256,3))

# Say we want 2 different scales for 2 images, then we would define like this.
Scales = torch.tensor([0.5,1.5,1,1.5,1,0.5]).reshape((2,1,1,3))

In [34]:
Scales

tensor([[[[0.5000, 1.5000, 1.0000]]],


        [[[1.5000, 1.0000, 0.5000]]]])

In [35]:
Results = Images * Scales
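As a sanity check, the broadcast product should match scaling each image separately in a loop. The sketch below assumes small 8 x 8 images and builds the `(2, 1, 1, 3)` scales by indexing with `None` instead of `reshape`; the names are illustrative.

```python
import torch

torch.manual_seed(0)
images = torch.randn((2, 8, 8, 3))  # small stand-in for 2 x 256 x 256 x 3

# One row of channel scales per image, shape (2, 3).
per_image_scales = torch.tensor([[0.5, 1.5, 1.0],
                                 [1.5, 1.0, 0.5]])

# Insert two singleton dimensions to get shape (2, 1, 1, 3).
scales = per_image_scales[:, None, None, :]
results = images * scales

# Reference: scale each image on its own, then stack them again.
looped = torch.stack([images[i] * per_image_scales[i] for i in range(2)])
same = torch.equal(results, looped)
print(same)  # True
```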

In [36]:
Results

tensor([[[[ 0.6929,  1.4107,  0.8972],
          [ 0.4677, -0.0986, -0.2567],
          [-0.2831, -0.2370,  0.1464],
          ...,
          [ 0.2556,  2.5052,  1.5145],
          [ 0.7128, -2.1129, -1.1398],
          [ 0.3209,  1.6698, -0.5017]],

         [[ 0.3063, -1.8100, -0.2835],
          [-0.3262, -1.5199, -1.1615],
          [-0.8247,  0.1935,  0.3085],
          ...,
          [ 0.2399, -0.4343,  2.0732],
          [-0.2324,  1.3758,  0.8689],
          [ 0.1035,  1.0893,  0.1744]],

         [[-0.3944, -0.5098,  1.3413],
          [ 0.0570,  2.9195,  0.8431],
          [-0.2680,  3.8371, -1.2710],
          ...,
          [-0.6567,  0.0108, -0.3122],
          [-0.5072,  0.7982,  1.5671],
          [-0.8977, -1.1838, -0.2149]],

         ...,

         [[-0.2987,  0.9782, -0.9533],
          [-0.6666, -2.0188,  0.0994],
          [-0.1500, -1.0863,  0.2345],
          ...,
          [ 0.5902, -0.2546,  2.4463],
          [-0.1256,  0.9782, -0.9762],
          [-0.2759,  0

## Operations Across Dimensions

Of course simple operations can be done on 1D tensors just like on 1D arrays.

In [37]:
t = torch.tensor([0.5,1,1,3])
torch.mean(t), torch.std(t), torch.max(t), torch.min(t)

(tensor(1.3750), tensor(1.1087), tensor(3.), tensor(0.5000))

But suppose we have a 2D tensor and want to compute the mean value of each column:

- Note: taking the mean of each column means taking the mean across the rows (which are the first dimension)

In [40]:
t = torch.arange(20, dtype = float).reshape((4,5))
torch.mean(t,axis=0)

tensor([ 7.5000,  8.5000,  9.5000, 10.5000, 11.5000], dtype=torch.float64)
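PyTorch's own keyword for the reduced dimension is `dim`; `axis` is accepted as a NumPy-style alias. The sketch below also shows `keepdim=True`, which keeps the reduced dimension with size 1 so the result broadcasts back against the original tensor. Variable names are illustrative.

```python
import torch

t = torch.arange(20, dtype=torch.float64).reshape((4, 5))

col_means = torch.mean(t, dim=0)  # one mean per column, shape (5,)
row_means = torch.mean(t, dim=1)  # one mean per row, shape (4,)
print(col_means)  # tensor([ 7.5000,  8.5000,  9.5000, 10.5000, 11.5000], dtype=torch.float64)
print(row_means)  # tensor([ 2.,  7., 12., 17.], dtype=torch.float64)

# keepdim=True gives shape (4, 1), which broadcasts against (4, 5).
row_means_kept = torch.mean(t, dim=1, keepdim=True)
centered_rows = t - row_means_kept  # every row now has mean 0
```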

This can be done for higher dimensions as well.

In [41]:
t = torch.randn(5,256,256,3)

Take the mean across the batch (the first dimension, of size 5):

In [43]:
# For each pixel and each color channel, this averages over the 5 images.
torch.mean(t,axis=0) # axis=0 is the same as axis=-4 here (negative indices count from the end).

tensor([[[ 0.0642,  0.2051,  0.0551],
         [ 0.7510, -0.6714,  0.1091],
         [-0.5724,  0.2458, -0.3806],
         ...,
         [ 0.2940, -0.1770, -0.2411],
         [ 0.8691, -0.4686, -0.1377],
         [ 0.0602, -0.4241,  0.0614]],

        [[-0.4278,  0.2958, -0.9688],
         [-0.7137,  0.1241, -0.2354],
         [-0.1568, -0.4136,  0.5662],
         ...,
         [ 0.7637,  0.5564,  0.4509],
         [-0.9906, -0.6080, -0.2802],
         [-0.0775, -0.8193, -0.2493]],

        [[-0.4896,  0.0719,  0.9491],
         [-0.2043, -0.8947,  0.7929],
         [-0.0464, -0.0614,  0.5344],
         ...,
         [ 0.0168,  0.2275,  0.3948],
         [-0.6240,  0.2267,  0.4433],
         [ 0.0861,  0.2235,  0.1683]],

        ...,

        [[-0.0317,  0.2783,  0.8129],
         [ 0.3143, -0.7961,  0.0895],
         [-0.0102,  0.2944, -0.2746],
         ...,
         [-1.0010,  0.5071, -0.3565],
         [-1.0485,  0.1542,  0.2210],
         [ 0.2217,  0.0341,  0.6168]],

        [[

In [44]:
torch.mean(t, axis=0).shape

torch.Size([256, 256, 3])
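A reduction can also run over several dimensions at once by passing a tuple. As a sketch (with 16 x 16 images standing in for 256 x 256, and illustrative names), this gives a single mean per color channel, or one per image and channel:

```python
import torch

torch.manual_seed(0)
t = torch.randn(5, 16, 16, 3)  # small stand-in for 5 x 256 x 256 x 3

per_pixel_mean = torch.mean(t, dim=0)                # average over the batch only
per_channel_mean = torch.mean(t, dim=(0, 1, 2))      # average over batch and all pixels
per_image_channel_mean = torch.mean(t, dim=(1, 2))   # average over pixels of each image

print(per_pixel_mean.shape)          # torch.Size([16, 16, 3])
print(per_channel_mean.shape)        # torch.Size([3])
print(per_image_channel_mean.shape)  # torch.Size([5, 3])
```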

Take the mean across the color channels (i.e. the average of red, green and blue):

In [45]:
torch.mean(t, axis=-1) # negative indexing: axis=-1 is the same as axis=3 here

tensor([[[-0.1617,  0.3292,  0.0342,  ...,  0.6304,  0.4107,  0.1644],
         [ 0.4738, -0.2491, -0.3601,  ...,  1.7603, -1.2306, -1.1799],
         [ 0.2525,  0.8443,  0.3030,  ..., -0.5203, -0.4542,  0.0079],
         ...,
         [ 1.0589, -0.0463, -0.6647,  ..., -0.6272, -0.8201, -0.3166],
         [-0.5265,  0.0766,  0.7740,  ...,  0.5703, -1.0215,  0.7784],
         [-0.3792,  0.0615,  0.4251,  ...,  0.8647, -0.6191,  0.6757]],

        [[ 0.0559,  0.2359, -0.5436,  ...,  0.5156, -0.6520, -0.0832],
         [-0.0943, -0.9542,  0.4693,  ..., -0.1839,  0.0130, -0.4686],
         [ 0.1807, -0.5179, -0.2068,  ..., -0.1632, -0.0118,  0.2821],
         ...,
         [-0.1721, -0.4227,  0.2760,  ..., -0.0801,  0.0842,  0.1807],
         [-0.7419, -0.0790,  0.4206,  ...,  0.1439,  0.6987, -0.1322],
         [ 0.1527,  0.0508, -0.0990,  ..., -0.4904,  0.6641, -0.2374]],

        [[ 0.2655,  0.4166, -0.5444,  ..., -0.9541, -0.1425,  0.2583],
         [-0.2766, -0.4614,  0.6729,  ...,  0

In [46]:
torch.mean(t, axis=-1).shape


torch.Size([5, 256, 256])

Take only the maximum color channel values (and get the corresponding indices):
- This is done all the time in image segmentation models (i.e. take an image, decide which pixels correspond to, say, a car)

In [48]:
values, indices = torch.max(t, axis=-1) # with an axis given, torch.max() returns (values, indices)

In [49]:
values

tensor([[[ 3.5988e-01,  7.1235e-01,  7.2779e-01,  ...,  1.0628e+00,
           2.2426e+00,  1.5064e+00],
         [ 2.2642e+00,  5.1064e-01,  3.6486e-01,  ...,  2.3695e+00,
          -8.0365e-01, -2.0256e-01],
         [ 1.4702e+00,  1.4896e+00,  8.5115e-01,  ..., -1.4374e-01,
           1.5337e-01,  1.0135e+00],
         ...,
         [ 1.3137e+00,  2.5316e-01, -4.0980e-01,  ...,  4.7765e-01,
           4.3091e-01,  4.7359e-01],
         [-4.3098e-01,  1.7360e+00,  1.7577e+00,  ...,  1.0432e+00,
           3.5823e-01,  1.2910e+00],
         [ 1.0890e+00,  2.6363e+00,  2.1503e+00,  ...,  1.5021e+00,
           1.9044e+00,  1.9849e+00]],

        [[ 8.5424e-01,  1.9954e+00,  3.6158e-01,  ...,  2.9042e+00,
           7.0042e-01,  4.7188e-01],
         [ 1.7775e+00,  1.9370e-01,  2.8727e+00,  ...,  8.5829e-01,
           4.1965e-01,  4.4634e-02],
         [ 1.9483e+00,  7.4862e-01,  3.4150e-02,  ...,  7.7793e-01,
           5.1307e-01,  1.2506e+00],
         ...,
         [ 1.0090e-01,  5

In [50]:
indices # 0,1,2 means red, green, blue etc

tensor([[[0, 0, 1,  ..., 0, 0, 2],
         [1, 1, 2,  ..., 0, 1, 2],
         [2, 2, 1,  ..., 2, 2, 0],
         ...,
         [1, 2, 1,  ..., 1, 2, 2],
         [1, 1, 0,  ..., 2, 2, 1],
         [2, 2, 1,  ..., 0, 0, 2]],

        [[1, 0, 0,  ..., 1, 0, 2],
         [0, 2, 2,  ..., 0, 1, 2],
         [2, 2, 0,  ..., 1, 1, 1],
         ...,
         [1, 0, 2,  ..., 1, 2, 1],
         [2, 1, 2,  ..., 1, 1, 1],
         [0, 1, 0,  ..., 0, 0, 2]],

        [[0, 0, 2,  ..., 0, 2, 0],
         [1, 1, 2,  ..., 1, 1, 2],
         [1, 2, 2,  ..., 1, 1, 2],
         ...,
         [2, 0, 0,  ..., 2, 1, 2],
         [0, 2, 1,  ..., 0, 2, 1],
         [1, 1, 1,  ..., 0, 2, 1]],

        [[2, 2, 1,  ..., 2, 0, 1],
         [1, 1, 1,  ..., 1, 2, 0],
         [2, 2, 2,  ..., 0, 1, 1],
         ...,
         [2, 0, 1,  ..., 1, 1, 2],
         [0, 1, 1,  ..., 0, 2, 0],
         [2, 2, 1,  ..., 1, 2, 2]],

        [[2, 2, 1,  ..., 0, 0, 0],
         [2, 1, 2,  ..., 0, 0, 0],
         [1, 0, 1,  ..., 2

In [51]:
values.shape, indices.shape

(torch.Size([5, 256, 256]), torch.Size([5, 256, 256]))
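The relationship between `values` and `indices` can be verified with a short sketch: the indices agree with `torch.argmax`, and `torch.gather` picks the maximum values back out of the original tensor. Smaller 16 x 16 images are assumed here, and the extra names are illustrative.

```python
import torch

torch.manual_seed(0)
t = torch.randn(5, 16, 16, 3)  # small stand-in for 5 x 256 x 256 x 3

values, indices = torch.max(t, dim=-1)

# argmax returns only the positions of the maxima.
argmax_indices = torch.argmax(t, dim=-1)

# gather needs an index with the same number of dimensions as t,
# so add a trailing dimension and remove it again afterwards.
gathered = torch.gather(t, -1, indices.unsqueeze(-1)).squeeze(-1)

print(torch.equal(argmax_indices, indices))  # True
print(torch.equal(gathered, values))         # True
```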

## So where do PyTorch and NumPy differ?

PyTorch starts to really differ from NumPy in that it can automatically compute gradients of operations. For example,

$$y = \sum_i x_i^{3}$$
has a gradient <br>
$$ \frac{\partial y}{\partial x_i} = 3x_i^2 $$

In [59]:
x = torch.tensor([[5,4], [3,2]],dtype=float, requires_grad=True)
x

tensor([[5., 4.],
        [3., 2.]], dtype=torch.float64, requires_grad=True)

In [60]:
y = x.pow(3).sum()
y

tensor(224., dtype=torch.float64, grad_fn=<SumBackward0>)

In [61]:
y.backward() #compute gradients
x.grad # the gradient dy/dx is stored on x itself, in x.grad

tensor([[75., 48.],
        [27., 12.]], dtype=torch.float64)

In [62]:
3*x.pow(2) # matches the result above, so PyTorch has computed the gradient correctly

tensor([[75., 48.],
        [27., 12.]], dtype=torch.float64, grad_fn=<MulBackward0>)
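One detail worth knowing when repeating this experiment: gradients accumulate in `x.grad` across calls to `backward()` until they are reset. A minimal sketch, with the names `analytic` and `accumulated` chosen for illustration:

```python
import torch

x = torch.tensor([[5., 4.], [3., 2.]], dtype=torch.float64, requires_grad=True)

y = x.pow(3).sum()
y.backward()

# Compare against the hand-derived gradient 3 * x^2.
analytic = 3 * x.detach().pow(2)
matches = torch.allclose(x.grad, analytic)
print(matches)  # True

# A second backward pass ADDS to x.grad instead of replacing it.
y2 = x.pow(3).sum()
y2.backward()
accumulated = x.grad.clone()  # now twice the analytic gradient

# Reset the stored gradient before the next computation.
x.grad.zero_()
```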

The automatic computation of gradients is the backbone of training deep learning models. Unlike in the example above, the functions that arise in practice are usually far too complicated to differentiate by hand, so automatic differentiation is essential. In general, if one has

$$y = f(\vec x)$$

then PyTorch can compute $\partial y / \partial x_i$ for each element $x_i$ of the vector $\vec x$. In the context of machine learning, $\vec x$ contains all the weights (also known as parameters) of the neural network and $y$ is the loss function of the neural network.
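To illustrate how weights and a loss function fit together, here is a hedged sketch of plain gradient descent on a tiny linear model. The data, the names `inputs`, `targets`, `true_w`, `w`, and the learning rate `lr` are all made up for illustration; real training code would normally use `torch.optim` and `torch.nn`.

```python
import torch

torch.manual_seed(0)

# Synthetic data (an assumption for this sketch): targets are a linear function of the inputs.
inputs = torch.randn(100, 3)
true_w = torch.tensor([1.0, -2.0, 0.5])
targets = inputs @ true_w

# The weights play the role of the vector x; the loss plays the role of y.
w = torch.zeros(3, requires_grad=True)
lr = 0.1
losses = []

for step in range(50):
    loss = ((inputs @ w - targets) ** 2).mean()  # mean squared error
    loss.backward()                              # fills w.grad with d(loss)/dw
    with torch.no_grad():
        w -= lr * w.grad                         # step against the gradient
    w.grad.zero_()                               # reset before the next step
    losses.append(loss.item())

print(losses[0], losses[-1])
```

The loss should shrink as `w` moves toward `true_w`; the exact numbers depend on the random data.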

## Additional Benefits

**In addition, large matrix multiplications are often faster with torch tensors than with NumPy arrays, especially when running on a GPU.**

- Using torch on a CPU: often faster
- Using torch on a GPU: much faster

In [63]:
A = torch.randn((1000,1000))
B = torch.randn((1000,1000))


In [65]:
t1 = time.perf_counter()
torch.matmul(A,B)
t2 = time.perf_counter()
print(t2 - t1)

0.014809000014793128


In [66]:
A = np.random.randn(int(1e6)).reshape((1000,1000))
B = np.random.randn(int(1e6)).reshape((1000,1000))


In [68]:
t1 = time.perf_counter()
A@B
t2 = time.perf_counter()
print(t2 - t1)

0.059518199996091425


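PyTorch is about 4 times faster in this run. The comparison is not quite like for like, though: `torch.randn` creates float32 tensors by default, whereas `np.random.randn` creates float64 arrays, so part of the gap likely comes from the lower precision. Memory (RAM) use can also be higher with PyTorch, for example when tensors track gradients and intermediate results are kept for the backward pass.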
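Single-shot timings like these are noisy. The sketch below is one possible way to make the comparison fairer, not a rigorous benchmark: it uses the same float64 data for both libraries, runs a warm-up call, and keeps the best of several repeats. The helper `best_time` and the 500 x 500 size are assumptions for illustration, and results will vary by machine.

```python
import time
import numpy as np
import torch

def best_time(fn, repeats=5):
    """Return the fastest of several timed calls, after one warm-up call."""
    fn()  # warm-up, not timed
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return min(times)

rng = np.random.default_rng(0)
A_np = rng.standard_normal((500, 500))  # float64
B_np = rng.standard_normal((500, 500))

# from_numpy shares the same data, so both libraries see identical float64 inputs.
A_t = torch.from_numpy(A_np)
B_t = torch.from_numpy(B_np)

numpy_time = best_time(lambda: A_np @ B_np)
torch_time = best_time(lambda: torch.matmul(A_t, B_t))
print(f"numpy: {numpy_time:.5f} s, torch: {torch_time:.5f} s")

# Both should give the same product up to floating-point rounding.
same_result = np.allclose(A_np @ B_np, torch.matmul(A_t, B_t).numpy())

# The raw data buffers have the same size in bytes.
same_bytes = A_t.element_size() * A_t.nelement() == A_np.nbytes
```

When timing on a GPU, the work is queued asynchronously, so `torch.cuda.synchronize()` should be called before reading the clock.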