<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Module-1-:-Tensor-basics" data-toc-modified-id="Module-1-:-Tensor-basics-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Module 1 : Tensor basics</a></span><ul class="toc-item"><li><span><a href="#Creating-Tensors" data-toc-modified-id="Creating-Tensors-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Creating Tensors</a></span></li><li><span><a href="#Manipulating-tensors" data-toc-modified-id="Manipulating-tensors-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Manipulating tensors</a></span><ul class="toc-item"><li><span><a href="#Indexing" data-toc-modified-id="Indexing-1.2.1"><span class="toc-item-num">1.2.1&nbsp;&nbsp;</span>Indexing</a></span></li><li><span><a href="#Element-wise-operations" data-toc-modified-id="Element-wise-operations-1.2.2"><span class="toc-item-num">1.2.2&nbsp;&nbsp;</span>Element wise operations</a></span></li><li><span><a href="#Matrix-multiplication-(2D-tensors)" data-toc-modified-id="Matrix-multiplication-(2D-tensors)-1.2.3"><span class="toc-item-num">1.2.3&nbsp;&nbsp;</span>Matrix multiplication (2D tensors)</a></span></li></ul></li><li><span><a href="#Broadcasting" data-toc-modified-id="Broadcasting-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Broadcasting</a></span></li></ul></li></ul></div>

# Module 1 : Tensor basics 

In [26]:
import torch 
import torch.nn as nn
import numpy as np

The core of PyTorch is its implementation of [tensors](https://en.wikipedia.org/wiki/Tensor). These tensors are multidimensional arrays whose elements all share the same [data type](https://pytorch.org/docs/stable/tensors.html). They are highly similar to NumPy ndarrays, with some exceptions:

- PyTorch tensors can be stored and operated on with CUDA-capable Nvidia GPUs (faster matrix multiplication)
- Gradients can be computed automatically with [torch.autograd](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html)
- PyTorch tensors are usually used for deep learning, whereas NumPy ndarrays are usually used for classical machine learning


The idea of this first module is to cover the basics of tensors. Subsequent modules will build on these basics to gradually introduce the most important parts of PyTorch.


Tensors are the building block of most of the library. Manipulating them is highly similar to manipulating NumPy arrays, so I will not go into too much detail.

As stated, tensors are multidimensional arrays. A tensor has a rank (its number of dimensions), and each dimension has a size. For example, a matrix is a rank 2 tensor of shape (n, m), with n rows and m columns.

In [25]:
matrix = torch.randn(10, 5)
matrix, matrix.shape

(tensor([[ 0.7312, -0.0998, -0.2789, -0.2186,  0.5331],
         [ 0.9110, -0.8401,  0.0840, -0.2927, -0.3597],
         [-0.2645, -0.0855, -0.9046,  1.3065, -0.9253],
         [-1.8455,  0.4012,  0.8345,  0.9345,  0.4251],
         [-0.0872, -0.4266, -1.0446,  1.6130,  0.0269],
         [ 0.9529,  0.8912, -0.7392, -0.9767, -0.6420],
         [ 1.2685, -0.8448, -1.9350,  0.1971,  1.3197],
         [-0.1816,  0.1549, -0.8650,  0.1292, -0.1952],
         [ 0.3840,  1.8032,  0.0381, -0.9023, -0.1485],
         [-0.5200, -1.2285,  1.3115,  1.8989, -2.0948]]),
 torch.Size([10, 5]))
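
To make the notions of rank and size concrete, here is a small sketch that prints both for a few tensors. The variable names (`scalar`, `vector`, `batch`) and the shapes are only illustrative choices, not something used elsewhere in this module.

```python
import torch

# Illustrative tensors of increasing rank (names and shapes are arbitrary)
scalar = torch.tensor(3.14)          # rank 0: a single number
vector = torch.zeros(4)              # rank 1: 4 elements
matrix = torch.zeros(10, 5)          # rank 2: 10 rows, 5 columns
batch = torch.zeros(8, 3, 32, 32)    # rank 4: e.g. 8 RGB images of 32x32 pixels

for name, x in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("batch", batch)]:
    # .ndim is the rank, .shape gives the size of each dimension
    print(name, x.ndim, tuple(x.shape))
```

`x.ndim` is the same number as `len(x.shape)`.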

## Creating Tensors

From a Python list (here a nested list)

In [2]:
m = [[1, 2, 3], [4, 5, 6]]
t = torch.tensor(m)
print(t, t.dtype)

tensor([[1, 2, 3],
        [4, 5, 6]]) torch.int64


From a numpy array

In [3]:
m = np.random.randn(5, 2)
t = torch.as_tensor(m)
print(t)

tensor([[ 0.8401,  0.8451],
        [ 1.0272, -0.0291],
        [-0.2754,  0.2021],
        [-1.1618,  0.3961],
        [ 0.0422,  0.5322]], dtype=torch.float64)


Be careful: NumPy arrays are float64 by default, and a tensor constructed from a NumPy array keeps that 64-bit precision. Model weight matrices are usually of dtype float32, so applying such a model to a float64 tensor raises a dtype mismatch error.

In [4]:
#model = nn.Linear(2, 1)
#model(t)

In [5]:
#%debug
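
One possible way around the dtype mismatch is to cast the tensor to float32 before feeding it to the model. The sketch below assumes a small `nn.Linear(2, 1)` model like the one in the commented-out cell; the names `t64` and `t32` are only illustrative.

```python
import numpy as np
import torch
import torch.nn as nn

m = np.random.randn(5, 2)                        # NumPy default dtype: float64
t64 = torch.as_tensor(m)                         # keeps float64, shares memory with m
t32 = torch.as_tensor(m, dtype=torch.float32)    # cast to float32 (makes a copy)
# An equivalent cast on an existing tensor would be t64.float()

model = nn.Linear(2, 1)                          # weights are float32 by default
out = model(t32)                                 # works: input and weights share a dtype
print(t64.dtype, t32.dtype, out.shape)
```

Casting the input is usually preferable to casting the model to float64, since float32 is the dtype most of the library defaults to.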

Random tensor from normal distribution of given shape

In [6]:
t = torch.randn(10, 1)
print(t, t.shape)

tensor([[-0.2493],
        [ 0.4139],
        [ 1.0771],
        [-0.0745],
        [ 0.4799],
        [-0.6022],
        [-0.3491],
        [ 0.9839],
        [-1.7531],
        [ 1.1543]]) torch.Size([10, 1])


Other constructors, similar to their NumPy counterparts

In [7]:
??torch.zeros

In [None]:
??torch.zeros_like

In [None]:
??torch.ones

In [None]:
??torch.ones_like

In [36]:
??torch.eye
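
As a quick illustration of what these constructors return, here is a minimal sketch. The reference tensor `ref` and its float64 dtype are arbitrary choices made to show that the `*_like` variants copy the shape and dtype of their argument.

```python
import torch

z = torch.zeros(2, 3)                        # 2x3 tensor filled with 0.
o = torch.ones(2, 3)                         # 2x3 tensor filled with 1.

ref = torch.randn(4, 2, dtype=torch.float64)
z_like = torch.zeros_like(ref)               # same shape, dtype and device as ref
o_like = torch.ones_like(ref)

eye = torch.eye(3)                           # 3x3 identity matrix

print(z.shape, z_like.shape, z_like.dtype)
print(eye)
```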

Creating a tensor with requires_grad=True means that the operations applied to it are recorded, so that the gradient of a result with respect to this tensor can be computed automatically.

In [8]:
w = torch.tensor([1], dtype=torch.float, requires_grad=True)
loss = (w * 2) - 1.5
loss.backward()

In order to call .backward() on a PyTorch tensor loss:

1. at least one of the tensors in the computation leading to loss must have requires_grad=True
2. loss must contain a single element (a scalar), unless a gradient argument is passed to .backward()

In [9]:
#w = torch.tensor([1], dtype=torch.float)
#loss = (w * 2) - 1.5
#loss.backward()

In [10]:
#w = torch.tensor([1], dtype=torch.float, requires_grad=True)
#loss = (w * 2) - torch.tensor([1.5, 2])
#loss.backward()
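
The sketch below shows where the computed gradient ends up (the `.grad` attribute of the leaf tensor) and one common way to handle a non-scalar result, namely reducing it with `.sum()` before calling `.backward()`. The names `w1`, `w2` and `out` are illustrative.

```python
import torch

# Single-element case: loss = 2w - 1.5, so d(loss)/dw = 2
w1 = torch.tensor([1.0], requires_grad=True)
loss = (w1 * 2) - 1.5
loss.backward()
print(w1.grad)            # tensor([2.])

# Non-scalar case: out has 2 elements, so we reduce it to a scalar first
w2 = torch.tensor([1.0], requires_grad=True)
out = (w2 * 2) - torch.tensor([1.5, 2.0])   # w2 is broadcast, out has shape (2,)
out.sum().backward()      # sum = 4w - 3.5, so the gradient is 4
print(w2.grad)            # tensor([4.])
```

Gradients accumulate in `.grad` across successive `.backward()` calls, which is why a fresh tensor is created for the second case.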

This automatic differentiation is the basis of deep learning with PyTorch.

Finally, tensors can be stored and manipulated on the GPU

In [11]:
t = torch.randn(1000, 1000, device='cuda')
t.device

device(type='cuda', index=0)

In [12]:
t = torch.randn(1000, 1000)
t = t.to('cuda')
t.device

device(type='cuda', index=0)

In [13]:
t = t.to('cpu')
t.device

device(type='cpu')
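
The cells above assume a CUDA device is present. A common pattern, sketched here, is to pick the device once and fall back to the CPU when no GPU is available, so the same code runs on both. The variable names are illustrative.

```python
import torch

# Use the GPU when available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

t = torch.randn(1000, 1000, device=device)   # created directly on the device
u = torch.randn(1000, 1000).to(device)       # created on the CPU, then moved

v = t @ u              # both operands must live on the same device
v_cpu = v.cpu()        # shorthand for v.to('cpu')
print(device, v.device, v_cpu.device)
```

Mixing a CPU tensor and a GPU tensor in a single operation raises an error, which is why both operands are moved to `device` first.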

Matrix multiplication is much faster on the GPU

In [14]:
a, b = torch.randn(500, 100, device='cuda'), torch.randn(100, 500, device='cuda')

In [59]:
%%timeit
a @ b

13.9 µs ± 42.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [60]:
a, b = torch.randn(500, 100), torch.randn(100, 500)

In [61]:
%%timeit
a @ b

116 µs ± 814 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


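Around 8 times slower on the CPU in this measurement (116 µs vs 13.9 µs)! Note that CUDA operations run asynchronously, so timing them without calling torch.cuda.synchronize() may underestimate the actual GPU time.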
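
For a measurement that waits for the GPU to finish, one option is to time the loop manually and synchronize before reading the clock. This is only a rough sketch: `time_matmul` is a hypothetical helper written for this example, and the number of runs is arbitrary.

```python
import time
import torch

def time_matmul(device, n_runs=100):
    """Average time (in seconds) of a (500x100) @ (100x500) product on `device`."""
    a = torch.randn(500, 100, device=device)
    b = torch.randn(100, 500, device=device)
    a @ b                                    # warm-up run, not timed
    if device.type == 'cuda':
        torch.cuda.synchronize()             # wait for pending GPU work
    start = time.perf_counter()
    for _ in range(n_runs):
        a @ b
    if device.type == 'cuda':
        torch.cuda.synchronize()             # make sure all products are done
    return (time.perf_counter() - start) / n_runs

cpu_time = time_matmul(torch.device('cpu'))
print(f"cpu : {cpu_time * 1e6:.1f} µs")
if torch.cuda.is_available():
    gpu_time = time_matmul(torch.device('cuda'))
    print(f"cuda: {gpu_time * 1e6:.1f} µs")
```

The actual numbers depend heavily on the hardware and on the matrix sizes; for small matrices the GPU advantage can shrink or disappear.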

## Manipulating tensors

### Indexing 

Indexing PyTorch tensors is similar to indexing NumPy arrays

In [15]:
t = torch.randn(5, 2)
t

tensor([[-0.3915, -2.4807],
        [-0.2013, -0.0578],
        [-0.0894,  0.6247],
        [-0.2787, -0.1527],
        [ 0.8075, -0.4570]])

Getting one element

In [16]:
t[0, 0]

tensor(-0.3915)

Getting sub-tensors (rows or columns for instance) 

In [17]:
t[:, 0], t[0, :]

(tensor([-0.3915, -0.2013, -0.0894, -0.2787,  0.8075]),
 tensor([-0.3915, -2.4807]))

In [18]:
rows = [0, 2, 4]
t[rows, :]

tensor([[-0.3915, -2.4807],
        [-0.0894,  0.6247],
        [ 0.8075, -0.4570]])

### Element wise operations

In [19]:
A, B = torch.zeros(5, 5), torch.ones(5, 5)

In [20]:
A + B

tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]])

In [21]:
A - B

tensor([[-1., -1., -1., -1., -1.],
        [-1., -1., -1., -1., -1.],
        [-1., -1., -1., -1., -1.],
        [-1., -1., -1., -1., -1.],
        [-1., -1., -1., -1., -1.]])

In [22]:
A * B

tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])

In [23]:
A / B

tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])

In [24]:
A == B

tensor([[False, False, False, False, False],
        [False, False, False, False, False],
        [False, False, False, False, False],
        [False, False, False, False, False],
        [False, False, False, False, False]])

Aggregating along a dimension

In [84]:
B.mean(dim=1), B.std(dim=1), B.sum(dim=1)

(tensor([1., 1., 1., 1., 1.]),
 tensor([0., 0., 0., 0., 0.]),
 tensor([5., 5., 5., 5., 5.]))
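
Since every entry of the tensor above is 1, it is hard to see which dimension is being reduced. Here is a small sketch on a non-constant tensor (the values are an arbitrary choice); `dim` is the dimension that disappears from the result, and `keepdim=True` keeps it with size 1.

```python
import torch

x = torch.arange(6.).reshape(2, 3)       # [[0., 1., 2.], [3., 4., 5.]]

col_sums = x.sum(dim=0)                  # reduces over rows:    [3., 5., 7.]
row_sums = x.sum(dim=1)                  # reduces over columns: [3., 12.]
row_means = x.mean(dim=1, keepdim=True)  # shape (2, 1): [[1.], [4.]]
total = x.sum()                          # no dim: reduces everything, 15.

print(col_sums, row_sums, row_means, total, sep="\n")
```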

### Matrix multiplication (2D tensors)

In [62]:
A, B = torch.randn(5, 4), torch.randn(4, 5)

In [63]:
A @ B

tensor([[-0.2753, -1.2055, -2.0163, -0.7923,  0.2999],
        [ 0.2324,  0.2577, -1.2488, -1.4069, -1.8431],
        [-0.3433,  2.5036,  1.2630, -0.0207, -1.7407],
        [-0.1832,  0.7080,  2.1643,  1.7384, -1.1997],
        [-0.4892,  2.5058,  0.1570, -0.9452, -0.4198]])

In [64]:
torch.mm(A, B)

tensor([[-0.2753, -1.2055, -2.0163, -0.7923,  0.2999],
        [ 0.2324,  0.2577, -1.2488, -1.4069, -1.8431],
        [-0.3433,  2.5036,  1.2630, -0.0207, -1.7407],
        [-0.1832,  0.7080,  2.1643,  1.7384, -1.1997],
        [-0.4892,  2.5058,  0.1570, -0.9452, -0.4198]])

In [66]:
A.mm(B)

tensor([[-0.2753, -1.2055, -2.0163, -0.7923,  0.2999],
        [ 0.2324,  0.2577, -1.2488, -1.4069, -1.8431],
        [-0.3433,  2.5036,  1.2630, -0.0207, -1.7407],
        [-0.1832,  0.7080,  2.1643,  1.7384, -1.1997],
        [-0.4892,  2.5058,  0.1570, -0.9452, -0.4198]])

In [67]:
torch.matmul(A, B)

tensor([[-0.2753, -1.2055, -2.0163, -0.7923,  0.2999],
        [ 0.2324,  0.2577, -1.2488, -1.4069, -1.8431],
        [-0.3433,  2.5036,  1.2630, -0.0207, -1.7407],
        [-0.1832,  0.7080,  2.1643,  1.7384, -1.1997],
        [-0.4892,  2.5058,  0.1570, -0.9452, -0.4198]])
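
The four calls above give the same result for 2D tensors, which can be checked programmatically as in the sketch below. It also illustrates two differences worth knowing: the inner dimensions must match, and `torch.matmul` (like `@`) accepts batched inputs whereas `torch.mm` is restricted to 2D tensors. The batched shapes are an arbitrary choice.

```python
import torch

A, B = torch.randn(5, 4), torch.randn(4, 5)

# All four spellings agree on 2D inputs
ref = A @ B
same = all(torch.allclose(ref, other)
           for other in (torch.mm(A, B), A.mm(B), torch.matmul(A, B)))
print(same)

# (5x4) @ (4x5) gives 5x5, while (4x5) @ (5x4) gives 4x4
print((A @ B).shape, (B @ A).shape)

# matmul also handles a leading batch dimension
batched = torch.matmul(torch.randn(3, 5, 4), torch.randn(3, 4, 5))
print(batched.shape)      # one 5x5 product per batch element
```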

## Broadcasting