##### Master Degree in Computer Science and Data Science for Economics

# Quick introduction to neural networks as function approximators

### Alfio Ferrara (materials from Darya Shlyk)

<p style="align: center;"><img src="http://upload.wikimedia.org/wikipedia/commons/9/96/Pytorch_logo.png" width=400 height=100></p>

## <h3 style="text-align: center;"><b>Installing PyTorch</b>

Go to [pytorch.org](https://pytorch.org/) and install the PyTorch build that matches your system (OS, package manager, compute platform).

In [4]:
import torch

PyTorch is a deep learning framework built on three main components.

$$ \text{PyTorch} = \text{NumPy} + \text{CUDA} + \text{Autograd} $$

PyTorch works with the ``tensor`` data structure, which is similar to NumPy's ``ndarray``, except that:
 1. tensors can run on GPUs or other hardware accelerators via [CUDA](http://en.wikipedia.org/wiki/CUDA)
 2. tensors are also optimized for automatic differentiation (Autograd)

A tensor is a multi-dimensional array:
- a scalar is a zero-dimensional tensor: `` x = 9 # shape ([])``
- a vector is a 1d tensor: `` y = [9, 5, 10] # shape ([3])``
- a matrix is a 2d tensor: `` z = [[9, 5, 10], [12, 6, 3]] # shape ([2, 3])``
- a 3d array is a 3d tensor: `` t = [[[9, 5, 10], [12, 6, 3]], [[7, 4, 11], [2, 13, 8]]] # shape ([2, 2, 3])``
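The literals in the list above are plain Python values. Wrapping each one in ``torch.tensor()`` turns it into a tensor whose ``ndim`` and ``shape`` match the comments; a minimal check:

```python
import torch

# Build each example from the list with torch.tensor and inspect its rank
x = torch.tensor(9)
y = torch.tensor([9, 5, 10])
z = torch.tensor([[9, 5, 10], [12, 6, 3]])
t = torch.tensor([[[9, 5, 10], [12, 6, 3]], [[7, 4, 11], [2, 13, 8]]])

for name, tensor in [("x", x), ("y", y), ("z", z), ("t", t)]:
    # ndim is the number of dimensions; shape lists the size of each one
    print(name, tensor.ndim, tuple(tensor.shape))
```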

## <h3 style="text-align: center;"><b> Creating a tensor</b>

There are a few ways to create a tensor, depending on your use case:
1. directly from pre-existing data, such as a Python list or other sequence, a NumPy array, etc.
2. creating a tensor with a specific shape, filled with random or constant values
3. from another tensor, keeping its shape and type

 To create a tensor with pre-existing data, use ``torch.tensor()``

In [5]:
data = [1, 2, 3]
torch.tensor(data)  # dtype is inferred from the data (int64 here); dtype=, device= and requires_grad= can also be passed

tensor([1, 2, 3])

### From NumPy to PyTorch and vice versa

In [6]:
import numpy as np
np_array = np.array(data)

t = torch.from_numpy(np_array)    # creates a tensor that shares storage with a NumPy array
t.numpy()                         # back to numpy

array([1, 2, 3])
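Because ``torch.from_numpy`` shares memory with the source array, an in-place change to the NumPy array is visible through the tensor, while ``torch.tensor`` makes an independent copy. A small sketch of the difference:

```python
import numpy as np
import torch

np_array = np.array([1, 2, 3])
t = torch.from_numpy(np_array)   # no copy: t and np_array share the same memory
copied = torch.tensor(np_array)  # torch.tensor always copies the data

np_array[0] = 100                # modify the NumPy array in place

print(t)        # the change is visible through t
print(copied)   # the copy is unaffected
```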

To create a tensor of a specific size filled with random or constant values, use the ``torch.*`` factory functions,\
passing the desired ``shape`` of the tensor.\
``shape`` can be given as a variable number of integers OR as a collection such as a list or tuple.


In [7]:
shape = (3, 2) 

### Sampling values uniformly from [0, 1)

In [8]:
random_uniform = torch.rand(shape)                          # FloatTensor from Uniform[0, 1)
print(random_uniform)         

tensor([[0.1948, 0.3282],
        [0.7330, 0.0548],
        [0.1732, 0.3220]])


### Sampling integers uniformly from [low, high)

In [9]:
random_int = torch.randint(low=2, high=7, size=shape)       # int64 tensor, integers drawn uniformly from [2, 7)
print(random_int)

tensor([[3, 4],
        [4, 5],
        [4, 5]])


### Sampling from normal distribution

In [10]:
random_normal = torch.randn(shape)                          # FloatTensor from Normal(mean=0, std=1)
random_normal = torch.normal(mean=5, std=2, size=shape)     # FloatTensor from Normal(mean=5, std=2)
print(random_normal)

tensor([[3.4280, 2.5082],
        [4.3758, 3.7108],
        [6.0956, 5.8780]])


### Dummy tensors of zeros and ones

In [11]:
zero_tensor = torch.zeros(shape)                             # initialize with zeros
ones_tensor = torch.ones(shape)                              # initialize with ones
print(ones_tensor)

tensor([[1., 1.],
        [1., 1.],
        [1., 1.]])


### Tensors of a specific data type

In [12]:
# legacy constructors: the shape MUST be passed as separate integers,
# and the memory is NOT initialized, so the values are arbitrary
float_tensor = torch.FloatTensor(3, 2)                    # dtype = float32, same as torch.Tensor(3, 2)
int_tensor = torch.IntTensor(3, 2)                        # dtype = int32
long_tensor = torch.LongTensor(3, 2)                      # dtype = int64
bool_tensor = torch.BoolTensor(3, 2)                      # dtype = bool
print(bool_tensor)

tensor([[False, False],
        [False, False],
        [False, False]])
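The legacy constructors above still work, but the same tensors can be created with the ordinary factory functions and an explicit ``dtype``. A sketch under that assumption: ``torch.empty`` reproduces the uninitialized behaviour, while ``torch.zeros``/``torch.ones`` guarantee defined values.

```python
import torch

shape = (3, 2)

# Uninitialized memory, like the legacy constructors: contents are arbitrary
float_tensor = torch.empty(shape, dtype=torch.float32)
# Explicitly initialized tensors of a chosen dtype
int_tensor = torch.zeros(shape, dtype=torch.int32)
long_tensor = torch.ones(shape, dtype=torch.int64)
bool_tensor = torch.zeros(shape, dtype=torch.bool)

for t in (float_tensor, int_tensor, long_tensor, bool_tensor):
    print(t.dtype, tuple(t.shape))
```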


To create a tensor with the same size and type as another tensor, use ``torch.*_like`` 

In [13]:
t = torch.tensor(data, dtype=torch.float32)

new_tensor = torch.randn_like(t)                          # retains the shape and dtype of t
new_tensor = torch.rand_like(t)                           # retains the shape and dtype of t
new_tensor = torch.ones_like(t)                           # retains the shape and dtype of t
new_tensor = torch.zeros_like(t, dtype=torch.int)         # can override the datatype 
print(new_tensor)

tensor([0, 0, 0], dtype=torch.int32)


<h4 style="text-align: center;"><b> Changing the content of an existing tensor </b>

Any PyTorch method whose name ends with an underscore ``_`` (e.g. ``zero_()``, ``fill_()``) modifies the tensor in place.

In [14]:
t.zero_()                   # zero out tensor's content
t.uniform_()                # replace with values from a continuous random Uniform[from, to)
t.normal_(mean=3)           # replace with values from a Normal(mean, std) 
t.fill_(3)                  # fill-in tensor with a constant value

tensor([3., 3., 3.])
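Most in-place methods have an out-of-place twin without the underscore that returns a new tensor and leaves the original untouched. A short comparison using ``add``/``add_``:

```python
import torch

t = torch.zeros(3)

out = t.add(1)   # out-of-place: returns a new tensor, t is unchanged
print(t, out)    # tensor([0., 0., 0.]) tensor([1., 1., 1.])

t.add_(1)        # in-place: modifies t itself (and returns it)
print(t)         # tensor([1., 1., 1.])
```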

## <h3 style="text-align: center;"><b> Attributes of a Tensor </b>

Tensor attributes describe their shape, datatype, and the device on which they are stored.

In [15]:
print(t)
print("Shape:",   t.size()) # same as t.shape
print("Type:",    t.type())   # string form; t.dtype gives the torch.dtype (torch.float32)
print("Device:",  t.device)

tensor([3., 3., 3.])
Shape: torch.Size([3])
Type: torch.FloatTensor
Device: cpu


<h4 style="text-align: center;"><b> Changing a tensor's type </b>

In [16]:
t = t.to(torch.float)
print(t)
print("Type:", t.type())

tensor([3., 3., 3.])
Type: torch.FloatTensor
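The cell above converts to the dtype the tensor already has, so nothing visibly changes. Converting between float and integer types does change the values; a sketch with the ``.to()`` method and its shorthands ``.long()`` and ``.float()``:

```python
import torch

t = torch.tensor([3.7, -1.2, 0.5])

as_int = t.to(torch.int32)   # float -> int truncates toward zero: values 3, -1, 0
as_long = t.long()           # shorthand for t.to(torch.int64)
back = as_int.float()        # shorthand for as_int.to(torch.float32)

print(as_int)
print(as_long.dtype, back.dtype)
```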


Changing a tensor's device

By default, tensors are allocated in CPU memory, and operations on them run on the CPU.\
If you have an NVIDIA GPU, PyTorch can use it through CUDA; on Apple Silicon, the equivalent backend is MPS (used below).\
Move tensors between devices with the ``.to(device)`` method.

In [17]:
print(f"CUDA: {torch.cuda.is_available()}")
print(f"MPS: {torch.mps.is_available()}")

CUDA: False
MPS: True


In [20]:
device = "mps" if torch.mps.is_available() else "cpu"
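The line above only considers MPS and the CPU. A hypothetical helper ``pick_device`` (the name is ours, not a PyTorch API) that also covers NVIDIA GPUs might look like this; it uses ``torch.backends.mps.is_available()``, the long-standing way to check for MPS.

```python
import torch

def pick_device() -> torch.device:
    """Prefer CUDA, then Apple's MPS backend, and fall back to the CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
x = torch.ones(2, 2, device=device)  # allocate directly on the chosen device
print(device, x.device)
```

Passing ``device=`` at creation time avoids allocating on the CPU first and then copying.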

### Move tensor to device

In [21]:
t = t.to(device) # t.cpu() or t.cuda()
print("Device:",  t.device)

Device: mps:0


### Perform computations on GPU and move the results to CPU

In [22]:
a = torch.randn(100, 100).to(device)
b = torch.clone(a).to(device)
c = (a + b).cpu()
print(c, c.device)

tensor([[ 0.4959, -2.2448,  1.6864,  ...,  1.3226,  2.5204,  0.6146],
        [-2.0256, -1.4225,  2.1130,  ...,  6.8592,  2.3876, -3.1658],
        [-2.0399, -2.4856,  0.8643,  ...,  2.1949, -0.5820,  1.1643],
        ...,
        [-1.0166, -0.6130,  1.4332,  ...,  1.6380,  1.9591,  0.0935],
        [-1.2175, -0.9228,  2.7295,  ..., -0.4011, -1.6465,  3.1865],
        [-2.0832, -1.0881,  1.4536,  ...,  2.7709, -3.3966, -0.4639]]) cpu


### To perform computations, all tensors must be on the same device!

In [23]:
c + b # will raise an error: expected all tensors to be on the same device, but found two devices

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, mps:0 and cpu!

<h3 style="text-align: center;"><b> Example </b>

In [25]:
if torch.mps.is_available():
    device = "mps"
else:
    device = "cpu"

x = torch.FloatTensor(5, 2).uniform_().to(device)
y = torch.ones_like(x, device=device)
z = x + y
z = z.to("cpu", torch.float32)
print(z)

tensor([[1.3459, 1.1779],
        [1.9461, 1.3083],
        [1.3735, 1.9031],
        [1.4729, 1.3708],
        [1.0834, 1.0502]])


## <h3 style="text-align: center;"><b> Reshaping a tensor </b>

In [26]:
t = torch.arange(0, 10, step=1) # integers in [0, 10) with step 1
t

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

### Reshape from 1d to 2d 

``[10] -> [2, 5]``

In [28]:
y = t.view(2, 5)      # y always shares storage with t (requires a compatible memory layout)
y = t.reshape(2, 5)   # returns a view of t when possible, otherwise a copy
y

tensor([[0, 1, 2, 3, 4],
        [5, 6, 7, 8, 9]])
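The difference between ``view`` and ``reshape`` shows up after a transpose, which keeps the same storage but makes the tensor non-contiguous. A sketch:

```python
import torch

t = torch.arange(10)
v = t.view(2, 5)
v[0, 0] = 100              # writing through the view...
print(t[0])                # ...changes the original: tensor(100)

tt = v.t()                 # transpose: same storage, but no longer contiguous
print(tt.is_contiguous())  # False

try:
    tt.view(10)            # view needs a memory layout compatible with the new shape
except RuntimeError as err:
    print("view failed:", type(err).__name__)

flat = tt.reshape(10)      # reshape falls back to a copy when a view is impossible
print(flat)
```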

### Add a new dimension of size 1

``[2, 5] -> [2, 5, 1]``

In [29]:
b = y.unsqueeze(2) # same as y.unsqueeze(-1) where -1 means the last dimension 
print("Shape:", b.shape)
print(b)

Shape: torch.Size([2, 5, 1])
tensor([[[0],
         [1],
         [2],
         [3],
         [4]],

        [[5],
         [6],
         [7],
         [8],
         [9]]])


Expand dimension(s) of size 1 to a larger size

``[2, 5, 1] -> [2, 5, 10]``

In [30]:
b.expand(-1, -1, 10) # -1 means not changing the size of that dimension

tensor([[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
         [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
         [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
         [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
         [4, 4, 4, 4, 4, 4, 4, 4, 4, 4]],

        [[5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
         [6, 6, 6, 6, 6, 6, 6, 6, 6, 6],
         [7, 7, 7, 7, 7, 7, 7, 7, 7, 7],
         [8, 8, 8, 8, 8, 8, 8, 8, 8, 8],
         [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]]])
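``expand`` does not copy any data: it returns a view whose stride along the expanded dimension is 0. ``repeat`` produces the same values as a real copy. A sketch comparing the two:

```python
import torch

b = torch.arange(10).reshape(2, 5).unsqueeze(2)   # shape [2, 5, 1], as in the cells above

expanded = b.expand(-1, -1, 10)   # a view: no data is copied
repeated = b.repeat(1, 1, 10)     # a real copy with the same values

print(torch.equal(expanded, repeated))   # True: identical contents
print(expanded.stride())                 # stride 0 along the expanded dimension
print(repeated.stride())                 # ordinary contiguous strides
```

Because an expanded tensor aliases the same memory many times, writing into it in place is unsafe; call ``.clone()`` or use ``repeat`` when you need an independent copy.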

Exchange the order of dimensions 

``[2, 5, 1] -> [1, 5, 2]``

In [31]:
x = b.transpose(0, 2)    # same as b.transpose(2, 0): swaps the two given dimensions
x = b.permute(2, 1, 0)   # specify the desired ordering for all dimensions
print("Shape:", x.shape)
print(x)

Shape: torch.Size([1, 5, 2])
tensor([[[0, 5],
         [1, 6],
         [2, 7],
         [3, 8],
         [4, 9]]])


Remove dimension(s) of size 1

``[1, 5, 2] -> [5, 2]``

In [32]:
y = x.squeeze(0)
print("Shape:", y.shape)
print(y)

Shape: torch.Size([5, 2])
tensor([[0, 5],
        [1, 6],
        [2, 7],
        [3, 8],
        [4, 9]])


Flatten 

``[5, 2] -> [10]``

In [33]:
# flatten: -1 lets PyTorch infer the size (same as y.flatten())
t = y.reshape(-1)
t

tensor([0, 5, 1, 6, 2, 7, 3, 8, 4, 9])

## <h3 style="text-align: center;"><b> Arithmetic operations on tensors </b>

| Python operator | PyTorch method |
|:-:|:-:|
|+| torch.add() |
|-| torch.sub() |
|*| torch.mul() |
|/| torch.div() |
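Each operator in the table is shorthand for the corresponding ``torch`` function, and a Python scalar operand is broadcast to every element. A quick check:

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])

# Each Python operator gives the same result as the corresponding torch function
assert torch.equal(a + b, torch.add(a, b))
assert torch.equal(a - b, torch.sub(a, b))
assert torch.equal(a * b, torch.mul(a, b))
assert torch.equal(a / b, torch.div(a, b))

# A Python scalar is broadcast to every element
print(a * 2)   # tensor([2., 4., 6.])
```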

### Element-wise operations on tensors

In [34]:
t + t
t - t
t * t
t / t

tensor([nan, 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [35]:
t.log()

tensor([  -inf, 1.6094, 0.0000, 1.7918, 0.6931, 1.9459, 1.0986, 2.0794, 1.3863,
        2.1972])

In [36]:
t.exp()

tensor([1.0000e+00, 1.4841e+02, 2.7183e+00, 4.0343e+02, 7.3891e+00, 1.0966e+03,
        2.0086e+01, 2.9810e+03, 5.4598e+01, 8.1031e+03])

In [37]:
t.sqrt()

tensor([0.0000, 2.2361, 1.0000, 2.4495, 1.4142, 2.6458, 1.7321, 2.8284, 2.0000,
        3.0000])

In [38]:
t.pow(2)

tensor([ 0, 25,  1, 36,  4, 49,  9, 64, 16, 81])

## <h3 style="text-align: center;"><b>[Dot product vs Matrix Multiplication](https://mkang32.github.io/python/2020/08/30/numpy-matmul.html) </b>

### Dot product
$$
\begin{bmatrix}
a_1 & a_2
\end{bmatrix}
\begin{bmatrix}
b_1 \\ b_2
\end{bmatrix}
=
a_1 b_1 + a_2 b_2
$$

### Matrix multiplication
$$
\begin{bmatrix}
a_{11} & a_{12} \\
a_{21} & a_{22} \\
\end{bmatrix}
\begin{bmatrix}
b_{11} & b_{12} \\
b_{21} & b_{22} \\
\end{bmatrix}
=
\begin{bmatrix}
a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22} \\
a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22} \\
\end{bmatrix}

$$
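The formula says that entry $(i, j)$ of the product is the dot product of row $i$ of the first matrix with column $j$ of the second. A deliberately naive sketch (the helper name ``matmul_loops`` is ours, for illustration only; in practice use ``@``) spells this out with loops and checks it against the built-in operator:

```python
import torch

def matmul_loops(A: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
    """Entry (i, j) is the dot product of row i of A with column j of B."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "columns of A must equal rows of B"
    C = torch.zeros(n, m, dtype=A.dtype)
    for i in range(n):
        for j in range(m):
            C[i, j] = sum(A[i, p] * B[p, j] for p in range(k))
    return C

A = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
B = torch.tensor([[5.0, 6.0], [7.0, 8.0]])
print(matmul_loops(A, B))   # values [[19, 22], [43, 50]]
print(A @ B)                # same result with the built-in operator
```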

### Dot product

Computes the inner product of 1D tensors.\
``(3) @ (3) -> ()`` (the result is a zero-dimensional tensor)

In [42]:
t = torch.Tensor([2, 4, 6])

print(t.size())
print(t.dot(t), t.dot(t).size())                                           # same as (t * t).sum()                             

torch.Size([3])
tensor(56.) torch.Size([])


### Matrix multiplication

 Performs a matrix multiplication of 2D tensors.\
**The number of columns in the first tensor must match the number of rows in the second tensor!** \
 ``(2, 5) @ (5, 2) -> (2, 2) ``

In [43]:
torch.mm(y.transpose(0, 1), y)                      # same as y.T @ y   

tensor([[ 30,  80],
        [ 80, 255]])

Batch matrix multiplication

Performs a batch matrix-matrix product of 3D tensors. \
``(2, 5, 1) @ (2, 1, 5) -> (2, 5, 5)``

In [44]:
batch_product = torch.bmm(b, b.transpose(1, 2))     # same as b @ b.transpose(1, 2)
batch_product

tensor([[[ 0,  0,  0,  0,  0],
         [ 0,  1,  2,  3,  4],
         [ 0,  2,  4,  6,  8],
         [ 0,  3,  6,  9, 12],
         [ 0,  4,  8, 12, 16]],

        [[25, 30, 35, 40, 45],
         [30, 36, 42, 48, 54],
         [35, 42, 49, 56, 63],
         [40, 48, 56, 64, 72],
         [45, 54, 63, 72, 81]]])

Universal solution for matrix multiplication

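``torch.matmul(tensor1, tensor2)`` is the PyTorch equivalent of the Python operator ``@``: depending on the number of dimensions of its inputs, it computes a dot product, a matrix product, or a batched matrix product.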

In [45]:
torch.matmul(t, t)                                  # same as torch.dot(t, t)
torch.matmul(y.T, y)                                # same as torch.mm(y.T, y)
torch.matmul(b, b.transpose(1, 2))                  # same as torch.bmm(b, b.transpose(1, 2))

tensor([[[ 0,  0,  0,  0,  0],
         [ 0,  1,  2,  3,  4],
         [ 0,  2,  4,  6,  8],
         [ 0,  3,  6,  9, 12],
         [ 0,  4,  8, 12, 16]],

        [[25, 30, 35, 40, 45],
         [30, 36, 42, 48, 54],
         [35, 42, 49, 56, 63],
         [40, 48, 56, 64, 72],
         [45, 54, 63, 72, 81]]])
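``matmul`` also handles a mix of 1D and 2D inputs: a 1D first argument is treated as a row vector and a 1D second argument as a column vector, and the temporary dimension is removed from the result. A sketch of the resulting shapes:

```python
import torch

v = torch.tensor([1.0, 2.0, 3.0])                    # shape [3]
M = torch.arange(6, dtype=torch.float32).view(3, 2)  # shape [3, 2]

print(torch.matmul(v, M).shape)     # [3] @ [3, 2] -> [2]: v acts as a row vector
print(torch.matmul(M.T, v).shape)   # [2, 3] @ [3] -> [2]: v acts as a column vector
print(torch.matmul(v, v).shape)     # [3] @ [3] -> []: a dot product
```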

## <h3 style="text-align: center;"><b> Aggregating operations on tensors </b>

<h4 style="text-align: center;"><b> sum (dim) / mean (dim) </b>

In [46]:
batch_product.sum(dim=-1)            # sum over the last dimension (columns): one value per row

tensor([[  0,  10,  20,  30,  40],
        [175, 210, 245, 280, 315]])

In [47]:
batch_product.sum(dim=-2)            # sum over the second-to-last dimension (rows): one value per column

tensor([[  0,  10,  20,  30,  40],
        [175, 210, 245, 280, 315]])
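The two sums above coincide only because each matrix in ``batch_product`` is symmetric. On a non-symmetric matrix the difference is visible, and ``keepdim=True`` retains the reduced axis with size 1:

```python
import torch

m = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])      # shape [2, 3]

print(m.sum(dim=-1))   # sum across columns, one value per row: 6 and 15
print(m.sum(dim=-2))   # sum across rows, one value per column: 5, 7 and 9
print(m.sum(dim=-1, keepdim=True).shape)   # keepdim retains the reduced axis: [2, 1]
```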

When an aggregation yields a one-element tensor,\
you can convert it to a Python number using ``item()``.

In [48]:
batch_product.sum().item()

1325

<h4 style="text-align: center;"><b> max (dim) / min (dim)</b>

Returns a **namedtuple (values, indices)**, \
where ``values`` holds the maximum/minimum of the tensor along the given dimension ``dim``,\
and ``indices`` holds the position of each maximum/minimum value found (what ``argmax``/``argmin`` would return).

In [49]:
values, indices = batch_product.max(dim=1)
print(values)
print(indices)

tensor([[ 0,  4,  8, 12, 16],
        [45, 54, 63, 72, 81]])
tensor([[0, 4, 4, 4, 4],
        [4, 4, 4, 4, 4]])


In [50]:
batch_product.max() 

tensor(81)

<h4 style="text-align: center;"><b> argmax (dim) / argmin (dim)</b> 

Returns the **indices** of the maximum/minimum values of a tensor across a dimension.

In [53]:
batch_product.argmax(dim=1)

tensor([[0, 4, 4, 4, 4],
        [4, 4, 4, 4, 4]])

<h4 style="text-align: center;"><b>topk ( k, dim, largest )</b> 

Returns a **namedtuple of (values, indices)** \
with values and indices of ``k`` largest/smallest elements of a tensor along ``dim``.

In [54]:
values, indices = batch_product.topk(k=1, dim=1, largest=False)
print("Values:", values)
print("Indices:", indices)

Values: tensor([[[ 0,  0,  0,  0,  0]],

        [[25, 30, 35, 40, 45]]])
Indices: tensor([[[0, 0, 0, 0, 0]],

        [[0, 0, 0, 0, 0]]])


<h3 style="text-align: center;"><b> Standard NumPy-style indexing and slicing </b>

In [55]:
tensor = torch.arange(15).view(5, 3)

print(f"First row: {tensor[0]}")
print(f"First column: {tensor[:, 0]}")
print(f"Last column: {tensor[..., -1]}")

tensor[:,1] = 0 # zero-out second column
print(tensor)

First row: tensor([0, 1, 2])
First column: tensor([ 0,  3,  6,  9, 12])
Last column: tensor([ 2,  5,  8, 11, 14])
tensor([[ 0,  0,  2],
        [ 3,  0,  5],
        [ 6,  0,  8],
        [ 9,  0, 11],
        [12,  0, 14]])


<h3 style="text-align: center;"><b> Masked indexing</b>

In [56]:
a = torch.arange(9)
b = torch.clone(a)      # creates a copy of tensor a

# pick 4 random indices (duplicates are possible) and negate those entries
negative_indices = torch.randint(low=0, high=9, size=(4,))
b[negative_indices] = b[negative_indices] * (-1) 

# reshaping from 1d to 2d
a = a.view(3, 3)
b = b.view(3, 3)

print(a)
print(b)

tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])
tensor([[ 0, -1, -2],
        [ 3,  4, -5],
        [ 6,  7, -8]])


### Computes element-wise equality

In [57]:
a.eq(b)                   # same as a == b 

tensor([[ True, False, False],
        [ True,  True, False],
        [ True,  True, False]])

Computes element-wise inequality

In [58]:
a.ne(b)                    # same as a != b 

tensor([[False,  True,  True],
        [False, False,  True],
        [False, False,  True]])

Selecting elements via boolean mask

In [59]:
b[a > b]

tensor([-1, -2, -5, -8])
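A boolean mask can also drive replacement rather than selection. The sketch below hard-codes the ``b`` printed above (the cell draws random indices, so your ``b`` may differ) and uses ``masked_fill`` and ``torch.where``, both of which return new tensors:

```python
import torch

b = torch.tensor([[0, -1, -2],
                  [3, 4, -5],
                  [6, 7, -8]])

mask = b < 0                          # boolean tensor with the same shape as b

zeroed = b.masked_fill(mask, 0)       # negatives replaced by 0
abs_b = torch.where(mask, -b, b)      # -b where mask is True, b elsewhere

print(zeroed)
print(abs_b)
```

The in-place alternative ``b[mask] = 0`` modifies ``b`` directly.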

## <h3 style="text-align: center;"><b> Joining multiple tensors </b>

In [60]:
# create 2 tensors 
t1 = torch.arange(6).view(2, 3)
t2 = torch.arange(6).view(2, 3)
t1, t2

(tensor([[0, 1, 2],
         [3, 4, 5]]),
 tensor([[0, 1, 2],
         [3, 4, 5]]))

<h4 style="text-align: center;"><b>torch.cat ( (tensors), dim=0 )</b> 

In [61]:
print("Concat tensors one on top of the other (default):")
print(torch.cat((t1, t2), dim=0))

Concat tensors one on top of the other (default):
tensor([[0, 1, 2],
        [3, 4, 5],
        [0, 1, 2],
        [3, 4, 5]])


In [62]:
print("Concat tensors side by side:")
print(torch.cat((t1, t2), dim=1))

Concat tensors side by side:
tensor([[0, 1, 2, 0, 1, 2],
        [3, 4, 5, 3, 4, 5]])


<h4 style="text-align: center;"><b>torch.stack ( (tensors), dim=0 )</b> 

``dim`` is the position at which the new dimension is inserted

In [63]:
torch.stack((t1, t2), dim=0)

tensor([[[0, 1, 2],
         [3, 4, 5]],

        [[0, 1, 2],
         [3, 4, 5]]])

**Difference between ``torch.stack`` and ``torch.cat``**:\
``torch.stack`` creates a new dimension along which the tensors are stacked, while ``torch.cat`` joins them along an existing dimension.
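Comparing the resulting shapes makes the distinction concrete: ``cat`` grows an existing dimension, while ``stack`` adds a new one of size 2 (one slot per input tensor).

```python
import torch

t1 = torch.arange(6).view(2, 3)
t2 = torch.arange(6).view(2, 3)

print(torch.cat((t1, t2), dim=0).shape)    # [4, 3]: existing dim 0 grows
print(torch.cat((t1, t2), dim=1).shape)    # [2, 6]: existing dim 1 grows
print(torch.stack((t1, t2), dim=0).shape)  # [2, 2, 3]: new leading dim
print(torch.stack((t1, t2), dim=2).shape)  # [2, 3, 2]: new trailing dim
```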

## <h3 style="text-align: center;"><b> Practice </b>

1. Create a 2D tensor and add a batch dimension of size 1
2. Create a random tensor of shape 5x3 in the interval [3, 7)
3. Create a tensor with values drawn from a normal distribution with mean=0, std=3
4. Perform a batch product between 3D tensors
5. Return a batch matrix product between a 3D tensor and a 2D tensor