## 1. DEEP LEARNING
- Machine learning turns things (data) into numbers and finds patterns in those numbers. 
In traditional programming, you start from inputs and a set of rules and produce outputs; in machine learning, you start from inputs and the desired outputs, and the algorithm finds the underlying rules.  
- Why would you use machine learning? --> For a complex problem, there can be too many rules to write by hand. As long as you can convert a problem into numbers, you can use machine learning.
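
To make the contrast concrete, here is a minimal sketch. The toy rule `y = 2x + 1` and the names `rule_based`, `design`, `slope` and `intercept` are assumptions made for illustration, not part of the original notes. The function hard-codes the rule (traditional programming), while the least-squares fit recovers the same rule from inputs and outputs only (the machine learning direction).

```python
import torch

# Traditional programming: the rule is written by hand (toy rule, assumed for illustration)
def rule_based(x):
    return 2 * x + 1

# Machine learning direction: only inputs and desired outputs are given
inputs = torch.tensor([0.0, 1.0, 2.0, 3.0, 4.0])
outputs = rule_based(inputs)

# Build a design matrix with columns [x, 1] and solve for [slope, intercept]
design = torch.stack([inputs, torch.ones_like(inputs)], dim=1)        # shape (5, 2)
solution = torch.linalg.lstsq(design, outputs.unsqueeze(1)).solution  # shape (2, 1)
slope, intercept = solution.squeeze(1).tolist()

print(f"Recovered rule: y = {slope:.2f} * x + {intercept:.2f}")
```

A neural network does something similar in spirit, but with many more parameters and an iterative search instead of a closed-form solve.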

### Rule #1 of ML
If you can build a simple rule-based system that doesn't require machine learning, do that. Machine learning is not a solution for everything (from the Google handbook).
- DEEP LEARNING: Use it when the traditional approach fails, when the system has to adapt to new scenarios, or when you want to discover insights within a large collection of data. 
- BUT: Deep learning is a poor fit if you need explainability, because the patterns learned can be uninterpretable by a human. The outputs of a deep learning model aren't always predictable: they are probabilistic, so errors are possible. Don't use it when you don't have enough data. 

###  Machine Learning VS Deep Learning
Machine learning usually suits structured data (such as spreadsheets), while deep learning is usually better for unstructured data (images, audio files, text).
- For machine learning, you use algorithms such as random forest, nearest neighbour, SVM.  
- For deep learning, you use neural networks such as CNNs, Transformers, RNNs.  
Depending on how you represent your problem, many algorithms can be used for both.

### What are Neural Networks
Before data (the input) is used by a neural network (NN), it has to be turned into numbers (known as numerical encoding). You then feed those numbers to the NN, which should have an architecture appropriate for the problem. The network learns a representation (patterns/features/weights); a feature can be almost anything. Finally, you can convert the numerical outputs into human-understandable terms. 
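
As a small sketch of numerical encoding, the snippet below turns text labels into integers and one-hot vectors, then maps a numerical output back to a readable label. The class names and labels are hypothetical, chosen only for illustration; one-hot encoding is just one of several possible encodings.

```python
import torch
import torch.nn.functional as F

# Hypothetical class names and labels, used only for illustration
class_names = ["cat", "dog", "bird"]
labels = ["dog", "cat", "bird", "dog"]

# Map each label to an integer index, then to a one-hot vector
label_indices = torch.tensor([class_names.index(label) for label in labels])
one_hot = F.one_hot(label_indices, num_classes=len(class_names))
print(label_indices)  # integer encoding
print(one_hot)        # one-hot encoding, shape (4, 3)

# Convert a numerical output back into a human-understandable term
predicted_index = one_hot[0].argmax().item()
print(class_names[predicted_index])
```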
#### Anatomy of a neural network
Input layer, hidden layer(s), output layer (which outputs the learned representation or the prediction probabilities).  
Each layer is usually a combination of linear and non-linear functions. 
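
The anatomy above can be sketched with `torch.nn.Sequential`. The layer sizes (2 inputs, 8 hidden units, 1 output) and the choice of `nn.ReLU()` as the non-linear function are arbitrary assumptions for illustration.

```python
import torch
from torch import nn

torch.manual_seed(42)

# Layer sizes are arbitrary choices for this sketch
model = nn.Sequential(
    nn.Linear(in_features=2, out_features=8),  # input -> hidden (linear)
    nn.ReLU(),                                 # non-linear function
    nn.Linear(in_features=8, out_features=1),  # hidden -> output (linear)
)

batch = torch.rand(5, 2)     # 5 samples with 2 features each
predictions = model(batch)   # one output value per sample
print(predictions.shape)
```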

### Types of Learning
In supervised learning, you have a lot of input data together with the expected output for each input, also known as labels. In unsupervised learning, on the other hand, you don't have labels, so the algorithm learns patterns from the similarities between different data samples. Last but not least, there is transfer learning, in which you take the patterns a model has already learned on a different task and reuse them for your own task. 

### Deep Learning Applications
Recommendations on YouTube ("the algorithm"), translation, speech recognition, computer vision (object detection), and natural language processing (for example, detecting whether an email is spam or not). You can do classification/regression tasks. 

## 2. PyTorch
PyTorch is a popular deep learning research framework for Python. It gives you access to many pre-built deep learning models, for example through torchvision. It was originally developed at Facebook/Meta, but it is now also used by companies such as Tesla and Microsoft. 58% of the work done in computer vision is done in PyTorch (figure to be checked against the paperswithcode trends for 2022).

What is a GPU/TPU? A GPU is a Graphics Processing Unit, which is very fast at numerical calculations. CUDA is NVIDIA's parallel computing platform and application programming interface for running code on GPUs. A TPU is a Tensor Processing Unit. 
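
A common pattern, sketched here, is to pick the device at runtime so the same code runs with or without a GPU. The variable name `device` is just a convention, not something PyTorch requires.

```python
import torch

# Use the GPU when CUDA is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Create the tensor directly on the chosen device
tensor = torch.tensor([1, 2, 3], device=device)
print(tensor.device)
```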

[**Tensor**](https://www.youtube.com/watch?v=f5liqUk0ZTw): It can represent almost any kind of numerical data; it is the fundamental building block of PyTorch. 
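
As a quick illustration, tensors can have different numbers of dimensions. The names `scalar`, `vector` and `matrix` below are only labels for this sketch.

```python
import torch

scalar = torch.tensor(7)                  # 0 dimensions
vector = torch.tensor([7, 7])             # 1 dimension
matrix = torch.tensor([[7, 8], [9, 10]])  # 2 dimensions

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    print(f"{name}: ndim={t.ndim}, shape={tuple(t.shape)}")
```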

In [3]:
import torch
print(torch.__version__)

1.13.0


### 2.1 Tensor Operations

In [4]:
# Create a tensor of values and add a number to it
tensor = torch.tensor([1, 2, 3])
tensor + 10

tensor([11, 12, 13])

In [5]:
# Multiply it by 10
tensor * 10

tensor([10, 20, 30])

Notice how the tensor values above didn't end up being `tensor([110, 120, 130])`. This is because the values inside the tensor don't change unless they're reassigned.  
Let's subtract a number and this time we'll reassign the `tensor` variable.

In [8]:
# Subtract and Reassign
tensor = tensor - 10
tensor

tensor([-9, -8, -7])

In [10]:
# Add and reassign
tensor = tensor + 10
tensor

tensor([1, 2, 3])

PyTorch also has a bunch of built-in functions like `torch.mul()` (short for multiplication; `torch.multiply()` is an alias for it) and `torch.add()` to perform basic operations. For example:  
``` python
torch.multiply(tensor, 10)
```
However, it's more common to use the operator symbols like `*` instead of `torch.mul()`.
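
A short check, as a sketch, that the operator and the function forms give the same result. The variable names are chosen only for this example.

```python
import torch

tensor = torch.tensor([1, 2, 3])

with_operator = tensor * 10
with_function = torch.mul(tensor, 10)
with_alias = torch.multiply(tensor, 10)  # alias of torch.mul

print(torch.equal(with_operator, with_function))
print(torch.equal(with_function, with_alias))
print(torch.add(tensor, 10))  # same as tensor + 10
```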

In [11]:
# Element-wise multiplication (each element multiplies its equivalent, index 0->0, 1->1, 2->2)
print(tensor, "*", tensor)
print("Equals:", tensor * tensor)

tensor([1, 2, 3]) * tensor([1, 2, 3])
Equals: tensor([1, 4, 9])


#### 2.1.1. Matrix multiplication 
PyTorch implements matrix multiplication in the `torch.matmul()` function. _Note:_ `@` is Python's operator for matrix multiplication. The difference between **element-wise multiplication** and **matrix multiplication** is that matrix multiplication also sums the element-wise products.
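
For two 1-dimensional tensors this can be checked by hand: multiply element by element, then add the results. A minimal sketch:

```python
import torch

tensor = torch.tensor([1, 2, 3])

element_wise = tensor * tensor                # [1*1, 2*2, 3*3]
summed = element_wise.sum()                   # 1 + 4 + 9
matmul_result = torch.matmul(tensor, tensor)  # does both steps at once

print(element_wise, summed, matmul_result)
```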

In [12]:
tensor = torch.tensor([1, 2, 3])
tensor.shape

torch.Size([3])

In [13]:
# Element-wise multiplication
print(tensor * tensor) 
# Matrix multiplication
print(torch.matmul(tensor, tensor))
# Can also use the "@" symbol for matrix multiplication, though not recommended
print(tensor @ tensor)

tensor([1, 4, 9])
tensor(14)
tensor(14)


### 2.2. Common Errors
Because much of deep learning is multiplying and performing operations on matrices, and matrices have strict rules about which shapes and sizes can be combined, one of the most common errors you'll run into in deep learning is a shape mismatch. The next code snippet is one example: it fails because the inner dimensions don't match.

``` python
# Shapes need to be in the right way  
tensor_A = torch.tensor([[1, 2],
                         [3, 4],
                         [5, 6]], dtype=torch.float32)

tensor_B = torch.tensor([[7, 10],
                         [8, 11], 
                         [9, 12]], dtype=torch.float32)

# Multiplication (3x2) x (3x2)
torch.matmul(tensor_A, tensor_B) # (this will error)
```
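
One way to see the rule in action is to check the inner dimensions before multiplying. `can_matmul` below is a hypothetical helper written for this sketch (it only handles 2D tensors), not a PyTorch function.

```python
import torch

def can_matmul(a: torch.Tensor, b: torch.Tensor) -> bool:
    """Hypothetical helper: True if two 2D tensors have matching inner dimensions."""
    return a.ndim == 2 and b.ndim == 2 and a.shape[1] == b.shape[0]

tensor_A = torch.rand(3, 2)
tensor_B = torch.rand(3, 2)

print(can_matmul(tensor_A, tensor_B))    # (3, 2) @ (3, 2): inner dimensions 2 and 3 differ
print(can_matmul(tensor_A, tensor_B.T))  # (3, 2) @ (2, 3): inner dimensions match

# The mismatch surfaces as a RuntimeError
try:
    torch.matmul(tensor_A, tensor_B)
except RuntimeError as error:
    print(f"Shape mismatch: {error}")
```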

`torch.mm()` can also be used in place of `torch.matmul()` when both inputs are 2D matrices (unlike `torch.matmul()`, it does not broadcast).  
In this example, the shapes can be made compatible. One way to do this is with a **transpose** (switching the dimensions of a given tensor).

You can perform transposes in PyTorch using either:

- `torch.transpose(input, dim0, dim1)` - where input is the desired tensor to transpose and dim0 and dim1 are the dimensions to be swapped.
- `tensor.T` - where tensor is the desired tensor to transpose.
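
For a 2D tensor the two options give the same result, as this small sketch shows (the variable names are chosen only for the example):

```python
import torch

tensor_B = torch.tensor([[7, 10],
                         [8, 11],
                         [9, 12]], dtype=torch.float32)

transposed_fn = torch.transpose(tensor_B, 0, 1)  # swap dimension 0 and dimension 1
transposed_attr = tensor_B.T

print(transposed_fn.shape)
print(torch.equal(transposed_fn, transposed_attr))
```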

In [18]:
tensor_A = torch.tensor([[1, 2],
                         [3, 4],
                         [5, 6]], dtype=torch.float32)

tensor_B = torch.tensor([[7, 10],
                         [8, 11], 
                         [9, 12]], dtype=torch.float32)

# View tensor_A and tensor_B
print(tensor_A)
print(tensor_B.T)

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])
tensor([[ 7.,  8.,  9.],
        [10., 11., 12.]])


In [19]:
# The operation works when tensor_B is transposed
print(f"Original shapes: tensor_A = {tensor_A.shape}, tensor_B = {tensor_B.shape}\n")
print(f"New shapes: tensor_A = {tensor_A.shape} (same as above), tensor_B.T = {tensor_B.T.shape}\n")
print(f"Multiplying: {tensor_A.shape} * {tensor_B.T.shape} <- inner dimensions match\n")
print("Output:\n")
output = torch.matmul(tensor_A, tensor_B.T)
print(output) 
print(f"\nOutput shape: {output.shape}")

Original shapes: tensor_A = torch.Size([3, 2]), tensor_B = torch.Size([3, 2])

New shapes: tensor_A = torch.Size([3, 2]) (same as above), tensor_B.T = torch.Size([2, 3])

Multiplying: torch.Size([3, 2]) * torch.Size([2, 3]) <- inner dimensions match

Output:

tensor([[ 27.,  30.,  33.],
        [ 61.,  68.,  75.],
        [ 95., 106., 117.]])

Output shape: torch.Size([3, 3])


The `torch.nn.Linear()` module (we'll see this in action later on), also known as a feed-forward layer or fully connected layer, implements a matrix multiplication between an input `x` and a weights matrix `A`.  

$y = x \cdot A^T +b$

Where:

- `x` is the input to the layer (deep learning is a stack of layers like `torch.nn.Linear()` and others on top of each other).
- `A` is the weights matrix created by the layer, this starts out as random numbers that get adjusted as a neural network learns to better represent patterns in the data (notice the "T", that's because the weights matrix gets transposed).
    - Note: You might also often see W or another letter like X used to showcase the weights matrix.
- `b` is the bias term used to slightly offset the weights and inputs.
- `y` is the output (a manipulation of the input in the hopes to discover patterns in it).
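
The formula can be checked by hand against the layer. In this sketch, `linear.weight` plays the role of `A` and `linear.bias` the role of `b`; the layer sizes are arbitrary, and `manual` is a name introduced only for the comparison.

```python
import torch

torch.manual_seed(42)
linear = torch.nn.Linear(in_features=2, out_features=6)

x = torch.tensor([[1., 2.],
                  [3., 4.],
                  [5., 6.]])

# linear.weight has shape (out_features, in_features) = (6, 2), hence the transpose
manual = x @ linear.weight.T + linear.bias
layer_output = linear(x)

print(linear.weight.shape, linear.bias.shape)
print(torch.allclose(manual, layer_output, atol=1e-6))
```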

In [20]:
# Since the linear layer starts with a random weights matrix, let's make it reproducible (more on this later)
torch.manual_seed(42)
# This uses matrix multiplication
linear = torch.nn.Linear(in_features=2, # in_features = matches inner dimension of input 
                         out_features=6) # out_features = describes outer value 
x = tensor_A
output = linear(x)
print(f"Input shape: {x.shape}\n")
print(f"Output:\n{output}\n\nOutput shape: {output.shape}")

Input shape: torch.Size([3, 2])

Output:
tensor([[2.2368, 1.2292, 0.4714, 0.3864, 0.1309, 0.9838],
        [4.4919, 2.1970, 0.4469, 0.5285, 0.3401, 2.4777],
        [6.7469, 3.1648, 0.4224, 0.6705, 0.5493, 3.9716]],
       grad_fn=<AddmmBackward0>)

Output shape: torch.Size([3, 6])


- `in_features` is the number of elements in each input sample. For example, each row of `tensor_A` has 2 features.
- `out_features` is the number of elements each output row will have.

The linear transformation is applied to every row of the tensor. 

### 2.3. Aggregation
There are several ways to manipulate tensors, and aggregation (going from more values to fewer values) is one of them. 

#### 2.3.1. Max, Min, Mean, Sum
First we'll create a tensor and then find the max, min, mean and sum of it. 

Note: You may find some methods such as `torch.mean()` require tensors to be in `torch.float32` (the most common) or another specific datatype, otherwise the operation will fail.
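
A minimal sketch of that failure and the usual fix. The exact error message depends on the PyTorch version, so it is only printed here.

```python
import torch

x = torch.arange(0, 100, 10)  # integer tensor (torch.int64)
print(x.dtype)

# Taking the mean of an integer tensor raises a RuntimeError
mean_failed = False
try:
    x.mean()
except RuntimeError as error:
    mean_failed = True
    print(f"mean() failed: {error}")

# Casting to a floating point dtype first makes the call valid
mean_value = x.type(torch.float32).mean()
print(mean_value)
```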

In [25]:
# Create a tensor
x = torch.arange(0, 100, 10)
print(x) 

# AGGREGATION
print(f"Minimum: {x.min()}")
print(f"Maximum: {x.max()}")
# print(f"Mean: {x.mean()}") # this will error
print(f"Mean: {x.type(torch.float32).mean()}") # won't work without float datatype
print(f"Sum: {x.sum()}")

tensor([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])
Minimum: 0
Maximum: 90
Mean: 45.0
Sum: 450


You can also do the same as above with torch methods.
```python
torch.max(x), torch.min(x), torch.mean(x.type(torch.float32)), torch.sum(x)
```

#### 2.3.2. Positional min/max
You can also find the index of a tensor where the max or minimum occurs with `torch.argmax()` and `torch.argmin()` respectively.

This is helpful in case you just want the position where the highest (or lowest) value is and not the actual value itself (we'll see this in a later section when using the softmax activation function).

In [26]:
# Create a tensor
tensor = torch.arange(10, 100, 10)
print(f"Tensor: {tensor}")

# Returns index of max and min values
print(f"Index where max value occurs: {tensor.argmax()}")
print(f"Index where min value occurs: {tensor.argmin()}")

Tensor: tensor([10, 20, 30, 40, 50, 60, 70, 80, 90])
Index where max value occurs: 8
Index where min value occurs: 0
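
The returned positions can be used to index back into the tensor. The `scores` tensor below is a hypothetical set of prediction scores, included only to hint at how `argmax()` is typically used to pick a class.

```python
import torch

tensor = torch.arange(10, 100, 10)

max_index = tensor.argmax()
min_index = tensor.argmin()

# Indexing with the returned positions recovers the values themselves
print(tensor[max_index] == tensor.max())
print(tensor[min_index] == tensor.min())

# Hypothetical prediction scores for three classes; argmax picks the highest-scoring class
scores = torch.tensor([0.1, 0.7, 0.2])
print(scores.argmax().item())
```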


### 2.4. Tensor datatype
A common issue with deep learning operations is having your tensors in different datatypes. If one tensor is in `torch.float64` and another is in `torch.float32`, you might run into some errors.  
You can change the datatypes of tensors using `torch.Tensor.type(dtype=None)` where the dtype parameter is the datatype you'd like to use.
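
As a sketch of the kind of error meant here: matrix multiplication between a `float32` and a `float64` tensor is expected to raise a `RuntimeError` (the exact message varies by version), and casting one operand fixes it. The tensors `a` and `b` are made up for this example.

```python
import torch

a = torch.ones(2, 2, dtype=torch.float32)
b = torch.ones(2, 2, dtype=torch.float64)

# Mixing dtypes in a matrix multiplication is expected to fail
try:
    torch.matmul(a, b)
except RuntimeError as error:
    print(f"dtype mismatch: {error}")

# Cast one operand so both share the same dtype
result = torch.matmul(a, b.type(torch.float32))
print(result.dtype)
```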

In [29]:
# Create a tensor and check its datatype
tensor = torch.arange(10., 100., 10.)
print(tensor.dtype)

# Create a float16 tensor
tensor_float16 = tensor.type(torch.float16)
print(tensor_float16) 

torch.float32
tensor([10., 20., 30., 40., 50., 60., 70., 80., 90.], dtype=torch.float16)


### 2.5. Reshaping
Often you'll want to reshape or change the dimensions of your tensors without actually changing the values inside them.  
To do so, some popular methods are: reshape, view, stack, squeeze, unsqueeze, permute.

Why use them? Because deep learning models (neural networks) are all about manipulating tensors in some way. And because of the rules of matrix multiplication, if you've got shape mismatches, you'll run into errors. These methods help you make sure the right elements of your tensors are mixing with the right elements of other tensors.

In [31]:
x = torch.arange(1., 8.)
print(x, x.shape)

# ADDING AN EXTRA DIMENSION
x_reshaped = x.reshape(1, 7)
print(x_reshaped, x_reshaped.shape)

tensor([1., 2., 3., 4., 5., 6., 7.]) torch.Size([7])
tensor([[1., 2., 3., 4., 5., 6., 7.]]) torch.Size([1, 7])


In [32]:
# Change view (keeps same data as original but changes view)
# See more: 
z = x.view(1, 7)
z, z.shape

(tensor([[1., 2., 3., 4., 5., 6., 7.]]), torch.Size([1, 7]))

Changing the [view](https://stackoverflow.com/a/54507446/7900723) of a tensor with `torch.Tensor.view()` really only creates a new view of the same tensor.  
So changing values through the view changes the original tensor too.

In [33]:
# Changing z changes x
z[:, 0] = 5
z, x

(tensor([[5., 2., 3., 4., 5., 6., 7.]]), tensor([5., 2., 3., 4., 5., 6., 7.]))
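
If you need an independent copy instead of a view, one option is to call `clone()` first. The sketch below compares the two using `data_ptr()`, which returns the memory address of a tensor's first element; the name `copy` is chosen only for this example.

```python
import torch

x = torch.arange(1., 8.)

z = x.view(1, 7)             # shares memory with x
copy = x.clone().view(1, 7)  # independent copy of the data

z[:, 0] = 5     # also changes x
copy[:, 1] = 9  # does not change x

print(x)
print(z.data_ptr() == x.data_ptr())
print(copy.data_ptr() == x.data_ptr())
```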

In [38]:
# STACK tensors on top of each other
x_stacked = torch.stack([x, x, x, x], dim=0) # Try changing dim to dim=1 and see what happens
# DIM 0 --> You add the vector as a row
# DIM 1 --> You add the vector as a column
print(x_stacked)

tensor([[5., 2., 3., 4., 5., 6., 7.],
        [5., 2., 3., 4., 5., 6., 7.],
        [5., 2., 3., 4., 5., 6., 7.],
        [5., 2., 3., 4., 5., 6., 7.]])
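
To see the effect of `dim`, compare the shapes of both options. In this sketch the names `stacked_rows` and `stacked_cols` are chosen only for illustration.

```python
import torch

x = torch.arange(1., 8.)

stacked_rows = torch.stack([x, x, x, x], dim=0)  # each copy of x becomes a row
stacked_cols = torch.stack([x, x, x, x], dim=1)  # each copy of x becomes a column

print(stacked_rows.shape)
print(stacked_cols.shape)
print(torch.equal(stacked_rows.T, stacked_cols))
```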


Removing all single dimensions from a tensor: to do so, you can use `torch.squeeze()` (I remember this as squeezing the tensor to only have dimensions over 1). The reverse is possible with `torch.unsqueeze()`, which adds a dimension of size 1 at a given position.

In [39]:
print(f"Previous tensor: {x_reshaped}")
print(f"Previous shape: {x_reshaped.shape}")

# Remove extra dimension from x_reshaped
x_squeezed = x_reshaped.squeeze()
print(f"\nNew tensor: {x_squeezed}")
print(f"New shape: {x_squeezed.shape}")

Previous tensor: tensor([[5., 2., 3., 4., 5., 6., 7.]])
Previous shape: torch.Size([1, 7])

New tensor: tensor([5., 2., 3., 4., 5., 6., 7.])
New shape: torch.Size([7])
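
The reverse direction with `unsqueeze()` can be sketched like this; the position of the new dimension is set with `dim` (the names `row` and `column` are chosen only for this example).

```python
import torch

x_squeezed = torch.arange(1., 8.)     # shape (7,)

row = x_squeezed.unsqueeze(dim=0)     # new dimension at index 0
column = x_squeezed.unsqueeze(dim=1)  # new dimension at index 1

print(row.shape, column.shape)
print(row.squeeze().shape)            # squeezing undoes the unsqueeze
```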


**Permute:** You can also rearrange the order of a tensor's axes with `torch.permute(input, dims)`, which returns a view of the input with its dimensions reordered.

In [40]:
# Create tensor with specific shape
x_original = torch.rand(size=(224, 224, 3))

# Permute the original tensor to rearrange the axis order
x_permuted = x_original.permute(2, 0, 1) # shifts axis 0->1, 1->2, 2->0

print(f"Previous shape: {x_original.shape}")
print(f"New shape: {x_permuted.shape}")

Previous shape: torch.Size([224, 224, 3])
New shape: torch.Size([3, 224, 224])
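
Because a permute returns a view, the permuted tensor shares its data with the original. A small sketch (the value written and the indices checked are arbitrary):

```python
import torch

x_original = torch.rand(size=(224, 224, 3))
x_permuted = x_original.permute(2, 0, 1)

# Writing through the original is visible in the permuted view
x_original[0, 0, 0] = 728218.0
print(x_original[0, 0, 0], x_permuted[0, 0, 0])

# Element (h, w, c) of the original sits at (c, h, w) after the permute
print(torch.equal(x_original[10, 20, 1], x_permuted[1, 10, 20]))
```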


### 2.6. Indexing
Sometimes you want to select specific data from tensors (for example, only the first column or second row). To do so, you can use indexing, which works much like indexing NumPy arrays. To make the example more general, we create a tensor with depth = 3, so that several kinds of indexing can be observed. 

In [51]:
x = torch.arange(1, 28).reshape(3, 3, 3)
print(x, x.shape)

tensor([[[ 1,  2,  3],
         [ 4,  5,  6],
         [ 7,  8,  9]],

        [[10, 11, 12],
         [13, 14, 15],
         [16, 17, 18]],

        [[19, 20, 21],
         [22, 23, 24],
         [25, 26, 27]]]) torch.Size([3, 3, 3])


In [55]:
# Let's index bracket by bracket
# It takes the channel, or the depth of it; x[1] would print the second submatrix
print(f"First square bracket:\n{x[0]}") 
print(f"Second square bracket: {x[0][0]}") 
print(f"Third square bracket: {x[0][0][0]}")

First square bracket:
tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])
Second square bracket: tensor([1, 2, 3])
Third square bracket: 1


In [68]:
# Get all values of 0th dimension and the 0 index of 1st dimension
# BECAUSE THERE ARE 3 CHANNELS, IT IS TAKING THE 1st row of the 3 channels
print(x[:, 0])

tensor([[ 1,  2,  3],
        [10, 11, 12],
        [19, 20, 21]])


In [67]:
# Get all values of 0th & 1st dimensions but only index 1 of 2nd dimension
print(x[:, :, 1])

tensor([[ 2,  5,  8],
        [11, 14, 17],
        [20, 23, 26]])


In [65]:
# Get all values of the 0 dimension but only the 1 index value of the 1st and 2nd dimension
print(x[:, 1, 1])

tensor([ 5, 14, 23])


For simplicity, we repeat the same example, but now with depth = 1.

In [72]:
x = torch.arange(1, 10).reshape(1, 3, 3)
print(x, x.shape)

# Get all values of 0th dimension and the 0 index of 1st dimension
# IT IS TAKING THE 1st row of all the channel
print(x[:, 0])

# Get all values of 0th & 1st dimensions but only index 1 of 2nd dimension
print(x[:, :, 1])

# Get all values of the 0 dimension but only the 1 index value of the 1st and 2nd dimension
print(x[:, 1, 1])

tensor([[[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]]) torch.Size([1, 3, 3])
tensor([[1, 2, 3]])
tensor([[2, 5, 8]])
tensor([5])
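
A few more combinations, as a sketch: an integer index removes a dimension, while `:` keeps it.

```python
import torch

x = torch.arange(1, 10).reshape(1, 3, 3)

single = x[0, 2, 2]       # single element: last row, last column
last_column = x[0, :, 2]  # last column of the only channel
kept_dim = x[:, 2, 2]     # same element as `single`, but the 0th dimension is kept

print(single, single.shape)
print(last_column, last_column.shape)
print(kept_dim, kept_dim.shape)
```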


### 2.7. Numpy / PyTorch
NumPy is a popular Python numerical computing library, and PyTorch has functionality to interact with it nicely. The two main methods you'll want to use for NumPy to PyTorch (and back again) are:

- `torch.from_numpy(ndarray)` - NumPy array -> PyTorch tensor.
- `torch.Tensor.numpy()` - PyTorch tensor -> NumPy array.

In [77]:
import numpy as np
array = np.arange(1.0, 8.0)
tensor = torch.from_numpy(array)
array, tensor

(array([1., 2., 3., 4., 5., 6., 7.]),
 tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64))

By default, NumPy arrays are created with the datatype `float64` and if you convert it to a PyTorch tensor, it'll keep the same datatype (as above). However, many PyTorch calculations default to using `float32`.

If you want to convert your NumPy array (float64) -> PyTorch tensor (float64) -> PyTorch tensor (float32), you can use:

- `tensor = torch.from_numpy(array).type(torch.float32)`.

Without the explicit `.type()` call, the conversion always keeps the array's datatype.

In [78]:
array = np.arange(1.0, 8.0)
tensor = torch.from_numpy(array).type(torch.float32)
print(tensor.dtype)
array, tensor

torch.float32


(array([1., 2., 3., 4., 5., 6., 7.]), tensor([1., 2., 3., 4., 5., 6., 7.]))

In [79]:
# Tensor to NumPy array
tensor = torch.ones(7) # create a tensor of ones with dtype=float32
numpy_tensor = tensor.numpy() # will be dtype=float32 unless changed
tensor, numpy_tensor

(tensor([1., 1., 1., 1., 1., 1., 1.]),
 array([1., 1., 1., 1., 1., 1., 1.], dtype=float32))
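
One detail worth knowing: a tensor created with `torch.from_numpy()` shares memory with the source array, so in-place changes to the array show up in the tensor. Casting to another dtype makes a copy. A minimal sketch (the names `shared` and `copied` are chosen for this example):

```python
import numpy as np
import torch

array = np.arange(1.0, 8.0)
shared = torch.from_numpy(array)                      # float64, shares memory with array
copied = torch.from_numpy(array).type(torch.float32)  # float32, stored in new memory

array += 1  # in-place update of the NumPy array

print(shared)  # reflects the update
print(copied)  # keeps the original values
```

Note that `array = array + 1` would behave differently: it creates a new array and leaves the tensor untouched.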

### 2.8. Reproducibility
In neural networks and machine learning, randomness plays a big role.  
Well, pseudorandomness, that is. After all, a computer as designed is fundamentally deterministic (each step is predictable), so the randomness it creates is simulated randomness (though there is debate on this too, but since I'm not a computer scientist, I'll let you find out more yourself).

How does this relate to neural networks and deep learning then?
We've discussed how neural networks start with random numbers to describe patterns in data (these numbers are poor descriptions) and try to improve those random numbers using tensor operations (and a few other things we haven't discussed yet) to better describe patterns in data.

In short:
`start with random numbers -> tensor operations -> try to make better` (Iterative)

Although randomness is nice and powerful, sometimes you'd like there to be a little less randomness.
Why? So you can perform repeatable experiments. For example, you create an algorithm capable of achieving X performance. And then your friend tries it out to verify you're not crazy.

How could they do such a thing? That's where **reproducibility** comes in. In other words, can you get the same (or very similar) results on your computer running the same code as I get on mine?

Let's see a brief example of reproducibility in PyTorch. We'll start by creating two random tensors, since they're random, you'd expect them to be different right?

In [80]:
# Create two random tensors
random_tensor_A = torch.rand(3, 4)
random_tensor_B = torch.rand(3, 4)

print(f"Tensor A:\n{random_tensor_A}\n")
print(f"Tensor B:\n{random_tensor_B}\n")
print(f"Does Tensor A equal Tensor B? (anywhere)")
random_tensor_A == random_tensor_B

Tensor A:
tensor([[0.8016, 0.3649, 0.6286, 0.9663],
        [0.7687, 0.4566, 0.5745, 0.9200],
        [0.3230, 0.8613, 0.0919, 0.3102]])

Tensor B:
tensor([[0.9536, 0.6002, 0.0351, 0.6826],
        [0.3743, 0.5220, 0.1336, 0.9666],
        [0.9754, 0.8474, 0.8988, 0.1105]])

Does Tensor A equal Tensor B? (anywhere)


tensor([[False, False, False, False],
        [False, False, False, False],
        [False, False, False, False]])

The tensors come out with different values. But what if you wanted to create two random tensors with the same values? As in, the tensors would still contain random values, but they would be of the same flavour.

That's where `torch.manual_seed(seed)` comes in, where seed is an integer (like 42 but it could be anything) that flavours the randomness. Let's try it out by creating some more flavoured random tensors.

In [83]:
import random

# Set the random seed
RANDOM_SEED = 42 # try changing this to different values and see what happens to the numbers below
torch.manual_seed(seed=RANDOM_SEED) 
random_tensor_C = torch.rand(3, 4)

# The seed has to be reset before every new rand() call
# Without this, tensor_D would be different to tensor_C 

# The next line resets PyTorch's random seed (torch.random.manual_seed() is the same function as torch.manual_seed())
torch.random.manual_seed(seed=RANDOM_SEED) # try commenting this line out and seeing what happens
random_tensor_D = torch.rand(3, 4)

print(f"Tensor C:\n{random_tensor_C}\n")
print(f"Tensor D:\n{random_tensor_D}\n")
print(f"Does Tensor C equal Tensor D? (anywhere)")
random_tensor_C == random_tensor_D

Tensor C:
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])

Tensor D:
tensor([[0.8823, 0.9150, 0.3829, 0.9593],
        [0.3904, 0.6009, 0.2566, 0.7936],
        [0.9408, 0.1332, 0.9346, 0.5936]])

Does Tensor C equal Tensor D? (anywhere)


tensor([[True, True, True, True],
        [True, True, True, True],
        [True, True, True, True]])
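
The pattern above can be wrapped in a small function. `seeded_rand` is a hypothetical helper written for this sketch, not part of PyTorch; `torch.equal()` returns a single `True`/`False` instead of an element-wise comparison.

```python
import torch

def seeded_rand(seed: int, *size: int) -> torch.Tensor:
    """Hypothetical helper: reseed PyTorch, then draw a random tensor."""
    torch.manual_seed(seed)
    return torch.rand(*size)

same_a = seeded_rand(42, 3, 4)
same_b = seeded_rand(42, 3, 4)
different = seeded_rand(7, 3, 4)

print(torch.equal(same_a, same_b))     # same seed, same values
print(torch.equal(same_a, different))  # different seed, different values
```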