# Laymanz Notebooks: Training your first deep learning model
Authors: Ali Ahmad & Ambrose Ling

**Our goal is to get rid of abstractions and black boxes when learning about ML**

**What is this notebook about?**

In this notebook, we will go over some of the fundamental ideas behind deep learning: neural networks, model training, inference, backpropagation, loss functions, and optimizers.

**What do I need to set up my environment?**

All of our notebooks will only use NumPy and PyTorch, plus matplotlib for visualizations. If you are very eager to learn about what PyTorch is and how it works, check out this super detailed notebook on PyTorch! If you are running this on Colab, you can just import the packages. If you are running this notebook locally, just remember to `pip install numpy torch matplotlib`. Check [here](https://pytorch.org/get-started/locally/) to see which torch version to install for the hardware you have.

**How is this notebook structured?**

Each notebook will have

[**How to use matplotlib for plotting**](https://colab.research.google.com/github/amanchadha/aman-ai/blob/master/matplotlib.ipynb#scrollTo=1-AcMM6NSmP-)


## Breakdown
*   What is deep learning?
*   What is a neuron? What is an activation?
*   What is a neural network?
*   What is a multi-layer perceptron?
*   What is a computation graph?
*   What is autograd?
*   What is a forward pass?
*   What is a loss function?
*   What is an optimizer?




In [13]:
import numpy as np
import torch
import torch.nn as nn
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader,Dataset

# What is deep learning?

You can find many definitions online, but here is my definition:
Deep learning involves extracting meaningful insights from data using deep neural networks.
Deep learning is a branch of Machine Learning that specializes in the use of neural networks to make predictions.


# What can deep learning be used for?
* Classification
    - Binary classification
    - Multiclass classification
    - Multilabel classification
* Regression
    - Linear Regression
    - Logistic Regression
    - Polynomial Regression

Ambrose's interpretation:
Deep learning is about feeding a neural network a bunch of data. The neural network produces some output, we measure how wrong it was compared to what we wanted, and we tell it "hey, you were wrong, you gotta change how you think so you get it right next time!" Then we adjust it, and we repeat this until we have iterated through all of our data.

The usual recipe for a deep learning task:

* Step 1: Find a bunch of data for the thing you want to train your model on
* Step 2: Preprocess the data to turn it into the right form
* Step 3: Train the model by feeding it the data
* Step 4: Evaluate it, find ways to make it better or train it again

# What is a neuron?
Neurons are nerve cells in our brains, responsible for transmitting electrical signals from one part of the brain to another.

## Properties of neurons:
- They receive their signals via their dendrites
- They have synapses that modulate the electrical signals they receive (between dendrites and axons)
- They fire an output signal only when the total strength of the input signals exceeds a certain threshold

# What is a perceptron?

A perceptron is a mathematical model of a biological neuron. Hence we use mathematical operations to model the properties of a neuron.

## Properties of a perceptron
- electrical signals are represented by numerical values (some vector)
- modulation is modelled by multiplying the input values by some weight values
- we model the total strength of a signal by performing a weighted sum of the inputs
- we apply an activation function (for example, a step function) to model the firing of the signal once a threshold is reached
- it is believed that neurons remain inactive until the net input to the cell body reaches a certain threshold

<center><img src="https://miro.medium.com/v2/resize:fit:2902/format:webp/1*hkYlTODpjJgo32DoCOWN5w.png" height=230 width=500/></center>
<center> On the left you have the biological neuron, on the right you have the artificial neuron </center>

### How do we represent a perceptron mathematically?

$$
y = \sigma\left( \sum_i w_i x_i + b \right)
$$

where:
* y: represents the output from the neuron
* w: represents the weights associated with this neuron (one weight $w_i$ per input $x_i$)
* x: represents the inputs to the neuron
* b: represents the bias added to the weighted sum
* $\sigma$: represents the activation function

Ambrose's intuition:
Think of each neuron as having its own behaviour or its own state. It behaves differently from other neurons, which is why each one has a different weight and bias associated with it.
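
To make the formula concrete, here is a minimal sketch of a single perceptron in NumPy. The function names `step` and `perceptron` and all of the numbers are made up for illustration; the step function with a threshold of 0 is just one possible choice of activation.

```python
import numpy as np

def step(z):
    # Fire (output 1) only when the weighted sum exceeds the threshold of 0
    return np.where(z > 0, 1.0, 0.0)

def perceptron(x, w, b, activation=step):
    # Weighted sum of the inputs plus the bias, passed through the activation
    return activation(np.dot(w, x) + b)

# Hypothetical values chosen only for illustration
x = np.array([1.0, 2.0, 3.0])    # inputs
w = np.array([0.5, -1.0, 0.25])  # one weight per input
b = 0.5                          # bias

z = np.dot(w, x) + b             # 0.5 - 2.0 + 0.75 + 0.5 = -0.25
y = perceptron(x, w, b)          # z is below the threshold, so the neuron stays silent
print(z, y)

# A larger bias pushes the weighted sum over the threshold and the neuron fires
print(perceptron(x, w, 1.0))
```

Changing `w` and `b` changes when the neuron fires, which is exactly what training will adjust later on.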


### How do we represent all these quantities mathematically?
Think about the different ways you may want to represent inputs. Below are examples from some of the most common machine learning applications, each with a different modality. Since we mentioned that the key to 
  

Let's try to develop an intuition for how you would represent data quantitatively.

1. Usecase: Predicting house prices
    - Scenario: Let's say I want to predict the price of houses in Toronto (expensive af) given some information about a house.
    - What does data look like: For this scenario, I probably want to find some way to represent the information about a house **quantitatively**.
    - What does input look like: Some collection of numbers that represent some characteristics of the home.
    - What does output look like: A number (float/double) representing the price of the home.
    - A house can be represented by:
        - int: how many bedrooms are in this house
        - int: how many square feet this house is
        - int: how many bathrooms it has
        - String: where this house is (the location)
        - int: the population of the city it is in
2. Usecase: Predicting positive and negative sentiment from text
    - Scenario: Let's say I want to predict if a tweet contains harmful or positive intent, you know, those goddamn politicians.
    - What does data look like: In this scenario, the data would probably be the tweets themselves, which are a series of strings.


3. Usecase: Predicting the price of Bitcoin
    - Scenario: Let's say I want to predict the price of Bitcoin.
    - What does input look like: A collection of numbers representing past Bitcoin prices.
    - The price of Bitcoin can be represented by:
        - list of ints: 


4. Usecase: 

### Model Width VS Model Depth

**Larger width** (vertically, more neurons): usually means that the neural network has the capacity to remember or encode more features in its weight connections (the weight matrix).
* With more neurons in each layer, you can capture more delicate details.
* The more neurons you have in each layer, the more processing units and internal states you have for each input unit.
* Compare a 1 x 10 input feeding 15 neurons with a 1 x 10 input feeding 100 neurons: the second has many more weight connections to the input (1,000 instead of 150), so it can capture each intricate value in the input more precisely.
* Think of F1 cars when they get to the pit stop. If you have only 5 people changing the tires, wiping the windows, and pumping gas VS 20 people, the 20 can be more precise about what they do to the car, while 5 people would probably cover fewer of the tasks that need doing. People are the neurons, the car is the input.

**Larger depth** (more layers): usually means that the neural network can remember or encode more complex, high-dimensional features from the training data.

* One way to understand this: think of width as having more functions to model your data, but more functions do not necessarily mean more complex functions.
* Increasing the width is similar to going from 1 function, y = mx+b, to 20 functions of the form y = wx+b. You may be able to capture specific changes in the input data better with more lines, but they are still linear functions and always will be.
* With greater depth, you chain non-linearities together as you apply activation functions and additional weights. So now think of 20 functions of the form y = w·a(w·a(wx+b)+b)+b, where a is the activation function. This entire function gets more and more complex, and you introduce more non-linearity at each layer.
* So going from 20 functions of the form y = wx+b to 20 functions of the form y = w·a(w·a(wx+b)+b)+b is what enables you to capture a much more complicated function that fits your training data.
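
A small NumPy sketch of that last point, with made-up layer sizes and random weights: stacking layers without an activation in between is still just one linear function, and the activation is what breaks that. ReLU is used here only as an example activation.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)                                  # input with 4 features
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)    # layer 1: 4 -> 8
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)    # layer 2: 8 -> 3

# Two stacked layers with no activation in between
deep_linear = W2 @ (W1 @ x + b1) + b2
# The same computation written as a single layer with merged weights and bias
single = (W2 @ W1) @ x + (W2 @ b1 + b2)
print(np.allclose(deep_linear, single))  # True

# With a ReLU between the layers, the stack is no longer one linear map in general
def relu(z):
    return np.maximum(z, 0.0)

deep_nonlinear = W2 @ relu(W1 @ x + b1) + b2
print(np.allclose(deep_nonlinear, single))
```

So depth only buys extra expressive power when an activation function sits between the layers.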

### Rank
Rank refers to the number of dimensions a tensor has.

- Rank 0: referred to as a **scalar**, 1 single value
- Rank 1: referred to as a **vector**, 1 list of values
- Rank 2: referred to as a **matrix**, 1 2D array of values
- Rank 3: referred to as a **3D tensor**, which you can think of as a cube of numbers or a stack of 2D matrices
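
Here is a quick sketch of the four ranks using NumPy, where the rank in this sense is exposed as `ndim`. The variable names and values are arbitrary. Note that this is a different idea from the "rank of a matrix" in linear algebra.

```python
import numpy as np

scalar = np.array(5.0)               # rank 0: a single value
vector = np.array([1.0, 2.0, 3.0])   # rank 1: a list of values
matrix = np.array([[1, 2], [3, 4]])  # rank 2: a 2D array of values
tensor3d = np.zeros((2, 3, 4))       # rank 3: a stack of two 3 x 4 matrices

for t in (scalar, vector, matrix, tensor3d):
    # ndim is the number of dimensions, shape is the length of each dimension
    print(t.ndim, t.shape)
```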


### Shape 
Shape is an array of numbers representing the length of each dimension.
You can understand this as the number of elements along each dimension.

```python
x = np.array([[[1],[2]],[[3],[4]]])
print(x.shape)  # (2, 2, 1)
```

Ambrose's intuition:
The way I like thinking about shapes is in terms of boxes.
Let's say the shape of tensor `x` is `(2,4,5)`. This means that there are 2 big boxes, and each of the 2 big boxes has 4 boxes in it.
Then each of those 4 boxes has 5 boxes in it (each holding 1 value), for 2 x 4 x 5 = 40 values in total.

One very typical example is a batch of images with shape `(B,C,H,W)` (batch size, channels, height, width), for instance `(8,3,100,100)`: 8 images with 3 channels and 100 x 100 pixels each.
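
The box picture can be checked directly by indexing: each time you open one box, the leading dimension disappears from the shape. This is a small sketch with zero-filled arrays, used purely as placeholders.

```python
import numpy as np

x = np.zeros((2, 4, 5))
print(x.shape)        # (2, 4, 5): 2 big boxes
print(x[0].shape)     # (4, 5): one big box holds 4 boxes
print(x[0][0].shape)  # (5,): each of those holds 5 values
print(x.size)         # 40 values in total

# A batch of 8 images with 3 channels and 100 x 100 pixels
batch = np.zeros((8, 3, 100, 100))
B, C, H, W = batch.shape
print(batch[0].shape)  # (3, 100, 100): a single image
```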


In [14]:

# 1D: Array (vector)
x = np.array([0.12,0.23,2.34])

# 2D: Matrix
x = np.array([[0.1232,0.3445,0.345532],[0.1232,0.3445,0.345532]])

# Why do we use matrices?
# There are many highly efficient linear algebra libraries (NumPy, PyTorch) that are optimized to perform matrix multiplications extremely fast
# Using matrices allows us to perform operations in PARALLEL, which is much faster than doing them sequentially


# Matrix: shape (2, 5)
x = np.array([[0,1,2,3,4],[2,3,5,67,5]])

# Tensor: shape (2, 2, 3), a stack of two 2 x 3 matrices
x = np.array([[[0,1,2],[3,4,5]],[[6,7,8],[9,10,11]]])

# A tensor generalizes a matrix to any number of dimensions: it is a multi-dimensional array that carries numerical data.
# Matrices are m x n, tensors can be m x n x c x p ...


### Why does neural network architecture matter?
When we talk about the neural network architecture, we are referring to the specific arrangement of neurons, layers, and the connections between them. The architecture dictates the capabilities of the model: the data it is best suited for, the features it excels at capturing, and the tasks it is suited for.



### What is PyTorch?

PyTorch is a neural network library that lets you build and train neural networks with its very comprehensive collection of APIs:

1. Tensor Operations: Basic operations for creating, manipulating, and transforming tensors.
2. Mathematical Operations: Functions for performing arithmetic, linear algebra, and complex mathematical computations.
3. Neural Network Operations: Layers, activation functions, loss functions, and other building blocks for constructing neural networks.
4. Data Manipulation: Functions for data loading, preprocessing, and augmentation.
5. Optimization: Algorithms for updating model parameters, such as SGD, Adam, etc.
6. Autograd: Operators supporting automatic differentiation.
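
As a small taste of two of these categories, here is a sketch of a tensor operation followed by autograd on a one-variable function. The function y = x² + 3x is an arbitrary example picked because its derivative, 2x + 3, is easy to check by hand.

```python
import torch

# Tensor and mathematical operations: a 2 x 2 matrix multiplication
a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
b = torch.ones(2, 2)
print((a @ b).shape)  # torch.Size([2, 2])

# Autograd: PyTorch records the operations applied to x and differentiates them
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 + 3 * x
y.backward()          # computes dy/dx and stores it in x.grad
print(x.grad)         # tensor(7.) because dy/dx = 2x + 3 = 7 at x = 2
```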




### How do you train a neural network?

* Step 1: Find a bunch of data for the thing you want to train your model on
* Step 2: Preprocess the data to turn it into the right form
* Step 3: Train the model by feeding it the data
* Step 4: Evaluate it 

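In this notebook we will put more emphasis on step 3.

The most common training loop you will ever see:
```python
for batch in training_data:
    output = model(batch.x)           # forward pass: compute predictions
    loss = loss_fn(output, batch.y)   # PyTorch losses take (prediction, target)
    loss.backward()                   # backpropagation: compute gradients
    optimizer.step()                  # update the parameters using the gradients
    optimizer.zero_grad()             # reset gradients so they don't accumulate
```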
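
To see the loop above actually run, here is a minimal, self-contained sketch. Everything specific in it is an assumption made for illustration: the synthetic data (y = 3x + 1 plus noise), the single `nn.Linear` layer as the model, mean squared error as the loss, and SGD with a learning rate of 0.1 as the optimizer.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic regression data: y = 3x + 1 plus a little noise (made up for illustration)
x = torch.linspace(-1, 1, 128).unsqueeze(1)
y = 3 * x + 1 + 0.05 * torch.randn_like(x)
training_data = DataLoader(TensorDataset(x, y), batch_size=16, shuffle=True)

model = nn.Linear(1, 1)                                   # one weight and one bias
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

first_loss = loss_fn(model(x), y).item()                  # loss before any training
for epoch in range(50):
    for batch_x, batch_y in training_data:
        output = model(batch_x)                           # forward pass
        loss = loss_fn(output, batch_y)                   # how wrong were we?
        loss.backward()                                   # compute gradients
        optimizer.step()                                  # update weight and bias
        optimizer.zero_grad()                             # clear gradients for the next batch
final_loss = loss_fn(model(x), y).item()                  # loss after training

print(first_loss, final_loss)
print(model.weight.item(), model.bias.item())             # should end up close to 3 and 1
```

The loss after training should be far smaller than the loss before, which is the whole point of the loop.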

# Preparing your data
* In PyTorch, a `Dataset` object is a class that allows you to access samples of your data. This class also allows you to perform certain data transformations to process the raw samples you grab from your dataset.
* In case you don't know, a **sample** just means 1 pair of input and output. So for an image classification task, a sample may be (x: image, y: label); in text sentiment analysis, a sample may be (x: string of text, y: float probability).

* Let's say you have a dataset called `MyDataset` that inherits from the `Dataset` class. It represents a collection of data samples.
    ```python
    # 
    class MyDataset(Dataset):
        ...
        def __getitem__(self,index):
            return

    ```
    - This class lets you specify the logic for accessing 1 sample
    - So when you do `dataset[0]`, Python calls `__getitem__(0)` and a sample of data is returned
    - you can also apply transformations to every sample in your dataset when you fetch it
        - inside `__getitem__()`, you can apply transformations to the input and output by doing `x = self.transform(x)` and `y = self.target_transform(y)`.
        - **When will I do this?**, some common usecases
            - Transforms:
                - Images: cropping, resizing, normalization
                - Text: tokenization (turning words into numbers), padding and truncation
                - Audio: resampling, normalization, augmentation
            - Target Transform:
                - one hot encoding
                - encoding labels
                - normalization (coordinates)
                - format conversions
        - NOTE: transformations are most commonly applied to image data; for text, preprocessing is usually handled by a dedicated component (the tokenizer)
* You can also define specific ways you want to grab samples from the dataset through the use of a sampler or `Sampler` class, which intuitively dictates how you sample data from your dataset.
    ```python
    class MySampler(Sampler):
        ...
        def __iter__(self):
            for i in range(self.N):
                yield i 
    ```
    - it is used to specify the data loading order: it yields the indices of the samples to fetch
    - more specifically, it is an iterable over dataset indices (not over the samples themselves)
* As for a `DataLoader`, this object is the intermediate step that groups the samples we take from the dataset into batches.
    ```python
    dataloader = DataLoader(dataset, batch_size=64, shuffle=True)
    ```
    - the DataLoader receives the `dataset` object as input and returns an iterable that yields one batch at a time
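
To tie the three pieces together, here is a minimal sketch with a toy dataset, a custom sampler, and a dataloader. `SquaresDataset` and `ReverseSampler` are hypothetical classes written only for this example; they are not part of PyTorch.

```python
import torch
from torch.utils.data import DataLoader, Dataset, Sampler

class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the pair (i, i squared)."""

    def __init__(self, n, transform=None, target_transform=None):
        self.n = n
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return self.n

    def __getitem__(self, index):
        # Logic for fetching exactly 1 sample
        x = torch.tensor([float(index)])
        y = torch.tensor([float(index ** 2)])
        if self.transform is not None:
            x = self.transform(x)
        if self.target_transform is not None:
            y = self.target_transform(y)
        return x, y

class ReverseSampler(Sampler):
    """Hypothetical sampler that yields indices from last to first."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return iter(range(self.n - 1, -1, -1))

    def __len__(self):
        return self.n

dataset = SquaresDataset(10, transform=lambda x: x / 10.0)  # scale inputs to [0, 1)
x0, y0 = dataset[3]                                         # calls dataset.__getitem__(3)
dataloader = DataLoader(dataset, batch_size=4, sampler=ReverseSampler(len(dataset)))

for xb, yb in dataloader:
    # Each batch stacks up to 4 samples; the sampler decides which indices come first
    print(xb.shape, yb.flatten().tolist())
```

With 10 samples and a batch size of 4, the last batch only contains the 2 leftover samples.
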
### How they all work together:
<center>
<a href="https://ibb.co/ZMgcwc4"><img src="https://i.ibb.co/W2zBdB8/Screenshot-2024-05-30-023025.png" alt="Screenshot-2024-05-30-023025" border="0"></a></center>

* 1. the dataset class allows you to index into different samples in your dataset
* 2. your CPU has different workers that grab the samples that make up a batch
* 3. the CPU workers load the fetched samples into a queue
* 4. the sampler class provides the indices of the samples we need to fetch
* 5. the dataloader performs the collating procedure, which draws samples from the queue and puts them into a batch

In [None]:
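# Here is an example of a collating function
# (a minimal version that assumes every sample is an (x, y) pair of equally shaped tensors)

def collate_fn(samples):
    xs, ys = zip(*samples)                      # split the list of pairs into inputs and targets
    batch = (torch.stack(xs), torch.stack(ys))  # stack along a new batch dimension
    return batch

# As input, the collate function takes the raw batch_size number of samples
# It then performs some processing (or any custom logic you define)
# to turn these samples into tensors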
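
A custom collate function earns its keep when samples cannot simply be stacked, for example sequences of different lengths. The sketch below pads each batch to its longest sequence; the `samples` list and the name `pad_collate` are made up for illustration.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

# Hypothetical samples: (token ids of varying length, label)
samples = [
    (torch.tensor([5, 2, 9]), 1),
    (torch.tensor([7]), 0),
    (torch.tensor([3, 3]), 1),
    (torch.tensor([4, 8, 1, 6]), 0),
]

def pad_collate(batch):
    seqs, labels = zip(*batch)
    # Pad every sequence with zeros up to the longest one in this batch
    padded = pad_sequence(list(seqs), batch_first=True, padding_value=0)
    return padded, torch.tensor(labels)

# A plain list works as a dataset here because it supports indexing and len()
loader = DataLoader(samples, batch_size=2, collate_fn=pad_collate)
for xb, yb in loader:
    print(xb.shape, yb.tolist())
```

The first batch is padded to length 3 and the second to length 4, since padding only goes as far as the longest sequence in each batch.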

In [5]:
import torch
import torch.nn as nn
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader,Dataset
import torch.nn.functional as F
from torchvision.transforms import v2
from torchvision.transforms import functional as TF 

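# Load the MNIST dataset
# transform: turns each image into a float tensor with values in [0, 1]
# target_transform: turns the integer label into a one-hot vector of size 10
dataset = MNIST('', train=True, download=True,
                transform=v2.Compose([v2.ToTensor()]),
                target_transform=v2.Compose([
                    lambda x: torch.LongTensor([x]),  # or just torch.tensor
                    lambda x: F.one_hot(x, 10)]))


# Create a dataloader for the dataset
dataloader = DataLoader(dataset, batch_size=64, shuffle=True)

# What is happening here?
# Each sample is transformed when it is fetched, and the dataloader
# groups 64 shuffled samples at a time into a batch.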



In [17]:
# Let's look at a sample from the dataset
# Each sample is a tuple (remember, it is a pair of x and y)
# This is the x (the image)
print(dataset[10][0])

# This is the label for the corresponding image
print(dataset[10][1])


tensor([[[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000],
         [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
          0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,

In [16]:
# Let's see what the shape of the sample is
print(dataset[10][0].shape)
# The shape is 1,28,28, which tells us that this image has 1 channel (it is a grayscale image) and a height and width of 28 pixels
print(dataset[10][1].shape)
# The shape 1,10 tells us that the label is a binary vector of size 10
# (the extra leading dimension of size 1 comes from wrapping the label in a list before one-hot encoding).
# This is what we call a one-hot encoded vector, where a 1 indicates the presence of that class.

torch.Size([1, 28, 28])
torch.Size([1, 10])
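
To see where that `[1, 10]` shape comes from, here is a small sketch that repeats the two steps of the target transform on a single made-up label (the digit 3).

```python
import torch
import torch.nn.functional as F

label = 3                                # hypothetical digit label
as_tensor = torch.LongTensor([label])    # wrapping in a list gives shape [1]
one_hot = F.one_hot(as_tensor, 10)       # one row of 10 entries, shape [1, 10]

print(one_hot)                   # tensor([[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]])
print(one_hot.shape)             # torch.Size([1, 10])
print(one_hot.squeeze(0).shape)  # torch.Size([10]) once the extra dimension is dropped
print(one_hot.argmax(dim=1))     # tensor([3]): argmax recovers the original label
```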


# Creating our very own neural network engine

class Value():
    

In [None]:
#