<a href="https://colab.research.google.com/github/changsin/AI/blob/main/math_fun.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Algebraic Playground

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Assumption: IMAGE_SIZE is not defined in this cell; 224 matches the x range below
IMAGE_SIZE = 224

x = np.linspace(1, IMAGE_SIZE, 1000)
y = []
for x1 in x:
  # Alternative samplers (disabled):
  # y.append(np.clip(np.random.chisquare(x1, 1), 0, IMAGE_SIZE))
  # y.append(np.clip(np.random.normal(loc=x1, scale=50, size=1), 0, IMAGE_SIZE))

  # Two noncentral chi-square samples (df=IMAGE_SIZE/2, nonc=50), clipped to [0, IMAGE_SIZE]
  y.append(np.clip(np.random.noncentral_chisquare(IMAGE_SIZE/2, 50, 2), 0, IMAGE_SIZE))

# y_line = -2.1*x_line + 20
fig, ax = plt.subplots()
ax.plot(x, y, 'g', label='y=wx+b')

## (0) Introduction

In [None]:
!pip install d2l==0.16.2



In this part, we look at the building blocks common to most machine learning code and illustrate how they work together using a simple but widely used model.

Let's assume we want to predict house prices. A simple model that already yields useful insights is the **linear regression model**:

$$\mathrm{price} = w_{\mathrm{area}} \cdot \mathrm{area} + w_{\mathrm{age}} \cdot \mathrm{age} + b.$$

This is equivalent to:

$$\hat{y} = w_1 x_1 + \ldots + w_d x_d + b,$$

or, for all examples at once in matrix form,

$$\hat{\mathbf{y}} = \mathbf{X} \mathbf{w} + b.$$

Let's first import the necessary libraries:
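A minimal sketch, assuming PyTorch as the tensor library (the framework used later in this notebook); the feature values for the single house are made up for illustration. It imports `torch` and checks that the sum form and the matrix form of the model agree:

```python
import torch

# Hypothetical toy values for one house: (area, age)
x = torch.tensor([3.0, 1.5])
w = torch.tensor([2.0, -3.4])
b = 4.2

# Sum form: w_1 x_1 + ... + w_d x_d + b
y_sum = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b

# Matrix form: X w + b, where X holds one example per row
X = x.reshape(1, -1)
y_mat = torch.matmul(X, w) + b

print(y_sum, y_mat)
```

Both expressions compute the same prediction; the matrix form simply handles many examples in one call.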

## (1) Generating the synthetic dataset

We want to generate 1,000 data points where the true model parameters are $\mathbf{w} = [2, -3.4]$ and $b = 4.2$. Additionally, we add a **noise term $\epsilon$** to simulate the imperfect process of data collection. The noise follows a **Gaussian/Normal distribution** with mean $0$ and variance $0.01$. The data is created using the following equation:

$$\mathbf{y}= \mathbf{X} \mathbf{w} + b + \boldsymbol{\epsilon}$$

Let's define the function which will generate this data:
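One possible version of such a function is sketched below. The name `synthetic_data` and the choice of standard-normal features are assumptions (the text only fixes the parameters and the noise); a variance of $0.01$ corresponds to a standard deviation of $0.1$.

```python
import torch

def synthetic_data(w, b, num_examples, noise_std=0.1):
    """Generate y = Xw + b + noise with X ~ N(0, 1) and noise ~ N(0, noise_std^2)."""
    X = torch.normal(0, 1, (num_examples, len(w)))   # assumed feature distribution
    y = torch.matmul(X, w) + b
    y += torch.normal(0, noise_std, y.shape)         # std 0.1 <=> variance 0.01
    return X, y.reshape((-1, 1))

true_w = torch.tensor([2.0, -3.4])
true_b = 4.2
features, labels = synthetic_data(true_w, true_b, 1000)
print(features.shape, labels.shape)
```

Each row of `features` is one example with two values, and `labels` is a column vector with one target per example.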

Let's look at the first data point and inspect the correlation between a feature and the labels by plotting all the data points:
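A small sketch of this check, assuming standard-normal features and the parameters given above. Plotting the second feature against the label is an illustrative choice, not necessarily the plot used originally.

```python
import matplotlib.pyplot as plt
import torch

torch.manual_seed(0)
true_w, true_b = torch.tensor([2.0, -3.4]), 4.2
features = torch.normal(0, 1, (1000, 2))
labels = (features @ true_w + true_b + torch.normal(0, 0.1, (1000,))).reshape(-1, 1)

# First data point: two features and one label
print('features:', features[0], '\nlabel:', labels[0])

# Scatter plot of the second feature against the label
fig, ax = plt.subplots()
ax.scatter(features[:, 1].numpy(), labels[:, 0].numpy(), s=1)
ax.set_xlabel('second feature')
ax.set_ylabel('label')
```

Because the weight on the second feature is negative, the point cloud should slope downwards.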

## (2) Iterating over the minibatches

Now that we have the dataset, we need a method to iterate over it to generate minibatches.
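A minimal sketch of such a method, written as a Python generator. The name `data_iter` and the tiny toy tensors are assumptions chosen for illustration.

```python
import random
import torch

def data_iter(batch_size, features, labels):
    """Yield shuffled minibatches of (features, labels)."""
    num_examples = len(features)
    indices = list(range(num_examples))
    random.shuffle(indices)  # visit the examples in random order
    for i in range(0, num_examples, batch_size):
        # The last batch may be smaller than batch_size
        batch_indices = torch.tensor(indices[i:min(i + batch_size, num_examples)])
        yield features[batch_indices], labels[batch_indices]

# Toy data: 10 examples with 2 features each
features = torch.arange(20.0).reshape(10, 2)
labels = torch.arange(10.0).reshape(10, 1)
for X, y in data_iter(4, features, labels):
    print(X.shape, y.shape)
```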

This method works as an iterator, but it is really inefficient! Later in this session we will see a better way to sample batches, using PyTorch's own data-loading classes.

## (3) Defining the model and initializing the model parameters

We are going to define the model and initialize the model parameters.
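A sketch of one common setup: small random weights, a zero bias, and gradient tracking switched on so that the parameters can be updated later. The initialization scale of `0.01` and the name `linreg` are assumptions.

```python
import torch

# Two input features, one output; values are assumptions for illustration
w = torch.normal(0, 0.01, size=(2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

def linreg(X, w, b):
    """The linear regression model: X w + b."""
    return torch.matmul(X, w) + b

print(linreg(torch.ones(3, 2), w, b).shape)
```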

## (4) Defining the loss function

We need a loss function so that we can calculate the gradient of the loss with respect to the parameters, which tells us in which direction to update the model. Here, we use the squared loss. For example $i$, it is given by:

$$l^{(i)}(\mathbf{w}, b) = \frac{1}{2} \left(\hat{y}^{(i)} - y^{(i)}\right)^2.$$
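The formula translates almost literally into code. This is a minimal sketch; the name `squared_loss` and the two toy predictions are illustrative.

```python
import torch

def squared_loss(y_hat, y):
    """Per-example squared loss, 1/2 * (y_hat - y)^2."""
    return (y_hat - y.reshape(y_hat.shape)) ** 2 / 2

y_hat = torch.tensor([[2.5], [0.0]])   # toy predictions
y = torch.tensor([3.0, -1.0])          # toy targets
print(squared_loss(y_hat, y))
```

The function returns one loss value per example; summing or averaging over the minibatch is left to the caller.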

## (5) Defining the optimization algorithm

Next, we describe how to update the parameters of the model (the $\mathbf{w}$ and $b$ of our model).

The most straightforward and commonly used algorithm is minibatch **stochastic gradient descent** (SGD):

$$(\mathbf{w},b) \leftarrow (\mathbf{w},b) - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \partial_{(\mathbf{w},b)} l^{(i)}(\mathbf{w},b).$$

The algorithm performs the following two steps:

(1) Initialize the values of the model parameters, typically at random;

(2) Iteratively sample random minibatches from the data, updating the parameters in the direction of the negative gradient.

For quadratic losses and affine transformations, we can write it as follows:

$$\begin{aligned} \mathbf{w} &\leftarrow \mathbf{w} -   \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \partial_{\mathbf{w}} l^{(i)}(\mathbf{w}, b) = \mathbf{w} - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \mathbf{x}^{(i)} \left(\mathbf{w}^\top \mathbf{x}^{(i)} + b - y^{(i)}\right),\\ b &\leftarrow b -  \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \partial_b l^{(i)}(\mathbf{w}, b)  = b - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \left(\mathbf{w}^\top \mathbf{x}^{(i)} + b - y^{(i)}\right). \end{aligned}$$
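The update rule can be sketched as follows, assuming the gradients were obtained by backpropagating the sum of the per-example losses (hence the division by the batch size). The name `sgd`, the learning rate, and the toy batch are illustrative.

```python
import torch

def sgd(params, lr, batch_size):
    """One minibatch SGD step: param <- param - lr / |B| * grad."""
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size
            param.grad.zero_()  # reset so gradients do not accumulate

# Toy minibatch of two examples
X = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
y = torch.tensor([[1.0], [2.0]])
w = torch.zeros((2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

loss = ((X @ w + b - y) ** 2 / 2).sum()  # summed squared loss
loss.backward()
sgd([w, b], lr=0.1, batch_size=len(X))
print(w, b)
```

Starting from zero parameters, the residuals are $-1$ and $-2$, so the explicit formulas above give $\mathbf{w} = [0.35, 0.5]$ and $b = 0.15$ after one step, which is what the code computes.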

## (6) Training the model

Let's define the training loop of the model. This is where all the magic happens!
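A self-contained sketch of such a loop, repeating compact versions of the pieces from the previous sections. The hyperparameters (`lr`, `num_epochs`, `batch_size`), the seeds, and all function names are assumptions.

```python
import random
import torch

torch.manual_seed(0)
random.seed(0)

# Synthetic data (standard-normal features are an assumption)
true_w, true_b = torch.tensor([2.0, -3.4]), 4.2
features = torch.normal(0, 1, (1000, 2))
labels = (features @ true_w + true_b).reshape(-1, 1) + torch.normal(0, 0.1, (1000, 1))

def data_iter(batch_size, features, labels):
    indices = list(range(len(features)))
    random.shuffle(indices)
    for i in range(0, len(indices), batch_size):
        idx = torch.tensor(indices[i:i + batch_size])
        yield features[idx], labels[idx]

def linreg(X, w, b):
    return torch.matmul(X, w) + b

def squared_loss(y_hat, y):
    return (y_hat - y.reshape(y_hat.shape)) ** 2 / 2

def sgd(params, lr, batch_size):
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size
            param.grad.zero_()

w = torch.normal(0, 0.01, size=(2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr, num_epochs, batch_size = 0.03, 3, 10

for epoch in range(num_epochs):
    for X, y in data_iter(batch_size, features, labels):
        l = squared_loss(linreg(X, w, b), y)  # per-example losses
        l.sum().backward()                    # gradients of the summed loss
        sgd([w, b], lr, batch_size)           # parameter update
    with torch.no_grad():
        train_l = squared_loss(linreg(features, w, b), labels)
        print(f'epoch {epoch + 1}, loss {float(train_l.mean()):f}')
```

Each epoch makes one full pass over the data in minibatches; after every pass the loss on the whole dataset is printed to monitor progress.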

## (7) Testing the model

We know the true parameters (since we created the data), so we can compare the learned parameters to this ground truth.
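The comparison itself is just a subtraction. To keep the sketch below self-contained, the learned parameters are obtained by a short full-batch gradient descent using the explicit gradient expressions from section (5); this is a stand-in for the minibatch training loop, not the loop itself, and the step size and iteration count are assumptions.

```python
import torch

torch.manual_seed(0)
true_w, true_b = torch.tensor([2.0, -3.4]), 4.2
features = torch.normal(0, 1, (1000, 2))
labels = features @ true_w + true_b + torch.normal(0, 0.1, (1000,))

# Stand-in for training: full-batch gradient descent on the squared loss
w, b = torch.zeros(2), torch.zeros(1)
for _ in range(200):
    residual = features @ w + b - labels             # shape (1000,)
    w -= 0.1 * (features.T @ residual) / len(labels)
    b -= 0.1 * residual.mean()

# Compare the learned parameters with the ground truth
print(f'error in estimating w: {true_w - w}')
print(f'error in estimating b: {true_b - b}')
```

The errors should be small but not exactly zero, since the labels contain noise.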

# Session 1: No need to reinvent the wheel! Concise implementation of the linear regression model

In the previous sections, we have seen how to set up a simple experiment:

* Load / create dataset
* Initialize a dataloader
* Initialize model
* Initialize the optimization algorithm
* Define the loss function
* Train and test the model

There are important aspects that we haven't talked about yet (and that we will probably touch upon in later sessions):

* How to pre-process the data / features
* How to do proper train / validation / test data splits

Another important point is that today's frameworks (PyTorch, TF/Keras, MXNet) offer plenty of pre-implemented classes and tools. We don't need to reinvent the wheel every single time.

Let's see how to simplify the above experiment using the PyTorch framework.

In [None]:
# Let's import the important modules



## (1) Getting the data and dataset
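A sketch of this step using PyTorch's `TensorDataset` and `DataLoader`, which replace the hand-written iterator. The helper name `load_array`, the batch size, and the standard-normal features are assumptions.

```python
import torch
from torch.utils import data

# Same synthetic data as before
true_w, true_b = torch.tensor([2.0, -3.4]), 4.2
features = torch.normal(0, 1, (1000, 2))
labels = (features @ true_w + true_b).reshape(-1, 1) + torch.normal(0, 0.1, (1000, 1))

def load_array(data_arrays, batch_size, is_train=True):
    """Wrap tensors in a TensorDataset and return a DataLoader over it."""
    dataset = data.TensorDataset(*data_arrays)
    return data.DataLoader(dataset, batch_size, shuffle=is_train)

batch_size = 10
data_iter = load_array((features, labels), batch_size)

# Fetch one minibatch to inspect its shape
X, y = next(iter(data_iter))
print(X.shape, y.shape)
```

Setting `shuffle=True` reshuffles the examples at the start of every epoch.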

## (2) Defining the model and initializing the parameters

In PyTorch, defining model architectures and initializing the parameters is a breeze! We just need the following three commands:
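One plausible reading of these three commands is sketched below: one line for the architecture and two for the initialization. The initialization values are assumptions.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(2, 1))   # 2 input features, 1 output
net[0].weight.data.normal_(0, 0.01)    # small random weights
net[0].bias.data.fill_(0)              # zero bias

print(net(torch.ones(4, 2)).shape)
```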

## (3) Defining the loss function
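A minimal sketch using the built-in `nn.MSELoss`. Note that, with its default settings, it averages the squared errors over the batch and omits the factor $\frac{1}{2}$ from the earlier formula; the toy tensors are illustrative.

```python
import torch
from torch import nn

loss = nn.MSELoss()  # mean of squared errors over all elements

y_hat = torch.tensor([[2.5], [0.0]])   # toy predictions
y = torch.tensor([[3.0], [-1.0]])      # toy targets
print(loss(y_hat, y))
```

For these values the squared errors are $0.25$ and $1$, so the mean is $0.625$.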

## (4) Defining the optimization algorithm
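A sketch using `torch.optim.SGD`, which takes the parameters to optimize and a learning rate. The learning rate of `0.03` and the toy batch are assumptions; the single step at the end only illustrates the usual `zero_grad` / `backward` / `step` sequence.

```python
import torch
from torch import nn

net = nn.Sequential(nn.Linear(2, 1))
trainer = torch.optim.SGD(net.parameters(), lr=0.03)

# One illustrative update on a toy batch
X, y = torch.ones(4, 2), torch.zeros(4, 1)
before = net[0].weight.detach().clone()

l = nn.MSELoss()(net(X), y)
trainer.zero_grad()   # clear old gradients
l.backward()          # compute new gradients
trainer.step()        # w <- w - lr * grad

print(before, net[0].weight.detach())
```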

## (5) Train the model
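Putting the pieces together, a self-contained sketch of the concise training loop might look like this. The hyperparameters and the seed are assumptions, and the data is regenerated here so that the block runs on its own.

```python
import torch
from torch import nn
from torch.utils import data

torch.manual_seed(0)

# Data and data loader
true_w, true_b = torch.tensor([2.0, -3.4]), 4.2
features = torch.normal(0, 1, (1000, 2))
labels = (features @ true_w + true_b).reshape(-1, 1) + torch.normal(0, 0.1, (1000, 1))
data_iter = data.DataLoader(data.TensorDataset(features, labels),
                            batch_size=10, shuffle=True)

# Model, loss, optimizer
net = nn.Sequential(nn.Linear(2, 1))
net[0].weight.data.normal_(0, 0.01)
net[0].bias.data.fill_(0)
loss = nn.MSELoss()
trainer = torch.optim.SGD(net.parameters(), lr=0.03)

num_epochs = 3
for epoch in range(num_epochs):
    for X, y in data_iter:
        l = loss(net(X), y)
        trainer.zero_grad()
        l.backward()
        trainer.step()
    with torch.no_grad():
        train_l = loss(net(features), labels)  # loss on the full dataset
    print(f'epoch {epoch + 1}, loss {float(train_l):f}')
```

Compared with the from-scratch version, the iterator, the model, the loss, and the update rule are all provided by the framework; only the loop itself remains.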

## (6) Checking the model parameters

Now that the model is trained, we should check whether the learned parameters are close to the true ones:

As we have seen, the same experiment can be recreated in only a few lines using the PyTorch framework. Frameworks let us concentrate on the interesting parts of the problem at hand and provide useful functions and classes for running experiments quickly!
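A sketch of this check: the learned weight and bias are read from the first layer of the network and subtracted from the true values. So that the block runs on its own, a brief full-batch fit stands in for the training step; its learning rate and iteration count are assumptions.

```python
import torch
from torch import nn

torch.manual_seed(0)
true_w, true_b = torch.tensor([2.0, -3.4]), 4.2
features = torch.normal(0, 1, (1000, 2))
labels = (features @ true_w + true_b).reshape(-1, 1) + torch.normal(0, 0.1, (1000, 1))

net = nn.Sequential(nn.Linear(2, 1))
loss = nn.MSELoss()
trainer = torch.optim.SGD(net.parameters(), lr=0.1)
for _ in range(200):  # brief full-batch fit standing in for the training step
    trainer.zero_grad()
    loss(net(features), labels).backward()
    trainer.step()

# Read the learned parameters from the linear layer
w = net[0].weight.data
b = net[0].bias.data
print('error in estimating w:', true_w - w.reshape(true_w.shape))
print('error in estimating b:', true_b - b)
```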