
<a href="https://colab.research.google.com/github/kokchun/Maskininlarning-AI21/blob/main/Exercises/E01_gradient_descent.ipynb" target="_parent"><img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a> &nbsp; to see hints and answers.

---
# Gradient descent exercises

---
These are introductory exercises in machine learning with a focus on **gradient descent**.

<p class = "alert alert-info" role="alert"><b>Note</b> that all datasets used in this exercise can be found in the Data folder of the course GitHub repo.</p>

<p class = "alert alert-info" role="alert"><b>Note</b> that when you start to repeat code, create functions to reuse it instead.</p>

<p class = "alert alert-info" role="alert"><b>Remember</b> to use <b>descriptive variable, function, index</b> and <b>column names</b> to get readable code.</p>

The number of stars, (\*), (\*\*) or (\*\*\*), denotes the difficulty level of the task.

---

## 0. Simulate dataset (*)

Simulate datasets according to these rules:

- set random seed to 42
- $X$ with shape (1000, 2), i.e. 1000 rows and 2 columns, with entries sampled from $\mathcal{U}(0,1)$
- 1000 samples from $\epsilon \sim \mathcal{N}(0,1)$
- $y = 3x_1 + 5x_2 + 3 + \epsilon$ , where $x_i$ is column $i$ of $X$

Finally add a column of ones for the intercept to $X$.

<details>

<summary>Hint</summary>

To simulate X, use

```
np.random.rand(samples, 2)
```

To concatenate with a column of ones, use ```np.c_[..., ...]```.

</details>

<details>

<summary>Answer</summary>

```
array([[1.        , 0.37454012, 0.95071431],
       [1.        , 0.73199394, 0.59865848],
       [1.        , 0.15601864, 0.15599452],
       [1.        , 0.05808361, 0.86617615],
       [1.        , 0.60111501, 0.70807258]])

```

</details>

---

In [46]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt 

np.random.seed(42)
samples = 1000

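X = np.random.rand(samples, 2)
X_1 = X[:, [0]]  # keep the features as column vectors of shape (samples, 1)
X_2 = X[:, [1]]
e = np.random.randn(samples, 1)
# every term has shape (samples, 1), so y is a column vector and not a (samples, samples) matrix
y = 3*X_1 + 5*X_2 + 3 + e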

X = np.c_[np.ones(samples), X]
X

array([[1.        , 0.37454012, 0.95071431],
       [1.        , 0.73199394, 0.59865848],
       [1.        , 0.15601864, 0.15599452],
       ...,
       [1.        , 0.75137509, 0.65695516],
       [1.        , 0.95661462, 0.06895802],
       [1.        , 0.05705472, 0.28218707]])

In [65]:

X[:,1]

array([0.37454012, 0.73199394, 0.15601864, ..., 0.75137509, 0.95661462,
       0.05705472])

## 1. Gradient descent - learning rate (*)

Use gradient descent to calculate $\vec{\theta} = (\theta_0, \theta_1, \theta_2)^T$.

&nbsp; a) Use $\eta = 0.1$ and simulate 500 epochs of batch gradient descent. Plot the resulting $\vec{\theta}$ values for every 5th epoch. (*)

&nbsp; b) Do the same as in a) but with learning rate $\eta = 0.01$ and 5000 epochs, and plot every 20th epoch. What do you notice when changing the learning rate? (*)

&nbsp; c) Experiment with larger and smaller $\eta$ and see what happens.
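
For reference, batch gradient descent on the mean squared error updates $\vec{\theta} \leftarrow \vec{\theta} - \eta \frac{2}{m} X^T(X\vec{\theta} - \vec{y})$, using all $m$ samples in every epoch. One possible way to record $\vec{\theta}$ for plotting is sketched below. The function name `batch_gradient_descent`, its `record_every` parameter, and the least-squares comparison are assumptions made for illustration, not part of the exercise.

```python
import numpy as np

# Simulate the dataset from task 0, keeping y as a column vector of shape (samples, 1).
np.random.seed(42)
samples = 1000
X = np.random.rand(samples, 2)
e = np.random.randn(samples, 1)
y = X @ np.array([[3.0], [5.0]]) + 3 + e
X = np.c_[np.ones(samples), X]


def batch_gradient_descent(X, y, learning_rate=0.1, epochs=500, record_every=5):
    """Run batch gradient descent and record theta every `record_every` epochs."""
    m = len(X)
    theta = np.random.randn(X.shape[1], 1)
    recorded_epochs, recorded_thetas = [], []

    for epoch in range(epochs):
        if epoch % record_every == 0:
            recorded_epochs.append(epoch)
            recorded_thetas.append(theta.ravel().copy())
        # Gradient of the MSE cost, computed on the full dataset.
        gradient = 2 / m * X.T @ (X @ theta - y)
        theta = theta - learning_rate * gradient

    return theta, np.array(recorded_epochs), np.array(recorded_thetas)


theta_a, epochs_a, thetas_a = batch_gradient_descent(X, y, 0.1, 500, 5)
theta_b, epochs_b, thetas_b = batch_gradient_descent(X, y, 0.01, 5000, 20)

# Hypothetical sanity check: the least-squares solution that gradient descent approaches.
theta_least_squares = np.linalg.lstsq(X, y, rcond=None)[0]
print(theta_a.ravel(), theta_b.ravel(), theta_least_squares.ravel())
```

With `matplotlib.pyplot` imported as `plt`, `plt.plot(epochs_a, thetas_a)` draws one curve per parameter, since each column of `thetas_a` holds the recorded values of one $\theta_j$.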


<details>

<summary>Answer</summary>

a) 

<img src="../assets/grad_desc_converg.png" height="200"/>

b) 

<img src="../assets/grad_desc_converg_001.png" height="200"/>

</details>

---

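In [48]:
def gradient_descent(X, y, learning_rate = .1, iterations = 500):
    """Batch gradient descent on the MSE cost; y must have shape (m, 1)."""
    m = len(X)

    # random initial parameters, one per column of X
    theta = np.random.randn(X.shape[1],1)

    for _ in range(iterations):
        # gradient of the MSE cost with respect to theta
        gradient = 2/m*X.T@(X@theta-y)
        theta -= learning_rate*gradient

    return theta

# y must be a column vector of shape (samples, 1): with a (samples, samples) y the
# gradient becomes (3, samples) and the in-place update of theta raises a ValueError
y = X[:,1:]@np.array([[3],[5]]) + 3 + e
theta = gradient_descent(X,y)
theta.reshape(-1)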

## 2. Stochastic Gradient Descent - learning rate (**)

Repeat task 1 using stochastic gradient descent instead. Also adjust the number of epochs to see whether you can reach convergence. What conclusions can you draw from your experiments? (**)
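
As a starting point, stochastic gradient descent can be sketched as the same update applied to one sample at a time. The function name `stochastic_gradient_descent`, the shuffled pass over the data in every epoch, and the constant learning rate are assumptions chosen for this sketch; a decaying learning rate schedule is a common alternative.

```python
import numpy as np

# Simulate the dataset from task 0, keeping y as a column vector of shape (samples, 1).
np.random.seed(42)
samples = 1000
X = np.random.rand(samples, 2)
e = np.random.randn(samples, 1)
y = X @ np.array([[3.0], [5.0]]) + 3 + e
X = np.c_[np.ones(samples), X]


def stochastic_gradient_descent(X, y, learning_rate=0.01, epochs=20):
    """Update theta from a single sample at a time and record it after each epoch."""
    m = len(X)
    theta = np.random.randn(X.shape[1], 1)
    theta_per_epoch = [theta.ravel().copy()]

    for _ in range(epochs):
        # Visit every sample once per epoch, in a new random order.
        for i in np.random.permutation(m):
            x_i = X[i:i + 1]  # shape (1, 3)
            y_i = y[i:i + 1]  # shape (1, 1)
            # Gradient of the squared error of this single sample.
            gradient = 2 * x_i.T @ (x_i @ theta - y_i)
            theta = theta - learning_rate * gradient
        theta_per_epoch.append(theta.ravel().copy())

    return theta, np.array(theta_per_epoch)


theta_sgd, theta_history_sgd = stochastic_gradient_descent(X, y)
print(theta_sgd.ravel())
```

Because every update uses only one noisy sample, the recorded values keep fluctuating around the solution instead of settling exactly; comparing several values of `learning_rate` makes this visible.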

---

## 3. Mini Batch Gradient Descent (**)

Now try different mini-batch sizes and make some exploratory plots to study convergence. You can also compare with the other algorithms by using the same $\eta$ and the same number of epochs to see how they differ in terms of convergence. (**)
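
A minimal sketch of mini-batch gradient descent follows, assuming shuffled batches without replacement in every epoch. The function name `mini_batch_gradient_descent` and the chosen batch sizes are hypothetical. In this sketch `batch_size=1` behaves like stochastic gradient descent and `batch_size=samples` like batch gradient descent, which allows a comparison with the same $\eta$ and number of epochs.

```python
import numpy as np

# Simulate the dataset from task 0, keeping y as a column vector of shape (samples, 1).
np.random.seed(42)
samples = 1000
X = np.random.rand(samples, 2)
e = np.random.randn(samples, 1)
y = X @ np.array([[3.0], [5.0]]) + 3 + e
X = np.c_[np.ones(samples), X]


def mini_batch_gradient_descent(X, y, batch_size=32, learning_rate=0.1, epochs=50):
    """Update theta from one mini-batch at a time and record it after each epoch."""
    m = len(X)
    theta = np.random.randn(X.shape[1], 1)
    theta_per_epoch = [theta.ravel().copy()]

    for _ in range(epochs):
        shuffled = np.random.permutation(m)
        for start in range(0, m, batch_size):
            batch = shuffled[start:start + batch_size]
            X_batch, y_batch = X[batch], y[batch]
            # Gradient of the MSE cost on the current mini-batch only.
            gradient = 2 / len(batch) * X_batch.T @ (X_batch @ theta - y_batch)
            theta = theta - learning_rate * gradient
        theta_per_epoch.append(theta.ravel().copy())

    return theta, np.array(theta_per_epoch)


# Same learning rate and number of epochs for every batch size.
results = {size: mini_batch_gradient_descent(X, y, batch_size=size)
           for size in (1, 32, samples)}
for size, (theta, _) in results.items():
    print(size, theta.ravel())
```

Each entry of `results` holds the final $\vec{\theta}$ and its per-epoch history, so the histories can be plotted side by side to compare convergence.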

---

Kokchun Giang

[LinkedIn][linkedIn_kokchun]

[GitHub portfolio][github_portfolio]

[linkedIn_kokchun]: https://www.linkedin.com/in/kokchungiang/
[github_portfolio]: https://github.com/kokchun/Portfolio-Kokchun-Giang

---