# Linear Regression

Linear regression is one of the most basic machine learning models. It solves a regression problem: given a feature vector $\mathbf{x}$, it predicts a continuous output $y$. For a single feature, the hypothesis is a straight line:

$$
h_\theta(x) = \theta_0 + \theta_1x
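$$

With $n$ features, and with the convention $x_0 = 1$ so that $\theta_0$ acts as the intercept, this generalizes to the vector form:

$$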
h_\theta(x) = \theta^T \mathbf{x}
$$
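
The vector form maps directly onto a matrix product. Below is a minimal sketch, assuming the features are stored in a NumPy array with a leading column of ones for $x_0 = 1$. The function name `predict` and the sample values are hypothetical and chosen only for illustration.

```python
import numpy as np

def predict(theta, X):
    """Evaluate h_theta(x) = theta^T x for every row of X.

    X is assumed to already contain a leading column of ones (x_0 = 1).
    """
    return X @ theta

# Hypothetical data: three samples with a single feature each.
x = np.array([1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])  # prepend the x_0 = 1 column

# Hypothetical parameters: theta_0 = 0.5 (intercept), theta_1 = 2.0 (slope).
theta = np.array([0.5, 2.0])

predictions = predict(theta, X)
print(predictions)
```

Each prediction is $\theta_0 + \theta_1 x$ for the corresponding sample, so the single-feature and vector forms agree.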

The **objective** is to choose the parameters $\theta$ such that the hypothesis (the prediction) $h_\theta(x)$ is close to the target $y$.

To do that, we use a **loss function** to measure the error between the predicted output and the target output. The loss function used here is the mean squared error (MSE), averaged over the $m$ training examples.

$$
J(\theta) = \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2
$$
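
As a rough illustration of the formula above, the MSE can be computed in a couple of lines with NumPy. This is a sketch under the assumption that `X` carries a leading column of ones. The name `mse_loss` and the toy data are hypothetical.

```python
import numpy as np

def mse_loss(theta, X, y):
    """Mean squared error J(theta) = (1/m) * sum((h_theta(x_i) - y_i)^2)."""
    residuals = X @ theta - y  # h_theta(x^(i)) - y^(i) for every sample
    return np.mean(residuals ** 2)

# Hypothetical data generated from y = 1 + 2x, with x_0 = 1 in the first column.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([3.0, 5.0, 7.0])

loss_exact = mse_loss(np.array([1.0, 2.0]), X, y)  # parameters that fit the data
loss_zero = mse_loss(np.zeros(2), X, y)            # all-zero initial guess

print(loss_exact, loss_zero)
```

The loss is zero for parameters that reproduce the targets exactly and grows as the predictions move away from them.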

To make accurate predictions, we need to minimize this loss. Here we do so with the gradient descent (GD) algorithm.

$$
\min_\theta J(\theta) = \min_\theta \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2
$$

## Gradient Descent

1. Start with an initial guess for the parameters $\theta$.
2. Compute the gradient (the partial derivatives) of $J(\theta)$. The gradient points in the direction of steepest increase, so its negative gives the direction to move in. Roughly speaking, look around $360^{\circ}$ for the direction that goes down most steeply.
3. Update the parameters in that direction to reduce the loss $J(\theta)$.
4. Keep updating the parameters until you end up at a minimum or close to it.

$$
\text{repeat until convergence} \{ \newline \qquad \qquad \qquad \quad \theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j} \quad \quad (\text{for } j=0,1,\dots,n)  \}
$$

$$
\frac{\partial J(\theta)}{\partial \theta_j} = \frac{2}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)}
$$
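
One way to sanity-check a gradient formula is to compare it with a finite-difference approximation of the loss. The sketch below does this for the MSE defined with the $\frac{1}{m}$ factor. Differentiating the square contributes a factor of 2, which some texts absorb by defining the loss with $\frac{1}{2m}$ instead. The function names and data are hypothetical.

```python
import numpy as np

def mse_loss(theta, X, y):
    """J(theta) = (1/m) * sum of squared residuals."""
    return np.mean((X @ theta - y) ** 2)

def mse_gradient(theta, X, y):
    """Analytic gradient of the (1/m)-scaled MSE: (2/m) * X^T (X theta - y)."""
    m = len(y)
    return (2.0 / m) * X.T @ (X @ theta - y)

def numerical_gradient(theta, X, y, eps=1e-6):
    """Central finite-difference approximation of the gradient."""
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (mse_loss(theta + step, X, y)
                   - mse_loss(theta - step, X, y)) / (2 * eps)
    return grad

# Hypothetical data from y = 1 + 2x, with x_0 = 1 in the first column.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([3.0, 5.0, 7.0])
theta = np.array([0.0, 0.0])

analytic = mse_gradient(theta, X, y)
numeric = numerical_gradient(theta, X, y)
print(analytic, numeric)
```

If the two vectors agree to several decimal places, the analytic expression matches the loss it is supposed to differentiate.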

The gradient tells us which direction to move in (we step against it, since it points uphill), and the learning rate $\alpha$ controls how big a step we take.
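
Putting the steps together, a batch gradient descent loop might look like the sketch below. It assumes the MSE with the $\frac{1}{m}$ factor, whose gradient carries a factor of $\frac{2}{m}$ (a constant that can equally be folded into $\alpha$). The function name `gradient_descent`, the default values for `alpha`, `n_iters`, and `tol`, and the toy data are all hypothetical choices for illustration, not tuned recommendations.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, n_iters=5000, tol=1e-10):
    """Minimize the MSE with batch gradient descent.

    X is assumed to contain a leading column of ones (x_0 = 1).
    Stops early once the gradient norm drops below tol.
    """
    m, n = X.shape
    theta = np.zeros(n)  # step 1: initial guess
    for _ in range(n_iters):
        # step 2: gradient of J(theta) at the current parameters
        grad = (2.0 / m) * X.T @ (X @ theta - y)
        # step 3: move against the gradient; all theta_j update simultaneously
        theta = theta - alpha * grad
        # step 4: stop when the gradient is (almost) zero
        if np.linalg.norm(grad) < tol:
            break
    return theta

# Hypothetical data generated from y = 1 + 2x.
x = np.array([1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

theta = gradient_descent(X, y)
print(theta)  # should be close to [1, 2]
```

If `alpha` is too large the updates overshoot and the loss diverges. If it is too small, convergence becomes slow, so in practice the learning rate is usually chosen by trying a few values.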

<div>
<img src="./illustrations/gradient_descent.png" width="500"/>
</div>

import numpy as np