# LESSON 2: LINEAR REGRESSION
<table><tr>
<td> <img src="../images/linear_logistic_regression_logo.jpeg" width="600px"/> </td>
</tr></table>

*This lecture is based on [machinelearningcoban.com](https://machinelearningcoban.com/2016/12/28/linearregression/)*

## 1. Linear regression introduction

<img src="../images/ml_house_prices_example.png" width="400px"/>

Consider a **house price prediction** problem in which each house has 3 features:
- ${x}_{1}$ is the size of the house (in ${m}^{2}$)
- ${x}_{2}$ is the number of bedrooms in the house
- ${x}_{3}$ is the distance from the house to the city center (in km)

The label $y$ is the price of the house.

We need to build a function that computes the price of a house from the features above, $\vec{x}=[x_1, x_2, x_3]$:

<center>
    $\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3$
</center>

- $\vec{\theta}=[\theta_0, \theta_1, \theta_2, \theta_3]^{T}$ is the vector of model parameters.
- $\hat{y}$ is the model's prediction, which we expect to be close to the true price $y$.

We use a **linear** function to solve a **regression** problem, which is why the method is called **LINEAR REGRESSION**.
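As a minimal sketch of the formula above, the snippet below evaluates $\hat{y}$ for a single house. The values in `theta` and `x` are made up for illustration and are not fitted to any data; `predict` is just a helper name used here.

```python
import numpy as np

# Hypothetical parameters [theta_0, theta_1, theta_2, theta_3], chosen only for illustration.
theta = np.array([50.0, 2.0, 10.0, -3.0])

# One hypothetical house: 80 m^2, 3 bedrooms, 5 km from the city center.
x = np.array([80.0, 3.0, 5.0])

def predict(x, theta):
    """Return theta_0 + theta_1*x_1 + ... + theta_n*x_n for one sample."""
    return theta[0] + np.dot(theta[1:], x)

y_hat = predict(x, theta)
print(y_hat)  # 50 + 2*80 + 10*3 - 3*5 = 225.0
```

The intercept $\theta_0$ is handled separately from the feature weights here; the matrix form used later folds it in by adding a constant feature equal to 1.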

## 2. Loss function and Optimizer
Generalizing the problem to $n$ features, each sample has a feature vector $\vec{x}=[x_1, x_2, ..., x_n]$, and the prediction becomes:

<center>
    $\hat{y}=\theta_0+\theta_1 x_1 + \theta_2 x_2 +...+\theta_n x_n$
</center>

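Given $m$ training samples $\vec{x}^{(1)}, \vec{x}^{(2)},...,\vec{x}^{(m)}$ with labels $y^{(1)}, y^{(2)},...,y^{(m)}$, linear regression finds the $\vec{\theta}$ that minimizes the difference between each prediction $\hat{y}^{(i)}$ and its label $y^{(i)}$. To write the prediction as a product, we prepend a constant 1 to every sample, $\vec{x}^{(i)}=[1, x^{(i)}_1, ..., x^{(i)}_n]$, so that $\hat{y}^{(i)}=\vec{x}^{(i)}\vec{\theta}$. The samples are stacked as the rows of the matrix $X$ and the labels form the vector $y$.

To measure this difference, we define a **LOSS FUNCTION**. For linear regression, we use the Mean Squared Error (MSE). Here the factor $\frac{1}{2}$ is used in place of the usual $\frac{1}{m}$; it simplifies the derivative and does not change the minimizer.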

<center>
    \[
    \begin{aligned}
    MSE(\vec{\theta})
    &= \frac{1}{2}\sum_{i=1}^{m}(\hat{y}^{(i)} - y^{(i)})^2 \\
    &= \frac{1}{2}\sum_{i=1}^{m}(\vec{x}^{(i)}\vec{\theta} - y^{(i)})^2 \\
    &= \frac{1}{2}\|X\vec{\theta} - y\|_2^2
    \end{aligned}
    \]
</center>
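The sum form and the matrix form of the loss above give the same number, which the following sketch checks on toy data. It assumes each sample is extended with a leading 1 so that $\theta_0$ acts as the intercept; the helper names (`add_bias_column`, `loss_loop`, `loss_vectorized`) and the data are illustrative only.

```python
import numpy as np

def add_bias_column(X_raw):
    """Prepend a column of ones so that theta_0 acts as the intercept."""
    return np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])

def loss_loop(X, y, theta):
    """Half the sum of squared errors, one sample at a time."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        total += (np.dot(x_i, theta) - y_i) ** 2
    return 0.5 * total

def loss_vectorized(X, y, theta):
    """Same quantity written as 0.5 * ||X theta - y||^2."""
    residual = X @ theta - y
    return 0.5 * (residual @ residual)

# Toy data: 3 samples with 1 feature each (made up for illustration).
X = add_bias_column(np.array([[1.0], [2.0], [3.0]]))
y = np.array([2.0, 4.0, 7.0])
theta = np.array([0.0, 2.0])  # predictions are 2, 4, 6

print(loss_loop(X, y, theta), loss_vectorized(X, y, theta))  # both 0.5
```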

We need to find the $\vec{\theta}$ that minimizes the MSE function. This $\vec{\theta}$ is called the ***optimal point***.

<center>
    $\vec{\theta}^{*} = \arg\min_{\vec{\theta}} MSE(\vec{\theta})$
</center>

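To find the optimal point $\vec{\theta}^{*}$, we set the gradient of the loss to zero:

<center>
    \[
    \frac{\partial MSE}{\partial\vec{\theta}}
    = {X}^{T}(X\vec{\theta} - y) = 0
    \]
</center>

This follows from the chain rule. Writing the loss as $\frac{1}{2}\vec{r}^{T}\vec{r}$ with the residual vector $\vec{r} = X\vec{\theta} - y$, and using the identity
<center>
    \[
    \frac{\partial (Ax+b)}{\partial x} = {A}^{T}
    \]
</center>

with $A = X$ and $b = -y$, we get $\frac{\partial MSE}{\partial\vec{\theta}} = {X}^{T}\vec{r}$. Note that $(X\vec{\theta} - y)$ is a vector of length $m$, not a scalar, so the order of the factors matters: ${X}^{T}$ must multiply it from the left.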
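The gradient expression ${X}^{T}(X\vec{\theta} - y)$ can be sanity-checked numerically. The sketch below compares it with a central finite-difference approximation of the loss on random data; the function names, the step size `eps`, and the data are assumptions made for this illustration.

```python
import numpy as np

def loss(X, y, theta):
    """0.5 * ||X theta - y||^2, the loss used in this lesson."""
    residual = X @ theta - y
    return 0.5 * (residual @ residual)

def analytic_gradient(X, y, theta):
    """Closed-form gradient X^T (X theta - y)."""
    return X.T @ (X @ theta - y)

def numerical_gradient(X, y, theta, eps=1e-6):
    """Central finite differences, one coordinate of theta at a time."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (loss(X, y, theta + step) - loss(X, y, theta - step)) / (2 * eps)
    return grad

# Random data: 5 samples, a bias column plus 2 features.
rng = np.random.default_rng(0)
X = np.hstack([np.ones((5, 1)), rng.normal(size=(5, 2))])
y = rng.normal(size=5)
theta = rng.normal(size=3)

grads_match = np.allclose(
    analytic_gradient(X, y, theta), numerical_gradient(X, y, theta), atol=1e-5
)
print(grads_match)
```

If the factors were written in the other order, the shapes would not even be compatible for matrix multiplication, which is a quick way to catch this kind of mistake.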

Solving this equation for $\vec{\theta}$:
<center>
    \[
    \begin{aligned}
    {X}^{T}(X\vec{\theta} - y) &= 0 \\
    {X}^{T}X\vec{\theta} &= {X}^{T}y \\
    \vec{\theta} &= ({X}^{T}X)^{-1}{X}^{T}y
    \end{aligned}
    \]
</center>

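Finally, $\vec{\theta}^{*} = ({X}^{T}X)^{-1}{X}^{T}y$ is the solution of $\frac{\partial MSE}{\partial\vec{\theta}} = 0$, provided that ${X}^{T}X$ is invertible. When it is not, the pseudo-inverse of ${X}^{T}X$ can be used instead.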

## 3. Implementation example
### 3.1. Implement from scratch
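
One possible from-scratch version, sketched with NumPy, applies the closed-form solution $\vec{\theta}^{*} = ({X}^{T}X)^{-1}{X}^{T}y$ directly. The class name `LinearRegressionScratch`, the attribute `theta_`, and the synthetic data are illustrative choices, not part of the original lesson. The sketch uses `np.linalg.pinv` rather than a plain inverse so that a singular ${X}^{T}X$ does not raise an error.

```python
import numpy as np

class LinearRegressionScratch:
    """Least-squares linear regression via the normal equation (illustrative sketch)."""

    @staticmethod
    def _add_bias(X_raw):
        # Prepend a column of ones so theta_[0] is the intercept theta_0.
        return np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])

    def fit(self, X_raw, y):
        X = self._add_bias(X_raw)
        # theta = (X^T X)^+ X^T y, with the pseudo-inverse in place of the inverse.
        self.theta_ = np.linalg.pinv(X.T @ X) @ X.T @ y
        return self

    def predict(self, X_raw):
        return self._add_bias(X_raw) @ self.theta_

# Synthetic, noise-free data generated from y = 3 + 2*x_1 - x_2.
rng = np.random.default_rng(42)
X_raw = rng.uniform(0, 10, size=(50, 2))
y = 3.0 + 2.0 * X_raw[:, 0] - 1.0 * X_raw[:, 1]

model = LinearRegressionScratch().fit(X_raw, y)
print(model.theta_)  # approximately [3, 2, -1]
```

Because the data are noise-free and exactly linear, the recovered parameters should match the generating coefficients up to floating-point error; with real data they would only be the least-squares fit.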

### 3.2. Use `sklearn`
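
A minimal sketch using scikit-learn's `LinearRegression` estimator is shown below. The synthetic data are made up for illustration. With the default `fit_intercept=True`, no column of ones needs to be added: the intercept $\theta_0$ is stored in `intercept_` and the remaining weights in `coef_`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data generated from y = 3 + 2*x_1 - x_2.
rng = np.random.default_rng(42)
X_raw = rng.uniform(0, 10, size=(50, 2))
y = 3.0 + 2.0 * X_raw[:, 0] - 1.0 * X_raw[:, 1]

model = LinearRegression()  # fit_intercept=True by default
model.fit(X_raw, y)

print(model.intercept_)  # approximately 3
print(model.coef_)       # approximately [2, -1]

# Predict the label of a new, hypothetical sample.
print(model.predict(np.array([[1.0, 1.0]])))  # approximately [4]
```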

## 4. Homework
### 4.1. 