import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)


import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Linear Regression
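- We have one dependent variable and one independent variable.
  $$h_\theta(x)=\theta_0 + \theta_1 * x$$
- For example, suppose we have to predict weight given height.
- Prediction: $\hat{y}_i = h_\theta(x_i)$
- Error: $e_i = \hat{y}_i - y_i = h_\theta(x_i) - y_i$
- We need to minimise this error over all data points.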
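
As a small illustration of the hypothesis and the per-point error, here is a minimal sketch. The height and weight values and the two parameter guesses are made up for the example, and `predict` is a hypothetical helper name, not something from a library.

```python
import numpy as np

def predict(theta0, theta1, x):
    """Hypothesis h_theta(x) = theta0 + theta1 * x for a single feature."""
    return theta0 + theta1 * np.asarray(x, dtype=float)

# Hypothetical heights (cm) and weights (kg), invented for illustration.
heights = np.array([150.0, 160.0, 170.0, 180.0])
weights = np.array([50.0, 58.0, 66.0, 74.0])

# Illustrative parameter guesses, not fitted values.
theta0, theta1 = -60.0, 0.75

predictions = predict(theta0, theta1, heights)
errors = predictions - weights  # e_i = h_theta(x_i) - y_i

print(predictions)
print(errors)
```

Each entry of `errors` is the signed gap between the predicted and the actual weight for one person; fitting the model means choosing `theta0` and `theta1` so that these gaps are small overall.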

## Cost Function  
$$J(\theta_0, \theta_1) = \frac{1}{2m} * \sum_{i=1}^{m}(h_\theta(x_i) - y_i)^2$$
- We need to select values of $\theta_0$ and $\theta_1$ that minimize the cost function.
- We use the gradient descent algorithm to minimize the cost function. Its update rule is:
  $$\theta_{new} = \theta_{old} - \alpha * g(\theta_{old})$$
  where $g(\theta)$ is the gradient vector of $J(\theta)$ and $\alpha$ is the learning rate, which controls the step size: if it is too small, convergence is slow; if it is too large, the updates can overshoot and diverge.
- We keep updating the $\theta$ vector with the above equation until it converges to the minimum. For linear regression $J$ is convex, so this minimum is the global minimum.
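
The cost function and the update rule above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions: the data is synthetic (generated from $\theta_0 = 1$, $\theta_1 = 2$), and the learning rate and iteration count are illustrative choices that happen to work for this data, not general recommendations.

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """J(theta0, theta1) = 1/(2m) * sum((h_theta(x_i) - y_i)^2)."""
    m = len(x)
    residuals = theta0 + theta1 * x - y
    return np.sum(residuals ** 2) / (2 * m)

def gradient_descent(x, y, alpha=0.1, n_iters=2000):
    """Batch gradient descent for simple linear regression."""
    theta0, theta1 = 0.0, 0.0
    m = len(x)
    for _ in range(n_iters):
        residuals = theta0 + theta1 * x - y
        # Partial derivatives of J with respect to theta0 and theta1.
        grad0 = np.sum(residuals) / m
        grad1 = np.sum(residuals * x) / m
        # theta_new = theta_old - alpha * gradient
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

# Synthetic data generated from theta0 = 1, theta1 = 2 (no noise).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x

theta0, theta1 = gradient_descent(x, y)
print(theta0, theta1, cost(theta0, theta1, x, y))
```

Both parameters are updated from the same `residuals`, so the step uses the gradient at $\theta_{old}$ for every component, as the update rule requires.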

## Multiple Linear Regression

- With $n$ independent variables, the equation looks like:
$$h_{\theta}(x_1, x_2, x_3,\cdots,x_n) = \theta_0 + \theta_1 * x_1 + \theta_2 * x_2 + \theta_3 * x_3 + \cdots +\theta_n * x_n$$
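
With several features, the hypothesis is usually evaluated as a matrix product. The sketch below assumes the common convention of prepending a column of ones so that $\theta_0$ acts as the intercept; `predict_multi` is a hypothetical helper name and the numbers are illustrative.

```python
import numpy as np

def predict_multi(theta, X):
    """h_theta(x) = theta_0 + theta_1*x_1 + ... + theta_n*x_n for each row of X."""
    X = np.asarray(X, dtype=float)
    # Prepend a column of ones so theta[0] plays the role of the intercept.
    X_b = np.column_stack([np.ones(X.shape[0]), X])
    return X_b @ np.asarray(theta, dtype=float)

# Illustrative parameters: theta_0 = 1, theta_1 = 2, theta_2 = -1, theta_3 = 0.5
theta = np.array([1.0, 2.0, -1.0, 0.5])

# Two data points with three features each (made-up values).
X = np.array([[1.0, 2.0, 4.0],
              [0.0, 1.0, 2.0]])

preds = predict_multi(theta, X)
print(preds)  # first row: 1 + 2*1 - 1*2 + 0.5*4 = 3
```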

# Performance Metric

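## 1. R Squared
- The R-squared value is given by:
  $$ R^2 = 1 - \frac{SS_{residual}}{SS_{total}} $$
  $$ R^2 = 1 - \frac{\sum_{i=1}^{m}(y_i - \hat{y_i})^2}{\sum_{i=1}^{m}(y_i - \bar{y})^2}$$
- $\bar{y}$ is the mean of the $y_i$'s and $\hat{y_i}$ is the predicted value for the $i$-th data point.
- For a least-squares fit with an intercept, evaluated on the training data, this value lies between 0 and 1. The closer it is to 1, the better the model explains the data. On unseen data it can be negative, which means the model does worse than always predicting $\bar{y}$.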
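
The formula translates directly into code. This is a minimal sketch with made-up values; `r_squared` is a hypothetical helper written for the example.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_residual / SS_total."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_residual = np.sum((y_true - y_pred) ** 2)
    ss_total = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_residual / ss_total

y_true = np.array([3.0, 5.0, 7.0, 9.0])

good_pred = np.array([2.5, 5.5, 6.5, 9.5])   # close to the targets
mean_pred = np.full(4, y_true.mean())        # always predicts the mean
bad_pred = np.array([9.0, 7.0, 5.0, 3.0])    # worse than predicting the mean

print(r_squared(y_true, good_pred))  # SS_res = 1, SS_tot = 20, so about 0.95
print(r_squared(y_true, mean_pred))  # 0.0
print(r_squared(y_true, bad_pred))   # SS_res = 80, SS_tot = 20, so -3.0
```

The last case shows how predictions that are worse than the mean produce a negative value.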

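## 2. Adjusted R Squared
$$ \text{Adjusted } R^2 = 1 - \frac{(1 - R^2) * (n - 1)}{n - p - 1}$$
- Where $n$ is the number of data points and $p$ is the number of independent features.
- Unlike $R^2$, which never decreases when a feature is added, adjusted $R^2$ goes down when a new feature does not improve the fit enough to justify the extra parameter.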
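
A short sketch of the adjustment, assuming an $R^2$ value has already been computed. The function name and the sample numbers are illustrative.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n - p - 1 <= 0:
        raise ValueError("need more data points than features plus one")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Same R^2 and sample size, different numbers of features.
few_features = adjusted_r_squared(0.95, n=50, p=3)
many_features = adjusted_r_squared(0.95, n=50, p=10)

print(few_features, many_features)
```

For the same $R^2$, the model with more features gets the lower adjusted score, and both are below the unadjusted 0.95.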


# Cost Functions

## 1. Mean Squared Error
$$MSE = \frac{\sum_{i=1}^{n}(y_i - \hat{y_i})^2}{n}$$

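|Advantages| Disadvantages|
|--|--|
|Differentiable everywhere| Not robust to outliers, because the errors are squared|
|Convex for linear regression, so it has a single global minimum and no other local minima| Not in the same unit as $y$ (it is in squared units)|
|Gradient descent converges faster||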
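
To make the table concrete, the sketch below computes MSE and its gradient with respect to the predictions, then changes one target into an outlier. The numbers are made up and the helper names are hypothetical.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

def mse_gradient(y_true, y_pred):
    """Derivative of MSE with respect to each prediction: 2 * (y_pred - y_true) / n."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 2.0 * (y_pred - y_true) / len(y_true)

y_pred = np.array([11.0, 11.0, 15.0, 15.0])
y_clean = np.array([10.0, 12.0, 14.0, 16.0])     # every error has size 1
y_outlier = np.array([10.0, 12.0, 14.0, 56.0])   # last target is an outlier

print(mse(y_clean, y_pred))      # 1.0
print(mse(y_outlier, y_pred))    # (1 + 1 + 1 + 41**2) / 4 = 421.0
print(mse_gradient(y_clean, y_pred))
```

The gradient is defined for every error value, which is the "differentiable" advantage, while the single outlier dominates the loss because its error is squared.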

## 2. Mean Absolute Error
$$MAE = \frac{\sum_{i=1}^{n}|y_i - \hat{y_i}|}{n}$$

|Advantages| Disadvantages|
|--|--|
|Robust to outliers| Not differentiable where the error $y_i - \hat{y_i} = 0$, so convergence usually takes more time and optimisation is more complex|
|It is in the same unit as $y$| Time consuming|
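
A matching sketch for MAE, using made-up numbers. Because the absolute value has no derivative at zero, the helper below returns a subgradient instead; using `np.sign`, which gives 0 at zero error, is one common convention and an assumption of this example.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred))

def mae_subgradient(y_true, y_pred):
    """A subgradient of MAE with respect to each prediction: sign(y_pred - y_true) / n."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sign(y_pred - y_true) / len(y_true)

y_pred = np.array([11.0, 11.0, 15.0, 15.0])
y_clean = np.array([10.0, 12.0, 14.0, 16.0])     # every error has size 1
y_outlier = np.array([10.0, 12.0, 14.0, 56.0])   # last target is an outlier

print(mae(y_clean, y_pred))      # 1.0
print(mae(y_outlier, y_pred))    # (1 + 1 + 1 + 41) / 4 = 11.0
print(mae_subgradient(y_outlier, y_pred))
```

The outlier raises MAE only in proportion to its error, and each subgradient entry has the same magnitude regardless of how large the error is, which is why a single extreme point does not dominate the update.
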
## 3. Root Mean Squared Error
$$RMSE = \sqrt{MSE}$$
$$RMSE = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \hat{y_i})^2}{n}}$$

|Advantages| Disadvantages|
|--|--|
|Differentiable| Not robust to outliers|
|It is in the same unit| |
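
The "same unit" point can be seen with a small example. The weights below are hypothetical values in kilograms and `rmse` is an illustrative helper.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: the square root of MSE."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Hypothetical weights in kg; every prediction is off by 2 kg.
y_true = np.array([50.0, 60.0, 70.0, 80.0])
y_pred = np.array([52.0, 58.0, 72.0, 78.0])

mse_value = np.mean((y_true - y_pred) ** 2)  # 4.0, in kg squared
rmse_value = rmse(y_true, y_pred)            # 2.0, in kg

print(mse_value, rmse_value)
```

Taking the square root brings the metric back to kilograms, so it can be read as a typical error size in the target's own unit.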


# Overfitting and underfitting
- We divide our dataset into 2 datasets:
  1. Training Dataset
  2. Test Dataset : For the final evaluation of the model on unseen data

  #### 1. Training Dataset
  - We further divide it into two datasets:
    1. Train : For training the model
    2. Validation : For hyperparameter tuning of the model
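
The split described above could be sketched as follows. The 60/20/20 proportions, the fixed seed, and the function name `train_val_test_split` are assumptions made for the example; the notes do not prescribe any particular ratio.

```python
import numpy as np

def train_val_test_split(X, y, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the rows once, then cut them into train, validation and test parts."""
    X = np.asarray(X)
    y = np.asarray(y)
    n = len(X)
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n)

    n_test = int(round(n * test_fraction))
    n_val = int(round(n * val_fraction))

    test_idx = indices[:n_test]
    val_idx = indices[n_test:n_test + n_val]
    train_idx = indices[n_test + n_val:]

    return ((X[train_idx], y[train_idx]),
            (X[val_idx], y[val_idx]),
            (X[test_idx], y[test_idx]))

# Made-up data: 50 rows with 2 features each.
X = np.arange(100).reshape(50, 2)
y = np.arange(50)

(train_X, train_y), (val_X, val_y), (test_X, test_y) = train_val_test_split(X, y)
print(len(train_y), len(val_y), len(test_y))  # 30 10 10
```

Every row lands in exactly one of the three parts, so the test set stays unseen during both training and hyperparameter tuning.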

### 1. Generalised model
1. Very good accuracy -> Train Dataset [Low bias]
2. Very good accuracy -> Test Dataset [Low variance]

### 2. Overfitting model
1. Very good accuracy -> Train Dataset [Low bias]
2. Bad accuracy -> Test Dataset [High variance]

### 3. Underfitting model
1. Low accuracy -> Train Dataset [High bias]
2. Low accuracy -> Test Dataset [caused by the high bias; the variance of an underfitting model is usually low]
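
The three cases can be reproduced on a toy problem by fitting polynomials of different degrees and comparing the train and test $R^2$. This is a sketch under stated assumptions: the data comes from $y = x^2$, the "noise" on the training points is a deterministic alternating perturbation (chosen so the run is reproducible), the test targets are noise-free, and the degrees 1, 2 and 9 are illustrative choices.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_residual / SS_total."""
    ss_residual = np.sum((y_true - y_pred) ** 2)
    ss_total = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_residual / ss_total

# Training data: y = x^2 plus a small alternating perturbation standing in for noise.
x_train = np.linspace(-1.0, 1.0, 10)
perturbation = 0.05 * (-1.0) ** np.arange(10)
y_train = x_train ** 2 + perturbation

# Test data: a denser grid of the true function, never used for fitting.
x_test = np.linspace(-1.0, 1.0, 200)
y_test = x_test ** 2

scores = {}
for degree in (1, 2, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_r2 = r_squared(y_train, np.polyval(coeffs, x_train))
    test_r2 = r_squared(y_test, np.polyval(coeffs, x_test))
    scores[degree] = (train_r2, test_r2)
    print(f"degree {degree}: train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")
```

In this setup the straight line (degree 1) scores poorly on both sets, which is the underfitting pattern. The degree-2 fit scores well on both. The degree-9 polynomial passes through every training point, so its train score is essentially perfect, but its test score falls below that of the degree-2 fit, which is the overfitting pattern.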