# Implement Linear Regression From Scratch
__Author__ : Mohammad Rouintan , 400222042

__Course__ : Undergraduate Machine Learning Course

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## Problem 
Implement Linear Regression with *Mean Absolute Error* as the cost function from scratch. Compare your results with the Linear Regression module of *Scikit-Learn*.

### Gradient descent to solve linear regression with mean absolute error (MAE) loss function
The Mean Absolute Error cost function for a linear regression model is:
$$MAE(w,b) = \frac{1}{m} \sum\limits_{i = 0}^{m-1} |f_{w,b}(x^{(i)}) - y^{(i)}|\tag{1}$$ 

where 
  $$f_{w,b}(x^{(i)}) = wx^{(i)} + b \tag{2}$$
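
Equations (1) and (2) can be evaluated directly with NumPy. The following is a minimal sketch for the single-feature case; the function name `mae` and the toy data are illustrative assumptions, not part of the assignment.

```python
import numpy as np

def mae(w, b, x, y):
    """Mean absolute error of the line f(x) = w*x + b on the data (x, y)."""
    predictions = w * x + b                  # equation (2)
    return np.mean(np.abs(predictions - y))  # equation (1)

# Toy data generated by y = 2x + 1 (assumed for illustration)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

print(mae(2.0, 1.0, x, y))  # 0.0, the line passes through every point
print(mae(0.0, 0.0, x, y))  # 4.0, the mean of |y| when predicting 0 everywhere
```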

We will implement the linear regression model from scratch using gradient descent. *Gradient descent* is described as

$$\begin{align*} \text{repeat}&\text{ until convergence:} \; \lbrace \newline
\;  w &= w -  \alpha \frac{\partial J(w,b)}{\partial w} \tag{3}  \; \newline 
 b &= b -  \alpha \frac{\partial J(w,b)}{\partial b}  \newline \rbrace
\end{align*}$$

where the parameters $w$ and $b$ are updated simultaneously, $\alpha$ is the learning rate, and the cost $J(w,b)$ is $MAE(w,b)$ from equation (1).
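
The simultaneous update in equation (3) means both partial derivatives are evaluated at the current $(w, b)$ before either parameter changes. Here is a small sketch of that loop. The helper names `gradient_descent` and `toy_grad`, and the quadratic cost $J(w,b) = (w-3)^2 + (b+1)^2$, are assumptions chosen only because the minimum is known in advance. They are not the MAE cost used in this notebook.

```python
def gradient_descent(grad, w, b, alpha=0.1, n_iter=200):
    """Repeat the update of equation (3) for a fixed number of iterations."""
    for _ in range(n_iter):
        dw, db = grad(w, b)                      # both gradients use the old (w, b)
        w, b = w - alpha * dw, b - alpha * db    # simultaneous update
    return w, b

def toy_grad(w, b):
    """Gradient of the assumed toy cost J(w, b) = (w - 3)^2 + (b + 1)^2."""
    return 2.0 * (w - 3.0), 2.0 * (b + 1.0)

w, b = gradient_descent(toy_grad, w=0.0, b=0.0)
print(w, b)  # close to the minimizer (3, -1)
```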

The gradient is defined as:
$$
\begin{align}
\frac{\partial MAE(w,b)}{\partial w}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} sgn(f_{w,b}(x^{(i)}) - y^{(i)})x^{(i)} \tag{4}\\
\frac{\partial MAE(w,b)}{\partial b}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} sgn(f_{w,b}(x^{(i)}) - y^{(i)}) \tag{5}\\
\end{align}
$$

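where
$$
\begin{align}
sgn(x) &= \begin{cases} \dfrac{x}{|x|} & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases} \tag{6}
\end{align}
$$

The absolute value is not differentiable at $0$, so equations (4) and (5) are subgradients there; taking $sgn(0) = 0$ matches the behavior of `np.sign`, which the implementation below uses.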
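
As a sanity check on equations (4) and (5), the analytic gradient can be compared with central finite differences at a point where no residual is zero. This is a sketch under assumed toy data; `mae` and `mae_gradient` are hypothetical helper names for the single-feature case.

```python
import numpy as np

def mae(w, b, x, y):
    return np.mean(np.abs(w * x + b - y))

def mae_gradient(w, b, x, y):
    """Equations (4) and (5): mean of sgn(residual) * x and mean of sgn(residual)."""
    residual_sign = np.sign(w * x + b - y)
    return np.mean(residual_sign * x), np.mean(residual_sign)

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])
w, b = 0.5, 0.0  # every prediction is below its target, so all signs are -1

dw, db = mae_gradient(w, b, x, y)

# Central finite differences; valid here because no residual sits at the kink of |.|
eps = 1e-6
dw_numeric = (mae(w + eps, b, x, y) - mae(w - eps, b, x, y)) / (2 * eps)
db_numeric = (mae(w, b + eps, x, y) - mae(w, b - eps, x, y)) / (2 * eps)

print(dw, dw_numeric)  # both approximately -1.5
print(db, db_numeric)  # both approximately -1.0
print(np.sign(0.0))    # 0.0, the value NumPy uses for sgn(0)
```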

In [3]:
class LinearRegression():

    """
        Linear Regression model with mean absolute error as cost function

        Parameters
        ----------

        learning_rate : float
                        learning rate of gradient descent algorithm
        n_iter : int
                number of iterations or epoch of gradient descent algorithm
        
        Attributes
        ----------
        lr : float
             learning rate of gradient descent algorithm
        n_iter : int
                 number of iterations or epoch of gradient descent algorithm
        weights : numpy.ndarray
                  weights of our model, initialized in fit with random numbers in [0, 1)
        bias : float
               bias of our model which is initialized with 0

        Methods
        -------
        fit(X,y)
            training the weights with regards to dataset (X and y)
        predict(X)
                predict target values corresponding to X
        _compute_gradient(X,y_true,y_predicted)
                          computing gradients of mae cost function

    """

    def __init__(self,learning_rate = 0.01,n_iter=2000):
        self.lr = learning_rate
        self.n_iter = n_iter
        self.weights = None
        self.bias = None

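    # computing gradients of mae cost function (equations 4 and 5)
    def _compute_gradient(self,X,y_true,y_predicted):
        m = X.shape[0]  # number of training examples
        error_sign = np.sign(y_predicted-y_true)  # shape (m,)
        # X.T has shape (n_features, m), so the product has shape (n_features,)
        dw = (1/m) * np.dot(X.T, error_sign)
        db = (1/m) * np.sum(error_sign)
        return dw , db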

    def predict(self,X):
        return np.dot(X,self.weights) + self.bias

    def fit(self,X,y):
        # initializing weights and bias
        self.weights = np.random.rand(X.shape[1])
        self.bias = 0 

        for _ in range(self.n_iter):
            y_predicted = self.predict(X)
            dw , db = self._compute_gradient(X,y,y_predicted)
            
            # updating weights and bias
            self.weights -= self.lr * dw
            self.bias -= self.lr * db
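
To see the training loop in action, the sketch below repeats the same update on synthetic data in a self-contained function. The name `fit_mae_gd`, the zero initialization of the weights (the class above uses random values instead), and the data generated from $y = 2x + 1$ plus Gaussian noise are all assumptions made for illustration.

```python
import numpy as np

def fit_mae_gd(X, y, learning_rate=0.01, n_iter=2000):
    """Gradient descent on the MAE cost; returns the fitted (weights, bias)."""
    m, n_features = X.shape
    weights = np.zeros(n_features)  # assumption: zeros instead of random init
    bias = 0.0
    for _ in range(n_iter):
        error_sign = np.sign(X @ weights + bias - y)         # shape (m,)
        weights -= learning_rate * (X.T @ error_sign) / m    # equation (4)
        bias -= learning_rate * np.sum(error_sign) / m       # equation (5)
    return weights, bias

# Synthetic data: y = 2x + 1 with small Gaussian noise (assumed for illustration)
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0.0, 0.1, size=200)

weights, bias = fit_mae_gd(X, y)
mae_value = np.mean(np.abs(X @ weights + bias - y))
print(weights, bias, mae_value)
```

Because the MAE gradient depends only on the sign of each residual, the step size does not shrink near the optimum, so with a constant learning rate the parameters end up oscillating in a small neighborhood of the minimizer rather than settling exactly on it.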
