<b>Linear Regression (one-variable):</b>

  Y = wX + b

  Y --> Dependent Variable<br>
  X --> Independent Variable<br>
  w --> weight<br>
  b --> bias<br><br>


<b>Gradient Descent:</b>

  Gradient descent iteratively adjusts $w$ and $b$ to minimize the value of the cost function.<br><br>
  The equation for cost with one variable is:<br>
    $$J(w,b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})^2 \tag{1}$$ 
  
  where 
    $$f_{w,b}(x^{(i)}) = wx^{(i)} + b \tag{2}$$

    
  - $f_{w,b}(x^{(i)})$ is our prediction for example $i$ using parameters $w,b$.  
  - $(f_{w,b}(x^{(i)}) -y^{(i)})^2$ is the squared difference between the target value and the prediction.   
  - These differences are summed over all the $m$ examples and divided by `2m` to produce the cost, $J(w,b)$.  
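  - `m` can also be used instead of `2m`. Both versions of the cost function have the same minimizer, because scaling the cost by a positive constant does not move the point of minimum. The factor of 2 is a convenience that cancels the 2 produced when the square is differentiated.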
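
  As a minimal sketch of eq. (1), the cost can be computed with NumPy as shown here. The function name `cost`, the `halve` flag, and the toy data are assumptions made for illustration only; they are not part of the class defined later in this notebook.

```python
import numpy as np

def cost(x, y, w, b, halve=True):
    """Squared-error cost of eq. (1); halve=False divides by m instead of 2m."""
    m = x.shape[0]
    residuals = w * x + b - y              # f_wb(x_i) - y_i for every example
    denom = 2 * m if halve else m
    return np.sum(residuals ** 2) / denom

# Hypothetical toy data lying exactly on the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

at_truth = cost(x, y, w=2.0, b=1.0)                  # zero at the true parameters
off_2m = cost(x, y, w=1.0, b=1.0)                    # residuals are -x, so 14 / 8
off_m = cost(x, y, w=1.0, b=1.0, halve=False)        # same residuals, 14 / 4
print(at_truth, off_2m, off_m)
```

  The two scalings differ only by a constant factor of 2, so both are zero at the same parameters and rank any two candidate `(w, b)` pairs in the same order.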

  
  *Gradient descent* is described as:

  $$\begin{align*} \text{repeat}&\text{ until convergence:} \; \lbrace \newline
  \;  w &= w -  \alpha \frac{\partial J(w,b)}{\partial w} \tag{3}  \; \newline 
  b &= b -  \alpha \frac{\partial J(w,b)}{\partial b}  \newline \rbrace
  \end{align*}$$
  where the parameters $w$ and $b$ are updated simultaneously.  
  The gradient is defined as:
  $$
  \begin{align}
  \frac{\partial J(w,b)}{\partial w}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})x^{(i)} \tag{4}\\
    \frac{\partial J(w,b)}{\partial b}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)}) \tag{5}\\
  \end{align}
  $$

  Here *simultaneously* means that the partial derivatives for all parameters are calculated before any parameter is updated.<br><br>
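
  A single simultaneous update following eqs. (3), (4) and (5) might look like the sketch below. The helper names `gradient` and `step` and the two-point dataset are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import numpy as np

def gradient(x, y, w, b):
    """Partial derivatives of J with respect to w and b, eqs. (4) and (5)."""
    m = x.shape[0]
    err = w * x + b - y                    # prediction error per example
    return np.sum(err * x) / m, np.sum(err) / m

def step(x, y, w, b, alpha):
    """One gradient-descent update, eq. (3)."""
    # Both partials are evaluated at the old (w, b) before either is changed.
    dj_dw, dj_db = gradient(x, y, w, b)
    return w - alpha * dj_dw, b - alpha * dj_db

# Hypothetical two-point dataset
x = np.array([1.0, 2.0])
y = np.array([300.0, 500.0])

# Starting from w = b = 0: dJ/dw = -650 and dJ/db = -400,
# so with alpha = 0.01 the step moves w to about 6.5 and b to about 4.0.
w_new, b_new = step(x, y, w=0.0, b=0.0, alpha=0.01)
print(w_new, b_new)
```

  Updating `w` first and then computing the gradient for `b` with the new `w` would be a different algorithm, which is why both derivatives are taken before either assignment.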

<b> Learning Rate:</b><br><br>
  <b>$\alpha$</b> controls the size of each update to w and b. A value that is too large can cause gradient descent to overshoot the minimum and never converge, while a value that is too small makes convergence slow. Start with a small value, on the order of ~0.001, and adjust it based on how the cost behaves during training.
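
  The effect of the learning rate can be illustrated with the sketch below, which runs the same updates with two different values of $\alpha$. The function `run`, the toy dataset, and the two values `0.01` and `1.0` are assumptions picked for demonstration; suitable values depend on the data.

```python
import numpy as np

def run(alpha, num_iters=20):
    """Return the cost after each gradient-descent step on a toy dataset."""
    # Hypothetical data lying exactly on y = 2x + 1
    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = 2.0 * x + 1.0
    m = x.shape[0]
    w, b = 0.0, 0.0
    costs = []
    for _ in range(num_iters):
        err = w * x + b - y
        # Tuple assignment evaluates both gradients before updating w and b.
        w, b = w - alpha * np.sum(err * x) / m, b - alpha * np.sum(err) / m
        costs.append(np.sum((w * x + b - y) ** 2) / (2 * m))
    return costs

small = run(alpha=0.01)
large = run(alpha=1.0)

print("alpha=0.01 cost decreased:", small[-1] < small[0])
print("alpha=1.0  cost increased:", large[-1] > large[0])
```

  On this dataset the smaller rate lowers the cost at every step, while the larger one overshoots so badly that the cost grows from one iteration to the next.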


In [2]:
import numpy as np
import math, copy

<b>Linear Regression</b>

In [3]:
class Linear_Regression():
    
    # Initialize the hyperparameters: learning rate (alpha) and number of iterations
    def __init__(self, alpha, num_iters):
        self.alpha = alpha
        self.num_iters = num_iters

    # Accepts dependent and independent variables
    # X = Independent variable (numpy array)
    # Y = Dependent variable (numpy array)
    def fit(self, X, Y, w, b):

        # m = no. of data points
        # n = no. of features per data point
        self.m, self.n = X.shape

        # w = weight (one per feature, numpy array)
        # b = bias (single initial value, e.g. random)
        self.w = w 
        self.b = b

        self.X = X
        self.Y = Y

    # compute cost for specific w and b.
    # Ref eq (1)
    def compute_cost(self, w, b):
        cost = 0

        for i in range(self.m):
            f_wb = w * self.X[i] + b
            cost = cost + (f_wb - self.Y[i])**2

        total_cost = cost * (1/(2 * self.m))
        return total_cost

    # Compute the gradient at a specific w and b.
    # Ref eq (4) & (5)
    def compute_gradient(self, w, b):

        dj_dw = 0
        dj_db = 0
        for i in range(self.m):
            f_wb = w * self.X[i] + b

            dj_dw_i = self.X[i] * (f_wb - self.Y[i])
            dj_db_i = f_wb - self.Y[i]

            dj_dw = dj_dw + dj_dw_i
            dj_db = dj_db + dj_db_i
        
        dj_dw = dj_dw / self.m
        dj_db = dj_db / self.m

        return dj_dw, dj_db
    

    # Implement gradient descent to minimize cost function for a specific input w and b
    # Ref eq (3)
    def gradient_descent(self, w_in, b_in):
        w = copy.deepcopy(w_in)
        b = b_in
        self.J_history = []
        self.p_history = []

        for i in range(self.num_iters):
            dj_dw, dj_db = self.compute_gradient(w, b)

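            # Break if the gradient is exactly zero (rare with floating-point
            # values, so the loop usually runs for all num_iters iterations)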
            if(dj_dw == 0 and dj_db == 0):
                break
            
            w = w - self.alpha * dj_dw
            b = b - self.alpha * dj_db


            # Record cost and parameters for the first 10000 iterations only
            if i < 10000:
                self.J_history.append(self.compute_cost(w, b))
                self.p_history.append([w,b])

        
        return w, b, self.J_history, self.p_history


    