### Implementing linear regression using batch gradient descent on a 2x2 dataset

In [3]:
import numpy as np

In [5]:
# Load our data set
x_train = np.array([1.0, 2.0])   # feature: house size in 1000 sqft
y_train = np.array([300.0, 500.0])   # target: price in thousands of dollars


In [22]:
""" Computing the cost
To calculate the cost, we need:
1. m - the number of data points
2. y^ - predicted value (formula: w * x[i] + b)
3. y - target value
4. x - input value
5. w - weight
6. b - bias
"""
def compute_cost(w, b, x, y):

    # number of training examples; the summation below loops over all of them
    m = x.shape[0]
    cost = 0

    for i in range(m):
        f_wb = w * x[i] + b
        cost = cost + (f_wb - y[i]) ** 2
    total_cost = 1 / (2*m) * cost

    return total_cost


Now we have the compute_cost function, which computes the cost for any choice of w and b.
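
As a quick sanity check, the sketch below evaluates the same squared-error cost at two parameter settings. It is a minimal vectorized restatement of the loop, not the notebook's own function; the name `cost_vectorized` is hypothetical. The line through both training points has w = 200 and b = 100, so the cost there should be zero.

```python
import numpy as np

x_train = np.array([1.0, 2.0])       # same two training inputs as above
y_train = np.array([300.0, 500.0])   # same two targets as above

def cost_vectorized(w, b, x, y):
    """Hypothetical helper: J(w,b) = 1/(2m) * sum((w*x + b - y)^2)."""
    errors = w * x + b - y           # prediction minus target, per example
    return np.sum(errors ** 2) / (2 * x.shape[0])

cost_at_zero = cost_vectorized(0.0, 0.0, x_train, y_train)
cost_at_fit = cost_vectorized(200.0, 100.0, x_train, y_train)
print(cost_at_zero)  # 85000.0
print(cost_at_fit)   # 0.0
```

At w = 0, b = 0 the errors are -300 and -500, so the cost is (90000 + 250000) / 4 = 85000.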

Next, we need to calculate the gradient, so that we can update w and b to reduce the cost and move toward the best-fit line.

To calculate the gradient and update w (b is updated the same way), we need:
1. the learning rate
2. w_old, the current value of w
3. the partial derivative of the cost with respect to w, which is the gradient

So first we will create a function: compute_gradient


*gradient descent* was described as:

$$\begin{align*} \text{repeat}&\text{ until convergence:} \; \lbrace \newline
\;  w &= w -  \alpha \frac{\partial J(w,b)}{\partial w} \tag{3}  \; \newline 
 b &= b -  \alpha \frac{\partial J(w,b)}{\partial b}  \newline \rbrace
\end{align*}$$
where, parameters $w$, $b$ are updated simultaneously.  
The gradient is defined as:
$$
\begin{align}
\frac{\partial J(w,b)}{\partial w}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})x^{(i)} \tag{4}\\
  \frac{\partial J(w,b)}{\partial b}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)}) \tag{5}\\
\end{align}
$$

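Writing the prediction as $\hat{y}^{(i)} = f_{w,b}(x^{(i)})$, equation (4) can also be read as:
$$
\frac{\partial J(w,b)}{\partial w}
= \frac{1}{m} \sum\limits_{i = 0}^{m-1} ( \hat{y}^{(i)} - y^{(i)} ) x^{(i)}
$$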
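
For reference, here is a short derivation of equations (4) and (5). It assumes the squared-error cost that compute_cost implements and the linear model $f_{w,b}(x) = wx + b$, and applies the chain rule to each term of the sum:

$$
\begin{align*}
J(w,b) &= \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})^2 \\
\frac{\partial J(w,b)}{\partial w} &= \frac{1}{2m} \sum\limits_{i = 0}^{m-1} 2\,(f_{w,b}(x^{(i)}) - y^{(i)}) \, \frac{\partial f_{w,b}(x^{(i)})}{\partial w} = \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})\,x^{(i)} \\
\frac{\partial J(w,b)}{\partial b} &= \frac{1}{2m} \sum\limits_{i = 0}^{m-1} 2\,(f_{w,b}(x^{(i)}) - y^{(i)}) \, \frac{\partial f_{w,b}(x^{(i)})}{\partial b} = \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})
\end{align*}
$$

Here $\frac{\partial f_{w,b}(x^{(i)})}{\partial w} = x^{(i)}$ and $\frac{\partial f_{w,b}(x^{(i)})}{\partial b} = 1$. The factor 2 from the power rule cancels the $\frac{1}{2}$ in the cost.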


Here *simultaneously* means that you calculate the partial derivatives for all the parameters before updating any of the parameters.
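
To see why the order matters, the sketch below takes one step from w = 0, b = 0 in two ways. It is an illustration only; the helper names `grad_w` and `grad_b` are hypothetical, and the learning rate 0.1 is the one used later in this notebook.

```python
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([300.0, 500.0])
alpha = 0.1

def grad_w(w, b):
    """Hypothetical helper: partial derivative of the cost w.r.t. w."""
    return np.mean((w * x + b - y) * x)

def grad_b(w, b):
    """Hypothetical helper: partial derivative of the cost w.r.t. b."""
    return np.mean(w * x + b - y)

w, b = 0.0, 0.0

# Simultaneous update: both gradients are evaluated at the old (w, b).
dj_dw, dj_db = grad_w(w, b), grad_b(w, b)
w_sim = w - alpha * dj_dw
b_sim = b - alpha * dj_db

# Sequential update (not what the rule above describes):
# the gradient for b is evaluated after w has already changed.
w_seq = w - alpha * grad_w(w, b)
b_seq = b - alpha * grad_b(w_seq, b)

print(w_sim, b_sim)  # roughly 65 and 40
print(w_seq, b_seq)  # roughly 65 and 30.25
```

Both variants agree on w, but the sequential one computes the step for b from a point the simultaneous rule never visits.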

In [23]:
"""
calculate the gradient
"""
def compute_gradient(w,b,x,y):
    """
    Computes the gradient for linear regression 
    Args:
      x (ndarray (m,)): Data, m examples 
      y (ndarray (m,)): target values
      w,b (scalar)    : model parameters  
    Returns:
      dj_dw (scalar): The gradient of the cost w.r.t. the parameter w
      dj_db (scalar): The gradient of the cost w.r.t. the parameter b
    """
    m = x.shape[0]
    dj_dw = 0
    dj_db = 0

    for i in range(m):
        f_wb = w * x[i] + b
        dj_dw_i = (f_wb - y[i]) * x[i]
        dj_db_i = (f_wb - y[i])
        dj_dw += dj_dw_i
        dj_db += dj_db_i
    dj_dw = dj_dw / m
    dj_db = dj_db / m

    return dj_dw, dj_db

Now that we have a function to calculate the gradient, we can implement the descent itself: a loop that repeatedly updates w and b.
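
Before relying on a gradient formula, one common check is to compare it with a finite-difference estimate of the cost's slope. The sketch below does this for the two-point data set; `cost`, `analytic_gradient`, and `numeric_gradient` are hypothetical helpers written for this check, and the test point (50, 10) is arbitrary.

```python
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([300.0, 500.0])

def cost(w, b):
    """Squared-error cost J(w,b) = 1/(2m) * sum((w*x + b - y)^2)."""
    return np.sum((w * x + b - y) ** 2) / (2 * x.shape[0])

def analytic_gradient(w, b):
    """Equations (4) and (5), written with NumPy means."""
    err = w * x + b - y
    return np.mean(err * x), np.mean(err)

def numeric_gradient(w, b, eps=1e-4):
    """Central-difference estimate of the same two partial derivatives."""
    dw = (cost(w + eps, b) - cost(w - eps, b)) / (2 * eps)
    db = (cost(w, b + eps) - cost(w, b - eps)) / (2 * eps)
    return dw, db

w0, b0 = 50.0, 10.0   # arbitrary test point
analytic = analytic_gradient(w0, b0)
numeric = numeric_gradient(w0, b0)
print(analytic)
print(numeric)
```

The two pairs should agree to several decimal places; a large mismatch would point to a sign or indexing error in the gradient code.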

In [24]:
def gradient_descent(compute_cost, compute_gradient, w_in, b_in, x, y, alpha, num_iters):
    """
    Runs batch gradient descent to fit w and b.

    Args:
        compute_cost = function to compute the squared-error cost
        compute_gradient = function to compute the gradient; returns dj_dw and dj_db
        w_in = initial weight
        b_in = initial bias
        x = training input values
        y = training target values
        alpha = learning rate (hyperparameter)
        num_iters = number of iterations gradient descent will run; since this is batch GD, every iteration sees the entire data set, so this is also the number of epochs

    Returns:
        w = final weight
        b = final bias
        J_history = cost after each update
        p_history = parameters [w, b] after each update
    """
    J_history = []
    p_history = []
    w = w_in
    b = b_in

    for i in range(num_iters):
        # w_new = w_old - alpha * gradient
        dj_dw, dj_db = compute_gradient(w,b,x,y)

        w = w - alpha * dj_dw
        b = b - alpha * dj_db
        J_history.append(compute_cost(w,b,x,y))
        p_history.append([w,b])

    return w, b, J_history, p_history
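
The update rule above says "repeat until convergence", while gradient_descent runs a fixed number of iterations. As a hedged alternative, the sketch below stops once both gradient components fall below a tolerance. The function name `descend_until_converged` and the values of `tol` and `max_iters` are assumptions chosen for illustration, not part of the original notebook.

```python
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([300.0, 500.0])

def gradient(w, b):
    """Equations (4) and (5) for the two-point data set."""
    err = w * x + b - y
    return np.mean(err * x), np.mean(err)

def descend_until_converged(alpha=0.1, tol=1e-6, max_iters=100_000):
    """Hypothetical variant: stop when both gradient components are below tol."""
    w, b = 0.0, 0.0
    for step in range(max_iters):
        dj_dw, dj_db = gradient(w, b)
        if max(abs(dj_dw), abs(dj_db)) < tol:
            return w, b, step          # step = number of updates performed
        # simultaneous update: both gradients were computed before either change
        w, b = w - alpha * dj_dw, b - alpha * dj_db
    return w, b, max_iters             # tolerance not reached within max_iters

w_conv, b_conv, n_steps = descend_until_converged()
print(w_conv, b_conv, n_steps)
```

The `max_iters` cap keeps the loop from running forever if the learning rate is too large for the tolerance to be reached.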
        


In [None]:
iterations = 1000
w_final,b_final,J_his, p_his = gradient_descent(compute_cost, compute_gradient, 0, 0, x_train, y_train, 0.1, iterations)
print("Final weight:", w_final)
print("Final bias:", b_final)

for i in range(iterations):
    print(J_his[i], p_his[i])

In [33]:
print(f"1000 sqft house prediction {w_final*1.0 + b_final:0.1f} Thousand dollars")
print(f"1200 sqft house prediction {w_final*1.2 + b_final:0.1f} Thousand dollars")
print(f"2000 sqft house prediction {w_final*2.0 + b_final:0.1f} Thousand dollars")

1000 sqft house prediction 300.0 Thousand dollars
1200 sqft house prediction 340.0 Thousand dollars
2000 sqft house prediction 500.0 Thousand dollars
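
As a cross-check on these predictions, a degree-1 least-squares fit gives the line that gradient descent is converging toward. This is only a reference computation using `np.polyfit`, not a replacement for the method above; the names `w_ref` and `b_ref` are hypothetical.

```python
import numpy as np

x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])

# Degree-1 least-squares fit; coefficients come back highest power first,
# so the slope (w) precedes the intercept (b).
w_ref, b_ref = np.polyfit(x_train, y_train, 1)

print(w_ref, b_ref)            # close to 200 and 100
print(w_ref * 1.2 + b_ref)     # close to 340, matching the 1200 sqft prediction
```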
