## Goals
In this lab, we will:
- update gradient descent for logistic regression.

In [109]:
import numpy as np
import math, copy

In [110]:
X_train = np.array([[0.5, 1.5], [1,1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])
y_train = np.array([0, 0, 0, 1, 1, 1])

## Logistic Gradient Descent
<img src="https://raw.githubusercontent.com/bernadlger/Machine-Learning-Specialization/main/Regression%20and%20Classification/images/C1_W3_Logistic_gradient_descent.png" style="width:400px; padding:10px;" align="right">

Recall that the gradient descent algorithm uses the gradient calculation:
$$\begin{align*}
&\text{repeat until convergence:} \; \lbrace \\
&  \; \; \;w_j = w_j -  \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j} \tag{1}  \; & \text{for j := 0..n-1} \\ 
&  \; \; \;  \; \;b = b -  \alpha \frac{\partial J(\mathbf{w},b)}{\partial b} \\
&\rbrace
\end{align*}$$

Each iteration performs simultaneous updates on $w_j$ for all $j$ and on $b$, where
$$\begin{align*}
\frac{\partial J(\mathbf{w},b)}{\partial w_j}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \tag{2} \\
\frac{\partial J(\mathbf{w},b)}{\partial b}  &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}) \tag{3} 
\end{align*}$$

* $m$ is the number of training examples in the data set
* $f_{\mathbf{w},b}(\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target
* For a logistic regression model  
    $z = \mathbf{w} \cdot \mathbf{x} + b$  
    $f_{\mathbf{w},b}(\mathbf{x}) = g(z)$  
    where $g(z)$ is the sigmoid function:  
    $g(z) = \frac{1}{1+e^{-z}}$   

In [111]:
def sigmoid(z):
    # Logistic function g(z) = 1 / (1 + e^(-z)); accepts a scalar or a NumPy array
    g = 1/(1 + np.exp(-z))
    return g

In [112]:
def compute_cost_logistic(X,y,w,b):
    m = X.shape[0]                  # number of training examples
    cost = 0.0
    for i in range(m):
        z_i = np.dot(X[i],w) + b
        fw_b_i = sigmoid(z_i)       # model prediction for example i
        cost += -y[i]*np.log(fw_b_i) - (1-y[i])*np.log(1-fw_b_i)
    cost = cost/m                   # average the loss over all examples
    return cost
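
As a quick sanity check on the cost (a sketch, not part of the original lab): with $\mathbf{w} = \mathbf{0}$ and $b = 0$, every prediction is $g(0) = 0.5$, so the cost should equal $\ln 2 \approx 0.693$. The name `cost_vectorized` and the `_demo` variables are hypothetical, introduced only for this illustration. The helper is assumed to compute the same average loss as the loop above, for an `X` of shape `(m, n)`.

```python
import numpy as np

def cost_vectorized(X, y, w, b):
    # Hypothetical helper: average logistic loss computed with array operations
    f = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predictions for all m examples at once
    return np.mean(-y * np.log(f) - (1 - y) * np.log(1 - f))

# Same values as X_train / y_train in this lab, repeated so the block runs on its own
X_demo = np.array([[0.5, 1.5], [1, 1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])
y_demo = np.array([0, 0, 0, 1, 1, 1])

cost_at_zero = cost_vectorized(X_demo, y_demo, np.zeros(2), 0.0)
print(cost_at_zero, np.log(2))   # the two values should agree
```

The first cost printed by the training run later in this lab (about 0.6846) is slightly below $\ln 2$ because it is recorded after one parameter update.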

In [113]:
def compute_gradient_logistic(X,y,w,b):
    m,n = X.shape                   # m examples, n features
    dj_dw = np.zeros((n,))
    dj_db = 0.

    for i in range(m):
        z_i = np.dot(X[i],w) + b
        fw_b_i = sigmoid(z_i)
        err_i = fw_b_i - y[i]       # prediction error for example i
        for j in range(n):
            dj_dw[j] = dj_dw[j] + err_i * X[i,j]    # accumulate equation (2)
        dj_db = dj_db + err_i                       # accumulate equation (3)
    dj_dw = dj_dw/m
    dj_db = dj_db/m

    return dj_dw, dj_db
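
The double loop above mirrors equations (2) and (3) term by term. A minimal vectorized sketch of the same calculation follows; `gradient_vectorized` and the `_demo` variables are hypothetical names used only for illustration, and the data is repeated so the block runs on its own.

```python
import numpy as np

def gradient_vectorized(X, y, w, b):
    # Hypothetical helper: equations (2) and (3) written with array operations
    m = X.shape[0]
    err = 1.0 / (1.0 + np.exp(-(X @ w + b))) - y   # f_wb(x^(i)) - y^(i) for every i
    dj_dw = X.T @ err / m                          # equation (2), one entry per feature j
    dj_db = np.sum(err) / m                        # equation (3)
    return dj_dw, dj_db

# Same values as X_train / y_train in this lab
X_demo = np.array([[0.5, 1.5], [1, 1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])
y_demo = np.array([0, 0, 0, 1, 1, 1])

# At w = 0, b = 0 every prediction is 0.5, so the errors are +0.5 or -0.5
dj_dw_demo, dj_db_demo = gradient_vectorized(X_demo, y_demo, np.zeros(2), 0.0)
print(dj_dw_demo, dj_db_demo)
```

At the zero starting point the errors are $\pm 0.5$, so the result can be checked by hand: $\frac{\partial J}{\partial w_0} = -0.25$, $\frac{\partial J}{\partial w_1} = -\frac{1}{6}$, and $\frac{\partial J}{\partial b} = 0$.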

In [114]:
def gradient_descent(X,y,w_in,b_in,alpha,num_iters):

    J_History= []
    w = copy.deepcopy(w_in)
    b = b_in

    for i in range(num_iters):
        dj_dw , dj_db = compute_gradient_logistic(X,y,w,b)

        w = w - alpha*dj_dw
        b = b - alpha*dj_db

        # Save the cost J at each iteration
        if i < 100000: # prevent resource exhaustion
            J_History.append(compute_cost_logistic(X,y,w,b))

        # Print the cost about 10 times over the run, or at every iteration if num_iters < 10
        if i % math.ceil(num_iters / 10) == 0:
            print(f"Iteration {i:4d}: Cost {J_History[-1]} ")
    
    return w, b, J_History

Let's run gradient descent on our data set.

In [115]:
w_tmp = np.zeros_like(X_train[0])
b_tmp = 0.
alph = 0.1
iters = 10000

w_out, b_out, _ = gradient_descent(X_train,y_train,w_tmp,b_tmp,alph,iters)
print(f"\nupdated parameters: w:{w_out} b:{b_out}")

Iteration    0: Cost 0.684610468560574 
Iteration 1000: Cost 0.1590977666870456 
Iteration 2000: Cost 0.08460064176930081 
Iteration 3000: Cost 0.05705327279402531 
Iteration 4000: Cost 0.042907594216820076 
Iteration 5000: Cost 0.034338477298845684 
Iteration 6000: Cost 0.028603798022120097 
Iteration 7000: Cost 0.024501569608793 
Iteration 8000: Cost 0.02142370332569295 
Iteration 9000: Cost 0.019030137124109114 

updated parameters: w:[5.28123029 5.07815608] b:-14.222409982019837
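
To see what the fitted parameters mean, the sketch below plugs the values printed above into the model and classifies each training example. The 0.5 decision threshold is an assumption (the lab does not state one here), and the names `w_fit`, `b_fit`, and the `_demo` variables are hypothetical.

```python
import numpy as np

# Parameters copied from the run output above
w_fit = np.array([5.28123029, 5.07815608])
b_fit = -14.222409982019837

# Same values as X_train / y_train in this lab
X_demo = np.array([[0.5, 1.5], [1, 1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])
y_demo = np.array([0, 0, 0, 1, 1, 1])

prob_demo = 1.0 / (1.0 + np.exp(-(X_demo @ w_fit + b_fit)))   # f_wb(x) for each example
pred_demo = (prob_demo >= 0.5).astype(int)                    # assumed threshold of 0.5

print(np.round(prob_demo, 3))
print(pred_demo, "accuracy:", np.mean(pred_demo == y_demo))
```

With this threshold, the decision boundary is the line $w_0 x_0 + w_1 x_1 + b = 0$, and all six training examples fall on the correct side of it.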
