# Ungraded Lab: Gradient Descent for Logistic Regression



## Goals
In this lab you will:
- implement the gradient calculation used in the gradient descent update step for logistic regression:
    - a version using looping
    - optionally, a version using matrices

## Dataset 
Let's start with the same dataset as before.

In [None]:
import numpy as np

X = np.array([[0.5, 1.5], [1,1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])
y = np.array([0, 0, 0, 1, 1, 1]).reshape(-1,1)

As before, we'll use a helper function to plot this data. The data points with label $y=1$ are shown as red crosses, while the data points with label $y=0$ are shown as black circles.

In [None]:
from lab_utils import plot_data
import matplotlib.pyplot as plt

plot_data(X,y)

# Set both axes to be from 0-6
plt.axis([0, 6, 0, 6])
# Set the y-axis label
plt.ylabel('$x_1$')
# Set the x-axis label
plt.xlabel('$x_0$')

## Logistic Gradient

 First, you will implement a non-vectorized version of the gradient. Then, you will implement a vectorized version.


### Non-vectorized version


Recall that the gradient descent algorithm uses the gradient of the cost in each update:
$$\begin{align*}& \text{repeat until convergence:} \; \lbrace \newline \; & b := b -  \alpha \frac{\partial J(\mathbf{w},b)}{\partial b} \newline       \; & w_j := w_j -  \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j} \tag{1}  \; & \text{for j := 0..n-1}\newline & \rbrace\end{align*}$$


Each iteration updates $b$ and all $w_j$ simultaneously, where
$$
\frac{\partial J(\mathbf{w},b)}{\partial b}  = \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}) \tag{2}
$$
$$
\frac{\partial J(\mathbf{w},b)}{\partial w_j}  = \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \tag{3}
$$

* $m$ is the number of training examples in the dataset

* $f_{\mathbf{w},b}(\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the actual label



* For a logistic regression model on the dataset given above, the model can be represented as:

    $f_{\mathbf{w},b}(\mathbf{x}) = g(w_0x_0 + w_1x_1 + b)$

    where $g(z)$ is the sigmoid function:

    $g(z) = \frac{1}{1+e^{-z}}$ 
    


We've implemented the `sigmoid` function for you already and you can simply import and use it, as shown in the code block below.

In [None]:
from lab_utils import sigmoid 

print(sigmoid(0))
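
The source of `lab_utils.sigmoid` is not shown in this lab. A minimal sketch of an equivalent function, assuming it simply applies $g(z)$ element-wise with NumPy, might look like this (the name `sigmoid_sketch` is hypothetical):

```python
import numpy as np

def sigmoid_sketch(z):
    """Element-wise sigmoid g(z) = 1 / (1 + e^(-z)); a stand-in for lab_utils.sigmoid."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid_sketch(0))                      # g(0) = 0.5
print(sigmoid_sketch(np.array([-1., 1.])))    # g(-z) + g(z) = 1, so these two values sum to 1
```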

### compute_gradient using looping
Implement equations (2) and (3) above for all $w_j$ and $b$.
There are many ways to implement this, and you are free to choose a different approach. One approach is outlined below:
- initialize variables to accumulate dJdw and dJdb
- loop over all examples
    - calculate the error for that example: $g(\mathbf{x}^{(i)T}\mathbf{w} + b) - y^{(i)}$
    - add the error to dJdb (equation 2 above)
    - for each input value $x_{j}^{(i)}$ in this example,  
        - multiply the error by the input  $x_{j}^{(i)}$, and add to the corresponding element of dJdw. 
- divide dJdb and dJdw by total number of examples (m)
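
To make the inner steps concrete, the sketch below works through a single example by hand. It assumes the first training example together with the test parameters used later in this lab ($\mathbf{w} = [2, 3]^T$, $b = 1$), and defines its own sigmoid `g` so that it runs without `lab_utils`:

```python
import numpy as np

def g(z):
    # sigmoid, written out here so the snippet is self-contained
    return 1.0 / (1.0 + np.exp(-z))

x_i = np.array([0.5, 1.5])   # first training example
y_i = 0                      # its label
w = np.array([2., 3.])       # test parameters used later in the lab
b = 1.

z_i = x_i @ w + b            # 0.5*2 + 1.5*3 + 1 = 6.5
err_i = g(z_i) - y_i         # prediction minus label for this example
contrib_w = err_i * x_i      # this example's contribution to each element of dJdw

print(z_i, err_i, contrib_w)
```

Summing these contributions over all examples and dividing by $m$ gives equations (2) and (3).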

<details>
<summary>
    <font size='3', color='darkgreen'><b>Hints</b></font>
</summary>

```python
def compute_gradient_logistic_loop(X, y, w, b): 
    """
    Computes the gradient for logistic regression 
 
    Args:
      X : (array_like Shape (m,n)) input features, one example per row 
      y : (array_like Shape (m,1)) target labels (0 or 1) 
      w : (array_like Shape (n,1)) values of parameters of the model      
      b : (scalar)                 value of parameter of the model   
    Returns
      dJdb: (scalar)                The gradient of the cost w.r.t. the parameter b. 
      dJdw: (array_like Shape (n,1)) The gradient of the cost w.r.t. the parameters w. 
    """
    m,n = X.shape
    dJdw = np.zeros((n,1))
    dJdb = 0.
    err  = 0.

    ### START CODE HERE ###
    for i in range(m):
        err = sigmoid(X[i] @ w + b)  - y[i]    
        for j in range(n):
            dJdw[j] = dJdw[j] + err * X[i][j]
        dJdb = dJdb + err
    dJdw = dJdw/m
    dJdb = dJdb/m
    ### END CODE HERE ###         
        
    return dJdb[0],dJdw  #index dJdb to return scalar value
```

</details>

In [None]:
def compute_gradient_logistic_loop(X, y, w, b): 
    """
    Computes the gradient for logistic regression 
 
    Args:
      X : (array_like Shape (m,n)) input features, one example per row 
      y : (array_like Shape (m,1)) target labels (0 or 1) 
      w : (array_like Shape (n,1)) values of parameters of the model      
      b : (scalar)                 value of parameter of the model   
    Returns
      dJdb: (scalar)                The gradient of the cost w.r.t. the parameter b. 
      dJdw: (array_like Shape (n,1)) The gradient of the cost w.r.t. the parameters w. 
    """
    m,n = X.shape
    dJdw = np.zeros((n,1))
    dJdb = 0.
    err  = 0.

    ### START CODE HERE ### 
    ### BEGIN SOLUTION ###
    for i in range(m):
        err = sigmoid(X[i] @ w + b)  - y[i]    
        for j in range(n):
            dJdw[j] = dJdw[j] + err * X[i][j]
        dJdb = dJdb + err
    dJdw = dJdw/m
    dJdb = dJdb/m
    ### END SOLUTION ### 
    ### END CODE HERE ###         
        
    return dJdb[0],dJdw  #index dJdb to return scalar value

In [None]:
w = np.array([2.,3.]).reshape(-1,1)
b = 1.
dJdb, dJdw = compute_gradient_logistic_loop(X, y, w, b)
print(f"dJdb, non-vectorized version: {dJdb}" )
print(f"dJdw, non-vectorized version: {dJdw.tolist()}" )

Check the implementation of your gradient function by comparing the output of the cell above with the expected output.

**Expected output**

``` 
dJdb, non-vectorized version: 0.49861806546328574
dJdw, non-vectorized version: [[0.498333393278696], [0.49883942983996693]]
```
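
Another way to check a gradient function, independent of the expected values above, is to compare it against a finite-difference approximation of the cost. The sketch below assumes the standard logistic (cross-entropy) cost, which this lab refers to as $J(\mathbf{w},b)$ but does not define. The helpers `cost_sketch`, `grad_sketch`, and `numerical_gradient` are hypothetical names, and the snippet defines its own sigmoid `g` so that it runs on its own:

```python
import numpy as np

def g(z):
    # sigmoid, written out here so the snippet is self-contained
    return 1.0 / (1.0 + np.exp(-z))

def cost_sketch(X, y, w, b):
    # assumed logistic cost: mean cross-entropy between predictions and labels
    f = g(X @ w + b)
    return float(-np.mean(y * np.log(f) + (1 - y) * np.log(1 - f)))

def grad_sketch(X, y, w, b):
    # equations (2) and (3), written compactly
    err = g(X @ w + b) - y
    return float(np.mean(err)), X.T @ err / X.shape[0]

def numerical_gradient(X, y, w, b, eps=1e-6):
    # central differences: nudge one parameter at a time and watch the cost
    num_dw = np.zeros_like(w)
    for j in range(w.shape[0]):
        step = np.zeros_like(w)
        step[j] = eps
        num_dw[j] = (cost_sketch(X, y, w + step, b)
                     - cost_sketch(X, y, w - step, b)) / (2 * eps)
    num_db = (cost_sketch(X, y, w, b + eps)
              - cost_sketch(X, y, w, b - eps)) / (2 * eps)
    return num_db, num_dw

X = np.array([[0.5, 1.5], [1, 1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])
y = np.array([0, 0, 0, 1, 1, 1]).reshape(-1, 1)
w = np.array([2., 3.]).reshape(-1, 1)
b = 1.

dJdb, dJdw = grad_sketch(X, y, w, b)
num_db, num_dw = numerical_gradient(X, y, w, b)
print(np.isclose(dJdb, num_db, atol=1e-6), np.allclose(dJdw, num_dw, atol=1e-6))
```

If the analytic and numerical values disagree by much more than the tolerance, the gradient code (or the assumed cost) is the first place to look.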

### (Optional) Vectorized version

You will now implement a vectorized version of the gradient function.

The vectorized version of the gradient formula is

$$\frac{\partial J(\mathbf{w},b)}{\partial b} = \frac{1}{m} \text{sum}(\mathbf{f_{w,b}} - \mathbf{y}) \tag{4}$$


$$\nabla_{\mathbf{w}} J(\mathbf{w},b) = \frac{1}{m} \mathbf{X}^T(\mathbf{f_{w,b}} - \mathbf{y}) \tag{5}$$

where

$$ \mathbf{f_{w,b}} = g(\mathbf{X}\mathbf{w} + b)$$

As before, $g$ is the sigmoid function, applied element-wise, and $\text{sum}$ adds up the elements of the error vector.


**Exercise**

You'll complete the vectorized gradient function using the equations above. A hint is available if you run into difficulties.




**Debugging Tip:** Vectorizing code can sometimes be tricky. One common debugging strategy is to print the dimensions of the arrays you are working with using NumPy's `shape` attribute. For example, given a data matrix $\mathbf{X}$ of size 6 × 3 (6 examples, 3 features) and a vector $\mathbf{w}$ of size 3 × 1, $\mathbf{Xw}$ is a valid multiplication, while $\mathbf{wX}$ is not.
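
As a small illustration of this tip, the sketch below uses placeholder arrays with the sizes from the example (the values are ones and zeros, chosen only for their shapes; the names `X_demo` and `w_demo` are hypothetical) and shows which product NumPy accepts:

```python
import numpy as np

X_demo = np.ones((6, 3))    # 6 examples, 3 features
w_demo = np.zeros((3, 1))   # one weight per feature

print(X_demo.shape, w_demo.shape)
print((X_demo @ w_demo).shape)   # inner dimensions match (3 and 3)

try:
    w_demo @ X_demo              # inner dimensions 1 and 6 do not match
    wX_is_valid = True
except ValueError as e:
    wX_is_valid = False
    print("w @ X failed:", e)
```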

<details>
<summary>
    <font size='3', color='darkgreen'><b>Hints</b></font>
</summary>

```python
def compute_gradient_logistic_matrix(X, y, w, b): 
    """
    Computes the gradient for logistic regression 
 
    Args:
      X : (array_like Shape (m,n)) input features, one example per row 
      y : (array_like Shape (m,1)) target labels (0 or 1) 
      w : (array_like Shape (n,1)) Values of parameters of the model      
      b : (scalar )                Values of parameter of the model      
    Returns
      dJdb: (scalar)                 The gradient of the cost w.r.t. the parameter b. 
      dJdw: (array_like Shape (n,1)) The gradient of the cost w.r.t. the parameters w. 
                                  
    """
    m,n = X.shape
    ### START CODE HERE ### 
    f_wb =  sigmoid(X @ w + b)      
    err  = f_wb - y                 
    dJdw = (1/m) * (X.T @ err)      
    dJdb = (1/m) * np.sum(err)      
    ### END CODE HERE ###         
        
    return dJdb,dJdw
```

</details>

In [None]:
def compute_gradient_logistic_matrix(X, y, w, b): 
    """
    Computes the gradient for logistic regression 
 
    Args:
      X : (array_like Shape (m,n)) input features, one example per row 
      y : (array_like Shape (m,1)) target labels (0 or 1) 
      w : (array_like Shape (n,1)) Values of parameters of the model      
      b : (scalar )                Values of parameter of the model      
    Returns
      dJdb: (scalar)                 The gradient of the cost w.r.t. the parameter b. 
      dJdw: (array_like Shape (n,1)) The gradient of the cost w.r.t. the parameters w. 
                                  
    """
    m,n = X.shape
    ### START CODE HERE ### 
    ### BEGIN SOLUTION ###
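    f_wb =  sigmoid(X @ w + b)      # predictions for all m examples, shape (m,1)
    err  = f_wb - y                 # prediction errors, shape (m,1)
    dJdw = (1/m) * (X.T @ err)      # equation (5), shape (n,1)
    dJdb = (1/m) * np.sum(err)      # equation (4), scalar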
    ### END SOLUTION ### 
    ### END CODE HERE ###         
        
    return dJdb,dJdw

Now let's check if the output of this function is equivalent to the output of your non-vectorized implementation above.

In [None]:
w = np.array([2.,3.]).reshape(-1,1)
b = 1.
dJdb, dJdw = compute_gradient_logistic_loop(X, y, w, b)
print(f"dJdb, non-vectorized version: {dJdb}" )
print(f"dJdw, non-vectorized version: {dJdw.tolist()}" )
dJdb, dJdw = compute_gradient_logistic_matrix(X, y, w, b)
print(f"dJdb, vectorized version: {dJdb}" )
print(f"dJdw, vectorized version: {dJdw.tolist()}" )

**Expected output** 

```
dJdb, non-vectorized version: 0.49861806546328574
dJdw, non-vectorized version: [[0.498333393278696], [0.49883942983996693]]
dJdb, vectorized version: 0.49861806546328574
dJdw, vectorized version: [[0.498333393278696], [0.4988394298399669]]
```
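
With a gradient function available, the update rule in equation (1) can be sketched as a short loop. The version below is an illustration rather than part of the lab: the learning rate `alpha`, the iteration count, the zero initialization, and the helper names `gradient_sketch` and `gradient_descent_sketch` are all assumptions, and it defines its own sigmoid `g` so that it runs without `lab_utils`:

```python
import numpy as np

def g(z):
    # sigmoid, written out here so the snippet is self-contained
    return 1.0 / (1.0 + np.exp(-z))

def gradient_sketch(X, y, w, b):
    # vectorized gradient, equations (4) and (5)
    m = X.shape[0]
    err = g(X @ w + b) - y
    return np.sum(err) / m, X.T @ err / m

def gradient_descent_sketch(X, y, w, b, alpha, num_iters):
    for _ in range(num_iters):
        dJdb, dJdw = gradient_sketch(X, y, w, b)  # both gradients use the current w and b
        w = w - alpha * dJdw                      # then both parameters are updated
        b = b - alpha * dJdb
    return w, b

X = np.array([[0.5, 1.5], [1, 1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])
y = np.array([0, 0, 0, 1, 1, 1]).reshape(-1, 1)

w_fit, b_fit = gradient_descent_sketch(X, y, np.zeros((2, 1)), 0., alpha=0.1, num_iters=10000)
pred = (g(X @ w_fit + b_fit) >= 0.5).astype(int)  # threshold the predictions at 0.5

print(w_fit.ravel(), b_fit)
print(pred.ravel())
```

Computing both gradients before changing either parameter is what makes the update simultaneous.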
