# Ungraded Lab - Gradient for regularized logistic regression
In this lab you will extend the gradient computation for logistic regression to include regularization.

In [3]:
import numpy as np
from lab_utils import sigmoid

### Gradient for regularized logistic regression

Below, you will implement the gradient for regularized logistic regression.

The gradient of the regularized cost function is a vector with the same shape as the parameters $\mathbf{w}$, where the $j^\mathrm{th}$ element is defined as follows:

$$\frac{\partial J(\mathbf{w})}{\partial w_0} = \frac{1}{m}  \sum_{i=0}^{m-1} (f_{\mathbf{w}}(\mathbf{x}^{(i)}) - y^{(i)}) x_0^{(i)} \quad\quad\quad\quad\quad\quad \mbox{for $j = 0$}$$

$$\frac{\partial J(\mathbf{w})}{\partial w_j} = \left( \frac{1}{m}  \sum_{i=0}^{m-1} (f_{\mathbf{w}}(\mathbf{x}^{(i)}) - y^{(i)}) x_j^{(i)} \right) + \frac{\lambda}{m} w_j  \quad\, \mbox{for $j \geq 1$}$$


You'll implement a function called `compute_gradient_reg` which will return $\frac{\partial J(\mathbf{w})}{\partial \mathbf{w}}$.
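
For reference, the element-wise definition can also be collected into a single vector expression. This is only a restatement, assuming $\mathbf{X}$ is the $m \times (n+1)$ matrix whose rows are the examples $\mathbf{x}^{(i)}$ and $g$ is the sigmoid applied element-wise. The symbol $\tilde{\mathbf{w}}$ is introduced here purely for illustration and denotes $\mathbf{w}$ with its first entry set to zero, so that $w_0$ is not regularized:

$$\frac{\partial J(\mathbf{w})}{\partial \mathbf{w}} = \frac{1}{m} \mathbf{X}^T \left( g(\mathbf{X}\mathbf{w}) - \mathbf{y} \right) + \frac{\lambda}{m} \tilde{\mathbf{w}}, \quad\quad \tilde{\mathbf{w}} = (0, w_1, \dots, w_n)$$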

In the instructions and code below, `dw` refers to $\frac{\partial J(\mathbf{w})}{\partial \mathbf{w}}$; the $J$ is left implicit.  
Please complete the `compute_gradient_reg` function to:
- Loop over each element `j` of `dw` (which has been initialized to 0).
    - Calculate the gradient for each element ($dw_j$) 
        - Loop over all examples in the training set. Create a variable outside the loop to store the total gradient
        - Inside the loop, calculate the gradient update from each training example 
            - Calculate `z`
            $$
            z =  w_0x_0^{(i)} + w_1x_1^{(i)} + \dots + w_nx_n^{(i)} = \mathbf{w} \cdot \mathbf{x}^{(i)}
            $$
            - Predict `f` where `g` is the sigmoid function
            $$
            f =  g(z)
            $$
            - Calculate the gradient from each example

            $$gradient =  (f - y^{(i)})x_j^{(i)}$$
    
            - Add this gradient to the total gradient variable created outside the loop
    
        - Set $dw_j$ to the sum of the gradients from all examples, divided by the number of examples $m$.
    - If `j > 0`
        - Add $\frac{\lambda}{m} w_j$ to $dw_j$

As you are doing this, remember that `X` and `y` are not scalars but arrays of shape $(m, n+1)$ and $(m,)$ respectively, where $n$ is the number of features and $m$ is the number of training examples.
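
Before writing the full loops, it can help to try the innermost steps (`z`, `f`, and the per-example gradient) on a single example. The snippet below is a minimal sketch on made-up data: `sigmoid_sketch`, `X_toy`, `y_toy`, and `w_toy` are hypothetical names used only for illustration, and `sigmoid_sketch` is assumed to behave like the `sigmoid` imported from `lab_utils`.

```python
import numpy as np

def sigmoid_sketch(z):
    """Logistic function g(z) = 1 / (1 + exp(-z)); stand-in for the lab's sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy data: m = 2 examples, n + 1 = 3 columns (column 0 is x_0).
X_toy = np.array([[1.0, 2.0, -1.0],
                  [1.0, 0.5, 3.0]])
y_toy = np.array([1, 0])
w_toy = np.zeros(3)

i, j = 0, 1                               # one example, one parameter index
z = np.dot(w_toy, X_toy[i])               # z = w . x^(i)
f = sigmoid_sketch(z)                     # f = g(z)
gradient = (f - y_toy[i]) * X_toy[i][j]   # contribution of example i to dw_j
print(z, f, gradient)
```

With all parameters at zero, `z` is 0 and `f` is 0.5, so the contribution is $(0.5 - 1) \cdot 2 = -1$. Summing such contributions over all `i` and dividing by `m` gives the unregularized part of $dw_j$.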


In [4]:
## GRADED FUNCTION: compute_gradient_reg

def compute_gradient_reg(X, y, w, lambda_=1): 
    """
    Computes the gradient for regularized logistic regression.
    
    Parameters
    ----------
    X : array_like
        Shape (m, n+1) 
    
    y : array_like
        Shape (m,) 
    
    w : array_like
        Parameters of the model
        Shape (n+1,)
        
    lambda_ : float
        Controls amount of regularization
        Default is set to 1
    
    Returns
    -------
    dw : array_like
        Shape (n+1,)
        The gradient of the regularized cost w.r.t. the parameters w. 
        Note that dw has the same dimensions as w.
    """
    
    m, n = X.shape  # n counts every column of X, i.e. n+1 in the docstring's notation
    
    # You will need to replace/update dw
    dw = np.zeros(w.shape)  # float array, even if w has an integer dtype
    
    ### START CODE HERE ### 
    ### BEGIN SOLUTION ###
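    for j in range(n):
        gradient_list = []  # per-example contributions to dw[j]
        
        for i in range(m):        
            z = np.dot(w, X[i])              # z = w . x^(i)
            f = sigmoid(z)                   # model prediction for example i
            gradient = (f - y[i]) * X[i][j]
            gradient_list.append(gradient)
        
        # Average over the m training examples
        dw[j] = (1 / m) * sum(gradient_list)
        
        # Regularize every parameter except w_0
        if j > 0:
            dw[j] += (lambda_ / m) * w[j]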
    ### END SOLUTION ###  
    ### END CODE HERE ### 
    
    return dw
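
The same gradient can also be computed without explicit loops. The following is a minimal sketch and not part of the graded solution: `sigmoid_sketch` and `compute_gradient_reg_vectorized` are hypothetical names, and `sigmoid_sketch` is assumed to behave like the `sigmoid` imported from `lab_utils`.

```python
import numpy as np

def sigmoid_sketch(z):
    """Logistic function g(z) = 1 / (1 + exp(-z)); stand-in for the lab's sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))

def compute_gradient_reg_vectorized(X, y, w, lambda_=1):
    """Loop-free sketch of the regularized gradient; X is (m, n+1), y is (m,), w is (n+1,)."""
    m = X.shape[0]
    f = sigmoid_sketch(X @ w)        # predictions for all examples, shape (m,)
    dw = (X.T @ (f - y)) / m         # unregularized gradient, shape (n+1,)
    reg = (lambda_ / m) * w          # regularization term for every w_j (new array)
    reg[0] = 0.0                     # w_0 is not regularized
    return dw + reg
```

Calling this function with the same arguments as `compute_gradient_reg` should agree with the loop version up to floating-point rounding, which makes it a convenient cross-check.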

Run the cell below to check your implementation of the `compute_gradient_reg` function.

In [5]:
np.random.seed(1)
X_tmp = np.random.rand(5,6)
y_tmp = np.array([0,1,0,1,0])
initial_w = np.random.rand(X_tmp.shape[1]) - 0.5
lambda_ = 1
grad = compute_gradient_reg(X_tmp, y_tmp, initial_w, lambda_)

print(f"First few elements of regularized gradient:\n {grad[:4]}")

First few elements of regularized gradient:
 [ 0.09181613  0.14678723 -0.00790065 -0.03773603]


**Expected Output**:
<table>
  <tr>
    <td> <b>First few elements of regularized gradient:</b> </td>
    <td>  [ 0.09181613  0.14678723 -0.00790065 -0.03773603] </td> 
  </tr>
</table>
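
Besides comparing against the expected output, a gradient can be sanity-checked numerically with central finite differences of the cost. The sketch below assumes the regularized cost is the usual cross-entropy plus $\frac{\lambda}{2m} \sum_{j \geq 1} w_j^2$, which is consistent with the gradient formulas above but is not defined in this lab. All function and variable names (`sigmoid_sketch`, `cost_reg_sketch`, `gradient_reg_sketch`, `numerical_gradient`, `X_chk`, and so on) are hypothetical.

```python
import numpy as np

def sigmoid_sketch(z):
    """Logistic function g(z) = 1 / (1 + exp(-z)); stand-in for the lab's sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))

def cost_reg_sketch(X, y, w, lambda_=1):
    """Assumed cost: mean cross-entropy plus (lambda / 2m) * sum of w_j^2 for j >= 1."""
    m = X.shape[0]
    f = sigmoid_sketch(X @ w)
    cross_entropy = -np.mean(y * np.log(f) + (1 - y) * np.log(1 - f))
    return cross_entropy + (lambda_ / (2 * m)) * np.sum(w[1:] ** 2)

def gradient_reg_sketch(X, y, w, lambda_=1):
    """Analytic gradient from the formulas above, written without loops."""
    m = X.shape[0]
    dw = X.T @ (sigmoid_sketch(X @ w) - y) / m
    dw[1:] += (lambda_ / m) * w[1:]      # w_0 is not regularized
    return dw

def numerical_gradient(X, y, w, lambda_=1, eps=1e-6):
    """Central-difference approximation of the gradient, one parameter at a time."""
    grad = np.zeros(w.shape)
    for j in range(w.size):
        step = np.zeros(w.shape)
        step[j] = eps
        cost_plus = cost_reg_sketch(X, y, w + step, lambda_)
        cost_minus = cost_reg_sketch(X, y, w - step, lambda_)
        grad[j] = (cost_plus - cost_minus) / (2 * eps)
    return grad

# Same construction as the check cell above, under different variable names.
np.random.seed(1)
X_chk = np.random.rand(5, 6)
y_chk = np.array([0, 1, 0, 1, 0])
w_chk = np.random.rand(X_chk.shape[1]) - 0.5

analytic = gradient_reg_sketch(X_chk, y_chk, w_chk, 1)
numeric = numerical_gradient(X_chk, y_chk, w_chk, 1)
max_diff = np.max(np.abs(analytic - numeric))
print(f"Largest difference between analytic and numerical gradient: {max_diff:.2e}")
```

If the analytic gradient matches the assumed cost, the largest difference should be tiny compared with the gradient entries themselves. A large difference usually points to a sign error or to regularizing $w_0$ by mistake.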