# Logistic Regression with a Neural Network Mindset

Welcome to your second programming assignment! In this notebook, you will build a logistic regression classifier to recognize whether an image contains a cat or not. This is an important building block for understanding neural networks.

**Instructions:**
- Do not use loops (for/while) for your computations unless explicitly told to do so
- Use vectorization and numpy operations
- After coding each function, run the cell below it to test your implementation

**By the end of this assignment, you will be able to:**
- Build the general architecture of a learning algorithm, including:
    - Initializing parameters
    - Calculating the cost function and its gradient
    - Using an optimization algorithm (gradient descent)
- Gather all three functions above into a main model function

Let's get started!

## Table of Contents
- [1 - Packages](#1)
- [2 - Overview of the Problem](#2)
- [3 - General Architecture of the Learning Algorithm](#3)
- [4 - Building the Parts of the Algorithm](#4)
    - [4.1 - Helper Functions](#4-1)
        - [Exercise 1 - sigmoid](#ex-1)
    - [4.2 - Initializing Parameters](#4-2)
        - [Exercise 2 - initialize_with_zeros](#ex-2)
    - [4.3 - Forward and Backward Propagation](#4-3)
        - [Exercise 3 - propagate](#ex-3)
    - [4.4 - Optimization](#4-4)
        - [Exercise 4 - optimize](#ex-4)
    - [4.5 - Prediction](#4-5)
        - [Exercise 5 - predict](#ex-5)
- [5 - Merge All Functions into a Model](#5)
    - [Exercise 6 - model](#ex-6)
- [6 - Test with Your Own Image](#6)
- [7 - Analysis](#7)

<a name='1'></a>
## 1 - Packages

First, let's import all the packages you will need during this assignment:
- [numpy](https://numpy.org/doc/stable/) is the fundamental package for scientific computing with Python
- [matplotlib](http://matplotlib.org) is a library to plot graphs in Python

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import copy

%matplotlib inline

<a name='2'></a>
## 2 - Overview of the Problem

**Problem Statement**: You are given a dataset containing:
- A training set of m_train images labeled as cat (y=1) or non-cat (y=0)
- A test set of m_test images labeled as cat or non-cat
- Each image is of shape (num_px, num_px, 3) where 3 is for the 3 channels (RGB)

You will build a simple image-recognition algorithm that can correctly classify pictures as cat or non-cat.

Let's familiarize ourselves with the dataset structure.

In [None]:
# For this exercise, we'll create synthetic data
# In a real scenario, you would load actual cat/non-cat images

np.random.seed(1)

# Create synthetic dataset
m_train = 200  # number of training examples
m_test = 50    # number of test examples
num_px = 64    # height/width of each image

# Generate random images (flattened)
train_set_x = np.random.randn(num_px * num_px * 3, m_train)
test_set_x = np.random.randn(num_px * num_px * 3, m_test)

# Generate random labels (0 or 1)
train_set_y = np.random.randint(0, 2, (1, m_train))
test_set_y = np.random.randint(0, 2, (1, m_test))

print("train_set_x shape: " + str(train_set_x.shape))
print("train_set_y shape: " + str(train_set_y.shape))
print("test_set_x shape: " + str(test_set_x.shape))
print("test_set_y shape: " + str(test_set_y.shape))

**Expected Output:**
```
train_set_x shape: (12288, 200)
train_set_y shape: (1, 200)
test_set_x shape: (12288, 50)
test_set_y shape: (1, 50)
```

Note: Each image is represented as a column vector of shape (12288, 1). 12288 = 64 × 64 × 3.
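
The synthetic images above are generated already flattened. With real images stored as an array of shape (m, num_px, num_px, 3), a flattening step would come first. A minimal sketch of that step, assuming a hypothetical array `images` filled with random integers as a stand-in for real pixel data:

```python
import numpy as np

# Hypothetical stand-in for a real image array: 5 RGB images of 64 x 64 pixels.
m, num_px = 5, 64
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(m, num_px, num_px, 3))

# Flatten each image into one row, then transpose so each image becomes a column.
images_flat = images.reshape(m, -1).T

print(images_flat.shape)  # (12288, 5)
```

Reshaping to `(m, -1)` before transposing keeps all pixels of one image together in a single column.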

<a name='3'></a>
## 3 - General Architecture of the Learning Algorithm

It's time to design a simple algorithm to distinguish cat images from non-cat images.

You will build a Logistic Regression using a Neural Network mindset: **Logistic Regression is actually a very simple Neural Network!** It computes a weighted sum of the inputs plus a bias, then passes the result through a sigmoid activation.

**Mathematical expression of the algorithm**:

For one example $x^{(i)}$:
$$z^{(i)} = w^T x^{(i)} + b \tag{1}$$
$$\hat{y}^{(i)} = a^{(i)} = \sigma(z^{(i)})\tag{2}$$ 
$$ \mathcal{L}(a^{(i)}, y^{(i)}) =  - y^{(i)}  \log(a^{(i)}) - (1-y^{(i)} )  \log(1-a^{(i)})\tag{3}$$

The cost is then computed by averaging the loss over all training examples:
$$ J = \frac{1}{m} \sum_{i=1}^m \mathcal{L}(a^{(i)}, y^{(i)})\tag{4}$$
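
Equations (1) to (4) can be traced on a tiny example. The sketch below uses toy values for `w`, `b`, `X`, and `Y` that are chosen purely for illustration and are not part of the dataset above:

```python
import numpy as np

# Toy values chosen for illustration only.
w = np.array([[0.5], [-0.25]])           # weights, shape (2, 1)
b = 0.1                                  # bias
X = np.array([[1.0, 2.0], [4.0, 0.0]])   # two examples stored as columns
Y = np.array([[1, 0]])                   # labels, shape (1, 2)

Z = np.dot(w.T, X) + b                              # equation (1), shape (1, 2)
A = 1.0 / (1.0 + np.exp(-Z))                        # equation (2)
losses = -Y * np.log(A) - (1 - Y) * np.log(1 - A)   # equation (3), one value per example
J = np.mean(losses)                                 # equation (4): average over the m examples

print("Z =", Z)
print("A =", A)
print("J =", J)
```

Every quantity is computed for all examples at once, which is the vectorized style the rest of the assignment relies on.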

**Key steps**:
1. Initialize the parameters of the model
2. Learn the parameters for the model by minimizing the cost  
3. Use the learned parameters to make predictions (on the test set)
4. Analyze the results and conclude

<a name='4'></a>
## 4 - Building the Parts of the Algorithm

The main steps for building a Neural Network are:
1. Define the model structure (such as number of input features) 
2. Initialize the model's parameters
3. Loop:
    - Calculate current loss (forward propagation)
    - Calculate current gradient (backward propagation)
    - Update parameters (gradient descent)

You often build steps 1-3 separately and then integrate them into one function called `model()`.

<a name='4-1'></a>
### 4.1 - Helper Functions

<a name='ex-1'></a>
### Exercise 1 - sigmoid

Build the sigmoid function. You will use this function in several places in this assignment. Recall from the previous exercise that:

$$\text{For } z = w^T x + b$$
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

In [None]:
# sigmoid

def sigmoid(z):
    """
    Compute the sigmoid of z

    Arguments:
    z -- A scalar or numpy array of any size.

    Return:
    s -- sigmoid(z)
    """

    # (≈ 1 line of code)
    # s = 
    # YOUR CODE STARTS HERE
    
    
    # YOUR CODE ENDS HERE
    
    return s

In [None]:
# Test the sigmoid function
print("sigmoid(0) = " + str(sigmoid(0)))
print("sigmoid(9.2) = " + str(sigmoid(9.2)))
print("sigmoid(array) = " + str(sigmoid(np.array([1, 2, 3]))))

**Expected Output:**
```
sigmoid(0) = 0.5
sigmoid(9.2) = 0.9998989708060922
sigmoid(array) = [0.73105858 0.88079708 0.95257413]
```
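
One possible way to write this function is sketched below. The name `sigmoid_sketch` is hypothetical and chosen so that it does not overwrite your own solution:

```python
import numpy as np

def sigmoid_sketch(z):
    """Elementwise logistic function 1 / (1 + exp(-z)) for scalars or arrays."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid_sketch(0))                     # 0.5
print(sigmoid_sketch(np.array([1, 2, 3])))   # approximately [0.731 0.881 0.953]
```

Because `np.exp` works elementwise, the same line handles scalars and arrays of any shape. For very negative inputs such as -1000, `np.exp(-z)` overflows to `inf` and NumPy emits a RuntimeWarning, although the result still evaluates to 0.0.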

<a name='4-2'></a>
### 4.2 - Initializing Parameters

<a name='ex-2'></a>
### Exercise 2 - initialize_with_zeros

Implement parameter initialization in the cell below. Initialize w as a vector of zeros with shape (dim, 1) and b as the float 0.0.

**Hint**: Use `np.zeros()`

In [None]:
# initialize_with_zeros

def initialize_with_zeros(dim):
    """
    This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.
    
    Argument:
    dim -- size of the w vector we want (or number of parameters in this case)
    
    Returns:
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (corresponds to the bias) of type float
    """
    
    # (≈ 2 lines of code)
    # w = ...
    # b = ...
    # YOUR CODE STARTS HERE
    
    
    # YOUR CODE ENDS HERE

    return w, b

In [None]:
# Test initialize_with_zeros
dim = 2
w, b = initialize_with_zeros(dim)

print("w = " + str(w))
print("b = " + str(b))

assert type(b) == float
print("\n\033[92mAll tests passed!")

**Expected Output:**
```
w = [[0.]
 [0.]]
b = 0.0
```
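
The (dim, 1) shape matters because it makes $w^T X$ a (1, m) row of scores, one per example. A small sketch of that shape arithmetic, using made-up sizes rather than the notebook's:

```python
import numpy as np

dim, m = 4, 3                      # illustrative sizes, not the notebook's
w = np.zeros((dim, 1))             # column vector of weights
b = 0.0                            # float bias
X = np.ones((dim, m))              # m examples stored as columns

Z = np.dot(w.T, X) + b             # (1, dim) times (dim, m) gives (1, m); b broadcasts
print(Z.shape)                     # (1, 3)

# A 1-D weight array of shape (dim,) would give shape (m,) instead of (1, m).
w_flat = np.zeros(dim)
print((np.dot(w_flat, X) + b).shape)   # (3,)
```

Keeping `w` two-dimensional means `Z`, `A`, and `Y` all share the shape (1, m), which avoids accidental broadcasting when they are combined.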

<a name='4-3'></a>
### 4.3 - Forward and Backward Propagation

Now that your parameters are initialized, you can do the "forward" and "backward" propagation steps for learning the parameters.

<a name='ex-3'></a>
### Exercise 3 - propagate

Implement a function `propagate()` that computes the cost function and its gradient.

**Hints**:

Forward Propagation:
- You get X
- You compute $A = \sigma(w^T X + b) = (a^{(1)}, a^{(2)}, ..., a^{(m-1)}, a^{(m)})$
- You calculate the cost function: $J = -\frac{1}{m}\sum_{i=1}^{m}(y^{(i)}\log(a^{(i)})+(1-y^{(i)})\log(1-a^{(i)}))$

Backward Propagation: here are the two formulas you will be using:

$$ \frac{\partial J}{\partial w} = \frac{1}{m}X(A-Y)^T\tag{5}$$
$$ \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^m (a^{(i)}-y^{(i)})\tag{6}$$
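
Equations (5) and (6) follow from the chain rule applied to a single example. A short derivation, using only the definitions in equations (1) to (4) together with $\partial z^{(i)} / \partial w = x^{(i)}$ and $\partial z^{(i)} / \partial b = 1$:

$$\frac{\partial \mathcal{L}}{\partial a^{(i)}} = -\frac{y^{(i)}}{a^{(i)}} + \frac{1-y^{(i)}}{1-a^{(i)}}, \qquad \frac{\partial a^{(i)}}{\partial z^{(i)}} = a^{(i)}(1-a^{(i)})$$
$$\frac{\partial \mathcal{L}}{\partial z^{(i)}} = -y^{(i)}(1-a^{(i)}) + (1-y^{(i)})\,a^{(i)} = a^{(i)} - y^{(i)}$$
$$\frac{\partial J}{\partial w} = \frac{1}{m}\sum_{i=1}^m x^{(i)}\,(a^{(i)} - y^{(i)}) = \frac{1}{m}X(A-Y)^T, \qquad \frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^m (a^{(i)} - y^{(i)})$$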

In [None]:
# propagate

def propagate(w, b, X, Y):
    """
    Implement the cost function and its gradient for the propagation explained above

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)

    Return:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the cost with respect to w, thus same shape as w
    db -- gradient of the cost with respect to b, thus same shape as b
    
    Tips:
    - Write your code step by step for the propagation. np.log(), np.dot()
    """
    
    m = X.shape[1]
    
    # FORWARD PROPAGATION (FROM X TO COST)
    # (≈ 2 lines of code)
    # compute activation
    # A = ...
    # compute cost by using np.dot to perform multiplication. 
    # And don't use loops for the sum.
    # cost = ...                              
    # YOUR CODE STARTS HERE
    
    
    # YOUR CODE ENDS HERE
    
    # BACKWARD PROPAGATION (TO FIND GRAD)
    # (≈ 2 lines of code)
    # dw = ...
    # db = ...
    # YOUR CODE STARTS HERE
    
    
    # YOUR CODE ENDS HERE
    
    cost = np.squeeze(np.array(cost))

    
    grads = {"dw": dw,
             "db": db}
    
    return grads, cost

In [None]:
# Test propagate
w =  np.array([[1.], [2]])
b = 1.5
X = np.array([[1., -2., -1.], [3., 0.5, -3.2]])
Y = np.array([[1, 1, 0]])
grads, cost = propagate(w, b, X, Y)

print("dw = " + str(grads["dw"]))
print("db = " + str(grads["db"]))
print("cost = " + str(cost))

assert type(grads["dw"]) == np.ndarray
assert grads["dw"].shape == (2, 1)
assert type(grads["db"]) == np.float64
print("\n\033[92mAll tests passed!")

**Expected Output:**
```
dw = [[ 0.25071532]
 [-0.06604096]]
db = -0.1250040450043965
cost = 0.15900537707692405
```
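
A common way to gain confidence in gradient formulas is to compare them against finite differences. The sketch below does this for equations (5) and (6) on the test inputs above. The helper names `cost_only` and `analytic_grads` are hypothetical and written only for this check:

```python
import numpy as np

def cost_only(w, b, X, Y):
    """Cost J from equation (4); helper written for this sketch."""
    A = 1.0 / (1.0 + np.exp(-(np.dot(w.T, X) + b)))
    return -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))

def analytic_grads(w, b, X, Y):
    """Gradients from equations (5) and (6)."""
    m = X.shape[1]
    A = 1.0 / (1.0 + np.exp(-(np.dot(w.T, X) + b)))
    dw = np.dot(X, (A - Y).T) / m
    db = np.sum(A - Y) / m
    return dw, db

w = np.array([[1.0], [2.0]])
b = 1.5
X = np.array([[1.0, -2.0, -1.0], [3.0, 0.5, -3.2]])
Y = np.array([[1, 1, 0]])

dw, db = analytic_grads(w, b, X, Y)

# Central finite differences: (J(theta + eps) - J(theta - eps)) / (2 * eps)
eps = 1e-6
dw_num = np.zeros_like(w)
for k in range(w.shape[0]):   # a loop is acceptable here: this is a check, not training
    step = np.zeros_like(w)
    step[k, 0] = eps
    dw_num[k, 0] = (cost_only(w + step, b, X, Y) - cost_only(w - step, b, X, Y)) / (2 * eps)
db_num = (cost_only(w, b + eps, X, Y) - cost_only(w, b - eps, X, Y)) / (2 * eps)

print("max |dw - dw_num| =", np.max(np.abs(dw - dw_num)))
print("|db - db_num|     =", abs(db - db_num))
```

Both differences should be tiny compared with the gradients themselves; a large gap usually points to a sign or transpose mistake in the analytic formulas.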

<a name='4-4'></a>
### 4.4 - Optimization

- You have initialized your parameters.
- You are also able to compute a cost function and its gradient.
- Now, you want to update the parameters using gradient descent.

<a name='ex-4'></a>
### Exercise 4 - optimize

Write down the optimization function. The goal is to learn $w$ and $b$ by minimizing the cost function $J$. For a parameter $\theta$, the update rule is $\theta = \theta - \alpha \, d\theta$, where $\alpha$ is the learning rate.
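
To make the update rule concrete, here is a single gradient descent step worked out by hand. This sketch reuses the parameters and the expected gradients from the `propagate()` test case above. The names `w_new` and `b_new` are introduced only for illustration:

```python
import numpy as np

learning_rate = 0.009

# Parameters and gradients taken from the propagate() test case above.
w = np.array([[1.0], [2.0]])
b = 1.5
dw = np.array([[0.25071532], [-0.06604096]])
db = -0.1250040450043965

# theta = theta - alpha * dtheta, applied to both parameters
w_new = w - learning_rate * dw
b_new = b - learning_rate * db

print("w_new =", w_new)
print("b_new =", b_new)
```

The first weight has a positive gradient, so it decreases; the second weight and the bias have negative gradients, so they increase. Each parameter moves in the direction that lowers the cost.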

In [None]:
# optimize

def optimize(w, b, X, Y, num_iterations=100, learning_rate=0.009, print_cost=False):
    """
    This function optimizes w and b by running a gradient descent algorithm
    
    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat), of shape (1, number of examples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- True to print the loss every 100 steps
    
    Returns:
    params -- dictionary containing the weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
    costs -- list of all the costs computed during the optimization, this will be used to plot the learning curve.
    
    Tips:
    You need to write down two steps and iterate through them:
        1) Calculate the cost and the gradient for the current parameters. Use propagate().
        2) Update the parameters using gradient descent rule for w and b.
    """
    
    w = copy.deepcopy(w)
    b = copy.deepcopy(b)
    
    costs = []
    
    for i in range(num_iterations):
        # (≈ 1 line of code)
        # Cost and gradient calculation 
        # grads, cost = ...
        # YOUR CODE STARTS HERE
        
        
        # YOUR CODE ENDS HERE
        
        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]
        
        # update rule (≈ 2 lines of code)
        # w = ...
        # b = ...
        # YOUR CODE STARTS HERE
        
        
        # YOUR CODE ENDS HERE
        
        # Record the costs
        if i % 100 == 0:
            costs.append(cost)
        
            # Print the cost every 100 training iterations
            if print_cost:
                print ("Cost after iteration %i: %f" %(i, cost))
    
    params = {"w": w,
              "b": b}
    
    grads = {"dw": dw,
             "db": db}
    
    return params, grads, costs

In [None]:
# Test optimize
w =  np.array([[1.], [2.]])
b = 1.5
X = np.array([[1., -2., -1.], [3., 0.5, -3.2]])
Y = np.array([[1, 1, 0]])

params, grads, costs = optimize(w, b, X, Y, num_iterations=100, learning_rate=0.009, print_cost=False)

print("w = " + str(params["w"]))
print("b = " + str(params["b"]))
print("dw = " + str(grads["dw"]))
print("db = " + str(grads["db"]))
print("Costs = " + str(costs))

assert type(costs) == list
print("\n\033[92mAll tests passed!")

**Expected Output:**
```
w = [[0.80956046]
 [2.0508202 ]]
b = 1.5948713189708588
dw = [[ 0.17860505]
 [-0.04840656]]
db = -0.08888460336847771
Costs = [array(0.15900538)]
```

<a name='4-5'></a>
### 4.5 - Prediction

The previous function will output the learned w and b. We are able to use w and b to predict the labels for a dataset X. Implement the `predict()` function. There are two steps to computing predictions:

1. Calculate $\hat{Y} = A = \sigma(w^T X + b)$

2. Convert the entries of A into 0 (if activation <= 0.5) or 1 (if activation > 0.5), and store the predictions in a vector `Y_prediction`. If you wish, you can use an `if`/`else` statement in a `for` loop (though there is also a way to vectorize this).
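
The vectorized alternative mentioned in step 2 can be sketched with a boolean comparison. The activations below are made up for illustration and are not the output of a trained model:

```python
import numpy as np

# Illustrative activations, not the output of a trained model.
A = np.array([[0.9, 0.5, 0.2]])

# The comparison yields booleans; astype(float) maps True to 1.0 and False to 0.0.
Y_prediction = (A > 0.5).astype(float)

print(Y_prediction)  # [[1. 0. 0.]]
```

Note that an activation of exactly 0.5 maps to 0, matching the `<= 0.5` rule above.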

<a name='ex-5'></a>
### Exercise 5 - predict

Implement the `predict()` function.

In [None]:
# predict

def predict(w, b, X):
    '''
    Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)
    
    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    
    Returns:
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    '''
    
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)
    
    # Compute vector "A" predicting the probabilities of a cat being present in the picture
    # (≈ 1 line of code)
    # A = ...
    # YOUR CODE STARTS HERE
    
    
    # YOUR CODE ENDS HERE
    
    # Convert probabilities A[0,i] to actual predictions p[0,i]
    # (≈ 4 lines of code)
    # for i in range(A.shape[1]):
        # Convert probabilities a[0,i] to actual predictions p[0,i]
        # YOUR CODE STARTS HERE
    
    
    # YOUR CODE ENDS HERE
    
    return Y_prediction

In [None]:
# Test predict
w = np.array([[0.3], [0.5]])
b = -0.2
X = np.array([[1., -1.1, -3.2], [1.2, 2., 0.1]])

print("predictions = " + str(predict(w, b, X)))

assert type(predict(w, b, X)) == np.ndarray
print("\n\033[92mAll tests passed!")

**Expected Output:**
```
predictions = [[1. 1. 0.]]
```

<font color='blue'>
    
**What to remember:**

You've implemented several functions that:
- Initialize (w,b)
- Optimize the loss iteratively to learn parameters (w,b):
    - Computing the cost and its gradient 
    - Updating the parameters using gradient descent
- Use the learned (w,b) to predict the labels for a given set of examples

</font>

<a name='5'></a>
## 5 - Merge All Functions into a Model

You will now see how the overall model is structured by putting all the building blocks (the functions implemented in the previous parts) together, in the right order.

<a name='ex-6'></a>
### Exercise 6 - model

Implement the model function. Use the following notation:
- Y_prediction_test for your predictions on the test set
- Y_prediction_train for your predictions on the train set
- params, grads, costs for the outputs of optimize()

In [None]:
# model

def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.5, print_cost=False):
    """
    Builds the logistic regression model by calling the functions you've implemented previously
    
    Arguments:
    X_train -- training set represented by a numpy array of shape (num_px * num_px * 3, m_train)
    Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
    X_test -- test set represented by a numpy array of shape (num_px * num_px * 3, m_test)
    Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
    num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
    learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
    print_cost -- Set to True to print the cost every 100 iterations
    
    Returns:
    d -- dictionary containing information about the model.
    """
    
    # (≈ 1 line of code)   
    # initialize parameters with zeros 
    # w, b = ...
    # YOUR CODE STARTS HERE
    
    
    # YOUR CODE ENDS HERE

    # Gradient descent 
    # (≈ 1 line of code)
    # params, grads, costs = ...
    # YOUR CODE STARTS HERE
    
    
    # YOUR CODE ENDS HERE
    
    # Retrieve parameters w and b from dictionary "params"
    w = params["w"]
    b = params["b"]
    
    # Predict test/train set examples (≈ 2 lines of code)
    # Y_prediction_test = ...
    # Y_prediction_train = ...
    # YOUR CODE STARTS HERE
    
    
    # YOUR CODE ENDS HERE

    # Print train/test accuracy
    if print_cost:
        print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
        print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

    
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test, 
         "Y_prediction_train" : Y_prediction_train, 
         "w" : w, 
         "b" : b,
         "learning_rate" : learning_rate,
         "num_iterations": num_iterations}
    
    return d

If you wish, run the following cell to train your model.

In [None]:
# Train the model
logistic_regression_model = model(train_set_x, train_set_y, test_set_x, test_set_y,
                                   num_iterations=2000, learning_rate=0.005, print_cost=True)

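**Expected Output (approximate; apart from the first cost, the values depend on the generated data):**
```
Cost after iteration 0: 0.693147
...
train accuracy: ... %
test accuracy: ... %
```

Note: The first cost is $\ln 2 \approx 0.693147$ because zero-initialized parameters give every example an activation of 0.5. The labels in this synthetic dataset are random and unrelated to the pixel values, so test accuracy should stay near chance level (about 50 %), while training accuracy can be much higher because a model with 12288 weights can memorize 200 examples. With real cat/non-cat images, you would expect much better test performance!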
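
The accuracy lines printed by `model()` rely on predictions and labels both being 0 or 1, so $|\hat{y} - y|$ equals 1 exactly when a prediction is wrong. A small sketch with made-up labels and predictions (`Y_true` and `Y_pred` are hypothetical names):

```python
import numpy as np

# Made-up labels and predictions for five examples.
Y_true = np.array([[1, 0, 1, 1, 0]])
Y_pred = np.array([[1.0, 0.0, 0.0, 1.0, 1.0]])

# |Y_pred - Y_true| is 1 for each mistake, so its mean is the error rate.
error_rate = np.mean(np.abs(Y_pred - Y_true))
accuracy = 100 - error_rate * 100

print("accuracy: {} %".format(accuracy))
```

Here two of the five predictions are wrong, so the error rate is 0.4 and the accuracy is 60 %.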

Let's also plot the learning curve (cost vs. number of iterations):

In [None]:
# Plot learning curve (with costs)
costs = np.squeeze(logistic_regression_model['costs'])
plt.plot(costs)
plt.ylabel('cost')
plt.xlabel('iterations (hundreds)')
plt.title("Learning rate = " + str(logistic_regression_model["learning_rate"]))
plt.show()

<a name='7'></a>
## 7 - Analysis

**Interpretation**: You can see that the cost decreases over iterations, showing that the parameters are being learned. Because the synthetic labels are random, the model is only fitting noise in the training set; with real image data you would see much better test accuracy.

**Choice of learning rate**:

For gradient descent to work, you must choose the learning rate wisely. The learning rate $\alpha$ determines how rapidly we update the parameters. If the learning rate is too large, we may "overshoot" the optimal value. Similarly, if it is too small, we will need too many iterations to converge to the best values.
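
Both failure modes can be seen on a one-parameter toy problem. The sketch below minimizes $J(\theta) = \theta^2$, a stand-in cost chosen for illustration that is not the logistic regression cost. The function name `run_gradient_descent` is hypothetical:

```python
def run_gradient_descent(alpha, steps=20, theta0=1.0):
    """Minimize the toy cost J(theta) = theta**2, whose gradient is 2 * theta."""
    theta = theta0
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta

for alpha in (1.1, 0.1, 0.001):
    print("alpha = {:<5} -> theta after 20 steps = {:.4f}".format(alpha, run_gradient_descent(alpha)))
```

Each step multiplies $\theta$ by $(1 - 2\alpha)$. With $\alpha = 1.1$ that factor is $-1.2$, so the iterates overshoot the minimum at 0 and grow in magnitude. With $\alpha = 0.1$ they shrink quickly toward 0, and with $\alpha = 0.001$ they barely move in 20 steps.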

**Things to try**:
- Different learning rates: Try learning_rate = 0.01, 0.001, 0.0001
- More iterations: Try num_iterations = 5000, 10000
- Real image data: Replace the synthetic data with actual cat/non-cat images

In [None]:
# Experiment with different learning rates
learning_rates = [0.01, 0.001, 0.0001]
models = {}

for lr in learning_rates:
    print("\nTraining a model with learning rate: " + str(lr))
    models[str(lr)] = model(train_set_x, train_set_y, test_set_x, test_set_y,
                            num_iterations=1500, learning_rate=lr, print_cost=False)
    print('train accuracy: {} %'.format(100 - np.mean(np.abs(models[str(lr)]["Y_prediction_train"] - train_set_y)) * 100))
    print('test accuracy: {} %'.format(100 - np.mean(np.abs(models[str(lr)]["Y_prediction_test"] - test_set_y)) * 100))

In [None]:
# Compare learning curves for different learning rates
for lr in learning_rates:
    plt.plot(np.squeeze(models[str(lr)]["costs"]), label=str(models[str(lr)]["learning_rate"]))

plt.ylabel('cost')
plt.xlabel('iterations (hundreds)')

legend = plt.legend(loc='upper right', shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')
plt.show()

<font color='blue'>
    
**What to remember from this assignment:**
1. Preprocessing the dataset is important.
2. You implemented each function separately: initialize_with_zeros(), propagate(), optimize(), predict(). Then you built a model().
3. Tuning the learning rate (which is an example of a "hyperparameter") can make a big difference to the algorithm.
4. This exercise introduced you to the general methodology of building a neural network model:
    - Define the model structure (such as number of input features)
    - Initialize the model's parameters
    - Loop:
        - Calculate current loss (forward propagation)
        - Calculate current gradient (backward propagation)
        - Update parameters (gradient descent)

</font>

**Congratulations!** You have successfully built your first machine learning model using logistic regression. In the next assignments, you will build deeper neural networks with hidden layers.