ADAM Cody & DE ZORDO Benjamin

<font  style="font-size: 4rem; color: darkviolet"> Logistic Regression as a Neural Network </font>

AA - 2022/23 - TP5

In this assignment, you will build a logistic regression classifier to recognize cats using a Neural Network approach. The goal is to build a general architecture of a learning algorithm, including initializing parameters, calculating the cost function and its gradient, and using an optimization algorithm (gradient descent). This will help you develop a better understanding and intuition about deep learning.

### Table of Contents
- [1 - Dataset](#1)
    - [Exercise 1.1 - dimensions and shapes](#ex-1.1)
    - [Exercise 1.2 - reshape the datasets](#ex-1.2)
    - [Exercise 1.3 - standardize the data](#ex-1.3)
- [2 - Building the components of the algorithm](#2)
    - [Exercise 2.1 - initialization](#ex-2.1)
    - [Exercise 2.2 - forward and backward propagation](#ex-2.2)
    - [Exercise 2.3 - optimization](#ex-2.3)
    - [Exercise 2.4 - prediction](#ex-2.4)
- [3 - Merging the components into a model](#3)
    - [Exercise 3.1 - model](#ex-3.1)

In [None]:
import numpy as np
import copy
import matplotlib.pyplot as plt
import h5py
import scipy
import time
from PIL import Image
from scipy import ndimage

%matplotlib inline

<a name='1'></a>
## <font color='darkviolet'> 1 - Dataset ##

The problem involves working with a dataset that contains two sets of images: a training set and a test set. The images are classified as either cats (labeled as y=1) or non-cats (labeled as y=0). 

Each image is represented as a matrix of size (num_px,num_px,3), where 3 represents the RGB channels. 
    
The goal is to develop an image recognition algorithm that can accurately classify new images as cats or non-cats. To begin, the dataset is loaded into memory using the provided code.

In [None]:
def load_dataset():
    train_dataset = h5py.File('./data/train_catvnoncat.h5', "r")
    train_set_x = np.array(train_dataset["train_set_x"][:]) # the train set features
    train_set_y = np.array(train_dataset["train_set_y"][:]) # the train set labels

    test_dataset = h5py.File('./data/test_catvnoncat.h5', "r")
    test_set_x = np.array(test_dataset["test_set_x"][:]) # the test set features
    test_set_y = np.array(test_dataset["test_set_y"][:]) # the test set labels

    classes = np.array(test_dataset["list_classes"][:]) # the list of classes
    
    train_set_y = train_set_y.reshape((1, train_set_y.shape[0]))
    test_set_y = test_set_y.reshape((1, test_set_y.shape[0]))
    
    return train_set_x, train_set_y, test_set_x, test_set_y, classes

In [None]:
# Loading the data
train_set_x_raw, train_set_y, test_set_x_raw, test_set_y, classes = load_dataset()

In [None]:
# Example of a picture
index = 10
plt.imshow(train_set_x_raw[index])
print ("y = " + str(train_set_y[0, index]) + ", it's a '" + classes[np.squeeze(train_set_y[:, index])].decode("utf-8") +  "' picture.")

<a name='ex-1.1'></a>

### <font color='blue'> Exercise 1.1: dimensions and shapes 

<font color='blue'> Determine the values of the variables of interest: m_train, which is the number of training examples; m_test, which is the number of test examples; and num_px, which denotes the height and width of a given training image.

Note that `train_set_x_raw` is a NumPy array of shape (m_train, num_px, num_px, 3). To obtain m_train, read the first element of its shape attribute: `train_set_x_raw.shape[0]`. Similarly, num_px is the height and width of a training image, so it equals both the second and the third element of the shape attribute of `train_set_x_raw`.
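
As a minimal sketch of how the shape attribute can be read, the snippet below uses a made-up array, `toy_images`, with the same four-axis layout. The array and the variable names are hypothetical stand-ins and are not part of the dataset.

```python
import numpy as np

# Hypothetical stand-in for an image dataset: 5 images of 4x4 pixels, 3 channels.
toy_images = np.zeros((5, 4, 4, 3), dtype=np.uint8)

n_examples = toy_images.shape[0]  # first axis: number of images
side = toy_images.shape[1]        # second axis: image height (equal to the width here)

print(n_examples, side)
```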

In [None]:
# TODO: write the lines to determine m_train, m_test, num_px

<a name='ex-1.2'></a>

### <font color='blue'> Exercise 1.2: reshape the datasets 
    
<font color='blue'> To simplify subsequent computations, reshape the (num_px, num_px, 3) images into a numpy array of shape (num_px * num_px * 3, 1), resulting in a training and test dataset where each column corresponds to a flattened image. The dataset will contain m_train (and m_test) columns, where m_train (and m_test) represent the number of examples.
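
The flattening described above can be sketched on a tiny made-up array. `toy_images` and `toy_flat` are hypothetical names, and this is one possible way to use `reshape()`, not necessarily the intended solution.

```python
import numpy as np

# Hypothetical toy dataset: 2 "images" of 2x2 pixels with 3 channels.
toy_images = np.arange(2 * 2 * 2 * 3).reshape(2, 2, 2, 3)

# Flatten every image into one row, then transpose so each image becomes a column.
toy_flat = toy_images.reshape(toy_images.shape[0], -1).T

print(toy_flat.shape)  # (12, 2)
```

Reshaping directly to `(12, 2)` without the transpose would give the right shape but interleave pixels from different images, so checking one column against the original image is a useful sanity test.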

In [None]:
# TODO: write the lines to reshape the training and test examples. Use the reshape() function.

<a name='ex-1.3'></a>

### <font color='blue'> Exercise 1.3: standardize the data 
Color images are represented by specifying the red, green, and blue channels (RGB) for each pixel. Consequently, the value of each pixel is expressed as a vector comprising three numbers, ranging from 0 to 255. Dividing each pixel value by 255 scales the pixel values to the range [0, 1], which can make it easier for the model to learn the underlying patterns in the data.
    
<font color='blue'> Standardize the dataset by dividing each row by 255.
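
A short sketch of the scaling on made-up pixel values follows. `toy_pixels` is a hypothetical array. The point it illustrates is that dividing an 8-bit integer array by a float yields floating-point values in [0, 1].

```python
import numpy as np

# Hypothetical 8-bit pixel intensities.
toy_pixels = np.array([[0, 128], [255, 64]], dtype=np.uint8)

# True division converts the result to floating point.
toy_scaled = toy_pixels / 255.0

print(toy_scaled.min(), toy_scaled.max())
```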

In [None]:
# TODO: write the lines to standardize the dataset by dividing each row by 255.

<a name='2'></a>
## <font color='darkviolet'> 2 - Building the components of the algorithm ##

In this section, we will build a simple algorithm that can distinguish cat images from non-cat images. Specifically, we will build a logistic regression model using a Neural Network approach. 

**Reminder**. The mathematical expression of the logistic regression algorithm, given an input example $x^{(i)}$:

$z^{(i)} = w^T x^{(i)} + b \tag{1}$ 

$\hat{y}^{(i)} = a^{(i)} = sigmoid(z^{(i)})\tag{2}$

$\mathcal{L}(a^{(i)}, y^{(i)}) = - y^{(i)} \log(a^{(i)}) - (1-y^{(i)} ) \log(1-a^{(i)})\tag{3}$

$ J = \frac{1}{m} \sum_{i=1}^m \mathcal{L}(a^{(i)}, y^{(i)})\tag{4}$
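
To make equations (1) to (3) concrete, here is a minimal sketch for a single example. All values (`w_toy`, `b_toy`, `x_toy`, `y_toy`) are made up for illustration.

```python
import numpy as np

# Hypothetical toy values: one example with two features.
w_toy = np.array([[0.5], [-0.25]])  # weights, shape (2, 1)
b_toy = 0.1                         # bias
x_toy = np.array([[2.0], [4.0]])    # one input example, shape (2, 1)
y_toy = 1                           # its true label

z = (w_toy.T @ x_toy).item() + b_toy                     # equation (1)
a = 1.0 / (1.0 + np.exp(-z))                             # equation (2): sigmoid
loss = -y_toy * np.log(a) - (1 - y_toy) * np.log(1 - a)  # equation (3)

print(z, a, loss)
```

The cost in equation (4) is then the mean of this per-example loss over all m examples.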

**Key steps** we will implement:

1. Initialize the parameters of the model
2. Loop:
   * Calculate loss (forward propagation)
   * Calculate gradient (backward propagation)
   * Update parameters (gradient descent)
        
We will build each step separately and then integrate them into one function called `model()`.

<a name='ex-2.1'></a>
### <font color='blue'> Exercise 2.1: Initializing parameters
    
<font color='blue'> Implement the function `initialize_with_zeros()` that initializes w as a vector of zeros, and b to zero.
If you don't know what numpy function to use, look up np.zeros() in the Numpy library's documentation.

In [None]:
def initialize_with_zeros(dim):
    """
    This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.
    
    Argument:
    dim -- size of the w vector we want (or number of parameters in this case)
    
    Returns:
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (corresponds to the bias) of type float
    """
    
    # TODO: Implement this function
        
    return w, b

In [None]:
dim = 2
w, b = initialize_with_zeros(dim)

assert type(b) == float
print ("w = " + str(w))
print ("b = " + str(b))

<a name='ex-2.2'></a>
### <font color='blue'> Exercise 2.2: Forward and Backward propagation

Now that we have initialized the parameters, we can do the forward and backward propagation steps for learning the parameters.

<font color='blue'> a. Implement the sigmoid function using np.exp().
    
<font color='blue'> b. Implement a function `propagate()` that computes the cost function and its gradient. Write your code step by step. Don't use loops for the sum; use np.log(), np.dot().

**Recap**:

* Forward Propagation:
1. Compute $A = \sigma(w^T X + b) = (a^{(1)}, a^{(2)}, ..., a^{(m)})$
2. Calculate the cost function $J = -\frac{1}{m}\sum_{i=1}^{m}(y^{(i)}\log(a^{(i)})+(1-y^{(i)})\log(1-a^{(i)}))$

* Backward Propagation key formulas: 

$$ \frac{d J}{d w} = \frac{1}{m}X(A-Y)^T\tag{5}$$
$$ \frac{d J}{d b} = \frac{1}{m} \sum_{i=1}^m (a^{(i)}-y^{(i)})\tag{6}$$
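
One way to gain confidence in the two gradient formulas is to compare them with central finite differences on random data. The sketch below is a check under stated assumptions, not the reference implementation: `toy_cost`, the random data, and the step size `eps` are all hypothetical choices.

```python
import numpy as np

def toy_cost(w, b, X, Y):
    """Cross-entropy cost J from the recap, used here for checking only."""
    A = 1.0 / (1.0 + np.exp(-(w.T @ X + b)))
    return float(-np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A)))

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))  # 3 features, 5 examples (made-up data)
Y = np.array([[1, 0, 1, 1, 0]])
w = rng.normal(size=(3, 1))
b = 0.2
m = X.shape[1]

# Analytic gradients from the two formulas above.
A = 1.0 / (1.0 + np.exp(-(w.T @ X + b)))
dw = X @ (A - Y).T / m
db = float(np.sum(A - Y) / m)

# Central finite differences as an independent estimate.
eps = 1e-6
dw_num = np.zeros_like(w)
for j in range(w.shape[0]):
    step = np.zeros_like(w)
    step[j, 0] = eps
    dw_num[j, 0] = (toy_cost(w + step, b, X, Y) - toy_cost(w - step, b, X, Y)) / (2 * eps)
db_num = (toy_cost(w, b + eps, X, Y) - toy_cost(w, b - eps, X, Y)) / (2 * eps)

print(np.max(np.abs(dw - dw_num)), abs(db - db_num))
```

The loop over the components of `w` is only part of the numerical check; the analytic gradients themselves are computed without loops.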

In [None]:
def sigmoid(z):
    """
    Compute the sigmoid of z

    Arguments:
    z -- A scalar or numpy array of any size.

    Return:
    s -- sigmoid(z)
    """

    # TODO: Implement this function
    
    return s


In [None]:
def propagate(w, b, X, Y):
    """
    Implement the cost function and its gradient for logistic regression.

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true labels, a numpy array of size (1, number of examples)

    Returns:
    grads -- dictionary with "dw" (gradient of the cost with respect to w, same shape as w)
             and "db" (gradient of the cost with respect to b, a scalar)
    cost -- negative log-likelihood cost for logistic regression
    """
    
    m = X.shape[1]  # number of examples
    
    # TODO
    # FORWARD PROPAGATION
    # Z = ...
    # A = ...
    # cost = ...
    
    # BACKWARD PROPAGATION
    # dZ = ...
    # dw = ...
    # db = ...
        
    grads = {"dw": dw, "db": db}
    
    return grads, cost

In [None]:
w =  np.array([[1.], [2]])
b = 1.5
X = np.array([[1., -2., -1.], [3., 0.5, -3.2]])
Y = np.array([[1, 1, 0]])
grads, cost = propagate(w, b, X, Y)

assert type(grads["dw"]) == np.ndarray
assert grads["dw"].shape == (2, 1)
assert type(grads["db"]) == np.float64

print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
print ("cost = " + str(cost))

<a name='ex-2.3'></a>
### <font color='blue'> Exercise 2.3: Optimization

Now that we have initialized the parameters and can compute the cost function and its gradient, it's time to update the parameters using gradient descent.

<font color='blue'> Implement the optimization function `optimize()`. The goal is to learn $w$ and $b$ by minimizing the cost function $J$.

**Recall** Iterate through the loop:
1) Calculate the cost and the gradient for the current parameters, using propagate();
2) Update the parameters using gradient descent rule for w and b.
    
For a parameter $\theta$, the update rule is $ \theta = \theta - \alpha \text{ } d\theta$, where $\alpha$ is the learning rate.
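
The update rule can be sketched on a one-dimensional toy problem. The objective $f(\theta) = (\theta - 3)^2$, the starting point, and the learning rate below are arbitrary choices made for illustration only.

```python
# Toy objective f(theta) = (theta - 3)^2 with derivative 2 * (theta - 3).
theta = 0.0
alpha = 0.1  # learning rate (arbitrary choice for this illustration)

for _ in range(100):
    d_theta = 2.0 * (theta - 3.0)      # gradient at the current theta
    theta = theta - alpha * d_theta    # gradient descent update

print(theta)  # approaches the minimizer theta = 3
```

In `optimize()`, the same rule is applied to w and b, with the gradients coming from `propagate()`.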

In [None]:
def optimize(w, b, X, Y, num_iterations=100, learning_rate=0.009, print_cost=False):
    """
    Optimizes the parameters w and b using gradient descent.
    
    Arguments:
    w -- weights, a numpy array of size (num_features, 1)
    b -- bias, a scalar
    X -- input data of size (num_features, num_samples)
    Y -- true "label" vector of size (1, num_samples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- True to print the cost every 100 iterations
    
    Returns:
    params -- dictionary containing the optimized weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias from the last iteration
    costs -- list of the costs recorded every 100 iterations
    """
    
    costs = []
    
    for i in range(num_iterations):
        
        # TODO Cost and gradient calculation
        # grads, cost = ...
        
        # Retrieve derivatives from the dictionary grads
        dw = grads["dw"]
        db = grads["db"]
        
        # TODO Update the parameters
        # w = ...
        # b = ...
        
        # Record the cost every 100 iterations
        if i % 100 == 0:
            costs.append(cost)
            
            # Print the recorded cost if requested
            if print_cost:
                print(f"Cost after iteration {i}: {cost}")
    
    # Save the optimized parameters and gradients in dictionaries
    params = {"w": w, "b": b}
    grads = {"dw": dw, "db": db}
    
    return params, grads, costs

In [None]:
params, grads, costs = optimize(w, b, X, Y, num_iterations=100, learning_rate=0.009, print_cost=False)

print ("w = " + str(params["w"]))
print ("b = " + str(params["b"]))
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
print("Costs = " + str(costs))

<a name='ex-2.4'></a>
### <font color='blue'> Exercise 2.4: Prediction
    
<font color='blue'> Implement the `predict()` function that returns the predicted labels for a dataset X using the learned parameters w and b. Don't use loops, use vectorization.
    
The function should return a numpy array containing all predictions for the examples in X. The predicted values can be rounded to the nearest integer to obtain binary predictions.
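
A minimal sketch of turning probabilities into binary labels without a loop follows. `toy_probs` holds made-up sigmoid outputs, and the explicit 0.5 threshold is an assumption about how ties should be handled.

```python
import numpy as np

toy_probs = np.array([[0.1, 0.7, 0.49, 0.93]])  # hypothetical sigmoid outputs
toy_labels = (toy_probs > 0.5).astype(float)    # 1.0 where the probability exceeds 0.5

print(toy_labels)  # [[0. 1. 0. 1.]]
```

An explicit comparison makes the behavior at exactly 0.5 visible, whereas `np.round` rounds halves to the nearest even value, so 0.5 becomes 0.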

In [None]:
def predict(w, b, X):
    '''
    Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)
    
    Arguments:
    w -- weights, a numpy array of shape (num_features, 1)
    b -- bias, a scalar
    X -- data of shape (num_features, num_examples)
    
    Returns:
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    '''
    
    m = X.shape[1]
    
    Y_prediction = np.zeros((1, m))
    
    # TODO Compute vector Z containing the logits for each example
    # Z = ...
    
    # TODO Compute vector A containing the predicted probabilities for each example
    # A = ...
    
    # TODO Convert probabilities to actual predictions
    # Y_prediction = ...
    
    return Y_prediction

In [None]:
w = np.array([[0.1124579], [0.23106775]])
b = -0.3
X = np.array([[1., -1.1, -3.2],[1.2, 2., 0.1]])
print ("predictions = " + str(predict(w, b, X)))

<a name='3'></a>
## <font color='darkviolet'> 3 - Merging the components into a model
    
<a name='ex-3.1'></a>
### <font color='blue'> Exercise 3.1: Model
<font color='blue'> Implement the `model()` function that puts together all the building blocks (functions implemented in the previous parts) in the right order to obtain the overall model.

In [None]:
def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.5, print_cost=False):
    """
    Builds the logistic regression model by calling previous functions
    
    Arguments:
    X_train -- training set represented by a numpy array of shape (num_features, m_train)
    Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
    X_test -- test set represented by a numpy array of shape (num_features, m_test)
    Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
    num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
    learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
    print_cost -- Set to True to print the cost every 100 iterations
    
    Returns:
    d -- dictionary containing information about the model.
    """
    
    # TODO
    # Initialize parameters with zeros   
    # Gradient descent to optimize parameters
    
    # TODO Retrieve parameters w and b from dictionary "params"
    # w = ...
    # b = ...
    
    # TODO Predict test/train set examples using learned parameters
    # Y_prediction_test = ...
    # Y_prediction_train = ...
    
    # Print train/test accuracy
    if print_cost:
        print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
        print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

    
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test, 
         "Y_prediction_train" : Y_prediction_train, 
         "w" : w, 
         "b" : b,
         "learning_rate" : learning_rate,
         "num_iterations": num_iterations}
                                      
    return d



Run the following cell to train the model:
    

In [None]:
logistic_regression_model = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations=2000, learning_rate=0.0005, print_cost=True)

<font color='blue'>
Q1: 

*  Provide an interpretation of the results obtained above. 

In [None]:
# Example of an image that was wrongly classified.
index = 10
plt.imshow(test_set_x[:, index].reshape((num_px, num_px, 3)))
print ("y = " + str(test_set_y[0,index]) + ", you predicted that it is a \"" + classes[int(logistic_regression_model['Y_prediction_test'][0,index])].decode("utf-8") +  "\" picture.")


Run the cell below to compare the learning curve of our model with different choices of learning rates. Feel free to try different values and see what happens. 
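
Before running the full comparison, the qualitative effect of the step size can be previewed on a toy objective. The sketch below uses $f(\theta) = \theta^2$ and three arbitrary learning rates. The helper `run_gd` is hypothetical and unrelated to the `model()` function.

```python
def run_gd(alpha, steps=20, theta0=1.0):
    """Gradient descent on the toy objective f(theta) = theta**2."""
    theta = theta0
    for _ in range(steps):
        theta = theta - alpha * 2.0 * theta  # derivative of theta**2 is 2 * theta
    return theta

for alpha in (0.01, 0.4, 1.1):
    print(alpha, run_gd(alpha))
```

On this toy problem the smallest rate makes slow progress, the middle one converges quickly, and the largest one overshoots and moves away from the minimum at 0.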
    

In [None]:
learning_rates = [0.01, 0.001, 0.0001]
models = {}

for lr in learning_rates:
    print ("Training a model with learning rate: " + str(lr))
    models[str(lr)] = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations=1500, learning_rate=lr, print_cost=False)
    print ("-------------------------------------------------------")

for lr in learning_rates:
    plt.plot(np.squeeze(models[str(lr)]["costs"]), label=str(models[str(lr)]["learning_rate"]))

plt.ylabel('cost')
plt.xlabel('iterations (hundreds)')

legend = plt.legend(loc='upper center', shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')
plt.show()

<font color='blue'>
Q2:
    
* What is the purpose of the learning rate in the gradient descent algorithm and how can it affect the performance of the model?
    
* Why is it important to use a well-tuned learning rate?
    
    
* Provide an interpretation of the results obtained above for different learning rates.