# Overview
## Logistic regression with a neural network mindset
Build the general architecture of a learning algorithm:
- initialize the parameters
- calculate the cost function and its gradient
- use an optimization algorithm (gradient descent)


## Problem statement
Given a dataset:
- a training set of `m_train` images labeled as cat `(y=1)` or non-cat `(y=0)`
- a test set of `m_test` images labeled as cat or non-cat

# 1 Packages

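```python
import numpy as np                 # numerical arrays
import copy                        # used by optimize() to avoid mutating the caller's parameters
import matplotlib.pyplot as plt    # plotting
import h5py                        # HDF5 files (dataset storage)
import scipy
from PIL import Image              # image I/O
from scipy import ndimage
```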



# 2 Pre-processing
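Pre-process each image of shape `(num_px, num_px, 3)` by flattening it into a NumPy array of shape `(num_px * num_px * 3, 1)`; stacking the flattened images column-wise gives a matrix with one column per example: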


```python
# Stand-in dataset with random pixel values in [0, 255]
# train_set_x_orig.shape: (m_train, num_px, num_px, 3)
train_set_x_orig = np.random.randint(0, 256, size=(20, 64, 64, 3))
# test_set_x_orig.shape: (m_test, num_px, num_px, 3)
test_set_x_orig = np.random.randint(0, 256, size=(10, 64, 64, 3))
m_train = train_set_x_orig.shape[0]  # number of training examples
m_test = test_set_x_orig.shape[0]    # number of test examples

# Flatten each image into a column: (m, num_px, num_px, 3) -> (num_px * num_px * 3, m)
train_set_x_flatten = train_set_x_orig.reshape(m_train, -1).T
test_set_x_flatten = test_set_x_orig.reshape(m_test, -1).T

print("number of training examples: " + str(m_train))
print("number of test examples: " + str(m_test))
print("train_set_x_orig shape: " + str(train_set_x_orig.shape))
print("train_set_x_flatten shape: " + str(train_set_x_flatten.shape))
print("test_set_x_orig shape: " + str(test_set_x_orig.shape))
print("test_set_x_flatten shape: " + str(test_set_x_flatten.shape))
```




```
number of training examples: 20
number of test examples: 10
train_set_x_orig shape: (20, 64, 64, 3)
train_set_x_flatten shape: (12288, 20)
test_set_x_orig shape: (10, 64, 64, 3)
test_set_x_flatten shape: (12288, 10)
```
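The flattening trick `X.reshape(X.shape[0], -1).T` keeps each image's pixels together in one column. A quick check on a small, hypothetical batch of 5×5 images (the sizes and the seed are arbitrary choices for illustration) confirms this; reshaping straight to `(features, m)` without the transpose would instead put pixels from different images into the same column.

```python
import numpy as np

# Hypothetical batch: 4 RGB images of 5x5 pixels, shape (m, num_px, num_px, 3)
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 5, 5, 3))
m = images.shape[0]

# Flatten as in the notebook: (m, num_px, num_px, 3) -> (num_px * num_px * 3, m)
flat = images.reshape(m, -1).T
print(flat.shape)  # (75, 4)

# Column i contains exactly the pixels of image i
for i in range(m):
    assert np.array_equal(flat[:, i], images[i].ravel())

# Reshaping directly to (features, m) does not keep images in separate columns
wrong = images.reshape(-1, m)
print(np.array_equal(wrong[:, 0], images[0].ravel()))
```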


Assuming the pixel values are RGB intensities ranging from 0 to 255 (inclusive), standardize the dataset by dividing every entry by 255 (the maximum value), which scales each pixel into the range [0, 1].

```python
# Standardize the data: scale every pixel value into [0, 1]
train_set_x = train_set_x_flatten / 255.
test_set_x = test_set_x_flatten / 255.
```

# 3 Architecture of the learning algorithm
Mathematical expression of the algorithm, for one example $x^{(i)}$:
$$z^{(i)} = w^T x^{(i)} + b \tag{1}$$
$$\hat{y}^{(i)} = a^{(i)} = \sigma(z^{(i)}) \tag{2}$$
$$\mathcal{L}(a^{(i)}, y^{(i)}) = - y^{(i)} \log(a^{(i)}) - (1-y^{(i)}) \log(1-a^{(i)}) \tag{3}$$

The cost is then the average of the loss over all $m$ training examples:
$$J = \frac{1}{m} \sum_{i=1}^m \mathcal{L}(a^{(i)}, y^{(i)}) \tag{4}$$
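To connect the per-example equations with the vectorized cost used later in `propagate`, the sketch below evaluates equations (1)-(3) in an explicit loop and compares the averaged result with a single vectorized expression. The toy `w`, `b`, `X`, `Y` reuse the small example from the `propagate` section; both versions should give the same cost (about 0.159).

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Toy example: 2 features, 3 examples stacked column-wise
w = np.array([[1.0], [2.0]])
b = 1.5
X = np.array([[1.0, -2.0, -1.0], [3.0, 0.5, -3.2]])
Y = np.array([[1, 1, 0]])
m = X.shape[1]

# Per-example loop over equations (1)-(3), then average over m
losses = []
for i in range(m):
    x_i = X[:, i:i + 1]                # column vector of shape (2, 1)
    z_i = (w.T @ x_i).item() + b       # equation (1)
    a_i = sigmoid(z_i)                 # equation (2)
    y_i = Y[0, i]
    losses.append(-y_i * np.log(a_i) - (1 - y_i) * np.log(1 - a_i))  # equation (3)
J_loop = sum(losses) / m

# Vectorized version over all examples at once
A = sigmoid(w.T @ X + b)
J_vec = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))

print(J_loop, J_vec)
```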


# 4 Building the algorithm
1. Define the model structure (e.g. the number of input features)
2. Initialize the model parameters
3. Loop:
   - Calculate the current loss (forward propagation)
   - Calculate the current gradient (backward propagation)
   - Update the parameters (gradient descent)

## 4.1 Helper functions

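### Sigmoid
Compute $\sigma(z) = \frac{1}{1 + e^{-z}}$ element-wise, so the helper works for scalars and NumPy arrays alike.
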
```python
def sigmoid(z):
    """Compute the sigmoid of z (a scalar or NumPy array), element-wise."""
    s = 1 / (1 + np.exp(-z))
    return s
```
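As a sanity check, $\sigma(0)$ should be 0.5, and the logistic function satisfies the identity $\sigma(z) = \frac{1 + \tanh(z/2)}{2}$, which gives an independent way to test the helper (`sigmoid_tanh` is a hypothetical name used only here). For large negative inputs, `np.exp(-z)` overflows to `inf`; the result is still the correct limit 0, but NumPy emits a RuntimeWarning, whereas the `tanh` form does not overflow.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_tanh(z):
    # Equivalent form via the identity sigma(z) = (1 + tanh(z / 2)) / 2
    return 0.5 * (1 + np.tanh(0.5 * z))

z = np.array([-5.0, -1.0, 0.0, 2.0, 5.0])
print(sigmoid(0.0))                              # 0.5
print(np.allclose(sigmoid(z), sigmoid_tanh(z)))  # True
```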

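## 4.2 Initializing parameters
Initialize `w` as a vector of zeros of shape `(dim, 1)` and `b` as the scalar 0. Zero initialization works for logistic regression because there are no hidden units whose symmetry needs to be broken.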

```python
def initialize_with_zeros(dim):
    """Return a zero weight vector of shape (dim, 1) and a zero bias."""
    w = np.zeros(shape=(dim, 1))
    b = 0.0
    return w, b

dim = 2
w, b = initialize_with_zeros(dim)
print("w: " + str(w) + " shape: " + str(w.shape))
print("b: " + str(b) + " type: " + str(type(b)))
```


```
w: [[0.]
 [0.]] shape: (2, 1)
b: 0.0 type: <class 'float'>
```


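## 4.3 Forward and backward propagation

### propagate
Implement a `propagate` function that computes the cost function and its gradient: forward propagation computes the activations $A = \sigma(w^T X + b)$ and the cost $J$, and backward propagation computes the gradients `dw` and `db` of $J$ with respect to $w$ and $b$.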
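The gradient formulas used in `propagate` follow from the chain rule applied to equations (1)-(3). For a single example (dropping the superscript $(i)$):
$$\frac{\partial \mathcal{L}}{\partial a} = -\frac{y}{a} + \frac{1-y}{1-a}, \qquad \frac{\partial a}{\partial z} = a(1-a)$$
$$\frac{\partial \mathcal{L}}{\partial z} = \frac{\partial \mathcal{L}}{\partial a}\,\frac{\partial a}{\partial z} = -y(1-a) + (1-y)\,a = a - y$$
Since $z = w^T x + b$, we have $\partial z / \partial w = x$ and $\partial z / \partial b = 1$. Averaging over the $m$ examples, with the examples stacked column-wise in $X$, $A$ and $Y$:
$$dw = \frac{\partial J}{\partial w} = \frac{1}{m} \sum_{i=1}^m x^{(i)} \left(a^{(i)} - y^{(i)}\right) = \frac{1}{m} X (A - Y)^T$$
$$db = \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^m \left(a^{(i)} - y^{(i)}\right)$$
These correspond to the `dw` and `db` lines in the code that follows.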

```python
def propagate(w, b, X, Y):
    """
    w -- weights, shape (num_features, 1)
    b -- bias, a scalar
    X -- data, shape (num_features, m)
    Y -- true labels (0 or 1), shape (1, m)
    Returns the gradients {"dw", "db"} and the cost.
    """
    m = X.shape[1]

    # Forward propagation: activations and cost
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

    # Backward propagation: gradients of the cost
    dw = 1 / m * np.dot(X, (A - Y).T)
    db = 1 / m * np.sum(A - Y)

    cost = np.squeeze(np.array(cost))
    grads = {"dw": dw, "db": db}

    return grads, cost

w = np.array([[1.], [2.]])
b = 1.5

# X holds 3 examples with 2 features each, stacked column-wise
X = np.array([[1., -2., -1.], [3., 0.5, -3.2]])
Y = np.array([[1, 1, 0]])
grads, cost = propagate(w, b, X, Y)

assert type(grads["dw"]) == np.ndarray
assert grads["dw"].shape == (2, 1)
assert type(grads["db"]) == np.float64

print("dw = " + str(grads["dw"]))
print("db = " + str(grads["db"]))
print("cost = " + str(cost))
```

```
dw = [[ 0.25071532]
 [-0.06604096]]
db = -0.1250040450043965
cost = 0.15900537707692405
```
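A finite-difference gradient check is a common way to confirm that `dw` and `db` are correct. The sketch below re-implements the cost and the analytic gradients so it runs on its own (`cost_fn`, `analytic_grads` and `numeric_grads` are hypothetical helper names), perturbs each parameter by ±ε, and compares the central differences with the formulas used in `propagate` on the same toy example.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_fn(w, b, X, Y):
    # Cost J averaged over the m columns of X
    m = X.shape[1]
    A = sigmoid(w.T @ X + b)
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m

def analytic_grads(w, b, X, Y):
    # Same formulas as in propagate: dw = X (A - Y)^T / m, db = sum(A - Y) / m
    m = X.shape[1]
    A = sigmoid(w.T @ X + b)
    return X @ (A - Y).T / m, np.sum(A - Y) / m

def numeric_grads(w, b, X, Y, eps=1e-6):
    # Central finite differences, one parameter at a time
    dw = np.zeros_like(w)
    for j in range(w.shape[0]):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[j, 0] += eps
        w_minus[j, 0] -= eps
        dw[j, 0] = (cost_fn(w_plus, b, X, Y) - cost_fn(w_minus, b, X, Y)) / (2 * eps)
    db = (cost_fn(w, b + eps, X, Y) - cost_fn(w, b - eps, X, Y)) / (2 * eps)
    return dw, db

w = np.array([[1.0], [2.0]])
b = 1.5
X = np.array([[1.0, -2.0, -1.0], [3.0, 0.5, -3.2]])
Y = np.array([[1, 1, 0]])

dw_a, db_a = analytic_grads(w, b, X, Y)
dw_n, db_n = numeric_grads(w, b, X, Y)
print("max |dw difference|:", np.max(np.abs(dw_a - dw_n)))
print("|db difference|:", abs(db_a - db_n))
```

Differences on the order of 1e-9 or smaller indicate that the analytic gradients match the numerical ones.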


## 4.4 Optimization
With the parameters initialized and a function that computes the cost and its gradient, the next step is to update the parameters using gradient descent.

### Optimization
Implement the optimization function to learn `w` and `b` by minimizing the cost function $J$.  
For a parameter $\theta$, the update rule is $\theta = \theta - \alpha \, d\theta$, where $\alpha$ is the learning rate.
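As a worked example of the update rule, one gradient descent step starting from the `propagate` example (using the printed `dw` and `db` and the learning rate $\alpha = 0.009$ from the `optimize` call) moves each parameter against the sign of its gradient: $w_1$ decreases because $dw_1 > 0$, while $w_2$ and $b$ increase because their gradients are negative.

```python
import numpy as np

# Starting point and gradients from the propagate example
w = np.array([[1.0], [2.0]])
b = 1.5
dw = np.array([[0.25071532], [-0.06604096]])
db = -0.1250040450043965
learning_rate = 0.009  # alpha

# One step of theta = theta - alpha * d_theta
w_new = w - learning_rate * dw
b_new = b - learning_rate * db
print(w_new.ravel())  # approx. [0.99774356 2.00059437]
print(b_new)          # approx. 1.50112504
```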

```python
def optimize(w, b, X, Y, num_iterations=100, learning_rate=0.009, print_cost=False):
    # Work on copies so the caller's w and b are not modified
    w = copy.deepcopy(w)
    b = copy.deepcopy(b)

    costs = []

    for i in range(num_iterations):
        # Cost and gradients for the current parameters
        grads, cost = propagate(w, b, X, Y)

        dw = grads["dw"]
        db = grads["db"]

        # Gradient descent update
        w = w - learning_rate * dw
        b = b - learning_rate * db

        # Record the cost every 100 iterations
        if i % 100 == 0:
            costs.append(cost)

            if print_cost:
                print("Cost after iteration %i: %f" % (i, cost))

    params = {"w": w, "b": b}
    grads = {"dw": dw, "db": db}
    return params, grads, costs

params, grads, costs = optimize(w, b, X, Y, num_iterations=100, learning_rate=0.009, print_cost=False)

print("w = " + str(params["w"]))
print("b = " + str(params["b"]))
print("dw = " + str(grads["dw"]))
print("db = " + str(grads["db"]))
print("Costs = " + str(costs))
```

```
w = [[0.80956046]
 [2.0508202 ]]
b = 1.5948713189708588
dw = [[ 0.17860505]
 [-0.04840656]]
db = -0.08888460336847771
Costs = [array(0.15900538)]
```


### Predict
Using the `w` and `b` learned by `optimize`, predict the labels for a dataset `X`.  
Implement the `predict` function in two steps:
1. Calculate $\hat{Y} = A = \sigma(w^T X + b)$.

2. Convert each entry of `A` into 0 (if the activation is <= 0.5) or 1 (if the activation is > 0.5), and store the predictions in a vector `Y_prediction`.

```python
def predict(w, b, X):
    """Predict labels (0 or 1) for the m examples in X using learned w and b."""
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)

    # Probability that each example is a cat (label 1)
    A = sigmoid(np.dot(w.T, X) + b)

    # Threshold the activations at 0.5
    for i in range(A.shape[1]):
        if A[0, i] > 0.5:
            Y_prediction[0, i] = 1
        else:
            Y_prediction[0, i] = 0

    return Y_prediction

w = np.array([[0.1124579], [0.23106775]])
b = -0.3
X = np.array([[1., -1.1, -3.2], [1.2, 2., 0.1]])
print("predictions = " + str(predict(w, b, X)))
```

```
predictions = [[1. 1. 0.]]
```
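The thresholding loop in `predict` can also be written as one vectorized comparison, which avoids a Python-level loop over the examples. The sketch below (`predict_vectorized` is a hypothetical name) should return the same predictions on the same example.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict_vectorized(w, b, X):
    # Threshold all activations at once: True -> 1.0, False -> 0.0
    A = sigmoid(w.T @ X + b)
    return (A > 0.5).astype(float)

w = np.array([[0.1124579], [0.23106775]])
b = -0.3
X = np.array([[1., -1.1, -3.2], [1.2, 2., 0.1]])
print(predict_vectorized(w, b, X))  # [[1. 1. 0.]]
```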


# 5 Merge all functions into a model

### Model
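Implement the `model` function, which initializes the parameters, learns them with `optimize()`, and predicts with `predict()`. It uses:
- `Y_prediction_test` for the predictions on the test set
- `Y_prediction_train` for the predictions on the training set
- `params`, `grads`, `costs` for the outputs of `optimize()`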

```python
def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.5, print_cost=False):
    # Initialize parameters with zeros
    w, b = initialize_with_zeros(X_train.shape[0])

    # Learn w and b with gradient descent
    params, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)
    w = params["w"]
    b = params["b"]

    # Predict on the test and training sets
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)

    if print_cost:
        # Accuracy = percentage of predictions that match the labels
        print("train accuracy: {}%".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
        print("test accuracy: {}%".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

    d = {
        "costs": costs,
        "Y_prediction_test": Y_prediction_test,
        "Y_prediction_train": Y_prediction_train,
        "w": w,
        "b": b,
        "learning_rate": learning_rate,
        "num_iterations": num_iterations
    }

    return d
```
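This section defines `model` without calling it, and the stand-in images above have no labels. As a hedged usage sketch, the block below exercises the same pipeline end to end on an invented, linearly separable toy dataset (2 features, label 1 when $x_1 + x_2 > 0$). To stay self-contained it condenses the steps of `model` into two hypothetical helpers, `train_logreg` (zero initialization plus batch gradient descent) and `accuracy` (the same accuracy formula as `model`); it is an illustration, not a replacement for the functions above.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train_logreg(X, Y, num_iterations=2000, learning_rate=0.5):
    # Same steps as model(): zero init, then batch gradient descent on J
    w, b = np.zeros((X.shape[0], 1)), 0.0
    m = X.shape[1]
    for _ in range(num_iterations):
        A = sigmoid(w.T @ X + b)
        w -= learning_rate * (X @ (A - Y).T) / m
        b -= learning_rate * np.sum(A - Y) / m
    return w, b

def accuracy(w, b, X, Y):
    # Percentage of thresholded predictions that match the labels
    pred = (sigmoid(w.T @ X + b) > 0.5).astype(float)
    return 100 - np.mean(np.abs(pred - Y)) * 100

# Hypothetical toy data: examples stacked column-wise, label 1 when x1 + x2 > 0
rng = np.random.default_rng(1)
X_train = rng.normal(size=(2, 200))
Y_train = (X_train.sum(axis=0, keepdims=True) > 0).astype(float)
X_test = rng.normal(size=(2, 100))
Y_test = (X_test.sum(axis=0, keepdims=True) > 0).astype(float)

w, b = train_logreg(X_train, Y_train)
print("train accuracy:", accuracy(w, b, X_train, Y_train))
print("test accuracy:", accuracy(w, b, X_test, Y_test))
```

Because the toy labels come from a linear rule, both accuracies should be high; on the real cat/non-cat images the gap between train and test accuracy is usually the more informative number.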
