# Neural Network with Two Hidden Layers

**You will learn how to:**
- Implement a 2-class classification neural network with two hidden layers
- Use units with a non-linear activation function, such as tanh
- Compute the cross-entropy loss
- Implement forward and backward propagation


## 1 - Packages ##

Let's first import all the packages that you will need during this assignment.
- [numpy](https://numpy.org) is the fundamental package for scientific computing with Python.
- [sklearn](http://scikit-learn.org/stable/) provides simple and efficient tools for data mining and data analysis. 
- [matplotlib](http://matplotlib.org) is a library for plotting graphs in Python.

In [2]:
# Package imports
import numpy as np
import pandas as pd
import random
from sklearn import preprocessing
from sklearn.preprocessing import StandardScaler

import sklearn
from sklearn.datasets import load_breast_cancer
import sklearn.linear_model
%matplotlib inline

np.random.seed(1) # set a seed so that the results are consistent

## 2 - Dataset ##

First, let's get the dataset you will work on. The following code loads scikit-learn's breast cancer dataset (569 examples, 30 numeric features, 2 classes) into variables `X` and `Y`.

In [1]:
dir(sklearn.datasets)


In [2]:
# For help on the dataset of sklearn library
help(sklearn.datasets.load_breast_cancer)


In [61]:
from sklearn.datasets import load_breast_cancer
import random

data = load_breast_cancer()

X = data.data    # shape (569, 30): one row per example
Y = data.target  # shape (569,): 0 = malignant, 1 = benign

# Shuffle the examples, keeping each row of X paired with its label.
# Note: np.random.seed() does not seed Python's `random` module,
# so the order (and the labels printed below) changes on every run.
c = list(zip(X, Y))
random.shuffle(c)
X, Y = zip(*c)

# Convert back to arrays and put one example per column: X has shape (n_features, m)
X = np.array(X)
Y = np.array(Y)
X = X.T
# Mean-normalize each feature: subtract its mean, divide by its range (max - min)
X = (X - np.mean(X, axis=1, keepdims=True)) / (np.max(X, axis=1, keepdims=True) - np.min(X, axis=1, keepdims=True))
print(X.shape)
print(Y.shape)
print(Y)

(30, 569)
(569,)
[0 1 0 1 1 0 1 1 0 1 1 1 1 1 1 0 1 1 0 1 1 1 1 0 1 1 1 1 0 0 1 1 1 1 0 1 1
 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 0 0 0 1 0 1 1 0 1 1 1 1 0 1 1
 1 1 1 0 1 0 0 0 1 1 0 0 1 1 1 1 1 0 0 1 1 1 1 0 1 0 1 0 1 0 1 1 1 1 0 1 0
 0 0 1 1 1 0 1 0 1 0 0 0 1 1 1 1 0 0 1 0 1 0 1 1 1 1 1 0 0 0 1 0 1 0 0 1 1
 1 1 0 0 1 0 1 0 1 0 1 1 0 1 1 0 0 1 1 0 0 1 0 1 1 1 0 1 0 1 1 1 1 1 1 1 1
 1 0 1 0 1 0 0 1 1 1 1 1 1 0 1 1 1 1 1 0 1 0 1 0 0 0 1 0 0 1 1 1 1 1 0 0 1
 1 1 1 0 0 1 1 1 0 0 1 1 1 0 1 1 0 1 1 1 0 1 1 1 1 1 1 1 0 1 1 0 1 0 1 0 1
 0 1 0 1 1 0 0 0 0 1 1 1 1 1 1 1 0 0 1 0 1 1 1 1 1 1 0 1 0 0 1 0 1 1 0 1 0
 1 1 1 1 1 1 1 1 1 0 0 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 1 1 1
 0 1 0 1 1 1 1 0 1 1 0 0 0 0 0 1 1 0 0 0 1 1 1 1 0 1 1 1 1 0 0 0 0 1 1 0 0
 1 0 1 0 1 0 0 1 0 0 0 0 1 1 1 0 1 0 0 0 1 0 0 1 1 0 1 1 1 0 1 1 1 1 1 0 1
 0 1 0 1 1 1 1 0 0 0 0 0 1 1 0 1 0 1 0 1 1 1 1 1 0 1 1 1 1 1 1 0 0 0 1 0 0
 1 0 1 0 1 1 1 1 0 0 1 1 0 0 1 1 1 0 1 0 0 1 1 1 1 0 1 1 1 0 0 1 0 1 0 0 0
 0 0 0 1
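The cell above shuffles with Python's `random` module, which `np.random.seed(1)` does not seed, so the printed label order differs between runs. A minimal sketch of a reproducible alternative, assuming NumPy's `default_rng` generator is acceptable for shuffling; `shuffle_and_normalize` is a hypothetical helper, not part of the original notebook, and it applies the same mean and range normalization as the cell above:

```python
import numpy as np

def shuffle_and_normalize(X_raw, y_raw, seed=1):
    """Hypothetical helper: shuffle examples reproducibly, then mean-normalize.

    X_raw -- array of shape (m, n_features), one row per example
             (the layout of load_breast_cancer().data)
    y_raw -- array of shape (m,)
    Returns X of shape (n_features, m) and Y of shape (1, m).
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(X_raw.shape[0])   # one permutation for both arrays keeps pairs aligned
    X = X_raw[perm].T                        # examples in columns
    Y = y_raw[perm].reshape(1, -1)
    mean = X.mean(axis=1, keepdims=True)
    span = X.max(axis=1, keepdims=True) - X.min(axis=1, keepdims=True)
    return (X - mean) / span, Y

# Usage with the notebook's data (requires scikit-learn):
# from sklearn.datasets import load_breast_cancer
# data = load_breast_cancer()
# X, Y = shuffle_and_normalize(data.data, data.target)
```

Indexing both arrays with the same permutation is what keeps each example paired with its label; the normalization statistics do not depend on the order of the examples.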

In [62]:
shape_X = X.shape
Y = Y.reshape(1,Y.shape[0])
shape_Y = Y.shape
m = X.shape[1]  # training set size

# Type: numpy array, validate using function type(X), type(Y)
print ('The shape of X(Features) is: ' + str(shape_X))
print ('The shape of Y(Target values) is: ' + str(shape_Y))
print ('Number of training examples:', m)

The shape of X(Features) is: (30, 569)
The shape of Y(Target values) is: (1, 569)
Number of training examples: 569


You have:
- a numpy array (matrix) `X` of shape (30, 569) that contains the 30 normalized features of each example
- a numpy array (vector) `Y` of shape (1, 569) that contains your labels (malignant: 0, benign: 1).

## 4 - Neural Network model

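You are going to train a neural network with two tanh hidden layers (10 units each) and a sigmoid output unit. The figure below shows the single-hidden-layer version of this model; the code adds a second hidden layer between the first one and the output.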

**Here is our model**:
<img src="images/classification_kiank.png" style="width:600px;height:300px;">

**Mathematically**:

For one example $x^{(i)}$:
$$z^{[1] (i)} =  W^{[1]} x^{(i)} + b^{[1]}\tag{1}$$
$$a^{[1] (i)} = \tanh(z^{[1] (i)})\tag{2}$$
$$z^{[2] (i)} = W^{[2]} a^{[1] (i)} + b^{[2]}\tag{3}$$
$$a^{[2] (i)} = \tanh(z^{[2] (i)})\tag{4}$$
$$z^{[3] (i)} = W^{[3]} a^{[2] (i)} + b^{[3]}\tag{5}$$
$$\hat{y}^{(i)} = a^{[3] (i)} = \sigma(z^{[3] (i)})\tag{6}$$
$$y^{(i)}_{prediction} = \begin{cases} 1 & \mbox{if } a^{[3](i)} > 0.5 \\ 0 & \mbox{otherwise } \end{cases}\tag{7}$$

Given the predictions on all the examples, you can also compute the cost $J$ as follows:
$$J = - \frac{1}{m} \sum\limits_{i = 1}^{m} \large\left(\small y^{(i)}\log\left(a^{[3] (i)}\right) + (1-y^{(i)})\log\left(1- a^{[3] (i)}\right)  \large  \right) \small \tag{8}$$
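The per-example equations above are applied to all $m$ examples at once by stacking the $x^{(i)}$ as columns of $X$; each bias column vector is then broadcast across the columns. A small sketch, with random parameters and this notebook's layer sizes [30, 10, 10, 1] assumed for illustration, checking that a loop over examples and the vectorized pass used in `forward_propagation()` agree:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
sizes = [30, 10, 10, 1]     # [n_x, n_h1, n_h2, n_y] as in this notebook
m = 5                       # a handful of examples is enough for the check
W = [rng.standard_normal((sizes[l], sizes[l - 1])) * 0.01 for l in range(1, 4)]
b = [rng.standard_normal((sizes[l], 1)) * 0.01 for l in range(1, 4)]
X = rng.standard_normal((sizes[0], m))

def forward_one(x):
    """Per-example forward pass; x has shape (n_x, 1)."""
    a1 = np.tanh(W[0] @ x + b[0])
    a2 = np.tanh(W[1] @ a1 + b[1])
    return sigmoid(W[2] @ a2 + b[2])

# Loop over examples, one column at a time
A3_loop = np.hstack([forward_one(X[:, [i]]) for i in range(m)])

# Vectorized: each bias column broadcasts across all m examples
A1 = np.tanh(W[0] @ X + b[0])
A2 = np.tanh(W[1] @ A1 + b[1])
A3_vec = sigmoid(W[2] @ A2 + b[2])
```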

**Reminder**: The general methodology to build a Neural Network is to:

1. Define the neural network structure (# of input units, # of hidden units, etc.).
2. Initialize the model's parameters.
3. Loop:
    - Implement forward propagation
    - Compute loss
    - Implement backward propagation to get the gradients
    - Update parameters (gradient descent)

You often build helper functions to compute steps 1-3 and then merge them into one function we call `nn_model()`. Once you've built `nn_model()` and learnt the right parameters, you can make predictions on new data.

### 4.1 - Defining the neural network structure ####

**Exercise**: Define four layer sizes:

- n_x: the size of the input layer
- n_h1: the size of the first hidden layer (set this to 10)
- n_h2: the size of the second hidden layer (set this to 10)
- n_y: the size of the output layer

**Hint**: Use the shapes of X and Y to find n_x and n_y. Hard-code both hidden layer sizes to 10.
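`layer_sizes()` below takes an `h_layers` argument but hard-codes two hidden layers of 10 units. If you want that argument to control the depth, one option is to build the list from it. This is a sketch only: `layer_sizes_general` and its `n_h` parameter are hypothetical, and the rest of this notebook (forward and backward propagation) is still written for exactly two hidden layers.

```python
import numpy as np

def layer_sizes_general(X, Y, h_layers=2, n_h=10):
    """Hypothetical variant of layer_sizes(): h_layers hidden layers of n_h units each."""
    n_x = X.shape[0]   # size of input layer
    n_y = Y.shape[0]   # size of output layer
    return [n_x] + [n_h] * h_layers + [n_y]

print(layer_sizes_general(np.zeros((30, 569)), np.zeros((1, 569))))   # [30, 10, 10, 1]
```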

In [63]:
def layer_sizes(X, Y, h_layers):
    """
    Arguments:
    X -- input dataset of shape (input size, number of examples)
    Y -- labels of shape (output size, number of examples)
    h_layers --number of hidden layers
    """ 
    n_x = X.shape[0] # size of input layer
    n_h1 = 10
    n_h2 = 10
    n_y = Y.shape[0] # size of output layer
    ### END CODE HERE ###
    return list([n_x, n_h1, n_h2, n_y])

layer_dims = layer_sizes(X, Y , 2)
print(layer_dims)

[30, 10, 10, 1]


### 4.2 - Initialize the model's parameters ####

**Exercise**: Implement the function `initialize_parameters()`.

**Instructions**:
- You will initialize the weight matrices with random values.
    - Use: `np.random.randn(a,b) * 0.01` to randomly initialize a matrix of shape (a,b).
- You will initialize the bias vectors as zeros. 
    - Use: `np.zeros((a,b))` to initialize a matrix of shape (a,b) with zeros.
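Why random rather than zero weights? If every weight in a layer starts at the same value, all hidden units compute the same activation and receive the same gradient, so they never become different from one another. A small sketch (the 30 x 8 input and the seed are arbitrary, chosen only for illustration) comparing the first hidden layer's activations under zero and under small random initialization:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((30, 8))            # 30 features, 8 examples (illustrative)

def hidden_activations(W1, b1, X):
    """First hidden layer of the notebook's network: tanh(W1 X + b1)."""
    return np.tanh(W1 @ X + b1)

b1 = np.zeros((10, 1))
A1_zero = hidden_activations(np.zeros((10, 30)), b1, X)
A1_rand = hidden_activations(rng.standard_normal((10, 30)) * 0.01, b1, X)

# With zero weights every hidden unit (row) is identical; random weights break the symmetry.
rows_identical_zero = np.allclose(A1_zero, A1_zero[0])
rows_identical_rand = np.allclose(A1_rand, A1_rand[0])
print(rows_identical_zero, rows_identical_rand)   # True False
```

The biases can safely start at zero because the random weights already make the units differ.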

In [64]:
#Initialize parameters of 3 layer Neural network with 2 hidden layers
def initialize_parameters(layer_dims):
    """
    Input : layer_dims -- python array (list) containing the dimensions of each layer in our network
    Output: python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
    """
    np.random.seed(3)
    parameters = {}
    L = len(layer_dims)            # number of layers, counting the input layer

    for l in range(1, L):
        parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))
        
    return parameters

In [65]:
parameters = initialize_parameters(layer_dims)
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))
print("W3 = " + str(parameters["W3"]))
print("b3 = " + str(parameters["b3"]))

W1 = [[ 1.78862847e-02  4.36509851e-03  9.64974681e-04 -1.86349270e-02
  -2.77388203e-03 -3.54758979e-03 -8.27414815e-04 -6.27000677e-03
  -4.38181690e-04 -4.77218030e-03 -1.31386475e-02  8.84622380e-03
   8.81318042e-03  1.70957306e-02  5.00336422e-04 -4.04677415e-03
  -5.45359948e-03 -1.54647732e-02  9.82367434e-03 -1.10106763e-02
  -1.18504653e-02 -2.05649899e-03  1.48614836e-02  2.36716267e-03
  -1.02378514e-02 -7.12993200e-03  6.25244966e-03 -1.60513363e-03
  -7.68836350e-03 -2.30030722e-03]
 [ 7.45056266e-03  1.97611078e-02 -1.24412333e-02 -6.26416911e-03
  -8.03766095e-03 -2.41908317e-02 -9.23792022e-03 -1.02387576e-02
   1.12397796e-02 -1.31914233e-03 -1.62328545e-02  6.46675452e-03
  -3.56270759e-03 -1.74314104e-02 -5.96649642e-03 -5.88594380e-03
  -8.73882298e-03  2.97138154e-04 -2.24825777e-02 -2.67761865e-03
   1.01318344e-02  8.52797841e-03  1.10818750e-02  1.11939066e-02
   1.48754313e-02 -1.11830068e-02  8.45833407e-03 -1.86088953e-02
  -6.02885104e-03 -1.91447204e-02]
 

### 4.3 - Forward and Backward Propagation ####

**Question**: Implement `forward_propagation()`.

**Instructions**:
- Look above at the mathematical representation of your classifier.
- You can use the function `np.tanh()`. It is part of the numpy library.
- The steps you have to implement are:
    1. Retrieve each parameter from the dictionary "parameters" (which is the output of `initialize_parameters()`) by using `parameters[".."]`.
    2. Implement Forward Propagation. Compute $Z^{[1]}, A^{[1]}, Z^{[2]}, A^{[2]}, Z^{[3]}$ and $A^{[3]}$ (the vector of all your predictions on all the examples in the training set).
- Values needed in the backpropagation are stored in "`cache`". The `cache` will be given as an input to the backpropagation function.
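`forward_propagation()` below unrolls the three layers by hand. The same computation can be written as a loop over the layers, driven by the notebook's `parameters` dictionary (keys `W1`, `b1`, ..., `WL`, `bL`). This is a minimal sketch: `forward_propagation_loop` is a hypothetical name, and it assumes tanh for every hidden layer and a sigmoid output, as in this notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagation_loop(X, parameters):
    """Hypothetical L-layer forward pass: tanh hidden layers, sigmoid output.

    Returns the final activation and a cache holding A0 (= X) and every Zl, Al.
    """
    L = len(parameters) // 2                 # each layer has one W and one b
    cache = {"A0": X}
    A = X
    for l in range(1, L + 1):
        Z = parameters["W" + str(l)] @ A + parameters["b" + str(l)]
        A = sigmoid(Z) if l == L else np.tanh(Z)
        cache["Z" + str(l)] = Z
        cache["A" + str(l)] = A
    return A, cache
```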

In [66]:
# GRADED FUNCTION: forward_propagation
def sigmoid(x):
    """
    Compute the sigmoid of x

    Arguments:
    x -- A scalar or numpy array of any size

    Return:
    s -- sigmoid(x)
    """
    
    ### START CODE HERE ### (≈ 1 line of code)
    s = 1.0/(1+np.exp(-x))
    ### END CODE HERE ###
    
    return s
    
def forward_propagation(X, parameters):
    """
    Argument:
    X -- input data of size (n_x, m)
    parameters -- python dictionary containing your parameters (output of initialization function)
    
    Returns:
    A3 -- the sigmoid output of the third (output) layer
    cache -- a dictionary containing "Z1", "A1", "Z2", "A2", "Z3" and "A3"
    
    """
    
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    W3 = parameters["W3"]
    b3 = parameters["b3"]
    
    Z1 = np.dot(W1,X) + b1
    A1 = np.tanh(Z1)
    Z2 = np.dot(W2,A1) + b2
    A2 = np.tanh(Z2)
    Z3 = np.dot(W3,A2) + b3
    A3 = sigmoid(Z3)

    cache = {"Z1": Z1,
             "A1": A1,
             "Z2": Z2,
             "A2": A2,
             "Z3": Z3,
             "A3": A3}
    
    return A3, cache


In [67]:
A3, cache = forward_propagation(X, parameters)

for l in range(1,4):
    print(parameters['W' + str(l)].shape)
    print("Z =",np.mean(cache['Z' + str(l)]))
    print("A =",np.mean(cache['A' + str(l)]))
# Note: we use the mean here just to make sure that your output matches ours. 
print(np.mean(cache['Z1']) ,np.mean(cache['A1']),np.mean(cache['Z3']),np.mean(cache['A3']))

(10, 30)
Z = 8.036434240201869e-19
A = -7.686584949595291e-08
(10, 10)
Z = -1.938372647619744e-09
A = -1.9417893591520342e-09
(1, 10)
Z = 1.085113515890222e-10
A = 0.5000000000271277
8.036434240201869e-19 -7.686584949595291e-08 1.085113515890222e-10 0.5000000000271277


Now that you have computed $A^{[3]}$ (in the Python variable "`A3`"), which contains $a^{[3](i)}$ for every example, you can compute the cost function as follows:

$$J = - \frac{1}{m} \sum\limits_{i = 1}^{m} \large{(} \small y^{(i)}\log\left(a^{[3] (i)}\right) + (1-y^{(i)})\log\left(1- a^{[3] (i)}\right) \large{)} \small$$

**Exercise**: Implement `compute_cost()` to compute the value of the cost $J$.

**Instructions**:
- There are many ways to implement the cross-entropy loss. To help you, here is how we would implement
$- \sum\limits_{i=1}^{m}  y^{(i)}\log(a^{[3](i)})$:
```python
logprobs = np.multiply(np.log(A3), Y)
cost = - np.sum(logprobs)                # no need to use a for loop!
```

(you can use either `np.multiply()` and then `np.sum()` or directly `np.dot()`).
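Both routes give the same number. A small sketch on made-up predictions (the values of `A3` and `Y` are illustrative only) computing the full cross-entropy cost once with `np.multiply`/`np.sum` and once with `np.dot`:

```python
import numpy as np

A3 = np.array([[0.9, 0.2, 0.7, 0.4]])   # illustrative predicted probabilities, shape (1, m)
Y = np.array([[1, 0, 1, 1]])            # illustrative labels, shape (1, m)
m = Y.shape[1]

# Elementwise products, then one sum over all examples
logprobs = np.multiply(np.log(A3), Y) + np.multiply(1 - Y, np.log(1 - A3))
cost_sum = -np.sum(logprobs) / m

# Same sum written as (1, m) @ (m, 1) products; the result has shape (1, 1)
cost_dot = -(np.dot(Y, np.log(A3).T) + np.dot(1 - Y, np.log(1 - A3).T)) / m
cost_dot = float(np.squeeze(cost_dot))
```

The `np.dot` form returns a (1, 1) array, which is why `compute_cost()` below calls `np.squeeze` on its result.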


In [79]:
# GRADED FUNCTION: compute_cost

#Compute Cost
def compute_cost(A_final, Y, parameters):
    """
    Computes the cross-entropy cost
    
    Arguments:
    A_final -- the sigmoid output of the final activation, shape (1, number of examples)
    Y -- "true" labels vector, shape (1, number of examples)
    parameters -- python dictionary containing your parameters W1, b1, W2, b2, W3, b3 (not used in the computation)
    
    Returns:
    cost -- cross-entropy cost
    """
    m = Y.shape[1] # number of examples
    # Compute the cost
    
    logprobs = np.multiply(np.log(A_final), Y) + np.multiply((1 - Y), np.log(1 - A_final))
    cost = -np.sum(logprobs) / m
    
    cost = np.squeeze(cost)     # makes sure cost is the dimension we expect. 
                                # E.g., turns [[17]] into 17 
    return cost

In [80]:
print("cost = " + str(compute_cost(A3, Y, parameters)))

cost = 0.6931466649524404
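The value 0.6931... is expected. With weights initialized at scale 0.01, every output $a^{[3](i)}$ starts very close to 0.5 (the forward-propagation check above printed a mean of 0.5000000000271277), and for $a = 0.5$ each term of the cost equals $-\log 0.5 = \log 2 \approx 0.693147$, whatever the label. A quick check, assuming an output of exactly 0.5 for illustration:

```python
import numpy as np

m = 569
A3 = np.full((1, m), 0.5)                                  # every prediction exactly 0.5 (illustrative)
Y = np.random.default_rng(0).integers(0, 2, size=(1, m))   # any 0/1 labels give the same cost

cost = -np.sum(Y * np.log(A3) + (1 - Y) * np.log(1 - A3)) / m
print(round(cost, 6), round(np.log(2), 6))   # 0.693147 0.693147
```

This is also why the training log below starts at 0.693147.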


Using the cache computed during forward propagation, you can now implement backward propagation.

**Question**: Implement the function `backward_propagation()`.

**Instructions**:
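Backpropagation is usually the hardest (most mathematical) part in deep learning. To help you, here again is the slide from the lecture on backpropagation. The six vectorized equations on the right of the slide are for a network with one hidden layer. Because this network has a second hidden layer, you apply the same pattern once more: start from $dZ^{[3]} = A^{[3]} - Y$ at the output, then propagate back through layer 2 and then layer 1, for nine equations in total.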

<img src="images/grad_summary.png" style="width:600px;height:300px;">

<!--
$\frac{\partial \mathcal{J} }{ \partial z_{2}^{(i)} } = \frac{1}{m} (a^{[2](i)} - y^{(i)})$

$\frac{\partial \mathcal{J} }{ \partial W_2 } = \frac{\partial \mathcal{J} }{ \partial z_{2}^{(i)} } a^{[1] (i) T} $

$\frac{\partial \mathcal{J} }{ \partial b_2 } = \sum_i{\frac{\partial \mathcal{J} }{ \partial z_{2}^{(i)}}}$

$\frac{\partial \mathcal{J} }{ \partial z_{1}^{(i)} } =  W_2^T \frac{\partial \mathcal{J} }{ \partial z_{2}^{(i)} } * ( 1 - a^{[1] (i) 2}) $

$\frac{\partial \mathcal{J} }{ \partial W_1 } = \frac{\partial \mathcal{J} }{ \partial z_{1}^{(i)} }  X^T $

$\frac{\partial \mathcal{J} _i }{ \partial b_1 } = \sum_i{\frac{\partial \mathcal{J} }{ \partial z_{1}^{(i)}}}$

- Note that $*$ denotes elementwise multiplication.
- The notation you will use is common in deep learning coding:
    - dW1 = $\frac{\partial \mathcal{J} }{ \partial W_1 }$
    - db1 = $\frac{\partial \mathcal{J} }{ \partial b_1 }$
    - dW2 = $\frac{\partial \mathcal{J} }{ \partial W_2 }$
    - db2 = $\frac{\partial \mathcal{J} }{ \partial b_2 }$
    
!-->

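- Tips:
    - To compute dZ1 you'll need to compute $g^{[1]'}(Z^{[1]})$. Since $g^{[1]}(.)$ is the tanh activation function, if $a = g^{[1]}(z)$ then $g^{[1]'}(z) = 1-a^2$. So you can compute $g^{[1]'}(Z^{[1]})$ using `(1 - np.power(A1, 2))`.
    - The second hidden layer also uses tanh, so for dZ2 you can compute $g^{[2]'}(Z^{[2]})$ using `(1 - np.power(A2, 2))`.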
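The identity $\tanh'(z) = 1 - \tanh^2(z)$ used in these tips can be checked numerically with a central finite difference. This is a small sanity check, not part of the graded code:

```python
import numpy as np

z = np.linspace(-3, 3, 13)
eps = 1e-6

analytic = 1 - np.power(np.tanh(z), 2)                       # g'(z) = 1 - a^2 with a = tanh(z)
numeric = (np.tanh(z + eps) - np.tanh(z - eps)) / (2 * eps)  # central difference

max_abs_error = np.max(np.abs(analytic - numeric))
print(max_abs_error < 1e-8)   # True
```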

In [81]:
# GRADED FUNCTION: backward_propagation

def backward_propagation(parameters, cache, X, Y):
    """
    Implement the backward propagation using the instructions above.
    
    Arguments:
    parameters -- python dictionary containing our parameters 
    cache -- a dictionary containing "Z1", "A1", "Z2", "A2", "Z3" and "A3"
    X -- input data of shape (n_x, number of examples)
    Y -- "true" labels vector of shape (1, number of examples)
    
    Returns:
    grads -- python dictionary containing your gradients with respect to different parameters
    """
    m = X.shape[1]
    
    # First, retrieve W2 and W3 from the dictionary "parameters" (W1 is not needed for the gradients).
    W2 = parameters["W2"]
    W3 = parameters["W3"]
        
    # Retrieve also A1, A2 and A3 from dictionary "cache".
    A1 = cache["A1"]
    A2 = cache["A2"]
    A3 = cache["A3"]
    
    # Backward propagation: calculate dW1, db1, dW2, db2, dW3, db3.
    # Output layer (sigmoid with cross-entropy loss): dZ3 = A3 - Y
    dZ3 = A3-Y
    dW3 = 1/m*(np.dot(dZ3,A2.T))
    db3 = 1/m*(np.sum(dZ3,axis=1, keepdims=True))
    
    # Second hidden layer: tanh'(Z2) = 1 - A2^2
    dZ2 = np.multiply(np.dot(W3.T,dZ3),(1-np.power(A2,2)))
    dW2 = 1/m*(np.dot(dZ2,A1.T))
    db2 = 1/m*(np.sum(dZ2,axis=1, keepdims=True))
    
    # First hidden layer: tanh'(Z1) = 1 - A1^2
    dZ1 = np.multiply(np.dot(W2.T,dZ2),(1-np.power(A1,2)))
    dW1 = 1/m*(np.dot(dZ1,X.T))
    db1 = 1/m*(np.sum(dZ1,axis=1, keepdims=True))
    
    grads = {"dW1": dW1,
             "db1": db1,
             "dW2": dW2,
             "db2": db2,
             "dW3": dW3,
             "db3": db3}
    
    return grads

In [82]:
grads = backward_propagation(parameters, cache, X, Y)
print ("dW1 = "+ str(grads["dW1"]))
print ("db1 = "+ str(grads["db1"]))
print ("dW2 = "+ str(grads["dW2"]))
print ("db2 = "+ str(grads["db2"]))
print ("dW3 = "+ str(grads["dW3"]))
print ("db3 = "+ str(grads["db3"]))

dW1 = [[ 9.82820147e-02  4.87462091e-02  1.00655106e-01  8.54266423e-02
   3.67451935e-02  7.79988933e-02  1.04989108e-01  1.20894764e-01
   3.69349908e-02 -1.54194450e-03  4.59709736e-02 -8.15168709e-04
   4.27660213e-02  3.75999648e-02 -5.52117478e-03  3.18088882e-02
   1.56129372e-02  3.84977713e-02 -6.12295995e-04  5.75454542e-03
   1.07761615e-01  6.04147843e-02  1.05758255e-01  8.28862971e-02
   5.12918682e-02  7.28169925e-02  8.87181002e-02  1.44688135e-01
   4.09775920e-02  3.09677455e-02]
 [ 3.04450919e-01  1.51030303e-01  3.11798693e-01  2.64623839e-01
   1.13805417e-01  2.41584100e-01  3.25201040e-01  3.74478602e-01
   1.14389527e-01 -4.80705934e-03  1.42386020e-01 -2.53154563e-03
   1.32454365e-01  1.16455784e-01 -1.71177699e-02  9.85085788e-02
   4.83588270e-02  1.19247010e-01 -1.94057103e-03  1.78181994e-02
   3.33822127e-01  1.87184757e-01  3.27611172e-01  2.56760920e-01
   1.58885703e-01  2.25551000e-01  2.74819263e-01  4.48200157e-01
   1.26922168e-01  9.59217085e-02]
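A common way to confirm that `backward_propagation()` is correct is gradient checking: nudge each parameter by a small $\varepsilon$ in both directions, recompute the cost, and compare the two-sided finite difference with the analytic gradient. The sketch below re-implements the notebook's forward pass, cost and backward equations in compact form on a tiny random problem so that it runs on its own; the sizes [4, 3, 3, 1], the seed and $\varepsilon = 10^{-7}$ are arbitrary choices. A relative difference of about $10^{-7}$ or smaller usually indicates that the gradients are right.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, p):
    A1 = np.tanh(p["W1"] @ X + p["b1"])
    A2 = np.tanh(p["W2"] @ A1 + p["b2"])
    A3 = sigmoid(p["W3"] @ A2 + p["b3"])
    return A1, A2, A3

def cost(X, Y, p):
    _, _, A3 = forward(X, p)
    m = Y.shape[1]
    return -np.sum(Y * np.log(A3) + (1 - Y) * np.log(1 - A3)) / m

def backward(X, Y, p):
    """Same equations as backward_propagation() in the notebook."""
    m = X.shape[1]
    A1, A2, A3 = forward(X, p)
    dZ3 = A3 - Y
    dZ2 = (p["W3"].T @ dZ3) * (1 - A2 ** 2)
    dZ1 = (p["W2"].T @ dZ2) * (1 - A1 ** 2)
    return {"dW3": dZ3 @ A2.T / m, "db3": dZ3.sum(axis=1, keepdims=True) / m,
            "dW2": dZ2 @ A1.T / m, "db2": dZ2.sum(axis=1, keepdims=True) / m,
            "dW1": dZ1 @ X.T / m,  "db1": dZ1.sum(axis=1, keepdims=True) / m}

def gradient_check(X, Y, p, eps=1e-7):
    """Relative difference between analytic and two-sided numerical gradients."""
    grads = backward(X, Y, p)
    analytic, numeric = [], []
    for name in sorted(p):
        for idx in np.ndindex(p[name].shape):
            orig = p[name][idx]
            p[name][idx] = orig + eps
            plus = cost(X, Y, p)
            p[name][idx] = orig - eps
            minus = cost(X, Y, p)
            p[name][idx] = orig                      # restore the parameter
            numeric.append((plus - minus) / (2 * eps))
            analytic.append(grads["d" + name][idx])
    analytic, numeric = np.array(analytic), np.array(numeric)
    return np.linalg.norm(analytic - numeric) / (np.linalg.norm(analytic) + np.linalg.norm(numeric))

rng = np.random.default_rng(0)
sizes = [4, 3, 3, 1]
p = {}
for l in range(1, 4):
    p["W" + str(l)] = rng.standard_normal((sizes[l], sizes[l - 1]))
    p["b" + str(l)] = rng.standard_normal((sizes[l], 1))
X = rng.standard_normal((4, 5))
Y = rng.integers(0, 2, size=(1, 5))
diff = gradient_check(X, Y, p)
print(diff)
```

Gradient checking is slow (two cost evaluations per parameter), so it is meant for debugging on small problems, not for every training iteration.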


In [83]:
# GRADED FUNCTION: update_parameters

def update_parameters(parameters, grads, learning_rate = 0.2):
    """
    Updates parameters using the gradient descent update rule given above
    
    Arguments:
    parameters -- python dictionary containing your parameters 
    grads -- python dictionary containing your gradients 
    
    Returns:
    parameters -- python dictionary containing your updated parameters 
    """
    # Retrieve each parameter from the dictionary "parameters"
    ### START CODE HERE ### (≈ 4 lines of code)
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    W3 = parameters["W3"]
    b3 = parameters["b3"]
    ### END CODE HERE ###
    
    # Retrieve each gradient from the dictionary "grads"
    dW1 = grads["dW1"]
    db1 = grads["db1"]
    dW2 = grads["dW2"]
    db2 = grads["db2"]
    dW3 = grads["dW3"]
    db3 = grads["db3"]
    ## END CODE HERE ###
    
    # Update rule for each parameter
    W1 = W1 - learning_rate*dW1
    b1 = b1 - learning_rate*db1
    W2 = W2 - learning_rate*dW2
    b2 = b2 - learning_rate*db2
    W3 = W3 - learning_rate*dW3
    b3 = b3 - learning_rate*db3
    ### END CODE HERE ###
    
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2,
                  "W3": W3,
                  "b3": b3}
    
    return parameters
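The six update lines above apply the same rule, $\theta := \theta - \alpha \, d\theta$, to every parameter. As a sketch only, assuming the gradient keys are always the parameter keys prefixed with `d` (as in `grads` here), the update can be written as one loop that works for any number of layers; `update_parameters_loop` is a hypothetical name.

```python
import numpy as np

def update_parameters_loop(parameters, grads, learning_rate=0.2):
    """Hypothetical loop form of update_parameters(): theta <- theta - learning_rate * d(theta).

    Builds and returns a new dictionary; the input dictionary is left unchanged.
    """
    return {name: value - learning_rate * grads["d" + name]
            for name, value in parameters.items()}
```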

In [84]:
parameters = update_parameters(parameters, grads)

print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))
print("W3 = " + str(parameters["W3"]))
print("b3 = " + str(parameters["b3"]))

W1 = [[-9.04381998e-02 -1.08050178e-01 -1.03529472e-01 -1.20819516e-01
  -6.05638597e-02  2.86428895e-02 -1.08935263e-01 -1.47575698e-01
  -3.01125687e-02  6.72119491e-02 -1.75368661e-01  1.33830614e-02
  -1.12705436e-01 -8.21464582e-02 -1.82999876e-02  9.86207488e-02
   9.64811182e-03 -4.06977480e-02  4.73081037e-02  6.18392664e-02
  -1.93237295e-01 -1.66899654e-01 -1.42887822e-01 -1.39658765e-01
  -1.48393377e-01 -1.87957793e-02 -1.01172932e-01 -1.64344182e-01
  -1.18704525e-01 -1.68001824e-02]
 [-3.15992329e-01 -3.20049054e-01 -3.23672539e-01 -3.12262450e-01
  -1.74072829e-01  8.53783105e-02 -3.30328689e-01 -4.24887819e-01
  -8.03672001e-02  2.27063040e-01 -4.94796022e-01  4.11732362e-02
  -3.59564124e-01 -3.12379854e-01 -6.92067396e-02  3.14814639e-01
   3.95668260e-02 -5.87145458e-02  8.55995591e-02  2.24889109e-01
  -5.38842851e-01 -4.97606284e-01 -4.65232987e-01 -4.19090789e-01
  -4.16826103e-01 -4.19979908e-02 -3.17691131e-01 -5.03120035e-01
  -3.61092165e-01 -6.18874142e-02]
 

### 4.4 - Integrate parts 4.1, 4.2 and 4.3 in nn_model() ####

**Question**: Build your neural network model in `nn_model()`.

**Instructions**: The neural network model has to use the previous functions in the right order.

In [85]:
# GRADED FUNCTION: nn_model

def nn_model(X, Y, num_iterations = 10000, print_cost=False):
    """
    Arguments:
    X -- dataset of shape (n_x, number of examples)
    Y -- labels of shape (1, number of examples)
    num_iterations -- Number of iterations in gradient descent loop
    print_cost -- if True, print the cost every 100 iterations
    
    Returns:
    parameters -- parameters learnt by the model. They can then be used to predict.
    """
    
    np.random.seed(3)
    layer_dims = layer_sizes(X, Y , 2)
    
    # Initialize parameters W1, b1, W2, b2, W3, b3 from the layer sizes.
    parameters = initialize_parameters(layer_dims)
    
    
    # Loop (gradient descent)

    for i in range(0, num_iterations+1):
        
        # Forward propagation. Inputs: "X, parameters". Outputs: "A3, cache".
        A3, cache = forward_propagation(X,parameters)
        
        # Cost function. Inputs: "A3, Y, parameters". Outputs: "cost".
        cost = compute_cost(A3,Y,parameters)
 
        # Backpropagation. Inputs: "parameters, cache, X, Y". Outputs: "grads".
        grads = backward_propagation(parameters,cache,X,Y)
 
        # Gradient descent parameter update. Inputs: "parameters, grads". Outputs: "parameters".
        parameters = update_parameters(parameters,grads)
        
        ### END CODE HERE ###
        
        # Print the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))
    return parameters

In [86]:
"""
parameters = nn_model(X, Y, num_iterations=500, print_cost=True)
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))
print("W3 = " + str(parameters["W3"]))
print("b3 = " + str(parameters["b3"]))
"""

'\nparameters = nn_model(X, Y, num_iterations=500, print_cost=True)\nprint("W1 = " + str(parameters["W1"]))\nprint("b1 = " + str(parameters["b1"]))\nprint("W2 = " + str(parameters["W2"]))\nprint("b2 = " + str(parameters["b2"]))\nprint("W3 = " + str(parameters["W3"]))\nprint("b3 = " + str(parameters["b3"]))\n'

### 4.5 - Predictions ####

**Question**: Use your model to predict by building `predict()`.
Use forward propagation to predict results.

**Reminder**: predictions = $y_{prediction} = \mathbb{1}\{\text{activation} > 0.5\} = \begin{cases}
      1 & \text{if}\ activation > 0.5 \\
      0 & \text{otherwise}
    \end{cases}$

As an example, if you would like to set the entries of a matrix X to 0 and 1 based on a threshold you would do: `X_new = (X > threshold)`. This produces a boolean matrix; multiply it by 1 (or call `.astype(int)`) to get the integers 0 and 1.
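A small illustration with made-up probabilities (the values are illustrative only) of the thresholding that `predict()` below performs with `1*(A3>0.5)`:

```python
import numpy as np

A3 = np.array([[0.1, 0.5, 0.51, 0.97]])   # illustrative output probabilities
mask = A3 > 0.5                           # boolean array; 0.5 itself is not above the threshold
predictions = 1 * mask                    # integer 0/1 labels, same as mask.astype(int)
print(predictions)                        # [[0 0 1 1]]
```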

In [87]:
# GRADED FUNCTION: predict

def predict(parameters, X):
    """
    Using the learned parameters, predicts a class for each example in X
    
    Arguments:
    parameters -- python dictionary containing your parameters 
    X -- input data of size (n_x, m)
    
    Returns
    predictions -- vector of predictions of our model (malignant: 0 / benign: 1)
    """
    
    # Computes probabilities using forward propagation, and classifies to 0/1 using 0.5 as the threshold.
    ### START CODE HERE ### (≈ 2 lines of code)
    A3, cache = forward_propagation(X,parameters)
    predictions = 1*(A3>0.5)
    ### END CODE HERE ###
    
    return predictions

In [88]:
# Build a model with two hidden layers of 10 units each
parameters = nn_model(X, Y, num_iterations = 1400, print_cost=True)
print("Training over")

Cost after iteration 0: 0.693147
Cost after iteration 100: 0.660315
Cost after iteration 200: 0.660308
Cost after iteration 300: 0.660301
Cost after iteration 400: 0.660285
Cost after iteration 500: 0.660242
Cost after iteration 600: 0.660086
Cost after iteration 700: 0.658899
Cost after iteration 800: 0.532958
Cost after iteration 900: 0.100731
Cost after iteration 1000: 0.075886
Cost after iteration 1100: 0.068492
Cost after iteration 1200: 0.064247
Cost after iteration 1300: 0.060989
Cost after iteration 1400: 0.058233
Training over


In [89]:
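# Print accuracy on the training set (no examples were held out for testing)
predictions = predict(parameters, X)
print(X.shape, predictions.shape)
# Correct predictions = (label 1 and predicted 1) + (label 0 and predicted 0)
correct = (np.dot(Y, predictions.T) + np.dot(1 - Y, 1 - predictions.T)).item()
print('Accuracy: %d' % (correct / Y.size * 100) + '%')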

(30, 569) (1, 569)
Accuracy: 98%
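The accuracy formula above uses two dot products: `np.dot(Y, predictions.T)` counts examples where both label and prediction are 1, and `np.dot(1-Y, 1-predictions.T)` counts those where both are 0. Their sum divided by $m$ is simply the fraction of matching labels, i.e. `np.mean(predictions == Y)`. A small check with illustrative arrays:

```python
import numpy as np

Y = np.array([[1, 0, 1, 1, 0, 1]])             # illustrative true labels
predictions = np.array([[1, 0, 0, 1, 1, 1]])   # illustrative predictions

both_one = np.dot(Y, predictions.T).item()            # label 1 and predicted 1
both_zero = np.dot(1 - Y, 1 - predictions.T).item()   # label 0 and predicted 0
acc_dot = (both_one + both_zero) / Y.size * 100
acc_mean = np.mean(predictions == Y) * 100
print(both_one, both_zero, np.isclose(acc_dot, acc_mean))   # 3 1 True
```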
