### A deep (L-Layer) Neural Network from scratch
**We will:**
- Build the general architecture of an L-layer Neural Network learning algorithm, including:
    - Initializing parameters
    - Calculating the cost function and its gradient
    - Using an optimization algorithm (gradient descent)
- Gather these three pieces into a main model function, in the right order.
<img src="./data/deep_neural_network.png">

## 1 - Packages ##

First, import all the packages used in this notebook.
- [numpy](http://www.numpy.org) is the fundamental package for scientific computing with Python.
- [matplotlib](http://matplotlib.org) is a widely used library for plotting graphs in Python.
- [scikit-learn](http://scikit-learn.org/stable/) is a library of simple and efficient tools for data mining and data analysis.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

%matplotlib inline
np.random.seed(42)

## 2 - Dataset ##

We will use synthetic data generated by `make_classification` from sklearn: 100,000 samples with 100 features (all informative) and 2 classes.

Load the data with the following code.

In [2]:
X,Y=datasets.make_classification(n_samples=100000, n_features=100,
                                    n_informative=100,n_classes=2, n_redundant=0,
                                    random_state=42)

## Data split ##

We split the data as follows (`test_size=0.05` in the code below):
- 95% training set
- 5% test set

We use sklearn's `train_test_split`.

In [3]:
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.05,
                                                    random_state=42)

For convenience, we reshape the data so that each column represents one example. After this, `X_train` is a numpy array of shape (n_x, m_train) and `y_train` is a numpy array of shape (1, m_train); the test arrays have m_test columns instead.

In [4]:
# we need to reshape our data to column vectors 
X_train=X_train.reshape(X_train.shape[0],-1).T
X_test=X_test.reshape(X_test.shape[0],-1).T
y_train=y_train.reshape(y_train.shape[0],-1).T
y_test=y_test.reshape(y_test.shape[0],-1).T
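
To see what these reshapes do to the shapes, here is a minimal sketch on a toy array. The names `X_toy` and `y_toy` are made up for illustration and stand in for the sklearn output.

```python
import numpy as np

# Toy stand-ins for the sklearn output: 5 examples, 3 features
X_toy = np.arange(15, dtype=float).reshape(5, 3)   # shape (m, n_x)
y_toy = np.array([0, 1, 1, 0, 1])                  # shape (m,)

# Same reshaping pattern as in the cell above
X_cols = X_toy.reshape(X_toy.shape[0], -1).T       # shape (n_x, m)
y_row = y_toy.reshape(y_toy.shape[0], -1).T        # shape (1, m)

print(X_cols.shape, y_row.shape)
```

Each column of `X_cols` is one example, and `y_row` holds the matching labels in a single row.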

## 3 - Building the parts of our algorithm ## 

The main steps for building a Neural Network are:
1. Define the model structure (such as the number of input features and the number of layers)
2. Initialize the model's parameters
3. Loop:
    - Calculate current loss (forward propagation)
    - Calculate current gradient (backward propagation)
    - Update parameters (gradient descent)

You often build 1-3 separately and integrate them into one function we call `model()`.

### We will now define the number of layers and the number of units in each layer
- We store these in a list called layer_dims
- The input layer has n_x units (one per feature), the two hidden layers have 40 and 20 units, and the output layer has n_y = 1 unit for binary classification

In [5]:
n_x,n_y=X_train.shape[0],y_train.shape[0]
layer_dims=[n_x,40,20,n_y]

- We can read the number of layers off layer_dims: its length counts the input layer, so the network has `len(layer_dims) - 1` layers with parameters

In [6]:
len(layer_dims)

4
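
As a rough sketch of what `layer_dims` implies, the snippet below counts the weights and biases of the network. The helper `count_parameters` is hypothetical and not part of the notebook; it assumes each layer has a weight matrix of shape (n_l, n_{l-1}) and a bias vector of shape (n_l, 1).

```python
def count_parameters(layer_dims):
    """Hypothetical helper: total number of weights and biases implied by layer_dims."""
    total = 0
    for l in range(1, len(layer_dims)):
        total += layer_dims[l] * layer_dims[l - 1]  # W has shape (n_l, n_{l-1})
        total += layer_dims[l]                      # b has shape (n_l, 1)
    return total

example_dims = [100, 40, 20, 1]       # n_x = 100 features, n_y = 1 output unit
num_layers = len(example_dims) - 1    # layers that carry parameters
print(num_layers, count_parameters(example_dims))
```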

### 1.1 We now initialize parameters for the network based on layer_dims values
- We provide a variety of ways to initialize the parameters to see the effects of different initialization techniques

In [7]:
def initialize_layers(layer_dims,initializer="random"):
    """
    This function initializes the parameters for different layers
    Arguments:
        layer_dims-> a list of layer dimensions 
        initializer -> type of initialization: "random", "zeros", "xavier" or "He"
    Returns:
        parameters-> dictionary with the weights W1.. and biases b1.. of each layer
    """
    #number of layers (including the input layer)
    L=len(layer_dims)
    # variable for parameters
    parameters={}
    #create parameters
    for l in range(1,L):
        n_out,n_in=layer_dims[l],layer_dims[l-1]
        if initializer=="random":
            # small Gaussian weights
            W=np.random.randn(n_out,n_in)*0.01
        elif initializer=="zeros":
            # all-zero weights: every unit in a layer computes the same value
            W=np.zeros((n_out,n_in),dtype=float)
        elif initializer=="xavier":
            # one common Xavier variant: variance 1/n_in
            W=np.random.randn(n_out,n_in)*np.sqrt(1/n_in)
        elif initializer=="He":
            # He scaling: variance 2/n_in, commonly paired with relu
            W=np.random.randn(n_out,n_in)*np.sqrt(2/n_in)
        else:
            raise ValueError("unknown initializer: "+str(initializer))
        parameters["W"+str(l)]=W
        parameters["b"+str(l)]=np.zeros((n_out,1),dtype=float)
    return parameters
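
To get a feel for how the initializer changes the spread of the weights, here is a small sketch. It assumes the commonly used scale factors sqrt(1/n_in) for Xavier and sqrt(2/n_in) for He; the layer sizes and the local random generator `rng` are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # local generator so the sketch is reproducible
n_in, n_out = 100, 40            # fan-in and fan-out of one example layer

W_small = rng.standard_normal((n_out, n_in)) * 0.01
W_xavier = rng.standard_normal((n_out, n_in)) * np.sqrt(1.0 / n_in)
W_he = rng.standard_normal((n_out, n_in)) * np.sqrt(2.0 / n_in)

# Print the empirical standard deviation of each weight matrix
for name, W in [("0.01", W_small), ("xavier", W_xavier), ("He", W_he)]:
    print(name, round(float(W.std()), 4))
```

With a fan-in of 100, the Xavier and He weights are roughly ten to fourteen times larger than the fixed 0.01 scale.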
    

### We now define the various activations 

####  <center>  Relu function </center> 
#    <center>               $\max(0,Z)$</center> 


In [8]:
def relu(Z):
    """
    This function computes the relu activation 
    
    Arguments:
        Z-> Weighted inputs (Z=np.dot(W,A_prev)+b)
    Returns:
        A->relu activations of Z
    """
    A=np.maximum(0,Z)
    
    return A


####  <center>  Tanh function </center> 
#    <center>               $\frac{\mathrm{e}^{z}-\mathrm{e}^{-z}}{\mathrm{e}^{z}+\mathrm{e}^{-z}}$</center> 

In [9]:
def tanh(Z):
    """
    This function computes the tanh activation 
    
    Arguments:
        Z-> Weighted inputs (Z=np.dot(W,A_prev)+b)
    Returns:
        A->tanh activations of Z
    """
    
    A=np.tanh(Z)  # same function as the formula above, without overflowing np.exp for large |Z|
    
    return A
    

####  <center>  Sigmoid function </center> 
#    <center>               $\frac{1}{1+\mathrm{e}^{-z}}$</center> 

In [10]:
def sigmoid(Z):
    """
    This function computes the sigmoid activation 
    
    Arguments:
        Z-> Weighted inputs (Z=np.dot(W,A_prev)+b)
    Returns:
        A->sigmoid activations of Z
    """
    A= 1/(1+np.exp(-Z))
    
    return A
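
For very negative inputs, `np.exp(-Z)` can overflow and produce a runtime warning. The sketch below shows one possible way around this; `stable_sigmoid` is a hypothetical variant and not a function used elsewhere in this notebook.

```python
import numpy as np

def stable_sigmoid(Z):
    """Hypothetical variant of sigmoid that only exponentiates non-positive numbers."""
    Z = np.asarray(Z, dtype=float)
    E = np.exp(-np.abs(Z))                    # always in (0, 1], so it cannot overflow
    # For Z >= 0 use 1/(1+e^{-Z}); for Z < 0 use the equivalent e^{Z}/(1+e^{Z})
    return np.where(Z >= 0, 1.0 / (1.0 + E), E / (1.0 + E))

print(stable_sigmoid(np.array([-1000.0, 0.0, 1000.0])))
```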
    

### Linear forward computation
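- We use the following function to compute the linear step $Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$ followed by the activation
- We compute $A^{[l]} = \sigma(W^{[l]} A^{[l-1]} + b^{[l]})$, where $W^{[l]}$ has shape $(n_l, n_{l-1})$ and row $j$ of $A^{[l]}$ holds the activations of unit $j$ of layer $l$, for $j = 1, ..., n_l$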

In [11]:
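def linear_forward_computation(W,X,b, activation="relu"):
    """
    This function computes the linear step and the activation for one layer
    
    Arguments:
        W-> weights of the layer, shape (n_l, n_prev)
        X-> activations of the previous layer, shape (n_prev, m)
        b-> biases of the layer, shape (n_l, 1)
        activation-> "relu", "tanh" or "sigmoid"
    Returns:
        A-> activations of the layer
        Z-> weighted inputs (Z=np.dot(W,X)+b)
    """
    
    Z= np.dot(W,X)+b
    
    if activation=="relu":
        A=relu(Z)
    elif activation=="tanh":
        A=tanh(Z)
    elif activation == "sigmoid":
        A=sigmoid(Z)
    else:
        raise ValueError("unknown activation: "+str(activation))
    return A,Z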
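
A quick shape check for a single layer can help when debugging. This is a minimal sketch with toy sizes chosen only for illustration; it repeats the linear step and the relu inline so that it runs on its own.

```python
import numpy as np

rng = np.random.default_rng(1)
n_prev, n_l, m = 3, 4, 5                   # toy sizes, chosen only for illustration

A_prev = rng.standard_normal((n_prev, m))  # one column per example
W = rng.standard_normal((n_l, n_prev))     # one row per unit in the layer
b = np.zeros((n_l, 1))                     # broadcast across the m columns

Z = np.dot(W, A_prev) + b                  # linear step
A = np.maximum(0, Z)                       # relu activation

print(Z.shape, A.shape)
```

Both `Z` and `A` have one row per unit and one column per example.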

### forward propagation

**Forward propagation:** implementing forward propagation

**For the hidden layers**
- where l is the layer number, L the total number of layers and n_l the number of units in layer l
- We take $A^{[l-1]}$, where $A^{[0]} = X$
- We compute $A^{[l]} = \sigma(W^{[l]} A^{[l-1]} + b^{[l]})$, whose rows hold the activations of the $n_l$ units of layer $l$
- where $\sigma$ is the activation function (relu in the code below)

**For the output layer**
- We take the layer [L-1] activations
- We compute $A^{[L]}=\sigma(W^{[L]} A^{[L-1]} + b^{[L]})$, where $\sigma$ is the sigmoid function

In [62]:
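def forward_propagate_deep(parameters, X):
    """
    This function computes the forward propagation for the network:
    relu in layers 1..L-1 and sigmoid in the output layer L
    
    Arguments:
        parameters-> dictionary with W1..WL and b1..bL
        X-> input data, shape (n_x, m)
    Returns:
        AL-> output activations, shape (n_y, m)
        cache-> dictionary with Z and A of the hidden layers 1..L-1
    """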
    L=len(parameters)//2 
    cache={}
    A=X
    for i in range(1,L):
        A_Prev=A
        A,Z=linear_forward_computation(parameters["W"+str(i)],A_Prev,parameters["b"+str(i)],activation="relu")
        cache["Z"+str(i)]=Z
        cache["A"+str(i)]=A
    AL,ZL=linear_forward_computation(parameters["W"+str(L)],A,parameters["b"+str(L)],activation="sigmoid")

    return AL, cache
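
The loop above can be traced on a toy network. The sketch below is self-contained and uses hypothetical layer sizes; it follows the same pattern (relu in the hidden layers, sigmoid in the output layer) but is not the notebook's function.

```python
import numpy as np

rng = np.random.default_rng(2)
toy_dims = [3, 4, 2, 1]                     # hypothetical layer sizes
m = 6                                       # number of toy examples

# Small random weights and zero biases, keyed like the parameters dictionary
toy_params = {}
for l in range(1, len(toy_dims)):
    toy_params["W" + str(l)] = rng.standard_normal((toy_dims[l], toy_dims[l - 1])) * 0.01
    toy_params["b" + str(l)] = np.zeros((toy_dims[l], 1))

A = rng.standard_normal((toy_dims[0], m))   # A0 = X
L = len(toy_params) // 2
for l in range(1, L + 1):
    Z = np.dot(toy_params["W" + str(l)], A) + toy_params["b" + str(l)]
    # relu in hidden layers, sigmoid in the output layer
    A = np.maximum(0, Z) if l < L else 1 / (1 + np.exp(-Z))
    print("layer", l, "activation shape", A.shape)
```

With weights this small, the output probabilities all sit very close to 0.5 before any training.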

### Cost function

In [63]:
def compute_cost(AL,Y):
    """
    This function computes the cross-entropy cost of the Neural Network    
    
    Arguments:
        AL-> Activations of the output layer from the forward propagation
        Y-> The correct labels 
    Returns:
        cost-> the average cross-entropy over the m examples
    """
    m=Y.shape[1]
    cost=-(1/m)*np.sum(Y*np.log(AL)+((1-Y)*np.log(1-AL)))
    
    return cost
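
If an output activation is exactly 0 or 1, `np.log` returns `-inf` and the cost becomes `inf` or `nan`. One possible guard is to clip the activations first. The function `compute_cost_clipped` and its `eps` parameter are hypothetical, shown here only as a sketch.

```python
import numpy as np

def compute_cost_clipped(AL, Y, eps=1e-12):
    """Hypothetical variant: cross-entropy with AL clipped away from 0 and 1."""
    m = Y.shape[1]
    AL = np.clip(AL, eps, 1 - eps)          # avoids log(0)
    return -(1 / m) * np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL))

Y_toy = np.array([[1, 0, 1, 0]])
print(compute_cost_clipped(np.full((1, 4), 0.5), Y_toy))    # uninformed predictions
print(compute_cost_clipped(Y_toy.astype(float), Y_toy))     # perfect predictions
```

Predicting 0.5 for every example gives a cost of ln 2 (about 0.693), which is why an untrained network with small weights starts near that value.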


### compute linear backpropagation 

In [64]:
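def linear_backward_computation(dZ,A_prev):
    """
    This function computes the parameter gradients of one layer
    
    Arguments:
        dZ-> gradient of the loss with respect to Z of the layer, one column per example
        A_prev-> activations of the previous layer
    Returns:
        dw-> gradient with respect to W, same shape as W
        db-> gradient with respect to b, same shape as b
    """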
    m=A_prev.shape[1]
    dw=(1/m)*np.dot(dZ,A_prev.T)
    db=(1/m)*np.sum(dZ,axis=1,keepdims=True)
    
    return dw,db
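
A common way to gain confidence in gradient formulas like these is to compare them with finite differences. The sketch below does this for a single sigmoid layer with the cross-entropy cost, where dZ = A - Y. All names and sizes are toy values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n_x, m = 3, 8                               # toy sizes
X = rng.standard_normal((n_x, m))
Y = (rng.random((1, m)) > 0.5).astype(float)
W = rng.standard_normal((1, n_x)) * 0.1
b = np.zeros((1, 1))

def cost(W, b):
    A = 1 / (1 + np.exp(-(np.dot(W, X) + b)))
    return -(1 / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

# Analytic gradients: same formulas as above, with dZ = A - Y
A = 1 / (1 + np.exp(-(np.dot(W, X) + b)))
dZ = A - Y
dW = (1 / m) * np.dot(dZ, X.T)
db = (1 / m) * np.sum(dZ, axis=1, keepdims=True)

# Centered finite differences for each entry of W
h = 1e-6
dW_num = np.zeros_like(W)
for j in range(n_x):
    step = np.zeros_like(W)
    step[0, j] = h
    dW_num[0, j] = (cost(W + step, b) - cost(W - step, b)) / (2 * h)

print(np.max(np.abs(dW - dW_num)))
```

If the printed difference is not tiny, the analytic gradient and the cost function disagree somewhere.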

### Compute the derivatives $dz^{[l]}$
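
As a sketch of the quantities involved, assume a sigmoid output layer with the cross-entropy cost, hidden layers with activation $g$ and derivative $g'$, and $*$ denoting the element-wise product. Under these assumptions the derivatives are:

$$dZ^{[L]} = A^{[L]} - Y$$

$$dZ^{[l]} = W^{[l+1]^{T}} dZ^{[l+1]} * g'(Z^{[l]}), \quad l = L-1, ..., 1$$

$$dW^{[l]} = \frac{1}{m} dZ^{[l]} A^{[l-1]^{T}}, \quad db^{[l]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[l](i)}$$

Here $dZ^{[l](i)}$ is column $i$ of $dZ^{[l]}$, $m$ is the number of examples and $A^{[0]} = X$.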

### Back propagation 

In [77]:
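def back_propagate(cache,AL,parameters,X,Y,activation="relu"):
    """
    This function computes the gradients of all layers by back propagation
    
    Arguments:
        cache-> Z and A of the hidden layers, from forward_propagate_deep
        AL-> output activations
        parameters-> dictionary with the weights and biases
        X-> input data (stored in cache as A0)
        Y-> the correct labels
        activation-> currently unused; hidden layers are assumed to use relu
    Returns:
        grads-> dictionary with dW and db for each layer
        dZ-> dictionary with dZ for each layer
    """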
    L=len(parameters)//2
    dZ={}
    dZ["dZ"+str(L)]=AL-Y
    grads={}
    cache["A0"]=X
    grads["dW"+ str(L)],grads["db"+ str(L)]=linear_backward_computation(dZ["dZ"+str(L)],cache["A"+str(L-1)])
    
    
    
    for l in reversed(range(1,L)):
        dZ["dZ"+str(l)]=np.dot(parameters["W"+str(l+1)].T,dZ["dZ"+str(l+1)])*relu_derivative(cache["Z"+str(l)])
        grads["dW"+ str(l)],grads["db"+ str(l)]=linear_backward_computation(dZ["dZ"+str(l)],cache["A"+str(l-1)])
        
    return grads,dZ

In [78]:
def relu_derivative(Z):
    """
    This function computes the derivative of the relu function max(0,Z)
    drelu->returns 0 for all values below and including 0 and 1 for all other values
    Arguments:
        Z->Weighted inputs of the layer
    Returns:
        Zd->relu derivative evaluated at Z
    """
    
    Zd=np.choose(Z>0,[0,1])
    return Zd
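
A small example of how the relu derivative acts as a mask during back propagation. The arrays below are toy values, and `upstream` is a hypothetical gradient arriving from the next layer.

```python
import numpy as np

Z_toy = np.array([[-2.0, 0.0, 3.0]])
grad_mask = (Z_toy > 0).astype(float)       # 1 where the unit is active, 0 elsewhere
upstream = np.array([[0.5, 0.5, 0.5]])      # hypothetical gradient from the next layer

print(grad_mask)
print(upstream * grad_mask)                 # gradient only flows through active units
```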

In [222]:
grads

{'dW1': array([[  2.59007642e-05,  -3.80679885e-05,   3.91137738e-05, ...,
           7.34216197e-05,   8.34544187e-06,  -2.65041076e-05],
        [ -9.49310833e-06,   9.08562824e-06,  -1.01542335e-05, ...,
          -2.16753070e-05,   9.39515518e-07,   9.47135686e-06],
        [ -1.33647268e-05,   1.47785482e-05,  -1.67115183e-05, ...,
          -3.23013724e-05,   1.66749570e-07,   1.24955396e-05],
        ..., 
        [ -6.95117948e-05,   4.69047931e-05,  -5.80958712e-05, ...,
          -1.12000657e-04,  -1.41563727e-06,   4.85716624e-05],
        [  1.15745444e-05,  -9.47149631e-06,   8.61157351e-06, ...,
           2.29045865e-05,  -8.70146556e-07,  -1.19665228e-05],
        [ -4.33372262e-05,   4.47791036e-05,  -4.74205996e-05, ...,
          -9.04120672e-05,   8.41265647e-06,   4.63844842e-05]]),
 'dW2': array([[  3.97589755e-05,  -2.20015702e-05,   2.15246778e-05,
          -8.60182088e-05,  -3.06514902e-05,  -3.82221806e-05,
          -2.24716927e-05,  -4.23484127e-05,   6.920

In [16]:
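def update_parameters(grads,parameters,learning_rate):
    """
    This function performs one gradient descent step on all layers
    
    Arguments:
        grads-> dictionary with dW and db for each layer
        parameters-> dictionary with W and b for each layer
        learning_rate-> step size of the update
    Returns:
        parameters-> the updated parameters
    """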
    for i in range(len(parameters)//2 ):
        parameters["W"+str(i+1)]=parameters["W"+str(i+1)]-learning_rate*grads["dW"+str(i+1)]
        parameters["b"+str(i+1)]=parameters["b"+str(i+1)]-learning_rate*grads["db"+str(i+1)]
    return parameters
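
To see the update rule converge, here is a minimal sketch on a hypothetical one-parameter problem, minimizing f(w) = (w - 3)^2. The dictionary keys mirror the notebook's naming, but the problem itself is made up for illustration.

```python
import numpy as np

# Hypothetical problem: minimize f(w) = (w - 3)^2, whose gradient is 2 (w - 3)
toy_parameters = {"W1": np.array([[0.0]]), "b1": np.zeros((1, 1))}
learning_rate = 0.1

for step in range(50):
    toy_grads = {"dW1": 2 * (toy_parameters["W1"] - 3), "db1": np.zeros((1, 1))}
    # Same update rule as above: parameter = parameter - learning_rate * gradient
    for key in ("W1", "b1"):
        toy_parameters[key] = toy_parameters[key] - learning_rate * toy_grads["d" + key]

print(toy_parameters["W1"])
```

Each step shrinks the distance to the minimum at w = 3 by a factor of 0.8, so 50 steps bring `W1` very close to 3.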

In [248]:
parameters=update_parameters(grads,parameters,0.1)

In [69]:
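def model(X,Y,num_iterations,learning_rate,layer_dims):
    """
    This function trains the network with gradient descent and prints the cost at every iteration
    
    Returns:
        parameters-> the learned weights and biases
    """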
    parameters=initialize_layers(layer_dims)
    for i in range(num_iterations):
        AL,cache=forward_propagate_deep(parameters=parameters,X=X)
        cost= compute_cost(AL,Y)
        grads,dZ=back_propagate(cache,AL,parameters,X,Y,activation="relu")
        parameters=update_parameters(grads,parameters,learning_rate)
        print(cost)
    return parameters

In [73]:
layer_dims=[n_x,100,n_y]
params=model(X_train,y_train,20,0.1,layer_dims)

0.693145186197
0.689445918222
0.685674950692
0.681569386413
0.676872667615
0.671327002806
0.664662644045
0.656609715237
0.646901484018
0.635308983216
0.621674243042
0.605957727388
0.588274774208
0.568925098271
0.548381407076
0.527244108227
0.506142841801
0.485644972663
0.466193233573
0.448073507458


In [74]:
AL,cache=forward_propagate_deep(params,X_train)

In [75]:
predictions = np.choose(AL<0.5,[1,0])

In [76]:
print("train accuracy on iteration {} is {} %".format(10,(100 - np.mean(np.abs(predictions - y_train)) * 100)))

train accuracy on iteration 10 is 85.18 %
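
The thresholding and accuracy computation can be checked by hand on a toy example. The arrays below are hypothetical; the rule is the same as above (predict 1 when the output is at least 0.5).

```python
import numpy as np

AL_toy = np.array([[0.9, 0.2, 0.5, 0.4]])   # hypothetical output probabilities
y_toy = np.array([[1, 0, 0, 0]])

predictions_toy = (AL_toy >= 0.5).astype(int)         # same rule as np.choose(AL<0.5,[1,0])
accuracy = 100 * np.mean(predictions_toy == y_toy)    # percentage of correct labels
print(accuracy)
```

For 0/1 labels this gives the same number as `100 - np.mean(np.abs(predictions - y)) * 100`.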


In [345]:
params.keys()

dict_keys(['b3', 'W3', 'b2', 'W1', 'b1', 'W2', 'W4', 'b4'])