### A deep (L-Layer) Neural Network from scratch
**We will:**
- Build the general architecture of an L-layer neural network learning algorithm, including:
    - Initializing parameters
    - Calculating the cost function and its gradient
    - Using an optimization algorithm (gradient descent) 
- Gather all three functions above into a main model function, in the right order.
<img src="./data/deep_neural_network.png">

## 1 - Packages ##

Import all the packages that you will need during this assignment.
- [numpy](http://www.numpy.org) is the fundamental package for scientific computing with Python.
- [matplotlib](http://matplotlib.org) is a widely used library for plotting graphs in Python.
- [scikit-learn](http://scikit-learn.org/stable/) is a library of simple and efficient tools for data mining and data analysis.

In [2]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

%matplotlib inline
np.random.seed(42)

## 2 - Dataset ##

We will use the `make_classification` generator from sklearn to create a synthetic binary classification dataset.

The following code generates the data.

In [3]:
X,Y=datasets.make_classification(n_samples=100000, n_features=100,
                                    n_informative=100,n_classes=2, n_redundant=0,
                                    random_state=42)

## - Data-split  ##

We will split the data as follows:
- 99%: training set
- 1%: test set

We will use sklearn's `train_test_split`.

In [4]:
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.01,
                                                    random_state=42)

For convenience, we transpose the data so that each column represents one example. After this, `X_train` is a numpy array of shape (n_x, m_train) and `y_train` has shape (1, m_train); the test arrays use the same layout with m_test columns.

In [5]:
# we need to reshape our data to column vectors 
X_train=X_train.reshape(X_train.shape[0],-1).T
X_test=X_test.reshape(X_test.shape[0],-1).T
y_train=y_train.reshape(y_train.shape[0],-1).T
y_test=y_test.reshape(y_test.shape[0],-1).T
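
A quick shape check makes the column-per-example layout concrete. The sketch below uses a small random stand-in array instead of the real dataset; the names `X_demo`, `y_demo`, `X_cols` and `y_cols` are hypothetical and exist only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: 5 examples with 3 features, one example per row
X_demo = rng.normal(size=(5, 3))
y_demo = rng.integers(0, 2, size=5)

# Same reshaping as above: flatten each example, then transpose
X_cols = X_demo.reshape(X_demo.shape[0], -1).T
y_cols = y_demo.reshape(y_demo.shape[0], -1).T

print(X_cols.shape)  # (3, 5): features x examples
print(y_cols.shape)  # (1, 5): one label per column
```

Column `i` of `X_cols` is example `i` of the original array, which is the layout that the matrix products in the forward pass expect.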

## 3 - Building the parts of our algorithm ## 

The main steps for building a Neural Network are:
1. Define the model structure (such as the number of input features and the number of layers)
2. Initialize the model's parameters
3. Loop:
    - Calculate current loss (forward propagation)
    - Calculate current gradient (backward propagation)
    - Update parameters (gradient descent)

Steps 1-3 are usually built separately and then integrated into one function we call `model()`.
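
The loop structure can be sketched on the smallest possible case, a single sigmoid unit (logistic regression), before moving to L layers. This is a minimal illustration under stated assumptions, not the notebook's `model()`: the function `tiny_model`, its arguments and the demo data are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tiny_model(X, Y, num_iterations=200, learning_rate=0.1):
    """Hypothetical one-layer illustration of the initialize/loop structure."""
    n_x, m = X.shape
    # Step 2: initialize parameters (zeros are fine for a single unit)
    W = np.zeros((1, n_x))
    b = 0.0
    costs = []
    for _ in range(num_iterations):
        # Step 3a: forward propagation and current loss
        A = sigmoid(W @ X + b)
        cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))
        # Step 3b: backward propagation (gradients of the cost)
        dZ = A - Y
        dW = dZ @ X.T / m
        db = np.mean(dZ)
        # Step 3c: gradient descent update
        W -= learning_rate * dW
        b -= learning_rate * db
        costs.append(cost)
    return W, b, costs

# Stand-in data: 2 features, 200 examples stored as columns
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(2, 200))
Y_demo = (X_demo[0:1] + X_demo[1:2] > 0).astype(float)

W, b, costs = tiny_model(X_demo, Y_demo)
print(costs[0], costs[-1])
```

The first recorded cost equals log(2) because every prediction starts at 0.5, and the cost then decreases as the loop runs.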

### We will now initialize the number of layers with the specific units in each layer
- We will store all these in a list called layer_dims
- The first layer has n_x dimensions and the output layer has one unit for binary classification

In [6]:
n_x,n_y=X_train.shape[0],y_train.shape[0]
layer_dims=[n_x,30,20,40,n_y]

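- We can get the number of layers in the network from `layer_dims`. Note that `len(layer_dims)` also counts the input layer, so the network has `len(layer_dims) - 1` layers with parameters.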

In [7]:
len(layer_dims)

5
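
The parameter shapes implied by `layer_dims` can be listed directly. The sketch below hard-codes `n_x = 100` and `n_y = 1`, which follow from `n_features=100` and the single label row above; `num_layers` and `total_params` are names introduced only for this example.

```python
layer_dims = [100, 30, 20, 40, 1]  # n_x=100 features, n_y=1 output unit

# The input layer holds no parameters, so it is not counted
num_layers = len(layer_dims) - 1

for l in range(1, len(layer_dims)):
    w_shape = (layer_dims[l], layer_dims[l - 1])
    b_shape = (layer_dims[l], 1)
    print(f"layer {l}: W{l} {w_shape}, b{l} {b_shape}")

# Weights plus biases over all layers
total_params = sum(layer_dims[l] * layer_dims[l - 1] + layer_dims[l]
                   for l in range(1, len(layer_dims)))
print(num_layers, total_params)  # 4 4531
```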

### 1.1 We now initialize parameters for the network based on layer_dims values
- We provide a variety of ways to initialize the parameters to see the effects of different initialization techniques

In [8]:
def initialize_layers(layer_dims,initializer="random"):
    """
    This function initializes the parameters for the different layers
    Arguments:
        layer_dims -> a list of layer dimensions
        initializer -> type of initialization: "random", "zeros", "xavier" or "He"
    Returns:
        parameters -> dictionary with W1..W(L-1) and b1..b(L-1)
    """
    # number of entries in layer_dims (input layer included)
    L=len(layer_dims)
    # variable for parameters
    parameters={}
    # create parameters; layers are numbered from 1 because index 0 is the input
    for l in range(1,L):
        shape=(layer_dims[l],layer_dims[l-1])
        if initializer=="random":
            # small Gaussian weights
            W=np.random.randn(*shape)*0.01
        elif initializer=="zeros":
            W=np.zeros(shape,dtype=float)
        elif initializer=="xavier":
            # Xavier scaling (variance 1/fan_in, one common variant)
            W=np.random.randn(*shape)*np.sqrt(1.0/layer_dims[l-1])
        elif initializer=="He":
            # He scaling (variance 2/fan_in)
            W=np.random.randn(*shape)*np.sqrt(2.0/layer_dims[l-1])
        else:
            raise ValueError("unknown initializer: "+str(initializer))
        parameters["W"+str(l)]=W
        parameters["b"+str(l)]=np.zeros((layer_dims[l],1),dtype=float)
    return parameters
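
To see how the initialization schemes differ in practice, one can compare the spread of the weights each one produces for a single layer. The sketch below is an illustration only: the scale factors are commonly used variants (assumed here, with Xavier taken as `sqrt(1/fan_in)`), and `fan_in`, `fan_out`, `scales` and `stds` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(42)
fan_in, fan_out = 100, 30  # sizes of the first two entries of layer_dims

# Scale applied to standard normal samples for each scheme (assumed variants)
scales = {
    "random": 0.01,
    "xavier": np.sqrt(1.0 / fan_in),
    "He": np.sqrt(2.0 / fan_in),
}

stds = {}
for name, scale in scales.items():
    W = rng.standard_normal((fan_out, fan_in)) * scale
    stds[name] = W.std()
    print(f"{name:>6}: std = {stds[name]:.4f}")
```

The measured standard deviations land near 0.01, 0.1 and 0.14 respectively. Zero initialization is left out because every unit in a layer would then compute the same value and receive the same gradient.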
    

### We now define the various activations 

####  <center>  Relu function </center> 
#    <center>               $\max(0,Z)$</center> 


In [9]:
def relu(Z):
    """
    This function computes the relu activation 
    
    Arguments:
        Z-> Weighted inputs (Z=WA+b)
    Returns:
        A->relu activations of Z
    """
    A=np.maximum(0,Z)
    
    return A


####  <center>  Tanh function </center> 
#    <center>               $\frac{\mathrm{e}^{z}-\mathrm{e}^{-z}}{\mathrm{e}^{z}+\mathrm{e}^{-z}}$</center> 

In [10]:
def tanh(Z):
    """
    This function computes the tanh activation 
    
    Arguments:
        Z-> Weighted inputs (Z=WA+b)
    Returns:
        A->tanh activations of Z
    """
    # np.tanh evaluates the same formula without overflowing for large |Z|
    A=np.tanh(Z)
    
    return A
    

####  <center>  Sigmoid function </center> 
#    <center>               $\frac{1}{1+\mathrm{e}^{-z}}$</center> 

In [11]:
def sigmoid(Z):
    """
    This function computes the sigmoid activation 
    
    Arguments:
        Z-> Weighted inputs (Z=WA+b)
    Returns:
        A->sigmoid activations of Z
    """
    A= 1/(1+np.exp(-Z))
    
    return A
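
A few sanity checks help confirm that the three activations behave as their formulas say. The sketch below redefines `relu` and `sigmoid` so that it runs on its own and uses `np.tanh` for the tanh values; the input `Z` is an arbitrary example.

```python
import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

Z = np.array([[-2.0, 0.0, 2.0]])

relu_out = relu(Z)
tanh_out = np.tanh(Z)
sigmoid_out = sigmoid(Z)

print(relu_out)  # [[0. 0. 2.]]

# sigmoid is symmetric around 0.5: sigmoid(z) + sigmoid(-z) = 1
print(np.allclose(sigmoid_out + sigmoid(-Z), 1.0))

# tanh is a rescaled sigmoid: tanh(z) = 2*sigmoid(2z) - 1
print(np.allclose(tanh_out, 2 * sigmoid(2 * Z) - 1))
```

Both identity checks print `True`, and `sigmoid(0)` is exactly 0.5, which is why a network with near-zero pre-activations predicts values close to 0.5.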
    

### Linear forward computation
- We use the following function to compute one layer's linear step followed by its activation
- We compute $A^{[l]} = \sigma(W^{[l]} A^{[l-1]} + b^{[l]})$, a matrix of shape $(n_l, m)$ with one row per unit and one column per example

In [None]:
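def linear_forward_computation(parameters,X, activation="relu"):
    """
    Computes one layer's forward step: Z=WA+b followed by the activation.
    Assumption: the original cell was unfinished, so applying the
    first layer's parameters (W1, b1) to X is a guess at the intent.
    """
    activations={"relu":relu,"tanh":tanh,"sigmoid":sigmoid}
    Z=np.dot(parameters["W1"],X)+parameters["b1"]
    A=activations[activation](Z)
    return A,Z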

### Forward propagation

**Forward propagation:** implementing forward propagation

**For hidden layers**
- where l is the layer number, L is the total number of layers and $n_l$ is the number of units in layer l
- We get $A^{[l-1]}$, where $A^{[0]} = X$
- We compute $A^{[l]} = \sigma(W^{[l]} A^{[l-1]} + b^{[l]})$, a matrix of shape $(n_l, m)$ with one row per unit and one column per example
- where $\sigma$ is the activation function

**For the output layer**
- We get the layer [L-1] activations
- We compute $A^{[L]}=\sigma(W^{[L]} A^{[L-1]} + b^{[L]})$, which has shape $(n_L, m)$
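
To make the dimensions concrete, the equations above can be traced for the `layer_dims=[n_x,30,20,40,n_y]` network defined earlier, with $n_x = 100$, $n_y = 1$ and $m$ examples ($m = 99000$ for the training split). This is a shape check only. It is written with the untransposed product $W^{[l]} A^{[l-1]}$, which is what weights of shape $(n_l, n_{l-1})$ require, and each bias $b^{[l]} \in \mathbb{R}^{n_l \times 1}$ is broadcast across the $m$ columns.

```latex
\begin{aligned}
A^{[0]} &= X, & A^{[0]} &\in \mathbb{R}^{100 \times m} \\
A^{[1]} &= \sigma\left(W^{[1]} A^{[0]} + b^{[1]}\right), \quad W^{[1]} \in \mathbb{R}^{30 \times 100}, & A^{[1]} &\in \mathbb{R}^{30 \times m} \\
A^{[2]} &= \sigma\left(W^{[2]} A^{[1]} + b^{[2]}\right), \quad W^{[2]} \in \mathbb{R}^{20 \times 30}, & A^{[2]} &\in \mathbb{R}^{20 \times m} \\
A^{[3]} &= \sigma\left(W^{[3]} A^{[2]} + b^{[3]}\right), \quad W^{[3]} \in \mathbb{R}^{40 \times 20}, & A^{[3]} &\in \mathbb{R}^{40 \times m} \\
A^{[4]} &= \sigma\left(W^{[4]} A^{[3]} + b^{[4]}\right), \quad W^{[4]} \in \mathbb{R}^{1 \times 40}, & A^{[4]} &\in \mathbb{R}^{1 \times m}
\end{aligned}
```

The inner dimensions cancel at every step, and the final row vector holds one prediction per example.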

In [31]:
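def forward_propagate_deep(parameters, X):
    """
    This function computes the forward propagation for the network

    Arguments:
        parameters -> dictionary with W1..WL and b1..bL
        X -> input data of shape (n_x, m)
    Returns:
        AL -> sigmoid activations of the output layer, shape (n_y, m)
        cache -> dictionary with Z and A of every hidden layer
    """
    # each layer has one W and one b
    L=len(parameters)//2
    cache={}
    A=X
    # hidden layers 1..L-1 use tanh
    for i in range(1,L):
        A_prev=A
        Z=np.dot(parameters["W"+str(i)],A_prev)+parameters["b"+str(i)]
        A=tanh(Z)
        cache["Z"+str(i)]=Z
        cache["A"+str(i)]=A
    # output layer uses sigmoid for binary classification
    ZL=np.dot(parameters["W"+str(L)],A)+parameters["b"+str(L)]
    AL=sigmoid(ZL)

    return AL, cache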

In [32]:
parameters=initialize_layers(layer_dims)
al,cache=forward_propagate_deep(parameters,X_train)

4


0.5001167413983475
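
With weights scaled by 0.01, the pre-activations stay near zero, so the sigmoid outputs of an untrained network sit close to 0.5. The sketch below shows what that implies for the cost, using binary cross-entropy as the cost function. It is an assumption-laden illustration: `compute_cost`, the clipping constant `eps` and the demo arrays are hypothetical and do not come from the notebook.

```python
import numpy as np

def compute_cost(AL, Y):
    """Binary cross-entropy averaged over the m examples (columns)."""
    m = Y.shape[1]
    eps = 1e-12  # guards against log(0); an assumption, not from the notebook
    AL = np.clip(AL, eps, 1 - eps)
    return float(-np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL)) / m)

# Stand-in for an untrained network: every prediction is exactly 0.5
Y_demo = np.array([[0, 1, 1, 0, 1]])
AL_demo = np.full(Y_demo.shape, 0.5)

cost = compute_cost(AL_demo, Y_demo)
print(cost)  # log(2), about 0.693
```

A starting cost near log(2) is therefore a useful check that initialization and the forward pass are wired correctly before any training step runs.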