### A Shallow (2-Layer) Neural Network from Scratch
**We will:**
- Build the general architecture of a two-layer neural network learning algorithm, including:
    - Initializing parameters
    - Calculating the cost function and its gradient
    - Using an optimization algorithm (gradient descent)
- Gather all three functions above into a main model function, in the right order.
<img src="./data/2-Layer_Neural_Network.png">

## 1 - Packages ##

First, import all the packages we will need:
- [numpy](http://www.numpy.org) is the fundamental package for scientific computing with Python.
- [matplotlib](http://matplotlib.org) is a widely used library for plotting graphs in Python.
- [scikit-learn](http://scikit-learn.org/stable/) is a library of simple and efficient tools for data mining and data analysis.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

%matplotlib inline
np.random.seed(42)

## 2 - Dataset ##

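We will use a synthetic binary-classification dataset generated with sklearn's `make_classification`: 100,000 examples with 100 features, all of them informative.

Load the data with the following code.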

In [2]:
X,Y=datasets.make_classification(n_samples=100000, n_features=100,
                                    n_informative=100,n_classes=2, n_redundant=0,
                                    random_state=42)


## - Data split ##

We use sklearn's `train_test_split` to split the data as follows:
- 99% training set
- 1% test set

In [3]:
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.01,
                                                    random_state=42)
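For intuition, here is a rough numpy-only equivalent of a shuffled 99/1 split. The helper `shuffle_split` is hypothetical (it is not part of sklearn, whose implementation differs in details), and rounding the test size up to a whole number of examples is our assumption. With 100,000 examples and a 1% test fraction, the test set has 1,000 examples and the training set 99,000.

```python
import numpy as np

def shuffle_split(X, Y, test_fraction, seed=0):
    """Hypothetical helper: shuffle the rows of X and Y, then hold out test_fraction of them."""
    m = X.shape[0]
    n_test = int(np.ceil(test_fraction * m))            # round the test size up to whole examples
    perm = np.random.default_rng(seed).permutation(m)   # random order of row indices
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    return X[train_idx], X[test_idx], Y[train_idx], Y[test_idx]

# Same number of examples as above; only 5 random features to keep the demo light
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(100000, 5))
Y_demo = (X_demo[:, 0] > 0).astype(int)
X_tr, X_te, y_tr, y_te = shuffle_split(X_demo, Y_demo, test_fraction=0.01)
print(X_tr.shape, X_te.shape)  # (99000, 5) (1000, 5)
```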

For convenience, we reshape the data so that each column is one example: `X_train` (respectively `X_test`) becomes a numpy array of shape (n_x, m_train) (respectively (n_x, m_test)), where n_x = 100 is the number of features, and the labels become row vectors of shape (1, m_train) and (1, m_test).

In [4]:
# Reshape so that each column is one example: X -> (n_x, m), y -> (1, m)
X_train=X_train.reshape(X_train.shape[0],-1).T
X_test=X_test.reshape(X_test.shape[0],-1).T
y_train=y_train.reshape(y_train.shape[0],-1).T
y_test=y_test.reshape(y_test.shape[0],-1).T
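The same reshaping on a toy array (values are arbitrary) makes the shape change visible: `m` rows of `n_x` features become `n_x` rows of `m` columns, and a label vector of shape `(m,)` becomes a row vector of shape `(1, m)`.

```python
import numpy as np

m, n_x = 4, 3
X_toy = np.arange(m * n_x).reshape(m, n_x)    # sklearn layout: one example per row
y_toy = np.array([0, 1, 1, 0])                 # shape (m,)

X_cols = X_toy.reshape(X_toy.shape[0], -1).T   # (n_x, m): one example per column
y_row = y_toy.reshape(y_toy.shape[0], -1).T    # (m, 1), transposed to (1, m)

print(X_cols.shape, y_row.shape)  # (3, 4) (1, 4)
print(X_cols[:, 0])               # first example: [0 1 2]
```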


## 3 - Building the parts of our algorithm ## 

The main steps for building a neural network are:
1. Define the model structure (such as the number of input features)
2. Initialize the model's parameters
3. Loop:
    - Calculate the current loss (forward propagation)
    - Calculate the current gradient (backward propagation)
    - Update the parameters (gradient descent)

We build steps 1-3 separately and then integrate them into one function, `model()`.
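To make the loop in step 3 concrete, here is a minimal sketch of the same three sub-steps (loss, gradient, update) on a one-parameter toy loss $f(w) = (w-3)^2$, chosen only for illustration. Its gradient is $2(w-3)$, so each update is $w \leftarrow w - \alpha \cdot 2(w-3)$ with learning rate $\alpha$.

```python
def gradient_descent_demo(w=0.0, learning_rate=0.1, num_iterations=100):
    """Toy gradient descent on f(w) = (w - 3)^2, whose minimum is at w = 3."""
    for i in range(num_iterations):
        loss = (w - 3.0) ** 2          # "forward": current loss
        grad = 2.0 * (w - 3.0)         # "backward": derivative of the loss w.r.t. w
        w = w - learning_rate * grad   # gradient descent update
    return w, loss

w_final, final_loss = gradient_descent_demo()
print(round(w_final, 4))  # 3.0
```

Each step shrinks the distance to the minimum by the factor $1 - 2\alpha = 0.8$, so a learning rate that is too large (here $\alpha > 1$) would make the iterates diverge instead.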

##### We will define the parameters of our Neural Network and initialize them 

In [72]:
def initialize_parameters(n_x,n_h,n_y):
    """
    This function initializes the weights and biases of the hidden layer and the output layer
    
    Arguments:
        n_x->number of features in the input X
        n_h-> number of units in the hidden layer
        n_y-> number of units in the output layer
    
    Returns:
        parameters->weights and biases for each layer
    """
    W1=np.random.randn(n_h,n_x)*0.01
    b1=np.zeros((n_h,1),dtype=float)
    W2=np.random.randn(n_y,n_h)*0.01
    b2=np.zeros((n_y,1),dtype=float)
    
    parameters={"W1": W1, "b1":b1,"W2":W2,"b2":b2}
    # Check that the parameters have the correct shapes
    assert parameters["W1"].shape==(n_h,n_x),"Error in parameters W1 shape"
    assert parameters["W2"].shape==(n_y,n_h),"Error in parameters W2 shape"
    assert parameters["b1"].shape==(n_h,1),"Error in parameters b1 shape"
    assert parameters["b2"].shape==(n_y,1),"Error in parameters b2 shape"
    
    return parameters

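Let's check that the parameters are initialized with the expected shapes: with 100 input features, 5 hidden units and 1 output unit we expect W1 (5, 100), W2 (1, 5), b1 (5, 1) and b2 (1, 1).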

In [73]:
parameters=initialize_parameters(X_train.shape[0],5,y_train.shape[0])
print(parameters["W1"].shape,parameters["W2"].shape,parameters["b1"].shape,parameters["b2"].shape)
    

(5, 100) (1, 5) (5, 1) (1, 1)
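Why random weights rather than a single constant? If every row of `W1` starts out equal, every hidden unit computes the same activation and receives the same gradient, so the units can never become different from one another. The sketch below checks this for one forward/backward pass on toy sizes; the sizes, the seed and the constant 0.5 are arbitrary choices, the biases are fixed at zero as in `initialize_parameters`, and `first_layer_gradient` is our own helper name.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_h, m = 4, 3, 10
X_sym = rng.normal(size=(n_x, m))
Y_sym = (rng.random((1, m)) > 0.5).astype(float)

def first_layer_gradient(W1, W2):
    """dW1 after one forward/backward pass of a ReLU-sigmoid network (biases fixed at 0)."""
    A1 = np.maximum(0, W1 @ X_sym)          # hidden activations, (n_h, m)
    A2 = 1 / (1 + np.exp(-(W2 @ A1)))       # output probabilities, (1, m)
    dZ2 = A2 - Y_sym
    dZ1 = (W2.T @ dZ2) * (A1 > 0)           # ReLU derivative read from the activations
    return dZ1 @ X_sym.T / m                # (n_h, n_x)

# Every weight set to the same constant: all hidden units start out identical
dW1_const = first_layer_gradient(np.full((n_h, n_x), 0.5), np.full((1, n_h), 0.5))
# Small random weights, as in initialize_parameters
dW1_rand = first_layer_gradient(rng.normal(size=(n_h, n_x)) * 0.01,
                                rng.normal(size=(1, n_h)) * 0.01)

print(np.allclose(dW1_const, dW1_const[0]))  # True: every hidden unit gets the same update
print(np.allclose(dW1_rand, dW1_rand[0]))    # False: the rows differ
```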


#### Activation functions: ReLU, sigmoid and tanh

####  <center>  ReLU function </center> 
#    <center>               $\max(0,Z)$</center> 


In [7]:
def relu(Z):
    """
    This function computes the relu activation 
    
    Arguments:
        Z-> Weighted inputs (Z=WA+b)
    Returns:
        A->relu activations of Z
    """
    A=np.maximum(0,Z)
    
    return A
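A quick usage check with arbitrary inputs: ReLU works element-wise, clipping negative entries to zero and passing positive entries through unchanged.

```python
import numpy as np

Z_example = np.array([[-2.0, 0.0, 3.0],
                      [1.5, -0.5, 0.0]])
A_example = np.maximum(0, Z_example)   # same operation as relu(Z) above
print(A_example)
```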


####  <center>  Tanh function </center> 
#    <center>               $\frac{\mathrm{e}^{z}-\mathrm{e}^{-z}}{\mathrm{e}^{z}+\mathrm{e}^{-z}}$</center> 

In [8]:
def tanh(Z):
    """
    This function computes the tanh activation 
    
    Arguments:
        Z-> Weighted inputs (Z=WA+b)
    Returns:
        A->tanh activations of Z
    """
    
    A=(np.exp(Z)-np.exp(-Z))/(np.exp(Z)+np.exp(-Z))
    
    return A
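The exponential formula is exact but not numerically robust: for large |Z|, `np.exp(Z)` overflows to `inf` and the ratio `inf/inf` is `nan`. NumPy's built-in `np.tanh` computes the same function without this problem, so it can serve as a drop-in replacement. The sketch compares the two on arbitrary test values (`tanh_formula` is our own name for a copy of the function above):

```python
import numpy as np

def tanh_formula(Z):
    # Same formula as tanh() above
    return (np.exp(Z) - np.exp(-Z)) / (np.exp(Z) + np.exp(-Z))

Z_small = np.linspace(-5, 5, 11)
print(np.allclose(tanh_formula(Z_small), np.tanh(Z_small)))  # True

Z_large = np.array([1000.0])
with np.errstate(all="ignore"):        # silence the overflow/invalid warnings for the demo
    print(tanh_formula(Z_large))       # [nan]
print(np.tanh(Z_large))                # [1.]
```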
    

####  <center>  Sigmoid function </center> 
#    <center>               $\frac{1}{1+\mathrm{e}^{-z}}$</center> 

In [9]:
def sigmoid(Z):
    """
    This function computes the sigmoid activation 
    
    Arguments:
        Z-> Weighted inputs (Z=WA+b)
    Returns:
        A->sigmoid activations of Z
    """
    A= 1/(1+np.exp(-Z))
    
    return A
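Similarly, `np.exp(-Z)` overflows when Z is a large negative number (roughly below -709 for 64-bit floats). A common remedy, sketched below, uses the algebraically equivalent form $\frac{e^{z}}{1+e^{z}}$ (numerator and denominator multiplied by $e^{z}$) for negative inputs, so the exponent is never positive. The name `stable_sigmoid` is our own, not a library function.

```python
import numpy as np

def stable_sigmoid(Z):
    """Sigmoid that never exponentiates a positive number."""
    Z = np.asarray(Z, dtype=float)
    A = np.empty_like(Z)
    pos = Z >= 0
    A[pos] = 1.0 / (1.0 + np.exp(-Z[pos]))   # exponent -Z <= 0
    exp_z = np.exp(Z[~pos])                   # exponent Z < 0
    A[~pos] = exp_z / (1.0 + exp_z)
    return A

print(stable_sigmoid(np.array([-1000.0, 0.0, 1000.0])))  # approximately [0, 0.5, 1]
```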
    

### Forward propagation

Forward propagation computes the network's output for all $m$ examples at once, one layer at a time.

**Hidden layer**
- Input: $X$, of shape $(n_x, m)$
- Compute $Z1 = W1 X + b1$ and $A1 = g(Z1)$, where $g$ is the hidden-layer activation (ReLU by default, or sigmoid or tanh)

**Output layer**
- Input: the hidden-layer activations $A1$, of shape $(n_h, m)$
- Compute $A2 = \sigma(W2 A1 + b2) = (a^{(1)}, a^{(2)}, \dots, a^{(m)})$, one predicted probability per example


In [55]:
def forward_propagate(parameters,X,activation="relu"):
    """
    This function computes the forward propagation
    
    Arguments:
        parameters-> weights and biases of the hidden and output layers
        X-> input data of shape (n_x, m)
        activation-> the hidden-layer activation: "relu", "sigmoid" or "tanh"
    Returns:
        A2-> sigmoid output of shape (1, m)
        cache-> dictionary with A1, A2 and the hidden activation name, used by backpropagation
    """
    W1=parameters["W1"]
    b1=parameters["b1"]
    W2=parameters["W2"]
    b2=parameters["b2"]
    #compute Z1
    Z1=np.dot(W1 ,X )+ b1
        
    if activation=="relu":
        A1=relu(Z1)
    elif activation=="sigmoid":
        A1=sigmoid(Z1)
    elif activation=="tanh":
        A1=tanh(Z1)
    else:
        raise ValueError("Unknown activation: {}".format(activation))
        
    Z2=np.dot(W2,A1)+b2
    # The output layer uses a sigmoid so that A2 is a probability in (0, 1) for the binary label
    A2=sigmoid(Z2)
    
    # Keep A1, A2 and the activation name for backpropagation
    cache={"A2":A2,"A1":A1,"activation":activation}
    
    return A2,cache
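A quick shape check of the forward pass on toy sizes ($n_x = 4$, $n_h = 3$, $m = 5$, all arbitrary). `forward_demo` is our own name for a re-implementation of the ReLU/sigmoid path, so running the cell does not overwrite the notebook's functions. With the small initial weights, every output is close to 0.5.

```python
import numpy as np

def forward_demo(W1, b1, W2, b2, X):
    Z1 = W1 @ X + b1              # (n_h, m); b1 of shape (n_h, 1) broadcasts across columns
    A1 = np.maximum(0, Z1)        # ReLU hidden activations
    Z2 = W2 @ A1 + b2             # (1, m)
    A2 = 1 / (1 + np.exp(-Z2))    # sigmoid output: one probability per example
    return A1, A2

rng = np.random.default_rng(0)
n_x, n_h, m = 4, 3, 5
W1, b1 = rng.normal(size=(n_h, n_x)) * 0.01, np.zeros((n_h, 1))
W2, b2 = rng.normal(size=(1, n_h)) * 0.01, np.zeros((1, 1))
X_fwd = rng.normal(size=(n_x, m))

A1, A2 = forward_demo(W1, b1, W2, b2, X_fwd)
print(A1.shape, A2.shape)  # (3, 5) (1, 5)
```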
        
        
    
    
    

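### The cost function

We use the binary cross-entropy (logistic) cost, averaged over the $m$ examples:
$J = -\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\log a^{(i)} + (1-y^{(i)})\log\left(1-a^{(i)}\right)\right)$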

In [64]:
def compute_cost(A,Y):
    """
    This function computes the cost of the Neural Network    
    
    Arguments:
        A-> Activations from the forward propagation
        Y-> The correct labels 
    Returns:
        cost-> logistic regression cost
    """
    m=Y.shape[1]
    cost=-(1/m)*np.sum(Y*np.log(A)+((1-Y)*np.log(1-A)))
    
    return cost
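A worked example with two made-up predictions, $a = (0.9, 0.2)$ for labels $y = (1, 0)$: $J = -\tfrac{1}{2}(\ln 0.9 + \ln 0.8) \approx 0.1643$. When every prediction is 0.5 the cost is $\ln 2 \approx 0.6931$, close to the first cost (0.6935) printed during training, as expected for small initial weights. The sketch also clips the probabilities into $[\epsilon, 1-\epsilon]$; this clipping and the value of `eps` are our assumption, added so that `np.log` never receives 0 (which gives `-inf`, or an error once `np.seterr(all='raise')` is active). `cost_demo` is our own name.

```python
import numpy as np

def cost_demo(A, Y, eps=1e-12):
    """Binary cross-entropy, with A clipped to [eps, 1 - eps] to avoid log(0)."""
    m = Y.shape[1]
    A = np.clip(A, eps, 1 - eps)
    return -(1 / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

print(round(cost_demo(np.array([[0.9, 0.2]]), np.array([[1, 0]])), 4))        # 0.1643
print(round(cost_demo(np.full((1, 4), 0.5), np.array([[1, 0, 1, 0]])), 4))    # 0.6931
```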

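#### Derivatives of the activation functions

Backpropagation needs $g'(Z1)$. Each function below receives the activation $A = g(Z)$ stored in the cache rather than $Z$, so the derivative is written in terms of $A$: $A(1-A)$ for sigmoid, $1-A^2$ for tanh, and for ReLU 1 where $A > 0$ and 0 elsewhere.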

In [26]:
def relu_derivative(A):
    """
    This function computes the derivative of the relu function max(0,Z):
    1 where Z>0 and 0 elsewhere (the value at Z=0 is taken as 0).
    Since A=relu(Z) is positive exactly where Z is, the derivative can be read from A.
    Arguments:
        A->relu activations
    Returns:
        Ad->relu derivative, same shape as A
    """
    
    Ad=(A>0).astype(float)
    return Ad

In [27]:
def sigmoid_derivative(A):
    """
    This function computes the derivative of the sigmoid function.
    If A=sigmoid(Z), the derivative is A*(1-A), so sigmoid must not be applied to A again.
    Arguments:
        A->sigmoid activations
    Returns:
        Ad->sigmoid derivative, same shape as A
    """
    Ad=A*(1-A)
    return Ad
    

In [28]:
def tanh_derivative(A):
    """
    This function computes the derivative of the tanh function.
    If A=tanh(Z), the derivative is 1-A*A, so tanh must not be applied to A again.
    Arguments:
        A->tanh activations
    Returns:
        Ad->tanh derivative, same shape as A
    """
    Ad=1-A*A
    return Ad
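The derivative formulas written in terms of the activation can be checked numerically against a central finite difference $\frac{g(z+h) - g(z-h)}{2h}$. The sketch below does so at a few arbitrary points away from the ReLU kink; the step `h` is our choice. It also shows that applying the activation to $A$ a second time, e.g. `sigmoid(A) * (1 - sigmoid(A))`, does not give the derivative.

```python
import numpy as np

Z_pts = np.array([-2.0, -0.5, 0.3, 1.7])   # arbitrary points, none at the ReLU kink Z=0
h = 1e-5                                   # finite-difference step (our choice)

def numeric_derivative(g, Z):
    """Central finite difference (g(Z+h) - g(Z-h)) / 2h."""
    return (g(Z + h) - g(Z - h)) / (2 * h)

def sigmoid_fn(Z):
    return 1 / (1 + np.exp(-Z))

checks = {
    # name: (activation g, derivative written in terms of A = g(Z))
    "sigmoid": (sigmoid_fn, lambda A: A * (1 - A)),
    "tanh": (np.tanh, lambda A: 1 - A * A),
    "relu": (lambda Z: np.maximum(0, Z), lambda A: (A > 0).astype(float)),
}
errors = {}
for name, (g, derivative_from_A) in checks.items():
    A = g(Z_pts)
    errors[name] = np.max(np.abs(derivative_from_A(A) - numeric_derivative(g, Z_pts)))
print(errors)  # all three errors are tiny

# Applying the activation to A a second time does NOT give the derivative
A_sig = sigmoid_fn(Z_pts)
wrong = sigmoid_fn(A_sig) * (1 - sigmoid_fn(A_sig))
print(np.max(np.abs(wrong - numeric_derivative(sigmoid_fn, Z_pts))))  # clearly non-zero
```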
    
    

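For reference, these are the chain-rule results that a backpropagation step for this network computes, for a sigmoid output trained with the cross-entropy cost. Here $g$ is the hidden-layer activation, $\odot$ denotes element-wise multiplication, and each sum runs over the $m$ examples (columns):

$$
\begin{aligned}
dZ2 &= A2 - Y\\
dW2 &= \frac{1}{m}\, dZ2\, A1^{T} \qquad db2 = \frac{1}{m}\sum_{i=1}^{m} dZ2^{(i)}\\
dZ1 &= \left(W2^{T} dZ2\right) \odot g'(Z1)\\
dW1 &= \frac{1}{m}\, dZ1\, X^{T} \qquad db1 = \frac{1}{m}\sum_{i=1}^{m} dZ1^{(i)}
\end{aligned}
$$

The simple form $A2 - Y$ arises because the derivative of the cross-entropy with respect to $A2$, $\frac{A2 - Y}{A2(1-A2)}$, is multiplied by the sigmoid derivative $A2(1-A2)$, and the two denominators cancel.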
In [80]:
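def backpropagation(cache,Y,X,parameters):
    """
    This function computes the gradients by backpropagation 
    Arguments:
        cache-> stored activations A1 and A2 from forward propagation (and, if present, the hidden activation name)
        Y->true labels, shape (1, m)
        X->input features, shape (n_x, m)
        parameters->stored parameters
    Returns:
        grads-> a dictionary containing the gradients of the parameters W1, W2, b1 and b2
    """
    m=X.shape[1]
    
    A2=cache["A2"]
    A1=cache["A1"]
    # Hidden-layer activation used in the forward pass (ReLU if not recorded)
    activation=cache.get("activation","relu")
    
    W2=parameters["W2"]
    
    # Output layer: sigmoid + cross-entropy gives dZ2 = A2 - Y
    dZ2=A2-Y
    dW2=1/m*np.dot(dZ2,A1.T)
    db2=1/m*np.sum(dZ2,axis=1,keepdims=True)
    # Hidden layer: use the derivative that matches the hidden activation
    if activation=="relu":
        dA1_dZ1=relu_derivative(A1)
    elif activation=="sigmoid":
        dA1_dZ1=sigmoid_derivative(A1)
    elif activation=="tanh":
        dA1_dZ1=tanh_derivative(A1)
    else:
        raise ValueError("Unknown activation: {}".format(activation))
    dZ1=np.dot(W2.T,dZ2)*dA1_dZ1
    dW1=1/m*np.dot(dZ1,X.T)
    db1=1/m*np.sum(dZ1,axis=1,keepdims=True)
    
    grads={"dW1":dW1,"dW2":dW2,"db1":db1,"db2":db2}
    
    return grads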
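A standard way to validate a backpropagation implementation is gradient checking: nudge each parameter by $\pm\epsilon$, recompute the cost, and compare the finite-difference slope with the analytic gradient. The sketch below does this on a tiny random network (sizes, seed and $\epsilon$ are arbitrary). It re-implements the ReLU forward pass, cost and backward pass under its own names (`forward_gc`, `cost_gc`, `backward_gc`, `grad_check`) so it does not overwrite the notebook's functions. As a common rule of thumb, a relative error around $10^{-7}$ or smaller suggests the gradients are correct.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_h, m = 3, 4, 6
X_gc = rng.normal(size=(n_x, m))
Y_gc = (rng.random((1, m)) > 0.5).astype(float)
params_gc = {"W1": rng.normal(size=(n_h, n_x)), "b1": rng.normal(size=(n_h, 1)),
             "W2": rng.normal(size=(1, n_h)), "b2": rng.normal(size=(1, 1))}

def forward_gc(p, X):
    A1 = np.maximum(0, p["W1"] @ X + p["b1"])              # ReLU hidden layer
    A2 = 1 / (1 + np.exp(-(p["W2"] @ A1 + p["b2"])))       # sigmoid output
    return A1, A2

def cost_gc(p, X, Y):
    _, A2 = forward_gc(p, X)
    return -np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))

def backward_gc(p, X, Y):
    m = X.shape[1]
    A1, A2 = forward_gc(p, X)
    dZ2 = A2 - Y
    dZ1 = (p["W2"].T @ dZ2) * (A1 > 0)
    return {"W1": dZ1 @ X.T / m, "b1": dZ1.sum(axis=1, keepdims=True) / m,
            "W2": dZ2 @ A1.T / m, "b2": dZ2.sum(axis=1, keepdims=True) / m}

def grad_check(p, X, Y, eps=1e-6):
    analytic = backward_gc(p, X, Y)
    num, ana = [], []
    for key in p:
        for idx in np.ndindex(p[key].shape):
            old = p[key][idx]
            p[key][idx] = old + eps
            plus = cost_gc(p, X, Y)
            p[key][idx] = old - eps
            minus = cost_gc(p, X, Y)
            p[key][idx] = old                              # restore the parameter
            num.append((plus - minus) / (2 * eps))         # central difference
            ana.append(analytic[key][idx])
    num, ana = np.array(num), np.array(ana)
    return np.linalg.norm(num - ana) / (np.linalg.norm(num) + np.linalg.norm(ana))

rel_err = grad_check(params_gc, X_gc, Y_gc)
print("relative error: {:.2e}".format(rel_err))
```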
    
    

In [99]:
def predict(parameters, X, activation="relu"):
    """
    Using the learned parameters, predicts a class for each example in X
    
    Arguments:
    parameters -- python dictionary containing the parameters W1, b1, W2, b2
    X -- input data of size (n_x, m)
    activation -- hidden-layer activation used during training
    
    Returns
    predictions -- vector of 0/1 predictions of shape (1, m)
    """
    
    # Compute probabilities with forward propagation, then predict 1 when the probability is at least 0.5
    A2, cache = forward_propagate(parameters,X,activation=activation)
    predictions = (A2>=0.5).astype(int)
    
    return predictions
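A tiny example of the 0.5 threshold rule on made-up probabilities, including the boundary case: a probability of exactly 0.5 is assigned to class 1.

```python
import numpy as np

A2_example = np.array([[0.12, 0.5, 0.73, 0.49]])   # made-up output probabilities
preds_example = (A2_example >= 0.5).astype(int)    # 1 when the probability is at least 0.5
print(preds_example)  # [[0 1 1 0]]
```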

In [102]:
def model(X,Y,num_iterations,learning_rate,n_h=40):
    """
    This function combines all the above functions to create the 2-layer neural network
    
    Arguments:
        X-> input features of shape (n_x, m)
        Y-> true labels of shape (1, m)
        num_iterations-> number of gradient descent iterations
        learning_rate-> step size of each gradient descent update
        n_h-> number of hidden units (default 40)
    Returns:
        parameters-> the learned weights and biases
    """
    # Turn NumPy floating-point warnings (overflow, underflow, division by zero, invalid) into errors;
    # note that this changes NumPy's global error settings
    np.seterr(all='raise')
    parameters=initialize_parameters(X.shape[0],n_h,Y.shape[0])
    for i in range(num_iterations):
        A2,cache=forward_propagate(parameters,X,activation="relu")
        cost=compute_cost(A2,Y)
        grads=backpropagation(parameters=parameters,cache=cache,X=X,Y=Y)
        # Gradient descent update of every parameter
        parameters["W1"]=parameters["W1"]-learning_rate*grads["dW1"]
        parameters["W2"]=parameters["W2"]-learning_rate*grads["dW2"]
        parameters["b1"]=parameters["b1"]-learning_rate*grads["db1"]
        parameters["b2"]=parameters["b2"]-learning_rate*grads["db2"]
        if i%100==0:
            # Every 100 iterations, report the cost and the accuracy on the training data passed in
            preds=predict(parameters,X)
            print(cost)
            print("train accuracy: {} %".format(100 - np.mean(np.abs(preds - Y)) * 100))
    return parameters
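The accuracy expression `100 - np.mean(np.abs(preds - Y)) * 100` works because, for 0/1 labels, `|pred - y|` is 1 exactly on the misclassified examples, so its mean is the error rate. A worked check with made-up predictions and labels, compared with directly counting matches:

```python
import numpy as np

preds_acc = np.array([[1, 0, 1, 1]])    # made-up predictions
labels_acc = np.array([[1, 1, 1, 0]])   # made-up labels: 2 of 4 predictions are wrong
acc_formula = 100 - np.mean(np.abs(preds_acc - labels_acc)) * 100
acc_direct = 100 * np.mean(preds_acc == labels_acc)   # percentage of matching entries
print(acc_formula, acc_direct)  # 50.0 50.0
```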
        
        

In [98]:
params=model(X_train,y_train,500,0.01)

0.693478094193
train accuracy: 51.16868686868687 %
0.666921585739
train accuracy: 75.68080808080808 %
0.511462647306
train accuracy: 80.15959595959596 %
0.370568309501
train accuracy: 81.78080808080809 %
0.306539343937
train accuracy: 82.58282828282829 %


In [100]:
preds=predict(params,X_test)

In [101]:
print("test accuracy: {} %".format(100 - np.mean(np.abs(preds - y_test)) * 100))

test accuracy: 91.4 %

