
In [20]:
import numpy as np
import copy
import matplotlib.pyplot as plt
import sklearn
import sklearn.datasets
import sklearn.linear_model
import pandas as pd

%matplotlib inline

%load_ext autoreload
%autoreload 2



#Nonlinear Activation Functions

Nonlinear activation functions commonly used in neural network models include:

- Sigmoid Function

  $S(x)= \frac {1}{1+e^{-x}}$

- Hyperbolic Tangent Function (tanh(x))

  $\tanh(x) = \frac {e^x - e^{-x}} {e^x + e^{-x}}$

- Rectified Linear Unit Function (ReLU(x))

  $\mathrm{ReLU}(x) = \max(0, x)$


In [18]:
def sigmoid(x):
  denominator  = 1 + np.exp(-x)
  sigmoid = 1/denominator

  return sigmoid

In [19]:
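def tanh(x):
  # tanh(x) = (e^x - e^-x) / (e^x + e^-x)
  # np.exp overflows for large |x|; forward_propagation() uses np.tanh instead.
  num = np.exp(x) - np.exp(-x)
  den = np.exp(x) + np.exp(-x)

  return num / den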
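
The list above also names ReLU, which has no implementation in this notebook. A minimal sketch follows; the function names `relu` and `relu_derivative` are assumptions, as is the convention of using a slope of 0 at exactly `x == 0`.

```python
import numpy as np

def relu(x):
    # Element-wise max(0, x); works on scalars and arrays.
    return np.maximum(0, x)

def relu_derivative(x):
    # Assumed convention: the slope at x == 0 is taken to be 0.
    return (np.asarray(x) > 0).astype(float)

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))             # [0. 0. 3.]
print(relu_derivative(z))  # [0. 0. 1.]
```

The network built in this notebook uses tanh in the hidden layer, so this helper is shown only for completeness.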

In [26]:
def initialize_parameters(n_x, n_h, n_y):


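    # n_x: input features, n_h: hidden units, n_y: output units.
    # Small random weights break the symmetry between hidden units; biases start at zero.
    W1 = np.random.randn(n_h,n_x)*0.01
    b1 = np.zeros((n_h,1))
    W2 = np.random.randn(n_y,n_h)*0.01
    b2 = np.zeros((n_y,1))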

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}

    return parameters

#Forward Propagation

Implement `forward_propagation()` using the following equations:
$$Z^{[1]} =  W^{[1]} X + b^{[1]}\tag{1}$$
$$A^{[1]} = \tanh(Z^{[1]})\tag{2}$$
$$Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}\tag{3}$$
$$\hat{Y} = A^{[2]} = \sigma(Z^{[2]})\tag{4}$$



In [25]:
def forward_propagation(X,parameters):

    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]


    Z1 = np.dot(W1,X) + b1
    A1 = np.tanh(Z1)
    Z2 = np.dot(W2,A1) + b2
    A2 = sigmoid(Z2)


    assert(A2.shape == (1, X.shape[1]))

    cache = {"Z1": Z1,
             "A1": A1,
             "Z2": Z2,
             "A2": A2}

    return A2, cache
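
To see how equations (1) to (4) fit together dimensionally, the sketch below traces the array shapes through one forward pass. The layer sizes, the random input, and the variable names are illustrative assumptions; examples are stored as columns of `X`, as in the function above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_h, n_y, m = 2, 4, 1, 5     # illustrative sizes (assumptions)

X = rng.standard_normal((n_x, m))
W1 = rng.standard_normal((n_h, n_x)) * 0.01
b1 = np.zeros((n_h, 1))
W2 = rng.standard_normal((n_y, n_h)) * 0.01
b2 = np.zeros((n_y, 1))

Z1 = W1 @ X + b1                  # (n_h, m); b1 broadcasts across the m columns
A1 = np.tanh(Z1)                  # (n_h, m)
Z2 = W2 @ A1 + b2                 # (n_y, m)
A2 = 1 / (1 + np.exp(-Z2))        # (n_y, m); every entry lies in (0, 1)

print(Z1.shape, A1.shape, Z2.shape, A2.shape)  # (4, 5) (4, 5) (1, 5) (1, 5)
```

The shape assertion in `forward_propagation()` relies on `n_y` being 1, which is the case for binary classification.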

#Computing Cost Function

$$J = - \frac{1}{m} \sum\limits_{i = 1}^{m} \large{(} \small y^{(i)}\log\left(a^{[2] (i)}\right) + (1-y^{(i)})\log\left(1- a^{[2] (i)}\right) \large{)} \small\tag{13}$$


The expression above is the `Cost Function` of the neural network: the average of the per-example loss over all $m$ training examples.

`Loss Function : `

$$L^{(i)} = - \left( y^{(i)}\log\left(a^{[2] (i)}\right) + (1-y^{(i)})\log\left(1- a^{[2] (i)}\right) \right)$$
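
To make the cost formula concrete, here is a small worked example with three training examples. The labels and predicted probabilities are made-up values chosen for illustration.

```python
import numpy as np

y = np.array([[1, 0, 1]])            # true labels (made-up)
a2 = np.array([[0.9, 0.2, 0.6]])     # predicted probabilities (made-up)

# Per-example cross-entropy terms, then their average as in the cost J.
per_example = -(y * np.log(a2) + (1 - y) * np.log(1 - a2))
J = per_example.mean()

print(np.round(per_example, 4))  # [[0.1054 0.2231 0.5108]]
print(round(float(J), 4))        # 0.2798
```

The confident correct prediction (0.9 for a label of 1) contributes the smallest term, and the hesitant one (0.6) contributes the largest.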

In [17]:
def computing_cost(A2,Y):
  m = Y.shape[1]  # number of examples (columns); A2.shape[0] would be n_y
  # Sum of the per-example log-likelihood terms
  log_likelihood = np.sum(Y*np.log(A2) + (1-Y)*np.log(1-A2))

  J = -log_likelihood/m
  cost = float(np.squeeze(J))
  return cost

#Implementation of Backward Propagation

Backpropagation is one of the most important steps in building a neural network: it computes the gradients that are then used to update the weights and biases so that the cost decreases.

We take the derivative of the cost function with respect to each weight and bias, and then apply gradient descent with those derivatives to minimize the cost.

`Cost Function:` $J = - \frac{1}{m} \sum\limits_{i = 1}^{m} \left( y^{(i)}\log\left(a^{[2] (i)}\right) + (1-y^{(i)})\log\left(1- a^{[2] (i)}\right) \right)$


`Derivatives:`

$\frac{\partial \mathcal{J} }{ \partial z^{[2](i)} } = \frac{1}{m} (a^{[2](i)} - y^{(i)})$

$\frac{\partial \mathcal{J} }{ \partial W^{[2]} } = \sum_i{\frac{\partial \mathcal{J} }{ \partial z^{[2](i)} } a^{[1] (i) T}}$

$\frac{\partial \mathcal{J} }{ \partial b^{[2]} } = \sum_i{\frac{\partial \mathcal{J} }{ \partial z^{[2](i)}}}$

$\frac{\partial \mathcal{J} }{ \partial z^{[1](i)} } = W^{[2]T} \frac{\partial \mathcal{J} }{ \partial z^{[2](i)} } * \left( 1 - (a^{[1] (i)})^2 \right)$

$\frac{\partial \mathcal{J} }{ \partial W^{[1]} } = \sum_i{\frac{\partial \mathcal{J} }{ \partial z^{[1](i)} } x^{(i) T}}$

$\frac{\partial \mathcal{J} }{ \partial b^{[1]} } = \sum_i{\frac{\partial \mathcal{J} }{ \partial z^{[1](i)}}}$

Here $*$ denotes element-wise multiplication, and $1 - (a^{[1](i)})^2$ is the derivative of $\tanh$. The code applies the factor $\frac{1}{m}$ when forming `dW` and `db` rather than in `dZ2`, which yields the same gradients.


In [21]:
def backward_propagation(parameters, cache, X, Y):
    m = X.shape[1]

    W1 = parameters["W1"]
    W2 = parameters["W2"]

    A1 = cache["A1"]
    A2 = cache["A2"]


    dZ2 = A2 - Y
    dW2 = (1/m) * np.dot(dZ2,A1.T)
    db2 = (1/m) * np.sum(dZ2, axis=1, keepdims=True)

    dZ1 = np.dot(W2.T,dZ2) * (1 - np.power(A1,2))
    dW1 = (1/m) * np.dot(dZ1,X.T)
    db1 = (1/m) * np.sum(dZ1, axis=1, keepdims=True)


    grads = {"dW1": dW1,
             "db1": db1,
             "dW2": dW2,
             "db2": db2}

    return grads
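
A common way to sanity-check derivative formulas like these is to compare them against central finite differences. The sketch below re-implements the forward pass, cost, and gradients in compact form so that it runs on its own. The helper names (`forward`, `cost`, `analytic_grads`, `numeric_grads`), the random data, and the layer sizes are assumptions made for illustration and are not part of the notebook.

```python
import numpy as np

def forward(X, p):
    A1 = np.tanh(p["W1"] @ X + p["b1"])
    A2 = 1 / (1 + np.exp(-(p["W2"] @ A1 + p["b2"])))
    return A1, A2

def cost(X, Y, p):
    _, A2 = forward(X, p)
    # Y has a single row, so the mean over all entries is the average over examples.
    return float(-np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)))

def analytic_grads(X, Y, p):
    # Same formulas as backward_propagation(), keyed by parameter name.
    m = X.shape[1]
    A1, A2 = forward(X, p)
    dZ2 = A2 - Y
    dZ1 = (p["W2"].T @ dZ2) * (1 - A1 ** 2)
    return {"W1": dZ1 @ X.T / m, "b1": dZ1.sum(axis=1, keepdims=True) / m,
            "W2": dZ2 @ A1.T / m, "b2": dZ2.sum(axis=1, keepdims=True) / m}

def numeric_grads(X, Y, p, eps=1e-6):
    # Central difference: perturb one entry at a time and restore it afterwards.
    grads = {}
    for name, value in p.items():
        g = np.zeros_like(value)
        for idx in np.ndindex(value.shape):
            old = value[idx]
            value[idx] = old + eps
            j_plus = cost(X, Y, p)
            value[idx] = old - eps
            j_minus = cost(X, Y, p)
            value[idx] = old
            g[idx] = (j_plus - j_minus) / (2 * eps)
        grads[name] = g
    return grads

rng = np.random.default_rng(1)
X = rng.standard_normal((2, 6))
Y = (rng.random((1, 6)) > 0.5).astype(float)
p = {"W1": rng.standard_normal((3, 2)), "b1": np.zeros((3, 1)),
     "W2": rng.standard_normal((1, 3)), "b2": np.zeros((1, 1))}

ana, num = analytic_grads(X, Y, p), numeric_grads(X, Y, p)
max_diff = max(np.abs(ana[k] - num[k]).max() for k in p)
print(f"largest gradient discrepancy: {max_diff:.2e}")
```

If the analytic formulas are right, the reported discrepancy should be several orders of magnitude smaller than the gradients themselves.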

#Gradient Descent

Gradient descent is an optimization algorithm commonly used to train machine learning models and neural networks. At each iteration it moves every parameter a small step in the direction that decreases the cost, so the cost function serves as the measure of how well the current parameters fit the training data.

$\theta = \theta - \alpha \frac{\partial J }{ \partial \theta }$

Here $\theta$ stands for any parameter ($W^{[1]}$, $b^{[1]}$, $W^{[2]}$, $b^{[2]}$) and $\alpha$ is the learning rate.
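
The update rule can be tried on a one-parameter toy problem before applying it to the network. The sketch below minimizes the made-up cost $J(\theta) = (\theta - 3)^2$, whose minimum is at $\theta = 3$; the cost, starting point, learning rate, and iteration count are all illustrative assumptions.

```python
def grad(theta):
    # Derivative of the toy cost J(theta) = (theta - 3)^2
    return 2 * (theta - 3)

theta, alpha = 0.0, 0.1      # starting point and learning rate (assumptions)
for _ in range(50):
    theta = theta - alpha * grad(theta)

print(round(theta, 4))       # 3.0
```

Each step shrinks the distance to the minimum by a factor of 0.8 here; a learning rate above 1.0 would make this particular toy problem diverge.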

In [22]:
def gradient_descent(parameters, grads, learning_rate = 1.2):

    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]


    dW1 = grads["dW1"]
    db1 = grads["db1"]
    dW2 = grads["dW2"]
    db2 = grads["db2"]

    W1 = W1 - learning_rate * dW1
    b1 = b1 - learning_rate * db1
    W2 = W2 - learning_rate * dW2
    b2 = b2 - learning_rate * db2

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}

    return parameters

#Integration of functions in `nn_model()`

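`nn_model()` combines the functions above: it initializes the parameters, then repeats forward propagation, cost computation, backward propagation, and a gradient descent update for `num_iterations` iterations.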

In [27]:
def nn_model(X, Y, n_h, num_iterations = 10000, print_cost=False):


    np.random.seed(3)
    # Layer sizes are read from the data: one row per feature in X, one row per output in Y.
    n_x = X.shape[0]
    n_y = Y.shape[0]


    parameters = initialize_parameters(n_x, n_h, n_y)



    for i in range(0, num_iterations):


        A2, cache = forward_propagation(X, parameters)
        cost = computing_cost(A2,Y)

        grads =  backward_propagation(parameters, cache, X, Y)
        parameters =  gradient_descent(parameters, grads)

        if print_cost and i % 1000 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))

    return parameters
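
Once `nn_model()` has returned the parameters, class predictions can be obtained by running one more forward pass and thresholding the output probabilities. The `predict` helper below is hypothetical (it does not appear in the notebook), the 0.5 threshold is an assumption, and the parameter values are hand-picked for the demonstration rather than trained.

```python
import numpy as np

def predict(parameters, X, threshold=0.5):
    # Hypothetical helper: repeats the forward pass and thresholds A2.
    A1 = np.tanh(parameters["W1"] @ X + parameters["b1"])
    A2 = 1 / (1 + np.exp(-(parameters["W2"] @ A1 + parameters["b2"])))
    return (A2 > threshold).astype(int)

# Hand-picked parameters purely for illustration (not trained values).
params = {"W1": np.array([[1.0, 0.0]]), "b1": np.zeros((1, 1)),
          "W2": np.array([[5.0]]),      "b2": np.zeros((1, 1))}
X_demo = np.array([[-2.0, 2.0],
                   [ 0.0, 0.0]])       # two examples stored as columns

print(predict(params, X_demo))         # [[0 1]]
```

With real data, the dictionary returned by `nn_model(X, Y, n_h)` would take the place of `params`.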