## 2-Layer Neural Network Model
A neural network with a single hidden layer.

In [18]:
import numpy as np

**ReLU activation function**

$g(z) = \max(0, z)$ for $z = w \cdot x + b$

**Sigmoid function**

$\sigma(z) = \frac{1}{1 + e^{-z}}$ for $z = w \cdot x + b$

In [19]:
def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))
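
As a quick sanity check of the two activations, the sketch below evaluates them on a small hand-picked array. The input values are illustrative only and are not part of the original notebook.

```python
import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

# Illustrative inputs: one negative, one zero, one positive value
Z_demo = np.array([[-2.0, 0.0, 3.0]])

relu_out = relu(Z_demo)        # negative entries are clipped to 0
sigmoid_out = sigmoid(Z_demo)  # every entry lies strictly between 0 and 1

print(relu_out)
print(sigmoid_out)
```

ReLU leaves positive values unchanged, while the sigmoid maps 0 to exactly 0.5, which is why 0.5 is the natural decision threshold for the output layer.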

### Forward Propagation

We implement `forward_prop()` using the following equations:

$Z^{[1]} =  W^{[1]} X + b^{[1]}$

$A^{[1]} = relu(Z^{[1]})$

$Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}$

$\hat{Y} = A^{[2]} = \sigma(Z^{[2]})$

In [20]:
def forward_prop(W1, b1, W2, b2, X):
    Z1 = np.dot(W1, X) + b1
    A1 = relu(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)
    return Z1, A1, Z2, A2
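
Shape errors are the most common mistake in a forward pass, so it can help to check the dimensions explicitly. The sketch below repeats the functions above so it runs on its own; the sizes (`n_x`, `n_h`, `n_y`, `m`) and the random inputs are assumptions chosen for illustration.

```python
import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def forward_prop(W1, b1, W2, b2, X):
    Z1 = np.dot(W1, X) + b1
    A1 = relu(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)
    return Z1, A1, Z2, A2

# Illustrative sizes: 2 features, 5 hidden units, 1 output, 7 examples
n_x, n_h, n_y, m = 2, 5, 1, 7
rng = np.random.default_rng(0)

X_demo = rng.normal(size=(n_x, m))
W1 = rng.normal(size=(n_h, n_x)) * 0.01
b1 = np.zeros((n_h, 1))   # broadcast across the m columns
W2 = rng.normal(size=(n_y, n_h)) * 0.01
b2 = np.zeros((n_y, 1))

Z1, A1, Z2, A2 = forward_prop(W1, b1, W2, b2, X_demo)
print(Z1.shape, A1.shape, Z2.shape, A2.shape)
```

Each column of `X_demo` is one example, so the hidden-layer arrays have shape `(n_h, m)` and the output arrays have shape `(n_y, m)`.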

### Cost Function
After computing $A^{[2]}$ (the Python variable `A2`), which contains $a^{[2](i)}$ for every example, the cross-entropy cost is:
$$J = - \frac{1}{m} \sum\limits_{i = 1}^{m} \left( y^{(i)}\log\left(a^{[2] (i)}\right) + (1-y^{(i)})\log\left(1- a^{[2] (i)}\right) \right)$$

In [21]:
def compute_cost(A2, Y):
    """
    Computes the cross-entropy cost given in the equation above.
    
    """
    m = Y.shape[1] # number of examples

    logprobs = np.multiply(np.log(A2),Y) + np.multiply(np.log(1 - A2),1 - Y)
    cost = - (1/m) * np.sum(logprobs)
    
    cost = float(cost)
    
    return cost
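
One practical caveat: if an activation reaches exactly 0 or 1, `np.log` returns `-inf` and the cost can become `nan`. The sketch below shows one possible guard, clipping the activations before taking the logarithm. The function name `compute_cost_clipped` and the `eps` value are hypothetical and not part of the original notebook.

```python
import numpy as np

def compute_cost_clipped(A2, Y, eps=1e-12):
    """Cross-entropy cost with activations clipped away from 0 and 1.

    eps is an assumed tolerance; it only matters when A2 saturates.
    """
    m = Y.shape[1]  # number of examples
    A2_safe = np.clip(A2, eps, 1 - eps)
    logprobs = Y * np.log(A2_safe) + (1 - Y) * np.log(1 - A2_safe)
    return float(-np.sum(logprobs) / m)

Y_demo = np.array([[1, 0]])

# Saturated but correct predictions: the cost stays finite and close to 0
cost_saturated = compute_cost_clipped(np.array([[1.0, 0.0]]), Y_demo)

# Uninformative predictions of 0.5: the cost equals log(2)
cost_uniform = compute_cost_clipped(np.array([[0.5, 0.5]]), Y_demo)

print(cost_saturated, cost_uniform)
```

The second case is a useful reference point: a network that outputs 0.5 for every example has a cost of $\log 2 \approx 0.6931$, so a freshly initialized model with small weights should start near that value.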

### Backward propagation

We implement `back_prop()` using the following equations:

$dZ^{[2]} = A^{[2]} - Y$ 

$dW^{[2]} = \frac{1}{m}dZ^{[2]} A^{[1]T}$

$db^{[2]} = \frac{1}{m} np.sum(dZ^{[2]}, axis = 1, keepdims=True)$

$dZ^{[1]} = W^{[2]T} dZ^{[2]} * g^{[1]'}(Z^{[1]})$

$dW^{[1]} = \frac{1}{m}dZ^{[1]} X^{T}$

$db^{[1]} = \frac{1}{m} np.sum(dZ^{[1]}, axis = 1, keepdims=True)$

To implement $g^{[1]'}$ for the ReLU function:

$$g^{[1]'} = \begin{cases}
      1 & \text{if}\ Z > 0 \\
      0 & \text{otherwise}
    \end{cases}$$


In [22]:
def drelu(Z):
    return np.where(Z > 0, 1, 0)
    
def back_prop(W1, b1, W2, b2, X, Y, Z1, A1, Z2, A2):
    
    m = X.shape[1]
    
    dZ2 = A2 - Y
    dW2 = (1 / m) * np.dot(dZ2, A1.T)
    db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = np.dot(W2.T, dZ2) * drelu(Z1)
    dW1 = (1 / m) * np.dot(dZ1, X.T)
    db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)
    
    return dW2, db2, dW1, db1
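
A common way to gain confidence in back-propagation equations is a numerical gradient check: perturb each parameter slightly and compare the resulting change in cost with the analytic gradient. The sketch below is self-contained and uses its own helper names (`forward_cost`, `analytic_grads`, `numeric_grads`), random data, and a step size `eps`, all of which are assumptions for illustration rather than part of the original notebook.

```python
import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def forward_cost(params, X, Y):
    """Forward pass followed by the cross-entropy cost."""
    Z1 = params["W1"] @ X + params["b1"]
    A1 = relu(Z1)
    A2 = sigmoid(params["W2"] @ A1 + params["b2"])
    m = Y.shape[1]
    cost = -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m
    return cost, (Z1, A1, A2)

def analytic_grads(params, X, Y):
    """Gradients from the back-propagation equations."""
    _, (Z1, A1, A2) = forward_cost(params, X, Y)
    m = X.shape[1]
    dZ2 = A2 - Y
    dZ1 = (params["W2"].T @ dZ2) * (Z1 > 0)  # ReLU derivative as a mask
    return {
        "W1": dZ1 @ X.T / m,
        "b1": np.sum(dZ1, axis=1, keepdims=True) / m,
        "W2": dZ2 @ A1.T / m,
        "b2": np.sum(dZ2, axis=1, keepdims=True) / m,
    }

def numeric_grads(params, X, Y, eps=1e-5):
    """Central-difference approximation of the same gradients."""
    grads = {}
    for name, value in params.items():
        grad = np.zeros_like(value)
        for idx in np.ndindex(value.shape):
            original = value[idx]
            value[idx] = original + eps
            cost_plus, _ = forward_cost(params, X, Y)
            value[idx] = original - eps
            cost_minus, _ = forward_cost(params, X, Y)
            value[idx] = original  # restore the parameter
            grad[idx] = (cost_plus - cost_minus) / (2 * eps)
        grads[name] = grad
    return grads

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(2, 4))
Y_demo = np.array([[0, 1, 1, 0]])
params = {
    "W1": rng.normal(size=(3, 2)) * 0.5,
    "b1": rng.normal(size=(3, 1)) * 0.5,
    "W2": rng.normal(size=(1, 3)) * 0.5,
    "b2": rng.normal(size=(1, 1)) * 0.5,
}

analytic = analytic_grads(params, X_demo, Y_demo)
numeric = numeric_grads(params, X_demo, Y_demo)
max_diff = max(np.max(np.abs(analytic[k] - numeric[k])) for k in params)
print("largest absolute difference:", max_diff)
```

If the two sets of gradients disagree by much more than the finite-difference error, one of the equations (or its implementation) is likely wrong. The check is slow, so it is normally run once on a tiny network rather than during training.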

### Update Parameters Using Gradient Descent

**General gradient descent rule**: $\theta = \theta - \alpha \frac{\partial J }{ \partial \theta }$ where $\alpha$ is the learning rate and $\theta$ represents a parameter.
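
To see the update rule in isolation, the sketch below applies it to a one-parameter toy cost $J(\theta) = (\theta - 3)^2$, whose gradient is $2(\theta - 3)$. The toy cost, starting point, learning rate, and iteration count are all assumptions chosen for illustration.

```python
# Toy cost J(theta) = (theta - 3)^2 with its minimum at theta = 3
def grad_J(theta):
    return 2 * (theta - 3)

theta = 0.0   # assumed starting point
alpha = 0.1   # assumed learning rate

for _ in range(100):
    theta = theta - alpha * grad_J(theta)  # the general update rule

print(theta)
```

Each step shrinks the distance to the minimum by a constant factor of $1 - 2\alpha = 0.8$, so `theta` approaches 3. The network training loop applies the same rule to every entry of `W1`, `b1`, `W2`, and `b2`.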

In [50]:
def gradient_descent(W1, b1, W2, b2, X, Y, iterations, alpha, print_cost=False):
    for i in range(iterations):
        
        Z1, A1, Z2, A2 = forward_prop(W1, b1, W2, b2, X)
        cost = compute_cost(A2, Y)
        accuracy = np.mean((A2 > 0.5) == Y) * 100  # Accuracy with the same 0.5 threshold as predict()
        
        dW2, db2, dW1, db1 = back_prop(W1, b1, W2, b2, X, Y, Z1, A1, Z2, A2)
        
        W1 = W1 - alpha * dW1
        b1 = b1 - alpha * db1
        W2 = W2 - alpha * dW2
        b2 = b2 - alpha * db2
        
        # Print the cost and accuracy every 1000 iterations
        if print_cost and i % 1000 == 0:
            print("Cost after iteration %i: %f, Accuracy: %.2f%%" % (i, cost, accuracy))
            
    return W1, b1, W2, b2, A2

In [51]:
def nn_model(X, Y, n_h, num_iterations = 10000, print_cost=False, learning_rate = 0.01 ):
    
    # Define the dimensions of the neural network
    input_size = X.shape[0]
    hidden_size = n_h
    output_size = Y.shape[0]
    
    # Initialize weights and biases
    W1 = np.random.randn(hidden_size, input_size) * 0.01
    b1 = np.zeros((hidden_size, 1))
    W2 = np.random.randn(output_size, hidden_size) * 0.01
    b2 = np.zeros((output_size, 1))

    # Train the neural network
    W1, b1, W2, b2, A2 = gradient_descent(W1, b1, W2, b2, X, Y, num_iterations, learning_rate, print_cost)

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}

    return parameters
    

### Predict

$y_{prediction} = \mathbb{1}(\text{activation} > 0.5) = \begin{cases}
      1 & \text{if}\ \text{activation} > 0.5 \\
      0 & \text{otherwise}
    \end{cases}$ 

In [44]:
def predict(W1, b1, W2, b2, X):
    
    _, _, _, A2 = forward_prop(W1, b1, W2, b2, X)
    predictions = np.round(A2)
    
    return predictions
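
`predict()` uses `np.round` rather than an explicit comparison. The sketch below, on illustrative activations, shows that the two agree: NumPy rounds halves to the nearest even value, so an activation of exactly 0.5 rounds down to 0, matching the strict `> 0.5` rule.

```python
import numpy as np

# Illustrative activations, including the boundary value 0.5
A2_demo = np.array([[0.2, 0.5, 0.8]])

rounded = np.round(A2_demo)                    # round half to even: 0.5 -> 0.0
thresholded = (A2_demo > 0.5).astype(float)    # explicit strict threshold

print(rounded)
print(thresholded)
```

The explicit comparison makes the threshold visible and is easy to adjust if a cutoff other than 0.5 is ever needed.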

In [56]:
# Updated sample data
# Input features, shape (n_x, m): n_x features, m training examples
X = np.array([[0, 1, 4, 5, 2, 5, 3], [2, 3, 5, 6, 2, 7, 4]])
# Output labels, shape (1, m)
Y = np.array([[0, 0, 1, 1, 0, 1, 0]])

parameters = nn_model(X, Y, 5, num_iterations = 10000, print_cost=True, learning_rate = 0.01 )

W1 = parameters["W1"]
b1 = parameters["b1"]
W2 = parameters["W2"]
b2 = parameters["b2"]

Cost after iteration 0: 0.693179, Accuracy: 57.14%
Cost after iteration 1000: 0.412717, Accuracy: 85.71%
Cost after iteration 2000: 0.238421, Accuracy: 100.00%
Cost after iteration 3000: 0.162656, Accuracy: 100.00%
Cost after iteration 4000: 0.119458, Accuracy: 100.00%
Cost after iteration 5000: 0.094788, Accuracy: 100.00%
Cost after iteration 6000: 0.076591, Accuracy: 100.00%
Cost after iteration 7000: 0.062894, Accuracy: 100.00%
Cost after iteration 8000: 0.052424, Accuracy: 100.00%
Cost after iteration 9000: 0.044308, Accuracy: 100.00%


In [47]:
W1, b1, W2, b2

(array([[ 1.18521062e+00, -1.56561054e-01],
        [-1.93231620e-02, -2.27658143e-03],
        [-1.25176114e-02, -7.81241153e-04],
        [ 8.98739180e-01, -1.22583608e-01],
        [ 5.87120447e-03, -5.85625696e-03]]),
 array([[-2.45028576e+00],
        [ 0.00000000e+00],
        [ 0.00000000e+00],
        [-1.84992332e+00],
        [-3.45574402e-05]]),
 array([[ 2.72632243e+00,  1.36743310e-02, -2.87543899e-03,
          2.06031188e+00,  2.31045407e-03]]),
 array([[-4.07940946]]))

In [45]:
# Use the trained weights and biases to make predictions
predictions = predict(W1, b1, W2, b2, X)

print("Predictions:")
print(predictions)

Predictions:
[[0. 0. 1. 1. 0. 1. 0.]]
