<a href="https://colab.research.google.com/github/Risad-Raihan/2_layer_nn/blob/main/forward_and_backprop_for_2_layer_nns.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import numpy as np

In [2]:
def sigmoid(z):
  """Sigmoid Activation Function. """
  return 1/(1 + np.exp(-z))

In [3]:
print (sigmoid(1))

0.7310585786300049


In [4]:
def sigmoid_derivative(a):
  """Derivative of Sigmoid Activation Function. """
  return a * (1 - a)

In [5]:
print (sigmoid_derivative(sigmoid(1)))

0.19661193324148185


does it stay

In [6]:
def initialize_parameters(n_x, n_h, n_y):
  """Initialize weights and bases. """
  W1 = np.random.randn(n_h, n_x)
  b1 = np.zeros((n_h, 1))
  W2 = np.random.randn(n_y, n_h)
  b2 = np.zeros((n_y, 1))
  parameters = {"W1": W1,
                "b1": b1,
                "W2": W2,
                "b2": b2}
  return parameters


# Forward Propagation

In [7]:
def forward_propagation(X, parameters):
  """Forward Propagation. """
  W1 = parameters["W1"]
  b1 = parameters["b1"]
  W2 = parameters["W2"]
  b2 = parameters["b2"]

  # Layer 1
  Z1 = np.dot(W1, X) + b1
  A1 = np.tanh(Z1)

  # Layer 2
  Z2 = np.dot(W2, A1) + b2
  A2 = sigmoid(Z2)

  cache = {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}

  return A2, cache

In [8]:
def compute_cost(A2, Y):
  """Compute binary cross-entropy Cost. """
  m = Y.shape[1]
  cost = -(1/m) * np.sum(Y * np.log(A2) + (1-Y) * np.log(1-A2))
  cost = np.squeeze(cost) #scaler
  return cost


cost = -(1/m) * np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)): Calculates the binary cross-entropy cost function. This formula measures the difference between the predicted probabilities (A2) and the true labels (Y). The sum is taken over all training examples, and then the result is averaged by dividing by m. The negative sign ensures that minimizing the cost leads to better predictions. np.log is the natural logarithm, and the multiplication and subtraction are performed element-wise.

# Back Prop

In [13]:
def backward_propagation(parameters, cache, X, Y):
  m = X.shape[1]
  W1 = parameters["W1"]
  W2 = parameters["W2"]
  A1 = cache["A1"]
  A2 = cache["A2"]

  #output layer - layer 2
  dZ2 = A2 - Y
  dW2 = (1/m) * np.dot(dZ2, A1.T)
  db2 = (1/m) * np.sum(dZ2, axis=1, keepdims=True)

  #hidden layer - layer 1
  dA1 = np.dot(W2.T, dZ2)
  dZ1 = dA1 * sigmoid_derivative(A1)
  dW1 = (1/m) * np.dot(dZ1, X.T)
  db1 = (1/m) * np.sum(dZ1, axis=1, keepdims=True)

  grads = {"dW1": dW1,
           "db1": db1,
           "dW2": dW2,
           "db2": db2}
  return grads

dZ2 = A2 - Y: Calculates the error term for the output layer. For a sigmoid output with a binary cross-entropy loss, the derivative of the loss with respect to the linear output Z2 is simply A2−Y. The shape of dZ2 is (n_y, m).
dW2 = (1/m) * np.dot(dZ2, A1.T): Calculates the gradient of the cost with respect to the weights W2. np.dot(dZ2, A1.T) performs the matrix multiplication of dZ2 (shape (n_y, m)) and the transpose of A1 (shape (m, n_h)), resulting in a matrix of shape (n_y, n_h), which is the same shape as W2. The (1/m) factor averages the gradients over all training examples.
db2 = (1/m) * np.sum(dZ2, axis=1, keepdims=True): Calculates the gradient of the cost with respect to the bias b2. np.sum(dZ2, axis=1, keepdims=True) sums the error dZ2 along the rows (axis=1), resulting in a vector of shape (n_y, 1), which is the same shape as b2. keepdims=True ensures that the result has the correct dimensions for broadcasting during parameter update. The (1/m) factor averages the gradients.
# Hidden Layer (Layer 1): Comment indicating calculations for the hidden layer.
dA1 = np.dot(W2.T, dZ2): Calculates the derivative of the cost with respect to the activations of the hidden layer A1. np.dot(W2.T, dZ2) multiplies the transpose of W2 (shape (n_h, n_y)) with dZ2 (shape (n_y, m)), resulting in a matrix of shape (n_h, m), the same shape as A1. This propagates the error back to the previous layer.
dZ1 = dA1 * sigmoid_derivative(A1): Calculates the derivative of the cost with respect to the linear output Z1 of the hidden layer. This is obtained by multiplying the upstream gradient dA1 element-wise with the derivative of the sigmoid activation function evaluated at A1. The * operator performs element-wise multiplication. The shape of dZ1 is (n_h, m).
dW1 = (1/m) * np.dot(dZ1, X.T): Calculates the gradient of the cost with respect to the weights W1. np.dot(dZ1, X.T) multiplies dZ1 (shape (n_h, m)) with the transpose of the input X (shape (m, n_x)), resulting in a matrix of shape (n_h, n_x), the same shape as W1. The (1/m) factor averages the gradients.
db1 = (1/m) * np.sum(dZ1, axis=1, keepdims=True): Calculates the gradient of the cost with respect to the bias b1. np.sum(dZ1, axis=1, keepdims=True) sums dZ1 along the rows (axis=1), resulting in a vector of shape (n_h, 1), the same shape as b1. The (1/m) factor averages the gradients.
grads = {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}: Creates a dictionary grads to store the calculated gradients for W1, b1, W2, and b2.

In [14]:
def update_parameters(parameters, grads, learning_rate):
  """Update parameters using gradient descent. """
  W1 = parameters["W1"]
  b1 = parameters["b1"]
  W2 = parameters["W2"]
  b2 = parameters["b2"]
  dW1 = grads["dW1"]
  db1 = grads["db1"]
  dW2 = grads["dW2"]
  db2 = grads["db2"]

  W1 = W1 - learning_rate * dW1
  b1 = b1 - learning_rate * db1
  W2 = W2 - learning_rate * dW2
  b2 = b2 - learning_rate * db2

  parameters = {"W1": W1,
                "b1": b1,
                "W2": W2,
                "b2": b2}
  return parameters



# **Example Usage**

In [16]:
if __name__ == "__main__":
  #structre of the netwprk
  n_x = 2   # Number of input features
  n_h = 4   # Number of neurons in the hidden layer
  n_y = 1   # Number of neurons in the output layer (binary classification)
  m = 100   # Number of training examples

  #Generate some dummy data
  X= np.random.randn(n_x, m)
  Y = np.random.randint(0,2, size =(1,m))

  #initialise parameters
  parameters = initialize_parameters(n_x, n_h, n_y)

  #set learning rate
  learning_rate = 0.01

  #training loop
  num_iterations = 10000
  for i in range(num_iterations):
    #forward propagation
    A2, cache = forward_propagation(X, parameters)

    #compute cost
    cost = compute_cost(A2, Y)

    #Back prop
    grads = backward_propagation(parameters, cache, X, Y)

    #update parameters
    parameters = update_parameters(parameters, grads, learning_rate)

    #print cost every 100 num_iterations
    if i % 100 == 0:
      print(f"Cost after iteration {i}: {cost}")



Cost after iteration 0: 0.7777571291611867
Cost after iteration 100: 0.7592389228829867
Cost after iteration 200: 0.7474377696627231
Cost after iteration 300: 0.7401474771370038
Cost after iteration 400: 0.7358723670180937
Cost after iteration 500: 0.733646432183807
Cost after iteration 600: 0.732805254633658
Cost after iteration 700: 0.7328038970428399
Cost after iteration 800: 0.7331165773948444
Cost after iteration 900: 0.7332339815454566
Cost after iteration 1000: 0.7327459809517975
Cost after iteration 1100: 0.731448552408488
Cost after iteration 1200: 0.7293909270463661
Cost after iteration 1300: 0.7268229608069322
Cost after iteration 1400: 0.7240760706733451
Cost after iteration 1500: 0.7214448217433211
Cost after iteration 1600: 0.7191162523498826
Cost after iteration 1700: 0.7171553271652921
Cost after iteration 1800: 0.7155295996000233
Cost after iteration 1900: 0.7141520650814682
Cost after iteration 2000: 0.712927551088335
Cost after iteration 2100: 0.7117901466074722
Cost

In [17]:

    #Make prediction
    A2_final, _ = forward_propagation(X, parameters)
    predictions = (A2_final > 0.5) * 1

    #calculate accuracy
    accuracy = np.mean((predictions == Y).astype(int))

    print (f"Accuracy: {accuracy}")

Accuracy: 0.52
