# Exercise: Implementing Forward Propagation in a Neural Network

**Level:** Intermediate

## 🎯 Learning Objectives

By the end of this exercise, you will be able to:

* Understand the fundamental concept of **forward propagation** in a neural network.
* Implement the linear transformation part of a layer.
* Implement common activation functions: **ReLU** and **Sigmoid**.
* Combine these components to build a complete forward propagation pass for a multi-layer neural network (L-layer model).
* Store intermediate values (cache) required for backpropagation (though backpropagation itself is not part of this exercise).

---

## 🧠 Theoretical Introduction

**Forward propagation** is the process by which input data is fed through a neural network, layer by layer, to produce an output prediction. Each layer in the network performs two main operations:

1.  **Linear Transformation:** A linear combination of the inputs from the previous layer with the current layer's weights, plus a bias term. For a layer $l$, this is calculated as:
    $$Z^{[l]} = W^{[l]}A^{[l-1]} + b^{[l]}$$
    Where:
    * $A^{[l-1]}$ is the activation (output) from the previous layer (or the input data $X$ for the first layer, $A^{[0]} = X$).
    * $W^{[l]}$ is the weight matrix for the current layer $l$.
    * $b^{[l]}$ is the bias vector for the current layer $l$.
    * $Z^{[l]}$ is the linear output of layer $l$, sometimes called the pre-activation.

2.  **Activation Function:** A non-linear function applied element-wise to $Z^{[l]}$ to produce the output (activation) of the current layer $A^{[l]}$.
    $$A^{[l]} = g^{[l]}(Z^{[l]})$$
    Where $g^{[l]}$ is the activation function for layer $l$. Common activation functions include:
    * **Sigmoid:** $\sigma(z) = \frac{1}{1 + e^{-z}}$. Often used in the output layer for binary classification problems because it maps any real input into the interval $(0, 1)$, so the output can be read as a probability.
    * **ReLU (Rectified Linear Unit):** $\mathrm{ReLU}(z) = \max(0, z)$. Commonly used in hidden layers because it is cheap to compute and, since its gradient is 1 for positive inputs, it helps mitigate the vanishing gradient problem.
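To make the two steps concrete, here is a tiny worked example of one layer with two units, two input features and a single example. The numbers in `W`, `b` and `A_prev` are arbitrary values chosen for illustration and are not taken from the exercise.

```python
import numpy as np

# Arbitrary illustrative values: 2 input features, 2 units, 1 example.
A_prev = np.array([[1.0],
                   [2.0]])          # shape (n[l-1], m) = (2, 1)
W = np.array([[0.5, -1.0],
              [1.0,  1.0]])         # shape (n[l], n[l-1]) = (2, 2)
b = np.array([[0.5],
              [-2.0]])              # shape (n[l], 1) = (2, 1)

# Step 1: linear transformation Z = W A_prev + b
# Row 1: 0.5*1 - 1.0*2 + 0.5 = -1.0 ; Row 2: 1.0*1 + 1.0*2 - 2.0 = 1.0
Z = np.dot(W, A_prev) + b

# Step 2: element-wise activations
A_relu = np.maximum(0, Z)           # [[0.0], [1.0]]
A_sigmoid = 1 / (1 + np.exp(-Z))    # approx [[0.2689], [0.7311]]

print(Z.ravel(), A_relu.ravel(), A_sigmoid.ravel())
```

Note that both activations keep the shape of `Z`; only the values change.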

For an **L-layer neural network**, this process is repeated for $L-1$ hidden layers, typically using an activation function like ReLU. The final output layer then uses an appropriate activation function for the task (e.g., Sigmoid for binary classification, Softmax for multi-class classification).

In this exercise, you will implement a generic `L_model_forward` function that performs forward propagation for a network with the following architecture:
`[LINEAR -> RELU] * (L-1) -> LINEAR -> SIGMOID`

This means all hidden layers will use the ReLU activation function, and the output layer will use the Sigmoid activation function. You will also need to implement helper functions for the linear step and for applying activations.
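Written out layer by layer, the architecture above corresponds to the following recursion. The symbols $n^{[l]}$ (number of units in layer $l$, with $n^{[0]}$ the number of input features) and $m$ (number of examples) are introduced here for convenience, and $\hat{Y}$ is the network output, called `AL` in the code.

$$
\begin{aligned}
A^{[0]} &= X \\
Z^{[l]} &= W^{[l]}A^{[l-1]} + b^{[l]}, \qquad A^{[l]} = \mathrm{ReLU}\left(Z^{[l]}\right), \qquad l = 1, \dots, L-1 \\
Z^{[L]} &= W^{[L]}A^{[L-1]} + b^{[L]}, \qquad \hat{Y} = A^{[L]} = \sigma\left(Z^{[L]}\right)
\end{aligned}
$$

with shapes $W^{[l]} \in \mathbb{R}^{n^{[l]} \times n^{[l-1]}}$, $b^{[l]} \in \mathbb{R}^{n^{[l]} \times 1}$ and $Z^{[l]}, A^{[l]} \in \mathbb{R}^{n^{[l]} \times m}$.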

---

## 📋 Exercise Instructions

Your task is to complete the Python functions provided in the code cells below. Specifically, you will need to:

1.  **Implement `sigmoid(Z)`:** Computes the sigmoid activation.
2.  **Implement `relu(Z)`:** Computes the ReLU activation.
3.  **Complete `initialize_parameters_deep(layer_dims)`:** Initializes weights and biases for an L-layer network.
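    * Weights $W$ should be initialized with small random numbers, e.g. `np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01` (note that `np.random.randn` takes each dimension as a separate argument, not a shape tuple).
    * Biases $b$ should be initialized to zeros, e.g. `np.zeros((layer_dims[l], 1))` (`np.zeros` takes a shape tuple).
4.  **Complete `linear_forward(A_prev, W, b)`:** Implements the linear part of a layer's forward propagation ($Z = WA_{prev} + b$).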
5.  **Complete `linear_activation_forward(A_prev, W, b, activation)`:** Implements one step of forward propagation (LINEAR -> ACTIVATION). This function will use `linear_forward` and then apply either `relu` or `sigmoid` based on the `activation` argument.
6.  **Complete `L_model_forward(X, parameters)`:** Implements the full forward propagation for the `[LINEAR->RELU]*(L-1)->LINEAR->SIGMOID` model. This function will call `linear_activation_forward` iteratively.
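Before writing `initialize_parameters_deep`, it can help to list the shapes it must produce. The sketch below uses a hypothetical helper, `parameter_shapes`, that only computes the expected shapes from `layer_dims` (it creates no arrays); the rule it encodes is the one stated in the function's docstring: `Wl` has shape `(layer_dims[l], layer_dims[l-1])` and `bl` has shape `(layer_dims[l], 1)`.

```python
def parameter_shapes(layer_dims):
    """Hypothetical helper: expected shapes of W1, b1, ..., WL, bL for a given layer_dims list."""
    shapes = {}
    # layer_dims[0] is the input size, so the layers are numbered 1 .. len(layer_dims) - 1
    for l in range(1, len(layer_dims)):
        shapes['W' + str(l)] = (layer_dims[l], layer_dims[l - 1])
        shapes['b' + str(l)] = (layer_dims[l], 1)
    return shapes

# Example: 2 input features, one hidden layer of 3 units, 1 output unit
for name, shape in parameter_shapes([2, 3, 1]).items():
    print(name, shape)
# W1 (3, 2)
# b1 (3, 1)
# W2 (1, 3)
# b2 (1, 1)
```

Since each layer contributes exactly one `W` and one `b`, the number of layers $L$ equals `len(layer_dims) - 1`, which is also why `L_model_forward` can recover $L$ as `len(parameters) // 2`.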

You will find placeholders like `# YOUR CODE HERE` (closed by `# END OF YOUR CODE`) where you need to add your implementation. Make sure to also return the `cache` at each step, as it stores the values ($Z$, $A_{prev}$, $W$, $b$) needed for backpropagation (which you might implement in a future exercise!).
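The nesting of a single layer's cache is easy to get wrong. The sketch below assumes the structure described in the docstrings, `cache = (linear_cache, activation_cache)` with `linear_cache = (A_prev, W, b)` and `activation_cache = Z`. The function `layer_forward_relu` is a hypothetical stand-in for `linear_activation_forward(..., activation="relu")`, written out so that the block runs on its own; the input values are arbitrary.

```python
import numpy as np

def layer_forward_relu(A_prev, W, b):
    """Hypothetical stand-in for linear_activation_forward(A_prev, W, b, "relu")."""
    Z = np.dot(W, A_prev) + b
    A = np.maximum(0, Z)
    linear_cache = (A_prev, W, b)   # what linear_forward stores
    activation_cache = Z            # what relu stores
    return A, (linear_cache, activation_cache)

rng = np.random.default_rng(0)
A_prev = rng.standard_normal((2, 5))   # 2 features, 5 examples
W = rng.standard_normal((3, 2)) * 0.01
b = np.zeros((3, 1))

A, cache = layer_forward_relu(A_prev, W, b)

# Unpacking mirrors how the cache was built: ((A_prev, W, b), Z)
(A_prev_c, W_c, b_c), Z_c = cache
print(A_prev_c.shape, W_c.shape, b_c.shape, Z_c.shape)  # (2, 5) (3, 2) (3, 1) (3, 5)
```

`L_model_forward` then appends one such tuple per layer, so `caches[l - 1]` holds the cache of layer `l`.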

After implementing the functions, you can run the **Unit Test** cells to check your work.

**Important:** Pay close attention to the dimensions of your matrices and vectors. Using `np.dot()` for matrix multiplication and broadcasting for biases will be key.
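As an illustration of these dimension rules, the sketch below (with arbitrary random values) shows `np.dot` turning a `(3, 2)` weight matrix and a `(2, 5)` activation matrix into a `(3, 5)` result, and NumPy broadcasting adding the `(3, 1)` bias column to each of the 5 columns. It also shows that the element-wise operator `*` is not a substitute for `np.dot` here.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((3, 2))       # (n[l], n[l-1])
A_prev = rng.standard_normal((2, 5))  # (n[l-1], m)
b = np.array([[1.0], [2.0], [3.0]])   # (n[l], 1)

WA = np.dot(W, A_prev)                # matrix product -> (3, 5)
Z = WA + b                            # b is broadcast across the 5 columns -> (3, 5)
print(WA.shape, Z.shape)              # (3, 5) (3, 5)

# Every column of Z received the same bias vector
print(np.allclose(Z - WA, np.tile(b, (1, 5))))  # True

# Element-wise multiplication is a different operation and fails for these shapes
mismatch_raised = False
try:
    W * A_prev
except ValueError as err:
    mismatch_raised = True
    print("W * A_prev fails:", err)
```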

Let's get started!

## Code Exercise

In [1]:
import numpy as np

def sigmoid(Z):
    """
    Implements the sigmoid activation function.
    Arguments:
    Z -- numpy array of any shape
    Returns:
    A -- output of sigmoid(Z), same shape as Z
    cache -- returns Z as well, useful for backpropagation
    """
    # YOUR CODE HERE (approximately 1-2 lines)
    A = 1 / (1 + np.exp(-Z))
    cache = Z
    # END OF YOUR CODE
    return A, cache

def relu(Z):
    """
    Implements the ReLU activation function.
    Arguments:
    Z -- numpy array of any shape
    Returns:
    A -- output of relu(Z), same shape as Z
    cache -- returns Z as well, useful for backpropagation
    """
    # YOUR CODE HERE (approximately 1-2 lines)
    A = np.maximum(0, Z)
    cache = Z
    # END OF YOUR CODE
    return A, cache

def initialize_parameters_deep(layer_dims):
    """
    Arguments:
    layer_dims -- python list containing the dimensions of each layer in our network

    Returns:
    parameters -- python dictionary containing your parameters "W1", "b1", ..., "WL", "bL":
                    Wl -- weight matrix of shape (layer_dims[l], layer_dims[l-1])
                    bl -- bias vector of shape (layer_dims[l], 1)
    """
    np.random.seed(3)
    parameters = {}
    num_layers = len(layer_dims)  # includes the input layer, so there are num_layers - 1 weight layers

    for l in range(1, num_layers):
        # YOUR CODE HERE (approximately 2 lines)
        # Initialize Wl and bl.
        # Hint: use np.random.randn multiplying by a small factor (e.g., 0.01)
        # Hint: use np.zeros for biases.
        parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))
        # END OF YOUR CODE

        assert(parameters['W' + str(l)].shape == (layer_dims[l], layer_dims[l-1]))
        assert(parameters['b' + str(l)].shape == (layer_dims[l], 1))

    return parameters

def linear_forward(A_prev, W, b):
    """
    Implements the linear part of a layer's forward propagation.

    Arguments:
    A_prev -- activations from previous layer (or input data): (size of previous layer, number of examples)
    W -- weights matrix: (size of current layer, size of previous layer)
    b -- bias vector: (size of current layer, 1)

    Returns:
    Z -- the input of the activation function, also called the pre-activation value
    cache -- a python tuple containing "A_prev", "W" and "b"; stored for efficiently computing the backward pass
    """
    # YOUR CODE HERE (approximately 1 line)
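    # Calculate Z using the formula Z = W A_prev + b, where W A_prev is a matrix product (not element-wise *)
    # Hint: use np.dot() for matrix multiplication; the (n, 1) bias b is broadcast across all columns.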
    Z = np.dot(W, A_prev) + b
    # END OF YOUR CODE

    assert(Z.shape == (W.shape[0], A_prev.shape[1]))
    cache = (A_prev, W, b)

    return Z, cache

def linear_activation_forward(A_prev, W, b, activation):
    """
    Implements the forward propagation for the LINEAR->ACTIVATION layer.

    Arguments:
    A_prev -- activations from previous layer (or input data): (size of previous layer, number of examples)
    W -- weights matrix: (size of current layer, size of previous layer)
    b -- bias vector: (size of current layer, 1)
    activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"

    Returns:
    A -- the output of the activation function, also called the post-activation value
    cache -- a python tuple containing "linear_cache" and "activation_cache";
             stored for efficiently computing the backward pass
    """
    linear_cache, activation_cache = None, None # Initialization to avoid errors if not completed

    if activation == "sigmoid":
        # YOUR CODE HERE (approximately 2 lines)
        # Z, linear_cache = ...
        # A, activation_cache = ...
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = sigmoid(Z)
        # END OF YOUR CODE

    elif activation == "relu":
        # YOUR CODE HERE (approximately 2 lines)
        # Z, linear_cache = ...
        # A, activation_cache = ...
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = relu(Z)
        # END OF YOUR CODE
    else:
        raise ValueError(f"activation must be 'sigmoid' or 'relu', got {activation!r}")

    assert (A.shape == (W.shape[0], A_prev.shape[1]))
    cache = (linear_cache, activation_cache)

    return A, cache

def L_model_forward(X, parameters):
    """
    Implements the forward propagation for the [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID model.

    Arguments:
    X -- data, numpy array of shape (input size, number of examples)
    parameters -- output of initialize_parameters_deep()

    Returns:
    AL -- activation of the last layer (output)
    caches -- list of caches containing:
                every cache of linear_activation_forward() (there are L-1 of them, indexed from 0 to L-2)
                the cache of linear_activation_forward() for the final layer (indexed L-1)
    """
    caches = []
    A = X
    num_layers = len(parameters) // 2 # number of layers in the neural network

    # Implement [LINEAR -> RELU]*(L-1). Add "cache" to the "caches" list.
    # The loop goes from 1 to L-1 because the last layer is different (Sigmoid).
    for l in range(1, num_layers):
        A_prev = A
        # YOUR CODE HERE (approximately 4 lines)
        # Get W, b from parameters.
        # Calculate A and cache using linear_activation_forward with "relu".
        # Store the cache.
        Wl = parameters['W' + str(l)]
        bl = parameters['b' + str(l)]
        A, cache = linear_activation_forward(A_prev, Wl, bl, activation="relu")
        caches.append(cache)
        # END OF YOUR CODE

    # Implement LINEAR -> SIGMOID. Add "cache" to the "caches" list.
    # This is the final layer L.
    # YOUR CODE HERE (approximately 4 lines)
    # Get WL, bL from parameters.
    # Calculate AL and cache using linear_activation_forward with "sigmoid".
    # Store the cache.
    WL = parameters['W' + str(num_layers)]
    bL = parameters['b' + str(num_layers)]
    AL, cache = linear_activation_forward(A, WL, bL, activation="sigmoid")
    caches.append(cache)
    # END OF YOUR CODE

    assert(AL.shape == (parameters['W' + str(num_layers)].shape[0], X.shape[1]))

    return AL, caches


## Unit testing

In [2]:
# Cell for Unit Tests (student can execute this)
print("🧪 Running unit tests...")

# --- Test data preparation ---
def initialize_parameters_for_test():
    # Fixed parameters for a 2-layer network: 2 input features, 3 hidden units, 1 output unit
    # (equivalent to layer_dims = [2, 3, 1]). The weights are drawn directly with a fixed seed,
    # instead of calling initialize_parameters_deep(), so that this test depends only on the
    # forward-propagation functions.
    np.random.seed(1) # Seed for weight reproducibility
    W1 = np.random.randn(3, 2) * 0.01
    b1 = np.zeros((3, 1))
    W2 = np.random.randn(1, 3) * 0.01
    b2 = np.zeros((1, 1))
    parameters_test = {"W1": W1, "b1": b1, "W2": W2, "b2": b2}
    return parameters_test

parameters_test = initialize_parameters_for_test()
X_test = np.random.rand(2, 5) # 2 features, 5 examples

# --- Test 1: L_model_forward function execution ---
try:
    AL_test, caches_test = L_model_forward(X_test, parameters_test)
    test_1_passed = True
    print("✅ Test 1 (Execution): L_model_forward executed without errors.")
except Exception as e:
    test_1_passed = False
    print(f"❌ Test 1 (Execution): L_model_forward execution failed. Error: {e}")
    # If it fails here, subsequent tests might not run or might give additional errors.
    # It's important for the student to see this message first.
    AL_test, caches_test = None, None # To avoid errors in subsequent tests


# --- Test 2: Verify AL output shape ---
if test_1_passed and AL_test is not None:
    expected_AL_shape = (parameters_test["W2"].shape[0], X_test.shape[1]) # (n_y, m)
    if AL_test.shape == expected_AL_shape:
        print(f"✅ Test 2 (AL Shape): AL shape {AL_test.shape} is correct.")
    else:
        print(f"❌ Test 2 (AL Shape): Incorrect. Expected: {expected_AL_shape}, Got: {AL_test.shape}")
        print("    Hint: Check how the final activation is calculated and the dimensions of W and b in the last layer.")

# --- Test 3: Verify number of caches ---
# L = number of layers = len(parameters) // 2. There should be L caches.
if test_1_passed and caches_test is not None:
    expected_num_caches = len(parameters_test) // 2
    if len(caches_test) == expected_num_caches:
        print(f"✅ Test 3 (Number of Caches): The number of caches ({len(caches_test)}) is correct.")
    else:
        print(f"❌ Test 3 (Number of Caches): Incorrect. Expected: {expected_num_caches} caches, Got: {len(caches_test)}")
        print("    Hint: Make sure to store one cache for each layer, including the output layer.")

# --- Test 4: Verify cache content and shape (first cache as example) ---
# Each cache = (linear_cache, activation_cache)
# linear_cache = (A_prev, W, b)
# activation_cache = Z
if test_1_passed and caches_test is not None and len(caches_test) > 0:
    first_cache = caches_test[0] # Cache of the first hidden layer ([LINEAR->RELU])
    if len(first_cache) == 2:
        linear_cache_1, activation_cache_1_Z = first_cache
        if len(linear_cache_1) == 3: # A_prev, W1, b1
            A0_test, W1_test, b1_test = linear_cache_1
            # Check A0 shape (should be X_test)
            if A0_test.shape == X_test.shape:
                print("✅ Test 4.1 (A_prev shape in cache[0]): Correct.")
            else:
                print(f"❌ Test 4.1 (A_prev shape in cache[0]): Incorrect. Expected: {X_test.shape}, Got: {A0_test.shape}")

            # Check W1 shape
            if W1_test.shape == parameters_test["W1"].shape:
                print("✅ Test 4.2 (W1 shape in cache[0]): Correct.")
            else:
                print(f"❌ Test 4.2 (W1 shape in cache[0]): Incorrect. Expected: {parameters_test['W1'].shape}, Got: {W1_test.shape}")

            # Check Z shape in activation_cache_1_Z (should be (n_h, m))
            expected_Z1_shape = (parameters_test["W1"].shape[0], X_test.shape[1])
            if activation_cache_1_Z.shape == expected_Z1_shape:
                print(f"✅ Test 4.3 (Z1 shape in cache[0]): Correct.")
            else:
                print(f"❌ Test 4.3 (Z1 shape in cache[0]): Incorrect. Expected: {expected_Z1_shape}, Got: {activation_cache_1_Z.shape}")
        else:
            print(f"❌ Test 4 (linear_cache[0] structure): Incorrect. linear_cache should have 3 elements (A_prev, W, b). Got {len(linear_cache_1)}.")
    else:
        print(f"❌ Test 4 (cache[0] structure): Incorrect. Each cache should be a tuple of 2 elements (linear_cache, activation_cache). Got {len(first_cache)}.")

# --- Test 5: Value verification against precomputed reference values ---
# The expected values were computed from the fixed parameters of initialize_parameters_for_test()
# and a fixed input, following the model's forward pass:
#   Z1 = W1.dot(X) + b1;  A1 = relu(Z1);  Z2 = W2.dot(A1) + b2;  AL = sigmoid(Z2)
# A fixed input (rather than X_test) is used so that the check is self-contained and reproducible.
if test_1_passed and AL_test is not None:
    # Fixed input: the values of np.random.seed(1); np.random.rand(2, 5), rounded to 8 decimals
    X_reproducible_test = np.array([[0.417022, 0.72032449, 0.00011437, 0.30233257, 0.14675589],
                                    [0.09233859, 0.18626021, 0.34556073, 0.39676747, 0.53881673]])
    # Parameters from initialize_parameters_for_test() with seed 1 (rounded):
    # W1 = [[ 0.01624345, -0.00611756], [-0.00528172, -0.01072969], [ 0.00865408, -0.02301539]]
    # b1 = [[0.], [0.], [0.]]
    # W2 = [[ 0.01744812, -0.00761207,  0.00319039]]
    # b2 = [[0.]]

    # Reference output for this input. Columns 3 and 5 are exactly 0.5 because all hidden
    # pre-activations are negative for those examples, so ReLU outputs 0 and Z2 = 0.
    AL_expected_example = np.array([[0.50002827, 0.50004762, 0.5, 0.50001083, 0.5]])

    AL_reproducible_test, _ = L_model_forward(X_reproducible_test, parameters_test)

    if np.allclose(AL_reproducible_test, AL_expected_example, atol=1e-7):
        print("✅ Test 5 (AL Values): AL values are correct for a specific test input.")
    else:
        print("❌ Test 5 (AL Values): Incorrect for a specific test input.")
        print(f"    Expected (approx): {AL_expected_example}")
        print(f"    Obtained: {AL_reproducible_test}")
        print(f"    Difference: {np.abs(AL_reproducible_test - AL_expected_example)}")
        print("    Hint: Check the calculations in `linear_forward` and `linear_activation_forward`. Verify the `sigmoid` and `relu` activation functions.")
        print("    Make sure that the parameters (W, b) are being used correctly in each layer.")
        print("    Check the order of operations and the application of activation functions (ReLU for hidden layers, Sigmoid for output).")

print("🏁 Tests finished.")

🧪 Running unit tests...
✅ Test 1 (Execution): L_model_forward executed without errors.
✅ Test 2 (AL Shape): AL shape (1, 5) is correct.
✅ Test 3 (Number of Caches): The number of caches (2) is correct.
✅ Test 4.1 (A_prev shape in cache[0]): Correct.
✅ Test 4.2 (W1 shape in cache[0]): Correct.
✅ Test 4.3 (Z1 shape in cache[0]): Correct.
✅ Test 5 (AL Values): AL values are correct for a specific test input.
🏁 Tests finished.
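The reference values for Test 5 do not have to come from a separate solution module: they can be recomputed directly with NumPy from the same seed-1 parameters and the same `X_reproducible_test` input. The sketch below performs the computation outlined in the test's comments (linear, ReLU, linear, sigmoid) without calling any of the exercise functions, so it can serve as an independent check.

```python
import numpy as np

# Same parameters as initialize_parameters_for_test(): seed 1, small random W, zero b
np.random.seed(1)
W1 = np.random.randn(3, 2) * 0.01
b1 = np.zeros((3, 1))
W2 = np.random.randn(1, 3) * 0.01
b2 = np.zeros((1, 1))

# Same input as X_reproducible_test in Test 5
X = np.array([[0.417022, 0.72032449, 0.00011437, 0.30233257, 0.14675589],
              [0.09233859, 0.18626021, 0.34556073, 0.39676747, 0.53881673]])

Z1 = np.dot(W1, X) + b1
A1 = np.maximum(0, Z1)                  # ReLU hidden layer
Z2 = np.dot(W2, A1) + b2
AL_reference = 1 / (1 + np.exp(-Z2))    # sigmoid output layer

print(AL_reference)
```

For this input the result is approximately `[[0.50002827, 0.50004762, 0.5, 0.50001083, 0.5]]`. The third and fifth examples give exactly 0.5 because every hidden pre-activation is negative for them, so ReLU outputs zeros and `Z2` is 0.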

## Exercise Solution

In [3]:
# ███████████████████████████████████████████████████████████████████████████████████████████████████
# ██████████████████████████ P R O P O S E D   S O L U T I O N ██████████████████████████████████████
# ███████████████████████████████████████████████████████████████████████████████████████████████████
# (This cell would normally not be visible to the student)

# --- Helper Functions (Complete) ---
def sigmoid_solution(Z):
    A = 1/(1+np.exp(-Z))
    cache = Z
    return A, cache

def relu_solution(Z):
    A = np.maximum(0,Z)
    cache = Z
    return A, cache

def initialize_parameters_deep_solution(layer_dims):
    np.random.seed(3) # Maintain consistency if used elsewhere
    parameters = {}
    L = len(layer_dims)

    for l in range(1, L):
        parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))

        assert(parameters['W' + str(l)].shape == (layer_dims[l], layer_dims[l-1]))
        assert(parameters['b' + str(l)].shape == (layer_dims[l], 1))
    return parameters

def linear_forward_solution(A_prev, W, b):
    Z = np.dot(W, A_prev) + b
    assert(Z.shape == (W.shape[0], A_prev.shape[1]))
    cache = (A_prev, W, b)
    return Z, cache

def linear_activation_forward_solution(A_prev, W, b, activation):
    Z, linear_cache = linear_forward_solution(A_prev, W, b)
    if activation == "sigmoid":
        A, activation_cache = sigmoid_solution(Z)
    elif activation == "relu":
        A, activation_cache = relu_solution(Z)
    else:
        raise ValueError("Activation must be 'sigmoid' or 'relu'")

    assert (A.shape == (W.shape[0], A_prev.shape[1]))
    cache = (linear_cache, activation_cache)
    return A, cache

# --- Main Exercise Function (Solution) ---
def L_model_forward_solution(X, parameters):
    """
    Implements the forward propagation for the [LINEAR->RELU]*(L-1)->LINEAR->SIGMOID model.

    Arguments:
    X -- data, numpy array of shape (input size, number of examples)
    parameters -- output of initialize_parameters_deep()

    Returns:
    AL -- activation of the last layer (output)
    caches -- list of caches containing:
                every cache from linear_activation_forward() (there are L-1 of them, indexed from 0 to L-2)
                the cache from linear_activation_forward() for the final layer (indexed L-1)
    """
    caches = []
    A = X
    L = len(parameters) // 2 # number of layers in the neural network

    # Implement [LINEAR -> RELU]*(L-1). Add "cache" to the "caches" list.
    for l in range(1, L):
        A_prev = A
        Wl = parameters['W' + str(l)]
        bl = parameters['b' + str(l)]
        A, cache = linear_activation_forward_solution(A_prev, Wl, bl, activation="relu")
        caches.append(cache)

    # Implement LINEAR -> SIGMOID. Add "cache" to the "caches" list.
    WL = parameters['W' + str(L)]
    bL = parameters['b' + str(L)]
    AL, cache = linear_activation_forward_solution(A, WL, bL, activation="sigmoid")
    caches.append(cache)

    assert(AL.shape == (WL.shape[0], X.shape[1]))

    return AL, caches

# --- Notes for the solution (optional) ---
# - It is crucial to keep the order of operations: first Z = np.dot(W, A_prev) + b, then A = sigmoid(Z) or A = relu(Z).
# - Matrix dimensions must be consistent. Common errors are a missing or extra transpose, or swapped
#   arguments in np.dot() (e.g. np.dot(A_prev, W) instead of np.dot(W, A_prev)).
# - The `cache` is vital for backpropagation. Make sure it stores the correct components:
#   For linear_cache: (A_prev, W, b)
#   For activation_cache: Z
# - Parameter initialization (step 3 of the exercise) also matters.
#   Using `np.random.randn(...) * 0.01` keeps the pre-activations small, which prevents units from saturating early (especially with sigmoid).
#   Biases are initialized to zero; the random weights already break the symmetry between units.
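The claim that the `0.01` scale keeps sigmoid units away from saturation can be checked numerically. The sketch below is an illustration under assumed conditions (100 standard-normal input features, 50 units, 200 examples, zero biases, and an arbitrary saturation threshold of sigmoid outputs below 0.01 or above 0.99); `saturated_fraction` is a hypothetical helper written only for this comparison.

```python
import numpy as np

def saturated_fraction(scale, n_in=100, n_units=50, m=200, seed=0):
    """Hypothetical helper: fraction of sigmoid outputs below 0.01 or above 0.99
    for weights W = standard_normal * scale and zero biases."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_in, m))
    W = rng.standard_normal((n_units, n_in)) * scale
    Z = np.dot(W, X)                  # b = 0, so Z = W X
    A = 1 / (1 + np.exp(-Z))
    return np.mean((A < 0.01) | (A > 0.99))

print("scale 1.0 :", saturated_fraction(1.0))   # a large fraction of outputs saturate
print("scale 0.01:", saturated_fraction(0.01))  # essentially none saturate
```

With scale 1.0 the pre-activations have a standard deviation of roughly 10 here, so many sigmoid outputs sit near 0 or 1; with scale 0.01 they stay close to 0.5. This is also why every value of `AL` in the unit tests is so close to 0.5.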
