# Forward Propagation Exercise

In this exercise, you'll implement **forward propagation** for a simple neural network. Forward propagation computes the network's output for a given input by passing the data through each layer in turn.

## Neural Network Architecture
- **Input Layer**: 3 features
- **Hidden Layer**: 4 neurons with ReLU activation
- **Output Layer**: 1 neuron with sigmoid activation
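
Written as equations, the architecture above corresponds to the following forward pass. This is a sketch that reuses the names from the code (`W1`, `b1`, `Z1`, and so on) as subscripted symbols; the activations are applied element-wise, and each bias vector is added to every column of the matrix product.

$$
\begin{aligned}
Z_1 &= W_1 X + b_1, & A_1 &= \mathrm{ReLU}(Z_1) = \max(0, Z_1),\\
Z_2 &= W_2 A_1 + b_2, & A_2 &= \sigma(Z_2) = \frac{1}{1 + e^{-Z_2}}.
\end{aligned}
$$

With $X$ of shape $(3, m)$, $W_1$ of shape $(4, 3)$, and $W_2$ of shape $(1, 4)$, both $Z_1$ and $A_1$ have shape $(4, m)$, and both $Z_2$ and $A_2$ have shape $(1, m)$.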

## Learning Objectives
- Understand how to compute the output of a neural network using matrix operations.
- Apply ReLU and sigmoid activation functions.
- Store intermediate values in a cache for later use (e.g., in backpropagation).

## Instructions
- Complete the `forward_propagation` function by filling in the placeholders marked `# your code here`.
- Use `np.dot` for matrix multiplication, `relu` for the hidden layer activation, and `sigmoid` for the output layer activation.
- Run the unit test cell to verify your implementation.
- The `cache` dictionary stores `Z1`, `A1`, `Z2`, and `A2` for use in backpropagation; the provided code already builds it from your four values.

**Tip**: Ensure matrix shapes align during computations (e.g., `W1` is (4, 3), `X` is (3, m), so `np.dot(W1, X)` produces (4, m)).
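
The shape rule in the tip can be checked directly. The snippet below is a minimal sketch: the batch size `m = 5`, the random values, and the names `linear`, `shifted`, and `b1_flat` are assumptions chosen only for illustration. It also shows why the bias is stored as a column of shape (4, 1) rather than a flat array of shape (4,).

```python
import numpy as np

m = 5  # hypothetical batch size, for illustration only
rng = np.random.default_rng(0)

X = rng.standard_normal((3, m))   # 3 features, m examples
W1 = rng.standard_normal((4, 3))  # hidden-layer weights
b1 = rng.standard_normal((4, 1))  # hidden-layer biases, one per neuron

linear = np.dot(W1, X)   # (4, 3) times (3, m) gives (4, m)
shifted = linear + b1    # (4, 1) broadcasts across the m columns

print(linear.shape)   # (4, 5)
print(shifted.shape)  # (4, 5)

# A flat bias of shape (4,) does not broadcast against (4, 5)
b1_flat = b1.ravel()
broadcast_failed = False
try:
    linear + b1_flat
except ValueError as err:
    broadcast_failed = True
    print("broadcast failed:", err)
```

Printing `.shape` after each step is a quick way to locate a mismatch before running the unit test.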

In [None]:
import numpy as np

def sigmoid(z):
    """
    Compute the sigmoid activation function.
    Arguments:
        z -- Input array
    Returns:
        Sigmoid of z (values between 0 and 1)
    """
    return 1 / (1 + np.exp(-z))

def relu(z):
    """
    Compute the ReLU activation function.
    Arguments:
        z -- Input array
    Returns:
        ReLU of z (values >= 0)
    """
    return np.maximum(0, z)
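
Before using the two helpers, it can help to evaluate them on a few known inputs. The following is a small sanity check, not part of the graded exercise; it repeats the two definitions so it runs on its own, and the sample array `z` is a hypothetical input.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def relu(z):
    return np.maximum(0, z)

z = np.array([-2.0, 0.0, 3.0])  # hypothetical sample input

print(relu(z))       # [0. 0. 3.]  negatives are clipped to zero
print(sigmoid(0.0))  # 0.5         sigmoid is centered at zero

# Sigmoid is symmetric around 0.5: sigmoid(-z) == 1 - sigmoid(z)
print(np.allclose(sigmoid(-z), 1 - sigmoid(z)))  # True
```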

In [None]:
def forward_propagation(X, parameters):
    """
    Arguments:
    X -- input data of size (n_x, m), where n_x is the number of features and m is the number of examples
    parameters -- dictionary containing weights and biases:
                  W1 -- weight matrix of shape (4, 3)
                  b1 -- bias vector of shape (4, 1)
                  W2 -- weight matrix of shape (1, 4)
                  b2 -- bias vector of shape (1, 1)

    Returns:
    A2 -- output of the sigmoid activation in the output layer, shape (1, m)
    cache -- a dictionary containing Z1, A1, Z2, A2 for later use
    """
    # Retrieve parameters
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']

    ### START CODE HERE ###
    Z1 = None  # your code here (Linear transformation: W1*X + b1)
    A1 = None  # your code here (Apply ReLU to Z1)
    Z2 = None  # your code here (Linear transformation: W2*A1 + b2)
    A2 = None  # your code here (Apply sigmoid to Z2)
    ### END CODE HERE ###

    cache = {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}
    return A2, cache

In [None]:
def test_forward_propagation():
    """
    Unit test to verify the forward_propagation function.
    """
    np.random.seed(1)
    X = np.random.randn(3, 2)
    parameters = {
        "W1": np.random.randn(4, 3),
        "b1": np.random.randn(4, 1),
        "W2": np.random.randn(1, 4),
        "b2": np.random.randn(1, 1)
    }

    try:
        A2, cache = forward_propagation(X, parameters)
        assert A2.shape == (1, 2), f"Expected A2 shape (1, 2), got {A2.shape}"
        assert cache["Z1"].shape == (4, 2), f"Expected Z1 shape (4, 2), got {cache['Z1'].shape}"
        assert cache["A1"].shape == (4, 2), f"Expected A1 shape (4, 2), got {cache['A1'].shape}"
        assert cache["Z2"].shape == (1, 2), f"Expected Z2 shape (1, 2), got {cache['Z2'].shape}"
        assert np.allclose(A2, np.array([[0.11046056, 0.14411428]]), atol=1e-2), "Unexpected A2 values"
        print("✅ All tests passed!")
    except AssertionError as e:
        print(f"❌ Test failed: {e}")

# Run the test
test_forward_propagation()

## Solution 

The following cell contains the complete implementation of the `forward_propagation` function. This is provided for instructors and is not visible to learners.

```python
def forward_propagation(X, parameters):
    """
    Arguments:
    X -- input data of size (n_x, m)
    parameters -- dictionary containing weights and biases:
                  W1, b1, W2, b2

    Returns:
    A2 -- output of the sigmoid activation in the output layer, shape (1, m)
    cache -- a dictionary containing Z1, A1, Z2, A2
    """
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']

    Z1 = np.dot(W1, X) + b1  # Linear transformation for hidden layer
    A1 = relu(Z1)            # ReLU activation
    Z2 = np.dot(W2, A1) + b2 # Linear transformation for output layer
    A2 = sigmoid(Z2)         # Sigmoid activation

    cache = {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}
    return A2, cache

# Notes:
# - The solution uses np.dot for matrix multiplication; each bias broadcasts across the m columns.
# - ReLU ensures non-negative outputs for the hidden layer.
# - Sigmoid maps the output to the open interval (0, 1), suitable for binary classification.
```