# Exercise: Implement the ReLU Activation Function

In this exercise, you'll implement one of the most widely used activation functions in deep learning — **ReLU**, or **Rectified Linear Unit**.

ReLU introduces **non-linearity** into the neural network while being simple and computationally efficient.
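
To make "non-linearity" concrete, here is a small NumPy sketch. The weight values are arbitrary and chosen only for illustration. Without an activation, two stacked linear layers collapse into a single linear map. Inserting ReLU between them breaks that equivalence.

```python
import numpy as np

# Arbitrary example weights for two linear layers (illustrative values only)
W1 = np.array([[1.0, -1.0],
               [-2.0, 1.0]])
W2 = np.array([[1.0, 1.0]])
x = np.array([1.0, 2.0])

# Two linear layers with no activation equal one linear layer with weights W2 @ W1
stacked = W2 @ (W1 @ x)
collapsed = (W2 @ W1) @ x
print(stacked, collapsed)  # [-1.] [-1.]

# Applying ReLU between the layers changes the result: the composition is no longer one linear map
with_relu = W2 @ np.maximum(0.0, W1 @ x)
print(with_relu)  # [0.]
```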

Let’s dive in!

## What Is ReLU?

The ReLU activation function is defined as:

$A = \text{ReLU}(x) = \max(0, x)$

This means:
- If the input `x` is **positive**, it returns `x`.
- If the input `x` is **negative or zero**, it returns `0`.
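
The two cases above translate directly into a piecewise expression. The sketch below uses NumPy's `np.where`, which is an illustrative alternative and not the TensorFlow approach this exercise asks for. It selects `x` where `x` is positive and `0.0` everywhere else:

```python
import numpy as np

def relu_piecewise(x):
    """ReLU written as the two-case rule: x if x > 0, else 0.0."""
    x = np.asarray(x, dtype=float)
    # np.where picks x where the condition holds and 0.0 everywhere else
    return np.where(x > 0, x, 0.0)

print(relu_piecewise([-3.0, 0.0, 2.5]))  # [0.  0.  2.5]
print(relu_piecewise(-1))                # 0.0
```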


## Import Packages

In [None]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

## Task: Create ReLU function

Complete the function `relu(x)` to return the ReLU of `x`.

Your function should:
- Accept either a **scalar** or a **NumPy array**
- Replace all negative values with `0.0` (specifically the float `0.0`, not the integer `0`: `tf.maximum` needs both arguments to have the same data type, and during training the model passes floating-point tensors to this function, so an integer `0` would cause a type-mismatch error)
- Return a result with the same shape as the input

Hint: Use TensorFlow's built-in `tf.maximum()` function.
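
The choice between `0.0` and `0` is about data types. NumPy shows a related effect in isolation: the type of the literal determines the dtype of the result. This NumPy sketch only illustrates the idea. With `tf.maximum`, mismatched types can raise an error instead of being promoted.

```python
import numpy as np

ints = np.array([-1, 0, 2])              # integer input

with_int_zero = np.maximum(0, ints)      # result stays integer
with_float_zero = np.maximum(0.0, ints)  # result is promoted to floating point

# dtype.kind is 'i' for signed integers and 'f' for floats
print(with_int_zero, with_int_zero.dtype.kind)      # [0 0 2] i
print(with_float_zero, with_float_zero.dtype.kind)  # [0. 0. 2.] f
```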


In [None]:
def relu(x):
    """
    Compute the ReLU (Rectified Linear Unit) of x.

    Arguments:
    x -- A scalar or numpy array of any shape

    Returns:
    ReLU applied element-wise to x
    """
    result = None  # Replace with your code
    return result

In [None]:
def test_relu():
    """
    Test the relu function with scalar inputs.
    """
    # Test relu function with negative values
    assert relu(-5) == 0, "Scalar negative input test failed. Expected 0 for negative input."
    # Test relu function with positive values
    assert relu(3) == 3, "Scalar positive input test failed. Expected x for positive input for relu(x)."
    # Test relu function with zero
    assert relu(0) == 0, "Scalar zero input test failed. Expected 0 for zero input."

    print("Scalar tests passed!")

def test_relu_array():
    """
    Test the relu function with numpy array inputs.
    """
    input_array = np.array([-1, 0, 2])
    expected_output = np.array([0, 0, 2])
    try:
        np.testing.assert_array_equal(relu(input_array), expected_output)
        print("Array test passed!")
    except AssertionError:
        raise AssertionError("Array test failed: expected [0, 0, 2] from relu([-1, 0, 2])")
    

try:
    test_relu()
    test_relu_array()
    print("All tests passed!")
except AssertionError as err:
    print(err)


## Task: Plot ReLU function on a Graph

Plot the ReLU function for values from -10 to 10.

This helps you visualize how ReLU behaves depending on the value of x.

Hint: use the `relu()` function you wrote above to compute the values for `y`!


In [None]:
def plot_relu():
    """
    Plot the ReLU activation function.
    """
    x = np.linspace(-10, 10, 100)
    y = None  # Replace with your code
    plt.plot(x, y)
    plt.title("ReLU Activation Function")
    plt.xlabel("x")
    plt.ylabel("ReLU(x)")
    plt.grid(True)
    plt.show()

plot_relu()

If your function is correct, the plot shows $y = 0$ for all $x < 0$ and $y = x$ for $x \ge 0$: two straight segments that meet at the origin.
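
To confirm the shape of the curve without reading it off the plot, you can check the same grid numerically. This sketch uses `np.maximum` as a stand-in for your `relu()` so that it runs on its own:

```python
import numpy as np

x = np.linspace(-10, 10, 100)
y = np.maximum(0.0, x)  # stand-in for relu(x)

# Left of the origin the curve is flat at zero...
assert np.all(y[x < 0] == 0.0)
# ...and from the origin onward it is the identity line y = x
assert np.all(y[x >= 0] == x[x >= 0])
print("ReLU shape verified on", x.size, "points")
```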

## ReLU in a Real Neural Network (Using TensorFlow)

So far, you've implemented and visualized the ReLU activation function manually.

Now let’s see how it's used in practice — inside a real neural network, trained on real data!

To do this, we’ll use **TensorFlow**, one of the most popular open-source libraries for building and training machine learning models. 

We'll also use **Keras**, TensorFlow's high-level API, which simplifies defining and training neural networks.

#### Why TensorFlow?

- TensorFlow handles low-level math (like gradients and matrix ops) for us.
- TensorFlow makes it easy to define layers, activation functions, loss functions, and optimizers.

In the code below, you'll build a neural network that uses your custom `relu` function as the activation function of its hidden layer.

#### Dataset: MNIST

We’ll train our network on the **MNIST dataset**, which contains 70,000 grayscale images of handwritten digits (0–9): 60,000 for training and 10,000 for testing. Each image is 28×28 pixels, and we will build a classifier that predicts which digit each image shows.

Let’s load the data and build the model!

In [None]:
# Load the dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Normalize the images to values between 0 and 1
X_train = X_train / 255.0
X_test = X_test / 255.0

# Flatten the 28x28 images to 784-dimensional vectors
X_train = X_train.reshape(-1, 28 * 28)
X_test = X_test.reshape(-1, 28 * 28)

# One-hot encode the labels
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)
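
To see what these three steps do to array shapes, here is a sketch on a tiny synthetic batch. Random pixel values stand in for MNIST, so it runs without downloading anything. `np.eye(10)[labels]` serves as a NumPy equivalent of `to_categorical` for integer labels:

```python
import numpy as np

rng = np.random.default_rng(0)
# Two fake 28x28 grayscale images with uint8 pixels in [0, 255], like MNIST
images = rng.integers(0, 256, size=(2, 28, 28), dtype=np.uint8)
labels = np.array([3, 7])

# 1. Scale pixels to [0, 1]
scaled = images / 255.0
# 2. Flatten each 28x28 image into a 784-dimensional vector
flat = scaled.reshape(-1, 28 * 28)
# 3. One-hot encode the labels (row i of the 10x10 identity matrix encodes digit i)
one_hot = np.eye(10)[labels]

print(flat.shape, flat.min() >= 0.0, flat.max() <= 1.0)  # (2, 784) True True
print(one_hot[0])  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```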

## The Neural Network Structure

We'll build a simple neural network with:
- **Input layer**: 784 features (28×28 pixels)
- **Hidden layer**: 128 neurons with ReLU activation
- **Output layer**: 10 neurons (one per digit) with softmax activation
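
Each Dense layer has one weight per input-output pair plus one bias per output neuron, so you can work out the size of this network by hand. The arithmetic below should match the parameter counts that Keras's `model.summary()` reports once the model is built:

```python
def dense_params(n_inputs, n_outputs):
    """Weights (n_inputs * n_outputs) plus biases (n_outputs) for a Dense layer."""
    return n_inputs * n_outputs + n_outputs

hidden = dense_params(784, 128)  # 784*128 + 128
output = dense_params(128, 10)   # 128*10 + 10

print(hidden, output, hidden + output)  # 100480 1290 101770
```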

## Task: Incorporate ReLU into Neural Network
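Set the hidden Dense layer's activation function to your custom `relu` function.

**Hint**: Do not use `activation='relu'`. That string selects Keras's built-in ReLU, not your implementation.

**Hint**: If you get the error `TypeError: relu() missing 1 required positional argument: 'x'`, you are calling `relu()` instead of passing it. Remove the `()`: Keras expects the function object itself (`activation=relu`) and calls it on the layer's pre-activation values during training.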
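
Passing `relu` and writing `relu()` are different operations: the first hands over the function, and the second calls it immediately. The sketch below mimics, in plain Python, how a layer stores an activation and calls it later. `ToyLayer` and `toy_relu` are hypothetical stand-ins, not Keras code, and the names avoid overwriting your own `relu`:

```python
def toy_relu(x):
    """Scalar ReLU used only for this illustration."""
    return max(0.0, x)

class ToyLayer:
    """Hypothetical stand-in for a layer that stores an activation function."""

    def __init__(self, activation):
        # Store the function object itself; it is not called here
        self.activation = activation

    def __call__(self, z):
        # The layer calls the stored function later, passing its input
        return self.activation(z)

layer = ToyLayer(activation=toy_relu)   # correct: pass the function
print(layer(-2.0), layer(3.0))          # 0.0 3.0

try:
    ToyLayer(activation=toy_relu())     # wrong: calls toy_relu with no argument
except TypeError as err:
    print("TypeError:", err)
```

The `except` branch prints the same kind of "missing 1 required positional argument" error that the hint above describes.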


In [None]:
# Build the model
model = Sequential([
    Dense(128, activation=None, input_shape=(784,)),  # Replace None with your custom relu function
    Dense(10, activation='softmax') 
])

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
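
The output layer and the loss work together. Softmax turns the raw outputs into probabilities, and categorical cross-entropy is the negative log of the probability assigned to the true class. This is a minimal NumPy sketch of the standard formulas for a single example. It uses made-up logits for a 3-class toy case to keep the numbers readable; the real model has 10 classes.

```python
import numpy as np

def softmax(z):
    # Subtracting the max improves numerical stability without changing the result
    e = np.exp(z - np.max(z))
    return e / e.sum()

def categorical_crossentropy(y_true_one_hot, probs):
    # Only the true class's probability contributes, because the other entries are 0
    return -np.sum(y_true_one_hot * np.log(probs))

logits = np.array([2.0, 1.0, 0.1])  # made-up pre-softmax outputs
target = np.array([1.0, 0.0, 0.0])  # the true class is 0

probs = softmax(logits)
loss = categorical_crossentropy(target, probs)
print(probs.round(3), round(loss, 3))
```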

## What ReLU Really Does in a Neural Network

When you build a neural network and use `activation=relu`, you're telling the model:

"After the weighted sum of inputs is calculated at each neuron, apply the ReLU function to that result."


### Step-by-Step Computation

For a dense (fully connected) layer:

$Z = W \cdot X + b$

- `W`: weight matrix
- `X`: input vector
- `b`: bias vector
- `Z`: raw output (pre-activation)

Then ReLU is applied element-wise to `Z`:

$A = \text{ReLU}(Z) = \max(0, Z)$

So if a neuron's pre-activation value is **negative**, ReLU turns it into **zero**. If it is **positive**, it passes through unchanged.
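
You can trace the two formulas by hand for a tiny layer. This NumPy sketch uses 3 inputs and 2 neurons with made-up weights, not values from the trained model, to compute $Z = W \cdot X + b$ and then $A = \max(0, Z)$. Keras stores a Dense layer's kernel with shape `(inputs, units)` and computes `inputs @ kernel + bias`. That is the same operation written with the matrix transposed.

```python
import numpy as np

W = np.array([[0.5, -1.0, 0.2],    # weights of neuron 1
              [-0.3, 0.8, 0.4]])   # weights of neuron 2
X = np.array([1.0, 2.0, 3.0])      # one input vector
b = np.array([0.1, -0.5])          # one bias per neuron

Z = W @ X + b            # pre-activation, one value per neuron
A = np.maximum(0.0, Z)   # ReLU applied element-wise

print(Z, A)  # roughly Z = [-0.8, 2.0] and A = [0.0, 2.0]
```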

### Why use ReLU?

ReLU is useful for two reasons. First, by zeroing out negative values it introduces a form of **sparsity**: for a given input, only some neurons produce non-zero outputs. Second, ReLU is computationally **cheap** (a single `max()` per element), whereas activation functions such as sigmoid and tanh require computing exponentials.
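
Sparsity is easy to measure: count how many ReLU outputs are exactly zero. The sketch below draws random pre-activations and reports the fraction of neurons that are inactive. The standard normal distribution is an assumption chosen only for illustration. For pre-activations symmetric around zero, like these, the fraction comes out close to one half.

```python
import numpy as np

rng = np.random.default_rng(42)
Z = rng.standard_normal((1000, 128))  # 1000 examples x 128 hidden neurons (illustrative)
A = np.maximum(0.0, Z)

inactive = np.mean(A == 0.0)          # fraction of zero activations
print(f"Fraction of zero activations: {inactive:.3f}")
```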

### In Practice

During training (see `.fit()` below), this ReLU logic is used during every forward pass:
- Input → Linear transformation → ReLU → Next layer
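- ReLU itself has no trainable parameters. During backpropagation, gradients flow back through it (its derivative is 1 where `Z` is positive and 0 where `Z` is negative) to update the weights `W` and biases `b`.

Your `relu` function is plugged directly into this pipeline, so every neuron in the hidden layer applies your implementation to its pre-activation value.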
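
During backpropagation, the gradient arriving from the next layer is multiplied by ReLU's derivative: 1 where $Z > 0$ and 0 where $Z < 0$. At exactly $Z = 0$ the derivative is undefined. The NumPy sketch below uses 0 there, which is a common convention. The name `relu_backward` is hypothetical and used only for illustration:

```python
import numpy as np

def relu_backward(upstream_grad, Z):
    """Gradient of the loss w.r.t. Z, given the gradient w.r.t. A = ReLU(Z)."""
    # Derivative of ReLU: 1 where Z > 0, 0 elsewhere (0 at Z == 0 by convention)
    return upstream_grad * (Z > 0)

Z = np.array([-1.5, 0.0, 0.7, 2.0])
upstream = np.array([0.3, 0.3, 0.3, 0.3])  # made-up gradient from the next layer
print(relu_backward(upstream, Z))
```

Neurons that output zero pass no gradient back, so their incoming weights receive no update from that example.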


In [None]:
# Train the model
history = model.fit(X_train, y_train, epochs=5, batch_size=32, validation_split=0.1)

## Verify Your ReLU Activation Is Used

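To confirm that your custom `relu` function was passed into the model, inspect the model's layers.

Run the following command in the cell below to print the name of the first layer's activation function:

```python
model.layers[0].activation.__name__
```

It should print `'relu'`. Keras's built-in ReLU is also named `relu`, so the identity check `model.layers[0].activation is relu` is stricter. It returns `True` only if the layer holds your own function object.
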
In [None]:
# Check the activation function of the first layer
model.layers[0].activation.__name__

In [None]:
# Evaluate on the test set
test_loss, test_accuracy = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {test_accuracy:.4f}")
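
The `accuracy` metric reported here compares the most probable predicted digit with the true digit. Here is a NumPy sketch of that computation for one-hot labels. It uses made-up predictions for four examples and three classes; the real model has 10 classes.

```python
import numpy as np

# Made-up softmax outputs for 4 examples over 3 classes
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.3, 0.4],
                   [0.6, 0.3, 0.1]])
# One-hot true labels: classes 0, 1, 2, 1
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]])

# A prediction is correct when its highest-probability class matches the label
correct = np.argmax(y_pred, axis=1) == np.argmax(y_true, axis=1)
accuracy = correct.mean()
print(f"Accuracy: {accuracy:.4f}")  # Accuracy: 0.7500
```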

## Great job!

You have successfully trained a neural network on MNIST data using the ReLU activation function!

## Recap

- ReLU activation helps neural networks learn non-linear functions efficiently.
- It "activates" only positive values and zeroes out negatives.
- It is a common default choice for hidden layers in modern deep learning.

# Solutions to exercises

## Solution: Create ReLU function

In [None]:
def relu(x):
    result = tf.maximum(0.0, x)
    return result

## Solution: Plot ReLU function on a Graph

In [None]:
def plot_relu():
    x = np.linspace(-10, 10, 100)
    y = relu(x)
    plt.plot(x, y)
    plt.title("ReLU Activation Function")
    plt.xlabel("x")
    plt.ylabel("ReLU(x)")
    plt.grid(True)
    plt.show()

plot_relu()

## Solution: Incorporate ReLU into Neural Network

In [None]:
model = Sequential([
    Dense(128, activation=relu, input_shape=(784,)),
    Dense(10, activation='softmax') 
])