# The ReLU activation function
### Building your first neural network in Python

#### What is ReLU?

ReLU stands for Rectified Linear Unit. It is one of the most widely used activation functions in deep learning, especially in convolutional neural networks (CNNs) and fully connected layers.

Mathematically, ReLU is defined as:

$$
\text{ReLU}(x) = \max(0, x)
$$

This means:

$$
\text{If } ( x > 0 ), \text{ReLU returns } x.
$$
$$
\text{If } ( x \leq 0 ), \text{ReLU returns } 0.
$$

In simple terms:
ReLU "rectifies" negative values to zero and keeps positive values unchanged.
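
As a quick illustration of the definition, the sketch below applies an element-wise $\max(0, x)$ to a small hand-picked tensor. The use of `torch.maximum` here is only one way to spell out the formula and is an assumption of this example, not the approach used later in the exercise.

```python
import torch

# Hand-picked inputs covering negative, zero and positive values
x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

# Element-wise max(0, x), written out literally from the definition
y = torch.maximum(x, torch.zeros_like(x))

print(y.tolist())  # [0.0, 0.0, 0.0, 0.5, 2.0]
```

Negative entries are replaced by zero, while zero and positive entries pass through unchanged.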



<div align="center">
  <img src="../assets/ReLU.png" alt="ReLU function" width="50%"/>
  <p style="text-align: center;"><em>Figure 1: ReLU activation function</em></p>
</div>

## Why Do We Need ReLU?

Before ReLU became popular, activation functions such as sigmoid and tanh were commonly used (see Figures 2a and 2b). Their purpose is to introduce non-linearity into the neural network. They were attractive when the mathematics of neural networks was first developed because they are easy to differentiate and have other desirable properties, such as being monotonically increasing. However, these functions have two major drawbacks:

**Vanishing Gradient Problem**

In deep neural networks, gradients from sigmoid or tanh can become very small during backpropagation, especially when activations are in their saturated regions where the derivative is small. This makes learning slow or even stops it completely.
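
To make the saturation effect concrete, the following sketch uses autograd to evaluate the derivatives of sigmoid and tanh at a few hand-picked inputs. The input values and the depth of 10 layers are illustrative assumptions, and the final line ignores the weight matrices that also enter the real chain-rule product.

```python
import torch

# One input near zero and two deep in the saturated region
x = torch.tensor([0.0, 5.0, 10.0], requires_grad=True)

# Summing the outputs lets a single backward() call return
# the element-wise derivative of the activation
torch.sigmoid(x).sum().backward()
sigmoid_grad = x.grad.clone()

x.grad.zero_()
torch.tanh(x).sum().backward()
tanh_grad = x.grad.clone()

print("sigmoid'(x):", sigmoid_grad)
print("tanh'(x):   ", tanh_grad)

# Backpropagation multiplies one such factor per layer. Even the best
# case for sigmoid (0.25 at x = 0) shrinks quickly with depth.
print("0.25 ** 10 =", 0.25 ** 10)
```

The derivative of sigmoid never exceeds 0.25, and both derivatives fall towards zero as the input moves away from the origin.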

**Expensive Computation**

Sigmoid and tanh involve exponential functions, making them computationally more expensive compared to simple operations like max(0, x) in ReLU.

<div align="center"> 
  <img src="../assets/Activation_logistic.svg" alt="Logistic activation function" width="40%" style="background-color: white;"/>
  <p style="text-align: center;"><em>Figure 2a: Sigmoid activation function</em></p>
</div>

<div align="center"> 
  <img src="../assets/Activation_tanh.svg" alt="Hyperbolic tangent activation function" width="40%" style="background-color: white;"/>
  <p style="text-align: center;"><em>Figure 2b: Hyperbolic tangent (tanh) activation function</em></p>
</div>

### How ReLU Solves These Problems

**No Saturation for Positive Inputs**

ReLU does not saturate in the positive region, which helps maintain stronger gradients during training while still introducing a non-linearity into the network computations.
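
A small autograd check can illustrate this. The sketch below, with arbitrarily chosen inputs, computes the derivative of ReLU at one negative and several positive points using the built-in `torch.relu`.

```python
import torch

# One negative input and three positive inputs of growing magnitude
x = torch.tensor([-3.0, 0.5, 5.0, 50.0], requires_grad=True)

# Summing the outputs gives the element-wise derivative in x.grad
torch.relu(x).sum().backward()

print(x.grad.tolist())  # [0.0, 1.0, 1.0, 1.0]
```

The derivative stays at 1 however large the positive input becomes, whereas it is 0 for the negative input.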

**Computational Efficiency**

ReLU only requires simple thresholding at zero: values greater than zero pass through unchanged and negative values are set to zero. This is fast to compute with a clamp operation in [PyTorch](https://docs.pytorch.org/docs/stable/generated/torch.clamp.html).

**Encourages Sparse Activations**

Since negative inputs become zero, many neurons stay inactive. This sparsity can improve generalization and efficiency.
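
As a rough illustration of sparsity, the sketch below measures the fraction of zeros after applying ReLU to random inputs. It assumes zero-mean, normally distributed pre-activations, which is a simplification; the tensor shape and seed are arbitrary, and real networks can show quite different proportions.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for repeatability

# Hypothetical pre-activations: 1000 samples for a layer of 100 neurons
pre_activations = torch.randn(1000, 100)
activations = torch.relu(pre_activations)

# Fraction of entries that ReLU has set exactly to zero
sparsity = (activations == 0).float().mean().item()
print(f"Fraction of zero activations: {sparsity:.2f}")
```

Under this assumption roughly half of the activations are zero, since a zero-mean normal sample is negative about half of the time.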

<hr>

## Exercise: Implementing the ReLU function with PyTorch

Read the PyTorch documentation for the [ReLU module](https://docs.pytorch.org/docs/stable/generated/torch.nn.ReLU.html) to familiarize yourself with its definition. In this exercise we will define our own ReLU module class that behaves like the PyTorch implementation.

Notice in the documentation that the ReLU constructor accepts only one argument, `inplace`, which is `False` by default. When `inplace=True`, ReLU modifies the input tensor directly instead of creating a new one. This is typically used only when the network architecture needs to save memory.
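
The effect of the `inplace` argument can be observed with the built-in `nn.ReLU` module. The sketch below uses small hand-picked tensors and compares memory addresses with `data_ptr()`; the variable names are chosen for this example only.

```python
import torch
import torch.nn as nn

# Default behaviour: the input is left untouched
x = torch.tensor([-1.0, 2.0, -3.0])
out = nn.ReLU()(x)
print(x.tolist())                      # [-1.0, 2.0, -3.0]
print(out.tolist())                    # [0.0, 2.0, 0.0]
print(out.data_ptr() == x.data_ptr())  # False: the result lives in new memory

# inplace=True: the input tensor itself is overwritten
x_inplace = torch.tensor([-1.0, 2.0, -3.0])
out_inplace = nn.ReLU(inplace=True)(x_inplace)
print(x_inplace.tolist())                              # [0.0, 2.0, 0.0]
print(out_inplace.data_ptr() == x_inplace.data_ptr())  # True: same memory
```

This is the behaviour your own module should reproduce.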

We will implement the ReLU operation with the PyTorch function [**clamp** (Link)](https://docs.pytorch.org/docs/stable/generated/torch.clamp.html), which is also available as a method of the Tensor object. Clamp exists in two forms: the standard `clamp`, which returns a new tensor (e.g. `newtensor = oldtensor.clamp(...)`), and its in-place version, `clamp_`. See the documentation: https://docs.pytorch.org/docs/stable/generated/torch.Tensor.clamp_.html
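
To see the difference between the two forms, the sketch below clamps a small tensor to the range [-1, 1]. These bounds are an arbitrary choice for illustration and are not the ones needed for ReLU.

```python
import torch

t = torch.tensor([-2.0, -0.5, 0.5, 2.0])

# Standard form: returns a new tensor and leaves t unchanged
clamped = t.clamp(min=-1.0, max=1.0)
print(clamped.tolist())  # [-1.0, -0.5, 0.5, 1.0]
print(t.tolist())        # [-2.0, -0.5, 0.5, 2.0]

# In-place form (trailing underscore): overwrites t and returns it
returned = t.clamp_(min=-1.0, max=1.0)
print(t.tolist())                           # [-1.0, -0.5, 0.5, 1.0]
print(returned.data_ptr() == t.data_ptr())  # True: same underlying memory
```

Both `min` and `max` are optional, so passing only one of them clamps the tensor on one side.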

#### Complete the following code, which declares a ReLU module class MyReLU:

`Important`: Do not delete the cell below or change the name of the MyReLU class. The notebook evaluator will search for the cell tag and the class name to assess your work.

In [None]:
import torch
import torch.nn as nn

class MyReLU(nn.Module):
    def __init__(self, inplace=False):
        super(MyReLU, self).__init__()
        #** Your code goes here **

    def forward(self, tinput: torch.Tensor):
        if self.inplace:
            #** Your code goes here** : Hint: tinput is a Tensor with functional attributes and forward() returns a tensor
            pass
        else:
            ##** Your code goes here** : Hint: tinput is a Tensor with functional attributes and forward() returns a tensor
            pass

#### Test your MyReLU class to show it behaves as expected

In [None]:
test_tensor = torch.arange(-10,10,1)
print("Original test tensor: \n", test_tensor)
relu = MyReLU()
print("Test tensor after ReLU operation: \n", relu(test_tensor))

All negative values in the tensor must now be zero. As a further check, experiment with random tensors:

In [None]:
n=5
test_tensor = torch.randn((n,n))
print("Original test tensor: \n", test_tensor)

#Feel free to experiment by yourself with your own code

### Inplace operation of MyReLU

If you coded your MyReLU with `inplace` capability correctly, the code below should display two PASSED messages, one for the in-place test and one for the not-in-place test.

Feel free to copy-paste the code below into an LLM for an explanation about what it does and checks.

In [None]:
# Original tensor
x = torch.tensor([1.0, -2.0, 3.0, -4.0, 5.0])

print("Original Tensor:", x)
print("id(x):", id(x))
print("data_ptr(x):", x.data_ptr())
print("-" * 40)

#Re-instantiating a MyReLU object
relu = MyReLU()
relu_inplace = MyReLU(inplace=True)

# Non-in-place ReLU
y = relu(x)
print("After relu(x) (NOT inplace):")
print("Tensor y:", y)
print("id(y):", id(y))
print("data_ptr(y):", y.data_ptr())
if id(x) != id(y) and x.data_ptr() != y.data_ptr():
    print("The non-inplace instance returns a DIFFERENT tensor:", "PASSED")
else:
    print("Non-inplace operation FAILED.")
print("-" * 40)

# In-place ReLU
y = relu_inplace(x)
print("After relu_inplace(x) (INPLACE):")
print("Tensor x:", y)
print("id(x):", id(y))
print("data_ptr(x):", y.data_ptr())
if id(x) == id(y) and x.data_ptr() == y.data_ptr():
    print("The inplace instance returns the SAME tensor:", "PASSED")
else:
    print("In place operation FAILED.")
print("-" * 40)

**If you got two "PASSED" messages, you are ready to evaluate your work in this notebook. Proceed to run the Evaluation cell**

### Evaluation

Run the following cell to evaluate your work in this notebook. The code will run similar tests to those you ran above to thoroughly assess your work. You must get FOUR passes in this evaluation.

In [None]:
!pytest -v ../test_exercise/test_relu.py

<hr>

# Limitations of ReLU

ReLU is not perfect. A well-known issue is the "dying ReLU" problem, where neurons can get stuck outputting zero for all inputs if their weights lead to negative pre-activations. Because the gradient of ReLU is also zero in that region, the weights of such neurons stop being updated. Variants like Leaky ReLU and Parametric ReLU (PReLU) were introduced to address this.
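
The difference can be seen by comparing gradients for a negative input. The sketch below uses the built-in `nn.ReLU` and `nn.LeakyReLU` modules with hand-picked inputs; the slope of 0.01 is the PyTorch default for `negative_slope` and is written out here only for clarity.

```python
import torch
import torch.nn as nn

# One negative and one positive pre-activation
x = torch.tensor([-2.0, 3.0], requires_grad=True)

nn.ReLU()(x).sum().backward()
relu_grad = x.grad.clone()

x.grad.zero_()
nn.LeakyReLU(negative_slope=0.01)(x).sum().backward()
leaky_grad = x.grad.clone()

print("ReLU gradient:      ", relu_grad.tolist())   # [0.0, 1.0]
print("Leaky ReLU gradient:", leaky_grad.tolist())  # roughly [0.01, 1.0]
```

With Leaky ReLU a small gradient still flows for negative inputs, so the neuron retains a chance to recover during training.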

# Quick Summary

Definition: 
$$
\text{ReLU}(x) =
\begin{cases}
x & \text{if } x > 0 \\
0 & \text{if } x \leq 0
\end{cases}
$$


Benefits: Reduces vanishing gradients, faster computation, sparse activations

Drawback: Dying ReLU (inactive neurons)

### Solution

If you wish, you can have a look at the solution below. We recommend attempting the exercise on your own first.

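```
import torch
import torch.nn as nn

class MyReLU(nn.Module):
    def __init__(self, inplace=False):
        super(MyReLU, self).__init__()
        # Store the flag so forward() can choose between the two clamp forms
        self.inplace = inplace

    def forward(self, tinput: torch.Tensor):
        if self.inplace:
            # clamp_ overwrites tinput and returns the same tensor
            return tinput.clamp_(min=0)
        else:
            # clamp returns a new tensor and leaves tinput unchanged
            return tinput.clamp(min=0)
```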