# ReLU and Softmax

## 1. Theoretical Introduction to Softmax and ReLU

### Softmax
The Softmax function is an activation function that generalizes the logistic (sigmoid) function to multiple dimensions. It is typically used in the output layer of a neural network for multi-class classification. It takes a vector of arbitrary real-valued scores and maps it to a vector of values between zero and one that sum to one, so the output can be read as a probability distribution over the classes. It is defined as:

$$
\sigma(\mathbf{z})_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}
$$

Where:
- \( \mathbf{z} \) is the input vector to the softmax function.
- \( K \) is the number of classes.
- \( j \) is the index of the output element (class) being computed, with \( 1 \le j \le K \).
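
The formula above can be sketched in a few lines of NumPy. This is a minimal illustration, not a library implementation: the helper name `softmax` and the example scores are assumptions made for the demonstration. Subtracting the maximum score before exponentiating is a common numerical-stability trick; it leaves the result unchanged because the common factor cancels between numerator and denominator.

```python
import numpy as np

def softmax(z):
    """Softmax along the last axis, shifted by the max for numerical stability."""
    z = np.asarray(z, dtype=float)
    shifted = z - np.max(z, axis=-1, keepdims=True)  # same result, no overflow
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Hypothetical raw scores for K = 3 classes
scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)

print(probs)        # approximately [0.659 0.242 0.099]
print(probs.sum())  # 1.0 up to floating-point rounding
```

Every output lies strictly between zero and one, and the largest score receives the largest probability.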

### ReLU (Rectified Linear Activation)
ReLU stands for Rectified Linear Unit. It is an activation function that is widely used in convolutional neural networks and other deep learning models. The mathematical expression for ReLU is:

$$
f(x) = \max(0, x)
$$

The function returns \( x \) if \( x \) is greater than or equal to zero, and returns zero otherwise. An often-cited advantage of ReLU over other activation functions is that it produces sparse activations: not all neurons are active at the same time. Whenever a neuron's pre-activation value is below zero, ReLU maps it to zero, and a neuron with an output of zero is considered inactive for that input.
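
As a small illustration of this behaviour, the sketch below applies ReLU to a handful of values with NumPy. The function name `relu` and the sample pre-activations are assumptions chosen for the example.

```python
import numpy as np

def relu(x):
    """Element-wise ReLU: keep non-negative values, replace negatives with zero."""
    return np.maximum(0.0, x)

# Hypothetical pre-activation values of five neurons
pre_activations = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
activations = relu(pre_activations)

print(activations)  # [0.  0.  0.  0.5 2. ]

# Fraction of neurons that are active (output strictly above zero)
active_fraction = np.mean(activations > 0)
print(active_fraction)  # 0.4
```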


In [None]:
# Importing required libraries
import numpy as np
import tensorflow as tf
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

## 2. Dataset

In [None]:
# Create a dataset with 3 classes
X, y = make_blobs(n_samples=1000, centers=3, random_state=42)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

## 3. Model coded in Python


In [None]:
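# Build a small network: ReLU in the hidden layer, softmax in the output layer
model_relu = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),                      # two input features per sample
    tf.keras.layers.Dense(10, activation='relu'),    # hidden layer
    tf.keras.layers.Dense(3, activation='softmax'),  # one output probability per class
])

# Integer class labels (0, 1, 2) pair with the sparse categorical cross-entropy loss
model_relu.compile(optimizer='adam',
                   loss='sparse_categorical_crossentropy',
                   metrics=['accuracy'])

history_relu = model_relu.fit(X_train, y_train, epochs=50, batch_size=10, verbose=0)

# Evaluate on the held-out test split
test_loss, test_accuracy = model_relu.evaluate(X_test, y_test, verbose=0)
print(f"Test accuracy: {test_accuracy:.3f}")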
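
To make explicit what this model computes, here is a rough NumPy sketch of its forward pass and of the sparse categorical cross-entropy loss. It is not how Keras implements these steps internally. The weights are random stand-ins with the same shapes as the model (2 inputs, 10 hidden units, 3 classes), and the names `forward`, `sparse_categorical_crossentropy`, `X_demo`, and `y_demo`, as well as the small constant inside the logarithm, are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, randomly initialised parameters (not the trained Keras weights)
W1, b1 = rng.normal(size=(2, 10)), np.zeros(10)
W2, b2 = rng.normal(size=(10, 3)), np.zeros(3)

def forward(X):
    """Dense -> ReLU -> Dense -> softmax, returning one probability row per sample."""
    hidden = np.maximum(0.0, X @ W1 + b1)                 # hidden layer with ReLU
    logits = hidden @ W2 + b2                             # raw class scores
    logits = logits - logits.max(axis=1, keepdims=True)   # stabilise the exponentials
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)           # softmax over classes

def sparse_categorical_crossentropy(probs, labels):
    """Mean negative log-probability assigned to the true integer label."""
    picked = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(picked + 1e-12))               # small constant avoids log(0)

# Hypothetical mini-batch of four 2-D points with integer labels
X_demo = rng.normal(size=(4, 2))
y_demo = np.array([0, 2, 1, 2])

probs = forward(X_demo)
loss = sparse_categorical_crossentropy(probs, y_demo)
print(probs.shape)  # (4, 3)
print(loss)         # a positive number; training lowers it by adjusting the weights
```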

## 4. Comparison of ReLU and Softmax

### ReLU
ReLU, or Rectified Linear Unit, is defined as \( f(x) = \max(0, x) \). The function returns \( x \) if it is positive, and zero otherwise. It is applied element-wise, meaning that each element of the input (for example, each value in a feature map) is processed independently of the others. ReLU has been found to greatly accelerate the convergence of stochastic gradient descent compared to the sigmoid and tanh functions, and it is widely used in deep neural networks.

### Softmax
Softmax converts a vector of numbers into a probability distribution, making it suitable for the output layer in multi-class classification problems. Because the exponential amplifies differences between scores, softmax emphasizes the largest values and suppresses values that are significantly below the maximum. The predicted class is still the one with the highest score; what softmax adds is a normalized confidence for each class.
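
To illustrate how softmax favours the largest score, the sketch below compares two hypothetical score vectors: one where the scores are close together and one where the top score is well ahead. The helper `softmax` and both vectors are assumptions for the example.

```python
import numpy as np

def softmax(z):
    """Softmax of a 1-D score vector, shifted by the max for stability."""
    z = np.asarray(z, dtype=float)
    exp = np.exp(z - z.max())
    return exp / exp.sum()

close_scores = np.array([1.0, 2.0, 3.0])   # top score only slightly ahead
spread_scores = np.array([1.0, 2.0, 8.0])  # top score far ahead

p_close = softmax(close_scores)
p_spread = softmax(spread_scores)

print(p_close)   # top class gets about 0.67
print(p_spread)  # top class gets more than 0.99; the others are nearly zero
```

In both cases the same class wins, but the wider the gap between scores, the more probability mass softmax concentrates on the winner.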

To summarize, ReLU is commonly used in hidden layers to introduce non-linearity and to mitigate the vanishing gradient problem, since its gradient is exactly one for positive inputs. Softmax, on the other hand, is used in the output layer of multi-class classification models. The two do not compete for the same layer; the point is to know where each one fits best in a neural network architecture.
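
The vanishing gradient point can be made concrete with a simplified calculation. The sketch below multiplies the activation derivative across a chain of 10 layers, ignoring the weight matrices entirely, which is a deliberate simplification. The depth, the pre-activation value, and the function names are all assumptions for illustration.

```python
import numpy as np

def sigmoid_grad(x):
    """Derivative of the logistic sigmoid; never larger than 0.25."""
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return np.where(x > 0, 1.0, 0.0)

x = 0.5      # hypothetical pre-activation, assumed identical in every layer
depth = 10   # hypothetical number of stacked layers

# Product of activation derivatives along the chain (weights ignored)
sigmoid_chain = float(sigmoid_grad(x) ** depth)
relu_chain = float(relu_grad(x) ** depth)

print(sigmoid_chain)  # below 1e-6: the signal has almost vanished
print(relu_chain)     # 1.0: the signal passes through active ReLU units unchanged
```

In a real network the weights also scale the gradient, and ReLU units with negative inputs pass no gradient at all, so this shows only the activation function's contribution.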
