
# Activation Functions in Neural Networks

In this notebook, we explore the importance of activation functions in deep learning, their mathematical foundations, and code implementations. We'll cover:

1. **Introduction to Activation Functions**
2. **Types of Activation Functions**
    - Step Function
    - Linear Function
    - Sigmoid Function
    - Rectified Linear Unit (ReLU)
    - Softmax Function
3. **Implementing Activation Functions in a Neural Network**
4. **Building and Running a Simple Neural Network**




## 1. Introduction

Activation functions are essential components of neural networks. They introduce non-linearity, which allows the network to learn complex patterns. Without them, even a deep network would behave like a single linear model, because a composition of linear layers is itself linear.

In a typical deep learning model:
- **Hidden Layers** use non-linear activations (such as ReLU or Sigmoid) to capture intricate patterns.
- **Output Layers** use activations tailored to the task (Softmax for multi-class classification, Sigmoid for binary classification, Linear for regression).


## 2. Types of Activation Functions

### 2.1 Step Activation Function

The step function mimics a neuron "firing" or "not firing":
$$f(x) = \begin{cases}
1 & \text{if } x > 0 \\
0 & \text{otherwise}
\end{cases}
$$

In a single neuron, if `weights · inputs + bias` is greater than 0, the neuron fires and outputs 1; otherwise, it outputs 0.


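*Note:* The derivative of the step function is zero everywhere except at $x = 0$, where it is undefined, so it provides no useful gradient for backpropagation. This is why it is rarely used in modern deep learning.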
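
As a minimal sketch of the "fire / don't fire" behavior described above, the step function can be written with NumPy. The weights, inputs, and bias below are made-up values chosen only for illustration; they do not come from any trained model.

```python
import numpy as np

def step(x):
    """Step activation: 1 where x > 0, otherwise 0 (same rule as the formula above)."""
    return np.where(np.asarray(x) > 0, 1, 0)

# Hypothetical single neuron (illustrative values, not from the notebook)
weights = np.array([0.5, -1.0, 0.25])
inputs = np.array([2.0, 0.5, 4.0])
bias = -1.0

# weights · inputs + bias = 1.0 - 0.5 + 1.0 - 1.0 = 0.5
pre_activation = np.dot(weights, inputs) + bias
fired = step(pre_activation)

print("Pre-activation:", pre_activation)
print("Neuron output:", fired)
```

Because the pre-activation is positive, the neuron fires. Any non-positive value, including exactly 0, produces an output of 0.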


### 2.2 Linear Activation Function

The linear function is defined as:
$$
f(x) = x
$$
It is often used in the output layer for regression problems. However, stacking multiple linear layers still yields a linear function, so it adds no expressive power when used in hidden layers.
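
The claim that stacked linear layers collapse into one can be checked numerically. The sketch below uses arbitrary layer shapes and random values (both are assumptions made for the demonstration) and compares a two-layer linear network with a single equivalent layer whose weights are $W_1 W_2$ and whose bias is $b_1 W_2 + b_2$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two stacked layers with linear activation f(x) = x (shapes chosen arbitrarily)
W1, b1 = rng.normal(size=(4, 5)), rng.normal(size=(1, 5))
W2, b2 = rng.normal(size=(5, 3)), rng.normal(size=(1, 3))
X = rng.normal(size=(6, 4))  # batch of 6 samples with 4 features each

# Forward pass through both layers
two_layers = (X @ W1 + b1) @ W2 + b2

# Equivalent single layer: W = W1 W2, b = b1 W2 + b2
W = W1 @ W2
b = b1 @ W2 + b2
one_layer = X @ W + b

print("Outputs match:", np.allclose(two_layers, one_layer))
```

The two outputs agree up to floating-point error, which is why a non-linear activation is needed between layers.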


### 2.3 Sigmoid Activation Function

The Sigmoid function is given by:
$$
\sigma(x) = \frac{1}{1 + e^{-x}}
$$
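It outputs values strictly between 0 and 1, which can be interpreted as a probability and makes it suitable for the output of a binary classifier. However, it saturates for inputs of large magnitude: its gradient approaches zero there, which can cause vanishing gradients in deep networks.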
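
To make the vanishing-gradient point concrete, here is a small sketch of the sigmoid and its derivative, $\sigma'(x) = \sigma(x)\,(1 - \sigma(x))$. The helper names `sigmoid` and `sigmoid_grad` are chosen for this example only.

```python
import numpy as np

def sigmoid(x):
    """Sigmoid activation: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

xs = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print("Sigmoid: ", sigmoid(xs))
print("Gradient:", sigmoid_grad(xs))
```

The gradient peaks at 0.25 for $x = 0$ and shrinks toward zero as $|x|$ grows. During backpropagation these small factors are multiplied across layers, which is how the gradient can vanish.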


### 2.4 Rectified Linear Unit (ReLU)

ReLU is defined as:
$$
f(x) = \max(0, x)
$$
ReLU is widely used because it is cheap to compute and helps mitigate the vanishing gradient problem: its gradient is exactly 1 for every positive input, so it does not saturate on that side.

#### Code Examples for ReLU:


In [None]:
# ReLU using a loop
inputs = [0, 2, -1, 3.3, -2.7, 1.1, 2.2, -100]
output = []
for i in inputs:
    if i > 0:
        output.append(i)
    else:
        output.append(0)
print("ReLU output (loop):", output)


In [None]:
# ReLU using list comprehension
output = [max(0, i) for i in inputs]
print("ReLU output (comprehension):", output)


In [None]:
# ReLU using NumPy
import numpy as np
output = np.maximum(0, inputs)
print("ReLU output (NumPy):", output)


### 2.5 Softmax Activation Function

Softmax converts a vector of logits into a probability distribution, where every output is positive and the outputs sum to 1:
$$
\text{softmax}(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}
$$

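For numerical stability, we subtract the maximum value before exponentiating. This does not change the result, because the common factor $e^{-\max(z)}$ cancels between numerator and denominator: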
$$
\text{softmax}(z_i) = \frac{e^{z_i - \max(z)}}{\sum_j e^{z_j - \max(z)}}
$$
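
The sketch below illustrates why the subtraction matters. The function names `softmax_naive` and `softmax_stable` and the large logits are assumptions made for this demonstration. In float64, `np.exp` overflows to `inf` for inputs above roughly 709, so the naive version breaks on large logits while the stable version does not.

```python
import numpy as np

def softmax_naive(z):
    """Direct implementation of the first formula (can overflow)."""
    exp_z = np.exp(z)
    return exp_z / np.sum(exp_z)

def softmax_stable(z):
    """Implementation with the maximum subtracted before exponentiating."""
    exp_z = np.exp(z - np.max(z))
    return exp_z / np.sum(exp_z)

small = np.array([4.8, 1.21, 2.385])
large = np.array([1000.0, 1001.0, 1002.0])  # illustrative large logits

# For moderate logits both versions agree
print("Same result on small logits:",
      np.allclose(softmax_naive(small), softmax_stable(small)))

# For large logits the naive version computes inf / inf, which is nan
with np.errstate(over="ignore", invalid="ignore"):
    naive_large = softmax_naive(large)
stable_large = softmax_stable(large)

print("Naive on large logits: ", naive_large)
print("Stable on large logits:", stable_large)
```

After the subtraction the largest exponent is 0, so every exponentiated value lies in $(0, 1]$ and overflow cannot occur.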


In [None]:
# Softmax example for a single sample
# Use a NumPy array so the subtraction below is an explicit array operation
layer_outputs = np.array([4.8, 1.21, 2.385])
exp_values = np.exp(layer_outputs - np.max(layer_outputs))
norm_values = exp_values / np.sum(exp_values)

print("Exponentiated values:", exp_values)
print("Softmax output:", norm_values)


In [None]:
# Softmax example for a batch of data
inputs = np.array([[4.8, 1.21, 2.385],
                   [8.9, -1.81, 0.2],
                   [1.41, 1.051, 0.026]])

exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))
probabilities = exp_values / np.sum(exp_values, axis=1, keepdims=True)
print("Batch Softmax output:\n", probabilities)


## 3. Implementing Activation Functions in a Neural Network

Let's build a simple neural network with:
- A dense (fully-connected) layer
- ReLU activation in the hidden layer
- Softmax activation in the output layer

### 3.1 Dense Layer
For a batch of inputs $X$, a weight matrix $W$, and a bias vector $b$, the dense layer computes:
$$
\text{Output} = XW + b
$$
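
A small worked example may help with the shapes involved. The numbers below are made up for illustration: 3 samples with 2 features each pass through a layer of 4 neurons, and the bias row is broadcast across all samples.

```python
import numpy as np

# Batch of inputs: shape (3 samples, 2 features)
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Weights: shape (2 inputs, 4 neurons)
W = np.array([[0.1, 0.2, 0.3, 0.4],
              [0.5, 0.6, 0.7, 0.8]])

# Biases: shape (1, 4), broadcast across the 3 samples
b = np.array([[1.0, 2.0, 3.0, 4.0]])

output = X @ W + b

print("Output shape:", output.shape)  # one row per sample, one column per neuron
print("First row:", output[0])        # e.g. first entry: 1*0.1 + 2*0.5 + 1 = 2.1
```

Storing the weights with shape `(n_inputs, n_neurons)` means the product `X @ W` needs no transpose.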


In [None]:
# Dense Layer class
class Layer_Dense:
    def __init__(self, n_inputs, n_neurons):
        # Initialize weights with small random values and biases with zeros
        self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))

    def forward(self, inputs):
        # Compute the linear combination of inputs, weights, and biases
        self.output = np.dot(inputs, self.weights) + self.biases


### 3.2 ReLU Activation Layer


In [None]:
# ReLU Activation class
class Activation_ReLU:
    def forward(self, inputs):
        # Apply the ReLU function element-wise
        self.output = np.maximum(0, inputs)


### 3.3 Softmax Activation Layer


In [None]:
# Softmax Activation class
class Activation_Softmax:
    def forward(self, inputs):
        # Subtract the maximum value for numerical stability
        exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))
        # Normalize to get probabilities
        self.output = exp_values / np.sum(exp_values, axis=1, keepdims=True)


## 4. Building and Running a Simple Neural Network

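The code below uses the classes defined above to create a simple neural network, pass data through its layers, and print the output probabilities. This demonstrates how the network uses activation functions to produce predictions. Because the weights are still random and no training has taken place, the predictions are not yet meaningful.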


In [None]:
# Install the nnfs package (notebook shell command)
!pip install nnfs
from nnfs.datasets import spiral_data

# Generate the spiral dataset: 100 points per class, 3 classes, 2 features per point
X, y = spiral_data(samples=100, classes=3)

# Create first dense layer (hidden layer): 2 input features, 3 neurons
dense1 = Layer_Dense(2, 3)
activation1 = Activation_ReLU()

# Create second dense layer (output layer): 3 inputs, 3 class scores
dense2 = Layer_Dense(3, 3)
activation2 = Activation_Softmax()

# Forward pass through the network
dense1.forward(X)
activation1.forward(dense1.output)
dense2.forward(activation1.output)
activation2.forward(dense2.output)

# Display the first 5 output probabilities
print("Network output (first 5 samples):\n", activation2.output[:5])
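
As a sanity check on a forward pass like the one above, the following self-contained sketch verifies that each row of the softmax output sums to 1 and shows how class predictions can be read off with `argmax`. It is only an illustration under stated assumptions: random 2-D points stand in for `spiral_data`, and the layers are written inline with the same initialization scheme as `Layer_Dense` (small random weights, zero biases).

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data (assumption): 300 random 2-D points instead of spiral_data
X_demo = rng.normal(size=(300, 2))

# Same initialization scheme as Layer_Dense: small random weights, zero biases
W1, b1 = 0.01 * rng.normal(size=(2, 3)), np.zeros((1, 3))
W2, b2 = 0.01 * rng.normal(size=(3, 3)), np.zeros((1, 3))

# Forward pass: dense -> ReLU -> dense -> softmax
hidden = np.maximum(0, X_demo @ W1 + b1)
logits = hidden @ W2 + b2
exp_values = np.exp(logits - np.max(logits, axis=1, keepdims=True))
probs = exp_values / np.sum(exp_values, axis=1, keepdims=True)

# Each row is a probability distribution over the 3 classes
row_sums = probs.sum(axis=1)
print("All rows sum to 1:", np.allclose(row_sums, 1.0))

# The predicted class is the index of the largest probability in each row
predictions = np.argmax(probs, axis=1)
print("First 5 predictions:", predictions[:5])
print("First 5 probability rows:\n", probs[:5])
```

With weights this small and no training, the logits are close to zero, so every probability comes out close to 1/3. The predictions only become informative once the weights have been trained.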
