# Module85 Activation function Assignment - 1

Q1. What is an activation function in the context of artificial neural networks?

A1. An activation function is a mathematical function applied to a neuron's weighted sum of inputs (plus bias) to produce the neuron's output.


It introduces non-linearity to the network, enabling it to learn complex patterns and relationships in data.


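Without non-linear activation functions, any stack of layers collapses into a single linear transformation, so the whole network can only model linear relationships, much like a linear regression model.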
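The collapse can be made concrete with a minimal pure-Python sketch. It uses a hypothetical 2 → 3 → 1 network with no activation between its layers. The weights `W1`, `b1`, `W2`, `b2` and the helper names are illustrative assumptions. The sketch checks that the two-layer forward pass gives the same result as a single linear layer with `W = W2·W1` and `b = W2·b1 + b2`.

```python
def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(A, B):
    """Multiply matrices A and B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def vadd(u, v):
    """Element-wise vector addition."""
    return [a + b for a, b in zip(u, v)]

# Hypothetical weights for a 2 -> 3 -> 1 network
W1 = [[1.0, -2.0], [0.5, 3.0], [-1.0, 1.0]]
b1 = [0.1, -0.2, 0.3]
W2 = [[2.0, -1.0, 0.5]]
b2 = [0.4]

def two_layer_linear(x):
    """Forward pass with NO activation between the two layers."""
    hidden = vadd(matvec(W1, x), b1)
    return vadd(matvec(W2, hidden), b2)

# The same mapping expressed as one linear layer: W = W2 @ W1, b = W2 @ b1 + b2
W = matmul(W2, W1)
b = vadd(matvec(W2, b1), b2)

def one_layer_linear(x):
    return vadd(matvec(W, x), b)

x = [0.7, -1.3]
print(two_layer_linear(x), one_layer_linear(x))  # equal up to floating-point rounding
```

Putting a non-linearity such as ReLU between the two layers breaks this identity. That is what lets deeper networks represent non-linear functions.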


Q2. What are some common types of activation functions used in neural networks?

A2. Some commonly used activation functions include:

1. **Sigmoid** – S-shaped curve that maps inputs to (0, 1). It is commonly used in the output layer for binary classification.

2. **Tanh (Hyperbolic Tangent)** – Similar to Sigmoid but ranges from -1 to 1.

3. **ReLU (Rectified Linear Unit)** – Outputs 0 for negative inputs and x for positive inputs.

4. **Leaky ReLU** – Modified ReLU that allows a small slope for negative values to avoid dead neurons.

5. **Softmax** – Used for multi-class classification. It converts a vector of scores into a probability distribution over the classes.


Q3. How do activation functions affect the training process and performance of a neural network?

A3. Activation functions play a crucial role in:

1. **Learning complex patterns** – By introducing non-linearity, they help the network learn complex decision boundaries.

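2. **Gradient propagation** – Saturating functions such as Sigmoid have small derivatives (at most 0.25). Backpropagation multiplies these derivatives layer by layer, so in deep networks the gradients that reach the early layers can shrink toward zero (the vanishing gradient problem), which slows training.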

3. **Training speed** – ReLU is cheap to compute (a single comparison) and does not saturate for positive inputs. Both properties typically speed up training.

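4. **Model accuracy** – Choosing a suitable activation function, especially for the output layer, improves performance on the task. For example, use Sigmoid for binary classification, Softmax for multi-class classification, and a linear output for regression.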
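The gradient-propagation point can be illustrated with a deliberately simplified sketch. It multiplies one activation derivative per layer and ignores the weights. In a real network, weights can also scale gradients up or down. Evaluating Sigmoid at x = 0 gives its best case, because its derivative reaches its maximum of 0.25 there. ReLU's derivative for a positive input is 1. The name `chained_grad` is hypothetical.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x: float) -> float:
    s = sigmoid(x)
    return s * (1.0 - s)  # at most 0.25, reached at x = 0

def relu_grad(x: float) -> float:
    return 1.0 if x > 0 else 0.0

def chained_grad(grad_fn, z: float, depth: int) -> float:
    """Product of activation derivatives over `depth` layers.
    Simplification: every layer has pre-activation z and unit weights."""
    g = 1.0
    for _ in range(depth):
        g *= grad_fn(z)
    return g

for depth in (1, 5, 10, 20):
    # Sigmoid's factor shrinks geometrically (0.25 ** depth); ReLU's stays at 1
    print(depth, chained_grad(sigmoid_grad, 0.0, depth), chained_grad(relu_grad, 1.0, depth))
```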

Q4. How does the sigmoid activation function work? What are its advantages and disadvantages?

A4. The Sigmoid function is defined as:

```
f(x) = 1 / (1 + e^(-x))
```

It maps input values to a range of (0,1), making it useful for binary classification problems.
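The formula can be implemented directly with Python's `math` module. The minimal sketch below branches on the sign of x so that `math.exp` never receives a large positive argument. Without that branch, the naive formula raises `OverflowError` for very negative inputs. The branching is a common implementation choice; the original text does not prescribe it.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid f(x) = 1 / (1 + e^(-x)).

    Mathematically the output lies in (0, 1); in floating point it rounds
    to exactly 0.0 or 1.0 for extreme inputs.
    """
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so z is in (0, 1) and cannot overflow
    return z / (1.0 + z)

print(sigmoid(0.0))    # 0.5
print(sigmoid(-1000))  # 0.0, with no OverflowError
```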

✅ Advantages:

1. Converts input into probabilities, making it interpretable.

2. Smooth and differentiable.

❌ Disadvantages:

1. **Vanishing Gradient Problem** – Large negative or positive inputs lead to very small gradients, slowing training.

2. **Not zero-centered** – Outputs are always positive, which can slow convergence.

3. **Higher computational cost** – It requires computing an exponential, which is more expensive than a simple comparison such as the one ReLU uses.
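The first two disadvantages follow from the derivative f'(x) = f(x)(1 − f(x)). This minimal sketch prints the output and the derivative at a few inputs. The derivative peaks at 0.25 when x = 0 and approaches 0 as |x| grows, which is saturation. The output is always positive, so it is not zero-centered. The function names are illustrative.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid, branching on sign to avoid overflow in math.exp."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid_derivative(x: float) -> float:
    """f'(x) = f(x) * (1 - f(x)); its largest value (0.25) occurs at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)

for x in (-10.0, -5.0, 0.0, 5.0, 10.0):
    # f(x) is always > 0; f'(x) collapses toward 0 for large |x|
    print(f"x={x:6.1f}  f(x)={sigmoid(x):.6f}  f'(x)={sigmoid_derivative(x):.2e}")
```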

Q5. What is the rectified linear unit (ReLU) activation function? How does it differ from the sigmoid function?

A5. The ReLU function is defined as:

```
f(x) = max(0, x)
```

It outputs 0 for negative values and the input itself for positive values.
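In code, ReLU and its derivative are each a single comparison. This minimal sketch uses plain Python. At x = 0 the derivative is mathematically undefined, and the sketch uses 0 there as a common convention, which is an assumption.

```python
def relu(x: float) -> float:
    """ReLU: f(x) = max(0, x)."""
    return max(0.0, x)

def relu_grad(x: float) -> float:
    """Derivative: 1 for x > 0, 0 for x < 0 (0 at x = 0 by convention)."""
    return 1.0 if x > 0 else 0.0

inputs = [-3.0, -0.5, 0.0, 0.5, 3.0]
print([relu(x) for x in inputs])       # [0.0, 0.0, 0.0, 0.5, 3.0]
print([relu_grad(x) for x in inputs])  # [0.0, 0.0, 0.0, 1.0, 1.0]
```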

## Differences from Sigmoid:

1. **No exponentiation** → ReLU is computationally efficient.

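2. **Mitigates vanishing gradients** → For positive inputs the gradient is exactly 1, so unlike Sigmoid's gradient it does not shrink as inputs grow large.

3. **Non-saturating** → Active units pass gradients through without being squashed, which helps deeper networks train faster.

4. **Dead Neurons Problem** → If a neuron's input is negative for every training example, the neuron always outputs zero, receives zero gradient, and stops learning (the "dying ReLU" problem).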
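The dead-neuron case can be reproduced with a hypothetical single neuron z = w·x + b whose bias has become strongly negative. The values w = 0.5 and b = −10 and the tiny input batch are illustrative assumptions. Every pre-activation in this batch is negative, so the outputs and the gradients with respect to w and b are all zero. As a result, gradient descent cannot move w or b on these inputs.

```python
def relu(z: float) -> float:
    return max(0.0, z)

def relu_grad(z: float) -> float:
    """1 for z > 0, else 0 (0 at z = 0 by convention)."""
    return 1.0 if z > 0 else 0.0

# Hypothetical neuron z = w * x + b with a strongly negative bias
w, b = 0.5, -10.0
inputs = [-2.0, -1.0, 0.0, 1.0, 2.0, 5.0]

outputs = [relu(w * x + b) for x in inputs]
# Chain rule: d relu(z)/dw = relu_grad(z) * x and d relu(z)/db = relu_grad(z)
grad_w = sum(relu_grad(w * x + b) * x for x in inputs)
grad_b = sum(relu_grad(w * x + b) for x in inputs)
print(outputs, grad_w, grad_b)  # every output and both gradients are zero
```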



Q6. What are the benefits of using the ReLU activation function over the sigmoid function?

A6. The benefits of using the ReLU activation function over the sigmoid function are:

1. **Avoids vanishing gradients** – Keeps gradients large for positive values, helping deeper networks learn efficiently.

2. **Computationally efficient** – Requires only a max operation, unlike sigmoid which involves exponentiation.

3. **Sparse activation** – Many neurons output zero, leading to a more efficient and sparse network representation.

4. **Faster convergence** – Leads to better gradient flow and faster optimization.
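Sparse activation can be seen by applying ReLU to a batch of pre-activations. This minimal sketch assumes standard-normal pre-activations, which is a simplification. In a real network, the degree of sparsity depends on the weights, biases, and data. Under that assumption, roughly half of the outputs are exactly zero.

```python
import random

def relu(z: float) -> float:
    return max(0.0, z)

# Hypothetical pre-activations drawn from a standard normal distribution
rng = random.Random(0)  # fixed seed for reproducibility
pre_activations = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
activations = [relu(z) for z in pre_activations]

zero_fraction = sum(1 for a in activations if a == 0.0) / len(activations)
print(f"fraction of exactly-zero outputs: {zero_fraction:.3f}")  # close to 0.5
```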

Q7. Explain the concept of "leaky ReLU" and how it addresses the vanishing gradient problem.

A7. Leaky ReLU modifies ReLU so that, instead of being set to zero, negative inputs are multiplied by a small slope a:

```
f(x) = x,    if x > 0
f(x) = ax,   if x <= 0
```
where **a** is a small positive constant (e.g., 0.01).

## How it helps?

1. Prevents neurons from becoming completely inactive (avoiding the dead neuron problem).

2. Retains some gradient flow even for negative inputs, unlike ReLU.

3. Improves gradient propagation in deeper networks, because no unit's local gradient is ever exactly zero.
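A minimal Python sketch of Leaky ReLU and its derivative follows, with the slope `a` defaulting to the 0.01 example from the text. For strongly negative pre-activations, which would give a ReLU unit zero gradient, the derivative stays at `a`, so the unit can still learn. The function names are illustrative.

```python
def leaky_relu(x: float, a: float = 0.01) -> float:
    """Leaky ReLU: x for x > 0, a * x for x <= 0."""
    return x if x > 0 else a * x

def leaky_relu_grad(x: float, a: float = 0.01) -> float:
    """Derivative: 1 for x > 0, a for x <= 0, so it is never exactly zero."""
    return 1.0 if x > 0 else a

# Strongly negative pre-activations that would "kill" a plain ReLU unit
pre_activations = [-9.0, -7.5, -3.0]
print([leaky_relu(z) for z in pre_activations])       # small negative outputs
print([leaky_relu_grad(z) for z in pre_activations])  # [0.01, 0.01, 0.01]
```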

Q8. What is the purpose of the softmax activation function? When is it commonly used?

A8. The Softmax function converts a vector of real numbers into probabilities that sum to 1:
```
σ(x)_i = e^(x_i) / Σ_{j=1..N} e^(x_j),   for i = 1, ..., N
```
where x_i is the score (logit) for class i and N is the total number of classes.

It is commonly used in the output layer of multi-class classification problems, where each class is assigned a probability.
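The formula can be sketched in plain Python. This sketch subtracts the largest logit before exponentiating, which is a common trick and an implementation choice not taken from the text. Subtracting the maximum multiplies the numerator and denominator by the same factor e^(−max), so the probabilities are unchanged, but `math.exp` can no longer overflow on large logits.

```python
import math

def softmax(logits):
    """Softmax over a list of logits: e^(x_i) / sum_j e^(x_j)."""
    m = max(logits)                             # shift for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))  # roughly [0.659, 0.242, 0.099], summing to 1
```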

## Why use Softmax?

1. Converts logits into probabilities.

2. Ensures that the sum of all outputs is 1 (useful for classification).

3. Allows easy interpretation of class predictions.

## Where is it used?

Last layer of a neural network for multi-class classification (e.g., image classification).


Q9. What is the hyperbolic tangent (tanh) activation function? How does it compare to the sigmoid function?

A9. The tanh function is defined as:

```
f(x) = (e^x - e^-x) / (e^x + e^-x)
```

It maps input values to the range (-1,1).
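Tanh is closely related to Sigmoid: tanh(x) = 2σ(2x) − 1, where σ is the sigmoid function. In other words, tanh is a sigmoid rescaled to the range (−1, 1). The minimal sketch below evaluates the definition directly and compares it with the standard library's `math.tanh` and with the identity. The direct formula is only suitable for moderate |x|, because `math.exp` overflows for very large inputs; `math.tanh` is the safer choice in practice.

```python
import math

def tanh_from_definition(x: float) -> float:
    """tanh(x) = (e^x - e^-x) / (e^x + e^-x); fine for moderate |x|."""
    ex, emx = math.exp(x), math.exp(-x)
    return (ex - emx) / (ex + emx)

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    # Direct formula, math.tanh, and the identity tanh(x) = 2*sigmoid(2x) - 1
    print(x, tanh_from_definition(x), math.tanh(x), 2 * sigmoid(2 * x) - 1)
```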

## Comparison with Sigmoid:
| Feature | Sigmoid | Tanh |
|---|---|---|
| Range | (0, 1) | (-1, 1) |
| Zero-centered? | No | Yes |
| Saturation issue? | Yes | Yes |
| Preferred for? | Output layer for binary classification | Hidden layers (e.g., in RNNs and LSTMs) |

## Why prefer Tanh over Sigmoid?

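1. Outputs are zero-centered, so the inputs to the next layer are not all positive. This avoids forcing all of that layer's weight gradients to share the same sign, which usually speeds up convergence.

2. Steeper gradient around zero (maximum derivative 1, versus 0.25 for Sigmoid). This reduces, but does not eliminate, the vanishing gradient risk.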

## Disadvantage:

Still suffers from vanishing gradients in deep networks.
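Both the steeper slope and the remaining saturation problem show up when the two derivatives are compared: σ'(x) = σ(x)(1 − σ(x)) and tanh'(x) = 1 − tanh²(x). This minimal sketch uses only Python's `math` module, and the function names are illustrative. Near zero, tanh's derivative is up to four times larger than Sigmoid's. However, both derivatives approach zero as |x| grows. At x = 3, tanh's derivative is already smaller than Sigmoid's.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))  # fine for the moderate inputs used here

def sigmoid_grad(x: float) -> float:
    s = sigmoid(x)
    return s * (1.0 - s)   # peaks at 0.25 when x = 0

def tanh_grad(x: float) -> float:
    t = math.tanh(x)
    return 1.0 - t * t     # peaks at 1.0 when x = 0

for x in (0.0, 1.0, 3.0, 6.0):
    # tanh is steeper near 0, but both derivatives shrink toward 0 as |x| grows
    print(f"x={x:4.1f}  sigmoid'={sigmoid_grad(x):.2e}  tanh'={tanh_grad(x):.2e}")
```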

