## __Activation Functions in PyTorch__
Date: 28 Sep 2024.

An activation function adds non-linearity to a network built from linear layers, which lets the network model non-linear relationships. It should also be differentiable (at least almost everywhere) so that gradients can be backpropagated to reduce the error and adjust the weights. PyTorch provides the common activation functions as modules in `torch.nn`.
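To see why the non-linearity matters, the following sketch checks that two `nn.Linear` layers stacked without an activation collapse into a single linear map. The layer sizes and the random seed are arbitrary choices made only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two linear layers with no activation in between (sizes chosen arbitrarily)
f1 = nn.Linear(3, 4)
f2 = nn.Linear(4, 2)

x = torch.randn(5, 3)
stacked = f2(f1(x))

# The same mapping as one linear layer: W = W2 @ W1, b = W2 @ b1 + b2
W = f2.weight @ f1.weight
b = f2.weight @ f1.bias + f2.bias
single = x @ W.T + b

print(torch.allclose(stacked, single, atol=1e-5))  # True

# With a ReLU in between, the single linear map no longer reproduces the output
with_relu = f2(torch.relu(f1(x)))
print((with_relu - single).abs().max())  # generally non-zero
```

However many linear layers are stacked, the result stays linear in the input; only an activation between them lets the network represent non-linear functions.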

`Types of PyTorch Activation Functions`

* ReLU Activation Function
* Leaky ReLU Activation Function
* Sigmoid Activation Function
* Tanh Activation Function
* Softmax Activation Function

1. `ReLU Activation Function:`

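ReLU passes positive inputs through unchanged and outputs 0 otherwise, i.e. $\mathrm{ReLU}(x) = \max(0, x)$. For negative inputs its derivative is zero, so a neuron that only receives negative inputs stops updating its weights. This is known as the ‘dying’ of neurons. The following Python program illustrates the use of ReLU.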

```python
import torch
import torch.nn as nn

# Define the ReLU module
relu = nn.ReLU()

# Create a float tensor with positive and negative values
x = torch.tensor([1., -2., 3., -5.])

output = relu(x)
print(output)
```

```text
tensor([1., 0., 3., 0.])
```
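The zero derivative behind the ‘dying’ neuron problem can be checked with autograd. A minimal sketch reusing the same input values; `torch.relu` is the functional form of `nn.ReLU`:

```python
import torch

# Same values as above, now tracked by autograd
x = torch.tensor([1., -2., 3., -5.], requires_grad=True)

y = torch.relu(x)
y.sum().backward()  # x.grad[i] = ReLU'(x[i])

# Gradient is 1 where x > 0 and 0 where x < 0
print(x.grad)  # tensor([1., 0., 1., 0.])
```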


2. `Leaky ReLU Activation Function:`

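Leaky ReLU is similar to ReLU, but instead of setting negative inputs to zero it multiplies them by a small slope $a$ (the `negative_slope` argument, 0.01 by default in PyTorch): $\mathrm{LeakyReLU}(x) = x$ for $x \ge 0$ and $a x$ otherwise. Because the gradient for negative inputs is $a$ rather than zero, it avoids the problem of ‘dying’ neurons.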

```python
import torch
import torch.nn as nn

# negative_slope=0.2: negative inputs are multiplied by 0.2 instead of being zeroed
leaky = nn.LeakyReLU(negative_slope=0.2)

x = torch.tensor([[1., -2., 3., -5.]])

output = leaky(x)
print(output)
```

```text
tensor([[ 1.0000, -0.4000,  3.0000, -1.0000]])
```
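A short sketch comparing the functional form `torch.nn.functional.leaky_relu` with the piecewise definition and showing that negative inputs now receive a non-zero gradient equal to the slope (0.2 here, as in the example above):

```python
import torch
import torch.nn.functional as F

x = torch.tensor([1., -2., 3., -5.], requires_grad=True)

y = F.leaky_relu(x, negative_slope=0.2)

# Piecewise definition: x for x >= 0, 0.2 * x otherwise
manual = torch.where(x >= 0, x, 0.2 * x)
print(torch.allclose(y, manual))  # True

y.sum().backward()
# Gradient is 1 for positive inputs and 0.2 (not 0) for negative ones
print(x.grad)
```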


3. `Sigmoid Activation Function:`

The sigmoid $\sigma(x) = \frac{1}{1 + e^{-x}}$ squashes its input into the range (0, 1). Its main drawback is the “vanishing gradient” problem: the derivative $\sigma(x)(1 - \sigma(x))$ is at most 0.25 and approaches zero for large $|x|$. As the number of hidden layers grows, backpropagation multiplies many of these small factors together, so the gradients reaching the early layers become close to zero and those layers learn very slowly or not at all.

```python
import torch
import torch.nn as nn

sig = nn.Sigmoid()
x = torch.tensor([1., -2., 3., -5.])
output = sig(x)

print(output)
```

```text
tensor([0.7311, 0.1192, 0.9526, 0.0067])
```
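The vanishing gradient can be made concrete with autograd. This sketch first checks the closed-form derivative over an arbitrary range of inputs, then chains ten sigmoids (an arbitrary depth chosen for illustration) to show how quickly the gradient shrinks:

```python
import torch

# Autograd's derivative of sigmoid versus the closed form sigma(x) * (1 - sigma(x))
x = torch.linspace(-10, 10, steps=201, requires_grad=True)
s = torch.sigmoid(x)
s.sum().backward()
closed_form = s.detach() * (1 - s.detach())
print(torch.allclose(x.grad, closed_form))  # True
print(x.grad.max().item())                  # about 0.25, reached near x = 0

# Chain rule through 10 stacked sigmoids: each factor is at most 0.25
z = torch.tensor(0.5, requires_grad=True)
h = z
for _ in range(10):
    h = torch.sigmoid(h)
h.backward()
print(z.grad.item())  # at most 0.25 ** 10, roughly 9.5e-7
```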


4. `Tanh Activation Function:`

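Tanh maps its input into the range (−1, 1) and, unlike sigmoid, is zero-centred. Its drawbacks are that it is more expensive to compute than ReLU and that the vanishing gradient problem persists: its derivative $1 - \tanh^2(x)$ approaches zero for large $|x|$.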

```python
import torch
import torch.nn as nn

t = nn.Tanh()
print(t(torch.tensor([1., -2., 3., -5.])))
```

```text
tensor([ 0.7616, -0.9640,  0.9951, -0.9999])
```
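A sketch checking the derivative of tanh with autograd and its relation to the sigmoid, $\tanh(x) = 2\sigma(2x) - 1$, which is why tanh behaves like a rescaled, zero-centred sigmoid:

```python
import torch

x = torch.tensor([1., -2., 3., -5.], requires_grad=True)
t = torch.tanh(x)
t.sum().backward()

# Derivative is 1 - tanh(x)^2: it shrinks towards 0 as |x| grows
print(torch.allclose(x.grad, 1 - t.detach() ** 2))  # True
print(x.grad)

# tanh as a rescaled sigmoid: tanh(x) = 2 * sigmoid(2x) - 1
print(torch.allclose(t, 2 * torch.sigmoid(2 * x) - 1, atol=1e-6))  # True
```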


5. `Softmax Activation Function:`

Softmax differs from the other activation functions in that it is normally applied at the output layer to normalize the raw scores, while the hidden layers use activations such as ReLU. It is used in multiclass classification: it turns the outputs into probabilities, each between 0 and 1, that sum to 1 along the chosen dimension. For an input vector $x$, softmax computes:

$$\mathrm{softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}$$

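```python
import torch
import torch.nn as nn

# dim=0: normalize across the only dimension of this 1-D tensor
sm = nn.Softmax(dim=0)
print(sm(torch.tensor([1., -2., 3., -5.])))
```

```text
tensor([1.1846e-01, 5.8980e-03, 8.7534e-01, 2.9365e-04])
```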
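For batched input the `dim` argument decides which axis is normalized. This sketch assumes a hypothetical batch laid out as (samples, classes) and also computes softmax by hand from the formula; subtracting the row maximum first does not change the result but keeps `exp()` from overflowing for large scores:

```python
import torch
import torch.nn as nn

# Hypothetical batch: 2 samples, 4 classes each
logits = torch.tensor([[1., -2., 3., -5.],
                       [0., 0., 0., 0.]])

# Normalize over the class dimension (dim=1), one distribution per sample
sm = nn.Softmax(dim=1)
probs = sm(logits)
print(probs.sum(dim=1))  # each row sums to 1 (up to float rounding)
print(probs[1])          # tensor([0.2500, 0.2500, 0.2500, 0.2500])

# Manual softmax with the numerically stable max-subtraction trick
shifted = logits - logits.max(dim=1, keepdim=True).values
manual = shifted.exp() / shifted.exp().sum(dim=1, keepdim=True)
print(torch.allclose(probs, manual))  # True
```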
