Activation Functions
This documentation provides an overview of the activation functions implemented in the code. Activation functions play a crucial role in neural networks by introducing non-linearity into the model. They are applied to the output of each neuron, allowing the network to learn complex patterns.
This structure represents a generic activation function and includes both forward (`fw`) and backward (`bw`) methods.
- `fw`: The forward method applies the activation function to the input tensor.
- `bw`: The backward method calculates the gradients during backpropagation.
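As a rough illustration of how the two methods fit together, the sketch below is written in Rust with made-up names and plain `f32` slices; the actual structure in the repository may use different names, tensor types, and SIMD-aware signatures.

```rust
// Illustrative sketch only: names, types, and signatures are assumptions,
// not the repository's actual code.
struct Activation {
    // Forward pass: applies the activation element-wise to the input.
    fw: fn(input: &[f32], output: &mut [f32]),
    // Backward pass: given the forward input and the upstream gradient,
    // writes the gradient with respect to the input.
    bw: fn(input: &[f32], grad_out: &[f32], grad_in: &mut [f32]),
}

// Example: ReLU forward and backward passes.
fn relu_fw(input: &[f32], output: &mut [f32]) {
    for (o, &x) in output.iter_mut().zip(input) {
        *o = x.max(0.0);
    }
}

fn relu_bw(input: &[f32], grad_out: &[f32], grad_in: &mut [f32]) {
    for ((g_in, &x), &g_out) in grad_in.iter_mut().zip(input).zip(grad_out) {
        // d/dx ReLU(x) is 1 for x > 0 and 0 otherwise.
        *g_in = if x > 0.0 { g_out } else { 0.0 };
    }
}

fn main() {
    let relu = Activation { fw: relu_fw, bw: relu_bw };
    let input = [-1.0_f32, 0.5, 2.0];
    let mut output = [0.0_f32; 3];
    (relu.fw)(&input, &mut output);
    println!("{:?}", output); // [0.0, 0.5, 2.0]
}
```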
Rectified Linear Unit (ReLU) is a widely used activation function that returns zero for negative inputs and the input value for non-negative inputs.
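In standard notation:

$$\operatorname{ReLU}(x) = \max(0, x)$$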
The Sigmoid activation function squashes input values between 0 and 1, making it suitable for binary classification problems.
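Its standard definition is:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$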
Softmax is often used in the output layer for multi-class classification problems. It converts input values into probability distributions.
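For an input vector $x$, each output component is:

$$\operatorname{softmax}(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}}$$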
Softplus is a smooth approximation of ReLU, defined as the natural logarithm of one plus the exponential of the input.
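That is:

$$\operatorname{softplus}(x) = \ln\left(1 + e^{x}\right)$$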
Softsign is a smooth activation function that maps input values to the range (-1, 1).
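Its standard definition is:

$$\operatorname{softsign}(x) = \frac{x}{1 + |x|}$$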
The Hyperbolic Tangent (Tanh) activation function squashes input values between -1 and 1.
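In standard notation:

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$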
Scaled Exponential Linear Unit (SELU) is a self-normalizing activation function designed to maintain a mean close to zero and a standard deviation close to one.
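The usual definition is shown below; the exact constant values hard-coded by the implementation may differ slightly in precision.

$$\operatorname{SELU}(x) = \lambda \begin{cases} x & x > 0 \\ \alpha\,(e^{x} - 1) & x \le 0 \end{cases}$$

with $\lambda \approx 1.0507$ and $\alpha \approx 1.6733$.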
Exponential Linear Unit (ELU) is an activation function that handles negative inputs smoothly, mitigating the dying-ReLU problem.
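Its standard form, with scale parameter $\alpha$ (commonly 1), is:

$$\operatorname{ELU}(x) = \begin{cases} x & x > 0 \\ \alpha\,(e^{x} - 1) & x \le 0 \end{cases}$$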
The Exponential activation function simply returns the exponential of the input.
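That is:

$$f(x) = e^{x}$$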
Leaky ReLU is a variant of ReLU that allows a small, non-zero gradient for negative inputs to prevent neurons from becoming inactive.
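With a small negative-slope coefficient $\alpha$ (0.01 is a common default; the value used here may differ):

$$\operatorname{LeakyReLU}(x) = \begin{cases} x & x \ge 0 \\ \alpha x & x < 0 \end{cases}$$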
ReLU6 is a modified ReLU that sets negative values to zero and clips positive values at 6.
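In standard notation:

$$\operatorname{ReLU6}(x) = \min\bigl(\max(0, x),\ 6\bigr)$$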
SiLU (Sigmoid Linear Unit), also known as Swish, combines the sigmoid and linear operations by multiplying the input by its sigmoid.
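Its standard definition is:

$$\operatorname{SiLU}(x) = x \cdot \sigma(x) = \frac{x}{1 + e^{-x}}$$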
GELU (Gaussian Error Linear Unit) is an activation function that weights the input by the Gaussian cumulative distribution function.
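The exact form uses the standard Gaussian CDF $\Phi$; many implementations use a tanh-based approximation instead (which form this code uses is not stated here):

$$\operatorname{GELU}(x) = x \cdot \Phi(x) \approx 0.5\,x\left(1 + \tanh\!\left(\sqrt{2/\pi}\,\bigl(x + 0.044715\,x^{3}\bigr)\right)\right)$$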
Hard Sigmoid is a piecewise-linear approximation of the sigmoid function, often used for efficiency.
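One common piecewise-linear definition is shown below; frameworks differ in the exact slope and offset they use.

$$\operatorname{hardsigmoid}(x) = \max\bigl(0,\ \min(1,\ 0.2\,x + 0.5)\bigr)$$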
The Linear activation function simply returns the input without any non-linearity.
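That is, the identity:

$$f(x) = x$$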
Mish is a smooth, self-gated activation function that has demonstrated improved performance in some scenarios.
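Its standard definition is:

$$\operatorname{Mish}(x) = x \cdot \tanh\bigl(\operatorname{softplus}(x)\bigr) = x \cdot \tanh\bigl(\ln(1 + e^{x})\bigr)$$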
LogSoftmax is the logarithm of the Softmax function and is often used in combination with negative log-likelihood loss for classification problems.
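For an input vector $x$, each output component is:

$$\operatorname{logsoftmax}(x)_i = x_i - \ln \sum_j e^{x_j}$$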
Each activation function is implemented using a generic vectorized approach, allowing efficient computation on SIMD architectures. The forward and backward methods are defined for each activation function, ensuring proper integration into neural network training frameworks.
Note: SIMD (Single Instruction, Multiple Data) operations process multiple elements per instruction, which is what makes these vectorized activation computations efficient on modern hardware.
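As a rough illustration of the vectorized idea, the sketch below uses x86_64 SSE intrinsics in Rust to compute a ReLU forward pass four elements at a time; the repository's actual SIMD code, lane widths, and instruction sets may differ.

```rust
// Illustrative sketch only: not the repository's actual SIMD implementation.
#[cfg(target_arch = "x86_64")]
fn relu_fw_simd(input: &[f32], output: &mut [f32]) {
    use std::arch::x86_64::*;
    assert_eq!(input.len(), output.len());
    let lanes = 4; // SSE processes four f32 values per instruction.
    let chunks = input.len() / lanes;
    unsafe {
        let zero = _mm_setzero_ps();
        for i in 0..chunks {
            // Load four inputs, take the element-wise max with zero, store.
            let x = _mm_loadu_ps(input.as_ptr().add(i * lanes));
            let y = _mm_max_ps(x, zero);
            _mm_storeu_ps(output.as_mut_ptr().add(i * lanes), y);
        }
    }
    // Scalar tail for lengths that are not a multiple of the lane width.
    for i in chunks * lanes..input.len() {
        output[i] = input[i].max(0.0);
    }
}

#[cfg(target_arch = "x86_64")]
fn main() {
    let input = [-2.0_f32, -0.5, 1.0, 3.0, 7.0];
    let mut output = [0.0_f32; 5];
    relu_fw_simd(&input, &mut output);
    println!("{:?}", output); // [0.0, 0.0, 1.0, 3.0, 7.0]
}

#[cfg(not(target_arch = "x86_64"))]
fn main() {}
```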