Activation Functions

This documentation provides an overview of the activation functions implemented in the code. Activation functions play a crucial role in neural networks by introducing non-linearity into the model. They are applied to the output of each neuron, allowing the network to learn complex patterns.

Activation Function Structures

GenericActivation

This structure represents a generic activation function and includes both forward (fw) and backward (bw) methods; a minimal sketch of this pattern follows the list below.

  • fw: The forward method applies the activation function to the input tensor.
  • bw: The backward method calculates the gradients during backpropagation.
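
For illustration only, the following Python/NumPy sketch mirrors the fw/bw pairing using ReLU as an example. The names relu_fw, relu_bw, and grad_output are made up for this sketch and are not the library's Mojo API; the real implementation operates on the library's tensor type with SIMD kernels.

    import numpy as np

    def relu_fw(x):
        # Forward pass: apply the activation element-wise.
        return np.maximum(x, 0.0)

    def relu_bw(x, grad_output):
        # Backward pass: local derivative (1 where x > 0, else 0)
        # multiplied by the incoming gradient (chain rule).
        return (x > 0.0).astype(x.dtype) * grad_output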

Relu

Rectified Linear Unit (ReLU) is a widely used activation function that returns zero for negative inputs and the input value for non-negative inputs.
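In symbols, relu(x) = max(0, x); its gradient is 1 for positive inputs and 0 otherwise.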

Sigmoid

The Sigmoid activation function squashes input values between 0 and 1, making it suitable for binary classification problems.
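In symbols, sigmoid(x) = 1 / (1 + exp(-x)); a convenient form of its gradient is sigmoid(x) * (1 - sigmoid(x)).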

Softmax

Softmax is often used in the output layer for multi-class classification problems. It converts a vector of input values into a probability distribution (non-negative values that sum to one).
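For a vector x, softmax(x)_i = exp(x_i) / sum_j exp(x_j); in practice the maximum of x is typically subtracted before exponentiation for numerical stability.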

Softplus

Softplus is a smooth approximation of ReLU, defined as the logarithm of one plus the exponential of the input.
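In symbols, softplus(x) = log(1 + exp(x)); its gradient is the sigmoid function 1 / (1 + exp(-x)).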

Softsign

Softsign is a smooth activation function that maps input values to the range (-1, 1).
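In symbols, softsign(x) = x / (1 + |x|); its gradient is 1 / (1 + |x|)^2.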

Tanh

The Hyperbolic Tangent (Tanh) activation function squashes input values between -1 and 1.
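In symbols, tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x)); its gradient is 1 - tanh(x)^2.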

Selu

Scaled Exponential Linear Unit (SELU) is a self-normalizing activation function designed to maintain a mean close to zero and a standard deviation close to one.
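In its standard form, selu(x) = lambda * x for x > 0 and lambda * alpha * (exp(x) - 1) for x <= 0, with fixed constants alpha ≈ 1.6733 and lambda ≈ 1.0507; the exact constants used here are an implementation detail of the code.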

Elu

Exponential Linear Unit (ELU) is an activation function that handles negative inputs smoothly, helping to mitigate the dying-ReLU problem.
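In its common form, elu(x) = x for x > 0 and alpha * (exp(x) - 1) for x <= 0, where alpha (often 1.0) is a hyperparameter; the value used here is an implementation detail of the code.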

Exp

The Exponential activation function simply returns the exponential of the input.
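In symbols, exp(x) is both the forward output and its own gradient, which keeps the backward pass trivial.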

LeakyRelu

Leaky ReLU is a variant of ReLU that allows a small, non-zero gradient for negative inputs to prevent neurons from becoming inactive.
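In symbols, leaky_relu(x) = x for x > 0 and alpha * x otherwise, where alpha is a small slope (0.01 is a common default; the value used here is an implementation detail of the code). The gradient is 1 for x > 0 and alpha otherwise.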

Relu6

ReLU6 is a modified ReLU that sets negative values to zero and clips positive values at 6.
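In symbols, relu6(x) = min(max(0, x), 6); the gradient is 1 for 0 < x < 6 and 0 outside that range.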

Silu (Sigmoid-Weighted Linear Unit)

SiLU, also known as Swish, multiplies the input by its sigmoid, combining the linear input with a smooth sigmoid gate.
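In symbols, silu(x) = x * sigmoid(x); its gradient is sigmoid(x) * (1 + x * (1 - sigmoid(x))).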

Gelu (Gaussian Error Linear Unit)

GELU weights its input by the Gaussian cumulative distribution function, yielding a smooth, non-monotonic alternative to ReLU.
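In symbols, gelu(x) = x * Phi(x), where Phi is the standard normal cumulative distribution function; implementations use either the exact erf-based form or a tanh approximation, and which form the code uses is an implementation detail.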

HardSigmoid

Hard Sigmoid is a piecewise-linear approximation of the sigmoid function, often used for efficiency.
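A common formulation is hard_sigmoid(x) = clip(x / 6 + 0.5, 0, 1), i.e. 0 for x <= -3, 1 for x >= 3, and linear in between; the exact slope and breakpoints used here are an implementation detail of the code.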

Linear

The Linear activation function simply returns the input without any non-linearity.
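In symbols, linear(x) = x; its gradient is the constant 1, so the backward pass passes the incoming gradient through unchanged.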

Mish

Mish is a smooth, non-monotonic activation function that has demonstrated improved performance over ReLU in some scenarios.
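In symbols, mish(x) = x * tanh(softplus(x)) = x * tanh(log(1 + exp(x))).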

LogSoftmax

LogSoftmax is the logarithm of the Softmax function and is often used in combination with negative log-likelihood loss for classification problems.
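In symbols, log_softmax(x)_i = x_i - log(sum_j exp(x_j)); computing it directly in this form is more numerically stable than taking the logarithm of softmax(x).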

Activation Function Implementation

Each activation function is implemented using a generic vectorized approach, allowing efficient computation on SIMD architectures. The forward and backward methods are defined for each activation function, ensuring proper integration into neural network training frameworks.

Note: The provided code includes SIMD (Single Instruction, Multiple Data) operations for vectorized computation, enhancing the efficiency of activation function calculations on modern hardware architectures.
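
As a hedged usage illustration only (NumPy stands in for the library's SIMD kernels, and the values are made up for the example), the forward output and the backward gradient compose under the chain rule like this:

    import numpy as np

    x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
    y = np.maximum(x, 0.0)                       # ReLU forward: [0. 0. 0. 1.5 3.]
    upstream = np.ones_like(x)                   # gradient arriving from the next layer
    dx = (x > 0.0).astype(x.dtype) * upstream    # ReLU backward: [0. 0. 0. 1. 1.]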