# Layers, Neurons and Activation Functions
The building blocks...

# 1. Layers vs Neurons in Artificial Neural Networks
> What's the difference?

A layer (each column of circles) is a group of neurons that sit at the same depth of the network. In an artificial neural network (ANN), layers are stacked one after another, so the output of one layer becomes the input of the next. Each layer performs a specific function, such as receiving inputs, extracting features, or generating outputs. The number of layers and the arrangement of neurons within each layer vary with the architecture and the task.

![img](https://editor.analyticsvidhya.com/uploads/25366Convolutional_Neural_Network_to_identify_the_image_of_a_bird.png)
<br>
Source: analyticsvidhya.com

On the other hand, a neuron (an individual circle), also known as a node or, in its classic single-unit form, a perceptron, is an individual computational unit within a layer. Neurons receive input signals from the previous layer or from external sources, perform a computation on those inputs, and pass the result to the next layer. Each neuron in a layer is connected to multiple neurons in the previous and subsequent layers, forming a network of interconnected nodes.
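
To make the distinction concrete, here is a minimal NumPy sketch. The layer sizes, the names `W`, `b`, and `x`, and the choice of a sigmoid are illustrative assumptions, not something taken from the figure. A layer is stored as one weight matrix with a row per neuron, and each neuron contributes exactly one entry of the layer's output vector.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_neurons = 4, 3                   # hypothetical sizes for illustration
W = rng.normal(size=(n_neurons, n_inputs))   # one row of weights per neuron
b = np.zeros(n_neurons)                      # one bias per neuron
x = rng.normal(size=n_inputs)                # a single input sample


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


# A single neuron: its own weight row combined with the input.
neuron_0_out = sigmoid(W[0] @ x + b[0])

# The whole layer: all neurons computed at once.
layer_out = sigmoid(W @ x + b)
```

The first entry of `layer_out` is the same number as `neuron_0_out`: the layer is just its neurons evaluated together.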

# 2. Biological Neurons vs Artificial Neurons: How They Work

Biological neurons and artificial neurons have different mechanisms of operation. Biological neurons rely on electrical and chemical signals to transmit information, while artificial neurons perform mathematical computations on input signals using weights and activation functions. Understanding these differences is essential for developing effective neural network models.


## 2.1. Biological Neurons

Biological neurons are the fundamental units of the nervous system in living organisms. They work through a series of steps:

1. **Signal Reception**: `Dendrites`, the branching extensions of a neuron, receive signals from other neurons or sensory receptors.

2. **Signal Integration**: The received signals are combined in the `cell body` of the neuron. This integration process involves the summation of the incoming signals.

3. **Signal Transmission**: If the integrated signal surpasses a certain threshold, an electrical impulse called an action potential is generated. This action potential travels along the `axon`, a long fiber-like structure, towards the `synapses`.

4. **Signal Communication**: At the `synapses`, the electrical impulse is converted into a chemical signal. Neurotransmitters are released into the synapse, allowing the signal to be transmitted to the dendrites of the next neuron.

5. **Signal Processing**: The process repeats in the subsequent `neurons`, enabling the propagation of signals throughout the nervous system.

![](https://www.researchgate.net/profile/Zhenzhu-Meng/publication/339446790/figure/fig2/AS:862019817320450@1582532948784/A-biological-neuron-in-comparison-to-an-artificial-neural-network-a-human-neuron-b.png)
<br>
A biological neuron in comparison to an artificial neural network: 
(a) human neuron; (b) artificial neuron; (c) biological synapse; and (d) ANN synapses.

[Source](https://www.researchgate.net/figure/A-biological-neuron-in-comparison-to-an-artificial-neural-network-a-human-neuron-b_fig2_339446790)


## 2.2. Artificial Neurons

Artificial neurons, also known as nodes or perceptrons, are mathematical models designed to mimic the behavior of biological neurons. They operate in a different manner:

1. **Signal Reception**: Artificial `neurons` receive input signals, typically represented as numerical values, from the previous layer or external sources.

2. **Signal Weighting**: Each input signal is multiplied by a corresponding `weight`. These weights determine the importance or contribution of each input to the overall computation.

3. **Signal Summation**: The weighted input signals are summed, usually together with a bias term, which gives a linear combination of the inputs.

4. **Activation Function**: The summed signal is passed through an `activation function`, which introduces non-linearity into the computation. The activation function determines the output of the artificial neuron based on the input signal.

5. **Signal Transmission**: The output of the artificial neuron is then transmitted to the next layer or used as the final output of the neural network.

6. **Learning**: `Neurons` in a neural network learn by adjusting the `weights` and `biases` associated with their connections. The gradients needed for these adjustments are computed with `backpropagation`, and an optimization algorithm such as gradient descent applies them so that the network's predictions improve.
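
Steps 1 to 5 can be sketched for a single neuron as follows. This is a minimal illustration, assuming a sigmoid activation and made-up input, weight, and bias values; the function name `neuron_forward` is hypothetical.

```python
import numpy as np


def neuron_forward(x, w, b):
    """Forward pass of one artificial neuron (sigmoid chosen for illustration)."""
    weighted = w * x                  # step 2: weight each input
    z = np.sum(weighted) + b          # step 3: sum the weighted inputs, add the bias
    return 1 / (1 + np.exp(-z))       # step 4: apply the activation function


x = np.array([1.0, 2.0])              # step 1: received inputs (made-up values)
w = np.array([0.5, -0.25])            # hypothetical weights
b = 0.1                               # hypothetical bias
out = neuron_forward(x, w, b)         # step 5: value passed on to the next layer
```

Here the weighted sum is `0.5 - 0.5 + 0.1 = 0.1`, so `out` is the sigmoid of `0.1`, roughly `0.525`.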

# 3. Neurons: The building blocks

Neurons in ANNs are loosely modeled on their biological counterparts and have three main components:

1. **Input**: Neurons receive input signals from the previous layer or external sources. These inputs are multiplied by corresponding weights and summed up.

2. **Activation Function**: The weighted sum of inputs is passed through an activation function, which introduces non-linearity into the network. Activation functions determine the output of a neuron based on the input signal.

3. **Output**: The output of a neuron is the result of the activation function applied to the weighted sum of inputs. This output is then transmitted to the next layer of neurons.


![](https://machinelearningknowledge.ai/wp-content/uploads/2019/06/Artificial-Neuron-Working.gif)
<br>
Source: machinelearningknowledge.ai

## 3.1. Neuron Weights and Bias

In artificial neural networks (ANNs), each neuron receives input signals from the previous layer or external sources. These input signals are multiplied by corresponding weights and then summed up. Additionally, a bias term is added to the weighted sum before passing it through an activation function.

The mathematical formula for a neuron can be represented as follows:

$$
z = \sum_{i=1}^{n} (w_i \cdot x_i) + b
$$

$$
out = \text{{activation\_function}}(z)
$$

where:
- $z$ is the weighted sum of the inputs plus the bias term,
- $w_i$ represents the weights assigned to each input $x_i$,
- $b$ is the bias term,
- $out$ is the output of the neuron after passing through the activation function.

### *3.1.1. What are they*

- **Weights**: The weight $w_i$ assigned to each input signal $x_i$ determines the importance or contribution of that input to the overall computation of the neuron. A weight with a larger magnitude gives the corresponding input a stronger influence on the neuron's output. By adjusting the weights during training, the network learns to assign `higher weights to more important features and lower weights to less important ones`.

- **Bias**: The bias term $b$ allows the neuron to adjust its output independently of the input signals. It acts as an offset or `threshold`, determining the neuron's activation level. Viewed as a function of the weighted sum of inputs, `a positive bias shifts the activation function to the left (the neuron activates more easily), while a negative bias shifts it to the right.` By adjusting the bias, the network can control the overall output of the neuron.
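
The effect of the bias can be checked numerically. The sketch below assumes a single input, a hypothetical weight `w = 1.0`, a sigmoid activation, and three made-up bias values. For each bias it locates the input at which the output crosses 0.5, which sits at $x = -b / w$.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


w = 1.0                                # hypothetical weight
x = np.linspace(-5, 5, 1001)           # grid of input values, spacing 0.01

midpoints = {}
for b in (-2.0, 0.0, 2.0):             # hypothetical bias values
    out = sigmoid(w * x + b)
    # Input at which the output is closest to 0.5 (the midpoint of the curve).
    midpoints[b] = x[np.argmin(np.abs(out - 0.5))]
```

With `w = 1.0`, a bias of `2.0` places the midpoint near `x = -2`, and a bias of `-2.0` places it near `x = 2`.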

Understanding and appropriately setting the weights and bias of neurons is crucial for the performance and effectiveness of an artificial neural network. Improper initialization or incorrect adjustment during training can lead to suboptimal results or even convergence issues.

### *3.1.2. How to calculate them*

The process of calculating neuron weights and bias involves two main steps:

1. **Initialization**: Initially, the weights $w_i$ and bias $b$ of a neuron are `randomly assigned or initialized with small values`. This random initialization helps in breaking symmetry and allows the network to learn different features.

2. **Training**: During the training process, the network adjusts the weights $w_i$ and bias $b$ to `minimize the difference between predicted outputs and actual outputs`. This adjustment is done using optimization algorithms, such as gradient descent, which iteratively update the weights $w_i$ and bias $b$ based on the error between predicted and actual outputs.
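
The two steps can be sketched for a single neuron. This is a toy example under stated assumptions: the data are synthetic, the neuron uses an identity activation, the loss is mean squared error, and the learning rate `lr` and iteration count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (made up): the target is y = 2*x1 - 3*x2 + 1.
X = rng.normal(size=(100, 2))
y = 2 * X[:, 0] - 3 * X[:, 1] + 1

# 1. Initialization: small random weights, zero bias.
w = rng.normal(scale=0.01, size=2)
b = 0.0

# 2. Training: gradient descent on the mean squared error.
lr = 0.1                                   # hypothetical learning rate
for _ in range(500):
    pred = X @ w + b                       # linear neuron (identity activation)
    error = pred - y                       # predicted minus actual outputs
    grad_w = 2 * X.T @ error / len(y)      # gradient of the MSE w.r.t. w
    grad_b = 2 * error.mean()              # gradient of the MSE w.r.t. b
    w -= lr * grad_w
    b -= lr * grad_b
```

After training, `w` and `b` end up close to the values used to generate the data, `[2, -3]` and `1`.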




## 3.2. Activation Functions

The activation function of a neuron determines its output based on the weighted sum of inputs. It introduces non-linearity into the neural network, enabling it to learn complex patterns and make predictions.

Common activation functions include:

|       |       |       |
|-------|-------|-------|
| Step  | Sigmoid | Tanh |
| ![Step](https://machinelearningknowledge.ai/wp-content/uploads/2019/08/Step-Activation_Function.gif) | ![Sigmoid](https://machinelearningknowledge.ai/wp-content/uploads/2019/08/Sigmoid-Activation-Function.gif) | ![Tanh](https://machinelearningknowledge.ai/wp-content/uploads/2019/08/Tanh-Activation-Function.gif) |
| ReLU  | LReLU | Softmax |
| ![ReLU](https://machinelearningknowledge.ai/wp-content/uploads/2019/08/Relu-Activation-Function.gif) | ![LReLU](https://machinelearningknowledge.ai/wp-content/uploads/2019/08/Leaky-Relu-Function.gif) | ![Softmax](https://machinelearningknowledge.ai/wp-content/uploads/2019/08/Softmax-Activation-Function.gif) |

Source: machinelearningknowledge.ai


In [None]:
import numpy as np

### *3.2.1. Step Activation Function*

> Simple activation function that outputs a binary value based on a threshold. It returns 0 if the input is less than the threshold, and 1 otherwise. The step function was used in the classic perceptron for binary classification, but its gradient is zero wherever it is defined, so it is not suited to gradient-based training.

The mathematical function can be represented as follows:

$$
\text{{step}}(x) = \begin{cases}
1, & \text{{if }} x \geq 0 \\
0, & \text{{otherwise}}
\end{cases}
$$

In [None]:
def step(x):
    """
    Step function:
    - Returns 1 if x >= 0, 0 otherwise.
    """
    return np.where(x >= 0, 1, 0)
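
As a usage example, a single neuron with a step activation can act as a logical AND gate. The weights and bias below are hand-picked for illustration, not learned.

```python
import numpy as np


def step(x):
    return np.where(x >= 0, 1, 0)


# All four combinations of two binary inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hand-picked (hypothetical) parameters: the neuron fires only when both inputs are 1.
w = np.array([1.0, 1.0])
b = -1.5

and_out = step(X @ w + b)   # weighted sums plus bias: -1.5, -0.5, -0.5, 0.5
```

Only the last row gives a non-negative value, so `and_out` is `[0, 0, 0, 1]`.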


### *3.2.2. Sigmoid Activation Function*

> Often used in the output layer for binary classification problems. It squashes its input into the range (0, 1), which can be read as the probability of the positive class. It can lead to the vanishing gradient problem when used in hidden layers.

The sigmoid mathematical function can be represented as follows:

$$
\text{{sigmoid}}(x) = \frac{1}{{1 + e^{-x}}}
$$


In [3]:
def sigmoid(x):
    """
    Sigmoid function:
    - Returns the sigmoid activation of x.
    """
    return 1 / (1 + np.exp(-x))
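
The vanishing gradient issue can be seen from the sigmoid's derivative, $\text{sigmoid}(x)\,(1 - \text{sigmoid}(x))$, which never exceeds 0.25. The sketch below is illustrative: the helper name `sigmoid_grad` and the sample inputs are assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1 - s)


x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
grads = sigmoid_grad(x)          # largest at x = 0, where it equals 0.25

# Upper bound on the factor contributed by 10 stacked sigmoid layers
# (ignoring the weights, which also enter the chain rule).
bound_10_layers = 0.25 ** 10
```

For inputs far from zero the derivative is close to 0, and even in the best case ten sigmoid layers scale the gradient by less than one millionth.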

### *3.2.3. Tanh (Hyperbolic Tangent) Activation Function*

> Squashes the input values into the range (-1, 1). It is symmetric around the origin, so its outputs are zero-centered, and it is useful for capturing non-linear relationships in the data. It can lead to the vanishing gradient problem when used in hidden layers.

The mathematical representation of the hyperbolic tangent (tanh) function is:

$$
\text{tanh}(x) = \frac{{e^x - e^{-x}}}{{e^x + e^{-x}}}
$$


In [None]:
def tanh(x):
    """
    Compute the hyperbolic tangent of the input.

    Parameters:
    - x (float or numpy.ndarray): Input value(s) for which to compute the hyperbolic tangent.

    Returns:
    - float or numpy.ndarray: Hyperbolic tangent of the input value(s).

    Notes:
    - The hyperbolic tangent function is defined as (e^x - e^-x) / (e^x + e^-x).
    - The function is an element-wise operation when applied to a numpy array.

    """
    # np.tanh evaluates the same formula without overflowing for large |x|.
    return np.tanh(x)
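
A literal NumPy translation of the formula is only safe for moderate inputs, because `np.exp` overflows to infinity for large arguments. The sketch below compares such a translation (named `tanh_naive` here for illustration) with NumPy's built-in `np.tanh`.

```python
import numpy as np


def tanh_naive(x):
    # Literal translation of the formula; np.exp overflows to inf for large |x|.
    with np.errstate(over="ignore", invalid="ignore"):
        return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))


x = np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0])
naive = tanh_naive(x)     # nan at both extremes (inf divided by inf)
stable = np.tanh(x)       # saturates cleanly at -1 and 1
```

The two agree for moderate inputs, but only `np.tanh` returns usable values at the extremes.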

### *3.2.4. ReLU (Rectified Linear Unit) Activation Function*

> Sets all negative values to zero and keeps positive values unchanged. Can suffer from the "dying ReLU" problem, in which a neuron outputs zero for every input and stops learning.

The mathematical representation of the ReLU (Rectified Linear Unit) activation function is:

$$
\text{{ReLU}}(x) = \max(0, x)
$$

In [4]:
def relu(x):
    """
    Rectified Linear Unit (ReLU) function:
    - Returns max(0, x).
    """
    return np.maximum(0, x)

### *3.2.5. LReLU (Leaky ReLU) Activation Function*

> A variation of the ReLU activation function that allows small negative values instead of setting them to zero. This helps to mitigate the "dying ReLU" problem, where neurons can become inactive and stop learning.

The mathematical representation of Leaky ReLU (LReLU) is:

$$
\text{LReLU}(x) = \begin{cases}
x, & \text{if } x \geq 0 \\
\alpha x, & \text{otherwise}
\end{cases}
$$

where $\alpha$ is a small positive constant that determines the slope of the function for negative values of $x$.

In [None]:
def lrelu(x, alpha=0.01):
    """
    Leaky ReLU function:
    - Returns x if x >= 0, alpha * x otherwise.
    """
    return np.where(x >= 0, x, alpha * x)
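
The difference from plain ReLU shows up in the slope for negative inputs. The helper names `relu_grad` and `lrelu_grad` below are hypothetical, and the slope of ReLU at exactly 0 is taken to be 0 by convention.

```python
import numpy as np


def relu_grad(x):
    """Slope of ReLU: 1 for positive inputs, 0 otherwise."""
    return np.where(x > 0, 1.0, 0.0)


def lrelu_grad(x, alpha=0.01):
    """Slope of Leaky ReLU: 1 for x >= 0, alpha otherwise."""
    return np.where(x >= 0, 1.0, alpha)


x = np.array([-3.0, -0.5, 2.0])
relu_slopes = relu_grad(x)      # [0., 0., 1.]
lrelu_slopes = lrelu_grad(x)    # [0.01, 0.01, 1.]
```

With ReLU, a neuron whose input is always negative receives no gradient at all. Leaky ReLU keeps a small slope of `alpha`, so its weights can still be updated.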


### *3.2.6. Softmax Activation Function*

> Commonly used in the output layer for multi-class classification problems. It normalizes the outputs into a probability distribution over the classes.

The mathematical representation of the softmax activation function is:

$$
\text{{softmax}}(x_i) = \frac{{e^{x_i}}}{{\sum_{j=1}^{n} e^{x_j}}}
$$

where $x_i$ represents the input value of the $i$-th element, and $n$ is the total number of elements in the input vector.

In [None]:
def softmax(x):
    """
    Softmax function:
    Calculates the softmax activation for each row of the input array x.
    
    Parameters:
    - x (numpy.ndarray): Input array of shape (N, C), where N is the number of samples and C is the number of classes.
    
    Returns:
    - numpy.ndarray: Array of shape (N, C) containing the softmax activations for each sample.
    
    Notes:
    - The softmax activation is calculated as follows:
        - Subtract the maximum value of each row of x from x to prevent numerical instability.
        - Compute the exponential of each element in the resulting array.
        - Divide each element in the resulting array by the sum of all elements in the same row.
    """
    exps = np.exp(x - np.max(x, axis=1, keepdims=True))
    return exps / np.sum(exps, axis=1, keepdims=True)
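
A short usage sketch with made-up scores follows. The second row is the first row shifted by a large constant, which would overflow `np.exp` without the max-subtraction step.

```python
import numpy as np


def softmax(x):
    exps = np.exp(x - np.max(x, axis=1, keepdims=True))
    return exps / np.sum(exps, axis=1, keepdims=True)


# Two samples, three classes (hypothetical scores).
logits = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1001.0, 1002.0]])
probs = softmax(logits)
```

Each row of `probs` sums to 1, and both rows are identical because softmax depends only on the differences between scores within a row.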