# Multilayer Perceptron or Neural Networks

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ChemAI-Lab/AI4Chem/blob/main/website/modules/03-neural_networks.ipynb)

**References:**
1. **Chapter 5**: [Pattern Recognition and Machine Learning](https://www.microsoft.com/en-us/research/wp-content/uploads/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf), C. M. Bishop.
2. **Chapter 2**: [Machine Learning in Quantum Sciences](https://arxiv.org/pdf/2204.04198)
3. **Chapter 16**: [Probabilistic Machine Learning: An Introduction](https://probml.github.io/pml-book/book1.html), K. P. Murphy.

# Activation Functions

In the perceptron model, we assumed that $\phi(\mathbf{x})$ is a fixed, nonlinear, differentiable transformation that maps the input into a feature-space representation.
The jump to modern deep learning models is to assume that $\phi(\mathbf{x})$ itself can be learned through a set of parameters. For this we still require a nonlinear transformation, commonly known as an **activation function**.
Typically, a differentiable nonlinear activation function is used in the hidden layers of a neural network. This lets the model represent more complex functions than a network with linear activations, which reduces to a single linear map no matter how many layers it has.
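
To see why the nonlinearity matters, the following minimal sketch composes two linear layers and checks that, without an activation in between, they are equivalent to one linear layer. The layer sizes, random weights, and variable names are illustrative assumptions and do not come from the text. Inserting `np.tanh` between the layers breaks the equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 3 inputs -> 4 hidden units -> 2 outputs
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
X = rng.normal(size=(5, 3))  # batch of 5 inputs

# Two stacked linear layers with no activation in between
two_linear = (X @ W1.T + b1) @ W2.T + b2

# The same map written as a single linear layer
W, b = W2 @ W1, W2 @ b1 + b2
one_linear = X @ W.T + b
print(np.allclose(two_linear, one_linear))

# With a nonlinearity between the layers the collapse no longer holds
with_tanh = np.tanh(X @ W1.T + b1) @ W2.T + b2
print(np.allclose(with_tanh, one_linear))
```

The first check succeeds for any choice of weights, because $W_2(W_1\mathbf{x}+\mathbf{b}_1)+\mathbf{b}_2=(W_2W_1)\mathbf{x}+(W_2\mathbf{b}_1+\mathbf{b}_2)$.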

## Sigmoid



Equation: $\sigma(x)=\frac{1}{1+e^{-x}}$\
Derivative: $\sigma'(x)=\sigma(x)(1-\sigma(x))$


In [None]:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-6, 6, 400)
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

y = sigmoid(x)
dy = y * (1 - y)

fig, ax = plt.subplots(1, 2, figsize=(8, 3))
ax[0].plot(x, y, color='k')
ax[0].set_title('Sigmoid')
ax[0].grid(True, alpha=0.2)
ax[1].plot(x, dy, color='k')
ax[1].set_title('Derivative')
ax[1].grid(True, alpha=0.2)
plt.tight_layout()
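
For inputs far outside the plotted range, `np.exp(-x)` overflows when `x` is large and negative, and NumPy emits a runtime warning. A minimal sketch of a commonly used stable variant is given below; `stable_sigmoid` is a hypothetical helper name, not part of the notebook. The sketch also checks the identity $\sigma'(x)=\sigma(x)(1-\sigma(x))$ against a central finite difference, with the step `h` chosen arbitrarily.

```python
import numpy as np

def stable_sigmoid(x):
    """Sigmoid that only ever exponentiates non-positive numbers."""
    x = np.asarray(x, dtype=float)
    z = np.exp(-np.abs(x))  # lies in [0, 1], so it cannot overflow
    # x >= 0: 1 / (1 + e^{-x});  x < 0: e^{x} / (1 + e^{x})
    return np.where(x >= 0, 1.0 / (1.0 + z), z / (1.0 + z))

x = np.linspace(-6, 6, 400)
s = stable_sigmoid(x)

# Analytic derivative versus a central finite difference
h = 1e-5
analytic = s * (1 - s)
numeric = (stable_sigmoid(x + h) - stable_sigmoid(x - h)) / (2 * h)
max_err = np.max(np.abs(analytic - numeric))
print(max_err)

# Extreme inputs stay finite
extremes = stable_sigmoid(np.array([-1000.0, 0.0, 1000.0]))
print(extremes)
```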


## Tanh



Equation: $\tanh(x)=\frac{e^x-e^{-x}}{e^x+e^{-x}}$\
Derivative: $\frac{d}{dx}\tanh(x)=1-\tanh^2(x)$


In [None]:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-6, 6, 400)
y = np.tanh(x)
dy = 1 - y**2

fig, ax = plt.subplots(1, 2, figsize=(8, 3))
ax[0].plot(x, y, color='k')
ax[0].set_title('Tanh')
ax[0].grid(True, alpha=0.2)
ax[1].plot(x, dy, color='k')
ax[1].set_title('Derivative')
ax[1].grid(True, alpha=0.2)
plt.tight_layout()
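
Tanh is a shifted and rescaled sigmoid, $\tanh(x)=2\sigma(2x)-1$, so the two curves share the same shape while tanh is centered at zero and has range $(-1, 1)$. The short sketch below verifies this identity numerically and compares the peak slopes, which are 1 for tanh and 1/4 for the sigmoid.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Odd number of points so that x = 0 lies on the grid
x = np.linspace(-6, 6, 401)

identity_holds = np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1)
print(identity_holds)

# Peak derivatives, both reached at x = 0
tanh_peak = np.max(1 - np.tanh(x) ** 2)
sigmoid_peak = np.max(sigmoid(x) * (1 - sigmoid(x)))
print(tanh_peak, sigmoid_peak)
```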


## ReLU



Equation: $\mathrm{ReLU}(x)=\max(0, x)$\
Derivative: $\mathrm{ReLU}'(x)=\begin{cases}1,&x>0\\0,&x\le 0\end{cases}$


In [None]:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-6, 6, 400)
y = np.maximum(0, x)
dy = (x > 0).astype(float)

fig, ax = plt.subplots(1, 2, figsize=(8, 3))
ax[0].plot(x, y, color='k')
ax[0].set_title('ReLU')
ax[0].grid(True, alpha=0.2)
ax[1].plot(x, dy, color='k')
ax[1].set_title('Derivative')
ax[1].grid(True, alpha=0.2)
plt.tight_layout()
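
ReLU has a kink at $x=0$, so strictly speaking the derivative is not defined there: the slope from the left is 0 and the slope from the right is 1. Assigning the value 0 at $x=0$, as in the formula and the cell above, is a common convention. The sketch below illustrates this with one-sided finite differences; the step size `h` is an arbitrary illustrative choice.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

h = 1e-6
left_slope = (relu(0.0) - relu(-h)) / h    # slope approaching 0 from the left
right_slope = (relu(h) - relu(0.0)) / h    # slope approaching 0 from the right
print(left_slope, right_slope)

# The convention (x > 0) assigns derivative 0 at the kink
x = np.array([-1.0, 0.0, 1.0])
dy = (x > 0).astype(float)
print(dy)
```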


## Leaky ReLU



Equation: $\mathrm{LReLU}(x)=\max(\alpha x, x)$, with $0<\alpha<1$\
Derivative: $\mathrm{LReLU}'(x)=\begin{cases}1,&x>0\\\alpha,&x\le 0\end{cases}$


In [None]:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-6, 6, 400)
alpha = 0.01
y = np.where(x > 0, x, alpha * x)
dy = np.where(x > 0, 1.0, alpha)

fig, ax = plt.subplots(1, 2, figsize=(8, 3))
ax[0].plot(x, y, color='k')
ax[0].set_title('Leaky ReLU')
ax[0].grid(True, alpha=0.2)
ax[1].plot(x, dy, color='k')
ax[1].set_title('Derivative')
ax[1].grid(True, alpha=0.2)
plt.tight_layout()
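
The compact form $\max(\alpha x, x)$ matches the piecewise definition used in the cell above only when $\alpha$ is between 0 and 1. For $\alpha>1$ the maximum picks the wrong branch on both sides of zero. The sketch below compares the two forms for a small and a large slope; the function names and the value $\alpha=2$ are illustrative assumptions.

```python
import numpy as np

def leaky_relu_piecewise(x, alpha):
    # Piecewise definition: x for x > 0, alpha * x otherwise
    return np.where(x > 0, x, alpha * x)

def leaky_relu_max(x, alpha):
    # Compact form max(alpha * x, x)
    return np.maximum(alpha * x, x)

x = np.linspace(-6, 6, 400)

agree_small = np.allclose(leaky_relu_piecewise(x, 0.01), leaky_relu_max(x, 0.01))
agree_large = np.allclose(leaky_relu_piecewise(x, 2.0), leaky_relu_max(x, 2.0))
print(agree_small, agree_large)
```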


## ELU



Equation: $\mathrm{ELU}(x)=\begin{cases}x,&x>0\\\alpha(e^x-1),&x\le 0\end{cases}$\
Derivative: $\mathrm{ELU}'(x)=\begin{cases}1,&x>0\\\alpha e^x,&x\le 0\end{cases}$


In [None]:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-6, 6, 400)
alpha = 1.0
y = np.where(x > 0, x, alpha * (np.exp(x) - 1))
# For x <= 0 the derivative alpha * e^x equals ELU(x) + alpha, so y is reused
dy = np.where(x > 0, 1.0, y + alpha)

fig, ax = plt.subplots(1, 2, figsize=(8, 3))
ax[0].plot(x, y, color='k')
ax[0].set_title('ELU')
ax[0].grid(True, alpha=0.2)
ax[1].plot(x, dy, color='k')
ax[1].set_title('Derivative')
ax[1].grid(True, alpha=0.2)
plt.tight_layout()
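
The derivative of ELU has right limit 1 and left limit $\alpha$ at $x=0$, so it is continuous only for $\alpha=1$. The sketch below measures that jump for a few values of $\alpha$ and checks the derivative computed from the function value against the direct formula $\alpha e^x$. The helper names are hypothetical. As a swapped-in detail, `np.expm1` replaces `np.exp(x) - 1` for accuracy near zero, and the argument is clipped so the unused branch cannot overflow.

```python
import numpy as np

def elu(x, alpha=1.0):
    # np.where evaluates both branches, so clip the exponent to x <= 0
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))

def elu_grad(x, alpha=1.0):
    # For x <= 0: alpha * e^x = ELU(x) + alpha
    return np.where(x > 0, 1.0, elu(x, alpha) + alpha)

x = np.linspace(-6, 6, 400)
alphas = (0.5, 1.0, 2.0)

# Derivative via the function value versus the direct formula
identity_ok = all(
    np.allclose(elu_grad(x, a), np.where(x > 0, 1.0, a * np.exp(np.minimum(x, 0.0))))
    for a in alphas
)
print(identity_ok)

# Jump of the derivative across x = 0 (approximately 1 - alpha)
eps = 1e-8
jumps = {
    a: float(elu_grad(np.array(eps), a) - elu_grad(np.array(-eps), a))
    for a in alphas
}
print(jumps)
```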


## Softplus



Equation: $\mathrm{Softplus}(x)=\ln(1+e^x)$\
Derivative: $\frac{d}{dx}\mathrm{Softplus}(x)=\sigma(x)$


In [None]:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-6, 6, 400)
y = np.log1p(np.exp(x))
dy = 1.0 / (1.0 + np.exp(-x))

fig, ax = plt.subplots(1, 2, figsize=(8, 3))
ax[0].plot(x, y, color='k')
ax[0].set_title('Softplus')
ax[0].grid(True, alpha=0.2)
ax[1].plot(x, dy, color='k')
ax[1].set_title('Derivative')
ax[1].grid(True, alpha=0.2)
plt.tight_layout()
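
Evaluating $\ln(1+e^x)$ literally overflows once $e^x$ exceeds the floating-point range (roughly $x>709$ in double precision), even though Softplus itself is close to $x$ there. A minimal sketch of a stable alternative uses `np.logaddexp(0, x)`, which computes $\ln(e^0+e^x)$ without forming $e^x$ explicitly. The test inputs are arbitrary illustrative values.

```python
import numpy as np

def softplus(x):
    # ln(e^0 + e^x) = ln(1 + e^x), evaluated without overflow
    return np.logaddexp(0.0, x)

x = np.array([-1000.0, 0.0, 1000.0])

# Naive evaluation: exp(1000) overflows to inf (warning silenced here)
with np.errstate(over="ignore"):
    naive = np.log1p(np.exp(x))

stable = softplus(x)
print(naive)
print(stable)
```

On the plotting range $[-6, 6]$ both versions agree, so the difference only matters for large pre-activations.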


## GELU (approx)



Equation: $\mathrm{GELU}(x)=0.5x\left(1+\tanh\left(\sqrt{\tfrac{2}{\pi}}(x+0.044715x^3)\right)\right)$\
Derivative: $\mathrm{GELU}'(x)=0.5(1+\tanh u)+0.5x(1-\tanh^2 u)u'$, $u=\sqrt{\tfrac{2}{\pi}}(x+0.044715x^3)$, $u'=\sqrt{\tfrac{2}{\pi}}(1+3\cdot0.044715x^2)$


In [None]:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-6, 6, 400)
def gelu(x):
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

y = gelu(x)
# Numerical derivative for visualization
dy = np.gradient(y, x)

fig, ax = plt.subplots(1, 2, figsize=(8, 3))
ax[0].plot(x, y, color='k')
ax[0].set_title('GELU (approx)')
ax[0].grid(True, alpha=0.2)
ax[1].plot(x, dy, color='k')
ax[1].set_title('Derivative (numerical)')
ax[1].grid(True, alpha=0.2)
plt.tight_layout()
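
The formula in this section is the tanh approximation; the exact GELU is $x\,\Phi(x)$, where $\Phi$ is the standard normal CDF. The sketch below compares the two and also checks the analytic derivative stated above against a central finite difference, as an alternative to `np.gradient`. The function names are hypothetical helpers, and `math.erf` wrapped in `np.vectorize` is used only to avoid an extra dependency.

```python
import math
import numpy as np

C = np.sqrt(2 / np.pi)
A = 0.044715

def gelu_tanh(x):
    return 0.5 * x * (1 + np.tanh(C * (x + A * x**3)))

def gelu_tanh_grad(x):
    # Product and chain rule applied to the tanh approximation
    u = C * (x + A * x**3)
    du = C * (1 + 3 * A * x**2)
    t = np.tanh(u)
    return 0.5 * (1 + t) + 0.5 * x * (1 - t**2) * du

_erf = np.vectorize(math.erf)

def gelu_exact(x):
    # x * Phi(x), with Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1 + _erf(x / np.sqrt(2)))

x = np.linspace(-6, 6, 400)

# Largest gap between the approximation and the exact form on the grid
approx_err = np.max(np.abs(gelu_tanh(x) - gelu_exact(x)))

# Analytic derivative versus a central finite difference
h = 1e-5
numeric = (gelu_tanh(x + h) - gelu_tanh(x - h)) / (2 * h)
grad_err = np.max(np.abs(gelu_tanh_grad(x) - numeric))
print(approx_err, grad_err)
```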
