# Activation Functions
---
- Author: Diego Inácio
- GitHub: [github.com/diegoinacio](https://github.com/diegoinacio)
- Notebook: [NN_activation_functions.ipynb](https://github.com/diegoinacio/machine-learning-notebooks/blob/master/Machine-Learning-Fundamentals/NN_activation_functions.ipynb)
---
A brief overview of some of the main *activation functions* used in *neural networks* and *deep learning* systems.

In *neural networks*, the activation function $\large \varphi$ is the part of the neuron that introduces non-linearity into forward propagation, between the input signals and the output. To minimize the error and update the weights during gradient descent and backpropagation, the derivative of the activation function with respect to its input, denoted $\large \varphi'$, must also be computed. Some of the most widely used activation functions follow.
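
To make the role of $\varphi'$ concrete, here is a minimal sketch of one gradient-descent step for a single neuron. It assumes a squared-error loss and the logistic activation covered later in this notebook; the input `x`, weights `w`, bias `b`, target `t` and learning rate `eta` are hypothetical values chosen only for illustration.

```python
import numpy as np

def logistic(v):
    return 1 / (1 + np.exp(-v))

def logistic_prime(v):
    s = logistic(v)
    return s * (1 - s)

# Hypothetical single neuron (all values are illustrative assumptions)
x = np.array([0.5, -1.0, 2.0])   # input signals
w = np.array([0.1, 0.2, -0.3])   # weights
b = 0.0                          # bias
t = 1.0                          # target output
eta = 0.5                        # learning rate

def loss(w, b):
    # Squared error between the neuron output and the target
    return 0.5 * (logistic(w @ x + b) - t) ** 2

v_pre = w @ x + b                         # pre-activation (forward propagation)
y = logistic(v_pre)                       # neuron output
delta = (y - t) * logistic_prime(v_pre)   # chain rule: dL/dv uses the derivative of phi
grad_w = delta * x                        # dL/dw
grad_b = delta                            # dL/db

# One gradient-descent update
w_new = w - eta * grad_w
b_new = b - eta * grad_b
```

With these values the loss after the update is lower than before it, which is the behaviour the derivative is meant to provide.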

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

In [None]:
plt.rcParams['figure.figsize'] = (16, 4)
def displayPlot(xlim=[-10, 10], ylim=[-0.1, 1.1], ncol=2):
    legend = plt.legend(loc=1, ncol=ncol, framealpha=0, bbox_to_anchor=(1, -0.1))
    plt.setp(legend.get_texts(), color='0.75', size=12)
    plt.grid(True, alpha=0.25)
    plt.xlim(xlim); plt.ylim(ylim)

In [None]:
np.seterr(all = 'ignore')
v = np.linspace(-10, 10, 1000)

## Identity
---
A bijective function that returns the same value it receives as input, such that $f: x \mapsto x$. The identity function is defined by:

$$
\large \varphi(\upsilon)=\upsilon
\hspace{2cm}
\therefore
\hspace{2cm}
\varphi'(\upsilon)=1
$$

In [None]:
phi = v
phi_prime = np.ones_like(v)  # the derivative is 1 everywhere
plt.plot(v, phi, label='function')
plt.plot(v, phi_prime, '--', label='derivative')
displayPlot(ylim = [-8, 8])

Its form with slope $\beta$ is defined by: $\large \varphi(\upsilon)=\beta \upsilon$

In [None]:
B = np.linspace(1, 9, 5)
for b in B:
    phi = v*b
    plt.plot(v, phi, label = 'β = {0:.2f}'.format(b))
displayPlot(ncol=len(B), ylim=[-8, 8])

## Heaviside
---

$$
\large \varphi(\upsilon)=
\begin{cases}
0 ,\upsilon < 0 \\
1 ,\upsilon \geq 0
\end{cases}
\hspace{2cm}
\therefore
\hspace{2cm}
\varphi'(\upsilon)=
\begin{cases}
0, \upsilon \neq 0 \\
\text{undefined}, \upsilon = 0
\end{cases}
$$
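
NumPy also provides `np.heaviside`, whose second argument sets the value returned at exactly zero. The sketch below assumes the convention of the notebook's plotting cell, where the step is 1 for $\upsilon \geq 0$; `v_demo` is a small illustrative array that is not part of the original notebook.

```python
import numpy as np

v_demo = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

# Second argument: value of the step at v == 0 (here 1, matching `v >= 0`)
phi_builtin = np.heaviside(v_demo, 1.0)

# Same step written as a comparison, as in the plotting cell
phi_manual = (v_demo >= 0).astype(float)
```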

In [None]:
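phi = (v >= 0).astype(float)
# Zero everywhere except at v = 0, where the derivative is undefined (NaN leaves a gap in the plot)
phi_prime = np.where(v != 0, 0.0, np.nan)
plt.plot(v, phi, label='function')
plt.plot(v, phi_prime, '--', label='derivative')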
displayPlot()

## Logistic
---

$$
\large \varphi(\upsilon)=\frac{1}{1 + e^{-\upsilon}}
\hspace{2cm}
\therefore
\hspace{2cm}
\varphi'(\upsilon)=\frac{e^{-\upsilon}}{(1+e^{-\upsilon})^2}=\varphi(\upsilon)(1-\varphi(\upsilon))
$$
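
For inputs far outside the plotted range, `np.exp(-v)` overflows when `v` is a large negative number. One possible way around this, sketched below, is to pick an algebraically equivalent expression per sign so that only non-positive arguments are ever exponentiated. `stable_logistic` is a hypothetical helper, not part of the original notebook.

```python
import numpy as np

def stable_logistic(v):
    """Logistic function that never exponentiates a large positive number."""
    v = np.asarray(v, dtype=float)
    out = np.empty_like(v)
    pos = v >= 0
    out[pos] = 1 / (1 + np.exp(-v[pos]))   # here e^{-v} <= 1
    ev = np.exp(v[~pos])                   # here e^{v} < 1
    out[~pos] = ev / (1 + ev)              # same value as 1 / (1 + e^{-v})
    return out

v_demo = np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0])
phi_demo = stable_logistic(v_demo)
phi_prime_demo = phi_demo * (1 - phi_demo)  # derivative via the identity above
```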

In [None]:
phi = 1/(1 + np.exp(-v))
phi_prime = phi*(1 - phi)
plt.plot(v, phi, label='function')
plt.plot(v, phi_prime, '--', label='derivative')
displayPlot()

Its form with slope $\beta$ is defined by: $\large \varphi(\upsilon)=\frac{1}{1+e^{-\beta \upsilon}}$

In [None]:
B = np.linspace(0, 2, 5)
for b in B:
    phi = 1/(1 + np.exp(-b*v))
    plt.plot(v, phi, label='β = {0:.2f}'.format(b))
displayPlot(ncol=len(B))

## Hyperbolic tangent
---
$$
\large \varphi(\upsilon)=\tanh(\upsilon)=\frac{2}{1 + e^{-2\upsilon}}-1
\hspace{2cm}
\therefore
\hspace{2cm}
\varphi'(\upsilon)=\frac{4e^{-2\upsilon}}{(1+e^{-2\upsilon})^2}=1-\varphi(\upsilon)^2
$$
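
The expression above is a rescaled logistic function evaluated at $2\upsilon$. As a quick sanity check, the sketch below compares it with NumPy's built-in `np.tanh`; `v_demo` is an illustrative grid assumed here, not a variable from the original notebook.

```python
import numpy as np

v_demo = np.linspace(-5, 5, 101)

# Formula from the text: 2 / (1 + e^{-2v}) - 1
phi_formula = 2 / (1 + np.exp(-2 * v_demo)) - 1

# Built-in hyperbolic tangent
phi_numpy = np.tanh(v_demo)

# Derivative via the identity 1 - phi^2
phi_prime = 1 - phi_numpy ** 2
```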

In [None]:
phi = 2/(1 + np.exp(-2*v)) - 1
phi_prime = 1 - phi**2
plt.plot(v, phi, label='function')
plt.plot(v, phi_prime, '--', label='derivative')
displayPlot(ylim=[-1.1, 1.1])

Its form with slope $\beta$ is defined by: $\large \varphi(\upsilon)=\frac{2}{1+e^{-2\beta \upsilon}}-1$

In [None]:
B = np.linspace(0, 2, 5)
for b in B:
    phi = 2/(1 + np.exp(-2*b*v)) - 1
    plt.plot(v, phi, label='β = {0:.2f}'.format(b))
displayPlot(ncol=len(B), ylim=[-1.1, 1.1])

## Arctangent
---
$$
\large \varphi(\upsilon)=\tan^{-1}(\upsilon)
\hspace{2cm}
\therefore
\hspace{2cm}
\varphi'(\upsilon)=\frac{1}{1+\upsilon^2}
$$

In [None]:
phi = np.arctan(v)
phi_prime = 1/(v**2 + 1)
plt.plot(v, phi, label='function')
plt.plot(v, phi_prime, '--', label='derivative')
displayPlot(ylim=[-1.6, 1.6])

Its form with slope $\beta$ is defined by: $\large \varphi(\upsilon)=\tan^{-1}(\beta\upsilon)$

In [None]:
B = np.linspace(0, 2, 5)
for b in B:
    phi = np.arctan(b*v)
    plt.plot(v, phi, label='β = {0:.2f}'.format(b))
displayPlot(ncol=len(B), ylim=[-1.6, 1.6])

## SoftSign
---
$$
\large \varphi(\upsilon)=\frac{\upsilon}{1 + \lvert\upsilon\rvert}
\hspace{2cm}
\therefore
\hspace{2cm}
\varphi'(\upsilon)=\frac{1}{(1 + \lvert\upsilon\rvert)^2}
$$

In [None]:
phi = v/(1 + np.abs(v))
phi_prime = 1/(1 + np.abs(v))**2
plt.plot(v, phi, label='function')
plt.plot(v, phi_prime, '--', label='derivative')
displayPlot(ylim=[-1.1, 1.1])

Its form with slope $\beta$ is defined by: $\large \varphi(\upsilon)=\frac{\beta\upsilon}{1+\lvert \beta\upsilon \rvert}$

In [None]:
B = np.linspace(0, 2, 5)
for b in B:
    phi = b*v/(1 + np.abs(b*v))
    plt.plot(v, phi, label='β = {0:.2f}'.format(b))
displayPlot(ncol=len(B), ylim=[-1.1, 1.1])

## ReLU
---

$$
\large \varphi(\upsilon)=
\begin{cases}
0 ,\upsilon < 0 \\
\upsilon ,\upsilon \geq 0
\end{cases}
\hspace{2cm}
\therefore
\hspace{2cm}
\varphi'(\upsilon)=
\begin{cases}
0, \upsilon < 0 \\
1, \upsilon \geq 0
\end{cases}
$$
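
The piecewise definition above can also be written with `np.maximum`. The function has a kink at $\upsilon = 0$, so the value of the derivative there is a convention; the sketch below follows the formula above and uses 1. `v_demo` is an illustrative array assumed for this example.

```python
import numpy as np

v_demo = np.array([-1.5, 0.0, 2.0])

# ReLU as an element-wise maximum with zero
phi = np.maximum(v_demo, 0.0)

# Derivative: 0 for v < 0 and, by convention, 1 for v >= 0
phi_prime = (v_demo >= 0).astype(float)
```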

In [None]:
phi = v*(v >= 0)
phi_prime = (v >= 0)
plt.plot(v, phi, label='function')
plt.plot(v, phi_prime, '--', label='derivative')
displayPlot(xlim=[-2, 2], ylim=[-0.1, 1.1])

Its form with slope $\beta$ is defined by: $\large \varphi(\upsilon)=\begin{cases}0 ,\upsilon < 0 \\\beta \upsilon ,\upsilon \geq 0\end{cases}$

In [None]:
B = np.linspace(0, 2, 5)
for b in B:
    phi = b*v*(v >= 0)
    plt.plot(v, phi, label='β = {0:.2f}'.format(b))
displayPlot(ncol=len(B), xlim=[-2, 2], ylim=[-0.1, 1.1])

## PReLU
---

$$
\large \varphi(\alpha, \upsilon)=
\begin{cases}
\alpha \upsilon ,\upsilon < 0 \\
\upsilon ,\upsilon \geq 0
\end{cases}
\hspace{2cm}
\therefore
\hspace{2cm}
\varphi'(\alpha, \upsilon)=
\begin{cases}
\alpha, \upsilon < 0 \\
1, \upsilon \geq 0
\end{cases}
$$

In [None]:
a = 0.2
phi = np.where(v >= 0, v, a*v)
phi_prime = np.where(v >= 0, 1, a)
plt.plot(v, phi, label='function (α = {0})'.format(a))
plt.plot(v, phi_prime, '--', label='derivative')
displayPlot(xlim=[-2, 2], ylim=[-0.6, 1.1])

Varying $\alpha$ gives:

In [None]:
A = np.linspace(0, 2, 5)
for a in A:
    phi = np.where(v >= 0, v, a*v)
    plt.plot(v, phi, label='α = {0:.2f}'.format(a))
displayPlot(ncol=len(A), xlim=[-2, 2], ylim=[-1.1, 1.1])

## ELU
---

$$
\large \varphi(\alpha, \upsilon)=
\begin{cases}
\alpha(e^\upsilon-1) ,\upsilon < 0 \\
\upsilon ,\upsilon \geq 0
\end{cases}
\hspace{2cm}
\therefore
\hspace{2cm}
\varphi'(\alpha, \upsilon)=
\begin{cases}
\varphi(\alpha, \upsilon) + \alpha, \upsilon < 0 \\
1, \upsilon \geq 0
\end{cases}
$$
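
The identity $\varphi' = \varphi + \alpha$ for $\upsilon < 0$ follows from $\frac{d}{d\upsilon}\,\alpha(e^\upsilon - 1) = \alpha e^\upsilon$. The sketch below wraps both formulas in hypothetical helpers `elu` and `elu_prime` (names assumed for illustration). It clamps the argument of the exponential so the unused branch of `np.where` cannot overflow.

```python
import numpy as np

def elu(v, a=1.0):
    v = np.asarray(v, dtype=float)
    # a * (e^v - 1), evaluated only on min(v, 0) so large positive v never overflows
    negative_branch = a * np.expm1(np.minimum(v, 0.0))
    return np.where(v >= 0, v, negative_branch)

def elu_prime(v, a=1.0):
    v = np.asarray(v, dtype=float)
    # phi + a for v < 0, and 1 for v >= 0
    return np.where(v >= 0, 1.0, elu(v, a) + a)

v_demo = np.array([-3.0, -1.0, -0.1, 0.5, 2.0])
phi_demo = elu(v_demo, a=1.0)
phi_prime_demo = elu_prime(v_demo, a=1.0)
```

For $\alpha = 1$ the derivative approaches 1 from both sides of zero, so the slope is continuous there.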

In [None]:
a = 1
phi = np.where(v >= 0, v, a*(np.exp(v) - 1))
phi_prime = np.where(v >= 0, 1, phi + a)
plt.plot(v, phi, label='function (α = {0})'.format(a))
plt.plot(v, phi_prime, '--', label='derivative')
displayPlot(xlim=[-2, 2], ylim=[-1.1, 1.1])

Varying $\alpha$ gives:

In [None]:
A = np.linspace(0, 2, 5)
for a in A:
    phi = np.where(v >= 0, v, a*(np.exp(v) - 1))
    plt.plot(v, phi, label='α = {0:.2f}'.format(a))
displayPlot(ncol=len(A), xlim=[-2, 2], ylim=[-1.1, 1.1])

## SoftPlus
---

$$
\large \varphi(\upsilon)=\ln(1+e^\upsilon)
\hspace{2cm}
\therefore
\hspace{2cm}
\varphi'(\upsilon)=\frac{1}{1+e^{-\upsilon}}
$$
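
The derivative of SoftPlus is the logistic function. As a hedged sketch, the helpers below (hypothetical names `softplus` and `softplus_prime`) compute both without overflow for large inputs. They use `np.logaddexp` and the identity $\frac{1}{1+e^{-\upsilon}} = \frac{1}{2}\left(1 + \tanh\frac{\upsilon}{2}\right)$.

```python
import numpy as np

def softplus(v):
    # log(e^0 + e^v) = log(1 + e^v), computed without overflow
    return np.logaddexp(0.0, v)

def softplus_prime(v):
    # Logistic function written via tanh, which saturates instead of overflowing
    return 0.5 * (1 + np.tanh(0.5 * np.asarray(v, dtype=float)))

v_demo = np.linspace(-10, 10, 41)
phi_demo = softplus(v_demo)
phi_prime_demo = softplus_prime(v_demo)
```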

In [None]:
phi = np.log(1 + np.exp(v))
phi_prime = 1/(1 + np.exp(-v))
plt.plot(v, phi, label='function')
plt.plot(v, phi_prime, '--', label='derivative')
displayPlot(ylim=[-0.1, 2])

Its form with slope $\beta$ is defined by: $\large \varphi(\upsilon)=\ln(1+e^{\beta \upsilon})$

In [None]:
B = np.linspace(0, 2, 5)
for b in B:
    phi = np.log(1 + np.exp(b*v))
    plt.plot(v, phi, label='β = {0:.2f}'.format(b))
displayPlot(ncol=len(B), ylim=[-0.1, 2])

## SoftExponential
---

$$
\large \varphi(\alpha, \upsilon)=
\begin{cases}
-\frac{\ln(1 - \alpha(\alpha+\upsilon))}{\alpha}, \alpha < 0 \\
\upsilon, \alpha=0 \\
\alpha + \frac{e^{\alpha\upsilon}-1}{\alpha}, \alpha > 0
\end{cases}
\hspace{2cm}
\therefore
\hspace{2cm}
\varphi'(\alpha, \upsilon)=
\begin{cases}
\frac{1}{1-\alpha(\alpha+\upsilon)}, \alpha < 0 \\
e^{\alpha\upsilon}, \alpha \geq 0
\end{cases}
$$
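
Because $\alpha$ is a scalar, the three cases can be selected with ordinary `if` statements, which avoids evaluating a division by zero or a logarithm in branches that are not used. The sketch below follows the formulas above; `soft_exponential` and `soft_exponential_prime` are hypothetical helper names introduced for illustration.

```python
import numpy as np

def soft_exponential(v, a):
    v = np.asarray(v, dtype=float)
    if a < 0:
        # Only defined where 1 - a*(a + v) > 0
        return -np.log(1 - a * (a + v)) / a
    if a == 0:
        return v
    return a + (np.exp(a * v) - 1) / a

def soft_exponential_prime(v, a):
    v = np.asarray(v, dtype=float)
    if a < 0:
        return 1 / (1 - a * (a + v))
    return np.exp(a * v)  # equals 1 everywhere when a == 0

v_demo = np.linspace(0, 2, 9)
curves = {a: soft_exponential(v_demo, a) for a in (-0.5, 0.0, 0.5)}
```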

In [None]:
a = 0.5
phi = np.where(a == 0, v,
     np.where(a > 0, a + (np.exp(a*v) - 1)/a,
              - np.log(1 - a*(a + v))/a))
phi_prime = np.where(a >= 0, np.exp(a*v),
                     1/(1 - a*(a + v)))
plt.plot(v, phi, label='function (α = {0})'.format(a))
plt.plot(v, phi_prime, '--', label='derivative')
displayPlot(ylim=[-2.1, 2.1])

Varying $\alpha$ gives:

In [None]:
A = np.linspace(-1.1, 1, 5)
for a in A:
    phi = np.where(a == 0, v,
         np.where(a > 0, a + (np.exp(a*v) - 1)/a, 
                 - np.log(1 - a*(a + v))/a))
    plt.plot(v, phi, label='α = {0:.2f}'.format(a))
displayPlot(ncol=len(A), ylim=[-2.1, 2.1])

## Sinusoid
---

$$
\large \varphi(\upsilon)=\sin(\upsilon)
\hspace{2cm}
\therefore
\hspace{2cm}
\varphi'(\upsilon)=\cos(\upsilon)
$$

In [None]:
phi = np.sin(v)
phi_prime = np.cos(v)
plt.plot(v, phi, label='function')
plt.plot(v, phi_prime, '--', label='derivative')
displayPlot(ylim=[-1.1, 1.1])

Its form with parameter $\beta$ is defined by: $\large \varphi(\upsilon)=\sin(\beta\upsilon)$

In [None]:
B = np.linspace(1, 2, 5)
for b in B:
    phi = np.sin(b*v)
    plt.plot(v, phi, label='β = {0:.2f}'.format(b))
displayPlot(ncol=len(B), ylim=[-1.1, 1.1])

## Cardinal sine
---

$$
\large \varphi(\upsilon)=
\begin{cases}
1, \upsilon = 0 \\
\frac{\sin(\upsilon)}{\upsilon}, \upsilon \neq 0
\end{cases}
\hspace{2cm}
\therefore
\hspace{2cm}
\varphi'(\upsilon)=
\begin{cases}
0, \upsilon = 0 \\
\frac{\cos(\upsilon)}{\upsilon} - \frac{\sin(\upsilon)}{\upsilon^2}, \upsilon \neq 0
\end{cases}
\end{cases}
$$
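
NumPy's `np.sinc` implements the *normalized* cardinal sine, $\frac{\sin(\pi x)}{\pi x}$, so the unnormalized form used here corresponds to `np.sinc(v / np.pi)`. The sketch below compares the two on an illustrative array `v_demo` (an assumption of this example) and avoids the $0/0$ division at the origin.

```python
import numpy as np

v_demo = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])

# Replace 0 by 1 in the denominator so the unused branch never divides by zero
safe_v = np.where(v_demo == 0, 1.0, v_demo)
phi_manual = np.where(v_demo == 0, 1.0, np.sin(safe_v) / safe_v)

# Built-in normalized sinc, rescaled to the unnormalized definition
phi_numpy = np.sinc(v_demo / np.pi)
```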

In [None]:
phi = np.where(v == 0, 1, np.sin(v)/v)
phi_prime = np.where(v == 0, 0, np.cos(v)/v - np.sin(v)/v**2)
plt.plot(v, phi, label='function')
plt.plot(v, phi_prime, '--', label='derivative')
displayPlot(ylim=[-1.1, 1.1])

Its form with parameter $\beta$ is defined by: $\large \varphi(\upsilon)=\begin{cases}1 ,\upsilon = 0 \\\frac{\sin(\beta\upsilon)}{\beta\upsilon},\upsilon \neq 0\end{cases}$

In [None]:
B = np.linspace(0.1, 2, 5)
for b in B:
    phi = np.where(v == 0, 1, np.sin(b*v)/(b*v))
    plt.plot(v, phi, label='β = {0:.2f}'.format(b))
displayPlot(ncol=len(B), ylim=[-0.3, 1.1])

## Gaussian
---

$$
\large \varphi(\upsilon)=e^{-\upsilon^2}
\hspace{2cm}
\therefore
\hspace{2cm}
\varphi'(\upsilon)=-2\upsilon e^{-\upsilon^2}
$$

In [None]:
phi = np.exp(-v**2)
phi_prime = -2*v*np.exp(-v**2)
plt.plot(v, phi, label='function')
plt.plot(v, phi_prime, '--', label='derivative')
displayPlot(xlim=[-4, 4], ylim=[-1.1, 1.1])

Its form with parameter $\beta$ is defined by: $\large \varphi(\upsilon)=e^{-\beta\upsilon^2}$

In [None]:
B = np.linspace(0, 2, 5)
for b in B:
    phi = np.exp(-b*v**2)
    plt.plot(v, phi, label='β = {0:.2f}'.format(b))
displayPlot(ncol=len(B), xlim=[-4, 4], ylim=[-0.1, 1.1])