<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#What-is-an-Activation-Function?" data-toc-modified-id="What-is-an-Activation-Function?-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>What is an Activation Function?</a></span></li><li><span><a href="#Commonly-Used-Activations" data-toc-modified-id="Commonly-Used-Activations-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Commonly Used Activations</a></span><ul class="toc-item"><li><span><a href="#Sigmoid" data-toc-modified-id="Sigmoid-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Sigmoid</a></span><ul class="toc-item"><li><span><a href="#Advantages" data-toc-modified-id="Advantages-2.1.1"><span class="toc-item-num">2.1.1&nbsp;&nbsp;</span>Advantages</a></span></li><li><span><a href="#Disadvantages" data-toc-modified-id="Disadvantages-2.1.2"><span class="toc-item-num">2.1.2&nbsp;&nbsp;</span>Disadvantages</a></span></li><li><span><a href="#Code-&amp;-Visualization" data-toc-modified-id="Code-&amp;-Visualization-2.1.3"><span class="toc-item-num">2.1.3&nbsp;&nbsp;</span>Code &amp; Visualization</a></span></li></ul></li><li><span><a href="#Hyperbolic-Tangent---Tanh" data-toc-modified-id="Hyperbolic-Tangent---Tanh-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Hyperbolic Tangent - Tanh</a></span><ul class="toc-item"><li><span><a href="#Advantages" data-toc-modified-id="Advantages-2.2.1"><span class="toc-item-num">2.2.1&nbsp;&nbsp;</span>Advantages</a></span></li><li><span><a href="#Disadvantages" data-toc-modified-id="Disadvantages-2.2.2"><span class="toc-item-num">2.2.2&nbsp;&nbsp;</span>Disadvantages</a></span></li><li><span><a href="#Code-&amp;-Visualization" data-toc-modified-id="Code-&amp;-Visualization-2.2.3"><span class="toc-item-num">2.2.3&nbsp;&nbsp;</span>Code &amp; Visualization</a></span></li></ul></li><li><span><a href="#ReLU" data-toc-modified-id="ReLU-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>ReLU</a></span><ul class="toc-item"><li><span><a href="#Advantages" 
data-toc-modified-id="Advantages-2.3.1"><span class="toc-item-num">2.3.1&nbsp;&nbsp;</span>Advantages</a></span></li><li><span><a href="#Disadvantages" data-toc-modified-id="Disadvantages-2.3.2"><span class="toc-item-num">2.3.2&nbsp;&nbsp;</span>Disadvantages</a></span></li><li><span><a href="#Code-&amp;-Visualization" data-toc-modified-id="Code-&amp;-Visualization-2.3.3"><span class="toc-item-num">2.3.3&nbsp;&nbsp;</span>Code &amp; Visualization</a></span></li></ul></li><li><span><a href="#Leaky-ReLU" data-toc-modified-id="Leaky-ReLU-2.4"><span class="toc-item-num">2.4&nbsp;&nbsp;</span>Leaky ReLU</a></span><ul class="toc-item"><li><span><a href="#Advantages" data-toc-modified-id="Advantages-2.4.1"><span class="toc-item-num">2.4.1&nbsp;&nbsp;</span>Advantages</a></span></li><li><span><a href="#Disadvantages" data-toc-modified-id="Disadvantages-2.4.2"><span class="toc-item-num">2.4.2&nbsp;&nbsp;</span>Disadvantages</a></span></li><li><span><a href="#Code-&amp;-Visualization" data-toc-modified-id="Code-&amp;-Visualization-2.4.3"><span class="toc-item-num">2.4.3&nbsp;&nbsp;</span>Code &amp; Visualization</a></span></li></ul></li></ul></li></ul></div>

In [None]:
import numpy as np
import matplotlib.pyplot as plt

z = np.arange(-10,10,0.2)

# What is an Activation Function?
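An activation function is a nonlinear function applied elementwise to a layer's weighted sum of inputs. The nonlinearity is what makes stacking layers worthwhile: without it, any stack of linear layers collapses into a single linear map. A minimal sketch with NumPy, where `W1` and `W2` are arbitrary random weights chosen for illustration (not from any trained model):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)        # a single input vector with 3 features
W1 = rng.normal(size=(4, 3))  # illustrative first-layer weights
W2 = rng.normal(size=(2, 4))  # illustrative second-layer weights

# Two linear layers with no activation in between...
linear_stack = W2 @ (W1 @ x)
# ...are equivalent to one linear layer with weight matrix W2 @ W1.
collapsed = (W2 @ W1) @ x
print(np.allclose(linear_stack, collapsed))  # True

# Putting a nonlinearity (tanh here) between the layers breaks that equivalence.
nonlinear_stack = W2 @ np.tanh(W1 @ x)
print(np.allclose(nonlinear_stack, collapsed))
```

The arctangent below is one example of such a nonlinearity.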

In [None]:
def arctan(x, derivative=False):
    # Arctangent activation with range (-pi/2, pi/2).
    # Its derivative is 1 / (1 + x^2).
    if derivative:
        return 1 / (1 + np.square(x))
    return np.arctan(x)
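
A quick way to sanity-check a hand-written `derivative=True` branch such as the one in `arctan` is to compare it against a central finite difference. The helper `numerical_derivative` and the step size `h=1e-5` are illustrative choices, not part of the original notebook:

```python
import numpy as np

def arctan(x, derivative=False):
    if derivative:
        return 1 / (1 + np.square(x))
    return np.arctan(x)

def numerical_derivative(f, x, h=1e-5):
    # Hypothetical helper: central difference (f(x + h) - f(x - h)) / (2h),
    # whose truncation error shrinks like h^2.
    return (f(x + h) - f(x - h)) / (2 * h)

z = np.arange(-10, 10, 0.2)
analytic = arctan(z, derivative=True)
numeric = numerical_derivative(arctan, z)
# The largest disagreement should be tiny if the analytic derivative is right.
print(np.max(np.abs(analytic - numeric)))
```

The same check works for the sigmoid, tanh, and ReLU derivatives below (away from $x=0$ for ReLU, where the derivative is not defined).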

# Commonly Used Activations

## Sigmoid

Range: $(0,1)$

Function: $\sigma(x) = \frac{1}{1+e^{-x}}$
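
The derivative used in the code for this section follows from the chain rule. Writing $\sigma(x) = (1+e^{-x})^{-1}$:

$$
\sigma'(x) = \frac{e^{-x}}{(1+e^{-x})^{2}} = \frac{1}{1+e^{-x}} \cdot \frac{e^{-x}}{1+e^{-x}} = \sigma(x)\bigl(1-\sigma(x)\bigr),
$$

since $1-\sigma(x) = \frac{e^{-x}}{1+e^{-x}}$. At $x=0$ we have $\sigma(0)=\tfrac{1}{2}$, so the derivative peaks at $\sigma'(0)=\tfrac{1}{4}$.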

### Advantages

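- Intuitive for classification: the output lies in $(0,1)$ and can be read as a probability
- Widely used, especially as the output activation of binary classifiers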

### Disadvantages

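- Slower to train than zero-centered alternatives: outputs are always positive, and each evaluation needs an exponential
- Vanishing gradient problem: the derivative is at most 0.25 and approaches 0 as $|x|$ grows, so gradients shrink as they are backpropagated through many layers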
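
To see how quickly gradients can shrink, note that the sigmoid derivative $\sigma(x)(1-\sigma(x))$ never exceeds 0.25. A rough sketch, assuming every pre-activation sits exactly at the peak $x=0$ and ignoring the weights, of the factor the activations alone contribute after $n$ layers:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

# The derivative peaks at x = 0.
print(sigmoid_derivative(0.0))  # 0.25

# Best-case gradient factor from the activations after n layers.
for n in [1, 5, 10, 20]:
    print(n, 0.25 ** n)
```

Even in this best case the factor is already below 1e-6 at $n = 10$; in practice pre-activations away from 0 make it smaller still.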

### Code & Visualization

In [None]:
def sigmoid(x, derivative=False):
    # Logistic sigmoid; its derivative is sigma(x) * (1 - sigma(x)).
    f = 1 / (1 + np.exp(-x))
    if derivative:
        return f * (1 - f)
    return f

y = sigmoid(z)
dy = sigmoid(z, derivative=True)
plt.title("sigmoid")
plt.axhline(color="gray", linewidth=1)
plt.axvline(color="gray", linewidth=1)
plt.plot(z, y, 'r', label="sigmoid")
plt.plot(z, dy, 'b', label="derivative")
plt.legend()
plt.show()
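
`1 / (1 + np.exp(-x))` is fine on the plotted range, but for inputs below roughly $-709$ in float64, `np.exp(-x)` overflows to `inf` and NumPy emits a `RuntimeWarning`, even though the final value (0) is still correct. One common workaround, sketched here assuming NumPy float inputs, evaluates each sign on its own branch so no exponent is ever positive. The name `stable_sigmoid` is illustrative; if SciPy is available, `scipy.special.expit` provides a ready-made sigmoid.

```python
import numpy as np

def stable_sigmoid(x):
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    # For x >= 0, exp(-x) <= 1, so the textbook form cannot overflow.
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    # For x < 0, use the equivalent form exp(x) / (1 + exp(x)); exp(x) < 1.
    ex = np.exp(x[~pos])
    out[~pos] = ex / (1.0 + ex)
    return out

with np.errstate(over="raise"):  # turn any overflow into an error
    result = stable_sigmoid(np.array([-1000.0, 0.0, 1000.0]))
print(result)  # 0, 0.5 and 1, with no overflow
```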

## Hyperbolic Tangent - Tanh

Range: $(-1,1)$

Function: $\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$
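
Tanh is a rescaled and shifted sigmoid. Multiplying the numerator and denominator of the formula above by $e^{-x}$ gives $\tanh(x) = \frac{1-e^{-2x}}{1+e^{-2x}} = 2\sigma(2x) - 1$, which stretches the sigmoid's $(0,1)$ output to $(-1,1)$ and centers it at 0. A quick numerical check:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.linspace(-5, 5, 101)
# tanh(x) should match 2 * sigmoid(2x) - 1 up to floating-point error.
print(np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1))  # True
```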

### Advantages

- Typically trains faster than sigmoid because its outputs are zero-centered
- Steeper gradient: the derivative peaks at 1 at $x=0$, versus 0.25 for sigmoid

### Disadvantages

- Still suffers from the vanishing gradient problem: it saturates at $\pm 1$, where the derivative approaches 0
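
The trade-offs listed above can be checked numerically. This sketch compares the peak gradients, the mean output over inputs symmetric around 0 (a rough proxy for zero-centering), and the tanh gradient far from 0:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Peak gradients, both at x = 0
tanh_peak = 1 - np.tanh(0.0) ** 2                 # 1.0
sigmoid_peak = sigmoid(0.0) * (1 - sigmoid(0.0))  # 0.25
print(tanh_peak, sigmoid_peak)

# Mean output over inputs symmetric around 0: tanh is zero-centered, sigmoid is not
x = np.linspace(-3, 3, 601)
print(np.tanh(x).mean())  # close to 0
print(sigmoid(x).mean())  # close to 0.5

# Tanh still saturates: the gradient at x = 5 is only about 1.8e-4
tail_grad = 1 - np.tanh(5.0) ** 2
print(tail_grad)
```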

### Code & Visualization

In [None]:
def tanh(x, derivative=False):
    # Hyperbolic tangent; its derivative is 1 - tanh(x)^2.
    f = np.tanh(x)
    if derivative:
        return 1 - f ** 2
    return f

y = tanh(z)
dy = tanh(z, derivative=True)
plt.title("tanh")
plt.axhline(color="gray", linewidth=1)
plt.axvline(color="gray", linewidth=1)
plt.plot(z, y, 'r', label="tanh")
plt.plot(z, dy, 'b', label="derivative")
plt.legend()
plt.show()

## ReLU

Range: $[0,\infty)$

Function: $f(x) = 
    \begin{cases}
      0, & \text{if}\ x<0 \\
      x, & \text{if}\ x\ge 0
    \end{cases}$
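
For backpropagation the derivative is also piecewise:

$$
f'(x) =
    \begin{cases}
      0, & \text{if}\ x<0 \\
      1, & \text{if}\ x>0
    \end{cases}
$$

It is undefined at $x=0$, so implementations simply choose a value there; the code in this section uses 0.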

### Advantages

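- Cheap to compute: a comparison with zero, with no exponentials
- Does not saturate for positive inputs: the gradient there is exactly 1, which eases the vanishing gradient problem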

### Disadvantages

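- Outputs and gradients are exactly zero for negative inputs, so a unit whose input stays negative stops receiving updates (the "dying ReLU" problem), which can slow training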
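
A small sketch of the dying ReLU problem: a single unit whose bias is so negative (a hypothetical value, e.g. after a bad update) that every pre-activation falls below zero. The weights, bias, and random inputs are illustrative assumptions:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # Derivative of ReLU, using 0 at x == 0
    return (x > 0).astype(float)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))   # 100 random inputs with 3 features each
w = np.array([0.5, -0.2, 0.1])  # illustrative weights
b = -10.0                       # hypothetical large negative bias

pre = X @ w + b                 # pre-activations, all far below 0 here
out = relu(pre)
# Gradient of sum(relu(pre)) w.r.t. w: sum over i of relu'(pre_i) * x_i
grad_w = (relu_grad(pre)[:, None] * X).sum(axis=0)

print(out.max())  # 0.0: the unit outputs 0 for every input
print(grad_w)     # all zeros: gradient descent cannot update w
```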

### Code & Visualization

In [None]:
def relu(x, derivative=False):
    # Vectorized ReLU: x where x > 0, otherwise 0.
    if derivative:
        # Gradient is 1 for x > 0 and 0 otherwise (0 is used at x == 0).
        return np.where(x > 0, 1.0, 0.0)
    return np.where(x > 0, x, 0.0)

y = relu(z)
dy = relu(z, derivative=True)
plt.title("ReLU")
plt.axhline(color="gray", linewidth=1)
plt.axvline(color="gray", linewidth=1)
plt.plot(z, y, 'r', label="ReLU")
plt.plot(z, dy, 'b', label="derivative")
plt.legend()
plt.show()


## Leaky ReLU

Range: $(-\infty,\infty)$

Function: $f(x) = 
    \begin{cases}
      c \cdot x, & \text{if}\ x<0 \\
      x, & \text{if}\ x\ge 0
    \end{cases}\  \text{where}\ c\ \text{is a small positive constant, e.g. } 0.01$

### Advantages

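- Negative inputs still receive a small nonzero gradient $c$, so units cannot "die" as they can with ReLU, which can help training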

### Disadvantages

- Slightly more computation than ReLU: negative inputs must be multiplied by $c$ rather than simply set to 0
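
A sketch of why the leak helps: take a hypothetical unit whose bias is so negative that every pre-activation is below zero. With leaky ReLU the weight gradient is scaled down by the leakage but is not zero, so gradient descent can still move the weights. The weights, bias, random inputs, and `leakage=0.01` are illustrative assumptions:

```python
import numpy as np

def leaky_relu_grad(x, leakage=0.01):
    # Derivative of leaky ReLU: 1 for x > 0, `leakage` otherwise
    return np.where(x > 0, 1.0, leakage)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))   # 100 random inputs with 3 features each
w = np.array([0.5, -0.2, 0.1])  # illustrative weights
b = -10.0                       # hypothetical bias making every pre-activation negative

pre = X @ w + b
# Gradient of sum(leaky_relu(pre)) w.r.t. w: sum over i of f'(pre_i) * x_i
grad_w = (leaky_relu_grad(pre)[:, None] * X).sum(axis=0)
print(grad_w)  # small (scaled by the leakage) but nonzero
```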

### Code & Visualization

In [None]:
def leaky_relu(x, leakage=0.05, derivative=False):
    # Vectorized leaky ReLU: x where x > 0, otherwise leakage * x.
    if derivative:
        # Gradient is 1 for x > 0 and `leakage` otherwise.
        return np.where(x > 0, 1.0, leakage)
    return np.where(x > 0, x, leakage * x)

# The default leakage here is 0.05, larger than the 0.01 example above.
y = leaky_relu(z)
dy = leaky_relu(z, derivative=True)
plt.title("leaky ReLU")
plt.axhline(color="gray", linewidth=1)
plt.axvline(color="gray", linewidth=1)
plt.xlim(-10, 10)
plt.plot(z, y, 'r', label="leaky ReLU")
plt.plot(z, dy, 'b', label="derivative")
plt.legend()
plt.show()