In [4]:
import numpy as np
import math

### Sigmoid activation function
Advantages:
1. Squashes numbers to the range (0, 1)
2. Historically popular, since the output has a nice interpretation as the saturating “firing rate” of a neuron

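Disadvantages:
1. Saturated neurons “kill” the gradients
    - For large positive or large negative inputs the sigmoid is flat, so its gradient is close to 0. Backpropagation multiplies these small local gradients layer by layer, so the weight updates vanish in large (deep) networks.
2. Sigmoid outputs are not zero-centered
    - Outputs are always positive, so the next layer never receives zero-mean input. When a neuron's inputs are all positive, the gradients on its weights are either all positive or all negative, which leads to inefficient zig-zag updates.
3. exp() is somewhat expensive to compute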
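A minimal NumPy sketch of the saturation and zero-centering problems: the sigmoid's derivative is σ'(x) = σ(x)(1 − σ(x)), which peaks at 0.25 at x = 0 and is nearly 0 for large |x|. The helper names `sigmoid` and `sigmoid_grad` are hypothetical and not part of the notebook.

```python
import numpy as np

def sigmoid(x):
    # Logistic sigmoid, applied elementwise
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: s * (1 - s), at most 0.25 (reached at x = 0)
    s = sigmoid(x)
    return s * (1.0 - s)

xs = np.array([-10.0, -5.0, 0.0, 5.0, 10.0])
print(sigmoid_grad(xs))  # peaks at 0.25; roughly 4.5e-05 at |x| = 10

# Outputs are always positive: over inputs symmetric around 0 the mean is 0.5, not 0
print(sigmoid(xs).mean())

# Ignoring the weights, backprop multiplies one such factor per layer,
# so even the best case shrinks the signal: 0.25 ** 10 is below 1e-6
print(0.25 ** 10)
```

Because each sigmoid layer contributes a local gradient of at most 0.25, stacking them shrinks the backpropagated signal even before the units saturate.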

In [51]:
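def sigmoid_function(x):
    # Logistic sigmoid 1 / (1 + e^(-x)); output lies in (0, 1).
    # Branch on the sign so math.exp never overflows for large |x|.
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    z = math.exp(x)
    return z / (1 + z)

In [52]:
sigmoid_function(10)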

0.9999546021312976

### Tanh activation function
Advantages:
1. Squashes numbers to the range (-1, 1)
2. Zero-centered output (nice)

Disadvantage:
1. Still kills gradients when saturated (large |x|)
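A short NumPy sketch comparing tanh with the sigmoid: its derivative 1 − tanh²(x) reaches 1 at x = 0 (four times the sigmoid's peak of 0.25) but still vanishes for large |x|. It also checks the identity tanh(x) = 2σ(2x) − 1, which is why tanh is often described as a rescaled, zero-centered sigmoid. `tanh_grad` is a hypothetical helper name.

```python
import numpy as np

def tanh_grad(x):
    # d/dx tanh(x) = 1 - tanh(x)^2; equals 1 at x = 0
    t = np.tanh(x)
    return 1.0 - t ** 2

xs = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
print(np.tanh(xs))    # odd function: outputs are symmetric around 0
print(tanh_grad(xs))  # still about 0 at |x| = 10, so saturation kills the gradient

# tanh is a rescaled, shifted sigmoid: tanh(x) = 2 * sigmoid(2x) - 1
sigmoid_2x = 1.0 / (1.0 + np.exp(-2.0 * xs))
print(np.allclose(np.tanh(xs), 2.0 * sigmoid_2x - 1.0))  # True
```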

In [53]:
def tanh_function(x):
    # Hyperbolic tangent; output lies in (-1, 1) and is zero-centered.
    return math.tanh(x)

In [54]:
tanh_function(-1000)

-1.0

### ReLU activation function
`max(0.0, x)`

Advantages:
- Does not saturate in the positive region
- Does not kill the gradient for positive inputs
    - The gradient is exactly 0 only for negative inputs, so it is killed in half of the input space rather than at both ends.
- Very computationally efficient
- Converges much faster than sigmoid/tanh in practice (e.g. 6x)
- Actually more biologically plausible than sigmoid

Disadvantages:
- Not zero-centered output
- If the weights are poorly initialized (or the learning rate is too high), possibly as many as 75% of the neurons can become “dead”: they are never active, so they never receive a gradient, which wastes computation. The network still trains, and avoiding dead units is an active area of research.
    - A common mitigation is to initialize all the biases to a small positive value such as 0.01, so that the units start out active.
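To illustrate dead units, here is a hedged sketch that counts ReLU units whose gradient is 0 for every input in a batch. The random data, the layer size, and the deliberately extreme bias of -10 (standing in for a bad initialization or an oversized update) are all assumptions; the 0.01 bias is the mitigation mentioned above. `relu` and `relu_grad` are hypothetical helper names.

```python
import numpy as np

def relu(x):
    # Elementwise max(0, x)
    return np.maximum(0.0, x)

def relu_grad(x):
    # 1 where x > 0, else 0 (the value at exactly x = 0 is taken as 0)
    return (x > 0).astype(float)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))           # hypothetical batch: 1000 inputs, 10 features
W = rng.normal(scale=0.1, size=(10, 50))  # hypothetical weights for 50 ReLU units

for bias in (-10.0, 0.01):
    pre = X @ W + bias  # pre-activations, shape (1000, 50)
    # A unit is "dead" on this batch if its gradient is 0 for every input
    dead = np.all(relu_grad(pre) == 0, axis=0)
    print(f"bias={bias}: {dead.mean():.0%} of units dead")
```

With these assumed shapes, every unit is dead at bias -10 and none are at 0.01. A dead unit's weights receive zero gradient, so gradient descent alone cannot bring it back.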


In [55]:
def relu_function(x):
    # Rectified linear unit: passes positive inputs, zeroes out the rest.
    return max(0.0, x)

In [56]:
relu_function(-10)

0.0

### Leaky ReLU function
Advantages:
- Does not saturate
- Computationally efficient
- Converges much faster than sigmoid/tanh in practice (e.g. 6x)
- Will not “die”: negative inputs still get a small, nonzero gradient (0.01)
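A minimal vectorized sketch of Leaky ReLU and its gradient, assuming the same 0.01 slope as the cell below. It shows why the unit cannot die: the gradient is 1 or 0.01, never exactly 0. `leaky_relu` and `leaky_relu_grad` are hypothetical helper names.

```python
import numpy as np

def leaky_relu(x, slope=0.01):
    # x for x > 0, slope * x otherwise
    return np.where(x > 0, x, slope * x)

def leaky_relu_grad(x, slope=0.01):
    # Gradient is 1 for x > 0 and `slope` otherwise, so it is never exactly 0
    return np.where(x > 0, 1.0, slope)

xs = np.array([-100.0, -1.0, 0.5, 3.0])
print(leaky_relu(xs))       # negative inputs are scaled down, not zeroed
print(leaky_relu_grad(xs))  # every entry is strictly positive
```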

In [46]:
def leaky_relu_function(x):
    # Identity for positive inputs; slope 0.01 for the rest,
    # so the gradient is never exactly 0.
    if x > 0:
        return x
    return 0.01 * x

In [47]:
leaky_relu_function(-100)

-1.0

### Exponential Linear Unit (ELU)
Advantages:
- All benefits of ReLU
- Closer to zero-mean outputs
- Its negative saturation regime adds some robustness to noise compared with Leaky ReLU

Disadvantage:
- Computation requires exp()
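A hedged NumPy sketch that checks the “closer to zero-mean outputs” claim numerically, assuming standard-normal pre-activations. `elu` and `elu_grad` are hypothetical helper names. The coefficient `alpha` multiplies (exp(x) - 1); the cell below uses 0.01, while common libraries such as PyTorch and Keras default to 1.0.

```python
import numpy as np

def elu(x, alpha=1.0):
    # x for x > 0, alpha * (exp(x) - 1) otherwise; saturates to -alpha.
    # np.minimum keeps exp from overflowing on the unused branch of np.where.
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def elu_grad(x, alpha=1.0):
    # 1 for x > 0, alpha * exp(x) otherwise (small but nonzero for x < 0)
    return np.where(x > 0, 1.0, alpha * np.exp(np.minimum(x, 0.0)))

rng = np.random.default_rng(0)
z = rng.normal(size=100_000)  # hypothetical zero-mean pre-activations

print("ReLU mean:", np.maximum(0.0, z).mean())        # about 0.40
print("ELU mean (alpha=1.0):", elu(z).mean())         # about 0.16
print("ELU mean (alpha=0.01):", elu(z, 0.01).mean())  # barely below ReLU
```

With the small 0.01 coefficient the negative branch is nearly flat, so the output mean barely moves; the zero-mean benefit shows up with alpha near 1.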

In [5]:
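def exp_relu_function(x, alpha=0.01):
    '''ELU: x for x > 0, alpha * (exp(x) - 1) otherwise.

    alpha=0.01 reproduces the original cell; libraries such as
    PyTorch and Keras default to alpha=1.0.
    '''
    if x > 0:
        return x
    return alpha * (np.exp(x) - 1)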

In [6]:
exp_relu_function(-10)

-0.009999546000702375

### Maxout activation
- `maxout(x) = max(w1.T @ x + b1, w2.T @ x + b2)`
- Generalizes ReLU and Leaky ReLU (ReLU is the special case w1 = 0, b1 = 0)
- Doesn't die!
- Problems:
    - Doubles the number of parameters per neuron.
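A minimal sketch of why maxout generalizes ReLU and Leaky ReLU, using a vectorized maxout (named `maxout` here to keep the block self-contained). The random inputs, weights, and bias are assumptions for illustration. Fixing one linear piece at zero recovers ReLU; setting it to 0.01 times the other piece recovers Leaky ReLU.

```python
import numpy as np

def maxout(x, weights, biases):
    # x: (n, d); weights: (d, k); biases: (k,) -> max over the k linear pieces
    return np.max(x @ weights + biases, axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))  # hypothetical inputs
w = rng.normal(size=3)       # hypothetical weights of one linear unit
b = 0.5                      # hypothetical bias
z = x @ w + b                # that unit's pre-activation

# Piece 1 fixed at zero (w1 = 0, b1 = 0): maxout reduces to ReLU(z)
W_relu = np.stack([np.zeros(3), w], axis=1)  # shape (3, 2)
print(np.allclose(maxout(x, W_relu, np.array([0.0, b])),
                  np.maximum(0.0, z)))  # True

# Piece 1 = 0.01 * piece 2: maxout reduces to Leaky ReLU with slope 0.01
W_leaky = np.stack([0.01 * w, w], axis=1)
print(np.allclose(maxout(x, W_leaky, np.array([0.01 * b, b])),
                  np.where(z > 0, z, 0.01 * z)))  # True
```

Each two-piece maxout unit stores two weight vectors and two biases where a ReLU unit stores one of each, which is the parameter doubling listed above.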

In [2]:
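def maxout_function(x, weights, biases):
    # x: (n_samples, n_features); weights: (n_features, k); biases: (k,)
    # Compute k linear pieces per sample, then keep the largest one.
    linear_outputs = np.dot(x, weights) + biases
    return np.max(linear_outputs, axis=1)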

In [7]:
x = np.array([[1, 2, 3], [4, 5, 6]])  # Input
weights = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])  # Weights
biases = np.array([0.1, 0.2])  # Biases

output = maxout_function(x, weights, biases)
print("Output:", output)

Output: [3.  6.6]


### Key points
- Use ReLU. Be careful with your learning rates.
- Try out Leaky ReLU / Maxout / ELU.
- Try out tanh, but don’t expect much.
- Don’t use sigmoid.