<a href="https://colab.research.google.com/github/bnsreenu/python_for_image_processing_APEER/blob/master/tutorial94_DL_terminology_Activation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

https://youtu.be/vp8mQ_inplo

**Code to explain Activation Functions**

Example use case: a family deciding which cuisine to eat out.
The choices are Thai, Sushi, and Indian.

Each family member has a weight that reflects their power to influence the decision.

Mom: 0.5

Dad: 0.2

Son: 0.3

Daughter: 0.6


Each member rates each cuisine on a scale from -100 (really hate) to 100 (really love).

Ratings for each cuisine, given by Mom, Dad, Son, and Daughter, in that order:

Thai: -30, 100, 40, -40

Sushi: -10, -25, -60, 50

Indian: -50, -50, 50, 40 

The neuron computes a weighted sum for each cuisine by multiplying each member's rating by that member's weight and adding the products. For Thai, for example: 0.5×(-30) + 0.2×100 + 0.3×40 + 0.6×(-40) = -15 + 20 + 12 - 24 = -7.

Thai: -7 

Sushi: 2

Indian: 4

These weighted sums are the inputs to each activation function below.
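As a check on the arithmetic above, all three weighted sums can be reproduced with a single NumPy matrix-vector product. This is a minimal sketch; the names `weights`, `ratings`, and `weighted_sums` are illustrative and not part of the original notebook.

```python
import numpy as np

# Influence weights, in the order Mom, Dad, Son, Daughter
weights = np.array([0.5, 0.2, 0.3, 0.6])

# One row per cuisine, one column per family member (same order as weights)
ratings = np.array([
    [-30, 100, 40, -40],   # Thai
    [-10, -25, -60, 50],   # Sushi
    [-50, -50, 50, 40],    # Indian
])

# Each row of ratings is multiplied element-wise by weights and summed
weighted_sums = ratings @ weights
print(weighted_sums)
```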

In [None]:
import numpy as np
my_data = np.array([-7, 2, 4])  #Thai, Sushi, Indian, respectively. 

In [None]:
# Linear activation: the input multiplied by a scaling constant.
# The gradient (derivative) is the same constant for every input, so the update applied to weights and biases during training does not depend on the input.
# As a result, the network does not improve much during training, especially for complex problems (a stack of linear layers is still just one linear function).

def linear(x):
    return 2*x  #Scaling constant = 2

linear(my_data)

array([-14,   4,   8])
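To make the comment about the constant gradient concrete, here is a sketch of the derivative of `linear`, plus a check that applying two linear layers in a row is still a linear function (here 4x). `linear_grad` is a hypothetical helper added for illustration.

```python
import numpy as np

def linear(x):
    return 2 * x  # scaling constant = 2

def linear_grad(x):
    # d/dx (2x) = 2 for every input, regardless of the value of x
    return np.full_like(x, 2.0, dtype=float)

x = np.array([-7.0, 2.0, 4.0])
grads = linear_grad(x)
print(grads)

# Two stacked linear "layers" collapse into one linear function: 2 * (2x) = 4x
stacked = linear(linear(x))
print(np.array_equal(stacked, 4 * x))  # True
```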

In [None]:
# Step function: 0 for inputs below 0 and 1 for inputs of 0 or above.
# Useful for binary classification, but useless for multiclass classification.
# Its gradient is zero everywhere (and undefined at 0), so do not use it in hidden layers: the weights would never be updated.
# Use it only for the classification layer.

def step(x):
    if x < 0:
        return 0
    else:
        return 1

print(step(-7))
print(step(2))
print(step(4))

0
1
1
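The `step` function above handles one number at a time, so it cannot be applied directly to the `my_data` array (`if x < 0` on a multi-element array raises a `ValueError`). A vectorized variant using `np.where` is sketched below; the name `step_vectorized` is hypothetical.

```python
import numpy as np

def step_vectorized(x):
    # 0 where the input is negative, 1 where it is zero or positive
    return np.where(np.asarray(x) < 0, 0, 1)

my_data = np.array([-7, 2, 4])  # Thai, Sushi, Indian
result = step_vectorized(my_data)
print(result)
```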


In [None]:
# Sigmoid function: transforms input values to values between 0 and 1.
# It is continuously differentiable, but the derivative tapers off (gets small) quickly, beyond roughly -4 and 4.
# This means the network effectively stops learning once inputs reach these ranges.
# The output values do not add up to 1.

def sigmoid(X):
    return 1/(1+np.exp(-X))

sigm_result = sigmoid(my_data)

print(sigm_result)

sigm_result_binarized = sigm_result>0.5
print(sigm_result_binarized)

[9.11051194e-04 8.80797078e-01 9.82013790e-01]
[False  True  True]
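The claim that the sigmoid's derivative tapers off can be checked directly. The derivative is sigmoid(x) * (1 - sigmoid(x)), which peaks at 0.25 for x = 0 and is already below 0.02 at x = 4 and x = -4. `sigmoid_grad` is an illustrative helper, not part of the original notebook.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: s(x) * (1 - s(x)), maximal (0.25) at x = 0
    s = sigmoid(x)
    return s * (1 - s)

for x in [0.0, 2.0, 4.0, 7.0]:
    print(x, sigmoid_grad(x))
```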


In [None]:
# tanh(x) = 2*sigmoid(2x) - 1
# Similar to sigmoid, except that it is symmetric around the origin and ranges from -1 to 1.
# tanh is usually preferred over sigmoid because its output is zero-centered.

def tanh(x):
    return (2/(1 + np.exp(-2*x))) -1

tanh_result=tanh(my_data)

print(tanh_result)

[-0.99999834  0.96402758  0.9993293 ]
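The identity in the comment above can be verified against NumPy's built-in `np.tanh`, and the zero-centering shows up at x = 0. A small sketch; `tanh_via_sigmoid` is an illustrative name.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh_via_sigmoid(x):
    # Identity: tanh(x) = 2*sigmoid(2x) - 1
    return 2 * sigmoid(2 * x) - 1

my_data = np.array([-7, 2, 4])
print(np.allclose(tanh_via_sigmoid(my_data), np.tanh(my_data)))  # True

# Zero-centered: tanh(0) is 0, whereas sigmoid(0) is 0.5
print(tanh_via_sigmoid(0.0), sigmoid(0.0))
```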


In [None]:
# ReLU: Rectified Linear Unit
# A neuron is deactivated (outputs 0) only when its input is less than 0.
# The output is 0 for negative inputs and linear (equal to the input) for inputs above 0.
# The gradient on the negative side is 0, which may lead to dead neurons that never get activated.
# Leaky ReLU can be used if this is a problem.
def relu(X):
    return np.maximum(0,X)

relu(my_data)

array([0, 2, 4])
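The dead-neuron problem comes from the ReLU gradient, sketched below: it is 0 for every negative input, so no update flows back through that neuron. The derivative at exactly 0 is undefined; setting it to 0 here is an assumed convention, and `relu_grad` is a hypothetical helper.

```python
import numpy as np

def relu_grad(x):
    # 1 for positive inputs, 0 otherwise (0 at x == 0 is a chosen convention)
    return (np.asarray(x) > 0).astype(float)

my_data = np.array([-7, 2, 4])  # Thai, Sushi, Indian
grads = relu_grad(my_data)
print(grads)
```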

In [None]:
# Leaky ReLU
# Similar to ReLU, except that for inputs below zero the output is a small linear component (0.1*x) instead of 0.
def leaky_relu(x):
    if x<0:
        return 0.1*x
    else:
        return x
print(leaky_relu(-7))
print(leaky_relu(2))
print(leaky_relu(4))

-0.7000000000000001
2
4
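Like `step`, the `leaky_relu` function above only accepts scalars. A vectorized sketch with `np.where` works on the whole `my_data` array at once; the `alpha` parameter name is an assumption, with its default 0.1 matching the slope used above.

```python
import numpy as np

def leaky_relu_vectorized(x, alpha=0.1):
    # alpha is the slope applied to negative inputs
    x = np.asarray(x, dtype=float)
    return np.where(x < 0, alpha * x, x)

my_data = np.array([-7, 2, 4])  # Thai, Sushi, Indian
result = leaky_relu_vectorized(my_data)
print(result)
```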


In [None]:
# Softmax can be seen as a generalization of the sigmoid to multiple classes.
# Softmax is used for multiclass classification problems.
# The output is the probability of a data point belonging to each individual class.
# All probabilities add up to 1.

def softmax(X):
    expo = np.exp(X)
    expo_sum = np.sum(expo)  # reuse the exponentials instead of recomputing them
    return expo/expo_sum

# Example with my_data defined above
a, b, c = softmax(my_data)
print(softmax(my_data))
print("The sum of all values = ", a+b+c)

[1.47105928e-05 1.19201168e-01 8.80784121e-01]
The sum of all values =  1.0
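The softmax above calls `np.exp` on the raw inputs, which overflows for large values (for example, `np.exp(1000.0)` is `inf`). A common remedy, sketched below, is to subtract the maximum input first: the shift cancels in the ratio, so the probabilities are unchanged. The sketch also checks that with two classes, softmax of `[z, 0]` gives sigmoid(z) for the first class, which is the sense in which softmax generalizes the sigmoid. `softmax_stable` is an illustrative name.

```python
import numpy as np

def softmax_stable(x):
    # Shifting by the max cancels in the ratio but keeps np.exp from overflowing
    x = np.asarray(x, dtype=float)
    expo = np.exp(x - np.max(x))
    return expo / np.sum(expo)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

my_data = np.array([-7, 2, 4])
probs = softmax_stable(my_data)
print(probs, probs.sum())

# Large inputs that would overflow a naive np.exp still give valid probabilities
large = softmax_stable([1000.0, 1001.0, 1002.0])
print(large)

# Two classes: softmax([z, 0])[0] == sigmoid(z)
z = 2.0
print(np.isclose(softmax_stable([z, 0.0])[0], sigmoid(z)))  # True
```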


**Suggestions:**


*   Sigmoid works well in the output layer of binary classifiers.
*   Sigmoid (and tanh) are usually avoided in hidden layers because of the vanishing gradient problem.
*   ReLU is often used in hidden layers, especially in CNNs.
*   ReLU is generally used only in hidden layers, not in the output layer.
*   Use softmax for multiclass classification.
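To show how these suggestions fit together, here is a sketch of one forward pass through a hypothetical tiny network: ReLU in the hidden layer and softmax in the output layer for three classes. The layer sizes, the input vector, and the random weights are placeholders chosen for illustration, not trained values.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)

def softmax(x):
    expo = np.exp(x - np.max(x))  # shift by the max for numerical stability
    return expo / np.sum(expo)

# Hypothetical network: 4 inputs -> 5 hidden units (ReLU) -> 3 classes (softmax)
W1 = rng.normal(size=(5, 4))
b1 = np.zeros(5)
W2 = rng.normal(size=(3, 5))
b2 = np.zeros(3)

x = np.array([0.5, -0.2, 0.1, 0.9])  # arbitrary example input

hidden = relu(W1 @ x + b1)          # hidden layer: ReLU
probs = softmax(W2 @ hidden + b2)   # output layer: class probabilities
print(probs, probs.sum())
```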



