# **Binary Step Function**
$$f(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

In [None]:
def binary_step(x):
  if x<0:
    return 0
  else:
    return 1
binary_step(-1)

0

# **Linear Function**
**f(x)=ax**, where a is a constant slope (a=4 in the example below).

In [None]:
def linear_function(x):
  return 4*x
linear_function(4), linear_function(-2)

(16, -8)

# **Sigmoid**
The next activation function that we are going to look at is the sigmoid function.
It is one of the most widely used non-linear activation functions.
Sigmoid maps any real input to a value between 0 and 1.
Here is the mathematical expression for sigmoid:

$$f(x) = \frac{1}{1 + e^{-x}}$$

In [None]:
import numpy as np
def sigmoid_function(x):
  z = (1/(1 + np.exp(-x)))
  return z
sigmoid_function(7),sigmoid_function(-22)

(0.9990889488055994, 2.7894680920908113e-10)
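The scalar version above calls `np.exp(-x)` directly, which overflows (with a warning) for very large negative inputs. As a rough sketch, assuming NumPy and array inputs, one common way around this is to exponentiate only non-positive numbers. The name `stable_sigmoid` is made up for this illustration.

```python
import numpy as np

def stable_sigmoid(x):
    """Sigmoid that never exponentiates a positive number (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    e = np.exp(-np.abs(x))  # always in [0, 1], so exp() cannot overflow
    # x >= 0: 1 / (1 + e^-x);  x < 0: e^x / (1 + e^x) -- the same function, rewritten
    return np.where(x >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

values = stable_sigmoid([-1000.0, -22.0, 0.0, 7.0, 1000.0])
print(values)  # every entry lies between 0 and 1
```

Both branches are algebraically the same sigmoid; the split only changes which form is evaluated, so the results should agree with `sigmoid_function` wherever that function does not overflow.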

# **Tanh**
The tanh function is very similar to the sigmoid function. The main difference is that it is
symmetric around the origin, so its outputs range from -1 to 1 instead of 0 to 1. As a result, the
inputs to the next layer will not always have the same sign. The tanh function can be written in
terms of the sigmoid as:

$$\tanh(x) = 2\,\mathrm{sigmoid}(2x) - 1 = \frac{2}{1 + e^{-2x}} - 1$$

In [None]:
def tanh_function(x):
  z = (2/(1 + np.exp(-2*x))) -1
  return z
tanh_function(0.5), tanh_function(-1)

(0.4621171572600098, -0.7615941559557649)
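As a quick sanity check of the identity tanh(x) = 2sigmoid(2x) - 1 and of the symmetry around the origin, the sketch below compares the sigmoid-based formula with NumPy's built-in `np.tanh` on a few points. The helper names `sigmoid` and `tanh_from_sigmoid` are chosen here for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh_from_sigmoid(x):
    # tanh(x) = 2 * sigmoid(2x) - 1
    return 2.0 * sigmoid(2.0 * x) - 1.0

xs = np.linspace(-5.0, 5.0, 11)

# Largest gap between the identity and NumPy's tanh (floating-point rounding only)
max_diff = np.max(np.abs(tanh_from_sigmoid(xs) - np.tanh(xs)))
print(max_diff)

# Symmetry around the origin: tanh(-x) == -tanh(x)
is_odd = np.allclose(tanh_from_sigmoid(-xs), -tanh_from_sigmoid(xs))
print(is_odd)
```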

# **ReLU - Rectified Linear Unit**
The ReLU function is another non-linear activation function.   
f(x)=max(0,x)   
Its main advantage is that it does not activate all the neurons at the same time.   
A neuron is deactivated only if the output of its linear transformation is less than 0:
for negative input values the result is zero, which means the neuron does not get activated.   
Since only some of the neurons are active for a given input, and max(0,x) needs no exponential, the ReLU function is far more computationally efficient than the sigmoid and tanh functions.

In [None]:
def relu_function(x):
  if x<0:
    return 0
  else:
    return x
relu_function(7), relu_function(-7)

(7, 0)
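To make the "not all neurons are activated" point concrete, here is a minimal sketch that applies the same rule to a whole vector with `np.maximum` and measures how many outputs are zero. The name `relu_array` and the sample pre-activation values are invented for this example.

```python
import numpy as np

def relu_array(x):
    # Element-wise max(0, x)
    return np.maximum(0, x)

# Made-up outputs of a linear transformation for five neurons
pre_activations = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])

activations = relu_array(pre_activations)
print(activations)  # negative entries are replaced by 0

# Fraction of neurons whose output is exactly zero
inactive_fraction = np.mean(activations == 0)
print(inactive_fraction)
```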

# **Leaky ReLU**
Leaky ReLU is an improved version of the ReLU function.
As we saw, the gradient of ReLU is 0 for x<0, so neurons in that region stop receiving weight updates and stay deactivated.
Leaky ReLU is defined to address this problem.
Instead of setting the output to 0 for negative values of x, it uses a linear component of x with a
very small slope.

Here is the mathematical expression:

$$f(x) = \begin{cases} x, & x \ge 0 \\ 0.01x, & x < 0 \end{cases}$$

In [None]:
def leaky_relu_function(x):
  if x<0:
    return 0.01*x
  else:
    return x
leaky_relu_function(7), leaky_relu_function(-7)

(7, -0.07)
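The difference between the two functions is easiest to see in their gradients. The sketch below is only an illustration: the function names are hypothetical, and the derivative at exactly x = 0 (where neither function is differentiable) is set to 1 by convention, matching the `x>=0` branch used in the code above.

```python
import numpy as np

def relu_grad(x):
    # Slope of max(0, x): 1 on the x >= 0 branch, 0 on the x < 0 branch
    return np.where(x >= 0, 1.0, 0.0)

def leaky_relu_grad(x, slope=0.01):
    # Same as ReLU for x >= 0, but a small non-zero slope for x < 0
    return np.where(x >= 0, 1.0, slope)

xs = np.array([-7.0, -1.0, 2.0, 7.0])
print(relu_grad(xs))        # zero gradient for the negative inputs
print(leaky_relu_grad(xs))  # small but non-zero gradient for the negative inputs
```

Because the Leaky ReLU gradient never vanishes completely, a neuron with a negative pre-activation can still receive a (small) update.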

# **Parameterised ReLU**
The parameterised ReLU introduces a new parameter as the slope of the negative part of the
function.

Here's how the ReLU function is modified to incorporate the slope parameter:

$$f(x) = \begin{cases} x, & x \ge 0 \\ ax, & x < 0 \end{cases}$$

When the value of a is fixed to 0.01, the function acts as a Leaky ReLU function.
In a parameterised ReLU, however, a is a trainable parameter:
the network learns its value along with the weights, which can help it converge faster.

In [None]:
def parameterised_relu_function(x, a=0.01):
    if x < 0:
        return a * x
    else:
        return x

print(parameterised_relu_function(7))
print(parameterised_relu_function(-7))

print(parameterised_relu_function(-7, a=0.1))


7
-0.07
-0.7000000000000001
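To illustrate what "trainable" means for `a`, here is a toy sketch, not a real training loop: it fits `a` by plain gradient descent on a mean squared error, using the fact that the derivative of f with respect to a is x for x < 0 and 0 otherwise. The data, the target slope `a_true`, the learning rate, and the helper names are all made up for this example.

```python
import numpy as np

def prelu(x, a):
    return np.where(x >= 0, x, a * x)

def prelu_grad_a(x):
    # d f / d a = x for x < 0, and 0 for x >= 0
    return np.where(x >= 0, 0.0, x)

xs = np.array([-3.0, -2.0, -1.0, 0.5, 2.0])  # toy inputs
a_true = 0.25                                 # slope used to generate the targets (invented)
targets = prelu(xs, a_true)

a = 0.01              # start from the Leaky ReLU slope
learning_rate = 0.1
for _ in range(200):
    error = prelu(xs, a) - targets
    grad = np.mean(2.0 * error * prelu_grad_a(xs))  # d(MSE) / d a
    a -= learning_rate * grad

print(a)  # ends up very close to a_true
```

Only the negative inputs contribute to the gradient, so `a` is learned purely from the region where the slope matters.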


# **Exponential Linear Unit - ELU**
Exponential Linear Unit (ELU) is a variant of the Rectified Linear Unit (ReLU) that modifies the slope of the
negative part of the function.
Unlike the Leaky ReLU and parameterised ReLU functions, which use a straight line, ELU uses an
exponential curve to define the negative values.
It is defined as

$$f(x) = \begin{cases} x, & x \ge 0 \\ a(e^{x} - 1), & x < 0 \end{cases}$$

In [None]:
def elu_function(x, a):
  if x<0:
    return a*(np.exp(x)-1)
  else:
    return x
elu_function(5, 0.1),elu_function(-5, 0.1)

(5, -0.09932620530009145)
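One consequence of the exponential branch is that ELU levels off at -a for large negative inputs instead of decreasing without bound. The following array-based sketch shows this. The name `elu_array` and the default `a=1.0` are assumptions made for the example; the function above has no default.

```python
import numpy as np

def elu_array(x, a=1.0):
    x = np.asarray(x, dtype=float)
    # np.minimum keeps exp() away from large positive values, which would overflow
    negative_branch = a * (np.exp(np.minimum(x, 0.0)) - 1.0)
    return np.where(x >= 0, x, negative_branch)

out = elu_array([-50.0, -5.0, 0.0, 5.0], a=0.1)
print(out)  # the most negative input gives a value very close to -a = -0.1
```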

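# **Swish**
Swish multiplies the input by its sigmoid:

$$f(x) = x \cdot \mathrm{sigmoid}(x) = \frac{x}{1 + e^{-x}}$$

In [None]:
def swish_function(x):
  # x * sigmoid(x)
  return x/(1+np.exp(-x))
swish_function(-67), round(swish_function(4), 6)

(-5.349885844610276e-28, 3.928055)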

# **Softmax**
The softmax function is often described as a combination of multiple sigmoids.
We know that sigmoid returns values between 0 and 1, which can be treated as the probability of a
data point belonging to a particular class.
Thus sigmoid is widely used for binary classification problems.
The softmax function can be used for multiclass classification problems.
It returns the probability of a data point belonging to each individual class.

Here is the mathematical expression:

$$\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j} e^{x_j}}$$

While building a network for a multiclass problem, the output layer has as many neurons
as there are classes in the target.
For instance, if you have three classes, there will be three neurons in the output layer.
Suppose the outputs of these neurons are [1.2, 0.9, 0.75].
Applying the softmax function to these values gives [0.42, 0.31, 0.27].
These represent the probability of the data point belonging to each class.
Note that the values sum to 1.

In [None]:
def softmax_function(x):
  z = np.exp(x)
  z_ = z/z.sum()
  return z_
softmax_function([0.8, 1.2, 3.1])

array([0.08021815, 0.11967141, 0.80011044])
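The version above exponentiates the raw scores, which can overflow when the scores are large. A common remedy, sketched here under the assumption of a 1-D input, is to subtract the maximum score first; this leaves the result unchanged because the common factor cancels in the ratio. The name `stable_softmax` is made up for this illustration.

```python
import numpy as np

def stable_softmax(x):
    x = np.asarray(x, dtype=float)
    shifted = x - np.max(x)  # largest exponent becomes 0, so exp() cannot overflow
    z = np.exp(shifted)
    return z / z.sum()

# The three-class example from the text
probs = stable_softmax([1.2, 0.9, 0.75])
print(np.round(probs, 2))
print(probs.sum())

# Large scores that would overflow a plain np.exp(x)
big = stable_softmax([1000.0, 1001.0, 1002.0])
print(big)
```

The first call reproduces the [0.42, 0.31, 0.27] example, and the last call stays finite even though `np.exp(1000.0)` on its own would overflow.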