Linear and ELU Activation Function Added (#31)
Linear and ELU activation functions added; Tanh updated
prashplus authored and bfortuner committed Jul 26, 2018
1 parent 98a1a01 commit 003498d
Showing 6 changed files with 81 additions and 3 deletions.
12 changes: 12 additions & 0 deletions code/activation_functions.py
@@ -8,6 +8,12 @@

### Functions ###

def linear(z, m):
    return m * z

def elu(z, alpha):
    # np.exp assumes numpy is imported as np near the top of this file
    return z if z >= 0 else alpha * (np.exp(z) - 1)

def leakyrelu(z, alpha):
return max(alpha * z, z)

@@ -26,6 +32,12 @@ def tanh(z):

### Derivatives ###

def linear_prime(z, m):
    return m

def elu_prime(z, alpha):
    return 1 if z > 0 else alpha * np.exp(z)

def leakyrelu_prime(z, alpha):
return 1 if z > 0 else alpha

72 changes: 69 additions & 3 deletions docs/activation_functions.rst
@@ -6,12 +6,78 @@ Activation Functions

.. contents:: :local:

.. _activation_linear:

Linear
======

A straight-line function where the activation is proportional to the input (the weighted sum from the neuron).

+-------------------------------------------------------+------------------------------------------------------+
| Function | Derivative |
+-------------------------------------------------------+------------------------------------------------------+
| .. math:: | .. math:: |
| R(z,m) = \begin{Bmatrix} z*m \\ | R'(z,m) = \begin{Bmatrix} m \\ |
| \end{Bmatrix} | \end{Bmatrix} |
+-------------------------------------------------------+------------------------------------------------------+
| .. image:: images/linear.png | .. image:: images/linear_prime.png |
| :align: center | :align: center |
| :width: 256 px | :width: 256 px |
| :height: 256 px | :height: 256 px |
+-------------------------------------------------------+------------------------------------------------------+
| .. literalinclude:: ../code/activation_functions.py | .. literalinclude:: ../code/activation_functions.py |
| :pyobject: linear | :pyobject: linear_prime |
+-------------------------------------------------------+------------------------------------------------------+


.. rubric:: Pros

- It gives a range of activations, so it is not a binary activation.
- We can connect several neurons together, and if more than one fires, we can take the max (or softmax) and decide based on that.

.. rubric:: Cons

- The derivative is a constant, which means the gradient has no relationship with the input.
- Descent therefore proceeds along a constant gradient.
- If there is an error in the prediction, the corrections made by backpropagation are constant and do not depend on the change in input delta(x) (see the sketch below).
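
To make the constant-gradient point concrete, here is a minimal sketch (NumPy assumed; the slope ``m = 0.1`` and the sample inputs are arbitrary illustrative choices, and the two functions are repeated from the code listing above so the snippet is self-contained):

.. code-block:: python

    import numpy as np

    def linear(z, m):
        return m * z

    def linear_prime(z, m):
        return m

    z = np.array([-2.0, -0.5, 0.0, 3.0])
    m = 0.1  # arbitrary slope for illustration

    print(linear(z, m))        # elementwise m * z: [-0.2, -0.05, 0.0, 0.3]
    print(linear_prime(z, m))  # 0.1 -- the gradient is the same for every z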



.. _activation_elu:

ELU
===

The Exponential Linear Unit, widely known as ELU, is a function that tends to converge the cost to zero faster and produce more accurate results. Unlike other activation functions, ELU has an extra alpha constant, which should be a positive number.

ELU is very similar to ReLU except for negative inputs. Both act as the identity function for non-negative inputs. For negative inputs, however, ELU smoothly saturates to -α, whereas ReLU has a sharp kink at zero (see the sketch at the end of this section).

+-------------------------------------------------------+------------------------------------------------------+
| Function | Derivative |
+-------------------------------------------------------+------------------------------------------------------+
| .. math:: | .. math:: |
| R(z) = \begin{Bmatrix} z & z > 0 \\ | R'(z) = \begin{Bmatrix} 1 & z>0 \\ |
| α.( e^z – 1) & z <= 0 \end{Bmatrix} | α.e^z & z<0 \end{Bmatrix} |
+-------------------------------------------------------+------------------------------------------------------+
| .. image:: images/elu.png | .. image:: images/elu_prime.png |
| :align: center | :align: center |
| :width: 256 px | :width: 256 px |
| :height: 256 px | :height: 256 px |
+-------------------------------------------------------+------------------------------------------------------+
| .. literalinclude:: ../code/activation_functions.py | .. literalinclude:: ../code/activation_functions.py |
| :pyobject: elu | :pyobject: elu_prime |
+-------------------------------------------------------+------------------------------------------------------+


.. rubric:: Pros

- ELU smoothly saturates to -α for negative inputs, whereas ReLU has a sharp kink at zero.
- ELU is a strong alternative to ReLU.
- Unlike ReLU, ELU can produce negative outputs.

.. rubric:: Cons

- For z > 0, the activation is unbounded, with an output range of [0, inf).
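
To see the contrast with ReLU numerically, here is a minimal sketch (NumPy assumed; ``relu`` is redefined locally so the snippet is self-contained, and α = 1 is an arbitrary illustrative choice). It shows ELU saturating toward -α for strongly negative inputs while keeping a smooth, non-zero response near zero:

.. code-block:: python

    import numpy as np

    def relu(z):
        return max(0, z)

    def elu(z, alpha):
        return z if z >= 0 else alpha * (np.exp(z) - 1)

    alpha = 1.0  # illustrative choice; any positive constant works
    for z in [-5.0, -1.0, -0.1, 0.0, 2.0]:
        print(z, relu(z), round(elu(z, alpha), 4))

    # expected (rounded) values:
    # -5.0  0    -0.9933   <- already close to -alpha
    # -1.0  0    -0.6321
    # -0.1  0    -0.0952   <- smooth, non-zero output for z < 0
    #  0.0  0.0   0.0
    #  2.0  2.0   2.0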


.. _activation_relu:
@@ -86,11 +152,11 @@ LeakyRelu is a variant of ReLU. Instead of being 0 when :math:`z < 0`, a leaky R
.. rubric:: Pros

- Leaky ReLUs are one attempt to fix the "dying ReLU" problem by having a small negative slope (of 0.01, or so); see the sketch after the cons below.

.. rubric:: Cons

- Because it retains linearity (a fixed slope) for negative inputs, it can't be used for complex classification tasks, and it lags behind Sigmoid and Tanh for some use cases.
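
A minimal sketch of the point above (the two functions are repeated from the code listing; ``alpha = 0.01`` follows the small slope mentioned in the pros): for negative inputs the gradient stays at ``alpha`` instead of dying to zero as it does for ReLU.

.. code-block:: python

    def leakyrelu(z, alpha):
        return max(alpha * z, z)

    def leakyrelu_prime(z, alpha):
        return 1 if z > 0 else alpha

    alpha = 0.01  # the small negative slope mentioned above
    for z in [-3.0, -0.5, 0.5, 3.0]:
        print(z, round(leakyrelu(z, alpha), 4), leakyrelu_prime(z, alpha))

    # expected (rounded) values:
    # -3.0  -0.03   0.01   <- gradient never reaches exactly zero
    # -0.5  -0.005  0.01
    #  0.5   0.5    1
    #  3.0   3.0    1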

.. rubric:: Further reading

Binary file added docs/images/elu.png
Binary file added docs/images/elu_prime.png
Binary file added docs/images/linear.png
Binary file added docs/images/linear_prime.png
