<a href="https://colab.research.google.com/github/Anjasfedo/Learning-TensorFlow/blob/main/eat_tensorflow2_in_30_days/Chapter5_3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 5-3 activation

Activation functions play a key role in deep learning. They introduce the non-linearity that enables a neural network to fit arbitrarily complicated functions.

Without activation functions, a neural network is still just a linear transformation, no matter how complicated its structure, and cannot fit non-linear functions.
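
To see why, here is a minimal NumPy sketch. The layer sizes and random weights are illustrative assumptions, not taken from the original. Two stacked dense layers with no activation in between fold into a single affine layer, while inserting a ReLU between them breaks that equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes (assumed): 16 inputs -> 32 hidden units -> 10 outputs.
x = rng.normal(size=(4, 16))
W1, b1 = rng.normal(size=(16, 32)), rng.normal(size=32)
W2, b2 = rng.normal(size=(32, 10)), rng.normal(size=10)

# Two dense layers with no activation in between.
two_linear_layers = (x @ W1 + b1) @ W2 + b2

# The same computation folded into one affine layer.
W, b = W1 @ W2, b1 @ W2 + b2
single_layer = x @ W + b
collapses = np.allclose(two_linear_layers, single_layer)

# With a ReLU in between, the folded layer no longer reproduces the output.
with_relu = np.maximum(x @ W1 + b1, 0.0) @ W2 + b2
still_linear = np.allclose(with_relu, single_layer)

print("stack without activation equals one layer:", collapses)
print("stack with ReLU equals one layer:", still_linear)
```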

At present, the most popular activation is `relu`, but there are some newer functions, such as `swish` and `GELU`, that claim better performance than `relu`.

## 1. The most popular activation functions

- `tf.nn.sigmoid`: Compresses real numbers into the range (0, 1); usually used in the output layer for binary classification. Its main drawbacks are vanishing gradients, high computational cost, and output that is not zero-centered.

- `tf.nn.softmax`: An extension of sigmoid to multiple categories; usually used in the output layer for multi-class classification.

- `tf.nn.tanh`: Compresses real numbers into the range (-1, 1), so the output is zero-centered. Its main drawbacks are vanishing gradients and high computational cost.

- `tf.nn.relu`: Rectified linear unit, the most popular activation function; usually used in hidden layers. Its main drawbacks are output that is not zero-centered and a zero gradient for inputs < 0 (the dying ReLU problem).

- `tf.nn.leaky_relu`: An improved ReLU that resolves the dying ReLU problem by keeping a small non-zero slope for inputs < 0.

- `tf.nn.elu`: Exponential linear unit, an improvement on ReLU that alleviates the dying ReLU problem.

- `tf.nn.selu`: Scaled exponential linear unit, which can normalize the neural network automatically if the weights are initialized with `tf.keras.initializers.lecun_normal`. It avoids exploding and vanishing gradients, but should be used together with AlphaDropout (a variant of Dropout).

- `tf.nn.swish`: Self-gated activation function, a research product from Google. The literature reports a slight improvement over ReLU.

- `gelu`: Gaussian error linear unit, reported to perform best in Transformer models. Older TensorFlow releases did not implement it in `tf.nn`; it is available as `tf.nn.gelu` from TensorFlow 2.4 onward.
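
The formulas behind the functions listed above can be sketched in plain NumPy. This is an illustrative reimplementation for building intuition, not TensorFlow's own code. The `alpha=0.2` default for leaky ReLU mirrors `tf.nn.leaky_relu`, the SELU constants are the commonly published values, and the GELU shown is the exact erf-based form rather than the tanh approximation.

```python
import math
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    # Subtracting the max improves numerical stability without changing the result.
    z = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return z / np.sum(z, axis=axis, keepdims=True)

def relu(x):
    return np.maximum(x, 0.0)

def leaky_relu(x, alpha=0.2):
    # A small slope for x < 0 keeps the gradient from becoming exactly zero.
    return np.where(x > 0, x, alpha * x)

def elu(x):
    # expm1 is only evaluated on the non-positive part to avoid overflow.
    return np.where(x > 0, x, np.expm1(np.minimum(x, 0.0)))

def selu(x, scale=1.05070098, alpha=1.67326324):
    return scale * np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))

def swish(x):
    return x * sigmoid(x)

def gelu(x):
    # Exact form: x * Phi(x), where Phi is the standard normal CDF.
    erf = np.vectorize(math.erf)
    return 0.5 * x * (1.0 + erf(x / math.sqrt(2.0)))

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
for fn in (sigmoid, np.tanh, relu, leaky_relu, elu, selu, swish, gelu):
    print(f"{fn.__name__:>10}: {np.round(fn(x), 3)}")
print(f"{'softmax':>10}: {np.round(softmax(x), 3)}")
```

Printing the values side by side shows the properties described in the list: sigmoid and tanh saturate at both ends, ReLU is exactly zero for negative inputs, and the leaky, ELU, SELU, swish and GELU variants keep some signal there.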

## 2. Implementing activation functions in the models

There are two ways of implementing activation functions in Keras models:
- specifying it through the `activation` parameter of certain layers.
- adding a `layers.Activation` layer explicitly.

In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras import layers, models

In [2]:
tf.keras.backend.clear_session()

model = models.Sequential()
model.add(layers.Dense(32, activation=tf.nn.relu, input_shape=(None, 16))) # specifying through the activation parameter

model.add(layers.Dense(10))
model.add(layers.Activation(tf.nn.softmax)) # adding activation layer explicitly

model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, None, 32)          544       
                                                                 
 dense_1 (Dense)             (None, None, 10)          330       
                                                                 
 activation (Activation)     (None, None, 10)          0         
                                                                 
Total params: 874 (3.41 KB)
Trainable params: 874 (3.41 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
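
Keras also accepts built-in activations by string name in both places. The sketch below assumes a flat 16-feature input and reuses the layer sizes of the model above. Declaring the input with `tf.keras.Input` and feeding a random batch are illustrative choices, not part of the original notebook.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    tf.keras.Input(shape=(16,)),          # assumed: 16 features per sample
    layers.Dense(32, activation="relu"),  # string name instead of tf.nn.relu
    layers.Dense(10),
    layers.Activation("softmax"),         # explicit activation layer, by name
])

# Run a small random batch through the model and inspect the output.
batch = np.random.rand(4, 16).astype("float32")
probs = model(batch).numpy()

print(probs.shape)        # one row of 10 class scores per sample
print(probs.sum(axis=1))  # softmax rows should each sum to roughly 1
```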
