
In [None]:
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import pandas as pd

In [None]:
(X_train_full, y_train_full), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
X_train_full = X_train_full / 255.0
X_test = X_test / 255.0
X_valid, X_train = X_train_full[:5000], X_train_full[5000:]
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]

### What are activation functions?
An activation function determines the output of a neuron in a neural network. It is attached to each neuron in the network and decides whether, and how strongly, that neuron is activated, based on whether the neuron's input is relevant for the model's prediction.

### Activation Functions

#### Sigmoid Activation Function
  $$\sigma(x) = \frac{1}{1+e^{-x}}$$
  $$where\ \sigma(x) \in (0, 1),\\
  and\ x \in (-\infty, +\infty)$$

- It is one of the earliest activation functions to be widely used.
- Its output lies in the open interval (0, 1).

Disadvantages -
- Prone to vanishing gradients: in the saturation regions (large positive or large negative inputs) the gradient is close to zero.
- Output is not zero-centered: it only outputs positive values.
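
The saturation behaviour described above can be checked numerically. The following is a minimal sketch in plain Python (no TensorFlow needed); the helper names `sigmoid` and `sigmoid_grad` are illustrative and are not used elsewhere in this notebook.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# Print the output and the gradient at a few points, from far left to far right
for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"x={x:6.1f}  sigmoid={sigmoid(x):.5f}  gradient={sigmoid_grad(x):.5f}")
```

The gradient peaks at 0.25 for x = 0 and falls below 0.0001 at |x| = 10. Stacking several such layers multiplies these small factors together, which is one way to see why deep sigmoid networks can train slowly.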


In [None]:
def update_even_odd_labels(labels):
  for idx, label in enumerate(labels):
    labels[idx] = np.where(label % 2 == 0, 1, 0)
  return labels

In [None]:
y_train_bin, y_test_bin, y_valid_bin = update_even_odd_labels([y_train, y_test, y_valid])

In [None]:
# Implementation
model_3 = tf.keras.models.Sequential([tf.keras.layers.Flatten(input_shape = [28,28], name = 'input_layer'),
                                     tf.keras.layers.Dense(300,activation='relu',name= 'hidden_layer1'),
                                     tf.keras.layers.Dense(200, activation = 'relu', name = 'hidden_layer2'),
                                     tf.keras.layers.Dense(100, activation = 'relu', name = 'hidden_layer3'),
                                     tf.keras.layers.Dense(1, activation='sigmoid', name = 'output_layer')])
LOSS_FUNCTION = "binary_crossentropy" 
OPTIMIZER = "SGD" # or use with custom learning rate=> tf.keras.optimizers.SGD(0.02)
METRICS = ["accuracy"]

model_3.compile(loss=LOSS_FUNCTION,
              optimizer=OPTIMIZER, metrics = METRICS)

EPOCHS = 10
VALIDATION_SET = (X_valid, y_valid_bin)

history = model_3.fit(X_train, y_train_bin, epochs=EPOCHS,
                    validation_data=VALIDATION_SET,batch_size = 32)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [None]:
model_3.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_layer (Flatten)        (None, 784)               0         
_________________________________________________________________
hidden_layer1 (Dense)        (None, 300)               235500    
_________________________________________________________________
hidden_layer2 (Dense)        (None, 200)               60200     
_________________________________________________________________
hidden_layer3 (Dense)        (None, 100)               20100     
_________________________________________________________________
output_layer (Dense)         (None, 1)                 101       
Total params: 315,901
Trainable params: 315,901
Non-trainable params: 0
_________________________________________________________________


#### Hyperbolic tangent activation function

The tanh function is defined as follows

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

$$where\ \tanh(x) \in (-1, 1),\\
and\ x \in (-\infty, +\infty)$$

Advantage over Sigmoid -
- Its output is zero-centered, ranging between -1 and 1.

It is mostly used in the hidden layers of a binary classification model.

**Otherwise, tanh has the same defects as sigmoid: it also saturates for large positive or negative inputs, so it is prone to vanishing gradients.**
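
As a rough illustration of the zero-centering point, the sketch below compares the mean output of tanh and sigmoid over a symmetric set of inputs, and checks the identity tanh(x) = 2σ(2x) − 1 that links the two functions. It uses only the standard library; `sigmoid` and `tanh_grad` are illustrative helper names.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2

inputs = [-3.0, -1.0, 0.0, 1.0, 3.0]  # symmetric around zero

# Mean activation: about 0 for tanh, about 0.5 for sigmoid
tanh_mean = sum(math.tanh(x) for x in inputs) / len(inputs)
sigmoid_mean = sum(sigmoid(x) for x in inputs) / len(inputs)
print(f"mean tanh output:    {tanh_mean:.3f}")
print(f"mean sigmoid output: {sigmoid_mean:.3f}")

# tanh is a rescaled and shifted sigmoid, so it saturates in the same way
for x in inputs:
    assert abs(math.tanh(x) - (2.0 * sigmoid(2.0 * x) - 1.0)) < 1e-12
print(f"tanh gradient at x=5: {tanh_grad(5.0):.6f}")
```

Because tanh is just a rescaled sigmoid, its gradient also shrinks towards zero for large |x|, consistent with the remark above that the remaining defects carry over.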

In [None]:
# Implementation
model_1 = tf.keras.models.Sequential([tf.keras.layers.Flatten(input_shape = [28,28], name = 'input_layer'),
                                     tf.keras.layers.Dense(300,activation='tanh',name= 'hidden_layer1'),
                                     tf.keras.layers.Dense(200, activation = 'tanh', name = 'hidden_layer2'),
                                     tf.keras.layers.Dense(100, activation = 'tanh', name = 'hidden_layer3'),
                                     tf.keras.layers.Dense(1, activation='sigmoid', name = 'output_layer')])
LOSS_FUNCTION = "binary_crossentropy" 
OPTIMIZER = "SGD" # or use with custom learning rate=> tf.keras.optimizers.SGD(0.02)
METRICS = ["accuracy"]

model_1.compile(loss=LOSS_FUNCTION,
              optimizer=OPTIMIZER, metrics = METRICS)

EPOCHS = 10
VALIDATION_SET = (X_valid, y_valid_bin)

history = model_1.fit(X_train, y_train_bin, epochs=EPOCHS,
                    validation_data=VALIDATION_SET,batch_size = 32)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


#### ReLU Sigmoid Model VS tanh Sigmoid Model

In [None]:
print("Evaluation for ReLU Sigmoid Model is {}".format(model_3.evaluate(X_test, y_test_bin)))
print("Evaluation for tanh Sigmoid Model is {}".format(model_1.evaluate(X_test, y_test_bin)))

Evaluation for ReLU Sigmoid Model is [0.04918106645345688, 0.9846000075340271]
Evaluation for tanh Sigmoid Model is [0.05450804531574249, 0.9821000099182129]


#### Observation
After 10 epochs, the ReLU Sigmoid model did slightly better on the test set than the tanh Sigmoid model: about 98.46% vs. 98.21% accuracy, with a loss of about 0.049 vs. 0.055.

### ReLU (Rectified Linear Unit)

$$ReLU(x)= \max(x,0)$$

$$where\ ReLU(x) \in [0, +\infty),\\
and\ x \in (-\infty, +\infty)$$
- The ReLU (Rectified Linear Unit) function is currently a more popular activation function than the sigmoid and tanh functions.

Some features are:
- Rectified means that it eliminates all negative values (they are mapped to 0).
- Calculation is much faster in both directions (forward and backward pass).
- It is not a zero-centered function.
- It suffers from an issue known as **Dying ReLU**: for any negative input the gradient is exactly zero, so a neuron whose input stays negative stops updating and effectively "dies". This is similar to the vanishing-gradient problem of the sigmoid and tanh functions.
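
The Dying ReLU issue can be illustrated with a single hypothetical neuron. This is a minimal sketch: the weight `w`, bias `b` and the inputs are made-up values chosen so that every pre-activation is negative, and the gradient shown is only the local term for the weight, not a full backpropagation.

```python
def relu(x):
    """ReLU(x) = max(x, 0)."""
    return max(x, 0.0)

def relu_grad(x):
    """Gradient of ReLU: 1 for positive inputs, 0 otherwise (the value at 0 is a convention)."""
    return 1.0 if x > 0 else 0.0

# Hypothetical neuron: z = w * x + b, with a large negative bias
w, b = 0.5, -10.0
inputs = [0.1, 0.5, 1.0, 2.0]

pre_activations = [w * x + b for x in inputs]   # all negative
outputs = [relu(z) for z in pre_activations]    # all zero

# Local gradient of the output with respect to w is relu_grad(z) * x
weight_grads = [relu_grad(z) * x for z, x in zip(pre_activations, inputs)]
print(outputs)       # [0.0, 0.0, 0.0, 0.0]
print(weight_grads)  # [0.0, 0.0, 0.0, 0.0]
```

Since every gradient is zero, gradient descent never changes `w` or `b` for this neuron, so it stays inactive for these inputs.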

In [None]:
housing = fetch_california_housing()
X = pd.DataFrame(housing.data, columns=housing.feature_names)
y = pd.DataFrame(housing.target, columns=["target"])
X_train_full, X_test, y_train_full, y_test = train_test_split(X,y, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(X_train_full,y_train_full, random_state=42)

In [None]:
LAYERS = [
         tf.keras.layers.Dense(30, activation="relu", input_shape=X_train.shape[1:]),
         tf.keras.layers.Dense(10, activation="relu"),
         tf.keras.layers.Dense(5, activation="relu"),
         tf.keras.layers.Dense(1)
]

model_relu = tf.keras.models.Sequential(LAYERS)

LOSS = "mse"
OPTIMIZER = "sgd"

model_relu.compile(loss=LOSS , optimizer=OPTIMIZER)

scaler = StandardScaler()

X_train = scaler.fit_transform(X_train)
X_valid = scaler.transform(X_valid)
X_test = scaler.transform(X_test)

history = model_relu.fit(X_train, y_train, epochs=EPOCHS, validation_data=(X_valid, y_valid))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


### **Leaky ReLU function**

$$ 
leaky\_relu(x, \alpha) = \left\{\begin{matrix} 
x & x\geq 0 \\ 
\alpha x & x \lt 0 
\end{matrix}\right.
$$

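To address the **Dying ReLU** problem, Leaky ReLU replaces the zero output on the negative half of ReLU with a small slope, for example 0.01x instead of 0.

It has an alpha parameter that sets the slope of the negative part. Note that `tf.keras.layers.LeakyReLU()`, as used below without arguments, defaults to a negative slope of 0.3 rather than 0.01.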
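
The piecewise definition above translates directly into code. The sketch below is a plain-Python illustration with an assumed default of `alpha=0.01` (the value mentioned in the text); the function names are illustrative and are not the Keras implementation.

```python
def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: x for x >= 0, alpha * x for x < 0."""
    return x if x >= 0 else alpha * x

def leaky_relu_grad(x, alpha=0.01):
    """Gradient of Leaky ReLU: 1 for x >= 0, alpha for x < 0."""
    return 1.0 if x >= 0 else alpha

for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(f"x={x:5.1f}  leaky_relu={leaky_relu(x):7.3f}  gradient={leaky_relu_grad(x):.2f}")
```

Unlike plain ReLU, the gradient for negative inputs is `alpha` rather than 0, so a neuron with negative pre-activations still receives a small update.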

In [None]:
LAYERS = [
         tf.keras.layers.Dense(30, input_shape=X_train.shape[1:]),
         tf.keras.layers.LeakyReLU(),
         tf.keras.layers.Dense(10),
         tf.keras.layers.LeakyReLU(),

         tf.keras.layers.Dense(5),        
         tf.keras.layers.LeakyReLU(),

         tf.keras.layers.Dense(1)
]

model_lrelu = tf.keras.models.Sequential(LAYERS)

LOSS = "mse"
OPTIMIZER = "sgd"

model_lrelu.compile(loss=LOSS , optimizer=OPTIMIZER)

# X_train, X_valid and X_test were already standardized with StandardScaler
# in the ReLU cell above, so they are not scaled a second time here.

history = model_lrelu.fit(X_train, y_train, epochs=EPOCHS, validation_data=(X_valid, y_valid))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


#### Evaluating ReLU and Leaky ReLU 

In [None]:
print("Evaluation for ReLU  Model is {}".format(model_relu.evaluate(X_test, y_test)))
print("Evaluation for LeakyReLU Model is {}".format(model_lrelu.evaluate(X_test, y_test)))

Evaluation for ReLU  Model is 0.36190956830978394
Evaluation for LeakyReLU Model is 0.3510229289531708


#### Observation
There is very little difference between the test losses of the two models (MSE of about 0.362 for ReLU vs. 0.351 for Leaky ReLU), so on this dataset both perform about equally well.

### **Softmax activation function**

$$S(x_j)=\frac{e^{x_j}}{\sum_{k=1}^{K} e^{x_k}},\ where\ j = 1,2, \cdots, K $$
- It is widely used for multiclass classification in neural networks.
- It is mostly used in the final layer, where it turns the raw outputs into probabilities that sum to 1.
- If there are 4 inputs to the softmax layer, it first computes $e^{x_1}, e^{x_2}, e^{x_3}, e^{x_4}$ and then calculates the probability of each class using the equation above.
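
The equation above can be sketched in a few lines of plain Python for the 4-input case. The logits below are made-up values; subtracting the maximum before exponentiating is a common numerical-stability trick and does not change the result.

```python
import math

def softmax(logits):
    """Softmax over a list of raw scores; returns probabilities that sum to 1."""
    shift = max(logits)                           # avoids overflow in exp()
    exps = [math.exp(v - shift) for v in logits]  # e^(x_k - max)
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1, -1.0]  # hypothetical outputs of a 4-unit layer
probs = softmax(logits)
print([round(p, 3) for p in probs])
print("sum of probabilities:", round(sum(probs), 6))
```

The largest logit receives the largest probability, and the ordering of the inputs is preserved in the outputs.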

In [None]:
# Implementation
# X_train, X_valid and X_test were overwritten by the California housing data above,
# so reload the MNIST images (same scaling and split as at the top) before training.
(X_train_full, _), (X_test, _) = tf.keras.datasets.mnist.load_data()
X_train_full, X_test = X_train_full / 255.0, X_test / 255.0
X_valid, X_train = X_train_full[:5000], X_train_full[5000:]

model_soft = tf.keras.models.Sequential([tf.keras.layers.Flatten(input_shape = [28,28], name = 'input_layer'),
                                     tf.keras.layers.Dense(300,activation='relu',name= 'hidden_layer1'),
                                     tf.keras.layers.Dense(200, activation = 'relu', name = 'hidden_layer2'),
                                     tf.keras.layers.Dense(100, activation = 'relu', name = 'hidden_layer3'),
                                     tf.keras.layers.Dense(2, activation='softmax', name = 'output_layer')])
LOSS_FUNCTION = "sparse_categorical_crossentropy"
OPTIMIZER = "SGD" # or use with custom learning rate=> tf.keras.optimizers.SGD(0.02)
METRICS = ["accuracy"]

model_soft.compile(loss=LOSS_FUNCTION,
              optimizer=OPTIMIZER, metrics = METRICS)

EPOCHS = 10
VALIDATION_SET = (X_valid, y_valid_bin)

history = model_soft.fit(X_train, y_train_bin, epochs=EPOCHS,
                    validation_data=VALIDATION_SET,batch_size = 32)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


#### Evaluating both models Sigmoid vs Softmax

In [None]:
print("Evaluation for Sigmoid Model is {}".format(model_3.evaluate(X_test, y_test_bin)))

Evaluation for Sigmoid Model is [0.04523968696594238, 0.9832000136375427]


In [None]:
print("Evaluation for Softmax Model is {}".format(model_soft.evaluate(X_test, y_test_bin)))

Evaluation for Softmax Model is [0.04392661154270172, 0.9840999841690063]


#### Observation
Both models performed almost identically. The only differences are how the last layer is set up (one sigmoid neuron vs. two softmax neurons, each with its matching loss function) and the maths behind it.
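
One way to see why the two setups behave so similarly: for two classes, the softmax probability of the second class equals a sigmoid applied to the difference of the two logits, since $e^{b}/(e^{a}+e^{b}) = 1/(1+e^{-(b-a)})$. The sketch below checks this numerically in plain Python; the logit pairs are arbitrary illustrative values.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    """Softmax over a list of raw scores."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Arbitrary pairs of logits (a for class 0, b for class 1)
pairs = [(0.0, 0.0), (1.5, -0.5), (-2.0, 3.0)]
differences = []
for a, b in pairs:
    p_softmax = softmax([a, b])[1]   # P(class 1) from a 2-unit softmax layer
    p_sigmoid = sigmoid(b - a)       # P(class 1) from a single sigmoid unit
    differences.append(abs(p_softmax - p_sigmoid))
    print(f"a={a:5.1f} b={b:5.1f}  softmax={p_softmax:.6f}  sigmoid={p_sigmoid:.6f}")
```

In other words, a two-unit softmax output layer carries one redundant degree of freedom compared with a single sigmoid unit, which fits the near-identical test results above.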