# Activation Functions

Activation functions are mathematical functions applied to the output of individual neurons in a neural network. They introduce non-linearity into the network, allowing it to learn and approximate complex relationships between inputs and outputs.

Some commonly used activation functions in deep learning:

1. Sigmoid function (logistic function): It maps the input to a value between 0 and 1. It was widely used in the past but is now less popular for hidden layers because of drawbacks such as vanishing gradients.

2. Hyperbolic tangent function (tanh): Similar to the sigmoid function, but it maps the input to a value between -1 and 1. It is still used in some cases, but it also suffers from vanishing gradients.

3. Rectified Linear Unit (ReLU): This function sets all negative values to zero and keeps positive values unchanged. It is one of the most popular activation functions in deep learning because it is simple and effective for training deep neural networks.

4. Leaky ReLU: This function is similar to ReLU but allows a small slope for negative input values. It helps mitigate the dying ReLU problem, where some neurons can become permanently inactive during training.

5. Parametric ReLU (PReLU): It is a generalization of Leaky ReLU that introduces a learnable parameter to determine the slope for negative input values. It offers more flexibility and can improve model performance.

6. Exponential Linear Unit (ELU): It is a variation of ReLU that allows negative outputs, which saturate smoothly along an exponential curve. It helps alleviate the dying ReLU problem and can produce more robust models.

7. Softmax: It is commonly used in the output layer of a neural network for multi-class classification problems. It normalizes the output values so that they represent probabilities, ensuring that they sum to 1.
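
As a rough illustration of the functions listed above, here is a minimal NumPy sketch. The default slopes (`alpha=0.01` for Leaky ReLU, `alpha=1.0` for ELU) and the sample input are assumptions chosen for the example, not values taken from this notebook.

```python
import numpy as np

def tanh(x):
    # Squashes inputs into the range (-1, 1)
    return np.tanh(x)

def relu(x):
    # Negative values become 0, positive values pass through unchanged
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Negative values are scaled by a small fixed slope (alpha is an assumed default)
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # Negative values follow alpha * (exp(x) - 1), which saturates at -alpha
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))

def softmax(z):
    # Subtract the maximum before exponentiating for numerical stability
    z = np.asarray(z, dtype=float)
    exp = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return exp / np.sum(exp, axis=-1, keepdims=True)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print("tanh      :", tanh(x))
print("relu      :", relu(x))
print("leaky_relu:", leaky_relu(x))
print("elu       :", elu(x))
print("softmax   :", softmax(x), "sum =", softmax(x).sum())
```

Sigmoid is omitted here because it is implemented in its own section of the notebook.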

# Activation functions are essential in deep learning for the following reasons

1. Non-linearity: Activation functions introduce non-linear transformations into the network, enabling it to learn complex patterns and relationships in the data. Without activation functions, a neural network would simply be a linear model, no matter how many layers it has.

2. Gradient propagation: Activation functions shape how gradients propagate backward during training, which affects how efficiently the network can be optimized. Different activation functions have different gradient behaviour, which can affect the model's training dynamics.

3. Model capacity: The choice of activation function can influence the capacity and expressive power of a neural network. Non-linear activation functions enable the network to represent more complex functions, expanding its ability to learn and generalize.

By using suitable activation functions, deep learning models can learn and approximate highly non-linear functions, making them powerful tools for tasks such as image recognition, natural language processing, and speech recognition.
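
The non-linearity point can be checked numerically. The sketch below stacks two layers with no activation and compares the result with a single linear layer. The random weights and the layer sizes are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights for a 4 -> 8 -> 2 network
W1, b1 = rng.normal(size=(4, 8)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 2)), rng.normal(size=2)
x = rng.normal(size=(5, 4))  # batch of 5 samples

# Two stacked layers with no activation function in between
two_layer = (x @ W1 + b1) @ W2 + b2

# The same mapping written as a single linear layer
W = W1 @ W2
b = b1 @ W2 + b2
one_layer = x @ W + b

print("No activation, same as one linear layer:", np.allclose(two_layer, one_layer))

# Inserting a ReLU between the layers breaks this equivalence
with_relu = np.maximum(0.0, x @ W1 + b1) @ W2 + b2
print("With ReLU, same as one linear layer:   ", np.allclose(with_relu, one_layer))
```

The first comparison holds for any weights, because a composition of affine maps is itself affine. The second one generally fails as soon as any pre-activation is negative.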

# 1. Sigmoid function

In [1]:
import numpy as np

def sigmoid(x):
    return 1/(1+np.exp(-x))

# Example: evaluate the sigmoid at a single point

x=2.5
result=sigmoid(x)
print("Sigmoid (", x,")=", result)

Sigmoid ( 2.5 )= 0.9241418199787566
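
The vanishing gradient drawback mentioned earlier can be seen from the derivative of the sigmoid, which is `sigmoid(x) * (1 - sigmoid(x))`. The following is a small sketch. The helper name `sigmoid_grad`, the sample points, and the depth of 10 layers are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid: s * (1 - s), which peaks at 0.25 when x = 0
    s = sigmoid(x)
    return s * (1 - s)

for x in [0.0, 2.5, 10.0]:
    print("sigmoid'(", x, ") =", sigmoid_grad(x))

# Backpropagation multiplies one such factor per layer, so even in the best
# case (0.25 per layer) the product shrinks quickly with depth.
depth = 10
print("Upper bound on the product across", depth, "layers:", 0.25 ** depth)
```

Because each factor is at most 0.25, the gradient reaching the early layers of a deep sigmoid network tends to be very small, which slows learning there.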


# 2. tanh

In [2]:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense




In [3]:
# Generate a synthetic classification dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, random_state=42)

In [4]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [5]:
# Build the neural network model
model=Sequential()
model.add(Dense(64, activation='tanh', input_shape=(X_train.shape[1],)))  # first hidden layer (receives the input features)
model.add(Dense(64, activation='tanh'))  # second hidden layer
model.add(Dense(1, activation='sigmoid'))  # output layer (probability of the positive class)




In [6]:
# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])




In [7]:
# Train the model
model.fit(X_train, y_train, epochs=10, batch_size=32)

Epoch 1/10


Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x1ea5a10e810>

In [8]:
# Evaluate the model on the testing data
loss, accuracy = model.evaluate(X_test, y_test)
print("Test Loss:", loss)
print("Test Accuracy:", accuracy)

Test Loss: 0.2478376179933548
Test Accuracy: 0.8949999809265137


# 3. ReLU

In [9]:
import tensorflow as tf
import numpy as np
from tensorflow import keras

In [10]:
# Define the neural network architecture
input_size=4
hidden_size=8
output_size=2

In [11]:
# Define the model
model=keras.Sequential([
    keras.layers.Dense(hidden_size, activation='relu', input_shape=(input_size,)),
    keras.layers.Dense(output_size)
])

In [12]:
# Compile the model
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
             loss=keras.losses.MeanSquaredError())

# Define your input and target data as numpy arrays

In [13]:
input_data= np.array([[1.0,2.0,3.0,4.0],
                    [2.0,3.0,4.0,5.0],
                     [3.0,4.0,5.0,6.0]])
target_data=np.array([[5.0,0.8],
                     [0.6,0.9],
                     [0.7,1.0]])

In [14]:
# Train the model
model.fit(input_data, target_data, epochs=1000, verbose=0)

<keras.src.callbacks.History at 0x1ea5a3e0c50>

In [15]:
# Test the model
test_input= np.array([[1.0,2.0,3.0,4.0]])
predicted_output=model.predict(test_input)



In [16]:
print(f"Predicted Output: {predicted_output}")

Predicted Output: [[4.21185   0.7975764]]


# 4. Leaky ReLU

In [18]:
import tensorflow as tf

# Define the input tensor
input_tensor = tf.constant([-1.0,2.0,-0.5,3.0])

# Apply the Leaky ReLU activation function (negative inputs are scaled by alpha=0.2)
output_tensor = tf.nn.leaky_relu(input_tensor, alpha=0.2)

# Print the output
print(output_tensor.numpy())

[-0.2  2.  -0.1  3. ]
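
To see why Leaky ReLU helps with the dying ReLU problem, it is useful to compare the gradients of the two functions on the same input. This is a minimal NumPy sketch. The helper names `relu_grad` and `leaky_relu_grad` are hypothetical, and `alpha=0.2` simply mirrors the value used above.

```python
import numpy as np

def relu_grad(x):
    # Gradient of ReLU: 1 for positive inputs, 0 otherwise
    return (x > 0).astype(float)

def leaky_relu_grad(x, alpha=0.2):
    # Gradient of Leaky ReLU: 1 for positive inputs, alpha otherwise
    return np.where(x > 0, 1.0, alpha)

x = np.array([-1.0, 2.0, -0.5, 3.0])
print("ReLU gradient      :", relu_grad(x))
print("Leaky ReLU gradient:", leaky_relu_grad(x))
```

With ReLU, a neuron whose input stays negative receives a zero gradient and stops updating. With Leaky ReLU the gradient for negative inputs is `alpha`, so the neuron can still recover.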


In [19]:
import tensorflow as tf
import numpy as np

In [20]:
# Define the neural network architecture
input_size=4
hidden_size=8
output_size=2

# Create the input and target tensors
inputs= tf.keras.Input(shape=(input_size,))
targets = tf.keras.Input(shape=(output_size,))

In [21]:
# Define the weights and biases for the hidden layer
hidden_weights = tf.Variable(tf.random.normal(shape=(input_size,hidden_size)))
hidden_biases=tf.Variable(tf.zeros(shape=(hidden_size,)))

In [22]:
# Compute the hidden layer output with the Leaky ReLU activation function
hidden_layer_output = tf.nn.leaky_relu(tf.matmul(inputs, hidden_weights)+hidden_biases, alpha=0.2)



In [23]:
# Define the weights and biases for the output layer
output_weights = tf.Variable(tf.random.normal(shape=(hidden_size,output_size)))
output_biases= tf.Variable(tf.zeros(shape=(output_size,)))

In [24]:
# compute the final output
output=tf.matmul(hidden_layer_output,output_weights)+output_biases



In [25]:
# define the loss function

loss=tf.reduce_mean(tf.square(output-targets))

In [26]:
# define the optimizer
optimizer=tf.keras.optimizers.SGD(learning_rate=0.01)


In [28]:
# Create the model
model=tf.keras.Model(inputs=[inputs, targets], outputs=output)
model.add_loss(loss)

In [31]:
# compile the model
model.compile(optimizer=optimizer)

In [32]:
# Define your input and target data as numpy arrays
input_data = np.array([[1.0,2.0,3.0,4.0]])  # Replace with your input data
target_data = np.array([[0.5,0.8]])  # Replace with your target data

In [33]:
# Train the model
model.fit([input_data, target_data], epochs=1000, verbose=0)

<keras.src.callbacks.History at 0x1ea5b420050>

In [36]:
# Test the trained network
test_input=np.array([[1.0,2.0,3.0,4.0]])
test_target = np.array([[0.0,0.0]])  # Dummy target for prediction; not used
predicted_output= model.predict([test_input, test_target])



In [37]:
print(f"Predicted Output:{predicted_output}")

Predicted Output:[[ 10.10327  -12.118675]]


# 5. Parametric Rectified Linear Unit (PReLU)

The Parametric Rectified Linear Unit (PReLU) activation function is an extension of the Leaky ReLU activation function that allows the slope of the negative part of the function to be learned during training. Instead of using a fixed slope value, PReLU introduces a set of learnable parameters that control the slope.

The PReLU function is defined as follows:

PReLU(x) = max(0, x) + alpha * min(0, x)

where x is the input value and alpha is a learnable parameter vector of the same shape as x. The alpha parameter determines the slope for negative inputs, allowing it to be different for each neuron or channel in a neural network.
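
The definition translates directly into NumPy. The sketch below is illustrative only: the per-channel `alpha` values, the sample batch, and the helper name `prelu_grad_alpha` are assumptions, and a real framework would update `alpha` through its optimizer rather than by hand.

```python
import numpy as np

def prelu(x, alpha):
    # PReLU(x) = max(0, x) + alpha * min(0, x)
    return np.maximum(0.0, x) + alpha * np.minimum(0.0, x)

def prelu_grad_alpha(x):
    # Derivative of PReLU with respect to alpha: min(0, x)
    # (non-zero only where the input is negative)
    return np.minimum(0.0, x)

# Batch of 2 samples with 3 channels each
x = np.array([[-1.0, 2.0, -0.5],
              [ 3.0, -4.0, 0.5]])

# One slope per channel (hypothetical values); broadcasts across the batch
alpha = np.array([0.1, 0.25, 0.5])

out = prelu(x, alpha)
print("PReLU output:")
print(out)

# Gradient of sum(out) with respect to alpha, accumulated over the batch
grad_alpha = prelu_grad_alpha(x).sum(axis=0)
print("Gradient w.r.t. alpha:", grad_alpha)
```

Because the gradient with respect to `alpha` is non-zero only for negative inputs, each channel's slope is adjusted only by the samples that fall on the negative side.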