# 3. ReLU (Rectified Linear Unit)
The Rectified Linear Unit (ReLU) activation function is a popular choice in deep neural networks because of its simplicity and effectiveness. It introduces non-linearity
to the network, allowing it to learn and approximate complex relationships between inputs and outputs. ReLU has become a standard activation function in many deep learning architectures.

The ReLU function is defined as follows:
    ReLU(x) = max(0, x)
In other words, ReLU takes an input value x and returns the maximum of 0 and x. If x is greater than 0, ReLU outputs x directly.
If x is less than or equal to 0, ReLU outputs 0. The function therefore "rectifies" negative values to 0 while leaving positive values unchanged.
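
The definition above can be written in a few lines of NumPy. This is a minimal sketch rather than a library implementation; the helper names `relu` and `relu_grad` are chosen here for illustration, and the derivative at exactly x = 0 is set to 0 by convention.

```python
import numpy as np

def relu(x):
    """Element-wise ReLU: max(0, x)."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Derivative of ReLU: 1 where x > 0, otherwise 0 (0 is used at x == 0)."""
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # negative inputs and zero map to 0; 0.5 and 2.0 pass through
print(relu_grad(x))  # 0 for the first three entries, 1 for the last two
```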

ReLU has several desirable properties that make it an attractive choice:

* Simplicity: ReLU is a simple mathematical function with low computational cost. It requires only a single comparison (the maximum of 0 and x).
* Non-linearity: ReLU introduces non-linear behavior to the network, which allows it to learn and represent complex patterns in the data. This non-linearity is crucial for modeling highly non-linear relationships between inputs and outputs.
* Sparse activation: ReLU activations are sparse, meaning that only a subset of the neurons in a layer is active at any given time. This sparsity can lead to more efficient and expressive representations, as it encourages the network to focus on the most relevant features.
* Avoiding the vanishing gradient problem: ReLU helps mitigate the vanishing gradient problem, which can occur when training deep neural networks. The problem refers to gradients diminishing exponentially as they are backpropagated through many layers. Since ReLU does not saturate in the positive range (its gradient is 1 for positive inputs), it allows gradients to flow more freely and helps prevent them from vanishing.
* Efficient evaluation: both the ReLU function and its derivative are cheap to compute.
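
Two of these properties can be checked numerically. The sketch below is a simplification: it ignores the weight matrices and multiplies only the activation derivatives across a hypothetical 20-layer chain, assuming the same positive pre-activation at every layer. It also measures how many ReLU outputs are exactly zero for standard-normal pre-activations. The depth, seed, and sample size are arbitrary choices made for illustration.

```python
import numpy as np

def sigmoid_grad(z):
    """Derivative of the logistic sigmoid; never larger than 0.25."""
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return (z > 0).astype(float)

depth = 20
z = np.full(depth, 1.0)  # assumed: the same positive pre-activation at each layer

# Product of activation derivatives along the chain (weights ignored).
sigmoid_chain = np.prod(sigmoid_grad(z))
relu_chain = np.prod(relu_grad(z))
print(f"sigmoid chain: {sigmoid_chain:.3e}, ReLU chain: {relu_chain}")

# Sparsity: fraction of ReLU outputs that are exactly zero.
rng = np.random.default_rng(0)
pre_activations = rng.standard_normal(10_000)
sparsity = np.mean(np.maximum(0.0, pre_activations) == 0.0)
print(f"fraction of inactive units: {sparsity:.2f}")
```

The sigmoid product shrinks toward zero as depth grows, while the ReLU product stays at 1 for positive inputs. Roughly half of the units are inactive when the pre-activations are symmetric around zero.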

Disadvantages of ReLU:
(1) Dead neurons: During training, some neurons may become "dead" or "dying", meaning they never activate (they output zero) for any input. This can happen when the weights or bias are initialized, or pushed by a large gradient update, so that the weighted sum of inputs is always negative. Once a neuron is dead it is unlikely to recover, because the gradient of ReLU is zero for negative inputs and its weights stop receiving updates. Dead neurons reduce the model's representational capacity.
(2) Output saturation: ReLU saturates at zero for negative inputs. When the input is negative, the gradient is zero, so the neuron does not respond to further changes. This can limit the model's ability to learn, especially when negative inputs are relevant to the task.
(3) Lack of negative output: ReLU produces only non-negative outputs, which can be a disadvantage for certain tasks. Some data distributions or problem domains benefit from negative values in the output space. For example, in image generation tasks where pixel values are normalized to a range such as [-1, 1], negative values represent dark regions.
(4) Gradient explosion: Although ReLU mitigates the vanishing gradient problem, it can still suffer from the opposite issue, especially in deep networks. Because ReLU is unbounded for positive inputs, large activations and gradients can propagate through the network if the learning rate or weight initialization is poorly chosen, causing instability and making it difficult to converge.
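
The dead-neuron case in (1) is easy to reproduce for a single unit. The following is a minimal sketch with made-up numbers: the inputs, the weights `w`, and the large negative bias `b` are all hypothetical, and the upstream gradient is assumed to be 1 for every sample.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(100, 4))  # hypothetical inputs in [0, 1)

w = np.array([0.5, -0.3, 0.2, 0.1])  # hypothetical weights
b = -5.0                             # large negative bias keeps z below zero

z = X @ w + b                        # pre-activation for every sample
a = np.maximum(0.0, z)               # ReLU output

# Gradient of the output w.r.t. the weights, assuming an upstream gradient of 1.
grad_mask = (z > 0).astype(float)
grad_w = (grad_mask[:, None] * X).mean(axis=0)

print("any active sample:", bool(np.any(a > 0)))
print("weight gradient:", grad_w)
```

Because every pre-activation is negative, both the output and the weight gradient are zero for all samples, so gradient descent has nothing to update this neuron with.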

In [22]:
import tensorflow as tf
import numpy as np
from tensorflow import keras

# define the neural network architecture 
input_size = 4
hidden_size = 8
output_size = 2

# define the model 
model = keras.Sequential([
    keras.layers.Dense(hidden_size, activation='relu', input_shape=(input_size,)),
    keras.layers.Dense(output_size)
])

In [23]:
# compile the model
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
              loss=keras.losses.MeanSquaredError())

In [24]:
# define the input and target data as Numpy arrays
input_data = np.array([[1.0,2.0,3.0,4.0],
                      [2.0,3.0,4.0,5.0],
                      [3.0,4.0,5.0,6.0]])
target_data = np.array([[0.5,0.8],
                       [0.6,0.9],
                       [0.7,1.0]])

In [25]:
# train the model 
model.fit(input_data, target_data, epochs=1000, verbose=0)

# test the model
test_input = np.array([[1.0,2.0,3.0,4.0]])
predicted_output = model.predict(test_input)

print(f'Predicted Output: {predicted_output}')

Predicted Output: [[0.48545155 0.8061829 ]]
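
To see how close this prediction is to its target, the mean squared error used as the training loss can be recomputed by hand. This sketch copies the printed prediction and the first row of `target_data` from the cells above; a fresh run will give slightly different numbers because the weights are randomly initialized.

```python
import numpy as np

predicted = np.array([[0.48545155, 0.8061829]])  # value printed by the cell above
target = np.array([[0.5, 0.8]])                  # first row of target_data

# Mean squared error: average of the squared element-wise differences.
mse = np.mean((predicted - target) ** 2)
print(f"MSE on the first sample: {mse:.6f}")
```

The error is on the order of 1e-4, so the ReLU network has fit this training sample closely.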


# Leaky ReLU (Leaky Rectified Linear Unit)
The Leaky Rectified Linear Unit (Leaky ReLU) activation function is a variation of ReLU that addresses some of the limitations of the standard ReLU.
It introduces a small slope for negative values, so the activation function has non-zero outputs (and non-zero gradients) even for negative inputs. This helps mitigate the issue of "dying" or "dead"
neurons in ReLU.
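
Written out, Leaky ReLU returns x for x > 0 and alpha * x otherwise, where alpha is a small positive slope. A minimal NumPy sketch follows; the helper names are chosen here for illustration, and `alpha=0.2` is just an example value.

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    """Leaky ReLU: x for x > 0, alpha * x otherwise."""
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.2):
    """Derivative of Leaky ReLU: 1 for x > 0, alpha otherwise."""
    return np.where(x > 0, 1.0, alpha)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(leaky_relu(x))       # negative inputs are scaled by alpha instead of zeroed
print(leaky_relu_grad(x))  # the gradient is never exactly zero
```

Since the gradient for negative inputs is alpha rather than 0, a neuron whose pre-activation is negative still receives weight updates.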

In [26]:
# define the neural network architecture 
input_size = 4
hidden_size = 8
output_size = 2

# create the symbolic input and target tensors
inputs = tf.keras.Input(shape=(input_size,))
targets = tf.keras.Input(shape=(output_size,))

# define the weights and biases for the hidden layer
hidden_weights = tf.Variable(tf.random.normal(shape=(input_size, hidden_size)))
hidden_biases = tf.Variable(tf.zeros(shape=(hidden_size,)))

# compute the hidden layer output with leaky ReLU activation function
hidden_layer_output = tf.nn.leaky_relu(tf.matmul(inputs, hidden_weights) + hidden_biases, alpha=0.2)

# define the weights and biases for the output layer
output_weights = tf.Variable(tf.random.normal(shape=(hidden_size, output_size)))
output_biases = tf.Variable(tf.zeros(shape=(output_size,)))





In [27]:
# compute the final output
output = tf.matmul(hidden_layer_output, output_weights) + output_biases



In [28]:
# define the loss function
loss = tf.reduce_mean(tf.square(output - targets))


In [36]:
# define the optimizer
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

In [37]:
# create the model 
model = tf.keras.Model(inputs=[inputs, targets], outputs=output)
model.add_loss(loss)

In [38]:
# compile the model
model.compile(optimizer=optimizer)


In [39]:
# define the input and target data as NumPy arrays
input_data = np.array([[1.0,2.0,3.0,4.0]])
target_data = np.array([[0.5,0.8]])

In [40]:
# train the model
model.fit([input_data, target_data], epochs=1000, verbose=0)

<keras.src.callbacks.History at 0x160df32f090>

In [42]:
# test the trained network
test_input = np.array([[1.0,2.0,3.0,4.0]])
test_target = np.array([[0.0,0.0]])
predicted_output = model.predict([test_input, test_target])



In [43]:
print(f'Predicted Output: {predicted_output}')

Predicted Output: [[15.760935 24.332247]]
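
This prediction is far from the training target `[0.5, 0.8]`, which suggests the weights were not updated during `fit`. One likely explanation, stated here as an assumption and not verified against this notebook's environment, is that `tf.Variable` objects created outside a Keras layer are not tracked as trainable weights of the functional model. As a framework-free sketch of what the training is meant to do, the NumPy code below trains the same 4-8-2 architecture with Leaky ReLU (slope 0.2) and plain gradient descent on the same single sample. The small initial weight scale (0.1), the seed, and all variable names are choices made for illustration, not values from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, lr = 0.2, 0.01  # leaky slope and learning rate, as in the cells above

x = np.array([[1.0, 2.0, 3.0, 4.0]])
t = np.array([[0.5, 0.8]])

# Assumed initialization: small random weights, zero biases.
W1 = 0.1 * rng.standard_normal((4, 8))
b1 = np.zeros(8)
W2 = 0.1 * rng.standard_normal((8, 2))
b2 = np.zeros(2)

def forward(inp):
    """Return hidden pre-activation, hidden output, and network output."""
    z1 = inp @ W1 + b1
    h = np.where(z1 > 0, z1, alpha * z1)  # Leaky ReLU
    return z1, h, h @ W2 + b2

losses = []
for _ in range(1000):
    z1, h, y = forward(x)
    losses.append(np.mean((y - t) ** 2))

    # Backpropagation of the mean squared error.
    dy = 2.0 * (y - t) / y.size
    dW2 = h.T @ dy
    db2 = dy.sum(axis=0)
    dh = dy @ W2.T
    dz1 = dh * np.where(z1 > 0, 1.0, alpha)  # Leaky ReLU derivative
    dW1 = x.T @ dz1
    db1 = dz1.sum(axis=0)

    # Gradient descent step on every parameter.
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

prediction = forward(x)[2]
final_loss = np.mean((prediction - t) ** 2)
print(f"initial loss: {losses[0]:.4f}, final loss: {final_loss:.2e}")
print(f"prediction: {prediction}")
```

Here every parameter is updated explicitly, so the loss falls and the prediction approaches the target. In Keras, the same effect is usually obtained by building the network from layers such as `Dense`, so the model tracks the weights itself.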
