## NLP_Assignment_3
1. Explain the basic architecture of RNN cell.
2. Explain Backpropagation through time (BPTT)
3. Explain Vanishing and exploding gradients
4. Explain Long short-term memory (LSTM)
5. Explain Gated recurrent unit (GRU)
6. Explain Peephole LSTM
7. Explain Bidirectional RNNs
8. Explain the gates of LSTM with equations.
9. Explain BiLSTM
10. Explain BiGRU

In [6]:
'''Ans 1:- A basic Recurrent Neural Network (RNN) cell consists of an
input, a hidden state, and an output. At each time step it
combines the current input x_t with the previous hidden state
h_{t-1}, applies a nonlinearity (tanh by default in Keras), and
produces the new hidden state, which for a simple RNN is also the
output:

h_t = tanh(W_x * x_t + W_h * h_{t-1} + b)

Because the same weights are reused at every step, RNNs can capture
sequential information. In this example, a SimpleRNNCell with 128
units is applied to input_data with a sequence length of 10 and an
input dimension of 32. Note that a cell defines the computation for
a single time step. Calling it directly on the full (None, 10, 32)
tensor applies the same weights at every position but does not carry
the hidden state from one step to the next; wrapping the cell in
tf.keras.layers.RNN (or using tf.keras.layers.SimpleRNN) gives true
recurrence.

The model summary has the following columns:

Layer (type): the layers in the model. Here these are an input
layer (input_4), a lambda layer (tf.compat.v1.shape_1), a slicing
layer (tf.__operators__.getitem_1), a lambda layer (tf.zeros_1), and
the simple_rnn_cell_3 layer representing the RNN cell.

Output Shape: the shape of each layer's output. The RNN cell's
output shape is (None, 10, 128): 10 time steps, each with a
128-dimensional output.

Param #: the number of trainable parameters in each layer. The RNN
cell has 20,608 trainable parameters: 32 * 128 input weights,
128 * 128 recurrent weights, and 128 biases.'''

import tensorflow as tf

# Create an RNN cell
rnn_cell = tf.keras.layers.SimpleRNNCell(128)

# Input data with shape (batch_size, timesteps, input_dim)
input_data = tf.keras.layers.Input(shape=(10, 32))

# Initialize the RNN cell with an initial state
state = rnn_cell.get_initial_state(input_data)

# Apply the RNN cell to input_data
output, state_new = rnn_cell(input_data, state)

# Create a Keras model
model = tf.keras.Model(inputs=input_data, outputs=output)

# Print a summary of the model
model.summary()

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                Output Shape                 Param #   Connected to                  
 input_4 (InputLayer)        [(None, 10, 32)]             0         []                            
                                                                                                  
 tf.compat.v1.shape_1 (TFOp  (3,)                         0         ['input_4[0][0]']             
 Lambda)                                                                                          
                                                                                                  
 tf.__operators__.getitem_1  ()                           0         ['tf.compat.v1.shape_1[0][0]']
  (SlicingOpLambda)                                                                               
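
To make the cell computation concrete, here is a minimal NumPy sketch of a single RNN step unrolled over 10 time steps. It is an illustration of the update h_t = tanh(x_t W_x + h_{t-1} W_h + b), not the Keras implementation; the names `W_x`, `W_h`, `b`, and `rnn_cell_step` are assumptions chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, units = 32, 128

# Parameters of one cell (illustrative names, not Keras attribute names)
W_x = rng.normal(scale=0.1, size=(input_dim, units))  # input-to-hidden weights
W_h = rng.normal(scale=0.1, size=(units, units))      # hidden-to-hidden weights
b = np.zeros(units)                                   # bias

def rnn_cell_step(x_t, h_prev):
    """One time step: h_t = tanh(x_t W_x + h_prev W_h + b)."""
    return np.tanh(x_t @ W_x + h_prev @ W_h + b)

# Unroll the cell over 10 time steps for a batch of one sequence,
# carrying the hidden state forward from step to step.
x_seq = rng.random((1, 10, input_dim))
h = np.zeros((1, units))
for t in range(x_seq.shape[1]):
    h = rnn_cell_step(x_seq[:, t, :], h)

n_params = W_x.size + W_h.size + b.size
print(h.shape, n_params)  # (1, 128) 20608
```

The parameter count matches the 20,608 quoted in the answer, since it depends only on the input dimension and the number of units.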
                                                                                              

In [3]:
'''Ans 2:- Backpropagation Through Time (BPTT) is the training
algorithm for recurrent neural networks (RNNs). It unfolds the
network across the time steps of a sequence, treats the unrolled
network as a deep feed-forward network with shared weights, and
applies ordinary backpropagation. The gradient for each shared
weight is the sum of its contributions from every time step.

In this code, BPTT is used implicitly: model.fit unrolls the
SimpleRNN layer over the 10 time steps and backpropagates the loss
through them.

The model was trained for 10 epochs and the training loss decreased
with each epoch. Because both inputs and targets are random
placeholders, this only shows that the optimizer is working; it says
nothing about generalization. The training loss values are recorded
in the history.history['loss'] list.'''

import tensorflow as tf
import numpy as np

# Generate placeholder training data
X_train = np.random.rand(100, 10, 32)
y_train = np.random.rand(100, 10)

# Define an RNN layer
rnn = tf.keras.layers.SimpleRNN(64)

# Create a sequential model
model = tf.keras.Sequential([
    rnn,
    tf.keras.layers.Dense(10)
])

# Compile the model
model.compile(optimizer='adam', loss='mse')

# Train the model with the placeholder data
history = model.fit(X_train, y_train, epochs=10)

# Print training loss history
print(history.history['loss'])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
[0.8690770864486694, 0.4029095470905304, 0.29073643684387207, 0.250489741563797, 0.2071378380060196, 0.17755652964115143, 0.17128004133701324, 0.16293483972549438, 0.14598777890205383, 0.1348564475774765]
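
As a sketch of what the framework does internally, BPTT can be written by hand for a one-unit RNN. The setup below is an assumption made for illustration (scalar weights `w` and `u`, a squared-error loss on the final state only); it is not the Keras training loop. The analytic gradient is compared with a finite-difference estimate.

```python
import math

def forward(w, u, xs):
    """Return [h_0, h_1, ..., h_T] for h_t = tanh(w * h_{t-1} + u * x_t)."""
    hs = [0.0]
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs

def loss(w, u, xs, y):
    """Squared error on the final hidden state."""
    return 0.5 * (forward(w, u, xs)[-1] - y) ** 2

def bptt(w, u, xs, y):
    """Gradients of the loss w.r.t. w and u, accumulated backwards in time."""
    hs = forward(w, u, xs)
    dw = du = 0.0
    dh = hs[-1] - y                      # dL/dh_T
    for t in range(len(xs), 0, -1):
        da = dh * (1.0 - hs[t] ** 2)     # through tanh at step t
        dw += da * hs[t - 1]             # shared weight: sum over time steps
        du += da * xs[t - 1]
        dh = da * w                      # pass gradient back to h_{t-1}
    return dw, du

w, u, xs, y = 0.8, 0.5, [0.1, -0.3, 0.7, 0.2], 0.5
dw, du = bptt(w, u, xs, y)

# Central finite differences as an independent check
eps = 1e-6
num_dw = (loss(w + eps, u, xs, y) - loss(w - eps, u, xs, y)) / (2 * eps)
num_du = (loss(w, u + eps, xs, y) - loss(w, u - eps, xs, y)) / (2 * eps)
print(abs(dw - num_dw) < 1e-6, abs(du - num_du) < 1e-6)
```

The line `dh = da * w` is where the repeated multiplication by the recurrent weight enters, which is the source of the vanishing and exploding gradients discussed in the next answer.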


In [3]:
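'''Ans 3:- Vanishing gradients occur during deep neural network
training when gradients shrink towards zero as they are propagated
back through many layers or time steps, causing slow or stalled
learning in the early layers. Conversely, exploding gradients occur
when gradients grow excessively large, leading to unstable training.
Both issues hinder convergence. Proper weight initialization and
non-saturating activation functions (e.g., ReLU) mainly help against
vanishing gradients, while gradient clipping is the usual remedy for
exploding gradients.

In the vanishing gradients example, the sigmoid activation (whose
derivative is at most 0.25) may cause gradients to become small in
deep networks. In the exploding gradients example, the weights are
initialized with a high standard deviation (10.0), so activations
and gradients become very large during training.

Note that the loss histories below are only indirect evidence. The
labels are random 0/1 values, so a loss near 0.25 is also what a
model predicting a constant 0.5 would achieve, and the loss of about
3.6e7 in the second run mainly reflects the huge initial weights.
Inspecting the gradients themselves would be needed to confirm
either effect.'''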

# vanishing gradients

import tensorflow as tf
import numpy as np

model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='sigmoid'),
    tf.keras.layers.Dense(10, activation='sigmoid'),
    tf.keras.layers.Dense(10, activation='sigmoid')
])

# Compile the model
model.compile(optimizer='adam', loss='mse')

# Generate sample data
X_train = np.random.rand(100, 2)
y_train = np.random.randint(0, 2, size=100)

# Train the model
history = model.fit(X_train, y_train, epochs=10)

# Print training loss history
print("Vanishing Gradients - Training Loss History:")
print(history.history['loss'])


#  Exploding Gradients

model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='linear', kernel_initializer=tf.keras.initializers.RandomNormal(stddev=10.0)),
    tf.keras.layers.Dense(10, activation='linear', kernel_initializer=tf.keras.initializers.RandomNormal(stddev=10.0)),
    tf.keras.layers.Dense(10, activation='linear', kernel_initializer=tf.keras.initializers.RandomNormal(stddev=10.0))
])

# Compile the model
model.compile(optimizer='adam', loss='mse')

# Generate sample data
X_train = np.random.rand(100, 2)
y_train = np.random.randint(0, 2, size=100)

# Train the model
history = model.fit(X_train, y_train, epochs=10)

# Print training loss history
print("Exploding Gradients - Training Loss History:")
print(history.history['loss'])

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Vanishing Gradients - Training Loss History:
[0.25720012187957764, 0.2563176155090332, 0.2556900084018707, 0.2552420496940613, 0.2547486424446106, 0.25436991453170776, 0.2539815604686737, 0.25367727875709534, 0.25330430269241333, 0.25306200981140137]
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Exploding Gradients - Training Loss History:
[35930088.0, 35742128.0, 35563952.0, 35393660.0, 35224872.0, 35054092.0, 34890272.0, 34724108.0, 34561136.0, 34400020.0]
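
The effect can also be shown with plain arithmetic. In a linear recurrence h_t = w * h_{t-1}, the gradient that reaches a state 50 steps back is scaled by w^50. The sketch below, with helper names `backprop_factor` and `clip_by_norm` made up for this example, shows both regimes and how clipping by norm bounds an exploding gradient.

```python
import math

def backprop_factor(w, steps):
    """Gradient scale after `steps` steps of the linear recurrence h_t = w * h_{t-1}."""
    return w ** steps

def clip_by_norm(grad, max_norm):
    """Rescale the gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    return [g * max_norm / norm for g in grad]

vanishing = backprop_factor(0.5, 50)   # about 8.9e-16
exploding = backprop_factor(1.5, 50)   # about 6.4e8

# Clipping keeps the direction of the gradient but bounds its size.
clipped = clip_by_norm([exploding, -exploding], max_norm=1.0)
clipped_norm = math.sqrt(sum(g * g for g in clipped))
print(vanishing, exploding, clipped_norm)
```

In Keras, the same kind of clipping is available through the `clipnorm` argument of the optimizers, for example `tf.keras.optimizers.Adam(clipnorm=1.0)`.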


In [9]:
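'''Ans 4:- Long Short-Term Memory (LSTM) is a type of recurrent
neural network (RNN) architecture designed to capture long-term
dependencies in sequential data. It uses memory cells whose contents
are controlled by gates, so information can be stored and retrieved
over extended sequences. The additive cell-state update mitigates
(but does not fully eliminate) the vanishing gradient problem. LSTMs
are widely used in natural language processing, speech recognition,
and time series prediction.

In this example, an LSTM with return_sequences=True emits one output
per time step, and a TimeDistributed Dense layer classifies each
step into one of two classes.'''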

import tensorflow as tf
import numpy as np

# Generate some sample sequential data: 100 sequences, 10 time steps, 32 features
X_train = np.random.rand(100, 10, 32)

# Example target data (binary classification)
y_train = np.random.randint(0, 2, size=(100, 10))

# Convert the target data to one-hot encoding
y_train_one_hot = tf.keras.utils.to_categorical(y_train, num_classes=2)

# Create an LSTM-based sequential model
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(10, 32), return_sequences=True),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(2, activation='softmax'))
])

# Compile and train the model with the one-hot encoded target data
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train_one_hot, epochs=10, batch_size=32)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x159c045c7f0>

In [10]:
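'''Ans 5:- Gated Recurrent Unit (GRU) is a type of recurrent neural
network (RNN) architecture that mitigates vanishing gradient
issues. It uses two gates to control information flow through the
network: the update gate decides how much of the previous hidden
state to keep, and the reset gate decides how much of it to use when
computing the new candidate state. Unlike an LSTM, a GRU has no
separate cell state and no output gate, so it has fewer parameters
and is cheaper to compute while still capturing long-term
dependencies.'''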

# using Keras

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.GRU(64, input_shape=(10, 32)),
    tf.keras.layers.Dense(10)
])
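
The two gates can be sketched in NumPy as follows. This is a minimal illustration of the common textbook formulation, not the exact Keras GRU (which, among other things, uses a different bias layout by default); the parameter names in `params` and the convention h_t = (1 - z) * h_{t-1} + z * candidate are assumptions of this example.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU time step with update gate z and reset gate r."""
    hx = np.concatenate([h_prev, x_t], axis=-1)
    z = sigmoid(hx @ p["W_z"] + p["b_z"])            # update gate
    r = sigmoid(hx @ p["W_r"] + p["b_r"])            # reset gate
    rhx = np.concatenate([r * h_prev, x_t], axis=-1)
    h_cand = np.tanh(rhx @ p["W_h"] + p["b_h"])      # candidate state
    return (1.0 - z) * h_prev + z * h_cand           # blend old and new

rng = np.random.default_rng(0)
input_dim, units = 32, 64
params = {}
for name in ("z", "r", "h"):
    params[f"W_{name}"] = rng.normal(scale=0.1, size=(units + input_dim, units))
    params[f"b_{name}"] = np.zeros(units)

# Run over 10 time steps for a batch of one sequence
h = np.zeros((1, units))
for x_t in rng.random((10, 1, input_dim)):
    h = gru_step(x_t, h, params)
print(h.shape)  # (1, 64)
```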

In [14]:
'''Ans 6:- Peephole Long Short-Term Memory (LSTM) is an extension of
the standard LSTM architecture that adds connections from the
cell state to the gate units, allowing them to "peek" at the
cell state. This lets the gates take the long-term memory into
account when deciding what to forget, what to write, and what to
expose.

We generate a sample input sequence X with shape (1, 10, 32)
(1 sequence, 10 time steps, and 32 features) and build a model with
tf.keras.layers.LSTM and implementation=2. Note that the
implementation argument only selects how the gate computations are
batched internally; it does not add peephole connections, and the
Keras LSTM layer has no peephole option. The layer below is
therefore a standard LSTM. A true peephole cell needs a dedicated
implementation, such as PeepholeLSTMCell from TensorFlow Addons.
We use the predict method to compute outputs for the input sequence
and print them.

The output is a 3D array of shape (1, 10, 24): one sequence, 10 time
steps, and 24 units per step. Like standard LSTMs, peephole LSTMs
are used across many domains because they handle sequential data and
capture long-term dependencies.'''

import tensorflow as tf
import numpy as np

# Generate sample sequential data:
# 1 sequence with 10 time steps and 32 features
X = np.random.rand(1, 10, 32)

# Create the LSTM model (standard LSTM; implementation=2 does not add peepholes)
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(24, input_shape=(10, 32), implementation=2, return_sequences=True)
])

# Make predictions with the model
predictions = model.predict(X)

# Print the predictions
print("Predictions:")
print(predictions)

Predictions:
[[[ 0.26733524 -0.00362986  0.02749137 -0.07468929  0.06776802
   -0.09303928 -0.20310137 -0.05415118  0.02489976  0.02617916
    0.06073262 -0.01580836 -0.01431295  0.13082503  0.2457199
   -0.16151677  0.03091986 -0.00437271 -0.0246149   0.06454744
   -0.06482319  0.06943206  0.02438603  0.17618304]
  [ 0.25557613  0.00374753  0.00423087 -0.13687383 -0.02286217
   -0.14553389 -0.1951163  -0.05624678  0.00701665 -0.00407098
    0.11107236 -0.04660718 -0.12474928  0.05307633  0.26353997
   -0.24102935 -0.04848466 -0.10405404 -0.14694332  0.17351958
   -0.10923171  0.04254302  0.00404397  0.20753352]
  [ 0.4286884   0.09452787  0.02479679 -0.09113618 -0.06155593
   -0.14676446 -0.18467808 -0.0367738  -0.06626655 -0.0326254
    0.13739455 -0.06312048 -0.15425849  0.1423896   0.13180794
   -0.2876808  -0.04818841 -0.15206318 -0.2232304   0.14884125
   -0.04083747  0.0245479  -0.03587583  0.22523478]
  [ 0.43247253  0.11147507  0.08165215 -0.14137278 -0.02024639
   -0.1402646 
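
For comparison, the peephole connections themselves can be sketched in NumPy. This is a minimal illustration under stated assumptions: the peephole weights `p_f`, `p_i`, and `p_o` are diagonal (one weight per unit), the forget and input gates look at the previous cell state, and the output gate looks at the updated cell state. All parameter names are made up for this example and do not correspond to a Keras API.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def peephole_lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step in which the gates also see the cell state."""
    hx = np.concatenate([h_prev, x_t], axis=-1)
    f_t = sigmoid(hx @ p["W_f"] + p["p_f"] * c_prev + p["b_f"])  # peeks at c_{t-1}
    i_t = sigmoid(hx @ p["W_i"] + p["p_i"] * c_prev + p["b_i"])  # peeks at c_{t-1}
    c_cand = np.tanh(hx @ p["W_c"] + p["b_c"])
    c_t = f_t * c_prev + i_t * c_cand
    o_t = sigmoid(hx @ p["W_o"] + p["p_o"] * c_t + p["b_o"])     # peeks at c_t
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(0)
input_dim, units = 32, 24
params = {}
for g in ("f", "i", "c", "o"):
    params[f"W_{g}"] = rng.normal(scale=0.1, size=(units + input_dim, units))
    params[f"b_{g}"] = np.zeros(units)
for g in ("f", "i", "o"):
    params[f"p_{g}"] = rng.normal(scale=0.1, size=units)  # peephole weights

X = rng.random((1, 10, input_dim))
h = np.zeros((1, units))
c = np.zeros((1, units))
outputs = []
for t in range(X.shape[1]):
    h, c = peephole_lstm_step(X[:, t, :], h, c, params)
    outputs.append(h)
outputs = np.stack(outputs, axis=1)
print(outputs.shape)  # (1, 10, 24)
```

Setting the three peephole vectors to zero recovers the standard LSTM step.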

In [15]:
'''Ans 7:- Bidirectional Recurrent Neural Networks (Bi-RNNs) process
input sequences in both forward and reverse directions, so the
representation of each time step draws on both past and future
context. This is useful in natural language processing tasks like
machine translation, sentiment analysis, and speech recognition,
where understanding the entire context is crucial. A Bi-RNN consists
of two recurrent layers, one for forward processing and one for
reverse processing, whose outputs are combined (concatenated by
default in Keras).

We use the Bidirectional wrapper to create a Bi-LSTM layer. The
input shape is (10, 32), representing a sequence with 10 time steps
and 32 features. Each direction has 64 units, so the layer outputs
128 features per time step. Because return_sequences=True, the Dense
layer produces one sigmoid output per time step, giving an output
shape of (None, 10, 1). The model is compiled for binary
classification using the Adam optimizer and binary cross-entropy
loss.'''

import tensorflow as tf
from tensorflow.keras.layers import Bidirectional, LSTM, Dense

# Create a Sequential model
model = tf.keras.Sequential()

# Add a Bidirectional LSTM layer
model.add(Bidirectional(LSTM(64, return_sequences=True), input_shape=(10, 32)))

# Add an output Dense layer
model.add(Dense(1, activation='sigmoid'))

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Print the model summary
model.summary()

Model: "sequential_15"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 bidirectional (Bidirection  (None, 10, 128)           49664     
 al)                                                             
                                                                 
 dense_19 (Dense)            (None, 10, 1)             129       
                                                                 
Total params: 49793 (194.50 KB)
Trainable params: 49793 (194.50 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
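
The parameter counts in this summary can be checked by hand. The helper functions below are written for this example only; they assume the standard LSTM layout of four gate blocks, each with input weights, recurrent weights, and a bias.

```python
def lstm_params(input_dim, units):
    """Four gate blocks, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(in_features, out_features):
    """Weight matrix plus bias vector."""
    return in_features * out_features + out_features

bi_lstm = 2 * lstm_params(32, 64)   # forward and backward LSTM
dense = dense_params(128, 1)        # 128 = 64 forward + 64 backward features
total = bi_lstm + dense
print(bi_lstm, dense, total)  # 49664 129 49793
```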


In [17]:
'''Ans 8:- Long Short-Term Memory (LSTM) networks employ gates to
manage information flow. The forget gate (f_t) controls what to
discard from the cell state, while the input gate (i_t) controls
how much of the new candidate to add. The candidate cell state
(~C_t) proposes an update, and the output gate (o_t) determines how
much of the cell state is exposed as the next hidden state (h_t).
Together, these equations let the network handle sequential data
with long-term dependencies.

Forget Gate (f_t) Equation:-
f_t = σ(W_f * [h_{t-1}, x_t] + b_f)

Input Gate (i_t) Equation:-
i_t = σ(W_i * [h_{t-1}, x_t] + b_i)

Candidate Cell State (~C_t) Equation:-
~C_t = tanh(W_c * [h_{t-1}, x_t] + b_c)

Cell State Update (C_t) Equation:-
C_t = f_t ⊙ C_{t-1} + i_t ⊙ ~C_t

Output Gate (o_t) Equation:-
o_t = σ(W_o * [h_{t-1}, x_t] + b_o)

Final Hidden State (h_t) Equation:-
h_t = o_t ⊙ tanh(C_t)

1. σ represents the sigmoid activation function, which
outputs values between 0 and 1.

2. tanh represents the hyperbolic tangent activation
function, which outputs values between -1 and 1.

3. [h_{t-1}, x_t] denotes the concatenation of the previous
hidden state (h_{t-1}) and the current input (x_t).

4. W_f, b_f, W_i, b_i, W_c, b_c, W_o, and b_o are weight
matrices and bias vectors learned during training.

5. * denotes a matrix-vector product and ⊙ denotes element-wise
multiplication.

We create a simple model with an LSTM layer. We access the
LSTM layer using model.layers[0]. We retrieve the weights of
the LSTM layer using lstm_layer.get_weights(), which returns a
list of three arrays: the kernel (input weights), the recurrent
kernel (hidden-state weights), and the bias. Keras stores the
weights of all four gates concatenated along the last axis, so for
32 input features and 64 units the shapes are (32, 256), (64, 256),
and (256,). We then iterate through the list and print each array.'''

import tensorflow as tf
from tensorflow.keras.layers import LSTM

# simple model with an LSTM layer
model = tf.keras.Sequential([
    LSTM(units=64, return_sequences=True, input_shape=(10, 32)),
])


# Print the LSTM layer's weights
lstm_layer = model.layers[0] 
weights = lstm_layer.get_weights() 

# Print the weights
for i, weight in enumerate(weights):
    print(f"Weight {i + 1}:")
    print(weight)

Weight 1:
[[ 0.1044092   0.13617367  0.03686866 ... -0.02717565 -0.07085549
   0.06958772]
 [-0.12940887  0.0287201   0.02803287 ... -0.09724805  0.12399828
   0.08343121]
 [-0.10845497  0.04670468 -0.1403206  ... -0.00101167 -0.11165921
   0.1121667 ]
 ...
 [ 0.08135447 -0.044175    0.03935732 ... -0.08649188  0.11829746
  -0.00579573]
 [ 0.0428582   0.07077913 -0.01356767 ... -0.1113455  -0.0474661
  -0.12852366]
 [ 0.03423195  0.12291601 -0.03204381 ...  0.06821437 -0.04079828
  -0.04033189]]
Weight 2:
[[-0.00762558  0.02626179  0.15771492 ...  0.07721288 -0.01652937
   0.02090484]
 [ 0.00017905 -0.05924153  0.04126389 ...  0.01686445  0.00767494
  -0.04708205]
 [ 0.17573795  0.00896977  0.1720978  ... -0.11885785 -0.06003467
  -0.07625915]
 ...
 [ 0.06276384 -0.03095829 -0.0131504  ... -0.04879331  0.03970512
  -0.10233448]
 [-0.02946273  0.08917596  0.0979659  ...  0.05824479 -0.00486016
  -0.00735464]
 [ 0.0097138   0.02031307  0.10451888 ... -0.03391834  0.0608415
  -0.0461176 ]
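
To connect the printed arrays to the gate equations, here is a hedged NumPy sketch of one LSTM step that uses the same concatenated layout: a kernel of shape (32, 256), a recurrent kernel of shape (64, 256), and a bias of shape (256,). The gate order i, f, c, o follows the Keras convention as I understand it; the random weights and the function name `lstm_step` are placeholders for this example, not values read from the model above.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, kernel, recurrent_kernel, bias):
    """One LSTM step with the four gates stored side by side."""
    # Equivalent to W * [h_{t-1}, x_t] + b with W split into two matrices
    z = x_t @ kernel + h_prev @ recurrent_kernel + bias   # (batch, 4 * units)
    z_i, z_f, z_c, z_o = np.split(z, 4, axis=-1)
    i_t = sigmoid(z_i)                  # input gate
    f_t = sigmoid(z_f)                  # forget gate
    c_cand = np.tanh(z_c)               # candidate cell state
    o_t = sigmoid(z_o)                  # output gate
    c_t = f_t * c_prev + i_t * c_cand   # cell state update
    h_t = o_t * np.tanh(c_t)            # new hidden state
    return h_t, c_t

rng = np.random.default_rng(0)
input_dim, units = 32, 64
kernel = rng.normal(scale=0.1, size=(input_dim, 4 * units))
recurrent_kernel = rng.normal(scale=0.1, size=(units, 4 * units))
bias = np.zeros(4 * units)

h = np.zeros((1, units))
c = np.zeros((1, units))
for x_t in rng.random((10, 1, input_dim)):
    h, c = lstm_step(x_t, h, c, kernel, recurrent_kernel, bias)
print(kernel.shape, recurrent_kernel.shape, bias.shape)  # (32, 256) (64, 256) (256,)
```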

In [18]:
'''Ans 9:- Bidirectional Long Short-Term Memory (BiLSTM) is an
extension of LSTM that processes input sequences in both forward and
reverse directions with two separate LSTMs, so each time step sees
context from both the past and the future. The two outputs are
concatenated, which improves the model's understanding of sequence
data. The TensorFlow/Keras code below creates a BiLSTM layer with 64
units per direction, giving 128 output features per time step.'''

import tensorflow as tf
from tensorflow.keras.layers import Bidirectional, LSTM

model = tf.keras.Sequential([
    Bidirectional(LSTM(64, return_sequences=True), input_shape=(10, 32))
])

In [19]:
'''Ans 10:- Bidirectional Gated Recurrent Unit (BiGRU) is an extension
of the GRU architecture that processes input sequences in
both forward and reverse directions with two separate GRUs, so each
time step sees context from both the past and the future. It works
like a BiLSTM but uses the lighter GRU cell in each direction. The
TensorFlow/Keras code below creates a BiGRU layer with 64 units per
direction, giving 128 output features per time step.'''

import tensorflow as tf
from tensorflow.keras.layers import Bidirectional, GRU

model = tf.keras.Sequential([
    Bidirectional(GRU(64, return_sequences=True), input_shape=(10, 32))
])
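
What the Bidirectional wrapper does in the BiLSTM and BiGRU answers can be sketched in NumPy. To keep the example short, a plain tanh RNN cell stands in for the LSTM or GRU cell, which is a simplification; the helper names `run_rnn` and `make_params` are made up for this illustration. The backward pass reads the sequence in reverse, and its outputs are flipped back so both directions line up by time step before concatenation.

```python
import numpy as np

def run_rnn(x_seq, W_x, W_h, b):
    """Return the hidden state at every time step, shape (timesteps, units)."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in x_seq:
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states)

def make_params(rng, input_dim, units):
    """Random weights for one direction (each direction has its own set)."""
    return (rng.normal(scale=0.1, size=(input_dim, units)),
            rng.normal(scale=0.1, size=(units, units)),
            np.zeros(units))

rng = np.random.default_rng(0)
x_seq = rng.random((10, 32))          # 10 time steps, 32 features
fwd = make_params(rng, 32, 64)
bwd = make_params(rng, 32, 64)

h_forward = run_rnn(x_seq, *fwd)
h_backward = run_rnn(x_seq[::-1], *bwd)[::-1]   # re-align to original time order
outputs = np.concatenate([h_forward, h_backward], axis=-1)
print(outputs.shape)  # (10, 128)
```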