## NLP_Assignment_4
1. Can you think of a few applications for a sequence-to-sequence RNN? What about a sequence-to-vector RNN? And a vector-to-sequence RNN?
2. Why do people use encoder–decoder RNNs rather than plain sequence-to-sequence RNNs for automatic translation?
3. How could you combine a convolutional neural network with an RNN to classify videos?
4. What are the advantages of building an RNN using dynamic_rnn() rather than static_rnn()?
5. How can you deal with variable-length input sequences? What about variable-length output sequences?
6. What is a common way to distribute training and execution of a deep RNN across multiple GPUs?

In [None]:
'''Ans 1:-
1. Sequence-to-Sequence RNN (Seq2Seq):-
Machine Translation: Seq2Seq RNNs are widely used for
translating text from one language to another, where the input is a
sentence in one language, and the output is the corresponding
translation.

Speech Recognition: In automatic speech recognition (ASR),
Seq2Seq models convert spoken language into written text, where
the input is an audio sequence, and the output is the
transcribed text.

Text Summarization: They can generate concise summaries of
long texts, where the input is a document, and the output is a
shorter summary.

Chatbots: Seq2Seq models enable conversational agents to
generate human-like responses by mapping user input sequences to
appropriate responses.

2. Sequence-to-Vector RNN:-
Sentiment Analysis: Sequence data (e.g., a sentence) is
transformed into a fixed-length vector representation, which is then
used to classify the sentiment of the text.

Document Classification: It's used to categorize entire
documents or articles into predefined classes based on their
content.

Note on Named Entity Recognition (NER): NER does not belong in
this group. It tags every word in a document (names, dates,
locations), so it produces one output per input step and is a
sequence-to-sequence (sequence labeling) task.

3. Vector-to-Sequence RNN:-
Image Captioning: In this case, a vector representation
(e.g., features extracted from an image) is converted into a
sequence of words to generate captions or descriptions.

Music Generation: Given a vector representation of musical
notes or features, an RNN can produce a musical score or
sequence of notes.

Video Description Generation: If the video is first summarized
into a single feature vector, that vector can be converted into a
natural language description. (Fed frame by frame, the same task is
sequence-to-sequence.)

Each of these RNN architectures addresses specific tasks
by either mapping sequences to sequences, sequences to fixed
vectors, or fixed vectors to sequences, making them versatile for
various applications in natural language processing, speech
recognition, computer vision, and more.'''
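
The three input/output patterns can be told apart by tensor shapes alone. The following is a minimal Keras sketch, not part of the original answer; the layer sizes and the names `seq_to_seq`, `seq_to_vec`, and `vec_to_seq` are illustrative assumptions, and the `RepeatVector` trick is only one simple way to feed a single vector to an RNN.

```python
import tensorflow as tf
from tensorflow.keras import layers

steps, features, units = 10, 8, 16  # illustrative sizes (assumptions)

# Sequence-to-sequence: one output vector per input time step.
seq_to_seq = tf.keras.Sequential([
    tf.keras.Input(shape=(steps, features)),
    layers.SimpleRNN(units, return_sequences=True),
])

# Sequence-to-vector: only the output of the last time step is kept.
seq_to_vec = tf.keras.Sequential([
    tf.keras.Input(shape=(steps, features)),
    layers.SimpleRNN(units),
])

# Vector-to-sequence: the single input vector is repeated at every step.
vec_to_seq = tf.keras.Sequential([
    tf.keras.Input(shape=(features,)),
    layers.RepeatVector(steps),
    layers.SimpleRNN(units, return_sequences=True),
])

sequence_batch = tf.zeros((2, steps, features))  # batch of 2 dummy sequences
vector_batch = tf.zeros((2, features))           # batch of 2 dummy vectors

seq_to_seq_out = seq_to_seq(sequence_batch)  # shape (batch, steps, units)
seq_to_vec_out = seq_to_vec(sequence_batch)  # shape (batch, units)
vec_to_seq_out = vec_to_seq(vector_batch)    # shape (batch, steps, units)
print(seq_to_seq_out.shape, seq_to_vec_out.shape, vec_to_seq_out.shape)
```

The only difference between the first two models is `return_sequences`, which controls whether the RNN emits every step or just the last one.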

In [4]:
'''Ans 2:- Encoder-decoder RNNs are favored over plain
sequence-to-sequence RNNs for automatic translation because a sentence
cannot be translated well one word at a time. A plain
sequence-to-sequence RNN would have to emit each output word right after
reading the matching input word, yet the last words of a sentence can
change how its first words should be translated. The encoder
therefore reads the whole input sequence first and summarizes it in
its final state (a fixed-length context vector), which the decoder then
uses to generate the output sequence. This design also lets the input
and output sequences have different lengths.

This code defines and prints the summary of a
sequence-to-sequence neural machine translation model using Keras and
TensorFlow. It consists of an encoder and a decoder, each with an LSTM
layer, and is suitable for tasks like language translation. The
model.summary() statement displays a summary of the model's architecture
and parameter details.'''

import tensorflow as tf
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model

# Define your hyperparameters and data variables
max_input_length = 100
input_vocab_size = 5000
embedding_dim = 128
hidden_units = 256
max_output_length = 100
output_vocab_size = 6000

# Encoder
encoder_inputs = Input(shape=(max_input_length,))
encoder_embedding = Embedding(input_dim=input_vocab_size, output_dim=embedding_dim)(encoder_inputs)
encoder_lstm = LSTM(units=hidden_units, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(encoder_embedding)

# Decoder
decoder_inputs = Input(shape=(max_output_length,))
decoder_embedding = Embedding(input_dim=output_vocab_size, output_dim=embedding_dim)(decoder_inputs)
decoder_lstm = LSTM(units=hidden_units, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_embedding, initial_state=[state_h, state_c])
decoder_dense = Dense(output_vocab_size, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()

Model: "model_1"
__________________________________________________________________________________________________
 Layer (type)                Output Shape                 Param #   Connected to                  
 input_3 (InputLayer)        [(None, 100)]                0         []                            
                                                                                                  
 input_4 (InputLayer)        [(None, 100)]                0         []                            
                                                                                                  
 embedding_2 (Embedding)     (None, 100, 128)             640000    ['input_3[0][0]']             
                                                                                                  
 embedding_3 (Embedding)     (None, 100, 128)             768000    ['input_4[0][0]']             
                                                                                            

In [7]:
'''Ans 3:- To combine a Convolutional Neural Network (CNN) with a
Recurrent Neural Network (RNN) for video classification, the usual
approach is to run a CNN on each frame (for example, one frame per
second), feed the resulting sequence of per-frame feature vectors to a
sequence-to-vector RNN, and pass the RNN output through a softmax layer.

1. Spatial features (CNN): Use a CNN to extract a feature vector
from each frame in the video.

2. Temporal features (RNN): Use an RNN (e.g., LSTM) to model the temporal sequence
of feature vectors generated by the CNN.

3. Classification: Feed the final RNN output to a dense softmax layer to make the prediction.

This code sets up a simplified two-input variant rather than that
chained pipeline: a Conv3D branch processes the raw clip, a separate LSTM
branch reads precomputed per-frame feature vectors (it is not wired to
the Conv3D output), and the two branches are concatenated before the
softmax layer. Flattening the Conv3D output directly yields millions of
features, so a practical model would add pooling first.
Adjust parameters to match your specific task and dataset.'''

import tensorflow as tf
from tensorflow.keras.layers import Input, Conv3D, LSTM, Dense, Flatten, concatenate

# Define your hyperparameters and data variables
frames = 30  # Number of frames in each video
height = 64  # Height of each frame
width = 64   # Width of each frame
channels = 3  # Number of color channels (e.g., RGB)
cnn_output_dim = 256  # Size of each precomputed per-frame feature vector fed to the RNN
num_classes = 10  # Number of video classes for classification

# Spatial Stream (CNN)
input_shape = (frames, height, width, channels)
cnn_input = Input(shape=input_shape)
cnn = Conv3D(filters=64, kernel_size=(3, 3, 3), activation='relu')(cnn_input)
cnn = Flatten()(cnn)

# Temporal Stream (RNN): a second model input of precomputed per-frame features
rnn_input = Input(shape=(frames, cnn_output_dim))
rnn = LSTM(units=256)(rnn_input)

# Fusion
combined = concatenate([cnn, rnn])
output = Dense(num_classes, activation='softmax')(combined)

model = tf.keras.Model(inputs=[cnn_input, rnn_input], outputs=output)
model.summary()

Model: "model_3"
__________________________________________________________________________________________________
 Layer (type)                Output Shape                 Param #   Connected to                  
 input_7 (InputLayer)        [(None, 30, 64, 64, 3)]      0         []                            
                                                                                                  
 conv3d_1 (Conv3D)           (None, 28, 62, 62, 64)       5248      ['input_7[0][0]']             
                                                                                                  
 input_8 (InputLayer)        [(None, 30, 256)]            0         []                            
                                                                                                  
 flatten_1 (Flatten)         (None, 6888448)              0         ['conv3d_1[0][0]']            
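
The pipeline in which the RNN reads feature vectors generated by the CNN can be sketched by wrapping a 2D CNN in `TimeDistributed`, so the same convolution runs on every frame before the LSTM sees the sequence. This is a hedged illustration, not the code that produced the summary above; the filter counts, pooling size, and the name `video_classifier` are assumptions chosen to keep the example small.

```python
import tensorflow as tf
from tensorflow.keras import layers

frames, height, width, channels = 30, 64, 64, 3
num_classes = 10

video = tf.keras.Input(shape=(frames, height, width, channels))

# TimeDistributed applies the wrapped layer to each frame independently.
x = layers.TimeDistributed(layers.Conv2D(32, 3, activation="relu"))(video)
x = layers.TimeDistributed(layers.MaxPooling2D(4))(x)
# One 32-dimensional feature vector per frame: (batch, frames, 32).
x = layers.TimeDistributed(layers.GlobalAveragePooling2D())(x)

# Sequence-to-vector RNN: summarizes the frame features over time.
x = layers.LSTM(64)(x)
output = layers.Dense(num_classes, activation="softmax")(x)

video_classifier = tf.keras.Model(video, output)

# Dummy batch of 2 clips, just to check the shapes end to end.
probs = video_classifier(tf.zeros((2, frames, height, width, channels)))
print(probs.shape)
```

Here the model has a single input (the raw clip), and the LSTM input is produced by the CNN rather than supplied separately.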
                                                                                            

In [11]:
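'''Ans 4:- Using dynamic_rnn() in TensorFlow offers several advantages over
static_rnn(). static_rnn() unrolls the network at graph construction,
adding one copy of the cell per time step, while dynamic_rnn() runs the
cell inside a while_loop() operation at runtime. This produces a much
smaller graph that is faster to build and easier to inspect, and its
swap_memory option can move tensors from GPU to CPU memory during
backpropagation to avoid out-of-memory errors. dynamic_rnn() is also
easier to use: it takes and returns a single tensor covering all time
steps, so there is no need to stack, unstack, or transpose lists of
per-step tensors. For instance, in natural language processing,
sentences of varying lengths can be padded into one batch tensor, and a
seq_length tensor passed as the sequence_length argument provides the
actual sequence lengths for the variable-length input sequences.'''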
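
To make the role of the per-sequence lengths concrete, here is a NumPy sketch (not TensorFlow code) of a loop over time steps that respects `seq_length`. It mirrors the documented behaviour of `dynamic_rnn()` with a `sequence_length` argument: outputs are zero past the end of each sequence, and the final state is the state at the last valid step. The function name `dynamic_rnn_sketch`, the weights, and the sizes are all hypothetical.

```python
import numpy as np

def dynamic_rnn_sketch(inputs, seq_length, Wx, Wh, b):
    """Run a basic tanh RNN cell over time, honouring per-sequence lengths.

    inputs: (batch, max_steps, n_inputs), seq_length: (batch,)
    """
    batch, max_steps, _ = inputs.shape
    n_units = Wh.shape[0]
    state = np.zeros((batch, n_units))
    outputs = np.zeros((batch, max_steps, n_units))
    for t in range(max_steps):  # a runtime loop, not one graph copy per step
        new_state = np.tanh(inputs[:, t] @ Wx + state @ Wh + b)
        active = (t < seq_length)[:, None]  # True while real steps remain
        state = np.where(active, new_state, state)        # finished rows keep their state
        outputs[:, t] = np.where(active, new_state, 0.0)  # zeros past the end
    return outputs, state

rng = np.random.default_rng(0)
inputs = rng.normal(size=(2, 4, 3))  # 2 sequences padded to 4 steps
seq_length = np.array([4, 2])        # the second sequence has 2 real steps
Wx = rng.normal(size=(3, 5))
Wh = rng.normal(size=(5, 5))
b = np.zeros(5)

outputs, final_state = dynamic_rnn_sketch(inputs, seq_length, Wx, Wh, b)
print(outputs.shape, final_state.shape)
```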




In [16]:
'''Ans 5:- To handle variable-length input sequences and variable-length output
sequences in natural language processing tasks such as sequence-to-sequence
models, we use the following techniques.

Variable-Length Input Sequences:-

Padding: Pad shorter sequences with a special token (usually zero)
to match the length of the longest sequence in the batch. 
This ensures uniform input dimensions.

Masking: Create a binary mask that identifies valid elements in the
input sequence and ignores padded tokens during computation. 
Many deep learning frameworks provide built-in support for masking.

Variable-Length Output Sequences:-

Padding: Similar to input sequences, pad shorter output sequences to match
the length of the longest sequence in the batch. Padding can be applied to both 
target and predicted sequences. 

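Masking: Create a mask for output sequences to handle variable-length targets.
This mask ensures that loss calculations consider only valid positions
in the target sequence.

End-of-sequence token: When the output length is not known in advance,
train the model to emit an end-of-sequence (EOS) token and stop
generating as soon as it appears.'''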

import tensorflow as tf

# Replace 'input_sequences' and 'target_sequences' with your actual data
input_sequences = [
    [1, 2, 3],
    [4, 5],
    [6, 7, 8, 9]
]

target_sequences = [
    [10, 11, 12, 13],
    [14, 15],
    [16, 17, 18]
]

# Padding and masking for input sequences (assumes 0 is reserved for padding, never a real token ID)
padded_inputs = tf.keras.preprocessing.sequence.pad_sequences(input_sequences, padding='post')
input_mask = tf.math.not_equal(padded_inputs, 0)

# Padding and masking for output sequences
padded_targets = tf.keras.preprocessing.sequence.pad_sequences(target_sequences, padding='post')
output_mask = tf.math.not_equal(padded_targets, 0)

# Print the results
print("Padded Inputs:")
print(padded_inputs)
print("Input Mask:")
print(input_mask)

print("Padded Targets:")
print(padded_targets)
print("Output Mask:")
print(output_mask)

Padded Inputs:
[[1 2 3 0]
 [4 5 0 0]
 [6 7 8 9]]
Input Mask:
tf.Tensor(
[[ True  True  True False]
 [ True  True False False]
 [ True  True  True  True]], shape=(3, 4), dtype=bool)
Padded Targets:
[[10 11 12 13]
 [14 15  0  0]
 [16 17 18  0]]
Output Mask:
tf.Tensor(
[[ True  True  True  True]
 [ True  True False False]
 [ True  True  True False]], shape=(3, 4), dtype=bool)
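
The output mask is what keeps padded positions out of the loss. A minimal sketch of that step follows, assuming a hypothetical vocabulary size of 20 and using all-zero logits as a stand-in for real model predictions; only the masking arithmetic is the point.

```python
import tensorflow as tf

# Same padded targets as printed above; 0 marks padding.
padded_targets = tf.constant([
    [10, 11, 12, 13],
    [14, 15, 0, 0],
    [16, 17, 18, 0],
])

vocab_size = 20  # hypothetical; must exceed the largest token ID
# Stand-in for decoder outputs: uniform logits for every position.
logits = tf.zeros((3, 4, vocab_size))

# Per-token cross-entropy, shape (batch, steps); nothing is averaged yet.
per_token_loss = tf.keras.losses.sparse_categorical_crossentropy(
    padded_targets, logits, from_logits=True
)

# 1.0 at valid positions, 0.0 at padded positions.
mask = tf.cast(tf.math.not_equal(padded_targets, 0), per_token_loss.dtype)

# Average over valid tokens only, so padding does not dilute the loss.
masked_loss = tf.reduce_sum(per_token_loss * mask) / tf.reduce_sum(mask)
print(float(masked_loss))
```

Dividing by the number of valid tokens, rather than by the full tensor size, keeps the loss comparable across batches with different amounts of padding.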


In [19]:
'''Ans 6:- A common method for distributing training and execution of a deep recurrent
neural network (RNN) across multiple GPUs is data parallelism. This approach
replicates the full model on every GPU, splits each batch into shards, and gives
each GPU one shard. Key steps include replicating the model, processing the shards
in parallel, aggregating (averaging) the gradients, and applying the same parameter
update to every replica. For a deep RNN, another common option is to place each
layer on a different GPU. Frameworks like TensorFlow and PyTorch offer tools for
implementing data parallelism.'''
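
The gradient aggregation step can be illustrated without any GPUs. The sketch below simulates data parallelism in NumPy for a linear model with a mean-squared-error loss, both chosen only to keep the gradient short; `n_replicas` stands in for the number of GPUs, and every name here is hypothetical. With equal-sized shards, averaging the per-replica gradients gives the same result as one full-batch gradient.

```python
import numpy as np

def mse_gradient(w, X, y):
    """Gradient of the mean squared error for a linear model y_hat = X @ w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))  # one batch of 8 examples
y = rng.normal(size=8)
w = np.zeros(3)              # parameters, identical on every replica

n_replicas = 2               # stands in for the number of GPUs
X_shards = np.split(X, n_replicas)
y_shards = np.split(y, n_replicas)

# Each replica computes a gradient on its own shard (in parallel on real hardware).
replica_grads = [mse_gradient(w, Xs, ys) for Xs, ys in zip(X_shards, y_shards)]

# Aggregate: average the gradients, then apply one shared update.
avg_grad = np.mean(replica_grads, axis=0)
learning_rate = 0.1
w_new = w - learning_rate * avg_grad

full_batch_grad = mse_gradient(w, X, y)
print(np.allclose(avg_grad, full_batch_grad))
```

In TensorFlow, `tf.distribute.MirroredStrategy` is one tool that automates this replicate, shard, and average cycle across the GPUs of a single machine.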


