### Literature Review

The evolution from Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) models to Transformer architectures marks a significant milestone in natural language processing and sequence modeling. Simple RNNs were early attempts to capture sequential dependencies, but they struggle with long-range dependencies because gradients tend to vanish or explode as they are propagated back through many time steps. LSTMs mitigate this by introducing gated memory cells, which enable better learning of long-term dependencies. However, both simple RNNs and LSTMs process a sequence one step at a time, so the computation cannot be parallelized across positions.
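
To make the vanishing and exploding gradient problem concrete, the following is a minimal sketch under a strong simplification: a scalar, linear recurrence h_t = w * h_(t-1), for which the gradient of the final state with respect to the initial state is w raised to the number of steps. The function name and the weights 0.9 and 1.1 are illustrative choices, not taken from any model in this notebook.

```python
def gradient_through_time(w: float, steps: int) -> float:
    """Gradient of h_T with respect to h_0 for the linear recurrence h_t = w * h_(t-1)."""
    grad = 1.0
    for _ in range(steps):
        grad *= w  # each time step contributes one factor of w (chain rule)
    return grad

steps = 100  # illustrative sequence length
vanishing = gradient_through_time(0.9, steps)  # shrinks toward zero
exploding = gradient_through_time(1.1, steps)  # grows very large
print(f"w=0.9: {vanishing:.2e}, w=1.1: {exploding:.2e}")
```

A real RNN multiplies by a Jacobian matrix and an activation derivative at every step rather than by a single scalar, but the same compounding effect applies.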

The Transformer model, introduced by Vaswani et al. in 2017, changed sequence modeling by building the entire architecture on attention, in particular self-attention, rather than on recurrence. Because no position depends on the hidden state of the previous one, all positions in a sequence can be processed in parallel during training. The attention mechanism lets the model weigh different parts of the input sequence when making predictions. Since any two positions are connected directly rather than through a long chain of recurrent steps, the model is less exposed to the vanishing gradients that limit RNNs and captures long-range dependencies more effectively than its predecessors.
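
As a rough illustration of the attention mechanism described above, here is a minimal sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, written in plain Python so that it runs without TensorFlow. It omits the learned projections, multiple heads, and masking of a full Transformer layer, and the toy token vectors are made-up numbers.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for lists of equal-length float vectors."""
    scale = math.sqrt(len(keys[0]))
    weights = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights.append(softmax(scores))
    # Each output is a weighted average of the value vectors
    dim = len(values[0])
    outputs = [[sum(w * v[j] for w, v in zip(row, values)) for j in range(dim)]
               for row in weights]
    return outputs, weights

# Self-attention on three toy 2-dimensional token vectors (made-up numbers)
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = scaled_dot_product_attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Every row of `weights` sums to 1, and each position attends to all others in a single step, which is what removes the step-by-step dependency of recurrent models.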

In [34]:
# Import necessary libraries
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Input, Embedding, SimpleRNN, LSTM, Dense
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from transformers import AutoTokenizer, TFBertForSequenceClassification
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
import tensorflow as tf
import warnings
import pandas as pd
warnings.filterwarnings('ignore')

In [2]:
# Load IMDB dataset
max_features = 10000
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)

In [3]:
# Preprocess the data
maxlen = 100
x_train = pad_sequences(x_train, maxlen=maxlen)
x_test = pad_sequences(x_test, maxlen=maxlen)
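
As a rough illustration of what `pad_sequences` does with its default settings (`padding='pre'`, `truncating='pre'`, `value=0`), here is a plain-Python sketch. The helper `pad_or_truncate` is hypothetical and only mimics the default behaviour for a single sequence; it is not the Keras implementation.

```python
def pad_or_truncate(seq, maxlen, value=0):
    """Mimic the pad_sequences defaults for one sequence (assumes maxlen > 0)."""
    kept = list(seq)[-maxlen:]  # 'pre' truncation: drop tokens from the front
    return [value] * (maxlen - len(kept)) + kept  # 'pre' padding: fill on the left

short_review = [11, 22, 33]
long_review = [1, 2, 3, 4, 5, 6]
print(pad_or_truncate(short_review, 5))  # [0, 0, 11, 22, 33]
print(pad_or_truncate(long_review, 4))   # [3, 4, 5, 6]
```

With `maxlen = 100` and these defaults, a review longer than 100 tokens keeps only its last 100 tokens, so the models below never see the opening of long reviews.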

In [4]:
# Build the Simple RNN model
model_rnn = Sequential()
model_rnn.add(Embedding(max_features, 32, input_length=maxlen))
model_rnn.add(SimpleRNN(32))
model_rnn.add(Dense(1, activation='sigmoid'))

In [5]:
# Compile and train the Simple RNN model
model_rnn.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model_rnn.fit(x_train, y_train, epochs=10, batch_size=128, validation_split=0.2)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x7e8b4715f7f0>

In [6]:
# Evaluate the performance
accuracy_rnn = model_rnn.evaluate(x_test, y_test)[1]
print(f"Accuracy of the Simple RNN model: {accuracy_rnn}")

Accuracy of the Simple RNN model: 0.8042799830436707


In [7]:
# Build the LSTM model
model_lstm = Sequential()
model_lstm.add(Embedding(max_features, 32, input_length=maxlen))
model_lstm.add(LSTM(32))
model_lstm.add(Dense(1, activation='sigmoid'))

In [8]:
# Compile and train the LSTM model
model_lstm.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model_lstm.fit(x_train, y_train, epochs=10, batch_size=128, validation_split=0.2)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x7e8b4a04ffd0>

In [9]:
# Evaluate the performance
accuracy_lstm = model_lstm.evaluate(x_test, y_test)[1]
print(f"Accuracy of the LSTM model: {accuracy_lstm}")

Accuracy of the LSTM model: 0.8217599987983704


In [10]:
# Import necessary libraries
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense, Attention, GlobalAveragePooling1D, Dot, concatenate

In [11]:
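# Build the LSTM model with attention
input_layer = Input(shape=(maxlen,))
embedding_layer = Embedding(max_features, 32)(input_layer)
# return_sequences=True keeps one hidden state per time step: (batch, maxlen, 32)
lstm_layer, state_h, state_c = LSTM(32, return_sequences=True, return_state=True)(embedding_layer)
# Dot-product self-attention: the LSTM outputs serve as both query and value
attention = Attention()([lstm_layer, lstm_layer])
# Note: `attention` is already the attended context, shape (batch, maxlen, 32);
# this Dot layer turns it into a (batch, maxlen, maxlen) similarity map instead
context = Dot(axes=-1)([attention, lstm_layer])
merged = concatenate([context, lstm_layer])  # (batch, maxlen, maxlen + 32)
pooled = GlobalAveragePooling1D()(merged)  # average over time steps
output = Dense(1, activation='sigmoid')(pooled)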

model_lstm_attention = Model(inputs=input_layer, outputs=output)

In [12]:
# Compile and train the LSTM model with attention
model_lstm_attention.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model_lstm_attention.fit(x_train, y_train, epochs=10, batch_size=128, validation_split=0.2)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x7e8b49e1fc70>

In [13]:
# Evaluate the performance
accuracy_lstm_attention = model_lstm_attention.evaluate(x_test, y_test)[1]
print(f"Accuracy of the LSTM with Attention model: {accuracy_lstm_attention}")

Accuracy of the LSTM with Attention model: 0.7872800230979919


In [14]:
# Load the CSV file for Amazon review dataset
amazon_df = pd.read_csv('/content/sample_data/amazon_reviews_sample.csv')

In [15]:
amazon_df

Unnamed: 0.1,Unnamed: 0,score,review
0,0,1,Stuning even for the non-gamer: This sound tr...
1,1,1,The best soundtrack ever to anything.: I'm re...
2,2,1,Amazing!: This soundtrack is my favorite musi...
3,3,1,Excellent Soundtrack: I truly like this sound...
4,4,1,"Remember, Pull Your Jaw Off The Floor After H..."
...,...,...,...
9995,9995,1,A revelation of life in small town America in...
9996,9996,1,Great biography of a very interesting journal...
9997,9997,0,Interesting Subject; Poor Presentation: You'd...
9998,9998,0,Don't buy: The box looked used and it is obvi...


In [16]:
amazon_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Unnamed: 0  10000 non-null  int64 
 1   score       10000 non-null  int64 
 2   review      10000 non-null  object
dtypes: int64(2), object(1)
memory usage: 234.5+ KB


In [17]:
amazon_df = amazon_df.drop('Unnamed: 0', axis=1)

In [18]:
amazon_df

Unnamed: 0,score,review
0,1,Stuning even for the non-gamer: This sound tr...
1,1,The best soundtrack ever to anything.: I'm re...
2,1,Amazing!: This soundtrack is my favorite musi...
3,1,Excellent Soundtrack: I truly like this sound...
4,1,"Remember, Pull Your Jaw Off The Floor After H..."
...,...,...
9995,1,A revelation of life in small town America in...
9996,1,Great biography of a very interesting journal...
9997,0,Interesting Subject; Poor Presentation: You'd...
9998,0,Don't buy: The box looked used and it is obvi...


In [19]:
# Encode sentiment labels (assuming 'score' column contains labels)
label_encoder = LabelEncoder()
amazon_df['label'] = label_encoder.fit_transform(amazon_df['score'])
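
For intuition about the encoding step, the sketch below mimics what `LabelEncoder.fit_transform` does: each distinct value is replaced by its rank among the sorted distinct values. The helper `fit_transform_labels` and the toy scores are hypothetical, not part of scikit-learn or of the dataset.

```python
def fit_transform_labels(values):
    """Map each distinct value to its rank among the sorted distinct values."""
    classes = sorted(set(values))
    index = {cls: i for i, cls in enumerate(classes)}
    return [index[v] for v in values], classes

scores = [1, 1, 0, 1, 0]  # toy stand-in for the 'score' column
labels, classes = fit_transform_labels(scores)
print(labels, classes)  # [1, 1, 0, 1, 0] [0, 1]
```

If the `score` column holds only the values 0 and 1, as the rows displayed above suggest, the encoding leaves them unchanged and `label` is a copy of `score`.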

In [20]:
# Split the dataset into training and testing sets
train_amazon_df, test_amazon_df = train_test_split(amazon_df, test_size=0.2, random_state=42)

In [21]:
# Tokenize and pad sequences
max_sequence_length = 100
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")


In [22]:
# Tokenize training data; padding="max_length" pads every review to exactly
# max_sequence_length tokens, matching the fixed Input shape of the model
X_train_transformer_amazon = tokenizer(list(train_amazon_df['review']), padding="max_length", truncation=True, return_tensors="tf", max_length=max_sequence_length)

# Tokenize testing data with the same settings
X_test_transformer_amazon = tokenizer(list(test_amazon_df['review']), padding="max_length", truncation=True, return_tensors="tf", max_length=max_sequence_length)
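
The sketch below illustrates, in plain Python, how truncating and padding token ids to a fixed length works and how an attention mask marks which positions hold real tokens. The helper `pad_token_ids` and the toy ids are hypothetical; the sketch ignores special tokens such as `[CLS]` and `[SEP]` and assumes a padding id of 0 and padding on the right, as is the case for `bert-base-uncased`.

```python
def pad_token_ids(token_ids, max_length, pad_id=0):
    """Truncate or right-pad one list of token ids and build its attention mask."""
    kept = list(token_ids)[:max_length]  # truncation keeps the start of the text
    n_pad = max_length - len(kept)
    input_ids = kept + [pad_id] * n_pad  # padding goes on the right
    attention_mask = [1] * len(kept) + [0] * n_pad  # 1 = real token, 0 = padding
    return input_ids, attention_mask

print(pad_token_ids([5, 6, 7], 5))     # ([5, 6, 7, 0, 0], [1, 1, 1, 0, 0])
print(pad_token_ids([1, 2, 3, 4], 2))  # ([1, 2], [1, 1])
```

The tokenizer output contains an `attention_mask` entry alongside `input_ids`. This notebook passes only `input_ids` to the model, so padded positions are not masked out.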

In [23]:
# Build the Transformer model
model_transformer_amazon = TFBertForSequenceClassification.from_pretrained("bert-base-uncased")
input_ids_amazon = Input(shape=(max_sequence_length,), dtype=tf.int32)
# The classification head returns 2 logits by default (num_labels=2)
outputs_amazon = model_transformer_amazon(input_ids_amazon)['logits']
# Extra Dense layer maps the 2 logits to a single probability for binary cross-entropy
outputs_amazon = Dense(1, activation='sigmoid')(outputs_amazon)

model_amazon = Model(inputs=input_ids_amazon, outputs=outputs_amazon)


All PyTorch model weights were used when initializing TFBertForSequenceClassification.

Some weights or buffers of the TF 2.0 model TFBertForSequenceClassification were not initialized from the PyTorch model and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [24]:
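# Compile the model
# Note: `lr` is a deprecated alias for `learning_rate`. Some Keras releases ignore it
# with a warning and fall back to Adam's default of 1e-3, which is far higher than the
# 2e-5 intended here and may explain the near-chance accuracy of this first run.
model_amazon.compile(optimizer=Adam(lr=2e-5), loss='binary_crossentropy', metrics=['accuracy'])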

# Train the model
model_amazon.fit(X_train_transformer_amazon['input_ids'], train_amazon_df['label'].values, epochs=5, batch_size=32, validation_split=0.2)



Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.src.callbacks.History at 0x7e8b4009ae00>

In [25]:
# Evaluate the model
accuracy_amazon = model_amazon.evaluate(X_test_transformer_amazon['input_ids'], test_amazon_df['label'].values)[1]
print(f"Accuracy: {accuracy_amazon}")

Accuracy: 0.4814999997615814


In [27]:
# Define learning rate and other hyperparameters
learning_rate = 2e-5
epochs = 5
batch_size = 32

In [28]:
# Rebuild the Transformer model from the pretrained checkpoint so the earlier training run does not carry over
model_transformer_amazon = TFBertForSequenceClassification.from_pretrained("bert-base-uncased")
input_ids_amazon = Input(shape=(max_sequence_length,), dtype=tf.int32)
outputs_amazon = model_transformer_amazon(input_ids_amazon)['logits']
outputs_amazon = Dense(1, activation='sigmoid')(outputs_amazon)

model_amazon = Model(inputs=input_ids_amazon, outputs=outputs_amazon)

All PyTorch model weights were used when initializing TFBertForSequenceClassification.

Some weights or buffers of the TF 2.0 model TFBertForSequenceClassification were not initialized from the PyTorch model and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [29]:
# Compile the model with the Adam optimizer and a custom learning rate
optimizer = Adam(learning_rate=learning_rate)
model_amazon.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])

In [30]:
# Implement early stopping to prevent overfitting
early_stopping = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)
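
The patience logic behind early stopping can be sketched in a few lines of plain Python. This is a simplified illustration, not the Keras implementation: the helper `early_stopping_epoch` and the validation losses are hypothetical, and options such as `min_delta` are left out.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (last_epoch_run, best_epoch), both 0-based, for a list of validation losses."""
    best_loss = float("inf")
    best_epoch = None
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # stop; best_epoch holds the weights to restore
    return len(val_losses) - 1, best_epoch  # ran to completion

# Made-up validation losses: the best value occurs at epoch index 1
print(early_stopping_epoch([0.5, 0.4, 0.45, 0.47, 0.46, 0.3]))  # (4, 1)
# With only 5 epochs and a late minimum, training is never cut short
print(early_stopping_epoch([0.5, 0.4, 0.3, 0.35, 0.36]))        # (4, 2)
```

Under this logic, with `epochs = 5` and `patience=3`, training can only stop early if the validation loss fails to improve for three consecutive epochs, so the callback mainly serves to restore the best weights.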

In [31]:

# Save the best model during training
model_checkpoint = ModelCheckpoint('best_model.h5', save_best_only=True)

In [32]:
# Train the model with the modified hyperparameters and callbacks
history = model_amazon.fit(
    X_train_transformer_amazon['input_ids'],
    train_amazon_df['label'].values,
    epochs=epochs,
    batch_size=batch_size,
    validation_split=0.2,
    callbacks=[early_stopping, model_checkpoint]
)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [33]:
# Evaluate the model on the test set
accuracy_amazon = model_amazon.evaluate(X_test_transformer_amazon['input_ids'], test_amazon_df['label'].values)[1]
print(f"Accuracy: {accuracy_amazon}")

Accuracy: 0.9265000224113464
