<a href="https://colab.research.google.com/github/ArunKoundinya/DeepLearning/blob/main/posts/deep-learning-project-msis/AmazonReviews_Part8.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this notebook, we take the basic Attention model from the previous notebook and work through the following steps:

*   Increasing Model Complexity
*   Regularizing the Model
*   Hypertuning the Model
*   Testing the Model on 10K dataset
*   Finally testing the model on 1 Lakh (0.1 Million) dataset


## Table of Contents
- [1 - Packages](#1)
- [2 - Loading the Dataset](#2)
- [3 - Pre-Processing the Data](#3)
- [4 - Model-1 Attention Model](#4)
- [5 - Model-2 Increasing Layers and Units](#5)
- [6 - Model-3 Adding Regularization](#6)
- [7 - Model-4 Hypertuning the model](#7)
- [8 - Model-5 Testing Model on 10K dataset](#8)
- [9 - Model-6 Testing Model on 1 Lakh dataset](#9)

<a name='1'></a>
## 1 - Loading the Packages

In [None]:
!pip install pandarallel

Collecting pandarallel
  Downloading pandarallel-1.6.5.tar.gz (14 kB)
  Preparing metadata (setup.py) ... done
Collecting dill>=0.3.1 (from pandarallel)
  Downloading dill-0.3.8-py3-none-any.whl (116 kB)
     116.3/116.3 kB 3.4 MB/s eta 0:00:00
Building wheels for collected packages: pandarallel
  Building wheel for pandarallel (setup.py) ... done
  Created wheel for pandarallel: filename=pandarallel-1.6.5-py3-none-any.whl size=16673 sha256=be7381230ffce7eef1c746056f9539342fe9d214e2273eb451de3ba03a6a86ff
  Stored in directory: /root/.cache/pip/wheels/50/4f/1e/34e057bb868842209f1623f195b74fd7eda229308a7352d47f
Successfully built pandarallel
Installing collected packages: dill, pandarallel
Successfully installed dill-0.3.8 pandarallel-1.6.5


In [None]:
from google.colab import drive
import os
import pandas as pd
import numpy as np
import tensorflow as tf

from sklearn.model_selection import train_test_split

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, LSTM, SimpleRNN, GRU, Bidirectional, Input, Dropout, Flatten, Reshape
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.initializers import GlorotUniform
from tensorflow.keras.models import Model

from sklearn.metrics import accuracy_score, classification_report

from pandarallel import pandarallel

<a name='2'></a>
## 2 - Loading the Data

In [None]:
drive.mount('/content/drive')
os.chdir('/content/drive/My Drive/MSIS/IntroductiontoDeepLearning/Project/')

testdata = pd.read_csv('test_data_sample_complete.csv')
traindata = pd.read_csv('train_data_sample_complete.csv')


Mounted at /content/drive


In [None]:
pandarallel.initialize(progress_bar=True)

INFO: Pandarallel will run on 4 workers.
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.


<a name='3'></a>
## 3 - Pre-Processing the Data

In [None]:
train_data = traindata.sample(n=10000, random_state=42)
test_data = testdata.sample(n=1000, random_state=42)

train_data['class_index'] = train_data['class_index'].map({1:0, 2:1})
test_data['class_index'] = test_data['class_index'].map({1:0, 2:1})

train_data['review_combined_lemma'] = train_data['review_combined_lemma'].fillna('')
test_data['review_combined_lemma'] = test_data['review_combined_lemma'].fillna('')

X_train = train_data.review_combined_lemma
y_train = np.array(train_data.class_index)

X = test_data.review_combined_lemma
y = np.array(test_data.class_index)

tokenizer = Tokenizer(oov_token="<UNK>",)
tokenizer.fit_on_texts(X_train)

tokenizer.word_index['<PAD>'] = 0

X_sequences_train = tokenizer.texts_to_sequences(X_train)

X_sequences = tokenizer.texts_to_sequences(X)

X_train = pad_sequences(X_sequences_train, maxlen=100)
X = pad_sequences(X_sequences, maxlen=100)

X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.5, random_state=42)

print(X_train.shape)
print(X_dev.shape)
print(X_test.shape)

(10000, 100)
(500, 100)
(500, 100)
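
The `pad_sequences(..., maxlen=100)` calls above rely on the Keras defaults, which pad and truncate at the front of each sequence (`padding='pre'`, `truncating='pre'`). Below is a minimal NumPy sketch of that behaviour. The helper `pad_pre` and the toy sequences are illustrative assumptions, not part of the notebook or of Keras.

```python
import numpy as np

def pad_pre(sequences, maxlen, value=0):
    """Hypothetical stand-in that mimics pad_sequences' default 'pre' behaviour."""
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for row, seq in enumerate(sequences):
        if len(seq) == 0:
            continue  # empty review: the row stays all padding
        trimmed = seq[-maxlen:]              # keep only the LAST maxlen tokens
        out[row, -len(trimmed):] = trimmed   # left-pad with `value`
    return out

toy = [[5, 6], [1, 2, 3, 4, 5, 6], []]  # toy token ids, not from the dataset
padded = pad_pre(toy, maxlen=4)
print(padded)
```

Short reviews are left-padded with the `<PAD>` index 0, and long reviews keep only their final 100 tokens, so the start of a long review is never seen by the model.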


In [None]:
vocab_size = len(tokenizer.word_index)

In [None]:
word2idx = tokenizer.word_index

In [None]:
def load_embeddings(glove_path):
    embedding_index = {}
    with open(glove_path, encoding="utf8") as glove_file:
        for line in glove_file:
            word, coefs = line.split(maxsplit=1)
            coefs = np.fromstring(coefs, "f", sep=" ")
            embedding_index[word] = coefs
    return embedding_index

In [None]:
def create_embedding_matrix(embedding_index, word2idx, vocab_size, embedding_dim):
    # One row per vocabulary index; words without a pretrained vector stay all-zero
    mat = np.zeros((vocab_size, embedding_dim))
    for key, value in word2idx.items():
        vector = embedding_index.get(key)
        if vector is not None:
            mat[value] = vector
    return mat
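
To see what the embedding matrix looks like, here is a small self-contained sketch of the same lookup on toy data. The three-dimensional vectors, the toy vocabulary and the helper name `build_matrix` are assumptions made for illustration; the real notebook uses 200-dimensional GloVe Twitter vectors.

```python
import numpy as np

# Toy stand-ins (assumptions): not real GloVe vectors, not the real vocabulary
toy_embedding_index = {
    "good": np.array([0.1, 0.2, 0.3], dtype="float32"),
    "bad": np.array([-0.1, -0.2, -0.3], dtype="float32"),
}
toy_word2idx = {"<UNK>": 1, "good": 2, "bad": 3, "qwzx": 4, "<PAD>": 0}

def build_matrix(embedding_index, word2idx, embedding_dim):
    mat = np.zeros((len(word2idx), embedding_dim))
    for word, idx in word2idx.items():
        vector = embedding_index.get(word)
        if vector is not None:   # words without a pretrained vector stay all-zero
            mat[idx] = vector
    return mat

toy_matrix = build_matrix(toy_embedding_index, toy_word2idx, 3)
# Words whose row is all zeros had no pretrained vector
missing = [w for w, i in toy_word2idx.items() if not toy_matrix[i].any()]
print(toy_matrix.shape, missing)
```

Rows for `<PAD>`, `<UNK>` and any word absent from the pretrained vocabulary remain zero vectors, and because the `Embedding` layer is created with `trainable=False` they stay that way during training.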

In [None]:
glove_path3 = "glove.6B/glove.twitter.27B.200d.txt"
embedding_index_Twitter_200d = load_embeddings(glove_path3)

In [None]:
embedding_matrix_twitter_200d = create_embedding_matrix(embedding_index_Twitter_200d, word2idx, vocab_size, 200)

<a name='4'></a>
## 4 - Model 1 - Attention Model from Previous Notebook.

In [None]:
class Attention(tf.keras.Model):
    def __init__(self, units):
        super(Attention, self).__init__()
        self.W1 = tf.keras.layers.Dense(units, activation="tanh")
        self.V = tf.keras.layers.Dense(1)

    def call(self, features):
        # Project each timestep through a tanh Dense layer
        score = self.W1(features)

        # Reduce to one score per timestep, then softmax over the time axis to get attention weights
        attention_weights = tf.nn.softmax(self.V(score), axis=1)

        # Scale each timestep's features by its weight (no sum, so the sequence shape is kept)
        context_vector = attention_weights * features

        return context_vector
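
The computation inside `call` can be followed step by step with plain NumPy. This is only a sketch: the random arrays stand in for the trainable weights of `W1` and `V`, and the sizes are chosen to match Model 1 (8 BiLSTM features, `Attention(8)`). The `pooled` line shows the classic summed context vector for comparison; the layer above does not apply that sum.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, timesteps, dim, units = 2, 5, 8, 8

features = rng.normal(size=(batch, timesteps, dim))
# Random stand-ins (assumptions) for the weights of the two Dense layers
W1, b1 = rng.normal(size=(dim, units)), np.zeros(units)
V, bV = rng.normal(size=(units, 1)), np.zeros(1)

hidden = np.tanh(features @ W1 + b1)           # (batch, timesteps, units)
logits = hidden @ V + bV                       # (batch, timesteps, 1)
logits -= logits.max(axis=1, keepdims=True)    # numerical stability
weights = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

weighted = weights * features                  # (batch, timesteps, dim), what the layer returns
pooled = weighted.sum(axis=1)                  # (batch, dim), the classic context vector

# Trainable parameters: W1 kernel + bias, then V kernel + bias
n_params = (dim * units + units) + (units + 1)
print(weighted.shape, pooled.shape, n_params)
```

Because the output keeps the time axis, it can be fed straight into the `SimpleRNN` that follows, and the parameter count of 81 agrees with the `attention` row in the Model 1 summary.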


In [None]:
inputs = Input(shape=(100,))

embedding_layer = Embedding(input_dim=vocab_size, output_dim=200, input_length=100, weights=[embedding_matrix_twitter_200d], trainable=False)(inputs)
bilstm = Bidirectional(LSTM(4, activation='tanh', return_sequences=True))(embedding_layer)
context_vector = Attention(8)(bilstm)
simplernn = SimpleRNN(4, activation="tanh")(context_vector)
output = Dense(1, activation="sigmoid")(simplernn)

model_lstm_bi_embed_attention = Model(inputs=inputs, outputs=output)


In [None]:
model_lstm_bi_embed_attention.summary()


Model: "model_7"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_21 (InputLayer)       [(None, 100)]             0         
                                                                 
 embedding_20 (Embedding)    (None, 100, 200)          7825600   
                                                                 
 bidirectional_37 (Bidirect  (None, 100, 8)            6560      
 ional)                                                          
                                                                 
 attention_20 (Attention)    (None, 100, 8)            81        
                                                                 
 simple_rnn_19 (SimpleRNN)   (None, 4)                 52        
                                                                 
 dense_57 (Dense)            (None, 1)                 5         
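
The `Param #` column can be reproduced by hand, which is a useful check that each layer receives the input size we expect. The helper functions below are illustrative (hypothetical names) and use the standard parameter formulas for LSTM, SimpleRNN and Dense layers with biases.

```python
def lstm_params(input_dim, units):
    # 4 gates, each with an input kernel, a recurrent kernel and a bias
    return 4 * (units * (input_dim + units) + units)

def simple_rnn_params(input_dim, units):
    # one input kernel, one recurrent kernel and one bias
    return units * (input_dim + units) + units

def dense_params(input_dim, units):
    return input_dim * units + units

bilstm = 2 * lstm_params(input_dim=200, units=4)      # two directions
attention = dense_params(8, 8) + dense_params(8, 1)   # W1, then V
rnn = simple_rnn_params(input_dim=8, units=4)
head = dense_params(4, 1)
vocab_rows = 7825600 // 200                           # embedding rows implied by the summary

print(bilstm, attention, rnn, head, vocab_rows)
# prints: 6560 81 52 5 39128
```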
                                                           

In [None]:
model_lstm_bi_embed_attention.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

history_model_lstm_bi_embed_attention = model_lstm_bi_embed_attention.fit(X_train, y_train, epochs=10, batch_size=64, validation_data=(X_dev, y_dev),verbose=1)

loss, accuracy = model_lstm_bi_embed_attention.evaluate(X_test, y_test)
print(f'Test Loss: {loss}, Test Accuracy: {accuracy}')

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test Loss: 0.4064599871635437, Test Accuracy: 0.8399999737739563


<a name='5'></a>
## 5 - Model 2 - Attention Model with increased model complexity

In [None]:
inputs = Input(shape=(100,))

embedding_layer = Embedding(input_dim=vocab_size, output_dim=200, input_length=100, weights=[embedding_matrix_twitter_200d], trainable=False)(inputs)
bilstm = Bidirectional(LSTM(64, activation='tanh', return_sequences=True))(embedding_layer)
bilstm = Bidirectional(LSTM(128, activation='tanh',return_sequences=True))(bilstm)
context_vector = Attention(64)(bilstm)
simplernn = SimpleRNN(64, activation="tanh",return_sequences=True)(context_vector)

flatten = Flatten()(simplernn)

ffn = Dense(64, activation='relu')(flatten)
ffn = Dense(32, activation='relu')(ffn)

output = Dense(1, activation="sigmoid")(ffn)

model_lstm_bi_embed_attention_complex = Model(inputs=inputs, outputs=output)


In [None]:
model_lstm_bi_embed_attention_complex.summary()

Model: "model_11"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_25 (InputLayer)       [(None, 100)]             0         
                                                                 
 embedding_24 (Embedding)    (None, 100, 200)          7825600   
                                                                 
 bidirectional_44 (Bidirect  (None, 100, 128)          135680    
 ional)                                                          
                                                                 
 bidirectional_45 (Bidirect  (None, 100, 256)          263168    
 ional)                                                          
                                                                 
 attention_24 (Attention)    (None, 100, 256)          16513     
                                                                 
 simple_rnn_23 (SimpleRNN)   (None, 100, 64)           205

In [None]:
model_lstm_bi_embed_attention_complex.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

history_model_lstm_bi_embed_attention_complex = model_lstm_bi_embed_attention_complex.fit(X_train, y_train, epochs=10, batch_size=64, validation_data=(X_dev, y_dev),verbose=1)

loss, accuracy = model_lstm_bi_embed_attention_complex.evaluate(X_test, y_test)
print(f'Test Loss: {loss}, Test Accuracy: {accuracy}')

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test Loss: 0.38913553953170776, Test Accuracy: 0.8700000047683716


<a name='6'></a>
## 6 - Model 3 - Attention Model with increased model complexity and L2 regularization

In [None]:
from tensorflow.keras.regularizers import l2

inputs = Input(shape=(100,))
embedding_layer = Embedding(input_dim=vocab_size, output_dim=200, input_length=100, weights=[embedding_matrix_twitter_200d], trainable=False)(inputs)
bilstm = Bidirectional(LSTM(64, activation='tanh', return_sequences=True, kernel_regularizer=l2(0.0001)))(embedding_layer)  # Applying L2 regularization
bilstm = Bidirectional(LSTM(128, activation='tanh', return_sequences=True, kernel_regularizer=l2(0.0001)))(bilstm)  # Applying L2 regularization
context_vector = Attention(64)(bilstm)
simplernn = SimpleRNN(64, activation="tanh", return_sequences=True, kernel_regularizer=l2(0.001))(context_vector)  # Applying L2 regularization

flatten = Flatten()(simplernn)

ffn = Dense(64, activation='relu', kernel_regularizer=l2(0.01))(flatten)  # Applying L2 regularization
ffn = Dense(32, activation='relu', kernel_regularizer=l2(0.01))(ffn)  # Applying L2 regularization

output = Dense(1, activation="sigmoid")(ffn)

model_lstm_bi_embed_attention_complex_regularized = Model(inputs=inputs, outputs=output)


In [None]:
model_lstm_bi_embed_attention_complex_regularized.summary()

In [None]:
model_lstm_bi_embed_attention_complex_regularized.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

history_model_lstm_bi_embed_attention_complex_regularized = model_lstm_bi_embed_attention_complex_regularized.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_dev, y_dev),verbose=1)

loss, accuracy = model_lstm_bi_embed_attention_complex_regularized.evaluate(X_test, y_test)
print(f'Test Loss: {loss}, Test Accuracy: {accuracy}')

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test Loss: 0.4259127974510193, Test Accuracy: 0.8740000128746033
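
For reference, Keras's `l2(factor)` regularizer adds `factor * sum(w**2)` for each regularized kernel to the loss. The sketch below computes that penalty for a random kernel with the same shape as the first `Dense` layer after `Flatten` (100 timesteps x 64 RNN units = 6400 inputs, 64 outputs). The kernel values are random stand-ins, not the trained weights.

```python
import numpy as np

def l2_penalty(kernel, factor):
    """Penalty added to the loss for one kernel: factor * sum(w^2)."""
    return factor * np.sum(np.square(kernel))

rng = np.random.default_rng(0)
# Random stand-in (assumption) with the shape of the first Dense kernel after Flatten
toy_kernel = rng.normal(scale=0.05, size=(6400, 64))

for factor in (0.0001, 0.001, 0.01):
    print(factor, round(float(l2_penalty(toy_kernel, factor)), 4))
```

Since the loss reported by `evaluate` includes these penalty terms, this may partly explain why the test loss here is higher than Model 2's even though the test accuracy is slightly better.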


<a name='7'></a>
## 7 - Model 4 - Hypertuning the model parameters

In [None]:
!pip install keras-tuner


Collecting keras-tuner
  Downloading keras_tuner-1.4.7-py3-none-any.whl (129 kB)
     129.1/129.1 kB 3.9 MB/s eta 0:00:00
Collecting kt-legacy (from keras-tuner)
  Downloading kt_legacy-1.0.5-py3-none-any.whl (9.6 kB)
Installing collected packages: kt-legacy, keras-tuner
Successfully installed keras-tuner-1.4.7 kt-legacy-1.0.5


In [None]:
import keras

import keras_tuner as kt


def build_model(hp):
    inputs = Input(shape=(100,))
    embedding_layer = Embedding(input_dim=vocab_size, output_dim=200, input_length=100, weights=[embedding_matrix_twitter_200d], trainable=False)(inputs)
    bilstm = Bidirectional(LSTM(hp.Int('lstm_units', min_value=32, max_value=128, step=32), activation='tanh', return_sequences=True, kernel_regularizer=l2(0.0001)))(embedding_layer)  # Applying L2 regularization
    bilstm = Bidirectional(LSTM(hp.Int('lstm_units', min_value=32, max_value=128, step=32), activation='tanh', return_sequences=True, kernel_regularizer=l2(0.0001)))(bilstm)  # Applying L2 regularization
    context_vector = Attention(hp.Int('attention_units', min_value=32, max_value=128, step=32))(bilstm)
    simplernn = SimpleRNN(hp.Int('rnn_units', min_value=32, max_value=128, step=32), activation="tanh", return_sequences=True, kernel_regularizer=l2(0.001))(context_vector)  # Applying L2 regularization

    flatten = Flatten()(simplernn)

    ffn = Dense(hp.Int('dense_units', min_value=32, max_value=128, step=32), activation='relu', kernel_regularizer=l2(0.01))(flatten)  # Applying L2 regularization
    ffn = Dense(hp.Int('dense_units', min_value=32, max_value=128, step=32), activation='relu', kernel_regularizer=l2(0.01))(ffn)  # Applying L2 regularization

    output = Dense(1, activation="sigmoid")(ffn)

    model = Model(inputs=inputs, outputs=output)
    model.compile(optimizer=keras.optimizers.Adam(hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])),
                  loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model

tuner = kt.Hyperband(build_model,
                     objective='val_accuracy',
                     max_epochs=10,
                     factor=3,
                     directory='./my_dir',  # Saving results to the current directory
                     project_name='hyperparameter_search')

tuner.search(X_train, y_train, epochs=10, validation_data=(X_dev, y_dev))

best_hps=tuner.get_best_hyperparameters(num_trials=1)[0]

best_model = tuner.hypermodel.build(best_hps)
history = best_model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_dev, y_dev))


Trial 30 Complete [00h 03m 58s]
val_accuracy: 0.515999972820282

Best val_accuracy So Far: 0.9039999842643738
Total elapsed time: 00h 52m 31s
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [None]:
best_hps.get_config()['values']

{'lstm_units': 64,
 'attention_units': 96,
 'rnn_units': 64,
 'dense_units': 128,
 'learning_rate': 0.001,
 'tuner/epochs': 10,
 'tuner/initial_epoch': 0,
 'tuner/bracket': 0,
 'tuner/round': 0}
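
To put the 30 trials in context, the search space defined in `build_model` can be enumerated directly. This sketch assumes that `hp.Int(min_value=32, max_value=128, step=32)` covers 32, 64, 96 and 128 (upper bound inclusive). Since `lstm_units` and `dense_units` are each registered under a single name, both layers of each kind share one value, which is consistent with the single entries in the output above.

```python
from itertools import product

# Values covered by hp.Int(min_value=32, max_value=128, step=32), max inclusive
unit_choices = list(range(32, 128 + 1, 32))

space = {
    "lstm_units": unit_choices,
    "attention_units": unit_choices,
    "rnn_units": unit_choices,
    "dense_units": unit_choices,
    "learning_rate": [1e-2, 1e-3, 1e-4],
}

grid = list(product(*space.values()))
best = (64, 96, 64, 128, 1e-3)   # best hyperparameters reported above
print(len(grid), best in grid)
# prints: 768 True
```

Hyperband therefore sampled only a small fraction of the 768 possible configurations, so the reported best configuration is the best among those tried rather than a guaranteed optimum.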

<a name='8'></a>
## 8 - Model 5 - Attention Model Tuned with Hyperparameters on 10K dataset

In [None]:
from keras.callbacks import EarlyStopping

lstm_units = 64
attention_units = 96
rnn_units = 64
dense_units = 128
learning_rate = 0.001

inputs = Input(shape=(100,))
embedding_layer = Embedding(input_dim=vocab_size, output_dim=200, input_length=100, weights=[embedding_matrix_twitter_200d], trainable=False)(inputs)
bilstm = Bidirectional(LSTM(lstm_units, activation='tanh', return_sequences=True, kernel_regularizer=l2(0.0001)))(embedding_layer)
bilstm = Bidirectional(LSTM(lstm_units, activation='tanh', return_sequences=True, kernel_regularizer=l2(0.0001)))(bilstm)
context_vector = Attention(attention_units)(bilstm)
simplernn = SimpleRNN(rnn_units, activation="tanh", return_sequences=True, kernel_regularizer=l2(0.001))(context_vector)
flatten = Flatten()(simplernn)
ffn = Dense(dense_units, activation='relu', kernel_regularizer=l2(0.01))(flatten)
ffn = Dense(dense_units, activation='relu', kernel_regularizer=l2(0.01))(ffn)
output = Dense(1, activation="sigmoid")(ffn)

model_lstm_bi_embed_attention_complex_regularized_tuned = Model(inputs=inputs, outputs=output)

# Compile the model
optimizer = keras.optimizers.Adam(learning_rate)
model_lstm_bi_embed_attention_complex_regularized_tuned.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])

# Print model summary
model_lstm_bi_embed_attention_complex_regularized_tuned.summary()


Model: "model_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_4 (InputLayer)        [(None, 100)]             0         
                                                                 
 embedding_3 (Embedding)     (None, 100, 200)          7825600   
                                                                 
 bidirectional_6 (Bidirecti  (None, 100, 128)          135680    
 onal)                                                           
                                                                 
 bidirectional_7 (Bidirecti  (None, 100, 128)          98816     
 onal)                                                           
                                                                 
 attention_2 (Attention)     (None, 100, 128)          12481     
                                                                 
 simple_rnn_2 (SimpleRNN)    (None, 100, 64)           1235

In [None]:
early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

history_model_lstm_bi_embed_attention_complex_regularized_tuned = model_lstm_bi_embed_attention_complex_regularized_tuned.fit(X_train, y_train, epochs=50, batch_size=32, validation_data=(X_dev, y_dev), callbacks=[early_stopping])

# Evaluate the model
loss, accuracy = model_lstm_bi_embed_attention_complex_regularized_tuned.evaluate(X_test, y_test)
print(f'Test Loss: {loss}, Test Accuracy: {accuracy}')

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Test Loss: 0.38966041803359985, Test Accuracy: 0.8659999966621399
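
Training stopped after 7 of the 50 allowed epochs because of the `EarlyStopping` callback with `patience=5`. The logic can be sketched in plain Python as below. The function name and the validation losses are hypothetical and are not the values from this run; the sketch also ignores options such as `min_delta`.

```python
def simulate_early_stopping(val_losses, patience):
    """Return (epochs_run, best_epoch), both 1-indexed."""
    best_loss, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch   # stop: no improvement for `patience` epochs
    return len(val_losses), best_epoch

# Hypothetical validation losses, NOT the values from the run above
toy_val_losses = [0.52, 0.41, 0.43, 0.44, 0.42, 0.45, 0.47, 0.40]
epochs_run, best_epoch = simulate_early_stopping(toy_val_losses, patience=5)
print(epochs_run, best_epoch)
# prints: 7 2
```

Under this logic, a run that stops after epoch 7 with `patience=5` last improved its validation loss at epoch 2, and `restore_best_weights=True` means the evaluated model uses the weights from that epoch.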


In [None]:
loss, accuracy = model_lstm_bi_embed_attention_complex_regularized_tuned.evaluate(X_dev, y_dev)
print(f'Dev Loss: {loss}, Dev Accuracy: {accuracy}')

Dev Loss: 0.345572829246521, Dev Accuracy: 0.8960000276565552


<a name='9'></a>
## 9 - Model 6 - Attention Model Tuned with Hyperparameters on 1 Lakh (0.1 Million) dataset

In [None]:
train_data = traindata.sample(n=100000, random_state=42)
test_data = testdata.sample(n=10000, random_state=42)

train_data['class_index'] = train_data['class_index'].map({1:0, 2:1})
test_data['class_index'] = test_data['class_index'].map({1:0, 2:1})

train_data['review_combined_lemma'] = train_data['review_combined_lemma'].fillna('')
test_data['review_combined_lemma'] = test_data['review_combined_lemma'].fillna('')

X_train = train_data.review_combined_lemma
y_train = np.array(train_data.class_index)

X = test_data.review_combined_lemma
y = np.array(test_data.class_index)

tokenizer = Tokenizer(oov_token="<UNK>",)
tokenizer.fit_on_texts(X_train)

tokenizer.word_index['<PAD>'] = 0

X_sequences_train = tokenizer.texts_to_sequences(X_train)

X_sequences = tokenizer.texts_to_sequences(X)

X_train = pad_sequences(X_sequences_train, maxlen=100)
X = pad_sequences(X_sequences, maxlen=100)

X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.5, random_state=42)

print(X_train.shape)
print(X_dev.shape)
print(X_test.shape)

(100000, 100)
(5000, 100)
(5000, 100)


In [None]:
vocab_size = len(tokenizer.word_index)
word2idx = tokenizer.word_index

embedding_matrix_twitter_200d = create_embedding_matrix(embedding_index_Twitter_200d, word2idx, vocab_size, 200)

In [None]:

lstm_units = 64
attention_units = 96
rnn_units = 64
dense_units = 128
learning_rate = 0.001

inputs = Input(shape=(100,))
embedding_layer = Embedding(input_dim=vocab_size, output_dim=200, input_length=100, weights=[embedding_matrix_twitter_200d], trainable=False)(inputs)
bilstm = Bidirectional(LSTM(lstm_units, activation='tanh', return_sequences=True, kernel_regularizer=l2(0.0001)))(embedding_layer)
bilstm = Bidirectional(LSTM(lstm_units, activation='tanh', return_sequences=True, kernel_regularizer=l2(0.0001)))(bilstm)
context_vector = Attention(attention_units)(bilstm)
simplernn = SimpleRNN(rnn_units, activation="tanh", return_sequences=True, kernel_regularizer=l2(0.0001))(context_vector)
flatten = Flatten()(simplernn)
ffn = Dense(dense_units, activation='relu', kernel_regularizer=l2(0.001))(flatten)
ffn = Dense(dense_units, activation='relu', kernel_regularizer=l2(0.001))(ffn)
output = Dense(1, activation="sigmoid")(ffn)

model_lstm_bi_embed_attention_complex_regularized_tuned = Model(inputs=inputs, outputs=output)

# Compile the model
optimizer = keras.optimizers.Adam(learning_rate)
model_lstm_bi_embed_attention_complex_regularized_tuned.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])

# Print model summary
model_lstm_bi_embed_attention_complex_regularized_tuned.summary()

Model: "model_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_6 (InputLayer)        [(None, 100)]             0         
                                                                 
 embedding_5 (Embedding)     (None, 100, 200)          36072400  
                                                                 
 bidirectional_10 (Bidirect  (None, 100, 128)          135680    
 ional)                                                          
                                                                 
 bidirectional_11 (Bidirect  (None, 100, 128)          98816     
 ional)                                                          
                                                                 
 attention_4 (Attention)     (None, 100, 128)          12481     
                                                                 
 simple_rnn_4 (SimpleRNN)    (None, 100, 64)           1235

In [None]:
early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

history_model_lstm_bi_embed_attention_complex_regularized_tuned = model_lstm_bi_embed_attention_complex_regularized_tuned.fit(X_train, y_train, epochs=50, batch_size=32, validation_data=(X_dev, y_dev), callbacks=[early_stopping])

# Evaluate the model
loss, accuracy = model_lstm_bi_embed_attention_complex_regularized_tuned.evaluate(X_test, y_test)
print(f'Test Loss: {loss}, Test Accuracy: {accuracy}')

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Test Loss: 0.24991442263126373, Test Accuracy: 0.9125999808311462


In [None]:
import pickle

with open('model_lstm_bi_embed_attention_complex_regularized_tuned.pkl', 'wb') as f:
    pickle.dump(model_lstm_bi_embed_attention_complex_regularized_tuned, f)