<a href="https://colab.research.google.com/github/ArunKoundinya/DeepLearning/blob/main/posts/deep-learning-project-msis/AmazonReviews_Part7.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Amazon Reviews Sentiment Analysis - Part 7

Exploration of attention and self-attention on top of the existing BiLSTM model.

Conclusion: the simpler custom-built attention model is slightly better, with shorter computation time and marginally higher test accuracy (0.9026, versus 0.9008 for the advanced custom model and 0.9010 for the self-attention model).

Using attention improved on the accuracy of the earlier BiLSTM model.

The simpler model will be used for further fine-tuning.

## Table of Contents
- [1 - Loading the Packages](#1)
- [2 - Loading the Data](#2)
- [3 - Pre-Processing the Data](#3)
- [4 - Model-1 Basic Custom Attention Model](#4)
- [5 - Model-2 Advanced Custom Attention Model](#5)
- [6 - Model-3 Self-Attention Model](#6)

<a name='1'></a>
## 1 - Loading the Packages

In [None]:
!pip install pandarallel

Collecting pandarallel
  Downloading pandarallel-1.6.5.tar.gz (14 kB)
  Preparing metadata (setup.py) ... done
Collecting dill>=0.3.1 (from pandarallel)
  Downloading dill-0.3.8-py3-none-any.whl (116 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 116.3/116.3 kB 1.6 MB/s eta 0:00:00
Building wheels for collected packages: pandarallel
  Building wheel for pandarallel (setup.py) ... done
  Created wheel for pandarallel: filename=pandarallel-1.6.5-py3-none-any.whl size=16673 sha256=a780718bc84d983454f723a074355c10b8cea74b700d7a1ea4588353d1cc07cb
  Stored in directory: /root/.cache/pip/wheels/50/4f/1e/34e057bb868842209f1623f195b74fd7eda229308a7352d47f
Successfully built pandarallel
Installing collected packages: dill, pandarallel
Successfully installed dill-0.3.8 pandarallel-1.6.5


In [None]:
from google.colab import drive
import os
import pandas as pd
import numpy as np
import tensorflow as tf

from sklearn.model_selection import train_test_split

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
# Keras' built-in Attention layer is not imported: section 4 defines a custom class with that name
from tensorflow.keras.layers import Dense, Embedding, LSTM, SimpleRNN, GRU, Bidirectional, Input, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.initializers import GlorotUniform
from tensorflow.keras.models import Model

from sklearn.metrics import accuracy_score, classification_report

from pandarallel import pandarallel

<a name='2'></a>
## 2 - Loading the Data

In [None]:
drive.mount('/content/drive')
os.chdir('/content/drive/My Drive/MSIS/IntroductiontoDeepLearning/Project/')

testdata = pd.read_csv('test_data_sample_complete.csv')
traindata = pd.read_csv('train_data_sample_complete.csv')


Mounted at /content/drive


In [None]:
pandarallel.initialize(progress_bar=True)

INFO: Pandarallel will run on 20 workers.
INFO: Pandarallel will use Memory file system to transfer data between the main process and workers.


<a name='3'></a>
## 3 - Pre-Processing the Data

In [None]:
train_data = traindata.sample(n=100000, random_state=42)
test_data = testdata.sample(n=10000, random_state=42)

train_data['class_index'] = train_data['class_index'].map({1:0, 2:1})
test_data['class_index'] = test_data['class_index'].map({1:0, 2:1})

train_data['review_combined_lemma'] = train_data['review_combined_lemma'].fillna('')
test_data['review_combined_lemma'] = test_data['review_combined_lemma'].fillna('')

X_train = train_data.review_combined_lemma
y_train = np.array(train_data.class_index)

X = test_data.review_combined_lemma
y = np.array(test_data.class_index)

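# Fit the vocabulary on the training text only; unseen words map to <UNK>
tokenizer = Tokenizer(oov_token="<UNK>")
tokenizer.fit_on_texts(X_train)

# Register index 0 as padding so that len(word_index) covers every index in use
tokenizer.word_index['<PAD>'] = 0

X_sequences_train = tokenizer.texts_to_sequences(X_train)
X_sequences = tokenizer.texts_to_sequences(X)

# Pad or truncate every review to 100 tokens
X_train = pad_sequences(X_sequences_train, maxlen=100)
X = pad_sequences(X_sequences, maxlen=100)

# Split the 10,000 held-out reviews evenly into dev and test sets
X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.5, random_state=42)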

print(X_train.shape)
print(X_dev.shape)
print(X_test.shape)

(100000, 100)
(5000, 100)
(5000, 100)
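
Because `pad_sequences` is called with only `maxlen=100`, it falls back to its defaults, `padding='pre'` and `truncating='pre'`: short reviews are left-padded with 0 and long reviews keep only their last 100 tokens. The sketch below mimics that behaviour in plain NumPy on a toy input. `pad_pre` is a hypothetical helper written for illustration, not the Keras implementation.

```python
import numpy as np

def pad_pre(sequences, maxlen, value=0):
    """Mimic the pad_sequences defaults: left-pad short rows, keep the tail of long rows."""
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for row, seq in enumerate(sequences):
        if not seq:
            continue                      # an empty review stays all padding
        tail = seq[-maxlen:]              # truncating='pre' drops tokens from the front
        out[row, -len(tail):] = tail      # padding='pre' leaves the zeros on the left
    return out

toy = [[5, 3], [7, 8, 9, 10, 11, 12], []]
padded = pad_pre(toy, maxlen=4)
print(padded)
```

The empty row corresponds to reviews whose text was missing and replaced with `''` by `fillna('')`.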


In [None]:
vocab_size = len(tokenizer.word_index)

In [None]:
word2idx = tokenizer.word_index

In [None]:
def load_embeddings(glove_path):
    embedding_index = {}
    with open(glove_path, encoding="utf8") as glove_file:
        for line in glove_file:
            word, coefs = line.split(maxsplit=1)
            coefs = np.fromstring(coefs, "f", sep=" ")
            embedding_index[word] = coefs
    return embedding_index

In [None]:
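def create_embedding_matrix(embedding_index, word2idx, vocab_size, embedding_dim):
    # Rows start as zeros, so words without a pretrained vector keep an all-zero row
    mat = np.zeros((vocab_size, embedding_dim))
    for word, idx in word2idx.items():
        vector = embedding_index.get(word)
        if vector is not None:
            mat[idx] = vector
    # Guard against NaN coefficients read from the embedding file
    mat[np.isnan(mat)] = 0
    return mat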
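
To make the two helpers above concrete, here is a small sketch that parses a few made-up GloVe-style lines and builds the matching embedding matrix. The tokens, the 3-dimensional vectors, and the `toy_*` names are all illustrative assumptions. The real file holds 200 coefficients per token.

```python
import numpy as np

# Hypothetical 3-d "GloVe" lines: a token followed by its coefficients
toy_lines = ["book 0.25 -0.5 1.0", "good 0.5 0.5 -1.0"]
toy_index = {}
for line in toy_lines:
    word, coefs = line.split(maxsplit=1)
    toy_index[word] = np.array([float(c) for c in coefs.split()], dtype="float32")

# Toy vocabulary laid out like tokenizer.word_index plus the <PAD> entry
toy_word2idx = {"<PAD>": 0, "<UNK>": 1, "book": 2, "good": 3, "kindle": 4}

toy_matrix = np.zeros((len(toy_word2idx), 3))
for word, idx in toy_word2idx.items():
    vector = toy_index.get(word)
    if vector is not None:        # tokens without a pretrained vector stay all-zero
        toy_matrix[idx] = vector

print(toy_matrix)
```

Since the embedding layer is later created with `trainable=False`, any row left at zero here stays at zero during training.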

In [None]:
glove_path3 = "glove.6B/glove.twitter.27B.200d.txt"
embedding_index_Twitter_200d = load_embeddings(glove_path3)

In [None]:
embedding_matrix_twitter_200d = create_embedding_matrix(embedding_index_Twitter_200d, word2idx, vocab_size, 200)
embedding_matrix_twitter_200d[word2idx["book"]]

array([-0.43551999,  0.16238999, -0.29269999, -0.29675001, -0.34759   ,
       -0.47275001,  0.8125    ,  0.25753999,  0.063817  , -0.39695999,
       -0.63590002,  0.27177   , -0.62805003, -0.56298   ,  0.18736   ,
       -0.2068    , -0.24707   ,  0.16885   ,  0.50615001,  0.031079  ,
        0.16841   , -0.87362999, -0.11618   ,  0.10592   , -0.35339999,
        0.65625   ,  0.070923  ,  0.098416  ,  0.47573   , -0.12987   ,
        0.22447   ,  0.69542003, -0.47979999, -0.16331001, -0.58661997,
        0.039876  ,  0.51730001, -0.081318  ,  0.33581001, -0.28227001,
        0.097423  ,  0.086391  , -0.012591  , -0.31064001,  0.049688  ,
        0.51059002, -0.25094   , -0.014923  ,  0.12813   , -0.20479999,
       -0.54636002, -0.055901  , -0.84912997, -0.23548   ,  0.17764001,
       -0.31343001,  0.34996   , -0.82489997, -0.17274   , -0.15154   ,
        0.33089   , -0.30372   ,  0.010554  , -0.078452  , -0.36133999,
        0.41997001, -0.15302999, -0.32323   ,  0.63178003, -0.09

In [None]:
y_train.shape

(100000,)

<a name='4'></a>
## 4 - Model-1 Basic Custom Attention Model

In [None]:
class Attention(tf.keras.Model):
    """Additive attention that re-weights each timestep of the BiLSTM output."""

    def __init__(self, units):
        super(Attention, self).__init__()
        self.W1 = tf.keras.layers.Dense(units, activation="tanh")
        self.V = tf.keras.layers.Dense(1)

    def call(self, features):
        # Project each timestep through a tanh layer: (batch, timesteps, units)
        score = self.W1(features)

        # One scalar score per timestep, normalised with softmax over the time axis
        attention_weights = tf.nn.softmax(self.V(score), axis=1)

        # Scale every timestep by its weight. The time axis is kept (no sum over
        # timesteps), so the result is still a sequence that SimpleRNN can consume.
        context_vector = attention_weights * features

        return context_vector
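
The computation inside `call` can be traced in plain NumPy. The sketch below uses random stand-ins for the BiLSTM outputs and for the two dense layers (all values and the toy sizes are assumptions), and shows the difference between the weighted sequence this layer returns and the pooled context vector of a more conventional attention layer.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, timesteps, feat, units = 2, 5, 8, 8    # toy sizes; the notebook uses 100 timesteps

features = rng.normal(size=(batch, timesteps, feat))    # stand-in for BiLSTM outputs
W1 = rng.normal(size=(feat, units))
b1 = np.zeros(units)
V = rng.normal(size=(units, 1))
bv = np.zeros(1)

hidden = np.tanh(features @ W1 + b1)          # (batch, timesteps, units)
scores = hidden @ V + bv                      # (batch, timesteps, 1)

# Softmax over the time axis (axis=1): one weight per timestep
exp = np.exp(scores - scores.max(axis=1, keepdims=True))
weights = exp / exp.sum(axis=1, keepdims=True)

weighted = weights * features                 # (batch, timesteps, feat): what the layer returns
pooled = weighted.sum(axis=1)                 # (batch, feat): the usual summed context vector

print(weighted.shape, pooled.shape)
```

Keeping the time axis is what allows the following `SimpleRNN` layer to read the attended sequence step by step.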


In [None]:
inputs = Input(shape=(100,))

embedding_layer = Embedding(input_dim=vocab_size, output_dim=200, input_length=100, weights=[embedding_matrix_twitter_200d], trainable=False)(inputs)
bilstm = Bidirectional(LSTM(4, activation='tanh', return_sequences=True))(embedding_layer)
context_vector = Attention(8)(bilstm)
simplernn = SimpleRNN(4, activation="tanh")(context_vector)
output = Dense(1, activation="sigmoid")(simplernn)

model_lstm_bi_embed_attention = Model(inputs=inputs, outputs=output)


In [None]:
model_lstm_bi_embed_attention.summary()


Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_2 (InputLayer)        [(None, 100)]             0         
                                                                 
 embedding_1 (Embedding)     (None, 100, 200)          36072400  
                                                                 
 bidirectional_1 (Bidirectio  (None, 100, 8)           6560      
 nal)                                                            
                                                                 
 attention_1 (Attention)     (None, 100, 8)            81        
                                                                 
 simple_rnn_1 (SimpleRNN)    (None, 4)                 52        
                                                                 
 dense_5 (Dense)             (None, 1)                 5         
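
The parameter counts in this summary can be checked by hand. The arithmetic below is a sketch using the standard formulas for LSTM, SimpleRNN, and Dense layers. The variable names are chosen here for illustration.

```python
embedding_dim, lstm_units, att_units, rnn_units = 200, 4, 8, 4
bilstm_out = 2 * lstm_units    # forward and backward states are concatenated

# LSTM: 4 gates, each with an input kernel, a recurrent kernel and a bias; two directions
bilstm_params = 2 * 4 * lstm_units * (embedding_dim + lstm_units + 1)
# Attention: Dense(att_units) on the BiLSTM output, followed by Dense(1)
attention_params = (bilstm_out * att_units + att_units) + (att_units * 1 + 1)
# SimpleRNN: input kernel, recurrent kernel and bias
rnn_params = rnn_units * (bilstm_out + rnn_units + 1)
# Output layer: one weight per RNN unit plus a bias
dense_params = rnn_units * 1 + 1
# The frozen embedding holds vocab_size * embedding_dim weights
vocab_size_implied = 36072400 // embedding_dim

print(bilstm_params, attention_params, rnn_params, dense_params, vocab_size_implied)
```

Almost all of the 36 million parameters sit in the frozen embedding. The trainable part of the network is under 7,000 weights.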
                                                           

In [None]:
model_lstm_bi_embed_attention.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

history_model_lstm_bi_embed_attention = model_lstm_bi_embed_attention.fit(X_train, y_train, epochs=50, batch_size=64, validation_data=(X_dev, y_dev),verbose=1)

loss, accuracy = model_lstm_bi_embed_attention.evaluate(X_test, y_test)
print(f'Test Loss: {loss}, Test Accuracy: {accuracy}')

Epoch 1/50
...
Epoch 50/50
Test Loss: 0.26584601402282715, Test Accuracy: 0.9025999903678894


<a name='5'></a>
## 5 - Model-2 Advanced Custom Attention Model

In [None]:
class Attention_Update(tf.keras.Model):
    def __init__(self, units):
        super(Attention_Update, self).__init__()
        self.units = units
        self.W1 = tf.keras.layers.Dense(units, activation="tanh")
        self.V = tf.keras.layers.Dense(1)

    def build(self, input_shape):
        # Projections for the attended features (Wa) and the raw features (Wb)
        self.Wa = self.add_weight(name="att_weight_1", shape=(input_shape[-1], self.units),
                                 initializer="normal")
        self.Wb = self.add_weight(name="att_weight_2", shape=(input_shape[-1], self.units),
                                 initializer="normal")
        # One bias row per timestep, which ties the layer to a fixed sequence length
        self.b = self.add_weight(name="att_bias_2", shape=(input_shape[1], self.units),
                                 initializer="zeros")

        super(Attention_Update, self).build(input_shape)

    def call(self, features):
        # Project each timestep through a tanh layer: (batch, timesteps, units)
        score = self.W1(features)

        # One scalar score per timestep, normalised with softmax over the time axis
        attention_weights = tf.nn.softmax(self.V(score), axis=1)

        # Scale every timestep by its weight; the time axis is kept (no sum)
        context_vector = attention_weights * features

        # Combine the attended and the raw features into a new hidden sequence
        new_hidden_state = tf.tanh(tf.matmul(context_vector, self.Wa) + tf.matmul(features, self.Wb) + self.b)
        return new_hidden_state
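
The extra step in this layer, and where its parameters come from, can be sketched in NumPy. The inputs and weights below are random stand-ins, and the uniform attention weights are a placeholder assumption. Only the shapes follow the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
timesteps, feat, units = 100, 8, 8      # sequence length and BiLSTM width used in the notebook

features = rng.normal(size=(1, timesteps, feat))         # stand-in for BiLSTM outputs
weights = np.full((1, timesteps, 1), 1.0 / timesteps)    # placeholder attention weights
context = weights * features

Wa = rng.normal(scale=0.05, size=(feat, units))
Wb = rng.normal(scale=0.05, size=(feat, units))
b = np.zeros((timesteps, units))        # one bias row per timestep

# Mix the attended and raw features, then squash with tanh
new_hidden = np.tanh(context @ Wa + features @ Wb + b)   # (1, timesteps, units)

# Parameter count: Dense(units) + Dense(1) + Wa + Wb + b
n_params = (feat * units + units) + (units + 1) + Wa.size + Wb.size + b.size
print(new_hidden.shape, n_params)
```

The per-timestep bias contributes 800 of the layer's parameters, which is why this layer is so much larger than the basic attention layer in section 4.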


In [None]:
inputs = Input(shape=(100,))

embedding_layer = Embedding(input_dim=vocab_size, output_dim=200, input_length=100, weights=[embedding_matrix_twitter_200d], trainable=False)(inputs)
bilstm = Bidirectional(LSTM(4, activation='tanh', return_sequences=True))(embedding_layer)
new_hidden_state = Attention_Update(8)(bilstm)
simplernn = SimpleRNN(4, activation="tanh")(new_hidden_state)
output = Dense(1, activation="sigmoid")(simplernn)

model_lstm_bi_embed_attention2 = Model(inputs=inputs, outputs=output)


In [None]:
model_lstm_bi_embed_attention2.summary()

Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_2 (InputLayer)        [(None, 100)]             0         
                                                                 
 embedding_1 (Embedding)     (None, 100, 200)          36072400  
                                                                 
 bidirectional_2 (Bidirectio  (None, 100, 8)           6560      
 nal)                                                            
                                                                 
 attention__update_1 (Attent  (None, 100, 8)           1009      
 ion_Update)                                                     
                                                                 
 simple_rnn (SimpleRNN)      (None, 4)                 52        
                                                                 
 dense_5 (Dense)             (None, 1)                 5   

In [None]:
model_lstm_bi_embed_attention2.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

history_model_lstm_bi_embed_attention2 = model_lstm_bi_embed_attention2.fit(X_train, y_train, epochs=50, batch_size=64, validation_data=(X_dev, y_dev),verbose=1)

loss, accuracy = model_lstm_bi_embed_attention2.evaluate(X_test, y_test)
print(f'Test Loss: {loss}, Test Accuracy: {accuracy}')

Epoch 1/50
...
Epoch 50/50
Test Loss: 0.2995133697986603, Test Accuracy: 0.9007999897003174


<a name='6'></a>
## 6 - Model-3 Self-Attention Model

In [None]:
pip install keras-self-attention


Collecting keras-self-attention
  Downloading keras-self-attention-0.51.0.tar.gz (11 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: keras-self-attention
  Building wheel for keras-self-attention (setup.py) ... done
  Created wheel for keras-self-attention: filename=keras_self_attention-0.51.0-py3-none-any.whl size=18894 sha256=282baf05241afc51f4e831e3dae5a0b695f8ef681c37f2ac35ecea90e9b5e66f
  Stored in directory: /root/.cache/pip/wheels/b8/f7/24/607b483144fb9c47b4ba2c5fba6b68e54aeee2d5bf6c05302e
Successfully built keras-self-attention
Installing collected packages: keras-self-attention
Successfully installed keras-self-attention-0.51.0


In [None]:
from keras_self_attention import SeqSelfAttention

In [None]:
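from keras.initializers import GlorotNormal

# This initializer is created but not passed to any layer below,
# so it has no effect on the model as written.
initializer = GlorotNormal(seed=42)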

inputs = Input(shape=(100,))
embedding_layer = Embedding(input_dim=vocab_size, output_dim=200, input_length=100, weights=[embedding_matrix_twitter_200d], trainable=False)(inputs)
bilstm = Bidirectional(LSTM(4, activation='tanh', return_sequences=True))(embedding_layer)
context_vector = SeqSelfAttention(attention_activation='sigmoid')(bilstm)
simplernn = SimpleRNN(4, activation="tanh")(context_vector)
output = Dense(1, activation="sigmoid")(simplernn)

model_lstm_bi_embed_selfattention = Model(inputs=inputs, outputs=output)

In [None]:
model_lstm_bi_embed_selfattention.summary()

Model: "model_8"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_9 (InputLayer)        [(None, 100)]             0         
                                                                 
 embedding_8 (Embedding)     (None, 100, 200)          36072400  
                                                                 
 bidirectional_9 (Bidirectio  (None, 100, 8)           6560      
 nal)                                                            
                                                                 
 seq_self_attention_5 (SeqSe  (None, 100, 8)           577       
 lfAttention)                                                    
                                                                 
 simple_rnn_7 (SimpleRNN)    (None, 4)                 52        
                                                                 
 dense_14 (Dense)            (None, 1)                 5   

In [None]:
model_lstm_bi_embed_selfattention.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

history_model_lstm_bi_embed_selfattention = model_lstm_bi_embed_selfattention.fit(X_train, y_train, epochs=50, batch_size=64, validation_data=(X_dev, y_dev),verbose=1)

loss, accuracy = model_lstm_bi_embed_selfattention.evaluate(X_test, y_test)
print(f'Test Loss: {loss}, Test Accuracy: {accuracy}')

Epoch 1/50
...
Epoch 50/50
Test Loss: 0.2574847638607025, Test Accuracy: 0.9010000228881836
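
For a side-by-side view, the three test results reported above can be collected and ranked. This is only a convenience sketch. The dictionary keys are labels chosen here, and the numbers are copied from the outputs in sections 4 to 6.

```python
# (test loss, test accuracy) as printed by each model's evaluate() call
results = {
    "Model-1 basic custom attention": (0.26584601402282715, 0.9025999903678894),
    "Model-2 advanced custom attention": (0.2995133697986603, 0.9007999897003174),
    "Model-3 SeqSelfAttention": (0.2574847638607025, 0.9010000228881836),
}

# Rank by accuracy, highest first
ranked = sorted(results.items(), key=lambda item: item[1][1], reverse=True)
for name, (loss, acc) in ranked:
    print(f"{name:<36} loss={loss:.4f}  accuracy={acc:.4f}")

best = ranked[0][0]
accuracy_spread = ranked[0][1][1] - ranked[-1][1][1]
```

The accuracies differ by less than 0.2 percentage points on 5,000 test reviews, and the self-attention model has the lowest test loss, so the ranking should be read as a slight preference rather than a clear separation.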
