# **Sentiment Analysis**

In [2]:
!python -m spacy download en_core_web_lg

2023-04-26 14:02:54.987812: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting en-core-web-lg==3.5.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.5.0/en_core_web_lg-3.5.0-py3-none-any.whl (587.7 MB)
Installing collected packages: en-core-web-lg
Successfully installed en-core-web-lg-3.5.0
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_lg')


In [3]:
!pip install vaderSentiment

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting vaderSentiment
  Downloading vaderSentiment-3.3.2-py2.py3-none-any.whl (125 kB)
Installing collected packages: vaderSentiment
Successfully installed vaderSentiment-3.3.2


In [4]:
# import some relevant libraries first

# for data handling and manipulation
import pandas as pd
import numpy as np
# for natural language processing (NLP) tasks
import nltk
# for advanced NLP tasks; used below for tokenization, stop-word detection, and lemmatization
import spacy 
# for tokenization of text into words
from nltk.tokenize import word_tokenize
# for accessing stop words used in English language
from nltk.corpus import stopwords
# for lemmatization of words
from nltk.stem import WordNetLemmatizer
# for splitting data into training and testing sets
from sklearn.model_selection import train_test_split
# for VADER, a lexicon- and rule-based sentiment analysis tool attuned to sentiment expressed in social media
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
# for evaluating model performance with accuracy, precision, recall, and F1-score
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score


## Data preprocessing

## Dataset 

-  'IMDB Dataset.csv' file

In [5]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [6]:
# Load the data and check column details in table 
data = pd.read_csv('/content/drive/MyDrive/IMDB Dataset.csv')
data

Unnamed: 0,review,sentiment
0,One of the other reviewers has mentioned that ...,positive
1,A wonderful little production. <br /><br />The...,positive
2,I thought this was a wonderful way to spend ti...,positive
3,Basically there's a family where a little boy ...,negative
4,"Petter Mattei's ""Love in the Time of Money"" is...",positive
...,...,...
49995,I thought this movie did a down right good job...,positive
49996,"Bad plot, bad dialogue, bad acting, idiotic di...",negative
49997,I am a Catholic taught in parochial elementary...,negative
49998,I'm going to have to disagree with the previou...,negative


### Preprocess the data, including tokenizing the text, removing stop words, converting the text into lowercase, and lemmatization. 

In [8]:
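# Keep the tagger enabled: in spaCy v3 the rule-based lemmatizer depends on the
# part-of-speech tags it produces, so only the parser and NER are disabled for speed.
nlp = spacy.load("en_core_web_lg", disable=['parser', 'ner'])

def normalize(review, lowercase, remove_stopwords):
    """Lowercase (optionally), lemmatize, and optionally drop stop words from one review."""
    if lowercase:
        review = review.lower()
    doc = nlp(review)
    lemmatized = list()
    for token in doc:
        # Keep every token unless stop-word removal is on and the token is a stop word
        if not remove_stopwords or not token.is_stop:
            lemmatized.append(token.lemma_)
    return " ".join(lemmatized)

# Note: the models below are trained on the raw 'review' column, not on 'processed'
data['processed'] = data['review'].apply(normalize, lowercase=True, remove_stopwords=True)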

### Partition the movie reviews into the training and test sets with an 80-20 split. Please make sure that the target variable in the training set and test sets follow the same distribution.

In [9]:
# Split the data into training and test sets; stratify=data['sentiment'] keeps the class distribution of the target variable the same in both sets.
X_train, X_test, Y_train, Y_test = train_test_split(data['review'], data['sentiment'], test_size=0.2, random_state=21, stratify=data['sentiment'])
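
A quick way to confirm that stratification worked is to compare the class proportions of the two label sets. The sketch below is only an illustration: `class_proportions` is a hypothetical helper, and the toy lists stand in for `Y_train` and `Y_test` (they are not the IMDB labels).

```python
from collections import Counter

def class_proportions(labels):
    """Return {label: fraction of samples} for an iterable of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

# Toy stand-ins for Y_train and Y_test (hypothetical data, 80-20 split)
toy_train = ['positive'] * 40 + ['negative'] * 40
toy_test = ['positive'] * 10 + ['negative'] * 10

train_props = class_proportions(toy_train)
test_props = class_proportions(toy_test)
print(train_props)
print(test_props)
```

With the real data, the same helper could be called on `Y_train` and `Y_test`; matching proportions indicate that both sets follow the same target distribution.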

## Lexicon-based sentiment analysis

### Build a lexicon-based sentiment analysis with VADER Sentiment Analysis tool.

In [8]:
# Initialize the SentimentIntensityAnalyzer object
analyzer = SentimentIntensityAnalyzer()

# Define a function to calculate the sentiment scores using VADER
def vader_sentiment_scores(review):
    sentiment_scores = analyzer.polarity_scores(review)
    return sentiment_scores['compound']

# Apply the function to the raw review text (X_train and X_test come from the 'review' column) to get the sentiment scores
X_train_scores = X_train.apply(vader_sentiment_scores)
X_test_scores = X_test.apply(vader_sentiment_scores)

# Threshold the compound scores to classify each review as positive (1) or negative (0)
# (the threshold is 0 since this dataset has only positive and negative sentiment labels)
Y_train_pred = (X_train_scores > 0).astype(int)
Y_test_pred = (X_test_scores > 0).astype(int)
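
The thresholding step can be sketched in isolation. This is a minimal illustration, assuming compound scores in the range [-1, 1]; `label_from_compound` and the example scores are hypothetical. Note that a score of exactly 0 falls into the negative class under the `> 0` rule. The VADER README suggests a neutral band between -0.05 and 0.05, which a two-class dataset like this one has no label for.

```python
def label_from_compound(score, threshold=0.0):
    """Map a compound score in [-1, 1] to 1 (positive) or 0 (negative)."""
    return int(score > threshold)

# Hypothetical compound scores, not taken from the dataset
example_scores = [0.92, 0.0, -0.35, 0.04]
example_labels = [label_from_compound(s) for s in example_scores]
print(example_labels)
```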


### Evaluate the performance of my model using accuracy, precision, recall, and F1 score.

**NOTE: 0= Negative and 1= Positive**
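
The 0/1 mapping follows from how `LabelEncoder` assigns integers: it sorts the distinct class labels, so `negative` comes before `positive`. A small pure-Python sketch of the same idea, using a hypothetical list of labels:

```python
# Hypothetical labels standing in for Y_train
toy_labels = ['positive', 'negative', 'negative', 'positive']

# Sort the distinct classes and number them in that order
classes = sorted(set(toy_labels))
mapping = {label: idx for idx, label in enumerate(classes)}
encoded = [mapping[label] for label in toy_labels]

print(mapping)
print(encoded)
```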

In [9]:
from sklearn.preprocessing import LabelEncoder

# Initialize the label encoder object
label_encoder = LabelEncoder()

# Fit the label encoder on the training labels
label_encoder.fit(Y_train)

# Convert the string labels to integers
Y_train_encoded = label_encoder.transform(Y_train)
Y_test_encoded = label_encoder.transform(Y_test)

# Evaluate the performance of your model using accuracy, precision, recall, and F1 score
from sklearn.metrics import classification_report

print('Test classification report:')
print(classification_report(Y_test_encoded, Y_test_pred))


Test classification report:
              precision    recall  f1-score   support

           0       0.80      0.53      0.64      5000
           1       0.65      0.87      0.74      5000

    accuracy                           0.70     10000
   macro avg       0.72      0.70      0.69     10000
weighted avg       0.72      0.70      0.69     10000
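
As a sanity check on the report above, the F1-scores can be recomputed from the printed precision and recall values. The sketch below uses only the rounded numbers from the report, so it checks consistency to two decimals and nothing more. `f1_from` is a hypothetical helper name.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) per class, copied from the report above
reported = {0: (0.80, 0.53, 0.64), 1: (0.65, 0.87, 0.74)}

for label, (p, r, f1) in reported.items():
    print(label, round(f1_from(p, r), 2), f1)

# With 5000 reviews in each class, accuracy equals the mean of the two recalls
mean_recall = round((0.53 + 0.87) / 2, 2)
print(mean_recall)
```

The low recall for class 0 (0.53) shows that many negative reviews receive a positive compound score, which is where most of the lexicon-based errors come from.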



## Naive Bayes model for sentiment analysis

In [10]:
# Preprocessing and bag-of-words vectorization using CountVectorizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

configs = [
    {
        'vectorizer': CountVectorizer(max_features=1500),
        'model': MultinomialNB(alpha=1.0)
    },
    {
        'vectorizer': CountVectorizer(max_features=2500),
        'model': MultinomialNB(alpha=0.5)
    },
    {
        'vectorizer': CountVectorizer(max_features=5000),
        'model': MultinomialNB(alpha=0.1)
    }
]

results = []
for config in configs:
    # Vectorize the text data
    vectorizer = config['vectorizer']
    X_train_vectorized = vectorizer.fit_transform(X_train)
    X_test_vectorized = vectorizer.transform(X_test)

    # Train a Naive Bayes model
    nb = config['model']
    nb.fit(X_train_vectorized, Y_train)

    # Make predictions and evaluate the model
    Y_pred = nb.predict(X_test_vectorized)
    v_performance = metrics.classification_report(Y_test, Y_pred)
    accuracy = metrics.accuracy_score(Y_test, Y_pred)

    # Store the results
    results.append({'config': config, 'accuracy': accuracy, 'v_performance': v_performance})

# Print the results
for result in results:
    print(f"Configuration: {result['config']}")
    print(f"Accuracy: {result['accuracy']}")
    print(f"Classification Report:\n {result['v_performance']}")


Configuration: {'vectorizer': CountVectorizer(max_features=1500), 'model': MultinomialNB()}
Accuracy: 0.8239
Classification Report:
               precision    recall  f1-score   support

    negative       0.82      0.82      0.82      5000
    positive       0.82      0.82      0.82      5000

    accuracy                           0.82     10000
   macro avg       0.82      0.82      0.82     10000
weighted avg       0.82      0.82      0.82     10000

Configuration: {'vectorizer': CountVectorizer(max_features=2500), 'model': MultinomialNB(alpha=0.5)}
Accuracy: 0.831
Classification Report:
               precision    recall  f1-score   support

    negative       0.83      0.83      0.83      5000
    positive       0.83      0.83      0.83      5000

    accuracy                           0.83     10000
   macro avg       0.83      0.83      0.83     10000
weighted avg       0.83      0.83      0.83     10000

Configuration: {'vectorizer': CountVectorizer(max_features=5000), 'model

### Documentation of the configurations I have tested and performance report.

I tested three configurations of **Naive Bayes models using the CountVectorizer method** for sentiment analysis on the movie reviews dataset. The first configuration used a maximum of 1500 features with an alpha value of 1.0 and reached an accuracy of **0.8239**, with precision, recall, and F1-score of 0.82 for both the negative and the positive class. In the second configuration, I increased the number of features to 2500 and set the alpha value of MultinomialNB to 0.5, which raised the accuracy to **0.831**, with precision, recall, and F1-score of 0.83 for both classes. The third configuration used a maximum of 5000 features and an alpha value of 0.1 and gave the highest accuracy of the three, **0.8397**, with precision, recall, and F1-score of 0.83, 0.85, and 0.84 for the negative class and 0.85, 0.83, and 0.84 for the positive class. **Therefore, the third configuration is the one with the best F1-score and accuracy.** Because the number of features and alpha were changed together, these runs do not show which of the two contributed more to the improvement.
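
To make the role of `alpha` more concrete, here is a minimal sketch of additive (Laplace/Lidstone) smoothing, the estimate MultinomialNB uses for per-class word probabilities. The counts are hypothetical and `smoothed_prob` is an illustrative helper, not part of scikit-learn.

```python
def smoothed_prob(word_count, total_count, vocab_size, alpha):
    """Smoothed estimate of P(word | class): (count + alpha) / (total + alpha * V)."""
    return (word_count + alpha) / (total_count + alpha * vocab_size)

# Hypothetical numbers: a word never seen in this class, 1000 tokens in the
# class, and a vocabulary of 5000 features
probs = {alpha: smoothed_prob(0, 1000, 5000, alpha) for alpha in (1.0, 0.5, 0.1)}
for alpha, p in probs.items():
    print(alpha, p)
```

A smaller `alpha` gives unseen words less probability mass, so the model trusts the observed counts more. That is one plausible reason a lower `alpha` can help when the vocabulary grows, though the experiments above do not isolate this effect.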

## SVM model for sentiment analysis 

In [11]:
# Preprocessing and TF-IDF vectorization using TfidfVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn import metrics

configs = [
    {
        'vectorizer': TfidfVectorizer(max_features=1500),
        'model': LinearSVC(C=1.0)
    },
    {
        'vectorizer': TfidfVectorizer(max_features=2500),
        'model': LinearSVC(C=0.5)
    },
    {
        'vectorizer': TfidfVectorizer(max_features=5000),
        'model': LinearSVC(C=0.1)
    }
]

results = []
for config in configs:
    # Vectorize the text data
    vectorizer = config['vectorizer']
    X_train_vectorized = vectorizer.fit_transform(X_train)
    X_test_vectorized = vectorizer.transform(X_test)

    # Train a LinearSVC model (named clf so it does not shadow the sklearn.svm module)
    clf = config['model']
    clf.fit(X_train_vectorized, Y_train)

    # Make predictions and evaluate the model
    Y_pred = clf.predict(X_test_vectorized)
    v_performance = metrics.classification_report(Y_test, Y_pred)
    accuracy = metrics.accuracy_score(Y_test, Y_pred)

    # Store the results
    results.append({'config': config, 'accuracy': accuracy, 'v_performance': v_performance})

# Print the results
for result in results:
    print(f"Configuration: {result['config']}")
    print(f"Accuracy: {result['accuracy']}")
    print(f"Classification Report:\n {result['v_performance']}")


Configuration: {'vectorizer': TfidfVectorizer(max_features=1500), 'model': LinearSVC()}
Accuracy: 0.8793
Classification Report:
               precision    recall  f1-score   support

    negative       0.89      0.87      0.88      5000
    positive       0.87      0.89      0.88      5000

    accuracy                           0.88     10000
   macro avg       0.88      0.88      0.88     10000
weighted avg       0.88      0.88      0.88     10000

Configuration: {'vectorizer': TfidfVectorizer(max_features=2500), 'model': LinearSVC(C=0.5)}
Accuracy: 0.8909
Classification Report:
               precision    recall  f1-score   support

    negative       0.90      0.87      0.89      5000
    positive       0.88      0.91      0.89      5000

    accuracy                           0.89     10000
   macro avg       0.89      0.89      0.89     10000
weighted avg       0.89      0.89      0.89     10000

Configuration: {'vectorizer': TfidfVectorizer(max_features=5000), 'model': LinearSV

### Documentation of the configurations I have tested and performance report.

I tested three configurations of **support vector machine (SVM) models using the TfidfVectorizer method** for sentiment analysis on the movie reviews dataset. The first configuration used a maximum of 1500 features with a C value of 1.0 and reached an accuracy of **0.8793**. The precision, recall, and F1-score were 0.89, 0.87, and 0.88 for the negative class and 0.87, 0.89, and 0.88 for the positive class. In the second configuration, I increased the number of features to 2500 and set the regularization parameter C of LinearSVC to 0.5, which raised the accuracy to **0.8909** (negative: 0.90, 0.87, and 0.89; positive: 0.88, 0.91, and 0.89). The third configuration used a maximum of 5000 features and a C value of 0.1 and gave the highest accuracy of the three, **0.8959** (negative: 0.91, 0.88, and 0.89; positive: 0.88, 0.91, and 0.90). **Therefore, the third configuration, with 5000 features and C set to 0.1, is the one with the best F1-score and accuracy.**
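
For readers unfamiliar with TF-IDF weighting, the sketch below reproduces the formula for a single document in plain Python. It assumes scikit-learn's documented defaults for TfidfVectorizer (`smooth_idf=True`, `norm='l2'`, raw term counts). The helper names and the toy document frequencies are hypothetical.

```python
import math

def smooth_idf(n_docs, doc_freq):
    """Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1."""
    return math.log((1 + n_docs) / (1 + doc_freq)) + 1

def tfidf_vector(term_counts, doc_freqs, n_docs):
    """L2-normalised tf-idf weights for one document."""
    raw = {t: c * smooth_idf(n_docs, doc_freqs[t]) for t, c in term_counts.items()}
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {t: w / norm for t, w in raw.items()}

# Hypothetical corpus of 4 documents: 'movie' occurs in all 4, 'dreadful' in 1
weights = tfidf_vector({'movie': 1, 'dreadful': 1},
                       {'movie': 4, 'dreadful': 1}, n_docs=4)
print(weights)
```

Both words occur once in the document, yet the rarer word receives the larger weight. That down-weighting of ubiquitous terms is the main difference from the raw counts produced by CountVectorizer.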

## Deep Learning Models for Sentiment Analysis

### Preprocess the movie reviews into sequences of equal length for deep learning models. 

In [10]:
pip install keras

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [11]:
pip install tensorflow

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [12]:
from keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Create a tokenizer and fit it on the training data
tokenizer = Tokenizer(num_words=10000)
tokenizer.fit_on_texts(X_train)

# Convert the text data to sequence data
X_train_seq = tokenizer.texts_to_sequences(X_train)
X_test_seq = tokenizer.texts_to_sequences(X_test)

# Pad the sequence data to make them all the same length
max_length = 100
X_train_pad = pad_sequences(X_train_seq, maxlen=max_length, padding='post')
X_test_pad = pad_sequences(X_test_seq, maxlen=max_length, padding='post')
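
The effect of padding and truncation on a single sequence can be sketched without Keras. The `pad` function below is a hypothetical re-implementation for illustration only; it assumes the documented `pad_sequences` defaults of `padding='pre'` and `truncating='pre'`. The cell above sets `padding='post'` but leaves `truncating` at its default, so long reviews keep their last `maxlen` tokens.

```python
def pad(seq, maxlen, padding='pre', truncating='pre', value=0):
    """Pad or truncate one integer sequence to exactly maxlen items."""
    seq = list(seq)
    if len(seq) > maxlen:
        # 'pre' drops tokens from the start, 'post' drops them from the end
        seq = seq[-maxlen:] if truncating == 'pre' else seq[:maxlen]
    fill = [value] * (maxlen - len(seq))
    return fill + seq if padding == 'pre' else seq + fill

short_pre = pad([1, 2, 3], 5)
short_post = pad([1, 2, 3], 5, padding='post')
long_post = pad([1, 2, 3, 4, 5, 6], 4, padding='post')
print(short_pre, short_post, long_post)
```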


### Train a Bidirectional LSTM model for sentiment classification, set early stopping conditions, and use a validation set. Evaluate the model performance on the test set using accuracy, precision, recall, and F1-score. 

### **Configuration 5.1: Locally trained embeddings**
A text classification model using Bidirectional LSTMs with a single layer and an embedding dimension of 100. The model is trained using a tokenizer with a maximum vocabulary size of 10000 and padded sequences with a maximum length of 100. The training process runs for a maximum of 15 epochs and includes early stopping with a patience of 3 to prevent overfitting.

In [15]:
from keras.preprocessing.text import Tokenizer
from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, LSTM, Dense
from keras.callbacks import EarlyStopping
from sklearn.preprocessing import LabelEncoder
from keras.utils import to_categorical
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Preprocess the data
max_words = 10000
max_len = 100
tokenizer = Tokenizer(num_words=max_words)
tokenizer.fit_on_texts(X_train)
X_train_seq = tokenizer.texts_to_sequences(X_train)
X_test_seq = tokenizer.texts_to_sequences(X_test)
X_train_pad = pad_sequences(X_train_seq, maxlen=max_len)
X_test_pad = pad_sequences(X_test_seq, maxlen=max_len)

# Encode target variable
encoder = LabelEncoder()
y_train_encoded = to_categorical(encoder.fit_transform(Y_train))
y_test_encoded = to_categorical(encoder.transform(Y_test))

# Create the Bidirectional LSTM model with locally trained embeddings
model = Sequential([
    Embedding(max_words, 100, input_length=max_len),
    Bidirectional(LSTM(64)),
    Dense(2, activation='softmax')
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Set early stopping condition
early_stopping = EarlyStopping(monitor='val_loss', patience=3)

# Train the model
model.fit(X_train_pad, y_train_encoded, batch_size=128, epochs=15, validation_split=0.1, callbacks=[early_stopping])

# Evaluate the model performance
y_pred_proba = model.predict(X_test_pad)
y_pred = np.argmax(y_pred_proba, axis=1)



accuracy = accuracy_score(encoder.transform(Y_test), y_pred)
precision = precision_score(Y_test, encoder.inverse_transform(y_pred), pos_label='positive', average='binary')
recall = recall_score(Y_test, encoder.inverse_transform(y_pred), pos_label='positive', average='binary')
f1 = f1_score(Y_test, encoder.inverse_transform(y_pred), pos_label='positive', average='binary')

# Print the evaluation metrics
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1-score: {f1:.4f}")


Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Accuracy: 0.8511
Precision: 0.8260
Recall: 0.8896
F1-score: 0.8566
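
The early stopping rule can be sketched as a small function. This is a simplified illustration of the patience logic, not the Keras implementation, and the validation losses are hypothetical (the notebook output above does not record them).

```python
def stopping_epoch(val_losses, patience):
    """Return the 1-based epoch after which training stops, or None if it never does."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # Validation loss improved: remember it and reset the counter
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical losses: best value in epoch 1, then three epochs without improvement
stopped_at = stopping_epoch([0.40, 0.42, 0.45, 0.47, 0.50], patience=3)
never_stopped = stopping_epoch([0.50, 0.40, 0.30], patience=3)
print(stopped_at, never_stopped)
```

Since the callback above is created without `restore_best_weights=True`, the evaluated model holds the weights from the last epoch that ran, not the epoch with the lowest validation loss.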


### **Configuration 5.2: Pre-trained embeddings with a single layer of Bidirectional LSTMs**

A text classification model using Bidirectional LSTMs with a single layer and an embedding dimension of 100. The model is trained using a tokenizer with a maximum vocabulary size of 10000 and padded sequences with a maximum length of 200. The training process runs for a maximum of 10 epochs and includes early stopping with a patience of 3 to prevent overfitting. 

In [16]:
import numpy as np
import os
import urllib.request  # import the submodule explicitly; urlretrieve lives in urllib.request
import zipfile
from keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, LSTM, Dense
from keras.callbacks import EarlyStopping
from sklearn.preprocessing import LabelEncoder
from keras.utils import to_categorical
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from google.colab import drive

# Mount Google Drive to access files
drive.mount('/content/drive')

# Define the GloVe embeddings file and directory paths
glove_url = 'http://nlp.stanford.edu/data/glove.6B.zip'
glove_dir = '/content/drive/My Drive/embeddings'
glove_file = os.path.join(glove_dir, 'glove.6B.100d.txt')
zip_file = os.path.join(glove_dir, 'glove.6B.zip')

# Download and extract the GloVe embeddings only if the file is not already on disk
os.makedirs(glove_dir, exist_ok=True)
if not os.path.exists(glove_file):
    urllib.request.urlretrieve(glove_url, zip_file)
    with zipfile.ZipFile(zip_file, 'r') as zip_ref:
        zip_ref.extractall(glove_dir)

# Load the embeddings into memory
embeddings_index = {}
with open(glove_file, 'r', encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs

# Tokenize and pad sequences
tokenizer = Tokenizer(num_words=10000)
tokenizer.fit_on_texts(X_train)
X_train_seq = tokenizer.texts_to_sequences(X_train)
X_test_seq = tokenizer.texts_to_sequences(X_test)
max_length = 200
X_train_pad = pad_sequences(X_train_seq, maxlen=max_length, padding='post')
X_test_pad = pad_sequences(X_test_seq, maxlen=max_length, padding='post')

# Encode target variable
encoder = LabelEncoder()
y_train_encoded = to_categorical(encoder.fit_transform(Y_train))
y_test_encoded = to_categorical(encoder.transform(Y_test))

# Create the embedding matrix
embedding_dim = 100
vocab_size = len(tokenizer.word_index) + 1
embedding_matrix = np.zeros((vocab_size, embedding_dim))
for word, i in tokenizer.word_index.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector

# Define the Bidirectional LSTM model with GloVe embeddings
model = Sequential()
model.add(Embedding(vocab_size, embedding_dim, input_length=max_length, weights=[embedding_matrix], trainable=False))
model.add(Bidirectional(LSTM(64)))
model.add(Dense(2, activation='softmax'))

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Set early stopping condition and train the model
early_stopping = EarlyStopping(monitor='val_loss', patience=3)
model.fit(X_train_pad, y_train_encoded, batch_size=64, epochs=10, validation_split=0.1, callbacks=[early_stopping])

# Evaluate the model performance
y_pred_proba = model.predict(X_test_pad)
y_pred = np.argmax(y_pred_proba, axis=1)

accuracy = accuracy_score(encoder.transform(Y_test), y_pred)
precision = precision_score(Y_test, encoder.inverse_transform(y_pred), pos_label='positive', average='binary')
recall = recall_score(Y_test, encoder.inverse_transform(y_pred), pos_label='positive', average='binary')
f1 = f1_score(Y_test, encoder.inverse_transform(y_pred), pos_label='positive', average='binary')

# Print the evaluation metrics on the test dataset
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1-score: {f1:.4f}")


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Accuracy: 0.8853
Precision: 0.8966
Recall: 0.8710
F1-score: 0.8836


### **Configuration 5.3: Pre-trained embeddings with 2 layers of Bidirectional LSTMs**

A text classification model using Bidirectional LSTMs with 2 layers and an embedding dimension of 100. The model is trained using a tokenizer with a maximum vocabulary size of 10000 and padded sequences with a maximum length of 200. The training process runs for a maximum of 8 epochs and includes early stopping with a patience of 3 to prevent overfitting.

In [16]:
import numpy as np
import os
import urllib.request  # import the submodule explicitly; urlretrieve lives in urllib.request
import zipfile
from keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, LSTM, Dense
from keras.callbacks import EarlyStopping
from sklearn.preprocessing import LabelEncoder
from keras.utils import to_categorical
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from google.colab import drive

# Mount Google Drive to access files
drive.mount('/content/drive')

# Define the GloVe embeddings file and directory paths
glove_url = 'http://nlp.stanford.edu/data/glove.6B.zip'
glove_dir = '/content/drive/My Drive/embeddings'
glove_file = os.path.join(glove_dir, 'glove.6B.100d.txt')
zip_file = os.path.join(glove_dir, 'glove.6B.zip')

# Download and extract the GloVe embeddings only if the file is not already on disk
os.makedirs(glove_dir, exist_ok=True)
if not os.path.exists(glove_file):
    urllib.request.urlretrieve(glove_url, zip_file)
    with zipfile.ZipFile(zip_file, 'r') as zip_ref:
        zip_ref.extractall(glove_dir)

# Load the embeddings into memory
embeddings_index = {}
with open(glove_file, 'r', encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs

# Create tokenizer and sequence padding
max_words = 10000
tokenizer = Tokenizer(num_words=max_words)
tokenizer.fit_on_texts(X_train)
X_train_seq = tokenizer.texts_to_sequences(X_train)
X_test_seq = tokenizer.texts_to_sequences(X_test)
max_length = 200
X_train_pad = pad_sequences(X_train_seq, maxlen=max_length)
X_test_pad = pad_sequences(X_test_seq, maxlen=max_length)

# Create embedding matrix
embedding_dim = 100
embedding_matrix = np.zeros((max_words, embedding_dim))
for word, i in tokenizer.word_index.items():
    if i >= max_words:
        break
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector

# Encode target variable
encoder = LabelEncoder()
y_train_encoded = to_categorical(encoder.fit_transform(Y_train))
y_test_encoded = to_categorical(encoder.transform(Y_test))

# Create and compile model
model = Sequential([
    Embedding(max_words, embedding_dim, input_length=max_length, weights=[embedding_matrix], trainable=False),
    Bidirectional(LSTM(128, dropout=0.2, recurrent_dropout=0.2, return_sequences=True)),
    Bidirectional(LSTM(64, dropout=0.2, recurrent_dropout=0.2)),
    Dense(2, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Set early stopping condition
early_stopping = EarlyStopping(monitor='val_loss', patience=3)

# Train the model
model.fit(X_train_pad, y_train_encoded, batch_size=128, epochs=8, validation_split=0.1, callbacks=[early_stopping])

# Evaluate the model performance
y_pred_proba = model.predict(X_test_pad)
y_pred = np.argmax(y_pred_proba, axis=1)

accuracy = accuracy_score(encoder.transform(Y_test), y_pred)
precision = precision_score(Y_test, encoder.inverse_transform(y_pred), pos_label='positive', average='binary')
recall = recall_score(Y_test, encoder.inverse_transform(y_pred), pos_label='positive', average='binary')
f1 = f1_score(Y_test, encoder.inverse_transform(y_pred), pos_label='positive', average='binary')

# Print the evaluation metrics
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1-score: {f1:.4f}")


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Epoch 1/8
Epoch 2/8
Epoch 3/8
Epoch 4/8
Epoch 5/8
Epoch 6/8
Epoch 7/8
Epoch 8/8
Accuracy: 0.8864
Precision: 0.9149
Recall: 0.8520
F1-score: 0.8824
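
Configurations 5.2 and 5.3 build the embedding matrix differently: 5.2 allocates one row per word in the full tokenizer vocabulary, while 5.3 caps the matrix at `max_words` rows. The capped variant is sketched below in plain Python lists. The function name, the toy word index, and the two-dimensional vectors are all hypothetical stand-ins for `tokenizer.word_index` and the GloVe vectors.

```python
def build_embedding_matrix(word_index, vectors, max_words, dim):
    """Return max_words rows of length dim; words without a vector stay all-zero."""
    matrix = [[0.0] * dim for _ in range(max_words)]
    for word, i in word_index.items():
        if i >= max_words:
            continue  # index falls outside the tokenizer's num_words cap
        vec = vectors.get(word)
        if vec is not None:
            matrix[i] = list(vec)
    return matrix

# Toy data: index 0 is reserved for padding, 'zzyzx' has no pre-trained vector
toy_index = {'good': 1, 'bad': 2, 'zzyzx': 3, 'rare': 4}
toy_vectors = {'good': [0.1, 0.2], 'bad': [-0.1, -0.2], 'rare': [0.5, 0.5]}
matrix = build_embedding_matrix(toy_index, toy_vectors, max_words=4, dim=2)
print(matrix)
```

Because a tokenizer created with `num_words=10000` only emits indices below 10000, the extra rows in Configuration 5.2's larger matrix are never looked up. They cost memory but should not change the predictions.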


### Documentation of the configurations I have tested and performance report.

For sentiment analysis, I tested three configurations of Bidirectional LSTM models on the tokenized and padded movie reviews. The first configuration used locally trained embeddings, the second used pre-trained embeddings with a single layer of Bidirectional LSTMs, and the third used pre-trained embeddings with 2 layers of Bidirectional LSTMs. All models were trained with early stopping and a validation set to prevent overfitting, and were evaluated on the test set using accuracy, precision, recall, and F1-score. The first configuration reached an accuracy, precision, recall, and F1-score of 0.8511, 0.8260, 0.8896, and 0.8566, respectively, while the second reached 0.8853, 0.8966, 0.8710, and 0.8836. The third configuration achieved the highest accuracy and precision, with **an accuracy of 0.8864, precision of 0.9149, recall of 0.8520, and an F1-score of 0.8824**, although its recall and F1-score are slightly below those of the second configuration. **Therefore, the third configuration of pre-trained embeddings with 2 layers of Bidirectional LSTMs is the one selected for sentiment analysis in this scenario.**


## Take away message

### My thoughts and findings of the different models I have tried for sentiment classification. 

Across the models I tried for sentiment classification (lexicon-based, Naive Bayes, SVM, and deep learning), the best-performing approach in my script is SVM, followed by deep learning, Naive Bayes, and the lexicon-based approach. In Tasks 3 and 4, the TF-IDF vectorizer with LinearSVC also scored clearly higher on accuracy, precision, recall, and F1-score than the Count vectorizer with Naive Bayes, although the classifier changed along with the vectorizer, so the gain cannot be attributed to TF-IDF alone. Within each approach, performance depends on which parameters we choose to tune. In my runs, increasing the maximum number of features improved performance.
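
The ranking can be reproduced from the best accuracy reported for each approach in the sections above. The dictionary keys below are descriptive labels chosen for this sketch; the numbers are copied from the reports.

```python
# Best test accuracy reported for each approach in this notebook
reported_accuracy = {
    'VADER (lexicon-based)': 0.70,
    'Naive Bayes (5000 features, alpha=0.1)': 0.8397,
    'LinearSVC (5000 features, C=0.1)': 0.8959,
    'BiLSTM (GloVe, 2 layers)': 0.8864,
}

# Sort from highest to lowest accuracy
ranking = sorted(reported_accuracy.items(), key=lambda kv: kv[1], reverse=True)
for name, acc in ranking:
    print(f"{name}: {acc:.4f}")
```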

For Task 5, apart from discovering that each epoch takes a long time to run, I found that pre-trained embeddings gave better performance than locally trained embeddings, plausibly because they are trained on larger and more varied collections of text, which makes them more robust and generalizable to a wide range of NLP tasks. Furthermore, the maximum number of epochs does not appear to be related to performance. With early stopping at a patience of 3 (training stops once the validation loss has not improved for 3 epochs), the first configuration, with locally trained embeddings, was allowed up to 15 epochs but stopped after 4, and it shows lower accuracy than the second and third configurations, which ran for 10 and 8 epochs, respectively. Therefore, the embedding source and the number of Bidirectional LSTM layers seem to affect the performance of the model more than the epoch budget does.

**Note:** Since the notebook took a long time to run (7+ hours), there is still room for further development to achieve better performance than these models achieved, and my thoughts and findings are limited to these models only. For Task 5, pre-trained embeddings with 2 layers of Bidirectional LSTMs (Configuration 5.3) might achieve even higher performance than the support vector machine (SVM) models using the TfidfVectorizer method in Task 4 if trained for more than 8 epochs: early stopping with a patience of 3 was set, yet the model ran through all 8 epochs and gave higher performance in each epoch. Additionally, using larger pre-trained language models may also lead to improved performance.
  
 