
A comprehensive examination is conducted to assess the influence of homonyms in sentiment analysis, employing two distinct techniques: fixed embeddings (LSTM) and contextualized embeddings (DistilBERT).


Sentiment-Analysis-for-Homonyms-Problem

Homonyms are words that share the same spelling or form (characters) but have distinct meanings. For instance, the word "bank" can take on two disparate meanings depending on context, denoting either a financial institution or the edge of a river.

These homonyms hold significant relevance in sentiment analysis, given their capacity to alter a sentence's meaning or emotional tone entirely. Consider the following examples that highlight this challenge:

| Sentence | Label |
| --- | --- |
| I hate the selfishness in you | NEGATIVE |
| I hate anyone who hurts you | POSITIVE |

In the first sentence, the word "hate" renders the sentiment NEGATIVE. Conversely, the same word, "hate", appears in the second sentence, yet it shapes the sentiment as POSITIVE. This poses a considerable challenge to models that rely on fixed word embeddings. Therefore, employing contextualized embeddings, which leverage the attention mechanism of transformers, becomes crucial for grasping the full context of a sentence.
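To make the distinction concrete, here is a minimal sketch (not taken from the repository) that queries a pretrained DistilBERT encoder for the vector of "hate" in each sentence. A fixed embedding table would return the identical vector both times, whereas the contextual vectors differ:

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModel

# Pretrained encoder used only for illustration; the repository fine-tunes
# its own models later on.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
encoder = TFAutoModel.from_pretrained("distilbert-base-uncased")

def contextual_vector(sentence: str, word: str) -> tf.Tensor:
    """Return the contextual embedding of `word` inside `sentence`."""
    enc = tokenizer(sentence, return_tensors="tf")
    hidden = encoder(**enc).last_hidden_state[0]            # (seq_len, 768)
    word_id = tokenizer.convert_tokens_to_ids(word)
    position = tf.where(enc["input_ids"][0] == word_id)[0, 0]
    return hidden[position]

v1 = contextual_vector("I hate the selfishness in you", "hate")
v2 = contextual_vector("I hate anyone who hurts you", "hate")

# Keras returns the negative cosine similarity, hence the minus sign.
similarity = -tf.keras.losses.cosine_similarity(v1, v2)
print(f"Cosine similarity of the two 'hate' vectors: {float(similarity):.3f}")
```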

Tools Used

The project is implemented using the following Python packages:

| Package | Description |
| --- | --- |
| NumPy | Numerical computing library |
| Pandas | Data manipulation library |
| Matplotlib | Data visualization library |
| Sklearn | Machine learning library |
| TensorFlow | Open-source machine learning framework |
| Keras | High-level deep learning API |
| Transformers | Hugging Face package containing state-of-the-art Natural Language Processing models |
| Datasets | Hugging Face package containing ready-to-use datasets |

Dataset

The Stanford Sentiment Treebank (SST) is a corpus with fully labeled parse trees that allows for a complete analysis of the compositional effects of sentiment in language. The corpus is based on the dataset introduced by Pang and Lee (2005) and consists of 11,855 single sentences extracted from movie reviews. It was parsed with the Stanford parser and includes a total of 215,154 unique phrases from those parse trees, each annotated by 3 human judges.

Binary classification experiments on full sentences (negative or somewhat negative vs somewhat positive or positive with neutral sentences discarded) refer to the dataset as SST-2 or SST binary.

The dataset contains 3 features [idx, sentence, and label] and it comes in 3 different splits [train, validation, and test]. Here is the number of samples per split:

| Split | Number of Samples |
| --- | --- |
| train | 67349 |
| validation | 872 |
| test | 1821 |
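For reference, a minimal sketch of loading these splits with the Hugging Face datasets package (the glue/sst2 configuration is an assumption about the exact loading call used in the repository):

```python
from datasets import load_dataset

# SST-2 as distributed through the GLUE benchmark; each example carries
# the three features mentioned above: idx, sentence, and label.
sst2 = load_dataset("glue", "sst2")

for split in ("train", "validation", "test"):
    print(split, sst2[split].num_rows)   # 67349, 872, 1821
```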

Methodology

Dataset Preparation

This phase covers dataset preparation prior to modeling: the sentence column is converted to integer sequences using the Tokenizer class from the Keras library, and the sequences are then padded to the maximum sequence length in the dataset.
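A minimal sketch of this step, assuming the stock Keras Tokenizer and pad_sequences utilities (the vocabulary cap and the `sst2` variable from the loading sketch above are assumptions):

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

sentences = sst2["train"]["sentence"]     # training sentences loaded earlier
labels = sst2["train"]["label"]

# Map each word to an integer id; the vocabulary cap is an assumed value.
tokenizer = Tokenizer(num_words=20000, oov_token="<unk>")
tokenizer.fit_on_texts(sentences)
sequences = tokenizer.texts_to_sequences(sentences)

# Pad every sequence to the longest sentence in the training split.
max_len = max(len(seq) for seq in sequences)
x_train = pad_sequences(sequences, maxlen=max_len, padding="post")
print(x_train.shape)                      # (67349, max_len)
```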

Model Selection

The two models selected are LSTM-based and DistilBERT-based. They were chosen to contrast fixed-embedding models, exemplified by the LSTM, with contextualized-embedding models, exemplified by DistilBERT. Additionally, two iterations of the LSTM-based model were trained: one before addressing the dataset imbalance and another after resolving it.
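For illustration, a minimal sketch of what the LSTM-based classifier could look like; the layer sizes follow the grid-search winners reported in the Results section, while the overall architecture is an assumption:

```python
import tensorflow as tf

embedding_dim = 768   # best value found by the grid search below
hidden_dim = 128      # best value found by the grid search below
vocab_size = 20000    # assumed vocabulary cap from the preparation step

# Fixed-embedding baseline: every word gets one static vector,
# regardless of the sentence it appears in.
lstm_model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim),
    tf.keras.layers.LSTM(hidden_dim),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # binary sentiment
])
lstm_model.compile(optimizer="adam",
                   loss="binary_crossentropy",
                   metrics=["accuracy"])
```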

Results

Before training, I conducted a straightforward grid search to determine the optimal hyperparameters. This included parameters such as the embedding dimension for the Keras Embedding layer and the number of hidden units in the LSTM cell. The detailed results, ordered by the highest validation accuracy, are presented in the following table:

| (embedding_dim, hidden_dim) | Validation Loss | Validation Accuracy |
| --- | --- | --- |
| (768, 128) | 0.471817 | 0.840596 |
| (768, 32) | 0.492739 | 0.839450 |
| (64, 64) | 0.455455 | 0.833716 |
| (512, 128) | 0.542884 | 0.833716 |
| (128, 64) | 0.420377 | 0.829128 |
| (64, 128) | 0.479428 | 0.822248 |
| (512, 64) | 0.487282 | 0.817661 |
| (512, 32) | 0.444669 | 0.815367 |
| (128, 128) | 0.600242 | 0.807339 |
| (768, 64) | 0.498956 | 0.802752 |
| (128, 32) | 0.640019 | 0.678899 |
| (64, 32) | 0.695933 | 0.509174 |

(Figure: graphical representation of the above table.)

As we can see, the optimal hyperparameters achieving the best validation accuracy are 768 and 128 for embedding_dim and hidden_dim respectively.
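A minimal sketch of this kind of grid search (the grid values come from the table above; the epoch count and the training/validation arrays from the preparation step are assumptions):

```python
import itertools
import tensorflow as tf

def build_lstm(embedding_dim, hidden_dim, vocab_size=20000):
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim),
        tf.keras.layers.LSTM(hidden_dim),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

results = []
# x_train/y_train and x_val/y_val are the padded sequences and labels
# for the train and validation splits, prepared as shown earlier.
for embedding_dim, hidden_dim in itertools.product((64, 128, 512, 768),
                                                   (32, 64, 128)):
    model = build_lstm(embedding_dim, hidden_dim)
    history = model.fit(x_train, y_train,
                        validation_data=(x_val, y_val),
                        epochs=3, batch_size=64, verbose=0)
    results.append(((embedding_dim, hidden_dim),
                    history.history["val_loss"][-1],
                    history.history["val_accuracy"][-1]))

# Sort by validation accuracy, highest first, as in the table above.
for config, val_loss, val_acc in sorted(results, key=lambda r: -r[2]):
    print(config, round(val_loss, 4), round(val_acc, 4))
```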

Model Curves

LSTM-Based Model (Imbalanced)

(Figures: loss curve, accuracy curve, and confusion matrix.)

Classification Report:

| | Precision | Recall | F1-Score | Support |
| --- | --- | --- | --- | --- |
| 0 | 0.83 | 0.87 | 0.85 | 428 |
| 1 | 0.87 | 0.82 | 0.85 | 444 |
| accuracy | | | 0.85 | 872 |
| macro avg | 0.85 | 0.85 | 0.85 | 872 |
| weighted avg | 0.85 | 0.85 | 0.85 | 872 |

LSTM-Based Model (Balanced)

(Figures: loss curve, accuracy curve, and confusion matrix.)

Classification Report:

| | Precision | Recall | F1-Score | Support |
| --- | --- | --- | --- | --- |
| 0 | 0.85 | 0.79 | 0.82 | 428 |
| 1 | 0.81 | 0.86 | 0.83 | 444 |
| accuracy | | | 0.83 | 872 |
| macro avg | 0.83 | 0.83 | 0.83 | 872 |
| weighted avg | 0.83 | 0.83 | 0.83 | 872 |
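The balanced variant was trained after undersampling the majority class, i.e. discarding samples until both labels have the same count. A minimal sketch of such a step (the random seed and the array names from the preparation sketch are assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)
y_train = np.asarray(labels)        # integer labels from the preparation step

# Keep an equal number of samples from each class and drop the rest.
neg_idx = np.where(y_train == 0)[0]
pos_idx = np.where(y_train == 1)[0]
n = min(len(neg_idx), len(pos_idx))
keep = np.concatenate([rng.choice(neg_idx, n, replace=False),
                       rng.choice(pos_idx, n, replace=False)])
rng.shuffle(keep)

x_balanced, y_balanced = x_train[keep], y_train[keep]
print(np.bincount(y_balanced))      # equal counts for both classes
```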

DistilBERT-Base Model

(Figures: loss curve, accuracy curve, and confusion matrix.)

Classification Report:

| | Precision | Recall | F1-Score | Support |
| --- | --- | --- | --- | --- |
| 0 | 0.92 | 0.89 | 0.90 | 428 |
| 1 | 0.89 | 0.92 | 0.91 | 444 |
| accuracy | | | 0.90 | 872 |
| macro avg | 0.91 | 0.90 | 0.90 | 872 |
| weighted avg | 0.91 | 0.90 | 0.90 | 872 |
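For completeness, a minimal sketch of fine-tuning a DistilBERT classifier with the Transformers library (the checkpoint, learning rate, and the use of Keras `fit` are assumptions; `sst2` refers to the dataset loaded earlier):

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased"
bert_tokenizer = AutoTokenizer.from_pretrained(checkpoint)
bert_model = TFAutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=2)

def encode(split):
    """Tokenize a dataset split and return (inputs, labels) for Keras fit."""
    enc = bert_tokenizer(split["sentence"], truncation=True,
                         padding=True, return_tensors="tf")
    return dict(enc), tf.convert_to_tensor(split["label"])

train_x, train_y = encode(sst2["train"])
val_x, val_y = encode(sst2["validation"])

bert_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"])
bert_model.fit(train_x, train_y, validation_data=(val_x, val_y),
               epochs=3, batch_size=32)
```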

Model Testing

To test the models' ability to handle the homonym problem in sentiment analysis, I prepared a set of confusing test cases, summarized in the following table:

(Figure: models' predictions on the confusing test cases.)

The LSTM-based model trained on the imbalanced dataset performs well on straightforward sentences, but its effectiveness diminishes in the more complex cases due to the constraints of fixed embeddings. Notably, the model does attempt to refine its comprehension by incorporating additional elements of the sentence, showing a degree of nuance. Surprisingly, the LSTM-based model trained on the balanced dataset exhibits no improvement and tends to degrade, emphasizing the impact of the data removed during undersampling. Fixed embeddings remain effective for simpler sentences but fall short of addressing the identified issues.

In contrast, the DistilBERT model, using contextualized embeddings, improves substantially, achieving around 91% accuracy after three epochs. DistilBERT excels at handling words with multiple meanings and makes nuanced decisions: even when it classifies a sentence like "I hate anyone hurting you" as NEGATIVE, it does so with a low confidence score of about 51%. This highlights the transformative power of contextualized embeddings in enhancing sentiment analysis.
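A minimal sketch of scoring such confusing test cases with the fine-tuned DistilBERT model from the sketch above (the exact sentences and the way the repository queries its models are assumptions):

```python
import tensorflow as tf

test_cases = [
    "I hate the selfishness in you",     # expected NEGATIVE
    "I hate anyone who hurts you",       # expected POSITIVE
    "I hate anyone hurting you",         # expected POSITIVE
]

enc = bert_tokenizer(test_cases, truncation=True, padding=True,
                     return_tensors="tf")
probs = tf.nn.softmax(bert_model(**enc).logits, axis=-1).numpy()

for sentence, (p_neg, p_pos) in zip(test_cases, probs):
    label = "POSITIVE" if p_pos > p_neg else "NEGATIVE"
    print(f"{sentence!r}: {label} ({max(p_neg, p_pos):.0%} confidence)")
```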

Conclusion

Ultimately, the decision to opt for fixed embedding models or contextualized embeddings hinges on the nature of the data the model will encounter in real-world scenarios. Fixed embeddings might suffice when dealing with straightforward data, offering good performance while requiring less memory than transformer-based models. However, in cases where the data is more complex, as demonstrated in our test cases, leveraging a transformer-based model with a self-attention mechanism can yield substantial performance improvements. It's crucial to note that this advantage comes at the expense of a higher memory footprint.
