# Exploration

### In this notebook we'll look at the three implementations, test them on three different examples, and compare the results.

First, we'll import all three implementations.

In [6]:
from src.lexicon_absa import LexiconABSA
from src.transformer_absa import ML_ABSA
from src.llm_asba import LLMABSA

Now we will instantiate the models and create our basic test set of three samples: one simple case, one complex case, and one edge case.

In [3]:
models = [LexiconABSA(), ML_ABSA(), LLMABSA(), LLMABSA("mistral:7b")]
tests = [
    "The movie was a masterpiece — I almost fell asleep.",
    "The camera quality is amazing but the battery life is awful.",
    "The restaurant has a modern interior and the food is fine."
]



Finally, we iterate through the models and run each one on every test sample.

In [4]:
for model in models:
    print("=" * 100)
    print(f"Model: {model.name}\n")

    for sample in tests:
        print(f"Sentence: {sample}\n")
        results = model.analyze(sample)
        if not results:
            print("No aspects found!\n")
        else:
            for r in results:
                print(
                    f" Aspect: {r.aspect:25} | Sentiment: {r.sentiment:8} | "
                    f"Confidence: {r.confidence:.2f} | Span: {r.text_span}"
                )
            print()

Model: LexiconABSA

Sentence: The movie was a masterpiece — I almost fell asleep.

No aspects found!

Sentence: The camera quality is amazing but the battery life is awful.

 Aspect: quality                   | Sentiment: positive | Confidence: 0.59 | Span: (11, 18)
 Aspect: life                      | Sentiment: negative | Confidence: 0.46 | Span: (46, 50)

Sentence: The restaurant has a modern interior and the food is fine.

 Aspect: interior                  | Sentiment: neutral  | Confidence: 0.00 | Span: (28, 36)
 Aspect: food                      | Sentiment: positive | Confidence: 0.20 | Span: (45, 49)

Model: ML_ABSA

Sentence: The movie was a masterpiece — I almost fell asleep.

 Aspect: The movie                 | Sentiment: positive | Confidence: 0.99 | Span: (0, 9)
 Aspect: a masterpiece             | Sentiment: positive | Confidence: 0.97 | Span: (14, 27)

Sentence: The camera quality is amazing but the battery life is awful.

 Aspect: The camera quality        | Sentiment

# Results

### Lexicon ABSA

The LexiconABSA model uses a rule-based approach, with spaCy identifying aspects and VADER scoring sentiment.
It identifies nouns and noun phrases as potential aspects and then looks for adjectives or verbs that modify them.
It then scores those modifiers with VADER’s polarity scores to determine whether the aspect is expressed positively, negatively, or neutrally.

Because it relies purely on lexical patterns and predefined sentiment scores, its understanding of language is literal. It performs best when opinions are clearly stated, for example “The battery is terrible” or “The food was amazing.”
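The scoring step described above can be sketched without spaCy or VADER. This is only an illustration under stated assumptions: `TOY_LEXICON` is a hand-written stand-in whose magnitudes are the confidences printed in the output above, the modifiers are supplied by hand instead of coming from a dependency parse, and `AspectResult`, `score_aspect`, and `NEUTRAL_BAND` are hypothetical names. The 0.05 cut-off mirrors the threshold commonly suggested for VADER's compound score; whether LexiconABSA uses it, or derives confidence as the absolute score, is an assumption.

```python
from dataclasses import dataclass

# Toy stand-in for VADER's lexicon. The magnitudes are the confidences printed
# in the notebook output, with a sign attached; this is NOT VADER itself.
TOY_LEXICON = {"amazing": 0.59, "awful": -0.46, "fine": 0.20, "modern": 0.0}

# Assumed cut-off for calling a score neutral; the real threshold is not shown.
NEUTRAL_BAND = 0.05


@dataclass
class AspectResult:
    aspect: str
    sentiment: str
    confidence: float


def score_aspect(aspect: str, modifiers: list[str]) -> AspectResult:
    """Average the lexicon scores of an aspect's modifiers and map to a label."""
    scores = [TOY_LEXICON.get(word.lower(), 0.0) for word in modifiers]
    mean = sum(scores) / len(scores) if scores else 0.0
    if mean >= NEUTRAL_BAND:
        label = "positive"
    elif mean <= -NEUTRAL_BAND:
        label = "negative"
    else:
        label = "neutral"
    # Assumption: confidence is simply the magnitude of the averaged score.
    return AspectResult(aspect, label, abs(mean))


# Modifiers are paired with aspects by hand here; the real model finds them
# through spaCy's parse.
for aspect, modifiers in [("quality", ["amazing"]), ("life", ["awful"]), ("interior", ["modern"])]:
    print(score_aspect(aspect, modifiers))
```

A word that is missing from the lexicon scores 0.0 in this sketch, which is one way a purely lexical approach ends up labeling an aspect neutral.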

Looking at the results:

- For “The movie was a masterpiece — I almost fell asleep,” the model found no aspects. This is expected: the sarcasm is implicit rather than expressed through negative words, and VADER only scores direct sentiment terms, so it cannot tell that “masterpiece” is used ironically.

- For “The camera quality is amazing but the battery life is awful,” it correctly extracted quality (positive) and life (negative). However, it didn’t capture the full noun phrases (“camera quality”, “battery life”).

- For “The restaurant has a modern interior and the food is fine,” it identified interior and food, with food positive (“fine”) and interior neutral, because adjectives like “modern” score near neutral in VADER’s lexicon.
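The spans printed for LexiconABSA appear to be character offsets into the sentence with an exclusive end. That is an inference from the printed numbers, not something the implementation confirms. Under that assumption, a short check shows that the reported spans cover only the head noun, and where the full noun phrase would sit:

```python
sentence = "The camera quality is amazing but the battery life is awful."

# (aspect, span) pairs copied from the LexiconABSA output.
reported = [("quality", (11, 18)), ("life", (46, 50))]

for aspect, (start, end) in reported:
    # Treat the span as [start, end) character offsets into the sentence.
    recovered = sentence[start:end]
    print(f"{aspect!r} -> {recovered!r}")

# The full noun phrases the model did not capture, located by plain search.
for phrase in ("camera quality", "battery life"):
    start = sentence.find(phrase)
    print(f"{phrase!r} would span ({start}, {start + len(phrase)})")
```

Each full phrase ends at the same offset as the reported head noun and starts one word earlier, which matches the observation that only the modifier noun is dropped.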

### ML ABSA

The ML ABSA model uses a transformer-based approach built on a pretrained model from Hugging Face.
It uses spaCy to identify potential aspects in the text, and then pairs each with the full sentence to predict sentiment using the pretrained model.

The model uses contextual embeddings to understand the relationship between aspects and opinions, allowing it to handle more complex grammar and sentences than the lexicon-based approach.
But since it's a supervised classifier, it tends to interpret text literally and struggles with sarcasm.
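The pairing step, where each candidate aspect is classified together with the full sentence, can be sketched as follows. This is a minimal illustration, not the notebook's code: `classify_aspects` and the `Classifier` signature are hypothetical, and `keyword_stub` is a placeholder that stands in for the pretrained model so the example runs without downloading anything.

```python
from typing import Callable

# Hypothetical classifier signature: (sentence, aspect) -> (label, confidence).
# In the real model this would wrap the pretrained Hugging Face model.
Classifier = Callable[[str, str], tuple[str, float]]


def classify_aspects(sentence: str, aspects: list[str], classifier: Classifier) -> list[dict]:
    """Run the classifier once per aspect, always passing the full sentence."""
    results = []
    for aspect in aspects:
        label, confidence = classifier(sentence, aspect)
        results.append({"aspect": aspect, "sentiment": label, "confidence": confidence})
    return results


def keyword_stub(sentence: str, aspect: str) -> tuple[str, float]:
    """Placeholder classifier for demonstration only; it is not a trained model."""
    # Look only at the clause that contains the aspect, split on "but".
    clause = next((c for c in sentence.split(" but ") if aspect in c), sentence)
    if "amazing" in clause:
        return "positive", 1.0
    if "awful" in clause:
        return "negative", 1.0
    return "neutral", 0.0


sentence = "The camera quality is amazing but the battery life is awful."
for row in classify_aspects(sentence, ["camera quality", "battery life"], keyword_stub):
    print(row)
```

The point of the structure is that the same sentence is scored once per aspect, so a single sentence can yield different labels for different aspects.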

Looking at the results:

- For “The movie was a masterpiece — I almost fell asleep,” both “The movie” and “a masterpiece” were labeled positive, showing that the model reads sentiment directly from the positive wording and fails to detect sarcasm.

- For “The camera quality is amazing but the battery life is awful,” it performs perfectly, identifying both aspects and assigning the correct positive and negative sentiments with high confidence.

- For “The restaurant has a modern interior and the food is fine,” it correctly detected all aspects* (restaurant, interior, food) and gave the correct sentiment label for food (positive), but neutral for restaurant and interior. No explicit opinion is expressed toward the restaurant itself, so it is marked neutral. As for interior, the word “modern” carries a very weak sentiment polarity, just like with VADER.

NOTE: The aspect detection limitation (detecting “modern interior” as an aspect) comes from the design choice of using noun chunking, and is not the fault of the pretrained model.

### LLM ABSA

The LLM ABSA models use large language models (via Ollama) to perform aspect-based sentiment analysis by giving them custom prompts.
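The prompt-and-parse pattern behind this kind of model can be sketched in plain Python. Everything here is an assumption made for illustration: the notebook's actual prompt and reply format are not shown, so `PROMPT_TEMPLATE`, `build_prompt`, `parse_reply`, and the JSON schema are hypothetical, and the call to Ollama itself is left out.

```python
import json

# Hypothetical prompt template; the notebook's real prompt is not shown.
PROMPT_TEMPLATE = (
    "Extract each aspect in the sentence and its sentiment.\n"
    "Reply with a JSON list of objects with keys "
    '"aspect", "sentiment" (positive, negative, or neutral) and "confidence" (0 to 1).\n'
    "Sentence: {sentence}"
)

VALID_SENTIMENTS = {"positive", "negative", "neutral"}


def build_prompt(sentence: str) -> str:
    """Fill the sentence into the assumed prompt template."""
    return PROMPT_TEMPLATE.format(sentence=sentence)


def parse_reply(reply: str) -> list[dict]:
    """Parse a model reply, dropping anything that does not fit the assumed schema."""
    try:
        items = json.loads(reply)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    results = []
    for item in items:
        if not isinstance(item, dict):
            continue
        sentiment = str(item.get("sentiment", "")).lower()
        if "aspect" not in item or sentiment not in VALID_SENTIMENTS:
            continue
        try:
            confidence = float(item.get("confidence", 0.0))
        except (TypeError, ValueError):
            confidence = 0.0
        results.append({
            "aspect": str(item["aspect"]),
            "sentiment": sentiment,
            # Clamp to [0, 1] in case the model reports something out of range.
            "confidence": min(max(confidence, 0.0), 1.0),
        })
    return results


# A made-up reply for illustration, not actual output from Phi-3 or Mistral.
sample_reply = '[{"aspect": "battery life", "sentiment": "Negative", "confidence": 0.9}]'
print(build_prompt("The battery life is awful."))
print(parse_reply(sample_reply))
```

Validating the reply matters because an LLM's output is free text: a model may return malformed JSON or labels outside the expected set, and the sketch simply discards those entries.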

I’ve chosen two different LLMs to compare how models with different sizes and reasoning capabilities respond to the same prompt:

- Phi-3 (the lightest one at ~2.2GB)
- Mistral 7B (larger and more resource heavy at ~4.1GB)

Looking at the results:

- For “The movie was a masterpiece — I almost fell asleep,” the two models disagree. Phi-3 managed to understand the sarcasm, correctly detecting the overall negative sentiment towards the movie, but marked falling asleep as positive, interpreting the literal action instead of the context (but then again, there's nothing wrong with falling asleep; after all, everyone loves sleeping, so it's positive!). Mistral, on the other hand, does the opposite, treating movie as positive and sleeping as negative.

- For “The camera quality is amazing but the battery life is awful,” both models perform perfectly, extracting camera quality (positive) and battery life (negative) with strong confidence.

- For “The restaurant has a modern interior and the food is fine,” both models correctly extract the aspects (interior and food) and assign the proper sentiments. The main difference is the confidence in food quality: Phi-3 rated it neutral with 0.85 confidence, while Mistral gave 0.65.
