# Comparison

In this notebook, we will:

- Create and explore our dataset
- Test our models on the dataset
- Evaluate them
- Compare them on score and speed

### Imports

In [5]:
from src.lexicon_absa import LexiconABSA
from src.transformer_absa import ML_ABSA
from src.llm_asba import LLMABSA
import pandas as pd
import time
from data_creation import main as create_dataset
from sklearn.metrics import accuracy_score, f1_score, classification_report
import psutil

### Create & Explore our dataset

This step generates a dataset file built from random entries drawn from two datasets: Laptop_Train_v2 and Restaurants_Train_v2.

### Restaurants_Train_v2 (Restaurant reviews):

This dataset consists of over 3K English sentences from the restaurant reviews of Ganu et al. (2009). The original dataset of Ganu et al. included annotations for coarse aspect categories (Subtask 3) and overall sentence polarities...

### Laptop_Train_v2 (Laptop reviews):

This dataset consists of over 3K English sentences extracted from customer reviews of laptops...

More information regarding the dataset: https://www.kaggle.com/datasets/charitarth/semeval-2014-task-4-aspectbasedsentimentanalysis

Credit to the authors and organizers of this dataset:

- Ion Androutsopoulos (Athens University of Economics and Business, Greece)

- Dimitris Galanis (“Athena” Research Center, Greece)

- Suresh Manandhar (University of York, UK)

- Harris Papageorgiou (“Athena” Research Center, Greece)

- John Pavlopoulos (Athens University of Economics and Business, Greece)

- Maria Pontiki (“Athena” Research Center, Greece)
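The sampling step described above can be sketched as follows. This is only an illustration under stated assumptions: the real `create_dataset` implementation is not shown in this notebook, the helper name `sample_mixed_dataset` is hypothetical, and the rows below are made-up placeholders that merely reuse the column names from the dataset preview.

```python
import random


def sample_mixed_dataset(laptop_rows, restaurant_rows, n_total, seed=0):
    """Draw about half of n_total rows from each source, then shuffle them together.

    Hypothetical helper: the actual data_creation.main may sample differently.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    n_laptop = min(n_total // 2, len(laptop_rows))
    n_restaurant = min(n_total - n_laptop, len(restaurant_rows))
    sample = rng.sample(laptop_rows, n_laptop) + rng.sample(restaurant_rows, n_restaurant)
    rng.shuffle(sample)
    return sample


# Made-up placeholder rows (not taken from the real datasets).
laptop_rows = [
    {"id": 1, "Sentence": "The keyboard feels great.", "Aspect Term": "keyboard", "polarity": "positive"},
    {"id": 2, "Sentence": "The battery dies quickly.", "Aspect Term": "battery", "polarity": "negative"},
]
restaurant_rows = [
    {"id": 3, "Sentence": "The soup was cold.", "Aspect Term": "soup", "polarity": "negative"},
    {"id": 4, "Sentence": "Lovely service.", "Aspect Term": "service", "polarity": "positive"},
]

mixed = sample_mixed_dataset(laptop_rows, restaurant_rows, n_total=2, seed=42)
print(len(mixed))
```

Seeding the generator keeps the evaluation set stable between runs, which makes the model scores further down comparable across reruns.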


In [8]:
print("Creating dataset...")
create_dataset()

df = pd.read_csv("../data/dataset.csv")
print(f" Loaded dataset with {len(df)} rows")
display(df.head(10))

Creating dataset...
✅ dataset.csv created with 30 rows.
 Loaded dataset with 30 rows


| | id | Sentence | Aspect Term | polarity | from | to |
|---|---|---|---|---|---|---|
| 0 | 7 | Oh great, another update that slows everything... | update | negative | 11 | 17 |
| 1 | 1970 | The steak was excellent and one of the best I ... | butter | negative | 69 | 75 |
| 2 | 5 | The laptop is light and fast but heats up unde... | laptop | positive | 4 | 10 |
| 3 | 1970 | The steak was excellent and one of the best I ... | meat | positive | 135 | 139 |
| 4 | 2882 | The sweet lassi was excellent as was the lamb ... | sweet lassi | positive | 4 | 15 |
| 5 | 2882 | The sweet lassi was excellent as was the lamb ... | lamb chettinad | positive | 41 | 55 |
| 6 | 8 | The waiter was polite, if you consider ignorin... | waiter | negative | 4 | 10 |
| 7 | 5 | The laptop is light and fast but heats up unde... | laptop | negative | 35 | 41 |
| 8 | 2361 | To celebrate a birthday, three of us went to M... | food | neutral | 69 | 73 |
| 9 | 1958 | I especially like the keyboard which has chicl... | keyboard | positive | 22 | 30 |


### Initialize Models & Create an Evaluation function

In [11]:
models = [LexiconABSA(), ML_ABSA(), LLMABSA(), LLMABSA("mistral:7b")]
df = df.drop_duplicates(subset=["Sentence", "Aspect Term", "polarity"]).reset_index(drop=True)
unique = df["Sentence"].unique()

def evaluate_model(model, df):
    y_true, y_pred = [], []
    s_time = time.time()
    process = psutil.Process()
    mem_before = process.memory_info().rss / (1024 ** 2)  # MB (Credit goes to gpt for this line)
    print("=" * 100)
    print(f"\n Evaluating Model: {model.name}")
    print("=" * 100)

    for idx, text in enumerate(unique):  # idx avoids shadowing the built-in id()
        preds = model.analyze(text)
        sub_df = df[df["Sentence"] == text]

        print(f"\nSentence {idx + 1}: {text}")

        for _, row in sub_df.iterrows():
            gt_aspect = row["Aspect Term"].strip().lower()
            gt_sentiment = row["polarity"].strip().lower()

            matched_pred = None
            for p in preds:
                aspect_text = (p.aspect or "").lower()
                if gt_aspect in aspect_text:
                    matched_pred = p.sentiment
                    break

            y_true.append(gt_sentiment)
            y_pred.append(matched_pred if matched_pred else "notfound")

            print(f"Truth --> Aspect: {gt_aspect} | Sentiment: {gt_sentiment}")

        if preds:
            for p in preds:
                print(f"Predicted --> Aspect: {p.aspect} | Sentiment: {p.sentiment} | Confidence: {p.confidence:.2f}")
        else:
            print("Predicted --> No aspects found")


    elapsed = time.time() - s_time
    mem_after = process.memory_info().rss / (1024 ** 2)
    mem_used = mem_after - mem_before
    acc = accuracy_score(y_true, y_pred)
    f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)
    print(f" \n Accuracy: {acc:.3f} | F1: {f1:.3f} | Time: {elapsed:.2f}s | Memory: {mem_used:.2f} MB")
    print("\n Classification Report:")
    print(classification_report(y_true, y_pred, digits=3, zero_division=0))
    return acc, f1, elapsed, mem_used
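The matching rule inside `evaluate_model` can be isolated as a small sketch: a ground-truth aspect counts as found when it occurs as a substring of a predicted aspect, and unmatched aspects are labelled `notfound`. The `Prediction` dataclass and `match_sentiment` helper are hypothetical stand-ins, assuming only that each model's prediction exposes `aspect` and `sentiment` attributes as used above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prediction:
    """Hypothetical stand-in for the objects returned by model.analyze()."""
    aspect: Optional[str]
    sentiment: str


def match_sentiment(gt_aspect, preds):
    """Return the sentiment of the first prediction whose aspect contains gt_aspect."""
    gt = gt_aspect.strip().lower()
    for p in preds:
        if gt in (p.aspect or "").lower():  # substring match, tolerant of a None aspect
            return p.sentiment
    return "notfound"


preds = [Prediction("steak", "positive"), Prediction("garlic naan", "positive")]
print(match_sentiment("steak", preds))   # exact match
print(match_sentiment("naan", preds))    # matches inside "garlic naan"
print(match_sentiment("butter", preds))  # no prediction covers this aspect
```

One consequence of this rule: when a sentence has two ground-truth rows for the same aspect with different polarities (for example `laptop` as both positive and negative), both rows receive the same predicted label, so at most one of them can be scored as correct.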



### Start Testing

In [12]:
results = []
for model in models:
    acc, f1, t, memused = evaluate_model(model, df)
    results.append({
        "Model": model.name,
        "Accuracy": acc,
        "F1": f1,
        "Time (s)": t,
        "Memory (MB)": memused
    })

results_df = pd.DataFrame(results)
display(results_df)


 Evaluating Model: LexiconABSA

Sentence 1: Oh great, another update that slows everything down.
Truth --> Aspect: update | Sentiment: negative
Predicted --> No aspects found

Sentence 2: The steak was excellent and one of the best I have had (I tasted the butter intitally but in no way did it overwhelm the flavor of the meat).
Truth --> Aspect: butter | Sentiment: negative
Truth --> Aspect: meat | Sentiment: positive
Truth --> Aspect: flavor | Sentiment: neutral
Truth --> Aspect: steak | Sentiment: positive
Predicted --> Aspect: steak | Sentiment: positive | Confidence: 0.57

Sentence 3: The laptop is light and fast but heats up under load.
Truth --> Aspect: laptop | Sentiment: positive
Truth --> Aspect: laptop | Sentiment: negative
Predicted --> Aspect: laptop | Sentiment: neutral | Confidence: 0.00

Sentence 4: The sweet lassi was excellent as was the lamb chettinad and the garlic naan but the rasamalai was forgettable.
Truth --> Aspect: sweet lassi | Sentiment: positive
Truth --> 

| | Model | Accuracy | F1 | Time (s) | Memory (MB) |
|---|---|---|---|---|---|
| 0 | LexiconABSA | 0.166667 | 0.125253 | 0.21997 | 1.007812 |
| 1 | ML_ABSA | 0.666667 | 0.437273 | 5.090733 | 328.085938 |
| 2 | LLMABSA (phi3) | 0.633333 | 0.324074 | 72.791583 | -6.417969 |
| 3 | LLMABSA (mistral:7b) | 0.7 | 0.452564 | 202.543202 | -105.414062 |
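A note on reading the F1 column: it is a macro average, and unmatched aspects are scored with the extra label `notfound`. To my understanding, `f1_score(..., average="macro")` without a `labels` argument averages over every label that appears in either `y_true` or `y_pred`, so a single `notfound` prediction adds a class with F1 of 0 to the average. The pure-Python sketch below reproduces that calculation on made-up labels; `macro_f1` is a hypothetical helper, not part of scikit-learn.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        per_class[label] = 2 * tp / denom if denom else 0.0  # F1 = 2TP / (2TP + FP + FN)
    return sum(per_class.values()) / len(per_class), per_class


# Made-up labels: one aspect was not found by the model.
y_true = ["positive", "negative", "positive", "negative"]
y_pred = ["positive", "negative", "notfound", "negative"]

score, per_class = macro_f1(y_true, y_pred)
print(per_class)
print(round(score, 4))
```

Here three of four predictions are correct, yet the macro score is pulled down because `notfound` enters the average as a third class with an F1 of 0. This is one reason the F1 values in the table sit well below the accuracy values.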


### Print Summary

In [13]:
print("Final Comparison")
display(results_df.sort_values(by="F1", ascending=False))

Final Comparison


| | Model | Accuracy | F1 | Time (s) | Memory (MB) |
|---|---|---|---|---|---|
| 3 | LLMABSA (mistral:7b) | 0.7 | 0.452564 | 202.543202 | -105.414062 |
| 1 | ML_ABSA | 0.666667 | 0.437273 | 5.090733 | 328.085938 |
| 2 | LLMABSA (phi3) | 0.633333 | 0.324074 | 72.791583 | -6.417969 |
| 0 | LexiconABSA | 0.166667 | 0.125253 | 0.21997 | 1.007812 |
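The negative Memory (MB) values are possible because that column is the difference between two RSS snapshots of the notebook process, and the difference drops below zero whenever the process releases memory between the snapshots. If the LLM models run in a separate process (an assumption; the notebook does not show how `LLMABSA` loads them), their memory would not appear in this number at all. As a rough alternative, the sketch below records elapsed time and peak Python-level allocation with the standard library. The `measure` helper is hypothetical, and `tracemalloc` only sees allocations made through Python's allocator, so it also misses external processes and most native-library buffers.

```python
import time
import tracemalloc
from contextlib import contextmanager


@contextmanager
def measure(stats):
    """Record wall-clock seconds and peak traced memory (MB) for the enclosed block."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        yield stats
    finally:
        stats["seconds"] = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()  # returns (current, peak) in bytes
        tracemalloc.stop()
        stats["peak_mb"] = peak / (1024 ** 2)


stats = {}
with measure(stats):
    data = [0] * 1_000_000  # stand-in workload; in the notebook this would be the model calls

print(f"Time: {stats['seconds']:.4f}s | Peak: {stats['peak_mb']:.2f} MB")
```

A peak value can never be negative, which makes it easier to compare across models than a before/after difference.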
