# Qualitative Analysis of Product Consumer Reviews using Aspect-Based Sentiment Analysis

Aspect-Based Sentiment Analysis (ABSA) makes it possible to analyze consumer reviews efficiently, surfacing new market opportunities and insight into consumer needs. Comparing a product's reviews with those of its competitors highlights pain points and areas for development, showing a client where consumer whitespace currently exists.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## 0. Project Setup

In [None]:
!pip install transformers==4.40.2 torch accelerate spacy "setfit[absa]"

Collecting transformers==4.40.2
  Downloading transformers-4.40.2-py3-none-any.whl (9.0 MB)
Collecting accelerate
  Downloading accelerate-0.32.1-py3-none-any.whl (314 kB)
Collecting setfit[absa]
  Downloading setfit-1.0.3-py3-none-any.whl (75 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch)
  Using cached nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
Collecting nvidia-cuda-runtime-cu12==12.1.105 (from torch)
  Using cached nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
Collecting nvidia-cuda-cupti-cu12==12.1.105 (from torch)
  Using cached nvidia_cuda_cupti_cu12-1

In [None]:
!python -m spacy download en_core_web_lg

Collecting en-core-web-lg==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.7.1/en_core_web_lg-3.7.1-py3-none-any.whl (587.7 MB)
Installing collected packages: en-core-web-lg
Successfully installed en-core-web-lg-3.7.1
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_lg')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


In [None]:
import transformers
import torch

## 1. Llama 3

In [None]:
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [None]:
messages = [
    {"role": "system", "content": "You are a specialist in Aspect-Based Sentiment Analysis able to identify key aspects and assign sentiments with numerical rankings to each aspect"},
    {"role": "user", "content": "Perform Aspect-Based Sentiment Analysis on this review: I've been drinking this particular soda for as long as I can remember. It's the kind of drink that seems to define what a classic soda should taste like. The consistency in flavor from bottle to bottle is impressive, and the aftertaste is just right – not too sweet, not too bitter. However, the high calorie and sugar content is becoming a bigger issue for me as I try to adopt a healthier lifestyle. While I enjoy the taste and the nostalgia it brings, I'm increasingly looking for alternatives that offer a similar experience without the health drawbacks. The packaging is a positive aspect, though, as it's clear the company has put thought into environmental impact."},
]

In [None]:
outputs = pipeline(
    messages,
    max_new_tokens=512,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

print(outputs[0]["generated_text"][-1]["content"])

Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


I'd be happy to perform Aspect-Based Sentiment Analysis on this review. Here are the results:

**Aspects:**

1. Taste
2. Consistency
3. Aftertaste
4. Calorie and Sugar Content
5. Packaging
6. Nostalgia

**Sentiment Rankings:**

1. Taste: 4.5/5 (Positive)
The reviewer describes the taste as "classic" and "just right", indicating a high level of satisfaction.
2. Consistency: 4.5/5 (Positive)
The reviewer praises the consistency in flavor from bottle to bottle, indicating a high level of reliability.
3. Aftertaste: 4.5/5 (Positive)
The reviewer finds the aftertaste to be "just right", neither too sweet nor too bitter, indicating a positive experience.
4. Calorie and Sugar Content: 2.5/5 (Neutral/Negative)
The reviewer mentions that the high calorie and sugar content is becoming a bigger issue for them, indicating a negative sentiment.
5. Packaging: 4.5/5 (Positive)
The reviewer praises the packaging, noting that the company has put thought into environmental impact, indicating a positive 
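
The model answers in free text, so the ratings have to be pulled out before they can be tabulated. The following is a minimal sketch, assuming the reply keeps the `N. Aspect: score/5 (Label)` line shape seen above; `RATING_LINE` and `parse_ratings` are hypothetical helper names, and a sampled generation may not always follow this layout.

```python
import re

# Matches lines such as "1. Taste: 4.5/5 (Positive)".
# Assumption: the reply keeps the "N. Aspect: score/5 (Label)" shape shown above.
RATING_LINE = re.compile(
    r"^\s*\d+\.\s*(?P<aspect>.+?):\s*(?P<score>\d+(?:\.\d+)?)/5\s*\((?P<label>[^)]+)\)"
)

def parse_ratings(generated_text):
    """Return one {"aspect", "score", "label"} dict per rating line found."""
    ratings = []
    for line in generated_text.splitlines():
        match = RATING_LINE.match(line)
        if match:
            ratings.append({
                "aspect": match["aspect"].strip(),
                "score": float(match["score"]),
                "label": match["label"].strip(),
            })
    return ratings

# Lines copied from the generated reply above.
sample = """**Sentiment Rankings:**

1. Taste: 4.5/5 (Positive)
The reviewer describes the taste as "classic" and "just right", indicating a high level of satisfaction.
4. Calorie and Sugar Content: 2.5/5 (Neutral/Negative)
"""
ratings = parse_ratings(sample)
print(ratings)
```

Lines that only name an aspect (for example `1. Taste`) and the explanatory sentences are skipped, because they carry no `score/5` part.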

## 2. ABSA using SetFit

In [None]:
from datasets import load_dataset
from setfit import AbsaTrainer, AbsaModel, TrainingArguments

train_dataset = load_dataset("tomaarsen/setfit-absa-semeval-restaurants", split="train[:128]")
eval_dataset = load_dataset("tomaarsen/setfit-absa-semeval-restaurants", split="train[128:256]")

model = AbsaModel.from_pretrained("sentence-transformers/LaBSE")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


model_head.pkl not found on HuggingFace Hub, initialising classification head with random weights. You should TRAIN this model on a downstream task to use it for predictions and inference.
model_head.pkl not found on HuggingFace Hub, initialising classification head with random weights. You should TRAIN this model on a downstream task to use it for predictions and inference.


In [None]:
from transformers import EarlyStoppingCallback

args = TrainingArguments(
    evaluation_strategy="steps",
    num_epochs=5,
    use_amp=True,
    batch_size=8,
    eval_steps=50,
    save_steps=50,
    load_best_model_at_end=True,
)

trainer = AbsaTrainer(
    model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=5)],
)

trainer.train()

***** Running training *****
  Num unique pairs = 21624
  Batch size = 8
  Num epochs = 5
  Total optimization steps = 13515


Step,Training Loss,Validation Loss,Embedding Loss,Learning Rate
50,No log,No log,0.2581,1e-06
100,No log,No log,0.253,1e-06
150,No log,No log,0.2523,2e-06
200,No log,No log,0.2511,3e-06
250,No log,No log,0.2499,4e-06
300,No log,No log,0.2433,4e-06
350,No log,No log,0.2239,5e-06
400,No log,No log,0.2019,6e-06
450,No log,No log,0.1986,7e-06
500,No log,No log,0.1794,7e-06


Loading best SentenceTransformer model from step 600.
***** Running training *****
  Num unique pairs = 8670
  Batch size = 8
  Num epochs = 5
  Total optimization steps = 5420


OutOfMemoryError: CUDA out of memory. Tried to allocate 1.44 GiB. GPU 
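
The step counts reported in the two training logs above appear to follow from the number of sentence pairs, the batch size, and the number of epochs. A short sketch of that arithmetic, assuming a partial final batch counts as a full step; `total_optimization_steps` is a hypothetical helper, not part of SetFit.

```python
import math

def total_optimization_steps(num_unique_pairs, batch_size, num_epochs):
    """Batches per epoch (rounded up) times the number of epochs."""
    return math.ceil(num_unique_pairs / batch_size) * num_epochs

# Values taken from the two "Running training" logs above.
aspect_steps = total_optimization_steps(21624, batch_size=8, num_epochs=5)
polarity_steps = total_optimization_steps(8670, batch_size=8, num_epochs=5)
print(aspect_steps, polarity_steps)  # 13515 5420
```

Under the same assumption, halving the batch size roughly doubles the step count while lowering per-step memory use, which is one lever to consider given the out-of-memory error above.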

In [None]:
model.save_pretrained(
    "/content/drive/MyDrive/ABSA/models/setfit-absa-model-aspect",
    "/content/drive/MyDrive/ABSA/models/setfit-absa-model-polarity"
)

In [None]:
from setfit import AbsaModel

model = AbsaModel.from_pretrained(
    "/content/drive/MyDrive/ABSA/models/setfit-absa-model-aspect",
    "/content/drive/MyDrive/ABSA/models/setfit-absa-model-polarity"
)

In [None]:
import pandas as pd

a_df = pd.read_csv("/content/drive/MyDrive/ABSA/a_reviews.csv", encoding='latin-1')
display(a_df.head())
b_df = pd.read_csv("/content/drive/MyDrive/ABSA/b_reviews.csv", encoding='latin-1')
display(b_df.head())
c_df = pd.read_csv("/content/drive/MyDrive/ABSA/c_reviews.csv", encoding='latin-1')
display(c_df.head())

Unnamed: 0,Beverage_Company_A_Pepper,Good_Features,Bad_features
0,"I've been a loyal fan of this soda for years, ...",taste,"ingredients, packaging"
1,The aftertaste lingers a bit longer than I'd l...,"taste, price",aftertaste
2,I'm not a fan of the packaging. It's hard to o...,taste,packaging
3,"It's readily available, which I appreciate. Th...","availability , taste",ingredients
4,"The taste has been consistent over the years, ...",taste,ingredients


Unnamed: 0,Beverage_Company_B_Coke,Good_Features,Bad_features
0,"The price is fair, and the taste is consistent...","price, taste","nutritional content, ingredients"
1,"The taste is classic and always refreshing, bu...","taste, packaging",nutritional content
2,The sweetness is a bit much for me. I enjoy th...,,sugar content
3,The clean aftertaste and simple ingredients ma...,"aftertaste, ingredients",nutritional content
4,"This soda's taste is consistent, which I appre...",taste,nutritional content


Unnamed: 0,Beverage_Company_C_Arizona,Good_Features,Bad_features
0,"As someone who enjoys a wide variety of teas, ...","taste, packaging","availability, aftertaste"
1,"The taste is good, and the packaging is some o...","taste, packaging",availability
2,"The variety keeps me coming back, and the ingr...","variety, ingredients",aftertaste
3,"The drink is usually easy to find, and the nut...","availability, nutritional content",
4,"The drink leaves a bad aftertaste, and the lac...",,"aftertaste, availability"


In [None]:
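# Predict on the review text (first column), skipping empty rows.
# .tolist() passes a plain list of strings to the model.
a_preds = model.predict(a_df.iloc[:, 0].dropna().tolist())
b_preds = model.predict(b_df.iloc[:, 0].dropna().tolist())
c_preds = model.predict(c_df.iloc[:, 0].dropna().tolist())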

print(a_preds[0])
print(b_preds[0])
print(c_preds[0])

[{'span': 'soda', 'polarity': 'neutral'}, {'span': 'sweetness', 'polarity': 'positive'}, {'span': 'spice', 'polarity': 'positive'}, {'span': 'ingredients', 'polarity': 'negative'}, {'span': 'packaging design', 'polarity': 'negative'}]
[{'span': 'price', 'polarity': 'positive'}, {'span': 'sugar content', 'polarity': 'negative'}]
[{'span': 'teas', 'polarity': 'positive'}, {'span': 'flavor offerings', 'polarity': 'positive'}, {'span': 'ingredients', 'polarity': 'neutral'}, {'span': 'flavor', 'polarity': 'positive'}, {'span': 'soda', 'polarity': 'neutral'}, {'span': 'packaging', 'polarity': 'neutral'}, {'span': 'flavors', 'polarity': 'negative'}, {'span': 'aftertaste', 'polarity': 'negative'}, {'span': 'flavors', 'polarity': 'neutral'}]
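
Because each review file also carries hand-labeled `Good_Features` and `Bad_features` columns, the predictions can be spot-checked against them. The sketch below does an exact, case-insensitive match for a single review; `split_features` and `labeled_aspects_found` are hypothetical helpers, and exact matching is only one possible choice.

```python
def split_features(cell):
    """Turn a cell like "availability , taste" into ["availability", "taste"]."""
    if not isinstance(cell, str):  # pandas reads a missing cell as NaN (a float)
        return []
    return [part.strip().lower() for part in cell.split(",") if part.strip()]

def labeled_aspects_found(preds, good_cell, bad_cell):
    """Report which labeled features appear among spans of the matching polarity."""
    positive_spans = {p["span"].lower() for p in preds if p["polarity"] == "positive"}
    negative_spans = {p["span"].lower() for p in preds if p["polarity"] == "negative"}
    return {
        "good": {f: f in positive_spans for f in split_features(good_cell)},
        "bad": {f: f in negative_spans for f in split_features(bad_cell)},
    }

# First prediction for company A and its labels, copied from the outputs above.
first_a_preds = [
    {"span": "soda", "polarity": "neutral"},
    {"span": "sweetness", "polarity": "positive"},
    {"span": "spice", "polarity": "positive"},
    {"span": "ingredients", "polarity": "negative"},
    {"span": "packaging design", "polarity": "negative"},
]
result = labeled_aspects_found(first_a_preds, "taste", "ingredients, packaging")
print(result)
# {'good': {'taste': False}, 'bad': {'ingredients': True, 'packaging': False}}
```

Exact matching is strict: the predicted span `packaging design` does not count as the label `packaging`. Whether to relax this (for example with substring matching) is a design decision that changes the reported agreement.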


In [None]:
def count_aspects(preds):
  """Count how often each aspect span was predicted with each polarity."""
  aspect_dict = {}
  for review_preds in preds:    # one list of predictions per review
    for pred in review_preds:   # one dict per detected aspect
      counts = aspect_dict.setdefault(pred["span"], {"positive": 0, "neutral": 0, "negative": 0})
      # Any label other than positive or neutral is counted as negative.
      if pred["polarity"] in ("positive", "neutral"):
        counts[pred["polarity"]] += 1
      else:
        counts["negative"] += 1
  # Rows are aspects, columns are polarities.
  return pd.DataFrame.from_dict(aspect_dict).T

for name, preds in (("a", a_preds), ("b", b_preds), ("c", c_preds)):
  aspect_df = count_aspects(preds)
  # sort_values returns a new DataFrame, so the sorted copy is what gets written to disk.
  aspect_df.sort_values("positive", ascending=False).to_csv(f"/content/drive/MyDrive/ABSA/{name}_aspects.csv")
  # Preview the counts in first-seen order.
  display(aspect_df.head())

Unnamed: 0,positive,neutral,negative
soda,3,6,0
sweetness,1,0,0
spice,1,0,0
ingredients,0,2,3
packaging design,0,0,1


Unnamed: 0,positive,neutral,negative
price,2,0,0
sugar content,0,2,1
calorie intake,0,1,0
packaging,4,0,0
can,0,1,0


Unnamed: 0,positive,neutral,negative
teas,1,0,0
flavor offerings,1,0,0
ingredients,4,1,1
flavor,2,0,0
soda,0,1,0
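
To compare an aspect across competitors, the three counts can be collapsed into a single number. The sketch below uses a net sentiment score, (positive - negative) / total mentions; this metric and the `net_sentiment` helper are assumptions for illustration, not something the notebook computes.

```python
def net_sentiment(counts):
    """(positive - negative) / total mentions, in [-1, 1]; 0.0 when there are no mentions."""
    total = counts["positive"] + counts["neutral"] + counts["negative"]
    if total == 0:
        return 0.0
    return (counts["positive"] - counts["negative"]) / total

# "ingredients" counts copied from the company A and company C previews above.
ingredients = {
    "A": {"positive": 0, "neutral": 2, "negative": 3},
    "C": {"positive": 4, "neutral": 1, "negative": 1},
}
scores = {company: net_sentiment(c) for company, c in ingredients.items()}
print(scores)  # {'A': -0.6, 'C': 0.5}
```

On these counts, ingredients read as a pain point for company A and a relative strength for company C, though with only five or six mentions each the gap should be treated as indicative rather than conclusive.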
