## Aspect Based Sentiment Analysis (ABSA)

ABSA is used to identify the different aspects of a given target entity (such as a product or service) and the sentiment expressed towards each aspect in customer reviews or other text data. 

ABSA is further divided into four subtasks:

Subtask 1: It involves identifying aspect terms present in a given sentence containing pre-identified entities, such as restaurants. The goal is to extract the distinct aspect terms that refer to specific aspects of the target entity.

For example, “I liked the service and the staff, but not the food”, “The food was nothing much, but I loved the staff”. Multi-word aspect terms (e.g., “hard disk”) should be treated as single terms (e.g., in “The hard disk is very noisy” the only aspect term is “hard disk”).


Subtask 2: It involves determining the polarity (positive, negative, neutral, or conflict) of each aspect term in a given sentence. For a set of aspect terms, the task is to identify their polarity based on the sentiment expressed towards them.

For example:

“I loved their fajitas” → {fajitas: positive}
“I hated their fajitas, but their salads were great” → {fajitas: negative, salads: positive}
“The fajitas are their first plate” → {fajitas: neutral}
“The fajitas were great to taste, but not to see” → {fajitas: conflict}

Subtask 3: It involves identifying the aspect categories discussed in a given sentence from a predefined set of categories such as price, food, service, ambience, and anecdotes/miscellaneous. The aspect categories are coarser than the aspect terms of subtask 1 and may not necessarily occur as terms in the sentence.

For example, given the set of aspect categories {food, service, price, ambience, anecdotes/miscellaneous}:

“The restaurant was too expensive” → {price}
“The restaurant was expensive, but the menu was great” → {price, food}

Subtask 4: It involves determining the polarity of each pre-identified aspect category (e.g., food, price). The goal is to identify the sentiment polarity of each category based on the sentiment expressed towards them in a given sentence.

For example:

“The restaurant was too expensive” → {price: negative}
“The restaurant was expensive, but the menu was great” → {price: negative, food: positive}
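
As a rough illustration of how the outputs of the four subtasks fit together, the sketch below stores them in one record per sentence. The `AbsaAnnotation` class and its field names are hypothetical and are not used elsewhere in this notebook; the example values are taken from the subtask examples above.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AbsaAnnotation:
    """Hypothetical container for the ABSA outputs of a single sentence."""

    sentence: str
    # Subtasks 1 and 2: aspect term -> polarity
    aspect_terms: Dict[str, str] = field(default_factory=dict)
    # Subtasks 3 and 4: aspect category -> polarity
    aspect_categories: Dict[str, str] = field(default_factory=dict)


# Aspect-term example (subtasks 1 and 2)
term_example = AbsaAnnotation(
    sentence="I hated their fajitas, but their salads were great",
    aspect_terms={"fajitas": "negative", "salads": "positive"},
)

# Aspect-category example (subtasks 3 and 4)
category_example = AbsaAnnotation(
    sentence="The restaurant was expensive, but the menu was great",
    aspect_categories={"price": "negative", "food": "positive"},
)

print(sorted(term_example.aspect_terms))
print(sorted(category_example.aspect_categories))
```

The rest of this notebook only deals with the aspect-term polarity mapping (subtask 2).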

In [1]:
# Importing the libraries required for this task

import os
import random
import sys
import time
from enum import Enum
from typing import List, Tuple

# vector LLM toolkit
import kscope
import numpy as np
import pandas as pd
import sklearn.metrics
import torch
import tqdm.notebook  # import the submodule explicitly, since tqdm.notebook.tqdm is used below
import transformers
from torch.utils.data import DataLoader, Dataset

# Print version information - check you are using correct environment
print("Python version: " + sys.version)
print("PyTorch version: " + torch.__version__)
print("Transformers version: " + transformers.__version__)

Python version: 3.9.15 (main, Nov 24 2022, 08:29:02) 
[Clang 14.0.6 ]
PyTorch version: 1.13.1
Transformers version: 4.36.2


## Getting Started

Next, we connect to the Kaleidoscope service, which gives us access to LLaMA2-7B and the other large language models hosted there. We also check which models are available to us.

In [2]:
# Establish a client connection to the Kaleidoscope service
client = kscope.Client(gateway_host="llm.cluster.local", gateway_port=3001)

In [3]:
# Checking what models are available in Kaleidoscope
client.models

['gpt2',
 'llama2-7b',
 'llama2-7b_chat',
 'llama2-13b',
 'llama2-13b_chat',
 'llama2-70b',
 'llama2-70b_chat',
 'falcon-7b',
 'falcon-40b',
 'sdxl-turbo']

In [4]:
# Checking how many model instances are active
client.model_instances

[{'id': '9690cf13-9b39-43d0-8ffb-70d8f765591d',
  'name': 'llama2-7b',
  'state': 'ACTIVE'}]

### Loading model and setting up the dataset

In [5]:
# For this notebook, we will be focusing on LLaMA2-7B

model = client.load_model("llama2-7b")
# If this model is not actively running, it will get launched in the background.
# In this case, wait until it moves into an "ACTIVE" state before proceeding.
while model.state != "ACTIVE":
    time.sleep(1)

print("The model is active!")

The model is active!
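
The polling loop above waits indefinitely if the model never becomes active. A minimal sketch of a bounded wait is shown below. It assumes only that the model object exposes a `state` attribute, as in the cell above. The `wait_until_active` helper, the `FakeModel` stand-in, and the `"LAUNCHING"` state name are hypothetical and exist only to make the example runnable without the service.

```python
import time


def wait_until_active(model, timeout_s: float = 600.0, poll_interval_s: float = 1.0) -> None:
    """Poll model.state until it is "ACTIVE"; raise TimeoutError if timeout_s elapses first."""
    deadline = time.monotonic() + timeout_s
    while model.state != "ACTIVE":
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Model did not become ACTIVE within {timeout_s} seconds")
        time.sleep(poll_interval_s)


class FakeModel:
    """Stand-in for a remote model (hypothetical): reports ACTIVE after a few polls."""

    def __init__(self, polls_until_active: int) -> None:
        self._remaining = polls_until_active

    @property
    def state(self) -> str:
        if self._remaining > 0:
            self._remaining -= 1
            return "LAUNCHING"  # placeholder name for any non-active state
        return "ACTIVE"


fake_model = FakeModel(polls_until_active=3)
wait_until_active(fake_model, timeout_s=5.0, poll_interval_s=0.01)
print("The model is active!")
```

With the real client, the same helper could be called on the object returned by `client.load_model`, with a timeout chosen to match typical launch times on the cluster.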


In [6]:
class CustomDataset(Dataset):
    """
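    Dataset wrapper around a pandas dataframe of ABSA examples.

    Attributes
    ----------
    df : pandas dataframe
        Dataset in the format of a pandas dataframe.
        Ensure it has columns named text, sentence_with_full_prompt, and aspect_term_polarity.

    Methods
    -------
    __getitem__(index)
        Returns the (text, prompt, polarity) triple for the row at position index.
    __len__()
        Returns the number of rows in the dataframe.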
    """

    def __init__(self, df: pd.DataFrame) -> None:
        self.df = df

    def __getitem__(self, index: int) -> Tuple[str, str, str]:
        row = self.df.iloc[index]
        text_prompt = row["sentence_with_full_prompt"]
        polarity = row["aspect_term_polarity"]
        text = row["text"]
        return text, text_prompt, polarity

    def __len__(self) -> int:
        return len(self.df)

### Prompt Examples

The next section gives some example prompts that demonstrate how to set up zero-shot and few-shot prompts for this task.


#### Prompt Setup for Input and Zero-shot examples

* `df['sentence_with_prompt'] = 'Sentence: ' + df['text'] + ' ' + 'Sentiment on ' + df['aspect_term'] + ' is'`


* `df['sentence_with_prompt'] = 'Sentence: ' + df['text'] + ' ' + 'Sentiment on ' + df['aspect_term'] + ' is positive or negative? It is'`


* `df['sentence_with_prompt'] = 'Answer the question using the sentence provided. \nQuestion: What is the sentiment on ' + df['aspect_term'] + ' - positive, negative, or neutral?' + '\nSentence: ' + df['text'] + '\nAnswer:'`


* `df['sentence_with_prompt'] = 'Sentence: ' + df['text'] + ' ' + 'The sentiment associated with ' + df['aspect_term'] + ' is'`


* `df['sentence_with_prompt'] = '\nSentence: ' + df['text'] + ' ' + '\nQuestion: What is the sentiment on ' + df['aspect_term'] + '? \nAnswer:'`
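
To make the templates above concrete for a single example, the sketch below applies the question-style template to one sentence without pandas. The function name `build_zero_shot_prompt` is hypothetical. The template text itself is copied from the list above.

```python
def build_zero_shot_prompt(text: str, aspect_term: str) -> str:
    """Format one sentence/aspect pair with the question-style template."""
    return (
        "Answer the question using the sentence provided. "
        f"\nQuestion: What is the sentiment on {aspect_term} - positive, negative, or neutral?"
        f"\nSentence: {text}"
        "\nAnswer:"
    )


prompt = build_zero_shot_prompt("The hard disk is very noisy", "hard disk")
print(prompt)
```

Each template ends exactly where the label is expected, which matters later in this notebook because only the first generated token is read as the prediction.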

#### Few-shot Examples

The examples below show few-shot demonstrations. These are prepended to the input and final question, which follow one of the formats above, for the model to answer. The first two examples are two- and three-shot demonstrations in which completing "Sentiment on `<aspect term>` is" produces the model response.

* `demonstrations = 'Sentence: Albert Einstein was one of the greatest intellects of his time. Sentiment on Albert Einstein is positive. \nSentence: The sweet lassi was excellent as was the lamb chettinad and the garlic naan but the rasamalai was forgettable. Sentiment on rasmalai is negative. '`


* `demonstrations = "Sentence: Their pizza was good, but the service was bad. Sentiment on pizza is positive. \nSentence: I charge it at night and skip taking the cord with me because it's too heavy. Sentiment on cord is negative. \nSentence: My suggestion is to eat family style because you'll want to try the other dishes. Sentiment on dishes is neutral. "`

The examples below offer several other alternatives for formatting of the prompt to produce a response to the ABSA task from the model in a few-shot setting. These include both sentence completion and question-based forms.

* `demonstrations = 'Sentence: Albert Einstein was one of the greatest intellects of his time. Sentiment on Albert Einstein is positive or negative? It is positive. \nSentence: The sweet lassi was excellent as was the lamb chettinad and the garlic naan but the rasamalai was forgettable. Sentiment on rasmalai is positive or negative? It is negative. '`


* `demonstrations = 'Sentence: Albert Einstein was one of the greatest intellects of his time. The sentiment associated with Albert Einstein is positive. \nSentence: The sweet lassi was excellent as was the lamb chettinad and the garlic naan but the rasamalai was forgettable. The sentiment associated with rasmalai is negative. '`


* `demonstrations = 'Sentence: Albert Einstein was one of the greatest intellects of his time. The sentiment associated with Albert Einstein is positive. \nSentence: The sweet lassi was excellent as was the lamb chettinad and the garlic naan but the rasamalai was forgettable. The sentiment associated with rasmalai is negative. \nSentence: I had my software installed and ready to go, but the system crashed. The sentiment associated with software is neutral. '`


* `demonstrations = 'Sentence: Albert Einstein was one of the greatest intellects of his time. \nQuestion: What is the sentiment on Albert Einstein? \nAnswer: positive \n\nSentence: The sweet lassi was excellent as was the lamb chettinad and the garlic naan but the rasamalai was forgettable. \nQuestion: What is the sentiment on rasmalai? \nAnswer: negative '`
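
Rather than writing each demonstration string by hand, the demonstrations can be assembled from labelled triples. The sketch below does this for the "Sentiment on `<aspect term>` is" completion format. The helper names `build_demonstrations` and `build_few_shot_prompt` are hypothetical, and the exact whitespace between demonstrations is a choice made here rather than something the task prescribes.

```python
from typing import List, Tuple

# (sentence, aspect term, polarity label)
Demonstration = Tuple[str, str, str]


def build_demonstrations(examples: List[Demonstration]) -> str:
    """Join labelled examples, one per line, in the sentence-completion format."""
    return "".join(
        f"Sentence: {sentence} Sentiment on {aspect} is {label}. \n"
        for sentence, aspect, label in examples
    )


def build_few_shot_prompt(examples: List[Demonstration], text: str, aspect_term: str) -> str:
    """Prepend the demonstrations to the unlabelled query the model should complete."""
    return build_demonstrations(examples) + f"Sentence: {text} Sentiment on {aspect_term} is"


shots = [
    ("Their pizza was good, but the service was bad.", "pizza", "positive"),
    ("My suggestion is to eat family style because you'll want to try the other dishes.", "dishes", "neutral"),
]
print(build_few_shot_prompt(shots, "The hard disk is very noisy.", "hard disk"))
```

Keeping the demonstrations as data makes it easier to vary the number of shots or the label balance among them when comparing prompts.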

In this notebook, accuracy can be measured with two different prompting approaches, few-shot and zero-shot:

* Zero-shot approach ["`zero-shot`"] (i.e. give the model the input sentence and ask for the sentiment).
* Few-shot approach ["`few-shot`"] (i.e. give some example sentences for the model to determine what should come next for the input sentence, then ask for the sentiment of a new example).

In [7]:
# Types of prompt to be used.
class PromptType(Enum):
    FEW_SHOT = "few-shot"
    ZERO_SHOT = "zero-shot"


# Datasets available to use with this notebook.
class AbsaDataset(Enum):
    LAPTOPS_TRAIN_GOLD = "Laptops_Train_v2.csv"
    # This csv file contains customer reviews of laptops collected in 2014 with size of around 469
    LAPTOPS_TEST_GOLD = "Laptops_Test_Gold.csv"


def get_dataset_path(dataset: AbsaDataset) -> str:
    # location of all available datasets.
    path_stub = "resources/absa_datasets/"
    return os.path.join(path_stub, dataset.value)

In [8]:
generation_type = PromptType.FEW_SHOT
absa_dataset = AbsaDataset.LAPTOPS_TEST_GOLD

dataset_path = get_dataset_path(absa_dataset)

## Dataset Preprocessing and Setting up the Prompts

In [9]:
def filter_by_labels(label: str, filter_by: Tuple[str, ...] = ("positive", "negative", "neutral")) -> bool:
    # Keep a label only if it contains one of the target labels; this drops "conflict".
    return any(label_filter in label for label_filter in filter_by)
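
To see what this filter keeps and drops, the short sketch below restates the same substring test and runs it on the four polarity labels from subtask 2. The function is redefined here only so the example is self-contained.

```python
from typing import Sequence


def filter_by_labels(label: str, filter_by: Sequence[str] = ("positive", "negative", "neutral")) -> bool:
    # Substring test: True if any target label occurs inside the given label string.
    return any(label_filter in label for label_filter in filter_by)


results = {label: filter_by_labels(label) for label in ["positive", "negative", "neutral", "conflict"]}
for label, keep in results.items():
    print(f"{label}: {keep}")
```

Because the check is a substring match rather than an equality test, a label with stray whitespace such as `"positive "` would also pass the filter.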

In [10]:
df = pd.read_csv(dataset_path)

print("----------------------------------------------------------------")
df.info()
print()

# Delete any rows with a missing aspect term or polarity
df = df.dropna(axis=0, how="any", subset=["aspect_term", "aspect_term_polarity"])

# Set the prompt format for the input sentence (adapted from the examples above)
df["sentence_with_prompt"] = (
    "Sentence: "
    + df["text"]
    + " "
    + "Is the sentiment on "
    + df["aspect_term"]
    + " positive, negative, or neutral? It is"
)

# Keep only instances whose polarity is positive, negative, or neutral
df = df.loc[df["aspect_term_polarity"].apply(filter_by_labels)]

# Three shot demonstrations to include if we're doing a few-shot prompt
demonstrations = (
    "Sentence: Albert Einstein was one of the greatest intellects of his time. Is the sentiment on Albert Einstein "
    "positive, negative, or neutral? It is positive. \nSentence: The sweet lassi was excellent as was the lamb "
    "chettinad and the garlic naan but the rasamalai was forgettable. Is the sentiment on rasmalai positive, "
    "negative, or neutral? It is negative. \nSentence: I had my software installed and ready to go, but the system "
    "crashed. Is the sentiment on the software positive, negative, or neutral? It is neutral."
)
# For few-shot prompting, we prepend the demonstrations to give the model more context, which is intended to improve
# its performance and generalizability.
if generation_type is PromptType.FEW_SHOT:
    df["sentence_with_full_prompt"] = demonstrations + "\n" + df["sentence_with_prompt"]
elif generation_type is PromptType.ZERO_SHOT:
    df["sentence_with_full_prompt"] = df["sentence_with_prompt"]
else:
    raise ValueError("Invalid generation type: Please select from zero-shot or few-shot.")

df.info()
print(f"Unique Labels: {df['aspect_term_polarity'].unique()}")

----------------------------------------------------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1032 entries, 0 to 1031
Data columns (total 6 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   id                    800 non-null    object 
 1   text                  1032 non-null   object 
 2   aspect_term           654 non-null    object 
 3   aspect_term_polarity  654 non-null    object 
 4   aspect_term_from      654 non-null    float64
 5   aspect_term_to        654 non-null    float64
dtypes: float64(2), object(4)
memory usage: 48.5+ KB

<class 'pandas.core.frame.DataFrame'>
Int64Index: 638 entries, 0 to 1031
Data columns (total 8 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   id                         410 non-null    object 
 1   text                       638 non-null    object 
 2   aspect_term                638 no

## Prompt the language model for the full dataset

Let's first take a look at an example of the prompts that we have created above.

In [11]:
# Construct the dataloader with custom created dataset as input
data = CustomDataset(df)
dataloader = DataLoader(data, batch_size=2)

# Grab the first example from the dataloader for inspection
text, text_prompt, polarity = next(iter(dataloader))
print(f"ORIGINAL TEXT: {text[0]}\n")
print(f"PROMPT: {text_prompt[0]}\n")
print(f"LABEL: {polarity[0]}")

ORIGINAL TEXT: Boot time is super fast, around anywhere from 35 seconds to 1 minute.

PROMPT: Sentence: Albert Einstein was one of the greatest intellects of his time. Is the sentiment on Albert Einstein positive, negative, or neutral? It is positive. 
Sentence: The sweet lassi was excellent as was the lamb chettinad and the garlic naan but the rasamalai was forgettable. Is the sentiment on rasmalai positive, negative, or neutral? It is negative. 
Sentence: I had my software installed and ready to go, but the system crashed. Is the sentiment on the software positive, negative, or neutral? It is neutral.
Sentence: Boot time is super fast, around anywhere from 35 seconds to 1 minute. Is the sentiment on Boot time positive, negative, or neutral? It is

LABEL: positive


In [12]:
# Create configuration for the model. We're only looking for a short response. So we set the max tokens to be
# generated to 1. For a discussion of the configuration parameters see:
# src/reference_implementations/prompting_vector_llms/CONFIG_README.md
generation_config = {"max_tokens": 1, "top_k": 1, "top_p": 1.0, "temperature": 1.0}
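
The sketch below illustrates why `top_k` set to 1 makes generation deterministic under the usual definition of top-k sampling: only the single highest-scoring token remains as a candidate, so `temperature` and `top_p` no longer affect the result. This is a generic illustration with made-up logits, not the Kaleidoscope implementation, and it assumes the service follows the common meaning of these parameters.

```python
import math
import random
from typing import Dict


def sample_top_k(logits: Dict[str, float], top_k: int, temperature: float, rng: random.Random) -> str:
    """Sample one token from the top_k highest-scoring candidates after temperature scaling."""
    candidates = sorted(logits.items(), key=lambda item: item[1], reverse=True)[:top_k]
    tokens = [token for token, _ in candidates]
    weights = [math.exp(score / temperature) for _, score in candidates]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Toy scores for the next token (illustrative values only)
toy_logits = {"positive": 2.1, "negative": 0.3, "neutral": 1.7}
rng = random.Random(0)

# With top_k=1 every draw returns the arg-max token
greedy_draws = {sample_top_k(toy_logits, top_k=1, temperature=1.0, rng=rng) for _ in range(100)}
print(greedy_draws)
```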

We'll first consider an example to see what the output looks like. Note that we send the model a batch of prompts of size 2.

In [13]:
text, text_prompt, polarity = next(iter(dataloader))
generated_tokens_batch = model.generate(text_prompt, generation_config).generation["tokens"]
for index, prompt_tokens in enumerate(generated_tokens_batch):
    print(f"PROMPT TEXT: {text_prompt[index]}")
    print(f"Prompt {index + 1} GENERATED TOKENS: {prompt_tokens}\n")

PROMPT TEXT: Sentence: Albert Einstein was one of the greatest intellects of his time. Is the sentiment on Albert Einstein positive, negative, or neutral? It is positive. 
Sentence: The sweet lassi was excellent as was the lamb chettinad and the garlic naan but the rasamalai was forgettable. Is the sentiment on rasmalai positive, negative, or neutral? It is negative. 
Sentence: I had my software installed and ready to go, but the system crashed. Is the sentiment on the software positive, negative, or neutral? It is neutral.
Sentence: Boot time is super fast, around anywhere from 35 seconds to 1 minute. Is the sentiment on Boot time positive, negative, or neutral? It is
Prompt 1 GENERATED TOKENS: ['positive']

PROMPT TEXT: Sentence: Albert Einstein was one of the greatest intellects of his time. Is the sentiment on Albert Einstein positive, negative, or neutral? It is positive. 
Sentence: The sweet lassi was excellent as was the lamb chettinad and the garlic naan but the rasamalai was f

In [14]:
# create a dataloader with a larger batch size to process full dataset.
dataloader = DataLoader(data, batch_size=10)
# initialize predictions and labels.
raw_predictions = []
labels = []

for _, text_prompt, polarity in tqdm.notebook.tqdm(dataloader):
    generated_tokens = model.generate(text_prompt, generation_config).generation["tokens"]
    # Note that we are looking at the model's generated response and attempting to match it to one of the labels in
    # our label space. If the model produces a different token, it is replaced by a randomly chosen label in the
    # postprocessing step below.
    first_predicted_tokens = [tokens[0].strip().lower() for tokens in generated_tokens]
    raw_predictions.extend(first_predicted_tokens)
    labels.extend(list(polarity))

  0%|          | 0/64 [00:00<?, ?it/s]

In [15]:
# Postprocess the predictions. If any of the predictions are not the strings "positive", "negative", or "neutral" then
# we will assign them to one of them randomly.
label_strings = ["positive", "negative", "neutral"]
predictions = []
for prediction in raw_predictions:
    if prediction not in label_strings:
        print(f"Prediction {prediction} does not match one of {', '.join(label_strings)}")
        predictions.append(random.choice(label_strings))
    else:
        predictions.append(prediction)

Prediction nothing does not match one of positive, negative, neutral
Prediction pos does not match one of positive, negative, neutral
Prediction pos does not match one of positive, negative, neutral
Prediction definitely does not match one of positive, negative, neutral
Prediction np does not match one of positive, negative, neutral
Prediction not does not match one of positive, negative, neutral
Prediction slightly does not match one of positive, negative, neutral
Prediction mixed does not match one of positive, negative, neutral
Prediction still does not match one of positive, negative, neutral
Prediction a does not match one of positive, negative, neutral
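
Some of the unmatched tokens above, such as `pos`, are truncated forms of a valid label. One possible refinement, sketched below, is to accept a token when it is a prefix of exactly one label and to fall back to the random assignment only otherwise. This is an alternative to the postprocessing used in this notebook, not what produced the results reported here, and the helper name `normalize_prediction` is hypothetical.

```python
from typing import Optional

LABEL_STRINGS = ["positive", "negative", "neutral"]


def normalize_prediction(token: str) -> Optional[str]:
    """Return the label that the token is an unambiguous prefix of, or None if there is no such label."""
    cleaned = token.strip().lower()
    if not cleaned:
        return None
    matches = [label for label in LABEL_STRINGS if label.startswith(cleaned)]
    # Ambiguous prefixes such as "ne" (negative or neutral) are rejected
    return matches[0] if len(matches) == 1 else None


for raw in ["pos", "positive", "ne", "mixed", "nothing"]:
    print(f"{raw!r} -> {normalize_prediction(raw)}")
```

Tokens for which the helper returns `None` would still need a fallback such as the random assignment above.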


## Measure the accuracy and construct the confusion matrix

In [16]:
# The labels associated with the dataset
labels_order = ["positive", "neutral", "negative"]

cm = sklearn.metrics.confusion_matrix(np.array(labels), np.array(predictions), labels=labels_order)

FP = cm.sum(axis=0) - np.diag(cm)
FN = cm.sum(axis=1) - np.diag(cm)
TP = np.diag(cm)

recall = TP / (TP + FN)
precision = TP / (TP + FP)
f1 = 2 * (precision * recall) / (precision + recall)
print(f"Prediction Accuracy: {TP.sum()/(cm.sum())}")

print(f"Confusion Matrix with ordering {labels_order}")
print(cm)
print("========================================================")
for label_index, label_name in enumerate(labels_order):
    print(
        f"Label: {label_name}, F1: {f1[label_index]}, Precision: {precision[label_index]}, "
        f"Recall: {recall[label_index]}"
    )

Prediction Accuracy: 0.5031347962382445
Confusion Matrix with ordering ['positive', 'neutral', 'negative']
[[203 104  34]
 [ 67  66  36]
 [ 40  36  52]]
Label: positive, F1: 0.6236559139784946, Precision: 0.6548387096774193, Recall: 0.5953079178885631
Label: neutral, F1: 0.352, Precision: 0.32038834951456313, Recall: 0.3905325443786982
Label: negative, F1: 0.4159999999999999, Precision: 0.4262295081967213, Recall: 0.40625


The model performs the task well above random guessing (33%) and mostly generates responses within the label space we are targeting. However, there is room for improvement: given the label balance (341 of the 638 examples are positive), always predicting positive would yield an accuracy of around 53%, which is higher than the roughly 50% achieved here.
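
The comparison with the label balance can be checked directly from the confusion matrix printed above. The sketch below recomputes the accuracy, the majority-class baseline (always predicting the most frequent true label), and the macro-averaged F1 in plain Python. The matrix values are copied from the output above; the variable names are chosen here for illustration.

```python
# Confusion matrix from the output above: rows are true labels, columns are predicted labels
labels_order = ["positive", "neutral", "negative"]
cm = [
    [203, 104, 34],
    [67, 66, 36],
    [40, 36, 52],
]
n_labels = len(labels_order)

total = sum(sum(row) for row in cm)
accuracy = sum(cm[i][i] for i in range(n_labels)) / total

# Row sums give the number of true examples per label
true_counts = [sum(row) for row in cm]
majority_index = max(range(n_labels), key=lambda i: true_counts[i])
majority_baseline = true_counts[majority_index] / total


def f1_for(index: int) -> float:
    """Per-label F1 computed from true positives, false positives, and false negatives."""
    tp = cm[index][index]
    fp = sum(cm[row][index] for row in range(n_labels)) - tp
    fn = sum(cm[index]) - tp
    return 2 * tp / (2 * tp + fp + fn)


macro_f1 = sum(f1_for(i) for i in range(n_labels)) / n_labels

print(f"Accuracy: {accuracy:.4f}")
print(f"Majority-class baseline ({labels_order[majority_index]}): {majority_baseline:.4f}")
print(f"Macro F1: {macro_f1:.4f}")
```

Macro F1 weights the three labels equally, so it is less dominated by the positive class than accuracy is.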