<a href="https://colab.research.google.com/github/UCREL/Session_5_Large_Language_Models/blob/main/LLM_Sentiment_Analysis_Part2_Answers.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Sentiment Analysis using Large Language Models (LLMs) - Part II

## Initial Setup

In [None]:
# Logging in to Hugging Face to access LLMs
!huggingface-cli login


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) n
Token is valid (permission: read).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [None]:
# Installing libraries which are not available in the Colab environment
!pip install -q accelerate


In [None]:
# Importing libraries
import pandas as pd

from transformers import AutoTokenizer, pipeline
import torch

from pprint import pprint
from tqdm.auto import tqdm
from sklearn import metrics

import time

## Sentiment Analysis using [Meta-LLaMA-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)

### Load Datasets

(The datasets used for this tutorial are a subset of the [Amazon Fine Food Reviews dataset](https://www.kaggle.com/datasets/snap/amazon-fine-food-reviews).)

In [None]:
train_data = pd.read_csv('https://raw.githubusercontent.com/UCREL/Session_5_Large_Language_Models/main/data/train-sa.csv')
test_data = pd.read_csv('https://raw.githubusercontent.com/UCREL/Session_5_Large_Language_Models/main/data/test-sa.csv')

print(f'train: {train_data.shape}')
print(f'test: {test_data.shape}')

train: (10, 2)
test: (100, 2)


### Define Pipeline

In [None]:
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

pipe_lm = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [None]:
def query(pipe, inputs):
  """
  :param pipe: text-generation pipeline
  :param inputs: list of chats, each a list of {"role", "content"} message dicts
  :return: list of assistant replies (str), one per chat
  """
  assistant_outputs = []

  for out in tqdm(pipe(
      inputs,
      max_new_tokens=50,
      pad_token_id = pipe.model.config.eos_token_id,
  )):
    assistant_outputs.append(out[0]["generated_text"][-1]['content'].strip())

  return assistant_outputs
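
The indexing `out[0]["generated_text"][-1]['content']` in `query` implies that, for chat-style inputs, each pipeline result holds the whole conversation with the newly generated assistant message appended last. A minimal sketch of that extraction follows; `mock_pipeline_output` and `extract_replies` are hypothetical names, and the data is hand-written rather than produced by a model.

```python
# Hand-written stand-in for the pipeline's return value on two chats.
# No model is loaded; the nested shape mirrors what `query` indexes into.
mock_pipeline_output = [
    [{"generated_text": [
        {"role": "system", "content": "Return label only."},
        {"role": "user", "content": "Great product!"},
        {"role": "assistant", "content": " positive\n"},
    ]}],
    [{"generated_text": [
        {"role": "system", "content": "Return label only."},
        {"role": "user", "content": "Arrived broken."},
        {"role": "assistant", "content": "negative"},
    ]}],
]


def extract_replies(outputs):
    """Return the stripped content of the last message of each conversation."""
    return [out[0]["generated_text"][-1]["content"].strip() for out in outputs]


replies = extract_replies(mock_pipeline_output)
print(replies)  # ['positive', 'negative']
```

The `.strip()` call matters because generated text can carry leading spaces or trailing newlines, which would otherwise make an exact label comparison fail.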

### Exercise 2: Zero-shot Prompting

* Predict the sentiment of each review in the test dataset using zero-shot prompting with the LLaMA model.
* Calculate the accuracy of the predictions and compare it with the accuracy obtained from zero-shot prompting with the Mistral model.

In [None]:
# Answer - format chat prompts
def format_chat(row):
  return [
    {"role": "system", "content": "Please perform Sentiment Classification task. Given the text, assign a sentiment label from ['negative', 'positive']. Return label only without any other text."},
    {"role": "user", "content": row['text']}
  ]

test_data.loc[:, 'chat'] = test_data.apply(format_chat, axis=1)
pprint(test_data.loc[:1, 'chat'].tolist(), sort_dicts=False)

[[{'role': 'system',
   'content': 'Please perform Sentiment Classification task. Given the text, '
              "assign a sentiment label from ['negative', 'positive']. Return "
              'label only without any other text.'},
  {'role': 'user',
   'content': 'My young Keeshond puppy is a master of the Tug-a-Jug, but even '
              'my 10 year old Border Collie loves the thing!  She will play '
              'with it until every last bit of kibble is liberated.  After '
              'many hours of play, the jug is a bit scratched, but the rope '
              'remains intact.  My dogs do not chew the rope (or I would '
              'remove this toy and replace with something more appropriate). '
              'They seem to recognize that the rope is helpful in removing '
              'kibble from the jug.  Tug-a-Jug is definitely a hit!  It keeps '
              'my very busy puppy occupied for 45-60 minutes per refill.'}],
 [{'role': 'system',
   'content': 'Please perf

In [None]:
# Answer - perform text generation
start_time = time.time()
predictions = query(pipe_lm, test_data['chat'].tolist())
print(f'Time: {int(time.time() - start_time)} seconds')

print(predictions)


Time: 299 seconds
['positive', 'negative', 'negative', 'positive', 'negative', 'negative', 'negative', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'positive', 'positive', 'positive', 'positive', 'negative', 'positive', 'positive', 'negative', 'positive', 'negative', 'positive', 'positive', 'negative', 'negative', 'negative', 'negative', 'negative', 'positive', 'negative', 'positive', 'positive', 'negative', 'negative', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'negative', 'positive', 'positive', 'negative', 'negative', 'negative', 'negative', 'negative', 'positive', 'positive', 'positive', 'negative', 'positive', 'negative', 'negative', 'negative', 'negative', 'negative', 'positive', 'positive', 'negative', 'positive', 'positive', 'negative', 'negative', 'negative', 'positive', 'negative', 'negative', 'positive', 'negative', 'positive', 'negative

In [None]:
# Answer - evaluate
accuracy = metrics.accuracy_score(test_data['sentiment'], predictions)
print(f'Accuracy: {accuracy}')

Accuracy: 0.96


### Exercise 3: Few-shot Prompting

* Predict the sentiment of each review in the test dataset using few-shot prompting (two shots) with the LLaMA model.
* Calculate the accuracy of the predictions and compare it with the accuracy obtained from few-shot prompting with the Mistral model.

In [None]:
# Answer - format few shots
few_shot_data = train_data.head(2)
few_shots = []
for fs_index, fs_row in few_shot_data.iterrows():
  few_shots.append({"role": "user", "content": fs_row['text']})
  few_shots.append({"role": "assistant", "content": fs_row['sentiment']})

few_shots

[{'role': 'user',
  'content': 'I have bought several of the Vitality canned dog food products and have found them all to be of good quality. The product looks more like a stew than a processed meat and it smells better. My Labrador is finicky and she appreciates this product better than  most.'},
 {'role': 'assistant', 'content': 'positive'},
 {'role': 'user',
  'content': 'Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if this was an error or if the vendor intended to represent the product as "Jumbo".'},
 {'role': 'assistant', 'content': 'negative'}]

In [None]:
# Answer - format chat prompts
def format_chat(row):
  system_message = [{"role": "system", "content": "Please perform Sentiment Classification task. Given the text, assign a sentiment label from ['negative', 'positive']. Return label only without any other text."}]
  user_message = [{"role": "user", "content": row['text']}]
  return system_message + few_shots + user_message

test_data.loc[:, 'chat'] = test_data.apply(format_chat, axis=1)
pprint(test_data.loc[:1, 'chat'].tolist(), sort_dicts=False)

[[{'role': 'system',
   'content': 'Please perform Sentiment Classification task. Given the text, '
              "assign a sentiment label from ['negative', 'positive']. Return "
              'label only without any other text.'},
  {'role': 'user',
   'content': 'I have bought several of the Vitality canned dog food products '
              'and have found them all to be of good quality. The product '
              'looks more like a stew than a processed meat and it smells '
              'better. My Labrador is finicky and she appreciates this product '
              'better than  most.'},
  {'role': 'assistant', 'content': 'positive'},
  {'role': 'user',
   'content': 'Product arrived labeled as Jumbo Salted Peanuts...the peanuts '
              'were actually small sized unsalted. Not sure if this was an '
              'error or if the vendor intended to represent the product as '
              '"Jumbo".'},
  {'role': 'assistant', 'content': 'negative'},
  {'role': 'user',
   '

In [None]:
# Answer - perform text generation
start_time = time.time()
predictions = query(pipe_lm, test_data['chat'].tolist())
print(f'Time: {int(time.time() - start_time)} seconds')

print(predictions)
# print(*predictions[:5], sep = "\n\n")


Time: 361 seconds
['positive', 'negative', 'negative', 'positive', 'negative', 'negative', 'negative', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'positive', 'positive', 'positive', 'positive', 'negative', 'neutral', 'positive', 'negative', 'positive', 'negative', 'positive', 'positive', 'negative', 'negative', 'negative', 'negative', 'negative', 'positive', 'negative', 'positive', 'neutral', 'negative', 'negative', 'positive', 'positive', 'positive', 'positive', 'positive', 'positive', 'negative', 'positive', 'positive', 'negative', 'negative', 'negative', 'negative', 'negative', 'positive', 'positive', 'positive', 'negative', 'positive', 'negative', 'negative', 'negative', 'negative', 'negative', 'positive', 'positive', 'negative', 'positive', 'neutral', 'negative', 'negative', 'negative', 'positive', 'negative', 'negative', 'positive', 'negative', 'positive', 'positive', 

In [None]:
# Answer - evaluate
accuracy = metrics.accuracy_score(test_data['sentiment'], predictions)
print(f'Accuracy: {accuracy}')

Accuracy: 0.94
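
The few-shot predictions printed above include `'neutral'`, which is not one of the two labels requested in the system prompt. Such replies can never match a gold label, so they count as errors in the accuracy. The sketch below shows one way to surface them before scoring. The helper names and toy data are illustrative assumptions, not part of the tutorial's dataset or code.

```python
from collections import Counter

ALLOWED_LABELS = {"negative", "positive"}


def out_of_label_counts(predictions, allowed=ALLOWED_LABELS):
    """Count predictions that fall outside the allowed label set."""
    return Counter(p for p in predictions if p not in allowed)


def accuracy(gold, predictions):
    """Fraction of positions where the prediction equals the gold label."""
    if len(gold) != len(predictions):
        raise ValueError("gold and predictions must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, predictions)) / len(gold)


# Toy data for illustration only.
toy_gold = ["positive", "negative", "positive", "negative"]
toy_pred = ["positive", "neutral", "positive", "negative"]

print(out_of_label_counts(toy_pred))  # Counter({'neutral': 1})
print(accuracy(toy_gold, toy_pred))   # 0.75
```

Reporting the out-of-label count next to the accuracy separates genuine misclassifications from replies that ignored the instruction.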


## Aspect-based Sentiment Analysis (ABSA) using Large Language Models (LLMs)

### Load Datasets

(The datasets used for this tutorial are a subset of the [SemEval-2016 Task 5 dataset](https://aclanthology.org/S16-1002.pdf).)

In [None]:
train_data = pd.read_csv('https://raw.githubusercontent.com/UCREL/Session_5_Large_Language_Models/main/data/train-absa.csv')
test_data = pd.read_csv('https://raw.githubusercontent.com/UCREL/Session_5_Large_Language_Models/main/data/test-absa.csv')

print(f'train: {train_data.shape}')
print(f'test: {test_data.shape}')

train: (20, 2)
test: (100, 3)


### Define Pipeline

In [None]:
# model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

# pipe_lm = pipeline(
#     "text-generation",
#     model=model_id,
#     model_kwargs={"torch_dtype": torch.bfloat16},
#     device_map="auto",
# )

In [None]:
# def query(pipe, inputs):
#   """
#   :param pipe: text-generation pipeline
#   :param inputs: list of messages
#   :return: list
#   """
#   assistant_outputs = []

#   for out in tqdm(pipe(
#       inputs,
#       max_new_tokens=50,
#       pad_token_id = pipe.model.config.eos_token_id,
#   )):
#     assistant_outputs.append(out[0]["generated_text"][-1]['content'].strip())

#   return assistant_outputs

### Define Evaluation Approach

(Adapted from [LLM-Sentiment](https://github.com/DAMO-NLP-SG/LLM-Sentiment/blob/master/evaluate.py#L196))

In [None]:
def process_tuple_f1(labels, predictions, verbose=False):
    tp, fp, fn = 0, 0, 0
    epsilon = 1e-7
    for i in range(len(labels)):
        gold = set(labels[i])
        try:
            pred = set(predictions[i])
        except Exception:
            pred = set()
        tp += len(gold.intersection(pred))
        fp += len(pred.difference(gold))
        fn += len(gold.difference(pred))
        if verbose:
            print('-'*100)
            print(gold, pred)
    precision = tp / (tp + fp + epsilon)
    recall = tp / (tp + fn + epsilon)
    micro_f1 = 2 * (precision * recall) / (precision + recall + epsilon)
    return micro_f1
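
One caveat worth checking: `label_text` is read from CSV and the model replies are plain text, so both appear to reach `process_tuple_f1` as strings. Calling `set()` on a string yields a set of characters (`set("[]")` is `{'[', ']'}`), not a set of (aspect, sentiment) tuples. The sketch below parses each string with `ast.literal_eval` before comparing. `parse_pairs` and `tuple_micro_f1` are hypothetical names, and scores computed this way may differ from the ones reported in this notebook.

```python
import ast


def parse_pairs(text):
    """Parse a string like "[('food', 'positive')]" into a set of 2-tuples.

    Returns an empty set if the text is not a literal list of string pairs.
    """
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError):
        return set()
    if not isinstance(value, (list, tuple)):
        return set()
    pairs = set()
    for item in value:
        is_pair = isinstance(item, (list, tuple)) and len(item) == 2
        if is_pair and all(isinstance(x, str) for x in item):
            pairs.add(tuple(item))
    return pairs


def tuple_micro_f1(labels, predictions, epsilon=1e-7):
    """Micro-averaged F1 over (aspect, sentiment) tuples."""
    tp = fp = fn = 0
    for gold_text, pred_text in zip(labels, predictions):
        gold = parse_pairs(gold_text)
        pred = parse_pairs(pred_text)
        tp += len(gold & pred)
        fp += len(pred - gold)
        fn += len(gold - pred)
    precision = tp / (tp + fp + epsilon)
    recall = tp / (tp + fn + epsilon)
    return 2 * precision * recall / (precision + recall + epsilon)


# Toy data for illustration only.
toy_labels = ["[('food', 'positive'), ('decor', 'neutral')]", "[]"]
toy_preds = ["[('food', 'positive')]", "[('wine', 'negative')]"]
print(round(tuple_micro_f1(toy_labels, toy_preds), 4))  # 0.5
```

With this strict parser, a reply that is not a list of two-element tuples (for example a flat list of two strings) contributes no predicted pairs, so it is scored as a miss rather than raising an error.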

### Zero-shot Prompting

In [None]:
def format_chat(row):
  return [
    {"role": "system", "content": "Please perform Unified Aspect-Based Sentiment Analysis task. Given the sentence, tag all (aspect, sentiment) pairs. Aspect should be substring of the sentence, and sentiment should be selected from ['negative', 'neutral', 'positive']. If there are no aspect-sentiment pairs, return an empty list. Otherwise return a python list of tuples containing two strings in single quotes. Please return python list only, without any other comments or texts."},
    {"role": "user", "content": row['text']}
  ]

test_data.loc[:, 'chat'] = test_data.apply(format_chat, axis=1)
pprint(test_data.loc[:1, 'chat'].tolist(), sort_dicts=False)

[[{'role': 'system',
   'content': 'Please perform Unified Aspect-Based Sentiment Analysis task. '
              'Given the sentence, tag all (aspect, sentiment) pairs. Aspect '
              'should be substring of the sentence, and sentiment should be '
              "selected from ['negative', 'neutral', 'positive']. If there are "
              'no aspect-sentiment pairs, return an empty list. Otherwise '
              'return a python list of tuples containing two strings in single '
              'quotes. Please return python list only, without any other '
              'comments or texts.'},
  {'role': 'user',
   'content': 'The atmosphere is aspiring , and the decor is festive and '
              'amazing .'}],
 [{'role': 'system',
   'content': 'Please perform Unified Aspect-Based Sentiment Analysis task. '
              'Given the sentence, tag all (aspect, sentiment) pairs. Aspect '
              'should be substring of the sentence, and sentiment should be '
              "

In [None]:
start_time = time.time()
predictions = query(pipe_lm, test_data['chat'].tolist())
print(f'Time: {int(time.time() - start_time)} seconds')

# print(predictions)
print(*predictions[:5], sep = "\n\n")


Time: 1194 seconds
["'atmosphere', 'positive'", "'decor', 'positive'"]

[("staff", "negative")]

["dessert", 'positive']

[("Dungeness crabs", 'positive')]

["exotic fish", 'negative'], ["Fancy pieces", 'negative']


In [None]:
micro_f1 = process_tuple_f1(test_data['label_text'], predictions)
print(f'Micro F1: {micro_f1: .4f}')

Micro F1:  0.7139
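
The zero-shot replies printed above do not all follow the requested format: some wrap a whole pair in one string, some return a flat list, and one returns two lists side by side. A tolerant extractor can recover the pairs from such replies before scoring. The following is a rough sketch using a regular expression. `PAIR_PATTERN` and `extract_pairs` are hypothetical names, and the pattern assumes aspects contain no quote characters and that sentiment labels are lowercase.

```python
import re

# A quoted aspect, a comma, then a quoted sentiment label.
PAIR_PATTERN = re.compile(
    r"""['"]([^'"]+)['"]\s*,\s*['"](negative|neutral|positive)['"]"""
)


def extract_pairs(reply):
    """Return the set of (aspect, sentiment) pairs found in a raw reply."""
    return set(PAIR_PATTERN.findall(reply))


# The five zero-shot replies shown above, copied as raw strings.
samples = [
    '''["'atmosphere', 'positive'", "'decor', 'positive'"]''',
    '''[("staff", "negative")]''',
    '''["dessert", 'positive']''',
    '''[("Dungeness crabs", 'positive')]''',
    '''["exotic fish", 'negative'], ["Fancy pieces", 'negative']''',
]

for reply in samples:
    print(sorted(extract_pairs(reply)))
```

This trades strictness for recall: it accepts replies that ignore the output format, so it is best reported alongside a strict parse rather than instead of one.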


### Few-shot Prompting

In [None]:
few_shot_data = train_data.head(2)
few_shots = []
for fs_index, fs_row in few_shot_data.iterrows():
  few_shots.append({"role": "user", "content": fs_row['text']})
  few_shots.append({"role": "assistant", "content": fs_row['label_text']})

few_shots

[{'role': 'user',
  'content': "However , if you want great food at a great price and do n't mind the decor , you ca n't beat this place ."},
 {'role': 'assistant',
  'content': "[('food', 'positive'), ('decor', 'neutral')]"},
 {'role': 'user',
  'content': "When the bill came , nothing was comped , so I told the manager very politely that we were willing to pay for the wine , but I did n't think I should have to pay for food with a maggot in it ."},
 {'role': 'assistant', 'content': '[]'}]

In [None]:
def format_chat(row):
  system_message = [{"role": "system", "content": "Please perform Unified Aspect-Based Sentiment Analysis task. Given the sentence, tag all (aspect, sentiment) pairs. Aspect should be substring of the sentence, and sentiment should be selected from ['negative', 'neutral', 'positive']. If there are no aspect-sentiment pairs, return an empty list. Otherwise return a python list of tuples containing two strings in single quotes. Please return python list only, without any other comments or texts."}]
  user_message = [{"role": "user", "content": row['text']}]
  return system_message + few_shots + user_message

test_data.loc[:, 'chat'] = test_data.apply(format_chat, axis=1)
pprint(test_data.loc[:1, 'chat'].tolist(), sort_dicts=False)

[[{'role': 'system',
   'content': 'Please perform Unified Aspect-Based Sentiment Analysis task. '
              'Given the sentence, tag all (aspect, sentiment) pairs. Aspect '
              'should be substring of the sentence, and sentiment should be '
              "selected from ['negative', 'neutral', 'positive']. If there are "
              'no aspect-sentiment pairs, return an empty list. Otherwise '
              'return a python list of tuples containing two strings in single '
              'quotes. Please return python list only, without any other '
              'comments or texts.'},
  {'role': 'user',
   'content': "However , if you want great food at a great price and do n't "
              "mind the decor , you ca n't beat this place ."},
  {'role': 'assistant',
   'content': "[('food', 'positive'), ('decor', 'neutral')]"},
  {'role': 'user',
   'content': 'When the bill came , nothing was comped , so I told the manager '
              'very politely that we were will

In [None]:
start_time = time.time()
predictions = query(pipe_lm, test_data['chat'].tolist())
print(f'Time: {int(time.time() - start_time)} seconds')

# print(predictions)
print(*predictions[:5], sep = "\n\n")


Time: 853 seconds
[('atmosphere', 'positive'), ('decor', 'positive')]

[]

[]

[('Dungeness crabs', 'positive')]

[('fish', 'negative')]


In [None]:
micro_f1 = process_tuple_f1(test_data['label_text'], predictions)
print(f'Micro F1: {micro_f1: .4f}')

Micro F1:  0.8168


## Further Exploration

**Research Papers**

[1] Zhang, W., Deng, Y., Liu, B., Pan, S. and Bing, L., 2024, June. [Sentiment Analysis in the Era of Large Language Models: A Reality Check](https://aclanthology.org/2024.findings-naacl.246/). In *Findings of the Association for Computational Linguistics: NAACL 2024* (pp. 3881-3906).

[2] Krugmann, J.O. and Hartmann, J., 2024. [Sentiment Analysis in the Age of Generative AI](https://link.springer.com/article/10.1007/s40547-024-00143-4). *Customer Needs and Solutions, 11(1)*, p.3.

**Tutorials**

* [Fine-Tuning LLMs: Overview, Methods, and Best Practices](https://www.turing.com/resources/finetuning-large-language-models)
* [Fine-Tuning Llama 3 and Using It Locally: A Step-by-Step Guide](https://www.datacamp.com/tutorial/llama3-fine-tuning-locally)