# Stance and Sentiment Detection with LLaMa-2

This notebook fine-tunes the base LLaMa-2 7B model for stance and sentiment analysis. Specifically, the model performs the following task:

* Input: an excerpt of a news article, supplied as the prompt
* Output: answers to the following questions
    1. What is the sentiment of the excerpt [Negative / Neutral / Positive]?
    2. What sentiment score between -1.0 and 1.0 should be assigned to the excerpt?
    3. What justifies the sentiment classification and score? Cite linguistic characteristics of the excerpt.
    4. What is the stance of the excerpt [Against-country / Impartial / Pro-country]?
    5. What stance score between -1.0 and 1.0 should be assigned to the excerpt?
    6. What justifies the stance classification and score? Cite linguistic characteristics of the excerpt.
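The prompts used later in this notebook request these answers in a fixed numbered layout (`1. Sentiment: ...`, `* Score: ...`, `* Reason: ...`, then the same for `2. Stance:`). Below is a minimal sketch for turning one such response back into the six fields. It assumes the model reproduces that layout exactly. `parse_analysis` is a hypothetical helper, not part of the original notebook.

```python
import re

# Matches one numbered block of the form
# "1. Sentiment: Negative\n    * Score: -0.6\n    * Reason: ..."
# (assumes the model follows the prompt's output format).
BLOCK_RE = re.compile(
    r"\d\.\s*(?P<kind>Sentiment|Stance):\s*(?P<label>[^\n]+)\n"
    r"\s*\*\s*Score:\s*(?P<score>[-+]?\d*\.?\d+)[^\n]*\n"
    r"\s*\*\s*Reason:\s*(?P<reason>[^\n]+)",
    re.IGNORECASE,
)


def parse_analysis(response: str) -> dict:
    """Return label, score and reason for sentiment and stance.

    Blocks that cannot be matched are simply absent from the result;
    scores are clamped to the [-1.0, 1.0] range the prompt asks for.
    """
    result = {}
    for m in BLOCK_RE.finditer(response):
        kind = m.group("kind").lower()  # "sentiment" or "stance"
        score = max(-1.0, min(1.0, float(m.group("score"))))
        result[f"{kind}_label"] = m.group("label").strip()
        result[f"{kind}_score"] = score
        result[f"{kind}_reason"] = m.group("reason").strip()
    return result
```

Clamping keeps out-of-range generations usable. A response that drifts from the template yields an empty or partial dict rather than an exception, so malformed rows are easy to filter afterwards.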

## Setup

Install the dependencies and connect to the Hugging Face Hub with an access token.
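The following is a minimal sketch of the token step. It assumes the token is exposed through an environment variable. The variable name `HF_TOKEN` and the helper `login_from_env` are assumptions for illustration. `huggingface_hub.login(token=...)` is the library's login call. In Colab, `notebook_login()` from the same package opens an interactive prompt instead.

```python
import os


def login_from_env(var_name: str = "HF_TOKEN") -> bool:
    """Log in to the Hugging Face Hub with a token stored in an env variable.

    Returns False (and does nothing) when the variable is unset, so the
    token never has to be pasted into the notebook itself.
    """
    token = os.environ.get(var_name)
    if not token:
        return False
    # Imported lazily so the helper can be defined without huggingface_hub.
    from huggingface_hub import login
    login(token=token)
    return True
```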

In [None]:
%%capture
%pip install autotrain-advanced
%pip install huggingface_hub

In [None]:
%%capture
!autotrain setup --update-torch
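The training command below points `--data_path` at a folder and reads the column named by `--text_column text`. The log also shows `train_split='train'`. Below is a minimal sketch of preparing such a folder. It rests on two assumptions: AutoTrain picks up a `train.csv` inside the folder, and each row is one complete example in the same `###Human:` / `###Assistant:` template that the inference loop later uses. The helper name and the `prompt`/`response` fields are hypothetical.

```python
import csv
import os

# Assumed training template: the inference prompt followed by the target answer.
TEMPLATE = "###Human:\n{prompt}\n\n###Assistant:\n{response}"


def write_sft_csv(examples, folder):
    """Write (prompt, response) pairs to <folder>/train.csv with a single `text` column."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, "train.csv")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["text"])
        writer.writeheader()
        for ex in examples:
            # The csv module quotes fields that contain newlines, so each
            # multi-line example stays a single CSV record.
            writer.writerow({"text": TEMPLATE.format(prompt=ex["prompt"],
                                                     response=ex["response"])})
    return path
```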

In [None]:
!autotrain llm \
--train \
--project_name "llama-2-7B" \
--model "TinyPixel/Llama-2-7B-bf16-sharded" \
--data_path /content/drive/MyDrive/News\ Data/train \
--text_column text \
--use_peft \
--use_int4 \
--learning_rate 2e-4 \
--train_batch_size 16 \
--num_train_epochs 20 \
--trainer sft \
--model_max_length 1024 \
--block_size 1024 > training.log &

> INFO    Running LLM
> INFO    Params: Namespace(version=False, train=True, deploy=False, inference=False, data_path='/content/drive/MyDrive/News Data/train', train_split='train', valid_split=None, text_column='text', rejected_text_column='rejected', prompt_text_column='prompt', model='TinyPixel/Llama-2-7B-bf16-sharded', model_ref=None, learning_rate=0.0002, num_train_epochs=20, train_batch_size=16, warmup_ratio=0.1, gradient_accumulation_steps=1, optimizer='adamw_torch', scheduler='linear', weight_decay=0.0, max_grad_norm=1.0, seed=42, add_eos_token=False, block_size=1024, use_peft=True, lora_r=16, lora_alpha=32, lora_dropout=0.05, logging_steps=-1, project_name='llama-2-7B', evaluation_strategy='epoch', save_total_limit=1, save_strategy='epoch', auto_find_batch_size=False, fp16=False, push_to_hub=False, use_int8=False, model_max_length=1024, repo_id=None, use_int4=True, trainer='sft', target_modules=None, merge_adapter=False, token=None, backend='default', username=None,
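The log shows the LoRA settings `lora_r=16`, `lora_alpha=32` and `target_modules=None`. For a weight matrix of shape d_out × d_in, LoRA adds A (r × d_in) and B (d_out × r), which is r·(d_in + d_out) trainable values. It scales the update by alpha/r, which is 2.0 here. The sketch below does that arithmetic. Which modules `target_modules=None` resolves to is an assumption: here, the query and value projections, each 4096 × 4096, in all 32 layers of Llama 2 7B. Treat the total as illustrative.

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable values LoRA adds to one d_out x d_in weight: A (r x d_in) + B (d_out x r)."""
    return r * (d_in + d_out)


LORA_R, LORA_ALPHA = 16, 32              # from the training log
HIDDEN, N_LAYERS = 4096, 32              # Llama 2 7B dimensions
ASSUMED_MODULES = ["q_proj", "v_proj"]   # assumption: what target_modules=None resolves to

per_matrix = lora_params(HIDDEN, HIDDEN, LORA_R)
total = per_matrix * len(ASSUMED_MODULES) * N_LAYERS
scaling = LORA_ALPHA / LORA_R

print(f"per matrix: {per_matrix:,}")   # per matrix: 131,072
print(f"total:      {total:,}")        # total:      8,388,608
print(f"scaling:    {scaling}")        # scaling:    2.0
```

If the assumption holds, the adapter trains roughly 8.4M values, about 0.12% of the ~6.74B base parameters.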

In [None]:
%cp -r /content/drive/MyDrive/model_dir/llama-2-7B /content/

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import time
import torch
import gc

In [None]:
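tokenizer = AutoTokenizer.from_pretrained("/content/llama-2-7B")
# Batched generation needs a pad token, and decoder-only models should be
# padded on the left so every prompt ends right where generation starts.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"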

In [None]:
model = AutoModelForCausalLM.from_pretrained("/content/llama-2-7B").to('cuda:0')

Loading checkpoint shards:   0%|          | 0/14 [00:00<?, ?it/s]

### Testing the Model on Sample Data

In [None]:
texts = ["""###Human:
As a neutral news analyst, assess the sentiment and stance of the news article excerpt and assign a score between -1.0 (completely negative/against-canada) and 1.0 (completely positive/pro-canada) for both sentiment and stance. Provide a single short sentence to justify your scores, drawing on the article's language, tone, and presentation to support your analysis.

Article Excerpt:
- Title: "Calgary man faces police, bylaw charges following hate-motivated library protest"
- Content: "Calgary police say a 36-year-old man is facing a total of eight charges in a hate-motivated crime. Police say protesters disrupted a Reading with Royalty event early last month at the Seton Library in the citys southeast. The family-friendly storytimes at public libraries are led by local drag queens or kings, and children are invited to dress in their best outfit, cape or crown. Police say behaviour of protesters outside these events has been getting worse in recent weeks. Officers were called around 11 a.m. on Feb. 25 for reports that several people had aggressively entered a library classroom, shouting homophobic and transphobic slurs at the children and parents in attendance. Derek Scott Reimer has been arrested and charged with causing a disturbance and mischief. Reimer was due to make a court appearance Friday morning. Police say hate-motivated crimes are those where an offender was motivated by bias, prejudice or hate based on personal characteristics of the victim. 1:44 Calgary sees more protests agai..."

Output format:
1. Sentiment: [Positive/Neutral/Negative]
    * Score: [Your Score]
    * Reason: [Your Reason]
2. Stance: [Pro-canada/Impartial/Against-canada]
    * Score: [Your Score]
    * Reason: [Your Reason]

###Assistant:""", """###Human:
As a neutral news analyst, assess the sentiment and stance of the news article excerpt and assign a score between -1.0 (completely negative/against-canada) and 1.0 (completely positive/pro-canada) for both sentiment and stance. Provide a single short sentence to justify your scores, drawing on the article's language, tone, and presentation to support your analysis.

Article Excerpt:
- Title: "OpenTable list of 100 most scenic restaurants in Canada includes 8 Okanagan locations"
- Content: "When it comes to restaurants, the Okanagan Valley is chock-full of eateries offering fantastic food and vibrant views. This week, though, eight Okanagan restaurants were singled out in a new online list of the top 100 most scenic restaurants in Canada. The list was created by OpenTable and posted on Thursday. OpenTable said its list was generated solely from diner reviews collected between June 1, 2018 and May 31, 2019. The online restaurant reservation service didnt rank the restaurants, instead listing them alphabetically. From breathtaking mountain gorges to seaports of the east coast and everything in between, the restaurants featured on the list offer the perfect backdrop for any occasion, OpenTable said. WATCH (July 7, 2019): Avenue Magazines top picks for steak restaurants in Calgary 4:07 Avenue Magazines top picks for steak restaurants in Calgary The eight Okanagan restaurants are: Kelowna Earls Kitchen and Bar Oak + Cru Social Kitchen and Wine Bar Oliver Miradoro at Tinhorn Creek Winery Sonora Room a..."

Output format:
1. Sentiment: [Positive/Neutral/Negative]
    * Score: [Your Score]
    * Reason: [Your Reason]
2. Stance: [Pro-canada/Impartial/Against-canada]
    * Score: [Your Score]
    * Reason: [Your Reason]

###Assistant:"""] * 4  # 2 prompts x 4 copies = a batch of 8

input_ids = tokenizer(texts,
                      return_tensors="pt",
                      padding=True).to(model.device)

output = model.generate(**input_ids,
                        max_length=1024,
                        temperature=0.3,
                        num_return_sequences=1,
                        do_sample=True)

print(tokenizer.decode(output[0], skip_special_tokens=True))
print(tokenizer.decode(output[1], skip_special_tokens=True))

###Human:
As a neutral news analyst, assess the sentiment and stance of the news article excerpt and assign a score between -1.0 (completely negative/against-canada) and 1.0 (completely positive/pro-canada) for both sentiment and stance. Provide a single short sentence to justify your scores, drawing on the article's language, tone, and presentation to support your analysis.

Article Excerpt:
- Title: "Calgary man faces police, bylaw charges following hate-motivated library protest"
- Content: "Calgary police say a 36-year-old man is facing a total of eight charges in a hate-motivated crime. Police say protesters disrupted a Reading with Royalty event early last month at the Seton Library in the citys southeast. The family-friendly storytimes at public libraries are led by local drag queens or kings, and children are invited to dress in their best outfit, cape or crown. Police say behaviour of protesters outside these events has been getting worse in recent weeks. Officers were called 
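`tokenizer.decode(output[i])` returns the prompt followed by the completion, which is why the printed result starts with the `###Human:` block. Below is a minimal sketch for keeping only the model's answer. It assumes the `###Assistant:` marker appears exactly once in the prompt. `extract_completion` is a hypothetical helper. An alternative is to slice off the first `input_ids["input_ids"].shape[1]` tokens of each output row before decoding, which avoids string matching altogether.

```python
def extract_completion(decoded: str) -> str:
    """Return the model's answer from a decoded prompt+completion string.

    Takes the text after the first "###Assistant:" and cuts it at any
    "###Human:" turn the model may have started on its own.
    """
    _, sep, answer = decoded.partition("###Assistant:")
    if not sep:  # marker missing, e.g. prompt truncated by max_length
        return decoded.strip()
    return answer.split("###Human:", 1)[0].strip()
```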

In [None]:
print("CUDA Memory Allocated:", torch.cuda.memory_allocated())
print("CUDA Memory Reserved:", torch.cuda.memory_reserved())
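# Uncomment to also release the last test batch. empty_cache() only returns
# cached blocks that no live tensor references, so the model weights stay allocated.
# del texts, input_ids, output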
gc.collect()
torch.cuda.empty_cache()
time.sleep(5)
print("CUDA Memory Allocated:", torch.cuda.memory_allocated())
print("CUDA Memory Reserved:", torch.cuda.memory_reserved())

CUDA Memory Allocated: 37854492672
CUDA Memory Reserved: 40944795648
CUDA Memory Allocated: 37854492672
CUDA Memory Reserved: 40944795648
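`memory_allocated()` and `memory_reserved()` report bytes. Converted to GiB, the figures above are about 35.25 GiB allocated and 38.13 GiB reserved. Below is a back-of-the-envelope check. It assumes about 6.74 billion parameters for Llama 2 7B. It also assumes that `from_pretrained` without `torch_dtype` loaded float32 weights, which was the transformers default at the time. Under those assumptions, the weights alone account for roughly 25 GiB, and passing `torch_dtype=torch.bfloat16` to `from_pretrained` would roughly halve that. The helper names are illustrative.

```python
GIB = 1024 ** 3


def to_gib(n_bytes: int) -> float:
    """Convert a byte count (as returned by torch.cuda.memory_*) to GiB."""
    return n_bytes / GIB


def weight_gib(n_params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, ignoring activations and caches."""
    return n_params * bytes_per_param / GIB


N_PARAMS = 6.74e9  # approximate Llama 2 7B parameter count (assumption)

print(f"allocated:    {to_gib(37854492672):.2f} GiB")  # values printed by the cell above
print(f"reserved:     {to_gib(40944795648):.2f} GiB")
print(f"fp32 weights: {weight_gib(N_PARAMS, 4):.1f} GiB")
print(f"bf16 weights: {weight_gib(N_PARAMS, 2):.1f} GiB")
```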


## Inference

In [None]:
import pandas as pd

In [None]:
COUNTRY = "canada"

df = pd.read_csv("/content/drive/MyDrive/News Data/llama2_inference_data/answer.csv")

In [None]:
# Stratified sample for the chosen country: 2,500 international + 1,500 local articles
df1 = df[(df.country == COUNTRY) & (df.source_type == "international")].sample(2500)
df2 = df[(df.country == COUNTRY) & (df.source_type == "local")].sample(1500)
df = pd.concat([df1, df2], axis=0)
df.drop_duplicates(inplace=True)
df = df.sample(frac=1)  # shuffle the rows
del df1, df2
df.shape

(4000, 8)
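`DataFrame.sample` draws a different subset on every run unless a seed is fixed, so the 4,000 rows above cannot be recovered later. Below is a minimal sketch of the same stratified draw (international rows plus local rows, then a shuffle), made reproducible with `random_state`. The toy frame, the seed value and the helper name are assumptions for illustration.

```python
import pandas as pd


def sample_country(df: pd.DataFrame, country: str, n_intl: int, n_local: int,
                   seed: int = 42) -> pd.DataFrame:
    """Reproducibly draw n_intl international + n_local local rows, then shuffle."""
    in_country = df[df.country == country]
    parts = [
        in_country[in_country.source_type == "international"].sample(n_intl, random_state=seed),
        in_country[in_country.source_type == "local"].sample(n_local, random_state=seed),
    ]
    out = pd.concat(parts, axis=0).drop_duplicates()
    return out.sample(frac=1, random_state=seed)  # shuffle with the same seed


# Toy data standing in for answer.csv
toy = pd.DataFrame({
    "country": ["canada"] * 10,
    "source_type": ["international"] * 6 + ["local"] * 4,
    "text": [f"article {k}" for k in range(10)],
})
a = sample_country(toy, "canada", 3, 2)
b = sample_country(toy, "canada", 3, 2)  # same seed, same rows in the same order
```

Because duplicates are dropped after sampling, the result can be smaller than `n_intl + n_local` when the source contains repeated rows. That did not happen here, as the `(4000, 8)` shape shows.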

In [None]:
import os

stride = 6  # number of prompts generated per batch
start = time.time()

def elapsed_time():
    """Wall-clock time since `start`, formatted as [HHh MMm SSs]."""
    m, s = divmod(int(time.time() - start), 60)
    h, m = divmod(m, 60)
    return f"[{h:02}h {m:02}m {s:02}s]"

os.makedirs("/content/output", exist_ok=True)
# Object dtype lets the column hold strings even if it was read in as all-NaN.
df["generated_text"] = df["generated_text"].astype(object)
gen_col = df.columns.get_loc("generated_text")

for i in range(0, df.shape[0], stride):
    texts = df.text.iloc[i:i+stride].apply(
        lambda x: f"###Human:\n{x}\n\n###Assistant:\n").to_list()

    input_ids = tokenizer(texts, return_tensors="pt", padding=True)\
        .to(model.device)

    output = model.generate(**input_ids,
                            max_length=1536,
                            num_return_sequences=1)

    del input_ids
    gc.collect()
    torch.cuda.empty_cache()

    # Write the batch positionally; chained assignment such as
    # df.generated_text.iloc[k] = ... may silently fail to update df.
    decoded = tokenizer.batch_decode(output, skip_special_tokens=True)
    df.iloc[i:i+len(decoded), gen_col] = decoded

    del output
    gc.collect()
    torch.cuda.empty_cache()
    time.sleep(2)

    # Checkpoint the whole frame after every batch
    df.to_csv(f"/content/output/answer_{COUNTRY}.csv",
              index=False)
    print(f"[{elapsed_time()}] Processed: upto {i+stride}")

[[00h 00m 16s]] Processed: upto 6
[[00h 00m 31s]] Processed: upto 12
[[00h 00m 44s]] Processed: upto 18
[[00h 00m 57s]] Processed: upto 24
[[00h 01m 12s]] Processed: upto 30
[[00h 01m 25s]] Processed: upto 36
[[00h 01m 37s]] Processed: upto 42
[[00h 01m 54s]] Processed: upto 48
[[00h 02m 07s]] Processed: upto 54
[[00h 02m 26s]] Processed: upto 60
[[00h 02m 41s]] Processed: upto 66
[[00h 02m 55s]] Processed: upto 72
[[00h 03m 08s]] Processed: upto 78
[[00h 03m 23s]] Processed: upto 84
[[00h 03m 36s]] Processed: upto 90
[[00h 03m 49s]] Processed: upto 96
[[00h 04m 01s]] Processed: upto 102
[[00h 04m 15s]] Processed: upto 108
[[00h 04m 31s]] Processed: upto 114
[[00h 04m 44s]] Processed: upto 120
[[00h 04m 58s]] Processed: upto 126
[[00h 05m 12s]] Processed: upto 132
[[00h 05m 24s]] Processed: upto 138
[[00h 05m 40s]] Processed: upto 144
[[00h 05m 53s]] Processed: upto 150
[[00h 06m 05s]] Processed: upto 156
[[00h 06m 20s]] Processed: upto 162
[[00h 06m 36s]] Processed: upto 168
[[00h 06m
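The log provides enough data to estimate the full run: 168 rows took 6 m 36 s (396 s), or about 2.36 s per row. Below is a small sketch of that extrapolation. It assumes the per-batch rate stays constant for all 4,000 rows. It also includes a `divmod`-based formatter that produces the same layout as `elapsed_time()`. `eta` and `fmt_hms` are hypothetical helpers.

```python
def fmt_hms(seconds: float) -> str:
    """Format seconds as HHh MMm SSs (same layout as elapsed_time())."""
    m, s = divmod(int(seconds), 60)
    h, m = divmod(m, 60)
    return f"{h:02}h {m:02}m {s:02}s"


def eta(rows_done: int, seconds_elapsed: float, rows_total: int) -> float:
    """Projected total runtime, assuming a constant per-row rate."""
    return seconds_elapsed / rows_done * rows_total


total_s = eta(rows_done=168, seconds_elapsed=6 * 60 + 36, rows_total=4000)
print(fmt_hms(total_s))  # 02h 37m 08s
```

At that rate, the full 4,000-row sample needs roughly two hours and forty minutes of generation.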

In [None]:
%cp /content/output/answer_canada.csv /content/drive/MyDrive/News\ Data/llama2_inference_data/answer_canada.csv

NotImplementedError: ignored
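The shell copy above failed with a `NotImplementedError` (Colab renders the traceback as `ignored`). The cause is not recorded here, so a copy that avoids the shell entirely may be more robust: `shutil.copy2` copies the file directly from Python. Below is a minimal sketch. The helper name is illustrative, and the commented usage reuses the paths from the cell above.

```python
import os
import shutil


def copy_to_drive(src: str, dst: str) -> str:
    """Copy src to dst without spawning a shell, creating dst's folder if needed."""
    folder = os.path.dirname(dst)
    if folder:
        os.makedirs(folder, exist_ok=True)
    return shutil.copy2(src, dst)  # returns the destination path


# In the notebook (note: no backslash escape is needed in a Python string):
# copy_to_drive("/content/output/answer_canada.csv",
#               "/content/drive/MyDrive/News Data/llama2_inference_data/answer_canada.csv")
```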