# SILLM Tutorial 1

In this notebook, you will:
- Download a text dataset with *labeled* examples, for instance a sentiment analysis dataset of tweets and their sentiment labels.
- Define a function that prompts a large language model (LLM) to label each example. You can do this in a few different modes (zero-shot or few-shot).
- Investigate the LLM's predicted labels.

## 1. Get the data

In [None]:
import pandas as pd

As an example, we will use one of the first hate speech datasets, https://github.com/t-davidson/hate-speech-and-offensive-language/tree/master, released with the 2017 paper '[Automated Hate Speech Detection and the Problem of Offensive Language](https://ojs.aaai.org/index.php/ICWSM/article/view/14955)'.

In [None]:
data_link = 'https://raw.githubusercontent.com/t-davidson/hate-speech-and-offensive-language/master/data/labeled_data.csv'
dataset = pd.read_csv(data_link)

In [None]:
dataset

Unnamed: 0.1,Unnamed: 0,count,hate_speech,offensive_language,neither,class,tweet
0,0,3,0,0,3,2,!!! RT @mayasolovely: As a woman you shouldn't...
1,1,3,0,3,0,1,!!!!! RT @mleew17: boy dats cold...tyga dwn ba...
2,2,3,0,3,0,1,!!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby...
3,3,3,0,2,1,1,!!!!!!!!! RT @C_G_Anderson: @viva_based she lo...
4,4,6,0,6,0,1,!!!!!!!!!!!!! RT @ShenikaRoberts: The shit you...
...,...,...,...,...,...,...,...
24778,25291,3,0,2,1,1,you's a muthaf***in lie &#8220;@LifeAsKing: @2...
24779,25292,3,0,1,2,2,"you've gone and broke the wrong heart baby, an..."
24780,25294,3,0,3,0,1,young buck wanna eat!!.. dat nigguh like I ain...
24781,25295,6,0,6,0,1,youu got wild bitches tellin you lies


`class` is the label chosen by the majority of CF users (CF = CrowdFlower, a now-defunct crowdsourcing site):

- 0: hate speech
- 1: offensive language
- 2: neither
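The prompts later in this notebook use only two options, 'hate' and 'not hate', so the three class ids need to be collapsed before predictions can be compared with the gold labels. A minimal sketch of one possible mapping follows; `CLASS_NAMES` and `to_binary_label` are hypothetical names, and treating 'offensive language' as 'not hate' is an assumption you may want to revisit.

```python
# Class ids as documented for the dataset's `class` column.
CLASS_NAMES = {0: "hate speech", 1: "offensive language", 2: "neither"}


def to_binary_label(class_id):
    """Collapse the three classes into the two prompt options.

    Assumption: only class 0 counts as 'hate'; classes 1 and 2 are 'not hate'.
    """
    if class_id not in CLASS_NAMES:
        raise ValueError(f"unknown class id: {class_id!r}")
    return "hate" if class_id == 0 else "not hate"


# Small illustration on hand-written class ids (not real dataset rows).
print([to_binary_label(c) for c in (2, 1, 0)])
```

On the dataframe, the same function could be applied with `dataset['class'].map(to_binary_label)`.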


## 2. Make the Prompt

In [None]:
def make_prompt(task, options, instance, zero_shot=True, examples=()):
    # Number the possible labels, e.g. ' 1) hate 2) not hate'.
    options_str = ''
    for i in range(len(options)):
        options_str = options_str + ' %d) %s' %(i+1, options[i])
    # Note: the backslash continuations keep the next line's indentation inside the string.
    prompt = 'Given a piece of text, you have to label whether it is %s or not.\
    Please return one of the following options with only the text and no number:%s.'\
    %(task, options_str)

    if zero_shot:
        return prompt + ' What is the label of this text: "' + instance+ '"'
    # Few-shot: list the labeled examples before asking about the instance.
    examples_str = ''
    for example in examples:
        examples_str = examples_str + 'text: %s, label: %s\n' %(example[0], example[1])
    return prompt + ' Here are some examples of instances and their labels:\
    \n%sWhat is the label of this text: ' %(examples_str) + instance

In [None]:
task = 'hate speech'
options = ['hate', 'not hate']
examples = [] # the first two instances of hate speech in the dataset are used as few-shot examples
for _, row in dataset.iterrows():
    if row['class'] == 0:
        examples.append([row['tweet'], 'hate'])
    if len(examples) == 2:
        break
instance = dataset['tweet'].values[90]
instance

'"@CCobey: @AydanMcCoy happy birthday nigs" Thanks yo'

In [None]:
make_prompt(task, options, instance, zero_shot = True, examples = examples)

'Given a piece of text, you have to label whether it is hate speech or not.    Please return one of the following options with only the text and no number: 1) hate 2) not hate. What is the label of this text: ""@CCobey: @AydanMcCoy happy birthday nigs" Thanks yo"'

In [None]:
print(make_prompt(task, options, instance, zero_shot = False, examples = examples))

Given a piece of text, you have to label whether it is hate speech or not.    Please return one of the following options with only the text and no number: 1) hate 2) not hate. Here are some examples of instances and their labels:    
text: "@Blackman38Tide: @WhaleLookyHere @HowdyDowdy11 queer" gaywad, label: hate
text: "@CB_Baby24: @white_thunduh alsarabsss" hes a beaner smh you can tell hes a mexican, label: hate
What is the label of this text: "@CCobey: @AydanMcCoy happy birthday nigs" Thanks yo


In [None]:
prompt = make_prompt(task, options, instance, zero_shot = False, examples = examples)
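
The few-shot examples selected above all carry the label 'hate', and nothing prevents an example from being the very instance that is being labeled. The following is a minimal sketch of one way to pick examples, assuming the data is available as (text, label) pairs; `pick_examples` and `per_label` are hypothetical names, not part of the notebook.

```python
def pick_examples(pairs, instance, per_label=1):
    """Pick up to `per_label` examples for each label, skipping `instance` itself."""
    picked = []
    seen = {}  # label -> number of examples picked so far
    for text, label in pairs:
        if text == instance:
            continue  # never show the model the instance it has to label
        if seen.get(label, 0) < per_label:
            picked.append([text, label])
            seen[label] = seen.get(label, 0) + 1
    return picked


# Toy data standing in for (tweet, label) pairs.
toy_pairs = [("a", "hate"), ("b", "not hate"), ("c", "hate"), ("d", "not hate")]
print(pick_examples(toy_pairs, instance="a"))
```

The returned list has the same `[text, label]` shape as `examples`, so it can be passed to `make_prompt` unchanged.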

## 3. Call the LLM with the prompt

In [None]:
runs = 3 # specify how many labels we want per instance.

First, we try a commercial model such as ChatGPT, using an API key.

In [None]:
# ! pip install openai

In [None]:
import openai  # this cell uses the legacy interface (openai<1.0)
openai.api_base="http://91.107.239.71:80" #"http://127.0.0.1:8000"
openai.api_key="" # enter your API key here

# list models
# models = openai.Model.list()
# models

In [None]:
responses = openai.ChatCompletion.create(model="gpt-3.5-turbo",
                                         messages=[{"role": "user", "content": prompt}],
                                         max_tokens = 2,
                                         n=runs)
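
The call above relies on `openai.ChatCompletion`, which is only available in `openai` versions before 1.0. As a hedged sketch, the same request with the newer client interface might look like the function below; `label_with_chat_api` is a hypothetical helper name, and the client is passed in so that the key and base URL stay under your control.

```python
def label_with_chat_api(client, prompt, runs=3, model="gpt-3.5-turbo"):
    """Request `runs` completions for one prompt and return their texts.

    Assumption: `client` is an `openai.OpenAI` instance (openai>=1.0), e.g.
        from openai import OpenAI
        client = OpenAI(api_key="...", base_url="...")
    """
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=2,  # same token budget as the legacy call
        n=runs,        # number of completions per prompt
    )
    return [choice.message.content for choice in response.choices]
```

With the newer interface the response is an object rather than a dictionary, so the label texts are read through attributes (`choice.message.content`).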

In [None]:
responses

<OpenAIObject chat.completion id=chatcmpl-8DHEnuriAJBXfOzBr9Ax3NcOsmOec at 0x79a89c34ac50> JSON: {
  "id": "chatcmpl-8DHEnuriAJBXfOzBr9Ax3NcOsmOec",
  "object": "chat.completion",
  "created": 1698175217,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "hate"
      },
      "finish_reason": "length"
    },
    {
      "index": 1,
      "message": {
        "role": "assistant",
        "content": "hate"
      },
      "finish_reason": "length"
    },
    {
      "index": 2,
      "message": {
        "role": "assistant",
        "content": "hate"
      },
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 155,
    "completion_tokens": 6,
    "total_tokens": 161
  }
}

In [None]:
[i['message']['content'] for i in responses['choices']]

['hate', 'hate', 'hate']
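
Because each instance receives `runs` labels, they have to be reduced to a single prediction before scoring. One common choice is a majority vote; the sketch below is an illustration only, with `majority_label` as a hypothetical name and ties returned as `None` by assumption.

```python
from collections import Counter


def majority_label(labels):
    """Return the most frequent label, or None for an empty list or a tie."""
    counts = Counter(labels).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # no single winner
    return counts[0][0]


print(majority_label(["hate", "hate", "hate"]))
print(majority_label(["hate", "not hate", "not hate"]))
```

Using an odd number of runs with two options avoids ties, as long as every output can be mapped to one of the options.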

Now let us try the same thing with an open-source model, Flan-T5.

In [None]:
# ! pip install transformers

In [None]:
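from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xl")
# max_new_tokens is a generation argument, so it is not passed to the tokenizer;
# pass it to model.generate(...) to allow longer outputs.
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl")
model.cuda()
# Quick sanity check that the model generates text.
inputs = tokenizer("A step by step recipe to make bolognese pasta:",
                   return_tensors="pt").to("cuda:0")
outputs = model.generate(**inputs)  # default length limit, so the output is cut short
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))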

['In a large saucepan, combine the ground beef, onion, garlic, tomato paste, tomato']


In [None]:
# Tokenize once; the prompt is the same for every run.
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
responses = []
for n in range(0, runs):
    # Without sampling (e.g. do_sample=True), each run is expected to return the same text.
    outputs = model.generate(**inputs)
    responses.append(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])

In [None]:
responses

['hate', 'hate', 'hate']

Now do this for all the instances in your dataset.

**Hint**: Loop over your dataframe. When doing few-shot labeling, make sure that none of the examples is the instance being labeled.

- Try both zero-shot and few-shot prompting and compare their performance.
- Try both ChatGPT and Flan-T5 small.
- Try to extract the label from the LLM output. Is it always as expected, and can it always be used as is for quantitative analysis?
- For at least the first 50 instances in your dataset, use metrics such as accuracy and F1 score to assess the LLMs' predictions against the ground-truth labels.
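
The last two steps can be sketched as follows: normalise the raw LLM output to one of the options, then score the predictions. This is a minimal illustration under stated assumptions; `parse_label`, `accuracy`, and `f1_score` are hypothetical helpers written in plain Python, outputs that cannot be parsed count as wrong, and 'hate' is taken as the positive class.

```python
import re


def parse_label(raw_output, options=("hate", "not hate")):
    """Map raw LLM text to one of `options`, or None if nothing matches."""
    # Drop a leading option number such as '2) ' and surrounding punctuation.
    text = re.sub(r"^\s*\d+\s*[).:]?\s*", "", raw_output.lower()).strip(" .\"'\n")
    # Check longer options first so 'not hate' is not mistaken for 'hate'.
    for option in sorted(options, key=len, reverse=True):
        if text.startswith(option):
            return option
    return None


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive="hate"):
    """F1 score for the `positive` class; 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy gold labels and raw outputs, invented for illustration.
gold = ["hate", "not hate", "not hate", "hate"]
raw = ["hate", "Not hate.", "1) hate", "unsure"]
pred = [parse_label(r) for r in raw]
print(pred, accuracy(gold, pred), f1_score(gold, pred))
```

If scikit-learn is installed, `sklearn.metrics.accuracy_score` and `sklearn.metrics.f1_score` can replace the two hand-written metrics.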

Bonus:
- Try varying the wording of the prompts.
- Try giving an explicit definition of the task in the prompt.