# Assignment 2
Bradley Thompson - CS 510 Large Language Models PDX Winter 2024

## Experimental Setting 1
To start, I set up a test bed for measuring model performance. I missed the class where we discussed scoring model output by taking the log probability of the result tokens, and while the idea makes sense in theory, I don't know how to do it in practice. Instead, I used a simpler approach: check the generated output for the presence of each label as a substring, and throw out samples where no category can be found. This is sub-par because it doesn't account for rambling output that names multiple labels, or for malformed output that closely resembles a category option. I tried to mitigate the first problem by limiting the number of new tokens generated, so that multiple labels are unlikely to be produced.
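For reference, the log-probability scoring mentioned above could look roughly like the sketch below. This is not what the rest of this assignment uses, and the helper names (`log_softmax`, `sequence_log_prob`, `pick_label`) are hypothetical. It assumes that for each candidate label you already have one row of vocabulary logits per label token, where each row is the model's prediction for that token given the prompt and the preceding label tokens. With a causal LM those rows would typically come from a forward pass over the prompt plus the label, but that step is left out here so the example runs without a model.

```python
import math

def log_softmax(logits):
    """Convert one row of raw logits into log probabilities (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def sequence_log_prob(step_logits, token_ids):
    """Sum log P(token) over a label's tokens.

    step_logits[t] is assumed to be the vocabulary-sized logits row
    that predicts token_ids[t].
    """
    return sum(log_softmax(row)[tok] for row, tok in zip(step_logits, token_ids))

def pick_label(scores):
    """Return the label with the highest total log probability."""
    return max(scores, key=scores.get)

# Toy usage with made-up scores (not real model output).
print(pick_label({"positive": -2.0, "negative": -0.5, "neutral": -1.0}))
```

Because every label is scored, this approach always produces exactly one prediction, which avoids the discarded and ambiguous samples of the substring check. Labels that tokenize into more tokens accumulate more negative terms, so a length normalization may be worth considering.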

In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer
MODELS = (
    "bigscience/bloom-560m",
    "bigscience/bloom-1b1",
    "bigscience/bloom-1b7",
    "bigscience/bloomz-560m",
    "bigscience/bloomz-1b1",
    "bigscience/bloomz-1b7",
)
POSITIVE="positive"
NEUTRAL="neutral"
NEGATIVE="negative"
# NOTE: despite its name, this maps the dataset's integer label id to its label string.
LABEL_TO_ID={
    0: NEGATIVE,
    1: NEUTRAL,
    2: POSITIVE,
}

DEFAULT_PROMPT = "Asign any label of these options ['positive', 'negative', 'neutral'] to the following text: '%s'. Label: "

model_name = MODELS[0]
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

In [3]:
from datasets import load_dataset
dataset = load_dataset("cardiffnlp/tweet_sentiment_multilingual", "english")
sample = 2
negative_texts = [ sample for sample in dataset["train"] if LABEL_TO_ID[sample["label"]] == NEGATIVE ]
#text = dataset["train"][sample]["text"]
text = negative_texts[2]
text

{'text': 'it looks like a beautiful night to throw myself off the Brooklyn Bridge ---@Tim_Hecht ',
 'label': 0}

In [4]:
dataset["train"][:10]

{'text': ['okay i\\u2019m sorry but TAYLOR SWIFT LOOKS NOTHING LIKE JACKIE O SO STOP COMPARING THE TWO. c\\u2019mon America aren\\u2019t you sick of her yet? (sorry) ',
  '@user the DC comics site has Batman 44 releases on the 9th but its out now? ',
  '"Frank Gaffrey\\u002c Cliff May\\u002c Steve Emerson: Brilliant. \\""""Looming Threats: Iran\\u002c Hezbollah Hamas\\"""" is the best #cufidc session I\\u2019ve had thus far." ',
  'The tragedy of only thinking up hilarious tweets for the Summer Olympics now is that in four years there may be no place for them. ',
  '"Oliseh meets with Victor Moses in London: Super Eagles coach, Sunday Oliseh, has met with Nigeria and Chelsea ... ',
  '"People always forget the fact that Shawn achieved so much in the age of 16 like his 1st single, EP and FIRST album ALL went #1 on charts" ',
  'it looks like a beautiful night to throw myself off the Brooklyn Bridge ---@Tim_Hecht ',
  '@user WAIT WHAT?!?! SCOTUS makes laws!?!? since when? have i been lie

In [5]:
from transformers import GenerationConfig

NEW_TOKENS = 5

config = {
    "min_new_tokens": 1,
    "max_new_tokens": NEW_TOKENS,
    "use_cache": False,
    # "do_sample": True,
    # "top_k": 2,
}

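def output_raw_classification(prompt: str) -> str:
    """Generate a short continuation of the prompt and return only the new text."""
    inputs = tokenizer.encode(prompt, return_tensors="pt")
    gen_config: GenerationConfig = GenerationConfig.from_dict(config)
    output = model.generate(inputs, generation_config=gen_config)[0]
    # Slice off the prompt by its length; output[-NEW_TOKENS:] would include prompt
    # tokens whenever generation stops before NEW_TOKENS tokens are produced.
    return tokenizer.decode(output[inputs.shape[-1]:])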

target_dataset = dataset["train"]

total_count = len(target_dataset)
p = 0
tp = 0
neg = 0
tneg = 0
neut = 0
tneut = 0
for i, sample in enumerate(target_dataset):
    if i == 5: # TODO REMOVE
        break
    tweet = sample["text"]  # renamed from `input` to avoid shadowing the built-in
    label = sample["label"]
    print(LABEL_TO_ID[label])
    output = output_raw_classification(DEFAULT_PROMPT % tweet).lower()
    print(output)
    if POSITIVE in output:
        p += 1
        if LABEL_TO_ID[label] == POSITIVE:
            tp += 1
    elif NEGATIVE in output:
        neg += 1
        if LABEL_TO_ID[label] == NEGATIVE:
            tneg += 1
    elif NEUTRAL in output:
        neut += 1
        if LABEL_TO_ID[label] == NEUTRAL:
            tneut += 1
    print(f"Step {i}: categorized {p + neg + neut} / {total_count} | pos {p} ; neg {neg} ; neut {neut}")

negative
 'positive', '
Step 0: categorized 1 / 1839 | pos 1 ; neg 0 ; neut 0
neutral
 @user the dc com
Step 1: categorized 1 / 1839 | pos 1 ; neg 0 ; neut 0
positive
 'positive', '
Step 2: categorized 2 / 1839 | pos 2 ; neg 0 ; neut 0
negative
 'positive', label
Step 3: categorized 3 / 1839 | pos 3 ; neg 0 ; neut 0
neutral
 'positive', '
Step 4: categorized 4 / 1839 | pos 4 ; neg 0 ; neut 0
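The substring check in the loop above could be factored into a small helper. The sketch below is a stricter variant, not the exact logic used in this run: it returns `None` when the output names no label or more than one, whereas the loop resolves ties in the fixed order positive, negative, neutral. The names `LABELS` and `parse_label` are hypothetical.

```python
LABELS = ("positive", "negative", "neutral")

def parse_label(output: str):
    """Return the single label named in `output`, or None if zero or several appear."""
    text = output.lower()
    found = [label for label in LABELS if label in text]
    return found[0] if len(found) == 1 else None

# Outputs taken from the run above, plus one made-up ambiguous case.
print(parse_label(" 'positive', '"))         # a single label is present
print(parse_label(" @user the dc com"))      # no label: sample is discarded
print(parse_label("positive or negative"))   # ambiguous: sample is discarded
```

Discarding ambiguous outputs trades coverage for cleaner counts, so the number of discarded samples is worth reporting alongside precision and recall.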


In [7]:
fp = p - tp
fneg = neg - tneg
fneut = neut - tneut
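# Guard against division by zero when nothing was predicted (or labeled) positive.
p_precision = tp / p if p else 0.0
# The slice [:5] must match the number of samples processed in the loop above.
n_positive = len([ label for label in target_dataset[:5]["label"] if LABEL_TO_ID[label] == POSITIVE ])
p_recall = tp / n_positive if n_positive else 0.0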
print(f"pos prec {p_precision} pos rec {p_recall}")

pos prec 0.25 pos rec 1.0
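The same calculation can be extended to all three classes. The sketch below assumes the gold and predicted labels are collected into two parallel lists, with `None` for discarded samples; the helper name `precision_recall` is hypothetical. The example lists reproduce the five samples from the run above.

```python
def precision_recall(gold, predicted, label):
    """Compute (precision, recall) for one label from parallel label lists."""
    tp = sum(1 for g, p in zip(gold, predicted) if p == label and g == label)
    n_pred = sum(1 for p in predicted if p == label)
    n_gold = sum(1 for g in gold if g == label)
    # Return 0.0 instead of raising when a class is never predicted or never present.
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return precision, recall

# Gold labels and parsed predictions for the five samples shown earlier.
gold = ["negative", "neutral", "positive", "negative", "neutral"]
predicted = ["positive", None, "positive", "positive", "positive"]

for label in ("positive", "negative", "neutral"):
    print(label, precision_recall(gold, predicted, label))
```

For the positive class this gives the same 0.25 precision and 1.0 recall as the cell above. The other two classes come out as zero because the model never produced them.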
