## Experiment 01
### Richardson et al. (2002) Experiment 01 with Language Models instead of Humans

The subjects were presented with a single page, containing a list of the verbs and four pictures, labelled A to D. Each one contained a circle and a square aligned along a vertical or horizontal axis, connected by an arrow pointing up, down, left or right. Since we didn't expect any interesting item variation between left or right placement of the circle or square, the horizontal schemas differed only in the direction of the arrow.

For each sentence, subjects were asked to select one of the four sparse images that best depicted the event described by the sentence (Figure 1).

The items were randomised in three different orders, and crossed with two different orderings of the images. The six lists were then distributed randomly to subjects.
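
The counterbalancing described above (three item orders crossed with two image orders, giving six lists) can be sketched as follows. This is a minimal illustration and not the original authors' procedure: the verb list, the two image orders, the seed and the helper names `build_lists` and `assign_lists` are all assumptions made up for the example.

```python
import random
from itertools import product


def build_lists(verbs, image_orders, n_item_orders=3, seed=0):
    """Cross `n_item_orders` random verb orders with the given image orders."""
    rng = random.Random(seed)
    item_orders = []
    for _ in range(n_item_orders):
        order = list(verbs)
        rng.shuffle(order)  # one random ordering of the items
        item_orders.append(order)
    # Every item order is paired with every image order.
    return [
        {"items": items, "images": list(images)}
        for items, images in product(item_orders, image_orders)
    ]


def assign_lists(n_subjects, lists, seed=0):
    """Give each subject the index of one randomly chosen list."""
    rng = random.Random(seed)
    return [rng.randrange(len(lists)) for _ in range(n_subjects)]


# Placeholder verbs and image orders, for illustration only.
verbs = ["bombed", "verb_2", "verb_3", "verb_4"]
image_orders = [("A", "B", "C", "D"), ("D", "C", "B", "A")]

lists = build_lists(verbs, image_orders)
assignment = assign_lists(n_subjects=12, lists=lists)
print(len(lists), "lists;", "subject 0 gets list", assignment[0])
```

When prompting a language model instead of human subjects, the same idea could be used to vary the order of the images inside the prompt, so that a preference for a particular position or label is not mistaken for a preference for a direction.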

### Step 01: Creating a prompt to model the experimental conditions

In [35]:
action_word = "bombed"

prompt = "You are asked to select one of the four images that best depicts the event described by the following sentence. \
Image A: \
◯→▢ \
\
Image B: \
◯←▢ \
 \
Image C: \
◯ \
↑ \
▢ \
\
Image D: \
◯ \
↓ \
▢\
\
Sentence: \
◯ "+action_word+" ▢ \
\
The image that best describes the sentence is image "

prompt_A = "Of these four images: \
Image A: \
◯→▢ \
\
Image B: \
◯←▢ \
 \
Image C: \
◯ \
↑ \
▢ \
\
Image D: \
◯ \
↓ \
▢\
\
The one that best describes \"◯ "+action_word+" ▢\" is image "


prompt_B = "Select the image that best represents the event described by the sentence: "+action_word+"\n[◯→▢]\n\n[◯←▢]\n\n[◯\n↑\n▢]\n\n[◯\n↓\n▢]\n\nThe best representation is [◯"

prompt_C = "Choose the best image for the word:\nUP: ↑ \nDOWN: ↓ \nLEFT: ← \nRIGHT: → \n"+action_word.upper()+": "
prompt_D = "Choosing from UP, DOWN, LEFT and RIGHT, the word '"+action_word.upper()+"' is best represented by the word "



# Prompt with mask token for MLM objective 
mask_token = "<mask>"
prompt_mlm = prompt+mask_token+" "
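
One detail of the prompts above is worth checking: a backslash at the end of a line inside a Python string literal removes the line break, so `prompt` and `prompt_A` are single-line strings in which the vertical images C and D are laid out horizontally. The sketch below demonstrates the effect and shows one possible way to keep the line breaks. The helper `build_prompt` is hypothetical and is not part of the original notebook.

```python
# Backslash-newline inside a string literal: the line break is dropped.
collapsed = "Image C: \
◯ \
↑ \
▢"

# A triple-quoted string keeps the line breaks.
preserved = """Image C:
◯
↑
▢"""


def build_prompt(action_word):
    """Hypothetical variant of `prompt` that preserves the vertical layout."""
    lines = [
        "You are asked to select one of the four images that best depicts "
        "the event described by the following sentence.",
        "Image A:", "◯→▢", "",
        "Image B:", "◯←▢", "",
        "Image C:", "◯", "↑", "▢", "",
        "Image D:", "◯", "↓", "▢", "",
        "Sentence:", f"◯ {action_word} ▢", "",
        "The image that best describes the sentence is image ",
    ]
    return "\n".join(lines)


print(repr(collapsed))
print(repr(preserved))
print(build_prompt("bombed"))
```

Whether the model is sensitive to this difference is an empirical question; `prompt_B` already uses explicit `\n` characters and can serve as a comparison.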


### Step 02: Test pipeline for CausalLM architecture

In [4]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GPT2Tokenizer, OPTForCausalLM
from transformers import GPTNeoXForCausalLM, GPTNeoXTokenizerFast, GPTNeoForCausalLM
import torch.nn.functional as F
import pandas as pd
import numpy as np
import seaborn as sns
from tqdm import tqdm
import json
from scipy import stats
from collections import Counter
import subprocess

with open("../../hf.key", "r") as f_in:
    hf_key = f_in.readline().strip()
subprocess.run(["huggingface-cli", "login", "--token", hf_key])

model_prefix = "EleutherAI/gpt-neox"
model_size = "20b"

model_prefix = "facebook/opt"
model_size = "13b"



Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: read).
Your token has been saved to /mounts/data/corp/huggingface/token
Login successful


In [21]:
#tokenizer = GPTNeoXTokenizerFast.from_pretrained(model_prefix+"-"+model_size, device_map="auto")
#model = GPTNeoXForCausalLM.from_pretrained(model_prefix+"-"+model_size, device_map="auto")

#tokenizer = GPT2Tokenizer.from_pretrained(model_prefix+"-"+model_size, device_map="auto")
#model = OPTForCausalLM.from_pretrained(model_prefix+"-"+model_size, device_map="auto")

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-13b-hf", use_auth_token=True, device_map="auto")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-hf", use_auth_token=True, device_map="auto")


The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.



In [40]:
# Step 1: Tokenize the prompt
input_ids = tokenizer.encode(prompt_B, return_tensors="pt")

# Step 2: Generate a continuation of the prompt
max_length = input_ids.size(1) + 30  # Adjust '30' as needed to control the maximum length of the generated answer.
output = model.generate(input_ids, max_length=max_length, num_return_sequences=1)

# Step 3: Decode the generated output to get the answer
generated_answer = tokenizer.decode(output[0], skip_special_tokens=True)




In [41]:
print("Generated answer:\n\n", generated_answer)

Generated answer:

 Select the image that best represents the event described by the sentence: bombed
[◯→▢]

[◯←▢]

[◯
↑
▢]

[◯
↓
▢]

The best representation is [◯
↓
▢].

### Explanation

The image that best represents the event described by the sentence is a bomb


In [24]:
# Step 1: Tokenize the prompt
input_ids = tokenizer.encode("Answer the following question: What is the captial of Australia? The capital of Australia is", return_tensors="pt")

# Step 2: Generate a continuation of the prompt
max_length = input_ids.size(1) + 20  # Adjust '20' as needed to control the maximum length of the generated answer.
output = model.generate(input_ids, max_length=max_length, num_return_sequences=1)

# Step 3: Decode the generated output to get the answer
generated_answer = tokenizer.decode(output[0], skip_special_tokens=True)
print("Generated answer:", generated_answer)


Generated answer: Answer the following question: What is the captial of Australia? The capital of Australia is Canberra.
What is the capital of Australia? The capital of Australia is Canberra


### Step 03: Test pipeline with logprobs

In [25]:
from sidemethods import logprobs_from_prompt, proc, proc_lower, prob_of_ending, calculate_accuracy, calculate_accuracies, store_accuracies

In [26]:

#start = prompt
#start = prompt_B
#start = prompt_D

#answers = {0:"A ◯→▢", 1:"B ◯←▢", 2:"C ◯\n↑\n▢", 3:"D ◯\n↓\n▢"}
#answers = {0:"→▢]", 1:"←▢]", 2:"\n↑\n▢]", 3:"\n↓\n▢]"}
# Sanity-check prompt with an unambiguous expected continuation
start = "nazis are known to be on the political "
answers = {0:"UP", 1:"DOWN", 2:"LEFT", 3:"RIGHT"}


res_ends = []
for j, end in answers.items():
    input_prompt = proc(start) + ' ' + proc(end)
    logprobs = logprobs_from_prompt(input_prompt, tokenizer, model)
    res = {"tokens": [x for x,y in logprobs],"token_logprobs": [y for x,y in logprobs]}
    res_ends.append(res)
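
The loop above collects per-token log-probabilities for the prompt followed by each candidate answer. How `prob_of_ending` in `sidemethods` turns these into a score is not shown here, so the sketch below is only one plausible scheme and not that function's implementation: sum the log-probabilities of the tokens that belong to the answer and, optionally, divide by their count so that longer answers are not penalised. The names `score_ending` and `pick_answer` and all numbers are made up for illustration.

```python
def score_ending(token_logprobs, n_ending_tokens, normalize=True):
    """Score the last `n_ending_tokens` log-probabilities of a sequence."""
    if n_ending_tokens <= 0 or n_ending_tokens > len(token_logprobs):
        raise ValueError("n_ending_tokens must be between 1 and the sequence length")
    ending = token_logprobs[-n_ending_tokens:]
    total = sum(ending)
    # Length normalisation: mean log-probability per answer token.
    return total / len(ending) if normalize else total


def pick_answer(candidates):
    """candidates maps a label to (token_logprobs, n_ending_tokens)."""
    return max(candidates, key=lambda label: score_ending(*candidates[label]))


# Made-up log-probabilities: two prompt tokens followed by the answer tokens.
candidates = {
    "LEFT": ([-2.1, -0.4, -3.0], 1),
    "RIGHT": ([-2.1, -0.4, -0.7, -0.2], 2),
}
print(pick_answer(candidates))
```

Comparing the normalised and the unnormalised score is a cheap robustness check, because answers such as "UP" and "RIGHT" may be split into different numbers of tokens by the tokenizer.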

In [27]:
# Pick the answer whose ending receives the highest score
chosen_answer = (float("-inf"), "")
for i, answer in answers.items():
    # Score each candidate ending with its own log-probs and tokens
    choice_val = prob_of_ending(res_ends[i]['token_logprobs'], res_ends[i]['tokens'])
    if choice_val > chosen_answer[0]:
        chosen_answer = choice_val, answer


print(start)
print("Choice: ", chosen_answer[1])
print()

input_ids = tokenizer.encode(start, return_tensors="pt")
input_ids = input_ids.to('cuda')
# Note: max_length is not set in this cell; it is reused from an earlier generation cell
output = model.generate(input_ids, max_length=max_length, num_return_sequences=1)
generated_answer = tokenizer.decode(output[0], skip_special_tokens=True)
print("Generated answer:", generated_answer)



nazis are known to be on the political 
Choice:  RIGHT

Generated answer: nazis are known to be on the political  right.
> > > > > > > > > > > > > > > > > > > > > > > > >
