# Ask the LLM a question. Here we use FLAN-T5-XXL (`google/flan-t5-xxl`); you can also try other LLMs.

In [1]:
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"

llm="google/flan-t5-xxl"

tokenizer = T5Tokenizer.from_pretrained(llm)
model = T5ForConditionalGeneration.from_pretrained(llm)
model.to(device)

T5ForConditionalGeneration(
  (shared): Embedding(32128, 4096)
  (encoder): T5Stack(
    (embed_tokens): Embedding(32128, 4096)
    (block): ModuleList(
      (0): T5Block(
        (layer): ModuleList(
          (0): T5LayerSelfAttention(
            (SelfAttention): T5Attention(
              (q): Linear(in_features=4096, out_features=4096, bias=False)
              (k): Linear(in_features=4096, out_features=4096, bias=False)
              (v): Linear(in_features=4096, out_features=4096, bias=False)
              (o): Linear(in_features=4096, out_features=4096, bias=False)
              (relative_attention_bias): Embedding(32, 64)
            )
            (layer_norm): T5LayerNorm()
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (1): T5LayerFF(
            (DenseReluDense): T5DenseGatedActDense(
              (wi_0): Linear(in_features=4096, out_features=10240, bias=False)
              (wi_1): Linear(in_features=4096, out_features=10240, bias=False)
     

In [2]:
prompt='''INSTRUCTION: Please give answers to the following questions about knowledge. 

Question: who has been ranked no. 1 in the latest football rankings announced by fifa?
Answer: Argentina has been ranked no. 1 in the latest football rankings announced by fifa.

Question: who sings i just want to use your love tonight?
Answer: English rock band the Outfield sings i just want to use your love tonight.

Question: where was the movie the glass castle filmed?
Answer: The movie the glass castle was filmed in Welch, West Virginia.

Question: who was the first lady nominated member of the rajya sabha?
Answer: Mary Kom was the first lady nominated member of the rajya sabha.

Question: what is the tigers name in life of pi?
Answer: Richard Parker is the tigers name in life of pi.

Question: {Q}
Answer:'''


In [3]:
question = "who is super bowl 2018 half time show?"

In [4]:
prompted_input=prompt.replace("{Q}", question)
model_inputs = tokenizer(prompted_input, return_tensors="pt").to(device)
greedy_output = model.generate(**model_inputs,max_length=512)
pred=tokenizer.decode(greedy_output[0],skip_special_tokens=True)
print(pred)

Justin Timberlake


# Behavior Consistency

1. Generate distractors. (This project uses a vocabulary-based method, as described in the papers, but you can also prompt ChatGPT to do this.)
2. Build the multiple-choice question (MCQ) tests.
3. Test the LLM on the MCQs.

In [5]:
from distractor_generator import distractor_generate
max_check_limit = 10
candidate_choices = distractor_generate(question, pred, limit=max_check_limit*4)

In [6]:
candidate_choices

['Mervyn King (darts player)',
 'William Randolph Hearst',
 'Arantxa Sánchez Vicario',
 'Evonne Goolagong Cawley',
 'David Gray (musician)',
 'Yevgeny Kafelnikov',
 'Sabrina Santamaria',
 'Gabrielle (singer)',
 'Kaitlyn Christian',
 'Sarkodie (rapper)',
 'Billie Jean King',
 'David Williamson',
 'Fernando Alonso',
 'Gustavo Kuerten',
 'David Coulthard',
 'Joseph Pulitzer',
 'Jonathan Palmer',
 'Victor Gollancz',
 'Jonas Björkman',
 'Lleyton Hewitt',
 'Lewis Hamilton',
 'Jackie Stewart',
 'Margaret Court',
 'Martin Gardner',
 'Gerhard Berger',
 'Feist (singer)',
 'Margaret Busby',
 'Mohammad Hatta',
 'Rajan–Nagendra',
 'Richie Burnett',
 'Jimmy Connors',
 'Stefan Edberg',
 'Stirling Moss',
 'Dinara Safina',
 'Thomas Muster',
 'Mats Wilander',
 'Jerry Douglas',
 'Michael Tabor',
 "Mark O'Connor",
 'Ronnie Baxter']

In [7]:
import random
mcq_num = 0
choice_item=['A','B','C','D']
used_choices=[]
tests=[]
answers=[]
while mcq_num<max_check_limit:
    choices = random.sample(candidate_choices, 3)
    if sorted(choices) in used_choices:
        continue
    used_choices.append(sorted(choices))
    choices = choices+[pred]
    random.shuffle(choices)
    answers.append(choice_item[choices.index(pred)])
    tests.append('%s\nA) %s\nB) %s\nC) %s\nD) %s\nE) None of above.'%(question,choices[0],choices[1],choices[2],choices[3]))

    mcq_num=mcq_num+1
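
The loop above keeps sampling until it has `max_check_limit` unique distractor triples, so it would never finish if the candidate list were too small to supply that many. The following is a minimal sketch of the same step as a function, assuming a seeded local RNG and a cap on the number of tests; the name `build_mcq_tests` and its parameters are hypothetical and not part of the project.

```python
import random
from math import comb

def build_mcq_tests(question, pred, candidates, num_tests=10, seed=0):
    """Build up to num_tests four-option MCQs that each contain pred exactly once."""
    rng = random.Random(seed)  # local RNG so the generated tests are reproducible
    # Drop pred from the distractor pool so the correct option is never duplicated.
    pool = sorted(set(candidates) - {pred})
    # Cap the request at the number of distinct distractor triples that exist.
    num_tests = min(num_tests, comb(len(pool), 3))
    letters = ["A", "B", "C", "D"]
    seen, tests, answers = set(), [], []
    while len(tests) < num_tests:
        triple = tuple(sorted(rng.sample(pool, 3)))
        if triple in seen:
            continue  # same distractor set already used, draw again
        seen.add(triple)
        choices = list(triple) + [pred]
        rng.shuffle(choices)
        answers.append(letters[choices.index(pred)])
        tests.append(
            "%s\nA) %s\nB) %s\nC) %s\nD) %s\nE) None of above."
            % (question, *choices)
        )
    return tests, answers
```

With the variables from this notebook, it could be called as `tests, answers = build_mcq_tests(question, pred, candidate_choices, num_tests=max_check_limit)`.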

In [8]:
for i, t in enumerate(tests):
    print(t)
    print(answers[i])
    print()

who is super bowl 2018 half time show?
A) Mats Wilander
B) Ronnie Baxter
C) David Gray (musician)
D) Justin Timberlake
E) None of above.
D

who is super bowl 2018 half time show?
A) Billie Jean King
B) Victor Gollancz
C) Justin Timberlake
D) Rajan–Nagendra
E) None of above.
C

who is super bowl 2018 half time show?
A) Jonas Björkman
B) Gabrielle (singer)
C) Dinara Safina
D) Justin Timberlake
E) None of above.
D

who is super bowl 2018 half time show?
A) Joseph Pulitzer
B) Justin Timberlake
C) Stefan Edberg
D) Fernando Alonso
E) None of above.
B

who is super bowl 2018 half time show?
A) Justin Timberlake
B) Stirling Moss
C) Evonne Goolagong Cawley
D) Sarkodie (rapper)
E) None of above.
A

who is super bowl 2018 half time show?
A) Evonne Goolagong Cawley
B) Justin Timberlake
C) Mark O'Connor
D) Kaitlyn Christian
E) None of above.
B

who is super bowl 2018 half time show?
A) Kaitlyn Christian
B) Lewis Hamilton
C) Justin Timberlake
D) Evonne Goolagong Cawley
E) None of above.
C

who is su

In [9]:
prompt='''INSTRUCTION: Please give answers to the following multi-choice questions about knowledge.

Question: who has been ranked no. 1 in the latest football rankings announced by fifa?
A) Germany has been ranked no. 1 in the latest football rankings announced by fifa.
B) India has been ranked no. 1 in the latest football rankings announced by fifa.
C) Canada has been ranked no. 1 in the latest football rankings announced by fifa.
D) Austria has been ranked no. 1 in the latest football rankings announced by fifa.
E) None of above.
Answer: E

Question: who sings i just want to use your love tonight?
A) Latin rock band the Outfield sings i just want to use your love tonight.
B) English Power pop band the Outfield sings i just want to use your love tonight.
C) English rock band the Outfield sings i just want to use your love tonight.
D) English melodic sensibility band the Outfield sings i just want to use your love tonight.
E) None of above.
Answer: C

Question: where was the movie the glass castle filmed?
A) The movie the glass castle was filmed in London.
B) The movie the glass castle was filmed in Welch, West Virginia.
C) The movie the glass castle was filmed in Philadelphia.
D) The movie the glass castle was filmed in Budapest.
E) None of above.
Answer: B

Question: who was the first lady nominated member of the rajya sabha?
A) William Randolph Hearst was the first lady nominated member of the rajya sabha.
B) Jesse Speight was the first lady nominated member of the rajya sabha.
C) Thurlow Weed was the first lady nominated member of the rajya sabha.
D) Mary Kom was the first lady nominated member of the rajya sabha.
E) None of above.
Answer: D

Question: what is on a mcchicken sandwich from mcdonalds?
A) A breaded chicken patty is on a mcchicken sandwich from mcdonalds.
B) A Hot dog chicken patty is on a mcchicken sandwich from mcdonalds.
C) A breaded Bacon is on a mcchicken sandwich from mcdonalds.
D) A breaded Teriyaki chicken is on a mcchicken sandwich from mcdonalds.
E) None of above.
Answer: A

Question: {Q}
Answer:'''

In [10]:
behave_pred=[]
for mc_q in tests:
    prompted_input=prompt.replace("{Q}", mc_q)
    # truncation=True makes the max_length cap explicit and avoids the tokenizer's truncation warning
    model_inputs = tokenizer(prompted_input, return_tensors="pt", truncation=True, max_length=1024).to(device)
    greedy_output = model.generate(**model_inputs,max_length=1024)
    behave_pred.append(tokenizer.decode(greedy_output[0],skip_special_tokens=True))


In [11]:
behave_pred

['D', 'C', 'D', 'B', 'A', 'B', 'C', 'D', 'C', 'D']
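
The predictions above are bare option letters, which is what the exact comparison with `answers` relies on. Other models may return something like `D) Justin Timberlake` or `Answer: D` instead. The following is a minimal sketch of a normalization step under that assumption; `extract_choice` is a hypothetical helper, not part of the project, and it is only a heuristic (an answer that starts with the word "A", for example, would be read as option A).

```python
import re

# Optional "Answer:" prefix, optional opening parenthesis, then one option letter
# that is not the start of a longer word (so "David" is not read as "D").
_CHOICE_RE = re.compile(r"\s*(?:Answer:\s*)?\(?([A-E])\b")

def extract_choice(text):
    """Return the leading option letter (A-E) in text, or None if there is none."""
    match = _CHOICE_RE.match(text)
    return match.group(1) if match else None
```

It could be applied before scoring, for example `behave_pred = [extract_choice(p) for p in behave_pred]`.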

In [12]:
answers

['D', 'C', 'D', 'B', 'A', 'B', 'C', 'D', 'C', 'D']

In [13]:
# Behavior consistency is all-or-nothing: 1 only if every MCQ prediction matches its answer key.
BC_score = int(behave_pred == answers)

print(BC_score)

1
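
Because the score is all-or-nothing, a result of 0 does not say how many of the tests failed or which ones. The following is a minimal sketch of a small diagnostic that reports the same strict score together with the fraction of matching answers and the indices of the mismatches; `consistency_report` and its output keys are hypothetical names, not part of the project.

```python
def consistency_report(predictions, gold):
    """Compare predicted option letters with the answer key, position by position."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    matches = [p == g for p, g in zip(predictions, gold)]
    return {
        # Strict score: 1 only when every prediction matches.
        "bc_score": int(all(matches)),
        # Fraction of tests answered consistently (0.0 for an empty test set).
        "match_rate": sum(matches) / len(matches) if matches else 0.0,
        # Indices of the tests whose prediction differs from the answer key.
        "mismatched": [i for i, ok in enumerate(matches) if not ok],
    }
```

With the variables from this notebook, it could be called as `consistency_report(behave_pred, answers)`.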
