<div class="alert alert-block alert-info">
<b>Note:</b>
This notebook walks through administering a questionnaire to a Hugging Face (HF) model, using either a <a href="#locally">model loaded locally</a> or <a href="#api">the HF API</a>.

This notebook is intended for development and demonstration. The model used locally here is most likely not powerful enough to provide meaningful answers to the prompts.
</div>

In [1]:
%load_ext autoreload
%autoreload 2

import sys
sys.path.append("../")
from llm.administer_llm import *
from questionnaire import *

# Load Questionnaire

In [2]:
capitals_qa = TFQuestionnaire.from_json(
    "../data/assert_capital_cities.json",
    prompt_template="Provide only the index of the correct answer.\n{question}\n{choices}\nYour answer:",
)

In [3]:
print(capitals_qa.make_prompts()[0])

Provide only the index of the correct answer.
The capital city of Belgium is Paris.
A. True
B. False
Your answer:
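
The printed prompt suggests that the `{question}` and `{choices}` placeholders of the template are filled in per item. The following is a minimal sketch of that substitution with plain `str.format`. `render_prompt` is a hypothetical helper written for illustration; it is not the code used by `TFQuestionnaire`, whose internals are not shown here.

```python
PROMPT_TEMPLATE = (
    "Provide only the index of the correct answer.\n"
    "{question}\n{choices}\nYour answer:"
)


def render_prompt(template: str, question: str, choices: dict) -> str:
    """Fill the template (hypothetical helper, not the library's code)."""
    # One "key. label" line per choice, in insertion order.
    choice_lines = "\n".join(f"{key}. {label}" for key, label in choices.items())
    return template.format(question=question, choices=choice_lines)


prompt = render_prompt(
    PROMPT_TEMPLATE,
    "The capital city of Belgium is Paris.",
    {"A": "True", "B": "False"},
)
print(prompt)
```

With these inputs the sketch reproduces the prompt printed above.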


# Locally hosted LLM <a class="anchor" id="locally"></a>

In [4]:
lab_hf = AdministerHF(
    questionnaire=capitals_qa,
    model_id="cerebras/Cerebras-GPT-111M",
    logits_based=True,
    local=True,
    generation_args={
        "max_new_tokens": 128
    }
)

## What's going on under the hood?

In [5]:
generated_response = lab_hf.generation_method(
    capitals_qa.make_prompts()[0]
)  # returns the logits
print(generated_response)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


tensor([[ 0.9662,  2.0158,  1.5350,  ..., -7.1803, -6.4043,  7.5340]])


In [6]:
probs = lab_hf.output_parser(
    generated_response,
    capitals_qa.get_choices_keys()[0],
)  # retrieves the scores of the choice tokens and normalizes them
print(probs)

{'A': [32], 'B': [33]}
{'A': np.float32(0.3759568), 'B': np.float32(0.62404317)}
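
The first printed dictionary maps each choice key to a token id, and the second holds probabilities that sum to 1. A minimal sketch of that normalization, assuming the parser keeps only the logits of the choice tokens and applies a softmax over them: `choice_probabilities` and the toy logits are illustrative assumptions, not the library's implementation. Taking a softmax over the full vocabulary and then renormalizing over the choice tokens gives the same result, because the shared denominator cancels.

```python
import math


def choice_probabilities(logits, choice_token_ids):
    """Softmax restricted to the first token id of each choice (illustrative)."""
    # Keep only the logit of each choice's first token.
    selected = {key: logits[ids[0]] for key, ids in choice_token_ids.items()}
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(selected.values())
    exps = {key: math.exp(value - top) for key, value in selected.items()}
    total = sum(exps.values())
    return {key: value / total for key, value in exps.items()}


# Toy logits over a 40-entry vocabulary (made-up values, not model output).
toy_logits = [0.0] * 40
toy_logits[32] = 1.0  # logit of the token for "A"
toy_logits[33] = 2.0  # logit of the token for "B"

probs = choice_probabilities(toy_logits, {"A": [32], "B": [33]})
print(probs)
```

Because only the difference between the two logits matters, "B" receives the larger share here, just as it does in the output above.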



## `run`

In [7]:
results = lab_hf.run()

print(results)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.



{'A': [32], 'B': [33]}
{'A': [32], 'B': [33]}
{'A': [32], 'B': [33]}
{'A': [32], 'B': [33]}
{'A': [32], 'B': [33]}
{'Correct': np.float32(1.9925859), 'Incorrect': np.float32(3.007414)}
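
The two totals add up to 5, the number of questions. That is consistent with each question contributing the probability of its correct choice to `Correct` and the remaining probability to `Incorrect`. This is a reading of the output, not something the notebook confirms. A sketch under that assumption, with a hypothetical `aggregate_soft_scores` helper and made-up probabilities:

```python
def aggregate_soft_scores(per_question_probs, correct_keys):
    """Sum probability mass on correct vs. incorrect choices (illustrative)."""
    totals = {"Correct": 0.0, "Incorrect": 0.0}
    for probs, correct in zip(per_question_probs, correct_keys):
        for key, p in probs.items():
            bucket = "Correct" if key == correct else "Incorrect"
            totals[bucket] += p
    return totals


# Made-up probabilities for two questions; not the notebook's actual values.
example_probs = [{"A": 0.25, "B": 0.75}, {"A": 0.5, "B": 0.5}]
example_keys = ["B", "A"]  # key of the correct choice for each question

totals = aggregate_soft_scores(example_probs, example_keys)
print(totals)
```

Under this scheme, a model that splits its probability evenly on every question scores half of the questions as `Correct`, so the soft totals should be read against that baseline.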


# HF API <a class="anchor" id="api"></a>

In [4]:
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

lab_hf = AdministerHF(
    questionnaire=capitals_qa,
    model_id=model_id,
    logits_based=False,
    local=False,
    store_answers=True
)

In [5]:
results = lab_hf.run()

print(results)



{'Correct': 5, 'Incorrect': 0}


In [6]:
lab_hf.answers

[{'A': 0, 'B': 1},
 {'A': 1, 'B': 0},
 {'A': 1, 'B': 0},
 {'A': 0, 'B': 1},
 {'A': 0, 'B': 1}]

In [7]:
for p, a in zip(capitals_qa.make_prompts(), lab_hf.generated_responses):
    print(">>>>>>>>>>>>>>")
    print(p)
    print(a)

>>>>>>>>>>>>>>
Provide only the index of the correct answer.
The capital city of Belgium is Paris.
A. True
B. False
Your answer:
B
>>>>>>>>>>>>>>
Provide only the index of the correct answer.
The capital city of Angola is Luanda.
A. True
B. False
Your answer:
A

(Note: A is the index of the correct answer, which is "True".)
>>>>>>>>>>>>>>
Provide only the index of the correct answer.
The capital city of New-Zealand is Wellington.
A. True
B. False
Your answer:
A.
>>>>>>>>>>>>>>
Provide only the index of the correct answer.
The capital city of India is Katmandou.
A. True
B. False
Your answer:
B.
>>>>>>>>>>>>>>
Provide only the index of the correct answer.
The capital city of Canada is Toronto.
A. True
B. False
Your answer:
B
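
The raw responses are not uniform: some are a bare letter, some end with a period, and one adds an explanatory note. A minimal sketch of how such text could be mapped to the one-hot dictionaries stored in `lab_hf.answers`, assuming the first standalone choice key in the response is taken as the answer: `parse_choice` is a hypothetical helper, and the parser actually used by `AdministerHF` may differ.

```python
import re


def parse_choice(response, choice_keys):
    """Return a one-hot dict for the first choice key found (illustrative)."""
    # Match a key standing alone as a word: "A", "A." or "A\n\n(Note: ...)".
    pattern = r"\b(" + "|".join(re.escape(k) for k in choice_keys) + r")\b"
    match = re.search(pattern, response.strip())
    chosen = match.group(1) if match else None
    return {key: int(key == chosen) for key in choice_keys}


responses = [
    "B",
    'A\n\n(Note: A is the index of the correct answer, which is "True".)',
    "A.",
    "B.",
    "B",
]
answers = [parse_choice(r, ["A", "B"]) for r in responses]
print(answers)
```

This heuristic is fragile: a response such as "A is wrong, so B" would be read as "A", so stricter prompting or a constrained output format is safer for larger questionnaires.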
