In [1]:
import os
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM


In [5]:
model_name = "mistralai/Mistral-7B-v0.1"

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
from transformers import BitsAndBytesConfig

# pass the 4-bit setting through quantization_config; load_in_4bit as a direct argument is deprecated
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

Loading checkpoint shards: 100%|██████████| 2/2 [00:32<00:00, 16.12s/it]


In [None]:
prompt = ""

## Soft Self-Consistency ##

1. Define input x (prompt with task description)
2. Generate k solutions using temperature-based sampling (model.generate())
3. Score each action $y$ using the aggregated probability of its tokens
   1. Action $y$ is composed of tokens $y_1 \to \dots \to y_n$
   2. score(y) = $f(\{P_{LM}(y_i \mid y_{<i}, x) : i \in [1, n]\})$
   3. More specifically, $f$ can be the min, mean, or product.
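
The three aggregation choices in step 3 can be sketched in plain Python. This is a minimal illustration rather than the notebook's implementation: `score_action` is a hypothetical helper, the probability values are made up, and the per-token probabilities are assumed to have been extracted from the model already.

```python
import math

def score_action(token_probs, how="mean"):
    """Aggregate per-token probabilities P(y_i | y_<i, x) into a single score."""
    if not token_probs:
        raise ValueError("token_probs must be non-empty")
    if how == "min":
        return min(token_probs)
    if how == "mean":
        return sum(token_probs) / len(token_probs)
    if how == "product":
        if any(p == 0 for p in token_probs):
            return 0.0
        # sum logs instead of multiplying directly to avoid underflow on long actions
        return math.exp(sum(math.log(p) for p in token_probs))
    raise ValueError(f"unknown aggregation: {how}")

# made-up probabilities for a three-token action
example_probs = [0.9, 0.5, 0.8]
for how in ("min", "mean", "product"):
    print(how, round(score_action(example_probs, how), 4))
```

The product penalizes long actions more than the mean does, since every extra token multiplies in a factor below 1; min only looks at the least likely token.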

In [90]:
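input_tokens = tokenizer(prompt, return_tensors="pt").to(device)

temperature = 0.7
max_new_tokens = 100
num_samples = 5

outputs = model.generate(
    **input_tokens,
    do_sample=True,
    temperature=temperature,
    max_new_tokens=max_new_tokens,
    num_return_sequences=num_samples,
    return_dict_in_generate=True,  # return an output object instead of a bare tensor of ids
    output_scores=True,  # keep the per-step scores needed for scoring each sample
)

# tuple with one [num_samples, vocab_size] tensor per generated step
# (processed scores, i.e. after temperature scaling, before softmax)
scores = outputs.scores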

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


KeyboardInterrupt: 

In [89]:
prompt = """Q: John found that the average of 15 numbers is 40. If 10 is added to each number then the mean of the
numbers is? Answer Choices: (a) 50 (b) 45 (c) 65 (d) 78 (e) 64
A: If 10 is added to each number, then the mean of the numbers also increases by 10. So the new mean
would be 50. The answer is (a).
Q: If a / b = 3/4 and 8a + 5b = 22,then find the value of a. Answer Choices: (a) 1/2 (b) 3/2 (c) 5/2 (d) 4/2 (e) 7/2
A: If a / b = 3/4, then b = 4a / 3. So 8a + 5(4a / 3) = 22. This simplifies to 8a + 20a / 3 = 22, which means 44a / 3 = 22. So a is equal to 3/2. The answer is (b).
Q: A person is traveling at 20 km/hr and reached his destiny in 2.5 hr then find the distance? Answer Choices: (a) 53 km (b) 55 km (c) 52 km (d) 60 km (e) 50 km
A: The distance that the person traveled would have been 20 km/hr * 2.5 hrs = 50 km. The answer is (e).
Q: How many keystrokes are needed to type the numbers from 1 to 500? Answer Choices: (a) 1156 (b) 1392 (c) 1480 (d) 1562 (e) 1788
A: There are 9 one-digit numbers from 1 to 9. There are 90 two-digit numbers from 10 to 99. There are 401 three-digit numbers from 100 to 500. 9 + 90(2) + 401(3) = 1392. The answer is (b).
Q: The capacity of a tank of dimensions (8 m x 6m x 2.5 m) is (a) 120 litres (b) 1200 litres (c) 12000 litres (d) 120000 litres (e) None of these
A: 
"""


In [85]:
prompt="""System : You are a helpful assistant expert specializing in BASH .
User : ## TASK DESCRIPTION
You are a BASH code generator helping me answer a question using BASH .
I will ask you a question , and your task is to interact with a Bourne Shell system using BASH commands
to come up with the answer .
## RESPONSE FORMAT
Your response should be a BASH command . Format your BASH command as follows :
‘‘‘BASH
Your BASH code here
‘‘‘
DO NOT WRITE ANYTHING EXCEPT FOR CODE in your response .
Try ‘‘‘sql
SHOW TABLES ‘‘‘ or ‘‘‘sql
DESCRIBE <table_name > to learn more about the database ‘‘‘.
## OUTPUT DESCRIPTION
Given your BASH command input , the system will then give back output formatted as follows :
Output : <string >
Reward : [0, 1]
The output is the standard output from executing your BASH command .
The reward is a decimal value between 0 and 1, which tells you how close your BASH command is to the correct answer .
The closer the reward is to 1, the closer your BASH command is to the correct answer .
You have to try to maximize the reward .
Query : "{ query }".
Do not generate any output or reward .
Assistant : { Model Completion }"""

SyntaxError: EOF while scanning triple-quoted string literal (1440722157.py, line 1)

**helpful documentation links**

model.generate() -> https://huggingface.co/docs/transformers/en/main_classes/text_generation

**main differences**
* SC relies on majority voting over the sampled answers. Sampling many answers can be expensive, and exact-match voting has a further weakness:
  * in bash, `ls -ltr` and `ls -trl` differ on the surface but achieve the same thing
  * majority voting isn't designed to accommodate this
* soft SC relies on weighted aggregation of the answers based on their probabilities
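
A small sketch of the contrast above, using made-up candidates and scores. `majority_vote` and `soft_vote` are hypothetical helper names, and summing the scores of each distinct string is only one way to read "weighted aggregation"; it is an assumption, not something this notebook specifies.

```python
from collections import Counter, defaultdict

def majority_vote(answers):
    """Standard self-consistency: the most frequent exact string wins."""
    return Counter(answers).most_common(1)[0][0]

def soft_vote(answers, scores):
    """Score-weighted selection: sum the scores of each distinct answer, take the best."""
    totals = defaultdict(float)
    for answer, score in zip(answers, scores):
        totals[answer] += score
    return max(totals, key=totals.get)

# hypothetical samples: two equivalent commands split the vote
answers = ["ls -ltr", "ls -trl", "ls -a", "ls -a"]
scores = [0.9, 0.85, 0.2, 0.25]

print(majority_vote(answers))      # ls -a
print(soft_vote(answers, scores))  # ls -ltr
```

Exact-match voting picks `ls -a` because the two equivalent commands count as different answers, while the score-weighted version prefers a command the model assigned high probability to.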

In [80]:
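softmax = torch.nn.Softmax(dim=-1)  # normalize over the vocabulary axis

def find_start(tokens):
    """Return the index just after the last fence/'bash'/newline marker in a list of token strings."""
    match_token = ['▁```', 'bash', '<0x0A>']
    for i in range(len(tokens) - 3, -1, -1):
        if tokens[i : i + 3] == match_token:
            return i + 3
    # the original fell through and returned an arbitrary index when nothing matched
    raise ValueError("marker not found in tokens")

with torch.no_grad():
    # tokenize input
    input_tokens = tokenizer.encode(prompt, return_tensors="pt").to(device)

    # pass through model
    output = model(input_tokens, return_dict=True)

    # obtain logits: [batch_size, sequence_length, vocab_size]; keep the single batch item
    logits = output.logits[0]

    # next-token probabilities: [sequence_length, vocab_size]
    probs = softmax(logits)

    # find_start compares token strings, so convert the ids back to tokens first
    start = find_start(tokenizer.convert_ids_to_tokens(input_tokens[0]))
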
In [88]:
probs.shape

torch.Size([620, 32000])
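
`probs` has one row per prompt position, and row t is the model's distribution over the token at position t + 1. Reading off the probability of each token that actually appears therefore needs a one-step shift. The sketch below shows that indexing on plain nested lists with a made-up four-token vocabulary; `token_probs_from` is a hypothetical helper, and on tensors the same lookup would typically be done with `torch.gather`.

```python
def token_probs_from(probs, token_ids, start):
    """Return P(token_ids[t] | token_ids[:t]) for t = start .. len(token_ids) - 1.

    probs[t] is the distribution predicted after reading token_ids[: t + 1],
    so the probability of token_ids[t] is found in probs[t - 1].
    """
    if start < 1:
        raise ValueError("start must be >= 1: the first token has no prediction")
    return [probs[t - 1][token_ids[t]] for t in range(start, len(token_ids))]

# toy distributions over a 4-token vocabulary, one row per position
probs = [
    [0.1, 0.7, 0.1, 0.1],
    [0.2, 0.2, 0.5, 0.1],
    [0.25, 0.25, 0.25, 0.25],
]
token_ids = [0, 1, 2]

print(token_probs_from(probs, token_ids, start=1))  # [0.7, 0.5]
```

The last row of `probs` is unused here because it predicts a token beyond the end of the sequence.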

In [30]:
# self-consistency is generally robust to different sampling strategies/parameters
num_samples = 40 # return samples per independent run
top_k = 40
temperature = 0.7

# move the inputs to the same device as the model
input_tokens = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(
    **input_tokens,
    max_new_tokens=100,
    do_sample=True,
    top_k=top_k,
    temperature=temperature,
    num_return_sequences=num_samples
)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.




KeyboardInterrupt: 

**Soft Self-Consistency**

1. input x containing task description
2. generate k solutions using temperature-based sampling
3. selection: score the action y resulting from each solution using the aggregated probability of the action's tokens
4. score(y) = f({P(y_i | y_<i, x) : i in [1, n]})

In [25]:
# Tokenize the input prompt
input_tokens = tokenizer(prompt, return_tensors="pt")


Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


Generated Output 1:
Q: John found that the average of 15 numbers is 40. If 10 is added to each number then the mean of the
numbers is? Answer Choices: (a) 50 (b) 45 (c) 65 (d) 78 (e) 64
A: If 10 is added to each number, then the mean of the numbers also increases by 10. So the new mean
would be 50. The answer is (a).
Q: If a / b = 3/4 and 8a + 5b = 22,then find the value of a. Answer Choices: (a) 1/2 (b) 3/2 (c) 5/2 (d) 4/2 (e) 7/2
A: If a / b = 3/4, then b = 4a / 3. So 8a + 5(4a / 3) = 22. This simplifies to 8a + 20a / 3 = 22, which means 44a / 3 = 22. So a is equal to 3/2. The answer is (b).
Q: A person is traveling at 20 km/hr and reached his destiny in 2.5 hr then find the distance? Answer Choices: (a) 53 km (b) 55 km (c) 52 km (d) 60 km (e) 50 km
A: The distance that the person traveled would have been 20 km/hr * 2.5 hrs = 50 km. The answer is (e).
Q: How many keystrokes are needed to type the numbers from 1 to 500? Answer Choices: (a) 1156 (b) 1392 (c) 1480 (d) 1562 (e) 1788
A: T