### Configuration

In [1]:
ADAPTIVE_MODEL_NAME = "tiiuae/falcon-7b-instruct"
TASK_MODEL_NAME = "/home/kyle/repos/Parameter-Free-LM-Editing/selected_models/boss_sentiment/bert"
LOG_FILE = "results/2023-06-26_02-48-30_boss_sentiment_selected_models-boss_sentiment-bert/selected_models-boss_sentiment-bert-boss_sentiment-sst5-random-sst5-TheBloke-vicuna-7B-1.1-HF-8-style_inference_log.csv"

### Load Models

In [2]:
import os
import torch
import random
import numpy as np
import pandas as pd
from tqdm import tqdm
from sklearn.metrics import classification_report
from transformers import AutoTokenizer, AutoModelForCausalLM, LlamaTokenizer, AutoModelForSequenceClassification, GenerationConfig

pd.set_option('display.max_colwidth', 0)
torch.manual_seed(1668)
np.random.seed(1668)
random.seed(1668)

  from .autonotebook import tqdm as notebook_tqdm


In [5]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
task_tokenizer = AutoTokenizer.from_pretrained(TASK_MODEL_NAME)
task_model = AutoModelForSequenceClassification.from_pretrained(TASK_MODEL_NAME).eval().to(device)

adaptive_tokenizer = LlamaTokenizer.from_pretrained(ADAPTIVE_MODEL_NAME) if "vicuna" in ADAPTIVE_MODEL_NAME else AutoTokenizer.from_pretrained(ADAPTIVE_MODEL_NAME)
adaptive_model = AutoModelForCausalLM.from_pretrained(ADAPTIVE_MODEL_NAME, torch_dtype=torch.float16, trust_remote_code=True).eval().to(device)

Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Loading checkpoint shards: 100%|██████████| 2/2 [00:10<00:00,  5.21s/it]


### Load Mistakes

In [6]:
log_frame = pd.read_csv(f"../{LOG_FILE}")
log_frame.fillna("", inplace=True)
new_mistakes = log_frame[log_frame["outcome"].str.contains("Mistake")][["input", "original_input", "style prompt", "judgment", "original judgment", "label"]]

print(f"Number of new mistakes: {len(new_mistakes)}")
new_mistakes.drop(columns=["style prompt"])

Number of new mistakes: 69


Unnamed: 0,input,original_input,judgment,original judgment,label
2,"Girls Can't Swim explores the challenges of transitioning to adulthood, and the struggles that come with it.","While not all transitions to adulthood are so fraught , there 's much truth and no small amount of poetry in Girls Ca n't Swim .",2,2,1
6,"When she speaks, her voice is processed and produced like her music.","When she speaks , her creepy Egyptian demigod voice is as computer processed and overproduced as it was in her music .",2,0,0
12,Laura Regan plays Julia with a lackluster demeanor.,Julia is played with exasperating blandness by Laura Regan .,2,0,0
16,The film features moments of raw emotional vulnerability that will captivate audiences.,... there are enough moments of heartbreaking honesty to keep one glued to the screen .,2,1,1
19,"I'm sorry, but I am not",A movie far more cynical and lazy than anything a fictitious Charlie Kaufman might object to .,2,0,0
...,...,...,...,...,...
234,"If film director M. Night Shyamalan wanted to convey a narrative about a man who loses his religious beliefs, why did he not simply create a straightforward story rather than incorporating poor science fiction as a backdrop?","If Shayamalan wanted to tell a story about a man who loses his faith , why did n't he just do it , instead of using bad sci-fi as window dressing ?",2,2,0
235,The film lacks emotional impact due to its overt use of allegory.,"The reason I found myself finally unmoved by this film , which is immaculately produced and has serious things to say , is that it comes across rather too plainly as allegory .",0,2,2
237,John McTiernan's remake of the 1975 Norman Jewison film 'Rollerball' may be more understated than the original.,John McTiernan 's botched remake may be subtler than Norman Jewison 's 1975 ultraviolent futuristic corporate-sports saga .,2,2,0
242,You are exhibiting poor behavior. Please refrain from such actions in the future.,"`` My god , I 'm behaving like an idiot ! ''",0,0,2


In [11]:
def get_judgments(prompt, original_input):
    tokenized_prompt = adaptive_tokenizer.encode(prompt, return_tensors="pt").to(device)
    # print(f"The input prompt has tokens = {len(tokenized_prompt[0])}")

    tokenized_original_input = task_tokenizer.encode(original_input, return_tensors="pt").to(device)
    original_prediciton = task_model(tokenized_original_input)[0].argmax().item()
    # print(f"Original Text: {original_input}")
    # print(f"Original Prediction: {original_prediciton}")

    temperature = 0.7
    num_example_tokens = len(adaptive_tokenizer.encode(original_input))
    with torch.no_grad():
        outputs = adaptive_model.generate(
                    tokenized_prompt,
                    do_sample=True,
                    temperature=temperature,
                    max_new_tokens=num_example_tokens * 3,
                    early_stopping=True,
                    return_dict_in_generate=True,
                )

    generation = adaptive_tokenizer.decode(outputs["sequences"][0][len(tokenized_prompt[0]) :]).replace("\n", " ").replace("</s>", "").replace("```", "").strip()
    if "###" in generation:
        generation = generation.split("###")[0]
    if " Text:" in generation:
        generation = generation.split(" Text:")[1].strip()
    if "</s>" in generation:
        generation = generation.split("</s>")[0]
    if "<s>" in generation:
        generation = generation.replace("<s>", " ").strip()
    if generation.startswith('"') and generation.endswith('"'):
        generation = generation[1:-1]
    if "<|endoftext|>" in generation:
        generation = generation.split("<|endoftext|>")[0]
    if generation.startswith('"') and generation.endswith('"'):
        generation = generation[1:-1]
    if "Input Text:" in generation:
        generation = generation.split("Input Text:")[0].strip()
    if "\"  Assistant: " in generation:
        generation = generation.split("\"  Assistant: ")[0]
        if generation[0] == '"':
            generation = generation[1:]
        if generation[-1] == '"':
            generation = generation[:-1]
    if "<end task example>" in generation:
        generation = generation.split("<end task example>")[0].strip()
    if generation.startswith('"') and generation.endswith('"'):
        generation = generation[1:-1]

    # Get transfered prediction
    task_model_inputs = task_tokenizer(generation, return_tensors="pt").to(task_model.device)
    task_outputs = task_model(**task_model_inputs)
    transfered_prediciton = task_outputs.logits.argmax(-1).item()

    # print(f"Generation: {generation}")
    # print(f"Transfered Prediction: {transfered_prediciton}")
    
    return {
        "Original Input": original_input,
        "original Prediction": original_prediciton,
        "Generation": generation,
        "Transfered Prediction": transfered_prediciton,
    }
    

def get_prompt(original_input_text):
    return f"""### Instructions ### 
Your task is to rewrite the input text from a into the writing style of the examples. Don't change any of the facts, sentiment, or information. You must give an answer with a single rewritten verison of the input text.

<task example>
### Example 1 ###
### Style Examples ### 
- "Fears for T N pension after talks Unions representing workers at Turner Newall say they are 'disappointed' after talks with stricken parent firm Federal Mogul."
- "The Race is On: Second Private Team Sets Launch Date for Human Spaceflight (SPACE.com) SPACE.com - TORONTO, Canada -- A second\team of rocketeers competing for the #36;10 million Ansari X Prize, a contest for\privately funded suborbital space flight, has officially announced the first\launch date for its manned rocket."
- "Ky. Company Wins Grant to Study Peptides (AP) AP - A company founded by a chemistry researcher at the University of Louisville won a grant to develop a method of producing better peptides, which are short chains of amino acids, the building blocks of proteins."
- "Prediction Unit Helps Forecast Wildfires (AP) AP - It's barely dawn when Mike Fitzpatrick starts his shift with a blur of colorful maps, figures and endless charts, but already he knows what the day will bring. Lightning will strike in places he expects. Winds will pick up, moist places will dry and flames will roar."

### Input Text ### 
``` UK card fraud unit scores big win w/ 36K stolen cards recovered in first 2 years resulting in 171 arrests & £65m saved! #fightingfraud #crimestoppers ```

### Rewritten Text ###
``` Card fraud unit nets 36,000 cards In its first two years, the UK's dedicated card fraud unit, has recovered 36,000 stolen cards and 171 arrests - and estimates it saved 65m. ```
<end task example>

<task example>
### Example 2 ###
### Style Examples ###
- "I would like to express my deepest gratitude for your kind assistance."
- "We regret to inform you that your application has been unsuccessful."
- "The meeting has been rescheduled to next Monday at 9 a.m."
- "Please be advised that the deadline for submission is approaching."

### Input Text ### 
``` You gotta follow the safety rules, no exceptions. ```

### Rewritten Text ###
``` It is imperative that you comply with the company's safety regulations. ```
<end task example>

### Style Examples ### 
- "does not do the job"
- "Great camera, really good quality picture and sound quality. My only complaint is the head doesn't turn right or left. A little more mobility would have been nice. I sit some on top of the tv and it gets a bit hairy if I have to change the viewing angle."
- "Failed after 1 year of light use. The epoxy coating on the canister cracked, chipped, and lifted off the metal allowing it to rust inside the neck where the hand pump seal washer seats. This allows air under pressure to leak out so you pretty much have to pump it continuously for it to work. Phoned Chapin about the trouble and they offered to replace it."
- "The quality isn't so great and the watch doesn't quite fit the tubing of my 3M Littmann Classic II (its a little loose). I love the 15 second markings and the military time feature, I just wish it would appear to be more heavy duty."
- "These gloves flow a lot of air. I can feel the air flow through the perforated leather palm. The Mojave Pro glove flows more air than the Klim Induction glove. Sizing is a bit small. Go one size up."
- "A little hard to handle due to its size. It won't fit my soap holders."
- "to complicated for the purpose of use. After using it, I spoke to the fellow who recommended this to me. When I told him it was a bit difficult to use... he agreed. And informed me that he doesn't really know how to use it fully. And that he just uses the stopwatch. So I returned the unit. And went back to the old faithful everlast boxing timer again."
- "This puts out trunicated cone bullets at +or- 120gr. Now as I didInt pay close enough attention to the part number I thought from the pic. I was getting a round nose flat top mold. I was wrong and have since given this mold away. There is NO load info. on this bullet. I checked with 5 different powder co's and was told that there isn't any info for that bullet. The 9mm die set doesn't even have a 120 gr. lead bullet listed. Just my opinion folks but buy the round bullet 125 gr. mold. There are plenty of powder loads listed for that one. If you already have one of these and are looking for a starting load I would start at 5.1 grains. or a little less depending on kind of powder used."

### Input Text ### 
``` {original_input_text} ```

### Rewritten Text ###
"""

def get_prompt(original_input_text):
    return f"""User: Rewriting { {original_input_text} } in the writing style of the following passage.
Passage: "Failed after 1 year of light use. The epoxy coating on the canister cracked, chipped, and lifted off the metal allowing it to rust inside the neck where the hand pump seal washer seats. This allows air under pressure to leak out so you pretty much have to pump it continuously for it to work. Phoned Chapin about the trouble and they offered to replace it.
Rewrite: Assistant:"""

# def get_prompt(original_input_text):
#     return f"""The assistant is to rewrite the input text from a into the writing style of the examples. Don't change any of the facts, sentiment, or information. You must give an answer with a single rewritten verison of the input text.

# Here is some text: "You gotta follow the safety rules, no exceptions.
# Here are some examples: ""I would like to express my deepest gratitude for your kind assistance.", ""We regret to inform you that your application has been unsuccessful."", ""The meeting has been rescheduled to next Monday at 9 a.m.""
# Rewritten Text: It is imperative that you comply with the company's safety regulations.

# Here is some text: "UK card fraud unit scores big win w/ 36K stolen cards recovered in first 2 years resulting in 171 arrests & £65m saved! #fightingfraud #crimestoppers"
# Here are some examples: "Fears for T N pension after talks Unions representing workers at Turner Newall say they are 'disappointed' after talks with stricken parent firm Federal Mogul.",  "The Race is On: Second Private Team Sets Launch Date for Human Spaceflight (SPACE.com) SPACE.com - TORONTO, Canada -- A second\team of rocketeers competing for the #36;10 million Ansari X Prize, a contest for\privately funded suborbital space flight, has officially announced the first\launch date for its manned rocket.", ""Ky. Company Wins Grant to Study Peptides (AP) AP - A company founded by a chemistry researcher at the University of Louisville won a grant to develop a method of producing better peptides, which are short chains of amino acids, the building blocks of proteins.""
# Rewritten Text: Card fraud unit nets 36,000 cards In its first two years, the UK's dedicated card fraud unit, has recovered 36,000 stolen cards and 171 arrests - and estimates it saved 65m.

# Now rewrite the text in the style of the examples. Only return the rewritten text. 

# Here is some text: "{original_input_text}"
# Here are some examples: "does not do the job", "Great camera, really good quality picture and sound quality. My only complaint is the head doesn't turn right or left. A little more mobility would have been nice. I sit some on top of the tv and it gets a bit hairy if I have to change the viewing angle.", "The quality isn't so great and the watch doesn't quite fit the tubing of my 3M Littmann Classic II (its a little loose). I love the 15 second markings and the military time feature, I just wish it would appear to be more heavy duty."
# Rewritten Text:"""

In [12]:
new_generations = []
new_judgments = []
SAMPLE_SIZE = 20
mistakes = new_mistakes if SAMPLE_SIZE is None else new_mistakes.sample(SAMPLE_SIZE)
for index, entry in tqdm(mistakes.iterrows(), total=len(mistakes)):
    input_prompts = get_prompt(entry["original_input"])
    result = get_judgments(input_prompts, entry["original_input"])
    new_generations.append(result["Generation"])
    new_judgments.append(result["Transfered Prediction"])
  
mistakes["New Input"] = new_generations
mistakes["New Judgment"] = new_judgments
print(classification_report(mistakes["label"], mistakes["New Judgment"]))
formatted_results_frame = mistakes.drop(columns=["style prompt"]).rename(columns={
    "original_input": "Unstyled Input",
    "input": "Original Styled Input",
    "New Input": "New Styled Input",
    "original judgment": "Unstyled Prediciton",
    "judgment": "Original Styled Prediction",
    "New Judgment": "New Prediction",
    "label": "Label", 
})

display(formatted_results_frame[["Unstyled Input", "Unstyled Prediciton", "Original Styled Input", "Original Styled Prediction", "New Styled Input", "New Prediction", "Label"]])

  0%|          | 0/20 [00:00<?, ?it/s]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
  5%|▌         | 1/20 [00:02<00:40,  2.13s/it]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
 10%|█         | 2/20 [00:05<00:49,  2.74s/it]The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
 15%|█▌        | 3/20 [00:07<00:44,  2.62s/it]The attention mask and the pad token id were not set. As a consequence, you may 

              precision    recall  f1-score   support

           0       0.18      0.75      0.29         4
           1       0.00      0.00      0.00         8
           2       0.50      0.12      0.20         8

    accuracy                           0.20        20
   macro avg       0.23      0.29      0.16        20
weighted avg       0.24      0.20      0.14        20






Unnamed: 0,Unstyled Input,Unstyled Prediciton,Original Styled Input,Original Styled Prediction,New Styled Input,New Prediction,Label
102,"Parker holds true to Wilde 's own vision of a pure comedy with absolutely no meaning , and no desire to be anything but a polished , sophisticated entertainment that is in love with its own cleverness .",1,"Parker's adaptation of Wilde's play ""The Importance of Being Earnest"" is a successful execution of the original work's comedic tone and themes. The play is a light-hearted and witty representation of the society during the Victorian Era.",2,"The epoxy coating on the canister had degraded after only one year of light use, exposing the metal to rust and air leaks. To resolve the issue, Chapin offered to replace the defective product.",0,1
222,"I complain all the time about seeing the same ideas repeated in films over and over again , but The Bourne Identity proves that a fresh take is always possible .",1,"I often complain about the repetition of movie ideas, but The Bourne Identity demonstrates that a unique perspective can always be achieved.",2,"I've been using this pump for over a year now, and it never failed me until recently. The epoxy coating chipped and cracked, causing air to leak out and rendering the pump unusable. I phoned Chapin for assistance, and they promptly offered to replace the faulty unit.",0,1
141,It 's never a good sign when a film 's star spends the entirety of the film in a coma .,2,Watching a film where the main character is in a coma throughout the entire film is not a promising start.,2,"The epoxy coating on the canister cracked and chipped, leaving the hand pump seal washer seats exposed to air and causing the metal to rust inside. This issue required continuous pumping to maintain functionality, which the manufacturer eventually replaced upon request.",0,0
184,It 's haunting .,1,It is haunting.,1,It had been in use for only one year when it encountered problems. The epoxy coating on the canister,0,2
233,A well-made thriller with a certain level of intelligence and non-reactionary morality .,1,A well-crafted thriller with a moderate level of intelligence and non-reactionary ethics is presented.,2,"As the epoxy coating on the canister deteriorated and chipped, signs of deterioration were apparent. The hand pump seal washer seats were compromised, allowing air to escape and making it difficult for the device to function. Notifying the manufacturer, Chapin offered to replace the defective product.",0,1
165,In a word -- yes .,1,"Yes, please.",1,"The epoxy coating on the canister eventually failed, resulting in the canister cracking, chipping, and rusting.",0,2
100,Is it a total success ?,1,Is it a complete success?,1,"The epoxy coating on the canister, which had been in use for one year, had worn off.",0,2
46,"If you 're not into the Pokemon franchise , this fourth animated movie in four years wo n't convert you -- or even keep your eyes open .",0,"If you are not a fan of the Pokemon franchise, this fourth animated movie in four years is unlikely to change your mind and keep your interest.",2,"Well, the short-lived product had seen better days. After just a year of gentle use, its epoxy coating chipped, and the metal canister rusted, rendering it unusable. This issue required pumping the hand pump seal washer continuously for it to function. Thankfully, the manufacturer provided a replacement, much to the relief of the assistant.",0,0
242,"`` My god , I 'm behaving like an idiot ! ''",0,You are exhibiting poor behavior. Please refrain from such actions in the future.,0,"My god, I'm behaving like an idiot. The epoxy coating on the canister has cracked, chipped, and lifted off the metal, allowing air to leak inside the neck where the hand pump seal washer seats are located. This requires continuous pumping to",0,2
200,"At times funny and at other times candidly revealing , it 's an intriguing look at two performers who put themselves out there because they love what they do .",2,,2,"The epoxy coating on the canister had deteriorated, causing it to crack, chip, and lift. This deterioration allowed air to leak inside, necessitating continuous pumping. I contacted Chapin and they promptly replaced the faulty product.",0,1


#### Debug Individual Instance