In [13]:
import json
import argparse
import re
from prompt import get_prompt, get_task_name, get_task_description
import csv
import os
import pandas as pd
from find_path import find_path
from utils import get_model_batch_response
import math
import tqdm 

prompt_t = """
    I am training a model using RoBERTa + MLP on a task named {task_name}. The task involves {task_description}. 
    Your task is to identify potential spurious patterns that the model might have learned based on its responses.

    I will present you with an instance where the model provided incorrect responses. 

    Please provide {spurious_num} assumptions of spurious patterns that may have caused the incorrect response. 
    Each assumption should be followed by {generate_num} verification data points to determine whether the model consistently makes mistakes 
    due to such a spurious pattern. The verification data should align with the identified spurious patterns. 
    Having the same pattern does not mean copying the original text and target verbatim; instead, 
    it should reflect the same pattern at a higher level of abstraction and  "text" and "target" in the generated should be diverse with various contents and different speaking way, and include spurious patterns.

    A spurious pattern refers to a misleading or non-causal feature relationship that the model learns during training, 
    such as misunderstandings of certain phrases, sentiment words, or entity relations. Be specific about the patterns: name the exact words or relations involved rather than giving a general description.

    Format your evaluation instances using XML tags. Each <Spurious_i> tag should include:

    An assumption of the spurious pattern that the model may have learned.
    {generate_num} verification data points, each enclosed in <verification_i> from <verification_1> to <verification_{generate_num}>, where i is the sequential number of the verification data point.
    Each <verification_i> should contain the following:

       1. <text>: A multi-sentence passage containing the spurious pattern. Each sample should be about as long as the incorrect instance, at least 100 words, with suitable context.
       2. <target>: An entity mentioned in the text.
       3. <ground_truth>: The true label for the classification task.
    Ensure that the ground truth of the generated data is ascertainably correct. If the correctness of the given instance cannot be determined, leave the field blank.
    The incorrect instance is as follows:
    {data}
    Please output all content completely without omitting or summarizing.
    Ensure that the generated data is diverse to avoid overfitting the smaller model.
    """

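def format_prompt(data, prompt, task, description, spurious_num, generate_num):
    # NOTE
    # The records in `data` come from the model's inference and store labels
    # as integer ids: { "FAVOR": 0, "NONE": 1, "AGAINST": 2 }.
    # transform_dict maps those ids back to label names for the prompt.
    # The original note says the labels must eventually be transformed to
    # { "FAVOR": 1, "NONE": 2, "AGAINST": 0 }; that conversion is not done here.
    transform_dict = {
        0: 'FAVOR',
        1: 'NONE',
        2: 'AGAINST'
    }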
    prompts = []
    for d in data:
        d_ = {
            "input_text": d["input_text"],
            "target": d["target"],
            "predicted_label": transform_dict[d["predicted_label"]],
            "true_label": transform_dict[d["true_label"]]
        }
        prompts.append(prompt.format(
            task_name=task,
            task_description=description,
            spurious_num=spurious_num,
            generate_num=generate_num,
            data=d_,
        ))
    return prompts

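def get_batch_response(model: str, prompts: list, batch=30, temperature=0, max_token=8192):
    # Only Llama models are supported, e.g. meta.llama3-1-8b-instruct-v1:0
    if "llama" not in model:
        raise ValueError(f"Unsupported model: {model}")
    response = []
    for i in tqdm.tqdm(range(math.ceil(len(prompts) / batch))):
        temp_prompt = prompts[i * batch:(i + 1) * batch]
        try:
            r = get_model_batch_response(prompts=temp_prompt, model=model, temperature=temperature, max_new_tokens=max_token)
            response = response + r
            # NOTE: this stops after the first successful batch, so at most
            # `batch` prompts are answered. Remove the break to process all batches.
            break
        except Exception as e:
            # A failed batch is skipped, so responses may not align with prompts.
            print(e)
    return response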

def get_raw_response():
    prompt = prompt_t
    model = "meta.llama3-1-70b-instruct-v1:0"
    task = 'task1'
    description = "description1"
    generate_num = 10
    spurious_num = 3
    with open('../results/baseline/checkpoint-216/log_dev_wrong_test_data.json',"r") as f:
        data = json.load(f)
    prompts = format_prompt(data,prompt,task,description,spurious_num,generate_num)
    print(prompts[0])
    response = get_batch_response(model,prompts,batch = 1)
    return response

r = get_raw_response()

print(r[0])





    I am training a model using RoBERTa + MLP on a task named task1. The task involves description1. 
    Your task is to identify potential spurious patterns that the model might have learned based on its responses.

    I will present you with an instance where the model provided incorrect responses. 

    Please provide 3 assumptions of spurious patterns that may have caused the incorrect response. 
    Each assumption should be followed by 10 verification data points to determine whether the model consistently makes mistakes 
    due to such a spurious pattern. The verification data should align with the identified spurious patterns. 
    Having the same pattern does not mean copying the original text and target verbatim; instead, 
    it should reflect the same pattern at a higher level of abstraction and  "text" and "target" in the generated should be diverse with various contents and different speaking way, and include spurious patterns.

    A spurious pattern refers to a misl

  0%|          | 0/529 [01:27<?, ?it/s]



Based on the provided incorrect instance, I've identified three potential spurious patterns that the model might have learned. For each pattern, I've generated 10 verification data points to determine whether the model consistently makes mistakes due to such a spurious pattern.

**Spurious Pattern 1: Misunderstanding of phrases indicating convenience**

Assumption: The model may have learned to associate phrases indicating convenience (e.g., "I use libraries a lot", "I download audible books frequently") with a positive sentiment towards the target entity (library), even if the context suggests otherwise.

<Spurious_1>
  <verification_1>
    <text>As a busy professional, I rely heavily on online shopping. I often browse through Amazon's website and read reviews from other customers before making a purchase. However, I rarely visit physical stores, including my local bookstore. In fact, I think bookstores are a waste of space, and we should focus on digital platforms instead.</text>
 




In [5]:
with open('../results/baseline/checkpoint-216/log_dev_wrong_test_data.json',"r") as f:
    data = json.load(f)
print(data[0])
print(r[0])

{'index': 4, 'true_label': 2, 'predicted_label': 0, 'input_text': "Look, there are issues to be debateed. Do we need as much brick and mortaror public shelf space as we have? I use libraries a lot. I download audible books frequently, but I don't use my local northern NY library system to do it, I use the NY City Library. If I do want a book I locate it on Amazon (since its presentation of the book: description, professional and reader reviews is much better than the libraries) then I order it (on-line) from my local library system. When I get the e-mail notice that the book is ready I finally pay a visit to my library to pick the book up. Using this process I wonder why my library has publically displayed bookshelves at all.", 'target': 'library'}
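
To read the record above, the integer labels can be mapped back to names with the same id-to-name mapping that `format_prompt` uses. A minimal sketch, where the helper name `decode_labels` is hypothetical and the record is reduced to its label fields:

```python
# Same id-to-name mapping as transform_dict in format_prompt.
ID_TO_LABEL = {0: "FAVOR", 1: "NONE", 2: "AGAINST"}


def decode_labels(record):
    """Return (true_label, predicted_label) as label names (hypothetical helper)."""
    return ID_TO_LABEL[record["true_label"]], ID_TO_LABEL[record["predicted_label"]]


# Label fields copied from the record printed above; input_text is omitted.
record = {"index": 4, "true_label": 2, "predicted_label": 0, "target": "library"}
print(decode_labels(record))
```

For this record the true stance towards `library` is AGAINST, while the model predicted FAVOR.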


Based on the provided instance, I've identified three potential spurious patterns that the model might have learned, along with 10 verification data points for each pattern. Please find the output below:


<Spurious_1>
<assumption>The mode
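
The prompt asks for `<Spurious_i>` blocks that contain `<verification_i>` entries with `<text>`, `<target>` and `<ground_truth>` fields. Model output is not guaranteed to be well-formed XML (it can be cut off mid-tag, as above), so a regex-based extraction is one tolerant way to collect the complete entries. The following is a minimal sketch under that assumption, not the notebook's own parser; `parse_verifications`, `_field` and the sample string are hypothetical.

```python
import re

# One <Spurious_i> section runs until the next opening <Spurious_j> tag or the end of the string.
SPURIOUS_RE = re.compile(r"<Spurious_(\d+)>(.*?)(?=<Spurious_\d+>|\Z)", re.DOTALL)
# One complete <verification_i>...</verification_i> block.
VERIFICATION_RE = re.compile(r"<verification_(\d+)>(.*?)</verification_\1>", re.DOTALL)


def _field(block, tag):
    """Return the stripped content of <tag>...</tag> in block, or None if absent."""
    m = re.search(rf"<{tag}>(.*?)</{tag}>", block, re.DOTALL)
    return m.group(1).strip() if m else None


def parse_verifications(response):
    """Collect complete verification entries; truncated ones are skipped."""
    rows = []
    for s in SPURIOUS_RE.finditer(response):
        for v in VERIFICATION_RE.finditer(s.group(2)):
            body = v.group(2)
            rows.append({
                "spurious_id": int(s.group(1)),
                "verification_id": int(v.group(1)),
                "text": _field(body, "text"),
                "target": _field(body, "target"),
                "ground_truth": _field(body, "ground_truth"),
            })
    return rows


# Hypothetical response fragment; the second entry is cut off on purpose.
sample = (
    "<Spurious_1>\n"
    "<assumption>Hypothetical assumption.</assumption>\n"
    "<verification_1>\n"
    "<text>Some passage.</text>\n"
    "<target>bookstore</target>\n"
    "<ground_truth>AGAINST</ground_truth>\n"
    "</verification_1>\n"
    "<verification_2>\n"
    "<text>A passage that is cut o"
)
rows = parse_verifications(sample)
print(rows)
```

Only entries with a closing `</verification_i>` tag are returned, so a response that stops mid-entry yields the complete entries before the cut and nothing for the partial one.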