# Prompt Validation Experiment - Result Analysis

To build a proper data model, we first need to understand the types of context we aim to explore.  
To do so, four types of context are analyzed here:
- Gold Standard
- No Context
- Random Context from the Same Page
- Selected Context from the Same Page

These contexts were evaluated on five questions, which are shown below.

In [1]:
import seaborn as sns
import polars as pl
import json
import os

## Showcasing Questions

These are the questions used in this experiment, presented here with their expected answers.

In [3]:
print(os.getcwd())

/home/caio.rhoden


In [8]:
GOLDEN_PATH = "work/dmcr-experiment-natural-questions/data/nq_open_gold/processed/train.feather"
train = pl.read_ipc(GOLDEN_PATH)

selected_ids = [
    4393532674001821363,
    -4144729966148354479,
    1317425899790858647,
    824576888464737344,
    -1245842872065838644
]


questions = train.filter(pl.col("example_id").is_in(selected_ids))

In [11]:
questions.select("question", "answers").to_numpy()

array([['why was there so much interest in cuba both before and after the civil war',
        array(['sugar markets'], dtype=object)],
       ['who developed the first periodic table with 8 columns',
        array(['Dmitri Mendeleev'], dtype=object)],
       ['when did the who perform at the super bowl',
        array(['Feb 7 , 2010'], dtype=object)],
       ['the organization of the formal elements in an art work',
        array(['composition'], dtype=object)],
       ['where did the french king live before versailles',
        array(['the Tuileries Palace'], dtype=object)]], dtype=object)

## Compiling Results


In [16]:
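def get_results(path):
    """Load the JSON result files in `path` whose names contain a selected example id."""

    selected_ids = [
        4393532674001821363,
        -4144729966148354479,
        1317425899790858647,
        824576888464737344,
        -1245842872065838644
    ]

    results = []
    # os.listdir returns entries in arbitrary order, so the results
    # do not necessarily follow the order of selected_ids.
    for filename in os.listdir(path):
        file_path = os.path.join(path, filename)
        if not os.path.isfile(file_path):
            continue
        # Keep only files whose name contains one of the selected example ids.
        if any(str(example_id) in filename for example_id in selected_ids):
            with open(file_path, "r") as f:
                results.append(json.load(f))

    return results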
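
Since `os.listdir` makes no ordering guarantee, the same question can appear at a different position under each prompt, which makes side-by-side comparison harder. A minimal sketch of one alternative is to key each loaded result by the example id found in its filename. The function name `get_results_by_id` and the `result_<id>.json` filenames in the demo are hypothetical; the only assumption carried over from the notebook is that each filename contains the example id as a string.

```python
import json
import os
import tempfile


def get_results_by_id(path, example_ids):
    """Map each example id to the parsed JSON file whose name contains it."""
    results = {}
    # sorted() makes the traversal reproducible; os.listdir order is arbitrary.
    for filename in sorted(os.listdir(path)):
        file_path = os.path.join(path, filename)
        if not os.path.isfile(file_path):
            continue
        for example_id in example_ids:
            if str(example_id) in filename:
                with open(file_path, "r") as f:
                    results[example_id] = json.load(f)
                break
    return results


# Demo on throwaway files (toy ids and scores, not the experiment's data).
with tempfile.TemporaryDirectory() as tmp:
    for example_id, score in [(111, 1.0), (-222, 0.0)]:
        with open(os.path.join(tmp, f"result_{example_id}.json"), "w") as f:
            json.dump({"score": score}, f)
    demo = get_results_by_id(tmp, [111, -222])
```

With results keyed this way, the entry for a given question can be looked up by id under every prompt, regardless of the order in which the files were read.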
    


In [17]:
results_path = "work/dmcr-experiment-natural-questions/prompt_validation_experiment/results"
possible_prompts = [
    "gold_context_baseline",
    "no_context_baseline",
    "same_page_baseline",
    "same_page_reduced_baseline"
]


results = {prompt: get_results(os.path.join(results_path, prompt)) for prompt in possible_prompts}
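
To compare the context types at a glance, the per-question scores can be averaged per prompt. The following is a minimal sketch, assuming `results` has the shape built above (a dict mapping each prompt name to a list of dicts with a numeric `score` field). The helper name `mean_score_per_prompt` and the toy input are illustrative only.

```python
from statistics import mean


def mean_score_per_prompt(results):
    """Return {prompt_name: mean score} for a dict shaped like `results`."""
    return {
        prompt: mean(entry["score"] for entry in entries)
        for prompt, entries in results.items()
        if entries  # skip prompts for which no result file was loaded
    }


# Toy input for illustration only, NOT the experiment's data.
toy_results = {
    "prompt_a": [{"score": 1.0}, {"score": 0.0}],
    "prompt_b": [{"score": 0.5}, {"score": 0.5}],
}
toy_means = mean_score_per_prompt(toy_results)
print(toy_means)
```

With only five questions per prompt, such averages are best read as a rough indication rather than a reliable ranking.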

In [18]:
results

{'gold_context_baseline': [{'true_answers': ['sugar markets'],
   'pred_answer': '"due to the control',
   'score': 0.0,
   'time': '0.036693'},
  {'true_answers': ['the tuileries palace'],
   'pred_answer': 'the tuileries palace',
   'score': 1.0,
   'time': '0.036608'},
  {'true_answers': ['dmitri mendeleev'],
   'pred_answer': 'the answer is: dmit',
   'score': 0.0,
   'time': '0.038123'},
  {'true_answers': ['feb 7 , 2010'],
   'pred_answer': 'feb 7,',
   'score': 0.8,
   'time': '0.039375'},
  {'true_answers': ['composition'],
   'pred_answer': 'organization of the elements of',
   'score': 0.0,
   'time': '0.036646'}],
 'no_context_baseline': [{'true_answers': ['dmitri mendeleev'],
   'pred_answer': 'dmitri mendeleev',
   'score': 1.0,
   'time': '0.039899'},
  {'true_answers': ['composition'],
   'pred_answer': 'composition',
   'score': 1.0,
   'time': '0.036886'},
  {'true_answers': ['feb 7 , 2010'],
   'pred_answer': 'the who performed at super',
   'score': 0.0,
   'time': '