# Notebook Overview

This notebook provides a faster testing interface.

It includes steps to:
 - Check GPU availability
 - Set up the environment
 - Define helper functions
 - Run tests with a given **id** and **context_type**

The results are saved for further analysis.

*Make sure to select the created `venv` as your kernel.*

If there are any errors while running the tests, restart the notebook and run it again.

## Check GPU Availability
Run this cell to check how much memory is free on the GPU.

If the **free space** is not close to the **total space**, someone else is probably using the machine.

You can check the active processes by running `nvidia-smi` in the terminal.

In [1]:
import torch

def get_gpu_memory_torch():
    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            free_mem = torch.cuda.mem_get_info(i)[0] / 1024**3
            total_mem = torch.cuda.get_device_properties(i).total_memory / 1024**3
            print(f"GPU {i}: {free_mem:.4f} GiB free / {total_mem:.4f} GiB total")
    else:
        print("No CUDA-compatible GPU detected.")

get_gpu_memory_torch()

GPU 0: 23.2003 GiB free / 23.5746 GiB total
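The "close to the total" check can be made explicit. The following is a minimal sketch and not part of the original notebook: the helper name `gpu_looks_idle` and the 90% threshold are assumptions chosen for illustration.

```python
def gpu_looks_idle(free_gib: float, total_gib: float, min_free_fraction: float = 0.9) -> bool:
    """Return True when at least `min_free_fraction` of the GPU memory is free.

    The 0.9 default is an arbitrary illustrative threshold, not a project setting.
    """
    if total_gib <= 0:
        raise ValueError("total_gib must be positive")
    return free_gib / total_gib >= min_free_fraction


# Values taken from the sample output above (about 98% free).
print(gpu_looks_idle(23.2003, 23.5746))
```

A lower threshold may be appropriate if a small amount of memory is always reserved on your machine.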


## Setup
The code below imports necessary modules, defines directory paths, and sets the Hugging Face cache path.

**No need to modify this.**

In [2]:
import os
import sys
import json
from pathlib import Path

PROJECT_DIR = Path.cwd()
HAYSTACK_DIR = PROJECT_DIR / "haystack"
RELEVANT_DIR = HAYSTACK_DIR / "relevant"
IRRELAVANT_DIR = HAYSTACK_DIR / "irrelevant"
MISLEADING_IN_RELEVANT_DIR = HAYSTACK_DIR / "misleading_in_relevant"
MISLEADING_IN_IRRELEVANT_DIR = HAYSTACK_DIR / "misleading_in_irrelevant"

sys.path.append(str(PROJECT_DIR))

with open(HAYSTACK_DIR / "needles.json", "r") as f:
    NEEDLES_DATA = json.load(f)

# This sets the Hugging Face cache path. Make sure this directory exists. If not, refer to the README.
os.environ['HF_HOME'] = '.cache/hf_with_quota'
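Since the cache directory has to exist, it can help to fail early with a clear message instead of failing later during a model download. This is a minimal sketch under stated assumptions: `set_hf_home` is a hypothetical helper, not part of the project, and resolving the path to an absolute one is a design choice of this sketch rather than something the notebook does.

```python
import os
from pathlib import Path


def set_hf_home(cache_dir: str) -> Path:
    """Point HF_HOME at `cache_dir`, raising if the directory does not exist."""
    resolved = Path(cache_dir).expanduser().resolve()
    if not resolved.is_dir():
        raise FileNotFoundError(
            f"Hugging Face cache directory not found: {resolved}. "
            "See the README for how to create it."
        )
    # An absolute path keeps working if the working directory changes later.
    os.environ["HF_HOME"] = str(resolved)
    return resolved


# Usage in this notebook would look like this (left commented out so the
# sketch does not raise on machines where the directory is missing):
# set_hf_home(".cache/hf_with_quota")
```

The call should run before `transformers` is imported, as the Hugging Face libraries typically read this variable when they are first loaded.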

## Helper Functions

In [3]:
from transformers import AutoTokenizer

def load_text(path):
    with open(path, 'r', encoding='utf-8') as f:
        return f.read()

def insert_strings_at_sentence_breaks(text, insert_strings, context_length, tokenizer):
    # Step 1: Tokenize and truncate
    tokens = tokenizer.encode(text, add_special_tokens=False)[:context_length]
    
    # Step 2: Detect sentence breaks by decoding each token
    sentence_break_indices = [
        i for i, tok in enumerate(tokens)
        if tokenizer.decode([tok]).strip().endswith((".", "!", "?"))
    ]

    if len(sentence_break_indices) < len(insert_strings):
        raise ValueError(f"Only {len(sentence_break_indices)} sentence breaks found, "
                         f"but {len(insert_strings)} insertions requested.")

    # Step 3: Compute evenly spaced sentence break positions
    step = len(sentence_break_indices) // (len(insert_strings) + 1)
    insert_positions = [sentence_break_indices[(i + 1) * step] + 1 for i in range(len(insert_strings))]

    # Step 4: Insert insert_strings as tokens
    for pos, insert_str in reversed(list(zip(insert_positions, insert_strings))):
        insert_tokens = tokenizer.encode(insert_str, add_special_tokens=False)
        tokens[pos:pos] = insert_tokens

    return tokenizer.decode(tokens)


def insert_misleading_statements(filepath, insert_strings, context_length, output_path):
    """
    Insert misleading statements into the text at sentence breaks.
    Evenly distribute the misleading statements across the text within the context length.
    """
    tokenizer = AutoTokenizer.from_pretrained("yaofu/llama-2-7b-80k")

    full_text = load_text(filepath)
    modified_text = insert_strings_at_sentence_breaks(full_text, insert_strings, context_length, tokenizer)

    with open(output_path, 'w', encoding='utf-8') as f:
        f.write(modified_text)

    print(f"✅ Output saved to {output_path}")
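The position arithmetic in `insert_strings_at_sentence_breaks` is easier to follow on a toy input. The sketch below reuses the same formula but replaces the real tokenizer with a whitespace split (one word per token), which is an assumption made purely for illustration; `evenly_spaced_insert_positions` is a hypothetical name.

```python
def evenly_spaced_insert_positions(break_indices, n_inserts):
    """Pick `n_inserts` token positions, each just after an evenly spaced sentence break."""
    if len(break_indices) < n_inserts:
        raise ValueError("not enough sentence breaks")
    step = len(break_indices) // (n_inserts + 1)
    return [break_indices[(i + 1) * step] + 1 for i in range(n_inserts)]


# Toy "tokens": one word per token, so a break is any word ending in ., ! or ?
tokens = "A b c. D e f. G h i. J k l. M n o. P q r.".split()
breaks = [i for i, tok in enumerate(tokens) if tok.endswith((".", "!", "?"))]
positions = evenly_spaced_insert_positions(breaks, 2)

# Insert back to front so that earlier positions stay valid.
for pos, text in reversed(list(zip(positions, ["[X]", "[Y]"]))):
    tokens[pos:pos] = [text]

print(breaks)            # [2, 5, 8, 11, 14, 17]
print(positions)         # [9, 15]
print(" ".join(tokens))  # A b c. D e f. G h i. [X] J k l. M n o. [Y] P q r.
```

One edge case is worth knowing about: when the number of sentence breaks equals the number of insertions, `step` is 0 and every statement lands right after the first break, which the length check does not catch.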




In [4]:
from argparse import Namespace

def get_haystack_file(item, context_type, with_misleading, context_length):
    """
    Get the haystack file for the given item and context type.
    Arguments:
        item (dict): The item to process.
        context_type (str): The type of context ("relevant" or "irrelevant").
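        with_misleading (bool): Whether to include misleading information.
        context_length (int): Maximum number of tokens kept from the original file when inserting misleading statements.
    Returns:
        Path: The path to the haystack file.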
    """

    if context_type == "relevant":
        original_dir = RELEVANT_DIR
        original_file = item["context_relevant"]  # Original file without misleading info
        misleading_dir = MISLEADING_IN_RELEVANT_DIR  # Output dir for the misleading file
    elif context_type == "irrelevant":
        original_dir = IRRELAVANT_DIR
        original_file = item["context_irrelevant"]
        misleading_dir = MISLEADING_IN_IRRELEVANT_DIR
    else:
        raise ValueError(f"Unknown context_type: {context_type}")

    if with_misleading:
        haystack_file = misleading_dir / original_file
        insert_misleading_statements(
            filepath=original_dir / original_file,           # Path to the original context file
            insert_strings=item["statements_misleading"],    # List of misleading statements
            context_length=context_length,                   # Max context length for insertion
            output_path=haystack_file                        # Output path for the processed file
        )
    else:
        haystack_file = original_dir / original_file

    return haystack_file

def get_args(id, context_type, with_misleading, data=NEEDLES_DATA):
    """
    Get the arguments for the given id and context type.
    Arguments:
        id (int): The id of the item.
        context_type (str): The type of context ("relevant" or "irrelevant").
        with_misleading (bool): Whether to include misleading information.
        data (list): The list of data items.
    Returns:
        Namespace: The arguments for the given id and context type.
    """

    context_length_max = 5000
    
    # find the item with the given id
    item = next(item for item in data if item["id"] == id)

    # get the haystack file for the given item and context type
    haystack_file = get_haystack_file(item, context_type, with_misleading, context_length_max)

    args = Namespace(
        model_name = "yaofu/llama-2-7b-80k",
        model_name_suffix = ( f"id_{item['id']}_{context_type}" if not with_misleading 
                             else f"id_{item['id']}_{context_type}_misleading"), # suffix used to name the results files,
        model_provider = "LLaMA",

        context_lengths_min = 0, # min context length
        context_lengths_max = context_length_max, # max context length
        context_lengths_num_intervals = 5, # number of intervals for context lengths

        document_depth_percent_intervals = 5, # number of intervals for document depth

        needle = item["needle_refined"],
        real_needle = item["real_needle_refined"],
        retrieval_question = item["question_refined"],
        haystack_file = haystack_file,
    )

    return args
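The interval settings in `get_args` determine how many length/depth combinations each test covers. The sketch below assumes the tester spaces both ranges linearly with both endpoints included; that is consistent with the `Depth: 25%` lines in the log output but is not confirmed by this notebook. `linear_grid` is a hypothetical helper.

```python
def linear_grid(start: float, stop: float, num: int) -> list:
    """Return `num` evenly spaced integers from start to stop, endpoints included."""
    if num < 2:
        return [round(start)]
    width = (stop - start) / (num - 1)
    return [round(start + i * width) for i in range(num)]


context_lengths = linear_grid(0, 5000, 5)  # context_lengths_min / max / num_intervals
depth_percents = linear_grid(0, 100, 5)    # document_depth_percent_intervals

print(context_lengths)  # [0, 1250, 2500, 3750, 5000]
print(depth_percents)   # [0, 25, 50, 75, 100]
print(len(context_lengths) * len(depth_percents), "length/depth combinations per test")
```

Under that assumption, each call to `run_test` evaluates 25 combinations.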

In [5]:
import gc
import torch
from retrieval_head_detection import LLMNeedleHaystackTester as RetrievalHeads

def cleanup(tester: RetrievalHeads):
    del tester.model_to_test
    del tester.testing_results
    del tester.head_counter
    del tester
    
    gc.collect()
    torch.cuda.empty_cache()
    torch.cuda.ipc_collect()

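def run_test(args):
    tester = None  # stays None if the constructor raises
    try:
        tester = RetrievalHeads(**vars(args))
        tester.start_test()
    finally:
        # Without this guard, a failed constructor would raise UnboundLocalError
        # here and hide the original exception.
        if tester is not None:
            cleanup(tester)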

In [6]:
import sys
print("Python executable path:")
print(sys.executable)

print("\nPython environment site-packages:")
print(sys.path)


Python executable path:
/cs/student/projects1/2021/jiajfang/SNLP_Project/venv/bin/python

Python environment site-packages:
['/usr/lib64/python39.zip', '/usr/lib64/python3.9', '/usr/lib64/python3.9/lib-dynload', '', '/cs/student/projects1/2021/jiajfang/SNLP_Project/venv/lib64/python3.9/site-packages', '/cs/student/projects1/2021/jiajfang/SNLP_Project/venv/lib/python3.9/site-packages', '/cs/student/projects1/2021/jiajfang/SNLP_Project', '/tmp/tmpajuruwil', './faiss_attn/']


## Running the Tests

### Set Up the Test Object
Pass the **id** and the **context_type** to the `get_args` function.

If `with_misleading` is `True`, a copy of the specified `context_type` file with the misleading statements inserted is written to the matching `misleading_in_*` directory and used as the haystack.

### Start the Test

Running the test will save:

- **Contexts** → Given to the model per context length/depth  
  → `contexts/{model_name}_id_{id}_{context_type}`

- **Results** → Model outputs per context length/depth  
  → `results/graph/{model_name}_id_{id}_{context_type}` (with `_misleading` appended when `with_misleading` is `True`)

**Example**:  
`results/graph/llama-2-7b-80k_id_44_relevant/`
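To locate the output of a particular run, the directory name can be rebuilt from the same inputs passed to `get_args`. This sketch follows the pattern visible in the `Writing at results/graph/...` lines of the log output and the `model_name_suffix` built in `get_args`. The helper `results_dir` is hypothetical, and taking the part of the model name after the `/` is inferred from the log, not from the tester's source.

```python
def results_dir(model_name: str, item_id: int, context_type: str, with_misleading: bool) -> str:
    """Build the results directory name following the pattern seen in the log output."""
    short_name = model_name.split("/")[-1]  # "yaofu/llama-2-7b-80k" -> "llama-2-7b-80k"
    suffix = f"id_{item_id}_{context_type}"
    if with_misleading:
        suffix += "_misleading"
    return f"results/graph/{short_name}_{suffix}"


print(results_dir("yaofu/llama-2-7b-80k", 1, "relevant", False))
print(results_dir("yaofu/llama-2-7b-80k", 121, "irrelevant", True))
```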

Each test should take at most 5 minutes. A test sometimes runs on the CPU instead of the GPU; if it is taking longer than that, restart the notebook and run it again.

In [7]:
ids = []
for file_name in os.listdir('./haystack/irrelevant'):
    if file_name.endswith('.txt'):
        file_id = os.path.splitext(file_name)[0]  # Extract the file name without extension
        ids.append(file_id)
ids.sort()
id_vals = [int(i) for i in ids[0:14]]
print(id_vals)

[1, 11, 121, 122, 123, 124, 14, 15, 155, 16, 160, 163, 164, 167]
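The list above is in string order because `ids.sort()` compares the file names as text, so `ids[0:14]` selects the first 14 names alphabetically and not the 14 smallest ids. If numeric order is wanted, a minimal sketch (with a hypothetical helper name and made-up file stems) is:

```python
def sort_ids_numerically(file_stems):
    """Convert file stems such as "121" to ints and sort them by value."""
    return sorted(int(stem) for stem in file_stems)


file_stems = ["1", "11", "121", "14", "2"]  # hypothetical example stems

print(sorted(file_stems))                # string order:  ['1', '11', '121', '14', '2']
print(sort_ids_numerically(file_stems))  # numeric order: [1, 2, 11, 14, 121]
```

Either order works for the test loop; the difference only matters when a slice of the list is used to pick a subset.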


In [8]:
for id in id_vals:
    for context_type in ["relevant", "irrelevant"]:
        for with_misleading in [False, True]:
            args = get_args(id, context_type, with_misleading)
            run_test(args)

loading from yaofu/llama-2-7b-80k
layer number: 32, head number 32


The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use `attn_implementation="flash_attention_2"` instead.
LlamaForCausalLM has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
  - If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
  - If you are not the owner of the model architecture class, please contact the model code owner to update it.
Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.10it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: On 1 January 2024, the Republic of Artsakh was formally dissolved.



insertion at 0
On 1 January 2024, the Republic of Artsakh was formally dissolved.
[['16-19'], ['6-9'], ['7-4'], ['11-15'], ['17-22'], ['12-26'], ['8-26'], ['21-30'], ['18-30'], ['6-30'], ['14-18'], ['24-29'], ['0-11'], ['11-2'], ['15-14'], ['19-10'], ['19-15'], ['22-22'], ['6-11'], ['7-13']]
-- Test Summary -- 
Duration: 1.0 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: On 1 January 2024, the Republic of Artsakh was formally dissolved.

Writing at results/graph/llama-2-7b-80k_id_1_relevant/llama-2-7b-80k_id_1_relevant_len_0_depth_0_results.json
insertion at 0
On 1 January 2024, the Republic of Artsakh was formally dissolved.
[['16-19'], ['6-9'], ['7-4'], ['11-15'], ['17-22'], ['12-26'], ['8-26'], ['21-30'], ['18-30'], ['6-30'], ['14-18'], ['24

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.17it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: On 1 January 2024, the Republic of Artsakh was formally dissolved.



insertion at 0
On 1 January 2024, the Republic of Artsakh was formally dissolved.
[['16-19'], ['6-9'], ['7-4'], ['11-15'], ['17-22'], ['12-26'], ['8-26'], ['21-30'], ['18-30'], ['6-30'], ['14-18'], ['24-29'], ['0-11'], ['11-2'], ['15-14'], ['19-10'], ['19-15'], ['22-22'], ['6-11'], ['7-13']]
-- Test Summary -- 
Duration: 0.9 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: On 1 January 2024, the Republic of Artsakh was formally dissolved.

Writing at results/graph/llama-2-7b-80k_id_1_relevant_misleading/llama-2-7b-80k_id_1_relevant_misleading_len_0_depth_0_results.json
insertion at 0
On 1 January 2024, the Republic of Artsakh was formally dissolved.
[['16-19'], ['6-9'], ['7-4'], ['11-15'], ['17-22'], ['12-26'], ['8-26'], ['21-30'], ['18-30'], ['6

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.13it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: On 1 January 2024, the Republic of Artsakh was formally dissolved.



insertion at 0
On 1 January 2024, the Republic of Artsakh was formally dissolved.
[['16-19'], ['6-9'], ['7-4'], ['11-15'], ['17-22'], ['12-26'], ['8-26'], ['21-30'], ['18-30'], ['6-30'], ['14-18'], ['24-29'], ['0-11'], ['11-2'], ['15-14'], ['19-10'], ['19-15'], ['22-22'], ['6-11'], ['7-13']]
-- Test Summary -- 
Duration: 0.9 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: On 1 January 2024, the Republic of Artsakh was formally dissolved.

Writing at results/graph/llama-2-7b-80k_id_1_irrelevant/llama-2-7b-80k_id_1_irrelevant_len_0_depth_0_results.json
insertion at 0
On 1 January 2024, the Republic of Artsakh was formally dissolved.
[['16-19'], ['6-9'], ['7-4'], ['11-15'], ['17-22'], ['12-26'], ['8-26'], ['21-30'], ['18-30'], ['6-30'], ['14-18'], 

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.19it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: On 1 January 2024, the Republic of Artsakh was formally dissolved.



insertion at 0
On 1 January 2024, the Republic of Artsakh was formally dissolved.
[['16-19'], ['6-9'], ['7-4'], ['11-15'], ['17-22'], ['12-26'], ['8-26'], ['21-30'], ['18-30'], ['6-30'], ['14-18'], ['24-29'], ['0-11'], ['11-2'], ['15-14'], ['19-10'], ['19-15'], ['22-22'], ['6-11'], ['7-13']]
-- Test Summary -- 
Duration: 0.9 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: On 1 January 2024, the Republic of Artsakh was formally dissolved.

Writing at results/graph/llama-2-7b-80k_id_1_irrelevant_misleading/llama-2-7b-80k_id_1_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
On 1 January 2024, the Republic of Artsakh was formally dissolved.
[['16-19'], ['6-9'], ['7-4'], ['11-15'], ['17-22'], ['12-26'], ['8-26'], ['21-30'], ['18-30'],

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.13it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: India won all their matches, and were the first team to win a T20 World Cup undefeated.



insertion at 0
India won all their matches, and were the first team to win a T20 World Cup undefeated.
[['11-15'], ['16-19'], ['8-26'], ['6-9'], ['7-4'], ['17-22'], ['11-2'], ['19-15'], ['18-30'], ['24-29'], ['14-15'], ['15-14'], ['6-30'], ['14-18'], ['19-14'], ['20-28'], ['20-30'], ['21-30'], ['12-2'], ['14-7']]
-- Test Summary -- 
Duration: 1.4 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: India won all their matches, and were the first team to win a T20 World Cup undefeated.

Writing at results/graph/llama-2-7b-80k_id_11_relevant/llama-2-7b-80k_id_11_relevant_len_0_depth_0_results.json
insertion at 0
India won all their matches, and were the first team to win a T20 World Cup undefeated.
[['11-15'], ['16-19'], ['8-26'], 

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.11it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: India won all their matches, and were the first team to win a T20 World Cup undefeated.



insertion at 0
India won all their matches, and were the first team to win a T20 World Cup undefeated.
[['11-15'], ['16-19'], ['8-26'], ['6-9'], ['7-4'], ['17-22'], ['11-2'], ['19-15'], ['18-30'], ['24-29'], ['14-15'], ['15-14'], ['6-30'], ['14-18'], ['19-14'], ['20-28'], ['20-30'], ['21-30'], ['12-2'], ['14-7']]
-- Test Summary -- 
Duration: 1.4 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: India won all their matches, and were the first team to win a T20 World Cup undefeated.

Writing at results/graph/llama-2-7b-80k_id_11_relevant_misleading/llama-2-7b-80k_id_11_relevant_misleading_len_0_depth_0_results.json
insertion at 0
India won all their matches, and were the first team to win a T20 World Cup undefeated.
[['11-15'],

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.13it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: India won all their matches, and were the first team to win a T20 World Cup undefeated.



insertion at 0
India won all their matches, and were the first team to win a T20 World Cup undefeated.
[['11-15'], ['16-19'], ['8-26'], ['6-9'], ['7-4'], ['17-22'], ['11-2'], ['19-15'], ['18-30'], ['24-29'], ['14-15'], ['15-14'], ['6-30'], ['14-18'], ['19-14'], ['20-28'], ['20-30'], ['21-30'], ['12-2'], ['14-7']]
-- Test Summary -- 
Duration: 1.4 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: India won all their matches, and were the first team to win a T20 World Cup undefeated.

Writing at results/graph/llama-2-7b-80k_id_11_irrelevant/llama-2-7b-80k_id_11_irrelevant_len_0_depth_0_results.json
insertion at 0
India won all their matches, and were the first team to win a T20 World Cup undefeated.
[['11-15'], ['16-19'], ['8-26

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.17it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: India won all their matches, and were the first team to win a T20 World Cup undefeated.



insertion at 0
India won all their matches, and were the first team to win a T20 World Cup undefeated.
[['11-15'], ['16-19'], ['8-26'], ['6-9'], ['7-4'], ['17-22'], ['11-2'], ['19-15'], ['18-30'], ['24-29'], ['14-15'], ['15-14'], ['6-30'], ['14-18'], ['19-14'], ['20-28'], ['20-30'], ['21-30'], ['12-2'], ['14-7']]
-- Test Summary -- 
Duration: 1.4 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: India won all their matches, and were the first team to win a T20 World Cup undefeated.

Writing at results/graph/llama-2-7b-80k_id_11_irrelevant_misleading/llama-2-7b-80k_id_11_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
India won all their matches, and were the first team to win a T20 World Cup undefeated.
[['11-1

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.12it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: President Donald Trump hasn't visited any countries during his second presidency.



insertion at 0
President Donald Trump hasn't visited any countries during his second presidency.
[['7-4'], ['6-9'], ['21-30'], ['11-15'], ['19-15'], ['21-27'], ['24-3'], ['24-29'], ['8-26'], ['13-11'], ['13-30'], ['14-18'], ['15-14'], ['15-31'], ['19-12'], ['19-19'], ['20-11'], ['21-31'], ['22-22'], ['23-0']]
-- Test Summary -- 
Duration: 0.9 seconds
Context: 0 tokens
Depth: 0%
Score: 70.4126238822937
Response: The most recent country that President Donald Trump visited during his second presidency was the United States.

Writing at results/graph/llama-2-7b-80k_id_121_relevant/llama-2-7b-80k_id_121_relevant_len_0_depth_0_results.json
insertion at 0
President Donald Trump hasn't visited any countries during his second presidency.
[['7-4'], ['

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.11it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: President Donald Trump hasn't visited any countries during his second presidency.



insertion at 0
President Donald Trump hasn't visited any countries during his second presidency.
[['7-4'], ['6-9'], ['21-30'], ['11-15'], ['19-15'], ['21-27'], ['24-3'], ['24-29'], ['8-26'], ['13-11'], ['13-30'], ['14-18'], ['15-14'], ['15-31'], ['19-12'], ['19-19'], ['20-11'], ['21-31'], ['22-22'], ['23-0']]
-- Test Summary -- 
Duration: 0.9 seconds
Context: 0 tokens
Depth: 0%
Score: 70.4126238822937
Response: The most recent country that President Donald Trump visited during his second presidency was the United States.

Writing at results/graph/llama-2-7b-80k_id_121_relevant_misleading/llama-2-7b-80k_id_121_relevant_misleading_len_0_depth_0_results.json
insertion at 0
President Donald Trump hasn't visited any countries during his second pr

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.17it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: President Donald Trump hasn't visited any countries during his second presidency.



insertion at 0
President Donald Trump hasn't visited any countries during his second presidency.
[['7-4'], ['6-9'], ['21-30'], ['11-15'], ['19-15'], ['21-27'], ['24-3'], ['24-29'], ['8-26'], ['13-11'], ['13-30'], ['14-18'], ['15-14'], ['15-31'], ['19-12'], ['19-19'], ['20-11'], ['21-31'], ['22-22'], ['23-0']]
-- Test Summary -- 
Duration: 0.9 seconds
Context: 0 tokens
Depth: 0%
Score: 70.4126238822937
Response: The most recent country that President Donald Trump visited during his second presidency was the United States.

Writing at results/graph/llama-2-7b-80k_id_121_irrelevant/llama-2-7b-80k_id_121_irrelevant_len_0_depth_0_results.json
insertion at 0
President Donald Trump hasn't visited any countries during his second presidency.
[['7-4']

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.13it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: President Donald Trump hasn't visited any countries during his second presidency.



insertion at 0
President Donald Trump hasn't visited any countries during his second presidency.
[['7-4'], ['6-9'], ['21-30'], ['11-15'], ['19-15'], ['21-27'], ['24-3'], ['24-29'], ['8-26'], ['13-11'], ['13-30'], ['14-18'], ['15-14'], ['15-31'], ['19-12'], ['19-19'], ['20-11'], ['21-31'], ['22-22'], ['23-0']]
-- Test Summary -- 
Duration: 0.9 seconds
Context: 0 tokens
Depth: 0%
Score: 70.4126238822937
Response: The most recent country that President Donald Trump visited during his second presidency was the United States.

Writing at results/graph/llama-2-7b-80k_id_121_irrelevant_misleading/llama-2-7b-80k_id_121_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
President Donald Trump hasn't visited any countries during his secon

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  1.95it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The winner of The Voice US this year has not been announced yet, as this season is still ongoing.



insertion at 0
The winner of The Voice US this year has not been announced yet, as this season is still ongoing.
[['11-15'], ['7-4'], ['19-15'], ['6-9'], ['12-26'], ['16-19'], ['12-2'], ['17-22'], ['21-30'], ['11-2'], ['16-1'], ['17-0'], ['24-29'], ['26-26'], ['26-27'], ['29-19'], ['31-24'], ['0-9'], ['8-26'], ['13-11']]
-- Test Summary -- 
Duration: 0.9 seconds
Context: 0 tokens
Depth: 0%
Score: 68.23204159736633
Response: The winner of The Voice US this year is Jake Hoot.

Writing at results/graph/llama-2-7b-80k_id_122_relevant/llama-2-7b-80k_id_122_relevant_len_0_depth_0_results.json
insertion at 0
The winner of The Voice US this year has not been announced yet, as this season is still ongoing.
[['11-15'], ['7-4'], ['19-15

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.14it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The winner of The Voice US this year has not been announced yet, as this season is still ongoing.



insertion at 0
The winner of The Voice US this year has not been announced yet, as this season is still ongoing.
[['11-15'], ['7-4'], ['19-15'], ['6-9'], ['12-26'], ['16-19'], ['12-2'], ['17-22'], ['21-30'], ['11-2'], ['16-1'], ['17-0'], ['24-29'], ['26-26'], ['26-27'], ['29-19'], ['31-24'], ['0-9'], ['8-26'], ['13-11']]
-- Test Summary -- 
Duration: 0.9 seconds
Context: 0 tokens
Depth: 0%
Score: 68.23204159736633
Response: The winner of The Voice US this year is Jake Hoot.

Writing at results/graph/llama-2-7b-80k_id_122_relevant_misleading/llama-2-7b-80k_id_122_relevant_misleading_len_0_depth_0_results.json
insertion at 0
The winner of The Voice US this year has not been announced yet, as this season is still ongoing.
[['11-

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.13it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The winner of The Voice US this year has not been announced yet, as this season is still ongoing.



insertion at 0
The winner of The Voice US this year has not been announced yet, as this season is still ongoing.
[['11-15'], ['7-4'], ['19-15'], ['6-9'], ['12-26'], ['16-19'], ['12-2'], ['17-22'], ['21-30'], ['11-2'], ['16-1'], ['17-0'], ['24-29'], ['26-26'], ['26-27'], ['29-19'], ['31-24'], ['0-9'], ['8-26'], ['13-11']]
-- Test Summary -- 
Duration: 0.9 seconds
Context: 0 tokens
Depth: 0%
Score: 68.23204159736633
Response: The winner of The Voice US this year is Jake Hoot.

Writing at results/graph/llama-2-7b-80k_id_122_irrelevant/llama-2-7b-80k_id_122_irrelevant_len_0_depth_0_results.json
insertion at 0
The winner of The Voice US this year has not been announced yet, as this season is still ongoing.
[['11-15'], ['7-4'], ['1

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.11it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The winner of The Voice US this year has not been announced yet, as this season is still ongoing.



insertion at 0
The winner of The Voice US this year has not been announced yet, as this season is still ongoing.
[['11-15'], ['7-4'], ['19-15'], ['6-9'], ['12-26'], ['16-19'], ['12-2'], ['17-22'], ['21-30'], ['11-2'], ['16-1'], ['17-0'], ['24-29'], ['26-26'], ['26-27'], ['29-19'], ['31-24'], ['0-9'], ['8-26'], ['13-11']]
-- Test Summary -- 
Duration: 0.9 seconds
Context: 0 tokens
Depth: 0%
Score: 68.23204159736633
Response: The winner of The Voice US this year is Jake Hoot.

Writing at results/graph/llama-2-7b-80k_id_122_irrelevant_misleading/llama-2-7b-80k_id_122_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
The winner of The Voice US this year has not been announced yet, as this season is still ongoing.
[[

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.13it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Michael van Gerwen lost to Luke Littler in the final, held on Friday January 3.



insertion at 0
Michael van Gerwen lost to Luke Littler in the final, held on Friday January 3.
-- Test Summary -- 
Duration: 0.4 seconds
Context: 0 tokens
Depth: 0%
Score: 48.097485303878784
Response: Luke Littler.

Writing at results/graph/llama-2-7b-80k_id_123_relevant/llama-2-7b-80k_id_123_relevant_len_0_depth_0_results.json
insertion at 0
Michael van Gerwen lost to Luke Littler in the final, held on Friday January 3.
-- Test Summary -- 
Duration: 0.4 seconds
Context: 0 tokens
Depth: 25%
Score: 48.097485303878784
Response: Luke Littler.

Writing at results/graph/llama-2-7b-80k_id_123_relevant/llama-2-7b-80k_id_123_relevant_len_0_depth_2500_results.json
insertion at 0
Michael van Gerwen lost to Luke Littler in the final, held on Friday Janua

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.05it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Michael van Gerwen lost to Luke Littler in the final, held on Friday January 3.



insertion at 0
Michael van Gerwen lost to Luke Littler in the final, held on Friday January 3.
-- Test Summary -- 
Duration: 0.4 seconds
Context: 0 tokens
Depth: 0%
Score: 48.097485303878784
Response: Luke Littler.

Writing at results/graph/llama-2-7b-80k_id_123_relevant_misleading/llama-2-7b-80k_id_123_relevant_misleading_len_0_depth_0_results.json
insertion at 0
Michael van Gerwen lost to Luke Littler in the final, held on Friday January 3.
-- Test Summary -- 
Duration: 0.4 seconds
Context: 0 tokens
Depth: 25%
Score: 48.097485303878784
Response: Luke Littler.

Writing at results/graph/llama-2-7b-80k_id_123_relevant_misleading/llama-2-7b-80k_id_123_relevant_misleading_len_0_depth_2500_results.json
insertion at 0
Michael van Gerwen lost to Luk

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.12it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Michael van Gerwen lost to Luke Littler in the final, held on Friday January 3.



insertion at 0
Michael van Gerwen lost to Luke Littler in the final, held on Friday January 3.
-- Test Summary -- 
Duration: 0.4 seconds
Context: 0 tokens
Depth: 0%
Score: 48.097485303878784
Response: Luke Littler.

Writing at results/graph/llama-2-7b-80k_id_123_irrelevant/llama-2-7b-80k_id_123_irrelevant_len_0_depth_0_results.json
insertion at 0
Michael van Gerwen lost to Luke Littler in the final, held on Friday January 3.
-- Test Summary -- 
Duration: 0.4 seconds
Context: 0 tokens
Depth: 25%
Score: 48.097485303878784
Response: Luke Littler.

Writing at results/graph/llama-2-7b-80k_id_123_irrelevant/llama-2-7b-80k_id_123_irrelevant_len_0_depth_2500_results.json
insertion at 0
Michael van Gerwen lost to Luke Littler in the final, held on Frid

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  1.74it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Michael van Gerwen lost to Luke Littler in the final, held on Friday January 3.



insertion at 0
Michael van Gerwen lost to Luke Littler in the final, held on Friday January 3.
-- Test Summary -- 
Duration: 0.4 seconds
Context: 0 tokens
Depth: 0%
Score: 48.097485303878784
Response: Luke Littler.

Writing at results/graph/llama-2-7b-80k_id_123_irrelevant_misleading/llama-2-7b-80k_id_123_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
Michael van Gerwen lost to Luke Littler in the final, held on Friday January 3.
-- Test Summary -- 
Duration: 0.4 seconds
Context: 0 tokens
Depth: 25%
Score: 48.097485303878784
Response: Luke Littler.

Writing at results/graph/llama-2-7b-80k_id_123_irrelevant_misleading/llama-2-7b-80k_id_123_irrelevant_misleading_len_0_depth_2500_results.json
insertion at 0
Michael van Gerwen los

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.15it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: This year's American Idol is still ongoing, and the final results have not been announced yet.



insertion at 0
This year's American Idol is still ongoing, and the final results have not been announced yet.
-- Test Summary -- 
Duration: 0.9 seconds
Context: 0 tokens
Depth: 0%
Score: 49.664050340652466
Response: The winner of American Idol this year was Carrie Underwood.

Writing at results/graph/llama-2-7b-80k_id_124_relevant/llama-2-7b-80k_id_124_relevant_len_0_depth_0_results.json
insertion at 0
This year's American Idol is still ongoing, and the final results have not been announced yet.
-- Test Summary -- 
Duration: 0.9 seconds
Context: 0 tokens
Depth: 25%
Score: 49.664050340652466
Response: The winner of American Idol this year was Carrie Underwood.

Writing at results/graph/llama-2-7b-80k_id_124_relevant/llama-2-7b-80

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.13it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: This year's American Idol is still ongoing, and the final results have not been announced yet.



insertion at 0
This year's American Idol is still ongoing, and the final results have not been announced yet.
-- Test Summary -- 
Duration: 0.9 seconds
Context: 0 tokens
Depth: 0%
Score: 49.664050340652466
Response: The winner of American Idol this year was Carrie Underwood.

Writing at results/graph/llama-2-7b-80k_id_124_relevant_misleading/llama-2-7b-80k_id_124_relevant_misleading_len_0_depth_0_results.json
insertion at 0
This year's American Idol is still ongoing, and the final results have not been announced yet.
-- Test Summary -- 
Duration: 0.9 seconds
Context: 0 tokens
Depth: 25%
Score: 49.664050340652466
Response: The winner of American Idol this year was Carrie Underwood.

Writing at results/graph/llama-2-7b-80k_id_124_

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.13it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: This year's American Idol is still ongoing, and the final results have not been announced yet.



insertion at 0
This year's American Idol is still ongoing, and the final results have not been announced yet.
-- Test Summary -- 
Duration: 0.9 seconds
Context: 0 tokens
Depth: 0%
Score: 49.664050340652466
Response: The winner of American Idol this year was Carrie Underwood.

Writing at results/graph/llama-2-7b-80k_id_124_irrelevant/llama-2-7b-80k_id_124_irrelevant_len_0_depth_0_results.json
insertion at 0
This year's American Idol is still ongoing, and the final results have not been announced yet.
-- Test Summary -- 
Duration: 0.9 seconds
Context: 0 tokens
Depth: 25%
Score: 49.664050340652466
Response: The winner of American Idol this year was Carrie Underwood.

Writing at results/graph/llama-2-7b-80k_id_124_irrelevant/llama-2

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.11it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: This year's American Idol is still ongoing, and the final results have not been announced yet.



insertion at 0
This year's American Idol is still ongoing, and the final results have not been announced yet.
-- Test Summary -- 
Duration: 0.9 seconds
Context: 0 tokens
Depth: 0%
Score: 49.664050340652466
Response: The winner of American Idol this year was Carrie Underwood.

Writing at results/graph/llama-2-7b-80k_id_124_irrelevant_misleading/llama-2-7b-80k_id_124_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
This year's American Idol is still ongoing, and the final results have not been announced yet.
-- Test Summary -- 
Duration: 0.9 seconds
Context: 0 tokens
Depth: 25%
Score: 49.664050340652466
Response: The winner of American Idol this year was Carrie Underwood.

Writing at results/graph/llama-2-7b-80k_id_

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.11it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: General elections, originally scheduled to be held in 2023, were held in Pakistan on 8 February 2024.



insertion at 0
General elections, originally scheduled to be held in 2023, were held in Pakistan on 8 February 2024.
[['16-19'], ['7-4'], ['12-26'], ['18-30'], ['6-9'], ['11-15'], ['16-24'], ['17-22'], ['8-26'], ['8-31'], ['11-17'], ['12-4'], ['14-18'], ['15-14'], ['17-0'], ['19-10'], ['19-15'], ['0-9'], ['6-30'], ['8-14']]
-- Test Summary -- 
Duration: 0.8 seconds
Context: 0 tokens
Depth: 0%
Score: 84.19649600982666
Response: Pakistan held its postponed general elections on 8 February 2024.

Writing at results/graph/llama-2-7b-80k_id_14_relevant/llama-2-7b-80k_id_14_relevant_len_0_depth_0_results.json
insertion at 0
General elections, originally scheduled to be held in 2023, were held in Pakistan on 8 February 2024.
[['1

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.13it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: General elections, originally scheduled to be held in 2023, were held in Pakistan on 8 February 2024.



insertion at 0
General elections, originally scheduled to be held in 2023, were held in Pakistan on 8 February 2024.
[['16-19'], ['7-4'], ['12-26'], ['18-30'], ['6-9'], ['11-15'], ['16-24'], ['17-22'], ['8-26'], ['8-31'], ['11-17'], ['12-4'], ['14-18'], ['15-14'], ['17-0'], ['19-10'], ['19-15'], ['0-9'], ['6-30'], ['8-14']]
-- Test Summary -- 
Duration: 0.8 seconds
Context: 0 tokens
Depth: 0%
Score: 84.19649600982666
Response: Pakistan held its postponed general elections on 8 February 2024.

Writing at results/graph/llama-2-7b-80k_id_14_relevant_misleading/llama-2-7b-80k_id_14_relevant_misleading_len_0_depth_0_results.json
insertion at 0
General elections, originally scheduled to be held in 2023, were held in Pakistan on

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.13it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: General elections, originally scheduled to be held in 2023, were held in Pakistan on 8 February 2024.



insertion at 0
General elections, originally scheduled to be held in 2023, were held in Pakistan on 8 February 2024.
[['16-19'], ['7-4'], ['12-26'], ['18-30'], ['6-9'], ['11-15'], ['16-24'], ['17-22'], ['8-26'], ['8-31'], ['11-17'], ['12-4'], ['14-18'], ['15-14'], ['17-0'], ['19-10'], ['19-15'], ['0-9'], ['6-30'], ['8-14']]
-- Test Summary -- 
Duration: 0.8 seconds
Context: 0 tokens
Depth: 0%
Score: 84.19649600982666
Response: Pakistan held its postponed general elections on 8 February 2024.

Writing at results/graph/llama-2-7b-80k_id_14_irrelevant/llama-2-7b-80k_id_14_irrelevant_len_0_depth_0_results.json
insertion at 0
General elections, originally scheduled to be held in 2023, were held in Pakistan on 8 February 2024.


Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.13it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: General elections, originally scheduled to be held in 2023, were held in Pakistan on 8 February 2024.



insertion at 0
General elections, originally scheduled to be held in 2023, were held in Pakistan on 8 February 2024.
[['16-19'], ['7-4'], ['12-26'], ['18-30'], ['6-9'], ['11-15'], ['16-24'], ['17-22'], ['8-26'], ['8-31'], ['11-17'], ['12-4'], ['14-18'], ['15-14'], ['17-0'], ['19-10'], ['19-15'], ['0-9'], ['6-30'], ['8-14']]
-- Test Summary -- 
Duration: 0.8 seconds
Context: 0 tokens
Depth: 0%
Score: 84.19649600982666
Response: Pakistan held its postponed general elections on 8 February 2024.

Writing at results/graph/llama-2-7b-80k_id_14_irrelevant_misleading/llama-2-7b-80k_id_14_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
General elections, originally scheduled to be held in 2023, were held in Pakista

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.11it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Prabowo won Indonesia's 2024 presidential election.



insertion at 0
Prabowo won Indonesia's 2024 presidential election.
[['6-9'], ['11-15'], ['24-29'], ['7-4'], ['16-19'], ['19-15'], ['21-30'], ['12-26'], ['6-30'], ['12-2'], ['13-11'], ['24-3'], ['24-11'], ['8-26'], ['17-22'], ['22-8'], ['22-27'], ['26-26'], ['26-28'], ['29-26']]
-- Test Summary -- 
Duration: 0.9 seconds
Context: 0 tokens
Depth: 0%
Score: 92.97641515731812
Response: Prabowo Subianto won Indonesia's 2024 presidential election.

Writing at results/graph/llama-2-7b-80k_id_15_relevant/llama-2-7b-80k_id_15_relevant_len_0_depth_0_results.json
insertion at 0
Prabowo won Indonesia's 2024 presidential election.
[['6-9'], ['11-15'], ['24-29'], ['7-4'], ['16-19'], ['19-15'], ['21-30'], ['12-26'], ['6-30'], ['12-2'], ['13-11'], ['24-3'], ['24-11'], ['8-26'], ['17-22']

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.11it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Prabowo won Indonesia's 2024 presidential election.



insertion at 0
Prabowo won Indonesia's 2024 presidential election.
[['6-9'], ['11-15'], ['24-29'], ['7-4'], ['16-19'], ['19-15'], ['21-30'], ['12-26'], ['6-30'], ['12-2'], ['13-11'], ['24-3'], ['24-11'], ['8-26'], ['17-22'], ['22-8'], ['22-27'], ['26-26'], ['26-28'], ['29-26']]
-- Test Summary -- 
Duration: 0.9 seconds
Context: 0 tokens
Depth: 0%
Score: 92.97641515731812
Response: Prabowo Subianto won Indonesia's 2024 presidential election.

Writing at results/graph/llama-2-7b-80k_id_15_relevant_misleading/llama-2-7b-80k_id_15_relevant_misleading_len_0_depth_0_results.json
insertion at 0
Prabowo won Indonesia's 2024 presidential election.
[['6-9'], ['11-15'], ['24-29'], ['7-4'], ['16-19'], ['19-15'], ['21-30'], ['12-26'], ['6-30'], ['12-2'], ['13-11'], ['24-3'], ['24-11'

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.12it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Prabowo won Indonesia's 2024 presidential election.



insertion at 0
Prabowo won Indonesia's 2024 presidential election.
[['6-9'], ['11-15'], ['24-29'], ['7-4'], ['16-19'], ['19-15'], ['21-30'], ['12-26'], ['6-30'], ['12-2'], ['13-11'], ['24-3'], ['24-11'], ['8-26'], ['17-22'], ['22-8'], ['22-27'], ['26-26'], ['26-28'], ['29-26']]
-- Test Summary -- 
Duration: 0.9 seconds
Context: 0 tokens
Depth: 0%
Score: 92.97641515731812
Response: Prabowo Subianto won Indonesia's 2024 presidential election.

Writing at results/graph/llama-2-7b-80k_id_15_irrelevant/llama-2-7b-80k_id_15_irrelevant_len_0_depth_0_results.json
insertion at 0
Prabowo won Indonesia's 2024 presidential election.
[['6-9'], ['11-15'], ['24-29'], ['7-4'], ['16-19'], ['19-15'], ['21-30'], ['12-26'], ['6-30'], ['12-2'], ['13-11'], ['24-3'], ['24-11'], ['8-26'], ['17-

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.12it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Prabowo won Indonesia's 2024 presidential election.



insertion at 0
Prabowo won Indonesia's 2024 presidential election.
[['6-9'], ['11-15'], ['24-29'], ['7-4'], ['16-19'], ['19-15'], ['21-30'], ['12-26'], ['6-30'], ['12-2'], ['13-11'], ['24-3'], ['24-11'], ['8-26'], ['17-22'], ['22-8'], ['22-27'], ['26-26'], ['26-28'], ['29-26']]
-- Test Summary -- 
Duration: 0.9 seconds
Context: 0 tokens
Depth: 0%
Score: 92.97641515731812
Response: Prabowo Subianto won Indonesia's 2024 presidential election.

Writing at results/graph/llama-2-7b-80k_id_15_irrelevant_misleading/llama-2-7b-80k_id_15_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
Prabowo won Indonesia's 2024 presidential election.
[['6-9'], ['11-15'], ['24-29'], ['7-4'], ['16-19'], ['19-15'], ['21-30'], ['12-26'], ['6-30'], ['12-2'], ['13-11'], ['24-3'], ['24

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.11it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The most recent player to win both the PDC World Youth Championship and the PDC World Darts Championship is Luke Humphries, previously known as Luke Littler during his youth career.



insertion at 0
The most recent player to win both the PDC World Youth Championship and the PDC World Darts Championship is Luke Humphries, previously known as Luke Littler during his youth career.
[['16-19'], ['8-26'], ['11-15'], ['6-9'], ['7-4'], ['19-15'], ['17-22'], ['21-30'], ['14-15'], ['6-30'], ['11-2'], ['15-14'], ['24-29'], ['14-18'], ['17-0'], ['18-30'], ['19-19'], ['26-28'], ['31-16'], ['10-18']]
-- Test Summary -- 
Duration: 1.1 seconds
Context: 0 tokens
Depth: 0%
Score: 60.90353727340698
Response: Luke Humphries, previously known as Luke Littler during his youth career.

Writing at results/graph/llama-2-7b-80k_id_155_relevant/llama

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.13it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The most recent player to win both the PDC World Youth Championship and the PDC World Darts Championship is Luke Humphries, previously known as Luke Littler during his youth career.



insertion at 0
The most recent player to win both the PDC World Youth Championship and the PDC World Darts Championship is Luke Humphries, previously known as Luke Littler during his youth career.
[['16-19'], ['8-26'], ['11-15'], ['6-9'], ['7-4'], ['19-15'], ['17-22'], ['21-30'], ['14-15'], ['6-30'], ['11-2'], ['15-14'], ['24-29'], ['14-18'], ['17-0'], ['18-30'], ['19-19'], ['26-28'], ['31-16'], ['10-18']]
-- Test Summary -- 
Duration: 1.1 seconds
Context: 0 tokens
Depth: 0%
Score: 60.90353727340698
Response: Luke Humphries, previously known as Luke Littler during his youth career.

Writing at results/graph/llama-2-7b-80k_id_155_relevant_misle

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.17it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The most recent player to win both the PDC World Youth Championship and the PDC World Darts Championship is Luke Humphries, previously known as Luke Littler during his youth career.



insertion at 0
The most recent player to win both the PDC World Youth Championship and the PDC World Darts Championship is Luke Humphries, previously known as Luke Littler during his youth career.
[['16-19'], ['8-26'], ['11-15'], ['6-9'], ['7-4'], ['19-15'], ['17-22'], ['21-30'], ['14-15'], ['6-30'], ['11-2'], ['15-14'], ['24-29'], ['14-18'], ['17-0'], ['18-30'], ['19-19'], ['26-28'], ['31-16'], ['10-18']]
-- Test Summary -- 
Duration: 1.1 seconds
Context: 0 tokens
Depth: 0%
Score: 60.90353727340698
Response: Luke Humphries, previously known as Luke Littler during his youth career.

Writing at results/graph/llama-2-7b-80k_id_155_irrelevant/lla

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.15it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The most recent player to win both the PDC World Youth Championship and the PDC World Darts Championship is Luke Humphries, previously known as Luke Littler during his youth career.



insertion at 0
The most recent player to win both the PDC World Youth Championship and the PDC World Darts Championship is Luke Humphries, previously known as Luke Littler during his youth career.
[['16-19'], ['8-26'], ['11-15'], ['6-9'], ['7-4'], ['19-15'], ['17-22'], ['21-30'], ['14-15'], ['6-30'], ['11-2'], ['15-14'], ['24-29'], ['14-18'], ['17-0'], ['18-30'], ['19-19'], ['26-28'], ['31-16'], ['10-18']]
-- Test Summary -- 
Duration: 1.1 seconds
Context: 0 tokens
Depth: 0%
Score: 60.90353727340698
Response: Luke Humphries, previously known as Luke Littler during his youth career.

Writing at results/graph/llama-2-7b-80k_id_155_irrelevant_mis

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.12it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Sheinbaum won the presidential election in 2024, becoming the first woman to be elected president of Mexico.



insertion at 0
Sheinbaum won the presidential election in 2024, becoming the first woman to be elected president of Mexico.
[['16-19'], ['11-15'], ['7-4'], ['6-9'], ['8-26'], ['17-22'], ['19-15'], ['11-2'], ['15-14'], ['21-30'], ['18-30'], ['14-15'], ['24-29'], ['29-19'], ['6-30'], ['12-26'], ['24-30'], ['14-18'], ['30-14'], ['12-2']]
-- Test Summary -- 
Duration: 1.6 seconds
Context: 0 tokens
Depth: 0%
Score: 99.99998807907104
Response: Sheinbaum won the presidential election in 2024, becoming the first woman to be elected president of Mexico.

Writing at results/graph/llama-2-7b-80k_id_16_relevant/llama-2-7b-80k_id_16_relevant_len_0_depth_0_results.json
insertion at 0
Sheinbaum won the presidential election in 20

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.12it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Sheinbaum won the presidential election in 2024, becoming the first woman to be elected president of Mexico.



insertion at 0
Sheinbaum won the presidential election in 2024, becoming the first woman to be elected president of Mexico.
[['16-19'], ['11-15'], ['7-4'], ['6-9'], ['8-26'], ['17-22'], ['19-15'], ['11-2'], ['15-14'], ['21-30'], ['18-30'], ['14-15'], ['24-29'], ['29-19'], ['6-30'], ['12-26'], ['24-30'], ['14-18'], ['30-14'], ['12-2']]
-- Test Summary -- 
Duration: 1.6 seconds
Context: 0 tokens
Depth: 0%
Score: 99.99998807907104
Response: Sheinbaum won the presidential election in 2024, becoming the first woman to be elected president of Mexico.

Writing at results/graph/llama-2-7b-80k_id_16_relevant_misleading/llama-2-7b-80k_id_16_relevant_misleading_len_0_depth_0_results.json
insertion at 0
Sheinbaum won the presi

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.10it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Sheinbaum won the presidential election in 2024, becoming the first woman to be elected president of Mexico.



insertion at 0
Sheinbaum won the presidential election in 2024, becoming the first woman to be elected president of Mexico.
[['16-19'], ['11-15'], ['7-4'], ['6-9'], ['8-26'], ['17-22'], ['19-15'], ['11-2'], ['15-14'], ['21-30'], ['18-30'], ['14-15'], ['24-29'], ['29-19'], ['6-30'], ['12-26'], ['24-30'], ['14-18'], ['30-14'], ['12-2']]
-- Test Summary -- 
Duration: 1.6 seconds
Context: 0 tokens
Depth: 0%
Score: 99.99998807907104
Response: Sheinbaum won the presidential election in 2024, becoming the first woman to be elected president of Mexico.

Writing at results/graph/llama-2-7b-80k_id_16_irrelevant/llama-2-7b-80k_id_16_irrelevant_len_0_depth_0_results.json
insertion at 0
Sheinbaum won the presidential election i

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.10it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Sheinbaum won the presidential election in 2024, becoming the first woman to be elected president of Mexico.



insertion at 0
Sheinbaum won the presidential election in 2024, becoming the first woman to be elected president of Mexico.
[['16-19'], ['11-15'], ['7-4'], ['6-9'], ['8-26'], ['17-22'], ['19-15'], ['11-2'], ['15-14'], ['21-30'], ['18-30'], ['14-15'], ['24-29'], ['29-19'], ['6-30'], ['12-26'], ['24-30'], ['14-18'], ['30-14'], ['12-2']]
-- Test Summary -- 
Duration: 1.6 seconds
Context: 0 tokens
Depth: 0%
Score: 99.99998807907104
Response: Sheinbaum won the presidential election in 2024, becoming the first woman to be elected president of Mexico.

Writing at results/graph/llama-2-7b-80k_id_16_irrelevant_misleading/llama-2-7b-80k_id_16_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
Sheinbaum won the p

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.14it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The President of the United States is Donald Trump.



insertion at 0
The President of the United States is Donald Trump.
[['14-18'], ['16-19'], ['16-24'], ['18-30'], ['19-15'], ['21-30'], ['24-3'], ['31-16'], ['0-1'], ['0-19'], ['0-27'], ['1-0'], ['1-1'], ['1-3'], ['1-4'], ['1-5'], ['1-11'], ['1-13'], ['2-2'], ['2-4']]
-- Test Summary -- 
Duration: 0.2 seconds
Context: 0 tokens
Depth: 0%
Score: 64.88415002822876
Response: Donald Trump.

Writing at results/graph/llama-2-7b-80k_id_160_relevant/llama-2-7b-80k_id_160_relevant_len_0_depth_0_results.json
insertion at 0
The President of the United States is Donald Trump.
[['14-18'], ['16-19'], ['16-24'], ['18-30'], ['19-15'], ['21-30'], ['24-3'], ['31-16'], ['0-1'], ['0-19'], ['0-27'], ['1-0'], ['1-1'], ['1-3'], ['1-4'], ['1-5'], ['1-11'], ['1-13'], ['2-2'], ['2-4']]
-- Test Summa

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.13it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The President of the United States is Donald Trump.



insertion at 0
The President of the United States is Donald Trump.
[['14-18'], ['16-19'], ['16-24'], ['18-30'], ['19-15'], ['21-30'], ['24-3'], ['31-16'], ['0-1'], ['0-19'], ['0-27'], ['1-0'], ['1-1'], ['1-3'], ['1-4'], ['1-5'], ['1-11'], ['1-13'], ['2-2'], ['2-4']]
-- Test Summary -- 
Duration: 0.2 seconds
Context: 0 tokens
Depth: 0%
Score: 64.88415002822876
Response: Donald Trump.

Writing at results/graph/llama-2-7b-80k_id_160_relevant_misleading/llama-2-7b-80k_id_160_relevant_misleading_len_0_depth_0_results.json
insertion at 0
The President of the United States is Donald Trump.
[['14-18'], ['16-19'], ['16-24'], ['18-30'], ['19-15'], ['21-30'], ['24-3'], ['31-16'], ['0-1'], ['0-19'], ['0-27'], ['1-0'], ['1-1'], ['1-3'], ['1-4'], ['1-5'], ['1-11'], ['1-13'], ['2-2'], 

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.12it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The President of the United States is Donald Trump.



insertion at 0
The President of the United States is Donald Trump.
[['14-18'], ['16-19'], ['16-24'], ['18-30'], ['19-15'], ['21-30'], ['24-3'], ['31-16'], ['0-1'], ['0-19'], ['0-27'], ['1-0'], ['1-1'], ['1-3'], ['1-4'], ['1-5'], ['1-11'], ['1-13'], ['2-2'], ['2-4']]
-- Test Summary -- 
Duration: 0.2 seconds
Context: 0 tokens
Depth: 0%
Score: 64.88415002822876
Response: Donald Trump.

Writing at results/graph/llama-2-7b-80k_id_160_irrelevant/llama-2-7b-80k_id_160_irrelevant_len_0_depth_0_results.json
insertion at 0
The President of the United States is Donald Trump.
[['14-18'], ['16-19'], ['16-24'], ['18-30'], ['19-15'], ['21-30'], ['24-3'], ['31-16'], ['0-1'], ['0-19'], ['0-27'], ['1-0'], ['1-1'], ['1-3'], ['1-4'], ['1-5'], ['1-11'], ['1-13'], ['2-2'], ['2-4']]
-- Test S

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.12it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The President of the United States is Donald Trump.



insertion at 0
The President of the United States is Donald Trump.
[['14-18'], ['16-19'], ['16-24'], ['18-30'], ['19-15'], ['21-30'], ['24-3'], ['31-16'], ['0-1'], ['0-19'], ['0-27'], ['1-0'], ['1-1'], ['1-3'], ['1-4'], ['1-5'], ['1-11'], ['1-13'], ['2-2'], ['2-4']]
-- Test Summary -- 
Duration: 0.2 seconds
Context: 0 tokens
Depth: 0%
Score: 64.88415002822876
Response: Donald Trump.

Writing at results/graph/llama-2-7b-80k_id_160_irrelevant_misleading/llama-2-7b-80k_id_160_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
The President of the United States is Donald Trump.
[['14-18'], ['16-19'], ['16-24'], ['18-30'], ['19-15'], ['21-30'], ['24-3'], ['31-16'], ['0-1'], ['0-19'], ['0-27'], ['1-0'], ['1-1'], ['1-3'], ['1-4'], ['1-5'], ['1-11'], ['1-13'], ['2-2

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.11it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: A total of 11 asteroids have been discovered before they impacted Earth.



insertion at 0
A total of 11 asteroids have been discovered before they impacted Earth.
[['6-9'], ['11-15'], ['7-4'], ['16-19'], ['8-26'], ['12-26'], ['21-30'], ['17-22'], ['19-15'], ['11-2'], ['15-14'], ['17-0'], ['24-29'], ['13-11'], ['14-18'], ['16-24'], ['7-13'], ['8-31'], ['12-16'], ['14-13']]
-- Test Summary -- 
Duration: 0.7 seconds
Context: 0 tokens
Depth: 0%
Score: 97.88687229156494
Response: 11 asteroids have been discovered before they impacted Earth.

Writing at results/graph/llama-2-7b-80k_id_163_relevant/llama-2-7b-80k_id_163_relevant_len_0_depth_0_results.json
insertion at 0
A total of 11 asteroids have been discovered before they impacted Earth.
[['6-9'], ['11-15'], ['7-4'], ['16-19'], ['8-26'], ['12-26'], ['21-30'], ['17-22'], ['19-1

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.13it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: A total of 11 asteroids have been discovered before they impacted Earth.



insertion at 0
A total of 11 asteroids have been discovered before they impacted Earth.
[['6-9'], ['11-15'], ['7-4'], ['16-19'], ['8-26'], ['12-26'], ['21-30'], ['17-22'], ['19-15'], ['11-2'], ['15-14'], ['17-0'], ['24-29'], ['13-11'], ['14-18'], ['16-24'], ['7-13'], ['8-31'], ['12-16'], ['14-13']]
-- Test Summary -- 
Duration: 0.7 seconds
Context: 0 tokens
Depth: 0%
Score: 97.88687229156494
Response: 11 asteroids have been discovered before they impacted Earth.

Writing at results/graph/llama-2-7b-80k_id_163_relevant_misleading/llama-2-7b-80k_id_163_relevant_misleading_len_0_depth_0_results.json
insertion at 0
A total of 11 asteroids have been discovered before they impacted Earth.
[['6-9'], ['11-15'], ['7-4'], ['16-19'], ['8-26'], ['12-26'], ['21-3

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.12it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: A total of 11 asteroids have been discovered before they impacted Earth.



insertion at 0
A total of 11 asteroids have been discovered before they impacted Earth.
[['6-9'], ['11-15'], ['7-4'], ['16-19'], ['8-26'], ['12-26'], ['21-30'], ['17-22'], ['19-15'], ['11-2'], ['15-14'], ['17-0'], ['24-29'], ['13-11'], ['14-18'], ['16-24'], ['7-13'], ['8-31'], ['12-16'], ['14-13']]
-- Test Summary -- 
Duration: 0.7 seconds
Context: 0 tokens
Depth: 0%
Score: 97.88687229156494
Response: 11 asteroids have been discovered before they impacted Earth.

Writing at results/graph/llama-2-7b-80k_id_163_irrelevant/llama-2-7b-80k_id_163_irrelevant_len_0_depth_0_results.json
insertion at 0
A total of 11 asteroids have been discovered before they impacted Earth.
[['6-9'], ['11-15'], ['7-4'], ['16-19'], ['8-26'], ['12-26'], ['21-30'], ['17-22'], ['

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.12it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: A total of 11 asteroids have been discovered before they impacted Earth.



insertion at 0
A total of 11 asteroids have been discovered before they impacted Earth.
[['6-9'], ['11-15'], ['7-4'], ['16-19'], ['8-26'], ['12-26'], ['21-30'], ['17-22'], ['19-15'], ['11-2'], ['15-14'], ['17-0'], ['24-29'], ['13-11'], ['14-18'], ['16-24'], ['7-13'], ['8-31'], ['12-16'], ['14-13']]
-- Test Summary -- 
Duration: 0.7 seconds
Context: 0 tokens
Depth: 0%
Score: 97.88687229156494
Response: 11 asteroids have been discovered before they impacted Earth.

Writing at results/graph/llama-2-7b-80k_id_163_irrelevant_misleading/llama-2-7b-80k_id_163_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
A total of 11 asteroids have been discovered before they impacted Earth.
[['6-9'], ['11-15'], ['7-4'], ['16-19'], ['8-26'], ['12-26'], ['

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.11it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Bulgaria and Romania are the most recent member states of the Schengen Area.



insertion at 0
Bulgaria and Romania are the most recent member states of the Schengen Area.
[['11-15'], ['6-9'], ['7-4'], ['16-19'], ['8-26'], ['17-22'], ['21-30'], ['31-16'], ['11-2'], ['13-11'], ['19-15'], ['21-5'], ['12-26'], ['24-3'], ['24-29'], ['0-11'], ['1-26'], ['6-30'], ['12-16'], ['14-15']]
-- Test Summary -- 
Duration: 0.8 seconds
Context: 0 tokens
Depth: 0%
Score: 100.00002384185791
Response: Bulgaria and Romania are the most recent member states of the Schengen Area.

Writing at results/graph/llama-2-7b-80k_id_164_relevant/llama-2-7b-80k_id_164_relevant_len_0_depth_0_results.json
insertion at 0
Bulgaria and Romania are the most recent member states of the Schengen Area.
[['11-15'], ['6-9'], ['7-4'], ['16-19'], ['8-26'], ['17-22'], ['

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.12it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Bulgaria and Romania are the most recent member states of the Schengen Area.



insertion at 0
Bulgaria and Romania are the most recent member states of the Schengen Area.
[['11-15'], ['6-9'], ['7-4'], ['16-19'], ['8-26'], ['17-22'], ['21-30'], ['31-16'], ['11-2'], ['13-11'], ['19-15'], ['21-5'], ['12-26'], ['24-3'], ['24-29'], ['0-11'], ['1-26'], ['6-30'], ['12-16'], ['14-15']]
-- Test Summary -- 
Duration: 0.8 seconds
Context: 0 tokens
Depth: 0%
Score: 100.00002384185791
Response: Bulgaria and Romania are the most recent member states of the Schengen Area.

Writing at results/graph/llama-2-7b-80k_id_164_relevant_misleading/llama-2-7b-80k_id_164_relevant_misleading_len_0_depth_0_results.json
insertion at 0
Bulgaria and Romania are the most recent member states of the Schengen Area.
[['11-15'], ['6-9'], ['7-4'], ['16-19'], [

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.09it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Bulgaria and Romania are the most recent member states of the Schengen Area.



insertion at 0
Bulgaria and Romania are the most recent member states of the Schengen Area.
[['11-15'], ['6-9'], ['7-4'], ['16-19'], ['8-26'], ['17-22'], ['21-30'], ['31-16'], ['11-2'], ['13-11'], ['19-15'], ['21-5'], ['12-26'], ['24-3'], ['24-29'], ['0-11'], ['1-26'], ['6-30'], ['12-16'], ['14-15']]
-- Test Summary -- 
Duration: 0.8 seconds
Context: 0 tokens
Depth: 0%
Score: 100.00002384185791
Response: Bulgaria and Romania are the most recent member states of the Schengen Area.

Writing at results/graph/llama-2-7b-80k_id_164_irrelevant/llama-2-7b-80k_id_164_irrelevant_len_0_depth_0_results.json
insertion at 0
Bulgaria and Romania are the most recent member states of the Schengen Area.
[['11-15'], ['6-9'], ['7-4'], ['16-19'], ['8-26'], ['17-22']

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.13it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Bulgaria and Romania are the most recent member states of the Schengen Area.



insertion at 0
Bulgaria and Romania are the most recent member states of the Schengen Area.
[['11-15'], ['6-9'], ['7-4'], ['16-19'], ['8-26'], ['17-22'], ['21-30'], ['31-16'], ['11-2'], ['13-11'], ['19-15'], ['21-5'], ['12-26'], ['24-3'], ['24-29'], ['0-11'], ['1-26'], ['6-30'], ['12-16'], ['14-15']]
-- Test Summary -- 
Duration: 0.8 seconds
Context: 0 tokens
Depth: 0%
Score: 100.00002384185791
Response: Bulgaria and Romania are the most recent member states of the Schengen Area.

Writing at results/graph/llama-2-7b-80k_id_164_irrelevant_misleading/llama-2-7b-80k_id_164_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
Bulgaria and Romania are the most recent member states of the Schengen Area.
[['11-15'], ['6-9'], ['7-4'], ['16-19'

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.12it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Keir Starmer



insertion at 0
Keir Starmer
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 0%
Score: 20.884969830513
Response: Jeremy Corbyn.

Writing at results/graph/llama-2-7b-80k_id_167_relevant/llama-2-7b-80k_id_167_relevant_len_0_depth_0_results.json
insertion at 0
Keir Starmer
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 25%
Score: 20.884969830513
Response: Jeremy Corbyn.

Writing at results/graph/llama-2-7b-80k_id_167_relevant/llama-2-7b-80k_id_167_relevant_len_0_depth_2500_results.json
insertion at 0
Keir Starmer
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 50%
Score: 20.884969830513
Response: Jeremy Corbyn.

Writing at results/graph/llama-2-7b-80k_id_167_relevant/llama-2-7b-80k_id_167_relevant_len_0_depth_5000_results.json
insertion at 0
Keir Starmer
-

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.16it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Keir Starmer



insertion at 0
Keir Starmer
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 0%
Score: 20.884969830513
Response: Jeremy Corbyn.

Writing at results/graph/llama-2-7b-80k_id_167_relevant_misleading/llama-2-7b-80k_id_167_relevant_misleading_len_0_depth_0_results.json
insertion at 0
Keir Starmer
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 25%
Score: 20.884969830513
Response: Jeremy Corbyn.

Writing at results/graph/llama-2-7b-80k_id_167_relevant_misleading/llama-2-7b-80k_id_167_relevant_misleading_len_0_depth_2500_results.json
insertion at 0
Keir Starmer
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 50%
Score: 20.884969830513
Response: Jeremy Corbyn.

Writing at results/graph/llama-2-7b-80k_id_167_relevant_misleading/llama-2-7b-80k_id_167_relevant_misl

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.13it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Keir Starmer



insertion at 0
Keir Starmer
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 0%
Score: 20.884969830513
Response: Jeremy Corbyn.

Writing at results/graph/llama-2-7b-80k_id_167_irrelevant/llama-2-7b-80k_id_167_irrelevant_len_0_depth_0_results.json
insertion at 0
Keir Starmer
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 25%
Score: 20.884969830513
Response: Jeremy Corbyn.

Writing at results/graph/llama-2-7b-80k_id_167_irrelevant/llama-2-7b-80k_id_167_irrelevant_len_0_depth_2500_results.json
insertion at 0
Keir Starmer
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 50%
Score: 20.884969830513
Response: Jeremy Corbyn.

Writing at results/graph/llama-2-7b-80k_id_167_irrelevant/llama-2-7b-80k_id_167_irrelevant_len_0_depth_5000_results.json
insertion at 0
Ke

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.12it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Keir Starmer



insertion at 0
Keir Starmer
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 0%
Score: 20.884969830513
Response: Jeremy Corbyn.

Writing at results/graph/llama-2-7b-80k_id_167_irrelevant_misleading/llama-2-7b-80k_id_167_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
Keir Starmer
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 25%
Score: 20.884969830513
Response: Jeremy Corbyn.

Writing at results/graph/llama-2-7b-80k_id_167_irrelevant_misleading/llama-2-7b-80k_id_167_irrelevant_misleading_len_0_depth_2500_results.json
insertion at 0
Keir Starmer
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 50%
Score: 20.884969830513
Response: Jeremy Corbyn.

Writing at results/graph/llama-2-7b-80k_id_167_irrelevant_misleading/llama-2-7b-80k_id_167_irr
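
The `Writing at ...` lines in the output above suggest that each result file name encodes the model, the needle id, the context type, the context length, and the depth. Below is a minimal sketch for decoding such a file name when collecting results for analysis. The pattern is inferred only from those log lines (for example, `Depth: 25%` is written as `depth_2500`, so the depth appears to be stored as the percentage times 100). `RESULT_NAME` and `parse_result_path` are hypothetical names and not part of the project.

```python
import re
from pathlib import Path

# Assumption: file-name layout inferred from the "Writing at ..." log lines,
# not taken from the project's own code.
RESULT_NAME = re.compile(
    r"^(?P<model>.+)_id_(?P<id>\d+)_(?P<context>.+)"
    r"_len_(?P<length>\d+)_depth_(?P<depth>\d+)_results\.json$"
)

def parse_result_path(path):
    """Decode a result file name into its parts, or return None if it does not match."""
    match = RESULT_NAME.match(Path(path).name)
    if match is None:
        return None
    return {
        "model": match["model"],
        "id": int(match["id"]),
        "context": match["context"],
        "context_length": int(match["length"]),
        # Assumption: the file name stores depth as percent * 100 (2500 -> 25%).
        "depth_percent": int(match["depth"]) / 100,
    }

example = parse_result_path(
    "results/graph/llama-2-7b-80k_id_167_relevant_misleading/"
    "llama-2-7b-80k_id_167_relevant_misleading_len_0_depth_2500_results.json"
)
print(example)
```

Parsing the name rather than the JSON body avoids any assumption about the schema of the result files, which is not shown here.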

In [9]:
# Example 1: Relevant context without misleading statements
# args = get_args(id=id_val, context_type="relevant", with_misleading=False)
# run_test(args)

In [10]:
# Example 2: Irrelevant context without misleading statements
# args = get_args(id=id_val, context_type="irrelevant", with_misleading=False)
# run_test(args)

In [11]:
# Example 3: Relevant context with misleading statements
# args = get_args(id=id_val, context_type="relevant", with_misleading=True)
# run_test(args)

In [12]:
# Example 4: Irrelevant context with misleading statements
# args = get_args(id=id_val, context_type="irrelevant", with_misleading=True)
# run_test(args)
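
The four examples above differ only in `context_type` and `with_misleading`, so they could also be generated in a loop. The sketch below only builds the keyword arguments. It assumes `get_args` accepts `id`, `context_type`, and `with_misleading` as in the commented cells. `build_test_configs` is a hypothetical helper, and the id `164` is simply one of the ids that appears in the output above.

```python
from itertools import product

def build_test_configs(id_val):
    """Return keyword arguments for the four context_type / with_misleading combinations."""
    return [
        {"id": id_val, "context_type": context_type, "with_misleading": with_misleading}
        for with_misleading, context_type in product((False, True), ("relevant", "irrelevant"))
    ]

configs = build_test_configs(164)
for config in configs:
    print(config)
    # In the notebook, each configuration would be passed to the helpers defined earlier:
    # run_test(get_args(**config))
```

The configurations come out in the same order as Examples 1 to 4. The `run_test` call is left commented out so the cell can be inspected without loading the model.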