# Notebook Overview

This notebook provides a faster testing interface.

It includes steps to:
 - Check GPU availability,
 - Set up the environment,
 - Define helper functions,
 - Run tests with a given **id** and **context_type**.

The results are saved for further analysis.

*Make sure to select the created `venv` as your kernel.*

If any errors occur while running the tests, restart the notebook and run it again.

## Check GPU Availability
Run this cell to check whether there is free memory on the GPU.

If the **free space** is not close to the **total space**, then someone else is probably using the machine.

You can check the active processes by running `nvidia-smi` in the terminal.

In [1]:
import torch

def get_gpu_memory_torch():
    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            free_mem = torch.cuda.mem_get_info(i)[0] / 1024**3
            total_mem = torch.cuda.get_device_properties(i).total_memory / 1024**3
            print(f"GPU {i}: {free_mem:.4f} GiB free / {total_mem:.4f} GiB total")
    else:
        print("No CUDA-compatible GPU detected.")

get_gpu_memory_torch()

GPU 0: 23.2003 GiB free / 23.5746 GiB total
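The "free close to total" rule of thumb can be made explicit with a small helper. This is only a sketch: the function name `gpu_looks_idle` and the 90% threshold are assumptions chosen for illustration, not project settings.

```python
def gpu_looks_idle(free_gib, total_gib, min_free_fraction=0.9):
    """Return True if at least `min_free_fraction` of the GPU memory is free.

    The 0.9 default is an illustrative assumption, not a project setting.
    """
    if total_gib <= 0:
        raise ValueError("total_gib must be positive")
    return free_gib / total_gib >= min_free_fraction

# Values taken from the sample output above (about 98% free).
print(gpu_looks_idle(23.2003, 23.5746))
```

If the helper returns `False`, checking `nvidia-smi` before starting a run is probably worthwhile.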


## Setup
The code below imports necessary modules, defines directory paths, and sets the Hugging Face cache path.

**No need to modify this.**

In [2]:
import os
import sys
import json
from pathlib import Path

PROJECT_DIR = Path.cwd()
HAYSTACK_DIR = PROJECT_DIR / "haystack"
RELEVANT_DIR = HAYSTACK_DIR / "relevant"
IRRELAVANT_DIR = HAYSTACK_DIR / "irrelevant"
MISLEADING_IN_RELEVANT_DIR = HAYSTACK_DIR / "misleading_in_relevant"
MISLEADING_IN_IRRELEVANT_DIR = HAYSTACK_DIR / "misleading_in_irrelevant"

sys.path.append(str(PROJECT_DIR))

with open(HAYSTACK_DIR / "needles.json", "r") as f:
    NEEDLES_DATA = json.load(f)

# This sets the Hugging Face cache path. Make sure this directory exists. If not, refer to the README.
os.environ['HF_HOME'] = '.cache/hf_with_quota'
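Since the setup assumes that the haystack folders and the cache directory already exist, a quick existence check can catch a wrong working directory early. The helper below is a hypothetical sketch (`missing_paths` is not part of the project); it only reports missing paths and does not create anything.

```python
from pathlib import Path

def missing_paths(paths):
    """Return the subset of `paths` that does not exist on disk."""
    return [Path(p) for p in paths if not Path(p).exists()]

# Hypothetical usage with the names defined in the setup cell:
# missing = missing_paths([RELEVANT_DIR, IRRELAVANT_DIR, os.environ["HF_HOME"]])
# if missing:
#     raise FileNotFoundError(f"Missing paths: {missing}")
```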

## Helper Functions

In [3]:
from transformers import AutoTokenizer

def load_text(path):
    with open(path, 'r', encoding='utf-8') as f:
        return f.read()

def insert_strings_at_sentence_breaks(text, insert_strings, context_length, tokenizer):
    # Step 1: Tokenize and truncate
    tokens = tokenizer.encode(text, add_special_tokens=False)[:context_length]
    
    # Step 2: Detect sentence breaks by decoding each token
    sentence_break_indices = [
        i for i, tok in enumerate(tokens)
        if tokenizer.decode([tok]).strip().endswith((".", "!", "?"))
    ]

    if len(sentence_break_indices) < len(insert_strings):
        raise ValueError(f"Only {len(sentence_break_indices)} sentence breaks found, "
                         f"but {len(insert_strings)} insertions requested.")

    # Step 3: Compute evenly spaced sentence break positions
    step = len(sentence_break_indices) // (len(insert_strings) + 1)
    insert_positions = [sentence_break_indices[(i + 1) * step] + 1 for i in range(len(insert_strings))]

    # Step 4: Insert insert_strings as tokens
    for pos, insert_str in reversed(list(zip(insert_positions, insert_strings))):
        insert_tokens = tokenizer.encode(insert_str, add_special_tokens=False)
        tokens[pos:pos] = insert_tokens

    return tokenizer.decode(tokens)


def insert_misleading_statements(filepath, insert_strings, context_length, output_path):
    """
    Insert misleading statements into the text at sentence breaks.
    Evenly distribute the misleading statements across the text within the context length.
    """
    tokenizer = AutoTokenizer.from_pretrained("yaofu/llama-2-7b-80k")

    full_text = load_text(filepath)
    modified_text = insert_strings_at_sentence_breaks(full_text, insert_strings, context_length, tokenizer)

    with open(output_path, 'w', encoding='utf-8') as f:
        f.write(modified_text)

    print(f"✅ Output saved to {output_path}")
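To see how the insertion positions are chosen, the spacing rule from `insert_strings_at_sentence_breaks` can be run on toy data. The sketch below copies only the position arithmetic; the function name `evenly_spaced_positions` and the break indices are made up for illustration.

```python
def evenly_spaced_positions(break_indices, n_inserts):
    """Mirror the position rule used in insert_strings_at_sentence_breaks."""
    step = len(break_indices) // (n_inserts + 1)
    # Each insertion goes right after the token that ends a sentence.
    return [break_indices[(i + 1) * step] + 1 for i in range(n_inserts)]

# Ten toy sentence breaks at token indices 9, 19, ..., 99.
breaks = list(range(9, 100, 10))
print(evenly_spaced_positions(breaks, 2))  # [40, 70]

# With exactly as many breaks as insertions, step is 0 and all
# positions collapse onto the first break.
print(evenly_spaced_positions([9, 19], 2))  # [10, 10]
```

The second call shows an edge case of the integer division: the length check in the helper passes, but the statements are no longer spread across the text.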




In [4]:
from argparse import Namespace

def get_haystack_file(item, context_type, with_misleading, context_length):
    """
    Get the haystack file for the given item and context type.
    Arguments:
        item (dict): The item to process.
        context_type (str): The type of context ("relevant" or "irrelevant").
        with_misleading (bool): Whether to include misleading information.
        context_length (int): Max number of tokens kept when inserting misleading statements.
    Returns:
        str: The path to the haystack file.
    """

    if context_type == "relevant":
        original_dir = RELEVANT_DIR
        original_file = item["context_relevant"]  # Original file without misleading info
        misleading_dir = MISLEADING_IN_RELEVANT_DIR  # Output dir for the misleading file
    elif context_type == "irrelevant":
        original_dir = IRRELAVANT_DIR
        original_file = item["context_irrelevant"]
        misleading_dir = MISLEADING_IN_IRRELEVANT_DIR
    else:
        raise ValueError(f"Unknown context_type: {context_type}")

    if with_misleading:
        haystack_file = misleading_dir / original_file
        insert_misleading_statements(
            filepath=original_dir / original_file,           # Path to the original context file
            insert_strings=item["statements_misleading"],    # List of misleading statements
            context_length=context_length,                   # Max context length for insertion
            output_path=haystack_file                        # Output path for the processed file
        )
    else:
        haystack_file = original_dir / original_file

    return haystack_file

def get_args(id, context_type, with_misleading, data=NEEDLES_DATA):
    """
    Get the arguments for the given id and context type.
    Arguments:
        id (int): The id of the item.
        context_type (str): The type of context ("relevant" or "irrelevant").
        with_misleading (bool): Whether to include misleading information.
        data (list): The list of data items.
    Returns:
        Namespace: The arguments for the given id and context type.
    """

    context_length_max = 5000
    
    # find the item with the given id
    item = next(item for item in data if item["id"] == id)

    # get the haystack file for the given item and context type
    haystack_file = get_haystack_file(item, context_type, with_misleading, context_length_max)

    args = Namespace(
        model_name = "yaofu/llama-2-7b-80k",
        model_name_suffix = ( f"id_{item['id']}_{context_type}" if not with_misleading 
                             else f"id_{item['id']}_{context_type}_misleading"), # suffix used to name the results files,
        model_provider = "LLaMA",

        context_lengths_min = 0, # min context length
        context_lengths_max = context_length_max, # max context length
        context_lengths_num_intervals = 5, # number of intervals for context lengths

        document_depth_percent_intervals = 5, # number of intervals for document depth

        needle = item["needle_refined"],
        real_needle = item["real_needle_refined"],
        retrieval_question = item["question_refined"],
        haystack_file = haystack_file,
    )

    return args
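The interval settings in `get_args` define a grid of context lengths and document depths. The sketch below assumes the tester spaces both linearly between the minimum and maximum; that spacing is an assumption (the actual values are computed inside `retrieval_head_detection`), and `linear_grid` is a hypothetical helper.

```python
def linear_grid(lo, hi, num):
    """Return `num` evenly spaced integers from `lo` to `hi`, inclusive."""
    if num < 2:
        return [lo]
    return [round(lo + i * (hi - lo) / (num - 1)) for i in range(num)]

# Assumed linear spacing; the real tester may round or space differently.
context_lengths = linear_grid(0, 5000, 5)
depth_percents = linear_grid(0, 100, 5)

print(context_lengths)  # [0, 1250, 2500, 3750, 5000]
print(depth_percents)   # [0, 25, 50, 75, 100]
print(len(context_lengths) * len(depth_percents))  # 25
```

Under this assumption, each call to `run_test` would cover 25 length/depth combinations.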

In [5]:
import gc
import torch
from retrieval_head_detection import LLMNeedleHaystackTester as RetrievalHeads

def cleanup(tester: RetrievalHeads):
    del tester.model_to_test
    del tester.testing_results
    del tester.head_counter
    del tester
    
    gc.collect()
    torch.cuda.empty_cache()
    torch.cuda.ipc_collect()

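def run_test(args):
    tester = None  # stays None if the constructor raises
    try:
        tester = RetrievalHeads(**vars(args))
        tester.start_test()
    finally:
        # Only clean up if the tester was actually created
        if tester is not None:
            cleanup(tester)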

In [6]:
import sys
print("Python executable path:")
print(sys.executable)

print("\nPython environment site-packages:")
print(sys.path)


Python executable path:
/cs/student/projects1/2021/jiajfang/SNLP_Project/venv/bin/python

Python environment site-packages:
['/usr/lib64/python39.zip', '/usr/lib64/python3.9', '/usr/lib64/python3.9/lib-dynload', '', '/cs/student/projects1/2021/jiajfang/SNLP_Project/venv/lib64/python3.9/site-packages', '/cs/student/projects1/2021/jiajfang/SNLP_Project/venv/lib/python3.9/site-packages', '/cs/student/projects1/2021/jiajfang/SNLP_Project', '/tmp/tmpajuruwil', './faiss_attn/']


## Running the Tests

### Set up the test object:
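Pass the **id** and the **context_type** to the `get_args` function.

If `with_misleading` is `True`, misleading statements are inserted into a copy of the specified `context_type` file, which is saved under `haystack/misleading_in_relevant` or `haystack/misleading_in_irrelevant`.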

### Start the Test

Running the test will save:

- **Contexts** → Given to the model per context length/depth  
  → `contexts/{model_name}_id_{id}_{context_type}`

- **Results** → Model outputs per context length/depth  
  → `results/graph/{model_name}_id_{id}_{context_type}`

Runs with `with_misleading=True` append `_misleading` to the directory name.

**Example**:  
`results/graph/llama-2-7b-80k_id_44_relevant/`
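The directory names written during a run follow the `model_name_suffix` built in `get_args`. The sketch below reconstructs that name from the inputs; `results_dir` is a hypothetical helper, and using only the last part of the model name is an assumption inferred from the paths printed in the run logs.

```python
from pathlib import Path

def results_dir(model_name, item_id, context_type, with_misleading):
    """Build the results directory name seen in the run logs."""
    suffix = f"id_{item_id}_{context_type}"
    if with_misleading:
        suffix += "_misleading"
    # Assumption: only the part after the last "/" of the model name is used.
    short_name = model_name.split("/")[-1]
    return Path("results") / "graph" / f"{short_name}_{suffix}"

print(results_dir("yaofu/llama-2-7b-80k", 168, "relevant", True).as_posix())
# results/graph/llama-2-7b-80k_id_168_relevant_misleading
```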

Each test should take at most 5 minutes. The test sometimes runs on the CPU instead of the GPU; if it is taking longer than that, restart the notebook and run it again.

In [13]:
ids = []
for file_name in os.listdir('./haystack/irrelevant'):
    if file_name.endswith('.txt'):
        file_id = os.path.splitext(file_name)[0]  # Extract the file name without extension
        ids.append(file_id)
ids.sort()
id_vals = [int(i) for i in ids[14:28]]
print(id_vals)

[168, 17, 170, 172, 173, 175, 18, 180, 182, 183, 185, 189, 192, 193]
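The printed ids are in lexicographic rather than numeric order because the file names are sorted as strings before conversion to `int`. The toy example below (with made-up file names) contrasts the two orderings. Switching the cell above to a numeric sort would change which ids fall into the `[14:28]` slice, so this is a sketch for comparison rather than a drop-in replacement.

```python
from pathlib import Path

file_names = ["17.txt", "168.txt", "2.txt", "170.txt", "notes.md"]
stems = [Path(name).stem for name in file_names if name.endswith(".txt")]

string_ids = sorted(stems)                    # compares character by character
numeric_ids = sorted(int(s) for s in stems)   # compares numeric values

print(string_ids)   # ['168', '17', '170', '2']
print(numeric_ids)  # [2, 17, 168, 170]
```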


In [14]:
for id in id_vals:
    for context_type in ["relevant", "irrelevant"]:
        for with_misleading in [False, True]:
            args = get_args(id, context_type, with_misleading)
            run_test(args)

loading from yaofu/llama-2-7b-80k
layer number: 32, head number 32


Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.12it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The most recent former Prime Minister of the United Kingdom is Rishi Sunak.



insertion at 0
The most recent former Prime Minister of the United Kingdom is Rishi Sunak.
[['6-9'], ['11-15'], ['7-4'], ['16-19'], ['8-26'], ['19-15'], ['17-22'], ['8-22'], ['12-26'], ['21-30'], ['24-29'], ['31-16'], ['11-2'], ['15-14'], ['18-30'], ['19-14'], ['26-3'], ['28-14'], ['0-9'], ['6-30']]
-- Test Summary -- 
Duration: 0.7 seconds
Context: 0 tokens
Depth: 0%
Score: 100.00001192092896
Response: The most recent former Prime Minister of the United Kingdom is Rishi Sunak.

Writing at results/graph/llama-2-7b-80k_id_168_relevant/llama-2-7b-80k_id_168_relevant_len_0_depth_0_results.json
insertion at 0
The most recent former Prime Minister of the United Kingdom is Rishi Sunak.
[['6-9'], ['11-15'], ['7-4'], ['16-19'], ['8-26'], ['19-15'], ['17-2

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.10it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The most recent former Prime Minister of the United Kingdom is Rishi Sunak.



insertion at 0
The most recent former Prime Minister of the United Kingdom is Rishi Sunak.
[['6-9'], ['11-15'], ['7-4'], ['16-19'], ['8-26'], ['19-15'], ['17-22'], ['8-22'], ['12-26'], ['21-30'], ['24-29'], ['31-16'], ['11-2'], ['15-14'], ['18-30'], ['19-14'], ['26-3'], ['28-14'], ['0-9'], ['6-30']]
-- Test Summary -- 
Duration: 0.7 seconds
Context: 0 tokens
Depth: 0%
Score: 100.00001192092896
Response: The most recent former Prime Minister of the United Kingdom is Rishi Sunak.

Writing at results/graph/llama-2-7b-80k_id_168_relevant_misleading/llama-2-7b-80k_id_168_relevant_misleading_len_0_depth_0_results.json
insertion at 0
The most recent former Prime Minister of the United Kingdom is Rishi Sunak.
[['6-9'], ['11-15'], ['7-4'], ['16-19'], ['8-2

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.12it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The most recent former Prime Minister of the United Kingdom is Rishi Sunak.



insertion at 0
The most recent former Prime Minister of the United Kingdom is Rishi Sunak.
[['6-9'], ['11-15'], ['7-4'], ['16-19'], ['8-26'], ['19-15'], ['17-22'], ['8-22'], ['12-26'], ['21-30'], ['24-29'], ['31-16'], ['11-2'], ['15-14'], ['18-30'], ['19-14'], ['26-3'], ['28-14'], ['0-9'], ['6-30']]
-- Test Summary -- 
Duration: 0.7 seconds
Context: 0 tokens
Depth: 0%
Score: 100.00001192092896
Response: The most recent former Prime Minister of the United Kingdom is Rishi Sunak.

Writing at results/graph/llama-2-7b-80k_id_168_irrelevant/llama-2-7b-80k_id_168_irrelevant_len_0_depth_0_results.json
insertion at 0
The most recent former Prime Minister of the United Kingdom is Rishi Sunak.
[['6-9'], ['11-15'], ['7-4'], ['16-19'], ['8-26'], ['19-15'], ['

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.11it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The most recent former Prime Minister of the United Kingdom is Rishi Sunak.



insertion at 0
The most recent former Prime Minister of the United Kingdom is Rishi Sunak.
[['6-9'], ['11-15'], ['7-4'], ['16-19'], ['8-26'], ['19-15'], ['17-22'], ['8-22'], ['12-26'], ['21-30'], ['24-29'], ['31-16'], ['11-2'], ['15-14'], ['18-30'], ['19-14'], ['26-3'], ['28-14'], ['0-9'], ['6-30']]
-- Test Summary -- 
Duration: 0.7 seconds
Context: 0 tokens
Depth: 0%
Score: 100.00001192092896
Response: The most recent former Prime Minister of the United Kingdom is Rishi Sunak.

Writing at results/graph/llama-2-7b-80k_id_168_irrelevant_misleading/llama-2-7b-80k_id_168_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
The most recent former Prime Minister of the United Kingdom is Rishi Sunak.
[['6-9'], ['11-15'], ['7-4'], ['16-19'], [

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.15it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The 2024 Nobel Prize in Physics was awarded to John Hopfield and Geoffrey Hinton.



insertion at 0
The 2024 Nobel Prize in Physics was awarded to John Hopfield and Geoffrey Hinton.
[['16-19'], ['6-9'], ['7-4'], ['11-15'], ['21-30'], ['6-30'], ['8-26'], ['17-22'], ['19-15'], ['12-26'], ['15-14'], ['18-30'], ['21-28'], ['22-19'], ['24-3'], ['24-29'], ['31-16'], ['8-31'], ['13-11'], ['17-0']]
-- Test Summary -- 
Duration: 0.5 seconds
Context: 0 tokens
Depth: 0%
Score: 64.07519578933716
Response: John Hopfield and Geoffrey Hinton

Writing at results/graph/llama-2-7b-80k_id_17_relevant/llama-2-7b-80k_id_17_relevant_len_0_depth_0_results.json
insertion at 0
The 2024 Nobel Prize in Physics was awarded to John Hopfield and Geoffrey Hinton.
[['16-19'], ['6-9'], ['7-4'], ['11-15'], ['21-30'], ['6-30'], ['8-26'], ['17-22'], ['19-15'],

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.13it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The 2024 Nobel Prize in Physics was awarded to John Hopfield and Geoffrey Hinton.



insertion at 0
The 2024 Nobel Prize in Physics was awarded to John Hopfield and Geoffrey Hinton.
[['16-19'], ['6-9'], ['7-4'], ['11-15'], ['21-30'], ['6-30'], ['8-26'], ['17-22'], ['19-15'], ['12-26'], ['15-14'], ['18-30'], ['21-28'], ['22-19'], ['24-3'], ['24-29'], ['31-16'], ['8-31'], ['13-11'], ['17-0']]
-- Test Summary -- 
Duration: 0.5 seconds
Context: 0 tokens
Depth: 0%
Score: 64.07519578933716
Response: John Hopfield and Geoffrey Hinton

Writing at results/graph/llama-2-7b-80k_id_17_relevant_misleading/llama-2-7b-80k_id_17_relevant_misleading_len_0_depth_0_results.json
insertion at 0
The 2024 Nobel Prize in Physics was awarded to John Hopfield and Geoffrey Hinton.
[['16-19'], ['6-9'], ['7-4'], ['11-15'], ['21-30'], ['6-30'], ['8-26'],

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.12it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The 2024 Nobel Prize in Physics was awarded to John Hopfield and Geoffrey Hinton.



insertion at 0
The 2024 Nobel Prize in Physics was awarded to John Hopfield and Geoffrey Hinton.
[['16-19'], ['6-9'], ['7-4'], ['11-15'], ['21-30'], ['6-30'], ['8-26'], ['17-22'], ['19-15'], ['12-26'], ['15-14'], ['18-30'], ['21-28'], ['22-19'], ['24-3'], ['24-29'], ['31-16'], ['8-31'], ['13-11'], ['17-0']]
-- Test Summary -- 
Duration: 0.5 seconds
Context: 0 tokens
Depth: 0%
Score: 64.07519578933716
Response: John Hopfield and Geoffrey Hinton

Writing at results/graph/llama-2-7b-80k_id_17_irrelevant/llama-2-7b-80k_id_17_irrelevant_len_0_depth_0_results.json
insertion at 0
The 2024 Nobel Prize in Physics was awarded to John Hopfield and Geoffrey Hinton.
[['16-19'], ['6-9'], ['7-4'], ['11-15'], ['21-30'], ['6-30'], ['8-26'], ['17-22'], ['19-1

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.12it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The 2024 Nobel Prize in Physics was awarded to John Hopfield and Geoffrey Hinton.



insertion at 0
The 2024 Nobel Prize in Physics was awarded to John Hopfield and Geoffrey Hinton.
[['16-19'], ['6-9'], ['7-4'], ['11-15'], ['21-30'], ['6-30'], ['8-26'], ['17-22'], ['19-15'], ['12-26'], ['15-14'], ['18-30'], ['21-28'], ['22-19'], ['24-3'], ['24-29'], ['31-16'], ['8-31'], ['13-11'], ['17-0']]
-- Test Summary -- 
Duration: 0.5 seconds
Context: 0 tokens
Depth: 0%
Score: 64.07519578933716
Response: John Hopfield and Geoffrey Hinton

Writing at results/graph/llama-2-7b-80k_id_17_irrelevant_misleading/llama-2-7b-80k_id_17_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
The 2024 Nobel Prize in Physics was awarded to John Hopfield and Geoffrey Hinton.
[['16-19'], ['6-9'], ['7-4'], ['11-15'], ['21-30'], ['6-30'], ['8-2

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.13it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The most recent outbreak declared as a public health emergency of international concern by WHO is Clade I mpox.



insertion at 0
The most recent outbreak declared as a public health emergency of international concern by WHO is Clade I mpox.
[['16-19'], ['19-15'], ['7-4'], ['24-29'], ['6-9'], ['6-30'], ['11-15'], ['21-30'], ['6-11'], ['7-13'], ['8-26'], ['10-29'], ['11-30'], ['18-30'], ['19-9'], ['22-8'], ['26-28'], ['31-16'], ['7-28'], ['7-31']]
-- Test Summary -- 
Duration: 0.4 seconds
Context: 0 tokens
Depth: 0%
Score: 53.51347327232361
Response: Clade I mpox

Writing at results/graph/llama-2-7b-80k_id_170_relevant/llama-2-7b-80k_id_170_relevant_len_0_depth_0_results.json
insertion at 0
The most recent outbreak declared as a public health emergency of international concern by WHO is Clade I mpox.
[['16-19'], ['19-15'], ['

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.10it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The most recent outbreak declared as a public health emergency of international concern by WHO is Clade I mpox.



insertion at 0
The most recent outbreak declared as a public health emergency of international concern by WHO is Clade I mpox.
[['16-19'], ['19-15'], ['7-4'], ['24-29'], ['6-9'], ['6-30'], ['11-15'], ['21-30'], ['6-11'], ['7-13'], ['8-26'], ['10-29'], ['11-30'], ['18-30'], ['19-9'], ['22-8'], ['26-28'], ['31-16'], ['7-28'], ['7-31']]
-- Test Summary -- 
Duration: 0.4 seconds
Context: 0 tokens
Depth: 0%
Score: 53.51347327232361
Response: Clade I mpox

Writing at results/graph/llama-2-7b-80k_id_170_relevant_misleading/llama-2-7b-80k_id_170_relevant_misleading_len_0_depth_0_results.json
insertion at 0
The most recent outbreak declared as a public health emergency of international concern by WHO is Clade I mpox.
[['

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.14it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The most recent outbreak declared as a public health emergency of international concern by WHO is Clade I mpox.



insertion at 0
The most recent outbreak declared as a public health emergency of international concern by WHO is Clade I mpox.
[['16-19'], ['19-15'], ['7-4'], ['24-29'], ['6-9'], ['6-30'], ['11-15'], ['21-30'], ['6-11'], ['7-13'], ['8-26'], ['10-29'], ['11-30'], ['18-30'], ['19-9'], ['22-8'], ['26-28'], ['31-16'], ['7-28'], ['7-31']]
-- Test Summary -- 
Duration: 0.4 seconds
Context: 0 tokens
Depth: 0%
Score: 53.51347327232361
Response: Clade I mpox

Writing at results/graph/llama-2-7b-80k_id_170_irrelevant/llama-2-7b-80k_id_170_irrelevant_len_0_depth_0_results.json
insertion at 0
The most recent outbreak declared as a public health emergency of international concern by WHO is Clade I mpox.
[['16-19'], ['19-15']

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.14it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The most recent outbreak declared as a public health emergency of international concern by WHO is Clade I mpox.



insertion at 0
The most recent outbreak declared as a public health emergency of international concern by WHO is Clade I mpox.
[['16-19'], ['19-15'], ['7-4'], ['24-29'], ['6-9'], ['6-30'], ['11-15'], ['21-30'], ['6-11'], ['7-13'], ['8-26'], ['10-29'], ['11-30'], ['18-30'], ['19-9'], ['22-8'], ['26-28'], ['31-16'], ['7-28'], ['7-31']]
-- Test Summary -- 
Duration: 0.4 seconds
Context: 0 tokens
Depth: 0%
Score: 53.51347327232361
Response: Clade I mpox

Writing at results/graph/llama-2-7b-80k_id_170_irrelevant_misleading/llama-2-7b-80k_id_170_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
The most recent outbreak declared as a public health emergency of international concern by WHO is Clade I mpox.

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.12it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: There are 102 LA metro rail stations.



insertion at 0
There are 102 LA metro rail stations.
[['11-15'], ['16-19'], ['6-9'], ['8-26'], ['7-4'], ['12-26'], ['19-15'], ['21-30'], ['16-24'], ['17-22'], ['6-30'], ['11-2'], ['13-11'], ['24-29'], ['6-11'], ['7-13'], ['8-22'], ['8-31'], ['17-31'], ['20-30']]
-- Test Summary -- 
Duration: 0.6 seconds
Context: 0 tokens
Depth: 0%
Score: 100.00001192092896
Response: There are 102 LA metro rail stations.

Writing at results/graph/llama-2-7b-80k_id_172_relevant/llama-2-7b-80k_id_172_relevant_len_0_depth_0_results.json
insertion at 0
There are 102 LA metro rail stations.
[['11-15'], ['16-19'], ['6-9'], ['8-26'], ['7-4'], ['12-26'], ['19-15'], ['21-30'], ['16-24'], ['17-22'], ['6-30'], ['11-2'], ['13-11'], ['24-29'], ['6-11'], ['7-13'], ['8-22'], ['8-31'], ['17-31'], ['20-30']]
-- Test Su

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.10it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: There are 102 LA metro rail stations.



insertion at 0
There are 102 LA metro rail stations.
[['11-15'], ['16-19'], ['6-9'], ['8-26'], ['7-4'], ['12-26'], ['19-15'], ['21-30'], ['16-24'], ['17-22'], ['6-30'], ['11-2'], ['13-11'], ['24-29'], ['6-11'], ['7-13'], ['8-22'], ['8-31'], ['17-31'], ['20-30']]
-- Test Summary -- 
Duration: 0.6 seconds
Context: 0 tokens
Depth: 0%
Score: 100.00001192092896
Response: There are 102 LA metro rail stations.

Writing at results/graph/llama-2-7b-80k_id_172_relevant_misleading/llama-2-7b-80k_id_172_relevant_misleading_len_0_depth_0_results.json
insertion at 0
There are 102 LA metro rail stations.
[['11-15'], ['16-19'], ['6-9'], ['8-26'], ['7-4'], ['12-26'], ['19-15'], ['21-30'], ['16-24'], ['17-22'], ['6-30'], ['11-2'], ['13-11'], ['24-29'], ['6-11'], ['7-13'], ['8-22'], ['8-31'], ['17-31'],

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.12it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: There are 102 LA metro rail stations.



insertion at 0
There are 102 LA metro rail stations.
[['11-15'], ['16-19'], ['6-9'], ['8-26'], ['7-4'], ['12-26'], ['19-15'], ['21-30'], ['16-24'], ['17-22'], ['6-30'], ['11-2'], ['13-11'], ['24-29'], ['6-11'], ['7-13'], ['8-22'], ['8-31'], ['17-31'], ['20-30']]
-- Test Summary -- 
Duration: 0.6 seconds
Context: 0 tokens
Depth: 0%
Score: 100.00001192092896
Response: There are 102 LA metro rail stations.

Writing at results/graph/llama-2-7b-80k_id_172_irrelevant/llama-2-7b-80k_id_172_irrelevant_len_0_depth_0_results.json
insertion at 0
There are 102 LA metro rail stations.
[['11-15'], ['16-19'], ['6-9'], ['8-26'], ['7-4'], ['12-26'], ['19-15'], ['21-30'], ['16-24'], ['17-22'], ['6-30'], ['11-2'], ['13-11'], ['24-29'], ['6-11'], ['7-13'], ['8-22'], ['8-31'], ['17-31'], ['20-30']]
-- Tes

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.14it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: There are 102 LA metro rail stations.



insertion at 0
There are 102 LA metro rail stations.
[['11-15'], ['16-19'], ['6-9'], ['8-26'], ['7-4'], ['12-26'], ['19-15'], ['21-30'], ['16-24'], ['17-22'], ['6-30'], ['11-2'], ['13-11'], ['24-29'], ['6-11'], ['7-13'], ['8-22'], ['8-31'], ['17-31'], ['20-30']]
-- Test Summary -- 
Duration: 0.6 seconds
Context: 0 tokens
Depth: 0%
Score: 100.00001192092896
Response: There are 102 LA metro rail stations.

Writing at results/graph/llama-2-7b-80k_id_172_irrelevant_misleading/llama-2-7b-80k_id_172_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
There are 102 LA metro rail stations.
[['11-15'], ['16-19'], ['6-9'], ['8-26'], ['7-4'], ['12-26'], ['19-15'], ['21-30'], ['16-24'], ['17-22'], ['6-30'], ['11-2'], ['13-11'], ['24-29'], ['6-11'], ['7-13'], ['8-22'], ['8-31'], ['17-3

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.12it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Jannik Sinner won the biggest single-tournament payday in tennis history.



insertion at 0
Jannik Sinner won the biggest single-tournament payday in tennis history.
[['16-19'], ['6-9'], ['21-30'], ['11-15'], ['17-22'], ['24-29'], ['29-19'], ['7-4'], ['8-26'], ['11-2'], ['6-30'], ['15-14'], ['19-15'], ['12-26'], ['13-11'], ['19-9'], ['19-14'], ['22-27'], ['26-25'], ['29-13']]
-- Test Summary -- 
Duration: 0.8 seconds
Context: 0 tokens
Depth: 0%
Score: 90.1591420173645
Response: annik Sinner won the biggest single-tournament payday in tennis history.

Writing at results/graph/llama-2-7b-80k_id_173_relevant/llama-2-7b-80k_id_173_relevant_len_0_depth_0_results.json
insertion at 0
Jannik Sinner won the biggest single-tournament payday in tennis history.
[['16-19'], ['6-9'], ['21-30'], ['11-15'], ['17-22'], ['24-29'], ['29-19'], 

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Jannik Sinner won the biggest single-tournament payday in tennis history.



insertion at 0
Jannik Sinner won the biggest single-tournament payday in tennis history.
[['16-19'], ['6-9'], ['21-30'], ['11-15'], ['17-22'], ['24-29'], ['29-19'], ['7-4'], ['8-26'], ['11-2'], ['6-30'], ['15-14'], ['19-15'], ['12-26'], ['13-11'], ['19-9'], ['19-14'], ['22-27'], ['26-25'], ['29-13']]
-- Test Summary -- 
Duration: 0.8 seconds
Context: 0 tokens
Depth: 0%
Score: 90.1591420173645
Response: annik Sinner won the biggest single-tournament payday in tennis history.

Writing at results/graph/llama-2-7b-80k_id_173_relevant_misleading/llama-2-7b-80k_id_173_relevant_misleading_len_0_depth_0_results.json
insertion at 0
Jannik Sinner won the biggest single-tournament payday in tennis history.
[['16-19'], ['6-9'], ['21-30'], ['11-15'], ['17-22'], 

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Jannik Sinner won the biggest single-tournament payday in tennis history.



insertion at 0
Jannik Sinner won the biggest single-tournament payday in tennis history.
[['16-19'], ['6-9'], ['21-30'], ['11-15'], ['17-22'], ['24-29'], ['29-19'], ['7-4'], ['8-26'], ['11-2'], ['6-30'], ['15-14'], ['19-15'], ['12-26'], ['13-11'], ['19-9'], ['19-14'], ['22-27'], ['26-25'], ['29-13']]
-- Test Summary -- 
Duration: 0.8 seconds
Context: 0 tokens
Depth: 0%
Score: 90.1591420173645
Response: annik Sinner won the biggest single-tournament payday in tennis history.

Writing at results/graph/llama-2-7b-80k_id_173_irrelevant/llama-2-7b-80k_id_173_irrelevant_len_0_depth_0_results.json
insertion at 0
Jannik Sinner won the biggest single-tournament payday in tennis history.
[['16-19'], ['6-9'], ['21-30'], ['11-15'], ['17-22'], ['24-29'], ['29-19

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.19it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Jannik Sinner won the biggest single-tournament payday in tennis history.



insertion at 0
Jannik Sinner won the biggest single-tournament payday in tennis history.
[['16-19'], ['6-9'], ['21-30'], ['11-15'], ['17-22'], ['24-29'], ['29-19'], ['7-4'], ['8-26'], ['11-2'], ['6-30'], ['15-14'], ['19-15'], ['12-26'], ['13-11'], ['19-9'], ['19-14'], ['22-27'], ['26-25'], ['29-13']]
-- Test Summary -- 
Duration: 0.8 seconds
Context: 0 tokens
Depth: 0%
Score: 90.1591420173645
Response: annik Sinner won the biggest single-tournament payday in tennis history.

Writing at results/graph/llama-2-7b-80k_id_173_irrelevant_misleading/llama-2-7b-80k_id_173_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
Jannik Sinner won the biggest single-tournament payday in tennis history.
[['16-19'], ['6-9'], ['21-30'], ['11-15'], ['17-22

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Kendrick Lamar released his most recent studio album on November 22, 2024.



insertion at 0
Kendrick Lamar released his most recent studio album on November 22, 2024.
[['6-9'], ['11-15'], ['16-19'], ['8-26'], ['7-4'], ['12-26'], ['18-30'], ['14-18'], ['16-24'], ['17-22'], ['19-15'], ['21-30'], ['29-5'], ['31-15'], ['1-26'], ['11-2'], ['15-14'], ['15-25'], ['19-10'], ['20-29']]
-- Test Summary -- 
Duration: 1.0 seconds
Context: 0 tokens
Depth: 0%
Score: 100.00001192092896
Response: Kendrick Lamar released his most recent studio album on November 22, 2024.

Writing at results/graph/llama-2-7b-80k_id_175_relevant/llama-2-7b-80k_id_175_relevant_len_0_depth_0_results.json
insertion at 0
Kendrick Lamar released his most recent studio album on November 22, 2024.
[['6-9'], ['11-15'], ['16-19'], ['8-26'], ['7-4'], ['12-26'], ['18-30

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Kendrick Lamar released his most recent studio album on November 22, 2024.



insertion at 0
Kendrick Lamar released his most recent studio album on November 22, 2024.
[['6-9'], ['11-15'], ['16-19'], ['8-26'], ['7-4'], ['12-26'], ['18-30'], ['14-18'], ['16-24'], ['17-22'], ['19-15'], ['21-30'], ['29-5'], ['31-15'], ['1-26'], ['11-2'], ['15-14'], ['15-25'], ['19-10'], ['20-29']]
-- Test Summary -- 
Duration: 1.0 seconds
Context: 0 tokens
Depth: 0%
Score: 100.00001192092896
Response: Kendrick Lamar released his most recent studio album on November 22, 2024.

Writing at results/graph/llama-2-7b-80k_id_175_relevant_misleading/llama-2-7b-80k_id_175_relevant_misleading_len_0_depth_0_results.json
insertion at 0
Kendrick Lamar released his most recent studio album on November 22, 2024.
[['6-9'], ['11-15'], ['16-19'], ['8-26'], ['7-4

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.17it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Kendrick Lamar released his most recent studio album on November 22, 2024.



insertion at 0
Kendrick Lamar released his most recent studio album on November 22, 2024.
[['6-9'], ['11-15'], ['16-19'], ['8-26'], ['7-4'], ['12-26'], ['18-30'], ['14-18'], ['16-24'], ['17-22'], ['19-15'], ['21-30'], ['29-5'], ['31-15'], ['1-26'], ['11-2'], ['15-14'], ['15-25'], ['19-10'], ['20-29']]
-- Test Summary -- 
Duration: 1.0 seconds
Context: 0 tokens
Depth: 0%
Score: 100.00001192092896
Response: Kendrick Lamar released his most recent studio album on November 22, 2024.

Writing at results/graph/llama-2-7b-80k_id_175_irrelevant/llama-2-7b-80k_id_175_irrelevant_len_0_depth_0_results.json
insertion at 0
Kendrick Lamar released his most recent studio album on November 22, 2024.
[['6-9'], ['11-15'], ['16-19'], ['8-26'], ['7-4'], ['12-26'], ['1

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Kendrick Lamar released his most recent studio album on November 22, 2024.



insertion at 0
Kendrick Lamar released his most recent studio album on November 22, 2024.
[['6-9'], ['11-15'], ['16-19'], ['8-26'], ['7-4'], ['12-26'], ['18-30'], ['14-18'], ['16-24'], ['17-22'], ['19-15'], ['21-30'], ['29-5'], ['31-15'], ['1-26'], ['11-2'], ['15-14'], ['15-25'], ['19-10'], ['20-29']]
-- Test Summary -- 
Duration: 1.0 seconds
Context: 0 tokens
Depth: 0%
Score: 100.00001192092896
Response: Kendrick Lamar released his most recent studio album on November 22, 2024.

Writing at results/graph/llama-2-7b-80k_id_175_irrelevant_misleading/llama-2-7b-80k_id_175_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
Kendrick Lamar released his most recent studio album on November 22, 2024.
[['6-9'], ['11-15'], ['16-19'], ['8-26'], [

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.19it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Daron Acemoglu, Simon Johnson, and James Robinson won the 2024 Nobel Prize in Economics.



insertion at 0
Daron Acemoglu, Simon Johnson, and James Robinson won the 2024 Nobel Prize in Economics.
[['16-19'], ['11-15'], ['6-9'], ['8-26'], ['17-22'], ['7-4'], ['15-14'], ['19-15'], ['21-30'], ['6-30'], ['12-26'], ['11-2'], ['14-18'], ['18-30'], ['24-29'], ['26-28'], ['31-16'], ['24-3'], ['28-14'], ['13-23']]
-- Test Summary -- 
Duration: 1.6 seconds
Context: 0 tokens
Depth: 0%
Score: 99.99999403953552
Response: Daron Acemoglu, Simon Johnson, and James Robinson won the 2024 Nobel Prize in Economics.

Writing at results/graph/llama-2-7b-80k_id_18_relevant/llama-2-7b-80k_id_18_relevant_len_0_depth_0_results.json
insertion at 0
Daron Acemoglu, Simon Johnson, and James Robinson won the 2024 Nobel Prize in Economics.
[['16-19'], ['11

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.19it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Daron Acemoglu, Simon Johnson, and James Robinson won the 2024 Nobel Prize in Economics.



insertion at 0
Daron Acemoglu, Simon Johnson, and James Robinson won the 2024 Nobel Prize in Economics.
[['16-19'], ['11-15'], ['6-9'], ['8-26'], ['17-22'], ['7-4'], ['15-14'], ['19-15'], ['21-30'], ['6-30'], ['12-26'], ['11-2'], ['14-18'], ['18-30'], ['24-29'], ['26-28'], ['31-16'], ['24-3'], ['28-14'], ['13-23']]
-- Test Summary -- 
Duration: 1.6 seconds
Context: 0 tokens
Depth: 0%
Score: 99.99999403953552
Response: Daron Acemoglu, Simon Johnson, and James Robinson won the 2024 Nobel Prize in Economics.

Writing at results/graph/llama-2-7b-80k_id_18_relevant_misleading/llama-2-7b-80k_id_18_relevant_misleading_len_0_depth_0_results.json
insertion at 0
Daron Acemoglu, Simon Johnson, and James Robinson won the 2024 Nobel Prize in Econo

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.16it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Daron Acemoglu, Simon Johnson, and James Robinson won the 2024 Nobel Prize in Economics.



insertion at 0
Daron Acemoglu, Simon Johnson, and James Robinson won the 2024 Nobel Prize in Economics.
[['16-19'], ['11-15'], ['6-9'], ['8-26'], ['17-22'], ['7-4'], ['15-14'], ['19-15'], ['21-30'], ['6-30'], ['12-26'], ['11-2'], ['14-18'], ['18-30'], ['24-29'], ['26-28'], ['31-16'], ['24-3'], ['28-14'], ['13-23']]
-- Test Summary -- 
Duration: 1.6 seconds
Context: 0 tokens
Depth: 0%
Score: 99.99999403953552
Response: Daron Acemoglu, Simon Johnson, and James Robinson won the 2024 Nobel Prize in Economics.

Writing at results/graph/llama-2-7b-80k_id_18_irrelevant/llama-2-7b-80k_id_18_irrelevant_len_0_depth_0_results.json
insertion at 0
Daron Acemoglu, Simon Johnson, and James Robinson won the 2024 Nobel Prize in Economics.
[['16-19'], 

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Daron Acemoglu, Simon Johnson, and James Robinson won the 2024 Nobel Prize in Economics.



insertion at 0
Daron Acemoglu, Simon Johnson, and James Robinson won the 2024 Nobel Prize in Economics.
[['16-19'], ['11-15'], ['6-9'], ['8-26'], ['17-22'], ['7-4'], ['15-14'], ['19-15'], ['21-30'], ['6-30'], ['12-26'], ['11-2'], ['14-18'], ['18-30'], ['24-29'], ['26-28'], ['31-16'], ['24-3'], ['28-14'], ['13-23']]
-- Test Summary -- 
Duration: 1.6 seconds
Context: 0 tokens
Depth: 0%
Score: 99.99999403953552
Response: Daron Acemoglu, Simon Johnson, and James Robinson won the 2024 Nobel Prize in Economics.

Writing at results/graph/llama-2-7b-80k_id_18_irrelevant_misleading/llama-2-7b-80k_id_18_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
Daron Acemoglu, Simon Johnson, and James Robinson won the 2024 Nobel Prize in E

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The latest major version of the .NET framework is .NET 9.



insertion at 0
The latest major version of the .NET framework is .NET 9.
[['6-9'], ['11-15'], ['16-19'], ['11-2'], ['7-4'], ['8-26'], ['18-30'], ['21-30'], ['12-26'], ['17-22'], ['19-15'], ['13-11'], ['14-18'], ['15-14'], ['21-28'], ['24-3'], ['26-28'], ['29-19'], ['0-9'], ['1-26']]
-- Test Summary -- 
Duration: 0.7 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: The latest major version of the .NET framework is .NET 9.

Writing at results/graph/llama-2-7b-80k_id_180_relevant/llama-2-7b-80k_id_180_relevant_len_0_depth_0_results.json
insertion at 0
The latest major version of the .NET framework is .NET 9.
[['6-9'], ['11-15'], ['16-19'], ['11-2'], ['7-4'], ['8-26'], ['18-30'], ['21-30'], ['12-26'], ['17-22'], ['19-15'], ['13-11'], ['14-18'], ['15-14'], ['2

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The latest major version of the .NET framework is .NET 9.



insertion at 0
The latest major version of the .NET framework is .NET 9.
[['6-9'], ['11-15'], ['16-19'], ['11-2'], ['7-4'], ['8-26'], ['18-30'], ['21-30'], ['12-26'], ['17-22'], ['19-15'], ['13-11'], ['14-18'], ['15-14'], ['21-28'], ['24-3'], ['26-28'], ['29-19'], ['0-9'], ['1-26']]
-- Test Summary -- 
Duration: 0.7 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: The latest major version of the .NET framework is .NET 9.

Writing at results/graph/llama-2-7b-80k_id_180_relevant_misleading/llama-2-7b-80k_id_180_relevant_misleading_len_0_depth_0_results.json
insertion at 0
The latest major version of the .NET framework is .NET 9.
[['6-9'], ['11-15'], ['16-19'], ['11-2'], ['7-4'], ['8-26'], ['18-30'], ['21-30'], ['12-26'], ['17-22'], ['19-15'], ['13-11'], ['1

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.17it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The latest major version of the .NET framework is .NET 9.



insertion at 0
The latest major version of the .NET framework is .NET 9.
[['6-9'], ['11-15'], ['16-19'], ['11-2'], ['7-4'], ['8-26'], ['18-30'], ['21-30'], ['12-26'], ['17-22'], ['19-15'], ['13-11'], ['14-18'], ['15-14'], ['21-28'], ['24-3'], ['26-28'], ['29-19'], ['0-9'], ['1-26']]
-- Test Summary -- 
Duration: 0.7 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: The latest major version of the .NET framework is .NET 9.

Writing at results/graph/llama-2-7b-80k_id_180_irrelevant/llama-2-7b-80k_id_180_irrelevant_len_0_depth_0_results.json
insertion at 0
The latest major version of the .NET framework is .NET 9.
[['6-9'], ['11-15'], ['16-19'], ['11-2'], ['7-4'], ['8-26'], ['18-30'], ['21-30'], ['12-26'], ['17-22'], ['19-15'], ['13-11'], ['14-18'], ['15-14'],

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.16it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The latest major version of the .NET framework is .NET 9.



insertion at 0
The latest major version of the .NET framework is .NET 9.
[['6-9'], ['11-15'], ['16-19'], ['11-2'], ['7-4'], ['8-26'], ['18-30'], ['21-30'], ['12-26'], ['17-22'], ['19-15'], ['13-11'], ['14-18'], ['15-14'], ['21-28'], ['24-3'], ['26-28'], ['29-19'], ['0-9'], ['1-26']]
-- Test Summary -- 
Duration: 0.7 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: The latest major version of the .NET framework is .NET 9.

Writing at results/graph/llama-2-7b-80k_id_180_irrelevant_misleading/llama-2-7b-80k_id_180_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
The latest major version of the .NET framework is .NET 9.
[['6-9'], ['11-15'], ['16-19'], ['11-2'], ['7-4'], ['8-26'], ['18-30'], ['21-30'], ['12-26'], ['17-22'], ['19-15'], ['13-11'],

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: There are 9 food allergens that require mandatory labeling in the United States.



insertion at 0
There are 9 food allergens that require mandatory labeling in the United States.
[['16-19'], ['6-9'], ['11-15'], ['7-4'], ['8-26'], ['21-30'], ['24-29'], ['17-22'], ['13-11'], ['19-15'], ['6-30'], ['12-26'], ['7-13'], ['18-30'], ['24-30'], ['29-21'], ['30-14'], ['31-16'], ['11-2'], ['14-18']]
-- Test Summary -- 
Duration: 0.9 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: There are 9 food allergens that require mandatory labeling in the United States.

Writing at results/graph/llama-2-7b-80k_id_182_relevant/llama-2-7b-80k_id_182_relevant_len_0_depth_0_results.json
insertion at 0
There are 9 food allergens that require mandatory labeling in the United States.
[['16-19'], ['6-9'], ['11-15'], ['7-4'], ['8-26'], ['21-30

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: There are 9 food allergens that require mandatory labeling in the United States.



insertion at 0
There are 9 food allergens that require mandatory labeling in the United States.
[['16-19'], ['6-9'], ['11-15'], ['7-4'], ['8-26'], ['21-30'], ['24-29'], ['17-22'], ['13-11'], ['19-15'], ['6-30'], ['12-26'], ['7-13'], ['18-30'], ['24-30'], ['29-21'], ['30-14'], ['31-16'], ['11-2'], ['14-18']]
-- Test Summary -- 
Duration: 0.9 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: There are 9 food allergens that require mandatory labeling in the United States.

Writing at results/graph/llama-2-7b-80k_id_182_relevant_misleading/llama-2-7b-80k_id_182_relevant_misleading_len_0_depth_0_results.json
insertion at 0
There are 9 food allergens that require mandatory labeling in the United States.
[['16-19'], ['6-9'], ['11-15'], ['7-

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.17it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: There are 9 food allergens that require mandatory labeling in the United States.



insertion at 0
There are 9 food allergens that require mandatory labeling in the United States.
[['16-19'], ['6-9'], ['11-15'], ['7-4'], ['8-26'], ['21-30'], ['24-29'], ['17-22'], ['13-11'], ['19-15'], ['6-30'], ['12-26'], ['7-13'], ['18-30'], ['24-30'], ['29-21'], ['30-14'], ['31-16'], ['11-2'], ['14-18']]
-- Test Summary -- 
Duration: 0.9 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: There are 9 food allergens that require mandatory labeling in the United States.

Writing at results/graph/llama-2-7b-80k_id_182_irrelevant/llama-2-7b-80k_id_182_irrelevant_len_0_depth_0_results.json
insertion at 0
There are 9 food allergens that require mandatory labeling in the United States.
[['16-19'], ['6-9'], ['11-15'], ['7-4'], ['8-26'], ['2

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.12it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: There are 9 food allergens that require mandatory labeling in the United States.



insertion at 0
There are 9 food allergens that require mandatory labeling in the United States.
[['16-19'], ['6-9'], ['11-15'], ['7-4'], ['8-26'], ['21-30'], ['24-29'], ['17-22'], ['13-11'], ['19-15'], ['6-30'], ['12-26'], ['7-13'], ['18-30'], ['24-30'], ['29-21'], ['30-14'], ['31-16'], ['11-2'], ['14-18']]
-- Test Summary -- 
Duration: 0.9 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: There are 9 food allergens that require mandatory labeling in the United States.

Writing at results/graph/llama-2-7b-80k_id_182_irrelevant_misleading/llama-2-7b-80k_id_182_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
There are 9 food allergens that require mandatory labeling in the United States.
[['16-19'], ['6-9'], ['11-15'], 

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  1.84it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The CEO of X Corp. is Linda Yaccarino.



insertion at 0
The CEO of X Corp. is Linda Yaccarino.
[['16-19'], ['19-15'], ['6-30'], ['11-15'], ['6-9'], ['8-26'], ['12-26'], ['16-24'], ['21-30'], ['24-29'], ['31-16'], ['7-4'], ['7-13'], ['8-31'], ['13-11'], ['14-18'], ['15-14'], ['17-22'], ['21-1'], ['22-8']]
-- Test Summary -- 
Duration: 0.5 seconds
Context: 0 tokens
Depth: 0%
Score: 63.610947132110596
Response: Linda Yaccarino.

Writing at results/graph/llama-2-7b-80k_id_183_relevant/llama-2-7b-80k_id_183_relevant_len_0_depth_0_results.json
insertion at 0
The CEO of X Corp. is Linda Yaccarino.
[['16-19'], ['19-15'], ['6-30'], ['11-15'], ['6-9'], ['8-26'], ['12-26'], ['16-24'], ['21-30'], ['24-29'], ['31-16'], ['7-4'], ['7-13'], ['8-31'], ['13-11'], ['14-18'], ['15-14'], ['17-22'], ['21-1'], ['22-8']]
-- Test Summary -- 
Durati

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.17it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The CEO of X Corp. is Linda Yaccarino.



insertion at 0
The CEO of X Corp. is Linda Yaccarino.
[['16-19'], ['19-15'], ['6-30'], ['11-15'], ['6-9'], ['8-26'], ['12-26'], ['16-24'], ['21-30'], ['24-29'], ['31-16'], ['7-4'], ['7-13'], ['8-31'], ['13-11'], ['14-18'], ['15-14'], ['17-22'], ['21-1'], ['22-8']]
-- Test Summary -- 
Duration: 0.5 seconds
Context: 0 tokens
Depth: 0%
Score: 63.610947132110596
Response: Linda Yaccarino.

Writing at results/graph/llama-2-7b-80k_id_183_relevant_misleading/llama-2-7b-80k_id_183_relevant_misleading_len_0_depth_0_results.json
insertion at 0
The CEO of X Corp. is Linda Yaccarino.
[['16-19'], ['19-15'], ['6-30'], ['11-15'], ['6-9'], ['8-26'], ['12-26'], ['16-24'], ['21-30'], ['24-29'], ['31-16'], ['7-4'], ['7-13'], ['8-31'], ['13-11'], ['14-18'], ['15-14'], ['17-22'], ['21-1'], ['22-8']]
-- T

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.16it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The CEO of X Corp. is Linda Yaccarino.



insertion at 0
The CEO of X Corp. is Linda Yaccarino.
[['16-19'], ['19-15'], ['6-30'], ['11-15'], ['6-9'], ['8-26'], ['12-26'], ['16-24'], ['21-30'], ['24-29'], ['31-16'], ['7-4'], ['7-13'], ['8-31'], ['13-11'], ['14-18'], ['15-14'], ['17-22'], ['21-1'], ['22-8']]
-- Test Summary -- 
Duration: 0.5 seconds
Context: 0 tokens
Depth: 0%
Score: 63.610947132110596
Response: Linda Yaccarino.

Writing at results/graph/llama-2-7b-80k_id_183_irrelevant/llama-2-7b-80k_id_183_irrelevant_len_0_depth_0_results.json
insertion at 0
The CEO of X Corp. is Linda Yaccarino.
[['16-19'], ['19-15'], ['6-30'], ['11-15'], ['6-9'], ['8-26'], ['12-26'], ['16-24'], ['21-30'], ['24-29'], ['31-16'], ['7-4'], ['7-13'], ['8-31'], ['13-11'], ['14-18'], ['15-14'], ['17-22'], ['21-1'], ['22-8']]
-- Test Summary -- 
Du

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.16it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The CEO of X Corp. is Linda Yaccarino.



insertion at 0
The CEO of X Corp. is Linda Yaccarino.
[['16-19'], ['19-15'], ['6-30'], ['11-15'], ['6-9'], ['8-26'], ['12-26'], ['16-24'], ['21-30'], ['24-29'], ['31-16'], ['7-4'], ['7-13'], ['8-31'], ['13-11'], ['14-18'], ['15-14'], ['17-22'], ['21-1'], ['22-8']]
-- Test Summary -- 
Duration: 0.5 seconds
Context: 0 tokens
Depth: 0%
Score: 63.610947132110596
Response: Linda Yaccarino.

Writing at results/graph/llama-2-7b-80k_id_183_irrelevant_misleading/llama-2-7b-80k_id_183_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
The CEO of X Corp. is Linda Yaccarino.
[['16-19'], ['19-15'], ['6-30'], ['11-15'], ['6-9'], ['8-26'], ['12-26'], ['16-24'], ['21-30'], ['24-29'], ['31-16'], ['7-4'], ['7-13'], ['8-31'], ['13-11'], ['14-18'], ['15-14'], ['17-22'], ['21-1'], ['22-8']]


Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.13it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The largest stadium by capacity in the world is the Narendra Modi Stadium.



insertion at 0
The largest stadium by capacity in the world is the Narendra Modi Stadium.
[['11-15'], ['16-19'], ['6-9'], ['7-4'], ['17-22'], ['19-15'], ['8-26'], ['21-30'], ['11-2'], ['12-26'], ['6-30'], ['13-11'], ['18-30'], ['19-10'], ['20-28'], ['22-22'], ['24-3'], ['24-30'], ['26-28'], ['29-19']]
-- Test Summary -- 
Duration: 0.8 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: The largest stadium by capacity in the world is the Narendra Modi Stadium.

Writing at results/graph/llama-2-7b-80k_id_185_relevant/llama-2-7b-80k_id_185_relevant_len_0_depth_0_results.json
insertion at 0
The largest stadium by capacity in the world is the Narendra Modi Stadium.
[['11-15'], ['16-19'], ['6-9'], ['7-4'], ['17-22'], ['19-15'], ['8-26'], ['21-30']

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.14it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The largest stadium by capacity in the world is the Narendra Modi Stadium.



insertion at 0
The largest stadium by capacity in the world is the Narendra Modi Stadium.
[['11-15'], ['16-19'], ['6-9'], ['7-4'], ['17-22'], ['19-15'], ['8-26'], ['21-30'], ['11-2'], ['12-26'], ['6-30'], ['13-11'], ['18-30'], ['19-10'], ['20-28'], ['22-22'], ['24-3'], ['24-30'], ['26-28'], ['29-19']]
-- Test Summary -- 
Duration: 0.8 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: The largest stadium by capacity in the world is the Narendra Modi Stadium.

Writing at results/graph/llama-2-7b-80k_id_185_relevant_misleading/llama-2-7b-80k_id_185_relevant_misleading_len_0_depth_0_results.json
insertion at 0
The largest stadium by capacity in the world is the Narendra Modi Stadium.
[['11-15'], ['16-19'], ['6-9'], ['7-4'], ['17-22'], ['19-15'

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  1.93it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The largest stadium by capacity in the world is the Narendra Modi Stadium.



insertion at 0
The largest stadium by capacity in the world is the Narendra Modi Stadium.
[['11-15'], ['16-19'], ['6-9'], ['7-4'], ['17-22'], ['19-15'], ['8-26'], ['21-30'], ['11-2'], ['12-26'], ['6-30'], ['13-11'], ['18-30'], ['19-10'], ['20-28'], ['22-22'], ['24-3'], ['24-30'], ['26-28'], ['29-19']]
-- Test Summary -- 
Duration: 0.8 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: The largest stadium by capacity in the world is the Narendra Modi Stadium.

Writing at results/graph/llama-2-7b-80k_id_185_irrelevant/llama-2-7b-80k_id_185_irrelevant_len_0_depth_0_results.json
insertion at 0
The largest stadium by capacity in the world is the Narendra Modi Stadium.
[['11-15'], ['16-19'], ['6-9'], ['7-4'], ['17-22'], ['19-15'], ['8-26'], ['21-

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.12it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The largest stadium by capacity in the world is the Narendra Modi Stadium.



insertion at 0
The largest stadium by capacity in the world is the Narendra Modi Stadium.
[['11-15'], ['16-19'], ['6-9'], ['7-4'], ['17-22'], ['19-15'], ['8-26'], ['21-30'], ['11-2'], ['12-26'], ['6-30'], ['13-11'], ['18-30'], ['19-10'], ['20-28'], ['22-22'], ['24-3'], ['24-30'], ['26-28'], ['29-19']]
-- Test Summary -- 
Duration: 0.8 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: The largest stadium by capacity in the world is the Narendra Modi Stadium.

Writing at results/graph/llama-2-7b-80k_id_185_irrelevant_misleading/llama-2-7b-80k_id_185_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
The largest stadium by capacity in the world is the Narendra Modi Stadium.
[['11-15'], ['16-19'], ['6-9'], ['7-4'], ['17-22'], ['19

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The most recently released Studio Ghibli film is The Boy and the Heron.



insertion at 0
The most recently released Studio Ghibli film is The Boy and the Heron.
[['16-19'], ['8-26'], ['11-15'], ['19-15'], ['29-26'], ['6-9'], ['6-30'], ['7-4'], ['29-21'], ['31-16'], ['31-24'], ['12-26'], ['14-18'], ['15-14'], ['17-22'], ['18-30'], ['21-28'], ['21-30'], ['24-8'], ['24-29']]
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 0%
Score: 59.53876972198486
Response: The Boy and the Heron

Writing at results/graph/llama-2-7b-80k_id_189_relevant/llama-2-7b-80k_id_189_relevant_len_0_depth_0_results.json
insertion at 0
The most recently released Studio Ghibli film is The Boy and the Heron.
[['16-19'], ['8-26'], ['11-15'], ['19-15'], ['29-26'], ['6-9'], ['6-30'], ['7-4'], ['29-21'], ['31-16'], ['31-24'], ['12-26'], ['14

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The most recently released Studio Ghibli film is The Boy and the Heron.



insertion at 0
The most recently released Studio Ghibli film is The Boy and the Heron.
[['16-19'], ['8-26'], ['11-15'], ['19-15'], ['29-26'], ['6-9'], ['6-30'], ['7-4'], ['29-21'], ['31-16'], ['31-24'], ['12-26'], ['14-18'], ['15-14'], ['17-22'], ['18-30'], ['21-28'], ['21-30'], ['24-8'], ['24-29']]
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 0%
Score: 59.53876972198486
Response: The Boy and the Heron

Writing at results/graph/llama-2-7b-80k_id_189_relevant_misleading/llama-2-7b-80k_id_189_relevant_misleading_len_0_depth_0_results.json
insertion at 0
The most recently released Studio Ghibli film is The Boy and the Heron.
[['16-19'], ['8-26'], ['11-15'], ['19-15'], ['29-26'], ['6-9'], ['6-30'], ['7-4'], ['29-21'], ['31-16'], ['31

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.19it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The most recently released Studio Ghibli film is The Boy and the Heron.



insertion at 0
The most recently released Studio Ghibli film is The Boy and the Heron.
[['16-19'], ['8-26'], ['11-15'], ['19-15'], ['29-26'], ['6-9'], ['6-30'], ['7-4'], ['29-21'], ['31-16'], ['31-24'], ['12-26'], ['14-18'], ['15-14'], ['17-22'], ['18-30'], ['21-28'], ['21-30'], ['24-8'], ['24-29']]
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 0%
Score: 59.53876972198486
Response: The Boy and the Heron

Writing at results/graph/llama-2-7b-80k_id_189_irrelevant/llama-2-7b-80k_id_189_irrelevant_len_0_depth_0_results.json
insertion at 0
The most recently released Studio Ghibli film is The Boy and the Heron.
[['16-19'], ['8-26'], ['11-15'], ['19-15'], ['29-26'], ['6-9'], ['6-30'], ['7-4'], ['29-21'], ['31-16'], ['31-24'], ['12-26'], 

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.09it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The most recently released Studio Ghibli film is The Boy and the Heron.



insertion at 0
The most recently released Studio Ghibli film is The Boy and the Heron.
[['16-19'], ['8-26'], ['11-15'], ['19-15'], ['29-26'], ['6-9'], ['6-30'], ['7-4'], ['29-21'], ['31-16'], ['31-24'], ['12-26'], ['14-18'], ['15-14'], ['17-22'], ['18-30'], ['21-28'], ['21-30'], ['24-8'], ['24-29']]
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 0%
Score: 59.53876972198486
Response: The Boy and the Heron

Writing at results/graph/llama-2-7b-80k_id_189_irrelevant_misleading/llama-2-7b-80k_id_189_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
The most recently released Studio Ghibli film is The Boy and the Heron.
[['16-19'], ['8-26'], ['11-15'], ['19-15'], ['29-26'], ['6-9'], ['6-30'], ['7-4'], ['29-21'], ['31-16'], 

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Croatia's current national currency is the Euro.



insertion at 0
Croatia's current national currency is the Euro.
[['16-19'], ['6-9'], ['8-26'], ['11-15'], ['12-26'], ['17-22'], ['21-30'], ['7-4'], ['11-2'], ['19-15'], ['15-14'], ['18-30'], ['26-28'], ['6-30'], ['14-18'], ['18-10'], ['19-12'], ['21-16'], ['24-8'], ['24-24']]
-- Test Summary -- 
Duration: 0.5 seconds
Context: 0 tokens
Depth: 0%
Score: 100.00001192092896
Response: Croatia's current national currency is the Euro.

Writing at results/graph/llama-2-7b-80k_id_192_relevant/llama-2-7b-80k_id_192_relevant_len_0_depth_0_results.json
insertion at 0
Croatia's current national currency is the Euro.
[['16-19'], ['6-9'], ['8-26'], ['11-15'], ['12-26'], ['17-22'], ['21-30'], ['7-4'], ['11-2'], ['19-15'], ['15-14'], ['18-30'], ['26-28'], ['6-30'], ['14-18'], ['18-10'], ['1

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.16it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Croatia's current national currency is the Euro.



insertion at 0
Croatia's current national currency is the Euro.
[['16-19'], ['6-9'], ['8-26'], ['11-15'], ['12-26'], ['17-22'], ['21-30'], ['7-4'], ['11-2'], ['19-15'], ['15-14'], ['18-30'], ['26-28'], ['6-30'], ['14-18'], ['18-10'], ['19-12'], ['21-16'], ['24-8'], ['24-24']]
-- Test Summary -- 
Duration: 0.5 seconds
Context: 0 tokens
Depth: 0%
Score: 100.00001192092896
Response: Croatia's current national currency is the Euro.

Writing at results/graph/llama-2-7b-80k_id_192_relevant_misleading/llama-2-7b-80k_id_192_relevant_misleading_len_0_depth_0_results.json
insertion at 0
Croatia's current national currency is the Euro.
[['16-19'], ['6-9'], ['8-26'], ['11-15'], ['12-26'], ['17-22'], ['21-30'], ['7-4'], ['11-2'], ['19-15'], ['15-14'], ['18-30'], ['26-28'], ['6-30'], ['1

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Croatia's current national currency is the Euro.



insertion at 0
Croatia's current national currency is the Euro.
[['16-19'], ['6-9'], ['8-26'], ['11-15'], ['12-26'], ['17-22'], ['21-30'], ['7-4'], ['11-2'], ['19-15'], ['15-14'], ['18-30'], ['26-28'], ['6-30'], ['14-18'], ['18-10'], ['19-12'], ['21-16'], ['24-8'], ['24-24']]
-- Test Summary -- 
Duration: 0.5 seconds
Context: 0 tokens
Depth: 0%
Score: 100.00001192092896
Response: Croatia's current national currency is the Euro.

Writing at results/graph/llama-2-7b-80k_id_192_irrelevant/llama-2-7b-80k_id_192_irrelevant_len_0_depth_0_results.json
insertion at 0
Croatia's current national currency is the Euro.
[['16-19'], ['6-9'], ['8-26'], ['11-15'], ['12-26'], ['17-22'], ['21-30'], ['7-4'], ['11-2'], ['19-15'], ['15-14'], ['18-30'], ['26-28'], ['6-30'], ['14-18'], ['18-10'],

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Croatia's current national currency is the Euro.



insertion at 0
Croatia's current national currency is the Euro.
[['16-19'], ['6-9'], ['8-26'], ['11-15'], ['12-26'], ['17-22'], ['21-30'], ['7-4'], ['11-2'], ['19-15'], ['15-14'], ['18-30'], ['26-28'], ['6-30'], ['14-18'], ['18-10'], ['19-12'], ['21-16'], ['24-8'], ['24-24']]
-- Test Summary -- 
Duration: 0.5 seconds
Context: 0 tokens
Depth: 0%
Score: 100.00001192092896
Response: Croatia's current national currency is the Euro.

Writing at results/graph/llama-2-7b-80k_id_192_irrelevant_misleading/llama-2-7b-80k_id_192_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
Croatia's current national currency is the Euro.
[['16-19'], ['6-9'], ['8-26'], ['11-15'], ['12-26'], ['17-22'], ['21-30'], ['7-4'], ['11-2'], ['19-15'], ['15-14'], ['18-30'], ['26-28'], ['6-30'],

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: There are 20 member states in the Eurozone.



insertion at 0
There are 20 member states in the Eurozone.
[['11-15'], ['16-19'], ['6-9'], ['6-30'], ['7-4'], ['8-26'], ['12-26'], ['19-15'], ['21-30'], ['16-24'], ['17-22'], ['21-4'], ['24-29'], ['24-30'], ['7-13'], ['8-31'], ['12-16'], ['13-11'], ['15-14'], ['16-1']]
-- Test Summary -- 
Duration: 0.6 seconds
Context: 0 tokens
Depth: 0%
Score: 100.00001192092896
Response: There are 20 member states in the Eurozone.

Writing at results/graph/llama-2-7b-80k_id_193_relevant/llama-2-7b-80k_id_193_relevant_len_0_depth_0_results.json
insertion at 0
There are 20 member states in the Eurozone.
[['11-15'], ['16-19'], ['6-9'], ['6-30'], ['7-4'], ['8-26'], ['12-26'], ['19-15'], ['21-30'], ['16-24'], ['17-22'], ['21-4'], ['24-29'], ['24-30'], ['7-13'], ['8-31'], ['12-16'], ['13-11'], ['15-

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: There are 20 member states in the Eurozone.



insertion at 0
There are 20 member states in the Eurozone.
[['11-15'], ['16-19'], ['6-9'], ['6-30'], ['7-4'], ['8-26'], ['12-26'], ['19-15'], ['21-30'], ['16-24'], ['17-22'], ['21-4'], ['24-29'], ['24-30'], ['7-13'], ['8-31'], ['12-16'], ['13-11'], ['15-14'], ['16-1']]
-- Test Summary -- 
Duration: 0.6 seconds
Context: 0 tokens
Depth: 0%
Score: 100.00001192092896
Response: There are 20 member states in the Eurozone.

Writing at results/graph/llama-2-7b-80k_id_193_relevant_misleading/llama-2-7b-80k_id_193_relevant_misleading_len_0_depth_0_results.json
insertion at 0
There are 20 member states in the Eurozone.
[['11-15'], ['16-19'], ['6-9'], ['6-30'], ['7-4'], ['8-26'], ['12-26'], ['19-15'], ['21-30'], ['16-24'], ['17-22'], ['21-4'], ['24-29'], ['24-30'], ['7-13'], ['8-31'], ['12-

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: There are 20 member states in the Eurozone.



insertion at 0
There are 20 member states in the Eurozone.
[['11-15'], ['16-19'], ['6-9'], ['6-30'], ['7-4'], ['8-26'], ['12-26'], ['19-15'], ['21-30'], ['16-24'], ['17-22'], ['21-4'], ['24-29'], ['24-30'], ['7-13'], ['8-31'], ['12-16'], ['13-11'], ['15-14'], ['16-1']]
-- Test Summary -- 
Duration: 0.6 seconds
Context: 0 tokens
Depth: 0%
Score: 100.00001192092896
Response: There are 20 member states in the Eurozone.

Writing at results/graph/llama-2-7b-80k_id_193_irrelevant/llama-2-7b-80k_id_193_irrelevant_len_0_depth_0_results.json
insertion at 0
There are 20 member states in the Eurozone.
[['11-15'], ['16-19'], ['6-9'], ['6-30'], ['7-4'], ['8-26'], ['12-26'], ['19-15'], ['21-30'], ['16-24'], ['17-22'], ['21-4'], ['24-29'], ['24-30'], ['7-13'], ['8-31'], ['12-16'], ['13-11'], [

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: There are 20 member states in the Eurozone.



insertion at 0
There are 20 member states in the Eurozone.
[['11-15'], ['16-19'], ['6-9'], ['6-30'], ['7-4'], ['8-26'], ['12-26'], ['19-15'], ['21-30'], ['16-24'], ['17-22'], ['21-4'], ['24-29'], ['24-30'], ['7-13'], ['8-31'], ['12-16'], ['13-11'], ['15-14'], ['16-1']]
-- Test Summary -- 
Duration: 0.6 seconds
Context: 0 tokens
Depth: 0%
Score: 100.00001192092896
Response: There are 20 member states in the Eurozone.

Writing at results/graph/llama-2-7b-80k_id_193_irrelevant_misleading/llama-2-7b-80k_id_193_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
There are 20 member states in the Eurozone.
[['11-15'], ['16-19'], ['6-9'], ['6-30'], ['7-4'], ['8-26'], ['12-26'], ['19-15'], ['21-30'], ['16-24'], ['17-22'], ['21-4'], ['24-29'], ['24-30'], ['7-13'], ['8-31'], [
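
The `Writing at ...` lines above suggest that each result file name encodes the model, needle id, context type, misleading flag, context length, and depth. Below is a minimal sketch for recovering those fields from a path when collecting results for analysis. The helper name `parse_result_path` and the regular expression are assumptions inferred from the file names shown in this output, not part of the project code. In particular, the pattern assumes that length and depth are always written as integers.

```python
import re
from pathlib import PurePosixPath

# Pattern inferred from the file names printed in the logs (an assumption):
# <model>_id_<id>_<relevant|irrelevant>[_misleading]_len_<n>_depth_<n>_results.json
RESULT_NAME = re.compile(
    r"^(?P<model>.+)_id_(?P<id>\d+)_(?P<context_type>relevant|irrelevant)"
    r"(?P<misleading>_misleading)?_len_(?P<context_length>\d+)"
    r"_depth_(?P<depth>\d+)_results\.json$"
)


def parse_result_path(path):
    """Return the fields encoded in a result file name, or None if it does not match."""
    match = RESULT_NAME.match(PurePosixPath(path).name)
    if match is None:
        return None
    return {
        "model": match["model"],
        "id": int(match["id"]),
        "context_type": match["context_type"],
        "with_misleading": match["misleading"] is not None,
        "context_length": int(match["context_length"]),
        "depth": int(match["depth"]),
    }


example = (
    "results/graph/llama-2-7b-80k_id_193_irrelevant_misleading/"
    "llama-2-7b-80k_id_193_irrelevant_misleading_len_0_depth_0_results.json"
)
print(parse_result_path(example))
```

Returning `None` for non-matching names lets the helper be applied to a whole directory listing without raising on unrelated files.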

In [9]:
# Example 1: Relevant context without misleading statements
# args = get_args(id=id_val, context_type="relevant", with_misleading=False)
# run_test(args)

In [10]:
# Example 2: Irrelevant context without misleading statements
# args = get_args(id=id_val, context_type="irrelevant", with_misleading=False)
# run_test(args)

In [11]:
# Example 3: Relevant context with misleading statements
# args = get_args(id=id_val, context_type="relevant", with_misleading=True)
# run_test(args)

In [12]:
# Example 4: Irrelevant context with misleading statements
# args = get_args(id=id_val, context_type="irrelevant", with_misleading=True)
# run_test(args)
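
The four examples above differ only in `context_type` and `with_misleading`, so they can also be generated in a loop. The sketch below builds the four keyword combinations in the same order as Examples 1 to 4. The helper name `all_test_configs` is hypothetical, and the calls to `get_args` and `run_test` are left commented out because those functions are defined earlier in this notebook. The id `192` is only an illustrative value taken from the output above.

```python
from itertools import product


def all_test_configs(id_val):
    """Build keyword arguments for the four context/misleading combinations."""
    return [
        {"id": id_val, "context_type": context_type, "with_misleading": with_misleading}
        for with_misleading, context_type in product(
            (False, True), ("relevant", "irrelevant")
        )
    ]


for config in all_test_configs(192):
    print(config)
    # Uncomment inside the notebook, where get_args and run_test are defined:
    # args = get_args(**config)
    # run_test(args)
```

Keeping the configurations as plain dictionaries makes it easy to filter them or to run a single combination before starting the full set.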