# Notebook Overview

This notebook provides a faster testing interface.

It includes steps to:
 - Check GPU availability
 - Set up the environment
 - Define helper functions
 - Run tests with a given **id** and **context_type**

The results are saved for further analysis.

*Make sure to select the created `venv` as your kernel.*

If any errors occur while running the tests, restart the notebook and run it again.

## Check GPU Availability
Run this cell to check how much memory is free on the GPU.

If the **free space** is not close to the **total space**, someone else is probably using the machine.

You can check the active processes by running `nvidia-smi` in the terminal.

In [1]:
import torch

def get_gpu_memory_torch():
    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            free_mem = torch.cuda.mem_get_info(i)[0] / 1024**3
            total_mem = torch.cuda.get_device_properties(i).total_memory / 1024**3
            print(f"GPU {i}: {free_mem:.4f} GiB free / {total_mem:.4f} GiB total")
    else:
        print("No CUDA-compatible GPU detected.")

get_gpu_memory_torch()

GPU 0: 23.2003 GiB free / 23.5746 GiB total
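The "free close to total" rule of thumb can be made explicit with a small helper. This is a sketch: `gpu_probably_in_use` and its 95% threshold are illustrative assumptions, not part of the project. The CUDA context always takes some memory, so free never exactly equals total (23.20 of 23.57 GiB in the output above).

```python
def gpu_probably_in_use(free_gib: float, total_gib: float, min_free_fraction: float = 0.95) -> bool:
    """Return True if less than `min_free_fraction` of the GPU memory is free.

    The 0.95 default is an assumed threshold: the CUDA context itself always
    occupies some memory, so free memory never exactly equals total memory.
    """
    if total_gib <= 0:
        raise ValueError("total_gib must be positive")
    return free_gib / total_gib < min_free_fraction


# Values from the output above: about 98.4% free, so the GPU looks idle.
print(gpu_probably_in_use(23.2003, 23.5746))  # False
```

In the notebook, the inputs would be `torch.cuda.mem_get_info(i)[0] / 1024**3` and `torch.cuda.get_device_properties(i).total_memory / 1024**3`, as computed in the cell above.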


## Setup
The code below imports necessary modules, defines directory paths, and sets the Hugging Face cache path.

**No need to modify this.**

In [2]:
import os
import sys
import json
from pathlib import Path

PROJECT_DIR = Path.cwd()
HAYSTACK_DIR = PROJECT_DIR / "haystack"
RELEVANT_DIR = HAYSTACK_DIR / "relevant"
IRRELAVANT_DIR = HAYSTACK_DIR / "irrelevant"
MISLEADING_IN_RELEVANT_DIR = HAYSTACK_DIR / "misleading_in_relevant"
MISLEADING_IN_IRRELEVANT_DIR = HAYSTACK_DIR / "misleading_in_irrelevant"

sys.path.append(str(PROJECT_DIR))

with open(HAYSTACK_DIR / "needles.json", "r") as f:
    NEEDLES_DATA = json.load(f)

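# This sets the Hugging Face cache path (relative to the working directory). Make sure this directory exists; if not, refer to the README.
# Set it before transformers is imported, since the cache location is read at import time.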
os.environ['HF_HOME'] = '.cache/hf_with_quota'
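Before running any tests it can help to confirm that the paths defined above exist, including the `HF_HOME` cache directory. `missing_paths` is a hypothetical helper, not project code. The two misleading output directories matter too: `insert_misleading_statements` opens its output file for writing without creating parent directories.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk (hypothetical helper)."""
    return [Path(p) for p in paths if not Path(p).exists()]


# In the notebook this could be called as (names from the cell above):
# missing = missing_paths([
#     HAYSTACK_DIR / "needles.json", RELEVANT_DIR, IRRELAVANT_DIR,
#     MISLEADING_IN_RELEVANT_DIR, MISLEADING_IN_IRRELEVANT_DIR,
#     Path(os.environ["HF_HOME"]),
# ])
# assert not missing, f"Missing paths: {missing}"
```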

## Helper Functions

In [3]:
from transformers import AutoTokenizer

def load_text(path):
    with open(path, 'r', encoding='utf-8') as f:
        return f.read()

def insert_strings_at_sentence_breaks(text, insert_strings, context_length, tokenizer):
    # Step 1: Tokenize and truncate
    tokens = tokenizer.encode(text, add_special_tokens=False)[:context_length]
    
    # Step 2: Detect sentence breaks by decoding each token
    sentence_break_indices = [
        i for i, tok in enumerate(tokens)
        if tokenizer.decode([tok]).strip().endswith((".", "!", "?"))
    ]

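    # Need at least one more break than insertions, otherwise the step below is 0
    # and every statement would be inserted after the first sentence break.
    if len(sentence_break_indices) < len(insert_strings) + 1:
        raise ValueError(f"Only {len(sentence_break_indices)} sentence breaks found, "
                         f"but at least {len(insert_strings) + 1} are needed for {len(insert_strings)} insertions.")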

    # Step 3: Compute evenly spaced sentence break positions
    step = len(sentence_break_indices) // (len(insert_strings) + 1)
    insert_positions = [sentence_break_indices[(i + 1) * step] + 1 for i in range(len(insert_strings))]

    # Step 4: Insert insert_strings as tokens
    for pos, insert_str in reversed(list(zip(insert_positions, insert_strings))):
        insert_tokens = tokenizer.encode(insert_str, add_special_tokens=False)
        tokens[pos:pos] = insert_tokens

    return tokenizer.decode(tokens)


def insert_misleading_statements(filepath, insert_strings, context_length, output_path):
    """
    Insert misleading statements into the text at sentence breaks.
    Evenly distribute the misleading statements across the text within the context length.
    """
    tokenizer = AutoTokenizer.from_pretrained("yaofu/llama-2-7b-80k")

    full_text = load_text(filepath)
    modified_text = insert_strings_at_sentence_breaks(full_text, insert_strings, context_length, tokenizer)

    with open(output_path, 'w', encoding='utf-8') as f:
        f.write(modified_text)

    print(f"✅ Output saved to {output_path}")
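The spacing logic in Step 3 can be checked without loading the Llama tokenizer. Below is a word-level stand-in in which whitespace-separated words play the role of tokens; `evenly_spaced_positions` and `insert_words` are illustrative names, not project code. With six one-word sentences and two statements, `step = 6 // 3 = 2`, so the statements land after the 3rd and 5th sentences.

```python
def evenly_spaced_positions(sentence_break_indices, num_inserts):
    """Positions just after evenly spaced sentence breaks.

    The breaks are split into (num_inserts + 1) groups and one insertion goes
    after the break at each group boundary.
    """
    if len(sentence_break_indices) < num_inserts + 1:
        # Fewer breaks would make step == 0 and stack every insert at one spot.
        raise ValueError("need at least num_inserts + 1 sentence breaks")
    step = len(sentence_break_indices) // (num_inserts + 1)
    return [sentence_break_indices[(i + 1) * step] + 1 for i in range(num_inserts)]


def insert_words(words, insert_strings):
    """Toy version of insert_strings_at_sentence_breaks on a list of words."""
    breaks = [i for i, w in enumerate(words) if w.endswith((".", "!", "?"))]
    positions = evenly_spaced_positions(breaks, len(insert_strings))
    # Insert from the back so earlier positions are not shifted.
    for pos, s in reversed(list(zip(positions, insert_strings))):
        words[pos:pos] = s.split()
    return " ".join(words)


print(insert_words("A. B. C. D. E. F.".split(), ["X.", "Y."]))
# A. B. C. X. D. E. Y. F.
```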




In [4]:
from argparse import Namespace

def get_haystack_file(item, context_type, with_misleading, context_length):
    """
    Get the haystack file for the given item and context type.
    Arguments:
        item (dict): The item to process.
        context_type (str): The type of context ("relevant" or "irrelevant").
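        with_misleading (bool): Whether to include misleading information.
        context_length (int): Maximum context length in tokens; the text is truncated to this length and the misleading statements are spread across it.
    Returns:
        Path: The path to the haystack file.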
    """

    if context_type == "relevant":
        original_dir = RELEVANT_DIR
        original_file = item["context_relevant"]  # Original file without misleading info
        misleading_dir = MISLEADING_IN_RELEVANT_DIR  # Output dir for the misleading file
    elif context_type == "irrelevant":
        original_dir = IRRELAVANT_DIR
        original_file = item["context_irrelevant"]
        misleading_dir = MISLEADING_IN_IRRELEVANT_DIR
    else:
        raise ValueError(f"Unknown context_type: {context_type}")

    if with_misleading:
        haystack_file = misleading_dir / original_file
        insert_misleading_statements(
            filepath=original_dir / original_file,           # Path to the original context file
            insert_strings=item["statements_misleading"],    # List of misleading statements
            context_length=context_length,                   # Max context length for insertion
            output_path=haystack_file                        # Output path for the processed file
        )
    else:
        haystack_file = original_dir / original_file

    return haystack_file

def get_args(id, context_type, with_misleading, data=NEEDLES_DATA):
    """
    Get the arguments for the given id and context type.
    Arguments:
        id (int): The id of the item.
        context_type (str): The type of context ("relevant" or "irrelevant").
        with_misleading (bool): Whether to include misleading information.
        data (list): The list of data items.
    Returns:
        Namespace: The arguments for the given id and context type.
    """

    context_length_max = 5000
    
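    # find the item with the given id (a bare next() would raise an unhelpful StopIteration)
    item = next((item for item in data if item["id"] == id), None)
    if item is None:
        raise ValueError(f"No item with id {id} found in data")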

    # get the haystack file for the given item and context type
    haystack_file = get_haystack_file(item, context_type, with_misleading, context_length_max)

    args = Namespace(
        model_name = "yaofu/llama-2-7b-80k",
        model_name_suffix = (f"id_{item['id']}_{context_type}" if not with_misleading
                             else f"id_{item['id']}_{context_type}_misleading"),  # suffix used to name the results files
        model_provider = "LLaMA",

        context_lengths_min = 0, # min context length
        context_lengths_max = context_length_max, # max context length
        context_lengths_num_intervals = 5, # number of intervals for context lengths

        document_depth_percent_intervals = 5, # number of intervals for document depth

        needle = item["needle_refined"],
        real_needle = item["real_needle_refined"],
        retrieval_question = item["question_refined"],
        haystack_file = haystack_file,
    )

    return args
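`get_args` requests 5 context lengths between 0 and 5000 tokens and 5 depth intervals, so each `run_test` call covers 25 (length, depth) pairs. Assuming the tester spaces both linearly, which matches the 0% and 25% depths printed in the logs, the grid would look like this. `linear_grid` is a hypothetical helper, and the tester's own rounding may differ.

```python
from itertools import product


def linear_grid(lo, hi, num):
    """`num` evenly spaced integers from lo to hi inclusive (assumed linear spacing)."""
    if num == 1:
        return [lo]
    step = (hi - lo) / (num - 1)
    return [round(lo + i * step) for i in range(num)]


context_lengths = linear_grid(0, 5000, 5)  # context_lengths_min/max/num_intervals from get_args
depth_percents = linear_grid(0, 100, 5)    # document_depth_percent_intervals = 5
print(context_lengths)  # [0, 1250, 2500, 3750, 5000]
print(depth_percents)   # [0, 25, 50, 75, 100]
print(len(list(product(context_lengths, depth_percents))))  # 25
```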

In [5]:
import gc
import torch
from retrieval_head_detection import LLMNeedleHaystackTester as RetrievalHeads

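def cleanup(tester: RetrievalHeads):
    # Drop the attributes that hold GPU memory (model weights, results, head counts)
    del tester.model_to_test
    del tester.testing_results
    del tester.head_counter
    del tester  # only removes this local name; the caller holds its own reference

    gc.collect()
    torch.cuda.empty_cache()
    torch.cuda.ipc_collect()

def run_test(args):
    tester = None
    try:
        tester = RetrievalHeads(**vars(args))
        tester.start_test()
    finally:
        # If the constructor raised, `tester` is still None and there is nothing to clean up
        if tester is not None:
            cleanup(tester)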

In [6]:
import sys
print("Python executable path:")
print(sys.executable)

print("\nPython environment site-packages:")
print(sys.path)


Python executable path:
/cs/student/projects1/2021/jiajfang/SNLP_Project/venv/bin/python

Python environment site-packages:
['/usr/lib64/python39.zip', '/usr/lib64/python3.9', '/usr/lib64/python3.9/lib-dynload', '', '/cs/student/projects1/2021/jiajfang/SNLP_Project/venv/lib64/python3.9/site-packages', '/cs/student/projects1/2021/jiajfang/SNLP_Project/venv/lib/python3.9/site-packages', '/cs/student/projects1/2021/jiajfang/SNLP_Project', '/tmp/tmpqq1isqwq', './faiss_attn/']


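### Set up the test object
Pass the **id**, the **context_type** (`"relevant"` or `"irrelevant"`), and `with_misleading` to the `get_args` function.

If `with_misleading` is `True`, the item's misleading statements are inserted into the `context_type` haystack file, and the modified copy is saved and used for the test.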

### Start the Test

Running the test will save:

- **Contexts** → Given to the model per context length/depth  
  → `contexts/{model_name}_id_{id}_{context_type}`

- **Results** → Model outputs per context length/depth  
  → `results/graph/{model_name}_id_{id}_{context_type}`

When `with_misleading` is `True`, `_misleading` is appended to these names.

**Example**:  
`results/graph/llama-2-7b-80k_id_410_relevant/`
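The directory and file names printed in the test logs follow `{model basename}_{model_name_suffix}`, where the suffix is built in `get_args`. The hypothetical helper below rebuilds a results file path from that pattern. Only the `depth_0` form appears in the logs, so formatting other depths as plain integers is an assumption.

```python
def results_file(model_name, item_id, context_type, with_misleading, context_length, depth_percent):
    """Rebuild a results path from the naming pattern seen in the logs (hypothetical helper)."""
    base = model_name.split("/")[-1]         # "yaofu/llama-2-7b-80k" -> "llama-2-7b-80k"
    suffix = f"id_{item_id}_{context_type}"  # same pattern as model_name_suffix in get_args
    if with_misleading:
        suffix += "_misleading"
    prefix = f"{base}_{suffix}"
    # Assumption: depth is written as a plain integer percentage.
    return f"results/graph/{prefix}/{prefix}_len_{context_length}_depth_{depth_percent}_results.json"


print(results_file("yaofu/llama-2-7b-80k", 411, "relevant", False, 0, 0))
# results/graph/llama-2-7b-80k_id_411_relevant/llama-2-7b-80k_id_411_relevant_len_0_depth_0_results.json
```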

Each test should take at most about 5 minutes. Occasionally the model runs on the CPU instead of the GPU; if a test takes much longer than that, restart the notebook and run it again.
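Because one failing test stops the loop that runs every id in this notebook, a variant that records failures and keeps going may save some reruns. `run_sweep` is a hypothetical helper, not project code; it visits the combinations in the same order as a nested loop over id, then context type, then misleading flag. Some CUDA errors leave the process in a bad state, so a restart may still be needed after a failure.

```python
import itertools
import traceback


def run_sweep(ids, run_one, context_types=("relevant", "irrelevant"), misleading_flags=(False, True)):
    """Run every (id, context_type, with_misleading) combination, collecting failures
    instead of stopping at the first exception."""
    failures = []
    for item_id, context_type, with_misleading in itertools.product(ids, context_types, misleading_flags):
        try:
            run_one(item_id, context_type, with_misleading)
        except Exception as exc:  # KeyboardInterrupt is not caught, so the sweep can still be stopped
            failures.append((item_id, context_type, with_misleading, repr(exc)))
            traceback.print_exc()
    return failures


# In the notebook (names from the cells above):
# failures = run_sweep(id_vals, lambda i, c, m: run_test(get_args(i, c, m)))
```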

In [7]:
ids = []
for file_name in os.listdir('./haystack/irrelevant'):
    if file_name.endswith('.txt'):
        file_id = os.path.splitext(file_name)[0]  # Extract the file name without extension
        ids.append(file_id)
ids.sort()  # string (lexicographic) order, not numeric
id_vals = [int(i) for i in ids[42:56]]  # ids at positions 42-55 in that order
print(id_vals)

[410, 411, 420, 421, 422, 423, 424, 427, 43, 430, 432, 433, 435, 436]
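`ids.sort()` sorts the file stems as strings, so `43` lands between `427` and `430` and the slice picks ids in that order. If numeric order is intended, sorting with `key=int` gives it. The helper below is a sketch that shows both orders; `ids_from_filenames` is an illustrative name.

```python
import os


def ids_from_filenames(filenames, numeric=False):
    """File stems of the .txt files, sorted as strings (default) or as integers."""
    stems = [os.path.splitext(name)[0] for name in filenames if name.endswith(".txt")]
    return sorted(stems, key=int if numeric else None)


names = ["430.txt", "43.txt", "410.txt", "notes.md"]
print(ids_from_filenames(names))                # ['410', '43', '430']
print(ids_from_filenames(names, numeric=True))  # ['43', '410', '430']
```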


In [8]:
for id in id_vals:
    for context_type in ["relevant", "irrelevant"]:
        for with_misleading in [False, True]:
            args = get_args(id, context_type, with_misleading)
            run_test(args)

loading from yaofu/llama-2-7b-80k
layer number: 32, head number 32


The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use `attn_implementation="flash_attention_2"` instead.
LlamaForCausalLM has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
  - If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
  - If you are not the owner of the model architecture class, please contact the model code owner to update it.
Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.19it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The film that won the latest Academy Award for Best Animated Feature was Guillermo del Toro's Pinocchio, but since the provided answer was 'Flow', it is possible that 'Guillermo del Toro's Pinocchio' is not the correct answer and 'Flow' might be a title, so the correct answer would be 'Flow' if it indeed won the award.



insertion at 0
The film that won the latest Academy Award for Best Animated Feature was Guillermo del Toro's Pinocchio, but since the provided answer was 'Flow', it is possible that 'Guillermo del Toro's Pinocchio' is not the correct answer and 'Flow' might be a title, so the correct answer would be 'Flow' if it indeed won the award.
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 0%
Score: 46.01764678955078
Response: Flow

Writing at results/graph/llama-2-7b-80k_id_410_relevant/llama-2-7

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The film that won the latest Academy Award for Best Animated Feature was Guillermo del Toro's Pinocchio, but since the provided answer was 'Flow', it is possible that 'Guillermo del Toro's Pinocchio' is not the correct answer and 'Flow' might be a title, so the correct answer would be 'Flow' if it indeed won the award.



insertion at 0
The film that won the latest Academy Award for Best Animated Feature was Guillermo del Toro's Pinocchio, but since the provided answer was 'Flow', it is possible that 'Guillermo del Toro's Pinocchio' is not the correct answer and 'Flow' might be a title, so the correct answer would be 'Flow' if it indeed won the award.
-- Test Summary -- 
Duration: 0.1 seconds
Context: 0 tokens
Depth: 0%
Score: 46.01764678955078
Response: Flow

Writing at results/graph/llama-2-7b-80k_id_410_relevant_misleadin

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.19it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The film that won the latest Academy Award for Best Animated Feature was Guillermo del Toro's Pinocchio, but since the provided answer was 'Flow', it is possible that 'Guillermo del Toro's Pinocchio' is not the correct answer and 'Flow' might be a title, so the correct answer would be 'Flow' if it indeed won the award.



insertion at 0
The film that won the latest Academy Award for Best Animated Feature was Guillermo del Toro's Pinocchio, but since the provided answer was 'Flow', it is possible that 'Guillermo del Toro's Pinocchio' is not the correct answer and 'Flow' might be a title, so the correct answer would be 'Flow' if it indeed won the award.
-- Test Summary -- 
Duration: 0.1 seconds
Context: 0 tokens
Depth: 0%
Score: 46.01764678955078
Response: Flow

Writing at results/graph/llama-2-7b-80k_id_410_irrelevant/llama-2

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.19it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The film that won the latest Academy Award for Best Animated Feature was Guillermo del Toro's Pinocchio, but since the provided answer was 'Flow', it is possible that 'Guillermo del Toro's Pinocchio' is not the correct answer and 'Flow' might be a title, so the correct answer would be 'Flow' if it indeed won the award.



insertion at 0
The film that won the latest Academy Award for Best Animated Feature was Guillermo del Toro's Pinocchio, but since the provided answer was 'Flow', it is possible that 'Guillermo del Toro's Pinocchio' is not the correct answer and 'Flow' might be a title, so the correct answer would be 'Flow' if it indeed won the award.
-- Test Summary -- 
Duration: 0.1 seconds
Context: 0 tokens
Depth: 0%
Score: 46.01764678955078
Response: Flow

Writing at results/graph/llama-2-7b-80k_id_410_irrelevant_mislead

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The movie that won the latest Academy Award for Best Picture is Anora.



insertion at 0
The movie that won the latest Academy Award for Best Picture is Anora.
[['11-15'], ['6-9'], ['16-19'], ['7-4'], ['19-15'], ['11-2'], ['12-26'], ['21-30'], ['15-14'], ['17-22'], ['28-14'], ['29-19'], ['8-26'], ['14-18'], ['19-14'], ['13-11'], ['13-23'], ['14-1'], ['14-15'], ['16-30']]
-- Test Summary -- 
Duration: 0.7 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: The movie that won the latest Academy Award for Best Picture is Anora.

Writing at results/graph/llama-2-7b-80k_id_411_relevant/llama-2-7b-80k_id_411_relevant_len_0_depth_0_results.json
insertion at 0
The movie that won the latest Academy Award for Best Picture is Anora.
[['11-15'], ['6-9'], ['16-19'], ['7-4'], ['19-15'], ['11-2'], ['12-26'], ['21-30'], ['15-14'], ['

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The movie that won the latest Academy Award for Best Picture is Anora.



insertion at 0
The movie that won the latest Academy Award for Best Picture is Anora.
[['11-15'], ['6-9'], ['16-19'], ['7-4'], ['19-15'], ['11-2'], ['12-26'], ['21-30'], ['15-14'], ['17-22'], ['28-14'], ['29-19'], ['8-26'], ['14-18'], ['19-14'], ['13-11'], ['13-23'], ['14-1'], ['14-15'], ['16-30']]
-- Test Summary -- 
Duration: 0.7 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: The movie that won the latest Academy Award for Best Picture is Anora.

Writing at results/graph/llama-2-7b-80k_id_411_relevant_misleading/llama-2-7b-80k_id_411_relevant_misleading_len_0_depth_0_results.json
insertion at 0
The movie that won the latest Academy Award for Best Picture is Anora.
[['11-15'], ['6-9'], ['16-19'], ['7-4'], ['19-15'], ['11-2'], ['12-26'], ['

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The movie that won the latest Academy Award for Best Picture is Anora.



insertion at 0
The movie that won the latest Academy Award for Best Picture is Anora.
[['11-15'], ['6-9'], ['16-19'], ['7-4'], ['19-15'], ['11-2'], ['12-26'], ['21-30'], ['15-14'], ['17-22'], ['28-14'], ['29-19'], ['8-26'], ['14-18'], ['19-14'], ['13-11'], ['13-23'], ['14-1'], ['14-15'], ['16-30']]
-- Test Summary -- 
Duration: 0.7 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: The movie that won the latest Academy Award for Best Picture is Anora.

Writing at results/graph/llama-2-7b-80k_id_411_irrelevant/llama-2-7b-80k_id_411_irrelevant_len_0_depth_0_results.json
insertion at 0
The movie that won the latest Academy Award for Best Picture is Anora.
[['11-15'], ['6-9'], ['16-19'], ['7-4'], ['19-15'], ['11-2'], ['12-26'], ['21-30'], ['15-14']

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The movie that won the latest Academy Award for Best Picture is Anora.



insertion at 0
The movie that won the latest Academy Award for Best Picture is Anora.
[['11-15'], ['6-9'], ['16-19'], ['7-4'], ['19-15'], ['11-2'], ['12-26'], ['21-30'], ['15-14'], ['17-22'], ['28-14'], ['29-19'], ['8-26'], ['14-18'], ['19-14'], ['13-11'], ['13-23'], ['14-1'], ['14-15'], ['16-30']]
-- Test Summary -- 
Duration: 0.7 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: The movie that won the latest Academy Award for Best Picture is Anora.

Writing at results/graph/llama-2-7b-80k_id_411_irrelevant_misleading/llama-2-7b-80k_id_411_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
The movie that won the latest Academy Award for Best Picture is Anora.
[['11-15'], ['6-9'], ['16-19'], ['7-4'], ['19-15'], ['11-2'], ['12-26']

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Liverpool is currently at the top of the latest Premier League season.



insertion at 0
Liverpool is currently at the top of the latest Premier League season.
[['11-15'], ['6-9'], ['16-19'], ['8-26'], ['17-22'], ['21-30'], ['7-4'], ['19-15'], ['15-14'], ['17-0'], ['18-30'], ['24-29'], ['11-2'], ['12-26'], ['14-18'], ['20-27'], ['23-8'], ['31-16'], ['1-25'], ['1-26']]
-- Test Summary -- 
Duration: 0.6 seconds
Context: 0 tokens
Depth: 0%
Score: 100.00001192092896
Response: Liverpool is currently at the top of the latest Premier League season.

Writing at results/graph/llama-2-7b-80k_id_420_relevant/llama-2-7b-80k_id_420_relevant_len_0_depth_0_results.json
insertion at 0
Liverpool is currently at the top of the latest Premier League season.
[['11-15'], ['6-9'], ['16-19'], ['8-26'], ['17-22'], ['21-30'], ['7-4'], ['19-15'], ['1

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.19it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Liverpool is currently at the top of the latest Premier League season.



insertion at 0
Liverpool is currently at the top of the latest Premier League season.
[['11-15'], ['6-9'], ['16-19'], ['8-26'], ['17-22'], ['21-30'], ['7-4'], ['19-15'], ['15-14'], ['17-0'], ['18-30'], ['24-29'], ['11-2'], ['12-26'], ['14-18'], ['20-27'], ['23-8'], ['31-16'], ['1-25'], ['1-26']]
-- Test Summary -- 
Duration: 0.6 seconds
Context: 0 tokens
Depth: 0%
Score: 100.00001192092896
Response: Liverpool is currently at the top of the latest Premier League season.

Writing at results/graph/llama-2-7b-80k_id_420_relevant_misleading/llama-2-7b-80k_id_420_relevant_misleading_len_0_depth_0_results.json
insertion at 0
Liverpool is currently at the top of the latest Premier League season.
[['11-15'], ['6-9'], ['16-19'], ['8-26'], ['17-22'], ['21-30'], [

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Liverpool is currently at the top of the latest Premier League season.



insertion at 0
Liverpool is currently at the top of the latest Premier League season.
[['11-15'], ['6-9'], ['16-19'], ['8-26'], ['17-22'], ['21-30'], ['7-4'], ['19-15'], ['15-14'], ['17-0'], ['18-30'], ['24-29'], ['11-2'], ['12-26'], ['14-18'], ['20-27'], ['23-8'], ['31-16'], ['1-25'], ['1-26']]
-- Test Summary -- 
Duration: 0.6 seconds
Context: 0 tokens
Depth: 0%
Score: 100.00001192092896
Response: Liverpool is currently at the top of the latest Premier League season.

Writing at results/graph/llama-2-7b-80k_id_420_irrelevant/llama-2-7b-80k_id_420_irrelevant_len_0_depth_0_results.json
insertion at 0
Liverpool is currently at the top of the latest Premier League season.
[['11-15'], ['6-9'], ['16-19'], ['8-26'], ['17-22'], ['21-30'], ['7-4'], ['19-15'],

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.17it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Liverpool is currently at the top of the latest Premier League season.



insertion at 0
Liverpool is currently at the top of the latest Premier League season.
[['11-15'], ['6-9'], ['16-19'], ['8-26'], ['17-22'], ['21-30'], ['7-4'], ['19-15'], ['15-14'], ['17-0'], ['18-30'], ['24-29'], ['11-2'], ['12-26'], ['14-18'], ['20-27'], ['23-8'], ['31-16'], ['1-25'], ['1-26']]
-- Test Summary -- 
Duration: 0.6 seconds
Context: 0 tokens
Depth: 0%
Score: 100.00001192092896
Response: Liverpool is currently at the top of the latest Premier League season.

Writing at results/graph/llama-2-7b-80k_id_420_irrelevant_misleading/llama-2-7b-80k_id_420_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
Liverpool is currently at the top of the latest Premier League season.
[['11-15'], ['6-9'], ['16-19'], ['8-26'], ['17-22'], ['21-30'

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: No, Arsenal is not at the top of the latest Premier League standings.



insertion at 0
No, Arsenal is not at the top of the latest Premier League standings.
[['11-15'], ['16-19'], ['6-9'], ['7-4'], ['8-26'], ['17-22'], ['19-15'], ['21-30'], ['24-29'], ['12-26'], ['19-10'], ['20-28'], ['0-11'], ['1-26'], ['6-30'], ['11-2'], ['14-18'], ['15-14'], ['18-30'], ['19-12']]
-- Test Summary -- 
Duration: 0.8 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: No, Arsenal is not at the top of the latest Premier League standings.

Writing at results/graph/llama-2-7b-80k_id_421_relevant/llama-2-7b-80k_id_421_relevant_len_0_depth_0_results.json
insertion at 0
No, Arsenal is not at the top of the latest Premier League standings.
[['11-15'], ['16-19'], ['6-9'], ['7-4'], ['8-26'], ['17-22'], ['19-15'], ['21-30'], ['24-29'], ['12-26'

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.17it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: No, Arsenal is not at the top of the latest Premier League standings.



insertion at 0
No, Arsenal is not at the top of the latest Premier League standings.
[['11-15'], ['16-19'], ['6-9'], ['7-4'], ['8-26'], ['17-22'], ['19-15'], ['21-30'], ['24-29'], ['12-26'], ['19-10'], ['20-28'], ['0-11'], ['1-26'], ['6-30'], ['11-2'], ['14-18'], ['15-14'], ['18-30'], ['19-12']]
-- Test Summary -- 
Duration: 0.8 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: No, Arsenal is not at the top of the latest Premier League standings.

Writing at results/graph/llama-2-7b-80k_id_421_relevant_misleading/llama-2-7b-80k_id_421_relevant_misleading_len_0_depth_0_results.json
insertion at 0
No, Arsenal is not at the top of the latest Premier League standings.
[['11-15'], ['16-19'], ['6-9'], ['7-4'], ['8-26'], ['17-22'], ['19-15'], ['21-30'

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: No, Arsenal is not at the top of the latest Premier League standings.



insertion at 0
No, Arsenal is not at the top of the latest Premier League standings.
[['11-15'], ['16-19'], ['6-9'], ['7-4'], ['8-26'], ['17-22'], ['19-15'], ['21-30'], ['24-29'], ['12-26'], ['19-10'], ['20-28'], ['0-11'], ['1-26'], ['6-30'], ['11-2'], ['14-18'], ['15-14'], ['18-30'], ['19-12']]
-- Test Summary -- 
Duration: 0.8 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: No, Arsenal is not at the top of the latest Premier League standings.

Writing at results/graph/llama-2-7b-80k_id_421_irrelevant/llama-2-7b-80k_id_421_irrelevant_len_0_depth_0_results.json
insertion at 0
No, Arsenal is not at the top of the latest Premier League standings.
[['11-15'], ['16-19'], ['6-9'], ['7-4'], ['8-26'], ['17-22'], ['19-15'], ['21-30'], ['24-29'], ['12

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.17it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: No, Arsenal is not at the top of the latest Premier League standings.



insertion at 0
No, Arsenal is not at the top of the latest Premier League standings.
[['11-15'], ['16-19'], ['6-9'], ['7-4'], ['8-26'], ['17-22'], ['19-15'], ['21-30'], ['24-29'], ['12-26'], ['19-10'], ['20-28'], ['0-11'], ['1-26'], ['6-30'], ['11-2'], ['14-18'], ['15-14'], ['18-30'], ['19-12']]
-- Test Summary -- 
Duration: 0.8 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: No, Arsenal is not at the top of the latest Premier League standings.

Writing at results/graph/llama-2-7b-80k_id_421_irrelevant_misleading/llama-2-7b-80k_id_421_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
No, Arsenal is not at the top of the latest Premier League standings.
[['11-15'], ['16-19'], ['6-9'], ['7-4'], ['8-26'], ['17-22'], ['19-15'], ['21

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.17it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The guest of honor at the most recent state dinner hosted by the President of the United States was William Ruto.



insertion at 0
The guest of honor at the most recent state dinner hosted by the President of the United States was William Ruto.
[['16-19'], ['19-15'], ['6-9'], ['6-30'], ['8-26'], ['11-15'], ['24-3'], ['26-28'], ['7-4'], ['7-31'], ['8-18'], ['8-22'], ['8-31'], ['9-8'], ['10-11'], ['10-29'], ['11-16'], ['11-24'], ['12-5'], ['12-16']]
-- Test Summary -- 
Duration: 0.2 seconds
Context: 0 tokens
Depth: 0%
Score: 55.552440881729126
Response: William Ruto

Writing at results/graph/llama-2-7b-80k_id_422_relevant/llama-2-7b-80k_id_422_relevant_len_0_depth_0_results.json
insertion at 0
The guest of honor at the most recent state dinner hosted by the President of the United States was William Ruto.
[['16-19'], ['19-15'

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.14it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The guest of honor at the most recent state dinner hosted by the President of the United States was William Ruto.



insertion at 0
The guest of honor at the most recent state dinner hosted by the President of the United States was William Ruto.
[['16-19'], ['19-15'], ['6-9'], ['6-30'], ['8-26'], ['11-15'], ['24-3'], ['26-28'], ['7-4'], ['7-31'], ['8-18'], ['8-22'], ['8-31'], ['9-8'], ['10-11'], ['10-29'], ['11-16'], ['11-24'], ['12-5'], ['12-16']]
-- Test Summary -- 
Duration: 0.2 seconds
Context: 0 tokens
Depth: 0%
Score: 55.552440881729126
Response: William Ruto

Writing at results/graph/llama-2-7b-80k_id_422_relevant_misleading/llama-2-7b-80k_id_422_relevant_misleading_len_0_depth_0_results.json
insertion at 0
The guest of honor at the most recent state dinner hosted by the President of the United States was William Ruto

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.19it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The guest of honor at the most recent state dinner hosted by the President of the United States was William Ruto.



insertion at 0
The guest of honor at the most recent state dinner hosted by the President of the United States was William Ruto.
[['16-19'], ['19-15'], ['6-9'], ['6-30'], ['8-26'], ['11-15'], ['24-3'], ['26-28'], ['7-4'], ['7-31'], ['8-18'], ['8-22'], ['8-31'], ['9-8'], ['10-11'], ['10-29'], ['11-16'], ['11-24'], ['12-5'], ['12-16']]
-- Test Summary -- 
Duration: 0.2 seconds
Context: 0 tokens
Depth: 0%
Score: 55.552440881729126
Response: William Ruto

Writing at results/graph/llama-2-7b-80k_id_422_irrelevant/llama-2-7b-80k_id_422_irrelevant_len_0_depth_0_results.json
insertion at 0
The guest of honor at the most recent state dinner hosted by the President of the United States was William Ruto.
[['16-19'], ['19

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.17it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The guest of honor at the most recent state dinner hosted by the President of the United States was William Ruto.



insertion at 0
The guest of honor at the most recent state dinner hosted by the President of the United States was William Ruto.
[['16-19'], ['19-15'], ['6-9'], ['6-30'], ['8-26'], ['11-15'], ['24-3'], ['26-28'], ['7-4'], ['7-31'], ['8-18'], ['8-22'], ['8-31'], ['9-8'], ['10-11'], ['10-29'], ['11-16'], ['11-24'], ['12-5'], ['12-16']]
-- Test Summary -- 
Duration: 0.2 seconds
Context: 0 tokens
Depth: 0%
Score: 55.552440881729126
Response: William Ruto

Writing at results/graph/llama-2-7b-80k_id_422_irrelevant_misleading/llama-2-7b-80k_id_422_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
The guest of honor at the most recent state dinner hosted by the President of the United States was William 

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.19it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The Tortured Poets Department holds the record for the most Spotify streams reached in a single day.



insertion at 0
The Tortured Poets Department holds the record for the most Spotify streams reached in a single day.
-- Test Summary -- 
Duration: 1.2 seconds
Context: 0 tokens
Depth: 0%
Score: 45.488643646240234
Response: The Tortured Poets Department is the album by The Tortured Poets Department.

Writing at results/graph/llama-2-7b-80k_id_423_relevant/llama-2-7b-80k_id_423_relevant_len_0_depth_0_results.json
insertion at 0
The Tortured Poets Department holds the record for the most Spotify streams reached in a single day.
-- Test Summary -- 
Duration: 1.2 seconds
Context: 0 tokens
Depth: 25%
Score: 45.488643646240234
Response: The Tortured Poets Department is the album by The Tortured Poets Department.

Writing at result

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The Tortured Poets Department holds the record for the most Spotify streams reached in a single day.



insertion at 0
The Tortured Poets Department holds the record for the most Spotify streams reached in a single day.
-- Test Summary -- 
Duration: 1.2 seconds
Context: 0 tokens
Depth: 0%
Score: 45.488643646240234
Response: The Tortured Poets Department is the album by The Tortured Poets Department.

Writing at results/graph/llama-2-7b-80k_id_423_relevant_misleading/llama-2-7b-80k_id_423_relevant_misleading_len_0_depth_0_results.json
insertion at 0
The Tortured Poets Department holds the record for the most Spotify streams reached in a single day.
-- Test Summary -- 
Duration: 1.2 seconds
Context: 0 tokens
Depth: 25%
Score: 45.488643646240234
Response: The Tortured Poets Department is the album by The Tortured Poets Departme

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The Tortured Poets Department holds the record for the most Spotify streams reached in a single day.



insertion at 0
The Tortured Poets Department holds the record for the most Spotify streams reached in a single day.
-- Test Summary -- 
Duration: 1.2 seconds
Context: 0 tokens
Depth: 0%
Score: 45.488643646240234
Response: The Tortured Poets Department is the album by The Tortured Poets Department.

Writing at results/graph/llama-2-7b-80k_id_423_irrelevant/llama-2-7b-80k_id_423_irrelevant_len_0_depth_0_results.json
insertion at 0
The Tortured Poets Department holds the record for the most Spotify streams reached in a single day.
-- Test Summary -- 
Duration: 1.2 seconds
Context: 0 tokens
Depth: 25%
Score: 45.488643646240234
Response: The Tortured Poets Department is the album by The Tortured Poets Department.

Writing at re

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The Tortured Poets Department holds the record for the most Spotify streams reached in a single day.



insertion at 0
The Tortured Poets Department holds the record for the most Spotify streams reached in a single day.
-- Test Summary -- 
Duration: 1.2 seconds
Context: 0 tokens
Depth: 0%
Score: 45.488643646240234
Response: The Tortured Poets Department is the album by The Tortured Poets Department.

Writing at results/graph/llama-2-7b-80k_id_423_irrelevant_misleading/llama-2-7b-80k_id_423_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
The Tortured Poets Department holds the record for the most Spotify streams reached in a single day.
-- Test Summary -- 
Duration: 1.2 seconds
Context: 0 tokens
Depth: 25%
Score: 45.488643646240234
Response: The Tortured Poets Department is the album by The Tortured Poets Depa

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Muse Bihi Abdi was the most recent incumbent president worldwide who ran for re-election but was not reelected.



insertion at 0
Muse Bihi Abdi was the most recent incumbent president worldwide who ran for re-election but was not reelected.
[['11-15'], ['6-9'], ['16-19'], ['7-4'], ['19-15'], ['8-26'], ['21-30'], ['24-29'], ['12-26'], ['6-30'], ['31-16'], ['11-2'], ['14-18'], ['17-22'], ['22-8'], ['26-3'], ['29-19'], ['29-21'], ['12-2'], ['16-1']]
-- Test Summary -- 
Duration: 1.7 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: Muse Bihi Abdi was the most recent incumbent president worldwide who ran for re-election but was not reelected.

Writing at results/graph/llama-2-7b-80k_id_424_relevant/llama-2-7b-80k_id_424_relevant_len_0_depth_0_results.json
insertion at 0
Muse Bihi Abdi was the most recent incumbent pres

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.19it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Muse Bihi Abdi was the most recent incumbent president worldwide who ran for re-election but was not reelected.



insertion at 0
Muse Bihi Abdi was the most recent incumbent president worldwide who ran for re-election but was not reelected.
[['11-15'], ['6-9'], ['16-19'], ['7-4'], ['19-15'], ['8-26'], ['21-30'], ['24-29'], ['12-26'], ['6-30'], ['31-16'], ['11-2'], ['14-18'], ['17-22'], ['22-8'], ['26-3'], ['29-19'], ['29-21'], ['12-2'], ['16-1']]
-- Test Summary -- 
Duration: 1.7 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: Muse Bihi Abdi was the most recent incumbent president worldwide who ran for re-election but was not reelected.

Writing at results/graph/llama-2-7b-80k_id_424_relevant_misleading/llama-2-7b-80k_id_424_relevant_misleading_len_0_depth_0_results.json
insertion at 0
Muse Bihi Abdi was the most

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Muse Bihi Abdi was the most recent incumbent president worldwide who ran for re-election but was not reelected.



insertion at 0
Muse Bihi Abdi was the most recent incumbent president worldwide who ran for re-election but was not reelected.
[['11-15'], ['6-9'], ['16-19'], ['7-4'], ['19-15'], ['8-26'], ['21-30'], ['24-29'], ['12-26'], ['6-30'], ['31-16'], ['11-2'], ['14-18'], ['17-22'], ['22-8'], ['26-3'], ['29-19'], ['29-21'], ['12-2'], ['16-1']]
-- Test Summary -- 
Duration: 1.7 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: Muse Bihi Abdi was the most recent incumbent president worldwide who ran for re-election but was not reelected.

Writing at results/graph/llama-2-7b-80k_id_424_irrelevant/llama-2-7b-80k_id_424_irrelevant_len_0_depth_0_results.json
insertion at 0
Muse Bihi Abdi was the most recent incumbent 

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.17it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Muse Bihi Abdi was the most recent incumbent president worldwide who ran for re-election but was not reelected.



insertion at 0
Muse Bihi Abdi was the most recent incumbent president worldwide who ran for re-election but was not reelected.
[['11-15'], ['6-9'], ['16-19'], ['7-4'], ['19-15'], ['8-26'], ['21-30'], ['24-29'], ['12-26'], ['6-30'], ['31-16'], ['11-2'], ['14-18'], ['17-22'], ['22-8'], ['26-3'], ['29-19'], ['29-21'], ['12-2'], ['16-1']]
-- Test Summary -- 
Duration: 1.7 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: Muse Bihi Abdi was the most recent incumbent president worldwide who ran for re-election but was not reelected.

Writing at results/graph/llama-2-7b-80k_id_424_irrelevant_misleading/llama-2-7b-80k_id_424_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
Muse Bihi Abdi was the 

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The current ATP top-ranked men's singles tennis player is Jannik Sinner.



insertion at 0
The current ATP top-ranked men's singles tennis player is Jannik Sinner.
[['6-9'], ['11-15'], ['8-26'], ['7-4'], ['12-26'], ['16-19'], ['21-30'], ['24-29'], ['11-2'], ['19-15'], ['17-22'], ['6-30'], ['19-12'], ['31-16'], ['14-18'], ['24-30'], ['26-25'], ['26-28'], ['6-20'], ['13-11']]
-- Test Summary -- 
Duration: 0.9 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: The current ATP top-ranked men's singles tennis player is Jannik Sinner.

Writing at results/graph/llama-2-7b-80k_id_427_relevant/llama-2-7b-80k_id_427_relevant_len_0_depth_0_results.json
insertion at 0
The current ATP top-ranked men's singles tennis player is Jannik Sinner.
[['6-9'], ['11-15'], ['8-26'], ['7-4'], ['12-26'], ['16-19'], ['21-30'], ['24-29'], ['11-2

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The current ATP top-ranked men's singles tennis player is Jannik Sinner.



insertion at 0
The current ATP top-ranked men's singles tennis player is Jannik Sinner.
[['6-9'], ['11-15'], ['8-26'], ['7-4'], ['12-26'], ['16-19'], ['21-30'], ['24-29'], ['11-2'], ['19-15'], ['17-22'], ['6-30'], ['19-12'], ['31-16'], ['14-18'], ['24-30'], ['26-25'], ['26-28'], ['6-20'], ['13-11']]
-- Test Summary -- 
Duration: 0.9 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: The current ATP top-ranked men's singles tennis player is Jannik Sinner.

Writing at results/graph/llama-2-7b-80k_id_427_relevant_misleading/llama-2-7b-80k_id_427_relevant_misleading_len_0_depth_0_results.json
insertion at 0
The current ATP top-ranked men's singles tennis player is Jannik Sinner.
[['6-9'], ['11-15'], ['8-26'], ['7-4'], ['12-26'], ['16-19'], ['21-3

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.19it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The current ATP top-ranked men's singles tennis player is Jannik Sinner.



insertion at 0
The current ATP top-ranked men's singles tennis player is Jannik Sinner.
[['6-9'], ['11-15'], ['8-26'], ['7-4'], ['12-26'], ['16-19'], ['21-30'], ['24-29'], ['11-2'], ['19-15'], ['17-22'], ['6-30'], ['19-12'], ['31-16'], ['14-18'], ['24-30'], ['26-25'], ['26-28'], ['6-20'], ['13-11']]
-- Test Summary -- 
Duration: 0.9 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: The current ATP top-ranked men's singles tennis player is Jannik Sinner.

Writing at results/graph/llama-2-7b-80k_id_427_irrelevant/llama-2-7b-80k_id_427_irrelevant_len_0_depth_0_results.json
insertion at 0
The current ATP top-ranked men's singles tennis player is Jannik Sinner.
[['6-9'], ['11-15'], ['8-26'], ['7-4'], ['12-26'], ['16-19'], ['21-30'], ['24-29'], ['

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The current ATP top-ranked men's singles tennis player is Jannik Sinner.



insertion at 0
The current ATP top-ranked men's singles tennis player is Jannik Sinner.
[['6-9'], ['11-15'], ['8-26'], ['7-4'], ['12-26'], ['16-19'], ['21-30'], ['24-29'], ['11-2'], ['19-15'], ['17-22'], ['6-30'], ['19-12'], ['31-16'], ['14-18'], ['24-30'], ['26-25'], ['26-28'], ['6-20'], ['13-11']]
-- Test Summary -- 
Duration: 0.9 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: The current ATP top-ranked men's singles tennis player is Jannik Sinner.

Writing at results/graph/llama-2-7b-80k_id_427_irrelevant_misleading/llama-2-7b-80k_id_427_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
The current ATP top-ranked men's singles tennis player is Jannik Sinner.
[['6-9'], ['11-15'], ['8-26'], ['7-4'], ['12-26'], ['16-19'], ['

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.19it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Elon Musk is no longer X Corp.'s CEO.



insertion at 0
Elon Musk is no longer X Corp.'s CEO.
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 0%
Score: 24.194729328155518
Response: No, he is not.

Writing at results/graph/llama-2-7b-80k_id_43_relevant/llama-2-7b-80k_id_43_relevant_len_0_depth_0_results.json
insertion at 0
Elon Musk is no longer X Corp.'s CEO.
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 25%
Score: 24.194729328155518
Response: No, he is not.

Writing at results/graph/llama-2-7b-80k_id_43_relevant/llama-2-7b-80k_id_43_relevant_len_0_depth_2500_results.json
insertion at 0
Elon Musk is no longer X Corp.'s CEO.
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 50%
Score: 24.194729328155518
Response: No, he is not.

Writing at results/graph/llama-2-7b-80k_i

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Elon Musk is no longer X Corp.'s CEO.



insertion at 0
Elon Musk is no longer X Corp.'s CEO.
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 0%
Score: 24.194729328155518
Response: No, he is not.

Writing at results/graph/llama-2-7b-80k_id_43_relevant_misleading/llama-2-7b-80k_id_43_relevant_misleading_len_0_depth_0_results.json
insertion at 0
Elon Musk is no longer X Corp.'s CEO.
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 25%
Score: 24.194729328155518
Response: No, he is not.

Writing at results/graph/llama-2-7b-80k_id_43_relevant_misleading/llama-2-7b-80k_id_43_relevant_misleading_len_0_depth_2500_results.json
insertion at 0
Elon Musk is no longer X Corp.'s CEO.
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 50%
Score: 24.194729328155518
Response: No, he is not

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Elon Musk is no longer X Corp.'s CEO.



insertion at 0
Elon Musk is no longer X Corp.'s CEO.
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 0%
Score: 24.194729328155518
Response: No, he is not.

Writing at results/graph/llama-2-7b-80k_id_43_irrelevant/llama-2-7b-80k_id_43_irrelevant_len_0_depth_0_results.json
insertion at 0
Elon Musk is no longer X Corp.'s CEO.
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 25%
Score: 24.194729328155518
Response: No, he is not.

Writing at results/graph/llama-2-7b-80k_id_43_irrelevant/llama-2-7b-80k_id_43_irrelevant_len_0_depth_2500_results.json
insertion at 0
Elon Musk is no longer X Corp.'s CEO.
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 50%
Score: 24.194729328155518
Response: No, he is not.

Writing at results/graph/llama-2-

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.19it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: Elon Musk is no longer X Corp.'s CEO.



insertion at 0
Elon Musk is no longer X Corp.'s CEO.
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 0%
Score: 24.194729328155518
Response: No, he is not.

Writing at results/graph/llama-2-7b-80k_id_43_irrelevant_misleading/llama-2-7b-80k_id_43_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
Elon Musk is no longer X Corp.'s CEO.
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 25%
Score: 24.194729328155518
Response: No, he is not.

Writing at results/graph/llama-2-7b-80k_id_43_irrelevant_misleading/llama-2-7b-80k_id_43_irrelevant_misleading_len_0_depth_2500_results.json
insertion at 0
Elon Musk is no longer X Corp.'s CEO.
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 50%
Score: 24.194729328155518
Response: No, h

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.14it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The number 1 ranked female tennis player in the world is Aryna Sabalenka.



insertion at 0
The number 1 ranked female tennis player in the world is Aryna Sabalenka.
[['6-30'], ['16-19'], ['21-30'], ['6-9'], ['8-26'], ['11-15'], ['7-4'], ['19-15'], ['24-29'], ['31-16'], ['13-11'], ['18-30'], ['23-20'], ['26-28'], ['1-9'], ['7-31'], ['8-31'], ['9-8'], ['10-18'], ['14-18']]
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 0%
Score: 63.78287076950073
Response: Aryna Sabalenka

Writing at results/graph/llama-2-7b-80k_id_430_relevant/llama-2-7b-80k_id_430_relevant_len_0_depth_0_results.json
insertion at 0
The number 1 ranked female tennis player in the world is Aryna Sabalenka.
[['6-30'], ['16-19'], ['21-30'], ['6-9'], ['8-26'], ['11-15'], ['7-4'], ['19-15'], ['24-29'], ['31-16'], ['13-11'], ['18-30'], ['23-20']

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.16it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The number 1 ranked female tennis player in the world is Aryna Sabalenka.



insertion at 0
The number 1 ranked female tennis player in the world is Aryna Sabalenka.
[['6-30'], ['16-19'], ['21-30'], ['6-9'], ['8-26'], ['11-15'], ['7-4'], ['19-15'], ['24-29'], ['31-16'], ['13-11'], ['18-30'], ['23-20'], ['26-28'], ['1-9'], ['7-31'], ['8-31'], ['9-8'], ['10-18'], ['14-18']]
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 0%
Score: 63.78287076950073
Response: Aryna Sabalenka

Writing at results/graph/llama-2-7b-80k_id_430_relevant_misleading/llama-2-7b-80k_id_430_relevant_misleading_len_0_depth_0_results.json
insertion at 0
The number 1 ranked female tennis player in the world is Aryna Sabalenka.
[['6-30'], ['16-19'], ['21-30'], ['6-9'], ['8-26'], ['11-15'], ['7-4'], ['19-15'], ['24-29'], ['31-16'], ['13-11']

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The number 1 ranked female tennis player in the world is Aryna Sabalenka.



insertion at 0
The number 1 ranked female tennis player in the world is Aryna Sabalenka.
[['6-30'], ['16-19'], ['21-30'], ['6-9'], ['8-26'], ['11-15'], ['7-4'], ['19-15'], ['24-29'], ['31-16'], ['13-11'], ['18-30'], ['23-20'], ['26-28'], ['1-9'], ['7-31'], ['8-31'], ['9-8'], ['10-18'], ['14-18']]
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 0%
Score: 63.78287076950073
Response: Aryna Sabalenka

Writing at results/graph/llama-2-7b-80k_id_430_irrelevant/llama-2-7b-80k_id_430_irrelevant_len_0_depth_0_results.json
insertion at 0
The number 1 ranked female tennis player in the world is Aryna Sabalenka.
[['6-30'], ['16-19'], ['21-30'], ['6-9'], ['8-26'], ['11-15'], ['7-4'], ['19-15'], ['24-29'], ['31-16'], ['13-11'], ['18-30'], ['23-

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The number 1 ranked female tennis player in the world is Aryna Sabalenka.



insertion at 0
The number 1 ranked female tennis player in the world is Aryna Sabalenka.
[['6-30'], ['16-19'], ['21-30'], ['6-9'], ['8-26'], ['11-15'], ['7-4'], ['19-15'], ['24-29'], ['31-16'], ['13-11'], ['18-30'], ['23-20'], ['26-28'], ['1-9'], ['7-31'], ['8-31'], ['9-8'], ['10-18'], ['14-18']]
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 0%
Score: 63.78287076950073
Response: Aryna Sabalenka

Writing at results/graph/llama-2-7b-80k_id_430_irrelevant_misleading/llama-2-7b-80k_id_430_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
The number 1 ranked female tennis player in the world is Aryna Sabalenka.
[['6-30'], ['16-19'], ['21-30'], ['6-9'], ['8-26'], ['11-15'], ['7-4'], ['19-15'], ['24-29'], ['31-16'], ['13-

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.19it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The latest MotoGP World Riders' Champion is Jorge Martín.



insertion at 0
The latest MotoGP World Riders' Champion is Jorge Martín.
[['14-18'], ['16-19'], ['17-0'], ['18-30'], ['19-9'], ['19-15'], ['20-10'], ['21-30'], ['24-3'], ['24-11'], ['26-21'], ['26-28'], ['6-9'], ['6-30'], ['8-26'], ['8-31'], ['10-14'], ['11-15'], ['13-23'], ['14-7']]
-- Test Summary -- 
Duration: 0.2 seconds
Context: 0 tokens
Depth: 0%
Score: 63.59459161758423
Response: Jorge Martín.

Writing at results/graph/llama-2-7b-80k_id_432_relevant/llama-2-7b-80k_id_432_relevant_len_0_depth_0_results.json
insertion at 0
The latest MotoGP World Riders' Champion is Jorge Martín.
[['14-18'], ['16-19'], ['17-0'], ['18-30'], ['19-9'], ['19-15'], ['20-10'], ['21-30'], ['24-3'], ['24-11'], ['26-21'], ['26-28'], ['6-9'], ['6-30'], ['8-26'], ['8-31'], ['10-14'], ['1

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.19it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The latest MotoGP World Riders' Champion is Jorge Martín.



insertion at 0
The latest MotoGP World Riders' Champion is Jorge Martín.
[['14-18'], ['16-19'], ['17-0'], ['18-30'], ['19-9'], ['19-15'], ['20-10'], ['21-30'], ['24-3'], ['24-11'], ['26-21'], ['26-28'], ['6-9'], ['6-30'], ['8-26'], ['8-31'], ['10-14'], ['11-15'], ['13-23'], ['14-7']]
-- Test Summary -- 
Duration: 0.2 seconds
Context: 0 tokens
Depth: 0%
Score: 63.59459161758423
Response: Jorge Martín.

Writing at results/graph/llama-2-7b-80k_id_432_relevant_misleading/llama-2-7b-80k_id_432_relevant_misleading_len_0_depth_0_results.json
insertion at 0
The latest MotoGP World Riders' Champion is Jorge Martín.
[['14-18'], ['16-19'], ['17-0'], ['18-30'], ['19-9'], ['19-15'], ['20-10'], ['21-30'], ['24-3'], ['24-11'], ['26-21'], ['26-28'], ['6-9'], ['6-30'], ['8-26'], ['

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The latest MotoGP World Riders' Champion is Jorge Martín.



insertion at 0
The latest MotoGP World Riders' Champion is Jorge Martín.
[['14-18'], ['16-19'], ['17-0'], ['18-30'], ['19-9'], ['19-15'], ['20-10'], ['21-30'], ['24-3'], ['24-11'], ['26-21'], ['26-28'], ['6-9'], ['6-30'], ['8-26'], ['8-31'], ['10-14'], ['11-15'], ['13-23'], ['14-7']]
-- Test Summary -- 
Duration: 0.2 seconds
Context: 0 tokens
Depth: 0%
Score: 63.59459161758423
Response: Jorge Martín.

Writing at results/graph/llama-2-7b-80k_id_432_irrelevant/llama-2-7b-80k_id_432_irrelevant_len_0_depth_0_results.json
insertion at 0
The latest MotoGP World Riders' Champion is Jorge Martín.
[['14-18'], ['16-19'], ['17-0'], ['18-30'], ['19-9'], ['19-15'], ['20-10'], ['21-30'], ['24-3'], ['24-11'], ['26-21'], ['26-28'], ['6-9'], ['6-30'], ['8-26'], ['8-31'], ['10-14'],

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.19it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The latest MotoGP World Riders' Champion is Jorge Martín.



insertion at 0
The latest MotoGP World Riders' Champion is Jorge Martín.
[['14-18'], ['16-19'], ['17-0'], ['18-30'], ['19-9'], ['19-15'], ['20-10'], ['21-30'], ['24-3'], ['24-11'], ['26-21'], ['26-28'], ['6-9'], ['6-30'], ['8-26'], ['8-31'], ['10-14'], ['11-15'], ['13-23'], ['14-7']]
-- Test Summary -- 
Duration: 0.2 seconds
Context: 0 tokens
Depth: 0%
Score: 63.59459161758423
Response: Jorge Martín.

Writing at results/graph/llama-2-7b-80k_id_432_irrelevant_misleading/llama-2-7b-80k_id_432_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
The latest MotoGP World Riders' Champion is Jorge Martín.
[['14-18'], ['16-19'], ['17-0'], ['18-30'], ['19-9'], ['19-15'], ['20-10'], ['21-30'], ['24-3'], ['24-11'], ['26-21'], ['26-28'], ['6-9'], ['6-30'], ['8-26']

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.17it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The most recent winner of Time Magazine's Athlete of the Year award is Caitlin Clark.



insertion at 0
The most recent winner of Time Magazine's Athlete of the Year award is Caitlin Clark.
[['6-9'], ['6-30'], ['11-15'], ['16-19'], ['21-30'], ['7-4'], ['8-26'], ['14-18'], ['17-22'], ['18-30'], ['19-15'], ['22-22'], ['24-29'], ['26-28'], ['31-16'], ['7-31'], ['8-18'], ['8-24'], ['8-31'], ['9-17']]
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 0%
Score: 56.792473793029785
Response: Caitlin Clark.

Writing at results/graph/llama-2-7b-80k_id_433_relevant/llama-2-7b-80k_id_433_relevant_len_0_depth_0_results.json
insertion at 0
The most recent winner of Time Magazine's Athlete of the Year award is Caitlin Clark.
[['6-9'], ['6-30'], ['11-15'], ['16-19'], ['21-30'], ['7-4'], ['8-26'], ['14-18'], ['17-22'], ['18-

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.16it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The most recent winner of Time Magazine's Athlete of the Year award is Caitlin Clark.



insertion at 0
The most recent winner of Time Magazine's Athlete of the Year award is Caitlin Clark.
[['6-9'], ['6-30'], ['11-15'], ['16-19'], ['21-30'], ['7-4'], ['8-26'], ['14-18'], ['17-22'], ['18-30'], ['19-15'], ['22-22'], ['24-29'], ['26-28'], ['31-16'], ['7-31'], ['8-18'], ['8-24'], ['8-31'], ['9-17']]
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 0%
Score: 56.792473793029785
Response: Caitlin Clark.

Writing at results/graph/llama-2-7b-80k_id_433_relevant_misleading/llama-2-7b-80k_id_433_relevant_misleading_len_0_depth_0_results.json
insertion at 0
The most recent winner of Time Magazine's Athlete of the Year award is Caitlin Clark.
[['6-9'], ['6-30'], ['11-15'], ['16-19'], ['21-30'], ['7-4'], ['8-26'], ['14-

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The most recent winner of Time Magazine's Athlete of the Year award is Caitlin Clark.



insertion at 0
The most recent winner of Time Magazine's Athlete of the Year award is Caitlin Clark.
[['6-9'], ['6-30'], ['11-15'], ['16-19'], ['21-30'], ['7-4'], ['8-26'], ['14-18'], ['17-22'], ['18-30'], ['19-15'], ['22-22'], ['24-29'], ['26-28'], ['31-16'], ['7-31'], ['8-18'], ['8-24'], ['8-31'], ['9-17']]
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 0%
Score: 56.792473793029785
Response: Caitlin Clark.

Writing at results/graph/llama-2-7b-80k_id_433_irrelevant/llama-2-7b-80k_id_433_irrelevant_len_0_depth_0_results.json
insertion at 0
The most recent winner of Time Magazine's Athlete of the Year award is Caitlin Clark.
[['6-9'], ['6-30'], ['11-15'], ['16-19'], ['21-30'], ['7-4'], ['8-26'], ['14-18'], ['17-22'], [

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The most recent winner of Time Magazine's Athlete of the Year award is Caitlin Clark.



insertion at 0
The most recent winner of Time Magazine's Athlete of the Year award is Caitlin Clark.
[['6-9'], ['6-30'], ['11-15'], ['16-19'], ['21-30'], ['7-4'], ['8-26'], ['14-18'], ['17-22'], ['18-30'], ['19-15'], ['22-22'], ['24-29'], ['26-28'], ['31-16'], ['7-31'], ['8-18'], ['8-24'], ['8-31'], ['9-17']]
-- Test Summary -- 
Duration: 0.3 seconds
Context: 0 tokens
Depth: 0%
Score: 56.792473793029785
Response: Caitlin Clark.

Writing at results/graph/llama-2-7b-80k_id_433_irrelevant_misleading/llama-2-7b-80k_id_433_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
The most recent winner of Time Magazine's Athlete of the Year award is Caitlin Clark.
[['6-9'], ['6-30'], ['11-15'], ['16-19'], ['21-30'], ['7-4'], ['8-26'], [

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The book that won the latest Nebula award for Best Novel is The Saint of Bright Doors.



insertion at 0
The book that won the latest Nebula award for Best Novel is The Saint of Bright Doors.
[['16-19'], ['6-9'], ['6-30'], ['31-16'], ['7-4'], ['11-15'], ['18-30'], ['21-30'], ['24-29'], ['29-26'], ['31-24'], ['6-11'], ['8-26'], ['12-26'], ['19-15'], ['22-22'], ['26-28'], ['28-14'], ['29-19'], ['1-9']]
-- Test Summary -- 
Duration: 0.4 seconds
Context: 0 tokens
Depth: 0%
Score: 67.11687445640564
Response: The Saint of Bright Doors.

Writing at results/graph/llama-2-7b-80k_id_435_relevant/llama-2-7b-80k_id_435_relevant_len_0_depth_0_results.json
insertion at 0
The book that won the latest Nebula award for Best Novel is The Saint of Bright Doors.
[['16-19'], ['6-9'], ['6-30'], ['31-16'], ['7-4'], ['11-15'], ['18-30'], ['21-30'],

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.19it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The book that won the latest Nebula award for Best Novel is The Saint of Bright Doors.



insertion at 0
The book that won the latest Nebula award for Best Novel is The Saint of Bright Doors.
[['16-19'], ['6-9'], ['6-30'], ['31-16'], ['7-4'], ['11-15'], ['18-30'], ['21-30'], ['24-29'], ['29-26'], ['31-24'], ['6-11'], ['8-26'], ['12-26'], ['19-15'], ['22-22'], ['26-28'], ['28-14'], ['29-19'], ['1-9']]
-- Test Summary -- 
Duration: 0.4 seconds
Context: 0 tokens
Depth: 0%
Score: 67.11687445640564
Response: The Saint of Bright Doors.

Writing at results/graph/llama-2-7b-80k_id_435_relevant_misleading/llama-2-7b-80k_id_435_relevant_misleading_len_0_depth_0_results.json
insertion at 0
The book that won the latest Nebula award for Best Novel is The Saint of Bright Doors.

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.19it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The book that won the latest Nebula award for Best Novel is The Saint of Bright Doors.



insertion at 0
The book that won the latest Nebula award for Best Novel is The Saint of Bright Doors.
[['16-19'], ['6-9'], ['6-30'], ['31-16'], ['7-4'], ['11-15'], ['18-30'], ['21-30'], ['24-29'], ['29-26'], ['31-24'], ['6-11'], ['8-26'], ['12-26'], ['19-15'], ['22-22'], ['26-28'], ['28-14'], ['29-19'], ['1-9']]
-- Test Summary -- 
Duration: 0.4 seconds
Context: 0 tokens
Depth: 0%
Score: 67.11687445640564
Response: The Saint of Bright Doors.

Writing at results/graph/llama-2-7b-80k_id_435_irrelevant/llama-2-7b-80k_id_435_irrelevant_len_0_depth_0_results.json
insertion at 0
The book that won the latest Nebula award for Best Novel is The Saint of Bright Doors.

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The book that won the latest Nebula award for Best Novel is The Saint of Bright Doors.



insertion at 0
The book that won the latest Nebula award for Best Novel is The Saint of Bright Doors.
[['16-19'], ['6-9'], ['6-30'], ['31-16'], ['7-4'], ['11-15'], ['18-30'], ['21-30'], ['24-29'], ['29-26'], ['31-24'], ['6-11'], ['8-26'], ['12-26'], ['19-15'], ['22-22'], ['26-28'], ['28-14'], ['29-19'], ['1-9']]
-- Test Summary -- 
Duration: 0.4 seconds
Context: 0 tokens
Depth: 0%
Score: 67.11687445640564
Response: The Saint of Bright Doors.

Writing at results/graph/llama-2-7b-80k_id_435_irrelevant_misleading/llama-2-7b-80k_id_435_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
The book that won the latest Nebula award for Best Novel is The Saint of Bright Doors.

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The game associated with Sky Team won the Spiel des Jahres award most recently.



insertion at 0
The game associated with Sky Team won the Spiel des Jahres award most recently.
[['11-15'], ['16-19'], ['17-22'], ['6-9'], ['7-4'], ['8-26'], ['11-2'], ['19-15'], ['21-30'], ['12-26'], ['15-14'], ['29-19'], ['6-30'], ['12-2'], ['18-30'], ['21-4'], ['26-26'], ['14-18'], ['19-10'], ['23-7']]
-- Test Summary -- 
Duration: 0.7 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: The game associated with Sky Team won the Spiel des Jahres award most recently.

Writing at results/graph/llama-2-7b-80k_id_436_relevant/llama-2-7b-80k_id_436_relevant_len_0_depth_0_results.json
insertion at 0
The game associated with Sky Team won the Spiel des Jahres award most recently.

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The game associated with Sky Team won the Spiel des Jahres award most recently.



insertion at 0
The game associated with Sky Team won the Spiel des Jahres award most recently.
[['11-15'], ['16-19'], ['17-22'], ['6-9'], ['7-4'], ['8-26'], ['11-2'], ['19-15'], ['21-30'], ['12-26'], ['15-14'], ['29-19'], ['6-30'], ['12-2'], ['18-30'], ['21-4'], ['26-26'], ['14-18'], ['19-10'], ['23-7']]
-- Test Summary -- 
Duration: 0.7 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: The game associated with Sky Team won the Spiel des Jahres award most recently.

Writing at results/graph/llama-2-7b-80k_id_436_relevant_misleading/llama-2-7b-80k_id_436_relevant_misleading_len_0_depth_0_results.json
insertion at 0
The game associated with Sky Team won the Spiel des Jahres award most recently.

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.17it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The game associated with Sky Team won the Spiel des Jahres award most recently.



insertion at 0
The game associated with Sky Team won the Spiel des Jahres award most recently.
[['11-15'], ['16-19'], ['17-22'], ['6-9'], ['7-4'], ['8-26'], ['11-2'], ['19-15'], ['21-30'], ['12-26'], ['15-14'], ['29-19'], ['6-30'], ['12-2'], ['18-30'], ['21-4'], ['26-26'], ['14-18'], ['19-10'], ['23-7']]
-- Test Summary -- 
Duration: 0.7 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: The game associated with Sky Team won the Spiel des Jahres award most recently.

Writing at results/graph/llama-2-7b-80k_id_436_irrelevant/llama-2-7b-80k_id_436_irrelevant_len_0_depth_0_results.json
insertion at 0
The game associated with Sky Team won the Spiel des Jahres award most recently.

Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.18it/s]




Starting Needle In A Haystack Testing...
- Model: yaofu/llama-2-7b-80k
- Context Lengths: 5, Min: 0, Max: 5000
- Document Depths: 5, Min: 0%, Max: 100%
- Needle: The game associated with Sky Team won the Spiel des Jahres award most recently.



insertion at 0
The game associated with Sky Team won the Spiel des Jahres award most recently.
[['11-15'], ['16-19'], ['17-22'], ['6-9'], ['7-4'], ['8-26'], ['11-2'], ['19-15'], ['21-30'], ['12-26'], ['15-14'], ['29-19'], ['6-30'], ['12-2'], ['18-30'], ['21-4'], ['26-26'], ['14-18'], ['19-10'], ['23-7']]
-- Test Summary -- 
Duration: 0.7 seconds
Context: 0 tokens
Depth: 0%
Score: 100.0
Response: The game associated with Sky Team won the Spiel des Jahres award most recently.

Writing at results/graph/llama-2-7b-80k_id_436_irrelevant_misleading/llama-2-7b-80k_id_436_irrelevant_misleading_len_0_depth_0_results.json
insertion at 0
The game associated with Sky Team won the Spiel des Jahres award most recently.

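The header printed at the start of each run ("Context Lengths: 5, Min: 0, Max: 5000" and "Document Depths: 5, Min: 0%, Max: 100%") describes a grid of 5 context lengths by 5 insertion depths. The sketch below reconstructs that grid, assuming the values are spaced linearly with both endpoints included; `evenly_spaced` is a hypothetical helper, not part of this notebook. Under that assumption, every summary in the log above corresponds to the first grid point (0 tokens, 0% depth).

```python
def evenly_spaced(minimum, maximum, count):
    """Return `count` evenly spaced integers from `minimum` to `maximum`, inclusive.

    Hypothetical helper: assumes the test harness spaces its grid linearly.
    """
    if count == 1:
        return [minimum]
    step = (maximum - minimum) / (count - 1)
    return [round(minimum + i * step) for i in range(count)]

# Values taken from the run header in the log above.
context_lengths = evenly_spaced(0, 5000, 5)  # tokens
depth_percents = evenly_spaced(0, 100, 5)    # percent of the document

print(context_lengths)  # [0, 1250, 2500, 3750, 5000]
print(depth_percents)   # [0, 25, 50, 75, 100]
print(len(context_lengths) * len(depth_percents), "tests per run")  # 25 tests per run
```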
In [9]:
# Example 1: Relevant context without misleading statements
# args = get_args(id=id_val, context_type="relevant", with_misleading=False)
# run_test(args)
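Each test writes one JSON file. Judging from the "Writing at ..." lines in the log, the path follows `results/graph/<model>_id_<id>_<context_type>[_misleading]/<same prefix>_len_<length>_depth_<depth>_results.json`, where `<model>` is the Hugging Face model id without its organization prefix. The sketch below rebuilds that path so you can locate results for analysis; `result_path` is a hypothetical helper inferred from the log output, not taken from the notebook's helper functions.

```python
from pathlib import Path

def result_path(model_name, id_val, context_type, with_misleading, length, depth,
                root="results/graph"):
    """Rebuild a result-file path from the pattern seen in the test log (hypothetical)."""
    # "yaofu/llama-2-7b-80k" -> "llama-2-7b-80k" (assumed from the log)
    short_name = model_name.split("/")[-1]
    # e.g. "llama-2-7b-80k_id_435_relevant_misleading"
    prefix = f"{short_name}_id_{id_val}_{context_type}"
    if with_misleading:
        prefix += "_misleading"
    return Path(root) / prefix / f"{prefix}_len_{length}_depth_{depth}_results.json"

print(result_path("yaofu/llama-2-7b-80k", 435, "relevant", False, 0, 0).as_posix())
# results/graph/llama-2-7b-80k_id_435_relevant/llama-2-7b-80k_id_435_relevant_len_0_depth_0_results.json
```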

In [10]:
# Example 2: Irrelevant context without misleading statements
# args = get_args(id=id_val, context_type="irrelevant", with_misleading=False)
# run_test(args)

In [11]:
# Example 3: Relevant context with misleading statements
# args = get_args(id=id_val, context_type="relevant", with_misleading=True)
# run_test(args)

In [12]:
# Example 4: Irrelevant context with misleading statements
# args = get_args(id=id_val, context_type="irrelevant", with_misleading=True)
# run_test(args)
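The four example cells cover every combination of `context_type` and `with_misleading`, which matches the four result folders written per id in the log. If you want to run all of them in one cell, here is a minimal sketch. It takes `get_args` and `run_test` as parameters so it reuses the notebook's helper functions without redefining them; `run_all_configurations` is a hypothetical name, and the keyword arguments mirror the calls in the examples above.

```python
from itertools import product

CONTEXT_TYPES = ("relevant", "irrelevant")
MISLEADING_OPTIONS = (False, True)

def run_all_configurations(id_val, get_args, run_test):
    """Run one test per (context_type, with_misleading) pair and return the pairs run."""
    completed = []
    # product() yields relevant/False, relevant/True, irrelevant/False, irrelevant/True,
    # the same order as the runs in the log above.
    for context_type, with_misleading in product(CONTEXT_TYPES, MISLEADING_OPTIONS):
        args = get_args(id=id_val, context_type=context_type, with_misleading=with_misleading)
        run_test(args)
        completed.append((context_type, with_misleading))
    return completed

# Usage in this notebook, once the helper functions have been defined:
# run_all_configurations(id_val, get_args, run_test)
```

Passing the helpers in keeps the loop independent of how `get_args` and `run_test` are implemented. Each call still loads the model separately, as the repeated "Loading checkpoint shards" lines show.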