# RAG Pipeline for Natural Questions Dataset (Modular)

This notebook implements a Retrieval-Augmented Generation (RAG) pipeline to answer questions from the Natural Questions (NQ) dataset. This version is modularized, with core logic in the `src` directory.

**Pipeline Components (from `src.config`):**
- **Dataset:** Natural Questions (Simplified Version)
- **Orchestration:** LlamaIndex
- **Vector Store:** Qdrant
- **Embedding Model & LLMs:** Configured via `src.config` and `src.llm_setup`

**Evaluation:**
- **Retrieval:** Recall@K
- **Generation (Reference-based):** ROUGE, BLEU
- **Generation (Reference-free & Reference-aware):** LLM-as-a-Judge
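As a rough illustration of the retrieval metric, the sketch below computes Recall@K for the case where each question has exactly one gold document ID, which is how `gold_map_for_recall` appears in this notebook's output. The function name `recall_at_k` and the toy IDs are assumptions made for illustration; the actual `evaluate_recall_at_k` in `src.evaluation_utils` is not shown here and may differ.

```python
from typing import Dict, List


def recall_at_k(retrieved: Dict[str, List[str]], gold: Dict[str, str], k: int) -> float:
    """Fraction of questions whose gold document ID is among the top-k retrieved IDs."""
    if not gold:
        return 0.0
    hits = sum(
        1
        for question, gold_id in gold.items()
        if gold_id in retrieved.get(question, [])[:k]
    )
    return hits / len(gold)


# Toy data with hypothetical IDs (not taken from the notebook's run).
gold = {"q1": "doc_a", "q2": "doc_b"}
retrieved = {
    "q1": ["doc_x", "doc_a", "doc_y"],  # gold at rank 2
    "q2": ["doc_z", "doc_w", "doc_b"],  # gold at rank 3
}

print(recall_at_k(retrieved, gold, k=2))  # 0.5
print(recall_at_k(retrieved, gold, k=3))  # 1.0
```

With a single gold document per question, Recall@K is the same as the hit rate at K, so it can only grow as K increases.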

## 1. Setup and Imports

In [1]:
# %pip install -qU llama-index llama-index-vector-stores-qdrant llama-index-embeddings-huggingface llama-index-llms-openai qdrant-client sentence-transformers nltk rouge_score datasets tqdm pandas scikit-learn ipywidgets notebook

In [2]:
# Cell [2]
%load_ext autoreload
%autoreload 2

import os
# Set your LM Studio API base if it's not already an environment variable elsewhere
# Ensure this IP and port are correct for your LM Studio instance
os.environ["LM_STUDIO_API_BASE"] = "http://192.168.0.114:1234/v1"
# For OpenAI compatible endpoints like LM Studio, the API key is often not needed or can be any string
os.environ["OPENAI_API_KEY"] = "not-needed" # Or your actual key if required by a different service
os.environ["OPENAI_API_BASE"] = os.environ["LM_STUDIO_API_BASE"] # Align OpenAI base with LM Studio for LlamaIndex

print(f"Notebook Set: LM_STUDIO_API_BASE is now: {os.getenv('LM_STUDIO_API_BASE')}")
print(f"Notebook Set: OPENAI_API_KEY is now: {os.getenv('OPENAI_API_KEY')}")
print(f"Notebook Set: OPENAI_API_BASE is now: {os.getenv('OPENAI_API_BASE')}")


import llama_index.llms.openai.utils as oi_utils
# Register the custom model names and their context window sizes in LlamaIndex's OpenAI model table,
# so the OpenAI-compatible client knows the limits of the models served by LM Studio.
# Use the context window actually configured in LM Studio (e.g., 8192).
# Token budgeting itself relies on src.config.MODEL_CTX_WINDOW; these entries only feed
# LlamaIndex's internal model lookup for the OpenAI class.
oi_utils.ALL_AVAILABLE_MODELS["gemma-3-4b-it"] = 8192 # Match cfg.MODEL_CTX_WINDOW
oi_utils.ALL_AVAILABLE_MODELS["gemma-3-27b-it"] = 8192 # Match cfg.MODEL_CTX_WINDOW (or specific judge window if different)

# Mark them as chat models
oi_utils.CHAT_MODELS.update({"gemma-3-4b-it": True, "gemma-3-27b-it": True})

Notebook Set: LM_STUDIO_API_BASE is now: http://192.168.0.114:1234/v1
Notebook Set: OPENAI_API_KEY is now: not-needed
Notebook Set: OPENAI_API_BASE is now: http://192.168.0.114:1234/v1


In [3]:
import sys
import json
import logging
import random
import numpy as np
import pandas as pd
from IPython.display import display, JSON # For pretty printing JSON and DataFrames

# Add project root to sys.path to allow imports from src
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

# Import from custom modules
from src import config as cfg 
from src.data_loader import load_nq_data
from src.text_processing import extract_and_filter_text_from_simplified_doc 
from src.data_preparation import prepare_kb_and_test_set
from src.llm_setup import setup_qdrant_client, setup_embedding_model, setup_llm, configure_llama_index_settings
from src.rag_pipeline_logic import index_knowledge_base, generate_no_rag_answers, generate_rag_answers
from src.evaluation_utils import evaluate_recall_at_k, calculate_rouge_l_and_bleu, evaluate_with_llm_judge

from llama_index.core.prompts import PromptTemplate
from llama_index.core import Settings # For accessing global settings

## 2. Configure Logging and Random Seeds

In [4]:
logging.basicConfig(stream=sys.stdout, 
                    level=getattr(logging, cfg.LOG_LEVEL.upper(), logging.INFO),
                    format=cfg.LOG_FORMAT)
logger = logging.getLogger(__name__) # For notebook specific logs

logging.getLogger("httpx").setLevel(logging.WARNING)

random.seed(cfg.RANDOM_SEED)
np.random.seed(cfg.RANDOM_SEED)

logger.info(f"Notebook execution started. Random seed: {cfg.RANDOM_SEED}")
logger.info(f"Data will be loaded from: {cfg.NQ_TRAIN_PATH}")

2025-06-01 00:19:53,296 - __main__ - INFO - Notebook execution started. Random seed: 42
2025-06-01 00:19:53,297 - __main__ - INFO - Data will be loaded from: /home/denis/kpi/iasa_nlp_labs/nq_rag/data/v1.0-simplified-nq-train.jsonl.gz


## 3. Exploratory Data Analysis (EDA)
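This section runs a brief EDA on a sample of up to `cfg.EDA_SAMPLE_SIZE` examples, checking which fields are present and how many long-answer candidates yield usable text.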
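In the simplified NQ format, `document_text` is a single space-separated string that mixes word tokens with HTML tags, and each long-answer candidate is a `[start_token, end_token)` range over those tokens. The sketch below shows one way such a span could be turned into plain text. The helper `extract_span_text` is hypothetical and only approximates what `extract_and_filter_text_from_simplified_doc` might do; the real filtering rules live in `src.text_processing` and are not shown in this notebook.

```python
from typing import List


def extract_span_text(doc_tokens: List[str], start_token: int, end_token: int) -> str:
    """Join the tokens in [start_token, end_token), dropping HTML-tag tokens like <P> or </Table>."""
    span = doc_tokens[start_token:end_token]
    kept = [tok for tok in span if not (tok.startswith("<") and tok.endswith(">"))]
    return " ".join(kept).strip()


# Toy document (hypothetical, not from the dataset).
document_text = "<H1> Paris </H1> <P> Paris is the capital of France . </P>"
doc_tokens = document_text.split(" ")

print(extract_span_text(doc_tokens, 3, 12))  # Paris is the capital of France .
print(extract_span_text(doc_tokens, 0, 3))   # Paris
```

A span that contains only tags produces an empty string under this rule, which is one plausible reason a candidate with a valid token span can still yield no extracted text.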

In [5]:
logger.info(f"Starting EDA: Loading up to {cfg.EDA_SAMPLE_SIZE} examples for inspection...")
eda_data = load_nq_data(cfg.NQ_TRAIN_PATH, max_examples=cfg.EDA_SAMPLE_SIZE)

if not eda_data:
    logger.error("EDA: No data loaded. Cannot perform EDA.")
else:
    logger.info(f"EDA: Loaded {len(eda_data)} examples.")
    stats = {
        "total_examples_inspected": len(eda_data),
        "with_document_text_field": 0,
        "with_long_answer_candidates_list": 0,
        "with_non_empty_long_answer_candidates": 0,
        "examples_with_at_least_one_valid_candidate_text": 0,
        "total_candidates_inspected": 0,
        "valid_candidate_tokens_spans": 0,
        "candidates_with_non_empty_extracted_text": 0,
        "with_example_id": 0,
        "with_question_text": 0
    }

    from tqdm.notebook import tqdm 
    for i, example in enumerate(tqdm(eda_data, desc="Performing EDA on NQ data")):
        if example.get('example_id'): stats["with_example_id"] += 1
        if example.get('question_text'): stats["with_question_text"] += 1
        
        has_doc_text = False
        doc_text_content = example.get('document_text')
        if doc_text_content and isinstance(doc_text_content, str):
            stats["with_document_text_field"] += 1
            has_doc_text = True
            doc_tokens_list = doc_text_content.split(' ') 

        if 'long_answer_candidates' in example: stats["with_long_answer_candidates_list"] += 1
        candidates = example.get('long_answer_candidates', [])
        if candidates: stats["with_non_empty_long_answer_candidates"] += 1
        
        example_produced_kb_doc = False
        if has_doc_text and candidates:
            for cand in candidates:
                stats["total_candidates_inspected"] += 1
                if not isinstance(cand, dict): continue
                start_token, end_token = cand.get('start_token', -1), cand.get('end_token', -1)
                if start_token != -1 and end_token != -1 and start_token < end_token:
                    stats["valid_candidate_tokens_spans"] += 1
                    # Pass the pre-split doc_tokens_list for EDA consistency
                    text_content = extract_and_filter_text_from_simplified_doc(doc_tokens_list, start_token, end_token)
                    if text_content:
                        stats["candidates_with_non_empty_extracted_text"] += 1
                        example_produced_kb_doc = True
        if example_produced_kb_doc: stats["examples_with_at_least_one_valid_candidate_text"] += 1

    print("\n--- NQ Dataset EDA Results (Sample) ---")
    for key, value in stats.items(): print(f"{key}: {value}")
    if stats["total_examples_inspected"] > 0:
        print(f"\n% with 'document_text': { (stats['with_document_text_field'] / stats['total_examples_inspected']) * 100 :.2f}%")
        print(f"% with non-empty 'long_answer_candidates': { (stats['with_non_empty_long_answer_candidates'] / stats['total_examples_inspected']) * 100 :.2f}%")
        print(f"% producing at least one KB doc: { (stats['examples_with_at_least_one_valid_candidate_text'] / stats['total_examples_inspected']) * 100 :.2f}%")

2025-06-01 00:19:53,331 - __main__ - INFO - Starting EDA: Loading up to 1000 examples for inspection...
2025-06-01 00:19:53,332 - src.data_loader - INFO - Attempting to load NQ data from: /home/denis/kpi/iasa_nlp_labs/nq_rag/data/v1.0-simplified-nq-train.jsonl.gz


Loading NQ data: 0 lines [00:00, ? lines/s]

2025-06-01 00:19:53,613 - src.data_loader - INFO - Reached max_examples limit of 1000. Loaded 1000 examples.
2025-06-01 00:19:53,614 - src.data_loader - INFO - Successfully loaded 1000 examples from /home/denis/kpi/iasa_nlp_labs/nq_rag/data/v1.0-simplified-nq-train.jsonl.gz
2025-06-01 00:19:53,615 - __main__ - INFO - EDA: Loaded 1000 examples.


Performing EDA on NQ data:   0%|          | 0/1000 [00:00<?, ?it/s]


--- NQ Dataset EDA Results (Sample) ---
total_examples_inspected: 1000
with_document_text_field: 1000
with_long_answer_candidates_list: 1000
with_non_empty_long_answer_candidates: 1000
examples_with_at_least_one_valid_candidate_text: 1000
total_candidates_inspected: 139479
valid_candidate_tokens_spans: 139479
candidates_with_non_empty_extracted_text: 138640
with_example_id: 1000
with_question_text: 1000

% with 'document_text': 100.00%
% with non-empty 'long_answer_candidates': 100.00%
% producing at least one KB doc: 100.00%


## 4. Data Loading and Preparation
Load up to `cfg.KNOWLEDGE_BASE_SIZE` examples, then build the knowledge base documents, the test set, and the gold map used for Recall@K.
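The sample output of this section shows a gold document ID of the form `<example_id>_<candidate_index>`. A minimal sketch of how such a question-to-gold-ID map could be derived from the NQ annotations is given below. It assumes the simplified NQ layout, where each annotation carries a `long_answer` with a `candidate_index` (negative when there is no long answer). The helpers `make_doc_id`, `gold_doc_id`, and `build_gold_map` are hypothetical names; `prepare_kb_and_test_set` in `src.data_preparation` may select and screen examples differently.

```python
from typing import Any, Dict, List, Optional


def make_doc_id(example_id: Any, candidate_index: int) -> str:
    """Compose a document ID from the example ID and the candidate's position."""
    return f"{example_id}_{candidate_index}"


def gold_doc_id(example: Dict[str, Any]) -> Optional[str]:
    """Return the ID of the first annotated long answer, or None if there is none."""
    for annotation in example.get("annotations", []):
        idx = annotation.get("long_answer", {}).get("candidate_index", -1)
        if idx is not None and idx >= 0:
            return make_doc_id(example["example_id"], idx)
    return None


def build_gold_map(examples: List[Dict[str, Any]]) -> Dict[str, str]:
    """Map each question text to its gold document ID, skipping unanswered examples."""
    gold_map: Dict[str, str] = {}
    for example in examples:
        doc_id = gold_doc_id(example)
        if doc_id is not None:
            gold_map[example["question_text"]] = doc_id
    return gold_map


# Toy examples (hypothetical values).
examples = [
    {"example_id": 111, "question_text": "toy question one",
     "annotations": [{"long_answer": {"candidate_index": 20}}]},
    {"example_id": 222, "question_text": "toy question two",
     "annotations": [{"long_answer": {"candidate_index": -1}}]},
]
gold_map = build_gold_map(examples)
print(gold_map)  # {'toy question one': '111_20'}
```

Keying the map by question text is convenient for lookup but assumes questions are unique within the test set; keying by `example_id` would avoid that assumption.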

In [6]:
logger.info(f"Loading NQ data for KB and Test Set (up to {cfg.KNOWLEDGE_BASE_SIZE} examples)...")
all_nq_data = load_nq_data(cfg.NQ_TRAIN_PATH, max_examples=cfg.KNOWLEDGE_BASE_SIZE)

test_set = []
kb_docs_llama = []
gold_map_for_recall = {}

if not all_nq_data:
    logger.error("No data loaded. Cannot proceed with KB and Test Set preparation.")
else:
    logger.info(f"Preparing Knowledge Base and Test Set from {len(all_nq_data)} loaded examples...")
    test_set, kb_docs_llama, gold_map_for_recall = prepare_kb_and_test_set(
        all_nq_data, 
        cfg.NUM_TEST_EXAMPLES,
        random_seed=cfg.RANDOM_SEED
    )

print("\n--- Data Preparation Summary ---")
print(f"Knowledge Base documents created: {len(kb_docs_llama)}")
print(f"Test examples prepared: {len(test_set)}")
print(f"Gold map for recall entries: {len(gold_map_for_recall)}")

if test_set:
    print("\nSample test set item:")
    display(JSON(test_set[0]))
if kb_docs_llama:
    print("\nSample KB Llama Document (first 200 chars):")
    # Prefer get_content(); fall back to the .text attribute if the object does not provide it
    try:
        print(kb_docs_llama[0].get_content()[:200] + "...") 
    except AttributeError:
        print(kb_docs_llama[0].text[:200] + "...") # Fallback to .text attribute
    print(f"Metadata: {kb_docs_llama[0].metadata}")
if gold_map_for_recall and test_set:
    try:
        sample_q_text = test_set[0]['question_text']
        if sample_q_text in gold_map_for_recall:
            print("\nSample gold_map_for_recall entry:")
            print(f"Q: {sample_q_text} -> Gold Doc ID: {gold_map_for_recall[sample_q_text]}")
    except (IndexError, KeyError):
        logger.warning("Could not display sample gold_map_for_recall.")

2025-06-01 00:19:54,728 - __main__ - INFO - Loading NQ data for KB and Test Set (up to 20000 examples)...
2025-06-01 00:19:54,729 - src.data_loader - INFO - Attempting to load NQ data from: /home/denis/kpi/iasa_nlp_labs/nq_rag/data/v1.0-simplified-nq-train.jsonl.gz


Loading NQ data: 0 lines [00:00, ? lines/s]

2025-06-01 00:20:00,189 - src.data_loader - INFO - Reached max_examples limit of 20000. Loaded 20000 examples.
2025-06-01 00:20:00,190 - src.data_loader - INFO - Successfully loaded 20000 examples from /home/denis/kpi/iasa_nlp_labs/nq_rag/data/v1.0-simplified-nq-train.jsonl.gz
2025-06-01 00:20:00,190 - __main__ - INFO - Preparing Knowledge Base and Test Set from 20000 loaded examples...
2025-06-01 00:20:00,191 - src.data_preparation - INFO - Building Knowledge Base with optimized candidate selection...


Building KB:   0%|          | 0/20000 [00:00<?, ? examples/s]

2025-06-01 00:20:25,834 - src.data_preparation - INFO - Total candidates processed: 2621615, Skipped: 1832032
2025-06-01 00:20:25,834 - src.data_preparation - INFO - Built KB with 789583 LlamaIndex documents (after optimized selection).
2025-06-01 00:20:25,835 - src.data_preparation - INFO - Screening examples for test set...


Screening for test set:   0%|          | 0/20000 [00:00<?, ? examples/s]

2025-06-01 00:20:29,290 - src.data_preparation - INFO - Selected 300 examples for the test set.


Processing test set:   0%|          | 0/300 [00:00<?, ? examples/s]

2025-06-01 00:20:29,357 - src.data_preparation - INFO - Prepared 300 test examples.
2025-06-01 00:20:29,357 - src.data_preparation - INFO - Prepared gold map for recall for 300 questions.

--- Data Preparation Summary ---
Knowledge Base documents created: 789583
Test examples prepared: 300
Gold map for recall entries: 300

Sample test set item:


<IPython.core.display.JSON object>


Sample KB Llama Document (first 200 chars):
( hide ) This article has multiple issues . Please help improve it or discuss these issues on the talk page . ( Learn how and when to remove these template messages ) This article needs additional cit...
Metadata: {'example_id': '5655493461695504401', 'candidate_index': 0, 'is_gold': False, 'is_top_level': True, 'original_question': 'which is the most common use of opt-in e-mail marketing', 'document_url': 'https://en.wikipedia.org//w/index.php?title=Email_marketing&amp;oldid=814071202'}

Sample gold_map_for_recall entry:
Q: when does the movie shot caller come out in theaters -> Gold Doc ID: 2765840165344359847_20


## 5. Setup Vector Store, Embedding Model, and LLMs

In [7]:
logger.info("Setting up Qdrant client, embedding model, and LLMs...")

qdrant_client_instance = setup_qdrant_client(cfg.QDRANT_URL)
embed_model = setup_embedding_model(
    cfg.EMBED_MODEL_NAME,
    batch_size=cfg.EMBED_BATCH_SIZE
    # device_preference="cuda" is default in setup_embedding_model if available
)

# Setup LLMs with temperature from config or new default (0.0)
# The setup_llm function now defaults to temperature=0.0
generator_llm = setup_llm(
    cfg.LLM_API_BASE,
    cfg.LLM_API_KEY,
    cfg.GENERATOR_LLM_MODEL_NAME,
    "Generator"
    # temperature=0.0 # Explicitly set if you don't want to rely on the new default in setup_llm
)
judge_llm = setup_llm(
    cfg.LLM_API_BASE,
    cfg.LLM_API_KEY,
    cfg.JUDGE_LLM_MODEL_NAME,
    "Judge"
    # temperature=0.0 # Explicitly set for the judge as well
)

if generator_llm and embed_model:
    # Pass the generator_llm which now has the tokenizer attribute set up by LlamaOpenAI
    configure_llama_index_settings(generator_llm, embed_model, cfg.CHUNK_SIZE, cfg.CHUNK_OVERLAP)
    logger.info("LlamaIndex global settings have been configured.")
    if Settings.tokenizer:
        logger.info(f"LlamaIndex global tokenizer configured: {Settings.tokenizer}")
    else:
        logger.warning("LlamaIndex global tokenizer could not be configured from LLM.")
else:
    logger.error("Critical components (Generator LLM or Embedding Model) failed to initialize. LlamaIndex settings not fully configured.")

2025-06-01 00:20:29,392 - __main__ - INFO - Setting up Qdrant client, embedding model, and LLMs...
2025-06-01 00:20:29,466 - src.llm_setup - INFO - Successfully connected to Qdrant at http://localhost:6333 and listed collections.
2025-06-01 00:20:29,621 - src.llm_setup - INFO - Attempting to set up embedding model 'intfloat/e5-small-v2' on device: 'cuda' with embed_batch_size: 512
2025-06-01 00:20:29,628 - sentence_transformers.SentenceTransformer - INFO - Load pretrained SentenceTransformer: intfloat/e5-small-v2
2025-06-01 00:20:32,672 - sentence_transformers.SentenceTransformer - INFO - 2 prompts are loaded, with the keys: ['query', 'text']
2025-06-01 00:20:32,899 - src.llm_setup - INFO - Successfully loaded embedding model: intfloat/e5-small-v2 on device 'cuda'
2025-06-01 00:20:32,899 - src.llm_setup - INFO - Successfully initialized Generator LLM: gemma-3-4b-it via http://192.168.0.114:1234/v1 with temperature: 0.0
2025-06-01 00:20:32,900 - src.llm_setup - INFO - Successfully initi

In [8]:
# Temporary Debug Cell
import os
print(f"DEBUG: LM_STUDIO_API_BASE from os.getenv is: {os.getenv('LM_STUDIO_API_BASE')}")
print(f"DEBUG: cfg.LLM_API_BASE is: {cfg.LLM_API_BASE}") # Assuming cfg is already imported from src.config

DEBUG: LM_STUDIO_API_BASE from os.getenv is: http://192.168.0.114:1234/v1
DEBUG: cfg.LLM_API_BASE is: http://192.168.0.114:1234/v1


In [9]:
from llama_index.llms.openai import OpenAI

# Smoke test: send one completion request to the generator model served by LM Studio.
test_llm = OpenAI(
    api_base=os.getenv("LM_STUDIO_API_BASE"),  # LlamaIndex's OpenAI class takes `api_base`
    api_key=os.getenv("OPENAI_API_KEY"),
    model=cfg.GENERATOR_LLM_MODEL_NAME,
)
print(test_llm.complete("What is the capital of France?").text)

The capital of France is **Paris**. 

It's a global center for art, fashion, gastronomy, and culture! 😊 

Do you want to know anything more about Paris or France in general?


## 6. Indexing Knowledge Base into Qdrant

In [10]:
vector_store = None
vector_index = None

# Ensure global LlamaIndex Settings are correctly populated before calling index_knowledge_base
# This check is more robust here, before the main 'if' block for indexing.
if not Settings.embed_model or not Settings.node_parser:
    logger.error("LlamaIndex global Settings (embed_model or node_parser) not configured. Cannot proceed with indexing.")
    logger.error(f"Settings.embed_model: {Settings.embed_model}")
    logger.error(f"Settings.node_parser: {Settings.node_parser}")
elif qdrant_client_instance: # We need qdrant client. kb_docs_llama can be empty if we are loading an existing index.
    logger.info(f"Starting indexing/loading for Qdrant collection '{cfg.QDRANT_COLLECTION_NAME}'. Recreate: {cfg.RECREATE_QDRANT_COLLECTION}")
    
    vector_store, vector_index = index_knowledge_base(
        qdrant_client_instance=qdrant_client_instance,
        collection_name=cfg.QDRANT_COLLECTION_NAME,
        embedding_dim=cfg.EMBEDDING_DIM,
        kb_docs_llama=kb_docs_llama,  # Pass the loaded documents for new indexing
        # embed_model_instance and node_parser_instance are removed as they are now taken from global Settings
        recreate_collection=cfg.RECREATE_QDRANT_COLLECTION # Pass the flag from config
    )
    
    if vector_index:
        logger.info("VectorStoreIndex obtained successfully.")
    else:
        logger.error("Failed to obtain VectorStoreIndex.")
else:
    # Reached only when the LlamaIndex Settings are configured but the Qdrant client is unavailable.
    logger.warning("Skipping indexing: Qdrant client is not available.")

2025-06-01 00:20:33,589 - __main__ - INFO - Starting indexing/loading for Qdrant collection 'nq_rag_lab_collection_modular_optimized'. Recreate: True
2025-06-01 00:20:33,593 - src.rag_pipeline_logic - INFO - Qdrant collection 'nq_rag_lab_collection_modular_optimized' already exists.
2025-06-01 00:20:33,593 - src.rag_pipeline_logic - INFO - recreate_collection is True. Deleting existing Qdrant collection 'nq_rag_lab_collection_modular_optimized'.
2025-06-01 00:20:34,611 - src.rag_pipeline_logic - INFO - Creating new Qdrant collection 'nq_rag_lab_collection_modular_optimized' and indexing documents.
2025-06-01 00:20:34,735 - src.rag_pipeline_logic - INFO - Successfully created Qdrant collection: nq_rag_lab_collection_modular_optimized
2025-06-01 00:20:34,740 - src.rag_pipeline_logic - INFO - Indexing 789583 documents into Qdrant collection 'nq_rag_lab_collection_modular_optimized' using insert_batch_size=2048.


Parsing nodes:   0%|          | 0/789583 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/2048 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/580 [00:00<?, ?it/s]

2025-06-01 00:56:47,919 - src.rag_pipeline_logic - INFO - New Knowledge Base indexing complete for collection 'nq_rag_lab_collection_modular_optimized'.
2025-06-01 00:56:47,920 - __main__ - INFO - VectorStoreIndex obtained successfully.


## 7. Baseline: LLM without RAG

In [11]:
no_rag_answers = []
if generator_llm and test_set:
    logger.info("Generating baseline answers (No-RAG)...")
    no_rag_answers = generate_no_rag_answers(generator_llm, test_set)
    # test_set is known to be non-empty in this branch, so only the length check is needed.
    if no_rag_answers and len(no_rag_answers) == len(test_set):
        print("\n--- Sample No-RAG Answer ---")
        print(f"Question: {test_set[0]['question_text']}")
        print(f"Generated (No-RAG): {no_rag_answers[0]}")
        print(f"Reference: {test_set[0]['target_answer']}")
    else:
        logger.warning("No-RAG answer generation might have failed or produced mismatched results.")
else:
    logger.warning("Generator LLM not ready or no test set. Skipping No-RAG generation.")

2025-06-01 00:56:48,014 - __main__ - INFO - Generating baseline answers (No-RAG)...
2025-06-01 00:56:48,015 - src.rag_pipeline_logic - INFO - Generating answers without RAG...


No-RAG Generation:   0%|          | 0/300 [00:00<?, ? questions/s]


--- Sample No-RAG Answer ---
Question: when does the movie shot caller come out in theaters
Generated (No-RAG): “Shot Caller” was released in theaters on **August 25, 2023**. 

You can find more details and showtimes here: [https://www.imdb.com/title/tt14968736/](https://www.imdb.com/title/tt14968736/)
Reference: August 18 , 2017


## 8. RAG Pipeline: Retrieval and Generation

In [12]:
rag_answers = []
retrieved_contexts_for_rag = []

if vector_index and generator_llm and test_set:
    logger.info("Generating answers with RAG...")
    rag_answers, retrieved_contexts_for_rag = generate_rag_answers(
        vector_index, 
        generator_llm, 
        test_set, 
        cfg.TOP_K_RETRIEVAL_FOR_GENERATION
    )
    # test_set is known to be non-empty in this branch, so only the length check is needed.
    if rag_answers and retrieved_contexts_for_rag and len(rag_answers) == len(test_set):
        print("\n--- Sample RAG Answer ---")
        print(f"Question: {test_set[0]['question_text']}")
        print(f"Retrieved Context (RAG - first 300 chars): {str(retrieved_contexts_for_rag[0])[:300]}...")
        print(f"Generated (RAG): {rag_answers[0]}")
        print(f"Reference: {test_set[0]['target_answer']}")
    else:
        logger.warning("RAG answer generation might have failed or produced mismatched results.")
else:
    logger.warning("Vector index, Generator LLM, or test set not ready. Skipping RAG generation.")

2025-06-01 01:03:46,462 - __main__ - INFO - Generating answers with RAG...
2025-06-01 01:03:46,462 - src.rag_pipeline_logic - INFO - Setting up RAG retriever with top_k_retrieval = 5
2025-06-01 01:03:46,463 - src.rag_pipeline_logic - INFO - Generating answers with RAG...


RAG Generation:   0%|          | 0/300 [00:00<?, ? questions/s]


--- Sample RAG Answer ---
Question: when does the movie shot caller come out in theaters
Retrieved Context (RAG - first 300 chars): It premiered at the Los Angeles Film Festival on June 16 , 2017 . It was released on July 20 , 2017 , through DirecTV Cinema , and received a theatrical release on August 18 , 2017 by Saban Films .\n\nShot Caller is an American crime thriller film directed and written by Ric Roman Waugh . The film s...
Generated (RAG): August 18, 2017
Reference: August 18 , 2017


## 9. Evaluation

### 9.1. Retrieval Evaluation (Recall@K)

In [13]:
recall_at_k_scores = {}
if vector_index and test_set and gold_map_for_recall:
    logger.info(f"Evaluating retrieval recall for K values: {cfg.TOP_K_FOR_RECALL_EVALUATION}")
    recall_at_k_scores = evaluate_recall_at_k(
        test_set, 
        vector_index, 
        gold_map_for_recall, 
        cfg.TOP_K_FOR_RECALL_EVALUATION
    )
    print("\n--- Retrieval Recall@K Scores ---")
    for k_val, score in recall_at_k_scores.items():
        print(f"Recall@{k_val}: {score:.4f}")
else:
    logger.warning("Skipping Recall@K evaluation due to missing components.")

2025-06-01 01:16:15,960 - __main__ - INFO - Evaluating retrieval recall for K values: [1, 3, 5, 10]
2025-06-01 01:16:15,961 - src.evaluation_utils - INFO - Evaluating recall with retriever top_k=10


Evaluating Recall@K:   0%|          | 0/300 [00:00<?, ? questions/s]


--- Retrieval Recall@K Scores ---
Recall@1: 0.2800
Recall@3: 0.5367
Recall@5: 0.6667
Recall@10: 0.7900
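
One common way to define Recall@K for question answering is as a hit rate: a question counts as a hit when at least one of its gold passages appears among the top K retrieved results. The body of `evaluate_recall_at_k` in `src.evaluation_utils` is not shown in this notebook, so the sketch below only illustrates that definition. The function `recall_at_k`, its arguments, and the toy IDs are hypothetical names introduced for the example. Retrieving once at the largest K and slicing for smaller K values would match the `top_k=10` log line above, but that is also an assumption.

```python
from typing import Dict, Sequence, Set


def recall_at_k(
    retrieved_ids: Sequence[Sequence[str]],
    gold_ids: Sequence[Set[str]],
    k_values: Sequence[int],
) -> Dict[int, float]:
    """Fraction of questions with at least one gold ID in the top-k results.

    Assumption: this hit-rate definition is illustrative and may differ from
    the notebook's own evaluate_recall_at_k.
    """
    if len(retrieved_ids) != len(gold_ids):
        raise ValueError("retrieved_ids and gold_ids must have the same length")
    scores: Dict[int, float] = {}
    for k in k_values:
        # Each ranked list is retrieved once and sliced to the first k entries.
        hits = sum(
            1
            for ranked, gold in zip(retrieved_ids, gold_ids)
            if gold.intersection(ranked[:k])
        )
        scores[k] = hits / len(gold_ids) if gold_ids else 0.0
    return scores


# Toy data (hypothetical IDs): three questions, each with a ranked result list.
toy_retrieved = [["d1", "d2", "d3"], ["d9", "d4", "d5"], ["d7", "d8", "d6"]]
toy_gold = [{"d1"}, {"d5"}, {"d0"}]
toy_scores = recall_at_k(toy_retrieved, toy_gold, [1, 3])
print(toy_scores)
```

Under this definition Recall@K can only stay the same or grow as K increases, which matches the monotonic pattern in the scores reported above.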


### 9.2. Generation Evaluation (ROUGE-L, BLEU)

In [14]:
reference_answers = [item['target_answer'] for item in test_set] if test_set else []
generation_metrics_no_rag = {}
generation_metrics_rag = {}

if no_rag_answers and reference_answers and len(no_rag_answers) == len(reference_answers):
    logger.info("Evaluating No-RAG generation (ROUGE/BLEU)...")
    generation_metrics_no_rag = calculate_rouge_l_and_bleu(no_rag_answers, reference_answers)
    print("\n--- No-RAG Generation Metrics (ROUGE/BLEU) ---")
    for metric, score in generation_metrics_no_rag.items():
        print(f"{metric}: {score:.4f}")
else:
    logger.warning("Skipping No-RAG ROUGE/BLEU evaluation (answers or references missing/mismatched).")

if rag_answers and reference_answers and len(rag_answers) == len(reference_answers):
    logger.info("Evaluating RAG generation (ROUGE/BLEU)...")
    generation_metrics_rag = calculate_rouge_l_and_bleu(rag_answers, reference_answers)
    print("\n--- RAG Generation Metrics (ROUGE/BLEU) ---")
    for metric, score in generation_metrics_rag.items():
        print(f"{metric}: {score:.4f}")
else:
    logger.warning("Skipping RAG ROUGE/BLEU evaluation (answers or references missing/mismatched).")

2025-06-01 01:16:20,092 - __main__ - INFO - Evaluating No-RAG generation (ROUGE/BLEU)...
2025-06-01 01:16:20,093 - absl - INFO - Using default tokenizer.


Calculating ROUGE & BLEU:   0%|          | 0/300 [00:00<?, ? pairs/s]


--- No-RAG Generation Metrics (ROUGE/BLEU) ---
rougeL_fmeasure: 0.0703
bleu_4: 0.0089
2025-06-01 01:16:22,614 - __main__ - INFO - Evaluating RAG generation (ROUGE/BLEU)...
2025-06-01 01:16:22,614 - absl - INFO - Using default tokenizer.


Calculating ROUGE & BLEU:   0%|          | 0/300 [00:00<?, ? pairs/s]


--- RAG Generation Metrics (ROUGE/BLEU) ---
rougeL_fmeasure: 0.3225
bleu_4: 0.1392
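
To make the ROUGE-L F-measure easier to interpret, here is a minimal pure-Python sketch based on the longest common subsequence (LCS) of tokens. It is not the notebook's `calculate_rouge_l_and_bleu`, whose implementation is not shown. The lowercase-and-split tokenization is a simplifying assumption and will not reproduce the library tokenizer mentioned in the log, and the names `lcs_length` and `rouge_l_f1` are hypothetical.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F-measure with lowercase whitespace tokenization (an assumption)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens found in the LCS
    recall = lcs / len(ref)      # share of reference tokens found in the LCS
    return 2 * precision * recall / (precision + recall)


reference = "August 18 , 2017"
short_answer = "August 18 , 2017"
verbose_answer = "it was released on August 18 , 2017 by Saban Films"
print(rouge_l_f1(short_answer, reference))
print(rouge_l_f1(verbose_answer, reference))
```

In this sketch a verbose answer that contains the full reference still scores lower than a short exact answer, because the extra tokens reduce precision. This may be one reason why long free-form answers receive low ROUGE-L scores against short reference answers.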


### 9.3. Generation Evaluation (LLM-as-a-Judge)

In [15]:
judge_correctness_relevance_prompt_str = (
    "You are an impartial AI assistant evaluating the correctness and relevance of a Generated Answer to a Question, "
    "using a Reference Answer as a guide. Consider if the Generated Answer accurately addresses the Question and aligns "
    "with the information expected, as exemplified by the Reference Answer. Do not be overly-strict on phrasing; "
    "focus on semantic correctness and relevance.\n\n"
    "Question: {query_str}\n"
    "Reference Answer: {reference_answer_str}\n"
    "Generated Answer: {generated_answer_str}\n\n"
    "Is the Generated Answer correct and relevant for the given Question, when compared to the Reference Answer? "
    "Respond with only YES or NO."
)
judge_correctness_relevance_prompt_tmpl = PromptTemplate(judge_correctness_relevance_prompt_str)

judge_rag_faithfulness_prompt_str = (
    "You are an impartial AI assistant evaluating if a Generated Answer is faithful to the provided Retrieved Context. "
    "The answer is faithful if all claims made in the answer are clearly supported by the Retrieved Context. "
    "Do not use any external knowledge.\n\n"
    "Retrieved Context: {context_str}\n"
    "Generated Answer: {generated_answer_str}\n\n"
    "Is the Generated Answer faithful to the Retrieved Context? "
    "Respond with only YES or NO."
)
judge_rag_faithfulness_prompt_tmpl = PromptTemplate(judge_rag_faithfulness_prompt_str)

avg_correctness_relevance_no_rag = 0.0
correctness_relevance_scores_no_rag = []
if judge_llm and no_rag_answers and reference_answers and test_set and len(no_rag_answers) == len(test_set):
    logger.info("Evaluating No-RAG answers for Correctness/Relevance with LLM Judge...")
    correctness_relevance_scores_no_rag = evaluate_with_llm_judge(
        judge_llm, judge_correctness_relevance_prompt_tmpl, 
        test_set, no_rag_answers, references=reference_answers,
        progress_desc="Judge No-RAG Correct/Relevant"
    )
    if correctness_relevance_scores_no_rag:
        avg_correctness_relevance_no_rag = np.mean(correctness_relevance_scores_no_rag)
    print(f"\nNo-RAG LLM Judge Avg Correctness/Relevance: {avg_correctness_relevance_no_rag:.4f}")
else:
    logger.warning("Skipping No-RAG LLM Judge (Correctness/Relevance) due to missing components or mismatched lengths.")

avg_correctness_relevance_rag = 0.0
correctness_relevance_scores_rag = []
if judge_llm and rag_answers and reference_answers and test_set and len(rag_answers) == len(test_set):
    logger.info("Evaluating RAG answers for Correctness/Relevance with LLM Judge...")
    correctness_relevance_scores_rag = evaluate_with_llm_judge(
        judge_llm, judge_correctness_relevance_prompt_tmpl, 
        test_set, rag_answers, references=reference_answers,
        progress_desc="Judge RAG Correct/Relevant"
    )
    if correctness_relevance_scores_rag:
        avg_correctness_relevance_rag = np.mean(correctness_relevance_scores_rag)
    print(f"RAG LLM Judge Avg Correctness/Relevance: {avg_correctness_relevance_rag:.4f}")
else:
    logger.warning("Skipping RAG LLM Judge (Correctness/Relevance) due to missing components or mismatched lengths.")

avg_faithfulness_rag = 0.0
faithfulness_scores_rag = []
if (judge_llm and rag_answers and retrieved_contexts_for_rag and test_set
        and len(rag_answers) == len(test_set)
        and len(retrieved_contexts_for_rag) == len(test_set)):
    logger.info("Evaluating RAG answers for Faithfulness to Context with LLM Judge...")
    faithfulness_scores_rag = evaluate_with_llm_judge(
        judge_llm, judge_rag_faithfulness_prompt_tmpl, 
        test_set, rag_answers, contexts=retrieved_contexts_for_rag,
        progress_desc="Judge RAG Faithfulness"
    )
    if faithfulness_scores_rag:
        avg_faithfulness_rag = np.mean(faithfulness_scores_rag)
    print(f"RAG LLM Judge Avg Faithfulness to Context: {avg_faithfulness_rag:.4f}")
else:
    logger.warning("Skipping RAG LLM Judge (Faithfulness) due to missing components or mismatched lengths.")

2025-06-01 01:16:23,131 - __main__ - INFO - Evaluating No-RAG answers for Correctness/Relevance with LLM Judge...



No-RAG LLM Judge Avg Correctness/Relevance: 0.4633
2025-06-01 01:24:14,214 - __main__ - INFO - Evaluating RAG answers for Correctness/Relevance with LLM Judge...


RAG LLM Judge Avg Correctness/Relevance: 0.6833
2025-06-01 01:27:51,463 - __main__ - INFO - Evaluating RAG answers for Faithfulness to Context with LLM Judge...


RAG LLM Judge Avg Faithfulness to Context: 0.9167
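
The judge prompts above ask for a bare YES or NO, and the averages reported here are fractions between 0 and 1. The scoring itself happens inside `evaluate_with_llm_judge` in `src`, which this notebook does not show. As a rough illustration only, assuming a YES reply counts as 1.0 and a NO reply as 0.0, the conversion and averaging could look like the sketch below. `parse_yes_no` and `average_judge_score` are hypothetical helper names, not functions from the project.

```python
import re
from statistics import mean
from typing import List, Optional


def parse_yes_no(response_text: str) -> Optional[float]:
    """Map a judge reply to 1.0 (YES), 0.0 (NO), or None if neither word appears.

    Assumption: the first standalone "yes" or "no" in the reply is the verdict.
    """
    match = re.search(r"\b(yes|no)\b", response_text.strip().lower())
    if match is None:
        return None
    return 1.0 if match.group(1) == "yes" else 0.0


def average_judge_score(responses: List[str]) -> float:
    """Average the parsed verdicts, ignoring replies that could not be parsed."""
    scores = [parse_yes_no(reply) for reply in responses]
    valid_scores = [score for score in scores if score is not None]
    return mean(valid_scores) if valid_scores else 0.0


if __name__ == "__main__":
    sample_replies = ["YES", "No.", "yes", "unclear"]
    print(average_judge_score(sample_replies))
```

Under that binary-score assumption, and if no reply was dropped, an average of 0.9167 over 300 questions would correspond to 275 YES verdicts.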


## 10. Results Summary and Comparison

In [17]:
results_data = []
if test_set:
    for i, item in enumerate(test_set):
        # Build a short preview of the retrieved context; sentinel values pass through unchanged.
        context = retrieved_contexts_for_rag[i] if i < len(retrieved_contexts_for_rag) else 'N/A'
        if isinstance(context, str) and context not in ("ERROR_RETRIEVING_CONTEXT", "N/A"):
            # Append an ellipsis only when the context was actually truncated.
            context_preview = context[:500] + ("..." if len(context) > 500 else "")
        else:
            context_preview = context
        res_item = {
            'Question': item['question_text'],
            'Reference Answer': item['target_answer'],
            'No-RAG Answer': no_rag_answers[i] if i < len(no_rag_answers) else 'N/A',
            'RAG Answer': rag_answers[i] if i < len(rag_answers) else 'N/A',
            'Retrieved Context (RAG)': context_preview,
            'Judge Correct/Relevant (No-RAG)': correctness_relevance_scores_no_rag[i] if i < len(correctness_relevance_scores_no_rag) else 'N/A',
            'Judge Correct/Relevant (RAG)': correctness_relevance_scores_rag[i] if i < len(correctness_relevance_scores_rag) else 'N/A',
            'Judge Faithful to Context (RAG)': faithfulness_scores_rag[i] if i < len(faithfulness_scores_rag) else 'N/A'
        }
        results_data.append(res_item)

results_df = pd.DataFrame(results_data)

print("\n--- Overall Evaluation Metrics Summary ---")
if recall_at_k_scores:
    for k, score in recall_at_k_scores.items():
        print(f"Retrieval Recall@{k}: {score:.4f}")
else:
    print("Retrieval Recall: Not computed or no data.")

print("\n--- No-RAG Generation Metrics ---")
if generation_metrics_no_rag:
    for metric, score in generation_metrics_no_rag.items():
        print(f"No-RAG {metric}: {score:.4f}")
else:
    print("No-RAG ROUGE/BLEU: Not computed or no data.")
print(f"No-RAG LLM Judge Avg Correctness/Relevance: {avg_correctness_relevance_no_rag:.4f}")

print("\n--- RAG Generation Metrics ---")
if generation_metrics_rag:
    for metric, score in generation_metrics_rag.items():
        print(f"RAG {metric}: {score:.4f}")
else:
    print("RAG ROUGE/BLEU: Not computed or no data.")
print(f"RAG LLM Judge Avg Correctness/Relevance: {avg_correctness_relevance_rag:.4f}")
print(f"RAG LLM Judge Avg Faithfulness to Context: {avg_faithfulness_rag:.4f}")

if not results_df.empty:
    print("\n--- Sample Results DataFrame (First 5) ---")
    display(results_df.head())
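    # Save the full results to CSV, creating the output directory if it does not exist yet.
    results_dir = os.path.join(cfg.PROJECT_ROOT, "results")
    os.makedirs(results_dir, exist_ok=True)
    results_csv_path = os.path.join(results_dir, "nq_rag_results.csv")
    results_df.to_csv(results_csv_path, index=False)
    logger.info(f"Results DataFrame saved to {results_csv_path}")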
else:
    print("\nNo results to display in the DataFrame.")


--- Overall Evaluation Metrics Summary ---
Retrieval Recall@1: 0.2800
Retrieval Recall@3: 0.5367
Retrieval Recall@5: 0.6667
Retrieval Recall@10: 0.7900

--- No-RAG Generation Metrics ---
No-RAG rougeL_fmeasure: 0.0703
No-RAG bleu_4: 0.0089
No-RAG LLM Judge Avg Correctness/Relevance: 0.4633

--- RAG Generation Metrics ---
RAG rougeL_fmeasure: 0.3225
RAG bleu_4: 0.1392
RAG LLM Judge Avg Correctness/Relevance: 0.6833
RAG LLM Judge Avg Faithfulness to Context: 0.9167

--- Sample Results DataFrame (First 5) ---


|   | Question | Reference Answer | No-RAG Answer | RAG Answer | Retrieved Context (RAG) | Judge Correct/Relevant (No-RAG) | Judge Correct/Relevant (RAG) | Judge Faithful to Context (RAG) |
|---|---|---|---|---|---|---|---|---|
| 0 | when does the movie shot caller come out in th... | August 18 , 2017 | “Shot Caller” was released in theaters on **Au... | August 18, 2017 | It premiered at the Los Angeles Film Festival ... | 0.0 | 1.0 | 1 |
| 1 | radius of curvature in case of plane mirror | Mathematically , a plane mirror can be conside... | Okay, let's break down the concept of radius o... | A plane mirror can be considered the limit of ... | A plane mirror is a mirror with a flat ( plana... | 1.0 | 1.0 | 1 |
| 2 | how many justices currently serve on the us su... | nine | As of today, November 2, 2023, there are **nin... | Nine | This article is part of the series on the Unit... | 1.0 | 1.0 | 1 |
| 3 | who is hosting the next world cup 2022 | Qatar | Qatar is hosting the next World Cup in 2022. | Qatar | On 19 March 2015 , FIFA sources confirmed that... | 1.0 | 1.0 | 1 |
| 4 | cohesion tension theory of water transport in ... | YES | Okay, let's break down the Cohesion-Tension Th... | The cohesion-tension theory explains water mov... | Water is constantly lost through transpiration... | 1.0 | 1.0 | 1 |


2025-06-01 01:37:14,369 - __main__ - INFO - Results DataFrame saved to /home/denis/kpi/iasa_nlp_labs/nq_rag/results/nq_rag_results.csv


### Further Analysis Ideas:
- Analyze cases where the LLM judge scores the RAG answer low but the No-RAG answer high, or vice versa.
- Correlate retrieval scores (e.g., if the gold doc was in top K) with generation quality and judge scores.
- Error analysis: identify common failure modes (e.g., irrelevant retrieval leading to poor RAG, good retrieval but poor generation, hallucination in No-RAG vs. RAG).
- Experiment with different `TOP_K_RETRIEVAL_FOR_GENERATION` values and observe the impact on the generation metrics and judge scores.
- Try different prompt templates for the generator and judge LLMs.
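
The first two ideas above can be started directly from the `results_data` rows built in the summary cell. The sketch below is one possible starting point, not part of the project code. It assumes each row is a dict with the judge-score keys used in that cell, and that missing scores are stored as the string `'N/A'`. The `gold_in_top_k` key is hypothetical: the notebook does not store a per-question retrieval hit flag, so it would have to be added first.

```python
from statistics import mean

NO_RAG_KEY = "Judge Correct/Relevant (No-RAG)"
RAG_KEY = "Judge Correct/Relevant (RAG)"


def is_score(value):
    """True for real numeric scores; filters out 'N/A', None, and booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def split_judge_disagreements(rows):
    """Return (rag_wins, no_rag_wins): rows where the two judge scores differ."""
    rag_wins, no_rag_wins = [], []
    for row in rows:
        no_rag_score, rag_score = row.get(NO_RAG_KEY), row.get(RAG_KEY)
        if not (is_score(no_rag_score) and is_score(rag_score)):
            continue  # skip rows with a missing score on either side
        if rag_score > no_rag_score:
            rag_wins.append(row)
        elif no_rag_score > rag_score:
            no_rag_wins.append(row)
    return rag_wins, no_rag_wins


def mean_rag_score_by_hit(rows, hit_key="gold_in_top_k"):
    """Average RAG judge score, grouped by a hypothetical retrieval-hit flag."""
    buckets = {}
    for row in rows:
        if hit_key in row and is_score(row.get(RAG_KEY)):
            buckets.setdefault(bool(row[hit_key]), []).append(row[RAG_KEY])
    return {hit: mean(scores) for hit, scores in buckets.items()}


if __name__ == "__main__":
    toy_rows = [
        {NO_RAG_KEY: 0.0, RAG_KEY: 1.0, "gold_in_top_k": True},
        {NO_RAG_KEY: 1.0, RAG_KEY: 0.0, "gold_in_top_k": False},
    ]
    rag_wins, no_rag_wins = split_judge_disagreements(toy_rows)
    print(len(rag_wins), len(no_rag_wins))
    print(mean_rag_score_by_hit(toy_rows))
```

Reading through the `no_rag_wins` rows by hand is a reasonable first step for the error analysis, since those are the questions where retrieval appears to have hurt the answer.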