# CRAG RAG Pipeline: Interactive Notebook Template

This notebook demonstrates how to use the modular CRAG RAG pipeline for data loading, index building, running the pipeline, and evaluation (including LLM-based evaluation).

In [1]:
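# Install dependencies if needed.
# In the run below, pip found no faiss-gpu distribution for this environment (Python 3.12),
# so the install stopped. If that happens, install faiss-cpu from PyPI, or a GPU build of FAISS via conda.
!pip install -r ../requirements.txt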


Collecting transformers (from -r ../requirements.txt (line 1))
  Using cached transformers-4.53.3-py3-none-any.whl.metadata (40 kB)
Collecting torch (from -r ../requirements.txt (line 2))
  Downloading torch-2.7.1-cp312-cp312-manylinux_2_28_x86_64.whl.metadata (29 kB)
ERROR: Could not find a version that satisfies the requirement faiss-gpu (from versions: none)
ERROR: No matching distribution found for faiss-gpu

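Because the install above failed on `faiss-gpu`, it can help to confirm whether any FAISS build is importable before building or loading the index. The helper below is a hypothetical sketch, not part of `crag_rag`. It relies on `faiss.get_num_gpus()`, which FAISS builds expose; CPU-only builds are expected to report 0, and the code falls back to 0 if the function is missing.

```python
import importlib
import importlib.util


def faiss_status():
    """Report whether FAISS is importable and how many GPUs it can see.

    Returns a dict such as {"installed": False, "num_gpus": None}.
    """
    if importlib.util.find_spec("faiss") is None:
        # Neither faiss-cpu nor faiss-gpu is installed in this environment.
        return {"installed": False, "num_gpus": None}
    faiss = importlib.import_module("faiss")
    # Assumption: treat a missing get_num_gpus as a CPU-only build (0 GPUs).
    get_num_gpus = getattr(faiss, "get_num_gpus", None)
    num_gpus = get_num_gpus() if callable(get_num_gpus) else 0
    return {"installed": True, "num_gpus": num_gpus}


print(faiss_status())
```

If `installed` is False, `VectorStore.load_index` in section 3 will most likely fail as well, since the index file has a `.faiss` extension.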
## 1. Imports and Setup

In [1]:
import os
import json
from crag_rag import (
    load_json_dataset, preprocess_crag_sample, preprocess_crag_document,
    DocumentEncoder, VectorStore, TimeAwareModule,
    NormalQueryEncoder, TemporalQueryEncoder, QueryRouter,
    LLMGenerator, SimpleRAG, TemporalRAGPipeline, Evaluator
)




## 2. Load and Preprocess CRAG Data

In [3]:
# Path to your CRAG dataset (bz2-compressed JSONL)
crag_path = '../crag_data/crag_task_1_dev_v4_release.jsonl.bz2'  # Update as needed


In [15]:
import bz2
samples = []
with bz2.open(crag_path, 'rt', encoding='utf-8') as f:
    for line in f:
        samples.append(json.loads(line))
questions = [preprocess_crag_sample(s) for s in samples]
print(f'Loaded {len(questions)} questions.')


Loaded 2706 questions.
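The cell above reads every raw sample into `samples` before preprocessing. Each CRAG sample also carries its retrieved web pages, so the raw file can be large in memory. For quick experiments, a streaming reader with an optional cap is one alternative. `iter_crag_samples` and its `limit` parameter are hypothetical names used only for this sketch.

```python
import bz2
import itertools
import json


def iter_crag_samples(path, limit=None):
    """Yield one parsed sample per non-empty line of a bz2-compressed JSONL file.

    limit (hypothetical parameter) stops after that many samples; None reads all.
    """
    with bz2.open(path, "rt", encoding="utf-8") as f:
        lines = (line for line in f if line.strip())  # skip blank lines
        for line in itertools.islice(lines, limit):
            yield json.loads(line)


# Usage with the notebook's own preprocessing step:
# questions = [preprocess_crag_sample(s) for s in iter_crag_samples(crag_path, limit=100)]
```

Because samples are parsed one at a time, only the preprocessed questions stay in memory.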


## 3. Build or Load Document Index

In [8]:
# Build index (run once)
# from scripts.build_index import build_document_index
# build_document_index(crag_path, output_dir='index_data')

# Load index
index_dir = 'index_data'
vector_store = VectorStore(embedding_dim=384)  # must match the encoder output size (all-MiniLM-L6-v2 produces 384-d vectors)
vector_store.load_index(os.path.join(index_dir, 'document_index.faiss'))


VectorStore initialized with embedding dimension: 384
VectorStore index loaded from index_data/document_index.faiss
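The index dimension of 384 matches the output size of `sentence-transformers/all-MiniLM-L6-v2`, which the log in section 4 shows `NormalQueryEncoder` loading. Searching a flat index amounts to scoring the query vector against every stored vector. The pure-Python sketch below uses cosine similarity (the inner product of L2-normalized vectors, which is what a FAISS `IndexFlatIP` computes over normalized inputs); whether `VectorStore` normalizes or uses L2 distance instead is not shown here and is an assumption. The dimension check also matters for the temporal path: if the fine-tuned Contriever keeps the base model's 768-d output, its query vectors could not be searched against this 384-d index directly.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def top_k_cosine(query_vec, doc_vecs, k=3):
    """Return (doc_index, cosine_score) pairs for the k most similar documents.

    Raises ValueError when the query (or any document) does not have the
    same dimension as the index, e.g. 768-d queries against a 384-d index.
    """
    if not doc_vecs:
        return []
    dim = len(doc_vecs[0])
    if len(query_vec) != dim:
        raise ValueError(f"query has dim {len(query_vec)}, index expects {dim}")
    q = l2_normalize(query_vec)
    scores = []
    for i, d in enumerate(doc_vecs):
        if len(d) != dim:
            raise ValueError(f"document {i} has dim {len(d)}, expected {dim}")
        scores.append((i, sum(a * b for a, b in zip(q, l2_normalize(d)))))
    # Highest similarity first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]
```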


## 4. Initialize Pipeline Components

In [None]:
time_aware = TimeAwareModule()
normal_enc = NormalQueryEncoder()


TimeAwareModule initialized with basic temporal keyword patterns.
NormalQueryEncoder loaded model sentence-transformers/all-MiniLM-L6-v2 on cuda
Error loading fine-tuned Contriever model: We couldn't connect to 'https://huggingface.co' to load the files, and couldn't find them in the cached files.
Check your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
Please ensure 'model_path' is correct and compatible with AutoModel.from_pretrained or provide custom loading logic.
QueryRouter initialized.
Error loading LLM model distilbert/distilgpt2: We couldn't connect to 'https://huggingface.co' to load the files, and couldn't find them in the cached files.
Check your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
Please ensure the model name is correct and you have enough resources.


NameError: name 'vector_store' is not defined
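The loading errors above mean the Hugging Face Hub could not be reached and the requested models were not in the local cache. On a machine without internet access, one option is to download the models on a connected machine, copy the Hugging Face cache over, and switch the libraries to offline mode (see the offline-mode page linked in the error). `enable_hf_offline` is a hypothetical helper; `HF_HUB_OFFLINE`, `TRANSFORMERS_OFFLINE`, and `HF_HOME` are the documented environment variables.

```python
import os


def enable_hf_offline(cache_dir=None):
    """Tell huggingface_hub / transformers to use only locally cached files.

    Call this before importing crag_rag (and therefore transformers), since
    huggingface_hub reads these variables when it is first imported.
    """
    os.environ["HF_HUB_OFFLINE"] = "1"
    os.environ["TRANSFORMERS_OFFLINE"] = "1"
    if cache_dir is not None:
        # HF_HOME sets the root of the Hugging Face cache (downloaded models go under hub/).
        os.environ["HF_HOME"] = os.path.abspath(cache_dir)


# enable_hf_offline("/path/to/copied/hf_cache")  # then run the imports in section 1
```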

In [6]:
temporal_enc = TemporalQueryEncoder(model_path='../models/Tscontriever/pytorch_model.bin')  # Update path


Error loading fine-tuned Contriever model: We couldn't connect to 'https://huggingface.co' to load the files, and couldn't find them in the cached files.
Check your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
Please ensure 'model_path' is correct and compatible with AutoModel.from_pretrained or provide custom loading logic.
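The error asks for a `model_path` that `AutoModel.from_pretrained` can load. If `TemporalQueryEncoder` passes the path straight to that method (the message suggests so, but the class is not shown here), note that `from_pretrained` expects a Hub model id or a local directory holding `config.json` plus the weights, rather than the path of a single `pytorch_model.bin` file. The hypothetical helper below checks a local directory for those files before the encoder is built. It does not validate the weights themselves, and if the encoder also loads a tokenizer from the same path, the tokenizer files would be needed too.

```python
import os

# Single-file weight names that transformers looks for in a local model directory.
WEIGHT_FILES = ("model.safetensors", "pytorch_model.bin")


def check_pretrained_dir(path):
    """Return a list of problems that would stop from_pretrained(path) loading locally.

    An empty list means the directory has config.json and a weight file;
    sharded checkpoints (an *.index.json file) are also accepted.
    """
    if not os.path.isdir(path):
        return [f"{path!r} is not a directory; pass the model folder, not a weights file"]
    problems = []
    files = set(os.listdir(path))
    if "config.json" not in files:
        problems.append("missing config.json")
    has_weights = any(name in files for name in WEIGHT_FILES) or any(
        name.endswith(".index.json") for name in files
    )
    if not has_weights:
        problems.append("no model.safetensors / pytorch_model.bin (or sharded index) found")
    return problems


# print(check_pretrained_dir('../models/Tscontriever'))
```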


In [10]:
router = QueryRouter(time_aware, normal_enc, temporal_enc)


QueryRouter initialized.
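According to its log, `TimeAwareModule` detects temporal queries with basic keyword patterns, and `QueryRouter` then picks the temporal or the normal encoder (section 5 shows the Steve Nash question routed to `NormalQueryEncoder`). The sketch below illustrates that kind of keyword-based routing. The patterns and `route_query` are hypothetical; the module's actual patterns are not shown in this notebook.

```python
import re

# Hypothetical temporal cues; not the TimeAwareModule's real pattern list.
TEMPORAL_PATTERNS = [
    r"\b(?:19|20)\d{2}\b",  # explicit years such as 2019
    r"\b(?:today|yesterday|tomorrow|now|currently|current|latest|recent|recently)\b",
    r"\b(?:last|this|next) (?:week|month|year|season)\b",
    r"\b\d+ (?:days|weeks|months|years) ago\b",
]
_TEMPORAL_RE = re.compile("|".join(TEMPORAL_PATTERNS), re.IGNORECASE)


def route_query(query):
    """Return "temporal" if the query contains any time cue, otherwise "normal"."""
    return "temporal" if _TEMPORAL_RE.search(query) else "normal"


print(route_query("who won the nba finals in 2019?"))
```

Keyword rules are cheap and transparent, but they miss implicit time references (for example questions whose answer changes over time without any date word), which is one reason a learned classifier is sometimes used instead.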


In [12]:
llm = LLMGenerator()  # loads distilbert/distilgpt2 (see log below), which has a 1024-token context window


Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
Device set to use cuda


LLMGenerator loaded model distilbert/distilgpt2 on cuda


In [13]:
pipeline = TemporalRAGPipeline(time_aware, router, vector_store, llm)


TemporalRAGPipeline initialized with re-ranking weights: {'semantic': 0.5, 'temporal_metadata': 0.3, 'temporal_content': 0.2}
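The log reports three re-ranking weights that sum to 1.0. One plausible reading is a weighted linear combination of a semantic similarity score and two temporal scores; the sketch below assumes exactly that and assumes each component score is on a comparable 0 to 1 scale. How `TemporalRAGPipeline` computes its component scores is not shown here, and `rerank` is a hypothetical name.

```python
# Weights copied from the TemporalRAGPipeline log line above.
WEIGHTS = {"semantic": 0.5, "temporal_metadata": 0.3, "temporal_content": 0.2}


def rerank(candidates, weights=WEIGHTS):
    """Sort candidates by a weighted sum of their component scores.

    Each candidate is a dict with a "doc_id" and one score per weight key;
    a missing score counts as 0. Returns (doc_id, combined_score), best first.
    """
    def combined(candidate):
        return sum(w * candidate.get(name, 0.0) for name, w in weights.items())

    scored = [(c["doc_id"], combined(c)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With weights summing to 1.0, the combined score stays on the same scale as the components, so a document with a strong date match can overtake one that is only semantically closer.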


## 5. Run Pipeline on a Sample Question

In [16]:
q = questions[0]
print('Question:', q['text'])
answer = pipeline.answer_question(q['text'])
print('Generated Answer:', answer)
print('Gold Answer:', q['answer'])


Token indices sequence length is longer than the specified maximum sequence length for this model (2458 > 1024). Running this sequence through the model will result in indexing errors
The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Question: how many 3-point attempts did steve nash average per game in seasons he made the 50-40-90 club?
Query 'how many 3-point attempts did steve nash average per game in seasons he made the 50-40-90 club?' identified as non-temporal. Using NormalQueryEncoder.
Generated Answer: Error during answer generation: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Gold Answer: 4 3-points attempts per game


/opt/conda/conda-bld/pytorch_1729647378361/work/aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [14,0,0], thread: [64,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
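The first warning in this output explains the failure: the prompt is 2458 tokens long, but distilgpt2, like GPT-2, has only 1024 position embeddings. Positions beyond 1023 index past the end of the embedding table, which on CUDA surfaces as the `srcIndex < srcSelectDimSize` device-side assertion instead of a readable Python error. After such an assertion the CUDA context is unusable, so the kernel has to be restarted. The fix is to keep prompt plus generated tokens within the window, for example by dropping lower-ranked passages. In the sketch below, `build_prompt`, its template, and the `max_new_tokens=100` default are hypothetical; `count_tokens` could be `lambda s: len(tokenizer(s)["input_ids"])` with the generator's tokenizer.

```python
def build_prompt(question, passages, count_tokens, max_context=1024, max_new_tokens=100):
    """Pack as many passages as fit so prompt + generation stays within max_context.

    passages are assumed to be sorted best-first. Token counts of the parts are
    summed, which is only an approximation for BPE tokenizers, so keep a margin.
    """
    header = "Answer the question using the context.\n\nContext:\n"
    footer = f"\n\nQuestion: {question}\nAnswer:"
    budget = max_context - max_new_tokens - count_tokens(header) - count_tokens(footer)
    if budget < 0:
        raise ValueError("question alone does not fit in the context window")
    kept, used = [], 0
    for passage in passages:
        cost = count_tokens(passage + "\n")
        if used + cost > budget:
            break  # stop at the first passage that would overflow the window
        kept.append(passage)
        used += cost
    return header + "\n".join(kept) + footer
```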

## 6. Batch Evaluation (Exact Match)

In [None]:
evaluator = Evaluator()
results = evaluator.run_evaluation_suite(pipeline, questions[:10])  # Use a small batch for demo
print(results)
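How `Evaluator` normalizes answers for exact match is not shown here. A common convention, used by SQuAD-style evaluation, lowercases both strings and strips punctuation, English articles, and extra whitespace before comparing. The functions below are a hypothetical sketch of that convention.

```python
import re
import string

_PUNCT = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, and collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    return normalize_answer(prediction) == normalize_answer(gold)


def exact_match_rate(predictions, golds):
    """Fraction of predictions that exactly match their gold answer after normalization."""
    if len(predictions) != len(golds):
        raise ValueError("predictions and golds must have the same length")
    if not golds:
        return 0.0
    return sum(exact_match(p, g) for p, g in zip(predictions, golds)) / len(golds)
```

Even with normalization, a correct answer worded differently from the gold answer "4 3-points attempts per game" (say, "about four per game") scores 0, which is one motivation for the LLM-based evaluation below.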


## 7. LLM-based Evaluation (Optional, Requires OpenAI API Key)
You can use OpenAI's GPT-3.5/4 or HuggingFace LLMs to judge the quality of generated answers.
Below is a template for OpenAI-based evaluation.

In [None]:
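# !pip install openai  (openai>=1.0; the older openai.ChatCompletion API was removed in 1.0)
import os
from openai import OpenAI

# Read the key from the environment instead of hard-coding it in the notebook.
client = OpenAI(api_key=os.environ.get('OPENAI_API_KEY'))

def gpt_judge(question, gold, pred, model='gpt-3.5-turbo'):
    """Ask an OpenAI chat model to grade a predicted answer against the gold answer."""
    prompt = (
        'You are an expert judge.\n'
        f'Question: {question}\n'
        f'Gold Answer: {gold}\n'
        f'Predicted Answer: {pred}\n'
        'Is the predicted answer correct, partially correct, or incorrect?\n'
        'Respond with one of: CORRECT, PARTIAL, INCORRECT, and a short justification.'
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{'role': 'user', 'content': prompt}],
        max_tokens=100,
        temperature=0,  # reduce run-to-run variation in the verdicts
    )
    return response.choices[0].message.content

# Example usage:
# judge_result = gpt_judge(q['text'], q['answer'], answer)
# print(judge_result)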
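`gpt_judge` returns free text. To aggregate verdicts over a batch, the label requested in the prompt (CORRECT, PARTIAL, or INCORRECT) can be extracted from each reply and counted. The helpers below are a hypothetical sketch; replies that do not contain an uppercase label are counted as `UNPARSED` rather than guessed.

```python
import re
from collections import Counter

LABELS = ("CORRECT", "PARTIAL", "INCORRECT", "UNPARSED")
# Word boundaries stop "CORRECT" from matching inside "INCORRECT".
_LABEL_RE = re.compile(r"\b(CORRECT|PARTIAL|INCORRECT)\b")


def parse_verdict(judge_text):
    """Return the first verdict label in the judge's reply, or "UNPARSED"."""
    match = _LABEL_RE.search(judge_text)
    return match.group(1) if match else "UNPARSED"


def summarize_verdicts(judge_texts):
    """Return the fraction of replies carrying each label."""
    counts = Counter(parse_verdict(t) for t in judge_texts)
    total = len(judge_texts)
    return {label: counts[label] / total if total else 0.0 for label in LABELS}


# replies = [gpt_judge(q['text'], q['answer'], pipeline.answer_question(q['text'])) for q in questions[:10]]
# print(summarize_verdicts(replies))
```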
