# VerbatimRAG + Context-Enriched Integration Test

This notebook tests the full integration of ContextEnrichedProcessor with the VerbatimRAG system.
It shows how hierarchical context enrichment fits into indexing and retrieval, and checks that
verbatim span extraction still works on context-enriched chunks.

## Setup

Initialize the environment and load required dependencies.

In [1]:
import sys
import os
from pathlib import Path
from dotenv import load_dotenv

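# Work around "OMP: Error #15" (duplicate OpenMP runtimes, common on macOS).
# This is an unsafe, unsupported workaround; prefer loading a single OpenMP runtime if possible.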
os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'

# Add project root to path
project_root = Path().absolute().parent
sys.path.append(str(project_root))

# Load environment variables from .env.local
load_dotenv(project_root / '.env.local', override=True)

# Check if an API key is loaded
if not os.environ.get('OPENAI_API_KEY'):
    print("⚠️  Warning: OPENAI_API_KEY not found. Please set it in .env.local")
else:
    print("✅ OpenAI API key loaded")

print(f"Project root: {project_root}")
print("✅ Setup complete")

✅ OpenAI API key loaded
Project root: /Users/paulschmitt/DataspellProjects/verbatim-rag
✅ Setup complete
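
`Path().absolute().parent` assumes the notebook runs from a directory exactly one level below the project root. A somewhat more robust sketch walks upward from the current directory until it finds a marker file. The helper `find_project_root` and the choice of `pyproject.toml` as the marker are assumptions about the repository layout, not part of verbatim-rag.

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = "pyproject.toml") -> Path:
    """Return the nearest directory at or above `start` that contains `marker`.

    Hypothetical helper: the marker file name is an assumption about the repo layout.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"No {marker} found at or above {start}")


# Possible use in the setup cell, instead of Path().absolute().parent:
# project_root = find_project_root(Path.cwd())
```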


In [2]:
from verbatim_rag.ingestion.context_enriched_processor import ContextEnrichedProcessor
from verbatim_rag.core import VerbatimRAG
from verbatim_rag.index import VerbatimIndex

print("✅ Imports successful")

✅ Imports successful


## Test 1: Process Document with Context Enrichment

This test verifies that the ContextEnrichedProcessor can process a PDF document
and create context-enriched chunks. Each chunk includes hierarchical context information
(section paths, titles, etc.), which is intended to make retrieval more accurate.
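
The search output in Test 5 shows what an enriched chunk looks like: the document title, then `Section: ...` and optionally `Subsection: ...`, joined to the chunk text with ` | `. The sketch below reproduces that visible format with a hypothetical `enrich_chunk` helper; ContextEnrichedProcessor's actual implementation may build the header differently.

```python
from typing import Optional


def enrich_chunk(title: str, text: str, section: Optional[str] = None,
                 subsection: Optional[str] = None) -> str:
    """Prepend a 'Title | Section: ... | Subsection: ...' header to chunk text.

    Hypothetical helper mirroring the prefix format seen in Test 5's output.
    """
    parts = [title]
    if section:
        parts.append(f"Section: {section}")
    if subsection:
        parts.append(f"Subsection: {subsection}")
    parts.append(text)
    return " | ".join(parts)


print(enrich_chunk("Lexical Grammar Induction", "<chunk text>",
                   section="4 Hierarchical surface realization"))
# Lexical Grammar Induction | Section: 4 Hierarchical surface realization | <chunk text>
```

Putting the section path into the text itself means a section-level question (for example about evaluation) can match a chunk whose body never repeats the section name.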

In [3]:
# Test document path
pdf_path = project_root / "data" / "acl_papers" / "Lexical_grammar_induction.pdf"

# Create context-enriched processor optimized for RAG
processor = ContextEnrichedProcessor.for_rag(
    chunk_size=384,  # Smaller chunks for better retrieval
    overlap=50
)

# Process document
print("📄 Processing document with context enrichment...")
document = processor.process_file(pdf_path, title="Lexical Grammar Induction")

print("✅ Document processed successfully!")
print(f"  Title: {document.title}")
print(f"  Chunks: {len(document.chunks)}")
print(f"  Content type: {document.content_type}")

# Show chunk types
enriched_chunks = [c for c in document.chunks if hasattr(c, 'section_path')]
print(f"  Context-enriched chunks: {len(enriched_chunks)}")

📄 Processing document with context enrichment...
✅ Document processed successfully!
  Title: Lexical Grammar Induction
  Chunks: 124
  Content type: DocumentType.PDF
  Context-enriched chunks: 124
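
`chunk_size=384` and `overlap=50` describe a sliding window: consecutive chunks share 50 units of text, so content near a boundary appears in both neighbouring chunks. The notebook does not say whether the units are tokens or characters; the sketch below assumes whitespace-separated words and is not ContextEnrichedProcessor's actual splitter.

```python
from typing import List


def sliding_window_chunks(words: List[str], chunk_size: int, overlap: int) -> List[List[str]]:
    """Split `words` into windows of `chunk_size` items that overlap by `overlap` items."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


words = [f"w{i}" for i in range(1000)]
chunks = sliding_window_chunks(words, chunk_size=384, overlap=50)
print(len(chunks), [len(c) for c in chunks])  # 3 [384, 384, 332]
```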


## Test 2: Create VerbatimIndex with Context-Enriched Chunks

This test creates a VerbatimIndex using the context-enriched document chunks from Test 1. 
The index will store both the embeddings and the hierarchical context information, 
enabling more precise document retrieval.

In [4]:
# Create VerbatimIndex with context-enriched chunks
print("🗂️ Creating VerbatimIndex with context-enriched chunks...")

# Initialize the index with sentence-transformers embeddings (all-MiniLM-L6-v2);
# with no vector store specified, this run used LocalMilvusStore
index = VerbatimIndex(dense_model="all-MiniLM-L6-v2")

# Add the context-enriched document to the index
print("📝 Adding document to index...")
index.add_documents([document])

print("✅ Index created successfully!")
print(f"  Vector store type: {type(index.vector_store).__name__}")
print(f"  Embedding provider: {type(index.dense_provider).__name__}")

🗂️ Creating VerbatimIndex with context-enriched chunks...
📝 Adding document to index...
✅ Index created successfully!
  Vector store type: LocalMilvusStore
  Embedding provider: SentenceTransformersProvider


## Test 3: Initialize VerbatimRAG System

This test initializes the complete VerbatimRAG system using the context-enriched index. 
It verifies that the RAG system can properly integrate with the hierarchically structured 
document chunks.
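
The sanity check in the cell below catches any exception and only prints it, so a broken index would not stop the notebook. If the notebook is run as an automated check, failing loudly is more useful. A minimal sketch; `check_search` is a hypothetical helper that only assumes `index.search(query, k=...)` returns a sized sequence, as the cell below uses it.

```python
from typing import Callable, Sized


def check_search(search: Callable[..., Sized], query: str = "verbatim", k: int = 3) -> int:
    """Run one search and raise if it returns no results or more than k results."""
    results = search(query, k=k)
    n = len(results)
    if not 0 < n <= k:
        raise AssertionError(f"expected 1..{k} results for {query!r}, got {n}")
    return n


# In the notebook: check_search(index.search)
```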

In [5]:
# Initialize VerbatimRAG with the context-enriched index
print("🤖 Initializing VerbatimRAG system...")

rag = VerbatimRAG(
    index=index  # Pass the index as required parameter
)

print("✅ VerbatimRAG initialized successfully!")

# Test that the index is working by doing a simple search
try:
    test_results = index.search("verbatim", k=3)
    print(f"  Index working: Found {len(test_results)} results for test query")
except Exception as e:
    print(f"  Index test failed: {e}")

🤖 Initializing VerbatimRAG system...
✅ VerbatimRAG initialized successfully!
  Index working: Found 3 results for test query


## Test 4: Query with Context-Enriched Retrieval

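This test runs several queries through the VerbatimRAG system and prints each answer, the number of
cited documents, and the first highlight of the top two documents. The query types vary so you can inspect
how the system behaves with context-enriched chunks; showing that enrichment improves accuracy would
require comparing against a baseline index built without enrichment.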
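
One minimal way to make that comparison, assuming you hand-label which chunk IDs are relevant to each query, is to compute recall@k for a plain index and an enriched index over the same queries. `recall_at_k`, `mean_recall`, and the labelled data below are hypothetical; the numbers are toy values, not results from this notebook.

```python
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant chunk IDs that appear among the top-k retrieved IDs."""
    if not relevant:
        raise ValueError("each query needs at least one relevant chunk ID")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall(runs: Dict[str, List[str]], labels: Dict[str, Set[str]], k: int) -> float:
    """Average recall@k over all labelled queries."""
    return sum(recall_at_k(runs[q], labels[q], k) for q in labels) / len(labels)


# Toy, hand-made data for illustration only.
labels = {"q1": {"c7"}, "q2": {"c3", "c4"}}
plain_run = {"q1": ["c1", "c2", "c7"], "q2": ["c3", "c9", "c8"]}
enriched_run = {"q1": ["c7", "c1", "c2"], "q2": ["c4", "c3", "c9"]}
print(mean_recall(plain_run, labels, k=2), mean_recall(enriched_run, labels, k=2))  # 0.25 1.0
```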

In [6]:
# Test queries that should benefit from hierarchical context
test_queries = [
    "What dataset was used in this study?",
    "What are the limitations of standard RAG systems?", 
    "How does the method work?",
    "What evaluation metrics were used?",
    "What are the main contributions of this work?"
]

print("🔍 Testing queries with context-enriched retrieval...")

for i, query in enumerate(test_queries, 1):
    print(f"\n--- Query {i} ---")
    print(f"Question: {query}")
    
    try:
        # Get response from VerbatimRAG
        response = rag.query(question=query)
        
        print(f"Answer: {response.answer[:200]}...")
        print(f"Source documents: {len(response.documents)} documents cited")
        
        # Show retrieved documents with their context
        print("Retrieved documents:")
        for j, doc in enumerate(response.documents[:2]):
            print(f"  {j+1}. Document: '{doc.title}'")
            if hasattr(doc, 'highlights') and doc.highlights:
                print(f"     Highlights: {len(doc.highlights)} spans")
                for k, highlight in enumerate(doc.highlights[:1]):
                    print(f"       - {highlight.text[:80]}...")
            else:
                print(f"     Content preview: {doc.content[:80] if hasattr(doc, 'content') else 'N/A'}...")
                
    except Exception as e:
        print(f"❌ Error: {e}")
    
    print("-" * 50)

🔍 Testing queries with context-enriched retrieval...

--- Query 1 ---
Question: What dataset was used in this study?
Answer: Thanks for your question! Based on the documents, here are the key points:

• Dataset
• 6 datasets
• dataset
• Russian and Spanish out-of-domain datasets derived from Wikipedia
• the two datasets
• BM...
Source documents: 5 documents cited
Retrieved documents:
  1. Document: 'Lexical Grammar Induction'
     Highlights: 1 spans
       - Dataset...
  2. Document: 'Lexical Grammar Induction'
     Highlights: 3 spans
       - Russian and Spanish out-of-domain datasets derived from Wikipedia...
--------------------------------------------------

--- Query 2 ---
Question: What are the limitations of standard RAG systems?
Answer: Thanks for your question! Based on the documents, here are the key points:

• the number of patterns matching a subgraph of N +1 nodes increases from 2 N to 3 N
• the number of IRTG rules that under o...
Source documents: 5 documents cited
Retr

## Test 5: Analyze Context Benefits

This test inspects the raw search results for a single query. For each of the top five chunks it prints
the score, a content preview (including the prepended title and section path), and the document title
and chunk type from the chunk's metadata.
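
Because the section path is stored inline in each chunk's text (for example `Lexical Grammar Induction | Section: 4 Hierarchical surface realization | ...` in the output below), it can be parsed back out to group or filter results by section. The hypothetical `parse_context_header` helper assumes the ` | `-separated `Section:`/`Subsection:` format visible in that output and a title that contains no ` | ` itself.

```python
from typing import Dict


def parse_context_header(content: str) -> Dict[str, str]:
    """Split 'Title | Section: ... | Subsection: ... | body' into named fields.

    Hypothetical helper; assumes the title contains no ' | ' separator.
    """
    parts = content.split(" | ")
    fields = {"title": parts[0]}
    known = {"Section: ": "section", "Subsection: ": "subsection"}
    i = 1
    while i < len(parts):
        for prefix, key in known.items():
            if parts[i].startswith(prefix) and key not in fields:
                fields[key] = parts[i][len(prefix):]
                break
        else:
            break  # first segment that is not a header label starts the body
        i += 1
    fields["body"] = " | ".join(parts[i:])  # rejoin so ' | ' inside the body survives
    return fields


example = ("Lexical Grammar Induction | Section: 4 Hierarchical surface realization | "
           "or which parsing did not finish")
print(parse_context_header(example)["section"])  # 4 Hierarchical surface realization
```

With the notebook's results, something like `Counter(parse_context_header(r.text).get("section") for r in results)` (with `Counter` from `collections`) would show which sections the top chunks come from.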

In [7]:
# Test specific query to compare context benefits
query = "What are the limitations mentioned in the paper?"

print(f"🔬 Context Analysis: '{query}'")
print("=" * 60)

# Retrieve top chunks
try:
    results = index.search(query, k=5)
    
    print(f"\n📊 Retrieved {len(results)} chunks:")
    
    for i, result in enumerate(results, 1):
        print(f"\n{i}. Score: {result.score:.3f}")
        print(f"   Content: {result.text[:120]}...")
        
        # Check metadata for context information
        if result.metadata:
            if 'title' in result.metadata:
                print(f"   Document: {result.metadata['title']}")
            if 'chunk_type' in result.metadata:
                print(f"   Chunk type: {result.metadata['chunk_type']}")
            
except Exception as e:
    print(f"❌ Search error: {e}")

🔬 Context Analysis: 'What are the limitations mentioned in the paper?'

📊 Retrieved 5 chunks:

1. Score: 0.257
   Content: Lexical Grammar Induction | Section: 3 SZTAKI Institute of Computer Science andras@kornai.com | Subsection: 3.1 Motivati...
   Document: Lexical Grammar Induction
   Chunk type: paragraph

2. Score: 0.228
   Content: Lexical Grammar Induction | Section: 4 Hierarchical surface realization | or which parsing did not finish in 60 seconds,...
   Document: Lexical Grammar Induction
   Chunk type: section

3. Score: 0.217
   Content: Lexical Grammar Induction | Section: 5 Overall architecture | periments. In case of a timeout, a new IRTG is generated k...
   Document: Lexical Grammar Induction
   Chunk type: section

4. Score: 0.209
   Content: Lexical Grammar Induction | Section: 4 Hierarchical surface realization | rguments of a predicate separated by an interv...
   Document: Lexical Grammar Induction
   Chunk type: section

5. Score: 0.205
   Content: Lexical Grammar

## Test 6: Span Extraction with Context

This test verifies that the VerbatimRAG span extraction functionality works correctly 
with context-enriched chunks. It ensures that the hierarchical context doesn't interfere 
with the verbatim span extraction process and that citations are properly generated.
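
The cell below checks this by eye. To turn "context doesn't interfere" into an explicit check, one could assert that every citation is a non-empty exact substring of its source chunk's original body and never comes from the injected header. The `Citation` and `SourceDoc` dataclasses below are hypothetical stand-ins whose `text`/`doc_index` fields mirror what the cell prints; this is a sketch, not VerbatimRAG's own validation logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Citation:
    text: str
    doc_index: int


@dataclass
class SourceDoc:
    header: str  # injected context, e.g. "Title | Section: 5 Evaluation"
    body: str    # original chunk text that verbatim spans should come from


def invalid_citations(citations: List[Citation], docs: List[SourceDoc]) -> List[Citation]:
    """Return citations that are empty, point at no document, or are not verbatim in the body."""
    bad = []
    for c in citations:
        in_range = 0 <= c.doc_index < len(docs)
        if not in_range or not c.text or c.text not in docs[c.doc_index].body:
            bad.append(c)
    return bad


# Toy data: the second citation was taken from the header, not the body.
docs = [SourceDoc("Paper | Section: 5 Evaluation", "toy chunk text about meaning similarity")]
cites = [Citation("meaning similarity", 0), Citation("Section: 5 Evaluation", 0)]
print([c.text for c in invalid_citations(cites, docs)])  # ['Section: 5 Evaluation']
```

Applying it to a real `response` would require the original (pre-enrichment) text of each cited document, which this notebook does not retrieve.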

In [8]:
# Test span extraction to ensure context doesn't interfere
query = "What evaluation metrics were used?"

print(f"🎯 Span Extraction Test: '{query}'")
print("=" * 50)

try:
    # Get a full response with span extraction
    response = rag.query(question=query)
    
    print(f"\n📝 Answer: {response.answer[:200]}...")
    print(f"\n📚 Citations ({len(response.structured_answer.citations)}):")
    
    for i, citation in enumerate(response.structured_answer.citations[:5]):
        print(f"\n{i+1}. Citation:")
        print(f"   Extracted span: {citation.text[:100]}...")
        print(f"   Document index: {citation.doc_index}")
        print(f"   Highlight index: {citation.highlight_index}")
            
except Exception as e:
    print(f"❌ Span extraction error: {e}")

🎯 Span Extraction Test: 'What evaluation metrics were used?'

📝 Answer: Thanks for your question! Based on the documents, here are the key points:

• using automatic evaluation metrics
• average scores (Ave)
• average standardized scores (Ave. z)
• meaning similarity
• re...

📚 Citations (12):

1. Citation:
   Extracted span: using automatic evaluation metrics...
   Document index: 0
   Highlight index: 0

2. Citation:
   Extracted span: average standardized scores (Ave. z)...
   Document index: 1
   Highlight index: 0

3. Citation:
   Extracted span: readability evaluations...
   Document index: 1
   Highlight index: 1

4. Citation:
   Extracted span: average scores (Ave)...
   Document index: 1
   Highlight index: 2

5. Citation:
   Extracted span: meaning similarity...
   Document index: 1
   Highlight index: 3
