In [1]:
# RAG Workshop - Naive RAG Challenges
# Run 'uv sync' in the project root if dependencies are missing


# RAG Workshop - Naive RAG Challenges

This notebook demonstrates the key limitations of naive RAG systems using our extended Wikipedia dataset. We'll focus on scenarios that clearly show where naive RAG fails and why advanced techniques are necessary.

## Dataset Overview:

- **61 articles**, including Wikipedia articles, long technical blog posts by Lilian Weng, and arXiv papers
- **1,210 pre-computed chunks** of 300 characters each, with a 50-character overlap between consecutive chunks
- **Pre-embedded** using OpenAI text-embedding-3-small
- **Cloud-hosted** on Qdrant for reliable access
- **Includes cross-domain articles** to demonstrate naive RAG limitations
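As a rough illustration of the 300/50 chunking parameters, here is a minimal sketch that assumes a plain fixed-width character splitter. `chunk_text` is a hypothetical helper, and the workshop's ingestion script may split differently (for example, on word boundaries).

```python
def chunk_text(text: str, chunk_size: int = 300, overlap: int = 50) -> list[str]:
    """Split text into fixed-width character windows; consecutive windows share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # with the defaults, each chunk starts 250 characters after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


# A 1,000-character toy document of repeating digits
doc = "".join(str(i % 10) for i in range(1000))
pieces = chunk_text(doc)
print(len(pieces), [len(p) for p in pieces])  # → 4 [300, 300, 300, 250]
print(pieces[0][-50:] == pieces[1][:50])      # → True (the 50-character overlap)
```

The overlap is what keeps a phrase of up to 50 characters that straddles a chunk boundary intact in at least one of the two neighbouring chunks.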

# 1. Setup Prerequisites

## 🔗 Complete Setup Required

Before running this notebook, you **must** complete the workshop setup process. This includes:

- Setting up your Qdrant database (Cloud or Docker)
- Configuring environment variables
- Running the data ingestion script

📖 **Please follow the complete setup guide here: [`SETUP.md`](../SETUP.md)**

The setup process takes about 5-10 minutes and only needs to be done once for the entire workshop.

## ⚠️ Important Notes

- **All workshop notebooks** use the same setup process
- You can choose between **Qdrant Cloud** (recommended) or **local Docker**
- The setup guide includes comprehensive troubleshooting
- Once setup is complete, you can run any workshop notebook

**🚫 Do not proceed** with this notebook until you've completed the setup in [`SETUP.md`](../SETUP.md)!

In [2]:
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())

True

## 1.1. Connect to Your Qdrant Collection

Now that you've run the ingestion script, let's connect to your Qdrant collection and verify the data is loaded correctly.

In [3]:
import os
from openai import OpenAI
from qdrant_client import QdrantClient

# Check if required environment variables are set
qdrant_url = os.getenv("QDRANT_URL")
qdrant_api_key = os.getenv("QDRANT_API_KEY")
openai_api_key = os.getenv("OPENAI_API_KEY")

# Validate setup
if not openai_api_key:
    print("❌ Missing OPENAI_API_KEY environment variable")
    print("💡 Please set this in your .env file and restart the notebook")
    raise ValueError("OpenAI API key not configured")

if not qdrant_url:
    print("❌ Missing QDRANT_URL environment variable")
    print("💡 Please set this in your .env file and restart the notebook")
    raise ValueError("Qdrant URL not configured")

# Determine if this is a local or cloud setup
is_local_setup = "localhost" in qdrant_url.lower()

if is_local_setup:
    print("🐳 Detected local Docker setup")
    if qdrant_api_key:
        print("⚠️  Note: QDRANT_API_KEY not needed for local setup")
else:
    print("☁️  Detected Qdrant Cloud setup")
    if not qdrant_api_key:
        print("❌ Missing QDRANT_API_KEY for cloud setup")
        print("💡 Please set this in your .env file and restart the notebook")
        raise ValueError("Qdrant API key required for cloud setup")

# Initialize OpenAI client
openai_client = OpenAI()

# Initialize Qdrant client (works for both local and cloud)
qdrant_client = QdrantClient(
    url=qdrant_url,
    api_key=qdrant_api_key  # None when QDRANT_API_KEY is unset, as is typical for local Docker
)

# Collection configuration
collection_name = "workshop_wikipedia_extended"
embedding_model = "text-embedding-3-small"

print(f"✅ Connected to Qdrant {'locally' if is_local_setup else 'Cloud'}")
print(f"📚 Collection: {collection_name}")
print(f"🤖 Embedding model: {embedding_model}")
print(f"🌐 Qdrant URL: {qdrant_url}")

☁️  Detected Qdrant Cloud setup
✅ Connected to Qdrant Cloud
📚 Collection: workshop_wikipedia_extended
🤖 Embedding model: text-embedding-3-small
üåê Qdrant URL: https://18cd8b2c-252d-4e81-8824-dbfbb22674f6.europe-west3-0.gcp.cloud.qdrant.io:6333
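The cell above treats the setup as local only when the URL contains the substring `"localhost"`, so a URL such as `http://127.0.0.1:6333` would be classified as Cloud and, without an API key, stop with a `ValueError`. A stricter check, sketched here with the standard library's `urllib.parse` and a hypothetical `is_local_qdrant` helper, compares the parsed hostname instead. It assumes the URL includes a scheme such as `http://`.

```python
from urllib.parse import urlparse

# Hostnames that refer to this machine (IPv4 and IPv6 loopback included)
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def is_local_qdrant(url: str) -> bool:
    """Return True if the Qdrant URL points at this machine, e.g. a local Docker container."""
    # .hostname is lowercased and has IPv6 brackets stripped
    return urlparse(url).hostname in LOCAL_HOSTS


print(is_local_qdrant("http://localhost:6333"))                    # → True
print(is_local_qdrant("http://127.0.0.1:6333"))                    # → True
print(is_local_qdrant("https://my-cluster.cloud.qdrant.io:6333"))  # → False
```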


## 1.2. Verify Collection and Dataset

Let's verify that your ingestion was successful and the data is properly loaded:

In [6]:
try:
    # Get collection information
    collection_info = qdrant_client.get_collection(collection_name)
    point_count = collection_info.points_count
    
    print(f"🔗 Using Qdrant {'locally (Docker)' if is_local_setup else 'Cloud'}")
    
    if point_count == 0:
        print("⚠️ Collection exists but is empty!")
        print("💡 Please run the ingestion script: python scripts/ingest_to_qdrant_cloud.py")
    else:
        print("📊 Collection Statistics:")
        print(f"   Total chunks: {point_count:,}")
        print(f"   Vector dimension: {collection_info.config.params.vectors.size}")
        print(f"   Distance metric: {collection_info.config.params.vectors.distance}")
        
        if point_count == 1210:
            print("✅ Expected number of chunks found! Ingestion was successful.")
        else:
            print(f"⚠️ Expected 1,210 chunks but found {point_count:,}. Ingestion may be incomplete.")

        # Sample a few points to see the data structure
        sample_points = qdrant_client.scroll(
            collection_name=collection_name,
            limit=3,
            with_payload=True,
            with_vectors=False
        )[0]

        print("\n📝 Sample data structure:")
        for i, point in enumerate(sample_points):
            payload = point.payload
            print(f"\nChunk {i+1}:")
            print(f"   Title: {payload.get('title', 'Unknown')}")
            print(f"   Text preview: {payload.get('text', '')[:100]}...")
            print(f"   Chunk {payload.get('chunk_index', 0)+1} of {payload.get('total_chunks', 0)}")
            
except Exception as e:
    print(f"❌ Error accessing collection '{collection_name}': {e}")
    print("\n💡 Troubleshooting:")
    print("1. Make sure you've run: python scripts/ingest_to_qdrant_cloud.py")
    if is_local_setup:
        print("2. For Docker setup: Check if container is running with 'docker ps'")
        print("3. Restart Qdrant if needed: docker run -d -p 6333:6333 -p 6334:6334 qdrant/qdrant:v1.13.2")
    else:
        print("2. Check your QDRANT_URL and QDRANT_API_KEY in .env file")
        print("3. Verify your Qdrant Cloud cluster is running")
    raise

🔗 Using Qdrant Cloud
📊 Collection Statistics:
   Total chunks: 1,210
   Vector dimension: 1536
   Distance metric: Cosine
✅ Expected number of chunks found! Ingestion was successful.

📝 Sample data structure:

Chunk 1:
   Title: BERT (language model)
   Text preview: Bidirectional encoder representations from transformers (BERT) is a language model introduced in Oct...
   Chunk 1 of 10

Chunk 2:
   Title: BERT (language model)
   Text preview: Euclidean space. Encoder: a stack of Transformer blocks with self-attention, but without causal mask...
   Chunk 2 of 10

Chunk 3:
   Title: BERT (language model)
   Text preview: consists of a sinusoidal function that takes the position in the sequence as input. Segment type: Us...
   Chunk 3 of 10
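Each payload carries `title`, `chunk_index` (0-based, since the cell prints `chunk_index + 1`) and `total_chunks`, so a per-article completeness check is possible in addition to the total-count check. A minimal sketch with a hypothetical `find_missing_chunks` helper that works on a plain list of payload dicts:

```python
from collections import defaultdict


def find_missing_chunks(payloads: list[dict]) -> dict[str, list[int]]:
    """Map each article title to the 0-based chunk indices absent from `payloads`.

    Assumes every payload has 'title', 'chunk_index' and 'total_chunks' keys,
    as in the sample printed above.
    """
    seen: dict[str, set[int]] = defaultdict(set)
    totals: dict[str, int] = {}
    for p in payloads:
        seen[p["title"]].add(p["chunk_index"])
        totals[p["title"]] = p["total_chunks"]

    missing = {}
    for title, indices in seen.items():
        gaps = sorted(set(range(totals[title])) - indices)
        if gaps:
            missing[title] = gaps
    return missing


# Toy payloads: chunk index 1 of a 3-chunk article is absent
payloads_demo = [
    {"title": "BERT (language model)", "chunk_index": 0, "total_chunks": 3},
    {"title": "BERT (language model)", "chunk_index": 2, "total_chunks": 3},
]
print(find_missing_chunks(payloads_demo))  # → {'BERT (language model)': [1]}
```

To run it against the collection, the payloads would first have to be gathered by paging through `qdrant_client.scroll`, passing the returned next-page offset back as `offset` until it is `None`.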


## 2. Build the Q/A Chatbot

Now we can focus on the core RAG functionality without worrying about data preparation!

![Naive RAG pipeline](../imgs/naive-rag.png)

### 2.1. Retrieval - Search the vector database for relevant chunks

In [7]:
def vector_search(query, top_k=2):
    """Search the Qdrant Cloud collection for relevant chunks."""
    # Create embedding of the query
    response = openai_client.embeddings.create(
        input=query,
        model=embedding_model
    )
    query_embedding = response.data[0].embedding

    # Similarity search using the embedding
    search_result = qdrant_client.query_points(
        collection_name=collection_name,
        query=query_embedding,
        with_payload=True,
        limit=top_k,
    ).points
    
    return [result.payload for result in search_result]
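`query_points` leaves the ranking to Qdrant. Because the collection was created with the Cosine distance metric, the results correspond to the chunks whose embeddings have the highest cosine similarity to the query embedding. The brute-force sketch below (hypothetical helpers, toy 2-D vectors instead of 1,536-dimensional embeddings) shows that ranking; Qdrant itself uses an approximate HNSW index rather than scoring every point.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def brute_force_search(query_vec: list[float], points: list[tuple[dict, list[float]]], top_k: int = 2) -> list[dict]:
    """Score every (payload, vector) pair against the query and return the top_k payloads."""
    scored = [(cosine_similarity(query_vec, vec), payload) for payload, vec in points]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest similarity first
    return [payload for _, payload in scored[:top_k]]


toy_points = [
    ({"title": "A"}, [1.0, 0.0]),
    ({"title": "B"}, [0.7, 0.7]),
    ({"title": "C"}, [0.0, 1.0]),
]
print(brute_force_search([1.0, 0.1], toy_points))  # → [{'title': 'A'}, {'title': 'B'}]
```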

### 2.2. Generation - Use retrieved chunks to generate answers

In [8]:
import json

def model_generate(prompt, model="gpt-4o"):
    """Generate response using OpenAI's chat completion."""
    messages = [{"role": "user", "content": prompt}]
    response = openai_client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0,  # Minimize randomness (outputs can still vary slightly between calls)
    )
    return response.choices[0].message.content

def prompt_template(question, context):
    """Create a prompt template for RAG."""
    return f"""You are an AI Assistant that provides answers to questions based on the following context. 
Make sure to only use the context to answer the question. Keep the wording very close to the context.

Context:
```
{json.dumps(context)}
```

User question: {question}

Answer in markdown:"""

def generate_answer(question):
    """Complete RAG pipeline: retrieve and generate."""
    # Retrieval: search the knowledge base
    search_result = vector_search(question)
    if not search_result:
        return "No relevant information found."

    # Generation: create prompt and generate answer
    prompt = prompt_template(question, search_result)
    return model_generate(prompt)
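`prompt_template` serializes the whole payload list with `json.dumps`, so metadata such as `chunk_index` and `total_chunks` reaches the model alongside the text. One possible alternative, sketched with a hypothetical `format_context` helper, renders each chunk as a numbered source with only its title and text:

```python
def format_context(payloads: list[dict]) -> str:
    """Render retrieved chunks as numbered sources, keeping only title and text."""
    blocks = []
    for i, p in enumerate(payloads, start=1):
        blocks.append(f"[{i}] {p.get('title', 'Unknown')}\n{p.get('text', '')}")
    return "\n\n".join(blocks)


# Toy payloads shaped like the collection's (placeholder text)
chunks = [
    {"title": "Article A", "text": "First chunk text.", "chunk_index": 0, "total_chunks": 10},
    {"title": "Article B", "text": "Second chunk text.", "chunk_index": 3, "total_chunks": 8},
]
print(format_context(chunks))
```

A variant of `prompt_template` would insert this string directly in place of `json.dumps(context)`; whether that changes answer quality in this notebook is untested.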

### 2.3. Test Basic RAG Functionality

In [9]:
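# Test with a clear, unambiguous question first
question = "When was BERT introduced and by which organization?"
# Show the top 3 chunks for inspection. Note that generate_answer() calls
# vector_search() with the default top_k=2, so only the first two reach the model.
search_result = vector_search(question, top_k=3)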

print(f"🔍 Question: {question}")
print("\n📚 Retrieved Sources:")
# Print results in an easy-to-read format
for i, result in enumerate(search_result):
    print(f"{i+1}. {result['title']}")
    print(f"   Text: {result['text'][:100]}...")
    print(f"   Chunk {result['chunk_index']+1} of {result['total_chunks']}")
    print("\n")

# Generate answer
answer = generate_answer(question)
print("\n🤖 Generated Answer:")
print(answer)

🔍 Question: When was BERT introduced and by which organization?

📚 Retrieved Sources:
1. BERT (language model)
   Text: decoder, BERT can't be prompted and can't generate text, while bidirectional models in general do no...
   Chunk 8 of 10


2. BERT (language model)
   Text: sentence. On October 25, 2019, Google announced that they had started applying BERT models for Engli...
   Chunk 9 of 10


3. BERT (language model)
   Text: Bidirectional encoder representations from transformers (BERT) is a language model introduced in Oct...
   Chunk 1 of 10



🤖 Generated Answer:
```markdown
BERT was originally published by Google researchers Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova.
```


## 4. RAG Evaluation with RAGAS

Now let's evaluate our naive RAG system using **RAGAS** to establish baseline performance metrics and quantify the confusion we've observed.

### Context-Focused Metrics:

1. **Context Precision**: How well are relevant chunks ranked at the top?
2. **Context Recall**: How much of the necessary information was retrieved?
3. **Context Relevancy**: How relevant is the retrieved context to the question?

We use **RAGAS** because it is purpose-built for RAG evaluation and focuses on the quality of the retrieved context, which largely determines how well the generator can answer. In the run below, the evaluator reports context recall only.
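The first two metrics can be made concrete with simplified formulas. The sketch below assumes the judgments are already available as lists: per-chunk relevance labels in rank order for context precision, and per-claim support labels (one per claim in the ground-truth answer) for context recall. In RAGAS these labels are produced by an LLM, and `rag_evaluator_v2` may compute the scores differently; context relevancy is not shown.

```python
def context_precision(relevance: list[int]) -> float:
    """Average of precision@k over the ranks k that hold a relevant chunk.

    relevance[k - 1] is 1 if the chunk at rank k is relevant, else 0.
    """
    hits = 0
    weighted = 0.0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            weighted += hits / k  # precision@k, counted only at relevant ranks
    return weighted / hits if hits else 0.0


def context_recall(claims_supported: list[bool]) -> float:
    """Fraction of ground-truth claims that the retrieved context supports."""
    return sum(claims_supported) / len(claims_supported) if claims_supported else 0.0


print(context_precision([1, 0, 1]))   # (1/1 + 2/3) / 2 ≈ 0.833
print(context_precision([0, 1, 1]))   # (1/2 + 2/3) / 2 ≈ 0.583
print(context_recall([True, False]))  # → 0.5
```

Both rankings contain two relevant chunks, yet `[1, 0, 1]` scores higher because a relevant chunk sits at rank 1; this is how context precision rewards placing relevant chunks at the top.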

In [10]:
# Import the RAGAS evaluation utility
from rag_evaluator_v2 import evaluate_naive_rag_v2

# Run evaluation on the current RAG system using RAGAS
print("🔍 Evaluating your Naive RAG system with RAGAS...")
print("This will evaluate context quality metrics on each question in the evaluation dataset...\n")

baseline_results = evaluate_naive_rag_v2(
    vector_search_func=vector_search,
    generate_answer_func=generate_answer
)

🔍 Evaluating your Naive RAG system with RAGAS...
This will evaluate context quality metrics on each question in the evaluation dataset...

✅ Loaded 14 questions from evaluation dataset

Evaluating 14 questions...

Question 1/14: Who introduced the ReLU (rectified linear unit) ac...
Question 2/14: What was the first working deep learning algorithm...
Question 3/14: Which CNN achieved superhuman performance in a vis...
Question 4/14: When was BERT introduced and by which organization...
Question 5/14: What are the two model sizes BERT was originally i...
Question 6/14: What percentage of tokens are randomly selected fo...
Question 7/14: Who introduced the term 'deep learning' to the mac...
Question 8/14: Which three researchers were awarded the 2018 Turi...
Question 9/14: When was the first GPT introduced and by which org...
Question 10/14: What were the three parameter sizes of the first v...
Question 11/14: What is the 'one in ten rule' in regression analys...
Question 12/14: What is the essence of overfitt



RAGAS EVALUATION RESULTS

📋 INDIVIDUAL QUESTION SCORES:
------------------------------------------------------------
 1. 🟢 1.000 - Who introduced the ReLU (rectified linear unit) activation f...
 2. 🟢 1.000 - What was the first working deep learning algorithm and who p...
 3. 🟢 1.000 - Which CNN achieved superhuman performance in a visual patter...
 4. 🔴 0.000 - When was BERT introduced and by which organization?
 5. 🟢 1.000 - What are the two model sizes BERT was originally implemented...
 6. 🟢 1.000 - What percentage of tokens are randomly selected for the mask...
 7. 🔴 0.000 - Who introduced the term 'deep learning' to the machine learn...
 8. 🔴 0.000 - Which three researchers were awarded the 2018 Turing Award f...
 9. 🟢 1.000 - When was the first GPT introduced and by which organization?
10. 🟢 1.000 - What were the three parameter sizes of the first versions of...
11. 🟢 1.000 - What is the 'one in ten rule' in regression analysis?
12. 🟢 1.00
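`evaluate_naive_rag_v2` returns a dict whose `individual_results` entries each hold a `question` and a `scores` dict (its structure is printed in the next cell). A small hypothetical helper can recompute the mean and list the questions that scored 0. As a sanity check on the printout, the reported aggregate of 0.7857142857142857 equals 11/14, consistent with a plain mean over 14 per-question scores that sum to 11 (the evaluator's aggregation method itself is an assumption).

```python
from statistics import mean


def summarize_scores(results: dict, metric: str = "context_recall") -> tuple[float, list[str]]:
    """Return the mean per-question score for `metric` and the questions that scored 0."""
    rows = results["individual_results"]
    scores = [row["scores"][metric] for row in rows]
    failed = [row["question"] for row in rows if row["scores"][metric] == 0]
    return (mean(scores) if scores else 0.0), failed


# Toy input shaped like the evaluator's output
toy_results = {
    "individual_results": [
        {"question": "Q1", "scores": {"context_recall": 1.0}},
        {"question": "Q2", "scores": {"context_recall": 0.0}},
    ]
}
print(summarize_scores(toy_results))  # → (0.5, ['Q2'])
```

In the notebook, `summarize_scores(baseline_results)` would list the failing questions, such as the BERT question scored 0.000 above.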

In [11]:
baseline_results

{'metrics': {'context_recall': 0.7857142857142857},
 'aggregate_scores': {'context_recall': 0.7857142857142857,
  'overall_context_score': 0.7857142857142857},
 'individual_results': [{'question': 'Who introduced the ReLU (rectified linear unit) activation function and in what year?',
   'generated_answer': '```markdown\nKunihiko Fukushima introduced the ReLU (rectified linear unit) activation function in 1969.\n```',
   'ground_truth': 'Kunihiko Fukushima in 1969.',
   'scores': {'context_recall': 1.0}},
  {'question': 'What was the first working deep learning algorithm and who published it?',
   'generated_answer': '```markdown\nThe first working deep learning algorithm was the Group method of data handling, published by Alexey Ivakhnenko and Lapa in the Soviet Union in 1965.\n```',
   'ground_truth': 'The Group method of data handling, published by Alexey Ivakhnenko and Lapa in 1965.',
   'scores': {'context_recall': 1.0}},
  {'question': 'Which CNN achieved superhuman performance i