# RAG Workshop Notebook - Naive RAG


## 0. Setup Environment



In [1]:
%pip install --upgrade pip
%pip install wikipedia mwparserfromhell beautifulsoup4 openai qdrant-client tqdm python-dotenv


Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


In [2]:
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv())

True
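`OpenAI()` reads the API key from the `OPENAI_API_KEY` environment variable, which `load_dotenv` populates from a `.env` file. A `True` return value does not by itself prove that this particular key was loaded, so a quick check like the following can save a confusing authentication error later. The helper name `has_openai_key` is hypothetical and not part of any library:

```python
import os


def has_openai_key() -> bool:
    """Return True if OPENAI_API_KEY is set to a non-empty value."""
    # The OpenAI Python client falls back to this variable when no api_key is passed.
    return bool(os.environ.get("OPENAI_API_KEY"))


print("OPENAI_API_KEY set:", has_openai_key())
```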

## 1. Data Ingestion
1. Fetch Wikipedia articles using the Wikipedia API.
2. Clean the text by removing wiki markup and citation numbers.
3. Chunk the text into smaller pieces to create embeddings.
4. Create embeddings using OpenAI's text-embedding-3-small model.
5. Index the embeddings using Qdrant Vector Store.

![../imgs/ingestion.png](../imgs/ingestion.png)

### 1.1. Fetch Wikipedia articles using the Wikipedia API.

In [3]:
import wikipedia
import re
from mwparserfromhell import parse
from bs4 import BeautifulSoup

ARTICLE_TITLES = [
    "Deep learning",
    "Transformer (machine learning model)",
    "Natural language processing",
    "Reinforcement learning",
    "Artificial neural network",
    "Generative pre-trained transformer",
    "BERT (language model)", "Overfitting"
]


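def fetch_wikipedia_article(title):
    """Fetch a Wikipedia page; return a dict with title, url and raw text, or None."""
    try:
        # Note: wikipedia.page() uses auto_suggest=True by default, which can rewrite a
        # correct title into a different search suggestion. If valid titles are skipped
        # below, try wikipedia.page(title, auto_suggest=False).
        page = wikipedia.page(title)
        return {
            "title": title,
            "url": page.url,
            "raw_content": page.content
        }
    except wikipedia.exceptions.DisambiguationError as e:
        # Ambiguous title: fall back to the first listed option
        return fetch_wikipedia_article(e.options[0])
    except wikipedia.exceptions.PageError:
        print(f"Skipping {title}")
        return None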

### 1.2. Clean the text by removing wiki markup and citation numbers.

In [4]:
def clean_text(text):
    # Strip wiki markup (strip_code() already returns a plain string)
    text = parse(text).strip_code()
    # Remove any leftover HTML tags
    soup = BeautifulSoup(text, 'html.parser')
    text = soup.get_text()
    # Remove numeric citation markers such as "[12]"
    return re.sub(r'\[\d+\]', '', text).strip()


articles = []
for title in ARTICLE_TITLES:
    article = fetch_wikipedia_article(title)
    if article:
        article["content"] = clean_text(article["raw_content"])
        articles.append(article)



Skipping Natural language processing
Skipping Reinforcement learning
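The citation regex in `clean_text` only targets purely numeric markers such as `[12]`. A small standalone check makes that behaviour concrete: editorial notes like `[citation needed]` survive it. The `extra_markers` option and the `EDITORIAL_NOTE` pattern below are a hypothetical extension for illustration, not part of the original pipeline:

```python
import re

# Same pattern as clean_text: numeric citation markers like "[3]" or "[12]".
NUMERIC_CITATION = re.compile(r"\[\d+\]")
# Hypothetical extension: also drop "[citation needed]"-style editorial notes.
EDITORIAL_NOTE = re.compile(r"\[(?:citation needed|clarification needed)\]", re.IGNORECASE)


def strip_citations(text: str, extra_markers: bool = False) -> str:
    text = NUMERIC_CITATION.sub("", text)
    if extra_markers:
        text = EDITORIAL_NOTE.sub("", text)
    return text.strip()


sample = "Deep learning uses many layers.[3] It is popular.[12][citation needed]"
print(strip_citations(sample))                      # numeric markers removed only
print(strip_citations(sample, extra_markers=True))  # editorial note removed too
```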


In [5]:
articles[1]['content'][:1000]

'In deep learning, transformer is an architecture based on the multi-head attention mechanism, in which text is converted to numerical representations called tokens, and each token is converted into a vector via lookup from a word embedding table. At each layer, each token is then contextualized within the scope of the context window with other (unmasked) tokens via a parallel multi-head attention mechanism, allowing the signal for key tokens to be amplified and less important tokens to be diminished. \nTransformers have the advantage of having no recurrent units, therefore requiring less training time than earlier recurrent neural architectures (RNNs) such as long short-term memory (LSTM). Later variations have been widely adopted for training large language models (LLMs) on large (language) datasets.\n\nThe modern version of the transformer was proposed in the 2017 paper "Attention Is All You Need" by researchers at Google. Transformers were first developed as an improvement over pre

### 1.3. Chunk the text into smaller pieces to create embeddings.


In [6]:
# Split text into overlapping word windows: chunk_size words each, sharing `overlap` words
def chunk_text(text, chunk_size=300, overlap=50):
    words = text.split()
    return [' '.join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size - overlap)]


# Prepare chunks and metadata
corpus = []
metadata = []
for article in articles:
    chunks = chunk_text(article["content"])
    corpus.extend(chunks)
    metadata.extend([{"title": article["title"], "url": article["url"]}] * len(chunks))
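With `chunk_size=300` and `overlap=50`, each window starts 250 words after the previous one, so an article of N words yields ceil(N / 250) chunks, and the last chunk can be much shorter than 300 words. Shrinking the parameters makes the sliding window visible. This sketch repeats the logic of `chunk_text` and adds a guard (an addition, not in the original) against `overlap >= chunk_size`, which would make the step zero or negative:

```python
import math


def chunk_text(text, chunk_size=300, overlap=50):
    # Same sliding-window logic as above, plus a guard against a non-positive step
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # each window starts `step` words after the previous one
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]


text = " ".join(f"w{i}" for i in range(10))  # "w0 w1 ... w9"
chunks = chunk_text(text, chunk_size=4, overlap=1)
for c in chunks:
    print(c)
# Number of chunks = ceil(n_words / (chunk_size - overlap))
print(len(chunks), math.ceil(10 / (4 - 1)))
```

The final window (`w9`) is entirely contained in the previous one. With the real parameters the same happens whenever 50 or fewer words remain after the last step, so such tail chunks add a near-duplicate to the index.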

In [7]:
print('Total Corpus:', len(corpus))
print('Total Metadata:', len(metadata))

deep_learning_chunks = [chunk for chunk, meta in zip(corpus, metadata) if meta['title'] == 'Deep learning']

Total Corpus: 134
Total Metadata: 134


In [8]:
len(deep_learning_chunks)

34

### 1.4. Create embeddings using OpenAI's text-embedding-3-small model.

In [9]:
from openai import OpenAI
from tqdm import tqdm

openai_client = OpenAI()


# Embed a single text with OpenAI's text-embedding-3-small model (returns a list with one vector)
def openai_embedding(text):
    text = text.replace("\n", " ")
    response = openai_client.embeddings.create(
        input=[text],  # Passing the text as a list
        model="text-embedding-3-small"
    )
    # Use dot notation to access the embedding from the response object
    embeddings = [data.embedding for data in response.data]
    return embeddings

In [10]:
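embeddings = []
chunked_texts = []
metadata_chunks = []
# Embed only the first 10 chunks for a quick, low-cost test run; use `corpus` to index everything
test_corpus = corpus[:10]

for i, chunk in enumerate(tqdm(test_corpus)):
    embedding = openai_embedding(chunk)  # one API request per chunk
    embeddings.extend(embedding)
    chunked_texts.extend([chunk] * len(embedding))
    metadata_chunks.extend([metadata[i]] * len(embedding))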




100%|██████████| 10/10 [00:06<00:00,  1.64it/s]
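The loop above sends one request per chunk (about 1.6 chunks per second in this run). The embeddings endpoint also accepts a list of strings as `input` and returns one embedding per string, so embedding the full `corpus` is usually done in batches. A minimal sketch, assuming a batch size of 100 (an arbitrary choice, not a documented limit); `batched`, `embed_in_batches` and `fake_embed` are hypothetical names, and `embed_batch` is injected so the batching logic can run without network access:

```python
from typing import Callable, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> List[List[str]]:
    # Split items into consecutive slices of at most batch_size elements.
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 100,
) -> List[List[float]]:
    # embed_batch must return one vector per input text, in input order.
    # With the OpenAI client it could wrap (each returned item also has an `index` field):
    #   [d.embedding for d in openai_client.embeddings.create(
    #       input=batch, model="text-embedding-3-small").data]
    vectors: List[List[float]] = []
    for batch in batched([t.replace("\n", " ") for t in texts], batch_size):
        out = embed_batch(batch)
        if len(out) != len(batch):
            raise RuntimeError("embedding count does not match batch size")
        vectors.extend(out)
    return vectors


def fake_embed(batch: List[str]) -> List[List[float]]:
    # Offline stand-in: maps each text to [character count, word count].
    return [[float(len(t)), float(len(t.split()))] for t in batch]


print(embed_in_batches(["a b", "c\nd e", "f"], fake_embed, batch_size=2))
```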



### 1.5. Index the embeddings using Qdrant Vector Store.


In [11]:
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

# Create an in-memory Qdrant instance
client = QdrantClient(":memory:")
collection_name = "wikipedia_articles"

# Create the collection: 1536-dim vectors (text-embedding-3-small's default size), cosine distance
client.create_collection(
    collection_name=collection_name,
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

# Upsert points into the collection using PointStruct for each point
client.upsert(
    collection_name=collection_name,
    points=[
        PointStruct(
            id=idx,
            vector=embedding,
            payload={"text": chunked_texts[idx]}
        )
        for idx, embedding in enumerate(embeddings)
    ]
)


UpdateResult(operation_id=0, status=<UpdateStatus.COMPLETED: 'completed'>)
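`Distance.COSINE` tells Qdrant to compare vectors by cosine similarity. For a query embedding $\mathbf{q}$ and a stored chunk embedding $\mathbf{d}$ (both 1536-dimensional here), the score is:

$$
\cos(\mathbf{q}, \mathbf{d}) = \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert} = \frac{\sum_{i=1}^{1536} q_i d_i}{\sqrt{\sum_{i=1}^{1536} q_i^2}\,\sqrt{\sum_{i=1}^{1536} d_i^2}}
$$

Scores close to 1 mean the two vectors point in nearly the same direction. OpenAI's documentation states that its embeddings are normalized to length 1; in that case both norms equal 1 and the score reduces to the dot product $\mathbf{q} \cdot \mathbf{d}$.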

## 2. Build the Q/A Chatbot

![../imgs/naive-rag.png](../imgs/naive-rag.png)


### 2.1. Retrieval - Search the database for the most relevant embeddings.

In [12]:
# Retrieve the payloads of the top_k chunks most similar to the query
def vector_search(query, top_k=3):
    # Embed the query with the same model used to embed the chunks
    response = openai_client.embeddings.create(
        input=query,
        model="text-embedding-3-small"
    )
    query_embeddings = response.data[0].embedding
    # Similarity search: return the top_k points closest to the query embedding (cosine distance)
    search_result = client.query_points(
        collection_name=collection_name,
        query=query_embeddings,
        with_payload=True,
        limit=top_k,
    ).points
    return [result.payload for result in search_result]


search_result = vector_search("What does the word 'deep' in 'deep learning' refer to?")

from pprint import pprint

pprint(search_result[0])

{'text': 'In machine learning, deep learning focuses on utilizing multilayered '
         'neural networks to perform tasks such as classification, regression, '
         'and representation learning. The field takes inspiration from '
         'biological neuroscience and is centered around stacking artificial '
         'neurons into layers and "training" them to process data. The '
         'adjective "deep" refers to the use of multiple layers (ranging from '
         'three to several hundred or thousands) in the network. Methods used '
         'can be supervised, semi-supervised or unsupervised. Some common deep '
         'learning network architectures include fully connected networks, '
         'deep belief networks, recurrent neural networks, convolutional '
         'neural networks, generative adversarial networks, transformers, and '
         'neural radiance fields. These architectures have been applied to '
         'fields including computer vision, speech recognition
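Conceptually, `vector_search` embeds the query and returns the payloads of the `top_k` stored vectors with the highest cosine similarity. The exact, exhaustive version of that ranking fits in a few lines. The toy 2-D vectors and payloads below are made up for illustration, `brute_force_search` is a hypothetical name, and the sketch does not model how Qdrant indexes vectors internally:

```python
import heapq
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def brute_force_search(
    query: Sequence[float],
    vectors: List[Sequence[float]],
    payloads: List[Dict[str, str]],
    top_k: int = 3,
) -> List[Dict[str, str]]:
    # Score every stored vector, keep the top_k highest, return their payloads (best first).
    scored = ((cosine(query, v), idx) for idx, v in enumerate(vectors))
    best = heapq.nlargest(top_k, scored)
    return [payloads[idx] for _, idx in best]


vectors = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]
payloads = [{"text": "layers"}, {"text": "training"}, {"text": "history"}]
print(brute_force_search([1.0, 0.1], vectors, payloads, top_k=2))
```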

### 2.2. Generation - Use the retrieved chunks to generate the answer.

In [16]:
def model_generate(prompt, model="gpt-4o"):
    messages = [{"role": "user", "content": prompt}]
    response = openai_client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0,  # this is the degree of randomness of the model's output
    )
    return response.choices[0].message.content

In [17]:
import json


def prompt_template(question, context):
    return """You are an AI assistant that answers the question at the end using the following
  pieces of context. Use only the context to answer the question. Keep the wording very close to the context.
  context:
  ```
  """ + json.dumps(context) + """
  ```
  User question: """ + question + """
  Answer in markdown:"""
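`prompt_template` serializes the retrieved payloads with `json.dumps`, so the model sees a JSON list of `{"text": ...}` objects. One alternative, sketched here as an assumption rather than the workshop's method, is to render each chunk as a numbered passage so the answer can point to `[1]`, `[2]`, and so on. `build_prompt` is a hypothetical helper that takes the same list of payload dicts that `vector_search` returns:

```python
from typing import Dict, List


def build_prompt(question: str, payloads: List[Dict[str, str]]) -> str:
    # Render each retrieved chunk as "[n] text" so passages can be referenced by number.
    passages = "\n\n".join(f"[{i}] {p['text']}" for i, p in enumerate(payloads, start=1))
    return (
        "You are an AI assistant. Answer the question using only the numbered passages below, "
        "and cite the passage numbers you relied on.\n\n"
        f"Passages:\n{passages}\n\n"
        f"Question: {question}\n"
        "Answer in markdown:"
    )


print(build_prompt("What does 'deep' refer to?", [{"text": "Deep refers to multiple layers."}]))
```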


In [18]:
def generate_answer(question):
    # Retrieval: search the knowledge base
    search_result = vector_search(question)

    prompt = prompt_template(question, search_result)
    # Generation: let the LLM answer from the retrieved context
    return model_generate(prompt)


question = "What is a common evaluation set for image classification?"
answer = generate_answer(question)
print("Answer:", answer)

Answer: ```markdown
The context does not provide specific information about a common evaluation set for image classification.
```


In [19]:
question = "Who introduced the time delay neural network (TDNN), and when?"
answer = generate_answer(question)
print("Answer:", answer)

Answer: ```markdown
The time delay neural network (TDNN) was introduced by Alex Waibel in 1987.
```


## 3. RAG Evaluation with RAGAS

Before proceeding with improvements, let's establish baseline scores using **RAGAS** (Retrieval-Augmented Generation Assessment), a framework built specifically for evaluating RAG systems.

### Context-Focused Metrics:

1. **Context Precision**: How well are relevant chunks ranked at the top?
2. **Context Recall**: How much of the necessary information was retrieved?
3. **Context Relevancy**: How relevant is the retrieved context to the question?
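In RAGAS, context precision and context recall are computed from per-chunk and per-claim judgements that an LLM makes against a reference answer. Assuming those judgements are already available as booleans, the aggregation step looks roughly like this. It is a simplified sketch based on the metric definitions in the RAGAS documentation, not the library's code, and the function names are hypothetical:

```python
from typing import Sequence


def context_precision(relevant: Sequence[bool]) -> float:
    """Mean of precision@k over the ranks k that hold a relevant chunk."""
    hits = 0
    total = 0.0
    for k, is_rel in enumerate(relevant, start=1):
        if is_rel:
            hits += 1
            total += hits / k  # precision@k at a relevant rank
    return total / hits if hits else 0.0


def context_recall(claim_supported: Sequence[bool]) -> float:
    """Fraction of reference-answer claims that the retrieved context supports."""
    if not claim_supported:
        return 0.0
    return sum(claim_supported) / len(claim_supported)


print(context_precision([True, False, True]))      # (1/1 + 2/3) / 2
print(context_recall([True, True, True, False]))   # 3 of 4 claims supported
```

Precision rewards putting relevant chunks first (the same two relevant chunks at ranks 1 and 2 would score 1.0), while recall ignores order and only asks whether the needed facts were retrieved at all.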

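We use **RAGAS** because it is purpose-built for RAG evaluation and scores the quality of the retrieved context, which largely bounds how good the generated answer can be: the model cannot answer from facts that were never retrieved. Running the evaluation takes a single function call; the `rag_evaluator_v2` helper used below reports context recall.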


In [20]:
# Import the RAGAS evaluation utility
from rag_evaluator_v2 import evaluate_naive_rag_v2

# Run evaluation on the current RAG system using RAGAS
print("🔍 Evaluating your Naive RAG system with RAGAS...")
print("This will evaluate context quality metrics on the evaluation dataset...\n")

baseline_results = evaluate_naive_rag_v2(
    vector_search_func=vector_search,
    generate_answer_func=generate_answer
)

🔍 Evaluating your Naive RAG system with RAGAS...
This will evaluate context quality metrics on the evaluation dataset...

✅ Loaded 14 questions from evaluation dataset

Evaluating 14 questions...

Question 1/14: Who introduced the ReLU (rectified linear unit) ac...
Question 2/14: What was the first working deep learning algorithm...
Question 3/14: Which CNN achieved superhuman performance in a vis...
Question 4/14: When was BERT introduced and by which organization...
Question 5/14: What are the two model sizes BERT was originally i...
Question 6/14: What percentage of tokens are randomly selected fo...
Question 7/14: Who introduced the term 'deep learning' to the mac...
Question 8/14: Which three researchers were awarded the 2018 Turi...
Question 9/14: When was the first GPT introduced and by which org...
Question 10/14: What were the three parameter sizes of the first v...
Question 11/14: What is the 'one in ten rule' in regression analys...
Question 12/14: What is the essence of overfitting a

Evaluating:   0%|          | 0/14 [00:00<?, ?it/s]


RAGAS EVALUATION RESULTS

CONTEXT RECALL METRIC (0.0 - 1.0 scale):
  🔴 Context Recall: 0.143

🔴 POOR: Significant improvements needed in context retrieval.


### 📋 Why We Need These Baseline Scores

These **RAGAS-powered** baseline scores are crucial because:

1. **Context Quality Focus**: RAGAS specifically measures how well your retrieval system finds and ranks relevant information
2. **Purpose-Built for RAG**: Unlike general evaluation tools, RAGAS is designed specifically for RAG systems
3. **Objective Measurement**: Quantitative metrics that measure actual retrieval performance
4. **Debugging Aid**: Low context scores point to retrieval, rather than generation, as the stage to fix first
5. **Optimization Guide**: Use these metrics to systematically improve your retrieval strategy

🔬 **What makes RAGAS special**: 
- **Context Precision** helps ensure the most relevant information appears first
- **Context Recall** ensures you're not missing important information
- **Context Relevancy** validates that retrieved chunks actually help answer the question

**Next Steps**: Now that we have our baseline context metrics, let's expand our dataset to demonstrate the limitations of naive RAG approaches!
