# Retrieval Augmented Generation (RAG) Basics

In this notebook, we will cover the basics of Retrieval Augmented Generation (RAG) model. RAG is a model that combines the best of both worlds - retrieval and generation. It uses a retriever to retrieve relevant passages from a large corpus and then uses a generator to generate the answer.

References:

https://github.com/zenml-io/zenml-projects/blob/feature/evaluation-llm-complete-guide/llm-complete-guide/most_basic_rag_pipeline.py

https://docs.zenml.io/user-guide/llmops-guide/evaluation/evaluation-in-65-loc


In [1]:
import os
import re
import string

from openai import OpenAI
from typing import List, Tuple

# Helper Functions

In [2]:
def preprocess_text(text: str):
    text = text.lower()
    text = text.translate(str.maketrans('', '', string.punctuation))
    text = re.sub(r"\s+", " ", text).strip()
    return text

def tokenize(text: str):
    return preprocess_text(text).split()

In [3]:
def retrieve_relevant_chunks(query, corpus, top_n=2):
    query_tokens = set(tokenize(query))
    similarities = []
    for chunk in corpus:
        chunk_tokens = set(tokenize(chunk))
        similarity = len(query_tokens.intersection(chunk_tokens)) / len(query_tokens.union(chunk_tokens))
        similarities.append(similarity)
    top_chunks = sorted(list(enumerate(similarities)), key=lambda x: x[1], reverse=True)[:top_n]
    return [corpus[i] for i, _ in top_chunks]


In [4]:
def modify_query(query:str, chunks: List[str]):
    context = "/n".join(chunks)
    new_query = [
            {
                "role": "system",
                "content": f"Based on the provided context, answer the following question: {query}\n\nContext:\n{context}",
                },
            {
                "role": "user",
                "content": query,
                },
        ]
    return new_query

In [5]:
def answer_question(query: str, corpus: str, top_n=2):
    relevant_chunks = retrieve_relevant_chunks(query, corpus, top_n)
    if not relevant_chunks:
        return "I'm sorry, I don't know the answer to that question."
    client = OpenAI(api_key = os.environ.get("OPENAI_API_KEY"))
    chat_completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=modify_query(query, chunks=relevant_chunks),
        max_tokens=100,
        temperature=0,
    )
    answer = chat_completion.choices[0].message.content.strip()
    return answer

# Example Usage

In [6]:
# Example usage
query = "What are the unique characteristics of Pokémon found in the forests of Viridian City?"
corpus = [
    "The dense forests of Viridian City are home to many Pokémon species, including Pikachu, who is known for its electric abilities and its cute appearance with yellow fur and a lightning bolt-shaped tail.",
    "Trainers often encounter Squirtle near bodies of water, as it is a Water-type Pokémon that can shoot powerful streams of water from its mouth. Its shell is a symbol of resilience and protection.",
    "In the volcanic regions of Cinnabar Island, Charmander can be found basking in the warmth. This Fire-type Pokémon has a flame burning at the tip of its tail, indicating its health and mood.",
    "Jigglypuff is a Fairy-type Pokémon known for its soothing lullabies and round, pink appearance. It often wanders through towns and cities, leaving sleepy marks on those who listen to its songs.",
    "The psychic powers of Abra make it a sought-after Pokémon for trainers seeking mental prowess. However, its tendency to teleport away when threatened can be a challenge for inexperienced trainers.",
    "Gyarados, a fearsome Water/Flying-type Pokémon, is said to arise from the rage of a Magikarp that has endured countless hardships. Its massive size and powerful attacks make it a force to be reckoned with.",
    "Bulbasaur, with a plant bulb on its back, is a Grass/Poison-type Pokémon that is often seen in lush green areas. Its symbiotic relationship with the bulb grants it access to a variety of powerful moves.",
    "Machop, a Fighting-type Pokémon, trains tirelessly to strengthen its muscles and hone its combat skills. Its impressive physical strength makes it a formidable opponent in battles.",
    "Eevee, a Normal-type Pokémon with the ability to evolve into multiple different forms, is often sought after by trainers for its adaptability and versatility in battles.",
]

relevant_chunks = retrieve_relevant_chunks(query=query, corpus=corpus, top_n=2)
query_modification = modify_query(query=query, chunks=relevant_chunks)

print(f"Relevant chunks: {relevant_chunks}")
print(f"Modified query: {query_modification}")

Relevant chunks: ['The dense forests of Viridian City are home to many Pokémon species, including Pikachu, who is known for its electric abilities and its cute appearance with yellow fur and a lightning bolt-shaped tail.', 'In the volcanic regions of Cinnabar Island, Charmander can be found basking in the warmth. This Fire-type Pokémon has a flame burning at the tip of its tail, indicating its health and mood.']
Modified query: [{'role': 'system', 'content': 'Based on the provided context, answer the following question: What are the unique characteristics of Pokémon found in the forests of Viridian City?\n\nContext:\nThe dense forests of Viridian City are home to many Pokémon species, including Pikachu, who is known for its electric abilities and its cute appearance with yellow fur and a lightning bolt-shaped tail./nIn the volcanic regions of Cinnabar Island, Charmander can be found basking in the warmth. This Fire-type Pokémon has a flame burning at the tip of its tail, indicating its

In [7]:
run_query = True
if run_query:
    answer = answer_question(query, corpus)
    print(answer)

Pokémon found in the forests of Viridian City are known for their diverse types and abilities. One notable characteristic is that they are often Grass-type or Bug-type Pokémon, reflecting the natural environment of the forest. These Pokémon may have abilities related to plants, nature, or insects, making them well-adapted to forest habitats. Additionally, they may have unique moves and evolutions that are specific to forest-dwelling species.


# Sample Evaluation

In [8]:
def evaluate_retrieval(question, expected_answer, corpus, top_n=2):
    """Check if the retrieved chunks contain any words from expected answer"""
    relevant_chunks = retrieve_relevant_chunks(question, corpus, top_n)
    score = any(
        any(word in chunk for word in tokenize(expected_answer))
        for chunk in relevant_chunks
    )
    return score

In [9]:
def evaluate_generation(question, expected_answer, generated_answer):
    """Use ChatGPT to evaluate the relevance of the generated answer to the question and expected answer. Binary."""
    client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "system",
                "content": "You are an evaluation judge. Given a question, an expected answer, and a generated answer, your task is to determine if the generated answer is relevant and accurate. Respond with 'YES' if the generated answer is satisfactory, or 'NO' if it is not.",
            },
            {
                "role": "user",
                "content": f"Question: {question}\nExpected Answer: {expected_answer}\nGenerated Answer: {generated_answer}\nIs the generated answer relevant and accurate?",
            },
        ],
        model="gpt-3.5-turbo",
    )

    judgment = chat_completion.choices[0].message.content.strip().lower()
    return judgment == "yes"

In [10]:
eval_data = [
    {
        "question": "What are the types of attacks commonly associated with Water-type Pokémon?",
        "expected_answer": "Water-type Pokémon are known for their mastery over water-based attacks, such as Water Gun, Hydro Pump, Surf, and Bubble Beam.",
    },
    {
        "question": "How does the evolutionary process of Eevee differ from other Pokémon?",
        "expected_answer": "Eevee has the unique ability to evolve into different Pokémon species depending on various factors, such as the use of evolution stones, friendship levels, time of day, and location.",
    },
    {
        "question": "What is the signature move of Pikachu, and how does it reflect its Electric-type abilities?",
        "expected_answer": "Pikachu's signature move is Thunderbolt, a powerful Electric-type attack that allows it to unleash bolts of lightning against its opponents. This move showcases Pikachu's mastery over electricity.",
    },
    {
        "question": "Describe the appearance and behavior of a Jigglypuff.",
        "expected_answer": "Jigglypuff is a Fairy-type Pokémon known for its round, pink body, large blue eyes, and tuft of fur on its forehead. It is often seen with a microphone-like marker, which it uses to draw on the faces of those who fall asleep listening to its soothing songs.",
    },
    {
        "question": "What are the different evolutionary stages of a Charmander, and how does its appearance change during evolution?",
        "expected_answer": "Charmander evolves into Charmeleon starting at level 16, and then into Charizard starting at level 36. As it evolves, Charmander's appearance changes from a small, lizard-like creature with a flame-tipped tail to a larger, more dragon-like Pokémon with wings and increased fiery capabilities.",
    },
]

In [11]:
retrieval_scores = []
generation_scores = []

for item in eval_data:
    retrieval_score = evaluate_retrieval(
        item["question"], item["expected_answer"], corpus
    )
    retrieval_scores.append(retrieval_score)

    generated_answer = answer_question(item["question"], corpus)
    generation_score = evaluate_generation(
        item["question"], item["expected_answer"], generated_answer
    )
    generation_scores.append(generation_score)

retrieval_accuracy = sum(retrieval_scores) / len(retrieval_scores)
generation_accuracy = sum(generation_scores) / len(generation_scores)

print(f"Retrieval Accuracy: {retrieval_accuracy:.2f}")
print(f"Generation Accuracy: {generation_accuracy:.2f}")

Retrieval Accuracy: 1.00
Generation Accuracy: 0.60
