# Basic RAG Implementation: A Beginner's Guide to Retrieval-Augmented Generation

<a href="https://github.com/adithya-s-k/AI-Engineering.academy">
<img src="https://raw.githubusercontent.com/adithya-s-k/AI-Engineering.academy/main/assets/banner.png" width="50%">
</a>

Welcome to the Basic RAG Implementation guide! This notebook is designed to introduce beginners to the concept of Retrieval-Augmented Generation (RAG) and provide a step-by-step walkthrough of implementing a basic RAG system.

## Introduction
Retrieval-Augmented Generation (RAG) is a technique that pairs a large language model with a retrieval step: passages relevant to a question are fetched from a knowledge base and passed to the model along with the question. Grounding the response in this retrieved information improves the quality and accuracy of what the model generates.
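
To make the retrieve-then-generate flow concrete, here is a minimal sketch in plain Python. It is not how this notebook retrieves text: it ranks passages by shared words instead of embeddings, and it stops at building the prompt rather than calling a language model. The function names `retrieve` and `build_prompt` and the three sample passages are made up for illustration.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of words (punctuation dropped)."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, knowledge_base: list[str], top_k: int = 1) -> list[str]:
    """Toy retriever: rank passages by how many words they share with the query."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(passage)), passage) for passage in knowledge_base]
    # Highest overlap first; the sort is stable, so ties keep their original order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for score, passage in scored[:top_k] if score > 0]


def build_prompt(query: str, passages: list[str]) -> str:
    """Ground the question in the retrieved passages."""
    context = "\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical knowledge base, used only for this illustration
knowledge_base = [
    "Qdrant is a vector database.",
    "LlamaIndex is a framework for building RAG applications.",
    "Bananas are rich in potassium.",
]

query = "What is Qdrant?"
passages = retrieve(query, knowledge_base, top_k=1)
prompt = build_prompt(query, passages)
print(prompt)
```

In a real RAG system the prompt would then be sent to the language model, which answers from the supplied context.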

This notebook aims to provide a clear and concise introduction to RAG, suitable for beginners who want to understand and implement this technology.


## Getting Started
To get started with this notebook, you'll need to have a basic understanding of Python and some familiarity with machine learning concepts. Don't worry if you're new to some of these ideas – we'll guide you through each step!

### Prerequisites
- Python 3.9+
- Jupyter Notebook or JupyterLab
- Basic knowledge of Python and machine learning concepts

## Notebook Contents

Our notebook is structured into the following main sections:

1. **Environment Setup**: We'll guide you through setting up your Python environment with all the necessary libraries and dependencies.

2. **Data Ingestion (Chunking)**: Learn how to prepare and process your data for use in a RAG system, including techniques for breaking down large texts into manageable chunks.

3. **Prompting**: Understand the art of crafting effective prompts to guide the retrieval and generation process.

4. **Setting up Retriever**: We'll walk you through the process of setting up a retrieval system to find relevant information from your knowledge base.

5. **Examples with Retrievers**: Explore practical examples of using retrievers in various scenarios to enhance your understanding of RAG systems.
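
As a preview of the chunking step listed above, the sketch below splits text into overlapping windows. It is a simplification under stated assumptions: it counts whitespace-separated words, whereas the splitters used later in this notebook work with tokens and sentence boundaries. The helper name `chunk_words` is hypothetical.

```python
def chunk_words(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into chunks of at most chunk_size words that share chunk_overlap words."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


text = "one two three four five six seven eight nine ten"
chunks = chunk_words(text, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

Each chunk repeats the last word of the previous one. The overlap keeps some shared context across chunk boundaries, so a sentence cut by a boundary is not lost entirely.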


## Setting up the Environment

In [None]:
!pip install llama-index
!pip install llama-index-vector-stores-qdrant 
!pip install llama-index-readers-file 
!pip install llama-index-embeddings-fastembed 
!pip install llama-index-llms-openai
!pip install -U qdrant_client fastembed
!pip install python-dotenv

In [None]:
# Standard library imports
import logging
import sys
import os

# Third-party imports
from dotenv import load_dotenv
from IPython.display import Markdown, display

# Qdrant client import
import qdrant_client

# LlamaIndex core imports
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core import Settings

# LlamaIndex vector store import
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Embedding model imports
from llama_index.embeddings.fastembed import FastEmbedEmbedding
from llama_index.embeddings.openai import OpenAIEmbedding

# LLM import
from llama_index.llms.openai import OpenAI

# Load environment variables
load_dotenv()

# Get OpenAI API key from environment variables
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

# Set the embedding model
# Option 1: Use FastEmbed with BAAI/bge-base-en-v1.5 model (default)
Settings.embed_model = FastEmbedEmbedding(model_name="BAAI/bge-base-en-v1.5")

# Option 2: Use OpenAI's embedding model (commented out)
# If you want to use OpenAI's embedding model, uncomment the following line:
# Settings.embed_model = OpenAIEmbedding(embed_batch_size=10, api_key=OPENAI_API_KEY)

# Qdrant configuration (commented out)
# If you're using Qdrant, uncomment and set these variables:
# QDRANT_CLOUD_ENDPOINT = os.getenv("QDRANT_CLOUD_ENDPOINT")
# QDRANT_API_KEY = os.getenv("QDRANT_API_KEY")

# Note: Remember to add QDRANT_CLOUD_ENDPOINT and QDRANT_API_KEY to your .env file if using Qdrant Hosted version
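
Because `os.getenv` returns `None` for a variable that is not set, a missing key only surfaces later as an authentication error. One way to fail early is a small check like the sketch below. The helper `require_env` is hypothetical and not part of any library used in this notebook.

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the named environment variables, raising if any is unset or empty."""
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}
```

For example, calling `require_env("OPENAI_API_KEY")` right after `load_dotenv()` would raise immediately if the key is absent from both the shell and the `.env` file.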

## Load the Data

In [None]:
## download sample data

In [2]:
# Load the documents using SimpleDirectoryReader
from llama_index.core import Document

reader = SimpleDirectoryReader("../data/", recursive=True)
documents = reader.load_data(show_progress=True)

# Combine all the documents into a single Document for later chunking and splitting
documents = Document(text="\n\n".join([doc.text for doc in documents]))

Loading files: 100%|██████████| 1/1 [00:00<00:00, 40.69file/s]


## Setting up Vector Database

We will use Qdrant as the vector database. There are four ways to initialize the Qdrant client:

1. In-memory
```python
client = qdrant_client.QdrantClient(location=":memory:")
```
2. Disk
```python
client = qdrant_client.QdrantClient(path="./data")
```
3. Self-hosted or Docker
```python
client = qdrant_client.QdrantClient(
    # url="http://<host>:<port>"
    host="localhost",
    port=6333,
)
```

4. Qdrant Cloud
```python
client = qdrant_client.QdrantClient(
    url=QDRANT_CLOUD_ENDPOINT,
    api_key=QDRANT_API_KEY,
)
```

For this notebook, we will use the in-memory mode. The commented-out options in the next cell show how to switch to a different mode.
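
If you want the same notebook to work with or without Qdrant Cloud credentials, one option is to choose the client arguments from the environment. The sketch below only builds the keyword arguments, so it runs without Qdrant installed. The helper `qdrant_client_kwargs` is hypothetical, and the fallback to in-memory mode is an assumption about what you want when no credentials are set.

```python
import os
from typing import Optional


def qdrant_client_kwargs(endpoint: Optional[str], api_key: Optional[str]) -> dict:
    """Pick QdrantClient keyword arguments: cloud if both credentials exist, else in-memory."""
    if endpoint and api_key:
        return {"url": endpoint, "api_key": api_key}
    return {"location": ":memory:"}


kwargs = qdrant_client_kwargs(
    os.getenv("QDRANT_CLOUD_ENDPOINT"),
    os.getenv("QDRANT_API_KEY"),
)
print(sorted(kwargs))  # which arguments were chosen

# With qdrant_client imported, the client could then be created as:
# client = qdrant_client.QdrantClient(**kwargs)
```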

In [3]:
# Create a Qdrant client instance. Use only one connection option at a time.

client = qdrant_client.QdrantClient(
    # In-memory mode is fast and lightweight for experiments. It does not
    # require a deployed Qdrant instance, but needs qdrant-client >= 1.1.1.
    location=":memory:",
    # Otherwise, set the Qdrant instance address with:
    # url=QDRANT_CLOUD_ENDPOINT,
    # or set the Qdrant instance with host and port:
    # host="localhost",
    # port=6333,
    # Set the API key for Qdrant Cloud:
    # api_key=QDRANT_API_KEY,
    # Or persist to local disk:
    # path="./db/",
)

vector_store = QdrantVectorStore(client=client, collection_name="01_Basic_RAG")

### Ingest Data into the Vector DB

In [4]:
# Ingest data into the vector database by setting up an ingestion pipeline

from llama_index.core.node_parser import TokenTextSplitter
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.node_parser import MarkdownNodeParser
from llama_index.core.node_parser import SemanticSplitterNodeParser
from llama_index.core.ingestion import IngestionPipeline

pipeline = IngestionPipeline(
    transformations=[
        # MarkdownNodeParser(include_metadata=True),
        # TokenTextSplitter(chunk_size=500, chunk_overlap=20),
        SentenceSplitter(chunk_size=1024, chunk_overlap=20),
        # SemanticSplitterNodeParser(buffer_size=1, breakpoint_percentile_threshold=95 , embed_model=Settings.embed_model),
        Settings.embed_model,
    ],
    vector_store=vector_store,
)

# Ingest directly into the vector DB
nodes = pipeline.run(documents=[documents], show_progress=True)
print("Number of chunks added to vector DB :", len(nodes))

Parsing nodes:   0%|          | 0/1 [00:00<?, ?it/s]

Generating embeddings:   0%|          | 0/18 [00:00<?, ?it/s]



Number of chunks added to vector DB : 18


## Setting Up Retriever

In [5]:
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
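
At query time, the retriever embeds the question and looks for the stored chunks whose vectors are closest to it. The sketch below shows that idea with plain Python lists, assuming cosine similarity as the distance measure. The two-dimensional vectors and chunk IDs are made up; real embeddings have hundreds of dimensions, and the search runs inside the vector database rather than in a Python loop.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Return the k chunk IDs most similar to the query, best match first."""
    scored = [(chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings, for illustration only
chunk_vecs = {
    "chunk-a": [1.0, 0.0],
    "chunk-b": [0.0, 1.0],
    "chunk-c": [1.0, 1.0],
}

results = top_k([1.0, 0.1], chunk_vecs, k=2)
for chunk_id, score in results:
    print(chunk_id, round(score, 3))
```

The text of the top-ranked chunks is what ends up in the `{context_str}` slot of the prompt.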

## Modifying Prompts

In [6]:
from llama_index.core import ChatPromptTemplate

qa_prompt_str = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the question: {query_str}\n"
)

refine_prompt_str = (
    "We have the opportunity to refine the original answer "
    "(only if needed) with some more context below.\n"
    "------------\n"
    "{context_msg}\n"
    "------------\n"
    "Given the new context, refine the original answer to better "
    "answer the question: {query_str}. "
    "If the context isn't useful, output the original answer again.\n"
    "Original Answer: {existing_answer}"
)

# Text QA Prompt
chat_text_qa_msgs = [
    ("system", "You are an AI assistant who is well versed in answering questions from the provided context"),
    ("user", qa_prompt_str),
]
text_qa_template = ChatPromptTemplate.from_messages(chat_text_qa_msgs)

# Refine Prompt
chat_refine_msgs = [
    ("system","Always answer the question, even if the context isn't helpful.",),
    ("user", refine_prompt_str),
]
refine_template = ChatPromptTemplate.from_messages(chat_refine_msgs)
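
To see what the QA prompt looks like once its placeholders are filled, the sketch below substitutes `{context_str}` and `{query_str}` using Python's built-in `str.format`. This only illustrates the substitution: LlamaIndex fills the template through its own prompt classes, and the two chunks and the question here are made-up placeholders.

```python
# Same wording as qa_prompt_str above, repeated so this block runs on its own
qa_template = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the question: {query_str}\n"
)

# Hypothetical retrieved chunks, for illustration only
retrieved_chunks = ["First retrieved chunk.", "Second retrieved chunk."]

filled_prompt = qa_template.format(
    context_str="\n\n".join(retrieved_chunks),
    query_str="What does the first chunk say?",
)
print(filled_prompt)
```

The refine prompt works the same way, with `{existing_answer}` carrying the answer produced so far and `{context_msg}` carrying the next batch of context.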

### Examples of Retrievers

- Query Engine
- Chat Engine

In [8]:
# Setting up Query Engine

from llama_index.llms.openai import OpenAI
llm = OpenAI()

rag_engine = index.as_query_engine(
    text_qa_template=text_qa_template,
    refine_template=refine_template,
    llm=llm,
)


response = rag_engine.query("What is Paul Graham")
display(Markdown(str(response)))

Paul Graham is a computer scientist, entrepreneur, and venture capitalist known for co-founding the startup accelerator Y Combinator. He is also known for his work in the field of programming languages, particularly for his role in developing the programming language Lisp.

In [9]:
# Setting up Chat Engine

chat_engine = index.as_chat_engine()

response = chat_engine.chat("What is Paul Graham")
display(Markdown(str(response)))

Paul Graham is a coder.

### Simple Chat Application with RAG

In [13]:
from typing import List
from llama_index.core.base.llms.types import ChatMessage, MessageRole

class ChatEngineInterface:
    def __init__(self, index):
        self.chat_engine = index.as_chat_engine()
        self.chat_history: List[ChatMessage] = []

    def display_message(self, role: str, content: str):
        if role == "USER":
            display(Markdown(f"**Human:** {content}"))
        else:
            display(Markdown(f"**AI:** {content}"))

    def chat(self, message: str) -> str:
        # Get the response first, passing a copy of the earlier turns only.
        # The chat engine adds the new user message to the conversation itself,
        # so appending it to the history beforehand would send it twice.
        response = self.chat_engine.chat(message, chat_history=list(self.chat_history))

        # Record both sides of this turn in the history
        self.chat_history.append(ChatMessage(role=MessageRole.USER, content=message))
        self.chat_history.append(ChatMessage(role=MessageRole.ASSISTANT, content=str(response)))

        # Display the conversation
        self.display_message("USER", message)
        self.display_message("ASSISTANT", str(response))

        print("\n" + "-" * 50 + "\n")  # Separator for readability

        return str(response)

    def get_chat_history(self) -> List[ChatMessage]:
        return self.chat_history

In [None]:
chat_interface = ChatEngineInterface(index)
while True:
    user_input = input("You: ").strip()
    if user_input.lower() == 'exit':
        print("Thank you for chatting! Goodbye.")
        break
    chat_interface.chat(user_input)

In [None]:
# To view chat history:
history = chat_interface.get_chat_history()
for message in history:
    # .value prints the plain role name ("user", "assistant") on every Python version
    print(f"{message.role.value}: {message.content}")