# Baseline RAG example

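This is a simple example of a baseline RAG application whose purpose is to answer questions about the fantasy series [Malazan Universe](https://malazan.fandom.com/wiki/Malazan_Wiki) created by Steven Erikson and Ian C. Esslemont. The cells below were run against pages from the *Friends* Fandom wiki instead (via `get_friends_pages`); the pipeline is the same for either wiki.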

First, the example walks through each step of a baseline RAG pipeline: **Indexing**, **Retrieval**, and **Generation**. This shows the architecture without the abstractions provided by frameworks like LlamaIndex and LangChain.
Then a more typical example is shown using LlamaIndex.

As a vector database, we will use [ChromaDB](https://docs.trychroma.com/), but it can easily be swapped for another vector database.

In this example, we will use the following technologies:

- OpenAI API (accessed through Azure OpenAI)
- ChromaDB
- LlamaIndex


### Setup libraries and environment


In [1]:
%pip install chromadb llama-index llama-index-vector-stores-chroma openai python-dotenv

Note: you may need to restart the kernel to use updated packages.


In [1]:
import os

import chromadb
import chromadb.utils.embedding_functions as embedding_functions
from chromadb import Settings
from IPython.display import Markdown, display
from llama_index.core import PromptTemplate, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from openai import OpenAI, AzureOpenAI

# Local helper module (not shown here) for fetching Fandom wiki pages and saving them as markdown
from util.helpers import create_and_save_md_files, get_malazan_pages, get_office_pages, get_friends_pages, get_theoffice_pages


### Environment variables

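The code in this example calls OpenAI models through Azure OpenAI, so you need an Azure OpenAI resource with a chat model deployment (named `gpt4` below) and an embedding deployment (named `text-embedding-ada-002` below). Copy the key and endpoint from the resource in the Azure portal. To use the public OpenAI API instead, generate a key under [your API keys](https://platform.openai.com/api-keys) and switch the Azure-specific clients below to their OpenAI counterparts.

Then add the following to a `.env` file in the root of the project.

```
OPENAI_API_KEY=<YOUR_KEY_HERE>
AZURE_OPENAI_ENDPOINT=<YOUR_ENDPOINT_HERE>
OPENAI_API_VERSION=2024-05-01-preview
```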
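A quick sanity check can save a confusing authentication error later. The sketch below assumes the three variable names read by the cells in this notebook (`OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT`, `OPENAI_API_VERSION`); `missing_env_vars` is a hypothetical helper, not part of `python-dotenv`.

```python
import os

# Variable names read by the cells in this notebook (hypothetical check, not a dotenv API).
# OPENAI_API_VERSION is only read by the LlamaIndex cells.
REQUIRED_VARS = ("OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT", "OPENAI_API_VERSION")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Run this after load_dotenv() so values from the .env file are included.
missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```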


In [22]:
from dotenv import load_dotenv

load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
AZURE_OPENAI_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT")

In [23]:
# Client used for embeddings and chat completions in the step-by-step pipeline
openai_client = AzureOpenAI(
    api_key=OPENAI_API_KEY,
    api_version="2024-05-01-preview",  # https://learn.microsoft.com/en-us/azure/ai-services/openai/reference?WT.mc_id=AZ-MVP-5004796
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
)

In [4]:
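# Embedding function Chroma uses to embed query texts at query time
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key=OPENAI_API_KEY,
    model_name="text-embedding-ada-002",  # must match the embedding deployment name on Azure
    api_base=AZURE_OPENAI_ENDPOINT,
    api_type="azure",
    api_version="2024-05-01-preview"
)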

chroma_client = chromadb.PersistentClient(
    path="./data/baseline-rag/chromadb", settings=Settings(allow_reset=True))

## Fetch documents and save them as markdown files

Here we fetch pages from a Fandom wiki. These pages are the "knowledge base" that supplies context to our prompts. `util.helpers` provides `get_malazan_pages` along with helpers for other wikis; the run below uses `get_friends_pages`.

We also pre-process the content so it can be added to our vector database.


In [5]:
pages = get_friends_pages()
pages

[<FandomPage 'Ross Geller'>, <FandomPage 'Chandler Bing'>]

In [6]:
create_and_save_md_files(pages)

## Indexing

In this step, we will index the documents in our vector database. This will allow us to retrieve the most relevant documents when we ask a question.

We will use ChromaDB as our vector database and `text-embedding-ada-002` from OpenAI (deployed on Azure OpenAI) as our embedding model.


#### Fetch and process saved documents

First we load the documents we saved earlier, then we process them so they can be added to our vector database.
The `SimpleDirectoryReader` loads each markdown file and returns one document per section.
Each section is then split into smaller chunks of at most 512 tokens with a 20-token overlap (`SentenceSplitter`), and each chunk is embedded using the Azure OpenAI embeddings API.
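To make `chunk_size` and `chunk_overlap` concrete, here is a simplified, word-based splitter. It is an illustration only: `SentenceSplitter` counts tokens rather than words and prefers to break at sentence boundaries, and `chunk_words` is a hypothetical helper, not a LlamaIndex API.

```python
def chunk_words(text, chunk_size=512, chunk_overlap=20):
    """Split text into windows of `chunk_size` words; consecutive windows share `chunk_overlap` words.

    Simplified illustration: counts whitespace-separated words, not tokens,
    and ignores sentence boundaries.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):  # this window already reached the end
            break
    return chunks
```

For example, `chunk_words("a b c d e f g h i j", chunk_size=4, chunk_overlap=1)` returns `['a b c d', 'd e f g', 'g h i j']`: each chunk repeats the last word of the previous one, so text near a boundary appears in both neighbouring chunks.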


In [7]:
documents = SimpleDirectoryReader('./data/docs').load_data()
text_splitter = SentenceSplitter(chunk_size=512, chunk_overlap=20)

document_data = []

for document in documents:
    chunks = text_splitter.split_text(document.text)
    for idx, chunk in enumerate(chunks):
        # On Azure, `model` is the name of the embedding deployment
        embedding = openai_client.embeddings.create(
            input=chunk, model="text-embedding-ada-002")
        document_data.append({
            "id": f"{document.id_}-{idx}",
            "text": chunk,
            "metadata": document.metadata,
            "embedding": embedding.data[0].embedding
        })
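The loop above makes one embeddings request per chunk. The embeddings endpoint also accepts a list of inputs, so the same work can be batched. A minimal sketch, assuming a hypothetical `embed_batch` callable that maps a list of texts to a list of vectors in the same order; `batch_size=16` is an arbitrary choice, so check the limits of your deployment.

```python
from typing import Callable, List, Sequence


def embed_in_batches(
    chunks: Sequence[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 16,  # arbitrary; adjust to your deployment's limits
) -> List[List[float]]:
    """Embed `chunks` with one `embed_batch` call per `batch_size` texts, preserving order."""
    embeddings: List[List[float]] = []
    for start in range(0, len(chunks), batch_size):
        batch = list(chunks[start:start + batch_size])
        vectors = embed_batch(batch)
        if len(vectors) != len(batch):
            raise ValueError("embed_batch must return one vector per input text")
        embeddings.extend(vectors)
    return embeddings


# With the Azure client from this notebook, `embed_batch` could be, for example:
#
# def azure_embed_batch(texts):
#     response = openai_client.embeddings.create(input=texts, model="text-embedding-ada-002")
#     return [item.embedding for item in sorted(response.data, key=lambda d: d.index)]
```

Sorting by `index` keeps each vector aligned with its input text, so the result can be zipped back onto the chunk list when building `document_data`.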

#### Add documents to ChromaDB


In [8]:
documents = [doc["text"] for doc in document_data]
embeddings = [doc["embedding"] for doc in document_data]
metadatas = [doc["metadata"] for doc in document_data]
ids = [doc["id"] for doc in document_data]

In [9]:
chroma_client.reset()
collection = chroma_client.get_or_create_collection(
    name="friends", metadata={"hnsw:space": "cosine"}, embedding_function=openai_ef)

In [10]:
collection.add(
    embeddings=embeddings,
    documents=documents,
    metadatas=metadatas,
    ids=ids)

## Retrieval

In this step, we retrieve the documents most relevant to a given question by asking the vector database for the chunks most similar to it.

To do this, the collection's embedding function (`openai_ef`, using `text-embedding-ada-002`, **the same model used to index the documents**) embeds the question, and the vector database then returns the most similar chunks.

We retrieve the top 5 chunks based on the _cosine distance_ (1 minus the cosine similarity) between the question and each chunk. Other metrics can be used as well, such as squared L2 or inner product.

To try these out, change `cosine` to `l2` or `ip` in the `hnsw:space` metadata when creating the collection above.


In [11]:
query = "What is Chandler's job?"

In [None]:
result = collection.query(query_texts=[query], n_results=5)
context = result["documents"][0]

formatted_text = "\n\n------------\n\n".join(context)

# Display the formatted markdown
display(Markdown(f"{formatted_text}"))


## Generation

In this step, we will generate an answer to the question using the retrieved documents as context. We will use the OpenAI API to generate the answer.
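The cell below places the whole prompt in a single user message. A common variant moves the instructions into a system message and explicitly tells the model to stay within the context. A minimal sketch; `build_messages` is a hypothetical helper and the wording of the instructions is an assumption, not taken from the notebook.

```python
from typing import Dict, List, Sequence

SEPARATOR = "\n\n------------\n\n"


def build_messages(query: str, chunks: Sequence[str]) -> List[Dict[str, str]]:
    """Build a chat-completions message list that grounds the answer in the retrieved chunks."""
    context = SEPARATOR.join(chunks)
    # Instructions go in the system message; the wording is an assumption.
    system = (
        "You are a helpful assistant that answers questions about the Friends characters. "
        "Answer only from the provided context; if the context does not contain the answer, say so."
    )
    # Question and retrieved context go in the user message.
    user = f"Question: {query}\n\nContext:\n\n{context}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```

Its result can be passed as `messages=build_messages(query, context)` to `openai_client.chat.completions.create`, where `context` is the list of chunks returned by the retrieval step.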


In [13]:
prompt = PromptTemplate("""You are a helpful assistant that answers questions about the friends characters using provided context. 

Question: {query}

Context: 

-----------------------------------
{context}

-----------------------------------

""")
message = prompt.format(query=query, context="\n\n".join(context))
display(Markdown(f"{message}"))

You are a helpful assistant that answers questions about the friends characters using provided context. 

Question: What is Chandler's job?

Context: 

-----------------------------------
Much to Chandler's dismay, the job is as an unpaid intern. He soon secures a full-time paying job in the business as a junior copywriter where the internship was originally just for three assistants, as the company felt that Chandler was the strongest candidate in the program but was too experienced and mature to be just an assistant.
In an alternate reality storyline during the show ("The One That Could Have Been"), Chandler has the guts to quit his job and works as a freelance writer, specializing in comics.

Chandler Bing
=============

Career


When the series begins, Chandler works as a Data Processor. This is a job which he thoroughly loathes, and tries to quit early in season 1 ("The One With The Stoned Guy"). His boss offers a promotion and more and more money to lure him back. According to the nameplate on his door (seen in "The One With The Ick Factor") he has become Processing Supervisor. Although never explicitly stated, exterior transition shots show that Chandler's office is in the Solow Building.
It remains a running joke through most of the seasons that no one quite knows what he does.
The guys win the trivia game because Rachel and Monica don't know what Chandler does for a living ("The One With The Embryos") (Although since Ross created the game, this suggests that he knows the answer)
Joey once says, " And you call yourself an accountant?" and Chandler replies, "No!"
Monica admits several times that she doesn't pay attention when he talks about his work, but finally learns what he does when he quits in season 9. She calls it, "Statistical Analysis and Data Reconfiguration." Chandler responds by looking at her and saying, "I quit, and you learn what I do?"
Chandler mentions his job title in, "The One with the Cooking Class." This job inspires his oft-referenced office slang word "WENUS" (Weekly Estimated Net Usage Systems), as well as the "ANUS" (Annual Net Usage Statistics). Because of this job Chandler appears to be the most financially well-off among the six friends for the most part of the series and is also shown to hold a position of authority in his company.
For a time, Chandler is unable to simply quit his job as it is his nature to avoid ending anything forcefully. This leads to an awkward moment where he dozes off during a crucial meeting and unintentionally agrees to run a new branch of the company in Tulsa, Oklahoma. Although he is able to work out a schedule that allows him to spend a few days each week in New York with Monica when she gets a new job at a restaurant in the city, when his new job forces him to work on Christmas Day, Chandler quits his job so he can fly home to be with Monica ("The One With Christmas In Tulsa").
Monica helps Chandler secure a job in advertising. Much to Chandler's dismay, the job is as an unpaid intern.

Personality


"Hi, I'm Chandler, I make jokes when I'm uncomfortable."
—Chandler
Though Chandler never lets up by using sarcasm as a defense, he has a tendency to come off as needy and makes bad first impressions as said by Phoebe with his constant joke-making and snarky demeanor. Despite this emotional immaturity, Chandler is the most financially secure of his friends.
Chandler is dorky, quirky and estranged from both of his parents. He suffers from commitment issues, brought on by growing up in a broken home with no idea of what a stable marriage looks like, can be neurotic and extremely defensive with humor as his shield but his sense of humor is generally unsophisticated, to the point during an interview when his boss told him he'd have "extra duties on his hand" he had to stop himself from laughing. Chandler also associates everything that links to his parents' divorce in a negative light, specifically Thanksgiving where his parents reveal their separation over turkey where his father plans to run away with the pool boy. This also associates with his mistrust of people in adult relationships. Becoming extremely paranoid when his girlfriend, Kathy (who was an actress) shared a sex scene with her co-star prompting a big fight between the two. Ironically it was Chandler's own paranoia that drove Kathy to have an affair with the same person.
His commitment issues are not only a running gag in the series but as well with his friends who use it as an excuse to mock him. In "The One With The Lesbian Wedding", when asked who of the group will get married last they all pointed at Chandler. This could either be a joke, a ribbing on his commitment-phobia or him being the least desirable of the group. However, he was the first friend to happily settle down as a married man. Even when he was a happily married man, Chandler still retained some of his paranoia when it was revealed that their surrogate mother was having twins, he instantly panicked and suggested keeping only one. However, when he discovered they were a male and a female, he was overjoyed.
Chandler is relatively comfortable with complimenting looks of the same gender like Phoebe. However, he is quite humiliated by his effeminate nature and is a profound heterosexual whereas Phoebe has been hinted to be bisexual. It's revealed Joey is who he would go out with.

Chandler Bing


“
I'm not great at the advice, can I interest you in a sarcastic comment?
”
—Chandler after being asked for advice.
Chandler Muriel Bing is one of the main characters on the popular sitcom Friends (1994–2004), portrayed by the late Matthew Perry. Chandler has a very good sense of humor, and is notoriously sarcastic, an attribute he calls a defense mechanism he developed due to his parents' divorcing when he was a child. His best friends are Joey Tribbiani and Ross Geller. Chandler has a habit of asking rhetorical questions that start with "Could it be any more...?" or something similar, such as "Could this be any more...?" or "Could she be any more...?".

-----------------------------------



In [14]:
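stream = openai_client.chat.completions.create(
    # Send the full prompt (question + retrieved context), not just the bare question
    messages=[{"role": "user", "content": message}],
    model="gpt4",  # on Azure, the name of your chat model deployment
    stream=True)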

output = ""
for chunk in stream:
    if chunk.choices:  # Check if the list is not empty
        output += chunk.choices[0].delta.content or ""
    display(Markdown(f"{output}"), clear=True)

In the television show "Friends", Chandler Bing works in statistical analysis and data reconfiguration for a large multinational corporation. However, his job is often a source of humor among his friends who don't fully understand what he does. Later in the series, Chandler changes his career to advertising.

## Normal example using LlamaIndex

In this example, we will use LlamaIndex to abstract the indexing and retrieval steps. This shows how easily the same pipeline can be implemented using LlamaIndex.


In [None]:
%pip install llama-index-embeddings-azure-openai llama-index-llms-azure-openai

In [15]:
import chromadb
from chromadb import Settings
from llama_index.llms.openai import OpenAI
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.ingestion import IngestionPipeline
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding

# ChromaDB Vector Store
chroma_client = chromadb.PersistentClient(
    path="./data/baseline-rag/chromadb", settings=Settings(allow_reset=True))
chroma_client.reset()
collection = chroma_client.get_or_create_collection(
    name="friends", metadata={"hnsw:space": "cosine"})
vector_store = ChromaVectorStore(chroma_collection=collection)


llm = AzureOpenAI(
    model="gpt-4",
    deployment_name="gpt4",
    api_key=os.getenv("OPENAI_API_KEY"),  
    api_version=os.getenv("OPENAI_API_VERSION"), # https://learn.microsoft.com/en-us/azure/ai-services/openai/reference?WT.mc_id=AZ-MVP-5004796
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT")
)

# You need to deploy your own embedding model as well as your own chat completion model
embedding = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name="text-embedding-ada-002",
    api_key=os.getenv("OPENAI_API_KEY"),  
    api_version=os.getenv("OPENAI_API_VERSION"), # https://learn.microsoft.com/en-us/azure/ai-services/openai/reference?WT.mc_id=AZ-MVP-5004796
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT")
)

# Define the ingestion pipeline to add documents to vector store
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=512, chunk_overlap=20),
        embedding,
    ],
    vector_store=vector_store,
)

# Create index with the vector store and using the embedding model
index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store, embed_model=embedding)

In [16]:
# Fetch documents
documents = SimpleDirectoryReader('./data/docs').load_data()

# Run pipeline
pipeline.run(documents=documents)

print("Done")

Done


#### Create base QueryEngine from LlamaIndex


In [17]:
query_engine = index.as_query_engine(llm=llm, verbose=True)

#### Or alternatively, create a CustomQueryEngine


In [None]:
from llama_index.core import PromptTemplate
from llama_index.core.query_engine import CustomQueryEngine
from llama_index.core.retrievers import BaseRetriever
from llama_index.core import get_response_synthesizer
from llama_index.core.response_synthesizers import BaseSynthesizer

qa_prompt = PromptTemplate(
    """You are a helpful assistant that answers questions about the Friends characters using provided context.
    Context information is below.
    ---------------------
    {context_str}
    ---------------------
    Given the context information and not prior knowledge, answer the query.
    Query: {query_str}
    Answer: 
    """,
)


class RAGQueryEngine(CustomQueryEngine):
    """RAG String Query Engine."""

    retriever: BaseRetriever
    response_synthesizer: BaseSynthesizer
    llm: OpenAI
    qa_prompt: PromptTemplate

    def custom_query(self, query_str: str):
        # Retrieve the most similar chunks and join them into one context string
        nodes = self.retriever.retrieve(query_str)
        context_str = "\n\n".join([n.node.get_content() for n in nodes])
        # Use the prompt stored on the engine rather than the global variable
        prompt = self.qa_prompt.format(context_str=context_str, query_str=query_str)
        print("Prompt:\n\n", prompt)
        # Generate the answer with a single completion call
        response = self.llm.complete(prompt)

        return str(response)


synthesizer = get_response_synthesizer(response_mode="compact")
query_engine = RAGQueryEngine(
    retriever=index.as_retriever(),
    response_synthesizer=synthesizer,
    llm=llm,
    qa_prompt=qa_prompt,
)

In [18]:
response = query_engine.query(query)
display(Markdown(f"{response}"))

Chandler's job changes throughout the series. Initially, he works as a Data Processor and later becomes a Processing Supervisor. His job involves "Statistical Analysis and Data Reconfiguration." Later on, he unintentionally agrees to run a new branch of the company in Tulsa, Oklahoma. However, he quits this job to be with Monica on Christmas Day. Eventually, Monica helps him secure a job in advertising, where he starts as an unpaid intern and later becomes a junior copywriter. In an alternate reality storyline, Chandler works as a freelance writer, specializing in comics.

## Simplest RAG implementation using LlamaIndex


In [19]:

from llama_index.core import Settings

llm = AzureOpenAI(
    model="gpt-4",
    deployment_name="gpt4",
    api_key=os.getenv("OPENAI_API_KEY"),  
    api_version=os.getenv("OPENAI_API_VERSION"), # https://learn.microsoft.com/en-us/azure/ai-services/openai/reference?WT.mc_id=AZ-MVP-5004796
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT")
)

# You need to deploy your own embedding model as well as your own chat completion model
embedding = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name="text-embedding-ada-002",
    api_key=os.getenv("OPENAI_API_KEY"),  
    api_version=os.getenv("OPENAI_API_VERSION"), # https://learn.microsoft.com/en-us/azure/ai-services/openai/reference?WT.mc_id=AZ-MVP-5004796
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT")
)

Settings.llm = llm
Settings.embed_model = embedding


# Fetch documents
documents = SimpleDirectoryReader('./data/docs').load_data()

# build VectorStoreIndex that takes care of chunking documents
# and encoding chunks to embeddings for future retrieval
index = VectorStoreIndex.from_documents(documents=documents)

# The QueryEngine class is equipped with the generator
# and facilitates the retrieval and generation steps
query_engine = index.as_query_engine()

# Use your Default RAG
response = query_engine.query(query)
display(Markdown(f"{response}"))

Chandler starts off as a Data Processor, a job he dislikes. He later becomes a Processing Supervisor. His work involves "Statistical Analysis and Data Reconfiguration," which is revealed when he quits his job in season 9. After quitting his job, he briefly runs a new branch of the company in Tulsa, Oklahoma, but eventually quits that too. Monica then helps him secure a job in advertising, starting as an unpaid intern and later becoming a junior copywriter. In an alternate reality storyline, Chandler quits his job and works as a freelance writer, specializing in comics.

In [20]:
query = "how many women did Ross date and what were their names?"
response = query_engine.query(query)
display(Markdown(f"{response}"))

Ross dated three women mentioned in the context: Rachel, Emily Waltham, and Elizabeth Stevens. The context also mentions that Ross slept with 14 women during the series, but their names are not provided.