# Your First RAG Application

In this notebook, we'll walk you through each of the components that are involved in a simple RAG application.

We won't be leveraging any fancy tools, just the OpenAI Python SDK, NumPy, and some classic Python.

> NOTE: This was done with Python 3.11.4.

> NOTE: There might be [compatibility issues](https://github.com/wandb/wandb/issues/7683) if you're on an NVIDIA driver newer than 552.44. As an interim solution, you can roll back your drivers to 552.44.

## Table of Contents:

- Task 1: Imports and Utilities
- Task 2: Documents
- Task 3: Embeddings and Vectors
- Task 4: Prompts
- Task 5: Retrieval Augmented Generation
  - 🚧 Activity #1: Augment RAG

Let's look at a rather complicated-looking visual representation of a basic RAG application.

<img src="https://i.imgur.com/vD8b016.png" />

## Task 1: Imports and Utilities

We're just doing some imports and enabling `async` to work within the Jupyter environment here, nothing too crazy!

In [1]:
from aimakerspace.text_utils import TextFileLoader, CharacterTextSplitter
from aimakerspace.vectordatabase import VectorDatabase
import asyncio

In [2]:
import nest_asyncio
nest_asyncio.apply()

## Task 2: Documents

We'll be concerning ourselves with this part of the flow in the following section:

<img src="https://i.imgur.com/jTm9gjk.png" />

### Loading Source Documents

So, first things first, we need some documents to work with.

While we could work directly with the `.txt` files (or whatever file types you want to extend this to), we can instead batch-process those documents up front and store them in a more machine-friendly format.

In this case, we're going to parse our text file into a single document in memory.

Let's look at the relevant bits of the `TextFileLoader` class:

```python
def load_file(self):
    with open(self.path, "r", encoding=self.encoding) as f:
        self.documents.append(f.read())
```

We're simply loading the document with Python's built-in `open` function and storing the output in our `self.documents` list.

> NOTE: We're using blogs from PMarca (Marc Andreessen) as our sample data. The data itself is largely irrelevant: we want to focus on the mechanisms of RAG, which include our data's shape and quality, but not specifically what the data contains.
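For reference, here is a minimal, self-contained sketch of a loader in the same spirit. The class name `SimpleTextFileLoader` and its directory handling are hypothetical illustrations, not the `aimakerspace` implementation.

```python
import os
from typing import List


class SimpleTextFileLoader:
    """Hypothetical loader: reads one .txt file, or every .txt file in a directory."""

    def __init__(self, path: str, encoding: str = "utf-8"):
        self.path = path
        self.encoding = encoding
        self.documents: List[str] = []

    def load_file(self, file_path: str) -> None:
        # Read the whole file into memory as a single document string
        with open(file_path, "r", encoding=self.encoding) as f:
            self.documents.append(f.read())

    def load_documents(self) -> List[str]:
        if os.path.isdir(self.path):
            # Sort file names so the document order is deterministic
            for name in sorted(os.listdir(self.path)):
                if name.endswith(".txt"):
                    self.load_file(os.path.join(self.path, name))
        elif os.path.isfile(self.path) and self.path.endswith(".txt"):
            self.load_file(self.path)
        else:
            raise ValueError(f"Expected a .txt file or a directory, got: {self.path}")
        return self.documents
```

Loading a single `.txt` file yields a one-element list, which matches the `len(documents)` of 1 we see below.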


In [3]:
text_loader = TextFileLoader("data/PMarcaBlogs.txt")
documents = text_loader.load_documents()
len(documents)

1

In [4]:
print(documents[0][:100])


The Pmarca Blog Archives
(select posts from 2007-2009)
Marc Andreessen
copyright: Andreessen Horow


### Splitting Text Into Chunks

As we can see, there is one massive document.

We'll want to chunk the document into smaller parts so it's easier to pass the most relevant snippets to the LLM.

There is no fixed way to split/chunk documents - and you'll need to rely on some intuition as well as knowing your data *very* well in order to build the most robust system.

For this toy example, we'll just split blindly on length.

>There's an opportunity to clear up some terminology here. For this course, we will stick to the following:
>
>- "source documents": the `.txt`, `.pdf`, `.html`, etc. files that hold the information we start with, in its raw format
>- "document(s)": single (or more) text object(s)
>- "corpus": the combination of all of our documents

As you can imagine (though it's not specifically true in this toy example), the idea of splitting documents is to break them into manageable-sized chunks that retain the most relevant local context.
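Splitting "blindly on length" can be sketched as a sliding character window with some overlap, so text cut at a chunk boundary still appears whole in a neighbouring chunk. This is a minimal sketch under assumptions: the function names are hypothetical, and `chunk_size=1000` / `chunk_overlap=200` are illustrative values, not necessarily the defaults of `CharacterTextSplitter`.

```python
from typing import List


def split_by_length(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into fixed-size character windows; consecutive chunks share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def split_texts(texts: List[str], chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    # Flatten the chunks of every document into a single list
    return [chunk for text in texts for chunk in split_by_length(text, chunk_size, chunk_overlap)]


print(split_by_length("abcdefghij", chunk_size=4, chunk_overlap=1))  # ['abcd', 'defg', 'ghij']
```

Note how each chunk starts with the last character of the previous one; with real data, overlap keeps sentences that straddle a boundary retrievable.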

In [5]:
text_splitter = CharacterTextSplitter()
split_documents = text_splitter.split_texts(documents)
len(split_documents)

373

Let's take a look at some of the documents we've managed to split.

In [6]:
split_documents[0:1]

['\ufeff\nThe Pmarca Blog Archives\n(select posts from 2007-2009)\nMarc Andreessen\ncopyright: Andreessen Horowitz\ncover design: Jessica Hagy\nproduced using: Pressbooks\nContents\nTHE PMARCA GUIDE TO STARTUPS\nPart 1: Why not to do a startup 2\nPart 2: When the VCs say "no" 10\nPart 3: "But I don\'t know any VCs!" 18\nPart 4: The only thing that matters 25\nPart 5: The Moby Dick theory of big companies 33\nPart 6: How much funding is too little? Too much? 41\nPart 7: Why a startup\'s initial business plan doesn\'t\nmatter that much\n49\nTHE PMARCA GUIDE TO HIRING\nPart 8: Hiring, managing, promoting, and Dring\nexecutives\n54\nPart 9: How to hire a professional CEO 68\nHow to hire the best people you\'ve ever worked\nwith\n69\nTHE PMARCA GUIDE TO BIG COMPANIES\nPart 1: Turnaround! 82\nPart 2: Retaining great people 86\nTHE PMARCA GUIDE TO CAREER, PRODUCTIVITY,\nAND SOME OTHER THINGS\nIntroduction 97\nPart 1: Opportunity 99\nPart 2: Skills and education 107\nPart 3: Where to go and wh

## Task 3: Embeddings and Vectors

Next, we have to convert our corpus into a "machine readable" format as we explored in the Embedding Primer notebook.

Today, we're going to talk about the actual process of creating, and then storing, these embeddings, and how we can leverage that to intelligently add context to our queries.

### OpenAI API Key

In order to access OpenAI's APIs, we'll need to provide our OpenAI API Key!

You can work through the folder "OpenAI API Key Setup" for more information on this process if you don't already have an API Key!

In [7]:
import os
import openai
from getpass import getpass

openai.api_key = getpass("OpenAI API Key: ")
os.environ["OPENAI_API_KEY"] = openai.api_key

### Vector Database

Let's set up our vector database to hold all our documents and their embeddings!

While this is all baked into one call, we can look at some of the code that powers this process to get a better understanding.

Let's look at our `VectorDatabase().__init__()`:

```python
def __init__(self, embedding_model: EmbeddingModel = None):
    self.vectors = defaultdict(np.array)
    self.embedding_model = embedding_model or EmbeddingModel()
```

As you can see, our vectors are simply stored in a dictionary of `np.array` objects.

Secondly, our `VectorDatabase()` has a default `EmbeddingModel()` which is a wrapper for OpenAI's `text-embedding-3-small` model.

> **Quick Info About `text-embedding-3-small`**:
> - It has a context window of **8191** tokens
> - It returns vectors with dimension **1536**

#### ❓Question #1:

The default embedding dimension of `text-embedding-3-small` is 1536, as noted above. 

1. Is there any way to modify this dimension?
2. What technique does OpenAI use to achieve this?

> NOTE: Check out this [API documentation](https://platform.openai.com/docs/api-reference/embeddings/create) for the answer to question #1, and [this documentation](https://platform.openai.com/docs/guides/embeddings/use-cases) for an answer to question #2!

🔬 ANSWER #1
1. Yes. Both `text-embedding-3-small` and its bigger sibling `text-embedding-3-large` accept a `dimensions` parameter that returns a shorter embedding.

2. OpenAI achieves this with Matryoshka Representation Learning (MRL).
MRL trains the model so that one high-dimensional embedding contains nested, smaller representations: the vector can be truncated to its first *n* values (and then L2-normalized) as needed while retaining its semantic properties.
This lets you adapt the embedding size to the vector size your vector store expects.
The size reduction also significantly cuts the computational and storage costs of embeddings, without a significant loss of performance. This is quite a feat!
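In practice you would simply pass `dimensions=256` to the embeddings endpoint (as the `VectorDatabaseWithMetadata` later in this notebook does). The sketch below shows the manual equivalent described above: keep the first values, then re-normalize to unit length. It is an illustration under assumptions: the helper name `shorten_embedding` is hypothetical, and a random vector stands in for a real embedding, so it shows only the mechanics, not the semantic quality MRL training provides.

```python
import numpy as np


def shorten_embedding(embedding: np.ndarray, dimensions: int) -> np.ndarray:
    """Keep the first `dimensions` values of an MRL-style embedding and re-normalize to unit L2 norm."""
    if not 0 < dimensions <= embedding.shape[-1]:
        raise ValueError("dimensions must be between 1 and the original size")
    truncated = embedding[..., :dimensions]
    norm = np.linalg.norm(truncated, axis=-1, keepdims=True)
    return truncated / norm


# A random unit vector stands in for a real 1536-dimensional embedding
rng = np.random.default_rng(0)
full = rng.normal(size=1536)
full /= np.linalg.norm(full)

short = shorten_embedding(full, 256)
print(short.shape)  # (256,)
print(round(float(np.linalg.norm(short)), 6))  # 1.0
```

Re-normalizing matters because truncation shrinks the vector's length, and cosine or dot-product search assumes comparable norms.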

We can call the `async_get_embeddings` method of our `EmbeddingModel()` on a list of `str` and receive a list of embeddings (each one a list of `float`) back!

```python
async def async_get_embeddings(self, list_of_text: List[str]) -> List[List[float]]:
    return await aget_embeddings(
        list_of_text=list_of_text, engine=self.embeddings_model_name
    )
```

We cast those to `np.array` when we build our `VectorDatabase()`:

```python
async def abuild_from_list(self, list_of_text: List[str]) -> "VectorDatabase":
    embeddings = await self.embedding_model.async_get_embeddings(list_of_text)
    for text, embedding in zip(list_of_text, embeddings):
        self.insert(text, np.array(embedding))
    return self
```

And that's all we need to do!

In [8]:
vector_db = VectorDatabase()
vector_db = asyncio.run(vector_db.abuild_from_list(split_documents))

Using text-embedding-3-small with None dimensions


#### ❓Question #2:

What are the benefits of using an `async` approach to collecting our embeddings?

> NOTE: Determining the core difference between `async` and `sync` will be useful! If you get stuck - ask ChatGPT!

🔬 ANSWER #2

In synchronous programming, operations are executed one after the other. Each task must be completed before the next can begin. 

In asynchronous programming, a task that is waiting (for example, on a network response) does not block the rest of the program. The program can launch a task and keep working on other tasks while the first one waits.

Benefits of the asynchronous approach to collecting our embeddings:

1. Increased Efficiency: When collecting embeddings for many documents, multiple requests can be sent to the OpenAI API concurrently, without waiting for each individual request to complete.

2. Reduced Wait Time: In the library code, the texts are divided into batches that are processed concurrently, so the requests overlap instead of running one after another.

3. Better Resource Management: While the API processes an embedding request, the program is not stuck waiting and can perform other tasks.

4. Scalability: The asynchronous approach pays off when there are many and/or large documents to embed.

5. Optimized Rate Limit Management: Batching reduces the number of requests, which makes it easier to stay within API rate limits.
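The batching-plus-concurrency idea can be sketched with `asyncio.gather`. This is a minimal sketch under assumptions: `fake_embed_batch` is a hypothetical stand-in for one embeddings API call (the sleep simulates network latency), and `batch_size=2` is only for illustration; the `aimakerspace` internals may batch differently.

```python
import asyncio
from typing import List


async def fake_embed_batch(batch: List[str]) -> List[List[float]]:
    """Hypothetical stand-in for one embeddings API call; the sleep simulates network latency."""
    await asyncio.sleep(0.1)
    return [[float(len(text))] for text in batch]


async def embed_all(texts: List[str], batch_size: int = 2) -> List[List[float]]:
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    # gather() runs every batch request concurrently and returns results in input order
    results = await asyncio.gather(*(fake_embed_batch(b) for b in batches))
    return [vector for batch_result in results for vector in batch_result]


texts = ["a", "bb", "ccc", "dddd", "eeeee", "ffffff"]
vectors = asyncio.run(embed_all(texts))
print(vectors)  # [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
```

The three simulated requests overlap, so the whole run takes roughly one request's latency instead of three. Against a real API you would usually also cap concurrency (for example with an `asyncio.Semaphore`) to respect rate limits.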

So, to review what we've done so far in natural language:

1. We load source documents
2. We split those source documents into smaller chunks (documents)
3. We send each of those documents to the `text-embedding-3-small` OpenAI API endpoint
4. We store each of the text representations with the vector representations as keys/values in a dictionary

### Semantic Similarity

The next step is to be able to query our `VectorDatabase()` with a `str` and have it return the vectors and text from our corpus that are most relevant to it.

We're going to use the following process to achieve this in our toy example:

1. We need to embed our query with the same `EmbeddingModel()` as we used to construct our `VectorDatabase()`
2. We loop through every vector in our `VectorDatabase()` and use a distance measure to compare how related they are
3. We return a list of the top `k` closest vectors, with their text representations

There's some very heavy optimization that can be done at each of these steps - but let's just focus on the basic pattern in this notebook.

> We are using [cosine similarity](https://www.engati.com/glossary/cosine-similarity) as a distance metric in this example, but there are many other distance metrics you could use, like [these](https://flavien-vidal.medium.com/similarity-distances-for-natural-language-processing-16f63cd5ba55)

> We are using a rather inefficient way of calculating relative distance between the query vector and all other vectors - there are more advanced approaches that are much more efficient, like [ANN](https://towardsdatascience.com/comprehensive-guide-to-approximate-nearest-neighbors-algorithms-8b94f057d6b6)
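The brute-force search described in the three steps above can be sketched in a few lines of NumPy, using a dictionary of text to `np.array`, like the one our `VectorDatabase()` keeps. The function names and the toy 2-D vectors are hypothetical illustrations; real embeddings come from the embedding model.

```python
from typing import Dict, List, Tuple

import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Dot product divided by the product of the two vector norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def search(query_vector: np.ndarray, vectors: Dict[str, np.ndarray], k: int = 3) -> List[Tuple[str, float]]:
    # Brute force: score every stored vector, then keep the k highest scores
    scores = [(text, cosine_similarity(query_vector, vec)) for text, vec in vectors.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


store = {
    "cats": np.array([1.0, 0.0]),
    "kittens": np.array([0.9, 0.1]),
    "stocks": np.array([0.0, 1.0]),
}
print(search(np.array([1.0, 0.0]), store, k=2))
```

Every query scores every stored vector, which is O(N) per query; ANN indexes trade a little accuracy to avoid that full scan.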

In [9]:
vector_db.search_by_text("What is the Michael Eisner Memorial Weak Executive Problem?", k=3)

[('ordingly.\nSeventh, when hiring the executive to run your former specialty, be\ncareful you don’t hire someone weak on purpose.\nThis sounds silly, but you wouldn’t believe how oaen it happens.\nThe CEO who used to be a product manager who has a weak\nproduct management executive. The CEO who used to be in\nsales who has a weak sales executive. The CEO who used to be\nin marketing who has a weak marketing executive.\nI call this the “Michael Eisner Memorial Weak Executive Problem” — aaer the CEO of Disney who had previously been a brilliant TV network executive. When he bought ABC at Disney, it\npromptly fell to fourth place. His response? “If I had an extra\ntwo days a week, I could turn around ABC myself.” Well, guess\nwhat, he didn’t have an extra two days a week.\nA CEO — or a startup founder — oaen has a hard time letting\ngo of the function that brought him to the party. The result: you\nhire someone weak into the executive role for that function so\nthat you can continue to b

## Task 4: Prompts

In the following section, we'll be looking at the role of prompts - and how they help us to guide our application in the right direction.

In this notebook, we're going to rely on the idea of "zero-shot in-context learning".

This is a lot of words to say: "We will ask it to perform our desired task in the prompt, and provide no examples."

### XYZRolePrompt

Before we do that, let's stop and think a bit about how OpenAI's chat models work.

We know they have roles - as is indicated in the following API [documentation](https://platform.openai.com/docs/api-reference/chat/create#chat/create-messages)

There are three roles, and they function as follows (taken directly from [OpenAI](https://platform.openai.com/docs/guides/gpt/chat-completions-api)):

- `{"role" : "system"}` : The system message helps set the behavior of the assistant. For example, you can modify the personality of the assistant or provide specific instructions about how it should behave throughout the conversation. However note that the system message is optional and the model’s behavior without a system message is likely to be similar to using a generic message such as "You are a helpful assistant."
- `{"role" : "user"}` : The user messages provide requests or comments for the assistant to respond to.
- `{"role" : "assistant"}` : Assistant messages store previous assistant responses, but can also be written by you to give examples of desired behavior.

The main idea is this:

1. You start with a system message that outlines how the LLM should respond, what kind of behaviours you can expect from it, and more
2. Then, you can provide a few examples in the form of "user"/"assistant" pairs
3. Then, you prompt the model with the true "user" message.

In this example, we'll be skipping the 2nd step for simplicity's sake.
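For comparison, here is what all three steps look like as the list of role/content dictionaries the chat API expects, including the example pair we skip in this notebook. The message contents are made-up illustrations.

```python
# Step 1: a system message that sets the assistant's behaviour
messages = [
    {"role": "system", "content": "You answer questions about Python in one short sentence."},
    # Step 2 (skipped in this notebook): an example exchange showing the desired style
    {"role": "user", "content": "What does len() return?"},
    {"role": "assistant", "content": "The number of items in a container."},
    # Step 3: the question we actually want answered
    {"role": "user", "content": "What does sorted() return?"},
]

roles = [m["role"] for m in messages]
print(roles)  # ['system', 'user', 'assistant', 'user']
```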

#### Utility Functions

You'll notice that we're using some utility functions from the `aimakerspace` module - let's take a peek at these and see what they're doing!

##### XYZRolePrompt

Here we have our `system`, `user`, and `assistant` role prompts.

Let's take a peek at what they look like:

```python
import re


class BasePrompt:
    def __init__(self, prompt):
        """
        Initializes the BasePrompt object with a prompt template.

        :param prompt: A string that can contain placeholders within curly braces
        """
        self.prompt = prompt
        self._pattern = re.compile(r"\{([^}]+)\}")

    def format_prompt(self, **kwargs):
        """
        Formats the prompt string using the keyword arguments provided.

        :param kwargs: The values to substitute into the prompt string
        :return: The formatted prompt string
        """
        matches = self._pattern.findall(self.prompt)
        return self.prompt.format(**{match: kwargs.get(match, "") for match in matches})

    def get_input_variables(self):
        """
        Gets the list of input variable names from the prompt string.

        :return: List of input variable names
        """
        return self._pattern.findall(self.prompt)
```

Then we have our `RolePrompt`, which focuses on the role-based message pattern used by most LLM chat APIs.

```python
class RolePrompt(BasePrompt):
    def __init__(self, prompt, role: str):
        """
        Initializes the RolePrompt object with a prompt template and a role.

        :param prompt: A string that can contain placeholders within curly braces
        :param role: The role for the message ('system', 'user', or 'assistant')
        """
        super().__init__(prompt)
        self.role = role

    def create_message(self, **kwargs):
        """
        Creates a message dictionary with a role and a formatted message.

        :param kwargs: The values to substitute into the prompt string
        :return: Dictionary containing the role and the formatted message
        """
        return {"role": self.role, "content": self.format_prompt(**kwargs)}
```

We'll look at how the `SystemRolePrompt` is constructed to get a better idea of how that extension works:

```python
class SystemRolePrompt(RolePrompt):
    def __init__(self, prompt: str):
        super().__init__(prompt, "system")
```

That pattern is repeated for our `UserRolePrompt` and our `AssistantRolePrompt` as well.

##### ChatOpenAI

Next we have our model, wrapped in an interface analogous to those in libraries like LangChain and LlamaIndex.

Let's take a peek at how that is constructed:

```python
import os

from openai import OpenAI


class ChatOpenAI:
    def __init__(self, model_name: str = "gpt-4o-mini"):
        self.model_name = model_name
        self.openai_api_key = os.getenv("OPENAI_API_KEY")
        if self.openai_api_key is None:
            raise ValueError("OPENAI_API_KEY is not set")

    def run(self, messages, text_only: bool = True):
        if not isinstance(messages, list):
            raise ValueError("messages must be a list")

        # openai>=1.0 client API; the legacy openai.ChatCompletion.create was removed in 1.0
        client = OpenAI(api_key=self.openai_api_key)
        response = client.chat.completions.create(
            model=self.model_name, messages=messages
        )

        if text_only:
            return response.choices[0].message.content

        return response
```

#### ❓ Question #3:

When calling the OpenAI API - are there any ways we can achieve more reproducible outputs?

> NOTE: Check out [this section](https://platform.openai.com/docs/guides/text-generation/) of the OpenAI documentation for the answer!

🔬 ANSWER #3

LLM outputs are non-deterministic by default because each next token is sampled from a probability distribution over the vocabulary, rather than always being the single most likely token.

Setting the `temperature` parameter to 0 promises to eliminate most of this variability: at temperature 0, the model no longer samples from the softmax over the logits but effectively takes the argmax, i.e. *the* most probable token.
But, as the OpenAI documentation states: "Setting the temperature to 0 will make the outputs mostly deterministic, but a small amount of variability may remain."

OpenAI has also introduced the `seed` parameter, which significantly improves output reproducibility: repeated requests with the same `seed` and parameters should return the same result, on a best-effort basis.

You can obtain reproducible outputs by:
1. Setting the same `seed`
2. Keeping all other parameters (prompt, temperature, top_p, etc.) exactly the same
3. Monitoring the `system_fingerprint` field in the response (this field identifies the backend configuration; when it changes, which can happen from time to time, outputs may differ even with the same `seed`)
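Those three steps can be sketched with the `openai>=1.0` client. The helper `build_reproducible_request` is a hypothetical name, and the live API calls only run when `OPENAI_API_KEY` is set; treat this as an illustration of the parameters, not a guarantee of identical outputs.

```python
import os
from typing import Any, Dict, List


def build_reproducible_request(
    messages: List[Dict[str, str]], model: str = "gpt-4o-mini", seed: int = 42
) -> Dict[str, Any]:
    """Keyword arguments for chat.completions.create that pin the sampling settings."""
    return {"model": model, "messages": messages, "temperature": 0, "seed": seed}


# Only call the API when a key is available
if __name__ == "__main__" and os.getenv("OPENAI_API_KEY"):
    from openai import OpenAI

    client = OpenAI()
    request = build_reproducible_request([{"role": "user", "content": "Name one prime number."}])
    first = client.chat.completions.create(**request)
    second = client.chat.completions.create(**request)
    # Different fingerprints mean the backend changed, so outputs may legitimately differ
    print("same fingerprint:", first.system_fingerprint == second.system_fingerprint)
    print("same answer:", first.choices[0].message.content == second.choices[0].message.content)
```

Building the request in one place guarantees that both calls use exactly the same parameters, which is step 2 above.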

### Creating and Prompting OpenAI's `gpt-4o-mini`!

Let's tie all these together and use it to prompt `gpt-4o-mini`!

In [10]:
from aimakerspace.openai_utils.prompts import (
    UserRolePrompt,
    SystemRolePrompt,
    AssistantRolePrompt,
)

from aimakerspace.openai_utils.chatmodel import ChatOpenAI

chat_openai = ChatOpenAI()
user_prompt_template = "{content}"
user_role_prompt = UserRolePrompt(user_prompt_template)
system_prompt_template = (
    "You are an expert in {expertise}, you always answer in a kind way."
)
system_role_prompt = SystemRolePrompt(system_prompt_template)

messages = [
    system_role_prompt.create_message(expertise="Python"),
    user_role_prompt.create_message(
        content="What is the best way to write a loop?"
    ),
]

response = chat_openai.run(messages)

In [11]:
print(response)

The best way to write a loop in Python largely depends on the specific task you're trying to accomplish. However, I can share some general tips and common patterns that are considered best practices:

### 1. Use a `for` loop for Iteration
If you're iterating over a sequence (like a list, tuple, or string), using a `for` loop is often best practice.

```python
# Example of a for loop
fruits = ['apple', 'banana', 'cherry']
for fruit in fruits:
    print(fruit)
```

### 2. Use a `while` loop when the number of iterations is not known
A `while` loop is suitable when the condition needs to be evaluated before each iteration.

```python
# Example of a while loop
count = 0
while count < 5:
    print(count)
    count += 1
```

### 3. Use `enumerate()` for Index and Value
If you need both the index and the value in a `for` loop, use `enumerate()`.

```python
# Using enumerate
for index, fruit in enumerate(fruits):
    print(f"{index}: {fruit}")
```

### 4. List Comprehensions for Simple Loops
W

## Task 5: Retrieval Augmented Generation

Now we can create a RAG prompt - which will help our system behave in a way that makes sense!

There is much you could do here, many tweaks and improvements to be made!

In [12]:
RAG_PROMPT_TEMPLATE = """ \
Use the provided context to answer the user's query.

You may not answer the user's query unless there is specific context in the following text.

If you do not know the answer, or cannot answer, please respond with "I don't know".
"""

rag_prompt = SystemRolePrompt(RAG_PROMPT_TEMPLATE)

USER_PROMPT_TEMPLATE = """ \
Context:
{context}

User Query:
{user_query}
"""


user_prompt = UserRolePrompt(USER_PROMPT_TEMPLATE)

class RetrievalAugmentedQAPipeline:
    def __init__(self, llm: ChatOpenAI, vector_db_retriever: VectorDatabase) -> None:
        self.llm = llm
        self.vector_db_retriever = vector_db_retriever

    def run_pipeline(self, user_query: str) -> dict:
        context_list = self.vector_db_retriever.search_by_text(user_query, k=4)

        context_prompt = ""
        for context in context_list:
            context_prompt += context[0] + "\n"

        formatted_system_prompt = rag_prompt.create_message()

        formatted_user_prompt = user_prompt.create_message(user_query=user_query, context=context_prompt)

        return {"response" : self.llm.run([formatted_system_prompt, formatted_user_prompt]), "context" : context_list}

#### ❓ Question #4:

What prompting strategies could you use to make the LLM have a more thoughtful, detailed response?

What is that strategy called?

> NOTE: You can look through ["Accessing GPT-3.5-turbo Like a Developer"](https://colab.research.google.com/drive/1mOzbgf4a2SP5qQj33ZxTz2a01-5eXqk2?usp=sharing) for an answer to this question if you get stuck!

🔬 ANSWER #4
1. Chain-of-Thought Prompting: The model is explicitly asked to reason step by step before giving its final answer. This encourages more logical, nuanced, and even critical reasoning, rather than an immediate and superficial answer.

2. Few-Shot Prompting: Several examples of questions and expected answers are provided directly in the prompt. These examples guide the LLM toward the desired style, level of detail, and response logic.

3. Self-Ask Prompting (Socratic Prompting): The model is encouraged to ask itself sub-questions before producing an answer. This encourages more analytical reasoning, much like a human expert facing a difficult task.

And the magic really happens when you combine these techniques! 🚀
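As an illustration of chain-of-thought prompting in a RAG setting, here is a minimal sketch that asks the model to reason step by step before answering. The helper `build_cot_messages` and its wording are hypothetical; few-shot examples could be added as user/assistant pairs between the two messages.

```python
def build_cot_messages(question: str, context: str) -> list:
    """Hypothetical helper: wrap a RAG question in a chain-of-thought instruction."""
    system = (
        "Answer using only the provided context. "
        "First reason step by step under a 'Reasoning:' heading, "
        "then give the final answer under an 'Answer:' heading. "
        "If the context is insufficient, answer \"I don't know\"."
    )
    user = f"Context:\n{context}\n\nQuestion:\n{question}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]


messages = build_cot_messages(
    "Why do founders hire weak executives?",
    "A CEO often has a hard time letting go of the function that brought him to the party.",
)
print(messages[0]["content"])
```

Asking for separate "Reasoning" and "Answer" sections also makes it easy to strip the reasoning before showing the answer to a user.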

In [13]:
retrieval_augmented_qa_pipeline = RetrievalAugmentedQAPipeline(
    vector_db_retriever=vector_db,
    llm=chat_openai
)

In [14]:
retrieval_augmented_qa_pipeline.run_pipeline("What is the 'Michael Eisner Memorial Weak Executive Problem'?")

{'response': "The 'Michael Eisner Memorial Weak Executive Problem' refers to a situation where a CEO or startup founder hires a weak executive for a function they themselves excelled in, in order to maintain control and relevance in that area. This occurs because the CEO finds it difficult to let go of the function that contributed to their success, leading them to choose someone less competent so they can continue to be the dominant figure in that domain. An example given is Michael Eisner, the former CEO of Disney, who struggled with the ABC network's performance after acquiring it, illustrating the potential pitfalls of surrounding oneself with weaker executives to maintain a sense of superiority.",
 'context': [('ordingly.\nSeventh, when hiring the executive to run your former specialty, be\ncareful you don’t hire someone weak on purpose.\nThis sounds silly, but you wouldn’t believe how oaen it happens.\nThe CEO who used to be a product manager who has a weak\nproduct management ex

### 🏗️ Activity #1:

Enhance your RAG application in some way! 

Suggestions are: 

- Allow it to work with PDF files
- Implement a new distance metric
- Add metadata support to the vector database

While these are suggestions, you should feel free to make whatever augmentations you desire! 

> NOTE: These additions might require you to work within the `aimakerspace` library - that's expected!

In [15]:
from aimakerspace.text_utils import PDFLoader, MetadataCharacterTextSplitter, VectorDatabaseWithMetadata
from aimakerspace.openai_utils.prompts import UserRolePrompt, SystemRolePrompt
from aimakerspace.openai_utils.chatmodel import ChatOpenAI
import asyncio
import os
import openai
from getpass import getpass
import nest_asyncio
nest_asyncio.apply()

### Loading and Chunking a PDF File

In [16]:
# Load the PDF file with metadata
pdf_loader = PDFLoader("data/Think_And_Grow_Rich.pdf")
document_chunks_with_metadata = pdf_loader.load()
print(f"Number of loaded pages: {len(document_chunks_with_metadata)}")


Number of loaded pages: 253


In [17]:
# Display the global metadata of the document
metadata_global = document_chunks_with_metadata[0]["metadata"]
print(f"\nDocument metadata:")
print(f"Title: {metadata_global.get('title', 'Not specified')}")
print(f"Author: {metadata_global.get('author', 'Not specified')}")
print(f"Total pages: {metadata_global.get('total_pages', 'Not specified')}")




Document metadata:
Title: Think and Grow Rich by Napoleon Hill 
Author: Brod
Total pages: 253
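The source of `PDFLoader` isn't shown here, so below is a hypothetical sketch of how page records shaped like the output above (a `content` string plus `metadata` with `page_number`, `total_pages`, and `filename`) could be produced. It assumes the third-party `pypdf` package (`pip install pypdf`) and extracts only the title and author; the function names are illustrative.

```python
import os
from typing import Any, Dict, List


def build_page_records(page_texts: List[str], doc_metadata: Dict[str, Any], filename: str) -> List[Dict[str, Any]]:
    """Attach document-level and page-level metadata to each page's text."""
    total = len(page_texts)
    return [
        {
            "content": text,
            "metadata": {**doc_metadata, "total_pages": total, "filename": filename, "page_number": i + 1},
        }
        for i, text in enumerate(page_texts)
    ]


def load_pdf_with_metadata(path: str) -> List[Dict[str, Any]]:
    """Read a PDF with pypdf (an assumed dependency) and return one record per page."""
    from pypdf import PdfReader  # imported lazily so build_page_records works without pypdf

    reader = PdfReader(path)
    info = reader.metadata  # None when the PDF has no document-information dictionary
    doc_metadata = {
        "title": (info.title if info else None) or "",
        "author": (info.author if info else None) or "",
    }
    texts = [page.extract_text() or "" for page in reader.pages]
    return build_page_records(texts, doc_metadata, os.path.basename(path))
```

Page numbers are 1-based, matching the `'page_number': 1` of the first record above.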


In [18]:
# Display a sample of the first page content
print(f"\nExcerpt from the first page:")
print(document_chunks_with_metadata[0])



Excerpt from the first page:
{'content': '\uf09a  THINK & GROW RICH  \uf09b \nFree Digital Download PDF eBook Edition \nRe-published by: \n\uf09a  www.think-and-grow-rich-ebook.com  \uf09b i', 'metadata': {'title': 'Think and Grow Rich by Napoleon Hill ', 'author': 'Brod', 'subject': 'Free Digital Download PDF eBook Edition', 'keywords': 'free, ebook, think and grow rich, make money online, home based business, opportunity', 'creator': 'Writer', 'producer': 'OpenOffice.org 2.3', 'creation_date': "D:20081020165029+01'00'", 'modification_date': '', 'total_pages': 253, 'filename': 'Think_And_Grow_Rich.pdf', 'page_number': 1, 'page_size': {'width': 612, 'height': 792}}}


In [19]:
# Split the content into chunks with their metadata
splitter = MetadataCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(document_chunks_with_metadata)
print(f"\nTotal number of chunks: {len(chunks)}")



Total number of chunks: 866


In [20]:
# Display an example chunk with its metadata
print(f"\nExample of a chunk with metadata:")
print(f"Content: {chunks[113]['content'][:1000]}...")
print(f"Page: {chunks[113]['metadata']['page_number']}")
print(f"Position in page: {chunks[113]['metadata']['chunk_start_char']} to {chunks[113]['metadata']['chunk_end_char']}")
print(f"First chunk of the page: {chunks[113]['metadata']['is_first_chunk']}")


Example of a chunk with metadata:
Content:   THINK & GROW RICH   
of approval upon them as being, not only the steps essential for the accumulation  
of money, but neccessary for the attainment of any definite goal. 
The steps call for no "hard labor." They call for no sacrifice. They do not  
require one to become ridiculous, or credulous. To apply them calls for no great  
amount of education. But the successful application of these six steps does call  
for sufficient imagination to enable one to see, and to understand, that  
accumulation of money cannot be left to chance, good fortune, and luck. One  
must realize that all who have accumulated great fortunes, first did a certain  
amount of dreaming, hoping, wishing, DESIRING, and PLANNING before they 
acquired money. 
You may as well know, right here, that you can never have riches in great  
quantities, UNLESS you can work yourself into a white heat of DESIRE for  
money, and actually BELIEVE you will possess it.
You may as w
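The metadata fields printed above (`page_number`, `chunk_start_char`, `chunk_end_char`, `is_first_chunk`) suggest how a metadata-aware splitter works: each chunk copies its page's metadata and records where in the page it came from. Here is a minimal sketch under assumptions: `split_page_records` is a hypothetical name, and treating `chunk_end_char` as an exclusive end offset is a guess, not confirmed by `MetadataCharacterTextSplitter`.

```python
from typing import Any, Dict, List


def split_page_records(records: List[Dict[str, Any]], chunk_size: int = 1000, chunk_overlap: int = 200) -> List[Dict[str, Any]]:
    """Split each page record into overlapping character chunks that keep the page metadata."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for record in records:
        text = record["content"]
        for start in range(0, len(text), step):  # empty pages produce no chunks
            end = min(start + chunk_size, len(text))  # exclusive end offset (an assumption)
            chunks.append({
                "content": text[start:end],
                "metadata": {
                    **record["metadata"],  # copy page-level metadata into every chunk
                    "chunk_start_char": start,
                    "chunk_end_char": end,
                    "is_first_chunk": start == 0,
                },
            })
            if end >= len(text):
                break  # this chunk already reaches the end of the page
    return chunks


pages = [{"content": "abcdefghij", "metadata": {"page_number": 3}}]
for chunk in split_page_records(pages, chunk_size=4, chunk_overlap=1):
    print(chunk["content"], chunk["metadata"])
```

Because every chunk carries its `page_number`, the retrieval step can cite pages, which is exactly what the RAG prompt in the next section asks the model to do.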

In [21]:
openai.api_key = getpass("OpenAI API Key: ")
os.environ["OPENAI_API_KEY"] = openai.api_key

### Reducing the Embedding Dimension to 256

In [22]:
# Create the vector database with metadata
vector_db = VectorDatabaseWithMetadata(
    embedding_model_name="text-embedding-3-small", 
    dimensions=256
)
vector_db = asyncio.run(vector_db.abuild_from_document_chunks(chunks))

Using text-embedding-3-small with 256 dimensions


### Full RAG Pipeline with Metadata 

In [23]:
# Configure RAG for Think and Grow Rich with metadata usage
RAG_PROMPT_TEMPLATE = """ \
You are an expert in making money and becoming rich.
Use the provided context to answer the user's questions.

When you respond, mention the page numbers from which your information comes.
If the information comes from multiple pages, cite all relevant pages.

You may not answer the user's query unless there is specific context in the following text.
If you don't know the answer, please respond with "I don't know".
"""

rag_prompt = SystemRolePrompt(RAG_PROMPT_TEMPLATE)

USER_PROMPT_TEMPLATE = """ \
Context:
{context}

Context Metadata:
{metadata}

User Question:
{user_query}
"""

user_prompt = UserRolePrompt(USER_PROMPT_TEMPLATE)

class RetrievalAugmentedQAPipeline:
    def __init__(self, llm: ChatOpenAI, vector_db_retriever: VectorDatabaseWithMetadata) -> None:
        self.llm = llm
        self.vector_db_retriever = vector_db_retriever

    async def run_pipeline(self, user_query: str) -> dict:
        # Retrieve relevant chunks with their metadata
        context_list = await self.vector_db_retriever.search_by_text(user_query, k=4)

        context_prompt = ""
        metadata_prompt = ""
        
        for i, (text, score, metadata) in enumerate(context_list):
            context_prompt += f"[EXTRACT {i+1}]\n{text}\n\n"
            metadata_prompt += f"[EXTRACT {i+1}]: Page {metadata['page_number']}\n"

        formatted_system_prompt = rag_prompt.create_message()
        formatted_user_prompt = user_prompt.create_message(
            user_query=user_query,
            context=context_prompt,
            metadata=metadata_prompt
        )

        return {
            "response": self.llm.run([formatted_system_prompt, formatted_user_prompt]), 
            "context": context_list
        }


# Initialize the RAG pipeline
chat_openai = ChatOpenAI()
think_grow_rich_rag = RetrievalAugmentedQAPipeline(
    vector_db_retriever=vector_db,
    llm=chat_openai
)

questions = [
    "What are the basic principles for success according to Napoleon Hill?",
    "How to transform our desires into concrete goals?",
    "What is the importance of faith in achieving success?"
]

for question in questions:
    print(f"\n\n=== QUESTION: {question} ===")
    result = asyncio.run(think_grow_rich_rag.run_pipeline(question))
    print("\nRESPONSE:")
    print(result["response"])
    
    # Display the sources (pages) used
    print("\nSOURCES:")
    for i, (_, _, metadata) in enumerate(result["context"]):
        print(f"- Page {metadata['page_number']}")



=== QUESTION: What are the basic principles for success according to Napoleon Hill? ===

RESPONSE:
The basic principles for success according to Napoleon Hill as outlined in the provided context include:

1. **Unwavering Courage** - Based on knowledge of self and one’s occupation (Page 108).
2. **Self-Control** - The ability to control oneself, which sets an example for others (Page 108).
3. **A Keen Sense of Justice** - Fairness and justice are essential for leadership and respect (Page 108).
4. **Definiteness of Decision** - The importance of making firm decisions to lead successfully (Page 108).
5. **Definiteness of Plans** - Successful leaders must have practical, definite plans (Page 108).

Additionally, the steps to accumulate money, while not hard laborious, require strong desire and effective planning (Page 37). The cooperation from others and harmony within a "Master Mind" group are also crucial elements in achieving success (Page 105).

SOURCES:
- Page 9
- Page 108
- Page 3
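
To see what the pipeline actually sends to the model, here is a minimal, dependency-free sketch of the formatting loop in `run_pipeline`. It assumes, as that loop does, that `search_by_text` returns `(text, score, metadata)` tuples whose metadata has a `page_number` key. The sample chunks are made up for illustration, and `format_extracts` / `unique_pages` are hypothetical helper names, not part of `aimakerspace`.

```python
from typing import Any, Dict, List, Tuple

# One retrieved item, shaped like the tuples run_pipeline unpacks: (text, score, metadata)
Chunk = Tuple[str, float, Dict[str, Any]]


def format_extracts(context_list: List[Chunk]) -> Tuple[str, str]:
    """Build the context and metadata strings the same way run_pipeline does."""
    context_prompt = ""
    metadata_prompt = ""
    for i, (text, _score, metadata) in enumerate(context_list, start=1):
        context_prompt += f"[EXTRACT {i}]\n{text}\n\n"
        metadata_prompt += f"[EXTRACT {i}]: Page {metadata['page_number']}\n"
    return context_prompt, metadata_prompt


def unique_pages(context_list: List[Chunk]) -> List[Any]:
    """Return the cited page numbers in retrieval order, each listed once."""
    pages: List[Any] = []
    for _text, _score, metadata in context_list:
        page = metadata["page_number"]
        if page not in pages:
            pages.append(page)
    return pages


# Made-up retrieval results, for illustration only
sample: List[Chunk] = [
    ("first retrieved chunk", 0.82, {"page_number": 108}),
    ("second retrieved chunk", 0.79, {"page_number": 108}),
    ("third retrieved chunk", 0.71, {"page_number": 37}),
]

context, meta = format_extracts(sample)
print(context)
print(meta)
print(unique_pages(sample))  # [108, 37]
```

The display loop in the cell above prints one line per retrieved chunk, so a page that contributes two chunks is listed twice; a helper like `unique_pages` is one way to show each page only once.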

### Full RAG Pipeline with Metadata and Chain-of-Thought
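
The next cell swaps in a chain-of-thought prompt and a more general metadata formatter that keeps only a whitelist of keys (`page_number`, `title`, `source`, `author`). As a minimal sketch, that formatting step can be isolated in a pure function. `format_metadata`, `ALLOWED_KEYS` and the shortened `TEMPLATE` are hypothetical stand-ins, the `chunk_id` key is invented only to show a key being filtered out, and the sketch assumes `UserRolePrompt.create_message` fills placeholders much like `str.format` does.

```python
from typing import Any, Dict

# Keys the CoT pipeline keeps; any other metadata key is dropped
ALLOWED_KEYS = ("page_number", "title", "source", "author")


def format_metadata(excerpt_no: int, metadata: Dict[str, Any]) -> str:
    """Render one metadata line, keeping whitelisted keys in the dict's own order."""
    fields = [f"{key}: {value}" for key, value in metadata.items() if key in ALLOWED_KEYS]
    body = ", ".join(fields) if fields else "No metadata"
    return f"EXCERPT #{excerpt_no}: {body}\n"


# A shortened, hypothetical stand-in for COT_USER_PROMPT_TEMPLATE, filled with str.format
TEMPLATE = "# CONTEXT\n{context}\n# CONTEXT METADATA\n{metadata}\n# QUESTION\n{user_query}\n"

metadata_prompt = format_metadata(1, {"page_number": 108, "chunk_id": 7}) + format_metadata(2, {})
user_message = TEMPLATE.format(
    context="EXCERPT #1 [relevance: 0.8123]:\n...\n",
    metadata=metadata_prompt,
    user_query="What is the importance of faith in achieving success?",
)
print(user_message)
```

Joining a list of fields avoids the trailing-separator slicing (`[:-2]`) and makes the "no metadata" fallback a single expression.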

In [24]:
# Define prompts with CoT
COT_RAG_PROMPT_TEMPLATE = """
You are an AI assistant specializing in synthesis and analysis of precise information.
Your mission is to answer questions by relying ONLY on the provided context.

For each question, you MUST:
1) Think about the exact meaning of the question and identify key information to look for
2) Meticulously analyze the excerpts from the provided context
3) Extract relevant information by citing specific sources in the context
4) Organize your thinking in a structured and logical manner
5) Provide a complete and precise answer that directly addresses the question

IMPORTANT:
- If the necessary information is not in the context, respond with "I don't have enough information in the context to answer this question."
- Never generate information that is not explicitly mentioned in the context.
- Cite sources (excerpt numbers) to support each part of your answer.

Always follow this step-by-step thinking process before providing your final answer.
"""

COT_USER_PROMPT_TEMPLATE = """
# CONTEXT
{context}

# CONTEXT METADATA
{metadata}

# QUESTION
{user_query}

# THINKING PROCESS
To answer this question, I will analyze step by step:

1) Understanding the question:
[Think here about what the question is exactly asking]

2) Analysis of available information:
[Systematically examine each excerpt from the context]

3) Extraction of relevant information:
[Identify and cite specific passages that answer the question]

4) Logical synthesis:
[Organize this information coherently]

5) Complete answer:
[Formulate a precise answer based solely on the context]
"""

# Create prompts
cot_rag_prompt = SystemRolePrompt(COT_RAG_PROMPT_TEMPLATE)
cot_user_prompt = UserRolePrompt(COT_USER_PROMPT_TEMPLATE)

# RetrievalAugmentedQAPipeline class modified to use CoT
class RetrievalAugmentedQAPipelineWithCoT:
    def __init__(self, llm: ChatOpenAI, vector_db_retriever: VectorDatabaseWithMetadata) -> None:
        self.llm = llm
        self.vector_db_retriever = vector_db_retriever

    async def run_pipeline(self, user_query: str) -> dict:
        # Retrieve relevant chunks with their metadata
        context_list = await self.vector_db_retriever.search_by_text(user_query, k=4)
        
        # Format the context and metadata
        context_prompt = ""
        metadata_prompt = ""
        
        for i, (text, score, metadata) in enumerate(context_list):
            context_prompt += f"EXCERPT #{i+1} [relevance: {score:.4f}]:\n{text}\n\n"
            
            # Format metadata (for PDFs: page numbers, titles, etc.), keeping only whitelisted keys
            metadata_fields = [
                f"{key}: {value}"
                for key, value in metadata.items()
                if key in ("page_number", "title", "source", "author")
            ]
            metadata_str = ", ".join(metadata_fields) or "No metadata"
            metadata_prompt += f"EXCERPT #{i+1}: {metadata_str}\n"
        
        # Create messages for the LLM
        formatted_system_prompt = cot_rag_prompt.create_message()
        formatted_user_prompt = cot_user_prompt.create_message(
            user_query=user_query,
            context=context_prompt,
            metadata=metadata_prompt
        )
        
        # Generate the CoT response; temperature=0 makes output more repeatable, though OpenAI does not guarantee fully deterministic results
        response = self.llm.run(
            [formatted_system_prompt, formatted_user_prompt],
            temperature=0
        )
        
        return {
            "response": response,
            "context": context_list
        }

# Initialize the RAG pipeline with CoT
chat_openai = ChatOpenAI()
think_grow_rich_rag_with_cot = RetrievalAugmentedQAPipelineWithCoT(
    vector_db_retriever=vector_db,
    llm=chat_openai
)

# Usage examples
questions = [
    "What are the fundamental principles of success according to Napoleon Hill?",
    "How to transform a desire into a concrete goal?",
    "What is the importance of faith in achieving success?"
]

for question in questions:
    print(f"\n\n=== QUESTION: {question} ===")
    result = asyncio.run(think_grow_rich_rag_with_cot.run_pipeline(question))
    print("\nRESPONSE:")
    print(result["response"])
    
    # Display the sources used
    print("\nSOURCES:")
    for _, _, metadata in result["context"]:
        if "page_number" in metadata:
            print(f"Page {metadata['page_number']}")

            



=== QUESTION: What are the fundamental principles of success according to Napoleon Hill? ===

RESPONSE:
1) Understanding the question:
The question is asking for the fundamental principles of success as outlined by Napoleon Hill in "Think and Grow Rich." 

2) Analysis of available information:
- **Excerpt #1** discusses the importance of a positive attitude, self-belief, and the concept of "auto-suggestion" as a law that influences thoughts and actions towards success.
- **Excerpt #3** lists important factors of leadership, which are essential for success: unwavering courage, self-control, a keen sense of justice, definiteness of decision, and definiteness of plans.
- **Excerpt #4** emphasizes the necessity of desire and belief in achieving wealth, stating that success requires dreaming, planning, and a strong desire for money.

3) Extraction of relevant information:
- From **Excerpt #1**, the principles include developing a positive attitude and self-belief, and using auto-suggestio