# RAG is more than just embedding search

**Table of contents**<a id='toc0_'></a> 

- [Introduction](#toc2_)
    - [What is RAG?](#toc3_)
    - [How do RAG models work?](#toc4_)
    - [Why is there a need for them?](#toc5_)
- [The 'Dumb' RAG Model](#toc6_)
    - [What is it?](#toc7_)
    - [What are the limitations?](#toc8_)
- [Improving the RAG Model](#toc9_)
- [Practical Examples](#toc10_)
    - [Case Study: Metaphor Systems](#toc11_)
    - [Case Study: Personal Assistant](#toc12_)
- [Utilizing Frameworks and Libraries](#toc13_)
- [Conclusion](#toc14_)


## <a id='toc2_'></a> [Introduction](#toc0_)

<a id='toc3_'></a>**What is RAG?**

Retrieval Augmented Generation (RAG) models are the bridge between large language models and external knowledge databases. They fetch the relevant data for a given query. For example, if you have some documents and want to ask questions about their content, a RAG model retrieves the relevant passages from those documents and passes them to the LLM along with the query.

<a id='toc4_'></a>**How do RAG models work?**

The typical RAG process involves embedding a user query and searching a vector database to find the most relevant information to supplement the generated response. This approach works well when the database contains information that closely matches the query, but it offers little beyond that.

![Image](https://jxnl.github.io/instructor/blog/img/dumb_rag.png)
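
The embed-and-search loop can be sketched in a few lines. This is only an illustration: `embed` here is a toy word-count function standing in for a real embedding model, and `search`, `cosine`, and the sample documents are all made up for the example.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Rank documents by similarity to the query and keep the top k."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


documents = [
    "climate change raises global temperatures",
    "marine life includes fish and coral",
    "ocean warming bleaches coral reefs",
]
top = search("climate change effects on marine life", documents, k=2)
print(top)
```

In this toy setup the third document shares no words with the query and ranks last, even though a reader might find it the most relevant. A real embedding model is far better at matching meaning, but the pipeline still returns only whatever sits closest to the query vector.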


<a id='toc5_'></a>**Why is there a need for them?**


Pre-trained large language models do not learn over time. If you ask them about something outside their training data, they will often hallucinate. Therefore, we need to embed our own data and supply it at query time to get better output.

## <a id='toc6_'></a>[The 'Dumb' RAG Model](#toc0_)

<a id='toc7_'></a>**What is it?**

The <u>simplest</u> implementation of RAG embeds a user query and does a <u>straightforward</u> search in a database, like a vector store of Wikipedia articles.

This approach often falls short when dealing with complex queries and diverse data sources.

<a id='toc8_'></a>**What are the limitations?**

This simple RAG model suffers from several limitations:
- **Query-Document Mismatch:** It assumes that the query and document embeddings will align in the vector space, which is often not the case.
    - Query: "Tell me about climate change effects on marine life."
    - Issue: The model might retrieve documents related to general climate change or marine life, missing the specific intersection of both topics.
<!-- blank -->
- **Monolithic Search Backend:** It relies on a single search method and backend, reducing flexibility and the ability to handle multiple data sources.
    - Query: "Latest research in quantum computing."
    - Issue: The model might only search in a general science database, missing out on specialized quantum computing resources.
<!-- blank -->
- **Text Search Limitations:** The model is restricted to simple text queries without the nuances of advanced search features.
    - Query: "What problems did we fix last week?"
    - Issue: A simple text search cannot answer this, because documents containing "problem" or "last week" turn up in every week; the query needs a date filter rather than keyword matching.
<!-- blank -->
- **Limited Planning Ability:** It fails to consider additional contextual information that could refine the search results.
    - Query: "Tips for first-time Europe travelers."
    - Issue: The model might provide general travel advice, ignoring the specific context of first-time travelers or European destinations.
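
To make the "last week" limitation concrete, here's a minimal sketch comparing keyword matching with a date filter. Everything in it is hypothetical: the `Note` class, the sample notes, and the reading of "last week" as the seven days up to today.

```python
import datetime
from dataclasses import dataclass
from typing import List


@dataclass
class Note:
    text: str
    created: datetime.date


notes = [
    Note("Fixed login problem reported last week", datetime.date(2023, 10, 20)),
    Note("Fixed cache invalidation bug", datetime.date(2023, 11, 7)),
    Note("Problem with billing export, fixed last week", datetime.date(2023, 9, 1)),
]


def keyword_search(terms: List[str], notes: List[Note]) -> List[Note]:
    """Return notes whose text contains any of the terms."""
    return [n for n in notes if any(t in n.text.lower() for t in terms)]


def date_search(start: datetime.date, end: datetime.date, notes: List[Note]) -> List[Note]:
    """Return notes created within [start, end], inclusive."""
    return [n for n in notes if start <= n.created <= end]


today = datetime.date(2023, 11, 10)
by_keyword = keyword_search(["problem", "last week"], notes)
# Assumption: "last week" means the seven days up to and including today.
by_date = date_search(today - datetime.timedelta(days=7), today, notes)

print([n.text for n in by_keyword])
print([n.text for n in by_date])
```

The keyword search returns the two old notes that happen to mention "problem" or "last week", while the date filter finds the one note actually written in the past seven days.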


## <a id='toc9_'></a>[Improving the RAG Model](#toc0_)

**What's the solution?**

Enhancing RAG requires a more sophisticated approach known as query understanding.

This process involves analyzing the user's query and transforming it to better match the backend's search capabilities.

By doing so, we can significantly improve both the precision and recall of the search results, providing more accurate and relevant responses.

![Image](https://jxnl.github.io/instructor/blog/img/query_understanding.png)


## <a id='toc10_'></a>[Practical Examples](#toc0_)


In the examples below, we're going to use the [`instructor`](https://github.com/jxnl/instructor) library to simplify the interaction between the programmer and language models via the function-calling API.

<small>Some of the case studies below were inspired by, or done in collaboration with, a few of my clients at new.computer, Metaphor Systems, and Naro. Check them out.</small>


### <a id='toc11_'></a>Case Study: Metaphor Systems

Metaphor Systems presents an advanced implementation of the RAG model. It employs a unique approach to transform natural language queries into their custom search-optimized queries.

If you take a look at their web UI, you'll notice an auto-prompt option. This feature uses function calls to further optimize your query using a language model. The result is a fully specified Metaphor Systems query, optimized for their custom search.

This approach not only enhances the precision of the search results but also improves the overall user experience by providing more accurate and relevant responses.

![Image](https://jxnl.github.io/instructor/blog/img/meta.png)



If we peek under the hood, we can see that the query is actually a complex object, with a date range, and a list of domains to search in.

It's actually more complex than this but this is a good start.

We can model this structured output in Pydantic using the [`instructor`](https://github.com/jxnl/instructor) library.

In [40]:
from pydantic import BaseModel
import datetime
from typing import List

class DateRange(BaseModel):
    """
    A Pydantic model that represents a date range.
    """
    start: datetime.date
    end: datetime.date

class MetaphorQuery(BaseModel):
    """
    A Pydantic model that represents a Metaphor Systems query.
    """
    rewritten_query: str
    published_daterange: DateRange
    domains_allow_list: List[str]

    async def execute(self):
        """
        An asynchronous method that executes the query.
        """
        # Logic to interact with Metaphor Systems' search backend
        ...

In this example, `DateRange` and `MetaphorQuery` are Pydantic models that **restructure** the user's query into a <u>rewritten query</u>, a <u>range of published dates</u>, and a <u>list of domains</u> to search in.

This approach enhances the performance of the search backend without requiring the user to understand its intricacies.
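
Because the model's output is parsed into these classes, constraints on it can be checked at parse time. Here's a minimal sketch, assuming Pydantic v2 (which the `model_dump_json` calls in this post suggest). `ValidatedDateRange` and its `check_order` validator are illustrative additions, not part of Metaphor's actual schema.

```python
import datetime

from pydantic import BaseModel, ValidationError, model_validator


class ValidatedDateRange(BaseModel):
    """Hypothetical variant of DateRange that rejects inverted ranges."""
    start: datetime.date
    end: datetime.date

    @model_validator(mode="after")
    def check_order(self) -> "ValidatedDateRange":
        # Runs after field parsing, so both fields are already dates.
        if self.start > self.end:
            raise ValueError("start must not be later than end")
        return self


# A well-formed range parses into date objects.
ok = ValidatedDateRange.model_validate_json(
    '{"start": "2023-01-01", "end": "2023-11-10"}'
)

# An inverted range is rejected with a ValidationError.
try:
    ValidatedDateRange.model_validate_json(
        '{"start": "2023-11-10", "end": "2023-01-01"}'
    )
    rejected = False
except ValidationError:
    rejected = True

print(ok.start, ok.end, rejected)
```

Keeping the check inside the model means malformed output is caught before it ever reaches the search backend.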


Using the new restructured query, we can apply this pattern to our function calls to obtain results that are optimized for our backend.

In [41]:
import instructor 
from openai import OpenAI
import os

# Fetch the OpenAI API key from the environment. If it is not set, the placeholder 'YOUR API KEY' is used; replace it with your own key.
api__key = os.getenv('OPENAI_API_KEY', 'YOUR API KEY')

# Enables response_model in the openai client
client = instructor.patch(OpenAI(api_key=api__key))

query = client.chat.completions.create(
    model="gpt-4",
    response_model=MetaphorQuery,
    messages=[
        {
            "role": "system", 
            "content": "You're a query understanding system for the Metaphor Systems search engine. Today is Nov 10 2023. Here are some tips: ..."
        },
        {
            "role": "user", 
            "content": "What are some recent developments in AI?"
        }
    ],
)

print(query.model_dump_json(indent=4)) # Printing the Json dump of the model

{
    "rewritten_query": "recent developments in AI",
    "published_daterange": {
        "start": "2023-01-01",
        "end": "2023-11-10"
    },
    "domains_allow_list": [
        "technology",
        "artificial intelligence"
    ]
}


The power of this approach goes beyond just adding date ranges to your search.

It's about creating nuanced, tailored searches that are deeply integrated with the backend.

Metaphor Systems offers a comprehensive suite of filters and options that you can leverage to construct a powerful search query. 

They can even employ chain-of-thought prompting to make better use of these advanced features, making your search more efficient and precise:

In [44]:
from typing import Optional
from pydantic import Field

class DateRange(BaseModel):
    start: datetime.date
    end: datetime.date
    chain_of_thought: Optional[str] = Field( # Chain of thought prompting
        None,
        description="Think step by step to plan what is the best time range to search in"
    )

class MetaphorQuery(BaseModel):
    rewritten_query: str
    published_daterange: DateRange
    domains_allow_list: List[str]

    async def execute(self):
        ...

query = client.chat.completions.create(
    model="gpt-4",
    response_model=MetaphorQuery,
    messages=[
        {
            "role": "system", 
            "content": "You're a query understanding system for the Metaphor Systems search engine. Today is Nov 10 2023. Here are some tips: ..."
        },
        {
            "role": "user", 
            "content": "What are some recent developments in AI?"
        }
    ],
)

print(query.model_dump_json(indent=4)) # Printing the Json dump of the model

{
    "rewritten_query": "recent developments in AI",
    "published_daterange": {
        "start": "2022-11-10",
        "end": "2023-11-10",
        "chain_of_thought": "The user is asking for recent developments. Given today's date, the relevant timeframe for recent should be 1 year."
    },
    "domains_allow_list": [
        "Artificial Intelligence"
    ]
}


### <a id='toc12_'></a>Case Study: Personal Assistant

A personal assistant application needs to interpret vague queries and fetch information from multiple backends, such as emails and calendars. By modeling the assistant's capabilities using Pydantic, we can dispatch the query to the correct backend and retrieve a unified response.

For instance, when you ask, "What's on my schedule today?", the application needs to fetch data from various sources like events, emails, and reminders. This data is stored across different backends, but the goal is to provide a consolidated summary of results.

It's important to note that the data from these sources may not be embedded in a search backend. Instead, it could be accessed through different clients, such as a calendar or email client, spanning both personal and professional accounts.


In [26]:
import enum
from typing import List
from pydantic import BaseModel
import datetime
import asyncio

# The ClientSource Enum represents the different sources from which data can be fetched.
class ClientSource(enum.Enum):
    GMAIL = "gmail"
    CALENDAR = "calendar"
 
# A Pydantic model that represents a search client.
# It contains the query, keywords, email, source, and date range for the search.
class SearchClient(BaseModel):
    query: str
    keywords: List[str]
    email: str
    source: ClientSource
    start_date: datetime.date
    end_date: datetime.date
 
    # An asynchronous method that fetches data from the specified source.
    async def execute(self) -> str:
        if self.source == ClientSource.GMAIL:
            # Fetch data from Gmail
            ...
        elif self.source == ClientSource.CALENDAR:
            # Fetch data from Calendar
            ...
 
# A Pydantic model that represents a retrieval operation.
# It contains a list of queries to be executed.
class Retrival(BaseModel):
    queries: List[SearchClient]

    # An asynchronous method that executes all the queries concurrently
    # and returns their results as a list, in the same order as the queries.
    async def execute(self) -> List[str]:
        return await asyncio.gather(*[query.execute() for query in self.queries])


Now, we can utilize this with a straightforward query such as "What do I have today?".

The system will attempt to asynchronously dispatch the query to the appropriate backend.

Keep in mind, though, that prompting the language model effectively remains a key part of making this work.


In [33]:
retrival = client.chat.completions.create(
    model="gpt-4",
    response_model=Retrival,
    messages=[
        {"role": "system", "content": "You are Jason's personal assistant. Today is November 10 2023."},
        {"role": "user", "content": "What do I have today?"}
    ],
)
print(retrival.model_dump_json(indent=4))

{
    "queries": [
        {
            "query": "Jason's schedule",
            "keywords": [
                "meeting",
                "event",
                "appointment"
            ],
            "email": "jason@example.com",
            "source": "calendar",
            "start_date": "2023-11-10",
            "end_date": "2023-11-10"
        }
    ]
}


To make it more challenging, we will ask for several things at once, which should yield a list of queries routed to different search backends, such as email and calendar. Not only do we dispatch to different backends, over which we have no control, but we are also likely to render their results to the user in different ways.

In [34]:
retrival = client.chat.completions.create(
    model="gpt-4",
    response_model=Retrival,
    messages=[
        {"role": "system", "content": "You are Jason's personal assistant. Today is November 10 2023."},
        {"role": "user", "content": "What meetings do I have today and are there any important emails I should be aware of?"}
    ],
)
print(retrival.model_dump_json(indent=4))

{
    "queries": [
        {
            "query": "Jason's meetings",
            "keywords": [
                "meeting",
                "appointment"
            ],
            "email": "Jason's email",
            "source": "calendar",
            "start_date": "2023-11-10",
            "end_date": "2023-11-10"
        },
        {
            "query": "Jason's emails",
            "keywords": [
                "important",
                "urgent",
                "ASAP"
            ],
            "email": "Jason's email",
            "source": "gmail",
            "start_date": "2023-11-10",
            "end_date": "2023-11-10"
        }
    ]
}
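
To see how a list of queries like this might be dispatched, here's a minimal sketch with stubbed backends. It uses only the standard library. `StubQuery`, `Source`, and `run_all` are hypothetical stand-ins for `SearchClient`, `ClientSource`, and the `execute` method above, and no real Gmail or Calendar API is called.

```python
import asyncio
import enum
from dataclasses import dataclass
from typing import List


class Source(enum.Enum):
    GMAIL = "gmail"
    CALENDAR = "calendar"


@dataclass
class StubQuery:
    """Hypothetical stand-in for SearchClient; no real API is called."""
    query: str
    source: Source

    async def execute(self) -> str:
        await asyncio.sleep(0.01)  # simulates network latency
        # Each source renders its (fake) result in its own way.
        if self.source == Source.GMAIL:
            return f"[gmail] {self.query}"
        return f"[calendar] {self.query}"


async def run_all(queries: List[StubQuery]) -> List[str]:
    # gather() runs the coroutines concurrently and returns the results
    # in the same order as the inputs.
    return await asyncio.gather(*(q.execute() for q in queries))


results = asyncio.run(run_all([
    StubQuery("Jason's meetings", Source.CALENDAR),
    StubQuery("Jason's emails", Source.GMAIL),
]))
print(results)
```

Because `gather` preserves input order, each result can be paired back with the query that produced it, which is handy when each source needs to be rendered differently.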


## <a id='toc13_'></a>[Utilizing Frameworks and Libraries](#toc0_)

These two examples show how both search providers and consumers can use [`instructor`](https://github.com/jxnl/instructor) to model their systems. This is a powerful pattern: it lets you build an LLM layer, from scratch, in front of any arbitrary backend.

We can extend our RAG model's functionality by integrating various frameworks and libraries. This could include user interaction prompts, external data fetches, and more, all within the dispatch process. This modularity allows developers to build custom solutions that cater to their specific needs.

## <a id='toc14_'></a>[Conclusion](#toc0_)

This tutorial has introduced you to advanced techniques to improve RAG models by incorporating query understanding and structuring queries using Pydantic and instructor. The aim is to move beyond the limitations of the 'Dumb' RAG model and toward a more intelligent system capable of handling complex queries with context-aware responses.