# **Optimizing RAGs**
1. Optimizing Indexing - 
2. Optimizing Querying - MultiQuery Retriever
3. Optimizing Post Retrieval Process - Contextual Compression

In [4]:
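# Read the API key from a local file; the context manager closes the file,
# and strip() removes any trailing newline
with open('keys/.openai_api_key.txt') as f:
    OPENAI_API_KEY = f.read().strip()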

In [5]:
# Step 1 - Initialize an embedding_model
# We are just loading OpenAIEmbeddings
from langchain_openai import OpenAIEmbeddings

embedding_model = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

In [6]:
# Step 2 - Initialize a ChromaDB Connection
from langchain_chroma import Chroma

# Initialize the database connection
# If the database exists, this connects to the given collection_name in persist_directory
# Otherwise a new collection is created
db = Chroma(collection_name="vector_database", 
            embedding_function=embedding_model, 
            persist_directory="./chroma_db_")

# Count the entries already stored in the collection
print(len(db.get()["ids"]))

1004


In [7]:
# Convert the Chroma connection into a retriever that returns the 3 most similar chunks
retriever = db.as_retriever(search_type="similarity", search_kwargs={"k": 3})

print(type(retriever))

<class 'langchain_core.vectorstores.base.VectorStoreRetriever'>
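
The retriever configured above returns the `k` stored chunks closest to the query embedding. As a rough illustration of that idea, here is a minimal pure-Python sketch of top-k retrieval by cosine similarity. The vectors, the ids, and the helper names `cosine_similarity` and `top_k` are made up for this example, and the metric Chroma actually uses depends on how the collection is configured.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: List[float], doc_vecs: Dict[str, List[float]], k: int = 3) -> List[str]:
    """Return the ids of the k vectors most similar to the query vector."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


# Hypothetical 2-dimensional "embeddings"; real embeddings have many more dimensions.
doc_vecs = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [0.5, 0.5],
}
top_ids = top_k([1.0, 0.0], doc_vecs, k=3)
print(top_ids)
```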


## **MultiQuery Retriever**

- A large vector store can contain phrasing that you are not aware of, which makes it hard to choose the right query string for similarity search.
- Retrieval results can change with subtle changes in query wording, or when the embeddings do not capture the semantics of the data well. Prompt engineering or tuning is sometimes used to address these problems manually.
- The **`MultiQueryRetriever`** automates this tuning by using an LLM to generate multiple queries, each from a different perspective, for a given user query.
- For each generated query, it retrieves a set of relevant documents, then takes the unique union across all queries to get a larger set of potentially relevant documents. By generating multiple perspectives on the same question, the **`MultiQueryRetriever`** may overcome some limitations of distance-based retrieval and return a richer set of results.

**Idea**  
- We ask a question/query
- A chat model generates several variations of the initial question/query
- Each variation is used to retrieve documents, and the results are merged without duplicates
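
The "unique union" step can be sketched without any LLM or vector store. The example below is only an illustration under stated assumptions: the toy corpus, the word-overlap `keyword_retriever`, and the hand-written query variations are hypothetical stand-ins for the Chroma retriever and the LLM-generated queries, and `multi_query_retrieve` is not part of LangChain.

```python
from typing import Callable, Dict, List

# Hypothetical toy corpus: document id -> text.
CORPUS: Dict[str, str] = {
    "d1": "Julie and I have a lot in common",
    "d2": "Rachel is just a waitress",
    "d3": "We are both paleontologists",
    "d4": "The list compares Julie and Rachel",
}


def keyword_retriever(query: str, k: int = 2) -> List[str]:
    """Stand-in retriever: rank documents by the number of words shared with the query."""
    query_words = set(query.lower().split())
    scored = [
        (len(query_words & set(text.lower().split())), doc_id)
        for doc_id, text in CORPUS.items()
    ]
    # Highest overlap first; ties are broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored[:k] if score > 0]


def multi_query_retrieve(queries: List[str], retrieve: Callable[[str], List[str]]) -> List[str]:
    """Run every query variation and keep each document once, in first-seen order."""
    seen = set()
    unique: List[str] = []
    for query in queries:
        for doc_id in retrieve(query):
            if doc_id not in seen:
                seen.add(doc_id)
                unique.append(doc_id)
    return unique


# Hand-written variations standing in for the queries an LLM would generate.
variations = [
    "How does Julie compare to Rachel",
    "What is on the list",
    "Are they both paleontologists",
]
merged = multi_query_retrieve(variations, keyword_retriever)
print(merged)
```

Because duplicates are dropped, the merged result can be smaller than the number of queries multiplied by `k`.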

In [8]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(openai_api_key=OPENAI_API_KEY, temperature=0)

In [9]:
from langchain.retrievers.multi_query import MultiQueryRetriever

mq_retriever = MultiQueryRetriever.from_llm(
    retriever = retriever, 
    llm = chat_model
)

In [10]:
# Enable INFO logging to see the queries generated behind the scenes
import logging


logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO) 

In [11]:
question = "What is their on Julie vs Rachels List?"

# This only retrieves documents; it does not generate an answer to the question
unique_docs = mq_retriever.invoke(input=question)

INFO:langchain.retrievers.multi_query:Generated queries: ['1. How does Julie compare to Rachel on the list?', '2. What are the differences between Julie and Rachel on the list?', "3. Can you provide insights on Julie and Rachel's positions on the list?"]


In [12]:
len(unique_docs)

6

In [13]:
unique_docs

[Document(metadata={'source': 'data/subtitles/Friends_2x08.srt'}, page_content="149\n00:08:53,839 --> 00:08:55,067\nI mean....\n\n150\n00:08:55,274 --> 00:08:59,802\nAll right, I guess you can say\nshe's a little spoiled sometimes.\n\n151\n00:09:00,245 --> 00:09:01,940\nYou could say that.\n\n152\n00:09:03,816 --> 00:09:07,775\nI guess, sometimes\nshe's a little ditzy, you know?\n\n153\n00:09:08,153 --> 00:09:11,088\nAnd I've seen her be a little\ntoo into her looks.\n\n154\n00:09:11,757 --> 00:09:13,816\nAnd Julie and I have\na lot in common...\n\n155\n00:09:14,092 --> 00:09:16,959\n... because we're both\npaleontologists, right?\n\n156\n00:09:17,196 --> 00:09:19,027\nBut Rachel's just a waitress.\n\n157\n00:09:19,264 --> 00:09:20,663\nWaitress."),
 Document(metadata={'source': 'data/subtitles/Friends_2x08.srt'}, page_content="152\n00:09:03,816 --> 00:09:07,775\nI guess, sometimes\nshe's a little ditzy, you know?\n\n153\n00:09:08,153 --> 00:09:11,088\nAnd I've seen her be a little\nto

## **Contextual Compression**

We just saw how to use an LLM to expand our queries. Now let's explore how to use an LLM to "compress" the retrieved output.

Above, the retriever returned each stored chunk in full. Ideally, we would pass each chunk to an LLM together with the query and keep only the parts that are relevant to that query (i.e. a compressed result).

**Important: We are not performing compression in the traditional sense. Instead, we use an LLM to take a larger document text and "distill" it into a smaller, more relevant output.**
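
To make the idea concrete, here is a minimal sketch of the compression step with no LLM involved. It assumes a simple keyword matcher in place of the LLM call; `extract_relevant`, `compress_documents`, and the sample texts are hypothetical names and data for illustration, not LangChain APIs.

```python
import string
from typing import Callable, List, Optional


def extract_relevant(text: str, query: str) -> Optional[str]:
    """Stand-in for the LLM: keep only the lines that mention a longer query term."""
    terms = {word.strip(string.punctuation).lower() for word in query.split()}
    terms = {term for term in terms if len(term) > 4}  # ignore short filler words
    kept = [
        line for line in text.splitlines()
        if any(term in line.lower() for term in terms)
    ]
    return "\n".join(kept) if kept else None


def compress_documents(
    docs: List[str],
    query: str,
    extractor: Callable[[str, str], Optional[str]],
) -> List[str]:
    """Shrink each document to its relevant part; drop documents with nothing relevant."""
    compressed = []
    for doc in docs:
        extracted = extractor(doc, query)
        if extracted:
            compressed.append(extracted)
    return compressed


# Hypothetical retrieved chunks.
docs = [
    "I mean....\nAnd Julie and I have a lot in common\nBut Rachel's just a waitress.",
    "Could I get a coffee?\nSure, one second.",
]
query = "How does Julie compare to Rachel on the list?"
compressed = compress_documents(docs, query, extract_relevant)
print(compressed)
```

Two things happen here: each surviving document gets shorter, and documents with nothing relevant are dropped entirely, so the compressed result can contain fewer documents than the base retriever returned.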

In [14]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

# Create an instance of the chain extractor
# It uses the LLM to extract only the passages relevant to the query from each retrieved document
chat_model = ChatOpenAI(openai_api_key=OPENAI_API_KEY)
compressor = LLMChainExtractor.from_llm(chat_model)

# Wrap the base retriever so that its results pass through the compressor
compression_retriever = ContextualCompressionRetriever(
        base_compressor=compressor, base_retriever=retriever
)

In [15]:
question = "How does Julie compare to Rachel on the list?"

# This only retrieves and compresses documents; it does not generate an answer to the question
compressed_docs = compression_retriever.invoke(input=question)

In [16]:
print(len(compressed_docs))

2


In [17]:
compressed_docs

[Document(metadata={'source': 'data/subtitles/Friends_2x08.srt'}, page_content="But Rachel's just a waitress."),
 Document(metadata={'source': 'data/subtitles/Friends_2x08.srt'}, page_content='And Julie and I have\na lot in common')]

In [18]:
# Collect the extracted text from each compressed document

[doc.page_content for doc in compressed_docs]

["But Rachel's just a waitress.", 'And Julie and I have\na lot in common']