# **Vector Stores and Retrievers**

## **What's Covered?**
- Retrievers
- MultiQuery Retriever
- Contextual Compression

## **Retrievers**

Some components expect a **retriever** object rather than a vector store. A vector store can be converted into one by calling its **as_retriever()** method.

A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store. A retriever does not need to be able to store documents, only to return (or retrieve) them. Vector stores can be used as the backbone of a retriever, but there are other types of retrievers as well.

Retrievers accept a string query as input and return a list of `Document` objects as output.
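To make the "string in, list of documents out" contract concrete, here is a minimal sketch of a retriever that has no vector store behind it at all. The `Doc` and `KeywordRetriever` classes are hypothetical stand-ins written for this illustration; they are not LangChain classes, and the word-overlap ranking is only a placeholder for a real relevance measure.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class KeywordRetriever:
    """Toy retriever: ranks documents by how many query words they contain."""

    def __init__(self, docs, k=2):
        self.docs = docs
        self.k = k

    def invoke(self, query):
        query_words = set(query.lower().split())
        scored = []
        for doc in self.docs:
            overlap = len(query_words & set(doc.page_content.lower().split()))
            if overlap > 0:
                scored.append((overlap, doc))
        # Highest overlap first; the sort is stable, so ties keep input order
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[: self.k]]


toy_docs = [
    Doc("Julie and I have a lot in common", {"source": "a"}),
    Doc("Rachel is a waitress", {"source": "b"}),
    Doc("My diamond shoes are too tight", {"source": "c"}),
]
toy_retriever = KeywordRetriever(toy_docs, k=2)
toy_results = toy_retriever.invoke("julie and rachel")
print([doc.metadata["source"] for doc in toy_results])
```

The only thing the caller relies on is `invoke(query)` returning a list of documents, which is why a vector store is just one possible backend.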

We can specify the search type that the vector store uses to query its texts, such as similarity or MMR (Maximum Marginal Relevance).
- **Specify Top k**
```python
retriever = db_connection.as_retriever(search_kwargs={"k": 3})
```
- **Specify Top k and Search Type**
```python
retriever = db_connection.as_retriever(search_type="similarity", search_kwargs={"k": 3})
```
- **Maximum Marginal Relevance Retrieval**
```python
retriever = db_connection.as_retriever(search_type="mmr")
```
- **Similarity Score Threshold Retrieval**  
Apply a cutoff so that any document whose similarity score falls below the threshold is not returned.
```python
retriever = db_connection.as_retriever(
    search_type="similarity_score_threshold", 
    search_kwargs={"k": 3, "score_threshold": 0.5}
)
```
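The search types above differ only in how candidates are ranked and filtered. The following is a rough, pure-Python sketch of the ideas, assuming cosine similarity over small hand-made vectors. The function names, the `lambda_mult` weighting parameter, and the example vectors are all chosen for illustration; this is not how Chroma or LangChain implement these searches internally.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_search(query_vec, doc_vecs, k=3, score_threshold=None):
    """Indices of the top-k documents by similarity, with an optional cutoff."""
    scored = [(cosine(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    if score_threshold is not None:
        # Drop anything that scores below the cutoff before taking the top k
        scored = [(s, i) for s, i in scored if s >= score_threshold]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


def mmr_search(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Greedy MMR: trade relevance to the query against redundancy with picks so far."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


query_vec = [1.0, 0.0]
doc_vecs = [
    [1.0, 0.1],    # 0: very relevant
    [1.0, 0.12],   # 1: near-duplicate of document 0
    [0.6, -0.8],   # 2: less relevant, but different from document 0
    [-1.0, 0.0],   # 3: opposite direction
]

print(similarity_search(query_vec, doc_vecs, k=2))
print(similarity_search(query_vec, doc_vecs, k=3, score_threshold=0.9))
print(mmr_search(query_vec, doc_vecs, k=2))
```

Plain similarity search returns both near-duplicates, while MMR skips the second one in favour of a more diverse result. The threshold variant can return fewer than `k` documents when not enough candidates clear the cutoff.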

In [1]:
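# Read the API key; the context manager closes the file automatically
with open('keys/.openai_api_key.txt') as f:
    # Strip the trailing newline that text files usually end with
    OPENAI_API_KEY = f.read().strip()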

In [2]:
# Step 1 - Initialize an embedding_model
# We are just loading OpenAIEmbeddings
from langchain_openai import OpenAIEmbeddings

embedding_model = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)



In [3]:
# Step 2 - Initialize a ChromaDB Connection
from langchain_chroma import Chroma

# Initialize the database connection
# If the database already exists, this connects to it using collection_name and persist_directory
# Otherwise a new collection is created
db = Chroma(collection_name="vector_database", 
            embedding_function=embedding_model, 
            persist_directory="./chroma_db_")

In [4]:
db

<langchain_chroma.vectorstores.Chroma at 0x1143ac8b0>

In [5]:
# Count the entries already stored in the collection

print(len(db.get()["ids"]))

1004


In [6]:
# Convert the Chroma db connection into a retriever object
retriever = db.as_retriever(search_type="similarity", search_kwargs={"k": 3})

print(type(retriever))

<class 'langchain_core.vectorstores.base.VectorStoreRetriever'>


In [7]:
query = "What is their on Julie vs Rachels List?"

results = retriever.invoke(query)

In [8]:
results

[Document(metadata={'source': 'data/subtitles/Friends_2x08.srt'}, page_content='126\n00:07:28,029 --> 00:07:29,621\nHe\'s gonna stay with Julie.\n\n127\n00:07:29,864 --> 00:07:31,957\nHe\'s gonna stay with her\nand she\'ll be:\n\n128\n00:07:32,233 --> 00:07:34,463\n"Hi, I\'m Julie. Ross picked me.\n\n129\n00:07:34,736 --> 00:07:38,797\nWe\'ll get married and have lots\nof kids and dig up stuff together!"\n\n130\n00:07:40,475 --> 00:07:43,137\nNo offense, but that\nsounds nothing like her.\n\n131\n00:07:46,080 --> 00:07:50,073\nWhat am I gonna do?\nThis is like a complete nightmare!\n\n132\n00:07:50,318 --> 00:07:54,448\nI know. This must be so hard.\n"Oh, no! Two women love me!\n\n133\n00:07:55,790 --> 00:07:59,055\nThey\'re both gorgeous,\nmy wallet\'s too small for my 50s...\n\n134\n00:07:59,260 --> 00:08:01,751\n...and my diamond shoes are too tight!"'),
 Document(metadata={'source': 'data/subtitles/Friends_2x08.srt'}, page_content='247\n00:15:09,082 --> 00:15:10,242\nNo! I\n\n248\n

## **MultiQuery Retriever**

- When the documents in your vector store are large, they may contain phrasing that you are not aware of. This makes it hard to come up with the right query string for similarity search.
- Retrieval may produce different results with subtle changes in query wording or if the embeddings do not capture the semantics of the data well. Prompt engineering / tuning is sometimes done to manually address these problems.
- The **`MultiQueryRetriever`** automates the process of prompt tuning by using an LLM to generate multiple queries from different perspectives for a given user input query. 
- For each query, it retrieves a set of relevant documents and takes the unique union across all queries to get a larger set of potentially relevant documents. By generating multiple perspectives on the same question, the **`MultiQueryRetriever`** might be able to overcome some of the limitations of the distance-based retrieval and get a richer set of results.

**Idea**  
- We will typically ask a question/query
- A ChatModel is going to make a couple of variations of the initial question/query
- These variations are now used to retrieve the documents
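The "unique union" step can be sketched without any LLM or vector store. In the block below, `fake_generate_queries` and `fake_retrieve` are hypothetical stand-ins for the chat model and the base retriever, and documents are plain dictionaries; only the de-duplication logic is the point of the example, and the real `MultiQueryRetriever` may de-duplicate differently.

```python
def multi_query_retrieve(question, generate_queries, retrieve):
    """Run every query variation and keep each distinct document once."""
    seen = set()
    unique_docs = []
    for query in generate_queries(question):
        for doc in retrieve(query):
            # Two hits count as the same document if text and source match
            key = (doc["page_content"], doc["source"])
            if key not in seen:
                seen.add(key)
                unique_docs.append(doc)
    return unique_docs


# Hypothetical stand-in for the LLM that rewrites the question
def fake_generate_queries(question):
    return ["q1", "q2", "q3"]


doc_a = {"page_content": "chunk A", "source": "ep.srt"}
doc_b = {"page_content": "chunk B", "source": "ep.srt"}
doc_c = {"page_content": "chunk C", "source": "ep.srt"}
doc_d = {"page_content": "chunk D", "source": "ep.srt"}

# Hypothetical stand-in for the base retriever: canned results per query
fake_hits = {"q1": [doc_a, doc_b], "q2": [doc_b, doc_c], "q3": [doc_a, doc_d]}


def fake_retrieve(query):
    return fake_hits.get(query, [])


merged = multi_query_retrieve("original question", fake_generate_queries, fake_retrieve)
print([doc["page_content"] for doc in merged])
```

Three queries with two hits each produce six results, but only four distinct documents survive. This is why the number of documents returned can exceed the base retriever's `k` while staying below `k` times the number of queries.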

In [10]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(openai_api_key=OPENAI_API_KEY, temperature=0)

In [13]:
from langchain.retrievers.multi_query import MultiQueryRetriever

mq_retriever = MultiQueryRetriever.from_llm(
    retriever = retriever, 
    llm = chat_model
)

In [14]:
# Logging: Behind the scenes
import logging


logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO) 

In [17]:
question = "What is their on Julie vs Rachels List?"

# This returns the retrieved documents; it does not answer the question itself
unique_docs = mq_retriever.invoke(input=question)

INFO:langchain.retrievers.multi_query:Generated queries: ['1. How does Julie compare to Rachel on the list?', '2. What are the differences between Julie and Rachel on the list?', "3. Can you provide insights into Julie and Rachel's positions on the list?"]


In [18]:
len(unique_docs)

6

In [20]:
print(unique_docs[0].page_content)

149
00:08:53,839 --> 00:08:55,067
I mean....

150
00:08:55,274 --> 00:08:59,802
All right, I guess you can say
she's a little spoiled sometimes.

151
00:09:00,245 --> 00:09:01,940
You could say that.

152
00:09:03,816 --> 00:09:07,775
I guess, sometimes
she's a little ditzy, you know?

153
00:09:08,153 --> 00:09:11,088
And I've seen her be a little
too into her looks.

154
00:09:11,757 --> 00:09:13,816
And Julie and I have
a lot in common...

155
00:09:14,092 --> 00:09:16,959
... because we're both
paleontologists, right?

156
00:09:17,196 --> 00:09:19,027
But Rachel's just a waitress.

157
00:09:19,264 --> 00:09:20,663
Waitress.


In [21]:
unique_docs

[Document(metadata={'source': 'data/subtitles/Friends_2x08.srt'}, page_content="149\n00:08:53,839 --> 00:08:55,067\nI mean....\n\n150\n00:08:55,274 --> 00:08:59,802\nAll right, I guess you can say\nshe's a little spoiled sometimes.\n\n151\n00:09:00,245 --> 00:09:01,940\nYou could say that.\n\n152\n00:09:03,816 --> 00:09:07,775\nI guess, sometimes\nshe's a little ditzy, you know?\n\n153\n00:09:08,153 --> 00:09:11,088\nAnd I've seen her be a little\ntoo into her looks.\n\n154\n00:09:11,757 --> 00:09:13,816\nAnd Julie and I have\na lot in common...\n\n155\n00:09:14,092 --> 00:09:16,959\n... because we're both\npaleontologists, right?\n\n156\n00:09:17,196 --> 00:09:19,027\nBut Rachel's just a waitress.\n\n157\n00:09:19,264 --> 00:09:20,663\nWaitress."),
 Document(metadata={'source': 'data/subtitles/Friends_2x08.srt'}, page_content="152\n00:09:03,816 --> 00:09:07,775\nI guess, sometimes\nshe's a little ditzy, you know?\n\n153\n00:09:08,153 --> 00:09:11,088\nAnd I've seen her be a little\nto

## **Contextual Compression**

We just saw how to leverage LLMs to expand our queries. Now let's explore how to use LLMs to "compress" our outputs.

Above, the retriever returned each stored document in full. Ideally we would pass these documents as context to an LLM so that only the parts relevant to the query (i.e. a compressed version) are kept.

**Important: We are not performing compression in the traditional sense. Instead, we are using an LLM to take a larger document text output and "distill" it into a smaller and more relevant output.**
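As a rough picture of what "distilling" means here, the sketch below replaces the LLM with a simple keyword heuristic: it keeps only the sentences that share a word with the query and drops documents with nothing relevant. The function names and the stopword list are assumptions made for this illustration; an LLM-based extractor judges relevance far more flexibly than word overlap.

```python
import re

# Small illustrative stopword list (an assumption, not a standard list)
STOPWORDS = {"how", "does", "to", "the", "on", "a", "is", "what"}


def extract_relevant(text, query):
    """Keep only the sentences of `text` that share a non-stopword with `query`."""
    query_terms = {
        word for word in re.findall(r"[a-z]+", query.lower()) if word not in STOPWORDS
    }
    # Split after sentence-ending punctuation followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = [
        sentence
        for sentence in sentences
        if query_terms & set(re.findall(r"[a-z]+", sentence.lower()))
    ]
    return " ".join(kept)


def compress_documents(texts, query):
    """Distill each document; documents with no relevant sentence are dropped."""
    compressed = []
    for text in texts:
        extract = extract_relevant(text, query)
        if extract:
            compressed.append(extract)
    return compressed


retrieved_texts = [
    "I mean. And Julie and I have a lot in common. We are both paleontologists.",
    "My diamond shoes are too tight! What am I gonna do?",
    "You could say that. But Rachel's just a waitress.",
]
distilled = compress_documents(retrieved_texts, "How does Julie compare to Rachel?")
print(distilled)
```

Note that the result can contain fewer documents than were retrieved, since a document with no relevant content is removed entirely rather than returned empty.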

In [23]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

# Create an instance of the chain extractor
# It uses the LLM to extract the parts of each document that are relevant to the query
chat_model = ChatOpenAI(openai_api_key=OPENAI_API_KEY)
compressor = LLMChainExtractor.from_llm(chat_model)

# Wrap the base retriever so that its results pass through the compressor
compression_retriever = ContextualCompressionRetriever(
        base_compressor=compressor, base_retriever=retriever
)

In [28]:
question = "How does Julie compare to Rachel on the list?"

# This returns the compressed documents; it does not answer the question itself
compressed_docs = compression_retriever.invoke(input=question)

In [29]:
print(len(compressed_docs))

2


In [30]:
compressed_docs

[Document(metadata={'source': 'data/subtitles/Friends_2x08.srt'}, page_content="But Rachel's just a waitress."),
 Document(metadata={'source': 'data/subtitles/Friends_2x08.srt'}, page_content='And Julie and I have a lot in common')]

In [31]:
# Collect just the text of each compressed document

[docs.page_content for docs in compressed_docs]

["But Rachel's just a waitress.", 'And Julie and I have a lot in common']