# **Vector Stores and Retrievers**

## **What's Covered?**
- Retrievers
- MultiQuery Retriever
- Contextual Compression
- Indexing


## **Retrievers**

Sometimes we need to pass a vector store as a **retriever** object, which is easily done by calling its **`as_retriever()`** method.

A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store. A retriever does not need to be able to store documents, only to return (or retrieve) them. Vector stores can be used as the backbone of a retriever, but there are other types of retrievers as well.

Retrievers accept a string query as input and return a list of `Document` objects as output.
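
To make the "string in, list of documents out" contract concrete, here is a minimal sketch of a retriever that is not backed by a vector store at all. The `Document` dataclass and `KeywordRetriever` class are hypothetical stand-ins written for this illustration, not LangChain's own classes; only the `invoke` method name mirrors the call used later in this notebook.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class KeywordRetriever:
    """Toy retriever: ranks documents by how many query words they contain."""

    def __init__(self, documents, k=2):
        self.documents = documents
        self.k = k

    def invoke(self, query: str) -> list:
        query_words = set(query.lower().split())
        scored = []
        for doc in self.documents:
            overlap = len(query_words & set(doc.page_content.lower().split()))
            if overlap > 0:
                scored.append((overlap, doc))
        # Highest overlap first; the sort is stable, so ties keep input order
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[: self.k]]


docs = [
    Document("Ross picked Julie", {"source": "a.srt"}),
    Document("Rachel found the list", {"source": "b.srt"}),
    Document("Joey ate the sandwich", {"source": "c.srt"}),
]
retriever = KeywordRetriever(docs, k=2)
results = retriever.invoke("the list")
print([doc.metadata["source"] for doc in results])
```

The retriever only needs to return documents for a query; how it finds them (keywords here, embeddings in a vector store) is an implementation detail.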

We can specify the search type that the vector store should use to query its texts, such as similarity or MMR (maximum marginal relevance).
- **Specify Top k**
```python
retriever = db_connection.as_retriever(search_kwargs={"k": 3})
```
- **Specify Top k and Search Type**
```python
retriever = db_connection.as_retriever(search_type="similarity", search_kwargs={"k": 3})
```
- **Maximum Marginal Relevance Retrieval**
```python
retriever = db_connection.as_retriever(search_type="mmr")
```
- **Similarity Score Threshold Retrieval**
```python
retriever = db_connection.as_retriever(
    search_type="similarity_score_threshold", 
    search_kwargs={"score_threshold": 0.5}
)
```
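
The three search types differ only in how candidates are selected once similarities are known. The sketch below illustrates the ideas in plain Python on toy 2-D vectors. It is an assumption-laden illustration, not the code a vector store actually runs: the function names, the use of raw cosine similarity as the score, and the `lambda_mult` trade-off parameter are choices made for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k):
    """Plain similarity search: indices of the k vectors closest to the query."""
    ranked = sorted(range(len(vectors)),
                    key=lambda i: cosine(query, vectors[i]), reverse=True)
    return ranked[:k]


def score_threshold(query, vectors, threshold):
    """Keep every vector whose similarity to the query is at least `threshold`."""
    return [i for i, v in enumerate(vectors) if cosine(query, v) >= threshold]


def mmr(query, vectors, k, lambda_mult=0.5):
    """Greedy MMR: balance relevance to the query against redundancy."""
    selected = []
    candidates = list(range(len(vectors)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            relevance = cosine(query, vectors[i])
            # Similarity to the closest already-selected vector (0 if none yet)
            redundancy = max((cosine(vectors[i], vectors[j]) for j in selected),
                             default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
vectors = [
    [1.0, 0.1],    # 0: very close to the query
    [1.0, 0.12],   # 1: near-duplicate of 0
    [0.6, -0.8],   # 2: less similar, but different from 0 and 1
    [-1.0, 0.0],   # 3: opposite direction
]

print("top-k:    ", top_k(query, vectors, k=2))
print("threshold:", score_threshold(query, vectors, threshold=0.5))
print("mmr:      ", mmr(query, vectors, k=2))
```

With these toy vectors, plain top-2 search returns the two near-duplicates, while MMR keeps the best match and replaces the near-duplicate with the more distinct vector.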

In [1]:
# Read the API key; the context manager closes the file afterwards
with open('keys/.openai_api_key.txt') as f:
    OPENAI_API_KEY = f.read().strip()

In [2]:
# Step 1 - Initialize an embedding_model
# We are just loading OpenAIEmbeddings
from langchain_openai import OpenAIEmbeddings

embedding_model = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)



In [3]:
# Step 2 - Initialize a ChromaDB Connection
from langchain_chroma import Chroma

# Initialize the database connection
# If the database already exists, this connects to it using collection_name and persist_directory
# Otherwise, a new collection is created
db = Chroma(collection_name="vector_database", 
            embedding_function=embedding_model, 
            persist_directory="./chroma_db_")

In [4]:
db

<langchain_chroma.vectorstores.Chroma at 0x1165ee8b0>

In [9]:
# Check how many records the collection already holds

print(len(db.get()["ids"]))

1004


In [10]:
# Convert the Chroma connection to a retriever object
retriever = db.as_retriever(search_type="similarity", search_kwargs={"k": 3})

print(type(retriever))

<class 'langchain_core.vectorstores.base.VectorStoreRetriever'>


In [11]:
query = "What is their on Julie vs Rachels List?"

results = retriever.invoke(query)

In [12]:
results

[Document(metadata={'source': 'data/subtitles/Friends_2x08.srt'}, page_content='126\n00:07:28,029 --> 00:07:29,621\nHe\'s gonna stay with Julie.\n\n127\n00:07:29,864 --> 00:07:31,957\nHe\'s gonna stay with her\nand she\'ll be:\n\n128\n00:07:32,233 --> 00:07:34,463\n"Hi, I\'m Julie. Ross picked me.\n\n129\n00:07:34,736 --> 00:07:38,797\nWe\'ll get married and have lots\nof kids and dig up stuff together!"\n\n130\n00:07:40,475 --> 00:07:43,137\nNo offense, but that\nsounds nothing like her.\n\n131\n00:07:46,080 --> 00:07:50,073\nWhat am I gonna do?\nThis is like a complete nightmare!\n\n132\n00:07:50,318 --> 00:07:54,448\nI know. This must be so hard.\n"Oh, no! Two women love me!\n\n133\n00:07:55,790 --> 00:07:59,055\nThey\'re both gorgeous,\nmy wallet\'s too small for my 50s...\n\n134\n00:07:59,260 --> 00:08:01,751\n...and my diamond shoes are too tight!"'),
 Document(metadata={'source': 'data/subtitles/Friends_2x08.srt'}, page_content='247\n00:15:09,082 --> 00:15:10,242\nNo! I\n\n248\n

## **MultiQuery Retriever**

- Large document collections in your vector store may contain phrasing that you are not aware of. This makes it hard to come up with the right query string for similarity search.
- Retrieval may produce different results with subtle changes in query wording or if the embeddings do not capture the semantics of the data well. Prompt engineering / tuning is sometimes done to manually address these problems.
- The **`MultiQueryRetriever`** automates the process of prompt tuning by using an LLM to generate multiple queries from different perspectives for a given user input query. 
- For each query, it retrieves a set of relevant documents and takes the unique union across all queries to get a larger set of potentially relevant documents. By generating multiple perspectives on the same question, the **`MultiQueryRetriever`** might be able to overcome some of the limitations of the distance-based retrieval and get a richer set of results.

**Idea**  
- We will typically ask a question/query
- A ChatModel is going to make a couple of variations of the initial question/query
- These variations are now used to retrieve the documents
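
The idea above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: `fake_generate_queries` stands in for the chat model, `fake_retrieve` stands in for the vector store retriever, documents are plain dicts, and all function names are hypothetical.

```python
def unique_union(doc_lists):
    """Flatten several result lists, keeping the first copy of each document."""
    seen = set()
    unique = []
    for docs in doc_lists:
        for doc in docs:
            key = (doc["page_content"], tuple(sorted(doc["metadata"].items())))
            if key not in seen:
                seen.add(key)
                unique.append(doc)
    return unique


def multi_query_retrieve(question, generate_queries, retrieve):
    """Rephrase the question, retrieve for each variation, merge the results."""
    queries = generate_queries(question)
    return unique_union(retrieve(q) for q in queries)


def fake_generate_queries(question):
    # Canned variations standing in for what a chat model might generate
    return ["What is known about Rachel?", "Who is the person named Rachel?"]


doc_a = {"page_content": "Everybody, this is Rachel", "metadata": {"source": "1x01"}}
doc_b = {"page_content": "Rachel, what are you up to tonight?", "metadata": {"source": "1x01"}}
doc_c = {"page_content": "Rachel found the list", "metadata": {"source": "2x08"}}

# Canned retrieval results per query; note that doc_b appears twice
canned_results = {
    "What is known about Rachel?": [doc_a, doc_b],
    "Who is the person named Rachel?": [doc_b, doc_c],
}


def fake_retrieve(query):
    return canned_results.get(query, [])


unique_docs = multi_query_retrieve("Who is Rachel?", fake_generate_queries, fake_retrieve)
print(len(unique_docs))
```

Because the union is deduplicated, the number of returned documents can be anywhere between the size of one result list and the sum of all of them.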

In [19]:
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_openai import ChatOpenAI

embedding_model = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY) 

db_connection = Chroma(
    persist_directory="./chroma_db_", 
    embedding_function=embedding_model
)

chat_model = ChatOpenAI(openai_api_key=OPENAI_API_KEY, temperature=0)

In [20]:
from langchain.retrievers.multi_query import MultiQueryRetriever

retriever_from_llm = MultiQueryRetriever.from_llm(
    retriever=db_connection.as_retriever(),
    llm=chat_model
)

In [21]:
# Logging: Behind the scenes
import logging


logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO) 

In [22]:
question = "Who is Rachel?"

# This returns the retrieved documents; it does not answer the question itself
unique_docs = retriever_from_llm.invoke(question)

INFO:langchain.retrievers.multi_query:Generated queries: ['1. What is known about Rachel?', '2. Can you provide information on Rachel?', '3. Who is the person named Rachel?']


In [23]:
len(unique_docs)

7

In [24]:
print(unique_docs[0].page_content)

49
00:03:37,400 --> 00:03:40,808
Everybody, this is Rachel,
another Lincoln High survivor.

50
00:03:41,703 --> 00:03:44,089
This is everybody.
Chandler and Phoebe..

51
00:03:44,841 --> 00:03:46,509
...and Joey. And remember
my brother, Ross?

52
00:03:47,211 --> 00:03:47,574
Sure!

53
00:03:56,832 --> 00:03:59,845
You want to tell us now, or are we
waiting for four wet bridesmaids?

54
00:04:02,277 --> 00:04:05,575
Oh, God! Well, it started about
a half-hour before the wedding.

55
00:04:07,196 --> 00:04:09,015
I was in this room with
all the presents...

56
00:04:10,046 --> 00:04:14,346
This really gorgeous
Limoges gravy boat.

57
00:04:14,976 --> 00:04:15,572
When all of a sudden I realize...


In [25]:
unique_docs

[Document(page_content='49\n00:03:37,400 --> 00:03:40,808\nEverybody, this is Rachel,\nanother Lincoln High survivor.\n\n50\n00:03:41,703 --> 00:03:44,089\nThis is everybody.\nChandler and Phoebe..\n\n51\n00:03:44,841 --> 00:03:46,509\n...and Joey. And remember\nmy brother, Ross?\n\n52\n00:03:47,211 --> 00:03:47,574\nSure!\n\n53\n00:03:56,832 --> 00:03:59,845\nYou want to tell us now, or are we\nwaiting for four wet bridesmaids?\n\n54\n00:04:02,277 --> 00:04:05,575\nOh, God! Well, it started about\na half-hour before the wedding.\n\n55\n00:04:07,196 --> 00:04:09,015\nI was in this room with\nall the presents...\n\n56\n00:04:10,046 --> 00:04:14,346\nThis really gorgeous\nLimoges gravy boat.\n\n57\n00:04:14,976 --> 00:04:15,572\nWhen all of a sudden I realize...', metadata={'source': 'data/subtitles_data/Friends - 1x01 - The One Where Monica Gets A Roommate.720p HDTV.TvR.en.srt'}),
 Document(page_content="121\n00:07:59,047 --> 00:08:02,091\n. . .everybody.\nEverybody, this is Paul.\n\n12

## **Contextual Compression**

We just saw how to leverage LLMs to expand our queries. Now let's explore how to use LLMs to "compress" our outputs.

Above, the retriever returned each stored chunk in its entirety. Ideally, we would pass each chunk as context to an LLM to get a more relevant (i.e. compressed) output.

**Important: We are not performing compression in the traditional sense, instead we are using an LLM to grab a larger document text output and "distill" it to a smaller and more relevant output.**
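
The flow can be sketched without an LLM. In the hypothetical example below, `extract_relevant` is a simple keyword filter standing in for the LLM extraction step, and documents are plain dicts; the names and the stopword list are assumptions made for illustration only.

```python
import re

# Tiny stopword list so that words like "who" and "is" do not count as matches
STOPWORDS = {"who", "is", "what", "the", "a", "an"}


def extract_relevant(text, query):
    """Stand-in for the LLM: keep only lines that share a keyword with the query."""
    keywords = set(re.findall(r"\w+", query.lower())) - STOPWORDS
    kept = [
        line for line in text.splitlines()
        if keywords & set(re.findall(r"\w+", line.lower()))
    ]
    return "\n".join(kept)


def compress_documents(documents, query):
    """Shrink each document to its relevant part; drop documents with none."""
    compressed = []
    for doc in documents:
        extract = extract_relevant(doc["page_content"], query)
        if extract:
            compressed.append({"page_content": extract, "metadata": doc["metadata"]})
    return compressed


documents = [
    {
        "page_content": "Everybody, this is Rachel,\nanother Lincoln High survivor.\nThis is everybody.",
        "metadata": {"source": "1x01"},
    },
    {
        "page_content": "I was in this room with\nall the presents...",
        "metadata": {"source": "1x01"},
    },
]

compressed_docs = compress_documents(documents, "Who is Rachel?")
print([doc["page_content"] for doc in compressed_docs])
```

Two things happen in this sketch: retrieved text is trimmed to the relevant lines, and documents with nothing relevant are filtered out, so the result can contain fewer documents than were retrieved.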

In [31]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

# Create an instance of the chain extractor
# It uses the LLM to extract only the query-relevant parts of each document
chat_model = ChatOpenAI(openai_api_key=OPENAI_API_KEY)
compressor = LLMChainExtractor.from_llm(chat_model)

# Wrap the base retriever so that its results are compressed before being returned
compression_retriever = ContextualCompressionRetriever(
        base_compressor=compressor, base_retriever=db_connection.as_retriever()
)

In [32]:
question = "Who is Rachel?"

# This returns the compressed documents; it does not answer the question itself
compressed_docs = compression_retriever.invoke(question)



In [33]:
print(len(compressed_docs))

3


In [34]:
compressed_docs

[Document(page_content='Rachel, another Lincoln High survivor.', metadata={'source': 'data/subtitles_data/Friends - 1x01 - The One Where Monica Gets A Roommate.720p HDTV.TvR.en.srt'}),
 Document(page_content='Rachel', metadata={'source': 'data/subtitles_data/Friends - 1x01 - The One Where Monica Gets A Roommate.720p HDTV.TvR.en.srt'}),
 Document(page_content='Rachel, what are you\nup to tonight?', metadata={'source': 'data/subtitles_data/Friends - 1x01 - The One Where Monica Gets A Roommate.720p HDTV.TvR.en.srt'})]

In [35]:
# Show only the extracted text of each compressed document

[docs.page_content for docs in compressed_docs]

['Rachel, another Lincoln High survivor.',
 'Rachel',
 'Rachel, what are you\nup to tonight?']

## **Indexing (coming soon...)**

The indexing API lets you load documents from any source into a vector store and keep them in sync. Specifically, it helps:

- Avoid writing duplicated content into the vector store
- Avoid re-writing unchanged content
- Avoid re-computing embeddings over unchanged content

All of which should save you time and money, as well as improve your vector search results.

### **How does it work?**
LangChain indexing makes use of a record manager (`RecordManager`) that keeps track of document writes into the vector store.
When indexing content, hashes are computed for each document, and the following information is stored in the record manager:
- the document hash (hash of both page content and metadata)
- write time
- the source id – each document should include information in its metadata to allow us to determine the ultimate source of this document
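
The bookkeeping described above can be sketched with the standard library. This is a simplified illustration, not LangChain's implementation: `InMemoryRecordManager`, `document_hash`, and `index_documents` are hypothetical names, the SHA-256 hash is an assumption, and a plain list stands in for the vector store.

```python
import hashlib
import json
import time


def document_hash(doc):
    """Hash both page content and metadata, so a change in either is detected."""
    payload = json.dumps(
        {"page_content": doc["page_content"], "metadata": doc["metadata"]},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class InMemoryRecordManager:
    """Hypothetical record manager: remembers which hashes were written."""

    def __init__(self):
        self.records = {}  # hash -> {"written_at": ..., "source_id": ...}

    def exists(self, doc_hash):
        return doc_hash in self.records

    def record(self, doc_hash, source_id):
        self.records[doc_hash] = {"written_at": time.time(), "source_id": source_id}


def index_documents(docs, record_manager, vector_store):
    """Write only unseen documents; return how many were added and skipped."""
    added = skipped = 0
    for doc in docs:
        doc_hash = document_hash(doc)
        if record_manager.exists(doc_hash):
            skipped += 1  # unchanged content: no re-write, no re-embedding
            continue
        vector_store.append(doc)  # a real store would compute an embedding here
        record_manager.record(doc_hash, doc["metadata"].get("source"))
        added += 1
    return {"num_added": added, "num_skipped": skipped}


record_manager = InMemoryRecordManager()
vector_store = []  # a list standing in for the vector store

d1 = {"page_content": "Everybody, this is Rachel", "metadata": {"source": "1x01.srt"}}
d2 = {"page_content": "This is everybody", "metadata": {"source": "1x01.srt"}}
d3 = {"page_content": "He's gonna stay with Julie", "metadata": {"source": "2x08.srt"}}

first_run = index_documents([d1, d2], record_manager, vector_store)
second_run = index_documents([d1, d2, d3], record_manager, vector_store)
print(first_run, second_run)
```

On the second run only the new document is written, which is where the savings in embedding calls come from.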



In [None]:
# from langchain.indexes import SQLRecordManager, index
# from langchain_openai import OpenAIEmbeddings
# from langchain_chroma import Chroma

# embedding_model = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

# db_connection = Chroma(
#     collection_name="test_index",
#     persist_directory="./chroma_db_",
#     embedding_function=embedding_model,
# )