# Explore Various Retrieval Strategies with Retrievers in LangChain

In [1]:
!pip install langchain==0.2.0
!pip install langchain-openai==0.1.7
!pip install langchain-community==0.2.0
!pip install langchain-huggingface==0.0.1

!pip install openai==1.55.3 httpx==0.27.2 --force-reinstall --quiet

In [2]:
!pip install langchain-chroma

In [4]:
import os
from google.colab import userdata

os.environ['OPENAI_API_KEY'] = userdata.get('OPEN_API_KEY')
os.environ['HUGGINGFACEHUB_API_TOKEN'] = userdata.get('HF_TOKEN')

In [5]:
from langchain_openai import OpenAIEmbeddings

# details here: https://openai.com/blog/new-embedding-models-and-api-updates
openai_embed_model = OpenAIEmbeddings(model='text-embedding-3-small')

In [6]:
docs = [
 'Quantum mechanics describes the behavior of very small particles.',
 'Photosynthesis is the process by which green plants make food using sunlight.',
 'Artificial Intelligence aims to create machines that can think and learn.',
 'The pyramids of Egypt are historical monuments that have stood for thousands of years.',
 'New Delhi is the capital of India and the seat of all three branches of the Government of India.',
 'Biology is the study of living organisms and their interactions with the environment.',
 'Music therapy can aid in the mental well-being of individuals.',
 'Mumbai is the financial capital and the most populous city of India. It is the financial, commercial, and entertainment capital of South Asia.',
 'The Milky Way is just one of billions of galaxies in the universe.',
 'Economic theories help understand the distribution of resources in society.',
 'Kolkata is the de facto cultural capital of India and a historically and culturally significant city. Calcutta served as the de facto capital of India until 1911.',
 'Yoga is an ancient practice that involves physical postures and meditation.'
]

In [7]:
from langchain_chroma import Chroma

# create vector DB of docs and embeddings - takes 1 min on Colab
chroma_db = Chroma.from_texts(texts=docs, collection_name='db_docs',
                              # set the distance function to cosine; otherwise Chroma defaults to squared L2 (Euclidean) distance
                              # check https://docs.trychroma.com/guides#changing-the-distance-function
                              collection_metadata={"hnsw:space": "cosine"},
                              embedding=openai_embed_model)
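
The `hnsw:space` setting determines what the scores returned by Chroma mean. Below is a minimal pure-Python sketch (no Chroma involved; the toy vectors are made up) comparing cosine distance with squared Euclidean distance. For unit-length vectors the two satisfy `||a - b||^2 = 2 * (1 - cos(a, b))`, so they rank documents the same way and differ only in scale, which matters once we start applying score thresholds. OpenAI documents its embeddings as normalized to length 1, but treat that as an assumption to check for whichever embedding model you use.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def cosine_distance(a, b):
    # 1 - cosine similarity: 0 means "same direction", larger means less similar
    return 1.0 - dot(a, b) / (norm(a) * norm(b))


def squared_l2(a, b):
    # squared Euclidean distance between the two vectors
    return sum((x - y) ** 2 for x, y in zip(a, b))


def normalize(a):
    n = norm(a)
    return [x / n for x in a]


# hypothetical toy "embeddings", scaled to unit length
q = normalize([0.9, 0.1, 0.2])
d1 = normalize([0.8, 0.2, 0.1])   # points roughly the same way as q
d2 = normalize([0.1, 0.9, 0.3])   # points elsewhere

for name, d in [("d1", d1), ("d2", d2)]:
    cd = cosine_distance(q, d)
    # for unit vectors the last two columns match: squared L2 == 2 * cosine distance
    print(name, round(cd, 4), round(squared_l2(q, d), 4), round(2 * cd, 4))
```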

## Vector Database Retrievers

Here we will explore the following retrieval strategies on our Vector Database:

- Similarity or Ranking based Retrieval
- Maximum Marginal Relevance (MMR) Retrieval
- Similarity with Threshold Retrieval
- Custom Retriever with Similarity Scores + Thresholding
- Multi Query Retrieval
- Contextual Compression Retrieval
- Ensemble Retrieval

### Similarity or Ranking based Retrieval

We use cosine similarity here and retrieve the top 3 most similar documents for the user's query.
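
Conceptually, a similarity retriever scores every document against the query and keeps the best `k`. The pure-Python sketch below shows that idea with an exhaustive scan over hypothetical 2-D vectors standing in for real embeddings; Chroma itself uses an approximate HNSW index rather than scanning every document.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def top_k(query_vec: Sequence[float], doc_vecs: List[Sequence[float]], k: int = 3) -> List[Tuple[int, float]]:
    """Return (doc_index, similarity) pairs for the k most similar documents."""
    scored = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    # highest similarity first; the sort is stable, so ties keep their original order
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# hypothetical 2-D "embeddings" for four documents
doc_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
query_vec = [1.0, 0.05]
print(top_k(query_vec, doc_vecs, k=3))  # documents 0, 1 and 3, in that order
```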

In [8]:
similarity_retriever = chroma_db.as_retriever(search_type="similarity",
                                              search_kwargs={"k": 3})

In [9]:
query = "what is the capital of India?"
top3_docs = similarity_retriever.invoke(query)
top3_docs

[Document(page_content='New Delhi is the capital of India and the seat of all three branches of the Government of India.'),
 Document(page_content='Mumbai is the financial capital and the most populous city of India. It is the financial, commercial, and entertainment capital of South Asia.'),
 Document(page_content='Kolkata is the de facto cultural capital of India and a historically and culturally significant city. Calcutta served as the de facto capital of India until 1911.')]

In [10]:
query = "what is the old capital of India?"
top3_docs = similarity_retriever.invoke(query)
top3_docs

[Document(page_content='Kolkata is the de facto cultural capital of India and a historically and culturally significant city. Calcutta served as the de facto capital of India until 1911.'),
 Document(page_content='New Delhi is the capital of India and the seat of all three branches of the Government of India.'),
 Document(page_content='Mumbai is the financial capital and the most populous city of India. It is the financial, commercial, and entertainment capital of South Asia.')]

In [11]:
query = "how do plants make food?"
top3_docs = similarity_retriever.invoke(query)
top3_docs

[Document(page_content='Photosynthesis is the process by which green plants make food using sunlight.'),
 Document(page_content='Biology is the study of living organisms and their interactions with the environment.'),
 Document(page_content='Artificial Intelligence aims to create machines that can think and learn.')]

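### Maximum Marginal Relevance (MMR) Retrieval

MMR first fetches the `fetch_k` documents most similar to the query and then greedily selects `k` of them, at each step favoring documents that are relevant to the query but not redundant with the ones already selected. This trades a little relevance for diversity, so the lower-ranked results can differ from plain similarity search.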

In [12]:
mmr_retriever = chroma_db.as_retriever(search_type="mmr",
                                       search_kwargs={"k": 3,
                                                      'fetch_k': 10})

In [13]:
query = "what is the capital of India?"
top3_docs = mmr_retriever.invoke(query)
top3_docs

[Document(page_content='New Delhi is the capital of India and the seat of all three branches of the Government of India.'),
 Document(page_content='Mumbai is the financial capital and the most populous city of India. It is the financial, commercial, and entertainment capital of South Asia.'),
 Document(page_content='Kolkata is the de facto cultural capital of India and a historically and culturally significant city. Calcutta served as the de facto capital of India until 1911.')]

In [14]:
query = "what is the old capital of India?"
top3_docs = mmr_retriever.invoke(query)
top3_docs

[Document(page_content='Kolkata is the de facto cultural capital of India and a historically and culturally significant city. Calcutta served as the de facto capital of India until 1911.'),
 Document(page_content='New Delhi is the capital of India and the seat of all three branches of the Government of India.'),
 Document(page_content='Yoga is an ancient practice that involves physical postures and meditation.')]

In [15]:
query = "how do plants make food?"
top3_docs = mmr_retriever.invoke(query)
top3_docs

[Document(page_content='Photosynthesis is the process by which green plants make food using sunlight.'),
 Document(page_content='Artificial Intelligence aims to create machines that can think and learn.'),
 Document(page_content='Economic theories help understand the distribution of resources in society.')]
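
The greedy MMR selection can be sketched in pure Python. This is a sketch of the standard MMR scoring rule, `lambda_mult * sim(query, d) - (1 - lambda_mult) * max(sim(d, s) for s in selected)`, applied to a candidate pool (in the retriever above, the `fetch_k=10` nearest documents). The vectors are hypothetical, `lambda_mult=0.5` is assumed as the default, and LangChain's own implementation may differ in details.

```python
import math
from typing import List, Sequence


def cos_sim(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query_vec: Sequence[float], cand_vecs: List[Sequence[float]],
        k: int = 3, lambda_mult: float = 0.5) -> List[int]:
    """Greedy MMR: return indices of k candidates balancing relevance and diversity."""
    if not cand_vecs:
        return []
    rel = [cos_sim(query_vec, c) for c in cand_vecs]
    # start with the single most relevant candidate
    selected = [max(range(len(cand_vecs)), key=lambda i: rel[i])]
    while len(selected) < min(k, len(cand_vecs)):
        best_i, best_score = None, -math.inf
        for i in range(len(cand_vecs)):
            if i in selected:
                continue
            # redundancy = similarity to the closest already-selected candidate
            redundancy = max(cos_sim(cand_vecs[i], cand_vecs[j]) for j in selected)
            score = lambda_mult * rel[i] - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_i, best_score = i, score
        selected.append(best_i)
    return selected


query = [1.0, 0.3]
cands = [[1.0, 0.1], [1.0, 0.12], [0.5, 1.0]]  # candidates 0 and 1 are near-duplicates
print(mmr(query, cands, k=2))                   # [1, 2]: skips the near-duplicate
print(mmr(query, cands, k=2, lambda_mult=1.0))  # [1, 0]: pure relevance ranking
```

Setting `lambda_mult=1.0` removes the diversity penalty and reduces MMR to plain similarity ranking; lower values push harder toward diverse results.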

### Similarity with Threshold Retrieval

We use cosine similarity here to retrieve the top 3 most similar documents for the user's query, and we also apply a cutoff so that no document whose similarity score falls below a given threshold (0.3 here) is returned.

In [16]:
similarity_threshold_retriever = chroma_db.as_retriever(search_type="similarity_score_threshold",
                                                        search_kwargs={"k": 3,
                                                                       "score_threshold": 0.3})

In [17]:
query = "what is the capital of India?"
top3_docs = similarity_threshold_retriever.invoke(query)
top3_docs

[Document(page_content='New Delhi is the capital of India and the seat of all three branches of the Government of India.'),
 Document(page_content='Mumbai is the financial capital and the most populous city of India. It is the financial, commercial, and entertainment capital of South Asia.'),
 Document(page_content='Kolkata is the de facto cultural capital of India and a historically and culturally significant city. Calcutta served as the de facto capital of India until 1911.')]

In [18]:
query = "what is the old capital of India?"
top3_docs = similarity_threshold_retriever.invoke(query)
top3_docs

[Document(page_content='Kolkata is the de facto cultural capital of India and a historically and culturally significant city. Calcutta served as the de facto capital of India until 1911.'),
 Document(page_content='New Delhi is the capital of India and the seat of all three branches of the Government of India.'),
 Document(page_content='Mumbai is the financial capital and the most populous city of India. It is the financial, commercial, and entertainment capital of South Asia.')]

In [19]:
query = "how do plants make food?"
top3_docs = similarity_threshold_retriever.invoke(query)
top3_docs

[Document(page_content='Photosynthesis is the process by which green plants make food using sunlight.')]
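
The single result for the plants query follows from simple arithmetic. With the collection configured for cosine distance, the relevance score is `1 - distance`, and only the top-k hits whose relevance clears `score_threshold` are kept. The sketch below reproduces that filtering in plain Python using the three cosine distances printed by `similarity_search_with_score` for the same query in cell In [20]; the helper names are hypothetical and the `>=` comparison is an assumption about how the cutoff is applied.

```python
from typing import List, Tuple


def cosine_relevance(distance: float) -> float:
    # cosine similarity = 1 - cosine distance
    return 1.0 - distance


def threshold_filter(results: List[Tuple[str, float]], score_threshold: float) -> List[Tuple[str, float]]:
    """Keep (label, relevance) pairs whose relevance is at least score_threshold."""
    kept = []
    for label, distance in results:
        relevance = cosine_relevance(distance)
        if relevance >= score_threshold:
            kept.append((label, round(relevance, 3)))
    return kept


# (document label, cosine distance) for "how do plants make food?", from cell In [20]
top3 = [
    ("Photosynthesis", 0.3538433909416199),
    ("Biology", 0.8317490816116333),
    ("Artificial Intelligence", 0.8765066266059875),
]
print(threshold_filter(top3, score_threshold=0.3))  # [('Photosynthesis', 0.646)]
```

Biology and Artificial Intelligence score roughly 0.17 and 0.12, well under the 0.3 cutoff.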

### Custom Retriever with Similarity Scores + Thresholding

Here we will create a custom retriever which will:

- Retrieve the top-k documents along with their cosine distances
- Convert each distance to a similarity score, store it in the document metadata, and apply the threshold
- Return only the top-k documents whose similarity is above the threshold

In [20]:
query = 'how do plants make food?'
chroma_db.similarity_search_with_score(query, k=3)

[(Document(page_content='Photosynthesis is the process by which green plants make food using sunlight.'),
  0.3538433909416199),
 (Document(page_content='Biology is the study of living organisms and their interactions with the environment.'),
  0.8317490816116333),
 (Document(page_content='Artificial Intelligence aims to create machines that can think and learn.'),
  0.8765066266059875)]

In [21]:
chroma_db._select_relevance_score_fn()

In [22]:
# converts cosine distance to similarity; cosine_similarity = 1 - cosine_distance
cosine_sim = chroma_db._select_relevance_score_fn()
cosine_sim(0.35375)

0.64625

In [23]:
from typing import List
from langchain_core.documents import Document
from langchain_core.runnables import chain

@chain
def custom_retriever(query: str, topk=3, threshold_score=0.3) -> List[Document]:
    # get similarity conversion function (converts cosine distance to similarity)
    cosine_sim = chroma_db._select_relevance_score_fn()
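    # get the topk documents with the lowest cosine distance as (doc, distance) pairs
    results = chroma_db.similarity_search_with_score(query, k=topk)
    final_docs = []
    # iterating over the pairs directly avoids the ValueError that zip(*[]) unpacking raises on an empty result
    for doc, distance in results:
        # convert cosine distance to similarity
        score = cosine_sim(distance)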
        doc.metadata["score"] = round(score, 3)
        # check if score is above threshold
        if score > threshold_score:
            final_docs.append(doc)

    return final_docs

In [24]:
query = "what is the financial capital of India?"
top3_docs = custom_retriever.invoke(query, topk=3, threshold_score=0.51)
top3_docs

[Document(metadata={'score': 0.69}, page_content='Mumbai is the financial capital and the most populous city of India. It is the financial, commercial, and entertainment capital of South Asia.'),
 Document(metadata={'score': 0.54}, page_content='New Delhi is the capital of India and the seat of all three branches of the Government of India.')]

In [25]:
query = 'how do plants make food?'
top3_docs = custom_retriever.invoke(query, topk=3, threshold_score=0.5)
top3_docs

[Document(metadata={'score': 0.646}, page_content='Photosynthesis is the process by which green plants make food using sunlight.')]

### Multi Query Retrieval

Retrieval may produce different results with subtle changes in query wording, or if the embeddings do not capture the semantics of the data well. Prompt engineering / tuning is sometimes done to manually address these problems, but can be tedious.

The [`MultiQueryRetriever`](https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.multi_query.MultiQueryRetriever.html) automates the process of prompt tuning by using an LLM to generate multiple queries from different perspectives for a given user input query. For each query, it retrieves a set of relevant documents and takes the unique union across all queries to get a larger set of potentially relevant documents.
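
The "unique union" step can be sketched without an LLM or a vector store. In this hypothetical example, `fake_index` is a stub retriever that maps each query to a fixed result list; the two queries are taken from the generated queries logged further below, but the documents assigned to them are made up. Keeping the order of first appearance is one reasonable de-duplication choice, not necessarily LangChain's exact rule.

```python
from typing import Callable, Dict, List


def multi_query_retrieve(queries: List[str], retrieve: Callable[[str], List[str]]) -> List[str]:
    """Run every query through retrieve() and return the de-duplicated union, in first-seen order."""
    seen = set()
    union = []
    for q in queries:
        for doc in retrieve(q):
            if doc not in seen:
                seen.add(doc)
                union.append(doc)
    return union


# hypothetical stub standing in for a vector-store retriever (no LLM or embeddings involved)
fake_index: Dict[str, List[str]] = {
    "What is the economic capital of India?": ["Mumbai", "New Delhi"],
    "What city serves as the financial hub of India?": ["Mumbai", "Kolkata"],
}

# in MultiQueryRetriever these rephrasings would be produced by the LLM
generated_queries = list(fake_index)
print(multi_query_retrieve(generated_queries, lambda q: fake_index.get(q, [])))
# ['Mumbai', 'New Delhi', 'Kolkata']
```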

In [26]:
from langchain_openai import ChatOpenAI

chatgpt = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

In [27]:
from langchain.retrievers.multi_query import MultiQueryRetriever
# Set logging for the queries
import logging

similarity_retriever = chroma_db.as_retriever(search_type="similarity",
                                              search_kwargs={"k":3})
mq_retriever = MultiQueryRetriever.from_llm(
    retriever=similarity_retriever,
    llm=chatgpt,
)

logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

In [28]:
query = "financial capital of India?"
docs = mq_retriever.invoke(query)
docs

INFO:langchain.retrievers.multi_query:Generated queries: ['What is the economic capital of India?', 'What city serves as the financial hub of India?', 'Which Indian city is known for its financial capital?']


[Document(page_content='Mumbai is the financial capital and the most populous city of India. It is the financial, commercial, and entertainment capital of South Asia.'),
 Document(page_content='Kolkata is the de facto cultural capital of India and a historically and culturally significant city. Calcutta served as the de facto capital of India until 1911.'),
 Document(page_content='New Delhi is the capital of India and the seat of all three branches of the Government of India.')]

In [29]:
query = "old capital of India?"
docs = mq_retriever.invoke(query)
docs

INFO:langchain.retrievers.multi_query:Generated queries: ['What was the former capital of India?', 'Which city used to be the capital of India in the past?', 'What was the historical capital of India?']


[Document(page_content='Kolkata is the de facto cultural capital of India and a historically and culturally significant city. Calcutta served as the de facto capital of India until 1911.'),
 Document(page_content='New Delhi is the capital of India and the seat of all three branches of the Government of India.'),
 Document(page_content='Mumbai is the financial capital and the most populous city of India. It is the financial, commercial, and entertainment capital of South Asia.')]

### Contextual Compression Retrieval

The information most relevant to a query may be buried in a document with a lot of irrelevant text. Passing that full document through your application can lead to more expensive LLM calls and poorer responses.

Contextual compression is meant to fix this. The idea is simple: instead of immediately returning retrieved documents as-is, you can compress them using the context of the given query, so that only the relevant information is returned.

This compression can happen in the form of:

- Removing the parts of each retrieved document that are not relevant to the query, by extracting only the relevant content

- Filtering out whole documents that are not relevant to the query, without modifying the content of the documents that are kept

Here we wrap our base cosine distance retriever with a `ContextualCompressionRetriever`. Then we'll add an `LLMChainExtractor`, which will iterate over the initially returned documents and extract from each only the content that is relevant to the query.
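
To illustrate the extract-then-drop behavior without calling an LLM, the sketch below swaps in a crude keyword-overlap heuristic for `LLMChainExtractor`: it keeps only the sentences of each document that share a non-stopword with the query, and drops documents where nothing survives. The heuristic, the stopword list, and the function names are all hypothetical; an LLM extractor judges relevance semantically rather than by word overlap.

```python
import re
from typing import List, Set

# hypothetical, deliberately tiny stopword list
STOPWORDS = {"what", "is", "the", "of", "a", "an", "and", "to", "in", "how", "do"}


def keywords(text: str) -> Set[str]:
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def extract_relevant(doc: str, query: str) -> str:
    """Keep only the sentences that share a keyword with the query."""
    q = keywords(query)
    sentences = re.split(r"(?<=[.!?])\s+", doc.strip())
    return " ".join(s for s in sentences if keywords(s) & q)


def compress(docs: List[str], query: str) -> List[str]:
    # documents with no relevant sentence are dropped entirely
    out = []
    for d in docs:
        extracted = extract_relevant(d, query)
        if extracted:
            out.append(extracted)
    return out


docs = [
    "Mumbai is the financial capital and the most populous city of India. "
    "It is the financial, commercial, and entertainment capital of South Asia.",
    "Photosynthesis is the process by which green plants make food using sunlight.",
]
print(compress(docs, "most populous city of India?"))
# ['Mumbai is the financial capital and the most populous city of India.']
```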

In [31]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

# Simple cosine distance based retriever
similarity_retriever = chroma_db.as_retriever(search_type="similarity",
                                              search_kwargs={"k": 3})

# Extracts from each document only the content that is relevant to the query
compressor = LLMChainExtractor.from_llm(chatgpt)

# retrieves the documents similar to the query and then applies the compressor
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=similarity_retriever
)

In [32]:
query = "what is the financial capital of India?"
docs = compression_retriever.invoke(query)
docs

[Document(page_content='Mumbai is the financial capital of India.'),
 Document(page_content='New Delhi is the capital of India')]

In [33]:
query = "what is the old capital of India?"
docs = compression_retriever.invoke(query)
docs

[Document(page_content='Calcutta served as the de facto capital of India until 1911.')]

The `LLMChainFilter` is a slightly simpler but more robust compressor: it uses an LLM chain to decide which of the initially retrieved documents to filter out and which to return, without modifying the document contents.
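
Unlike the extractor, a filter makes a keep-or-drop decision per document and returns kept documents unchanged. A minimal sketch, in which the hypothetical `is_relevant` callback stands in for the LLM's yes/no relevance judgment and the documents are shortened versions of the ones in our collection:

```python
from typing import Callable, List


def filter_documents(docs: List[str], query: str,
                     is_relevant: Callable[[str, str], bool]) -> List[str]:
    """Return the documents judged relevant, with their text left untouched."""
    return [doc for doc in docs if is_relevant(query, doc)]


docs = [
    "Mumbai is the financial capital and the most populous city of India.",
    "New Delhi is the capital of India.",
    "Kolkata is the de facto cultural capital of India.",
]


# toy judge (hypothetical): a document counts as relevant only if it mentions "financial"
def toy_judge(query: str, doc: str) -> bool:
    return "financial" in doc.lower()


print(filter_documents(docs, "what is the financial capital of India?", toy_judge))
# ['Mumbai is the financial capital and the most populous city of India.']
```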

In [34]:
from langchain.retrievers.document_compressors import LLMChainFilter

# decides which of the initially retrieved documents to filter out and which ones to return
_filter = LLMChainFilter.from_llm(llm=chatgpt)

# retrieves the documents similar to query and then applies the filter
compression_retriever = ContextualCompressionRetriever(
    base_compressor=_filter, base_retriever=similarity_retriever
)

In [35]:
query = "what is the financial capital of India?"
docs = compression_retriever.invoke(query)
docs

[Document(page_content='Mumbai is the financial capital and the most populous city of India. It is the financial, commercial, and entertainment capital of South Asia.')]

In [36]:
query = "what is the old capital of India?"
docs = compression_retriever.invoke(query)
docs

[Document(page_content='Kolkata is the de facto cultural capital of India and a historically and culturally significant city. Calcutta served as the de facto capital of India until 1911.')]

In [37]:
query = "how do plants live?"
docs = compression_retriever.invoke(query)
docs

[Document(page_content='Photosynthesis is the process by which green plants make food using sunlight.'),
 Document(page_content='Biology is the study of living organisms and their interactions with the environment.')]

### Ensemble Retrieval

The `EnsembleRetriever` takes a list of retrievers as input, combines the results of their retrievals, and reranks them using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm. The optional `weights` argument controls how much each retriever contributes to the fused ranking.
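
Weighted Reciprocal Rank Fusion can be sketched in a few lines: every document earns `weight / (c + rank)` from each ranked list it appears in, and documents are sorted by their summed score. Here `c = 60` is the constant suggested in the RRF paper and is assumed to match LangChain's default, and documents are reduced to plain string labels. The first example feeds in rankings assumed to resemble what the two retrievers below return for the financial-capital query (filtered: Mumbai only; similarity: Mumbai, New Delhi, Kolkata).

```python
from collections import defaultdict
from typing import Dict, List


def weighted_rrf(rankings: List[List[str]], weights: List[float], c: int = 60) -> List[str]:
    """Fuse ranked lists: each doc earns weight / (c + rank) from every list it appears in."""
    if len(rankings) != len(weights):
        raise ValueError("need exactly one weight per ranking")
    scores: Dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += weight / (c + rank)
    # highest fused score first; sorted() is stable, so ties keep first-seen order
    return sorted(scores, key=scores.__getitem__, reverse=True)


filtered = ["Mumbai"]
similarity = ["Mumbai", "New Delhi", "Kolkata"]
print(weighted_rrf([filtered, similarity], weights=[0.7, 0.3]))
# ['Mumbai', 'New Delhi', 'Kolkata']

# a document ranked well by both lists beats one that tops a single list
print(weighted_rrf([["x", "y"], ["y", "z"]], weights=[0.5, 0.5]))
# ['y', 'x', 'z']
```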

In [38]:
from langchain.retrievers import EnsembleRetriever

# simple cosine distance based retriever
similarity_retriever = chroma_db.as_retriever(search_type="similarity",
                                              search_kwargs={"k": 3})

# retrieves the documents similar to query and then applies the filter
compression_retriever = ContextualCompressionRetriever(
    base_compressor=_filter, base_retriever=similarity_retriever
)


ensemble_retriever = EnsembleRetriever(
    retrievers=[compression_retriever, similarity_retriever],
    weights=[0.7, 0.3]
)

In [39]:
query = "what is the financial capital of India?"
docs = ensemble_retriever.invoke(query)
docs

[Document(page_content='Mumbai is the financial capital and the most populous city of India. It is the financial, commercial, and entertainment capital of South Asia.'),
 Document(page_content='New Delhi is the capital of India and the seat of all three branches of the Government of India.'),
 Document(page_content='Kolkata is the de facto cultural capital of India and a historically and culturally significant city. Calcutta served as the de facto capital of India until 1911.')]

In [40]:
query = "how do plants live?"
docs = ensemble_retriever.invoke(query)
docs

[Document(page_content='Photosynthesis is the process by which green plants make food using sunlight.'),
 Document(page_content='Biology is the study of living organisms and their interactions with the environment.'),
 Document(page_content='Music therapy can aid in the mental well-being of individuals.')]