# Explore Various Retrieval Strategies with Retrievers in LangChain

## Install OpenAI and LangChain dependencies

In [1]:
!pip install langchain==0.3.11
!pip install langchain-openai==0.2.12
!pip install langchain-community==0.3.11

## Install Chroma Vector DB and LangChain wrapper

In [2]:
!pip install langchain-chroma

## Set Up Environment Variables

In [1]:
from google.colab import userdata
import os

os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')

### OpenAI Embedding Models

LangChain gives us access to OpenAI's embedding models, including the newer `text-embedding-3` models: the smaller, highly efficient `text-embedding-3-small` and the larger, more powerful `text-embedding-3-large`. Here we use `text-embedding-3-small`.

In [2]:
from langchain_openai import OpenAIEmbeddings

# details here: https://openai.com/blog/new-embedding-models-and-api-updates
openai_embed_model = OpenAIEmbeddings(model='text-embedding-3-small')

## Vector Databases

One of the most common ways to store and search over unstructured data is to embed it and store the resulting embedding vectors, and then at query time to embed the unstructured query and retrieve the embedding vectors that are 'most similar' to the embedded query. A vector database takes care of storing embedded data and performing vector search for you.
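To make the query-time step concrete, here is a minimal pure-Python sketch of what a vector database does conceptually: score every stored vector against the query vector with cosine similarity and return the `k` best. The 3-dimensional vectors are made-up toy values standing in for real embeddings (`text-embedding-3-small` produces 1,536-dimensional vectors by default), and `top_k_search` is a hypothetical helper. Real vector databases such as Chroma use an approximate nearest-neighbor index (HNSW) instead of this brute-force scan.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_search(query_vec: List[float], store: Dict[str, List[float]],
                 k: int = 3) -> List[Tuple[str, float]]:
    """Brute-force scan: score every stored vector and return the k most similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" (made-up values, not real model outputs)
store = {
    "delhi": [0.9, 0.1, 0.0],
    "mumbai": [0.8, 0.3, 0.1],
    "photosynthesis": [0.0, 0.2, 0.9],
}
query_vec = [1.0, 0.2, 0.0]  # pretend embedding of "capital of India"

for doc_id, score in top_k_search(query_vec, store, k=2):
    print(doc_id, round(score, 3))
```

The two "capital" vectors point in nearly the same direction as the query, so they are returned, while the photosynthesis vector is nearly orthogonal to it and is left out.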

### Chroma Vector DB

[Chroma](https://docs.trychroma.com/getting-started) is an AI-native open-source vector database focused on developer productivity and happiness. Chroma is licensed under Apache 2.0.

### Create a Vector DB and persist on disk

Here we embed our documents with the OpenAI embedding model and store them in a Chroma vector DB collection. Since we also want to save the data to disk, we pass the directory where Chroma should persist it.

In [3]:
docs = [
 'Quantum mechanics describes the behavior of very small particles.',
 'Photosynthesis is the process by which green plants make food using sunlight.',
 'Artificial Intelligence aims to create machines that can think and learn.',
 'The pyramids of Egypt are historical monuments that have stood for thousands of years.',
 'New Delhi is the capital of India and the seat of all three branches of the Government of India.',
 'Biology is the study of living organisms and their interactions with the environment.',
 'Music therapy can aid in the mental well-being of individuals.',
 'Mumbai is the financial capital and the most populous city of India. It is the financial, commercial, and entertainment capital of South Asia.',
 'The Milky Way is just one of billions of galaxies in the universe.',
 'Economic theories help understand the distribution of resources in society.',
 'Kolkata is the de facto cultural capital of India and a historically and culturally significant city. Calcutta served as the de facto capital of India until 1911.',
 'Yoga is an ancient practice that involves physical postures and meditation.'
]

In [4]:
from langchain_chroma import Chroma

# create vector DB of docs and embeddings - takes 1 min on Colab
chroma_db = Chroma.from_texts(texts=docs, collection_name='db_docs',
                              # need to set the distance function to cosine, else Chroma uses squared L2 (Euclidean) distance by default
                              # check https://docs.trychroma.com/guides#changing-the-distance-function
                              collection_metadata={"hnsw:space": "cosine"},
                              embedding=openai_embed_model,
                              # directory where the collection is saved on disk (example path, any writable folder works)
                              persist_directory="./my_db")

## Vector Database Retrievers

Here we will explore the following retrieval strategies on our Vector Database:

- Similarity or Ranking based Retrieval
- Similarity with Threshold Retrieval
- Custom Retriever with Similarity Scores + Thresholding
- Multi Query Retrieval
- Contextual Compression Retrieval
- Ensemble Retrieval

### Similarity or Ranking based Retrieval

We use cosine similarity here to retrieve the top 3 documents most similar to the user's query.

In [5]:
similarity_retriever = chroma_db.as_retriever(search_type="similarity",
                                              search_kwargs={"k": 3})

In [6]:
query = "what is the capital of India?"
top3_docs = similarity_retriever.invoke(query)
top3_docs

[Document(id='8a1468e5-600f-407c-93c2-51b5a69b4297', metadata={}, page_content='New Delhi is the capital of India and the seat of all three branches of the Government of India.'),
 Document(id='5eed60cd-6fb7-4dc1-bc38-434569b4f30e', metadata={}, page_content='Mumbai is the financial capital and the most populous city of India. It is the financial, commercial, and entertainment capital of South Asia.'),
 Document(id='d65532ee-523e-481d-ba73-6b260523d52a', metadata={}, page_content='Kolkata is the de facto cultural capital of India and a historically and culturally significant city. Calcutta served as the de facto capital of India until 1911.')]

In [7]:
query = "what is the old capital of India?"
top3_docs = similarity_retriever.invoke(query)
top3_docs

[Document(id='d65532ee-523e-481d-ba73-6b260523d52a', metadata={}, page_content='Kolkata is the de facto cultural capital of India and a historically and culturally significant city. Calcutta served as the de facto capital of India until 1911.'),
 Document(id='8a1468e5-600f-407c-93c2-51b5a69b4297', metadata={}, page_content='New Delhi is the capital of India and the seat of all three branches of the Government of India.'),
 Document(id='5eed60cd-6fb7-4dc1-bc38-434569b4f30e', metadata={}, page_content='Mumbai is the financial capital and the most populous city of India. It is the financial, commercial, and entertainment capital of South Asia.')]

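Next we use Maximum Marginal Relevance (MMR) ranking to retrieve 3 documents. MMR first fetches the `fetch_k` (here 10) documents most similar to the query, then iteratively selects `k` (here 3) of them, each time favoring candidates that are relevant to the query but dissimilar to the documents already selected. This trades some relevance for diversity, which is why a less related document can show up in the results, as with the second query below.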
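The greedy selection behind MMR can be sketched in a few lines of pure Python. This is the textbook formulation (score = `lambda_mult` * similarity to the query - (1 - `lambda_mult`) * highest similarity to any already-selected document), not LangChain's exact code; `mmr_select` and the 2-D toy vectors are hypothetical. LangChain exposes the same trade-off through the `lambda_mult` search argument, where 1 means minimum diversity (pure relevance) and 0 means maximum diversity.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(query_vec: List[float], candidates: List[List[float]],
               k: int = 3, lambda_mult: float = 0.5) -> List[int]:
    """Greedy Maximal Marginal Relevance over a pool of fetched candidates.

    Returns the indices of the selected candidates in selection order.
    """
    query_sims = [cosine_similarity(query_vec, c) for c in candidates]
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_idx, best_score = remaining[0], -math.inf
        for i in remaining:
            # redundancy = similarity to the closest already-selected document (0 if none yet)
            redundancy = max((cosine_similarity(candidates[i], candidates[j]) for j in selected),
                             default=0.0)
            score = lambda_mult * query_sims[i] - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


query_vec = [1.0, 0.0]
# candidate 1 is a near-duplicate of candidate 0; candidate 2 is less similar but different
candidates = [[1.0, 0.1], [1.0, 0.12], [0.7, -0.7]]

print(mmr_select(query_vec, candidates, k=2))                   # [0, 2]
print(mmr_select(query_vec, candidates, k=2, lambda_mult=1.0))  # [0, 1]
```

With a balanced `lambda_mult`, the near-duplicate candidate 1 is skipped in favor of the more different candidate 2; with `lambda_mult=1.0` the redundancy term vanishes and MMR reduces to plain similarity ranking.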

In [8]:
mmr_retriever = chroma_db.as_retriever(search_type="mmr",
                                       search_kwargs={"k": 3,
                                                      'fetch_k': 10})

In [9]:
query = "what is the capital of India?"
top3_docs = mmr_retriever.invoke(query)
top3_docs

[Document(id='8a1468e5-600f-407c-93c2-51b5a69b4297', metadata={}, page_content='New Delhi is the capital of India and the seat of all three branches of the Government of India.'),
 Document(id='5eed60cd-6fb7-4dc1-bc38-434569b4f30e', metadata={}, page_content='Mumbai is the financial capital and the most populous city of India. It is the financial, commercial, and entertainment capital of South Asia.'),
 Document(id='d65532ee-523e-481d-ba73-6b260523d52a', metadata={}, page_content='Kolkata is the de facto cultural capital of India and a historically and culturally significant city. Calcutta served as the de facto capital of India until 1911.')]

In [10]:
query = "what is the old capital of India?"
top3_docs = mmr_retriever.invoke(query)
top3_docs

[Document(id='d65532ee-523e-481d-ba73-6b260523d52a', metadata={}, page_content='Kolkata is the de facto cultural capital of India and a historically and culturally significant city. Calcutta served as the de facto capital of India until 1911.'),
 Document(id='8a1468e5-600f-407c-93c2-51b5a69b4297', metadata={}, page_content='New Delhi is the capital of India and the seat of all three branches of the Government of India.'),
 Document(id='052f8757-67c7-45ca-b00e-fce1fe080f48', metadata={}, page_content='Yoga is an ancient practice that involves physical postures and meditation.')]

### Similarity with Threshold Retrieval

Here we again retrieve the top 3 documents by cosine similarity, but also introduce a cutoff: documents whose similarity score falls below `score_threshold` (0.3 here) are not returned, so the retriever may return fewer than `k` documents.
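A minimal sketch of the filtering step, assuming cosine distance is converted to a relevance score as `1 - distance` (the conversion used for Chroma's cosine space, shown later via `_select_relevance_score_fn`) and that documents scoring at or above the threshold are kept. The distances are the ones Chroma returns for the query 'how do plants make food?' in the custom retriever section below; the short labels and the `threshold_filter` helper are hypothetical, not LangChain functions.

```python
from typing import List, Tuple


def threshold_filter(results: List[Tuple[str, float]],
                     score_threshold: float) -> List[Tuple[str, float]]:
    """Turn (doc, cosine_distance) pairs into (doc, similarity) pairs and drop low scorers."""
    with_similarity = [(doc, 1.0 - distance) for doc, distance in results]
    return [(doc, sim) for doc, sim in with_similarity if sim >= score_threshold]


# top-3 (document label, cosine distance) pairs for "how do plants make food?"
top3 = [
    ("photosynthesis", 0.35381996631622314),
    ("biology", 0.8317575454711914),
    ("artificial_intelligence", 0.8765017986297607),
]

for doc, sim in threshold_filter(top3, score_threshold=0.3):
    print(doc, round(sim, 3))  # prints: photosynthesis 0.646
```

Even though `k=3` documents were fetched, only one clears a 0.3 threshold here, and a stricter threshold can leave the result empty.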

In [11]:
similarity_threshold_retriever = chroma_db.as_retriever(search_type="similarity_score_threshold",
                                                        search_kwargs={"k": 3,
                                                                       "score_threshold": 0.3})

In [12]:
query = "what is the capital of India?"
top3_docs = similarity_threshold_retriever.invoke(query)
top3_docs

[Document(id='8a1468e5-600f-407c-93c2-51b5a69b4297', metadata={}, page_content='New Delhi is the capital of India and the seat of all three branches of the Government of India.'),
 Document(id='5eed60cd-6fb7-4dc1-bc38-434569b4f30e', metadata={}, page_content='Mumbai is the financial capital and the most populous city of India. It is the financial, commercial, and entertainment capital of South Asia.'),
 Document(id='d65532ee-523e-481d-ba73-6b260523d52a', metadata={}, page_content='Kolkata is the de facto cultural capital of India and a historically and culturally significant city. Calcutta served as the de facto capital of India until 1911.')]

In [13]:
query = "what is the old capital of India?"
top3_docs = similarity_threshold_retriever.invoke(query)
top3_docs

[Document(id='d65532ee-523e-481d-ba73-6b260523d52a', metadata={}, page_content='Kolkata is the de facto cultural capital of India and a historically and culturally significant city. Calcutta served as the de facto capital of India until 1911.'),
 Document(id='8a1468e5-600f-407c-93c2-51b5a69b4297', metadata={}, page_content='New Delhi is the capital of India and the seat of all three branches of the Government of India.'),
 Document(id='5eed60cd-6fb7-4dc1-bc38-434569b4f30e', metadata={}, page_content='Mumbai is the financial capital and the most populous city of India. It is the financial, commercial, and entertainment capital of South Asia.')]

### Custom Retriever with Similarity Scores + Thresholding

Here we will create a custom retriever that will:

- Retrieve the `topk` documents with the lowest cosine distance to the query
- Convert each distance to a similarity score and store it in the document's metadata
- Return only the documents whose similarity score is above a threshold

In [14]:
query = 'how do plants make food?'
chroma_db.similarity_search_with_score(query, k=3)

[(Document(id='f03d4131-d3fc-4271-8635-e454246bb006', metadata={}, page_content='Photosynthesis is the process by which green plants make food using sunlight.'),
  0.35381996631622314),
 (Document(id='160785e5-d0b6-4865-9c1c-cd2df3891071', metadata={}, page_content='Biology is the study of living organisms and their interactions with the environment.'),
  0.8317575454711914),
 (Document(id='2d3b2d00-1069-4df0-bdc8-43200f5c607f', metadata={}, page_content='Artificial Intelligence aims to create machines that can think and learn.'),
  0.8765017986297607)]

In [15]:
chroma_db._select_relevance_score_fn()

In [16]:
# converts cosine distance to similarity; cosine_similarity = 1 - cosine_distance
cosine_sim = chroma_db._select_relevance_score_fn()
cosine_sim(0.35375)

0.64625

In [17]:
from typing import List
from langchain_core.documents import Document
from langchain_core.runnables import chain

@chain
def custom_retriever(query: str, topk: int = 3, threshold_score: float = 0.3) -> List[Document]:
    # get similarity conversion function (converts cosine distance to similarity)
    cosine_sim = chroma_db._select_relevance_score_fn()
    # get topk documents with lowest cosine distance
    results = chroma_db.similarity_search_with_score(query, k=topk)
    final_docs = []
    # iterate over (document, distance) pairs; also works when no results are returned
    for doc, distance in results:
        # convert cosine distance to similarity
        score = cosine_sim(distance)
        doc.metadata["score"] = round(score, 3)
        # check if score is above threshold
        if score > threshold_score:
            final_docs.append(doc)

    return final_docs

In [18]:
query = "what is the financial capital of India?"
top3_docs = custom_retriever.invoke(query, topk=3, threshold_score=0.51)
top3_docs

[Document(id='5eed60cd-6fb7-4dc1-bc38-434569b4f30e', metadata={'score': 0.69}, page_content='Mumbai is the financial capital and the most populous city of India. It is the financial, commercial, and entertainment capital of South Asia.'),
 Document(id='8a1468e5-600f-407c-93c2-51b5a69b4297', metadata={'score': 0.54}, page_content='New Delhi is the capital of India and the seat of all three branches of the Government of India.')]

In [19]:
query = 'how do plants make food?'
top3_docs = custom_retriever.invoke(query, topk=3, threshold_score=0.5)
top3_docs

[Document(id='f03d4131-d3fc-4271-8635-e454246bb006', metadata={'score': 0.646}, page_content='Photosynthesis is the process by which green plants make food using sunlight.')]

### Multi Query Retrieval

Retrieval may produce different results with subtle changes in query wording, or if the embeddings do not capture the semantics of the data well. Prompt engineering / tuning is sometimes done to manually address these problems, but can be tedious.

The [`MultiQueryRetriever`](https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.multi_query.MultiQueryRetriever.html) automates the process of prompt tuning by using an LLM to generate multiple queries from different perspectives for a given user input query. For each query, it retrieves a set of relevant documents and takes the unique union across all queries to get a larger set of potentially relevant documents.
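The retrieve-and-merge step can be sketched in plain Python. In the real `MultiQueryRetriever`, the query variants come from the LLM and the results are `Document` objects from the wrapped retriever; here the queries are two of the variants the LLM generates later in this notebook, while the per-query results are made-up document labels from a hypothetical `fake_results` lookup, so only the unique-union logic is illustrated.

```python
from typing import Callable, Iterable, List


def multi_query_retrieve(queries: Iterable[str],
                         retrieve: Callable[[str], List[str]]) -> List[str]:
    """Run every query through the retriever and return the unique union of the
    results, keeping the order in which documents were first seen."""
    seen = set()
    union: List[str] = []
    for q in queries:
        for doc in retrieve(q):
            if doc not in seen:
                seen.add(doc)
                union.append(doc)
    return union


# Hypothetical stand-in for a vector-store retriever: canned (made-up) results per query.
fake_results = {
    "Which city in India is considered the financial capital?": ["mumbai", "delhi", "kolkata"],
    "Can you tell me about the financial hub of India?": ["mumbai", "kolkata", "economics"],
}
queries = list(fake_results)

print(multi_query_retrieve(queries, lambda q: fake_results[q]))
# ['mumbai', 'delhi', 'kolkata', 'economics']
```

Documents returned by several query variants appear only once, while a document found by just one variant (here `economics`) still makes it into the larger combined set.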

In [20]:
from langchain_openai import ChatOpenAI

chatgpt = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)

In [21]:
from langchain.retrievers.multi_query import MultiQueryRetriever
# Set logging for the queries
import logging

similarity_retriever = chroma_db.as_retriever(search_type="similarity",
                                              search_kwargs={"k": 3})

mq_retriever = MultiQueryRetriever.from_llm(
    retriever=similarity_retriever, llm=chatgpt
)

logging.basicConfig()
# so we can see what queries are generated by the LLM
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

In [22]:
query = "financial capital of India?"
docs = mq_retriever.invoke(query)
docs

INFO:langchain.retrievers.multi_query:Generated queries: ['What is the capital city of India in terms of its financial significance?  ', 'Which city in India is considered the financial capital?  ', 'Can you tell me about the financial hub of India?']


[Document(id='5eed60cd-6fb7-4dc1-bc38-434569b4f30e', metadata={}, page_content='Mumbai is the financial capital and the most populous city of India. It is the financial, commercial, and entertainment capital of South Asia.'),
 Document(id='8a1468e5-600f-407c-93c2-51b5a69b4297', metadata={}, page_content='New Delhi is the capital of India and the seat of all three branches of the Government of India.'),
 Document(id='d65532ee-523e-481d-ba73-6b260523d52a', metadata={}, page_content='Kolkata is the de facto cultural capital of India and a historically and culturally significant city. Calcutta served as the de facto capital of India until 1911.')]

In [23]:
query = "old capital of India?"
docs = mq_retriever.invoke(query)
docs

INFO:langchain.retrievers.multi_query:Generated queries: ['What was the former capital of India?  ', "Which city served as India's capital before the current one?  ", 'Can you tell me the historical capital of India?']


[Document(id='d65532ee-523e-481d-ba73-6b260523d52a', metadata={}, page_content='Kolkata is the de facto cultural capital of India and a historically and culturally significant city. Calcutta served as the de facto capital of India until 1911.'),
 Document(id='8a1468e5-600f-407c-93c2-51b5a69b4297', metadata={}, page_content='New Delhi is the capital of India and the seat of all three branches of the Government of India.'),
 Document(id='5eed60cd-6fb7-4dc1-bc38-434569b4f30e', metadata={}, page_content='Mumbai is the financial capital and the most populous city of India. It is the financial, commercial, and entertainment capital of South Asia.')]

### Contextual Compression Retrieval

The information most relevant to a query may be buried in a document with a lot of irrelevant text. Passing that full document through your application can lead to more expensive LLM calls and poorer responses.

Contextual compression is meant to fix this. The idea is simple: instead of immediately returning retrieved documents as-is, you can compress them using the context of the given query, so that only the relevant information is returned.

This compression can take two forms:

- Extraction: remove the parts of each retrieved document that are not relevant to the query, keeping only the relevant content
- Filtering: drop retrieved documents that are not relevant to the query, without changing the content of the documents that are kept
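The difference between the two forms can be sketched without an LLM. This toy assumes a crude word-overlap test (hypothetical helper `content_words` with a small hand-picked stopword list) in place of the LLM relevance judgement that LangChain's compressors use: `extract_relevant` trims each document to its relevant sentences, while `filter_relevant` keeps or drops whole documents. The two documents come from this notebook's corpus.

```python
import re
from typing import List, Set

STOPWORDS = {"the", "is", "of", "a", "and", "what", "it", "in", "to"}


def content_words(text: str) -> Set[str]:
    """Lowercased words minus a few stopwords; a crude stand-in for LLM judgement."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def extract_relevant(query: str, docs: List[str]) -> List[str]:
    """Extraction: keep only the sentences of each document that share words with the query."""
    q = content_words(query)
    compressed = []
    for doc in docs:
        sentences = re.split(r"(?<=\.)\s+", doc)
        kept = [s for s in sentences if content_words(s) & q]
        if kept:  # documents with nothing relevant are dropped entirely
            compressed.append(" ".join(kept))
    return compressed


def filter_relevant(query: str, docs: List[str]) -> List[str]:
    """Filtering: keep or drop whole documents, never editing their content."""
    q = content_words(query)
    return [doc for doc in docs if content_words(doc) & q]


docs = [
    "Mumbai is the financial capital and the most populous city of India. "
    "It is the financial, commercial, and entertainment capital of South Asia.",
    "Yoga is an ancient practice that involves physical postures and meditation.",
]
query = "what is the most populous city in India?"

print(extract_relevant(query, docs))  # only the first Mumbai sentence
print(filter_relevant(query, docs))   # the full Mumbai document
```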

Here we wrap our base cosine similarity retriever in a `ContextualCompressionRetriever` and give it an `LLMChainExtractor` as its compressor. The extractor iterates over the initially retrieved documents and uses the LLM to extract from each only the content that is relevant to the query.

In [24]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

# simple cosine distance based retriever
similarity_retriever = chroma_db.as_retriever(search_type="similarity",
                                              search_kwargs={"k": 3})

# extracts from each document only the content that is relevant to the query
compressor = LLMChainExtractor.from_llm(llm=chatgpt)

# retrieves the documents similar to query and then applies the compressor
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=similarity_retriever
)

In [25]:
query = "what is the financial capital of India?"
docs = compression_retriever.invoke(query)
docs

[Document(metadata={}, page_content='Mumbai is the financial capital and the most populous city of India.')]

In [26]:
query = "what is the old capital of India?"
docs = compression_retriever.invoke(query)
docs

[Document(metadata={}, page_content='Calcutta served as the de facto capital of India until 1911.')]

The `LLMChainFilter` is a slightly simpler but more robust compressor: it uses an LLM chain to decide which of the initially retrieved documents to filter out and which ones to return, without modifying the document contents.

In [27]:
from langchain.retrievers.document_compressors import LLMChainFilter

# decides which of the initially retrieved documents to filter out and which ones to return
_filter = LLMChainFilter.from_llm(llm=chatgpt)

# retrieves the documents similar to query and then applies the filter
compression_retriever = ContextualCompressionRetriever(
    base_compressor=_filter, base_retriever=similarity_retriever
)

In [28]:
query = "what is the financial capital of India?"
docs = compression_retriever.invoke(query)
docs

[Document(id='5eed60cd-6fb7-4dc1-bc38-434569b4f30e', metadata={}, page_content='Mumbai is the financial capital and the most populous city of India. It is the financial, commercial, and entertainment capital of South Asia.')]

In [29]:
query = "what is the old capital of India?"
docs = compression_retriever.invoke(query)
docs

[Document(id='d65532ee-523e-481d-ba73-6b260523d52a', metadata={}, page_content='Kolkata is the de facto cultural capital of India and a historically and culturally significant city. Calcutta served as the de facto capital of India until 1911.')]

### Ensemble Retrieval

The `EnsembleRetriever` takes a list of retrievers as input, combines the results returned by each of them, and reranks the combined list using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm. The `weights` argument sets how much each retriever contributes to the final ranking.
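Reciprocal Rank Fusion looks only at ranks, not raw scores, which is what lets it merge retrievers whose scores are not directly comparable. Below is a minimal sketch of a weighted variant: each retriever adds `weight / (c + rank)` for every document it returns, with ranks starting at 1 and `c = 60` as in the RRF paper. To our understanding, LangChain's `EnsembleRetriever` applies its `weights` in this way, but treat the hypothetical `weighted_rrf` function as an illustration, not the library's code. The example rankings are assumptions modeled on the financial-capital query below: the filter retriever returns only Mumbai, while the plain similarity retriever returns Mumbai, Delhi, and Kolkata.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def weighted_rrf(ranked_lists: Sequence[List[str]], weights: Sequence[float],
                 c: int = 60) -> List[str]:
    """Fuse several ranked lists with weighted Reciprocal Rank Fusion.

    score(d) = sum over lists of weight / (c + rank of d in that list), ranks starting at 1.
    A document missing from a list gets no contribution from that list.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += weight / (c + rank)
    # highest fused score first
    return sorted(scores, key=scores.__getitem__, reverse=True)


filter_results = ["mumbai"]                          # assumed LLM-filtered ranking
similarity_results = ["mumbai", "delhi", "kolkata"]  # assumed similarity ranking

print(weighted_rrf([filter_results, similarity_results], weights=[0.7, 0.3]))
# ['mumbai', 'delhi', 'kolkata']
```

Mumbai collects score from both retrievers and stays on top; Delhi and Kolkata come only from the similarity retriever, so their order there is preserved.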

In [30]:
from langchain.retrievers import EnsembleRetriever

# simple cosine distance based retriever
similarity_retriever = chroma_db.as_retriever(search_type="similarity",
                                              search_kwargs={"k": 3})

# retrieves the documents similar to query and then applies the filter
compression_retriever = ContextualCompressionRetriever(
    base_compressor=_filter, base_retriever=similarity_retriever
)

ensemble_retriever = EnsembleRetriever(
    retrievers=[compression_retriever, similarity_retriever],
    # weight the LLM-filtered results more heavily than the raw similarity results
    weights=[0.7, 0.3]
)

In [31]:
query = "what is the financial capital of India?"
docs = ensemble_retriever.invoke(query)
docs

[Document(id='5eed60cd-6fb7-4dc1-bc38-434569b4f30e', metadata={}, page_content='Mumbai is the financial capital and the most populous city of India. It is the financial, commercial, and entertainment capital of South Asia.'),
 Document(id='8a1468e5-600f-407c-93c2-51b5a69b4297', metadata={}, page_content='New Delhi is the capital of India and the seat of all three branches of the Government of India.'),
 Document(id='d65532ee-523e-481d-ba73-6b260523d52a', metadata={}, page_content='Kolkata is the de facto cultural capital of India and a historically and culturally significant city. Calcutta served as the de facto capital of India until 1911.')]

In [32]:
query = "how do plants live?"
docs = ensemble_retriever.invoke(query)
docs

[Document(id='f03d4131-d3fc-4271-8635-e454246bb006', metadata={}, page_content='Photosynthesis is the process by which green plants make food using sunlight.'),
 Document(id='160785e5-d0b6-4865-9c1c-cd2df3891071', metadata={}, page_content='Biology is the study of living organisms and their interactions with the environment.'),
 Document(id='4deb93cd-537d-4c1c-a06f-37e98e5f135a', metadata={}, page_content='Music therapy can aid in the mental well-being of individuals.')]

Other Retrieval Methods available in LangChain:

- Self Query Retrieval
- Hybrid Search Retrieval
- Parent Document Retrieval
- Reranker Retrieval

and more...