# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up Qdrant!
  - Tasks 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

> You do not need to run the following cells if you are running this notebook locally.

In [1]:
!pip install -qU langchain langchain-openai langchain-cohere rank_bm25

We'll also use [Qdrant's](https://qdrant.tech/documentation/frameworks/langchain/) (pronounced "Quadrant") vector database in in-memory mode, so we can run it locally in our Colab environment.

In [2]:
!pip install -qU qdrant-client

We'll also provide our OpenAI key, as well as our Cohere API key.

In [3]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

Enter your OpenAI API Key:··········


In [4]:
os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

Cohere API Key:··········


In [5]:
os.environ["RAGAS_APP_TOKEN"] = getpass.getpass("Please enter your Ragas API key!")

Please enter your Ragas API key!··········


In [6]:
import os
from langsmith import Client
from langchain.callbacks.tracers import LangChainTracer
from langchain.callbacks.manager import CallbackManager

# Set up LangSmith environment
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("Enter your LangChain API Key:")
os.environ["LANGCHAIN_PROJECT"] = "retriever_comparison"

# Create callback manager
langsmith_client = Client()
tracer = LangChainTracer(project_name="retriever_comparison")
callback_manager = CallbackManager([tracer])

Enter your LangChain API Key:··········


## Task 2: Data Collection and Preparation

We'll be using some reviews of the four movies in the John Wick franchise today to explore the different retrieval strategies.

These were obtained from IMDB, and are available in the [AIM Data Repository](https://github.com/AI-Maker-Space/DataRepository).

### Data Collection

We can simply `wget` these from GitHub.

You could use any review data you wanted in this step - just be careful to make sure your metadata is aligned with your choice.

In [7]:
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw1.csv -O john_wick_1.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw2.csv -O john_wick_2.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw3.csv -O john_wick_3.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw4.csv -O john_wick_4.csv

--2025-03-04 13:51:14--  https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw1.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.110.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19628 (19K) [text/plain]
Saving to: ‘john_wick_1.csv’


2025-03-04 13:51:14 (21.8 MB/s) - ‘john_wick_1.csv’ saved [19628/19628]

--2025-03-04 13:51:14--  https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw2.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14747 (14K) [text/plain]
Saving to: ‘john_wick_2.csv’


2025-03-04 13:51:15 (21.3 MB/s) - ‘john_wick_2.csv’

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

- Self-Query: Wants as much metadata as we can provide
- Time-weighted: Wants temporal data

> NOTE: While we're creating a temporal relationship based on when these movies came out for illustrative purposes, it needs to be clear that the "time-weighting" in the Time-weighted Retriever is based on when the document was *accessed* last - not when it was created.

In [8]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

documents = []

for i in range(1, 5):
  loader = CSVLoader(
      file_path=f"john_wick_{i}.csv",
      metadata_columns=["Review_Date", "Review_Title", "Review_Url", "Author", "Rating"]
  )

  movie_docs = loader.load()
  for doc in movie_docs:

    # Add the "Movie Title" (John Wick 1, 2, ...)
    doc.metadata["Movie_Title"] = f"John Wick {i}"

    # convert "Rating" to an `int`, if no rating is provided - assume 0 rating
    doc.metadata["Rating"] = int(doc.metadata["Rating"]) if doc.metadata["Rating"] else 0

    # newer movies have a more recent "last_accessed_at"
    doc.metadata["last_accessed_at"] = datetime.now() - timedelta(days=4-i)

  documents.extend(movie_docs)

Let's look at an example document to see if everything worked as expected!

In [9]:
documents[0]

Document(metadata={'source': 'john_wick_1.csv', 'row': 0, 'Review_Date': '6 May 2015', 'Review_Title': ' Kinetic, concise, and stylish; John Wick kicks ass.\n', 'Review_Url': '/review/rw3233896/?ref_=tt_urv', 'Author': 'lnvicta', 'Rating': 8, 'Movie_Title': 'John Wick 1', 'last_accessed_at': datetime.datetime(2025, 3, 1, 13, 51, 15, 794882)}, page_content=": 0\nReview: The best way I can describe John Wick is to picture Taken but instead of Liam Neeson it's Keanu Reeves and instead of his daughter it's his dog. That's essentially the plot of the movie. John Wick (Reeves) is out to seek revenge on the people who took something he loved from him. It's a beautifully simple premise for an action movie - when action movies get convoluted, they get bad i.e. A Good Day to Die Hard. John Wick gives the viewers what they want: Awesome action, stylish stunts, kinetic chaos, and a relatable hero to tie it all together. John Wick succeeds in its simplicity.")

## Task 3: Setting up Qdrant!

Now that we have our documents, let's create a Qdrant VectorStore with the collection name "JohnWick".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [10]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    documents,
    embeddings,
    location=":memory:",
    collection_name="JohnWick"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply look at each review as a document, and use cosine-similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later

In [11]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})
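
To make "use cosine-similarity to fetch the `k` most relevant documents" concrete, here is a minimal sketch in plain Python. The three-dimensional vectors are made-up toy values (not real `text-embedding-3-small` output), and `top_k` is a hypothetical helper - the vector store does the equivalent work for us internally.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]

# Toy "embeddings" (assumed values for illustration only).
toy_doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
toy_query_vec = [1.0, 0.05, 0.0]

nearest = top_k(toy_query_vec, toy_doc_vecs, k=2)
print(nearest)
```

The two vectors pointing in nearly the same direction as the query come back first; the orthogonal ones are left out.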

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [12]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-3.5-turbo` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [13]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI()

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [14]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and chaining it into the naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # "context"  : is assigned to a RunnablePassthrough object (will not be called or considered in the next step)
    #              by getting the value of the "context" key from the previous step
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)
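
If the LCEL syntax feels opaque, the data flow of the chain above can be sketched in plain Python. This is only an illustration under stated assumptions: `fake_retriever`, `fake_llm`, and `TOY_TEMPLATE` are hypothetical stand-ins for `naive_retriever`, `chat_model`, and `rag_prompt`, and no LangChain code is involved.

```python
def fake_retriever(question):
    """Hypothetical stand-in for naive_retriever: returns a list of 'documents'."""
    return [f"toy document about: {question}"]

def fake_llm(prompt_text):
    """Hypothetical stand-in for chat_model: returns a canned string."""
    return f"ANSWER (prompt was {len(prompt_text)} characters long)"

TOY_TEMPLATE = "Query:\n{question}\n\nContext:\n{context}\n"

def run_chain(inputs):
    # Step 1: build a dict with the retrieved context and the original question.
    state = {
        "context": fake_retriever(inputs["question"]),
        "question": inputs["question"],
    }
    # Step 2: RunnablePassthrough.assign(context=itemgetter("context")) keeps every
    # key and re-assigns "context" to its own value, so the state is unchanged.
    state = {**state, "context": state["context"]}
    # Step 3: format the prompt, call the model, and pass the context through
    # so callers can inspect which documents were used.
    prompt_text = TOY_TEMPLATE.format(**state)
    return {"response": fake_llm(prompt_text), "context": state["context"]}

result = run_chain({"question": "Did people generally like John Wick?"})
print(sorted(result.keys()))
```

Returning the context alongside the response is the design choice that matters here: it lets us evaluate the retriever separately from the generator later on.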

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [15]:
naive_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Yes, people generally liked John Wick based on the positive reviews and high ratings given by various critics and viewers.'

In [16]:
naive_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

"Yes, there is a review with a rating of 10. Here is the URL to that review:\n\nReview Title: A Masterpiece & Brilliant Sequel\nURL: '/review/rw4854296/?ref_=tt_urv'"

In [17]:
naive_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'John Wick is a former hitman who comes out of retirement to seek vengeance against gangsters who killed his dog and took everything from him. This leads to a violent and action-packed journey where he faces off against various adversaries in his quest for retribution.'

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model) which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.

This retriever is very straightforward to set up! Let's see it in action below!


In [18]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(documents)
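
For intuition about what is being scored, here is a minimal sketch of a BM25-style ranking function in plain Python. It is an illustration under assumptions, not the implementation that `rank_bm25` uses: the whitespace tokenizer, the `k1` and `b` values, and the smoothed IDF variant are all choices made for this sketch.

```python
import math
from collections import Counter

def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score every document in `corpus` against `query` with a BM25-style formula."""
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for doc in tokenized:
        doc_freq.update(set(doc))

    scores = []
    for doc in tokenized:
        term_freq = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher weight (smoothed IDF, always positive).
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Term frequency saturates via k1 and is normalized by document length via b.
            tf = term_freq[term]
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores

toy_corpus = [
    "john wick is a stylish action movie",
    "the dog was the best part",
    "a slow romantic drama",
]
toy_scores = bm25_scores("action movie", toy_corpus)
best_index = max(range(len(toy_scores)), key=toy_scores.__getitem__)
print(best_index, toy_scores)
```

Only documents that share words with the query receive a non-zero score, which is why BM25 can miss reviews that express the same idea in different words.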

We'll construct the same chain - only changing the retriever.

In [19]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [20]:
bm25_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"It seems that opinions on the John Wick movies are varied. Some people really enjoyed the first movie, praising its action sequences and unique world-building. However, others found the fourth installment to be lacking in substance and weaker compared to the previous films. So, it's safe to say that not everyone generally liked John Wick movies."

In [21]:
bm25_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

"I don't know."

In [22]:
bm25_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In John Wick, the main character John Wick, played by Keanu Reeves, goes on a vengeful killing spree after his dog, given to him by his deceased wife, is killed by some Russian gangsters. It is known for its intense and beautifully choreographed action sequences.'

It's not clear that this is better or worse - but the `I don't know` isn't great!

## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [23]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-english-v3.0")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)
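
The retrieve-then-compress idea can be sketched without any external services. In the sketch below, `toy_first_stage` and `toy_relevance` are hypothetical stand-ins: the first plays the role of the base retriever, and the second is a crude word-overlap score standing in for a real reranking model such as Cohere's.

```python
def retrieve_then_rerank(query, first_stage, score_fn, top_n):
    """Fetch a broad candidate set, re-score each candidate, keep the best top_n."""
    candidates = first_stage(query)
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_n]

def toy_first_stage(query):
    """Hypothetical base retriever: returns a fixed, loosely related candidate list."""
    return [
        "John Wick is a great action film",
        "The popcorn was stale",
        "Keanu Reeves stars in John Wick",
        "Parking at the cinema was expensive",
    ]

def toy_relevance(query, doc):
    """Hypothetical reranker: fraction of query words that appear in the document."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words)

compressed = retrieve_then_rerank(
    "john wick action", toy_first_stage, toy_relevance, top_n=2
)
print(compressed)
```

The base retriever is tuned for recall (fetch plenty of candidates), while the reranker is tuned for precision (keep only the best few), which is why we asked the naive retriever for `k=10` earlier.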

Let's create our chain again, and see how this does!

In [24]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [25]:
contextual_compression_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Yes, people generally liked John Wick.'

In [26]:
contextual_compression_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

"Yes, there is a review with a rating of 10. Here is the URL to that review:\n- Review URL: '/review/rw4854296/?ref_=tt_urv'"

In [27]:
contextual_compression_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"In John Wick, after resolving his issues with the Russian mafia, John Wick is visited by mobster Santino D'Antonio, who asks him to kill his sister Gianna D'Antonio in Rome. When John completes the assignment, Santino puts a seven-million dollar contract on him, leading to professional killers coming after him. Wick promises to kill Santino, who is not protected by his marker anymore."

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` number of new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context

So, how hard is it to set up? Not bad! Let's see it below!



In [28]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)
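
The three steps listed above can be sketched in plain Python. Everything here is a hypothetical stand-in: `toy_generate_queries` replaces the LLM call that rewrites the question, and `toy_retrieve` replaces the underlying retriever with a fixed lookup table.

```python
def multi_query_retrieve(question, generate_queries, retrieve):
    """Retrieve for each generated query and return the unique union, in first-seen order."""
    seen = set()
    unique_docs = []
    for query in generate_queries(question):
        for doc in retrieve(query):
            if doc not in seen:
                seen.add(doc)
                unique_docs.append(doc)
    return unique_docs

# Hypothetical lookup table standing in for a real retriever.
TOY_RESULTS = {
    "did people like john wick?": ["review_a", "review_b"],
    "was john wick well received?": ["review_b", "review_c"],
    "what do reviews say about john wick?": ["review_c", "review_d"],
}

def toy_generate_queries(question):
    """Hypothetical stand-in for the LLM that rewrites the user's question."""
    return list(TOY_RESULTS.keys())

def toy_retrieve(query):
    return TOY_RESULTS.get(query, [])

merged = multi_query_retrieve(
    "Did people generally like John Wick?", toy_generate_queries, toy_retrieve
)
print(merged)
```

Note the cost trade-off: every user question now triggers one extra LLM call plus one retrieval per generated query, and the context handed to the generator can grow well beyond `k` documents.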

In [29]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [30]:
multi_query_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"Yes, people generally liked John Wick based on the reviews provided. The movie was praised for its slickness, brilliant action sequences, Keanu Reeves' performance, and overall entertainment value. Reviewers highlighted the fast-paced action, cool dialogue, intense violence, and impressive choreography, making it a highly recommended film for action buffs and general movie lovers."

In [31]:
multi_query_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

"I don't know if any reviews have a rating of 10."

In [32]:
multi_query_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"In the John Wick movies, Keanu Reeves' character, John Wick, is a retired assassin who sets out on a path of vengeance and retribution after facing personal losses. In the first movie, he seeks revenge after gangsters kill his dog and steal his car. The sequel, John Wick 2, follows John as he is pulled back into the world of assassins to fulfill an old debt and faces off against numerous killers. The movies are known for their stylish action sequences, intricate choreography, and impressive fight scenes."

## Task 8: Parent Document Retriever

A "small-to-big" strategy - the Parent Document Retriever works based on a simple strategy:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent chunks".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
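
Here is a minimal sketch of the "search small, return big" idea in plain Python. It assumes a sentence-based child splitter and a word-overlap score instead of embeddings, and all names (`build_stores`, `retrieve_parents`, `toy_parents`) are hypothetical.

```python
def tokens(text):
    """Lowercased word set with trailing punctuation stripped."""
    return {word.strip(".,!?").lower() for word in text.split()}

def build_stores(parent_docs):
    """Keep full parents in a plain dict; index each child chunk with its parent id."""
    parent_store = dict(parent_docs)
    child_index = []
    for parent_id, text in parent_docs.items():
        for chunk in text.split(". "):  # assumed splitter: one sentence per child chunk
            child_index.append((chunk, parent_id))
    return parent_store, child_index

def retrieve_parents(query, parent_store, child_index, k=1):
    """Rank child chunks against the query, then return their unique parents."""
    query_tokens = tokens(query)
    ranked = sorted(
        child_index,
        key=lambda item: len(query_tokens & tokens(item[0])),
        reverse=True,
    )
    parent_ids = []
    for _, parent_id in ranked[:k]:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [parent_store[parent_id] for parent_id in parent_ids]

toy_parents = {
    "p1": "John Wick loses his dog. He then returns to his old life as a hitman.",
    "p2": "The soundtrack is loud. The cinematography uses a lot of neon lighting.",
}
parent_store, child_index = build_stores(toy_parents)
hits = retrieve_parents("neon lighting", parent_store, child_index, k=1)
print(hits)
```

The match happens on a single sentence, but the caller receives the whole review, including the surrounding sentences that the small chunk alone would have dropped.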

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [33]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = documents
child_splitter = RecursiveCharacterTextSplitter(chunk_size=200)

We'll need to set up a new Qdrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension, you'll need to change this if you're using a different embedding model.

In [34]:
client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = Qdrant(
    collection_name="full_documents", embeddings=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)

Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [35]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [36]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [37]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [38]:
parent_document_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"People's opinions on John Wick seem to vary. Some really enjoy the movie and think it's the best in the series, while others find it horrible and criticize several aspects of it. So, it's safe to say that not everyone generally liked John Wick."

In [39]:
parent_document_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10 for the movie "John Wick 3". The URL to that review is: /review/rw4854296/?ref_=tt_urv.'

In [40]:
parent_document_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"In John Wick, Keanu Reeves plays the character of John Wick, a retired assassin who comes out of retirement when someone kills his dog. In the second movie, John Wick's car is stolen, leading to a lot of carnage. He is then called upon to pay off an old debt by helping Ian McShane take over the Assassin's Guild, which takes him to various locations where he has to kill a large number of assassins."

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to answer the performance question more rigorously.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever simply takes two or more retrievers and combines their retrieved documents based on a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.

In [41]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)
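
To see how rank fusion combines several result lists, here is a minimal sketch of weighted Reciprocal Rank Fusion in plain Python. The constant `c=60` follows the RRF paper linked above; treating the weights as simple multipliers is an assumption of this sketch, and the document ids are toy values.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, weights, c=60):
    """Fuse several ranked lists: each appearance adds weight / (rank + c) to a document's score."""
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Toy rankings from two hypothetical retrievers.
toy_ranking_one = ["a", "b", "c"]
toy_ranking_two = ["b", "c", "d"]

fused = reciprocal_rank_fusion([toy_ranking_one, toy_ranking_two], weights=[0.5, 0.5])
print(fused)
```

Documents that several retrievers agree on rise to the top even if no single retriever ranked them first, which is the main appeal of fusing by rank rather than by raw score.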

We'll pack *all* of these retrievers together in an ensemble.

In [42]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [43]:
ensemble_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Yes, people generally liked John Wick based on the reviews provided.'

In [44]:
ensemble_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10. Here is the URL to that review:\n- [A Masterpiece & Brilliant Sequel](/review/rw4854296/?ref_=tt_urv)'

In [45]:
ensemble_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In John Wick, an ex-hitman comes out of retirement to seek vengeance on the gangsters who killed his dog and took everything from him. The story revolves around his quest for revenge and the violent actions he takes to achieve it.'

## Task 10: Semantic Chunking

While this is not a retrieval method - it *is* an effective way of increasing retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
  - `percentile`
  - `standard_deviation`
  - `interquartile`
  - `gradient`
3. Each sequence of related sentences is kept as a document!

Let's see how to implement this!

> NOTE: You do not need to run this cell if you're running this locally

In [46]:
!pip install -qU langchain_experimental

We'll use the `percentile` thresholding method for this example which will:

Calculate all distances between sentences, and then break apart sequences of sentences wherever the distance exceeds a given percentile among all distances.

In [47]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)
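
The percentile rule can be sketched in a few lines of plain Python. This is a simplified illustration, not `SemanticChunker` itself: the two-dimensional vectors are made-up toy embeddings, the sentences are placeholder labels, and `percentile` is a small hypothetical helper using linear interpolation.

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norm

def percentile(values, pct):
    """Linear-interpolation percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * pct / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)

def semantic_chunks(sentences, vectors, pct):
    """Start a new chunk wherever the distance to the previous sentence exceeds the threshold."""
    distances = [
        cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)
    ]
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            chunks.append(current)
            current = []
        current.append(sentence)
    chunks.append(current)
    return chunks

# Toy data: the first three vectors point one way, the last two another way.
toy_sentences = ["A", "B", "C", "D", "E"]
toy_vectors = [[1.0, 0.0], [0.99, 0.1], [0.98, 0.15], [0.0, 1.0], [0.1, 0.99]]

toy_chunks = semantic_chunks(toy_sentences, toy_vectors, pct=75)
print(toy_chunks)
```

The only large jump in distance is between "C" and "D", so that is where the single break lands.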

Now we can split our documents.

In [48]:
semantic_documents = semantic_chunker.split_documents(documents)

Let's create a new vector store.

In [49]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="JohnWickSemantic"
)

We'll use naive retrieval for this example.

In [50]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [51]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [52]:
semantic_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'People generally liked John Wick based on the reviews provided.'

In [53]:
semantic_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10 for the movie "John Wick 3." Here is the URL to that review: \'/review/rw4854296/?ref_=tt_urv\'.'

In [54]:
semantic_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In John Wick, the main character is seeking revenge on the people who took something he loved from him - his dog. This sets off a chain of events where John Wick is dragged into a series of chaotic and violent encounters with those responsible for his loss.'

# 🤝 Breakout Room Part #2

#### 🏗️ Activity #1

Your task is to evaluate the various Retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
 - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever specific* Ragas metrics
 - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

##### HINTS:

- LangSmith provides detailed information about latency and cost.
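
If you want a quick local latency number alongside the LangSmith traces, a small timing helper is one option. The sketch below is only a starting point: `time_retriever` and `toy_retrieve` are hypothetical names, and in the notebook you would pass something like a retriever's `invoke` method in place of the stub.

```python
import statistics
import time

def time_retriever(retrieve, questions, repeats=3):
    """Call `retrieve` on every question `repeats` times and summarize wall-clock latency."""
    timings = []
    for question in questions:
        for _ in range(repeats):
            start = time.perf_counter()
            retrieve(question)
            timings.append(time.perf_counter() - start)
    return {
        "calls": len(timings),
        "mean_s": statistics.mean(timings),
        "max_s": max(timings),
    }

def toy_retrieve(question):
    """Hypothetical stand-in for a real retriever call."""
    return [f"toy document for: {question}"]

latency_stats = time_retriever(
    toy_retrieve,
    ["Did people generally like John Wick?", "What happened in John Wick?"],
    repeats=3,
)
print(latency_stats)
```

Local timings like these include network variance for the API-backed retrievers, so compare averages over several runs rather than single calls.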

**Loading all necessary packages**

In [55]:
!pip install -qU ragas==0.2.10

In [56]:
!pip install -qU langchain-community==0.3.14 langchain-openai==0.2.14 unstructured==0.16.12 langgraph==0.2.61 langchain-qdrant==0.2.0

**Generating Synthetic Test Data**

In [57]:
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

**Instantiate the test set generator (it builds the knowledge graph from our documents) and generate 10 test samples.**

In [58]:
from ragas.testset import TestsetGenerator

generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)
dataset = generator.generate_with_langchain_docs(documents, testset_size=10)


In [59]:
dataset.to_pandas()

| | user_input | reference_contexts | reference | synthesizer_name |
|---|---|---|---|---|
| 0 | Liam Neeson in John Wick? | `[: 0\nReview: The best way I can describe John...` | Liam Neeson is not in John Wick; instead, Kean... | single_hop_specifc_query_synthesizer |
| 1 | What is the general public's opinion on John W... | `[: 2\nReview: With the fourth installment scor...` | The general public seems to love the John Wick... | single_hop_specifc_query_synthesizer |
| 2 | Who is Chad Stahleski? | `[: 3\nReview: John wick has a very simple reve...` | Chad Stahelski is the director of John Wick, k... | single_hop_specifc_query_synthesizer |
| 3 | What role do Russian mobsters play in John Wic... | `[: 4\nReview: Though he no longer has a taste ...` | In John Wick's story, Russian mobsters are the... | single_hop_specifc_query_synthesizer |
| 4 | Who is the Bogeyman in the context of John Wick? | `[: 5\nReview: Ultra-violent first entry with l...` | In the context of John Wick, the Bogeyman refe... | single_hop_specifc_query_synthesizer |
| 5 | What consequences does John face in 'John Wick... | `[<1-hop>\n\n: 24\nReview: John Wick: Chapter 3...` | In 'John Wick: Chapter 3 - Parabellum', John f... | multi_hop_specific_query_synthesizer |
| 6 | How does the evolution of John Wick's characte... | `[<1-hop>\n\n: 0\nReview: It is 5 years since t...` | In 'Parabellum', John Wick's character evolves... | multi_hop_specific_query_synthesizer |
| 7 | Why was John Wick 3 a disappointment compared ... | `[<1-hop>\n\n: 16\nReview: Ok, so I got back fr...` | John Wick 3 was a disappointment because, desp... | multi_hop_specific_query_synthesizer |
| 8 | What are the contrasting reviews of John Wick:... | `[<1-hop>\n\n: 13\nReview: Following on from tw...` | The reviews of John Wick: Chapter 3 - Parabell... | multi_hop_specific_query_synthesizer |
| 9 | In what ways does John Wick Chapter 2 maintain... | `[<1-hop>\n\n: 23\nReview: I love me a bit of t...` | John Wick Chapter 2 maintains its status as pu... | multi_hop_specific_query_synthesizer |


In [60]:
def chain_evaluation(dataset, chain, tag):
    """Run `chain` on every test sample, then score the outputs with Ragas."""
    from ragas import EvaluationDataset, RunConfig, evaluate
    from ragas.llms import LangchainLLMWrapper
    from ragas.metrics import (
        LLMContextRecall, Faithfulness, FactualCorrectness,
        ResponseRelevancy, ContextEntityRecall, NoiseSensitivity,
    )

    # Fill in each sample's response and retrieved contexts in place.
    # The tag is attached to the trace so runs can be filtered per retriever later.
    for test_row in dataset:
        response = chain.invoke({"question": test_row.eval_sample.user_input}, {"tags": [tag]})
        test_row.eval_sample.response = response["response"].content
        test_row.eval_sample.retrieved_contexts = [context.page_content for context in response["context"]]

    evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))
    evaluation_dataset = EvaluationDataset.from_pandas(dataset.to_pandas())

    # Allow up to 360 seconds per evaluation job before it times out.
    custom_run_config = RunConfig(timeout=360)

    result = evaluate(
        dataset=evaluation_dataset,
        metrics=[LLMContextRecall(), Faithfulness(), FactualCorrectness(), ResponseRelevancy(), ContextEntityRecall(), NoiseSensitivity()],
        llm=evaluator_llm,
        run_config=custom_run_config,
    )
    return result
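
`chain_evaluation` reads `response["response"].content` and `response["context"][i].page_content`, so every chain passed to it has to return a dictionary of that shape. The sketch below illustrates that contract with stand-in classes; `FakeMessage`, `FakeDocument`, `FakeChain`, and `extract_fields` are hypothetical names used only for illustration and are not part of LangChain or Ragas.

```python
from dataclasses import dataclass


@dataclass
class FakeMessage:
    """Hypothetical stand-in for a chat model message (has `.content`)."""
    content: str


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a retrieved document (has `.page_content`)."""
    page_content: str


class FakeChain:
    """Hypothetical chain that returns the shape chain_evaluation expects."""

    def invoke(self, inputs, config=None):
        question = inputs["question"]
        return {
            "response": FakeMessage(content=f"Echo: {question}"),
            "context": [FakeDocument(page_content="Review: a placeholder review")],
        }


def extract_fields(output):
    """Pull out the two fields that chain_evaluation stores on each sample."""
    answer = output["response"].content
    contexts = [doc.page_content for doc in output["context"]]
    return answer, contexts


answer, contexts = extract_fields(
    FakeChain().invoke({"question": "Who directed John Wick?"}, {"tags": ["demo"]})
)
print(answer)
print(contexts)
```

A chain that returns a plain string, or that uses different keys, would need a small adapter before it could be evaluated this way.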

In [61]:
naive_result = chain_evaluation(dataset,naive_retrieval_chain,'naive_retriever')
naive_result

{'context_recall': 0.8250, 'faithfulness': 0.6524, 'factual_correctness': 0.3570, 'answer_relevancy': 0.6787, 'context_entity_recall': 0.4279, 'noise_sensitivity_relevant': 0.2433}

In [62]:
bm25_result = chain_evaluation(dataset,bm25_retrieval_chain,'bm25_retriever')
bm25_result

{'context_recall': 0.7583, 'faithfulness': 0.5681, 'factual_correctness': 0.3950, 'answer_relevancy': 0.5546, 'context_entity_recall': 0.4448, 'noise_sensitivity_relevant': 0.2167}

In [63]:
contextual_compression_result = chain_evaluation(dataset,contextual_compression_retrieval_chain,'contextual_compression_retriever')
contextual_compression_result

{'context_recall': 0.7000, 'faithfulness': 0.7162, 'factual_correctness': 0.4200, 'answer_relevancy': 0.6747, 'context_entity_recall': 0.4636, 'noise_sensitivity_relevant': 0.1485}

In [64]:
multi_query_result = chain_evaluation(dataset,multi_query_retrieval_chain, 'multi_query_retriever')
multi_query_result

ERROR:ragas.executor:Exception raised in Job[53]: TimeoutError()


{'context_recall': 0.8583, 'faithfulness': 0.7439, 'factual_correctness': 0.4190, 'answer_relevancy': 0.5787, 'context_entity_recall': 0.4562, 'noise_sensitivity_relevant': 0.3075}

In [65]:
parent_document_result = chain_evaluation(dataset,parent_document_retrieval_chain,'parent_document_retriever')
parent_document_result

{'context_recall': 0.7667, 'faithfulness': 0.7917, 'factual_correctness': 0.5120, 'answer_relevancy': 0.7667, 'context_entity_recall': 0.4398, 'noise_sensitivity_relevant': 0.3348}

In [66]:
ensemble_result = chain_evaluation(dataset,ensemble_retrieval_chain,'ensemble_retriever')
ensemble_result

ERROR:ragas.executor:Exception raised in Job[50]: AttributeError('StringIO' object has no attribute 'decomposed_claims')
ERROR:ragas.executor:Exception raised in Job[11]: TimeoutError()
ERROR:ragas.executor:Exception raised in Job[41]: TimeoutError()
ERROR:ragas.executor:Exception raised in Job[53]: TimeoutError()
ERROR:ragas.executor:Exception raised in Job[59]: TimeoutError()


{'context_recall': 0.8500, 'faithfulness': 0.9590, 'factual_correctness': 0.4922, 'answer_relevancy': 0.6794, 'context_entity_recall': 0.4358, 'noise_sensitivity_relevant': 0.3567}

In [67]:
semantic_result = chain_evaluation(dataset,semantic_retrieval_chain,'semantic_retriever')
semantic_result

{'context_recall': 0.8667, 'faithfulness': 0.7714, 'factual_correctness': 0.4060, 'answer_relevancy': 0.8666, 'context_entity_recall': 0.4336, 'noise_sensitivity_relevant': 0.3316}
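
With seven result dictionaries it is easy to misread which retriever leads on which metric. As a minimal sketch, the scores printed above can be copied into a plain dictionary and scanned per metric. The short keys (`naive`, `bm25`, and so on) and the helper `best_per_metric` are illustrative names, and the code treats noise sensitivity as lower-is-better, which is how Ragas defines that metric.

```python
# Scores copied from the evaluation outputs printed above.
scores = {
    "naive": {"context_recall": 0.8250, "faithfulness": 0.6524, "factual_correctness": 0.3570,
              "answer_relevancy": 0.6787, "context_entity_recall": 0.4279, "noise_sensitivity_relevant": 0.2433},
    "bm25": {"context_recall": 0.7583, "faithfulness": 0.5681, "factual_correctness": 0.3950,
             "answer_relevancy": 0.5546, "context_entity_recall": 0.4448, "noise_sensitivity_relevant": 0.2167},
    "contextual_compression": {"context_recall": 0.7000, "faithfulness": 0.7162, "factual_correctness": 0.4200,
                               "answer_relevancy": 0.6747, "context_entity_recall": 0.4636, "noise_sensitivity_relevant": 0.1485},
    "multi_query": {"context_recall": 0.8583, "faithfulness": 0.7439, "factual_correctness": 0.4190,
                    "answer_relevancy": 0.5787, "context_entity_recall": 0.4562, "noise_sensitivity_relevant": 0.3075},
    "parent_document": {"context_recall": 0.7667, "faithfulness": 0.7917, "factual_correctness": 0.5120,
                        "answer_relevancy": 0.7667, "context_entity_recall": 0.4398, "noise_sensitivity_relevant": 0.3348},
    "ensemble": {"context_recall": 0.8500, "faithfulness": 0.9590, "factual_correctness": 0.4922,
                 "answer_relevancy": 0.6794, "context_entity_recall": 0.4358, "noise_sensitivity_relevant": 0.3567},
    "semantic": {"context_recall": 0.8667, "faithfulness": 0.7714, "factual_correctness": 0.4060,
                 "answer_relevancy": 0.8666, "context_entity_recall": 0.4336, "noise_sensitivity_relevant": 0.3316},
}

# Metrics where a smaller value is the better result.
LOWER_IS_BETTER = {"noise_sensitivity_relevant"}


def best_per_metric(scores):
    """Return {metric: retriever_name} for the best retriever on each metric."""
    metrics = next(iter(scores.values())).keys()
    best = {}
    for metric in metrics:
        pick = min if metric in LOWER_IS_BETTER else max
        best[metric] = pick(scores, key=lambda name: scores[name][metric])
    return best


best = best_per_metric(scores)
for metric, name in best.items():
    print(f"{metric:<28} {name:<24} {scores[name][metric]:.4f}")
```

Keep in mind that these numbers come from only 10 synthetic samples, and some evaluation jobs timed out, so small differences between retrievers should not be over-interpreted.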

In [68]:
from langsmith import Client

# Initialize the client
client = Client()

# Fetch the runs in a project that carry the given tag
def get_runs_by_tag(tag, project_name="retriever_comparison", limit=None):
    try:
        runs = client.list_runs(
            project_name=project_name,
            filter=f'has(tags, "{tag}")',
            limit=limit,  # None returns every matching run
        )
        return list(runs)  # list_runs returns an iterator
    except Exception as e:
        print(f"Error retrieving runs: {e}")
        return []

# Get runs for each retriever type
retriever_types = ["naive_retriever", "bm25_retriever", "contextual_compression_retriever",
                  "multi_query_retriever", "parent_document_retriever",
                  "ensemble_retriever", "semantic_retriever"]

# Create a dictionary to store results
results = {}

for tag in retriever_types:
    runs = get_runs_by_tag(tag)

    if runs:

        # Calculate average latency and total cost
        total_latency = 0
        total_cost = 0
        count = 0

        for run in runs:
            # Only compute latency if both timestamps are available
            if run.start_time is not None and run.end_time is not None:
                latency = (run.end_time - run.start_time).total_seconds()
                total_latency += latency
                count += 1
            else:
                print(f"Skipping run {run.id} due to missing time info.")

            # Extract cost if available
            if hasattr(run, 'total_cost') and run.total_cost is not None:
                total_cost += float(run.total_cost)


        # Calculate averages
        avg_latency = total_latency / count if count > 0 else 0
        avg_cost = total_cost / count if count > 0 else 0

        # Store results
        results[tag] = {
            'avg_latency': avg_latency,
            'total_cost': total_cost,
            'avg_cost': avg_cost,
            'run_count': count
        }
        print("Metrics for ", tag)
        print(f"Average Latency: {avg_latency:.2f} seconds")
        print(f"Total Cost: ${total_cost:.6f}")
        print(f"Average Cost per Run: ${avg_cost:.6f}")
        print("-" * 85)
    else:
        print(f"No runs found with tag '{tag}'")

# Print summary table
print("\n--- RETRIEVER COMPARISON SUMMARY ---")
print(f"{'Retriever Type':<30} {'Avg Latency (s)':<15} {'Avg Cost ($)':<15} {'Total Cost ($)':<15} {'Run Count':<10}")
print("-" * 85)
for tag, data in results.items():
    print(f"{tag:<30} {data['avg_latency']:<15.2f} {data['avg_cost']:<15.6f} {data['total_cost']:<15.6f} {data['run_count']:<10}")

Metrics for  naive_retriever
Average Latency: 0.51 seconds
Total Cost: $0.294702
Average Cost per Run: $0.000526
-------------------------------------------------------------------------------------
Metrics for  bm25_retriever
Average Latency: 0.30 seconds
Total Cost: $0.116416
Average Cost per Run: $0.000208
-------------------------------------------------------------------------------------
Metrics for  contextual_compression_retriever
Average Latency: 0.53 seconds
Total Cost: $0.105104
Average Cost per Run: $0.000175
-------------------------------------------------------------------------------------
Metrics for  multi_query_retriever
Average Latency: 0.87 seconds
Total Cost: $0.410182
Average Cost per Run: $0.000488
-------------------------------------------------------------------------------------
Metrics for  parent_document_retriever
Average Latency: 0.45 seconds
Total Cost: $0.071600
Average Cost per Run: $0.000128
-----------------------------------------------------------
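
Note that the loop above divides the total cost by the number of runs that have timestamps, which is not necessarily the number of runs that report a cost. The following is a minimal sketch of the same aggregation with separate denominators for latency and cost. `RunRecord` and `summarize` are hypothetical names, and `RunRecord` only mimics the three run attributes used above so that the snippet runs without a LangSmith connection.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class RunRecord:
    """Hypothetical stand-in exposing the run attributes used in the loop above."""
    start_time: Optional[datetime]
    end_time: Optional[datetime]
    total_cost: Optional[float]


def summarize(runs):
    """Average latency over timed runs and average cost over runs that report a cost."""
    latencies = [
        (r.end_time - r.start_time).total_seconds()
        for r in runs
        if r.start_time is not None and r.end_time is not None
    ]
    costs = [float(r.total_cost) for r in runs if r.total_cost is not None]
    return {
        "avg_latency": sum(latencies) / len(latencies) if latencies else 0.0,
        "total_cost": sum(costs),
        "avg_cost": sum(costs) / len(costs) if costs else 0.0,
        "timed_runs": len(latencies),
        "costed_runs": len(costs),
    }


# Made-up example data, not taken from the notebook's traces.
base = datetime(2025, 1, 1)
example_runs = [
    RunRecord(base, base + timedelta(seconds=1), 0.002),
    RunRecord(base, base + timedelta(seconds=3), None),   # timed, but no cost reported
    RunRecord(None, None, 0.004),                         # cost reported, but not timed
]
print(summarize(example_runs))
```

Whether this changes the figures reported here depends on how many tagged runs actually carry a cost, which the output above does not show.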


# FINAL ANALYSIS:


**1. naive_retriever**

**Performance:**
Good context recall (0.8250) and decent answer relevancy (0.6787).
However, factual correctness is on the lower side (0.3570).

**Latency & Cost:**
Moderate latency (0.51 s), but the highest cost per run of all seven retrievers (≈$0.000526).

**Overall:**
A baseline method with moderate performance; improvements in factual correctness might be desired.


**2. bm25_retriever**

**Performance:**
Slightly lower context recall (0.7583) but improved factual correctness (0.3950) compared to naive.
Answer relevancy is somewhat lower (0.5546).

**Latency & Cost:**
Fastest response (0.30 s) and low cost per run (≈$0.000208); only Contextual Compression and Parent Document are cheaper.

**Overall:**
Offers an excellent tradeoff when speed and cost are top priorities, although some performance aspects (like answer relevancy) might be compromised.

**3. contextual_compression_retriever**

**Performance:**

1. Lowest context recall of the group (0.7000), with mid-range faithfulness (0.7162) and factual correctness (0.4200).

2. Maintains good answer relevancy (0.6747) and has the lowest noise sensitivity (0.1485), which can be an advantage.

**Latency & Cost:**

Moderate latency (0.53 s) with the second-lowest average cost per run (≈$0.000175) and total cost, behind only Parent Document.

**Overall:**

Very cost effective and the least sensitive to noise, with mid-range faithfulness and factual correctness, though it sacrifices some recall.


**4. multi_query_retriever**

**Performance:**

1. Second-highest context recall (0.8583, just behind Semantic at 0.8667) and strong faithfulness (0.7439), though answer relevancy is lower (0.5787).

2. Factual correctness (0.4190) is comparable to others in the mid-range.

**Latency & Cost:**

Second-highest latency (0.87 s, behind only Ensemble) and a high cost per run (≈$0.000488), leading to a high total cost.

**Overall:**

Excels in retrieving a wide context but at the expense of speed and increased cost, which might affect scalability.

**5. parent_document_retriever**

**Performance:**

Stands out in factual correctness (0.5120) and answer relevancy (0.7667) with strong faithfulness (0.7917).

Context recall is moderate (0.7667), while noise sensitivity is relatively higher (0.3348).

**Latency & Cost:**

Low latency (0.45 s) and the lowest average cost per run (≈$0.000128), resulting in the lowest total cost overall.

**Overall:**
Combines excellent quality metrics with fast responses and minimal cost—making it a very attractive option if your application values high-quality, efficient retrieval.


**6. ensemble_retriever**

**Performance:**

Exceptional faithfulness (0.9590) and strong context recall (0.8500), with respectable factual correctness (0.4922).
Answer relevancy is moderate (0.6794) and noise sensitivity is higher (0.3567).

**Latency & Cost:**

Highest latency (1.05 s) and moderate cost per run (≈$0.000433) with a total cost that is on the higher side.

**Overall:**

Prioritizes quality (especially faithfulness) over speed and cost. Suitable when correctness and reliability are critical, and higher latency is acceptable.

**7. semantic_retriever**

**Performance:**

1. Excellent context recall (0.8667) and outstanding answer relevancy (0.8666), the highest of any retriever on both metrics.
2. Faithfulness (0.7714) and factual correctness (0.4060) are competitive, though not the very highest.

**Latency & Cost:**

Very fast latency (0.44 s) and moderate cost per run (≈$0.000442), with a relatively low total cost.

**Overall:**
Balances strong performance (particularly in answer relevancy) with speed and reasonable cost, making it a compelling choice for many applications.

**Summary of Tradeoffs**

**Cost:**

1. Lowest cost: Parent Document (≈$0.000128 per run), Contextual Compression (≈$0.000175), and BM25 (≈$0.000208).

2. Higher cost: Ensemble (≈$0.000433), Semantic (≈$0.000442), Multi-Query (≈$0.000488), and Naive (≈$0.000526) are roughly two to four times as expensive per run.


**Latency:**

1. Fastest: BM25 and Semantic retrievers offer the quickest responses (around 0.30–0.44 s).

2. Moderate: Naive, Contextual Compression, and Parent Document are around 0.45–0.53 s.

3. Slowest: Multi-Query and Ensemble have higher latencies (0.87–1.05 s).


**Quality determined by the metrics:**

**1. High Recall & Faithfulness:** Semantic, Multi-Query, and Ensemble lead on context recall (0.85–0.87), and Ensemble has by far the highest faithfulness (0.9590).

**2. High Answer Relevancy:** Semantic retriever stands out in delivering highly relevant answers.

**3. Balanced Quality:** Parent Document retriever provides the best mix of factual correctness, relevancy, and faithfulness while remaining cost-effective and fast.

**Conclusion:**

**For Applications Prioritizing Speed and Low Cost:**

BM25 is the fastest option and one of the cheapest. Semantic is nearly as fast, but costs roughly twice as much per run as BM25.

**For Maximum Quality (especially Faithfulness):**

Ensemble retriever offers superior quality at the expense of latency and cost.

**For Overall Balanced Performance:**

Parent Document retriever provides excellent quality, speed, and minimal cost, making it the most well-rounded option among the ones compared.
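
Which retriever is "best" depends on how much weight an application puts on quality, latency, and cost. The following sketch makes that tradeoff explicit by min-max normalizing a few of the figures reported in this analysis and combining them with user-chosen weights. The function `rank_retrievers`, the short retriever keys, and the example weights are all hypothetical; the numbers are the ones quoted above.

```python
# Figures quoted in the analysis above: two quality metrics, latency, and cost per run.
retrievers = {
    "naive": {"factual_correctness": 0.3570, "answer_relevancy": 0.6787, "latency_s": 0.51, "cost_per_run": 0.000526},
    "bm25": {"factual_correctness": 0.3950, "answer_relevancy": 0.5546, "latency_s": 0.30, "cost_per_run": 0.000208},
    "contextual_compression": {"factual_correctness": 0.4200, "answer_relevancy": 0.6747, "latency_s": 0.53, "cost_per_run": 0.000175},
    "multi_query": {"factual_correctness": 0.4190, "answer_relevancy": 0.5787, "latency_s": 0.87, "cost_per_run": 0.000488},
    "parent_document": {"factual_correctness": 0.5120, "answer_relevancy": 0.7667, "latency_s": 0.45, "cost_per_run": 0.000128},
    "ensemble": {"factual_correctness": 0.4922, "answer_relevancy": 0.6794, "latency_s": 1.05, "cost_per_run": 0.000433},
    "semantic": {"factual_correctness": 0.4060, "answer_relevancy": 0.8666, "latency_s": 0.44, "cost_per_run": 0.000442},
}

# Criteria where a smaller value is the better result.
LOWER_IS_BETTER = {"latency_s", "cost_per_run"}


def rank_retrievers(retrievers, weights):
    """Rank retriever names by a weighted sum of min-max normalized criteria."""
    totals = {name: 0.0 for name in retrievers}
    for criterion, weight in weights.items():
        values = [metrics[criterion] for metrics in retrievers.values()]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0  # avoid division by zero if all values are equal
        for name, metrics in retrievers.items():
            normalized = (metrics[criterion] - lo) / span
            if criterion in LOWER_IS_BETTER:
                normalized = 1.0 - normalized  # flip so that 1.0 is always best
            totals[name] += weight * normalized
    return sorted(totals, key=totals.get, reverse=True)


# Hypothetical weights: half on answer relevancy, a quarter each on latency and cost.
example_weights = {"answer_relevancy": 0.5, "latency_s": 0.25, "cost_per_run": 0.25}
for position, name in enumerate(rank_retrievers(retrievers, example_weights), start=1):
    print(position, name)
```

Changing the weights changes the ranking, which is the point: the conclusion above is one reasonable weighting of these metrics, not the only one.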