# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up Qdrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

> You do not need to run the following cells if you are running this notebook locally. 

In [None]:
#!pip install -qU langchain langchain-openai langchain-cohere rank_bm25


We're also going to be using [Qdrant's](https://qdrant.tech/documentation/frameworks/langchain/) (pronounced "Quadrant") vector database in in-memory mode, so we can run it locally in our Colab environment.

In [None]:
#!pip install -qU qdrant-client

We'll also provide our OpenAI key, as well as our Cohere API key.

In [10]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

In [11]:
os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

## Task 2: Data Collection and Preparation

We'll be using some reviews from the 4 movies in the John Wick franchise today to explore the different retrieval strategies.

These were obtained from IMDB, and are available in the [AIM Data Repository](https://github.com/AI-Maker-Space/DataRepository).

### Data Collection

We can simply `wget` these from GitHub.

You could use any review data you wanted in this step - just be careful to make sure your metadata is aligned with your choice.

In [12]:
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw1.csv -O john_wick_1.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw2.csv -O john_wick_2.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw3.csv -O john_wick_3.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw4.csv -O john_wick_4.csv

--2025-03-03 19:30:42--  https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw1.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.111.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19628 (19K) [text/plain]
Saving to: ‘john_wick_1.csv’


2025-03-03 19:30:42 (5.99 MB/s) - ‘john_wick_1.csv’ saved [19628/19628]

--2025-03-03 19:30:42--  https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw2.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.109.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14747 (14K) [text/plain]
Saving to: ‘john_wick_2.csv’


2025-03-03 19:30:42 (5.21 MB/s) - ‘john_wick_2.csv’

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

- Self-Query: Wants as much metadata as we can provide
- Time-weighted: Wants temporal data

> NOTE: While we're creating a temporal relationship based on when these movies came out for illustrative purposes, it needs to be clear that the "time-weighting" in the Time-weighted Retriever is based on when the document was *accessed* last - not when it was created.

In [13]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

documents = []

for i in range(1, 5):
  loader = CSVLoader(
      file_path=f"john_wick_{i}.csv",
      metadata_columns=["Review_Date", "Review_Title", "Review_Url", "Author", "Rating"]
  )

  movie_docs = loader.load()
  for doc in movie_docs:

    # Add the "Movie Title" (John Wick 1, 2, ...)
    doc.metadata["Movie_Title"] = f"John Wick {i}"

    # convert "Rating" to an `int`, if no rating is provided - assume 0 rating
    doc.metadata["Rating"] = int(doc.metadata["Rating"]) if doc.metadata["Rating"] else 0

    # newer movies have a more recent "last_accessed_at"
    doc.metadata["last_accessed_at"] = datetime.now() - timedelta(days=4-i)

  documents.extend(movie_docs)

Let's look at an example document to see if everything worked as expected!

In [14]:
documents[0]

Document(metadata={'source': 'john_wick_1.csv', 'row': 0, 'Review_Date': '6 May 2015', 'Review_Title': ' Kinetic, concise, and stylish; John Wick kicks ass.\n', 'Review_Url': '/review/rw3233896/?ref_=tt_urv', 'Author': 'lnvicta', 'Rating': 8, 'Movie_Title': 'John Wick 1', 'last_accessed_at': datetime.datetime(2025, 2, 28, 19, 30, 43, 498329)}, page_content=": 0\nReview: The best way I can describe John Wick is to picture Taken but instead of Liam Neeson it's Keanu Reeves and instead of his daughter it's his dog. That's essentially the plot of the movie. John Wick (Reeves) is out to seek revenge on the people who took something he loved from him. It's a beautifully simple premise for an action movie - when action movies get convoluted, they get bad i.e. A Good Day to Die Hard. John Wick gives the viewers what they want: Awesome action, stylish stunts, kinetic chaos, and a relatable hero to tie it all together. John Wick succeeds in its simplicity.")

## Task 3: Setting up Qdrant!

Now that we have our documents, let's create a Qdrant VectorStore with the collection name "JohnWick".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [15]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    documents,
    embeddings,
    location=":memory:",
    collection_name="JohnWick"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply look at each review as a document, and use cosine-similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later

In [16]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})
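For intuition, here is a minimal pure-Python sketch of what "fetch the `k` most similar documents by cosine similarity" means. The toy 2-dimensional vectors and the helper names (`cosine_similarity`, `top_k`) are made up for illustration; the real retriever works on OpenAI embeddings stored in Qdrant and is not implemented this way.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]

# Toy 2-d "embeddings" (hypothetical values, for illustration only)
toy_doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
toy_query = [0.9, 0.1]

nearest = top_k(toy_query, toy_doc_vecs, k=2)
print(nearest)
```

A vector database does the same ranking, but with an index that avoids comparing the query against every stored vector.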

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [17]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-3.5-turbo` as our LLM today (the code below passes no `model` argument, so it relies on the `ChatOpenAI` default), as - again - we want this to largely be about the Retrieval process.

In [18]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI()

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [19]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and chaining it into the naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # "context"  : is assigned to a RunnablePassthrough object (will not be called or considered in the next step)
    #              by getting the value of the "context" key from the previous step
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)
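If the LCEL pipe syntax is new to you, the data flow of the chain above can be approximated in plain Python. This is only a sketch: `run_rag` and the three stub callables are hypothetical stand-ins for the retriever, the prompt, and the chat model, and none of it uses LangChain.

```python
def run_rag(question, retrieve, format_prompt, call_llm):
    """Mirror the chain's data flow: retrieve -> format prompt -> generate."""
    # Step 1: look up context for the question
    context = retrieve(question)
    # Step 2: fill the prompt template with the question and the context
    prompt = format_prompt(question=question, context=context)
    # Step 3: return both the model output and the context that produced it
    return {"response": call_llm(prompt), "context": context}

# Hypothetical stubs so the sketch runs without any API keys
def toy_retrieve(question):
    return ["Review: John Wick is great."]

def toy_format_prompt(question, context):
    return f"Query:\n{question}\n\nContext:\n{context}"

def toy_llm(prompt):
    return "stub answer"

result = run_rag("Did people like John Wick?", toy_retrieve, toy_format_prompt, toy_llm)
print(result["response"], result["context"])
```

Returning the context alongside the response is what makes later evaluation possible, since we can inspect exactly which documents each retriever supplied.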

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry-picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [20]:
naive_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Yes, people generally liked the John Wick movies based on the positive reviews provided.'

In [21]:
naive_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

"Yes, there is a review with a rating of 10. Here is the URL to that review: '/review/rw4854296/?ref_=tt_urv'"

In [22]:
naive_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In the first John Wick movie, an ex-hitman comes out of retirement to seek vengeance against the gangsters who killed his dog and took everything from him. The story revolves around John Wick facing off against Russian mobsters who underestimate his lethal capacity, leading to intense action sequences and fights as he seeks retribution.'

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model) which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.

This retriever is very straightforward to set up! Let's see it happen down below!


In [23]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(documents)
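To get a feel for what BM25 scores look like, here is a small pure-Python sketch of one common variant of the formula. The whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75`, and the smoothed IDF are assumptions chosen for illustration; the `rank_bm25` package used by `BM25Retriever` may differ in those details.

```python
import math
from collections import Counter

def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score every document in corpus against query with a basic BM25 formula."""
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher weight (smoothed so it is never negative)
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Term frequency saturates, and long documents are penalised
            norm = term_freq[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Made-up mini corpus, for illustration only
toy_corpus = [
    "john wick is a stylish action movie",
    "the dog was the heart of the story",
    "a slow romantic drama",
]
toy_scores = bm25_scores("john wick action", toy_corpus)
print(toy_scores)
```

Note that documents sharing no words with the query score exactly zero, which is why a purely lexical retriever can miss reviews that are relevant but phrased differently.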

We'll construct the same chain - only changing the retriever.

In [24]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [25]:
bm25_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"People's opinions on John Wick seem to vary. Some individuals really enjoy the movie, praising its action sequences, style, and simplicity. Others, however, find it lacking in substance, character development, and plot. Overall, the reviews are mixed, so it's difficult to definitively say whether people generally liked John Wick or not."

In [26]:
bm25_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

"I don't know."

In [27]:
bm25_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In John Wick, the action is beautifully choreographed and the setup is surprisingly emotional for an action flick. Keanu Reeves stars in the movie. If you love action or even just like it, you will have a great time watching this film.'

It's not clear that this is better or worse - but the `I don't know` isn't great!

## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [28]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-english-v3.0")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)
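Conceptually, the compression step is "score every retrieved candidate against the query again, then keep only the best few". The sketch below shows that shape in plain Python. The `word_overlap` scorer is a deliberately crude, hypothetical stand-in; Cohere's Rerank is a trained model and does not work this way.

```python
def retrieve_then_rerank(query, candidates, score_fn, top_n=3):
    """Keep only the top_n candidates according to score_fn(query, doc)."""
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_n]

def word_overlap(query, doc):
    """Hypothetical stand-in scorer: fraction of query words found in the document."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words)

# Pretend these came back from a first-stage retriever with a large k
toy_candidates = [
    "great action scenes",
    "keanu reeves is great in john wick",
    "john wick avenges his dog",
    "boring plot",
]

compressed = retrieve_then_rerank("john wick dog", toy_candidates, word_overlap, top_n=2)
print(compressed)
```

This is also why the naive retriever was configured with `k=10`: the reranker can only promote documents that the first stage actually returned.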

Let's create our chain again, and see how this does!

In [29]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [30]:
contextual_compression_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"Based on the reviews provided, it seems that people generally liked John Wick. The movie was praised for its slickness, brilliant action sequences, and Keanu Reeves' performance as the titular character. Reviewers mentioned that it was a fun, violent, and entertaining film that kept viewers engaged throughout. Overall, it received positive feedback from those who watched it."

In [31]:
contextual_compression_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

"Yes, there is a review with a rating of 10. Here is the URL to that review: '/review/rw4854296/?ref_=tt_urv'"

In [32]:
contextual_compression_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"In John Wick 2, after resolving his issues with the Russian mafia, John Wick is forced back into action by mobster Santino D'Antonio to honor a marker. When John completes the task, Santino puts a contract on him leading to a chase with assassins. Wick promises to kill Santino in retaliation."

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` number of new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context.

So, how hard is it to set up? Not bad! Let's see it down below!



In [33]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)
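The three steps above boil down to "retrieve per query, then take the unique union". Here is a plain-Python sketch of that logic. `fake_generate_queries` and `fake_retrieve` are hypothetical stubs standing in for the LLM and the vector store; `MultiQueryRetriever` handles both for you.

```python
def multi_query_retrieve(question, generate_queries, retrieve):
    """Retrieve for each generated query and return the unique union in first-seen order."""
    seen = set()
    unique_docs = []
    for query in generate_queries(question):
        for doc in retrieve(query):
            if doc not in seen:
                seen.add(doc)
                unique_docs.append(doc)
    return unique_docs

toy_docs = [
    "wick was loved by fans",
    "critics praised the action",
    "fans praised keanu",
]

def fake_generate_queries(question):
    # A real implementation would ask an LLM to rephrase the question
    return ["did fans enjoy it", "what did critics say"]

def fake_retrieve(query):
    # Stand-in retriever: any document sharing at least one word with the query
    words = set(query.split())
    return [doc for doc in toy_docs if words & set(doc.split())]

merged = multi_query_retrieve("Did people like John Wick?", fake_generate_queries, fake_retrieve)
print(merged)
```

Because every rephrased query triggers its own retrieval (plus one LLM call to generate the queries), this strategy trades extra cost and latency for broader recall.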

In [34]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [35]:
multi_query_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"Based on the reviews provided, it seems that the majority of people enjoyed the John Wick movies, particularly the first one. The reviews praise the action sequences, Keanu Reeves' performance, and the overall entertainment value of the films. Therefore, it can be inferred that people generally liked John Wick."

In [36]:
multi_query_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

"Yes, there is a review with a rating of 10. Here is the URL to that review: '/review/rw4854296/?ref_=tt_urv'."

In [37]:
multi_query_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In John Wick, the main character, played by Keanu Reeves, is a retired assassin who comes out of retirement when someone kills his dog and steals his car. This sets off a series of events where he goes on a killing spree, embarking on a mission of vengeance and ultimately getting involved in the criminal underworld. He faces off against numerous adversaries and is tasked with killing specific targets to settle old debts and take over criminal organizations.'

## Task 8: Parent Document Retriever

The Parent Document Retriever follows a "small-to-big" strategy:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent chunks".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
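A toy version of "search small, return big" can be written with a dictionary and a list. Everything here is an illustrative assumption: fixed-size character chunks instead of `RecursiveCharacterTextSplitter`, a substring match instead of vector similarity, and made-up review text.

```python
def build_small_to_big(parents, chunk_size):
    """Split each parent into character chunks and remember each chunk's parent id."""
    child_index = []  # list of (child_text, parent_id) pairs
    for parent_id, text in parents.items():
        for start in range(0, len(text), chunk_size):
            child_index.append((text[start:start + chunk_size], parent_id))
    return child_index

def retrieve_parents(query, child_index, parents, match_fn):
    """Search over the child chunks, but return the de-duplicated parent documents."""
    parent_ids = []
    for child_text, parent_id in child_index:
        if match_fn(query, child_text) and parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [parents[parent_id] for parent_id in parent_ids]

# Made-up "parent documents", for illustration only
toy_parents = {
    "review-1": "The action is superb. The dog scene is heartbreaking.",
    "review-2": "Too long and too loud for my taste.",
}
toy_children = build_small_to_big(toy_parents, chunk_size=25)

# Stand-in for similarity search: does the chunk contain the query string?
found = retrieve_parents("dog", toy_children, toy_parents, lambda q, chunk: q in chunk)
print(found)
```

The match happens on a 25-character chunk, but the caller gets the whole review back, including the sentence about the action that the chunk alone would have dropped.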

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [38]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = documents
child_splitter = RecursiveCharacterTextSplitter(chunk_size=200)

We'll need to set up a new Qdrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension, you'll need to change this if you're using a different embedding model.

In [39]:
client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = Qdrant(
    collection_name="full_documents", embeddings=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)


Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [40]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [41]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [42]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [43]:
parent_document_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"People's opinions on John Wick seem to be divided. Some reviewers like the series and find it consistent and well-received, while others criticize it for its plot and action sequences. Overall, it seems that there are mixed feelings about John Wick."

In [44]:
parent_document_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10 for the movie "John Wick 3." Here is the URL to that review: \n/review/rw4854296/?ref_=tt_urv'

In [45]:
parent_document_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"In John Wick, Keanu Reeves plays the character of John Wick, a retired assassin who comes out of retirement after his dog is killed and his car is stolen, leading to a lot of carnage. Wick is then called on to pay off an old debt by helping Ian McShane take over the Assassin's Guild. Wick travels to Italy, Canada, and Manhattan, killing numerous assassins along the way."

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to answer the performance question more rigorously.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever simply takes 2, or more, retrievers and combines their retrieved documents based on a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.

In [46]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

We'll pack *all* of these retrievers together in an ensemble.
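For reference, weighted Reciprocal Rank Fusion itself is only a few lines. The sketch below assumes each retriever returns a ranked list of document ids and uses the constant `c=60` from the RRF paper; the function name `weighted_rrf` and the toy rankings are made up, and `EnsembleRetriever` may differ in details such as how duplicates are identified.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse several rankings: each document earns weight / (rank + c) per list it appears in."""
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += weight / (rank + c)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings from two retrievers
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]

fused = weighted_rrf([bm25_ranking, dense_ranking], weights=[0.5, 0.5])
print(fused)
```

Only ranks are used, never raw scores, so retrievers with incomparable scoring scales (BM25 versus cosine similarity, for example) can be combined without normalisation.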

In [47]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [48]:
ensemble_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Based on the reviews provided, it seems that people generally liked John Wick. The action sequences, character development, and overall entertainment value received positive feedback from reviewers, indicating that the movie was well-received by a number of individuals.'

In [49]:
ensemble_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

"Yes, one review has a rating of 10. Here is the URL to that review: 'https://www.imdb.com/review/rw4854296/?ref_=tt_urv'"

In [50]:
ensemble_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In John Wick, an ex-hitman comes out of retirement to seek revenge on the gangsters who killed his dog and took everything from him. The story is filled with violence, action-packed shootouts, and breathtaking fights as he navigates through a world of killers and bounty-hunting assassins.'

## Task 10: Semantic Chunking

While this is not a retrieval method - it *is* an effective way of increasing retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of a number of [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
  - `percentile`
  - `standard_deviation`
  - `interquartile`
  - `gradient`
3. Each sequence of related sentences is kept as a document!

Let's see how to implement this!

> NOTE: You do not need to run this cell if you're running this locally

In [51]:
#!pip install -qU langchain_experimental

We'll use the `percentile` thresholding method for this example, which will:

Calculate the distances between consecutive sentences, and then break the sequence wherever a distance exceeds a given percentile of all distances.
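As a rough sketch of the percentile idea (not the `SemanticChunker` implementation, which does additional work such as grouping neighbouring sentences before embedding), the splitting logic could look like this. The 2-dimensional vectors, the sentences, and the 95th-percentile cut-off are assumptions for illustration.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norms

def percentile(values, pct):
    """Linear-interpolation percentile of a non-empty list (pct between 0 and 100)."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * pct / 100
    lower, upper = math.floor(position), math.ceil(position)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)

def semantic_chunks(sentences, vectors, pct=95):
    """Start a new chunk wherever the distance to the previous sentence exceeds the percentile."""
    distances = [cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks

# Made-up sentences with toy "embeddings": the topic shifts after the second sentence
toy_sentences = ["Wick is retired.", "He lives quietly.", "The sequel is louder.", "It has more fights."]
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]

chunks = semantic_chunks(toy_sentences, toy_vectors)
print(chunks)
```

The only large distance is between the second and third sentences, so that is where the split lands.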

In [52]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)

Now we can split our documents.

In [53]:
semantic_documents = semantic_chunker.split_documents(documents)

Let's create a new vector store.

In [54]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="JohnWickSemantic"
)

We'll use naive retrieval for this example.

In [55]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [56]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [57]:
semantic_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"Based on the reviews provided, it seems like most people enjoyed the John Wick movies. The majority of reviews are positive, praising the action sequences, the lead actor's performance, and the overall entertainment value of the films. Therefore, it can be inferred that people generally liked John Wick."

In [58]:
semantic_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10 for the movie "John Wick 3". Here is the URL to that review: \'/review/rw4854296/?ref_=tt_urv\'.'

In [59]:
semantic_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In "John Wick", the main character, played by Keanu Reeves, seeks revenge on the people who took something he loved from him. Specifically, his dog is killed, leading him to go on a path of vengeance against the individuals involved.'

# 🤝 Breakout Room Part #2

#### 🏗️ Activity #1

Your task is to evaluate the various Retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
 - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever specific* Ragas metrics
 - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison between them
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

##### HINTS:

- LangSmith provides detailed information about latency and cost.
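If you also want a quick local latency number to sanity-check against LangSmith, a small timing helper is enough. This is a generic sketch: `measure_latency` is a hypothetical helper and `slow_echo` is a stand-in for something like a chain's `invoke` method.

```python
import statistics
import time

def measure_latency(fn, inputs):
    """Call fn on each input and return (median_seconds, all_durations)."""
    durations = []
    for item in inputs:
        start = time.perf_counter()
        fn(item)
        durations.append(time.perf_counter() - start)
    return statistics.median(durations), durations

# Stand-in for an expensive call; swap in your own chain's invoke function
def slow_echo(question):
    time.sleep(0.01)
    return question

median_seconds, durations = measure_latency(slow_echo, ["q1", "q2", "q3"])
print(f"median latency: {median_seconds:.3f}s over {len(durations)} calls")
```

Using the median rather than the mean keeps a single slow network round-trip from dominating the comparison between retrievers.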

In [60]:
### YOUR CODE HERE
##assignment 07 and 09  review

In [61]:
os.environ["RAGAS_APP_TOKEN"] = getpass.getpass("Please enter your Ragas API key!")

In [62]:

from uuid import uuid4

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("Enter your LangChain API Key:")
os.environ["LANGCHAIN_PROJECT"] = f"John Wick project - {uuid4().hex[0:8]}"

In [63]:
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.testset import TestsetGenerator

generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"))
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

In [64]:
generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)
dataset = generator.generate_with_langchain_docs(documents, testset_size=10)

Applying SummaryExtractor:   0%|          | 0/44 [00:00<?, ?it/s]

Applying CustomNodeFilter:   0%|          | 0/100 [00:00<?, ?it/s]

Node 0113f67a-bbfa-4b7e-9f6f-5ae3c2cbd5b1 does not have a summary. Skipping filtering.
Node 9e254ed3-2529-4311-902e-036e663dc2f7 does not have a summary. Skipping filtering.
Node 085a51b7-894b-403f-ad99-1103590b5644 does not have a summary. Skipping filtering.
Node 37876029-d4ab-4e16-9b19-bef77d8fcc99 does not have a summary. Skipping filtering.
Node 08131808-e56c-4aab-8eeb-938086c95cc8 does not have a summary. Skipping filtering.
Node e8a641f1-a26a-4664-bcfb-695d17cdab2d does not have a summary. Skipping filtering.
Node 46442a5a-5da2-436e-a992-0dc617b2ed40 does not have a summary. Skipping filtering.
Node 0ae51282-0fc2-4813-b497-980d1ee991ba does not have a summary. Skipping filtering.
Node 4c7e9dbc-90c5-4a8a-9d30-7e86529a6917 does not have a summary. Skipping filtering.
Node 1928fd15-1842-4e67-bb13-bb69be9e922e does not have a summary. Skipping filtering.
Node 18c02067-82d2-4195-8547-8f83f1e238d2 does not have a summary. Skipping filtering.
Node d39d9dde-d65b-40fb-a347-4b4953dd202b d

Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]:   0%|          | 0/244 [00:00<?, ?it/s]

Applying OverlapScoreBuilder:   0%|          | 0/1 [00:00<?, ?it/s]

Generating personas:   0%|          | 0/3 [00:00<?, ?it/s]

Generating Scenarios:   0%|          | 0/2 [00:00<?, ?it/s]

Generating Samples:   0%|          | 0/10 [00:00<?, ?it/s]

In [65]:
dataset.to_pandas()

| | user_input | reference_contexts | reference | synthesizer_name |
|---|---|---|---|---|
| 0 | What Keanu Reeves do in John Wick? | `[: 0\nReview: The best way I can describe John...` | In John Wick, Keanu Reeves plays the role of J... | single_hop_specifc_query_synthesizer |
| 1 | Why everyone like John Wick movies so much? | `[: 2\nReview: With the fourth installment scor...` | The fourth installment of John Wick is scoring... | single_hop_specifc_query_synthesizer |
| 2 | Why John Wick movie so special even if story s... | `[: 3\nReview: John wick has a very simple reve...` | John Wick is special because, despite its simp... | single_hop_specifc_query_synthesizer |
| 3 | Howw does Reeves perform in the moviee John Wick? | `[: 4\nReview: Though he no longer has a taste ...` | Savvy, indestructible Reeves looks right at ho... | single_hop_specifc_query_synthesizer |
| 4 | What motivates John Wick to come out of retire... | `[: 5\nReview: Ultra-violent first entry with l...` | In the original John Wick (2014), the ex-hit-m... | single_hop_specifc_query_synthesizer |
| 5 | What John Wick: Chapter 3 - Parabellum about a... | `[<1-hop>\n\n: 24\nReview: John Wick: Chapter 3...` | John Wick: Chapter 3 - Parabellum is about the... | multi_hop_specific_query_synthesizer |
| 6 | How does the cinematic execution of JOHN WICK ... | `[<1-hop>\n\n: 16\nReview: Ok, so I got back fr...` | JOHN WICK is praised for its relentless, pulse... | multi_hop_specific_query_synthesizer |
| 7 | What are the key elements that make Keanu Reev... | `[<1-hop>\n\n: 10\nReview: Wow what a great sur...` | Keanu Reeves' portrayal of John Wick stands ou... | multi_hop_specific_query_synthesizer |
| 8 | Why Parabellum not exciting even with John Wic... | `[<1-hop>\n\n: 2\nReview: The first three John ...` | Parabellum is not exciting because, despite Jo... | multi_hop_specific_query_synthesizer |
| 9 | How has the John Wick franchise set a new stan... | `[<1-hop>\n\n: 18\nReview: Ever since the origi...` | The John Wick franchise, particularly since th... | multi_hop_specific_query_synthesizer |


In [69]:
from ragas.llms import LangchainLLMWrapper

evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"))

In [105]:
# Set up the LangSmith dataset

from langsmith import Client

langsmith_client = Client()
naive_dataset_name = "John Wick dataset"

langsmith_dataset = langsmith_client.create_dataset(
    dataset_name=naive_dataset_name,
    description="John Wick dataset"
)

In [106]:
# Load each generated test row into the LangSmith dataset

for _, row in dataset.to_pandas().iterrows():
    langsmith_client.create_example(
        inputs={"question": row["user_input"]},
        outputs={"answer": row["reference"]},
        metadata={"context": row["reference_contexts"]},
        dataset_id=langsmith_dataset.id,
    )

In [None]:
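# Set up the QA evaluator

from langsmith.evaluation import LangChainStringEvaluator, evaluate
from langchain_openai import ChatOpenAI

def prepare_data(run, example):
    # The chains return their answer under "response" (a chat message) and the
    # dataset stores the question under "question". The keys read here must
    # match those names, otherwise the evaluator grades empty strings.
    response = run.outputs.get("response", "")
    return {
        "prediction": getattr(response, "content", str(response)),
        "reference": example.outputs.get("answer", ""),
        "input": example.inputs.get("question", ""),
    }

qa_evaluator = LangChainStringEvaluator("qa", config={"llm": evaluator_llm}, prepare_data=prepare_data)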


In [107]:
#evaluate naive retrieval chain in langsmith
evaluate(
    naive_retrieval_chain.invoke,
    data=langsmith_dataset,
    evaluators=[qa_evaluator],
    metadata={"revision_id": "naive_retrieval_chain"},
    experiment_prefix="naive_retrieval_chain_init"  
)

View the evaluation results for experiment: 'naive_retrieval_chain_init-02fcab48' at:
https://smith.langchain.com/o/239bc38e-a472-4022-97f0-84cd498894d1/datasets/d994ba33-4c35-4de2-9861-3e23dee794dc/compare?selectedSessions=8a3ccf8a-a125-46a9-b80b-0939dcd6be66





Unnamed: 0,inputs.question,outputs.response,outputs.context,error,reference.answer,feedback.correctness,execution_time,example_id,id
0,How has the John Wick franchise set a new stan...,content='The John Wick franchise has set a new...,[page_content=': 11\nReview: JOHN WICK is a ra...,,"The John Wick franchise, particularly since th...",0.0,4.295606,b30d8102-1243-4c37-a164-dba5925d89af,14847194-abfe-4f62-be17-b87316ec0f05
1,Why Parabellum not exciting even with John Wic...,"content='In the movie ""John Wick 3: Parabellum...",[page_content=': 6\nReview: Assassin John Wick...,,"Parabellum is not exciting because, despite Jo...",0.0,1.358216,a159c9f7-759e-47c4-8abf-58ef1f4668b4,bc914af1-9b3d-4bed-8494-84930c740f86
2,What are the key elements that make Keanu Reev...,"content=""Keanu Reeves' portrayal of John Wick ...","[page_content=': 9\nReview: At first glance, J...",,Keanu Reeves' portrayal of John Wick stands ou...,1.0,1.704178,645522d1-3671-4f9c-9062-c02314f3114b,64e08567-8b26-4bd7-886a-1a6581d33975
3,How does the cinematic execution of JOHN WICK ...,"content=""I'm sorry, but I don't have the speci...",[page_content=': 0\nReview: It is 5 years sinc...,,"JOHN WICK is praised for its relentless, pulse...",0.0,1.415762,6157bd4a-fc86-4b44-b8d1-103187a02d50,0067ab3e-0e6f-4f5a-853a-c9b8a24b7855
4,What John Wick: Chapter 3 - Parabellum about a...,"content=""I don't know."" additional_kwargs={'re...",[page_content=': 0\nReview: It is 5 years sinc...,,John Wick: Chapter 3 - Parabellum is about the...,1.0,0.682761,c98f1e6f-dee8-4775-a602-6ec7631bf964,822c5e29-d2a4-4340-8581-6204b454c37d
5,What motivates John Wick to come out of retire...,content='John Wick comes out of retirement in ...,[page_content=': 5\nReview: Ultra-violent firs...,,"In the original John Wick (2014), the ex-hit-m...",0.0,2.351836,e74e6cbd-431f-4e3f-97e0-b8a3d05249b8,ae53b4bf-7cad-4243-afa1-993360e5a9f3
6,Howw does Reeves perform in the moviee John Wick?,content='Reeves performs exceptionally well in...,"[page_content=': 9\nReview: At first glance, J...",,"Savvy, indestructible Reeves looks right at ho...",0.0,1.592109,ccd67b28-5964-44ae-aa03-3645db7a25d3,9be43d30-f071-41e9-97d7-11206657fc59
7,Why John Wick movie so special even if story s...,"content=""The John Wick movie is special becaus...",[page_content=': 3\nReview: John wick has a ve...,,"John Wick is special because, despite its simp...",0.0,0.830751,9cb4cf94-ec69-461e-b6b2-fe6718a870a4,445cee33-5798-474e-8e1b-6a60996f155f
8,Why everyone like John Wick movies so much?,content='The John Wick movies are loved by man...,[page_content=': 20\nReview: John Wick is some...,,The fourth installment of John Wick is scoring...,0.0,1.752694,e4fcad63-a7c8-40be-b603-1c2b2c21759a,07b8a907-02ed-4ef0-a579-3c7ba4c763e8
9,What Keanu Reeves do in John Wick?,"content=""In John Wick, Keanu Reeves plays the ...",[page_content=': 19\nReview: If you've seen th...,,"In John Wick, Keanu Reeves plays the role of J...",,1.19157,51bee111-6296-4de0-b02b-29222e27be93,44a9ff8b-1743-48be-acdc-f8a07f69bd5e


In [108]:

#evaluate bm25 retrieval chain in langsmith
evaluate(
    bm25_retrieval_chain.invoke,
    data=langsmith_dataset,
    evaluators=[qa_evaluator],
    metadata={"revision_id": "bm25_retrieval_chain"},
    experiment_prefix="bm25_retrieval_chain_init"  
)

View the evaluation results for experiment: 'bm25_retrieval_chain_init-3132c0f7' at:
https://smith.langchain.com/o/239bc38e-a472-4022-97f0-84cd498894d1/datasets/d994ba33-4c35-4de2-9861-3e23dee794dc/compare?selectedSessions=3de60d79-e1fa-4962-a200-1e4c08f15303





Unnamed: 0,inputs.question,outputs.response,outputs.context,error,reference.answer,feedback.correctness,execution_time,example_id,id
0,How has the John Wick franchise set a new stan...,content='The John Wick franchise has set a new...,[page_content=': 18\nReview: Ever since the or...,,"The John Wick franchise, particularly since th...",0.0,1.125163,b30d8102-1243-4c37-a164-dba5925d89af,2acaa36d-869f-4529-8b99-7cfb02def543
1,Why Parabellum not exciting even with John Wic...,content='The reviewer mentioned that the lack ...,[page_content=': 6\nReview: Assassin John Wick...,,"Parabellum is not exciting because, despite Jo...",0.0,0.666182,a159c9f7-759e-47c4-8abf-58ef1f4668b4,528bd3d1-c7fa-4b43-94ec-ff19e52a3b85
2,What are the key elements that make Keanu Reev...,"content=""Keanu Reeves' portrayal of John Wick ...",[page_content=': 7\nReview: John Wick (2014) i...,,Keanu Reeves' portrayal of John Wick stands ou...,1.0,1.585393,645522d1-3671-4f9c-9062-c02314f3114b,415b3895-25d8-438c-8206-eb4e6c7ecc7f
3,How does the cinematic execution of JOHN WICK ...,"content=""I don't know the specific details of ...","[page_content=': 16\nReview: Ok, so I got back...",,"JOHN WICK is praised for its relentless, pulse...",0.0,0.732518,6157bd4a-fc86-4b44-b8d1-103187a02d50,d8f1660d-3382-443d-b380-037b410b5388
4,What John Wick: Chapter 3 - Parabellum about a...,"content=""John Wick: Chapter 3 - Parabellum is ...",[page_content=': 24\nReview: John Wick: Chapte...,,John Wick: Chapter 3 - Parabellum is about the...,1.0,0.91165,c98f1e6f-dee8-4775-a602-6ec7631bf964,d0690c2c-7bd0-409f-9f2e-0225e8b38034
5,What motivates John Wick to come out of retire...,"content=""John Wick comes out of retirement in ...",[page_content=': 5\nReview: Ultra-violent firs...,,"In the original John Wick (2014), the ex-hit-m...",1.0,0.552351,e74e6cbd-431f-4e3f-97e0-b8a3d05249b8,bf60a7fe-4788-4e4d-96e0-70884f6f43ef
6,Howw does Reeves perform in the moviee John Wick?,content='Reeves performs exceptionally well in...,[page_content=': 19\nReview: John Wick: Chapte...,,"Savvy, indestructible Reeves looks right at ho...",0.0,0.53298,ccd67b28-5964-44ae-aa03-3645db7a25d3,b5e9a282-6c83-4bb6-a6ff-7d8c33f1a7c8
7,Why John Wick movie so special even if story s...,content='John Wick movies are special because ...,[page_content=': 22\nReview: Lets contemplate ...,,"John Wick is special because, despite its simp...",0.0,0.7109,9cb4cf94-ec69-461e-b6b2-fe6718a870a4,04b7487c-1f32-42e7-b61d-9f72bed0119b
8,Why everyone like John Wick movies so much?,content='People like John Wick movies so much ...,[page_content=': 16\nReview: John Wick 3 is wi...,,The fourth installment of John Wick is scoring...,0.0,0.836647,e4fcad63-a7c8-40be-b603-1c2b2c21759a,964e3172-90ed-4c5c-a97e-4419dc7dc24b
9,What Keanu Reeves do in John Wick?,"content='In John Wick 4, Keanu Reeves engages ...",[page_content=': 19\nReview: John Wick: Chapte...,,"In John Wick, Keanu Reeves plays the role of J...",,0.653222,51bee111-6296-4de0-b02b-29222e27be93,7a9ade98-e1d7-4d37-825f-d39b3670fc41


In [109]:
#evaluate contextual compression retrieval chain in langsmith
evaluate(
    contextual_compression_retrieval_chain.invoke,
    data=langsmith_dataset,
    evaluators=[qa_evaluator],
    metadata={"revision_id": "contextual_compression_retrieval_chain"},
    experiment_prefix="contextual_compression_retrieval_chain_init"  
)

View the evaluation results for experiment: 'contextual_compression_retrieval_chain_init-2e810c70' at:
https://smith.langchain.com/o/239bc38e-a472-4022-97f0-84cd498894d1/datasets/d994ba33-4c35-4de2-9861-3e23dee794dc/compare?selectedSessions=b6b82734-24f6-4f4c-8e51-30345138934b





Unnamed: 0,inputs.question,outputs.response,outputs.context,error,reference.answer,feedback.correctness,execution_time,example_id,id
0,How has the John Wick franchise set a new stan...,"content=""The John Wick franchise has set a new...",[page_content=': 18\nReview: Ever since the or...,,"The John Wick franchise, particularly since th...",0.0,2.397407,b30d8102-1243-4c37-a164-dba5925d89af,94a12881-2e16-461a-99f7-b55963a15104
1,Why Parabellum not exciting even with John Wic...,content='The reason why some viewers may find ...,[page_content=': 6\nReview: Assassin John Wick...,,"Parabellum is not exciting because, despite Jo...",0.0,1.220085,a159c9f7-759e-47c4-8abf-58ef1f4668b4,101ecf3d-6cf0-47e5-8da7-37468d22cbea
2,What are the key elements that make Keanu Reev...,"content=""Keanu Reeves' portrayal of John Wick ...","[page_content=': 9\nReview: At first glance, J...",,Keanu Reeves' portrayal of John Wick stands ou...,1.0,1.829808,645522d1-3671-4f9c-9062-c02314f3114b,37d7d6cf-63dd-4b3e-a875-ff4a92ac3573
3,How does the cinematic execution of JOHN WICK ...,"content=""I don't know the specific details of ...",[page_content=': 14\nReview: By now you know w...,,"JOHN WICK is praised for its relentless, pulse...",0.0,2.220997,6157bd4a-fc86-4b44-b8d1-103187a02d50,9c562551-b15b-474b-a5e3-cc0d99101d2e
4,What John Wick: Chapter 3 - Parabellum about a...,"content=""John Wick: Chapter 3 - Parabellum is ...",[page_content=': 24\nReview: John Wick: Chapte...,,John Wick: Chapter 3 - Parabellum is about the...,0.0,2.541477,c98f1e6f-dee8-4775-a602-6ec7631bf964,030f4df5-2217-44c2-8360-ae38ea92d7be
5,What motivates John Wick to come out of retire...,content='John Wick comes out of retirement in ...,[page_content=': 19\nReview: If you've seen th...,,"In the original John Wick (2014), the ex-hit-m...",1.0,1.141197,e74e6cbd-431f-4e3f-97e0-b8a3d05249b8,93100a5d-64f2-48aa-a0aa-9f6bc3e200bc
6,Howw does Reeves perform in the moviee John Wick?,content='Keanu Reeves performs excellently in ...,"[page_content=': 9\nReview: At first glance, J...",,"Savvy, indestructible Reeves looks right at ho...",0.0,0.918319,ccd67b28-5964-44ae-aa03-3645db7a25d3,d7a71a99-b4ee-4a66-b573-0498ae9848ca
7,Why John Wick movie so special even if story s...,content='The John Wick movie is special becaus...,[page_content=': 3\nReview: John wick has a ve...,,"John Wick is special because, despite its simp...",0.0,1.36337,9cb4cf94-ec69-461e-b6b2-fe6718a870a4,65000e9e-ecee-48ff-b8b9-b682681d642f
8,Why everyone like John Wick movies so much?,"content=""People seem to like John Wick movies ...","[page_content=': 9\nReview: At first glance, J...",,The fourth installment of John Wick is scoring...,0.0,1.586571,e4fcad63-a7c8-40be-b603-1c2b2c21759a,3e6031bd-53a8-490f-bb72-7328467be948
9,What Keanu Reeves do in John Wick?,"content='In John Wick, Keanu Reeves plays the ...",[page_content=': 19\nReview: If you've seen th...,,"In John Wick, Keanu Reeves plays the role of J...",,2.654673,51bee111-6296-4de0-b02b-29222e27be93,60aa97aa-efad-41fe-b28d-e36abb39b53c


In [110]:
#evaluate multi_query_retrieval_chain in langsmith
evaluate(
    multi_query_retrieval_chain.invoke,
    data=langsmith_dataset,
    evaluators=[qa_evaluator],
    metadata={"revision_id": "multi_query_retrieval_chain"},
    experiment_prefix="multi_query_retrieval_chain_init"  
)

View the evaluation results for experiment: 'multi_query_retrieval_chain_init-665b0d5e' at:
https://smith.langchain.com/o/239bc38e-a472-4022-97f0-84cd498894d1/datasets/d994ba33-4c35-4de2-9861-3e23dee794dc/compare?selectedSessions=7128c42b-f9a4-46e5-9179-edd8dbcd31ca





Unnamed: 0,inputs.question,outputs.response,outputs.context,error,reference.answer,feedback.correctness,execution_time,example_id,id
0,How has the John Wick franchise set a new stan...,content='The John Wick franchise has set a new...,[page_content=': 11\nReview: JOHN WICK is a ra...,,"The John Wick franchise, particularly since th...",0.0,4.569184,b30d8102-1243-4c37-a164-dba5925d89af,a2627cda-c4ba-47fb-8848-9b6fc7d86916
1,Why Parabellum not exciting even with John Wic...,"content=""The reviews suggest that John Wick 3:...",[page_content=': 6\nReview: Assassin John Wick...,,"Parabellum is not exciting because, despite Jo...",0.0,2.567012,a159c9f7-759e-47c4-8abf-58ef1f4668b4,f7673ee0-0e20-4603-b1be-a8793ba9e14a
2,What are the key elements that make Keanu Reev...,"content=""The key elements that make Keanu Reev...",[page_content=': 14\nReview: Keanu Reeve is Jo...,,Keanu Reeves' portrayal of John Wick stands ou...,0.0,2.953259,645522d1-3671-4f9c-9062-c02314f3114b,20630e89-2e45-4f0a-8cec-74c3930452ed
3,How does the cinematic execution of JOHN WICK ...,"content=""I'm sorry, but I don't have the speci...",[page_content=': 0\nReview: It is 5 years sinc...,,"JOHN WICK is praised for its relentless, pulse...",0.0,2.956057,6157bd4a-fc86-4b44-b8d1-103187a02d50,89417856-0fa8-4960-b925-b5780598c455
4,What John Wick: Chapter 3 - Parabellum about a...,"content=""John Wick: Chapter 3 - Parabellum is ...",[page_content=': 0\nReview: It is 5 years sinc...,,John Wick: Chapter 3 - Parabellum is about the...,1.0,5.83004,c98f1e6f-dee8-4775-a602-6ec7631bf964,043b5b9d-3a92-42f6-89ed-a67953e91093
5,What motivates John Wick to come out of retire...,content='John Wick is motivated to come out of...,[page_content=': 18\nReview: When the story be...,,"In the original John Wick (2014), the ex-hit-m...",1.0,2.852388,e74e6cbd-431f-4e3f-97e0-b8a3d05249b8,ed9bb110-fa1e-4226-b701-7210b3263dab
6,Howw does Reeves perform in the moviee John Wick?,content='Keanu Reeves performs exceptionally w...,[page_content=': 14\nReview: Keanu Reeve is Jo...,,"Savvy, indestructible Reeves looks right at ho...",0.0,2.579411,ccd67b28-5964-44ae-aa03-3645db7a25d3,569ca9a4-bd79-4387-96bf-b5d2e99081f6
7,Why John Wick movie so special even if story s...,"content=""[Answer]: The John Wick movie is spec...",[page_content=': 3\nReview: John wick has a ve...,,"John Wick is special because, despite its simp...",0.0,2.630238,9cb4cf94-ec69-461e-b6b2-fe6718a870a4,482ace56-ffc3-42a1-ae0c-89b996adc573
8,Why everyone like John Wick movies so much?,content='People seem to like John Wick movies ...,"[page_content=': 9\nReview: At first glance, J...",,The fourth installment of John Wick is scoring...,0.0,3.184125,e4fcad63-a7c8-40be-b603-1c2b2c21759a,05544b0a-ebfd-49b1-b970-759c87366dd0
9,What Keanu Reeves do in John Wick?,"content=""In John Wick, Keanu Reeves plays the ...",[page_content=': 19\nReview: If you've seen th...,,"In John Wick, Keanu Reeves plays the role of J...",,2.483979,51bee111-6296-4de0-b02b-29222e27be93,e17e149c-25b7-4639-8eda-4be244189f93


In [111]:

#evaluate parent document retrieval chain in langsmith
evaluate(
    parent_document_retrieval_chain.invoke,
    data=langsmith_dataset,
    evaluators=[qa_evaluator],
    metadata={"revision_id": "parent_document_retrieval_chain"},
    experiment_prefix="parent_document_retrieval_chain_init"  
)

View the evaluation results for experiment: 'parent_document_retrieval_chain_init-51f07832' at:
https://smith.langchain.com/o/239bc38e-a472-4022-97f0-84cd498894d1/datasets/d994ba33-4c35-4de2-9861-3e23dee794dc/compare?selectedSessions=b87c79fc-86e6-4569-b6b4-d190c27504db





Unnamed: 0,inputs.question,outputs.response,outputs.context,error,reference.answer,feedback.correctness,execution_time,example_id,id
0,How has the John Wick franchise set a new stan...,"content=""The John Wick franchise has set a new...",[page_content=': 18\nReview: Ever since the or...,,"The John Wick franchise, particularly since th...",0.0,2.212943,b30d8102-1243-4c37-a164-dba5925d89af,08ff241f-1b66-4c7f-b07c-eda2ad2db162
1,Why Parabellum not exciting even with John Wic...,"content='It seems like the reviewer found ""Joh...","[page_content=': 11\nReview: The overrated ""Jo...",,"Parabellum is not exciting because, despite Jo...",0.0,1.253405,a159c9f7-759e-47c4-8abf-58ef1f4668b4,39375b34-539e-4148-8cb8-33118547b6db
2,What are the key elements that make Keanu Reev...,"content=""Keanu Reeves' portrayal of John Wick ...",[page_content=': 14\nReview: Keanu Reeve is Jo...,,Keanu Reeves' portrayal of John Wick stands ou...,0.0,1.131403,645522d1-3671-4f9c-9062-c02314f3114b,f43d319d-6170-4072-96ea-d78af8c70840
3,How does the cinematic execution of JOHN WICK ...,"content=""I don't have the specific details to ...","[page_content=': 16\nReview: Ok, so I got back...",,"JOHN WICK is praised for its relentless, pulse...",0.0,0.949937,6157bd4a-fc86-4b44-b8d1-103187a02d50,c57cd95d-09ad-4c5f-bc9d-f5219090f6cd
4,What John Wick: Chapter 3 - Parabellum about a...,content='John Wick: Chapter 3 - Parabellum is ...,"[page_content=': 11\nReview: The overrated ""Jo...",,John Wick: Chapter 3 - Parabellum is about the...,1.0,1.538504,c98f1e6f-dee8-4775-a602-6ec7631bf964,0e73731f-5ff5-409b-a145-3ce0cba22260
5,What motivates John Wick to come out of retire...,content='John Wick comes out of retirement in ...,[page_content=': 19\nReview: If you've seen th...,,"In the original John Wick (2014), the ex-hit-m...",1.0,0.74118,e74e6cbd-431f-4e3f-97e0-b8a3d05249b8,c90379a6-cbff-45fd-af02-23ca58960e41
6,Howw does Reeves perform in the moviee John Wick?,content='Reeves performs exceptionally well in...,"[page_content=': 9\nReview: At first glance, J...",,"Savvy, indestructible Reeves looks right at ho...",0.0,1.41077,ccd67b28-5964-44ae-aa03-3645db7a25d3,68541051-58a4-43f4-bf36-0556859fad1d
7,Why John Wick movie so special even if story s...,content='John Wick movies are special because ...,[page_content=': 3\nReview: John wick has a ve...,,"John Wick is special because, despite its simp...",0.0,1.98755,9cb4cf94-ec69-461e-b6b2-fe6718a870a4,2585e0f1-e707-4ef1-bf93-b5dc0ec8b1ca
8,Why everyone like John Wick movies so much?,content='People like John Wick movies so much ...,[page_content=': 8\nReview: It's hard to find ...,,The fourth installment of John Wick is scoring...,0.0,0.954298,e4fcad63-a7c8-40be-b603-1c2b2c21759a,45f4c08b-fcd6-4839-893b-bf1d06e30ab1
9,What Keanu Reeves do in John Wick?,"content=""In John Wick, Keanu Reeves plays the ...",[page_content=': 19\nReview: If you've seen th...,,"In John Wick, Keanu Reeves plays the role of J...",,1.297598,51bee111-6296-4de0-b02b-29222e27be93,ef87d55e-7573-4fea-9ce2-e4a2d658648b


In [112]:
#evaluate ensemble retrieval chain in langsmith
evaluate(
    ensemble_retrieval_chain.invoke,
    data=langsmith_dataset,
    evaluators=[qa_evaluator],
    metadata={"revision_id": "ensemble_retrieval_chain"},
    experiment_prefix="ensemble_retrieval_chain_init"  
)

View the evaluation results for experiment: 'ensemble_retrieval_chain_init-4269ae48' at:
https://smith.langchain.com/o/239bc38e-a472-4022-97f0-84cd498894d1/datasets/d994ba33-4c35-4de2-9861-3e23dee794dc/compare?selectedSessions=a2225560-5d24-46e6-a361-e8b55c23bf2f





Unnamed: 0,inputs.question,outputs.response,outputs.context,error,reference.answer,feedback.correctness,execution_time,example_id,id
0,How has the John Wick franchise set a new stan...,"content=""The John Wick franchise has set a new...",[page_content=': 18\nReview: Ever since the or...,,"The John Wick franchise, particularly since th...",0.0,6.404524,b30d8102-1243-4c37-a164-dba5925d89af,7e54fa10-7f68-4083-ba59-d3a768d9bcd2
1,Why Parabellum not exciting even with John Wic...,"content=""John Wick 3 may not be as exciting fo...",[page_content=': 6\nReview: Assassin John Wick...,,"Parabellum is not exciting because, despite Jo...",0.0,3.289473,a159c9f7-759e-47c4-8abf-58ef1f4668b4,680d6b48-7f1f-45e3-aa3b-7c3e2fbcea24
2,What are the key elements that make Keanu Reev...,"content=""Keanu Reeves' portrayal of John Wick ...","[page_content=': 9\nReview: At first glance, J...",,Keanu Reeves' portrayal of John Wick stands ou...,0.0,3.87964,645522d1-3671-4f9c-9062-c02314f3114b,de8d98d8-8f3f-4761-97b9-0c2f2f0ae8e5
3,How does the cinematic execution of JOHN WICK ...,"content=""I'm unable to provide a direct compar...",[page_content=': 14\nReview: By now you know w...,,"JOHN WICK is praised for its relentless, pulse...",0.0,3.193208,6157bd4a-fc86-4b44-b8d1-103187a02d50,d94f90ab-3429-438f-80ff-ceba4be591de
4,What John Wick: Chapter 3 - Parabellum about a...,"content=""John Wick: Chapter 3 - Parabellum is ...",[page_content=': 24\nReview: John Wick: Chapte...,,John Wick: Chapter 3 - Parabellum is about the...,1.0,3.522756,c98f1e6f-dee8-4775-a602-6ec7631bf964,7d0abe00-2593-455b-83b9-a106c7757eda
5,What motivates John Wick to come out of retire...,content='John Wick is motivated to come out of...,[page_content=': 19\nReview: If you've seen th...,,"In the original John Wick (2014), the ex-hit-m...",0.0,4.454549,e74e6cbd-431f-4e3f-97e0-b8a3d05249b8,e1852b8f-ed5d-41b2-b23c-e823a4b74e4d
6,Howw does Reeves perform in the moviee John Wick?,content='Reeves performs excellently in the mo...,"[page_content=': 9\nReview: At first glance, J...",,"Savvy, indestructible Reeves looks right at ho...",0.0,4.901107,ccd67b28-5964-44ae-aa03-3645db7a25d3,17ffd179-ee08-4c95-ae8f-b51cabd6f42d
7,Why John Wick movie so special even if story s...,content='The John Wick movie is special becaus...,[page_content=': 3\nReview: John wick has a ve...,,"John Wick is special because, despite its simp...",0.0,6.543692,9cb4cf94-ec69-461e-b6b2-fe6718a870a4,49bc2e3b-3166-4ae0-accf-6bfe2cb74cb5
8,Why everyone like John Wick movies so much?,"content=""The John Wick movies are loved by man...","[page_content=': 9\nReview: At first glance, J...",,The fourth installment of John Wick is scoring...,0.0,3.747879,e4fcad63-a7c8-40be-b603-1c2b2c21759a,7358244f-4f59-43e1-ad0a-c9c1b0a032ed
9,What Keanu Reeves do in John Wick?,"content=""In the movie John Wick, Keanu Reeves ...",[page_content=': 19\nReview: If you've seen th...,,"In John Wick, Keanu Reeves plays the role of J...",,2.885132,51bee111-6296-4de0-b02b-29222e27be93,1fe7e365-9553-4e9c-b39a-ac6dc8bf601c
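
The six LangSmith result tables above are easier to compare once the `feedback.correctness` column is reduced to one number per chain. The following is a minimal sketch that does this by hand: the scores are copied from the tables, `None` marks the one row per experiment with no recorded score, and the dictionary keys are labels chosen here for illustration. With only ten examples, a difference of one correct answer should be read as noise rather than as a ranking.

```python
# feedback.correctness per example, copied from the result tables above.
# None marks the row where no score was recorded.
correctness = {
    "naive": [0, 0, 1, 0, 1, 0, 0, 0, 0, None],
    "bm25": [0, 0, 1, 0, 1, 1, 0, 0, 0, None],
    "contextual_compression": [0, 0, 1, 0, 0, 1, 0, 0, 0, None],
    "multi_query": [0, 0, 0, 0, 1, 1, 0, 0, 0, None],
    "parent_document": [0, 0, 0, 0, 1, 1, 0, 0, 0, None],
    "ensemble": [0, 0, 0, 0, 1, 0, 0, 0, 0, None],
}


def summarize(scores):
    """Return (correct answers, scored examples, fraction correct)."""
    scored = [s for s in scores if s is not None]
    hits = sum(scored)
    return hits, len(scored), hits / len(scored)


summary = {name: summarize(scores) for name, scores in correctness.items()}

# Print the chains from highest to lowest fraction of correct answers.
for name, (hits, n, rate) in sorted(summary.items(), key=lambda kv: -kv[1][2]):
    print(f"{name:<24} {hits}/{n} correct ({rate:.2f})")
```

In a live notebook the same numbers could be read from the experiment results instead of being typed in; the hard-coded lists only keep this sketch self-contained.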


In [113]:
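### Ragas evaluations
from ragas.metrics import LLMContextRecall, Faithfulness, FactualCorrectness, ResponseRelevancy, ContextEntityRecall, NoiseSensitivity
# Note: this rebinds `evaluate` to the Ragas function, replacing the LangSmith
# `evaluate` imported earlier. Re-import the LangSmith one before re-running the cells above.
from ragas import evaluate, RunConfig

custom_run_config = RunConfig(timeout=360)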

In [120]:
#prep dataset for ragas with naive retrieval chain

from ragas import EvaluationDataset

for test_row in dataset:
    response = naive_retrieval_chain.invoke({"question": test_row.eval_sample.user_input})
    test_row.eval_sample.response = str(response["response"].content)
    test_row.eval_sample.retrieved_contexts = [context.page_content for context in response["context"]]

evaluation_dataset = EvaluationDataset.from_pandas(dataset.to_pandas())

In [121]:
result = evaluate(
    dataset=evaluation_dataset,
    metrics=[LLMContextRecall(), Faithfulness(), FactualCorrectness(), ResponseRelevancy(), ContextEntityRecall(), NoiseSensitivity()],
    llm=evaluator_llm,
    run_config=custom_run_config
)
result


Exception raised in Job[13]: RateLimitError(Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4 in organization org-TU5fm55zJrncrgPcg3lg23B6 on tokens per min (TPM): Limit 40000, Used 38544, Requested 3736. Please try again in 3.42s. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}})
Exception raised in Job[24]: RateLimitError(Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4 in organization org-TU5fm55zJrncrgPcg3lg23B6 on tokens per min (TPM): Limit 40000, Used 38876, Requested 3437. Please try again in 3.469s. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}})
Exception raised in Job[28]: RateLimitError(Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4 in organization org-TU5fm55zJrncrgPcg3lg23B6 on tokens per min (TPM): Limit 40000, Used 39573, Requested 30

{'context_recall': 0.7500, 'faithfulness': 0.9750, 'factual_correctness': 0.3970, 'answer_relevancy': 0.7644, 'context_entity_recall': 0.5107, 'noise_sensitivity_relevant': 0.3000}
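
The `RateLimitError` messages in these evaluation runs include a retry hint ("Please try again in 3.42s"). One common way to handle that is to wait for the hinted time and retry with exponential backoff. The following is a minimal, library-independent sketch of that idea, not what Ragas does internally: `RateLimited`, `retry_delay_seconds`, and `call_with_backoff` are hypothetical names introduced here, and the parser only understands the `Xs` and `Xms` forms seen in the logs.

```python
import random
import re
import time


class RateLimited(Exception):
    """Hypothetical stand-in for a client's HTTP 429 error."""


def retry_delay_seconds(message, default=1.0):
    """Extract the 'try again in Xs' or 'Xms' hint from an error message."""
    match = re.search(r"try again in ([\d.]+)(ms|s)", message)
    if not match:
        return default
    value, unit = float(match.group(1)), match.group(2)
    return value / 1000 if unit == "ms" else value


def call_with_backoff(fn, max_attempts=5, sleep=time.sleep):
    """Call fn(), retrying on RateLimited with a growing delay."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            # Start from the server's hint, double it on each further
            # failure, and add a little jitter to avoid synchronized retries.
            delay = retry_delay_seconds(str(exc)) * 2 ** attempt
            sleep(delay + random.uniform(0, 0.1))


# Parse a hint shaped like the ones in the logs above.
print(retry_delay_seconds("Limit 40000, Used 38544. Please try again in 3.42s."))
```

Reducing the number of concurrent evaluation jobs is another option; the Ragas `RunConfig` documentation lists the retry and concurrency settings available in a given version.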

In [118]:
#prep dataset for ragas with bm25 retrieval chain

from ragas import EvaluationDataset

for test_row in dataset:
    response = bm25_retrieval_chain.invoke({"question": test_row.eval_sample.user_input})
    test_row.eval_sample.response = str(response["response"].content)
    test_row.eval_sample.retrieved_contexts = [context.page_content for context in response["context"]]

evaluation_dataset = EvaluationDataset.from_pandas(dataset.to_pandas())

In [119]:
result = evaluate(
    dataset=evaluation_dataset,
    metrics=[LLMContextRecall(), Faithfulness(), FactualCorrectness(), ResponseRelevancy(), ContextEntityRecall(), NoiseSensitivity()],
    llm=evaluator_llm,
    run_config=custom_run_config
)
result


Exception raised in Job[11]: RateLimitError(Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4 in organization org-TU5fm55zJrncrgPcg3lg23B6 on tokens per min (TPM): Limit 10000, Used 9930, Requested 1112. Please try again in 6.252s. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}})
Exception raised in Job[4]: RateLimitError(Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4 in organization org-TU5fm55zJrncrgPcg3lg23B6 on tokens per min (TPM): Limit 10000, Used 9680, Requested 1268. Please try again in 5.688s. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}})
Exception raised in Job[13]: RateLimitError(Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4 in organization org-TU5fm55zJrncrgPcg3lg23B6 on tokens per min (TPM): Limit 40000, Used 38465, Requested 1624

{'context_recall': 0.5000, 'faithfulness': 0.6701, 'factual_correctness': 0.3790, 'answer_relevancy': 0.8555, 'context_entity_recall': 0.3512, 'noise_sensitivity_relevant': 0.1032}

In [129]:
#prep dataset for ragas with contextual compression retrieval chain

from ragas import EvaluationDataset

for test_row in dataset:
    response = contextual_compression_retrieval_chain.invoke({"question": test_row.eval_sample.user_input})
    test_row.eval_sample.response = str(response["response"].content)
    test_row.eval_sample.retrieved_contexts = [context.page_content for context in response["context"]]

evaluation_dataset = EvaluationDataset.from_pandas(dataset.to_pandas())

In [130]:
result = evaluate(
    dataset=evaluation_dataset,
    metrics=[LLMContextRecall(), Faithfulness(), FactualCorrectness(), ResponseRelevancy(), ContextEntityRecall(), NoiseSensitivity()],
    llm=evaluator_llm,
    run_config=custom_run_config
)
result


Exception raised in Job[36]: RateLimitError(Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4 in organization org-TU5fm55zJrncrgPcg3lg23B6 on tokens per min (TPM): Limit 40000, Used 39281, Requested 2120. Please try again in 2.101s. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}})
Exception raised in Job[11]: RateLimitError(Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4 in organization org-TU5fm55zJrncrgPcg3lg23B6 on tokens per min (TPM): Limit 40000, Used 38449, Requested 1557. Please try again in 9ms. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}})
Exception raised in Job[30]: OutputParserException(Failed to parse StringIO from completion {"classifications": [{"statement": "John Wick: Chapter 3 - Parabellum is about the consequences of John's actions from the previous

{'context_recall': 0.5833, 'faithfulness': 0.7095, 'factual_correctness': 0.4090, 'answer_relevancy': 0.8544, 'context_entity_recall': 0.3665, 'noise_sensitivity_relevant': 0.4388}

In [131]:
#prep dataset for ragas with multi_query retrieval chain

from ragas import EvaluationDataset

for test_row in dataset:
    response = multi_query_retrieval_chain.invoke({"question": test_row.eval_sample.user_input})
    test_row.eval_sample.response = str(response["response"].content)
    test_row.eval_sample.retrieved_contexts = [context.page_content for context in response["context"]]

evaluation_dataset = EvaluationDataset.from_pandas(dataset.to_pandas())

In [132]:
result = evaluate(
    dataset=evaluation_dataset,
    metrics=[LLMContextRecall(), Faithfulness(), FactualCorrectness(), ResponseRelevancy(), ContextEntityRecall(), NoiseSensitivity()],
    llm=evaluator_llm,
    run_config=custom_run_config
)
result


Exception raised in Job[24]: RateLimitError(Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4 in organization org-TU5fm55zJrncrgPcg3lg23B6 on tokens per min (TPM): Limit 40000, Used 39329, Requested 3239. Please try again in 3.851s. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}})
Exception raised in Job[30]: RateLimitError(Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4 in organization org-TU5fm55zJrncrgPcg3lg23B6 on tokens per min (TPM): Limit 40000, Used 39890, Requested 3640. Please try again in 5.295s. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}})
Exception raised in Job[25]: RateLimitError(Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4 in organization org-TU5fm55zJrncrgPcg3lg23B6 on tokens per min (TPM): Limit 40000, Used 39046, Requested 3

{'context_recall': 0.8750, 'faithfulness': 0.5676, 'factual_correctness': 0.4256, 'answer_relevancy': 0.8565, 'context_entity_recall': 0.4556, 'noise_sensitivity_relevant': 0.0000}

In [140]:
#prep dataset for ragas with parent document retrieval chain

for test_row in dataset:
  response = parent_document_retrieval_chain.invoke({"question" : test_row.eval_sample.user_input})
  test_row.eval_sample.response = str(response["response"].content)
  test_row.eval_sample.retrieved_contexts = [context.page_content for context in response["context"]]

evaluation_dataset = EvaluationDataset.from_pandas(dataset.to_pandas())

In [141]:
result = evaluate(
    dataset=evaluation_dataset,
    metrics=[LLMContextRecall(), Faithfulness(), FactualCorrectness(), ResponseRelevancy(), ContextEntityRecall(), NoiseSensitivity()],
    llm=evaluator_llm,
    run_config=custom_run_config
)
result

Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

{'context_recall': 0.4267, 'faithfulness': 0.6800, 'factual_correctness': 0.3970, 'answer_relevancy': 0.8521, 'context_entity_recall': 0.3399, 'noise_sensitivity_relevant': 0.1967}

In [142]:
#prep dataset for ragas with ensemble retrieval chain

for test_row in dataset:
  response = ensemble_retrieval_chain.invoke({"question" : test_row.eval_sample.user_input})
  test_row.eval_sample.response = str(response["response"].content)
  test_row.eval_sample.retrieved_contexts = [context.page_content for context in response["context"]]

evaluation_dataset = EvaluationDataset.from_pandas(dataset.to_pandas())

In [144]:
result = evaluate(
    dataset=evaluation_dataset,
    metrics=[LLMContextRecall(), Faithfulness(), FactualCorrectness(), ResponseRelevancy(), ContextEntityRecall(), NoiseSensitivity()],
    llm=evaluator_llm,
    run_config=custom_run_config
)
result

Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

Exception raised in Job[49]: RateLimitError(Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4 in organization org-TU5fm55zJrncrgPcg3lg23B6 on tokens per min (TPM): Limit 80000, Used 79140, Requested 4114. Please try again in 2.44s. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}})
Exception raised in Job[6]: BadRequestError(Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens. However, your messages resulted in 9893 tokens. Please reduce the length of the messages.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}})
Exception raised in Job[30]: OutputParserException(Failed to parse StringIO from completion {"classifications": [{"statement": "John Wick: Chapter 3 - Parabellum is about the consequences of John's actions from the previous film, sending him on a bigger odyssey of violence while exploring t

{'context_recall': 0.9375, 'faithfulness': 0.8999, 'factual_correctness': 0.3460, 'answer_relevancy': 0.8592, 'context_entity_recall': 0.3983, 'noise_sensitivity_relevant': 0.8333}
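
One job in this run failed because the evaluator prompt exceeded the model's 8192-token context window. A possible mitigation is to cap the retrieved contexts before evaluation. The sketch below uses a crude character-based budget; `trim_contexts`, the 6000-token default, and the 4-characters-per-token ratio are all assumptions for illustration, and an exact limit would need the model's real tokenizer. Note that trimming changes what the metrics see, so scores from trimmed and untrimmed runs are not directly comparable.

```python
def trim_contexts(contexts, max_tokens=6000, chars_per_token=4):
    """Keep contexts in order until a rough character budget is used up.

    The budget is max_tokens * chars_per_token characters (a heuristic,
    not a real token count). The last kept context may be cut short.
    """
    budget_chars = max_tokens * chars_per_token
    kept = []
    for text in contexts:
        if budget_chars <= 0:
            break
        kept.append(text[:budget_chars])
        budget_chars -= len(text)
    return kept


# Tiny demo with a 16-character budget (4 "tokens" * 4 chars each).
demo = trim_contexts(["a" * 10, "b" * 10, "c" * 10], max_tokens=4, chars_per_token=4)
print(demo)
```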

### Results

naive_retrieval_chain  
    {'context_recall': 0.7500, 'faithfulness': 0.9750, 'factual_correctness': 0.3970, 'answer_relevancy': 0.7644, 'context_entity_recall': 0.5107, 'noise_sensitivity_relevant': 0.3000}  
p50 latency 1.50 seconds  
Cost $0.002  

bm25_retrieval_chain  
    {'context_recall': 0.5000, 'faithfulness': 0.6701, 'factual_correctness': 0.3790, 'answer_relevancy': 0.8555, 'context_entity_recall': 0.3512, 'noise_sensitivity_relevant': 0.1032}  
p50 latency 0.72 seconds  
Cost $0.001  

contextual_compression_retrieval_chain  
    {'context_recall': 0.5833, 'faithfulness': 0.7095, 'factual_correctness': 0.4090, 'answer_relevancy': 0.8544, 'context_entity_recall': 0.3665, 'noise_sensitivity_relevant': 0.4388}  
p50 latency 1.71 seconds  
Cost $0.001  


multi_query_retrieval_chain  
    {'context_recall': 0.8750, 'faithfulness': 0.5676, 'factual_correctness': 0.4256, 'answer_relevancy': 0.8565, 'context_entity_recall': 0.4556, 'noise_sensitivity_relevant': 0.0000}  
p50 latency 2.90 seconds  
Cost $0.003  

parent_document_retrieval_chain  
    {'context_recall': 0.4267, 'faithfulness': 0.6800, 'factual_correctness': 0.3970, 'answer_relevancy': 0.8521, 'context_entity_recall': 0.3399, 'noise_sensitivity_relevant': 0.1967}  
p50 latency 1.27 seconds  
Cost $0.003  

ensemble_retrieval_chain  
    {'context_recall': 0.9375, 'faithfulness': 0.8999, 'factual_correctness': 0.3460, 'answer_relevancy': 0.8592, 'context_entity_recall': 0.3983, 'noise_sensitivity_relevant': 0.8333}  
p50 latency 3.81 seconds  
Cost $0.003  
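
The p50 latency figures above are medians: half of the calls finished faster, half slower. This section does not show how they were measured, so the following is only one way to reproduce such a number locally, by timing each call and taking the median. `p50_latency` is a hypothetical helper, and the sample timings are made up.

```python
import statistics
import time


def p50_latency(call, inputs):
    """Return the median wall-clock time, in seconds, of call(x) over inputs."""
    timings = []
    for x in inputs:
        start = time.perf_counter()
        call(x)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# The median is robust to a single slow outlier (3.4 s here).
sample_timings = [1.2, 0.9, 3.4, 1.5, 1.1]
median_seconds = statistics.median(sample_timings)
print(median_seconds)
```

For a real chain, the call would look something like `p50_latency(lambda q: naive_retrieval_chain.invoke({"question": q}), questions)`, where `questions` is a list of test questions.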




### Comparison of Retriever Methods

| Method                             | Context Recall | Faithfulness | Factual Correctness | Answer Relevancy | Context Entity Recall | Noise Sensitivity | Latency (p50) | Cost  |
|------------------------------------|---------------|-------------|--------------------|-----------------|--------------------|-----------------|------------|------|
| **Naive Retrieval Chain**         | 0.7500        | 0.9750      | 0.3970            | 0.7644          | 0.5107             | 0.3000          | 1.50s      | $0.002 |
| **BM25 Retrieval Chain**          | 0.5000        | 0.6701      | 0.3790            | 0.8555          | 0.3512             | 0.1032          | 0.72s      | $0.001 |
| **Contextual Compression Chain**  | 0.5833        | 0.7095      | 0.4090            | 0.8544          | 0.3665             | 0.4388          | 1.71s      | $0.001 |
| **Multi-Query Retrieval Chain**   | 0.8750        | 0.5676      | 0.4256            | 0.8565          | 0.4556             | 0.0000          | 2.90s      | $0.003 |
| **Parent-Document Retrieval Chain** | 0.4267      | 0.6800      | 0.3970            | 0.8521          | 0.3399             | 0.1967          | 1.27s      | $0.003 |
| **Ensemble Retrieval Chain**      | 0.9375        | 0.8999      | 0.3460            | 0.8592          | 0.3983             | 0.8333          | 3.81s      | $0.003 |
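
To double-check which method leads on each metric, the table can be loaded into a dictionary and ranked programmatically. The sketch below copies the values from the table; the short method keys and `best_method` are names chosen for this example. It treats noise sensitivity as lower-is-better and every other metric as higher-is-better.

```python
results = {
    "naive": {"context_recall": 0.7500, "faithfulness": 0.9750, "factual_correctness": 0.3970,
              "answer_relevancy": 0.7644, "context_entity_recall": 0.5107, "noise_sensitivity_relevant": 0.3000},
    "bm25": {"context_recall": 0.5000, "faithfulness": 0.6701, "factual_correctness": 0.3790,
             "answer_relevancy": 0.8555, "context_entity_recall": 0.3512, "noise_sensitivity_relevant": 0.1032},
    "contextual_compression": {"context_recall": 0.5833, "faithfulness": 0.7095, "factual_correctness": 0.4090,
                               "answer_relevancy": 0.8544, "context_entity_recall": 0.3665, "noise_sensitivity_relevant": 0.4388},
    "multi_query": {"context_recall": 0.8750, "faithfulness": 0.5676, "factual_correctness": 0.4256,
                    "answer_relevancy": 0.8565, "context_entity_recall": 0.4556, "noise_sensitivity_relevant": 0.0000},
    "parent_document": {"context_recall": 0.4267, "faithfulness": 0.6800, "factual_correctness": 0.3970,
                        "answer_relevancy": 0.8521, "context_entity_recall": 0.3399, "noise_sensitivity_relevant": 0.1967},
    "ensemble": {"context_recall": 0.9375, "faithfulness": 0.8999, "factual_correctness": 0.3460,
                 "answer_relevancy": 0.8592, "context_entity_recall": 0.3983, "noise_sensitivity_relevant": 0.8333},
}

# Metrics where a smaller value is the better result.
LOWER_IS_BETTER = {"noise_sensitivity_relevant"}


def best_method(metric):
    """Return the method name with the best score for `metric`."""
    pick = min if metric in LOWER_IS_BETTER else max
    return pick(results, key=lambda name: results[name][metric])


for metric in results["naive"]:
    print(f"{metric:28s} {best_method(metric)}")
```

The 0.0000 noise sensitivity for the multi-query chain comes from a run that also logged rate-limit failures, so that particular score may not be reliable.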

### Best Retriever for This Dataset

From the results, **Ensemble Retrieval Chain** has the highest **context recall (0.9375)** and the second-highest **faithfulness (0.8999)**, behind only Naive Retrieval Chain (0.9750), making it the strongest option for retrieving relevant content. However, it also has the **highest noise sensitivity (0.8333)**, where lower is better, the **lowest factual correctness (0.3460)**, the **highest latency (3.81s)**, and a cost of **$0.003** per query, tied for the highest.

If cost and latency are primary concerns, **BM25 Retrieval Chain** provides a **low-cost ($0.001), lowest-latency (0.72s) alternative** with reasonable answer relevancy (0.8555). However, its **context recall is the second-lowest (0.5000)**, ahead of only Parent-Document Retrieval Chain (0.4267), meaning it may miss key information.

For a balance of performance, cost, and latency, **Naive Retrieval Chain** is a strong contender. It has the **highest faithfulness (0.9750), good recall (0.7500), and moderate latency (1.50s) at a cost of $0.002**. It outperforms BM25 in recall and faithfulness while keeping latency and cost manageable, though its answer relevancy (0.7644) is the lowest of the six methods.

### Conclusion

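- **If retrieval coverage (context recall) is the most important factor**: **Ensemble Retrieval Chain** is the best choice, though its factual correctness (0.3460) is the lowest of the six methods.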
- **If speed and cost efficiency are priorities**: **BM25 Retrieval Chain** is preferable.
- **If a balance of performance, cost, and latency is needed**: **Naive Retrieval Chain** offers a good trade-off.
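
The three recommendations above can be read as a constrained choice: filter methods by a latency and cost budget, then take the one with the best context recall. The sketch below encodes that reading using the recall, p50 latency, and cost figures from the comparison table. `pick_method` and the budget values in the demo are assumptions made for illustration, and ranking by context recall alone is a simplification.

```python
# (context_recall, p50_latency_seconds, cost_usd) per method, from the comparison table.
methods = {
    "naive": (0.7500, 1.50, 0.002),
    "bm25": (0.5000, 0.72, 0.001),
    "contextual_compression": (0.5833, 1.71, 0.001),
    "multi_query": (0.8750, 2.90, 0.003),
    "parent_document": (0.4267, 1.27, 0.003),
    "ensemble": (0.9375, 3.81, 0.003),
}


def pick_method(max_latency=None, max_cost=None):
    """Return the method with the highest context recall within the given budgets.

    A budget of None means "no limit". Returns None if nothing fits.
    """
    candidates = [
        name
        for name, (_recall, latency, cost) in methods.items()
        if (max_latency is None or latency <= max_latency)
        and (max_cost is None or cost <= max_cost)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda name: methods[name][0])


print(pick_method())                                 # no budget
print(pick_method(max_latency=1.0))                  # tight latency budget
print(pick_method(max_latency=2.0, max_cost=0.002))  # moderate budgets
```

Under these example budgets the picks line up with the three bullets: no limits favors the ensemble chain, a tight latency budget leaves only BM25, and moderate budgets favor naive retrieval.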
