# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up Qdrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

> You do not need to run the following cells if you are running this notebook locally. 

In [1]:
!pip install -qU langchain langchain-community langchain-openai langchain-cohere rank_bm25


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.0.1[0m[39;49m -> [0m[32;49m25.0.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


We're also going to be leveraging [Qdrant's](https://qdrant.tech/documentation/frameworks/langchain/) (pronounced "Quadrant") VectorDB in "memory" mode (so we can leverage it locally in our colab environment).

In [2]:
!pip install -qU qdrant-client


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.0.1[0m[39;49m -> [0m[32;49m25.0.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


We'll also provide our OpenAI key, as well as our Cohere API key.

In [3]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

In [4]:
os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

## Task 2: Data Collection and Preparation

We'll be using some reviews from the 4 movies in the John Wick franchise today to explore the different retrieval strategies.

These were obtained from IMDB, and are available in the [AIM Data Repository](https://github.com/AI-Maker-Space/DataRepository).

### Data Collection

We can simply `wget` these from GitHub.

You could use any review data you wanted in this step - just be careful to make sure your metadata is aligned with your choice.

In [5]:
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw1.csv -O john_wick_1.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw2.csv -O john_wick_2.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw3.csv -O john_wick_3.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw4.csv -O john_wick_4.csv

--2025-02-28 11:40:29--  https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw1.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8001::154, 2606:50c0:8002::154, 2606:50c0:8003::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8001::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19628 (19K) [text/plain]
Saving to: ‘john_wick_1.csv’


2025-02-28 11:40:29 (18.6 MB/s) - ‘john_wick_1.csv’ saved [19628/19628]

--2025-02-28 11:40:29--  https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw2.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8001::154, 2606:50c0:8002::154, 2606:50c0:8003::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8001::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14747 (14K) [text/plain]
Saving to: ‘john_wick_2.csv’


2025-02-28 11:40:29

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

- Self-Query: Wants as much metadata as we can provide
- Time-weighted: Wants temporal data

> NOTE: While we're creating a temporal relationship based on when these movies came out for illustrative purposes, it needs to be clear that the "time-weighting" in the Time-weighted Retriever is based on when the document was *accessed* last - not when it was created.

In [6]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

documents = []

for i in range(1, 5):
  loader = CSVLoader(
      file_path=f"john_wick_{i}.csv",
      metadata_columns=["Review_Date", "Review_Title", "Review_Url", "Author", "Rating"]
  )

  movie_docs = loader.load()
  for doc in movie_docs:

    # Add the "Movie Title" (John Wick 1, 2, ...)
    doc.metadata["Movie_Title"] = f"John Wick {i}"

    # convert "Rating" to an `int`, if no rating is provided - assume 0 rating
    doc.metadata["Rating"] = int(doc.metadata["Rating"]) if doc.metadata["Rating"] else 0

    # newer movies have a more recent "last_accessed_at"
    doc.metadata["last_accessed_at"] = datetime.now() - timedelta(days=4-i)

  documents.extend(movie_docs)

Let's look at an example document to see if everything worked as expected!

In [7]:
documents[0]

Document(metadata={'source': 'john_wick_1.csv', 'row': 0, 'Review_Date': '6 May 2015', 'Review_Title': ' Kinetic, concise, and stylish; John Wick kicks ass.\n', 'Review_Url': '/review/rw3233896/?ref_=tt_urv', 'Author': 'lnvicta', 'Rating': 8, 'Movie_Title': 'John Wick 1', 'last_accessed_at': datetime.datetime(2025, 2, 25, 11, 40, 31, 292319)}, page_content=": 0\nReview: The best way I can describe John Wick is to picture Taken but instead of Liam Neeson it's Keanu Reeves and instead of his daughter it's his dog. That's essentially the plot of the movie. John Wick (Reeves) is out to seek revenge on the people who took something he loved from him. It's a beautifully simple premise for an action movie - when action movies get convoluted, they get bad i.e. A Good Day to Die Hard. John Wick gives the viewers what they want: Awesome action, stylish stunts, kinetic chaos, and a relatable hero to tie it all together. John Wick succeeds in its simplicity.")

## Task 3: Setting up Qdrant!

Now that we have our documents, let's create a Qdrant VectorStore with the collection name "JohnWick".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [8]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    documents,
    embeddings,
    location=":memory:",
    collection_name="JohnWick"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply look at each review as a document, and use cosine-similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later

In [9]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})
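To make the idea behind the naive retriever concrete, here is a minimal pure-Python sketch of cosine-similarity top-`k` search. The 2-D vectors and the helper names (`cosine_similarity`, `top_k`) are made up for illustration; this is not how Qdrant implements search internally, only the underlying ranking idea.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]

# Toy 2-D "embeddings" (hypothetical values; real embeddings are much larger).
toy_doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
toy_query_vec = [0.9, 0.1]

nearest = top_k(toy_query_vec, toy_doc_vecs, k=2)
print(nearest)
```

The query points mostly along the first axis, so the first document ranks highest and the diagonal document comes second.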

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [10]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-3.5-turbo` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [11]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI()

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [12]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and chaining it into the naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # "context"  : is assigned to a RunnablePassthrough object (will not be called or considered in the next step)
    #              by getting the value of the "context" key from the previous step
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)
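If the LCEL pipe syntax is new to you, the data flow of the chain above can be approximated with plain functions. This is only a rough sketch: `run_chain`, `fake_retrieve`, `fake_format_prompt`, and `fake_call_model` are hypothetical stand-ins, and the real chain returns an `AIMessage` under `"response"` rather than a string.

```python
def run_chain(inputs, retrieve, format_prompt, call_model):
    """Mimic the three steps of the chain with ordinary function calls."""
    # Step 1: read the question and fetch context for it.
    state = {"context": retrieve(inputs["question"]), "question": inputs["question"]}
    # Step 2: re-assign "context" to itself; everything else passes through unchanged.
    state = {**state, "context": state["context"]}
    # Step 3: format the prompt, call the model, and keep the context next to the response.
    return {"response": call_model(format_prompt(state)), "context": state["context"]}

# Hypothetical stand-ins for the retriever, the prompt template, and the LLM.
def fake_retrieve(question):
    return ["doc one"]

def fake_format_prompt(state):
    return f"Q: {state['question']} C: {state['context']}"

def fake_call_model(prompt):
    return prompt.upper()

result = run_chain({"question": "What happened?"}, fake_retrieve, fake_format_prompt, fake_call_model)
print(result)
```

Keeping `"context"` in the final dictionary is what later lets us inspect which documents were retrieved for each answer.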

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [13]:
naive_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"Based on the reviews provided, it seems that people generally liked John Wick. The reviews praised the action sequences, Keanu Reeves' performance, the style of the movie, and the overall experience. So, yes, people generally liked John Wick."

In [14]:
naive_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10. Here is the URL to that review:\n- \'/review/rw4854296/?ref_=tt_urv\' ("A Masterpiece & Brilliant Sequel")'

In [15]:
naive_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In "John Wick," the protagonist, an ex-hit-man, comes out of retirement to seek vengeance against the gangsters who killed his dog and took everything from him, leading to plenty of violent action, shootouts, and breathtaking fights. His pursuit of revenge sets off a chain of events that brings him into conflict with various criminal elements, ultimately leading to a series of intense and brutal confrontations.'

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model) which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.

This retriever is very straightforward to set up! Let's see it happen down below!


In [16]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(documents)
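To give a feel for what a BM25 scorer computes, here is a small pure-Python sketch of one common formulation. It is an illustration under assumptions: the whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75`, and the smoothed IDF are choices made for this example, and `rank_bm25` may differ in its details.

```python
import math
from collections import Counter

def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score every document in `corpus` against `query` with a common BM25 variant."""
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a larger weight than common ones.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            tf = term_freq[term]
            # Longer-than-average documents are penalised slightly.
            length_norm = 1 - b + b * len(toks) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores

# Made-up mini corpus for illustration.
toy_corpus = [
    "great action and stylish stunts",
    "the plot was dull and slow",
    "keanu reeves delivers great action",
]
toy_scores = bm25_scores("great action", toy_corpus)
print(toy_scores)
```

Note that the second document scores zero because it shares no words with the query: a sparse, word-based method cannot match on meaning alone.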

We'll construct the same chain - only changing the retriever.

In [17]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [18]:
bm25_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'It seems like opinions on John Wick vary. Some people really enjoyed the action and style of the movie, while others found it dull and lacking substance. Overall, it seems that the reception of John Wick is mixed.'

In [19]:
bm25_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'There are no reviews with a rating of 10 for the movie "John Wick 4."'

In [20]:
bm25_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In John Wick, the main character, John Wick, is a retired hitman seeking vengeance for the killing of his dog, which was a last gift from his deceased wife. This sets off a series of intense action sequences as Wick takes on numerous assassins who come after him. The movie is known for its well-choreographed action scenes and compelling storyline.'

It's not clear that this is better or worse overall - but failing to find any review with a rating of 10 isn't great!

## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [21]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-english-v3.0")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)
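The retrieve-then-compress pattern can be sketched without any API calls. In the toy example below, `overlap_score` is a hypothetical stand-in for a real reranking model (Cohere's reranker is a hosted model, not word overlap); the point is only to show the two-stage shape: take many candidates, rescore them against the query, and keep the best few.

```python
def overlap_score(query, text):
    """Hypothetical stand-in for a reranker: fraction of query words found in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words)

def rerank(query, candidates, top_n, score_fn=overlap_score):
    """Rescore first-stage candidates and keep only the top_n best."""
    ranked = sorted(candidates, key=lambda text: score_fn(query, text), reverse=True)
    return ranked[:top_n]

# Pretend these came back from a first-stage retriever (made-up texts).
toy_candidates = [
    "the soundtrack is loud",
    "john wick seeks revenge for his dog",
    "keanu reeves plays john wick",
    "a film about a dog",
]
compressed = rerank("john wick dog", toy_candidates, top_n=2)
print(compressed)
```

This is why the base retriever uses a generous `k`: the reranker can only promote documents that the first stage already returned.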

Let's create our chain again, and see how this does!

In [22]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [23]:
contextual_compression_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Yes, people generally liked John Wick.'

In [24]:
contextual_compression_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10 for the movie "John Wick 3". Here is the URL to that review: \'/review/rw4854296/?ref_=tt_urv\'.'

In [25]:
contextual_compression_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"In John Wick 2, after resolving his issues with the Russian mafia, John Wick is forced back into action when mobster Santino D'Antonio asks him to kill his sister in Rome. When Wick accomplishes the task, Santino puts a seven-million dollar contract on him, leading to professional killers coming after him. Wick promises to kill Santino in retaliation."

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` number of new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context

So, how hard is it to set up? Not bad! Let's see it down below!



In [26]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)
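The three steps listed above can be sketched in plain Python. Everything here is a stand-in: `fake_generate_queries` replaces the LLM call that rewrites the question, and `fake_retrieve` replaces the vector search, so the example only illustrates how results are merged and de-duplicated.

```python
def multi_query_retrieve(question, generate_queries, retrieve):
    """Retrieve for each generated query and return unique documents in first-seen order."""
    seen = set()
    unique_docs = []
    for query in generate_queries(question):
        for doc in retrieve(query):
            if doc not in seen:
                seen.add(doc)
                unique_docs.append(doc)
    return unique_docs

def fake_generate_queries(question):
    # Hypothetical stand-in for the LLM that produces query variants.
    return ["was john wick well received", "john wick audience reaction"]

# Hypothetical lookup table standing in for a vector store.
FAKE_INDEX = {
    "was john wick well received": ["review A", "review B"],
    "john wick audience reaction": ["review B", "review C"],
}

def fake_retrieve(query):
    return FAKE_INDEX.get(query, [])

merged = multi_query_retrieve(
    "Did people generally like John Wick?", fake_generate_queries, fake_retrieve
)
print(merged)
```

Because each variant can surface documents the others miss, the merged context is usually larger than a single query's results, which also means more tokens sent to the LLM.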

In [27]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [28]:
multi_query_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Yes, people generally liked John Wick based on the reviews provided in the context.'

In [29]:
multi_query_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is one review with a rating of 10 for the movie "John Wick 3". Here is the URL to that review: \'/review/rw4854296/?ref_=tt_urv\''

In [30]:
multi_query_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In John Wick, Keanu Reeves plays the character of John Wick, a retired assassin who comes out of retirement to seek revenge when his dog is killed and his car is stolen. He goes on a rampage, killing many individuals while being hunted by assassins. The story involves various locations such as Italy, Canada, and Manhattan, with lots of action and carnage.'

## Task 8: Parent Document Retriever

A "small-to-big" strategy - the Parent Document Retriever works based on a simple strategy:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent chunks".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
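As a rough illustration of "search small, return big", the sketch below keeps full texts in a dictionary and searches over word-based chunks. The word-overlap ranking is a hypothetical stand-in for embedding similarity, and all names and texts are made up for this example.

```python
def split_words(text, words_per_chunk):
    """Split a text into consecutive chunks of at most `words_per_chunk` words."""
    words = text.split()
    return [" ".join(words[i:i + words_per_chunk]) for i in range(0, len(words), words_per_chunk)]

parent_store = {}  # parent_id -> full parent text (plays the role of the docstore)
child_index = []   # (child chunk, parent_id) pairs (plays the role of the vector store)

def add_parent(parent_id, text, words_per_chunk=5):
    parent_store[parent_id] = text
    for chunk in split_words(text, words_per_chunk):
        child_index.append((chunk, parent_id))

def retrieve_parents(query, k=1):
    """Rank child chunks against the query, then return their de-duplicated parents."""
    query_words = set(query.lower().split())
    ranked = sorted(
        child_index,
        key=lambda item: len(query_words & set(item[0].lower().split())),
        reverse=True,
    )
    parent_ids = []
    for _, parent_id in ranked[:k]:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [parent_store[pid] for pid in parent_ids]

add_parent("r1", "John Wick is a retired hitman. The action scenes are stunning.")
add_parent("r2", "The soundtrack was too loud. I left the cinema early.")

results = retrieve_parents("action scenes")
print(results)
```

Only one small chunk of the first review matches the query, yet the whole review comes back, including the sentence the chunk alone would have lost.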

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [31]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = documents
child_splitter = RecursiveCharacterTextSplitter(chunk_size=200)

We'll need to set up a new Qdrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension, you'll need to change this if you're using a different embedding model.

In [32]:
client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = Qdrant(
    collection_name="full_documents", embeddings=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)


Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [33]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [34]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [35]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [36]:
parent_document_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"People's opinions about John Wick are divided. Some love the series and appreciate the action and excitement it offers, while others find it nonsensical and boring. It seems that there is a range of views on whether people generally like John Wick."

In [37]:
parent_document_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10. The review of "John Wick 3" by user \'ymyuseda\' has a rating of 10. The URL to that review is \'/review/rw4854296/?ref_=tt_urv\'.'

In [38]:
parent_document_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"In John Wick, the main character, played by Keanu Reeves, is a retired assassin who comes out of retirement when someone kills his dog. In the sequel, John Wick 2, his car is stolen which leads to a lot of carnage. John is then called on to pay off an old debt by helping to take over the Assassin's Guild by traveling to Italy, Canada, and Manhattan and killing many assassins along the way."

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to answer the performance question more rigorously.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever simply takes 2, or more, retrievers and combines their retrieved documents based on a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.

In [39]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)
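For intuition, weighted Reciprocal Rank Fusion can be written in a few lines. This is a sketch under assumptions: each document earns `weight / (c + rank)` from every ranking it appears in, with `c = 60` taken from the linked paper, and the document IDs are made up. It is not a copy of the `EnsembleRetriever` source.

```python
def reciprocal_rank_fusion(ranked_lists, weights, c=60):
    """Fuse several rankings; a document's score is the weighted sum of 1 / (c + rank)."""
    fused = {}
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] = fused.get(doc, 0.0) + weight / (c + rank)
    # Highest fused score first.
    return sorted(fused, key=fused.get, reverse=True)

# Two made-up rankings of the same three documents.
toy_bm25_ranking = ["doc_b", "doc_a", "doc_c"]
toy_dense_ranking = ["doc_a", "doc_c", "doc_b"]

fused_order = reciprocal_rank_fusion(
    [toy_bm25_ranking, toy_dense_ranking], weights=[0.5, 0.5]
)
print(fused_order)
```

Because only ranks are used, retrievers with incomparable score scales (such as BM25 scores and cosine similarities) can be combined without any normalisation.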

We'll pack *all* of these retrievers together in an ensemble.

In [40]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [41]:
ensemble_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"Based on the reviews provided, it seems that people generally liked John Wick. The movie received positive reviews praising the action sequences, Keanu Reeves' performance, and the overall entertainment factor of the film."

In [42]:
ensemble_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is one review with a rating of 10. The URL to that review is /review/rw4854296/?ref_=tt_urv.'

In [43]:
ensemble_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In John Wick, an ex-hitman comes out of retirement to seek vengeance on the gangsters who killed his dog and took everything from him. With a price on his head, John Wick unleashes a maelstrom of destruction against those who try to stop him, making him a target of hitmen and bounty hunters. The story is filled with violent action, shootouts, and gripping fights.'

## Task 10: Semantic Chunking

While this is not a retrieval method - it *is* an effective way of increasing retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
  - `percentile`
  - `standard_deviation`
  - `interquartile`
  - `gradient`
3. Each sequence of related sentences is kept as a document!

Let's see how to implement this!

> NOTE: You do not need to run this cell if you're running this locally

In [44]:
!pip install -qU langchain_experimental


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.0.1[0m[39;49m -> [0m[32;49m25.0.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


We'll use the `percentile` thresholding method for this example, which calculates the distances between consecutive sentences and then breaks apart sequences of sentences wherever the distance exceeds a given percentile of all distances.

In [45]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)
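The percentile rule can be illustrated with toy vectors in place of real sentence embeddings. This is a simplified sketch, not the `SemanticChunker` implementation: the 2-D vectors, the sentences, and the `pct=90` setting are assumptions chosen so the example is easy to follow.

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norms

def percentile(values, pct):
    """Percentile with linear interpolation between neighbouring sorted values."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * pct / 100
    lower, upper = math.floor(position), math.ceil(position)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)

def semantic_chunks(sentences, vectors, pct):
    """Start a new chunk wherever the distance to the previous sentence exceeds the threshold."""
    distances = [cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks

# Made-up sentences and hypothetical 2-D "embeddings": two about plot, two about music.
toy_sentences = [
    "John Wick is a retired hitman.",
    "He returns for revenge.",
    "The soundtrack is loud.",
    "The score suits the action.",
]
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.2, 0.8]]

toy_chunks = semantic_chunks(toy_sentences, toy_vectors, pct=90)
print(toy_chunks)
```

The only large jump in distance is between the second and third sentences, so the text splits into one chunk about the plot and one about the music.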

Now we can split our documents.

In [46]:
semantic_documents = semantic_chunker.split_documents(documents)

Let's create a new vector store.

In [47]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="JohnWickSemantic"
)

We'll use naive retrieval for this example.

In [48]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [49]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [50]:
semantic_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'People generally liked John Wick based on the reviews provided.'

In [51]:
semantic_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

"Yes, there is a review with a rating of 10. Here is the URL to that review: '/review/rw4854296/?ref_=tt_urv'"

In [52]:
semantic_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In "John Wick," the main character seeks revenge on the people who killed his dog and took everything from him. He is an ex-hitman who comes out of retirement to track down the gangsters responsible.'

# 🤝 Breakout Room Part #2

#### 🏗️ Activity #1

Your task is to evaluate the various Retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
 - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever specific* Ragas metrics
 - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

##### HINTS:

- LangSmith provides detailed information about latency and cost.

In [53]:
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"))
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

In [54]:
from ragas.testset import TestsetGenerator

generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)
dataset = generator.generate_with_langchain_docs(documents, testset_size=10)

Applying SummaryExtractor:   0%|          | 0/44 [00:00<?, ?it/s]

Applying CustomNodeFilter:   0%|          | 0/100 [00:00<?, ?it/s]

Node 6a0b4d98-8ab6-4e19-b4de-8507c7bedc75 does not have a summary. Skipping filtering.
Node 9e5f4eb2-b439-4b7f-b495-6adf5a368906 does not have a summary. Skipping filtering.
Node 9571afff-9e2f-4be2-bb55-2e9b03c1dfa7 does not have a summary. Skipping filtering.
Node 11542b14-0b87-437c-8707-c0b2929a87c9 does not have a summary. Skipping filtering.
Node 8e5296a6-9eb9-4a13-830b-72e38836de4a does not have a summary. Skipping filtering.
Node aecbb8ca-12e8-4789-9c81-82c4a85dd6c7 does not have a summary. Skipping filtering.
Node d6a24950-e017-42b4-b0a7-9b2b2838221f does not have a summary. Skipping filtering.
Node 3a0c534a-2ae9-406d-8ce6-8ed6f12733a7 does not have a summary. Skipping filtering.
Node 474a6780-c1f3-4864-96ec-a2b38985f663 does not have a summary. Skipping filtering.
Node e485b07c-d045-41ac-8808-bb22139dcb8a does not have a summary. Skipping filtering.
Node 8e6b3216-1ad0-4750-b9cc-fc991ed309b1 does not have a summary. Skipping filtering.
Node a488805e-78fd-4dfa-81a2-3334a502c655 d

Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]:   0%|          | 0/244 [00:00<?, ?it/s]

Applying OverlapScoreBuilder:   0%|          | 0/1 [00:00<?, ?it/s]

Generating personas:   0%|          | 0/3 [00:00<?, ?it/s]

Generating Scenarios:   0%|          | 0/2 [00:00<?, ?it/s]

Generating Samples:   0%|          | 0/10 [00:00<?, ?it/s]

In [55]:
import time
from ragas import evaluate, RunConfig
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import LLMContextRecall, Faithfulness, FactualCorrectness, ResponseRelevancy, ContextEntityRecall, NoiseSensitivity
from ragas import EvaluationDataset

def get_variable_name(var, scope):
    for name, value in scope.items():
        if value is var:
            return name
    return None

def evaluate_one_chain(dataset, the_chain):
    start_time = time.perf_counter()

    for test_row in dataset:
        response = the_chain.invoke({"question" : test_row.eval_sample.user_input})
        print("response", response)
        print("response[response]", response["response"].content)
        test_row.eval_sample.response = response["response"].content
        test_row.eval_sample.retrieved_contexts = [context.page_content for context in response["context"]]

    evaluation_dataset = EvaluationDataset.from_pandas(dataset.to_pandas())
    evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"))
    custom_run_config = RunConfig(timeout=360)

    result = evaluate(
        dataset=evaluation_dataset,
        metrics=[LLMContextRecall(), Faithfulness(), FactualCorrectness(), ResponseRelevancy(), ContextEntityRecall(), NoiseSensitivity()],
        llm=evaluator_llm,
        run_config=custom_run_config
    )
    end_time = time.perf_counter()
    print(get_variable_name(the_chain, globals()), end_time - start_time)
    return result


In [56]:
result_multi_query_retrieval = evaluate_one_chain(dataset, multi_query_retrieval_chain)

response {'response': AIMessage(content="I don't know.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 6, 'prompt_tokens': 4657, 'total_tokens': 4663, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-2e9b53fa-0aeb-46c8-b28a-064e7e8b8141-0', usage_metadata={'input_tokens': 4657, 'output_tokens': 6, 'total_tokens': 4663, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}), 'context': [Document(metadata={'source': 'john_wick_2.csv', 'row': 19, 'Review_Date': '29 November 2020', 'Review_Title': ' John Wick Kills A Lot Of People\n', 'Review_Url': '/review/rw6320115/?ref_=tt_urv', 'Author': 'boblipton', 'Rating': 5, 'Movi

Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

Exception raised in Job[2]: TypeError(ufunc 'invert' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'')
Exception raised in Job[29]: TimeoutError()
Exception raised in Job[35]: TimeoutError()
Exception raised in Job[47]: TimeoutError()
Exception raised in Job[53]: TimeoutError()
Exception raised in Job[59]: TimeoutError()


multi_query_retrieval_chain 455.59587641700637


In [57]:
result_naive_retrieval = evaluate_one_chain(dataset, naive_retrieval_chain)

response {'response': AIMessage(content='Liam Neeson is not a character in the context of John Wick.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 16, 'prompt_tokens': 3807, 'total_tokens': 3823, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-46296589-f1c7-4e90-87fb-60c5f3e4ca81-0', usage_metadata={'input_tokens': 3807, 'output_tokens': 16, 'total_tokens': 3823, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}), 'context': [Document(metadata={'source': 'john_wick_1.csv', 'row': 0, 'Review_Date': '6 May 2015', 'Review_Title': ' Kinetic, concise, and stylish; John Wick kicks ass.\n', 'Review_Url': '/review/rw323

Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

Exception raised in Job[17]: TimeoutError()
Exception raised in Job[35]: TimeoutError()
Exception raised in Job[47]: TimeoutError()


naive_retrieval_chain 414.61089033301687


In [58]:
result_bm25_retrieval = evaluate_one_chain(dataset, bm25_retrieval_chain)

response {'response': AIMessage(content='Liam Neeson is an actor known for his role in the movie Taken. In the context of John Wick, he is mentioned as an actor who could have played a similar role to Keanu Reeves in the film.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 46, 'prompt_tokens': 1210, 'total_tokens': 1256, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-e8911f3c-57fe-4e77-81f5-38358b414538-0', usage_metadata={'input_tokens': 1210, 'output_tokens': 46, 'total_tokens': 1256, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}), 'context': [Document(metadata={'source': 'john_wick_4.csv', 'row': 11, 'Rev

Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

bm25_retrieval_chain 262.4028498750122


In [59]:
result_ecompression_retrieval = evaluate_one_chain(dataset, contextual_compression_retrieval_chain)

response {'response': AIMessage(content='Liem Neeson is not a character in John Wick; Liam Neeson is mentioned in the context comparing him to Keanu Reeves in terms of the type of role they play in their respective movies.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 42, 'prompt_tokens': 931, 'total_tokens': 973, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-cbe39dc7-9036-417b-b181-b793ba9c038c-0', usage_metadata={'input_tokens': 931, 'output_tokens': 42, 'total_tokens': 973, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}), 'context': [Document(metadata={'source': 'john_wick_1.csv', 'row': 0, 'Review_Date'

Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

contextual_compression_retrieval_chain 195.4534698750067


In [60]:
result_document_retriever = evaluate_one_chain(dataset, parent_document_retrieval_chain)

response {'response': AIMessage(content="I'm sorry, but I don't know who Liem Neeson is in the context of John Wick.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 23, 'prompt_tokens': 330, 'total_tokens': 353, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-f3047296-29a6-4f97-935f-6a1f54fbf148-0', usage_metadata={'input_tokens': 330, 'output_tokens': 23, 'total_tokens': 353, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}), 'context': [Document(metadata={'source': 'john_wick_1.csv', 'row': 0, 'Review_Date': '6 May 2015', 'Review_Title': ' Kinetic, concise, and stylish; John Wick kicks ass.\n', 'Review_Url': '/

Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

parent_document_retrieval_chain 172.69446754199453


In [61]:
result_ensemble_retrieval = evaluate_one_chain(dataset, ensemble_retrieval_chain)

response {'response': AIMessage(content="Liem Neeson is a prominent actor known for films like Taken. In the context of John Wick, Liam Neeson is referenced as the actor who plays the lead character in the film Taken, draw a comparison to Keanu Reeves' character in John Wick.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 54, 'prompt_tokens': 5867, 'total_tokens': 5921, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-bf9be13b-6610-4705-98bf-064d7740b1a4-0', usage_metadata={'input_tokens': 5867, 'output_tokens': 54, 'total_tokens': 5921, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}), 'context': [Document(meta

Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

Exception raised in Job[11]: TimeoutError()
Exception raised in Job[17]: TimeoutError()
Exception raised in Job[29]: TimeoutError()
Exception raised in Job[35]: TimeoutError()
Exception raised in Job[47]: TimeoutError()
Exception raised in Job[53]: TimeoutError()
Exception raised in Job[59]: TimeoutError()


ensemble_retrieval_chain 477.89819716700003


In [62]:
result_semantic_retrieval = evaluate_one_chain(dataset, semantic_retrieval_chain)

response {'response': AIMessage(content="I don't know.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 6, 'prompt_tokens': 2720, 'total_tokens': 2726, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-58467943-e082-4c9b-adbe-79aa3eaceae3-0', usage_metadata={'input_tokens': 2720, 'output_tokens': 6, 'total_tokens': 2726, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}), 'context': [Document(metadata={'source': 'john_wick_1.csv', 'row': 20, 'Review_Date': '22 October 2014', 'Review_Title': ' Smoothest action film to come around in a long time\n', 'Review_Url': '/review/rw3109271/?ref_=tt_urv', 'Author': 'IceSkateUpH

Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

Exception raised in Job[2]: TypeError(ufunc 'invert' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'')
Exception raised in Job[29]: TimeoutError()
Exception raised in Job[47]: TimeoutError()


semantic_retrieval_chain 436.39838075000443


In [64]:
print("result_multi_query_retrieval", result_multi_query_retrieval)

print("result_naive_retrieval", result_naive_retrieval)

print("result_bm25_retrieval", result_bm25_retrieval)

print("result_ecompression_retrieval", result_ecompression_retrieval)

print("result_document_retriever", result_document_retriever)

print("result_ensemble_retrieval", result_ensemble_retrieval)

print("result_semantic_retrieval", result_semantic_retrieval)

result_multi_query_retrieval {'context_recall': 0.8250, 'faithfulness': 0.8538, 'factual_correctness': 0.4900, 'answer_relevancy': 0.8724, 'context_entity_recall': 0.5650, 'noise_sensitivity_relevant': 0.4556}
result_naive_retrieval {'context_recall': 0.7750, 'faithfulness': 0.8476, 'factual_correctness': 0.4760, 'answer_relevancy': 0.9593, 'context_entity_recall': 0.5708, 'noise_sensitivity_relevant': 0.2485}
result_bm25_retrieval {'context_recall': 0.5500, 'faithfulness': 0.6695, 'factual_correctness': 0.3800, 'answer_relevancy': 0.9673, 'context_entity_recall': 0.4792, 'noise_sensitivity_relevant': 0.2273}
result_ecompression_retrieval {'context_recall': 0.7500, 'faithfulness': 0.7084, 'factual_correctness': 0.5380, 'answer_relevancy': 0.9725, 'context_entity_recall': 0.6183, 'noise_sensitivity_relevant': 0.2026}
result_document_retriever {'context_recall': 0.5500, 'faithfulness': 0.6988, 'factual_correctness': 0.3950, 'answer_relevancy': 0.8702, 'context_entity_recall': 0.5225, 'no


## 1. Golden Dataset Creation

To compare different retrieval methods apples-to-apples, we generated a synthetic “golden dataset.” This dataset:
- Contains questions that vary in complexity, including entities and relationships relevant to our domain.  
- Includes “noise” questions or partial queries to test retrieval robustness.
- Is large enough to highlight latency differences.

> **How we generated it:**  
> We used a Synthetic Data Generation pipeline (e.g., Ragas) to produce pairs of queries and ground-truth passages. This ensures we can reliably measure *context recall*, *factual correctness*, and other metrics without relying purely on manual labeling.

---

## 2. Retriever-Specific RAGAS Metrics

Below are the metrics we collected for each retriever, all from runs on the synthetic dataset. **Lower** `noise_sensitivity_relevant` is better: in Ragas it estimates how often the response makes incorrect claims while drawing on relevant retrieved context, not how well the system copes with misspelled queries. Bold marks the best score in each column. The *Contextual Compression* row corresponds to `result_ecompression_retrieval` (the Cohere rerank chain) and *Parent Document* to `result_document_retriever`. The Multi-query, Naive, Ensemble, and Semantic runs each logged `TimeoutError` or `TypeError` exceptions for some evaluation jobs, so their scores are averaged over fewer completed jobs and should be compared with caution.

| Retriever                  | Context Recall | Faithfulness | Factual Correctness | Answer Relevancy | Context Entity Recall | Noise Sensitivity (Relevant) |
|----------------------------|---------------:|-------------:|--------------------:|-----------------:|----------------------:|-----------------------------:|
| **Multi-query**            | 0.8250         | **0.8538**   | 0.4900              | 0.8724           | 0.5650                | 0.4556                       |
| **Naive**                  | 0.7750         | 0.8476       | 0.4760              | 0.9593           | 0.5708                | 0.2485                       |
| **BM25**                   | 0.5500         | 0.6695       | 0.3800              | 0.9673           | 0.4792                | 0.2273                       |
| **Contextual Compression** | 0.7500         | 0.7084       | **0.5380**          | **0.9725**       | **0.6183**            | 0.2026                       |
| **Parent Document**        | 0.5500         | 0.6988       | 0.3950              | 0.8702           | 0.5225                | **0.1767**                   |
| **Ensemble**               | **0.8500**     | 0.8191       | 0.5020              | 0.9680           | 0.5983                | 0.4048                       |
| **Semantic**               | 0.7750         | 0.8360       | 0.4767              | 0.8739           | 0.5696                | 0.4119                       |

> **Note on “Semantic Chunking”**  
> The *Semantic* row comes from `semantic_retrieval_chain`. Semantic chunking is a way of splitting documents rather than a retrieval algorithm, so it can be combined with any of the methods above. Some teams test each retriever in a “semantically chunked vs. fixed-size chunked” setup to see how chunk boundaries affect these metrics.
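
The per-metric winners in the metrics table can be checked programmatically. The sketch below copies the reported scores into plain dictionaries (the short keys such as `contextual_compression` are illustrative labels, not names from the notebook) and treats `noise_sensitivity_relevant` as lower-is-better.

```python
# Scores copied from the metrics table; keys are illustrative labels.
METRICS = [
    "context_recall", "faithfulness", "factual_correctness",
    "answer_relevancy", "context_entity_recall", "noise_sensitivity_relevant",
]
RAW = {
    "multi_query":            (0.8250, 0.8538, 0.4900, 0.8724, 0.5650, 0.4556),
    "naive":                  (0.7750, 0.8476, 0.4760, 0.9593, 0.5708, 0.2485),
    "bm25":                   (0.5500, 0.6695, 0.3800, 0.9673, 0.4792, 0.2273),
    "contextual_compression": (0.7500, 0.7084, 0.5380, 0.9725, 0.6183, 0.2026),
    "parent_document":        (0.5500, 0.6988, 0.3950, 0.8702, 0.5225, 0.1767),
    "ensemble":               (0.8500, 0.8191, 0.5020, 0.9680, 0.5983, 0.4048),
    "semantic":               (0.7750, 0.8360, 0.4767, 0.8739, 0.5696, 0.4119),
}
scores = {name: dict(zip(METRICS, values)) for name, values in RAW.items()}

# Only this metric is "lower is better"; all others are "higher is better".
LOWER_IS_BETTER = {"noise_sensitivity_relevant"}


def best_per_metric(scores):
    """Return {metric: retriever_name} for the best score in each column."""
    best = {}
    for metric in METRICS:
        pick = min if metric in LOWER_IS_BETTER else max
        best[metric] = pick(scores, key=lambda name: scores[name][metric])
    return best


best = best_per_metric(scores)
for metric, name in best.items():
    print(f"{metric:28s} {name:24s} {scores[name][metric]:.4f}")
```

Keeping the direction of each metric in one place (`LOWER_IS_BETTER`) avoids accidentally ranking noise sensitivity the wrong way round.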

---

## 3. Summary of Findings

1. **Context Recall:** 
   - The *Ensemble Retriever* provides the highest coverage (0.85).  
   - *Multi-query* is a close second (0.825).  
   - *BM25* (lexical) and the *Parent Document Retriever* (vector-based) lag behind at 0.55 each, indicating they more often miss relevant passages.

2. **Faithfulness & Factual Correctness:**  
   - *Multi-query* has the strongest faithfulness (0.8538), with *Naive* close behind (0.8476).  
   - However, *Contextual Compression* (reranking) stands out for factual correctness (0.5380), the highest in the table, suggesting it pulls in more *accurate* information for final answers.

3. **Answer Relevancy & Context Entity Recall:**  
   - *Contextual Compression* also leads on answer relevancy (0.9725) and context entity recall (0.6183). This indicates it surfaces passages that are both on-topic and rich in the correct entities.  
   - BM25 can rank high for “relevancy” on typical keyword-based queries (0.9673), but it lags in other metrics like context entity recall and factual correctness.

4. **Noise Sensitivity:**  
   - Lower is better here. The *Parent Document Retriever* scores best (0.1767), meaning its answers contained the smallest share of incorrect claims tied to the retrieved context.  
   - *Multi-query*, *Semantic*, and *Ensemble* score worst (0.40 to 0.46). Each of these runs also logged exceptions for several evaluation jobs, so treat this part of the comparison as tentative.
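
To make the noise-sensitivity number less abstract, here is a simplified sketch of how such a score can be computed once each claim in a response has been judged. It follows our reading of the Ragas documentation (incorrect claims attributable to relevant context, divided by all claims in the response); the `Claim` class and the example claims are hypothetical, and the real metric uses an LLM to extract and judge the claims.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    """One statement extracted from a generated answer (hypothetical structure)."""
    text: str
    correct: bool                # agrees with the reference answer
    from_relevant_context: bool  # supported by a relevant retrieved chunk


def noise_sensitivity_relevant(claims):
    """Share of all claims that are incorrect yet traceable to relevant context."""
    if not claims:
        return 0.0
    bad = sum(1 for c in claims if not c.correct and c.from_relevant_context)
    return bad / len(claims)


# Hypothetical judgments for a single four-claim answer.
claims = [
    Claim("Liam Neeson stars in Taken.", correct=True, from_relevant_context=True),
    Claim("Liam Neeson plays a character in John Wick.", correct=False, from_relevant_context=True),
    Claim("John Wick is played by Keanu Reeves.", correct=True, from_relevant_context=True),
    Claim("Reviewers compare John Wick to Taken.", correct=True, from_relevant_context=False),
]
score = noise_sensitivity_relevant(claims)
print(f"noise_sensitivity_relevant = {score:.2f}")
```

One of the four claims is both wrong and grounded in relevant context, so this answer scores 0.25; a score of 0 would mean no such claims.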

### Cost and Latency Considerations

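- **Lexical approach (BM25)** 
  - **Pros:** No embedding calls at query time, low computational cost, easy to maintain.
  - **Cons:** Lowest context recall (0.55) and factual correctness (0.38) in this run, especially on complex or ambiguous queries.
- **Vector-based approaches (Naive, Semantic, Parent Document)**
  - **Pros:** Match on meaning rather than exact keywords; *Naive* and *Semantic* both reach 0.775 context recall.
  - **Cons:** Higher up-front embedding cost, and search can be slower if the index is large (depends on the vector database or nearest-neighbor strategy). *Parent Document* matched BM25's low context recall (0.55) here.
- **Contextual Compression (rerank)**
  - **Pros:** Best factual correctness and answer relevancy, and it passes less context to the LLM (931 prompt tokens in the sample response logged above, versus 3,807 for Naive).
  - **Cons:** Adds a reranker API call to every query.
- **Multi-query and Ensemble** 
  - **Pros:** Potentially the best coverage (Ensemble) or highest faithfulness (Multi-query) by combining multiple signals.
  - **Cons:** Usually the most expensive, as you either run multiple retrieval steps or orchestrate multiple models/indices. They were also the slowest end-to-end runs here (455.6 s and 477.9 s, versus 172.7 s for Parent Document), although those timings are taken after the Ragas `evaluate` call returns, so they include evaluation time and its timeouts.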
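
As a rough latency proxy, the wall-clock times printed by `evaluate_one_chain` can be ranked directly. The values below are the printed timings rounded to one decimal. They cover answer generation plus the Ragas evaluation (including timed-out jobs), so they only hint at relative cost rather than measuring retrieval latency.

```python
# Wall-clock seconds printed by evaluate_one_chain, rounded to 0.1 s.
timings_s = {
    "multi_query_retrieval_chain": 455.6,
    "naive_retrieval_chain": 414.6,
    "bm25_retrieval_chain": 262.4,
    "contextual_compression_retrieval_chain": 195.5,
    "parent_document_retrieval_chain": 172.7,
    "ensemble_retrieval_chain": 477.9,
    "semantic_retrieval_chain": 436.4,
}

# Rank chains from fastest to slowest and express each time
# as a multiple of the fastest run.
fastest = min(timings_s, key=timings_s.get)
ranked = sorted(timings_s.items(), key=lambda item: item[1])
relative = {name: seconds / timings_s[fastest] for name, seconds in timings_s.items()}

for name, seconds in ranked:
    print(f"{name:42s} {seconds:7.1f} s  {relative[name]:.2f}x")
```

A fairer benchmark would time only `the_chain.invoke(...)` per question and repeat each measurement several times.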

### Which Retriever is “Best” for This Data?

Given our synthetic dataset, which emphasizes both correctness and coverage, **Contextual Compression Retrieval** (reranking) emerges as a strong choice. It strikes a balance with:

- **High factual correctness** (top score of 0.5380)  
- **High answer relevancy** (top score of 0.9725)  
- **Strong entity recall** (top score of 0.6183)

However, if *coverage* (context recall) is the priority, even at higher cost, **Ensemble Retrieval** scores highest (0.85). Meanwhile, if you need to minimize infrastructure cost and latency, a simpler approach like **BM25** or **Naive** could suffice, though with reduced accuracy on more complex queries.

In short, your choice depends on the **trade-off** between:
- Needing highly *correct, relevant passages* (Contextual Compression or Ensemble),
- Minimizing *incorrect claims drawn from retrieved context* (Parent Document Retriever),
- Minimizing *computational cost* (BM25 / Naive).
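
One way to make this trade-off explicit is a weighted score over the metrics you care about. The sketch below is illustrative only: the weights are hypothetical priorities, just three of the six metrics are included to keep it short, and noise sensitivity is inverted (`1 - value`) so that higher is always better.

```python
# (context_recall, factual_correctness, noise_sensitivity_relevant) from the table.
FIELDS = ("context_recall", "factual_correctness", "noise_sensitivity_relevant")
RAW = {
    "multi_query":            (0.8250, 0.4900, 0.4556),
    "naive":                  (0.7750, 0.4760, 0.2485),
    "bm25":                   (0.5500, 0.3800, 0.2273),
    "contextual_compression": (0.7500, 0.5380, 0.2026),
    "parent_document":        (0.5500, 0.3950, 0.1767),
    "ensemble":               (0.8500, 0.5020, 0.4048),
    "semantic":               (0.7750, 0.4767, 0.4119),
}


def weighted_score(values, weights):
    """Weighted sum of metrics, with noise sensitivity flipped to higher-is-better."""
    metrics = dict(zip(FIELDS, values))
    metrics["noise_sensitivity_relevant"] = 1.0 - metrics["noise_sensitivity_relevant"]
    return sum(weights.get(name, 0.0) * value for name, value in metrics.items())


def choose_retriever(weights):
    """Return the retriever label with the highest weighted score."""
    return max(RAW, key=lambda name: weighted_score(RAW[name], weights))


# Hypothetical priorities: coverage only, then all three metrics equally.
coverage_first = choose_retriever({"context_recall": 1.0})
balanced = choose_retriever({name: 1.0 for name in FIELDS})
print("coverage first:", coverage_first)
print("balanced:      ", balanced)
```

With equal weights the choice lands on contextual compression, which matches the recommendation above; shifting all weight to context recall selects the ensemble instead.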

---

> **Next Steps:**  
> - Tune each method to see if hyperparameter changes (e.g., chunk size, top-k results) improve specific metrics.  
> - Investigate a two-stage retrieval (e.g., BM25 → Vector-based re-ranker) for a middle ground.  
> - Perform a deeper cost/latency analysis by benchmarking each approach at scale.