# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up Qdrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

> You do not need to run the following cells if you are running this notebook locally. 

In [1]:
import pandas as pd
print("✅ pandas version:", pd.__version__)

import ragas
from ragas.testset     import TestsetGenerator
from ragas.llms        import LangchainLLMWrapper
from ragas.embeddings  import LangchainEmbeddingsWrapper
from langchain_openai  import ChatOpenAI, OpenAIEmbeddings


print("✅ ragas version:", ragas.__version__)


✅ pandas version: 2.2.3
✅ ragas version: 0.2.15


In [2]:
!pip install -qU langchain langchain-openai langchain-cohere rank_bm25 pandas ragas langsmith rapidfuzz qdrant-client 




We're also going to be leveraging [Qdrant's](https://qdrant.tech/documentation/frameworks/langchain/) (pronounced "Quadrant") VectorDB in "memory" mode (so we can leverage it locally in our colab environment).

In [3]:
#!pip install -qU qdrant-client

We'll also provide our OpenAI key, as well as our Cohere API key.

In [4]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

In [5]:
os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

## Task 2: Data Collection and Preparation

We'll be using some reviews from the 4 movies in the John Wick franchise today to explore the different retrieval strategies.

These were obtained from IMDB, and are available in the [AIM Data Repository](https://github.com/AI-Maker-Space/DataRepository).

### Data Collection

We can simply `wget` these from GitHub.

You could use any review data you wanted in this step - just be careful to make sure your metadata is aligned with your choice.

In [6]:
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw1.csv -O john_wick_1.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw2.csv -O john_wick_2.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw3.csv -O john_wick_3.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw4.csv -O john_wick_4.csv

--2025-05-18 16:32:48--  https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw1.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19628 (19K) [text/plain]
Saving to: ‘john_wick_1.csv’


2025-05-18 16:32:48 (37.5 MB/s) - ‘john_wick_1.csv’ saved [19628/19628]

--2025-05-18 16:32:49--  https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw2.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14747 (14K) [text/plain]
Saving to: ‘john_wick_2.csv’


2025-05-18 16:32:49 (14.8 MB/s) - ‘john_wick_2.csv’

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

- Self-Query: Wants as much metadata as we can provide
- Time-weighted: Wants temporal data

> NOTE: While we're creating a temporal relationship based on when these movies came out for illustrative purposes, it needs to be clear that the "time-weighting" in the Time-weighted Retriever is based on when the document was *accessed* last - not when it was created.

In [7]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

documents = []

for i in range(1, 5):
  loader = CSVLoader(
      file_path=f"john_wick_{i}.csv",
      metadata_columns=["Review_Date", "Review_Title", "Review_Url", "Author", "Rating"]
  )

  movie_docs = loader.load()
  for doc in movie_docs:

    # Add the "Movie Title" (John Wick 1, 2, ...)
    doc.metadata["Movie_Title"] = f"John Wick {i}"

    # convert "Rating" to an `int`, if no rating is provided - assume 0 rating
    doc.metadata["Rating"] = int(doc.metadata["Rating"]) if doc.metadata["Rating"] else 0

    # newer movies have a more recent "last_accessed_at"
    doc.metadata["last_accessed_at"] = datetime.now() - timedelta(days=4-i)

  documents.extend(movie_docs)

Let's look at an example document to see if everything worked as expected!

In [8]:
documents[0]

import pickle
with open("documents.pkl", "wb") as f:
    pickle.dump(documents, f)

## Task 3: Setting Up Qdrant!

Now that we have our documents, let's create a Qdrant VectorStore with the collection name "JohnWick".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [9]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    documents,
    embeddings,
    location=":memory:",
    collection_name="JohnWick"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply look at each review as a document, and use cosine-similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later

In [10]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})
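
To make the "cosine-similarity, top `k`" idea concrete, here is a minimal sketch in plain Python. The three-dimensional vectors and the `top_k` helper are hypothetical stand-ins for illustration; the real vectorstore does this over 1536-dimensional embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return the ids of the k vectors most similar to the query."""
    scored = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:k]]

# Hypothetical toy "embeddings" (assumption: not real model output)
toy_vectors = {
    "review_a": [1.0, 0.0, 0.0],
    "review_b": [0.7, 0.7, 0.0],
    "review_c": [0.0, 0.0, 1.0],
}
toy_query = [1.0, 0.1, 0.0]

nearest = top_k(toy_query, toy_vectors, k=2)
print(nearest)
```

The retriever above follows the same pattern with `k=10`, except that the embedding and the search are handled by OpenAI and Qdrant.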

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [11]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-4.1-mini` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [12]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4.1-mini")


### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [13]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and chaining it into the naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # "context"  : is assigned to a RunnablePassthrough object (will not be called or considered in the next step)
    #              by getting the value of the "context" key from the previous step
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)
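
The data flow described in the comments above can be traced by hand with ordinary dictionaries. This is only a sketch of the shapes passing between steps, not how LCEL is implemented; `fake_retriever` and `fake_llm` are hypothetical stand-ins for `naive_retriever` and `rag_prompt | chat_model`.

```python
from operator import itemgetter

def fake_retriever(question):
    """Hypothetical stand-in for naive_retriever: returns canned 'documents'."""
    return ["review text 1", "review text 2"]

def fake_llm(prompt_inputs):
    """Hypothetical stand-in for rag_prompt | chat_model."""
    n_docs = len(prompt_inputs["context"])
    return f"Answered {prompt_inputs['question']!r} using {n_docs} documents"

chain_input = {"question": "Did people generally like John Wick?"}

# Step 1: build the {"context", "question"} mapping from the input dict
get_question = itemgetter("question")
step_one = {
    "context": fake_retriever(get_question(chain_input)),
    "question": get_question(chain_input),
}

# Step 2: the assign step keeps every key and re-assigns "context" to itself
step_two = {**step_one, "context": itemgetter("context")(step_one)}

# Step 3: produce the final {"response", "context"} mapping
chain_output = {
    "response": fake_llm(step_two),
    "context": itemgetter("context")(step_two),
}
print(chain_output["response"])
```

Returning `"context"` alongside `"response"` is what lets us inspect (and later evaluate) exactly which documents the retriever supplied.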

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [14]:
naive_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"Yes, people generally liked the first John Wick movie. The reviews highlight Keanu Reeves' cool and confident performance, the stylish and well-choreographed action sequences, and the film's brisk pace and entertaining plot. Many reviewers praised it as one of the best action movies in recent years, recommending it especially to action fans. Although there are some differing opinions, with a few reviewers finding it generic or giving moderate ratings, the overall sentiment is positive and the film is considered a standout in the genre."

In [15]:
naive_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10. Here is the URL to that review:\n\n- /review/rw4854296/?ref_=tt_urv\n\nThis review is titled "A Masterpiece & Brilliant Sequel" for John Wick 3.'

In [16]:
naive_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In the movie *John Wick* (2014), John Wick, played by Keanu Reeves, is a retired hitman who is grieving the recent death of his wife. His wife had left him a puppy to help him cope with her death. However, his quiet life is shattered when a group of gangsters led by a young Russian-American thug break into his home, kill his dog, and steal his cherished car. This brutal act pulls John out of retirement and sets him on a violent path of revenge against the criminals responsible. \n\nJohn Wick unleashes a relentless and highly skilled assault on the gangsters, demonstrating his lethal assassin skills. The thugs\' actions awaken the legend of John Wick — a feared and highly skilled hitman known as "The Boogeyman." As a result, John finds himself targeted by many killers and bounty hunters who want to take him down, leading to intense and expertly choreographed action sequences filled with shootouts and hand-to-hand combat.\n\nThe film is known for its stylish, kinetic action, suspense, a

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model) which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.

This retriever is very straightforward to set up! Let's see it happen down below!


In [17]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(documents)
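
For intuition about what happens inside the retriever, here is a small sketch of one common BM25 scoring variant in plain Python. It assumes whitespace tokenization, the usual `k1`/`b` parameters, and a "+1" smoothed IDF; it is not the exact code used by `rank_bm25` or `BM25Retriever`, and the toy corpus is made up.

```python
import math
from collections import Counter

def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score every document in corpus against query with a BM25 variant."""
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs
    # Number of documents each term appears in
    doc_freq = Counter(term for doc in tokenized for term in set(doc))

    scores = []
    for doc in tokenized:
        term_counts = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in term_counts:
                continue
            # Rare terms get a higher inverse document frequency
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            tf = term_counts[term]
            # Term frequency saturates, and long documents are penalized
            length_norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores

# Hypothetical toy corpus (assumption: not taken from the review data)
toy_corpus = [
    "john wick is a stylish action movie",
    "the dog was a gift from his wife",
    "great action and great choreography",
]
toy_scores = bm25_scores("action choreography", toy_corpus)
best_match = toy_corpus[toy_scores.index(max(toy_scores))]
print(best_match)
```

Because scoring depends only on shared words, a document that phrases the same idea with different vocabulary scores zero, which is the main weakness of sparse retrieval compared with embeddings.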

We'll construct the same chain - only changing the retriever.

In [18]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [19]:
bm25_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Overall, opinions about the John Wick films appear mixed based on the provided reviews:\n\n- The first John Wick movie was generally very well-liked. Reviews describe it as "something special," with smooth, stylish action sequences and an engaging world, receiving high ratings like 8 and 10 out of 10. It is praised as a must-see for action fans.\n\n- The fourth film, John Wick: Chapter 4, received a more critical review, calling it the weakest in the series with little plot and mostly gunfights, although it still received a rating of 4 out of 10 in the given example.\n\n- The third film, John Wick 3, received very negative feedback from the one provided review, describing it as mindless, overly violent, and plotless, with a rating of 1 out of 10.\n\nIn summary, people generally liked the original John Wick film quite a lot, but later installments have received more mixed to negative reactions from some reviewers.'

In [20]:
bm25_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'No, none of the reviews provided have a rating of 10. Therefore, there are no URLs to reviews with a rating of 10.'

In [21]:
bm25_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'The movie *John Wick* (the first film) centers around an emotionally driven action story featuring Keanu Reeves as John Wick. It is about a retired hitman who embarks on a relentless and beautifully choreographed quest for vengeance after tragic events disrupt his peaceful life. The film is praised for its intense hand-to-hand combat, gunfights, and surprisingly emotional depth for an action movie. Overall, it is highly regarded as an exciting and well-executed action thriller that keeps viewers engaged throughout.'

It's not clear that this is better or worse overall - but failing to find any review with a rating of 10 isn't great!

## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [22]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-english-v3.0")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)
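
The two-stage "retrieve many, keep the best few" pattern can be sketched without any API calls. In this sketch `word_overlap` is a hypothetical stand-in for a reranking model such as Cohere's, and `first_stage` pretends to be the output of a base retriever.

```python
def retrieve_then_rerank(query, candidates, rerank_score, top_n=3):
    """Keep only the top_n candidates according to a second, more precise scorer."""
    ranked = sorted(candidates, key=lambda doc: rerank_score(query, doc), reverse=True)
    return ranked[:top_n]

def word_overlap(query, doc):
    """Hypothetical reranker: fraction of query words that appear in the document."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words)

# Pretend these four documents came back from a first-stage retriever
first_stage = [
    "keanu reeves is great",
    "the action scenes are stylish",
    "john wick loses his dog",
    "john wick gets his revenge in stylish action scenes",
]

compressed = retrieve_then_rerank(
    "john wick action scenes", first_stage, word_overlap, top_n=2
)
print(compressed)
```

The first stage is tuned for recall (hence `k=10` earlier), while the reranker is tuned for precision, so the LLM sees fewer but more relevant documents.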

Let's create our chain again, and see how this does!

In [23]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [24]:
contextual_compression_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"Yes, people generally liked the first John Wick film. Reviews describe it as the best action film of the year and one of the best in the past decade, praising Keanu Reeves' performance, the slick and intense action sequences, and the unique criminal underworld setting. It is highly recommended for action fans and considered fun, stylish, and well-executed. However, the third film in the series received a more mixed review, suggesting the magic may have diminished over time."

In [25]:
contextual_compression_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10. Here is the URL to that review:\n\n- /review/rw4854296/?ref_=tt_urv'

In [26]:
contextual_compression_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"In the first John Wick movie, John Wick, a retired hitman, seeks revenge after a group of gangsters steal his car and kill his beloved dog, which was a final gift from his deceased wife. This brutal act drags him out of retirement and into a violent rampage against those responsible, showcasing intense action, fights, and shootouts.\n\nIn John Wick 2, after resolving issues with the Russian mafia, John returns home but is soon visited by the mobster Santino D'Antonio, who forces him to honor a blood oath (marker) by helping him. John initially refuses, leading Santino to blow up his house. John is compelled to kill Santino's sister, Gianna D'Antonio, in Rome so that Santino can take a seat on the High Table of the criminal organizations. After completing the task, Santino betrays John by putting a $7 million contract on his head, forcing John to fight against numerous professional killers while vowing to kill Santino, who is no longer protected by his marker.\n\nOverall, John Wick is 

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` number of new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context

So, how hard is it to set up? Not bad! Let's see it down below!



In [27]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=bm25_retriever, llm=chat_model
)
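
The three steps listed above can be sketched in plain Python as follows. `fake_generate_queries` stands in for the LLM call and `fake_retrieve` for the underlying retriever; both are hypothetical, as is the toy index. This sketch retrieves only for the generated queries, not the original one.

```python
def multi_query_retrieve(question, generate_queries, retrieve):
    """Retrieve for each generated query and return the unique documents in order."""
    seen = set()
    unique_docs = []
    for query in generate_queries(question):
        for doc in retrieve(query):
            if doc not in seen:
                seen.add(doc)
                unique_docs.append(doc)
    return unique_docs

def fake_generate_queries(question):
    """Hypothetical stand-in for the LLM that rewrites the user question."""
    return [
        "john wick audience reception",
        "john wick review ratings",
    ]

# Hypothetical index mapping a query string to ranked document ids
toy_index = {
    "john wick audience reception": ["review_1", "review_2"],
    "john wick review ratings": ["review_2", "review_3"],
}

def fake_retrieve(query):
    """Hypothetical stand-in for the underlying retriever."""
    return toy_index.get(query, [])

merged = multi_query_retrieve(
    "Did people generally like John Wick?", fake_generate_queries, fake_retrieve
)
print(merged)
```

Note the cost trade-off: every user question now triggers one extra LLM call plus one retrieval per generated query.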

In [28]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [29]:
multi_query_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"Yes, people generally like John Wick. The franchise has been praised for its expertly choreographed and well-executed action sequences, with many reviewers highlighting Keanu Reeves' performance and the consistent quality of the films. For example, John Wick 4 received high ratings and was described as possibly one of the best action movies ever, and the earlier films were also well-received and considered standards for action movies in Hollywood. However, there are some mixed opinions, especially about John Wick 4, where a few reviews mention a lack of storyline or criticize it for being over the top. But overall, the general reception is positive."

In [30]:
multi_query_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10. Here is the URL to that review:\n\n/review/rw4854296/?ref_=tt_urv'

In [31]:
multi_query_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'The movie "John Wick" centers around a former hitman named John Wick, played by Keanu Reeves, who seeks revenge after a brutal home invasion in which thugs kill his beloved dog—one of the last gifts from his deceased wife. This event pulls him back into a violent criminal underworld. The film is known for its beautifully choreographed action sequences, combining hand-to-hand combat and gunfights with a stylish presentation. It features intense and brutal fights, and John Wick is portrayed as a ruthless character navigating a world filled with equally dangerous individuals. The story is relatively simple, focusing on revenge and the consequences of violence, but it is executed with great intensity and emotional weight.'

## Task 8: Parent Document Retriever

The Parent Document Retriever is a "small-to-big" strategy that works as follows:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent chunks".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
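
A minimal sketch of "search small, return big" using only dictionaries and lists. The word-based splitter, the substring match, and the two toy reviews are hypothetical simplifications; the real retriever splits with `RecursiveCharacterTextSplitter` and matches child chunks by embedding similarity.

```python
def split_into_chunks(text, words_per_chunk):
    """Naive word-based splitter (assumption: stands in for a real text splitter)."""
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]

# "Docstore": full parent documents keyed by id
parent_store = {
    "review_1": "The action is superb. The plot is thin but the dog scene is heartbreaking.",
    "review_2": "A slow sequel. The soundtrack is the best part of the film.",
}

# "Vectorstore": child chunks, each remembering the id of its parent
child_index = []
for parent_id, text in parent_store.items():
    for chunk in split_into_chunks(text, words_per_chunk=4):
        child_index.append((chunk, parent_id))

def search_small_return_big(query_word):
    """Match against child chunks, but return the de-duplicated parent documents."""
    parent_ids = []
    for chunk, parent_id in child_index:
        if query_word in chunk.lower() and parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [parent_store[parent_id] for parent_id in parent_ids]

results = search_small_return_big("dog")
print(results)
```

The match happens on a four-word chunk, yet the caller receives the whole review, including the surrounding sentences the chunk alone would have lost.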

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [32]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter

from qdrant_client import QdrantClient, models

parent_docs = documents
child_splitter = RecursiveCharacterTextSplitter(chunk_size=200)

We'll need to set up a new Qdrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension, you'll need to change this if you're using a different embedding model.

In [33]:
client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = Qdrant(
    collection_name="full_documents", embeddings=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)


Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [34]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [35]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [36]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [37]:
parent_document_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Based on the provided context, people generally liked the John Wick series. One review praised the first John Wick movie for its well-choreographed action and emotional setup, recommending it highly. Another review expressed that the series has been remarkably consistent and well received, even stating that "John Wick: Chapter 4" might be the best the series has to offer. However, there are also some negative opinions, such as one reviewer finding "John Wick 4" horrible and criticizing its plot and fight scenes.\n\nIn summary, while there are mixed opinions, the overall reception of the John Wick series appears to be generally positive.'

In [38]:
parent_document_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is at least one review with a rating of 10. \n\nHere is the URL to that review:  \n/review/rw4854296/?ref_=tt_urv\n\nThis review is for "John Wick 3" and the rating given is 10.'

In [39]:
parent_document_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"In the first John Wick movie, John Wick is a retired assassin who re-enters the dangerous world after gangsters kill his dog, a final gift from his deceased wife, and steal his car. Seeking revenge, he unleashes a violent and relentless vendetta against those responsible, facing numerous hitmen who come after him because of the high bounty on his head.\n\nThe story centers on John's deep thirst for retribution and showcases intense action, including shootouts and fights, as he battles his enemies.\n\nIn summary, John Wick is about a legendary ex-hitman who is pulled back into a deadly and action-packed fight for vengeance after losing his dog and car to criminals."

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to answer the performance question more rigorously.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever simply takes 2, or more, retrievers and combines their retrieved documents based on a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.

In [40]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)
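
For intuition, here is a small sketch of weighted Reciprocal Rank Fusion in plain Python. It assumes ranks start at 1 and uses the smoothing constant 60 from the RRF paper; the two rankings are made up, and this is not a copy of LangChain's implementation.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse several rankings: each document earns weight / (rank + c) per list."""
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings from two retrievers (assumption: illustrative ids only)
bm25_ranking = ["review_a", "review_b", "review_c"]
dense_ranking = ["review_b", "review_d", "review_a"]

fused = weighted_rrf([bm25_ranking, dense_ranking], weights=[0.5, 0.5])
print(fused)
```

Because only ranks are used, the fusion works even when the retrievers produce scores on incompatible scales (BM25 scores versus cosine similarities, for example), and documents that several retrievers agree on rise to the top.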

We'll pack *all* of these retrievers together in an ensemble.

In [41]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [42]:
ensemble_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Yes, people generally liked the original John Wick film. The reviews for John Wick 1 are overwhelmingly positive, praising its stylish and well-choreographed action sequences, Keanu Reeves\' performance, and the unique criminal underworld it presents. Reviewers describe it as a fun, violent, and kinetic action movie with a straightforward but effective plot centered on revenge. For example, one reviewer gave it a perfect 10 rating calling it a "must see for action fans" and highlighting its smooth action and cool noir world. Another gave it a 9, calling it "the coolest action film you\'ll see all year" and praising its brisk pacing and intense action scenes. Even generally enthusiastic reviews emphasize its successful simplicity and the emotionally engaging setup involving John Wick\'s dog.\n\nWhile there are some more neutral opinions—such as one reviewer rating it 6 out of 10 who felt it was a "generic action thriller"—the overall tone from multiple highly rated reviews shows that J

In [43]:
ensemble_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is at least one review with a rating of 10. Here is the URL to that review:\n\n- Review Title: "A Masterpiece & Brilliant Sequel"  \n  Rating: 10  \n  URL: /review/rw4854296/?ref_=tt_urv\n\nIf you need more information or additional reviews, please let me know!'

In [44]:
ensemble_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In the original "John Wick" movie (2014), the story centers around John Wick, a retired hitman who is mourning the death of his wife. After her death, John tries to rebuild his life. However, his peaceful life is shattered when a group of gangsters, led by a young Russian-American punk, break into his house, beat him, kill his beloved dog (a final gift from his late wife), and steal his classic car. This attack triggers John to come out of retirement and seek revenge.\n\nJohn Wick unleashes a relentless and violently brutal vendetta against the Russian mobsters responsible, showcasing his lethal skills and deadly efficiency. The gangsters soon realize they have awakened a legendary assassin known as "The Boogeyman," and a large bounty is placed on John\'s head, causing hitmen from all over to pursue him.\n\nThroughout the film, John fights through numerous assassins and criminals in highly choreographed action sequences, blending gunfights, martial arts, and suspenseful shootouts. The

## Task 10: Semantic Chunking

While this is not a retrieval method - it *is* an effective way of increasing retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
  - `percentile`
  - `standard_deviation`
  - `interquartile`
  - `gradient`
3. Each sequence of related sentences is kept as a document!

Let's see how to implement this!

> NOTE: You do not need to run this cell if you're running this locally

In [45]:
!pip install -qU langchain_experimental


We'll use the `percentile` thresholding method for this example, which will calculate all distances between consecutive sentences and then break apart sequences of sentences wherever the distance exceeds a given percentile among all distances.

In [46]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)
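
The percentile rule can be sketched in plain Python. The two-dimensional vectors below are hypothetical stand-ins for sentence embeddings, and the `percentile` helper is a simple linear-interpolation implementation; the real `SemanticChunker` involves more details that this sketch omits.

```python
import math

def cosine_distance(a, b):
    """1 minus cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)

def percentile(values, pct):
    """Linearly interpolated percentile of a list of numbers."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * pct / 100
    lower, upper = math.floor(position), math.ceil(position)
    if lower == upper:
        return ordered[lower]
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)

def semantic_chunks(sentences, vectors, pct):
    """Start a new chunk wherever the distance to the previous sentence is unusually large."""
    distances = [
        cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)
    ]
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks

# Hypothetical sentences and toy "embeddings" (assumption: not real model output)
toy_sentences = [
    "The action is superb.",
    "Every fight is well choreographed.",
    "The soundtrack is forgettable.",
    "The score never stands out.",
]
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]

toy_chunks = semantic_chunks(toy_sentences, toy_embeddings, pct=60)
print(toy_chunks)
```

With only three distances the toy example needs a low percentile to trigger a split; on a real corpus with many sentences a much higher percentile is typical, so that only the sharpest topic shifts become chunk boundaries.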

Now we can split our documents.

In [47]:
semantic_documents = semantic_chunker.split_documents(documents)

Let's create a new vector store.

In [48]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="JohnWickSemantic"
)

We'll use naive retrieval for this example.

In [49]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [50]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [51]:
semantic_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Based on the reviews provided, people generally liked John Wick, especially the first film. Reviews highlight it as a special and stylish action movie with smooth, well-choreographed action sequences, an interesting world, and a compelling performance by Keanu Reeves. It is described as slick, violent fun, highly entertaining, and refreshing in the action genre. The franchise overall has remained consistent and well received, with strong ratings for multiple installments.\n\nThere are some mixed opinions on later films, such as John Wick 3, where one reviewer felt "the magic is gone" and gave it a lower rating, but others still praised the action and intensity.\n\nIn summary, John Wick is generally liked, particularly for its action and style, and the series is regarded as maintaining high quality across its sequels.'

In [52]:
semantic_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10. The URL to that review is:\n\n/review/rw4854296/?ref_=tt_urv'

In [53]:
semantic_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In "John Wick," Keanu Reeves plays John Wick, a retired hitman who is forced back into a violent world to seek revenge. The story begins when a group of gangsters, led by the son of a Russian mob boss whom John used to work for, break into his house, beat him up, kill his puppy (a final gift from his deceased wife), and steal his beloved car. This act of cruelty unknowingly awakens John Wick’s lethal skills and legacy. Driven by grief and rage, John embarks on a relentless and ultra-violent vendetta against the gangsters. Throughout the film, he faces numerous assassins sent after him by the Russian mob boss who wants to protect his son. The movie is known for its stylish, well-choreographed action sequences, kinetic chaos, and a simple but emotionally compelling revenge-driven plot.'

# 🤝 Breakout Room Part #2

#### 🏗️ Activity #1

Your task is to evaluate the various Retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
 - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever specific* Ragas metrics
 - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

##### HINTS:

- LangSmith provides detailed information about latency and cost.
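As a rough illustration of what a *retriever specific* metric measures, the sketch below computes precision and recall over retrieved document IDs. Ragas offers metrics such as context precision and context recall; this is not their implementation, only an ID-based simplification, and the function names and review IDs are hypothetical.

```python
def precision_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = retrieved_ids[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / len(top_k)

def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of all relevant documents that appear in the top-k."""
    if not relevant_ids:
        return 0.0
    hits = len(set(retrieved_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

# Hypothetical IDs: what one retriever returned vs. the golden contexts.
retrieved = ["r7", "r2", "r9", "r4"]
relevant = {"r2", "r4", "r5"}

p = precision_at_k(retrieved, relevant, k=4)
r = recall_at_k(retrieved, relevant, k=4)
print(f"precision@4={p:.2f} recall@4={r:.2f}")
```

Scoring every retriever on the same golden dataset with the same metric is what makes the comparison meaningful, regardless of which metric you pick.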

In [54]:
# ╔══════════════════════════════════════════════════════════════════════╗
# ║  QUICK-FIX BOOTSTRAP  ▸ installs deps, resurrects `dataset` & docs   ║
# ╚══════════════════════════════════════════════════════════════════════╝
# 1️⃣  Install any missing wheels *inside* the running kernel
%pip install --quiet --upgrade rapidfuzz ragas langsmith pandas

# 2️⃣  Re-import everything we need
import os, pickle, json, sys, pathlib
import pandas as pd
from langsmith import Client                         # pip name: langsmith
from ragas.testset import TestsetGenerator           # pip name: ragas
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.docstore.document import Document

# 3️⃣  Restore previously-saved objects if they exist ────────────────────
if pathlib.Path("dataset.pkl").exists():
    with open("dataset.pkl", "rb") as f:
        dataset = pickle.load(f)
    print(f"✅  Reloaded test-set from disk with {len(dataset)} Q-A pairs")
else:
    # 3a.  Re-hydrate the source *documents* quickly (only light I/O)
    if pathlib.Path("documents.pkl").exists():
        with open("documents.pkl", "rb") as f:
            documents = pickle.load(f)
        print(f"✅  Reloaded {len(documents)} documents from disk")
    else:
        # Fallback – load the four John-Wick CSVs (much faster than rerunning full pipeline)
        docs = []
        for csv in ("john_wick_1.csv","john_wick_2.csv","john_wick_3.csv","john_wick_4.csv"):
            df = pd.read_csv(csv)
            docs.extend([Document(page_content=row["content"]) for _, row in df.iterrows()])
        documents = docs
        with open("documents.pkl", "wb") as f:
            pickle.dump(documents, f)
        print(f"✅  Parsed & cached {len(documents)} docs from CSVs")

    # 3b.  QUICK synthetic test-set (cheap: only 10 Q-A pairs)
    llm        = LangchainLLMWrapper(ChatOpenAI(model="gpt-4.1-nano"))
    embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())
    generator  = TestsetGenerator(llm=llm, embedding_model=embeddings)
    dataset    = generator.generate_with_langchain_docs(documents, testset_size=10)
    with open("dataset.pkl", "wb") as f:
        pickle.dump(dataset, f)
    print(f"✅  Generated and cached new test-set with {len(dataset)} Q-A pairs")

# 4️⃣  Make *dataset* visible to later cells
print("🏁  Environment ready – go ahead and run the Activity-2 evaluation cell.")


/Users/aqsa/AI Projects/AIE6 Class Projects/AIE6/13_Advanced_Retrieval/.venv/bin/python: No module named pip
Note: you may need to restart the kernel to use updated packages.
✅  Reloaded test-set from disk with 12 Q-A pairs
🏁  Environment ready – go ahead and run the Activity-2 evaluation cell.


In [55]:
# ── LangSmith token prompt (works on any version of the SDK) ───────────
import os, getpass
from langsmith import Client      # `Client` exists in every release

def ensure_langsmith_token():
    if os.getenv("LANGSMITH_API_KEY") or os.getenv("LANGCHAIN_API_KEY"):
        print("🔑  Token already present in env – good to go!")
        return

    token = getpass.getpass("Paste your LangSmith API key: ").strip()
    if not token:
        raise RuntimeError("No token entered – aborting evaluation.")

    # Set *both* names so whichever the SDK expects will resolve
    os.environ["LANGSMITH_API_KEY"] = token
    os.environ["LANGCHAIN_API_KEY"] = token

    # quick sanity-check
    try:
        Client().list_datasets(limit=1)      # cheap GET /datasets
        print("✅  Token works – ready to create datasets & runs")
    except Exception as e:
        raise RuntimeError(f"Token validation failed: {e}")

ensure_langsmith_token()


✅  Token works – ready to create datasets & runs


In [61]:
# -------------------- ACTIVITY #2 · Simplified LangSmith Evaluation --------------------
# Prereqs (make sure you ran all cells above):
#   • OPENAI_API_KEY  – already set earlier
#   • LANGCHAIN_API_KEY and LANGCHAIN_ENDPOINT (or LANGSMITH_API_KEY) – for LangSmith
#   • variable `dataset`  -> a ragas.Testset object created in Activity #1
#   • retrieval chains: naive_retrieval_chain, bm25_retrieval_chain,
#                       contextual_compression_retrieval_chain, multi_query_retrieval_chain,
#                       parent_document_retrieval_chain, ensemble_retrieval_chain,
#                       semantic_retrieval_chain
import os, time, uuid
from datetime import datetime, timezone
import pandas as pd, numpy as np

from langsmith       import Client
from langchain.smith import run_on_dataset, RunEvalConfig
from langchain_openai import ChatOpenAI

# 1. Build a unique dataset name and upload your examples  ---------------
client       = Client()  # uses LANGSMITH_API_KEY
ts           = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
DATASET_NAME = f"jw_synth_qa_{ts}_{uuid.uuid4().hex[:5]}"

# Normalize the ragas test-set into 'question' and 'answer'
try:
    df = dataset.to_pandas()
except AttributeError:
    df = pd.DataFrame(dataset)

RENAMES = {
    "query":"question","user_input":"question","inputs":"question",
    "ground_truth":"answer","reference":"answer","outputs":"answer"
}
df = df.rename(columns={old:new for old,new in RENAMES.items() if old in df})

# Explode dict-columns if necessary
if "question" not in df and "inputs" in df:
    df["question"] = df["inputs"].apply(lambda d: d.get("query") or d.get("user_input"))
if "answer"   not in df and "outputs" in df:
    df["answer"]   = df["outputs"].apply(lambda d: d.get("answer") or d.get("reference"))

if not {"question","answer"} <= set(df):
    raise ValueError(f"Missing columns: {set(['question','answer'])-set(df)}")

# Create & populate the golden dataset
golden = client.create_dataset(DATASET_NAME, description="Synthetic QA from RAGAS")
for _, row in df.iterrows():
    client.create_example(
        inputs     = {"question": row["question"]},
        outputs    = {"answer":   row["answer"]},
        dataset_id = golden.id,
    )
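# NOTE: `golden` was returned before any examples were added, so its
# `example_count` is stale here (it prints 0); len(df) rows were uploaded.
print(f"✅ Created dataset {DATASET_NAME} with {golden.example_count} examples")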

# 2. Prepare evaluators & LLM ---------------------------------------------
eval_llm = ChatOpenAI(model="gpt-4.1-nano", temperature=0)

eval_cfg = RunEvalConfig(
    eval_llm=eval_llm,
    evaluators=[
        RunEvalConfig.QA(),    # built-in correctness
        RunEvalConfig.LabeledCriteria({
            "rotten_tomatoes": (
                "Rate from 1–5 how satisfyingly this answer would "
                "land on a Rotten-Tomatoes–style review for helpfulness."
            )
        }),
    ],
)

# 3. Launch one run per retriever under its own unique project -------------
chains = {
    "naive"                 : naive_retrieval_chain,
    "bm25"                  : bm25_retrieval_chain,
    "contextual_compression": contextual_compression_retrieval_chain,
    "multi_query"           : multi_query_retrieval_chain,
    "parent_document"       : parent_document_retrieval_chain,
    "ensemble"              : ensemble_retrieval_chain,
    "semantic_chunking"     : semantic_retrieval_chain,
}

run_handles = {}
project_map = {}

for name, chain in chains.items():
    proj = f"eval_{name}_{ts}_{uuid.uuid4().hex[:5]}"
    project_map[name] = proj
    print(f"🚀 Launching {name} under project '{proj}' …")
    run_handles[name] = run_on_dataset(
        client               = client,
        dataset_name         = DATASET_NAME,
        llm_or_chain_factory = lambda c=chain: c,
        evaluation           = eval_cfg,
        project_name         = proj,
        tags                 = ["activity-2", name],
    )

print("🎉  All jobs submitted; check the LangSmith UI for each project.")

# 4. Optional: latency · tokens · cost summary, using the exact project_map ----
# Flat rate applied to all tokens (input and output alike); adjust to your model's pricing.
PRICE_PER_1K = 0.03  # USD per 1K tokens
time.sleep(5)  # give runs a moment to register

def stats_for_project(proj_name: str):
    try:
        runs = list(client.list_runs(
            project_name=proj_name,
            execution_order=1,
            filter_state="completed"
        ))
    except Exception:
        return {"count": 0, "P50 s": None, "P95 s": None, "tokens": 0, "cost$": 0}
    # Compute latencies from the run objects
    latencies = []
    for r in runs:
        if r.start_time and r.end_time:
            # r.start_time and r.end_time are datetimes
            delta = r.end_time - r.start_time
            latencies.append(delta.total_seconds())
    lat = np.array(latencies, float)

    # Tokens is still r.total_tokens
    tok = np.array([r.total_tokens or 0 for r in runs], int)
    return {
        "count"   : len(runs),
        # Guard on the latency array itself: runs without timestamps are skipped above
        "P50 s"   : np.percentile(lat, 50).round(3) if lat.size else None,
        "P95 s"   : np.percentile(lat, 95).round(3) if lat.size else None,
        "tokens"  : tok.sum(),
        "cost$"   : (tok.sum()/1000 * PRICE_PER_1K).round(4),
    }

# Build summary by iterating over the exact project_map
summary = pd.DataFrame({
    name: stats_for_project(proj_name)
    for name, proj_name in project_map.items()
}).T

print("\n=== Latency · Tokens · Cost Summary ===")
display(summary)


✅ Created dataset jw_synth_qa_20250518T235914Z_bfb89 with 0 examples
🚀 Launching naive under project 'eval_naive_20250518T235914Z_73fdd' …




View the evaluation results for project 'eval_naive_20250518T235914Z_73fdd' at:
https://smith.langchain.com/o/4e223e9d-b789-4c00-8d16-32ad70974f10/datasets/793567b0-434d-48a2-8bf8-4e90d87b34b8/compare?selectedSessions=8e4b9196-b276-4fe4-bc7e-1e390699edb8

View all tests for Dataset jw_synth_qa_20250518T235914Z_bfb89 at:
https://smith.langchain.com/o/4e223e9d-b789-4c00-8d16-32ad70974f10/datasets/793567b0-434d-48a2-8bf8-4e90d87b34b8
[------------------------------------------------->] 12/12
🚀 Launching bm25 under project 'eval_bm25_20250518T235914Z_159f8' …




View the evaluation results for project 'eval_bm25_20250518T235914Z_159f8' at:
https://smith.langchain.com/o/4e223e9d-b789-4c00-8d16-32ad70974f10/datasets/793567b0-434d-48a2-8bf8-4e90d87b34b8/compare?selectedSessions=6385b4e3-1d86-4234-931a-95ad5f345c1a

View all tests for Dataset jw_synth_qa_20250518T235914Z_bfb89 at:
https://smith.langchain.com/o/4e223e9d-b789-4c00-8d16-32ad70974f10/datasets/793567b0-434d-48a2-8bf8-4e90d87b34b8
[------------------------------------------------->] 12/12
🚀 Launching contextual_compression under project 'eval_contextual_compression_20250518T235914Z_194e7' …




View the evaluation results for project 'eval_contextual_compression_20250518T235914Z_194e7' at:
https://smith.langchain.com/o/4e223e9d-b789-4c00-8d16-32ad70974f10/datasets/793567b0-434d-48a2-8bf8-4e90d87b34b8/compare?selectedSessions=fe40d825-6777-4e4a-a093-ee8e5f47d533

View all tests for Dataset jw_synth_qa_20250518T235914Z_bfb89 at:
https://smith.langchain.com/o/4e223e9d-b789-4c00-8d16-32ad70974f10/datasets/793567b0-434d-48a2-8bf8-4e90d87b34b8
[------------------------------------------------->] 12/12
🚀 Launching multi_query under project 'eval_multi_query_20250518T235914Z_2df2b' …




View the evaluation results for project 'eval_multi_query_20250518T235914Z_2df2b' at:
https://smith.langchain.com/o/4e223e9d-b789-4c00-8d16-32ad70974f10/datasets/793567b0-434d-48a2-8bf8-4e90d87b34b8/compare?selectedSessions=18ef6eeb-9b9a-4b15-9650-ac2b8a1151d5

View all tests for Dataset jw_synth_qa_20250518T235914Z_bfb89 at:
https://smith.langchain.com/o/4e223e9d-b789-4c00-8d16-32ad70974f10/datasets/793567b0-434d-48a2-8bf8-4e90d87b34b8
[------------------------------------------------->] 12/12
🚀 Launching parent_document under project 'eval_parent_document_20250518T235914Z_ff2e0' …




View the evaluation results for project 'eval_parent_document_20250518T235914Z_ff2e0' at:
https://smith.langchain.com/o/4e223e9d-b789-4c00-8d16-32ad70974f10/datasets/793567b0-434d-48a2-8bf8-4e90d87b34b8/compare?selectedSessions=f6db63cf-7b51-4f5d-9b2f-fd446a0dbdf3

View all tests for Dataset jw_synth_qa_20250518T235914Z_bfb89 at:
https://smith.langchain.com/o/4e223e9d-b789-4c00-8d16-32ad70974f10/datasets/793567b0-434d-48a2-8bf8-4e90d87b34b8
[------------------------------------------------->] 12/12
🚀 Launching ensemble under project 'eval_ensemble_20250518T235914Z_504d2' …




View the evaluation results for project 'eval_ensemble_20250518T235914Z_504d2' at:
https://smith.langchain.com/o/4e223e9d-b789-4c00-8d16-32ad70974f10/datasets/793567b0-434d-48a2-8bf8-4e90d87b34b8/compare?selectedSessions=38ab2278-d46c-4754-8219-40fabf18905d

View all tests for Dataset jw_synth_qa_20250518T235914Z_bfb89 at:
https://smith.langchain.com/o/4e223e9d-b789-4c00-8d16-32ad70974f10/datasets/793567b0-434d-48a2-8bf8-4e90d87b34b8
[------------------------------------------------->] 12/12
🚀 Launching semantic_chunking under project 'eval_semantic_chunking_20250518T235914Z_529c0' …




View the evaluation results for project 'eval_semantic_chunking_20250518T235914Z_529c0' at:
https://smith.langchain.com/o/4e223e9d-b789-4c00-8d16-32ad70974f10/datasets/793567b0-434d-48a2-8bf8-4e90d87b34b8/compare?selectedSessions=fadd788d-f9fe-40f5-9314-16fc7ef6ebbb

View all tests for Dataset jw_synth_qa_20250518T235914Z_bfb89 at:
https://smith.langchain.com/o/4e223e9d-b789-4c00-8d16-32ad70974f10/datasets/793567b0-434d-48a2-8bf8-4e90d87b34b8
[------------------------------------------------->] 12/12
🎉  All jobs submitted; check the LangSmith UI for each project.

=== Latency · Tokens · Cost Summary ===


| retriever | count | P50 (s) | P95 (s) | tokens | cost ($) |
|---|---|---|---|---|---|
| naive | 12 | 4.81 | 9.486 | 46839 | 1.4052 |
| bm25 | 12 | 4.124 | 5.58 | 19040 | 0.5712 |
| contextual_compression | 12 | 4.752 | 9.34 | 17397 | 0.5219 |
| multi_query | 12 | 7.45 | 10.2 | 39076 | 1.1723 |
| parent_document | 12 | 4.173 | 7.301 | 13348 | 0.4004 |
| ensemble | 12 | 9.85 | 12.303 | 74557 | 2.2367 |
| semantic_chunking | 12 | 5.044 | 10.127 | 37835 | 1.135 |


Summary Paragraph:
1) The Ensemble retriever was the most expensive ($2.24, 74,557 tokens) and Parent Document the cheapest ($0.40, 13,348 tokens). Cost here is computed from total tokens at a single flat rate, so it is directly proportional to the number of tokens used.
2) For latency, BM25 is fastest at both P50 (4.12 s) and P95 (5.58 s), while Ensemble is slowest at both (9.85 s and 12.30 s). This suggests BM25 offers the lowest latency while keeping cost reasonable ($0.57).

Ensemble is therefore both the most expensive and the slowest option on this dataset.

For performance, the Correctness and Rotten Tomatoes metrics can be viewed in the LangSmith UI.
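The cost column in the summary can be cross-checked with a few lines of arithmetic. This sketch hard-codes the token totals reported in the summary table and assumes the same flat rate of 0.03 USD per 1K tokens used in the evaluation cell; the variable names are illustrative.

```python
PRICE_PER_1K = 0.03  # USD per 1K tokens, the flat rate assumed in the evaluation cell

# Total tokens per retriever, copied from the summary table.
tokens = {
    "naive": 46839,
    "bm25": 19040,
    "contextual_compression": 17397,
    "multi_query": 39076,
    "parent_document": 13348,
    "ensemble": 74557,
    "semantic_chunking": 37835,
}

# cost = tokens / 1000 * price, rounded to four decimals like the summary.
cost = {name: round(n / 1000 * PRICE_PER_1K, 4) for name, n in tokens.items()}

cheapest = min(cost, key=cost.get)
priciest = max(cost, key=cost.get)
print(f"cheapest: {cheapest} (${cost[cheapest]})")
print(f"most expensive: {priciest} (${cost[priciest]})")
```

Because a single rate is applied to every token, the cost ranking is identical to the token ranking; a per-model price with separate input and output rates could change the absolute figures.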