# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up Qdrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

> You do not need to run the following cells if you are running this notebook locally. 

In [None]:
#!pip install -qU langchain langchain-openai langchain-cohere rank_bm25

We'll also use [Qdrant's](https://qdrant.tech/documentation/frameworks/langchain/) (pronounced "Quadrant") VectorDB in in-memory mode, so that we can run it locally in our Colab environment.

In [None]:
#!pip install -qU qdrant-client

We'll also provide our OpenAI key, as well as our Cohere API key.

In [10]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

In [4]:
os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

## Task 2: Data Collection and Preparation

We'll be using some reviews from the 4 movies in the John Wick franchise today to explore the different retrieval strategies.

These were obtained from IMDB, and are available in the [AIM Data Repository](https://github.com/AI-Maker-Space/DataRepository).

### Data Collection

We can simply `wget` these from GitHub.

You could use any review data you wanted in this step - just be careful to make sure your metadata is aligned with your choice.

In [5]:
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw1.csv -O john_wick_1.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw2.csv -O john_wick_2.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw3.csv -O john_wick_3.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw4.csv -O john_wick_4.csv

--2025-05-21 16:21:58--  https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw1.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8002::154, 2606:50c0:8000::154, 2606:50c0:8001::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8002::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19628 (19K) [text/plain]
Saving to: ‘john_wick_1.csv’


2025-05-21 16:21:58 (12.8 MB/s) - ‘john_wick_1.csv’ saved [19628/19628]

--2025-05-21 16:21:58--  https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw2.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 2606:50c0:8002::154, 2606:50c0:8000::154, 2606:50c0:8001::154, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|2606:50c0:8002::154|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14747 (14K) [text/plain]
Saving to: ‘john_wick_2.csv’


2025-05-21 16:21:59

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

- Self-Query: Wants as much metadata as we can provide
- Time-weighted: Wants temporal data

> NOTE: While we're creating a temporal relationship based on when these movies came out for illustrative purposes, it needs to be clear that the "time-weighting" in the Time-weighted Retriever is based on when the document was *accessed* last - not when it was created.

In [6]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

documents = []

for i in range(1, 5):
  loader = CSVLoader(
      file_path=f"john_wick_{i}.csv",
      metadata_columns=["Review_Date", "Review_Title", "Review_Url", "Author", "Rating"]
  )

  movie_docs = loader.load()
  for doc in movie_docs:

    # Add the "Movie Title" (John Wick 1, 2, ...)
    doc.metadata["Movie_Title"] = f"John Wick {i}"

    # convert "Rating" to an `int`, if no rating is provided - assume 0 rating
    doc.metadata["Rating"] = int(doc.metadata["Rating"]) if doc.metadata["Rating"] else 0

    # newer movies have a more recent "last_accessed_at"
    doc.metadata["last_accessed_at"] = datetime.now() - timedelta(days=4-i)

  documents.extend(movie_docs)

Let's look at an example document to see if everything worked as expected!

In [7]:
documents[0]

Document(metadata={'source': 'john_wick_1.csv', 'row': 0, 'Review_Date': '6 May 2015', 'Review_Title': ' Kinetic, concise, and stylish; John Wick kicks ass.\n', 'Review_Url': '/review/rw3233896/?ref_=tt_urv', 'Author': 'lnvicta', 'Rating': 8, 'Movie_Title': 'John Wick 1', 'last_accessed_at': datetime.datetime(2025, 5, 18, 16, 22, 20, 423106)}, page_content=": 0\nReview: The best way I can describe John Wick is to picture Taken but instead of Liam Neeson it's Keanu Reeves and instead of his daughter it's his dog. That's essentially the plot of the movie. John Wick (Reeves) is out to seek revenge on the people who took something he loved from him. It's a beautifully simple premise for an action movie - when action movies get convoluted, they get bad i.e. A Good Day to Die Hard. John Wick gives the viewers what they want: Awesome action, stylish stunts, kinetic chaos, and a relatable hero to tie it all together. John Wick succeeds in its simplicity.")

## Task 3: Setting up Qdrant!

Now that we have our documents, let's create a Qdrant VectorStore with the collection name "JohnWick".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [11]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    documents,
    embeddings,
    location=":memory:",
    collection_name="JohnWick"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply look at each review as a document, and use cosine-similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later

In [12]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})
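
To make the idea of "cosine similarity, top `k`" concrete, here is a minimal pure-Python sketch of what a similarity search does conceptually. It is not how Qdrant implements search; the 3-dimensional vectors and the names `top_k`, `toy_vectors`, and `toy_query` are made up for illustration (our real embeddings have 1536 dimensions).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return the ids of the k documents most similar to the query."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical toy "embeddings" - purely illustrative values.
toy_vectors = {
    "review_a": [0.9, 0.1, 0.0],
    "review_b": [0.1, 0.9, 0.0],
    "review_c": [0.7, 0.3, 0.1],
}
toy_query = [1.0, 0.0, 0.0]

nearest = top_k(toy_query, toy_vectors, k=2)
print(nearest)
```

With `k=2`, the two vectors pointing most nearly in the same direction as the query are returned, and `review_b` is left out.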

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [14]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-4.1-nano` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [15]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4.1-nano")


### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [16]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and piping it into naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # Passes the dict from the previous step through unchanged; "context" is re-assigned
    # to its own value, so this step is effectively a no-op here
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object, which is
    #              then piped into the LLM; the resulting message is stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [17]:
naive_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"Based on the reviews provided, people generally liked John Wick. Many reviews are quite positive, highlighting its stylish action sequences, engaging storytelling, and Keanu Reeves' performance. Several reviews give high ratings, such as 9 or 10 out of 10, and describe the film as fun, exciting, and a standout in the action genre. However, there are a few mixed or less enthusiastic opinions, with some ratings around the middle or slightly below average, but overall, the general consensus appears to be favorable."

In [18]:
naive_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there are reviews with a rating of 10. The URLs for those reviews are:\n\n1. /review/rw4854296/?ref_=tt_urv\n2. /review/rw8944843/?ref_=tt_urv'

In [19]:
naive_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"In the John Wick film series, the story centers around John Wick, a retired hitman who is drawn back into the violent underworld of assassins after the death of his wife and the theft of his car and the killing of his dog—his last connection to his wife. The first movie depicts his relentless quest for vengeance against gangsters who wronged him, unleashing a series of spectacular and brutal action sequences. As the series progresses, the story expands to explore the complex rules of the assassin world, the consequences of Wick's past actions, and his battles against various criminal factions, all depicted through stylish and meticulously choreographed combat scenes."

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model), which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.

This retriever is very straightforward to set up! Let's see it happen down below!


In [20]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(documents)
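
To see what "comparing text based on shared words" looks like, here is a minimal sketch of one common variant of the Okapi BM25 formula. It is illustrative only and is not the exact scoring used by `rank_bm25`; the whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75`, and the toy documents are all assumptions.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a basic BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue  # terms absent from the document contribute nothing
            # Rarer terms get a larger inverse document frequency.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            # Term frequency saturates via k1; b normalizes for document length.
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(toks) / avg_len))
        scores.append(score)
    return scores

# Hypothetical toy corpus, not taken from the review data.
toy_docs = [
    "john wick is a stylish action movie",
    "the dog was a gift from his wife",
    "great action and great stunts",
]
toy_scores = bm25_scores("action stunts", toy_docs)
print(toy_scores)
```

The second document shares no words with the query and scores zero, which is exactly the weakness of sparse retrieval: a relevant review that uses different vocabulary is invisible to it.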

We'll construct the same chain - only changing the retriever.

In [21]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [22]:
bm25_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Based on the reviews provided, opinions about people generally liking John Wick are mixed. Some reviews highly praise the first film for its stylish action and engaging world-building, indicating that many fans enjoyed it. However, later reviews, such as those for the third and fourth installments, are quite negative, criticizing the movies for being shallow, overly violent, and lacking plot. \n\nOverall, it seems that while many people liked John Wick, especially the first movie, there are also significant opinions that do not favor the franchise, especially as the series progresses. Therefore, people’s overall liking of John Wick varies, and it cannot be said that they generally liked it across the board.'

In [23]:
bm25_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Based on the provided reviews, none of the reviews have a rating of 10. The highest ratings mentioned are 9.'

In [24]:
bm25_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"In the John Wick film series, the story revolves around John Wick, a skilled assassin who seeks vengeance after a personal loss. The movies depict his action-packed journey through a dangerous underworld filled with assassins, gangsters, and criminal organizations. Each installment features intense, choreographed action sequences, a complex network of assassin codes and politics, and explores Wick's struggle to escape his violent past. The series is known for its stylish combat scenes, world-building, and Keanu Reeves' portrayal of the titular character."

It's not clear that this is better or worse overall - but BM25 claiming that no reviews have a rating of 10, when the naive chain found some, isn't great!

## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [25]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-english-v3.0")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)
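
The two-stage idea (retrieve broadly, then keep only the best few) can be sketched without any external service. This is an illustration of the pattern only, not of Cohere's model: `retrieve_then_rerank`, `word_overlap`, and `overlap_ratio` are hypothetical stand-ins for a first-stage retriever and a reranker.

```python
def retrieve_then_rerank(query, candidates, first_stage_score, rerank_score, k=10, top_n=3):
    """Fetch k candidates with a cheap scorer, then keep the top_n by a stronger scorer."""
    shortlist = sorted(candidates, key=lambda d: first_stage_score(query, d), reverse=True)[:k]
    reranked = sorted(shortlist, key=lambda d: rerank_score(query, d), reverse=True)
    return reranked[:top_n]

def word_overlap(query, doc):
    """Hypothetical cheap scorer: number of distinct query words found in the document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def overlap_ratio(query, doc):
    """Hypothetical 'reranker': overlap relative to document length, favoring focused text."""
    return word_overlap(query, doc) / len(doc.lower().split())

# Hypothetical toy candidates, not taken from the review data.
toy_candidates = [
    "keanu reeves is great in this long action movie about a retired hitman",
    "great action",
    "the plot is thin",
]
best = retrieve_then_rerank(
    "great action", toy_candidates, word_overlap, overlap_ratio, k=2, top_n=1
)
print(best)
```

The first stage lets both matching documents through; the second stage is what decides which of them is worth spending context window on.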

Let's create our chain again, and see how this does!

In [26]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [27]:
contextual_compression_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Based on the reviews provided, people generally liked John Wick. The film received high ratings and positive comments, praising its style, action sequences, and Keanu Reeves\' performance. For example, one reviewer gave it a 9 out of 10 and called it "the coolest action film you\'ll see all year," while another gave it a perfect 10 and described it as "smoothest action film to come around in a long time." Overall, the reception appears to be very favorable.'

In [28]:
contextual_compression_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there are reviews with a rating of 10. Here are the URLs to those reviews:\n\n1. Review Title: "A Masterpiece & Brilliant Sequel"  \n   URL: /review/rw4854296/?ref_=tt_urv\n\n2. Review Title: "Most American action flicks released these days have poor screenplays and overuse computer-generated imagery."  \n   URL: /review/rw4860412/?ref_=tt_urv'

In [29]:
contextual_compression_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"In the John Wick films, John Wick is a retired hitman who is drawn back into a violent underworld after a series of personal tragedies and conflicts. In the first film, he seeks revenge after gangsters steal his car and kill his dog, which was a gift from his deceased wife. As he pursues vengeance, Wick unleashes a relentless wave of destruction against those who have wronged him, becoming the target of numerous professional killers. \n\nIn the second film, after addressing issues with the Russian mafia, Wick is approached by Santino D'Antonio, who demands his help with a task involving a secret marker. Wick refuses, leading to his house being blown up. Subsequently, Wick is forced to carry out an assignment in Rome to help Santino, which results in a contract being put on him. The plot thickens as Wick fights to survive against many enemies, leading him to vow to kill Santino in revenge.\n\nOverall, the series portrays John Wick as a highly skilled, relentless assassin caught in a vi

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and using an LLM to create `n` new queries from it.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context.

So, how hard is it to set up? Not bad! Let's see it down below!



In [30]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)
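
The three steps above can be sketched in plain Python. This is a simplified illustration, not LangChain's implementation: `fake_generate_queries` stands in for the LLM call, and `toy_index` stands in for the vector store, so both are hypothetical.

```python
def multi_query_retrieve(question, generate_queries, retrieve):
    """Retrieve for each generated query and return unique documents in first-seen order."""
    seen = set()
    unique_docs = []
    for query in generate_queries(question):
        for doc in retrieve(query):
            if doc not in seen:
                seen.add(doc)
                unique_docs.append(doc)
    return unique_docs

def fake_generate_queries(question):
    """Hypothetical stand-in for an LLM that rewrites the question in different ways."""
    return [question + " (reviews)", question + " (ratings)"]

# Hypothetical lookup table standing in for a retriever.
toy_index = {
    "Did people like it? (reviews)": ["doc_1", "doc_2"],
    "Did people like it? (ratings)": ["doc_2", "doc_3"],
}

merged = multi_query_retrieve(
    "Did people like it?",
    fake_generate_queries,
    lambda query: toy_index.get(query, []),
)
print(merged)
```

Note that `doc_2` is returned by both queries but appears only once in the merged context.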

In [31]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [32]:
multi_query_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Based on the reviews, people generally liked John Wick. The series has received many positive ratings and praise for its action sequences, style, and entertainment value. The first film is often highlighted as a standout, and the franchise as a whole is considered a successful and well-received action series by many viewers.'

In [33]:
multi_query_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there are reviews with a rating of 10. Here are the URLs to those reviews:\n- /review/rw4854296/?ref_=tt_urv\n- /review/rw4855764/?ref_=tt_urv'

In [34]:
multi_query_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"In the John Wick series, the main character, Keanu Reeves, plays John Wick, a retired assassin who comes out of retirement to seek revenge after personal tragedies. The first film shows how Wick, after losing his wife and having his dog killed by thugs who steal his car, is pushed back into the violent world of assassins. Over subsequent films, Wick is drawn into complex criminal underworld conflicts, taking on multiple enemies, honoring and breaking secret codes, and performing highly choreographed action sequences. The series is known for its stylish violence, world-building, and Keanu Reeves' portrayal of a stoic and lethal hitman."

## Task 8: Parent Document Retriever

The Parent Document Retriever follows a "small-to-big" strategy:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent chunks".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
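
As a rough sketch of "search small, return big", the following pure-Python example keeps parents in a dictionary and searches over small child chunks. It is an illustration under simplifying assumptions: `split_into_chunks` is a naive fixed-width splitter rather than a `RecursiveCharacterTextSplitter`, and `keyword_score` is a hypothetical stand-in for vector similarity.

```python
def split_into_chunks(text, chunk_size):
    """Naive fixed-width splitter (illustrative stand-in for a real text splitter)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def build_stores(parent_docs, chunk_size):
    """Return (docstore, child_index): parents by id, plus (chunk, parent_id) pairs."""
    docstore = dict(parent_docs)
    child_index = [
        (chunk, parent_id)
        for parent_id, text in parent_docs.items()
        for chunk in split_into_chunks(text, chunk_size)
    ]
    return docstore, child_index

def retrieve_parents(query, docstore, child_index, score, k=2):
    """Search over child chunks, then return each matching chunk's parent once."""
    best_children = sorted(child_index, key=lambda pair: score(query, pair[0]), reverse=True)[:k]
    parent_ids = []
    for _, parent_id in best_children:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [docstore[parent_id] for parent_id in parent_ids]

def keyword_score(query, chunk):
    """Hypothetical scorer: how many query words occur in the chunk."""
    return sum(word in chunk.lower() for word in query.lower().split())

# Hypothetical toy parents, not taken from the review data.
toy_parents = {
    "p1": "Great action scenes. The dog subplot is sad.",
    "p2": "Boring plot. Too long overall.",
}
docstore, child_index = build_stores(toy_parents, chunk_size=21)
results = retrieve_parents("dog", docstore, child_index, keyword_score, k=1)
print(results)
```

Only one small chunk mentions the dog, but the caller receives the whole review it came from, including the surrounding sentence about the action scenes.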

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [35]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = documents
child_splitter = RecursiveCharacterTextSplitter(chunk_size=200)

We'll need to set up a new Qdrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension, you'll need to change this if you're using a different embedding model.

In [36]:
client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = Qdrant(
    collection_name="full_documents", embeddings=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)


Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [37]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [38]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [39]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [40]:
parent_document_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Based on the provided reviews, opinions about John Wick are mixed. Some reviewers, like MrHeraclius, highly recommend it and praise its action and emotional depth. Conversely, a review from solidabs gives a very negative opinion of John Wick 4, describing it as "horrible" and criticizing its plot and fight scenes. Overall, it seems that people’s opinions vary, with some liking the series and others not enjoying specific installments.'

In [41]:
parent_document_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10. The URL to that review is: /review/rw4854296/?ref_=tt_urv'

In [42]:
parent_document_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In John Wick, a retired assassin named John Wick, played by Keanu Reeves, comes out of retirement following personal tragedy. The first film begins with the death of his wife and the theft of his car, which prompts him to seek vengeance, ultimately unleashing a violent and relentless series of actions against those who have wronged him. Throughout the series, Wick is depicted as a highly skilled hitman who is drawn back into the assassin world to settle various conflicts, involving many action-packed sequences, shootouts, and fights. The movies depict his journey through a dangerous underworld filled with complex characters and high-stakes confrontations.'

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to answer the performance question more rigorously.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever takes two or more retrievers and combines their retrieved documents using a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.

In [43]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)
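
For intuition, here is a minimal sketch of weighted Reciprocal Rank Fusion. It is not LangChain's code: the function name `weighted_rrf` and the toy rankings are made up, and `k=60` is the smoothing constant suggested in the RRF paper, assumed here for illustration.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, k=60):
    """Fuse ranked lists: each document earns weight / (k + rank) from every list it is in."""
    fused = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += weight / (k + rank)
    # Highest fused score first.
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical rankings from two retrievers over the same three documents.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_c", "doc_a"]

fused_ranking = weighted_rrf([bm25_ranking, dense_ranking], weights=[0.5, 0.5])
print(fused_ranking)
```

Because only ranks are used, the retrievers' raw scores never need to be on the same scale, which is what makes it practical to mix sparse and dense retrievers.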

We'll pack *all* of these retrievers together in an ensemble.

In [44]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [45]:
ensemble_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"Based on the reviews provided, people generally liked John Wick. Many reviews are highly positive, praising its stylish action, choreography, and Keanu Reeves' performance. Some reviews give high ratings like 8 or 9 out of 10, and numerous comments describe the film as fun, exciting, and a standout in the action genre. However, there are some negative opinions as well, especially regarding later sequels, with ratings as low as 1 or 4 out of 10, but overall, the initial reception appears to be quite favorable."

In [46]:
ensemble_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there are reviews with a rating of 10. The URLs to those reviews are:\n\n1. [Review for John Wick 3](https://example.com/review/rw4854296/?ref_=tt_urv)\n2. [Review for John Wick 4](https://example.com/review/rw8944843/?ref_=tt_urv)'

In [47]:
ensemble_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In the John Wick film series, the story centers around John Wick, a retired hitman who comes out of retirement to seek vengeance for the murder of his dog and the theft of his car, which are personal and significant losses. The series explores his journey through a secretive world of assassins, criminal underworlds, and strict rules. As he seeks justice, he faces numerous enemies, bounty hunters, and complex criminal organizations, leading to intense action, elaborate fight sequences, and a story of revenge and consequences. Over the series, his actions have far-reaching effects, and he becomes a legendary figure feared by many in the criminal world.'

## Task 10: Semantic Chunking

While this is not a retrieval method - it *is* an effective way of increasing retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
  - `percentile`
  - `standard_deviation`
  - `interquartile`
  - `gradient`
3. Each sequence of related sentences is kept as a document!

Let's see how to implement this!

> NOTE: You do not need to run this cell if you're running this locally

In [None]:
#!pip install -qU langchain_experimental

We'll use the `percentile` thresholding method for this example, which calculates all distances between sentences and then breaks sequences of sentences apart wherever the distance exceeds a given percentile among all distances.

In [48]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)
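
To illustrate percentile thresholding, here is a simplified pure-Python sketch. It is not the `SemanticChunker` implementation: it assumes distances are measured between consecutive sentences, uses hypothetical 2-dimensional vectors in place of real embeddings, and picks `pct=75` only because the toy example has so few sentences.

```python
import math

def cosine_distance(a, b):
    """1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norms

def percentile(values, pct):
    """Linearly interpolated percentile of a non-empty list (pct in 0..100)."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * pct / 100
    lower, upper = math.floor(position), math.ceil(position)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)

def semantic_chunks(sentences, vectors, pct):
    """Start a new chunk wherever the distance to the previous sentence exceeds the percentile."""
    distances = [cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks

# Hypothetical sentences and toy "embeddings" - purely illustrative values.
toy_sentences = [
    "The action is superb.",
    "The fights are stylish.",
    "The plot is thin.",
    "The story drags.",
]
toy_sentence_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]

chunks = semantic_chunks(toy_sentences, toy_sentence_vectors, pct=75)
print(chunks)
```

The only large jump in meaning is between the second and third sentences, so that is where the single break lands.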

Now we can split our documents.

In [49]:
semantic_documents = semantic_chunker.split_documents(documents)

Let's create a new vector store.

In [50]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="JohnWickSemantic"
)

We'll use naive retrieval for this example.

In [51]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [52]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [53]:
semantic_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"Based on the reviews provided, people generally liked John Wick. Many reviews are highly positive, praising the action, style, and overall entertainment value of the films. For example, some reviewers gave ratings of 9 or 10 out of 10, and descriptions highlight the films' coolness, intense action sequences, and stylish direction. Although there are a few less favorable reviews, the overall impression is that John Wick is widely appreciated and well-liked by audiences."

In [54]:
semantic_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10. The URL to that review is: /review/rw4854296/?ref_=tt_urv.'

In [55]:
semantic_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"In the John Wick movies, the character John Wick, played by Keanu Reeves, is a retired assassin who is drawn back into a violent world of crime and revenge. The first film's plot begins when some thugs break into John's home, beat him up, kill his dog, and steal his car—without knowing his identity. This act of brutality provokes John Wick to seek revenge against those who wronged him. As he reenters his deadly profession, he faces a relentless pursuit by bounty-hunting killers, Russian gangsters, and other enemies, all while trying to find peace after losing his wife and beloved dog. The series is known for its stylish, intense action sequences and a simple premise of revenge that drives the story."

# 🤝 Breakout Room Part #2

#### 🏗️ Activity #1

Your task is to evaluate the various retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
   - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever-specific* Ragas metrics
   - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

##### HINTS:

- LangSmith provides detailed information about latency and cost.
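
If you also want a rough local latency number alongside LangSmith's traces, a small timing helper is one option. The sketch below is an assumption-laden illustration: `measure_latency` and `fake_invoke` are hypothetical names, and `fake_invoke` is only a stand-in for a chain's `.invoke` method so the example runs without API keys.

```python
import statistics
import time

def measure_latency(fn, inputs, repeats=3):
    """Call fn on each input `repeats` times and summarize wall-clock latency in seconds."""
    timings = []
    for item in inputs:
        for _ in range(repeats):
            start = time.perf_counter()
            fn(item)
            timings.append(time.perf_counter() - start)
    return {
        "calls": len(timings),
        "mean_s": statistics.mean(timings),
        "max_s": max(timings),
    }

# Stand-in for a real chain call (assumption); swap in e.g. naive_retrieval_chain.invoke
def fake_invoke(payload):
    time.sleep(0.01)
    return {"response": "stub"}

stats = measure_latency(
    fake_invoke,
    [{"question": "Did people generally like John Wick?"}],
    repeats=2,
)
print(stats)
```

Wall-clock timings like these include network variance, so they are best read as a sanity check next to the per-run latency and cost that LangSmith records.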

In [57]:
import os
import getpass
import copy
import pandas as pd
import numpy as np
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')



[nltk_data] Downloading package punkt to /Users/aneeta/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /Users/aneeta/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


True

In [58]:
# === ENVIRONMENT SETUP ===
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("🔐 Enter your LangChain API key: ")
os.environ["LANGCHAIN_PROJECT"] = "retriever-comparison-john-wick"

In [60]:
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4.1-nano"))
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

In [63]:
from ragas.testset import TestsetGenerator
from ragas.testset.synthesizers import (
    SingleHopSpecificQuerySynthesizer,
    MultiHopAbstractQuerySynthesizer,
    MultiHopSpecificQuerySynthesizer,
)

# Custom mix of question types (the weights sum to 1.0)
query_distribution = [
    (SingleHopSpecificQuerySynthesizer(llm=generator_llm), 0.5),
    (MultiHopAbstractQuerySynthesizer(llm=generator_llm), 0.3),
    (MultiHopSpecificQuerySynthesizer(llm=generator_llm), 0.2),
]

generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)
# `query_distribution` is not passed below, so the generator falls back to its default mix;
# add `query_distribution=query_distribution` to the call to apply the custom weights.
dataset = generator.generate_with_langchain_docs(documents, testset_size=10)

Applying SummaryExtractor:   0%|          | 0/44 [00:00<?, ?it/s]

Applying CustomNodeFilter:   0%|          | 0/100 [00:00<?, ?it/s]

Node eb4a6c45-9b3e-408c-a926-3735762999d2 does not have a summary. Skipping filtering.
Node 3b091d74-76d4-4746-b473-02068f71e20b does not have a summary. Skipping filtering.
Node 42ee3418-3604-45b7-aec6-527044b2a9f0 does not have a summary. Skipping filtering.
Node 5e908c40-d391-4b44-8470-8fe13a0c0aae does not have a summary. Skipping filtering.
Node 107083e3-cd14-41c6-b97c-ee2009146678 does not have a summary. Skipping filtering.
Node c17f415d-0755-405c-9664-02fb7f2d7c55 does not have a summary. Skipping filtering.
Node 932c8e17-bda0-440d-a727-94a56a7f9113 does not have a summary. Skipping filtering.
Node 4b141b7c-f6f4-4a62-a5c2-4a6776372047 does not have a summary. Skipping filtering.
Node 040159c5-3669-4128-85ef-15ba6b8f19f0 does not have a summary. Skipping filtering.
Node 9ac32dd1-e50b-4d4f-9e0f-f6a97a149d2c does not have a summary. Skipping filtering.
Node 84dac775-3171-4c33-8f64-41840e2eba0a does not have a summary. Skipping filtering.
Node 0a166fab-8352-4a6a-b9d6-06bc17b5ad61 d

Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]:   0%|          | 0/217 [00:00<?, ?it/s]

Applying [CosineSimilarityBuilder, OverlapScoreBuilder]:   0%|          | 0/2 [00:00<?, ?it/s]

Generating personas:   0%|          | 0/3 [00:00<?, ?it/s]

Generating Scenarios:   0%|          | 0/3 [00:00<?, ?it/s]

Generating Samples:   0%|          | 0/12 [00:00<?, ?it/s]

In [64]:
dataset.to_pandas()

Unnamed: 0,user_input,reference_contexts,reference,synthesizer_name
0,What is the main plot of John Wick according t...,[: 0\nReview: The best way I can describe John...,The main plot of John Wick is about a man seek...,single_hop_specifc_query_synthesizer
1,So like why is John Wick so popular and what m...,[: 2\nReview: With the fourth installment scor...,The context states that the fourth installment...,single_hop_specifc_query_synthesizer
2,As an action film critic analyzing the movie J...,[: 3\nReview: John wick has a very simple reve...,The review highlights that John Wick has a sim...,single_hop_specifc_query_synthesizer
3,Is Reeves in the movie?,[: 4\nReview: Though he no longer has a taste ...,"Yes, the review mentions that Reeves looks rig...",single_hop_specifc_query_synthesizer
4,How does John Wick's quest for revenge reflect...,[<1-hop>\n\n: 10\nReview: Wow what a great sur...,John Wick's story exemplifies the hero's journ...,multi_hop_abstract_query_synthesizer
5,How does the John Wick film series exemplify t...,[<1-hop>\n\n: 22\nReview: All the below are no...,"The John Wick film series, as described in the...",multi_hop_abstract_query_synthesizer
6,Wha is the probblem with lack of inovation in ...,[<1-hop>\n\n: 23\nReview: I love me a bit of t...,The context indicates that the John Wick serie...,multi_hop_abstract_query_synthesizer
7,How does the ineptitude of antagonists and cha...,[<1-hop>\n\n: 22\nReview: All the below are no...,The film is a mess because the antagonists are...,multi_hop_abstract_query_synthesizer
8,How does the portrayal of Russian gangsters in...,[<1-hop>\n\n: 10\nReview: Wow what a great sur...,The context highlights that in the John Wick s...,multi_hop_specific_query_synthesizer
9,How do the stylistic elements of John Wick: Ch...,[<1-hop>\n\n: 12\nReview: If there's an equiva...,John Wick: Chapter 2 is described as an artist...,multi_hop_specific_query_synthesizer


In [65]:
res = naive_retrieval_chain.invoke({"question": "What happened to the puppy in John Wick?"})
res

{'response': AIMessage(content="In John Wick, the puppy that John received after his wife's death is killed by some thugs led by the son of a Russian gangster. This tragic event motivates John Wick to seek revenge.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 38, 'prompt_tokens': 3228, 'total_tokens': 3266, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-nano-2025-04-14', 'system_fingerprint': 'fp_8fd43718b3', 'id': 'chatcmpl-BZlNRw6i7nJqQrteEyLQ5WYLyv7R6', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='run--23a7d9d1-0000-4bb3-889f-35f69cf7a5ad-0', usage_metadata={'input_tokens': 3228, 'output_tokens': 38, 'total_tokens': 3266, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0

In [66]:
'\n'.join([doc.page_content for doc in res['context']])

': 10\nReview: Wow what a great surprise this was. I was told by a friend this was good but it\'s been awhile since I liked a Keanu movie so I was hesitant to try it. Retired hit-man John Wick (Keanu Reeves) loses his wife to cancer. After her funeral he receives a puppy she left him. A few days later some thugs, led by the son of a Russian gangster John used to work for, break into John\'s house. They beat him up, take the keys to his beloved car, and kill the puppy. They did this not knowing who he was; they just wanted the car. Now John Wick is out for revenge and the Russian gangster is trying to save his son\'s life by sending killers after John.\n: 19\nReview: If you\'ve seen the first John Wick movie, you know that Keanu Reeves is John Wick, a retired assassin who comes out of retirement when someone kills his dog. In this one, which begins a week later, matters are still reverberating, and some one has stolen his car, which calls for a lot of carnage. That settled, John is call

In [136]:
import os
import getpass

# Read the key interactively instead of hardcoding a secret in the notebook
os.environ["COHERE_API_KEY"] = getpass.getpass("Enter your Cohere API key: ")


In [67]:
import time
import pandas as pd

# Define retrievers and their names together
retrievers = {
    "naive": naive_retrieval_chain,
    "bm25": bm25_retrieval_chain,
    "compression": contextual_compression_retrieval_chain,
    "multi_query": multi_query_retrieval_chain,
    "parent_doc": parent_document_retrieval_chain,
    "ensemble": ensemble_retrieval_chain,
}

# Store output datasets by name
retriever_outputs = {}




In [68]:
# Run each retriever over the dataset
for name, retriever in retrievers.items():
    for row in dataset:
        question = row.eval_sample.user_input
        result = retriever.invoke({"question": question})

        row.eval_sample.response = result["response"].content
        row.eval_sample.retrieved_contexts = [doc.page_content for doc in result["context"]]

    # Save the modified dataset for this retriever
    retriever_outputs[name] = copy.deepcopy(dataset)
    time.sleep(6)  # Pause to avoid rate limits

# Example: View results for one retriever
retriever_outputs["bm25"].to_pandas().head()

Unnamed: 0,user_input,retrieved_contexts,reference_contexts,response,reference,synthesizer_name
0,What is the main plot of John Wick according t...,[: 4\nReview: Hollow plot interspersed with th...,[: 0\nReview: The best way I can describe John...,"According to the review, the main plot of John...",The main plot of John Wick is about a man seek...,single_hop_specifc_query_synthesizer
1,So like why is John Wick so popular and what m...,[: 6\nReview: John Wick is one of those few mo...,[: 2\nReview: With the fourth installment scor...,John Wick is popular because it offers a uniqu...,The context states that the fourth installment...,single_hop_specifc_query_synthesizer
2,As an action film critic analyzing the movie J...,"[: 9\nReview: At first glance, John Wick sound...",[: 3\nReview: John wick has a very simple reve...,"As an action film critic analyzing ""John Wick,...",The review highlights that John Wick has a sim...,single_hop_specifc_query_synthesizer
3,Is Reeves in the movie?,[: 11\nReview: Who needs a 2hr and 40 min acti...,[: 4\nReview: Though he no longer has a taste ...,"Based on the provided context, there is no inf...","Yes, the review mentions that Reeves looks rig...",single_hop_specifc_query_synthesizer
4,How does John Wick's quest for revenge reflect...,[: 0\nReview: The best way I can describe John...,[<1-hop>\n\n: 10\nReview: Wow what a great sur...,John Wick's quest for revenge reflects the cla...,John Wick's story exemplifies the hero's journ...,multi_hop_abstract_query_synthesizer


In [69]:
import os
import cohere

# Reuse the key stored in the environment rather than embedding it in the code
co = cohere.Client(os.environ["COHERE_API_KEY"])
response = co.generate(prompt="Hello world", model="command", max_tokens=20)
print(response.generations[0].text)


 Hi there! How are you doing? 
If you have any questions, let me know and


In [70]:
from ragas import EvaluationDataset, evaluate, RunConfig
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import (
    LLMContextRecall,
    Faithfulness,
    FactualCorrectness,
    ResponseRelevancy
)
from langchain_openai import ChatOpenAI

# Setup evaluator model
evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4.1-mini"))

# Evaluation config
run_config = RunConfig(timeout=360)

# `retriever_outputs` maps each retriever name to the dataset populated by that retriever

# Initialize result storage
ragas_results = {}

# Evaluate each retriever output
for name, dataset_obj in retriever_outputs.items():
    eval_dataset = EvaluationDataset.from_pandas(dataset_obj.to_pandas())
    result = evaluate(
        dataset=eval_dataset,
        metrics=[
            LLMContextRecall(),
            Faithfulness(),
            FactualCorrectness(),
            ResponseRelevancy()
        ],
        llm=evaluator_llm,
        run_config=run_config
    )
    ragas_results[name] = result

# Optional: print or inspect results
from pprint import pprint
pprint(ragas_results)


Evaluating:   0%|          | 0/48 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/48 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/48 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/48 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/48 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/48 [00:00<?, ?it/s]

{'bm25': {'context_recall': 0.5278, 'faithfulness': 0.8104, 'factual_correctness(mode=f1)': 0.4742, 'answer_relevancy': 0.7897},
 'compression': {'context_recall': 0.6111, 'faithfulness': 0.7829, 'factual_correctness(mode=f1)': 0.5700, 'answer_relevancy': 0.7934},
 'ensemble': {'context_recall': 0.9722, 'faithfulness': 0.8699, 'factual_correctness(mode=f1)': 0.4700, 'answer_relevancy': 0.9479},
 'multi_query': {'context_recall': 0.9028, 'faithfulness': 0.8319, 'factual_correctness(mode=f1)': 0.5317, 'answer_relevancy': 0.9524},
 'naive': {'context_recall': 0.8194, 'faithfulness': 0.8244, 'factual_correctness(mode=f1)': 0.4867, 'answer_relevancy': 0.9520},
 'parent_doc': {'context_recall': 0.4306, 'faithfulness': 0.6454, 'factual_correctness(mode=f1)': 0.5208, 'answer_relevancy': 0.9535}}
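
To compare these numbers side by side, one option is to rank the retrievers by a metric and note the winner for each. The sketch below copies the scores from the output above into a plain dictionary; the names `scores`, `ranked`, and `best_per_metric` are hypothetical, and ranking by `context_recall` is just one reasonable choice for a retriever-focused comparison.

```python
# Scores copied from the Ragas output above
scores = {
    "bm25": {"context_recall": 0.5278, "faithfulness": 0.8104,
             "factual_correctness(mode=f1)": 0.4742, "answer_relevancy": 0.7897},
    "compression": {"context_recall": 0.6111, "faithfulness": 0.7829,
                    "factual_correctness(mode=f1)": 0.5700, "answer_relevancy": 0.7934},
    "ensemble": {"context_recall": 0.9722, "faithfulness": 0.8699,
                 "factual_correctness(mode=f1)": 0.4700, "answer_relevancy": 0.9479},
    "multi_query": {"context_recall": 0.9028, "faithfulness": 0.8319,
                    "factual_correctness(mode=f1)": 0.5317, "answer_relevancy": 0.9524},
    "naive": {"context_recall": 0.8194, "faithfulness": 0.8244,
              "factual_correctness(mode=f1)": 0.4867, "answer_relevancy": 0.9520},
    "parent_doc": {"context_recall": 0.4306, "faithfulness": 0.6454,
                   "factual_correctness(mode=f1)": 0.5208, "answer_relevancy": 0.9535},
}

metrics = ["context_recall", "faithfulness",
           "factual_correctness(mode=f1)", "answer_relevancy"]

# Rank retrievers by context recall, highest first
ranked = sorted(scores.items(), key=lambda kv: kv[1]["context_recall"], reverse=True)

# Print a fixed-width comparison table
print(f"{'retriever':<12}" + "".join(f"{m[:18]:>20}" for m in metrics))
for name, row in ranked:
    print(f"{name:<12}" + "".join(f"{row[m]:>20.4f}" for m in metrics))

# Retriever with the highest score on each metric
best_per_metric = {m: max(scores, key=lambda n: scores[n][m]) for m in metrics}
print(best_per_metric)
```

With only ten test questions, small gaps between retrievers (for example the three `answer_relevancy` scores near 0.95) should be treated as ties rather than meaningful differences.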


In [73]:
from langsmith import Client
from langchain_openai import ChatOpenAI
from langsmith.evaluation import LangChainStringEvaluator

# === Create Dataset ===
client = Client()

dataset_name = "John Wick Movie Reviews"
dataset_description = "Q&A pairs and review context for the John Wick franchise."

# Use a separate name so the Ragas testset stored in `dataset` is not overwritten
langsmith_dataset = client.create_dataset(
    dataset_name=dataset_name,
    description=dataset_description
)

# === Upload Examples from RAGAS dataset ===
df = dataset.to_pandas()  # `dataset` is the Ragas testset generated earlier

for _, row in df.iterrows():
    client.create_example(
        inputs={"question": row["user_input"]},
        outputs={"answer": row["reference"]},
        metadata={"context": row["reference_contexts"]},
        dataset_id=langsmith_dataset.id
    )

# === Define LLM for evaluation ===
eval_llm = ChatOpenAI(model="gpt-4.1")

# === Define Evaluators ===
# The evaluator type is passed as the first positional argument.

# 1. Basic QA evaluator
qa_eval = LangChainStringEvaluator(
    "qa",
    config={"llm": eval_llm},
    prepare_data=lambda run, example: {
        "prediction": run.outputs["response"],
        "reference": example.outputs["answer"],
        "input": example.inputs["question"]
    }
)

# 2. Helpfulness evaluator
helpfulness_eval = LangChainStringEvaluator(
    "labeled_criteria",
    config={
        "criteria": {
            "helpfulness": "Is the response helpful given the correct reference answer?"
        },
        "llm": eval_llm
    },
    prepare_data=lambda run, example: {
        "prediction": run.outputs["response"],
        "reference": example.outputs["answer"],
        "input": example.inputs["question"]
    }
)

# 3. Dopeness evaluator (fun custom one)
dopeness_eval = LangChainStringEvaluator(
    "criteria",
    config={
        "criteria": {
            "dopeness": "Is this response dope, lit, or cool?"
        },
        "llm": eval_llm
    },
    prepare_data=lambda run, example: {
        "prediction": run.outputs["response"],
        "reference": example.outputs["answer"],
        "input": example.inputs["question"]
    }
)

In [None]:
import time
import pandas as pd
from langsmith.evaluation import evaluate

# === Define all retriever chains and results storage ===
retriever_chains = {
    "naive": naive_retrieval_chain,
    "bm25": bm25_retrieval_chain,
    "compression": contextual_compression_retrieval_chain,
    "multi_query": multi_query_retrieval_chain,
    "parent_doc": parent_document_retrieval_chain,
    "ensemble": ensemble_retrieval_chain,
}

# === Initialize dictionary to hold LangSmith results ===
langsmith_results = {}

# === Run LangSmith evaluation per retriever ===
for name, retriever in retriever_chains.items():
    print(f"🔍 Evaluating {name}...")
    result = evaluate(
        retriever.invoke,
        data=dataset_name,  # dataset name you uploaded earlier
        evaluators=[
            qa_eval,
            helpfulness_eval,
            dopeness_eval
        ],
        metadata={"revision_id": "default_chain_init"},
    )
    langsmith_results[name] = result
    time.sleep(60)  # throttle to avoid rate limits

# === Optional: Confirm session links manually ===
print("✅ Evaluation complete. Visit https://smith.langchain.com to view your runs.")

# === Display stored performance images ===
from IPython.display import display, Image

display(Image("langsmith_performance_1.png"))
display(Image("langsmith_performance_4.png"))
