# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up QDrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

> You do not need to run the following cells if you are running this notebook locally. 

In [1]:
#!pip install -qU langchain langchain-community langchain-openai langchain-cohere rank_bm25

We're also going to be leveraging [Qdrant's](https://qdrant.tech/documentation/frameworks/langchain/) (pronounced "Quadrant") VectorDB in "memory" mode (so we can leverage it locally in our colab environment).

In [2]:
#!pip install -qU qdrant-client

We'll also provide our OpenAI key, as well as our Cohere API key.

In [None]:
# !pip install python-dotenv -q

from dotenv import load_dotenv
import os

# Load environment variables from a .env file (if one exists)
load_dotenv()

# Access your API keys without having to enter them manually
openai_api_key = os.getenv("OPENAI_API_KEY")
cohere_api_key = os.getenv("COHERE_API_KEY")
langsmith_api_key = os.getenv("LANGSMITH_API_KEY")

# Set them for the libraries to use, skipping any key that is missing
# (assigning None to os.environ raises a TypeError)
for name, value in {
    "OPENAI_API_KEY": openai_api_key,
    "COHERE_API_KEY": cohere_api_key,
    "LANGSMITH_API_KEY": langsmith_api_key,
}.items():
    if value:
        os.environ[name] = value

In [4]:
import os
import getpass

# os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

In [5]:
# os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

## Task 2: Data Collection and Preparation

We'll be using some reviews from the 4 movies in the John Wick franchise today to explore the different retrieval strategies.

These were obtained from IMDB, and are available in the [AIM Data Repository](https://github.com/AI-Maker-Space/DataRepository).

### Data Collection

We can simply `wget` these from GitHub.

You could use any review data you wanted in this step - just be careful to make sure your metadata is aligned with your choice.

In [6]:
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw1.csv -O john_wick_1.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw2.csv -O john_wick_2.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw3.csv -O john_wick_3.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw4.csv -O john_wick_4.csv

--2025-05-14 20:46:27--  https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw1.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.110.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19628 (19K) [text/plain]
Saving to: ‘john_wick_1.csv’


2025-05-14 20:46:28 (68.6 MB/s) - ‘john_wick_1.csv’ saved [19628/19628]

--2025-05-14 20:46:28--  https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw2.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.110.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14747 (14K) [text/plain]
Saving to: ‘john_wick_2.csv’


2025-05-14 20:46:28 (25.8 MB/s) - ‘john_wick_2.csv’

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

- Self-Query: Wants as much metadata as we can provide
- Time-weighted: Wants temporal data

> NOTE: While we're creating a temporal relationship based on when these movies came out for illustrative purposes, keep in mind that the "time-weighting" in the Time-weighted Retriever is based on when the document was last *accessed*, not when it was created.
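To make that note concrete: LangChain's `TimeWeightedVectorStoreRetriever` documents its score as `semantic_similarity + (1.0 - decay_rate) ** hours_passed`, where `hours_passed` counts hours since `last_accessed_at`. The sketch below reimplements only that formula in plain Python, under the assumption of `decay_rate=0.01` and a fixed similarity of `0.8` for every movie (both illustrative values, not used elsewhere in this notebook).

```python
from datetime import datetime, timedelta


def time_weighted_score(
    semantic_similarity: float,
    last_accessed_at: datetime,
    now: datetime,
    decay_rate: float = 0.01,  # assumed value for illustration
) -> float:
    """Recency-boosted score: similarity + (1 - decay_rate) ** hours since last access."""
    hours_passed = (now - last_accessed_at).total_seconds() / 3600
    return semantic_similarity + (1.0 - decay_rate) ** hours_passed


now = datetime(2025, 5, 14, 12, 0)
# Mirror the metadata built below: "John Wick i" was last accessed (4 - i) days ago
scores = {
    f"John Wick {i}": time_weighted_score(0.8, now - timedelta(days=4 - i), now)
    for i in range(1, 5)
}
for title, score in scores.items():
    print(f"{title}: {score:.4f}")
```

With identical similarity, John Wick 4 ranks highest purely because its `last_accessed_at` is the most recent, which is exactly why the offsets in the loader below are only an illustration of recency, not release dates.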

In [7]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

documents = []

for i in range(1, 5):
  loader = CSVLoader(
      file_path=f"john_wick_{i}.csv",
      metadata_columns=["Review_Date", "Review_Title", "Review_Url", "Author", "Rating"]
  )

  movie_docs = loader.load()
  for doc in movie_docs:

    # Add the "Movie Title" (John Wick 1, 2, ...)
    doc.metadata["Movie_Title"] = f"John Wick {i}"

    # convert "Rating" to an `int`, if no rating is provided - assume 0 rating
    doc.metadata["Rating"] = int(doc.metadata["Rating"]) if doc.metadata["Rating"] else 0

    # newer movies have a more recent "last_accessed_at"
    doc.metadata["last_accessed_at"] = datetime.now() - timedelta(days=4-i)

  documents.extend(movie_docs)

Let's look at an example document to see if everything worked as expected!

In [8]:
documents[0]

Document(metadata={'source': 'john_wick_1.csv', 'row': 0, 'Review_Date': '6 May 2015', 'Review_Title': ' Kinetic, concise, and stylish; John Wick kicks ass.\n', 'Review_Url': '/review/rw3233896/?ref_=tt_urv', 'Author': 'lnvicta', 'Rating': 8, 'Movie_Title': 'John Wick 1', 'last_accessed_at': datetime.datetime(2025, 5, 11, 20, 46, 30, 154962)}, page_content=": 0\nReview: The best way I can describe John Wick is to picture Taken but instead of Liam Neeson it's Keanu Reeves and instead of his daughter it's his dog. That's essentially the plot of the movie. John Wick (Reeves) is out to seek revenge on the people who took something he loved from him. It's a beautifully simple premise for an action movie - when action movies get convoluted, they get bad i.e. A Good Day to Die Hard. John Wick gives the viewers what they want: Awesome action, stylish stunts, kinetic chaos, and a relatable hero to tie it all together. John Wick succeeds in its simplicity.")

## Task 3: Setting up QDrant!

Now that we have our documents, let's create a QDrant VectorStore with the collection name "JohnWick".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [9]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    documents,
    embeddings,
    location=":memory:",
    collection_name="JohnWick"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply treat each review as a document and use cosine similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later.

In [10]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})
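Conceptually, the retriever we just created embeds the query, compares it against every stored review embedding, and returns the `k` best matches. Here's a brute-force, plain-Python sketch of that idea; the toy 3-dimensional vectors and review ids are made up, and a real vector database will typically use an index rather than this exhaustive loop.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the two vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 10) -> list[str]:
    """Return the ids of the k documents most similar to the query vector."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy 3-dimensional "embeddings" (text-embedding-3-small vectors have 1536 dimensions)
doc_vecs = {
    "review_a": [0.9, 0.1, 0.0],
    "review_b": [0.1, 0.9, 0.0],
    "review_c": [0.7, 0.3, 0.1],
}
print(top_k([1.0, 0.0, 0.0], doc_vecs, k=2))  # ['review_a', 'review_c']
```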

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [11]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-4.1-nano` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [12]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4.1-nano")


### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [13]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and chaining it into naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # RunnablePassthrough.assign keeps every key from the previous step and re-assigns
    # "context" to its own value, so the dict passes through effectively unchanged
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object, which is
    #              then piped into the LLM; the result is stored under the key "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)
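The LCEL chain above is really a sequence of dict transformations. The plain-Python sketch below traces the same data flow with hypothetical stand-ins (`fake_retriever` and `fake_llm` are invented for illustration, not LangChain objects) to show why the chain's output is a dict holding both `response` and `context`, which comes in handy when we evaluate retrieval later.

```python
from operator import itemgetter


def fake_retriever(question: str) -> list[str]:
    """Hypothetical stand-in for naive_retriever: returns canned 'documents'."""
    return [f"review mentioning: {question}"]


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for rag_prompt | chat_model."""
    return f"answer based on {prompt.count('review')} review(s)"


def run_chain(inputs: dict) -> dict:
    # Step 1: fan out into "context" (retrieved docs) and "question"
    step1 = {
        "context": fake_retriever(itemgetter("question")(inputs)),
        "question": itemgetter("question")(inputs),
    }
    # Step 2: like RunnablePassthrough.assign(context=itemgetter("context")),
    # keep every key and re-assign "context" to its own value
    step2 = {**step1, "context": itemgetter("context")(step1)}
    # Step 3: format the prompt, call the "model", and also return the context
    prompt = f"Query:\n{step2['question']}\n\nContext:\n{step2['context']}"
    return {"response": fake_llm(prompt), "context": step2["context"]}


result = run_chain({"question": "Did people generally like John Wick?"})
print(sorted(result))  # ['context', 'response']
```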

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [14]:
naive_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Based on the reviews provided, people generally liked John Wick. The film received high ratings from several reviewers, with many praising its action sequences, style, and entertainment value. For example, some reviews rated it 9 or 10 out of 10, highlighting its slickness, brutality, and coolness. Even those who gave lower ratings, such as 5 or 6, still acknowledged its interesting action and choreography. Overall, the subjective opinions suggest that viewers generally appreciated and enjoyed the film.'

In [15]:
naive_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10. The URL to that review is: /review/rw4854296/?ref_=tt_urv'

In [16]:
naive_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In the John Wick film series, the story follows John Wick, a retired hitman who is drawn back into a world of violence and revenge after a series of personal attacks. In the first film, John comes out of retirement after a gangsters kill his dog and steal his car, seeking vengeance against those responsible. Throughout the series, he faces various enemies, including mobsters and professional killers, and becomes entangled in a complex underworld governed by rules and honor. The films are characterized by intense action sequences, stylish choreography, and a portrayal of a dark, criminal underworld.'

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model) which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.
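More precisely, BM25 scores a document as a sum over the query terms of an inverse-document-frequency (IDF) weight times a term frequency that saturates (controlled by `k1`) and is normalised by document length (controlled by `b`). The sketch below assumes the common `ln(1 + (N - n + 0.5) / (n + 0.5))` IDF variant, whitespace tokenisation, and the typical defaults `k1=1.5`, `b=0.75`; `BM25Retriever` is built on the `rank_bm25` package, whose exact IDF formula and preprocessing may differ slightly.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every doc against the query with Okapi BM25 (lowercased whitespace tokens)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many docs does each term appear?
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a larger IDF weight
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency saturates via k1 and is length-normalised via b
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "john wick kills the men who killed his dog",
    "the action is stylish and the choreography is great",
    "a boring sequel with too much action",
]
print(bm25_scores("dog action", docs))
```

Because BM25 only matches tokens in `page_content`, note that our `Rating` lives in metadata (the sample document above shows only the index and `Review` text in `page_content`), so a query like "rating of 10" can't match on the rating itself. Also, `BM25Retriever` returns fewer documents by default (we believe `k=4`) than the `k=10` we gave the naive retriever.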

This retriever is very straightforward to set up! Let's see it happen down below!


In [17]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(documents)

We'll construct the same chain - only changing the retriever.

In [18]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [19]:
bm25_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"Based on the reviews provided, people's opinions on John Wick vary. Some reviews are highly positive, praising the movies for their exciting action, stylishness, and world-building, with ratings of 8 and 10 out of 10. However, there are also negative reviews, such as one rating of 1 out of 10, criticizing the film for being boring, overly violent, and lacking plot. \n\nOverall, it seems that while many people enjoyed the John Wick movies, some did not, and opinions are mixed. Therefore, people did not generally have a uniformly positive or negative view of John Wick."

In [20]:
bm25_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'There are no reviews with a rating of 10.'

In [21]:
bm25_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In the John Wick film series, the main character, John Wick, is an ex-hitman who becomes involved in a series of violent and highly choreographed action scenes. The original movie, "John Wick," introduces him as a man mourning the loss of his wife, who leaves him a dog as a final gift. When the dog is murdered by thugs, Wick is drawn back into the criminal underworld, seeking revenge. Throughout the series, he faces numerous assassins and enemies, showcasing intense combat and stylish choreography. The series is known for its brutal action scenes, world-building involving a secret society of assassins, and Keanu Reeves\' portrayal of a relentless and skilled killer.'

It's not clear that this is better or worse, but the claim that there are no reviews with a rating of 10 isn't great, since the naive retriever did find one!

## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.
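The two steps above can be sketched in plain Python: take the first-stage candidates, re-score each one with a (usually more expensive) relevance model, and keep only the best `top_n`. Here a word-overlap function (`overlap_score`, hypothetical) stands in for Cohere's cross-encoder; `CohereRerank` exposes a similar `top_n` parameter, which as far as we know defaults to 3.

```python
from typing import Callable


def rerank(
    query: str,
    candidates: list[str],
    score_fn: Callable[[str, str], float],
    top_n: int = 3,
) -> list[str]:
    """Re-score candidates against the query and keep the top_n best (stable on ties)."""
    scored = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return scored[:top_n]


def overlap_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for a reranking model: fraction of query words found in the doc."""
    query_terms = set(query.lower().split())
    return len(query_terms & set(doc.lower().split())) / len(query_terms)


# Pretend these came back from a first-stage retriever (k=10 in this notebook)
candidates = [
    "great stunts",
    "rating 10 masterpiece",
    "the dog scene",
    "a 10 out of 10 rating",
    "boring plot",
]
print(rerank("reviews with rating 10", candidates, overlap_score, top_n=2))
# ['rating 10 masterpiece', 'a 10 out of 10 rating']
```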

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [22]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-english-v3.0")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)

Let's create our chain again, and see how this does!

In [23]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [24]:
contextual_compression_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Based on the reviews provided, people generally liked John Wick. The majority of reviews are highly positive, praising its action sequences, style, and Keanu Reeves\' performance. For example, one review rated it 9 out of 10 and called it "the coolest action film you\'ll see all year," while another gave it a perfect score of 10 and described it as "something special." However, there is at least one less favorable review for the third film in the series, giving it a 5 out of 10 and stating "The magic is gone." Overall, the sentiment from the majority of reviews suggests that people generally enjoyed John Wick.'

In [25]:
contextual_compression_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there are reviews with a rating of 10. Here are the URLs to those reviews:\n\n1. Review titled "A Masterpiece & Brilliant Sequel" – URL: /review/rw4854296/?ref_=tt_urv\n2. Review titled "Most American action flicks released these days have poor screenplays and overuse computer-generated imagery. The John Wick franchise is one of the few exceptions, along with Mission Impossible." – URL: /review/rw8944843/?ref_=tt_urv\n3. Review titled "It\'s got its own action style!" – URL: /review/rw4860412/?ref_=tt_urv\n\nLet me know if you need further details!'

In [26]:
contextual_compression_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"In the John Wick series, John Wick, played by Keanu Reeves, is a retired hitman who seeks revenge after a series of tragic events. In the first film, his beloved dog is killed and his car is stolen by criminals, which motivates him to return to his violent past to track down those responsible. The storyline involves him confronting gangsters, facing a bounty on his head, and ultimately seeking justice for his loss. In the second film, after resolving issues with the Russian mafia, a new conflict arises when a mobster, Santino D'Antonio, asks Wick to carry out an assassination in Rome to help Santino climb the criminal hierarchy. Wick completes the task, but Santino then puts a contract on Wick's life, leading to a series of violent confrontations. Throughout the series, John Wick is portrayed as a lethal and efficient assassin who operates within a strict criminal code while seeking vengeance and survival."

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had....more than one query!

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` number of new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context.
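The three steps above boil down to "generate queries, retrieve for each, de-duplicate". A minimal plain-Python sketch follows; `fake_generate_queries` (standing in for the LLM rewrite) and `fake_retrieve` (a tiny keyword index) are hypothetical, and this version retrieves only with the generated queries, as in step 2.

```python
from typing import Callable


def multi_query_retrieve(
    question: str,
    generate_queries: Callable[[str], list[str]],
    retrieve: Callable[[str], list[str]],
) -> list[str]:
    """Retrieve for every generated query and return the unique docs in first-seen order."""
    seen: set[str] = set()
    unique_docs: list[str] = []
    for query in generate_queries(question):
        for doc in retrieve(query):
            if doc not in seen:  # de-duplicate across queries
                seen.add(doc)
                unique_docs.append(doc)
    return unique_docs


def fake_generate_queries(question: str) -> list[str]:
    """Hypothetical stand-in for the LLM call that rewrites the question."""
    return ["were John Wick reviews positive", "what did critics dislike about John Wick"]


def fake_retrieve(query: str) -> list[str]:
    """Hypothetical retriever over a tiny keyword index."""
    index = {"positive": ["doc_1", "doc_2"], "dislike": ["doc_2", "doc_3"]}
    return [doc for term, docs in index.items() if term in query for doc in docs]


print(multi_query_retrieve("Did people like John Wick?", fake_generate_queries, fake_retrieve))
# ['doc_1', 'doc_2', 'doc_3']
```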

So, how hard is it to set up? Not bad! Let's see it down below!



In [27]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    # Base retriever run for each generated query; swap in naive_retriever for dense retrieval
    #retriever=naive_retriever, llm=chat_model
    retriever=bm25_retriever, llm=chat_model
)

In [28]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [29]:
multi_query_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"Based on the reviews provided, people's opinions on John Wick are quite mixed. Some reviewers highly praised the movies, describing them as exciting, well-choreographed, and highly entertaining, with ratings of 8, 9, or 10 out of 10. Others expressed strong criticism, describing the films as boring, repetitive, mindless action, and with poor plots, with ratings as low as 1 or 2 out of 10. \n\nOverall, it seems that some people really liked John Wick, especially fans of action and well-choreographed fight scenes, while others did not enjoy the films and found them dull or superficial. Therefore, people’s opinions are divided, and it was not universally liked."

In [30]:
multi_query_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there are reviews with a rating of 10. One such review can be found at the URL: /review/rw3109271/?ref_=tt_urv.'

In [31]:
multi_query_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In the John Wick films, the main storyline centers on John Wick, a retired hitman seeking revenge after a brutal home invasion leaves his beloved dog murdered by thugs from his past. The films depict his relentless and violent quest for vengeance, showcasing intense action sequences, sophisticated choreography, and a dark, gritty world filled with assassins, crime, and morality questions. Throughout the series, John Wick confronts numerous enemies, demonstrating exceptional combat skills and a ruthless determination to protect what he loves.'

## Task 8: Parent Document Retriever

The Parent Document Retriever uses a "small-to-big" approach that works as follows:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent chunks".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
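A minimal plain-Python sketch of "search small, return big", under loud simplifying assumptions: a dict plays the role of the `InMemoryStore`, fixed-size word chunks stand in for `RecursiveCharacterTextSplitter`, and word overlap stands in for embedding similarity. All helper names here are hypothetical.

```python
def split_into_children(text: str, chunk_size: int) -> list[str]:
    """Naive fixed-size word chunks (a stand-in for RecursiveCharacterTextSplitter)."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def build_index(parents: dict[str, str], chunk_size: int = 4) -> list[tuple[str, str]]:
    """Return (child_chunk, parent_id) pairs; full parent text stays in the parents dict."""
    return [
        (child, parent_id)
        for parent_id, text in parents.items()
        for child in split_into_children(text, chunk_size)
    ]


def search(query: str, children: list[tuple[str, str]], parents: dict[str, str], k: int = 2) -> list[str]:
    """Rank the small children against the query, then return their unique parents."""
    query_terms = set(query.lower().split())
    ranked = sorted(
        children,
        key=lambda pair: len(query_terms & set(pair[0].lower().split())),
        reverse=True,
    )
    parent_ids: list[str] = []
    for _, parent_id in ranked[:k]:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [parents[pid] for pid in parent_ids]


parents = {
    "review_1": "Keanu is great. The dog scene hits hard and the revenge plot is simple.",
    "review_2": "Too long. The fights repeat and the plot of the sequel drags badly.",
}
children = build_index(parents)
# Both matching children belong to review_1, so only that full parent is returned
print(search("dog revenge", children, parents))
```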

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [32]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = documents
child_splitter = RecursiveCharacterTextSplitter(chunk_size=200)

We'll need to set up a new QDrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension, you'll need to change this if you're using a different embedding model.

In [33]:
client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = Qdrant(
    collection_name="full_documents", embeddings=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)



Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [34]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [35]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [36]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [37]:
parent_document_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Based on the provided reviews, people generally liked the John Wick series. The majority of the reviews are positive, praising the action, choreography, and entertainment value. However, there is at least one negative review criticizing John Wick 4 specifically, calling it "horrible" and criticizing its plot and fight scenes. Overall, most feedback appears favorable, indicating that people tend to like the John Wick series, although opinions on the latest installment vary.'

In [38]:
parent_document_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10. The URL to that review is: /review/rw4854296/?ref_=tt_urv'

In [39]:
parent_document_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'Based on the information provided, in the John Wick movies, John Wick is a retired assassin who is drawn back into violence and chaos after certain events. In the first film, he comes out of retirement after his dog is killed and his car is stolen, seeking revenge against those who wronged him. This leads to him unleashing a deadly and relentless vendetta, involving many scenes of violence and action as he fights to recover his dignity and settle old scores.\n\nIn the second movie, John Wick continues his violent adventures. The story begins with him retrieving a car from a chop shop, but he is soon forced back into the criminal underworld when calls in a favor from an Italian criminal, which leads him on missions across various locations, including Italy, Canada, and Manhattan, involving killing numerous assassins to help a new crime boss take over the Assassin’s Guild.\n\nOverall, John Wick is depicted as a legendary, highly skilled hitman who displays intense violence and a relentl

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to measure performance more rigorously.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever simply takes 2, or more, retrievers and combines their retrieved documents based on a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.
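In RRF, each document earns `1 / (c + rank)` from every ranked list it appears in (the paper uses a constant of 60), and the sums decide the final order. The sketch below is a weighted variant in plain Python, assuming each retriever's contribution is multiplied by its weight; as far as we know this matches how `EnsembleRetriever` applies `weights`, with its constant exposed as `c` (default 60). The document ids are made up.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists: list[list[str]], weights: list[float], c: int = 60) -> list[str]:
    """Fuse ranked id lists: each doc earns weight / (c + rank) from every list containing it."""
    scores: dict[str, float] = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (c + rank)
    return sorted(scores, key=scores.get, reverse=True)


bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_c", "doc_a", "doc_d"]
print(weighted_rrf([bm25_ranking, dense_ranking], weights=[0.5, 0.5]))
# ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that show up near the top of several lists (like `doc_a` and `doc_c`) float above documents that only one retriever liked.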

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.

In [40]:
from langchain.retrievers import EnsembleRetriever

# retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

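We've packed all of these retrievers, except the reranking `compression_retriever` (commented out above), together in an ensemble. Now we wrap it in the same chain.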

In [41]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [42]:
ensemble_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"Based on the reviews provided, people generally liked John Wick. Several reviews praise the film's action sequences, style, and Keanu Reeves' performance, with ratings frequently in the 8 to 10 range. Critics highlight its fun, slick, and exciting nature, especially for action fans. However, there are some negative opinions and lower ratings (such as 1 to 5), often criticizing the over-the-top action or lack of a substantial plot. Overall, the majority of reviews reflect a positive reception."

In [43]:
ensemble_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10. The URL to that review is /review/rw3109271/?ref_=tt_urv.'

In [44]:
ensemble_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In the John Wick series, the story revolves around John Wick, a retired hitman who is forced back into the violent underworld he once left behind. The original film depicts how Wick’s life is upended when a gangster’s son kills his dog, a final gift from his late wife, which prompts Wick to seek lethal revenge against those who wronged him. Throughout the series, Wick faces various enemies, including mobsters, assassins, and criminal organizations, as he seeks justice and deals with the consequences of his past actions. The franchise features intense action, stylized fight sequences, and explores a fictional world where assassins operate under strict rules and codes of conduct.'

## Task 10: Semantic Chunking

While this is not a retrieval method - it *is* an effective way of increasing retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of a number of [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
   - `percentile`
   - `standard_deviation`
   - `interquartile`
   - `gradient`
3. Each sequence of related sentences is kept as a document!

Let's see how to implement this!

> NOTE: You do not need to run this cell if you're running this locally

In [45]:
#!pip install -qU langchain_experimental

We'll use the `percentile` thresholding method for this example, which will:

Calculate the distance between each pair of adjacent sentences, and then split wherever a distance exceeds a given percentile of all those distances (the 95th percentile by default).
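Here's a plain-Python sketch of that percentile rule, assuming toy 2-dimensional "sentence embeddings" in place of real ones. The real `SemanticChunker` differs in details (sentence splitting, exactly what text it embeds), so treat this as an illustration of the breakpoint logic only; the percentile helper uses linear interpolation, the same method as NumPy's default `np.percentile`.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def percentile(values: list[float], pct: float) -> float:
    """Linear-interpolation percentile over the sorted values."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100
    lower, upper = math.floor(pos), math.ceil(pos)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def semantic_chunks(sentences: list[str], vectors: list[list[float]], pct: float = 95) -> list[str]:
    """Split wherever the distance between adjacent sentences exceeds the pct-th percentile."""
    distances = [cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    # distances[i] sits between sentence i and sentence i + 1
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:  # big semantic jump: close the current chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks


sentences = [
    "John Wick loses his dog.",
    "He seeks revenge.",
    "The cinematography is gorgeous.",
    "The neon lighting stands out.",
]
vectors = [[1.0, 0.0], [0.95, 0.05], [0.0, 1.0], [0.05, 0.95]]  # toy embeddings
print(semantic_chunks(sentences, vectors))
```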

In [46]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)

Now we can split our documents.

In [47]:
semantic_documents = semantic_chunker.split_documents(documents)

Let's create a new vector store.

In [48]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="JohnWickSemantic"
)

We'll use naive retrieval for this example.

In [49]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [50]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [51]:
semantic_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Based on the reviews provided, people generally liked John Wick. Multiple reviews praise its action sequences, style, and entertainment value, often giving high ratings like 8, 9, or 10 out of 10. Some reviews mention that the series is well received and maintains a consistent level of quality. However, there are also some lower ratings and critical comments, especially for later installments, indicating that not everyone was pleased. Overall, the majority of reviews suggest that people generally liked John Wick.'

In [52]:
semantic_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10. The URL for that review is: /review/rw4854296/?ref_=tt_urv'

In [53]:
semantic_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"In the John Wick movies, the story centers around John Wick, a retired assassin who is drawn back into a world of violence and revenge. The first film's plot begins when a group of thugs, led by the son of a Russian mobster he used to work for, break into his house, beat him up, kill his dog, and steal his car. The killing of his beloved dog, which was a gift from his deceased wife, prompts John Wick to seek vengeance. As he pursues justice, he becomes entangled in a dangerous underworld of crime, confronts various enemies, and unleashes a relentless and highly stylized wave of violence against those who wronged him. The movies explore themes of revenge, consequence, and the formidable skills of a legendary assassin."

# 🤝 Breakout Room Part #2

#### 🏗️ Activity #1

Your task is to evaluate the various Retriever methods against each other.

You are expected to:

1. Create a "golden dataset" (<span style="color:green">Done</span>)
 - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever specific* Ragas metrics (<span style="color:green">Done</span>)
 - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison between them
3. Compile these in a list and write a small paragraph about which is best for this particular data and why. (<span style="color:green">Done: see the RAGAS results table and analysis below</span>)
<span style="color:green">For performance, I used three Ragas metrics. For cost and latency, I ran a LangSmith evaluation of all the retrievers.</span>

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

## <span style="color:green"> RAGAS results 
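<span style="color:green">(Metrics: faithfulness, context recall, and context precision.) I did not run the full set of Ragas metrics, only these three: context recall and context precision score the retriever directly, and faithfulness checks whether the generated answer is grounded in the retrieved context. Limiting the metrics saved evaluation cost and kept the output easy to understand and analyze.</span>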

| Retriever                        | Faithfulness | Context Recall | Context Precision |
|----------------------------------|--------------|----------------|-------------------|
| Naive Retriever                  | 0.9647       | 1.0000         | 0.8856            |
| BM25 Retriever                   | 0.9599       | 0.9889         | 0.7824            |
| Multi-Query Retriever            | 0.9948       | 1.0000         | 0.7148            |
| Parent Document Retriever        | 0.8673       | 0.9792         | 0.9838            |
| Semantic Retriever               | 0.9924       | 1.0000         | 0.8508            |
| Contextual Compression Retriever | 0.9799       | 1.0000         | 0.9861            |
| Ensemble Retriever               | 0.9798       | 1.0000         | 0.8678            |

## <span style="color:green"> RAGAS ANALYSIS

*   <span style="color:green">**Overall High Performance**: Most retrievers reach perfect context recall (1.0000), and the rest (BM25 at 0.9889, Parent Document at 0.9792) are close behind, so they are all generally successful at retrieving the relevant information.</span>

*   <span style="color:green">**Faithfulness**: The Multi-Query Retriever (0.9948) and Semantic Retriever (0.9924) lead in Faithfulness, suggesting their generated answers align very closely with the provided context.

*   <span style="color:green">**Precision**: The Contextual Compression Retriever (0.9861) and Parent Document Retriever (0.9838) show the highest Context Precision, meaning most of the context they retrieve is relevant and the relevant chunks are ranked near the top.</span>

*   <span style="color:green">**What's with Multi-Query and Parent Document?**: Multi-Query, while top in faithfulness, has the lowest context precision (0.7148), suggesting it retrieves more irrelevant documents alongside the relevant ones (plausible, since it merges the results of several rephrased queries). Parent Document is the reverse: the lowest faithfulness (0.8673) but very high precision.</span>
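Context precision is rank-aware. That may help explain how Multi-Query can have perfect recall and still have the lowest precision. As we understand Ragas' documented definition, each retrieved chunk is judged relevant or not (by an LLM in Ragas). The score is then the average of precision@k over the ranks k that hold a relevant chunk. The sketch below assumes the relevance verdicts are already available as 0/1 values. `context_precision` is a hypothetical helper, not the Ragas API, and it ignores the tiny epsilon Ragas adds to the denominator.

```python
def context_precision(relevance):
    """relevance[k] is 1 if the chunk at rank k + 1 is relevant, else 0."""
    total_relevant = sum(relevance)
    if total_relevant == 0:
        return 0.0
    score, hits = 0.0, 0
    for k, is_relevant in enumerate(relevance, start=1):
        hits += is_relevant
        if is_relevant:
            # precision@k, counted only at ranks that hold a relevant chunk
            score += hits / k
    return score / total_relevant

# Same two relevant chunks, different ranking
print(context_precision([1, 1, 0, 0]))  # 1.0 (relevant chunks first)
print(context_precision([0, 0, 1, 1]))  # about 0.42 (irrelevant chunks pushed ahead)
```

Both rankings contain every relevant chunk, so recall is the same. Precision drops as soon as irrelevant chunks are ranked ahead of the relevant ones.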


## <span style="color:green"> LANGSMITH Results (Cost, Latency, Correctness)
See https://smith.langchain.com/o/a335557d-c0ed-4838-b6da-f98a8da2090c/datasets/bfc91aaa-6b4f-48cf-8145-9d13b8f858c3 for more details.


![Naive Retriever Summary from LangSmith](LS_results.png)

## <span style="color:green"> LANGSMITH Analysis

- <span style="color:green">**Correctness**: All of the retrievers did fairly well. Cohere (Contextual Compression) scored slightly lower, but some of its runs failed with rate-limit (HTTP 429) and timeout errors, which may explain the gap.</span>
- <span style="color:green">**Latency**: Ensemble was the slowest, which is understandable since it runs all of the retrievers it combines. BM25 was the fastest, and Cohere and Multi-Query were in the middle.</span>
- <span style="color:green">**Dollar cost**: Fairly low across the board!</span>
- <span style="color:green">**Total tokens**: Parent Document, BM25, and Cohere each stayed under 20k, Ensemble approached 60k, and the other three (Naive, Multi-Query, Semantic) were in the middle.</span>
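One likely reason Ensemble uses more tokens is that it merges the result lists of several retrievers. To our understanding, LangChain's `EnsembleRetriever` combines those lists with weighted Reciprocal Rank Fusion, using a rank constant `c = 60` by default. The sketch below uses a hypothetical `weighted_rrf` helper, with strings standing in for documents. It shows that the fused list keeps every unique document from every input list, so it can be longer than any single list and can put more context into the prompt.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse ranked lists: each doc scores sum(weight / (rank + c)) over the lists it appears in."""
    scores = defaultdict(float)
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(docs, start=1):
            scores[doc] += weight / (rank + c)
    # Highest fused score first; every unique doc from every list is kept
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["review_3", "review_7", "review_1"]
dense_hits = ["review_7", "review_9", "review_3"]
fused = weighted_rrf([bm25_hits, dense_hits], weights=[0.5, 0.5])
print(fused)  # ['review_7', 'review_3', 'review_9', 'review_1']
```

Two lists of three documents fuse into four, and documents found by both retrievers rise to the top.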

##### HINTS:

- LangSmith provides detailed information about latency and cost.

In [54]:
### YOUR CODE HERE
#!pip install -qU ragas

## Generate Golden Data Set

<span style="color:green">Ragas imports, plus the LLM and embedding model used for synthetic data generation</span>

In [55]:
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
sdg_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4.1-nano"))
sdg_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings(model="text-embedding-3-small"))

<span style="color:green"> Generate the golden test dataset

In [None]:
from ragas.testset import TestsetGenerator
generator = TestsetGenerator(llm=sdg_llm, embedding_model=sdg_embeddings)
sdg_dataset = generator.generate_with_langchain_docs(documents, testset_size=10)

I cleared the output of this cell: too long and not useful

In [57]:
sdg_dataset.to_pandas()

|   | user_input | reference_contexts | reference | synthesizer_name |
|---|------------|--------------------|-----------|------------------|
| 0 | Who is Keanu Reeves in the context of the acti... | `[: 0\nReview: The best way I can describe John...` | Keanu Reeves is the actor who plays the charac... | single_hop_specifc_query_synthesizer |
| 1 | Considering the popularity of the "John Wick" ... | `[: 2\nReview: With the fourth installment scor...` | The review mentions that "John Wick" is a film... | single_hop_specifc_query_synthesizer |
| 2 | Who is Chad Stahelski and what is his role in ... | `[: 3\nReview: John wick has a very simple reve...` | Chad Stahelski is the director of John Wick, a... | single_hop_specifc_query_synthesizer |
| 3 | How does Reeves' portrayal contribute to the o... | `[: 4\nReview: Though he no longer has a taste ...` | Savvy, indestructible Reeves looks right at ho... | single_hop_specifc_query_synthesizer |
| 4 | so like in the first review it says all these ... | `[<1-hop>\n\n: 13\nReview: ... slaughtering a l...` | The first review describes the film as full of... | multi_hop_abstract_query_synthesizer |
| 5 | How do the elaborate action sequences and the ... | `[<1-hop>\n\n: 23\nReview: Rating 10/10\nI was ...` | The reviews highlight that John Wick features ... | multi_hop_abstract_query_synthesizer |
| 6 | How do the visceral and stylized action sequen... | `[<1-hop>\n\n: 9\nReview: At first glance, John...` | The visceral and stylized action sequences in ... | multi_hop_abstract_query_synthesizer |
| 7 | Considering the film's lengthy runtime and its... | `[<1-hop>\n\n: 9\nReview: "John Wick: Chapter 2...` | The film 'John Wick: Chapter 2' has a runtime ... | multi_hop_abstract_query_synthesizer |
| 8 | whats up with john wick 2 and john wick chapte... | `[<1-hop>\n\n: 10\nReview: The first John Wick ...` | The reviews highlight that John Wick 2, also k... | multi_hop_specific_query_synthesizer |
| 9 | Does the review of Chapter 2 relate to the lac... | `[<1-hop>\n\n: 11\nReview: Don't believe the hy...` | The review of Chapter 2 describes it as a gene... | multi_hop_specific_query_synthesizer |


## <span style="color:green"> RAGAS evaluation of all the retrievers. First setup, then evaluate

<span style="color:green">Ragas-specific imports, a list of retriever chains, a list of metrics, and the LLM and embeddings used for evaluation. Keeping these in lists makes it easy to pick which retrievers to evaluate with which metrics.</span>

In [107]:
# RAGAS Setup
import pandas as pd
import time
from datasets import Dataset
from ragas.llms import LangchainLLMWrapper
from ragas import evaluate as ragas_evaluate # Alias to avoid conflict
from ragas import RunConfig

from ragas.metrics import (
    Faithfulness,
    ContextRecall,
    ContextPrecision
)

# Commented out retrievers that often give time out errors!
retriever_chains_to_evaluate = [
    ("Naive Retriever", naive_retrieval_chain),
    ("BM25 Retriever", bm25_retrieval_chain),
    #("Contextual Compression Retriever", contextual_compression_retrieval_chain),
    ("Multi-Query Retriever", multi_query_retrieval_chain),
    ("Parent Document Retriever", parent_document_retrieval_chain),
    #("Ensemble Retriever", ensemble_retrieval_chain),
    ("Semantic Retriever", semantic_retrieval_chain)
]

metrics_to_use = [
    Faithfulness(),
    ContextRecall(),
    ContextPrecision()
]
evaluation_run_config = RunConfig(
    timeout=900,   # 15 minutes per job
    max_workers=2  # Limit concurrency
)

# Same model as sdg_llm, wrapped again for the evaluation metrics
ragas_eval_llm_for_metrics = LangchainLLMWrapper(ChatOpenAI(model="gpt-4.1-nano"))
ragas_eval_embeddings_for_metrics = sdg_embeddings

# If you hit timeout errors (I got them a lot with Cohere), give the
# evaluation LLM a per-request timeout instead:
#ragas_eval_llm_for_metrics = LangchainLLMWrapper(
#    ChatOpenAI(
#        model="gpt-4.1-nano",
#        request_timeout=120  # per-request timeout in seconds
#    )
#)


## <span style="color:green"> RAGAS Evaluation: Run the evals on chosen retrievers!

In [None]:

# Save the results to a file
output_summary_filename = "ragas_Retriever_summary.txt"
with open(output_summary_filename, "w") as f:
    f.write("Ragas Summaries:\n\n")

print(f"Ragas summaries to be appended to: {output_summary_filename}\n")

# For each retriever: 
#   run its rag chain on the golden test dataset, 
#   convert the results to a pandas dataframe, and 
#   evaluate the results with RAGAS.
for retriever_name, chain_instance in retriever_chains_to_evaluate:
    print(f"--- Evaluating: {retriever_name} ---")
    for test_sample in sdg_dataset: 
        question = test_sample.eval_sample.user_input
        
        response_dict = chain_instance.invoke({"question": question})
        test_sample.eval_sample.response = response_dict["response"].content 
        test_sample.eval_sample.retrieved_contexts = [doc.page_content for doc in response_dict["context"]]
        
        time.sleep(6.1)  # ~10 calls/minute, to stay under API rate limits (e.g. the Cohere trial key)
    
    list_of_eval_sample_dicts = [sample.eval_sample.model_dump() for sample in sdg_dataset]
    df_for_evaluation = pd.DataFrame(list_of_eval_sample_dicts)
    evaluation_hf_dataset = Dataset.from_pandas(df_for_evaluation)

  
    result = ragas_evaluate(
        dataset=evaluation_hf_dataset,
        metrics=metrics_to_use, 
        llm=ragas_eval_llm_for_metrics,
        embeddings=ragas_eval_embeddings_for_metrics,
        run_config=evaluation_run_config
    )
    summary_line = f"Retriever: {retriever_name}\nScores: {result}\n-------------\n"
    print(f"\nScores for {retriever_name}:")
    print(result) 
    with open(output_summary_filename, "a") as f:
        f.write(summary_line)

print(f"\nAll RAGAS eval summaries appended to: {output_summary_filename}")

Ragas summaries (User's Exact Workflow - No Comments) will be appended to: ragas_Retriever_summary.txt

--- Evaluating: Ensemble Retriever ---


Scores for Ensemble Retriever:
{'faithfulness': 0.9798, 'context_recall': 1.0000, 'context_precision': 0.8678}

All summaries (User's Exact Workflow - No Comments) appended to: ragas_Retriever_summary.txt


## <span style="color:green"> LangSmith evaluation for cost, latency, and correctness</span>
<span style="color:green">Same as with Ragas: we first do the setup and then run the evaluations.</span>

<span style="color:green">LangSmith setup: imports, tracing, and creating the dataset</span>

In [110]:
import os

from langsmith import Client
from langsmith.evaluation import LangChainStringEvaluator, evaluate

# Enable LangSmith tracing. These must be environment variables;
# assigning plain Python variables has no effect.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "Assignment 13 Eval Retrievers-Test"

client = Client()
dataset_name = "Advanced_Retrieval_Assignment_Test"
langsmith_dataset = client.create_dataset(
    dataset_name=dataset_name,
    description="Assignment 13 eval retrievers - test!"
)

<span style="color:green">Convert our golden dataset into LangSmith format. LangSmith had trouble with the special formatting of the reference contexts, so I strip the `<1-hop>` markers and row prefixes and keep just the review text. With that cleanup, the upload worked.</span>

In [111]:
def clean_context(context):
    """Keep only the review text from a Ragas reference context.

    Single-hop contexts look like ': 0\\nReview: ...' and multi-hop ones like
    '<1-hop>\\n\\n: 13\\nReview: ...'. Everything after the first 'Review: '
    is the review itself, so one split handles both (and '<2-hop>' etc.).
    """
    if not isinstance(context, str):
        return context
    _, sep, review = context.partition("Review: ")
    return review if sep else context

# Convert the dataset from Ragas format to LangSmith format.
for _, row in sdg_dataset.to_pandas().iterrows():
    client.create_example(
        inputs={"question": row["user_input"]},
        outputs={"answer": row["reference"]},
        metadata={
            "context": [clean_context(ctx) for ctx in row["reference_contexts"]]
        },
        dataset_id=langsmith_dataset.id,
    )

# Quick check of the dataset format
print("=== Checking LangSmith Dataset Format ===")
example = next(client.list_examples(dataset_id=langsmith_dataset.id))
print(f"Context format in metadata: {example.metadata['context'][0]}")

=== Checking LangSmith Dataset Format ===
Context format in metadata: This universe, created without any source material, had to develop and expand itself in every new film. The same fiction can not be made for each film. We don't know if Derek Kolstad had thought that John Wick would be this big when he first created it, but when we came to the third film it seems clear how the story deepened and became beautiful.


<span style="color:green">Set the LangSmith evaluation LLM and define a data-prep helper that maps the RAG chain output and the dataset example to the fields LangSmith's `qa` evaluator expects.</span>

<span style="color:green">I would have liked to run the Ragas and LangSmith evaluations together for each retriever, and to refactor the code so the RAG chains do not run twice (once for Ragas, once for LangSmith). That would mean digging into the evaluation functions Ragas and LangSmith provide to find ones that can take the same 5-column data. I have a work backlog, though, so maybe later!</span>
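One way to run each chain only once is to memoize its output per question and point both evaluations at the cached results. This is a minimal sketch. It assumes both the Ragas loop and LangSmith's `evaluate` call the target with an inputs dict containing a `"question"` key, as the cells in this notebook do. `make_cached_target` is a hypothetical helper, not a Ragas or LangSmith API.

```python
from typing import Any, Callable, Dict

def make_cached_target(invoke: Callable[[Dict[str, Any]], Dict[str, Any]]):
    """Wrap a chain's invoke so each question is answered only once."""
    cache: Dict[str, Dict[str, Any]] = {}

    def target(inputs: Dict[str, Any]) -> Dict[str, Any]:
        question = inputs["question"]
        if question not in cache:
            cache[question] = invoke({"question": question})
        return cache[question]

    target.cache = cache  # expose the stored outputs for building Ragas samples
    return target

# Stand-in for a RAG chain that records how often it really runs
calls = []

def fake_chain_invoke(inputs):
    calls.append(inputs["question"])
    return {"response": f"answer to {inputs['question']}", "context": []}

target = make_cached_target(fake_chain_invoke)
target({"question": "Did people like John Wick?"})  # e.g. the Ragas pass
target({"question": "Did people like John Wick?"})  # e.g. the LangSmith pass, served from cache
print(len(calls))  # 1
```

In the notebook this might look like `naive_target = make_cached_target(naive_retrieval_chain.invoke)`. You would call `naive_target({"question": question})` in the Ragas loop and pass `naive_target` to `evaluate(...)` in place of `naive_retrieval_chain.invoke`.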

In [112]:
lc_eval_llm = ChatOpenAI(model="gpt-4.1")

def prep_data(run, example):
    return {
        "prediction": run.outputs['response'],  # Map 'response' key to 'prediction'
        "reference": example.outputs['answer'], # Map 'answer' key to 'reference'
        "input": example.inputs['question'],    # Map 'question' key to 'input' (or 'query' depending on the evaluator's prompt)
    }

# 'qa' grades each prediction against the reference answer using lc_eval_llm
qa_evaluator = LangChainStringEvaluator("qa", config={"llm": lc_eval_llm}, prepare_data=prep_data)

<span style="color:green">Run the LangSmith evaluators for all our retrievers!</span>

In [114]:
naive_ls_results = evaluate(
    naive_retrieval_chain.invoke,
    data=dataset_name,
    evaluators=[
        qa_evaluator,
    ],
    metadata={"revision_id": "Base_Embedding_Model"},
)
print("Naive Retriever Summary:")
print(naive_ls_results)

View the evaluation results for experiment: 'sparkling-eye-19' at:
https://smith.langchain.com/o/a335557d-c0ed-4838-b6da-f98a8da2090c/datasets/799322e8-ed7a-405c-b3be-045ee16a42b6/compare?selectedSessions=33a431ab-d18d-4995-830e-17a0f859c6c0




Naive Retriever Summary:
<ExperimentResults sparkling-eye-19>
Error: results_object does not have 'experiment_name' or 'summary_stats' attributes. Cannot process.


In [90]:
# BM25 Retriever
bm25_ls_results = evaluate(
    bm25_retrieval_chain.invoke,
    data=dataset_name,
    evaluators=[
        qa_evaluator,
    ],
    metadata={"revision_id": "BM25_Retriever"},
)
print("BM25 Retriever Summary:")
print(bm25_ls_results)

# Contextual Compression Retriever
compression_ls_results = evaluate(
    contextual_compression_retrieval_chain.invoke,
    data=dataset_name,
    evaluators=[
        qa_evaluator,
    ],
    metadata={"revision_id": "Contextual_Compression_Retriever"},
)
print("Contextual Compression Retriever Summary:")
print(compression_ls_results)

# Multi-Query Retriever
multi_query_ls_results = evaluate(
    multi_query_retrieval_chain.invoke,
    data=dataset_name,
    evaluators=[
        qa_evaluator,
    ],
    metadata={"revision_id": "Multi_Query_Retriever"},
)
print("Multi-Query Retriever Summary:") 
print(multi_query_ls_results)

# Parent Document Retriever
parent_doc_ls_results = evaluate(
    parent_document_retrieval_chain.invoke,
    data=dataset_name,
    evaluators=[
        qa_evaluator,
    ],
    metadata={"revision_id": "Parent_Document_Retriever"},
)
print("Parent Document Retriever Summary:") 
print(parent_doc_ls_results)

# Ensemble Retriever
ensemble_ls_results = evaluate(
    ensemble_retrieval_chain.invoke,
    data=dataset_name,
    evaluators=[
        qa_evaluator,
    ],
    metadata={"revision_id": "Ensemble_Retriever"},
)
print("Ensemble Retriever Summary:")
print(ensemble_ls_results)

# Semantic Retrieval Chain
semantic_ls_results = evaluate(
    semantic_retrieval_chain.invoke,
    data=dataset_name,
    evaluators=[
        qa_evaluator,
    ],
    metadata={"revision_id": "Semantic_Retrieval_Chain"},
)
print("Semantic Retrieval Chain Summary:")
print(semantic_ls_results)


View the evaluation results for experiment: 'bold-memory-92' at:
https://smith.langchain.com/o/a335557d-c0ed-4838-b6da-f98a8da2090c/datasets/bfc91aaa-6b4f-48cf-8145-9d13b8f858c3/compare?selectedSessions=e7cb5a13-7f09-44ae-9288-93c4bbc64f88




BM25 Retriever Summary:
<ExperimentResults bold-memory-92>
View the evaluation results for experiment: 'dependable-clove-34' at:
https://smith.langchain.com/o/a335557d-c0ed-4838-b6da-f98a8da2090c/datasets/bfc91aaa-6b4f-48cf-8145-9d13b8f858c3/compare?selectedSessions=090362fe-b209-429f-9349-4f5ebe3ac686




Error running target function: status_code: 429, body: {'message': "You are using a Trial key, which is limited to 10 API calls / minute. You can continue to use the Trial key for free or upgrade to a Production key with higher rate limits at 'https://dashboard.cohere.com/api-keys'. Contact us on 'https://discord.gg/XW44jPfYJu' or email us at support@cohere.com with any questions"}
Traceback (most recent call last):
  File "/Users/geetachaudhry/Projects/mavenbc/code/myAIE6/13_Advanced_Retrieval/.venv/lib/python3.13/site-packages/langsmith/evaluation/_runner.py", line 1905, in _forward
    fn(*args, langsmith_extra=langsmith_extra)
    ~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/geetachaudhry/Projects/mavenbc/code/myAIE6/13_Advanced_Retrieval/.venv/lib/python3.13/site-packages/langchain_core/runnables/base.py", line 3032, in invoke
    input = context.run(step.invoke, input, config, **kwargs)
  File "/Users/geetachaudhry/Projects/mavenbc/code/myAIE6/13_Advanced_Retrieval/.

Contextual Compression Retriever Summary:
<ExperimentResults dependable-clove-34>
View the evaluation results for experiment: 'yellow-cheek-42' at:
https://smith.langchain.com/o/a335557d-c0ed-4838-b6da-f98a8da2090c/datasets/bfc91aaa-6b4f-48cf-8145-9d13b8f858c3/compare?selectedSessions=735f5ddd-59c5-4d6d-81d4-bf183afdcdc2




Multi-Query Retriever Summary:
<ExperimentResults yellow-cheek-42>
View the evaluation results for experiment: 'advanced-mass-60' at:
https://smith.langchain.com/o/a335557d-c0ed-4838-b6da-f98a8da2090c/datasets/bfc91aaa-6b4f-48cf-8145-9d13b8f858c3/compare?selectedSessions=17e0fa96-235d-424e-8b3d-344e613441cc




Parent Document Retriever Summary:
<ExperimentResults advanced-mass-60>
View the evaluation results for experiment: 'earnest-country-73' at:
https://smith.langchain.com/o/a335557d-c0ed-4838-b6da-f98a8da2090c/datasets/bfc91aaa-6b4f-48cf-8145-9d13b8f858c3/compare?selectedSessions=8cf04488-cf49-48b6-aa63-85e15a99f97c




Ensemble Retriever Summary:
<ExperimentResults earnest-country-73>
View the evaluation results for experiment: 'crazy-head-18' at:
https://smith.langchain.com/o/a335557d-c0ed-4838-b6da-f98a8da2090c/datasets/bfc91aaa-6b4f-48cf-8145-9d13b8f858c3/compare?selectedSessions=d942b42c-b85e-46bb-9ffc-f61532bbaecd




Semantic Retrieval Chain Summary:
<ExperimentResults crazy-head-18>
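The Contextual Compression run above failed with HTTP 429 because the Cohere trial key allows only 10 API calls per minute. One option is to wrap the chain's `invoke` in a retry with exponential backoff before passing it to `evaluate`. This is a minimal sketch. `with_retry` is a hypothetical helper. Treating any exception whose message contains `429` as retryable is an assumption about how the error surfaces, not documented Cohere or LangSmith behavior.

```python
import time
from typing import Any, Callable, Dict

def with_retry(invoke: Callable[[Dict[str, Any]], Any], max_retries: int = 3,
               base_delay: float = 6.0,
               sleep: Callable[[float], None] = time.sleep) -> Callable[[Dict[str, Any]], Any]:
    """Wrap invoke so rate-limited calls are retried with exponential backoff."""
    def target(inputs: Dict[str, Any]) -> Any:
        for attempt in range(max_retries + 1):
            try:
                return invoke(inputs)
            except Exception as exc:
                # Assumption: a rate-limit error mentions the 429 status code in its message
                if "429" not in str(exc) or attempt == max_retries:
                    raise
                sleep(base_delay * 2 ** attempt)  # 6 s, 12 s, 24 s, ...
    return target

# Demo: a stand-in chain that is rate-limited twice, then succeeds
attempts = {"n": 0}
waits = []

def flaky_invoke(inputs):
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("status_code: 429, body: rate limited")
    return {"response": "ok"}

safe_invoke = with_retry(flaky_invoke, sleep=waits.append)
result = safe_invoke({"question": "What happened in John Wick?"})
print(result, waits)  # {'response': 'ok'} [6.0, 12.0]
```

In the notebook this could be `evaluate(with_retry(contextual_compression_retrieval_chain.invoke), ...)`. Lowering concurrency also helps. Recent `langsmith` versions accept a `max_concurrency` argument on `evaluate`, but check the signature in your installed version first.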
