# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic Chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up Qdrant!
  - Tasks 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

> You do not need to run the following cells if you are running this notebook locally.

In [2]:
!pip install -qU langchain langchain-community langchain-openai langchain-cohere rank_bm25

We'll also use [Qdrant](https://qdrant.tech/documentation/frameworks/langchain/) (pronounced "Quadrant") as our vector database, running in in-memory mode so that it works locally inside our Colab environment.

In [3]:
!pip install -qU qdrant-client

We'll also provide our OpenAI key, as well as our Cohere API key.

In [4]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

Enter your OpenAI API Key:··········


In [5]:
os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

Cohere API Key:··········


## Task 2: Data Collection and Preparation

We'll be using some reviews from the 4 movies in the John Wick franchise today to explore the different retrieval strategies.

These were obtained from IMDB, and are available in the [AIM Data Repository](https://github.com/AI-Maker-Space/DataRepository).

### Data Collection

We can simply `wget` these from GitHub.

You could use any review data you wanted in this step - just be careful to make sure your metadata is aligned with your choice.

In [6]:
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw1.csv -O john_wick_1.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw2.csv -O john_wick_2.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw3.csv -O john_wick_3.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw4.csv -O john_wick_4.csv

--2025-03-02 19:04:48--  https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw1.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19628 (19K) [text/plain]
Saving to: ‘john_wick_1.csv’


2025-03-02 19:04:48 (28.3 MB/s) - ‘john_wick_1.csv’ saved [19628/19628]

--2025-03-02 19:04:49--  https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw2.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.110.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14747 (14K) [text/plain]
Saving to: ‘john_wick_2.csv’


2025-03-02 19:04:49 (19.1 MB/s) - ‘john_wick_2.csv’

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

- Self-Query: Wants as much metadata as we can provide
- Time-weighted: Wants temporal data

> NOTE: While we're creating a temporal relationship based on when these movies came out for illustrative purposes, it needs to be clear that the "time-weighting" in the Time-weighted Retriever is based on when the document was *accessed* last - not when it was created.

In [7]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

documents = []

for i in range(1, 5):
  loader = CSVLoader(
      file_path=f"john_wick_{i}.csv",
      metadata_columns=["Review_Date", "Review_Title", "Review_Url", "Author", "Rating"]
  )

  movie_docs = loader.load()
  for doc in movie_docs:

    # Add the "Movie Title" (John Wick 1, 2, ...)
    doc.metadata["Movie_Title"] = f"John Wick {i}"

    # convert "Rating" to an `int`, if no rating is provided - assume 0 rating
    doc.metadata["Rating"] = int(doc.metadata["Rating"]) if doc.metadata["Rating"] else 0

    # newer movies have a more recent "last_accessed_at"
    doc.metadata["last_accessed_at"] = datetime.now() - timedelta(days=4-i)

  documents.extend(movie_docs)

Let's look at an example document to see if everything worked as expected!

In [7]:
documents[0]

Document(metadata={'source': 'john_wick_1.csv', 'row': 0, 'Review_Date': '6 May 2015', 'Review_Title': ' Kinetic, concise, and stylish; John Wick kicks ass.\n', 'Review_Url': '/review/rw3233896/?ref_=tt_urv', 'Author': 'lnvicta', 'Rating': 8, 'Movie_Title': 'John Wick 1', 'last_accessed_at': datetime.datetime(2025, 2, 26, 11, 57, 7, 206941)}, page_content=": 0\nReview: The best way I can describe John Wick is to picture Taken but instead of Liam Neeson it's Keanu Reeves and instead of his daughter it's his dog. That's essentially the plot of the movie. John Wick (Reeves) is out to seek revenge on the people who took something he loved from him. It's a beautifully simple premise for an action movie - when action movies get convoluted, they get bad i.e. A Good Day to Die Hard. John Wick gives the viewers what they want: Awesome action, stylish stunts, kinetic chaos, and a relatable hero to tie it all together. John Wick succeeds in its simplicity.")

## Task 3: Setting Up Qdrant!

Now that we have our documents, let's create a Qdrant VectorStore with the collection name "JohnWick".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [8]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    documents,
    embeddings,
    location=":memory:",
    collection_name="JohnWick"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply treat each review as a document, and use cosine similarity between the query embedding and each document embedding to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later
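
To make the idea concrete, here is a minimal pure-Python sketch of cosine-similarity top-`k` retrieval. The three-dimensional vectors and the `toy_` names are made up for illustration; in the notebook the vectors come from the embedding model and the search itself is handled by the vector store.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy embeddings (real embeddings have many more dimensions)
toy_vectors = {
    "review_a": [0.9, 0.1, 0.0],
    "review_b": [0.1, 0.9, 0.0],
    "review_c": [0.7, 0.3, 0.1],
}
toy_query = [1.0, 0.0, 0.0]

toy_top_two = top_k(toy_query, toy_vectors, k=2)
print([doc_id for doc_id, _ in toy_top_two])
```

With these toy vectors, `review_a` and `review_c` point in roughly the same direction as the query, so they are the two documents returned.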

In [9]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [10]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-3.5-turbo` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [11]:
from langchain_openai import ChatOpenAI

# Pin the model explicitly rather than relying on the library default
chat_model = ChatOpenAI(model="gpt-3.5-turbo")

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [12]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "context"  : the value of the "question" key is piped into naive_retriever, which returns the retrieved documents
    # "question" : the value of the "question" key is passed through unchanged
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # Re-assigns "context" to itself while passing every other key through unchanged
    # (effectively a no-op at this point in the chain)
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values format our prompt, which is then piped into the LLM
    # "context"  : the retrieved documents are carried through so they can be inspected alongside the response
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry-picked prompts that showcase the individual strengths of each retrieval strategy - you'd be correct!

In [13]:
naive_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"Based on the reviews provided, it seems that the general consensus is that people liked John Wick. The film received high ratings and praise for its action sequences, Keanu Reeves' performance, and overall entertainment value."

In [14]:
naive_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10 for the movie "John Wick 3". Here is the URL to that review: \'/review/rw4854296/?ref_=tt_urv\'.'

In [15]:
naive_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In John Wick, an ex-hitman comes out of retirement to seek vengeance on the gangsters that killed his dog and took everything from him. The movie is full of action, shootouts, and breathtaking fights as John Wick unleashes a maelstrom of destruction against those who come after him. It is a story of revenge and relentless vendetta.'

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model), which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.
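
As a rough illustration, here is a pure-Python sketch of one common BM25 scoring variant. The whitespace tokenizer, the `k1` and `b` defaults, and the `log(1 + ...)` form of the IDF term are assumptions chosen for readability; the `rank_bm25` package used by the retriever may differ in these details.

```python
import math
from collections import Counter

def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score every document in `corpus` against `query` with a BM25 variant."""
    tokenized = [doc.lower().split() for doc in corpus]  # naive tokenizer (assumption)
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs
    # Number of documents that contain each term
    doc_freq = Counter(term for doc in tokenized for term in set(doc))
    query_terms = query.lower().split()

    scores = []
    for doc in tokenized:
        term_freq = Counter(doc)
        # Longer-than-average documents are penalised through this term
        length_norm = k1 * (1 - b + b * len(doc) / avg_len)
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores

# Hypothetical toy corpus
toy_corpus = [
    "great action and stylish stunts",
    "boring plot with no depth",
    "the action scenes are great fun",
]
toy_scores = bm25_scores("great action", toy_corpus)
print(toy_scores)
```

The second document shares no words with the query and scores zero. The first and third both contain "great" and "action", but the first is shorter, so it scores slightly higher.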

This retriever is very straightforward to set up! Let's see it happen down below!


In [17]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(documents)

We'll construct the same chain - only changing the retriever.

In [18]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [19]:
bm25_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"People's opinions on John Wick vary. Some viewers enjoyed the action and found it to be a smooth, stylish film, while others thought it lacked substance and depth."

In [21]:
bm25_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

"I'm sorry, but there are no reviews with a rating of 10 in the context provided."

In [22]:
bm25_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'John Wick is an action movie with beautifully choreographed fight scenes and emotional depth. It stars Keanu Reeves and has been highly recommended for those who love action movies.'

It's not clear whether this is better or worse overall - but failing to find any review with a rating of 10 isn't great!

## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.
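
The two steps above can be sketched in plain Python. The `toy_overlap_score` function below is a deliberately crude, hypothetical stand-in for a real reranking model such as Cohere Rerank; it only exists so the retrieve-then-rerank flow can run without an API key.

```python
def compress_with_reranker(query, candidates, score_fn, top_n=3):
    """Re-score the retrieved candidates and keep only the top_n best."""
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_n]

def toy_overlap_score(query, doc):
    """Hypothetical relevance score: fraction of query words found in the doc."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words)

# Pretend these came back from a first-stage retriever with a large k
toy_candidates = [
    "keanu reeves is great",
    "the dog dies early",
    "reviews rate the film 10 out of 10",
    "great film with great reviews",
]

toy_compressed = compress_with_reranker(
    "great reviews film", toy_candidates, toy_overlap_score, top_n=2
)
print(toy_compressed)
```

The first stage casts a wide net, and the second stage spends more effort per document to decide which few are worth passing to the LLM.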

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [25]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-english-v3.0")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)

Let's create our chain again, and see how this does!

In [26]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [28]:
contextual_compression_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Yes, people generally liked John Wick. The reviews mention that it is a remarkable and surprising film, highly recommended for action buffs and those who like a good movie.'

In [27]:
contextual_compression_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10. Here is the URL to that review:\n\'/review/rw4854296/?ref_=tt_urv\' for the review titled "A Masterpiece & Brilliant Sequel" by author \'ymyuseda\'.'

In [29]:
contextual_compression_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"In John Wick, the character John Wick seeks revenge after his dog is killed and his house is blown up, leading to a series of events that involve him taking on various criminal organizations and professional killers. He ultimately faces off against Santino D'Antonio to avenge the death of his loved ones."

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context.
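
Here is a minimal sketch of those three steps. Both `toy_generate_queries` (standing in for the LLM that rewrites the question) and `toy_retrieve` (standing in for the base retriever) are hypothetical helpers with hard-coded behaviour, used only to show how the unique union of results is built.

```python
def multi_query_retrieve(question, generate_queries, retrieve):
    """Retrieve for every generated query and keep each document once."""
    seen_ids = set()
    unique_docs = []
    for query in generate_queries(question):
        for doc_id, text in retrieve(query):
            if doc_id not in seen_ids:
                seen_ids.add(doc_id)
                unique_docs.append((doc_id, text))
    return unique_docs

def toy_generate_queries(question):
    """Hypothetical stand-in for the LLM that rewrites the question."""
    return ["Was John Wick liked?", "Did viewers enjoy John Wick?"]

# Hypothetical keyword index standing in for a real retriever
toy_index = {
    "liked": [("r1", "Loved it"), ("r2", "Great action")],
    "enjoy": [("r2", "Great action"), ("r3", "Fun ride")],
}

def toy_retrieve(query):
    hits = []
    for keyword, docs in toy_index.items():
        if keyword in query.lower():
            hits.extend(docs)
    return hits

toy_docs = multi_query_retrieve(
    "Did people generally like John Wick?", toy_generate_queries, toy_retrieve
)
print([doc_id for doc_id, _ in toy_docs])
```

Review `r2` is returned by both rewritten queries but appears only once in the final context.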

So, how hard is it to set up? Not bad! Let's see it down below!



In [30]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)

In [31]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [32]:
multi_query_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"Based on the reviews provided, it seems that people generally liked the John Wick movies. The reviews praise the action sequences, Keanu Reeves' performance, and the overall entertainment value of the movies. The positive reviews indicate that many viewers enjoyed the films."

In [36]:
multi_query_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'No reviews have a rating of 10 in the provided context.'

In [34]:
multi_query_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In the movie John Wick, Keanu Reeves plays the character of John Wick, a retired assassin who comes out of retirement to seek revenge after his dog is killed and his car is stolen. He embarks on a mission that involves a lot of carnage and involves traveling to Italy, Canada, and Manhattan to take down numerous assassins.'

## Task 8: Parent Document Retriever

A "small-to-big" strategy - the Parent Document Retriever works based on a simple strategy:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent documents".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
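
The "search small, return big" idea can be sketched without any libraries. In this toy version, fixed-width character slicing stands in for the text splitter and a substring check stands in for similarity search; both are simplifying assumptions, and all names are hypothetical.

```python
def split_into_chunks(text, chunk_size):
    """Naive fixed-width splitter (stand-in for a real text splitter)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def build_child_index(parents, chunk_size):
    """Map every child chunk back to the id of its parent document."""
    child_index = []
    for parent_id, text in parents.items():
        for chunk in split_into_chunks(text, chunk_size):
            child_index.append((chunk, parent_id))
    return child_index

def retrieve_parents(query, child_index, parents, match_fn):
    """Search over child chunks, but return the full parent documents."""
    parent_ids = []
    for chunk, parent_id in child_index:
        if match_fn(query, chunk) and parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [parents[parent_id] for parent_id in parent_ids]

# Hypothetical parent documents
toy_parents = {
    "p1": "John Wick avenges his dog. The action is relentless.",
    "p2": "The sequel takes him to Rome. The plot is thin.",
}
toy_child_index = build_child_index(toy_parents, chunk_size=30)

# Substring match stands in for embedding similarity (assumption)
toy_results = retrieve_parents(
    "dog", toy_child_index, toy_parents, lambda q, chunk: q.lower() in chunk.lower()
)
print(toy_results)
```

Only the first 30-character chunk of `p1` matches the query, yet the caller receives the whole review, including the sentence about the action.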

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [37]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = documents
child_splitter = RecursiveCharacterTextSplitter(chunk_size=200)

We'll need to set up a new Qdrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension (1536 for `text-embedding-3-small`); you'll need to change this if you're using a different embedding model.

In [38]:
client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = Qdrant(
    collection_name="full_documents", embeddings=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)

Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [39]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [40]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [41]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [42]:
parent_document_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Based on the reviews provided, opinions on John Wick appear to be divided. Some people really enjoy the series and find it consistent and well-received, while others find it boring and nonsensical.'

In [43]:
parent_document_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

"Yes, there is a review with a rating of 10. Here is the URL to that review: '/review/rw4854296/?ref_=tt_urv'"

In [44]:
parent_document_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In the John Wick movies, John Wick is a retired assassin who comes out of retirement after someone kills his dog and steals his car. He then goes on a mission of revenge and faces off against many assassins along the way. In the second movie, he is forced back into the world of assassins when an Italian baddie calls in a favor. The movies are known for their intense action sequences and high body count.'

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to answer the performance question more rigorously.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever takes two or more retrievers and combines their retrieved documents using a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.
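
For intuition, here is a small sketch of weighted Reciprocal Rank Fusion. Each document earns `weight / (rank + c)` from every ranked list it appears in, and the sums decide the final order. The constant `c=60` follows the value commonly used with RRF and is an assumption here, as are the toy rankings.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse several ranked lists of doc ids into one ranking."""
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings from two different retrievers
toy_sparse_ranking = ["r3", "r1", "r2"]
toy_dense_ranking = ["r1", "r2", "r4"]

toy_fused = weighted_rrf(
    [toy_sparse_ranking, toy_dense_ranking], weights=[0.5, 0.5]
)
print(toy_fused)
```

Documents that rank reasonably well in both lists (`r1`, `r2`) end up ahead of a document that tops only one list (`r3`), which is the behaviour that makes rank fusion useful for combining sparse and dense retrievers.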

In [45]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

We'll pack *all* of these retrievers together in an ensemble.

In [46]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [48]:
ensemble_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"Based on the reviews provided, it seems that people generally liked John Wick. The movie received positive feedback for its action sequences, Keanu Reeves' performance, and the overall entertainment value it provided to the viewers."

In [49]:
ensemble_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

"Yes, there is a review with a rating of 10. Here is the URL to that review:\n- '/review/rw4854296/?ref_=tt_urv'"

In [50]:
ensemble_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In "John Wick," an ex-hit-man comes out of retirement to seek revenge on the gangsters who killed his dog and took everything from him. With intense action, shootouts, and breathtaking fights, John Wick embarks on a mission of vengeance, facing off against various adversaries in a relentless quest for retribution.'

## Task 10: Semantic Chunking

While this is not a retrieval method - it *is* an effective way of increasing retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of a number of [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
  - `percentile`
  - `standard_deviation`
  - `interquartile`
  - `gradient`
3. Each sequence of related sentences is kept as a document!

Let's see how to implement this!

> NOTE: You do not need to run this cell if you're running this locally

In [51]:
!pip install -qU langchain_experimental

We'll use the `percentile` thresholding method for this example, which will:

Calculate the distances between all consecutive sentences, and then break the text apart wherever a distance exceeds a given percentile of all distances.
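
A simplified sketch of percentile-based breakpoints is shown below. It assumes one toy embedding per sentence and compares each sentence only with its immediate neighbour; the real `SemanticChunker` involves additional details, and the vectors, the `pct=75` setting, and the helper names here are all hypothetical.

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)

def percentile(values, pct):
    """Percentile with linear interpolation between the closest ranks."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * pct / 100
    lower, upper = math.floor(position), math.ceil(position)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)

def semantic_chunks(sentences, vectors, pct):
    """Start a new chunk wherever the neighbour distance exceeds the percentile."""
    if not sentences:
        return []
    distances = [
        cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)
    ]
    if not distances:
        return [sentences[0]]
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks

toy_sentences = [
    "The action is superb.",
    "The stunts are stylish.",
    "I disliked the plot.",
    "The story felt thin.",
]
# Hypothetical 2-d embeddings: first two sentences are similar, last two are similar
toy_sentence_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]

toy_chunks = semantic_chunks(toy_sentences, toy_sentence_vectors, pct=75)
print(toy_chunks)
```

The only large jump in distance is between the second and third sentences, so the text is split into an "action" chunk and a "plot" chunk.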

In [52]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)

Now we can split our documents.

In [53]:
semantic_documents = semantic_chunker.split_documents(documents)

Let's create a new vector store.

In [54]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="JohnWickSemantic"
)

We'll use naive retrieval for this example.

In [55]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [56]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [57]:
semantic_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Yes, people generally liked John Wick based on the reviews provided.'

In [58]:
semantic_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

"Yes, there is at least one review with a rating of 10. Here is the URL to that review: '/review/rw4860412/?ref_=tt_urv'."

In [59]:
semantic_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In "John Wick," the main character seeks revenge on the people who took something he loved from him, which led to a series of chaotic and action-packed events.'

# 🤝 Breakout Room Part #2

#### 🏗️ Activity #1

Your task is to evaluate the various Retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
 - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever specific* Ragas metrics
 - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.

Your analysis should factor in:
  - Cost
  - Latency
  - Performance
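
If you want a quick local latency number to sit alongside your other measurements, a small timing helper like the one below may be enough. This is only a sketch: `measure_latency` and `toy_retrieve_fn` are hypothetical names, and the toy function simply sleeps to imitate a retriever call.

```python
import statistics
import time

def measure_latency(retrieve, questions, repeats=3):
    """Time repeated calls to `retrieve` and summarise the results in seconds."""
    timings = []
    for question in questions:
        for _ in range(repeats):
            start = time.perf_counter()
            retrieve(question)
            timings.append(time.perf_counter() - start)
    return {
        "calls": len(timings),
        "mean_s": statistics.mean(timings),
        "max_s": max(timings),
    }

def toy_retrieve_fn(question):
    """Hypothetical stand-in for a retriever call."""
    time.sleep(0.001)
    return [f"document about: {question}"]

toy_latency = measure_latency(
    toy_retrieve_fn,
    ["Did people generally like John Wick?", "What happened in John Wick?"],
)
print(toy_latency)
```

In the notebook, you could pass a retriever's `invoke` method (for example `naive_retriever.invoke`) in place of `toy_retrieve_fn` and compare the summaries across retrievers.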

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

##### HINTS:

- LangSmith provides detailed information about latency and cost.

# Create Golden Dataset

In [8]:
import os
import getpass

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("LangChain API Key:")

LangChain API Key:··········


In [9]:
!pip install -qU ragas==0.2.10

In [10]:
!pip install rapidfuzz

Collecting rapidfuzz
  Downloading rapidfuzz-3.12.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Downloading rapidfuzz-3.12.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
Installing collected packages: rapidfuzz
Successfully installed rapidfuzz-3.12.2


In [5]:
### Load Data
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw1.csv -O john_wick_1.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw2.csv -O john_wick_2.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw3.csv -O john_wick_3.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw4.csv -O john_wick_4.csv


--2025-03-01 20:19:09--  https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw1.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19628 (19K) [text/plain]
Saving to: ‘john_wick_1.csv’


2025-03-01 20:19:10 (19.3 MB/s) - ‘john_wick_1.csv’ saved [19628/19628]

--2025-03-01 20:19:10--  https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw2.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14747 (14K) [text/plain]
Saving to: ‘john_wick_2.csv’


2025-03-01 20:19:10 (20.7 MB/s) - ‘john_wick_2.csv’

In [6]:
### Prepare Data
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

documents = []

for i in range(1, 5):
  loader = CSVLoader(
      file_path=f"john_wick_{i}.csv",
      metadata_columns=["Review_Date", "Review_Title", "Review_Url", "Author", "Rating"]
  )

  movie_docs = loader.load()
  for doc in movie_docs:

    # Add the "Movie Title" (John Wick 1, 2, ...)
    doc.metadata["Movie_Title"] = f"John Wick {i}"

    # convert "Rating" to an `int`, if no rating is provided - assume 0 rating
    doc.metadata["Rating"] = int(doc.metadata["Rating"]) if doc.metadata["Rating"] else 0

    # newer movies have a more recent "last_accessed_at"
    doc.metadata["last_accessed_at"] = datetime.now() - timedelta(days=4-i)

  documents.extend(movie_docs)

In [12]:
### Generate Dataset
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

from ragas.testset import TestsetGenerator

generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)
dataset = generator.generate_with_langchain_docs(documents, testset_size=10)

Applying SummaryExtractor:   0%|          | 0/44 [00:00<?, ?it/s]

Applying CustomNodeFilter:   0%|          | 0/100 [00:00<?, ?it/s]



Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]:   0%|          | 0/244 [00:00<?, ?it/s]

Applying OverlapScoreBuilder:   0%|          | 0/1 [00:00<?, ?it/s]

Generating personas:   0%|          | 0/3 [00:00<?, ?it/s]

Generating Scenarios:   0%|          | 0/2 [00:00<?, ?it/s]

Generating Samples:   0%|          | 0/10 [00:00<?, ?it/s]

In [14]:
dataset.to_pandas().to_csv('golden_dataset.csv')

In [19]:
import pandas as pd
# load data
dataset_df = pd.read_csv('golden_dataset.csv').drop(columns=['Unnamed: 0'])
dataset_df

Unnamed: 0,user_input,reference_contexts,reference,synthesizer_name
0,How does A Good Day to Die Hard compare to Joh...,"["": 0\nReview: The best way I can describe Joh...",A Good Day to Die Hard is described as convolu...,single_hop_specifc_query_synthesizer
1,Why is John Wick so popular and what makes it ...,[': 2\nReview: With the fourth installment sco...,John Wick is popular because it has four insta...,single_hop_specifc_query_synthesizer
2,What makes Keanu Reeves' performance in John W...,[': 3\nReview: John wick has a very simple rev...,Keanu Reeves' performance in John Wick is spec...,single_hop_specifc_query_synthesizer
3,What happen to John Wick in the movie and how ...,[': 4\nReview: Though he no longer has a taste...,"In the movie, John Wick, a retired assassin kn...",single_hop_specifc_query_synthesizer
4,What happens to the hit-man in the movie John ...,"["": 5\nReview: Ultra-violent first entry with ...","In John Wick, an ex-hit-man comes out of retir...",single_hop_specifc_query_synthesizer
5,What are the main themes and narrative consequ...,"[""<1-hop>\n\n: 24\nReview: John Wick: Chapter ...",John Wick: Chapter 3 - Parabellum explores the...,multi_hop_specific_query_synthesizer
6,How does the action and pacing in John Wick 2 ...,"[""<1-hop>\n\n: 10\nReview: The first John Wick...",John Wick 2 does not have the ability to surpr...,multi_hop_specific_query_synthesizer
7,What are the contrasting perspectives on the p...,"['<1-hop>\n\n: 11\nReview: The overrated ""John...",The reviews present contrasting perspectives o...,multi_hop_specific_query_synthesizer
8,How does The Marquis's role in John Wick relat...,"['<1-hop>\n\n: 1\nReview: The Table, the inter...",The Marquis's role in John Wick is significant...,multi_hop_specific_query_synthesizer
9,How does Chapter 4 improve upon Chapter 3 in t...,"[""<1-hop>\n\n: 19\nReview: John Wick: Chapter ...",Chapter 4 improves upon Chapter 3 by maintaini...,multi_hop_specific_query_synthesizer


# Create the Dataset on LangSmith

In [16]:
from langsmith import Client

client = Client()

dataset_name = "John Wick"

langsmith_dataset = client.create_dataset(
    dataset_name=dataset_name,
    description="John Wick "
)

In [18]:
for data_row in dataset.to_pandas().iterrows():
  client.create_example(
      inputs={
          "question": data_row[1]["user_input"]
      },
      outputs={
          "answer": data_row[1]["reference"]
      },
      metadata={
          "context": data_row[1]["reference_contexts"]
      },
      dataset_id=langsmith_dataset.id
  )

# Create Result DataFrame

In [116]:
import pandas as pd
result_df = pd.DataFrame(columns=['Technique','Latency', 'Cost', 'Context Precision', 'Context Recall'])
result_df

|   | Technique | Latency | Cost | Context Precision | Context Recall |
|---|---|---|---|---|---|


# Helper Functions for Evaluation




In [152]:
import ast  # needed by extract_langsmith_results to parse list-valued CSV cells
import json
import os

import numpy as np
import pandas as pd
from langchain.smith import RunEvalConfig
from langchain_openai import ChatOpenAI
from langsmith import Client
from ragas import EvaluationDataset, RunConfig, evaluate
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import LLMContextRecall, LLMContextPrecisionWithReference

def extract_langsmith_results(experiment_run,dataset_df):
  data = []

  # Iterate through the results and extract relevant values
  for result_key in experiment_run['results']:
      output = experiment_run['results'][result_key]['output']

      data.append({
          'user_input': experiment_run['results'][result_key]['input']['question'],
          'response': output['response'].content,
          'retrieved_contexts': [context.page_content for context in output['context']],  # Store context as a list
          'latency': experiment_run['results'][result_key]['execution_time'],
          'total_tokens': output['response'].response_metadata['token_usage']['total_tokens']
      })

  # Convert the list of dictionaries into a DataFrame
  langsmith_results_df = pd.DataFrame(data)

  # Join each run with its golden-dataset row on the question text
  merged_df = langsmith_results_df.merge(dataset_df, on='user_input', how='inner')

  # .copy() makes ragas_df an independent frame, so the assignment below
  # does not raise pandas' SettingWithCopyWarning
  ragas_df = merged_df[['user_input', 'retrieved_contexts', 'reference_contexts', 'response', 'reference', 'synthesizer_name']].copy()

  # reference_contexts was read back from CSV, so each cell is the string form of a list
  ragas_df['reference_contexts'] = ragas_df['reference_contexts'].apply(ast.literal_eval)

  return langsmith_results_df, ragas_df

def run_ragas(ragas_df):
  evaluation_dataset = EvaluationDataset.from_pandas(ragas_df)

  evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))

  custom_run_config = RunConfig(timeout=360)

  ragas_result = evaluate(
      dataset=evaluation_dataset,
      metrics=[LLMContextRecall(), LLMContextPrecisionWithReference()],
      llm=evaluator_llm,
      run_config=custom_run_config
  )
  return ragas_result


def update_result(technique, langsmith_result, ragas_result, result_df):

  new_row = {
            'Technique': technique ,
            'Latency': langsmith_result['latency'].mean(),
            'Cost': langsmith_result['total_tokens'].mean(),
            'Context Precision': np.mean(ragas_result['llm_context_precision_with_reference']),
            'Context Recall': np.mean(ragas_result['context_recall'])
        }

  result_df = pd.concat([result_df, pd.DataFrame([new_row])], ignore_index=True)
  return result_df
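
`extract_langsmith_results` calls `ast.literal_eval` because `golden_dataset.csv` stores each `reference_contexts` list as its string representation. The sketch below reproduces that round trip on two made-up rows, using the standard `csv` module as a stand-in for pandas' `to_csv`/`read_csv` (the rows and the in-memory buffer are illustrative assumptions, not the notebook's data):

```python
import ast
import csv
import io

# Hypothetical miniature of the golden dataset: one list-valued column.
rows = [
    {"user_input": "q1", "reference_contexts": ["ctx a", "ctx b"]},
    {"user_input": "q2", "reference_contexts": ["ctx c"]},
]

# Write to CSV: non-string values are stored as their str() form.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["user_input", "reference_contexts"])
writer.writeheader()
writer.writerows(rows)

# Read it back: every cell is now a plain string.
buffer.seek(0)
reloaded = list(csv.DictReader(buffer))
raw_cell = reloaded[0]["reference_contexts"]

# literal_eval safely turns the string back into a Python list.
parsed_cell = ast.literal_eval(raw_cell)

print(type(raw_cell).__name__, "->", type(parsed_cell).__name__)
```

Ragas expects `reference_contexts` to be a list of strings per sample, which is why the parsing step has to happen before `EvaluationDataset.from_pandas`.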







# Initialize LangSmith Client

In [153]:
# Initialize the client
client = Client()

# Define your dataset name
dataset_name = "John Wick"

# Create a run configuration
run_config = RunEvalConfig(
    evaluators=[],  # No evaluators for now, just running the chain
    custom_metrics={},  # No custom metrics for this run
)

# Setting up QDrant

In [22]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    documents,
    embeddings,
    location=":memory:",
    collection_name="JohnWick"
)

# Evaluate Naive Retrieval



### Setting Up the Naive Chain

In [23]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})

from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI()

from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and piping it into naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # "context"  : re-assigned from the previous step's "context" value; all other keys
    #              (here, "question") pass through to the next step unchanged
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object, which is
    #              then piped into the LLM; the LLM's message is stored under the "response" key
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)
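
The comments above describe how data moves through the three piped steps. As a rough plain-Python equivalent (no LangChain involved; `fake_retriever`, `fake_model`, and `TEMPLATE` are hypothetical stand-ins for `naive_retriever`, `chat_model`, and `rag_prompt`), the same flow looks like this:

```python
def fake_retriever(question):
    # Hypothetical stand-in for naive_retriever: returns a list of "documents".
    return [f"doc about {question}"]


def fake_model(prompt):
    # Hypothetical stand-in for chat_model: reports how much prompt it saw.
    return f"answer based on {len(prompt)} prompt characters"


TEMPLATE = "Query:\n{question}\n\nContext:\n{context}\n"


def run_chain(inputs):
    # Step 1: build {"context", "question"} from the incoming "question" key.
    state = {
        "context": fake_retriever(inputs["question"]),
        "question": inputs["question"],
    }
    # Step 2: assign "context" from itself; every other key passes through.
    state = {**state, "context": state["context"]}
    # Step 3: format the prompt, call the model, and carry "context" forward.
    prompt = TEMPLATE.format(**state)
    return {"response": fake_model(prompt), "context": state["context"]}


result = run_chain({"question": "Is John Wick good?"})
print(result)
```

Returning both `response` and `context` is what lets the evaluation helpers later read the retrieved documents alongside the generated answer.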


In [24]:
# Define a function that creates your chain
def create_chain():
    return naive_retrieval_chain

session_name = "John Wick Full Dataset Run"

### Run Dataset on LangSmith

In [None]:
experiment_run = client.run_on_dataset(
        dataset_name=dataset_name,
        llm_or_chain_factory=create_chain,
        evaluation=run_config,
        project_name=session_name,
        metadata={
            "description": "Running full John Wick dataset to collect metrics, responses, and contexts"
        }
    )

### Evaluate with RAGAS

In [93]:
langsmith_results, ragas_df = extract_langsmith_results(experiment_run,dataset_df)



In [81]:
langsmith_results

Unnamed: 0,user_input,response,retrieved_contexts,latency,total_tokens
0,How does Chapter 4 improve upon Chapter 3 in t...,I don't know how Chapter 4 improves upon Chapt...,"[: 7\nReview: About mid-way through the film, ...",1.489089,2997
1,How does The Marquis's role in John Wick relat...,The action-packed style of The Matrix is relat...,[: 20\nReview: John Wick is something special....,2.674525,3791
2,What are the contrasting perspectives on the p...,The reviews on 'John Wick: Chapter 3 - Parabel...,[: 0\nReview: It is 5 years since the first Jo...,2.532513,3569
3,How does the action and pacing in John Wick 2 ...,I don't have specific information on how the a...,"[: 9\nReview: ""John Wick: Chapter 2"" is an Ame...",1.631563,3492
4,What are the main themes and narrative consequ...,The main themes explored in John Wick: Chapter...,[: 24\nReview: John Wick: Chapter 3 - Parabell...,2.28261,3572
5,What happens to the hit-man in the movie John ...,"In the movie John Wick, the hit-man faces many...","[: 18\nReview: When the story begins, John (Ke...",1.48414,3389
6,What happen to John Wick in the movie and how ...,"In the movie ""John Wick,"" the titular characte...",[: 0\nReview: The best way I can describe John...,5.256903,4036
7,What makes Keanu Reeves' performance in John W...,Keanu Reeves' performance in John Wick is spec...,"[: 9\nReview: At first glance, John Wick sound...",1.311281,3631
8,Why is John Wick so popular and what makes it ...,John Wick is popular because of the slickness ...,"[: 9\nReview: At first glance, John Wick sound...",2.180597,3929
9,How does A Good Day to Die Hard compare to Joh...,"In terms of narrative coherence, John Wick is ...",[: 0\nReview: The best way I can describe John...,1.883184,3863


In [94]:
ragas_df

Unnamed: 0,user_input,retrieved_contexts,reference_contexts,response,reference,synthesizer_name
0,How does Chapter 4 improve upon Chapter 3 in t...,"[: 7\nReview: About mid-way through the film, ...",[<1-hop>\n\n: 19\nReview: John Wick: Chapter 4...,I don't know how Chapter 4 improves upon Chapt...,Chapter 4 improves upon Chapter 3 by maintaini...,multi_hop_specific_query_synthesizer
1,How does The Marquis's role in John Wick relat...,[: 20\nReview: John Wick is something special....,"[<1-hop>\n\n: 1\nReview: The Table, the intern...",The action-packed style of The Matrix is relat...,The Marquis's role in John Wick is significant...,multi_hop_specific_query_synthesizer
2,What are the contrasting perspectives on the p...,[: 0\nReview: It is 5 years since the first Jo...,"[<1-hop>\n\n: 11\nReview: The overrated ""John ...",The reviews on 'John Wick: Chapter 3 - Parabel...,The reviews present contrasting perspectives o...,multi_hop_specific_query_synthesizer
3,How does the action and pacing in John Wick 2 ...,"[: 9\nReview: ""John Wick: Chapter 2"" is an Ame...",[<1-hop>\n\n: 10\nReview: The first John Wick ...,I don't have specific information on how the a...,John Wick 2 does not have the ability to surpr...,multi_hop_specific_query_synthesizer
4,What are the main themes and narrative consequ...,[: 24\nReview: John Wick: Chapter 3 - Parabell...,[<1-hop>\n\n: 24\nReview: John Wick: Chapter 3...,The main themes explored in John Wick: Chapter...,John Wick: Chapter 3 - Parabellum explores the...,multi_hop_specific_query_synthesizer
5,What happens to the hit-man in the movie John ...,"[: 18\nReview: When the story begins, John (Ke...",[: 5\nReview: Ultra-violent first entry with l...,"In the movie John Wick, the hit-man faces many...","In John Wick, an ex-hit-man comes out of retir...",single_hop_specifc_query_synthesizer
6,What happen to John Wick in the movie and how ...,[: 0\nReview: The best way I can describe John...,[: 4\nReview: Though he no longer has a taste ...,"In the movie ""John Wick,"" the titular characte...","In the movie, John Wick, a retired assassin kn...",single_hop_specifc_query_synthesizer
7,What makes Keanu Reeves' performance in John W...,"[: 9\nReview: At first glance, John Wick sound...",[: 3\nReview: John wick has a very simple reve...,Keanu Reeves' performance in John Wick is spec...,Keanu Reeves' performance in John Wick is spec...,single_hop_specifc_query_synthesizer
8,Why is John Wick so popular and what makes it ...,"[: 9\nReview: At first glance, John Wick sound...",[: 2\nReview: With the fourth installment scor...,John Wick is popular because of the slickness ...,John Wick is popular because it has four insta...,single_hop_specifc_query_synthesizer
9,How does A Good Day to Die Hard compare to Joh...,[: 0\nReview: The best way I can describe John...,[: 0\nReview: The best way I can describe John...,"In terms of narrative coherence, John Wick is ...",A Good Day to Die Hard is described as convolu...,single_hop_specifc_query_synthesizer


In [97]:
from ragas import EvaluationDataset, RunConfig, evaluate
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import LLMContextRecall, LLMContextPrecisionWithReference

evaluation_dataset = EvaluationDataset.from_pandas(ragas_df)

evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))

custom_run_config = RunConfig(timeout=360)

result__ragas_naive = evaluate(
    dataset=evaluation_dataset,
    metrics=[LLMContextRecall(), LLMContextPrecisionWithReference()],
    llm=evaluator_llm,
    run_config=custom_run_config
)
result__ragas_naive

Evaluating:   0%|          | 0/20 [00:00<?, ?it/s]

{'context_recall': 0.7150, 'llm_context_precision_with_reference': 0.8300}

### Update Results DF

In [107]:
result__ragas_naive

{'context_recall': 0.7150, 'llm_context_precision_with_reference': 0.8300}

In [99]:
langsmith_results

Unnamed: 0,user_input,response,retrieved_contexts,latency,total_tokens
0,How does Chapter 4 improve upon Chapter 3 in t...,I don't know how Chapter 4 improves upon Chapt...,"[: 7\nReview: About mid-way through the film, ...",1.489089,2997
1,How does The Marquis's role in John Wick relat...,The action-packed style of The Matrix is relat...,[: 20\nReview: John Wick is something special....,2.674525,3791
2,What are the contrasting perspectives on the p...,The reviews on 'John Wick: Chapter 3 - Parabel...,[: 0\nReview: It is 5 years since the first Jo...,2.532513,3569
3,How does the action and pacing in John Wick 2 ...,I don't have specific information on how the a...,"[: 9\nReview: ""John Wick: Chapter 2"" is an Ame...",1.631563,3492
4,What are the main themes and narrative consequ...,The main themes explored in John Wick: Chapter...,[: 24\nReview: John Wick: Chapter 3 - Parabell...,2.28261,3572
5,What happens to the hit-man in the movie John ...,"In the movie John Wick, the hit-man faces many...","[: 18\nReview: When the story begins, John (Ke...",1.48414,3389
6,What happen to John Wick in the movie and how ...,"In the movie ""John Wick,"" the titular characte...",[: 0\nReview: The best way I can describe John...,5.256903,4036
7,What makes Keanu Reeves' performance in John W...,Keanu Reeves' performance in John Wick is spec...,"[: 9\nReview: At first glance, John Wick sound...",1.311281,3631
8,Why is John Wick so popular and what makes it ...,John Wick is popular because of the slickness ...,"[: 9\nReview: At first glance, John Wick sound...",2.180597,3929
9,How does A Good Day to Die Hard compare to Joh...,"In terms of narrative coherence, John Wick is ...",[: 0\nReview: The best way I can describe John...,1.883184,3863


In [118]:
import numpy as np
technique = 'Naive'

result_df = update_result(technique, langsmith_results, result__ragas_naive, result_df)
result_df


|   | Technique | Latency | Cost | Context Precision | Context Recall |
|---|---|---|---|---|---|
| 0 | Naive | 2.272641 | 3626.9 | 0.829996 | 0.715 |


In [119]:
result_df.to_csv('result_df.csv')

# Best-Matching 25 (BM25) Retriever

### Define chain

In [120]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(documents)

bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)
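
BM25 ranks documents by exact term overlap rather than embedding similarity: each query term contributes its inverse document frequency, scaled by a saturating, length-normalized term frequency. A minimal sketch of that scoring, assuming whitespace tokenization and the common defaults `k1=1.5` and `b=0.75` (the `rank_bm25` package behind `BM25Retriever` may tokenize and weight terms differently; `bm25_scores` and the three-line corpus are made up for illustration):

```python
import math
from collections import Counter


def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score every document in `corpus` against `query` with Okapi-style BM25."""
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for doc in tokenized for term in set(doc))

    scores = []
    for doc in tokenized:
        term_freq = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a larger IDF; the +1 keeps it positive.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            tf = term_freq[term]
            # Longer-than-average documents are penalized through b.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


corpus = [
    "john wick is a stylish action movie",
    "the dog dies early in the movie",
    "keanu reeves plays a retired assassin",
]
scores = bm25_scores("retired assassin", corpus)
best = max(range(len(corpus)), key=scores.__getitem__)
print(corpus[best])
```

Because the match is purely lexical, a question phrased with different words than the reviews use can score zero everywhere, which is one plausible reason for lower recall on paraphrased questions.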

In [121]:
# Define a function that creates your chain
def create_chain():
    return bm25_retrieval_chain

session_name = "John Wick Full Dataset Run bm25"

### Run Dataset on LangSmith

In [122]:
experiment_run = client.run_on_dataset(
        dataset_name=dataset_name,
        llm_or_chain_factory=create_chain,
        evaluation=run_config,
        project_name=session_name,
        metadata={
            "description": "Running full John Wick dataset to collect metrics, responses, and contexts bm25"
        }
    )


View the evaluation results for project 'John Wick Full Dataset Run bm25' at:
https://smith.langchain.com/o/73a1f3ba-0d4a-4422-a192-2262a2ce081d/datasets/06e69cb1-b22f-4738-ae6b-9ea8dc93a8c1/compare?selectedSessions=8354225b-f49f-4f15-a831-12f1b48db2fc

View all tests for Dataset John Wick at:
https://smith.langchain.com/o/73a1f3ba-0d4a-4422-a192-2262a2ce081d/datasets/06e69cb1-b22f-4738-ae6b-9ea8dc93a8c1
[------------------------------------------------->] 10/10

In [124]:
langsmith_results, ragas_df = extract_langsmith_results(experiment_run,dataset_df)



### Run RAGAS Evaluation

In [125]:
result__ragas = run_ragas(ragas_df)
result__ragas

Evaluating:   0%|          | 0/20 [00:00<?, ?it/s]

{'context_recall': 0.5900, 'llm_context_precision_with_reference': 0.7056}

### Update Results

In [133]:
technique = 'Best-Matching 25'

result_df = update_result(technique, langsmith_results, result__ragas, result_df)
result_df

|   | Technique | Latency | Cost | Context Precision | Context Recall |
|---|---|---|---|---|---|
| 0 | Naive | 2.272641 | 3626.9 | 0.829996 | 0.715 |
| 1 | Best-Matching 25 | 1.303881 | 1334.9 | 0.705556 | 0.59 |


In [134]:
result_df.to_csv('result_df.csv')

# Contextual Compression (Using Reranking)

### Define chain

In [135]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-english-v3.0")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)

contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)
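
`ContextualCompressionRetriever` first pulls candidates from the base retriever (here `naive_retriever` with `k=10`) and then hands them to the compressor, which re-scores each candidate against the query and keeps only the best few. The sketch below shows that two-stage shape with a toy word-overlap scorer; `overlap_score`, `rerank`, and the candidate list are hypothetical stand-ins, and the real `CohereRerank` model scores relevance very differently:

```python
def overlap_score(query, document):
    # Hypothetical relevance score: share of query words found in the document.
    query_words = set(query.lower().split())
    doc_words = set(document.lower().split())
    return len(query_words & doc_words) / len(query_words)


def rerank(query, candidates, top_n=3):
    # Re-order the first-stage candidates by relevance and keep the top_n.
    ranked = sorted(candidates, key=lambda doc: overlap_score(query, doc), reverse=True)
    return ranked[:top_n]


# Pretend these came back from the first-stage retriever, in its order.
candidates = [
    "the soundtrack is loud",
    "john wick avenges his dog",
    "a review of a different film",
    "john wick fights in a club",
]
kept = rerank("why does john wick fight", candidates, top_n=2)
print(kept)
```

Passing fewer, better-ranked documents to the prompt trades some recall for precision and a shorter context.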

In [136]:
# Define a function that creates your chain
def create_chain():
    return contextual_compression_retrieval_chain

session_name = "John Wick Full Dataset Run Reranker"

### Run Dataset on LangSmith

In [137]:
experiment_run = client.run_on_dataset(
        dataset_name=dataset_name,
        llm_or_chain_factory=create_chain,
        evaluation=run_config,
        project_name=session_name,
        metadata={
            "description": "Running full John Wick dataset to collect metrics, responses, and contexts reranker"
        }
    )


View the evaluation results for project 'John Wick Full Dataset Run Reranker' at:
https://smith.langchain.com/o/73a1f3ba-0d4a-4422-a192-2262a2ce081d/datasets/06e69cb1-b22f-4738-ae6b-9ea8dc93a8c1/compare?selectedSessions=2224eef5-75c0-427b-90e3-6bb479782220

View all tests for Dataset John Wick at:
https://smith.langchain.com/o/73a1f3ba-0d4a-4422-a192-2262a2ce081d/datasets/06e69cb1-b22f-4738-ae6b-9ea8dc93a8c1
[------------------------------------------------->] 10/10

In [138]:
langsmith_results, ragas_df = extract_langsmith_results(experiment_run,dataset_df)



### Run RAGAS Evaluation

In [139]:
result__ragas = run_ragas(ragas_df)
result__ragas

Evaluating:   0%|          | 0/20 [00:00<?, ?it/s]

{'context_recall': 0.6067, 'llm_context_precision_with_reference': 0.9833}

### Update Results

In [140]:
technique = 'Compression- Reranker'

result_df = update_result(technique, langsmith_results, result__ragas, result_df)
result_df

|   | Technique | Latency | Cost | Context Precision | Context Recall |
|---|---|---|---|---|---|
| 0 | Naive | 2.272641 | 3626.9 | 0.829996 | 0.715 |
| 1 | Best-Matching 25 | 1.303881 | 1334.9 | 0.705556 | 0.59 |
| 2 | Compression- Reranker | 1.920618 | 1312.2 | 0.983333 | 0.606667 |


In [141]:
result_df.to_csv('result_df.csv')

# Multi-Query Retriever

### Define chain

In [142]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)

multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)
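
`MultiQueryRetriever` asks the LLM for several rephrasings of the question, runs the base retriever once per rephrasing, and returns the de-duplicated union of the hits. A minimal sketch of that merge step, with a hard-coded `rewrite_query` standing in for the LLM and a word-overlap `keyword_retriever` standing in for `naive_retriever` (both hypothetical):

```python
def rewrite_query(question):
    # Hypothetical stand-in for the LLM that proposes alternative phrasings.
    return [question, f"reviews discussing {question}", f"opinions on {question}"]


def keyword_retriever(query, corpus, k=2):
    # Hypothetical base retriever: rank documents by number of shared words.
    words = set(query.lower().split())
    ranked = sorted(corpus, key=lambda doc: len(words & set(doc.lower().split())), reverse=True)
    return ranked[:k]


def multi_query_retrieve(question, corpus):
    # Union of the hits for every phrasing, keeping first-seen order.
    seen, merged = set(), []
    for variant in rewrite_query(question):
        for doc in keyword_retriever(variant, corpus):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


corpus = [
    "action scenes are great",
    "reviews praise the action",
    "opinions differ on pacing",
    "the dog is cute",
]
merged = multi_query_retrieve("action", corpus)
print(merged)
```

Each phrasing can surface documents the original wording missed, so the union tends to be larger than a single retrieval; that also means more context tokens in the prompt.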

In [143]:
# Define a function that creates your chain
def create_chain():
    return multi_query_retrieval_chain

session_name = "John Wick Full Dataset Run Multi Query"

### Run Dataset on LangSmith

In [144]:
experiment_run = client.run_on_dataset(
        dataset_name=dataset_name,
        llm_or_chain_factory=create_chain,
        evaluation=run_config,
        project_name=session_name,
        metadata={
            "description": "Running full John Wick dataset to collect metrics, responses, and contexts - Multi Query"
        }
    )


View the evaluation results for project 'John Wick Full Dataset Run Multi Query' at:
https://smith.langchain.com/o/73a1f3ba-0d4a-4422-a192-2262a2ce081d/datasets/06e69cb1-b22f-4738-ae6b-9ea8dc93a8c1/compare?selectedSessions=c22ee315-026e-43d4-b7ec-920a53757eeb

View all tests for Dataset John Wick at:
https://smith.langchain.com/o/73a1f3ba-0d4a-4422-a192-2262a2ce081d/datasets/06e69cb1-b22f-4738-ae6b-9ea8dc93a8c1
[------------------------------------------------->] 10/10

In [145]:
langsmith_results, ragas_df = extract_langsmith_results(experiment_run,dataset_df)



### Run RAGAS Evaluation

In [146]:
result__ragas = run_ragas(ragas_df)
result__ragas

Evaluating:   0%|          | 0/20 [00:00<?, ?it/s]

{'context_recall': 0.8083, 'llm_context_precision_with_reference': 0.7747}

### Update Results

In [147]:
technique = 'MultiQuery'

result_df = update_result(technique, langsmith_results, result__ragas, result_df)
result_df

|   | Technique | Latency | Cost | Context Precision | Context Recall |
|---|---|---|---|---|---|
| 0 | Naive | 2.272641 | 3626.9 | 0.829996 | 0.715 |
| 1 | Best-Matching 25 | 1.303881 | 1334.9 | 0.705556 | 0.59 |
| 2 | Compression- Reranker | 1.920618 | 1312.2 | 0.983333 | 0.606667 |
| 3 | MultiQuery | 3.381989 | 4678.1 | 0.774707 | 0.808333 |


In [148]:
result_df.to_csv('result_df.csv')

# Parent Document Retriever

### Define Chain

In [154]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = documents
child_splitter = RecursiveCharacterTextSplitter(chunk_size=200)

client_qdrant = QdrantClient(location=":memory:")

client_qdrant.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = Qdrant(
    collection_name="full_documents", embeddings=OpenAIEmbeddings(model="text-embedding-3-small"), client=client_qdrant
)

store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)
parent_document_retriever.add_documents(parent_docs, ids=None)

parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)
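
`ParentDocumentRetriever` indexes small child chunks (here produced by a 200-character splitter) for matching, but returns the full parent document that each matching chunk came from. The sketch below shows that lookup with sentence-level chunks and a word-overlap match instead of embeddings; the `parents` dictionary, `split_into_chunks`, and `retrieve_parents` are hypothetical illustrations, not the library's internals:

```python
def split_into_chunks(text):
    # Hypothetical child splitter: one chunk per sentence.
    return [part for part in text.split(". ") if part]


# Parent store: id -> full document (plays the role of the docstore).
parents = {
    "review-1": "John Wick is a retired hitman. He returns after his dog is killed.",
    "review-2": "Chapter 2 expands the world. The Continental hotel has strict rules.",
}

# Child index: (chunk, parent id) pairs (plays the role of the vector store).
child_index = [
    (chunk, parent_id)
    for parent_id, text in parents.items()
    for chunk in split_into_chunks(text)
]


def retrieve_parents(query, k=1):
    words = set(query.lower().split())
    # Match against the small chunks...
    ranked = sorted(
        child_index,
        key=lambda item: len(words & set(item[0].lower().split())),
        reverse=True,
    )
    # ...but return each matching chunk's whole parent, without duplicates.
    parent_ids = []
    for _, parent_id in ranked[:k]:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [parents[parent_id] for parent_id in parent_ids]


docs = retrieve_parents("continental hotel rules")
print(docs)
```

Several matching chunks can collapse into the same parent, so the number of documents returned may be smaller than the number of chunks matched.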


In [150]:
# Define a function that creates your chain
def create_chain():
    return parent_document_retrieval_chain

session_name = "John Wick Full Dataset Run Parent Retriever"

### Run Dataset on LangSmith

In [155]:
experiment_run = client.run_on_dataset(
        dataset_name=dataset_name,
        llm_or_chain_factory=create_chain,
        evaluation=run_config,
        project_name=session_name,
        metadata={
            "description": "Running full John Wick dataset to collect metrics, responses, and contexts Parent Retriever"
        }
    )


View the evaluation results for project 'John Wick Full Dataset Run Parent Retriever' at:
https://smith.langchain.com/o/73a1f3ba-0d4a-4422-a192-2262a2ce081d/datasets/06e69cb1-b22f-4738-ae6b-9ea8dc93a8c1/compare?selectedSessions=72f773ce-1d5a-4c0f-a3ba-a798707ad8b7

View all tests for Dataset John Wick at:
https://smith.langchain.com/o/73a1f3ba-0d4a-4422-a192-2262a2ce081d/datasets/06e69cb1-b22f-4738-ae6b-9ea8dc93a8c1
[------------------------------------------------->] 10/10

In [156]:
langsmith_results, ragas_df = extract_langsmith_results(experiment_run,dataset_df)



### Run RAGAS Evaluation

In [157]:
result__ragas = run_ragas(ragas_df)
result__ragas

Evaluating:   0%|          | 0/20 [00:00<?, ?it/s]

{'context_recall': 0.5233, 'llm_context_precision_with_reference': 0.8000}

### Update Results

In [158]:
technique = 'Parent Retrieval'

result_df = update_result(technique, langsmith_results, result__ragas, result_df)
result_df

|   | Technique | Latency | Cost | Context Precision | Context Recall |
|---|---|---|---|---|---|
| 0 | Naive | 2.272641 | 3626.9 | 0.829996 | 0.715 |
| 1 | Best-Matching 25 | 1.303881 | 1334.9 | 0.705556 | 0.59 |
| 2 | Compression- Reranker | 1.920618 | 1312.2 | 0.983333 | 0.606667 |
| 3 | MultiQuery | 3.381989 | 4678.1 | 0.774707 | 0.808333 |
| 4 | Parent Retrieval | 1.695198 | 726.9 | 0.8 | 0.523333 |


In [159]:
result_df.to_csv('result_df.csv')

# Ensemble Retriever

### Define Chain

In [160]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)
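
`EnsembleRetriever` combines the ranked lists from its member retrievers using weighted Reciprocal Rank Fusion: a document earns `weight / (rank + c)` from every list it appears in, and the sums decide the final order. A minimal sketch of that fusion, assuming the conventional constant `c=60` and two made-up ranked lists (`weighted_rrf` and the `doc_*` names are hypothetical):

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse several ranked lists into one using weighted Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(docs, start=1):
            # Higher-ranked documents (small rank) contribute more.
            scores[doc] += weight / (rank + c)
    return sorted(scores, key=scores.get, reverse=True)


# Two hypothetical retrievers that disagree on the ordering.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
dense_hits = ["doc_b", "doc_d", "doc_a"]

fused = weighted_rrf([bm25_hits, dense_hits], weights=[0.5, 0.5])
print(fused)
```

Documents that several retrievers agree on rise to the top, while documents found by only one retriever are still kept, which helps explain why an ensemble can raise recall at the price of a longer context.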

In [161]:
# Define a function that creates your chain
def create_chain():
    return ensemble_retrieval_chain

session_name = "John Wick Full Dataset Run Ensemble Retriever"

### Run Dataset on LangSmith

In [162]:
experiment_run = client.run_on_dataset(
        dataset_name=dataset_name,
        llm_or_chain_factory=create_chain,
        evaluation=run_config,
        project_name=session_name,
        metadata={
            "description": "Running full John Wick dataset to collect metrics, responses, and contexts Ensemble Retriever"
        }
    )


View the evaluation results for project 'John Wick Full Dataset Run Ensemble Retriever' at:
https://smith.langchain.com/o/73a1f3ba-0d4a-4422-a192-2262a2ce081d/datasets/06e69cb1-b22f-4738-ae6b-9ea8dc93a8c1/compare?selectedSessions=dcb608bb-5326-4779-905d-b183bedc5e48

View all tests for Dataset John Wick at:
https://smith.langchain.com/o/73a1f3ba-0d4a-4422-a192-2262a2ce081d/datasets/06e69cb1-b22f-4738-ae6b-9ea8dc93a8c1
[------------------------------------------------->] 10/10

In [163]:
langsmith_results, ragas_df = extract_langsmith_results(experiment_run,dataset_df)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ragas_df['reference_contexts'] = ragas_df['reference_contexts'].apply(ast.literal_eval)


### Run RAGAS Evaluation

In [164]:
result__ragas = run_ragas(ragas_df)
result__ragas

{'context_recall': 0.9000, 'llm_context_precision_with_reference': 0.7737}

### Update Results

In [165]:
technique = 'Ensemble Retrieval'

result_df = update_result(technique, langsmith_results, result__ragas, result_df)
result_df

| | Technique | Latency | Cost | Context Precision | Context Recall |
|---|---|---|---|---|---|
| 0 | Naive | 2.272641 | 3626.9 | 0.829996 | 0.715 |
| 1 | Best-Matching 25 | 1.303881 | 1334.9 | 0.705556 | 0.59 |
| 2 | Compression- Reranker | 1.920618 | 1312.2 | 0.983333 | 0.606667 |
| 3 | MultiQuery | 3.381989 | 4678.1 | 0.774707 | 0.808333 |
| 4 | Parent Retrieval | 1.695198 | 726.9 | 0.8 | 0.523333 |
| 5 | Ensemble Retrieval | 3.827447 | 5499.1 | 0.773691 | 0.9 |


In [166]:
result_df.to_csv('result_df.csv')

# Conclusion

In [167]:
result_df

| | Technique | Latency | Cost | Context Precision | Context Recall |
|---|---|---|---|---|---|
| 0 | Naive | 2.272641 | 3626.9 | 0.829996 | 0.715 |
| 1 | Best-Matching 25 | 1.303881 | 1334.9 | 0.705556 | 0.59 |
| 2 | Compression- Reranker | 1.920618 | 1312.2 | 0.983333 | 0.606667 |
| 3 | MultiQuery | 3.381989 | 4678.1 | 0.774707 | 0.808333 |
| 4 | Parent Retrieval | 1.695198 | 726.9 | 0.8 | 0.523333 |
| 5 | Ensemble Retrieval | 3.827447 | 5499.1 | 0.773691 | 0.9 |


The best context recall (0.90) comes from the **Ensemble Retriever**.  
Because it merges the chunks returned by all of the other retrievers, it covers the largest share of the relevant chunks.  

The best context precision (0.98) comes from the **Compression Reranker**.  
The reranker's job is to cut the retrieved chunks down to the most relevant ones, so little irrelevant context is left.  
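
The precision and recall trade-off of reranking can be sketched with a toy example. This is a simplification under stated assumptions: the Ragas metrics in this notebook are judged by an LLM, whereas the sketch uses plain set overlap, and the relevance scores, document IDs, and `top_n=3` below are made up. The helper names `rerank` and `precision_recall` are hypothetical.

```python
def rerank(candidates, score_fn, top_n=3):
    """Keep only the top_n candidates, ordered by relevance score (highest first)."""
    return sorted(candidates, key=score_fn, reverse=True)[:top_n]


def precision_recall(retrieved, relevant):
    """Set-based precision and recall (a simplification of the LLM-judged metrics)."""
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up relevance scores standing in for a reranking model
toy_scores = {"a": 0.9, "b": 0.2, "c": 0.8, "d": 0.1, "e": 0.7, "f": 0.3}
candidates = list(toy_scores)      # what the base retriever returned
relevant = {"a", "c", "e", "f"}    # ground-truth relevant chunks

before = precision_recall(candidates, relevant)
kept = rerank(candidates, toy_scores.get, top_n=3)
after = precision_recall(kept, relevant)

print("before rerank:", before)
print("after rerank: ", after)
```

In this toy run the reranked set contains only relevant chunks, but one relevant chunk (`"f"`) is dropped by the cut-off. That matches the pattern in the table: the reranker has the highest precision, while its recall is lower than that of the naive retriever it wraps.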

The lowest latency (1.30) belongs to **Best-Matching 25**.  
It works on a sparse, term-based representation, so it is the simplest to compute.  
The highest latency (3.83) belongs to the **Ensemble Retriever**,  
since it runs all of the other retrievers.  
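
To show why BM25 is cheap to compute, here is a minimal sketch of BM25 scoring using only term counts. It is a textbook variant and not necessarily the exact formula used by `rank_bm25`. The whitespace tokenizer, the non-negative IDF form, the parameter values `k1=1.5` and `b=0.75`, and the toy documents are all assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a textbook BM25 formula."""
    if not docs:
        return []

    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            # Non-negative IDF variant (assumed here for simplicity)
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalisation: long documents are penalised
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores


toy_docs = [
    "john wick is a great action movie",
    "the plot was boring",
    "great great stunts and action",
]
scores = bm25_scores("great action", toy_docs)
ranking = sorted(range(len(toy_docs)), key=lambda i: scores[i], reverse=True)
print(ranking, scores)
```

Everything here is counting and arithmetic over tokens. No embedding model is called at query time, which is consistent with BM25 having the lowest latency in the table.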

The highest cost (5499.1) belongs to the **Ensemble Retriever**, while the lowest (726.9) belongs to the **Parent Document Retriever**.
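
The conclusions above can be re-derived directly from the results table. The sketch below copies the table values into plain Python dictionaries so that it runs without the notebook state. The helper name `best` and the shortened key names are hypothetical.

```python
# Values copied from the results table above
results = [
    {"technique": "Naive", "latency": 2.272641, "cost": 3626.9, "precision": 0.829996, "recall": 0.715},
    {"technique": "Best-Matching 25", "latency": 1.303881, "cost": 1334.9, "precision": 0.705556, "recall": 0.59},
    {"technique": "Compression- Reranker", "latency": 1.920618, "cost": 1312.2, "precision": 0.983333, "recall": 0.606667},
    {"technique": "MultiQuery", "latency": 3.381989, "cost": 4678.1, "precision": 0.774707, "recall": 0.808333},
    {"technique": "Parent Retrieval", "latency": 1.695198, "cost": 726.9, "precision": 0.8, "recall": 0.523333},
    {"technique": "Ensemble Retrieval", "latency": 3.827447, "cost": 5499.1, "precision": 0.773691, "recall": 0.9},
]


def best(rows, metric, higher_is_better):
    """Return the technique with the best value for the given metric."""
    pick = max if higher_is_better else min
    return pick(rows, key=lambda row: row[metric])["technique"]


summary = {
    "best recall": best(results, "recall", higher_is_better=True),
    "best precision": best(results, "precision", higher_is_better=True),
    "lowest latency": best(results, "latency", higher_is_better=False),
    "highest latency": best(results, "latency", higher_is_better=True),
    "lowest cost": best(results, "cost", higher_is_better=False),
    "highest cost": best(results, "cost", higher_is_better=True),
}

for label, technique in summary.items():
    print(f"{label}: {technique}")
```

No single technique wins on every metric, so the choice depends on whether recall, precision, latency, or cost matters most for the application.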

# Full Code

In [None]:
#### Prepare Data
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

documents = []

for i in range(1, 5):
  loader = CSVLoader(
      file_path=f"john_wick_{i}.csv",
      metadata_columns=["Review_Date", "Review_Title", "Review_Url", "Author", "Rating"]
  )

  movie_docs = loader.load()
  for doc in movie_docs:

    # Add the "Movie Title" (John Wick 1, 2, ...)
    doc.metadata["Movie_Title"] = f"John Wick {i}"

    # convert "Rating" to an `int`, if no rating is provided - assume 0 rating
    doc.metadata["Rating"] = int(doc.metadata["Rating"]) if doc.metadata["Rating"] else 0

    # newer movies have a more recent "last_accessed_at"
    doc.metadata["last_accessed_at"] = datetime.now() - timedelta(days=4-i)

  documents.extend(movie_docs)

### Prepare vectorstore
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    documents,
    embeddings,
    location=":memory:",
    collection_name="JohnWick"
)

### Naive Retriever chain

naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})

from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI()

from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "context"  : populated by piping the value of the "question" key into naive_retriever
    # "question" : populated by getting the value of the "question" key
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # Passes "question" through unchanged and re-assigns "context" to itself,
    # so both keys reach the next step as they are
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : the retrieved documents from the previous step, returned alongside the response
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

naive_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

##### Best-Matching 25 (BM25) Retriever chain

from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(documents)

bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

bm25_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

#### Contextual Compression (Using Reranking)

from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-english-v3.0")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)

contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

contextual_compression_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

##### Multi-Query Retriever

from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)

multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

multi_query_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

##### Parent Document Retriever

from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = documents
child_splitter = RecursiveCharacterTextSplitter(chunk_size=200)

client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = Qdrant(
    collection_name="full_documents", embeddings=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)

store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

parent_document_retriever.add_documents(parent_docs, ids=None)

parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

parent_document_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

##### Ensemble Retriever

from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

ensemble_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

#### Semantic Chunking

from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)

semantic_documents = semantic_chunker.split_documents(documents)

semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="JohnWickSemantic"
)

semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

semantic_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content