In [None]:
OpenAI_api_key=""

In [None]:
Cohere_API_Key = ""

# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Parent Document Retrieval
- Self-Query Retrieval
- Time-weighted Retrieval
- Contextual Compression Retrieval

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

> NOTE: We'll provide a separate notebook for the evaluation of these methods which will be available in the YouTube comments

## Step 1:

Installing required libraries!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

In [None]:
!pip install -qU langchain langchain-openai langchain-cohere

We're also going to be leveraging [Qdrant's](https://qdrant.tech/documentation/frameworks/langchain/) (pronounced "Quadrant") VectorDB in "memory" mode (so we can leverage it locally in our colab environment), and require [Lark](https://github.com/lark-parser/lark) for our Self-Query retriever.

In [None]:
!pip install -qU qdrant-client lark

We'll also provide our OpenAI key, as well as our Cohere API key.

In [None]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

Enter your OpenAI API Key:··········


In [None]:
os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

Cohere API Key:··········


## Data Collection and Preparation

We'll be using some reviews from the 4 movies in the John Wick franchise today to explore the different retrieval strategies.

These were obtained from IMDB, and are available in the [AIM Data Repository](https://github.com/AI-Maker-Space/DataRepository).

### Data Collection

We can simply `wget` these from GitHub.

You could use any review data you wanted in this step - just be careful to make sure your metadata is aligned with your choice.

In [None]:
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw1.csv -O john_wick_1.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw2.csv -O john_wick_2.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw3.csv -O john_wick_3.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw4.csv -O john_wick_4.csv

--2024-04-15 20:56:13--  https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw1.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19628 (19K) [text/plain]
Saving to: ‘john_wick_1.csv’


2024-04-15 20:56:14 (11.8 MB/s) - ‘john_wick_1.csv’ saved [19628/19628]

--2024-04-15 20:56:14--  https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw2.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14747 (14K) [text/plain]
Saving to: ‘john_wick_2.csv’


2024-04-15 20:56:14 (19.5 MB/s) - ‘john_wick_2.csv’

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

- Self-Query: Wants as much metadata as we can provide
- Time-weighted: Wants temporal data

> NOTE: While we're creating a temporal relationship based on when these movies came out for illustrative purposes, it needs to be clear that the "time-weighting" in the Time-weighted Retriever is based on when the document was *accessed* last - not when it was created.

In [None]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

documents = []

for i in range(1, 5):
  loader = CSVLoader(
      file_path=f"john_wick_{i}.csv",
      metadata_columns=["Review_Date", "Review_Title", "Review_Url", "Author", "Rating"]
  )

  movie_docs = loader.load()
  for doc in movie_docs:

    # Add the "Movie Title" (John Wick 1, 2, ...)
    doc.metadata["Movie_Title"] = f"John Wick {i}"

    # convert "Rating" to an `int`, if no rating is provided - assume 0 rating
    doc.metadata["Rating"] = int(doc.metadata["Rating"]) if doc.metadata["Rating"] else 0

    # newer movies have a more recent "last_accessed_at"
    doc.metadata["last_accessed_at"] = datetime.now() - timedelta(days=4-i)

  documents.extend(movie_docs)

Let's look at an example document to see if everything worked as expected!

In [None]:
documents[0]

Document(page_content=": 0\nReview: The best way I can describe John Wick is to picture Taken but instead of Liam Neeson it's Keanu Reeves and instead of his daughter it's his dog. That's essentially the plot of the movie. John Wick (Reeves) is out to seek revenge on the people who took something he loved from him. It's a beautifully simple premise for an action movie - when action movies get convoluted, they get bad i.e. A Good Day to Die Hard. John Wick gives the viewers what they want: Awesome action, stylish stunts, kinetic chaos, and a relatable hero to tie it all together. John Wick succeeds in its simplicity.", metadata={'source': 'john_wick_1.csv', 'row': 0, 'Review_Date': '6 May 2015', 'Review_Title': ' Kinetic, concise, and stylish; John Wick kicks ass.\n', 'Review_Url': '/review/rw3233896/?ref_=tt_urv', 'Author': 'lnvicta', 'Rating': 8, 'Movie_Title': 'John Wick 1', 'last_accessed_at': datetime.datetime(2024, 4, 12, 20, 56, 18, 740710)})

## Qdrant VectorStore

Now that we have our documents, we will create a Qdrant VectorStore with the collection name "JohnWick".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.


In [None]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small",  base_url="https://drchat.xyz")

vectorstore = Qdrant.from_documents(
    documents,
    embeddings,
    location=":memory:",
    collection_name="JohnWick"
)

## Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply treat each review as a document, and use cosine similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later

In [None]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})
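To make "cosine similarity, top `k`" concrete, here's a minimal pure-Python sketch of the ranking step. The three-dimensional vectors and the names `toy_vectors`, `cosine_similarity`, and `top_k` are made up for illustration; the real vector store works on 1536-dimensional embeddings and uses its own index rather than a full sort.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return the ids of the k documents most similar to the query."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]

# Hypothetical toy "embeddings" (assumption: not real model output).
toy_vectors = {
    "review_a": [1.0, 0.0, 0.0],
    "review_b": [0.7, 0.7, 0.0],
    "review_c": [0.0, 0.0, 1.0],
}
query = [1.0, 0.1, 0.0]

nearest = top_k(query, toy_vectors, k=2)
print(nearest)
```

The query points almost exactly along `review_a`, partly along `review_b`, and is orthogonal to `review_c`, so the first two are returned in that order.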

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [None]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-3.5-turbo` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [None]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(base_url="https://drchat.xyz")

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [None]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and chaining it into the naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # "context"  : is assigned to a RunnablePassthrough object (will not be called or considered in the next step)
    #              by getting the value of the "context" key from the previous step
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)
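If the LCEL pipe syntax is new to you, the data flow of this chain can be mimicked with plain Python functions. This is only a sketch of the shape of the data: `fake_retriever`, `fake_llm`, `format_prompt`, and `run_chain` are hypothetical stand-ins, not LangChain APIs. The middle `RunnablePassthrough.assign` step re-assigns `context` to itself, so it is left out here.

```python
def fake_retriever(question):
    """Hypothetical retriever: returns a canned context list."""
    return [f"(context retrieved for: {question})"]

def fake_llm(prompt):
    """Hypothetical model: reports the prompt size instead of answering."""
    return f"answer based on {len(prompt)} prompt characters"

def format_prompt(values):
    """Fill the question and context into a prompt string."""
    return f"Query:\n{values['question']}\n\nContext:\n{values['context']}\n"

def run_chain(inputs):
    # Step 1: build "context" and "question" from the same input dict.
    step1 = {
        "context": fake_retriever(inputs["question"]),
        "question": inputs["question"],
    }
    # Step 2: generate the response and carry the context along unchanged,
    # so the caller can inspect what was retrieved.
    return {
        "response": fake_llm(format_prompt(step1)),
        "context": step1["context"],
    }

result = run_chain({"question": "Did people generally like John Wick?"})
print(sorted(result))
```

Returning both keys is the reason the real chain's output is indexed with `["response"]` before reading `.content`.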

Let's see how this simple chain does on a few different prompts.


In [None]:
naive_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Based on the provided context, it can be inferred that people generally liked the John Wick movies. The reviews mention that the films are stylish, fun, and filled with impressive action sequences. Some reviewers even consider John Wick to be one of the best action films of the past decade. However, there is one review that states a different opinion and finds the movies to be generic.'

In [None]:
naive_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10. Here is the URL to that review: /review/rw4854296/?ref_=tt_urv'

In [None]:
naive_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In John Wick, an ex-hit-man comes out of retirement to seek revenge on the gangsters who killed his dog and took everything from him. He unleashes a maelstrom of destruction against those who chase him, and he becomes the target of hit men. The movie is known for its ultra-violent and action-packed scenes.'

Overall, this is not bad! Let's see if we can make it better!

## Parent Document Retriever

A "small-to-big" strategy - the Parent Document Retriever works based on a simple strategy:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent chunks".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
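Here's a minimal sketch of the "search small, return big" idea in plain Python. Everything in it is an assumption made for illustration: the fixed-width `split_into_chunks`, the `word_overlap` score (standing in for vector similarity), and the two toy parent documents.

```python
def split_into_chunks(text, chunk_size):
    """Naive fixed-width splitter (a stand-in for a real text splitter)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

# Hypothetical parent documents, keyed by id (this plays the docstore role).
parents = {
    "p1": "John Wick is a retired hitman. His dog is killed and he seeks revenge.",
    "p2": "The sequel sends Wick to Rome. The action choreography is superb.",
}

# Index step: every child chunk remembers which parent it came from.
child_index = []
for parent_id, text in parents.items():
    for chunk in split_into_chunks(text, chunk_size=32):
        child_index.append((chunk, parent_id))

def word_overlap(query, chunk):
    """Toy relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve_parents(query, k=1):
    """Rank the child chunks, then return their de-duplicated parents."""
    ranked = sorted(child_index, key=lambda c: word_overlap(query, c[0]), reverse=True)
    parent_ids = []
    for _, parent_id in ranked:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [parents[pid] for pid in parent_ids[:k]]

result = retrieve_parents("sequel sends Wick")
print(result)
```

The match happens on a 32-character child chunk, but the caller gets the whole parent back, including the sentence about choreography that the chunk alone would have dropped.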

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [None]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = documents
child_splitter = RecursiveCharacterTextSplitter(chunk_size=200)

We'll need to set up a new Qdrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension, you'll need to change this if you're using a different embedding model.

In [None]:
client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = Qdrant(
    collection_name="full_documents", embeddings=OpenAIEmbeddings(model="text-embedding-3-small",  base_url="https://drchat.xyz"), client=client
)

Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [None]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [None]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [None]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [None]:
parent_document_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Based on the provided context, opinions about John Wick are mixed. Some people love the series and enjoy the action and style, while others dislike it and criticize the plot and fight scenes.'

In [None]:
parent_document_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10. Here is the URL to that review: /review/rw4854296/?ref_=tt_urv.'

In [None]:
parent_document_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"In the John Wick movies, John Wick, played by Keanu Reeves, is a retired assassin who is forced back into action after someone kills his dog and steals his car. In the first movie, John seeks revenge on the gangsters responsible for the death of his dog and the theft of his car. In the second movie, he is called on to pay off an old debt by helping Ian McShane's character take over the Assassin's Guild. John travels to Italy, Canada, and Manhattan, killing numerous assassins along the way."

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to answer the performance question more rigorously.

## Self-Query

We have another problem with retrieval - what if people are asking questions based on the metadata, rather than the content of the documents?

That's where Self-Query shines!

It not only lets us search our vectorstore in the traditional way (semantic similarity), but we can also filter our vectorstore intelligently based on the provided metadata!
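Conceptually, the LLM turns the user's question into a search string plus a metadata filter, and the filter is applied to the store. The sketch below hand-writes that structured output instead of calling a model; `Comparison`, `StructuredQuery`, `matches`, and the three toy reviews are hypothetical names and data for illustration, and only three comparators are supported.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Comparison:
    """One metadata condition, e.g. Rating == 10."""
    attribute: str
    comparator: str  # "eq", "gt", or "lt" in this sketch
    value: object

@dataclass
class StructuredQuery:
    """A search string plus an optional metadata filter."""
    query: str
    filter: Optional[Comparison]

def matches(metadata, cond):
    """Check a single condition against a document's metadata."""
    actual = metadata.get(cond.attribute)
    if cond.comparator == "eq":
        return actual == cond.value
    if cond.comparator == "gt":
        return actual > cond.value
    if cond.comparator == "lt":
        return actual < cond.value
    raise ValueError(f"unsupported comparator: {cond.comparator}")

# Toy data (assumption: not taken from the real review files).
reviews = [
    {"text": "Stylish and fun.", "metadata": {"Rating": 10, "Review_Url": "/review/a"}},
    {"text": "Generic action.", "metadata": {"Rating": 4, "Review_Url": "/review/b"}},
    {"text": "A masterpiece.", "metadata": {"Rating": 10, "Review_Url": "/review/c"}},
]

# Hand-written stand-in for what the LLM might produce for
# "Do any reviews have a rating of 10?"
structured = StructuredQuery(query="reviews", filter=Comparison("Rating", "eq", 10))

filtered = [
    r for r in reviews
    if structured.filter is None or matches(r["metadata"], structured.filter)
]
urls = [r["metadata"]["Review_Url"] for r in filtered]
print(urls)
```

Because the filter runs on the metadata itself, every review with a rating of 10 is a candidate, not just the ones whose text happens to sit near the query in embedding space.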

Let's start by describing the relevant metadata, and what type of data it is.


In [None]:
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.retrievers.self_query.base import SelfQueryRetriever

metadata_field_info = [
    AttributeInfo(
        name="Movie_Title",
        description="The title of the movie",
        type="string",
    ),
    AttributeInfo(
        name="Review_Date",
        description="The date of the review",
        type="string",
    ),
    AttributeInfo(
        name="Review_Title",
        description="The title of the review",
        type="string",
    ),
    AttributeInfo(
        name="Review_Url",
        description="The URL of the review",
        type="string",
    ),
    AttributeInfo(
        name="Author",
        description="The author of the review",
        type="string",
    ),
    AttributeInfo(
        name="Rating",
        description="A 1 to 10 rating for the movie",
        type="integer",
    )
]

document_content_description = "A review of the movie."

self_query_retriever = SelfQueryRetriever.from_llm(
    ChatOpenAI(temperature=0,  base_url="https://drchat.xyz"),
    vectorstore,
    document_content_description,
    metadata_field_info,
)

Let's create our chain, and see how this does.

In [None]:
self_query_retrieval_chain = (
    {"context": itemgetter("question") | self_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [None]:
self_query_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Based on the provided context, it can be inferred that people generally liked John Wick. The reviews mention the intense action, thrilling storyline, and the popularity of the franchise with multiple sequels.'

In [None]:
self_query_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

"Yes, there are reviews with a rating of 10. Here are the URLs to those reviews:\n\n1. Review URL: '/review/rw7582036/?ref_=tt_urv'\n2. Review URL: '/review/rw4860603/?ref_=tt_urv'\n3. Review URL: '/review/rw4854296/?ref_=tt_urv'\n4. Review URL: '/review/rw3109271/?ref_=tt_urv'"

In [None]:
self_query_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In John Wick, an ex-hitman comes out of retirement to seek vengeance against the gangsters who killed his dog and took everything from him. He is dragged into an impossible task when an arrogant Russian mob prince steals his car and kills his dog. Wick unleashes a maelstrom of destruction against those who try to chase him, becoming the target of hitmen and bounty hunters. He is forced to unearth his concealed identity and carry out a relentless vendetta. The story continues in subsequent John Wick films.'

As we can see - the question concerning specific metadata was answered much better than with the previous methods!

## Time-weighted Vector Store

Next up, we have a method that leverages temporal data to favor the most recently accessed documents in our VectorStore.

> NOTE: In this example we've created a relationship between when the movie came out, and when the chunks were accessed, this is simply for illustrative purposes.

Once again, we'll create a new Qdrant vectorstore using the in-memory client.

In [None]:
from langchain.retrievers import TimeWeightedVectorStoreRetriever

client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="time_weighted",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

time_weighted_vectorstore = Qdrant(
    collection_name="time_weighted", embeddings=OpenAIEmbeddings(model="text-embedding-3-small", base_url="https://drchat.xyz"), client=client
)

In the `TimeWeightedVectorStoreRetriever`, you'll notice that we have a parameter called `decay_rate`. This `decay_rate` is used in the following calculation, which adds a recency term to each semantic-similarity score based on when the document was last accessed:

$semantic\_similarity + (1.0 - decay\_rate) ^ {hours\_passed}$

The `decay_rate` is set between $0$ and $1$. At $0$, the recency term is always $1$, so nothing ever decays and the ranking reduces to plain semantic similarity. At $1$, the recency term drops to $0$ for any document with $hours\_passed > 0$, so the ranking again reduces to plain semantic similarity.

As we increase the decay rate from $0 \to 1$, we're going to see items decay more and more aggressively as time moves along.

So a "low" decay rate (closer to 0) will lead to items being less penalized for being accessed less-recently - and a "high" decay rate (closer to 1) will lead to items being more penalized for being accessed less-recently.
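To get a feel for the numbers, here's the formula above as a small Python function. The similarity values are made up for illustration; only the formula itself comes from the text.

```python
def combined_score(semantic_similarity, decay_rate, hours_passed):
    """semantic_similarity + (1.0 - decay_rate) ** hours_passed"""
    return semantic_similarity + (1.0 - decay_rate) ** hours_passed

# A weak match that was accessed just now...
fresh_weak_match = combined_score(0.40, decay_rate=0.6, hours_passed=0)

# ...versus a strong match that was last accessed three days ago.
stale_strong_match = combined_score(0.90, decay_rate=0.6, hours_passed=72)

print(fresh_weak_match, stale_strong_match)
print(fresh_weak_match > stale_strong_match)

# With decay_rate=0 the recency term is 1 no matter how much time has passed,
# so it shifts every score equally and cannot change the ranking.
print(combined_score(0.90, decay_rate=0.0, hours_passed=72))
```

With `decay_rate=0.6`, the recency term is `0.4 ** hours_passed`, which is already negligible after a single day. A document accessed "now" therefore gets close to a full extra point, which is more than any similarity gap can make up.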

In [None]:
time_weighted_retriever = TimeWeightedVectorStoreRetriever(
    vectorstore=time_weighted_vectorstore, decay_rate=0.6, k=2,
)

time_weighted_retriever.add_documents(documents)

['041a74fee7504afaa68942b7b1357a02',
 '69a3e85d18df4b6693be1f039948c1e5',
 '9af28c10fe054973a22053215407dafc',
 '8ab65d84cfb84a8e9ea0992ba0005628',
 '0d1addafed094df2b56dbe7850858373',
 '545819de98fb454c827e28e5024d8db4',
 '57d9f75b49ec4f138f52aaa2f2091090',
 '92aabf91d57241d3aded8afb34829f47',
 'd6a0f12f87d9414a93f72af7a86ef562',
 '616f116acf4f4525a320ecaebbaa63f8',
 'fe4fe4e8ca4249d8a48c4eb778604b56',
 '9ac8837494064375a91d3c7b9cca37e2',
 '65397164b03b4380943919198c4aa650',
 '07b5047516a34de1b2f88b46c991b416',
 '27c90fb58d4a46fb936265e7d945b4d8',
 '9cee5fb85c1745d087da1ab7c653353e',
 '8bc0bf8a2a3d41528551b3e442fd1102',
 '91ed9fce24904f5a8328a18e9df329bc',
 'f1f17e525039463f8c61d8a37ed1ab9b',
 'd18ba57c4c754a93b30908ffd2dbb7bc',
 '8a85fe3829b64318b13dbe81d4b10047',
 '872dcf788e12436c9db1f36ae039d0d3',
 '69209011bb044b658c521c27c54ad90e',
 'f4b0c406ff9143e5b49c15ad1dcf01e6',
 '70581600749e480b8b861a88a65ec749',
 'c8aa35823fd74276a29f18d3ea122bc1',
 'e227c0984d4743c58c80fae6fa6005dd',
 

Let's create our same RAG chain that we've been using so far!

In [None]:
time_weighted_retrieval_chain = (
    {"context": itemgetter("question") | time_weighted_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's test this iteration!

In [None]:
time_weighted_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Based on the context provided, people generally liked John Wick 4. The reviews mentioned that the movie exceeded expectations and that it was even better than the previous films in the franchise. The set pieces and action sequences were praised, and the movie was described as a wild ride.'

In [None]:
time_weighted_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'No, there are no reviews with a rating of 10 in the provided context.'

In [None]:
time_weighted_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In John Wick 4, according to the reviews, the movie is described as a wild ride and even better than the previous movies in the franchise. It is said to have intense action scenes and set pieces, with one scene in Tokyo featuring illuminated cherry blossoms. However, there is a mention of a super illogical thing that bothered one reviewer, which was that nobody flinched at a nightclub despite a 10-minute beatdown happening. Overall, it seems like a highly entertaining and action-packed film.'

As you can see, we're heavily biasing toward the more "recently accessed" documents (John Wick 4, in this case).

## Contextual Compression

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.
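The two bullets above can be sketched as a retrieve-then-rerank function in plain Python. The `toy_relevance` word-overlap score is a made-up stand-in for a real reranking model, and the candidate strings are invented for illustration.

```python
def toy_relevance(query, document):
    """Hypothetical relevance score: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(document.lower().split()))

def rerank(query, documents, score_fn, top_n):
    """Score each candidate against the query and keep the best top_n."""
    scored = [(score_fn(query, doc), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]

# Pretend these came back from a first-stage retriever with a large k.
candidates = [
    "The action in John Wick is great",
    "I liked the dog",
    "People generally liked John Wick",
    "The popcorn was stale",
]

compressed = rerank(
    "did people generally like john wick",
    candidates,
    score_fn=toy_relevance,
    top_n=2,
)
print(compressed)
```

The first stage is allowed to be generous, since the second stage looks at the query and each document together and throws away whatever doesn't hold up.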

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [None]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank()
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)

Let's create our chain again, and see how this does!

In [None]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [None]:
contextual_compression_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Based on the context provided, it can be inferred that people generally liked John Wick. One reviewer mentioned that they made a big mistake by initially skipping the film and found it to be a dark, stylish, and fun action movie. Another reviewer praised the film for its elaborate action sequences and compared it favorably to other noir films. Additionally, the positive comments about the cinematography and fight choreography suggest a positive reception.'

In [None]:
contextual_compression_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10. The URL to that review is "/review/rw4854296/?ref_=tt_urv".'

In [None]:
contextual_compression_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"In John Wick, the main character, played by Keanu Reeves, is a retired hitman who is forced back into action when a mobster named Santino D'Antonio asks him to fulfill a marker. Wick initially refuses, so Santino blows up his house. Wick then seeks out the owner of the Continental hotel, Winston, who advises him to honor the marker. The assignment is to kill Santino's sister, Gianna D'Antonio, in Rome. After completing the mission, Santino puts a seven-million dollar contract on Wick's head, attracting professional killers from everywhere. Wick vows to kill Santino, who is no longer protected by the marker."

Again, it's hard to qualitatively assess this solution - but it "seems" better!