# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up Qdrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

We'll also provide our OpenAI key, as well as our Cohere API key.

In [20]:
import os
import openai
from dotenv import load_dotenv
load_dotenv(dotenv_path="../.env")
openai.api_key = os.getenv("OPENAI_API_KEY")

In [21]:
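# load_dotenv() already exported COHERE_API_KEY; fail early with a clear message if it is missing.
assert os.getenv("COHERE_API_KEY"), "COHERE_API_KEY is not set (expected in ../.env)"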

In [22]:
!pip install langchain langchain-community langchain-openai langchain-cohere langchain-qdrant langchain-experimental qdrant-client rank_bm25 python-dotenv



ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
botocore 1.27.59 requires urllib3<1.27,>=1.25.4, but you have urllib3 2.5.0 which is incompatible.
transformers 4.35.0 requires tokenizers<0.15,>=0.14, but you have tokenizers 0.21.4 which is incompatible.


## Task 2: Data Collection and Preparation

We'll be using our Loan Data once again - this time the structured data available through the CSV!

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

In [23]:
from langchain_community.document_loaders.csv_loader import CSVLoader

loader = CSVLoader(
    file_path="./data/complaints.csv",
    metadata_columns=[
      "Date received", 
      "Product", 
      "Sub-product", 
      "Issue", 
      "Sub-issue", 
      "Consumer complaint narrative", 
      "Company public response", 
      "Company", 
      "State", 
      "ZIP code", 
      "Tags", 
      "Consumer consent provided?", 
      "Submitted via", 
      "Date sent to company", 
      "Company response to consumer", 
      "Timely response?", 
      "Consumer disputed?", 
      "Complaint ID"
    ]
)

loan_complaint_data = loader.load()

for doc in loan_complaint_data:
    doc.page_content = doc.metadata["Consumer complaint narrative"]

Let's look at an example document to see if everything worked as expected!

In [5]:
loan_complaint_data[0]

Document(metadata={'source': './data/complaints.csv', 'row': 0, 'Date received': '03/27/25', 'Product': 'Student loan', 'Sub-product': 'Federal student loan servicing', 'Issue': 'Dealing with your lender or servicer', 'Sub-issue': 'Trouble with how payments are being handled', 'Consumer complaint narrative': "The federal student loan COVID-19 forbearance program ended in XX/XX/XXXX. However, payments were not re-amortized on my federal student loans currently serviced by Nelnet until very recently. The new payment amount that is effective starting with the XX/XX/XXXX payment will nearly double my payment from {$180.00} per month to {$360.00} per month. I'm fortunate that my current financial position allows me to be able to handle the increased payment amount, but I am sure there are likely many borrowers who are not in the same position. The re-amortization should have occurred once the forbearance ended to reduce the impact to borrowers.", 'Company public response': 'None', 'Company'

## Task 3: Setting up Qdrant!

Now that we have our documents, let's create a Qdrant VectorStore with the collection name "LoanComplaints".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [6]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    loan_complaint_data,
    embeddings,
    location=":memory:",
    collection_name="LoanComplaints"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply treat each complaint as a document, and use cosine similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later

In [7]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})
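
To make the idea of "cosine similarity, top `k`" concrete, here is a minimal pure-Python sketch. It is not how Qdrant implements search internally; the two-dimensional vectors and the helper names `cosine_similarity` and `top_k` are made up for illustration only.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]

# Toy 2-D "embeddings" (hypothetical values, real embeddings have 1536 dimensions here).
toy_docs = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]
toy_query = [0.9, 0.1]
nearest = top_k(toy_query, toy_docs, k=2)
print(nearest)
```

The retriever above does the same thing conceptually, with `k=10` and real embedding vectors.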

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [8]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-4.1-nano` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [9]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4.1-nano")

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [10]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and chaining it into the naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # "context"  : is assigned to a RunnablePassthrough object (will not be called or considered in the next step)
    #              by getting the value of the "context" key from the previous step
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry-picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [11]:
naive_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with loans appears to be problems related to dealing with lenders or servicers, including errors in loan information, misapplied payments, incorrect account statuses, and lack of transparency. Specific recurring issues include errors in loan balances, misreporting of payment statuses, unauthorized data disclosures, and challenges in managing or adjusting repayment plans.'

In [12]:
naive_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Yes, there were complaints that did not get handled in a timely manner. Specifically, two complaints from April 2025 indicate delays:\n\n1. Complaint ID 12709087 sent to MOHELA on 03/28/25 was marked as **not handled in a timely manner**.\n2. Complaint ID 12975634 sent to Maximus Federal Services, Inc. on 04/14/25 was handled **within the required timeframe** (timely response).\n\nAdditionally, multiple complaints (e.g., Complaint ID 13091395 sent on 04/21/25) involved delays or lack of response from the companies.\n\nTherefore, at least some complaints, notably the one to MOHELA, were not handled in a timely manner.'

In [13]:
naive_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People often failed to pay back their loans due to a combination of factors such as limited or no options to manage their payments effectively, rising interest that negated their payments, lack of clear communication or transparency about their loan status, financial hardships preventing increased payments, and systemic issues with loan servicing and credit reporting. Many borrowers also felt misled about repayment terms and faced difficulties with loan transfers, unresponsive customer service, or unexpected reporting damaging their credit scores. All these issues contributed to the inability to repay their loans and, in some cases, to the accumulation of unmanageable debt.'

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model) which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.

This retriever is very straightforward to set up! Let's see it happen down below!


In [14]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(loan_complaint_data)
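
For intuition about what a BM25 score rewards, here is a small sketch of the textbook Okapi BM25 formula in plain Python. It is an illustration under assumptions: whitespace tokenization, `k1=1.5`, `b=0.75`, and a non-negative IDF variant. The library behind `BM25Retriever` may differ in these details, and the function name `bm25_scores` and the toy corpus are hypothetical.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with a textbook BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Term frequency saturates, and long documents are penalized.
            norm = term_freq[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

toy_corpus = [
    "nelnet misapplied my payment",
    "navient reported my loan late",
    "my payment to nelnet was lost",
]
toy_scores = bm25_scores("nelnet payment", toy_corpus)
print(toy_scores)
```

Only documents that share exact tokens with the query receive a non-zero score, which is why BM25 behaves so differently from embedding search on names and rare keywords.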

We'll construct the same chain - only changing the retriever.

In [15]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [16]:
bm25_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with loans appears to be problems related to dealing with the lender or servicer, particularly issues such as miscommunication, incorrect information about loan balances and terms, and challenges in managing payments. Many complaints involve difficulties in obtaining accurate information, applying payments correctly, or resolving disputes about fees or loan validity.'

In [17]:
bm25_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the information provided, it appears that several complaints were responded to by the companies and marked as "Closed with explanation" and indicated as "Timely response? Yes." However, there are also multiple complaints, particularly the ones involving extensive issues such as account corrections, missed responses, and illegal collection activities, where the company\'s responses were inadequate or insufficient, and the complainants express frustration that their issues have not been properly resolved.\n\nSpecifically, one complaint details that the company failed to action corrections despite multiple requests and that the complainant experienced long wait times and unanswered calls, suggesting delays and unresolved issues. Although the company\'s response was marked as timely, the ongoing dissatisfaction indicates that some complaints did not get effectively handled in a timely manner.\n\nTherefore, the answer is: Yes, some complaints did not get handled in a satisfactory 

In [22]:
bm25_retrieval_chain.invoke({"question" : "Find complaints about 'Bank of America'"})["response"].content

"Based on the complaints provided, here are some issues raised about Bank of America and associated servicers (though most complaints referenced Nelnet, Navient, or Maximus, not Bank of America directly):\n\n1. **Complaints about Nelnet (student loan servicing):**\n   - Alleged obstruction of rights to file and pursue CFPB complaints, with claims that Nelnet manipulates the complaint process to avoid accountability.\n   - Poor communication during loan transitions, leading to confusion about loan status and reporting of delinquent payments despite accounts being in forbearance.\n   - Misreporting or inaccurate credit reporting, damaging borrower credit scores despite the loan being current or in forbearance.\n   - Data breaches and unauthorized access to sensitive federal student loan data involving federal agencies.\n\n2. **Complaints about Navient (student loan servicing):**\n   - Alleged mismanagement of loan repayment, where borrowers are unable to decrease balances despite consist

It's not clear whether this is better or worse. If only we had a way to test this! (SPOILERS: We do, the second half of the notebook will cover this.)

<div style="background-color: #204B8E; color: white; padding: 10px; border-radius: 5px;">

#### ❓ Question #1:

Give an example query where BM25 is better than embeddings and justify your answer.

</div>

<div style="background-color: #204B8E; color: white; padding: 10px; border-radius: 5px;">

### Answer:

Query: "Find complaints about 'Bank of America'"

- BM25 will find documents containing the exact company name
- Embeddings might return complaints about other banks or financial institutions

</div>

## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [23]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-v3.5")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)
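
The retrieve-then-rerank pattern can be sketched without any external service. In the sketch below, `overlap_score` is a deliberately crude, hypothetical stand-in for a reranking model such as Cohere's; a real reranker scores each (query, document) pair with a learned model instead of word overlap.

```python
def overlap_score(query, doc):
    """Toy relevance score: fraction of query words that appear in the document."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words)

def rerank(query, candidates, top_n, score_fn=overlap_score):
    """Second stage: re-score first-stage candidates and keep only the best top_n."""
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_n]

# Pretend these came back from the first-stage (naive) retriever.
candidates = [
    "servicer never answered the phone",
    "late payment reported in error",
    "payment was applied late by the servicer",
]
kept = rerank("late payment servicer", candidates, top_n=2)
print(kept)
```

The first stage is tuned for recall (many candidates), and the second stage is tuned for precision (few, well-ordered documents passed to the LLM).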

Let's create our chain again, and see how this does!

In [24]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [25]:
contextual_compression_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided complaints, the most common issue with loans appears to be problems related to loan servicing, including errors in loan balances, misapplied payments, wrongful denials of payment plans, and mishandling of account information. Many complaints also involve difficulties with repayment options such as forbearance or deferment, accruing interest, and lack of clear communication or accurate information from servicers.'

In [26]:
contextual_compression_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, there is no indication that any complaints went completely unhandled or unresolved in a timely manner. The complaint related to the complaint received on 04/14/25 from EdFinancial Services was responded to with a "Closed with explanation" and was marked as responded to timely. Similarly, the complaint received on 04/21/25 from Maximus Federal Services, Inc. was also responded to with a "Closed with explanation" and was marked as timely. \n\nWhile some complaints mention delays or ongoing issues, the records do not specify any complaints that were left completely unhandled or ignored beyond the expected response time.\n\nTherefore, the answer is:  \n**No, there are no complaints in the provided data that were left completely unhandled in a timely manner.**'

In [22]:
contextual_compression_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for several reasons, including:\n\n1. Lack of Awareness and Information: Borrowers were often not informed about the requirement to repay loans or how the loan process works, leading to confusion and unawareness of their repayment obligations.\n\n2. Poor Communication from Servicers: Borrowers reported that lenders or loan servicers failed to notify them when payments were due, did not provide clear information about payment plans, and did not confirm transfers or account details properly.\n\n3. Financial Hardships and Unmanageable Interest: Many borrowers found themselves unable to afford payments because of accumulating interest and the limited options provided, such as forbearance or deferment, which often resulted in interest continuing to grow and increasing overall debt.\n\n4. Economic and Employment Factors: Unexpected economic challenges, stagnant wages, and employment conditions contributed to borrowers’ inability to make payments, especi

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` number of new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context.

So, how hard is it to set up? Not bad! Let's see it down below!



In [27]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)
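
The three steps above can be sketched in plain Python. This is an illustration rather than LangChain's implementation: `fake_generate_queries` stands in for the LLM call, `toy_index` stands in for a retriever, and both are hypothetical.

```python
def multi_query_retrieve(question, generate_queries, retrieve):
    """Retrieve for each reformulated query and keep unique documents, in first-seen order."""
    seen = set()
    merged = []
    for query in generate_queries(question):
        for doc in retrieve(query):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged

def fake_generate_queries(question):
    # Stand-in for the LLM that rewrites the question in different words.
    return [question + " (rephrased)", question + " (simplified)"]

# Stand-in for a retriever: maps a query string to a ranked list of document IDs.
toy_index = {
    "late fees (rephrased)": ["doc_a", "doc_b"],
    "late fees (simplified)": ["doc_b", "doc_c"],
}

merged = multi_query_retrieve("late fees", fake_generate_queries, lambda q: toy_index.get(q, []))
print(merged)
```

Note that `doc_c` is only reachable through the second reformulation, which is the recall gain this strategy is after.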

In [28]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [29]:
multi_query_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issues with loans tend to revolve around problems with lenders or servicers, such as dealing with bad or incorrect information about the loan, mishandling of payments, problems with loan balance or interest calculations, and issues with loan transfer or unnotified changes. Many complaints also highlight difficulties in obtaining clear, accurate information, issues with repayment plans, and concerns over unethical or illegal practices by loan servicers.\n\nIn summary, the most common issue appears to be **"Dealing with your lender or servicer,"** particularly issues related to receiving incorrect or bad information, mishandling of repayments, or lack of communication.'

In [26]:
multi_query_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, it appears that several complaints were not handled in a timely manner. For example:\n\n- One complaint mentioned it had been nearly 18 months with no resolution despite requests for review and corrections.\n- Other complaints indicated delays exceeding 30 days for investigations or responses, with some noting no response at all over extended periods.\n- Multiple complaints explicitly mention that the issues remain unresolved despite waiting for months or over a year, suggesting they were not managed promptly.\n\nTherefore, yes, several complaints did not get handled in a timely manner.'

In [27]:
multi_query_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans mainly because they faced financial hardships, were misled or lacked adequate information about repayment options, and encountered practices by servicers that pressured or steered them into unfavorable arrangements. Many borrowers did not qualify for forgiveness programs or were unaware of income-based repayment options, leading to prolonged periods of unmanageable debt. Additionally, some experienced issues like interest accumulation during forbearance, lack of proper notices, incorrect reporting to credit bureaus, or coercive servicing practices such as forbearance steering and incorrect loan handling, all of which contributed to their inability to repay their loans effectively.'

<div style="background-color: #204B8E; color: white; padding: 10px; border-radius: 5px;">

#### ❓ Question #2:

Explain how generating multiple reformulations of a user query can improve recall.

</div>

<div style="background-color: #204B8E; color: white; padding: 10px; border-radius: 5px;">

### Answer:

- Instead of asking once, the system asks the same question multiple ways since people use different words for the same thing.
- More documents get found because you're searching with multiple word combinations.

</div>

## Task 8: Parent Document Retriever

The Parent Document Retriever follows a "small-to-big" strategy:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent chunks".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
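
Here is a minimal sketch of the "search small, return big" bookkeeping, assuming a toy word-overlap score in place of embeddings and fixed-size word windows in place of a `RecursiveCharacterTextSplitter`. All helper names are hypothetical.

```python
def split_into_children(text, words_per_chunk):
    """Cut a parent document into small child chunks of a few words each."""
    words = text.split()
    return [" ".join(words[i:i + words_per_chunk]) for i in range(0, len(words), words_per_chunk)]

def build_stores(parents, words_per_chunk):
    """Return a docstore (parent_id -> full text) and a child index of (chunk, parent_id) pairs."""
    docstore = {}
    child_index = []
    for parent_id, text in enumerate(parents):
        docstore[parent_id] = text
        for chunk in split_into_children(text, words_per_chunk):
            child_index.append((chunk, parent_id))
    return docstore, child_index

def word_overlap(query, chunk):
    """Toy similarity: number of words shared by the query and the chunk."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve_parents(query, docstore, child_index, k):
    """Rank child chunks, then return the de-duplicated parents of the top k chunks."""
    ranked = sorted(child_index, key=lambda item: word_overlap(query, item[0]), reverse=True)
    parent_ids = []
    for _, parent_id in ranked[:k]:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [docstore[parent_id] for parent_id in parent_ids]

parents = [
    "payment was lost and the servicer never replied",
    "interest kept growing during the forbearance period",
]
docstore, child_index = build_stores(parents, words_per_chunk=4)
result = retrieve_parents("forbearance interest", docstore, child_index, k=2)
print(result)
```

Both top-ranked child chunks belong to the same parent, so a single full document is returned rather than two fragments.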

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [30]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = loan_complaint_data
child_splitter = RecursiveCharacterTextSplitter(chunk_size=750)

We'll need to set up a new Qdrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension, you'll need to change this if you're using a different embedding model.

In [31]:
from langchain_qdrant import QdrantVectorStore

client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = QdrantVectorStore(
    collection_name="full_documents", embedding=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)

Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [32]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [33]:
parent_document_retriever.add_documents(parent_docs, ids=None)


We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [None]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [None]:
parent_document_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided context, appears to be errors and misconduct related to federal student loan servicing. Specifically, problems such as incorrect information on reports, misapplied payments, wrongful denials of payment plans, discrepancies in loan balances and interest rates, and issues arising from loan transfers and agency dissolutions are frequently reported. Many complaints involve systemic breakdowns, misinformation, and unfair practices by loan servicers.'

In [None]:
parent_document_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Yes, based on the provided complaints, several did not get handled in a timely manner. Specifically, the complaints filed with MOHELA on 03/28/25 and 04/11/25 both were marked as "Timely response?": "No," indicating they were not responded to promptly. The complaint filed with Nelnet on 04/27/25, however, was responded to within the required timeframe ("Timely response?": "Yes"). \n\nTherefore, at least two complaints about unresolved issues did not receive timely responses.'

In [None]:
parent_document_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily due to issues such as being unable to make payments during their studies or after graduation, often because of financial hardship or lack of proper information. Some specific reasons highlighted include:\n\n- Lack of clear communication or notification from loan servicers about payment obligations, such as when payments were supposed to start or changes in loan ownership.\n- Financial difficulties after graduation, including unemployment or severe financial hardship, which made it difficult to keep up with payments.\n- Misrepresentations or lack of transparency about the value of their education, the long-term consequences of taking out loans, or the stability of the educational institutions they attended.\n- Administrative errors or mishandling by loan servicing agencies, leading to reported delinquencies or incorrect credit reporting.\n- In some cases, borrowers were unaware that their payments had begun or were misled about their repa

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to more effectively answer the question about the performance.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever simply takes two or more retrievers and combines their retrieved documents using a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.

In [None]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

We'll pack *all* of these retrievers together in an ensemble.
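
To show what rank fusion does with those weights, here is a small sketch of weighted Reciprocal Rank Fusion. It assumes the constant `c=60` suggested in the RRF paper and ranks that start at 1; the function name `weighted_rrf` and the document IDs are hypothetical, and the `EnsembleRetriever` internals may differ in detail.

```python
from collections import defaultdict

def weighted_rrf(rankings, weights, c=60):
    """Fuse several ranked lists: each document earns weight / (rank + c) per list."""
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += weight / (rank + c)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Toy ranked outputs from two retrievers (hypothetical document IDs).
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]
fused = weighted_rrf([bm25_ranking, dense_ranking], weights=[0.5, 0.5])
print(fused)
```

Documents that appear near the top of several lists rise to the top of the fused list, even if no single retriever ranked them first. Only ranks are used, so the retrievers' raw scores never need to be on a comparable scale.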

In [None]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [None]:
ensemble_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided complaints data, the most common issues with student loans appear to be related to:\n\n- Problems with loan servicing, including errors in loan balances, misapplied payments, and wrongful denials of payment plans.\n- Dealing with lenders or servicers due to miscommunication, lack of notification about account status, or inaccurate information.\n- Issues with how payments are being handled, including reversals, inability to apply payments correctly, and late reporting.\n- Incorrect or incomplete information reported to credit bureaus, impacting credit scores.\n- Challenges with understanding and managing loan terms, interest calculation, and repayment options.\n\nOverall, one of the most prevalent issues is **errors or mismanagement by loan servicers**, leading to incorrect balances, late payments reported to credit agencies, and difficulties in making or verifying payments. These servicing problems often cause financial hardship, damage to credit scores, and frus

In [39]:
ensemble_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, yes, some complaints were not handled in a timely manner. Specifically, there are at least two instances where the response was marked as "No" for being timely:\n\n1. Complaint ID: 12935889 (Page received 04/11/25, from MOHELA in CO) was marked as "Timely response? No."\n2. Complaint ID: 12744910 (Page received 03/31/25, from Maximus Federal Services, Inc. in MI) was marked as "Timely response? Yes," but the ongoing issues and delayed resolutions suggest the matter was not effectively handled promptly in reality.\n\nAdditionally, multiple complaints detail prolonged wait times, lack of responses, or failure to update the complainants within expected timeframes. Therefore, the answer is that yes, some complaints were not responded to or handled in a timely manner.'

In [40]:
ensemble_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for various reasons, including inadequate communication and notification from loan servicers, mismanagement of their loan information, lack of awareness about payment obligations, and difficulties caused by predatory practices such as forbearance steering and high interest accrual. Many borrowers were not properly informed about their repayment options, including income-driven plans or forgiveness programs, which led to unintentional delinquency or default. Additionally, some borrowers experienced issues like incorrect account information, transfer of loans without proper notification, or administrative errors that negatively impacted their credit and made repayment challenging.'

## Task 10: Semantic Chunking

While this is not a retrieval method, it *is* an effective way of improving retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
  - `percentile`
  - `standard_deviation`
  - `interquartile`
  - `gradient`
3. Each sequence of related sentences is kept as a document!

Let's see how to implement this!

We'll use the `percentile` thresholding method for this example, which will:

Calculate the distances between adjacent sentences, and then split the text wherever a distance exceeds a given percentile of all distances.
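To make the percentile rule concrete, here is a minimal pure-Python sketch. It is an illustration under assumptions, not `SemanticChunker`'s actual implementation: `percentile_chunks` is a hypothetical helper, the two-dimensional vectors are made-up stand-ins for real embeddings, and distances are taken between adjacent sentences only.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def percentile(values, q):
    """Percentile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower, upper = math.floor(rank), math.ceil(rank)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)

def percentile_chunks(sentences, vectors, q=95.0):
    """Split wherever the distance to the next sentence exceeds the q-th percentile."""
    distances = [
        cosine_distance(vectors[i], vectors[i + 1])
        for i in range(len(vectors) - 1)
    ]
    threshold = percentile(distances, q)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            # Large semantic jump: close the current chunk and start a new one
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks

# Toy data: two sentences about payments, two about credit reporting
sentences = [
    "My payment was misapplied.",
    "The servicer never fixed the balance.",
    "My credit report shows a late payment.",
    "The bureau will not correct it.",
]
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]

chunks = percentile_chunks(sentences, toy_vectors)
for chunk in chunks:
    print(chunk)
```

Only the jump between the second and third sentence is large enough to exceed the threshold, so the toy text is split into one chunk per topic.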

In [38]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)

Now we can split our documents.

In [39]:
semantic_documents = semantic_chunker.split_documents(loan_complaint_data[:20])

Let's create a new vector store.

In [40]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="Loan_Complaint_Data_Semantic_Chunks"
)

We'll use naive retrieval for this example.

In [41]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [42]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [43]:
semantic_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issues with loans appear to involve difficulties with repayment, such as struggling to repay the loan, problems with payment plans, and confusion or inaccuracies related to loan balances, payment amounts, and loan servicing. Many complaints highlight issues like delayed or improper response from servicers, disputes over credit reporting, unauthorized or incorrect debt collection actions, and mishandling of borrower information.\n\nIn summary, the most common issues involve:\n- Trouble with repayment plans and payments\n- Errors or disputes regarding loan status and balances\n- Poor communication or lack of transparency from servicers\n- Unauthorized reporting of defaults or delinquencies\n- Data breaches or privacy violations\n\nIf you need a more specific answer, please let me know!'

In [47]:
semantic_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, yes, some complaints did not get handled in a timely manner, specifically by Nelnet. The complaint from 05/04/25 regarding the transfer of a student loan account to Nelnet mentions that despite acknowledging receipt of the complaint via Certified Mail, Nelnet never responded to the correspondence. Additionally, the complaint from 05/04/25 about bad information about the loan also indicates that Nelnet responded with "Closed with explanation," suggesting the issue was not fully resolved promptly. \n\nWhile the responses to some complaints were marked as "timely" in the data, the fact that there are complaints indicating no response or unresolved issues suggests that not all complaints were handled in a timely or satisfactory manner.'

In [48]:
semantic_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for various reasons, including issues related to miscommunication, administrative errors, or disputes over the legitimacy and status of their loans. Specific reasons highlighted in the complaints include:\n\n- Receiving incorrect or unclear information about loan status, such as being told about forbearance periods that were not documented in writing.\n- Difficulties in accessing account information and poor customer service, leading to frustration and unresolved issues.\n- Disputes over loan legitimacy or obligations, such as claims that loans are unverified or legally void due to administrative or legal issues.\n- Problems with repayment plans, including errors in payment processing or changes to payment amounts that borrowers can handle.\n- Allegations of improper reporting, breach of privacy, or illegal collection practices, which complicated the repayment process.\n- In some cases, borrowers found out their loans were in default due to admini

<div style="background-color: #204B8E; color: white; padding: 10px; border-radius: 5px;">

#### ❓ Question #3:

If sentences are short and highly repetitive (e.g., FAQs), how might semantic chunking behave, and how would you adjust the algorithm?

</div>

<div style="background-color: #204B8E; color: white; padding: 10px; border-radius: 5px;">

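### Answer:
- Short, repetitive sentences produce very similar embeddings, so the distances between neighbouring sentences are small and nearly uniform; the breakpoints then fall almost arbitrarily, and unrelated items can be merged or a question can be split from its answer
- Set a higher minimum sentence count per chunk
- Add rules to keep FAQ items together (like grouping by question-answer pairs) to maintain logical document boundaries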
</div>
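As a rough illustration of the question-answer grouping rule, the sketch below starts a new chunk at every question. It assumes each FAQ item opens with a single sentence ending in `?`; `group_faq_pairs` and the sample sentences are hypothetical and not part of LangChain.

```python
def group_faq_pairs(sentences):
    """Group sentences into chunks that each start with a question."""
    chunks = []
    for sentence in sentences:
        is_question = sentence.strip().endswith("?")
        if is_question or not chunks:
            # A question (or the very first sentence) opens a new FAQ item
            chunks.append([sentence])
        else:
            # Everything else is treated as part of the current answer
            chunks[-1].append(sentence)
    return [" ".join(chunk) for chunk in chunks]

faq_sentences = [
    "How do I make a payment?",
    "Use the online portal.",
    "Autopay is also available.",
    "Can I defer my loan?",
    "Yes, apply through your servicer.",
]

faq_chunks = group_faq_pairs(faq_sentences)
for chunk in faq_chunks:
    print(chunk)
```

A rule like this could run before or instead of the embedding-based split, so that logical boundaries do not depend on small distance differences between near-identical sentences.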

# 🤝 Breakout Room Part #2

<div style="background-color: #204B8E; color: white; padding: 10px; border-radius: 5px;">

### 🏗️ Activity #1

Your task is to evaluate the various Retriever methods against each other. 
You can use the loans or bills dataset.

You are expected to:

1. Create a "golden dataset"
 - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever specific* Ragas metrics
 - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison between them
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

</div>
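Before reaching for Ragas, it can help to recall what retriever-specific metrics measure. The sketch below is a simplified, set-based version of precision and recall over document IDs. It is only an intuition aid: the Ragas metrics are LLM-judged and computed differently, and `precision_recall` and the IDs are hypothetical.

```python
def precision_recall(retrieved_ids, relevant_ids):
    """Return (precision, recall) for one query.

    precision: share of retrieved documents that are relevant
    recall:    share of relevant documents that were retrieved
    """
    retrieved = list(retrieved_ids)
    relevant = set(relevant_ids)
    hits = sum(1 for doc_id in retrieved if doc_id in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents retrieved, 3 documents actually relevant
precision, recall = precision_recall(["a", "b", "c", "d"], ["a", "d", "e"])
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

A retriever that returns more documents tends to gain recall and lose precision, which is one reason to compare the methods at the same `k` where possible.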

In [4]:
# Activity 1: Evaluating Retriever Methods

import os
import openai
from dotenv import load_dotenv
from langchain_community.document_loaders.csv_loader import CSVLoader
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.retrievers import BM25Retriever
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain.retrievers import ParentDocumentRetriever, EnsembleRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient, models
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser
from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_precision,  # Changed from context_relevancy
    context_recall,
    answer_correctness,
    answer_similarity
)
from datasets import Dataset
import pandas as pd
import time
import asyncio

# Load environment variables
load_dotenv(dotenv_path="../.env")
openai.api_key = os.getenv("OPENAI_API_KEY")
os.environ["COHERE_API_KEY"] = os.getenv("COHERE_API_KEY")

# Task 1: Data Preparation
loader = CSVLoader(
    file_path="./data/complaints.csv",
    metadata_columns=[
        "Date received", "Product", "Sub-product", "Issue", "Sub-issue",
        "Consumer complaint narrative", "Company public response", "Company",
        "State", "ZIP code", "Tags", "Consumer consent provided?",
        "Submitted via", "Date sent to company", "Company response to consumer",
        "Timely response?", "Consumer disputed?", "Complaint ID"
    ]
)

loan_complaint_data = loader.load()

for doc in loan_complaint_data:
    doc.page_content = doc.metadata["Consumer complaint narrative"]



In [5]:
# Task 2: Set up embeddings and vectorstore
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Qdrant.from_documents(
    loan_complaint_data,
    embeddings,
    location=":memory:",
    collection_name="LoanComplaints"
)

# Task 3: Create RAG prompt and LLM
RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)
chat_model = ChatOpenAI(model="gpt-4.1-nano")



In [7]:
# Task 4: Create all retrievers

# Naive Retriever
naive_retriever = vectorstore.as_retriever(search_kwargs={"k": 10})

# BM25 Retriever
bm25_retriever = BM25Retriever.from_documents(loan_complaint_data)

# Contextual Compression Retriever
compressor = CohereRerank(model="rerank-v3.5")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)

# Multi-Query Retriever
multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)

# Parent Document Retriever
parent_docs = loan_complaint_data
child_splitter = RecursiveCharacterTextSplitter(chunk_size=750)

client = QdrantClient(location=":memory:")
client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = QdrantVectorStore(
    collection_name="full_documents", 
    embedding=OpenAIEmbeddings(model="text-embedding-3-small"), 
    client=client
)

store = InMemoryStore()
parent_document_retriever = ParentDocumentRetriever(
    vectorstore=parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)
parent_document_retriever.add_documents(parent_docs, ids=None)

# Ensemble Retriever
retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)
ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

# Task 5: Create RAG chains for each retriever
def create_rag_chain(retriever):
    return (
        {"context": itemgetter("question") | retriever, "question": itemgetter("question")}
        | RunnablePassthrough.assign(context=itemgetter("context"))
        | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
    )

naive_chain = create_rag_chain(naive_retriever)
bm25_chain = create_rag_chain(bm25_retriever)
compression_chain = create_rag_chain(compression_retriever)
multi_query_chain = create_rag_chain(multi_query_retriever)
parent_doc_chain = create_rag_chain(parent_document_retriever)
ensemble_chain = create_rag_chain(ensemble_retriever)



In [None]:
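# Task 6: Generate synthetic evaluation dataset using Ragas
# NOTE: assumes Ragas >= 0.2, where synthetic data generation is exposed
# through `ragas.testset.TestsetGenerator`; older releases use a different API.
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.testset import TestsetGenerator

# Wrap the LangChain models so Ragas can drive them
generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4"))
generator_embeddings = LangchainEmbeddingsWrapper(embeddings)

# Generate synthetic questions and reference answers
generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)
testset = generator.generate_with_langchain_docs(
    loan_complaint_data[:50],  # Use first 50 documents for efficiency
    testset_size=20
)
testset_df = testset.to_pandas()

# Create golden dataset (Ragas >= 0.2 names the columns `user_input` and `reference`)
golden_dataset = Dataset.from_dict({
    "question": testset_df["user_input"].tolist(),
    "ground_truth": testset_df["reference"].tolist()
})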

# Task 7: Evaluation function
def evaluate_retriever(chain, dataset, retriever_name):
    print(f"Evaluating {retriever_name}...")
    
    start_time = time.time()
    
    # Generate responses for all questions
    responses = []
    contexts = []
    
    for question in dataset["question"]:
        try:
            result = chain.invoke({"question": question})
            responses.append(result["response"].content)
            # Ragas expects one list of context strings per question
            contexts.append([doc.page_content for doc in result["context"]])
        except Exception as e:
            print(f"Error processing question: {e}")
            responses.append("Error occurred")
            contexts.append([])
    
    end_time = time.time()
    latency = end_time - start_time
    
    # Create evaluation dataset
    eval_dataset = Dataset.from_dict({
        "question": dataset["question"],
        "ground_truth": dataset["ground_truth"],
        "answer": responses,
        "contexts": contexts
    })
    
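    # Run Ragas evaluation
    metrics = [
        faithfulness,
        answer_relevancy,
        context_precision,  # Changed from context_relevancy
        context_recall,
        answer_correctness,
        answer_similarity
    ]
    results = evaluate(eval_dataset, metrics=metrics)
    
    # Average the per-question scores into a plain dict so the reporting code
    # below can iterate over it regardless of the Ragas result type
    scores_df = results.to_pandas()
    mean_scores = {m.name: scores_df[m.name].mean() for m in metrics if m.name in scores_df}
    
    return {
        "retriever": retriever_name,
        "metrics": mean_scores,
        "latency": latency,
        "avg_latency_per_query": latency / len(dataset["question"])
    }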

# Task 8: Run evaluation for all retrievers
retrievers_to_evaluate = [
    (naive_chain, "Naive Retriever"),
    (bm25_chain, "BM25 Retriever"),
    (compression_chain, "Contextual Compression Retriever"),
    (multi_query_chain, "Multi-Query Retriever"),
    (parent_doc_chain, "Parent Document Retriever"),
    (ensemble_chain, "Ensemble Retriever")
]

evaluation_results = []

for chain, name in retrievers_to_evaluate:
    try:
        result = evaluate_retriever(chain, golden_dataset, name)
        evaluation_results.append(result)
        print(f"Completed evaluation for {name}")
    except Exception as e:
        print(f"Error evaluating {name}: {e}")

# Task 9: Compile results and analysis
def compile_results(results):
    print("=== RETRIEVER EVALUATION RESULTS ===\n")
    
    for result in results:
        print(f"{result['retriever']}")
        print(f"   Latency: {result['latency']:.2f}s total, {result['avg_latency_per_query']:.2f}s per query")
        print("   Metrics:")
        for metric_name, metric_value in result['metrics'].items():
            print(f"     {metric_name}: {metric_value:.4f}")
        print()
    
    # Find best performer
    best_performer = max(results, key=lambda x: x['metrics']['answer_correctness'])
    
    print("=== ANALYSIS ===")
    print(f"Best performing retriever: {best_performer['retriever']}")
    print(f"Answer correctness score: {best_performer['metrics']['answer_correctness']:.4f}")
    print(f"Average latency: {best_performer['avg_latency_per_query']:.2f}s per query")
    
    print("\n=== RECOMMENDATIONS ===")
    print("General rules of thumb (not computed from the scores above):")
    print("1. Consider the trade-off between performance and latency")
    print("2. Ensemble retriever typically provides the best overall performance")
    print("3. BM25 is cost-effective for exact keyword matching")
    print("4. Contextual compression improves relevance but increases latency")
    print("5. Multi-query retriever improves recall but at higher cost")

# Run the compilation
compile_results(evaluation_results)

# Task 10: Cost analysis (approximate)
def estimate_costs(results):
    print("\n=== COST ANALYSIS ===")
    
    # Rough, hard-coded cost estimates per 1000 queries (placeholder values, not measured)
    cost_estimates = {
        "Naive Retriever": 0.50,
        "BM25 Retriever": 0.10,
        "Contextual Compression Retriever": 2.00,
        "Multi-Query Retriever": 1.50,
        "Parent Document Retriever": 0.75,
        "Ensemble Retriever": 3.00
    }
    
    for result in results:
        estimated_cost = cost_estimates.get(result['retriever'], 1.00)
        print(f"{result['retriever']}: ~${estimated_cost:.2f} per 1000 queries")
    
    print("\nNote: Costs are approximate and depend on API usage and document size")

estimate_costs(evaluation_results)

In [3]:
!pip install rank_bm25

Collecting rank_bm25
  Obtaining dependency information for rank_bm25 from https://files.pythonhosted.org/packages/2a/21/f691fb2613100a62b3fa91e9988c991e9ca5b89ea31c0d3152a3210344f9/rank_bm25-0.2.2-py3-none-any.whl.metadata
  Downloading rank_bm25-0.2.2-py3-none-any.whl.metadata (3.2 kB)
Downloading rank_bm25-0.2.2-py3-none-any.whl (8.6 kB)
Installing collected packages: rank_bm25
Successfully installed rank_bm25-0.2.2


##### HINTS:

- LangSmith provides detailed information about latency and cost.
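
A minimal sketch of switching on LangSmith tracing before running the evaluation, so that latency and token cost are recorded per chain run. The project name is a hypothetical placeholder, and the API key is assumed to be available in the environment or the `.env` file loaded earlier.

```python
import os

# Turn on LangSmith tracing for every LangChain call made after this point
os.environ["LANGCHAIN_TRACING_V2"] = "true"

# Hypothetical project name; traces from this evaluation are grouped under it
os.environ["LANGCHAIN_PROJECT"] = "advanced-retrieval-evaluation"

# The API key is expected to come from the environment, not from the notebook
if not os.getenv("LANGCHAIN_API_KEY"):
    print("LANGCHAIN_API_KEY is not set; traces will not be uploaded.")
```

With tracing enabled, each retriever's chain can be run under its own project name or tag to keep the latency and cost figures separate.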

<div style="background-color: #204B8E; color: white; padding: 10px; border-radius: 5px;">

### Analysis & Observations:

</div>