# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up QDrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

We'll also provide our OpenAI key, as well as our Cohere API key.

In [1]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

In [2]:
os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

## Task 2: Data Collection and Preparation

We'll be using our Loan Data once again - this time the structured data available through the CSV!

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

In [3]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

loader = CSVLoader(
    file_path="./data/complaints.csv",
    metadata_columns=[
      "Date received", 
      "Product", 
      "Sub-product", 
      "Issue", 
      "Sub-issue", 
      "Consumer complaint narrative", 
      "Company public response", 
      "Company", 
      "State", 
      "ZIP code", 
      "Tags", 
      "Consumer consent provided?", 
      "Submitted via", 
      "Date sent to company", 
      "Company response to consumer", 
      "Timely response?", 
      "Consumer disputed?", 
      "Complaint ID"
    ]
)

loan_complaint_data = loader.load()

for doc in loan_complaint_data:
    doc.page_content = doc.metadata["Consumer complaint narrative"]

Let's look at an example document to see if everything worked as expected!

In [4]:
loan_complaint_data[0]

Document(metadata={'source': './data/complaints.csv', 'row': 0, 'Date received': '03/27/25', 'Product': 'Student loan', 'Sub-product': 'Federal student loan servicing', 'Issue': 'Dealing with your lender or servicer', 'Sub-issue': 'Trouble with how payments are being handled', 'Consumer complaint narrative': "The federal student loan COVID-19 forbearance program ended in XX/XX/XXXX. However, payments were not re-amortized on my federal student loans currently serviced by Nelnet until very recently. The new payment amount that is effective starting with the XX/XX/XXXX payment will nearly double my payment from {$180.00} per month to {$360.00} per month. I'm fortunate that my current financial position allows me to be able to handle the increased payment amount, but I am sure there are likely many borrowers who are not in the same position. The re-amortization should have occurred once the forbearance ended to reduce the impact to borrowers.", 'Company public response': 'None', 'Company'

## Task 3: Setting up QDrant!

Now that we have our documents, let's create a QDrant VectorStore with the collection name "LoanComplaints".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [5]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    loan_complaint_data,
    embeddings,
    location=":memory:",
    collection_name="LoanComplaints"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply treat each complaint as a document, and use cosine similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later

In [6]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})
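To make "cosine similarity, top `k`" a bit more concrete, here is a minimal sketch of the idea in plain Python. It is not what the vector store runs internally: the helper names (`cosine_similarity`, `top_k`) and the tiny 3-dimensional vectors are made up for illustration, and real embeddings from our model have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]

# Toy "embeddings" (hypothetical values, chosen only for this illustration)
toy_doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
toy_query_vec = [0.8, 0.2, 0.0]

nearest = top_k(toy_query_vec, toy_doc_vecs, k=2)
print(nearest)
```

The retriever above does the same kind of ranking, just with `k=10` and with the query and documents embedded by `text-embedding-3-small`.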

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [7]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-4.1-nano` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [8]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4.1-nano")

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [9]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and chaining it into the naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # "context"  : is assigned to a RunnablePassthrough object (will not be called or considered in the next step)
    #              by getting the value of the "context" key from the previous step
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)
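If the LCEL pipe syntax feels opaque, the data flow of the chain above can be mimicked with ordinary functions. This is only a rough analogy under stated assumptions: `toy_retriever`, `toy_llm`, `TOY_TEMPLATE`, and `toy_chain` are hypothetical stand-ins, not LangChain objects, and no API calls are made.

```python
def toy_retriever(question):
    """Hypothetical retriever: returns a fixed list of context strings."""
    return [
        "Complaint A: payment misapplied.",
        "Complaint B: loan transferred without notice.",
    ]

def toy_llm(prompt):
    """Hypothetical model call: just reports how long the prompt was."""
    return f"(answer based on a {len(prompt)}-character prompt)"

TOY_TEMPLATE = "Query:\n{question}\n\nContext:\n{context}\n"

def toy_chain(inputs):
    # Step 1: build "context" and "question" from the incoming "question"
    state = {
        "context": toy_retriever(inputs["question"]),
        "question": inputs["question"],
    }
    # Step 2: pass everything through, re-assigning "context" to itself
    state = {**state, "context": state["context"]}
    # Step 3: format the prompt, call the model, and keep the context next to the response
    prompt = TOY_TEMPLATE.format(**state)
    return {"response": toy_llm(prompt), "context": state["context"]}

result = toy_chain({"question": "What is the most common issue with loans?"})
print(sorted(result.keys()))
```

The key takeaway is the shape of the output: a dictionary with a `"response"` and the `"context"` that produced it, which is handy when we evaluate retrieval later.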

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [10]:
naive_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the available information, one of the most common issues with loans, particularly student loans, appears to be mismanagement and errors related to the handling of loans by servicers and lenders. This includes issues such as errors in loan balances, misapplied payments, incorrect or outdated information on credit reports, unnotified transfers of loan servicing, and problems with how payments are applied—often resulting in increased balances or inaccuracies that negatively impact borrowers’ credit and financial situation. \n\nIn summary, a prevalent issue is the mishandling of loan information and servicing, leading to inaccuracies, confusion, and financial hardship for borrowers.'

In [11]:
naive_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided context, there are several complaints indicating that they were not handled in a timely manner. Specifically:\n\n- One complaint (row 441, received on 03/28/25) was marked as "Timely response?": No, indicating it was not handled promptly.\n- Another complaint (row 517, received on 04/14/25) was marked as "Timely response?": Yes, so it was handled on time.\n- A different complaint (row 400, received on 03/31/25) was marked as "Timely response?": Yes, so it was handled timely.\n- Additionally, some complaints involve ongoing issues with no resolution after extended periods (e.g., complaints open for over a year with no resolution), which suggests delays in handling or responses.\n\nTherefore, yes, some complaints did not get handled in a timely manner.'

In [12]:
naive_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily because of a combination of factors such as lack of clear communication from loan servicers about repayment start dates, transfers of loan management without proper notification, difficulties in managing and applying payments (especially when extra funds are directed only to interest), and complicated or unresponsive payment options including forbearance and deferment that can lead to accumulating interest. Additionally, some borrowers faced financial hardships, stagnant wages, and unmanageable interest accumulation, making it unrealistic for them to keep up with payments. In some cases, borrowers were not adequately informed about their loan status or changes, leading to misunderstandings, delinquency reporting, and credit score drops, further complicating their repayment ability. These issues are often compounded by what borrowers perceive as inadequate support or transparency from the loan agencies or servicers.'

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model) which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.

This retriever is very straightforward to set up! Let's see it happen down below!


In [13]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(loan_complaint_data)
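For intuition about what a BM25-style score looks like, here is a small self-contained sketch. It uses one common variant of the formula (with a non-negative IDF term) and whitespace tokenization. The function name `bm25_scores`, the parameter defaults `k1=1.5` and `b=0.75`, and the toy documents are assumptions for illustration, and the retriever above may compute its scores somewhat differently.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with a BM25-style formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue  # terms missing from the document contribute nothing
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            numerator = term_freq[term] * (k1 + 1)
            denominator = term_freq[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * numerator / denominator
        scores.append(score)
    return scores

# Toy corpus (made-up complaint snippets)
toy_docs = [
    "navient misapplied my payment",
    "my servicer never answered the phone",
    "the loan balance is wrong",
]
scores = bm25_scores("navient payment", toy_docs)
best = max(range(len(scores)), key=scores.__getitem__)
print(best, scores)
```

Notice that documents sharing no words with the query score exactly zero. That is the strength (exact keyword hits) and the weakness (no notion of synonyms) of a sparse method.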

We'll construct the same chain - only changing the retriever.

In [14]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [15]:
bm25_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with loans appears to be problems related to dealing with lenders or servicers, including issues like incorrect or bad information about the loan, difficulty in applying payments properly, and disputes over fees or repayment terms. Several complaints highlight frustrations with loan servicers providing inaccurate or misleading information, mishandling payments, or failing to resolve disputes satisfactorily.'

In [16]:
bm25_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided context, all the complaints listed indicate that the company responded in a timely manner, with responses marked as "Yes" for timely response. However, there is at least one complaint where the consumer experienced difficulty reaching the company by phone and had to hang up after waiting over several minutes, but the company\'s response still states that the response was timely.\n\nTherefore, according to the information given, no complaints appear to have gone unhandled in a truly untimely manner.'

In [17]:
bm25_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans for various reasons, including issues with the proper management of their payment plans, miscommunication from the loan servicers, and difficulties in navigating the repayment process. Specific reasons highlighted in the complaints include:\n\n- Being steered into incorrect types of forbearances or payment plans, which led to increased balances and persistent billing.\n- Loan servicers selling or transferring loans without proper notification, resulting in missed communications about payments or adjustments.\n- Auto-payments being unexpectedly discontinued or not properly set up, causing borrowers to become delinquent without their knowledge.\n- Poor communication from loan servicers, including lack of responses to requests for forbearance or deferment, leading to unpaid bills and negative credit impacts.\n- Errors or delays in processing forbearance requests or other repayment arrangements, resulting in ongoing bills despite the borrowers' effort

It's not clear whether this is better or worse. If only we had a way to test this! (SPOILERS: We do, the second half of the notebook will cover this.)

#### ❓ Question #1:

Give an example query where BM25 is better than embeddings and justify your answer.

- `What are common complaints with the company Navient?`
  - BM25 excels at quickly finding sources that contain very specific keywords (Navient). Embeddings are better for finding things that have similar semantic meaning but may not match exact words.

## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.
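The two bullets above can be sketched as a tiny two-stage pipeline. This is a toy under stated assumptions: `first_stage` ranks by shared words instead of vectors, and `toy_rerank_score` is a hypothetical stand-in for a real reranking model, so only the retrieve-many-then-keep-few shape carries over.

```python
def first_stage(query, docs, k):
    """Cheap, recall-oriented retrieval: rank by number of shared words."""
    query_words = set(query.lower().split())
    ranked = sorted(
        docs,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def toy_rerank_score(query, doc):
    """Stand-in for a reranker: fraction of the document's words found in the query."""
    query_words = set(query.lower().split())
    words = doc.lower().split()
    return sum(word in query_words for word in words) / len(words)

def retrieve_and_compress(query, docs, k, top_n):
    """Retrieve k candidates, re-score them, and keep only the best top_n."""
    candidates = first_stage(query, docs, k)
    reranked = sorted(candidates, key=lambda doc: toy_rerank_score(query, doc), reverse=True)
    return reranked[:top_n]

# Toy corpus (made-up complaint snippets)
toy_docs = [
    "late payment fee charged twice",
    "payment was late because the servicer lost my payment and then blamed me",
    "interest rate increased without notice",
    "late fee",
]
compressed = retrieve_and_compress("late payment fee", toy_docs, k=3, top_n=2)
print(compressed)
```

The second stage both reorders the candidates and shrinks the set, which is why a generous `k` in the first stage is useful.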

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [18]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-v3.5")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)

Let's create our chain again, and see how this does!

In [19]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [20]:
contextual_compression_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided data, the most common issue with loans appears to be problems related to dealing with lenders or servicers, specifically issues such as errors in loan balances, misapplied payments, wrongful denials of payment plans, and mishandling of loan information. Many complaints involve incorrect or mismatched account information, lack of communication or documentation, and improper handling or transfer of loans.'

In [21]:
contextual_compression_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, yes, there are complaints that have not been handled in a timely manner. For example, the complaint regarding the student loan account review and issues with violations of FERPA has been open for nearly 18 months with no resolution, and the complainant has not received a response despite requesting updates over a year ago.'

In [22]:
contextual_compression_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for several reasons, including:\n\n1. Lack of Awareness and Communication: Many borrowers were not properly informed about their obligation to repay loans, leading to unintended non-payment. For example, some were unaware that financial aid they received needed to be repaid, or were not notified when their loans were transferred between servicers.\n\n2. Compounding Interest and Loan Terms: Borrowers often faced increasing balances due to interest accumulating over time, especially when using deferment or forbearance options. Even when they made payments, interest sometimes grew faster than they could pay down the principal, making repayment difficult.\n\n3. Limited or Ineffective Payment Options: Available options like forbearance and deferment extended the repayment period and increased total interest, without reducing the debt burden significantly. Some borrowers found it challenging to increase monthly payments due to financial hardship or day-

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` number of new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context
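The three steps above can be sketched without any LLM at all. Everything here is hypothetical scaffolding: `toy_generate_queries` returns hard-coded rewrites where a real setup would ask a model, and `toy_retrieve` matches on shared words rather than embeddings.

```python
def toy_generate_queries(question):
    """Hypothetical stand-in for the LLM call that rewrites the question."""
    return [
        question,
        "problems repaying loans",
        "missed loan payments reasons",
    ]

def toy_retrieve(query, docs, k=2):
    """Rank documents by word overlap with the query and drop non-matching ones."""
    query_words = set(query.lower().split())
    scored = [
        (len(query_words & set(text.lower().split())), doc_id)
        for doc_id, text in docs.items()
    ]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]

def multi_query_retrieve(question, docs):
    """Retrieve for every query variant and keep each document only once."""
    seen, merged = set(), []
    for query in toy_generate_queries(question):
        for doc_id in toy_retrieve(query, docs):
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged

# Toy corpus (made-up complaint snippets)
toy_docs = {
    "a": "i missed payments after my servicer changed",
    "b": "problems repaying after losing my job",
    "c": "why was my loan transferred",
    "d": "interest rate went up",
}
question = "why did people fail to pay back their loans"
single = toy_retrieve(question, toy_docs)
merged = multi_query_retrieve(question, toy_docs)
print(single, merged)
```

In this toy the original phrasing alone only surfaces one document, while the union over the rewrites surfaces three.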

So, how hard is it to set up? Not bad! Let's see it down below!



In [23]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)

In [24]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [25]:
multi_query_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided complaints, appears to be problems related to servicing and handling of student loans. These include issues such as:\n\n- Struggling to repay or problems with repayment plans\n- Incorrect or inaccurate loan information and account status\n- Problems with loan transfer and servicing mishandling\n- Disputes over loan balances, interest calculations, and credit reporting errors\n- Difficulties with loan forgiveness, discharge, or discharge mismanagement\n- Poor communication from servicers or lenders\n- Issues with payment application and reversals\n- Allegations of predatory practices and unfair treatment\n\nOverall, servicing-related problems—such as mismanagement, incorrect information, poor communication, and improper credit reporting—tend to be the most prevalent issues associated with loans in the complaints.'

In [26]:
multi_query_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints data, many complaints indicate delays, failures, or lack of timely handling by the servicers or agencies. Specifically:\n\n- Several complaints mention that responses or investigations took longer than promised or required by law, such as delays of over 30 days, or no response at all over extended periods (e.g., over a year in some cases).\n- Multiple complaints cite that issues remain unresolved with no timely updates or actions, despite follow-ups from the consumers.\n- Some complaints explicitly mention that the complaint or issue, such as account status corrections, credit reporting corrections, or application processing, are still pending or not handled within the expected timelines.\n\nTherefore, **yes**, many complaints did not get handled in a timely manner, with some cases experiencing significant delays beyond standard response times.'

In [27]:
multi_query_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans for various reasons, including:\n\n- Lack of proper notice or communication from loan servicers about repayment resumption or delinquencies, leading to unintentional delinquency and negative credit reports.\n- Errors or mishandling of the loan accounts, such as incorrect reporting of delinquency, missing payment history, or transfer of loans without notification.\n- Difficulties in managing loan payments due to high interest accrual, especially when interest compounds during forbearance or deferment, making repayment impractical.\n- Unhelpful or dismissive customer service representatives who did not provide clear guidance or assistance in managing repayment options.\n- Situations where borrowers were steered into long-term forbearances or silenced from accessing income-driven repayment plans, resulting in increased debt.\n- Issues related to poor understanding of the loan terms, unexpected increases in debt, or inability to access alternative rep

#### ❓ Question #2:

Explain how generating multiple reformulations of a user query can improve recall.

- The user may not have asked the question very well. Maybe they used bad grammar or weren't very specific. The LLM can generate multiple questions that relate to their original request, and those may be better suited to our data. Because each reformulation can match different documents, combining the results retrieves more (hopefully relevant) documents than any single phrasing would, which is exactly what recall measures. When used to Augment the original request, this may help Generate a more useful final response.

## Task 8: Parent Document Retriever

The Parent Document Retriever uses a "small-to-big" strategy that works as follows:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent documents".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
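Here is a minimal sketch of "search small, return big" in plain Python. It is an illustration only: `split_into_children` splits by words rather than characters, the search uses word overlap instead of embeddings, and the parent texts and ids (`p1`, `p2`) are made up.

```python
def split_into_children(text, chunk_size):
    """Naive fixed-size splitter by words (an assumption made for this toy)."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

# "Docstore": full parent documents keyed by id (made-up examples)
parents = {
    "p1": "my servicer transferred the loan without notice and then the autopay stopped so i was marked delinquent",
    "p2": "the interest rate on my private loan increased and nobody could explain the new balance",
}

# "Vector store": small child chunks, each remembering its parent id
child_index = []
for parent_id, text in parents.items():
    for chunk in split_into_children(text, chunk_size=6):
        child_index.append((chunk, parent_id))

def search_children_return_parents(query, k=2):
    """Rank child chunks against the query, then return their unique parents."""
    query_words = set(query.lower().split())
    ranked = sorted(
        child_index,
        key=lambda item: len(query_words & set(item[0].split())),
        reverse=True,
    )
    parent_ids = []
    for _, parent_id in ranked[:k]:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [parents[parent_id] for parent_id in parent_ids]

results = search_children_return_parents("autopay stopped")
print(results)
```

The match happens on a six-word chunk, but the returned document also carries the surrounding detail (here, that the borrower ended up marked delinquent).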

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [28]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = loan_complaint_data
child_splitter = RecursiveCharacterTextSplitter(chunk_size=750)

We'll need to set up a new QDrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension, you'll need to change this if you're using a different embedding model.

In [29]:
from langchain_qdrant import QdrantVectorStore

client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = QdrantVectorStore(
    collection_name="full_documents", embedding=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)

Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [30]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [31]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [32]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [33]:
parent_document_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided complaints, appears to be problems with federal student loan servicing. Specific sub-issues include errors in loan balances, misapplied payments, wrongful denials of payment plans, inconsistencies and errors in credit reporting, unauthorized interest rate increases, and difficulties in handling loan details such as consolidation and account verification. These issues reflect systemic challenges in the administration and accuracy of loan servicing and reporting.'

In [34]:
parent_document_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, all the complaints listed indicate that they were not handled in a timely manner. Specifically, the complaints about delays and lack of response mention that no one has reached out to the complainants within the expected timeframes (e.g., "I have not heard from anyone," and delays of several days or weeks). Additionally, the complaint from April 11, 2025, about a dispute settlement sent over 30 days ago, also suggests a delay.\n\nTherefore, yes, some complaints did not get handled in a timely manner.'

In [35]:
parent_document_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People often fail to pay back their loans due to various reasons, including financial hardship, lack of proper information or support, and issues related to the management of their loans. For example, some individuals experience severe financial difficulties after graduation, making it challenging to make loan payments, especially when they rely on deferment or forbearance that increases interest. Others face problems like misrepresentations about the value of their education, institutional instability, or the school's failure to provide adequate financial counseling, which can lead to unexpected long-term financial consequences. Additionally, issues such as lack of clear communication from loan servicers, legal challenges, or unverified debt reporting can also contribute to difficulties in repayment."

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to answer the question about performance more rigorously.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever simply takes two or more retrievers and combines their retrieved documents based on a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.
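As a rough sketch of how weighted Reciprocal Rank Fusion combines rankings: each retriever contributes `weight / (rank + c)` for every document it returns, and documents are sorted by their summed score. The function name `weighted_rrf`, the toy rankings, and the choice of `c=60` (the constant used in the linked paper) are assumptions for illustration, not a copy of the library's code.

```python
def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse several ranked lists of document ids into one ranking."""
    scores = {}
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # Higher-ranked documents (small rank) get a larger contribution
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (rank + c)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Toy rankings from two hypothetical retrievers
sparse_ranking = ["a", "b", "c"]
dense_ranking = ["b", "d", "a"]

fused = weighted_rrf([sparse_ranking, dense_ranking], weights=[0.5, 0.5])
print(fused)
```

Documents that several retrievers agree on rise to the top, even when no single retriever ranked them first.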

In [36]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

We'll pack *all* of these retrievers together in an ensemble.

In [37]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [38]:
ensemble_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the complaints provided, appears to be "Dealing with your lender or servicer," which often involves problems such as incorrect handling of payments, bad information about loan terms, issues with loan transfers, or lack of proper communication and transparency. Many complaints also highlight issues like unfair or predatory practices, errors in loan balances, difficulty in obtaining accurate information, and mishandling of loan adjustments or consolidations.'

In [39]:
ensemble_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints data, there are multiple instances indicating that complaints did not get handled in a timely manner. Specifically:\n\n- Some complaints received responses marked as "No" for being timely, such as complaint ID 12935889 and 12654977, which both explicitly state the response was delayed beyond the required timeframe.\n- Other complaints, like ID 12739706 and 12823876, while marked as timely, still show issues with delayed responses or ongoing unresolved issues, indicating a pattern of delays or lack of resolution.\n- There are numerous complaints where consumers experienced significant delays (weeks, months, or years) without resolution, and several complaints include requests for escalation or supervisor review due to inadequate handling.\n\nIn summary, yes, several complaints did not get handled in a timely manner, and multiple complaints indicate ongoing delays or unresolved issues despite attempts by consumers to follow up.'

In [40]:
ensemble_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for various reasons, including:\n\n- Lack of proper notification about payment due dates, account transfers, or changes in loan servicing, leading to unawareness of when payments should start.\n- Difficulties in managing payment plans, often due to limited options offered by lenders, such as only being directed to ineligible forbearance or deferment, which can make interest accrue and increase debt over time.\n- Mismanagement or errors by loan servicers, including incorrect account information, misapplied payments, or failure to communicate important updates, resulting in late payments or negative credit reporting.\n- Financial hardships, such as job loss, low income, health issues, or unforeseen expenses, making timely repayment unfeasible.\n- High interest accrual during forbearance or deferment periods, which can cause the total debt to grow beyond the original balance.\n- Lack of transparency and difficulty in understanding the terms of repaym

## Task 10: Semantic Chunking

While this is not a retrieval method - it *is* an effective way of increasing retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
  - `percentile`
  - `standard_deviation`
  - `interquartile`
  - `gradient`
3. Each sequence of related sentences is kept as a document!

Let's see how to implement this!

We'll use the `percentile` thresholding method for this example, which will:

Calculate all distances between sentences, and then break apart sequences of sentences wherever the distance exceeds a given percentile among all distances.
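To make the `percentile` rule concrete, here is a minimal pure-Python sketch of the idea. It is not LangChain's actual implementation: the helper names (`percentile`, `split_on_percentile`), the toy sentences, and the hand-written distances are all assumptions for illustration. In the real chunker the distances come from sentence embeddings.

```python
def percentile(values, pct):
    """Percentile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def split_on_percentile(sentences, distances, pct):
    """Start a new chunk wherever the gap to the previous sentence exceeds the threshold.

    distances[i] is the (assumed, hand-written) semantic distance between
    sentences[i] and sentences[i + 1].
    """
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks


# Toy data: two large gaps (0.80 and 0.90) separate three topics.
sentences = [
    "Loans accrue interest.",
    "Interest is capitalized.",
    "My servicer changed.",
    "Nobody notified me.",
    "Autopay failed.",
]
distances = [0.10, 0.80, 0.15, 0.90]

chunks = split_on_percentile(sentences, distances, pct=50)
for chunk in chunks:
    print(chunk)
```

With the threshold at the 50th percentile (0.475 for these distances), the two largest gaps become breakpoints and the five sentences fall into three chunks. A higher percentile splits less often and produces larger chunks.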

In [41]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)

Now we can split our documents.

In [42]:
semantic_documents = semantic_chunker.split_documents(loan_complaint_data[:20])

Let's create a new vector store.

In [43]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="Loan_Complaint_Data_Semantic_Chunks"
)

We'll use naive retrieval for this example.

In [44]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [45]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [46]:
semantic_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the complaints provided, appears to be problems related to the handling and servicing of federal student loans. Specific recurring issues include:\n\n- Struggling to repay loans due to administrative or process delays\n- Problems with repayment plans and inaccurate payment calculations\n- Difficulties with loan reporting and errors affecting credit scores\n- Issues with loan forgiveness, discharge, or discharge being delayed or mishandled\n- Poor communication and transparency from loan servicers\n- Errors or disputes related to loan account status and reporting\n\nOverall, many complaints suggest that borrowers frequently experience frustration with how their loans are managed, including delays, inaccuracies, and a lack of clear communication from loan servicers.\n\nIf you have any further questions or need assistance, feel free to ask!'

In [47]:
semantic_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, all of the complaints about handling times were marked as "Yes" under the "Timely response?" field, indicating that the complaints received responses from the companies within an appropriate timeframe. However, many complaints also mention ongoing issues, errors, or failures to resolve the underlying issues despite timely responses, but there is no explicit indication that any complaints were left unhandled or unresolved in a timely manner.\n\nTherefore, the information suggests that while responses were generally timely, some complaints may still involve unresolved concerns. If you are asking whether any complaints *did not* get handled in a timely manner, the data does not provide evidence of such cases.\n\n**Answer:** No, there is no indication from the provided data that any complaints did not get handled in a timely manner.'

In [48]:
semantic_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans for various reasons, including issues with loan servicing, miscommunication, deliberate stalling or delays by loan servicers, errors and improper handling of their payments, legal disputes over the legitimacy of the debt, failures to verify or correct account information, and breaches of privacy and data security. Some borrowers also faced increased payment amounts without proper re-amortization after forbearance ended, or found that their loans were incorrectly reported as delinquent or in default due to administrative errors. Overall, lack of transparency, poor communication, administrative errors, and alleged unfair or deceptive practices contributed to borrowers' inability to successfully repay their loans."

#### ❓ Question #3:

If sentences are short and highly repetitive (e.g., FAQs), how might semantic chunking behave, and how would you adjust the algorithm?

- Short, highly repetitive sentences may all be lumped together into a few big chunks because they don't have much distance from each other. I would try the `gradient` threshold method ("useful when chunks are highly correlated with each other") and/or adjust `breakpoint_threshold_amount` and `min_chunk_size` parameters to get chunks split at an appropriate size.

# 🤝 Breakout Room Part #2

#### 🏗️ Activity #1

Your task is to evaluate the various Retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
 - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever specific* Ragas metrics
 - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison.
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

##### HINTS:

- LangSmith provides detailed information about latency and cost.

In [82]:
# Based on Session 7's notebook (Synthetic_Data_Generation_RAGAS_&_LangSmith)
from uuid import uuid4

# Unique project name so this run's traces are easy to find in LangSmith
projectUuid = f"AIM - Advanced Retrieval tournament - {uuid4().hex[0:8]}"

# Set both the LANGCHAIN_* and LANGSMITH_* spellings of each variable
os.environ["LANGCHAIN_PROJECT"] = projectUuid
os.environ["LANGSMITH_PROJECT"] = projectUuid

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGSMITH_TRACING"] = "true"

os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com/"
os.environ["LANGSMITH_ENDPOINT"] = "https://api.smith.langchain.com/"

# os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("LangChain API Key:")

In [56]:
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package punkt to /Users/kyle/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /Users/kyle/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


True

### Generate synthetic dataset using Ragas (from our original `loan_complaint_data` from CSV)

In [None]:
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper

generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4.1-nano"))
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

In [60]:
from ragas.testset import TestsetGenerator

generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)
dataset = generator.generate_with_langchain_docs(loan_complaint_data[:20], testset_size=10)

Node c61fe901-29fa-4590-bd63-0f5b6f4ebafb does not have a summary. Skipping filtering.
Node 6879458b-cf33-4a51-ab86-1dcc6a3e42d5 does not have a summary. Skipping filtering.
Node 514dbe23-e0ea-4997-b5bb-f0766aef609a does not have a summary. Skipping filtering.
Node 8f5a9901-4331-4ac2-bada-64202412cd16 does not have a summary. Skipping filtering.
Node ed9de2e7-49ef-49ef-b56b-90fa9d2a8845 does not have a summary. Skipping filtering.
Node 70e986c9-610b-4a47-98ee-08b4f2f9028e does not have a summary. Skipping filtering.


In [61]:
dataset.to_pandas()

| | user_input | reference_contexts | reference | synthesizer_name |
|---|---|---|---|---|
| 0 | Can you explain how the COVID-19 forbearance p... | `[The federal student loan COVID-19 forbearance...` | The federal student loan COVID-19 forbearance ... | single_hop_specifc_query_synthesizer |
| 1 | How does Aidvantage handle discrepancies in ID... | `[I submitted my annual Income-Driven Repayment...` | According to the provided context, the borrowe... | single_hop_specifc_query_synthesizer |
| 2 | How does a violation of FERPA, such as the com... | `[My personal and financial data was compromise...` | The context indicates that a violation of FERP... | single_hop_specifc_query_synthesizer |
| 3 | According to the information provided by Stude... | `[According to Studentaid.gov, Im to get an ema...` | According to Studentaid.gov, you are to receiv... | single_hop_specifc_query_synthesizer |
| 4 | How does 15 U.S.C. 16811 relate to the credit ... | `[I am writing to formally dispute inaccurate i...` | 15 U.S.C. 16811, part of the Fair Credit Repor... | single_hop_specifc_query_synthesizer |
| 5 | how aidvantage mess up my student loans and wh... | `[<1-hop>\n\nI submitted my annual Income-Drive...` | Based on the context, I submitted my IDR recer... | multi_hop_specific_query_synthesizer |
| 6 | How does NelNet's handling of auto-debit payme... | `[<1-hop>\n\nI keep setting up auto-debit with ...` | The context describes multiple instances where... | multi_hop_specific_query_synthesizer |
| 7 | How does the continued reporting and collectio... | `[<1-hop>\n\nThis account was transferred to Ne...` | The continued reporting and collection of stud... | multi_hop_specific_query_synthesizer |
| 8 | How do federal statutes like 15 U.S.C. 1681i a... | `[<1-hop>\n\nThis is a formal legal demand for ...` | The context illustrates that under 15 U.S.C. 1... | multi_hop_specific_query_synthesizer |
| 9 | How can I address the issue of my student loan... | `[<1-hop>\n\nI set up autopay with AidVantage, ...` | Based on the information provided, you set up ... | multi_hop_specific_query_synthesizer |


In [63]:
from langsmith import Client

client = Client()

dataset_name = "Loan tournament Synthetic Data"

langsmith_dataset = client.create_dataset(
    dataset_name=dataset_name,
    description="Loan tournament Synthetic Data"
)

In [64]:
for data_row in dataset.to_pandas().iterrows():
  client.create_example(
      inputs={
          "question": data_row[1]["user_input"]
      },
      outputs={
          "answer": data_row[1]["reference"]
      },
      metadata={
          "context": data_row[1]["reference_contexts"]
      },
      dataset_id=langsmith_dataset.id
  )

In [70]:
# Use Ragas for eval (from Session 8 notebook Evaluating_RAG_with_RAGAS)
# Ignore the previous 2 cells: they set up a LangSmith experiment, which gives the wrong metrics here
import time
import copy

In [84]:
# Tracing fix from Session 4 Discord
from langchain.callbacks import LangChainTracer
from langchain.schema.runnable import RunnableConfig
tracer = LangChainTracer(project_name=os.environ["LANGSMITH_PROJECT"])

### Run SDG examples through each retriever (and trace with tags in LangSmith so we can compare latency and cost)

In [None]:
naive_dataset = copy.deepcopy(dataset)

for test_row in naive_dataset:
  response = naive_retrieval_chain.invoke({"question" : test_row.eval_sample.user_input}, {
        "tags" : ["Naive retriever"],
        "callbacks": [tracer]
    })
  test_row.eval_sample.response = response["response"]
  test_row.eval_sample.retrieved_contexts = [context.page_content for context in response["context"]]
  time.sleep(2) # To try to avoid rate limiting.

In [87]:
bm25_dataset = copy.deepcopy(dataset)

for test_row in bm25_dataset:
  response = bm25_retrieval_chain.invoke({"question" : test_row.eval_sample.user_input}, {
        "tags" : ["BM25 retriever"],
        "callbacks": [tracer]
    })
  test_row.eval_sample.response = response["response"]
  test_row.eval_sample.retrieved_contexts = [context.page_content for context in response["context"]]
  time.sleep(2) # To try to avoid rate limiting.

In [88]:
compression_dataset = copy.deepcopy(dataset)

for test_row in compression_dataset:
  response = contextual_compression_retrieval_chain.invoke({"question" : test_row.eval_sample.user_input}, {
        "tags" : ["Compression retriever"],
        "callbacks": [tracer]
    })
  test_row.eval_sample.response = response["response"]
  test_row.eval_sample.retrieved_contexts = [context.page_content for context in response["context"]]
  time.sleep(2) # To try to avoid rate limiting.

In [89]:
multi_query_dataset = copy.deepcopy(dataset)

for test_row in multi_query_dataset:
  response = multi_query_retrieval_chain.invoke({"question" : test_row.eval_sample.user_input}, {
        "tags" : ["Multi query retriever"],
        "callbacks": [tracer]
    })
  test_row.eval_sample.response = response["response"]
  test_row.eval_sample.retrieved_contexts = [context.page_content for context in response["context"]]
  time.sleep(2) # To try to avoid rate limiting.

In [90]:
parent_document_dataset = copy.deepcopy(dataset)

for test_row in parent_document_dataset:
  response = parent_document_retrieval_chain.invoke({"question" : test_row.eval_sample.user_input}, {
        "tags" : ["Parent retriever"],
        "callbacks": [tracer]
    })
  test_row.eval_sample.response = response["response"]
  test_row.eval_sample.retrieved_contexts = [context.page_content for context in response["context"]]
  time.sleep(2) # To try to avoid rate limiting.

In [None]:
ensemble_dataset = copy.deepcopy(dataset)

for test_row in ensemble_dataset:
  response = ensemble_retrieval_chain.invoke({"question" : test_row.eval_sample.user_input}, {
        "tags" : ["Ensemble retriever"],
        "callbacks": [tracer]
    })
  test_row.eval_sample.response = response["response"]
  test_row.eval_sample.retrieved_contexts = [context.page_content for context in response["context"]]
  time.sleep(2) # To try to avoid rate limiting.
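
The six loops above differ only in the chain and the tag, so they could be folded into one helper. The following is a rough sketch under stated assumptions: `run_chain_over_testset` is a hypothetical name, and `FakeChain` and the `SimpleNamespace` rows are stand-ins so the snippet runs without an API key. In the notebook you would pass the real chains, the Ragas `dataset`, and `[tracer]`.

```python
import copy
import time
from types import SimpleNamespace


def run_chain_over_testset(chain, testset, tag, callbacks=None, pause_seconds=2.0):
    """Return a copy of `testset` with response and retrieved_contexts filled in."""
    filled = copy.deepcopy(testset)
    for test_row in filled:
        result = chain.invoke(
            {"question": test_row.eval_sample.user_input},
            {"tags": [tag], "callbacks": callbacks or []},
        )
        # Store the message text directly, so the response is already a string.
        test_row.eval_sample.response = result["response"].content
        test_row.eval_sample.retrieved_contexts = [
            doc.page_content for doc in result["context"]
        ]
        time.sleep(pause_seconds)  # To try to avoid rate limiting.
    return filled


class FakeChain:
    """Stand-in for a retrieval chain; mimics the {"response", "context"} output shape."""

    def invoke(self, inputs, config=None):
        return {
            "response": SimpleNamespace(content="answer to " + inputs["question"]),
            "context": [SimpleNamespace(page_content="doc A")],
        }


fake_testset = [
    SimpleNamespace(
        eval_sample=SimpleNamespace(
            user_input="q1", response=None, retrieved_contexts=None
        )
    )
]

filled = run_chain_over_testset(
    FakeChain(), fake_testset, tag="Fake retriever", pause_seconds=0
)
print(filled[0].eval_sample.response)
print(filled[0].eval_sample.retrieved_contexts)
```

Because the helper works on a deep copy, the original test set stays untouched and can be reused for every retriever.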

In [115]:
naive_dataset.samples[0].eval_sample.response

"The COVID-19 forbearance program for federal student loans was an emergency measure that paused loan payments and set interest accrual to zero, providing temporary relief to borrowers during the pandemic. This meant that during the forbearance period, borrowers did not have to make payments, and interest was not accruing on their loans, which helped reduce financial stress.\n\nHowever, once the forbearance ended, many borrowers faced challenges related to re-amortization of their loan payments. Re-amortization is the process of recalculating monthly payments based on the current loan balance, interest rate, and term. It is important because it ensures that the repayment plan accurately reflects the borrower's loan status after the forbearance or any deferment period. Without timely re-amortization, borrowers might see their monthly payments increase significantly, as in the complaint where a borrower’s payment nearly doubled due to delayed re-amortization after the forbearance ended.\

In [94]:
bm25_dataset.samples[0].eval_sample.response

AIMessage(content='The COVID-19 forbearance program for federal student loans was designed to temporarily pause loan payments and interest accumulation during the pandemic, providing relief to borrowers by suspending their obligations without penalties. However, once the forbearance ended, many borrowers faced challenges due to a lack of automatic re-amortization of their loans, which is the process of recalculating their monthly payments based on the remaining principal and the new repayment terms.\n\nRe-amortization is important because it ensures that when the forbearance ends, the borrower’s payment amount is adjusted to reflect their current loan balance and repayment schedule. Without re-amortization, some borrowers may experience sudden and significant increases in their monthly payments, which can cause financial hardship. For example, as noted in complaints, some borrowers saw their payments nearly double after forbearance, even though their circumstances may have made such pa

In [None]:
# Response object needs to be a string (not an AIMessage object) for pandas
# (should've used response["response"].content in the loops above)
all_datasets = [
    naive_dataset,
    bm25_dataset,
    compression_dataset,
    multi_query_dataset,
    parent_document_dataset,
    ensemble_dataset,
]
for retriever_dataset in all_datasets:
    for test_row in retriever_dataset:
        test_row.eval_sample.response = test_row.eval_sample.response.content

### Run RAGAS evaluations to get retrieval-specific metrics

In [101]:
from ragas import EvaluationDataset

In [97]:
from ragas.llms import LangchainLLMWrapper
from ragas import evaluate, RunConfig

custom_run_config = RunConfig(timeout=360)

evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4.1-mini"))

In [98]:
# Only use metrics that relate to retrieval
from ragas.metrics import LLMContextPrecisionWithReference, LLMContextRecall, ContextEntityRecall

# NoiseSensitivity
# ResponseRelevancy, 
# Faithfulness, 
# FactualCorrectness, 

In [117]:
naive_evaluation_dataset = EvaluationDataset.from_pandas(naive_dataset.to_pandas())
result = evaluate(
    dataset=naive_evaluation_dataset,
    metrics=[LLMContextPrecisionWithReference(), LLMContextRecall(), ContextEntityRecall()],
    llm=evaluator_llm,
    run_config=custom_run_config
)
result

{'llm_context_precision_with_reference': 0.9194, 'context_recall': 0.8567, 'context_entity_recall': 0.4665}

In [119]:
bm25_evaluation_dataset = EvaluationDataset.from_pandas(bm25_dataset.to_pandas())
result = evaluate(
    dataset=bm25_evaluation_dataset,
    metrics=[LLMContextPrecisionWithReference(), LLMContextRecall(), ContextEntityRecall()],
    llm=evaluator_llm,
    run_config=custom_run_config
)
result

{'llm_context_precision_with_reference': 0.8611, 'context_recall': 0.7367, 'context_entity_recall': 0.3992}

In [120]:
compression_evaluation_dataset = EvaluationDataset.from_pandas(compression_dataset.to_pandas())
result = evaluate(
    dataset=compression_evaluation_dataset,
    metrics=[LLMContextPrecisionWithReference(), LLMContextRecall(), ContextEntityRecall()],
    llm=evaluator_llm,
    run_config=custom_run_config
)
result

{'llm_context_precision_with_reference': 1.0000, 'context_recall': 0.8090, 'context_entity_recall': 0.4572}
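
A context precision of exactly 1.0 can look suspicious. As far as I understand the Ragas documentation, the metric averages precision@k over the ranks that hold a relevant chunk, so it reaches 1.0 whenever every relevant chunk is ranked ahead of every irrelevant one, even if some irrelevant chunks were retrieved. Here is a minimal sketch of that arithmetic, with hand-written relevance verdicts standing in for the LLM judge. `context_precision` is a hypothetical helper for illustration, not the Ragas API.

```python
def context_precision(verdicts):
    """verdicts[i] is 1 if the chunk at rank i + 1 was judged relevant, else 0."""
    relevant_seen = 0
    weighted_sum = 0.0
    for rank, verdict in enumerate(verdicts, start=1):
        if verdict:
            relevant_seen += 1
            # precision@rank, counted only at ranks that hold a relevant chunk
            weighted_sum += relevant_seen / rank
    return weighted_sum / relevant_seen if relevant_seen else 0.0


print(context_precision([1, 1, 0]))  # relevant chunks ranked first
print(context_precision([0, 1, 1]))  # an irrelevant chunk ranked first
```

The first call gives 1.0 and the second about 0.58. This makes a perfect score plausible for a reranker that keeps a small set of chunks and orders them well.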

In [121]:
multi_query_evaluation_dataset = EvaluationDataset.from_pandas(multi_query_dataset.to_pandas())
result = evaluate(
    dataset=multi_query_evaluation_dataset,
    metrics=[LLMContextPrecisionWithReference(), LLMContextRecall(), ContextEntityRecall()],
    llm=evaluator_llm,
    run_config=custom_run_config
)
result

Exception raised in Job[2]: TimeoutError()


{'llm_context_precision_with_reference': 0.8726, 'context_recall': 0.8967, 'context_entity_recall': 0.4728}

In [122]:
parent_document_evaluation_dataset = EvaluationDataset.from_pandas(parent_document_dataset.to_pandas())
result = evaluate(
    dataset=parent_document_evaluation_dataset,
    metrics=[LLMContextPrecisionWithReference(), LLMContextRecall(), ContextEntityRecall()],
    llm=evaluator_llm,
    run_config=custom_run_config
)
result

{'llm_context_precision_with_reference': 0.9833, 'context_recall': 0.7767, 'context_entity_recall': 0.4358}

In [123]:
ensemble_evaluation_dataset = EvaluationDataset.from_pandas(ensemble_dataset.to_pandas())
result = evaluate(
    dataset=ensemble_evaluation_dataset,
    metrics=[LLMContextPrecisionWithReference(), LLMContextRecall(), ContextEntityRecall()],
    llm=evaluator_llm,
    run_config=custom_run_config
)
result

Exception raised in Job[17]: LLMDidNotFinishException(The LLM generation was not completed. Please increase try increasing the max_tokens and try again.)


KeyboardInterrupt: 

Exception raised in Job[26]: TimeoutError()


### Analysis

| Retrieval Method | Cost (total tokens) | Median Latency (sec) | llm_context_precision_with_reference | context_recall | context_entity_recall |
|---|---|---|---|---|---|
| Naive Retrieval | 91,537 | 4.84 | 0.9194 | 0.8567 | 0.4665 |
| BM25 | 47,524 | 4.98 | 0.8611 | 0.7367 | 0.3992 |
| Contextual Compression | 26,520 | 8.78 | 1.0000 | 0.8090 | 0.4572 |
| Multi-Query | 113,330 | 13.93 | 0.8726 | 0.8967 | 0.4728 |
| Parent Document | 43,969 | 8.31 | 0.9833 | 0.7767 | 0.4358 |
| Ensemble | 185,663 | 14.17 | N/A | N/A | N/A |




Which retriever is best depends on whether our use case and budget prioritize cost, latency, or performance, but overall I'd say `Contextual Compression` is the best option for this data.

I'd want to verify that costs were calculated correctly (did they include our Cohere API usage?), but from the data I collected, Contextual Compression had by far the lowest cost (26,520 tokens), decent latency (8.78 sec, much better than Multi-Query or Ensemble, though slower than Naive Retrieval and BM25), and very good performance. Its context precision was 1.0, which is either excellent or a mistake, and its context recall (0.8090) and context entity recall (0.4572) both ranked third of the five scored methods, close to the leaders.

In a production analysis, I'd definitely evaluate Ensemble more. Its Ragas evaluation did not complete (it hit an `LLMDidNotFinishException` and timeouts before the run was interrupted), so I'd use a smaller dataset or a higher token limit to see whether its performance justifies its high cost.
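
To double-check the comparison, the numbers from the table above can be ranked per column. This is only a convenience sketch: the values are copied by hand from the table, `rank_by` is a hypothetical helper, and the Ensemble metrics are `None` because that evaluation did not finish.

```python
# Values copied from the analysis table; None marks the unfinished Ensemble evaluation.
results = {
    "Naive Retrieval": {"tokens": 91537, "latency": 4.84, "precision": 0.9194, "recall": 0.8567, "entity_recall": 0.4665},
    "BM25": {"tokens": 47524, "latency": 4.98, "precision": 0.8611, "recall": 0.7367, "entity_recall": 0.3992},
    "Contextual Compression": {"tokens": 26520, "latency": 8.78, "precision": 1.0000, "recall": 0.8090, "entity_recall": 0.4572},
    "Multi-Query": {"tokens": 113330, "latency": 13.93, "precision": 0.8726, "recall": 0.8967, "entity_recall": 0.4728},
    "Parent Document": {"tokens": 43969, "latency": 8.31, "precision": 0.9833, "recall": 0.7767, "entity_recall": 0.4358},
    "Ensemble": {"tokens": 185663, "latency": 14.17, "precision": None, "recall": None, "entity_recall": None},
}


def rank_by(column, higher_is_better):
    """Return method names ordered best-first for one column, skipping missing values."""
    scored = {name: row[column] for name, row in results.items() if row[column] is not None}
    return sorted(scored, key=scored.get, reverse=higher_is_better)


for column, higher_is_better in [
    ("tokens", False),
    ("latency", False),
    ("precision", True),
    ("recall", True),
    ("entity_recall", True),
]:
    print(f"{column:>13}: {' > '.join(rank_by(column, higher_is_better))}")
```

Contextual Compression comes first on cost and precision, third on both recall metrics, and fourth on latency, which matches the trade-off described above.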