# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up Qdrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

We'll also provide our OpenAI key, as well as our Cohere API key.

In [1]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

In [2]:
os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

## Task 2: Data Collection and Preparation

We'll be using our Loan Data once again - this time the structured data available through the CSV!

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

In [3]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

loader = CSVLoader(
    file_path="./data/complaints.csv",
    metadata_columns=[
      "Date received", 
      "Product", 
      "Sub-product", 
      "Issue", 
      "Sub-issue", 
      "Consumer complaint narrative", 
      "Company public response", 
      "Company", 
      "State", 
      "ZIP code", 
      "Tags", 
      "Consumer consent provided?", 
      "Submitted via", 
      "Date sent to company", 
      "Company response to consumer", 
      "Timely response?", 
      "Consumer disputed?", 
      "Complaint ID"
    ]
)

loan_complaint_data = loader.load()

for doc in loan_complaint_data:
    doc.page_content = doc.metadata["Consumer complaint narrative"]

Let's look at an example document to see if everything worked as expected!

In [4]:
loan_complaint_data[0]

Document(metadata={'source': './data/complaints.csv', 'row': 0, 'Date received': '03/27/25', 'Product': 'Student loan', 'Sub-product': 'Federal student loan servicing', 'Issue': 'Dealing with your lender or servicer', 'Sub-issue': 'Trouble with how payments are being handled', 'Consumer complaint narrative': "The federal student loan COVID-19 forbearance program ended in XX/XX/XXXX. However, payments were not re-amortized on my federal student loans currently serviced by Nelnet until very recently. The new payment amount that is effective starting with the XX/XX/XXXX payment will nearly double my payment from {$180.00} per month to {$360.00} per month. I'm fortunate that my current financial position allows me to be able to handle the increased payment amount, but I am sure there are likely many borrowers who are not in the same position. The re-amortization should have occurred once the forbearance ended to reduce the impact to borrowers.", 'Company public response': 'None', 'Company'

## Task 3: Setting up Qdrant!

Now that we have our documents, let's create a Qdrant VectorStore with the collection name "LoanComplaints".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [5]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    loan_complaint_data,
    embeddings,
    location=":memory:",
    collection_name="LoanComplaints"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply treat each complaint as a document, and use cosine similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later
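
To make "cosine similarity, top `k`" concrete, here is a minimal pure-Python sketch of the ranking idea. The toy 2-dimensional vectors and the `cosine_similarity` and `top_k` helpers are made up for illustration; the real retriever delegates this search to the vector store over 1536-dimensional embeddings.

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    # Score every document against the query, then keep the k best indices
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]

# Toy "embeddings" (hypothetical values, not real model output)
toy_docs = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0], [-1.0, 0.0]]
toy_query = [1.0, 0.1]

top_two = top_k(toy_query, toy_docs, k=2)
print(top_two)
```

The document pointing in nearly the same direction as the query ranks first, regardless of vector length.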

In [6]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [7]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-4.1-nano` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [8]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4.1-nano")

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!
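
If the LCEL pipe syntax is new to you, the data flow can be sketched with plain Python functions. This is only an illustration of how the dictionary moves through the steps, not how LangChain implements runnables; `run_chain`, `toy_retriever`, `toy_format_prompt`, and `toy_llm` are hypothetical stand-ins.

```python
def run_chain(inputs, retriever, format_prompt, llm):
    # Step 1: fan the input out into "context" (retrieved docs) and "question"
    question = inputs["question"]
    state = {"context": retriever(question), "question": question}
    # Step 2: format the prompt from both values and call the model,
    # returning the context alongside the response so we can inspect it later
    prompt = format_prompt(**state)
    return {"response": llm(prompt), "context": state["context"]}

# Hypothetical stand-ins for the retriever, prompt template, and chat model
def toy_retriever(question):
    return ["complaint about autopay", "complaint about interest"]

def toy_format_prompt(question, context):
    return f"Query:\n{question}\n\nContext:\n{context}"

def toy_llm(prompt):
    return f"(model answer based on {len(prompt)} prompt characters)"

result = run_chain({"question": "What went wrong?"}, toy_retriever, toy_format_prompt, toy_llm)
print(sorted(result))
```

Swapping the retriever argument is the only change needed to try a different retrieval strategy, which is the pattern the rest of the notebook follows.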

In [9]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and chaining it into the naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # "context"  : is assigned to a RunnablePassthrough object (will not be called or considered in the next step)
    #              by getting the value of the "context" key from the previous step
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry-picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [10]:
naive_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the information provided, one of the most common issues with loans appears to be errors and mismanagement related to student loans. This includes problems such as incorrect or outdated information on credit reports, disagreements over balances and interest rates, issues with loan transfers or ownership without proper notification, and difficulties applying payments correctly. Additionally, borrowers frequently report problems with repayment plans, improper handling of loan data, and inadequate communication from loan servicers.\n\nIn summary, the most common issue with loans, particularly student loans, is mismanagement and mishandling of loan information, leading to errors, mismatched balances, and difficulties in repayment.'

In [11]:
naive_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Yes, based on the provided complaints, some complaints were not handled in a timely manner. Specifically, at least one complaint (complaint ID: 12709087) received a "No" response regarding timely response, indicating it was delayed beyond the expected timeframe. Additionally, multiple complaints mention ongoing issues with responses and resolutions taking longer than expected, such as unresolved account corrections, delayed responses to disputes, and failure to respond within the expected legal or company timeframes.'

In [12]:
naive_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People often failed to pay back their loans due to a combination of factors highlighted in the complaints:\n\n1. **Lack of Clear Communication and Notifications:** Many borrowers were not adequately informed about when their payments would resume, loan transfers, or changes in servicers. For example, some complaints mention not being notified when loans were transferred between companies or when repayment was supposed to restart, leading to unintentional delinquencies.\n\n2. **Difficulty in Managing Payment Plans:** Several complainants reported that their servicers did not offer flexible repayment options or did not help them reevaluate their payments based on their financial situations. This made it challenging for borrowers to meet their repayment obligations.\n\n3. **Financial Hardships and Unmanageable Interest:** Borrowers described prolonged financial hardships, stagnant wages, or economic downturns, making it impossible to afford payments. Additionally, high interest accumulat

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model) which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.
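
For intuition, here is a rough sketch of the textbook Okapi BM25 scoring formula in pure Python. The whitespace tokenizer, the `k1=1.5` and `b=0.75` values, and the `bm25_scores` helper are assumptions for illustration; the `BM25Retriever` we use next may tokenize and parameterize differently.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    # Naive lowercase whitespace tokenization (an assumption for this sketch)
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            # Rare terms get a higher inverse document frequency
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Term frequency saturates (k1) and is normalized by document length (b)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

toy_docs = [
    "my autopay was cancelled without notice",
    "the servicer reported a late payment",
    "interest kept growing during forbearance",
]
scores = bm25_scores("autopay cancelled", toy_docs)
best = scores.index(max(scores))
print(best, scores)
```

Only the first toy document shares any words with the query, so it is the only one with a non-zero score. A paraphrase with no shared words would score zero, which is the key difference from embedding-based retrieval.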

This retriever is very straightforward to set up! Let's see it happen down below!


In [13]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(loan_complaint_data)

We'll construct the same chain - only changing the retriever.

In [14]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [15]:
bm25_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided information, the most common issue with loans appears to involve problems with how lenders or servicers handle dealing with the borrower, particularly regarding miscommunication or mismanagement of payments, loan information, or fees. Specific issues include:\n\n- Disputes over fees charged or incorrect fee application\n- Difficulties in applying payments correctly, especially concerning principal vs. interest\n- Receiving inaccurate or incomplete loan information\n- Problems with loan repayment terms or formulas leading to extended repayment periods\n- Receiving incorrect or misleading information about the status or history of the loan\n\nOverall, issues related to the handling and management of loans, especially concerning communication, fees, and repayment terms, seem to be the most common problems encountered by consumers.'

In [16]:
bm25_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, all the complaints listed indicate that the complaints were handled in a timely manner. Specifically, the responses from the companies were marked as "Yes" for being timely, and they were closed with explanations. Therefore, there is no evidence suggesting that any complaints were not handled in a timely manner.'

In [17]:
bm25_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People often fail to pay back their loans due to several issues highlighted in the complaints. These include problems with the communication and management of their loan accounts, such as being unenrolled from autopay without notice, receiving bad or misleading information about their loans, and experiencing repeated reversals of payments due to errors on the part of the servicers. Additionally, some borrowers have not been properly informed about their payment status or the transfer of their loans to new servicers, which can result in missed payments and negative impacts on their credit scores. In some cases, borrowers also faced difficulties obtaining assistance or clarification when trying to resolve these issues. Overall, failures in communication, errors in account management, and lack of timely, transparent assistance contribute to loan repayment failures.'

It's not clear whether this is better or worse. If only we had a way to test this! (SPOILERS: We do, and the second half of the notebook will cover it.)

#### ❓ Question #1:

Give an example query where BM25 is better than embeddings and justify your answer.


**Answer**


BM25 scores documents by exact term overlap, so it has low semantic ambiguity. It works best when the query contains keywords that appear verbatim in the relevant documents, for example questions about technical specifications or legal or medical documents, e.g. "What are the onset symptoms of <insert awful disease name>?" An embedding model can blur a rare term into its more common neighbors, while BM25 rewards the exact match.

## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.
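
The two-stage idea can be sketched in a few lines of plain Python. The `overlap_score` function below is a deliberately crude, hypothetical stand-in for a reranking model (it is not how Cohere's Rerank works), and `rerank` is a helper made up for this illustration.

```python
def rerank(query, candidates, score_fn, top_n=3):
    # Re-score every candidate against the query and keep only the best top_n
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_n]

def overlap_score(query, doc):
    # Hypothetical scorer: fraction of query words that appear in the document
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words)

# Pretend these came back from a wide first-stage retrieval (e.g. k=10)
candidates = [
    "payment was applied to the wrong loan",
    "the servicer never answered my calls",
    "late payment reported in error",
    "my loan payment was late because autopay failed",
]

kept = rerank("late loan payment", candidates, overlap_score, top_n=2)
print(kept)
```

The first stage favors recall by fetching many candidates; the second stage favors precision by keeping only the few that score best under a more careful comparison.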

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [18]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-v3.5")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)

Let's create our chain again, and see how this does!

In [19]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [20]:
contextual_compression_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issues with loans involve problems related to dealing with lenders or servicers, such as receiving bad or inaccurate information, errors in loan balances, misapplied payments, wrongful denials of payment plans, lack of clear communication, and mishandling of loan data. Specifically, complaints often cite incorrect loan balances, unresolved disputes, unauthorized transfers, and privacy violations.'

In [21]:
contextual_compression_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, at least one complaint was handled in a timely manner. Specifically, the complaint regarding the issue with payments not being applied to the loan account was responded to by the company, EdFinancial Services, with a response marked as "Closed with explanation" and indicated as "Timely response? Yes." \n\nHowever, there are other complaints, such as the one involving delays in response for a request for a account review related to loan servicing issues, which have been open for over a year without resolution. The context suggests that these specific issues have not been handled in a timely manner, as they remain unresolved after extended periods.\n\nTherefore, yes, some complaints were handled in a timely manner, but not all.'

In [22]:
contextual_compression_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans for several reasons, including:\n\n1. Lack of Awareness: Borrowers often did not realize they had to repay the loans or were not properly informed by financial aid officers about the repayment obligations.\n\n2. Poor Communication from Servicers: Many borrowers reported not receiving notifications about payment due dates, changes in loan servicers, or important updates about their loans.\n\n3. Difficulty Managing Payments: Borrowers faced challenges in making payments due to financial hardships, stagnant wages, or increased interest that made payments insufficient to reduce the debt.\n\n4. Accumulation of Interest: Even when payments were made, interest continued to accrue, sometimes negating any progress in paying down the principal, especially when loans were deferred or in forbearance.\n\n5. Confusing or Incorrect Information: Discrepancies in account balances, unclear explanations of interest, and inaccuracies in credit reports contributed to 

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` number of new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context
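
The three steps above can be sketched as follows. Everything here is a toy: `fake_generate_queries` stands in for the LLM rewrite step, `fake_retrieve` for the base retriever, and `multi_query_retrieve` is a hypothetical helper rather than LangChain's implementation.

```python
def multi_query_retrieve(question, generate_queries, retrieve):
    # Step 1: ask the "LLM" for alternative phrasings of the question
    queries = generate_queries(question)
    # Steps 2 and 3: retrieve for each query and keep each document once,
    # preserving the order in which documents were first seen
    seen, unique_docs = set(), []
    for query in queries:
        for doc in retrieve(query):
            if doc not in seen:
                seen.add(doc)
                unique_docs.append(doc)
    return unique_docs

# Hypothetical stand-in for the LLM: fixed rewrites
def fake_generate_queries(question):
    return ["why were payments missed", "reasons for loan default"]

# Hypothetical stand-in for the retriever: a lookup table of results
TOY_INDEX = {
    "why were payments missed": ["doc_autopay", "doc_no_notice"],
    "reasons for loan default": ["doc_no_notice", "doc_hardship"],
}

def fake_retrieve(query):
    return TOY_INDEX.get(query, [])

docs = multi_query_retrieve(
    "Why did people fail to pay back their loans?",
    fake_generate_queries,
    fake_retrieve,
)
print(docs)
```

Note that `doc_no_notice` is returned by both rewrites but appears only once in the final context.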

So, how hard is it to set up? Not bad! Let's see it down below!



In [23]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)

In [24]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [25]:
multi_query_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issues with loans include:\n\n- Dealing with lenders or servicers, especially regarding trouble with payment handling, incorrect reporting, or mismanagement.\n- Problems applying additional payments, with payments often being directed to interest rather than principal, making repayment inefficient.\n- Issues stemming from forbearance steering, where borrowers are unnecessarily placed into forbearance instead of income-driven repayment plans, leading to increased balances due to accumulated interest.\n- Misclassification of loan types and errors in deferment or default status, often without proper communication.\n- Inaccurate or delayed information on loan balances, interest calculations, or account statuses.\n- Harassment or repeated unhelpful contact from servicers.\n- Discrepancies on credit reports and improper reporting of delinquency or default.\n- Mishandling of forgiveness or discharge programs, and confusion over loan transfer or 

In [26]:
multi_query_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, yes, there were several complaints indicating that complaints were not handled in a timely manner. Examples include:\n\n- Complaint at row 95 (submitted on 04/21/25): Dispute about a response taking over a year and no resolution after months of waiting.\n- Complaint at row 816 (submitted on 04/05/25): Acknowledgment that the complaint was still open nearly 18 months later with no resolution.\n- Complaint at row 816 (same as above): Mention of nearly 2.5 years of unresolved issues despite multiple follow-ups.\n- Complaint at row 128 (submitted on 04/04/25): Issues with a delayed response to a previous complaint, still unresolved after over 2-3 weeks.\n- Complaint at row 236 (submitted on 05/08/25): Multiple disputes over months with no resolution, the company failed to process documentation.\n- Complaint at row 423 (submitted on 04/24/25): Issues with unprocessed applications for over a year.\n- Complaint at row 674 (submitted on 05/14/25): Repeated fa

In [27]:
multi_query_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content



#### ❓ Question #2:

Explain how generating multiple reformulations of a user query can improve recall.

**Answer**

Users often ask vague or poorly worded questions. LLMs are good at writing prompts for other LLMs, and we can leverage this to reword the query, add keywords, and introduce synonyms, hypernyms, or related concepts. Each reformulation can match relevant documents that the original wording would miss, so the union of the results improves recall.

## Task 8: Parent Document Retriever

A "small-to-big" strategy - the Parent Document Retriever works based on a simple strategy:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent chunks".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
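
Here is a minimal sketch of "search small, return big" in plain Python. It uses word-count chunks and a crude word-overlap scorer instead of embeddings; `split_words`, `build_index`, `retrieve_parents`, and `overlap_score` are all hypothetical helpers for illustration only.

```python
def split_words(text, chunk_words):
    # Split a text into consecutive chunks of at most chunk_words words
    words = text.split()
    return [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)]

def build_index(parent_docs, chunk_words):
    docstore = {}      # parent_id -> full parent text (the "memory store")
    child_index = []   # (child_chunk, parent_id) pairs (the "vector store")
    for parent_id, text in enumerate(parent_docs):
        docstore[parent_id] = text
        for chunk in split_words(text, chunk_words):
            child_index.append((chunk, parent_id))
    return docstore, child_index

def overlap_score(query, chunk):
    # Crude stand-in for embedding similarity: count shared lowercase words
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve_parents(query, docstore, child_index, k=2):
    # Search over child chunks, then map the best k back to unique parents
    ranked = sorted(child_index, key=lambda pair: overlap_score(query, pair[0]), reverse=True)
    parent_ids = []
    for _, parent_id in ranked[:k]:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [docstore[parent_id] for parent_id in parent_ids]

parents = [
    "My servicer changed without notice. Autopay was cancelled and I missed a payment.",
    "Interest kept accruing during forbearance. My balance is now higher than the original loan.",
]
docstore, child_index = build_index(parents, chunk_words=6)
results = retrieve_parents("autopay cancelled", docstore, child_index, k=2)
print(results)
```

Both top-scoring child chunks belong to the first complaint, so the retriever returns that full complaint once rather than two fragments of it.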

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [28]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = loan_complaint_data
child_splitter = RecursiveCharacterTextSplitter(chunk_size=750)

We'll need to set up a new Qdrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension, you'll need to change this if you're using a different embedding model.

In [29]:
from langchain_qdrant import QdrantVectorStore

client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = QdrantVectorStore(
    collection_name="full_documents", embedding=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)

Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [30]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [31]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [32]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [33]:
parent_document_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided complaints, appears to be problems related to federal student loans, including errors in loan balances, misapplied payments, wrongful denials of payment plans, and issues with loan reporting and information accuracy. Many complainants also report difficulties with loan servicing, such as discrepancies in balances and interest rates, or inaccuracies in credit reports related to their loans.\n\nIn summary, a predominant issue is errors and misconduct in loan servicing, including incorrect reporting, unfair practices, and problems with account management.'

In [34]:
parent_document_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided context, all the complaints listed were marked as not handled in a timely manner. Specifically, the complaints with Complaint IDs 12709087 and 12935889 both have a "Timely response?" marked as "No," indicating that they were not handled promptly. Therefore, yes, some complaints did not get handled in a timely manner.'

In [35]:
parent_document_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People often fail to pay back their loans for various reasons, including severe financial hardship, lack of employment, or mismanagement of educational debt. For example, some individuals experienced financial difficulty after graduating and relying on deferment or forbearance, which increased the overall debt due to accumulated interest. Others suffered from misleading or inappropriate loan servicing practices, such as not being properly informed about repayment obligations or having their payments improperly reported or managed. Additionally, students who attended institutions that faced monetary instability or misrepresented the value of their education may find themselves unable to secure employment sufficient to repay their loans, leading to default or delayed payments.'

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to more effectively answer the question about the performance.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever simply takes two or more retrievers and combines their retrieved documents using a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.
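
As a rough sketch of how weighted Reciprocal Rank Fusion combines rankings: each document earns `weight / (c + rank)` from every list it appears in, and the totals decide the final order. The constant `c=60` follows the linked paper; the `weighted_rrf` helper and the toy rankings are assumptions for illustration, and the exact details inside `EnsembleRetriever` may differ.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, c=60):
    # Sum weight / (c + rank) over every ranking a document appears in
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (c + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Toy rankings from two hypothetical retrievers
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]

fused = weighted_rrf([bm25_ranking, dense_ranking], weights=[0.5, 0.5])
print(fused)
```

Documents that rank well in several lists rise to the top, even if no single retriever put them first. Only ranks are used, so the retrievers' raw scores never need to be on the same scale.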

In [36]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

We'll pack *all* of these retrievers together in an ensemble.

In [37]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [38]:
ensemble_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issues with loans, based on the complaints provided, seem to revolve around:\n\n- Dealing with loan servicer misconduct, including errors in loan balances, misapplied payments, and wrongful denials of payment plans.\n- Problems with information accuracy, such as incorrect loan balances, account status, or credit reporting errors.\n- Lack of communication from lenders or servicers, including not notifying borrowers of transfers, default status, or unpaid balances.\n- Difficulty in obtaining or verifying loan information, including missing documentation like Master Promissory Notes or payment histories.\n- Trouble with payment application, such as being unable to apply payments toward principal, or payments being directed improperly.\n- Unauthorized or improper transfers of loans between agencies or servicers without borrower consent or proper notification.\n- Issues related to loan modification, forgiveness, or discharge, especially when servicers fail to provide clear 

In [39]:
ensemble_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided data, yes, there are complaints indicating delays and issues with handling complaints or requests in a timely manner. Several complaints explicitly mention response times exceeding expected periods or delays in resolving issues. For example:\n\n- One complaint (Complaint ID: 12709087) from a consumer who reported that their application was still unprocessed despite multiple calls and follow-ups, and that they had been told it would take 5-7 business days multiple times, with no resolution as of the complaint date.\n- Other complaints mention response times of over 2-3 weeks, 4+ hour wait times, or delays of 10 days or more for promised actions, such as removal of negative credit reporting or correction of account errors.\n- Some complaints indicate that responses or corrections were "Closed with explanation" or that consumers had to escalate or follow up multiple times without resolution.\n\nTherefore, it is accurate to conclude that some complaints did not get h

In [40]:
ensemble_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for various reasons, often tied to systemic issues and mismanagement by loan servicers and lack of clear communication. Some common reasons include:\n\n1. **Lack of Notification and Communication:** Many borrowers were not properly notified about when their payments were due, changes in their account status, or transfer of loan management to new servicers. This lack of communication sometimes led to missed payments and delinquency notices appearing unexpectedly.\n\n2. **Complex and Opaque Servicing Practices:** Borrowers reported difficulty understanding their loan balances, interest accrual, and their options for repayment or deferment. Repeated steering into forbearance instead of income-driven repayment plans, often without full disclosure, increased their debt burden due to accruing interest.\n\n3. **Inaccurate or Poor Reporting:** Many experienced incorrect reporting of their account status—such as being marked delinquent when their accounts 

## Task 10: Semantic Chunking

While this is not a retrieval method - it *is* an effective way of increasing retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences based on their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
  - `percentile`
  - `standard_deviation`
  - `interquartile`
  - `gradient`
3. Each sequence of related sentences is kept as a document!

Let's see how to implement this!

We'll use the `percentile` thresholding method for this example, which calculates the distances between consecutive sentences and then splits the sequence wherever a distance exceeds a given percentile of all distances.
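
To make the percentile rule concrete, here is a minimal pure-Python sketch of the idea. It is not the `SemanticChunker` implementation: the helper names (`cosine_distance`, `percentile`, `percentile_chunks`), the toy two-dimensional vectors, and the choice of the 95th percentile are all assumptions made for illustration, and the sketch skips details the library may handle, such as how sentences are grouped before embedding.

```python
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def percentile(values, pct):
    """Percentile with linear interpolation between the two closest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower, upper = math.floor(rank), math.ceil(rank)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def percentile_chunks(sentences, vectors, pct=95.0):
    """Start a new chunk wherever the distance between neighbouring
    sentences is strictly greater than the pct-th percentile of all
    neighbouring distances."""
    if len(sentences) < 2:
        return list(sentences)

    distances = [
        cosine_distance(vectors[i], vectors[i + 1])
        for i in range(len(vectors) - 1)
    ]
    threshold = percentile(distances, pct)

    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks


# Toy data: hypothetical sentences with hand-made "embeddings".
toy_sentences = [
    "My loan was transferred.",
    "The new servicer never notified me.",
    "I was charged a late fee.",
    "Separately, my credit report is wrong.",
]
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]

toy_chunks = percentile_chunks(toy_sentences, toy_vectors, pct=95.0)
```

With these toy vectors, only the jump to the last sentence is above the threshold, so the first three sentences stay together and the fourth becomes its own chunk. Lowering `pct` lowers the threshold and therefore produces more, smaller chunks.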

In [41]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)

Now we can split our documents. Note that the cell below only splits the first 20 documents.

In [42]:
semantic_documents = semantic_chunker.split_documents(loan_complaint_data[:20])

Let's create a new vector store.

In [43]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="Loan_Complaint_Data_Semantic_Chunks"
)

We'll use naive retrieval for this example.

In [44]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [45]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [46]:
semantic_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

"Based on the provided data, the most common issue with loans appears to be related to problems with repayment and account management, such as:\n\n- Struggling to repay loans or problems with payment plans.\n- Inaccurate or improper reporting of account status (e.g., being reported as in default when not true).\n- Difficulties with loan servicer communication, transparency, and handling of accounts.\n- Errors in billing or payment processing, including auto-debit issues.\n- Disputes over loan legitimacy, privacy breaches, and unauthorized account activity.\n\nMany complaints highlight issues with loan servicers' handling of repayment, misreporting, and lack of clear communication."

In [47]:
semantic_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, yes, some complaints were not handled in a timely manner. Specifically, in the complaint from 05/04/25 regarding the transfer of an account to Nelnet, the consumer noted that despite multiple letters and acknowledgment of receipt, Nelnet never responded to the complaint nor provided answers. The company\'s response was "Closed with explanation," which suggests the complaint was not addressed satisfactorily or timely for the consumer. \n\nAdditionally, the complaint from 04/13/25 involving a legal dispute over student loan accounts reports violations of federal privacy statutes and illegal breaches, indicating ongoing issues with response and handling.\n\nOverall, at least some complaints experienced delays or lack of proper handling, implying that not all complaints were managed promptly.'

In [48]:
semantic_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People may fail to pay back their loans due to various reasons highlighted in the complaints, such as:\n\n- Lack of proper communication or transparency from lenders or servicers, leading to confusion and stress.\n- Disputes over the legitimacy or accuracy of the loan accounts or reported information.\n- Administrative errors or issues with payment processing, making it difficult to fulfill payment obligations.\n- Legal or contractual issues, such as loan accounts being reported improperly, in default unjustly, or being affected by legal breaches and privacy violations.\n- Difficulties in navigating complex or delayed processes related to loan forbearance, re-amortization, or forgiveness programs.\n- In some cases, borrowers are misled or face delays intentionally designed to cause them to give up, or they are misinformed about their loan status.\n\nOverall, failures to pay can stem from administrative problems, misinformation, legal complications, or a lack of clear and consistent co

#### ❓ Question #3:

If sentences are short and highly repetitive (e.g., FAQs), how might semantic chunking behave, and how would you adjust the algorithm?


**Answer**

If the sentences are very similar, the semantic similarity between neighboring sentences will be consistently high, which makes it difficult to find a threshold that chunks the documents well. If the similarity threshold is too high, the algorithm may create too many small chunks that are nearly identical in content, which causes context repetition in the vector store. If the threshold is too low, the algorithm will produce a few very large chunks that group different FAQs together. This would degrade retrieval, since a single query might return a very large chunk containing multiple irrelevant FAQs. We could mitigate this by adding a chunk-splitting rule based on the structure of the document, e.g. splitting on keywords like "Question" and "Answer", or on paragraph or sentence boundaries.
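
The structural rule suggested above could look roughly like the sketch below. It assumes a hypothetical FAQ format in which every entry starts on a new line with the literal prefix `Question:`; the function name `split_faq` and the sample text are made up for illustration and would need adapting to the real document layout.

```python
import re


def split_faq(text):
    """Split FAQ text into one chunk per entry.

    Assumes (hypothetically) that every entry begins on its own line with
    the prefix "Question:". The zero-width lookahead splits *before* each
    such line, so the prefix stays attached to its chunk.
    """
    parts = re.split(r"(?m)^(?=Question:)", text)
    return [part.strip() for part in parts if part.strip()]


# Hypothetical sample FAQ text.
faq_text = (
    "Question: How do I defer my loan?\n"
    "Answer: Contact your servicer.\n"
    "Question: How do I change my due date?\n"
    "Answer: Use the online portal.\n"
)

faq_chunks = split_faq(faq_text)
```

Each resulting chunk holds exactly one question together with its answer, so a query retrieves a single FAQ entry rather than a large block of near-duplicates.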

# 🤝 Breakout Room Part #2

#### 🏗️ Activity #1

Your task is to evaluate the various Retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
 - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever specific* Ragas metrics
 - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

##### HINTS:

- LangSmith provides detailed information about latency and cost.
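
- If you also want a rough local latency number, a small timing helper is one option. The sketch below is only an illustration: `time_retriever` is a hypothetical helper, and `retrieve` can be any callable that takes a question string (for example a retriever's `invoke` method). It measures wall-clock time only and says nothing about cost.

```python
import time
from statistics import mean


def time_retriever(retrieve, questions):
    """Call `retrieve` once per question and summarise wall-clock latency.

    `retrieve` is any callable that accepts a question string, e.g.
    `some_retriever.invoke` (assumed to be defined elsewhere).
    """
    latencies = []
    for question in questions:
        start = time.perf_counter()
        retrieve(question)
        latencies.append(time.perf_counter() - start)
    return {
        "calls": len(latencies),
        "mean_s": mean(latencies),
        "max_s": max(latencies),
    }


# Stand-in retriever so the sketch runs without any external service.
def fake_retrieve(question):
    return [question.upper()]


timing = time_retriever(fake_retrieve, ["first question", "second question"])
```

Running the same question list through each retriever gives numbers that can sit alongside the Ragas scores when you compare methods.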

In [None]:
### YOUR CODE HERE