# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up Qdrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

We'll also provide our OpenAI key, as well as our Cohere API key.

In [1]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

In [2]:
os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

## Task 2: Data Collection and Preparation

We'll be using our Loan Data once again - this time the structured data available through the CSV!

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

In [3]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

loader = CSVLoader(
    file_path="./data/complaints.csv",
    metadata_columns=[
      "Date received", 
      "Product", 
      "Sub-product", 
      "Issue", 
      "Sub-issue", 
      "Consumer complaint narrative", 
      "Company public response", 
      "Company", 
      "State", 
      "ZIP code", 
      "Tags", 
      "Consumer consent provided?", 
      "Submitted via", 
      "Date sent to company", 
      "Company response to consumer", 
      "Timely response?", 
      "Consumer disputed?", 
      "Complaint ID"
    ]
)

loan_complaint_data = loader.load()

for doc in loan_complaint_data:
    doc.page_content = doc.metadata["Consumer complaint narrative"]

Let's look at an example document to see if everything worked as expected!

In [4]:
loan_complaint_data[0]

Document(metadata={'source': './data/complaints.csv', 'row': 0, 'Date received': '03/27/25', 'Product': 'Student loan', 'Sub-product': 'Federal student loan servicing', 'Issue': 'Dealing with your lender or servicer', 'Sub-issue': 'Trouble with how payments are being handled', 'Consumer complaint narrative': "The federal student loan COVID-19 forbearance program ended in XX/XX/XXXX. However, payments were not re-amortized on my federal student loans currently serviced by Nelnet until very recently. The new payment amount that is effective starting with the XX/XX/XXXX payment will nearly double my payment from {$180.00} per month to {$360.00} per month. I'm fortunate that my current financial position allows me to be able to handle the increased payment amount, but I am sure there are likely many borrowers who are not in the same position. The re-amortization should have occurred once the forbearance ended to reduce the impact to borrowers.", 'Company public response': 'None', 'Company'

## Task 3: Setting up Qdrant!

Now that we have our documents, let's create a Qdrant VectorStore with the collection name "LoanComplaints".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [5]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    loan_complaint_data,
    embeddings,
    location=":memory:",
    collection_name="LoanComplaints"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply treat each complaint as a document, and use cosine similarity to fetch the 10 most relevant documents.
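To make "cosine similarity, top `k`" concrete, here is a minimal sketch in plain Python. It is not how Qdrant implements search; the 3-dimensional vectors and the names `toy_doc_vecs`, `toy_query_vec`, and `top_k` are made up for illustration (our real embeddings have 1536 dimensions).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]

# Hypothetical toy "embeddings" standing in for real document vectors.
toy_doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
toy_query_vec = [1.0, 0.05, 0.0]

# Indices of the two documents pointing in nearly the same direction as the query.
toy_top_two = top_k(toy_query_vec, toy_doc_vecs, k=2)
```

A real vector store avoids scoring every document one by one, but the ranking it returns follows the same idea.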

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later

In [6]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [7]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-4.1-nano` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [8]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4.1-nano")

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [9]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and piping it into naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # "context"  : is assigned to a RunnablePassthrough object (will not be called or considered in the next step)
    #              by getting the value of the "context" key from the previous step
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [10]:
naive_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issues with loans appear to be problems related to servicing and information accuracy. These include:\n\n- Errors in loan balances, misapplied payments, and wrongful denials of payment plans.\n- Incorrect or outdated information on credit reports, such as inappropriate delinquency status.\n- Difficulties in applying payments correctly, especially to principal versus interest.\n- Lack of transparency and communication regarding loan transfer, loan status, and interest accrual.\n- Disputes over fees, interest rates, and terms due to mismanagement or lack of proper notification.\n- Unlawful or unethical practices, such as misleading steering into forbearances or improper handling of loan data.\n\nTherefore, a key recurring issue is the mishandling and miscommunication related to loan servicing, which often leads to errors in account information, incorrect reporting, and payment application problems.\n\nIf I had to summarize the most common i

In [11]:
naive_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Yes, some complaints did not get handled in a timely manner. Specifically, at least one complaint received on 03/28/25 (Complaint ID: 12709087) was marked as "No" for timely response, indicating it was not addressed within the expected timeframe. Additionally, multiple complaints, such as the one received on 04/24/25 (Complaint ID: 13160766), were marked as "Yes" for timely response, showing they were handled promptly. Overall, there are instances where complaints were not handled in a timely manner.'

In [12]:
naive_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily due to several interconnected issues highlighted in the complaints:\n\n1. **Lack of Clear Information and Communication:** Many borrowers were not adequately informed about when and how their payments were to resume, especially after transferring loan servicers or when loans entered repayment following periods of forbearance or deferment. Several complaints mention not receiving proper notifications about loan start dates, payment due dates, or changes in loan servicers, leading to unintentional delinquencies.\n\n2. **Difficulty in Managing Payments and Repayment Plans:** Borrowers reported that available options, such as lowering payments through forbearance or deferment, resulted in accumulating interest, which increased their total debt. Others found it impossible to apply additional payments to principal, feeling that repayment methods were intentionally structured to prolong debt or maximize interest paid.\n\n3. **Interest Accumulat

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model) which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.
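For intuition, here is a rough sketch of one common textbook form of the BM25 score, written in plain Python. The whitespace tokenizer, the `k1` and `b` values, and the toy documents are assumptions made for illustration; the library behind `BM25Retriever` may tokenize and weight terms differently.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with a textbook BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue  # terms missing from the document contribute nothing
            # Rare terms get a larger inverse-document-frequency weight.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            tf = term_freq[term]
            # Longer-than-average documents are penalized via b.
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores

# Hypothetical toy corpus.
toy_docs = [
    "my payment was misapplied by the servicer",
    "the servicer never sent a notice",
    "interest kept growing during forbearance",
]
toy_bm25 = bm25_scores("misapplied payment", toy_docs)
```

Only the first toy document shares words with the query, so it is the only one with a non-zero score. No embeddings are involved at any point.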

This retriever is very straightforward to set up! Let's see it happen down below!


In [18]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(loan_complaint_data)

We'll construct the same chain - only changing the retriever.

In [19]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [20]:
bm25_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with loans appears to be problems related to dealing with lenders or servicers, such as disagreements over fees, difficulty in applying payments correctly, receiving incorrect or bad information about the loan, and challenges in resolving repayment or loan details. Specifically, complaints frequently mention issues like disputes over fees, improper application of payments, incorrect loan information, and lack of trust due to dishonesty or poor communication from loan servicers.'

In [21]:
bm25_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, all the complaints listed indicate that the company responded in a timely manner, with responses marked as "Yes" for timely response. There are no indications of complaints that were not handled in a timely manner.'

In [22]:
bm25_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for various reasons, including issues with loan management and communication. Specifically, some common reasons include:\n\n1. Problems with payment plans or forbearance: Borrowers experienced difficulties with their payment plans, sometimes being steered into incorrect forbearance options, which affected their ability to make or understand payments properly.\n2. Lack of communication from loan servicers: Borrowers reported not receiving important notices, such as emails or mail about loan transfers, billing resumption, or overdue status, leading to unintentional missed payments.\n3. Errors or issues with payment processing: Some borrowers faced repeated payment reversals or technical problems that prevented their payments from being successfully processed.\n4. Unawareness of account status: Borrowers were often unaware of changes like loan transfers to new servicers or discontinuation of autopay, which resulted in missed or late payments.\n5. Lac

It's not clear whether this is better or worse. If only we had a way to test this! (SPOILERS: We do - the second half of the notebook will cover this.)

#### ❓ Question #1:

Give an example query where BM25 is better than embeddings and justify your answer.

## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.
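The two steps above can be sketched as a generic "retrieve wide, then rerank narrow" function. This is only an illustration of the control flow: the two scoring functions are hypothetical stand-ins (word overlap instead of vector search, overlap density instead of a learned rerank model), and the toy complaints are made up.

```python
def retrieve_then_rerank(query, docs, first_stage_score, rerank_score, k, top_n):
    """Stage 1: keep the k best docs by a cheap score. Stage 2: re-score those and keep top_n."""
    candidates = sorted(docs, key=lambda d: first_stage_score(query, d), reverse=True)[:k]
    reranked = sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)
    return reranked[:top_n]

def word_overlap(query, doc):
    """Hypothetical first-stage score: number of distinct words shared with the query."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def overlap_density(query, doc):
    """Hypothetical rerank score: shared words as a fraction of the document's length."""
    return word_overlap(query, doc) / len(doc.split())

# Hypothetical toy corpus.
toy_complaints = [
    "late payment fee charged twice on my loan account this month again",
    "late payment fee",
    "the weather is nice",
    "payment portal login fails today",
]

toy_reranked = retrieve_then_rerank(
    "late payment fee", toy_complaints, word_overlap, overlap_density, k=3, top_n=2
)
```

The first stage drops the unrelated document, and the second stage reorders the survivors, so the final context is both smaller and better ordered than the initial candidate set.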

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [23]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-v3.5")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)

Let's create our chain again, and see how this does!

In [24]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [25]:
contextual_compression_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided complaints, appears to be problems related to dealing with lenders or servicers, such as receiving bad or inaccurate information, errors in loan balances, misapplied payments, and issues with loan transfer and documentation. Specifically, many complaints highlight difficulties in understanding or verifying loan balances, managing interest accrual, and dealing with miscommunication or lack of transparency from loan servicers.'

In [26]:
contextual_compression_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, there are indications that some issues remained unresolved for extended periods. For example, one complaint mentions that it has been nearly 18 months without resolution, and the complainant is still awaiting a response and resolution regarding their account review and violations. Additionally, multiple complaints indicate that the issues have been ongoing for over a few weeks or months, despite responses indicating they were closed with explanation and marked as timely. \n\nWhile the responses say the complaints were handled "timely," the persistence of unresolved issues suggests that some complaints did not get fully resolved in a timely manner from the complainants\' perspectives.\n\nTherefore, yes, some complaints did not get handled in a timely manner.'

In [27]:
contextual_compression_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for several reasons, including:\n\n1. **Lack of understanding and information:** Borrowers were often unaware that they needed to repay their loans, as they were not informed by financial aid officers about the repayment responsibilities. Some did not receive clear or any communication from loan servicers about their repayment obligations, upcoming payments, or changes in loan ownership.\n\n2. **Administrative and communication issues:** There were instances where loan servicers, such as Nelnet and EdFinancial Services, failed to notify borrowers about payment due dates, loan ownership transfers, or account status updates. This lack of communication led borrowers to be unaware of their obligations or late payments.\n\n3. **Difficulty with payment options:** Borrowers were limited to options like forbearance or deferment, which resulted in accruing interest. Lowering payments extended repayment periods and increased total interest owed, making it h

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context.
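Steps 2 and 3 boil down to a de-duplicated union of results. Here is a minimal sketch, assuming the reformulated queries already exist; the queries, the lookup-table "retriever", and the document ids are all hypothetical stand-ins for the LLM and the vector store.

```python
def multi_query_retrieve(queries, retrieve):
    """Run every query through `retrieve` and return the unique documents in first-seen order."""
    seen = set()
    unique_docs = []
    for query in queries:
        for doc in retrieve(query):
            if doc not in seen:
                seen.add(doc)
                unique_docs.append(doc)
    return unique_docs

# Hypothetical reformulations; in the real retriever an LLM writes these.
toy_queries = [
    "Why did people fail to pay back their loans?",
    "What prevented borrowers from repaying?",
    "Reasons for missed loan payments",
]

# Hypothetical retriever: a lookup table standing in for vector search.
toy_index = {
    toy_queries[0]: ["doc_a", "doc_b"],
    toy_queries[1]: ["doc_b", "doc_c"],
    toy_queries[2]: ["doc_a", "doc_d"],
}

toy_union = multi_query_retrieve(toy_queries, lambda q: toy_index[q])
```

Each individual query surfaces only two documents here, while the union surfaces four.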

So, how hard is it to set up? Not bad! Let's see it down below!



In [28]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)

In [29]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [30]:
multi_query_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issues with loans, based on the provided complaints, include:\n\n- Errors in loan balances and interest calculations\n- Misapplication of payments and incorrect account statuses\n- Unauthorized transfers and lack of proper notice\n- Difficulty obtaining accurate or complete documentation (e.g., Master Promissory Notes, payment history)\n- Problems with how payments are being handled, such as inability to apply extra payments to principal\n- Misleading or bad information about loan terms and balances\n- Inadequate communication and poor customer service from servicers\n- Issues with loan discharge, forgiveness, or discharge validation\n- Discrepancies and inaccuracies affecting credit reports and scores\n- Potential fraud and identity concerns related to loan accounts\n\nIn summary, errors, mismanagement, and poor communication are the most common issues faced with loans, especially student loans.'

In [31]:
multi_query_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, yes, some complaints did not get handled in a timely manner. Specifically:\n\n- Complaint ID \'12709087\' (regarding a delayed response to a graduated loan application) was marked as "Timely response? No."\n- Complaint ID \'13062402\' (dispute of incorrect information on credit report) was handled but mentioned delays (the complainant waited over 10 days for resolution despite promises of quick response).\n- Complaint ID \'12654977\' (regarding delinquent account reporting) was marked as "Timely response? No."\n- Multiple other complaints indicate delays or that responses were "Closed with explanation" after significant wait times, sometimes over months.\n  \nIn summary, yes, there were complaints that did not get handled in a timely manner, with some reports explicitly noting delays or lack of response despite expectations of prompt resolution.'

In [32]:
multi_query_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily due to a combination of factors highlighted in the complaints:\n\n1. **Lack of Adequate Information and Transparency:** Many borrowers were not properly informed about interest accrual, repayment options like income-driven repayment plans, or the consequences of forbearance. Several complaints mention being misled or not receiving clear information about their loan status, repayment obligations, or changes in servicers.\n\n2. **Interest Accumulation and Growing Balances:** Many borrowers experienced their loan balances ballooning due to compounded interest, especially when in forbearance, which prolongs debt repayment and increases total owed.\n\n3. **Negative Experiences with Loan Servicers:** Complaints frequently cite poor communication, unhelpful customer service, inability to access account information, errors in reporting delinquency, unauthorized transfers, or misapplication of payments—all contributing to financial hardship and c

#### ❓ Question #2:

Explain how generating multiple reformulations of a user query can improve recall.

## Task 8: Parent Document Retriever

The Parent Document Retriever follows a "small-to-big" strategy:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent documents".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
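Here is a toy version of "search small, return big" in plain Python. It is a sketch under loud assumptions: a sentence splitter stands in for the real text splitter, shared-word counting stands in for embedding similarity, and a plain `dict` plays the role of the document store. None of the helper names come from LangChain.

```python
def split_into_chunks(text):
    """Hypothetical splitter: one child chunk per sentence."""
    return [sentence.strip() for sentence in text.split(".") if sentence.strip()]

def build_stores(parents):
    """Return (docstore, child_index): parents keyed by id, plus (chunk, parent_id) pairs."""
    docstore = dict(enumerate(parents))
    child_index = [
        (chunk, parent_id)
        for parent_id, text in docstore.items()
        for chunk in split_into_chunks(text)
    ]
    return docstore, child_index

def retrieve_parents(query, docstore, child_index, k):
    """Search the child chunks, then return the de-duplicated parents they belong to."""
    def overlap(chunk):
        # Hypothetical scorer: shared-word count in place of embedding similarity.
        return len(set(query.lower().split()) & set(chunk.lower().split()))

    best_children = sorted(child_index, key=lambda pair: overlap(pair[0]), reverse=True)[:k]
    parent_ids = []
    for _, parent_id in best_children:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [docstore[parent_id] for parent_id in parent_ids]

# Hypothetical toy complaints.
toy_parents = [
    "My autopay was cancelled without notice. After that the servicer reported me as delinquent.",
    "I asked for my payment history. Nobody replied for months.",
]
toy_docstore, toy_child_index = build_stores(toy_parents)
toy_hits = retrieve_parents("reported delinquent", toy_docstore, toy_child_index, k=1)
```

The query matches only the second sentence of the first complaint, but the whole complaint comes back, including the autopay sentence that explains why the delinquency happened.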

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [33]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = loan_complaint_data
child_splitter = RecursiveCharacterTextSplitter(chunk_size=750)

We'll need to set up a new Qdrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension (1536 for `text-embedding-3-small`); you'll need to change this if you're using a different embedding model.

In [34]:
from langchain_qdrant import QdrantVectorStore

client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = QdrantVectorStore(
    collection_name="full_documents", embedding=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)

Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [35]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore=parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [36]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [37]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [38]:
parent_document_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided context, appears to be related to problems with federal student loan servicing, specifically errors in loan balances, misapplied payments, wrongful denials of payment plans, and issues with incorrect information or reporting on credit reports. Many complaints highlight systemic breakdowns, such as errors in account information, unauthorized interest rate increases, and challenges in managing or verifying loan details.'

In [39]:
parent_document_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided context, all the complaints identified were marked as "Timely response?": "No" or "Yes," but notably, the first two complaints (rows 441 and 84) explicitly state "No," indicating they did not get handled in a timely manner. Both involve delays in responses from the companies MOHELA, where the consumer has not heard back within expected timeframes.\n\nTherefore, yes, some complaints did not get handled in a timely manner according to the document.'

In [40]:
parent_document_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily due to a variety of challenges, including:\n\n1. **Mismanagement and Poor Communication:** Some borrowers experienced issues like being billed for payments while still attending school or not being properly notified about payment requirements, leading to unintentional delinquencies.\n\n2. **Financial Hardship and Unemployment:** Many borrowers faced severe financial difficulties after graduation, making it difficult or impossible to keep up with loan payments, especially when their employment prospects were limited or they suffered health issues.\n\n3. **Misrepresentation and Lack of Transparency:** Some students were misled about the value of their education and the manageability of their student loans. They were not adequately informed of the long-term financial consequences or the instability of their educational institutions.\n\n4. **Inadequate Support and Counseling:** Borrowers reported that schools and loan servicers failed to pro

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to more effectively answer the question about the performance.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever takes two or more retrievers and combines their retrieved documents using a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.
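As a rough sketch of the idea, weighted Reciprocal Rank Fusion gives each document `weight / (k + rank)` from every ranked list it appears in, then sorts by the total. The constant `k = 60` follows the linked paper; the function name, toy rankings, and weights are illustrative assumptions, and the details of LangChain's own implementation may differ.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, k=60):
    """Fuse ranked lists: each doc earns weight / (k + rank) from every list it appears in."""
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += weight / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings from three different retrievers.
toy_rankings = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a", "doc_d"],
    ["doc_b", "doc_c", "doc_a"],
]
toy_fused = weighted_rrf(toy_rankings, weights=[1 / 3, 1 / 3, 1 / 3])
```

Only ranks matter, not raw scores, which is why retrievers with incomparable scoring scales (BM25 scores versus cosine similarities, for example) can be fused this way.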

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.

In [41]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

We'll pack *all* of these retrievers together in an ensemble.

In [42]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [43]:
ensemble_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

"Based on the provided complaints and data, the most common issues with loans appear to be:\n\n- Dealing with your lender or servicer, including:\n  - Errors in loan balances and interest calculations\n  - Receiving bad or misleading information about the loan terms\n  - Difficulty applying payments correctly (e.g., only being able to apply to interest, or payments not being processed)\n  - Unauthorized transfers or mishandling of loan accounts\n  - Problems with loan status reporting (e.g., incorrectly reported as delinquent or in default)\n  - Lack of transparency and communication from servicers\n  \n- Issues related to loan management, such as:\n  - Discrepancies in loan balances and interest accumulation\n  - Inaccurate or missing payment history and account status information\n  - Problems with loan repayment plans and handling of forbearance or deferment\n  - Aggressive or confusing collection practices, including silent calls or harassment\n  - Incorrect credit reporting impact

In [44]:
ensemble_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, yes, there are multiple instances indicating complaints not handled in a timely manner. Specifically:\n\n- Several complaints received responses marked as "Closed with explanation" but with notes indicating that responses or corrections were delayed, or that the complainant had to follow up multiple times.\n- There are complaints explicitly mentioning delays such as "it has now been over 16 months since I submitted this application," or "about 18 months with no resolution."\n- One complaint (Complaint ID 12935889) was marked as "Timely response? No," indicating it was not handled within the expected timeframe.\n- Multiple complaints regarding unresolved issues, ongoing disputes, or delays in investigation despite repeated follow-up suggest that handling times were not sufficient or prompt.\n\nTherefore, yes, several complaints in the dataset did not get handled in a timely manner.'

In [45]:
ensemble_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily due to a combination of factors highlighted in the complaints:\n\n1. **Misleading or Lack of Information:** Borrowers were often not fully informed about their repayment options, interest accumulation, or the implications of forbearance and deferment. Many were steered into long-term forbearances or incorrect payment plans that allowed interest to capitalize and grow, making loans unmanageable over time.\n\n2. **Interest and Loan Management Issues:** Accumulating interest, compounded due to improper handling by servicers, led to balances increasing even when payments were made, leaving borrowers unable to pay off the loans within expected timeframes.\n\n3. **Servicing Errors and Lack of Communication:** Failures in communication, such as not notifying borrowers of payment resumption, transfer of loans between servicers, or default status, caused confusion and unintentional delinquency. Complaints also mention errors like incorrect report

## Task 10: Semantic Chunking

While this is not a retrieval method - it *is* an effective way of increasing retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
  - `percentile`
  - `standard_deviation`
  - `interquartile`
  - `gradient`
3. Keeping each sequence of related sentences as a document!
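
The three steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not LangChain's implementation: `toy_embed` is a hypothetical stand-in for a real embedding model (it only counts a few keywords), the example sentences are made up, and the fixed `threshold` of 0.5 is an arbitrary choice rather than one of the thresholding methods listed above.

```python
import math

def toy_embed(sentence):
    # Hypothetical stand-in for an embedding model: counts of topic keywords.
    words = sentence.lower().replace(".", "").split()
    loan_terms = {"loan", "payment", "interest", "servicer"}
    weather_terms = {"rain", "sunny", "weather", "cloudy"}
    return [
        sum(w in loan_terms for w in words),
        sum(w in weather_terms for w in words),
    ]

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0

def semantic_chunks(sentences, threshold):
    if not sentences:
        return []
    # Step 1: embed every sentence.
    vectors = [toy_embed(s) for s in sentences]
    # Step 2: measure the distance between each sentence and the next one.
    distances = [
        cosine_distance(vectors[i], vectors[i + 1])
        for i in range(len(vectors) - 1)
    ]
    # Step 3: start a new chunk wherever the distance exceeds the threshold.
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks

sentences = [
    "My loan payment was late.",
    "The servicer added interest.",
    "Today the weather is sunny.",
    "Tomorrow brings rain.",
]
chunks = semantic_chunks(sentences, threshold=0.5)
print(chunks)
```

With these toy vectors, the only large distance falls between the second and third sentences, so the four sentences end up in two chunks.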

Let's see how to implement this!

We'll use the `percentile` thresholding method for this example, which calculates the distances between consecutive sentences and then breaks a sequence of sentences apart wherever the distance exceeds a given percentile of all distances.
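
To make the percentile idea concrete, here is a minimal sketch that works on a list of precomputed distances. The distance values are made up for illustration, the 80th percentile is an arbitrary choice, and the `percentile` helper is our own (it uses linear interpolation between ranks); the library's actual computation may differ in details.

```python
def percentile(values, q):
    # Linear-interpolation percentile, with q in [0, 100].
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction

# Made-up distances between consecutive sentences (illustrative only).
# distances[i] is the distance between sentence i and sentence i + 1.
distances = [0.10, 0.12, 0.80, 0.11, 0.09, 0.75, 0.13]

# Any gap larger than the chosen percentile becomes a breakpoint.
threshold = percentile(distances, 80)
breakpoints = [i for i, d in enumerate(distances) if d > threshold]
print(threshold, breakpoints)
```

Each index in `breakpoints` marks a gap where one chunk ends and the next begins. A higher percentile yields fewer, larger chunks; a lower one yields more, smaller chunks.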

In [46]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)

Now we can split our documents.

In [47]:
semantic_documents = semantic_chunker.split_documents(loan_complaint_data[:20])

Let's create a new vector store.

In [48]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="Loan_Complaint_Data_Semantic_Chunks"
)

We'll use naive retrieval for this example.

In [49]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [50]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [None]:
semantic_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided complaints, appears to be related to problems with loan servicing and communication. Specifically, issues such as:\n\n- Struggling to repay or understanding repayment plans\n- Errors or delays in payment processing\n- Incorrect or inconsistent information about loan status or collections\n- Poor communication or lack of transparency from loan servicers\n- Disputes over loan account details, default status, or credit reporting\n\nMany complaints highlight difficulties in obtaining accurate information, delays in processing payments or applications, and frustration with communication breakdowns from loan servicers like Nelnet, EdFinancial Services, Maximus, and others.\n\nTherefore, the most common issue seems to be "Problems with loan servicing and communication accuracy."'

In [None]:
semantic_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Yes, based on the provided complaints, several complaints indicate that complaints did not get handled in a timely manner. Specifically, the complaint with ID 13331376 states that despite multiple letters sent to Nelnet, the company never responded to the complaints or corrected the errors, suggesting a lack of timely handling. Additionally, multiple complaints mention delays or ongoing issues with responses or resolutions, implying some complaints were not addressed promptly.'

In [None]:
semantic_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for various reasons, including issues such as receiving bad or unclear information from their lenders or servicers, problems with the handling and processing of payments, and complications arising from service disruptions or alleged illegal reporting practices. Some specific examples include:\n\n- Miscommunication or lack of transparency from lenders or servicers, leading borrowers to believe they were in forbearance or other relief programs when that was not properly documented.\n- Technical issues or errors with payment processing, where payments did not clear or were refused despite being sent.\n- Disputes over loan legitimacy or eligibility for forgiveness, which can stall repayments.\n- Concerns about illegal or improper reporting, such as debts being reported inaccurately or illegally, which may lead borrowers to challenge or delay repayment.\n- Borrowers experiencing financial hardship, especially when payment adjustments or re-amortization

#### ❓ Question #3:

If sentences are short and highly repetitive (e.g., FAQs), how might semantic chunking behave, and how would you adjust the algorithm?

# 🤝 Breakout Room Part #2


#### 🏗️ Activity #1

Your task is to evaluate the various Retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
 - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever-specific* Ragas metrics
 - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.
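
For the latency part of the analysis, one possible approach is to time each retriever over the same set of questions. The sketch below is only a starting point under stated assumptions: `time_retriever` and `fake_retrieve` are hypothetical helpers written for this example, and in the notebook you would pass something like `semantic_retriever.invoke` in place of the stand-in.

```python
import statistics
import time

def time_retriever(retrieve, questions, repeats=3):
    """Return wall-clock latency statistics (in seconds) for retrieve(question)."""
    latencies = []
    for question in questions:
        for _ in range(repeats):
            start = time.perf_counter()
            retrieve(question)
            latencies.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.mean(latencies),
        "max_s": max(latencies),
        "calls": len(latencies),
    }

# Hypothetical stand-in for a real retriever call.
def fake_retrieve(question):
    time.sleep(0.01)
    return [f"document about: {question}"]

report = time_retriever(fake_retrieve, ["What is the most common issue with loans?"])
print(report)
```

Running every retriever over the same questions with the same `repeats` keeps the comparison fair. Retrievers that call an LLM or a reranker will also incur API cost per call, which this timing does not capture.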

In [2]:
import os
from getpass import getpass
os.environ["OPENAI_API_KEY"] = getpass("Please enter your OpenAI API key!")

In [3]:
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.document_loaders import PyMuPDFLoader


path = "data/"
loader = DirectoryLoader(path, glob="*.pdf", loader_cls=PyMuPDFLoader)
docs = loader.load()


In [4]:
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4.1"))
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

In [8]:
from ragas.testset import TestsetGenerator

generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)
dataset = generator.generate_with_langchain_docs(docs[:20], testset_size=10)
