# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up Qdrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

We'll also provide our OpenAI key, as well as our Cohere API key.

In [1]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

In [2]:
os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

## Task 2: Data Collection and Preparation

We'll be using our Loan Data once again - this time the structured data available through the CSV!

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

In [4]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

loader = CSVLoader(
    file_path="./data/complaints.csv",
    metadata_columns=[
      "Date received", 
      "Product", 
      "Sub-product", 
      "Issue", 
      "Sub-issue", 
      "Consumer complaint narrative", 
      "Company public response", 
      "Company", 
      "State", 
      "ZIP code", 
      "Tags", 
      "Consumer consent provided?", 
      "Submitted via", 
      "Date sent to company", 
      "Company response to consumer", 
      "Timely response?", 
      "Consumer disputed?", 
      "Complaint ID"
    ]
)

loan_complaint_data = loader.load()

for doc in loan_complaint_data:
    doc.page_content = doc.metadata["Consumer complaint narrative"]

Let's look at an example document to see if everything worked as expected!

In [5]:
loan_complaint_data[0]

Document(metadata={'source': './data/complaints.csv', 'row': 0, 'Date received': '03/27/25', 'Product': 'Student loan', 'Sub-product': 'Federal student loan servicing', 'Issue': 'Dealing with your lender or servicer', 'Sub-issue': 'Trouble with how payments are being handled', 'Consumer complaint narrative': "The federal student loan COVID-19 forbearance program ended in XX/XX/XXXX. However, payments were not re-amortized on my federal student loans currently serviced by Nelnet until very recently. The new payment amount that is effective starting with the XX/XX/XXXX payment will nearly double my payment from {$180.00} per month to {$360.00} per month. I'm fortunate that my current financial position allows me to be able to handle the increased payment amount, but I am sure there are likely many borrowers who are not in the same position. The re-amortization should have occurred once the forbearance ended to reduce the impact to borrowers.", 'Company public response': 'None', 'Company'

## Task 3: Setting up Qdrant!

Now that we have our documents, let's create a Qdrant VectorStore with the collection name "LoanComplaints".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [6]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    loan_complaint_data,
    embeddings,
    location=":memory:",
    collection_name="LoanComplaints"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply treat each complaint as a document, and use cosine similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later

In [7]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})
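
To make "cosine similarity, top `k`" concrete, here is a minimal pure-Python sketch of the idea. The 3-dimensional vectors and the document names are made up for illustration (the real embeddings have 1536 dimensions), and this is not how Qdrant implements its search internally.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return the ids of the k documents most similar to the query."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]

# Toy "embeddings" (hypothetical values, for illustration only).
toy_vectors = {
    "payment_error": [0.9, 0.1, 0.0],
    "credit_report": [0.1, 0.9, 0.1],
    "forbearance": [0.7, 0.2, 0.3],
}
query = [1.0, 0.0, 0.1]
top_two = top_k(query, toy_vectors, k=2)
print(top_two)
```

The vector store does the same thing at scale: embed the query once, compare it against every stored vector, and keep the `k` best matches.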

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [8]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-4.1-nano` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [9]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4.1-nano")

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [11]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and chaining it into the naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # "context"  : re-assigned from the previous step's "context" value via RunnablePassthrough.assign,
    #              which passes every other key (here, "question") through unchanged
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)
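
If the piping syntax feels opaque, the data flow of the chain above can be approximated with plain functions. This is only a sketch: `fake_retriever`, `echo_llm`, and `TEMPLATE` are hypothetical stand-ins, and the real chain returns a chat message object in `"response"` rather than a string.

```python
def run_chain(inputs, retriever, prompt_template, llm):
    """Plain-Python approximation of the three piped steps."""
    # Step 1: fan the input out into "context" (retrieved docs) and "question".
    state = {"context": retriever(inputs["question"]), "question": inputs["question"]}
    # Step 2: re-assign "context" and keep every other key as it was.
    state = {**state, "context": state["context"]}
    # Step 3: format the prompt, call the model, and return the context alongside it.
    prompt = prompt_template.format(**state)
    return {"response": llm(prompt), "context": state["context"]}

# Hypothetical stand-ins so the sketch runs without any API keys.
TEMPLATE = "Query:\n{question}\n\nContext:\n{context}\n"

def fake_retriever(question):
    return ["toy complaint about a misapplied payment"]

def echo_llm(prompt):
    return "MODEL SAW: " + prompt

result = run_chain({"question": "Why?"}, fake_retriever, TEMPLATE, echo_llm)
print(result["response"])
```

Returning `"context"` next to `"response"` is what lets us inspect, and later evaluate, exactly which documents the answer was based on.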

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry-picked prompts that showcase the individual strengths of each retrieval strategy - you'd be correct!

In [12]:
naive_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with loans appears to be problems related to handling and processing by loan servicers, including errors in loan balances, misapplied payments, incorrect loan status reporting, and poor communication or transparency. Many complaints also involve issues with dealing with lenders or servicers, such as receiving bad or inconsistent information about loans, errors in credit reporting, difficulties with repayment plans, and mishandling of loan transfer or privacy violations.'

In [13]:
naive_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Yes, based on the provided complaints, there were instances where complaints did not get handled in a timely manner. Specifically, at least one complaint was marked as "Not timely response," indicating it was not handled promptly. For example, the complaint received on 03/28/25 from Mohela regarding a loan application issue was marked as "No" under "Timely response," suggesting it was delayed beyond the expected time frame. Additionally, some complaints mention delays of several weeks or over a year in getting resolution.\n\nTherefore, yes, some complaints in this dataset did not get handled in a timely manner.'

In [14]:
naive_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People often failed to pay back their loans due to a combination of factors highlighted in the complaints:\n\n1. **Lack of Clear Communication and Awareness:** Many borrowers were not adequately informed about when their payments would resume, the specifics of their loan balances, or any changes in their loan status. For example, some were unaware their loans had gone into delinquency or were surprised by sudden reporting to credit bureaus.\n\n2. **Difficulty Accessing or Understanding Payment Options:** Several complainants faced challenges in setting up manageable repayment plans or navigating online systems that failed or provided confusing information. They often reported being locked out of portals or not receiving proper guidance on deferment, forbearance, or repayment options.\n\n3. **Interest Accumulation During Forbearance/Deferment:** Borrowers experienced their interest continuing to accrue even when payments were paused, which increased total debt and extended repayment pe

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model), which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.

This retriever is very straightforward to set up! Let's see it happen down below!


In [15]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(loan_complaint_data)
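
For intuition about what happens under the hood, here is a small sketch of BM25 scoring in pure Python. It uses a textbook form of the formula with whitespace tokenization, a smoothed IDF, and the common defaults `k1=1.5` and `b=0.75`. These are assumptions for illustration, and the library behind `BM25Retriever` may differ in details such as tokenization and IDF smoothing. The toy documents are made up.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a textbook BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue  # terms missing from the document contribute nothing
            # Rare terms get a higher inverse document frequency.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            tf = term_freq[term]
            # Term frequency saturates (k1) and is normalized by document length (b).
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(toks) / avg_len))
        scores.append(score)
    return scores

toy_docs = [
    "complaint closed with explanation by the servicer",
    "my payment was applied to the wrong loan",
    "the servicer never sent an explanation of the fees",
]
scores = bm25_scores("closed with explanation", toy_docs)
best = max(range(len(toy_docs)), key=scores.__getitem__)
print(best, scores)
```

Notice that a document sharing no words with the query scores exactly zero, no matter how semantically related it is. That is the key difference from embedding-based retrieval.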

We'll construct the same chain - only changing the retriever.

In [17]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [18]:
bm25_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with loans, specifically student loans in this case, appears to be problems related to dealing with lenders or servicers. Common sub-issues include disputes over fees charged, difficulty applying payments correctly (e.g., applying extra funds to principal), receiving incorrect or bad information about loans, and issues with loan approval or reimbursement related to school validity. Overall, issues related to mismanagement, miscommunication, or disputes with loan servicers seem to be the most prevalent.'

In [19]:
bm25_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, all of the complaints in the context were marked as responded to with responses labeled as "Closed with explanation" and "Timely response? Yes." Therefore, it appears that none of the complaints mentioned in the context were left unhandled or responded to outside of a timely manner.'

In [20]:
bm25_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for various reasons, including issues with payment plans, miscommunication or lack of communication from lenders or servicers, problems with account management such as automatic payments being canceled or not resumed, and being misled or steered into unfavorable payment options like forbearances. In some cases, borrowers were unaware of transfers between loan servicers, did not receive important notifications, or faced technical issues with payments that were not resolved, leading to missed or reversed payments. These complications often resulted in negative impacts on credit scores and feelings of being deceived or disenfranchised.'

In [28]:
bm25_retrieval_chain.invoke({"question" : "What does 'Closed with explanation' mean?"})["response"].content

"'Closed with explanation' means that the complaint has been reviewed and a response has been provided by the company, explaining the outcome or reason for closing the case. It indicates that the issue has been addressed or resolved to some extent, and no further action is currently being taken on that complaint."

It's not clear whether this is better or worse. If only we had a way to test this! (SPOILERS: We do, and the second half of the notebook will cover it.)

<div style="background-color: #204B8E; color: white; padding: 10px; border-radius: 5px;">

#### ❓ Question #1:

Give an example query where BM25 is better than embeddings and justify your answer.

</div>

<div style="background-color: #204B8E; color: white; padding: 10px; border-radius: 5px;">

#### Answer:

- Example Query: What does 'Closed with explanation' mean?
- Explanation: BM25 does better than embeddings on this query because the query contains a specific phrase that appears verbatim in the dataset. BM25 scores documents by matching the exact words in the query against the words in each document, so it can easily find complaints where the company response was "Closed with explanation" and return documents that give context about how and why that response was used.

</div>

## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [23]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-v3.5")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)
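
Conceptually, the retrieve-then-compress pattern looks like the sketch below. The `overlap_score` function is a hypothetical word-overlap stand-in for a real reranking model such as Cohere's, which scores query-document relevance far more intelligently. The candidate texts are made up.

```python
def compress(query, candidates, score_fn, top_n):
    """Keep only the top_n candidates, ordered by score_fn(query, doc)."""
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_n]

def overlap_score(query, doc):
    """Hypothetical stand-in for a reranker: fraction of query words found in doc."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words)

# Pretend these came back from the base retriever (k=4 here, k=10 in the notebook).
candidates = [
    "servicer misapplied my payment",
    "interest kept growing during forbearance",
    "no response for 18 months",
    "late response from the servicer after months",
]
kept = compress("response after months", candidates, overlap_score, top_n=2)
print(kept)
```

The base retriever casts a wide net, and the reranker then spends more effort per document to decide which few are worth putting into the prompt.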

Let's create our chain again, and see how this does!

In [24]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [25]:
contextual_compression_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, a common issue with loans, particularly student loans, is dealing with errors and misconduct by servicers, such as errors in loan balances, misapplied payments, wrongful denials of payment plans, and mishandling of account information. Additionally, difficulties arise from the accumulation of interest during forbearance or deferment, which can extend repayment periods and increase total debt. Overall, issues related to poor servicing, misinformation, and handling of payments seem to be prevalent problems.'

In [26]:
contextual_compression_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, yes, there are complaints that did not get handled in a timely manner. Several complaints mention delays of over a year or multiple months before receiving a response or resolution. For example, one complaint states it has been nearly 18 months with no resolution, and others mention waiting over 1 year for a response.'

In [27]:
contextual_compression_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans often due to a combination of factors including a lack of clear information about repayment obligations, difficulties with managing accumulating interest, and unforeseen issues related to loan transfers and communication failures. Specifically, some borrowers were unaware they needed to repay their loans, did not receive proper notifications or documentation, or faced complications like interest that continued to grow even when they couldn't afford to make payments. Additionally, poor communication from lenders and servicers about payment plans, loan status, and changes in loan management contributed to borrowers being unable to meet their repayment responsibilities."

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had....more than one query!

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` number of new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context

So, how hard is it to set up? Not bad! Let's see it down below!



In [29]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)
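
The three steps listed earlier boil down to "rewrite, retrieve, take the unique union". Here is a minimal sketch of that flow. `fake_rewrites`, `fake_retrieve`, and `toy_index` are hypothetical stand-ins for the LLM and the vector store, so this runs without any API calls.

```python
def multi_query_retrieve(question, rewrite_fn, retrieve_fn):
    """Retrieve for each rewritten query and return the unique union of results."""
    seen = set()
    unique_docs = []
    for query in rewrite_fn(question):
        for doc in retrieve_fn(query):
            if doc not in seen:  # keep the first occurrence, preserve order
                seen.add(doc)
                unique_docs.append(doc)
    return unique_docs

# Hypothetical stand-in for the LLM: a fixed set of rewrites.
def fake_rewrites(question):
    return [
        "late response to complaint",
        "delay in handling complaint",
        "servicer slow to reply",
    ]

# Hypothetical stand-in for the retriever: a keyword lookup table.
toy_index = {
    "late": ["doc_a", "doc_b"],
    "delay": ["doc_b", "doc_c"],
    "slow": ["doc_d"],
}

def fake_retrieve(query):
    return [doc for word in query.split() for doc in toy_index.get(word, [])]

docs = multi_query_retrieve(
    "Did any complaints not get handled in a timely manner?",
    fake_rewrites,
    fake_retrieve,
)
print(docs)
```

Each rewrite surfaces documents the others miss, and the deduplication keeps the context from filling up with repeats.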

In [30]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [31]:
multi_query_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the complaints provided, appears to be mismanagement and poor communication by loan servicers. Specific problems include:\n\n- Errors in loan balances, interest calculations, or misapplied payments.\n- Inaccurate or misleading information about loan status, balances, or repayment terms.\n- Lack of proper notification or communication regarding account status, default, or transfer of loans.\n- Difficulties in applying payments correctly, especially applying extra funds to principal or paying off loans early.\n- Wrongful reporting to credit bureaus, such as incorrect default status or late payments without proper notice.\n- Mishandling of repayment plans and failure to provide accurate information about options.\n- Unauthorized transfer and servicing of loans without borrower consent or notification.\n- Failure to properly address disputes or rectify errors, leading to credit damage and financial hardship.\n\nOverall, these issues highlight a p

In [32]:
multi_query_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

"Yes, based on the provided complaints, some complaints indicate that complaints were not handled in a timely manner. Specifically:\n\n- The complaint with 'Timely response?' marked as 'No' from MOHELA (Complaint ID: 12739706) states that the response was not timely, noting delays beyond the expected response time.\n- Similarly, the complaint from EdFinancial Services (Complaint ID: 12823876) also shows a 'Timely response?' marked as 'Yes' or 'No' depending on the specific case, but some explicitly mention delays of multiple weeks or over 30 days.\n- Furthermore, several complaints note that the company failed to respond or follow up within the expected timeframe, leading to ongoing issues and frustration.\n\nIn summary, multiple complaints confirm that some complaints did not get handled in a timely manner."

In [33]:
multi_query_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily due to issues such as administrative errors, miscommunication, and misconduct by loan servicers. Many borrowers were misinformed or not adequately informed about repayment options like Income-Driven Repayment plans, income-based repayment, or rehabilitation programs, leading them to be steered into long-term forbearances or unmanageable debt situations. Additionally, systemic problems such as errors in loan balances, misapplied payments, wrongful default reporting, and failure to properly notify borrowers contributed to defaults. In some cases, borrowers experienced financial hardship, unemployment, or health issues, and were not given sufficient support or guidance to manage their repayment, resulting in missed payments and defaults.'

<div style="background-color: #204B8E; color: white; padding: 10px; border-radius: 5px;">

#### ❓ Question #2:

Explain how generating multiple reformulations of a user query can improve recall.

</div>

<div style="background-color: #204B8E; color: white; padding: 10px; border-radius: 5px;">

#### Answer:

- Reformulating a user query means creating several rephrased versions of the same question. This can improve recall because some relevant documents use different words or phrases to express the same idea. With multiple versions of the query, the system has a better chance of finding relevant documents even if they don't match the wording of the original question.

</div>

## Task 8: Parent Document Retriever

A "small-to-big" strategy - the Parent Document Retriever works based on a simple strategy:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent chunks".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
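
Here is a tiny sketch of the "search small, return big" idea using plain Python containers. Fixed-size character chunks and a substring match are simplifying assumptions that stand in for the text splitter and the embedding similarity search. The two parent texts are made up.

```python
def build_indexes(parent_docs, chunk_size):
    """Split each parent into fixed-size child chunks that remember their parent id."""
    docstore = {}      # parent_id -> full parent text
    child_index = []   # (child_text, parent_id) pairs; stands in for the vector store
    for parent_id, text in enumerate(parent_docs):
        docstore[parent_id] = text
        for start in range(0, len(text), chunk_size):
            child_index.append((text[start:start + chunk_size], parent_id))
    return docstore, child_index

def retrieve_parents(query, docstore, child_index):
    """Match on child chunks, then return each matching parent once."""
    parent_ids = []
    for child_text, parent_id in child_index:
        # Substring match is a hypothetical stand-in for similarity search.
        if query.lower() in child_text.lower() and parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [docstore[parent_id] for parent_id in parent_ids]

parents = [
    "My servicer misapplied a payment. Later my autopay was cancelled without notice.",
    "The school closed and I was never told about discharge options.",
]
docstore, child_index = build_indexes(parents, chunk_size=40)
results = retrieve_parents("autopay", docstore, child_index)
print(results)
```

The match happens on a 40-character child chunk, but the caller gets the full parent back, including the sentence about the misapplied payment that the matching chunk alone would have cut off.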

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [34]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = loan_complaint_data
child_splitter = RecursiveCharacterTextSplitter(chunk_size=750)

We'll need to set up a new Qdrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension, you'll need to change this if you're using a different embedding model.

In [35]:
from langchain_qdrant import QdrantVectorStore

client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = QdrantVectorStore(
    collection_name="full_documents", embedding=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)

Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [36]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [37]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [38]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [39]:
parent_document_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided context, appears to be problems related to the handling and management of student loans. Specifically, frequent issues include:\n\n- Struggling to repay loans due to financial hardship and lack of proper information about the long-term consequences.\n- Problems with loan consolidation, including lack of disclosure, unexpected payment amounts, and failure to provide clear terms.\n- Discrepancies and increases in interest rates, leading to confusion and unfair charges.\n- Incorrect reporting on credit reports, causing significant drops in credit scores.\n- General mismanagement and lack of transparency from loan servicers.\n\nOverall, a prevalent theme is inadequate communication, transparency, and proper management of student loans, which contribute to repayment difficulties and financial hardship for borrowers.'

In [40]:
parent_document_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, yes, several complaints were not handled in a timely manner. Specifically, the complaint with ID 12709087 received on 03/28/25 was marked as "No" for timely response, and the complaint with ID 12935889 received on 04/11/25 also was marked as "No." \n\nAdditionally, multiple complaints describe delays, very long wait times on calls (sometimes hours), and a failure to receive responses within the expected timeframes. Therefore, it appears that some complaints did not get handled promptly.'

In [41]:
parent_document_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People often fail to pay back their loans due to various challenges, such as financial hardship, unemployment, or mismanagement of the loan process. In the provided context, specific reasons include:\n\n1. **Financial Hardship and Unemployment:** For example, a borrower enrolled in a private college that closed unexpectedly and was misled about the value of the degree, resulting in difficulties securing employment and making loan payments.\n\n2. **Lack of Transparency and Misrepresentation:** Borrowers were misinformed about the manageability of their loans, the long-term consequences, or the status of their school, which affected their ability to prepare financially.\n\n3. **Institutional Issues and College Closure:** Colleges facing financial instability or closing can lead borrowers to struggle with repayment, especially if they were misled about the institution’s stability and job prospects after graduation.\n\n4. **Administrative and Servicing Errors:** Problems such as loans bei

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to answer the performance question more rigorously.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever takes two or more retrievers and combines their retrieved documents using a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.

In [42]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

We'll pack *all* of these retrievers together in an ensemble.
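
For intuition, weighted Reciprocal Rank Fusion can be sketched in a few lines. The constant `c=60` follows the value suggested in the RRF paper, and the document ids and rankings are made up. Treat this as an illustration of the idea and not as LangChain's actual implementation.

```python
from collections import defaultdict

def weighted_rrf(rankings, weights, c=60):
    """Fuse ranked lists: each list adds weight / (c + rank) to a document's score."""
    fused = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += weight / (c + rank)
    # Highest fused score first.
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical rankings from two retrievers for the same query.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]
fused_ranking = weighted_rrf([bm25_ranking, dense_ranking], weights=[0.5, 0.5])
print(fused_ranking)
```

Because only ranks are used, never raw scores, retrievers with incomparable scoring scales (BM25 scores versus cosine similarities) can be combined fairly. Documents that several retrievers rank highly float to the top.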

In [43]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [44]:
ensemble_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issues with student loans, based on the complaints data provided, appear to revolve around the following themes:\n\n1. **Dealing with lenders or servicers:** Many complaints involve poor communication, lack of transparency, or mishandling of loan accounts, including incorrect reporting of status, delinquencies, or balances.\n\n2. **Errors in loan information:** Multiple complaints highlight incorrect data on credit reports, inaccurate account statuses (e.g., showing delinquent when in fact loans are current), or discrepancies in loan balances and interest calculations.\n\n3. **Problems with payment handling:** Borrowers report issues applying payments correctly, only being able to pay interest rather than principal, or being unable to pay off smaller loans faster due to servicer restrictions.\n\n4. **Misleading or incomplete information:** Complaints often involve loan servicers providing false or misleading information about loan terms, interest accrual, or repayment 

In [45]:
ensemble_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, several complaints indicate delays or failures in handling issues in a timely manner. For example:\n\n- The complaint with ID 12709087 from 03/28/25 reports that the complainant\'s application was unprocessed for over 15 days, significantly beyond the 15-day response window.\n- The complaint with ID 12975634 from 04/14/25 states that Maximus Federal Services (Aidvantage) responded to a complaint as "Closed with explanation" but indicates that the response was not timely, and the issue remained unresolved after months.\n- Other complaints mention long wait times, unresolved issues spanning over months (e.g., nearly 18 months without resolution), and failure to respond or take action within expected or legal timeframes.\n\nTherefore, yes, **some complaints did not get handled in a timely manner**.'

In [46]:
ensemble_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for several reasons, as reflected in the complaints:\n\n1. Lack of proper information and communication from loan servicers about repayment options, loan status, and upcoming payment requirements (e.g., not being notified about delinquency, not informing about available income-driven repayment plans, or mismanaging notifications about loan transfers).\n\n2. Difficulties with repayment plans, such as being forced into forbearance or deferment that allowed interest to accumulate, making the loan balance grow over time rather than decrease.\n\n3. unpleasant or unhelpful customer service, which failed to provide guidance or assistance in managing payments or adjusting repayment plans.\n\n4. Administrative errors and mishandling of loans, including incorrect or inconsistent reporting of account status, balances, and delinquency or default status.\n\n5. Inability to access or understand complex loan and account information due to lack of transparency, d

## Task 10: Semantic Chunking

While this is not a retrieval method - it *is* an effective way of increasing retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
  - `percentile`
  - `standard_deviation`
  - `interquartile`
  - `gradient`
3. Each sequence of related sentences is kept as a document!

Let's see how to implement this!

We'll use the `percentile` thresholding method for this example, which will:

Calculate the distance between each pair of consecutive sentences, and then split the sequence wherever a distance exceeds a given percentile of all distances.
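To make the percentile rule concrete, here is a minimal pure-Python sketch of the idea. It is not the `SemanticChunker` implementation: the toy two-dimensional vectors stand in for real embeddings, the helper names (`cosine_distance`, `percentile`, `semantic_chunks`) are hypothetical, and the library may differ in details such as how sentences are grouped before embedding.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def percentile(values, pct):
    """Percentile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower, upper = math.floor(rank), math.ceil(rank)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)

def semantic_chunks(sentences, vectors, pct=95.0):
    """Split wherever the distance to the next sentence exceeds the percentile threshold."""
    if len(sentences) < 2:
        return [" ".join(sentences)]
    # Distance between each sentence and the one that follows it.
    distances = [
        cosine_distance(vectors[i], vectors[i + 1])
        for i in range(len(vectors) - 1)
    ]
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            # Large semantic jump: close the current chunk and start a new one.
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks

# Toy data (assumed for illustration): two sentences about fees, two about credit reports.
sentences = [
    "My payment was late.",
    "The servicer charged a late fee.",
    "My credit report is wrong.",
    "The bureau ignored my dispute.",
]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]

chunks = semantic_chunks(sentences, vectors, pct=95.0)
print(chunks)
```

Because the threshold is relative to the distribution of distances, at least the largest gap is always treated as a breakpoint when the distances are not all equal, no matter how small that gap is in absolute terms.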

In [47]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)

Now we can split our documents.

In [48]:
semantic_documents = semantic_chunker.split_documents(loan_complaint_data[:20])

Let's create a new vector store.

In [49]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="Loan_Complaint_Data_Semantic_Chunks"
)

We'll use naive retrieval for this example.

In [50]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [51]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [52]:
semantic_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided complaints, appears to be problems related to loan servicing and communication. Specifically, borrowers frequently report issues such as:\n\n- Difficulty in getting accurate or consistent information about their loan status, balances, payments, or repayment plans.\n- Problems with auto-debit payments, including auto-payments not being processed or discrepancies in payment amounts.\n- Lack of transparent communication about changes in loan status or servicer, leading to confusion and errors.\n- Unauthorized reporting or mishandling of personal and financial information.\n- Challenges in resolving disputes or correcting incorrect information on credit reports.\n\nOverall, issues with poor communication, administrative errors, and mishandling of borrower information seem to be most prevalent among these complaints.'

In [53]:
semantic_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, it appears that several complaints were marked as "Closed with explanation" and indicate that the responses were provided in a timely manner. For example, complaints involving Nelnet, Maximus Federal Services, and EdFinancial Services all specify that the response to the consumer was "Closed with explanation" and that the responses were timely ("Yes" under "Timely response?"). \n\nHowever, the complaint narratives show ongoing issues with handling the complaints, such as lack of response to certain issues or continued violations, despite the official classification of the response being "timely." \n\nSo, to directly answer the question: **Yes, some complaints were not handled in a timely manner,** or at least continued to have unresolved issues despite the official response being marked as timely and "Closed with explanation." The ongoing nature of some problems suggests that there may have been delays or inadequate handling in some cases.'

In [54]:
semantic_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for various reasons, as reflected in the complaints. Some common reasons include:\n\n1. Lack of accurate or transparent information from lenders or servicers, leading to confusion about their loan status or repayment obligations.\n2. Administrative or processing issues, such as missing payments, misreported account statuses, or difficulties in documenting eligibility for loan forgiveness.\n3. Problems with how payments are being handled, such as delays, re-amortization errors, or payment processing failures.\n4. Disputes over the legitimacy or legality of the debt, including claims that the loan reports are invalid due to legal issues or breach of privacy laws.\n5. Challenges arising from changes in loan status, such as loans being in default due to administrative errors or miscommunication.\n6. Frustration with servicing practices, including delays, unresponsive communication, or alleged stall tactics by some loan servicers.\n7. Personal financia

<div style="background-color: #204B8E; color: white; padding: 10px; border-radius: 5px;">

#### ❓ Question #3:

If sentences are short and highly repetitive (e.g., FAQs), how might semantic chunking behave, and how would you adjust the algorithm?

</div>

<div style="background-color: #204B8E; color: white; padding: 10px; border-radius: 5px;">

### Answer:
- If the sentences are short and very similar (e.g. FAQs), semantic chunking might group too many of them together, even when they cover different topics. This happens because the repeated wording makes the sentences look semantically related, so the distances between neighbouring sentences are all small. To adjust the algorithm, I would try switching from `percentile` to `standard_deviation` and compare the resulting chunks, to check that unrelated sentences are no longer merged just because they are similar.
</div>

# 🤝 Breakout Room Part #2

<div style="background-color: #204B8E; color: white; padding: 10px; border-radius: 5px;">

### 🏗️ Activity #1

Your task is to evaluate the various Retriever methods against each other. 
You can use the loans or bills dataset.

You are expected to:

1. Create a "golden dataset"
   - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever-specific* Ragas metrics
   - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

</div>
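When planning the evaluation, it can help to see what a retriever-specific metric measures. The sketch below computes exact-match precision@k and recall@k over document IDs. It is only a simplified stand-in, not the Ragas metric definitions, and the function name and the IDs are hypothetical.

```python
def precision_recall_at_k(retrieved, relevant, k):
    """Exact-match precision@k and recall@k for one query.

    retrieved: ranked list of document IDs returned by a retriever.
    relevant:  set of document IDs that the golden dataset marks as relevant.
    """
    top_k = retrieved[:k]
    # Unique relevant documents found in the top-k results.
    found = set(top_k) & set(relevant)
    precision = len(found) / len(top_k) if top_k else 0.0
    recall = len(found) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: the retriever returns four chunks, two chunks are relevant.
retrieved_ids = ["c3", "c7", "c1", "c9"]
relevant_ids = {"c1", "c2"}

precision, recall = precision_recall_at_k(retrieved_ids, relevant_ids, k=3)
print(f"precision@3={precision:.2f}, recall@3={recall:.2f}")
```

Averaging these per-query scores over the golden dataset gives one number per retriever, which can then be set against latency and cost.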

In [89]:
# !pip install pandas

### Set up keys and project for LangChain tracing

In [101]:
import os
import getpass

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("LangChain API Key:")

In [109]:
os.environ["LANGCHAIN_PROJECT"] = "PSI - Retrievers Evaluation"

### Load Data and convert to LangChain documents

In [104]:
import pandas as pd
from langchain_core.documents import Document

df = pd.read_csv("data/complaints.csv")
df = df[df["Consumer complaint narrative"].notnull()]

docs = [
    Document(page_content=row["Consumer complaint narrative"], metadata={"row": i})
    for i, row in df.head(20).iterrows()
]

In [71]:
# !pip install ragas langchain-openai langchain-community datasets

### Generate Synthetic Dataset with RAGAS

In [105]:
from ragas.testset import TestsetGenerator
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4.1-nano"))
embedding_model = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

generator = TestsetGenerator(llm=llm, embedding_model=embedding_model)
testset = generator.generate_with_langchain_docs(docs, testset_size=5)

testset_df = testset.to_pandas()
testset_df.head()


Node fb734aaf-c248-49c7-a614-8c4d0ba13f54 does not have a summary. Skipping filtering.
Node 386395c0-348a-4fb4-a71e-8798bd57b270 does not have a summary. Skipping filtering.
Node 89980118-bcb9-4a15-aa12-30de39145c30 does not have a summary. Skipping filtering.
Node cd71181c-fd6a-4c3b-aba4-c0f4eb9aa9a7 does not have a summary. Skipping filtering.
Node 2bd282f9-f775-4ba3-a88c-8088bd20cb1a does not have a summary. Skipping filtering.
Node e385c7f9-7b17-459a-af12-24d6f9d12e87 does not have a summary. Skipping filtering.



| | user_input | reference_contexts | reference | synthesizer_name |
|---|---|---|---|---|
| 0 | Why Nelnet not re-amortize my loans after forb... | [The federal student loan COVID-19 forbearance... | The federal student loan COVID-19 forbearance ... | single_hop_specifc_query_synthesizer |
| 1 | Help me understand how Aidvantage can give me ... | [I submitted my annual Income-Driven Repayment... | According to the context, Aidvantage assigned ... | single_hop_specifc_query_synthesizer |
| 2 | How did the account transfer to Nelnet involve... | [<1-hop>\n\nThis account was transferred to Ne... | The account was transferred to Nelnet from XXX... | multi_hop_abstract_query_synthesizer |
| 3 | Did the account transfer to Nelnet happen afte... | [<1-hop>\n\nThis account was transferred to Ne... | The account was transferred to Nelnet from XXX... | multi_hop_abstract_query_synthesizer |
| 4 | How does the legal dispute involving NelNet an... | [<1-hop>\n\nXX/XX/XXXX I increased the amount ... | The context details a formal legal dispute whe... | multi_hop_specific_query_synthesizer |


### Upload Dataset to LangSmith

In [None]:

from langsmith import Client
from langsmith.evaluation import evaluate as ls_evaluate, LangChainStringEvaluator
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda
from operator import itemgetter

client = Client()
dataset_name = "Loan Complaints Retriever Dataset"

try:
    ds = client.read_dataset(dataset_name=dataset_name)
    print(f"ℹ️ Reusing existing dataset: {dataset_name}")
except Exception:
    ds = client.create_dataset(
        dataset_name=dataset_name,
        description="Synthetic Q&A pairs generated from loan complaints for retriever evaluation."
    )

    for row in testset_df.itertuples():
        client.create_example(
            inputs={"question": row.user_input},
            outputs={"answer": row.reference},
            dataset_id=ds.id
        )
    print(f"✅ Created dataset and uploaded {len(testset_df)} examples: {dataset_name}")


rag_prompt = ChatPromptTemplate.from_template("""
You are a helpful assistant. Use ONLY the provided context to answer the question.
If the answer is not in the context, say "I don't know".

Context:
{context}

Question: {question}
""")

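# Generation model for the evaluated chains (this rebinds `llm`, which held the Ragas wrapper above)
llm = ChatOpenAI(model="gpt-4o-mini")

# Keep only the first 4 retrieved documents so every retriever passes the same amount of context to the prompt
cap_docs = RunnableLambda(lambda docs: docs[:4])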

def make_chain(retriever):
    return (
        {"context": itemgetter("question") | retriever | cap_docs, "question": itemgetter("question")}
        | rag_prompt
        | llm
        | StrOutputParser()
    )

# Define Retrievers
retrievers = {
    "Naive": naive_retriever,
    "BM25": bm25_retriever,
    "Multi-Query": multi_query_retriever,
    # "Rerank": compression_retriever, # Commented out because of rate limits
    "Parent": parent_document_retriever,
    # "Ensemble": ensemble_retriever, # Commented out because of rate limits
}


qa_eval = LangChainStringEvaluator("qa", config={"llm": ChatOpenAI(model="gpt-4o-mini")})

for name, retr in retrievers.items():
    print(f"\n🚀 Running LangSmith evaluation for: {name}")
    rag_chain = make_chain(retr)

    result = ls_evaluate(
        rag_chain.invoke, 
        data=dataset_name,
        evaluators=[qa_eval],
        metadata={"retriever": name, "revision_id": f"retriever_eval_{name}"},
        experiment_prefix=f"Retriever Eval - {name}",
    )


✅ Created dataset and uploaded 6 examples: Loan Complaints Retriever Dataset

🚀 Running LangSmith evaluation for: Naive
View the evaluation results for experiment: 'Retriever Eval - Naive-221940af' at:
https://smith.langchain.com/o/1416e1a2-8bd8-4452-a1d2-3cea46dfc419/datasets/f9ee957c-aed1-4107-b5f6-77676c208cc9/compare?selectedSessions=bfe3d4b3-e979-4fb4-ba9e-7432e176ed75





🚀 Running LangSmith evaluation for: BM25
View the evaluation results for experiment: 'Retriever Eval - BM25-fe256414' at:
https://smith.langchain.com/o/1416e1a2-8bd8-4452-a1d2-3cea46dfc419/datasets/f9ee957c-aed1-4107-b5f6-77676c208cc9/compare?selectedSessions=93a408fa-10b7-48ed-9fb4-84234d5ab042





🚀 Running LangSmith evaluation for: Multi-Query
View the evaluation results for experiment: 'Retriever Eval - Multi-Query-a0ff6f38' at:
https://smith.langchain.com/o/1416e1a2-8bd8-4452-a1d2-3cea46dfc419/datasets/f9ee957c-aed1-4107-b5f6-77676c208cc9/compare?selectedSessions=15535e35-6dae-44e2-9504-1ffe929e5e6e





🚀 Running LangSmith evaluation for: Parent
View the evaluation results for experiment: 'Retriever Eval - Parent-4f47eff1' at:
https://smith.langchain.com/o/1416e1a2-8bd8-4452-a1d2-3cea46dfc419/datasets/f9ee957c-aed1-4107-b5f6-77676c208cc9/compare?selectedSessions=702b7639-05ae-47ed-b21c-324d1f32fec6





##### HINTS:

- LangSmith provides detailed information about latency and cost.

## LangSmith Trace Screenshots

Below are the trace images captured from LangSmith for this project.

### Naive Retriever
![Naive Retriever](images/naive.png)

### BM25 Retriever
![BM25 Retriever](images/bm25.png)

### Multi-query Retriever
![Multi-query Retriever](images/multi-query.png)

### Parent Retriever
![Parent Retriever](images/parent.png)


<div style="background-color: #204B8E; color: white; padding: 10px; border-radius: 5px;">

### Analysis & Observations:
- The Naive Retriever achieved the highest correctness (100%), returning accurate answers for every query in the evaluation set. It had moderate latency (4.145s) and relatively low token usage, making it both accurate and cost-efficient.

- The BM25 Retriever was the fastest (2.921s) and the cheapest, but had lower correctness (66.67%). Its keyword matching makes it fast but less reliable for nuanced queries.

- The Multi-Query Retriever scored high on correctness (83.33%) by using query reformulations to improve recall. However, it had the highest latency (6.092s) because it executes multiple retrievals per question. Cost and token usage were moderate.

- The Parent Retriever matched BM25’s correctness (66.67%) and had mid-range latency (3.466s). It excels at retrieving broader context but may bring in less focused information for precise Q&A.

### Conclusion:
For this dataset, Naive Retriever is the best overall choice, combining perfect correctness with reasonable latency and cost.

- BM25 Retriever is ideal if speed is the top priority and some drop in accuracy is acceptable.

- Multi-Query Retriever is a strong option for complex queries where reformulating the question can uncover more relevant context, though at the expense of speed.

PS: I only ran the evaluation for four retrievers. The other two (Rerank and Ensemble) were commented out because they hit rate limits.
</div>
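The correctness and latency figures reported in the analysis above can also be compared mechanically. The sketch below is one possible way to do so: it copies those four pairs of numbers into a dictionary and keeps only the retrievers that no other retriever beats on both axes (a Pareto frontier). Cost is left out because the analysis gives no numeric cost figures, and the `dominates` helper is a hypothetical name.

```python
# Correctness (%) and latency (seconds) as reported in the analysis above.
results = {
    "Naive": {"correctness": 100.00, "latency_s": 4.145},
    "BM25": {"correctness": 66.67, "latency_s": 2.921},
    "Multi-Query": {"correctness": 83.33, "latency_s": 6.092},
    "Parent": {"correctness": 66.67, "latency_s": 3.466},
}

def dominates(a, b):
    """True if `a` is at least as good as `b` on both metrics and strictly better on one."""
    no_worse = a["correctness"] >= b["correctness"] and a["latency_s"] <= b["latency_s"]
    strictly_better = a["correctness"] > b["correctness"] or a["latency_s"] < b["latency_s"]
    return no_worse and strictly_better

# Retrievers that are not dominated by any other retriever.
pareto = [
    name
    for name, metrics in results.items()
    if not any(
        dominates(other, metrics)
        for other_name, other in results.items()
        if other_name != name
    )
]

# Print the retrievers from most to least correct, breaking ties by latency.
for name, metrics in sorted(
    results.items(), key=lambda item: (-item[1]["correctness"], item[1]["latency_s"])
):
    marker = "*" if name in pareto else " "
    print(f"{marker} {name:<12} {metrics['correctness']:6.2f}%  {metrics['latency_s']:.3f}s")
```

On these numbers only Naive and BM25 remain on the frontier, which matches the conclusion: Naive for accuracy, BM25 when speed matters most.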