# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up Qdrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

We'll also provide our OpenAI key, as well as our Cohere API key.

In [1]:
from dotenv import load_dotenv
import os

load_dotenv()

True

In [2]:
# import os
# import getpass

# os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

In [3]:
# os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

## Task 2: Data Collection and Preparation

We'll be using our Loan Data once again - this time the structured data available through the CSV!

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

In [4]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

loader = CSVLoader(
    file_path="./data/complaints.csv",
    metadata_columns=[
      "Date received", 
      "Product", 
      "Sub-product", 
      "Issue", 
      "Sub-issue", 
      "Consumer complaint narrative", 
      "Company public response", 
      "Company", 
      "State", 
      "ZIP code", 
      "Tags", 
      "Consumer consent provided?", 
      "Submitted via", 
      "Date sent to company", 
      "Company response to consumer", 
      "Timely response?", 
      "Consumer disputed?", 
      "Complaint ID"
    ]
)

loan_complaint_data = loader.load()

for doc in loan_complaint_data:
    doc.page_content = doc.metadata["Consumer complaint narrative"]

Let's look at an example document to see if everything worked as expected!

In [5]:
loan_complaint_data[0]

Document(metadata={'source': './data/complaints.csv', 'row': 0, 'Date received': '03/27/25', 'Product': 'Student loan', 'Sub-product': 'Federal student loan servicing', 'Issue': 'Dealing with your lender or servicer', 'Sub-issue': 'Trouble with how payments are being handled', 'Consumer complaint narrative': "The federal student loan COVID-19 forbearance program ended in XX/XX/XXXX. However, payments were not re-amortized on my federal student loans currently serviced by Nelnet until very recently. The new payment amount that is effective starting with the XX/XX/XXXX payment will nearly double my payment from {$180.00} per month to {$360.00} per month. I'm fortunate that my current financial position allows me to be able to handle the increased payment amount, but I am sure there are likely many borrowers who are not in the same position. The re-amortization should have occurred once the forbearance ended to reduce the impact to borrowers.", 'Company public response': 'None', 'Company'

## Task 3: Setting up Qdrant!

Now that we have our documents, let's create a Qdrant VectorStore with the collection name "LoanComplaints".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [6]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    loan_complaint_data,
    embeddings,
    location=":memory:",
    collection_name="LoanComplaints"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply treat each complaint as a document, and use cosine similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later

In [7]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})
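To make "cosine similarity, top `k`" concrete, here's a minimal pure-Python sketch. It is only an illustration, not how Qdrant performs the search internally: the 3-dimensional vectors and the `top_k` helper are made up for this example (the `text-embedding-3-small` vectors we stored have 1536 dimensions).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return the ids of the k documents most similar to the query."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical toy "embeddings" (assumption: 3 dimensions, for readability only).
toy_vectors = {
    "complaint_a": [0.9, 0.1, 0.0],
    "complaint_b": [0.0, 1.0, 0.0],
    "complaint_c": [0.7, 0.7, 0.0],
}
toy_query = [1.0, 0.0, 0.0]

nearest = top_k(toy_query, toy_vectors, k=2)
print(nearest)
```

The retriever above does the same thing at scale: embed the question, compare it against every stored vector, and hand back the `k` closest documents.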

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [8]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-4.1-nano` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [9]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4.1-nano")

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [10]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and chaining it into the naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # "context"  : is assigned to a RunnablePassthrough object (will not be called or considered in the next step)
    #              by getting the value of the "context" key from the previous step
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)
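If the data flow in the comments above is hard to picture, here's a rough plain-Python imitation of the three steps. It is only a sketch: `fake_retriever`, `fake_llm`, and `run_chain` are hypothetical stand-ins we made up, and real LCEL runnables do much more (batching, streaming, async).

```python
def fake_retriever(question):
    """Hypothetical stand-in for naive_retriever: returns a list of 'documents'."""
    return [f"document about: {question}"]

def fake_llm(prompt_text):
    """Hypothetical stand-in for chat_model: reports how much text it was given."""
    return f"answer based on {len(prompt_text)} prompt characters"

def run_chain(inputs):
    # Step 1: fan the input out into "context" and "question".
    step1 = {
        "context": fake_retriever(inputs["question"]),
        "question": inputs["question"],
    }
    # Step 2: like RunnablePassthrough.assign, keep every key and (re)set "context".
    step2 = {**step1, "context": step1["context"]}
    # Step 3: format the prompt, call the model, and carry the context along.
    prompt_text = f"Query:\n{step2['question']}\n\nContext:\n{step2['context']}"
    return {"response": fake_llm(prompt_text), "context": step2["context"]}

result = run_chain({"question": "What is the most common issue with loans?"})
print(sorted(result))
```

The key takeaway is the output shape: a dict with a `"response"` and the `"context"` that produced it, which is why we index into `["response"]` when invoking the real chain.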

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry-picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [11]:
naive_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issues with loans seem to involve mishandling by loan servicers, errors in loan balances and interest calculations, problems with repayment plans, miscommunication or lack of notification about account transfers, and incorrect or disputed information on credit reports. Many complaints also relate to the inability to apply payments correctly, unfair increases in interest, and mishandling of loan discharge or forgiveness.\n\nIn summary, the most frequent issue appears to be **dealing with errors and mismanagement by loan servicers, including inaccuracies in loan balances, interest, and repayment handling**.'

In [12]:
naive_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, yes, some complaints did not get handled in a timely manner. Specifically, at least one complaint from page 441 (Complaint ID: 12709087) was marked as "Timely response?": "No," indicating it was not handled within the expected timeframe. The complaint involved delays in processing a graduated loan application and communication issues, with the individual reporting they had not heard back despite waiting several weeks.\n\nAdditionally, multiple other complaints mention extended periods of unresolved issues, such as complaints from pages 716 and 810, where borrowers reported waiting over a year or nearly 18 months without resolution or response.\n\nTherefore, the answer is: Yes, some complaints did not get handled in a timely manner.'

In [13]:
naive_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans primarily due to a combination of factors highlighted in the complaints:\n\n1. **Miscommunication and Lack of Information:** Many borrowers were not adequately informed about when their repayment was to begin, the transfer of loan servicers without notification, or changes in their payment status. This lack of clarity led to unintentional delinquencies.\n\n2. **Compounding Interest and Unmanageable Payments:** Borrowers cited that interest continued to accrue even during forbearance or deferment periods, increasing overall debt. Lowering monthly payments often resulted in more interest accumulation, making it difficult to pay off the principal.\n\n3. **Financial Hardships and Economic Conditions:** Many borrowers experienced financial hardships, unemployment, or stagnant wages, which made their repayment plans unfeasible. For example, some relied on income-driven plans that were inaccessible or insufficient.\n\n4. **Issues with Loan Servicers:** C

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model) which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.

This retriever is very straightforward to set up! Let's see it happen down below!


In [14]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(loan_complaint_data)
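If you're curious what a BM25 score actually looks like, here's a rough sketch of a textbook BM25 scoring function in plain Python. Treat it as a toy under assumptions: the whitespace tokenizer, the `k1` and `b` values, and the IDF variant are our choices for illustration, and are not necessarily what `BM25Retriever` uses internally.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with a textbook BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        # Longer-than-average documents are penalised, shorter ones boosted.
        length_norm = 1 - b + b * len(tokens) / avg_len
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            tf = term_freq[term]
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores

# Hypothetical mini-corpus, made up for this example.
toy_docs = [
    "nelnet misapplied my payment",
    "my loan balance is wrong",
    "nelnet never answered the phone about my loan",
]
toy_scores = bm25_scores("nelnet payment", toy_docs)
best_doc = toy_docs[toy_scores.index(max(toy_scores))]
print(best_doc)
```

Notice that a document sharing no words with the query scores exactly zero: BM25 only rewards literal term matches, which is both its weakness and its strength.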

We'll construct the same chain - only changing the retriever.

In [15]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [16]:
bm25_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with loans appears to be problems related to dealing with lenders or servicers, specifically issues such as incorrect or misleading information, difficulties in applying payments properly, and disputes over loan details or fees. Multiple complaints mention challenges like incorrect fee charges, trouble with payment application, and disputes about loan balances or information provided.'

In [17]:
bm25_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, all the complaints included in the context received timely responses from the companies involved. Specifically, the complaints regarding issues with the loan service and validation responses indicate that the companies responded within the required timeframe ("Timely response?": "Yes"). Therefore, no complaints in the given context appear to have gone unhandled or were delayed in handling.'

In [18]:
bm25_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for various reasons, including issues with the loan servicing process, miscommunication, and problems with payment plans. Specific factors highlighted in the complaints include:\n\n- Being steered into incorrect types of forbearances or payment plans, leading to increased principal and interest.\n- Lack of communication from loan servicers regarding loan transfers, repayment status, or status updates, resulting in missed payments or misunderstandings.\n- Payment reversals or technical issues with online payments, which were often blamed on the borrower’s bank even when payments were correctly made.\n- Unclear or inadequate notification about changes in loan status, repayment requirements, or delinquency alerts, which led to borrowers being unaware of overdue payments.\n- In some cases, servicing methods or bank automation issues caused payments to not be processed correctly, resulting in delinquency and damage to credit scores.\n\nOverall, failure

It's not clear whether this is better or worse. If only we had a way to test this! (SPOILERS: We do - the second half of the notebook will cover it.)

#### ❓ Question #1:

Give an example query where BM25 is better than embeddings and justify your answer.

#### Answer:

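Queries that hinge on named entities - companies, addresses, people, etc. For example: "Which complaints mention Nelnet?"
Embeddings of named entities can be misleading (imagine the embedding for "Apple": the fruit or the company?), while keyword search matches the exact token, so it is a more direct way to retrieve data for named entities.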

## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [19]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-v3.5")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)
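Conceptually, the retrieve-then-compress pattern looks something like the sketch below. This is not Cohere's model or LangChain's implementation: the `word_overlap` scorer is a hypothetical stand-in for a reranking model, and `rerank` and `top_n` are names we chose for illustration.

```python
def rerank(query, candidates, score_fn, top_n=3):
    """Keep only the top_n candidates according to a (more expensive) relevance scorer."""
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_n]

def word_overlap(query, doc):
    """Hypothetical stand-in for a reranking model: fraction of query words found in doc."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words)

# Pretend these came back from a first-stage retriever (made-up texts).
first_stage = [
    "interest kept growing during forbearance",
    "the servicer transferred my loan without notice",
    "my payment was late because the servicer never sent a notice",
    "i was charged a fee",
    "late payment reported to credit bureaus without notice",
]

compressed = rerank("late payment notice", first_stage, word_overlap, top_n=2)
print(compressed)
```

The first stage casts a wide net cheaply; the second stage spends more effort per document but only on the handful of candidates that survived.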

Let's create our chain again, and see how this does!

In [20]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [21]:
contextual_compression_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with loans appears to be problems related to dealing with lenders or servicers, including errors, miscommunications, and mishandling of information. Specific sub-issues mentioned include receiving bad information about the loan, incorrect account balances, lack of proper documentation, unauthorized transfers, privacy violations, and mishandling of data. These issues highlight that a frequent problem is the mishandling or miscommunication by loan servicers, which can lead to disputes, inaccuracies, and legal concerns.'

In [22]:
contextual_compression_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, yes, there are complaints that did not get handled in a timely manner. For example, the complaint about the student loan issues with Maximus Federal Services, Inc. has been open for over 18 months with no resolution, despite ongoing requests for review and response. Similarly, the complaint regarding unpaid payments with EdFinancial Services involves issues that have persisted for over 2-3 weeks or longer, with the customer still seeking resolution.'

In [23]:
contextual_compression_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for several reasons, including:\n\n1. Lack of Awareness and Information: Borrowers often were not adequately informed about their obligation to repay loans or the details of their loans, such as interest accrual and repayment requirements.\n2. Compounding Interest and Payment Options: The available options like forbearance or deferment allowed interest to continue accumulating, making the loans more difficult to pay off over time.\n3. Unmanageable Payments: Borrowers faced difficulties in affording monthly payments due to financial hardships, stagnant wages, or economic circumstances, which prevented timely repayment.\n4. Misleading or Inadequate Communication: Some borrowers were not notified properly about payment due dates, loan transfers, or the need to set up payment plans, leading to missed payments and reported late payments.\n5. Loan Complexity and Growing Balances: Discrepancies in account information, unclear statements, and confusing lo

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` number of new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context

So, how hard is it to set up? Not bad! Let's see it down below!



In [24]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)
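The three steps listed above can be sketched in a few lines of plain Python. Everything here is a hypothetical stand-in: `fake_rewrites` plays the role of the LLM that reformulates the question, and `fake_index` plays the role of the underlying retriever.

```python
def multi_query_retrieve(question, rewrite_fn, retrieve_fn):
    """Retrieve for each rewritten query and return the unique documents in first-seen order."""
    seen = set()
    unique_docs = []
    for query in rewrite_fn(question):
        for doc in retrieve_fn(query):
            if doc not in seen:
                seen.add(doc)
                unique_docs.append(doc)
    return unique_docs

def fake_rewrites(question):
    """Hypothetical stand-in for the LLM: returns fixed reformulations."""
    return [
        "reasons borrowers defaulted",
        "causes of missed loan payments",
        "what prevented loan repayment",
    ]

# Hypothetical lookup table standing in for a retriever (query -> document ids).
fake_index = {
    "reasons borrowers defaulted": ["doc_1", "doc_2"],
    "causes of missed loan payments": ["doc_2", "doc_3"],
    "what prevented loan repayment": ["doc_4", "doc_1"],
}

merged = multi_query_retrieve(
    "Why did people fail to pay back their loans?", fake_rewrites, fake_index.get
)
print(merged)
```

Each reformulation only returns two documents here, but the union contains four, which is exactly where the extra recall comes from.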

In [25]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [26]:
multi_query_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided complaints, appears to be problems related to mishandling and lack of transparency by loan servicers and agencies. This includes:\n\n- Errors in loan balances and interest calculations\n- Misapplied payments and incorrect account information\n- Unauthorized transfer or reassignment of loans without notice\n- Bad or misleading information about loan terms, interest rates, and repayment options\n- Breaches of privacy rights, including unauthorized access to personal data and potential FERPA violations\n- Problems with loan forgiveness, discharge, or dispute process\n- Issues with loan collection efforts, including silent calls and harassment\n- Discrepancies and inaccuracies in reporting to credit bureaus\n\nOverall, many complaints highlight that borrowers experience frustration due to inadequate communication, errors, and questionable practices by loan management entities.'

In [27]:
multi_query_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints data, yes, some complaints were not handled in a timely manner. Specifically:\n\n- Complaint ID 12973003 associated with EdFinancial Services was responded to promptly, and the response was marked as "Timely response: Yes."\n- Complaint ID 12709087 related to MOHELA was also acknowledged as responded to "on time," with "Timely response: Yes."\n- However, Complaint ID 12654977 regarding MOHELA was marked as "Timely response: No," indicating it was not handled in a timely manner.\n- Complaint ID 13056764 involving EdFinancial Services was handled timely.\n- Complaint ID 12975634 concerning Maximus (Aidvantage) was responded to "on time."\n- The complaint about Maximus (Aidvantage) with Complaint ID 13091395 was handled timely.\n\nIn the detailed complaint narratives, there are several instances where delays are evident, either due to the complaint being left unresolved for over a year or response times being explicitly marked as "No" for timeliness. For 

In [28]:
multi_query_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans mainly due to financial hardships, mismanagement, or lack of clear information. Many borrowers faced challenges such as accumulating interest that negated payments, inability to afford increased monthly payments, or loans being placed in forbearance for extended periods, which led to the continued growth of their debt. Some were misled about repayment options, interest accrual, or loan forgiveness programs, making repayment seem unrealistic. Others encountered improper handling by servicers, such as errors in loan balances, inadequate communication, or wrongful reporting to credit bureaus, which further complicated their ability to repay. Overall, systemic issues, lack of transparency, and unforeseen economic difficulties contributed to borrowers' struggles to pay back their loans."

#### ❓ Question #2:

Explain how generating multiple reformulations of a user query can improve recall.

#### Answer:

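Imagine you have 10 golden chunks in the vector DB, and suppose any single phrasing of the query retrieves a given golden chunk with probability 0.8, so one query pulls about 8 of the 10 on average. If n reformulations miss independently, a given chunk is missed by all of them with probability (0.2)^n, so the expected recall rises to 1 - (0.2)^n (0.96 for n = 2, 0.992 for n = 3). In practice the reformulations are correlated, so the gain is smaller, but different wordings still reach chunks that the original phrasing would miss.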
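As a quick sanity check on the arithmetic, here's a small sketch under a strong (and optimistic) assumption: every query retrieves any given golden chunk with probability 0.8, independently of the other queries. The function names are ours, and real reformulations are correlated, so treat the numbers as an upper bound on the benefit.

```python
def expected_recall(p_hit, n_queries):
    """Probability that a given relevant chunk is retrieved by at least one of n queries."""
    return 1 - (1 - p_hit) ** n_queries

def prob_all_found(p_hit, n_queries, n_golden):
    """Probability that every one of n_golden chunks is retrieved at least once."""
    return expected_recall(p_hit, n_queries) ** n_golden

for n in (1, 2, 3):
    recall = expected_recall(0.8, n)
    all_ten = prob_all_found(0.8, n, n_golden=10)
    print(f"n={n}: expected recall={recall:.3f}, P(all 10 found)={all_ten:.2f}")
```

Under these assumptions expected recall climbs from 0.8 to 0.96 to 0.992, while the chance of catching all 10 golden chunks goes from roughly 0.11 to 0.66 to 0.92.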

## Task 8: Parent Document Retriever

The Parent Document Retriever is a "small-to-big" strategy that works as follows:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent chunks".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
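Here's a toy version of "search small, return big" in plain Python, just to show the bookkeeping. It makes several simplifying assumptions that are ours alone: children are split on sentence boundaries, the `overlap` word-count score stands in for vector similarity, and a plain dict stands in for the memory store.

```python
def build_indexes(parents):
    """Split each parent into sentence-sized children that remember their parent's id."""
    docstore = dict(parents)  # parent_id -> full text (plays the role of the memory store)
    children = []             # (child_text, parent_id); a real system would embed child_text
    for parent_id, text in parents.items():
        for sentence in text.split(". "):
            children.append((sentence, parent_id))
    return docstore, children

def overlap(query, text):
    """Hypothetical relevance score: shared lowercase words (stand-in for vector similarity)."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def retrieve_parents(query, docstore, children, k=2):
    """Search the small children, then return their de-duplicated full parents."""
    best_children = sorted(children, key=lambda child: overlap(query, child[0]), reverse=True)[:k]
    parent_ids = []
    for _, parent_id in best_children:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [docstore[parent_id] for parent_id in parent_ids]

# Made-up complaints for illustration.
toy_parents = {
    "complaint_1": "My servicer lost my payment. They reported me late to the credit bureaus.",
    "complaint_2": "The school closed before graduation. I cannot find work in my field.",
}
toy_docstore, toy_children = build_indexes(toy_parents)
found = retrieve_parents("reported late credit", toy_docstore, toy_children)
print(found)
```

The query only matches one short sentence, but what comes back is the whole complaint, including the lost payment that explains why the borrower was reported late.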

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [29]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = loan_complaint_data
child_splitter = RecursiveCharacterTextSplitter(chunk_size=750)

We'll need to set up a new Qdrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension; you'll need to change this if you're using a different embedding model.

In [30]:
from langchain_qdrant import QdrantVectorStore

client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = QdrantVectorStore(
    collection_name="full_documents", embedding=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)

Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [31]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [32]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [33]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [34]:
parent_document_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided complaints, appears to be related to mishandling by loan servicers, including errors in loan balances, misapplied payments, wrongful denials of payment plans, and issues with loan reporting and legitimacy. Many complaints cite system errors, improper reporting, and misconduct by servicers as significant problems faced by borrowers.'

In [35]:
parent_document_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Yes, based on the provided complaints, several complaints were not handled in a timely manner. Specifically, complaints regarding the processing of student loan applications by MOHELA and issues with loan servicing by Aidvantage were both marked as "No" for timely response. Additionally, the complaint about dispute settlements sent to credit bureaus by Nelnet, Inc. was marked as "Yes" for timely response, indicating it was handled promptly, but the other complaints clearly indicate delays. Therefore, the answer is that some complaints did not get handled in a timely manner.'

In [36]:
parent_document_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans for various reasons, including financial hardship due to misrepresentations about the value of their education and job prospects, lack of proper notification or communication from loan servicers, inability to secure employment in their field, and difficulties managing repayment obligations. Additionally, some faced issues related to administrative errors or mismanagement, such as incorrect reporting of payments, failure to notify them of payment requirements, or complications arising from institutional closures and the dissolution of the Department of Education's oversight."

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to answer the performance question more rigorously.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever takes two or more retrievers and combines their retrieved documents using a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.
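To get a feel for the fusion step, here's a minimal sketch of weighted Reciprocal Rank Fusion in plain Python. The `weighted_rrf` helper and the toy rankings are ours; the constant `k=60` is the value suggested in the RRF paper, and we're assuming (not asserting) that a real ensemble behaves along these lines.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, k=60):
    """Fuse several ranked lists of doc ids with weighted Reciprocal Rank Fusion."""
    fused = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # A document earns more credit the nearer it sits to the top of a list.
            fused[doc_id] += weight / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical rankings from two retrievers (best match first).
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]

fused_order = weighted_rrf([bm25_ranking, dense_ranking], weights=[0.5, 0.5])
print(fused_order)
```

Documents that appear in both lists (`doc_a`, `doc_b`) float to the top, because RRF only looks at rank positions and never has to reconcile the retrievers' incompatible raw scores.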

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.

In [37]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

We'll pack *all* of these retrievers together in an ensemble.

In [38]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [39]:
ensemble_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with loans, particularly student loans, appears to be dealing with your lender or servicer, including problems such as errors in loan balances, misapplied payments, wrongful denials of payment plans, bad information about loans, and issues related to loan transfer or servicing mishandling. There are also frequent complaints about discrepancies in account information, difficulties in communication, and challenges with loan repayment or forgiveness programs.\n\nTherefore, the most common issue with loans is **problems related to handling and servicing the loan, including errors, miscommunication, and mismanagement by lenders or servicers.**'

In [40]:
ensemble_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, yes, there are multiple complaints indicating that certain complaints did not get handled in a timely manner. Specifically:\n\n- Complaint ID 12709087 (row 441) about a federal student loan application not being processed and the delay exceeding the company’s own communicated timeframes. The response was marked as "No" for timely response.\n- Complaint ID 12935889 (row 418) about a student\'s account reporting late payments without proper notice, which was also marked as "No" for timely response.\n- Complaint ID 13062402 (row 66) regarding inaccurate credit report information, which was responded to as "Yes" for timeliness, but indicates delays in correcting information even after promises.\n- Multiple complaints about unresolved issues, delays in dispute investigations, and failure to respond or correct issues within legally mandated timeframes. Many are marked as "No" for being handled timely.\n\nTherefore, the answer is: Yes, some complaints did n

In [41]:
ensemble_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans for various reasons, including:\n\n- Lack of proper notification and communication from loan servicers about payment due dates, account status, or changes in servicing (e.g., reports of not being notified about start dates, transfer of loans, or payment obligations).\n- Difficulty understanding or accessing their payment options, such as income-driven repayment plans, loan forgiveness, or deferment programs.\n- Accumulation of interest during deferment or forbearance periods, which increased the total amount owed and made repayment seem unmanageable.\n- Complications caused by administrative errors, misapplied payments, incorrect account information, or poor record-keeping that led to inaccurate reporting and credit score drops.\n- Financial hardships, including unemployment, health issues, and other personal difficulties, making it hard to meet payment obligations.\n- Confusion and lack of transparency regarding loan balances, interest calculatio

## Task 10: Semantic Chunking

While this is not a retrieval method - it *is* an effective way of increasing retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
  - `percentile`
  - `standard_deviation`
  - `interquartile`
  - `gradient`
3. Each sequence of related sentences is kept as a document!

Let's see how to implement this!

We'll use the `percentile` thresholding method for this example, which will:

Calculate all distances between sentences, and then break apart sequences of sentences wherever the distance exceeds a given percentile among all distances.
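To make the percentile idea concrete, here is a minimal pure-Python sketch. It is not LangChain's implementation: the helper names (`cosine_distance`, `percentile`, `percentile_chunks`) and the two-dimensional toy vectors are assumptions made up for illustration, and it only compares each sentence with the one that follows it.

```python
import math

def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def percentile(values, pct):
    """Percentile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower, upper = math.floor(rank), math.ceil(rank)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)

def percentile_chunks(sentences, vectors, pct=95.0):
    """Split where the distance to the next sentence exceeds the pct-th percentile."""
    if len(sentences) < 2:
        return [" ".join(sentences)] if sentences else []
    # Distance between each sentence and the one that follows it.
    distances = [
        cosine_distance(vectors[i], vectors[i + 1])
        for i in range(len(vectors) - 1)
    ]
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for i, distance in enumerate(distances):
        if distance > threshold:
            # Large semantic jump: close the current chunk and start a new one.
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i + 1])
    chunks.append(" ".join(current))
    return chunks

# Hypothetical sentences and hand-made "embeddings" (not real model output).
toy_sentences = [
    "My payment was misapplied.",
    "The servicer applied it to the wrong loan.",
    "My credit report shows a late payment.",
    "The credit bureau has not corrected it.",
]
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]

toy_chunks = percentile_chunks(toy_sentences, toy_vectors, pct=60.0)
for chunk in toy_chunks:
    print(chunk)
```

With these toy vectors only the jump between the second and third sentence exceeds the threshold, so the four sentences end up in two chunks. A higher percentile produces fewer, larger chunks; a lower one produces more, smaller chunks.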

In [42]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)

Now we can split our documents.

In [43]:
semantic_documents = semantic_chunker.split_documents(loan_complaint_data[:20])

Let's create a new vector store.

In [44]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="Loan_Complaint_Data_Semantic_Chunks"
)

We'll use naive retrieval for this example.

In [45]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [46]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [47]:
semantic_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided complaints data, the most common issue with loans appears to be "Dealing with your lender or servicer," which includes sub-issues such as receiving bad information about your loan, trouble with how payments are being handled, and problems with payment plans. Many complaints revolve around miscommunication, inaccuracies in loan information, difficulty verifying or understanding loan status, and issues with repayment arrangements. Therefore, the most common issue seems to be problems arising from loan servicers or lenders not managing or communicating loan information properly.'

In [48]:
semantic_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, it appears that several complaints were marked as "Closed with explanation" and all responses noted "Yes" for timely responses. This suggests that these complaints were addressed within the expected timeframe.\n\nSpecifically, the complaints with the following complaint IDs indicate timely handling:\n- 13331376 (Nelnet, Inc., IN)\n- 13207537 (Maximus Federal Services, Inc., WA)\n- 13425612 (Maximus Federal Services, Inc., VA)\n- 13281034 (EdFinancial Services, NY)\n- 12962044 (Nelnet, Inc., NJ)\n- 13179688 (Nelnet, Inc., IL)\n- 13020950 (Nelnet, Inc., OH)\n- 13347464 (MOHELA, IL)\n\nHowever, despite the responses being timely, the complaints detail significant issues and unresolved disputes. There is no clear evidence from these snippets indicating that complaints were **not** handled in a timely manner. All responses to complaints were marked "Yes" for timely response, which suggests that the complaints were addressed within the expected review perio

In [49]:
semantic_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans for various reasons, including issues with communication and transparency from the loan servicers, administrative challenges, and disputes over the legitimacy or accuracy of their loan information. For example, some borrowers experienced lack of proper documentation or clarity about their loan status, which led to misunderstandings and missed payments. Others faced stalling or delays from lenders or servicers when attempting to resolve issues or provide required documentation, causing frustration and difficulty in repayment. Additionally, some borrowers reported that their loans were improperly reported as delinquent or in default due to administrative errors or unresolved disputes over loan legitimacy, which negatively impacted their credit and ability to repay.\n\nIf you're facing such challenges, it's often related to administrative hurdles, miscommunication, or legal disputes over the validity of the loans, rather than a simple inability to pa

#### ❓ Question #3:

If sentences are short and highly repetitive (e.g., FAQs), how might semantic chunking behave, and how would you adjust the algorithm?

#### Answer:

Semantic chunking might group these repetitive sentences together if they are semantically similar.

Not sure if this is a thing, but to improve this I would consider just having an LLM chunk a document directly. Ask it to repeat back the document as a list/array where the elements are chunks, and the LLM decides how to keep semantically similar content contained within chunks.

# 🤝 Breakout Room Part #2

#### 🏗️ Activity #1

Your task is to evaluate the various Retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
 - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever specific* Ragas metrics
 - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison.
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

##### HINTS:

- LangSmith provides detailed information about latency and cost.
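
If LangSmith tracing is not available, a rough local latency number can still be collected. The sketch below is one possible approach, not part of the assignment: `time_calls` is a hypothetical helper, and `fake_chain` is a stand-in that only sleeps, used here so the example runs without API keys.

```python
import time
from statistics import mean

def time_calls(fn, inputs):
    """Return the mean wall-clock seconds per call of fn over the given inputs."""
    durations = []
    for item in inputs:
        start = time.perf_counter()
        fn(item)
        durations.append(time.perf_counter() - start)
    return mean(durations)

def fake_chain(payload):
    """Stand-in for a chain's invoke method; sleeps instead of calling an LLM."""
    time.sleep(0.01)
    return {"response": "ok", "context": []}

questions = [{"question": "What is the most common issue with loans?"}] * 3
avg_latency = time_calls(fake_chain, questions)
print(f"mean latency: {avg_latency:.4f}s")
```

In the notebook, the same helper could be pointed at a real chain, for example `time_calls(naive_retrieval_chain.invoke, questions)`. This measures latency only; cost still has to come from token usage, which LangSmith reports.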

In [50]:
### YOUR CODE HERE

In [51]:
from uuid import uuid4

os.environ["LANGCHAIN_PROJECT"] = f"AIM - Assignment 09 - {uuid4().hex[0:8]}"

In [None]:
# from langchain_community.document_loaders import DirectoryLoader
# from langchain_community.document_loaders import PyMuPDFLoader


# path = "data/"
# loader = DirectoryLoader(path, glob="*.pdf", loader_cls=PyMuPDFLoader)
# docs = loader.load()

In [53]:
len(docs)

269

In [54]:
docs[0]

Document(metadata={'producer': 'GPL Ghostscript 10.00.0', 'creator': 'wkhtmltopdf 0.12.6', 'creationdate': "D:20250418120630Z00'00'", 'source': 'data/Academic_Calenders_Cost_of_Attendance_and_Packaging.pdf', 'file_path': 'data/Academic_Calenders_Cost_of_Attendance_and_Packaging.pdf', 'total_pages': 57, 'format': 'PDF 1.7', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'moddate': "D:20250418120630Z00'00'", 'trapped': '', 'modDate': "D:20250418120630Z00'00'", 'creationDate': "D:20250418120630Z00'00'", 'page': 0}, page_content='Volume 3\nAcademic Calendars, Cost of Attendance, and\nPackaging\nIntroduction\nThis volume of the Federal Student Aid (FSA) Handbook discusses the academic calendar, payment period, and\ndisbursement requirements for awarding aid under the Title IV student financial aid programs, determining a student9s\ncost of attendance, and packaging Title IV aid.\nThroughout this volume of the Handbook, the words "we," "our," and "us" refer to the United States De

In [93]:
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
from ragas.testset import TestsetGenerator

generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4.1-nano"))
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)
dataset = generator.generate_with_langchain_docs(loan_complaint_data[:20], testset_size=10)

Applying SummaryExtractor:   0%|          | 0/14 [00:00<?, ?it/s]

Applying CustomNodeFilter:   0%|          | 0/20 [00:00<?, ?it/s]

Node 9998f96c-59e2-45b8-983b-1fcfcb851675 does not have a summary. Skipping filtering.
Node 90c52aaa-7ae9-4c54-aadb-b3fbfaaf9487 does not have a summary. Skipping filtering.
Node 515cc904-301d-46b6-b419-459bcd4c01a8 does not have a summary. Skipping filtering.
Node abf8933c-687a-4b9d-9a7c-dfbdfac93396 does not have a summary. Skipping filtering.
Node e6a50468-09ce-4f86-b356-40bd69b2daca does not have a summary. Skipping filtering.
Node 636f8eaf-b73f-47f8-aedd-5d37853e2e00 does not have a summary. Skipping filtering.


Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]:   0%|          | 0/54 [00:00<?, ?it/s]

Applying OverlapScoreBuilder:   0%|          | 0/1 [00:00<?, ?it/s]

Generating personas:   0%|          | 0/3 [00:00<?, ?it/s]

Generating Scenarios:   0%|          | 0/2 [00:00<?, ?it/s]

Generating Samples:   0%|          | 0/10 [00:00<?, ?it/s]

In [None]:
# from langsmith import Client

# client = Client()

# dataset_name = f"Loan Synthetic Data - Assignment 09 - RAGAS - {uuid4().hex[0:8]}"

# langsmith_dataset = client.create_dataset(
#     dataset_name=dataset_name,
#     description="Loan Synthetic Data - Assignment 09 - RAGAS"
# )

# for idx, row in dataset.to_pandas().iterrows():
#   client.create_example(
#       inputs={
#           "question": row["user_input"]
#       },
#       outputs={
#           "answer": row["reference"],
#           "context": row["reference_contexts"]
#       },
#       dataset_id=langsmith_dataset.id
#   )


In [94]:
invocables = {
    "naive": naive_retrieval_chain,
    "bm25": bm25_retrieval_chain, 
    "contextual_compression": contextual_compression_retrieval_chain,
    "multi_query": multi_query_retrieval_chain,
    "parent_document": parent_document_retrieval_chain, 
    "ensemble": ensemble_retrieval_chain, 
    "semantic": semantic_retrieval_chain, 
}

In [95]:
import copy

from langchain_openai import ChatOpenAI
from ragas import EvaluationDataset, RunConfig, evaluate
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import LLMContextRecall, Faithfulness, FactualCorrectness, ResponseRelevancy, ContextEntityRecall

def run_eval_ragas(invocable, dataset):
  # Work on a copy so each chain records its own responses without mutating the shared testset.
  dataset_this = copy.deepcopy(dataset)

  # Run the chain on every synthetic question and store its answer and retrieved contexts.
  for test_row in dataset_this:
    response = invocable.invoke({"question" : test_row.eval_sample.user_input})
    test_row.eval_sample.response = response["response"].content
    test_row.eval_sample.retrieved_contexts = [context.page_content for context in response["context"]]

  evaluation_dataset = EvaluationDataset.from_pandas(dataset_this.to_pandas())
  evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4.1-mini"))

  # Allow up to 720 seconds per evaluation job before it times out.
  custom_run_config = RunConfig(timeout=720)
  result = evaluate(
      dataset=evaluation_dataset,
      metrics=[LLMContextRecall(), Faithfulness(), FactualCorrectness(), ResponseRelevancy(), ContextEntityRecall()],
      llm=evaluator_llm,
      run_config=custom_run_config
  )
  return result

In [88]:
from concurrent.futures import ThreadPoolExecutor

# Evaluate every chain concurrently; each future holds one chain's Ragas result.
with ThreadPoolExecutor() as executor:
    futures = {
        invocable_name: executor.submit(run_eval_ragas, invocable, dataset)
        for invocable_name, invocable in invocables.items()
    }
    # Collect results keyed by retriever name; blocks until each evaluation finishes.
    results_d = {invocable_name: future.result() for invocable_name, future in futures.items()}

Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

Exception raised in Job[33]: APIConnectionError(Connection error.)


Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

Exception raised in Job[17]: APIConnectionError(Connection error.)
Exception raised in Job[29]: APIConnectionError(Connection error.)


Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

Exception raised in Job[32]: APIConnectionError(Connection error.)
Exception raised in Job[36]: APIConnectionError(Connection error.)
Exception raised in Job[49]: APIConnectionError(Connection error.)
Exception raised in Job[28]: APIConnectionError(Connection error.)
Exception raised in Job[50]: APIConnectionError(Connection error.)
Exception raised in Job[21]: APIConnectionError(Connection error.)
Exception raised in Job[42]: APIConnectionError(Connection error.)
Exception raised in Job[50]: APIConnectionError(Connection error.)
Exception raised in Job[22]: APIConnectionError(Connection error.)
Exception raised in Job[40]: APIConnectionError(Connection error.)
Exception raised in Job[31]: APIConnectionError(Connection error.)
Exception raised in Job[49]: APIConnectionError(Connection error.)
Exception raised in Job[2]: APIConnectionError(Connection error.)
Exception raised in Job[56]: APIConnectionError(Connection error.)
Exception raised in Job[57]: APIConnectionError(Connection erro

Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

Exception raised in Job[2]: APIConnectionError(Connection error.)
Exception raised in Job[47]: APIConnectionError(Connection error.)
Exception raised in Job[6]: APIConnectionError(Connection error.)
Exception raised in Job[27]: APIConnectionError(Connection error.)
Exception raised in Job[37]: APIConnectionError(Connection error.)
Exception raised in Job[48]: APIConnectionError(Connection error.)
Exception raised in Job[31]: InternalServerError(upstream connect error or disconnect/reset before headers. reset reason: connection timeout)


Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

Exception raised in Job[49]: LLMDidNotFinishException(The LLM generation was not completed. Please increase try increasing the max_tokens and try again.)
Exception raised in Job[54]: LLMDidNotFinishException(The LLM generation was not completed. Please increase try increasing the max_tokens and try again.)
Exception raised in Job[34]: LLMDidNotFinishException(The LLM generation was not completed. Please increase try increasing the max_tokens and try again.)
Exception raised in Job[54]: TimeoutError()
Exception raised in Job[44]: TimeoutError()
Exception raised in Job[19]: TimeoutError()
Exception raised in Job[29]: TimeoutError()
Exception raised in Job[34]: TimeoutError()
Exception raised in Job[39]: TimeoutError()
Exception raised in Job[44]: TimeoutError()
Exception raised in Job[4]: TimeoutError()
Exception raised in Job[29]: TimeoutError()
Exception raised in Job[59]: TimeoutError()
Exception raised in Job[44]: TimeoutError()
Exception raised in Job[54]: TimeoutError()


In [91]:
import pandas as pd

score_dfs = []
for name, result in results_d.items():
    score_df = pd.DataFrame(result.scores)
    score_df['experiment_name'] = name
    score_dfs.append(score_df)

score_df = pd.concat(score_dfs)
score_df

| | context_recall | faithfulness | factual_correctness | answer_relevancy | context_entity_recall | experiment_name |
|---|---|---|---|---|---|---|
| 0 | 0.0 | 1.000000 | 0.53 | 0.962242 | 0.000000 | naive |
| 1 | 0.0 | 0.000000 | 0.18 | 0.954630 | 0.000000 | naive |
| 2 | 0.0 | 0.500000 | 0.17 | 0.000000 | 0.000000 | naive |
| 3 | 0.0 | 0.500000 | | 0.000000 | 0.333333 | naive |
| 4 | 0.0 | 0.000000 | 0.24 | 0.961018 | 0.000000 | naive |
| ... | ... | ... | ... | ... | ... | ... |
| 7 | 0.0 | 0.000000 | 0.75 | 0.000000 | 0.000000 | semantic |
| 8 | 0.0 | 0.000000 | 0.48 | 0.000000 | | semantic |
| 9 | 0.0 | 0.058824 | 0.55 | 0.944241 | 0.000000 | semantic |
| 10 | 0.0 | 0.000000 | 0.73 | 0.000000 | 0.000000 | semantic |


In [92]:
score_df.groupby('experiment_name').mean()

| experiment_name | context_recall | faithfulness | factual_correctness | answer_relevancy | context_entity_recall |
|---|---|---|---|---|---|
| bm25 | 0.0 | 0.354061 | 0.3625 | 0.596466 | 0.030303 |
| contextual_compression | 0.020833 | 0.355664 | 0.457 | 0.784635 | 0.038194 |
| ensemble | 0.166667 | 0.449142 | 0.334545 | 0.436101 | 0.025 |
| multi_query | 0.104167 | 0.379238 | 0.3725 | 0.785341 | 0.047619 |
| naive | 0.0 | 0.33674 | 0.38 | 0.711633 | 0.037037 |
| parent_document | 0.0 | 0.465657 | 0.314444 | 0.542614 | 0.030303 |
| semantic | 0.0 | 0.233155 | 0.473636 | 0.173299 | 0.0 |
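
One way to start compiling the averaged scores above into a comparison is to find the leading retriever for each metric. The sketch below copies the mean values from the `groupby` output into a plain dictionary; `best_per_metric` is a hypothetical helper, not a Ragas function. Many evaluation jobs above failed with connection errors and timeouts, so any ranking drawn from these numbers should be treated as provisional.

```python
# Mean scores copied from the groupby output above.
METRICS = [
    "context_recall",
    "faithfulness",
    "factual_correctness",
    "answer_relevancy",
    "context_entity_recall",
]
RAW_MEANS = {
    "bm25": [0.0, 0.354061, 0.3625, 0.596466, 0.030303],
    "contextual_compression": [0.020833, 0.355664, 0.457, 0.784635, 0.038194],
    "ensemble": [0.166667, 0.449142, 0.334545, 0.436101, 0.025],
    "multi_query": [0.104167, 0.379238, 0.3725, 0.785341, 0.047619],
    "naive": [0.0, 0.33674, 0.38, 0.711633, 0.037037],
    "parent_document": [0.0, 0.465657, 0.314444, 0.542614, 0.030303],
    "semantic": [0.0, 0.233155, 0.473636, 0.173299, 0.0],
}
mean_scores = {name: dict(zip(METRICS, values)) for name, values in RAW_MEANS.items()}

def best_per_metric(scores):
    """Map each metric to the experiment with the highest mean score."""
    return {
        metric: max(scores, key=lambda name: scores[name][metric])
        for metric in METRICS
    }

leaders = best_per_metric(mean_scores)
for metric, name in leaders.items():
    print(f"{metric}: {name} ({mean_scores[name][metric]:.3f})")
```

No single retriever leads on every metric here, which is why the written analysis also needs to weigh cost and latency rather than relying on one score.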
