# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up QDrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain integration packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

We'll also provide our OpenAI key, as well as our Cohere API key.

In [1]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

In [2]:
os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

## Task 2: Data Collection and Preparation

We'll be using our Loan Data once again - this time the structured data available through the CSV!

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

In [3]:
from langchain_community.document_loaders.csv_loader import CSVLoader

loader = CSVLoader(
    file_path="./data/complaints.csv",
    metadata_columns=[
      "Date received", 
      "Product", 
      "Sub-product", 
      "Issue", 
      "Sub-issue", 
      "Consumer complaint narrative", 
      "Company public response", 
      "Company", 
      "State", 
      "ZIP code", 
      "Tags", 
      "Consumer consent provided?", 
      "Submitted via", 
      "Date sent to company", 
      "Company response to consumer", 
      "Timely response?", 
      "Consumer disputed?", 
      "Complaint ID"
    ]
)

loan_complaint_data = loader.load()

# Use the complaint narrative as the text that gets embedded and searched
for doc in loan_complaint_data:
    doc.page_content = doc.metadata["Consumer complaint narrative"]

Let's look at an example document to see if everything worked as expected!

In [4]:
loan_complaint_data[0]

Document(metadata={'source': './data/complaints.csv', 'row': 0, 'Date received': '03/27/25', 'Product': 'Student loan', 'Sub-product': 'Federal student loan servicing', 'Issue': 'Dealing with your lender or servicer', 'Sub-issue': 'Trouble with how payments are being handled', 'Consumer complaint narrative': "The federal student loan COVID-19 forbearance program ended in XX/XX/XXXX. However, payments were not re-amortized on my federal student loans currently serviced by Nelnet until very recently. The new payment amount that is effective starting with the XX/XX/XXXX payment will nearly double my payment from {$180.00} per month to {$360.00} per month. I'm fortunate that my current financial position allows me to be able to handle the increased payment amount, but I am sure there are likely many borrowers who are not in the same position. The re-amortization should have occurred once the forbearance ended to reduce the impact to borrowers.", 'Company public response': 'None', 'Company'

## Task 3: Setting up QDrant!

Now that we have our documents, let's create a QDrant VectorStore with the collection name "LoanComplaints".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [5]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    loan_complaint_data,
    embeddings,
    location=":memory:",
    collection_name="LoanComplaints"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever treats each complaint as a document and uses cosine similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later.
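Under the hood, this is a nearest-neighbor search. The sketch below shows brute-force cosine-similarity top-`k` in plain Python, assuming tiny hand-made 2-dimensional vectors (the names `doc_vecs` and `top_k` are illustrative) in place of real 1536-dimensional `text-embedding-3-small` embeddings. A vector database such as Qdrant can use an approximate index instead of scanning every vector.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Dot product divided by the product of the vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int) -> list[str]:
    # Score every document against the query and keep the k most similar ids
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy 2-d "embeddings" (hypothetical values)
doc_vecs = {"a": [1.0, 0.0], "b": [0.7, 0.7], "c": [0.0, 1.0]}
print(top_k([1.0, 0.2], doc_vecs, k=2))
# → ['a', 'b']
```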

In [6]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [7]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-4.1-nano` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [8]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4.1-nano")

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [9]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and piping it into naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # "context"  : re-assigned to its own value from the previous step; every other key
    #              (here, "question") passes through unchanged, so this step is effectively a no-op
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)
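To make the data flow of this chain explicit, here is a plain-Python sketch of what it computes at each step. It is not LangChain code: `retrieve`, `prompt`, and `llm` are hypothetical stand-ins for `naive_retriever`, `rag_prompt`, and `chat_model`. One consequence worth noting: the prompt's `{context}` slot receives the retrieved documents themselves, so in the real chain it appears to be filled with the string form of the `Document` list, metadata included, which is likely how the model sees fields such as `Timely response?`.

```python
from operator import itemgetter
from typing import Callable


def run_rag_chain(
    inputs: dict,
    retrieve: Callable[[str], list],
    prompt: Callable[..., str],
    llm: Callable[[str], str],
) -> dict:
    # Step 1: {"context": retrieve(question), "question": question}
    question = itemgetter("question")(inputs)
    state = {"context": retrieve(question), "question": question}
    # Step 2: re-assign "context" to its own value (the RunnablePassthrough.assign step)
    state = {**state, "context": itemgetter("context")(state)}
    # Step 3: format the prompt, call the model, and return the answer alongside the context
    return {"response": llm(prompt(**state)), "context": state["context"]}


# Hypothetical stubs standing in for the retriever, prompt template, and chat model
def fake_retrieve(question: str) -> list:
    return ["doc about " + question]


def fake_prompt(question: str, context: list) -> str:
    return f"Q: {question}\nC: {context}"


def fake_llm(text: str) -> str:
    return text.upper()


result = run_rag_chain({"question": "late fees"}, fake_retrieve, fake_prompt, fake_llm)
print(result["context"])
# → ['doc about late fees']
```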

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry-picked prompts that showcase the individual strengths of each of the retrieval strategies - you'd be correct!

In [10]:
naive_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided complaints, one of the most common issues with loans, specifically student loans in this context, appears to be related to mismanagement and errors in handling the loans. This includes problems such as:\n\n- Errors in loan balances and incorrect reporting.\n- Unjustified increases in interest rates and balances.\n- Unauthorized or unknown transfers of loans without notification.\n- Bad or misleading information about loan repayment and status.\n- Issues with payments being misapplied or blocked from applying towards principal.\n- Discrepancies and confusion caused by multiple loan servicers and lack of transparency.\n- Problems with loan forgiveness, discharge, or long-term forbearance that lead to continued debt growth.\n\nOverall, mismanagement, inaccurate information, lack of transparency, and servicing errors seem to be the most common and recurring problems documented in these complaints.'

In [11]:
naive_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, yes, some complaints did not get handled in a timely manner. Specifically, there are multiple complaints where the response status indicates they were closed with explanation, and the designated response time was marked as "No" (not timely). For example, the complaint from 03/28/25 submitted to MOHELA was marked as "Timely response?": No, suggesting it was not handled within the expected time frame. \n\nAdditionally, some complaints reflect ongoing issues and delays in resolution, such as the complaint about an unprocessed loan application that has been pending for over a year, and other complaints involving delays in responses or unresolved issues over extended periods.\n\nTherefore, the answer is: Yes, there were complaints that were not handled in a timely manner.'

In [12]:
naive_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'Based on the provided information, people failed to pay back their loans primarily because of several interconnected issues:\n\n1. **Accumulation of Interest During Forbearance or Deferment:** Many borrowers were offered options like forbearance or deferment, but interest continued to accrue during these periods, increasing the total amount owed and making repayment more difficult once payments resumed.\n\n2. **Lack of Clear Communication and Notification:** Several complaints highlight that borrowers were not adequately informed about when their repayment obligations would restart or about transfers between loan servicers. This lack of communication led to unexpected delinquencies and damage to credit scores.\n\n3. **Inability to Afford Payments:** Borrowers often found increasing their monthly payments to be unaffordable due to the expense of daily necessities, stagnant wages, or financial hardships, which extended the repayment period and increased total debt.\n\n4. **Mismanagement

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model) which is a sparse representation of text.

In essence, it's a way to score how relevant a document is to a query based on the words they share, giving more weight to rare words and dampening the effect of repeated words and long documents.
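A minimal sketch of Okapi BM25 scoring, to show where the "rare words count more" and "long documents are normalized" behavior comes from. It assumes a naive lowercase whitespace tokenizer, the typical parameter values `k1 = 1.5` and `b = 0.75`, and a non-negative IDF variant; the library LangChain uses under the hood may differ in details such as IDF smoothing, so treat this as an illustration rather than a reimplementation.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (an assumption); real systems often strip punctuation, stem, etc.
    return text.lower().split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many documents contain each term
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # no exact match, no contribution (synonyms don't count)
            # Rare terms (low df) get a larger inverse-document-frequency weight
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 saturates repeated terms; b normalizes by document length
            norm_tf = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(toks) / avgdl))
            score += idf * norm_tf
        scores.append(score)
    return scores


docs = [
    "nelnet charged a late fee on my payment",
    "my servicer misapplied my payment",
    "mohela did not respond to my complaint",
]
print(bm25_scores("nelnet late fee", docs))
```

Only the first document scores above zero, because the others share no query terms. Scoring a single term shows the IDF effect: `"nelnet"` (one document) outweighs `"my"` (every document) on the same document.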

This retriever is very straightforward to set up! Let's see it happen down below!


In [13]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(loan_complaint_data)

We'll construct the same chain - only changing the retriever.

In [14]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [15]:
bm25_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with loans appears to be problems related to dealing with lenders or servicers. Multiple complaints mention difficulties in obtaining accurate information, issues with payment application, and disputes over loan details, indicating that many borrowers face challenges in effectively managing their loan accounts and getting clear, truthful responses from their loan servicers.'

In [16]:
bm25_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, all the complaints reviewed indicate that the companies responded in a timely manner. The responses to complaints from EdFinancial Services and Maximus Federal Services were both marked as "Timely response? : Yes." Therefore, there is no evidence in the provided data to suggest that any complaints were not handled in a timely manner.'

In [17]:
bm25_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People fail to pay back their loans for various reasons, including issues with their payment plans, miscommunication or lack of communication from the loan servicers, errors or mismanagement by the servicers, and inability to get approved for deferments or forbearances. For example, some borrowers experience trouble with their payment plans or have their payments reversed due to errors, while others are not properly informed about changes or issues with their accounts, leading to missed payments or increased debt. Additionally, some borrowers report that servicers do not respond to their requests for deferment or forbearance, causing them to continue receiving bills despite their financial hardship. Overall, systemic issues such as poor communication, administrative errors, and problematic handling of payment plans contribute to borrowers' failure to repay their loans."

It's not clear whether this is better or worse. If only we had a way to test this! (SPOILERS: we do; the second half of the notebook will cover this.)

#### ❓ Question #1:

Give an example query where BM25 is better than embeddings and justify your answer.

##### ✅ Answer:

Here's an example query where BM25 would likely perform better than embeddings:

## Example Query:
**"What are the specific fees charged by Nelnet for late payments?"**

## Why BM25 would be better:

### 1. **Exact Keyword Matching**
- BM25 excels at finding documents containing specific terms like "Nelnet", "fees", and "late payments"
- It prioritizes documents that contain these exact words, weighting each match by how rare the term is across the corpus
- Embeddings may instead surface documents that are merely semantically close (e.g., general payment complaints about any servicer) without mentioning these terms at all

### 2. **Company/Entity Names**
- "Nelnet" is a specific company name that BM25 would treat as a high-value term
- Embeddings might group it with other loan servicers, diluting the relevance
- BM25's inverse document frequency (IDF) weighting gives a distinctive term like "Nelnet" more weight than common words like "loan" or "payment"

### 3. **Technical/Specific Terminology**
- Terms like "late payments", "fees", "charges" are specific financial terms
- BM25 would find documents containing these exact terms (as a bag-of-words model, it ignores word order rather than matching whole phrases)
- Embeddings might retrieve documents about general payment issues instead

### 4. **Sparse Information Retrieval**
- When looking for specific factual information (like fee amounts), BM25's keyword-based approach is more precise
- Embeddings might return broader, more general documents about payment problems

### 5. **Domain-Specific Vocabulary**
- Financial/loan terminology often has precise meanings
- BM25 treats each term independently, while embeddings might conflate related but distinct concepts
- For example, "late payment fees" vs "processing fees" vs "origination fees" are different concepts

## Real-world scenario:
If a user is specifically looking for Nelnet's late payment fee structure, BM25 would likely return documents that explicitly mention "Nelnet" and "late payment fees", while embeddings might return documents about general payment issues with any servicer.

This demonstrates BM25's strength in **precision** for specific, keyword-heavy queries versus embeddings' strength in **semantic understanding** for conceptual queries.

## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.
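The retrieve-then-rerank idea can be sketched in a few lines. The scoring function here is a hypothetical word-overlap stand-in for a learned reranking model such as Cohere Rerank, which scores each query/document pair jointly; `top_n` plays the role of the "compression" (LangChain's `CohereRerank` exposes a similarly named parameter).

```python
from typing import Callable


def rerank(
    query: str,
    candidates: list[str],
    score_fn: Callable[[str, str], float],
    top_n: int = 3,
) -> list[str]:
    # Score each (query, document) pair, then keep only the top_n best
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_n]


def overlap_score(query: str, doc: str) -> float:
    # Hypothetical stand-in for a reranking model:
    # fraction of query words that also appear in the document
    query_words = set(query.lower().split())
    return len(query_words & set(doc.lower().split())) / len(query_words)


# Pretend these came back from a first-stage retriever with a large k
candidates = [
    "question about my balance",
    "late fee added to payment",
    "payment was misapplied",
    "charged a late fee",
    "forbearance ended",
]
print(rerank("late payment fee", candidates, overlap_score, top_n=2))
# → ['late fee added to payment', 'charged a late fee']
```

This is why the naive retriever uses a generous `k`: the reranker can only choose among documents the first stage already returned.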

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [18]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-v3.5")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)

Let's create our chain again, and see how this does!

In [19]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [20]:
contextual_compression_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, a common issue with loans, particularly student loans, appears to be problems related to improper handling and misinformation by loan servicers. Specific issues include errors in loan balances, misapplied payments, wrongful denials of payment plans, inadequate communication, and mishandling of personal information, which can lead to disputes and credit reporting inaccuracies. \n\nIn general, a most common issue with loans is **mismanagement or miscommunication by servicers, leading to errors in balances, payments, and account handling.**'

In [21]:
contextual_compression_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided context, at least one complaint explicitly mentions that it was not handled in a timely manner. For example, the complaint with Complaint ID 12975634 describes a situation where the individual has been awaiting a response and resolution for over a year, with nearly 18 months having passed without resolution. Additionally, the complaint with Complaint ID 12973003 was responded to promptly, indicating a "Yes" for timely response, but the ongoing issues suggest that not all complaints are resolved promptly. \n\nOverall, the data indicates that some complaints did not get handled in a timely manner, notably the complaint about the account review and FERPA violations which remained unresolved for over a year.'

In [22]:
contextual_compression_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily due to a combination of factors such as lack of awareness about the obligation to repay, poor communication from lenders or servicers, and the accumulation of interest that made repayment more difficult over time. Specifically, some borrowers were not informed about the need to repay their loans or how interest would grow, especially when loans were transferred or handled without their knowledge. Others faced challenges because the only options offered—like forbearance or deferment—allowed interest to continue accruing, which increased the total amount owed and extended the repayment period. Additionally, financial hardships, stagnant wages, and the misconception that they would qualify for forgiveness programs contributed to difficulties in repaying their loans.'

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had....more than one query!

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and using an LLM to generate `n` new queries.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context.
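The three steps above can be sketched without any LLM or vector store. Here `generate_queries` and `retrieve` are hypothetical callables (stubbed with a lookup table), and `include_original` is our own flag; by default only the generated queries are used, which is our understanding of LangChain's default behavior.

```python
from typing import Callable


def multi_query_retrieve(
    question: str,
    generate_queries: Callable[[str], list[str]],
    retrieve: Callable[[str], list[str]],
    include_original: bool = False,
) -> list[str]:
    # 1. Ask an LLM (any callable here) for alternative phrasings of the question
    queries = generate_queries(question)
    if include_original:
        queries = [question] + queries
    # 2. Retrieve documents for each query
    # 3. Keep the unique union, preserving first-seen order
    seen: set[str] = set()
    union: list[str] = []
    for query in queries:
        for doc in retrieve(query):
            if doc not in seen:
                seen.add(doc)
                union.append(doc)
    return union


# Toy stand-ins for the LLM and the vector store (hypothetical data)
INDEX = {
    "loan payment problems": ["doc_a", "doc_b"],
    "repayment issues": ["doc_b", "doc_c"],
    "trouble with payments": ["doc_d"],
}


def fake_generate(question: str) -> list[str]:
    return ["repayment issues", "trouble with payments"]


def fake_retrieve(query: str) -> list[str]:
    return INDEX.get(query, [])


print(multi_query_retrieve("loan payment problems", fake_generate, fake_retrieve))
# → ['doc_b', 'doc_c', 'doc_d']
```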

So, how hard is it to set up? Not very! Let's see it down below!



In [23]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)

In [24]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [25]:
multi_query_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided complaints, appears to be inaccuracies and errors in loan information and account management. Specific problems include incorrect loan balances, misapplied payments, wrongful delinquencies or defaults, errors in reported account statuses, and misinformation about interest accrual and repayment history. Additionally, many complaints concern poor communication, lack of transparency, and mishandling of sensitive borrower data.'

In [26]:
multi_query_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints data, yes, there are complaints indicating that some complaints were not handled in a timely manner. Specifically, several complaints explicitly state that responses or resolutions were delayed beyond the expected time frames.\n\nFor example:\n- Complaint with ID **12668396** (received 03/26/25) from MOHELA states the response was **"No"** for timeliness, indicating they did not respond in time.\n- Complaint with ID **13062402** (received 04/18/25) from Nelnet mentions a **"Timely response?": "Yes"**, but the complaint from 03/25/25 (ID 12654977) from MOHELA indicates **"No"**, highlighting a delayed response.\n- Several complaints note delays of months or over a year before resolution or response, showing systemic issues with timely handling.\n\nGiven this evidence, the answer is:\n\n**Yes, some complaints did not get handled in a timely manner.**'

In [27]:
multi_query_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People often failed to pay back their loans due to a combination of systemic issues, mismanagement by lenders and servicers, lack of proper communication, and deceptive practices. The provided context highlights several key reasons:\n\n1. **Misleading or Bad Information from Loan Servicers:** Consumers reported receiving incorrect data about their loan balances, payment status, or repayment requirements, which led to confusion and missed payments.\n\n2. **Predatory and Deceptive Lending Practices:** Some loans were obtained under circumstances where borrowers were not fully informed of their obligations, or loans were mismanaged, leading to higher balances over time due to accumulated interest and fees.\n\n3. **Servicer misconduct and systemic failures:** There are multiple reports of servicers failing to follow regulatory guidelines, such as neglecting proper notification protocols, misapplying payments, or reporting false delinquency statuses to credit bureaus, which can cause borro

#### ❓ Question #2:

Explain how generating multiple reformulations of a user query can improve recall.

##### ✅ Answer:

Here's how generating multiple reformulations of a user query can improve recall:

## How Multiple Query Reformulations Improve Recall:

### 1. **Addressing Vocabulary Mismatch**
- **Problem**: Users and documents may use different terms for the same concept
- **Solution**: Multiple reformulations can include synonyms, related terms, and alternative phrasings
- **Example**: 
  - Original: "loan payment problems"
  - Reformulations: "payment difficulties", "repayment issues", "trouble with payments", "payment challenges"

### 2. **Capturing Different Query Intentions**
- **Problem**: A single query might not capture all possible interpretations
- **Solution**: Different reformulations can explore various aspects of the query
- **Example**:
  - Original: "student loan issues"
  - Reformulations: 
    - "problems with student loans"
    - "student loan complaints"
    - "difficulties with education financing"
    - "issues with federal student aid"

### 3. **Handling Query Ambiguity**
- **Problem**: Queries can be ambiguous or underspecified
- **Solution**: Multiple reformulations can disambiguate and explore different meanings
- **Example**:
  - Original: "loan problems"
  - Reformulations:
    - "problems with loan payments"
    - "problems with loan servicers"
    - "problems with loan applications"
    - "problems with loan forgiveness"

### 4. **Expanding Semantic Coverage**
- **Problem**: Documents might be relevant but use different semantic expressions
- **Solution**: Reformulations can cover broader semantic space
- **Example**:
  - Original: "late payment fees"
  - Reformulations:
    - "penalties for missed payments"
    - "charges for overdue amounts"
    - "fees for delayed payments"
    - "consequences of not paying on time"

### 5. **Overcoming Embedding Limitations**
- **Problem**: A single query embedding is one point in vector space, so relevant documents phrased differently may fall outside its top-`k` neighbors
- **Solution**: Multiple queries increase the chance of finding relevant documents
- **Mechanism**: Each reformulation creates a different vector, potentially retrieving different document sets

## Technical Process:

1. **Query Generation**: LLM creates multiple reformulations of the original query
2. **Parallel Retrieval**: Each reformulation is used to retrieve documents independently
3. **Deduplication**: Remove duplicate documents across all retrieval results
4. **Ranking (optional)**: Some variants combine and rank the unique documents with a fusion algorithm such as Reciprocal Rank Fusion; LangChain's `MultiQueryRetriever` simply returns the unique union

## Benefits:

- **Higher Recall**: More relevant documents are found across different query formulations
- **Better Coverage**: Documents that might be missed by a single query are captured
- **Robustness**: Less dependent on the specific wording of the original query
- **Comprehensive Results**: Provides a more complete picture of available relevant information

This approach essentially "casts a wider net" by exploring the query space from multiple angles, improving the chances of finding all relevant documents.
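A toy calculation makes the recall argument concrete. The document ids and relevance labels below are hypothetical, not measured on the complaints data; the point is that recall is computed over the union of retrieved documents, so extra queries can only keep it the same or raise it.

```python
def recall(retrieved: list[str], relevant: set[str]) -> float:
    # Share of all relevant documents that were retrieved at least once
    return len(set(retrieved) & relevant) / len(relevant)


def precision(retrieved: list[str], relevant: set[str]) -> float:
    # Share of retrieved documents that are relevant
    return len(set(retrieved) & relevant) / len(set(retrieved))


relevant = {"doc_a", "doc_b", "doc_c", "doc_d"}  # hypothetical ground truth
original_query_hits = ["doc_a", "doc_b", "doc_x"]
reformulation_hits = [["doc_b", "doc_c", "doc_y", "doc_z"], ["doc_d", "doc_a"]]

# Unique union of everything retrieved, in first-seen order
union = list(dict.fromkeys(original_query_hits + [d for hits in reformulation_hits for d in hits]))

print(recall(original_query_hits, relevant), recall(union, relevant))
# → 0.5 1.0
```

The same toy numbers also show the cost: precision drops from 2/3 to 4/7, because the wider net also catches `doc_y` and `doc_z`. Reranking or fusion is a common way to claw that back.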

## Task 8: Parent Document Retriever

The Parent Document Retriever follows a "small-to-big" strategy:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent chunks".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that limiting how much semantic information is encoded in each embedding vector makes similarity search more likely to find the most relevant passage, but if we passed only those small chunks to the LLM, we would likely miss relevant surrounding context.
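A minimal plain-Python sketch of "search small, return big", assuming a simplified word-window child splitter (the notebook uses `RecursiveCharacterTextSplitter` with `chunk_size=750` characters) and a hypothetical word-overlap score in place of embedding similarity. The key detail is that several child hits can share a parent, so parents are deduplicated in rank order.

```python
from typing import Callable


def split_words(text: str, chunk_words: int) -> list[str]:
    # Simplified child splitter: fixed-size word windows
    words = text.split()
    return [" ".join(words[i : i + chunk_words]) for i in range(0, len(words), chunk_words)]


def build_children(parents: dict[str, str], chunk_words: int) -> list[tuple[str, str]]:
    # Each child chunk keeps the id of the parent document it came from
    return [(chunk, pid) for pid, text in parents.items() for chunk in split_words(text, chunk_words)]


def parent_retrieve(
    query: str,
    children: list[tuple[str, str]],
    parents: dict[str, str],
    score_fn: Callable[[str, str], float],
    k: int = 2,
) -> list[str]:
    # Search over the small child chunks...
    top_children = sorted(children, key=lambda child: score_fn(query, child[0]), reverse=True)[:k]
    # ...but return the full parent documents, deduplicated in rank order
    parent_ids = list(dict.fromkeys(pid for _, pid in top_children))
    return [parents[pid] for pid in parent_ids]


def overlap_score(query: str, text: str) -> float:
    # Hypothetical stand-in for embedding similarity
    q = set(query.lower().split())
    return len(q & set(text.lower().split())) / len(q)


parents = {
    "p1": "applied for income driven repayment months ago then a late fee appeared",
    "p2": "my loan was transferred to a new servicer without notice",
}
children = build_children(parents, chunk_words=4)
print(parent_retrieve("late fee", children, parents, overlap_score, k=2))
```

The best child match is the short chunk "a late fee appeared", but what comes back is the whole `p1` complaint, including the repayment context the chunk alone would have lost.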

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [28]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = loan_complaint_data
child_splitter = RecursiveCharacterTextSplitter(chunk_size=750)

We'll need to set up a new QDrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension (1536 for `text-embedding-3-small`); you'll need to change this if you're using a different embedding model.

In [30]:
from langchain_qdrant import QdrantVectorStore

client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = QdrantVectorStore(
    collection_name="full_documents", embedding=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)

Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [31]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [32]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [33]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [34]:
parent_document_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the context provided, appears to be related to errors and misconduct in loan servicing and reporting. Specifically, these issues include incorrect information on credit reports, misapplied payments, wrongful denials of payment plans, discrepancies in loan balances and interest rates, and problems arising from loan transfers and sale of loans, which can lead to confusion and unfair practices. Additionally, there are concerns about illegal credit reporting and failure to verify the legitimacy of debts, especially in the context of changes in loan management and government agency dissolution.\n\nIf you are asking about the most common problem in general, based on this context, it would be issues related to mismanagement, inaccuracies in reporting, and servicing misconduct.'

In [35]:
parent_document_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, it appears that at least some complaints were not handled in a timely manner. Specifically, the complaint with ID 12709087 from MOHELA dated 03/28/25 and the complaint ID 12935889 from MOHELA dated 04/11/25 both indicate "No" under the "Timely response?" field, suggesting these issues were not addressed promptly. Additionally, multiple complaints mention excessive wait times and lack of communication, reinforcing that some complaints did not receive timely attention.'

In [36]:
parent_document_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily due to financial hardship, lack of proper information, and issues with loan management. For example, some borrowers experienced severe financial difficulties after graduation, making it difficult to make timely payments. Others faced challenges related to misrepresentation by educational institutions about the value of their degrees and the financial obligations involved, which led to long-term financial consequences and unaffordable debt burdens. Additionally, there were issues with loan servicing, such as being unable to get clear communication or proper notice about payment requirements, loan buyouts, or account management problems. These combined factors contributed to their inability to repay their loans successfully.'

Overall, the performance *seems* largely the same. A tool like Ragas can answer the performance question more rigorously.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever takes two or more retrievers and combines their retrieved documents using a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.
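To make the fusion step concrete, here is a minimal pure-Python sketch of *weighted* Reciprocal Rank Fusion. It assumes each retriever returns an ordered list of document IDs. It uses the constant `c = 60` from the RRF paper, which is also, to our knowledge, the default `c` in LangChain's `EnsembleRetriever`. LangChain's own implementation (for example, how it identifies duplicate documents) may differ in detail.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse ranked lists of document IDs with weighted Reciprocal Rank Fusion.

    Each document scores sum(weight_i / (c + rank_i)) over every list it appears in,
    where rank_i starts at 1 for the top result of list i.
    """
    if len(ranked_lists) != len(weights):
        raise ValueError("Need exactly one weight per ranked list")

    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (c + rank)

    # Highest fused score first; sorted() is stable, so ties keep first-seen order
    return sorted(scores, key=scores.get, reverse=True)


# Toy example with two equally weighted retrievers
bm25_results = ["a", "b", "c"]
dense_results = ["b", "d", "a"]
print(weighted_rrf([bm25_results, dense_results], weights=[0.5, 0.5]))  # ['b', 'a', 'd', 'c']
```

Each contribution is `weight / (c + rank)`, so a document that several retrievers rank highly (here `b`, ranked 2nd and 1st) can outscore one that a single retriever ranks first. The weights scale each retriever's vote.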

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.

In [37]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

We've packed *all* of these retrievers together in an ensemble; now let's wrap it in our RAG chain.

In [38]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [39]:
ensemble_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided complaints data, a most common issue with loans appears to be related to "Dealing with your lender or servicer," specifically problems such as:\n\n- Receiving bad information about the loan (e.g., incorrect balances, interest rates, or terms)\n- Trouble with how payments are handled, such as being unable to apply payments toward principal or payoff more quickly\n- Disputes over loan transfer or reassignment without proper notification or consent\n- Lack of documentation or verification of loan validity, including missing signed promissory notes\n- Problems with loan consolidation processes, including inadequate disclosure, unexpected payment amounts, or inadequate communication\n- Errors leading to negative impacts like credit score drops, inaccurate reporting, or delinquency status\n\nThe majority of complaints focus on poor communication, errors, or improper handling by loan servicers and agencies, which often result in borrower confusion, financial hardship, o

In [40]:
ensemble_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, it appears that some complaints were handled in a timely manner with responses marked as "Yes," but there are several complaints indicating delays or failures to handle issues promptly. Specifically:\n\n- Complaint ID 12709087 (EdFinancial Services) was marked as "Timely response? Yes."\n- Complaint ID 13056764 (EdFinancial Services) was marked as "Timely response? Yes."\n- Complaint ID 12935889 (MOHELA) was marked as "Timely response? No," indicating it was not handled in a timely manner.\n- Complaint ID 13283043 (EdFinancial Services) was marked as "Timely response? Yes."\n- Complaint ID 12739706 (MOHELA) was marked as "Timely response? No."\n- Several other complaints, such as 125... and 13365901, show responses as "Yes," but some have very long waiting times or unresolved issues.\n\nIn summary, while some complaints were addressed promptly, others, notably complaints about account access, overdue notices, or improper reporting, indicate delays or 

In [41]:
ensemble_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily due to a combination of factors, including:\n\n1. **Accumulating interest due to forbearance and deferment options**: Many borrowers were offered limited options like forbearance or deferment, which allowed interest to continue accruing, making the total debt grow faster than they could pay it off.\n\n2. **Lack of clear communication and transparency**: Borrowers often were not properly informed about their repayment obligations, interest calculations, or changes in loan transfer statuses. Some were unaware when their loans were transferred between servicers or when payments were expected to resume.\n\n3. **Financial hardships and unmet income-based solutions**: Borrowers faced financial difficulties, including unemployment, medical issues, or low income, and were not always offered or able to access income-driven repayment plans or loan forgiveness programs, prolonging their debt and making repayment unrealistic.\n\n4. **Unfavorable loa

## Task 10: Semantic Chunking

While this is not a retrieval method, it *is* an effective way to improve retrieval performance on corpora that have clean semantic breaks.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
   - `percentile`
   - `standard_deviation`
   - `interquartile`
   - `gradient`
3. Each sequence of related sentences is kept as a document!
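The three steps above can be sketched in plain Python. This is a simplified illustration, not LangChain's `SemanticChunker` implementation. It assumes every sentence already has an embedding (tiny hand-made 2-D vectors stand in here). It computes the cosine distance between *adjacent* sentences and starts a new chunk wherever that distance exceeds a fixed, hand-picked threshold. The thresholding methods listed above are different ways of choosing that threshold automatically.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def semantic_chunks(sentences, embeddings, threshold):
    """Group consecutive sentences, starting a new chunk where adjacent distance > threshold."""
    if not sentences:
        return []
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        if cosine_distance(embeddings[i - 1], embeddings[i]) > threshold:
            chunks.append(" ".join(current))  # close the chunk at this breakpoint
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks


sentences = [
    "My payment was misapplied.",
    "The servicer put it toward interest.",
    "Separately, my credit report is wrong.",
    "It lists a late payment I never made.",
]
# Hypothetical embeddings: the first two sentences point one way, the last two another
embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
print(semantic_chunks(sentences, embeddings, threshold=0.3))
```

Only the gap between the second and third sentences (distance of roughly 0.69) exceeds the threshold, so the result is two chunks: one about the misapplied payment and one about the credit report.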

Let's see how to implement this!

We'll use the `percentile` thresholding method for this example, which will:

Calculate the distance between each pair of adjacent sentences, then split the text wherever a distance exceeds a given percentile of all those distances.
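Here is a minimal sketch of just the `percentile` rule, assuming the adjacent-sentence distances have already been computed (the values below are made up). It interpolates linearly between sorted distances, the same convention as NumPy's default `np.percentile`, and returns the index of every gap that becomes a breakpoint.

```python
def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between sorted values."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def percentile_breakpoints(distances, q=95):
    """Indices i where the gap between sentence i and sentence i+1 exceeds the q-th percentile."""
    threshold = percentile(distances, q)
    return [i for i, d in enumerate(distances) if d > threshold]


# Made-up adjacent-sentence distances with two clear topic shifts (gaps 3 and 7)
distances = [0.05, 0.08, 0.06, 0.62, 0.07, 0.04, 0.09, 0.71, 0.05, 0.06]
print(percentile_breakpoints(distances, q=80))  # [3, 7]
print(percentile_breakpoints(distances, q=95))  # [7]
```

At `q=80`, both topic shifts are flagged. At `q=95` (which, as far as we know, is `SemanticChunker`'s default amount for this method), only gap 7 clears the threshold. With just ten gaps, a high percentile leaves room for very few breakpoints.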

In [42]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)

Now we can split our documents (just the first 20 complaints here).

In [43]:
semantic_documents = semantic_chunker.split_documents(loan_complaint_data[:20])

Let's create a new vector store.

In [44]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="Loan_Complaint_Data_Semantic_Chunks"
)

We'll use naive retrieval for this example.

In [45]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})
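Conceptually, `as_retriever(search_kwargs={"k" : 10})` asks the vector store for the 10 stored chunks whose embeddings are most similar to the query embedding. The brute-force sketch below shows that ranking logic with cosine similarity over hand-made 2-D vectors. It illustrates the idea under those assumptions; Qdrant performs the real search with its own index and configured distance metric.

```python
import heapq
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def top_k(query_vec, doc_vecs, k):
    """Return (index, score) pairs for the k documents most similar to the query."""
    scored = ((i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical chunk embeddings and query embedding
doc_vecs = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0], [0.9, 0.1]]
query_vec = [1.0, 0.0]
print([i for i, _ in top_k(query_vec, doc_vecs, k=2)])  # [0, 3]
```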

Finally, we can create our classic chain!

In [46]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [47]:
semantic_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided complaints, the most common issue with loans appears to be problems related to poor communication, inaccurate or delayed information, and mishandling of repayment plans. Specific recurring issues include:\n\n- Struggling to repay or problematic payment plans\n- Errors or discrepancies in loan reporting and account status\n- Difficulties with auto-debit setup and payments not processing\n- Lack of transparency, delays, or confusion regarding loan account details and servicer changes\n- Unauthorized or illegal reporting and breach of privacy\n\nWhile these issues vary, a common theme is that many borrowers face frustrations due to mismanagement, poor communication, and errors in servicing or reporting.'

In [48]:
semantic_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Yes, according to the provided complaints, several complaints were handled in a timely manner, with responses marked as "Yes" for timeliness. However, there is at least one complaint (Complaint ID: 13331376) where the complaint was closed with an explanation, indicating that the issue was addressed or at least responded to by the company. \n\nBased on this data, it appears that complaints generally did not go unhandled in a timely manner, as responses were received within the expected timeframes. But without explicit information on all complaints that may not have been handled timely, I cannot definitively say that no complaints were left unhandled or significantly delayed. \n\nTherefore, the answer is:  \n**From the provided information, all complaints that included response data seem to have been handled in a timely manner.**  \nIf you need a definitive answer across all complaints, I must clarify that the data suggests timely handling for the complaints mentioned.'

In [49]:
semantic_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for various reasons, including issues with loan management and communication, technical problems, and disputes over the legitimacy or status of their loans. For example, some borrowers experienced difficulties due to lack of clarity about their repayment status, such as being incorrectly placed in default or delinquency despite never being in default. Others faced problems with loan servicing, such as missing payments, re-amortization issues after forbearance, or inaccurate reporting that negatively impacted their credit scores. Additionally, some borrowers encountered obstacles related to documentation and verification processes, which stalled their efforts to qualify for loan forgiveness or discharge. In some cases, disputes arose over the legality or legitimacy of the loans themselves, especially following administrative changes or breaches of privacy laws. Overall, these issues often stem from poor communication, errors in loan processing, or 

#### ❓ Question #3:

If sentences are short and highly repetitive (e.g., FAQs), how might semantic chunking behave, and how would you adjust the algorithm?

##### ✅ Answer:

Here's how semantic chunking would behave with short, highly repetitive sentences (like FAQs) and how to adjust the algorithm:

### 1. **Over-chunking Problem**
- **Issue**: Short, repetitive sentences will have very similar embeddings
- **Result**: The algorithm might create too many tiny chunks or fail to find meaningful breakpoints
- **Example**: FAQ sentences like "What is X?" "How do I Y?" "Where can I find Z?" would all have similar semantic vectors

### 2. **Poor Semantic Differentiation**
- **Issue**: Repetitive content lacks semantic diversity
- **Result**: The similarity thresholding methods (percentile, standard deviation, etc.) won't find clear breakpoints
- **Problem**: All sentences appear equally similar, so no natural chunking boundaries are detected

### 3. **Ineffective Thresholding**
- **Issue**: Methods like percentile or standard deviation assume semantic variation
- **Result**: With repetitive content, these methods may create arbitrary or inconsistent chunks
- **Example**: If all sentence similarities fall between 0.85 and 0.90, a percentile rule still splits at the largest of those near-identical gaps, so the resulting breakpoints are essentially arbitrary
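A quick way to see the problem: when every adjacent-sentence distance sits in a narrow band (here 0.10 to 0.15, i.e. similarities of roughly 0.85 to 0.90), a percentile rule still flags the top few percent of gaps as breakpoints. Those gaps differ from their neighbors only by noise. The sketch below uses made-up distances and Python's `statistics.quantiles` with `method="inclusive"`, which interpolates linearly between data points.

```python
import random
import statistics

# Made-up distances for 40 adjacent FAQ-style sentence pairs, all within a narrow band
random.seed(0)
distances = [random.uniform(0.10, 0.15) for _ in range(40)]

# quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile
threshold = statistics.quantiles(distances, n=100, method="inclusive")[94]
breakpoints = [i for i, d in enumerate(distances) if d > threshold]

# The rule still produces breakpoints even though no real topic shift exists
print(f"threshold={threshold:.3f}, breakpoints={breakpoints}")
```

With 40 distinct distances, exactly the two largest exceed the interpolated 95th percentile. The rule flags roughly the top 5% of gaps whether or not any meaningful boundary is present.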

## How to Adjust the Algorithm:

### 1. **Use Structural Chunking Instead**
```python
# Instead of semantic chunking, use rule-based chunking
from langchain_text_splitters import RecursiveCharacterTextSplitter

structural_chunker = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", ". ", " ", ""]
)
```

### 2. **Adjust Threshold Parameters**
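```python
# For repetitive content, raise the percentile so only the largest similarity
# drops become breakpoints, producing fewer, larger chunks
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile",
    breakpoint_threshold_amount=99,  # default for "percentile" is 95; higher means fewer splits
    min_chunk_size=200,  # skip breakpoints that would leave a chunk under 200 characters
)
```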

### 3. **Pre-filter Repetitive Content**
```python
# Drop exact-duplicate documents (after normalizing whitespace and case) before chunking
def deduplicate_documents(documents):
    seen_content = set()
    filtered_docs = []
    for doc in documents:
        key = " ".join(doc.page_content.split()).lower()
        if key not in seen_content:
            seen_content.add(key)
            filtered_docs.append(doc)
    return filtered_docs
```

### 4. **Use Hybrid Approach**
```python
# Combine structural and semantic chunking.
# Assumes structural_chunker and semantic_chunker are defined as above;
# has_semantic_variation is a hypothetical predicate (Document -> bool) you supply.
def hybrid_chunking(documents, has_semantic_variation):
    # First, split everything on structural boundaries (paragraphs, lines, sentences)
    structural_chunks = structural_chunker.split_documents(documents)

    # Then, re-split only the chunks whose sentences actually vary in meaning
    final_chunks = []
    for chunk in structural_chunks:
        if has_semantic_variation(chunk):
            final_chunks.extend(semantic_chunker.split_documents([chunk]))
        else:
            final_chunks.append(chunk)

    return final_chunks
```

### 5. **Adjust Embedding Strategy**
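```python
# Use domain-specific or fine-tuned embeddings, which may separate near-duplicate
# phrasings better. SemanticChunker expects a LangChain Embeddings object, so wrap
# the sentence-transformers model in HuggingFaceEmbeddings instead of passing it directly.
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_experimental.text_splitter import SemanticChunker

# "your-org/domain-specific-model" is a placeholder; substitute a model fine-tuned for your domain
domain_embeddings = HuggingFaceEmbeddings(model_name="your-org/domain-specific-model")
semantic_chunker = SemanticChunker(domain_embeddings, breakpoint_threshold_type="percentile")
```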

### 6. **Implement Content-Aware Chunking**
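```python
import re

def is_repetitive_content(text, max_unique_ratio=0.6):
    # Hypothetical heuristic: text is "repetitive" if few of its sentences are distinct
    sentences = [s.strip().lower() for s in re.split(r"(?<=[.?!])\s+", text) if s.strip()]
    return bool(sentences) and len(set(sentences)) / len(sentences) <= max_unique_ratio

# Assumes structural_chunker and semantic_chunker are defined as above
def content_aware_chunking(documents):
    chunks = []
    for doc in documents:
        if is_repetitive_content(doc.page_content):
            # Repetitive text: rely on structural boundaries
            chunks.extend(structural_chunker.split_documents([doc]))
        else:
            # Varied text: let semantic similarity decide the boundaries
            chunks.extend(semantic_chunker.split_documents([doc]))
    return chunks
```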

## Best Practices for Repetitive Content:

1. **Analyze content first**: Determine if semantic chunking is appropriate
2. **Use structural chunking**: For FAQs and repetitive content, rule-based chunking is often better
3. **Combine approaches**: Use semantic chunking only where it adds value
4. **Adjust thresholds**: Use higher thresholds for repetitive content
5. **Consider domain-specific solutions**: FAQ content might benefit from question-answer pair chunking

## Alternative for FAQs:
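```python
import re
from langchain_core.documents import Document

def extract_qa_pairs(text):
    # Hypothetical helper: assumes every pair begins on a new line starting with "Q:"
    return [pair.strip() for pair in re.split(r"(?m)^(?=Q:)", text) if pair.strip()]

# For FAQ content, chunk by Q&A pairs instead
def faq_chunking(documents):
    chunks = []
    for doc in documents:
        # One chunk per question-answer pair, carrying over the source metadata
        for qa in extract_qa_pairs(doc.page_content):
            chunks.append(Document(page_content=qa, metadata=dict(doc.metadata)))
    return chunks
```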

The key insight is that semantic chunking works best with semantically diverse content. For repetitive content like FAQs, traditional structural chunking or domain-specific approaches are often more effective.

# 🤝 Breakout Room Part #2

#### 🏗️ Activity #1

Your task is to evaluate the various retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
   - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever-specific* Ragas metrics
   - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison between them
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.

Your analysis should factor in:
  - Cost
  - Latency
  - Performance
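Latency is one of the criteria, so a small, library-agnostic timing harness can help when comparing retrievers. This is a sketch under assumptions. It times any callable that takes a query string (for example, a LangChain retriever's `.invoke`) over a list of questions and reports the median wall-clock latency. It does not measure cost or retrieval quality; LangSmith and Ragas cover those.

```python
import statistics
import time


def median_latency(retrievers, questions, repeats=1):
    """Median wall-clock seconds per call for each named retriever callable.

    `retrievers` maps a name to a function taking a query string, e.g.
    {"bm25": bm25_retriever.invoke, "naive": naive_retriever.invoke}.
    """
    results = {}
    for name, retrieve in retrievers.items():
        timings = []
        for _ in range(repeats):
            for question in questions:
                start = time.perf_counter()
                retrieve(question)
                timings.append(time.perf_counter() - start)
        results[name] = statistics.median(timings)
    # Fastest first, so the comparison reads like a leaderboard
    return dict(sorted(results.items(), key=lambda item: item[1]))
```

Running several repeats and taking the median smooths out one-off network or cold-start delays, which matter for API-backed retrievers such as the Cohere reranker.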

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

##### HINTS:

- LangSmith provides detailed information about latency and cost.

In [None]:
### YOUR CODE HERE