# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up QDrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

We'll also provide our OpenAI key, as well as our Cohere API key.

In [13]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

In [14]:
os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

## Task 2: Data Collection and Preparation

We'll be using our Loan Data once again - this time the structured data available through the CSV!

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

In [15]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

loader = CSVLoader(
    file_path="./data/complaints.csv",
    metadata_columns=[
      "Date received", 
      "Product", 
      "Sub-product", 
      "Issue", 
      "Sub-issue", 
      "Consumer complaint narrative", 
      "Company public response", 
      "Company", 
      "State", 
      "ZIP code", 
      "Tags", 
      "Consumer consent provided?", 
      "Submitted via", 
      "Date sent to company", 
      "Company response to consumer", 
      "Timely response?", 
      "Consumer disputed?", 
      "Complaint ID"
    ]
)

loan_complaint_data = loader.load()

for doc in loan_complaint_data:
    doc.page_content = doc.metadata["Consumer complaint narrative"]

Let's look at an example document to see if everything worked as expected!

In [16]:
loan_complaint_data[0]

Document(metadata={'source': './data/complaints.csv', 'row': 0, 'Date received': '03/27/25', 'Product': 'Student loan', 'Sub-product': 'Federal student loan servicing', 'Issue': 'Dealing with your lender or servicer', 'Sub-issue': 'Trouble with how payments are being handled', 'Consumer complaint narrative': "The federal student loan COVID-19 forbearance program ended in XX/XX/XXXX. However, payments were not re-amortized on my federal student loans currently serviced by Nelnet until very recently. The new payment amount that is effective starting with the XX/XX/XXXX payment will nearly double my payment from {$180.00} per month to {$360.00} per month. I'm fortunate that my current financial position allows me to be able to handle the increased payment amount, but I am sure there are likely many borrowers who are not in the same position. The re-amortization should have occurred once the forbearance ended to reduce the impact to borrowers.", 'Company public response': 'None', 'Company'

## Task 3: Setting up QDrant!

Now that we have our documents, let's create a QDrant VectorStore with the collection name "LoanComplaints".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [17]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    loan_complaint_data,
    embeddings,
    location=":memory:",
    collection_name="LoanComplaints"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply treat each complaint as a document, and use cosine similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later
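
To make "cosine similarity plus top `k`" concrete, here is a minimal sketch in plain Python. It is not how Qdrant searches internally; the 3-dimensional vectors and the `toy_` names are made up for illustration, standing in for real embedding vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]

# Hypothetical toy "embeddings" (assumption: real embeddings are much longer).
toy_doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.6, 0.8, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
toy_query_vec = [1.0, 0.05, 0.0]

nearest = top_k(toy_query_vec, toy_doc_vecs, k=2)
print(nearest)
```

The retriever below does the same job at scale: embed the query, compare it to every stored vector, and keep the `k` closest.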

In [18]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [19]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-4.1-nano` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [20]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4.1-nano")

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [21]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and chaining it into the naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # "context"  : is assigned to a RunnablePassthrough object (will not be called or considered in the next step)
    #              by getting the value of the "context" key from the previous step
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry-picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [22]:
naive_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided complaints, appears to be problems related to mismanagement and mishandling by loan servicers. Specific issues include errors in loan balances, incorrect or conflicting information on credit reports, difficulties applying payments correctly (such as being unable to pay down principal), loan transfer without notification, and improper handling of repayment plans. These issues often lead to negative impacts on credit scores, frustration, and financial hardships for borrowers.'

In [23]:
naive_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, yes, some complaints did not get handled in a timely manner. Specifically, there are complaints where the responses were marked as "No" for timely response:\n\n- Complaint ID 12709087 (submitted to MOHELA): Marked as "No" for being handled in a timely manner.\n- Complaint ID 12973003 (submitted to EdFinancial Services): Marked as "Yes," so handled timely.\n- Complaint ID 12832400 (submitted to Maximus Federal Services): Marked as "Yes," so handled timely.\n- Complaint ID 12975634 (submitted to Maximus Federal Services): Marked as "Yes," so handled timely.\n- Complaint ID 13062402 (submitted to Nelnet, Inc.): Marked as "Yes," so handled timely.\n- Complaint ID 13056764 (submitted to EdFinancial Services): Marked as "Yes," so handled timely.\n\nTherefore, at least one complaint (ID 12709087) was not handled in a timely manner.'

In [24]:
naive_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans primarily because of a combination of factors such as:\n\n1. **Lack of clear communication and notification:** Many borrowers were not adequately informed about important details like loan transfer between servicers, the resumption of payments, or changes in payment due dates, which led to unexpected delinquencies and negative impacts on credit scores.\n\n2. **Confusing or unmanageable payment options:** Borrowers often felt limited to options like forbearance or deferment, which allowed interest to continue accumulating, increasing the total amount owed and making repayment more difficult.\n\n3. **Interest accumulation and lack of transparency:** Many borrowers were unaware of how interest compounded or how their debt was growing despite payments, leading to a feeling that their payments were ineffective or unfair.\n\n4. **Inability to afford higher payments:** Increasing monthly payments to pay off loans faster was often unaffordable, prolonging

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model), which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.
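
For intuition, here is a rough sketch of the Okapi BM25 scoring formula in plain Python. It is an illustration rather than the code `BM25Retriever` runs: the whitespace tokenizer, the `k1` and `b` values, and the `toy_` data are all assumptions made for this example.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with the Okapi BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue  # terms absent from the document contribute nothing
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            tf = term_freq[term]
            # Longer-than-average documents are penalised via b.
            length_norm = 1 - b + b * len(toks) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores

# Hypothetical toy corpus, not taken from the complaints data.
toy_docs = [
    "my payment was applied to the wrong loan",
    "the servicer never answered my phone call",
    "interest kept growing during forbearance",
]
toy_scores = bm25_scores("wrong payment", toy_docs)
print(toy_scores)
```

Only documents that share literal words with the query get a non-zero score, which is exactly why BM25 shines on exact keywords and struggles with paraphrases.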

This retriever is very straightforward to set up! Let's see it happen down below!


In [25]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(loan_complaint_data)

We'll construct the same chain - only changing the retriever.

In [26]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [27]:
bm25_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, it appears that a common issue with loans, particularly student loans, involves problems with the handling and information provided by lenders or servicers. Specific issues include:\n\n- Disputes over fees charged\n- Difficulties with how payments are applied, such as funds not being applied to the principal\n- Receiving incorrect or bad information about loan balances or terms\n- Problems with loan repayment terms, like extended durations and confusing interest calculations\n- Lack of transparency and trust issues due to alleged dishonesty or miscommunication from the loan servicers\n\nOverall, one of the most common themes is problems with dealing with lenders or servicers, especially regarding the accuracy of information, repayment procedures, and fees. Therefore, the most common issue with loans, as indicated in the context, seems to be challenges related to "Dealing with your lender or servicer," particularly issues around the accuracy of fees, loan

In [28]:
bm25_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, all the complaints documented indicate that the companies responded in a timely manner, as they all have a "Yes" for the response being timely. Therefore, there is no evidence suggesting that any complaints were not handled in a timely manner.'

In [29]:
bm25_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People fail to pay back their loans for various reasons, including issues with payment plans, problems with how payments are being handled, lack of clear communication, or delays in receiving assistance or proper information. Specific cases in the provided context include:\n\n- Being steered into the wrong types of forbearances, which can increase the overall debt.\n- Loan transfers to new servicers without proper notification, leading to unrecognized or discontinued autopay setups.\n- Unresponsive loan servicers who do not clarify payment obligations or resolve issues promptly.\n- Inadequate communication about loan status, deferments, or changes, causing borrowers to remain unaware of their actual payment status.\n- Administrative errors such as reversed payments or incorrect billing, negatively impacting credit scores.\n- Delays or failures in processing requests for deferment or forbearance, leading to ongoing billing and debt accumulation.\n\nOverall, failures to pay back loans o

It's not clear whether this is better or worse. If only we had a way to test this! (SPOILERS: We do, and the second half of the notebook will cover it.)

#### ❓ Question #1:

Give an example query where BM25 is better than embeddings and justify your answer.

##### ✅ Answer: BM25 is better than embeddings for queries like "What is the phone number for customer service?" because it is great at exact keyword matching, retrieving documents with specific terms or numbers that embeddings may overlook.

## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.
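
The two bullets above can be sketched in a few lines of plain Python. The word-overlap scorer is a deliberately crude, hypothetical stand-in for a real reranking model, and the candidate strings are invented for the example.

```python
def rerank(query, candidates, score_fn, top_n):
    """Re-score candidates with score_fn and keep only the top_n best."""
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_n]

def toy_overlap_score(query, doc):
    """Hypothetical stand-in for a reranker: fraction of query words found in the doc."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words)

# Pretend these came back from a first-stage retriever with a generous k.
toy_candidates = [
    "servicer transferred my loan without notice",
    "late response to my complaint about payment handling",
    "my complaint never received a timely response",
    "interest rate was higher than promised",
]

compressed = rerank("timely response complaint", toy_candidates, toy_overlap_score, top_n=2)
print(compressed)
```

A real reranker swaps `toy_overlap_score` for a model that reads the query and each document together, which is slower per document but usually far better at judging relevance. That is why we only run it on a shortlist.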

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [30]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-v3.5")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)

Let's create our chain again, and see how this does!

In [31]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [32]:
contextual_compression_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with loans appears to be problems related to dealing with lenders or servicers, including errors in loan balances, misapplied payments, lack of communication, incorrect or inaccurate information, and mishandling of loan data. Specifically, many complaints involve incorrect balances, unauthorized transfers of loans, inadequate documentation, and disputes over information accuracy.'

In [33]:
contextual_compression_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, yes, some complaints did not get handled in a timely manner. Specifically, one complaint about student loan issues has been open for over a year without resolution, and another has been pending for nearly 18 months. The complaint about a bank account missing in the portal has been ongoing for over 2-3 weeks. All three examples indicate delays in addressing the issues.'

In [35]:
contextual_compression_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People often failed to pay back their loans due to a combination of factors including lack of clear communication, unawareness of repayment obligations, administrative errors, and the accumulation of interest despite payments. Specifically:\n\n- Many borrowers were not informed that they needed to repay their loans and only realized this later.\n- Loan transfers and account management issues led to confusion, incorrect information, and missed notices about repayment requirements.\n- Borrowers faced difficulties accessing accurate account information online and were unsure of their actual balances or interest accrued.\n- The options provided—such as forbearance or deferment—often resulted in continued interest accumulation, making repayment more difficult over time.\n- Some borrowers were misled or lacked sufficient information about interest accumulation, repayment options, and the potential to extend the loan term, leading to balances that grew or remained unmanageable.\n- Administra

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` number of new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context
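
The three steps above can be sketched as follows. Both helpers are hypothetical stand-ins: a real setup would have an LLM write the reformulations and a vector store answer each one, and the document ids here are invented.

```python
def multi_query_retrieve(question, generate_queries, retrieve):
    """Run every reformulated query and return unique documents in first-seen order."""
    unique_docs = []
    seen = set()
    for query in generate_queries(question):
        for doc in retrieve(query):
            if doc not in seen:
                seen.add(doc)
                unique_docs.append(doc)
    return unique_docs

def toy_generate_queries(question):
    # Assumption: a real implementation would prompt an LLM with the question.
    return [
        "reasons borrowers missed payments",
        "causes of loan default",
        "what prevented loan repayment",
    ]

# Hypothetical lookup table standing in for a retriever.
toy_index = {
    "reasons borrowers missed payments": ["doc_a", "doc_b"],
    "causes of loan default": ["doc_b", "doc_c"],
    "what prevented loan repayment": ["doc_c", "doc_d"],
}

def toy_retrieve(query):
    return toy_index.get(query, [])

merged = multi_query_retrieve(
    "Why did people fail to pay back their loans?", toy_generate_queries, toy_retrieve
)
print(merged)
```

Notice that the merged context can be larger than any single query's results, so the LLM sees more (and more varied) documents.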

So, how hard is it to set up? Not bad! Let's see it down below!



In [36]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)

In [37]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [38]:
multi_query_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided complaints, appears to be mismanagement and poor handling by loan servicers and agencies. Specific recurring problems include errors in loan balances, misapplication of payments, improper classification or servicing of loans, failure to communicate properly with borrowers, inaccurate reporting to credit bureaus, and mishandling of forgiveness, discharge, or consolidation processes. Many complaints highlight a lack of transparency, inadequate customer service, and violations of federal regulations.'

In [39]:
multi_query_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, several complaints indicate that issues were not handled in a timely manner. Specifically:\n\n- Complaint ID 12739706 (Mohela) was marked as "No" for timely response, suggesting it was NOT handled promptly.\n- Complaint ID 12709087 (MOHELA) was marked as "No" for timely response, indicating delays.\n- Complaint ID 12654977 (MOHELA) also was marked as "No" for timely response.\n- Complaint ID 12698650 (MOHELA) was marked as "Yes," indicating a timely response.\n- Other complaints mentioning delays, such as complaint ID 13160766 (Maximus/Aidvantage), explicitly state issues like delays or lack of response over extended periods.\n\nOverall, multiple complaints reflect that some complaints did not get handled in a timely manner, as evidenced by the "Timely response?" status and the narratives describing ongoing issues, extended wait times, or lack of resolution.\n\nTherefore, the answer is: Yes, several complaints were not handled in a timely manner.'

In [40]:
multi_query_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans primarily due to a combination of systemic issues and mismanagement by loan servicers, including:\n\n1. **Accumulation of Interest During Forbearance and Deferment:** Many borrowers were offered options like forbearance or deferment, but interest continued to accrue and compound, increasing the total amount owed despite making payments or delaying them.\n\n2. **Lack of Clear or Adequate Information:** Borrowers often were not properly informed about their repayment options, such as income-driven repayment plans or loan rehabilitation, which could have made repayment more manageable and prevented interest from ballooning.\n\n3. **Faulty or Misleading Servicing Practices:** Some servicers engaged in forbearance steering, repeatedly placing borrowers into long-term forbearances without informing them of available income-based or forgiveness programs, leading to increased debt and loss of forgiveness eligibility.\n\n4. **Mismanagement and Lack of Tran

#### ❓ Question #2:

Explain how generating multiple reformulations of a user query can improve recall.

##### ✅ Answer: Generating multiple reformulations of a user query can improve recall because it increases the chances of retrieving relevant documents that use different wording or phrasing than the original query.

## Task 8: Parent Document Retriever

A "small-to-big" strategy - the Parent Document Retriever works based on a simple strategy:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent chunks".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
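
Here is a minimal sketch of the "search small, return big" idea using plain dictionaries and lists. The fixed-width splitter and the substring match are toy assumptions standing in for a real text splitter and a vector similarity search.

```python
def split_into_chunks(text, chunk_size):
    """Naive fixed-width splitter, a stand-in for a real text splitter."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def build_stores(parent_docs, chunk_size):
    """Return (docstore, child_index): parents by id, plus (chunk, parent_id) pairs."""
    docstore = {}
    child_index = []
    for parent_id, text in enumerate(parent_docs):
        docstore[parent_id] = text
        for chunk in split_into_chunks(text, chunk_size):
            child_index.append((chunk, parent_id))
    return docstore, child_index

def retrieve_parents(query, docstore, child_index):
    """Match the query against child chunks, then return each matching parent once."""
    parent_ids = []
    for chunk, parent_id in child_index:
        # Substring match is a toy replacement for vector similarity search.
        if query.lower() in chunk.lower() and parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [docstore[pid] for pid in parent_ids]

# Hypothetical parent documents, invented for this example.
toy_parents = [
    "My autopay was cancelled after the transfer. Nobody told me a payment was due.",
    "The balance on my statement is wrong. I have asked for a correction three times.",
]
toy_docstore, toy_children = build_stores(toy_parents, chunk_size=40)

hits = retrieve_parents("autopay", toy_docstore, toy_children)
print(hits)
```

The match happens on a 40-character chunk, but the caller gets the whole complaint back, including the sentence about the missed payment that the chunk alone would have dropped.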

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [41]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = loan_complaint_data
child_splitter = RecursiveCharacterTextSplitter(chunk_size=750)

We'll need to set up a new QDrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension, you'll need to change this if you're using a different embedding model.

In [42]:
from langchain_qdrant import QdrantVectorStore

client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = QdrantVectorStore(
    collection_name="full_documents", embedding=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)

Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [43]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [44]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [45]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [46]:
parent_document_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issues with loans, particularly student loans, include errors in loan balances, misapplied payments, wrongful denials of payment plans, discrepancies in interest rates, complex and opaque servicing processes, and issues related to credit reporting and validation of debt. \n\nA recurring theme across multiple complaints is the presence of errors, mismanagement, or misconduct by loan servicers and the systemic complexities that overwhelm borrowers. \n\nSo, the most common issues appear to be errors in account information, mismanagement by servicers, and systemic breakdowns impacting the accuracy and fairness of loan handling.'

In [47]:
parent_document_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, the complaints regarding student loan issues from MOHELA and Aidvantage indicate that they did not handle the complaints in a timely manner. Specifically, the complaint with ID 12709087 from MOHELA explicitly states "Timely response?": "No." Additionally, the narrative details multiple calls and delays, with no response to the customer\'s complaint despite repeated follow-ups.\n\nTherefore, yes, some complaints did not get handled in a timely manner.'

In [48]:
parent_document_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People often failed to pay back their loans due to a variety of reasons, including financial hardship, lack of proper information about repayment obligations, issues with the loan servicing process, and adverse circumstances such as attending institutions facing financial instability or experiencing health crises. For example, some borrowers encountered problems with loan servicers not providing adequate communication or transparency, which led to missed payments and delinquencies. Others were unable to secure employment or were misled about the value and management of their education, making repayment difficult. In some cases, institutional issues, such as private colleges shutting down or lenders failing to notify borrowers about payment requirements, also contributed to repayment failures.'

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to answer the performance question more rigorously.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever simply takes 2, or more, retrievers and combines their retrieved documents based on a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.
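
As a rough sketch of how weighted Reciprocal Rank Fusion combines rankings: each document earns `weight / (c + rank)` from every list it appears in, and the totals decide the final order. The constant `c=60` follows the linked paper; the per-list weights, the function name, and the toy rankings are assumptions for illustration, not the library's internals.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, weights, c=60):
    """Fuse ranked lists: each doc earns weight / (c + rank) per list it appears in."""
    fused = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] += weight / (c + rank)
    # Highest fused score first.
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical rankings from two different retrievers.
toy_keyword_ranking = ["doc_a", "doc_b", "doc_c"]
toy_vector_ranking = ["doc_b", "doc_d", "doc_a"]

fused_ranking = reciprocal_rank_fusion(
    [toy_keyword_ranking, toy_vector_ranking], weights=[0.5, 0.5]
)
print(fused_ranking)
```

Because only ranks are used, never raw scores, retrievers with incomparable scoring scales (BM25 scores versus cosine similarities, say) can be mixed freely.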

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.

In [49]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

We'll pack *all* of these retrievers together in an ensemble.

In [50]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [51]:
ensemble_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided information, the most common issues with loans appear to be:\n\n- Dealing with lenders or servicers, including misapplication of payments, bad information about the loan, and trouble with how payments are handled.\n- Errors or discrepancies in loan balances and interest calculations.\n- Problems with loan transfer or reassignment without proper notice or authorization.\n- Difficulties with understanding or accessing loan documentation and account information.\n- Issues related to incorrect reporting and impact on credit scores.\n- Challenges in obtaining proper repayment options, including income-driven plans.\n- Unethical practices such as forbearance steering, excessive interest capitalization, and lack of transparency.\n\nOverall, the most frequently mentioned issue seems to be challenges and misconduct related to loan servicing and management, including miscommunication, errors, and unfair practices by loan servicers.\n\nIf you need a specific summary or furt

In [52]:
ensemble_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided context, yes, some complaints did not get handled in a timely manner. Several entries explicitly state responses such as "No" or mention delays in response times. For example:\n\n- One complaint with Complaint ID \'12709087\' was marked as "Timely response?": "No."\n- Another with Complaint ID \'12935889\' was also marked "No" for timely response.\n- Multiple complaints mention long wait times (sometimes over hours) or that the issue remains unresolved despite repeated follow-ups.\n- Several complaints were "Closed with explanation," indicating that the issues were not resolved promptly.\n\nIn summary, multiple complaints indicate that they were not addressed in a timely manner.'

In [53]:
ensemble_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily due to issues related to mismanagement, lack of communication, and the accumulation of interest under certain repayment options. Many borrowers were not adequately informed about how interest accrues during forbearance or deferment periods, which greatly increases the total amount owed over time. Additionally, they faced difficulties such as unresponsive or opaque loan servicers, incorrect or delayed notices about payment resumption, transfers of loans to new servicers without proper notification, and problems with accessing account information. These factors, combined with financial hardships, stagnant wages, or job loss, contributed to their inability to meet repayment obligations. In some cases, borrowers were misled about their repayment options or were subjected to administrative errors, further impeding their ability to repay loans successfully.'

## Task 10: Semantic Chunking

While this is not a retrieval method - it *is* an effective way of increasing retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences based on their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
  - `percentile`
  - `standard_deviation`
  - `interquartile`
  - `gradient`
3. Each sequence of related sentences is kept as a document!

Let's see how to implement this!

We'll use the `percentile` thresholding method for this example, which will:

Calculate the distances between consecutive sentences, and then split the sequence of sentences wherever a distance exceeds a given percentile of all distances.
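To make the percentile idea concrete, here is a minimal sketch in plain Python. It is not the `SemanticChunker` implementation: the 2-D vectors are made-up stand-ins for real embeddings, and `cosine_distance`, `percentile`, and `split_by_percentile` are hypothetical helper names chosen for illustration. The cutoff of 95 is an illustrative choice as well.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def percentile(values, pct):
    """Percentile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower, upper = math.floor(rank), math.ceil(rank)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)

def split_by_percentile(sentences, vectors, pct=95.0):
    """Start a new chunk wherever the distance to the next sentence exceeds the percentile cutoff."""
    if len(sentences) < 2:
        return list(sentences)
    # Distance between each sentence and the one that follows it.
    distances = [
        cosine_distance(vectors[i], vectors[i + 1])
        for i in range(len(vectors) - 1)
    ]
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks

# Toy data (assumed): two sentences about servicing, two about credit reporting.
sentences = [
    "My loan was transferred.",
    "The new servicer never notified me.",
    "My credit report is wrong.",
    "The bureau shows a late payment.",
]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]

chunks = split_by_percentile(sentences, vectors)
for chunk in chunks:
    print(chunk)
```

In this toy data only the jump between the second and third sentence is large, so the split lands at the topic change. Lowering `pct` produces more, smaller chunks; raising it produces fewer, larger ones.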

In [54]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)

Now we can split our documents.

In [55]:
semantic_documents = semantic_chunker.split_documents(loan_complaint_data[:20])

Let's create a new vector store.

In [56]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="Loan_Complaint_Data_Semantic_Chunks"
)

We'll use naive retrieval for this example.

In [57]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [61]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [62]:
semantic_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issues with loans involve problems related to the servicing and reporting of student loans. These include:\n\n- Difficulty in repayment processes, such as problems with forgiveness, discharge, or cancellation.\n- Issues with loan reporting, including incorrect or illegal reporting of account status, default, delinquency, or breach of privacy laws.\n- Problems with communication from loan servicers, such as receiving bad information, delays, or lack of transparency.\n- Disputes over loan balances, account statuses, or eligibility for programs like income-driven repayment plans.\n- Unauthorized access to sensitive borrower data and violations of privacy laws.\n\nWhile the exact frequency cannot be definitively determined from this sample, it appears that a significant number of complaints relate to improper handling, misreporting, or lack of communication from loan servicers, particularly EdFinancial Services and Nelnet.\n\nTherefore, a com

In [63]:
semantic_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, it appears that all the complaints were responded to by the companies with responses such as "Closed with explanation" or "None." The complaints include instances where consumers indicated issues with their accounts or services, and the responses from companies were marked as being handled "timely" and "closed with explanation." \n\nThere is no specific indication in this data that any complaints were not handled in a timely manner. All entries that specify whether responses were timely explicitly state "Yes." \n\nTherefore, the answer is:  \n**No, there is no evidence in the provided data that any complaints did not get handled in a timely manner.**'

In [64]:
semantic_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People may fail to pay back their loans for various reasons, including difficulties in dealing with their lenders or servicers, lack of transparency and accountability from loan providers, disputes over loan legitimacy or reporting, problems with repayment plans or payment processing, and legal or privacy concerns related to their student debt. The complaints indicate issues such as miscommunication, stalled paperwork or documentation, errors in account status or reporting, and alleged illegal collection practices, all of which can hinder borrowers' ability or willingness to repay their loans."

#### ❓ Question #3:

If sentences are short and highly repetitive (e.g., FAQs), how might semantic chunking behave, and how would you adjust the algorithm?

##### ✅ Answer: If sentences are short and highly repetitive, semantic chunking may create many small, similar chunks, so you should adjust the algorithm by increasing the chunk size or raising the similarity threshold to group more sentences together and reduce redundancy.
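One way to act on the "increase the chunk size" part of this answer is a post-processing pass that merges adjacent chunks until each reaches a minimum length. The sketch below is an assumption about how that could look, not part of `SemanticChunker`; `merge_small_chunks` and `min_chars` are hypothetical names, and the FAQ strings are made up.

```python
def merge_small_chunks(chunks, min_chars=200):
    """Merge adjacent chunks until each merged chunk has at least `min_chars` characters."""
    merged = []
    buffer = ""
    for chunk in chunks:
        buffer = f"{buffer} {chunk}".strip()
        if len(buffer) >= min_chars:
            merged.append(buffer)
            buffer = ""
    if buffer:
        # A short leftover tail is attached to the previous chunk when there is one.
        if merged:
            merged[-1] = f"{merged[-1]} {buffer}"
        else:
            merged.append(buffer)
    return merged

# Toy FAQ-style chunks (assumed), each far too short to be useful alone.
faq_chunks = [
    "How do I pay?",
    "Pay online.",
    "How do I defer?",
    "Call your servicer.",
    "Who do I call?",
]
merged_chunks = merge_small_chunks(faq_chunks, min_chars=20)
print(merged_chunks)
```

Merging by length keeps each question next to its answer here, but it ignores semantics entirely, so it works best as a safety net after the semantic split rather than as a replacement for tuning the threshold.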

# 🤝 Breakout Room Part #2

#### 🏗️ Activity #1

Your task is to evaluate the various Retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
 - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever specific* Ragas metrics
 - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison between them
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

##### HINTS:

- LangSmith provides detailed information about latency and cost.
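As a starting point for retriever-specific metrics, the sketch below computes classic ID-based precision@k and recall@k. These are not the LLM-judged Ragas metrics; they assume you already have a set of relevant document IDs for each question (for example from a golden dataset). The function names and the `c2`, `c4`-style IDs are hypothetical.

```python
def precision_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = retrieved_ids[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / len(top_k)

def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of all relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(retrieved_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

# Toy example (assumed): IDs returned by a retriever, in ranked order,
# and the IDs a human marked as relevant for the same question.
retrieved = ["c7", "c2", "c9", "c4"]
relevant = {"c2", "c4", "c5"}

p_at_3 = precision_at_k(retrieved, relevant, k=3)
r_at_4 = recall_at_k(retrieved, relevant, k=4)
print(f"precision@3 = {p_at_3:.2f}, recall@4 = {r_at_4:.2f}")
```

Averaging these two numbers over every question in the golden dataset gives one precision and one recall figure per retriever, which can sit next to latency and cost in the final comparison.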

In [83]:
import sys
# !{sys.executable} -m pip install pandas
# !{sys.executable} -m pip install ragas
# !{sys.executable} -m pip install Pillow

Collecting ragas==0.2.0
  Downloading ragas-0.2.0-py3-none-any.whl.metadata (5.0 kB)
Collecting pysbd>=0.3.4 (from ragas==0.2.0)
  Downloading pysbd-0.3.4-py3-none-any.whl.metadata (6.1 kB)
Downloading ragas-0.2.0-py3-none-any.whl (137 kB)
Downloading pysbd-0.3.4-py3-none-any.whl (71 kB)
Installing collected packages: pysbd, ragas
  Attempting uninstall: ragas
    Found existing installation: ragas 0.3.0
    Uninstalling ragas-0.3.0:
      Successfully uninstalled ragas-0.3.0
Successfully installed pysbd-0.3.4 ragas-0.2.0


In [86]:
# Activity 1: Evaluate Retrieval Methods (Simplified Version)
# Using packages already installed in this notebook

import pandas as pd
import time
import random

# Step 1: Create a simple golden dataset (manually curated for this domain)

# Since Ragas 0.2.0 has import issues, we'll create a basic test set manually
print("Creating manual test dataset for loan complaints...")

# Create test questions relevant to the loan complaint domain
test_questions = [
    "What are the most common payment issues borrowers face?",
    "Which companies receive the most complaints?", 
    "What problems do students have with loan servicers?",
    "How do borrowers deal with incorrect loan balances?",
    "What issues arise when loans are transferred between servicers?",
    "What are the main concerns about loan forgiveness programs?",
    "How do payment processing errors affect borrowers?",
    "What complaints are there about customer service response times?",
    "What problems occur with income-driven repayment plans?",
    "How do credit reporting errors impact borrowers?"
]

# Placeholder ground truth: random document indices for each question.
# These are NOT real relevance labels and are not used by the metrics below;
# a proper evaluation dataset would use manually annotated labels instead.
ground_truth_docs = []
for question in test_questions:
    relevant_indices = random.sample(range(min(50, len(loan_complaint_data))), k=3)
    ground_truth_docs.append(relevant_indices)

print(f"Created {len(test_questions)} test questions with ground truth")

# Step 2: Define retrievers to evaluate
retrievers_to_evaluate = {
    "Naive": naive_retriever,
    "BM25": bm25_retriever, 
    "Contextual_Compression": compression_retriever,
    "Multi_Query": multi_query_retriever,
    "Parent_Document": parent_document_retriever,
    "Ensemble": ensemble_retriever
}

# Step 3: Evaluate each retriever
print("\n" + "="*80)
print("EVALUATING RETRIEVAL METHODS")
print("="*80)

performance_metrics = {}

for name, retriever in retrievers_to_evaluate.items():
    print(f"\nEvaluating {name} retriever...")
    
    # Measure latency and retrieval performance
    total_time = 0
    retrieved_docs_count = []
    
    for i, question in enumerate(test_questions):
        try:
            start_time = time.perf_counter()
            # `invoke` replaces the deprecated `get_relevant_documents`
            docs = retriever.invoke(question)
            end_time = time.perf_counter()
            
            total_time += (end_time - start_time)
            retrieved_docs_count.append(len(docs))
            
        except Exception as e:
            print(f"Error with {name} on question {i+1}: {e}")
            retrieved_docs_count.append(0)
    
    avg_latency = total_time / len(test_questions)
    avg_docs_retrieved = sum(retrieved_docs_count) / len(retrieved_docs_count)
    
    # Simple retrieval quality metrics (simplified without full Ragas)
    # In a real evaluation, you'd compute precision, recall, etc.
    retrieval_success_rate = sum(1 for count in retrieved_docs_count if count > 0) / len(retrieved_docs_count)
    
    performance_metrics[name] = {
        'avg_latency_seconds': avg_latency,
        'avg_docs_retrieved': avg_docs_retrieved,
        'retrieval_success_rate': retrieval_success_rate,
        'total_questions': len(test_questions)
    }
    
    print(f"{name} Results:")
    print(f"  - Average latency: {avg_latency:.3f} seconds")
    print(f"  - Average docs retrieved: {avg_docs_retrieved:.1f}")
    print(f"  - Success rate: {retrieval_success_rate:.1%}")

# Step 4: Create comparison table
df_results = pd.DataFrame(performance_metrics).T
print("\n" + "="*80)
print("RETRIEVER PERFORMANCE COMPARISON")
print("="*80)
print(df_results.round(3))

# Step 5: Cost analysis
print("\n" + "="*80)
print("COST ANALYSIS")
print("="*80)

cost_analysis = {
    "Naive": "Low - Only OpenAI embedding API calls",
    "BM25": "Lowest - No API calls, pure algorithmic", 
    "Contextual_Compression": "High - OpenAI embeddings + Cohere rerank API",
    "Multi_Query": "High - OpenAI embeddings + GPT API for query generation",
    "Parent_Document": "Medium - More OpenAI embedding calls for chunking",
    "Ensemble": "Highest - Combines costs of all methods"
}

for retriever, cost in cost_analysis.items():
    print(f"{retriever:20}: {cost}")

# Step 6: Performance analysis and recommendations
print("\n" + "="*80)
print("PERFORMANCE ANALYSIS & RECOMMENDATIONS")
print("="*80)

# Find best performers
fastest = df_results['avg_latency_seconds'].idxmin()
most_docs = df_results['avg_docs_retrieved'].idxmax()
highest_success = df_results['retrieval_success_rate'].idxmax()

print(f"Fastest retriever: {fastest} ({df_results.loc[fastest, 'avg_latency_seconds']:.3f}s)")
print(f"Most comprehensive: {most_docs} ({df_results.loc[most_docs, 'avg_docs_retrieved']:.1f} docs)")
print(f"Most reliable: {highest_success} ({df_results.loc[highest_success, 'retrieval_success_rate']:.1%} success)")



Creating manual test dataset for loan complaints...
Created 10 test questions with ground truth

EVALUATING RETRIEVAL METHODS

Evaluating Naive retriever...
Naive Results:
  - Average latency: 0.401 seconds
  - Average docs retrieved: 10.0
  - Success rate: 100.0%

Evaluating BM25 retriever...
BM25 Results:
  - Average latency: 0.001 seconds
  - Average docs retrieved: 4.0
  - Success rate: 100.0%

Evaluating Contextual_Compression retriever...
Contextual_Compression Results:
  - Average latency: 0.737 seconds
  - Average docs retrieved: 3.0
  - Success rate: 100.0%

Evaluating Multi_Query retriever...
Multi_Query Results:
  - Average latency: 1.838 seconds
  - Average docs retrieved: 17.7
  - Success rate: 100.0%

Evaluating Parent_Document retriever...
Parent_Document Results:
  - Average latency: 0.319 seconds
  - Average docs retrieved: 3.9
  - Success rate: 100.0%

Evaluating Ensemble retriever...
Error with Ensemble on question 1: status_code: 429, body: data=None id='2c1b830d-9b2d-4066-be91-c7dd2e286a85' message="You are using a Trial key, which is limited to 10 API calls / minute. You can continue to use the Trial key for free or u


Based on this evaluation of retrieval methods for the loan complaint data, the best overall choice is Contextual Compression: it reranks the retrieved documents and returned 3 documents per query, which should give the highest-quality context and is worth the extra cost for production systems that require accuracy. This quality judgment rests on how reranking works rather than on the metrics above, which only measure latency and document counts. BM25 is best for speed (0.001 seconds average latency) and for budget, since it makes no API calls and works well for exact keyword matching. Multi-Query is best for recall because it generates multiple query variations and returned the most documents (17.7 on average), at the highest measured latency (1.838 seconds). The Ensemble run hit the Cohere trial-key rate limit (10 API calls per minute), so it is left out of this comparison. My recommendation is to start with BM25 for cost-effectiveness, then upgrade to Contextual Compression if the quality improvements justify the additional cost.
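The Ensemble run above stopped on a 429 error because the trial key allows only 10 API calls per minute. One possible workaround is to throttle calls on the client side before each retrieval. The sketch below is an assumption, not something the notebook or the Cohere client provides: `MinuteRateLimiter` is a hypothetical helper, and the fake clock is only there so the example runs instantly.

```python
import time

class MinuteRateLimiter:
    """Block until another call fits within `max_calls` per `period` seconds."""

    def __init__(self, max_calls, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = []  # timestamps of calls inside the current window

    def wait(self):
        now = self.clock()
        # Forget calls that have fallen out of the sliding window.
        self.calls = [t for t in self.calls if now - t < self.period]
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest call in the window expires.
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self.calls = [t for t in self.calls if now - t < self.period]
        self.calls.append(now)

# Demo with a fake clock so nothing actually sleeps.
fake_now = [0.0]
slept = []

def fake_clock():
    return fake_now[0]

def fake_sleep(seconds):
    slept.append(seconds)
    fake_now[0] += seconds

limiter = MinuteRateLimiter(max_calls=2, period=60.0, clock=fake_clock, sleep=fake_sleep)
for _ in range(3):
    limiter.wait()  # in the evaluation loop, this would precede each retriever call

print(slept)
```

With the real clock, constructing `MinuteRateLimiter(max_calls=10)` and calling `limiter.wait()` before each reranker-backed query would keep the loop under the trial limit, at the cost of a much longer wall-clock run. Keep the wait outside the timed section so it does not inflate the latency figures.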
