# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up QDrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

We'll also provide our OpenAI key, as well as our Cohere API key.

In [22]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

In [23]:
os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

In [24]:
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "s09-d3cc956a"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
# Prompt for the key instead of hard-coding it, so the secret never lands in the notebook.
os.environ["LANGSMITH_API_KEY"] = getpass.getpass("LangSmith API Key:")


## Task 2: Data Collection and Preparation

We'll be using our Loan Data once again - this time the structured data available through the CSV!

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

In [25]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

loader = CSVLoader(
    file_path="./data/complaints.csv",
    metadata_columns=[
      "Date received", 
      "Product", 
      "Sub-product", 
      "Issue", 
      "Sub-issue", 
      "Consumer complaint narrative", 
      "Company public response", 
      "Company", 
      "State", 
      "ZIP code", 
      "Tags", 
      "Consumer consent provided?", 
      "Submitted via", 
      "Date sent to company", 
      "Company response to consumer", 
      "Timely response?", 
      "Consumer disputed?", 
      "Complaint ID"
    ]
)

loan_complaint_data = loader.load()

for doc in loan_complaint_data:
    doc.page_content = doc.metadata["Consumer complaint narrative"]

Let's look at an example document to see if everything worked as expected!

In [26]:
loan_complaint_data[0]

Document(metadata={'source': './data/complaints.csv', 'row': 0, 'Date received': '03/27/25', 'Product': 'Student loan', 'Sub-product': 'Federal student loan servicing', 'Issue': 'Dealing with your lender or servicer', 'Sub-issue': 'Trouble with how payments are being handled', 'Consumer complaint narrative': "The federal student loan COVID-19 forbearance program ended in XX/XX/XXXX. However, payments were not re-amortized on my federal student loans currently serviced by Nelnet until very recently. The new payment amount that is effective starting with the XX/XX/XXXX payment will nearly double my payment from {$180.00} per month to {$360.00} per month. I'm fortunate that my current financial position allows me to be able to handle the increased payment amount, but I am sure there are likely many borrowers who are not in the same position. The re-amortization should have occurred once the forbearance ended to reduce the impact to borrowers.", 'Company public response': 'None', 'Company'

## Task 3: Setting up QDrant!

Now that we have our documents, let's create a QDrant VectorStore with the collection name "LoanComplaints".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [27]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    loan_complaint_data,
    embeddings,
    location=":memory:",
    collection_name="LoanComplaints"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply treat each complaint as a document, and use cosine similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later
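
To make "cosine similarity, top `k`" concrete, here's a minimal pure-Python sketch. The three-dimensional vectors and the names `toy_doc_vecs`, `toy_query_vec`, and `top_k` are made up for illustration; this is not how the vector store computes things internally, just the same idea at toy scale.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return the ids of the k documents most similar to the query."""
    scored = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:k]]

# Hypothetical 3-dimensional "embeddings" (assumption: real embeddings are much longer).
toy_doc_vecs = {
    "complaint_a": [1.0, 0.0, 0.0],
    "complaint_b": [0.9, 0.1, 0.0],
    "complaint_c": [0.0, 1.0, 0.0],
}
toy_query_vec = [1.0, 0.05, 0.0]

nearest = top_k(toy_query_vec, toy_doc_vecs, k=2)
```

With these toy vectors, the two documents pointing in nearly the same direction as the query come back first, and the orthogonal one is left out.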

In [28]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [29]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-4.1-nano` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [30]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4.1-nano")

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [31]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and chaining it into the naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # "context"  : is assigned to a RunnablePassthrough object (will not be called or considered in the next step)
    #              by getting the value of the "context" key from the previous step
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry-picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [10]:
naive_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided complaints, one of the most common issues with student loans appears to be mismanagement and errors by loan servicers. These issues include:\n\n- Errors in loan balances and incorrect reporting on credit reports.\n- Difficulty applying payments properly, often resulting in unwanted interest accrual and inability to pay down principal.\n- Lack of clear communication and transparency regarding loan terms, transfers, and balances.\n- Problems with loan handling, such as unauthorized transfers, mishandling of forbearances, and inaccurate account statuses.\n- Disputes over incorrect information, late payments, and issues with loan forgiveness or discharge.\n\nTherefore, a most common issue is **mismanagement and inaccuracies by loan servicers, including errors in balances, incorrect reporting, and inadequate communication**.'

In [14]:
naive_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

"Based on the provided information, yes, some complaints did not get handled in a timely manner. Specifically, at least two complaints indicate delayed responses:\n\n1. Complaint from 03/28/25 (submitted to Mohela), where the response was marked as 'No' for timely response, showing the consumer's expectations for a timely resolution were not met.\n2. Complaint from 04/24/25 (submitted to Maximus Federal Services) also responded 'Yes' for timely response, but multiple other complaints, such as the one from 04/14/25 to Nelnet, indicate the issue was not addressed promptly, with delays longer than originally promised.\n\nOverall, the complaints highlight instances where consumer concerns about delays or lack of response were evident, implying that some complaints were not handled in a timely manner."

In [15]:
naive_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for several reasons based on the provided complaints:\n\n1. **Lack of clear communication and notification:** Many complainants were not adequately informed about when their loan repayment was to resume, loan transfer details, or changes in their loan status. For example, some were unaware their loans had been transferred to different servicers or were not notified when repayment was supposed to start.\n\n2. **Difficulty with repayment options:** Several individuals faced limited options like forbearance or deferment, which led to accumulating interest and increasing the total debt. Some felt these options extended the repayment period and made paying off the loans more difficult.\n\n3. **Financial hardships and inability to afford payments:** Many borrowers indicated that increasing payments to reduce principal was unaffordable, especially given stagnant wages, inflation, or other financial hardships. This made repayment seem unmanageable.\n\n4. 

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model) which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.
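
If you want to see the scoring idea in miniature, here's a rough sketch of the Okapi BM25 formula in plain Python. Everything here is an assumption for illustration: whitespace tokenization, `k1=1.5` and `b=0.75` as commonly used values, and the toy documents themselves. It is not the code that LangChain's `BM25Retriever` runs.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with the Okapi BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    avg_len = sum(len(toks) for toks in tokenized) / len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            n = doc_freq[term]
            # Rare terms get a larger inverse-document-frequency weight.
            idf = math.log((len(docs) - n + 0.5) / (n + 0.5) + 1)
            tf = term_freq[term]
            # Longer-than-average documents are penalized via b.
            length_norm = 1 - b + b * len(toks) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores

# Hypothetical mini-corpus (assumption: not taken from the complaints CSV).
toy_docs = [
    "nelnet misapplied my payment",
    "my servicer never answered the phone",
    "payment was late because autopay stopped",
]
toy_scores = bm25_scores("nelnet payment", toy_docs)
best_index = toy_scores.index(max(toy_scores))
```

Notice that a document sharing no words with the query scores exactly zero, no matter how semantically related it is. That's the key difference from embedding-based retrieval.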

This retriever is very straightforward to set up! Let's see it happen down below!


In [32]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(loan_complaint_data)

We'll construct the same chain - only changing the retriever.

In [33]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [34]:
bm25_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with loans appears to be problems related to the handling and servicing of the loans, such as dealing with lenders or servicers, receiving incorrect or bad information about the loans, and issues with repayment processes. Specific recurring issues include disputes over fees charged, difficulty applying payments correctly, and confusion about loan balances and terms.'

In [19]:
bm25_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided data, it appears that several complaints were responded to in a timely manner, with the responses marked as "Yes" for "Timely response?". Specifically, complaints received on 04/01/25, 04/24/25, and 04/26/25 indicate timely responses from the companies involved. \n\nHowever, there are complaints (e.g., the one received on 05/08/25) where the complaint response was marked as "Closed with explanation," but the narrative suggests ongoing issues and dissatisfaction, and there is no indication they were handled outside of the overall timely response window. \n\nSince the data does not record any complaints that explicitly remained unresolved or claimed to be handled late, there is no evidence in this dataset that any complaints were not handled in a timely manner. \n\nTherefore, the answer is:  \n**No, there is no indication that any complaints did not get handled in a timely manner.**'

In [20]:
bm25_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People often fail to pay back their loans due to a variety of issues, including mismanagement or errors by loan servicers, lack of communication, or complications with their payment plans. For example, some individuals experienced their autopayments being discontinued without proper notification, leading to missed payments and negative credit impacts. Others faced confusion and difficulties when their loans were transferred between different companies, often without clear communication, which resulted in lack of awareness about their payment status or due dates. Additionally, some borrowers had trouble with loan repayment plans or forbearances due to administrative problems, such as their requests not being handled properly or being ignored altogether. These issues can lead to unintentional default or missed payments, making it seem like borrowers failed to pay back their loans, when in fact the failures stemmed from administrative errors, poor communication, or inadequate support fro

It's not clear whether this is better or worse. If only we had a way to test this! (SPOILERS: We do, the second half of the notebook will cover this.)

#### ❓ Question #1:

Give an example query where BM25 is better than embeddings and justify your answer.

## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.
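
The retrieve-then-compress pattern can be sketched in a few lines. Note the big assumption: `overlap_score` below is a toy word-overlap scorer standing in for a real reranking model, and `toy_candidates` is invented data. A real reranker scores query/document pairs with a trained model, but the surrounding logic has the same shape.

```python
def overlap_score(query, text):
    """Toy relevance score: fraction of query words that appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words)

def rerank(query, candidates, top_n):
    """Keep only the top_n candidates, ordered by the relevance score."""
    ranked = sorted(candidates, key=lambda text: overlap_score(query, text), reverse=True)
    return ranked[:top_n]

# Pretend these came back from a first-stage retriever with a large k (hypothetical data).
toy_candidates = [
    "my loan was transferred without notice",
    "late fee charged after autopay failed",
    "autopay failed and a late fee was charged twice",
    "servicer gave wrong balance",
]
compressed = rerank("autopay late fee", toy_candidates, top_n=2)
```

The first stage casts a wide net; the second stage throws away everything but the best few, so the LLM sees less noise.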

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [35]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-v3.5")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)

Let's create our chain again, and see how this does!

In [36]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [15]:
contextual_compression_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided context, appears to be problems related to "Dealing with your lender or servicer," including issues such as receiving bad information about loans, errors in loan balances, misapplied payments, wrongful denials of payment plans, and complications arising from how payments are handled or processed. Many complaints highlight difficulties with understanding or managing interest accumulation, unclear or conflicting information about balances, and challenges with repayment options like forbearance or deferment that can lead to increased interest over time.'

In [27]:
contextual_compression_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, at least one complaint indicates it was not handled in a timely manner. Specifically, the complaint about the student loan account review, which has been open for nearly 18 months without resolution, suggests a significant delay. The complaint notes that the individual has been awaiting response and resolution for over a year, and there has been no resolution despite the long duration.\n\nHowever, for other complaints, the responses from the companies are noted as "Closed with explanation" and marked as "Yes" for timely response, indicating they were addressed within an acceptable timeframe.\n\nTherefore, yes, there was at least one complaint that was not handled in a timely manner.'

In [28]:
contextual_compression_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily due to a lack of clear information and communication about their loans, as well as difficulties managing interest and payment options. Many borrowers were not informed about the necessity of repayment or the details of how interest accrues over time, especially when loans are transferred or serviced by different companies without proper notification. Additionally, some borrowers faced limited options—such as only being offered forbearance or deferment—that resulted in ongoing interest accumulation, making it harder to repay the principal. Other reasons include financial hardships, stagnating wages, and misguidance about repayment terms and forgiveness programs, all of which contributed to their inability to fulfill repayment obligations.\n'

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` number of new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context
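
Steps 2 and 3 above can be sketched without any LLM at all. In this sketch the reformulated queries are hard-coded (in the real retriever an LLM writes them), and `toy_index` is a hypothetical lookup table standing in for a retriever.

```python
def retrieve_for_all(queries, retrieve):
    """Run every query through `retrieve` and keep each document once, in first-seen order."""
    seen = set()
    unique_docs = []
    for query in queries:
        for doc in retrieve(query):
            if doc not in seen:
                seen.add(doc)
                unique_docs.append(doc)
    return unique_docs

# Hypothetical stand-in for a retriever: a fixed lookup table.
toy_index = {
    "why were payments missed?": ["doc_autopay", "doc_transfer"],
    "what caused loan default?": ["doc_transfer", "doc_hardship"],
    "reasons borrowers fell behind": ["doc_hardship", "doc_interest"],
}

# Hypothetical reformulations an LLM might produce for one user question.
toy_queries = list(toy_index)

merged = retrieve_for_all(toy_queries, lambda q: toy_index[q])
```

Each individual query only surfaces two documents here, but the merged set contains four. That is the whole trick.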

So, how hard is it to set up? Not bad! Let's see it down below!



In [37]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)

In [38]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [31]:
multi_query_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

"The most common issue with loans, based on the provided complaints and context, appears to be problems related to the handling and servicing of student loans. This includes:\n\n- Dealing with lenders or servicers who mishandle payments, misapply payments, or apply them incorrectly (e.g., applying to interest instead of principal).\n- Errors in loan balances and interest calculations.\n- Lack of transparent communication or failure to provide necessary documentation like original promissory notes.\n- Unjustified increases in interest rates or balances, often due to forbearances or transfers between servicers.\n- Problems with loan repayment plans, such as being steered into unsuitable options or facing difficulty in applying additional payments to principal.\n- Erroneous or unauthorized loans appearing on credit reports.\n- Servicers' failure to verify or maintain proper legal documentation, such as signed master promissory notes.\n\nOverall, the most prevalent issue is the mishandling

In [32]:
multi_query_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, yes, some complaints did not get handled in a timely manner. Specifically, one complaint received a response marked as "No" for timely response, indicating it was late. Additionally, there are multiple instances where complainants reported waiting over long periods (hours) without resolution, or their issues remained unresolved for over a year despite multiple follow-ups. Therefore, it can be concluded that certain complaints were not addressed promptly.'

In [33]:
multi_query_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily due to a combination of factors highlighted in these complaints:\n\n1. **Accumulation of Interest and Unmanageable Balances**: Many borrowers reported that interest continued to accrue during forbearance or deferment periods, sometimes capitalizing (adding to principal), which increased the total amount owed and made repayment more difficult.\n\n2. **Lack of Clear and Accurate Information from Servicers**: Several complaints cite servicers steering borrowers into forbearance or consolidations without informing them of better options like income-driven repayment plans or rehabilitation, leading to increased balances and loss of forgiveness eligibility.\n\n3. **Financial Hardship and Economic Conditions**: Borrowers faced hardships such as unemployment, low income, or unexpected expenses, making it physically or financially impossible to increase monthly payments without sacrificing essentials.\n\n4. **Mismanagement and Lack of Transparenc

#### ❓ Question #2:

Explain how generating multiple reformulations of a user query can improve recall.

## Task 8: Parent Document Retriever

The Parent Document Retriever follows a "small-to-big" strategy:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent documents".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
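
Here's a toy version of "search small, return big". To keep it dependency-free, a substring match stands in for the similarity search and a fixed-width splitter stands in for the real text splitter; both are assumptions for illustration, as are the names `toy_parents` and `toy_children`.

```python
def split_into_chunks(text, chunk_size):
    """Naive fixed-width splitter standing in for a real text splitter."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

# "Docstore": parent id -> full parent document (hypothetical data).
toy_parents = {
    "p1": "Autopay was cancelled without notice. I was then charged a late fee.",
    "p2": "My loan was transferred to a new servicer. The balance is now wrong.",
}

# "Vector store": list of (child chunk, parent id) pairs.
toy_children = [
    (chunk, parent_id)
    for parent_id, text in toy_parents.items()
    for chunk in split_into_chunks(text, chunk_size=40)
]

def search_parents(keyword):
    """Match on the small chunks, but return the full parents, each at most once."""
    parent_ids = []
    for chunk, parent_id in toy_children:
        if keyword in chunk and parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [toy_parents[pid] for pid in parent_ids]

results = search_parents("late fee")
```

The match happens on a 40-character chunk, but what comes back is the whole complaint, including the sentence about autopay that the matching chunk didn't contain.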

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [39]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = loan_complaint_data
child_splitter = RecursiveCharacterTextSplitter(chunk_size=750)

We'll need to set up a new QDrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension, you'll need to change this if you're using a different embedding model.

In [40]:
from langchain_qdrant import QdrantVectorStore

client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = QdrantVectorStore(
    collection_name="full_documents", embedding=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)

Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [41]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [42]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [43]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [39]:
parent_document_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the complaints provided, appears to be related to errors and misconduct in federal student loan servicing. Specific recurring problems include incorrect information on credit reports, misapplication of payments, wrongful denials of payment plans, discrepancies in loan balances and interest rates, and issues with collection and verification of debts.'

In [40]:
parent_document_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, it appears that several complaints were not handled in a timely manner. Specifically, the complaints related to the student loan issues with MOHELA (Complaint IDs 12709087 and 12935889) indicate that the responses were "No" in the "Timely response?" field, meaning they were not handled promptly. Additionally, the complaint about the dispute settlement with Nelnet (Complaint ID 13205525) was responded to within the expected timeframe ("Yes" in "Timely response?"). \n\nTherefore, yes, some complaints—particularly those regarding MOHELA—did not get handled in a timely manner.'

In [41]:
parent_document_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for various reasons, including:\n\n1. Lack of proper communication or notification from loan servicers about payment obligations, as indicated by complaints about not being notified when payments were due or about changes in loan ownership.\n2. Financial hardship or severe economic difficulties that made it impossible to make timely payments, such as unemployment or inability to find employment in their field.\n3. Misrepresentation or lack of transparency from educational institutions and loan providers regarding the long-term financial consequences, job prospects after graduation, and the sustainability of the school’s operations.\n4. Relying on deferment and forbearance options that increased interest and debt over time.\n5. Disputes over the legitimacy or ownership of the debt, including issues related to the legal verification of loans and deceptive practices by collection agencies.\n6. Personal health issues or other personal circumstances th

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to more effectively answer the question about the performance.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever simply takes two or more retrievers and combines their retrieved documents based on a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.
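
For intuition, here's a minimal sketch of weighted Reciprocal Rank Fusion in plain Python. The constant `c=60` follows the value suggested in the RRF paper; the function name `weighted_rrf` and the toy rankings are assumptions for illustration, not the library's implementation.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse several ranked lists: each document earns weight / (c + rank) per list."""
    fused = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += weight / (c + rank)
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical rankings from two retrievers.
toy_sparse = ["doc_a", "doc_b", "doc_c"]
toy_dense = ["doc_b", "doc_d", "doc_a"]

fused_order = weighted_rrf([toy_sparse, toy_dense], weights=[0.5, 0.5])
```

Documents that show up near the top of *several* lists win out over documents that only one retriever liked, and only ranks matter, so the retrievers' raw scores never need to be comparable.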

In [44]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

We'll pack *all* of these retrievers together in an ensemble.

In [45]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [44]:
ensemble_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided data, appears to be dealing with the loan servicer or lender, including errors in loan balances, misapplied payments, wrongful denials of payment plans, and problems with how payments are being handled. Several complaints highlight issues such as receiving bad information about loans, inability to properly apply payments to principal, inaccurate reporting of delinquency, and mishandling of loan transfers or consolidations. \n\nIn summary, a predominant and recurring problem is the mismanagement and poor communication from loan servicers, which leads to misapplied payments, incorrect account information, and difficulties in resolving repayment issues.'

In [45]:
ensemble_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, yes, there are several instances indicating complaints not handled in a timely manner. For example:\n\n- One complaint (#12935889) about Mohela was marked as "Timely response?": No.\n- Another (#12744910) regarding inaccuracies in reporting and an ongoing dispute was "Timely response?": Yes, but the complaint was about inaccurate reporting and delays in correction, suggesting the issue persisted over time.\n- Multiple complaints (#12739706, #13062402, #13126709, #13127090, and others) mention delays, extended wait times, or responses that were not addressed promptly, with some even explicitly stating they did not receive responses within expected timeframes.\n- There are cases where the response was "Closed with explanation" but the delays or unresolved issues strongly imply they were not handled promptly or adequately.\n\nOverall, the evidence suggests that at least some complaints were not handled in a timely manner, as indicated directly by the res

In [46]:
ensemble_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for several reasons, often related to mismanagement, misinformation, and systemic issues. Based on the provided complaints, common reasons include:\n\n1. **Lack of Notification and Communication:** Many borrowers were not properly notified about loan transfers, due dates, or repayment start dates, leading to unintentional delinquency and missed payments.\n\n2. **Misleading or Incomplete Information:** Borrowers reported receiving incorrect or misleading information about their loan balances, repayment obligations, or eligibility for programs like income-driven repayment or forgiveness, which caused confusion and unintended default.\n\n3. **System Errors and Technical Difficulties:** Issues such as online portal lockouts, incorrect account statuses, and errors in reporting contributed to borrowers not making payments or being marked delinquent improperly.\n\n4. **Inadequate Support and Assistance:** Borrowers often found customer service unhelpful,

## Task 10: Semantic Chunking

While this is not a retrieval method, it *is* an effective way of improving retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
  - `percentile`
  - `standard_deviation`
  - `interquartile`
  - `gradient`
3. Each sequence of related sentences is kept as a document!

Let's see how to implement this!

We'll use the `percentile` thresholding method for this example, which works as follows:

Calculate the distances between all consecutive sentences, then split the sequence of sentences wherever a distance exceeds a given percentile of all distances.
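To make the percentile rule concrete, here is a minimal, dependency-free sketch. It is not LangChain's `SemanticChunker` implementation: the helper names (`cosine_distance`, `percentile`, `split_by_percentile`), the two-dimensional toy embeddings, and the 75th-percentile cutoff are all assumptions chosen for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def percentile(values, q):
    """Linear-interpolation percentile, with q in [0, 100]."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower, upper = math.floor(position), math.ceil(position)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def split_by_percentile(sentences, embeddings, q=95.0):
    """Start a new chunk wherever the distance between two consecutive
    sentences exceeds the q-th percentile of all consecutive distances."""
    distances = [
        cosine_distance(embeddings[i], embeddings[i + 1])
        for i in range(len(embeddings) - 1)
    ]
    threshold = percentile(distances, q)

    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks


# Hypothetical sentences and hand-made 2-D "embeddings" (assumptions, not real model output)
sentences = [
    "My loan was transferred.",
    "Nobody notified me of the transfer.",
    "The new servicer never sent a statement.",
    "My credit report shows a late payment.",
    "The late payment is inaccurate.",
]
toy_embeddings = [[1.0, 0.0], [0.98, 0.2], [0.95, 0.3], [0.0, 1.0], [0.1, 0.99]]

chunks = split_by_percentile(sentences, toy_embeddings, q=75.0)
for chunk in chunks:
    print(chunk)
```

In this toy input only the jump between the third and fourth sentence exceeds the threshold, so the text splits into two chunks. A lower percentile would produce more, smaller chunks; a higher one would produce fewer, larger chunks.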

In [46]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)

Now we can split our documents.

In [47]:
semantic_documents = semantic_chunker.split_documents(loan_complaint_data[:20])

Let's create a new vector store.

In [48]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="Loan_Complaint_Data_Semantic_Chunks"
)

We'll use naive retrieval for this example.

In [49]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [50]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [56]:
semantic_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided complaints, the most common issues with loans appear to be related to difficulties in communication and account management, such as:\n\n- Struggling to repay loans due to errors or issues with payment plans.\n- Problems with loan reporting, including incorrect or improper reporting of account status or default.\n- Difficulties in obtaining clear information about loan balances, loan servicer changes, or payment amounts.\n- Issues with loan servicing companies failing to respond appropriately or failing to verify or process applications.\n- Unauthorized or illegal reporting and collection practices, including violations of privacy laws.\n\nWhile these are specific to student loans in the context provided, a recurring theme is that many complaints involve mismanagement, lack of transparency, or errors in the handling of loans and related information. \n\nTherefore, a common underlying issue with loans, especially highlighted here, is **mismanagement or errors in se

In [57]:
semantic_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, it appears that many complaints were responded to in a timely manner, with responses marked as \'Yes\' under the \'Timely response?\' field. Notably, several complaints state "Closed with explanation," indicating that they were addressed within the required time frame. \n\nHowever, there is at least one complaint regarding a lack of response or handling—specifically, the complaint about Nelnet (row 17). The consumer\'s narrative details multiple issues with lack of responses and conduct that suggests their complaint was not handled promptly or satisfactorily.\n\nIn summary:\n\n- Multiple complaints confirm responses were handled in a timely manner.\n- One complaint (about Nelnet\'s failure to respond to Certified Mail and ongoing misconduct) indicates that the complaint was not properly handled or responded to, suggesting that some complaints did not get handled in a timely manner.\n\nTherefore, yes, some complaints did not get handled in a timely man

In [58]:
semantic_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans for various reasons, including issues such as difficulties dealing with their loan servicers, miscommunications or inadequate information about their loan status, problems with payment processing, and disputes over the legitimacy or accuracy of their loan details. Some specific reasons noted in the complaints include receiving bad information about loan statuses, delays or errors in re-amortizing payments after forbearance ended, and inaccurate reports of default or delinquency. Additionally, instances of alleged mismanagement, lack of transparency, or improper handling of personal data have also contributed to borrowers' difficulties in repayment."

#### ❓ Question #3:

If sentences are short and highly repetitive (e.g., FAQs), how might semantic chunking behave, and how would you adjust the algorithm?

# 🤝 Breakout Room Part #2

#### 🏗️ Activity #1

Your task is to evaluate the various Retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
 - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever specific* Ragas metrics
 - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison between them
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

##### HINTS:

- LangSmith provides detailed information about latency and cost.

In [51]:
# Step 2: Generate Synthetic Test Dataset using Ragas
print("Generating synthetic test dataset...")

from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
from ragas.testset import TestsetGenerator

generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4.1"))
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())


# Initialize the testset generator with our LLM and embedding model
generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)


# Use a subset of documents for test generation (to manage cost)
test_documents = loan_complaint_data[:20]  # Using the first 20 documents

# Generate test dataset
testset = generator.generate_with_langchain_docs(
    documents=test_documents,
    testset_size=10 # Generate 10 question-answer pairs
)

# Convert to pandas DataFrame for easier handling
test_df = testset.to_pandas()


Generating synthetic test dataset...


Node 8e60eb4a-b223-4869-b703-7d6cc9312f63 does not have a summary. Skipping filtering.
Node 48d6ac40-867c-4f73-8c2f-a5cae4cc033a does not have a summary. Skipping filtering.
Node 6896683f-bde6-4cae-8f95-46cde432cc89 does not have a summary. Skipping filtering.
Node fddfa489-b94d-4645-814c-8d2939bc68fd does not have a summary. Skipping filtering.
Node b8215ad6-a756-4d50-be89-901087afec3a does not have a summary. Skipping filtering.
Node 3a440263-1e77-4ddc-b665-5518745a9901 does not have a summary. Skipping filtering.



In [52]:
dataset = testset


In [54]:
import copy
from langchain_openai import ChatOpenAI
from ragas import EvaluationDataset, RunConfig, evaluate
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import LLMContextRecall, ContextPrecision


def enrich_dataset(graph):
    dataset = copy.deepcopy(testset)
    for test_row in dataset:
        response = graph.invoke({"question" : test_row.eval_sample.user_input})
        test_row.eval_sample.response = response["response"].content
        test_row.eval_sample.retrieved_contexts = [context.page_content for context in response["context"]]
    return dataset 

def run_ragas_evaluation(dataset):
    evaluation_dataset = EvaluationDataset.from_pandas(dataset.to_pandas())
    evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4.1-mini"))
    custom_run_config = RunConfig(timeout=360)
    result = evaluate(
        dataset=evaluation_dataset,
        metrics=[LLMContextRecall(), ContextPrecision()],
        llm=evaluator_llm,
        run_config=custom_run_config
    )
    return result

In [104]:
res = naive_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})

In [21]:
dataset = enrich_dataset(naive_retrieval_chain)

In [55]:
import time
from langsmith import Client, tracing_context
from datetime import datetime

# Store results
evaluation_results = {}

In [58]:
print("🚀 Starting Retriever Evaluation with Cost/Latency Tracking")
print("="*60)

# Initialize LangSmith client
client = Client()

# Turn off global tracing
os.environ["LANGCHAIN_TRACING_V2"] = "false"

# Dictionary of all your retrieval chains
retrieval_chains = {
    # "naive": naive_retrieval_chain,
    # "bm25": bm25_retrieval_chain, 
    # "multi_query": multi_query_retrieval_chain,
    # "parent_document": parent_document_retrieval_chain,
    # "contextual_compression": contextual_compression_retrieval_chain,
    "ensemble": ensemble_retrieval_chain
    #"semantic": semantic_retrieval_chain
}

for retriever_name, chain in retrieval_chains.items():
    print(f"\n📊 Evaluating: {retriever_name.upper()}")
    
    # Create unique project name for this retriever
    project_name = f"s09-retriever-eval-{retriever_name}"
    
    start_time = time.time()
    
    # Trace only this specific evaluation
    with tracing_context(
        enabled=True,
        project_name=project_name,
        tags=[f"retriever:{retriever_name}", "evaluation"]
    ):
        dataset = enrich_dataset(chain)
    
    end_time = time.time()
    total_time = end_time - start_time
    
    print(f"   ✅ Completed in {total_time:.2f} seconds")
    print(f"   📁 Traces in project: {project_name}")
    
    # Store basic metrics
    evaluation_results[retriever_name] = {
        "dataset": dataset,
        "total_time": total_time,
        "project_name": project_name,
        "num_samples": len(dataset)
    }

print(f"\n✅ All evaluations completed!")

🚀 Starting Retriever Evaluation with Cost/Latency Tracking

📊 Evaluating: ENSEMBLE
   ✅ Completed in 126.53 seconds
   📁 Traces in project: s09-retriever-eval-ensemble

✅ All evaluations completed!


In [59]:
evaluation_results

  'total_time': 69.5992579460144,
  'project_name': 's09-retriever-eval-naive',
  'num_samples': 10},
  'total_time': 47.895630836486816,
  'project_name': 's09-retriever-eval-bm25',
  'num_samples': 10},
  'total_time': 88.86725974082947,
  'project_name': 's09-retriever-eval-multi_query',
  'num_samples': 10},
  'total_time': 56.54189085960388,
  'project_name': 's09-retriever-eval-parent_document',
  'num_samples': 10},
  'total_time': 52.23817181587219,
  'project_name': 's09-retriever-eval-contextual_compression',
  'num_samples': 10},
  'total_time': 126.52863717079163,
  'project_name': 's09-retriever-eval-ensemble',
  'num_samples': 10}}

In [74]:
# Extract detailed metrics from LangSmith traces
def get_langsmith_metrics(project_name, retriever_name):
    """Extract cost and latency metrics from LangSmith project"""
    try:
        # Get all runs from the project
        runs = list(client.list_runs(project_name=project_name))
        
        if not runs:
            return {"error": "No runs found"}
        
        parent_runnable_runs = [
            run for run in runs 
            if run.name == 'RunnableSequence' 
            and run.parent_run_id is None  # Only parent runs
        ]
        
        print(f"Found {len(parent_runnable_runs)} parent RunnableSequence runs for {retriever_name}")

        # Calculate metrics
        total_cost = 0
        total_latency = 0
        # Count LLM calls across every run in the project; the parent runs
        # filtered above are chain runs, so none of them has run_type "llm"
        llm_calls = sum(1 for run in runs if run.run_type == "llm")
        
        for run in parent_runnable_runs:
            # Cost (if available)
            if hasattr(run, 'total_cost') and run.total_cost:
                total_cost += run.total_cost
                
            # Latency 
            if run.end_time and run.start_time:
                latency = (run.end_time - run.start_time).total_seconds()
                total_latency += latency
        
        num_chains = len(parent_runnable_runs)
        
        return {
            "total_cost_usd": total_cost,
            "total_latency_seconds": total_latency,
            "average_latency_per_chain": total_latency / num_chains if num_chains > 0 else 0,
            "llm_calls": llm_calls,
            "cost_per_chain": total_cost / num_chains if num_chains > 0 else 0,
            "num_chain_executions": num_chains
        }
        
    except Exception as e:
        return {"error": str(e)}



# Retriever Performance Comparison

## LangSmith Performance Metrics

| Retriever | Date/Time | Runs | Error Rate | P50 Latency | P99 Latency | Streaming | Total Tokens | Total Cost |
|-----------|-----------|------|--------------|-------------|---------------|------------|--------|------|
| **ensemble** | 7/29/2025, 9:05:19 AM | 11 | 9% | 10.94s | 22.62s | 0% | 177,088 | $0.02 |
| **contextual_compression** | 7/29/2025, 9:01:50 AM | 10 | 0% | 4.39s | 9.75s | 0% | 28,620 | $0.01 |
| **parent_document** | 7/29/2025, 9:00:57 AM | 10 | 0% | 5.13s | 10.21s | 0% | 43,063 | $0.01 |
| **multi_query** | 7/29/2025, 8:59:56 AM | 10 | 0% | 9.67s | 15.33s | 0% | 129,223 | $0.02 |
| **bm25** | 7/29/2025, 8:58:00 AM | 10 | 0% | 3.99s | 8.81s | 0% | 46,168 | $0.01 |
| **naive** | 7/29/2025, 8:57:10 AM | 10 | 0% | 7.47s | 11.29s | 0% | 80,432 | $0.01 |



In [76]:
result = run_ragas_evaluation(dataset)

In [78]:
result

{'context_recall': 0.9000, 'context_precision': 0.8372}

In [None]:
# Run RAGAS evaluation for all retrievers
print("🧪 Running RAGAS Evaluation for All Retrievers")
print("="*60)

# Store RAGAS results
ragas_results = {}

🧪 Running RAGAS Evaluation for All Retrievers


In [None]:
# run_ragas_evaluation (defined above) builds its own evaluator LLM, metrics,
# and RunConfig, so no additional RAGAS setup is needed in this cell.

for retriever_name, results in evaluation_results.items():
    print(f"\n📊 Evaluating {retriever_name.upper()} with RAGAS...")
    
    try:
        # Get the dataset for this retriever
        dataset = results.get("dataset")
        
        if dataset is None:
            print(f"   ❌ No dataset found for {retriever_name}")
            continue
            
        # Run RAGAS evaluation
        ragas_result = run_ragas_evaluation(dataset)
        
        # Store results
        ragas_results[retriever_name] = ragas_result
        
        print(f"   ✅ RAGAS evaluation completed")
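        # Average the per-sample scores; the column names match the metric names in the result
        scores = ragas_result.to_pandas()
        print(f"   📈 Context Recall: {scores['context_recall'].mean():.3f}")
        print(f"   📈 Context Precision: {scores['context_precision'].mean():.3f}")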
        
    except Exception as e:
        print(f"   ❌ RAGAS evaluation failed for {retriever_name}: {e}")


print(f"\n✅ RAGAS evaluation completed for all retrievers!")

In [83]:
ragas_results

{'naive': {'context_recall': 0.9300, 'context_precision': 0.8904},
 'bm25': {'context_recall': 0.9071, 'context_precision': 0.8417},
 'multi_query': {'context_recall': 0.9750, 'context_precision': 0.8374},
 'parent_document': {'context_recall': 0.8136, 'context_precision': 0.9333},
 'contextual_compression': {'context_recall': 0.7871, 'context_precision': 0.8917},
 'ensemble': {'context_recall': 1.0000, 'context_precision': 0.8709}}

| Retriever | Context Recall | Context Precision |
|-----------|----------------|-------------------|
| **naive** | 0.9300 | 0.8904 |
| **bm25** | 0.9071 | 0.8417 |
| **multi_query** | 0.9750 | 0.8374 |
| **parent_document** | 0.8136 | 0.9333 |
| **contextual_compression** | 0.7871 | 0.8917 |
| **ensemble** | 1.0000 | 0.8709 |
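
One possible way to compile the numbers above into a single comparison is sketched below. The values are copied from the LangSmith and Ragas tables in this notebook. Ranking by the harmonic mean of context recall and context precision is an assumption made here for illustration, not a Ragas metric or the notebook's method, and `harmonic_mean` and `retriever_metrics` are hypothetical names.

```python
# Values copied from the LangSmith and Ragas tables in this notebook
retriever_metrics = {
    "naive": {"recall": 0.9300, "precision": 0.8904, "p50_latency_s": 7.47, "total_tokens": 80432},
    "bm25": {"recall": 0.9071, "precision": 0.8417, "p50_latency_s": 3.99, "total_tokens": 46168},
    "multi_query": {"recall": 0.9750, "precision": 0.8374, "p50_latency_s": 9.67, "total_tokens": 129223},
    "parent_document": {"recall": 0.8136, "precision": 0.9333, "p50_latency_s": 5.13, "total_tokens": 43063},
    "contextual_compression": {"recall": 0.7871, "precision": 0.8917, "p50_latency_s": 4.39, "total_tokens": 28620},
    "ensemble": {"recall": 1.0000, "precision": 0.8709, "p50_latency_s": 10.94, "total_tokens": 177088},
}


def harmonic_mean(a, b):
    """Combine two scores so that a low value in either one drags the result down."""
    return 2 * a * b / (a + b) if (a + b) else 0.0


# Assumption: recall and precision are weighted equally
combined = {
    name: harmonic_mean(m["recall"], m["precision"])
    for name, m in retriever_metrics.items()
}
ranking = sorted(combined, key=combined.get, reverse=True)

print(f"{'retriever':<24}{'combined':>10}{'p50 (s)':>10}{'tokens':>10}")
for name in ranking:
    m = retriever_metrics[name]
    print(f"{name:<24}{combined[name]:>10.4f}{m['p50_latency_s']:>10.2f}{m['total_tokens']:>10}")
```

With only 10 test samples per retriever, small gaps in the combined score should be read with caution. Placing latency and token counts next to the quality score makes the cost/latency/performance trade-off easier to discuss.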