# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up QDrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

We'll also provide our OpenAI key, as well as our Cohere API key.

In [1]:
from dotenv import load_dotenv
import os
from uuid import uuid4

load_dotenv()

os.environ["LANGSMITH_PROJECT"] = f"AIM - Assignment 09 - {uuid4().hex[0:8]}"

In [2]:
# import os
# import getpass

# os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

In [3]:
# os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

## Task 2: Data Collection and Preparation

We'll be using our Loan Data once again - this time the structured data available through the CSV!

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

In [4]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

loader = CSVLoader(
    file_path="./data/complaints.csv",
    metadata_columns=[
      "Date received", 
      "Product", 
      "Sub-product", 
      "Issue", 
      "Sub-issue", 
      "Consumer complaint narrative", 
      "Company public response", 
      "Company", 
      "State", 
      "ZIP code", 
      "Tags", 
      "Consumer consent provided?", 
      "Submitted via", 
      "Date sent to company", 
      "Company response to consumer", 
      "Timely response?", 
      "Consumer disputed?", 
      "Complaint ID"
    ]
)

loan_complaint_data = loader.load()

for doc in loan_complaint_data:
    doc.page_content = doc.metadata["Consumer complaint narrative"]

Let's look at an example document to see if everything worked as expected!

In [5]:
loan_complaint_data[0]

Document(metadata={'source': './data/complaints.csv', 'row': 0, 'Date received': '03/27/25', 'Product': 'Student loan', 'Sub-product': 'Federal student loan servicing', 'Issue': 'Dealing with your lender or servicer', 'Sub-issue': 'Trouble with how payments are being handled', 'Consumer complaint narrative': "The federal student loan COVID-19 forbearance program ended in XX/XX/XXXX. However, payments were not re-amortized on my federal student loans currently serviced by Nelnet until very recently. The new payment amount that is effective starting with the XX/XX/XXXX payment will nearly double my payment from {$180.00} per month to {$360.00} per month. I'm fortunate that my current financial position allows me to be able to handle the increased payment amount, but I am sure there are likely many borrowers who are not in the same position. The re-amortization should have occurred once the forbearance ended to reduce the impact to borrowers.", 'Company public response': 'None', 'Company'

## Task 3: Setting up QDrant!

Now that we have our documents, let's create a QDrant VectorStore with the collection name "LoanComplaints".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [6]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    loan_complaint_data,
    embeddings,
    location=":memory:",
    collection_name="LoanComplaints"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply treat each complaint as a document, and use cosine similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later

In [7]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})
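To make "cosine similarity plus top-`k`" concrete, here is a minimal pure-Python sketch of what a dense retriever does conceptually. The three-dimensional vectors are hypothetical toy values, not real `text-embedding-3-small` output, and the helper names `cosine_similarity` and `top_k` are illustrative rather than part of LangChain or Qdrant.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]

# Toy "embeddings" (hypothetical values chosen by hand).
toy_doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
toy_query_vec = [0.8, 0.1, 0.0]

top_two = top_k(toy_query_vec, toy_doc_vecs, k=2)
print(top_two)
```

A real vector store typically uses an approximate nearest-neighbour index rather than scoring every document, but the ranking idea is the same.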

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [8]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-4.1-nano` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [9]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4.1-nano")

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [10]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "context"  : the value of the "question" key, piped into naive_retriever to fetch documents
    # "question" : the value of the "question" key, passed along unchanged
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # Passes "question" and "context" through to the next step; the assign re-sets "context"
    # to its own value, so this step leaves the data unchanged
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : "context" and "question" format the prompt, which is piped into the LLM;
    #              the LLM's message is stored under "response"
    # "context"  : the retrieved documents from the previous step, returned alongside the response
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [11]:
naive_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided complaints, the most common issues with loans appear to be related to mismanagement and errors by servicers, including:\n\n- Errors in loan balances and application of payments\n- Incorrect or disputed information on credit reports\n- Problems with how payments are being handled, such as the inability to pay down principal or paying extra funds\n- Delays or lack of transparency when loans are transferred or sold to different servicers\n- Discrepancies in interest rates and balances due to mishandling or improper adjustments\n- Struggles with repayment plans, including incorrect or unjustified loan increases\n- Unauthorized disclosures and privacy violations\n- Issues with loan forgiveness, discharge, or cancellation processes\n\nOverall, a common theme is that borrowers face challenges due to errors, delays, lack of transparency, or misconduct by loan servicers and related agencies.'

In [12]:
naive_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, yes, some complaints were not handled in a timely manner. Specifically, the complaint filed with MOHELA on 03/28/25 was marked as "No" for response timeliness, indicating it was not addressed promptly. Additionally, multiple complaints related to delays or failure to resolve issues within expected timeframes are noted, such as the complaint from 04/14/25 regarding a complaint not being addressed for over 2-3 weeks and ongoing issues that have persisted for months or over a year.'

In [13]:
naive_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans primarily because they faced financial hardships and complications related to loan management. The context highlights several reasons:\n\n1. **Accumulating interest and forbearance:** Many borrowers were only offered options like forbearance or deferment, during which interest continued to accrue, making repayment more difficult over time. Lowering monthly payments often resulted in interest surpassing payments, extending the repayment period and increasing total debt.\n\n2. **Lack of clear communication and misinformation:** Borrowers reported not being adequately informed about payment resumption dates, loan transfer details, or changes in their loan servicing. This lack of communication led to unintended delinquencies and negative credit impacts.\n\n3. **Inconsistent or confusing account management:** Discrepant loan balances, unrecognized transfer of loans between servicers without proper notification, and inability to access accurate account 

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model) which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.

This retriever is very straightforward to set up! Let's see it happen down below!


In [14]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(loan_complaint_data)
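For intuition about what happens inside a BM25 ranker, here is a small sketch of a textbook Okapi BM25 scoring function. It is an illustration only: the whitespace tokenizer, the `k1` and `b` defaults, the IDF variant, and the toy complaint snippets are all assumptions made for this example, and the library behind `BM25Retriever` may differ in these details.

```python
import math
from collections import Counter

def tokenize(text):
    """Very naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with a textbook BM25 formula."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue
            # Rare terms get a larger inverse-document-frequency weight.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            tf = term_freq[term]
            # Term frequency saturates (k1) and is normalised by document length (b).
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(tokens) / avg_len))
        scores.append(score)
    return scores

# Hypothetical mini-corpus, written for this example.
toy_docs = [
    "nelnet doubled my monthly payment",
    "my servicer never answered the phone",
    "autopay was cancelled without notice",
]
toy_scores = bm25_scores("nelnet payment", toy_docs)
best_index = max(range(len(toy_docs)), key=lambda i: toy_scores[i])
print(best_index, toy_scores)
```

Documents that share no terms with the query score exactly zero, which is the main difference from dense retrieval: there is no notion of "close in meaning", only shared words.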

We'll construct the same chain - only changing the retriever.

In [15]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [16]:
bm25_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with loans appears to be problems related to dealing with lenders or servicers, such as disputes over fees, difficulties applying payments correctly, receiving inaccurate information, or issues with loan terms and repayment processes. Several complaints mention frustration with how payments are handled, incorrect or misleading information from servicers, and issues that prevent proper repayment or understanding of loan details.'

In [17]:
bm25_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, all the complaints listed indicate that the company responded in a timely manner. Specifically, the complaints from 04/26/25, 04/01/25, 04/24/25, and 05/08/25 all have responses marked as "Yes" under the "Timely response?" field. Therefore, there is no evidence in this data to suggest that any complaints were not handled in a timely manner.'

In [18]:
bm25_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for various reasons, including:\n\n- Difficulty with payment plans and problems with their repayment options, such as being steered into wrong types of forbearances or not receiving proper assistance after applying for deferment or forbearance.\n- Lack of communication or notification from the loan servicers regarding important information, such as changes in loan status, account updates, or approval of deferment/forbearance requests.\n- Errors or issues with the loan servicing process, such as payments being reversed repeatedly, automatic payments being discontinued without notice, or billing problems.\n- Loans being transferred between companies without proper notice, leading to unenrollment from autopay and subsequent missed or late payments.\n- Disputes over the accuracy of billing and payments, which can result in accounts becoming overdue or in default.\n- Some borrowers experienced increased loan balances due to capitalization of interest o

It's not clear whether this is better or worse. If only we had a way to test this! (SPOILER: we do, and the second half of the notebook covers it.)

#### ❓ Question #1:

Give an example query where BM25 is better than embeddings and justify your answer.

#### Answer:

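Queries that hinge on named entities - companies, addresses, people, etc. - for example, "Which complaints mention Nelnet?"
Embeddings of named entities can actually be misleading (imagine the embedding for "Apple", which could mean the company or the fruit). Keyword search matches the exact term, so it is a more direct way to retrieve data for named entities.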

## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [19]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-v3.5")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)
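The retrieve-then-rerank pattern can be sketched without any external service. In the sketch below, `overlap_score` is a hypothetical toy scorer standing in for a real reranking model (it is not how Cohere's Rerank works), and the candidate strings are invented for illustration.

```python
def overlap_score(query, doc):
    """Toy relevance score: fraction of query tokens that appear in the document."""
    query_tokens = set(query.lower().split())
    doc_tokens = set(doc.lower().split())
    return len(query_tokens & doc_tokens) / len(query_tokens)

def rerank(query, candidates, score_fn, top_n):
    """Re-score first-stage candidates and keep only the top_n highest scoring."""
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_n]

# Pretend these came back from a first-stage retriever (hypothetical text).
first_stage = [
    "the servicer changed my address",
    "my payment was late because autopay stopped",
    "late payment fee charged twice",
    "loan transferred without notice",
]

compressed = rerank("late payment fee", first_stage, overlap_score, top_n=2)
print(compressed)
```

The design point is that the second-stage scorer sees the query and each document together, so it can afford to be slower and more precise than the first-stage search because it only runs on a handful of candidates.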

Let's create our chain again, and see how this does!

In [20]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [21]:
contextual_compression_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with loans appears to be problems related to dealing with lenders or servicers, such as receiving inaccurate or bad information about the loan, errors in loan balances, misapplied payments, wrongful denials of payment plans, and mishandling of loan data. Many complaints also involve lack of proper communication, discrepancies in loan balances, and improper handling of personal information.\n\nTherefore, the most common issue is **problems with loan servicer misconduct, including errors in loan information, miscommunication, and mishandling of account details**.'

In [22]:
contextual_compression_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, they indicate issues with timely handling. For example, one complaint from a consumer submitted over a year ago has not been resolved after nearly 18 months, and another complaint highlights a problem with customer service that has persisted for over 2-3 weeks. \n\nWhile the company responses state that some complaints were "Closed with explanation" and responses were "Yes" in terms of timeliness, the ongoing unresolved issues and long delays suggest that some complaints did not get handled in a fully timely manner from the consumer\'s perspective.\n\nTherefore, yes, there are complaints that did not get handled in a timely manner.'

In [23]:
contextual_compression_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for several reasons, including:\n\n1. Lack of awareness and understanding: Borrowers were not aware that they needed to repay their loans or were not properly informed about repayment obligations, interest accumulation, and other loan details.\n\n2. Poor communication from lenders/servicers: Borrowers reported not receiving adequate notifications about payment due dates, loan transfers, or changes in loan status, which led to missed payments and confusion.\n\n3. Issues with account management: Problems such as being locked out of online accounts, incorrect information on credit reports, and mismatched account balances contributed to difficulties in managing repayment.\n\n4. Accumulation of interest and economic hardship: While options like forbearance or deferment were available, interest continued to accrue during such periods, increasing the total debt owed and making repayment more difficult.\n\n5. Insufficient income or financial hardship: Man

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` number of new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context

So, how hard is it to set up? Not bad! Let's see it down below!



In [24]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)
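The "retrieve per query, then keep the unique documents" step can be sketched in plain Python. Everything here is illustrative: `multi_query_retrieve` is a hypothetical helper, the second and third queries are hand-written stand-ins for LLM-generated reformulations, and the document ids are made up.

```python
def multi_query_retrieve(queries, retrieve_fn):
    """Run every query through retrieve_fn and return unique documents in first-seen order."""
    seen = set()
    unique_docs = []
    for query in queries:
        for doc in retrieve_fn(query):
            if doc not in seen:
                seen.add(doc)
                unique_docs.append(doc)
    return unique_docs

# Hypothetical retrieval results for the original query plus two reformulations.
toy_index = {
    "Why did people fail to pay back their loans?": ["doc_a", "doc_b"],
    "What caused borrowers to miss loan payments?": ["doc_b", "doc_c"],
    "What obstacles prevented loan repayment?": ["doc_c", "doc_d"],
}

merged = multi_query_retrieve(list(toy_index), toy_index.get)
print(merged)
```

Note that the merged context can be noticeably larger than a single retrieval, which is worth keeping in mind for prompt length and cost.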

In [25]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [26]:
multi_query_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided complaints, appears to be problems related to mishandling by loan servicers, including errors in loan balances, misapplied payments, inaccurate information, and issues with loan payment plans and forgiveness or discharge processes. Many complaints highlight improper information about interest accrual, account mismanagement, lack of proper communication, and difficulties in correcting errors.'

In [27]:
multi_query_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, yes, there were complaints that did not get handled in a timely manner. Several complaints explicitly state that responses from the companies or the CFPB took longer than the expected timeframes, such as over 15 days or several months, and some complaints mention that the companies failed to respond at all despite multiple follow-ups. For example, complaints with responses marked as "No" for timely response or noting delays of over a year indicate that these issues were not addressed promptly.'

In [28]:
multi_query_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans primarily due to issues such as mismanagement by loan servicers, mistaken or inaccurate information about their loan status, lack of proper communication or notification about payments, errors in account reporting, and inability to access or update their information. Additionally, some borrowers were steered into forbearance or other payment plans that accumulated interest and increased their debt, and others experienced systemic failures like transfers of their loans without proper notice, which led to missed payments or negative credit reporting. These factors created confusion, hindered borrowers' ability to stay current on their loans, and in some cases caused their debt to balloon beyond their initial borrowing amount."

#### ❓ Question #2:

Explain how generating multiple reformulations of a user query can improve recall.

#### Answer:

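Imagine there are 10 golden chunks in the vector DB, and any single version of the user query has an 80% probability of retrieving each one, so one query is expected to pull about 8 of them. If the reformulations miss chunks independently, the probability that at least one of n versions retrieves a given golden chunk is 1 - (0.2)^n, so the expected recall of the combined results (averaged across a test set) rises toward 100% as n grows. In practice reformulations of the same question are correlated, so the real gain is smaller, but the direction holds: different phrasings surface chunks that the original wording misses.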
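The 1 - (0.2)^n figure can be checked numerically. This sketch assumes each reformulation retrieves a given golden chunk with probability 0.8 and that misses are independent across reformulations; independence is a simplifying assumption, and `hit_probability` is a name made up for this example.

```python
def hit_probability(p_single, n_queries):
    """P(at least one of n independent queries retrieves a given chunk)."""
    return 1 - (1 - p_single) ** n_queries

# Per-chunk hit probability for 1, 2, 3 and 5 query versions.
recall_table = {n: hit_probability(0.8, n) for n in (1, 2, 3, 5)}

for n, probability in recall_table.items():
    print(f"{n} query version(s): {probability:.5f}")
```

Under these assumptions most of the gain arrives with the first couple of extra queries, which is one reason a small number of reformulations is often enough.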

## Task 8: Parent Document Retriever

A "small-to-big" strategy - the Parent Document Retriever works based on a simple strategy:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent chunks".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
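The "search small, return big" idea can be sketched in a few lines of plain Python. This is a toy under stated assumptions: chunks are fixed-size word windows, token overlap stands in for embedding similarity, and the two parent texts are invented; none of the helper names come from LangChain.

```python
def split_into_chunks(text, words_per_chunk):
    """Split text into consecutive windows of words_per_chunk words."""
    words = text.split()
    return [" ".join(words[i:i + words_per_chunk]) for i in range(0, len(words), words_per_chunk)]

def build_child_index(parent_docs, words_per_chunk):
    """Return a list of (child_chunk, parent_id) pairs."""
    return [
        (chunk, parent_id)
        for parent_id, text in parent_docs.items()
        for chunk in split_into_chunks(text, words_per_chunk)
    ]

def retrieve_parents(query, child_index, parent_docs, k):
    """Rank child chunks against the query, then return the unique parents of the top k."""
    query_tokens = set(query.lower().split())

    def score(chunk):
        # Token overlap is a stand-in for vector similarity.
        return len(query_tokens & set(chunk.lower().split()))

    ranked = sorted(child_index, key=lambda pair: score(pair[0]), reverse=True)
    parent_ids = []
    for _, parent_id in ranked[:k]:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [parent_docs[parent_id] for parent_id in parent_ids]

# Hypothetical parent documents.
toy_parents = {
    "p1": "forbearance ended last year and my payment doubled without warning from the servicer",
    "p2": "autopay was cancelled after the loan transfer and nobody told me about it",
}
toy_children = build_child_index(toy_parents, words_per_chunk=4)
results = retrieve_parents("payment doubled", toy_children, toy_parents, k=1)
print(results)
```

The match happens on the short chunk "and my payment doubled", but the caller receives the whole parent text, including the forbearance context that the chunk alone would have lost.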

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [29]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = loan_complaint_data
child_splitter = RecursiveCharacterTextSplitter(chunk_size=750)

We'll need to set up a new QDrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension, you'll need to change this if you're using a different embedding model.

In [30]:
from langchain_qdrant import QdrantVectorStore

client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = QdrantVectorStore(
    collection_name="full_documents", embedding=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)

Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [31]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [32]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [33]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [34]:
parent_document_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided complaints, appears to be related to problems with loan servicing, such as errors in loan balances, misapplied payments, wrongful denials of payment plans, and misconduct by loan servicers. Many complaints highlight issues like incorrect information on credit reports, unfair increases in interest rates, and disputes over debt legitimacy, indicating that servicing problems are a prevalent concern.'

In [35]:
parent_document_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, several complaints indicated that they did not receive responses in a timely manner. Specifically:\n\n- One complaint submitted on 03/28/25 by a consumer regarding federal student loan servicing was marked as "Timely response?": "No," and the consumer reported that they had not heard back after multiple calls, despite the complaint indicating a 15-day response window. This suggests the complaint was not handled in a timely manner.\n- Another complaint submitted on 04/11/25 also was marked as "Timely response?": "No," with the consumer explicitly stating that they had not received any response despite waiting several weeks.\n  \nIn contrast, the complaint from 04/11/25 regarding a dispute settlement to credit bureaus was marked as "Yes" for timely response.\n\n**Conclusion:** Yes, at least some complaints did not get handled in a timely manner, specifically those from 03/28/25 and 04/11/25 regarding loan servicing issues.'

In [36]:
parent_document_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily due to factors such as experiencing severe financial hardship, lack of proper information about repayment obligations, mismanagement by loan servicers, and issues related to the credibility and transparency of the educational institutions related to their loans. For example, some borrowers faced difficulties because their schools closed or misrepresented the value of their degrees, making it hard for them to find employment and generate income to repay their loans. Others experienced problems with loan servicing, such as not being properly notified about repayment details, being subjected to unverified debt collection practices, or encountering administrative errors like incorrect reporting or unauthorized account activity. Additionally, some borrowers relied on deferment and forbearance options, which increased their debt due to accumulating interest, further complicating repayment efforts.'

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to answer the performance question more rigorously.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever simply takes two or more retrievers and combines their retrieved documents based on a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.

In [37]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

We'll pack *all* of these retrievers together in an ensemble.
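To see how rank fusion combines several ranked lists, here is a minimal sketch of weighted Reciprocal Rank Fusion. It is written from the formula in the linked paper (each document earns `weight / (c + rank)` from every list it appears in, with `c = 60` as in the paper); the function name and the toy rankings are assumptions for illustration, not the `EnsembleRetriever` source.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, weights, c=60):
    """Fuse ranked lists of document ids into one list, best first."""
    scores = defaultdict(float)
    for ranked, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranked, start=1):
            # Higher-ranked documents (small rank) contribute more.
            scores[doc_id] += weight / (c + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical rankings from three different retrievers.
toy_rankings = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_c", "doc_a"],
    ["doc_b", "doc_a", "doc_d"],
]
toy_weights = [1 / len(toy_rankings)] * len(toy_rankings)

fused = reciprocal_rank_fusion(toy_rankings, toy_weights)
print(fused)
```

Because only ranks are used, the retrievers' raw scores never need to be on the same scale, which is what makes it practical to mix keyword and dense retrievers.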

In [38]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [39]:
ensemble_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided complaints, appears to be mismanagement and errors by loan servicers. This includes inaccurate loan balances, misapplied payments, wrongful denials of repayment plans, incorrect reporting to credit bureaus, improper handling of deferments or forbearances, and inadequate communication or notifications about account changes. Many complaints also involve wrongful transfers of loans without proper notice, improper classification of loan types, and errors in interest calculations, all of which cause financial hardship and stress for borrowers.'

In [40]:
ensemble_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints data, yes, there are multiple complaints indicating that complaints did not get handled in a timely manner. Specifically:\n\n- One complaint from row 441 (received on 03/28/25) regarding a student loan issue was marked as "Timely response?": No, meaning it was not handled in a timely manner.\n- Several other complaints, for example rows 400, 418, and 523, also indicate delays or failures to respond promptly, with some explicitly mentioning that responses took over the expected time or that issues remain unresolved despite repeated follow-up.\n\nTherefore, the answer is: Yes, some complaints did not get handled in a timely manner.'

In [41]:
ensemble_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily due to a combination of systemic issues and mismanagement by loan servicers, including:\n\n1. **Lack of Clear Communication and Notification:** Many borrowers were not adequately informed about when their repayment was due or when their loans were transferred to different servicers, leading to missed or late payments. For example, some borrowers received no notification of loan status changes or repayment resumption, resulting in delinquency reports that damaged their credit scores.\n\n2. **Mismanagement and Errors by Servicers:** Errors such as incorrect account balances, misapplied payments, and improper reporting to credit bureaus contributed to borrowers falling behind. Several complaints mention incorrect delinquency reporting, failure to follow regulatory guidelines on notices, and improper account handling.\n\n3. **Unaffordable Payment Options & Interest Accumulation:** Borrowers often found themselves unable to afford increased p

## Task 10: Semantic Chunking

While this is not a retrieval method - it *is* an effective way of increasing retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of a number of [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
  - `percentile`
  - `standard_deviation`
  - `interquartile`
  - `gradient`
3. Each sequence of related sentences is kept as a document!

Let's see how to implement this!

We'll use the `percentile` thresholding method for this example which will:

Calculate the distance between each pair of adjacent sentences, and then split the sequence wherever a distance exceeds a given percentile of all distances.
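To make the `percentile` idea concrete, here is a minimal plain-Python sketch. It is not the `SemanticChunker` implementation: the helper names (`cosine_distance`, `percentile`, `percentile_chunks`), the toy 2-D "embeddings", and the cutoff of 95 are all assumptions chosen for illustration.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def percentile(values, pct):
    """Percentile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    rank = (pct / 100) * (len(ordered) - 1)
    lower, upper = math.floor(rank), math.ceil(rank)
    weight = rank - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight

def percentile_chunks(sentences, vectors, pct=95):
    """Split wherever the distance to the next sentence exceeds the pct-th percentile."""
    if len(sentences) < 2:
        return list(sentences)
    distances = [
        cosine_distance(vectors[i], vectors[i + 1])
        for i in range(len(vectors) - 1)
    ]
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            # Large semantic jump: close the current chunk and start a new one.
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks

# Toy data: hypothetical sentences with hand-made 2-D vectors standing in for embeddings.
sentences = ["A", "B", "C", "D"]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]
chunks = percentile_chunks(sentences, vectors, pct=95)
print(chunks)
```

Only the jump between "B" and "C" exceeds the threshold, so the sketch yields two chunks. Because the threshold is relative to the distances in the document itself, a split is made at the largest jumps even when every jump is small.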

In [42]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)

Now we can split our documents.

In [43]:
semantic_documents = semantic_chunker.split_documents(loan_complaint_data[:20])

Let's create a new vector store.

In [44]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="Loan_Complaint_Data_Semantic_Chunks"
)

We'll use naive retrieval for this example.

In [45]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [46]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [47]:
semantic_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided complaints, appears to be problems related to the mismanagement and mishandling of student loans by servicers and lenders. Specific frequent issues include:\n\n- Struggling to repay or problems with repayment plans and payment amounts.\n- Discrepancies and errors in reported account status, including default notices and delinquency reports.\n- Difficulties with loan account management, such as trouble accessing account information, switching servicers, or inaccurate reporting.\n- Issues with improper or illegal use of reporting and collection practices.\n- Problems with documentation, verification, and processing of eligibility for forgiveness or discharge.\n- Lack of transparency, communication, and accountability from loan servicers.\n\nAmong these, a particularly recurring theme is **errors and disputes over account status, repayment amounts, and account handling**, which indicates that administrative errors and mismanagement 

In [48]:
semantic_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, all the responses from the companies were marked as "Closed with explanation." Additionally, the details indicate that responses were submitted in a timely manner ("Yes" for timely response). Despite the timely responses, the complaints highlight ongoing issues such as lack of response to specific inquiries, unresolved disputes, or continued violations by the companies.\n\nTherefore, yes, some complaints did not get fully handled in a timely manner to the complainants\' satisfaction, or were only partially addressed as the complaints persisted with unresolved issues.'

In [49]:
semantic_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People may fail to pay back their loans for various reasons, including issues with communication and transparency with lenders or servicers, difficulties in proving or verifying their loan status, problems with reporting errors or disputes about the legitimacy of the debt, and challenges related to payment processing or changes in payment plans. Additionally, some borrowers face complications due to administrative delays, bad information, or legal issues surrounding their loans, which can hinder repayment efforts.'

#### ❓ Question #3:

If sentences are short and highly repetitive (e.g., FAQs), how might semantic chunking behave, and how would you adjust the algorithm?

#### Answer:

Semantic chunking might group these repetitive sentences together if they are semantically similar.

Not sure if this is a thing, but to improve this I would consider just having an LLM chunk a document directly. Ask it to repeat back the document as a list/array where the elements are chunks, and the LLM decides how to keep semantically similar content contained within chunks.
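A rough sketch of the LLM-chunking idea from the answer above. Everything here is hypothetical: `call_llm` stands for any function that takes a prompt string and returns a completion string, `CHUNK_PROMPT` is an illustrative prompt, and `fake_llm` is a stand-in so the example runs without an API key. The one design choice worth noting is the check that the returned chunks reproduce the document, with a fallback to a single chunk when they do not.

```python
import json

# Illustrative prompt; the wording is an assumption, not a tested recipe.
CHUNK_PROMPT = (
    "Split the document below into semantically coherent chunks. "
    "Return only a JSON array of strings whose concatenation reproduces "
    "the document exactly.\n\nDocument:\n{document}"
)

def llm_chunk(document, call_llm):
    """Ask an LLM for chunks; fall back to one chunk if the reply is unusable."""
    raw = call_llm(CHUNK_PROMPT.format(document=document))
    try:
        chunks = json.loads(raw)
    except json.JSONDecodeError:
        return [document]
    is_list_of_str = isinstance(chunks, list) and all(isinstance(c, str) for c in chunks)
    if not is_list_of_str:
        return [document]
    # Guard against dropped or invented text: compare with whitespace removed.
    if "".join("".join(chunks).split()) != "".join(document.split()):
        return [document]
    return [c.strip() for c in chunks if c.strip()]

def fake_llm(prompt):
    """Stand-in for a real model call; returns a fixed, well-formed reply."""
    return json.dumps(["Q: How do I pay? A: Online.", "Q: How do I defer? A: Call us."])

faq = "Q: How do I pay? A: Online. Q: How do I defer? A: Call us."
faq_chunks = llm_chunk(faq, fake_llm)
print(faq_chunks)
```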

# 🤝 Breakout Room Part #2

#### 🏗️ Activity #1

Your task is to evaluate the various Retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
 - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever specific* Ragas metrics
 - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

##### HINTS:

- LangSmith provides detailed information about latency and cost.

In [50]:
### YOUR CODE HERE

In [51]:
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
from ragas.testset import TestsetGenerator

generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4.1-nano"))
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)
dataset = generator.generate_with_langchain_docs(loan_complaint_data[:20], testset_size=10)

Applying SummaryExtractor:   0%|          | 0/14 [00:00<?, ?it/s]

Applying CustomNodeFilter:   0%|          | 0/20 [00:00<?, ?it/s]

Node 87554e14-a1cb-4fb7-9dce-45f70d9d372d does not have a summary. Skipping filtering.
Node 186eb49d-de2e-4a69-a855-0aa3a2ab57cb does not have a summary. Skipping filtering.
Node 7f77e5b2-8cbd-458c-bd27-7e67eb88b7f2 does not have a summary. Skipping filtering.
Node 525cbbd3-a9f8-43a1-a9e7-786d79348914 does not have a summary. Skipping filtering.
Node 124c9b6b-1e7e-43c6-b535-c7492337898b does not have a summary. Skipping filtering.
Node aa2396cb-a241-4434-930a-be0c7f95b952 does not have a summary. Skipping filtering.


Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]:   0%|          | 0/54 [00:00<?, ?it/s]

Applying OverlapScoreBuilder:   0%|          | 0/1 [00:00<?, ?it/s]

Generating personas:   0%|          | 0/3 [00:00<?, ?it/s]

Generating Scenarios:   0%|          | 0/2 [00:00<?, ?it/s]

Generating Samples:   0%|          | 0/10 [00:00<?, ?it/s]

In [52]:
invocables = {
    "naive": naive_retrieval_chain,
    "bm25": bm25_retrieval_chain, 
    "contextual_compression": contextual_compression_retrieval_chain,
    "multi_query": multi_query_retrieval_chain,
    "parent_document": parent_document_retrieval_chain, 
    "ensemble": ensemble_retrieval_chain, 
    "semantic": semantic_retrieval_chain, 
}

In [53]:
import copy

from ragas import EvaluationDataset, RunConfig, evaluate
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import LLMContextRecall, Faithfulness, FactualCorrectness, ResponseRelevancy

def run_eval_ragas(invocable, invocable_name, dataset):
  # Work on a copy so each retriever fills in its own responses and contexts.
  dataset_this = copy.deepcopy(dataset)

  for test_row in dataset_this:
    # Tag each run with the retriever name so LangSmith stats can be filtered per retriever.
    response = invocable.invoke({"question" : test_row.eval_sample.user_input}, {"tags": [invocable_name]})
    test_row.eval_sample.response = response["response"].content
    test_row.eval_sample.retrieved_contexts = [context.page_content for context in response["context"]]

  evaluation_dataset = EvaluationDataset.from_pandas(dataset_this.to_pandas())
  evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4.1-mini"))

  # Raise the timeout for the LLM-judged metrics.
  custom_run_config = RunConfig(timeout=720)
  result = evaluate(
      dataset=evaluation_dataset,
      metrics=[LLMContextRecall(), Faithfulness(), FactualCorrectness(), ResponseRelevancy()],
      llm=evaluator_llm,
      run_config=custom_run_config
  )
  return result

In [54]:
from concurrent.futures import ThreadPoolExecutor

results = []
with ThreadPoolExecutor() as executor:
    futures = [
        executor.submit(run_eval_ragas, invocable, invocable_name, dataset)
        for invocable_name, invocable in invocables.items()
    ]
    results = [f.result() for f in futures]

results_d = {invocable_name: result for invocable_name, result in zip(invocables.keys(), results)}

Evaluating:   0%|          | 0/40 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/40 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/40 [00:00<?, ?it/s]

Evaluating:   0%|          | 0/40 [00:00<?, ?it/s]

Exception raised in Job[3]: APIConnectionError(Connection error.)
Exception raised in Job[22]: APIConnectionError(Connection error.)
Exception raised in Job[13]: APIConnectionError(Connection error.)
Exception raised in Job[23]: APIConnectionError(Connection error.)
Exception raised in Job[19]: APIConnectionError(Connection error.)
Exception raised in Job[27]: APIConnectionError(Connection error.)
Exception raised in Job[18]: APIConnectionError(Connection error.)
Exception raised in Job[24]: APIConnectionError(Connection error.)


Evaluating:   0%|          | 0/40 [00:00<?, ?it/s]

Exception raised in Job[1]: APIConnectionError(Connection error.)
Exception raised in Job[39]: APIConnectionError(Connection error.)
Exception raised in Job[9]: APIConnectionError(Connection error.)
Exception raised in Job[34]: APIConnectionError(Connection error.)


Evaluating:   0%|          | 0/40 [00:00<?, ?it/s]

Exception raised in Job[6]: APIConnectionError(Connection error.)
Exception raised in Job[26]: APIConnectionError(Connection error.)
Exception raised in Job[16]: APIConnectionError(Connection error.)
Exception raised in Job[35]: APIConnectionError(Connection error.)
Exception raised in Job[23]: APIConnectionError(Connection error.)
Exception raised in Job[26]: APIConnectionError(Connection error.)
Exception raised in Job[30]: APIConnectionError(Connection error.)
Exception raised in Job[34]: APIConnectionError(Connection error.)
Exception raised in Job[14]: APIConnectionError(Connection error.)
Exception raised in Job[22]: APIConnectionError(Connection error.)
Exception raised in Job[6]: APIConnectionError(Connection error.)
Exception raised in Job[10]: AttributeError('StringIO' object has no attribute 'statements')
Exception raised in Job[9]: AttributeError('StringIO' object has no attribute 'statements')
Exception raised in Job[26]: APIConnectionError(Connection error.)


Evaluating:   0%|          | 0/40 [00:00<?, ?it/s]

Exception raised in Job[22]: APIConnectionError(Connection error.)
Exception raised in Job[38]: APIConnectionError(Connection error.)


In [55]:
import pandas as pd

score_dfs = []
for name, result in results_d.items():
    score_df = pd.DataFrame(result.scores)
    score_df['experiment_name'] = name
    score_dfs.append(score_df)

score_df = pd.concat(score_dfs)
score_df

Unnamed: 0,context_recall,faithfulness,factual_correctness,answer_relevancy,experiment_name
0,1.0,0.555556,0.18,0.000000,naive
1,1.0,0.522727,0.15,0.941829,naive
2,1.0,0.666667,0.50,0.987586,naive
3,0.0,0.000000,0.24,0.960614,naive
4,1.0,0.941176,0.71,0.944441,naive
...,...,...,...,...,...
5,1.0,0.878788,0.62,0.000000,semantic
6,,0.880952,0.88,0.951291,semantic
7,1.0,0.750000,0.37,0.000000,semantic
8,1.0,0.825000,,0.934657,semantic


In [56]:
score_aggs = score_df.groupby('experiment_name').mean()
score_aggs

experiment_name,context_recall,faithfulness,factual_correctness,answer_relevancy
bm25,0.633333,0.640254,0.562222,0.749486
contextual_compression,0.683333,0.768561,0.612857,0.842251
ensemble,0.966667,0.858049,0.455,0.665007
multi_query,1.0,0.900093,0.616667,0.735416
naive,0.833333,0.731857,0.5075,0.824444
parent_document,0.833333,0.787431,0.475,0.742986
semantic,1.0,0.777018,0.578571,0.663101
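Some per-sample scores above are missing (NaN), which lines up with the failed evaluation jobs in the log. `groupby(...).mean()` skips NaN by default, so each average is taken over only the samples that produced a score. A plain-Python sketch of that behaviour, with made-up values and a hypothetical helper name `nan_mean`:

```python
import math

def nan_mean(values):
    """Mean over the non-NaN entries; NaN if nothing is left."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else math.nan

# Illustrative scores only: the middle sample stands for a failed evaluation job.
sample_scores = [1.0, math.nan, 0.5]
mean_score = nan_mean(sample_scores)
print(mean_score)
```

This means retrievers with more failed jobs are averaged over fewer samples, which is worth keeping in mind when comparing rows that are close together.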


In [59]:
tags

dict_keys(['naive', 'bm25', 'contextual_compression', 'multi_query', 'parent_document', 'ensemble', 'semantic'])

In [62]:
from collections import defaultdict
from langsmith import Client
from tqdm.auto import tqdm

client = Client()

tags = list(invocables.keys())
by_tag = defaultdict(dict)

for t in tqdm(tags):
    by_tag[t] = client.get_run_stats(
        project_names=[os.environ["LANGSMITH_PROJECT"]],
        # filter syntax lets you match an element in the tags array
        filter=f"has(tags, '{t}')"
    )

In [67]:
langsmith_df = pd.DataFrame(by_tag).T
langsmith_df

Unnamed: 0,run_count,latency_p50,latency_p99,first_token_p50,first_token_p99,total_tokens,prompt_tokens,completion_tokens,median_tokens,completion_tokens_p50,...,last_run_start_time,feedback_stats,run_facets,error_rate,streaming_rate,total_cost,prompt_cost,completion_cost,cost_p50,cost_p99
naive,140,0.097,5.97622,,,78448,75422,3026,0,0,...,2025-07-29T18:07:50.680775,{},"[{'key': 'name', 'value': 'RunnableLambda', 'q...",0.0,0.0,0.008753,0.007542,0.00121,0.000856,0.00128
bm25,140,0.004,5.53122,,,44496,41779,2717,0,0,...,2025-07-29T18:07:44.704082,{},"[{'key': 'name', 'value': 'RunnableLambda', 'q...",0.0,0.0,0.005265,0.004178,0.001087,0.000536,0.000795
contextual_compression,150,0.2925,8.65653,,,29567,27045,2522,0,0,...,2025-07-29T18:07:49.042365,{},"[{'key': 'name', 'value': 'RunnableLambda', 'q...",0.0,0.0,0.003713,0.002704,0.001009,0.000342,0.000643
multi_query,210,0.267,10.06614,,,109729,105427,4302,0,0,...,2025-07-29T18:08:26.420350,{},"[{'key': 'name', 'value': 'RunnableLambda', 'q...",0.0,0.0,0.01163,0.009909,0.001721,0.000288,0.001536
parent_document,140,0.111,11.36261,,,43181,40126,3055,0,0,...,2025-07-29T18:08:05.123571,{},"[{'key': 'name', 'value': 'RunnableLambda', 'q...",0.0,0.0,0.005235,0.004013,0.001222,0.000406,0.000912
ensemble,270,0.311,24.77662,,,161680,157113,4567,0,0,...,2025-07-29T18:08:55.882714,{},"[{'key': 'name', 'value': 'VectorStoreRetrieve...",0.0,0.0,0.017538,0.015711,0.001827,0.000487,0.002717
semantic,140,0.089,9.86083,,,67636,64023,3613,0,0,...,2025-07-29T18:07:50.507913,{},"[{'key': 'name', 'value': 'RunnableLambda', 'q...",0.0,0.0,0.007848,0.006402,0.001445,0.000821,0.000982


In [71]:
# Negate latency and cost so that, like the eval scores, higher is better in every column.
score_aggs_combined = pd.concat([score_aggs, -langsmith_df[['latency_p50', 'latency_p99', 'cost_p50', 'cost_p99']]], axis=1)
score_aggs_combined

Unnamed: 0,context_recall,faithfulness,factual_correctness,answer_relevancy,latency_p50,latency_p99,cost_p50,cost_p99
bm25,0.633333,0.640254,0.562222,0.749486,-0.004,-5.53122,-0.000536,-0.000795
contextual_compression,0.683333,0.768561,0.612857,0.842251,-0.2925,-8.65653,-0.000342,-0.000643
ensemble,0.966667,0.858049,0.455,0.665007,-0.311,-24.77662,-0.000487,-0.002717
multi_query,1.0,0.900093,0.616667,0.735416,-0.267,-10.06614,-0.000288,-0.001536
naive,0.833333,0.731857,0.5075,0.824444,-0.097,-5.97622,-0.000856,-0.00128
parent_document,0.833333,0.787431,0.475,0.742986,-0.111,-11.36261,-0.000406,-0.000912
semantic,1.0,0.777018,0.578571,0.663101,-0.089,-9.86083,-0.000821,-0.000982


In [75]:
score_aggs_combined.rank(ascending=False)

Unnamed: 0,context_recall,faithfulness,factual_correctness,answer_relevancy,latency_p50,latency_p99,cost_p50,cost_p99
bm25,7.0,7.0,4.0,3.0,1.0,1.0,5.0,2.0
contextual_compression,6.0,5.0,2.0,1.0,6.0,3.0,2.0,1.0
ensemble,3.0,2.0,7.0,6.0,7.0,7.0,4.0,7.0
multi_query,1.5,1.0,1.0,5.0,5.0,5.0,1.0,6.0
naive,4.5,6.0,5.0,2.0,3.0,2.0,7.0,5.0
parent_document,4.5,3.0,6.0,4.0,4.0,6.0,3.0,3.0
semantic,1.5,4.0,3.0,7.0,2.0,4.0,6.0,4.0
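The fractional ranks in this table (for example 1.5 and 4.5) come from ties: tied retrievers share the average of the positions they occupy. The plain-Python sketch below reproduces two columns by hand. The function name `rank_descending` is hypothetical, and the input numbers are copied from the `score_aggs_combined` output above.

```python
def rank_descending(scores):
    """Rank so the highest score gets 1; tied scores share their average position."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of entries tied with position i.
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        shared_rank = ((i + 1) + (j + 1)) / 2
        for name, _ in ordered[i:j + 1]:
            ranks[name] = shared_rank
        i = j + 1
    return ranks

context_recall = {
    "bm25": 0.633333, "contextual_compression": 0.683333, "ensemble": 0.966667,
    "multi_query": 1.0, "naive": 0.833333, "parent_document": 0.833333, "semantic": 1.0,
}
# Already negated, as in the combined table, so the fastest retriever has the highest value.
neg_latency_p50 = {
    "bm25": -0.004, "contextual_compression": -0.2925, "ensemble": -0.311,
    "multi_query": -0.267, "naive": -0.097, "parent_document": -0.111, "semantic": -0.089,
}

recall_ranks = rank_descending(context_recall)
latency_ranks = rank_descending(neg_latency_p50)
print(recall_ranks)
print(latency_ranks)
```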


In [None]:
# aggregated rankings for evals only
score_aggs.rank(ascending=False).mean(axis=1).sort_values(ascending=True)

experiment_name
multi_query               2.125
contextual_compression    3.500
semantic                  3.875
naive                     4.375
parent_document           4.375
ensemble                  4.500
bm25                      5.250
dtype: float64

In [77]:
# aggregated rankings for cost/latency only
(-langsmith_df[['latency_p50', 'latency_p99', 'cost_p50', 'cost_p99']]).rank(ascending=False).mean(axis=1).sort_values(ascending=True)

bm25                      2.25
contextual_compression    3.00
parent_document           4.00
semantic                  4.00
naive                     4.25
multi_query               4.25
ensemble                  6.25
dtype: float64

In [None]:
# aggregated rankings for both evals and cost/latency
score_aggs_combined.rank(ascending=False).mean(axis=1).sort_values(ascending=True)

multi_query               3.1875
contextual_compression    3.2500
bm25                      3.7500
semantic                  3.9375
parent_document           4.1875
naive                     4.3125
ensemble                  5.3750
dtype: float64

# Conclusion

The winner of my experiments is `multi_query`, which has the best average rank across quality, latency, and cost (3.1875), narrowly ahead of `contextual_compression` (3.25). The cell below shows how each retriever ranks against all other retrievers on 8 different dimensions.

The rankings largely meet expectations. For example, `bm25` is fast but has the worst average rank on the evaluation metrics. The ensemble is slow and expensive, and its accuracy gain was not large enough to justify the extra cost.

In [78]:
score_aggs_combined.rank(ascending=False)

Unnamed: 0,context_recall,faithfulness,factual_correctness,answer_relevancy,latency_p50,latency_p99,cost_p50,cost_p99
bm25,7.0,7.0,4.0,3.0,1.0,1.0,5.0,2.0
contextual_compression,6.0,5.0,2.0,1.0,6.0,3.0,2.0,1.0
ensemble,3.0,2.0,7.0,6.0,7.0,7.0,4.0,7.0
multi_query,1.5,1.0,1.0,5.0,5.0,5.0,1.0,6.0
naive,4.5,6.0,5.0,2.0,3.0,2.0,7.0,5.0
parent_document,4.5,3.0,6.0,4.0,4.0,6.0,3.0,3.0
semantic,1.5,4.0,3.0,7.0,2.0,4.0,6.0,4.0
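As a worked check, the overall ordering is just the row-wise mean of this rank table. The sketch below copies the ranks from the output above into a plain dictionary (the variable names are my own) and splits them into the four eval columns and the four latency/cost columns.

```python
# Column order: context_recall, faithfulness, factual_correctness, answer_relevancy,
# latency_p50, latency_p99, cost_p50, cost_p99 (values copied from the rank table).
rank_table = {
    "bm25":                   [7.0, 7.0, 4.0, 3.0, 1.0, 1.0, 5.0, 2.0],
    "contextual_compression": [6.0, 5.0, 2.0, 1.0, 6.0, 3.0, 2.0, 1.0],
    "ensemble":               [3.0, 2.0, 7.0, 6.0, 7.0, 7.0, 4.0, 7.0],
    "multi_query":            [1.5, 1.0, 1.0, 5.0, 5.0, 5.0, 1.0, 6.0],
    "naive":                  [4.5, 6.0, 5.0, 2.0, 3.0, 2.0, 7.0, 5.0],
    "parent_document":        [4.5, 3.0, 6.0, 4.0, 4.0, 6.0, 3.0, 3.0],
    "semantic":               [1.5, 4.0, 3.0, 7.0, 2.0, 4.0, 6.0, 4.0],
}

def mean(values):
    return sum(values) / len(values)

overall = {name: mean(ranks) for name, ranks in rank_table.items()}
evals_only = {name: mean(ranks[:4]) for name, ranks in rank_table.items()}
ops_only = {name: mean(ranks[4:]) for name, ranks in rank_table.items()}

# Lower mean rank is better.
for name, mean_rank in sorted(overall.items(), key=lambda item: item[1]):
    print(f"{name:<24}{mean_rank:.4f}")
```

An unweighted mean treats all 8 columns as equally important. If latency or cost matters more for a given deployment, a weighted mean over the same ranks would be the natural next step.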
