# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up QDrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

We'll also provide our OpenAI key, as well as our Cohere API key.

In [1]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

In [2]:
os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

## Task 2: Data Collection and Preparation

We'll be using our Loan Data once again - this time the structured data available through the CSV!

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

In [3]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

loader = CSVLoader(
    file_path="./data/complaints.csv",
    metadata_columns=[
      "Date received", 
      "Product", 
      "Sub-product", 
      "Issue", 
      "Sub-issue", 
      "Consumer complaint narrative", 
      "Company public response", 
      "Company", 
      "State", 
      "ZIP code", 
      "Tags", 
      "Consumer consent provided?", 
      "Submitted via", 
      "Date sent to company", 
      "Company response to consumer", 
      "Timely response?", 
      "Consumer disputed?", 
      "Complaint ID"
    ]
)

loan_complaint_data = loader.load()

for doc in loan_complaint_data:
    doc.page_content = doc.metadata["Consumer complaint narrative"]

Let's look at an example document to see if everything worked as expected!

In [4]:
loan_complaint_data[0]

Document(metadata={'source': './data/complaints.csv', 'row': 0, 'Date received': '03/27/25', 'Product': 'Student loan', 'Sub-product': 'Federal student loan servicing', 'Issue': 'Dealing with your lender or servicer', 'Sub-issue': 'Trouble with how payments are being handled', 'Consumer complaint narrative': "The federal student loan COVID-19 forbearance program ended in XX/XX/XXXX. However, payments were not re-amortized on my federal student loans currently serviced by Nelnet until very recently. The new payment amount that is effective starting with the XX/XX/XXXX payment will nearly double my payment from {$180.00} per month to {$360.00} per month. I'm fortunate that my current financial position allows me to be able to handle the increased payment amount, but I am sure there are likely many borrowers who are not in the same position. The re-amortization should have occurred once the forbearance ended to reduce the impact to borrowers.", 'Company public response': 'None', 'Company'

## Task 3: Setting up QDrant!

Now that we have our documents, let's create a QDrant VectorStore with the collection name "LoanComplaints".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [5]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    loan_complaint_data,
    embeddings,
    location=":memory:",
    collection_name="LoanComplaints"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply treat each complaint as a document, and use cosine similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later

In [6]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})
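
To make "cosine similarity plus top `k`" concrete, here is a minimal pure-Python sketch. The three-dimensional vectors are made-up toy values (real `text-embedding-3-small` vectors are much longer), and `cosine_similarity` and `top_k` are illustrative helper names, not part of LangChain or Qdrant.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k documents most similar to the query."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

# Toy "embeddings" (hypothetical values, not real model output).
toy_doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
toy_query_vec = [1.0, 0.2, 0.0]

top_two = top_k(toy_query_vec, toy_doc_vecs, k=2)
print(top_two)  # [1, 0]
```

A vector store does the same ranking, typically with an index that avoids comparing the query against every stored vector.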

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [7]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-4.1-nano` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [8]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4.1-nano")

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [9]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and chaining it into the naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # "context"  : re-assigned unchanged by RunnablePassthrough.assign, which also passes
    #              every other key (here, "question") through to the next step
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)
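
If the dictionary-and-pipe syntax is hard to follow, the same data flow can be sketched with plain functions. This is only an approximation under stated assumptions: `fake_retriever` and `fake_llm` are hypothetical stand-ins for `naive_retriever` and `chat_model`, and the prompt string is a simplified version of `RAG_TEMPLATE`.

```python
def fake_retriever(question):
    """Hypothetical stand-in for naive_retriever: returns canned documents."""
    return ["doc about late fees", "doc about forbearance"]

def fake_llm(prompt):
    """Hypothetical stand-in for chat_model: reports the prompt length."""
    return f"(answer based on a {len(prompt)}-character prompt)"

def run_chain(inputs):
    # Step 1: build {"context", "question"} from the incoming question.
    state = {
        "context": fake_retriever(inputs["question"]),
        "question": inputs["question"],
    }
    # Step 2: mirror RunnablePassthrough.assign, which re-assigns "context"
    # unchanged and keeps every other key.
    state = {**state, "context": state["context"]}
    # Step 3: format the prompt, call the model, and carry the context along.
    prompt = f"Query:\n{state['question']}\n\nContext:\n{state['context']}"
    return {"response": fake_llm(prompt), "context": state["context"]}

result = run_chain({"question": "What is the most common issue with loans?"})
print(sorted(result))  # ['context', 'response']
```

Returning the context next to the response is what lets us inspect (and later evaluate) exactly which documents each retriever supplied.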

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [10]:
naive_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issues with loans, based on the complaints provided, appear to be related to mismanagement and errors in loan handling. Specifically, frequent issues include errors in loan balances, misapplied payments, wrongful denials of payment plans, incorrect or outdated information on credit reports, problems with loan transfers without proper notification, and difficulties with repayment plans such as problematic forbearance.\n\nIn summary, a predominant issue is **mismanagement of student loans**, including incorrect information about balances, interest, and payment application, which often leads to credit damage and financial hardship for borrowers.'

In [11]:
naive_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Yes, based on the provided information, some complaints did not get handled in a timely manner. For example, the complaint submitted to MOHELA on 03/28/25 was marked as "Timely response? No," indicating it was not responded to within the expected timeframe. Additionally, a complaint involving Maximus Federal Services, Inc. with a request made over 18 months prior remained unresolved, which suggests delays in addressing some issues.'

In [12]:
naive_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily because of various financial hardships, lack of clear or timely communication from loan servicers, and the accumulation of interest that made repayment difficult. Many borrowers were not adequately informed about loan transfer processes, delinquency statuses, or repayment terms, which led to unexpected late payments and credit impacts. Additionally, factors such as stagnant wages, recession-related economic struggles, and unmanageable interest accrued over time contributed to their inability to repay loans. Some borrowers also experienced difficulties with payment plans not allowing direct application of extra payments toward principal, prolonging debt repayment. Overall, mismanagement, lack of transparency, and economic challenges made it difficult for many to successfully repay their student loans.'

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model) which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.
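
For intuition, here is a minimal sketch of the textbook Okapi BM25 scoring formula in pure Python. It assumes whitespace tokenization and the commonly quoted defaults `k1=1.5` and `b=0.75`; the library's implementation may differ in details such as the IDF variant and tokenizer. The toy documents are invented, and one of them deliberately contains an ID in its text.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in docs against query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        length_norm = 1 - b + b * len(tokens) / avg_len
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue  # terms absent from the document contribute nothing
            df = doc_freq[term]
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            tf = term_freq[term]
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores

# Invented documents; the third one contains the ID as literal text.
toy_docs = [
    "payment was applied to the wrong loan",
    "servicer never answered my calls",
    "complaint 12686613 about a late payment fee",
]
toy_scores = bm25_scores("complaint 12686613", toy_docs)
best_index = max(range(len(toy_docs)), key=toy_scores.__getitem__)
print(best_index)  # 2
```

Because the score is built only from literal term matches, a document that shares no words with the query scores exactly zero, however related its meaning is.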

This retriever is very straightforward to set up! Let's see it happen down below!


In [13]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(loan_complaint_data)

We'll construct the same chain - only changing the retriever.

In [14]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [15]:
bm25_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with loans appears to be problems related to the handling and servicing of loans by lenders or servicers. Specific issues include:\n\n- Dealing with lenders or servicers, such as disputes over fees, incorrect information about loan balances or terms, and poor communication.\n- Problems with repayment processes, such as difficulty applying additional funds to the principal or paying them off early.\n- Receiving inaccurate or bad information about loans, including issues about the validity of schools attended and the validity of degrees.\n\nOverall, the most prevalent issue seems to center around the quality of service, transparency, and accuracy of information provided by loan servicers.'

In [16]:
bm25_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided data, all the complaints listed mention that the companies responded with a "Closed with explanation" response and are marked as "Yes" for timely response. There is no indication of any complaints that were not handled in a timely manner.'

In [17]:
bm25_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for various reasons, including issues with loan servicing practices, miscommunication or lack of communication from loan companies, errors or problems with payment plans such as forbearances, and technical or administrative failures. Some specific reasons highlighted in the complaints include being steered into incorrect forbearance types and not receiving proper updates about their loan status, automatic payments being unenrolled without notice, delays or failures in responding to requests for deferment or forbearance, and billing errors. These issues often led to late payments, negative impacts on credit scores, or continued bills despite temporary inability to pay.'

It's not clear whether this is better or worse. If only we had a way to test this! (SPOILERS: We do, and the second half of the notebook will cover it.)

### Comparison

In [18]:
naive_retrieval_chain.invoke({"question" : "What is the state of the loan with Complaint ID 12686613?"})["response"].content

'The state of the loan with Complaint ID 12686613 is not explicitly mentioned in the provided context. However, based on the detailed complaints and responses, this specific Complaint ID does not appear in the provided sections. The context includes multiple complaints with various IDs, but not 12686613.\n\nTherefore, I do not know the state of the loan with Complaint ID 12686613.'

In [19]:
bm25_retrieval_chain.invoke({"question" : "What is the state of the loan with Complaint ID 12686613?"})["response"].content

'The context provided does not include specific information about the current status or "state" of the loan associated with Complaint ID 12686613. Therefore, I do not know the exact state of that loan.'

<div style="background-color: #204B8E; color: white; padding: 10px; border-radius: 5px;">

#### ❓ Question #1:

Give an example query where BM25 is better than embeddings and justify your answer.

</div>

<div style="background-color: #204B8E; color: white; padding: 10px; border-radius: 5px;">

### Answer:

- "What is the state of the loan with Complaint ID 12686613?"
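- When a query depends on exact keywords or identifiers, BM25 is usually the better fit because it scores literal term matches, while embeddings can blur rare tokens such as ID numbers. Note that in the runs above neither chain actually returned the loan's state, likely because the Complaint ID lives in the document metadata rather than in `page_content`, so neither retriever could match on it.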

</div>

## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [20]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-v3.5")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)
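
The two-stage idea (retrieve broadly, then keep only the best few) can be sketched without any API calls. Everything here is a toy under stated assumptions: `first_stage` is a crude word-overlap filter standing in for the base retriever, and `toy_rerank_score` is a hypothetical scoring function standing in for the Cohere Rerank model, which works very differently internally.

```python
def first_stage(query, docs, k):
    """Cheap recall step: keep up to k docs sharing any word with the query."""
    query_words = set(query.lower().split())
    hits = [doc for doc in docs if query_words & set(doc.lower().split())]
    return hits[:k]

def toy_rerank_score(query, doc):
    """Hypothetical relevance score: fraction of query words found in the doc."""
    query_words = set(query.lower().split())
    return len(query_words & set(doc.lower().split())) / len(query_words)

def retrieve_then_rerank(query, docs, k, top_n):
    """Fetch k candidates, then keep the top_n highest-scoring ones."""
    candidates = first_stage(query, docs, k)
    reranked = sorted(
        candidates,
        key=lambda doc: toy_rerank_score(query, doc),
        reverse=True,
    )
    return reranked[:top_n]

toy_docs = [
    "my payment was late",
    "late payment fee charged twice",
    "the servicer lost my paperwork",
    "fee dispute ignored",
]
kept = retrieve_then_rerank("late payment fee", toy_docs, k=3, top_n=1)
print(kept)  # ['late payment fee charged twice']
```

The first stage favors recall and the second favors precision, which is why the base retriever is asked for more documents than the chain finally uses.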

Let's create our chain again, and see how this does!

In [21]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [22]:
contextual_compression_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with loans appears to be dealing with the lender or servicer, specifically regarding errors or misconduct. This includes issues such as errors in loan balances, misapplied payments, wrongful denials of payment plans, lack of communication, incorrect or incomplete information, and mishandling of loan data. Many complaints also highlight disputes over account details, unauthorized transfers, privacy violations, and inadequate documentation.'

In [23]:
contextual_compression_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided context, it appears that there are complaints where respondents have acknowledged delays or unresolved issues, such as the complaint involving the nearly 18-month delay in response regarding loan account review and violations of FERPA, as well as delays in resolving payment application issues. Additionally, the complaint about the main issue not being addressed and the ongoing issue over 2-3 weeks suggests some complaints may not have been handled promptly.\n\nHowever, there is also an indication that responses from the companies were marked as "Timely" and "Closed with explanation," which suggests that those particular complaints were addressed within a reasonable timeframe.\n\nIn summary, yes, some complaints did not get handled in a timely manner, specifically the case involving the 18-month delay and the unresolved issues over multiple months.'

In [24]:
contextual_compression_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily due to a lack of clear communication, misinformation, and the complexities of loan handling processes. Many borrowers were unaware of their repayment obligations, did not receive proper notifications about payment requirements, or faced difficulties with their loan servicers in accessing accurate account information. Additionally, the options provided, such as forbearance or deferment, often led to accruing interest, which increased the total amount owed over time. This situation was compounded by problems like unexplained interest accumulation, mismatched account balances, and the inability to effectively manage or understand their loan terms, all of which made repayment financially burdensome and, in some cases, seemingly impossible.'

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` number of new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context
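
The three steps above can be sketched in plain Python. The helpers are illustrative assumptions only: `toy_reformulate` returns hard-coded rewrites where the real retriever would ask an LLM, and `toy_search` matches on shared words where the real retriever would run a vector search.

```python
def toy_reformulate(question):
    """Hypothetical stand-in for the LLM that writes alternative queries."""
    return [
        question,
        "reasons borrowers defaulted",
        "problems making loan payments",
    ]

def toy_search(query, corpus):
    """Return IDs of documents sharing at least one word with the query."""
    words = set(query.lower().split())
    return [
        doc_id
        for doc_id, text in corpus.items()
        if words & set(text.lower().split())
    ]

def multi_query_retrieve(question, corpus):
    """Run every reformulation and merge the results without duplicates."""
    seen, merged = set(), []
    for query in toy_reformulate(question):
        for doc_id in toy_search(query, corpus):
            if doc_id not in seen:  # keep only the first copy of each document
                seen.add(doc_id)
                merged.append(doc_id)
    return merged

toy_corpus = {
    "a": "borrowers could not pay after losing work",
    "b": "borrowers defaulted when interest ballooned",
    "c": "autopay problems caused missed payments",
}
single = toy_search("why did people fail to pay", toy_corpus)
merged = multi_query_retrieve("why did people fail to pay", toy_corpus)
print(single, merged)  # ['a'] ['a', 'b', 'c']
```

The original wording alone finds one document, while the merged result covers all three; that wider coverage is the recall gain this strategy aims for.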

So, how hard is it to set up? Not bad! Let's see it down below!



In [25]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)

In [26]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [27]:
multi_query_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided complaints and data, the most common issues with student loans appear to be:\n\n1. **Mismanagement and errors by loan servicers**, including incorrect balances, misapplied payments, and wrongful denials of repayment plans.\n2. **Inaccurate or misleading information about loan balances, interest accrual, and loan status.**\n3. **Problems with loan handling during transfers between servicers, including lack of proper notifications and record discrepancies.**\n4. **Difficulty in making payments, applying extra funds, or paying off loans early due to servicer restrictions or predatory practices.**\n5. **Lack of transparency and poor communication from loan servicers about loan terms, status, and repayment options.**\n6. **Issues with loan forgiveness, cancellation, or discharge, especially related to mismanagement or misinformation.**\n7. **Problems related to loan consolidation, including lack of information, unexpected payment amounts, and improper handling.**\n\nW

In [28]:
multi_query_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints data, yes, some complaints did not get handled in a timely manner. Specifically, there are multiple instances where the "Timely response?" field is marked as "No," indicating delays. For example:\n\n- Complaint ID 12973003 (complaint received 04/14/25) was responded to within the required timeframe ("Yes").\n- However, complaint ID 12739706 (complaint received 04/01/25) was marked as "No," showing it was not handled in time.\n- Complaint ID 12668396 (complaint received 03/26/25) was also marked "No."\n- Complaint ID 13062402 (complaint received 04/18/25) was marked "Yes."\n- Multiple other complaints, such as ID 12654977, indicated delays or failure to respond in a timely manner.\n\nAdditionally, several complaints explicitly mention ongoing delays, unresponsiveness, or failure to resolve issues over extended periods, sometimes exceeding a year or more, which further confirms that not all complaints were handled promptly.\n\nIn summary, yes, there were

In [29]:
multi_query_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans for various reasons, including:\n\n- Mismanagement and errors by loan servicers, such as misapplied payments, errors in loan balances, and wrongful denials of payment plans.\n- Lack of proper communication or notification about their loan status, delinquency, or repayment obligations, sometimes due to outdated contact information.\n- Being misled or not adequately informed about repayment options, such as income-driven repayment plans, rehabilitation, or forgiveness programs, leading to unintentional default or increased balances.\n- Being steered into long-term forbearances or deferments without understanding the long-term consequences, such as interest capitalization and increased total debt.\n- Errors in reporting, including inaccurate account status or delinquency reports to credit bureaus, which can harm credit scores and creditworthiness.\n- Systemic issues like improper handling of loan transfer, unverified or missing loan histories, and mi

<div style="background-color: #204B8E; color: white; padding: 10px; border-radius: 5px;">

#### ❓ Question #2:

Explain how generating multiple reformulations of a user query can improve recall.

</div>

<div style="background-color: #204B8E; color: white; padding: 10px; border-radius: 5px;">

### Answer:

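- Each reformulation phrases the question differently, so each one can retrieve a different set of documents. Consolidating the unique documents across all queries covers more of the relevant material than a single query would, which improves recall.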

</div>

## Task 8: Parent Document Retriever

The Parent Document Retriever follows a "small-to-big" strategy:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent chunks".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
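
Here is a minimal sketch of "search small, return big" in pure Python. It makes several simplifying assumptions: fixed-width character chunks instead of recursive splitting, word overlap instead of embedding similarity, and a plain dict (`parents`) playing the role of the document store. All names and texts are invented for illustration.

```python
def split_into_chunks(text, chunk_size):
    """Fixed-width character chunks (a simplification of recursive splitting)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

# The "docstore": full parent documents keyed by ID.
parents = {
    "p1": "My servicer misapplied payments. Later my credit score dropped sharply.",
    "p2": "The school closed. I applied for a discharge and heard nothing back.",
}

# The "vector store": child chunks, each remembering its parent's ID.
child_index = [
    (parent_id, chunk)
    for parent_id, text in parents.items()
    for chunk in split_into_chunks(text, chunk_size=35)
]

def retrieve_parents(query, k=1):
    """Rank child chunks by word overlap, then return their unique parents."""
    words = set(query.lower().split())
    ranked = sorted(
        child_index,
        key=lambda pair: len(words & set(pair[1].lower().split())),
        reverse=True,
    )
    parent_ids = []
    for parent_id, _chunk in ranked[:k]:
        if parent_id not in parent_ids:  # several children may share a parent
            parent_ids.append(parent_id)
    return [parents[parent_id] for parent_id in parent_ids]

result = retrieve_parents("credit score dropped")
print(result == [parents["p1"]])  # True
```

The match happens on a 35-character chunk, but the caller receives the whole complaint, including the sentence about misapplied payments that the chunk alone would have left out.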

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [30]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = loan_complaint_data
child_splitter = RecursiveCharacterTextSplitter(chunk_size=750)

We'll need to set up a new QDrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension, you'll need to change this if you're using a different embedding model.

In [31]:
from langchain_qdrant import QdrantVectorStore

client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = QdrantVectorStore(
    collection_name="full_documents", embedding=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)

Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [32]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [33]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [34]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [35]:
parent_document_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with loans appears to be related to problems with loan servicing, including errors in loan balances, misapplied payments, wrongful denials of payment plans, and misconduct by loan servicers. Several complaints highlight issues such as incorrect information on credit reports, unnecessary interest rate increases, and difficulties in managing and verifying loan details.'

In [36]:
parent_document_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided context, all the complaints mentioned indicate that responses were not handled in a timely manner. Specifically, the complaints with Complaint IDs 12709087 and 12935889 explicitly state "No" for the prompt response, and the issue described involves delays and lack of communication from the companies. Therefore, yes, some complaints did not get handled in a timely manner.'

In [37]:
parent_document_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans for various reasons, including:\n\n- Lack of proper communication or notification from loan servicers about payment obligations, due dates, or changes in loan management.\n- Financial hardship due to unforeseen circumstances, such as severe financial problems, inability to secure employment, or other personal hardships.\n- Misrepresentation or misinformation about the value and manageability of the education received, leading to unanticipated financial burdens.\n- Issues with the legitimacy or verification of the debt, which can hinder repayment efforts.\n- Long-term consequences of taking out loans, such as increased interest from deferred or forborne payments, or the inability to afford the payments after graduation.\n- Structural issues with loan servicing, including failure to offer payment plan options or to notify borrowers about important changes like buyouts or account transfers.\n\nIn summary, a combination of communication failures, fina

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to more effectively answer the question about the performance.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever simply takes two or more retrievers and combines their retrieved documents based on a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.
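
As a rough illustration of weighted Reciprocal Rank Fusion, here is a pure-Python sketch. It assumes each retriever contributes `weight / (c + rank)` for every document it returns, with `c = 60` as suggested in the RRF paper; treat the exact way the library applies weights as an assumption. The document IDs and rankings are invented.

```python
from collections import defaultdict

def weighted_rrf(rankings, weights, c=60):
    """Fuse ranked lists: a doc earns weight / (c + rank) per list it appears in."""
    fused = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += weight / (c + rank)
    # Highest fused score first.
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical output of two retrievers for the same query.
keyword_ranking = ["d1", "d2", "d3"]
vector_ranking = ["d3", "d1", "d4"]

fused_order = weighted_rrf([keyword_ranking, vector_ranking], weights=[0.5, 0.5])
print(fused_order)  # ['d1', 'd3', 'd2', 'd4']
```

Only ranks are used, never raw scores, so retrievers with incomparable score scales (BM25 scores versus cosine similarities, for example) can be combined directly. Documents that appear in both lists rise above documents that appear in only one.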

In [38]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

We'll pack *all* of these retrievers together in an ensemble.

In [39]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [40]:
ensemble_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided complaints and common issues described, the most common issues with loans, particularly student loans, appear to involve:\n\n- Errors or bad information about the loan balance, interest calculations, or account status (incorrect reporting, false delinquency, misclassified loan types)\n- Problems with how payments are being handled, including inability to apply payments to principal, undue interest capitalization, or misleading payment plans\n- Mismanagement during loan transfers or consolidations, often with lack of transparency or improper ending of deferments\n- Inaccurate or unfair reporting on credit reports, leading to credit score drops\n- Unauthorized or improper changes to loan terms, interest rates, or loan classification without proper notice or consent\n- Denial of eligible deferments or forgiveness, or mishandling of applications like PSLF or IDR\n- Deceptive practices related to promotional offers, or improper collection efforts\n\nWhile specific dat

In [41]:
ensemble_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, yes, there are multiple instances indicating complaints that were not handled in a timely manner. For example:\n\n- One complaint (Complaint ID: 12709087) was marked as "No" for timely response, with the response being "Closed with explanation."\n- Another complaint (Complaint ID: 12739706) was marked as "No" for timely response.\n- Several complaints, such as IDs 12935889, 13056764, 13365901, and others, received responses that were either delayed or marked "Closed with explanation," indicating delays or unresolved issues.\n\nOverall, evidence in the complaints suggests that some complaints did not get addressed promptly, with at least a few explicitly marked as "No" for timely response or showing significant delays, indicating that they were not handled in a timely manner.'

In [42]:
ensemble_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans primarily because they experienced financial hardships and lacked adequate guidance or options to manage their repayment effectively. Many borrowers were steered into forbearance or deferment without being informed about the long-term consequences like accumulated interest, which often made loans more difficult to pay off over time. Some faced issues due to mismanagement, lack of proper communication, or incorrect information from loan servicers, which led to missed payments, damage to their credit scores, and difficulty obtaining new loans or housing. Overall, systemic issues such as inadequate support, misleading practices, miscommunication, and unforeseen economic hardships contributed to borrowers' inability to repay their loans."

## Task 10: Semantic Chunking

Semantic chunking is not a retrieval method, but it *is* an effective way to improve retrieval performance on corpora that have clean semantic breaks.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
  - `percentile`
  - `standard_deviation`
  - `interquartile`
  - `gradient`
3. Keeping each sequence of related sentences as a document!

Let's see how to implement this!

We'll use the `percentile` thresholding method for this example, which will:

Calculate the distance between each pair of consecutive sentences, then split the sequence wherever a distance exceeds a given percentile of all distances.
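
To make the percentile idea concrete, here is a minimal, standard-library-only sketch. It is a simplified illustration, not the `SemanticChunker` implementation: the two-dimensional "embeddings", the helper names, and the `pct=95.0` value are all assumptions chosen for the example.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def percentile(values, pct):
    """Percentile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower, upper = math.floor(rank), math.ceil(rank)
    weight = rank - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def percentile_chunks(sentences, vectors, pct=95.0):
    """Group consecutive sentences, breaking where the distance is unusually large."""
    # Distance between each sentence and the one that follows it.
    distances = [
        cosine_distance(vectors[i], vectors[i + 1])
        for i in range(len(vectors) - 1)
    ]
    threshold = percentile(distances, pct)

    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            # Large semantic jump: close the current chunk and start a new one.
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks


# Toy data: hand-made vectors standing in for real sentence embeddings.
toy_sentences = [
    "My loan balance is wrong.",
    "The servicer misreported my balance.",
    "I was never told about forbearance interest.",
    "Interest kept accruing during forbearance.",
]
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]

toy_chunks = percentile_chunks(toy_sentences, toy_vectors)
```

With these toy vectors the only large distance falls between the second and third sentences, so the balance complaints and the forbearance complaints end up in separate chunks.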

In [43]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)

Now we can split our documents.

In [45]:
semantic_documents = semantic_chunker.split_documents(loan_complaint_data[:20])

Let's create a new vector store.

In [46]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="Loan_Complaint_Data_Semantic_Chunks"
)

We'll use naive retrieval for this example.

In [47]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [48]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [49]:
semantic_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the complaints provided, appears to be problems related to the handling and servicing of student loans. These include issues such as:\n\n- Difficulty with loan repayment and problems with forgiveness or discharge processes.\n- Improper or illegal credit reporting and collection efforts.\n- Lack of transparency and communication from loan servicers.\n- Errors in loan account status, default misinformation, or incorrect billing.\n- Problems following legal changes or government actions affecting loans.\n- Issues with data breaches and unauthorized access to personal information.\n- Errors in payment processing and auto-debit setups.\n- Disputes over loan balances, interest, and repayment plans.\n\nOverall, a key recurring theme is the mismanagement or mishandling by loan servicers, leading to borrower frustration, errors in reports, and legal concerns.'

In [50]:
semantic_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, it appears that many complaints have been responded to with a response marked as "Closed with explanation," indicating that although responses were given, the handling may not have been timely or satisfactory in resolving the issues fully. Specifically, the complaint regarding Nelnet (row 17) notes that despite multiple certified mail notices, there was no response to the written complaints, yet the response to the complaint was "Closed with explanation," suggesting some form of acknowledgment occurred.\n\nHowever, since the question asks whether any complaints did not get handled in a timely manner, the most direct evidence from the data shows that several complaints received timely responses (all marked as "Yes" for Timely response). Nonetheless, the complaint about Nelnet\'s failure to respond to formal complaints despite acknowledgment and multiple notices hints at a failure in handling in a timely or effective manner.\n\n**Summary:**  \nYes, ther

In [51]:
semantic_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans for various reasons, including:\n\n- Lack of proper communication and transparency from lenders or servicers, leading to confusion and stress.\n- Difficulties with providing or submitting required documentation, causing delays and stalled progress in loan forgiveness or discharge processes.\n- Errors or discrepancies in loan account information, such as incorrect account status or missing payments, which can result in default or negative credit reports.\n- Problems with payment processing, including payments not posting correctly despite being made, leading to unintentional defaults.\n- Legal or administrative issues, such as loans being reported as in default without proper verification, or disputes over the legitimacy of the debt, sometimes due to legal violations or data breaches.\n- Ultimately, a combination of mismanagement, inadequate support, and legal complications can hinder borrowers' ability to repay or resolve their loans successfully.

<div style="background-color: #204B8E; color: white; padding: 10px; border-radius: 5px;">

#### ❓ Question #3:

If sentences are short and highly repetitive (e.g., FAQs), how might semantic chunking behave, and how would you adjust the algorithm?

</div>

<div style="background-color: #204B8E; color: white; padding: 10px; border-radius: 5px;">

### Answer:

If the sentences are short and highly repetitive, the cosine similarity between their embeddings may be uniformly high, leaving few meaningful breakpoints. I would adjust the algorithm to use structure-aware chunking or a hybrid approach.

</div>
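
As one possible reading of "structure-aware chunking" for FAQ-style text, the sketch below keeps each question together with its answer instead of relying on embedding distances. The `Q:`/`A:` line markers, the `split_faq` helper, and the sample text are assumptions made for illustration; real FAQ data may need a different delimiter.

```python
import re


def split_faq(text):
    """Split FAQ text into one chunk per question/answer pair."""
    # Break immediately before every line that starts with "Q:" (assumed marker).
    parts = re.split(r"(?m)^(?=Q:)", text)
    return [part.strip() for part in parts if part.strip()]


# Hypothetical FAQ text used only for this example.
faq_text = """Q: How do I defer my loan?
A: Submit a deferment request to your servicer.
Q: How do I dispute my balance?
A: Send a written dispute to your servicer.
"""

faq_chunks = split_faq(faq_text)
```

A hybrid approach could then apply semantic chunking only inside answers that are long enough to contain more than one topic.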

# 🤝 Breakout Room Part #2

<div style="background-color: #204B8E; color: white; padding: 10px; border-radius: 5px;">

### 🏗️ Activity #1

Your task is to evaluate the various Retriever methods against each other. 
You can use the loans or bills dataset.

You are expected to:

1. Create a "golden dataset"
   - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever specific* Ragas metrics
   - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

</div>

##### HINTS:

- LangSmith provides detailed information about latency and cost.
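
If you also want a rough local latency number alongside LangSmith, a small timing helper is one option. This is only a sketch under stated assumptions: `measure_latency` and `fake_retriever` are hypothetical names, and the stand-in retriever exists solely so the example runs on its own.

```python
import statistics
import time


def measure_latency(retrieve, questions, repeats=3):
    """Time a retrieval callable over several questions and summarize the results."""
    timings = []
    for question in questions:
        for _ in range(repeats):
            start = time.perf_counter()
            retrieve(question)
            timings.append(time.perf_counter() - start)
    return {
        "calls": len(timings),
        "mean_s": statistics.mean(timings),
        "max_s": max(timings),
    }


def fake_retriever(question):
    # Hypothetical stand-in for a real retriever; returns a dummy result.
    return [f"document about {question}"]


latency_report = measure_latency(
    fake_retriever,
    ["What is the most common issue with loans?"],
)
```

In the notebook, the same helper could presumably be pointed at a real retriever, for example `measure_latency(semantic_retriever.invoke, questions)`, where `questions` is your own list of test questions. Wall-clock timings like these include network variance, so treat them as a rough comparison only.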

<div style="background-color: #204B8E; color: white; padding: 10px; border-radius: 5px;">

### Analysis & Observations:

</div>