# How to reorder retrieved results to mitigate the "lost in the middle" effect

To address the "lost in the middle" effect in Retrieval-Augmented Generation (RAG) applications, you can reorder the retrieved documents before passing them to the model. The effect occurs when a model misses important information because it sits in the middle of a long context. This can reduce performance, especially when many documents are retrieved (e.g., more than ten).

# A simplified explanation of the solution

* Relevance Sorting Issue: Documents retrieved from a vector store (a database of embedded documents) are typically ordered by relevance to the query, most relevant first and least relevant last. When this list is passed to an LLM as context, information in the middle of the list risks being overlooked.

* Reordering for Better Context: To help the LLM pick up the most important information, you can rearrange the documents so that the most relevant ones sit at the beginning and the end, while the less relevant ones go in the middle. This way, key information is more likely to be noticed.
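The reordering idea can be sketched in a few lines of plain Python. This is only an illustrative sketch, not a reference implementation: the function name `reorder_for_long_context` and the placeholder strings are hypothetical, and the input is assumed to be sorted from most to least relevant.

```python
def reorder_for_long_context(docs_by_relevance):
    """Put the most relevant items at both ends and the least relevant in the middle.

    Assumes `docs_by_relevance` is sorted from most to least relevant.
    """
    reordered = []
    # Walk from least to most relevant, alternating between the front and the
    # back of the result, so the items added last (the most relevant ones)
    # end up at the two extremes.
    for i, doc in enumerate(reversed(docs_by_relevance)):
        if i % 2 == 1:
            reordered.append(doc)
        else:
            reordered.insert(0, doc)
    return reordered


# Placeholder items, named by their relevance rank (rank1 = most relevant).
ranked = ["rank1", "rank2", "rank3", "rank4", "rank5"]
print(reorder_for_long_context(ranked))
# ['rank1', 'rank3', 'rank5', 'rank4', 'rank2']
```

In this example the two most relevant items (`rank1`, `rank2`) land at the start and the end, and the least relevant item (`rank5`) lands in the middle.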

# Implementation Example

* You start by embedding a set of documents using a tool like OpenAI embeddings.
* These documents are stored in a vector store that lets you retrieve them by relevance to a query.
* After retrieval, you use a tool called LongContextReorder to rearrange the documents, positioning the most relevant ones at the extremes (beginning and end).

# 1. Embed Documents: Step-by-Step Code Explanation

You create a set of documents (e.g., some sentences about basketball and the Boston Celtics). These documents are converted into embeddings (numerical vectors) using OpenAI's embedding tool.

* InMemoryVectorStore: A vector store is a database that stores vectors, which are numerical representations of text. InMemoryVectorStore is a type of vector store that keeps all the stored vectors in memory, making it fast to use for testing or small-scale tasks.

* OpenAIEmbeddings: This is a tool that generates embeddings (numerical representations) of text using an OpenAI model. These embeddings allow you to measure how similar different pieces of text are by comparing their vectors.

* Here, embeddings is an instance of the OpenAIEmbeddings class, which will be used to generate vector representations of the text data.

* The texts list contains a collection of sentences. Each sentence will be converted into a numerical vector using OpenAIEmbeddings. These vectors represent the meaning of each sentence in a way that can be compared mathematically.

* InMemoryVectorStore.from_texts: This call creates an in-memory vector store from the given list of texts. Each text is embedded using OpenAIEmbeddings, and the resulting vectors are stored in memory.
  * from_texts: A class method that takes a list of texts and an embedding model and builds the vector store.
  * embedding=embeddings: Specifies the embedding model (in this case, OpenAIEmbeddings) used to convert the texts into vectors.
* .as_retriever(): Converts the vector store into a retriever. A retriever is responsible for finding the most relevant documents (texts) based on a similarity search.
  * search_kwargs={"k": 10}: Asks the retriever to return up to 10 of the most relevant documents for a given query. If the store holds fewer than 10 documents, all of them are returned.

* query: This is the user's question or search input. In this case, the query asks for information about the "Celtics."
* retriever.invoke(query): This method takes the query and:
  * Converts the query into an embedding using the same OpenAIEmbeddings model, resulting in a vector that represents the query.
  * Compares the query vector with the vectors of all the stored documents using a similarity metric (like cosine similarity).
  * Returns the top k (in this case, up to 10) documents that are closest to the query vector in terms of similarity.

* Output (docs): docs is a list of up to 10 Document objects built from the texts list, sorted by their relevance to the query, "What can you tell me about the Celtics?" Each Document carries its text in page_content along with a metadata dictionary. A relevance score is not necessarily included in what the retriever returns.
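To make the similarity search described above more concrete, here is a minimal pure-Python sketch of ranking stored vectors by cosine similarity and keeping the top k. It is not how any particular vector store is implemented. The 3-dimensional vectors, the third sentence, and the helper names `cosine_similarity` and `top_k` are made up for illustration, since real embeddings have many more dimensions and come from the embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, stored, k):
    """Return up to k (text, score) pairs, most similar first."""
    scored = [(text, cosine_similarity(query_vector, vec)) for text, vec in stored]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings", invented for this example only.
stored_vectors = [
    ("Basketball is a great sport.", [0.9, 0.1, 0.0]),
    ("Larry Bird was an iconic NBA player.", [0.6, 0.8, 0.0]),
    ("I enjoy cooking pasta.", [0.0, 0.1, 0.9]),
]
query_vector = [0.5, 0.9, 0.0]

results = top_k(query_vector, stored_vectors, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

Note that asking for more results than there are stored items (for example `k=10` here) simply returns everything, ranked by similarity.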

```python
# InMemoryVectorStore is a vector store that keeps all vectors in memory.
# OpenAIEmbeddings is used to generate vector embeddings of text.
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import OpenAIEmbeddings

# Initialize the embeddings model using OpenAI.
# This model converts each text into a numerical vector representation.
embeddings = OpenAIEmbeddings()

# Create a list of texts/documents that we want to store and retrieve.
texts = [
    # A generic statement about basketball.
    "Basketball is a great sport.",
    # A specific statement about Larry Bird, a famous NBA player.
    "Larry Bird was an iconic NBA player.",
]

# Create an in-memory vector store from the list of texts:
# - Converts each text into a vector using OpenAIEmbeddings.
# - Stores these vectors in memory for quick access.
# - The vector store is then turned into a retriever that can find documents.
retriever = InMemoryVectorStore.from_texts(texts, embedding=embeddings).as_retriever(
    search_kwargs={"k": 10}  # Retrieve up to 10 of the most relevant documents.
)

# Define the query for the search:
# - This is the question or input we're interested in.
# - We're asking about "the Celtics," a famous basketball team.
query = "What can you tell me about the Celtics?"

# Use the retriever to find documents relevant to the query:
# - The query is also converted to a vector using OpenAIEmbeddings.
# - The retriever compares the query vector with the stored vectors.
# - It returns up to 10 documents, most similar (relevant) first.
docs = retriever.invoke(query)

# At this point, 'docs' contains documents built from the 'texts' list,
# sorted by their relevance to the query about the Celtics.
```


# 2. Document Retrieval
The retriever returns documents sorted by relevance to your query (in this case, about the Celtics).

Before Reordering: Most relevant documents appear first.

# 3. Reorder Documents

Use LongContextReorder to rearrange the documents so that the most relevant documents are at the start and end.

```python
# Import the LongContextReorder transformer from langchain_community:
# This tool reorders retrieved documents to mitigate the "lost in the middle" effect.
# It moves the most relevant documents to the beginning and end of the context.
from langchain_community.document_transformers import LongContextReorder

# Initialize the LongContextReorder transformer.
# This will be used to reorder the retrieved documents.
reordering = LongContextReorder()

# Apply the reordering to the list of retrieved documents (docs):
# - The transformer takes the list of documents and reorders them.
# - The most relevant documents are moved to the beginning and end of the list.
# - This reordering improves the chances of LLMs capturing key information
#   because they often focus more on the start and end of the context.
reordered_docs = reordering.transform_documents(docs)

# After this step, 'reordered_docs' contains the documents in adjusted order.
# More relevant documents are positioned at the extremes (start and end),
# while less relevant ones are placed in the middle.
```


After Reordering: The key documents are now positioned at the extremes (first and last), while less important ones are in the middle.

# 4. Use in a QA Chain

The reordered documents are then passed to a simple question-answering (QA) chain.
A prompt template is used to format the documents and query for the language model.

```python
# Import the classes needed for combining documents and handling prompts:
# - `create_stuff_documents_chain` creates a chain that takes documents and generates a response.
# - `PromptTemplate` formats prompts with custom input.
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# Initialize the language model (LLM) using the OpenAI GPT-4o-mini model.
# This model generates responses based on the provided context.
llm = ChatOpenAI(model="gpt-4o-mini")

# Define a template for the prompt:
# - This is the format in which the question is put to the LLM.
# - It includes placeholders `{context}` for the documents' content
#   and `{query}` for the user's question.
prompt_template = """
Given these texts:
-----
{context}
-----
Please answer the following question:
{query}
"""

# Initialize the PromptTemplate:
# - `template` contains the prompt text with placeholders.
# - `input_variables` specifies which variables will be filled in (`context` and `query`).
prompt = PromptTemplate(
    template=prompt_template,
    input_variables=["context", "query"],
)

# Create a question-answering (QA) chain:
# - The chain uses the LLM to generate an answer based on the provided documents.
# - The documents are passed in through `context`, and the question through `query`.
chain = create_stuff_documents_chain(llm, prompt)

# Run the QA chain:
# - Invoke the chain with a dictionary containing the `context` (reordered documents)
#   and the `query` (the question we're asking about the Celtics).
response = chain.invoke({"context": reordered_docs, "query": query})

# Print the response generated by the LLM based on the context and query.
print(response)
```


The Boston Celtics are a professional basketball team based in Boston, Massachusetts. They are one of the most successful teams in NBA history, known for their rich legacy and numerous championships. The Celtics were founded in 1946 and are a part of the Eastern Conference. Larry Bird, mentioned in your texts, is one of the most famous players to have ever played for the Celtics, contributing significantly to their success during the 1980s and helping them win three NBA championships. The team is also recognized for its strong fan base and its rivalry with teams like the Los Angeles Lakers.
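For intuition about why document order matters in this step, here is a rough pure-Python sketch of what "stuffing" documents into a prompt amounts to: the document texts are joined in list order and dropped into the `{context}` placeholder. The helper `stuff_documents` and the blank-line separator are assumptions made for illustration, not the chain's actual internals.

```python
prompt_template = """
Given these texts:
-----
{context}
-----
Please answer the following question:
{query}
"""


def stuff_documents(texts, query, separator="\n\n"):
    """Join the document texts in the given order and fill the prompt template."""
    context = separator.join(texts)
    return prompt_template.format(context=context, query=query)


# Texts in the order they would be handed to the chain (assumed already reordered).
reordered_texts = [
    "Larry Bird was an iconic NBA player.",
    "Basketball is a great sport.",
]
final_prompt = stuff_documents(reordered_texts, "What can you tell me about the Celtics?")
print(final_prompt)
```

Because the texts appear in the prompt in exactly the order of the list, whatever order the reordering step produces is the order the model reads.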


# Key Takeaways
* "Lost in the Middle" Effect: LLMs might overlook documents in the middle of a long context, so placing important ones at the extremes helps.
* Reordering Strategy: Use tools like LongContextReorder to rearrange retrieved documents, keeping relevant info at the beginning and end.
* Practical Benefit: This can improve the model's ability to pick up key details, enhancing performance in RAG applications.