# Long Context Reorder

- Author: [Minji](https://github.com/r14minji)
- Design: 
- Peer Review: 
- This is a part of [LangChain OpenTutorial](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial)

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/02-Prompt/02-FewShotPromptTemplate.ipynb) [![Open in GitHub](https://img.shields.io/badge/Open%20in%20GitHub-181717?style=flat-square&logo=github&logoColor=white)](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/02-Prompt/02-FewShotPromptTemplate.ipynb)


## Overview

Regardless of a model's architecture, performance degrades substantially when the context includes 10 or more retrieved documents.

Put simply, when the relevant information sits in the middle of a long context, the model tends to overlook it.

For more details, see the paper *Lost in the Middle: How Language Models Use Long Contexts*:

- https://arxiv.org/abs/2307.03172

To mitigate this, reorder the documents after retrieval so that the most relevant ones appear at the beginning and end of the context and the least relevant ones in the middle.

Create a retriever that stores and searches text data with the Chroma vector store.
Use the retriever's `invoke` method to retrieve the documents most relevant to a given query.
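A retriever returns the `k` best matches ordered from most to least relevant, and that ordering is what the reordering step relies on. As a rough illustration only, the sketch below ranks texts by word overlap with the query. This is a swapped-in toy scoring rule (Chroma actually compares embedding vectors), and `toy_retrieve` is a hypothetical helper, not part of LangChain or Chroma.

```python
import re


def tokenize(text: str) -> set[str]:
    # Lowercase the text and split it into alphanumeric words
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_retrieve(query: str, texts: list[str], k: int = 4) -> list[str]:
    """Return the k texts sharing the most words with the query,
    sorted from most to least relevant (ties keep their original order)."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(t)), i, t) for i, t in enumerate(texts)]
    # Sort by descending overlap, then by original position
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [t for _, _, t in scored[:k]]


texts = [
    "This is just a random text I wrote.",
    "ChatGPT was developed by OpenAI and is continuously being improved.",
    "Bitcoin is also called digital gold.",
    "ChatGPT can be used to solve complex problems.",
]
print(toy_retrieve("What can you tell me about ChatGPT?", texts, k=2))
```

The second text shares only "chatgpt" with the query, while the fourth also shares "can", so the fourth is ranked first.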


### Table of Contents

- [Overview](#overview)
- [Environment Setup](#environment-setup)
- [Create an instance of the LongContextReorder class named reordering](#create-an-instance-of-the-longcontextreorder-class-named-reordering)
- [Creating Question-Answering Chain with Context Reordering](#creating-question-answering-chain-with-context-reordering)


---


## Environment Setup

```python
# Configuration file for managing API keys as environment variables
from dotenv import load_dotenv

# Load API key information
load_dotenv(override=True)
```

```
True
```

```python
from langchain_opentutorial import package

package.install(
    [
        "langsmith",
        "langchain",
        "langchain_openai",
        "langchain_community",
        "langchain-chroma",
    ],
    verbose=False,
    upgrade=False,
)
```

```python
from langchain_opentutorial import set_env

# Replace the empty strings with your own API keys
set_env(
    {
        "OPENAI_API_KEY": "",
        "LANGCHAIN_API_KEY": "",
        "LANGCHAIN_TRACING_V2": "true",
        "LANGCHAIN_ENDPOINT": "https://api.smith.langchain.com",
        "LANGCHAIN_PROJECT": "04-LongContextReorder",
    }
)
```

```
Environment variables have been set successfully.
```


## Create an instance of the LongContextReorder class named reordering

### Create a retriever and search with a query

```python
from langchain_chroma import Chroma
from langchain_community.document_transformers import LongContextReorder
from langchain_openai import OpenAIEmbeddings

# Embedding model used to vectorize the texts
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

texts = [
    "This is just a random text I wrote.",
    "ChatGPT, an AI designed to converse with users, can answer various questions.",
    "iPhone, iPad, MacBook are representative products released by Apple.",
    "ChatGPT was developed by OpenAI and is continuously being improved.",
    "ChatGPT has learned from vast amounts of data to understand user questions and generate appropriate answers.",
    "Wearable devices like Apple Watch and AirPods are also part of Apple's popular product line.",
    "ChatGPT can be used to solve complex problems or suggest creative ideas.",
    "Bitcoin is also called digital gold and is gaining popularity as a store of value.",
    "ChatGPT's capabilities are continuously evolving through ongoing learning and updates.",
    "The FIFA World Cup is held every four years and is the biggest event in international football.",
]

# Create a retriever that returns the top 10 matches (k=10)
retriever = Chroma.from_texts(texts, embedding=embeddings).as_retriever(
    search_kwargs={"k": 10}
)
```

```python
query = "What can you tell me about ChatGPT?"

# Retrieve relevant documents, sorted by relevance score
docs = retriever.invoke(query)
docs
```

```
[Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),
 Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),
 Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),
 Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),
 Document(metadata={}, page_content='ChatGPT can be used to solve complex problems or suggest creative ideas.'),
 Document(metadata={}, page_content='ChatGPT can be used to solve complex problems or suggest creative ideas.'),
 Document(metadata={}, page_content='ChatGPT has learned from vast amounts of data to understand user questions and generate appropriate answers.'),
 Document(metadata={}, page_content='ChatGPT has learned from vast amounts of data to understand user questions and generate appropriate answers.'),
 Document(metadata={}, p
```


### Create an instance of the `LongContextReorder` class

- Call `reordering.transform_documents(docs)` to reorder the document list.
- The input is assumed to be sorted from most to least relevant, which is how the retriever returns it. After reordering, the most relevant documents are at the beginning and end of the list, and the less relevant ones are in the middle.
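The reordering rule can be sketched in a few lines of plain Python. This is a minimal re-implementation for illustration, assuming the input is already sorted from most to least relevant; `litm_reorder` is a hypothetical name, not a LangChain function. It walks the list from least to most relevant and alternates between the front and the back, so the most relevant items are placed last, at the two edges.

```python
def litm_reorder(items: list) -> list:
    """Put the most relevant items at both ends and the least relevant in
    the middle. Assumes `items` is sorted from most to least relevant."""
    result: list = []
    # Iterate from least to most relevant, alternating front/back placement
    for i, item in enumerate(reversed(items)):
        if i % 2 == 1:
            result.append(item)
        else:
            result.insert(0, item)
    return result


# Integers stand in for documents; 0 is the most relevant
print(litm_reorder(list(range(10))))
# → [1, 3, 5, 7, 9, 8, 6, 4, 2, 0]
```

With ten inputs, ranks 0 and 1 land at the two ends and ranks 8 and 9 end up in the middle. The input list is not modified because `reversed` only creates an iterator over it.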


```python
# Reorder the documents:
# less relevant documents go in the middle, more relevant ones at the start/end
reordering = LongContextReorder()
reordered_docs = reordering.transform_documents(docs)

# Verify that the most relevant documents are positioned at the start and end
reordered_docs
```

```
[Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),
 Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),
 Document(metadata={}, page_content='ChatGPT can be used to solve complex problems or suggest creative ideas.'),
 Document(metadata={}, page_content='ChatGPT has learned from vast amounts of data to understand user questions and generate appropriate answers.'),
 Document(metadata={}, page_content="ChatGPT's capabilities are continuously evolving through ongoing learning and updates."),
 Document(metadata={}, page_content="ChatGPT's capabilities are continuously evolving through ongoing learning and updates."),
 Document(metadata={}, page_content='ChatGPT has learned from vast amounts of data to understand user questions and generate appropriate answers.'),
 Document(metadata={}, page_content='ChatGPT can be used to solve complex problems or suggest creative ideas.
```

## Creating Question-Answering Chain with Context Reordering

```python
def format_docs(docs):
    # Join the page contents of the documents, one per line
    return "\n".join(doc.page_content for doc in docs)
```

```python
print(format_docs(docs))
```

```
ChatGPT was developed by OpenAI and is continuously being improved.
ChatGPT was developed by OpenAI and is continuously being improved.
ChatGPT, an AI designed to converse with users, can answer various questions.
ChatGPT, an AI designed to converse with users, can answer various questions.
ChatGPT can be used to solve complex problems or suggest creative ideas.
ChatGPT can be used to solve complex problems or suggest creative ideas.
ChatGPT has learned from vast amounts of data to understand user questions and generate appropriate answers.
ChatGPT has learned from vast amounts of data to understand user questions and generate appropriate answers.
ChatGPT's capabilities are continuously evolving through ongoing learning and updates.
ChatGPT's capabilities are continuously evolving through ongoing learning and updates.
```
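Each retrieved text appears twice in the output above, which means the same texts were added to the vector store more than once (most likely because the `Chroma.from_texts` cell was run again in the same session). Duplicates only lengthen the context, so one option is to drop them before reordering while keeping the relevance order. The sketch below is a minimal, order-preserving version; `dedupe` is a hypothetical helper, and the optional `key` lets it compare LangChain documents by content.

```python
from collections.abc import Callable, Hashable
from typing import TypeVar

T = TypeVar("T")


def dedupe(items: list[T], key: Callable[[T], Hashable] = lambda item: item) -> list[T]:
    """Drop repeated items, keeping the first (highest-ranked) occurrence."""
    seen: set = set()
    unique: list[T] = []
    for item in items:
        marker = key(item)
        if marker not in seen:
            seen.add(marker)
            unique.append(item)
    return unique


ranked = [
    "ChatGPT was developed by OpenAI and is continuously being improved.",
    "ChatGPT was developed by OpenAI and is continuously being improved.",
    "ChatGPT can be used to solve complex problems or suggest creative ideas.",
    "ChatGPT can be used to solve complex problems or suggest creative ideas.",
]
print(dedupe(ranked))
```

For retrieved documents, a call such as `dedupe(docs, key=lambda doc: doc.page_content)` would keep one copy of each text in its original rank.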


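```python
def format_docs(docs):
    # Number each document and append a fixed source label
    return "\n".join(
        [
            f"[{i}] {doc.page_content} [source: teddylee777@gmail.com]"
            for i, doc in enumerate(docs)
        ]
    )


def reorder_documents(docs):
    # Move the most relevant documents to the start and end of the list
    reordering = LongContextReorder()
    reordered_docs = reordering.transform_documents(docs)
    # Join the reordered documents into a single context string
    combined = format_docs(reordered_docs)
    print(combined)  # Show the reordered context for inspection
    return combined
```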

Prints the reordered documents.

```python
# Print the reordered, formatted documents (the return value is discarded)
_ = reorder_documents(docs)
```

```
[0] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]
[1] ChatGPT, an AI designed to converse with users, can answer various questions. [source: teddylee777@gmail.com]
[2] ChatGPT can be used to solve complex problems or suggest creative ideas. [source: teddylee777@gmail.com]
[3] ChatGPT has learned from vast amounts of data to understand user questions and generate appropriate answers. [source: teddylee777@gmail.com]
[4] ChatGPT's capabilities are continuously evolving through ongoing learning and updates. [source: teddylee777@gmail.com]
[5] ChatGPT's capabilities are continuously evolving through ongoing learning and updates. [source: teddylee777@gmail.com]
[6] ChatGPT has learned from vast amounts of data to understand user questions and generate appropriate answers. [source: teddylee777@gmail.com]
[7] ChatGPT can be used to solve complex problems or suggest creative ideas. [source: teddylee777@gmail.com]
[8] ChatGPT, an AI designed 
```
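`format_docs` above attaches the same hard-coded source string to every document. If your documents carry their own source in `metadata` (the documents in this tutorial have empty metadata), the label can be read per document instead. This is a minimal sketch assuming a `"source"` metadata key; `SimpleDoc` is a hypothetical stand-in for LangChain's `Document`, which exposes the same `page_content` and `metadata` attributes, so the example runs without LangChain installed.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDoc:
    # Hypothetical stand-in for langchain_core.documents.Document
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs_with_sources(docs, default_source: str = "unknown") -> str:
    """Number each document and append metadata["source"], if present."""
    return "\n".join(
        f"[{i}] {doc.page_content} [source: {doc.metadata.get('source', default_source)}]"
        for i, doc in enumerate(docs)
    )


docs = [
    SimpleDoc("ChatGPT was developed by OpenAI.", {"source": "notes.txt"}),
    SimpleDoc("Bitcoin is also called digital gold."),
]
print(format_docs_with_sources(docs))
# → [0] ChatGPT was developed by OpenAI. [source: notes.txt]
#   [1] Bitcoin is also called digital gold. [source: unknown]
```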

```python
from operator import itemgetter

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda
from langchain_openai import ChatOpenAI

# Define prompt template
template = """Given these text extracts:
{context}

-----
Please answer the following question:
{question}

Answer in the following language: {language}
"""

# Define prompt
prompt = ChatPromptTemplate.from_template(template)

# Define chain
chain = (
    {
        # Retrieve documents for the question, then reorder and format them
        "context": itemgetter("question")
        | retriever
        | RunnableLambda(reorder_documents),
        "question": itemgetter("question"),  # Extract question
        "language": itemgetter("language"),  # Extract answer language
    }
    | prompt  # Pass values to prompt template
    | ChatOpenAI(model="gpt-4o-mini")  # Pass prompt to language model
    | StrOutputParser()  # Parse model output as string
)
```
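To see roughly what the model receives, the template can be filled in by hand. `ChatPromptTemplate.from_template` uses f-string-style `{placeholder}` syntax by default, so Python's `str.format` produces the same text; the chain then sends it to the model as a single human message. The template below has the same shape as the chain's prompt, and the short context string is a made-up example rather than real retriever output.

```python
# Same shape as the chain's prompt template (f-string placeholders)
template = """Given these text extracts:
{context}

-----
Please answer the following question:
{question}

Answer in the following language: {language}
"""

# Made-up, already reordered and formatted context for illustration
context = "\n".join(
    [
        "[0] ChatGPT was developed by OpenAI. [source: example]",
        "[1] ChatGPT can answer various questions. [source: example]",
    ]
)

prompt_text = template.format(
    context=context,
    question="What can you tell me about ChatGPT?",
    language="English",
)
print(prompt_text)
```

Placing the reordered context before the question keeps the most relevant extracts at the start and end of the context block.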


Invoke the chain with the query as `question` and the response language as `language`.

While the chain runs, `reorder_documents` prints the reordered search results.

```python
answer = chain.invoke(
    # The question is Korean for "What can you tell me about ChatGPT?"
    {"question": "ChatGPT에 대해 무엇을 말해줄 수 있나요?", "language": "English"}
)
```

```
[0] ChatGPT's capabilities are continuously evolving through ongoing learning and updates. [source: teddylee777@gmail.com]
[1] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]
[2] ChatGPT can be used to solve complex problems or suggest creative ideas. [source: teddylee777@gmail.com]
[3] ChatGPT has learned from vast amounts of data to understand user questions and generate appropriate answers. [source: teddylee777@gmail.com]
[4] ChatGPT, an AI designed to converse with users, can answer various questions. [source: teddylee777@gmail.com]
[5] ChatGPT, an AI designed to converse with users, can answer various questions. [source: teddylee777@gmail.com]
[6] ChatGPT has learned from vast amounts of data to understand user questions and generate appropriate answers. [source: teddylee777@gmail.com]
[7] ChatGPT can be used to solve complex problems or suggest creative ideas. [source: teddylee777@gmail.com]
[8] ChatGPT was developed by OpenAI a
```

Prints the response.

```python
print(answer)
```

```
ChatGPT is an AI developed by OpenAI that is designed to engage in conversations with users. It has been trained on vast amounts of data, allowing it to understand user questions and generate appropriate responses. Its capabilities are continuously evolving through ongoing learning and updates, which means it is regularly being improved to enhance its performance. ChatGPT can be used to solve complex problems, suggest creative ideas, and answer a wide variety of questions.
```
