# Long Context Reorder

https://python.langchain.com/v0.1/docs/modules/data_connection/retrievers/long_context_reorder/
https://python.langchain.com/v0.2/docs/how_to/long_context_reorder/

#### Research paper: Lost in the Middle: How Language Models Use Long Contexts

https://arxiv.org/pdf/2307.03172



## Import packages

In [1]:
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.document_transformers import LongContextReorder

## 1. Set up test corpus

In [2]:
# The first 4 entries are about RAG (Retrieval-Augmented Generation).
# The remaining entries are about rags, i.e. pieces of cloth used for cleaning.
data = [
    "RAG retrieves relevant documents to inform generation.",
    "RAG is used in open-domain question answering.",
    "The retrieval component in RAG finds pertinent information quickly.",
    "RAG can handle complex queries with more precision.",

    "Old t-shirts make great cleaning rags.",
    "Rags are perfect for dusting furniture.",
    "Use rags to clean up spills quickly.",
    "Cut up old towels for durable rags.",
    "Rags can be reused multiple times.",
    "Keep rags handy in the kitchen for quick cleanups.",
    "Rags made from cotton are highly absorbent.",
    "Store rags in a bucket for easy access.",
    "Rags are useful for polishing shoes.",
    "Recycle old clothes into rags instead of throwing them away."
]


## 2. Create the Vector Store

In [3]:
# Embedding function backed by a sentence-transformers model
embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

# Create a Chroma vector store from the test corpus,
# then wrap it in a retriever that returns the 10 most similar documents
vector_store = Chroma.from_texts(data, embedding=embedding_function)
vector_store_retriever = vector_store.as_retriever(search_kwargs={"k": 10})
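
The `k` in `search_kwargs` caps how many documents the retriever returns, ranked by similarity to the query embedding. The sketch below illustrates that idea with toy 2-D vectors. The vectors, the `cosine` and `top_k` helpers, and the choice of cosine similarity are all assumptions made for illustration; the metric Chroma actually applies depends on how the collection is configured.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query_vec, doc_vecs, k):
    """Hypothetical helper: return the names of the k most similar documents."""
    ranked = sorted(
        doc_vecs,
        key=lambda name: cosine(query_vec, doc_vecs[name]),
        reverse=True,  # most similar first
    )
    return ranked[:k]

# Toy embeddings (made up): the first axis loosely means "about RAG"
toy_docs = {
    "rag_qa": [0.9, 0.1],
    "rag_retrieval": [0.8, 0.3],
    "cleaning_rags": [0.1, 0.9],
}
toy_query = [1.0, 0.0]

top_two = top_k(toy_query, toy_docs, k=2)
print(top_two)
```

If `k` exceeds the number of stored documents, this sketch simply returns all of them.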

## 3. Create Long Context Reorderer

In [4]:

# Create the long-context reorderer.
# Note: despite the variable name, this is a document transformer, not a retriever.
reordering_retriever = LongContextReorder()
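
For intuition, the reordering can be sketched in plain Python. This is an approximation written for this walkthrough, not the library's source: the helper name `litm_reorder` is hypothetical, and it assumes the input list is sorted from most to least relevant. It places the most relevant items at the two ends of the list and pushes the least relevant ones toward the middle, which is where the "Lost in the Middle" paper reports that models use context least effectively.

```python
def litm_reorder(items):
    """Hypothetical helper: move the most relevant items to the ends.

    Assumes `items` is sorted from most relevant to least relevant.
    Returns a new list and leaves the input untouched.
    """
    reordered = []
    # Walk from least relevant to most relevant, alternating sides,
    # so later (more relevant) items land nearer the two ends.
    for i, item in enumerate(reversed(items)):
        if i % 2 == 1:
            reordered.append(item)      # goes to the end
        else:
            reordered.insert(0, item)   # goes to the start
    return reordered

# d0 is the most relevant item, d5 the least relevant
ranked = ["d0", "d1", "d2", "d3", "d4", "d5"]
result = litm_reorder(ranked)
print(result)  # ['d1', 'd3', 'd5', 'd4', 'd2', 'd0']
```

In this sketch the top-ranked item ends up last and the second-ranked item first, with the weakest matches (`d5`, `d4`) in the middle.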


## 4. Test: Results with and without reordering

In [5]:
# Print the text of each document in the context
def dump_results(docs):
    for doc in docs:
        print(doc.page_content)

In [6]:
# Test question, used as the retriever input
question = "what is gpt and rag"

# Retrieve the most similar documents from the vector store
print("Original ordering of context")
print("----------------------------")
vector_store_retriever_results = vector_store_retriever.invoke(input=question)
dump_results(vector_store_retriever_results)

print("\nRe-ordered context")
print("-------------------")
# Reorder the documents
reordered_docs = reordering_retriever.transform_documents(vector_store_retriever_results)
dump_results(reordered_docs)

Original ordering of context
----------------------------
RAG is used in open-domain question answering.
The retrieval component in RAG finds pertinent information quickly.
RAG can handle complex queries with more precision.
Rags are useful for polishing shoes.
Rags made from cotton are highly absorbent.
Rags are perfect for dusting furniture.
RAG retrieves relevant documents to inform generation.
Rags can be reused multiple times.
Keep rags handy in the kitchen for quick cleanups.
Store rags in a bucket for easy access.

Re-ordered context
-------------------
The retrieval component in RAG finds pertinent information quickly.
Rags are useful for polishing shoes.
Rags are perfect for dusting furniture.
Rags can be reused multiple times.
Store rags in a bucket for easy access.
Keep rags handy in the kitchen for quick cleanups.
RAG retrieves relevant documents to inform generation.
Rags made from cotton are highly absorbent.
RAG can handle complex queries with more precision.
RAG is used