# Lost in the Middle: The Problem with Long Contexts

"No matter the architecture of your model, there is a substantial performance degradation when you include more than 10 retrieved documents. In brief: when models must access relevant information in the middle of long contexts, they tend to ignore the provided documents. See: https://arxiv.org/abs/2307.03172

To avoid this performance degradation, you can reorder the documents after retrieving them."

Source: [LangChain](https://python.langchain.com/docs/modules/data_connection/document_transformers/post_retrieval/long_context_reorder)

![Lost in the Middle](./diagrams/slide_diagrama_05.png)

## Libraries

In [1]:
from operator import itemgetter

from dotenv import load_dotenv
from langchain.chat_models import ChatOpenAI
from langchain.document_transformers import LongContextReorder
from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts import PromptTemplate
from langchain.schema import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

from src.langchain_docs_loader import LangchainDocsLoader, num_tokens_from_string

load_dotenv()

False

## Loading the data

In [2]:
import os

# Never hardcode API keys; set OPENAI_API_KEY in the shell or in a .env file.
assert os.environ.get("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"

In [3]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=350,
    chunk_overlap=10,
    length_function=num_tokens_from_string,
)

documents = LangchainDocsLoader().load()
documents = text_splitter.split_documents(documents)
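
To make the roles of `chunk_size`, `chunk_overlap` and `length_function` concrete, here is a simplified sketch. It is not how `RecursiveCharacterTextSplitter` works internally (the real splitter recurses over a list of separators and merges pieces); it only illustrates a sliding window measured in tokens. The `count_tokens` helper is a hypothetical stand-in for `num_tokens_from_string` and simply counts whitespace-separated words.

```python
def count_tokens(text: str) -> int:
    """Hypothetical stand-in for num_tokens_from_string: counts words, not real tokens."""
    return len(text.split())


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of at most chunk_size words that share chunk_overlap words."""
    step = chunk_size - chunk_overlap
    if step <= 0:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    words = text.split()
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


sample = " ".join(str(i) for i in range(10))
chunks = split_with_overlap(sample, chunk_size=4, chunk_overlap=1)
```

Each chunk stays within the size budget, and consecutive chunks repeat one word so that a sentence cut at a boundary still appears whole in at least one chunk.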

## Creating the retriever

In [4]:
retriever = Chroma.from_documents(documents[:100], embedding=OpenAIEmbeddings()).as_retriever(
    search_type="mmr",
    search_kwargs={
        "k": 10,
        "fetch_k": 50,
    },
)

In [5]:
retriever

VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain.vectorstores.chroma.Chroma object at 0x7f9dc437eb50>, search_type='mmr', search_kwargs={'k': 10, 'fetch_k': 50})
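
With `search_type="mmr"`, the retriever first fetches `fetch_k` candidates by similarity and then keeps `k` of them using maximal marginal relevance, which trades relevance to the query against redundancy with the documents already chosen. The following is a minimal pure-Python sketch of that selection step, under the assumption of cosine similarity on small hand-made vectors. The function names are illustrative and are not the library's API.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k: int, lambda_mult: float = 0.5) -> list[int]:
    """Return the indices of k documents chosen by maximal marginal relevance."""
    selected: list[int] = []
    candidates = list(range(len(doc_vecs)))

    def score(i: int) -> float:
        relevance = cosine(query_vec, doc_vecs[i])
        # Redundancy is the highest similarity to any document already selected.
        redundancy = max(
            (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
        )
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [[1.0, 0.0], [1.0, 0.0], [0.6, 0.8]]  # docs 0 and 1 are duplicates
diverse = mmr_select(query, docs, k=2, lambda_mult=0.3)
relevance_only = mmr_select(query, docs, k=2, lambda_mult=1.0)
```

With a low `lambda_mult` the duplicate is skipped in favour of the less similar but more diverse document; with `lambda_mult=1.0` the selection reduces to plain similarity ranking.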

## Querying the retriever

In [6]:
relevant_docs = retriever.get_relevant_documents(
    "How to use LCEL ainvoke with a retriever?"
)
relevant_docs

[Document(page_content="# Cookbook\n\nExample code for accomplishing common tasks with the LangChain Expression Language (LCEL). These examples show how to compose different Runnable (the core LCEL interface) components to achieve various tasks. If you're just getting acquainted with LCEL, the [Prompt + LLM](/docs/expression_language/cookbook/prompt_llm_parser) page is a good place to start.", metadata={'description': "Example code for accomplishing common tasks with the LangChain Expression Language (LCEL). These examples show how to compose different Runnable (the core LCEL interface) components to achieve various tasks. If you're just getting acquainted with LCEL, the Prompt + LLM page is a good place to start.", 'language': 'en', 'source': 'https://python.langchain.com/docs/expression_language/cookbook/', 'title': 'Cookbook | 🦜️🔗 Langchain'}),
 Document(page_content='| Deprecated | Alternative | Reason |\n| ---- | ---- | ---- |\n| ChatVectorDBChain | ConversationalRetrievalChain | 

## Reordering the documents

In [7]:
reordering = LongContextReorder()
reordered_docs = list(reordering.transform_documents(relevant_docs))
reordered_docs

[Document(page_content='| Deprecated | Alternative | Reason |\n| ---- | ---- | ---- |\n| ChatVectorDBChain | ConversationalRetrievalChain | More general to all retrievers |\n| create_ernie_fn_chain | create_ernie_fn_runnable | Use LCEL under the hood |\n| created_structured_output_chain | create_structured_output_runnable | Use LCEL under the hood |\n| NatBotChain |  | Not used |\n| create_openai_fn_chain | create_openai_fn_runnable | Use LCEL under the hood |\n| create_structured_output_chain | create_structured_output_runnable | Use LCEL under the hood |\n| load_query_constructor_chain | load_query_constructor_runnable | Use LCEL under the hood |\n| VectorDBQA | RetrievalQA | More general to all retrievers |\n| Sequential Chain | LCEL | Obviated by LCEL |\n| SimpleSequentialChain | LCEL | Obviated by LCEL |\n| TransformChain | LCEL/RunnableLambda | Obviated by LCEL |\n| create_tagging_chain | create_structured_output_runnable | Use LCEL under the hood |\n| ChatAgent | create_react_ag
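
The idea behind the reordering can be sketched in a few lines of plain Python: given documents sorted from most to least relevant, place the most relevant ones at the two ends of the list and push the least relevant ones into the middle, where the model is most likely to overlook them. This is an illustration of the technique under that assumption, not necessarily the exact code of `LongContextReorder`. Integers stand in for documents, with `0` being the most relevant.

```python
def lost_in_the_middle_reorder(docs_by_relevance: list) -> list:
    """Move the most relevant items to the edges and the least relevant to the middle.

    Expects the input sorted from most relevant (index 0) to least relevant.
    """
    reordered = []
    # Walk from least to most relevant, alternating between the front and the back,
    # so the items processed last (the most relevant) end up at the extremes.
    for i, doc in enumerate(reversed(docs_by_relevance)):
        if i % 2 == 1:
            reordered.append(doc)
        else:
            reordered.insert(0, doc)
    return reordered


ten_docs = lost_in_the_middle_reorder(list(range(10)))
five_docs = lost_in_the_middle_reorder(list(range(5)))
```

For ten documents the least relevant ones (`9` and `8`) end up in the centre, while `0` and `1` sit at the end and the start of the context respectively.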

## Using the reorderer in our `Retrieval Augmented Generation` pipeline

In [7]:
def combine_documents(documents: list[Document]) -> str:
    return "\n\n".join([doc.page_content for doc in documents])


prompt = PromptTemplate.from_template(
    """Given the following text extracts:
-----
{context}
-----
                                      
Answer the following question. If you don't know the answer, just write "I don't know."

Question: {question}"""
)

llm = ChatOpenAI(temperature=0)

stuff_chain = (
    {
        "context": itemgetter("question")
        | retriever
        | reordering.transform_documents
        | combine_documents,
        "question": itemgetter("question"),
    }
    | prompt
    | llm
)
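
The data flow of `stuff_chain` can be easier to follow when written as ordinary function calls. The sketch below uses stand-in callables instead of the real retriever, reorderer and LLM, so every name in it (`run_pipeline`, `TEMPLATE`, the lambdas) is hypothetical. It only shows the order in which the question travels through retrieve, reorder, combine and prompt formatting.

```python
TEMPLATE = (
    "Given the following text extracts:\n-----\n{context}\n-----\n\n"
    "Question: {question}"
)


def run_pipeline(question: str, retrieve, reorder, combine, llm) -> str:
    """Plain-Python equivalent of the chain: retrieve, reorder, combine, prompt, LLM."""
    inputs = {"question": question}
    # The "context" branch: the question goes to the retriever, then the reorderer,
    # and the resulting documents are joined into one string.
    docs = retrieve(inputs["question"])
    context = combine(reorder(docs))
    # The "question" branch just forwards the question into the prompt.
    prompt_text = TEMPLATE.format(context=context, question=inputs["question"])
    return llm(prompt_text)


answer = run_pipeline(
    "q",
    retrieve=lambda q: ["a", "b", "c"],         # stand-in retriever
    reorder=lambda docs: list(reversed(docs)),  # stand-in reorderer
    combine="\n\n".join,                        # same joining rule as combine_documents
    llm=lambda prompt_text: prompt_text,        # stand-in LLM that echoes the prompt
)
```

Because the stand-in LLM echoes its input, `answer` contains the exact prompt text that the model would receive.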

In [9]:
response = stuff_chain.invoke(input={"question": "How to create a chain using LCEL?"}).content
print(response)

To create a chain using LCEL, you need to define the steps of the chain and their order of execution. Each step in the chain is represented by a function or a class. Here are the general steps to create a chain using LCEL:

1. Define the functions or classes that will be used as steps in the chain.
2. Specify the order of execution for the steps.
3. Use the pipe operator (`|`) to connect the steps together.
4. Optionally, you can add additional functions or classes to modify the output of the chain.
5. Invoke the chain with the input data.

In the given text extracts, an example of creating a chain using LCEL can be seen in the following code snippet:

```python
chain = prompt | model | StrOutputParser() | _sanitize_output | PythonREPL().run
```

In this example, the `prompt` is connected to the `model` using the pipe operator (`|`). The output of the `model` is then passed to the `StrOutputParser`, followed by `_sanitize_output`, and finally to `PythonREPL().run`. The `chain` is then 

In [7]:
# Import Azure OpenAI
import os

from langchain.chat_models import AzureChatOpenAI

# The Azure OpenAI key is read from the environment rather than hardcoded.
os.environ["OPENAI_API_VERSION"] = "2023-06-01-preview"
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://openai-cigo.openai.azure.com/"
llm = AzureChatOpenAI(
    openai_api_key=os.environ["OPENAI_API_KEY"],
    deployment_name="Test",
    model_name="gpt-3.5-turbo",
)



In [12]:
from langchain.schema import HumanMessage
llm([HumanMessage(content='Hola')])

AIMessage(content='¡Hola! ¿En qué puedo ayudarte hoy?')