# Multi-query retriever

![Multi-query retriever](./diagrams/slide_diagrama_03_V2.png)

## Libraries

In [1]:
import logging
from typing import Any

from dotenv import load_dotenv
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from langchain.retrievers import MultiQueryRetriever
from langchain.vectorstores import Chroma
from pydantic import BaseModel, Field

from src.langchain_docs_loader import load_langchain_docs_splitted

load_dotenv()

False

## Data loading

In [2]:
docs = load_langchain_docs_splitted()
len(docs)

3290

## Vector store setup

In [8]:
embedding = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(documents=docs[:100], embedding=embedding)

## Retriever setup

In [9]:
llm = ChatOpenAI()
retriever = MultiQueryRetriever.from_llm(
    llm=llm,
    retriever=vectorstore.as_retriever(),
)
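
Conceptually, a multi-query retriever asks the LLM for several rephrasings of the question, runs each one against the underlying retriever, and merges the hits without duplicates. The following is a simplified sketch of that flow, not the library's implementation. The names `multi_query_retrieve`, `fake_generate_queries`, `fake_search`, and `TOY_INDEX` are hypothetical stand-ins so the example runs without an API key or a vector store.

```python
from typing import Callable


def multi_query_retrieve(
    question: str,
    generate_queries: Callable[[str], list[str]],
    search: Callable[[str], list[str]],
) -> list[str]:
    """Run every generated query and return the unique union of the hits."""
    seen: set[str] = set()
    merged: list[str] = []
    for query in generate_queries(question):
        for doc in search(query):
            if doc not in seen:  # keep the first occurrence, preserve order
                seen.add(doc)
                merged.append(doc)
    return merged


# Hypothetical stand-in for the LLM: returns the question plus one rewrite.
def fake_generate_queries(question: str) -> list[str]:
    return [question, question.replace("create", "build")]


# Hypothetical stand-in for the vector store: a keyword -> documents map.
TOY_INDEX = {
    "create": ["doc_a", "doc_b"],
    "build": ["doc_b", "doc_c"],
}


def fake_search(query: str) -> list[str]:
    hits: list[str] = []
    for keyword, docs in TOY_INDEX.items():
        if keyword in query:
            hits.extend(docs)
    return hits


result = multi_query_retrieve(
    "How to create a retriever?", fake_generate_queries, fake_search
)
print(result)
```

Here `doc_b` is returned by both queries but appears only once in the merged list, which is the point of taking the union.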

## Testing the retriever

In [10]:
logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

In [11]:
retriever.get_relevant_documents(
    "How to create a retriever with langchain expression language?"
)

[Document(page_content='- [HOW to Make Conversational Form with LangChain](https://youtu.be/IT93On2LB5k)\n- ⛓ [Claude-2 meets LangChain!](https://youtu.be/Hb_D3p0bK2U?si=j96Kc7oJoeRI5-iC)\n- ⛓ [PaLM 2 Meets LangChain](https://youtu.be/orPwLibLqm4?si=KgJjpEbAD9YBPqT4)\n- ⛓ [LLaMA2 with LangChain - Basics | LangChain TUTORIAL](https://youtu.be/cIRzwSXB4Rc?si=v3Hwxk1m3fksBIHN)\n- ⛓ [Serving LLaMA2 with Replicate](https://youtu.be/JIF4nNi26DE?si=dSazFyC4UQmaR-rJ)\n- ⛓ [NEW LangChain Expression Language](https://youtu.be/ud7HJ2p3gp0?si=8pJ9O6hGbXrCX5G9)\n- ⛓ [Building a RCI Chain for Agents with LangChain Expression Language](https://youtu.be/QaKM5s0TnsY?si=0miEj-o17AHcGfLG)\n- ⛓ [How to Run LLaMA-2-70B on the Together AI](https://youtu.be/Tc2DHfzHeYE?si=Xku3S9dlBxWQukpe)\n- ⛓ [RetrievalQA with LLaMA 2 70b & Chroma DB](https://youtu.be/93yueQQnqpM?si=ZMwj-eS_CGLnNMXZ)\n- ⛓ [How to use BGE Embeddings for LangChain](https://youtu.be/sWRvSG7vL4g?si=85jnvnmTCF9YIWXI)\n- ⛓ [How to use Custom Pro

## Custom generation of alternative questions

### Defining the output schema for the questions

In [12]:
class LineList(BaseModel):
    # "lines" is the key (attribute name) of the parsed output
    lines: list[str] = Field(description="Lines of text")


class LineListOutputParser(PydanticOutputParser[Any]):
    def __init__(self) -> None:
        super().__init__(pydantic_object=LineList)

    def parse(self, text: str) -> LineList:
        lines = text.strip().splitlines()
        return LineList(lines=lines)
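
The `parse` method above keeps every line exactly as the model wrote it, so blank lines and list numbering such as `1. ` pass straight through to the retriever. If that is not wanted, the lines could be cleaned first. This is a minimal sketch under that assumption: `split_questions` is a hypothetical helper, not part of LangChain, and the sample text is made up.

```python
import re


def split_questions(text: str) -> list[str]:
    """Split LLM output into questions, dropping blanks and list numbering."""
    questions: list[str] = []
    for line in text.splitlines():
        # Remove a leading "1. " or "2) " style prefix, then trim whitespace.
        cleaned = re.sub(r"^\s*\d+[.)]\s*", "", line).strip()
        if cleaned:
            questions.append(cleaned)
    return questions


sample = "1. What is a retriever?\n\n2) How do I build one?\n  Plain line  \n"
print(split_questions(sample))
```

A helper like this could be called inside `parse` in place of `text.strip().splitlines()`.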

### Creating a custom `prompt`

In [15]:
prompt = PromptTemplate.from_template(
    """You are an AI language assistant well versed in the Langchain Documentation.
Your more precise task is to generate five different versions of the given question to retrieve relevant documents from a vector database.
By generating multiple perspectives on the question, your goal is to overcome some of the limitations of the distance-based similarity search.

Provide these alternative questions separated by newlines.

Original question: {question}
New questions:"""
)

llm_chain = LLMChain(
    llm=ChatOpenAI(temperature=0),
    prompt=prompt,
    output_parser=LineListOutputParser(),
)

# With LangChain Expression Language (LCEL), you could build the chain as:
# llm_chain = prompt | llm | LineListOutputParser()

### Using the custom question-generation chain

In [16]:
llm_chain.invoke(
    {"question": "How to create a retriever with langchain expression language?"}
)

{'question': 'How to create a retriever with langchain expression language?',
 'text': LineList(lines=['1. What are the steps to build a retriever using the langchain expression language?', '2. Can you explain the process of constructing a retriever using the langchain expression language?', '3. What is the procedure for creating a retriever using the langchain expression language?', '4. Could you provide a guide on how to develop a retriever using the langchain expression language?', '5. What are the necessary components for implementing a retriever with the langchain expression language?'])}

### Integrating the custom question-generation chain into the `retriever`

In [17]:
retriever = MultiQueryRetriever(
    retriever=vectorstore.as_retriever(),
    llm_chain=llm_chain,
    parser_key="lines",
)

### Using the `retriever` with the custom question-generation chain

In [18]:
retriever.get_relevant_documents(
    "How to create a retriever with langchain expression language?"
)

[Document(page_content='- [HOW to Make Conversational Form with LangChain](https://youtu.be/IT93On2LB5k)\n- ⛓ [Claude-2 meets LangChain!](https://youtu.be/Hb_D3p0bK2U?si=j96Kc7oJoeRI5-iC)\n- ⛓ [PaLM 2 Meets LangChain](https://youtu.be/orPwLibLqm4?si=KgJjpEbAD9YBPqT4)\n- ⛓ [LLaMA2 with LangChain - Basics | LangChain TUTORIAL](https://youtu.be/cIRzwSXB4Rc?si=v3Hwxk1m3fksBIHN)\n- ⛓ [Serving LLaMA2 with Replicate](https://youtu.be/JIF4nNi26DE?si=dSazFyC4UQmaR-rJ)\n- ⛓ [NEW LangChain Expression Language](https://youtu.be/ud7HJ2p3gp0?si=8pJ9O6hGbXrCX5G9)\n- ⛓ [Building a RCI Chain for Agents with LangChain Expression Language](https://youtu.be/QaKM5s0TnsY?si=0miEj-o17AHcGfLG)\n- ⛓ [How to Run LLaMA-2-70B on the Together AI](https://youtu.be/Tc2DHfzHeYE?si=Xku3S9dlBxWQukpe)\n- ⛓ [RetrievalQA with LLaMA 2 70b & Chroma DB](https://youtu.be/93yueQQnqpM?si=ZMwj-eS_CGLnNMXZ)\n- ⛓ [How to use BGE Embeddings for LangChain](https://youtu.be/sWRvSG7vL4g?si=85jnvnmTCF9YIWXI)\n- ⛓ [How to use Custom Pro