## Query Translation
Query translation is a set of techniques that rewrite the user's query so that its embedding is more likely to match the embeddings of relevant documents, giving the LLM better context for answer generation.

 - Rephrase
 - Break down
 - Abstract
 - Convert to hypothetical documents

In [1]:
from dotenv import load_dotenv, dotenv_values
import google.generativeai as genai
from IPython.display import Markdown, display
import os 


# Load GOOGLE_API_KEY from a local .env file
load_dotenv()
my_api_key = os.getenv("GOOGLE_API_KEY")
genai.configure(api_key=my_api_key)

### MultiQueryRetriever
Distance-based vector database retrieval embeds (represents) queries in high-dimensional space and finds similar embedded documents based on "distance". However, retrieval may produce different results with subtle changes in query wording, or when the embeddings do not capture the semantics of the data well. Prompt engineering or tuning is sometimes done manually to address these problems, but it can be tedious.
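
To make the sensitivity to wording concrete, here is a minimal sketch of distance-based ranking. The three-dimensional vectors and document names are hypothetical toy values, not output from a real embedding model, and cosine distance is assumed as the metric.

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity; a smaller value means a closer match."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical toy "embeddings"; real models use hundreds of dimensions.
doc_vectors = {
    "doc_planning": [0.9, 0.1, 0.0],
    "doc_memory": [0.1, 0.9, 0.0],
    "doc_tools": [0.0, 0.2, 0.9],
}

def rank(query_vector: list[float], vectors: dict[str, list[float]]) -> list[str]:
    """Order document names from nearest to farthest from the query."""
    return sorted(vectors, key=lambda name: cosine_distance(query_vector, vectors[name]))

# Two query vectors that differ only slightly, as two phrasings of one question might.
ranking_a = rank([0.6, 0.5, 0.0], doc_vectors)
ranking_b = rank([0.5, 0.6, 0.0], doc_vectors)
print(ranking_a)
print(ranking_b)
```

In this toy setup the small shift between the two query vectors is enough to swap the top-ranked document, which is the kind of instability that generating several query variants tries to smooth out.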

The MultiQueryRetriever automates the process of prompt tuning by using an LLM to generate multiple queries from different perspectives for a given user input query. For each query, it retrieves a set of relevant documents and takes the unique union across all queries to get a larger set of potentially relevant documents. By generating multiple perspectives on the same question, the MultiQueryRetriever might be able to overcome some of the limitations of the distance-based retrieval and get a richer set of results.
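
The idea can be sketched without any LLM or vector store. The example below is only an illustration under stated assumptions: `toy_retriever` is a hypothetical keyword-overlap retriever standing in for the vector search, and the query variants are written by hand instead of being generated by a model.

```python
def toy_retriever(query: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    """Hypothetical retriever: rank documents by the number of shared words."""
    query_words = set(query.lower().split())
    scored = [
        (len(query_words & set(text.lower().split())), doc_id)
        for doc_id, text in corpus.items()
    ]
    # Keep only documents sharing at least one word, best score first.
    hits = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in hits[:k]]

def multi_query_retrieve(queries: list[str], corpus: dict[str, str], k: int = 2) -> list[str]:
    """Run every query and return the unique union of hits in first-seen order."""
    seen: dict[str, None] = {}
    for query in queries:
        for doc_id in toy_retriever(query, corpus, k):
            seen.setdefault(doc_id, None)
    return list(seen)

corpus = {
    "d1": "task decomposition splits a task into subgoals",
    "d2": "chain of thought prompts the model to think step by step",
    "d3": "agents use memory to store past observations",
}
queries = [
    "what is task decomposition",
    "how do agents break a problem into subgoals step by step",
]

single = toy_retriever(queries[0], corpus)
combined = multi_query_retrieve(queries, corpus)
print(single, combined)
```

Here the original wording alone reaches only `d1`, while the union over both phrasings also brings in `d2`.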

##### INDEXING

In [3]:
# Load blog
import bs4
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
blog_docs = loader.load() 

USER_AGENT environment variable not set, consider setting it to identify your requests.


In [5]:
# Split
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=300, 
    chunk_overlap=50)

# Make splits
splits = text_splitter.split_documents(blog_docs) 
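
The effect of `chunk_size` and `chunk_overlap` can be pictured with a plain sliding window. This is a simplified sketch, not how `RecursiveCharacterTextSplitter` works internally (it splits recursively on separators and here measures length in tiktoken tokens); `chunk_with_overlap` is a hypothetical helper and the token list is made up.

```python
def chunk_with_overlap(tokens: list[str], chunk_size: int, chunk_overlap: int) -> list[list[str]]:
    """Cut a token list into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(tokens):
            break
    return chunks

tokens = [f"t{i}" for i in range(10)]
chunks = chunk_with_overlap(tokens, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

Each window starts with the last token of the previous one, so a sentence that straddles a boundary is still partly visible in both chunks.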

In [7]:

# Index
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import Chroma

# Call embedding model
embedding = GoogleGenerativeAIEmbeddings(model="models/text-embedding-004")

vectorstore = Chroma.from_documents(documents=splits, 
                                    embedding=embedding)

retriever = vectorstore.as_retriever()

##### PROMPTING

In [16]:
from langchain.prompts import ChatPromptTemplate

# Multi Query: Different Perspectives
template = """You are an AI language model assistant. Your task is to generate five 
different versions of the given user question to retrieve relevant documents from a vector 
database. By generating multiple perspectives on the user question, your goal is to help
the user overcome some of the limitations of the distance-based similarity search. 
Provide these alternative questions separated by newlines. Original question: {question}"""
prompt_perspectives = ChatPromptTemplate.from_template(template)

from langchain_core.output_parsers import StrOutputParser
from langchain_google_genai import ChatGoogleGenerativeAI

# LLM
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0)

generate_queries = (
    prompt_perspectives 
    | llm
    | StrOutputParser() 
    | (lambda x: x.split("\n"))
)
generate_queries

ChatPromptTemplate(input_variables=['question'], messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['question'], template='You are an AI language model assistant. Your task is to generate five \ndifferent versions of the given user question to retrieve relevant documents from a vector \ndatabase. By generating multiple perspectives on the user question, your goal is to help\nthe user overcome some of the limitations of the distance-based similarity search. \nProvide these alternative questions separated by newlines. Original question: {question}'))])
| ChatGoogleGenerativeAI(model='models/gemini-1.5-flash', temperature=0.0, client=<google.ai.generativelanguage_v1beta.services.generative_service.client.GenerativeServiceClient object at 0x00000242BE27A350>, async_client=<google.ai.generativelanguage_v1beta.services.generative_service.async_client.GenerativeServiceAsyncClient object at 0x00000242BF850950>, default_metadata=())
| StrOutputParser()
| RunnableLambda(

In [23]:
from langchain.load import dumps, loads

def get_unique_union(documents: list[list]):
    """Return the unique union of the retrieved docs, keeping first-seen order."""
    # Flatten the list of lists and serialize each Document to a string so it is hashable
    flattened_docs = [dumps(doc) for sublist in documents for doc in sublist]
    # Deduplicate; dict.fromkeys keeps insertion order, unlike set()
    unique_docs = list(dict.fromkeys(flattened_docs))
    # Deserialize the strings back into Document objects
    return [loads(doc) for doc in unique_docs]

# Retrieve
question = "What is task decomposition for LLM agents?"
retrieval_chain = generate_queries | retriever.map() | get_unique_union
docs = retrieval_chain.invoke({"question": question})
len(docs)

9

In [13]:
from operator import itemgetter
from langchain_core.runnables import RunnablePassthrough

# RAG
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
    {"context": retrieval_chain, 
     "question": itemgetter("question")} 
    | prompt
    | llm
    | StrOutputParser()
)

res = final_rag_chain.invoke({"question": question})
print(res)

Task decomposition is the process of breaking down large, complex tasks into smaller, more manageable subgoals for LLM agents. This allows the agent to handle complex tasks more efficiently. 

The document mentions several methods for task decomposition:

* **LLM with simple prompting:**  The LLM is prompted with questions like "Steps for XYZ?" or "What are the subgoals for achieving XYZ?"
* **Task-specific instructions:**  The agent is given instructions tailored to the specific task, such as "Write a story outline" for writing a novel.
* **Human inputs:**  Humans can provide the initial task decomposition, guiding the agent's planning.

The document also highlights techniques like Chain of Thought (CoT) and Tree of Thoughts (ToT) which help LLMs decompose tasks into smaller steps. 
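
In the chain above, `{context}` receives the list of retrieved documents as is, so the prompt contains their string representation, metadata included. One common alternative is to join only the text of each chunk first. The sketch below assumes this is wanted; `Doc` is a hypothetical stand-in for a LangChain `Document`, and `format_docs` is an illustrative helper that is not part of the original notebook.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_docs(docs: list[Doc]) -> str:
    """Join the text of each chunk, separated by blank lines, dropping metadata."""
    return "\n\n".join(doc.page_content.strip() for doc in docs)

sample_docs = [
    Doc("Task decomposition splits work into subgoals. ", {"source": "blog"}),
    Doc("Chain of thought asks the model to think step by step."),
]
context = format_docs(sample_docs)
print(context)
```

With real documents, a helper like this could be piped after `retrieval_chain` for the `"context"` entry, so the model sees plain text instead of object representations.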



##### Drawbacks of the method 

 - Human evaluation is needed to decide the maximum number of chunks to use, the maximum number of queries to generate, and how to design the response synthesizer, among other things, so as to make the most of the returned contexts.
 - This approach requires the user to design a clear prompt for sub-query generation, which makes the process dependent on a general-purpose pretrained LLM. Such a model might not understand the context of niche user queries.
 - A series of retrievals and generations increases cost and latency.
 - Searching a broader context window and giving equal weight to all returned contexts will degrade model performance if not all of the returned information is relevant to the question.
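
One possible way to soften the last point, shown only as a sketch and not as part of the original notebook, is to stop treating every chunk equally: count how many of the generated queries retrieved each chunk and keep the most frequently retrieved ones up to a cap. `top_docs_by_votes` and the document ids below are hypothetical.

```python
from collections import Counter

def top_docs_by_votes(results_per_query: list[list[str]], max_chunks: int) -> list[str]:
    """Keep the chunks retrieved by the most queries, up to max_chunks."""
    votes: Counter = Counter()
    for results in results_per_query:
        # Count each chunk once per query, preserving first-seen order.
        votes.update(list(dict.fromkeys(results)))
    # sorted() is stable, so ties keep the order in which chunks were first seen.
    ranked = sorted(votes, key=lambda doc_id: -votes[doc_id])
    return ranked[:max_chunks]

# Hypothetical ids of the chunks returned for three generated queries.
results_per_query = [["a", "b"], ["b", "c"], ["b", "a", "d"]]
kept = top_docs_by_votes(results_per_query, max_chunks=2)
print(kept)
```

The cap (`max_chunks`) is exactly the kind of parameter the first drawback says has to be tuned through human evaluation.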