In [None]:
! pip install langchain_community tiktoken langchain-openai langchainhub chromadb langchain
! pip install langchain-google-genai

## RAG Query Transformation
* Query transformations are a set of approaches that rewrite and/or modify the user's question to improve retrieval.

In [None]:
# Trace the notebook's runs in LangChain using LangSmith
import os

os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
os.environ['LANGCHAIN_API_KEY'] = "LANGCHAIN_API_KEY"  # replace with your LangSmith API key
os.environ['LANGCHAIN_PROJECT'] = "Query Transformation"  # must be an env var to take effect

In [None]:
import google.generativeai as genai # can also use openai
from langchain_google_genai import GoogleGenerativeAIEmbeddings

os.environ["GOOGLE_API_KEY"] = "YOUR_GOOGLE_API_KEY"
embeddings = GoogleGenerativeAIEmbeddings(model = "models/embedding-001")

In [None]:
vectors = embeddings.embed_query("MY name is Aditya")
vectors[:5]

[0.010893070138990879,
 -0.017837276682257652,
 -0.03341979160904884,
 -0.03625664487481117,
 0.057129014283418655]

### Indexing

We load and process a blog post from a specific URL using the `WebBaseLoader` from the `langchain_community.document_loaders` module and the `BeautifulSoup` library.



In [None]:
# Load the blog post, keeping only the title, header and body content
import bs4
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)

blog_docs=loader.load()
blog_docs

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='\n\n      LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview#\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistake

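* Next, we split the long document (the blog post) into smaller chunks using `RecursiveCharacterTextSplitter` from the `langchain` library.

 **RecursiveCharacterTextSplitter (from langchain.text_splitter):**
   - The `RecursiveCharacterTextSplitter` class breaks long text into chunks that aim to stay within `chunk_size` characters. It first tries to split on the coarsest separator (by default paragraph breaks `"\n\n"`, then newlines, then spaces, then individual characters) and only falls back to finer separators for pieces that are still too large, so chunks tend to respect natural text boundaries. Consecutive chunks share up to `chunk_overlap` characters so that context is not lost at chunk boundaries.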
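To make the "recursive" part concrete, here is a minimal pure-Python sketch of the idea, assuming the default separator order `"\n\n"`, `"\n"`, `" "`, `""`. It is not LangChain's implementation: the helper `recursive_split` is hypothetical and ignores `chunk_overlap` for brevity.

```python
def recursive_split(text: str, chunk_size: int,
                    separators: tuple[str, ...] = ("\n\n", "\n", " ", "")) -> list[str]:
    """Split text on the coarsest separator, recursing into pieces that are too large."""
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, *rest = separators
    if sep == "":
        # Last resort: cut into fixed-size character windows
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate          # keep merging small pieces together
            continue
        if current:
            chunks.append(current)       # flush the chunk built so far
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece alone is too big: split it with the next, finer separator
            chunks.extend(recursive_split(piece, chunk_size, tuple(rest)))
            current = ""
    if current:
        chunks.append(current)
    return chunks

print(recursive_split("aaa bbb\n\nccc ddd eee", chunk_size=7))
# ['aaa bbb', 'ccc ddd', 'eee']
```

The paragraph break is used first, and only the second paragraph, which is too long on its own, gets split further on spaces.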
   


In [None]:
# Split the blog into chunks
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=50,
)
splits = text_splitter.split_documents(blog_docs)

In [None]:
splits[:2]

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng'),
 Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it')]

* Now we embed the text chunks, store them in a vector database (Chroma), and set up a retriever that performs similarity searches, using `langchain` and Google's Generative AI embeddings.

**GoogleGenerativeAIEmbeddings:**
   - Wraps Google's Generative AI embedding model. Embeddings are numerical representations (vectors) of text that capture its semantic meaning, so texts with similar meaning end up close together in vector space.
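A vector-store retriever essentially returns the stored chunks whose embeddings are most similar to the query's embedding. The sketch below shows this with hand-made 3-dimensional vectors and cosine similarity; the vectors and the `top_k` helper are hypothetical, and Chroma's actual default distance metric and indexing differ.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], store: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k chunk texts whose vectors are most similar to the query vector."""
    ranked = sorted(store, key=lambda text: cosine_similarity(query_vec, store[text]), reverse=True)
    return ranked[:k]

# Toy "embeddings": in practice these come from embed_documents / embed_query
store = {
    "Task decomposition splits a task into subgoals.": [0.9, 0.1, 0.0],
    "Agents use memory to store past observations.": [0.1, 0.9, 0.1],
    "Planning involves breaking down complex tasks.": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]  # pretend embedding of "What is task decomposition?"
print(top_k(query, store, k=2))
```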
   

In [None]:
# Embed the chunks and store them in a Chroma vector store
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import Chroma

vectstore = Chroma.from_documents(
    documents=splits,
    embedding = GoogleGenerativeAIEmbeddings(model = "models/embedding-001")
)

retriever = vectstore.as_retriever()

## Prompt
Here we define a **multi-query** prompt, i.e. a template that asks the model to generate multiple versions of the user's question. Retrieving with several phrasings helps overcome the limitations of distance-based similarity search in vector databases, where a single wording of the question may miss relevant chunks.

**ChatPromptTemplate:**
   - Creates a prompt template for a chat model. It structures the input sent to the model; here, the instruction to rewrite a question in several different ways.
  


In [None]:
from langchain.prompts import ChatPromptTemplate

template = """You are an AI language model assistant. Your task is to generate five
different versions of the given user question to retrieve relevant documents from a vector
database. By generating multiple perspectives on the user question, your goal is to help
the user overcome some of the limitations of the distance-based similarity search.
Provide these alternative questions separated by newlines. Original question: {question}"""

prompt_perspective = ChatPromptTemplate.from_template(template)


* Here we set up a pipeline that generates multiple versions of a user's question with a generative AI model, parses the model's response, and splits it into individual questions. These alternative queries are then used for similarity search to improve document retrieval.

 **StrOutputParser:** - The `StrOutputParser` class extracts the plain string content from the chat model's response message, so the next step can process it as ordinary text.
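Models sometimes number their list or leave blank lines, and a plain `split("\n")` then passes empty strings or "1. ..." prefixes to the retriever. A hypothetical `parse_queries` helper (our own, not part of LangChain) could clean the output:

```python
import re

def parse_queries(text: str) -> list[str]:
    """Split LLM output into one query per line, dropping blank lines and list markers."""
    queries = []
    for line in text.split("\n"):
        # Remove a leading bullet ("-", "*") or number ("1.", "2)") plus surrounding spaces
        cleaned = re.sub(r"^\s*(?:[-*]|\d+[.)])\s*", "", line).strip()
        if cleaned:
            queries.append(cleaned)
    return queries

raw = "1. What is task decomposition?\n\n2) How do agents plan?\n- Define subgoals in LLM agents"
print(parse_queries(raw))
# ['What is task decomposition?', 'How do agents plan?', 'Define subgoals in LLM agents']
```

In a chain, `| parse_queries` could replace the final lambda step, since LangChain wraps plain functions used in a pipe as runnables.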


In [None]:
from langchain_core.output_parsers.string import StrOutputParser
from langchain_google_genai import ChatGoogleGenerativeAI

# Build the pipeline: prompt -> LLM -> plain string -> list of non-empty queries
generate_queries = (
    prompt_perspective
    | ChatGoogleGenerativeAI(model="gemini-1.5-pro", temperature=0)
    | StrOutputParser()
    | (lambda x: [q.strip() for q in x.split("\n") if q.strip()])
)

* Next, we define a process that generates several versions of the user's query, retrieves relevant documents for each version with the retriever, and returns only the unique documents across all of them.

 **dumps and loads:** - `dumps` serializes a LangChain object (e.g. a `Document`) into a JSON string, while `loads` deserializes such a string back into the object. Because the strings are hashable, they can be put in a `set` to remove duplicate documents before being converted back.
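The serialize-then-deduplicate trick can be seen without LangChain. Below, plain dicts stand in for `Document` objects and `json.dumps(..., sort_keys=True)` stands in for LangChain's `dumps` (an assumption for illustration). Unlike the `set` version, this sketch also keeps the first-seen order.

```python
import json

def unique_union(results: list[list[dict]]) -> list[dict]:
    """Flatten per-query results and drop duplicates, keeping first-seen order."""
    seen = set()
    unique = []
    for sublist in results:
        for doc in sublist:
            key = json.dumps(doc, sort_keys=True)  # hashable, key-order-independent string
            if key not in seen:
                seen.add(key)
                unique.append(json.loads(key))     # deserialize back into a dict
    return unique

doc_a = {"page_content": "Task decomposition...", "metadata": {"source": "blog"}}
doc_b = {"page_content": "Subgoal and decomposition...", "metadata": {"source": "blog"}}
results = [[doc_a, doc_b], [doc_b], [doc_a]]
print(len(unique_union(results)))  # 2
```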
  


In [None]:
from langchain.load import dumps, loads

def get_unique_union(documents: list[list]):
    """Flatten the per-query result lists and remove duplicate documents."""
    # Serialize each Document so it can be stored in a set
    flattened_docs = [dumps(doc) for sublist in documents for doc in sublist]
    unique_docs = list(set(flattened_docs))
    # Deserialize back into Document objects
    return [loads(doc) for doc in unique_docs]

# Define and run the retrieval chain:
# generate queries -> retrieve for each query -> deduplicate
question = "What is task decomposition in agents?"
retriver_chain = generate_queries | retriever.map() | get_unique_union
docs = retriver_chain.invoke({"question": question})
len(docs), docs



(3,
 [Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Subgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.'),
  Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.'),
  Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#')])

### Retrieval-Augmented Generation (RAG)
Now we build a **RAG** pipeline in which the generative model (Google's `gemini-1.5-pro`) answers the user's question using the retrieved documents as context. Combining retrieval with generation grounds the answer in the source material.

**itemgetter (from operator):**
   - `itemgetter('question')` returns a function that extracts the value stored under the `question` key of the input dictionary.
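In the chain below, the dictionary at the start acts as a parallel step: every value receives the same input dict, and the results are collected under the same keys before being passed to the prompt. A plain-Python sketch of that behavior, with a hypothetical `fake_retriever` standing in for the retrieval chain:

```python
from operator import itemgetter

def fake_retriever(inputs: dict) -> list[str]:
    """Hypothetical stand-in for the retrieval chain: returns 'documents' for the question."""
    return [f"doc about {inputs['question']}"]

# Each value is a callable applied to the same input, like LangChain's dict step
steps = {
    "context": fake_retriever,
    "question": itemgetter("question"),
}

inputs = {"question": "task decomposition"}
prompt_inputs = {key: step(inputs) for key, step in steps.items()}
print(prompt_inputs)
# {'context': ['doc about task decomposition'], 'question': 'task decomposition'}
```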


In [None]:
from operator import itemgetter
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.runnables import RunnablePassthrough

# Create the template to generate output
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

llm = ChatGoogleGenerativeAI(
    model="gemini-1.5-pro",
    temperature=0,
    max_tokens=500,
    api_key=os.getenv("GOOGLE_API_KEY"),  # the key set earlier in GOOGLE_API_KEY
)

# Full RAG chain flow
final_rag_chain = (
    {'context': retriver_chain,  # relevant documents come from the multi-query retrieval chain defined above
     'question': itemgetter('question')}
    |prompt
    |llm
    |StrOutputParser()
)

# Running the rag chain
final_rag_chain.invoke({'question':question})

### RAG Fusion
* The goal is to generate multiple search queries from a single input query, retrieve documents for each, and merge the ranked results.

**RAG-Fusion** generates multiple search queries from a single user question and then combines the per-query rankings with Reciprocal Rank Fusion. This broadens the search, and documents that rank well for several phrasings of the question end up at the top.


In [None]:
from langchain.prompts import ChatPromptTemplate

# RAG-Fusion
template = """You are a helpful assistant that generates multiple search queries based on a single input query. \n
Generate multiple search queries related to: {question} \n
Output (4 queries):"""

prompt_rag_fusion = ChatPromptTemplate.from_template(template)

* Again we create a pipeline that generates multiple search queries from an input question: it fills in the **RAG-Fusion** prompt, sends it to the generative AI model, and splits the response into a list of queries.


In [None]:
from langchain_core.output_parsers.string import StrOutputParser
from langchain_google_genai import ChatGoogleGenerativeAI

# Define the pipeline
generate_queries = (
    prompt_rag_fusion
    | ChatGoogleGenerativeAI(model="gemini-1.5-pro", temperature=0)
    | StrOutputParser()
    | (lambda x: [q.strip() for q in x.split("\n") if q.strip()])
)



* Here we use **Reciprocal Rank Fusion (RRF)**, a method for combining multiple ranked lists of documents into a single ranked list. Each document's fused score is the sum of `1 / (rank + k)` over every list it appears in, so documents ranked highly by several queries rise to the top.
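A small worked example of the formula, using plain string IDs instead of `Document`s and the same `k=60` and 0-based ranks as the function below. The helper name `rrf_scores` is ours, for illustration only.

```python
def rrf_scores(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: sum 1 / (rank + k) over every list a doc appears in."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):  # rank starts at 0, as in the notebook
            scores[doc_id] = scores.get(doc_id, 0.0) + 1 / (rank + k)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Three queries returned these rankings
rankings = [["a", "b", "c"], ["b", "a"], ["b", "c"]]
for doc_id, score in rrf_scores(rankings):
    print(doc_id, round(score, 4))
# b 0.0497
# a 0.0331
# c 0.0325
```

`b` wins because two queries rank it first; `a` edges out `c` because it reaches rank 0 once, while `c` never does.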

In [None]:
from langchain.load import dumps, loads

def reciprocal_rank_fusion(results: list[list], k=60):
    """Reciprocal Rank Fusion over multiple ranked lists of documents.

    `k` is the smoothing constant in the RRF formula; larger values reduce
    the advantage of the top-ranked positions.
    """
    fused_scores = {}

    for docs in results:
        for rank, doc in enumerate(docs):
            doc_str = dumps(doc)  # serialize the doc so it can be used as a dict key
            if doc_str not in fused_scores:
                fused_scores[doc_str] = 0  # first time this doc is seen
            fused_scores[doc_str] += 1 / (rank + k)

    # Sort by fused score (highest first) and deserialize the docs
    reranked_results = [
        (loads(doc), score)
        for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    ]
    return reranked_results

In [None]:
# define chain
retriver_chain_rag_fusion = generate_queries | retriever.map() | reciprocal_rank_fusion
docs = retriver_chain_rag_fusion.invoke({"question":question})
len(docs),docs



(2,
 [(Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#'),
   0.19518647226209362),
  (Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Subgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.'),
   0.06506215742069787)])


Let's set up a **RAG** pipeline that answers the user's question based on the context retrieved by the RAG-Fusion chain.

 **RunnablePassthrough:** - `RunnablePassthrough` passes its input through the pipeline unchanged, which is useful when a step needs the original input alongside computed values.

In [None]:
from langchain_core.runnables import RunnablePassthrough

# RAG
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

# complete rag pipeline
final_rag_chain =(
    {'context':retriver_chain_rag_fusion,
     'question':itemgetter('question')}
    |prompt
    |llm
    |StrOutputParser()
)
final_rag_chain.invoke({"question":question})


## Decomposition
 **Decomposition** breaks down a complex question into simpler sub-questions or sub-problems that can be answered individually. This is particularly useful for complex inquiries that need several pieces of information, and each sub-question also serves as a more focused search query.

In [None]:
from langchain.prompts import ChatPromptTemplate

# Decomposition
template = """You are a helpful assistant that generates multiple sub-questions related to an input question. \n
The goal is to break down the input into a set of sub-problems / sub-questions that can be answered in isolation. \n
Generate multiple search queries related to: {question} \n
Output (3 queries):"""

prompt_decomposition = ChatPromptTemplate.from_template(template)

In [None]:
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.output_parsers import StrOutputParser

# llm
llm = ChatGoogleGenerativeAI(
        model="gemini-1.5-pro",
        temperature=0,
        max_tokens=500,
    )

# chain
generate_queries_decomposition = (
    prompt_decomposition
    | llm
    | StrOutputParser()
    | (lambda x: [q.strip() for q in x.split("\n") if q.strip()])
)

# invoke query
question = "What are the main components of an LLM-powered autonomous agent system?"
questions = generate_queries_decomposition.invoke({"question":question})

## Answering recursively
* The sub-questions (Q1, Q2, Q3) are answered one after another, and each answer is fed into the prompt for the next sub-question.
* Here we define a **prompt** template that combines several sources of information (the question, background question-answer pairs from earlier sub-questions, and retrieved context) and instructs the language model to use all of it to answer the question.




In [None]:
# Prompt
template = """Here is the question you need to answer:
\n --- \n {question} \n --- \n
Here is any available background question + answer pairs:
\n --- \n {q_a_pairs} \n --- \n
Here is additional context relevant to the question:
\n --- \n {context} \n --- \n
Use the above context and any background question + answer pairs to answer the question: \n {question}
"""

decomposition_prompt = ChatPromptTemplate.from_template(template)

In [None]:
decomposition_prompt

ChatPromptTemplate(input_variables=['context', 'q_a_pairs', 'question'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'q_a_pairs', 'question'], input_types={}, partial_variables={}, template='Here is the question you need to answer:\n\n --- \n {question} \n --- \n\nHere is any available background question + answer pairs:\n\n --- \n {q_a_pairs} \n --- \n\nHere is additional context relevant to the question: \n\n --- \n {context} \n --- \n\nUse the above context and any background question + answer pairs to answer the question: \n {question}\n'), additional_kwargs={})])

In [None]:
from langchain_core.output_parsers import StrOutputParser

# Format a single question and its answer as one Q/A string
def format_qa(question, answer):
    format_string = ""
    format_string += f"Question: {question}\nAnswer: {answer}\n\n"
    return format_string.strip()

In [None]:
# LLM
llm = ChatGoogleGenerativeAI(
        model="gemini-1.5-pro",
        temperature=0,
        max_tokens=500,
        api_key=os.getenv("GOOGLE_API_KEY")
    )


* This loop answers the sub-questions in order using a **RAG** chain, formats each question-answer pair, and appends it to a growing `q_a_pairs` string that is passed as background to the next sub-question.
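Stripped of the retriever and LLM, the loop is a fold over the sub-questions: each answer is produced with the accumulated Q/A string as background. The sketch below uses a hypothetical `fake_answer` function in place of `rag_chain.invoke` (and its own `format_pair` helper, so the notebook's `format_qa` is not overwritten), just to show how `q_a_pairs` grows.

```python
def format_pair(question: str, answer: str) -> str:
    """Format one question/answer pair (same layout as the notebook's format_qa)."""
    return f"Question: {question}\nAnswer: {answer}"

def fake_answer(question: str, q_a_pairs: str) -> str:
    """Hypothetical stand-in for rag_chain.invoke: reports how many earlier pairs it saw."""
    seen = q_a_pairs.count("Question:")
    return f"answer to '{question}' using {seen} earlier pair(s)"

def answer_recursively(questions: list[str]) -> tuple[str, str]:
    """Answer questions in order, feeding all previous Q/A pairs into each step."""
    q_a_pairs = ""
    answer = ""
    for q in questions:
        answer = fake_answer(q, q_a_pairs)  # earlier answers are visible here
        q_a_pairs = q_a_pairs + "\n---\n" + format_pair(q, answer)
    return answer, q_a_pairs

last_answer, background = answer_recursively(["Q1", "Q2", "Q3"])
print(last_answer)  # answer to 'Q3' using 2 earlier pair(s)
```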

In [None]:
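from operator import itemgetter

# Accumulated background Q/A pairs, passed into the prompt for each new sub-question
q_a_pairs = ""

# RAG chain: retrieve context for the sub-question, then answer it using the background pairs
rag_chain = (
    {'context': itemgetter('question') | retriever,
     'question': itemgetter('question'),
     'q_a_pairs': itemgetter('q_a_pairs')}
    | decomposition_prompt
    | llm
    | StrOutputParser()
)

for q in questions:
    answer = rag_chain.invoke({"question": q, "q_a_pairs": q_a_pairs})
    q_a_pair = format_qa(q, answer)
    q_a_pairs = q_a_pairs + "\n---\n" + q_a_pair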

In [None]:
answer

## Answering each sub-question individually

* Here we answer each sub-question generated from the main question independently using **RAG**: the system retrieves relevant documents for each sub-question and then generates an answer with the language model. The answers are combined at the end.


In [None]:
# Answer each sub-question individually
from langchain import hub
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser

# RAG prompt from the LangChain Hub (expects `context` and `question`)
prompt_rag = hub.pull("rlm/rag-prompt")

def retrieve_and_rag(question, prompt_rag, sub_question_generation_chain):
    """Generate sub-questions, then run RAG on each of them."""
    sub_questions = sub_question_generation_chain.invoke({'question': question})

    rag_results = []
    for sub_question in sub_questions:
        # Retrieve documents for this sub-question
        retrieved_docs = retriever.invoke(sub_question)

        # Answer the sub-question from the retrieved documents
        answer = (
            prompt_rag
            | llm
            | StrOutputParser()
        ).invoke({
            'context': retrieved_docs,
            'question': sub_question
        })
        rag_results.append(answer)
    return rag_results, sub_questions

# Run retrieval + RAG for every sub-question of the main question
answers, questions = retrieve_and_rag(question, prompt_rag, generate_queries_decomposition)


In [None]:
def format_qa_pairs(questions, answers):
    """Format Q and A pairs"""

    formatted_string = ""
    for i, (question, answer) in enumerate(zip(questions, answers), start=1):
        formatted_string += f"Question {i}: {question}\nAnswer {i}: {answer}\n\n"
    return formatted_string.strip()

context = format_qa_pairs(questions, answers)


In [None]:
# Prompt
template = """Here is a set of Q+A pairs:

{context}

Use these to synthesize an answer to the question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
    prompt
    | llm
    | StrOutputParser()
)

final_rag_chain.invoke({"context":context,"question":question})

## Step-back prompting
* Step-back prompting asks the model to rephrase a specific question as a more generic "step-back" question, and then retrieves context for both. We use a few-shot approach: a couple of input-output examples show the model how to turn a specific question into a broader one.
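Conceptually, `FewShotChatMessagePromptTemplate` turns each example into a human message followed by an AI message, and the final prompt places these between the system instruction and the new question. A plain-Python sketch of that layout using `(role, text)` tuples; the helper `build_step_back_messages` is hypothetical, and the real classes produce LangChain message objects.

```python
def build_step_back_messages(system: str, shots: list[dict], question: str) -> list[tuple[str, str]]:
    """Lay out a few-shot chat prompt as (role, text) pairs."""
    messages = [("system", system)]
    for shot in shots:
        messages.append(("human", shot["input"]))   # example question
        messages.append(("ai", shot["output"]))     # its step-back paraphrase
    messages.append(("human", question))            # the new question to paraphrase
    return messages

shots = [{"input": "Could the members of The Police perform lawful arrests?",
          "output": "what can the members of The Police do?"}]
messages = build_step_back_messages("Paraphrase to a more generic question.", shots,
                                    "What is task decomposition for LLM agents?")
for role, text in messages:
    print(f"{role}: {text}")
```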


In [None]:
# Few Shot Examples
from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate
examples = [
    {
        "input": "Could the members of The Police perform lawful arrests?",
        "output": "what can the members of The Police do?",
    },
    {
        "input": "Jan Sindel’s was born in what country?",
        "output": "what is Jan Sindel’s personal history?",
    },
]


In [None]:
# We now transform these to example messages
example_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}"),
        ("ai", "{output}"),
    ]
)
few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

In [None]:
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are an expert at world knowledge. Your task is to step back and paraphrase a question to a more generic step-back question, which is easier to answer. Here are a few examples:""",
        ),
        # Few shot examples
        few_shot_prompt,
        # New question
        ("user", "{question}"),
    ]
)

In [None]:
generate_queries_step_back = prompt | ChatGoogleGenerativeAI(model="gemini-1.5-pro", temperature=0) | StrOutputParser()

question = "What is task decomposition for LLM agents?"

generate_queries_step_back.invoke({"question": question})



'what is task decomposition?'

In [None]:
# Response prompt
response_prompt_template = """You are an expert of world knowledge. I am going to ask you a question. Your response should be comprehensive and not contradicted with the following context if they are relevant. Otherwise, ignore them if they are not relevant.

# {normal_context}
# {step_back_context}

# Original Question: {question}
# Answer:"""

response_prompt = ChatPromptTemplate.from_template(response_prompt_template)

chain = (
    {
        # Retrieve context using the normal question
        "normal_context": RunnableLambda(lambda x: x["question"]) | retriever,
        # Retrieve context using the step-back question
        "step_back_context": generate_queries_step_back | retriever,
        # Pass on the question
        "question": lambda x: x["question"],
    }
    | response_prompt
    | ChatGoogleGenerativeAI(model="gemini-1.5-pro",temperature=0, api_key=os.getenv("GOOGLE_API_KEY"))
    | StrOutputParser()
)

chain.invoke({"question": question})



'Task decomposition for LLM agents is the process of breaking down a complex task into smaller, more manageable subtasks or subgoals. This is crucial for enabling LLMs to effectively handle intricate tasks that would be difficult or impossible to complete in a single step.  The decomposition process creates a sequence of simpler actions that the LLM can execute sequentially to achieve the overall objective.\n\nHere\'s a breakdown of how task decomposition works and why it\'s important:\n\n**Why is Task Decomposition Important?**\n\n* **Manageability:** Complex tasks can be overwhelming for LLMs. Decomposition simplifies the problem by dividing it into digestible chunks.\n* **Improved Performance:**  Smaller, focused subtasks are easier for LLMs to understand and execute correctly, leading to improved overall performance and accuracy.\n* **Efficient Execution:**  Breaking down a task allows for parallel processing of subtasks where possible, speeding up the overall completion time.\n* *

### HyDE
* At a high level, HyDE (Hypothetical Document Embeddings) takes a query, asks an LLM to generate a hypothetical document that answers it, embeds that generated document, and uses its embedding (instead of the query's) to search the vector store. The idea is that an answer-like passage is often closer in embedding space to the relevant chunks than the short question is.
* LangChain also provides a `HypotheticalDocumentEmbedder` class that takes a base embedding model and an LLM chain for generating the documents; it ships with default prompts (see the HyDE paper for details), which can be replaced with our own. Here we build the same idea directly with a custom prompt and chain.
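The retrieval side of HyDE can be sketched without an LLM or embedding API. Below, a hypothetical `generate_hypothetical_doc` returns a canned passage (standing in for the LLM), and a bag-of-words count vector stands in for the embedding model; the point is only that the search uses the embedding of the generated passage rather than of the question.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().replace(".", "").replace("?", "").split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def generate_hypothetical_doc(question: str) -> str:
    """Hypothetical stand-in for the LLM call that writes an answer-like passage."""
    return "Agents break a complex task into smaller subgoals and plan each step."

def hyde_search(question: str, chunks: list[str]) -> str:
    """Return the chunk most similar to the hypothetical answer passage."""
    hypothetical = generate_hypothetical_doc(question)
    query_vec = embed(hypothetical)  # embed the generated passage, not the question
    return max(chunks, key=lambda chunk: cosine(query_vec, embed(chunk)))

chunks = [
    "The agent breaks down large tasks into smaller subgoals.",
    "Memory stores past observations for later retrieval.",
]
print(hyde_search("What is task decomposition?", chunks))
```

With this toy embedding, the raw question shares no words with either chunk, while the hypothetical passage shares "into", "smaller" and "subgoals" with the first one.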

In [None]:
from langchain.prompts import ChatPromptTemplate

# HyDE: prompt for generating a hypothetical document
template = """Please write a scientific paper passage to answer the question
Question: {question}
Passage:"""
prompt_hyde = ChatPromptTemplate.from_template(template)

In [None]:
from langchain_core.output_parsers import StrOutputParser
from langchain_google_genai import ChatGoogleGenerativeAI

generate_docs_for_retrieval = (
    prompt_hyde |ChatGoogleGenerativeAI(model="gemini-1.5-pro",temperature=0, api_key=os.getenv("GOOGLE_API_KEY")) | StrOutputParser()
)

# Run
question = "What is task decomposition for LLM agents?"
generate_docs_for_retrieval.invoke({"question":question})



"Task decomposition for Large Language Model (LLM) agents refers to the process of breaking down a complex, high-level task into a sequence of smaller, more manageable subtasks that the LLM can execute effectively.  This decomposition is crucial for enabling LLMs to perform complex, real-world tasks that extend beyond simple prompt completion.  Instead of directly attempting to solve the overarching objective, the agent utilizes a decomposition strategy to generate a plan of action consisting of these subtasks.  Each subtask is typically designed to be sufficiently granular to be directly addressable by the LLM through a single prompt or a short sequence of interactions.  The successful completion of these individual subtasks, executed sequentially or in parallel as dictated by the decomposition strategy, cumulatively contributes to the accomplishment of the original, complex task.  This approach mitigates the cognitive limitations of LLMs when confronted with intricate problems, impro

In [None]:
# Retrieve
retrieval_chain = generate_docs_for_retrieval | retriever
retireved_docs = retrieval_chain.invoke({"question":question})
retireved_docs



[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#'),
 Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#'),
 Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#'),
 Document(metadata={'source': 'https://lilianweng.github.io/post

In [None]:
# RAG
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
    prompt
    | llm
    | StrOutputParser()
)

final_rag_chain.invoke({"context":retireved_docs,"question":question})



'The provided context mentions "Task Decomposition" as part of the planning component for LLM-powered autonomous agents.  It states that a complicated task usually involves many steps, and the agent needs to know these steps and plan ahead.  However, it doesn\'t define what task decomposition *is* within that context, only that it\'s necessary for planning.  More information is needed to fully explain task decomposition for LLM agents.'