# WS24 - Intelligente Informationssysteme

## Block 3: Retrieval Augmented Generation

**Part 4: Advanced Retrieval - Question Translation**

1. Multi Query
2. RAG-Fusion
3. Decomposition

In [1]:
## FIRST: Initialize the VectorDB and LLM
from langchain_ollama import OllamaEmbeddings
from langchain_ollama import ChatOllama
from langchain_chroma import Chroma
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

embeddings = OllamaEmbeddings(model="nomic-embed-text")
vectorstore = Chroma(persist_directory="vector_store", collection_name="lils_blogs", embedding_function=embeddings)
retriever = vectorstore.as_retriever(search_type="similarity") #, search_kwargs={"k": 2})

llm = ChatOllama(model="llama3.2:latest", temperature=0)

## 1. Question Translation: Multi Query

Multi Query: Rewrite the input question from several perspectives, retrieve documents for each rewrite, and merge the results into one de-duplicated set.

![Multi Query](./media/LangChain_Multi_Query.png "Multi Query")


see: 
- https://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_5_to_9.ipynb
- https://python.langchain.com/docs/how_to/MultiQueryRetriever/
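The merge step can be sketched without any LangChain dependency. The snippet below is only an illustration: the chunk ids and the helper name `unique_union` are hypothetical, and plain strings stand in for retrieved documents. It assumes that keeping the first-seen order is desirable, which a plain `set` would not guarantee.

```python
def unique_union(result_lists):
    """Flatten per-query results and drop duplicates, keeping first-seen order."""
    flattened = [doc for results in result_lists for doc in results]
    # dict.fromkeys removes duplicates while preserving insertion order
    return list(dict.fromkeys(flattened))


# Hypothetical retrieval results for three rewrites of one question
results_per_query = [
    ["chunk_1", "chunk_2"],
    ["chunk_2", "chunk_3"],
    ["chunk_1", "chunk_4"],
]

merged = unique_union(results_per_query)
print(merged)
```

Each rewrite contributes the chunks the others missed, while chunks found by several rewrites are kept only once.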

In [3]:
### Prompt

template = """You are an AI language model assistant. Your task is to generate five 
different versions of the given user question to retrieve relevant documents from a vector 
database. By generating multiple perspectives on the user question, your goal is to help
the user overcome some of the limitations of the distance-based similarity search. 
Provide these alternative questions separated by newline. Original question: {question}"""

prompt_template = ChatPromptTemplate.from_template(template)


In [4]:
question = "What is Task Decomposition?"
prompt_template.invoke(question)

ChatPromptValue(messages=[HumanMessage(content='You are an AI language model assistant. Your task is to generate five \ndifferent versions of the given user question to retrieve relevant documents from a vector \ndatabase. By generating multiple perspectives on the user question, your goal is to help\nthe user overcome some of the limitations of the distance-based similarity search. \nProvide these alternative questions separated by newline. Original question: What is Task Decomposition?', additional_kwargs={}, response_metadata={})])

In [5]:
# LineListOutputParser splits the LLM result into a list of queries (one per non-empty line)

from typing import List
from langchain_core.output_parsers import BaseOutputParser

class LineListOutputParser(BaseOutputParser[List[str]]):
    """Output parser for a list of lines."""

    def parse(self, text: str) -> List[str]:
        lines = text.strip().split("\n")
        return list(filter(None, lines))  # Remove empty lines

output_parser = LineListOutputParser()

In [10]:
llm_chain = prompt_template | llm

response = llm_chain.invoke(question)
response.content

'Here are five different versions of the original question:\n\n1. Can you explain the concept of breaking down complex tasks into smaller, manageable components?\n\n2. How does task decomposition work in project management and what benefits does it provide?\n\n3. What is the purpose of decomposing a large task into smaller sub-tasks to improve efficiency and productivity?\n\n4. Is there a specific methodology or framework for task decomposition that you can recommend?\n\n5. Can you provide an example of how task decomposition has been applied in real-world scenarios, such as software development or operations management?'

In [11]:
generate_queries = (
    prompt_template 
    | llm
    | output_parser
)

In [18]:
question = "Explain Task Decomposition step by step?"

multi_queries = generate_queries.invoke(question)
multi_queries

['Here are five different versions of the original question:',
 '1. What is the process of task decomposition, and how can it be applied to break down complex tasks into manageable components?',
 '2. Can you provide a detailed explanation of task decomposition, including its benefits and common techniques used in this approach?',
 '3. How does task decomposition work, and what are some key considerations when decomposing a task into smaller sub-tasks?',
 '4. What is the purpose of task decomposition, and how can it be used to improve project management, team collaboration, and individual productivity?',
 '5. Break down the task decomposition process step by step, including identifying goals, analyzing tasks, and creating a plan for implementation.',
 'These alternative questions aim to capture different aspects of the original question, such as the process, benefits, considerations, purpose, and application of task decomposition. By generating multiple perspectives, we can help users l

In [19]:
######### Retrieve docs for each query and aggregate
from langchain.load import dumps, loads

def get_unique_union(documents: list[list]):
    """ Unique union of retrieved docs """
    # Flatten list of lists, and convert each Document to string
    flattened_docs = [dumps(doc) for sublist in documents for doc in sublist]
    # Get unique documents
    unique_docs = list(set(flattened_docs))
    # Return
    return [loads(doc) for doc in unique_docs]

multi_docs = []
for query in multi_queries:
    multi_docs.append(retriever.invoke(query))
print(len(multi_docs), "questions")

new_docs = get_unique_union(multi_docs)
print(len(new_docs))

7 questions
13
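The prompt asks for five queries, yet seven are retrieved: the model's introductory sentence and closing remark pass through `LineListOutputParser` as if they were queries. One possible remedy, sketched here under the assumption that the model numbers or bullets its alternatives, is to keep only list items and strip their markers. The helper name `extract_queries` and the sample lines are hypothetical; if a model returns unnumbered lines, this filter would drop everything, so it is not a general solution.

```python
import re

# Matches "1. text", "2) text", "- text", "* text" and captures the text
_LIST_ITEM = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s+(.*\S)\s*$")


def extract_queries(lines):
    """Keep only numbered or bulleted lines and strip the list marker."""
    queries = []
    for line in lines:
        match = _LIST_ITEM.match(line)
        if match:
            queries.append(match.group(1))
    return queries


sample = [
    "Here are five different versions of the original question:",
    "1. What is the process of task decomposition?",
    "2. How does task decomposition work?",
    "These alternative questions aim to capture different aspects.",
]

cleaned = extract_queries(sample)
print(cleaned)
```

Applied to the parser output, a filter like this would keep the retriever from spending calls on text that is not a question.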


In [20]:
########## Final RAG Generation ##########
from operator import itemgetter
from langchain_core.runnables import RunnablePassthrough, RunnableLambda

template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

context = "\n\n".join([doc.page_content for doc in new_docs])

final_rag_chain = (
    {"context": itemgetter("context"), 
     "question": itemgetter("question")} 
    | prompt
    | llm
    | StrOutputParser()
)

for answer in final_rag_chain.stream({"context": context, "question": question}):
    print(answer, end="")

Task decomposition is a technique used to break down complex tasks into smaller, more manageable subtasks. Here's a step-by-step explanation of task decomposition:

1. **Identify the complex task**: Start by identifying the complex task that needs to be broken down.
2. **Determine the subgoals**: Determine what the subgoals are for each subtask. Subgoals are specific, measurable outcomes that need to be achieved in order to complete the larger task.
3. **Break down the subtasks**: Break down each subgoal into smaller subtasks. These subtasks should still be related to the original complex task and should help achieve the overall goal.
4. **Identify dependencies**: Identify any dependencies between subtasks. This means determining which subtask needs to be completed before another can begin.
5. **Create a tree structure**: Create a tree structure that represents the relationships between subtasks. Each node in the tree represents a subtask, and the edges represent the dependencies betwe

In [21]:
###### Do the same with MultiQueryRetriever #######
from typing import List

from langchain_core.output_parsers import BaseOutputParser
from langchain_core.prompts import PromptTemplate

from langchain.retrievers.multi_query import MultiQueryRetriever

class LineListOutputParser(BaseOutputParser[List[str]]):
    """Output parser for a list of lines."""

    def parse(self, text: str) -> List[str]:
        lines = text.strip().split("\n")
        return list(filter(None, lines))  # Remove empty lines

output_parser = LineListOutputParser()

QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an AI language model assistant. Your task is to generate five 
    different versions of the given user question to retrieve relevant documents from a vector 
    database. By generating multiple perspectives on the user question, your goal is to help
    the user overcome some of the limitations of the distance-based similarity search. 
    Provide these alternative questions separated by newlines.
    Original question: {question}""",
)

llm = ChatOllama(model="llama3.2:latest", temperature=0)

# Chain
llm_chain = QUERY_PROMPT | llm | output_parser

multi_query_retriever = MultiQueryRetriever(
    retriever=vectorstore.as_retriever(), llm_chain=llm_chain, parser_key="lines"
)  # "lines" is the key (attribute name) of the parsed output

question = "What is Task Decomposition?"

# Results

new_docs = multi_query_retriever.invoke(question)
len(new_docs)

# Use the new docs and concatenate them to one context for final RAG Generation or use RAG Fusion

15

## 2. RAG-Fusion

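RAG-Fusion bridges the gap between what users explicitly ask and what they intend to ask: it generates several variants of the query, retrieves documents for each variant, and merges the ranked result lists with Reciprocal Rank Fusion (RRF).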

![RAG Fusion](./media/LangChain_RAG_Fusion.png "RAG Fusion")


see: 
- https://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_5_to_9.ipynb
- https://towardsdatascience.com/forget-rag-the-future-is-rag-fusion-1147298d8ad1
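For reference, the score that Reciprocal Rank Fusion assigns to a document can be written as follows. The symbols are introduced here for illustration: $L_i$ is the ranked list retrieved for the $i$-th query, $r_i(d)$ is the rank of document $d$ in $L_i$, and $k$ is a smoothing constant (60 in the code of this section).

```latex
\[
\operatorname{RRF}(d) \;=\; \sum_{i \,:\, d \in L_i} \frac{1}{k + r_i(d)}, \qquad k = 60
\]
```

Documents that appear near the top of many lists collect the largest sums. Note that the `reciprocal_rank_fusion` function in this section takes ranks from `enumerate`, so they start at 0, whereas the original formulation of RRF counts ranks from 1; the resulting order is the same, only the absolute scores differ slightly.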

In [22]:
##### Reciprocal Rank Fusion
from langchain.load import dumps, loads

def reciprocal_rank_fusion(results: list[list], k=60):
    """ Reciprocal_rank_fusion that takes multiple lists of ranked documents 
        and an optional parameter k used in the RRF formula """
    
    # Initialize a dictionary to hold fused scores for each unique document
    fused_scores = {}

    # Iterate through each list of ranked documents
    for docs in results:
        # Iterate through each document in the list, with its rank (position in the list)
        for rank, doc in enumerate(docs):
            # Convert the document to a string format to use as a key (assumes documents can be serialized to JSON)
            doc_str = dumps(doc)
            # If the document is not yet in the fused_scores dictionary, add it with an initial score of 0
            if doc_str not in fused_scores:
                fused_scores[doc_str] = 0
            # Add this list's contribution using the RRF formula: 1 / (rank + k), with rank starting at 0
            fused_scores[doc_str] += 1 / (rank + k)

    # Sort the documents based on their fused scores in descending order to get the final reranked results
    reranked_results = [
        (loads(doc), score)
        for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    ]

    # Return the reranked results as a list of tuples, each containing the document and its fused score
    return reranked_results
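
# NOTE: new_docs is a single, already de-duplicated list, so every document appears
# exactly once and the fused scores only mirror the existing order. To fuse across
# queries, pass one ranked list per generated query instead.
print(len(new_docs))
rrf_docs = reciprocal_rank_fusion([new_docs])  # expects a list of lists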
print(len(rrf_docs))

15
15
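Fusion only changes the order when a document can appear in several ranked lists. The following dependency-free sketch illustrates this with hypothetical document ids instead of LangChain `Document` objects; the function name `rrf_scores` and the three rankings are made up for the example, and ranks start at 0 as in the cell above.

```python
def rrf_scores(ranked_lists, k=60):
    """Fuse several ranked lists of ids with Reciprocal Rank Fusion (0-based ranks)."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank + k)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rankings returned for three variants of the same question
rankings = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a", "doc_d"],
    ["doc_b", "doc_c", "doc_a"],
]

fused = rrf_scores(rankings)
fused_ids = [doc_id for doc_id, _ in fused]
print(fused_ids)
```

Here `doc_b` overtakes `doc_a` because it is ranked first in two of the three lists, even though `doc_a` leads the first list.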


In [23]:
n = 5  # Use the top 5 reranked documents
new_docs = [doc for (doc, score) in rrf_docs[0:n]]  # each entry is a (document, fused score) tuple
# Use the new docs and concatenate them to one context for final RAG Generation!

## 3. Decomposition

Decompose the question into subqueries and answer recursively or individually:

![Decomposition: Answer recursively](./media/LangChain_Decomposition1.png "Decomposition")
![Decomposition: Answer individually](./media/LangChain_Decomposition2.png "Decomposition")

see: 
- https://github.com/langchain-ai/rag-from-scratch/blob/main/rag_from_scratch_5_to_9.ipynb
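The recursive variant can be pictured as a loop in which every sub-question is answered with the question-answer pairs collected so far as extra background. The sketch below shows only this control flow: `answer_recursively` and `stub_answer` are hypothetical names, and the stub stands in for the retrieval and LLM call, so nothing here depends on LangChain.

```python
def answer_recursively(sub_questions, answer_fn):
    """Answer sub-questions in order, feeding earlier Q/A pairs into later answers."""
    q_a_pairs = ""
    answer = ""
    for sub_question in sub_questions:
        # answer_fn sees the current sub-question plus all earlier pairs
        answer = answer_fn(sub_question, q_a_pairs)
        pair = f"Question: {sub_question}\nAnswer: {answer}"
        q_a_pairs = f"{q_a_pairs}\n---\n{pair}"
    # The answer to the last sub-question is treated as the final answer
    return answer, q_a_pairs


def stub_answer(sub_question, background):
    """Stand-in for retrieval + LLM: reports how many earlier pairs it received."""
    seen = background.count("Question:")
    return f"answer to '{sub_question}' using {seen} earlier pair(s)"


final_answer, history = answer_recursively(["Q1", "Q2", "Q3"], stub_answer)
print(final_answer)
```

In the individual variant, by contrast, each sub-question would be answered without the accumulated background, and the answers would be combined afterwards.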

In [24]:
# First decompose the question using the llm.

from langchain_core.prompts import ChatPromptTemplate

# Decomposition Prompt
#template = """You are a helpful assistant that generates multiple sub-questions related to an input question. \n
#The goal is to break down the input into a set of sub-problems / sub-questions that can be answered in isolation. \n
#Generate multiple search queries related to: {question}.\n
#Output (3 queries):"""

template = """You are a helpful assistant that generates multiple sub-questions related to an input question. \n
The goal is to break down the input into a set of sub-problems / sub-questions that can be answered in isolation. \n
Generate multiple search queries related to: {question} Return only the list of search queries.\n
Output (3 queries):"""

prompt_decomposition = ChatPromptTemplate.from_template(template)

In [25]:
from langchain_ollama import ChatOllama
from langchain_core.output_parsers import StrOutputParser

# LLM
llm = ChatOllama(model="llama3.2:latest", temperature=0)

# Chain
generate_queries_decomposition = ( prompt_decomposition | llm | StrOutputParser() | (lambda x: x.split("\n")))

# Run
question = "What are the main components of an LLM-powered autonomous agent system?"
questions = generate_queries_decomposition.invoke({"question":question})

In [27]:
questions

['1. What are the key components of a large language model (LLM) used in autonomous decision-making?',
 '2. How do LLMs contribute to the development of autonomous agents with natural language understanding and processing capabilities?',
 '3. What role do reinforcement learning algorithms play in integrating LLMs with autonomous agent systems for optimal performance?']

#### Answer Recursively

In [28]:
# Final RAG Prompt
template = """Here is the question you need to answer:

\n --- \n {question} \n --- \n

Here is any available background question + answer pairs:

\n --- \n {q_a_pairs} \n --- \n

Here is additional context relevant to the question: 

\n --- \n {context} \n --- \n

Use the above context and any background question + answer pairs to answer the question: \n {question}
"""

decomposition_prompt = ChatPromptTemplate.from_template(template)

In [29]:
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

def format_qa_pair(question, answer):
    """Format Q and A pair"""
    
    formatted_string = ""
    formatted_string += f"Question: {question}\nAnswer: {answer}\n\n"
    return formatted_string.strip()

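# llm (ChatOllama caps the number of generated tokens via num_predict; max_length is not one of its parameters)
llm = ChatOllama(model="llama3.2:latest", temperature=0, num_predict=400)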

q_a_pairs = ""
for q in questions:
    
    rag_chain = (
        {"context": itemgetter("question") | retriever,
         "question": itemgetter("question"),
         "q_a_pairs": itemgetter("q_a_pairs")}
        | decomposition_prompt
        | llm
        | StrOutputParser()
    )

    answer = rag_chain.invoke({"question":q,"q_a_pairs":q_a_pairs})
    q_a_pair = format_qa_pair(q,answer)
    print(q_a_pair)
    print("="*80)
    q_a_pairs = q_a_pairs + "\n---\n"+  q_a_pair

Question: 1. What are the key components of a large language model (LLM) used in autonomous decision-making?
Answer: Based on the provided context, the key components of a large language model (LLM) used in autonomous decision-making include:

1. Planning:
	* Subgoal and decomposition: Breaking down large tasks into smaller, manageable subgoals.
	* Reflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes, and refine them for future steps.

These components work together to enable the LLM-powered autonomous agent system to make informed decisions and improve its performance over time.
Question: 2. How do LLMs contribute to the development of autonomous agents with natural language understanding and processing capabilities?
Answer: Based on the provided context, Large Language Models (LLMs) contribute to the development of autonomous agents with natural language understanding and processing capabilities in several ways:

1. **P

In [30]:
##### Final Answer
print(answer)

Based on the provided context, reinforcement learning algorithms play a crucial role in integrating Large Language Models (LLMs) with autonomous agent systems for optimal performance.

Reinforcement learning is used to fine-tune the LLM-powered autonomous agent system, particularly when the sampling steps are non-differentiable. The algorithm maximizes the reward $\mathbb{E}_{\mathbf{x} \sim p_{red}(.)} [r(\mathbf{x}, \mathbf{y})]$ with a KL divergence term between the current $p_{red}$ and the initial model behavior, where $\mathbf{y}$ is a sample from the target model.

In the context of LLM-powered autonomous agent systems, reinforcement learning algorithms are used to:

1. Fine-tune the model: Reinforcement learning is used to fine-tune the LLM-powered autonomous agent system, particularly when the sampling steps are non-differentiable.
2. Maximize the reward: The algorithm maximizes the reward $\mathbb{E}_{\mathbf{x} \sim p_{red}(.)} [r(\mathbf{x}, \mathbf{y})]$ to improve the per

#### Answer Individually 

In [31]:
# Answer each sub-question individually 
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser
from langchain_ollama import ChatOllama

# RAG prompt
prompt = """You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question} 
Context: {context} 
Answer:"""

prompt_template = ChatPromptTemplate.from_template(prompt)

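# llm (ChatOllama caps the number of generated tokens via num_predict; max_length is not one of its parameters)
llm = ChatOllama(model="llama3.2:latest", temperature=0, num_predict=400)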

def retrieve_and_rag(question,prompt_rag,sub_question_generator_chain):
    """RAG on each sub-question"""
    
    # Use our decomposition chain to generate the sub-questions
    sub_questions = sub_question_generator_chain.invoke({"question":question})
    
    # Initialize a list to hold RAG chain results
    rag_results = []
    
    for sub_question in sub_questions:
        
        # Retrieve documents for each sub-question
        #retrieved_docs = retriever.get_relevant_documents(sub_question)
        retrieved_docs = retriever.invoke(sub_question)
        
        # Use retrieved documents and sub-question in RAG chain
        answer = (prompt_rag | llm | StrOutputParser()).invoke({"context": retrieved_docs, 
                                                                "question": sub_question})
        rag_results.append(answer)
    
    return rag_results,sub_questions

# Run retrieval and RAG for every sub-question (called directly here; it could also be wrapped in a RunnableLambda)
answers, questions = retrieve_and_rag(question, prompt_template, generate_queries_decomposition)

In [32]:
# Separate loop names so the global `question` and `answer` are not overwritten
for sub_question, sub_answer in zip(questions, answers):
    print(f"Question\n {sub_question}\n")
    print(f"Answer\n {sub_answer}\n")

Question
 1. What are the key components of a large language model (LLM) used in autonomous decision-making?

Answer
 The key components of a large language model (LLM) used in autonomous decision-making are planning and reflection/refinement. Planning involves breaking down complex tasks into smaller subgoals and task decomposition using techniques like Chain of Thought (CoT). Reflection and refinement enable the agent to learn from mistakes and improve its performance over time.

Question
 2. How do LLMs contribute to the development of autonomous agents with natural language understanding and processing capabilities?

Answer
 LLMs contribute to the development of autonomous agents by serving as the agent's brain, complemented by planning, reflection, and refinement components. They enable efficient handling of complex tasks through subgoal decomposition and self-criticism, improving the quality of final results. LLMs also provide a powerful general problem solver with capabilities b