In [1]:
# https://python.langchain.com/v0.2/docs/tutorials/rag/
# high-level overview of different retrieval techniques: https://python.langchain.com/v0.2/docs/concepts/#retrieval

### RAG

RAG (Retrieval-Augmented Generation) is a technique for augmenting an LLM's knowledge with additional data.

A typical RAG application has two main components:

1. Indexing: a pipeline for ingesting data from a source and indexing it. This usually happens offline.
2. Retrieval and generation: the actual RAG chain, which takes the user query at run time and retrieves the relevant data from the index, then passes that to the model.
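To make the split between offline indexing and run-time retrieval concrete, here is a minimal, dependency-free sketch. It assumes a toy bag-of-words "embedding" (lowercase word counts) and a plain Python list as the vector store; real applications use an embeddings model and a vector store such as FAISS. The names `embed`, `build_index`, and `retrieve` are hypothetical and not part of LangChain.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding' (assumption): lowercase word counts instead of a learned model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks):
    """Indexing (offline): embed each chunk once and keep (vector, text) pairs."""
    return [(embed(chunk), chunk) for chunk in chunks]


def retrieve(index, query, k=2):
    """Retrieval (run time): embed the query, rank chunks by similarity, keep the top k."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(query_vec, pair[0]), reverse=True)
    return [text for _, text in ranked[:k]]


index = build_index([
    "task decomposition breaks a big task into smaller steps",
    "tree of thoughts explores multiple reasoning paths",
    "vector stores index embeddings for similarity search",
])
print(retrieve(index, "What is task decomposition", k=1))
# ['task decomposition breaks a big task into smaller steps']
```

In the notebook below, `EMBEDDINGS` plus `FAISS.from_documents(...)` play the role of `build_index`, and `vectorstore.as_retriever(...)` plays the role of `retrieve`.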


#### Indexing

1. Load: First we need to load our data. This is done with Document Loaders.
2. Split: Text splitters break large Documents into smaller chunks. This is useful both for indexing data and for passing it into a model, since large chunks are harder to search over and won't fit in a model's finite context window.
   - `RecursiveCharacterTextSplitter`: recursively splits on a list of separators (by default `["\n\n", "\n", " ", ""]`) until the chunks are small enough.
   - Go deeper: https://python.langchain.com/v0.2/docs/how_to/#text-splitters
   - Source code: https://python.langchain.com/v0.2/docs/integrations/document_loaders/source_code/
   - Scientific paper: https://python.langchain.com/v0.2/docs/integrations/document_loaders/grobid/
3. Store: We need somewhere to store and index our splits, so that they can later be searched over. This is often done using a VectorStore and Embeddings model.
   - Embedding models: https://python.langchain.com/v0.2/docs/integrations/text_embedding/
   - Vectorstore: https://python.langchain.com/v0.2/docs/integrations/vectorstores/
     ![](https://python.langchain.com/v0.2/assets/images/rag_indexing-8160f90a90a33253d0154659cf7d453f.png)
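The recursive splitting idea can be sketched in plain Python: try the coarsest separator first, merge small pieces up to `chunk_size`, and recurse with the next, finer separator only for pieces that are still too long. This is a simplified illustration, not LangChain's implementation: it assumes lengths are measured in characters, ignores `chunk_overlap` (200 in the cell below), and `recursive_split` is a hypothetical name.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Simplified recursive character splitting (no overlap, lengths in characters)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, *rest = separators
    if sep == "":
        # Last resort: hard-cut the text into fixed-size windows.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces into the current chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece is still too big: recurse with the next, finer separator.
            chunks.extend(recursive_split(piece, chunk_size, rest))
            current = ""
    if current:
        chunks.append(current)
    return chunks


print(recursive_split("aaaa bbbb\n\ncccc dddd eeee", chunk_size=10))
# ['aaaa bbbb', 'cccc dddd', 'eeee']
```

The trailing `""` separator guarantees termination: once every coarser separator fails, the text is cut into fixed-size windows.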


#### Retrieval and generation

1. Retrieve: Given a user input, relevant splits are retrieved from storage using a Retriever.

   - `MultiQueryRetriever`: generates variants of the input question to improve retrieval hit rate.
     - https://python.langchain.com/v0.2/docs/how_to/MultiQueryRetriever/
   - `MultiVectorRetriever`: stores multiple embeddings per document (e.g., of smaller chunks, summaries, or hypothetical questions), also to improve retrieval hit rate.
     - https://python.langchain.com/v0.2/docs/how_to/multi_vector/
   - Maximal marginal relevance (MMR): selects for both relevance and diversity among the retrieved documents to avoid passing in duplicate context.
     - https://www.cs.cmu.edu/~jgc/publication/The_Use_MMR_Diversity_Based_LTMIR_1998.pdf
   - Document filtering using metadata: https://python.langchain.com/v0.2/docs/how_to/self_query/
   - Return sources: https://python.langchain.com/v0.2/docs/how_to/qa_sources/

2. Generate: A ChatModel / LLM produces an answer using a prompt that includes the question and the retrieved data.

![](https://python.langchain.com/v0.2/assets/images/rag_retrieval_generation-1046a4668d6bb08786ef73c56d4f228a.png)
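Maximal marginal relevance, as described in the Carbonell and Goldstein paper linked above, greedily picks the next document that is relevant to the query but not redundant with documents already picked. The sketch below is a plain-Python illustration over small dense vectors; the parameter name `lambda_mult` is borrowed from LangChain's MMR search (`as_retriever(search_type="mmr")`), but the function `mmr` itself is hypothetical. LangChain's version also first fetches a larger candidate set (`fetch_k`), whereas this sketch scores every document.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedily pick k doc indices by maximal marginal relevance.

    score(d) = lambda_mult * sim(query, d)
               - (1 - lambda_mult) * max(sim(d, s) for s already selected)
    """
    selected, candidates = [], list(range(len(doc_vecs)))

    def score(i):
        relevance = cosine(query_vec, doc_vecs[i])
        redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)  # ties go to the earlier index
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.5]
docs = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # docs 0 and 1 are duplicates
print(mmr(query, docs, k=2))                   # [0, 2]: the duplicate is skipped
print(mmr(query, docs, k=2, lambda_mult=1.0))  # [0, 1]: pure relevance keeps it
```

With `lambda_mult=1.0` the redundancy term vanishes and MMR reduces to plain similarity ranking; lower values trade relevance for diversity.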


In [2]:
import bs4

from langchain import hub

from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter


from core.llm import CHAT_LLM, EMBEDDINGS


# 1. Indexing
# 1) Load
# Keep only HTML elements with class "post-content", "post-title", or "post-header"
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

# 2) Split
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

# 3) Store
vectorstore = FAISS.from_documents(documents=splits, embedding=EMBEDDINGS)


# 2. Retrieval & Generation
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})
prompt = hub.pull("rlm/rag-prompt")


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | CHAT_LLM
    | StrOutputParser()
)

rag_chain.invoke("What is Task Decomposition?")

USER_AGENT environment variable not set, consider setting it to identify your requests.


'Task decomposition involves breaking down a complex problem into multiple thought steps, generating a tree structure of possibilities. It can be achieved through LLM prompting, task-specific instructions, or human inputs. The goal is to transform big tasks into smaller, manageable steps for easier interpretation and execution.'

### Retriever


In [3]:
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})
retrieved_docs = retriever.invoke("What are the approaches to Task Decomposition?")
len(retrieved_docs)

6

In [4]:
print(retrieved_docs[0].metadata)
print(retrieved_docs[0].page_content)

{'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.


### Analyze LCEL Runnable


In [5]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | CHAT_LLM
    | StrOutputParser()
)

for chunk in rag_chain.stream("What is Task Decomposition?"):
    print(chunk, end="", flush=True)

Task decomposition involves breaking down a complex problem into multiple thought steps, generating a tree structure of possibilities. It can be achieved through prompting with LLM, task-specific instructions, or human inputs. The process helps transform big tasks into smaller, manageable steps for easier interpretation and execution.

1. Connect runnables into a `RunnableSequence` with the `|` operator
   - Automatic coercion
     - `prompt`, `retriever`, `CHAT_LLM`, `StrOutputParser()`: already `Runnable`s, used as-is
     - `format_docs`: a plain function, coerced into a `RunnableLambda`
     - `{"context": retriever | format_docs, "question": RunnablePassthrough()}`: a dict, coerced into a `RunnableParallel`
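The coercion rules above can be mimicked with a few lines of plain Python: `|` composes two steps, a function is wrapped as a single step, and a dict becomes a step that runs every branch on the same input. This is a toy model for intuition only; `ToyRunnable`, `coerce`, and the fake retriever and prompt are hypothetical and are not LangChain's classes. The key mechanism shown is `__ror__`, which lets `dict | runnable` work even though the dict is on the left.

```python
class ToyRunnable:
    """Hypothetical stand-in for a LangChain Runnable: wraps a callable, composes with |."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # runnable | other  ->  run self, then feed the result into other
        nxt = coerce(other)
        return ToyRunnable(lambda value: nxt.invoke(self.invoke(value)))

    def __ror__(self, other):
        # other | runnable, where other is a plain dict or function
        prev = coerce(other)
        return ToyRunnable(lambda value: self.invoke(prev.invoke(value)))


def coerce(obj):
    """Mimic automatic coercion of dicts and functions into runnables."""
    if isinstance(obj, ToyRunnable):
        return obj
    if isinstance(obj, dict):
        # dict -> parallel step: every branch gets the same input, results keyed by name
        branches = {key: coerce(value) for key, value in obj.items()}
        return ToyRunnable(lambda value: {key: b.invoke(value) for key, b in branches.items()})
    if callable(obj):
        # plain function -> lambda step
        return ToyRunnable(obj)
    raise TypeError(f"cannot coerce {type(obj).__name__} to a runnable")


def format_docs(docs):
    return "\n\n".join(docs)


fake_retriever = ToyRunnable(lambda q: [f"doc about {q}", "another doc"])
passthrough = ToyRunnable(lambda x: x)
fake_prompt = ToyRunnable(lambda d: f"Question: {d['question']}\nContext: {d['context']}")

chain = {"context": fake_retriever | format_docs, "question": passthrough} | fake_prompt
print(chain.invoke("tasks"))
```

The printed prompt contains the question passed straight through and the formatted "documents", which is the same shape of data the real chain hands to `prompt`.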


In [6]:
chain = retriever
chain.invoke("What is Task Decomposition?")

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.'),
 Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs

In [7]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


chain = retriever | format_docs
chain.invoke("What is Task Decomposition?")

'Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.\n\nFig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The mode

In [8]:
from langchain_core.runnables import RunnablePassthrough

chain = RunnablePassthrough()
chain.invoke("What is Task Decomposition?")

'What is Task Decomposition?'

In [9]:
from langchain_core.runnables import RunnablePassthrough


chain = {"context": retriever | format_docs, "question": RunnablePassthrough()} | prompt
chain.invoke("What is Task Decomposition?")

ChatPromptValue(messages=[HumanMessage(content='You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don\'t know the answer, just say that you don\'t know. Use three sentences maximum and keep the answer concise.\nQuestion: What is Task Decomposition? \nContext: Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.\n\nFig. 1. Over

### Built-in chains

1. `create_stuff_documents_chain` \
   specifies how retrieved context is fed into a prompt and LLM. In this case, we "stuff" the contents into the prompt, i.e., we include all retrieved context without any summarization or other processing. It largely implements the `rag_chain` above, with input keys `context` and `input`, and generates an answer from the retrieved context and the query.
   - https://api.python.langchain.com/en/latest/chains/langchain.chains.combine_documents.stuff.create_stuff_documents_chain.html
2. `create_retrieval_chain` \
   adds the retrieval step and propagates the retrieved context through the chain, returning it alongside the final answer. It takes the input key `input` and includes `input`, `context`, and `answer` in its output.
   - https://api.python.langchain.com/en/latest/chains/langchain.chains.retrieval.create_retrieval_chain.html
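The data flow of these two built-in chains can be sketched with plain functions: the "stuff" step joins every retrieved document into one prompt, and the retrieval wrapper calls the retriever, passes `input` and `context` on, and returns all three keys. This is a hypothetical mimic for illustration (`make_stuff_documents_chain`, `make_retrieval_chain`, `fake_retriever`, and `echo_llm` are not LangChain APIs), and it uses plain strings where LangChain uses `Document` objects.

```python
def make_stuff_documents_chain(llm, template):
    """Hypothetical mimic of the 'stuff' step: join all docs into one prompt, call the LLM."""
    def run(inputs):
        context = "\n\n".join(inputs["context"])
        return llm(template.format(context=context, input=inputs["input"]))
    return run


def make_retrieval_chain(retriever, combine_docs_chain):
    """Hypothetical mimic of the retrieval wrapper: returns input, context, and answer."""
    def run(inputs):
        docs = retriever(inputs["input"])
        answer = combine_docs_chain({"input": inputs["input"], "context": docs})
        return {"input": inputs["input"], "context": docs, "answer": answer}
    return run


def fake_retriever(query):
    return ["Task decomposition splits big tasks.", "CoT prompts step by step."]


def echo_llm(prompt):
    # Stand-in LLM that echoes its prompt, so the "stuffed" context is visible.
    return prompt


template = "Context:\n{context}\n\nQuestion: {input}"
qa_chain = make_stuff_documents_chain(echo_llm, template)
rag = make_retrieval_chain(fake_retriever, qa_chain)

response = rag({"input": "What is Task Decomposition?"})
print(sorted(response))  # ['answer', 'context', 'input']
print(response["answer"])
```

Returning `context` next to `answer` is what makes `response["context"]` available in the cells below, e.g. for showing sources.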


In [10]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate


system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

# Earlier hand-built LCEL chain, kept for comparison (it used the "question" key):
# rag_chain = (
#     {"context": retriever | format_docs, "question": RunnablePassthrough()}
#     | prompt
#     | CHAT_LLM
#     | StrOutputParser()
# )
question_answer_chain = create_stuff_documents_chain(CHAT_LLM, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

response = rag_chain.invoke({"input": "What is Task Decomposition?"})
print(response["answer"])

Task decomposition involves breaking down a complex task into smaller and simpler steps to make it more manageable and easier to accomplish. This technique allows an agent or model to utilize more computation at test time by dividing a big task into multiple subtasks. By decomposing tasks, models can follow a step-by-step process to enhance performance on complex tasks.


In [11]:
response["context"]

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.'),
 Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs