# Setup

In [4]:
from dotenv import load_dotenv

In [5]:
load_dotenv()

True

# Add Chat History

In many Q&A applications, we want to allow the user to have a back-and-forth conversation, meaning the application needs:
- some sort of "memory" of past questions and answers, 
- some logic for incorporating those into its current thinking.

In this guide, we will focus on **adding some logic for incorporating historical messages**.

Further details on chat history management are [**covered here**](https://python.langchain.com/v0.1/docs/expression_language/how_to/message_history/).

We'll work off the Q&A app we built over the [**LLM Powered Autonomous Agents**](https://lilianweng.github.io/posts/2023-06-23-agent/) blog post by Lilian Weng in the [**Quickstart**](https://python.langchain.com/v0.1/docs/use_cases/question_answering/quickstart/).

We'll need to update two things about our existing app:
1. **Prompt**: Update our prompt to support historical messages as an input.
2. **Contextualizing questions**: Add a sub-chain that:
    - takes the latest user question,
    - reformulates it in the context of the chat history.
    
    This is needed when the latest question references context from past messages.

    For example, if a user asks a follow-up question like: *"Can you elaborate on the second point?"*, this cannot be understood without the context of the previous message.

    Therefore, we can't effectively perform retrieval with a question like this.
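
To make the retrieval problem concrete, here is a minimal sketch that uses keyword overlap as a stand-in for the embedding-based retriever used in this guide. The mini-corpus, the stopword list, and the `overlap_scores` helper are all hypothetical and exist only for illustration. A real vector store scores by embedding similarity rather than shared words, but a follow-up with no topical words gives it little to match on either.

```python
import re

# Hypothetical mini-corpus standing in for the indexed blog chunks.
CHUNKS = [
    "Task decomposition breaks a complex task into smaller steps.",
    "Chain of Thought prompts the model to think step by step.",
    "Memory lets an agent retain and recall information over time.",
]

# Small, hand-picked stopword list (an assumption for this toy example).
STOPWORDS = {
    "a", "an", "the", "of", "to", "on", "is", "it", "you", "can",
    "what", "are", "by", "into", "and", "over",
}


def keywords(text):
    """Lowercase the text and keep only words that are not stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def overlap_scores(question):
    """Count the keywords each chunk shares with the question."""
    q = keywords(question)
    return [len(q & keywords(chunk)) for chunk in CHUNKS]


follow_up = "Can you elaborate on the second point?"
standalone = "Can you elaborate on Chain of Thought prompting?"

print(overlap_scores(follow_up))   # [0, 0, 0]
print(overlap_scores(standalone))  # [0, 2, 0]
```

The raw follow-up scores zero against every chunk, while the reformulated question matches the Chain of Thought chunk.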

# Chain Without Chat History

Here is the Q&A app we built over the LLM Powered Autonomous Agents blog post by Lilian Weng in the Quickstart:

In [6]:
import bs4
from langchain import hub
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [8]:
# Load, chunk and index the contents of the blog.
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())

# Retrieve and generate using the relevant snippets of the blog.
retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt")
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

In [9]:
rag_chain.invoke("What is Task Decomposition?")

'Task decomposition is a technique that breaks down complex tasks into smaller and simpler steps. It allows models to think step by step and transform big tasks into manageable ones. Different methods like Chain of Thought and Tree of Thoughts can be used for task decomposition.'

# Contextualizing the Question

First, we will need to define a sub-chain that:
- takes historical messages and the latest user question,
- reformulates the question if it refers to anything in the chat history.

We'll use a prompt that includes a `MessagesPlaceholder` variable under the name "chat_history".

This allows us to pass in a list of Messages to the prompt using the `chat_history` input key.

These messages will be inserted:
- after the system message,
- before the human message containing the latest question.
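
The ordering described above can be sketched in plain Python. This is only an illustration of where the history lands, not how `MessagesPlaceholder` is implemented. The `build_messages` helper and the `(role, content)` tuples are assumptions made for this example.

```python
def build_messages(system_prompt, chat_history, user_input):
    """Place the history after the system message and before the new question."""
    return [("system", system_prompt), *chat_history, ("human", user_input)]


history = [
    ("human", "What is Task Decomposition?"),
    ("ai", "It breaks a complex task into smaller steps."),
]

messages = build_messages(
    "Reformulate the latest question as a standalone question.",
    history,
    "What are common ways of doing it?",
)

for role, content in messages:
    print(f"{role}: {content}")
```

With an empty history, the result collapses to just the system message followed by the human message.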

Note that we leverage a helper function `create_history_aware_retriever` for this step, which:
- passes the input straight to the retriever when `chat_history` is empty,
- otherwise, applies `prompt | llm | StrOutputParser() | retriever` in sequence.

`create_history_aware_retriever` constructs a chain that:
- accepts keys `input` and `chat_history` as input,
- has the same output schema as a retriever.
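
The branching behaviour can be sketched without LangChain as follows. This is a simplified illustration under stated assumptions, not the library's implementation: `make_history_aware_retriever`, `fake_rephrase`, and `fake_retrieve` are hypothetical names, with the last two standing in for the LLM rewrite step and the vector store lookup.

```python
def make_history_aware_retriever(rephrase, retrieve):
    """Return a function that accepts {"input": str, "chat_history": list}."""

    def run(inputs):
        history = inputs.get("chat_history") or []
        if not history:
            # No history: the question goes to the retriever unchanged.
            return retrieve(inputs["input"])
        # With history: rewrite the question first, then retrieve with the rewrite.
        return retrieve(rephrase(inputs["input"], history))

    return run


def fake_rephrase(question, history):
    """Hypothetical stand-in for the LLM-based reformulation."""
    return f"{question} (about: {history[0]})"


def fake_retrieve(query):
    """Hypothetical stand-in for a vector store lookup."""
    return [f"doc matching '{query}'"]


retriever_fn = make_history_aware_retriever(fake_rephrase, fake_retrieve)

print(retriever_fn({"input": "What is it?", "chat_history": []}))
print(retriever_fn({"input": "What is it?", "chat_history": ["Task Decomposition"]}))
```

Either way the caller receives a list of documents, which is why the result can be used wherever a retriever is expected.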

In [10]:
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

In [12]:
contextualize_q_system_prompt = """Given a chat history and the latest user question \
which might reference context in the chat history, formulate a standalone question \
which can be understood without the chat history. Do NOT answer the question, \
just reformulate it if needed and otherwise return it as it is."""

contextualize_q_prompt = ChatPromptTemplate.from_messages([
    ("system", contextualize_q_system_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)

This chain prepends a rephrasing of the input query to our retriever, so that the retrieval incorporates the context of the conversation.

> **API Reference**
> - [**create_history_aware_retriever**](https://api.python.langchain.com/en/latest/chains/langchain.chains.history_aware_retriever.create_history_aware_retriever.html)
> - [**ChatPromptTemplate**](https://api.python.langchain.com/en/latest/prompts/langchain_core.prompts.chat.ChatPromptTemplate.html)
> - [**MessagesPlaceholder**](https://api.python.langchain.com/en/latest/prompts/langchain_core.prompts.chat.MessagesPlaceholder.html)

# Chain With Chat History

And now we can build our full QA chain.

Here we use [**create_stuff_documents_chain**](https://api.python.langchain.com/en/latest/chains/langchain.chains.combine_documents.stuff.create_stuff_documents_chain.html) to generate a `question_answer_chain`, with input keys `context`, `chat_history` and `input`.

It accepts the retrieved context alongside the conversation history and query to generate an answer.

We build our final `rag_chain` with [**create_retrieval_chain**](https://api.python.langchain.com/en/latest/chains/langchain.chains.retrieval.create_retrieval_chain.html). This chain applies the following in sequence:
- `history_aware_retriever`
- `question_answer_chain`

It retains intermediate outputs, such as the retrieved context, for convenience.

It has input keys:
- `input`,
- `chat_history`;

and its output includes:
- `input`,
- `chat_history`,
- `context`,
- `answer`.
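
The data flow can be pictured with a small plain-Python sketch. It only mirrors the input and output keys listed above and is not the library's implementation. `make_retrieval_chain`, `stub_retrieve`, and `stub_answer` are hypothetical names introduced for this example.

```python
def make_retrieval_chain(retrieve, answer):
    """Run retrieval, then answering, keeping every intermediate value."""

    def run(inputs):
        # Copy the inputs so `input` and `chat_history` appear in the output.
        result = dict(inputs)
        result["context"] = retrieve(inputs)
        # The answering step sees the input, the history, and the context.
        result["answer"] = answer(result)
        return result

    return run


def stub_retrieve(inputs):
    """Hypothetical stand-in for the history-aware retriever."""
    return [f"chunk about {inputs['input']}"]


def stub_answer(inputs):
    """Hypothetical stand-in for the question-answer chain."""
    return f"Answer based on {len(inputs['context'])} chunk(s)."


chain = make_retrieval_chain(stub_retrieve, stub_answer)
output = chain({"input": "task decomposition", "chat_history": []})
print(sorted(output))
```

Keeping `context` in the output is what later makes it possible to show sources next to the answer.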

In [13]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

In [14]:
qa_system_prompt = """You are an assistant for question-answering tasks. \
Use the following pieces of retrieved context to answer the question. \
If you don't know the answer, just say that you don't know. \
Use three sentences maximum and keep the answer concise.\

{context}"""

qa_prompt = ChatPromptTemplate.from_messages([
    ("system", qa_system_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

> **API Reference**
> - [**create_retrieval_chain**](https://api.python.langchain.com/en/latest/chains/langchain.chains.retrieval.create_retrieval_chain.html)
> - [**create_stuff_documents_chain**](https://api.python.langchain.com/en/latest/chains/langchain.chains.combine_documents.stuff.create_stuff_documents_chain.html)

In [15]:
from langchain_core.messages import AIMessage, HumanMessage

In [16]:
chat_history = []

question = "What is Task Decomposition?"
ai_msg_1 = rag_chain.invoke({
    "input": question,
    "chat_history": chat_history
})
# Wrap the answer in an AIMessage; a bare string would be read as a human message.
chat_history.extend([
    HumanMessage(content=question),
    AIMessage(content=ai_msg_1["answer"]),
])

second_question = "What are the common ways of doing it?"
ai_msg_2 = rag_chain.invoke({
    "input": second_question, "chat_history": chat_history
})

print(ai_msg_2["answer"])

Task decomposition can be achieved through various methods such as using prompting techniques like Chain of Thought or Tree of Thoughts, providing task-specific instructions tailored to the specific task at hand, or incorporating human inputs to break down complex tasks into smaller, more manageable steps. These approaches help in enhancing model performance on complex tasks by breaking them down into simpler components for easier problem-solving.


> **API Reference**
> - [**HumanMessage**](https://api.python.langchain.com/en/latest/messages/langchain_core.messages.human.HumanMessage.html)

💡 Check out the [**LangSmith trace**](https://smith.langchain.com/public/243301e4-4cc5-4e52-a6e7-8cfe9208398d/r)
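
The two-turn exchange above generalizes to a simple pattern: call the chain, then append both the question and the answer to the history before the next turn. The sketch below shows that bookkeeping in plain Python. The `ask` helper, the `(role, content)` tuples, and `echo_chain` (a stand-in for `rag_chain`) are assumptions made for illustration.

```python
def ask(chain, chat_history, question):
    """Run one turn and record both sides of it in the history."""
    # Pass a copy so the chain sees the history as it was before this turn.
    result = chain({"input": question, "chat_history": list(chat_history)})
    chat_history.append(("human", question))
    chat_history.append(("ai", result["answer"]))
    return result["answer"]


def echo_chain(inputs):
    """Hypothetical stand-in for the RAG chain; numbers its answers by turn."""
    turn = len(inputs["chat_history"]) // 2 + 1
    return {"answer": f"answer #{turn} to: {inputs['input']}"}


history = []
first = ask(echo_chain, history, "What is Task Decomposition?")
second = ask(echo_chain, history, "What are common ways of doing it?")

print(first)
print(second)
print(len(history))
```

Recording each turn with its own role keeps human and AI messages distinguishable when the history is replayed into the prompt.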

# Returning Sources

Often, in Q&A applications, it's important to show users the sources that were used to generate the answer.

LangChain's built-in `create_retrieval_chain` will propagate retrieved source documents through to the output in the `"context"` key.
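
When several retrieved chunks come from the same page, it can be useful to show each source only once. The following is a minimal sketch of that idea. The `Doc` dataclass is a hypothetical stand-in that mirrors only the `page_content` and `metadata` attributes of a retrieved document, and `unique_sources` is a helper name introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def unique_sources(documents):
    """Return the distinct `source` values in first-seen order."""
    seen = []
    for doc in documents:
        source = doc.metadata.get("source")
        if source is not None and source not in seen:
            seen.append(source)
    return seen


context = [
    Doc("Tree of Thoughts ...", {"source": "https://lilianweng.github.io/posts/2023-06-23-agent/"}),
    Doc("Task decomposition ...", {"source": "https://lilianweng.github.io/posts/2023-06-23-agent/"}),
    Doc("No source recorded"),
]

print(unique_sources(context))
```

Documents without a `source` entry are skipped rather than reported as `None`.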

In [17]:
for document in ai_msg_2["context"]:
    print(document)
    print()

page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.' metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}

page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search 