# Conversational Q&A chatbot 

In many Q&A chatbot applications, we want to let users have a back-and-forth conversation. This means the application needs some form of "memory" of past questions and answers, plus logic for incorporating them into its current reasoning.

In this guide we focus on adding logic for incorporating historical messages. Further details on chat history management are covered in previous videos.

We will cover two approaches:
* Chains, in which we always execute a retrieval step
* Agents, in which we give an LLM discretion over whether and how to execute the retrieval step (or multiple steps)
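
The difference between the two approaches can be sketched in plain Python. This is only an illustration under stated assumptions: `toy_retrieve`, `toy_answer`, and `naive_policy` are hypothetical stand-ins for a vector store lookup, an LLM call, and the LLM's own decision step, not LangChain APIs.

```python
from typing import Callable, List

# Hypothetical stand-ins: a real application would query a vector store and an LLM.
TOY_CORPUS = {
    "diffusion": "Forward diffusion adds Gaussian noise step by step.",
    "gan": "GANs train a generator against a discriminator.",
}


def toy_retrieve(query: str) -> List[str]:
    """Return every toy document whose key appears in the query."""
    return [text for key, text in TOY_CORPUS.items() if key in query.lower()]


def toy_answer(question: str, context: List[str]) -> str:
    """Stand-in for the LLM call; it only reports how much context it saw."""
    return f"{question} -> answered with {len(context)} retrieved doc(s)"


def chain_style(question: str) -> str:
    # Chain: the retrieval step runs on every call, unconditionally.
    return toy_answer(question, toy_retrieve(question))


def agent_style(question: str, wants_retrieval: Callable[[str], bool]) -> str:
    # Agent: a decision step (an LLM in practice) chooses whether to retrieve.
    context = toy_retrieve(question) if wants_retrieval(question) else []
    return toy_answer(question, context)


def naive_policy(question: str) -> bool:
    """Hypothetical policy: skip retrieval for simple greetings."""
    return question.lower().strip() not in {"hi", "hello", "thanks"}


print(chain_style("What is diffusion?"))
print(agent_style("hello", naive_policy))
```

In the chain, retrieval cost is paid on every turn. In the agent, it is paid only when the decision step asks for it, at the price of an extra decision that can itself be wrong.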

In [1]:
import os

from dotenv import load_dotenv
from langchain_groq import ChatGroq

# Load GROQ_API_KEY (and any other secrets) from a local .env file
load_dotenv()
groq_api_key = os.getenv("GROQ_API_KEY")

llm = ChatGroq(groq_api_key=groq_api_key, model_name="Llama3-8b-8192")
llm

ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x00000247E0D62DA0>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x00000247E0D62110>, model_name='Llama3-8b-8192', model_kwargs={}, groq_api_key=SecretStr('**********'))

In [2]:
!pip install --upgrade --quiet  langchain sentence_transformers

In [3]:
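from langchain_huggingface.embeddings import HuggingFaceEmbeddings

# os.environ values must be strings, so only export HF_TOKEN when it is set
if os.getenv("HF_TOKEN"):
    os.environ["HF_TOKEN"] = os.getenv("HF_TOKEN")
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")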



In [17]:
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.prompts import ChatPromptTemplate
from langchain_text_splitters import RecursiveCharacterTextSplitter



In [20]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain


In [12]:
# 1. Load, chunk, and index the contents of the blog to create a retriever
import bs4

# Keep only the post title, header, and body when parsing the page
loader = WebBaseLoader(
    web_path=("https://lilianweng.github.io/posts/2021-07-11-diffusion-models/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)

docs = loader.load()

In [13]:
docs

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2021-07-11-diffusion-models/'}, page_content='\n\n      What are Diffusion Models?\n    \nDate: July 11, 2021  |  Estimated Reading Time: 32 min  |  Author: Lilian Weng\n\n\n\n[Updated on 2021-09-19: Highly recommend this blog post on score-based generative modeling by Yang Song (author of several key papers in the references)].\n[Updated on 2022-08-27: Added classifier-free guidance, GLIDE, unCLIP and Imagen.\n[Updated on 2022-08-31: Added latent diffusion model.\n[Updated on 2024-04-13: Added progressive distillation, consistency models, and the Model Architecture section.\nSo far, I’ve written about three types of generative models, GAN, VAE, and Flow-based models. They have shown great success in generating high-quality samples, but each has some limitations of its own. GAN models are known for potentially unstable training and less diversity in generation due to their adversarial training nature. VAE relies on a sur

In [14]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=embeddings)
retriever = vectorstore.as_retriever()
retriever


VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x00000247CECA75E0>, search_kwargs={})
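
To build intuition for `chunk_size` and `chunk_overlap`, here is a minimal fixed-window splitter. It is a simplified sketch, not the algorithm used by `RecursiveCharacterTextSplitter`, which additionally tries to break on natural separators such as paragraph and line breaks. The function name `split_with_overlap` is hypothetical.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


chunks = split_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

With `chunk_size=1000` and `chunk_overlap=200`, as in the cell above, each window would advance 800 characters. A sentence that straddles a boundary therefore has a good chance of appearing whole in at least one chunk.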

In [16]:
## Prompt Template

# Adjacent string literals are joined as-is, so each fragment ends with a space
system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

In [18]:
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

In [22]:
question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)
response = rag_chain.invoke({"input": "What is the forward diffusion process?"})

In [23]:
response['answer']

'According to the context, the forward diffusion process is a process where a data point sampled from a real data distribution is repeatedly perturbed by adding small amounts of Gaussian noise, resulting in a sequence of noisy samples.'
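
Conceptually, the retrieval chain fetches documents, "stuffs" them into the `{context}` slot of the prompt, and returns the answer alongside the inputs. The sketch below mimics that flow with plain functions. `toy_retrieval_chain`, `fake_retrieve`, and `echo_llm` are hypothetical names, and the prompt template is an assumption made for illustration.

```python
from typing import Callable, Dict, List

# Assumed template for this sketch; the notebook's real prompt is built above.
PROMPT_TEMPLATE = "Use this context to answer.\n\n{context}\n\nQuestion: {input}"


def stuff_documents(docs: List[str], separator: str = "\n\n") -> str:
    """Join all retrieved documents into one block of context text."""
    return separator.join(docs)


def toy_retrieval_chain(
    inputs: Dict[str, str],
    retrieve: Callable[[str], List[str]],
    llm: Callable[[str], str],
) -> Dict[str, object]:
    docs = retrieve(inputs["input"])  # 1. always retrieve
    prompt = PROMPT_TEMPLATE.format(  # 2. stuff the documents into the prompt
        context=stuff_documents(docs), input=inputs["input"]
    )
    # 3. return the inputs plus the retrieved context and the generated answer
    return {**inputs, "context": docs, "answer": llm(prompt)}


def fake_retrieve(query: str) -> List[str]:
    """Stand-in retriever that ignores the query and returns two fixed documents."""
    return ["Noise is added in small steps.", "The process is a Markov chain."]


def echo_llm(prompt: str) -> str:
    """Stand-in LLM that returns the prompt it received, to make the stuffing visible."""
    return prompt


result = toy_retrieval_chain({"input": "What is forward diffusion?"}, fake_retrieve, echo_llm)
print(sorted(result))
```

This mirrors why `response['answer']` is available above: the chain's output is a dictionary rather than a bare string.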

# Adding Chat History 

In [24]:
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder

# Each fragment ends with a space because adjacent string literals are joined as-is
Contextualize_q_system_prompt = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. Do NOT answer the question; "
    "just reformulate it if needed and otherwise return it as is."
)

In [25]:
contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", Contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

In [27]:
history_aware_retriever = create_history_aware_retriever(llm, retriever, contextualize_q_prompt)
history_aware_retriever

RunnableBinding(bound=RunnableBranch(branches=[(RunnableLambda(lambda x: not x.get('chat_history', False)), RunnableLambda(lambda x: x['input'])
| VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x00000247CECA75E0>, search_kwargs={}))], default=ChatPromptTemplate(input_variables=['chat_history', 'input'], input_types={'chat_history': list[typing.Annotated[typing.Union[typing.Annotated[langchain_core.messages.ai.AIMessage, Tag(tag='ai')], typing.Annotated[langchain_core.messages.human.HumanMessage, Tag(tag='human')], typing.Annotated[langchain_core.messages.chat.ChatMessage, Tag(tag='chat')], typing.Annotated[langchain_core.messages.system.SystemMessage, Tag(tag='system')], typing.Annotated[langchain_core.messages.function.FunctionMessage, Tag(tag='function')], typing.Annotated[langchain_core.messages.tool.ToolMessage, Tag(tag='tool')], typing.Annotated[langchain_core.messages.ai.AIMessageChunk, Tag(tag='AIMessag
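
The `RunnableBranch` in the output above encodes a simple rule: with no chat history, the raw input goes straight to the retriever; otherwise the question is first rewritten into a standalone form. A minimal sketch of that control flow follows. `stub_rewrite` and `recording_retrieve` are hypothetical stand-ins for the LLM rewrite step and the vector store retriever.

```python
from typing import Callable, Dict, List


def history_aware_retrieve(
    inputs: Dict[str, object],
    rewrite: Callable[[List[str], str], str],
    retrieve: Callable[[str], List[str]],
) -> List[str]:
    # Branch 1: empty or missing history, so the question is already standalone.
    if not inputs.get("chat_history"):
        return retrieve(inputs["input"])
    # Default branch: reformulate the question using the history, then retrieve.
    standalone = rewrite(inputs["chat_history"], inputs["input"])
    return retrieve(standalone)


seen_queries: List[str] = []


def recording_retrieve(query: str) -> List[str]:
    """Stand-in retriever that records which query it was asked."""
    seen_queries.append(query)
    return [f"doc for: {query}"]


def stub_rewrite(chat_history: List[str], question: str) -> str:
    """Hypothetical rewrite rule; a real system would ask the LLM to do this."""
    return f"{question} (regarding: {chat_history[0]})"


history_aware_retrieve(
    {"input": "What is the diffusion process?", "chat_history": []},
    stub_rewrite,
    recording_retrieve,
)
history_aware_retrieve(
    {
        "input": "Tell me more about it",
        "chat_history": ["What is the diffusion process?", "It adds noise step by step."],
    },
    stub_rewrite,
    recording_retrieve,
)
print(seen_queries)
```

The second query shows why the rewrite matters: "Tell me more about it" on its own gives a retriever nothing to match against.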

In [29]:
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

In [30]:
question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)
rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)



In [35]:
from langchain_core.messages import AIMessage, HumanMessage

chat_history = []
question = "What is the diffusion process?"
response1 = rag_chain.invoke({"input": question, "chat_history": chat_history})

# Record this turn so the next question can refer back to it
chat_history.extend(
    [
        HumanMessage(content=question),
        AIMessage(content=response1["answer"]),
    ]
)

In [36]:
question2 = "Tell me more about it"

response2 = rag_chain.invoke({"input": question2, "chat_history": chat_history})
print(response2["answer"])

I don't know. The context doesn't provide more information about the diffusion process.


In [38]:
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

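# In-memory map from session ID to that session's message history
store = {}


def get_session_history(session_id: str) -> BaseChatMessageHistory:
    # Create an empty history the first time a session ID is seen
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]


# Wrap the chain so history is loaded before, and saved after, every call
conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer",
)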
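
The wrapper's job can be pictured as three steps: look up the history for the session, call the chain with that history, then append the new turn. The sketch below shows the idea with plain Python under stated assumptions. `toy_store`, `invoke_with_history`, and `counting_chain` are hypothetical names, and messages are simplified to `(role, text)` tuples.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # simplified (role, text) pair

toy_store: Dict[str, List[Message]] = {}


def get_toy_history(session_id: str) -> List[Message]:
    """Return the history for a session, creating an empty one on first use."""
    return toy_store.setdefault(session_id, [])


def invoke_with_history(
    chain: Callable[[Dict[str, object]], Dict[str, str]],
    user_input: str,
    session_id: str,
) -> Dict[str, str]:
    history = get_toy_history(session_id)
    # Pass a copy so the chain sees only the turns before this one.
    result = chain({"input": user_input, "chat_history": list(history)})
    # Save the new turn for later calls in the same session.
    history.append(("human", user_input))
    history.append(("ai", result["answer"]))
    return result


def counting_chain(inputs: Dict[str, object]) -> Dict[str, str]:
    """Stand-in chain that reports how many earlier messages it received."""
    return {"answer": f"I can see {len(inputs['chat_history'])} earlier message(s)."}


first = invoke_with_history(counting_chain, "What is diffusion?", "abc123")
second = invoke_with_history(counting_chain, "Tell me more", "abc123")
other = invoke_with_history(counting_chain, "Hello", "xyz789")
print(first["answer"], second["answer"], other["answer"], sep="\n")
```

Because histories are keyed by session ID, two users with different IDs never see each other's messages, while repeated calls with the same ID accumulate context.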

In [39]:
conversational_rag_chain.invoke(
    {"input": "What is Task Decomposition?"},
    config={"configurable": {"session_id": "abc123"}},
)["answer"]

"I don't know. The provided context does not mention Task Decomposition."