#### Conversational Q&A Chatbot 
We want to allow the user to have a back-and-forth conversation, which means the application needs some form of *"memory"* of past questions and answers, plus logic for incorporating them into its current reasoning.
Here we focus on adding the logic for incorporating historical messages.
We will look at two approaches:
1. Chains, in which we always execute a retrieval step;
2. Agents, in which we give an LLM discretion over whether and how to execute a retrieval step (or multiple retrieval steps).
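To make the difference between the two approaches concrete, here is a minimal plain-Python sketch with no LangChain or LLM involved. The corpus, the word-overlap `retrieve` function and the `toy_policy` decision rule are hypothetical stand-ins; in a real agent the LLM itself would decide whether (and how often) to call a retrieval tool.

```python
import re
from typing import Callable, Dict, List

# Toy passages standing in for the vector store (illustrative data only).
CORPUS = [
    "Diffusion models generate video by iteratively denoising frames.",
    "Temporal layers mix activations across frames for coherence.",
]
GREETINGS = {"hi", "hello", "thanks"}


def tokens(text: str) -> set:
    """Lower-case alphabetic tokens of a string."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str) -> List[str]:
    """Toy retriever: passages sharing at least one token with the query."""
    query_tokens = tokens(query)
    return [doc for doc in CORPUS if query_tokens & tokens(doc)]


def chain_step(question: str) -> Dict:
    # Chain: retrieval runs unconditionally before the model answers.
    return {"retrieved": True, "context": retrieve(question)}


def toy_policy(question: str) -> bool:
    # Hypothetical stand-in for the LLM's decision: skip retrieval for small talk.
    first = re.findall(r"[a-z]+", question.lower())[:1]
    return not (first and first[0] in GREETINGS)


def agent_step(question: str, should_retrieve: Callable[[str], bool]) -> Dict:
    # Agent: the model (here, a policy function) decides whether to retrieve at all.
    if should_retrieve(question):
        return {"retrieved": True, "context": retrieve(question)}
    return {"retrieved": False, "context": []}


print(chain_step("hello there"))
print(agent_step("hello there", toy_policy))
print(agent_step("How do diffusion models work?", toy_policy))
```

The chain pays for a retrieval call even on small talk; the agent can skip it, at the cost of an extra decision step.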

In [61]:
import os
from dotenv import load_dotenv
from langchain_groq import ChatGroq

In [62]:
load_dotenv()

True

In [63]:
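groq_api_key = os.getenv("GROQ_API_KEY")
hf_token = os.getenv("HF_TOKEN_ZETA")
if hf_token:  # os.environ only accepts strings, so set the token only if it exists
    os.environ["HUGGINGFACEHUB_API_TOKEN"] = hf_token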

In [64]:
llm = ChatGroq(groq_api_key=groq_api_key, model_name="llama3-8b-8192")

In [65]:
from langchain_huggingface import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

In [66]:
import bs4

from langchain_chroma import Chroma
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_text_splitters import RecursiveCharacterTextSplitter

from langchain_community.document_loaders import WebBaseLoader

In [67]:
# Load only the post title, header and body from the blog page
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2024-04-12-diffusion-video/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(class_=("post-content", "post-title", "post-header"))
    ),
)

In [68]:
docs = loader.load()

In [69]:
docs

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2024-04-12-diffusion-video/'}, page_content='\n\n      Diffusion Models for Video Generation\n    \nDate: April 12, 2024  |  Estimated Reading Time: 20 min  |  Author: Lilian Weng\n\n\nDiffusion models have demonstrated strong results on image synthesis in past years. Now the research community has started working on a harder task—using it for video generation. The task itself is a superset of the image case, since an image is a video of 1 frame, and it is much more challenging because:\n\nIt has extra requirements on temporal consistency across frames in time, which naturally demands more world knowledge to be encoded into the model.\nIn comparison to text or images, it is more difficult to collect large amounts of high-quality, high-dimensional video data, let along text-video pairs.\n\n\n\n🥑 Required Pre-read: Please make sure you have read the previous blog on “What are Diffusion Models?” for image generation before 

In [70]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

vectorstore = Chroma.from_documents(documents=splits, embedding=embeddings)
retriever = vectorstore.as_retriever()
retriever



VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x0000017E258D34D0>, search_kwargs={})
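`chunk_size=1000, chunk_overlap=200` means each chunk holds at most 1,000 characters and neighbouring chunks share up to 200 characters, so a sentence cut at a boundary still appears whole in one of them. The sketch below is a deliberately naive fixed-width version of that idea (`split_with_overlap` is a hypothetical helper); `RecursiveCharacterTextSplitter` additionally tries to cut on natural boundaries such as paragraph breaks, newlines and spaces, so its chunks will not line up exactly with these fixed windows.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Naive fixed-width chunker: each window starts chunk_size - chunk_overlap characters after the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2500))  # 2,500 characters of dummy text
chunks = split_with_overlap(sample)
print([len(c) for c in chunks])  # [1000, 1000, 900]
```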

In [71]:
system_prompt = (
    "You are a highly capable AI assistant specializing in answering questions based on provided context. "
    "Use only the given context to answer the question, and do not rely on prior knowledge or assumptions. "
    "If the answer is not present in the context, clearly state, 'I don't know.' "
    "Keep your response concise, limited to a maximum of three sentences, and ensure clarity and relevance to the question."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)
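Because `system_prompt` is built from adjacent string literals, every sentence needs its own trailing space: Python joins adjacent literals at compile time with no separator. The snippet below demonstrates this and shows that `{context}` stays a literal placeholder until the template is filled in. Using `str.format` here is an assumption standing in for `ChatPromptTemplate`'s default f-string-style formatting.

```python
# Adjacent string literals are joined at compile time with no separator,
# so each sentence needs its own trailing space.
without_spaces = (
    "Use only the given context."
    "If the answer is not present, say so."
)
with_spaces = (
    "Use only the given context. "
    "If the answer is not present, say so."
)
print(without_spaces)  # Use only the given context.If the answer is not present, say so.

# "{context}" survives as a literal placeholder until the template is filled in;
# str.format stands in for the prompt template's formatting step.
template = with_spaces + "\n\n{context}"
print(template.format(context="Video generation needs temporal consistency."))
```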

In [72]:
question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)
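
Conceptually, `create_retrieval_chain` combined with `create_stuff_documents_chain` performs three steps: retrieve documents for the input, join ("stuff") them into the `{context}` slot of the prompt, and call the LLM. The result is a dict with `input`, `context` and `answer` keys, matching the `response` dict printed in this notebook. The pure-Python sketch below mirrors that data flow under loud assumptions: word overlap replaces embedding similarity, `fake_llm` replaces ChatGroq, and joining documents with `"\n\n"` is our assumption about the default separator.

```python
import re
from typing import Callable, Dict, List

DOCS = [
    "Image diffusion models denoise a single frame.",
    "Video generation needs temporal consistency across frames.",
    "High-quality text-video pairs are hard to collect.",
]
SYSTEM = "Answer using only this context.\n\n{context}"


def tokens(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))


def top_k(query: str, docs: List[str], k: int = 2) -> List[str]:
    # Stand-in for embedding similarity: rank documents by shared-word count.
    query_tokens = tokens(query)
    return sorted(docs, key=lambda d: len(query_tokens & tokens(d)), reverse=True)[:k]


def retrieval_chain(question: str, docs: List[str], llm: Callable[[str, str], str], k: int = 2) -> Dict:
    context = top_k(question, docs, k)                    # 1. retrieve
    system = SYSTEM.format(context="\n\n".join(context))  # 2. "stuff" docs into the prompt
    answer = llm(system, question)                        # 3. generate
    return {"input": question, "context": context, "answer": answer}


def fake_llm(system: str, question: str) -> str:
    # Placeholder for the real chat model; it ignores the prompt entirely.
    return f"stub answer to: {question}"


result = retrieval_chain("Why does video generation need temporal consistency?", DOCS, fake_llm)
print(result["context"])
```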


In [73]:
response = rag_chain.invoke({"input": "In the case of video generation, we need the diffusion model for which purpose?"})
response


{'input': 'In the case of video generation, we need the diffusion model for which purpose?',
 'context': [Document(id='355d8af2-b0b7-4a35-8256-451e3521dd19', metadata={'source': 'https://lilianweng.github.io/posts/2024-04-12-diffusion-video/'}, page_content='Diffusion Models for Video Generation\n    \nDate: April 12, 2024  |  Estimated Reading Time: 20 min  |  Author: Lilian Weng\n\n\nDiffusion models have demonstrated strong results on image synthesis in past years. Now the research community has started working on a harder task—using it for video generation. The task itself is a superset of the image case, since an image is a video of 1 frame, and it is much more challenging because:\n\nIt has extra requirements on temporal consistency across frames in time, which naturally demands more world knowledge to be encoded into the model.\nIn comparison to text or images, it is more difficult to collect large amounts of high-quality, high-dimensional video data, let along text-video pairs.

In [74]:
response["answer"]

'According to the provided context, the diffusion model is needed for video generation, which is a "harder task" compared to image synthesis, due to the extra requirements on temporal consistency across frames in time and the difficulty of collecting large amounts of high-quality, high-dimensional video data.'

##### Adding chat history

In [75]:
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder

In [76]:
contextualize_q_system_prompt = (
    "You are a question reformulation assistant. Given a chat history and the latest user question, "
    "reformulate the latest user question into a standalone question that can be understood "
    "without any reference to the previous conversation. "
    "If the question already makes complete sense on its own, return it as is. "
    "Do not answer the question; only reformulate it if necessary."
)

contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
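
The `contextualize_q_prompt` exists so that a follow-up like "Tell me more about it" can be rewritten into a standalone search query before retrieval. As we understand `create_history_aware_retriever`, it sends the raw input straight to the retriever when the chat history is empty, and otherwise runs the prompt through the LLM and passes the rewritten question to the retriever. The sketch below reproduces that branching in plain Python; `toy_rephrase` and `echo_retrieve` are hypothetical stubs for the LLM call and the Chroma retriever.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, content)


def history_aware_retrieve(
    question: str,
    chat_history: List[Message],
    rephrase: Callable[[List[Message], str], str],
    retrieve: Callable[[str], List[str]],
) -> List[str]:
    if not chat_history:
        # First turn: nothing to resolve, so the question goes to the retriever as-is.
        return retrieve(question)
    # Later turns: an LLM call (stubbed below) rewrites the follow-up as a standalone query.
    return retrieve(rephrase(chat_history, question))


def toy_rephrase(chat_history: List[Message], question: str) -> str:
    # Hypothetical stub: attach the last human question so "it" has a referent.
    last_user = [content for role, content in chat_history if role == "human"][-1]
    return f"{question} (regarding: {last_user})"


def echo_retrieve(query: str) -> List[str]:
    # Stub retriever that just shows which query it was given.
    return [query]


history = [
    ("human", "What is the use of base denoising models?"),
    ("ai", "They perform spatial operations over all frames with shared parameters."),
]
print(history_aware_retrieve("Tell me more about it", history, toy_rephrase, echo_retrieve))
```

Without this rewriting step, the retriever searches for the literal text "Tell me more about it", which carries almost no topical signal.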




In [77]:
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
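
`MessagesPlaceholder("chat_history")` expands into however many messages the history holds, in order, between the system prompt and the new question. A minimal sketch of the resulting message list, using `(role, content)` tuples as a hypothetical stand-in for LangChain message objects:

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, content)


def build_messages(system_prompt: str, chat_history: List[Message], user_input: str) -> List[Message]:
    # System instructions first, then every prior turn, then the new question last.
    return [("system", system_prompt), *chat_history, ("human", user_input)]


history = [
    ("human", "What is the use of base denoising models?"),
    ("ai", "They perform spatial operations over all frames with shared parameters."),
]
for role, content in build_messages("Answer from context.", history, "Tell me more about it"):
    print(f"{role:>6}: {content}")
```

With an empty history the placeholder contributes no messages, so the first turn looks exactly like the single-turn prompt.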

In [78]:
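# Rewrite follow-up questions into standalone queries before retrieval
history_aware_retriever = create_history_aware_retriever(llm, retriever, contextualize_q_prompt)

question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)
rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)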


In [79]:
from langchain_core.messages import AIMessage, HumanMessage

chat_history = []

question = "What is the use of base denoising models?"
response1 = rag_chain.invoke({"input": question, "chat_history": chat_history})

chat_history.extend(
    [
        HumanMessage(content=question),
        AIMessage(content=response1["answer"])
    ]
)

question2 = "Tell me more about it"
response2 = rag_chain.invoke({"input": question2, "chat_history": chat_history})
print(response2)

{'input': 'Tell me more about it', 'chat_history': [HumanMessage(content='What is the use of base denoising models?', additional_kwargs={}, response_metadata={}), AIMessage(content='According to the context, the base denoising models perform spatial operations over all the frames with shared parameters simultaneously and then the temporal layer mixes activations across frames to better capture temporal coherence.', additional_kwargs={}, response_metadata={})], 'context': [Document(id='721f8b38-a5ef-4b75-a419-24b2ce3868c4', metadata={'source': 'https://lilianweng.github.io/posts/2024-04-12-diffusion-video/'}, page_content='Visualizing how the diffusion update step works in the angular coordinate, where DDIM evolves $\\mathbf{z}_{\\phi_s}$ by moving it along the $-\\hat{\\mathbf{v}}_{\\phi_t}$ direction. (Image source: Salimans & Ho, 2022)'), Document(id='1fc50a00-e172-4c9d-a670-79381e614ce0', metadata={'source': 'https://lilianweng.github.io/posts/2024-04-12-diffusion-video/'}, page_con

In [80]:
chat_history

[HumanMessage(content='What is the use of base denoising models?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='According to the context, the base denoising models perform spatial operations over all the frames with shared parameters simultaneously and then the temporal layer mixes activations across frames to better capture temporal coherence.', additional_kwargs={}, response_metadata={})]
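
The invoke-then-extend pattern has to be repeated after every turn. A small helper keeps the history in step; `ask` and `stub_chain` are hypothetical, and the tuples stand in for `HumanMessage`/`AIMessage`. The list is named `demo_history` so it does not overwrite the notebook's `chat_history`.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (role, content); stands in for HumanMessage / AIMessage


def ask(chain: Callable[[Dict], Dict], question: str, chat_history: List[Message]) -> str:
    # Pass a snapshot of the history so the chain sees only the earlier turns.
    response = chain({"input": question, "chat_history": list(chat_history)})
    # Record both sides of the turn, in order, for the next call.
    chat_history.extend([("human", question), ("ai", response["answer"])])
    return response["answer"]


def stub_chain(inputs: Dict) -> Dict:
    # Placeholder for rag_chain.invoke: numbers each answer by turn.
    return {"answer": f"answer #{len(inputs['chat_history']) // 2 + 1}"}


demo_history: List[Message] = []
print(ask(stub_chain, "What is the use of base denoising models?", demo_history))  # answer #1
print(ask(stub_chain, "Tell me more about it", demo_history))  # answer #2
```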

In [81]:
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

In [83]:
store = {}

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer",
)
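
`RunnableWithMessageHistory` automates the same bookkeeping per session: it looks up the history for the `session_id` passed in `config`, injects it under `history_messages_key`, and after the run appends the `input_messages_key` message and the `output_messages_key` answer to that history. The plain-Python sketch below imitates this behaviour to show why sessions stay isolated; it is not the library's implementation, and `DemoHistory`, `get_demo_history` and `invoke_with_history` are hypothetical names chosen not to clash with `store` and `get_session_history`.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (role, content)


class DemoHistory:
    """Minimal stand-in for ChatMessageHistory: an append-only list of messages."""

    def __init__(self) -> None:
        self.messages: List[Message] = []

    def add_messages(self, messages: List[Message]) -> None:
        self.messages.extend(messages)


demo_store: Dict[str, DemoHistory] = {}


def get_demo_history(session_id: str) -> DemoHistory:
    # Same lazy-creation pattern as get_session_history.
    if session_id not in demo_store:
        demo_store[session_id] = DemoHistory()
    return demo_store[session_id]


def invoke_with_history(chain: Callable[[Dict], Dict], user_input: str, session_id: str) -> Dict:
    history = get_demo_history(session_id)
    # Inject the stored messages under the history key ("chat_history").
    result = chain({"input": user_input, "chat_history": list(history.messages)})
    # Append the new input and the "answer" output to this session only.
    history.add_messages([("human", user_input), ("ai", result["answer"])])
    return result


def stub_chain(inputs: Dict) -> Dict:
    return {"answer": f"saw {len(inputs['chat_history'])} earlier messages"}


invoke_with_history(stub_chain, "First question", "xyz123")
print(invoke_with_history(stub_chain, "Tell me more about it", "xyz123")["answer"])  # saw 2 earlier messages
print(invoke_with_history(stub_chain, "Unrelated question", "abc456")["answer"])  # saw 0 earlier messages
```

Because each `session_id` maps to its own history object, a follow-up in one session never sees turns from another.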


In [88]:
conversational_rag_chain.invoke(
    {"input": "What are still two common architecture choices for video generation?"},
    config={"configurable":{"session_id": "xyz123"}}
)["answer"]


"I don't know. The provided context does not mention specific architecture choices for video generation."

In [89]:
conversational_rag_chain.invoke(
    {"input": "Tell me more about it?"},
    config={"configurable":{"session_id": "xyz123"}}
)["answer"]

"I apologize, but I don't have any more information to provide. The text only describes the hierarchical sampler and its use in generating long videos with time consistency under memory constraints, and does not provide any additional details or examples."

In [90]:
store

{'xyz123': InMemoryChatMessageHistory(messages=[HumanMessage(content='What are still two common architecture choices for video generation?', additional_kwargs={}, response_metadata={}), AIMessage(content="I don't know. The provided context does not mention specific architecture choices for video generation. It only discusses the Gen-1 model by Runway and its decomposition of structure and content for video editing.", additional_kwargs={}, response_metadata={}), HumanMessage(content='Tell me more about it', additional_kwargs={}, response_metadata={}), AIMessage(content='I apologize, but the provided context does not mention the Gen-1 model by Runway or its decomposition of structure and content for video editing. The context only appears to be a diagram or image description related to the diffusion update step in the angular coordinate and its use in DDIM (Denominator Diffusion-based Image Model). It does not contain any information about video generation architecture choices.', additio