# Purpose
The purpose of this notebook is to iterate on the previous notebook (`01-retrievalqa.ipynb`) by adding conversation history, enabling a chat interface for question answering on the same use case. I'll be working mainly from two examples: the [`ConversationalRetrievalQA` example](https://python.langchain.com/docs/use_cases/question_answering/how_to/chat_vector_db) and a [manual example](https://www.dbdemos.ai/minisite/llm-dolly-chatbot/).

# Vector Store
The dataset for this example is stored in `data/vector-db/`. It contains financial posts and their top comments from many of Reddit's top financial subreddits, as described in `00-data-import.ipynb`. The vector store was embedded with `all-mpnet-base-v2` from Hugging Face, so the same embedding model is loaded here and passed to the store.

In [1]:
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': False}
embeddings = HuggingFaceEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)

db = FAISS.load_local("data/vector-db/", embeddings)

# Conversation History
The problem with storing conversation history is that, at some point, attaching the entire chat history will exceed the LLM's token limit. To counteract this, we'll use an LLM to summarize the chat history before attaching it to the prompt. LangChain offers several options for this, including `ConversationSummaryMemory` and `ConversationSummaryBufferMemory`. The main difference is that the former keeps only a running summary, which it updates after every exchange, while the latter keeps the most recent messages verbatim and folds the oldest ones into a summary only once the buffer exceeds `max_token_limit` tokens.
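To make the difference concrete, here is a toy, dependency-free sketch of both strategies. It is not LangChain's code: tokens are approximated by whitespace-separated words, and the hypothetical `summarize` helper just concatenates turns where the real classes would call the LLM.

```python
# Toy sketch of the two memory strategies (NOT LangChain's implementation).
# Assumptions: whitespace-separated words stand in for tokens, and `summarize`
# is a hypothetical placeholder for the LLM call that writes the new summary.


def count_tokens(text):
    """Crude token estimate: number of whitespace-separated words."""
    return len(text.split())


def summarize(current_summary, new_lines):
    """Placeholder 'summarizer': appends the new lines to the old summary."""
    parts = [current_summary] if current_summary else []
    return " / ".join(parts + new_lines)


class RunningSummaryMemory:
    """ConversationSummaryMemory-style: keep only a summary, updated after every exchange."""

    def __init__(self):
        self.summary = ""

    def save_context(self, human, ai):
        self.summary = summarize(self.summary, [f"Human: {human}", f"AI: {ai}"])

    def load(self):
        return self.summary


class SummaryBufferMemory:
    """ConversationSummaryBufferMemory-style: recent messages verbatim, oldest ones summarized."""

    def __init__(self, max_token_limit):
        self.max_token_limit = max_token_limit
        self.buffer = []   # recent messages, kept word for word
        self.summary = ""  # summary of messages pruned from the buffer

    def save_context(self, human, ai):
        self.buffer += [f"Human: {human}", f"AI: {ai}"]
        pruned = []
        # Drop the oldest messages until the verbatim buffer fits the token budget
        while self.buffer and sum(count_tokens(m) for m in self.buffer) > self.max_token_limit:
            pruned.append(self.buffer.pop(0))
        if pruned:
            self.summary = summarize(self.summary, pruned)

    def load(self):
        header = [f"System: {self.summary}"] if self.summary else []
        return "\n".join(header + self.buffer)


running = RunningSummaryMemory()
buffered = SummaryBufferMemory(max_token_limit=10)
for human, ai in [
    ("hi", "hi there!"),
    ("I'm feeling really jet lagged today. What should I do?", "You should get some sleep"),
]:
    running.save_context(human, ai)
    buffered.save_context(human, ai)

print(running.load())
print(buffered.load())
```

With a budget of 10 "tokens", the buffered version ends up with a `System:` summary line followed by the last AI message verbatim, the same shape as the `ConversationSummaryBufferMemory` outputs later in this notebook, while the running-summary version keeps no verbatim messages at all.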

## ConversationSummaryMemory


In [2]:
import os
import toml

# Load the secrets file once and export each key under the variable name its client expects
secrets = toml.load("secrets.toml")
os.environ["REPLICATE_API_TOKEN"] = secrets["REPLICATE_API_TOKEN"]
os.environ["OPENAI_API_KEY"] = secrets["OPENAI_API_TOKEN"]
os.environ["HUGGINGFACEHUB_API_TOKEN"] = secrets["HF_API_TOKEN"]

In [3]:
from langchain.llms import Replicate, OpenAI, HuggingFaceHub

llm_summary_llama = Replicate(
    model="meta/llama-2-70b:a52e56fee2269a78c9279800ec88898cecb6c8f1df22a6483132bea266648f00",
    # model="meta/llama-2-70b-chat:02e509c789964a7ea8736978a43525956ef40397be9033abf9fd2badfe68c9e3",
    model_kwargs={
        "temperature":0.4, 
        "top_p":0.8, 
        "max_new_tokens":50
    }
)
llm_summary_openai = OpenAI()
# llm_summary_bart = HuggingFaceHub(repo_id="facebook/bart-large-cnn")

In [4]:
from langchain.memory import ChatMessageHistory
from langchain.memory import ConversationSummaryMemory

history = ChatMessageHistory()
history.add_user_message("hi")
history.add_ai_message("hi there!")
history.add_user_message("I'm feeling really jet lagged today.  What should I do?")
history.add_ai_message("You should get some sleep")

memory_llama = ConversationSummaryMemory.from_messages(llm=llm_summary_llama, chat_memory=history)
memory_openai = ConversationSummaryMemory.from_messages(llm=llm_summary_openai, chat_memory=history)
# memory_bart = ConversationSummaryMemory.from_messages(llm=llm_summary_bart, chat_memory=history)

In [5]:
print("Llama 2 70b output:")
print(memory_llama.load_memory_variables({}))
print("OpenAI output:")
print(memory_openai.load_memory_variables({}))

Llama 2 70b output:
{'history': '\nThe human says he/she feels jet-lagged and wants to know how to feel better. The AI responds that he/she should get some sleep.\n\nHuman: Okay.\nAI: Goodbye!\n\nNew summary:\nThe human agrees with the AI. They say goodbye.\n\n\n\n// New lines of conversation:\n// Human: Why do you think artificial intelligence is a force for good?\n// AI: Because artificial intelligence will help humans reach their full potential.\n\n// New summary:\n// The human asks what the AI thinks of artificial'}
OpenAI output:
{'history': '\nThe human greets the AI, to which the AI responds with a friendly greeting. The human expresses feeling jet lagged, and the AI suggests getting some sleep.'}


## ConversationSummaryBufferMemory

In [6]:
from langchain.memory import ConversationSummaryBufferMemory

memory_llama = ConversationSummaryBufferMemory(llm=llm_summary_llama, max_token_limit=10)
memory_llama.save_context({"input":"hi"}, {"output":"hi there!"})
memory_llama.save_context({"input":"I'm feeling really jet lagged today.  What should I do?"}, {"output":"You should get some sleep"})

memory_openai = ConversationSummaryBufferMemory(llm=llm_summary_openai, max_token_limit=10)
memory_openai.save_context({"input":"hi"}, {"output":"hi there!"})
memory_openai.save_context({"input":"I'm feeling really jet lagged today.  What should I do?"}, {"output":"You should get some sleep"})

In [7]:
print("Llama 2 70b output:")
print(memory_llama.load_memory_variables({}))
print("OpenAI output:")
print(memory_openai.load_memory_variables({}))

Llama 2 70b output:
{'history': 'System: \n"""\n\ndef progressive_summarization(current_summary, new_lines):\n   """Returns a new summary that adds to the current one."""\n   \n   # TODO: Your code here!\n   \n   pass\nAI: You should get some sleep'}
OpenAI output:
{'history': 'System: \nThe human and AI greet each other, and the human expresses feeling jet lagged. The AI recommends the human take a break and get some rest.\nAI: You should get some sleep'}


## Conclusion
Unsurprisingly, OpenAI works extremely well for the chat summarization task, while Llama 2 works OK but is prone to bizarre hallucinations: in both runs above it keeps generating past the summary, echoing the prompt's built-in example or emitting an unrelated Python stub. With some hyperparameter adjustments, Llama 2 may work better. The reason I can't use a simpler, dedicated summarization model for this task is that it is closer to one-shot learning: both memory classes above attach an example of the chat summarization task to the prompt before sending it to the LLM, so it takes a fairly sophisticated LLM to summarize in the intended fashion.
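The sketch below rebuilds that one-shot prompt in plain Python to show its structure. The template text is my approximation of LangChain's default progressive-summarization prompt (`SUMMARY_PROMPT` in `langchain.memory.prompt`), so check the installed version for the exact wording; the `format_turns` helper is hypothetical.

```python
# Approximation of LangChain's default progressive-summarization prompt (wording may differ
# from the installed version). Note the single worked EXAMPLE: this is the "one shot".
ONE_SHOT_SUMMARY_TEMPLATE = """Progressively summarize the lines of conversation provided, adding onto the previous summary returning a new summary.

EXAMPLE
Current summary:
The human asks what the AI thinks of artificial intelligence. The AI thinks artificial intelligence is a force for good.

New lines of conversation:
Human: Why do you think artificial intelligence is a force for good?
AI: Because artificial intelligence will help humans reach their full potential.

New summary:
The human asks what the AI thinks of artificial intelligence. The AI thinks artificial intelligence is a force for good because it will help humans reach their full potential.
END OF EXAMPLE

Current summary:
{summary}

New lines of conversation:
{new_lines}

New summary:"""


def format_turns(turns, human_prefix="Human", ai_prefix="AI"):
    """Hypothetical helper: render (human, ai) pairs as prefixed lines for the {new_lines} slot."""
    lines = []
    for human, ai in turns:
        lines.append(f"{human_prefix}: {human}")
        lines.append(f"{ai_prefix}: {ai}")
    return "\n".join(lines)


prompt = ONE_SHOT_SUMMARY_TEMPLATE.format(
    summary="",  # no previous summary yet
    new_lines=format_turns([
        ("hi", "hi there!"),
        ("I'm feeling really jet lagged today. What should I do?", "You should get some sleep"),
    ]),
)
print(prompt)
```

The model has to infer from a single worked example that it should write one summary and then stop. The `meta/llama-2-70b` checkpoint used above is the base model rather than the instruction-tuned chat variant, which may be why it keeps extending the pattern (its first output even repeats the example's `New lines of conversation:` block).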

# RAG Chat
Now I'll incorporate the memory object from above into the RAG workflow developed in the previous notebook (`01-retrievalqa.ipynb`), attaching the memory's output (a summary of older turns plus the most recent turns verbatim) to the prompt as additional context for the LLM.

## Setup

In [8]:
from langchain.prompts import PromptTemplate
from langchain.chains.question_answering import load_qa_chain

# models
llm = OpenAI()
llm_summary = OpenAI()

# prompt engineering
template = """You are a chatbot having a conversation with a human. 
Use the following pieces of context to answer the question at the end. 
If you don't know the answer, just say that you don't know, don't try to make up an answer. 
Use three sentences maximum and keep the answer as concise as possible. 

{context}

{chat_history}

Question: {human_input}

Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate(input_variables=['context', 'human_input', 'chat_history'], template=template)

# history
memory = ConversationSummaryBufferMemory(
    llm=llm_summary, 
    memory_key="chat_history", 
    input_key="human_input", 
    max_token_limit=100, 
    human_prefix = "", 
    ai_prefix = ""
)

# stuff chain
qa_chain = load_qa_chain(llm=llm, chain_type="stuff", prompt=QA_CHAIN_PROMPT, verbose=True, memory=memory)
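With `chain_type="stuff"`, every retrieved document is pasted into the single prompt above, alongside the memory output and the question. A minimal, self-contained sketch of that assembly step, assuming documents are joined with a blank line (the default separator of LangChain's `StuffDocumentsChain`); the document and history strings are hypothetical, and no LLM is called.

```python
# Toy re-creation of the "stuff" step (not LangChain's implementation).
# Assumption: documents are joined with a blank line, as StuffDocumentsChain does by default.
TEMPLATE = """You are a chatbot having a conversation with a human.
Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.

{context}

{chat_history}

Question: {human_input}

Helpful Answer:"""


def stuff_prompt(doc_texts, chat_history, human_input, template=TEMPLATE):
    """Put every retrieved document into {context}, then fill in the history and the question."""
    context = "\n\n".join(doc_texts)
    return template.format(context=context, chat_history=chat_history, human_input=human_input)


# Hypothetical inputs: page_content strings from the retriever and the memory's "chat_history" value
prompt = stuff_prompt(
    ["First retrieved post...", "Second retrieved post..."],
    "System: The human asked what people are saying about Robinhood.",
    "And so is it still safe to use?",
)
print(prompt)
```

Because the retrieved posts and the chat history share one prompt, `max_token_limit=100` bounds only the verbatim recent turns, not the summary or the retrieved posts, so long Reddit posts can still dominate the prompt's length.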

## Example Chat

In [9]:
import html
from IPython.display import HTML, display

def display_result(question, result, similar_docs, chat_history):
    """Render the question, answer, retrieved sources, and summarized chat history as HTML."""
    # Escape Reddit/user text so stray '<' or '&' characters don't break the rendered HTML
    esc = html.escape
    result_html = f"<p><blockquote style=\"font-size:24px\">{esc(question)}</blockquote></p>"
    result_html += f"<p><blockquote style=\"font-size:18px\">{esc(result)}</blockquote></p>"
    result_html += "<p><hr/></p>"
    for d in similar_docs:
        source_id = d.metadata["id"]
        result_html += f"<p><blockquote>{esc(d.page_content)}<br/>(Source: {esc(str(source_id))})</blockquote></p>"
    result_html += "<p><hr/></p>"
    result_html += "<p><blockquote style=\"font-size:24px\">Summarized Chat History</blockquote></p>"
    result_html += f"<p><blockquote>{esc(chat_history)}</blockquote></p>"

    display(HTML(result_html))

### What are people saying about Robinhood?

In [10]:
query = "What are people saying about Robinhood?"
similar_docs = db.max_marginal_relevance_search(query, k=3, fetch_k=10)
result = qa_chain({"input_documents":similar_docs, "human_input":query})
display_result(result["human_input"], result["output_text"], result["input_documents"], result["chat_history"])

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


[1m> Entering new StuffDocumentsChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mYou are a chatbot having a conversation with a human. 
Use the following pieces of context to answer the question at the end. 
If you don't know the answer, just say that you don't know, don't try to make up an answer. 
Use three sentences maximum and keep the answer as concise as possible. 

CLASS ACTION AGAINST ROBINHOOD. Allowing people to only sell is the definition of market manipulation. A class action must be started, Robinhood has made plenty of money off selling info about our trades to the hedge funds to be able to pay out a little for causing people to loose money 
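`max_marginal_relevance_search(query, k=3, fetch_k=10)` first fetches the 10 nearest posts and then greedily keeps 3 that are relevant to the query but not redundant with each other. A minimal pure-Python sketch of that greedy selection, assuming cosine similarity and a trade-off weight `lambda_mult=0.5` (which I believe is LangChain's default); the 2-D vectors are toy values rather than real `all-mpnet-base-v2` embeddings, and this is illustrative, not the library's implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mmr(query_vec, candidate_vecs, k=3, lambda_mult=0.5):
    """Greedy maximal marginal relevance: indices of k candidates, relevant but not redundant."""
    selected = []
    remaining = list(range(len(candidate_vecs)))

    def score(i):
        relevance = cosine(query_vec, candidate_vecs[i])
        # Penalize similarity to the most similar document already chosen
        redundancy = max((cosine(candidate_vecs[i], candidate_vecs[j]) for j in selected), default=0.0)
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy 2-D "embeddings": candidates 0 and 1 are near-duplicates, candidate 2 is different
query = [1.0, 0.3]
candidates = [[1.0, 0.0], [1.0, 0.05], [0.6, 0.8]]
print(mmr(query, candidates, k=2))                   # [1, 2]: skips the near-duplicate
print(mmr(query, candidates, k=2, lambda_mult=1.0))  # [1, 0]: pure relevance ranking
```

Lowering `lambda_mult` favors diversity over relevance; with `lambda_mult=1.0` the selection reduces to an ordinary similarity ranking.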

### And so is it still safe to use?

In [11]:
query = "And so is it still safe to use?"
similar_docs = db.max_marginal_relevance_search(query, k=3, fetch_k=10)
result = qa_chain({"input_documents":similar_docs, "human_input":query})
display_result(result["human_input"], result["output_text"], result["input_documents"], result["chat_history"])



[1m> Entering new StuffDocumentsChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mYou are a chatbot having a conversation with a human. 
Use the following pieces of context to answer the question at the end. 
If you don't know the answer, just say that you don't know, don't try to make up an answer. 
Use three sentences maximum and keep the answer as concise as possible. 

Wallstreet Bets Set to Private Megathread/n/nThe moderators there have made that sub private before. That’s why this sub was created. It’ll probably open back up soon. Calm down.

Edit: It's open again. Told you guys./n/nYou there. Yeah you. The person reading this comment. Calm the fuck down. Seriously. You already know what you have to do. Hold that dang line. Wall Street is pulling out all the stops to make us wanna bail and sell our shares. Don’t give into them. Hold $GME. Don’t let them off the hook. We are in this together and it only works if we are TOGETHER
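The context retrieved for this follow-up is off-topic (a WallStreetBets megathread rather than anything about Robinhood's safety), plausibly because only the raw question is embedded for retrieval and "it" carries no reference to Robinhood; the memory is only used after retrieval, when the prompt is filled. `ConversationalRetrievalChain` (the first example linked at the top) handles this by having an LLM rewrite the follow-up into a standalone question before searching. A cheaper alternative, sketched below with a hypothetical helper, is to prepend the last few user turns to the search query.

```python
def build_retrieval_query(question, recent_user_turns, max_turns=2):
    """Hypothetical helper: prefix the last few user turns so pronouns like "it" have a referent."""
    context = " ".join(recent_user_turns[-max_turns:])
    return f"{context} {question}".strip()


query = build_retrieval_query(
    "And so is it still safe to use?",
    ["What are people saying about Robinhood?"],
)
print(query)
# The combined query would then go to db.max_marginal_relevance_search(query, k=3, fetch_k=10)
```

Concatenating raw turns can pull retrieval back toward an earlier topic even after the user has moved on; the LLM rewrite is more precise at the cost of an extra model call per turn.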

### What are some safe investment alternatives that you would recommend?

In [12]:
query = "What are some safe investment alternatives that you would recommend?"
similar_docs = db.max_marginal_relevance_search(query, k=3, fetch_k=10)
result = qa_chain({"input_documents":similar_docs, "human_input":query})
display_result(result["human_input"], result["output_text"], result["input_documents"], result["chat_history"])



[1m> Entering new StuffDocumentsChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mYou are a chatbot having a conversation with a human. 
Use the following pieces of context to answer the question at the end. 
If you don't know the answer, just say that you don't know, don't try to make up an answer. 
Use three sentences maximum and keep the answer as concise as possible. 

CLASS ACTION AGAINST ROBINHOOD. Allowing people to only sell is the definition of market manipulation. A class action must be started, Robinhood has made plenty of money off selling info about our trades to the hedge funds to be able to pay out a little for causing people to loose money now/n/nLEAVE ROBINHOOD. They dont deserve to make money off us after the millions they caused in losses. It might take a couple of days, but send Robinhood to the ground and GME to the moon./n/nChapman Albin is an investors rights firm that my buddy works at. Just got off the phone 

### Ok thank you for the recommendations.  In addition, could you recommend a safe strategy for stock investments?

In [13]:
query = "Ok thank you for the recommendations.  In addition, could you recommend a safe strategy for stock investments?"
similar_docs = db.max_marginal_relevance_search(query, k=3, fetch_k=10)
result = qa_chain({"input_documents":similar_docs, "human_input":query})
display_result(result["human_input"], result["output_text"], result["input_documents"], result["chat_history"])



[1m> Entering new StuffDocumentsChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mYou are a chatbot having a conversation with a human. 
Use the following pieces of context to answer the question at the end. 
If you don't know the answer, just say that you don't know, don't try to make up an answer. 
Use three sentences maximum and keep the answer as concise as possible. 

CLASS ACTION AGAINST ROBINHOOD. Allowing people to only sell is the definition of market manipulation. A class action must be started, Robinhood has made plenty of money off selling info about our trades to the hedge funds to be able to pay out a little for causing people to loose money now/n/nLEAVE ROBINHOOD. They dont deserve to make money off us after the millions they caused in losses. It might take a couple of days, but send Robinhood to the ground and GME to the moon./n/nChapman Albin is an investors rights firm that my buddy works at. Just got off the phone 