# Purpose
This notebook trials `RetrievalQA` for question answering in our current use case.  I'll be working mainly from these two examples: [example 1](https://python.langchain.com/docs/use_cases/question_answering/) & [example 2](https://python.langchain.com/docs/use_cases/question_answering/how_to/vector_db_qa).

# Vector Store

The dataset for this example is stored in `data/vector-db/` and contains financial posts and top comments from many of Reddit's top financial subreddits, as described in `00-data-import.ipynb`.  The vector store was embedded using `all-mpnet-base-v2` from Hugging Face, so the same embedding model is loaded here and passed to the loader.

In [1]:
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': False}
embeddings = HuggingFaceEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)

db = FAISS.load_local("data/vector-db/", embeddings)

# Retrieval

## Similarity Search

The main functionality of the `FAISS` vector store is its built-in, fast, approximate similarity search.  This is crucial to the RAG workflow.  I'll demonstrate with a couple of similar results below.

In [7]:
query = "What are people saying about Robinhood?"

similar_docs = db.similarity_search(query, k=2)
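
To make the idea concrete, here is a minimal brute-force sketch of a top-k similarity search in plain Python. It is not how FAISS is implemented: the function names (`cosine_similarity`, `top_k_similar`) and the toy 2-D vectors are made up for illustration, and I'm assuming cosine similarity as the metric, whereas the real index may use a different one such as L2 distance.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_similar(query_vec, doc_vecs, k=2):
    """Return the indices of the k vectors most similar to the query.

    Hypothetical helper: scores every document exhaustively, which is
    exact but slow. Assumes cosine similarity as the metric.
    """
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]

# Toy 2-D "embeddings" (made up for illustration; real ones have hundreds of dimensions).
toy_docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
toy_query = [1.0, 0.05]
nearest = top_k_similar(toy_query, toy_docs, k=2)
print(nearest)
```

The vector store does the same job conceptually (embed the query, rank stored vectors by closeness, return the top `k`), just with an index built for speed.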

In [10]:
from IPython.display import HTML, display

def display_result(question, result, similar_docs):
    """Render the question, the answer, and each source document as HTML."""
    # Question and answer go first, in larger type.
    result_html = f"<p><blockquote style=\"font-size:24px\">{question}</blockquote></p>"
    result_html += f"<p><blockquote style=\"font-size:18px\">{result}</blockquote></p>"
    result_html += "<p><hr/></p>"
    # Then one blockquote per retrieved document, tagged with its source id.
    for d in similar_docs:
        source_id = d.metadata["id"]
        result_html += f"<p><blockquote>{d.page_content}<br/>(Source: {source_id})</blockquote></p>"
    display(HTML(result_html))

display_result(query, "No result yet!", similar_docs)

## Maximum marginal relevance search (MMR)
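MMR is a slower alternative to plain similarity search that tends to return less redundant results.  It first fetches `fetch_k` candidates by similarity, then picks `k` of them while optimizing for both similarity to the query and diversity among the selected documents.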

In [11]:
similar_docs = db.max_marginal_relevance_search(query, k=2, fetch_k=10)
display_result(query, "No result yet!", similar_docs)
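
The selection step behind MMR can be sketched as a greedy loop: repeatedly pick the candidate with the best trade-off between relevance to the query and redundancy with what has already been picked. This is a rough illustration under my own assumptions, not the library's code: `mmr_select`, the `lambda_mult` weight, the cosine metric, and the toy vectors are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mmr_select(query_vec, cand_vecs, k=2, lambda_mult=0.5):
    """Greedy maximal marginal relevance over a list of candidate vectors.

    Hypothetical helper. lambda_mult=1.0 ranks purely by relevance;
    lower values penalize candidates similar to ones already selected.
    """
    selected = []
    remaining = list(range(len(cand_vecs)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            relevance = cosine(query_vec, cand_vecs[i])
            # Redundancy is the similarity to the closest already-selected document.
            redundancy = max(
                (cosine(cand_vecs[i], cand_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy candidates: the first two are near-duplicates, the third is different.
toy_query = [1.0, 1.0]
toy_candidates = [[1.0, 0.9], [1.0, 0.8], [0.2, 1.0]]
diverse = mmr_select(toy_query, toy_candidates, k=2, lambda_mult=0.5)
relevance_only = mmr_select(toy_query, toy_candidates, k=2, lambda_mult=1.0)
print(diverse, relevance_only)
```

In the toy data the balanced setting skips the near-duplicate in favor of the third vector, while the relevance-only setting returns both near-duplicates.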

# Generate

## RetrievalQA Chain

Now it's time to use an LLM in combination with the above to generate a response.  Here, I'll use the `RetrievalQA` chain with a custom prompt template and Llama 2 70B [hosted on Replicate](https://replicate.com/meta/llama-2-70b).

In [12]:
import toml
import os

REPLICATE_API_TOKEN = toml.load("secrets.toml")["REPLICATE_API_TOKEN"]
os.environ["REPLICATE_API_TOKEN"] = REPLICATE_API_TOKEN

In [17]:
from langchain.llms import Replicate

llm = Replicate(
    model="meta/llama-2-70b:a52e56fee2269a78c9279800ec88898cecb6c8f1df22a6483132bea266648f00",
    model_kwargs={
        "temperature": 0.1,
        "top_p": 0.9,
        "max_length": 128,
        "repetition_penalty": 1,
    },
)

In [49]:
from langchain.prompts import PromptTemplate

template = """Use the following pieces of context to answer the question at the end. 
If you don't know the answer, just say that you don't know, don't try to make up an answer. 
Use three sentences maximum and keep the answer as concise as possible. 
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)

In [45]:
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=db.as_retriever(),
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT},
    return_source_documents=True
)

In [46]:
result = qa_chain({"query":query})
display_result(result["query"], result["result"], result["source_documents"])

## Stuff Chain

I can exert a little more control over the process if I call a "stuff" chain directly instead.  This allows me to fetch the documents however I see fit and then "stuff" them into the prompt.
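
Conceptually, "stuffing" just means concatenating the retrieved document texts and substituting them into the prompt's `{context}` slot. Here is a minimal sketch of that step using plain string formatting. The helper `stuff_prompt`, the separator, and the toy template and documents are my own illustrative assumptions, not the chain's actual internals.

```python
def stuff_prompt(template, doc_texts, question, separator="\n\n"):
    """Join all document texts into one context string and fill the template.

    Hypothetical helper: assumes the template has {context} and {question}
    placeholders and that all documents fit in the model's context window.
    """
    context = separator.join(doc_texts)
    return template.format(context=context, question=question)

# Toy template and documents, made up for illustration.
toy_template = "Use the context to answer.\n{context}\nQuestion: {question}\nHelpful Answer:"
toy_docs = ["First retrieved post.", "Second retrieved post."]
prompt_text = stuff_prompt(toy_template, toy_docs, "What are people saying?")
print(prompt_text)
```

The trade-off is that everything goes into a single LLM call, so the number and length of the documents I pass in are bounded by the model's context window.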

In [50]:
from langchain.chains.question_answering import load_qa_chain

qa_chain = load_qa_chain(llm=llm, chain_type="stuff", prompt=QA_CHAIN_PROMPT, verbose=True)

In [51]:
similar_docs = db.max_marginal_relevance_search(query, k=2, fetch_k=10)
result = qa_chain({"input_documents":similar_docs, "question":query})



> Entering new StuffDocumentsChain chain...

> Entering new LLMChain chain...
Prompt after formatting:
Use the following pieces of context to answer the question at the end. 
If you don't know the answer, just say that you don't know, don't try to make up an answer. 
Use three sentences maximum and keep the answer as concise as possible. 
CLASS ACTION AGAINST ROBINHOOD. Allowing people to only sell is the definition of market manipulation. A class action must be started, Robinhood has made plenty of money off selling info about our trades to the hedge funds to be able to pay out a little for causing people to loose money now/n/nLEAVE ROBINHOOD. They dont deserve to make money off us after the millions they caused in losses. It might take a couple of days, but send Robinhood to the ground and GME to the moon./n/nChapman Albin is an investors rights firm that my buddy works at. Just got off the phone w him. He is going to post a press release regarding the

In [53]:
display_result(result["question"], result["output_text"], result["input_documents"])