# Understanding Retrieval Question Answering

In [1]:
import os, random
from pathlib import Path
import tiktoken
from getpass import getpass
from rich.markdown import Markdown

# Set OpenAI API key

To get a key, visit the [OpenAI API keys page](https://platform.openai.com/account/api-keys).

In [2]:
if os.getenv("OPENAI_API_KEY") is None:
    # In VS Code the getpass prompt appears at the top of the window
    if any("VSCODE" in name for name in os.environ):
        print('Please enter password in the VS Code prompt at the top of your VS Code window!')
    os.environ["OPENAI_API_KEY"] = getpass("Paste your OpenAI key from: https://platform.openai.com/account/api-keys\n")

assert os.getenv("OPENAI_API_KEY", "").startswith("sk-"), "This doesn't look like a valid OpenAI API key"
print("OpenAI API key configured")

Please enter password in the VS Code prompt at the top of your VS Code window!


AssertionError: This doesn't look like a valid OpenAI API key

# LangChain

[LangChain](https://docs.langchain.com/docs/) is a framework for developing applications powered by LLMs. We will use two of its features in the code below:
* Document loaders and text splitters, for processing and parsing documents.
* The retrieval chain, which bundles most of the functionality needed to implement our question-answering system.

Let's start by configuring W&B tracing. 

In [2]:
# Need a single line of code to start tracing langchain with W&B
os.environ["LANGCHAIN_WANDB_TRACING"] = "true"

# wandb documentation to configure wandb using env variables
# https://docs.wandb.ai/guides/track/advanced/environment-variables
# here we are configuring the wandb project name
os.environ["WANDB_PROJECT"] = "llmapps"

# os.environ["RUST_BACKTRACE"] = "full"

## Parsing documents

We will use a small sample of Markdown documents in this notebook. Let's find them and make sure we can stuff them into the prompt. That means they may need to be chunked so that no chunk exceeds a given number of tokens.

In [3]:
# Start with the davinci model because it uses a simpler prompt (later move on to gpt-4)
MODEL_NAME = "text-davinci-003"
# MODEL_NAME = "gpt-4"

In [None]:
# First step of parsing our documents is to load all the Markdown files in 
# specified directory
# - we do this by using class from LangChain -> DirectoryLoader

from langchain.document_loaders import DirectoryLoader

def find_md_files(directory):
    """Find all Markdown files in a directory and return them as a list of LangChain Documents."""
    dl = DirectoryLoader(directory, "**/*.md")
    return dl.load()

# Load all the Markdown
documents = find_md_files('../docs_sample/')

# Number of documents
len(documents)
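
The `**/*.md` pattern passed to `DirectoryLoader` is a recursive glob. As a rough illustration of which files such a pattern matches, the sketch below uses only the standard library. `list_md_files` is a hypothetical helper introduced here; it only lists paths and does not reproduce how `DirectoryLoader` parses file contents.

```python
from pathlib import Path


def list_md_files(directory):
    """Return the sorted paths of all Markdown files under `directory`.

    Hypothetical helper for illustration: "**" matches zero or more
    nested folders, so files in the top-level directory match as well.
    """
    return sorted(Path(directory).glob("**/*.md"))
```

Calling `list_md_files('../docs_sample/')` should list the same files that the loader picks up, assuming the directory exists.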

In [None]:
# We will need to count tokens in the documents. For that we need a tokenizer
tokenizer = tiktoken.encoding_for_model(MODEL_NAME)

In [None]:
# Function to count the number of tokens in each document
def count_tokens(documents):
    token_counts = [len(tokenizer.encode(document.page_content)) for document in documents]
    return token_counts

count_tokens(documents)

In the result above, we can see that some documents are pretty short and others are quite long, so we may want to chunk them into sections.

We use the `MarkdownTextSplitter` built into `LangChain` to split the documents into sections (since the docs are in `Markdown` format).
* Splitting `Markdown` without breaking its syntax is not that easy. This splitter strips out the syntax.
* The `MarkdownTextSplitter` also removes double line breaks, which saves us some tokens.

We can pass:
* `chunk_size`: caps the length of each chunk.
* `chunk_overlap`: repeats some text between consecutive chunks so that sentences are not cut at arbitrary points (less necessary with Markdown).
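
To make the two parameters concrete, here is a minimal sketch of fixed-size chunking with overlap. It is a toy that slices by character position, not the logic `MarkdownTextSplitter` uses (the real splitter looks for Markdown-aware separators); `chunk_text` is a hypothetical name used only for illustration.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split `text` into windows of at most `chunk_size` characters.

    Consecutive windows share `chunk_overlap` characters, so text cut at
    the end of one chunk reappears at the start of the next one.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

A larger overlap lowers the chance of losing context at a boundary, but it also stores and embeds more duplicated text.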

In [None]:
from langchain.text_splitter import MarkdownTextSplitter

md_text_splitter = MarkdownTextSplitter(chunk_size=1000)
document_sections = md_text_splitter.split_documents(documents)
len(document_sections), max(count_tokens(document_sections))

The split produces 90 chunks (each one is still a LangChain document), and the longest chunk has 537 tokens. This will fit inside the model's context window.
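
A quick back-of-the-envelope check shows why this fits. The sketch below assumes a context window of 4,097 tokens for `text-davinci-003` and 256 tokens reserved for the answer; both numbers are assumptions, not values taken from this notebook. It also assumes three chunks are stuffed into the prompt, matching the `k=3` retriever configured later on.

```python
CONTEXT_WINDOW = 4097    # assumed token limit for text-davinci-003
COMPLETION_TOKENS = 256  # assumed budget reserved for the generated answer
MAX_CHUNK_TOKENS = 537   # longest chunk reported by the split
K = 3                    # chunks retrieved per query later in the notebook


def remaining_tokens(context_window, completion_tokens, max_chunk_tokens, k):
    """Tokens left for the instructions and question in the worst case,
    i.e. when every retrieved chunk is as long as the longest one."""
    return context_window - completion_tokens - k * max_chunk_tokens


print(remaining_tokens(CONTEXT_WINDOW, COMPLETION_TOKENS, MAX_CHUNK_TOKENS, K))
# 2230
```

Under these assumptions, even three maximum-length chunks leave plenty of room for the prompt instructions and the user question.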

In [None]:
# Here we look at the first section
Markdown(document_sections[0].page_content)

# Embeddings

Now we use embeddings with a vector database retriever to find relevant documents for a query. 

We use:
* `OpenAIEmbeddings` to embed the text
* `Chroma` as the vector store that holds the embeddings

Other embedding models could be used instead:
* `Cohere` provides a good multilingual embedding model if dealing with languages other than English

In [None]:
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Initialise embeddings
embeddings = OpenAIEmbeddings()
# Embed the document chunks from above and index them in a Chroma vector store
db = Chroma.from_documents(document_sections, embeddings)

Now we can create a retriever from the db. 

The `k` parameter sets how many of the most similar sections the similarity search returns.

In [None]:
retriever = db.as_retriever(search_kwargs=dict(k=3))

In [None]:
# Retrieve the docs relevant to the query using the Chroma retriever defined above
query = "How can I share my W&B report with my team members in a public W&B project?"
docs = retriever.get_relevant_documents(query)

In [None]:
# Let's see the results
for doc in docs:
    print(doc.metadata["source"])

The results above show that the right kind of documents are retrieved: they are about collaboration, which is what the query asks about.
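
To see what a top-`k` similarity search does in principle, here is a small sketch with hand-made two-dimensional vectors. It ranks documents by cosine similarity to the query vector. This illustrates the idea only: the document names and vectors are made up, and the real vector store may use a different distance metric and an approximate index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k):
    """Return the names of the `k` documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), name)
              for name, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Made-up embeddings, for illustration only
toy_docs = {"reports": [1.0, 0.0], "sweeps": [0.0, 1.0], "collab": [0.9, 0.1]}
print(top_k([1.0, 0.0], toy_docs, k=2))
# ['reports', 'collab']
```

Raising `k` gives the LLM more context to work with, at the cost of a longer prompt.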

# Stuff Prompt

Now that we retrieved relevant docs, we want to stuff them into the prompt template along with the user query, and pass into an LLM to obtain the answer.

* To do this we use `PromptTemplate` from `LangChain` (similar to an f-string in Python).
* This is a simple prompt (not a Level 5 prompt).
* It defines two variables: `context` and `question`.
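
The f-string comparison can be made concrete with plain Python. The sketch below fills named placeholders with `str.format`; it is only an analogy for what a prompt template does, and `fill_prompt` and `toy_template` are hypothetical names, not part of LangChain.

```python
toy_template = "Context: {context}\nQuestion: {question}\nAnswer:"


def fill_prompt(template, **variables):
    """Substitute named placeholders in `template`.

    Raises KeyError if a placeholder has no matching variable.
    """
    return template.format(**variables)


print(fill_prompt(toy_template,
                  context="Reports can be shared.",
                  question="How do I share a report?"))
```

Keeping the template separate from the values makes it easy to reuse the same prompt for every query.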

In [None]:
from langchain.prompts import PromptTemplate

prompt_template = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
Helpful Answer:"""

# In the prompt template we define two inputs: "context", "question"
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

# The context is a concatenation of the retrieved docs
context = "\n\n".join([doc.page_content for doc in docs])
# Populate the prompt with the context and query variables
prompt = PROMPT.format(context=context, question=query)

We use LangChain to call the OpenAI davinci API and predict an answer to the prompt above, which contains the docs retrieved through the embeddings.

In [None]:
from langchain.llms import OpenAI

llm = OpenAI()
response = llm.predict(prompt)
Markdown(response)

Because we set `LANGCHAIN_WANDB_TRACING` to `true` earlier, LangChain activity is streamed to W&B. The trace is useful for checking what worked, which errors occurred, and what kind of results we obtained.

# Using LangChain

`LangChain` provides tools such as the `RetrievalQA` chain that encapsulate the sequence of actions above in a few lines of code.


In [None]:
from langchain.chains import RetrievalQA

# Instantiate the retrieval QA chain from the OpenAI LLM
qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type="stuff", retriever=retriever)
# Run the query above against this chain:
# - retrieves the docs most relevant to the query (k=3)
# - "stuffs" the retrieved docs into the prompt together with the query
result = qa.run(query)

# We should see a similar answer to what we saw before
Markdown(result)
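
Conceptually, a "stuff" chain repeats the manual steps from earlier in this notebook: retrieve, concatenate, fill the prompt, call the model. The sketch below mirrors that flow with stand-in functions so it runs without any API key. `stuff_qa`, `fake_retrieve`, and `echo_llm` are hypothetical names, and this is not how `RetrievalQA` is implemented internally.

```python
TEMPLATE = "{context}\n\nQuestion: {question}\nHelpful Answer:"


def stuff_qa(query, retrieve, llm, template=TEMPLATE, separator="\n\n"):
    """Retrieve texts for `query`, stuff them into the prompt, and call `llm`."""
    docs = retrieve(query)            # list of relevant text chunks
    context = separator.join(docs)    # concatenate the chunks into one context
    prompt = template.format(context=context, question=query)
    return llm(prompt)


def fake_retrieve(query):
    """Stand-in retriever that always returns the same two chunks."""
    return ["Doc one.", "Doc two."]


def echo_llm(prompt):
    """Stand-in LLM that returns the prompt so we can inspect it."""
    return prompt


print(stuff_qa("How do I share a report?", fake_retrieve, echo_llm))
```

Swapping the stand-ins for a real retriever and a real model call gives the same shape of pipeline as the chain above.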

In [None]:
import wandb
wandb.finish()