# Document Question Answering

An example of using Chroma DB and LangChain to do question answering over documents.

In [1]:
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.llms import OpenAI
from langchain.chains import VectorDBQA
from langchain.document_loaders import TextLoader

## Load documents

Load the documents to do question answering over. If you want to run this over your own documents, this is the section to replace.

In [4]:
loader = TextLoader('../data/cointelegraph_20230221_trunc.json')
documents = loader.load()

## Split documents

Split the documents into small chunks, so that we can retrieve the chunks most relevant to a query and pass only those to the LLM.

In [50]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
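To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-width chunking in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter tries to break at separators such as paragraph and line breaks); the helper name `split_text` is hypothetical and only illustrates the two parameters.

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Split text into fixed-size character windows (illustrative only)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    # Each new window starts `step` characters after the previous one,
    # so consecutive windows share `chunk_overlap` characters.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


chunks = split_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)
```

With `chunk_overlap=0`, as used in this notebook, the windows do not share any characters, so a sentence that straddles a boundary is cut in two.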

## Initialize ChromaDB

Create an embedding for each chunk and insert the chunks into the Chroma vector database.

In [51]:
embeddings = OpenAIEmbeddings()
vectordb = Chroma.from_documents(texts, embeddings)

Running Chroma using direct local API.
Using DuckDB in-memory for database. Data will be transient.
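Conceptually, a vector store keeps one embedding per chunk and returns the chunks whose embeddings lie closest to the query embedding. The sketch below shows that idea with a brute-force cosine-similarity search over hand-written toy vectors. Everything in it (`cosine_similarity`, `top_k`, the three-dimensional vectors) is an assumption for illustration; Chroma's actual index and distance metric may differ, and real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 4) -> list[str]:
    """Return the texts of the k stored vectors most similar to query_vec."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy "embeddings" written by hand; not produced by any real model.
toy_store = [
    ("chunk about layer 2 rollups", [0.9, 0.1, 0.0]),
    ("chunk about NFT sales", [0.1, 0.9, 0.0]),
    ("chunk about regulation", [0.0, 0.2, 0.9]),
]
nearest = top_k([1.0, 0.0, 0.0], toy_store, k=2)
print(nearest)
```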


## Create the chain

Initialize the chain we will use for question answering.

In [7]:
qa = VectorDBQA.from_chain_type(llm=OpenAI(), chain_type="stuff", vectorstore=vectordb)
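The `"stuff"` chain type places all retrieved chunks into a single prompt and sends that prompt to the LLM in one call. A rough sketch of the idea follows; the function name `build_stuff_prompt` and the prompt wording are hypothetical and are not LangChain's actual prompt.

```python
def build_stuff_prompt(question: str, chunks: list[str]) -> str:
    """Concatenate every retrieved chunk into one context block (illustrative)."""
    context = "\n\n".join(chunks)
    return (
        "Use the following pieces of context to answer the question. "
        "If you don't know the answer, say that you don't know.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


example_prompt = build_stuff_prompt(
    "What is a rollup?",
    ["first retrieved chunk", "second retrieved chunk"],
)
print(example_prompt)
```

Because every chunk goes into one prompt, the number and size of retrieved chunks must fit within the model's context window.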

## Ask questions!

Now we can use the chain to ask questions!

In [9]:
query = "What did the president say about Ketanji Brown Jackson"
qa.run(query)

" I don't know."

In [21]:
query = "Generate 10 questions that a retail crypto investor might want to ask a chatbot for investing research in the crypto and web3 space, focused on the topics of ETH, ZK, and layer2, and related to news and events that happened in 2021."
result = qa.run(query)
print(result)


1. What is the current price of ETH? 
2. What is the current price of ZK? 
3. What is the best way to invest in ETH? 
4. What are the risks associated with investing in ZK? 
5. What are the main features of layer2 scaling? 
6. What is the most recent news about ETH? 
7. What is the most recent news about ZK? 
8. What upcoming events should I know about in the crypto and web3 space? 
9. What are the benefits of investing in layer2 solutions? 
10. How have ETH, ZK, and layer2 solutions performed in 2021?


# Query Assisted Generation

In [65]:
query = "What is Arbitrum?"
# Retrieve exactly four chunks, since the prompt template below has four article slots.
hits = vectordb.similarity_search(query=query, k=4)

hits_page_content = [h.page_content for h in hits]
hits_sources = [h.metadata['source'] for h in hits]
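Since every chunk carries a `source` entry in its metadata, the list of sources can contain the same path many times when several chunks come from one file. A small sketch of collapsing it to unique sources while keeping retrieval order, for example to cite them next to an answer; the helper name and the example paths are hypothetical.

```python
def unique_sources(sources: list[str]) -> list[str]:
    """Drop repeated sources while preserving first-seen order."""
    # dict keys are unique and keep insertion order.
    return list(dict.fromkeys(sources))


# Hypothetical paths, standing in for values like those in hits_sources.
example_sources = ["../data/a.json", "../data/a.json", "../data/b.json"]
deduped = unique_sources(example_sources)
print(deduped)
```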

In [66]:
from langchain import PromptTemplate


template = """
I want you to act as a crypto analyst working at Coinbase writing about cryptocurrency.

Base your answer on the following articles:
{article_1}
{article_2}
{article_3}
{article_4}

Answer the following question:
{question}
"""

prompt = PromptTemplate(
    input_variables=["question", "article_1", "article_2", "article_3", "article_4"],
    template=template,
)

In [67]:
prompt_data = {
    "question": query,
    "article_1": hits_page_content[0],
    "article_2": hits_page_content[1],
    "article_3": hits_page_content[2],
    "article_4": hits_page_content[3],
}
prompt.format(**prompt_data)
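The four fixed article slots mean that indexing `hits_page_content[3]` raises an `IndexError` whenever the search returns fewer than four hits. One possible alternative, sketched here with plain `str.format` rather than `PromptTemplate`, is to join the retrieved texts into a single `{articles}` slot. The names `TEMPLATE` and `format_prompt` are assumptions for illustration.

```python
TEMPLATE = """
I want you to act as a crypto analyst working at Coinbase writing about cryptocurrency.

Base your answer on the following articles:
{articles}

Answer the following question:
{question}
"""


def format_prompt(question: str, articles: list[str]) -> str:
    """Fill the template with however many articles were retrieved."""
    return TEMPLATE.format(articles="\n".join(articles), question=question)


# Works with two articles just as well as with four.
two_hit_prompt = format_prompt("What is Arbitrum?", ["first article", "second article"])
print(two_hit_prompt)
```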



In [68]:
from langchain.llms import OpenAI

llm = OpenAI(model_name="text-davinci-003", temperature=0.5, best_of=10, n=3, max_tokens=200)

llm(prompt.format(**prompt_data))

'\nArbitrum is an Ethereum-compatible Layer 2 scaling solution developed by Offchain Labs. It is designed to improve the scalability, speed, and cost of transactions on the Ethereum blockchain, while maintaining the security of the underlying blockchain. It is based on a generalized state channel technology, which enables users to execute transactions off-chain, while using the Ethereum blockchain to settle any disputes. This allows for faster and cheaper transactions, without sacrificing the security of the blockchain.'

In [69]:
hits_page_content

['expand access to books to children in Africa and beyond and revolutionize how philanthropy can work.Related: South African President Steps Down as Banks Embrace Blockchain TechnologySpainFollowing Avalanche’s first-ever summit in Barcelona, the first Spanish Ethereum conference will be held in the same city from July 6 to 8. This comes as Ethereum co-founder Vitalik Buterin is calling for Federal Deposit Insurance Corporation-like protection for small crypto investors in the face of the recent market meltdown.Roberto de Arquer, co-founder and chief metaverse officer of Spain-based Gamium, explained: We are building the first decentralized social metaverse and the digital identity of humans. Gamium World is a 3D, fully immersive environment that allows users to access Gamium’s decentralized social metaverse. Player avatars create the world and can build experiences through the Gamium software development kit, including buying and selling land.Elsewhere in the Metaverse, holders of NFT
