# Document Question Answering

An example of using Chroma and LangChain to answer questions over a set of documents.

In [1]:
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.llms import OpenAI
from langchain.chains import VectorDBQA
from langchain.document_loaders import TextLoader

## Load documents

Load the documents to do question answering over. If you want to run this over your own documents, this is the section to replace.

In [4]:
loader = TextLoader('../data/cointelegraph_20230221_trunc.json')
documents = loader.load()
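As a rough illustration of what replacing this section could look like, the sketch below reads every `.txt` file in a directory into a simple text-plus-metadata record. `SimpleDocument` and `load_text_files` are hypothetical names invented for this example, not LangChain APIs; they only mirror the general shape of a loaded document (text content plus a `source` entry in the metadata). The demo writes two throwaway files so it can run anywhere.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a loaded document: text plus metadata."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def load_text_files(directory: str, pattern: str = "*.txt") -> List[SimpleDocument]:
    """Read every file matching `pattern` in `directory`, in sorted order."""
    docs = []
    for path in sorted(Path(directory).glob(pattern)):
        text = path.read_text(encoding="utf-8")
        docs.append(SimpleDocument(page_content=text, metadata={"source": str(path)}))
    return docs


# Demo on a temporary directory (assumed data, not the notebook's dataset).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("first article", encoding="utf-8")
    Path(tmp, "b.txt").write_text("second article", encoding="utf-8")
    docs = load_text_files(tmp)

print([d.page_content for d in docs])
```

Keeping the source path in the metadata makes it possible to trace an answer back to the file it came from.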

## Split documents

Split the documents into small chunks so that we can find the chunks most relevant to a query and pass only those to the LLM.

In [5]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
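To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified sketch that cuts text into fixed-size character windows. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraph and line breaks); `split_text` is a hypothetical helper written only for illustration.

```python
from typing import List


def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 0) -> List[str]:
    """Cut `text` into windows of at most `chunk_size` characters.

    Consecutive windows share `chunk_overlap` characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "abcdefghij" * 3  # 30 characters
print([len(c) for c in split_text(sample, chunk_size=12, chunk_overlap=0)])
print(split_text("abcdefghij", chunk_size=4, chunk_overlap=2))
```

With `chunk_overlap=0`, as in the cell above, the chunks partition the text; a non-zero overlap repeats the tail of each chunk at the start of the next, which can help when a relevant sentence straddles a boundary.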

## Initialize ChromaDB

Create an embedding for each chunk and insert the embeddings into the Chroma vector database.

In [6]:
embeddings = OpenAIEmbeddings()
vectordb = Chroma.from_documents(texts, embeddings)

Running Chroma using direct local API.
Using DuckDB in-memory for database. Data will be transient.
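The idea behind a vector store lookup can be sketched in plain Python: each chunk is stored with a vector, and a query vector is compared against all of them to find the closest chunks. The vectors below are hand-made toy values, not real OpenAI embeddings, and cosine similarity is used only as a common choice; the index structure and distance metric Chroma uses internally may differ.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: List[float], index: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k chunks most similar to the query, best first."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "embeddings" (assumed values, for illustration only).
toy_index = {
    "chunk about ETH staking": [0.9, 0.1, 0.0],
    "chunk about ZK rollups": [0.1, 0.9, 0.1],
    "chunk about NFT sales": [0.0, 0.2, 0.9],
}

query_vec = [0.2, 0.8, 0.0]  # pretend embedding of a ZK-related question
for text, score in top_k(query_vec, toy_index, k=2):
    print(f"{score:.3f}  {text}")
```

A brute-force scan like this is fine for a handful of chunks; dedicated vector databases exist to make the same lookup efficient at scale.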


## Create the chain

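Initialize the chain we will use for question answering. Note that later LangChain releases deprecate `VectorDBQA` in favor of `RetrievalQA`, so this cell may need adjusting on newer versions.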

In [7]:
qa = VectorDBQA.from_chain_type(llm=OpenAI(), chain_type="stuff", vectorstore=vectordb)
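The `"stuff"` chain type places all retrieved chunks into a single prompt and sends that prompt to the LLM in one call. The sketch below shows the general shape of such a prompt. `build_stuff_prompt` is a hypothetical helper, and the template wording is an approximation written for this example rather than LangChain's exact prompt.

```python
from typing import List


def build_stuff_prompt(chunks: List[str], question: str) -> str:
    """Concatenate all chunks into one context block followed by the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following pieces of context to answer the question at the end. "
        "If you don't know the answer, just say that you don't know.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )


# Assumed example chunks, standing in for the results of a vector search.
retrieved = [
    "ZK rollups batch transactions off-chain and post proofs to Ethereum.",
    "Layer 2 networks aim to reduce fees for ETH users.",
]
print(build_stuff_prompt(retrieved, "How do ZK rollups relate to layer 2?"))
```

Because everything goes into one prompt, this approach is simple but only works while the retrieved chunks fit within the model's context window.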

## Ask questions!

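Now we can use the chain to ask questions! The first query below is unrelated to the loaded Cointelegraph articles, so the chain is expected to answer that it does not know.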

In [9]:
query = "What did the president say about Ketanji Brown Jackson"
qa.run(query)

" I don't know."

In [10]:
query = "Generate 10 questions that a retail crypto investor might want to ask a chatbot for investing research in the crypto and web3 space, focused on the topics of ETH, ZK, and layer2, and related to news and events that happened in 2021."
result = qa.run(query)

In [11]:
print(result)

 
1. What are the current trends in the ETH/ZK/layer2 markets?
2. What are the most promising projects in the ETH/ZK/layer2 space?
3. What new developments have been made in the ETH/ZK/layer2 space in 2021?
4. What news and events have occurred in the ETH/ZK/layer2 space in 2021?
5. What are the key factors driving the ETH/ZK/layer2 markets in 2021?
6. What are some of the risks associated with investing in ETH/ZK/layer2?
7. What strategies can I use to maximize my returns from investing in ETH/ZK/layer2?
8. What are the best resources for staying up to date on ETH/ZK/layer2 news and events?
9. What are the potential long-term implications of the developments in the ETH/ZK/layer2 space?
10. What are the most important features to consider when choosing an exchange for trading ETH/ZK/layer2?
