# Document Question Answering

An example of using Chroma DB and LangChain to do question answering over documents.

In [17]:
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.chains import VectorDBQA
from langchain.document_loaders import TextLoader

In [18]:
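import os
import openai
from dotenv import load_dotenv, find_dotenv

# Load environment variables from a .env file, if one is found
load_dotenv(find_dotenv())

# Raises KeyError if OPENAI_API_KEY is not set
openai.api_key = os.environ['OPENAI_API_KEY']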

## Load documents

Load the documents to run question answering over. To use your own documents, replace this section.

In [19]:
loader = TextLoader('data/state_of_the_union.txt')
documents = loader.load()

## Split documents

Split the documents into small chunks so that we can find the chunks most relevant to a query and pass only those to the LLM.

In [20]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
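
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-width character splitter. It is a simplified illustration, not how `RecursiveCharacterTextSplitter` works internally (that class also tries to break on separators such as paragraphs and sentences); the `split_text` helper is hypothetical.

```python
from typing import List


def split_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Split text into fixed-width character windows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    # Each new window starts this many characters after the previous one
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "abcdefghij"
chunks_no_overlap = split_text(sample, chunk_size=4, chunk_overlap=0)
chunks_with_overlap = split_text(sample, chunk_size=4, chunk_overlap=2)
print(chunks_no_overlap)    # ['abcd', 'efgh', 'ij']
print(chunks_with_overlap)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

With an overlap of 0, as in the cell above, each character lands in exactly one chunk. A nonzero overlap repeats text across neighboring chunks, which can help when an answer straddles a chunk boundary.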

## Initialize ChromaDB

Create an embedding for each chunk and insert it into the Chroma vector database.

In [21]:
embeddings = OpenAIEmbeddings()
vectordb = Chroma.from_documents(texts, embeddings)
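
The idea behind a vector store lookup can be sketched in plain Python: embed the query, score it against every stored chunk, and return the closest ones. The sketch below assumes a toy word-count embedding and brute-force cosine similarity; OpenAI embeddings are dense learned vectors and Chroma uses its own index, so the `embed`, `cosine_similarity`, and `top_k` helpers are hypothetical stand-ins.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)  # Counter yields 0 for missing words
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine_similarity(query_vec, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "the wealthy should pay a fair tax rate",
    "we will rebuild roads and bridges",
    "corporations must compete on price",
]
best = top_k("tax rate for the wealthy", chunks, k=1)
print(best)  # ['the wealthy should pay a fair tax rate']
```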

## Create the chain

Initialize the chain we will use for question answering. The `stuff` chain type places all retrieved chunks into a single prompt.

In [22]:
qa = VectorDBQA.from_chain_type(llm=ChatOpenAI(), chain_type="stuff", vectorstore=vectordb)
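
As a rough sketch of what the `stuff` chain type does, the retrieved chunks are concatenated and placed into one prompt together with the question. The template text and the `build_stuff_prompt` helper below are hypothetical and are not LangChain's actual prompt.

```python
from typing import List

# Hypothetical template; LangChain's built-in prompt wording differs
PROMPT_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_stuff_prompt(question: str, retrieved_chunks: List[str]) -> str:
    """Join all retrieved chunks into one context block and fill the template."""
    context = "\n\n".join(retrieved_chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


prompt = build_stuff_prompt(
    "What did the president say about taxes?",
    ["Chunk one text.", "Chunk two text."],
)
print(prompt)
```

Because every chunk goes into a single prompt, this approach only works while the combined text fits in the model's context window, which is one reason the documents are split into small chunks first.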

## Ask questions!

Now we can use the chain to ask questions!

In [25]:
query = "What did the president say about large corporations and the wealthy?"
qa.run(query)

"The president mentioned that when corporations don't have to compete, their profits go up, which drives up prices for consumers. He also stated that capitalism without competition is exploitation. Additionally, the president proposed closing loopholes so that the very wealthy don't pay a lower tax rate than a teacher or a firefighter."