Document Question Answering
============================

An example of using Chroma DB and LangChain to do question answering over documents.

In [1]:
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader


## Load documents
Load the documents to do question answering over. If you want to run this over your own documents, this is the section to replace.

In [2]:
loader = TextLoader('state_of_theunion.txt')
documents = loader.load()

## Split documents
Split the documents into small chunks so that we can find the chunks most relevant to a query and pass only those to the LLM.

In [3]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
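
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size character chunking. It is an illustration only, not how `RecursiveCharacterTextSplitter` works internally (that splitter tries to break on separators such as paragraphs and newlines before falling back to smaller pieces). The helper `chunk_text` and the sample string are hypothetical.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Split text into fixed-size character windows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


sample = "abcdefghij"
no_overlap = chunk_text(sample, chunk_size=4)
with_overlap = chunk_text(sample, chunk_size=4, chunk_overlap=2)
print(no_overlap)
print(with_overlap)
```

With `chunk_overlap=0`, as in the cell above, each character lands in exactly one chunk. A non-zero overlap repeats the tail of one chunk at the head of the next, which can help when an answer straddles a chunk boundary, at the cost of storing more chunks.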

## Initialize ChromaDB
Create an embedding for each chunk and insert the chunks into the Chroma vector database.

In [4]:
embeddings = OpenAIEmbeddings()
vectordb = Chroma.from_documents(texts, embeddings)

Using embedded DuckDB without persistence: data will be transient
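
The idea behind storing embeddings is that a query can be embedded the same way and compared against every stored chunk. The sketch below shows that idea with brute-force cosine similarity over hand-written toy vectors. It is an assumption-laden illustration: the vectors, the chunk labels, and the helpers `cosine_similarity` and `top_k` are made up, and Chroma's actual index and distance metric may differ.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k stored chunks most similar to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy 3-dimensional "embeddings"; real embedding vectors are much longer.
toy_store = [
    ("chunk about Ukraine", [0.9, 0.1, 0.0]),
    ("chunk about the economy", [0.1, 0.9, 0.1]),
    ("chunk about healthcare", [0.0, 0.2, 0.9]),
]
query_vec = [0.8, 0.2, 0.1]  # stands in for the embedded question

results = top_k(query_vec, toy_store, k=2)
print(results)
```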


## Create the chain
Initialize the chain we will use for question answering.

In [5]:
qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type="stuff", retriever=vectordb.as_retriever())
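
The `"stuff"` chain type places all retrieved chunks into a single prompt alongside the question. The sketch below shows roughly what that assembly looks like. The prompt wording and the helper `build_stuff_prompt` are hypothetical and are not LangChain's actual template.

```python
def build_stuff_prompt(question: str, chunks: list[str]) -> str:
    """Concatenate retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(chunks)  # every chunk is "stuffed" into the context
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_stuff_prompt(
    "Who is Zelenskyy?",
    ["First retrieved chunk.", "Second retrieved chunk."],
)
print(prompt)
```

Because everything goes into one prompt, this approach only works while the retrieved chunks together fit in the model's context window, which is one reason to keep chunks small.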

## Ask questions!
Now we can use the chain to ask questions!

In [None]:
query = "Who is Zelenskyy? Please reply in English."
qa.run(query)