# VectorDB Question Answering with Sources

This notebook shows how to do question answering with sources over a vector database. It uses the `VectorDBQAWithSourcesChain`, which looks up the relevant documents in a vector database before answering.

In [1]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.embeddings.cohere import CohereEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores.elastic_vector_search import ElasticVectorSearch
from langchain.vectorstores.faiss import FAISS

In [6]:
#download state_of_the_union.zip from https://www.kaggle.com/rtatman/state-of-the-union-corpus-1989-2017

In [24]:
# Before starting the notebook, set OPENAI_API_KEY in your environment
# (e.g. `export OPENAI_API_KEY=...` in a terminal); OpenAIEmbeddings reads it from there.

# Load the raw text to index.
with open('scrape-amazon/scrape_amazon/B08VWV2WDK.csv') as f:
    state_of_the_union = f.read()

# Split on spaces into chunks of at most 1000 characters, with no overlap between chunks.
text_splitter = CharacterTextSplitter(separator=" ", chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_text(state_of_the_union)


embeddings = OpenAIEmbeddings()
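
With `separator=" "`, `chunk_size=1000`, and `chunk_overlap=0`, the splitter breaks the text on spaces and greedily packs the words back together into chunks of at most 1,000 characters. The sketch below illustrates that packing idea in plain Python; `split_on_separator` is a hypothetical helper, not LangChain's implementation, and it ignores overlap entirely.

```python
def split_on_separator(text: str, separator: str = " ", chunk_size: int = 1000) -> list:
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters.

    Hypothetical helper: a single piece longer than chunk_size is kept as its own
    (oversized) chunk rather than being cut.
    """
    chunks = []
    current = []
    current_len = 0
    for piece in text.split(separator):
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        # Characters this piece adds, including the separator that joins it to the chunk.
        added = len(piece) + (len(separator) if current else 0)
        if current and current_len + added > chunk_size:
            # Adding the piece would overflow: close the current chunk and start a new one.
            chunks.append(separator.join(current))
            current, current_len = [], 0
            added = len(piece)
        current.append(piece)
        current_len += added
    if current:
        chunks.append(separator.join(current))
    return chunks


print(split_on_separator("aaa bbb ccc ddd", chunk_size=7))  # ['aaa bbb', 'ccc ddd']
```

Because nothing overlaps, a sentence that straddles a chunk boundary is split across two chunks; a non-zero `chunk_overlap` is the usual way to soften that.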

In [25]:
# Embed every chunk and build an in-memory FAISS index over the vectors.
docsearch = FAISS.from_texts(texts, embeddings)
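
Conceptually, the vector store maps each chunk to a vector and, given a question, returns the `k` chunks whose vectors are closest to the question's vector. The FAISS index built above searches real embedding vectors; the sketch below swaps in word-count vectors and cosine similarity so it runs without an API key or FAISS. `embed`, `cosine`, and `ToyVectorStore` are hypothetical names, and `k=4` is an assumed default.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for OpenAIEmbeddings: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Brute-force in-memory index: score every chunk against the query."""

    def __init__(self, texts):
        self.texts = list(texts)
        self.vectors = [embed(t) for t in self.texts]

    def similarity_search(self, query: str, k: int = 4):
        q = embed(query)
        # Rank chunk indices by similarity to the query, best first.
        ranked = sorted(range(len(self.texts)), key=lambda i: cosine(q, self.vectors[i]), reverse=True)
        return [self.texts[i] for i in ranked[:k]]


store = ToyVectorStore(["the book is well researched", "the brain is fascinating", "shipping was slow"])
print(store.similarity_search("well researched book", k=1))  # ['the book is well researched']
```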

In [26]:
# Add fake source information to each document
for i, d in enumerate(docsearch.docstore._dict.values()):
    d.metadata = {'source': f"{i}-pl"}

In [27]:
from langchain.chains import VectorDBQAWithSourcesChain

In [28]:
from langchain import OpenAI

chain = VectorDBQAWithSourcesChain.from_chain_type(OpenAI(temperature=0), chain_type="stuff", vectorstore=docsearch)
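
With `chain_type="stuff"`, all retrieved chunks are "stuffed" into a single prompt, each tagged with its `source` metadata so the model can cite labels such as `141-pl`. The sketch below shows only that formatting step; `stuff_prompt` and its wording (including the `Content:`/`Source:` layout) are hypothetical and should not be read as LangChain's actual prompt template.

```python
def stuff_prompt(question: str, docs: list) -> str:
    """Concatenate every retrieved chunk, tagged with its source, into one prompt.

    Each element of docs is a (page_content, source) pair; the layout is an assumption.
    """
    context = "\n\n".join(f"Content: {text}\nSource: {source}" for text, source in docs)
    return (
        "Answer the question using the documents below and list the sources you used.\n\n"
        f"{context}\n\nQUESTION: {question}\nFINAL ANSWER:"
    )


print(stuff_prompt("What is good about the book?", [("Well researched.", "0-pl"), ("Slow delivery.", "1-pl")]))
```

A single prompt means a single model call, but the retrieved chunks must all fit in the model's context window together.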

In [30]:
chain({"question": "What is good about the book?"}, return_only_outputs=True)

{'answer': ' The book lays out a thorough understanding of AI possibilities and limitations, is well researched and keeps the reader glued till the end. It suggests a fascinating way to think about the brain and provides thought-provoking ideas.\n',
 'sources': '141-pl, 51-pl, 69-pl'}
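
The `sources` value is a plain comma-separated string of the fake labels assigned earlier, so it can be mapped back to chunk indices to inspect what the model cited. A minimal sketch, assuming every label has the `{i}-pl` form and that the docstore's iteration order matches the order of `texts`; `parse_sources` is a hypothetical helper.

```python
def parse_sources(sources: str) -> list:
    """Map a sources string such as '141-pl, 51-pl, 69-pl' back to chunk indices.

    Assumes labels of the form '{i}-pl'; any other label raises ValueError.
    """
    if sources.strip() in ("", "None"):
        return []  # the model reported no sources
    return [int(label.strip().split("-")[0]) for label in sources.split(",")]


print(parse_sources("141-pl, 51-pl, 69-pl"))  # [141, 51, 69]
```

For example, `[texts[i] for i in parse_sources(result["sources"])]` would recover the cited chunks, where `result` is the dictionary returned by the chain.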

## Chain Type
You can easily specify different chain types to load and use in the `VectorDBQAWithSourcesChain`. For a more detailed walkthrough of these types, see [this notebook](qa_with_sources.ipynb).

There are two ways to load different chain types. First, you can pass the name of the chain type you want as the `chain_type` argument of the `from_chain_type` method. For example, below we change the chain type to `map_reduce`.

In [12]:
chain = VectorDBQAWithSourcesChain.from_chain_type(OpenAI(temperature=0), chain_type="map_reduce", vectorstore=docsearch)

In [22]:
chain({"question": "What did the James Polk say about Justice Breyer"}, return_only_outputs=True)

{'answer': " I don't know.\n", 'sources': 'None'}
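
Unlike `stuff`, `map_reduce` first asks the model about each retrieved document separately (the map step) and then combines those partial results in a final call (the reduce step), so for `k` documents it makes roughly `k + 1` model calls instead of one. The sketch below shows that control flow with a hypothetical `llm` callable standing in for `OpenAI(temperature=0)`; the prompt wording is illustrative only.

```python
from typing import Callable, List


def map_reduce_answer(question: str, docs: List[str], llm: Callable[[str], str]) -> str:
    # Map step: one model call per document, extracting what is relevant to the question.
    partials = [llm(f"Extract text relevant to '{question}':\n{doc}") for doc in docs]
    # Reduce step: one final call that answers from the combined extracts.
    combined = "\n".join(p for p in partials if p)
    return llm(f"Using these extracts, answer '{question}':\n{combined}")


calls = []


def fake_llm(prompt: str) -> str:
    """Hypothetical stub model that records each prompt it receives."""
    calls.append(prompt)
    return "ok"


print(map_reduce_answer("Who gave the speech?", ["doc a", "doc b", "doc c"], fake_llm))  # ok
print(len(calls))  # 4
```

The extra calls cost more, but each one only has to fit a single document in the context window.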

The approach above makes it easy to change the `chain_type`, but it does not give you much control over that chain type's parameters. If you want to control those parameters, you can load the chain directly (as in [this notebook](qa_with_sources.ipynb)) and then pass it to the `VectorDBQAWithSourcesChain` with the `combine_documents_chain` parameter. For example:

In [15]:
from langchain.chains.qa_with_sources import load_qa_with_sources_chain
qa_chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type="stuff")
qa = VectorDBQAWithSourcesChain(combine_documents_chain=qa_chain, vectorstore=docsearch)

In [17]:
qa({"question": "What did the people think of the indian tribes"}, return_only_outputs=True)

{'answer': ' People believed that the best interests of the Indian tribes would be found in American citizenship, and that it was the policy and duty of the United States to use their influence in converting them to Christianity and bringing them within the pale of civilization. They also believed that the policy of removing them to a country designed for their permanent residence west of the Mississippi was better appreciated by them, and that humanity and national honor demanded that every effort should be made to avert the calamity of their situation.\n',
 'sources': '7020-pl, 249-pl, 8604-pl, 5465-pl'}
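
Passing `combine_documents_chain` separates the two jobs: the outer chain only retrieves documents, and whatever combine chain you hand it decides how to prompt the model and produce `answer` and `sources`. The sketch below models that composition with plain callables; `QAWithSources`, `search`, and `combine` are hypothetical stand-ins, not LangChain classes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class QAWithSources:
    """Hypothetical stand-in: retrieval plus an injected document-combining step."""

    search: Callable[[str, int], List[Dict[str, str]]]  # plays the vector store's role
    combine_documents_chain: Callable[[List[Dict[str, str]], str], Dict[str, str]]
    k: int = 4  # assumed number of documents to retrieve

    def __call__(self, question: str) -> Dict[str, str]:
        docs = self.search(question, self.k)
        # The injected step alone decides how the documents become an answer.
        return self.combine_documents_chain(docs, question)


def search(question: str, k: int) -> List[Dict[str, str]]:
    return [{"text": "t0", "source": "0-pl"}, {"text": "t1", "source": "1-pl"}][:k]


def combine(docs: List[Dict[str, str]], question: str) -> Dict[str, str]:
    return {"answer": f"{len(docs)} docs", "sources": ", ".join(d["source"] for d in docs)}


qa_sketch = QAWithSources(search=search, combine_documents_chain=combine, k=1)
print(qa_sketch("Who is the speaker?"))  # {'answer': '1 docs', 'sources': '0-pl'}
```

Swapping in a differently configured combine step changes how answers are produced without touching retrieval.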

In [18]:
chain({"question": "Who is the speaker of the state of the union"}, return_only_outputs=True)

{'answer': ' The speaker of the state of the union is James K. Polk.\n',
 'sources': '8940-pl'}

In [21]:
chain({"question": "Who is the president at the time of this speech?"}, return_only_outputs=True)

{'answer': ' The president at the time of these speeches is Barack Obama.\n',
 'sources': '10024-pl, 9149-pl, 9026-pl, 9062-pl'}