# Question Answering with Sources

This notebook walks through how to use LangChain for question answering with sources over a list of documents. It covers four chain types: `stuff`, `map_reduce`, `refine`, and `map_rerank`. For a more in-depth explanation of these chain types, see [here](../combine_docs.md).

In [1]:
from dotenv import load_dotenv
import os

load_dotenv(dotenv_path='.env', override=True)

os.environ['PINECONE_INDEX_NAME']

'uiuc-chatbot-deduped'

## Prepare Data
First we prepare the data. For this example we do similarity search over a vector database, but these documents could be fetched in any manner (the point of this notebook is to highlight what to do AFTER you fetch the documents).

In [2]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.embeddings.cohere import CohereEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores.elastic_vector_search import ElasticVectorSearch
from langchain.vectorstores import Chroma
from langchain.docstore.document import Document
from langchain.prompts import PromptTemplate

In [5]:
# with open("../../state_of_the_union.txt") as f:
#     state_of_the_union = f.read()
# text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
# texts = text_splitter.split_text(state_of_the_union)

# embeddings = OpenAIEmbeddings()

In [6]:
# docsearch = Chroma.from_texts(texts, embeddings, metadatas=[{"source": str(i)} for i in range(len(texts))])

Running Chroma using direct local API.
Using DuckDB in-memory for database. Data will be transient.


In [None]:
# query = "What did the president say about Justice Breyer"
# docs = docsearch.similarity_search(query)

## Use Pinecone instead

In [3]:
import pinecone
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Pinecone

query = 'What is a LC-3 ISA?'

pinecone.init(api_key=os.environ['PINECONE_API_KEY'], environment=os.environ['PINECONE_ENVIRONMENT'])
pinecone_index = pinecone.Index(os.environ['PINECONE_INDEX_NAME'])
embeddings = HuggingFaceEmbeddings(model_name="intfloat/e5-large")  # best text embedding model. 1024 dims.
vectorstore = Pinecone(index=pinecone_index, embedding_function=embeddings.embed_query, text_key="text")

docs = vectorstore.similarity_search(query)


  from tqdm.autonotebook import tqdm
No sentence-transformers model found with name /Users/kastanday/.cache/torch/sentence_transformers/intfloat_e5-large. Creating a new one with MEAN pooling.
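For intuition about what `similarity_search` returns, here is a minimal pure-Python sketch of top-k retrieval by cosine similarity. It is not Pinecone's implementation: the toy vectors, the `top_k` helper, and the choice of cosine similarity are assumptions made for illustration (the actual metric depends on how the index was configured).

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=4):
    """Return the k (text, vector) pairs most similar to query_vec."""
    ranked = sorted(stored, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return ranked[:k]


# Toy 2-dimensional "embeddings"; real ones would come from the embedding model.
stored = [
    ("LC-3 ISA overview", [1.0, 0.0]),
    ("x86 opcodes", [0.7, 0.7]),
    ("unrelated lecture", [0.0, 1.0]),
]
best = top_k([1.0, 0.1], stored, k=2)
print([text for text, _ in best])
```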


In [4]:
# add "Source" metadata property to docs.
for i, d in enumerate(docs):
  docs[i].metadata['source'] = d.metadata['textbook_name'] + "page:" + str(d.metadata['page_number'])
docs[0].metadata['source']

'Yale-Patt_Sanjay-Patel--Intro_to_Computing_Systemspage:200.0'
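The label above runs the textbook name and `page:` together and keeps the float page number. A hypothetical helper such as `make_source` (not part of LangChain, and assuming `page_number` is always numeric) could produce a more readable label. Plain dicts stand in for each document's `metadata` here.

```python
def make_source(textbook_name, page_number):
    """Build a readable source label such as 'Some_Book (page 200)'."""
    # Page numbers arrive as floats (e.g. 200.0); show them as integers.
    return f"{textbook_name} (page {int(page_number)})"


# Plain dicts stand in for the `metadata` attribute of each retrieved document.
metadatas = [
    {"textbook_name": "Some_Book", "page_number": 200.0},
    {"textbook_name": "Some_Book", "page_number": 104.0},
]
for meta in metadatas:
    meta["source"] = make_source(meta["textbook_name"], meta["page_number"])

print(metadatas[0]["source"])  # Some_Book (page 200)
```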

In [5]:
query = '''Please answer this textbook medical question:

A 16-year-old boy is brought to the emergency department by his friends for severe anxiety. He became paranoid and unusually withdrawn at a party and began rocking back and forth, saying, "I feel like I can't breathe" and "I'm afraid I'm going to die." Prior to the party, he was his regular "happy and outgoing" self. The patient has intermittent back pain from a bicycle accident last year for which he takes oxycodone as needed. Temperature is 36.1 C (97 F), blood pressure is 140/80 mm Hg, pulse is 110/min, and respirations are 18/min. Pulse oximetry is 98% on room air. Examination shows an anxious and withdrawn boy with 3-mm pupils, conjunctival injection, dry oral mucosa, and a healed scar on his right thigh from his bicycle accident. Which of the following is most consistent with this patient's presentation?

Select the correct answer:
A. Bath salts intoxication 
B. Cannabis intoxication 
C. Cocaine intoxication
D. Cocaine withdrawal
E. Lysergic acid diethylamide intoxication'''

In [9]:
from langchain.chains import QAGenerationChain
from langchain.llms import OpenAI

chain = QAGenerationChain.from_llm(OpenAI(model_name='gpt-4', temperature=0), chain_type="stuff")
chain({"question": query})



ValueError: Missing some input keys: {'text'}

## Quickstart
If you just want to get started as quickly as possible, this is the recommended way to do it:

In [7]:
from langchain.chains.qa_with_sources import load_qa_with_sources_chain
from langchain.llms import OpenAI

In [25]:
chain = load_qa_with_sources_chain(OpenAI(model_name='gpt-4', temperature=0), chain_type="stuff")
chain({"input_documents": docs, "question": query}, return_only_outputs=True)



{'output_text': 'LC-3 ISA is the instruction set architecture of the LC-3 computer, which specifies all the information about the computer that the software has to be aware of. It has a 16-bit memory address space, corresponding to 2^16 locations, each containing one word (16 bits). The LC-3 is a three-address ISA, with instructions such as ADD, AND, LD, BR, and TRAP.\nSOURCES: Yale-Patt_Sanjay-Patel--Intro_to_Computing_Systemspage:104.0, 200.0, 227.0'}


If you want more control and understanding over what is happening, please see the information below.

## The `stuff` Chain

This section shows results of using the `stuff` chain to do question answering with sources.

In [26]:
chain = load_qa_with_sources_chain(OpenAI(model_name='gpt-4', temperature=0), chain_type="stuff")

In [7]:
# query = "What did the president say about Justice Breyer"
chain({"input_documents": docs, "question": query}, return_only_outputs=True)

{'output_text': ' The president thanked Justice Breyer for his service.\nSOURCES: 30-pl'}
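Conceptually, the `stuff` chain places all retrieved documents into a single prompt and makes one model call. The sketch below shows only that prompt-building step in plain Python. The `build_stuff_prompt` helper, the dict-based documents, and the "Content/Source" layout are assumptions for illustration, not LangChain's actual code.

```python
def build_stuff_prompt(docs, question):
    """Put every document, tagged with its source, into one prompt string."""
    # The "Content/Source" layout below is an assumption for illustration.
    summaries = "\n\n".join(
        f"Content: {d['page_content']}\nSource: {d['metadata']['source']}"
        for d in docs
    )
    return (
        f"QUESTION: {question}\n"
        "=========\n"
        f"{summaries}\n"
        "=========\n"
        "FINAL ANSWER:"
    )


# Toy documents standing in for the retrieved `Document` objects.
toy_docs = [
    {"page_content": "The LC-3 is a three-address ISA.", "metadata": {"source": "book-p104"}},
    {"page_content": "Memory address space is 16 bits.", "metadata": {"source": "book-p200"}},
]
prompt = build_stuff_prompt(toy_docs, "What is a LC-3 ISA?")
print(prompt)
```

Because everything goes into one prompt, this approach only works while the combined documents fit in the model's context window.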

**Custom Prompts**

You can also use your own prompts with this chain. In this example, we will respond in Italian.

In [7]:
template = """Given the following extracted parts of a long document and a question, create a final answer with references ("SOURCES"). 
If you don't know the answer, just say that you don't know. Don't try to make up an answer.
ALWAYS return a "SOURCES" part in your answer.
Respond in Italian.

QUESTION: {question}
=========
{summaries}
=========
FINAL ANSWER IN ITALIAN:"""
PROMPT = PromptTemplate(template=template, input_variables=["summaries", "question"])

chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type="stuff", prompt=PROMPT)
query = "What did the president say about Justice Breyer"
chain({"input_documents": docs, "question": query}, return_only_outputs=True)

{'output_text': '\nNon so cosa abbia detto il presidente riguardo a Justice Breyer.\nSOURCES: 30, 31, 33'}
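The custom prompt above declares `summaries` and `question` as its input variables. A quick way to see which placeholders a template string actually contains, using only the standard library, is sketched below. The `placeholders` helper is hypothetical and independent of whatever validation `PromptTemplate` performs itself.

```python
from string import Formatter


def placeholders(template):
    """Return the set of {name} fields used in a format-style template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


template = """ALWAYS return a "SOURCES" part in your answer.
Respond in Italian.

QUESTION: {question}
=========
{summaries}
=========
FINAL ANSWER IN ITALIAN:"""

missing = {"summaries", "question"} - placeholders(template)
print(missing)  # set()
```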

## The `map_reduce` Chain

This section shows results of using the `map_reduce` chain to do question answering with sources.

In [27]:
chain = load_qa_with_sources_chain(OpenAI(model_name='gpt-4', temperature=0), chain_type="map_reduce")

In [28]:
# query = "What did the president say about Justice Breyer"
chain({"input_documents": docs, "question": query}, return_only_outputs=True)

ValueError: OpenAIChat currently only supports single prompt, got ['Use the following portion of a long document to see if any of the text is relevant to answer the question. \nReturn any relevant text verbatim.\nLC-3 ISA\nA.1 Overview\nThe instruction set architecture (ISA) of the LC-3 is dened as follows:\nMemory address space 16 bits, corresponding to 216 locations, each\ncontaining one word (16 bits). Addresses are numbered from 0 (i.e., x0000)\nto 65,535 (i.e., xFFFF). Addresses are used to identify memory locations\nand memory-mapped I/O device registers. Certain regions of memory are\nreserved for special uses, as described in Figure A.1.\nxFFFF\nFigure A.1\nMemory map of the LC-3\nQuestion: What is a LC-3 ISA?\nRelevant text, if any:', 'Use the following portion of a long document to see if any of the text is relevant to answer the question. \nReturn any relevant text verbatim.\nTable B.2 lists some of the data movement\nopcodes in the x86 instruction set.\nControl\nThe LC-3 has ve control opcodes: BR, JSR/JSRR, JMP, RTI, and\nTRAP. x86 has all these and more. Table B.3 lists some of the control opcodes in\nthe x86 instruction set.\nB.1.1.3 Two Address vs. Three Address\nThe LC-3 is a three-address ISA. This description reects the number of operands\nexplicitly specied by the ADD instruction. An add operation requires two\nsource operands (the numbers to be added) and one destination operand to store\nQuestion: What is a LC-3 ISA?\nRelevant text, if any:', 'Use the following portion of a long document to see if any of the text is relevant to answer the question. \nReturn any relevant text verbatim.\nLC-3\nI\nn Chapter 4, we discussed the basic components of a computerits mem-\nory, its processing unit, including the associated temporary storage (usually\na set of registers), input and output devices, and the control unit that directs the\nactivity of all the units (including itself!). 
We also studied the six phases of the\ninstruction cycleFETCH, DECODE, ADDRESS EVALUATION, OPERAND\nFETCH, EXECUTE, and STORE RESULT. We used elements of the LC-3 to\nillustrate some of the concepts. In fact, we introduced ve opcodes: two operate\ninstructions (ADD and AND), one data movement instruction (LD), and two con-\ntrol instructions (BR and TRAP). We are now ready to study the LC-3 in much\ngreater detail.\nRecall from Chapter 1 that the ISA is the interface between what the soft-\nware commands and what the hardware actually carries out. In this chapter, we\nwill point out most of the important features of the ISA of the LC-3. (A few ele-\nments we will leave for Chapter 8 and Chapter 9.) You will need these features\nto write programs in the LC-3s own language, that is, in the LC-3s machine\nlanguage.\nA complete description of the ISA of the LC-3 is contained in Appendix A.\n5.1 The ISA: Overview\nThe ISA species all the information about the computer that the software has\nto be aware of. In other words, the ISA species everything in the computer\nthat is available to a programmer when he/she writes programs in the com-\nputers own machine language. Most people, however, do not write programs\nin the computers own machine language, but rather opt for writing programs in\na high-level language like C++ or Python (or Fortran or COBOL, which have\nbeen around for more than 50 years). Thus, the ISA also species everything\nin the computer that is needed by someone (a compiler writer) who wishes to\ntranslate programs written in a high-level language into the machine language of\nthe computer.\nQuestion: What is a LC-3 ISA?\nRelevant text, if any:', "Use the following portion of a long document to see if any of the text is relevant to answer the question. \nReturn any relevant text verbatim.\nThat's one instruction. So here's another instruction, which is add. So this is the add opcode over here. 
It also has three fields and then three things that have to be fixed to zero. So there's a destination register. That's where we're going to put our answer, 0 through 7. There's one source register, also 0 through 7. Good. And then there's another source register. So it says take two registers, add them together, put the answer, the sum, into destination register. That's all. What are the zeros for? Ah, so that's a good question. So why not just let those be don't cares? So there was this commercial architecture called the 6502, let those be don't cares. And it turned out that those don't cares produced some bizarre effects when software people used those instructions with different bits, because they just built a finite state machine and they left them as don't cares. And so it did something. And turned out it did something sort of interesting that the software people decided that they really wanted. And so they put those non-existence instructions in their software. And then when the 6502 architects wanted to produce a new generation, they found that in fact people had used bit patterns that didn't exist in the instruction set architecture. So the modern view of that is don't ever make that mistake again. If you ever want to extend your instruction set architecture, it's really useful to have undefined bit patterns. On the other hand, having software that takes advantage of a particular microarchitecture's don't cares mapped into something is not so attractive from a design point of view. So let me stop there and we'll look more at these next week. Have a good weekend.\nQuestion: What is a LC-3 ISA?\nRelevant text, if any:"]
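Conceptually, `map_reduce` asks the model about each document separately (the map step, one prompt per document, which is the list of prompts visible in the error above) and then combines the per-document extracts into a final answer (the reduce step). The sketch below mimics that flow in plain Python. `fake_llm` is a stand-in for a real model call and the prompt wording is abbreviated, so treat it as an illustration rather than LangChain's implementation.

```python
def fake_llm(prompt):
    """Stand-in for a model call: echoes the last line of the prompt."""
    return prompt.strip().splitlines()[-1]


def map_reduce_answer(docs, question, llm):
    # Map step: one prompt per document.
    map_prompts = [
        f"Return any text relevant to the question.\nQuestion: {question}\n{d['page_content']}"
        for d in docs
    ]
    extracts = [llm(p) for p in map_prompts]

    # Reduce step: combine the extracts, each tagged with its source.
    summaries = "\n".join(
        f"Content: {e}\nSource: {d['metadata']['source']}"
        for e, d in zip(extracts, docs)
    )
    final = llm(f"QUESTION: {question}\n{summaries}")
    return {"intermediate_steps": extracts, "output_text": final}


toy_docs = [
    {"page_content": "first text", "metadata": {"source": "a"}},
    {"page_content": "second text", "metadata": {"source": "b"}},
]
result = map_reduce_answer(toy_docs, "q?", fake_llm)
print(result["intermediate_steps"])
```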

**Intermediate Steps**

We can also return the intermediate steps for `map_reduce` chains, should we want to inspect them. This is done with the `return_intermediate_steps` argument.

In [29]:
chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type="map_reduce", return_intermediate_steps=True)

In [30]:
chain({"input_documents": docs, "question": query}, return_only_outputs=True)

{'intermediate_steps': [' "The instruction set architecture (ISA) of the LC-3 is dened as follows: Memory address space 16 bits, corresponding to 216 locations, each containing one word (16 bits). Addresses are numbered from 0 (i.e., x0000) to 65,535 (i.e., xFFFF). Addresses are used to identify memory locations and memory-mapped I/O device registers. Certain regions of memory are reserved for special uses, as described in Figure A.1."',
  ' The LC-3 is a three-address ISA.',
  ' "The ISA species all the information about the computer that the software has to be aware of. In other words, the ISA species everything in the computer that is available to a programmer when he/she writes programs in the computer\'s own machine language. Most people, however, do not write programs in the computer\'s own machine language, but rather opt for writing programs in a high-level language like C++ or Python (or Fortran or COBOL, which have been around for more than 50 years). Thus, the ISA also speci

**Custom Prompts**

You can also use your own prompts with this chain. In this example, we will respond in Italian.

In [8]:

question_prompt_template = """Use the following portion of a long document to see if any of the text is relevant to answer the question. 
Return any relevant text in Italian.
{context}
Question: {question}
Relevant text, if any, in Italian:"""
QUESTION_PROMPT = PromptTemplate(
    template=question_prompt_template, input_variables=["context", "question"]
)

combine_prompt_template = """Given the following extracted parts of a long document and a question, create a final answer with references ("SOURCES"). 
If you don't know the answer, just say that you don't know. Don't try to make up an answer.
ALWAYS return a "SOURCES" part in your answer.
Respond in Italian.

QUESTION: {question}
=========
{summaries}
=========
FINAL ANSWER IN ITALIAN:"""
COMBINE_PROMPT = PromptTemplate(
    template=combine_prompt_template, input_variables=["summaries", "question"]
)

chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type="map_reduce", return_intermediate_steps=True, question_prompt=QUESTION_PROMPT, combine_prompt=COMBINE_PROMPT)
chain({"input_documents": docs, "question": query}, return_only_outputs=True)

{'intermediate_steps': ["\nStasera vorrei onorare qualcuno che ha dedicato la sua vita a servire questo paese: il giustizia Stephen Breyer - un veterano dell'esercito, uno studioso costituzionale e un giustizia in uscita della Corte Suprema degli Stati Uniti. Giustizia Breyer, grazie per il tuo servizio.",
  ' Non pertinente.',
  ' Non rilevante.',
  " Non c'è testo pertinente."],
 'output_text': ' Non conosco la risposta. SOURCES: 30, 31, 33, 20.'}

**Batch Size**

When using the `map_reduce` chain, one thing to keep in mind is the batch size you are using during the map step. If this is too high, it could cause rate limiting errors. You can control this by setting the batch size on the LLM used. Note that this only applies for LLMs with this parameter. Below is an example of doing so:

```python
llm = OpenAI(batch_size=5, temperature=0)
```
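The effect of a batch size can be pictured as slicing the map-step prompts into groups before they are sent. The `batched` helper below is a hypothetical sketch of that idea only. How the LLM wrapper actually schedules its requests is not shown here.

```python
def batched(prompts, batch_size):
    """Yield consecutive slices of at most batch_size prompts."""
    for start in range(0, len(prompts), batch_size):
        yield prompts[start:start + batch_size]


prompts = [f"prompt {i}" for i in range(12)]
sizes = [len(batch) for batch in batched(prompts, 5)]
print(sizes)  # [5, 5, 2]
```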

## The `refine` Chain

This section shows results of using the `refine` chain to do question answering with sources.

In [31]:
chain = load_qa_with_sources_chain(OpenAI(model_name='gpt-4', temperature=0), chain_type="refine")

In [13]:
# query = "What did the president say about Justice Breyer"
chain({"input_documents": docs, "question": query}, return_only_outputs=True)

{'output_text': "\n\nThe president said that he was honoring Justice Breyer for his dedication to serving the country and that he was a retiring Justice of the United States Supreme Court. He also thanked him for his service and praised his career as a top litigator in private practice, a former federal public defender, and a family of public school educators and police officers. He noted Justice Breyer's reputation as a consensus builder and the broad range of support he has received from the Fraternal Order of Police to former judges appointed by Democrats and Republicans. He also highlighted the importance of securing the border and fixing the immigration system in order to advance liberty and justice, and mentioned the new technology, joint patrols, dedicated immigration judges, and commitments to support partners in South and Central America that have been put in place. He also expressed his commitment to the LGBTQ+ community, noting the need for the bipartisan Equality Act and th
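Conceptually, `refine` answers from the first document and then revisits that answer once per remaining document, which is why the final text can accumulate material from every document. A minimal sketch of that loop follows. `fake_llm` and the abbreviated prompt wording are stand-ins, not LangChain's implementation.

```python
calls = []


def fake_llm(prompt):
    """Stand-in for a model call: records the prompt and returns a numbered draft."""
    calls.append(prompt)
    return f"draft {len(calls)}"


def refine_answer(docs, question, llm):
    """Answer from the first document, then refine with each later one."""
    answer = llm(f"Context: {docs[0]['page_content']}\nQuestion: {question}")
    steps = [answer]
    for d in docs[1:]:
        answer = llm(
            f"Question: {question}\nExisting answer: {answer}\n"
            f"New context: {d['page_content']}\nRefine the answer if needed."
        )
        steps.append(answer)
    return {"intermediate_steps": steps, "output_text": answer}


toy_docs = [{"page_content": f"text {i}"} for i in range(3)]
result = refine_answer(toy_docs, "q?", fake_llm)
print(result["intermediate_steps"])  # ['draft 1', 'draft 2', 'draft 3']
```

Each step depends on the previous answer, so the calls run sequentially, one per document.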

**Intermediate Steps**

We can also return the intermediate steps for `refine` chains, should we want to inspect them. This is done with the `return_intermediate_steps` variable.

In [32]:
chain = load_qa_with_sources_chain(OpenAI(model_name='gpt-4', temperature=0), chain_type="refine", return_intermediate_steps=True)

In [33]:
chain({"input_documents": docs, "question": query}, return_only_outputs=True)

{'intermediate_steps': ['An LC-3 ISA (Instruction Set Architecture) is a specific architecture for a computer system that defines its memory address space, memory locations, and memory-mapped I/O device registers. In the case of LC-3, it has a 16-bit memory address space, corresponding to 2^16 locations, each containing one word (16 bits). Addresses are numbered from 0 (i.e., x0000) to 65,535 (i.e., xFFFF).',
  'An LC-3 ISA (Instruction Set Architecture) is a specific architecture for a computer system that defines its memory address space, memory locations, memory-mapped I/O device registers, and instruction set. In the case of LC-3, it has a 16-bit memory address space, corresponding to 2^16 locations, each containing one word (16 bits). Addresses are numbered from 0 (i.e., x0000) to 65,535 (i.e., xFFFF).\n\nThe LC-3 ISA includes various data movement opcodes, control opcodes, and supports a three-address format for instructions like ADD, which requires two source operands and one de

In [34]:
chain({"input_documents": docs, "question": query}, return_only_outputs=False)

{'input_documents': [Document(page_content='LC-3 ISA\nA.1 Overview\nThe instruction set architecture (ISA) of the LC-3 is dened as follows:\nMemory address space 16 bits, corresponding to 216 locations, each\ncontaining one word (16 bits). Addresses are numbered from 0 (i.e., x0000)\nto 65,535 (i.e., xFFFF). Addresses are used to identify memory locations\nand memory-mapped I/O device registers. Certain regions of memory are\nreserved for special uses, as described in Figure A.1.\nxFFFF\nFigure A.1\nMemory map of the LC-3', metadata={'page_number': 200.0, 'textbook_name': 'Yale-Patt_Sanjay-Patel--Intro_to_Computing_Systems', 'source': 'Yale-Patt_Sanjay-Patel--Intro_to_Computing_Systemspage:200.0'}),
  Document(page_content='Table B.2 lists some of the data movement\nopcodes in the x86 instruction set.\nControl\nThe LC-3 has ve control opcodes: BR, JSR/JSRR, JMP, RTI, and\nTRAP. x86 has all these and more. Table B.3 lists some of the control opcodes in\nthe x86 instruction set.\nB.1.1.3

**Custom Prompts**

You can also use your own prompts with this chain. In this example, we will respond in Italian.

In [9]:
refine_template = (
    "The original question is as follows: {question}\n"
    "We have provided an existing answer, including sources: {existing_answer}\n"
    "We have the opportunity to refine the existing answer "
    "(only if needed) with some more context below.\n"
    "------------\n"
    "{context_str}\n"
    "------------\n"
    "Given the new context, refine the original answer to better "
    "answer the question (in Italian). "
    "If you do update it, please update the sources as well. "
    "If the context isn't useful, return the original answer."
)
refine_prompt = PromptTemplate(
    input_variables=["question", "existing_answer", "context_str"],
    template=refine_template,
)


question_template = (
    "Context information is below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the question in Italian: {question}\n"
)
question_prompt = PromptTemplate(
    input_variables=["context_str", "question"], template=question_template
)

In [10]:
chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type="refine", return_intermediate_steps=True, question_prompt=question_prompt, refine_prompt=refine_prompt)
chain({"input_documents": docs, "question": query}, return_only_outputs=True)

{'intermediate_steps': ['\nIl presidente ha detto che Justice Breyer ha dedicato la sua vita al servizio di questo paese e ha onorato la sua carriera.',
  "\n\nIl presidente ha detto che Justice Breyer ha dedicato la sua vita al servizio di questo paese, ha onorato la sua carriera e ha contribuito a costruire un consenso. Ha ricevuto un ampio sostegno, dall'Ordine Fraterno della Polizia a ex giudici nominati da democratici e repubblicani. Inoltre, ha sottolineato l'importanza di avanzare la libertà e la giustizia attraverso la sicurezza delle frontiere e la risoluzione del sistema di immigrazione. Ha anche menzionato le nuove tecnologie come scanner all'avanguardia per rilevare meglio il traffico di droga, le pattuglie congiunte con Messico e Guatemala per catturare più trafficanti di esseri umani, l'istituzione di giudici di immigrazione dedicati per far sì che le famiglie che fuggono da per",
  "\n\nIl presidente ha detto che Justice Breyer ha dedicato la sua vita al servizio di ques

## The `map_rerank` Chain

This section shows results of using the `map_rerank` chain to do question answering with sources.

In [10]:
chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type="map_rerank", metadata_keys=['source'], return_intermediate_steps=True)

In [11]:
query = "What did the president say about Justice Breyer"
result = chain({"input_documents": docs, "question": query}, return_only_outputs=True)

In [12]:
result["output_text"]

' The President thanked Justice Breyer for his service and honored him for dedicating his life to serve the country.'

In [14]:
result["intermediate_steps"]

[{'answer': ' The President thanked Justice Breyer for his service and honored him for dedicating his life to serve the country.',
  'score': '100'},
 {'answer': ' This document does not answer the question', 'score': '0'},
 {'answer': ' This document does not answer the question', 'score': '0'},
 {'answer': ' This document does not answer the question', 'score': '0'}]
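The intermediate steps above hold one answer and score per document, and the final output corresponds to the highest-scoring one. A minimal sketch of that selection follows. `pick_best` is a hypothetical helper, and note that the scores arrive as strings and need converting before comparison.

```python
def pick_best(results):
    """Return the answer dict with the highest integer score (first one wins ties)."""
    return max(results, key=lambda r: int(r["score"]))


results = [
    {"answer": " The President thanked Justice Breyer for his service.", "score": "100"},
    {"answer": " This document does not answer the question", "score": "0"},
    {"answer": " This document does not answer the question", "score": "0"},
]
best = pick_best(results)
print(best["score"])  # 100
```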

**Custom Prompts**

You can also use your own prompts with this chain. In this example, we will respond in Italian.

In [11]:
from langchain.output_parsers import RegexParser

output_parser = RegexParser(
    regex=r"(.*?)\nScore: (.*)",
    output_keys=["answer", "score"],
)

prompt_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

In addition to giving an answer, also return a score of how fully it answered the user's question. This should be in the following format:

Question: [question here]
Helpful Answer In Italian: [answer here]
Score: [score between 0 and 100]

Begin!

Context:
---------
{context}
---------
Question: {question}
Helpful Answer In Italian:"""
PROMPT = PromptTemplate(
    template=prompt_template,
    input_variables=["context", "question"],
    output_parser=output_parser,
)
chain = load_qa_with_sources_chain(OpenAI(temperature=0), chain_type="map_rerank", metadata_keys=['source'], return_intermediate_steps=True, prompt=PROMPT)
query = "What did the president say about Justice Breyer"
result = chain({"input_documents": docs, "question": query}, return_only_outputs=True)

In [12]:
result

{'source': 30,
 'intermediate_steps': [{'answer': ' Il presidente ha detto che Justice Breyer ha dedicato la sua vita a servire questo paese e ha onorato la sua carriera.',
   'score': '100'},
  {'answer': ' Il presidente non ha detto nulla sulla Giustizia Breyer.',
   'score': '100'},
  {'answer': ' Non so.', 'score': '0'},
  {'answer': ' Il presidente non ha detto nulla sulla giustizia Breyer.',
   'score': '100'}],
 'output_text': ' Il presidente ha detto che Justice Breyer ha dedicato la sua vita a servire questo paese e ha onorato la sua carriera.'}
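The regular expression passed to `RegexParser` above can be tried on its own with the standard `re` module. The `parse_answer` helper below is a hypothetical stand-in that assumes the parser applies the pattern with a plain search. It is meant for experimenting with the pattern, not as a replacement for the parser.

```python
import re

PATTERN = r"(.*?)\nScore: (.*)"


def parse_answer(text):
    """Split a completion into answer and score; raise if the format is off."""
    match = re.search(PATTERN, text)
    if match is None:
        raise ValueError(f"Could not parse output: {text!r}")
    return {"answer": match.group(1), "score": match.group(2)}


parsed = parse_answer(" Non so.\nScore: 0")
print(parsed)  # {'answer': ' Non so.', 'score': '0'}
```

Without `re.DOTALL`, `.` does not match newlines, so for a multi-line answer this pattern captures only the line directly before `Score:`.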