# Document Q&A

In this notebook, we will demonstrate how to use a large language model (LLM) to answer questions about the contents of a specific document.


## Loading our API key

At this point, you should have set up a file named `secrets.env` containing your OpenAI API key. We will now use a lightweight Python package called `python-dotenv` (imported as `dotenv`) to read this file and set its contents as environment variables:


In [None]:
from dotenv import load_dotenv
import os

load_dotenv("../secrets.env")

os.getenv(
    "OPENAI_API_KEY"
) is not None  # Do not print the key itself! We want to keep it secret
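If you would like a more descriptive message than `True` or `False`, the check above can be wrapped in a small helper. This is only a sketch: the function name `secret_status` is a hypothetical name chosen for this example, and it reports whether a variable is set without ever showing its value.

```python
import os


def secret_status(name: str) -> str:
    """Report whether an environment variable is set, without revealing its value."""
    value = os.getenv(name)
    # Treat an empty string the same as a missing variable
    if not value:
        return f"{name} is missing"
    return f"{name} is set"


print(secret_status("OPENAI_API_KEY"))
```

If the message says the key is missing, double-check the path passed to `load_dotenv` and the variable name inside `secrets.env`.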

## Finding a document

We did not provide a document in this repository for this section. Feel free to put your own document into the folder `docs`, and see how it works out!

For this workshop, we will use an open access paper from [PLOS ONE](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0292372).


## Setting up the analysis chain

We start by defining the LLM we would like to work with. Here, we choose the GPT-3.5 model that has been tuned to follow instructions:


In [None]:
from langchain.llms import OpenAI

# temperature=0 keeps the answers as reproducible as possible
llm = OpenAI(temperature=0, model="gpt-3.5-turbo-instruct")

Next, we need to load the document we want to process. For consistency, we will use the same code as in the previous notebook, even though it only loads a single document this time:


In [None]:
from langchain.document_loaders import DirectoryLoader

doc_loader = DirectoryLoader("../docs", show_progress=True)
docs = doc_loader.load()
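Before moving on, it can help to confirm what was actually loaded. The sketch below assumes that each loaded document exposes a `page_content` string and a `metadata` dictionary with a `"source"` entry, which is what we expect from _LangChain_ loaders. So that the example runs on its own, it uses a hypothetical stand-in class `LoadedDoc` and a made-up file name instead of the real `docs` list.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class LoadedDoc:
    """Hypothetical stand-in exposing the two attributes this sketch reads."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def summarize_docs(docs) -> List[str]:
    """Return one line per document: its index, source, and text length."""
    lines = []
    for i, doc in enumerate(docs):
        # Fall back to a placeholder if the loader did not record a source
        source = doc.metadata.get("source", "unknown source")
        lines.append(f"[{i}] {source}: {len(doc.page_content)} characters")
    return lines


# The file name below is made up for illustration
example = [LoadedDoc("Some extracted text.", {"source": "../docs/paper.pdf"})]
for line in summarize_docs(example):
    print(line)
```

In the notebook itself, you could call `summarize_docs(docs)` on the list returned by the loader, provided the assumption about the attributes holds.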

Next, we need to define a chain that prompts the model to answer a question based on some text. Thankfully, _LangChain_ already offers a predefined chain for this purpose that we can load:


In [None]:
from langchain.chains.question_answering import load_qa_chain

# A question-answering chain that uses the map-reduce technique to handle documents too long for the context window
qa_chain = load_qa_chain(llm, chain_type="map_reduce")
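To make the map-reduce idea more concrete, here is a minimal sketch in plain Python. It is not _LangChain_'s implementation: the function `map_reduce_answer`, the prompt wording, and the stand-in model `toy_llm` are all assumptions made for illustration. The point is the shape of the technique: the question is asked once per chunk (map), and the partial answers are then combined in one final call (reduce).

```python
from typing import Callable, List


def map_reduce_answer(
    chunks: List[str], question: str, ask: Callable[[str], str]
) -> str:
    """Answer a question over several chunks using a map step and a reduce step."""
    # Map step: ask the question against every chunk independently
    partial_answers = [
        ask(f"Context:\n{chunk}\n\nQuestion: {question}") for chunk in chunks
    ]
    # Reduce step: combine the partial answers in one final prompt
    combined = "\n".join(partial_answers)
    return ask(f"Partial answers:\n{combined}\n\nQuestion: {question}")


# Hypothetical stand-in for a real model: it records each prompt and returns a numbered reply
calls: List[str] = []


def toy_llm(prompt: str) -> str:
    calls.append(prompt)
    return f"answer {len(calls)}"


final = map_reduce_answer(
    ["first part", "second part"], "What is this about?", toy_llm
)
print(final)       # the reply to the final (reduce) prompt
print(len(calls))  # one call per chunk, plus one call for the reduce step
```

Because every chunk is processed separately, no single prompt has to contain the whole document. The trade-off is one model call per chunk, plus the final combining call.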

The question answering chain works on a list of text chunks, not on the full text of a long document. For convenience, we can use the `AnalyzeDocumentChain`, which accepts the full text of a document, splits it into chunks, and passes them on as context for the question:


In [None]:
from langchain.chains import AnalyzeDocumentChain

qa_document_chain = AnalyzeDocumentChain(combine_docs_chain=qa_chain)
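Splitting the document is what lets a long text fit into the model's context window piece by piece. The sketch below shows the general idea with a simple fixed-size splitter. It is not the splitter _LangChain_ uses internally, and `split_text`, `chunk_size`, and `overlap` are names chosen only for this example. A small overlap between neighboring chunks reduces the chance that a sentence is cut off exactly at a chunk boundary.

```python
from typing import List


def split_text(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into chunks of at most chunk_size characters with some overlap."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start : start + chunk_size])
        # Stop once the current chunk reaches the end of the text
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


print(split_text("abcdefghij", chunk_size=4, overlap=1))
# → ['abcd', 'defg', 'ghij']
```

Real splitters usually try to break at paragraph or sentence boundaries instead of at fixed character positions, but the principle of chunking with overlap is the same.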

Finally, we can run our chain by providing the document and the question:


In [None]:
qa_document_chain.run(
    input_document=docs[0].page_content,
    question="What was the main takeaway of this study?",
)

In the [next notebook](04-private-llms.ipynb), we will demonstrate how to use a locally stored, and thus private, Large Language Model in _LangChain_.


<table >
<tbody>
  <tr>
    <td style="padding:0px;border-width:0px;vertical-align:center">    
    Created by Simon Stone for Dartmouth College Library under <a href="https://creativecommons.org/licenses/by/4.0/">Creative Commons CC BY-NC 4.0 License</a>.<br>For questions, comments, or improvements, email <a href="mailto:researchdatahelp@groups.dartmouth.edu">Research Data Services</a>.
    </td>
    <td style="padding:0 0 0 1em;border-width:0px;vertical-align:center"><img alt="Creative Commons License" src="https://i.creativecommons.org/l/by/4.0/88x31.png"/></td>
  </tr>
</tbody>
</table>
