# Chat with PDF

This is a simple flow that allows you to ask questions about the content of a PDF file and get answers.
You run the flow with a URL to a PDF file and a question as arguments.
Once launched, it downloads the PDF and builds an index of its content.
When you ask a question, it looks up the index to retrieve relevant content, then posts the question together with that content to an OpenAI chat model (gpt-3.5-turbo or gpt-4) to get an answer.
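
The retrieve-then-ask idea described above can be sketched in a few lines. This is only a toy illustration, not the flow's actual implementation: it assumes word-count vectors with cosine similarity as a stand-in for the real embedding index, and the helper names (`tokenize`, `cosine`, `retrieve`, `build_prompt`) and the sample chunks are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its words (a crude stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(chunks: list[str], question: str, top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the question."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: cosine(tokenize(c), q), reverse=True)
    return ranked[:top_k]


def build_prompt(context: list[str], question: str) -> str:
    """Combine the retrieved chunks and the question into one prompt string."""
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


# Hypothetical chunks standing in for text extracted from the PDF.
chunks = [
    "BERT is a language representation model based on Transformers.",
    "The weather in Seattle is often rainy.",
    "BERT is pre-trained on unlabeled text with a masked language model objective.",
]
top = retrieve(chunks, "What is BERT?", top_k=2)
prompt = build_prompt(top, "What is BERT?")
print(prompt)
```

In the real flow the prompt would be sent to the chat model; here it is only printed so the sketch runs without any credentials.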

## 0. Install dependencies

In [None]:
%pip install -r requirements.txt

## 1. Create connections
A connection in prompt flow manages settings that control your application's behavior, including how it talks to different services (Azure OpenAI, for example).
Many applications use configuration files or environment variables for this purpose. chat_with_pdf also uses environment variables. To make it work with prompt flow without changing how those variables are used, we copy everything in the CustomConnection into environment variables:
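```python
import os

from promptflow.connections import CustomConnection


def setup_env(conn: CustomConnection):
    if not conn:
        return
    # Expose every connection entry as an environment variable.
    for key in conn:
        os.environ[key] = conn[key]
```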
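
To see what `setup_env` does without a prompt flow installation, here is a minimal sketch that applies the same loop to a plain dictionary. The function name `setup_env_from_mapping` and the dictionary `fake_connection` are hypothetical stand-ins for illustration; the sketch assumes the connection behaves like a mapping of string keys to string values.

```python
import os


def setup_env_from_mapping(settings: dict) -> None:
    """Copy every key/value pair into os.environ, mirroring the setup_env loop."""
    if not settings:
        return
    for key in settings:
        # Environment variables must be strings, so convert defensively.
        os.environ[key] = str(settings[key])


# Hypothetical values, used only to illustrate the behaviour.
fake_connection = {
    "CHAT_MODEL_DEPLOYMENT_NAME": "gpt-35-turbo",
    "PROMPT_TOKEN_LIMIT": "3000",
}
setup_env_from_mapping(fake_connection)
print(os.environ["CHAT_MODEL_DEPLOYMENT_NAME"])
```

After the call, any code that reads `os.environ` sees the connection values, which is why the rest of chat_with_pdf needs no changes.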

chat_with_pdf requires the following environment variables, which are therefore the keys of the custom connection named "chat_with_pdf_custom_connection":
```
OPENAI_API_BASE=<AOAI_ENDPOINT>
OPENAI_API_VERSION=2023-03-15-preview
OPENAI_API_KEY=<AOAI_API_KEY>
EMBEDDING_MODEL_DEPLOYMENT_NAME=text-embedding-ada-002
CHAT_MODEL_DEPLOYMENT_NAME=gpt-35-turbo
PROMPT_TOKEN_LIMIT=3000
MAX_COMPLETION_TOKENS=256
```
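
Environment variables are always strings, so the two numeric settings need converting before use. The following is a minimal sketch of how code might read and validate these variables; the function `load_settings` is hypothetical and not part of chat_with_pdf, and the sample values are placeholders.

```python
import os

# Names taken from the list above; the last two hold integers.
STRING_VARS = [
    "OPENAI_API_BASE",
    "OPENAI_API_VERSION",
    "OPENAI_API_KEY",
    "EMBEDDING_MODEL_DEPLOYMENT_NAME",
    "CHAT_MODEL_DEPLOYMENT_NAME",
]
INT_VARS = ["PROMPT_TOKEN_LIMIT", "MAX_COMPLETION_TOKENS"]


def load_settings(env=None) -> dict:
    """Read all required variables, failing early with a clear message if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in STRING_VARS + INT_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    settings = {name: env[name] for name in STRING_VARS}
    settings.update({name: int(env[name]) for name in INT_VARS})
    return settings


# Placeholder values for illustration only.
sample_env = {
    "OPENAI_API_BASE": "<AOAI_ENDPOINT>",
    "OPENAI_API_VERSION": "2023-03-15-preview",
    "OPENAI_API_KEY": "<AOAI_API_KEY>",
    "EMBEDDING_MODEL_DEPLOYMENT_NAME": "text-embedding-ada-002",
    "CHAT_MODEL_DEPLOYMENT_NAME": "gpt-35-turbo",
    "PROMPT_TOKEN_LIMIT": "3000",
    "MAX_COMPLETION_TOKENS": "256",
}
settings = load_settings(sample_env)
print(settings["PROMPT_TOKEN_LIMIT"] + settings["MAX_COMPLETION_TOKENS"])
```

Passing a dictionary instead of reading `os.environ` directly keeps the sketch easy to try without touching your real environment.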

In [None]:
import promptflow

pf = promptflow.PFClient()

# List all the available connections
for c in pf.connections.list():
    print(f"{c.name} ({c.type})")

In [None]:
from promptflow.entities import CustomConnection

conn_name = "chat_with_pdf_custom_connection"

if not any(c.name == conn_name for c in pf.connections.list()):
    # Create the custom connection that is required by chat_with_pdf_tool
    print(f"Creating custom connection: {conn_name}")
    conn = CustomConnection(
        name=conn_name,
        configs={
            "OPENAI_API_VERSION": "2023-03-15-preview",
            "EMBEDDING_MODEL_DEPLOYMENT_NAME": "text-embedding-ada-002",
            "CHAT_MODEL_DEPLOYMENT_NAME": "gpt-35-turbo",
            "PROMPT_TOKEN_LIMIT": "3000",
            "MAX_COMPLETION_TOKENS": "256",
        },
        secrets={
            "OPENAI_API_BASE": "AOAI_ENDPOINT",  # replace this
            "OPENAI_API_KEY": "AOAI_API_KEY",  # replace this
        },
    )
    pf.connections.create_or_update(conn)
    print(f"Custom connection: {conn_name} created.")
else:
    print(f"Custom connection: {conn_name} found.")

## 2. Test the flow

In [None]:
output = pf.flows.test(
    ".",
    inputs={
        "chat_history": [],
        "pdf_url": "https://arxiv.org/pdf/1810.04805.pdf",
        "question": "what is BERT?",
    },
)
print(output)

## 3. Run the flow with a data file

In [None]:
flow_path = "."
data_path = "./data/bert-paper-qna-3-line.jsonl"

run = pf.run(flow=flow_path, data=data_path)
pf.stream(run)

print(run)

In [None]:
pf.get_details(run)
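
To run the flow against your own questions, you need a JSONL data file like the one passed to `pf.run`. The sketch below writes such a file with the standard library. It assumes each line is a JSON object whose keys match the flow inputs used in the test step (`pdf_url`, `question`, `chat_history`); the actual layout of `bert-paper-qna-3-line.jsonl` is not shown here, and the file name `my-questions.jsonl`, the helper names, and the second question are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(path, rows) -> None:
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


def read_jsonl(path) -> list:
    """Read the file back, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


pdf_url = "https://arxiv.org/pdf/1810.04805.pdf"
rows = [
    {"pdf_url": pdf_url, "chat_history": [], "question": "what is BERT?"},
    # Hypothetical extra question for illustration.
    {"pdf_url": pdf_url, "chat_history": [], "question": "How is BERT pre-trained?"},
]

# Write to a temporary directory so the sketch does not touch the flow's data folder.
data_file = Path(tempfile.mkdtemp()) / "my-questions.jsonl"
write_jsonl(data_file, rows)
loaded = read_jsonl(data_file)
print(f"{len(loaded)} rows written to {data_file}")
```

The resulting path could then be passed as the `data` argument in place of `data_path`, provided the keys really do match the flow's inputs.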