# Chat with PDF using Azure AI

This is a simple flow that allows you to ask questions about the content of a PDF file and get answers.
You run the flow with the URL of a PDF file and a question as arguments.
Once launched, it downloads the PDF and builds an index of its content.
When you ask a question, it looks up the index to retrieve relevant content, then posts the question together with that content to an OpenAI chat model (gpt-3.5-turbo or gpt-4) to get an answer.
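
The retrieve-then-ask pattern described above can be sketched in a few lines. This is only an illustration, not the flow's actual implementation: a toy word-overlap score stands in for the real index, the names `tokenize`, `top_chunks`, and `build_prompt` are hypothetical, and the sample chunks are made up.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_chunks(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by how many question words they share (toy scoring)."""
    q_words = Counter(tokenize(question))
    scored = []
    for chunk in chunks:
        overlap = sum((Counter(tokenize(chunk)) & q_words).values())
        scored.append((overlap, chunk))
    # Stable sort: highest overlap first, ties keep document order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for overlap, chunk in scored[:k] if overlap > 0]


def build_prompt(question: str, context: list[str]) -> str:
    """Combine retrieved context and the question into one prompt string."""
    joined = "\n".join(context)
    return f"Context:\n{joined}\n\nQuestion: {question}"


# Made-up sample chunks, standing in for text extracted from a PDF.
chunks = [
    "BERT is pre-trained with a masked language model objective.",
    "The weather was sunny during the conference.",
    "Fine-tuning BERT adds one output layer per task.",
]
question = "How is BERT pre-trained?"
prompt = build_prompt(question, top_chunks(question, chunks, k=1))
print(prompt)
```

In the real flow the prompt would then be sent to the chat model; that call is omitted here so the sketch runs without credentials.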

# 0. Install dependencies

In [None]:
%pip install -r requirements.txt

# 1. Connect to Azure Machine Learning Workspace

In [None]:
from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential
from azure.ai.ml import MLClient

try:
    credential = DefaultAzureCredential()
    # Check whether the credential can obtain a token successfully.
    credential.get_token("https://management.azure.com/.default")
except Exception:
    # Fall back to InteractiveBrowserCredential if DefaultAzureCredential does not work.
    credential = InteractiveBrowserCredential()

# Get a handle to workspace
ml_client = MLClient.from_config(credential=credential)

## 1.1 Get familiar with the primary interface - PFClient

In [None]:
import promptflow.azure as azure

pf = azure.PFClient(ml_client)

## 1.2 Create necessary connections

A connection in prompt flow manages the settings that control your application's behavior, including how to talk to different services (Azure OpenAI, for example).
Many applications use configuration files or environment variables for this purpose. chat_with_pdf also uses environment variables. To make it work with prompt flow without changing how those variables are used, we copy everything in the CustomConnection into environment variables.
```python
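import os
from promptflow.connections import CustomConnection

def setup_env(conn: CustomConnection):
    if not conn:
        return
    # Copy every key/value pair in the connection into the process environment.
    for key in conn:
        os.environ[key] = conn[key]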
```
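
To see the effect of this pattern without a prompt flow installation, here is a minimal sketch that uses a plain `dict` as a stand-in for the connection. The helper name `setup_env_from_mapping` and the `fake_conn` values are hypothetical. The `str()` conversion is an addition for safety, since `os.environ` only accepts strings.

```python
import os


def setup_env_from_mapping(conn):
    """Copy every key/value pair of a dict-like connection into os.environ."""
    if not conn:
        return
    for key in conn:
        # os.environ only accepts strings, so convert defensively.
        os.environ[key] = str(conn[key])


# A plain dict standing in for a CustomConnection (assumption for illustration).
fake_conn = {"PROMPT_TOKEN_LIMIT": "3000", "MAX_COMPLETION_TOKENS": 256}
setup_env_from_mapping(fake_conn)
setup_env_from_mapping(None)  # a missing connection is a no-op

print(os.environ["PROMPT_TOKEN_LIMIT"])
```

Code that already reads its settings from `os.environ` then keeps working unchanged, which is the point of this approach.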

chat_with_pdf requires the following environment variables, which are therefore the keys of the custom connection named `chat_with_pdf_custom_connection`:
```
OPENAI_API_BASE=<AOAI_ENDPOINT>
OPENAI_API_VERSION=2023-03-15-preview
OPENAI_API_KEY=<AOAI_API_KEY>
EMBEDDING_MODEL_DEPLOYMENT_NAME=text-embedding-ada-002
CHAT_MODEL_DEPLOYMENT_NAME=gpt-35-turbo
PROMPT_TOKEN_LIMIT=3000
MAX_COMPLETION_TOKENS=256
```
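
Before starting a run, it can help to confirm that all of these settings are present. The check below is a small sketch, not part of the flow itself: `REQUIRED_VARS` simply repeats the names listed above, and `missing_env_vars` is a hypothetical helper.

```python
import os

# Names taken from the list of settings chat_with_pdf expects.
REQUIRED_VARS = [
    "OPENAI_API_BASE",
    "OPENAI_API_VERSION",
    "OPENAI_API_KEY",
    "EMBEDDING_MODEL_DEPLOYMENT_NAME",
    "CHAT_MODEL_DEPLOYMENT_NAME",
    "PROMPT_TOKEN_LIMIT",
    "MAX_COMPLETION_TOKENS",
]


def missing_env_vars(env=None):
    """Return the required names that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing settings:", ", ".join(missing))
else:
    print("All required settings are present.")
```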


Prepare your Azure OpenAI resource by following these [instructions](https://learn.microsoft.com/en-us/azure/cognitive-services/openai/how-to/create-resource?pivots=web-portal) and get your `api_key` if you don't have one.

Go to the [workspace portal](https://ml.azure.com/), click `Prompt flow` -> `Connections` -> `Create`, then follow the instructions to create your own connection.
Learn more about [connections](https://learn.microsoft.com/en-us/azure/machine-learning/prompt-flow/concept-connections?view=azureml-api-2).

In [None]:
from promptflow.entities import CustomConnection

conn_name = "chat_with_pdf_custom_connection"

conn = CustomConnection(
    name=conn_name,
    configs={
        "OPENAI_API_VERSION": "2023-03-15-preview",
        "EMBEDDING_MODEL_DEPLOYMENT_NAME": "text-embedding-ada-002",
        "CHAT_MODEL_DEPLOYMENT_NAME": "gpt-35-turbo",
        "PROMPT_TOKEN_LIMIT": "3000",
        "MAX_COMPLETION_TOKENS": "256",
    },
    secrets={
        "OPENAI_API_BASE": "AOAI_ENDPOINT",  # replace this
        "OPENAI_API_KEY": "AOAI_API_KEY",  # replace this
    },
)

# Currently, connections can only be created in the Azure ML Studio UI.
# raise Exception(f"Please create {conn_name} connection in Azure ML Studio.")

# 2. Run a flow with settings in custom connection (context size 3K)

In [None]:
flow_path = "."
data_path = "./data/bert-paper-qna.jsonl"

run_3k_context = pf.run(
    flow=flow_path,
    data=data_path,
    connections={"setup_env": {"conn": "chat_with_pdf_custom_connection"}},
    display_name="chat_with_pdf_3k_context",
    tags={"chat_with_pdf": "", "2nd_round": ""},
)
pf.stream(run_3k_context)

In [None]:
print(run_3k_context)

In [None]:
detail = pf.get_details(run_3k_context)

detail

# 3. Run a flow with settings in custom connection (context size 2K)

Assume the user has created a connection `chat_with_pdf_custom_connection_smaller_context` with the setting below:
```
"PROMPT_TOKEN_LIMIT": "2000",
```
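
To build intuition for what a smaller `PROMPT_TOKEN_LIMIT` changes, the sketch below trims retrieved chunks to fit a token budget. It is an assumption about how such a limit might be applied, not the flow's actual logic: `fit_to_token_limit` is a hypothetical helper, and whitespace splitting is a crude stand-in for a real tokenizer.

```python
def fit_to_token_limit(chunks, token_limit):
    """Keep chunks in order until the approximate token budget is used up."""
    kept, used = [], 0
    for chunk in chunks:
        cost = len(chunk.split())  # crude stand-in for a real tokenizer
        if used + cost > token_limit:
            break
        kept.append(chunk)
        used += cost
    return kept


sample_chunks = ["a b c", "d e", "f g h i"]
print(len(fit_to_token_limit(sample_chunks, 5)))  # the first two chunks fit
print(len(fit_to_token_limit(sample_chunks, 4)))  # only the first chunk fits
```

A tighter limit leaves less room for retrieved context, so comparing the two runs shows how answer quality responds to the amount of context the model sees.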

In [None]:
flow_path = "."
data_path = "./data/bert-paper-qna.jsonl"

run_2k_context = pf.run(
    flow=flow_path,
    data=data_path,
    connections={
        "setup_env": {"conn": "chat_with_pdf_custom_connection_smaller_context"}
    },
    display_name="chat_with_pdf_2k_context",
    tags={"chat_with_pdf": "", "2nd_round": ""},
)
pf.stream(run_2k_context)

In [None]:
print(run_2k_context)

In [None]:
detail = pf.get_details(run_2k_context)

detail