# Chat with PDF - test, evaluation and experimentation

This notebook walks you through using the prompt flow Python SDK to test, evaluate, and experiment with the "Chat with PDF" flow.

## 0. Install dependencies

In [1]:
#%pip install -r requirements.txt

## 1. Create connections
A connection in prompt flow stores the settings your application needs to talk to external services (Azure OpenAI, for example), including endpoints and API keys.

In [2]:
import promptflow

pf = promptflow.PFClient()

# List all the available connections
for c in pf.connections.list():
    print(c.name + " (" + c.type + ")")

fpdoaoaice (AzureOpenAI)
dataaioaicg (CognitiveSearch)
chatpdf (Custom)
aoai (AzureOpenAI)
aoaicg (CognitiveSearch)
entaoai (Custom)
llmops (Custom)


To run the chat_with_pdf flow, you need a connection with the name assigned to `conn_name` in the cell below.

In [3]:
# create needed connection
from promptflow.entities import AzureOpenAIConnection, OpenAIConnection

conn_name = "fpdoaoaice"  # replace with your own connection name
try:
    conn = pf.connections.get(name=conn_name)
    print("using existing connection")
except Exception:
    # Follow https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/create-resource?pivots=web-portal to create an Azure Open AI resource.
    connection = AzureOpenAIConnection(
        name=conn_name,
        api_key="<user-input>",
        api_base="<test_base>",
        api_type="azure",
        api_version="<test_version>",
    )

    # use this if you have an existing OpenAI account
    # connection = OpenAIConnection(
    #     name=conn_name,
    #     api_key="<user-input>",
    # )
    conn = pf.connections.create_or_update(connection)
    print("successfully created connection")

print(conn)

using existing connection
name: fpdoaoaice
module: promptflow.connections
created_date: '2023-08-25T18:38:46.185205'
last_modified_date: '2023-08-28T17:24:45.268116'
type: azure_open_ai
api_key: '******'
api_base: https://dataaiapim.azure-api.net
api_type: azure
api_version: '2023-05-15'
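
The cell above puts a placeholder API key directly in the code. As a minimal sketch of one alternative, the settings could be read from environment variables. The variable names `AOAI_API_KEY`, `AOAI_API_BASE`, and `AOAI_API_VERSION` and the helper `connection_settings_from_env` are hypothetical and not part of the prompt flow SDK; the returned dictionary simply uses the same keyword names that the cell above passes to `AzureOpenAIConnection`.

```python
import os

# Hypothetical mapping from connection fields to environment variable names.
REQUIRED_VARS = {
    "api_key": "AOAI_API_KEY",
    "api_base": "AOAI_API_BASE",
    "api_version": "AOAI_API_VERSION",
}


def connection_settings_from_env(name, env=None):
    """Collect connection settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [var for var in REQUIRED_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    settings = {field: env[var] for field, var in REQUIRED_VARS.items()}
    settings.update(name=name, api_type="azure")
    return settings


# Demo with an in-memory mapping instead of the real environment.
demo_env = {
    "AOAI_API_KEY": "<user-input>",
    "AOAI_API_BASE": "<test_base>",
    "AOAI_API_VERSION": "<test_version>",
}
settings = connection_settings_from_env("fpdoaoaice", demo_env)
print(sorted(settings))
```

The resulting dictionary could then be unpacked into the constructor used above, for example `AzureOpenAIConnection(**settings)`.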



## 2. Test the flow

In [4]:
output = pf.flows.test(
    ".",
    inputs={
        "chat_history": [],
        "pdf_url": "https://arxiv.org/pdf/1810.04805.pdf",
        "question": "what is BERT?",
    },
)
print(output)

2023-09-15 12:23:37 -0500   16700 execution          INFO     Start to run 6 nodes with concurrency level 16.
2023-09-15 12:23:37 -0500   16700 execution.flow     INFO     Executing node setup_env. node run id: 758a0fbd-c2da-4cf8-b7dc-1c110c45140c_setup_env_0
2023-09-15 12:23:37 -0500   16700 execution.flow     INFO     Node setup_env completes.
2023-09-15 12:23:37 -0500   16700 execution.flow     INFO     Executing node download_tool. node run id: 758a0fbd-c2da-4cf8-b7dc-1c110c45140c_download_tool_0
2023-09-15 12:23:37 -0500   16700 execution.flow     INFO     Executing node rewrite_question_tool. node run id: 758a0fbd-c2da-4cf8-b7dc-1c110c45140c_rewrite_question_tool_0
2023-09-15 12:23:37 -0500   16700 execution.flow     INFO     [download_tool in line 0 (index starts from 0)] stdout> Pdf already exists in d:\repos\chatpdf\Workshop\promptflow\BertChat\.pdfs\https___arxiv.org_pdf_1810.04805.pdf.pdf
2023-09-15 12:23:37 -0500   16700 execution.flow     INFO     Node download_tool comple

## 3. Run the flow with a data file
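
The data file used in this step has the `.jsonl` extension, meaning one JSON object per line, and each object supplies the columns that the column mapping refers to as `${data.<column>}`. As a minimal sketch, assuming you want to build your own test file, the helper below writes such a file with only the standard library. The function `write_jsonl` and the file name `my-questions.jsonl` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(rows, path):
    """Write one JSON object per line (hypothetical helper, not an SDK call)."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
    return path


# One row per test question; the keys match the columns used in this notebook.
rows = [
    {
        "pdf_url": "https://arxiv.org/pdf/1810.04805.pdf",
        "chat_history": [],
        "question": "what is BERT?",
    },
]

# Write to a temporary directory so the sketch does not touch ./data.
out_path = write_jsonl(rows, Path(tempfile.mkdtemp()) / "my-questions.jsonl")

# Read the file back to confirm that each line parses as a JSON object.
loaded = [json.loads(line) for line in out_path.read_text(encoding="utf-8").splitlines()]
print(loaded[0]["question"])
```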

In [6]:
flow_path = "."
data_path = "./data/bert-paper-qna-1-line.jsonl"

config_2k_context = {
    "EMBEDDING_MODEL_DEPLOYMENT_NAME": "embedding",
    "CHAT_MODEL_DEPLOYMENT_NAME": "chat",
    "PROMPT_TOKEN_LIMIT": 2000,
    "MAX_COMPLETION_TOKENS": 256,
    "VERBOSE": True,
    "CHUNK_SIZE": 256,
    "CHUNK_OVERLAP": 32,
}

column_mapping = {
    "question": "${data.question}",
    "pdf_url": "${data.pdf_url}",
    "chat_history": "${data.chat_history}",
    "config": config_2k_context,
}
run_2k_context = pf.run(flow=flow_path, data=data_path, column_mapping=column_mapping)
pf.stream(run_2k_context)

print(run_2k_context)

2023-09-15 12:27:05 -0500   15492 execution          INFO     Start to run 6 nodes with concurrency level 2.
2023-09-15 12:27:05 -0500   15492 execution.flow     INFO     Executing node setup_env. node run id: bertchat_default_20230915_122701_506440_setup_env_0
2023-09-15 12:27:05 -0500   15492 execution.flow     INFO     Node setup_env completes.
2023-09-15 12:27:05 -0500   15492 execution.flow     INFO     Executing node download_tool. node run id: bertchat_default_20230915_122701_506440_download_tool_0
2023-09-15 12:27:05 -0500   15492 execution.flow     INFO     Executing node rewrite_question_tool. node run id: bertchat_default_20230915_122701_506440_rewrite_question_tool_0
2023-09-15 12:27:05 -0500   15492 execution.flow     INFO     [download_tool in line 0 (index starts from 0)] stdout> Pdf already exists in d:\repos\chatpdf\Workshop\promptflow\BertChat\.pdfs\https___arxiv.org_pdf_1810.04805.pdf.pdf
2023-09-15 12:27:05 -0500   15492 execution.flow     INFO     Node download_too
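
In the column mapping above, `${data.question}` takes its value from the `question` column of each line in the data file, while an entry that is not a placeholder string, such as `config`, is passed to the flow unchanged. The sketch below only illustrates that idea in plain Python. `resolve_column_mapping` is a hypothetical helper, not the SDK's implementation, and the shortened `config` value is for illustration only.

```python
import re

# Matches strings of the form ${data.<column>} and captures the column name.
_PLACEHOLDER = re.compile(r"^\$\{data\.(\w+)\}$")


def resolve_column_mapping(mapping, row):
    """Build the flow inputs for one data row (illustrative only)."""
    resolved = {}
    for name, value in mapping.items():
        match = _PLACEHOLDER.match(value) if isinstance(value, str) else None
        # Placeholders are looked up in the row; other values pass through.
        resolved[name] = row[match.group(1)] if match else value
    return resolved


row = {
    "question": "what is BERT?",
    "pdf_url": "https://arxiv.org/pdf/1810.04805.pdf",
    "chat_history": [],
}
mapping = {
    "question": "${data.question}",
    "pdf_url": "${data.pdf_url}",
    "chat_history": "${data.chat_history}",
    "config": {"PROMPT_TOKEN_LIMIT": 2000},
}
inputs = resolve_column_mapping(mapping, row)
print(inputs["question"], inputs["config"])
```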

In [7]:
pf.get_details(run_2k_context)

| | inputs.chat_history | inputs.config | inputs.pdf_url | inputs.question | inputs.line_number | outputs.answer | outputs.context |
|---|---|---|---|---|---|---|---|
| 0 | [] | {'EMBEDDING_MODEL_DEPLOYMENT_NAME': 'embedding... | https://arxiv.org/pdf/1810.04805.pdf | What is the name of the new language represent... | 0 | The name of the new language representation mo... | [e introduce a new language representa-\ntion ... |


## 4. Evaluate the "groundedness"
The [eval-groundedness flow](../../evaluation/eval-groundedness/) uses a ChatGPT/GPT-4 model to grade the answers generated by the chat-with-pdf flow.

In [11]:
eval_groundedness_flow_path = "../EvalGroundness/"
eval_groundedness_2k_context = pf.run(
    flow=eval_groundedness_flow_path,
    run=run_2k_context,
    column_mapping={
        "question": "${run.inputs.question}",
        "answer": "${run.outputs.answer}",
        "context": "${run.outputs.context}",
    },
    display_name="eval_groundedness_2k_context",
)
pf.stream(eval_groundedness_2k_context)

print(eval_groundedness_2k_context)

2023-09-15 12:38:46 -0500   26892 execution          INFO     Start to run 2 nodes with concurrency level 2.
2023-09-15 12:38:46 -0500   26892 execution.flow     INFO     Executing node gpt_groundedness. node run id: eval_groundness_default_20230915_123840_573750_gpt_groundedness_0
2023-09-15 12:38:47 -0500   26892 execution.flow     INFO     Node gpt_groundedness completes.
2023-09-15 12:38:47 -0500   26892 execution.flow     INFO     Executing node parse_score. node run id: eval_groundness_default_20230915_123840_573750_parse_score_0
2023-09-15 12:38:47 -0500   26892 execution.flow     INFO     Node parse_score completes.
2023-09-15 12:38:48 -0500   16700 execution          INFO     Process 0 queue empty, exit.
2023-09-15 12:38:48 -0500   16700 execution          INFO     Executing aggregation nodes...
2023-09-15 12:38:48 -0500   16700 execution          INFO     Start to run 1 nodes with concurrency level 2.
2023-09-15 12:38:48 -0500   16700 execution.flow     INFO     Executing nod
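
The log above shows a `gpt_groundedness` node followed by a `parse_score` node, so the grader's reply has to be turned into a number at some point. The evaluation flow's own parsing code is not shown in this notebook; the sketch below is only an illustration of one way to do it, assuming the reply is free text that contains a number. `extract_score` is a hypothetical function.

```python
import re

# An optionally signed integer or decimal number.
_NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


def extract_score(reply):
    """Return the first number found in the reply as a float, or None."""
    match = _NUMBER.search(reply)
    return float(match.group()) if match else None


print(extract_score("Score: 10"))
print(extract_score("I cannot grade this answer."))
```

Returning `None` for an unparseable reply keeps the failure visible, so an aggregation step can decide whether to skip or count such rows.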

In [12]:
pf.get_details(eval_groundedness_2k_context)

| | inputs.answer | inputs.context | inputs.question | inputs.line_number | outputs.groundedness |
|---|---|---|---|---|---|
| 0 | The name of the new language representation mo... | [e introduce a new language representa-\ntion ... | What is the name of the new language represent... | 0 | 10 |


In [13]:
pf.get_metrics(eval_groundedness_2k_context)

{'groundedness': 10.0}
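
The metric above is one number for the whole run, while `pf.get_details` shows a score per row. The aggregation itself is defined inside the evaluation flow, which is not shown here. As a sketch, assuming the run-level metric is the mean of the per-row `outputs.groundedness` values, it could be computed as follows; `aggregate_groundedness` is a hypothetical function.

```python
from statistics import mean


def aggregate_groundedness(scores):
    """Average the per-row scores, skipping rows without a score (assumed behavior)."""
    valid = [float(s) for s in scores if s is not None]
    return {"groundedness": mean(valid)} if valid else {}


# The single row in this run was graded 10.
print(aggregate_groundedness([10]))
```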

In [14]:
pf.visualize(eval_groundedness_2k_context)

The HTML file is generated at 'C:\\Users\\astalati\\AppData\\Local\\Temp\\pf-visualize-detail-zjcjmjfx.html'.
Trying to view the result in a web browser...
Successfully visualized from the web browser.


You will see a web page like the one below. It shows how each row was graded and how the evaluation run executed:
![pf-visualize-screenshot](./assets/pf-visualize-screenshot.png)

## 5. Try a different configuration and evaluate again - experimentation

NOTE: because this example uses very little test data (the data file above contains a single line), don't be surprised if both configurations produce exactly the same metrics. Because LLM output is non-deterministic, your numbers may also differ from the ones shown here.

In [15]:
config_3k_context = {
    "EMBEDDING_MODEL_DEPLOYMENT_NAME": "embedding",
    "CHAT_MODEL_DEPLOYMENT_NAME": "chat",
    "PROMPT_TOKEN_LIMIT": 3000,
    "MAX_COMPLETION_TOKENS": 256,
    "VERBOSE": True,
    "CHUNK_SIZE": 256,
    "CHUNK_OVERLAP": 32,
}

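# Reuse the earlier mapping, but pass the 3k configuration this time;
# the original mapping still points at config_2k_context.
column_mapping_3k = {**column_mapping, "config": config_3k_context}
run_3k_context = pf.run(flow=flow_path, data=data_path, column_mapping=column_mapping_3k)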
pf.stream(run_3k_context)

print(run_3k_context)

2023-09-15 13:19:35 -0500    2712 execution          INFO     Start to run 6 nodes with concurrency level 2.
2023-09-15 13:19:35 -0500    2712 execution.flow     INFO     Executing node setup_env. node run id: bertchat_default_20230915_131927_633984_setup_env_0
2023-09-15 13:19:36 -0500    2712 execution.flow     INFO     Node setup_env completes.
2023-09-15 13:19:36 -0500    2712 execution.flow     INFO     Executing node download_tool. node run id: bertchat_default_20230915_131927_633984_download_tool_0
2023-09-15 13:19:36 -0500    2712 execution.flow     INFO     Executing node rewrite_question_tool. node run id: bertchat_default_20230915_131927_633984_rewrite_question_tool_0
2023-09-15 13:19:36 -0500    2712 execution.flow     INFO     [download_tool in line 0 (index starts from 0)] stdout> Pdf already exists in d:\repos\chatpdf\Workshop\promptflow\BertChat\.pdfs\https___arxiv.org_pdf_1810.04805.pdf.pdf
2023-09-15 13:19:36 -0500    2712 execution.flow     INFO     Node download_too
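
The two configurations in this walkthrough differ only in `PROMPT_TOKEN_LIMIT`. As a minimal sketch, variants can be derived from a single base dictionary so that the unchanged keys stay in sync and a misspelled key is caught early. `BASE_CONFIG` and `with_overrides` are hypothetical names; the values are the ones used in this notebook.

```python
BASE_CONFIG = {
    "EMBEDDING_MODEL_DEPLOYMENT_NAME": "embedding",
    "CHAT_MODEL_DEPLOYMENT_NAME": "chat",
    "PROMPT_TOKEN_LIMIT": 2000,
    "MAX_COMPLETION_TOKENS": 256,
    "VERBOSE": True,
    "CHUNK_SIZE": 256,
    "CHUNK_OVERLAP": 32,
}


def with_overrides(base, **overrides):
    """Return a copy of base with some keys replaced; reject unknown keys."""
    unknown = set(overrides) - set(base)
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    return {**base, **overrides}


config_2k_context = with_overrides(BASE_CONFIG, PROMPT_TOKEN_LIMIT=2000)
config_3k_context = with_overrides(BASE_CONFIG, PROMPT_TOKEN_LIMIT=3000)

# Confirm which keys actually differ between the two variants.
changed = {k for k in BASE_CONFIG if config_2k_context[k] != config_3k_context[k]}
print(changed)
```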

In [16]:
eval_groundedness_3k_context = pf.run(
    flow=eval_groundedness_flow_path,
    run=run_3k_context,
    column_mapping={
        "question": "${run.inputs.question}",
        "answer": "${run.outputs.answer}",
        "context": "${run.outputs.context}",
    },
    display_name="eval_groundedness_3k_context",
)
pf.stream(eval_groundedness_3k_context)

print(eval_groundedness_3k_context)

2023-09-15 13:19:50 -0500   32024 execution          INFO     Start to run 2 nodes with concurrency level 2.
2023-09-15 13:19:50 -0500   32024 execution.flow     INFO     Executing node gpt_groundedness. node run id: eval_groundness_default_20230915_131943_010944_gpt_groundedness_0
2023-09-15 13:19:51 -0500   32024 execution.flow     INFO     Node gpt_groundedness completes.
2023-09-15 13:19:51 -0500   32024 execution.flow     INFO     Executing node parse_score. node run id: eval_groundness_default_20230915_131943_010944_parse_score_0
2023-09-15 13:19:51 -0500   32024 execution.flow     INFO     Node parse_score completes.
2023-09-15 13:19:52 -0500   16700 execution          INFO     Process 0 queue empty, exit.
2023-09-15 13:19:52 -0500   16700 execution          INFO     Executing aggregation nodes...
2023-09-15 13:19:52 -0500   16700 execution          INFO     Start to run 1 nodes with concurrency level 2.
2023-09-15 13:19:52 -0500   16700 execution.flow     INFO     Executing nod

In [17]:
pf.get_details(eval_groundedness_3k_context)

| | inputs.answer | inputs.context | inputs.question | inputs.line_number | outputs.groundedness |
|---|---|---|---|---|---|
| 0 | The name of the new language representation mo... | [e introduce a new language representa-\ntion ... | What is the name of the new language represent... | 0 | 10 |


In [18]:
pf.visualize([eval_groundedness_2k_context, eval_groundedness_3k_context])

The HTML file is generated at 'C:\\Users\\astalati\\AppData\\Local\\Temp\\pf-visualize-detail-lf621pbw.html'.
Trying to view the result in a web browser...
Successfully visualized from the web browser.
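
Besides the side-by-side web view, the dictionaries returned by `pf.get_metrics` (such as `{'groundedness': 10.0}` earlier) can be compared directly. The sketch below uses a hypothetical `metric_deltas` helper; the two input dictionaries contain made-up numbers for illustration and are not results from the runs in this notebook.

```python
def metric_deltas(baseline, candidate):
    """Return candidate minus baseline for every metric present in both."""
    shared = sorted(set(baseline) & set(candidate))
    return {name: candidate[name] - baseline[name] for name in shared}


# Made-up values for illustration only, not output of the runs above.
baseline_metrics = {"groundedness": 7.5}
candidate_metrics = {"groundedness": 8.0}

deltas = metric_deltas(baseline_metrics, candidate_metrics)
for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f}")
```

In practice the inputs would be something like `pf.get_metrics(eval_groundedness_2k_context)` and `pf.get_metrics(eval_groundedness_3k_context)`.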
