# Question Answering - test, evaluation and experimentation

We will walk you through how to use the prompt flow Python SDK to test, evaluate, and experiment with the "Chat with PDF" flow.

## 1. Create connections
A connection in prompt flow stores the settings that control how your application talks to external services (Azure OpenAI, for example), such as endpoints and keys.
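
As a rough illustration of what a connection holds, the sketch below assembles the non-secret settings and the secrets for a custom connection, and optionally registers it through the SDK. The helper names `build_custom_connection_spec` and `register_connection` are hypothetical, the key value is a placeholder, and the registration step assumes that `CustomConnection` and `pf.connections.create_or_update` are available in the installed promptflow version.

```python
def build_custom_connection_spec(name, configs, secrets):
    """Return a plain dict with the fields a custom connection carries.

    Hypothetical helper: it only groups plain settings and secrets.
    """
    return {
        "name": name,
        "type": "custom",
        "configs": dict(configs),   # non-secret settings (deployment names, endpoints)
        "secrets": dict(secrets),   # keys and passwords, masked when printed
    }


def register_connection(spec):
    """Create or update the connection locally (assumes promptflow is installed)."""
    # Imported lazily so the helper above works without promptflow.
    from promptflow import PFClient
    from promptflow.entities import CustomConnection

    pf = PFClient()
    conn = CustomConnection(
        name=spec["name"],
        configs=spec["configs"],
        secrets=spec["secrets"],
    )
    return pf.connections.create_or_update(conn)


spec = build_custom_connection_spec(
    name="entaoai",
    configs={"OpenAiChat": "chat", "OpenAiEmbedding": "embedding"},
    secrets={"OpenAiKey": "<your-key>"},  # placeholder, not a real key
)
print(sorted(spec["configs"]))
```

Calling `register_connection(spec)` would then store the connection; it is left out of the example run because it needs a local promptflow installation and real credentials.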

In [1]:
import promptflow

pf = promptflow.PFClient()

# List all the available connections
for c in pf.connections.list():
    print(c.name + " (" + c.type + ")")

fpdoaoaice (AzureOpenAI)
dataaioaicg (CognitiveSearch)
chatpdf (Custom)
aoai (AzureOpenAI)
aoaicg (CognitiveSearch)
entaoai (Custom)
llmops (Custom)


You will need a connection named `entaoai` to run the Q&A prompt flow.

In [2]:
# Create the `entaoai` custom connection using the CLI
!pf connection create -f "./promptflow/entaoai.yml"

{
    "name": "entaoai",
    "module": "promptflow.connections",
    "created_date": "2023-09-01T18:42:17.454175",
    "last_modified_date": "2023-09-21T13:07:15.332505",
    "type": "custom",
    "configs": {
        "OpenAiEmbedding": "embedding",
        "OpenAiVersion": "2023-07-01-preview",
        "OpenAiChat": "chat",
        "OpenAiChat16k": "chat16k",
        "OpenAiEndPoint": "https://dataaiapim.azure-api.net",
        "CosmosEndpoint": "https://dataaichatgpt.documents.azure.com:443/",
        "CosmosDatabase": "aoai",
        "CosmosContainer": "chatgpt",
        "PineconeEnv": "us-east-1-aws",
        "VsIndexName": "oaiembed",
        "RedisAddress": "dataairedis.southcentralus.azurecontainer.io",
        "RedisPort": "6379",
        "KbIndexName": "aoaikb",
        "SearchService": "dataaioaicg",
        "SynapseName": "dataaiazuresql.database.windows.net,1433",
        "SynapsePool": "northwind",
        "SynapseUser": "azureadmin"
    },
    "secrets": {
        "OpenAi



In [3]:
# Look up the connection that the flow requires
try:
    conn_name = "entaoai"
    conn = pf.connections.get(name=conn_name)
    print("using existing connection")
except Exception:
    # Keep `conn` defined so the print below does not raise a NameError
    conn = None
    print("Create connection by uncommenting previous cell")

print(conn)

using existing connection
name: entaoai
module: promptflow.connections
created_date: '2023-09-01T18:42:17.454175'
last_modified_date: '2023-09-03T17:21:44.729985'
type: custom
configs:
  OpenAiEmbedding: embedding
  OpenAiVersion: 2023-07-01-preview
  OpenAiChat: chat
  OpenAiChat16k: chat16k
  OpenAiEndPoint: https://dataaiapim.azure-api.net
  CosmosEndpoint: https://dataaichatgpt.documents.azure.com:443/
  CosmosDatabase: aoai
  CosmosContainer: chatgpt
  PineconeEnv: us-east-1-aws
  VsIndexName: oaiembed
  RedisAddress: dataairedis.southcentralus.azurecontainer.io
  RedisPort: '6379'
  KbIndexName: aoaikb
  SearchService: dataaioaicg
secrets:
  OpenAiKey: '******'
  CosmosKey: '******'
  SearchKey: '******'
  PineconeKey: '******'
  RedisPassword: '******'



## 2. Test the flow

In [4]:
output = pf.flows.test(
    "../api/PromptFlow/QuestionAnswering/",
    inputs={
	"question": "What is the main difference between BERT and previous language representation models?",
	"answer": "BERT is designed to pretrain deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers.",
	"chainType": "stuff",
	"indexType": "cogsearchvs",
	"indexNs": "8fe8ee44933240fa8bc1d72858e4d1eb",
	"postBody": {
		"values": [{
			"recordId": 0,
			"data": {
				"text": "",
				"approach": "rtr",
				"overrides": {
					"semantic_ranker": "true",
					"semantic_captions": "false",
					"top": 3,
					"temperature": 0,
					"promptTemplate": "Given the following extracted parts of a long document and a question, create a final answer. \n        If you don't know the answer, just say that you don't know. Don't try to make up an answer. \n        If the answer is not contained within the text below, say \"I don't know\".\n\n        {summaries}\n        Question: {question}\n        ",
					"chainType": "stuff",
					"tokenLength": 1000,
					"embeddingModelType": "azureopenai",
					"deploymentType": "gpt3516k"
				}
			}
		}]
	}
    },
)
print(output)

  from tqdm.autonotebook import tqdm
Unknown input(s) of flow: {'answer': 'BERT is designed to pretrain deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers.'}


2023-09-15 16:26:39 -0500   35760 execution          INFO     Start to run 8 nodes with concurrency level 16.
2023-09-15 16:26:39 -0500   35760 execution.flow     INFO     Executing node parse_postBody. node run id: ddb7db5d-bcdc-4c28-917e-a5d86ae36a23_parse_postBody_0
2023-09-15 16:26:39 -0500   35760 execution.flow     INFO     Node parse_postBody completes.
2023-09-15 16:26:39 -0500   35760 execution.flow     INFO     Executing node create_llm. node run id: ddb7db5d-bcdc-4c28-917e-a5d86ae36a23_create_llm_0
2023-09-15 16:26:39 -0500   35760 execution.flow     INFO     Executing node embed_the_question. node run id: ddb7db5d-bcdc-4c28-917e-a5d86ae36a23_embed_the_question_0
2023-09-15 16:26:39 -0500   35760 execution.flow     INFO     Node create_llm completes.
2023-09-15 16:26:39 -0500   35760 execution.flow     INFO     Node embed_the_question completes.
2023-09-15 16:26:39 -0500   35760 execution.flow     INFO     Executing node check_cache_answer. node run id: ddb7db5d-bcdc-4c28-91

## 3. Run the flow with a data file

In [16]:
flowPath = "../api/PromptFlow/QuestionAnswering/"
dataPath = "../api/PromptFlow/QuestionAnswering/bert.jsonl"

columnMapping = {
	"question": "${data.question}",
	"answer": "${data.answer}",
	"chainType": "${data.chainType}",
	"indexType": "${data.indexType}",
	"indexNs": "${data.indexNs}",
	"postBody": "${data.postBody}"
    }

bertContext = pf.run(flow=flowPath, data=dataPath, column_mapping=columnMapping)
pf.stream(bertContext)

print(bertContext)

2023-09-15 16:35:41 -0500   33668 execution          INFO     Start to run 8 nodes with concurrency level 2.
2023-09-15 16:35:41 -0500   33668 execution.flow     INFO     Executing node parse_postBody. node run id: questionanswering_default_20230915_163531_883530_parse_postBody_3
2023-09-15 16:35:41 -0500    9504 execution          INFO     Start to run 8 nodes with concurrency level 2.
2023-09-15 16:35:41 -0500   33668 execution.flow     INFO     Node parse_postBody completes.
ode run id: questionanswering_default_20230915_163531_883530_parse_postBody_1
2023-09-15 16:35:41 -0500   33668 execution.flow     INFO     Executing node create_llm. node run id: questionanswering_default_20230915_163531_883530_create_llm_3
2023-09-15 16:35:41 -0500    9504 execution.flow     INFO     Node parse_postBody completes.
2023-09-15 16:35:41 -0500   33668 execution.flow     INFO     Executing node embed_the_question. node run id: questionanswering_default_20230915_163531_883530_embed_the_question_3
20

In [19]:
pf.get_details(bertContext)

| | inputs.chainType | inputs.indexNs | inputs.indexType | inputs.postBody | inputs.question | inputs.answer | inputs.line_number | outputs.answer | outputs.context | outputs.output |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | stuff | 8fe8ee44933240fa8bc1d72858e4d1eb | cogsearchvs | {'values': [{'recordId': 0, 'data': {'text': '... | What is the name of the new language represent... | BERT | 0 | The name of the new language representation mo... | [BERT: Pre-training of Deep Bidirectional Tran... | {'values': [{'recordId': 0, 'data': {'data_poi... |
| 1 | stuff | 8fe8ee44933240fa8bc1d72858e4d1eb | cogsearchvs | {'values': [{'recordId': 0, 'data': {'text': '... | What is the main difference between BERT and p... | BERT is designed to pretrain deep bidirectiona... | 1 | The main difference between BERT and previous ... | [BERT: Pre-training of Deep Bidirectional Tran... | {'values': [{'recordId': 0, 'data': {'data_poi... |
| 2 | stuff | 8fe8ee44933240fa8bc1d72858e4d1eb | cogsearchvs | {'values': [{'recordId': 0, 'data': {'text': '... | What is the advantage of fine-tuning BERT over... | Fine-tuning BERT reduces the need for many hea... | 2 | The advantage of fine-tuning BERT over using f... | [Fine-tuning approach\n\nBERTLARGE\nBERTBASE\n... | {'values': [{'recordId': 0, 'data': {'data_poi... |
| 3 | stuff | 8fe8ee44933240fa8bc1d72858e4d1eb | cogsearchvs | {'values': [{'recordId': 0, 'data': {'text': '... | What are the two unsupervised tasks used to pr... | Masked LM and next sentence prediction | 3 | The two unsupervised tasks used to pre-train B... | [2.2 Unsupervised Fine-tuning Approaches\n\nAs... | {'values': [{'recordId': 0, 'data': {'data_poi... |
| 4 | stuff | 8fe8ee44933240fa8bc1d72858e4d1eb | cogsearchvs | {'values': [{'recordId': 0, 'data': {'text': '... | How does BERT handle single sentence and sente... | It uses a special classification token ([CLS])... | 4 | BERT handles single sentence and sentence pair... | [2.2 Unsupervised Fine-tuning Approaches\n\nAs... | {'values': [{'recordId': 0, 'data': {'data_poi... |
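
The column mapping used for this batch run follows a regular pattern: each flow input is bound to the data column of the same name through a `${data.<column>}` reference. As a minimal sketch, the hypothetical helper `build_column_mapping` below generates that dictionary from a list of column names, assuming the flow inputs and data columns share names as they do here.

```python
def build_column_mapping(columns, source="data"):
    """Map each flow input to the same-named column of `source`.

    Hypothetical helper: it only builds the "${source.column}" strings
    that are passed as `column_mapping` to `pf.run`.
    """
    return {col: "${" + source + "." + col + "}" for col in columns}


column_mapping = build_column_mapping(
    ["question", "answer", "chainType", "indexType", "indexNs", "postBody"]
)
for flow_input, reference in column_mapping.items():
    print(flow_input, "->", reference)
```

The result can be passed as `column_mapping=column_mapping` in place of the hand-written dictionary, which avoids typos when the data file has many columns.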


## 4. Evaluate the "groundedness"
The [eval-groundedness flow](../../evaluation/eval-groundedness/) uses a ChatGPT/GPT-4 model to grade the answers generated by the chat-with-pdf flow.

In [25]:
evalFlowPath = "../Workshop/PromptFlow/EvalGroundness/"
evalContext = pf.run(
    flow=evalFlowPath,
    run=bertContext,
    column_mapping={
        "question": "${run.inputs.question}",
        "answer": "${run.outputs.answer}",
        "context": "${run.outputs.context}",
    },
    display_name="bertContext",
)
pf.stream(evalContext)

print(evalContext)

2023-09-15 16:38:25 -0500   37048 execution          INFO     Start to run 2 nodes with concurrency level 2.
2023-09-15 16:38:25 -0500   37048 execution.flow     INFO     Executing node gpt_groundedness. node run id: evalgroundness_default_20230915_163820_277505_gpt_groundedness_2
2023-09-15 16:38:25 -0500   27180 execution          INFO     Start to run 2 nodes with concurrency level 2.
2023-09-15 16:38:25 -0500   14868 execution          INFO     Start to run 2 nodes with concurrency level 2.
2023-09-15 16:38:25 -0500   27180 execution.flow     INFO     Executing node gpt_groundedness. node run id: evalgroundness_default_20230915_163820_277505_gpt_groundedness_0
2023-09-15 16:38:25 -0500   14868 execution.flow     INFO     Executing node gpt_groundedness. node run id: evalgroundness_default_20230915_163820_277505_gpt_groundedness_1
2023-09-15 16:38:25 -0500   37284 execution          INFO     Start to run 2 nodes with concurrency level 2.
2023-09-15 16:38:25 -0500   22752 execution  

In [26]:
pf.get_details(evalContext)

| | inputs.answer | inputs.context | inputs.question | inputs.line_number | outputs.groundedness |
|---|---|---|---|---|---|
| 0 | The name of the new language representation mo... | [BERT: Pre-training of Deep Bidirectional Tran... | What is the name of the new language represent... | 0 | 10 |
| 1 | The main difference between BERT and previous ... | [BERT: Pre-training of Deep Bidirectional Tran... | What is the main difference between BERT and p... | 1 | 10 |
| 2 | The advantage of fine-tuning BERT over using f... | [Fine-tuning approach\n\nBERTLARGE\nBERTBASE\n... | What is the advantage of fine-tuning BERT over... | 2 | 10 |
| 3 | The two unsupervised tasks used to pre-train B... | [2.2 Unsupervised Fine-tuning Approaches\n\nAs... | What are the two unsupervised tasks used to pr... | 3 | 10 |
| 4 | BERT handles single sentence and sentence pair... | [2.2 Unsupervised Fine-tuning Approaches\n\nAs... | How does BERT handle single sentence and sente... | 4 | 10 |


In [27]:
pf.get_metrics(evalContext)

{'groundedness': 10.0}
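
The aggregate value is consistent with averaging the five per-line `outputs.groundedness` scores listed earlier. The sketch below reproduces that arithmetic, assuming the evaluation flow aggregates with a simple mean; the source does not show the aggregation node, so treat this as an illustration and not as the flow's actual implementation.

```python
from statistics import mean

# Per-line groundedness scores copied from the details table of the evaluation run
line_scores = [10, 10, 10, 10, 10]

# Assumption: the run-level metric is the arithmetic mean of the line scores
metrics = {"groundedness": float(mean(line_scores))}
print(metrics)
```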

In [28]:
pf.visualize(evalContext)

The HTML file is generated at 'C:\\Users\\astalati\\AppData\\Local\\Temp\\pf-visualize-detail-x0ekmtxd.html'.
Trying to view the result in a web browser...
Successfully visualized from the web browser.
