# Local Evaluation - Intent

After you have set up and configured the prompt flow, it's time to evaluate its performance. Here we use the prompt flow SDK to run test questions through the flow and score the results with the provided evaluation prompt flows.

In [1]:
from promptflow import PFClient
pf_client = PFClient()

from dotenv import load_dotenv

from pathlib import Path
load_dotenv(Path("../local.env"))

True

In [2]:
# Add a question to test the base prompt flow.
question = "How do I wash the jacket I purchased?"
customerId = "4"
output = pf_client.test(
    flow="../contoso-intent", # Path to the flow directory
    inputs={ # Inputs to the flow
        "chat_history": [],
        "question": question,
        "customerId": customerId,
    },
)

output["answer"] = "".join(list(output["answer"]))



2024-01-08 18:45:10 -0600   30664 execution.flow     INFO     Start executing nodes in thread pool mode.
2024-01-08 18:45:10 -0600   30664 execution.flow     INFO     Start to run 3 nodes with concurrency level 16.
2024-01-08 18:45:10 -0600   30664 execution.flow     INFO     Executing node classify_intent_prompt. node run id: a089ffa8-f1ea-490d-b995-de361a38cfc2_classify_intent_prompt_0
2024-01-08 18:45:10 -0600   30664 execution.flow     INFO     Node classify_intent_prompt completes.
2024-01-08 18:45:10 -0600   30664 execution.flow     INFO     Executing node classify_intent_llm. node run id: a089ffa8-f1ea-490d-b995-de361a38cfc2_classify_intent_llm_0
2024-01-08 18:45:10 -0600   30664 execution.flow     INFO     Node classify_intent_llm completes.
2024-01-08 18:45:10 -0600   30664 execution.flow     INFO     Executing node run_chat_or_support. node run id: a089ffa8-f1ea-490d-b995-de361a38cfc2_run_chat_or_support_0
2024-01-08 18:45:10 -0600   30664 execution.flow     INFO     [run_cha

In [3]:
output

{'answer': '{"answer":"Hi Emily! To wash the Summit Breeze Jacket you purchased, machine wash it on a gentle cycle with cold water and air dry it. Avoid using bleach or fabric softeners. \\ud83e\\uddfc\\ud83e\\uddfa Hope this helps! Let me know if you have any other questions. \\ud83d\\ude0a","citations":[{"content":"# Information about product item_number: 17\\n\\nRainGuard Hiking Jacket, price $110,\\n\\n\\n\\nWear appropriate layers underneath the jacket based on the weather conditions.\\nAdjust the hood, cuffs, and hem to achieve a snug and comfortable fit.\\nUtilize the pockets for storing small items such as keys, wallet, or a mobile phone.\\nIf needed, open the ventilation zippers to regulate airflow and prevent overheating.\\nBe mindful of the jacket\'s limitations in extreme weather conditions.\\n\\nCare and Maintenance\\n   To ensure the longevity and performance of your RainGuard Hiking Jacket, please adhere to the following care instructions:\\n\\nClean the jacket as needed
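The `answer` value shown above is itself a JSON string with `answer` and `citations` keys. As a minimal sketch, it could be decoded as follows. The sample string here is a shortened, hypothetical stand-in for the real flow output, and `parse_flow_answer` is a helper name introduced only for illustration.

```python
import json

# Shortened, hypothetical stand-in for output["answer"]; the real string is much longer.
raw_answer = (
    '{"answer": "Machine wash cold and air dry.", '
    '"citations": [{"content": "Care and Maintenance"}]}'
)

def parse_flow_answer(raw: str) -> tuple[str, list]:
    """Decode the JSON string and return (answer text, citations list)."""
    payload = json.loads(raw)
    # Fall back to an empty list if the flow returned no citations.
    return payload["answer"], payload.get("citations", [])

answer_text, citations = parse_flow_answer(raw_answer)
print(answer_text)
print(len(citations), "citation(s)")
```

With the real output, `parse_flow_answer(output["answer"])` would be the equivalent call, assuming the string is complete and valid JSON.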

Test the intent classification of the prompt flow by passing the `intent_context` from the output above to the `intent_eval` evaluation flow.

In [4]:
test = pf_client.test(
    flow="intent_eval",
    inputs={
        "question": question,
        "prediction": str(output["intent_context"]),
        "groundtruth": "support",
    },
)



2024-01-08 18:45:29 -0600   30664 execution.flow     INFO     Start executing nodes in thread pool mode.
2024-01-08 18:45:29 -0600   30664 execution.flow     INFO     Start to run 3 nodes with concurrency level 16.
2024-01-08 18:45:29 -0600   30664 execution.flow     INFO     Executing node llm_call. node run id: 0be4e512-3cfb-4356-87dd-63f0d305605a_llm_call_0
2024-01-08 18:45:29 -0600   30664 execution.flow     INFO     Node llm_call completes.
2024-01-08 18:45:29 -0600   30664 execution.flow     INFO     Executing node assert_value. node run id: 0be4e512-3cfb-4356-87dd-63f0d305605a_assert_value_0
2024-01-08 18:45:29 -0600   30664 execution.flow     INFO     Node assert_value completes.
2024-01-08 18:45:29 -0600   30664 execution.flow     INFO     Executing node get_accuracy. node run id: 0be4e512-3cfb-4356-87dd-63f0d305605a_get_accuracy_0
2024-01-08 18:45:29 -0600   30664 execution.flow     INFO     Node get_accuracy completes.


In [5]:
test

{'results': {'accuracy': 100.0}}
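The `intent_eval` flow scores the prediction with an LLM node, and its internals are not shown here. As a quick cross-check that needs no LLM call, a plain string comparison might look like the sketch below. It assumes predictions are formatted like `intent: support` (as in the batch results later in this notebook), and `intent_matches` is a hypothetical helper, not part of the flow.

```python
def intent_matches(prediction: str, groundtruth: str) -> bool:
    """Compare a prediction such as 'Intent: support' with a label such as 'support'."""
    label = prediction.strip().lower()
    # Drop an optional 'intent:' prefix before comparing.
    if label.startswith("intent:"):
        label = label[len("intent:"):].strip()
    return label == groundtruth.strip().lower()

print(intent_matches("Intent: support", "support"))
print(intent_matches("intent: chat", "support"))
```

A disagreement between this check and the flow's accuracy score would be a hint to inspect the raw `intent_context` value.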

# Azure AI Studio batch run on an evaluation JSONL dataset for intent classification accuracy

To test the flow more thoroughly, we can use Azure AI Studio to run the evaluation prompt flow in batches over a larger dataset.

In [1]:
# Import required libraries
import json

from azure.identity import DefaultAzureCredential, InteractiveBrowserCredential
from promptflow.azure import PFClient

In [2]:
try:
    credential = DefaultAzureCredential()
    # Check if given credential can get token successfully.
    credential.get_token("https://management.azure.com/.default")
except Exception as ex:
    # Fall back to InteractiveBrowserCredential if DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential()

Populate the `config.json` file with the subscription_id, resource_group, and workspace_name.
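As a rough sketch, the file could be generated from Python as shown below. The three keys are the ones named above; the angle-bracket values are placeholders to replace with your own, and `write_config` is a helper introduced only for this example.

```python
import json
import tempfile
from pathlib import Path

# Placeholder values; replace them with your own Azure details.
config = {
    "subscription_id": "<your-subscription-id>",
    "resource_group": "<your-resource-group>",
    "workspace_name": "<your-workspace-name>",
}

def write_config(path: Path, values: dict) -> None:
    """Write the workspace settings as pretty-printed JSON."""
    path.write_text(json.dumps(values, indent=4), encoding="utf-8")

# Written to a temporary folder here so an existing config.json is not overwritten.
config_file = Path(tempfile.mkdtemp()) / "config.json"
write_config(config_file, config)
print(config_file.read_text(encoding="utf-8"))
```

In this notebook the file is expected at `../config.json`, so that would be the path to write to once the values are filled in.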

In [3]:
config_path = "../config.json"
pf_azure_client = PFClient.from_config(credential=credential, path=config_path)

Found the config file in: ..\config.json


Set the name of the Azure AI Studio runtime that will be used for the cloud batch runs.

In [4]:
# Update the runtime to the name of the runtime you created previously
runtime = "automatic"
# load flow
flow = "../contoso-intent"
# load data
data = "../data/alltestdata.jsonl"
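The runs in this notebook read the `customerId`, `question`, and `intent` fields from each line of the data file. A small pre-flight check along these lines can catch rows that lack one of them before a cloud run is submitted. The sample rows are illustrative only (whether the real file stores `customerId` as a string or a number is an assumption), and `check_jsonl_lines` is a hypothetical helper.

```python
import json

# Field names taken from the column mappings used in this notebook.
REQUIRED_KEYS = {"customerId", "question", "intent"}

def check_jsonl_lines(lines):
    """Return (line number, missing keys) pairs for rows that lack required keys."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        row = json.loads(line)
        missing = REQUIRED_KEYS - row.keys()
        if missing:
            problems.append((number, sorted(missing)))
    return problems

# Illustrative rows; the second one is missing its "intent" label.
sample = [
    '{"customerId": "4", "question": "tell me about your hiking jackets", "intent": "chat"}',
    '{"customerId": "2", "question": "What is your return or exchange policy?"}',
]
print(check_jsonl_lines(sample))
```

For the real file, passing an open file handle works the same way, for example `check_jsonl_lines(open(data, encoding="utf-8"))`.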

In [5]:
# get current time stamp for run name
import datetime
now = datetime.datetime.now()
timestamp = now.strftime("%Y_%m_%d_%H%M%S")
run_name = timestamp+"_intent_base_run"
print(run_name)

2024_01_09_101416_intent_base_run


Create a base run to use as the variant for the evaluation runs. 

_NOTE: If you get the error "An existing connection was forcibly closed by the remote host", run the cell again._
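Instead of re-running the cell by hand, the submission could be wrapped in a small retry helper. The sketch below is generic Python, not part of the prompt flow SDK, and it assumes the failure surfaces as a `ConnectionError` subclass; if the SDK wraps the error in another exception type, pass that type through `retry_on`.

```python
import time

def run_with_retry(action, attempts=3, delay_seconds=5.0, retry_on=(ConnectionError,)):
    """Call action() and retry on the given exception types; re-raise after the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on as error:
            if attempt == attempts:
                raise
            print(f"Attempt {attempt} failed ({error}); retrying in {delay_seconds}s")
            time.sleep(delay_seconds)

# Stand-in for the real submission: fails once, then succeeds.
calls = {"count": 0}

def flaky_submit():
    calls["count"] += 1
    if calls["count"] < 2:
        raise ConnectionResetError("An existing connection was forcibly closed by the remote host")
    return "submitted"

result = run_with_retry(flaky_submit, delay_seconds=0)
print(result)
```

In the notebook, the call to `pf_azure_client.run(...)` could be passed in as a `lambda` in place of `flaky_submit`.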

In [6]:
# create base run in Azure AI Studio
base_run = pf_azure_client.run(
    flow=flow,
    data=data,
    column_mapping={
        # reference data
        "customerId": "${data.customerId}",
        "question": "${data.question}",
    },
    runtime=runtime,
    display_name=run_name,
    name=run_name
)
print(base_run)

Uploading contoso-intent (0.12 MBs): 100%|##########| 119139/119139 [00:01<00:00, 75232.08it/s]

[2024-01-09 10:14:26,164][promptflow.azure._restclient.flow_service_caller][INFO] - Start polling until session creation is completed...


Waiting for session creation, current status: InProgress
Waiting for session creation, current status: InProgress
Waiting for session creation, current status: InProgress
Waiting for session creation, current status: InProgress
Waiting for session creation, current status: InProgress


[2024-01-09 10:18:26,978][promptflow.azure._restclient.flow_service_caller][INFO] - Session creation finished with status Succeeded.


Portal url: https://ai.azure.com/projectflows/bulkrun/run/2024_01_09_101416_intent_base_run/details?wsid=/subscriptions/91d27443-f037-45d9-bb0c-428256992df6/resourcegroups/rg-aitourcontosostore/providers/Microsoft.MachineLearningServices/workspaces/contoso-store
name: 2024_01_09_101416_intent_base_run
created_on: '2024-01-09T16:18:33.239005+00:00'
status: Preparing
display_name: 2024_01_09_101416_intent_base_run
description: null
tags: {}
properties:
  azureml.promptflow.runtime_name: automatic
  azureml.promptflow.runtime_version: 20231218.v2
  azureml.promptflow.definition_file_name: flow.dag.yaml
  azureml.promptflow.session_id: c980f1d6de077e6432ed9ddd0187f8d8096d0f4faffa88e0
  azureml.promptflow.flow_lineage_id: a31bf17848f3a357f1665ac0d316fbaee941549935ef1a1d0f3bc93fe92db52a
  azureml.promptflow.flow_definition_datastore_name: workspaceblobstore
  azureml.promptflow.flow_definition_blob_path: LocalUpload/49a7817a950e1f77272d75a859c3d850/contoso-intent/flow.dag.yaml
  azureml.prom

In [7]:
pf_azure_client.stream(base_run)

2024-01-09 16:18:39 +0000     135 promptflow-runtime INFO     [2024_01_09_101416_intent_base_run] Receiving v2 bulk run request 80e5e30f-1385-47a9-b1b0-9388810f3439: {"flow_id": "2024_01_09_101416_intent_base_run", "flow_run_id": "2024_01_09_101416_intent_base_run", "flow_source": {"flow_source_type": 1, "flow_source_info": {"snapshot_id": "c419a9bc-211e-4eea-b66b-5cc766c37960"}, "flow_dag_file": "flow.dag.yaml"}, "connections": "**data_scrubbed**", "log_path": "https://staitourcont008192701846.blob.core.windows.net/8a82542e-6930-4508-aafa-2285e571f07b-azureml/ExperimentRun/dcid.2024_01_09_101416_intent_base_run/logs/azureml/executionlogs.txt?sv=2019-07-07&sr=b&sig=**data_scrubbed**&skoid=db45885e-fcc8-4eb9-b2b3-f2c33cd507f5&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2024-01-09T16%3A08%3A32Z&ske=2024-01-11T00%3A18%3A32Z&sks=b&skv=2019-07-07&st=2024-01-09T16%3A08%3A39Z&se=2024-01-10T00%3A18%3A39Z&sp=rcw", "app_insights_instrumentation_key": "InstrumentationKey=**data_scrubbed**;Inge

<promptflow._sdk.entities._run.Run at 0x14e722eea90>

In [8]:
details = pf_azure_client.get_details(base_run)
details.head(10)

| outputs.line_number | inputs.chat_history | inputs.customerId | inputs.question | inputs.line_number | outputs.answer | outputs.intent_context | outputs.context |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | [] | 7 | what is the temperature rating of my sleeping ... | 0 | | | |
| 1 | [] | 6 | is the jacket I bought machine washable? | 5 | Hi Emily! The CozyNights Sleeping Bag has a te... | intent: support | {'citations': [{'content': '# Information abou... |
| 2 | [] | 8 | what is the waterproof rating of the TrailMast... | 3 | Hi Emily! Thank you for your question. The Tra... | intent: support | {'citations': [{'content': '# Information abou... |
| 3 | [] | 8 | I would like to return the tent I bought. It i... | 6 | | | |
| 4 | [] | 2 | What is your return or exchange policy? | 4 | | | |
| 5 | [] | 6 | Do you have any hiking boots? | 10 | | | |
| 6 | [] | 1 | Do you have any climbing gear? | 8 | | | |
| 7 | [] | 2 | What gear do you recommend for hiking? | 11 | | | |
| 8 | [] | 4 | tell me about your hiking jackets | 7 | | | |
| 9 | [] | 7 | what is the temperature rating of the cozynigh... | 1 | | | |
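Several rows in the details output above have empty output columns. A short helper like the one below could list which input lines are affected. It is a sketch that works on plain dictionaries, one per row; the assumption is that such records can be obtained from the DataFrame with `details.to_dict(orient="records")`, and `rows_missing_output` is a hypothetical name.

```python
def rows_missing_output(records, column="outputs.answer"):
    """Return the inputs.line_number of every record whose output column is empty."""
    missing = []
    for record in records:
        value = record.get(column)
        # Treat None, NaN (which is not equal to itself), and "" as missing.
        if value is None or value != value or value == "":
            missing.append(record.get("inputs.line_number"))
    return missing

# Illustrative records shaped like rows of the details table.
sample_records = [
    {"inputs.line_number": 0, "outputs.answer": None},
    {"inputs.line_number": 5, "outputs.answer": "Hi Emily! The CozyNights Sleeping Bag has a te..."},
    {"inputs.line_number": 6, "outputs.answer": float("nan")},
]
print(rows_missing_output(sample_records))
```

Checking this before starting the evaluation run helps separate lines that produced no output from lines that were classified incorrectly.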


# Cloud evaluation run on JSONL data for intent classification

In [17]:
eval_flow = "intent_eval/"
run_name = timestamp+"intent_eval_run"
print(run_name)

eval_run_variant = pf_azure_client.run(
    flow=eval_flow,
    data=data,  # path to the data file
    run=base_run,  # use run as the variant
    column_mapping={
        # reference data
        "question": "${data.question}",
        "groundtruth": "${data.intent}",
        "prediction": "${run.outputs.intent_context}",
    },
    runtime=runtime,
    display_name=run_name,
    name=run_name
)

2024_01_08_184555intent_eval_run-1


[2024-01-08 18:55:07,695][promptflow.azure._restclient.flow_service_caller][INFO] - Start polling until session creation is completed...
[2024-01-08 18:55:14,824][promptflow.azure._restclient.flow_service_caller][INFO] - Session creation finished with status Succeeded.


Portal url: https://ai.azure.com/projectflows/bulkrun/run/2024_01_08_184555intent_eval_run-1/details?wsid=/subscriptions/91d27443-f037-45d9-bb0c-428256992df6/resourceGroups/rg-aitourcontosostore/providers/Microsoft.MachineLearningServices/workspaces/contoso-store
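The `column_mapping` above combines two sources: `${data.*}` references read from the data file, and `${run.outputs.*}` references read from the base run's outputs. The toy resolver below illustrates that idea for a single line. It is only a conceptual sketch written for this notebook, not how the prompt flow SDK implements the mapping, and `resolve_mapping` is a hypothetical name.

```python
import re

# Matches references of the form ${data.<name>} or ${run.outputs.<name>}.
REFERENCE = re.compile(r"^\$\{(data|run\.outputs)\.(\w+)\}$")

def resolve_mapping(column_mapping, data_row, run_outputs):
    """Build the evaluation flow inputs for one line from the data row and base run outputs."""
    resolved = {}
    for name, reference in column_mapping.items():
        match = REFERENCE.match(reference)
        if match is None:
            resolved[name] = reference  # not a reference, keep the literal value
            continue
        source, key = match.groups()
        resolved[name] = data_row[key] if source == "data" else run_outputs[key]
    return resolved

mapping = {
    "question": "${data.question}",
    "groundtruth": "${data.intent}",
    "prediction": "${run.outputs.intent_context}",
}
data_row = {"question": "Do you have any hiking boots?", "intent": "chat"}
run_outputs = {"intent_context": "intent: chat"}
print(resolve_mapping(mapping, data_row, run_outputs))
```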


In [18]:
pf_azure_client.stream(eval_run_variant)

2024-01-09 00:55:24 +0000      72 promptflow-runtime INFO     [2024_01_08_184555intent_eval_run-1] Receiving v2 bulk run request 12f30ba0-7135-4607-8e2b-92900166eea3: {"flow_id": "2024_01_08_184555intent_eval_run-1", "flow_run_id": "2024_01_08_184555intent_eval_run-1", "flow_source": {"flow_source_type": 1, "flow_source_info": {"snapshot_id": "e44359c2-b993-4bb7-9293-12ea709862c4"}, "flow_dag_file": "flow.dag.yaml"}, "connections": "**data_scrubbed**", "log_path": "https://staitourcont008192701846.blob.core.windows.net/8a82542e-6930-4508-aafa-2285e571f07b-azureml/ExperimentRun/dcid.2024_01_08_184555intent_eval_run-1/logs/azureml/executionlogs.txt?sv=2019-07-07&sr=b&sig=**data_scrubbed**&skoid=db45885e-fcc8-4eb9-b2b3-f2c33cd507f5&sktid=72f988bf-86f1-41af-91ab-2d7cd011db47&skt=2024-01-08T17%3A33%3A09Z&ske=2024-01-10T01%3A43%3A09Z&sks=b&skv=2019-07-07&st=2024-01-09T00%3A45%3A23Z&se=2024-01-09T08%3A55%3A23Z&sp=rcw", "app_insights_instrumentation_key": "InstrumentationKey=**data_scrubbed**;

<promptflow._sdk.entities._run.Run at 0x15c71f01df0>

In [19]:
details = pf_azure_client.get_details(eval_run_variant)
details.head(10)

| outputs.line_number | inputs.question | inputs.groundtruth | inputs.prediction | inputs.line_number | outputs.results |
| --- | --- | --- | --- | --- | --- |
| 3 | what is the waterproof rating of the TrailMast... | support | Intent: support | 3 | {'accuracy': 100.0} |
| 0 | what is the temperature rating of my sleeping ... | support | intent: support | 0 | {'accuracy': 100.0} |
| 1 | what is the temperature rating of the cozynigh... | support | intent: support | 1 | {'accuracy': 100.0} |
| 4 | What is your return or exchange policy? | support | intent: support | 4 | {'accuracy': 100.0} |
| 2 | what is the waterproof rating of the tent I bo... | support | intent: support | 2 | {'accuracy': 100.0} |
| 6 | I would like to return the tent I bought. It i... | support | intent: support | 6 | {'accuracy': 100.0} |
| 5 | is the jacket I bought machine washable? | support | intent: support | 5 | {'accuracy': 100.0} |
| 7 | tell me about your hiking jackets | chat | intent: chat | 7 | {'accuracy': 100.0} |
| 8 | Do you have any climbing gear? | chat | intent: chat | 8 | {'accuracy': 100.0} |
| 10 | Do you have any hiking boots? | chat | intent: chat | 10 | {'accuracy': 100.0} |


In [20]:

metrics = pf_azure_client.get_metrics(eval_run_variant)
print(json.dumps(metrics, indent=4))

{}
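In this run `get_metrics` returned an empty dictionary. When no aggregated metric is available, an overall score can still be derived from the per-line `outputs.results` values shown in the details table. The sketch below assumes each value is a dictionary with an `accuracy` key; the sample list is made up for illustration, and `mean_accuracy` is a hypothetical helper.

```python
def mean_accuracy(results):
    """Average the 'accuracy' values of per-line result dictionaries; None if there are none."""
    scores = [r["accuracy"] for r in results if isinstance(r, dict) and "accuracy" in r]
    if not scores:
        return None
    return sum(scores) / len(scores)

# Made-up per-line results; None stands for a line that produced no result.
sample_results = [{"accuracy": 100.0}, {"accuracy": 100.0}, {"accuracy": 0.0}, None]
print(mean_accuracy(sample_results))
```

With the real run, the input would be something like `details["outputs.results"].tolist()`, assuming that column holds dictionaries rather than strings.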


In [21]:
pf_azure_client.visualize([base_run, eval_run_variant])

Web View: https://ml.azure.com/prompts/flow/bulkrun/runs/outputs?wsid=/subscriptions/91d27443-f037-45d9-bb0c-428256992df6/resourceGroups/rg-aitourcontosostore/providers/Microsoft.MachineLearningServices/workspaces/contoso-store&runId=2024_01_08_184555_intent_base_run,2024_01_08_184555intent_eval_run-1
