# Execute batch groundedness evaluation flow using Promptflow Python SDK

### Overview

Prompt flow is a suite of development tools designed to streamline the end-to-end development cycle of LLM-based AI applications, from ideation, prototyping, testing, evaluation to production deployment and monitoring. It makes prompt engineering much easier and enables you to build LLM apps with production quality.

In this hands-on lab, you will be able to:

-   Evaluate your flows and calculate quality and performance metrics with run result datasets.
-   Debug and iterate on your flows, especially by tracing interactions with LLMs.

To calculate other metrics such as accuracy or relevance score, refer to [Develop evaluation flow](https://microsoft.github.io/promptflow/how-to-guides/develop-a-dag-flow/develop-evaluation-flow.html) to learn how to develop an evaluation flow.

#### 1. Create Promptflow client with Credential and configuration

#### 2. AI Studio batch run to get the base run data

#### 3. Run Groundedness Evaluation of the Promptflow

[Note] Please use `Python 3.10 - SDK v2 (azureml_py310_sdkv2)` conda environment.


In [1]:
%load_ext autoreload
%autoreload 2

import os, sys
lab_prep_dir = os.getcwd().split("slm-innovator-lab")[0] + "slm-innovator-lab/0_lab_preparation"
sys.path.append(os.path.abspath(lab_prep_dir))

from common import check_kernel
check_kernel()

Kernel: python31014jvsc74a57bd02139c70ac98f3202d028164a545621647e07f47fd6f5d8ac55cf952bf7c15ed1


In [2]:
import json
import os
import time

# Import required libraries
from promptflow.azure import PFClient
from promptflow.entities import Run
from azure.identity import DefaultAzureCredential, EnvironmentCredential, InteractiveBrowserCredential
from dotenv import load_dotenv
from azure.core.exceptions import HttpResponseError

load_dotenv("../../.env")

with open('../3_2_prototyping/config.json', 'r') as f:
    config = json.load(f)
    
print(config["subscription_id"])
print(config["resource_group"])
print(config["workspace_name"]) # Azure AI Studio project name which is not the same as the Azure ML workspace name


49aee8bf-3f02-464f-a0ba-e3467e7d85e2
hubestus1grp
lgestus1


In [3]:
from tqdm import tqdm

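# Monitor the status of the run_result
def monitor_status(pf_azure_client: PFClient, run_result: Run):
    # "Canceled" is assumed to be a terminal status as well, so a canceled run does not poll forever
    terminal_states = ("Completed", "Failed", "Canceled")
    with tqdm(total=3, desc="Running Status", unit="step") as pbar:
        status = pf_azure_client.runs.get(run_result).status
        if status == "Preparing":
            pbar.update(1)
        while status not in terminal_states:
            if status == "Running" and pbar.n < 2:
                pbar.update(1)
            print(f"Current Status: {status}")
            time.sleep(10)
            status = pf_azure_client.runs.get(run_result).status
        pbar.update(1)
        if status == "Completed":
            print("Promptflow Running Completed")
        else:
            # Do not report success when the run ended in another terminal state
            print(f"Promptflow run ended with status: {status}")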
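
`monitor_status` polls every 10 seconds with no upper bound on the waiting time. As a minimal sketch of the same polling idea with a timeout, the hypothetical helper below takes any zero-argument callable that returns the current status. The helper name, the default timeout, and the `"Canceled"` state are assumptions for illustration; only `"Completed"` and `"Failed"` appear in the cell above.

```python
import time


def wait_for_terminal_status(
    get_status,
    terminal_states=("Completed", "Failed", "Canceled"),  # "Canceled" is an assumption
    poll_seconds=10,
    timeout_seconds=1800,  # arbitrary illustrative default
    sleep=time.sleep,
):
    """Poll get_status() until it returns a terminal state or the timeout elapses."""
    waited = 0
    status = get_status()
    while status not in terminal_states:
        if waited >= timeout_seconds:
            raise TimeoutError(f"Run still in status {status!r} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
        status = get_status()
    return status
```

With the client from this notebook, the callable could be `lambda: pf_azure_client.runs.get(run_result).status`. Passing `sleep` as a parameter keeps the helper easy to exercise without real waiting.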

## 1. Create Promptflow client with Credential and configuration

-   Create a Promptflow client with the credential and configuration. The `config.json` file must contain `subscription_id`, `resource_group`, and `workspace_name`.
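
Before creating the client, it can help to fail early when `config.json` is incomplete. The following is a rough sketch: the three key names are the ones this notebook prints, while the helper name `load_workspace_config` and its error handling are assumptions for illustration and are not part of the Promptflow SDK.

```python
import json
from pathlib import Path

# Keys this notebook reads from config.json
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def load_workspace_config(path):
    """Hypothetical helper: load config.json and raise if a required key is missing or empty."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise KeyError(f"config.json is missing required keys: {', '.join(missing)}")
    return config
```

In this notebook it would be called as `config = load_workspace_config("../3_2_prototyping/config.json")`.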


In [4]:
try:
    credential = DefaultAzureCredential()
    # Check if given credential can get token successfully.
    credential.get_token("https://management.azure.com/.default")
except Exception as ex:
    # Fall back to InteractiveBrowserCredential in case DefaultAzureCredential does not work
    credential = InteractiveBrowserCredential()
# If you cannot use DefaultAzureCredential or InteractiveBrowserCredential, you need to set up the managed identity in your .env file

pf_azure_client = PFClient.from_config(credential=credential, path="../3_2_prototyping/config.json")

try:
    workspace = pf_azure_client.ml_client.workspaces.get(name=config["workspace_name"])
    print(f"Connected to Azure AI Studio Workspace: {workspace.name}")
    print(f"Workspace Location: {workspace.location}")
    print(f"Workspace ID: {workspace.id}")
except HttpResponseError as e:
    print(f"Failed to connect to Azure ML Workspace: {e}")


Found the config file in: ../3_2_prototyping/config.json


Connected to Azure AI Studio Workspace: lgestus1
Workspace Location: eastus
Workspace ID: /subscriptions/49aee8bf-3f02-464f-a0ba-e3467e7d85e2/resourceGroups/hubestus1grp/providers/Microsoft.MachineLearningServices/workspaces/lgestus1


## 2. AI Studio batch run to get the base run data


## Check the existing connections

-   Currently, connections can only be created in the Azure AI Studio or Azure ML Studio UI. Check the existing connections in the workspace.
    > ✨ **_important_** <br>
    > Update the flow.dag.yaml files in your flow_path with the names of the connections you created in the Azure ML Studio UI.


In [5]:
from jinja2 import Environment, FileSystemLoader
from pathlib import Path

env = Environment(loader=FileSystemLoader('.'))
# Read the template file
template = env.get_template('./flow-template/chat-serverless.flow.dag.yaml')

# Define the variables for the template with your connection names for chat serverless 
variables = {
	"your_phi35_serverless_connection_name": "lgph35-1",
	"your_gpt4o_connection_name": "aoai-tst1"
}

rendered_content = template.render(variables)
Path('../3_2_prototyping/chat-serverless/flow.dag.yaml').write_text(rendered_content)

print(Path('../3_2_prototyping/chat-serverless/flow.dag.yaml').read_text()) 

$schema: https://azuremlschemas.azureedge.net/promptflow/latest/Flow.schema.json
environment:
  python_requirements_txt: requirements.txt
inputs:
  question:
    type: string
    is_chat_input: true
    default: What is the capital of France?
  context:
    type: string
    is_chat_input: false
    default: TrailMaster X4 Tent is a durable polyester tent
outputs:
  phi35_answer:
    type: string
    reference: ${phi35.output}
    is_chat_output: false
  gpt4o_answer:
    type: string
    reference: ${gpt4o.output}
    is_chat_output: true
nodes:
- name: phi35
  type: python
  source:
    type: code
    path: phi35_chatcompletion.py
  inputs:
    connection: lgph35-1
    question: ${inputs.question}
    context: ${inputs.context}
- name: gpt4o
  type: llm
  source:
    type: code
    path: chat.jinja2
  inputs:
    deployment_name: gpt-4o
    temperature: 0.2
    top_p: 1
    max_tokens: 512
    question: ${inputs.question}
  connection: aoai-tst1
  api: chat
  module: promptflow.tools.
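
With its default settings, Jinja2 renders an undefined variable as an empty string, so a misspelled key in `variables` can silently produce an empty connection name in the rendered YAML. The sketch below checks a template for placeholders that have no value. It assumes the template files use plain `{{ name }}` expressions (their contents are not shown in this notebook), and both helper names are hypothetical.

```python
import re

# Matches simple Jinja2-style expressions such as {{ your_gpt4o_connection_name }}.
# Flow references like ${inputs.question} are intentionally not matched.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def find_placeholders(template_text):
    """Return the sorted variable names referenced as {{ name }} in template_text."""
    return sorted(set(PLACEHOLDER.findall(template_text)))


def missing_variables(template_text, variables):
    """Return placeholder names that have no entry in the variables dict."""
    return [name for name in find_placeholders(template_text) if name not in variables]
```

Calling `missing_variables(template_source, variables)` on the raw template text before `template.render(variables)` would surface such typos; an empty list means every placeholder has a value.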

In [6]:
from jinja2 import Environment, FileSystemLoader
from pathlib import Path

env = Environment(loader=FileSystemLoader('.'))

# Read the template file
template = env.get_template('./flow-template/evaluation.flow.dag.yaml')

# Define the variables for the template with your connection name for the evaluation flow
variables = {
	"your_gpt4o_connection_name": "aoai-tst1"
}

rendered_content = template.render(variables)
Path('./evaluation/flow.dag.yaml').write_text(rendered_content)

print(Path('./evaluation/flow.dag.yaml').read_text()) 

$schema: https://azuremlschemas.azureedge.net/promptflow/latest/Flow.schema.json
environment:
  python_requirements_txt: requirements.txt
inputs:
  question:
    type: string
    default: What is TrailMaster X4 Tent?
  context:
    type: string
    default:
      TrailMaster X4 Tent is a durable polyester tent for four, with water-resistant construction, 
      multiple doors, interior pockets, and reflective guy lines. Summit Breeze Jacket is lightweight, windproof, 
      and water-resistant hiking jacket with breathable polyester material, adjustable cuffs, 
      and secure zippered pockets. TrekReady Hiking Boots is durable leather boots 
      with reinforced stitching, toe protection, cushioned insoles, and breathable materials for comfort. 
      BaseCamp Folding Table is lightweight aluminum table, 48 x 24 inches, foldable design. 
      EcoFire Camping Stove is portable stainless steel stove, lightweight, fuel-efficient, and easy to use.
  answer:
    type: string
    default

In [7]:
flow_path = "../3_2_prototyping/chat-serverless"
data_path = "../3_2_prototyping/data/questions_outdoor.jsonl"

# Get the context from the context_simple.json file as a str and pass it through column_mapping as a constant
with open('../3_2_prototyping/data/context_simple.json', 'r') as file:
    context = json.load(file)

column_mapping = {
    "question": "${data.question}",
    "context": context.get("context")    
}

base_run = pf_azure_client.run(
    flow=flow_path,
    type="chat",
    data=data_path, 
    column_mapping=column_mapping,
    display_name="chat_serverless_context_data",
    tags={"chat_serverless_context_jsonl": "", "1st_round": ""},
)

Uploading chat-serverless (0.0 MBs): 100%|██████████| 4115/4115 [00:00<00:00, 73007.52i

Portal url: https://ai.azure.com/projectflows/trace/run/chat_serverless_variant_0_20241126_001937_448496/details?wsid=/subscriptions/49aee8bf-3f02-464f-a0ba-e3467e7d85e2/resourcegroups/hubestus1grp/providers/Microsoft.MachineLearningServices/workspaces/lgestus1
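
The `column_mapping` in this cell mixes two kinds of values: `${data.question}` refers to a column of each JSONL row, while the context string is a constant applied to every row. The sketch below is a simplified model of that behavior for illustration only. It is not Promptflow's actual resolution logic, it ignores `${run.outputs.*}` references, and the helper name `resolve_row` is hypothetical.

```python
import re

# ${data.<column>} refers to a column of the current input row
DATA_REF = re.compile(r"^\$\{data\.([A-Za-z_][A-Za-z0-9_]*)\}$")


def resolve_row(column_mapping, data_row):
    """Illustrative only: build the flow inputs for one data row."""
    resolved = {}
    for flow_input, value in column_mapping.items():
        match = DATA_REF.match(value) if isinstance(value, str) else None
        # References are looked up in the row; anything else passes through as a constant
        resolved[flow_input] = data_row[match.group(1)] if match else value
    return resolved


row = {"question": "Do you have any climbing gear?"}
mapping = {
    "question": "${data.question}",
    "context": "TrailMaster X4 Tent is a durable polyester tent",
}
flow_inputs = resolve_row(mapping, row)
```

Under this model, every row receives the same `context` value while `question` changes per row.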


In [8]:
monitor_status(pf_azure_client, base_run)

Running Status:  33%|███▎      | 1/3 [00:00<00:01,  1.23step/s]

Promptflow Running Completed





In [9]:
detail = pf_azure_client.get_details(base_run)

detail

| outputs.line_number | inputs.question | inputs.context | inputs.line_number | outputs.phi35_answer | outputs.gpt4o_answer |
| --- | --- | --- | --- | --- | --- |
| 0 | tell me about your TrailMaster X4 Tent | TrailMaster X4 Tent is a durable polyester ten... | 0 |  | The TrailMaster X4 Tent is a spacious and dura... |
| 1 | Do you have any climbing gear? | TrailMaster X4 Tent is a durable polyester ten... | 1 |  | I don't have any climbing gear myself, but I c... |
| 2 | Can you tell me about your selection of tents? | TrailMaster X4 Tent is a durable polyester ten... | 2 |  | Sure thing! While I don't have a physical sele... |
| 3 | Do you have TrekReady Hiking Boots? How much i... | TrailMaster X4 Tent is a durable polyester ten... | 3 |  | I don't have a stockroom to check, but TrekRea... |


## 3. Run Groundedness Evaluation of the Promptflow

The groundedness evaluation flow measures how well the model's predicted answers are grounded in the context. Even if an LLM's response is true, it is considered ungrounded if it cannot be verified against the context.

> 🧪 **_For Your Information_** <br>
> **Groundedness** is a measure of how well the model's responses are grounded in the context. A grounded response is one that is directly supported by the context. For example, if the context is about a dog, a grounded response would be "Dogs are mammals." An ungrounded response would be "Dogs can fly."


In [10]:
import datetime

eval_groundedness_flow_path = "./evaluation/"
data_path = "./data/qna_outdoor.jsonl"

with open('../3_2_prototyping/data/context_simple.json', 'r') as file:
    context = json.load(file)

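# ${run.outputs.*} refers to the outputs of base_run. The detail tables in this notebook
# show them paired with the rows of data_path by line number, so both datasets should
# list the questions in the same order.
column_mapping = {
    "question": "${data.question}",
    "context": context.get("context"),
    "answer": "${run.outputs.gpt4o_answer}",
}
# Append a timestamp so that each evaluation run gets a unique name
timestamp = datetime.datetime.now().strftime("%m_%d_%H%M")
eval_name = f"eval_groundedness_{timestamp}"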

eval_groundedness_result = pf_azure_client.run(
    flow=eval_groundedness_flow_path,
    data=data_path,
    run=base_run,  # evaluate the outputs of the base run
    column_mapping=column_mapping,
    display_name=eval_name,
    name=eval_name,
)



# pf_azure_client.stream(eval_groundedness_result)

Uploading evaluation (0.01 MBs): 100%|██████████| 6936/6936 [00:00<00:00, 84094.38i

Portal url: https://ai.azure.com/projectflows/trace/run/eval_groundedness_11_26_0023/details?wsid=/subscriptions/49aee8bf-3f02-464f-a0ba-e3467e7d85e2/resourcegroups/hubestus1grp/providers/Microsoft.MachineLearningServices/workspaces/lgestus1


In [11]:
monitor_status(pf_azure_client, eval_groundedness_result)

Running Status:  33%|███▎      | 1/3 [05:01<10:02, 301.41s/step]

Promptflow Running Completed





In [12]:
detail = pf_azure_client.get_details(eval_groundedness_result)

detail

| outputs.line_number | inputs.question | inputs.context | inputs.answer | inputs.line_number | outputs.gpt_groundedness |
| --- | --- | --- | --- | --- | --- |
| 0 | Can you tell me about your selection of tents? | TrailMaster X4 Tent is a durable polyester ten... | The TrailMaster X4 Tent is a spacious and dura... | 0 | 1.0 |
| 1 | can you tell me BaseCamp Folding Table? | TrailMaster X4 Tent is a durable polyester ten... | I don't have any climbing gear myself, but I c... | 1 | 1.0 |
| 2 | Do you have any climbing gear? | TrailMaster X4 Tent is a durable polyester ten... | Sure thing! While I don't have a physical sele... | 2 | 1.0 |
| 3 | Do you have TrekReady Hiking Boots? How much i... | TrailMaster X4 Tent is a durable polyester ten... | I don't have a stockroom to check, but TrekRea... | 3 | 1.0 |
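
A quick sanity check on the evaluation details can be sketched as follows. The values are copied from the two detail tables in this notebook, with truncated strings kept as displayed, and the helper name `mismatched_lines` is hypothetical. The check is worthwhile here because the tables show `inputs.answer` following the base run's line numbers, while `inputs.question` comes from `qna_outdoor.jsonl`.

```python
from statistics import mean


def mismatched_lines(base_questions, eval_questions):
    """Return the line numbers where the two runs saw different questions."""
    return [
        line
        for line, (base_q, eval_q) in enumerate(zip(base_questions, eval_questions))
        if base_q.strip().lower() != eval_q.strip().lower()
    ]


# inputs.question of the base run, by line number
base_questions = [
    "tell me about your TrailMaster X4 Tent",
    "Do you have any climbing gear?",
    "Can you tell me about your selection of tents?",
    "Do you have TrekReady Hiking Boots? How much i...",
]
# inputs.question of the evaluation run, by line number
eval_questions = [
    "Can you tell me about your selection of tents?",
    "can you tell me BaseCamp Folding Table?",
    "Do you have any climbing gear?",
    "Do you have TrekReady Hiking Boots? How much i...",
]
# outputs.gpt_groundedness of the evaluation run
scores = [1.0, 1.0, 1.0, 1.0]

mean_groundedness = mean(scores)
different = mismatched_lines(base_questions, eval_questions)
print("mean gpt_groundedness:", mean_groundedness)
print("lines with different questions:", different)
```

With these values the mean score is 1.0, and lines 0, 1, and 2 carry different questions in the two runs, which suggests that the two JSONL files list their questions in a different order. With the real DataFrames, the lists could presumably be taken from `detail["inputs.question"].tolist()`.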
