# Evaluating an LLM Agent Routine Using Meta-Prompting

This notebook systematically **evaluates** an LLM Agent routine using a **test-driven approach**. By simulating real-world interactions between a **user and the agent**, we measure the agent's effectiveness in following its structured **system prompt** and making accurate decisions.

## Evaluation Framework

To assess the agent’s **compliance, reasoning, and accuracy**, we define a controlled testing environment where:

1. **Test Cases (CSV Input)** – A dataset containing multiple test scenarios, each with:
   - **Context**: The operational background or situation.
   - **User Request**: The problem or inquiry initiated by the user.
   - **Expected Function**: The correct function the agent should invoke.
   - **Expected Inputs**: The required parameters for proper execution.

2. **Dual LLM Simulation**
   - **User Simulator**: A simulated user that provides inputs dynamically from the test case.
   - **Agent Simulator**: The LLM agent utilizing the **structured routine** from the meta-prompt.

3. **Conversational Interaction**
   - The **User Simulator** presents queries based on the test case.
   - The **Agent Simulator** processes requests and responds according to its defined **meta-prompt routine**.
   - A simulated **dialogue** unfolds, mimicking a real-world exchange.

4. **Evaluation Metrics**
   - We verify whether the **Agent Simulator** correctly identifies and executes the **Expected Function**.
   - We check if the function was called with the **correct inputs**.
   - An **Accuracy Score** is computed as the share of test cases that pass.
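
The metrics above can be sketched in a few lines. This is a minimal illustration, not the notebook's actual implementation: the `Outcome` class and `accuracy` helper are hypothetical names, and the sketch assumes a test passes only when both the function name and the inputs match exactly.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Outcome:
    """Result of one simulated conversation (hypothetical structure)."""
    expected_function: str
    expected_inputs: dict[str, Any]
    actual_function: str
    actual_inputs: dict[str, Any]

    @property
    def is_correct(self) -> bool:
        # Assumption: a test passes only on an exact match of name and inputs.
        return (
            self.actual_function == self.expected_function
            and self.actual_inputs == self.expected_inputs
        )


def accuracy(outcomes: list[Outcome]) -> float:
    """Fraction of passed tests, between 0.0 and 1.0."""
    if not outcomes:
        return 0.0
    return sum(o.is_correct for o in outcomes) / len(outcomes)


outcomes = [
    Outcome("cancel_order", {"order_id": "55566"},
            "cancel_order", {"order_id": "55566"}),
    Outcome("get_order_status", {"order_id": "12345"},
            "escalate_to_human", {"issue_details": "Order not found"}),
]
print(f"Accuracy: {accuracy(outcomes) * 100:.2f}")  # prints "Accuracy: 50.00"
```

Exact matching is strict for free-text inputs, so a paraphrased but equivalent value counts as a failure under this scheme.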

## Objectives

- **Assess the Agent’s Compliance** with structured reasoning and policy adherence.
- **Measure Execution Accuracy** by comparing function calls against expected outputs.
- **Identify System Strengths & Weaknesses** for further refinements.

<div align="center">
  <img src="images/evaluate_ai.png" alt="Meta-prompting Evaluate" width="800">
</div>

## Getting Started

Prerequisites:

- Python 3.11 or 3.12 installed on your system.
- The LLM Agentic Tool Mesh library. If it is not installed, install it with `pip install 'llmesh[chat, agents]'`.
- The Jupyter notebook kernel, installed with `pip install ipykernel`.
- A `.env` file inside the `notebooks/data` folder that sets the `PYTHONPATH` and `OPENAI_API_KEY` environment variables.

In [1]:
import os
from notebooks.meta_prompting.utils import load_environment

load_environment()

llm_api_key = os.getenv('OPENAI_API_KEY')
if not llm_api_key:
    raise ValueError("API key is not set in environment variables.")

2025-07-18 17:59:46,465 - ATHON - INFO - Environment variables loaded from: /home/kinsy/llmesh/.env


In [2]:
from notebooks.meta_prompting.utils import get_routine

routine = get_routine()

2025-07-18 17:59:49,421 - ATHON - INFO - Data loaded from /home/kinsy/llmesh/notebooks/meta_prompting/files/output/ecommerce_agent_routine_0_cached.pkl.


## Loading the Test Dataframe

We load the test dataframe, which holds the scenarios used to evaluate the LLM Agent together with the function call each scenario is expected to produce.

In [3]:
from IPython.display import HTML
from notebooks.meta_prompting.utils import read_test_dataframe

test_dataframe = read_test_dataframe()
display(HTML(test_dataframe.to_html(escape=False)))

Unnamed: 0,Context,Request,Expected Function,Expected Inputs,Order ID,Modification Details,Reason,Refund Amount,Customer ID,Complaint Details,Issue Details,Update Type,New Value,FAQ Topic
0,I want to check the status of my order.,What is the status of order 12345?,get_order_status,"{""order_id"": ""12345""}",12345.0,,,,,,,,,
1,Can you tell me if my order has shipped?,Check the status of order 67890.,get_order_status,"{""order_id"": ""67890""}",67890.0,,,,,,,,,
2,I need to change my shipping address.,Update shipping address for order 11122.,modify_order,"{""order_id"": ""11122"", ""modification_details"": ""New Address: 123 Main St""}",11122.0,New Address: 123 Main St,,,,,,,,
3,Can I change the quantity of an item in my order?,Modify order 33344 to update item quantity.,modify_order,"{""order_id"": ""33344"", ""modification_details"": ""Increase item quantity to 3""}",33344.0,Increase item quantity to 3,,,,,,,,
4,I want to cancel my recent order.,Cancel order 55566.,cancel_order,"{""order_id"": ""55566""}",55566.0,,,,,,,,,
5,Can I cancel an order before it's shipped?,Request to cancel order 77788.,cancel_order,"{""order_id"": ""77788""}",77788.0,,,,,,,,,
6,Where is my package?,Track shipment for order 99900.,track_shipment,"{""order_id"": ""99900""}",99900.0,,,,,,,,,
7,Has my order been delivered?,Get tracking details for order 11223.,track_shipment,"{""order_id"": ""11223""}",11223.0,,,,,,,,,
8,I need to return an item.,Initiate return for order 33445 due to defect.,initiate_return,"{""order_id"": ""33445"", ""reason"": ""Defective item""}",33445.0,,Defective item,,,,,,,
9,Am I eligible for a refund?,Check refund eligibility for order 55667.,check_refund_eligibility,"{""order_id"": ""55667""}",55667.0,,,,,,,,,
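
In the table above, `Expected Inputs` is stored as a JSON string and `Order ID` is displayed as a float (for example `12345.0`). The sketch below shows one way a row could be decoded before use. The helpers `normalize_order_id` and `parse_expected_inputs` are hypothetical and are not part of `notebooks.meta_prompting.utils`; the row is a plain dictionary copied from the first test case.

```python
import json
import math
from typing import Any, Optional


def normalize_order_id(value: Any) -> Optional[str]:
    """Convert values such as 12345.0 to "12345"; return None if missing."""
    if value is None:
        return None
    if isinstance(value, float):
        if math.isnan(value):
            return None  # empty cells are typically read as NaN
        if value.is_integer():
            return str(int(value))
    return str(value)


def parse_expected_inputs(raw: str) -> dict[str, Any]:
    """Decode the JSON string stored in the Expected Inputs column."""
    return json.loads(raw)


# First test case from the table, written as a plain dictionary.
row = {
    "Request": "What is the status of order 12345?",
    "Expected Function": "get_order_status",
    "Expected Inputs": '{"order_id": "12345"}',
    "Order ID": 12345.0,
}

expected = parse_expected_inputs(row["Expected Inputs"])
order_id = normalize_order_id(row["Order ID"])
print(expected, order_id)
```

Normalizing the identifier matters because the float form can leak into the simulated conversation as `12345.0`, which no longer matches the string `"12345"` in the expected inputs.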


## Evaluating the Routine

For each test case, we **generate both the Agent and the Customer**, simulating their conversation in a controlled environment. The interaction follows a structured flow, allowing us to assess the Agent's ability to interpret requests, apply reasoning, and execute the correct function with the appropriate inputs.
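
The turn-taking described above can be sketched with two plain callables standing in for the LLM-backed simulators. Everything here is an assumption for illustration: `simulate_dialogue`, the scripted stubs, and the `close_case` function name are hypothetical, and the real `evaluate_test_case` may be structured differently.

```python
from typing import Callable, Optional

# Hypothetical signatures: each simulator reads the transcript so far.
UserTurn = Callable[[list[str]], str]
# The agent returns its reply and the name of a function it called, if any.
AgentTurn = Callable[[list[str]], tuple[str, Optional[str]]]


def simulate_dialogue(user: UserTurn, agent: AgentTurn,
                      first_request: str, max_turns: int = 10):
    """Alternate agent and user turns; return the transcript and called functions."""
    transcript = [f"User: {first_request}"]
    called: list[str] = []
    for _ in range(max_turns):
        reply, function_name = agent(transcript)
        transcript.append(f"Assistant: {reply}")
        if function_name is not None:
            called.append(function_name)
        if function_name == "close_case":  # assumed end-of-conversation signal
            break
        transcript.append(f"User: {user(transcript)}")
    return transcript, called


def scripted_user(transcript: list[str]) -> str:
    # Stub standing in for the LLM-backed customer.
    return "No, that will be all. Thank you!"


def scripted_agent(transcript: list[str]) -> tuple[str, Optional[str]]:
    # Stub standing in for the LLM-backed agent: act first, then close the case.
    agent_turns = sum(line.startswith("Assistant:") for line in transcript)
    if agent_turns == 0:
        return "Your order 55566 has been cancelled.", "cancel_order"
    return "Closing the case.", "close_case"


transcript, called = simulate_dialogue(
    scripted_user, scripted_agent, "Cancel order 55566."
)
print(called)
```

The `max_turns` cap is a design choice worth keeping in any real loop, since two LLMs can otherwise keep repeating the same exchange indefinitely.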


In [4]:
import pandas as pd
from notebooks.meta_prompting.utils import evaluate_test_case, write_result_dataframe

records = []
for row_number, test_row in test_dataframe.iterrows():
    print(f"Processing row: {row_number}")
    record = evaluate_test_case(routine, test_row)
    records.append(record)

# Save results in a file
result_dataframe = pd.DataFrame(records)
write_result_dataframe(result_dataframe)

Processing row: 0
2025-07-18 17:59:57,224 - ATHON - DEBUG - Prompt generated from /home/kinsy/llmesh/notebooks/meta_prompting/prompts/customer_persona.txt with params {'details': {'Context': 'I want to check the status of my order.', 'Request': 'What is the status of order 12345?', 'Order ID': 12345.0}}
2025-07-18 17:59:57,226 - ATHON - DEBUG - Selected Langchain ChatOpenAI
2025-07-18 17:59:57,802 - ATHON - DEBUG - Returned model 'gpt-4o-mini'
2025-07-18 17:59:57,804 - ATHON - DEBUG - Prompt generated from /home/kinsy/llmesh/notebooks/meta_prompting/prompts/agent_persona.txt with params {'routine': '1. Identify the customer\'s inquiry type by asking clarifying questions:\n   a. Politely ask the customer to specify if their inquiry is related to Order Management, Returns/Refunds/Disputes, FAQs, or Account Management.\n   b. If additional information is needed (e.g., order ID, customer ID, details of the complaint), ask the customer: "Could you please provide the necessary details to ass

In [5]:
from IPython.display import Markdown

# Print accuracy
total_accuracy = result_dataframe['Is Correct'].mean()
display(Markdown(f"## Accuracy: {round(total_accuracy * 100, 2)}"))

# Print results
result_dataframe = result_dataframe.map(lambda x: x.replace("\n", "<br>") if isinstance(x, str) else x)
display(HTML(result_dataframe.to_html(escape=False)))

## Accuracy: 44.44

Unnamed: 0,Request,Context,Expected Function,Expected Inputs,Actual Function,Actual Inputs,Is Correct,Transcripts
0,What is the status of order 12345?,I want to check the status of my order.,get_order_status,{'order_id': '12345'},escalate_to_human,{'issue_details': 'Customer ID 67890 and Order ID 12345 are not found in the system. Customer has confirmed the details.'},False,"* User: What is the status of order 12345? * Assistant: Could you please specify if your inquiry is related to Order Management, Returns/Refunds/Disputes, FAQs, or Account Management? * User: I'm inquiring about the status of my order, specifically Order ID 12345. * Assistant: * Tool: The order 12345 status: Order ID not found.. * Assistant: It seems that the order ID 12345 is not found in our system. Could you please double-check the order ID or provide any additional details that might help me assist you further? * User: The order ID is definitely 12345. Is there any other information you need to help locate it? * Assistant: Could you please provide your customer ID or any other details related to the order? This will help me assist you in locating the order status. * User: My customer ID is 67890. * Assistant: * Tool: Order history for 67890:No order history available. * Tool: The order 12345 status: Order ID not found.. * Assistant: It appears that there is no order history available for your customer ID 67890, and the order ID 12345 is still not found in our system. Could you please verify if there might be a different order ID or if the order was placed under a different customer account? Any additional information you can provide will help me assist you better. * User: I have confirmed that the order ID is 12345 and the customer ID is 67890. Is there anything else you need to help locate my order? * Assistant: It seems that we are unable to find the order with the provided details in our system. To assist you further, I can escalate this issue to a human representative for specialized attention. Would you like me to do that? * User: Yes, please escalate this issue to a human representative. 
Thank you. * Assistant: * Tool: Your issue has been escalated to a human agent: Customer ID 67890 and Order ID 12345 are not found in the system. Customer has confirmed the details.. An agent will contact you shortly. * Assistant: Your issue has been successfully escalated to a human agent. They will review the details regarding Customer ID 67890 and Order ID 12345 and will contact you shortly. Is there anything more I can assist you with today? * User: No, that will be all. Thank you for your assistance! * Assistant: * Tool: Close case for order: 12345"
1,Check the status of order 67890.,Can you tell me if my order has shipped?,get_order_status,{'order_id': '67890'},escalate_to_human,{'issue_details': 'Customer is unable to find order ID 67890 in the system and has no order history associated with the email example@email.com.'},False,"* User: Check the status of order 67890. * Assistant: Could you please confirm if your inquiry is related to Order Management? * User: Yes, I would like to check the status of my order, which is Order ID 67890. * Assistant: * Tool: The order 67890 status: Order ID not found.. * Assistant: It seems that the order ID 67890 is not found in our system. Could you please double-check the order ID or provide any additional details that might help me assist you further? * User: The order ID is definitely 67890. It was placed recently, and I would like to confirm its status. * Assistant: Could you please provide your customer ID so I can look into this issue further? * User: I don't have a customer ID readily available. However, I can provide my email address associated with the order if that helps. Would you like that? * Assistant: Yes, please provide the email address associated with the order, and I'll do my best to assist you further. * User: My email address associated with the order is example@email.com. * Assistant: * Tool: Order history for example@email.com:No order history available. * Assistant: It appears that there is no order history available for the email address example@email.com. This could indicate that the order was not successfully placed or that there is an issue with the account. Would you like me to escalate this issue to a human representative for further assistance? * User: Yes, please escalate this issue to a human representative for further assistance. Thank you. * Assistant: * Tool: Your issue has been escalated to a human agent: Customer is unable to find order ID 67890 in the system and has no order history associated with the email example@email.com.. 
An agent will contact you shortly. * Assistant: Your issue has been successfully escalated to a human agent, and they will contact you shortly for further assistance. Is there anything more I can assist you with today? * User: No, that will be all. Thank you for your help! * Assistant: * Tool: Close case for order: 67890"
2,Update shipping address for order 11122.,I need to change my shipping address.,modify_order,"{'order_id': '11122', 'modification_details': 'New Address: 123 Main St'}",modify_order,"{'order_id': '11122', 'modification_details': 'Update shipping address to 123 Main St.'}",False,"* User: Update shipping address for order 11122. * Assistant: Could you please confirm if the order is still in the processing stage? Additionally, please provide the new shipping address details so I can assist you further. * User: The order is still in the processing stage. The new shipping address is 123 Main St. * Assistant: * Tool: Order 11122 modification successful: Update shipping address to 123 Main St.. * Assistant: The shipping address for order 11122 has been successfully updated to 123 Main St. Is there anything more I can assist you with today? * User: No, that will be all. Thank you for your help! * Assistant: * Tool: Close case for order: 11122"
3,Modify order 33344 to update item quantity.,Can I change the quantity of an item in my order?,modify_order,"{'order_id': '33344', 'modification_details': 'Increase item quantity to 3'}",get_order_status,{'order_id': '33344.0'},False,"* User: Modify order 33344 to update item quantity. * Assistant: Could you please confirm if the order is still in the processing stage? Additionally, please provide the updated item quantity details for order ID 33344. * User: I would like to increase the item quantity to 3 for order ID 33344. * Assistant: I will check if order ID 33344 is still in the processing stage before proceeding with the modification. Please hold on for a moment. I'll need to call the `get_order_status` function to check the status of the order. Let's proceed. * Tool: The order 33344 status: Order ID not found.. * Assistant: It seems that order ID 33344 is not found in our system. Could you please double-check the order ID and provide the correct one? * User: The order ID is 33344.0. Please proceed with modifying the item quantity to 3. * Assistant: It appears that the order ID format you provided (33344.0) may not be valid. Could you please confirm if there are any additional digits or if there is a different format for the order ID? Once I have the correct order ID, I can proceed with the modification. * User: The order ID is definitely 33344.0. Please proceed with modifying the item quantity to 3 for this order. * Assistant: I will check if order ID 33344.0 is still in the processing stage before proceeding with the modification. Please hold on for a moment. Let's check the order status again. * Tool: The order 33344.0 status: Order ID not found.. * Assistant: It seems that order ID 33344.0 is also not found in our system. Unfortunately, I am unable to proceed with the modification without a valid order ID. Could you please verify the order ID once more or provide any additional details that might help me locate the order? 
* User: I understand that the order ID is 33344.0, and I would like to increase the item quantity to 3. Please proceed with the modification. * Assistant: Since the order ID 33344.0 is still not found in our system, I am unable to modify the order as requested. To assist you further, could you please provide any additional details about the order or confirm if there is another order ID you might have? Alternatively, if you have any other inquiries, feel free to let me know! * User: I understand that you would like to increase the item quantity to 3 for order ID 33344.0. However, I am unable to locate this order in our system. If you have any other details or a different order ID, please provide that information so I can assist you further. * Assistant: Could you please specify if your inquiry is related to Order Management, Returns/Refunds/Disputes, FAQs, or Account Management? This will help me assist you better. * User: I would like to increase the item quantity to 3 for order ID 33344.0. Please proceed with the modification. * Assistant: I understand that you would like to modify the item quantity to 3 for order ID 33344.0. However, I am still unable to locate this order in our system. To assist you further, could you please provide any additional details about the order or confirm if there is another order ID you might have? Alternatively, if you have any other inquiries, feel free to let me know! * User: I apologize for the inconvenience, but I am unable to locate order ID 33344.0 in our system. Unfortunately, without a valid order ID, I cannot proceed with the modification to increase the item quantity to 3. If you have any other details or a different order ID, please provide that information so I can assist you further. Thank you for your understanding. * Assistant: Is there anything more I can assist you with today? * User: I would like to increase the item quantity to 3 for order ID 33344.0. Please proceed with the modification."
4,Cancel order 55566.,I want to cancel my recent order.,cancel_order,{'order_id': '55566'},cancel_order,{'order_id': '55566'},True,"* User: Cancel order 55566. * Assistant: Could you please confirm if the order 55566 is still in the processing stage and has not yet been shipped? * User: Yes, the order 55566 is still in the processing stage and has not yet been shipped. * Assistant: * Tool: Order 55566 has been successfully cancelled. * Assistant: Your order 55566 has been successfully cancelled. Is there anything more I can assist you with today? * User: No, that will be all. Thank you! * Assistant: * Tool: Close case for order: 55566"
5,Request to cancel order 77788.,Can I cancel an order before it's shipped?,cancel_order,{'order_id': '77788'},cancel_order,{'order_id': '77788'},True,"* User: Request to cancel order 77788. * Assistant: Could you please confirm that the order is still in the processing stage and has not yet been shipped? * User: Yes, the order is still in the processing stage and has not yet been shipped. * Assistant: * Tool: Order 77788 cannot be cancelled.Current status: Shipped * Assistant: It appears that order 77788 has already been shipped and cannot be canceled at this time. Is there anything else I can assist you with today? * User: I understand. Thank you for your help. * Assistant: * Tool: Close case for order: 77788"
6,Track shipment for order 99900.,Where is my package?,track_shipment,{'order_id': '99900'},track_shipment,{'order_id': '99900'},True,"* User: Track shipment for order 99900. * Assistant: Could you please confirm if your inquiry is related to Order Management? If so, I will proceed to track the shipment for order ID 99900. * User: Yes, I would like to track the shipment for order ID 99900. * Assistant: * Tool: Order 99900 tracking status:In transit - Expected delivery in 2 days * Assistant: The tracking status for order ID 99900 is currently ""In transit"" and is expected to be delivered in 2 days. Is there anything more I can assist you with today? * User: No, that will be all. Thank you for your help! * Assistant: * Tool: Close case for order: 99900"
7,Get tracking details for order 11223.,Has my order been delivered?,track_shipment,{'order_id': '11223'},track_shipment,{'order_id': '11223'},True,"* User: Get tracking details for order 11223. * Assistant: Could you please confirm if your inquiry is related to Order Management? If so, I will proceed to get the tracking details for your order. * User: Yes, I am inquiring about Order Management for order 11223. Please provide the tracking details. * Assistant: * Tool: Order 11223 tracking status:Delivered - Signed by recipient * Assistant: The tracking status for order 11223 is: **Delivered** - Signed by recipient. Is there anything more I can assist you with today? * User: No, that will be all. Thank you for your help! * Assistant: * Tool: Close case for order: 11223"
8,Initiate return for order 33445 due to defect.,I need to return an item.,initiate_return,"{'order_id': '33445', 'reason': 'Defective item'}",initiate_return,"{'order_id': '33445', 'reason': 'The item is defective.'}",False,"* User: Initiate return for order 33445 due to defect. * Assistant: Could you please confirm the reason for the return? You mentioned it is due to a defect, but if you have any specific details about the defect, that would be helpful. Once I have that information, I can proceed with initiating the return for order 33445. * User: The reason for the return is that the item is defective. I would like to proceed with the return for order 33445. * Assistant: * Tool: Return for order 33445 has been initiated for the following reason: The item is defective.. * Assistant: The return for order 33445 has been successfully initiated due to the item being defective. Is there anything more I can assist you with today? * User: No, that will be all. Thank you for your help! * Assistant: * Tool: Close case for order: 33445"
9,Check refund eligibility for order 55667.,Am I eligible for a refund?,check_refund_eligibility,{'order_id': '55667'},check_refund_eligibility,{'order_id': '55667'},True,"* User: Check refund eligibility for order 55667. * Assistant: Could you please specify if your inquiry is related to Returns/Refunds/Disputes? * User: My inquiry is related to a refund for order 55667. * Assistant: * Tool: Order 55667 refund status:Eligible for refund * Assistant: The refund status for order 55667 is eligible for a refund. Is there anything more I can assist you with today? * User: No, that will be all. Thank you for your help! * Assistant: * Tool: Close case for order: 55667"
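
The failures in the table above fall into two groups: the agent called a different function, or it called the expected function with differently worded inputs. A minimal sketch of how results could be bucketed this way follows. The `classify` helper is hypothetical, and the three sample tuples are copied from rows 2, 3, and 4 of the table.

```python
from collections import Counter
from typing import Any


def classify(expected_fn: str, expected_in: dict[str, Any],
             actual_fn: str, actual_in: dict[str, Any]) -> str:
    """Label a result as a pass, a wrong function, or wrong inputs."""
    if actual_fn != expected_fn:
        return "wrong_function"
    if actual_in != expected_in:
        return "wrong_inputs"
    return "pass"


# (expected function, expected inputs, actual function, actual inputs)
rows = [
    ("modify_order",
     {"order_id": "11122", "modification_details": "New Address: 123 Main St"},
     "modify_order",
     {"order_id": "11122",
      "modification_details": "Update shipping address to 123 Main St."}),
    ("modify_order",
     {"order_id": "33344", "modification_details": "Increase item quantity to 3"},
     "get_order_status",
     {"order_id": "33344.0"}),
    ("cancel_order", {"order_id": "55566"},
     "cancel_order", {"order_id": "55566"}),
]

breakdown = Counter(classify(*row) for row in rows)
print(dict(breakdown))
```

Separating the two failure types helps decide what to refine: a wrong function points at the routine itself, while wrong inputs on free-text fields may point at how strictly the inputs are compared.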


## Next Steps

In the next notebook ([**06: Improving the Meta-Prompt**](06_Improving_Meta_Prompt.ipynb)), we will improve the assistant's routine by re-creating it with the evaluation results and repeating the loop.