# Prompt Workflow Test - Experiment V2

This notebook demonstrates the complete prompt workflow using HTTP requests.

**CRITICAL**: This test creates logs with VARIABLES in the input field (not messages!)
because prompt workflows need variables to render prompts.

## Test Coverage:
1. Create and deploy a prompt with Jinja2 template
2. Create logs with variables using the deployed prompt
3. Create dataset from those logs
4. Create experiment with prompt workflow
5. Verify workflow execution
6. Verify span tree structure (root + workflow + evaluator spans)


## Configuration

**Before running this notebook**, set your API key using one of these methods:

### Option 1: Create a `.env` file (Recommended)
Create a `.env` file in the project root with:
```
KEYWORDSAI_API_KEY=your-api-key-here
```

### Option 2: Export environment variable
```bash
export KEYWORDSAI_API_KEY="your-api-key-here"
```

The cell below will load the API key from the environment/`.env` file and validate it.


In [17]:
import os
from dotenv import load_dotenv

# Load environment variables from .env file
loaded = load_dotenv(override=True)

# API Configuration
BASE_URL = os.getenv("KEYWORDSAI_BASE_URL")
API_KEY = os.getenv("KEYWORDSAI_API_KEY")

if not API_KEY:
    raise ValueError(
        "‚ùå KEYWORDSAI_API_KEY environment variable not set!\n"
        "Make sure you have a .env file with: KEYWORDSAI_API_KEY=your-api-key-here\n"
        "Or set it with: export KEYWORDSAI_API_KEY='your-api-key-here'"
    )

print(f"Loaded .env from {os.getcwd()}/.env: {loaded}")
print(f"‚úÖ API Key loaded: {API_KEY[:8]}{'*' * 20}")


Loaded .env from /Users/huangyunrui/platform/keywordsai-example-projects/example_workflows/src/example_workflows/experiment_v2_workflows/.env: True
‚úÖ API Key loaded: s1SiVa3y********************


## Imports and Utility Functions


In [18]:
import json
import time
import random
import requests
from typing import Dict, Any, List, Optional
from datetime import datetime, timedelta

def print_step(step_number: int, title: str):
    """Print a formatted step header."""
    print("\n" + "=" * 70)
    print(f"STEP {step_number}: {title}")
    print("=" * 70)

def print_success(message: str):
    """Print a success message."""
    print(f"‚úÖ {message}")

def print_warning(message: str):
    """Print a warning message."""
    print(f"‚ö†Ô∏è  {message}")

def print_error(message: str):
    """Print an error message."""
    print(f"‚ùå {message}")

def print_info(message: str):
    """Print an info message."""
    print(f"‚ÑπÔ∏è  {message}")

def wait_for_processing(seconds: int = 15):
    """Wait for async processing to complete."""
    print(f"\n‚è≥ Waiting {seconds} seconds for processing...")
    time.sleep(seconds)
    print("‚úì Wait complete")


## Initialize Variables


In [19]:
# Track resources created throughout the workflow
prompt_id = None
log_ids = []
dataset_id = None
evaluator_slug = None
experiment_id = None
count = 0
success = False

# Headers for API requests
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}


## Step 0: Create and Deploy Prompt

Create a prompt with a Jinja2 template and deploy it.

**Key Insight**: To deploy a prompt version:
1. Create version 1 (draft, readonly=False)
2. Create version 2 (this makes version 1 readonly=True)
3. Deploy version 1 (only readonly versions can be deployed)


In [20]:
print_step(0, "Create and Deploy Prompt")

# Step 0a: Create the prompt
print("\nüìù Creating prompt...")
prompt_data = {
    "name": "Prompt workflow test",
    "description": "Prompt for testing prompt workflow experiments"
}

response = requests.post(
    f"{BASE_URL}/prompts/",
    headers=headers,
    json=prompt_data
)
response.raise_for_status()

prompt_result = response.json()
prompt_id = prompt_result.get('id')
print_success(f"Prompt created with ID: {prompt_id}")



STEP 0: Create and Deploy Prompt

üìù Creating prompt...
‚úÖ Prompt created with ID: 1a716b85c23d453d976054509b5426a1


In [21]:
# Step 0b: Create version 1 with Jinja2 template
print("\nüìù Creating prompt version 1...")

system_content = """{# This is a comment showing personal information #}

Height: {{ height }}v

Weight: {{ weight | float }}  {# Filter example #}

Hobby: {{ hobby | title }}    {# Makes first letter uppercase #}

Name: {{ name | default('Anonymous') }}  {# Default value if name is undefined #}

First Name: {{ full_name.first_name }}

Last Name: {{ full_name.last_name }}

Full Name: {{ full_name.first_name ~ ' ' ~ full_name.last_name }}  {# String concatenation #}

Age: {{ age | int }}         {# Type conversion #}



{# Math operations #}

Years until retirement: {{ 65 - age }}

Monthly salary in USD: {{ salary | round(2) }}



The company position is shown below:

{% if name == 'Raymond' %}

    He is the Co-founder & CTO

{% elif name == 'Andy' %}

    He is the Co-founder & CEO

{% elif name == 'Hendrix' %}

    He is the Co-founder & CPO

{% else %}

    He is not a member of Keywords AI

{% endif %}



{# Using set to create variables #}

{% set years_of_experience = 5 %}

Experience: {{ years_of_experience }} years



He has some items in his backpack, they are listed below:



{# Loop with counter and conditions #}

{% for item in backpack_items %}

    {{ loop.index }}. {{ item }}

    {% if loop.first %}(This is the first item){% endif %}

    {% if loop.last %}(This is the last item){% endif %}

    {% if not loop.last %}, {% endif %}

{% else %}

    No items in backpack!

{% endfor %}



{# Dictionary iteration #}

Skills:

{% for skill, level in skills.items() %}

    - {{ skill }}: {{ level }}/10

{% endfor %}



{# Using macros (reusable template snippets) #}

{% macro render_achievement(title, year) %}

    {{ year }}: {{ title }}

{% endmacro %}



Achievements:

{{ render_achievement('Graduated', 2020) }}

{{ render_achievement('Started Company', 2022) }}



{# Working with dates #}

Current Date: {{ now }}

Formatted Date: {{ now }}



{# String operations #}

Email Domain: {{ email.split('@')[1] }}

Username: {{ email | replace('@keywords.ai', '') }}



{# Conditional assignment #}

Experience Level: {{ 'Senior' if years_of_experience >= 5 else 'Junior' }}



{# Length filter #}

Number of Skills: {{ skills | length }}



{# Join filter for arrays #}

All Skills: {{ skills.keys() | join(', ') }}



{# Using namespace for organization #}

{% set ns = namespace(total_score=0) %}

{% for score in scores %}

    {% set ns.total_score = ns.total_score + score %}

{% endfor %}

Average Score: {{ (ns.total_score / scores|length) | round(2) }}"""

version_data = {
    "description": "Test version with Jinja2 template for prompt workflow",
    "messages": [
        {
            "role": "system",
            "content": system_content
        },
        {
            "role": "user",
            "content": "How many things are there in the backpack?"
        }
    ],
    "model": "gpt-4o-mini",
    "stream": False,
    "temperature": 0.7,
    "max_tokens": 256,
    "variables": {
        "name": "string",
        "full_name": "object",
        "age": "number",
        "height": "string",
        "weight": "number",
        "hobby": "string",
        "salary": "number",
        "email": "string",
        "now": "string",
        "backpack_items": "array",
        "skills": "object",
        "scores": "array",
        "customer_inquiry": "string"
    },
    "readonly": True
}

response = requests.post(
    f"{BASE_URL}/prompts/{prompt_id}/versions/",
    headers=headers,
    json=version_data
)
response.raise_for_status()

version_result = response.json()
version_number = version_result.get('version')
print_success(f"Prompt version {version_number} created (draft)")



üìù Creating prompt version 1...
‚úÖ Prompt version 1 created (draft)


In [22]:
# Step 0c: Create version 2 to commit version 1
print("\nüìù Creating version 2 to commit version 1...")
dummy_version_data = {
    "description": "Dummy version to commit version 1",
    "messages": version_data["messages"],
    "model": version_data["model"],
    "stream": False,
    "temperature": 0.7,
    "max_tokens": 256
}

response = requests.post(
    f"{BASE_URL}/prompts/{prompt_id}/versions/",
    headers=headers,
    json=dummy_version_data
)
response.raise_for_status()
print_success("Version 2 created (version 1 is now readonly/committed)")

# Step 0d: Deploy version 1
print("\nüìù Deploying prompt version 1...")
response = requests.patch(
    f"{BASE_URL}/prompts/{prompt_id}/versions/1/",
    headers=headers,
    json={"deploy": True}
)
response.raise_for_status()

deploy_result = response.json()
print_success(f"Version 1 deployed successfully")
print_info(f"Using prompt ID: {prompt_id}")



üìù Creating version 2 to commit version 1...
‚úÖ Version 2 created (version 1 is now readonly/committed)

üìù Deploying prompt version 1...
‚úÖ Version 1 deployed successfully
‚ÑπÔ∏è  Using prompt ID: 1a716b85c23d453d976054509b5426a1


## Step 1: Create Logs with Variables

Create logs using the chat completions API with the deployed prompt.
The logs contain **variables** that will be used to render the prompt template.


In [23]:
print_step(1, "Create Logs with Variables via Chat Completions API")

test_variables = [
    {
        "height": "6 foot 2", "weight": 180, "hobby": "Aviation", "name": "Raymond",
        "full_name": {"first_name": "Raymond", "last_name": "Huang"},
        "age": 28, "salary": 120000.50, "email": "raymond@keywords.ai",
        "backpack_items": ["Laptop", "Notebook", "Pen"],
        "skills": {"Python": 9, "Django": 8, "React": 7},
        "now": "2025-11-30", "scores": [85, 90, 88],
        "customer_inquiry": "My order is damaged"
    },
    {
        "height": "5 foot 10", "weight": 165, "hobby": "Reading", "name": "Alice",
        "full_name": {"first_name": "Alice", "last_name": "Johnson"},
        "age": 32, "salary": 95000.00, "email": "alice@keywords.ai",
        "backpack_items": ["Book", "Water bottle"],
        "skills": {"JavaScript": 8, "CSS": 9},
        "now": "2025-11-30", "scores": [92, 88, 95],
        "customer_inquiry": "Wrong item received"
    },
    {
        "height": "5 foot 8", "weight": 155, "hobby": "Gaming", "name": "Bob",
        "full_name": {"first_name": "Bob", "last_name": "Smith"},
        "age": 25, "salary": 75000.00, "email": "bob@keywords.ai",
        "backpack_items": ["Controller", "Headset", "Charger"],
        "skills": {"Java": 7, "SQL": 8, "AWS": 6},
        "now": "2025-11-30", "scores": [78, 85, 80],
        "customer_inquiry": "Delayed delivery"
    }
]

test_run_id = int(time.time())
custom_identifier = f"prompt_workflow_test"  # Single identifier for all logs
completion_ids = []

print(f"\nUsing custom identifier: {custom_identifier}")

for i, variables in enumerate(test_variables, 1):
    payload = {
        "model": "gpt-4o-mini",
        "messages": [{"role": "user", "content": "placeholder"}],
        "prompt": {"prompt_id": prompt_id, "variables": variables, "override": True},
        "custom_identifier": custom_identifier  # Same identifier for all logs
    }
    
    print(f"\nCreating log {i}/3...")
    response = requests.post(f"{BASE_URL}/chat/completions", headers=headers, json=payload)
    
    if response.status_code in [200, 201]:
        result = response.json()
        completion_id = result.get("id")
        print_success(f"Completion created with ID: {completion_id}")
        completion_ids.append(completion_id)
    else:
        print_error(f"Failed: {response.status_code}")
        print(f"Response: {response.text[:300]}")

print_success(f"\nCreated {len(completion_ids)} completions total")
print_info(f"All logs tagged with custom_identifier: {custom_identifier}")
print_info("Note: Completion IDs are different from log unique_ids")
print_info("We'll fetch actual log IDs in the next step...")
wait_for_processing(30)



STEP 1: Create Logs with Variables via Chat Completions API

Using custom identifier: prompt_workflow_test

Creating log 1/3...
‚úÖ Completion created with ID: chatcmpl-CnvcT9neGOmnirsgiOcdX0dnVZlys

Creating log 2/3...
‚úÖ Completion created with ID: chatcmpl-CnvcUr6VUfkohuXV3byWc4LCjLo5D

Creating log 3/3...
‚úÖ Completion created with ID: chatcmpl-CnvcVwRQV38YyN6Q6FLXcNP5feBLD
‚úÖ 
Created 3 completions total
‚ÑπÔ∏è  All logs tagged with custom_identifier: prompt_workflow_test
‚ÑπÔ∏è  Note: Completion IDs are different from log unique_ids
‚ÑπÔ∏è  We'll fetch actual log IDs in the next step...

‚è≥ Waiting 30 seconds for processing...
‚úì Wait complete


## Step 1b: Fetch Actual Log IDs

The completion IDs from `/chat/completions` are different from the log `unique_id`.
We need to fetch the actual log IDs using the custom_identifier filter.

**Note:** We query logs from the last 30 minutes to ensure we capture all recently created logs.


In [24]:
print("\nüìã Fetching actual log IDs using custom_identifier filter...")

# Set time range: last 30 minutes
end_time = datetime.utcnow()
start_time = end_time - timedelta(minutes=30)

params = {
    "page": 1,
    "page_size": 100,
    "start_time": start_time.isoformat() + "Z",
    "end_time": end_time.isoformat() + "Z",
    "all_envs": "true"
}

filter_data = {
    "filters": {
        "custom_identifier": {
            "value": [custom_identifier],
            "operator": "",
            "connector": "AND"
        }
    },
    "exporting": False
}

response = requests.post(
    f"{BASE_URL}/request-logs/list/",
    headers=headers,
    params=params,
    json=filter_data
)
response.raise_for_status()

logs_response = response.json()
fetched_logs = logs_response.get('results', [])

log_ids = []
for log in fetched_logs:
    log_id = log.get('unique_id') or log.get('id')
    if log_id:
        log_ids.append(log_id)

print_success(f"Fetched {len(log_ids)} actual log IDs:")
for i, log_id in enumerate(log_ids, 1):
    custom_id = fetched_logs[i-1].get('custom_identifier', 'N/A')
    print(f"  {i}. {log_id} (custom_id: {custom_id})")

expected_count = len(test_variables)
if len(log_ids) != expected_count:
    print_warning(f"Expected {expected_count} logs but found {len(log_ids)}")
    print_info("Some logs may still be processing...")



üìã Fetching actual log IDs using custom_identifier filter...


  end_time = datetime.utcnow()


‚úÖ Fetched 6 actual log IDs:
  1. 5648f92f14574bd6b26c839d9b8066d1 (custom_id: prompt_workflow_test)
  2. 38f6b84fc2c64c3d97b37bc3cb229c1e (custom_id: prompt_workflow_test)
  3. 8af32d8012d44180ad1d97a3c15bb032 (custom_id: prompt_workflow_test)
  4. 99cdb8b1f2d64fa882f4c73b61b2297d (custom_id: prompt_workflow_test)
  5. 3e60f218e88b489eb406533a86bccc19 (custom_id: prompt_workflow_test)
  6. 3a41c7cbad564e389f7a08aed1372b80 (custom_id: prompt_workflow_test)
‚ö†Ô∏è  Expected 3 logs but found 6
‚ÑπÔ∏è  Some logs may still be processing...


In [25]:
print_step(2, "Create Dataset from Logs")

end_time = datetime.utcnow()
start_time = end_time - timedelta(minutes=10)

dataset_data = {
    "name": f"Prompt workflow test",
    "description": "Dataset for prompt workflow notebook test",
    "type": "sampling",
    "start_time": start_time.isoformat() + "Z",
    "end_time": end_time.isoformat() + "Z",
    "initial_log_filters": {
        "custom_identifier": {
            "value": [custom_identifier],
            "operator": "",
            "connector": "AND"
        }
    }
}

print("\nCreating dataset...")
response = requests.post(f"{BASE_URL}/datasets", headers=headers, json=dataset_data)
response.raise_for_status()

dataset_result = response.json()
dataset_id = dataset_result.get('id')
print_success(f"Dataset created with ID: {dataset_id}")
print_info("Waiting for dataset to populate...")
wait_for_processing(20)



STEP 2: Create Dataset from Logs

Creating dataset...


  end_time = datetime.utcnow()


‚úÖ Dataset created with ID: ce03dc10-f6d5-4ace-a18a-a5461fb9d4c0
‚ÑπÔ∏è  Waiting for dataset to populate...

‚è≥ Waiting 20 seconds for processing...
‚úì Wait complete


## Step 3: Create Evaluator


In [26]:
print_step(3, "Create Evaluator")

evaluator_slug = f"prompt_workflow_test_eval_{int(time.time())}"

evaluator_data = {
    "name": f"Prompt workflow test",
    "evaluator_slug": evaluator_slug,
    "type": "llm",
    "score_value_type": "numerical",
    "description": "Evaluates response quality on a 1-5 scale",
    "configurations": {
        "evaluator_definition": "Rate the response quality based on accuracy, relevance, and completeness.\n<llm_input>{{input}}</llm_input>\n<llm_output>{{output}}</llm_output>",
        "scoring_rubric": "1=Poor, 2=Fair, 3=Good, 4=Very Good, 5=Excellent",
        "llm_engine": "gpt-4o-mini",
        "model_options": {
            "temperature": 0.1,
            "max_tokens": 200
        },
        "min_score": 1.0,
        "max_score": 5.0,
        "passing_score": 3.0
    }
}

print("\nCreating evaluator...")
print(f"  Name: {evaluator_data['name']}")
print(f"  Slug: {evaluator_slug}")
print(f"  Type: {evaluator_data['type']}")
print(f"  Score Type: {evaluator_data['score_value_type']}")

response = requests.post(f"{BASE_URL}/evaluators", headers=headers, json=evaluator_data)
response.raise_for_status()

evaluator_result = response.json()
evaluator_slug = evaluator_result.get('evaluator_slug')
print_success(f"Evaluator created: {evaluator_slug}")
print(f"  Evaluator ID: {evaluator_result.get('id')}")



STEP 3: Create Evaluator

Creating evaluator...
  Name: Prompt workflow test
  Slug: prompt_workflow_test_eval_1766015651
  Type: llm
  Score Type: numerical
‚úÖ Evaluator created: prompt_workflow_test_eval_1766015651
  Evaluator ID: 355967a2-2cfc-4abf-ad44-f1abaaf4ec49


## Step 4: Create Experiment with Prompt Workflow


In [27]:
print_step(4, "Create Experiment with Prompt Workflow")

experiment_data = {
    "name": f"Prompt workflow test",
    "description": "Testing prompt workflow from notebook",
    "dataset_id": dataset_id,
    "workflow": [{
        "type": "prompt",
        "config": {"prompt_id": prompt_id}
    }],
    "evaluator_slugs": [evaluator_slug]
}

print(f"\nCreating experiment with:")
print(f"  Prompt ID: {prompt_id}")
print(f"  Dataset ID: {dataset_id}")
print(f"  Evaluator: {evaluator_slug}")

response = requests.post(f"{BASE_URL}/v2/experiments/", headers=headers, json=experiment_data)
response.raise_for_status()

experiment_result = response.json()
experiment_id = experiment_result.get('id')
print_success(f"Experiment created with ID: {experiment_id}")
print_info(f"Status: {experiment_result.get('status')}")
print_info("Waiting for async workflow execution...")
wait_for_processing(30)



STEP 4: Create Experiment with Prompt Workflow

Creating experiment with:
  Prompt ID: 1a716b85c23d453d976054509b5426a1
  Dataset ID: ce03dc10-f6d5-4ace-a18a-a5461fb9d4c0
  Evaluator: prompt_workflow_test_eval_1766015651
‚úÖ Experiment created with ID: cceb26eb951a4cdda9fb46d4eae592e6
‚ÑπÔ∏è  Status: pending
‚ÑπÔ∏è  Waiting for async workflow execution...

‚è≥ Waiting 30 seconds for processing...
‚úì Wait complete


## Step 5: Verify Workflow Execution

Check that the experiment created logs successfully.


In [28]:
print_step(5, "Verify Workflow Execution")

print("\nListing experiment logs...")
response = requests.get(
    f"{BASE_URL}/v2/experiments/{experiment_id}/logs/list/",
    headers=headers
)
response.raise_for_status()

logs_result = response.json()
logs = logs_result.get('results', [])
count = len(logs)

print(f"\nüìä Found {count} log(s)")

if count == 0:
    print_error("No logs created - workflow may have failed!")
    print_info("Check:")
    print("  1. Celery worker is running")
    print("  2. Prompt is properly deployed")
    print("  3. Dataset contains logs")
else:
    # Check log status
    statuses = {}
    for log in logs:
        status = log.get('status', 'unknown')
        statuses[status] = statuses.get(status, 0) + 1
    
    print(f"üìà Status breakdown: {statuses}")
    print_success(f"Workflow executed successfully!")



STEP 5: Verify Workflow Execution

Listing experiment logs...

üìä Found 3 log(s)
üìà Status breakdown: {'success': 3}
‚úÖ Workflow executed successfully!


## Step 6: Verify Span Tree Structure

Verify that the span tree has the expected structure for a prompt workflow.


In [29]:
print_step(6, "Verify Span Tree Structure")

if not logs:
    print_error("No logs to verify")
else:
    # Get detailed span tree for first log
    log_id = logs[0].get('id')
    print(f"\nGetting span tree for log: {log_id}")
    
    response = requests.get(
        f"{BASE_URL}/v2/experiments/{experiment_id}/logs/{log_id}/",
        headers=headers,
        params={"detail": 1}
    )
    response.raise_for_status()
    
    log_detail = response.json()
    span_tree = log_detail.get('span_tree', [])
    
    print(f"\nüå≤ Span tree contains {len(span_tree)} top-level span(s):")
    
    # Collect all spans (including nested)
    all_spans = []
    
    def collect_spans(spans):
        for span in spans:
            all_spans.append(span)
            if 'children' in span:
                collect_spans(span['children'])
    
    collect_spans(span_tree)
    
    # Print span tree with indentation to show nesting
    def print_span_tree(spans, indent=0):
        for span in spans:
            span_name = span.get('span_name', '')
            span_type = span.get('span_type', 'N/A')
            log_type = span.get('log_type', 'N/A')
            prefix = "  " * indent + "- "
            print(f"{prefix}{span_name} (span_type: {span_type}, log_type: {log_type})")
            
            # Print children if they exist
            children = span.get('children', [])
            if children:
                print_span_tree(children, indent + 1)
    
    print("\nüå≤ Nested Span Tree Structure:")
    print_span_tree(span_tree)
    
    # Verify expected spans
    has_root = False
    has_workflow = False
    has_prompt_workflow = False
    has_evaluator = False
    has_llm_call = False
    has_load_prompt = False
    has_completion_span = False
    evaluator_spans = []
    llm_spans = []
    
    for span in all_spans:
        span_name = span.get('span_name', '')
        span_type = span.get('span_type', 'N/A')
        log_type = span.get('log_type', 'N/A')
        
        if span_name == 'experiment_trace':
            has_root = True
        elif 'workflow_execution' in span_name:
            has_workflow = True
        elif 'prompt' in span_name.lower():
            has_prompt_workflow = True
        
        # Check for prompt workflow child spans
        if 'load_prompt' in span_name:
            has_load_prompt = True
        if 'completion' in span_name and 'workflow' in span_name:
            has_completion_span = True
        
        # Check for LLM calls (actual generation)
        if log_type == 'llm':
            has_llm_call = True
            llm_spans.append(span)
        
        # Check for evaluator spans (by span_type or name)
        if span_type == 'SCORE' or 'evaluator' in span_name.lower():
            has_evaluator = True
            evaluator_spans.append(span)
    
    print(f"\nüìã Verification:")
    print(f"  Root span (experiment_trace): {'‚úÖ' if has_root else '‚ùå'}")
    print(f"  Workflow execution span: {'‚úÖ' if has_workflow else '‚ùå'}")
    print(f"  Prompt workflow span: {'‚úÖ' if has_prompt_workflow else '‚ùå'}")
    print(f"  ‚îú‚îÄ load_prompt child span: {'‚úÖ' if has_load_prompt else '‚ùå'}")
    print(f"  ‚îú‚îÄ completion child span: {'‚úÖ' if has_completion_span else '‚ùå'}")
    print(f"  ‚îî‚îÄ LLM call span (log_type=llm): {'‚úÖ' if has_llm_call else '‚ùå'} (REQUIRED)")
    print(f"  Evaluator span(s): {'‚úÖ' if has_evaluator else '‚ùå'} (REQUIRED)")
    
    if llm_spans:
        print(f"\nüîç LLM Call Details ({len(llm_spans)} span(s)):")
        for i, llm_span in enumerate(llm_spans, 1):
            print(f"  [{i}] {llm_span.get('span_name', 'N/A')}")
            if 'model' in llm_span:
                print(f"      Model: {llm_span.get('model')}")
            if 'prompt_id' in llm_span:
                print(f"      Prompt ID: {llm_span.get('prompt_id')}")
            if 'prompt_tokens' in llm_span:
                print(f"      Prompt tokens: {llm_span.get('prompt_tokens')}")
            if 'completion_tokens' in llm_span:
                print(f"      Completion tokens: {llm_span.get('completion_tokens')}")
            if 'status' in llm_span:
                print(f"      Status: {llm_span.get('status')}")
    
    if evaluator_spans:
        print(f"\nüîç Evaluator Details ({len(evaluator_spans)} span(s)):")
        for i, eval_span in enumerate(evaluator_spans, 1):
            print(f"  [{i}] {eval_span.get('span_name', 'N/A')}")
            if 'evaluator_slug' in eval_span:
                print(f"      Evaluator slug: {eval_span.get('evaluator_slug')}")
            if 'score' in eval_span:
                print(f"      Score: {eval_span.get('score')}")
            if 'status' in eval_span:
                print(f"      Status: {eval_span.get('status')}")
    
    success = has_root and has_workflow and has_prompt_workflow and has_llm_call and has_evaluator
    
    if success:
        print_success("\n‚úÖ Span tree structure is correct! All expected spans found.")
    else:
        print_error("\n‚ùå Span tree missing expected spans")
        if not has_llm_call:
            print_warning("  Missing LLM call - workflow may not have executed properly")
            if not has_load_prompt and not has_completion_span:
                print_warning("  ‚Üí Prompt workflow child spans are also missing!")
                print_info("  ‚Üí This suggests the prompt workflow didn't create child spans")
        if not has_evaluator:
            print_warning("  Missing evaluator span - evaluators may still be running")



STEP 6: Verify Span Tree Structure

Getting span tree for log: 49ae2c09bf48c2751b131ff5274b879d

üå≤ Span tree contains 1 top-level span(s):

üå≤ Nested Span Tree Structure:
- experiment_trace (span_type: N/A, log_type: workflow)
  - workflow_execution (span_type: N/A, log_type: chat)
    - Experiment Workflow.prompt (span_type: N/A, log_type: workflow)
      - workflow.prompt.load_prompt (span_type: N/A, log_type: workflow)
      - workflow.prompt.completion (span_type: N/A, log_type: chat)
  - evaluator.prompt_workflow_test_eval_1766015651 (span_type: N/A, log_type: score)

üìã Verification:
  Root span (experiment_trace): ‚úÖ
  Workflow execution span: ‚úÖ
  Prompt workflow span: ‚úÖ
  ‚îú‚îÄ load_prompt child span: ‚úÖ
  ‚îú‚îÄ completion child span: ‚úÖ
  ‚îî‚îÄ LLM call span (log_type=llm): ‚ùå (REQUIRED)
  Evaluator span(s): ‚úÖ (REQUIRED)

üîç Evaluator Details (1 span(s)):
  [1] evaluator.prompt_workflow_test_eval_1766015651
      Status: success
‚ùå 
‚ùå Span tree missin

In [30]:
print("\n" + "=" * 70)
print("TEST SUMMARY")
print("=" * 70)

print(f"\nüìä Test Results:")
print(f"  Prompt ID: {prompt_id}")
print(f"  Logs Created: {len(log_ids)}")
print(f"  Dataset ID: {dataset_id}")
print(f"  Evaluator: {evaluator_slug}")
print(f"  Experiment ID: {experiment_id}")
print(f"  Experiment Logs: {count if 'count' in locals() else 0}")

if count > 0 and success:
    print("\nüéâ All tests passed!")
    print("\n‚úÖ Prompt workflow executed successfully")
    print("‚úÖ Span tree structure is correct")
else:
    print("\n‚ö†Ô∏è  Some tests failed or incomplete")
    if count == 0:
        print("   - Logs not created")
    if not success:
        print("   - Span tree structure incorrect")

print("\n" + "=" * 70)



TEST SUMMARY

üìä Test Results:
  Prompt ID: 1a716b85c23d453d976054509b5426a1
  Logs Created: 6
  Dataset ID: ce03dc10-f6d5-4ace-a18a-a5461fb9d4c0
  Evaluator: prompt_workflow_test_eval_1766015651
  Experiment ID: cceb26eb951a4cdda9fb46d4eae592e6
  Experiment Logs: 3

‚ö†Ô∏è  Some tests failed or incomplete
   - Span tree structure incorrect

