# TriageFlow: Incident Triage Demo

Multi-agent incident triage with Domino GenAI tracing.

## Setup

Save your API key as a Domino user environment variable:
1. **Account Settings** → **User Environment Variables**
2. Add `OPENAI_API_KEY` or `ANTHROPIC_API_KEY`
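
Before running the cells below, it can help to confirm which key the kernel can actually see. The following is a minimal sketch, assuming the two variable names listed above; `detect_providers` is a hypothetical helper and not part of this demo's `src` package.

```python
import os

# Provider name (as used in the dropdown below) mapped to the
# environment variable named in the setup steps above.
KEY_VARS = {"openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY"}


def detect_providers(env=None):
    """Return the providers whose API key is set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for name, var in KEY_VARS.items() if env.get(var, "").strip()]


print("Providers with keys:", detect_providers() or "none found")
```

If the list is empty, restarting the workspace after saving the variable is a reasonable first thing to try, since environment variables are typically read when the kernel starts.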

In [None]:
import os
import yaml
import pandas as pd
import ipywidgets as widgets
from IPython.display import display
from datetime import datetime

## Select Provider & Vertical

Choose the LLM provider and the industry vertical whose sample incidents you want to triage.

In [2]:
provider_dropdown = widgets.Dropdown(
    options=["openai", "anthropic"],
    value="openai",
    description="Provider:"
)

vertical_dropdown = widgets.Dropdown(
    options=[
        ("Financial Services", "financial_services"),
        ("Healthcare", "healthcare"),
        ("Energy", "energy"),
        ("Public Sector", "public_sector")
    ],
    value="financial_services",
    description="Vertical:"
)

display(provider_dropdown, vertical_dropdown)

Dropdown(description='Provider:', options=('openai', 'anthropic'), value='openai')

Dropdown(description='Vertical:', options=(('Financial Services', 'financial_services'), ('Healthcare', 'healt…

## Load Configuration

All prompts, model settings, and agent parameters are centralized in `config.yaml`.
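
Only the `models` mapping is visible in this notebook (through `config["models"][provider]`); the rest of `config.yaml` is not shown here. As a hedged sketch, a small lookup helper can fail with a clearer message when a provider has no model configured. `resolve_model` and `sample_config` are hypothetical, and the OpenAI model name is a placeholder rather than a value from the real file.

```python
def resolve_model(config: dict, provider: str) -> str:
    """Look up the model name for a provider, raising a descriptive error."""
    models = config.get("models") or {}
    if provider not in models:
        known = ", ".join(sorted(models)) or "none"
        raise KeyError(f"No model configured for '{provider}' (configured: {known})")
    return models[provider]


# Hypothetical stand-in for the parsed config.yaml. Only the "models" key
# is confirmed by the notebook; the OpenAI value is a placeholder.
sample_config = {
    "models": {
        "anthropic": "claude-sonnet-4-20250514",
        "openai": "placeholder-openai-model",
    }
}

print(resolve_model(sample_config, "anthropic"))
```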

In [3]:
with open("config.yaml") as f:
    config = yaml.safe_load(f)

provider = provider_dropdown.value
model = config["models"][provider]
print(f"Provider: {provider}\nModel: {model}")

Provider: anthropic
Model: claude-sonnet-4-20250514


## Initialize Client & Auto-Tracing

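MLflow's provider-specific `autolog()` hooks (`mlflow.openai.autolog()` and `mlflow.anthropic.autolog()`) capture LLM calls made through the corresponding SDK client without additional instrumentation.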

In [4]:
import mlflow

if provider == "openai":
    from openai import OpenAI
    client = OpenAI()
    mlflow.openai.autolog()
else:
    from anthropic import Anthropic
    client = Anthropic()
    mlflow.anthropic.autolog()

print(f"Auto-tracing enabled for {provider}")

Auto-tracing enabled for anthropic


## Import Domino Tracing

- `add_tracing`: Decorator for capturing inputs, outputs, and evaluation metrics
- `DominoRun`: Context manager for aggregating metrics across multiple traces

In [5]:
from domino.agents.tracing import add_tracing
from domino.agents.logging import DominoRun

## Load Models, Agents, and Judges

In [6]:
from src.models import Incident, IncidentSource
from src.agents import classify_incident, assess_impact, match_resources, draft_response
from src.judges import judge_classification, judge_response, judge_triage

In [None]:
def pipeline_evaluator(span) -> dict:
    """Extract metrics and run LLM judges on pipeline outputs."""
    outputs = span.outputs or {}
    inputs = span.inputs or {}

    if not hasattr(outputs, "get"):
        return {}

    # Get incident from inputs
    incident_data = inputs.get("incident", {})
    if hasattr(incident_data, "model_dump"):
        incident_data = incident_data.model_dump()
    incident_desc = incident_data.get("description", "") if isinstance(incident_data, dict) else ""

    classification = outputs.get("classification")
    impact = outputs.get("impact")
    resources = outputs.get("resources")
    response = outputs.get("response")

    # Convert Pydantic models to dicts
    if hasattr(classification, "model_dump"):
        classification = classification.model_dump()
    if hasattr(impact, "model_dump"):
        impact = impact.model_dump()
    if hasattr(resources, "model_dump"):
        resources = resources.model_dump()
    if hasattr(response, "model_dump"):
        response = response.model_dump()

    classification = classification or {}
    impact = impact or {}
    resources = resources or {}
    response = response or {}

    primary = resources.get("primary_responder", {})
    if hasattr(primary, "model_dump"):
        primary = primary.model_dump()
    primary = primary or {}

    # Base metrics
    metrics = {
        "classification_confidence": classification.get("confidence", 0.5),
        "impact_score": impact.get("impact_score", 5.0),
        "resource_match_score": primary.get("match_score", 0.5) if isinstance(primary, dict) else 0.5,
        "completeness_score": response.get("completeness_score", 0.5),
    }

    # Judge evaluations
    metrics["classification_judge_score"] = judge_classification(
        client, provider, incident_desc, classification
    ).get("score", 3)

    comms = response.get("communications", [])
    if comms:
        # Normalize the first communication to a plain dict
        first = comms[0]
        if hasattr(first, "model_dump"):
            first = first.model_dump()
        comm = first if isinstance(first, dict) else {}
        metrics["response_judge_score"] = judge_response(
            client, provider, incident_desc, classification.get("urgency", 3), comm
        ).get("score", 3)
    else:
        metrics["response_judge_score"] = 1

    metrics["triage_judge_score"] = judge_triage(
        client, provider, incident_desc, classification, impact, resources, response
    ).get("score", 3)

    return metrics
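
The evaluator above repeats the same "convert a Pydantic model to a dict, then fall back to `{}`" step for each output. One way to express that pattern once is sketched below. `to_plain_dict` and `FakeModel` are hypothetical names used only for illustration, and the helper is slightly stricter than the original code because it also maps unexpected non-dict values to `{}`.

```python
def to_plain_dict(value):
    """Return a dict for dicts and Pydantic-style objects; otherwise {}."""
    if value is None:
        return {}
    if hasattr(value, "model_dump"):
        value = value.model_dump()
    return value if isinstance(value, dict) else {}


class FakeModel:
    """Stand-in for a Pydantic v2 model; only model_dump() is needed here."""

    def __init__(self, **fields):
        self._fields = fields

    def model_dump(self):
        return dict(self._fields)


# Illustrative pipeline outputs: one model instance and one missing value.
outputs = {"classification": FakeModel(confidence=0.9), "impact": None}
classification = to_plain_dict(outputs.get("classification"))
impact = to_plain_dict(outputs.get("impact"))

# Defaults mirror the ones used in pipeline_evaluator.
print(classification.get("confidence", 0.5), impact.get("impact_score", 5.0))
```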

## Define Traced Pipeline

The `@add_tracing` decorator creates a single trace tree per incident. Each agent runs as a nested span with:
- Function inputs and outputs
- LLM calls captured via autolog (showing span types like `ChatCompletion`)
- Evaluation metrics attached to the trace
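
To make the idea of a trace tree concrete, here is a toy decorator that records nested spans and attaches evaluator metrics to each decorated call. It is a conceptual sketch only: `toy_trace`, `traces`, and the span dictionary layout are invented for illustration and do not reflect how `add_tracing` or MLflow are implemented.

```python
import functools

_span_stack = []  # spans currently open, innermost last
traces = []       # completed root spans, one per top-level call


def toy_trace(name, evaluator=None):
    """Toy decorator: record a call as a span nested under its caller."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            span = {"name": name, "inputs": {"args": args, "kwargs": kwargs},
                    "outputs": None, "children": [], "metrics": {}}
            if _span_stack:
                # Called from inside another traced function: nest under it.
                _span_stack[-1]["children"].append(span)
            _span_stack.append(span)
            try:
                span["outputs"] = fn(*args, **kwargs)
                if evaluator is not None:
                    span["metrics"] = evaluator(span)
                return span["outputs"]
            finally:
                _span_stack.pop()
                if not _span_stack:
                    traces.append(span)  # root span finished: store the tree
        return wrapper
    return decorator


def urgency_evaluator(span):
    """Pull one metric out of the root span's outputs."""
    return {"urgency": span["outputs"]["classification"]["urgency"]}


@toy_trace("classify")
def classify(text):
    return {"urgency": 5 if "down" in text else 2}


@toy_trace("pipeline", evaluator=urgency_evaluator)
def pipeline(text):
    return {"classification": classify(text)}


pipeline("EHR system is down")
root = traces[0]
print(root["name"], [c["name"] for c in root["children"]], root["metrics"])
```

The point of the sketch is the shape of the result: one root span per pipeline call, child spans for each step, and metrics computed from the root span's inputs and outputs once the call returns.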

In [7]:
@add_tracing(name="triage_incident", autolog_frameworks=[provider], evaluator=pipeline_evaluator)
def triage_incident(incident: Incident):
    """Run the 4-agent triage pipeline. Autolog captures all LLM calls with span types."""
    classification = classify_incident(client, provider, model, incident, config)
    impact = assess_impact(client, provider, model, incident, classification, config)
    resources = match_resources(client, provider, model, classification, impact, config)
    response = draft_response(client, provider, model, incident, classification, impact, resources, config)

    return {
        "classification": classification,
        "impact": impact,
        "resources": resources,
        "response": response
    }

## Load Sample Incidents

Sample incidents are read from `example-data/<vertical>.csv`, where `<vertical>` is the value selected in the dropdown above.

In [8]:
vertical = vertical_dropdown.value
df = pd.read_csv(f"example-data/{vertical}.csv")
print(f"Loaded {len(df)} incidents from {vertical}")
df

Loaded 10 incidents from healthcare


| | ticket_id | description | source | reporter | affected_system | initial_severity |
|---|---|---|---|---|---|---|
| 0 | HLT-2024-001 | Electronic Health Records system unresponsive ... | user_report | ED Charge Nurse | Epic EHR | 5 |
| 1 | HLT-2024-002 | Lab results interface failing to transmit crit... | monitoring | | Lab Information System | 5 |
| 2 | HLT-2024-003 | Patient portal showing incorrect appointment t... | user_report | Patient Services | Patient Portal | 3 |
| 3 | HLT-2024-004 | HIPAA audit log showing gaps in access trackin... | automated_scan | | Radiology PACS | 4 |
| 4 | HLT-2024-005 | Medication dispensing cabinets in ICU reportin... | user_report | ICU Pharmacy | Pyxis MedStation | 4 |
| 5 | HLT-2024-006 | Telehealth platform video quality degraded. Pr... | user_report | Telehealth Support | Teladoc Platform | 3 |
| 6 | HLT-2024-007 | Billing system rejecting Medicare claims with ... | monitoring | | Claims Processing System | 3 |
| 7 | HLT-2024-008 | Surgical scheduling system double-booked OR-3 ... | user_report | Surgery Coordinator | Surgical Scheduling | 4 |
| 8 | HLT-2024-009 | Blood bank inventory system not syncing with r... | monitoring | | Blood Bank Management | 4 |
| 9 | HLT-2024-010 | Ransomware indicators detected on imaging work... | automated_scan | | Radiology Workstation | 5 |


In [9]:
def row_to_incident(row) -> Incident:
    return Incident(
        ticket_id=row["ticket_id"],
        description=row["description"],
        source=IncidentSource(row["source"]),
        reporter=row["reporter"] if pd.notna(row["reporter"]) else None,
        affected_system=row["affected_system"] if pd.notna(row["affected_system"]) else None,
        initial_severity=int(row["initial_severity"]) if pd.notna(row["initial_severity"]) else None
    )

incidents = [row_to_incident(row) for _, row in df.iterrows()]
print(f"Loaded {len(incidents)} incidents")

Loaded 10 incidents


## Run Triage Pipeline

`DominoRun` aggregates metrics across all traces in the batch via `custom_summary_metrics`. Supported aggregations: `mean`, `median`, `stdev`, `min`, `max`.
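
As a rough illustration of what aggregating `(metric, aggregation)` pairs across traces means, the sketch below applies the five named aggregations with Python's `statistics` module. It is not Domino's implementation: `summarize`, the `{aggregation}_{metric}` key naming, and the use of the sample standard deviation for `stdev` are all assumptions, and the metric values are made up.

```python
import statistics

# Aggregation name mapped to a function over a list of numbers.
AGGREGATORS = {
    "mean": statistics.mean,
    "median": statistics.median,
    "stdev": statistics.stdev,  # sample standard deviation; needs >= 2 values
    "min": min,
    "max": max,
}


def summarize(trace_metrics, specs):
    """Aggregate per-trace metric dicts according to (metric, aggregation) pairs."""
    summary = {}
    for metric, agg in specs:
        values = [m[metric] for m in trace_metrics if metric in m]
        if values:  # skip metrics that no trace reported
            summary[f"{agg}_{metric}"] = AGGREGATORS[agg](values)
    return summary


# Made-up per-trace metrics for three incidents (illustrative values only).
trace_metrics = [
    {"classification_confidence": 0.9, "impact_score": 8.0},
    {"classification_confidence": 0.7, "impact_score": 5.0},
    {"classification_confidence": 0.8, "impact_score": 7.0},
]

specs = [("classification_confidence", "mean"), ("impact_score", "median")]
print(summarize(trace_metrics, specs))
```

Using the median for `impact_score`, as the cell below does, makes the batch summary less sensitive to a single extreme incident than the mean would be.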

In [10]:
# Experiment and run naming
username = os.environ.get("DOMINO_USER_NAME", os.environ.get("USER", "demo_user"))
timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")

experiment_name = f"tracing-{username}"
run_name = f"{vertical}-{username}-{timestamp}"

aggregated_metrics = [
    # Base metrics
    ("classification_confidence", "mean"),
    ("impact_score", "median"),
    ("resource_match_score", "mean"),
    ("completeness_score", "mean"),
    # Judge scores
    ("classification_judge_score", "mean"),
    ("response_judge_score", "mean"),
    ("triage_judge_score", "mean"),
]

print(f"Experiment: {experiment_name}")
print(f"Run: {run_name}")

Experiment: tracing-andrea_lowe
Run: healthcare-andrea_lowe-20251201-205009


In [None]:
# Set MLflow experiment
mlflow.set_experiment(experiment_name)

results = []

with DominoRun(agent_config_path="config.yaml", custom_summary_metrics=aggregated_metrics) as run:
    # Set run name via MLflow
    mlflow.set_tag("mlflow.runName", run_name)
    
    for incident in incidents:
        print(f"Processing {incident.ticket_id}...")
        
        result = triage_incident(incident)
        
        results.append({
            "ticket_id": incident.ticket_id,
            **result
        })
        print(f"  → {result['classification'].category.value} | Urgency: {result['classification'].urgency} | Impact: {result['impact'].impact_score}")

print(f"\nProcessed {len(results)} incidents")

Processing HLT-2024-001...
  → infrastructure | Urgency: 5 | Impact: 8.0
Processing HLT-2024-002...
  → infrastructure | Urgency: 5 | Impact: 8.5
Processing HLT-2024-003...
  → operational | Urgency: 3 | Impact: 5.0
Processing HLT-2024-004...
  → compliance | Urgency: 4 | Impact: 7.0
Processing HLT-2024-005...
  → infrastructure | Urgency: 4 | Impact: 7.0
Processing HLT-2024-006...
  → performance | Urgency: 4 | Impact: 7.0
Processing HLT-2024-007...
  → operational | Urgency: 3 | Impact: 6.0
Processing HLT-2024-008...
  → data_integrity | Urgency: 4 | Impact: 6.0
Processing HLT-2024-009...


## Results Summary

In [None]:
summary = pd.DataFrame([{
    "Ticket": r["ticket_id"],
    "Category": r["classification"].category.value,
    "Urgency": r["classification"].urgency,
    "Impact": r["impact"].impact_score,
    "Responder": r["resources"].primary_responder.name,
    "SLA Met": r["resources"].sla_met
} for r in results])
summary

## Sample Communication

Each incident generates tailored communications for technical teams, management, and affected users.

In [None]:
sample = results[0]
print(f"Ticket: {sample['ticket_id']}\n")
for comm in sample["response"].communications:
    print(f"--- {comm.audience.upper()} ---")
    print(f"Subject: {comm.subject}")
    print(f"{comm.body[:300]}...\n")

## Next Steps

Open **Domino Experiment Manager** to view:
- Execution flow across all 4 agents
- Inline evaluation metrics per trace
- Aggregated statistics across the batch