# Multi-Agent Ticket Triage System

This notebook demonstrates how to build a **multi-agent system** using Azure AI Agent Service for automated ticket triage. The system uses multiple specialized agents that work together to analyze support tickets and determine:

- **Priority Level** (High/Medium/Low)
- **Team Assignment** (Frontend/Backend/Infrastructure/Marketing)
- **Effort Estimation** (Small/Medium/Large)

## Architecture Overview

The system consists of:
1. **Three Specialist Agents**: Each focused on one aspect of ticket analysis
2. **One Orchestrator Agent**: Uses the specialist agents as tools to provide comprehensive triage
3. **Connected Agent Tools**: Enable the orchestrator to call specialist agents as needed

In [None]:
import os
from azure.ai.agents import AgentsClient
from azure.ai.agents.models import ConnectedAgentTool, MessageRole, ListSortOrder, ToolSet, FunctionTool
from azure.identity import DefaultAzureCredential
from dotenv import load_dotenv
load_dotenv()
project_endpoint = os.getenv("AZURE_AI_PROJECT_ENDPOINT")
model_deployment = os.getenv("MODEL_DEPLOYMENT_NAME")
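
If either environment variable is unset, `os.getenv` returns `None` and the problem only surfaces later, when a client is created. A minimal sketch of an early check follows. The helper name `require_env` is hypothetical (it is not part of any SDK), and the sample values are placeholders rather than real endpoints.

```python
import os

def require_env(*names, environ=None):
    """Return the values of the named variables, or raise if any are unset or empty."""
    env = os.environ if environ is None else environ
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(
            f"Missing required environment variables: {', '.join(missing)}"
        )
    return tuple(env[name] for name in names)

# Example with an explicit mapping (placeholder values for illustration)
sample_env = {
    "AZURE_AI_PROJECT_ENDPOINT": "https://example.invalid/project",
    "MODEL_DEPLOYMENT_NAME": "example-deployment",
}
endpoint, deployment = require_env(
    "AZURE_AI_PROJECT_ENDPOINT", "MODEL_DEPLOYMENT_NAME", environ=sample_env
)
print(endpoint, deployment)
```

In the notebook, the same helper could be called without `environ` right after `load_dotenv()` so that a missing `.env` entry fails immediately with a clear message.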


## 🔍 Observability & Tracing Setup

To monitor our multi-agent system's performance and behavior, we'll set up Azure AI Foundry tracing. This will help us:

- **Track agent interactions**: See how the orchestrator calls each specialist agent
- **Monitor response times**: Understand the performance of each agent
- **Debug issues**: Trace the flow of data through the multi-agent system
- **Analyze usage**: Monitor token consumption and API calls

The tracing will capture:
1. Each specialist agent call (priority, team, effort assessment)
2. The orchestrator agent's decision-making process
3. Token usage and response times
4. Any errors or issues in the workflow

In [None]:
# Install the packages required for Azure AI Foundry tracing (uncomment to run)
# %pip install azure-monitor-opentelemetry opentelemetry-instrumentation-openai-v2

### Configure OpenTelemetry Instrumentation

This cell sets up the telemetry configuration to capture detailed information about our multi-agent system interactions. We enable content recording to see the actual messages exchanged between agents.

In [None]:
import os
from azure.ai.inference.tracing import AIInferenceInstrumentor

# Enable content recording for detailed trace information
os.environ["OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT"] = "true"
os.environ["AZURE_TRACING_GEN_AI_CONTENT_RECORDING_ENABLED"] = "true"

# Configure Azure SDK to use OpenTelemetry
os.environ["AZURE_SDK_TRACING_IMPLEMENTATION"] = "opentelemetry"

# Instrument OpenAI SDK first
try:
    from opentelemetry.instrumentation.openai_v2 import OpenAIInstrumentor
    OpenAIInstrumentor().instrument()
    print("✅ OpenAI v2 instrumentation enabled.")
except ImportError:
    print("⚠️ opentelemetry-instrumentation-openai-v2 not available")

# Then instrument Azure AI Inference
AIInferenceInstrumentor().instrument()
print("✅ Azure AI Inference instrumentation enabled.")
print("✅ Content recording enabled for multi-agent traces.")

### Configure Azure Monitor Tracing

This cell sets up Azure Monitor integration to send our multi-agent system traces to Azure AI Foundry. The traces will be visible in the Tracing tab of your AI Foundry project, allowing you to monitor the complete workflow of the ticket triage process.

In [None]:
from azure.monitor.opentelemetry import configure_azure_monitor
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

# Create AI Project client for telemetry
project_client = AIProjectClient(
    endpoint=project_endpoint,
    credential=DefaultAzureCredential(
        exclude_environment_credential=True, 
        exclude_managed_identity_credential=True
    )
)

# Get Application Insights connection string - following Microsoft's recommended approach
print("🔧 Attempting to get Application Insights connection string...")
try:
    connection_string = project_client.telemetry.get_connection_string()
    if connection_string:
        print("✅ Found App Insights connection string!")
        # Avoid echoing the instrumentation key into notebook output
        print(f"🔗 Connection string length: {len(connection_string)} characters")
    else:
        print("⚠️ No connection string returned from project client.")
        connection_string = None
except Exception as e:
    print(f"⚠️ Error getting App Insights connection string: {e}")
    connection_string = None

# Configure Azure Monitor tracing with a manually built exporter pipeline
if connection_string:
    try:
        print("🔧 Configuring Azure Monitor for multi-agent tracing...")
        
        # Use manual Azure Monitor setup to avoid logging_formatter issues
        from azure.monitor.opentelemetry.exporter import AzureMonitorTraceExporter
        from opentelemetry.sdk.trace import TracerProvider
        from opentelemetry.sdk.trace.export import BatchSpanProcessor
        from opentelemetry import trace
        
        # Create new tracer provider
        tracer_provider = TracerProvider()
        trace.set_tracer_provider(tracer_provider)
        
        # Add Azure Monitor exporter
        azure_exporter = AzureMonitorTraceExporter(connection_string=connection_string)
        span_processor = BatchSpanProcessor(azure_exporter)
        tracer_provider.add_span_processor(span_processor)
        
        print("✅ Azure Monitor configured successfully!")
        print("🎯 Multi-agent traces will be visible in Azure AI Foundry portal!")
        
    except Exception as config_error:
        print(f"❌ Azure Monitor configuration failed: {config_error}")
        print("💡 Proceeding without Azure Monitor tracing")
        
        # Set up basic tracer provider as fallback
        from opentelemetry.sdk.trace import TracerProvider
        from opentelemetry import trace
        tracer_provider = TracerProvider()
        trace.set_tracer_provider(tracer_provider)
        print("✅ Basic tracing provider configured as fallback")
else:
    print("❌ Cannot configure Azure Monitor without connection string.")
    print("💡 Please ensure Application Insights is properly configured in your AI Foundry project.")
    
    # Set up basic tracer provider as fallback
    from opentelemetry.sdk.trace import TracerProvider
    from opentelemetry import trace
    tracer_provider = TracerProvider()
    trace.set_tracer_provider(tracer_provider)
    print("✅ Basic tracing provider configured - spans will be created but not exported")

In [None]:
# Priority agent definition
priority_agent_name = "priority_agent"
priority_agent_instructions = """
Assess how urgent a ticket is based on its description.

Respond with one of the following levels:
- High: User-facing or blocking issues
- Medium: Time-sensitive but not breaking anything
- Low: Cosmetic or non-urgent tasks

Only output the urgency level and a very brief explanation.
"""

In [None]:
# Team agent definition
team_agent_name = "team_agent"
team_agent_instructions = """
Decide which team should own each ticket.

Choose from the following teams:
- Frontend
- Backend
- Infrastructure
- Marketing

Base your answer on the content of the ticket. Respond with the team name and a very brief explanation.
"""

In [None]:
# Effort agent definition
effort_agent_name = "effort_agent"
effort_agent_instructions = """
Estimate how much work each ticket will require.

Use the following scale:
- Small: Can be completed in a day
- Medium: 2-3 days of work
- Large: More than 3 days or cross-team effort

Base your estimate on the complexity implied by the ticket. Respond with the effort level and a brief justification.
"""
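
Each specialist is instructed to answer with a label followed by a brief explanation, but the exact shape of the reply is not guaranteed. Below is a minimal sketch of how such free-text replies could be mapped back to one of the allowed labels. The helper `extract_label` is hypothetical, and the sample replies are made up for illustration rather than taken from real agent output.

```python
import re

PRIORITY_LEVELS = ("High", "Medium", "Low")
TEAMS = ("Frontend", "Backend", "Infrastructure", "Marketing")
EFFORT_LEVELS = ("Small", "Medium", "Large")

def extract_label(reply, allowed):
    """Return the allowed label that appears earliest in reply as a whole word, or None."""
    best = None
    for label in allowed:
        match = re.search(rf"\b{re.escape(label)}\b", reply, flags=re.IGNORECASE)
        if match and (best is None or match.start() < best[0]):
            best = (match.start(), label)
    return best[1] if best else None

# Made-up replies for illustration; real agent output may differ
print(extract_label("High: users are blocked from logging in.", PRIORITY_LEVELS))
print(extract_label("Team: backend, since the reset endpoint fails.", TEAMS))
print(extract_label("No clear estimate.", EFFORT_LEVELS))
```

Returning `None` for an unrecognized reply keeps the caller in charge of deciding whether to retry, fall back to a default, or flag the ticket for manual review.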

In [None]:
# Instructions for the primary agent
triage_agent_instructions = """
Triage the given ticket. Use the connected tools to determine the ticket's priority, 
which team it should be assigned to, and how much effort it may take.
"""

In [None]:
# Connect to the agents client
agents_client = AgentsClient(
    endpoint=project_endpoint,
    credential=DefaultAzureCredential(
        exclude_environment_credential=True,
        exclude_managed_identity_credential=True
    ),
)

In [None]:
# Create agents without a context manager so the client stays open for later cells
print("Creating agents...")

# Create the priority agent on the Azure AI agent service
priority_agent = agents_client.create_agent(
    model=model_deployment,
    name=priority_agent_name,
    instructions=priority_agent_instructions
)
print("✅ Priority agent created")

# Create a connected agent tool for the priority agent
priority_agent_tool = ConnectedAgentTool(
    id=priority_agent.id, 
    name=priority_agent_name, 
    description="Assess the priority of a ticket"
)

# Create the team agent and connected tool
team_agent = agents_client.create_agent(
    model=model_deployment,
    name=team_agent_name,
    instructions=team_agent_instructions
)
print("✅ Team agent created")

team_agent_tool = ConnectedAgentTool(
    id=team_agent.id, 
    name=team_agent_name, 
    description="Determines which team should take the ticket"
)

# Create the effort agent and connected tool
effort_agent = agents_client.create_agent(
    model=model_deployment,
    name=effort_agent_name,
    instructions=effort_agent_instructions
)
print("✅ Effort agent created")

effort_agent_tool = ConnectedAgentTool(
    id=effort_agent.id, 
    name=effort_agent_name, 
    description="Determines the effort required to complete the ticket"
)

# Create a main agent with the Connected Agent tools
agent = agents_client.create_agent(
    model=model_deployment,
    name="triage-agent",
    instructions=triage_agent_instructions,
    tools=[
        priority_agent_tool.definitions[0],
        team_agent_tool.definitions[0],
        effort_agent_tool.definitions[0]
    ]
)
print("✅ Main triage agent created with connected tools")
print("🎯 All agents are ready for use!")

In [None]:
# Create thread for the chat session
print("Creating agent thread.")
thread = agents_client.threads.create()

In [None]:
from opentelemetry import trace
import time

# Get tracer for custom span creation - following Microsoft's example
tracer = trace.get_tracer(__name__)

# Create the ticket prompt
prompt = "Users can't reset their password from the mobile app."

# Start main tracing span for the entire multi-agent workflow
with tracer.start_as_current_span("multi_agent_ticket_triage") as span:
    # Add custom attributes as shown in Microsoft docs
    span.set_attribute("operation.type", "ticket_triage")
    span.set_attribute("operation.category", "multi_agent")
    span.set_attribute("ticket.content", prompt)
    span.set_attribute("system.agents.count", 4)  # 3 specialists + 1 orchestrator
    
    start_time = time.time()
    
    # Send a prompt to the agent
    message = agents_client.messages.create(
        thread_id=thread.id,
        role=MessageRole.USER,
        content=prompt,
    )
    print(f"📝 Created user message: {prompt}")
    
    # Create and process Agent run in thread with tools
    print("🔄 Processing multi-agent thread. Please wait...")
    run = agents_client.runs.create_and_process(thread_id=thread.id, agent_id=agent.id)
    
    # Add run information to span
    span.set_attribute("run.id", run.id)
    span.set_attribute("run.status", run.status)
    span.set_attribute("thread.id", thread.id)
    span.set_attribute("agent.id", agent.id)
    
    if run.status == "failed":
        span.set_attribute("run.error", str(run.last_error))
        print(f"❌ Run failed: {run.last_error}")
    else:
        print("✅ Multi-agent processing completed successfully")

    # Fetch and analyze all messages
    messages = agents_client.messages.list(thread_id=thread.id, order=ListSortOrder.ASCENDING)
    
    message_count = 0
    agent_calls = 0
    
    print("\n📊 Multi-Agent Conversation Flow:")
    print("=" * 50)
    
    for message in messages:
        if message.text_messages:
            message_count += 1
            last_msg = message.text_messages[-1]
            
            # Heuristic: count assistant messages whose text mentions a specialist agent by name.
            # This approximates specialist calls; it does not inspect actual tool invocations.
            if message.role.lower() == "assistant":
                msg_content = last_msg.text.value.lower()
                if any(agent_name in msg_content for agent_name in ["priority_agent", "team_agent", "effort_agent"]):
                    agent_calls += 1
            
            print(f"🤖 {message.role.upper()}:")
            print(f"{last_msg.text.value}\n")
    
    end_time = time.time()
    total_duration = end_time - start_time
    
    # Add response metadata to span following Microsoft's pattern
    span.set_attribute("workflow.duration_seconds", round(total_duration, 2))
    span.set_attribute("workflow.messages_processed", message_count)
    span.set_attribute("workflow.specialist_calls", agent_calls)
    span.set_attribute("workflow.status", "completed" if run.status != "failed" else "failed")
    
    print("=" * 50)
    print("📈 Workflow Summary:")
    print(f"   ⏱️  Total Duration: {total_duration:.2f} seconds")
    print(f"   💬 Messages Processed: {message_count}")
    print(f"   🤖 Specialist Agent Calls: {agent_calls}")
    print(f"   ✅ Status: {'Completed' if run.status != 'failed' else 'Failed'}")

print("\n🎯 Multi-agent workflow trace completed!")
print("📊 Trace should now be visible in Azure AI Foundry portal!")
print("   Navigate to: Your Project → Tracing tab")
print("   Look for a trace with operation name 'multi_agent_ticket_triage'")
print("\n💡 If traces don't appear:")
print("   1. Wait 2-5 minutes for propagation")
print("   2. Check Application Insights directly in Azure Portal")
print("   3. Verify OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true is set")

## 🎯 Execute Multi-Agent Ticket Triage with Tracing

The preceding cell runs our multi-agent ticket triage system with full observability. It creates a comprehensive trace showing:

1. **Main triage operation**: The overall ticket analysis process
2. **Specialist agent calls**: Each agent's individual assessment (priority, team, effort)
3. **Token usage**: Tracking resource consumption across all agents
4. **Timing information**: Performance metrics for each step
5. **Message flow**: The complete conversation between agents

The traces will help us understand how the agents collaborate and optimize the system's performance.
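
The workflow cell measures only the total duration. To get timing for each step, one option is a small context manager that records how long each block takes. This is a sketch using only the standard library. The helper `timed` and the step names are hypothetical, and the `time.sleep` calls stand in for real SDK calls.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(step_name, durations):
    """Record the wall-clock duration of the enclosed block under step_name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        durations[step_name] = time.perf_counter() - start

durations = {}
with timed("create_message", durations):
    time.sleep(0.01)  # stand-in for creating the user message
with timed("process_run", durations):
    time.sleep(0.02)  # stand-in for processing the run

for step, seconds in durations.items():
    print(f"{step}: {seconds:.3f}s")
```

Each recorded value could then be attached to the workflow span with `span.set_attribute`, in the same way the cell records `workflow.duration_seconds`.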

In [None]:
# Delete the agent when done
print("Cleaning up agents:")
agents_client.delete_agent(agent.id)
print("Deleted triage agent.")

# Delete the connected agents when done
agents_client.delete_agent(priority_agent.id)
print("Deleted priority agent.")
agents_client.delete_agent(team_agent.id)
print("Deleted team agent.")
agents_client.delete_agent(effort_agent.id)
print("Deleted effort agent.")
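
If one delete call raises, the remaining agents in the cell above are never cleaned up. A minimal sketch of a more forgiving cleanup loop is shown below. The helper `delete_all` is hypothetical, and `fake_delete` is a stand-in so the example runs without Azure access; in the notebook, `agents_client.delete_agent` would be passed instead.

```python
def delete_all(delete_fn, agent_ids):
    """Call delete_fn for every id, collecting failures instead of stopping at the first."""
    failures = {}
    for agent_id in agent_ids:
        try:
            delete_fn(agent_id)
        except Exception as exc:  # keep going so later agents are still cleaned up
            failures[agent_id] = exc
    return failures

# Stand-in delete function for illustration only
deleted = []

def fake_delete(agent_id):
    if agent_id == "agent-2":
        raise RuntimeError("simulated failure")
    deleted.append(agent_id)

failures = delete_all(fake_delete, ["agent-1", "agent-2", "agent-3"])
print("deleted:", deleted)
print("failed:", list(failures))
```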

## 📊 Viewing and Analyzing Traces

After running the multi-agent system, you can view detailed traces in several ways:

### 🌐 Azure AI Foundry Portal
1. Navigate to your AI Foundry project
2. Click on the **Tracing** tab
3. Look for traces with operation name `multi_agent_ticket_triage`
4. Explore the trace hierarchy to see:
   - Main workflow span
   - Individual specialist agent calls
   - Message creation and processing
   - Response analysis

### 🔍 What to Look For in Traces
- **Performance bottlenecks**: Which agent takes longest to respond?
- **Token usage**: How many tokens each specialist agent consumes
- **Error patterns**: Any failures in the multi-agent workflow
- **Agent collaboration**: The sequence of specialist agent calls
- **Message flow**: Complete conversation between user and agents

### 📈 Key Metrics to Monitor
- **Total workflow duration**: End-to-end processing time
- **Individual agent response times**: Performance of each specialist
- **Token consumption**: Cost optimization opportunities
- **Success rates**: Percentage of successful triage operations
- **Message count**: Efficiency of agent communication
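
As a rough illustration of how these metrics could be aggregated across several triage runs, the sketch below summarizes a list of per-run records. The record fields mirror the `workflow.*` attributes set on the span, but the helper `summarize_runs` is hypothetical and the sample data is made up.

```python
from statistics import mean

def summarize_runs(runs):
    """Compute success rate and averages from per-run records."""
    if not runs:
        return {
            "runs": 0,
            "success_rate": None,
            "avg_duration_seconds": None,
            "avg_messages": None,
        }
    successes = sum(1 for run in runs if run["status"] == "completed")
    return {
        "runs": len(runs),
        "success_rate": successes / len(runs),
        "avg_duration_seconds": mean(run["duration_seconds"] for run in runs),
        "avg_messages": mean(run["messages"] for run in runs),
    }

# Made-up records for illustration only
sample_runs = [
    {"status": "completed", "duration_seconds": 12.0, "messages": 2},
    {"status": "completed", "duration_seconds": 8.0, "messages": 2},
    {"status": "failed", "duration_seconds": 4.0, "messages": 1},
    {"status": "completed", "duration_seconds": 16.0, "messages": 3},
]
summary = summarize_runs(sample_runs)
print(summary)
```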

### 💡 Optimization Insights
The traces can help you:
- Identify slow agents that need optimization
- Optimize prompt engineering for better performance
- Monitor resource usage across the multi-agent system
- Debug issues in agent coordination