
[Requirement] ExecutionLog Implementation for All Tools #10

@KeyOfSpectator

Description

πŸ“‹ Overview

Implement comprehensive ExecutionLog tracking for all MCP tools and functions in ack-mcp-server to provide complete observability, audit trails, and debugging capabilities.

🎯 Objectives

  • βœ… Full Traceability: Track every tool execution from start to finish
  • βœ… API Call Monitoring: Record all external API calls (ACK, ARMS, SLS, Prometheus, Kubectl)
  • βœ… Error Diagnostics: Capture detailed error context and metadata
  • βœ… Performance Metrics: Track execution duration and API latency
  • βœ… Audit Compliance: Maintain complete audit trails for security and compliance

πŸ“ Implementation Requirements

1. ExecutionLog Data Structure

All tools must use the standardized ExecutionLog model:

```python
from typing import Any, Dict, List, Optional
from pydantic import BaseModel, Field

class ExecutionLog(BaseModel):
    """Execution log model for tracking tool execution"""
    tool_call_id: str                  # Unique identifier: "{tool_name}_{cluster_id}_{timestamp}"
    start_time: str                    # ISO 8601 format: "2025-01-19T10:23:09Z"
    end_time: Optional[str] = None     # ISO 8601 format
    duration_ms: Optional[int] = None  # Total execution time in milliseconds
    messages: List[str] = Field(default_factory=list)              # Execution messages (use sparingly)
    api_calls: List[Dict[str, Any]] = Field(default_factory=list)  # Detailed API call records
    warnings: List[str] = Field(default_factory=list)              # Warning messages
    error: Optional[str] = None        # Error message (if failed)
    metadata: Optional[Dict[str, Any]] = None  # Error context (only for failures)
```

2. Success Case Logging (Concise)

For successful executions, keep logs minimal and essential:

```python
execution_log.api_calls.append({
    "api": "DescribeClusterDetail",
    "cluster_id": cluster_id,
    "request_id": "B8A0D7C3-...",
    "duration_ms": 234,
    "status": "success"
})
```

Guidelines:

  • ❌ No verbose descriptive messages
  • ❌ No metadata field population
  • βœ… Only essential fields: api, request_id, duration_ms, status
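The concise success record can be kept uniform across tools by factoring the timing and append into a small context manager. This is a sketch with a hypothetical helper name (`timed_api_call` is not part of the existing codebase):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed_api_call(api_calls: list, api: str, **fields):
    """Time the wrapped block and append one concise API-call record."""
    start = time.monotonic()
    record = {"api": api, **fields}
    try:
        yield record            # caller may add request_id etc. to the record
        record["status"] = "success"
    except Exception:
        record["status"] = "error"
        raise
    finally:
        record["duration_ms"] = int((time.monotonic() - start) * 1000)
        api_calls.append(record)
```

A call site then stays to the essential fields only, e.g. `with timed_api_call(execution_log.api_calls, "DescribeClusterDetail", cluster_id=cluster_id) as rec: rec["request_id"] = resp_request_id`.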

3. Error Case Logging (Detailed)

For failures, provide comprehensive diagnostic information:

```python
execution_log.error = "Failed to get cluster details"
execution_log.metadata = {
    "error_type": "ValueError",
    "error_code": "ClusterNotFound",
    "failure_stage": "cluster_query",
    "cluster_id": cluster_id,
    "region_id": region_id
}
```

Guidelines:

  • βœ… Detailed error messages
  • βœ… Error type and error code
  • βœ… Failure stage identification
  • βœ… Context metadata for debugging
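Assembling this metadata dict can be centralized so every tool reports the same keys. A sketch, assuming a hypothetical `build_error_metadata` helper; the `error_code`/`code` attribute lookup is an assumption about how SDK exceptions carry their service error code:

```python
def build_error_metadata(exc: Exception, failure_stage: str, **context) -> dict:
    """Build the diagnostic metadata dict for a failed execution."""
    metadata = {
        "error_type": type(exc).__name__,
        "failure_stage": failure_stage,
        **context,
    }
    # Include a service error code only if the exception exposes one
    code = getattr(exc, "error_code", None) or getattr(exc, "code", None)
    if code:
        metadata["error_code"] = code
    return metadata
```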

4. External Call Tracking Requirements

Track all external API calls with appropriate metrics:

Alibaba Cloud OpenAPI

```python
{
    "api": "DescribeClusterDetail",
    "request_id": "B8A0D7C3-...",  # From x-acs-request-id header
    "duration_ms": 234,
    "status": "success",
    "cluster_id": cluster_id
}
```

Prometheus HTTP API

```python
{
    "api": "PrometheusQuery",
    "endpoint": "https://prometheus.example.com/api/v1/query",
    "duration_ms": 856,
    "status": "success",
    "http_status": 200,
    "response_size_bytes": 3456
}
```

SLS API

```python
{
    "api": "GetLogs",
    "project": "k8s-log-project",
    "logstore": "audit-log",
    "request_id": "A7B2C6D4-...",
    "duration_ms": 567,
    "status": "success",
    "log_count": 150
}
```

Kubectl Commands

```python
{
    "api": "KubectlCommand",
    "command": "get pods -A",
    "type": "normal",  # or "streaming"
    "duration_ms": 1023,
    "exit_code": 0,
    "status": "success"
}
```
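Producing the kubectl record means timing the subprocess and capturing its exit code. A minimal sketch (shown synchronously for brevity; `run_kubectl` and the `binary` parameter are hypothetical, the latter added so the sketch is testable without a cluster):

```python
import shlex
import subprocess
import time

def run_kubectl(args: str, api_calls: list, binary: str = "kubectl") -> str:
    """Run a kubectl command and append one KubectlCommand record."""
    start = time.monotonic()
    proc = subprocess.run([binary, *shlex.split(args)],
                          capture_output=True, text=True)
    api_calls.append({
        "api": "KubectlCommand",
        "command": args,
        "type": "normal",
        "duration_ms": int((time.monotonic() - start) * 1000),
        "exit_code": proc.returncode,
        "status": "success" if proc.returncode == 0 else "error",
    })
    return proc.stdout
```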

5. Output Model Standards

All output models must inherit from BaseOutputModel:

```python
from datetime import datetime
from typing import Any, Dict

from pydantic import BaseModel, Field

class BaseOutputModel(BaseModel):
    """Base class for all tool output models"""
    execution_log: ExecutionLog = Field(
        default_factory=lambda: ExecutionLog(
            tool_call_id="",
            start_time=datetime.utcnow().isoformat() + "Z"
        ),
        description="Execution log"
    )

class YourToolOutput(BaseOutputModel):
    """Your tool output model"""
    # Your fields here
    data: Dict[str, Any]
    # execution_log is automatically inherited
```

πŸ”§ Implementation Pattern

Standard Tool Template

```python
import time
from datetime import datetime

@mcp.tool(name='your_tool')
async def your_tool(
    ctx: Context,
    param: str = Field(..., description="Parameter description"),
) -> YourToolOutput:
    # 1. Initialize execution log
    start_ms = int(time.time() * 1000)
    execution_log = ExecutionLog(
        tool_call_id=f"your_tool_{param}_{start_ms}",
        start_time=datetime.utcnow().isoformat() + "Z"
    )

    try:
        # 2. Execute business logic with API call tracking
        api_start = int(time.time() * 1000)
        response = await api_client.call_api(param)
        api_duration = int(time.time() * 1000) - api_start

        # 3. Extract request_id
        request_id = response.headers.get('x-acs-request-id', 'N/A')

        # 4. Log API call (concise for success)
        execution_log.api_calls.append({
            "api": "YourAPI",
            "param": param,
            "request_id": request_id,
            "duration_ms": api_duration,
            "status": "success"
        })

        # 5. Record end time and duration
        execution_log.end_time = datetime.utcnow().isoformat() + "Z"
        execution_log.duration_ms = int(time.time() * 1000) - start_ms

        # 6. Return with execution_log
        return YourToolOutput(
            data=response.data,
            execution_log=execution_log
        )

    except Exception as e:
        # 7. Handle errors with detailed logging; return the typed output
        #    model (not a bare dict) so the return annotation holds
        execution_log.error = str(e)
        execution_log.end_time = datetime.utcnow().isoformat() + "Z"
        execution_log.duration_ms = int(time.time() * 1000) - start_ms
        execution_log.metadata = {
            "error_type": type(e).__name__,
            "failure_stage": "api_call",
            "param": param
        }
        return YourToolOutput(
            data={"error": ErrorModel(error_code="APIError",
                                      error_message=str(e)).model_dump()},
            execution_log=execution_log
        )
```
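Steps 5 and 7 both stamp `end_time` and `duration_ms`; that repetition can be hoisted into one helper called from both branches. A sketch with a hypothetical `finalize_log` name, using a timezone-aware `datetime` for the same `...Z` output shape:

```python
import time
from datetime import datetime, timezone

def finalize_log(execution_log, start_ms: int) -> None:
    """Stamp end_time (ISO 8601 UTC, 'Z' suffix) and total duration_ms."""
    execution_log.end_time = (
        datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")
    )
    execution_log.duration_ms = int(time.time() * 1000) - start_ms
```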

πŸ§ͺ Testing Requirements

For each implemented tool, verify:

  1. Success Path

    • βœ… execution_log is present in response
    • βœ… All API calls are logged with request_id
    • βœ… Duration is calculated correctly
    • βœ… Logs are concise (no verbose messages)
  2. Error Path

    • βœ… error field is populated
    • βœ… metadata contains diagnostic information
    • βœ… failure_stage is identified
    • βœ… API calls before failure are logged
  3. Performance

    • βœ… Logging overhead is minimal (<5ms)
    • βœ… No blocking operations in logging code
    • βœ… Large responses don't cause memory issues
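The success-path checks above can be expressed as one reusable assertion helper run against a serialized log in each tool's tests. A sketch (the `check_success_log` name is hypothetical):

```python
def check_success_log(execution_log: dict) -> None:
    """Assert the success-path invariants on a serialized execution log."""
    assert execution_log.get("error") is None
    assert execution_log.get("duration_ms") is not None
    for call in execution_log["api_calls"]:
        assert call["status"] == "success"
        assert call["duration_ms"] >= 0
    # Logs must stay concise: no verbose messages on the success path
    assert not execution_log["messages"]
```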

πŸ“ˆ Benefits

  • πŸ” Debugging: Rapid issue identification with complete execution context
  • πŸ“Š Monitoring: Real-time performance metrics and API latency tracking
  • πŸ”’ Audit: Complete audit trails for compliance and security
  • πŸš€ Optimization: Identify performance bottlenecks with timing data
  • πŸ€– AI Observability: Enable AI agents to understand execution flow
