## Overview
Implement comprehensive ExecutionLog tracking for all MCP tools and functions in ack-mcp-server to provide complete observability, audit trails, and debugging capabilities.
## Objectives
- **Full Traceability**: Track every tool execution from start to finish
- **API Call Monitoring**: Record all external API calls (ACK, ARMS, SLS, Prometheus, Kubectl)
- **Error Diagnostics**: Capture detailed error context and metadata
- **Performance Metrics**: Track execution duration and API latency
- **Audit Compliance**: Maintain complete audit trails for security and compliance
## Implementation Requirements
### 1. ExecutionLog Data Structure
All tools must use the standardized ExecutionLog model:
```python
from typing import Any, Dict, List, Optional
from pydantic import BaseModel, Field

class ExecutionLog(BaseModel):
    """Execution log model for tracking tool execution"""
    tool_call_id: str                                              # Unique identifier: "{tool_name}_{cluster_id}_{timestamp}"
    start_time: str                                                # ISO 8601 format: "2025-01-19T10:23:09Z"
    end_time: Optional[str] = None                                 # ISO 8601 format
    duration_ms: Optional[int] = None                              # Total execution time in milliseconds
    messages: List[str] = Field(default_factory=list)              # Execution messages (use sparingly)
    api_calls: List[Dict[str, Any]] = Field(default_factory=list)  # Detailed API call records
    warnings: List[str] = Field(default_factory=list)              # Warning messages
    error: Optional[str] = None                                    # Error message (if failed)
    metadata: Optional[Dict[str, Any]] = None                      # Error context (only for failures)
```

### 2. Success Case Logging (Concise)
For successful executions, keep logs minimal and essential:
```python
execution_log.api_calls.append({
    "api": "DescribeClusterDetail",
    "cluster_id": cluster_id,
    "request_id": "B8A0D7C3-...",
    "duration_ms": 234,
    "status": "success"
})
```
Guidelines:
- No verbose descriptive messages
- No `metadata` field population
- Only essential fields: `api`, `request_id`, `duration_ms`, `status`
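These rules are easy to centralize in a small timing wrapper. A minimal sketch, assuming `api_calls` is the list on an `ExecutionLog` instance; the `track_api_call` helper and the sample IDs are illustrative, not part of ack-mcp-server:

```python
import time
from contextlib import contextmanager

@contextmanager
def track_api_call(api_calls, api, **fields):
    """Append one concise api_calls record; timing covers the with-block body."""
    record = {"api": api, **fields}
    start = time.monotonic()
    try:
        yield record
        record["status"] = "success"
    except Exception:
        record["status"] = "failed"
        raise
    finally:
        record["duration_ms"] = int((time.monotonic() - start) * 1000)
        api_calls.append(record)

api_calls = []
with track_api_call(api_calls, "DescribeClusterDetail", cluster_id="c-example") as rec:
    rec["request_id"] = "B8A0D7C3-..."  # taken from the response header in real code
```

On an exception inside the `with` block the record is still appended with `status: "failed"`, so the calls made before a failure remain in the trace.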
### 3. Error Case Logging (Detailed)
For failures, provide comprehensive diagnostic information:
```python
execution_log.error = "Failed to get cluster details"
execution_log.metadata = {
    "error_type": "ValueError",
    "error_code": "ClusterNotFound",
    "failure_stage": "cluster_query",
    "cluster_id": cluster_id,
    "region_id": region_id
}
```
Guidelines:
- Detailed error messages
- Error type and error code
- Failure stage identification
- Context metadata for debugging
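The diagnostic fields above can be assembled in one place. A hedged sketch: the `build_error_metadata` name, the `.code` attribute lookup, and the sample values are assumptions for illustration, not ack-mcp-server APIs:

```python
def build_error_metadata(exc, failure_stage, **context):
    """Assemble the metadata dict recorded on ExecutionLog for a failure."""
    return {
        "error_type": type(exc).__name__,
        # Some SDK exceptions carry a service error code attribute;
        # fall back to the exception class name when absent.
        "error_code": getattr(exc, "code", type(exc).__name__),
        "failure_stage": failure_stage,
        **context,  # e.g. cluster_id, region_id
    }

try:
    raise ValueError("cluster not found")
except ValueError as e:
    metadata = build_error_metadata(
        e, "cluster_query", cluster_id="c-example", region_id="cn-hangzhou"
    )
```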
### 4. External Call Tracking Requirements
Track all external API calls with appropriate metrics:
**Alibaba Cloud OpenAPI**
```python
{
    "api": "DescribeClusterDetail",
    "request_id": "B8A0D7C3-...",  # From x-acs-request-id header
    "duration_ms": 234,
    "status": "success",
    "cluster_id": cluster_id
}
```
**Prometheus HTTP API**
```python
{
    "api": "PrometheusQuery",
    "endpoint": "https://prometheus.example.com/api/v1/query",
    "duration_ms": 856,
    "status": "success",
    "http_status": 200,
    "response_size_bytes": 3456
}
```
**SLS API**
```python
{
    "api": "GetLogs",
    "project": "k8s-log-project",
    "logstore": "audit-log",
    "request_id": "A7B2C6D4-...",
    "duration_ms": 567,
    "status": "success",
    "log_count": 150
}
```
**Kubectl Commands**
```python
{
    "api": "KubectlCommand",
    "command": "get pods -A",
    "type": "normal",  # or "streaming"
    "duration_ms": 1023,
    "exit_code": 0,
    "status": "success"
}
```

### 5. Output Model Standards
All output models must inherit from BaseOutputModel:
```python
from datetime import datetime

class BaseOutputModel(BaseModel):
    """Base class for all tool output models"""
    execution_log: ExecutionLog = Field(
        default_factory=lambda: ExecutionLog(
            tool_call_id="",
            start_time=datetime.utcnow().isoformat() + "Z"
        ),
        description="Execution log"
    )

class YourToolOutput(BaseOutputModel):
    """Your tool output model"""
    # Your fields here
    data: Dict[str, Any]
    # execution_log is automatically inherited
```

## Implementation Pattern
### Standard Tool Template
```python
@mcp.tool(name='your_tool')
async def your_tool(
    ctx: Context,
    param: str = Field(..., description="Parameter description"),
) -> YourToolOutput:
    # 1. Initialize execution log
    start_ms = int(time.time() * 1000)
    execution_log = ExecutionLog(
        tool_call_id=f"your_tool_{param}_{start_ms}",
        start_time=datetime.utcnow().isoformat() + "Z"
    )
    try:
        # 2. Execute business logic with API call tracking
        api_start = int(time.time() * 1000)
        response = await api_client.call_api(param)
        api_duration = int(time.time() * 1000) - api_start

        # 3. Extract request_id
        request_id = response.headers.get('x-acs-request-id', 'N/A')

        # 4. Log API call (concise for success)
        execution_log.api_calls.append({
            "api": "YourAPI",
            "param": param,
            "request_id": request_id,
            "duration_ms": api_duration,
            "status": "success"
        })

        # 5. Record end time and duration
        execution_log.end_time = datetime.utcnow().isoformat() + "Z"
        execution_log.duration_ms = int(time.time() * 1000) - start_ms

        # 6. Return with execution_log
        return YourToolOutput(
            data=response.data,
            execution_log=execution_log
        )
    except Exception as e:
        # 7. Handle errors with detailed logging
        execution_log.error = str(e)
        execution_log.end_time = datetime.utcnow().isoformat() + "Z"
        execution_log.duration_ms = int(time.time() * 1000) - start_ms
        execution_log.metadata = {
            "error_type": type(e).__name__,
            "failure_stage": "api_call",
            "param": param
        }
        return {
            "error": ErrorModel(error_code="APIError", error_message=str(e)).model_dump(),
            "execution_log": execution_log
        }
```

## Testing Requirements
For each implemented tool, verify:
**Success Path**
- [ ] `execution_log` is present in response
- [ ] All API calls are logged with `request_id`
- [ ] Duration is calculated correctly
- [ ] Logs are concise (no verbose messages)

**Error Path**
- [ ] `error` field is populated
- [ ] `metadata` contains diagnostic information
- [ ] `failure_stage` is identified
- [ ] API calls before failure are logged

**Performance**
- [ ] Logging overhead is minimal (<5ms)
- [ ] No blocking operations in logging code
- [ ] Large responses don't cause memory issues
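The success-path items can be turned into a reusable assertion helper. A sketch operating on a hand-built response dict; the helper name and sample values are illustrative, and the field names follow the ExecutionLog model defined earlier:

```python
def check_success_log(response):
    """Assert the success-path checklist against a tool response in dict form."""
    log = response["execution_log"]
    assert log is not None, "execution_log missing from response"
    for call in log["api_calls"]:
        assert call.get("request_id"), "API call logged without request_id"
    assert log["duration_ms"] is not None and log["duration_ms"] >= 0
    assert not log["messages"], "success logs should stay concise"
    return True

ok = check_success_log({
    "execution_log": {
        "api_calls": [{
            "api": "DescribeClusterDetail",
            "request_id": "B8A0D7C3-...",
            "duration_ms": 234,
            "status": "success",
        }],
        "duration_ms": 240,
        "messages": [],
    }
})
```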
## Benefits
- **Debugging**: Rapid issue identification with complete execution context
- **Monitoring**: Real-time performance metrics and API latency tracking
- **Audit**: Complete audit trails for compliance and security
- **Optimization**: Identify performance bottlenecks with timing data
- **AI Observability**: Enable AI agents to understand execution flow