Route error spans through MapError to reduce UnknownFailure bucket #7241
rajeshkamal5050 merged 1 commit into main
#7239) Several high-volume code paths used EndWithStatus(err), which bypasses MapError(), the central error classification function. This caused errors to appear with raw Go type names or empty descriptions in telemetry, inflating the UnknownFailure bucket.
Changes:
- cmd/mcp.go: MCP tool handler now calls MapError(err, span) instead of EndWithStatus(err), giving MCP tool failures proper telemetry codes (e.g., service.arm.400 instead of azcore_ResponseError)
- internal/agent/copilot_agent.go: Initialize() and ensureSession() now route errors through cmd.MapError for structured classification
- internal/tracing/tracer.go: EndWithStatus() now prefixes error descriptions with 'internal.' to match MapError's catch-all convention, ensuring remaining callers in pkg/ (container_helper, extensions/manager) produce consistent internal.* codes
Fixes #7239
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Routes more error paths through the centralized internal/cmd.MapError classification so telemetry spans emit structured result codes (reducing the UnknownFailure bucket) instead of raw Go type names.
Changes:
- Updated MCP tool handler spans to call `internal/cmd.MapError(err, span)` before ending spans.
- Updated Copilot agent spans in `Initialize()` and `ensureSession()` to use `cmd.MapError` for error classification.
- Adjusted `Span.EndWithStatus(err)` fallback to prefix raw error type descriptions with `internal.` for callers that can't use `internal/cmd.MapError`.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| cli/azd/internal/tracing/tracer.go | Prefixes EndWithStatus error status descriptions with internal. to align with the repo’s catch-all convention. |
| cli/azd/internal/agent/copilot_agent.go | Routes Copilot agent span errors through cmd.MapError and ends spans explicitly. |
| cli/azd/cmd/mcp.go | Routes MCP tool handler span errors through internal/cmd.MapError and ends spans explicitly. |
| cli/azd/.vscode/cspell.yaml | Adds internalcmd to cspell overrides for cmd/mcp.go. |
Telemetry Context: 1,017 Opaque Errors This PR Will Help Classify
Deep dive into 1,017 errors (21.6%) have
Where the opaque errors come from
Container Apps (all variants) = 47% of the opaque errors.
Why this matters beyond classification
The ARM response body contains an
Currently, 366 unique machines hit these opaque errors. 65% retry and fail again; they have no information to guide a fix.
Summary
Fixes #7239
The `UnknownFailure` result code is the single largest error bucket in telemetry: 2.35M errors/28d with zero classification. Root cause: several high-volume code paths used `span.EndWithStatus(err)`, which bypasses `MapError()`, the central error classification function.
Root Cause
`EndWithStatus(err)` calls `errorDescription(err)`, which produces a raw Go type name (e.g., `errors_errorString`) as the span status description. When this is empty or unrecognized, the telemetry converter falls back to `"UnknownFailure"`.
Meanwhile, `MapError()` produces structured codes like `service.arm.400`, `auth.login_required`, `tool.docker.failed`, but these paths bypass it.
Changes
1. MCP tool handler (`cmd/mcp.go`)
Replaced `span.EndWithStatus(err)` with `internalcmd.MapError(err, span)` + `span.End()`. Every MCP tool failure now gets proper telemetry classification.
2. Copilot agent (`internal/agent/copilot_agent.go`)
Both `Initialize()` and `ensureSession()` now route errors through `cmd.MapError` instead of `EndWithStatus`.
3. `EndWithStatus` fallback (`internal/tracing/tracer.go`)
For remaining callers in `pkg/` (container_helper.go, extensions/manager.go) that cannot import `internal/cmd`, `EndWithStatus()` now prefixes error descriptions with `internal.` to match `MapError`'s catch-all convention. This ensures consistent `internal.*` codes instead of bare type names.
Expected Impact
| Before | After |
|---|---|
| `UnknownFailure` or raw type name | `service.arm.400`, `auth.login_required`, etc. |
| `errors_errorString` | `internal.errors_errorString` (consistent convention) |

This should significantly reduce the 2.35M `UnknownFailure` bucket by routing errors through the existing classification infrastructure.