diff --git a/cli/azd/docs/tracing-in-azd.md b/cli/azd/docs/tracing-in-azd.md
index 5edc59881d3..a6660a61d35 100644
--- a/cli/azd/docs/tracing-in-azd.md
+++ b/cli/azd/docs/tracing-in-azd.md
@@ -1,5 +1,7 @@
 # Tracing in `azd`
 
+
+
 ## Overview
 
 `azd` uses [OpenTelemetry](https://opentelemetry.io/docs/concepts/signals/traces/) to collect **anonymous usage information** and **success metrics**.
@@ -30,8 +32,8 @@ Decide whether you are adding:
 
 | Type | File to Update | Description |
 | ------------- | -------------------------------------------------------- | ------------------------------------ |
-| **Attribute** | [`fields.go`](cli/azd/internal/tracing/fields/fields.go) | Defines standardized attribute keys. |
-| **Event** | [`events.go`](cli/azd/internal/tracing/events/events.go) | Defines standardized event names. |
+| **Attribute** | [`fields.go`](../internal/tracing/fields/fields.go) | Defines standardized attribute keys. |
+| **Event** | [`events.go`](../internal/tracing/events/events.go) | Defines standardized event names. |
 
 ### 2. Recording a New Event
 
@@ -71,6 +73,79 @@ This example sets a usage attribute to be included in the root command event.
 
 ---
 
+## Existing Event Taxonomy
+
+The following events already exist in [`events.go`](../internal/tracing/events/events.go). Use them instead of
+adding new events for extension and hook lifecycle telemetry.
+
+| Event | Lifecycle | Attributes to expect | Sample row |
+| ----- | --------- | -------------------- | ---------- |
+| `ext.run` | Running an installed extension command through `azd`. | Command attributes such as `cmd.entry`, `cmd.flags`, `cmd.args.count`, plus `extension.installed` on the root span. | `name=ext.run`, `cmd.entry=cmd.ai.chat`, `cmd.flags=["model"]`, `cmd.args.count=0` |
+| `ext.install` | Installing one extension version. | `extension.id` (set as soon as installation begins); `extension.version` (set after the version is resolved). On failure the span uses OpenTelemetry status `Error`; `EndWithStatus` derives the status description from the error type. | `name=ext.install`, `extension.id=microsoft.azd.ai`, `extension.version=1.2.0`, `status=Ok` |
+| `ext.upgrade` | Upgrading one extension attempt. | `extension.id`, `extension.version.from`, `extension.version.to`, `extension.source`, `extension.upgrade.duration_ms`, `extension.upgrade.outcome`. | `name=ext.upgrade`, `extension.id=microsoft.azd.ai`, `extension.version.from=1.1.0`, `extension.version.to=1.2.0`, `extension.upgrade.outcome=upgraded` |
+| `ext.promote` | Promoting an extension registry entry, such as dev to main. | `extension.id`, `extension.version.from`, `extension.version.to`, `extension.source.from`, `extension.source.to`. | `name=ext.promote`, `extension.id=microsoft.azd.ai`, `extension.source.from=dev`, `extension.source.to=main`, `status=Ok` |
+| `hooks.exec` | Executing a project, layer, or service lifecycle hook. | `hooks.name`, `hooks.type`, `hooks.kind`; status description uses hook-specific codes such as `hook.validation_failed`. | `name=hooks.exec`, `hooks.name=predeploy`, `hooks.type=service`, `hooks.kind=sh`, `status=Ok` |
+
+### Extension Attributes
+
+Extension telemetry attributes are defined in [`fields.go`](../internal/tracing/fields/fields.go).
+
+| Attribute | Description | Example |
+| --------- | ----------- | ------- |
+| `extension.id` | Extension identifier. | `microsoft.azd.ai` |
+| `extension.version` | Installed extension version. | `1.2.0` |
+| `extension.installed` | Installed extensions on a command span, each formatted as `id@version`. | `["microsoft.azd.ai@1.2.0"]` |
+| `extension.version.from` | Version before an upgrade or promotion. | `1.1.0` |
+| `extension.version.to` | Version after an upgrade or promotion. | `1.2.0` |
+| `extension.source` | Registry source used for an upgrade. | `main` |
+| `extension.source.from` | Registry source before a promotion. | `dev` |
+| `extension.source.to` | Registry source after a promotion. | `main` |
+| `extension.upgrade.duration_ms` | Upgrade duration in milliseconds. | `1532` |
+| `extension.upgrade.outcome` | Upgrade result status. | `upgraded` |
+
+### Hook Attributes
+
+`hooks.exec` spans should include the hook name and scope as soon as they are known, then add the executor kind after hook
+validation succeeds.
+
+| Attribute | Description | Example |
+| --------- | ----------- | ------- |
+| `hooks.name` | Hook name. The `azd hooks run` root command hashes unknown hook names before recording usage attributes; `hooks.exec` child spans record the resolved hook name. | `predeploy` |
+| `hooks.type` | Hook run scope. | `project`, `layer`, or `service` |
+| `hooks.kind` | Executor kind used to run the hook. | `sh`, `pwsh`, `python`, `js`, `ts`, or `dotnet` |
+
+### Error Attribute Conventions
+
+Use `MapError` from [`internal/cmd/errors.go`](../internal/cmd/errors.go) for command, extension, JSON-RPC, MCP, and
+agent spans so that error status and attributes stay consistent.
+
+| Convention | Description | Example |
+| ---------- | ----------- | ------- |
+| Span status | Failed spans mapped through `MapError` set OpenTelemetry status `Error`; the status description is the primary error code. Codes use stable families such as `auth.*`, `ext.*`, `internal.*`, `service.*`, `tool.*`, or `user.*`. | `ext.run.failed`, `service.arm.deployment.failed`, `user.canceled` |
+| `error.category` | Broad local error category, used when the error is local rather than returned by an external service. | `auth` |
+| `error.code` | Normalized local or extension error code. | `invalid_payload` |
+| `error.type` | Go error type for unclassified or suggestion-wrapped errors. | `*os.PathError` |
+| `error.service.name` | External service name after `fields.ErrorKey` prefixes `service.name` for error details. Only set this when an external service returned the error. | `arm`, `aad`, `storage` |
+| `error.service.errorCode` | Error code returned by an external service, after `fields.ErrorKey` prefixes `service.errorCode`. For ARM deployment errors this is a JSON array describing the nested error chain (see below). | `AuthorizationFailed` |
+| `error.service.statusCode` | Status code returned by an external service, after `fields.ErrorKey` prefixes `service.statusCode`. | `403` |
+
+For nested ARM deployment failures, `MapError` walks the inner error tree and encodes each level as an entry in the
+JSON array stored on `error.service.errorCode`. Each entry has the shape `{"error.code": "<code>", "error.frame": <depth>}`,
+where `error.frame` is the depth in the nested chain (0 for the outermost error). For example:
+
+```json
+[
+  {"error.code": "InvalidTemplateDeployment", "error.frame": 0},
+  {"error.code": "AuthorizationFailed", "error.frame": 1}
+]
+```
+
+Do not attach arbitrary user input or secrets to `error.*` attributes. Prefer the standardized field constants from
+`fields.go`; if you need to include a service-related field on an error span, pass it through `fields.ErrorKey` so it is
+reported under the `error.` namespace.
+
+---
+
 ## 🧪 Observing New Traces
 
 ### 1. Local Observation
diff --git a/cli/azd/pkg/extensions/manager.go b/cli/azd/pkg/extensions/manager.go
index 52abca3f334..249e29a986f 100644
--- a/cli/azd/pkg/extensions/manager.go
+++ b/cli/azd/pkg/extensions/manager.go
@@ -319,6 +319,9 @@ func (m *Manager) Install(
 	}
 
 	ctx, span := tracing.Start(ctx, events.ExtensionInstallEvent)
+	// Set the extension id immediately so failure spans can be correlated to the
+	// extension being installed. The version is added later, once it has been resolved.
+	span.SetAttributes(fields.ExtensionId.String(extension.Id))
 	defer func() {
 		span.EndWithStatus(err)
 	}()
@@ -376,6 +379,10 @@ func (m *Manager) Install(
 		return nil, fmt.Errorf("no compatible version found for extension: %s", extension.Id)
 	}
 
+	// Record the resolved version on the span so failures during install
+	// (artifact download, checksum, copy, config save) still emit it.
+	span.SetAttributes(fields.ExtensionVersion.String(selectedVersion.Version))
+
 	// Binaries are optional as long as dependencies are provided
 	// This allows for extensions that are just extension packs
 	if len(selectedVersion.Artifacts) == 0 && len(selectedVersion.Dependencies) == 0 {
@@ -530,10 +537,6 @@ func (m *Manager) Install(
 		return nil, fmt.Errorf("failed to save user config: %w", err)
 	}
 
-	span.SetAttributes(
-		fields.ExtensionId.String(extension.Id),
-		fields.ExtensionVersion.String(selectedVersion.Version))
-
 	log.Printf(
 		"Extension '%s' (version %s) installed successfully to %s\n",
 		extension.Id,
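
Reviewer note: the `manager.go` change moves attribute recording ahead of the fallible install steps, so that a span ended through the deferred `EndWithStatus(err)` still carries `extension.id` and, once resolved, `extension.version`. The sketch below illustrates that ordering in isolation. It is not the `azd` implementation: `fakeSpan`, `install`, `errDownload`, and the `resolve`/`download` callbacks are hypothetical stand-ins invented for this example, and only the attribute names come from the diff.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeSpan is a hypothetical stand-in for a tracing span. It records only
// attributes and the final status so the effect of ordering is visible.
type fakeSpan struct {
	attrs  map[string]string
	status string
}

func (s *fakeSpan) SetAttribute(key, value string) { s.attrs[key] = value }

// EndWithStatus mirrors the shape of the call in the diff: a nil error ends
// the span as Ok, any other error ends it as Error.
func (s *fakeSpan) EndWithStatus(err error) {
	if err != nil {
		s.status = "Error"
		return
	}
	s.status = "Ok"
}

// errDownload is a hypothetical failure used by the demo below.
var errDownload = errors.New("download failed")

// install sets extension.id before any fallible step and extension.version
// as soon as the version is resolved, so both survive on a failed span.
func install(
	span *fakeSpan,
	id string,
	resolve func() (string, error),
	download func() error,
) (err error) {
	span.SetAttribute("extension.id", id)
	// The deferred call observes the named result, whatever path returns.
	defer func() { span.EndWithStatus(err) }()

	version, err := resolve()
	if err != nil {
		return fmt.Errorf("resolving version: %w", err)
	}
	span.SetAttribute("extension.version", version)

	if err := download(); err != nil {
		return fmt.Errorf("downloading artifact: %w", err)
	}
	return nil
}

func main() {
	span := &fakeSpan{attrs: map[string]string{}}
	err := install(
		span,
		"microsoft.azd.ai",
		func() (string, error) { return "1.2.0", nil },
		func() error { return errDownload },
	)
	// The span failed after version resolution, yet both attributes are set.
	fmt.Println(err)
	fmt.Println(span.status, span.attrs)
}
```

With the previous ordering (attributes set only after the config save succeeded), the same failure would have produced an `Error` span with no extension attributes at all, which is the gap this change appears to close.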