Telemetry: auth errors (login_required, not_logged_in) classified as 'unknown' error category #7233

@spboyer

Problem

Auth-related errors from cmd.auth.token are correctly assigned meaningful result codes (auth.login_required, auth.not_logged_in, internal.azidentity_AuthenticationFailedError) but their error category (AzdErrorCategory in Kusto) remains "unknown" instead of being classified as "aad" or "auth".

This makes it impossible to filter auth errors by category in telemetry dashboards and inflates the "unknown" error bucket.

Telemetry Impact (rolling 28 days ending Mar 18, 2026)

| Result Code | Current Category | Count | Users |
|---|---|---|---|
| auth.login_required | unknown | 543,972 | 1,353 |
| auth.not_logged_in | unknown | 65,643 | 1,233 |
| internal.azidentity_AuthenticationFailedError | unknown | 7,258 | 65 |
| service.aad.failed | aad | 12,246 | 658 |

More than 610K auth failures per 28 days (about 617K across the three "unknown" rows above) are miscategorized. These account for ~17% of all auth token errors.

Root Cause

In cli/azd/internal/cmd/errors.go, the MapError function handles auth errors in three separate code paths, but only one sets a service/category detail attribute:

  1. auth.ReLoginRequiredError (line ~53) — sets errCode = "auth.login_required" but does NOT append fields.ServiceName.String("aad") to errDetails
  2. auth.ErrNoCurrentUser sentinel (in classifySentinel) — returns "auth.not_logged_in" but sentinel handling does NOT set any errDetails at all
  3. auth.AuthFailedError (line ~133) — correctly appends fields.ServiceName.String("aad") to errDetails

The downstream Kusto ETL derives AzdErrorCategory from the ServiceName span attribute. When it is absent, the category defaults to "unknown".
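To make the defaulting behavior concrete, here is a minimal Go sketch of the rule described above. It is only a model of what the ETL is said to do, not the Kusto query itself. The attribute key `service.name` and the function `errorCategory` are hypothetical names chosen for illustration; the actual key emitted by fields.ServiceName is not shown in this issue.

```go
package main

import "fmt"

// serviceNameKey is a hypothetical attribute key standing in for whatever
// fields.ServiceName actually emits on the span.
const serviceNameKey = "service.name"

// errorCategory models the described ETL rule: use the service-name
// attribute as the category, and fall back to "unknown" when it is absent.
func errorCategory(attrs map[string]string) string {
	if name, ok := attrs[serviceNameKey]; ok {
		return name
	}
	return "unknown"
}

func main() {
	// An AuthFailedError-style span that carries the attribute.
	withService := map[string]string{serviceNameKey: "aad"}
	// A ReLoginRequiredError-style span that carries no service attribute.
	withoutService := map[string]string{}

	fmt.Println(errorCategory(withService))
	fmt.Println(errorCategory(withoutService))
}
```

Under this model, any branch of MapError that sets only errCode ends up in the "unknown" bucket regardless of how specific the result code is.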

Suggested Fix

Add fields.ServiceName.String("aad") (or a new fields.ErrCategory.String("auth")) to the errDetails for the ReLoginRequiredError branch, and add similar detail attribution for auth-related sentinels.

In MapError():

// Current (line ~53):
} else if _, ok := errors.AsType[*auth.ReLoginRequiredError](err); ok {
    errCode = "auth.login_required"

// Suggested:
} else if _, ok := errors.AsType[*auth.ReLoginRequiredError](err); ok {
    errCode = "auth.login_required"
    errDetails = append(errDetails, fields.ServiceName.String("aad"))
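The suggested branch can be exercised in isolation with a small, self-contained sketch. Everything here is a stand-in: `ReLoginRequiredError`, `attr`, and `mapAuthError` are local, hypothetical reductions of the azd types and of MapError, and `service.name` is an assumed attribute key. The sketch uses errors.As rather than errors.AsType so that it does not depend on a specific Go version.

```go
package main

import (
	"errors"
	"fmt"
)

// ReLoginRequiredError is a local stand-in for auth.ReLoginRequiredError.
type ReLoginRequiredError struct{ msg string }

func (e *ReLoginRequiredError) Error() string { return e.msg }

// attr is a simplified stand-in for a span attribute key/value pair.
type attr struct{ Key, Value string }

// mapAuthError is a hypothetical, reduced version of the MapError branch:
// it sets the result code and also appends the service attribute.
func mapAuthError(err error) (string, []attr) {
	var details []attr
	var reLogin *ReLoginRequiredError
	if errors.As(err, &reLogin) {
		details = append(details, attr{Key: "service.name", Value: "aad"})
		return "auth.login_required", details
	}
	// Placeholder fallback; the real function has many more branches.
	return "unclassified", details
}

func main() {
	// Wrapping mirrors how the error typically reaches the mapper.
	wrapped := fmt.Errorf("fetching token: %w", &ReLoginRequiredError{msg: "login expired"})
	code, details := mapAuthError(wrapped)
	fmt.Println(code, details)
}
```

A unit test along these lines, asserting on both the code and the presence of the service attribute, would catch a regression where a future auth branch sets only errCode.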

For sentinel-based auth errors, the fix would need to either:

  • (a) Return details alongside the code from classifySentinel, or
  • (b) Check for auth sentinels before falling through to classifySentinel in the main MapError function:

// Add before the classifySentinel fallthrough:
} else if errors.Is(err, auth.ErrNoCurrentUser) {
    errCode = "auth.not_logged_in"
    errDetails = append(errDetails, fields.ServiceName.String("aad"))
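For comparison, option (a) might look roughly like the sketch below. The real signature of classifySentinel is not shown in this issue, so the three-value return used here is an assumption, as are the local `ErrNoCurrentUser` and `attr` stand-ins and the `service.name` key.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNoCurrentUser is a local stand-in for the auth.ErrNoCurrentUser sentinel.
var ErrNoCurrentUser = errors.New("not logged in")

// attr is a simplified stand-in for a span attribute key/value pair.
type attr struct{ Key, Value string }

// classifySentinel is a hypothetical variant that returns details alongside
// the result code, so callers do not need a separate auth-specific check.
func classifySentinel(err error) (code string, details []attr, ok bool) {
	switch {
	case errors.Is(err, ErrNoCurrentUser):
		return "auth.not_logged_in", []attr{{Key: "service.name", Value: "aad"}}, true
	default:
		// Not a known sentinel; the caller falls through to other handling.
		return "", nil, false
	}
}

func main() {
	err := fmt.Errorf("cmd.auth.token: %w", ErrNoCurrentUser)
	if code, details, ok := classifySentinel(err); ok {
		fmt.Println(code, details)
	}
}
```

Option (a) keeps all sentinel knowledge in one place at the cost of a signature change; option (b) is a smaller diff but splits auth sentinel handling between MapError and classifySentinel.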

A similar pattern should be applied for azidentity errors that currently fall through to the generic internal.* catch-all.

Expected Result

After this fix, the Kusto AzdErrorCategory column would show:

| Result Code | Category (after fix) |
|---|---|
| auth.login_required | aad |
| auth.not_logged_in | aad |
| internal.azidentity_AuthenticationFailedError | aad |
| service.aad.failed | aad ✅ (unchanged) |

This would remove ~610K errors/28d from the "unknown" bucket, making the overall error category distribution far more actionable for telemetry analysis.

Labels: bug (Something isn't working)
