Description
Problem
Auth-related errors from cmd.auth.token are correctly assigned meaningful result codes (auth.login_required, auth.not_logged_in, internal.azidentity_AuthenticationFailedError) but their error category (AzdErrorCategory in Kusto) remains "unknown" instead of being classified as "aad" or "auth".
This makes it impossible to filter auth errors by category in telemetry dashboards and inflates the "unknown" error bucket.
Telemetry Impact (rolling 28 days ending Mar 18, 2026)
| Result Code | Current Category | Count | Users |
|---|---|---|---|
| `auth.login_required` | unknown ❌ | 543,972 | 1,353 |
| `auth.not_logged_in` | unknown ❌ | 65,643 | 1,233 |
| `internal.azidentity_AuthenticationFailedError` | unknown ❌ | 7,258 | 65 |
| `service.aad.failed` | aad ✅ | 12,246 | 658 |
610K+ auth failures per 28 days are miscategorized. These account for ~17% of all auth token errors.
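As a quick cross-check of the 610K+ figure, the three miscategorized counts from the table can be summed. This small Go snippet only restates the table. The `miscategorized` slice and the `totalMiscategorized` helper are illustrative names and are not part of azd.

```go
package main

import "fmt"

// miscategorized holds the 28-day counts from the telemetry table whose
// category is currently "unknown".
var miscategorized = []struct {
	code  string
	count int
}{
	{"auth.login_required", 543972},
	{"auth.not_logged_in", 65643},
	{"internal.azidentity_AuthenticationFailedError", 7258},
}

// totalMiscategorized sums the counts of all miscategorized result codes.
func totalMiscategorized() int {
	total := 0
	for _, m := range miscategorized {
		total += m.count
	}
	return total
}

func main() {
	fmt.Println(totalMiscategorized()) // 616873
}
```

The sum is 616,873, which matches "610K+". The ~17% share depends on the total number of auth token errors, and that total is not listed above.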
Root Cause
In cli/azd/internal/cmd/errors.go, the MapError function handles auth errors in three separate code paths, but only one sets a service/category detail attribute:
- `auth.ReLoginRequiredError` (line ~53): sets `errCode = "auth.login_required"` but does NOT append `fields.ServiceName.String("aad")` to `errDetails`.
- `auth.ErrNoCurrentUser` sentinel (in `classifySentinel`): returns `"auth.not_logged_in"`, but sentinel handling does NOT set any `errDetails` at all.
- `auth.AuthFailedError` (line ~133): correctly appends `fields.ServiceName.String("aad")` to `errDetails` ✅
The downstream Kusto ETL derives AzdErrorCategory from the ServiceName span attribute. When it is absent, the category defaults to "unknown".
Suggested Fix
Add fields.ServiceName.String("aad") (or a new fields.ErrCategory.String("auth")) to the errDetails for the ReLoginRequiredError branch, and add similar detail attribution for auth-related sentinels.
In MapError():
```go
// Current (line ~53):
} else if _, ok := errors.AsType[*auth.ReLoginRequiredError](err); ok {
	errCode = "auth.login_required"

// Suggested:
} else if _, ok := errors.AsType[*auth.ReLoginRequiredError](err); ok {
	errCode = "auth.login_required"
	errDetails = append(errDetails, fields.ServiceName.String("aad"))
```

For sentinel-based auth errors, the fix would need to do one of the following:
- (a) Return details alongside the code from `classifySentinel`, or
- (b) Check for auth sentinels in the main `MapError` function before falling through to `classifySentinel`:
```go
// Add before the classifySentinel fallthrough:
} else if errors.Is(err, auth.ErrNoCurrentUser) {
	errCode = "auth.not_logged_in"
	errDetails = append(errDetails, fields.ServiceName.String("aad"))
```

A similar pattern should be applied to azidentity errors that currently fall through to the generic `internal.*` catch-all.
Expected Result
After this fix, the Kusto AzdErrorCategory column would show:
| Result Code | Category (after fix) |
|---|---|
| `auth.login_required` | aad ✅ |
| `auth.not_logged_in` | aad ✅ |
| `internal.azidentity_AuthenticationFailedError` | aad ✅ |
| `service.aad.failed` | aad ✅ (unchanged) |
This would remove ~610K errors/28d from the "unknown" bucket, making the overall error category distribution far more actionable for telemetry analysis.