Description
Problem
azd provision, azd deploy, and azd up do not validate auth state before starting work. When auth is expired or missing, users discover the failure after the command has already begun execution — sometimes minutes into a deployment. This is especially painful for azd up (31% success rate) where a compound provision+deploy operation fails late.
Telemetry Evidence (rolling 28 days)
The most common failure path is: user runs azd up → internally calls azd auth token → token fails → operation aborts.
| Result Code | Failures/28d | Affected Users |
|---|---|---|
| auth.login_required | 543,972 | 1,353 |
| auth.not_logged_in | 65,643 | 1,233 |
| azidentity_AuthenticationFailedError | 7,258 | 65 |
610K+ auth failures per 28 days that could be caught before any work begins.
AI agent environments are hit hardest — GitHub Copilot CLI sees a 61% auth token failure rate, and Claude Code hits 41%. These agents call azd auth token speculatively before establishing sessions.
Proposal: Auth Pre-Flight Check
Add an auth validation step at the beginning of commands that require Azure credentials (provision, deploy, up, down, monitor). The check would:
- Verify a current user exists: catch `auth.ErrNoCurrentUser` early
- Attempt a token acquisition: validate the credential is still valid (not just cached)
- Check token expiry: if the token expires within a configurable window (e.g., 5 minutes), warn the user
- Provide actionable guidance: instead of a generic failure, tell the user exactly what to do
Where to implement
The existing infrastructure supports this well:
- `ErrorHandlerPipeline` (`cli/azd/pkg/errorhandler/pipeline.go`) already matches errors by type and pattern, and wraps them with `ErrorWithSuggestion`. Auth errors could be added to `error_suggestions.yaml` with clear fix instructions.
- `ErrorMiddleware` (`cli/azd/cmd/middleware/error.go`) already classifies auth errors as `UserContextError` in `classifyError()`. A pre-flight middleware could run before the action, not just after.
- Middleware pattern: azd uses a middleware chain. A new `AuthPreFlightMiddleware` could slot in before the action runs:
```go
// Conceptual: new middleware that runs before the action
type AuthPreFlightMiddleware struct {
	authManager auth.Manager
	console     input.Console
}

func (m *AuthPreFlightMiddleware) Run(ctx context.Context, next NextFn) (*actions.ActionResult, error) {
	// Quick check: is anyone logged in? Only the error matters here,
	// so the returned user is discarded (an unused variable would not compile).
	_, err := m.authManager.GetLoggedInUser(ctx)
	if errors.Is(err, auth.ErrNoCurrentUser) {
		return nil, &internal.ErrorWithSuggestion{
			Err:        err,
			Suggestion: "Run 'azd auth login' to sign in before running this command.",
		}
	}

	// Quick check: can we acquire a token?
	_, err = m.authManager.GetToken(ctx)
	if err != nil {
		if _, ok := errors.AsType[*auth.ReLoginRequiredError](err); ok {
			return nil, &internal.ErrorWithSuggestion{
				Err:        err,
				Suggestion: "Your session has expired. Run 'azd auth login' to re-authenticate.",
			}
		}
		// Any other token error falls through, so the command surfaces it as it does today.
	}

	// No known auth problem detected: proceed with the real command
	return next(ctx)
}
```

For AI agents specifically
AI agents (Claude Code, GitHub Copilot CLI, OpenCode) would benefit from:
- Non-interactive token check: `azd auth token --check` (exit 0 if valid, exit 1 if not, no output). Agents can call this before running expensive commands.
- Machine-readable auth status: `azd auth status --output json` returning `{"logged_in": true, "expires_at": "2026-03-21T22:00:00Z", "tenant": "...", "user": "..."}`. Agents can parse this to decide whether to prompt for login.
- Structured error codes in stderr: when auth fails, emit a JSON line to stderr with the error code so agents can programmatically handle it without parsing error messages.
Expected Impact
- Pre-flight catch: Prevents ~610K failed `auth token` calls per 28 days from cascading into provision/deploy failures
- Better UX: Users see "Please run `azd auth login`" immediately, not after waiting for Bicep compilation or Docker builds
- Agent-friendly: `--check` flag lets agents validate auth state cheaply before running expensive operations
- Telemetry clarity: Pre-flight failures would get their own result code (e.g., `preflight.auth.not_logged_in`), separating preventable errors from genuine runtime auth failures
Related
- Telemetry: auth errors (login_required, not_logged_in) classified as 'unknown' error category #7233 — Auth error category classification (fixes the telemetry labeling for these same errors)