Skip to content

Feature: Pre-flight auth validation for provision/deploy/up to prevent downstream failures #7234

@spboyer

Description

@spboyer

Problem

azd provision, azd deploy, and azd up do not validate auth state before starting work. When auth is expired or missing, users discover the failure after the command has already begun execution — sometimes minutes into a deployment. This is especially painful for azd up (31% success rate) where a compound provision+deploy operation fails late.

Telemetry Evidence (rolling 28 days)

The most common failure path is: user runs azd up → internally calls azd auth token → token fails → operation aborts.

Result Code Failures/28d Affected Users
auth.login_required 543,972 1,353
auth.not_logged_in 65,643 1,233
azidentity_AuthenticationFailedError 7,258 65

610K+ auth failures per 28 days that could be caught before any work begins.

AI agent environments are hit hardest — GitHub Copilot CLI sees a 61% auth token failure rate, and Claude Code hits 41%. These agents call azd auth token speculatively before establishing sessions.

Proposal: Auth Pre-Flight Check

Add an auth validation step at the beginning of commands that require Azure credentials (provision, deploy, up, down, monitor). The check would:

  1. Verify a current user exists — catch auth.ErrNoCurrentUser early
  2. Attempt a token acquisition — validate the credential is still valid (not just cached)
  3. Check token expiry — if the token expires within a configurable window (e.g., 5 minutes), warn the user
  4. Provide actionable guidance — instead of a generic failure, tell the user exactly what to do

Where to implement

The existing infrastructure supports this well:

  • ErrorHandlerPipeline (cli/azd/pkg/errorhandler/pipeline.go) already matches errors by type and pattern, and wraps them with ErrorWithSuggestion. Auth errors could be added to error_suggestions.yaml with clear fix instructions.
  • ErrorMiddleware (cli/azd/cmd/middleware/error.go) already classifies auth errors as UserContextError in classifyError(). A pre-flight middleware could run before the action, not just after.
  • Middleware pattern — azd uses a middleware chain. A new AuthPreFlightMiddleware could slot in before the action runs:
// Conceptual — new middleware that runs before the action
type AuthPreFlightMiddleware struct {
    authManager auth.Manager
    console     input.Console
}

func (m *AuthPreFlightMiddleware) Run(ctx context.Context, next NextFn) (*actions.ActionResult, error) {
    // Quick check: is anyone logged in?
    currentUser, err := m.authManager.GetLoggedInUser(ctx)
    if errors.Is(err, auth.ErrNoCurrentUser) {
        return nil, &internal.ErrorWithSuggestion{
            Err:        err,
            Suggestion: "Run 'azd auth login' to sign in before running this command.",
        }
    }

    // Quick check: can we acquire a token?
    _, err = m.authManager.GetToken(ctx)
    if err != nil {
        if _, ok := errors.AsType[*auth.ReLoginRequiredError](err); ok {
            return nil, &internal.ErrorWithSuggestion{
                Err:        err,
                Suggestion: "Your session has expired. Run 'azd auth login' to re-authenticate.",
            }
        }
    }

    // Auth is good — proceed with the real command
    return next(ctx)
}

For AI agents specifically

AI agents (Claude Code, GitHub Copilot CLI, OpenCode) would benefit from:

  1. Non-interactive token checkazd auth token --check (exit 0 if valid, exit 1 if not, no output). Agents can call this before running expensive commands.
  2. Machine-readable auth statusazd auth status --output json returning {"logged_in": true, "expires_at": "2026-03-21T22:00:00Z", "tenant": "...", "user": "..."}. Agents can parse this to decide whether to prompt for login.
  3. Structured error codes in stderr — when auth fails, emit a JSON line to stderr with the error code so agents can programmatically handle it without parsing error messages.

Expected Impact

  • Pre-flight catch: Prevents ~610K failed auth token calls per 28 days from cascading into provision/deploy failures
  • Better UX: Users see "Please run azd auth login" immediately, not after waiting for Bicep compilation or Docker builds
  • Agent-friendly: --check flag lets agents validate auth state cheaply before running expensive operations
  • Telemetry clarity: Pre-flight failures would get their own result code (e.g., preflight.auth.not_logged_in), separating preventable errors from genuine runtime auth failures

Related

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions