Skip to content

Pin cfg.Profile to the resolved name in resolveDefaultProfile#5280

Open
simonfaltum wants to merge 9 commits into
mainfrom
simonfaltum/default-profile-cache-key
Open

Pin cfg.Profile to the resolved name in resolveDefaultProfile#5280
simonfaltum wants to merge 9 commits into
mainfrom
simonfaltum/default-profile-cache-key

Conversation

@simonfaltum
Copy link
Copy Markdown
Member

@simonfaltum simonfaltum commented May 20, 2026

Why

databricks auth login --profile DEFAULT --host ... followed by a no-flag databricks auth describe (or any other command that needs the U2M token) fails when secure storage is in use:

Unable to authenticate: A new access token could not be retrieved because the refresh token is invalid.

databricks auth describe --profile DEFAULT works. Running the same flow under DATABRICKS_AUTH_STORAGE=plaintext also works. So the bug is specific to secure storage + the implicit DEFAULT fallback.

Root cause is a cache-key mismatch between login and read:

  • cmd/auth/login.go:222 hardcodes profileName = "DEFAULT" when no --profile is given, so the OAuthArgument's cache key is the literal string "DEFAULT". The token lands in the keyring under account "DEFAULT".
  • On the read path, cfg.Profile starts empty, resolveDefaultProfile only consults [__settings__].default_profile (so it stays empty), and the SDK's configFileLoader.Configure (config_file.go:103-105) loads [DEFAULT]'s values but deliberately leaves cfg.Profile empty when it falls back (isFallback=true). CLICredentials.Configure then builds an OAuthArgument with profile="", so GetCacheKey() falls back to GetHostCacheKey() and the lookup goes to the host URL, not "DEFAULT". Miss.

plaintext mode masks the same mismatch with DualWritingTokenCache, which mirrors every write under the host key — so reads via host URL still find the token. secure mode does not dual-write, so the bug surfaces.

This is a pre-existing bug independent of toggling secure-storage by default, but doing so turns a corner case into the default experience. The fix here is targeted enough to land standalone.

A defense-in-depth followup in databricks-sdk-go will drop the SDK-side if !isFallback gate so all SDK consumers benefit from the same self-consistency. The CLI fix lands first so secure-storage users are unblocked without waiting on an SDK release cycle.

Changes

  • cmd/root/auth.go: resolveDefaultProfile swaps databrickscfg.ResolveDefaultProfile (settings-only) for databrickscfg.GetDefaultProfile, which already does the full 4-step resolution: [__settings__].default_profile → the only profile in the file → [DEFAULT] → empty. The SDK then sees a non-empty cfg.Profile, takes the isFallback=false branch, and the name flows through to CLICredentials.Configure. OAuthArgument's cache key now matches what login wrote.
  • cmd/root/bundle.go is intentionally NOT touched: bundles deliberately limit their fallback to [__settings__].default_profile so a hostless bundle does not get silently routed at a [DEFAULT] profile pointing at the wrong workspace. That comment in bundle.go:74-80 stays load-bearing.
  • cmd/root/auth_test.go:
    • TestMustWorkspaceClientWithoutConfiguredDefaultFallsBackToDefaultSection now asserts cfg.Profile == "DEFAULT" (was ""). The previous assertion documented the bug; the new one documents the contract.
    • New table-driven TestResolveDefaultProfile covers the full resolution order: preset cfg.Profile, DATABRICKS_CONFIG_PROFILE env, [__settings__].default_profile, single profile, [DEFAULT] section among many, no fallback, missing file.
  • NEXT_CHANGELOG.md: one-line entry describing the fix and the mismatch it removes.

Test plan

  • task checks clean
  • task lint-q clean
  • go test ./cmd/root/... ./cmd/auth/... ./libs/databrickscfg/... passes
  • go test ./acceptance -run 'TestAccept/cmd/auth' passes
  • Manual repro of Pieter's case (auth login --profile DEFAULT --host ... then auth describe with no flag under secure storage) succeeds after this PR; the same flow on main fails.
  • Verify bundle resolution is unaffected: a bundle without workspace.host and no --profile still uses [__settings__].default_profile only (no silent DEFAULT routing).

This pull request and its description were written by Isaac.

When `--profile` and `DATABRICKS_CONFIG_PROFILE` are unset, the SDK's
configFileLoader silently loads the [DEFAULT] section's values into cfg
but deliberately leaves cfg.Profile empty (config_file.go isFallback
branch). This produces a host-URL OAuth cache key on the read path,
while `databricks auth login` stores tokens under the literal profile
name "DEFAULT" (cmd/auth/login.go hardcodes that when no profile is
given). The two cache keys never match.

plaintext mode hid this with the DualWritingTokenCache wrapper, which
mirrors every write under the host key, so reads via host URL succeeded
anyway. secure mode does not dual-write, so the same code path surfaces
as "cache: token not found" on a fresh keyring or an
InvalidRefreshTokenError when a stale host-key entry happens to be
present. Pieter Noordhuis reproduced this on dogfood: `auth login
--profile DEFAULT --host ...` followed by a no-flag `auth describe`
fails, but `auth describe --profile DEFAULT` (or running the same flow
under DATABRICKS_AUTH_STORAGE=plaintext) succeeds.

The fix pins cfg.Profile before the SDK loader runs by swapping the
settings-only ResolveDefaultProfile call for GetDefaultProfile, which
already does the 4-step resolution: [__settings__].default_profile,
then the only profile in the file, then [DEFAULT], else empty. The
SDK then sees a non-empty cfg.Profile and uses it explicitly
(isFallback=false), so cfg.Profile stays set through to
CLICredentials.Configure and the OAuthArgument's cache key matches
what login wrote.

cmd/root/bundle.go is intentionally not touched: bundles deliberately
limit their fallback to [__settings__].default_profile to avoid
silently routing a hostless bundle at a [DEFAULT] profile that points
at the wrong workspace.

The existing
TestMustWorkspaceClientWithoutConfiguredDefaultFallsBackToDefaultSection
now asserts cfg.Profile == "DEFAULT" (was ""), which documents the new
contract. New table-driven TestResolveDefaultProfile covers the full
resolution order (preset cfg.Profile, env var, settings, single
profile, [DEFAULT], no fallback, missing file).

A defense-in-depth followup in databricks-sdk-go is planned to drop the
SDK-side `if !isFallback` gate so any SDK consumer benefits from the
same self-consistency; the CLI fix lands first so secure-storage users
are unblocked without waiting on an SDK release cycle.

Co-authored-by: Isaac
Two follow-up fixes to PR #5280 after CI surfaced issues:

1. govet copylocks: the new TestResolveDefaultProfile table held a
   config.Config value and the per-case loop ranged over it, copying
   the embedded sync.Mutex. Hold the preset profile as a string field
   instead and construct cfg := &config.Config{Profile: ...} per case.

2. TestMustAnyClientCanCreateAccountClient regressed. The test fixture
   is a single account-only profile. Previously cfg.Profile stayed
   empty through resolveDefaultProfile (no settings) so MustWorkspaceClient
   ended up with no host, AskForWorkspaceProfile returned
   ErrNoWorkspaceProfiles (the profile is account-only), and MustAnyClient
   fell through to MustAccountClient. My initial fix used
   databrickscfg.GetDefaultProfile, whose 4-step resolution includes
   "if there is exactly one profile in the file, return it" and so
   picked up the account profile, then NewWorkspaceClient succeeded
   (SDK v0.125+ no longer rejects on host type) and the test got a
   workspace client back instead of an account client.

   The single-profile fallback is a CLI prompt-seeding convenience,
   not auth-resolution rule. The SDK's resolveProfile only does
   settings + [DEFAULT]; the CLI's resolveDefaultProfile should mirror
   that exactly.

Add databrickscfg.GetAuthDefaultProfile, scoped to settings + [DEFAULT]
only, and use it in resolveDefaultProfile. GetDefaultProfile is left
unchanged for its existing callers (logout, switch, token, login,
profiles_test) which use it for "show this as the default in a prompt"
and where single-profile fallback is desirable. New table-driven
TestGetAuthDefaultProfile and TestGetAuthDefaultProfile_NoFile cover
the new function.

Co-authored-by: Isaac
…e-key' into simonfaltum/default-profile-cache-key
Single-profile fallback is no longer applied (see prior commit).

Co-authored-by: Isaac
The function/test comments were narrating the full PR description.
Trim to the load-bearing parts: what the function does, the one
non-obvious invariant (cache-key match with login), and the bundle
exception. The longer story stays in the PR description and commit
log.

Co-authored-by: Isaac
Comment thread libs/databrickscfg/ops.go Outdated
return ini.DefaultSection, nil
}
return "", nil
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not part of ResolveDefaultProfile?

Then the function in auth.go can remain unchanged.

The only thing this does extra is the ini probe.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ResolveDefaultProfile is also called from cmd/root/bundle.go:80, which deliberately only does the settings lookup; see the comment immediately above the call:

Fall back to [settings].default_profile only when the bundle does not pin its own workspace.host. ... a default_profile here could route the user to a profile that points at the wrong workspace, masking config errors.

So folding the [DEFAULT] section probe into ResolveDefaultProfile changes the bundle's deploy resolution: a hostless bundle would now silently fall back to whatever [DEFAULT] points at. That's what the bundle comment was added to prevent.

Two ways to do what you're suggesting:

  1. Keep ResolveDefaultProfile as-is and let auth use a separate function (current PR).
  2. Promote the new behavior into ResolveDefaultProfile and switch bundle to GetConfiguredDefaultProfile (or a new narrow wrapper) directly.

Happy with either. If you prefer (2), I can rename GetAuthDefaultProfile -> ResolveDefaultProfile, drop the current ResolveDefaultProfile, and update bundle.go. Let me know.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 sounds cleanest.

Btw, there are other callers of ResolveDefaultProfile that need to benefit from this change.

For ex, otherwise auth token would not resolve DEFAULT either.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auth token afaik never did, you always needed to be specific on what you wanted - at least that is the behavior today

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in c11c17c. Extended ResolveDefaultProfile to also probe [DEFAULT], dropped GetAuthDefaultProfile, reverted resolveDefaultProfile in cmd/root/auth.go to the original one-liner. Every existing caller (cmd/api, cmd/auth/token, cmd/auth/describe via cmd/root/auth, cmd/root/bundle) now benefits — same scope of fallback that already accepts [settings].default_profile. Net -61 lines.

simonfaltum and others added 3 commits May 20, 2026 21:32
Per Pieter's review: instead of a separate GetAuthDefaultProfile, extend
the existing ResolveDefaultProfile to do the [DEFAULT] section probe
when [__settings__].default_profile is unset. Every caller that already
accepts default_profile (cmd/api, cmd/auth/token, cmd/auth/describe via
cmd/root/auth, cmd/root/bundle) now also benefits from the [DEFAULT]
fallback. By Simon's principle: same scope, same risk profile — if
default_profile is OK here, so is [DEFAULT].

Bundle keeps the same call site; the existing safeguard ("only if the
bundle does not pin workspace.host") still applies and now covers
[DEFAULT] too.

Drop GetAuthDefaultProfile, its tests, and the special case in
cmd/root/auth.go's resolveDefaultProfile (which goes back to a thin
wrapper around ResolveDefaultProfile).

Update the changelog entry to name the actual set of beneficiaries.

Co-authored-by: Isaac
Two follow-ups to the consolidation:

1. cmd/root/bundle.go now uses GetConfiguredDefaultProfile so the
   bundle path stays strictly on [__settings__].default_profile, even
   though ResolveDefaultProfile is now broader. A hostless bundle
   silently routing to whatever [DEFAULT] points at would mask a
   missing workspace.host. Updated the inline comment to reflect that
   the [DEFAULT] fallback is auth-specific. Matches Pieter's preference
   and the GPT review's concern.

2. cmd/root/auth.go and cmd/root/auth_test.go are back to having no
   functional changes in this PR. The fallback's behavior change lives
   entirely in libs/databrickscfg.ResolveDefaultProfile. Dropped the
   doc-comment addition that explained the OAuth cache-key story (it
   belongs on ResolveDefaultProfile, where it already is) and the new
   TestResolveDefaultProfile (its useful cases duplicate
   libs/databrickscfg/ops_test.go; the wrapper itself is four lines).
   The only remaining cmd/root/auth_test.go change is the one
   load-bearing assertion in
   TestMustWorkspaceClientWithoutConfiguredDefaultFallsBackToDefaultSection
   ("" -> "DEFAULT") that documents the new contract.

NEXT_CHANGELOG.md: clarified that bundle commands intentionally stay
on settings-only.

Co-authored-by: Isaac
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants