Pin cfg.Profile to the resolved name in resolveDefaultProfile#5280
Pin cfg.Profile to the resolved name in resolveDefaultProfile#5280simonfaltum wants to merge 9 commits into
Conversation
When `--profile` and `DATABRICKS_CONFIG_PROFILE` are unset, the SDK's configFileLoader silently loads the [DEFAULT] section's values into cfg but deliberately leaves cfg.Profile empty (config_file.go isFallback branch). This produces a host-URL OAuth cache key on the read path, while `databricks auth login` stores tokens under the literal profile name "DEFAULT" (cmd/auth/login.go hardcodes that when no profile is given). The two cache keys never match. plaintext mode hid this with the DualWritingTokenCache wrapper, which mirrors every write under the host key, so reads via host URL succeeded anyway. secure mode does not dual-write, so the same code path surfaces as "cache: token not found" on a fresh keyring or an InvalidRefreshTokenError when a stale host-key entry happens to be present. Pieter Noordhuis reproduced this on dogfood: `auth login --profile DEFAULT --host ...` followed by a no-flag `auth describe` fails, but `auth describe --profile DEFAULT` (or running the same flow under DATABRICKS_AUTH_STORAGE=plaintext) succeeds. The fix pins cfg.Profile before the SDK loader runs by swapping the settings-only ResolveDefaultProfile call for GetDefaultProfile, which already does the 4-step resolution: [__settings__].default_profile, then the only profile in the file, then [DEFAULT], else empty. The SDK then sees a non-empty cfg.Profile and uses it explicitly (isFallback=false), so cfg.Profile stays set through to CLICredentials.Configure and the OAuthArgument's cache key matches what login wrote. cmd/root/bundle.go is intentionally not touched: bundles deliberately limit their fallback to [__settings__].default_profile to avoid silently routing a hostless bundle at a [DEFAULT] profile that points at the wrong workspace. The existing TestMustWorkspaceClientWithoutConfiguredDefaultFallsBackToDefaultSection now asserts cfg.Profile == "DEFAULT" (was ""), which documents the new contract. New table-driven TestResolveDefaultProfile covers the full resolution order (preset cfg.Profile, env var, settings, single profile, [DEFAULT], no fallback, missing file). A defense-in-depth followup in databricks-sdk-go is planned to drop the SDK-side `if !isFallback` gate so any SDK consumer benefits from the same self-consistency; the CLI fix lands first so secure-storage users are unblocked without waiting on an SDK release cycle. Co-authored-by: Isaac
Two follow-up fixes to PR #5280 after CI surfaced issues: 1. govet copylocks: the new TestResolveDefaultProfile table held a config.Config value and the per-case loop ranged over it, copying the embedded sync.Mutex. Hold the preset profile as a string field instead and construct cfg := &config.Config{Profile: ...} per case. 2. TestMustAnyClientCanCreateAccountClient regressed. The test fixture is a single account-only profile. Previously cfg.Profile stayed empty through resolveDefaultProfile (no settings) so MustWorkspaceClient ended up with no host, AskForWorkspaceProfile returned ErrNoWorkspaceProfiles (the profile is account-only), and MustAnyClient fell through to MustAccountClient. My initial fix used databrickscfg.GetDefaultProfile, whose 4-step resolution includes "if there is exactly one profile in the file, return it" and so picked up the account profile, then NewWorkspaceClient succeeded (SDK v0.125+ no longer rejects on host type) and the test got a workspace client back instead of an account client. The single-profile fallback is a CLI prompt-seeding convenience, not auth-resolution rule. The SDK's resolveProfile only does settings + [DEFAULT]; the CLI's resolveDefaultProfile should mirror that exactly. Add databrickscfg.GetAuthDefaultProfile, scoped to settings + [DEFAULT] only, and use it in resolveDefaultProfile. GetDefaultProfile is left unchanged for its existing callers (logout, switch, token, login, profiles_test) which use it for "show this as the default in a prompt" and where single-profile fallback is desirable. New table-driven TestGetAuthDefaultProfile and TestGetAuthDefaultProfile_NoFile cover the new function. Co-authored-by: Isaac
…e-key' into simonfaltum/default-profile-cache-key
Single-profile fallback is no longer applied (see prior commit). Co-authored-by: Isaac
The function/test comments were narrating the full PR description. Trim to the load-bearing parts: what the function does, the one non-obvious invariant (cache-key match with login), and the bundle exception. The longer story stays in the PR description and commit log. Co-authored-by: Isaac
| return ini.DefaultSection, nil | ||
| } | ||
| return "", nil | ||
| } |
There was a problem hiding this comment.
Why is this not part of ResolveDefaultProfile?
Then the function in auth.go can remain unchanged.
The only thing this does extra is the ini probe.
There was a problem hiding this comment.
ResolveDefaultProfile is also called from cmd/root/bundle.go:80, which deliberately only does the settings lookup; see the comment immediately above the call:
Fall back to [settings].default_profile only when the bundle does not pin its own workspace.host. ... a default_profile here could route the user to a profile that points at the wrong workspace, masking config errors.
So folding the [DEFAULT] section probe into ResolveDefaultProfile changes the bundle's deploy resolution: a hostless bundle would now silently fall back to whatever [DEFAULT] points at. That's what the bundle comment was added to prevent.
Two ways to do what you're suggesting:
- Keep ResolveDefaultProfile as-is and let auth use a separate function (current PR).
- Promote the new behavior into ResolveDefaultProfile and switch bundle to GetConfiguredDefaultProfile (or a new narrow wrapper) directly.
Happy with either. If you prefer (2), I can rename GetAuthDefaultProfile -> ResolveDefaultProfile, drop the current ResolveDefaultProfile, and update bundle.go. Let me know.
There was a problem hiding this comment.
2 sounds cleanest.
Btw, there are other callers of ResolveDefaultProfile that need to benefit from this change.
For ex, otherwise auth token would not resolve DEFAULT either.
There was a problem hiding this comment.
auth token afaik never did, you always needed to be specific on what you wanted - at least that is the behavior today
There was a problem hiding this comment.
Done in c11c17c. Extended ResolveDefaultProfile to also probe [DEFAULT], dropped GetAuthDefaultProfile, reverted resolveDefaultProfile in cmd/root/auth.go to the original one-liner. Every existing caller (cmd/api, cmd/auth/token, cmd/auth/describe via cmd/root/auth, cmd/root/bundle) now benefits — same scope of fallback that already accepts [settings].default_profile. Net -61 lines.
Per Pieter's review: instead of a separate GetAuthDefaultProfile, extend
the existing ResolveDefaultProfile to do the [DEFAULT] section probe
when [__settings__].default_profile is unset. Every caller that already
accepts default_profile (cmd/api, cmd/auth/token, cmd/auth/describe via
cmd/root/auth, cmd/root/bundle) now also benefits from the [DEFAULT]
fallback. By Simon's principle: same scope, same risk profile — if
default_profile is OK here, so is [DEFAULT].
Bundle keeps the same call site; the existing safeguard ("only if the
bundle does not pin workspace.host") still applies and now covers
[DEFAULT] too.
Drop GetAuthDefaultProfile, its tests, and the special case in
cmd/root/auth.go's resolveDefaultProfile (which goes back to a thin
wrapper around ResolveDefaultProfile).
Update the changelog entry to name the actual set of beneficiaries.
Co-authored-by: Isaac
Two follow-ups to the consolidation:
1. cmd/root/bundle.go now uses GetConfiguredDefaultProfile so the
bundle path stays strictly on [__settings__].default_profile, even
though ResolveDefaultProfile is now broader. A hostless bundle
silently routing to whatever [DEFAULT] points at would mask a
missing workspace.host. Updated the inline comment to reflect that
the [DEFAULT] fallback is auth-specific. Matches Pieter's preference
and the GPT review's concern.
2. cmd/root/auth.go and cmd/root/auth_test.go are back to having no
functional changes in this PR. The fallback's behavior change lives
entirely in libs/databrickscfg.ResolveDefaultProfile. Dropped the
doc-comment addition that explained the OAuth cache-key story (it
belongs on ResolveDefaultProfile, where it already is) and the new
TestResolveDefaultProfile (its useful cases duplicate
libs/databrickscfg/ops_test.go; the wrapper itself is four lines).
The only remaining cmd/root/auth_test.go change is the one
load-bearing assertion in
TestMustWorkspaceClientWithoutConfiguredDefaultFallsBackToDefaultSection
("" -> "DEFAULT") that documents the new contract.
NEXT_CHANGELOG.md: clarified that bundle commands intentionally stay
on settings-only.
Co-authored-by: Isaac
Why
databricks auth login --profile DEFAULT --host ...followed by a no-flagdatabricks auth describe(or any other command that needs the U2M token) fails when secure storage is in use:databricks auth describe --profile DEFAULTworks. Running the same flow underDATABRICKS_AUTH_STORAGE=plaintextalso works. So the bug is specific to secure storage + the implicit DEFAULT fallback.Root cause is a cache-key mismatch between login and read:
cmd/auth/login.go:222hardcodesprofileName = "DEFAULT"when no--profileis given, so the OAuthArgument's cache key is the literal string"DEFAULT". The token lands in the keyring under account"DEFAULT".cfg.Profilestarts empty,resolveDefaultProfileonly consults[__settings__].default_profile(so it stays empty), and the SDK'sconfigFileLoader.Configure(config_file.go:103-105) loads[DEFAULT]'s values but deliberately leavescfg.Profileempty when it falls back (isFallback=true).CLICredentials.Configurethen builds an OAuthArgument withprofile="", soGetCacheKey()falls back toGetHostCacheKey()and the lookup goes to the host URL, not"DEFAULT". Miss.plaintext mode masks the same mismatch with
DualWritingTokenCache, which mirrors every write under the host key — so reads via host URL still find the token. secure mode does not dual-write, so the bug surfaces.This is a pre-existing bug independent of toggling secure-storage by default, but doing so turns a corner case into the default experience. The fix here is targeted enough to land standalone.
A defense-in-depth followup in
databricks-sdk-gowill drop the SDK-sideif !isFallbackgate so all SDK consumers benefit from the same self-consistency. The CLI fix lands first so secure-storage users are unblocked without waiting on an SDK release cycle.Changes
cmd/root/auth.go:resolveDefaultProfileswapsdatabrickscfg.ResolveDefaultProfile(settings-only) fordatabrickscfg.GetDefaultProfile, which already does the full 4-step resolution:[__settings__].default_profile→ the only profile in the file →[DEFAULT]→ empty. The SDK then sees a non-emptycfg.Profile, takes theisFallback=falsebranch, and the name flows through toCLICredentials.Configure. OAuthArgument's cache key now matches what login wrote.cmd/root/bundle.gois intentionally NOT touched: bundles deliberately limit their fallback to[__settings__].default_profileso a hostless bundle does not get silently routed at a[DEFAULT]profile pointing at the wrong workspace. That comment inbundle.go:74-80stays load-bearing.cmd/root/auth_test.go:TestMustWorkspaceClientWithoutConfiguredDefaultFallsBackToDefaultSectionnow assertscfg.Profile == "DEFAULT"(was""). The previous assertion documented the bug; the new one documents the contract.TestResolveDefaultProfilecovers the full resolution order: presetcfg.Profile,DATABRICKS_CONFIG_PROFILEenv,[__settings__].default_profile, single profile,[DEFAULT]section among many, no fallback, missing file.NEXT_CHANGELOG.md: one-line entry describing the fix and the mismatch it removes.Test plan
task checkscleantask lint-qcleango test ./cmd/root/... ./cmd/auth/... ./libs/databrickscfg/...passesgo test ./acceptance -run 'TestAccept/cmd/auth'passesauth login --profile DEFAULT --host ...thenauth describewith no flag under secure storage) succeeds after this PR; the same flow onmainfails.workspace.hostand no--profilestill uses[__settings__].default_profileonly (no silent DEFAULT routing).This pull request and its description were written by Isaac.