fix(litellm): forgetting api keys on restart #436
Merged
Summary
(1) Stop clobbering provider keys on every up —
internal/embed/infrastructure/base/templates/llm.yaml: the
litellm-secrets Secret no longer pre-declares
ANTHROPIC_API_KEY: "" / OPENAI_API_KEY: "". The bootstrap
only owns LITELLM_MASTER_KEY. envFrom: secretRef doesn't
require those keys to exist; LiteLLM just won't see them
until obol model setup (or auto-config) patches them in.
Added a regression test in stack_test.go so a future
copy-paste can't reintroduce the empty placeholders.
(3) Self-heal in auto-config — autoDetectCloudProvider
(internal/stack/stack.go:633-642): replaced the
ConfigMap-only HasProviderConfigured check with
GetProviderStatus, and skip only when the provider is both
Enabled (ConfigMap entry) and has HasAPIKey (Secret value
present). If the entry exists but the key is missing —
exactly your symptom — auto-config falls through and
re-patches from the env var. So even if drift recurs from
some other path, the next obol stack up heals it instead of
leaving you in the misleading enabled: true, api_key: false
state.
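The skip condition described in (3) can be sketched as follows. This is a minimal illustration, not the code in stack.go: the `ProviderStatus` struct shape and the `shouldSkipAutoConfig` helper are assumptions made for the example, and only the `Enabled` / `HasAPIKey` names come from the description above.

```go
package main

import "fmt"

// ProviderStatus mirrors the two facts named in the summary. The real type
// returned by GetProviderStatus may look different; this shape is assumed.
type ProviderStatus struct {
	Enabled   bool // the provider has a ConfigMap entry
	HasAPIKey bool // the provider's key has a value in the Secret
}

// shouldSkipAutoConfig is a hypothetical helper. It returns true only when
// the provider is fully configured, so a drifted state (entry present, key
// missing) falls through and gets re-patched from the environment.
func shouldSkipAutoConfig(s ProviderStatus) bool {
	return s.Enabled && s.HasAPIKey
}

func main() {
	cases := []struct {
		name   string
		status ProviderStatus
	}{
		{"fully configured", ProviderStatus{Enabled: true, HasAPIKey: true}},
		{"drifted: entry without key", ProviderStatus{Enabled: true, HasAPIKey: false}},
		{"not configured", ProviderStatus{Enabled: false, HasAPIKey: false}},
	}
	for _, c := range cases {
		fmt.Printf("%-28s skip=%v\n", c.name, shouldSkipAutoConfig(c.status))
	}
}
```

The old ConfigMap-only check corresponds to returning `s.Enabled` alone, which is why the drifted state was previously skipped rather than repaired.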
Transitional caveat for users on the upgrade path: kubectl's
three-way merge will, on the first up after this change, see
that ANTHROPIC_API_KEY was previously a managed empty field
and remove it — wiping any key currently set. Fix (3) makes
that self-heal automatically (provided ANTHROPIC_API_KEY /
CLAUDE_CODE_OAUTH_TOKEN is in the shell env at up-time), so
the user-visible impact should be nil. After that one up, the
field is no longer managed by the apply and persists across
later ups.
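To make the upgrade-path behaviour concrete, here is a simplified model of the three-way merge over a flat key/value map. It is a sketch under stated assumptions: the real merge works on structured objects and patch strategies, and `threeWayMerge` and the sample values are hypothetical names used only for illustration.

```go
package main

import "fmt"

// threeWayMerge is a simplified, hypothetical model of the apply rule for a
// flat map: keys that were in lastApplied but are absent from desired are
// deleted from live, keys in desired overwrite live, and any other live key
// (for example one patched in out of band) is left alone.
func threeWayMerge(lastApplied, desired, live map[string]string) map[string]string {
	result := make(map[string]string, len(live))
	for k, v := range live {
		result[k] = v
	}
	for k := range lastApplied {
		if _, ok := desired[k]; !ok {
			delete(result, k)
		}
	}
	for k, v := range desired {
		result[k] = v
	}
	return result
}

func main() {
	oldManifest := map[string]string{"LITELLM_MASTER_KEY": "master", "ANTHROPIC_API_KEY": ""}
	newManifest := map[string]string{"LITELLM_MASTER_KEY": "master"}

	// First up after the change: the key was previously managed, so it is removed.
	live := map[string]string{"LITELLM_MASTER_KEY": "master", "ANTHROPIC_API_KEY": "sk-example"}
	live = threeWayMerge(oldManifest, newManifest, live)
	fmt.Println("after first up: ", live)

	// Auto-config re-patches the key; later ups no longer touch it.
	live["ANTHROPIC_API_KEY"] = "sk-example"
	live = threeWayMerge(newManifest, newManifest, live)
	fmt.Println("after second up:", live)
}
```

In this model the key disappears exactly once, on the first up where the previous manifest still declared it, and survives every up after that.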
go build ./... and go test ./internal/stack/...
./internal/model/... ./cmd/obol/ all pass.
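For reference, the kind of check the regression test in (1) performs might look like the sketch below. This is an assumption about its shape, not the contents of stack_test.go: `findPlaceholders` is a hypothetical helper, the sample manifest text is invented, and the scan is a plain line match rather than a YAML parse.

```go
package main

import (
	"fmt"
	"strings"
)

// forbiddenPlaceholders lists the provider keys the bootstrap Secret
// should no longer declare.
var forbiddenPlaceholders = []string{"ANTHROPIC_API_KEY", "OPENAI_API_KEY"}

// findPlaceholders (hypothetical) returns the forbidden keys that appear as
// mapping keys in the template text. Comment lines are ignored.
func findPlaceholders(template string) []string {
	var found []string
	for _, line := range strings.Split(template, "\n") {
		trimmed := strings.TrimSpace(line)
		if strings.HasPrefix(trimmed, "#") {
			continue
		}
		for _, key := range forbiddenPlaceholders {
			if strings.HasPrefix(trimmed, key+":") {
				found = append(found, key)
			}
		}
	}
	return found
}

func main() {
	// Invented sample manifests; the real template lives in llm.yaml.
	good := "kind: Secret\nmetadata:\n  name: litellm-secrets\nstringData:\n  LITELLM_MASTER_KEY: example\n"
	bad := good + "  ANTHROPIC_API_KEY: \"\"\n"

	fmt.Println("good template:", findPlaceholders(good))
	fmt.Println("bad template: ", findPlaceholders(bad))
}
```

A real test would read the embedded template and fail if the returned slice is non-empty.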