Skip to content

fix(setup): escape API key in config.toml so it survives relaunch (#288)#293

Merged
emal-avala merged 2 commits into
mainfrom
fix/issue-288-setup-api-key-escaping
Jun 1, 2026
Merged

fix(setup): escape API key in config.toml so it survives relaunch (#288)#293
emal-avala merged 2 commits into
mainfrom
fix/issue-288-setup-api-key-escaping

Conversation

@emal-avala
Copy link
Copy Markdown
Member

Fixes #288.

Problem

The reporter set up an API key (via the "Other"/custom OpenAI-compatible provider), it worked for that session, but every relaunch "forgot" the key from config.toml — only setting AGENT_CODE_API_KEY in the environment worked.

Root cause

The load path is fine — I verified end-to-end that Config::load() reads a setup-written api_key back through the migration + merge pipeline. The bug is on the write side.

crates/cli/src/ui/setup.rs::write_config hand-formatted the TOML:

let api_key_line = format!("api_key = \"{}\"\n", result.api_key);

That does not escape the value, so a key containing a backslash or a double quote is silently broken on the next load:

pasted key reloaded value
sk-normaltoken-123 sk-normaltoken-123
sk-with\backslash sk-with␈ackslash ✗ (\b → backspace)
sk-with"quote parse error — whole file unparseable ✗

In all the broken cases setup still looked successful (and the in-process env var set during the wizard kept the current session working), so the failure only surfaced on the next launch — exactly the reported symptom. Such keys are common on the "Other"/custom endpoints the reporter selected.

Fix

write_config now:

  1. Serializes through the toml crate (render_config_toml), so every value — the API key in particular — is escaped correctly.
  2. Writes atomically with owner-only 0600 permissions via atomic_write_secret (the file holds a secret; this matches how the rest of the codebase — onboarding, config_tool, migrations — already treats config.toml). Existing files keep their mode.
  3. Surfaces save failures to the user instead of swallowing them with let _ =, so a failed save no longer masquerades as a successful setup.

Tests

Added unit tests in ui::setup::tests covering:

  • plain key round-trips
  • key with \ survives (the silent-corruption case)
  • key with " survives (the unparseable case)
  • other sections (base_url, model, permissions, ui) preserved
  • empty key and the ollama sentinel are omitted
running 6 tests
test ui::setup::tests::plain_key_round_trips ... ok
test ui::setup::tests::key_with_backslash_survives ... ok
test ui::setup::tests::key_with_quote_survives ... ok
test ui::setup::tests::other_sections_are_preserved ... ok
test ui::setup::tests::empty_key_is_omitted ... ok
test ui::setup::tests::ollama_sentinel_key_is_omitted ... ok

cargo fmt --check and cargo clippy are clean.

🤖 Generated with Claude Code

The first-run setup wizard wrote config.toml by hand-formatting the TOML
(`format!("api_key = \"{}\"", key)`). That does not escape the value, so:

  - a key containing `\` was silently mangled on reload (`\b` decoded to
    a backspace), and
  - a key containing `"` made config.toml unparseable.

Either way the key was effectively "forgotten" on the next launch even
though setup looked successful — and the in-process env var set during
the wizard masked it for the current session. This is common on
OpenAI-compatible "Other" endpoints whose tokens contain such
characters.

Render the config through the `toml` serializer instead (correct
escaping for every value), write it atomically with owner-only `0600`
permissions since it holds a secret, and surface save failures to the
user instead of swallowing them with `let _ =`.

Adds unit tests covering the backslash and quote cases plus the
empty/ollama-sentinel omission paths.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`prune_skips_non_json_files` pinned the session's `updated_at` to a
hardcoded `2026-04-23` but compared it against `chrono::Utc::now()`.
Once the wall clock passed 2026-05-23 the kept session aged beyond the
30-day threshold and was pruned, so `stats.removed` became 1 and the
assertion failed (currently breaking Test + Coverage on CI; the Windows
matrix leg is fail-fast-cancelled as a result).

Pin `now` to a fixed instant, matching the sibling `prune_*` tests, so
the test no longer depends on the current date.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@emal-avala
Copy link
Copy Markdown
Member Author

Pushed a follow-up commit to fix the red CI.

The failures weren't from the setup change — they all traced to a pre-existing time-bomb test, services::session::tests::prune_skips_non_json_files. It pinned a session's updated_at to a hardcoded 2026-04-23 but compared against chrono::Utc::now(); now that the wall clock is past 2026-05-23, the session ages beyond the 30-day prune window, so stats.removed is 1 instead of 0. That broke Test (ubuntu) and Coverage, and the Test (windows) leg was fail-fast-cancelled as a result.

Fixed by pinning now to a fixed instant, matching the sibling prune_* tests. Verified locally that the session prune tests pass.

@emal-avala emal-avala merged commit 0dd3e5f into main Jun 1, 2026
14 checks passed
@emal-avala emal-avala deleted the fix/issue-288-setup-api-key-escaping branch June 1, 2026 17:40
@emal-avala emal-avala mentioned this pull request Jun 1, 2026
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

(minor)Agent: forget API key from toml file even it had setup

1 participant