Skip to content

feat: load output styles from disk#256

Merged
emal-avala merged 4 commits into
mainfrom
feat/output-styles-from-disk
May 1, 2026
Merged

feat: load output styles from disk#256
emal-avala merged 4 commits into
mainfrom
feat/output-styles-from-disk

Conversation

@emal-avala
Copy link
Copy Markdown
Member

Summary

  • Add an output-style loader that picks up markdown presets from <project>/.agent/output-styles/*.md and ~/.config/agent-code/output-styles/*.md. The frontmatter schema is name (required, becomes the style id), description (required), and applies_to (optional list of subagent kinds; defaults to all).
  • Merge disk styles with the existing hard-coded built-ins. On id collision, the disk entry wins and a warning is logged. Malformed files are skipped instead of crashing session load.
  • Extend /output-style so the listing surfaces every style with a (built-in) / (project) / (user) source label and accepts disk-loaded ids alongside the built-in names.

Test Plan

  • cargo build --workspace
  • cargo test -p agent-code-lib --lib output_style (13 tests)
  • cargo test -p agent-code-lib --test output_styles_integration (5 tests)
  • Existing system_prompt_* tests still pass — the prompt-cache hash now includes the active disk style id so /output-style flips bust the cache.
  • cargo clippy -p agent-code-lib --tests clean
  • Drop a markdown file in .agent/output-styles/ of a fixture project and confirm it appears in the loader output (exercised by the integration test).

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3614b2dcbf

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread crates/cli/src/commands/mod.rs Outdated
Comment on lines +4029 to +4033
if let Some(new_style) = agent_code_lib::state::ResponseStyle::from_name(raw) {
engine.state_mut().response_style = new_style;
// Picking a built-in clears any active disk override.
engine.state_mut().disk_output_style = None;
println!("Response style set to '{}'.", new_style.name());
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Prefer disk registry entry when style IDs collide

/output-style resolves built-ins before consulting the disk registry, so IDs like concise/default always select ResponseStyle and clear disk_output_style. That makes project/user overrides with the same ID impossible to activate, contradicting the loader behavior where disk entries override built-ins; users can see an override in the list but cannot actually use it.

Useful? React with 👍 / 👎.

Comment thread crates/lib/src/state/mod.rs Outdated
Comment on lines +186 to +190
pub fn active_output_style_fragment(&self) -> &str {
if let Some(style) = &self.disk_output_style {
return style.body.as_str();
}
self.response_style.prompt_fragment()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Honor applies_to when applying disk output styles

The active disk style fragment is always returned without checking applies_to, so a style intended only for a subset of agent kinds (for example subagent) still affects main-agent system prompts. Since the frontmatter parser and docs expose applies_to, this causes incorrect prompt scoping whenever a style excludes main.

Useful? React with 👍 / 👎.

Comment thread crates/lib/src/query/mod.rs Outdated
Comment on lines +667 to +670
.disk_output_style
.as_ref()
.map(|s| s.name.as_str())
.unwrap_or("")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Hash disk style content, not only style name

Prompt caching only includes the disk style ID, so if the user edits a style file and re-selects the same name in-session, the cached system prompt can remain stale because the hash does not change. In that scenario the new style body is ignored until some unrelated cache-busting change happens.

Useful? React with 👍 / 👎.

emal-avala added a commit that referenced this pull request May 1, 2026
…style content

Codex review on PR #256 surfaced three regressions in the disk
output-styles loader:

- /output-style resolved built-ins before disk styles, so a disk
  override with id `concise` or `default` could never activate.
  Resolution now checks the disk registry first and falls back to
  ResponseStyle aliases (off, terse, explain, teach) only when no
  disk file matches.

- active_output_style_fragment ignored frontmatter `applies_to`,
  letting a `[subagent]`-only style leak into the main agent's
  prompt. Both `active_output_style_fragment` and
  `active_output_style_name` now take an `AgentKind` and return
  None when the active style doesn't apply to that role.
  build_system_prompt threads the kind through; the Agent tool
  marks spawned children with AGENT_CODE_SUBAGENT=1 so the CLI can
  distinguish a real subagent from a user-invoked one-shot.

- The cached system prompt's hash mixed in the active style id
  but not its body, so editing a style file in-session and
  re-selecting the same id left a stale prompt cached. OutputStyle
  now carries a 12-byte SHA256 prefix of its body, hashed into the
  cache key alongside the id and applies_to list.

Tests: adds disk-override-wins, applies_to-filters-main, and
content-hash-changes regressions.
@emal-avala
Copy link
Copy Markdown
Member Author

Addressed codex review findings: disk styles now win over built-ins on id collision, applies_to is honored at apply time via a new AgentKind parameter (subagents marked via AGENT_CODE_SUBAGENT=1), and the system-prompt cache key now mixes in a SHA256 prefix of the style's body so in-session edits invalidate it.

@emal-avala
Copy link
Copy Markdown
Member Author

Addressed second-round codex findings: subagent propagation via AGENT_CODE_DISK_OUTPUT_STYLE env var, parse-time applies_to validation, source-aware active marker on /output-style list and /status, hermetic user-dir override (load_all_with_user_dir + AGENT_CODE_USER_OUTPUT_STYLES_DIR), and /reload re-reads the active disk style from its source file. CI gate clean (check, test, clippy -D warnings, fmt --check) — only pre-existing bwrap-in-CI uid-map failures remain in sandbox_integration.

emal-avala added 4 commits May 1, 2026 13:48
Add a loader that picks up markdown output-style presets from
<project>/.agent/output-styles/ and ~/.config/agent-code/output-styles/.
Built-in styles continue to ship with the binary; a disk style with the
same id wins on collision and a warning is logged. Malformed files are
skipped (not fatal) so a single broken preset cannot crash startup.

The /output-style listing now shows every loaded style with a source
label (built-in / project / user) and accepts disk-loaded ids as
arguments. Active style selection clears any previous override so the
two paths cannot drift apart.

Includes unit tests for the frontmatter parser and the override
behaviour, plus an end-to-end integration test that stages golden
markdown fixtures into a fake project and verifies the loader picks
them up.
…style content

Codex review on PR #256 surfaced three regressions in the disk
output-styles loader:

- /output-style resolved built-ins before disk styles, so a disk
  override with id `concise` or `default` could never activate.
  Resolution now checks the disk registry first and falls back to
  ResponseStyle aliases (off, terse, explain, teach) only when no
  disk file matches.

- active_output_style_fragment ignored frontmatter `applies_to`,
  letting a `[subagent]`-only style leak into the main agent's
  prompt. Both `active_output_style_fragment` and
  `active_output_style_name` now take an `AgentKind` and return
  None when the active style doesn't apply to that role.
  build_system_prompt threads the kind through; the Agent tool
  marks spawned children with AGENT_CODE_SUBAGENT=1 so the CLI can
  distinguish a real subagent from a user-invoked one-shot.

- The cached system prompt's hash mixed in the active style id
  but not its body, so editing a style file in-session and
  re-selecting the same id left a stale prompt cached. OutputStyle
  now carries a 12-byte SHA256 prefix of its body, hashed into the
  cache key alongside the id and applies_to list.

Tests: adds disk-override-wins, applies_to-filters-main, and
content-hash-changes regressions.
…rate active marker, hermetic tests

- Subagent propagation: the Agent tool now sets
  AGENT_CODE_DISK_OUTPUT_STYLE=<name> when spawning a child, and the
  child resolves the named style against its own loaded registry. Without
  this, applies_to: [subagent] styles were dead at the subprocess
  boundary. Missing styles log a warning instead of crashing the child.
- applies_to validation: reject unknown values ("mian", "Main") at
  parse time so a typo can't silently produce a no-op style. Lowercase
  is canonical; mixed case is rejected to keep the matcher simple.
- Active marker: identity is (name, source), not just name. Disk styles
  shadowing built-in ids no longer falsely activate the disk version
  when only the built-in is in use. /status uses a label that includes
  the source so a disk override is visible at a glance.
- /reload: re-read the active disk style from its source file so
  in-session edits land on the next turn. If the file is gone, clear
  the override (best-effort, never error).
- Hermetic tests: add OutputStyleRegistry::load_all_with_user_dir and
  the AGENT_CODE_USER_OUTPUT_STYLES_DIR env override so integration
  tests don't read the developer's real ~/.config/agent-code/.
The hermetic e2e test in output_styles_subagent.rs relies on
--dump-system-prompt to inspect the rendered prompt without ever
contacting an LLM. CI strips every API-key env var, so the binary
errored out at the API-key check before reaching the
--dump-system-prompt branch — green locally only because OPENAI_API_KEY
happens to be set.

--dump-system-prompt is a pure tool+state diagnostic that never uses
the LLM provider, so accept an empty placeholder key for this path
specifically. All other paths still demand a real key.
@emal-avala emal-avala force-pushed the feat/output-styles-from-disk branch from af74acc to f30f1f3 Compare May 1, 2026 20:54
@emal-avala
Copy link
Copy Markdown
Member Author

Rebased onto main; conflicts resolved across:

  • crates/cli/src/main.rs — unioned PermissionChecker::with_project_root (from feat: team-memory layer #257 team-memory) with this PR's AGENT_CODE_DISK_OUTPUT_STYLE lookup; switched the registry's project-root source to session_env.project_root (matches the same source PermissionChecker now uses).
  • crates/lib/src/tools/cron_support.rs and crates/lib/tests/cron_tools_integration.rs — added the new active_disk_output_style: None field to the ToolContext literals introduced by feat: cron and remote-trigger model tools #254 cron tools.
  • crates/lib/src/schedule/executor.rs — auto-merged: this PR's agent_kind: AgentKind::Main for scheduled top-level runs sits next to feat: cron and remote-trigger model tools #254's cron-related additions.
  • crates/cli/src/commands/mod.rs, crates/lib/src/tools/{mod,executor,agent}.rs, crates/lib/src/query/mod.rs — all auto-merged cleanly.

Ubuntu/Coverage test regression: fixed. Root cause was output_styles_subagent.rs's e2e harness exercising --dump-system-prompt after stripping AGENT_CODE_API_KEY and ANTHROPIC_API_KEY, but not the rest of resolve_api_key_from_env's fallbacks (OPENAI_API_KEY, etc.). It passed locally because developer machines have one of those set; CI machines don't, so the binary errored on the API-key check before reaching the --dump-system-prompt early-return branch. Fixed in a new commit by accepting an empty placeholder key on the --dump-system-prompt path — that path never constructs an LLM request, so a real key is unnecessary. All other paths still require a real key.

Local CI gates: cargo check --all-targets, cargo test --all-targets (modulo the pre-existing bwrap_* sandbox failures that need user-namespaces; not related to this PR), cargo clippy --all-targets -- -D warnings, cargo fmt --all -- --check all pass.

@emal-avala
Copy link
Copy Markdown
Member Author

Final CI status (run 25232752430, head f30f1f3):

  • Test (ubuntu-latest): pass (was failing — fixed)
  • Coverage: pass (was failing — fixed)
  • Check / Clippy / Format / Analyze: pass
  • Test (windows-latest), Build Release (windows-latest), Build Release (ubuntu/macos cancelled): fail/cancel — pre-existing main bug, not from this PR. crates/lib/src/tools/remote_trigger.rs from feat: cron and remote-trigger model tools #254 references child_pid outside the #[cfg(unix)] block. The same job fails on c76b9e6 (latest main HEAD) for the identical reason. Filing/fixing belongs to a follow-up PR scoped to feat: cron and remote-trigger model tools #254.

@emal-avala emal-avala merged commit 8d41423 into main May 1, 2026
10 of 14 checks passed
@emal-avala emal-avala deleted the feat/output-styles-from-disk branch May 1, 2026 21:07
emal-avala added a commit that referenced this pull request May 1, 2026
…ask executors

#263 added new ToolContext{ ... } literals in tools/tasks/{executors/local_agent,tools}.rs
and tests/task_kind_integration.rs. They predate #256 landing the
active_disk_output_style field. Fill it with None — task executors don't
propagate disk output styles.
emal-avala added a commit that referenced this pull request May 1, 2026
…literals

#263 (task model variants) was based on main before #256 (output styles
from disk) added active_disk_output_style to ToolContext. After both
landed, three ToolContext literals in the task subsystem fail to
compile with E0063.

Add the field with None to:
- crates/lib/src/tools/tasks/executors/local_agent.rs:75
- crates/lib/src/tools/tasks/tools.rs (test helper)
- crates/lib/tests/task_kind_integration.rs:34

Task executors don't propagate disk output styles today; None is the
correct value until 8.7 (swarm mode) and 8.10 (fork/resume) wire that
through.
emal-avala added a commit that referenced this pull request May 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant