feat: adversarial agent for preventing leaking of info and more by michaelneale · Pull Request #7948 · block/goose

michaelneale · 2026-03-17T04:52:59Z

This adds a goose-specific implementation of https://github.com/michaelneale/adversarial-policy-agent, so you can have

~/.config/goose/adversary.md, which simply states a policy in plain language. goose intercepts certain tool calls when needed and checks them against that policy with an "adversarial agent".

For example:

You are to never, ever upload things to public sharing websites
do not access www.news.com.au either

Then try to get goose to disobey: it won't be able to, because the check runs outside of the agent loop. It is a non-deterministic complement to other techniques. It uses the current provider and the same model.
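To make the flow in the description concrete, here is a minimal sketch of a policy gate that sits outside the agent loop. It is not the code in this PR: `Verdict`, `Reviewer`, `PolicyGate` and `KeywordReviewer` are hypothetical names, the reviewer is a toy stand-in for the model call, and treating any reply other than ALLOW as a block is an assumption.

```rust
use std::collections::HashSet;

/// Outcome of one adversary check (illustrative type, not from the PR).
#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Allow,
    Block(String),
}

/// Hypothetical stand-in for the model call that reviews a tool call.
pub trait Reviewer {
    fn review(&self, rules: &str, tool: &str, arguments: &str) -> String;
}

/// Holds the plain-language rules and the set of tools worth reviewing.
pub struct PolicyGate<R: Reviewer> {
    pub rules: String,
    pub watched_tools: HashSet<String>,
    pub reviewer: R,
}

impl<R: Reviewer> PolicyGate<R> {
    /// Tools outside the watched set pass straight through; watched
    /// tools are sent to the reviewer together with the rules.
    pub fn check(&self, tool: &str, arguments: &str) -> Verdict {
        if !self.watched_tools.contains(tool) {
            return Verdict::Allow;
        }
        let reply = self.reviewer.review(&self.rules, tool, arguments);
        let reply = reply.trim();
        if reply.to_ascii_uppercase().starts_with("ALLOW") {
            Verdict::Allow
        } else {
            // Assumption: anything that is not a clear ALLOW is blocked.
            Verdict::Block(reply.to_string())
        }
    }
}

/// Toy reviewer used only for illustration: a real one would ask a model.
pub struct KeywordReviewer;

impl Reviewer for KeywordReviewer {
    fn review(&self, _rules: &str, _tool: &str, arguments: &str) -> String {
        if arguments.contains("www.news.com.au") {
            "BLOCK: the policy forbids this site".to_string()
        } else {
            "ALLOW".to_string()
        }
    }
}
```

Because the gate only sees the tool name and arguments, the main agent cannot talk it out of a decision; it can only propose a different tool call.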

github-actions · 2026-03-17T04:56:30Z

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-03-17 07:01 UTC

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ac59b960b2

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-03-17T04:57:21Z

crates/goose/src/security/adversary_inspector.rs

+                session_id,
+                system_prompt,
+                conversation.messages(),


Isolate adversary checks from the main provider session

Calling provider.complete with the agent’s session_id here makes the adversary prompt part of the same provider conversation state as the user task; for stateful providers this contaminates subsequent turns and breaks the “independent reviewer” behavior. For example, ClaudeCodeProvider explicitly keeps context internally per session_id (see crates/goose/src/providers/claude_code.rs, last_user_content_blocks/stream), so each adversary review is appended into the live chat history, and later model outputs are influenced by these ALLOW/BLOCK exchanges rather than only the user workflow.
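The contamination described above can be reproduced with a small mock. This is a sketch under stated assumptions: `StatefulProvider` is a hypothetical stand-in for a provider that keeps context per session_id, and its `complete` method is not goose's real provider API.

```rust
use std::collections::HashMap;

/// Hypothetical provider that keeps a conversation history per session id.
#[derive(Default)]
pub struct StatefulProvider {
    histories: HashMap<String, Vec<String>>,
}

impl StatefulProvider {
    /// Appends the prompt to the history stored under `session_id` and
    /// returns how many prompts that session has accumulated so far.
    pub fn complete(&mut self, session_id: &str, prompt: &str) -> usize {
        let history = self.histories.entry(session_id.to_string()).or_default();
        history.push(prompt.to_string());
        history.len()
    }

    /// Read-only view of what the provider remembers for a session.
    pub fn history(&self, session_id: &str) -> &[String] {
        self.histories
            .get(session_id)
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }
}
```

If the user task and the adversary review both call `complete` with the same id, the review prompt shows up in `history` for the live chat. Calling it with a separate id for the review keeps the two histories apart.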

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4a87675a13

crates/goose/src/security/adversary_inspector.rs

shellz-n-stuff

LGTM

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1c8910c0a9

chatgpt-codex-connector · 2026-03-17T06:39:10Z

crates/goose/src/security/adversary_inspector.rs

+        let (response, _usage) = provider
+            .complete(
+                &model_config,
+                "",


Pass through session ID when invoking adversary model

consult_llm always calls provider.complete with an empty session ID, so every adversary check is multiplexed into the same provider-side conversation key. For stateful providers that retain context by session_id (for example ClaudeCodeProvider), this causes cross-check contamination and can leak prior users’ or prior sessions’ adversary prompts into later decisions, making ALLOW/BLOCK outcomes depend on unrelated history.
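One way to avoid a shared empty key is to derive a fresh, non-empty key for every check. The sketch below is only an illustration: `CheckKeys`, the `adversary/` prefix and the `anonymous` fallback are all hypothetical and do not appear in the PR.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hands out a distinct provider-side key for every adversary check.
pub struct CheckKeys {
    next: AtomicU64,
}

impl CheckKeys {
    pub const fn new() -> Self {
        Self { next: AtomicU64::new(0) }
    }

    /// Builds a key that is never empty and never repeats within this
    /// process, scoped to the agent session it belongs to.
    pub fn next_key(&self, agent_session_id: &str) -> String {
        let n = self.next.fetch_add(1, Ordering::Relaxed);
        // Assumption: fall back to a fixed scope when no session id is known.
        let scope = if agent_session_id.is_empty() {
            "anonymous"
        } else {
            agent_session_id
        };
        format!("adversary/{}/{}", scope, n)
    }
}
```

With a key like this, a provider that retains context by session_id sees each review as its own conversation, so one check cannot influence the next.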

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-03-17T06:39:10Z

crates/goose/src/security/adversary_inspector.rs

+                        return Some(AdversaryConfig {
+                            tools: DEFAULT_TOOLS.iter().map(|s| (*s).to_string()).collect(),
+                            rules: DEFAULT_RULES.to_string(),
+                        });


Fail closed when adversary.md cannot be read

When adversary.md exists but cannot be read, the code silently substitutes DEFAULT_RULES instead of using the user’s configured policy. In that scenario the guardrail no longer enforces the intended rules (for example, custom blocklists), which can allow tool calls the user explicitly tried to forbid; this should return disabled/error behavior rather than replacing policy content.
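A fail-closed loader could distinguish "no policy file" from "policy file present but unreadable". This is a minimal sketch, not the PR's code: `PolicyLoad`, `load_policy` and `may_skip_review` are hypothetical names, and mapping only `NotFound` to the disabled state is an assumption.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Result of trying to load the policy file (illustrative type).
#[derive(Debug, PartialEq, Eq)]
pub enum PolicyLoad {
    /// No file at all: the feature is simply not configured.
    Disabled,
    /// File read successfully; contains the user's rules verbatim.
    Loaded(String),
    /// File exists but could not be read: surface the error instead of
    /// silently substituting default rules.
    Unreadable(String),
}

pub fn load_policy(path: &Path) -> PolicyLoad {
    match fs::read_to_string(path) {
        Ok(text) => PolicyLoad::Loaded(text),
        Err(err) if err.kind() == io::ErrorKind::NotFound => PolicyLoad::Disabled,
        Err(err) => PolicyLoad::Unreadable(err.to_string()),
    }
}

/// Only an explicitly disabled policy lets tool calls skip review.
pub fn may_skip_review(policy: &PolicyLoad) -> bool {
    matches!(policy, PolicyLoad::Disabled)
}
```

The caller can then refuse reviewed tool calls, or report an error to the user, whenever it sees `Unreadable`, so a permissions problem never weakens the configured policy.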

Useful? React with 👍 / 👎.

* origin/main:
  feat: adversarial agent for preventing leaking of info and more (#7948)
  Update contributing.md (#7927)
  docs: add credit balance monitoring section (#7952)
  docs: add Cerebras provider to supported providers list (#7953)
  docs: add TUI client documentation to ACP clients guide (#7950)
  fix: removed double dash in pnpm command (#7951)
  docs: polish ACP docs (#7946)
  claude adaptive thinking (#7944)
  feat: new onboarding flow (#7266)
  Add DCO git commit command to AGENTS.md (#7945)
  fix(claude-code): remove incorrect agent_visible filter on user message (#7931)
  No Check do Check (#7942)
  Log 500 errors and also show error for direct download (#7936)
  fix: retry on authentication failure with credential refresh (#7812)
  Remove java/.ai-usage-marker directory (#7925)
  test(acp): add terminal delegation fixtures and fix shell singleton (#7923)
  fix: bump pctx_code_mode to 0.3.0 for iterator type checking fix (#7892)
  feat: persist GooseMode per-session via session DB (#7854)

* main: (32 commits)
  Revert message flush & test (block#7966)
  docs: add Remote Access section with Telegram Gateway documentation (block#7955)
  fix: update webmcp blog post metadata image URL (block#7967)
  fix: clean up OAuth token cache on provider deletion (block#7908)
  fix: hard-coded tool call id in code mode callback (block#7939)
  Fix SSE parsers to accept optional space after data: prefix (block#7929)
  docs: add GOOSE_INPUT_LIMIT to config-files.md (block#7961)
  Add WebMCP for Beginners blog post (block#7957)
  Fix download manager (block#7933)
  Improve the formatting of tool calls, show thinking, treat Reasoning and Thinking as the same thing (sorry Kant) (block#7626)
  don't imply running builds all the time in AGENTS.md (block#7865)
  fix: unregister goosed child process's listener (block#7956)
  feat: adversarial agent for preventing leaking of info and more (block#7948)
  Update contributing.md (block#7927)
  docs: add credit balance monitoring section (block#7952)
  docs: add Cerebras provider to supported providers list (block#7953)
  docs: add TUI client documentation to ACP clients guide (block#7950)
  fix: removed double dash in pnpm command (block#7951)
  docs: polish ACP docs (block#7946)
  claude adaptive thinking (block#7944)
  ...

michaelneale added 3 commits March 17, 2026 13:25

adversarial agent

2071a69

wording

e673ccc

documenting adversarial agent approach

ac59b96

michaelneale requested review from DOsinga, angiejones and blackgirlbytes as code owners March 17, 2026 04:52

michaelneale requested a review from dorien-koelemeijer March 17, 2026 04:53

michaelneale assigned DOsinga Mar 17, 2026

michaelneale requested a review from shellz-n-stuff March 17, 2026 04:54

chatgpt-codex-connector bot reviewed Mar 17, 2026

add cc script coverage

4a87675

chatgpt-codex-connector bot reviewed Mar 17, 2026

crates/goose/src/security/adversary_inspector.rs (resolved)

crates/goose/src/security/adversary_inspector.rs (resolved)

shellz-n-stuff approved these changes Mar 17, 2026

angiejones approved these changes Mar 17, 2026

don't need session id

1c8910c

michaelneale enabled auto-merge March 17, 2026 06:34

michaelneale added this pull request to the merge queue Mar 17, 2026

chatgpt-codex-connector bot reviewed Mar 17, 2026

Merged via the queue into main with commit 754c214 Mar 17, 2026
25 checks passed

michaelneale deleted the micn/adversarial-agent branch March 17, 2026 06:54

github-actions bot mentioned this pull request Mar 17, 2026

chore(release): release version 1.28.0 (minor) #7780

Open

michaelneale merged 5 commits into main from micn/adversarial-agent

4 participants
