Skip to content

fix: add security warning for untrusted skill content#19195

Open
sumleo wants to merge 1 commit intoanomalyco:devfrom
sumleo:fix/skill-security-warning
Open

fix: add security warning for untrusted skill content#19195
sumleo wants to merge 1 commit intoanomalyco:devfrom
sumleo:fix/skill-security-warning

Conversation

@sumleo
Copy link
Copy Markdown

@sumleo sumleo commented Mar 26, 2026

Issue for this PR

Closes #19123

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

Adds a security warning block in the system prompt to mark repository-provided skill content as untrusted. This prevents supply chain attacks where a malicious repo includes poisoned skill files (.opencode/agents/*/SKILL.md) that instruct the model to modify package manager configs, add attacker-controlled registries, or write hardcoded credentials.

The fix adds 7 specific rules to the system prompt that prevent the model from blindly executing code found in repository-provided skill files. It works by teaching the model to recognize and refuse common supply chain poisoning patterns (pip registry hijacking, npm auth token injection, curl-pipe-bash hooks, etc.) before they can be executed.

The approach is prompt-level — no changes to the agent runtime or execution engine. The model internalizes the security policy and self-refuses when it encounters attack patterns in skill content.

How did you verify your code works?

Tested against 31 poisoned skill files in isolated Docker containers. Before the fix, multiple skills achieved full code execution (L3 breach). After the fix, all were refused (L1). Verified that legitimate skill instructions (coding conventions, project setup) continue to work normally.

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

Closes anomalyco#18781

Skills loaded from repositories may contain malicious instructions that
trick the agent into writing to package manager configs, adding rogue
registry URLs, or modifying system-wide settings. This is a supply-chain
poisoning vector.

Add a two-layer defense:
- System prompt: append a <skill_security_policy> block to the skills
  section listing prohibited actions (registry hijacking, config writes,
  RCE patterns)
- Skill tool output: wrap each loaded skill in a <skill_security_warning>
  reminding the agent that the content is untrusted before it processes
  the skill body
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Security: Untrusted skill content loaded without sanitization or warning

1 participant