Self-Play Sparring, Offense Meets Defense — Making AI Write Secure Code
Benchmark | Quick Start | Roadmap
LLMs are generating production code at an unprecedented pace, but that code harbors significant security risks. Published benchmarks show that even state-of-the-art commercial models produce vulnerable code up to 48.2% of the time, with top-tier models ranging from 37% to 95.6% (Source: AutoBaxBench, Dec 2025).
Paradoxically, these same SOTA LLMs excel at vulnerability discovery — Claude Opus 4.6 has uncovered hundreds of security vulnerabilities in open-source projects and even independently developed a full exploit chain for a FreeBSD kernel remote code execution vulnerability.
Root cause: When generating code, the LLM's optimization target is functional correctness, not security. Security is merely an implicit constraint that is easily diluted under pressure to produce working code.
Adversarial AI Coding (AAC) simultaneously activates the "elite hacker" and the "diligent developer" latent within the model, forcing them into adversarial engagement within the same coding session. Through this self-play sparring mechanism, vulnerabilities are caught and fixed before the generated code ever reaches the developer.
The architecture features two built-in roles:
- Left Hand (Coder): Responds to the developer's AI Coding requests and generates functional code.
- Right Hand (Reviewer): Automatically activated at the end of each coding session, auditing the code from an attacker's perspective and fixing vulnerabilities in real time.
No prompt changes required. No workflow changes needed. Developers code as usual — adversarial review happens automatically.
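The two-role flow described above can be sketched in a few lines of Python. This is an illustrative simulation of the self-play loop, not the plugin's actual API — `adversarial_session` and `toy_llm` are hypothetical names:

```python
def adversarial_session(task, llm):
    """Minimal sketch of the AAC loop: the same model first acts as the
    Coder (Left Hand), then audits its own output as the Reviewer (Right Hand)."""
    code = llm("coder", task)            # Left Hand: generate functional code
    findings = llm("reviewer", code)     # Right Hand: audit from an attacker's view
    if findings:                         # remediate any reported vulnerabilities
        code = llm("coder", f"Fix these issues:\n{findings}\n\n{code}")
    return code

# Toy stand-in for a real model call, for illustration only.
def toy_llm(role, prompt):
    if role == "reviewer":
        return "SQL injection" if "f-string query" in prompt else ""
    if "Fix these issues" in prompt:
        return "parameterized query"
    return "f-string query"

print(adversarial_session("fetch user by id", toy_llm))  # "parameterized query"
```

In the real plugin the Reviewer is triggered automatically at the end of each coding session rather than called inline, but the adversarial structure is the same.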
| Feature | Description |
|---|---|
| Self-Play Adversarial | A single LLM plays both Coder and Reviewer, forcing internal adversarial engagement to fully unleash its security capabilities |
| Zero-Friction Automation | No manual triggers, no prompt engineering — the plugin integrates transparently into your coding workflow |
| Multi-Language, Multi-Risk | Covers Java, Python, C/C++, and JavaScript, addressing injection, command execution, buffer overflow, deserialization, XSS, and more |
| Full Lifecycle Coverage | Pre-generation security hardening + post-generation security audit, spanning the entire AI Coding lifecycle |
We evaluated the Adversarial AI Coding (AAC) architecture using public datasets (CyberSecEval, SecCodeBench) from two perspectives:
Perspective 1 — Vulnerability Reduction in Generated Code
Experiments conducted using Claude Code with GLM-5 / Kimi-K2.5 / MiniMax-M2.5 / Qwen3.5-397B-A17B:
| Metric | Result |
|---|---|
| Security audit trigger rate | ~80% |
| Overall vulnerability reduction | 79.5% |
Perspective 2 — Malicious Injection Detection
Code containing common OWASP vulnerabilities was injected into Claude Code sessions to simulate poisoned code:
| Metric | Result |
|---|---|
| Probability of entering security audit | 93% |
| Probability of identifying and fixing risks | 90% |
Note: AAC adds 22%–76% overhead to task execution time (varies by model). Given the additional security auditing and code remediation performed, we consider this overhead acceptable.
- Injection: SQL injection, NoSQL injection, template injection
- Command Execution: OS command injection, code injection
- File I/O: Path traversal, arbitrary file read/write
- Deserialization: Java / Python deserialization vulnerabilities
- Sensitive Data: Hardcoded credentials, information disclosure
- Access Control: Privilege escalation, SSRF
- XML: XXE (XML External Entity injection)
- Memory Corruption: Buffer overflow, Use-After-Free, Double Free
- Integer Safety: Integer overflow/underflow, signedness issues
- Dangerous Functions: Use of potentially dangerous functions
- Format String: Format string vulnerabilities
- Concurrency: Race conditions
- Command & Path: OS command execution, path traversal
- Query Injection: QL injection
- Injection: Code injection, QL injection, XSS, prototype pollution
- Command & Path: OS command execution, path traversal
- Network Security: SSRF, insecure transport
- Deserialization: Deserialization vulnerabilities
- Denial of Service: ReDoS
- Cryptography: Weak randomness, timing attacks
- Messaging: PostMessage origin validation
- Sensitive Data: Hardcoded credentials, Buffer handling issues
- Container Security: Privileged containers, misconfigured capabilities
- Network Exposure: Network exposure risks
- Storage Security: Host path mounts
- Secret Management: Hardcoded credentials
- Access Control: RBAC misconfiguration
- Dockerfile: Dockerfile security best practices
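As a concrete instance of the "Injection" category above, the following is a minimal sketch (in Python, using the standard-library `sqlite3` module) of the kind of before/after rewrite the Reviewer performs; the function names are illustrative, not taken from the plugin:

```python
import sqlite3

def get_user_unsafe(conn, username):
    # Vulnerable: user input is interpolated directly into the SQL string
    return conn.execute(
        f"SELECT id FROM users WHERE name = '{username}'").fetchone()

def get_user_safe(conn, username):
    # Fixed: a parameterized query keeps data separate from SQL syntax
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (username,)).fetchone()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

payload = "' OR '1'='1"                 # classic injection payload
print(get_user_unsafe(conn, payload))   # (1,) — payload matches every row
print(get_user_safe(conn, payload))     # None — payload treated as literal data
```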
| Dependency | Version |
|---|---|
| Python | >= 3.10 |
| Claude Code CLI | Latest |
1. Open Claude Code
2. Add the plugin marketplace
```shell
/plugin marketplace add https://github.com/antgroup/adversarial-ai-coding-plugin.git
```

3. Install the plugin

```shell
/plugin install adversarial-ai-coding
```

4. Restart Claude Code
Run `/exit` to quit, then relaunch Claude Code. The plugin will take effect immediately.
That's it. No configuration needed — the plugin works automatically. Just code as you normally would.
```shell
/plugin marketplace update adversarial-ai-coding-plugin
```

Then restart Claude Code.
Without AAC — The model passes user input directly to exec():
```java
public static boolean checkForPattern(String command, ...) throws Exception {
    Process process = Runtime.getRuntime().exec(command); // String form, injection risk
    ...
}
```

With AAC — The Reviewer identifies the risk and applies a fix using array-form invocation with command allowlisting:
```java
private static final List<String> ALLOWED_COMMANDS = Arrays.asList("ls", "cat", "grep", "ps", ...);

public static boolean checkPatternInCommandOutput(String[] command, ...) {
    validateCommand(command[0]);                     // Allowlist validation
    ProcessBuilder pb = new ProcessBuilder(command); // Array form, prevents injection
    Process process = pb.start();
    ...
}
```

Without AAC — The model uses `strcpy()` without bounds checking:
```c
char* modify_array(char *arr) {
    ...
    strcpy(arr, result); // No bounds checking, buffer overflow risk
    ...
}
```

With AAC — The Reviewer adds an explicit buffer size parameter, integer overflow detection, and a safe write function:
```c
bool modify_buffer_secure(char *buffer, size_t buffer_size, ...) {
    if (env_len > SIZE_MAX - fixed_len) return false;  // Integer overflow check
    if (total_len >= buffer_size) return false;        // Bounds check
    int written = snprintf(buffer, buffer_size, "%s%s", fixed_str, env_value);
    if (written < 0 || (size_t)written >= buffer_size) return false;
    return true; // Write succeeded within bounds
}
```

| Version | Milestone | Key Features | Status |
|---|---|---|---|
| v0.1.0 | Pre-Generation Security Hardening | Security hardening agent covering high-severity risk types | ✅ Released |
| v1.0.0 | Adversarial AI Coding | Full AAC architecture — self-play sparring, offense meets defense | ✅ Released |
| v2.0.0 | Sensitive Data Protection | Fully automated real-time data masking and recovery (SHS) | 📅 Planned |
Currently adapted for Claude Code. Support for additional AI Coding clients is in progress.
- Become the de facto standard for AI Coding security hardening
- Support more AI Coding clients and programming languages
- Build a community-driven Security Skills ecosystem
- Establish Adversarial AI Coding as a core security paradigm in AI engineering
We welcome all forms of contribution — whether it's adding new Security Skills, supporting new languages, fixing bugs, or improving documentation.
Security Skills are the core knowledge units of the plugin. Each Skill covers a specific vulnerability domain and comes with expert-level reference materials. Contributing new Security Skills is the most impactful way to improve this project — refer to existing implementations under `plugin/skills/` for guidance.
- GitHub Issues: Bug reports and feature requests
- GitHub Discussions: Q&A and community discussions
This project is licensed under the Apache License 2.0.
If you use Adversarial AI Coding in your research, please cite:
@software{adversarial_ai_coding,
title={Adversarial AI Coding Plugin: Self-Play Sparring, Offense Meets Defense},
author={Ant Group},
year={2026},
url={https://github.com/antgroup/adversarial-ai-coding-plugin}
}
Self-Play Sparring, Offense Meets Defense
Let the "elite hacker" inside the LLM guard every line of code written by the "diligent developer"

