Self-Play Sparring, Offense Meets Defense — Making AI Write Secure Code
Benchmark | Quick Start | Roadmap
LLMs are generating production code at an unprecedented pace, but that code harbors significant security risks. Published benchmarks show that even state-of-the-art commercial models produce vulnerable code up to 48.2% of the time, with top-tier models ranging from 37% to 95.6% (Source: AutoBaxBench, Dec 2025).
Paradoxically, these same SOTA LLMs excel at vulnerability discovery — Claude Opus 4.6 has uncovered hundreds of security vulnerabilities in open-source projects and even independently developed a full exploit chain for a FreeBSD kernel remote code execution vulnerability.
Root cause: When generating code, the LLM's optimization target is functional correctness, not security. Security is merely an implicit constraint that is easily diluted under pressure to produce working code.
Adversarial AI Coding (AAC) simultaneously activates the "elite hacker" and the "diligent developer" latent within the model, forcing them into adversarial engagement within the same coding session. Through this self-play sparring mechanism, vulnerabilities are caught and fixed before the generated code ever reaches the developer.
The architecture features two built-in roles:
- Left Hand (Coder): Responds to the developer's AI Coding requests and generates functional code.
- Right Hand (Reviewer): Automatically activated at the end of each coding session, auditing the code from an attacker's perspective and fixing vulnerabilities in real time.
No prompt changes required. No workflow changes needed. Developers code as usual — adversarial review happens automatically.
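The two-role flow described above can be sketched in a few lines of Python. This is an illustrative simulation of the self-play loop, not the plugin's actual API — `adversarial_session` and `toy_llm` are hypothetical names:

```python
def adversarial_session(task, llm):
    """Minimal sketch of the AAC loop: the same model first acts as the
    Coder (Left Hand), then audits its own output as the Reviewer (Right Hand)."""
    code = llm("coder", task)            # Left Hand: generate functional code
    findings = llm("reviewer", code)     # Right Hand: audit from an attacker's view
    if findings:                         # remediate any reported vulnerabilities
        code = llm("coder", f"Fix these issues:\n{findings}\n\n{code}")
    return code

# Toy stand-in for a real model call, for illustration only.
def toy_llm(role, prompt):
    if role == "reviewer":
        return "SQL injection" if "f-string query" in prompt else ""
    if "Fix these issues" in prompt:
        return "parameterized query"
    return "f-string query"

print(adversarial_session("fetch user by id", toy_llm))  # "parameterized query"
```

In the real plugin the Reviewer is triggered automatically at the end of each coding session rather than called inline, but the adversarial structure is the same.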
| Feature | Description |
|---|---|
| Self-Play Adversarial | A single LLM plays both Coder and Reviewer, forcing internal adversarial engagement to fully unleash its security capabilities |
| Zero-Friction Automation | No manual triggers, no prompt engineering — the plugin integrates transparently into your coding workflow |
| Multi-Language, Multi-Risk | Covers Java, Python, C/C++, and JavaScript, addressing injection, command execution, buffer overflow, deserialization, XSS, and more |
| Full Lifecycle Coverage | Pre-generation security hardening + post-generation security audit, spanning the entire AI Coding lifecycle |
We evaluated the Adversarial AI Coding (AAC) architecture using public datasets (CyberSecEval, SecCodeBench) from two perspectives:
Perspective 1 — Vulnerability Reduction in Generated Code
Experiments conducted using Claude Code with GLM-5 / Kimi-K2.5 / MiniMax-M2.5 / Qwen3.5-397B-A17B:
| Metric | Result |
|---|---|
| Security audit trigger rate | ~80% |
| Overall vulnerability reduction | 79.5% |
Perspective 2 — Malicious Injection Detection
Code containing common OWASP vulnerabilities was injected into Claude Code sessions to simulate poisoned code:
| Metric | Result |
|---|---|
| Probability of entering security audit | 93% |
| Probability of identifying and fixing risks | 90% |
Note: AAC adds 22%–76% overhead to task execution time (varies by model). Given the additional security auditing and code remediation performed, we consider this overhead acceptable.
- Injection: SQL injection, NoSQL injection, template injection
- Command Execution: OS command injection, code injection
- File I/O: Path traversal, arbitrary file read/write
- Deserialization: Java / Python deserialization vulnerabilities
- Sensitive Data: Hardcoded credentials, information disclosure
- Access Control: Privilege escalation, SSRF
- XML: XXE (XML External Entity injection)
- Memory Corruption: Buffer overflow, Use-After-Free, Double Free
- Integer Safety: Integer overflow/underflow, signedness issues
- Dangerous Functions: Use of potentially dangerous functions
- Format String: Format string vulnerabilities
- Concurrency: Race conditions
- Command & Path: OS command execution, path traversal
- Query Injection: QL injection
- Injection: Code injection, QL injection, XSS, prototype pollution
- Command & Path: OS command execution, path traversal
- Network Security: SSRF, insecure transport
- Deserialization: Deserialization vulnerabilities
- Denial of Service: ReDoS
- Cryptography: Weak randomness, timing attacks
- Messaging: PostMessage origin validation
- Sensitive Data: Hardcoded credentials, Buffer handling issues
- Container Security: Privileged containers, misconfigured capabilities
- Network Exposure: Network exposure risks
- Storage Security: Host path mounts
- Secret Management: Hardcoded credentials
- Access Control: RBAC misconfiguration
- Dockerfile: Dockerfile security best practices
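As a concrete instance of the "Injection" category above, the following is a minimal sketch (in Python, using the standard-library `sqlite3` module) of the kind of before/after rewrite the Reviewer performs; the function names are illustrative, not taken from the plugin:

```python
import sqlite3

def get_user_unsafe(conn, username):
    # Vulnerable: user input is interpolated directly into the SQL string
    return conn.execute(
        f"SELECT id FROM users WHERE name = '{username}'").fetchone()

def get_user_safe(conn, username):
    # Fixed: a parameterized query keeps data separate from SQL syntax
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (username,)).fetchone()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

payload = "' OR '1'='1"                 # classic injection payload
print(get_user_unsafe(conn, payload))   # (1,) — payload matches every row
print(get_user_safe(conn, payload))     # None — payload treated as literal data
```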
| Dependency | Version |
|---|---|
| Python | >= 3.10 |
| Claude Code CLI | Latest |
1. Open Claude Code
2. Add the plugin marketplace
```shell
/plugin marketplace add https://github.com/antgroup/adversarial-ai-coding-plugin.git
```

3. Install the plugin

```shell
/plugin install adversarial-ai-coding
```

4. Restart Claude Code
Run `/exit` to quit, then relaunch Claude Code. The plugin will take effect immediately.
That's it. No configuration needed — the plugin works automatically. Just code as you normally would.
```shell
/plugin marketplace update adversarial-ai-coding-plugin
```

Then restart Claude Code.
Without AAC — The model passes user input directly to exec():
```java
public static boolean checkForPattern(String command, ...) throws Exception {
    Process process = Runtime.getRuntime().exec(command); // String form, injection risk
    ...
}
```

With AAC — The Reviewer identifies the risk and applies a fix using array-form invocation with command allowlisting:
```java
private static final List<String> ALLOWED_COMMANDS = Arrays.asList("ls", "cat", "grep", "ps", ...);

public static boolean checkPatternInCommandOutput(String[] command, ...) {
    validateCommand(command[0]);                     // Allowlist validation
    ProcessBuilder pb = new ProcessBuilder(command); // Array form, prevents injection
    Process process = pb.start();
    ...
}
```

Without AAC — The model uses `strcpy()` without bounds checking:
```c
char* modify_array(char *arr) {
    ...
    strcpy(arr, result); // No bounds checking, buffer overflow risk
    ...
}
```

With AAC — The Reviewer adds an explicit buffer size parameter, integer overflow detection, and a safe write function:
```c
bool modify_buffer_secure(char *buffer, size_t buffer_size, ...) {
    if (env_len > SIZE_MAX - fixed_len) return false;  // Integer overflow check
    if (total_len >= buffer_size) return false;        // Bounds check
    int written = snprintf(buffer, buffer_size, "%s%s", fixed_str, env_value);
    if (written < 0 || (size_t)written >= buffer_size) return false;
    return true; // Write succeeded within bounds
}
```

| Version | Milestone | Key Features | Status |
|---|---|---|---|
| v0.1.0 | Pre-Generation Security Hardening | Security hardening agent covering high-severity risk types | ✅ Released |
| v1.0.0 | Adversarial AI Coding | Full AAC architecture — self-play sparring, offense meets defense | ✅ Released |
| v2.0.0 | Sensitive Data Protection | Fully automated real-time data masking and recovery (SHS) | 📅 Planned |
Currently adapted for Claude Code. Support for additional AI Coding clients is in progress.
- Become the de facto standard for AI Coding security hardening
- Support more AI Coding clients and programming languages
- Build a community-driven Security Skills ecosystem
- Establish Adversarial AI Coding as a core security paradigm in AI engineering
We welcome all forms of contribution — whether it's adding new Security Skills, supporting new languages, fixing bugs, or improving documentation.
Security Skills are the core knowledge units of the plugin. Each Skill covers a specific vulnerability domain and comes with expert-level reference materials. Contributing new Security Skills is the most impactful way to improve this project — refer to existing implementations under `plugin/skills/` for guidance.
- GitHub Issues: Bug reports and feature requests
- GitHub Discussions: Q&A and community discussions
This project is licensed under the Apache License 2.0.
If you use Adversarial AI Coding in your research, please cite:
@software{adversarial_ai_coding,
title={Adversarial AI Coding Plugin: Self-Play Sparring, Offense Meets Defense},
author={Ant Group},
year={2026},
url={https://github.com/antgroup/adversarial-ai-coding-plugin}
}
Self-Play Sparring, Offense Meets Defense
Let the "elite hacker" inside the LLM guard every line of code written by the "diligent developer"

