Context
The fuzzer (scripts/fuzz-rtk.py) is currently run manually. After the agentic fuzzing lab week experiment, we have 139 static tests across 35 families that catch real regressions.
Proposal
Add a CI job that runs python3 scripts/fuzz-rtk.py --rounds 0 (static tests only, no LLM) on every PR. The job fails when the FAIL count exceeds a threshold, currently 22; all 22 existing failures are classified as by-design or known limitations.
Requirements
- Docker available in CI (for docker ps/images tests) or skip those families
- Python 3.10+ for the fuzzer script
- rtk binary built from PR branch
- Threshold stored in a config file so it can be tightened over time
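
As a rough sketch of the last requirement, the gate could read the allowed FAIL count from a small JSON config and turn the comparison into an exit code. The config key max_fail, the helper names, and the JSON format are assumptions made for illustration; nothing here reflects how scripts/fuzz-rtk.py reports its results today.

```python
import json
import shutil


def load_max_fail(config_text: str) -> int:
    """Read the allowed FAIL count from JSON text (hypothetical key: max_fail)."""
    return int(json.loads(config_text)["max_fail"])


def gate(fail_count: int, max_fail: int) -> int:
    """Return a CI exit code: 0 when within the threshold, 1 when it is exceeded."""
    return 0 if fail_count <= max_fail else 1


def docker_available() -> bool:
    """True if a docker executable is on PATH (does not check that the daemon runs)."""
    return shutil.which("docker") is not None


if __name__ == "__main__":
    # Hypothetical config content; in CI this would come from a checked-in file.
    max_fail = load_max_fail('{"max_fail": 22}')
    print("docker families:", "run" if docker_available() else "skip")
    print("exit code for 22 FAILs:", gate(22, max_fail))
    print("exit code for 23 FAILs:", gate(23, max_fail))
```

Keeping the threshold in a checked-in file means tightening it is an ordinary reviewed change: lowering max_fail in the same PR that fixes a known failure prevents that failure from silently returning.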
Current baseline
- 139 tests, 95 PASS, 12 WARN, 22 FAIL, 10 SKIP
- Failure rate: 15.8% (all classified)
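
The baseline figures can be sanity-checked with a few lines of Python. This sketch assumes a summary line shaped like the one above; the fuzzer's actual output format is not specified here, so parse_summary is a hypothetical helper.

```python
import re


def parse_summary(line: str) -> dict[str, int]:
    """Extract the total and per-status counts from a summary line (assumed format)."""
    total = re.search(r"(\d+)\s+tests", line)
    if total is None:
        raise ValueError("no test total found in summary line")
    counts = {"TOTAL": int(total.group(1))}
    for number, label in re.findall(r"(\d+)\s+(PASS|WARN|FAIL|SKIP)", line):
        counts[label] = int(number)
    return counts


baseline = parse_summary("139 tests, 95 PASS, 12 WARN, 22 FAIL, 10 SKIP")

# The four status counts should account for every test.
status_sum = sum(baseline[k] for k in ("PASS", "WARN", "FAIL", "SKIP"))
assert status_sum == baseline["TOTAL"]

# Failure rate as a percentage of all tests, including skipped ones.
failure_rate = 100 * baseline["FAIL"] / baseline["TOTAL"]
print(f"failure rate: {failure_rate:.1f}%")  # failure rate: 15.8%
```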