Open
Description
Related Issues
- Addresses #68 (Agent skips features without testing)
- Addresses #69 (Test evidence storage)
Problem
Currently, the agent can call feature_mark_passing without any verification that the code actually works. This leads to:
- Features marked as "passing" that don't compile - No lint/type-check enforcement
- No evidence of what was tested - We don't know if the agent actually verified the feature
- Features stuck in "in_progress" after agent crash - No auto-recovery mechanism
- No retry logic - If a feature fails, no tracking of failure count or reason
Proposed Solution
I've implemented "Quality Gates" in my fork, which address all of these issues:
1. Auto-detect Linters and Type Checkers
# Automatic detection based on project config files
Linters: ESLint, Biome, ruff, flake8
Type-check: TypeScript (tsc), Python (mypy)
2. Enforce Quality Checks Before feature_mark_passing
# New MCP tool: feature_verify_quality
# Runs detected linters/type-checkers
# In strict mode, blocks marking if checks fail
3. Store Quality Results as Evidence
New database columns:
- quality_result - JSON with lint/type-check output (addresses #69)
- failure_reason - Why the feature failed
- failure_count - Number of retry attempts
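Steps 1 to 3 could fit together roughly as follows. This is a minimal sketch, not the code from the fork: the marker-file table, `detect_checks`, `build_quality_result`, `may_mark_passing`, and the shape of the `quality_result` JSON are all assumptions made for illustration, and actually running the tools is left out.

```python
import json
from pathlib import Path

# Hypothetical mapping: tool name -> (check kind, config files that suggest its use).
# These are the tools' usual config file names; the fork's real detection
# rules may differ.
MARKERS = {
    "eslint": ("lint", [".eslintrc.json", ".eslintrc.js", "eslint.config.js"]),
    "biome": ("lint", ["biome.json"]),
    "ruff": ("lint", ["ruff.toml", ".ruff.toml"]),
    "flake8": ("lint", [".flake8"]),
    "tsc": ("type_check", ["tsconfig.json"]),
    "mypy": ("type_check", ["mypy.ini", ".mypy.ini"]),
}


def detect_checks(project_dir):
    """Return {"lint": [...], "type_check": [...]} for tools whose config file exists."""
    root = Path(project_dir)
    found = {"lint": [], "type_check": []}
    for tool, (kind, files) in MARKERS.items():
        if any((root / name).is_file() for name in files):
            found[kind].append(tool)
    return found


def build_quality_result(outcomes):
    """Serialize per-tool outcomes into a JSON string for a quality_result column.

    outcomes maps tool name -> (exit_code, output_text); exit code 0 means passed.
    """
    checks = {
        tool: {"passed": code == 0, "output": output}
        for tool, (code, output) in outcomes.items()
    }
    overall = all(check["passed"] for check in checks.values())
    return json.dumps({"passed": overall, "checks": checks}, sort_keys=True)


def may_mark_passing(quality_result_json, strict_mode):
    """In strict mode a failed check blocks marking; otherwise it is only a warning."""
    passed = json.loads(quality_result_json)["passed"]
    return passed or not strict_mode
```

Keeping the stored evidence as one JSON document means a reviewer can see which tool failed and its output without re-running anything, and the strict-mode decision can be made from the stored record alone.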
4. Auto-clear Stuck Features
On agent startup, features stuck in in_progress for >1 hour are reset to pending with failure tracking.
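The startup recovery in step 4 might look something like the sketch below. It uses a plain in-memory `Feature` dataclass instead of the real SQLAlchemy model, and the `started_at` field, the `reset_stuck_features` name, the failure-reason text, and the choice to increment `failure_count` on reset are assumptions, not details taken from the fork.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Feature:
    # Hypothetical in-memory stand-in for a feature row.
    name: str
    status: str  # "pending", "in_progress", or "passing"
    started_at: Optional[datetime] = None  # assumed column: when work began
    failure_count: int = 0
    failure_reason: Optional[str] = None


# Threshold from the proposal: in_progress for more than 1 hour counts as stuck.
STUCK_AFTER = timedelta(hours=1)


def reset_stuck_features(features, now):
    """Reset features stuck in in_progress back to pending; return the ones reset."""
    reset = []
    for feature in features:
        if feature.status != "in_progress" or feature.started_at is None:
            continue
        if now - feature.started_at > STUCK_AFTER:
            feature.status = "pending"
            feature.started_at = None
            feature.failure_count += 1  # assumption: a reset counts as a failure
            feature.failure_reason = "reset on startup: in_progress for over 1 hour"
            reset.append(feature)
    return reset
```

Passing `now` in as a parameter, rather than reading the clock inside the function, keeps the threshold logic easy to test.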
Implementation Details
- ~400 lines in new quality_gates.py module
- Small changes to feature_mcp.py (adds feature_verify_quality, feature_report_failure tools)
- New DB columns (auto-migrated by SQLAlchemy)
- Backwards compatible: disabled by default
Configuration
// .autocoder/config.json
{
"quality_gates": {
"enabled": true,
"strict_mode": true, // Block marking if checks fail
"checks": {
"lint": true,
"type_check": true
}
}
}
Questions for Maintainers
- Would you be interested in this feature?
- Should strict_mode be the default, or should it just warn?
- Any concerns about adding DB columns for tracking?
Happy to submit a PR if there's interest. The implementation is already working in my fork.