Automated nightly codebase scanner #1573
Might be good to try out some alloy models in this and other actions. They could give the agents a deeper perspective and better overall results.
0d1c6d7 to 23e46fb
/review
23e46fb to 65081da
fixed important issues
65081da to 5a76f90
5a76f90 to b147fd1
sure! what did you have in mind? i'm using 4 different models right now (1 root, 3 sub-agents). is that what you mean? or, like, two sub-agents with different models that both do kind of the same thing, and then the root compares their output? edit: i just saw that you can combine models like |
changes have been made
Adds a read-only agent that scans the codebase daily and creates GitHub issues for security vulnerabilities, bugs, and documentation gaps.

- Runs daily at 6am UTC (or manual trigger)
- Creates max 2 issues per run to avoid flooding
- Deduplicates against existing open issues with 'nightly-scan' label
- Dry-run mode for testing
- Strong anti-hallucination rules (read-only, must verify files exist)
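
The cap, deduplication, and dry-run behavior listed above can be pictured with a short sketch. This is not the scanner's actual logic (the agents decide this from their prompts); the `Finding` type, the severity ranking, and the "same file path appears in an open issue body" heuristic are all assumptions made for illustration.

```python
from dataclasses import dataclass

MAX_ISSUES_PER_RUN = 2  # cap stated in the commit message

# Hypothetical ranking: the PR does not say how findings are prioritized.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    severity: str
    title: str


def plan_run(findings, open_issue_bodies, dry_run=False):
    """Return a list of (action, finding) pairs for a single run."""
    # Assumed heuristic: a finding is a duplicate if any open
    # 'nightly-scan' issue body already mentions the same file path.
    fresh = [
        f for f in findings
        if not any(f.file in body for body in open_issue_bodies)
    ]
    # Most severe first; unknown severities sort last.
    fresh.sort(key=lambda f: SEVERITY_RANK.get(f.severity, len(SEVERITY_RANK)))
    selected = fresh[:MAX_ISSUES_PER_RUN]
    # In dry-run mode nothing is created; the plan is only reported.
    action = "would create" if dry_run else "create"
    return [(action, f) for f in selected]
```

Calling `plan_run(findings, bodies, dry_run=True)` returns the same selection as a real run, only labeled "would create", which is one way to make a dry run easy to compare against a live one.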
- Root agent (claude-sonnet) orchestrates scan across sub-agents
- Security sub-agent (claude-opus) for vulnerability detection
- Bugs sub-agent (claude-sonnet) for logic errors and resource leaks
- Documentation sub-agent (claude-haiku) for doc gap detection
- Add GitHub Actions cache for persistent scanner memory
- Memory stores skip patterns, context, and feedback across runs
- Each sub-agent has strict grounding rules to prevent hallucinations
Agent improvements:
- Documentation agent reads all markdown files before analysis
- All sub-agents explicitly accept empty results as valid outcome
- Added read_multiple_files tool to documentation agent

Workflow improvements:
- Pin actions/cache to v4.2.0 SHA for security
- Use static cache key (matches cagent-action pattern)
- Replace custom JSON memory with cagent's SQLite memory toolset
- Agent uses get_memories/add_memory tools instead of file I/O
- Memory path resolves to .github/agents/scanner-memory.db
- Removes manual JSON initialization step
Sub-agents now return findings in simple text format:

    FILE: path/to/file.go
    LINE: 123
    SEVERITY: high
    ...

Benefits over JSON:
- More natural for LLM output
- Less prone to formatting errors
- Matches cagent-action PR review pattern

Root agent still outputs JSON for workflow parsing.
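
To make the text format concrete, here is a minimal parser sketch. It assumes one `KEY: value` pair per line and a blank line between findings; the PR only shows the `FILE`, `LINE`, and `SEVERITY` fields, so any other keys are passed through untouched. `parse_findings` is a hypothetical helper, not part of cagent. The `NO_ISSUES` sentinel is the one introduced later in this PR.

```python
import re


def parse_findings(text):
    """Parse blank-line-separated 'KEY: value' blocks into dicts."""
    if text.strip() == "NO_ISSUES":
        return []
    findings = []
    for block in re.split(r"\n\s*\n", text.strip()):
        record = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if not sep:
                continue  # skip lines that are not 'KEY: value'
            record[key.strip().lower()] = value.strip()
        # Keep only records that name a file and a numeric line.
        if "file" in record and record.get("line", "").isdigit():
            record["line"] = int(record["line"])
            findings.append(record)
    return findings
```

Records without a file or a numeric line are dropped rather than guessed at, which fits the grounding rules the sub-agents are given.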
Move issue creation from workflow to agent:
- New `reporter` sub-agent uses `gh` CLI to create issues
- Checks for duplicates before creating
- Selects appropriate labels based on category
- Workflow reduced from 175 lines to 55 lines

Dry-run mode now passed as prompt to agent.
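
As an illustration of what the reporter's `gh` calls might look like, the sketch below only builds argument lists and does not run anything. The category-to-label mapping is hypothetical (the PR does not list the labels), and the `in:body` search qualifier is one possible way to look for an existing issue that mentions the same file.

```python
SCAN_LABEL = "nightly-scan"  # label named in the PR

# Hypothetical mapping: the PR does not say which labels each category gets.
CATEGORY_LABELS = {
    "security": "security",
    "bug": "bug",
    "documentation": "documentation",
}


def duplicate_search_cmd(file_path):
    """argv that lists open scanner issues whose body mentions file_path."""
    return [
        "gh", "issue", "list",
        "--state", "open",
        "--label", SCAN_LABEL,
        "--search", f'"{file_path}" in:body',
        "--json", "number,title",
    ]


def create_issue_cmd(title, body, category):
    """argv that creates an issue with the scan label plus a category label."""
    labels = [SCAN_LABEL]
    extra = CATEGORY_LABELS.get(category)
    if extra:
        labels.append(extra)
    return [
        "gh", "issue", "create",
        "--title", title,
        "--body", body,
        "--label", ",".join(labels),
    ]
```

A caller could pass either list to `subprocess.run(cmd, check=True)`. Building argv lists rather than shell strings avoids quoting problems when titles or bodies contain special characters.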
Model assignments:
- Security: openai/o3-mini (reasoning model for subtle vulnerabilities)
- Bugs: google/gemini-2.5-flash (fast, good at Go code analysis)
- Documentation: anthropic/claude-haiku (sufficient for simpler task)
- Orchestrator: anthropic/claude-sonnet (coordination)
- Reporter: anthropic/claude-haiku (formatting + gh commands)

Workflow now passes all three API keys:
- ANTHROPIC_API_KEY
- OPENAI_API_KEY
- GOOGLE_API_KEY (mapped from GEMINI_API_KEY secret)
Fixes from code review:

1. Restrict shell permissions
   - Changed `gh *` to `gh issue list *` and `gh issue create *`
   - Principle of least privilege
2. CATEGORY field mismatch
   - Root agent now adds CATEGORY when forwarding to reporter
   - Added explicit "Forwarding to reporter" section with example
3. Inconsistent array/NO_ISSUES terminology
   - Changed "Return an empty array `[]`" to "Output `NO_ISSUES`"
   - Consistent across all three analysis agents
4. Documentation trigger clarity
   - Changed to "ONLY run if BOTH security AND bugs returned `NO_ISSUES`"
   - Unambiguous trigger condition
5. Better duplicate detection
   - Changed from downloading 100 issues to `gh issue list --search`
   - Searches by file path in issue body
6. Sub-agent failure handling
   - Added explicit error handling strategy
   - Log errors and continue with other agents
   - Report partial results if some agents fail
   - Added FAILED status to reporter output
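
The control flow described in items 2, 4, and 6 can be sketched roughly as follows. In the PR this behavior lives in the root agent's instructions, not in code, so this is only a model of the intended flow. `run_scan`, the callable-per-agent interface, and the lowercase status names are assumptions.

```python
NO_ISSUES = "NO_ISSUES"


def run_scan(agents):
    """agents maps a category to a zero-argument callable returning text."""
    results, errors = {}, {}

    def run(category):
        try:
            results[category] = agents[category]().strip()
        except Exception as exc:
            # Log the error and continue with the other agents.
            errors[category] = str(exc)

    run("security")
    run("bugs")
    # Documentation runs ONLY if BOTH security AND bugs returned NO_ISSUES.
    # A failed agent returned nothing, so it does not satisfy the condition.
    if results.get("security") == NO_ISSUES and results.get("bugs") == NO_ISSUES:
        run("documentation")

    # The root agent adds CATEGORY before forwarding to the reporter.
    forwarded = [
        f"CATEGORY: {category}\n{text}"
        for category, text in results.items()
        if text != NO_ISSUES
    ]
    # Illustrative status names; partial results are still reported.
    if errors and not results:
        status = "failed"
    elif errors:
        status = "partial"
    else:
        status = "ok"
    return {"status": status, "forwarded": forwarded, "errors": errors}
```

Under this reading, a crash in the security agent also suppresses the documentation pass for that run, since the trigger requires both analysis agents to have explicitly returned `NO_ISSUES`.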
0620d32 to 50a396e
/review
Review Summary
I've analyzed the PR and found 2 issues worth noting:
- Logic flaw in documentation agent execution - Documentation issues will only be detected when there are NO security or bug issues, which contradicts the stated goal of comprehensive nightly scanning.
- Incomplete error handling for reporter agent failures - The orchestrator has error handling for analysis agents but not for the final reporter agent, which could lead to unreported failures when creating GitHub issues.
Both issues are in the agent configuration and represent design flaws rather than critical bugs. The workflow will function but may not behave as comprehensively as intended.
- Add rationale comment for documentation agent prioritization (only runs when no bugs/security issues to reduce noise)
- Add error handling for reporter agent failures
- Add output format for failed issue creation
Summary
Adds an automated nightly codebase scanner that uses a multi-agent architecture to detect code quality issues and automatically create GitHub issues for findings.
Key features:
Haiku), and issue reporter (Claude Haiku)
`gh` CLI with duplicate detection and a strict 2-issue-per-run limit
Architecture
Files Changed
- `.github/agents/nightly-scanner.yaml` - Multi-agent configuration with instructions, toolsets, and permissions
- `.github/workflows/nightly-scan.yml` - GitHub Actions workflow (runs daily at 6am UTC, supports dry-run mode)
Test plan
- `dry-run: true` to verify agents run without creating issues
- `dry-run: true` on a branch with a known issue to verify issue creation