feat: add /bug-analysis workflow with static and semantic analysis by dean0x · Pull Request #226 · dean0x/devflow

dean0x · 2026-05-23T09:43:19Z

Summary

Adds a new devflow-bug-analysis plugin with a /bug-analysis command for proactive bug finding before merge
Combines tiered static analysis (semgrep, snyk, codeql) with parallel semantic BugAnalyzer agents across four focus areas: security, functional, integration, and usability
Extends /resolve to fall back to bug-analysis reports when no code review exists

Changes

New files:

shared/agents/bug-analyzer.md — Opus-powered BugAnalyzer agent with 5-step methodology (read diff → load plan → focus-specific analysis → self-verify → classify). Iron Law: every finding verified against actual code before reporting
plugins/devflow-bug-analysis/.claude-plugin/plugin.json — Plugin manifest
plugins/devflow-bug-analysis/commands/bug-analysis.md — 7-phase orchestration command (pre-flight, static analysis, context loading, file analysis, parallel bug analysis, synthesis, finalize)

Modified files:

shared/agents/synthesizer.md — Added bug-analysis mode: reads focus reports, cross-tracks confidence with static findings, deduplicates same file:line across analyzers, merges acceptance criteria coverage
shared/skills/resolve:orch/SKILL.md — Phase 1 extended to fall back to .devflow/docs/bug-analysis/ when no unresolved review found
plugins/devflow-resolve/commands/resolve.md — Step 0c extended with bug-analysis fallback (reviews take priority)
src/cli/plugins.ts — devflow-bug-analysis registered in DEVFLOW_PLUGINS (now 21 plugins)
CLAUDE.md — Plugin table updated (21 plugins, row added for devflow-bug-analysis)

Breaking Changes

None. All changes are additive. /resolve behavior is unchanged when a code review report exists.

Reviewer Focus Areas

Bug analyzer agent (shared/agents/bug-analyzer.md): confidence scale matches reviewer.md pattern; plan-context modifier (+10% for acceptance rule citations, -15% ceiling without plan) is new — verify it is calibrated correctly
Synthesizer bug-analysis mode: cross-tracking logic (static-confirmed +10%, same file:line dedup +10% per agent) — verify the merge rules make sense
Command phase structure (bug-analysis.md): Step 2 sub-steps all have Produces/Requires annotations (required by ambient test); verify static tool invocations are correct CLI usage
Resolve fallback: reviews take priority over bug-analysis — verify the priority logic in Step 0c-5b is sound and edge cases are covered

New devflow-bug-analysis plugin providing proactive bug finding before merge: - BugAnalyzer agent (Opus) with 5-step methodology: read diff, load plan context, focus-specific analysis, self-verify every finding, classify and report - Four focus areas: security (+ static findings integration), functional, integration (conditional), usability (conditional) - bug-analysis mode added to Synthesizer for cross-agent deduplication and confidence boosting - /bug-analysis command: incremental analysis, tiered static tools (semgrep -> snyk -> codeql), plan-based acceptance criteria coverage - /resolve extended to fall back to bug-analysis reports when no code review exists - Plugin registered in DEVFLOW_PLUGINS (21 plugins total) Co-Authored-By: Claude <noreply@anthropic.com>

- Add bug-analyzer.md to .gitignore (distributed agent copy should not be tracked) - Remove tracked distributed copy from git index - Update CLAUDE.md: shared agents count 14->15, plugin count 20->21 in project structure, add /bug-analysis to command roster, add bug-analysis directory to docs artifacts tree, add BugAnalyzer to persisting agents, add bug-analyzer to model strategy - Fix plugin.json version 1.0.0 -> 1.8.3 to match monorepo convention

…ch and review documentation for the /bug-analysis feature implementation on feat/bug-analysis branch.

dean0x · 2026-05-23T19:55:31Z

🔴 CRITICAL SECURITY: Symlink attack vulnerability via predictable /tmp paths

File: plugins/devflow-bug-analysis/commands/bug-analysis.md (lines 111-112)
Confidence: 85%

The CodeQL step uses hardcoded /tmp/codeql-db and /tmp/codeql-results.sarif paths. On multi-user systems, an attacker could place a symlink at /tmp/codeql-db before the command runs, redirecting database creation to an arbitrary location. Concurrent /bug-analysis runs would also clobber each other's results.

Fix:

CODEQL_TMP=$(mktemp -d) && \
codeql database create "${CODEQL_TMP}/codeql-db" --language={detected-language} --source-root=. 2>/dev/null && \
codeql database analyze "${CODEQL_TMP}/codeql-db" --format=sarif-latest --output="${CODEQL_TMP}/codeql-results.sarif" 2>/dev/null
# Clean up after parsing: rm -rf "${CODEQL_TMP}"

Review by Claude Code / devflow:security

dean0x · 2026-05-23T19:55:33Z

🔴 CRITICAL SECURITY: Shell injection via unquoted filename expansion

File: plugins/devflow-bug-analysis/commands/bug-analysis.md (lines 98-99)
Confidence: 82%

The CHANGED_FILES variable is built by space-joining filenames, then expanded unquoted into the semgrep command:

CHANGED_FILES=$(git diff --name-only {DIFF_RANGE} | tr '\n' ' ')
semgrep scan --config auto --sarif --quiet {CHANGED_FILES} 2>/dev/null

If a tracked file has shell metacharacters (spaces, semicolons, backticks, $()) in its name, the expansion could split incorrectly or execute injected commands.

Fix:

git diff --name-only {DIFF_RANGE} | xargs -d '\n' semgrep scan --config auto --sarif --quiet 2>/dev/null

Review by Claude Code / devflow:security

dean0x · 2026-05-23T19:55:35Z

🟠 SECURITY: Incomplete exclusion list in resolve fallback

Files: plugins/devflow-resolve/commands/resolve.md (lines 112-114), shared/skills/resolve:orch/SKILL.md (line 65)
Confidence: 83%

When /resolve falls back to bug-analysis directories, it needs to exclude bug-analysis-specific files:

review-summary.md (already excluded)
resolution-summary.md (already excluded)
bug-analysis-summary.md (synthesizer output — NEW)
static-findings.md (raw tool output — NEW)

Without these, Resolver agents will attempt to process synthesizer meta-commentary and raw static tool output as actionable issues.

Fix: Update the exclusion list in both files:

**Exclude from issue extraction:**
- `review-summary.md`
- `resolution-summary.md`
- `bug-analysis-summary.md`
- `static-findings.md`

Review by Claude Code / devflow:security

dean0x · 2026-05-23T19:55:37Z

🔴 CRITICAL RELIABILITY: No execution timeout on static analysis tools

File: plugins/devflow-bug-analysis/commands/bug-analysis.md (lines 97-113)
Confidence: 90%

The three static analysis invocations have no maximum execution time:

semgrep scan
snyk code test
codeql database create + analyze

On large codebases, these tools can run for minutes to hours. A single /bug-analysis run could hang indefinitely.

Fix: Add explicit timeouts to each invocation:

timeout 120 semgrep scan --config auto --sarif --quiet {CHANGED_FILES} 2>/dev/null
timeout 180 snyk code test --sarif . 2>/dev/null
timeout 300 codeql database create ... 2>/dev/null
timeout 300 codeql database analyze ... 2>/dev/null

Document the timeout budget in the Edge Cases table.

Review by Claude Code / devflow:reliability

dean0x · 2026-05-23T19:55:39Z

🟠 RELIABILITY: Unbounded character size for STATIC_FINDINGS

File: plugins/devflow-bug-analysis/commands/bug-analysis.md (lines 116-122)
Confidence: 85%

The STATIC_FINDINGS table is capped at 50 rows by severity, but each finding's Description column has no length limit. A noisy project could pass a very large table that consumes significant context in the BugAnalyzer agents.

Fix: Add a token/character budget:

Cap STATIC_FINDINGS at:
- Top 50 rows by severity, AND
- Total 4000 characters
- Truncate Description column to 120 characters per row if needed

Review by Claude Code / devflow:reliability

dean0x · 2026-05-23T19:55:41Z

🟡 RELIABILITY: Unbounded directory scan in resolve fallback

File: plugins/devflow-resolve/commands/resolve.md (lines 75-81)
Confidence: 80%

The bug-analysis fallback lists all directories under .devflow/docs/bug-analysis/{branch-slug}/ with no upper bound. A user who runs /bug-analysis frequently could accumulate hundreds of timestamped directories.

Fix: Add a scan limit to Step 5b:

Scan the 10 most recent directories (by name, descending).
If none qualify within the last 10, report 'No unresolved bug analysis found.'

Review by Claude Code / devflow:reliability

dean0x · 2026-05-23T19:55:43Z

🟠 ARCHITECTURE: Resolve fallback creates implicit coupling

Files: plugins/devflow-resolve/commands/resolve.md (lines 71-81), shared/skills/resolve:orch/SKILL.md (lines 33-39)
Confidence: 85%

The resolve command now contains hardcoded knowledge of bug-analysis internals:

Directory structure: .devflow/docs/bug-analysis/{branch-slug}/
Focus file names: security.md, functional.md, integration.md, usability.md

This creates tight coupling — if bug-analysis adds a new focus area, resolve must be updated in lockstep. File names are duplicated across three locations: bug-analysis command, resolve command, and resolve:orch skill.

Fix: Replace hardcoded focus names with glob pattern detection:

Contains at least one *.md file (excluding bug-analysis-summary.md, static-findings.md, resolution-summary.md)

This matches the pattern already used for review directories.

Review by Claude Code / devflow:architecture

dean0x · 2026-05-23T19:55:45Z

🟠 ARCHITECTURE: BugAnalyzer missing pattern skill declarations

Files: shared/agents/bug-analyzer.md (lines 1-11), plugins/devflow-bug-analysis/.claude-plugin/plugin.json (lines 25-29)
Confidence: 85%

The BugAnalyzer agent declares skills in frontmatter but the plugin.json manifest is incomplete:

Agent frontmatter declares:

devflow:security
devflow:reliability
devflow:worktree-support
devflow:apply-decisions
devflow:apply-feature-knowledge

plugin.json skills array only has:

agent-teams
worktree-support
apply-feature-knowledge

Missing from plugin.json:

apply-decisions
security (should list all pattern skills)
reliability

Also note: The agent has 4 focus areas (security, functional, integration, usability) but only loads pattern skills for security/reliability. The functional, integration, and usability focuses lack specialized pattern skill support (vs. how Reviewer loads specific pattern skills per focus).

Fix: Add missing declarations to plugin.json:

"skills": [
  "agent-teams",
  "worktree-support",
  "apply-feature-knowledge",
  "apply-decisions",
  "security",
  "reliability"
]

Review by Claude Code / devflow:architecture

dean0x · 2026-05-23T19:56:03Z

Code Review Summary

PR: #226 — Add /bug-analysis workflow with static and semantic analysis
Branch: feat/bug-analysis → main
Status: CHANGES_REQUESTED

Findings Summary

Category	HIGH	MEDIUM	Total
Blocking	4	1	5
Should Fix	0	3	3
Pre-existing	0	1	1

🔴 Blocking Issues (≥80% confidence)

Eight high-confidence blocking issues have been posted as detailed comments above:

Security: Symlink attack via predictable /tmp paths (85%)
Security: Shell injection via unquoted filename expansion (82%)
Security: Incomplete resolve exclusion list (83%)
Reliability: Missing execution timeouts on static tools (90%)
Reliability: Unbounded STATIC_FINDINGS character size (85%)
Reliability: Unbounded directory scan in resolve fallback (80%)
Architecture: Resolve fallback implicit coupling (85%)
Architecture: BugAnalyzer incomplete skill declarations (85%)

🟡 Lower-Confidence Suggestions (60-79%)

These findings were below the 80% threshold but are worth noting:

Ambiguous/Deferred Issues

PR_DESCRIPTION untrusted content handling (70%, security.md line 54)
- The command captures PR_DESCRIPTION without a containment marker note, though agent-level protection exists. Mitigated by explicit agent instructions.
No cleanup of CodeQL artifacts on partial failure (65%, security.md line 56)
- If CodeQL database creation succeeds but analysis fails, /tmp/codeql-db remains (potential resource leak). Minor concern given the mktemp fix above will isolate this.
Bug-analysis lacks multi-worktree support (72%, architecture.md line 62)
- Unlike /code-review and /resolve, /bug-analysis only operates on the current branch. May be intentional for v1.
Static analysis runs in orchestrator, not agent (65%, architecture.md line 64)
- Phase 2 (static analysis) executes directly in the command orchestrator rather than delegating to an agent. This blurs the project convention of "commands spawn agents only." The static phase is infrastructure setup, so this is arguably acceptable.
No apply-decisions in plugin.json (70%, architecture.md line 67)
- Same as issue 🔒 SECURITY: Implement script integrity verification #8 above — the agent declares it but plugin.json doesn't list it. Works due to universal skill installation but understates dependencies.
SARIF parsing has no error recovery (70%, reliability.md line 46)
- If SARIF output is malformed or truncated, parsing fails silently. Consider documenting a "SARIF parse failure" edge case.
BugAnalyzer finding count uncapped (65%, reliability.md line 48)
- No maximum on findings a single agent reports. Noisy diffs could consume excessive Synthesizer context. The consolidation rules (3+ similar = group) help but don't guarantee a bound.

📋 Pre-existing Issues

One pre-existing concern was flagged:

Synthesizer agent accumulates responsibilities (82%, architecture.md line 55)
- The agent now handles 6 modes (exploration, planning, review, bug-analysis, design, research). Adding bug-analysis mode pushed it to 401 lines, exceeding the 300-line warning threshold. No immediate refactoring needed, but growth should be monitored.

✅ Well-Executed Areas

The new plugin demonstrates strong architecture in several ways:

Proper phase decomposition: Bug-analysis command follows the established Phase Protocol with Produces/Requires annotations
Honest security practices: PR_DESCRIPTION correctly wrapped in containment markers; static findings validated before reporting
Incremental by default: Built-in support for repeat analyses without reprocessing unchanged code
Proper agent delegation: Synthesizer and BugAnalyzer agents handle analysis; command is orchestration-only
Clear /resolve integration: Bug-analysis fallback maintains priority semantics (reviews > bug-analysis)

🎯 Recommended Next Steps

Fix all HIGH blocking issues immediately — they are within the scope of this PR and non-disruptive
MEDIUM issues — Also addressable in this PR (resolve exclusion list, implicit coupling fixes)
Review the lower-confidence (60-79%) suggestions — Some warrant immediate fixes (SARIF error recovery, CodeQL cleanup); others can be deferred (multi-worktree support, agent-level concerns)

Review by Claude Code / devflow

…ssues - sec-1: Replace hardcoded /tmp/codeql-db and /tmp/codeql-results.sarif with mktemp -d unique temp directory to prevent symlink attacks and concurrent process clobbering; add rm -rf cleanup in always-runs position - sec-2: Replace unquoted CHANGED_FILES expansion into semgrep with xargs -d newline to prevent shell injection from filenames with metacharacters - perf-1: Scope snyk code test to changed files via xargs instead of scanning entire project directory - rel-1: Wrap semgrep, snyk, and codeql with timeout command to prevent multi-hour hangs (300s fast tools, 600s CodeQL) - rel-2: Add 200-character truncation to Description column entries to bound serialized size of STATIC_FINDINGS passed to BugAnalyzer agents Co-Authored-By: Claude <noreply@anthropic.com>

…roduced when the /bug-analysis plugin was added: - Directory tree omitted resolution-summary.md from the bug-analysis path - Persisting agents line listed only the reviews-mode Resolver output path - Incremental Reviews paragraph made no mention of /bug-analysis and its analogous .last-analysis-head tracking mechanism

…solve:orch - Extend exclusion list in both resolve.md and resolve:orch/SKILL.md to exclude bug-analysis-summary.md and static-findings.md, preventing parse failures when /resolve targets a bug-analysis directory - Update Step 0b blocked error message to suggest /bug-analysis alongside /code-review - Add explicit 10-directory scan bound to bug-analysis fallback search in both files Co-Authored-By: Claude <noreply@anthropic.com>

…patibility - Change Bugs Found section (CRITICAL/HIGH/MEDIUM/LOW headers) to the 3-category structure (BLOCKING / Should Fix / Pre-existing) that resolve:orch and /resolve use when parsing per-focus .md reports. CRITICAL/HIGH map to Blocking, MEDIUM to Should Fix, LOW to Pre-existing. - Add devflow:regression, devflow:consistency, devflow:complexity skills to BugAnalyzer frontmatter so functional/integration/usability focuses have pattern skills (mirrors how Reviewer loads focus-specific skills). - Remove agent-teams from devflow-bug-analysis plugin.json skills array since no -teams.md command variant exists for V1. Co-Authored-By: Claude <noreply@anthropic.com>

…ve fallback - Add devflow-bug-analysis plugin registration test in tests/plugins.test.ts, covering agents (git, bug-analyzer, synthesizer), skills (agent-teams, worktree-support, apply-feature-knowledge), command, and non-optional flag - Create tests/bug-analysis/structural.test.ts (36 tests) covering all 7 phases with Produces/Requires annotations, incremental .last-analysis-head semantics, static tool safety patterns (xargs, mktemp, timeout bounds, 200-char truncation), BugAnalyzer parallel spawning, Synthesizer mode, and resolve compatibility - Add tests/resolve/bug-analysis-fallback.test.ts (13 tests) covering priority invariant (reviews before bug-analysis), Step 0c-5b fallback path, 10-directory scan limit, exclusion list (bug-analysis-summary.md, static-findings.md), and error message guidance for both /code-review and /bug-analysis - Fix plugin.json skills[] to include agent-teams, matching plugins.ts registry (resolves pre-existing skill-references.test.ts failure) Co-Authored-By: Claude <noreply@anthropic.com>

…solve:orch Align resolve.md documentation with resolve:orch skill Phase Protocol, correct FIXMEs, and consolidate documentation to the skill source. Update bug-analyzer agent output format and resolve:orch Phase completion to account for bug-analysis workflow mode. Co-Authored-By: Claude <noreply@anthropic.com>

…eview cycle Update decisions.md, pitfalls.md, and feature knowledge bases with learnings from the bug-analysis implementation review. Co-Authored-By: Claude <noreply@anthropic.com>

…Resolution summary documenting the fixes and improvements made to bug-analysis implementation during the 2026-05-23 review cycle. Co-Authored-By: Claude <noreply@anthropic.com>

dean0x · 2026-05-24T06:45:18Z

PR Comment: bug-analysis.md line 99

xargs portability issue (reliability + security)

The -d '\n' flag is a GNU extension not available on macOS default xargs. Since devflow targets macOS (darwin platform), this fails with xargs: illegal option -- d on the primary development environment.

Impact: Static analysis silently never runs on macOS, undermining the hybrid static+semantic architecture.

Fix: Use tr '\n' '\0' | xargs -0 which is portable across GNU and BSD:

git diff --name-only {DIFF_RANGE} | tr '\n' '\0' | xargs -0 timeout 300 semgrep scan --config auto --sarif --quiet 2>/dev/null

Also update line 98 comment to say 'NUL-delimited' to match the actual implementation.

Confidence: 92% (reliability) + 82% (security: flag injection at line 106 uses same pattern). Flagged by: reliability, security, performance reviewers.

dean0x · 2026-05-24T06:45:20Z

PR Comment: bug-analysis.md line 106

Snyk per-file O(n) performance + flag injection vulnerability

Two issues compound here:

Performance (85% confidence): snyk code test runs once per file via xargs -I{}. Each invocation pays fixed startup cost, turning O(1) batch into O(n). With 20+ files, worst-case timeout is 300s × n instead of 300s.
Security (82% confidence): The -I{} pattern allows flag injection. A filename like --json-file-output=/tmp/evil would be parsed as a CLI flag, potentially altering behavior or writing to attacker-controlled locations.

Fix: Run Snyk Code once with project-level scan, then filter results to changed files:

timeout 300 snyk code test --sarif 2>/dev/null

Filter SARIF output to only include findings in files from git diff --name-only {DIFF_RANGE}.

Confidence: 85% (performance, HIGH) + 82% (security, MEDIUM). Flagged by: performance, security, reliability, architecture reviewers.

dean0x · 2026-05-24T06:45:22Z

PR Comment: bug-analysis.md line 119

CodeQL SARIF ordering ambiguity (reliability)

The bash code (lines 117-119) runs rm -rf "${CODEQL_TMP}" immediately after capturing exit status, but prose (line 121) says 'Parse SARIF output... immediately after... and before the rm -rf cleanup.' An orchestrator following the code literally would delete before parsing, losing results.

Fix: Restructure to make read-before-delete unambiguous:

CODEQL_TMP=$(mktemp -d)
timeout 600 codeql database create "${CODEQL_TMP}/db" --language={detected-language} --source-root=. 2>/dev/null && \
timeout 600 codeql database analyze "${CODEQL_TMP}/db" --format=sarif-latest --output="${CODEQL_TMP}/results.sarif" 2>/dev/null
CODEQL_EXIT=$?
# Parse SARIF output HERE — before cleanup

Then add cleanup block after parsing.

Confidence: 88% (HIGH severity, reliability). Flagged by: reliability reviewer.

dean0x · 2026-05-24T06:45:24Z

PR Comment: bug-analysis.md line 152

Plan artifact scan limit inconsistency (reliability)

Step 3 lists .devflow/docs/design/*.md with no upper bound, but resolve command's directory scans explicitly bound to '10 most recent' in both resolve.md (line 71) and resolve:orch (line 29). This creates reliability variance — one path bounded, other unbounded.

Fix: Add 10-directory scan bound to match the pattern:

1. List `.devflow/docs/design/*.md` — sort descending by filename (timestamps are naturally sortable), scan the 10 most recent

Confidence: 80% (MEDIUM severity, reliability). Flagged by: reliability reviewer.

dean0x · 2026-05-24T06:45:26Z

PR Comment: bug-analysis.md line 66

Redundant git diff invocations across phases (performance)

git diff --name-only {DIFF_RANGE} is computed independently in Step 2b (line 66), piped to Semgrep (line 99), piped to Snyk (line 106), and again in Phase 4 (line 166). Each invocation walks the same git object graph — four separate subprocess spawns for identical output.

This also creates consistency risk: if working tree changes between invocations (e.g., parallel process commits), different phases could see different file lists.

Fix: Compute once and reuse as variable:

# Step 2b: compute once, reuse everywhere
CHANGED_FILES=$(git diff --name-only {DIFF_RANGE})
if [ -z "$CHANGED_FILES" ]; then echo "No changes to analyze."; exit 0; fi

Then reference $CHANGED_FILES in Steps 2d and Phase 4 instead of re-running git command.

Confidence: 82% (MEDIUM severity, performance). Flagged by: performance reviewer.

dean0x · 2026-05-24T06:45:41Z

PR Comment: shared/agents/bug-analyzer.md line 113

BugAnalyzer severity-to-category mapping conflates code location with severity (architecture)

The category mapping uses severity as a proxy for location:

CRITICAL/HIGH → 'Issues in Your Changes (BLOCKING)'
MEDIUM → 'Issues in Code You Touched (Should Fix)'
LOW → 'Pre-existing Issues (Not Blocking)'

But the Reviewer agent (established pattern) determines categories based on where the issue occurs (in your changes vs same function vs untouched code), not severity. A CRITICAL bug in untouched code should still be 'Pre-existing' per review methodology.

While BugAnalyzer focuses on changed code (diff-first principle per line 124), the semantic mapping creates downstream parsing confusion — /resolve may misclassify findings as pre-existing and give lower priority.

Fix: Document that all BugAnalyzer findings are in changed code by design, so severity-to-category mapping is reasonable for /resolve compatibility. Add clarifying comment:

**Category mapping** (for /resolve compatibility — approximation since BugAnalyzer focuses on diff-changed code):

Confidence: 82% (MEDIUM severity, architectural). Flagged by: architecture, regression reviewers.

dean0x · 2026-05-24T06:45:43Z

PR Comment: shared/agents/bug-analyzer.md line 188

Bug-analyzer summary table format diverges from reviewer pattern (consistency)

The output template uses flat Severity | Count summary table, but reviewer agent (established pattern) uses Category | CRITICAL | HIGH | MEDIUM | LOW matrix. Since BugAnalyzer was explicitly changed to adopt reviewer's 3-category structure, summary table should follow same matrix pattern for downstream parser consistency.

/resolve pipeline and Synthesizer parse these reports programmatically — different table shape creates parsing inconsistency between review and bug-analysis reports.

Fix: Replace with reviewer's matrix format:

## Summary
| Category | CRITICAL | HIGH | MEDIUM | LOW |
|----------|----------|------|--------|-----|
| Blocking | {n} | {n} | - | - |
| Should Fix | - | - | {n} | - |
| Pre-existing | - | - | - | {n} |
| Suggestions | {n} | - | - | - |

**{Focus} Risk**: {CRITICAL | HIGH | MEDIUM | LOW | CLEAN}

Confidence: 85% (HIGH severity, consistency). Flagged by: consistency reviewer.

dean0x · 2026-05-24T06:45:45Z

PR Comment: shared/agents/bug-analyzer.md line 197

Bug-analyzer report omits Recommendation footer present in reviewer reports (consistency)

Reviewer agent's template ends with both **{Focus} Score**: {1-10} and **Recommendation**: {BLOCK | CHANGES_REQUESTED | APPROVED_WITH_CONDITIONS | APPROVED}.

Bug-analyzer has **{Focus} Risk**: {CRITICAL | HIGH | MEDIUM | LOW | CLEAN} but omits Recommendation footer. Risk label is semantically appropriate for bug analysis, but missing footer means Synthesizer must handle two distinct footer schemas when aggregating review and bug-analysis reports. This could cause Synthesizer to skip recommendation.

Fix: Add recommendation line after Risk line:

**{Focus} Risk**: {CRITICAL | HIGH | MEDIUM | LOW | CLEAN}
**Recommendation**: {BLOCK | CHANGES_REQUESTED | APPROVED_WITH_CONDITIONS | APPROVED}

Confidence: 82% (HIGH severity, consistency). Flagged by: consistency reviewer.

dean0x · 2026-05-24T06:45:56Z

PR Comment: shared/skills/resolve:orch/SKILL.md line 29 + plugins/devflow-resolve/commands/resolve.md line 71

Inconsistent scan limit between resolve.md and resolve:orch for review directory discovery (regression)

resolve:orch Phase 1 (line 29) now scans 'the 10 most recent' review directories, but resolve command's Step 0c (line 71) still uses 'select the latest' with no scan limit. Both correctly applied 10-directory limit for bug-analysis fallback, but only resolve:orch applied it to primary review directory scan.

This behavioral divergence is a regression risk — two resolve paths should agree on directory scanning behavior. /resolve (command) scans all reviews, while /resolve in ambient mode (via resolve:orch) limits to 10. If user has >10 resolved review directories, 11th-most-recent unresolved review could be missed.

Fix: Add 10-directory scan limit to resolve.md Step 0c to match resolve:orch:

3. **Otherwise:** sort directories by name descending (timestamps are naturally sortable), scan the 10 most recent directories only. Select the first that contains `review-summary.md` (complete review)

Confidence: 85% (HIGH severity, regression). Flagged by: regression, consistency reviewers.

dean0x · 2026-05-24T06:45:58Z

PR Comment: plugins/devflow-bug-analysis/.claude-plugin/plugin.json line 25

Plugin.json missing skill declarations for agent dependencies (consistency)

The bug-analyzer agent frontmatter declares 5 focus-specific skills (devflow:security, devflow:reliability, devflow:regression, devflow:consistency, devflow:complexity) and devflow:apply-decisions, but none appear in plugin.json skills array.

While Universal Skill Installation ensures they are present at runtime regardless, manifest gap means plugin's declared dependencies are incomplete — a developer inspecting plugin.json would not see full dependency surface.

The devflow-code-review plugin declares all its pattern skills, making bug-analysis plugin inconsistent with established convention.

Fix: Add missing skills to skills array in plugin.json:

"skills": [
  "agent-teams",
  "worktree-support",
  "apply-feature-knowledge",
  "apply-decisions",
  "security",
  "reliability",
  "regression",
  "consistency",
  "complexity"
]

Confidence: 85% (MEDIUM severity, consistency/architecture). Flagged by: consistency, architecture reviewers.

dean0x · 2026-05-24T06:46:14Z

PR Review Summary: /bug-analysis Workflow

Overall Assessment: CHANGES_REQUESTED (7 HIGH-confidence issues block merge; 3 MEDIUM should-fix; architectural design sound)

High-Confidence Issues (≥80%, Block Merge)

Reliability & Security (PRIMARY BLOCKERS)

xargs -d '\n' portability (92%) — Silent failure on macOS (primary platform). Use tr | xargs -0 (bug-analysis.md:99)
CodeQL SARIF read-after-delete (88%) — Ambiguous prose vs code order risks losing analysis results (bug-analysis.md:119)
Snyk per-file O(n) performance (85%) — Scales timeout linearly with file count; flag injection vector (bug-analysis.md:106)
Review directory scan limit divergence (85%) — resolve:orch applies limit, resolve.md doesn't; behavioral mismatch (resolve.md:71, resolve:orch/SKILL.md:29)

Consistency (PARSER COMPATIBILITY)
5. Summary table format divergence (85%) — Flat table vs 3-category matrix; Synthesizer expects consistent shape (bug-analyzer.md:188)
6. Missing Recommendation footer (82%) — Synthesizer must handle two footer schemas; may skip recommendation (bug-analyzer.md:197)
7. Plugin.json missing skill declarations (85%) — Agent declares skills, manifest doesn't list them; breaks discoverability (plugin.json:25)

Architecture & Semantics
8. Severity-to-category conflation (82%) — Maps severity to location, but Reviewer maps location to category; semantic mismatch (bug-analyzer.md:113)

Medium-Priority Issues (60-79% or SHOULD_FIX)

Redundant git diff invocations (82%) — Four subprocess spawns for identical output; consistency risk if working tree changes (bug-analysis.md:66)
Plan artifact scan limit inconsistency (80%) — No bound like resolve's 10-directory limit (bug-analysis.md:152)

Architecture Notes

✓ Design strengths: Parallel BugAnalyzer agents, incremental diffing, static + semantic hybrid, proper separation from Evaluator (applies ADR-004, ADR-006)

✓ Security: Strong injection prevention in PR_DESCRIPTION containment markers, timeouts on all static tools, mktemp for CodeQL isolation

⚠ Pre-existing pattern inconsistency: resolve:orch and resolve.md now diverge on scan limits — this PR surfaces that inconsistency

Inline Comments

See PR comments above for specific fixes and code examples for each issue. All HIGH-confidence findings have suggested remediation with confidence levels and reviewer attribution.

Next Steps: Address HIGH-confidence issues (estimated 2-3 hours). Consider opening a separate PR for medium-priority issues (redundant git diff, plan scan bound) which are improvements rather than blockers.

Claude Code | Devflow PR Review

…ng, and redundant git diff Replace xargs -d which is GNU-only and broken on macOS BSD xargs with tr plus xargs -0 for portability. Replace per-file snyk invocations with a single project-level snyk code test scan followed by SARIF filtering. Compute CHANGED_FILES once in Step 2b and reference it throughout to eliminate 3 redundant git diff calls. Capture CODEQL_SARIF before rm -rf to fix the ordering bug that would delete results before parsing. Document Semgrep and Snyk parallelism. Co-Authored-By: Claude <noreply@anthropic.com>

…ummary table with reviewer format - Document severity→category mapping as approximation (all BugAnalyzer findings are in diff-changed code, so location-based categories don't apply literally; LOW is placed in Pre-existing for urgency signaling only) - Replace flat Severity|Count summary table with reviewer's matrix format (Category × CRITICAL/HIGH/MEDIUM/LOW) for downstream parser consistency - Add Recommendation footer line (BLOCK/CHANGES_REQUESTED/APPROVED_WITH_CONDITIONS/APPROVED) to match reviewer agent schema used by Synthesizer and /resolve Co-Authored-By: Claude <noreply@anthropic.com>

…gin manifest - resolve.md Step 0c: add '10 most recent directories only' scan bound to primary reviews path, matching the existing bug-analysis fallback (5b). Previously the reviews path had no limit; the fallback did — creating a behavioral divergence when more than 10 directories exist. - CLAUDE.md: correct misleading claim that both /code-review and /bug-analysis auto-discover worktrees. /bug-analysis is single-worktree only; /code-review does full multi-worktree discovery. - devflow-bug-analysis plugin.json: add the 6 skills that bug-analyzer.md declares in its frontmatter (security, reliability, regression, consistency, complexity, apply-decisions) so the manifest matches the code-review plugin pattern and is complete for discoverability. - resolve:orch SKILL.md: add parenthetical to Phase 5 reference so cross-references to resolve.md Phase 4 are unambiguous. - bug-analysis.md Plan Artifact: add '10 most recent' scan bound to the design listing step, matching the established pattern in resolve.md. Co-Authored-By: Claude <noreply@anthropic.com>

…rage - Add resolve:orch describe block to bug-analysis-fallback.test.ts asserting Phase 1 bug-analysis fallback, 10-directory scan limit, exclusion list (bug-analysis-summary.md, static-findings.md), and error handling mentions both /code-review and /bug-analysis (resolves resolve-orch:29) - Add Groups 7-8 to structural.test.ts: output format section headers (Issues in Your Changes/Code You Touched/Pre-existing), severity-to-category mapping, summary matrix, Recommendation footer, and frontmatter skill declarations (regression, consistency, complexity) (resolves bug-analyzer:111, bug-analyzer:8) - Replace fragile if/else conditional with unconditional extractSection assert on ## Edge Cases section in resolve.md (resolves fallback-test:112) - Remove redundant loadFile call on line 34; reuse existing content variable (resolves fallback-test:34) - Fix resolve:orch SKILL.md to use {worktree} placeholder in decisions-index.cjs invocation (was "." — breaks worktree-aware resolution) - Align devflow-bug-analysis entry in plugins.ts with plugin.json skills array (was missing apply-decisions, complexity, consistency, regression, reliability, security) Co-Authored-By: Claude <noreply@anthropic.com>

…g-analysis tests

dean0x · 2026-05-24T09:12:32Z

tests/resolve/bug-analysis-fallback.test.ts:126

Unsafe .slice(.search()) pattern (90% confidence)\n\nThe test uses .slice(phase1.search(/bug.analysis/i)) but .search() returns -1 if the anchor is not found. This causes .slice(-1) to return only the last character, which could silently pass the subsequent .toMatch() assertion.\n\nFix: Add explicit guard before slicing:\ntypescript\nconst bugAnalysisIdx = phase1.search(/bug.analysis/i);\nexpect(bugAnalysisIdx, "Phase 1 should contain bug-analysis reference").not.toBe(-1);\nconst bugAnalysisPart = phase1.slice(bugAnalysisIdx);\nexpect(bugAnalysisPart).toMatch(/10\\s+(most recent|directories)|scan.*10|10.*scan/i);\n\n\nThis pattern also exists at line 42 (pre-existing, same fix applies).\n\n— Claude Code

dean0x · 2026-05-24T09:12:35Z

tests/plugins.test.ts:251

Missing plugin skill assertions for newly added skills (85% confidence)\n\nThe test checks for agent-teams, worktree-support, and apply-feature-knowledge but PR added 6 new skills: apply-decisions, complexity, consistency, regression, reliability, security. These are declared in bug-analyzer.md frontmatter but not verified in the test.\n\nFix: Expand assertions to cover the new skills:\ntypescript\nexpect(bugAnalysis!.skills).toContain("apply-decisions");\nexpect(bugAnalysis!.skills).toContain("security");\nexpect(bugAnalysis!.skills).toContain("reliability");\nexpect(bugAnalysis!.skills).toContain("regression");\nexpect(bugAnalysis!.skills).toContain("consistency");\nexpect(bugAnalysis!.skills).toContain("complexity");\n\n\nWithout this, a future change could silently remove one of these skills from the manifest without test coverage.\n\n— Claude Code

dean0x · 2026-05-24T09:12:38Z

plugins/devflow-bug-analysis/commands/bug-analysis.md:109

Snyk filtering boundary moved to LLM execution (82% confidence)\n\nThe comment states "filter findings to only those whose file path appears in CHANGED_FILES." This filtering is documented but not enforced programmatically—it relies entirely on LLM implementation in the BugAnalyzer agent. If the filtering step is skipped or implemented incorrectly, findings from unrelated files could leak into the security analysis.\n\nFix (defense-in-depth): Add programmatic SARIF filtering in the bash block:\nbash\ntimeout 300 snyk code test --sarif 2>/dev/null | \\\n node -e "const s=JSON.parse(require(\"fs\").readFileSync(0,\"utf8\")); s.runs?.forEach(r=>{r.results=r.results?.filter(res=>CHANGED.has(res.locations?.[0]?.physicalLocation?.artifactLocation?.uri))}); process.stdout.write(JSON.stringify(s))"\n\n\nNote: BugAnalyzers Step 4 self-verification already acts as a secondary filter, but this hardens the boundary per defense-in-depth principle.\n\n— Claude Code

dean0x · 2026-05-24T09:12:54Z

plugins/devflow-bug-analysis/commands/bug-analysis.md:116-128

CodeQL cleanup not guaranteed on orchestrator-level interruption (82% confidence)

The rm -rf "${CODEQL_TMP}" cleanup relies on reaching line 128, but there is no trap handler. If the Bash tool invocation itself is interrupted (context compaction, session abort), the temp directory is orphaned.

Fix: Add explicit trap to guarantee cleanup within the bash block:

CODEQL_TMP=$(mktemp -d)
trap 'rm -rf "${CODEQL_TMP}"' EXIT
timeout 600 codeql database create "${CODEQL_TMP}/db" ...
# ... rest of block

Note: This approach handles cleanup within a single Bash invocation. Cross-invocation cleanup would require orchestrator support, which is beyond this PR's scope.

— Claude Code

dean0x · 2026-05-24T09:12:57Z

shared/agents/bug-analyzer.md:113-118

Severity-to-category mapping conflates location-based and severity-based classification (82% confidence)

The mapping (CRITICAL/HIGH → Blocking, MEDIUM → Should Fix, LOW → Pre-existing) uses severity, not location. A LOW-severity bug in newly-added code gets classified as "Pre-existing (Not Blocking)" despite the developer writing it. This breaks the location-based Iron Law: "If you didn't add it, you don't own it."

Fix: Add a note clarifying the approximation in the agent documentation that all BugAnalyzer findings are developer-owned regardless of severity classification since they appear in changed code.

Alternatively, document in resolve:orch that BugAnalyzer findings bypass location-based category semantics.

— Claude Code

dean0x · 2026-05-24T09:13:00Z

plugins/devflow-bug-analysis/commands/bug-analysis.md (missing section)

Missing Phase Completion Checklist (82% confidence)

The resolve:orch SKILL.md has a Phase Completion Checklist to verify all phases executed. The bug-analysis.md command has a 7-phase pipeline but no checklist. This is a consistency gap with established orchestration command patterns.

Fix: Add a ## Phase Completion Checklist section after the ## Principles section to verify all phases were announced before reporting results (matching resolve:orch pattern).

— Claude Code

dean0x · 2026-05-24T09:13:03Z

tests/bug-analysis/structural.test.ts:298-312

Frontmatter skill assertions incomplete (82% confidence)

Group 8 tests 3 of 8 declared skills: devflow:regression, devflow:consistency, devflow:complexity. Missing assertions for: devflow:security, devflow:reliability, devflow:worktree-support, devflow:apply-decisions, devflow:apply-feature-knowledge.

Fix (option 1 — expand coverage): Add assertions for the missing 5 skills.

Or (option 2 — rename to set accurate expectations): Change describe block title to "frontmatter declares batch-2 skill additions" to document that this group only covers the batch-2 additions.

— Claude Code

dean0x · 2026-05-24T09:13:06Z

tests/resolve/bug-analysis-fallback.test.ts:114-148

Repeated extractSection calls (pattern regression) (80% confidence)

Group 5 (resolve:orch tests) calls extractSection(content, '## Phase 1:', '## Phase 2:') four times and extractSection(content, '## Phase 3:', '## Phase 4:') twice. Prior commits in this PR (eb12a02) deduplicated file loads to describe scope for Groups 1-4. This Group 5 pattern contradicts that refactoring.

Fix: Hoist the repeated extractions to describe scope (matching the deduplication pattern in Groups 1-4) to reduce redundant parsing.

— Claude Code

dean0x · 2026-05-24T09:13:17Z

Summary: Review Findings

Blocking Issues Fixed: 1
Should-Fix Issues: 5
Lower-Confidence Suggestions: 8

✅ Inline Comments Posted (≥80% Confidence)

Testing: Unsafe .slice(.search()) pattern (90%)
Testing: Missing plugin skill assertions (85%)
Regression: Path placeholder inconsistency (85%)
Security: Snyk filtering boundary (82%)
Reliability: CodeQL cleanup trap (82%)
Architecture: BugAnalyzer category mapping (82%)
Architecture: Missing Phase Completion Checklist (82%)
Testing: Frontmatter assertions incomplete (82%)
Testing: Repeated extractSection calls (80%)

💡 Lower-Confidence Suggestions (60-79%)

These are documented in the review reports but fall below the 80% threshold for blocking inline comments:

which vs command -v for tool detection (65%) — bug-analysis.md:80-82 — Use POSIX-portable command -v instead of which
CodeQL temp directory race window (70%) — bug-analysis.md:117-127 — Documented; mktemp already mitigates
PR_DESCRIPTION injection surface (72%) — bug-analysis.md:197 — Containment markers and explicit instructions provide defense
Snyk single-project scan performance (68%) — bug-analysis.md:109-113 — Trades per-file invocations for one scan + filtering; acceptable tradeoff
Phase 4 Requires annotation mismatch (72%) — bug-analysis.md:168 — Phase Protocol annotation says DIFF_RANGE but implementation uses CHANGED_FILES
resolve:orch Phase 2 feature-knowledge path hardcoding (75%) — shared/skills/resolve:orch/SKILL.md:61 — Minor inconsistency with decisions-index call
Feature Knowledge scan unbounded (80% pre-existing) — bug-analysis.md:152-155 — No cap on feature knowledge entries loaded; shared with /code-review
Plan artifact acceptance criteria unbounded (65%) — bug-analysis.md:161 — No row-count cap on parsed criteria

See review reports in .devflow/docs/reviews/feat-bug-analysis/2026-05-24_1205/ for full details.

— Claude Code

…in Phase 2 Phase 2 decisions-index.cjs call used {worktree} as a literal placeholder with no substitution, causing decisions-index.cjs to fail on every resolve:orch invocation. resolve:orch explicitly excludes multi-worktree flow (line 11), so the correct argument is "." (cwd). Aligns with the adjacent feature-knowledge stale check on line 61 which already used ".". Co-Authored-By: Claude <noreply@anthropic.com>

The test at line 251 only asserted the original 3 skills; after batch-2 added apply-decisions, security, reliability, regression, consistency, and complexity to the plugin manifest, no test caught those additions. Add explicit assertions for all 6 so a future manifest change is caught immediately. Co-Authored-By: Claude <noreply@anthropic.com>

… 5 section extractions - Add explicit expect(idx).not.toBe(-1) guards before .slice() at lines 42 and 126 so that missing sub-anchors produce a clear assertion failure rather than silently operating on the last character of the parent string. - Hoist phase1 and phase3 extractSection calls to describe scope in Group 5 (resolve:orch SKILL.md), matching the deduplication pattern established in Groups 1-4 per commit eb12a02. Co-Authored-By: Claude <noreply@anthropic.com>

…phase checklist - snyk-filtering-boundary: Add jq-based programmatic SARIF filter on Snyk output to restrict findings to CHANGED_FILES before LLM processing. Implements defense-in- depth per ADR-006 (hybrid static + LLM): filtering boundary is now programmatic (data layer) + LLM (BugAnalyzer Step 4), not LLM alone. Falls back to raw SARIF if jq is unavailable. - codeql-cleanup-trap: Register `trap 'rm -rf "${CODEQL_TMP}"' EXIT INT TERM` immediately after mktemp so orphaned temp directories cannot accumulate on SIGTERM/SIGINT or session crash. Explicit rm -rf is retained as belt-and-suspenders; trap is cleared afterward. - missing-phase-checklist: Add ## Phase Completion Checklist section after Principles, listing all 7 phases with verification criteria. Follows the resolve:orch pattern (SKILL.md lines 161-174) so agents cannot silently skip phases. Co-Authored-By: Claude <noreply@anthropic.com>

…nd fix adoption test

… Add devflow-bug-analysis (plus explore, research, release) to marketplace.json. Update plugin tables, command references, CLI docs, file-organization, release-process, and docs-framework skill to reflect the bug-analysis workflow. Bump shared skill count from 58 to 66.

Dean Sharon and others added 3 commits May 23, 2026 12:42

docs: add /bug-analysis research and review artifacts Generate resear…

dc00f1a

…ch and review documentation for the /bug-analysis feature implementation on feat/bug-analysis branch.

Dean Sharon and others added 9 commits May 24, 2026 00:16

docs: add review and research artifacts for bug-analysis branch

e3d35de

chore(decisions,features): update auto-generated context from prior r…

fbe68a9

…eview cycle Update decisions.md, pitfalls.md, and feature knowledge bases with learnings from the bug-analysis implementation review. Co-Authored-By: Claude <noreply@anthropic.com>

docs(reviews): add resolution summary from bug-analysis review cycle …

4b0bd1d

…Resolution summary documenting the fixes and improvements made to bug-analysis implementation during the 2026-05-23 review cycle. Co-Authored-By: Claude <noreply@anthropic.com>

Dean Sharon and others added 5 commits May 24, 2026 11:03

refactor(tests): deduplicate file loads and section extractions in bu…

eb12a02

…g-analysis tests

Dean Sharon and others added 6 commits May 24, 2026 14:54

refactor(tests): replace fragile .search/.slice with extractSection a…

067e580

…nd fix adoption test

dean0x merged commit 54039d2 into main May 24, 2026
3 of 4 checks passed

dean0x deleted the feat/bug-analysis branch May 24, 2026 21:11

Conversation

dean0x commented May 23, 2026

Summary

Changes

Breaking Changes

Reviewer Focus Areas

Uh oh!

dean0x commented May 23, 2026

🔴 CRITICAL SECURITY: Symlink attack vulnerability via predictable /tmp paths

Uh oh!

dean0x commented May 23, 2026

🔴 CRITICAL SECURITY: Shell injection via unquoted filename expansion

Uh oh!

dean0x commented May 23, 2026

🟠 SECURITY: Incomplete exclusion list in resolve fallback

Uh oh!

dean0x commented May 23, 2026

🔴 CRITICAL RELIABILITY: No execution timeout on static analysis tools

Uh oh!

dean0x commented May 23, 2026

🟠 RELIABILITY: Unbounded character size for STATIC_FINDINGS

Uh oh!

dean0x commented May 23, 2026

🟡 RELIABILITY: Unbounded directory scan in resolve fallback

Uh oh!

dean0x commented May 23, 2026

🟠 ARCHITECTURE: Resolve fallback creates implicit coupling

Uh oh!

dean0x commented May 23, 2026

🟠 ARCHITECTURE: BugAnalyzer missing pattern skill declarations

Uh oh!

dean0x commented May 23, 2026

Code Review Summary

Findings Summary

🔴 Blocking Issues (≥80% confidence)

🟡 Lower-Confidence Suggestions (60-79%)

Ambiguous/Deferred Issues

📋 Pre-existing Issues

✅ Well-Executed Areas

🎯 Recommended Next Steps

Uh oh!

dean0x commented May 24, 2026

Uh oh!

dean0x commented May 24, 2026

Uh oh!

dean0x commented May 24, 2026

Uh oh!

dean0x commented May 24, 2026

Uh oh!

dean0x commented May 24, 2026

Uh oh!

dean0x commented May 24, 2026

Uh oh!

dean0x commented May 24, 2026

Uh oh!

dean0x commented May 24, 2026

Uh oh!

dean0x commented May 24, 2026

Uh oh!

dean0x commented May 24, 2026

Uh oh!

dean0x commented May 24, 2026

PR Review Summary: /bug-analysis Workflow

High-Confidence Issues (≥80%, Block Merge)

Medium-Priority Issues (60-79% or SHOULD_FIX)

Architecture Notes

Inline Comments

Uh oh!

dean0x commented May 24, 2026

Uh oh!

dean0x commented May 24, 2026

Uh oh!

dean0x commented May 24, 2026

Uh oh!

dean0x commented May 24, 2026

Uh oh!

dean0x commented May 24, 2026

Uh oh!

dean0x commented May 24, 2026

Uh oh!