Skip to content

feat: add /bug-analysis workflow with static and semantic analysis#226

Merged
dean0x merged 23 commits into
mainfrom
feat/bug-analysis
May 24, 2026
Merged

feat: add /bug-analysis workflow with static and semantic analysis#226
dean0x merged 23 commits into
mainfrom
feat/bug-analysis

Conversation

@dean0x
Copy link
Copy Markdown
Owner

@dean0x dean0x commented May 23, 2026

Summary

  • Adds a new devflow-bug-analysis plugin with a /bug-analysis command for proactive bug finding before merge
  • Combines tiered static analysis (semgrep, snyk, codeql) with parallel semantic BugAnalyzer agents across four focus areas: security, functional, integration, and usability
  • Extends /resolve to fall back to bug-analysis reports when no code review exists

Changes

New files:

  • shared/agents/bug-analyzer.md — Opus-powered BugAnalyzer agent with 5-step methodology (read diff → load plan → focus-specific analysis → self-verify → classify). Iron Law: every finding verified against actual code before reporting
  • plugins/devflow-bug-analysis/.claude-plugin/plugin.json — Plugin manifest
  • plugins/devflow-bug-analysis/commands/bug-analysis.md — 7-phase orchestration command (pre-flight, static analysis, context loading, file analysis, parallel bug analysis, synthesis, finalize)

Modified files:

  • shared/agents/synthesizer.md — Added bug-analysis mode: reads focus reports, cross-tracks confidence with static findings, deduplicates same file:line across analyzers, merges acceptance criteria coverage
  • shared/skills/resolve:orch/SKILL.md — Phase 1 extended to fall back to .devflow/docs/bug-analysis/ when no unresolved review found
  • plugins/devflow-resolve/commands/resolve.md — Step 0c extended with bug-analysis fallback (reviews take priority)
  • src/cli/plugins.tsdevflow-bug-analysis registered in DEVFLOW_PLUGINS (now 21 plugins)
  • CLAUDE.md — Plugin table updated (21 plugins, row added for devflow-bug-analysis)

Breaking Changes

None. All changes are additive. /resolve behavior is unchanged when a code review report exists.

Reviewer Focus Areas

  • Bug analyzer agent (shared/agents/bug-analyzer.md): confidence scale matches reviewer.md pattern; plan-context modifier (+10% for acceptance rule citations, -15% ceiling without plan) is new — verify it is calibrated correctly
  • Synthesizer bug-analysis mode: cross-tracking logic (static-confirmed +10%, same file:line dedup +10% per agent) — verify the merge rules make sense
  • Command phase structure (bug-analysis.md): Step 2 sub-steps all have Produces/Requires annotations (required by ambient test); verify static tool invocations are correct CLI usage
  • Resolve fallback: reviews take priority over bug-analysis — verify the priority logic in Step 0c-5b is sound and edge cases are covered

Dean Sharon and others added 3 commits May 23, 2026 12:42
New devflow-bug-analysis plugin providing proactive bug finding before merge:
- BugAnalyzer agent (Opus) with 5-step methodology: read diff, load plan context,
 focus-specific analysis, self-verify every finding, classify and report
- Four focus areas: security (+ static findings integration), functional,
 integration (conditional), usability (conditional)
- bug-analysis mode added to Synthesizer for cross-agent deduplication and
 confidence boosting
- /bug-analysis command: incremental analysis, tiered static tools (semgrep ->
 snyk -> codeql), plan-based acceptance criteria coverage
- /resolve extended to fall back to bug-analysis reports when no code review exists
- Plugin registered in DEVFLOW_PLUGINS (21 plugins total)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add bug-analyzer.md to .gitignore (distributed agent copy should not be tracked)
- Remove tracked distributed copy from git index
- Update CLAUDE.md: shared agents count 14->15, plugin count 20->21 in project
  structure, add /bug-analysis to command roster, add bug-analysis directory to
  docs artifacts tree, add BugAnalyzer to persisting agents, add bug-analyzer to
  model strategy
- Fix plugin.json version 1.0.0 -> 1.8.3 to match monorepo convention
…ch and review documentation for the /bug-analysis feature implementation on feat/bug-analysis branch.
@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented May 23, 2026

🔴 CRITICAL SECURITY: Symlink attack vulnerability via predictable /tmp paths

File: plugins/devflow-bug-analysis/commands/bug-analysis.md (lines 111-112)
Confidence: 85%

The CodeQL step uses hardcoded /tmp/codeql-db and /tmp/codeql-results.sarif paths. On multi-user systems, an attacker could place a symlink at /tmp/codeql-db before the command runs, redirecting database creation to an arbitrary location. Concurrent /bug-analysis runs would also clobber each other's results.

Fix:

CODEQL_TMP=$(mktemp -d) && \
codeql database create "${CODEQL_TMP}/codeql-db" --language={detected-language} --source-root=. 2>/dev/null && \
codeql database analyze "${CODEQL_TMP}/codeql-db" --format=sarif-latest --output="${CODEQL_TMP}/codeql-results.sarif" 2>/dev/null
# Clean up after parsing: rm -rf "${CODEQL_TMP}"

Review by Claude Code / devflow:security

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented May 23, 2026

🔴 CRITICAL SECURITY: Shell injection via unquoted filename expansion

File: plugins/devflow-bug-analysis/commands/bug-analysis.md (lines 98-99)
Confidence: 82%

The CHANGED_FILES variable is built by space-joining filenames, then expanded unquoted into the semgrep command:

CHANGED_FILES=$(git diff --name-only {DIFF_RANGE} | tr '\n' ' ')
semgrep scan --config auto --sarif --quiet {CHANGED_FILES} 2>/dev/null

If a tracked file has shell metacharacters (spaces, semicolons, backticks, $()) in its name, the expansion could split incorrectly or execute injected commands.

Fix:

git diff --name-only {DIFF_RANGE} | xargs -d '\n' semgrep scan --config auto --sarif --quiet 2>/dev/null

Review by Claude Code / devflow:security

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented May 23, 2026

🟠 SECURITY: Incomplete exclusion list in resolve fallback

Files: plugins/devflow-resolve/commands/resolve.md (lines 112-114), shared/skills/resolve:orch/SKILL.md (line 65)
Confidence: 83%

When /resolve falls back to bug-analysis directories, it needs to exclude bug-analysis-specific files:

  • review-summary.md (already excluded)
  • resolution-summary.md (already excluded)
  • bug-analysis-summary.md (synthesizer output — NEW)
  • static-findings.md (raw tool output — NEW)

Without these, Resolver agents will attempt to process synthesizer meta-commentary and raw static tool output as actionable issues.

Fix: Update the exclusion list in both files:

**Exclude from issue extraction:**
- `review-summary.md`
- `resolution-summary.md`
- `bug-analysis-summary.md`
- `static-findings.md`

Review by Claude Code / devflow:security

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented May 23, 2026

🔴 CRITICAL RELIABILITY: No execution timeout on static analysis tools

File: plugins/devflow-bug-analysis/commands/bug-analysis.md (lines 97-113)
Confidence: 90%

The three static analysis invocations have no maximum execution time:

  • semgrep scan
  • snyk code test
  • codeql database create + analyze

On large codebases, these tools can run for minutes to hours. A single /bug-analysis run could hang indefinitely.

Fix: Add explicit timeouts to each invocation:

timeout 120 semgrep scan --config auto --sarif --quiet {CHANGED_FILES} 2>/dev/null
timeout 180 snyk code test --sarif . 2>/dev/null
timeout 300 codeql database create ... 2>/dev/null
timeout 300 codeql database analyze ... 2>/dev/null

Document the timeout budget in the Edge Cases table.


Review by Claude Code / devflow:reliability

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented May 23, 2026

🟠 RELIABILITY: Unbounded character size for STATIC_FINDINGS

File: plugins/devflow-bug-analysis/commands/bug-analysis.md (lines 116-122)
Confidence: 85%

The STATIC_FINDINGS table is capped at 50 rows by severity, but each finding's Description column has no length limit. A noisy project could pass a very large table that consumes significant context in the BugAnalyzer agents.

Fix: Add a token/character budget:

Cap STATIC_FINDINGS at:
- Top 50 rows by severity, AND
- Total 4000 characters
- Truncate Description column to 120 characters per row if needed

Review by Claude Code / devflow:reliability

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented May 23, 2026

🟡 RELIABILITY: Unbounded directory scan in resolve fallback

File: plugins/devflow-resolve/commands/resolve.md (lines 75-81)
Confidence: 80%

The bug-analysis fallback lists all directories under .devflow/docs/bug-analysis/{branch-slug}/ with no upper bound. A user who runs /bug-analysis frequently could accumulate hundreds of timestamped directories.

Fix: Add a scan limit to Step 5b:

Scan the 10 most recent directories (by name, descending).
If none qualify within the last 10, report 'No unresolved bug analysis found.'

Review by Claude Code / devflow:reliability

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented May 23, 2026

🟠 ARCHITECTURE: Resolve fallback creates implicit coupling

Files: plugins/devflow-resolve/commands/resolve.md (lines 71-81), shared/skills/resolve:orch/SKILL.md (lines 33-39)
Confidence: 85%

The resolve command now contains hardcoded knowledge of bug-analysis internals:

  • Directory structure: .devflow/docs/bug-analysis/{branch-slug}/
  • Focus file names: security.md, functional.md, integration.md, usability.md

This creates tight coupling — if bug-analysis adds a new focus area, resolve must be updated in lockstep. File names are duplicated across three locations: bug-analysis command, resolve command, and resolve:orch skill.

Fix: Replace hardcoded focus names with glob pattern detection:

Contains at least one *.md file (excluding bug-analysis-summary.md, static-findings.md, resolution-summary.md)

This matches the pattern already used for review directories.


Review by Claude Code / devflow:architecture

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented May 23, 2026

🟠 ARCHITECTURE: BugAnalyzer missing pattern skill declarations

Files: shared/agents/bug-analyzer.md (lines 1-11), plugins/devflow-bug-analysis/.claude-plugin/plugin.json (lines 25-29)
Confidence: 85%

The BugAnalyzer agent declares skills in frontmatter but the plugin.json manifest is incomplete:

Agent frontmatter declares:

  • devflow:security
  • devflow:reliability
  • devflow:worktree-support
  • devflow:apply-decisions
  • devflow:apply-feature-knowledge

plugin.json skills array only has:

  • agent-teams
  • worktree-support
  • apply-feature-knowledge

Missing from plugin.json:

  • apply-decisions
  • security (should list all pattern skills)
  • reliability

Also note: The agent has 4 focus areas (security, functional, integration, usability) but only loads pattern skills for security/reliability. The functional, integration, and usability focuses lack specialized pattern skill support (vs. how Reviewer loads specific pattern skills per focus).

Fix: Add missing declarations to plugin.json:

"skills": [
  "agent-teams",
  "worktree-support",
  "apply-feature-knowledge",
  "apply-decisions",
  "security",
  "reliability"
]

Review by Claude Code / devflow:architecture

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented May 23, 2026

Code Review Summary

PR: #226 — Add /bug-analysis workflow with static and semantic analysis
Branch: feat/bug-analysis → main
Status: CHANGES_REQUESTED

Findings Summary

Category CRITICAL HIGH MEDIUM LOW Total
Blocking 0 4 1 0 5
Should Fix 0 0 3 0 3
Pre-existing 0 0 1 0 1

🔴 Blocking Issues (≥80% confidence)

Eight high-confidence blocking issues have been posted as detailed comments above:

  1. Security: Symlink attack via predictable /tmp paths (85%)
  2. Security: Shell injection via unquoted filename expansion (82%)
  3. Security: Incomplete resolve exclusion list (83%)
  4. Reliability: Missing execution timeouts on static tools (90%)
  5. Reliability: Unbounded STATIC_FINDINGS character size (85%)
  6. Reliability: Unbounded directory scan in resolve fallback (80%)
  7. Architecture: Resolve fallback implicit coupling (85%)
  8. Architecture: BugAnalyzer incomplete skill declarations (85%)

🟡 Lower-Confidence Suggestions (60-79%)

These findings were below the 80% threshold but are worth noting:

Ambiguous/Deferred Issues

  • PR_DESCRIPTION untrusted content handling (70%, security.md line 54)

    • The command captures PR_DESCRIPTION without a containment marker note, though agent-level protection exists. Mitigated by explicit agent instructions.
  • No cleanup of CodeQL artifacts on partial failure (65%, security.md line 56)

    • If CodeQL database creation succeeds but analysis fails, /tmp/codeql-db remains (potential resource leak). Minor concern given the mktemp fix above will isolate this.
  • Bug-analysis lacks multi-worktree support (72%, architecture.md line 62)

    • Unlike /code-review and /resolve, /bug-analysis only operates on the current branch. May be intentional for v1.
  • Static analysis runs in orchestrator, not agent (65%, architecture.md line 64)

    • Phase 2 (static analysis) executes directly in the command orchestrator rather than delegating to an agent. This blurs the project convention of "commands spawn agents only." The static phase is infrastructure setup, so this is arguably acceptable.
  • No apply-decisions in plugin.json (70%, architecture.md line 67)

  • SARIF parsing has no error recovery (70%, reliability.md line 46)

    • If SARIF output is malformed or truncated, parsing fails silently. Consider documenting a "SARIF parse failure" edge case.
  • BugAnalyzer finding count uncapped (65%, reliability.md line 48)

    • No maximum on findings a single agent reports. Noisy diffs could consume excessive Synthesizer context. The consolidation rules (3+ similar = group) help but don't guarantee a bound.

📋 Pre-existing Issues

One pre-existing concern was flagged:

  • Synthesizer agent accumulates responsibilities (82%, architecture.md line 55)
    • The agent now handles 6 modes (exploration, planning, review, bug-analysis, design, research). Adding bug-analysis mode pushed it to 401 lines, exceeding the 300-line warning threshold. No immediate refactoring needed, but growth should be monitored.

✅ Well-Executed Areas

The new plugin demonstrates strong architecture in several ways:

  • Proper phase decomposition: Bug-analysis command follows the established Phase Protocol with Produces/Requires annotations
  • Honest security practices: PR_DESCRIPTION correctly wrapped in containment markers; static findings validated before reporting
  • Incremental by default: Built-in support for repeat analyses without reprocessing unchanged code
  • Proper agent delegation: Synthesizer and BugAnalyzer agents handle analysis; command is orchestration-only
  • Clear /resolve integration: Bug-analysis fallback maintains priority semantics (reviews > bug-analysis)

🎯 Recommended Next Steps

  1. Fix all HIGH blocking issues immediately — they are within the scope of this PR and non-disruptive
  2. MEDIUM issues — Also addressable in this PR (resolve exclusion list, implicit coupling fixes)
  3. Review the lower-confidence (60-79%) suggestions — Some warrant immediate fixes (SARIF error recovery, CodeQL cleanup); others can be deferred (multi-worktree support, agent-level concerns)

Review by Claude Code / devflow

Dean Sharon and others added 9 commits May 24, 2026 00:16
…ssues

- sec-1: Replace hardcoded /tmp/codeql-db and /tmp/codeql-results.sarif with
  mktemp -d unique temp directory to prevent symlink attacks and concurrent
  process clobbering; add rm -rf cleanup in always-runs position
- sec-2: Replace unquoted CHANGED_FILES expansion into semgrep with
  xargs -d newline to prevent shell injection from filenames with metacharacters
- perf-1: Scope snyk code test to changed files via xargs instead of scanning
  entire project directory
- rel-1: Wrap semgrep, snyk, and codeql with timeout command to prevent
  multi-hour hangs (300s fast tools, 600s CodeQL)
- rel-2: Add 200-character truncation to Description column entries to bound
  serialized size of STATIC_FINDINGS passed to BugAnalyzer agents

Co-Authored-By: Claude <noreply@anthropic.com>
…roduced when the /bug-analysis plugin was added: - Directory tree omitted resolution-summary.md from the bug-analysis path - Persisting agents line listed only the reviews-mode Resolver output path - Incremental Reviews paragraph made no mention of /bug-analysis and its analogous .last-analysis-head tracking mechanism
…solve:orch

- Extend exclusion list in both resolve.md and resolve:orch/SKILL.md to exclude
  bug-analysis-summary.md and static-findings.md, preventing parse failures when
  /resolve targets a bug-analysis directory
- Update Step 0b blocked error message to suggest /bug-analysis alongside /code-review
- Add explicit 10-directory scan bound to bug-analysis fallback search in both files

Co-Authored-By: Claude <noreply@anthropic.com>
…patibility - Change Bugs Found section (CRITICAL/HIGH/MEDIUM/LOW headers) to the 3-category structure (BLOCKING / Should Fix / Pre-existing) that resolve:orch and /resolve use when parsing per-focus .md reports. CRITICAL/HIGH map to Blocking, MEDIUM to Should Fix, LOW to Pre-existing. - Add devflow:regression, devflow:consistency, devflow:complexity skills to BugAnalyzer frontmatter so functional/integration/usability focuses have pattern skills (mirrors how Reviewer loads focus-specific skills). - Remove agent-teams from devflow-bug-analysis plugin.json skills array since no -teams.md command variant exists for V1. Co-Authored-By: Claude <noreply@anthropic.com>
…ve fallback

- Add devflow-bug-analysis plugin registration test in tests/plugins.test.ts,
  covering agents (git, bug-analyzer, synthesizer), skills (agent-teams,
  worktree-support, apply-feature-knowledge), command, and non-optional flag
- Create tests/bug-analysis/structural.test.ts (36 tests) covering all 7 phases
  with Produces/Requires annotations, incremental .last-analysis-head semantics,
  static tool safety patterns (xargs, mktemp, timeout bounds, 200-char truncation),
  BugAnalyzer parallel spawning, Synthesizer mode, and resolve compatibility
- Add tests/resolve/bug-analysis-fallback.test.ts (13 tests) covering priority
  invariant (reviews before bug-analysis), Step 0c-5b fallback path, 10-directory
  scan limit, exclusion list (bug-analysis-summary.md, static-findings.md), and
  error message guidance for both /code-review and /bug-analysis
- Fix plugin.json skills[] to include agent-teams, matching plugins.ts registry
  (resolves pre-existing skill-references.test.ts failure)

Co-Authored-By: Claude <noreply@anthropic.com>
…solve:orch

Align resolve.md documentation with resolve:orch skill Phase Protocol, correct
FIXMEs, and consolidate documentation to the skill source. Update bug-analyzer
agent output format and resolve:orch Phase completion to account for bug-analysis
workflow mode.

Co-Authored-By: Claude <noreply@anthropic.com>
…eview cycle

Update decisions.md, pitfalls.md, and feature knowledge bases with learnings
from the bug-analysis implementation review.

Co-Authored-By: Claude <noreply@anthropic.com>
…Resolution summary documenting the fixes and improvements made to bug-analysis implementation during the 2026-05-23 review cycle. Co-Authored-By: Claude <noreply@anthropic.com>
@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented May 24, 2026

PR Comment: bug-analysis.md line 99

xargs portability issue (reliability + security)

The -d '\n' flag is a GNU extension not available on macOS default xargs. Since devflow targets macOS (darwin platform), this fails with xargs: illegal option -- d on the primary development environment.

Impact: Static analysis silently never runs on macOS, undermining the hybrid static+semantic architecture.

Fix: Use tr '\n' '\0' | xargs -0 which is portable across GNU and BSD:

git diff --name-only {DIFF_RANGE} | tr '\n' '\0' | xargs -0 timeout 300 semgrep scan --config auto --sarif --quiet 2>/dev/null

Also update line 98 comment to say 'NUL-delimited' to match the actual implementation.

Confidence: 92% (reliability) + 82% (security: flag injection at line 106 uses same pattern). Flagged by: reliability, security, performance reviewers.

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented May 24, 2026

PR Comment: bug-analysis.md line 106

Snyk per-file O(n) performance + flag injection vulnerability

Two issues compound here:

  1. Performance (85% confidence): snyk code test runs once per file via xargs -I{}. Each invocation pays fixed startup cost, turning O(1) batch into O(n). With 20+ files, worst-case timeout is 300s × n instead of 300s.

  2. Security (82% confidence): The -I{} pattern allows flag injection. A filename like --json-file-output=/tmp/evil would be parsed as a CLI flag, potentially altering behavior or writing to attacker-controlled locations.

Fix: Run Snyk Code once with project-level scan, then filter results to changed files:

timeout 300 snyk code test --sarif 2>/dev/null

Filter SARIF output to only include findings in files from git diff --name-only {DIFF_RANGE}.

Confidence: 85% (performance, HIGH) + 82% (security, MEDIUM). Flagged by: performance, security, reliability, architecture reviewers.

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented May 24, 2026

PR Comment: bug-analysis.md line 119

CodeQL SARIF ordering ambiguity (reliability)

The bash code (lines 117-119) runs rm -rf "${CODEQL_TMP}" immediately after capturing exit status, but prose (line 121) says 'Parse SARIF output... immediately after... and before the rm -rf cleanup.' An orchestrator following the code literally would delete before parsing, losing results.

Fix: Restructure to make read-before-delete unambiguous:

CODEQL_TMP=$(mktemp -d)
timeout 600 codeql database create "${CODEQL_TMP}/db" --language={detected-language} --source-root=. 2>/dev/null && \
timeout 600 codeql database analyze "${CODEQL_TMP}/db" --format=sarif-latest --output="${CODEQL_TMP}/results.sarif" 2>/dev/null
CODEQL_EXIT=$?
# Parse SARIF output HERE — before cleanup

Then add cleanup block after parsing.

Confidence: 88% (HIGH severity, reliability). Flagged by: reliability reviewer.

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented May 24, 2026

PR Comment: bug-analysis.md line 152

Plan artifact scan limit inconsistency (reliability)

Step 3 lists .devflow/docs/design/*.md with no upper bound, but resolve command's directory scans explicitly bound to '10 most recent' in both resolve.md (line 71) and resolve:orch (line 29). This creates reliability variance — one path bounded, other unbounded.

Fix: Add 10-directory scan bound to match the pattern:

1. List `.devflow/docs/design/*.md` — sort descending by filename (timestamps are naturally sortable), scan the 10 most recent

Confidence: 80% (MEDIUM severity, reliability). Flagged by: reliability reviewer.

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented May 24, 2026

PR Comment: bug-analysis.md line 66

Redundant git diff invocations across phases (performance)

git diff --name-only {DIFF_RANGE} is computed independently in Step 2b (line 66), piped to Semgrep (line 99), piped to Snyk (line 106), and again in Phase 4 (line 166). Each invocation walks the same git object graph — four separate subprocess spawns for identical output.

This also creates consistency risk: if working tree changes between invocations (e.g., parallel process commits), different phases could see different file lists.

Fix: Compute once and reuse as variable:

# Step 2b: compute once, reuse everywhere
CHANGED_FILES=$(git diff --name-only {DIFF_RANGE})
if [ -z "$CHANGED_FILES" ]; then echo "No changes to analyze."; exit 0; fi

Then reference $CHANGED_FILES in Steps 2d and Phase 4 instead of re-running git command.

Confidence: 82% (MEDIUM severity, performance). Flagged by: performance reviewer.

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented May 24, 2026

PR Comment: shared/agents/bug-analyzer.md line 113

BugAnalyzer severity-to-category mapping conflates code location with severity (architecture)

The category mapping uses severity as a proxy for location:

  • CRITICAL/HIGH → 'Issues in Your Changes (BLOCKING)'
  • MEDIUM → 'Issues in Code You Touched (Should Fix)'
  • LOW → 'Pre-existing Issues (Not Blocking)'

But the Reviewer agent (established pattern) determines categories based on where the issue occurs (in your changes vs same function vs untouched code), not severity. A CRITICAL bug in untouched code should still be 'Pre-existing' per review methodology.

While BugAnalyzer focuses on changed code (diff-first principle per line 124), the semantic mapping creates downstream parsing confusion — /resolve may misclassify findings as pre-existing and give lower priority.

Fix: Document that all BugAnalyzer findings are in changed code by design, so severity-to-category mapping is reasonable for /resolve compatibility. Add clarifying comment:

**Category mapping** (for /resolve compatibility — approximation since BugAnalyzer focuses on diff-changed code):

Confidence: 82% (MEDIUM severity, architectural). Flagged by: architecture, regression reviewers.

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented May 24, 2026

PR Comment: shared/agents/bug-analyzer.md line 188

Bug-analyzer summary table format diverges from reviewer pattern (consistency)

The output template uses flat Severity | Count summary table, but reviewer agent (established pattern) uses Category | CRITICAL | HIGH | MEDIUM | LOW matrix. Since BugAnalyzer was explicitly changed to adopt reviewer's 3-category structure, summary table should follow same matrix pattern for downstream parser consistency.

/resolve pipeline and Synthesizer parse these reports programmatically — different table shape creates parsing inconsistency between review and bug-analysis reports.

Fix: Replace with reviewer's matrix format:

## Summary
| Category | CRITICAL | HIGH | MEDIUM | LOW |
|----------|----------|------|--------|-----|
| Blocking | {n} | {n} | - | - |
| Should Fix | - | - | {n} | - |
| Pre-existing | - | - | - | {n} |
| Suggestions | {n} | - | - | - |

**{Focus} Risk**: {CRITICAL | HIGH | MEDIUM | LOW | CLEAN}

Confidence: 85% (HIGH severity, consistency). Flagged by: consistency reviewer.

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented May 24, 2026

PR Comment: shared/agents/bug-analyzer.md line 197

Bug-analyzer report omits Recommendation footer present in reviewer reports (consistency)

Reviewer agent's template ends with both **{Focus} Score**: {1-10} and **Recommendation**: {BLOCK | CHANGES_REQUESTED | APPROVED_WITH_CONDITIONS | APPROVED}.

Bug-analyzer has **{Focus} Risk**: {CRITICAL | HIGH | MEDIUM | LOW | CLEAN} but omits Recommendation footer. Risk label is semantically appropriate for bug analysis, but missing footer means Synthesizer must handle two distinct footer schemas when aggregating review and bug-analysis reports. This could cause Synthesizer to skip recommendation.

Fix: Add recommendation line after Risk line:

**{Focus} Risk**: {CRITICAL | HIGH | MEDIUM | LOW | CLEAN}
**Recommendation**: {BLOCK | CHANGES_REQUESTED | APPROVED_WITH_CONDITIONS | APPROVED}

Confidence: 82% (HIGH severity, consistency). Flagged by: consistency reviewer.

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented May 24, 2026

PR Comment: shared/skills/resolve:orch/SKILL.md line 29 + plugins/devflow-resolve/commands/resolve.md line 71

Inconsistent scan limit between resolve.md and resolve:orch for review directory discovery (regression)

resolve:orch Phase 1 (line 29) now scans 'the 10 most recent' review directories, but resolve command's Step 0c (line 71) still uses 'select the latest' with no scan limit. Both correctly applied 10-directory limit for bug-analysis fallback, but only resolve:orch applied it to primary review directory scan.

This behavioral divergence is a regression risk — two resolve paths should agree on directory scanning behavior. /resolve (command) scans all reviews, while /resolve in ambient mode (via resolve:orch) limits to 10. If user has >10 resolved review directories, 11th-most-recent unresolved review could be missed.

Fix: Add 10-directory scan limit to resolve.md Step 0c to match resolve:orch:

3. **Otherwise:** sort directories by name descending (timestamps are naturally sortable), scan the 10 most recent directories only. Select the first that contains `review-summary.md` (complete review)

Confidence: 85% (HIGH severity, regression). Flagged by: regression, consistency reviewers.

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented May 24, 2026

PR Comment: plugins/devflow-bug-analysis/.claude-plugin/plugin.json line 25

Plugin.json missing skill declarations for agent dependencies (consistency)

The bug-analyzer agent frontmatter declares 5 focus-specific skills (devflow:security, devflow:reliability, devflow:regression, devflow:consistency, devflow:complexity) and devflow:apply-decisions, but none appear in plugin.json skills array.

While Universal Skill Installation ensures they are present at runtime regardless, manifest gap means plugin's declared dependencies are incomplete — a developer inspecting plugin.json would not see full dependency surface.

The devflow-code-review plugin declares all its pattern skills, making bug-analysis plugin inconsistent with established convention.

Fix: Add missing skills to skills array in plugin.json:

"skills": [
  "agent-teams",
  "worktree-support",
  "apply-feature-knowledge",
  "apply-decisions",
  "security",
  "reliability",
  "regression",
  "consistency",
  "complexity"
]

Confidence: 85% (MEDIUM severity, consistency/architecture). Flagged by: consistency, architecture reviewers.

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented May 24, 2026

PR Review Summary: /bug-analysis Workflow

Overall Assessment: CHANGES_REQUESTED (7 HIGH-confidence issues block merge; 3 MEDIUM should-fix; architectural design sound)

High-Confidence Issues (≥80%, Block Merge)

Reliability & Security (PRIMARY BLOCKERS)

  1. xargs -d '\n' portability (92%) — Silent failure on macOS (primary platform). Use tr | xargs -0 (bug-analysis.md:99)
  2. CodeQL SARIF read-after-delete (88%) — Ambiguous prose vs code order risks losing analysis results (bug-analysis.md:119)
  3. Snyk per-file O(n) performance (85%) — Scales timeout linearly with file count; flag injection vector (bug-analysis.md:106)
  4. Review directory scan limit divergence (85%) — resolve:orch applies limit, resolve.md doesn't; behavioral mismatch (resolve.md:71, resolve:orch/SKILL.md:29)

Consistency (PARSER COMPATIBILITY)
5. Summary table format divergence (85%) — Flat table vs 3-category matrix; Synthesizer expects consistent shape (bug-analyzer.md:188)
6. Missing Recommendation footer (82%) — Synthesizer must handle two footer schemas; may skip recommendation (bug-analyzer.md:197)
7. Plugin.json missing skill declarations (85%) — Agent declares skills, manifest doesn't list them; breaks discoverability (plugin.json:25)

Architecture & Semantics
8. Severity-to-category conflation (82%) — Maps severity to location, but Reviewer maps location to category; semantic mismatch (bug-analyzer.md:113)

Medium-Priority Issues (60-79% or SHOULD_FIX)

  1. Redundant git diff invocations (82%) — Four subprocess spawns for identical output; consistency risk if working tree changes (bug-analysis.md:66)
  2. Plan artifact scan limit inconsistency (80%) — No bound like resolve's 10-directory limit (bug-analysis.md:152)

Architecture Notes

Design strengths: Parallel BugAnalyzer agents, incremental diffing, static + semantic hybrid, proper separation from Evaluator (applies ADR-004, ADR-006)

Security: Strong injection prevention in PR_DESCRIPTION containment markers, timeouts on all static tools, mktemp for CodeQL isolation

Pre-existing pattern inconsistency: resolve:orch and resolve.md now diverge on scan limits — this PR surfaces that inconsistency

Inline Comments

See PR comments above for specific fixes and code examples for each issue. All HIGH-confidence findings have suggested remediation with confidence levels and reviewer attribution.


Next Steps: Address HIGH-confidence issues (estimated 2-3 hours). Consider opening a separate PR for medium-priority issues (redundant git diff, plan scan bound) which are improvements rather than blockers.

Claude Code | Devflow PR Review

Dean Sharon and others added 5 commits May 24, 2026 11:03
…ng, and redundant git diff Replace xargs -d which is GNU-only and broken on macOS BSD xargs with tr plus xargs -0 for portability. Replace per-file snyk invocations with a single project-level snyk code test scan followed by SARIF filtering. Compute CHANGED_FILES once in Step 2b and reference it throughout to eliminate 3 redundant git diff calls. Capture CODEQL_SARIF before rm -rf to fix the ordering bug that would delete results before parsing. Document Semgrep and Snyk parallelism. Co-Authored-By: Claude <noreply@anthropic.com>
…ummary table with reviewer format

- Document severity→category mapping as approximation (all BugAnalyzer
  findings are in diff-changed code, so location-based categories don't
  apply literally; LOW is placed in Pre-existing for urgency signaling
  only)
- Replace flat Severity|Count summary table with reviewer's matrix format
  (Category × CRITICAL/HIGH/MEDIUM/LOW) for downstream parser consistency
- Add Recommendation footer line (BLOCK/CHANGES_REQUESTED/APPROVED_WITH_CONDITIONS/APPROVED)
  to match reviewer agent schema used by Synthesizer and /resolve

Co-Authored-By: Claude <noreply@anthropic.com>
…gin manifest - resolve.md Step 0c: add '10 most recent directories only' scan bound to primary reviews path, matching the existing bug-analysis fallback (5b). Previously the reviews path had no limit; the fallback did — creating a behavioral divergence when more than 10 directories exist. - CLAUDE.md: correct misleading claim that both /code-review and /bug-analysis auto-discover worktrees. /bug-analysis is single-worktree only; /code-review does full multi-worktree discovery. - devflow-bug-analysis plugin.json: add the 6 skills that bug-analyzer.md declares in its frontmatter (security, reliability, regression, consistency, complexity, apply-decisions) so the manifest matches the code-review plugin pattern and is complete for discoverability. - resolve:orch SKILL.md: add parenthetical to Phase 5 reference so cross-references to resolve.md Phase 4 are unambiguous. - bug-analysis.md Plan Artifact: add '10 most recent' scan bound to the design listing step, matching the established pattern in resolve.md. Co-Authored-By: Claude <noreply@anthropic.com>
…rage

- Add resolve:orch describe block to bug-analysis-fallback.test.ts asserting
  Phase 1 bug-analysis fallback, 10-directory scan limit, exclusion list
  (bug-analysis-summary.md, static-findings.md), and error handling mentions
  both /code-review and /bug-analysis (resolves resolve-orch:29)
- Add Groups 7-8 to structural.test.ts: output format section headers
  (Issues in Your Changes/Code You Touched/Pre-existing), severity-to-category
  mapping, summary matrix, Recommendation footer, and frontmatter skill
  declarations (regression, consistency, complexity) (resolves bug-analyzer:111,
  bug-analyzer:8)
- Replace fragile if/else conditional with unconditional extractSection assert
  on ## Edge Cases section in resolve.md (resolves fallback-test:112)
- Remove redundant loadFile call on line 34; reuse existing content variable
  (resolves fallback-test:34)
- Fix resolve:orch SKILL.md to use {worktree} placeholder in decisions-index.cjs
  invocation (was "." — breaks worktree-aware resolution)
- Align devflow-bug-analysis entry in plugins.ts with plugin.json skills array
  (was missing apply-decisions, complexity, consistency, regression, reliability,
  security)

Co-Authored-By: Claude <noreply@anthropic.com>
@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented May 24, 2026

tests/resolve/bug-analysis-fallback.test.ts:126

Unsafe .slice(.search()) pattern (90% confidence)\n\nThe test uses .slice(phase1.search(/bug.analysis/i)) but .search() returns -1 if the anchor is not found. This causes .slice(-1) to return only the last character, which could silently pass the subsequent .toMatch() assertion.\n\nFix: Add explicit guard before slicing:\ntypescript\nconst bugAnalysisIdx = phase1.search(/bug.analysis/i);\nexpect(bugAnalysisIdx, "Phase 1 should contain bug-analysis reference").not.toBe(-1);\nconst bugAnalysisPart = phase1.slice(bugAnalysisIdx);\nexpect(bugAnalysisPart).toMatch(/10\\s+(most recent|directories)|scan.*10|10.*scan/i);\n\n\nThis pattern also exists at line 42 (pre-existing, same fix applies).\n\n— Claude Code

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented May 24, 2026

tests/plugins.test.ts:251

Missing plugin skill assertions for newly added skills (85% confidence)\n\nThe test checks for agent-teams, worktree-support, and apply-feature-knowledge but PR added 6 new skills: apply-decisions, complexity, consistency, regression, reliability, security. These are declared in bug-analyzer.md frontmatter but not verified in the test.\n\nFix: Expand assertions to cover the new skills:\ntypescript\nexpect(bugAnalysis!.skills).toContain("apply-decisions");\nexpect(bugAnalysis!.skills).toContain("security");\nexpect(bugAnalysis!.skills).toContain("reliability");\nexpect(bugAnalysis!.skills).toContain("regression");\nexpect(bugAnalysis!.skills).toContain("consistency");\nexpect(bugAnalysis!.skills).toContain("complexity");\n\n\nWithout this, a future change could silently remove one of these skills from the manifest without test coverage.\n\n— Claude Code

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented May 24, 2026

plugins/devflow-bug-analysis/commands/bug-analysis.md:109

Snyk filtering boundary moved to LLM execution (82% confidence)\n\nThe comment states "filter findings to only those whose file path appears in CHANGED_FILES." This filtering is documented but not enforced programmatically—it relies entirely on LLM implementation in the BugAnalyzer agent. If the filtering step is skipped or implemented incorrectly, findings from unrelated files could leak into the security analysis.\n\nFix (defense-in-depth): Add programmatic SARIF filtering in the bash block:\nbash\ntimeout 300 snyk code test --sarif 2>/dev/null | \\\n node -e "const s=JSON.parse(require(\"fs\").readFileSync(0,\"utf8\")); s.runs?.forEach(r=>{r.results=r.results?.filter(res=>CHANGED.has(res.locations?.[0]?.physicalLocation?.artifactLocation?.uri))}); process.stdout.write(JSON.stringify(s))"\n\n\nNote: BugAnalyzers Step 4 self-verification already acts as a secondary filter, but this hardens the boundary per defense-in-depth principle.\n\n— Claude Code

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented May 24, 2026

plugins/devflow-bug-analysis/commands/bug-analysis.md:116-128

CodeQL cleanup not guaranteed on orchestrator-level interruption (82% confidence)

The rm -rf "${CODEQL_TMP}" cleanup relies on reaching line 128, but there is no trap handler. If the Bash tool invocation itself is interrupted (context compaction, session abort), the temp directory is orphaned.

Fix: Add explicit trap to guarantee cleanup within the bash block:

CODEQL_TMP=$(mktemp -d)
trap 'rm -rf "${CODEQL_TMP}"' EXIT
timeout 600 codeql database create "${CODEQL_TMP}/db" ...
# ... rest of block

Note: This approach handles cleanup within a single Bash invocation. Cross-invocation cleanup would require orchestrator support, which is beyond this PR's scope.

— Claude Code

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented May 24, 2026

shared/agents/bug-analyzer.md:113-118

Severity-to-category mapping conflates location-based and severity-based classification (82% confidence)

The mapping (CRITICAL/HIGH → Blocking, MEDIUM → Should Fix, LOW → Pre-existing) uses severity, not location. A LOW-severity bug in newly-added code gets classified as "Pre-existing (Not Blocking)" despite the developer writing it. This breaks the location-based Iron Law: "If you didn't add it, you don't own it."

Fix: Add a note clarifying the approximation in the agent documentation that all BugAnalyzer findings are developer-owned regardless of severity classification since they appear in changed code.

Alternatively, document in resolve:orch that BugAnalyzer findings bypass location-based category semantics.

— Claude Code

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented May 24, 2026

plugins/devflow-bug-analysis/commands/bug-analysis.md (missing section)

Missing Phase Completion Checklist (82% confidence)

The resolve:orch SKILL.md has a Phase Completion Checklist to verify all phases executed. The bug-analysis.md command has a 7-phase pipeline but no checklist. This is a consistency gap with established orchestration command patterns.

Fix: Add a ## Phase Completion Checklist section after the ## Principles section to verify all phases were announced before reporting results (matching resolve:orch pattern).

— Claude Code

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented May 24, 2026

tests/bug-analysis/structural.test.ts:298-312

Frontmatter skill assertions incomplete (82% confidence)

Group 8 tests 3 of 8 declared skills: devflow:regression, devflow:consistency, devflow:complexity. Missing assertions for: devflow:security, devflow:reliability, devflow:worktree-support, devflow:apply-decisions, devflow:apply-feature-knowledge.

Fix (option 1 — expand coverage): Add assertions for the missing 5 skills.

Or (option 2 — rename to set accurate expectations): Change describe block title to "frontmatter declares batch-2 skill additions" to document that this group only covers the batch-2 additions.

— Claude Code

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented May 24, 2026

tests/resolve/bug-analysis-fallback.test.ts:114-148

Repeated extractSection calls (pattern regression) (80% confidence)

Group 5 (resolve:orch tests) calls extractSection(content, '## Phase 1:', '## Phase 2:') four times and extractSection(content, '## Phase 3:', '## Phase 4:') twice. Prior commits in this PR (eb12a02) deduplicated file loads to describe scope for Groups 1-4. This Group 5 pattern contradicts that refactoring.

Fix: Hoist the repeated extractions to describe scope (matching the deduplication pattern in Groups 1-4) to reduce redundant parsing.

— Claude Code

@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented May 24, 2026

Summary: Review Findings

Blocking Issues Fixed: 1
Should-Fix Issues: 5
Lower-Confidence Suggestions: 8

✅ Inline Comments Posted (≥80% Confidence)

  • Testing: Unsafe .slice(.search()) pattern (90%)
  • Testing: Missing plugin skill assertions (85%)
  • Regression: Path placeholder inconsistency (85%)
  • Security: Snyk filtering boundary (82%)
  • Reliability: CodeQL cleanup trap (82%)
  • Architecture: BugAnalyzer category mapping (82%)
  • Architecture: Missing Phase Completion Checklist (82%)
  • Testing: Frontmatter assertions incomplete (82%)
  • Testing: Repeated extractSection calls (80%)

💡 Lower-Confidence Suggestions (60-79%)

These are documented in the review reports but fall below the 80% threshold for blocking inline comments:

  1. which vs command -v for tool detection (65%) — bug-analysis.md:80-82 — Use POSIX-portable command -v instead of which
  2. CodeQL temp directory race window (70%) — bug-analysis.md:117-127 — Documented; mktemp already mitigates
  3. PR_DESCRIPTION injection surface (72%) — bug-analysis.md:197 — Containment markers and explicit instructions provide defense
  4. Snyk single-project scan performance (68%) — bug-analysis.md:109-113 — Trades per-file invocations for one scan + filtering; acceptable tradeoff
  5. Phase 4 Requires annotation mismatch (72%) — bug-analysis.md:168 — Phase Protocol annotation says DIFF_RANGE but implementation uses CHANGED_FILES
  6. resolve:orch Phase 2 feature-knowledge path hardcoding (75%) — shared/skills/resolve:orch/SKILL.md:61 — Minor inconsistency with decisions-index call
  7. Feature Knowledge scan unbounded (80% pre-existing) — bug-analysis.md:152-155 — No cap on feature knowledge entries loaded; shared with /code-review
  8. Plan artifact acceptance criteria unbounded (65%) — bug-analysis.md:161 — No row-count cap on parsed criteria

See review reports in .devflow/docs/reviews/feat-bug-analysis/2026-05-24_1205/ for full details.

— Claude Code

Dean Sharon and others added 6 commits May 24, 2026 14:54
…in Phase 2 Phase 2 decisions-index.cjs call used {worktree} as a literal placeholder with no substitution, causing decisions-index.cjs to fail on every resolve:orch invocation. resolve:orch explicitly excludes multi-worktree flow (line 11), so the correct argument is "." (cwd). Aligns with the adjacent feature-knowledge stale check on line 61 which already used ".". Co-Authored-By: Claude <noreply@anthropic.com>
The test at line 251 only asserted the original 3 skills; after batch-2 added
apply-decisions, security, reliability, regression, consistency, and complexity
to the plugin manifest, no test caught those additions. Add explicit assertions
for all 6 so a future manifest change is caught immediately.

Co-Authored-By: Claude <noreply@anthropic.com>
… 5 section extractions

- Add explicit expect(idx).not.toBe(-1) guards before .slice() at lines 42 and 126
  so that missing sub-anchors produce a clear assertion failure rather than silently
  operating on the last character of the parent string.
- Hoist phase1 and phase3 extractSection calls to describe scope in Group 5
  (resolve:orch SKILL.md), matching the deduplication pattern established in
  Groups 1-4 per commit eb12a02.

Co-Authored-By: Claude <noreply@anthropic.com>
…phase checklist

- snyk-filtering-boundary: Add jq-based programmatic SARIF filter on Snyk output
  to restrict findings to CHANGED_FILES before LLM processing. Implements defense-in-
  depth per ADR-006 (hybrid static + LLM): filtering boundary is now programmatic
  (data layer) + LLM (BugAnalyzer Step 4), not LLM alone. Falls back to raw SARIF
  if jq is unavailable.

- codeql-cleanup-trap: Register `trap 'rm -rf "${CODEQL_TMP}"' EXIT INT TERM`
  immediately after mktemp so orphaned temp directories cannot accumulate on
  SIGTERM/SIGINT or session crash. Explicit rm -rf is retained as belt-and-suspenders;
  trap is cleared afterward.

- missing-phase-checklist: Add ## Phase Completion Checklist section after Principles,
  listing all 7 phases with verification criteria. Follows the resolve:orch pattern
  (SKILL.md lines 161-174) so agents cannot silently skip phases.

Co-Authored-By: Claude <noreply@anthropic.com>
… Add devflow-bug-analysis (plus explore, research, release) to marketplace.json. Update plugin tables, command references, CLI docs, file-organization, release-process, and docs-framework skill to reflect the bug-analysis workflow. Bump shared skill count from 58 to 66.
@dean0x dean0x merged commit 54039d2 into main May 24, 2026
3 of 4 checks passed
@dean0x dean0x deleted the feat/bug-analysis branch May 24, 2026 21:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant