Evidence-based sprint retrospectives for human-agent collaboration. No AI slop - every finding links to specific commits, PRs, or decisions.
Most retrospective tools produce vague observations like "58 tool invocations across 7 tools". That's useless.
This tool produces objective, evidence-based metrics:
| Metric | Example Output |
|---|---|
| Commit Breakdown | "59.7% fixes (227/380), 10.8% features (41/380), 1.6% tests (6/380)" |
| Rework Rate | "15/20 PRs superseded (75% rework)" |
| Testing Discipline | "2% of decisions mention testing" |
| Decision Quality | "78% have both rationale AND context" |
| Agent Contribution | "45 commits (11.8%) by agents" |
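The commit breakdown above can be derived from commit subjects alone. Here is a minimal TypeScript sketch, assuming conventional-commit prefixes; the names are illustrative, not the tool's actual internals:

```typescript
// Classify commit subjects by conventional-commit prefix (feat:, fix:, ...)
// and tally them, the raw counts behind a line like "59.7% fixes (227/380)".
type CommitType = "feat" | "fix" | "docs" | "test" | "refactor" | "chore" | "other";

function classify(subject: string): CommitType {
  // Matches e.g. "fix: ...", "feat(core): ...", "refactor!: ..."
  const m = subject.match(/^(feat|fix|docs|test|refactor|chore)(\(.+\))?!?:/);
  return (m ? m[1] : "other") as CommitType;
}

function breakdown(subjects: string[]): Record<CommitType, number> {
  const counts: Record<CommitType, number> =
    { feat: 0, fix: 0, docs: 0, test: 0, refactor: 0, chore: 0, other: 0 };
  for (const s of subjects) counts[classify(s)]++;
  return counts;
}
```

Feed it `git log --format=%s` output and divide each count by the total to get the percentages shown in the table.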
**Before (AI Slop):**

### Tool Usage

- Observation: 58 total tool invocations across 7 different tools
- Top Tools: Bash (39), Glob (6), Read (4)

**After (Evidence-Based):**
### Executive Summary
| Metric | Value |
|--------|-------|
| Total Commits | 380 |
| PRs Superseded (Rework) | 15 (75.0%) |
| Testing Loop Adherence | 2.0% |
| Decision Quality Score | 78.0% |
| Reactive Work Ratio | 87.6% |
### What Didn't Work
| Metric | Value | Issue |
|--------|-------|-------|
| Testing Discipline | 2% | Agent not testing before push |
| Supersession Rate | 75% | Poor initial PR quality |
```bash
# Run directly
npx @daax-dev/retrospective

# Or install globally
npm install -g @daax-dev/retrospective
# Then run:
agentic-retrospective

# Or install as an MCP server
claude mcp install github.com/daax-dev/agentic-retrospective
```

```bash
# Analyze last 2 weeks (default)
npx @daax-dev/retrospective

# Analyze from specific ref
npx @daax-dev/retrospective --from HEAD~50
npx @daax-dev/retrospective --from "2 weeks ago"

# Output JSON only
npx @daax-dev/retrospective --json

# Capture human feedback
npx @daax-dev/retrospective feedback
```

The feedback command runs a 30-second survey to capture alignment, rework needed, and session quality.
- Commit Type Breakdown: feat/fix/docs/test/refactor/chore
- Checkpoint Commits: wip/save/tmp patterns
- Reactive vs Proactive Ratio: fixes vs features
- Agent Commit Detection: Co-authored-by patterns, bot emails
- Rework Chains: fix commits following features
- Code Hotspots: files changed 3+ times
- PR Supersession Rate: PRs that replaced other PRs
- PRs with Test Files: PRs including test/spec files
- Review Metrics: approval times, revision cycles
- Negative Reviews: CHANGES_REQUESTED count
- Decision Quality Score: % with both rationale AND context
- Testing Discipline: % mentioning tests
- Decision Thrash: conflicting decisions on same topic
- Risk Profile: one-way-doors missing reversibility plans
- Vulnerability Counts: critical/high/medium/low
- Scan Sources: Trivy, npm audit, Snyk
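The Decision Quality Score above can be sketched as a single pass over parsed log entries. The `rationale` and `context` field names are assumptions inferred from the metric's description, not a documented schema:

```typescript
// Share of logged decisions that carry BOTH a rationale and context.
// Field names are assumed for illustration.
interface Decision {
  topic: string;
  rationale?: string;
  context?: string;
}

function decisionQualityScore(decisions: Decision[]): number | null {
  // No decision logs: record a gap (null), never a misleading 0%.
  if (decisions.length === 0) return null;
  const complete = decisions.filter((d) => Boolean(d.rationale) && Boolean(d.context));
  return (complete.length / decisions.length) * 100;
}
```

Returning `null` rather than `0` on missing data matches the graceful-degradation behavior described below.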
Git history is extracted automatically; the sources below are optional:
| Source | Location | Enables |
|---|---|---|
| Decision Logs | `.logs/decisions/*.jsonl` | Decision quality, testing discipline |
| Security Scans | `.logs/security/*.json` | Vulnerability analysis |
| Feedback | `.logs/feedback/*.jsonl` | Alignment tracking |
| Tool Logs | `.logs/tools/*.jsonl` | Tool usage patterns |
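For illustration, a decision-log entry might be appended like this; the file name and the field names (`topic`, `rationale`, `context`) are assumptions, so check the entries your own tooling actually writes:

```typescript
import { appendFileSync, mkdirSync } from "node:fs";

// Ensure the directory the analyzer scans exists.
mkdirSync(".logs/decisions", { recursive: true });

// Hypothetical entry shape: topic plus the rationale AND context that
// the Decision Quality Score looks for.
const entry = {
  timestamp: new Date().toISOString(),
  topic: "adopt pnpm for CI installs",
  rationale: "lockfile installs are faster",
  context: "CI builds were exceeding 10 minutes",
};

// One JSON object per line (JSONL), matching .logs/decisions/*.jsonl
appendFileSync(".logs/decisions/decisions.jsonl", JSON.stringify(entry) + "\n");
```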
```bash
mkdir -p .logs/decisions .logs/security .logs/feedback .logs/tools
```

Reports are generated at `docs/retrospectives/retrospective-YYYY-MM-DD/`:
| File | Contents |
|---|---|
| `retrospective.md` | Human-readable report |
| `retrospective.json` | Structured data |
| `evidence_map.json` | Commit/decision index |
| `alerts.json` | Critical findings for automation |
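Since `alerts.json` targets automation, a CI step might gate on it roughly like this; the alert shape shown is an assumption for illustration, so inspect a generated file for the real schema:

```typescript
// Assumed alert shape: severity plus a human-readable message.
interface Alert {
  severity: "critical" | "high" | "medium" | "low";
  message: string;
}

function shouldFailBuild(alerts: Alert[]): boolean {
  // Fail the pipeline only on critical findings; everything else is advisory.
  return alerts.some((a) => a.severity === "critical");
}
```

In CI you would parse the generated file (e.g. `JSON.parse(readFileSync(path, "utf8"))`) and exit non-zero when `shouldFailBuild` returns true.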
- Executive Summary - 10+ key metrics in table format
- Code Hotspots - Files with high churn
- Commit Type Breakdown - Work classification
- PR Analysis - Rework rate, test coverage
- What Worked / What Didn't - Threshold-based (>70% good, <50% bad)
- Testing Discipline - Loop adherence %
- Mistakes & Corrections - Documented learnings
- Recommendations - Current | Target | Action format
The tool always produces useful output from whatever data is available:
| Missing Source | Impact |
|---|---|
| Decision logs | Decision score = null, gap recorded |
| Security scans | Security score = null, gap recorded |
| GitHub access | No PR metrics, git-only analysis |
| Feedback logs | No alignment tracking |
- Evidence-Driven - Every claim links to commits, PRs, or decisions
- Blameless - Evaluates systems and patterns, not people
- Objective - Percentages and counts, not vague observations
- Actionable - Recommendations include current state, target, and action
```bash
# Install
pnpm install

# Build
pnpm run build

# Test (243 tests)
pnpm test

# Full validation
pnpm run validate
```

Apache 2.0
