feat: journal stats dashboard (v2 backlog #7) #37
Adds `tempyr journal stats` (and the matching `journal_stats` MCP tool) for diagnostic visibility into journal usage patterns. The report covers:

- Totals: entries, unique sessions, unique agents
- Provisional vs final entry counts (so a stuck-not-finalizing flow shows up immediately)
- Kind distribution: per-kind counts + percentages, ordered by frequency
- Dead-end ratio: dead_end / (decision + dead_end). Below 10% triggers a CLI callout, since agents likely aren't logging failures, which is the journal's highest-value content per the spec
- Sessions per agent
- Top tags, top files (configurable cap)
- Per-day activity histogram for the last N days, including zero-count days so the timeline shape is stable

CLI renders a human-readable text report by default with a small ASCII bar chart for the activity histogram. `--json` swaps to the structured payload (same shape as the MCP response). Optional `--since-days` scopes every section except the activity histogram (which has its own window via `--activity-window-days`).

Implementation:

- New `tempyr_journal_index::stats` module with `StatsOptions`, `StatsReport`, and `compute_stats()`. All numbers come from cheap SQL aggregates over the existing schema: no temp tables, no joins past what's already indexed. Activity histogram uses `substr(ts, 1, 10)` for day bucketing so the computation stays in SQL rather than pulling rows into Rust.
- CLI: `StatsCmdArgs` + `run_stats` in `journal_cmd.rs` with a text renderer for the human-readable mode.
- MCP: `JournalStatsParams` + `journal_stats` tool with `read_only` + `idempotent` annotations.

Tests: 6 new unit tests in `stats::tests` (empty index, kind distribution, top files ordering, dead-end ratio when no decisions/dead-ends, activity histogram zero-fill, `since_days` filter passthrough) + 1 CLI integration test driving 3 entries and asserting on the JSON aggregates.

Docs: CLAUDE.md / AGENTS.md "Diagnostics" gain a journal stats bullet; v2 backlog item in the spec struck through.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
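The dead-end ratio and its callout rule described above can be sketched in a few lines of std-only Rust. This is an illustration, not the code in `stats.rs`: the function names, the `Option` return for the empty case, and the choice to suppress the callout when the ratio is undefined are all assumptions. Only the formula and the 10% threshold come from the description.

```rust
/// Threshold below which the CLI callout fires (10%, per the description).
const DEAD_END_CALLOUT_THRESHOLD: f64 = 0.10;

/// dead_end / (decision + dead_end).
/// Returns `None` when both counts are zero so the caller never divides by
/// zero. (Assumption: the real report may encode this case differently.)
fn dead_end_ratio(decisions: u64, dead_ends: u64) -> Option<f64> {
    let denom = decisions + dead_ends;
    if denom == 0 {
        None
    } else {
        Some(dead_ends as f64 / denom as f64)
    }
}

/// Whether to print the callout.
/// Assumption: an undefined ratio (no decisions, no dead ends) stays silent.
fn should_call_out(ratio: Option<f64>) -> bool {
    matches!(ratio, Some(r) if r < DEAD_END_CALLOUT_THRESHOLD)
}

fn main() {
    // 19 decisions and 1 dead end: 1 / 20, which is under the threshold.
    let ratio = dead_end_ratio(19, 1);
    println!("ratio = {:?}, callout = {}", ratio, should_call_out(ratio));
}
```

Keeping the empty case as `None` rather than `0.0` avoids firing the callout on a brand-new journal that has no decisions or dead ends yet.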
No actionable comments were generated in the recent review. 🎉
📝 Walkthrough

Adds a new journal statistics feature.
Sequence Diagram

sequenceDiagram
participant User
participant CLI as CLI
participant MCP as MCP Server
participant Index as Journal Index
participant DB as Database
User->>CLI: request "tempyr journal stats" (or via MCP)
CLI->>Index: refresh structural index
MCP->>Index: refresh structural index
Index->>DB: apply updates / ensure latest entries
DB-->>Index: acknowledgement/results
CLI->>Index: open DB connection & construct StatsOptions
MCP->>Index: open DB connection & construct StatsOptions
Index->>DB: run aggregate queries (totals, distributions, top-N, per-day)
DB-->>Index: query rows
Index->>Index: build StatsReport (histogram, ratios, top lists)
Index-->>CLI: StatsReport (or JSON)
Index-->>MCP: StatsReport (or JSON)
CLI->>User: formatted text or pretty JSON
MCP->>User: pretty JSON
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 5 passed
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@crates/tempyr-cli/src/commands/journal_cmd.rs`:
- Around line 1246-1263: Add explicit clap value ranges to StatsCmdArgs to
prevent excessively large inputs: for since_days (Option<u32>) add a
validator/value range (e.g., 0..=36500) to cap to a reasonable history window,
for top_tags and top_files add a range (e.g., 1..=1000) to avoid huge result
sets, and for activity_window_days add a range (e.g., 1..=365) to limit
histogram size; implement these using clap attributes on the fields (e.g.,
#[arg(long, value_parser = ... .range(...))] or a custom validator) so the
symbols StatsCmdArgs, since_days, top_tags, top_files, and activity_window_days
enforce the bounds without changing defaults.
In `@crates/tempyr-journal-index/src/stats.rs`:
- Around line 286-317: The histogram omits the partial oldest day because the
day-bucket loop only emits window distinct dates (0..window) while the SQL query
uses cutoff = now - window days; fix by including the day that contains that
cutoff: make the bucket loop cover 0..=window (inclusive) and adjust the vector
capacity to window as usize + 1 so you emit DayCount entries for the cutoff day
through today; keep the SQL cutoff logic and the reverse() behavior unchanged.
- Around line 527-544: The test named since_days_filter_excludes_old_entries
currently uses since_days: Some(365) and therefore verifies inclusion rather
than exclusion; change it to actually exercise exclusion by creating an entry
older than the cutoff and then running compute_stats with a short window: create
the repo as before (fresh_repo, write_one/refresh_index), then either backdate
that entry's timestamp in the index DB directly (open schema::open and UPDATE
the timestamp column for the written entry to now - 366 days) or create the
entry with an explicit old timestamp if write_one supports it, set StatsOptions
{ since_days: Some(365) } (or use since_days: Some(0) and backdate accordingly),
call compute_stats(&conn, &opts) and assert report.total_entries == 0; update
the test name or comments to reflect that the exclusion path is being exercised.
In `@crates/tempyr-mcp/src/handler.rs`:
- Around line 2452-2457: The StatsOptions construction uses unbounded user input
(p.top_tags, p.top_files, p.activity_window_days); clamp these values to safe
ranges before building tempyr_journal_index::StatsOptions to avoid heavy DB work
and huge payloads. Replace the direct unwrap_or calls in the StatsOptions block
with clamped values (e.g., let top_tags = p.top_tags.unwrap_or(20).clamp(1,
MAX_TOP_TAGS); let top_files = p.top_files.unwrap_or(20).clamp(1,
MAX_TOP_FILES); let activity_window_days =
p.activity_window_days.unwrap_or(30).clamp(1, MAX_ACTIVITY_DAYS)) using
appropriately chosen MAX_* constants, or a small helper fn clamp_param(value,
default, min, max), and then use these variables when constructing StatsOptions.
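The `clamp_param(value, default, min, max)` helper suggested in the last finding could look roughly like the sketch below. The `Params` struct is a hypothetical stand-in for the MCP request type, the `u32` field types are an assumption, and the `MAX_*` values simply mirror the ranges proposed for the CLI in the first finding.

```rust
// Hypothetical upper bounds, taken from the ranges suggested in the review.
const MAX_TOP_TAGS: u32 = 1000;
const MAX_TOP_FILES: u32 = 1000;
const MAX_ACTIVITY_DAYS: u32 = 365;

/// Hypothetical stand-in for the MCP request parameters.
struct Params {
    top_tags: Option<u32>,
    top_files: Option<u32>,
    activity_window_days: Option<u32>,
}

/// Apply the default when the caller omitted the value, then force the
/// result into `min..=max`. `Ord::clamp` panics if `min > max`, so the
/// bounds must be ordered.
fn clamp_param(value: Option<u32>, default: u32, min: u32, max: u32) -> u32 {
    value.unwrap_or(default).clamp(min, max)
}

fn main() {
    // A client sending an absurd cap, an omitted field, and a zero window.
    let p = Params {
        top_tags: Some(u32::MAX),
        top_files: None,
        activity_window_days: Some(0),
    };

    let top_tags = clamp_param(p.top_tags, 20, 1, MAX_TOP_TAGS);
    let top_files = clamp_param(p.top_files, 20, 1, MAX_TOP_FILES);
    let window = clamp_param(p.activity_window_days, 30, 1, MAX_ACTIVITY_DAYS);

    println!("top_tags={top_tags} top_files={top_files} window={window}");
}
```

Clamping silently corrects out-of-range input instead of rejecting it, which suits a tool call where a hard error would be less useful than a bounded result. A CLI parser can reasonably reject the same input instead.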
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: ae3e17f3-f327-4445-8a77-59739867e7d4
📒 Files selected for processing (9)
- AGENTS.md
- CLAUDE.md
- crates/tempyr-cli/src/commands/journal_cmd.rs
- crates/tempyr-cli/src/main.rs
- crates/tempyr-cli/tests/integration.rs
- crates/tempyr-journal-index/src/lib.rs
- crates/tempyr-journal-index/src/stats.rs
- crates/tempyr-mcp/src/handler.rs
- docs/journal-spec.md
…test

Four findings on PR #37:

1. **CLI input ranges.** `StatsCmdArgs` accepted any u32, so a typo like `--top-tags 9999999` would produce a giant SQL LIMIT and a payload that nobody asked for. Added clap `value_parser` ranges sourced from new `pub const` bounds in `tempyr_journal_index::stats`: 0..=36500 days for `since_days` (~100 years), 1..=1000 for the top-list lengths, 1..=365 for the histogram window. Defaults unchanged.
2. **Histogram off-by-one.** The SQL cutoff was `now - window days` (inclusive) but the bucket loop only emitted `0..window` distinct dates, so an entry written exactly `window` days ago would land in the SQL result with no Rust bucket to receive it and silently drop from the rendered timeline. Loop now runs `0..=window` (inclusive); a `window=7` query returns 8 buckets covering the cutoff day through today. The `activity_histogram_includes_zero_days` test got updated to match.
3. **Misnamed `since_days_filter_excludes_old_entries` test.** The prior version used `since_days = Some(365)` against a freshly written entry, which only verified the inclusion path (the filter could be a no-op and the test would still pass). Rewrote it to backdate the entry's `ts` to two years ago via a direct `UPDATE entries SET ts = ?`, then assert `total_entries == 0` with a 30-day window AND `total_entries == 1` with a 1000-day window, so both inclusion and exclusion paths are now exercised.
4. **MCP unbounded user input.** `journal_stats` constructed `StatsOptions` directly from `p.top_tags.unwrap_or(20)` with no ceiling, so a malicious or buggy MCP client could request `top_tags = usize::MAX` or `activity_window_days = 100000`. Replaced with `.clamp()` against the same `MIN_*` / `MAX_*` constants the CLI's value_parser uses; bounds documented in one place, enforced on both transports.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
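The histogram fix in finding 2 is easiest to see with a small std-only sketch of the zero-fill loop. It makes several assumptions: days are plain integer day numbers rather than the `YYYY-MM-DD` strings produced by `substr(ts, 1, 10)`, the `DayCount` fields are guessed, and `fill_histogram` is a hypothetical name. Only the inclusive `0..=window` loop, the `window + 1` capacity, and the final `reverse()` come from the thread.

```rust
use std::collections::HashMap;

/// One histogram bucket. Field names are assumptions for this sketch.
#[derive(Debug, PartialEq)]
struct DayCount {
    day: i64,
    count: u64,
}

/// Build `window + 1` buckets covering the cutoff day through `today`,
/// filling days that have no rows with zero.
/// `counts` stands in for the per-day rows a SQL GROUP BY would return.
fn fill_histogram(today: i64, window: u32, counts: &HashMap<i64, u64>) -> Vec<DayCount> {
    let mut buckets = Vec::with_capacity(window as usize + 1);
    // Inclusive range: offset == window is the day containing the SQL cutoff.
    for offset in 0..=i64::from(window) {
        let day = today - offset;
        let count = counts.get(&day).copied().unwrap_or(0);
        buckets.push(DayCount { day, count });
    }
    // Built newest-first; reverse so the timeline reads oldest to newest.
    buckets.reverse();
    buckets
}

fn main() {
    let today = 100;
    let mut counts = HashMap::new();
    counts.insert(93, 2); // written exactly `window` days ago
    counts.insert(100, 1); // written today

    let hist = fill_histogram(today, 7, &counts);
    println!("{} buckets", hist.len());
    for bucket in &hist {
        println!("day {} -> {}", bucket.day, bucket.count);
    }
}
```

With an exclusive `0..window` loop the entry on day 93 would match the SQL cutoff but have no bucket to land in, which is the silent drop the finding describes.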
Summary
`tempyr journal stats` (and the matching `journal_stats` MCP tool) for diagnostic visibility into journal usage patterns. The report includes:
`--since-days` scopes the totals; activity histogram has its own `--activity-window-days` flag. `--json` swaps to the structured payload (same shape as the MCP response).
Implementation
Test plan
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
- `tempyr journal stats` CLI: aggregated metrics (totals, per-kind distribution, provisional/final counts, dead-end ratio), sessions-per-agent, top tags/files, per-day activity histogram with ASCII chart, JSON output, and time-scoping/options for top-N and activity window.
- `journal_stats` MCP tool returning the same JSON statistics.

Tests
Documentation