feat: journal stats dashboard (v2 backlog #7) #37

Merged
cleak merged 2 commits into master from claude/journal-stats
Apr 30, 2026
Conversation

Owner

@cleak cleak commented Apr 30, 2026

Summary

`tempyr journal stats` (and the matching `journal_stats` MCP tool) for diagnostic visibility into journal usage patterns. The report includes:

  • Totals — entries, unique sessions, unique agents
  • Provisional vs final counts (a stuck-not-finalizing flow is immediately visible)
  • Kind distribution — per-kind counts + percentages, ordered by frequency
  • Dead-end ratio — `dead_end / (decision + dead_end)`. Below 10% the CLI prints a callout: "agents likely aren't logging failures" — the journal's highest-value content per the spec.
  • Sessions per agent
  • Top tags / top files — configurable cap (default 20)
  • Per-day activity histogram — last N days (default 30), including zero-count days so the shape is stable. CLI renders a small ASCII bar chart.

`--since-days` scopes every section except the activity histogram, which has its own `--activity-window-days` flag. `--json` swaps to the structured payload (same shape as the MCP response).
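
The dead-end ratio and its 10% callout from the report list above can be sketched as follows. This is a minimal illustration, not the code in `stats.rs`: the function and constant names are hypothetical, and returning `None` when there are no decisions or dead-ends is an assumption about how the empty case might be handled.

```rust
/// Threshold below which the callout fires (10%, per the PR description).
/// The constant name is hypothetical.
const DEAD_END_CALLOUT_THRESHOLD: f64 = 0.10;

/// Computes dead_end / (decision + dead_end).
/// Returns `None` when both counts are zero; that choice is an assumption
/// of this sketch, made to avoid dividing by zero.
fn dead_end_ratio(decisions: u64, dead_ends: u64) -> Option<f64> {
    let denom = decisions + dead_ends;
    if denom == 0 {
        None
    } else {
        Some(dead_ends as f64 / denom as f64)
    }
}

/// True when a ratio exists and falls strictly below the threshold.
fn should_warn(ratio: Option<f64>) -> bool {
    matches!(ratio, Some(r) if r < DEAD_END_CALLOUT_THRESHOLD)
}
```

With 19 decisions and 1 dead-end the ratio is 0.05, so `should_warn` returns `true`; with 3 decisions and 1 dead-end it is 0.25 and no callout is needed.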

Implementation

| File | Change |
| --- | --- |
| `tempyr-journal-index/src/stats.rs` (new) | `StatsOptions` + `StatsReport` + `compute_stats()`. Cheap SQL aggregates over the existing schema; `substr(ts, 1, 10)` for day bucketing keeps the histogram computation in SQL |
| `tempyr-cli/src/commands/journal_cmd.rs` | `StatsCmdArgs` + `run_stats` + text renderer with ASCII bar chart |
| `tempyr-mcp/src/handler.rs` | `JournalStatsParams` + `journal_stats` tool, `read_only` + `idempotent` annotations |
| `CLAUDE.md` / `AGENTS.md` | New `journal stats` bullet under "Diagnostics" |
| `docs/journal-spec.md` | V2 backlog item struck through |
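
To illustrate what `substr(ts, 1, 10)` does for day bucketing, here is a rough Rust analogue of the same grouping. It assumes `ts` is stored as an ISO-8601-style text timestamp whose first ten characters are `YYYY-MM-DD`; the PR does not state the storage format, and the helper names are hypothetical. The actual implementation keeps this step in SQL.

```rust
use std::collections::BTreeMap;

/// Rust analogue of SQLite's `substr(ts, 1, 10)` for ASCII timestamps:
/// the first ten bytes, i.e. the `YYYY-MM-DD` prefix.
/// Unlike SQLite, this returns `None` for strings shorter than ten bytes.
fn day_bucket(ts: &str) -> Option<&str> {
    ts.get(..10)
}

/// Groups timestamps by day prefix and counts entries per day.
/// A `BTreeMap` keeps the days sorted, which matches calendar order
/// for `YYYY-MM-DD` strings.
fn count_per_day<'a>(timestamps: &[&'a str]) -> BTreeMap<&'a str, u64> {
    let mut counts = BTreeMap::new();
    for &ts in timestamps {
        if let Some(day) = day_bucket(ts) {
            *counts.entry(day).or_insert(0) += 1;
        }
    }
    counts
}
```

Bucketing on a string prefix works only because the date portion sits at a fixed position in the timestamp; it does no time-zone conversion.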

Test plan

  • `cargo test --workspace` — 6 new unit tests in `stats::tests` + 1 CLI integration test
  • `cargo clippy --workspace --all-targets -- -D warnings`
  • `cargo fmt --check`
  • Manual smoke: `tempyr journal stats` in a real repo with mixed kinds, verify the dead-end callout fires when expected

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added tempyr journal stats CLI: aggregated metrics (totals, per-kind distribution, provisional/final counts, dead-end ratio), sessions-per-agent, top tags/files, per-day activity histogram with ASCII chart, JSON output, and time-scoping/options for top-N and activity window.
    • Added journal_stats MCP tool returning the same JSON statistics.
  • Tests

    • Added integration test validating JSON stats output.
  • Documentation

    • Updated docs/spec with the comprehensive stats behavior and CLI options.

Adds `tempyr journal stats` (and the matching `journal_stats` MCP
tool) for diagnostic visibility into journal usage patterns. The
report covers:

- Totals: entries, unique sessions, unique agents
- Provisional vs final entry counts (so a stuck-not-finalizing
  flow shows up immediately)
- Kind distribution: per-kind counts + percentages, ordered by
  frequency
- Dead-end ratio: dead_end / (decision + dead_end). Below 10%
  triggers a CLI callout — agents likely aren't logging failures,
  which is the journal's highest-value content per the spec
- Sessions per agent
- Top tags, top files (configurable cap)
- Per-day activity histogram for the last N days, including
  zero-count days so the timeline shape is stable

CLI renders a human-readable text report by default with a small
ASCII bar chart for the activity histogram. `--json` swaps to the
structured payload (same shape as the MCP response). Optional
`--since-days` scopes every section except the activity histogram
(which has its own window via `--activity-window-days`).
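
As a rough idea of how a small ASCII bar chart for the activity histogram might be rendered, here is a sketch that scales each bar to the busiest day. The function name, the `#` glyph, and the line layout are all assumptions; the PR does not show the renderer's actual output format.

```rust
/// Renders one line per day: label, a bar scaled to the busiest day,
/// and the raw count. Hypothetical layout, not the real renderer.
fn render_bars(days: &[(&str, u64)], max_width: usize) -> Vec<String> {
    // Busiest day in the window; 0 when the window is empty or all zero.
    let peak = days.iter().map(|&(_, c)| c).max().unwrap_or(0);
    days.iter()
        .map(|&(label, count)| {
            let scaled = if peak == 0 {
                0
            } else {
                (count as usize * max_width) / peak as usize
            };
            // Keep non-zero days visible even when scaling rounds down to 0.
            let width = if count > 0 { scaled.max(1) } else { scaled };
            // Pad the bar to `max_width` so the counts line up in a column.
            format!("{label} {:<max_width$} {count}", "#".repeat(width))
        })
        .collect()
}
```

Zero-count days still produce a line with an empty bar, which is what keeps the timeline shape stable from run to run.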

Implementation:
- New `tempyr_journal_index::stats` module with `StatsOptions`,
  `StatsReport`, and `compute_stats()`. All numbers come from
  cheap SQL aggregates over the existing schema — no temp
  tables, no joins past what's already indexed. Activity
  histogram uses `substr(ts, 1, 10)` for day bucketing so the
  computation stays in SQL rather than pulling rows into Rust.
- CLI: `StatsCmdArgs` + `run_stats` in `journal_cmd.rs` with a
  text renderer for the human-readable mode.
- MCP: `JournalStatsParams` + `journal_stats` tool with
  `read_only` + `idempotent` annotations.

Tests: 6 new unit tests in `stats::tests` (empty index, kind
distribution, top files ordering, dead-end ratio when no
decisions/dead-ends, activity histogram zero-fill,
`since_days` filter passthrough) + 1 CLI integration test
driving 3 entries and asserting on the JSON aggregates.

Docs: CLAUDE.md / AGENTS.md "Diagnostics" gain a journal stats
bullet; v2 backlog item in the spec struck through.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented Apr 30, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: d48c520e-3381-4e9d-9ae1-7d5c2ec8d26d

📥 Commits

Reviewing files that changed from the base of the PR and between acdedad and 49ba77d.

📒 Files selected for processing (3)
  • crates/tempyr-cli/src/commands/journal_cmd.rs
  • crates/tempyr-journal-index/src/stats.rs
  • crates/tempyr-mcp/src/handler.rs

📝 Walkthrough

Adds a new journal statistics feature: CLI command tempyr journal stats, MCP tool journal_stats, a new stats module in the journal-index crate computing aggregate metrics, integration tests, and documentation updates.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation: AGENTS.md, CLAUDE.md, docs/journal-spec.md | Documented tempyr journal stats CLI output and parity with the journal_stats MCP tool; updated backlog/spec to reflect Phase 4 metrics and options. |
| CLI Command Implementation: crates/tempyr-cli/src/commands/journal_cmd.rs | Added StatsCmdArgs and run_stats() to refresh index, build options, call compute_stats(), and render either pretty JSON or a formatted text report (distributions, percentages, dead-end ratio, ASCII activity chart). |
| CLI Routing: crates/tempyr-cli/src/main.rs | Added JournalAction::Stats(StatsCmdArgs) and wired dispatcher to invoke run_stats() with json_output passthrough. |
| Statistics Core Module: crates/tempyr-journal-index/src/lib.rs, crates/tempyr-journal-index/src/stats.rs | New stats module with public constants, StatsOptions, StatsReport, typed count structs, and compute_stats() implementing aggregate SQLite queries, histograms (zero-filled), top-N lists, and dead-end ratio; includes unit tests. |
| MCP Tool Integration: crates/tempyr-mcp/src/handler.rs | Added JournalStatsParams and a journal_stats read-only tool handler that refreshes index, clamps inputs to stats constants, calls compute_stats(), and returns pretty JSON. |
| Integration Tests: crates/tempyr-cli/tests/integration.rs | New test validating tempyr journal stats --json output fields (total entries, per-kind counts, dead_end_ratio) against generated test data. |

Sequence Diagram

sequenceDiagram
    participant User
    participant CLI as CLI
    participant MCP as MCP Server
    participant Index as Journal Index
    participant DB as Database

    User->>CLI: request "tempyr journal stats" (or via MCP)
    CLI->>Index: refresh structural index
    MCP->>Index: refresh structural index
    Index->>DB: apply updates / ensure latest entries
    DB-->>Index: acknowledgement/results
    CLI->>Index: open DB connection & construct StatsOptions
    MCP->>Index: open DB connection & construct StatsOptions
    Index->>DB: run aggregate queries (totals, distributions, top-N, per-day)
    DB-->>Index: query rows
    Index->>Index: build StatsReport (histogram, ratios, top lists)
    Index-->>CLI: StatsReport (or JSON)
    Index-->>MCP: StatsReport (or JSON)
    CLI->>User: formatted text or pretty JSON
    MCP->>User: pretty JSON

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I hopped through logs at break of day,
Counting hops, the kinds, the way.
Histograms rise, dead-ends unveiled,
Tags and files in tidy tail.
Joyful thump — the stats are made! 📊

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: journal stats dashboard (v2 backlog #7)' directly describes the main feature being implemented: a journal statistics dashboard that fulfills backlog item #7. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/tempyr-cli/src/commands/journal_cmd.rs`:
- Around line 1246-1263: Add explicit clap value ranges to StatsCmdArgs to
prevent excessively large inputs: for since_days (Option<u32>) add a
validator/value range (e.g., 0..=36500) to cap to a reasonable history window,
for top_tags and top_files add a range (e.g., 1..=1000) to avoid huge result
sets, and for activity_window_days add a range (e.g., 1..=365) to limit
histogram size; implement these using clap attributes on the fields (e.g.,
#[arg(long, value_parser = ... .range(...))] or a custom validator) so the
symbols StatsCmdArgs, since_days, top_tags, top_files, and activity_window_days
enforce the bounds without changing defaults.

In `@crates/tempyr-journal-index/src/stats.rs`:
- Around line 286-317: The histogram omits the partial oldest day because the
day-bucket loop only emits window distinct dates (0..window) while the SQL query
uses cutoff = now - window days; fix by including the day that contains that
cutoff: make the bucket loop cover 0..=window (inclusive) and adjust the vector
capacity to window as usize + 1 so you emit DayCount entries for the cutoff day
through today; keep the SQL cutoff logic and the reverse() behavior unchanged.
- Around line 527-544: The test named since_days_filter_excludes_old_entries
currently uses since_days: Some(365) and therefore verifies inclusion rather
than exclusion; change it to actually exercise exclusion by creating an entry
older than the cutoff and then running compute_stats with a short window: create
the repo as before (fresh_repo, write_one/refresh_index), then either backdate
that entry's timestamp in the index DB directly (open schema::open and UPDATE
the timestamp column for the written entry to now - 366 days) or create the
entry with an explicit old timestamp if write_one supports it, set StatsOptions
{ since_days: Some(365) } (or use since_days: Some(0) and backdate accordingly),
call compute_stats(&conn, &opts) and assert report.total_entries == 0; update
the test name or comments to reflect that the exclusion path is being exercised.

In `@crates/tempyr-mcp/src/handler.rs`:
- Around line 2452-2457: The StatsOptions construction uses unbounded user input
(p.top_tags, p.top_files, p.activity_window_days); clamp these values to safe
ranges before building tempyr_journal_index::StatsOptions to avoid heavy DB work
and huge payloads. Replace the direct unwrap_or calls in the StatsOptions block
with clamped values (e.g., let top_tags = p.top_tags.unwrap_or(20).clamp(1,
MAX_TOP_TAGS); let top_files = p.top_files.unwrap_or(20).clamp(1,
MAX_TOP_FILES); let activity_window_days =
p.activity_window_days.unwrap_or(30).clamp(1, MAX_ACTIVITY_DAYS)) using
appropriately chosen MAX_* constants, or a small helper fn clamp_param(value,
default, min, max), and then use these variables when constructing StatsOptions.
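
The histogram finding above comes down to an inclusive versus exclusive loop bound. A minimal sketch of the zero-filled bucket loop, using plain day indices (days since an arbitrary epoch) in place of real dates so it stays dependency-free; the `DayCount` fields and the function name here are assumptions, not the definitions in `stats.rs`:

```rust
use std::collections::HashMap;

/// One histogram bucket. `day` is a day index standing in for the date
/// the real report presumably carries; the fields are assumed.
#[derive(Debug, PartialEq)]
struct DayCount {
    day: i64,
    count: u64,
}

/// Zero-filled histogram covering the cutoff day through today, oldest first.
/// `counts` plays the role of the rows returned by the SQL aggregate.
fn zero_filled_histogram(
    today: i64,
    window: u32,
    counts: &HashMap<i64, u64>,
) -> Vec<DayCount> {
    let mut out = Vec::with_capacity(window as usize + 1);
    // Inclusive bound: offset `window` is the cutoff day itself.
    // With `0..window` an entry from exactly `window` days ago has no bucket.
    for offset in 0..=i64::from(window) {
        let day = today - offset;
        let count = counts.get(&day).copied().unwrap_or(0);
        out.push(DayCount { day, count });
    }
    out.reverse(); // built newest-first, returned oldest-first
    out
}
```

With `window = 7` this yields 8 buckets, and an entry dated exactly 7 days before `today` lands in the first one instead of being dropped.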
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: ae3e17f3-f327-4445-8a77-59739867e7d4

📥 Commits

Reviewing files that changed from the base of the PR and between e538322 and acdedad.

📒 Files selected for processing (9)
  • AGENTS.md
  • CLAUDE.md
  • crates/tempyr-cli/src/commands/journal_cmd.rs
  • crates/tempyr-cli/src/main.rs
  • crates/tempyr-cli/tests/integration.rs
  • crates/tempyr-journal-index/src/lib.rs
  • crates/tempyr-journal-index/src/stats.rs
  • crates/tempyr-mcp/src/handler.rs
  • docs/journal-spec.md

…test

Four findings on PR #37:

1. **CLI input ranges.** `StatsCmdArgs` accepted any u32, so a typo
   like `--top-tags 9999999` would produce a giant SQL LIMIT and a
   payload that nobody asked for. Added clap `value_parser` ranges
   sourced from new `pub const` bounds in
   `tempyr_journal_index::stats`: 0..=36500 days for `since_days`
   (~100 years), 1..=1000 for the top-list lengths, 1..=365 for
   the histogram window. Defaults unchanged.

2. **Histogram off-by-one.** The SQL cutoff was `now - window
   days` (inclusive) but the bucket loop only emitted `0..window`
   distinct dates — so an entry written exactly `window` days ago
   would land in the SQL result with no Rust bucket to receive it
   and silently drop from the rendered timeline. Loop now runs
   `0..=window` (inclusive); a `window=7` query returns 8 buckets
   covering the cutoff day through today. The
   `activity_histogram_includes_zero_days` test got updated to
   match.

3. **Misnamed `since_days_filter_excludes_old_entries` test.** The
   prior version used `since_days = Some(365)` against a freshly-
   written entry, which only verified the inclusion path (the
   filter could be a no-op and the test would still pass). Rewrote
   it to backdate the entry's `ts` to two years ago via a direct
   `UPDATE entries SET ts = ?`, then assert
   `total_entries == 0` with a 30-day window AND
   `total_entries == 1` with a 1000-day window — both inclusion
   and exclusion paths now exercised.

4. **MCP unbounded user input.** `journal_stats` constructed
   `StatsOptions` directly from `p.top_tags.unwrap_or(20)` with no
   ceiling, so a malicious or buggy MCP client could request
   `top_tags = usize::MAX` or `activity_window_days = 100000`.
   Replaced with `.clamp()` against the same `MIN_*` / `MAX_*`
   constants the CLI's value_parser uses; bounds documented in
   one place, enforced on both transports.
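
Findings 1 and 4 share one idea: declare the bounds once and enforce them on both transports. The sketch below shows that shape using only the standard library. The constant names are hypothetical (the commit only says `MIN_*` / `MAX_*`), `u32` is an assumed parameter type, and the range-check function stands in for the clap `value_parser` range rather than reproducing it.

```rust
// Hypothetical names; the values follow the ranges quoted in the commit message.
pub const MIN_TOP_N: u32 = 1;
pub const MAX_TOP_N: u32 = 1000;
pub const MIN_ACTIVITY_WINDOW_DAYS: u32 = 1;
pub const MAX_ACTIVITY_WINDOW_DAYS: u32 = 365;

/// MCP-style handling: apply the default, then clamp silently into range.
fn clamp_param(value: Option<u32>, default: u32, min: u32, max: u32) -> u32 {
    value.unwrap_or(default).clamp(min, max)
}

/// CLI-style handling: reject out-of-range input with an error message.
/// A stand-in for what an argument parser's range check would do.
fn parse_in_range(raw: &str, min: u32, max: u32) -> Result<u32, String> {
    let n = raw
        .parse::<u32>()
        .map_err(|e| format!("invalid number {raw:?}: {e}"))?;
    if (min..=max).contains(&n) {
        Ok(n)
    } else {
        Err(format!("{n} is not in {min}..={max}"))
    }
}
```

The two transports differ in how they react, not in the limits: a CLI user gets an error for `--top-tags 9999999`, while an MCP request for the same value is quietly capped at `MAX_TOP_N`.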

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cleak cleak merged commit 6fb679c into master Apr 30, 2026
1 check passed
@cleak cleak deleted the claude/journal-stats branch April 30, 2026 05:42