Summary
The persisted daily product-usage rollup stores per-event counts via countProductUsageDimensions, which keeps only the top 20 by frequency:
// src/db/repositories.ts:3542-3544 (buildProductUsageDailyRollupRecord)
byEvent: countProductUsageDimensions(args.events.map((event) => event.eventName)).map(({ key, count }) => ({ eventName: key, count })),
byRepo: countProductUsageDimensions(args.events.map((event) => event.repoFullName)),
byCommand: countProductUsageDimensions(args.events.map((event) => productUsageMetadataString(event, "command"))),
// src/db/repositories.ts:3836 (default limit = 20, sorted by count desc)
function countProductUsageDimensions(values: Array<string | null | undefined>, limit = 20): Array<{ key: string; count: number }> {
...
return [...counts.entries()].map(([key, count]) => ({ key, count })).sort((a, b) => b.count - a.count || a.key.localeCompare(b.key)).slice(0, limit);
}
byEvent is persisted as eventsJson (repositories.ts:3469) and read back verbatim (:3332). The weekly value report then does exact event-name lookups against that (possibly truncated) list:
// src/services/weekly-value-report.ts:274-277, 339-340
const githubCommandEvents = sumEvent(rollups, "agent_command_replied") + sumEvent(rollups, "agent_command_skipped");
const quietSkips = sumEvent(rollups, "agent_command_skipped");
const prPackets = sumEvent(rollups, "agent_pr_packet_completed");
const prPreflights = sumEvent(rollups, "agent_preflight_branch_completed") + sumEvent(rollups, "local_branch_analysis_completed");
...
function sumEvent(rollups, eventName) {
return sum(rollups.map((rollup) => rollup.byEvent.find((entry) => entry.eventName === eventName)?.count ?? 0));
}
eventName is a bounded enum — the system emits ~22+ distinct event names across processors.ts, api/routes.ts, and mcp/server.ts (auth_session_created, extension_session_created, command_previewed, pull_context_viewed, local_branch_analysis_completed, agent_run_started, agent_plan_next_work_completed, agent_preflight_branch_completed, agent_pr_packet_completed, agent_blockers_completed, agent_command_replied, agent_command_skipped, github_installation_created, pr_public_surface_published, pr_visibility_skipped, mcp_tool_called, mcp_request, …). On any day with more than 20 distinct event types, the lowest-frequency events fall off the top-20 cut — and the report's flagship signals (agent_pr_packet_completed, agent_preflight_branch_completed, agent_blockers_completed) are exactly the low-frequency ones. A frequency-truncated top-N is the wrong storage shape for a dimension that is later consumed by exact-name lookup.
Why this is wrong
Frequency-truncating to a display top-N is reasonable for unbounded dimensions (byRepo, byCommand) that are only shown as "top repos/commands". But byEvent is a small bounded set that downstream code reads by exact name (sumEvent → byEvent.find(eventName)). Truncating it silently zeroes any event that didn't make the per-day top 20, so the aggregated weekly count is an under-count rather than the true total. (The rollup already keeps an authoritative distinct-repo count, activeRepos, precisely because the truncated byRepo can't answer that question — byEvent has no such authoritative companion and is the only per-event source.)
Failure mode (concrete example)
On a UTC day with 25 distinct event types where agent_pr_packet_completed fired 3 times but ranks 22nd by frequency (auth/session/mcp/installation events dominate):
- Current:
byEvent keeps only the top 20, so agent_pr_packet_completed is absent. sumEvent(rollups, "agent_pr_packet_completed") returns 0 for that day, and the weekly pr_packets metric (weekly-value-report.ts:254, surfaced in the public summary line :151/:156) under-reports PR-packet generation. The same hides agent_preflight_branch_completed (prPreflights) and can drop agent_command_replied/agent_command_skipped (maintainer-value signals).
- Correct:
byEvent retains all bounded event names, so the weekly metrics reflect the true totals.
Downstream impact
generateWeeklyValueReport produces a public + operator-facing trust artifact. The truncation makes it systematically under-report miner-utility (pr_packets, pr_preflights) and maintainer-value (agent_command_replied/skipped) headline metrics on the busiest, highest-diversity days — the opposite of what a value report should do, and it is persisted to an audit event.
Steps to reproduce
- Seed a single day with >20 distinct
eventNames where a flagship event (e.g. agent_pr_packet_completed) is among the lowest-frequency.
- Run
rollupProductUsageDaily for that day, then generateWeeklyValueReport.
- Observe the rollup's
byEvent omits the low-frequency flagship event, and the report's pr_packets (etc.) under-counts it (reads 0 for that day).
Expected behavior
Dimensions that downstream code consumes by exact-name lookup (byEvent, a bounded event enum) are stored complete, so the weekly value report's sumEvent totals are accurate regardless of per-day event diversity.
Actual behavior
byEvent is frequency-truncated to the top 20, so low-frequency flagship events are dropped on high-diversity days and the weekly report under-counts them.
Suggested fix
- Stop frequency-truncating the bounded, exact-lookup
byEvent dimension in buildProductUsageDailyRollupRecord (repositories.ts:3542) — pass an uncapped/high limit (event names are a small bounded set, so the stored list stays small), while leaving the genuinely-unbounded byRepo / byCommand with their display top-N. Concretely, separate "exact-lookup" dimensions (no cap) from "display top-N" dimensions, e.g. a countProductUsageDimensions(..., { full: true }) / explicit limit so the intent is clear and the two classes can't be conflated again.
- Add fail-on-revert coverage: a day with >20 distinct event types including a low-frequency
agent_pr_packet_completed must keep that event in the persisted byEvent, and generateWeeklyValueReport must count it in pr_packets (non-zero), instead of dropping it.
Summary
The persisted daily product-usage rollup stores per-event counts via
countProductUsageDimensions, which keeps only the top 20 by frequency:byEventis persisted aseventsJson(repositories.ts:3469) and read back verbatim (:3332). The weekly value report then does exact event-name lookups against that (possibly truncated) list:eventNameis a bounded enum — the system emits ~22+ distinct event names acrossprocessors.ts,api/routes.ts, andmcp/server.ts(auth_session_created,extension_session_created,command_previewed,pull_context_viewed,local_branch_analysis_completed,agent_run_started,agent_plan_next_work_completed,agent_preflight_branch_completed,agent_pr_packet_completed,agent_blockers_completed,agent_command_replied,agent_command_skipped,github_installation_created,pr_public_surface_published,pr_visibility_skipped,mcp_tool_called,mcp_request, …). On any day with more than 20 distinct event types, the lowest-frequency events fall off the top-20 cut — and the report's flagship signals (agent_pr_packet_completed,agent_preflight_branch_completed,agent_blockers_completed) are exactly the low-frequency ones. A frequency-truncated top-N is the wrong storage shape for a dimension that is later consumed by exact-name lookup.Why this is wrong
Frequency-truncating to a display top-N is reasonable for unbounded dimensions (
byRepo,byCommand) that are only shown as "top repos/commands". ButbyEventis a small bounded set that downstream code reads by exact name (sumEvent→byEvent.find(eventName)). Truncating it silently zeroes any event that didn't make the per-day top 20, so the aggregated weekly count is an under-count rather than the true total. (The rollup already keeps an authoritative distinct-repo count,activeRepos, precisely because the truncatedbyRepocan't answer that question —byEventhas no such authoritative companion and is the only per-event source.)Failure mode (concrete example)
On a UTC day with 25 distinct event types where
agent_pr_packet_completedfired 3 times but ranks 22nd by frequency (auth/session/mcp/installation events dominate):byEventkeeps only the top 20, soagent_pr_packet_completedis absent.sumEvent(rollups, "agent_pr_packet_completed")returns0for that day, and the weeklypr_packetsmetric (weekly-value-report.ts:254, surfaced in the public summary line:151/:156) under-reports PR-packet generation. The same hidesagent_preflight_branch_completed(prPreflights) and can dropagent_command_replied/agent_command_skipped(maintainer-value signals).byEventretains all bounded event names, so the weekly metrics reflect the true totals.Downstream impact
generateWeeklyValueReportproduces a public + operator-facing trust artifact. The truncation makes it systematically under-report miner-utility (pr_packets,pr_preflights) and maintainer-value (agent_command_replied/skipped) headline metrics on the busiest, highest-diversity days — the opposite of what a value report should do, and it is persisted to an audit event.Steps to reproduce
eventNames where a flagship event (e.g.agent_pr_packet_completed) is among the lowest-frequency.rollupProductUsageDailyfor that day, thengenerateWeeklyValueReport.byEventomits the low-frequency flagship event, and the report'spr_packets(etc.) under-counts it (reads 0 for that day).Expected behavior
Dimensions that downstream code consumes by exact-name lookup (
byEvent, a bounded event enum) are stored complete, so the weekly value report'ssumEventtotals are accurate regardless of per-day event diversity.Actual behavior
byEventis frequency-truncated to the top 20, so low-frequency flagship events are dropped on high-diversity days and the weekly report under-counts them.Suggested fix
byEventdimension inbuildProductUsageDailyRollupRecord(repositories.ts:3542) — pass an uncapped/high limit (event names are a small bounded set, so the stored list stays small), while leaving the genuinely-unboundedbyRepo/byCommandwith their display top-N. Concretely, separate "exact-lookup" dimensions (no cap) from "display top-N" dimensions, e.g. acountProductUsageDimensions(..., { full: true })/ explicitlimitso the intent is clear and the two classes can't be conflated again.agent_pr_packet_completedmust keep that event in the persistedbyEvent, andgenerateWeeklyValueReportmust count it inpr_packets(non-zero), instead of dropping it.