feat(agents): cron trigger scheduler v0 #61028
First slice of cron-trigger-scheduler.md v0. Purely additive substrate: no
firing path, no scheduler, no behaviour change. Lays the schema groundwork
so PR-2 (idempotency-aware enqueue) and PR-3 (cronTick) have something to
land on.
Schema additions:
- New migration `1780346228100_agent_session_idempotency_key.sql`: nullable
`idempotency_key` column + partial unique index on
`(application_id, idempotency_key) WHERE NOT NULL`, plus a sibling
`trigger_metadata JSONB` column for the firing-context envelope.
Semantically distinct from `external_key` — see migration header.
- `AgentSession.idempotency_key` + `trigger_metadata` on the TS interface;
`PgSessionQueue` and all literal constructions round-trip them.
- `TriggerSchema.cron.config` extended with `name`, `prompt`, `external_key`,
`catch_up` ('all' | 'most_recent' | 'skip'), `max_catch_up_age_seconds`
(1s — 7d). `schedule` becomes `z.string().min(1)`. Existing
`schedule` + `timezone` carried forward.
- `validate-spec.ts` gains four cron-specific codes
(`invalid_cron_schedule`, `invalid_cron_timezone`, `duplicate_cron_name`,
`unknown_cron_placeholder`), parses the schedule against `cron-parser`,
validates the timezone against `Intl.DateTimeFormat`, and whitelists the
`{fired_at:iso|date|week}` / `{schedule}` / `{cron_name}` placeholders
in both `prompt` and `external_key`. `cron-parser` ^4.9.0 added as an
agent-janitor dep — it's load-bearing here, not just in the scheduler.
Janitor `/sessions/list` response now surfaces the two new fields. Ingress
`enqueueOrResume()` writes `null` for both — PR-2 wires the upsert-on-
conflict path that actually consumes `idempotency_key`.
Tests: 47 new (13 cron-config parsing in spec.test.ts; 7 cron-validation
cases in validate-spec.test.ts; all existing tests across shared/janitor/
runner/tests still pass). Three pre-existing pre-PR test failures
(`config.test.ts` + `platform.test.ts` env-var defaults, `server.test.ts`
auth-shape) confirmed by `git stash` round-trip — unrelated to this PR.
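The placeholder whitelist described above can be sketched as a small freeze-time check. This is a minimal illustration, not the code in `validate-spec.ts`: the function name `findUnknownPlaceholders`, the regex, and the return shape are assumptions; only the whitelisted placeholder names come from the commit message.

```typescript
// Whitelist taken from the commit message. Everything else here is illustrative.
const ALLOWED_PLACEHOLDERS = ['fired_at:iso', 'fired_at:date', 'fired_at:week', 'schedule', 'cron_name']

// Hypothetical helper: returns every `{...}` token that is not on the whitelist.
function findUnknownPlaceholders(template: string): string[] {
    const unknown: string[] = []
    const pattern = /\{([^{}]*)\}/g
    let match: RegExpExecArray | null
    while ((match = pattern.exec(template)) !== null) {
        if (ALLOWED_PLACEHOLDERS.indexOf(match[1]) === -1) {
            unknown.push(match[1])
        }
    }
    return unknown
}
```

A validator built this way would presumably report `unknown_cron_placeholder` once per returned token, for both `prompt` and `external_key`.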
… redelivery dedupe
Second slice of cron-trigger-scheduler.md v0. Wires the dedupe primitive
PR-1 added the column for. PR-3 (cronTick) consumes this, but the webhook
side benefits independently: Stripe / GitHub / Slack retries that
previously created duplicate sessions now no-op cleanly.
SessionQueue interface gains `findByIdempotencyKey(applicationId, key)`.
PG impl reads through the partial unique index that PR-1 created; memory
impl walks the in-memory map. Implemented on both for harness symmetry.
enqueueOrResume() flow:
- Pre-check: if `idempotencyKey` is supplied and matches an existing
session, return that session id immediately. Stripe-shaped: the
duplicate request's principal + seed are dropped (the original is
source of truth).
- On insert: if the unique index fires anyway (race window between the
pre-check and the INSERT), re-look-up by key and return the row a
concurrent writer created. Only engaged when a key is set;
unique-violations without a key still propagate (the bug surface that
path was protecting against is preserved).
Webhook trigger forwards provider-supplied idempotency headers in
precedence order: `Idempotency-Key` → `X-Idempotency-Key` →
`X-GitHub-Delivery`. Namespaced as `webhook:<header value>` so a future
cron firing can never collide with a webhook key under the same agent.
Tests: 6 new unit cases in `enqueue.test.ts` (first-create,
duplicate-returns-original, trigger_metadata round-trip,
idempotency-beats-external_key composition, race-window via
Proxy-wrapped queue, no-key propagation). 2 new e2e cases in
`webhook-mcp-trigger.test.ts` covering both `Idempotency-Key` and
`X-GitHub-Delivery` redelivery dedupe via real Postgres + the unique
index. All existing enqueue / webhook tests still pass. Same 3
pre-existing ingress server.test failures (auth.modes shape mismatch
from Ben's `24e9577e17`) confirmed unrelated.
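The pre-check plus unique-violation fallback can be sketched against an in-memory stand-in. This is a simplified illustration under stated assumptions: `MemoryQueue`, its `insert` method, and the reduced `enqueueOrResume` signature are hypothetical, and the real function also handles principals, seeds, and `external_key`.

```typescript
interface Session {
    id: string
    applicationId: string
    idempotencyKey: string | null
}

// Mimics Postgres error 23505 (unique_violation) for the in-memory stand-in.
function uniqueViolation(): Error {
    return Object.assign(new Error('duplicate idempotency key'), { code: '23505' })
}

function isUniqueViolation(err: unknown): boolean {
    return typeof err === 'object' && err !== null && (err as { code?: string }).code === '23505'
}

// Hypothetical in-memory queue, not the real MemorySessionQueue.
class MemoryQueue {
    private rows: Session[] = []

    findByIdempotencyKey(applicationId: string, key: string): Session | undefined {
        return this.rows.find((r) => r.applicationId === applicationId && r.idempotencyKey === key)
    }

    insert(applicationId: string, idempotencyKey: string | null): Session {
        // Stand-in for the partial unique index: NULL keys never collide.
        if (idempotencyKey !== null && this.findByIdempotencyKey(applicationId, idempotencyKey)) {
            throw uniqueViolation()
        }
        const row: Session = { id: `session-${this.rows.length + 1}`, applicationId, idempotencyKey }
        this.rows.push(row)
        return row
    }
}

function enqueueOrResume(queue: MemoryQueue, applicationId: string, idempotencyKey?: string): string {
    // Pre-check: a known key returns the original session id.
    if (idempotencyKey !== undefined) {
        const existing = queue.findByIdempotencyKey(applicationId, idempotencyKey)
        if (existing) {
            return existing.id
        }
    }
    try {
        return queue.insert(applicationId, idempotencyKey === undefined ? null : idempotencyKey).id
    } catch (err) {
        // Race window: a concurrent writer inserted between the pre-check and the insert.
        if (idempotencyKey !== undefined && isUniqueViolation(err)) {
            const winner = queue.findByIdempotencyKey(applicationId, idempotencyKey)
            if (winner) {
                return winner.id
            }
        }
        // Violations without a key still propagate.
        throw err
    }
}
```

The fallback only engages when a key was supplied, which matches the commit's note that keyless unique-violations keep propagating.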
Third slice of cron-trigger-scheduler.md v0. Wires the scheduler itself.
Cron-fired agents now actually run end-to-end once a session is fired.
Authoring loop closes in PR-4 (manual fire + observability).
New `RevisionStore.listLiveCronRevisions()`. v0 strategy per plan §6: a
SQL-side JOIN on `live_revision_id` + Node-side filter on
`spec.triggers.some(t => t.type === 'cron')`. Upgrade path
(`spec @> '{"triggers": [{"type": "cron"}]}'::jsonb` + a GIN index)
documented inline for when query volume forces it.
New `services/agent-janitor/src/cron-tick.ts`:
- `cronTick(deps, state)` lists live cron revisions, enumerates firings
via `cron-parser` within `(lastTickAt, now]`, applies the `catch_up`
policy (`all` / `most_recent` / `skip`) bounded by
`max_catch_up_age_seconds`, and fires each survivor through
`enqueueOrResume()` with `idempotencyKey =
cron:<rev>:<name>:<minute>`. The unique index from PR-1 + the
dedupe-on-conflict path from PR-2 handle two-replica racing without
any leader-election machinery.
- `lastTickAt` is per-process state (the plan §6 deliberate choice);
the unique index is the source of truth for "did we fire this
minute," not any persisted clock.
- Placeholder expansion (`{fired_at:iso}`, `{fired_at:date}`,
`{fired_at:week}`, `{cron_name}`, `{schedule}`) renders into both
`prompt` and `external_key`. Matches the whitelist the validator
enforces at freeze time (PR-1).
- ISO-8601 week date computed inline (no moment.js dep) including
year-boundary 53-week edge cases.
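For reference, an inline ISO-8601 week computation can look like the sketch below. It is an assumption-laden illustration, not the code in `cron-tick.ts`: the function name `isoWeek` and the `YYYY-Www` output format are hypothetical, and it works in UTC rather than the trigger's configured timezone.

```typescript
// Hypothetical helper: ISO-8601 week label for a date, computed in UTC.
function isoWeek(date: Date): string {
    // Copy the date, truncated to midnight UTC.
    const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()))
    // ISO weekday: Monday = 1 ... Sunday = 7.
    const weekday = d.getUTCDay() || 7
    // Move to the Thursday of this week; its calendar year is the ISO week-year.
    d.setUTCDate(d.getUTCDate() + 4 - weekday)
    const yearStart = Date.UTC(d.getUTCFullYear(), 0, 1)
    const week = Math.ceil(((d.getTime() - yearStart) / 86400000 + 1) / 7)
    return `${d.getUTCFullYear()}-W${week < 10 ? '0' : ''}${week}`
}
```

Anchoring on the week's Thursday is what handles the year-boundary cases: 2021-01-01 falls in week 53 of ISO year 2020, and 2024-12-30 falls in week 1 of ISO year 2025.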
`services/agent-janitor/src/index.ts` invokes sweep + cronTick in
parallel inside the same `setInterval`. Independent try/catch on each
so a slow / throwing tick can't starve the other.
Workspace dep `@posthog/agent-ingress` added to agent-janitor so the
janitor can call the same `enqueueOrResume` chat/webhook/Slack use —
plan §2 is explicit about reusing that primitive.
Tests: 11 new cases in `cron-tick.test.ts` covering: empty no-op,
first-tick lastTickAt init, window-match firing, `all` /
`most_recent` / `skip` catch-up modes, `max_catch_up_age_seconds`
bound, two-replica idempotency, `trigger_metadata` stamping, prompt
+ external_key placeholder expansion, malformed-schedule recovery.
Existing janitor + chat + webhook e2e all pass.
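The catch-up policy step can be sketched as a pure function over the enumerated firings. This is a hedged illustration: the name `applyCatchUp` appears later in the PR, but this signature is an assumption, the `skip` branch follows the "fire only if there's exactly one survivor" reading stated in the PR description, and the age bound is treated as exclusive per the later follow-up commit.

```typescript
type CatchUp = 'all' | 'most_recent' | 'skip'

// Assumes `firings` is sorted oldest-first, as an enumeration over (lastTickAt, now] would be.
function applyCatchUp(firings: Date[], policy: CatchUp, maxCatchUpAgeSeconds: number, now: Date): Date[] {
    const earliestAllowed = now.getTime() - maxCatchUpAgeSeconds * 1000
    // Drop anything at or beyond the age bound (exclusive boundary), or in the future.
    const fresh = firings.filter((f) => f.getTime() > earliestAllowed && f.getTime() <= now.getTime())
    if (policy === 'all') {
        return fresh
    }
    if (policy === 'most_recent') {
        return fresh.slice(-1)
    }
    // 'skip': fire only when nothing was missed, i.e. exactly one survivor.
    return fresh.length === 1 ? fresh : []
}
```

Each survivor would then go through `enqueueOrResume()` with its per-minute idempotency key, so a second replica computing the same survivors no-ops.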
…sweep
Final v0 slice of cron-trigger-scheduler.md. Closes the authoring loop
(you can now build and validate a cron agent without waiting for a real
firing) and gives the partial unique index a retention story.
`POST /revisions/:id/cron/fire` on the janitor:
- Body { cron_name, request_id?, fired_at? } — fires the named cron via
the same `fireCronManually()` helper that the scheduler walks, with a
dedupe key shape `cron-manual:<rev>:<name>:<requestId>` (distinct from
the scheduled `cron:<rev>:<name>:<minute>` so manual + scheduled
firings at the same minute don't collide).
- Optional `request_id` makes repeated UI clicks idempotent; without
it every call generates a fresh UUID and fires unconditionally.
- Plan §9 calls this out as load-bearing for authoring — you can't
sanely build a cron agent if "did the prompt work?" only has an
answer at the next real firing.
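The two dedupe key shapes can be sketched side by side. This is illustrative only: the helper names `scheduledKey` and `manualKey` are hypothetical, and the `<minute>` encoding (an ISO timestamp truncated to the minute, in UTC) is an assumption, since the PR only states the overall key shapes.

```typescript
import { randomUUID } from 'node:crypto'

// Hypothetical: scheduled firings are bucketed by minute so two replicas derive the same key.
function scheduledKey(revisionId: string, cronName: string, firedAt: Date): string {
    const minute = firedAt.toISOString().slice(0, 16) // e.g. "2026-06-01T09:00" (assumed encoding)
    return `cron:${revisionId}:${cronName}:${minute}`
}

// Hypothetical: manual fires key on the caller's request_id, or a fresh UUID when none is given.
function manualKey(revisionId: string, cronName: string, requestId?: string): string {
    return `cron-manual:${revisionId}:${cronName}:${requestId === undefined ? randomUUID() : requestId}`
}
```

Because the prefixes differ, a manual fire and a scheduled firing in the same minute produce distinct keys, and omitting `request_id` yields a new key on every call.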
`fireCronManually()` lives next to `cronTick()` in `cron-tick.ts` and
shares the placeholder expansion + enqueueOrResume call. Renders a
`{ kind: 'cron', cron_name, schedule, fired_at, manual: true }`
metadata stamp so the session-detail UI can distinguish manual fires
from scheduled ones.
Idempotency-key retention sweep:
- New `SessionQueue.clearStaleIdempotencyKeys(cutoff)` (PG + Memory).
- New `SweepDeps.idempotencyKeyTtlMs` (default 30d, set 0 to disable);
surfaced as `IDEMPOTENCY_KEY_TTL_MS` env on the janitor config.
- `sweepOnce` runs it as Policy 4 and reports the count on
`SweepResult.cleared_idempotency_keys`.
- Keeps the partial unique index compact; by 30d any retry that would
have collided has long since happened (plan §6 "Retention").
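A minimal sketch of the retention policy over an in-memory row list follows. It is an illustration under assumptions: the `Row` shape, the use of `createdAt` as the timestamp compared against the cutoff, and the wrapper `sweepIdempotencyKeys` are all hypothetical; only the method name, the TTL-of-zero-disables rule, and the reported count come from the commit message.

```typescript
interface Row {
    id: string
    idempotencyKey: string | null
    createdAt: Date // assumed column; the PR does not say which timestamp the cutoff applies to
}

// Nulls keys on rows older than the cutoff and returns how many were cleared.
function clearStaleIdempotencyKeys(rows: Row[], cutoff: Date): number {
    let cleared = 0
    for (const row of rows) {
        if (row.idempotencyKey !== null && row.createdAt.getTime() < cutoff.getTime()) {
            row.idempotencyKey = null
            cleared++
        }
    }
    return cleared
}

// Hypothetical wrapper mirroring the "set 0 to disable" behaviour of the TTL setting.
function sweepIdempotencyKeys(rows: Row[], ttlMs: number, now: Date): number {
    if (ttlMs === 0) {
        return 0
    }
    return clearStaleIdempotencyKeys(rows, new Date(now.getTime() - ttlMs))
}
```

Clearing the key rather than deleting the row keeps the session history intact while dropping the row out of the partial unique index.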
What v0 still doesn't include (intentionally — these are E.1 + Django
work, separately scoped):
- Session-detail UI badge ("fired by <cron_name> at <fired_at>") — the
`trigger_metadata` JSONB is on the row, just needs Ben to wire the
console.
- MCP tool `agent-applications-revisions-cron-fire-create` — Django
side, lives in `products/agent_stack/backend/`. The HTTP endpoint
this PR adds is what the tool will call.
Tests: 3 new sweep cases (TTL default, custom TTL, disabled), 3 new
fireCronManually cases (idempotency-key shape, request_id dedupe,
unknown cron rejection), 3 new server.test.ts route cases (success,
dedupe, 404 on unknown cron). Existing sweep + server tests updated
for the new `cleared_idempotency_keys` field in SweepResult.
Janitor 110/110, shared persistence 35/35, webhook e2e 10/10.
… flow
Pins the customer experience nothing else exercised in concert: cron tick
→ enqueueOrResume → real worker claims → runner streams faux model →
session completes with the right `trigger_metadata` + idempotency_key.
Without this, a regression in any wiring point (cron seed shape,
metadata stamp, runner-side message handling, harness setup) would slip
past the per-layer unit tests.
Three new e2e cases in `services/agent-tests/src/cases/cron-trigger.test.ts`:
- Scheduled firing: `cronTick(deps, state)` at a controlled time fires
for the matched minute, runner completes, idempotency_key shape =
`cron:<rev>:<name>:<minute>`, trigger_metadata stamped, prompt
placeholder-expanded (`{fired_at:date}` → `2026-06-01`).
- Manual fire: `POST /revisions/:id/cron/fire` on the real janitor app
enqueues + the runner completes, idempotency_key shape =
`cron-manual:<rev>:<name>:<requestId>`, `trigger_metadata.manual` true.
- Manual-fire dedupe: two POSTs with the same `request_id` resolve to
the same session id end-to-end (real PG unique-violation round-trip).
Required two harness changes:
- Expose `cronTick`, `newCronTickState`, `fireCronManually` from
`agent-janitor`'s `lib.ts` so e2e cases can drive cron deterministically
without standing up the setInterval.
- Wire `revisions` + `bundles` into the harness's `buildJanitorApp` call.
They were already constructed in the cluster setup; the janitor just
wasn't getting them. Unblocks `/revisions/:id/cron/fire` (and would have
been needed eventually for any cron-related janitor endpoint).
Janitor sweep test in `cases/janitor.test.ts` updated for the
`cleared_idempotency_keys` field added in PR-4.
156/156 of the existing e2e suite still passes (excluding real-inference,
which fails fast without provider keys per harness contract).
Completes v0 of cron-trigger-scheduler.md per plan §10. Two pieces that
make the cron rollout actually usable from the human + agent surfaces:
1. **Cron fire badge on session-detail.** The session row already carries
`trigger_metadata` JSONB from PR-1; this surfaces it.
- Django: `trigger_metadata` field added to all three session
serializer shapes (list summary, detail, fleet-live). Janitor was
already including the field in its response; Django just wasn't
declaring it. OpenAPI regenerated.
- Frontend: `triggerMetadataToSessionTrigger()` in `apiClient.ts`
maps the raw JSONB shape onto the typed `SessionTrigger` the
playback consumes. New `CronTriggerBadge` component renders
"Fired by `<cron_name>` at `<fired_at>`" with a `manual` pill
when the firing came from the manual-fire endpoint vs the
scheduler. `SessionTrigger.cron` extended with `cronName` +
`manual` to carry the new fields; existing storybook fixtures
migrated.
2. **`agent-applications-revisions-cron-fire-create` MCP tool.** Wires
the janitor `POST /revisions/:id/cron/fire` endpoint as an MCP tool
so the concierge (and any other MCP client) can manually fire a cron
for authoring iteration.
- Django: new `cron_fire` action on `AgentRevisionViewSet` proxying
to `janitor_client.cron_fire`. Validates `cron_name` is non-empty;
`request_id` is optional (forwards None so the janitor mints a UUID).
- `janitor_client.py`: new `cron_fire(revision_id, cron_name,
request_id)` method.
- MCP definition added to `services/mcp/definitions/agent_stack.yaml`
with `destructive: false, idempotent: true` annotations (fire is
reversible — sessions can be cancelled — and same `request_id`
dedupes). Generated tool catalog regenerated.
Tests: 4 new Django tests in `test_cron_fire.py` exercising the action
contract (cron_name forwarding, optional request_id, missing-name
rejection, empty-name rejection). MCP tool-schemas snapshot updated to
include the new tool. Storybook `SessionDetail > CronFire` story now
shows the badge against the existing cron fixture.
Generated artefacts regenerated via `hogli build:openapi` — touches:
- `services/agent-console/src/generated/agent-stack.api.schemas.ts`
- `services/mcp/src/{generated,tools/generated}/agent_stack*.ts`
- `services/mcp/schema/{generated-tool-definitions,tool-definitions-all}.json`
- `products/agent_stack/frontend/generated/api*.ts`
- MCP unit snapshots under `tests/unit/__snapshots__/tool-schemas/`
MCP UI Apps size report
Size Change: 0 B · Total Size: 81 MB
PR overview
This pull request adds an initial cron trigger scheduler for agents, including janitor tick processing for scheduled trigger firings and catch-up behavior after missed runs. There is one open security concern remaining, while one prior issue has already been addressed. The remaining issue is a resource-exhaustion risk: a high-frequency cron trigger with broad catch-up settings could cause a janitor tick to enumerate and attempt a very large number of firings after downtime. Adding a per-trigger/per-tick cap or rejecting overly frequent schedules would substantially reduce the current risk.
Open issues (1)
Fixed/addressed: 1 · PR risk: 5/10
Veria flagged the webhook idempotency key as forgeable: an attacker with
reach to a public webhook could POST first with a guessed provider header
(e.g. a Stripe event id leaked via a log) so a later legitimate delivery
dedupes to the attacker's session and drops the real payload.
Namespacing the dedupe key with a sha256 of the parsed body fixes this:
same header + same body still collapses for provider retries, but
spoof + different body lands a separate session. This is
defence-in-depth; provider signature verification stays the primary
guard.
Also bundles the small CI-driven cleanups so the cron PR is green:
- MCP exec-tool snapshot picks up the new
`agent-applications-revisions-cron-fire-create` tool description.
- `products/agent_stack/product.yaml` re-indented to satisfy oxfmt
(pre-existing drift on the file, surfaced once the workflow ran
against this branch).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
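The body-hash namespacing can be sketched as follows. This is an assumption-heavy illustration: the function name `webhookIdempotencyKey`, the lower-cased header map, the `JSON.stringify` serialisation, and the exact `webhook:<header>:<hash>` layout are hypothetical. The commit only states that the key is namespaced with a sha256 of the parsed body and that headers are read in the listed precedence order.

```typescript
import { createHash } from 'node:crypto'

// Precedence order from the earlier commit message, lower-cased (assumed header normalisation).
const HEADER_PRECEDENCE = ['idempotency-key', 'x-idempotency-key', 'x-github-delivery']

// Hypothetical helper: returns null when the provider sent no idempotency header.
function webhookIdempotencyKey(headers: Record<string, string | undefined>, parsedBody: unknown): string | null {
    let headerValue: string | undefined
    for (const name of HEADER_PRECEDENCE) {
        if (headers[name]) {
            headerValue = headers[name]
            break
        }
    }
    if (!headerValue) {
        return null
    }
    // Same header + same body hashes identically; a spoofed header with a different body does not.
    const serialised = JSON.stringify(parsedBody === undefined ? null : parsedBody)
    const bodyHash = createHash('sha256').update(serialised).digest('hex')
    return `webhook:${headerValue}:${bodyHash}`
}
```

Note that `JSON.stringify` is sensitive to key order, so a real implementation would need a stable serialisation or a hash of the raw bytes if providers can reorder fields between retries.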
Pre-existing whitespace drift on `products/agent_stack/product.yaml` surfaced once `Frontend formatting` ran against this branch. Mirrors the same fix landed on the sibling cron PR (#61028). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🔍 Migration Risk Analysis
We've analyzed your migrations for potential risks.
Summary: 2 Safe | 0 Needs Review | 0 Blocked
✅ Safe: brief or no lock, backwards compatible
📚 How to Deploy These Changes Safely
AddField: This operation acquires a brief lock but doesn't rewrite the table. Deployment uses lock timeouts with automatic retries, so lock contention will cause retries rather than connection pile-up.
Last updated: 2026-06-01 23:05 UTC (fd32158)
…ests
Three follow-ups from the cron-trigger review:
1. Add `cron_fire` to AgentRevisionViewSet.scope_object_write_actions.
Without it, posthog.permissions._get_required_scopes() returns None
for the action, and any PAT/OAuth caller (i.e. the MCP transport,
which is the only caller of the new fire tool) gets a 403 with
"this action does not support personal API key access". Session-auth
tests didn't catch it because they short-circuit before the scope
check.
2. Clamp the enumeration window in cronTick BEFORE walking firings.
cron-parser accepts 6-field sub-minute schedules and the zod schema
does not gate them; combined with the 7-day max_catch_up_age cap,
a paused janitor returning to life could enumerate ~604k firings
in a single tick only to throw them away in applyCatchUp. The clamp
keeps firings.length bounded by the cap regardless of how long the
janitor was down. Added a regression test asserting a 7-day pause
on `* * * * *` enumerates exactly 2 firings (= cap / minute) and
skips zero on the catch-up discard path.
3. Pin the idempotency-key guarantees against real Postgres. The
ingress's enqueueOrResume relies on:
- the partial unique index throwing 23505 on duplicate (app, key),
- NULL keys NOT colliding (partial WHERE NOT NULL),
- key collisions scoped to application_id (multi-app safe),
- clearStaleIdempotencyKeys nulling only rows past cutoff +
surfacing the affected count.
Previously only the in-memory MemorySessionQueue covered these
behaviours. Add four PG-backed tests in pg-impls.test.ts so a
future index change (e.g. dropping the WHERE clause, widening the
key scope) fails loudly.
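The window clamp from item 2 can be sketched in isolation. This is a hedged illustration: `clampWindowStart` is a hypothetical name, `everyMinuteFirings` is a stand-in for the `cron-parser` enumeration of `* * * * *`, and the 120-second cap in the usage note is an assumed value chosen to reproduce the "exactly 2 firings" shape of the regression test.

```typescript
// Start of the enumeration window: never earlier than now - max_catch_up_age_seconds.
function clampWindowStart(lastTickAt: Date, now: Date, maxCatchUpAgeSeconds: number): Date {
    const earliestAllowed = new Date(now.getTime() - maxCatchUpAgeSeconds * 1000)
    return lastTickAt.getTime() > earliestAllowed.getTime() ? lastTickAt : earliestAllowed
}

// Stand-in for enumerating `* * * * *` within (from, to]; the real code uses cron-parser.
function everyMinuteFirings(from: Date, to: Date): Date[] {
    const firings: Date[] = []
    // First whole-minute boundary strictly after `from`.
    let t = Math.floor(from.getTime() / 60000) * 60000 + 60000
    for (; t <= to.getTime(); t += 60000) {
        firings.push(new Date(t))
    }
    return firings
}
```

With a 7-day pause and an assumed 120-second cap, the clamped window yields 2 firings, where the unclamped window would enumerate 10,080 before discarding most of them.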
const windowFrom = lastTickAt > earliestAllowed ? lastTickAt : earliestAllowed
let firings: Date[]
try {
    firings = enumerateFirings(cfg.schedule, cfg.timezone, windowFrom, now)
Medium: Unbounded cron catch-up work
An agent author can publish a cron trigger such as `* * * * * *` with `catch_up: "all"` and a 7-day `max_catch_up_age_seconds`, causing a janitor tick after a pause to enumerate and attempt hundreds of thousands of firings for that one trigger. Add a scheduler-side cap on firings per trigger/tick, or reject sub-minute/high-frequency schedules at validation time.
# Conflicts:
#   pnpm-lock.yaml
#   services/mcp/tests/unit/__snapshots__/tool-schemas/activity-log-list.json
Query snapshots: Backend query snapshots updated
Changes: 8 snapshots (8 modified, 0 added, 0 deleted)
…tick cap
Addresses the unaddressed Veria review on #61028 (cron-tick.ts): a
high-frequency schedule with catch_up=all and a large max_catch_up_age
could fire hundreds of thousands of sessions in a single tick.
- validate-spec: reject schedules that fire more than once a minute
(cron_schedule_too_frequent), measured from the first two firings.
- cron-tick: cap firings per trigger per tick at 100, keeping the most
recent and logging the dropped tail (cron.tick.firings_capped).
Also corrects a pre-existing failing test (max_catch_up_age_seconds
bounds the catch-up) to match the clamp's boundary-exclusive behavior.
Generated-By: PostHog Code
Task-Id: 2799c922-58af-49d2-af31-fb1e59c74c30
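Both guards from this commit can be sketched as small pure functions. This is an illustration under assumptions: the names `capFirings` and `isTooFrequent` are hypothetical, and "more than once a minute" is read here as a gap of under 60 seconds between the first two firings. Only the cap of 100 and the keep-most-recent rule come from the commit message.

```typescript
const MAX_FIRINGS_PER_TICK = 100

// Keeps the most recent `cap` firings; assumes `firings` is sorted oldest-first.
function capFirings(firings: Date[], cap: number = MAX_FIRINGS_PER_TICK): { kept: Date[]; dropped: number } {
    if (firings.length <= cap) {
        return { kept: firings, dropped: 0 }
    }
    const dropped = firings.length - cap
    return { kept: firings.slice(dropped), dropped }
}

// Freeze-time check: a schedule whose first two firings are under a minute apart is rejected.
function isTooFrequent(firstFiring: Date, secondFiring: Date): boolean {
    return secondFiring.getTime() - firstFiring.getTime() < 60000
}
```

The `dropped` count is what a scheduler would log under `cron.tick.firings_capped`; the validation check means the cap should rarely engage for specs frozen after this commit.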
Problem
`TriggerSchema` has had a `cron` variant (`schedule` + `timezone`) since the v2 cutover, but nothing has woken those agents up: the field told you when but not what to do. Eight of the agents in `_APP_IDEAS.md` (weekly digests, gap analysis, growth review, etc.) plus self-healing v3 are blocked behind this. The plan at `docs/agent-platform/plans/cron-trigger-scheduler.md` is the design.
This PR is v0 end-to-end per plan §10. Cron-fired agents now actually run, two janitor replicas can't double-fire the same minute, and authors can iterate on a cron prompt without waiting for the next real firing.
Changes
Six commits, each a discrete slice:
1. **Schema foundation** (`05837f1f32`): migration adds `idempotency_key TEXT NULL` to `agent_session` with a partial unique index on `(application_id, idempotency_key) WHERE NOT NULL`, plus a sibling `trigger_metadata JSONB`. `TriggerSchema.cron.config` gains `name`, `prompt`, `external_key`, `catch_up` (`all` / `most_recent` / `skip`), `max_catch_up_age_seconds`. Freeze-time validation in `validate-spec.ts` parses the schedule via `cron-parser`, validates timezones via `Intl.DateTimeFormat`, and whitelists the placeholder set (`{fired_at:iso|date|week}`, `{schedule}`, `{cron_name}`) on both `prompt` and `external_key`.
2. **Idempotency-aware `enqueueOrResume`** (`989a6e5f09`): new `SessionQueue.findByIdempotencyKey`. The enqueue path checks for a duplicate before insert and falls back on the unique-violation lookup so a race between the pre-check and the INSERT resolves to the original session id rather than throwing. The webhook trigger forwards provider-supplied idempotency headers (`Idempotency-Key`, `X-Idempotency-Key`, `X-GitHub-Delivery`) so Stripe / GitHub / Slack redeliveries dedupe automatically, a strict improvement over today.
3. **`cronTick()` scheduler** (`a127f7a428`): runs alongside `sweepOnce` on the same `setInterval`. Lists every live cron revision, enumerates firings in `(lastTickAt, now]` via `cron-parser`, applies the `catch_up` policy bounded by `max_catch_up_age_seconds`, expands placeholders, and fires each survivor through `enqueueOrResume` with `idempotencyKey = cron:<rev>:<name>:<minute>`. Two janitor replicas hit the same partial unique index; one wins and the other no-ops via the lookup path. `lastTickAt` is per-process: the unique index is the source of truth for "did we fire this minute," not any persisted clock.
4. **Manual fire endpoint + 30d sweep** (`ee988bc906`): `POST /revisions/:id/cron/fire` on the janitor + `fireCronManually()` helper. Dedupe key shape `cron-manual:<rev>:<name>:<requestId>` (distinct from the scheduled form so the two never collide). Authoring loop closes: you can iterate on a cron prompt without waiting for the next firing. New `clearStaleIdempotencyKeys(cutoff)` on `SessionQueue`; `sweepOnce` runs it as Policy 4 with a 30-day default so the partial index stays compact.
5. **e2e coverage** (`50303eca92`): `cases/cron-trigger.test.ts` drives `cronTick` directly against the harness's real Postgres + worker + faux pi-ai. Three cases: scheduled firing → completed session, manual fire via janitor HTTP, manual-fire dedupe round-trip. Wires `revisions` + `bundles` into the harness janitor (they were already constructed but not passed in). Exports `cronTick`, `newCronTickState`, `fireCronManually` from `@posthog/agent-janitor`'s `lib.ts`.
6. **UI badge + MCP fire tool** (`caa920c304`): Django serializers expose `trigger_metadata` on session responses. New `CronTriggerBadge` on `SessionDetail` renders "Fired by `<cron_name>` at `<fired_at>`" with a `manual` pill for manually-fired sessions. New `cron_fire` viewset action proxies to the janitor, surfaced as MCP tool `agent-applications-revisions-cron-fire-create` so the concierge (and any Claude Code / Cursor user) can fire a cron through MCP.
The `idempotency_key` primitive (1 + 2) lands webhook double-delivery safety as an independent side effect: the same key shape that cron uses is what the webhook trigger now forwards from provider headers.
How did you test this code?
- `spec.test.ts` (13 new cron-config cases), `validate-spec.test.ts` (7 freeze-time cases), `enqueue.test.ts` (6 idempotency cases including a race-window simulation via Proxy-wrapped queue), `sweep.test.ts` (3 retention cases), `cron-tick.test.ts` (14 scheduler cases covering every catch-up mode, max-age bound, two-replica dedupe, placeholder expansion, malformed-schedule recovery, and the manual-fire helper), `server.test.ts` (3 route cases).
- `test_cron_fire.py` covering the action's contract against a mocked `janitor_client`. Ran via `hogli test` against the real Django test DB: 4 passed in 189s.
- `cases/cron-trigger.test.ts` + 2 webhook redelivery cases in `cases/webhook-mcp-trigger.test.ts`. The full agent-tests suite (excluding `real-inference.test.ts`, which fails fast without provider keys) was 156/156 green.
- `agent-applications-revisions-cron-fire-create` now lands in the tool-schema snapshot suite.
Automatic notifications
Docs update
- `docs/agent-platform/plans/cron-trigger-scheduler.md`: the design doc this PR implements. Ben's last edit on it (`078ce5bb89` wip — cron plan) is the version this PR is built against.
- `docs/agent-platform/plans/_TODO.md` already lists cron as in-progress and assigned to me (per the audit doc commit on `ass`); no doc updates in this PR.
🤖 Agent context
Agent-authored (Claude Opus 4.7) via Claude Code. Tools used: Read, Edit, Write, Bash, MultiEdit, Monitor (for waiting on `hogli build:openapi` + Django test runs). Six commits in the history because we discussed the rollout up front and treated each as an independently reviewable slice, the same pattern as the runtime-mcps PR.
Key design decisions made along the way:
- `idempotency_key` is the cross-cutting primitive, not a cron-specific column. PR-2 wires it for webhook redelivery as a side effect; future triggers (Slack message dedupe, scheduled retries) compose without new schema. Stripe-shaped semantics: duplicate → no-op, return original. Distinct from `external_key`, which is "append on collision"; the two compose orthogonally.
- `lastTickAt` is per-process in memory, not persisted. Multi-replica safety comes from the partial unique index on `(application_id, idempotency_key)`, not from any persisted clock; plan §6 makes this deliberate. The catch-up policy handles missed firings after a restart.
- `fireCronManually` lives next to `cronTick` rather than getting duplicated into the route handler. Same execution path means a future bug in the scheduler's seed-message construction also breaks manual fire: caught once, in one place.
- `@posthog/agent-ingress` added to `agent-janitor` so the janitor calls the same `enqueueOrResume` chat / webhook / Slack use (plan §2 was explicit about this). The agent-shared library doesn't carry ACL / elevation logic, so extracting a third primitive would have been bigger than this PR's scope.
- `destructive: false, idempotent: true`: firing a cron is reversible (the session can be cancelled) and the same `request_id` dedupes; gating it as destructive would surface a confirmation modal in clients that respect annotations, which is the wrong UX for an authoring tool.
Reviewer asks: a second pair of eyes on (1) the catch-up `skip` semantics. I went with "fire only if there's exactly one survivor; drop everything if there are multiple missed firings", but plan §7 could read as "always drop missed firings, only fire the live one"; happy to flip if the latter reads better. (2) The `CronTriggerBadge` rendering: visual look-and-feel is Ben's call; happy to tweak.