spec(slice-b): B.1 trunk drafts — scheduler, kensa-executor, transaction-log-writer#415
Draft
remyluslosius wants to merge 2 commits into
Draft
spec(slice-b): B.1 trunk drafts — scheduler, kensa-executor, transaction-log-writer#415remyluslosius wants to merge 2 commits into
remyluslosius wants to merge 2 commits into
Conversation
…nsaction-log-writer
Three draft Specter specs for Slice B's "trunk" wave: the end-to-end
path that takes a scheduled scan through Kensa execution and into the
write-on-change transaction log. All three are status=draft, tier=1
(100% coverage target once implementation lands).
system-scheduler (11 ACs)
Adaptive scan scheduling. 60s cron tick, SKIP LOCKED dispatch,
tier intervals from Ed25519-signed `schedules` policy, snapshot
policy_version at enqueue so reloads don't affect in-flight scans.
48h max interval cap. Maintenance mode honors policy. Manual POST
/scans bypasses the schedule entirely.
system-kensa-executor (12 ACs)
Bridge from OpenWatch credentials to Kensa Go module. In-memory
SSH key parsing only — never writes to /tmp (the major fix vs the
Python implementation). Per-host concurrency guard, policy-tunable
timeout, ctx cancellation honored, credential zero-after-use,
emits scan.started/completed/failed. Source-inspection ACs verify
no engine-abstraction interface (Kensa is the only engine in B).
system-transaction-log-writer (12 ACs)
Write-on-change persistence: host_rule_state UPSERT every scan,
transactions INSERT only on state change or first-seen. Single DB
transaction per Apply call, idempotent on scan_id, FK constraints
with ON DELETE RESTRICT. Evidence JSONB validated against the
KensaEvidence OpenAPI schema. Explicitly drops Python's
scan_baselines table — the prior transactions row IS the baseline.
OpenAPI delta
Adds Scan.policy_version field (snapshotted policy version at
enqueue). The existing Scan.initiator object already carries the
scheduler-vs-manual distinction.
Coverage status
All 33 specs parse and pass structural checks. Coverage shows the
3 new drafts at 0% — expected, no tests yet. Implementation work
follows in per-component PRs:
B.1a system-scheduler impl + tests
B.1b system-kensa-executor impl + tests
B.1c system-transaction-log-writer impl + tests
Each will bring its spec from draft -> approved and lift coverage
to 100% before merge. Until then this PR's coverage gate will fail
by design.
Pass the three drafts through a security-first lens. Net adds: 9
constraints, 11 acceptance criteria. Coverage of new ACs stays at
the expected 0% until impl lands.
system-scheduler (+4 constraints, +4 ACs)
C-08 minimum 5-minute interval floor (anti scan-storm DoS)
C-09 audit emission on every host_compliance_schedule UPDATE
C-10 signing-key revocation list checked alongside Ed25519 sig
C-11 HMAC over job payload, verified at dequeue
AC-10 amended: policy verified at boot AND every reload
ACs 12-15 cover the new constraints
system-kensa-executor (+3 constraints, +4 ACs)
C-09 SSH host-key verification via internal/ssh/known_hosts;
first-connect policy from policy.Schedules.HostKeyPolicy
C-10 per-rule evidence cap at 10 MB (target-OOM defense)
C-11 per-host backoff after 3 consecutive failures
ACs 13-16 cover host-key, oversize, decryption-failure audit,
and backoff state visible to scheduler
system-transaction-log-writer (+2 constraints, +3 ACs)
C-04 amended: scan_id MUST be server-generated UUIDv4 (anti-replay)
C-09 sqlc-only DB access; no string-concat SQL in this package
C-10 256 KB per-rule evidence cap at writer (defense in depth on
top of executor's 10 MB cap)
ACs 13-15 cover sqlc-only source inspection, oversize rejection,
and writer.apply.failed audit emission
Deferred (per discussion): evidence integrity hash defense-in-depth
In-process trust boundary makes this belt-and-suspenders. Revisit
if threat model warrants persisting a SHA-256 alongside evidence.
Local validation
specter parse: 33/33 PASS
specter check: all constraints referenced, no orphans
specter coverage: 3 new drafts still at 0% (by design, no impl yet)
This was referenced May 28, 2026
remyluslosius
added a commit
that referenced
this pull request
May 29, 2026
…00%) (#418) * feat(scheduler): B.1a foundation — spec, migration, audit events, ladder logic First reviewable chunk of system-scheduler implementation. The trunk is laid; the dispatcher + cron tick + HMAC + post-scan update work follows in subsequent commits before this PR merges. Spec Promoted system-scheduler from draft → approved. Identical AC set as the draft in PR #415 (15 ACs, 11 constraints) — this PR is where the spec actually lands on main alongside its implementation. Migration 0011 - host_compliance_schedule (host_id PK, compliance_state, score, has_critical, current_interval_minutes, next_scheduled_scan, last_scan_completed_at, maintenance_mode, maintenance_until, policy_version_at_last_scan, timestamps) - host_backoff_state (host_id, probe_type {scan|intel}, consecutive_failures, suppress_until, last_error_code, ...) Per the open-question lean accepted earlier: backoff lives in a separate table so executor-domain writes don't touch scheduler-owned schedule columns. Index supports the dispatcher's WHERE next_scheduled_scan <= now() AND maintenance_mode = FALSE pattern. Audit events (codegen) Added category "scheduler" + 7 codes: scheduler.startup.failed scheduler.schedule.updated scheduler.policy.reload.rejected scheduler.policy.clamped scheduler.policy.revoked_key.rejected scheduler.job.hmac_rejected scheduler.tick.dispatched Each carries a typed detail_schema. events.gen.go regenerated; total registry now 103 events. internal/scheduler package - types.go: ComplianceState enum (5 tiers), TierLadder type, hard safety floors (MinIntervalFloor=5m, MaxIntervalCap=48h), LoadResult + ClampRecord types. Detailed package comment documents architectural choices (scheduler-owned schedule writes, separate backoff table, manual scans bypass scheduler entirely). - ladder.go: LoadIntervals (pure function consuming PolicyTiers, returns clamped TierLadder + ClampRecords for audit emission), NextScanFor (lastFinishedAt + ladder[state], clamps to ceiling, zero-time signals immediate-schedule). Tests (4 of 15 ACs satisfied — 26.6% coverage on the spec) AC-01 TestLoadIntervals_TierLookup_Default48hForMissingTier AC-02 TestNextScanFor_AddsLadderInterval TestNextScanFor_ClampsToMaxIntervalCap TestNextScanFor_ZeroLastFinishedAtMeansImmediate AC-09 TestLoadIntervals_PolicyVersionSnapshotted AC-12 TestLoadIntervals_ClampsBelow5MinToFloor TestLoadIntervals_ClampsAbove48hToCeiling TestLoadIntervals_NoClampForInBudgetValues Spec promotion + missing 11 ACs means CI coverage will fail (T1 threshold = 100%). PR stays draft until the remaining ACs land: AC-03 Cron tick at 60s, no double-dispatch on restart AC-04 Dispatcher uses FOR UPDATE SKIP LOCKED AC-05 Maintenance mode skips dispatch, advances next_scan after expiry AC-06 Job payload includes policy_version, host_id, framework_id AC-07 Manual POST /scans bypasses schedule (no row writes) AC-08 UpdateAfterScan recomputes state + next_scheduled_scan AC-10 Bad policy at boot refuses startup + audit AC-11 Metrics counters AC-13 Every host_compliance_schedule UPDATE emits audit AC-14 Revoked-key policy rejected even with valid sig AC-15 Tampered job payload fails HMAC verification at dequeue These break down into: - 2 more pure-logic ACs (AC-11 metrics, AC-08 state derivation) - 5 DB-integration ACs (AC-03/04/05/06/07 — need pgxpool + real schema) - 3 audit/policy ACs (AC-10, AC-13 — emission verification) - 2 HMAC ACs (AC-14, AC-15 — needs internal/secretkey + HKDF) * feat(scheduler): B.1a — pure-logic ACs (state derivation, metrics, HMAC, startup) Adds 5 more ACs of system-scheduler coverage. All pure functions; no DB integration in this chunk. After this commit B.1a sits at 9/15 = 60% coverage. The remaining 6 ACs (AC-03/04/05/07/13/14) all require DB integration or external infrastructure; they land in the next chunk. update.go — AC-08 StateFromScore(score, hasCritical) → ComplianceState (5-bucket mapping with hasCritical override). UpdateAfterScan combines it with NextScanFor to produce a ScanResult ready for the (later) Service.PersistAfterScan UPSERT. metrics.go — AC-11 Metrics struct with atomic counters: DueCount, DispatchedCount, SkippedMaintenanceCount, SkippedBackoffCount, RefuseCount, PolicyClampedCount, HMACRejectCount, plus SetLastTick/LastTick. Snapshot produces a typed MetricsSnapshot ready for JSON serialization in the admin metrics handler. hmac.go — AC-06, AC-15 JobPayload (HostID, FrameworkID, PolicyVersion, EnqueuedAt) with a canonical Encode for stable HMAC. Sign / Verify use HMAC-SHA256 + constant-time compare. DeriveQueueKey uses HKDF-SHA256 from the DEK with info "openwatch-queue-v1" — per the locked open-question decision (option C: HKDF from credential DEK). Tests verify: round-trip; tampering each of the 4 fields produces a different HMAC and is rejected; wrong key rejected; key derivation is deterministic for the same DEK and distinct for different DEKs. startup.go — AC-10 PolicyLoadError enum (policy_missing / signature_invalid / revoked_key / parse_error). Startup(ctx, emit, path, reason) emits scheduler.startup.failed via the injected EmitFunc and returns ErrStartupRefused on any non-OK reason. EmitFunc matches audit.Emit's signature so production wiring is direct; tests use a fake recorder. Coverage after this commit (9 of 15 ACs): AC-01, AC-02, AC-06, AC-08, AC-09, AC-10, AC-11, AC-12, AC-15 Uncovered (6 ACs — all require DB or integration scaffolding): AC-03 cron tick @60s + no double-dispatch on restart AC-04 dispatcher SELECT ... FOR UPDATE SKIP LOCKED AC-05 maintenance_mode skips dispatch; next_scan still advances AC-07 manual POST /scans bypasses schedule AC-13 every host_compliance_schedule UPDATE emits audit AC-14 revocation list mechanism + scheduler.policy.revoked_key.rejected * feat(scheduler): B.1a — Service + Dispatch (DB integration; AC-04/05/13) Live scheduler with the SKIP LOCKED dispatcher. Brings system-scheduler coverage from 60% → 80% (12 of 15 ACs). service.go Service struct holding pool, ladder, policyVersion, hmacKey, emit, metrics, Now (clock injection for tests), DefaultFramework. Dispatch(ctx) — one pass: BEGIN tx SELECT host_id, compliance_state, next_scheduled_scan FROM host_compliance_schedule WHERE next_scheduled_scan <= now() AND maintenance_mode = false ORDER BY next_scheduled_scan FOR UPDATE SKIP LOCKED LIMIT 100 For each row: build JobPayload (host_id, framework_id, policy_version, enqueued_at) HMAC-sign with the derived queue key queue.Enqueue under job_type "scan" UPDATE host_compliance_schedule.next_scheduled_scan forward emit scheduler.schedule.updated COMMIT emit scheduler.tick.dispatched emitScheduleUpdated helper produces the typed audit event with prior + new state in detail. Metrics counters incremented inline. service_test.go (integration; requires OPENWATCH_TEST_DSN) freshPool helper applies migrations through 0011 and truncates the scheduler-touched tables. seedUser/seedHost/seedSchedule build the FK chain. newTestService constructs a Service with a deterministic clock and a fake EmitFunc that records calls. TestDispatch_SkipLocked_DisjointClaim (AC-04) Seeds 12 due hosts. Runs two concurrent Dispatch goroutines. Asserts that countA + countB == 12 (no double-dispatch, no misses) AND that the job_queue has exactly 12 scan rows. TestDispatch_FuturesNotClaimed (AC-04 negative) All hosts have next_scheduled_scan in the future → dispatched == 0. TestDispatch_MaintenanceMode_RowSkipped (AC-05) One maintenance host + one normal host both due now → only normal is claimed. Maintenance row is NOT mutated. TestDispatch_EmitsScheduleUpdated (AC-13) Verifies exactly one scheduler.schedule.updated per dispatched host AND one scheduler.tick.dispatched per tick. Validates detail keys (host_id, change_kind=next_scan_advanced) via JSON decode. ACs covered after this commit (12 of 15): AC-01, AC-02, AC-04, AC-05, AC-06, AC-08, AC-09, AC-10, AC-11, AC-12, AC-13, AC-15 Uncovered (3): AC-03 cron tick @60s + no double-dispatch on restart AC-07 manual POST /scans bypasses schedule AC-14 revocation list mechanism + scheduler.policy.revoked_key.rejected * feat(scheduler): B.1a — final 3 ACs (AC-03/07/14) — 100% coverage system-scheduler now at 100% spec coverage. specter sync passes end-to-end. run.go — AC-03 Run(ctx, interval) wires Service.Dispatch behind internal/cron at DefaultTickInterval = 60 * time.Second. interval = 0 means use the default; tests pass a sub-second cadence so they don't block. TickFunc inside Run logs and returns Dispatch errors; the cron loop keeps running on transient failures. revocation.go — AC-14 RevocationList: set of revoked Ed25519 signing-key fingerprints. Loaded at boot from a separate revocation file (path from config). NewRevocationList / Has / Size; nil-safe. Service.ValidateReload(ctx, fp, version, list) returns PolicyLoadOK or PolicyLoadRevokedKey; on rejection, emits scheduler.policy.revoked_key.rejected with detail.key_fingerprint and detail.policy_version. Previous valid policy stays active. run_test.go covers AC-03 and AC-07: TestDefaultTickInterval_Is60Seconds — runtime constant check TestRun_SourceMentions60SecondInterval — source-inspection of run.go TestDispatch_NoDoubleDispatch_OnRepeatedTick — DB test confirming a second immediate Dispatch claims 0 rows after the first advanced next_scheduled_scan TestServer_NoSchedulerTableInScanHandlers — source-inspection of internal/server/*.go (non-test files). Asserts no .go file in the HTTP layer references "host_compliance_schedule". The scheduler is the only writer of that table; manual POST /scans cannot bypass-then- silently-mutate the schedule. revocation_test.go covers AC-14: RevocationList Has matches added fingerprints; empty/nil safe. ValidateReload accepts un-revoked keys without emitting audit. ValidateReload rejects revoked keys, emits the typed event with the expected detail keys (key_fingerprint + policy_version). Coverage after this commit: 15 of 15 ACs = 100%. AC-01 ladder default + missing-tier fallback (existing) AC-02 NextScanFor arithmetic + ceiling clamp (existing) AC-03 60s tick + no double-dispatch on restart (this commit) AC-04 FOR UPDATE SKIP LOCKED dispatch (prior commit) AC-05 maintenance_mode skips dispatch (prior commit) AC-06 payload host_id/framework_id/policy_version (existing) AC-07 manual POST /scans bypasses schedule (this commit) AC-08 state derivation + UpdateAfterScan (existing) AC-09 policy version snapshotted (existing) AC-10 boot refusal + audit on bad policy (existing) AC-11 metrics counters (existing) AC-12 policy clamped to safety floor + ceiling (existing) AC-13 every schedule UPDATE emits audit (prior commit) AC-14 revocation list rejects revoked-key policy (this commit) AC-15 HMAC tamper-rejection across all 4 fields (existing) specter sync: 31 specs / all pass / coverage thresholds met. * fix(scheduler): make lint clean — gofmt + remove dead code + gosec annotations * fix(scheduler): guard emitCall append with mutex — race-clean concurrent Dispatch test
remyluslosius
added a commit
that referenced
this pull request
May 29, 2026
…, 100%) Closes Slice B.1 trunk. Compliance write-on-change persistence for the Kensa-executor pipeline, complete with all 15 acceptance criteria. Spec Promoted system-transaction-log-writer from draft → approved. 15 ACs identical to PR #415's draft. Migration 0011_host_compliance_schedule.sql Copy of B.1a's migration. Identical content; goose treats duplicate identical migrations as no-ops when both B.1a (#418) and this PR merge. Migration 0012_transaction_log.sql - host_rule_state: ONE row per (host, rule). Current state, UPSERTed every Apply. Status CHECK constraint enforces the closed enum (pass/fail/skipped/error). - transactions: append-only state-change log. UNIQUE(scan_id, rule_id) enforces idempotency at the schema level (spec C-04). - Both tables FK to hosts(id) ON DELETE RESTRICT — historical findings outlive their host references (spec C-06). - Indexes: by (host_id, status) for current-fleet queries; by (host_id, rule_id, occurred_at DESC) for point-in-time temporal queries; by scan_id for idempotency check. Audit events Added two new codes to events.yaml: finding.persisted one per transactions row (spec AC-09) writer.apply.failed per Apply-rollback (spec AC-15) Codegen produces audit.FindingPersisted and audit.WriterApplyFailed constants; events.gen.go grew from 96 → 98 events total. internal/transactionlog package types.go ApplyBatch, Result, Status / ChangeKind / FailureReason enums, sentinel errors, MaxEvidenceBytes (256 KiB cap). writer.go Writer.Apply: single-tx-per-call. Steps: 1. Validate every result (status, evidence size + shape). Spec AC-08 / AC-14 reject BEFORE any INSERT — atomic. 2. Idempotency: if any transactions row exists for the scan_id, no-op (spec AC-05). 3. BEGIN tx. 4. Per result: read prior host_rule_state, decide change_kind (first_seen / state_changed / severity_changed / none), INSERT transactions only on change, UPSERT host_rule_state with COALESCE-style last_changed_at preservation. 5. COMMIT. 6. emit finding.persisted per state-change AFTER commit (audit reflects what persisted, not what attempted). On any error: tx.Rollback + emit writer.apply.failed with classified reason (FK / deadlock / oversize / sqlc). source_test.go (AC-12, AC-13) AC-12: walks every .go and .sql file under app/ asserting no scan_baselines / ScanBaseline references — the Python-era baselines table is explicitly dropped. AC-13: AST-parses every internal/transactionlog .go file asserting no database/sql import and no .Exec/.Query/.QueryRow whose SQL arg uses fmt.Sprintf or string concatenation. writer_test.go (AC-01 through AC-11, AC-14, AC-15) 16 sub-tests covering the writer behavior end-to-end against real Postgres: AC-01 pg_stat_database.xact_commit delta < 10 after 50-rule Apply AC-02 N first_seen rows on first scan AC-03 identical rescan = 0 new transactions, check_count++ AC-04 one flip pass→fail = exactly 1 state_changed row AC-05 same scan_id replay = no-op AC-06 FK violation rolls back the whole batch (zero rows persist) AC-07 DELETE hosts with extant transactions fails (ON DELETE RESTRICT) AC-08 non-JSON-object evidence rejected (table-driven over 4 cases) AC-09 finding.persisted emission count = transactions row count AC-10 1000-rule Apply ≤ 2 seconds wall-clock AC-11 50 concurrent Applys against distinct hosts complete AC-14 oversize evidence rejected BEFORE INSERT; writer.apply.failed audit emitted with reason=evidence_oversize AC-15 FK violation emits writer.apply.failed with reason=fk_violation and detail.rule_count_attempted populated Local validation go build ./internal/transactionlog/: clean go vet ./internal/transactionlog/: clean go test -race ./internal/transactionlog/ (unit + integration with real Postgres + migrations 0001-0012): 16 sub-tests pass specter coverage: system-transaction-log-writer 15/15 = 100% Architectural choices worth flagging - Atomicity: validation phase runs BEFORE BEGIN, so oversize-evidence rejection is genuinely zero-INSERT (no rollback needed). - Pre-commit pending audit emissions: scan.completed / finding.persisted fire only AFTER tx.Commit succeeds, so the audit log truly reflects persisted state. - Evidence schema check: minimal "must be JSON object" gate today; full KensaEvidence-schema validation slots into validateResult when the OpenAPI components.schemas.KensaEvidence shape lands. Slice B.1 trunk status B.1a scheduler PR #418 — 15/15 ACs, ready for review B.1b kensa-executor PR #419 — 16/16 ACs, ready for review B.1c transaction-log-writer this PR — 15/15 ACs Total Slice B.1: 46 ACs covered across 3 specs. Ready to move on to B.2 (liveness loop + drift detector) once these merge.
remyluslosius
added a commit
that referenced
this pull request
May 29, 2026
…, 100%) Closes Slice B.1 trunk. Compliance write-on-change persistence for the Kensa-executor pipeline, complete with all 15 acceptance criteria. Spec Promoted system-transaction-log-writer from draft → approved. 15 ACs identical to PR #415's draft. Migration 0011_host_compliance_schedule.sql Copy of B.1a's migration. Identical content; goose treats duplicate identical migrations as no-ops when both B.1a (#418) and this PR merge. Migration 0012_transaction_log.sql - host_rule_state: ONE row per (host, rule). Current state, UPSERTed every Apply. Status CHECK constraint enforces the closed enum (pass/fail/skipped/error). - transactions: append-only state-change log. UNIQUE(scan_id, rule_id) enforces idempotency at the schema level (spec C-04). - Both tables FK to hosts(id) ON DELETE RESTRICT — historical findings outlive their host references (spec C-06). - Indexes: by (host_id, status) for current-fleet queries; by (host_id, rule_id, occurred_at DESC) for point-in-time temporal queries; by scan_id for idempotency check. Audit events Added two new codes to events.yaml: finding.persisted one per transactions row (spec AC-09) writer.apply.failed per Apply-rollback (spec AC-15) Codegen produces audit.FindingPersisted and audit.WriterApplyFailed constants; events.gen.go grew from 96 → 98 events total. internal/transactionlog package types.go ApplyBatch, Result, Status / ChangeKind / FailureReason enums, sentinel errors, MaxEvidenceBytes (256 KiB cap). writer.go Writer.Apply: single-tx-per-call. Steps: 1. Validate every result (status, evidence size + shape). Spec AC-08 / AC-14 reject BEFORE any INSERT — atomic. 2. Idempotency: if any transactions row exists for the scan_id, no-op (spec AC-05). 3. BEGIN tx. 4. Per result: read prior host_rule_state, decide change_kind (first_seen / state_changed / severity_changed / none), INSERT transactions only on change, UPSERT host_rule_state with COALESCE-style last_changed_at preservation. 5. COMMIT. 6. emit finding.persisted per state-change AFTER commit (audit reflects what persisted, not what attempted). On any error: tx.Rollback + emit writer.apply.failed with classified reason (FK / deadlock / oversize / sqlc). source_test.go (AC-12, AC-13) AC-12: walks every .go and .sql file under app/ asserting no scan_baselines / ScanBaseline references — the Python-era baselines table is explicitly dropped. AC-13: AST-parses every internal/transactionlog .go file asserting no database/sql import and no .Exec/.Query/.QueryRow whose SQL arg uses fmt.Sprintf or string concatenation. writer_test.go (AC-01 through AC-11, AC-14, AC-15) 16 sub-tests covering the writer behavior end-to-end against real Postgres: AC-01 pg_stat_database.xact_commit delta < 10 after 50-rule Apply AC-02 N first_seen rows on first scan AC-03 identical rescan = 0 new transactions, check_count++ AC-04 one flip pass→fail = exactly 1 state_changed row AC-05 same scan_id replay = no-op AC-06 FK violation rolls back the whole batch (zero rows persist) AC-07 DELETE hosts with extant transactions fails (ON DELETE RESTRICT) AC-08 non-JSON-object evidence rejected (table-driven over 4 cases) AC-09 finding.persisted emission count = transactions row count AC-10 1000-rule Apply ≤ 2 seconds wall-clock AC-11 50 concurrent Applys against distinct hosts complete AC-14 oversize evidence rejected BEFORE INSERT; writer.apply.failed audit emitted with reason=evidence_oversize AC-15 FK violation emits writer.apply.failed with reason=fk_violation and detail.rule_count_attempted populated Local validation go build ./internal/transactionlog/: clean go vet ./internal/transactionlog/: clean go test -race ./internal/transactionlog/ (unit + integration with real Postgres + migrations 0001-0012): 16 sub-tests pass specter coverage: system-transaction-log-writer 15/15 = 100% Architectural choices worth flagging - Atomicity: validation phase runs BEFORE BEGIN, so oversize-evidence rejection is genuinely zero-INSERT (no rollback needed). - Pre-commit pending audit emissions: scan.completed / finding.persisted fire only AFTER tx.Commit succeeds, so the audit log truly reflects persisted state. - Evidence schema check: minimal "must be JSON object" gate today; full KensaEvidence-schema validation slots into validateResult when the OpenAPI components.schemas.KensaEvidence shape lands. Slice B.1 trunk status B.1a scheduler PR #418 — 15/15 ACs, ready for review B.1b kensa-executor PR #419 — 16/16 ACs, ready for review B.1c transaction-log-writer this PR — 15/15 ACs Total Slice B.1: 46 ACs covered across 3 specs. Ready to move on to B.2 (liveness loop + drift detector) once these merge.
remyluslosius
added a commit
that referenced
this pull request
May 29, 2026
…, 100%) Closes Slice B.1 trunk. Compliance write-on-change persistence for the Kensa-executor pipeline, complete with all 15 acceptance criteria. Spec Promoted system-transaction-log-writer from draft → approved. 15 ACs identical to PR #415's draft. Migration 0011_host_compliance_schedule.sql Copy of B.1a's migration. Identical content; goose treats duplicate identical migrations as no-ops when both B.1a (#418) and this PR merge. Migration 0012_transaction_log.sql - host_rule_state: ONE row per (host, rule). Current state, UPSERTed every Apply. Status CHECK constraint enforces the closed enum (pass/fail/skipped/error). - transactions: append-only state-change log. UNIQUE(scan_id, rule_id) enforces idempotency at the schema level (spec C-04). - Both tables FK to hosts(id) ON DELETE RESTRICT — historical findings outlive their host references (spec C-06). - Indexes: by (host_id, status) for current-fleet queries; by (host_id, rule_id, occurred_at DESC) for point-in-time temporal queries; by scan_id for idempotency check. Audit events Added two new codes to events.yaml: finding.persisted one per transactions row (spec AC-09) writer.apply.failed per Apply-rollback (spec AC-15) Codegen produces audit.FindingPersisted and audit.WriterApplyFailed constants; events.gen.go grew from 96 → 98 events total. internal/transactionlog package types.go ApplyBatch, Result, Status / ChangeKind / FailureReason enums, sentinel errors, MaxEvidenceBytes (256 KiB cap). writer.go Writer.Apply: single-tx-per-call. Steps: 1. Validate every result (status, evidence size + shape). Spec AC-08 / AC-14 reject BEFORE any INSERT — atomic. 2. Idempotency: if any transactions row exists for the scan_id, no-op (spec AC-05). 3. BEGIN tx. 4. Per result: read prior host_rule_state, decide change_kind (first_seen / state_changed / severity_changed / none), INSERT transactions only on change, UPSERT host_rule_state with COALESCE-style last_changed_at preservation. 5. COMMIT. 6. emit finding.persisted per state-change AFTER commit (audit reflects what persisted, not what attempted). On any error: tx.Rollback + emit writer.apply.failed with classified reason (FK / deadlock / oversize / sqlc). source_test.go (AC-12, AC-13) AC-12: walks every .go and .sql file under app/ asserting no scan_baselines / ScanBaseline references — the Python-era baselines table is explicitly dropped. AC-13: AST-parses every internal/transactionlog .go file asserting no database/sql import and no .Exec/.Query/.QueryRow whose SQL arg uses fmt.Sprintf or string concatenation. writer_test.go (AC-01 through AC-11, AC-14, AC-15) 16 sub-tests covering the writer behavior end-to-end against real Postgres: AC-01 pg_stat_database.xact_commit delta < 10 after 50-rule Apply AC-02 N first_seen rows on first scan AC-03 identical rescan = 0 new transactions, check_count++ AC-04 one flip pass→fail = exactly 1 state_changed row AC-05 same scan_id replay = no-op AC-06 FK violation rolls back the whole batch (zero rows persist) AC-07 DELETE hosts with extant transactions fails (ON DELETE RESTRICT) AC-08 non-JSON-object evidence rejected (table-driven over 4 cases) AC-09 finding.persisted emission count = transactions row count AC-10 1000-rule Apply ≤ 2 seconds wall-clock AC-11 50 concurrent Applys against distinct hosts complete AC-14 oversize evidence rejected BEFORE INSERT; writer.apply.failed audit emitted with reason=evidence_oversize AC-15 FK violation emits writer.apply.failed with reason=fk_violation and detail.rule_count_attempted populated Local validation go build ./internal/transactionlog/: clean go vet ./internal/transactionlog/: clean go test -race ./internal/transactionlog/ (unit + integration with real Postgres + migrations 0001-0012): 16 sub-tests pass specter coverage: system-transaction-log-writer 15/15 = 100% Architectural choices worth flagging - Atomicity: validation phase runs BEFORE BEGIN, so oversize-evidence rejection is genuinely zero-INSERT (no rollback needed). - Pre-commit pending audit emissions: scan.completed / finding.persisted fire only AFTER tx.Commit succeeds, so the audit log truly reflects persisted state. - Evidence schema check: minimal "must be JSON object" gate today; full KensaEvidence-schema validation slots into validateResult when the OpenAPI components.schemas.KensaEvidence shape lands. Slice B.1 trunk status B.1a scheduler PR #418 — 15/15 ACs, ready for review B.1b kensa-executor PR #419 — 16/16 ACs, ready for review B.1c transaction-log-writer this PR — 15/15 ACs Total Slice B.1: 46 ACs covered across 3 specs. Ready to move on to B.2 (liveness loop + drift detector) once these merge.
remyluslosius
added a commit
that referenced
this pull request
May 29, 2026
…, 100%) Closes Slice B.1 trunk. Compliance write-on-change persistence for the Kensa-executor pipeline, complete with all 15 acceptance criteria. Spec Promoted system-transaction-log-writer from draft → approved. 15 ACs identical to PR #415's draft. Migration 0011_host_compliance_schedule.sql Copy of B.1a's migration. Identical content; goose treats duplicate identical migrations as no-ops when both B.1a (#418) and this PR merge. Migration 0012_transaction_log.sql - host_rule_state: ONE row per (host, rule). Current state, UPSERTed every Apply. Status CHECK constraint enforces the closed enum (pass/fail/skipped/error). - transactions: append-only state-change log. UNIQUE(scan_id, rule_id) enforces idempotency at the schema level (spec C-04). - Both tables FK to hosts(id) ON DELETE RESTRICT — historical findings outlive their host references (spec C-06). - Indexes: by (host_id, status) for current-fleet queries; by (host_id, rule_id, occurred_at DESC) for point-in-time temporal queries; by scan_id for idempotency check. Audit events Added two new codes to events.yaml: finding.persisted one per transactions row (spec AC-09) writer.apply.failed per Apply-rollback (spec AC-15) Codegen produces audit.FindingPersisted and audit.WriterApplyFailed constants; events.gen.go grew from 96 → 98 events total. internal/transactionlog package types.go ApplyBatch, Result, Status / ChangeKind / FailureReason enums, sentinel errors, MaxEvidenceBytes (256 KiB cap). writer.go Writer.Apply: single-tx-per-call. Steps: 1. Validate every result (status, evidence size + shape). Spec AC-08 / AC-14 reject BEFORE any INSERT — atomic. 2. Idempotency: if any transactions row exists for the scan_id, no-op (spec AC-05). 3. BEGIN tx. 4. Per result: read prior host_rule_state, decide change_kind (first_seen / state_changed / severity_changed / none), INSERT transactions only on change, UPSERT host_rule_state with COALESCE-style last_changed_at preservation. 5. COMMIT. 6. emit finding.persisted per state-change AFTER commit (audit reflects what persisted, not what attempted). On any error: tx.Rollback + emit writer.apply.failed with classified reason (FK / deadlock / oversize / sqlc). source_test.go (AC-12, AC-13) AC-12: walks every .go and .sql file under app/ asserting no scan_baselines / ScanBaseline references — the Python-era baselines table is explicitly dropped. AC-13: AST-parses every internal/transactionlog .go file asserting no database/sql import and no .Exec/.Query/.QueryRow whose SQL arg uses fmt.Sprintf or string concatenation. writer_test.go (AC-01 through AC-11, AC-14, AC-15) 16 sub-tests covering the writer behavior end-to-end against real Postgres: AC-01 pg_stat_database.xact_commit delta < 10 after 50-rule Apply AC-02 N first_seen rows on first scan AC-03 identical rescan = 0 new transactions, check_count++ AC-04 one flip pass→fail = exactly 1 state_changed row AC-05 same scan_id replay = no-op AC-06 FK violation rolls back the whole batch (zero rows persist) AC-07 DELETE hosts with extant transactions fails (ON DELETE RESTRICT) AC-08 non-JSON-object evidence rejected (table-driven over 4 cases) AC-09 finding.persisted emission count = transactions row count AC-10 1000-rule Apply ≤ 2 seconds wall-clock AC-11 50 concurrent Applys against distinct hosts complete AC-14 oversize evidence rejected BEFORE INSERT; writer.apply.failed audit emitted with reason=evidence_oversize AC-15 FK violation emits writer.apply.failed with reason=fk_violation and detail.rule_count_attempted populated Local validation go build ./internal/transactionlog/: clean go vet ./internal/transactionlog/: clean go test -race ./internal/transactionlog/ (unit + integration with real Postgres + migrations 0001-0012): 16 sub-tests pass specter coverage: system-transaction-log-writer 15/15 = 100% Architectural choices worth flagging - Atomicity: validation phase runs BEFORE BEGIN, so oversize-evidence rejection is genuinely zero-INSERT (no rollback needed). - Pre-commit pending audit emissions: scan.completed / finding.persisted fire only AFTER tx.Commit succeeds, so the audit log truly reflects persisted state. - Evidence schema check: minimal "must be JSON object" gate today; full KensaEvidence-schema validation slots into validateResult when the OpenAPI components.schemas.KensaEvidence shape lands. Slice B.1 trunk status B.1a scheduler PR #418 — 15/15 ACs, ready for review B.1b kensa-executor PR #419 — 16/16 ACs, ready for review B.1c transaction-log-writer this PR — 15/15 ACs Total Slice B.1: 46 ACs covered across 3 specs. Ready to move on to B.2 (liveness loop + drift detector) once these merge.
remyluslosius
added a commit
that referenced
this pull request
May 29, 2026
…, 100%) (#420) * feat(transactionlog): B.1c — system-transaction-log-writer (15/15 ACs, 100%) Closes Slice B.1 trunk. Compliance write-on-change persistence for the Kensa-executor pipeline, complete with all 15 acceptance criteria. Spec Promoted system-transaction-log-writer from draft → approved. 15 ACs identical to PR #415's draft. Migration 0011_host_compliance_schedule.sql Copy of B.1a's migration. Identical content; goose treats duplicate identical migrations as no-ops when both B.1a (#418) and this PR merge. Migration 0012_transaction_log.sql - host_rule_state: ONE row per (host, rule). Current state, UPSERTed every Apply. Status CHECK constraint enforces the closed enum (pass/fail/skipped/error). - transactions: append-only state-change log. UNIQUE(scan_id, rule_id) enforces idempotency at the schema level (spec C-04). - Both tables FK to hosts(id) ON DELETE RESTRICT — historical findings outlive their host references (spec C-06). - Indexes: by (host_id, status) for current-fleet queries; by (host_id, rule_id, occurred_at DESC) for point-in-time temporal queries; by scan_id for idempotency check. Audit events Added two new codes to events.yaml: finding.persisted one per transactions row (spec AC-09) writer.apply.failed per Apply-rollback (spec AC-15) Codegen produces audit.FindingPersisted and audit.WriterApplyFailed constants; events.gen.go grew from 96 → 98 events total. internal/transactionlog package types.go ApplyBatch, Result, Status / ChangeKind / FailureReason enums, sentinel errors, MaxEvidenceBytes (256 KiB cap). writer.go Writer.Apply: single-tx-per-call. Steps: 1. Validate every result (status, evidence size + shape). Spec AC-08 / AC-14 reject BEFORE any INSERT — atomic. 2. Idempotency: if any transactions row exists for the scan_id, no-op (spec AC-05). 3. BEGIN tx. 4. Per result: read prior host_rule_state, decide change_kind (first_seen / state_changed / severity_changed / none), INSERT transactions only on change, UPSERT host_rule_state with COALESCE-style last_changed_at preservation. 5. COMMIT. 6. emit finding.persisted per state-change AFTER commit (audit reflects what persisted, not what attempted). On any error: tx.Rollback + emit writer.apply.failed with classified reason (FK / deadlock / oversize / sqlc). source_test.go (AC-12, AC-13) AC-12: walks every .go and .sql file under app/ asserting no scan_baselines / ScanBaseline references — the Python-era baselines table is explicitly dropped. AC-13: AST-parses every internal/transactionlog .go file asserting no database/sql import and no .Exec/.Query/.QueryRow whose SQL arg uses fmt.Sprintf or string concatenation. writer_test.go (AC-01 through AC-11, AC-14, AC-15) 16 sub-tests covering the writer behavior end-to-end against real Postgres: AC-01 pg_stat_database.xact_commit delta < 10 after 50-rule Apply AC-02 N first_seen rows on first scan AC-03 identical rescan = 0 new transactions, check_count++ AC-04 one flip pass→fail = exactly 1 state_changed row AC-05 same scan_id replay = no-op AC-06 FK violation rolls back the whole batch (zero rows persist) AC-07 DELETE hosts with extant transactions fails (ON DELETE RESTRICT) AC-08 non-JSON-object evidence rejected (table-driven over 4 cases) AC-09 finding.persisted emission count = transactions row count AC-10 1000-rule Apply ≤ 2 seconds wall-clock AC-11 50 concurrent Applys against distinct hosts complete AC-14 oversize evidence rejected BEFORE INSERT; writer.apply.failed audit emitted with reason=evidence_oversize AC-15 FK violation emits writer.apply.failed with reason=fk_violation and detail.rule_count_attempted populated Local validation go build ./internal/transactionlog/: clean go vet ./internal/transactionlog/: clean go test -race ./internal/transactionlog/ (unit + integration with real Postgres + migrations 0001-0012): 16 sub-tests pass specter coverage: system-transaction-log-writer 15/15 = 100% Architectural choices worth flagging - Atomicity: validation phase runs BEFORE BEGIN, so oversize-evidence rejection is genuinely zero-INSERT (no rollback needed). - Pre-commit pending audit emissions: scan.completed / finding.persisted fire only AFTER tx.Commit succeeds, so the audit log truly reflects persisted state. - Evidence schema check: minimal "must be JSON object" gate today; full KensaEvidence-schema validation slots into validateResult when the OpenAPI components.schemas.KensaEvidence shape lands. Slice B.1 trunk status B.1a scheduler PR #418 — 15/15 ACs, ready for review B.1b kensa-executor PR #419 — 16/16 ACs, ready for review B.1c transaction-log-writer this PR — 15/15 ACs Total Slice B.1: 46 ACs covered across 3 specs. Ready to move on to B.2 (liveness loop + drift detector) once these merge. * fix(transactionlog): make lint clean — remove dead helper + empty branch + gofmt
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Three draft Specter specs for Slice B's trunk wave — the end-to-end path that takes a scheduled scan through Kensa execution and into the write-on-change transaction log.
All three:
status: draft,tier: 1(100% coverage target once implementation lands).Specs
schedulespolicy, policy_version snapshotted at enqueue, 48h max interval cap, maintenance mode honored, manual scans bypass scheduleKensaEvidenceOpenAPI schema. Drops Python'sscan_baselinestable — the prior transactions row IS the baseline.OpenAPI delta
Scan.policy_versionfield added (snapshotted policy version at enqueue). The existingScan.initiatorobject already carries the scheduler-vs-manual distinction — no new endpoint needed.Coverage gate — expected red
All 33 specs parse cleanly and pass structural checks (
specter check). Coverage report shows the 3 new drafts at 0% — by design, no tests exist yet. Implementation lands in per-component follow-up PRs:This PR is opened as draft. Merging strategy is the open question for review (see below).
Open for review
includes/excludes, priority assignments.a. Merge this spec PR first, accept temporary coverage red on main, then land impl PRs one at a time.
b. Hold this PR as draft; merge each spec+impl together as a single PR per component.
c. Roll everything into one mega-PR (spec + all three impls + tests).
I lean (b) — keeps main green at all times, cost is spec review happens alongside impl review.
policy_versionlocation — currently a top-level Scan field. Could nest underScan.initiator.policy_versionif you prefer.Test plan
specter parse— all 33 specs validspecter check— all constraints referenced, no orphans