fix(packaging): remediation broken under hardened unit — pin Kensa store to writable tree (PKG-3)#673

Merged
remyluslosius merged 1 commit into main from fix/pkg3-kensa-store-path on Jun 25, 2026

@remyluslosius

Problem (regressed in rc.14)

On every packaged install, remediation/rollback fails with 'kensa: remediate path not wired'; the boot log shows 'kensa remediation wiring unavailable' with error='kensa: compose remediation service: …'. Scans still work, which masks the failure.

Root cause: packaging/common/openwatch.service runs with ProtectSystem=strict + ReadWritePaths=/var/lib/openwatch /var/log/openwatch, but (a) sets no WorkingDirectory (systemd defaults it to the read-only /) and (b) never sets OPENWATCH_KENSA_STORE_PATH. So kensaStorePath() (cmd/openwatch/main.go:768) falls back to the relative path .kensa/remediation.db, which resolves against the working directory, and Kensa's OpenSQLite MkdirAll fails on the read-only root. The remediation path composes the full Kensa with a durable SQLite rollback-pre-state store; the scan path composes a store-less engine, which is why scans are unaffected.
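The fallback can be illustrated with a minimal sketch. This is not the code at cmd/openwatch/main.go:768; storePath and resolve are hypothetical helpers, written on the assumption that the real function prefers the environment variable and otherwise returns the relative default named above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// defaultStorePath is the relative fallback named in the root cause.
const defaultStorePath = ".kensa/remediation.db"

// storePath is a hypothetical stand-in for kensaStorePath(): it prefers
// OPENWATCH_KENSA_STORE_PATH and otherwise returns the relative default.
func storePath(getenv func(string) string) string {
	if p := getenv("OPENWATCH_KENSA_STORE_PATH"); p != "" {
		return p
	}
	return defaultStorePath
}

// resolve shows where a path lands for a given working directory.
// Absolute paths are unaffected by the working directory.
func resolve(workingDir, p string) string {
	if filepath.IsAbs(p) {
		return filepath.Clean(p)
	}
	return filepath.Join(workingDir, p)
}

func main() {
	// Variable unset, working directory "/" (the systemd default).
	unset := func(string) string { return "" }
	fmt.Println(resolve("/", storePath(unset)))

	// Variable read from the real process environment.
	fmt.Println(resolve("/", storePath(os.Getenv)))
}
```

With the variable unset and a working directory of /, the store lands directly under the root filesystem, which ProtectSystem=strict leaves read-only. An absolute path under /var/lib/openwatch does not depend on the working directory at all.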

Fix

Environment=OPENWATCH_KENSA_STORE_PATH=/var/lib/openwatch/kensa/remediation.db
WorkingDirectory=/var/lib/openwatch

The store path is inside the writable ReadWritePaths tree (owned by the openwatch user); Kensa's MkdirAll creates the kensa/ subdir. WorkingDirectory is belt-and-suspenders so the bare default can't land on the read-only root if the env var is ever dropped.

Tests

  • New spec C-13 / AC-23 (specs/release/package-build.spec.yaml) + source-inspection regression test TestUnit_KensaStorePathIsWritable asserting the unit sets the store path under /var/lib/openwatch and a writable WorkingDirectory.
  • go test ./packaging/tests/ green; specter check 113 specs; coverage 100%.
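A source-inspection check of this kind could look roughly like the sketch below. It is not the actual TestUnit_KensaStorePathIsWritable; checkUnit and underRoot are hypothetical names, and the sketch assumes the unit file keeps each directive on its own line in the form shown in the Fix section.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// writableRoot is the ReadWritePaths tree the store must live under.
const writableRoot = "/var/lib/openwatch"

// underRoot reports whether p is root itself or a path beneath it.
func underRoot(p, root string) bool {
	return p == root || strings.HasPrefix(p, root+"/")
}

// checkUnit scans unit-file text and returns an error unless both the
// Kensa store path and WorkingDirectory sit under writableRoot.
func checkUnit(unit string) error {
	const storeKey = "Environment=OPENWATCH_KENSA_STORE_PATH="
	const workKey = "WorkingDirectory="

	var store, workDir string
	sc := bufio.NewScanner(strings.NewReader(unit))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		switch {
		case strings.HasPrefix(line, storeKey):
			store = strings.TrimPrefix(line, storeKey)
		case strings.HasPrefix(line, workKey):
			workDir = strings.TrimPrefix(line, workKey)
		}
	}
	if !underRoot(store, writableRoot) {
		return fmt.Errorf("store path %q is not under %s", store, writableRoot)
	}
	if !underRoot(workDir, writableRoot) {
		return fmt.Errorf("WorkingDirectory %q is not under %s", workDir, writableRoot)
	}
	return nil
}

func main() {
	fixed := "[Service]\n" +
		"Environment=OPENWATCH_KENSA_STORE_PATH=/var/lib/openwatch/kensa/remediation.db\n" +
		"WorkingDirectory=/var/lib/openwatch\n"
	fmt.Println("fixed unit:", checkUnit(fixed))
	fmt.Println("old unit:  ", checkUnit("[Service]\nProtectSystem=strict\n"))
}
```

Inspecting the unit text rather than starting the service keeps the check fast and independent of systemd, at the cost of not proving the directory is writable at runtime.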

Operator workaround (no upgrade needed)

Run systemctl edit openwatch, add the drop-in below, then run systemctl restart openwatch:

[Service]
Environment=OPENWATCH_KENSA_STORE_PATH=/var/lib/openwatch/kensa/remediation.db

Worth an rc.15 — it breaks all remediation on hardened packaged installs.

…ee (PKG-3)

The hardened systemd unit set ProtectSystem=strict with
ReadWritePaths=/var/lib/openwatch but set no WorkingDirectory and never set
OPENWATCH_KENSA_STORE_PATH. So kensaStorePath() fell back to
.kensa/remediation.db relative to the working dir, which systemd defaults to
the read-only /. Kensa's OpenSQLite MkdirAll then failed, the remediation
service failed to compose at boot, and every remediation/rollback returned
'kensa: remediate path not wired' while scans (store-less engine) kept working.

Set OPENWATCH_KENSA_STORE_PATH=/var/lib/openwatch/kensa/remediation.db (inside
ReadWritePaths) and WorkingDirectory=/var/lib/openwatch as defense-in-depth.
Spec C-13/AC-23 + a source-inspection regression test backstop it.

Fixes remediation on every hardened packaged install (regressed by rc.14).
@remyluslosius remyluslosius merged commit f139379 into main Jun 25, 2026
21 checks passed
@remyluslosius remyluslosius deleted the fix/pkg3-kensa-store-path branch June 25, 2026 13:30
remyluslosius added a commit that referenced this pull request Jun 25, 2026
…n timeouts) as P1 (#674)

PKG-3: hardened systemd unit leaves the Kensa rollback store unwritable ->
remediation 'not wired' on every packaged install (fix in #673).
AUTH-1: idle + absolute session timeouts not effectively enforced for the
browser (polling slides the window; refresh-cookie re-mints sessions; no
client-side idle timer). Layered fix; client idle timer is slice 1 (in progress).
remyluslosius added a commit that referenced this pull request Jun 25, 2026
* docs: session meta — CHANGELOG, SESSION_LOG (2026-06-25), STATUS.md

Document the 2026-06-25 session's in-flight work (PKG-3 #673, AUTH-1 #675/#678,
notifications Slice 1 #679, avg-compliance #676): CHANGELOG [Unreleased]
entries, a SESSION_LOG handoff entry, and a new STATUS.md one-page snapshot.
BACKLOG findings from the security review land in a follow-up commit.

* docs(guides): truthfulness fixes from the 2026-06-25 audit + BACKLOG DOC-3

High-impact, verified guide defects fixed:
- UPGRADE/QUICKSTART/ENVIRONMENT/MONITORING: --config is a GLOBAL flag (Go flag
  parsing stops at the first non-flag arg), so 'openwatch migrate --config X'
  silently ignored --config. Moved --config before the subcommand everywhere.
- COMPLIANCE_CONTROLS: removed the invented 'analyst' role + 'three-tier role
  model' (real: 5 roles — viewer/auditor/ops_lead/security_admin/admin) and the
  fabricated '100/min per user, 1000/min per IP' rate-limit (real: per-IP
  sliding window on the auth endpoints).
- API_GUIDE: the 'not yet in the API' section was almost entirely false (scans,
  remediation, exceptions, posture/drift, audit export, rule browser all ship);
  rewrote it to list the live surface + only the genuinely-absent /metrics and
  /security-info. Added the missing ops_lead role to the role table.
- Version sweep rc.13 -> rc.14; bumped Last Updated to 2026-06-25 on edited guides.
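The --config behaviour in the first item follows from the Go standard library's flag package, which stops parsing at the first non-flag argument. A minimal sketch, assuming a single global flag set (parseGlobal is a hypothetical helper; how openwatch dispatches its subcommands is not shown here):

```go
package main

import (
	"flag"
	"fmt"
)

// parseGlobal parses args with one global --config flag and returns the
// flag value plus whatever the flag package left unparsed.
func parseGlobal(args []string) (string, []string, error) {
	fs := flag.NewFlagSet("openwatch", flag.ContinueOnError)
	config := fs.String("config", "", "path to the config file")
	if err := fs.Parse(args); err != nil {
		return "", nil, err
	}
	return *config, fs.Args(), nil
}

func main() {
	// Flag after the subcommand: parsing stops at "migrate", so the
	// --config value is never read by the global flag set.
	cfg, rest, _ := parseGlobal([]string{"migrate", "--config", "X"})
	fmt.Printf("config=%q rest=%q\n", cfg, rest)

	// Flag before the subcommand: the value is picked up.
	cfg, rest, _ = parseGlobal([]string{"--config", "X", "migrate"})
	fmt.Printf("config=%q rest=%q\n", cfg, rest)
}
```

In the first call the flag value stays empty and all three arguments are returned as positional; in the second the value is "X" and only "migrate" remains.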

BACKLOG DOC-3 captures the remaining audit items (SCANNING dead-endpoint
appendix, USER_ROLES matrix, INSTALLATION PG-dep, DATABASE_MIGRATIONS fake
output, style sweep) and flags the audit's '538->539' suggestion as a FALSE
POSITIVE — rc.14 bundles Kensa v0.6.0 = 538 (the guides correctly say 538).

* docs(backlog): note the gating TestApply_1000Rules_Under2Seconds perf flake

It hard-asserts 2s and gated #676's CI under -race (passed on rerun); it missed
the 2026-06-21 perftest.Budgetf() migration the other perf tests got.
remyluslosius added a commit that referenced this pull request Jun 25, 2026
Bump version.env to 0.2.0-rc.15 and cut the CHANGELOG [Unreleased] accumulator
into a dated [0.2.0-rc.15] section covering everything landed since rc.14:
- PKG-3 remediation store-path fix (production-breaking) (#673)
- AUTH-1: client idle timer (#675) + absolute-timeout ceiling +
  slide-on-user-activity (#678)
- Notifications Slice 1: durable change-driven bell (#679)
- Avg-compliance parity /hosts <-> /dashboard (#676)

Local: changelog + version-consistency + fips + package-build tests pass.