feat(liveness): B.2a — system-liveness-loop (15/15 ACs, 100%)#421

Merged
remyluslosius merged 2 commits into main from feat/slice-b-b2a-liveness on May 29, 2026
@remyluslosius

Summary

Slice B.2a — system-liveness-loop implementation. Opens Slice B.2 (awareness layer). Credential-free TCP-banner probe loop with hysteresis on state transitions, deterministic per-host jitter, and concurrency guard.

Coverage: 15/15 ACs = 100%
Tests: 25 sub-tests under -race (pure logic + probe + integration + source-inspection)
LOC: 1,767 across 11 files

What landed

| Component | Purpose |
| --- | --- |
| app/specs/system/liveness-loop.spec.yaml | New spec (15 ACs, 9 constraints), status: approved |
| Migration 0013 | host_liveness table, FK ON DELETE CASCADE, partial index on unreachable rows |
| internal/liveness/ | Package: types, jitter, probe (raw TCP+banner), service, metrics |

Architectural choices locked

  • Credential-free probe — raw net.DialTimeout + 256-byte banner read; never imports internal/credential or golang.org/x/crypto/ssh. AC-14 source-inspection enforces project-wide
  • TCP-banner over ICMP — port 22 banner is the actual signal that matters; ICMP often blocked
  • Hysteresis — single transient failure doesn't flip status; 2 consecutive failures required (configurable). Avoids alert noise from network blips
  • Deterministic jitter — FNV-1a hash of hostID produces stable ±20% offset; same host always lands at the same place in the cadence, important for diagnosability
  • Per-host concurrency guard: sync.Map of in-flight host IDs; a second concurrent probe of the same host returns ErrProbeInFlight without invoking the probe function
  • Audit on transitions only: host.connectivity.checked fires on first-seen + state flips; steady-state probes don't audit. Keeps audit volume bounded for stable fleets
  • ProbeFunc seam — production wires Probe; tests inject fakes; future Kensa.Reachable() swap is mechanical

ACs satisfied

| AC | Mechanism |
| --- | --- |
| AC-01 | Probe completes within timeout, credential-free |
| AC-02 | TCP refused/timeout produce classified LastErrorType() |
| AC-03 | SSH-2.0- banner → Reachable=true |
| AC-04 | HTTP banner on port 22 → Reachable=false, BannerSeen=true |
| AC-05 | Concurrency guard; second concurrent returns ErrProbeInFlight |
| AC-06 | 100 parallel distinct-host probes race-clean |
| AC-07 | ApplyJitter deterministic + ±20% bounded + no collisions |
| AC-08 | ClampInterval [60s, 60min] floor/ceiling; zero → 5min default |
| AC-09 | First success: unknown → reachable + audit |
| AC-10 | First failure from reachable: counter=1, no transition |
| AC-11 | 2nd consecutive failure: flips unreachable + audit |
| AC-12 | Success after unreachable: flips back + counter reset |
| AC-13 | Migration creates host_liveness; ON DELETE CASCADE verified |
| AC-14 | Source-inspection: no internal/credential import, no crypto/ssh, no ParsePrivateKey |
| AC-15 | Metrics round-trip under concurrent increments |

Local validation

  • go build ./internal/liveness/ — clean
  • go vet ./internal/liveness/ — clean
  • go test -race ./internal/liveness/ — 25 sub-tests pass
  • specter coverage — system-liveness-loop 15/15 = 100%

Slice B.2 status

B.2a liveness loop This PR — 15/15 ACs
B.2b drift detector Pending (next chunk)

Relationship to other PRs

@remyluslosius remyluslosius enabled auto-merge (squash) May 29, 2026 05:32
@remyluslosius remyluslosius force-pushed the feat/slice-b-b2a-liveness branch from fbeecae to d5000e9 Compare May 29, 2026 12:29
remyluslosius added a commit that referenced this pull request May 29, 2026
…100%)

Closes Slice B.2 (awareness layer): B.2a liveness loop + B.2b drift
detector. Pure consumer of the B.1c transaction log; classifies
per-host compliance drift against operator-tunable thresholds.

Spec
  New: app/specs/system/drift-detector.spec.yaml (status: approved).
  14 ACs across 8 constraints.

Migrations 0011 + 0012 (copies from B.1a/B.1c branches, identical content)
  Required for the integration tests — host_rule_state and transactions
  tables are the detector's primary read source. goose treats identical
  duplicate migrations as no-ops when multiple PRs merge.

internal/drift package
  doc.go     Architectural choices: pure consumer of transactions, no
             baselines table (Python-era artifact explicitly dropped),
             percentage-point math, single read transaction.

  types.go   DriftKind closed enum (stable / minor_worsening /
             major_worsening / improvement). Thresholds struct with
             defaults major=10pp, minor=5pp, improvement=5pp.
             ValidateThresholds enforces (0, 100] range + major >= minor.
             DriftReport with per-severity transition counts
             (critical/high/medium/low × became_failing/became_passing).

  classify.go Pure function:
                delta := current - prior
                delta >= ImprovementPP        → DriftImprovement
                delta <= -MajorWorseningPP    → DriftMajorWorsening
                delta <= -MinorWorseningPP    → DriftMinorWorsening
                otherwise                     → DriftStable
              ComplianceScore(passed, failed) excludes skipped from
              denominator (passed / (passed + failed)) × 100.

  service.go Service.DetectForScan reads prior + current scores under
             a single read transaction. Prior score is reconstructed
             by inverting this scan's transactions (state_changed flips,
             first_seen removed from prior). Per-severity counts come
             from the transactions table filtered to (state_changed,
             first_seen). Emission gates on Kind != DriftStable;
             stable scans produce zero audits (no-noise principle).
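
The classify.go rules above translate almost line for line into Go. The sketch below is illustrative rather than the package source: field names follow the commit message, the `DefaultThresholds` variable is a hypothetical name, and returning 0 when nothing passed or failed is an assumption.

```go
package main

import "fmt"

// DriftKind is the closed enum described in types.go.
type DriftKind string

const (
	DriftStable         DriftKind = "stable"
	DriftMinorWorsening DriftKind = "minor_worsening"
	DriftMajorWorsening DriftKind = "major_worsening"
	DriftImprovement    DriftKind = "improvement"
)

// Thresholds are expressed in percentage points (pp).
type Thresholds struct {
	MajorWorseningPP float64
	MinorWorseningPP float64
	ImprovementPP    float64
}

// DefaultThresholds uses the defaults from the commit message.
var DefaultThresholds = Thresholds{MajorWorseningPP: 10, MinorWorseningPP: 5, ImprovementPP: 5}

// Classify is a pure function of (prior, current, thresholds).
func Classify(prior, current float64, t Thresholds) DriftKind {
	delta := current - prior
	switch {
	case delta >= t.ImprovementPP:
		return DriftImprovement
	case delta <= -t.MajorWorseningPP:
		return DriftMajorWorsening
	case delta <= -t.MinorWorseningPP:
		return DriftMinorWorsening
	default:
		return DriftStable
	}
}

// ComplianceScore excludes skipped rules: only passed and failed count.
// Returning 0 for an empty denominator is an assumption of this sketch.
func ComplianceScore(passed, failed int) float64 {
	total := passed + failed
	if total == 0 {
		return 0
	}
	return float64(passed) * 100 / float64(total)
}

func main() {
	fmt.Println(Classify(80, 70, DefaultThresholds)) // major_worsening
	fmt.Println(Classify(80, 76, DefaultThresholds)) // stable
	fmt.Println(Classify(70, 78, DefaultThresholds)) // improvement
	fmt.Println(ComplianceScore(3, 1))               // 75
}
```

The major check has to come before the minor check: a 10pp drop also satisfies the minor condition, so the case order decides which kind is reported.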

ACs covered (14 of 14)
  AC-01  Classify(80, 70) at default thresholds → DriftMajorWorsening
  AC-02  Classify(80, 76) → DriftStable (4pp below minor)
  AC-03  Classify(80, 75) → DriftMinorWorsening (5pp = minor)
  AC-04  Classify(70, 78) → DriftImprovement (8pp gain)
  AC-05  Classify(80, 82) → DriftStable (2pp swing)
  AC-06  Different thresholds produce different kinds for same delta
  AC-07  ValidateThresholds rejects (0, 100] violators and major < minor
  AC-08  First-ever scan (all first_seen) → DriftStable, HasPriorBaseline=false
  AC-09  Per-severity transition counts populated from transactions
  AC-10  Major worsening emits one compliance.drift.detected with
         drift_type="major" + negative score_delta
  AC-11  Stable scans emit zero compliance.drift.detected
  AC-12  Source-inspection: no scan_baselines / ScanBaseline references
  AC-13  DriftKind enum has exactly 4 values; AllDriftKinds lists them
  AC-14  ComplianceScore excludes skipped (denominator = pass + fail)
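
AC-07 can be illustrated with a small validator. This is a sketch under stated assumptions: the error messages and the `inRange` helper are hypothetical, and the improvement threshold is assumed to share the same (0, 100] range as the other two.

```go
package main

import (
	"errors"
	"fmt"
)

// Thresholds mirrors the struct described in types.go (percentage points).
type Thresholds struct {
	MajorWorseningPP float64
	MinorWorseningPP float64
	ImprovementPP    float64
}

// inRange reports whether v lies in the half-open interval (0, 100].
// NaN fails both comparisons and is therefore rejected.
func inRange(v float64) bool { return v > 0 && v <= 100 }

// ValidateThresholds enforces the (0, 100] range and major >= minor.
func ValidateThresholds(t Thresholds) error {
	if !inRange(t.MajorWorseningPP) || !inRange(t.MinorWorseningPP) || !inRange(t.ImprovementPP) {
		return errors.New("drift: thresholds must be in (0, 100]")
	}
	if t.MajorWorseningPP < t.MinorWorseningPP {
		return errors.New("drift: major threshold must be >= minor threshold")
	}
	return nil
}

func main() {
	fmt.Println(ValidateThresholds(Thresholds{10, 5, 5}))  // valid defaults
	fmt.Println(ValidateThresholds(Thresholds{0, 5, 5}))   // zero is outside (0, 100]
	fmt.Println(ValidateThresholds(Thresholds{5, 10, 5}))  // major < minor
}
```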

Local validation
  go build ./internal/drift/         clean
  go vet ./internal/drift/           clean
  go test -race ./internal/drift/    16 sub-tests pass against real
    Postgres + migrations 0001-0012
  specter coverage system-drift-detector  14/14 = 100%

Architectural choices worth flagging
  - Pure-function classifier. Same (prior, current, thresholds) →
    same DriftKind, deterministic and trivially testable without a
    database. The 7 Classify tests run in <1ms total.
  - Read-only transaction wraps prior+current score computation
    so a concurrent writer cannot produce a torn view (spec C-06).
  - DriftReport carries the per-severity transition counts so the
    B.3 alert router can route by severity without re-querying the
    transactions table.
  - Audit emits only on non-stable kinds. Stable scans (delta below
    minor threshold) produce zero audit traffic. Spec C-04.

Slice B.2 status
  B.2a liveness loop      PR #421 — 15/15 ACs, ready for review
  B.2b drift detector     this PR — 14/14 ACs

B.2 trunk done: 29 ACs across 2 specs. Ready for B.3 (event bus +
alert router) once these merge.
@remyluslosius remyluslosius force-pushed the feat/slice-b-b2a-liveness branch from d5000e9 to a125c11 Compare May 29, 2026 12:39
@remyluslosius remyluslosius force-pushed the feat/slice-b-b2a-liveness branch from a125c11 to 9aa5e12 Compare May 29, 2026 12:50
remyluslosius added a commit that referenced this pull request May 29, 2026
* feat(drift): B.2b — system-drift-detector implementation (14/14 ACs, 100%)

* fix(drift): rename Drift-prefixed exports — Kind/Report/TypeForAudit/AllKinds

Revive flagged DriftKind/DriftReport/DriftTypeForAudit/AllDriftKinds
as stutter in drift.X form. Renamed to remove the prefix; enum value
constants (DriftStable, DriftMinorWorsening, etc.) retain their
prefix since those are already package-scoped names per Go style.

No callers outside internal/drift to update — package is consumed
only by its own tests today.
@remyluslosius remyluslosius force-pushed the feat/slice-b-b2a-liveness branch 3 times, most recently from f9d5bf8 to 9d0f551 Compare May 29, 2026 13:14
…, 100%)

Opens Slice B.2 with the host-reachability probe loop. Credential-free
TCP-banner probe with hysteresis on state transition, deterministic
per-host jitter, concurrency guard, and host_liveness persistence.

Spec
  New: app/specs/system/liveness-loop.spec.yaml (status: approved).
  15 ACs across 9 constraints.

Migration 0013
  host_liveness table: PRIMARY KEY (host_id) FK → hosts(id) ON DELETE
    CASCADE; reachability_status with CHECK constraint
    ('reachable'|'unreachable'|'unknown'); consecutive_failures,
    last_response_ms, last_state_change_at, last_error_type, etc.
  Partial index on rows where status='unreachable' so the scheduler's
    dispatch path can quickly skip unreachable hosts.

internal/liveness package
  doc.go     Architectural choices: credential-free, TCP-banner over
             ICMP, hysteresis, jitter, audit on transitions only.

  types.go   Status enum + safety limits (5min default cadence, 60s
             floor, 60min ceiling, 5s probe timeout, 2-failure
             threshold, ±20% jitter, 256-byte max banner).
             ProbeResult struct + LastErrorType() classifier.
             ErrProbeInFlight sentinel.

  jitter.go  ApplyJitter (deterministic FNV-1a hash of hostID,
             produces value in [(1-jitter)×interval, (1+jitter)×interval]).
             ClampInterval (enforces [60s, 60min] safety range,
             defaults zero to 5min).

  probe.go   Probe(ctx, addr, timeout) - net.DialTimeout + 256-byte
             banner read with conn.SetReadDeadline. Reachable=true
             only when banner begins with "SSH-". Non-SSH banner
             (e.g. HTTP server on port 22) → Reachable=false with
             BannerSeen=true. No SSH handshake, no credentials.

  service.go Service struct with ProbeFunc seam (production uses
             Probe; tests inject controllable behavior; future
             Kensa.Reachable() swap is mechanical).
             ProbeHost(ctx, hostID, addr) → applies concurrency guard,
             calls probeFunc, persists via host_liveness UPSERT,
             emits host.connectivity.checked ONLY on state transitions
             (hysteresis: 2 consecutive failures before reachable→
             unreachable; immediate flip on unreachable→reachable).
             Metrics: probe_count, success/failure counts,
             state_transition_count, last_probe_at.

ACs covered (15 of 15)
  AC-01  Probe completes within timeout, credential-free
  AC-02  TCP refused/timeout produce classified failures
  AC-03  SSH-2.0- banner → Reachable=true with BannerSeen=true
  AC-04  Non-SSH banner → Reachable=false (BannerSeen=true)
  AC-05  Per-host concurrency guard → ErrProbeInFlight without
         invoking the probe function
  AC-06  100 parallel probes against distinct hosts race-clean
  AC-07  ApplyJitter deterministic + within ±20% + distinct hosts
         produce distinct values (no collisions across 100 hosts)
  AC-08  ClampInterval clamps [60s, 60min]; zero → 5min default
  AC-09  First success: unknown → reachable, audit emitted
  AC-10  First failure from reachable: counter=1, status stays
         reachable (hysteresis), no audit
  AC-11  N=2 consecutive failures: status flips to unreachable +
         audit
  AC-12  Success after unreachable: flips to reachable, counter=0,
         audit emitted
  AC-13  Migration 0013 creates host_liveness; ON DELETE CASCADE on
         hosts removes liveness row
  AC-14  Source-inspection: no internal/credential imports, no
         crypto/ssh imports, no ParsePrivateKey calls anywhere
  AC-15  Metrics struct round-trips through Snapshot under concurrent
         increments
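
AC-09 through AC-12 describe a small state machine, sketched below as a pure function. The `state` struct, the `next` function, and the behavior for repeated failures from the unknown status are assumptions of this sketch; the 2-failure threshold and the immediate flip back to reachable come from the commit message.

```go
package main

import "fmt"

// Status mirrors the three values allowed by the host_liveness CHECK constraint.
type Status string

const (
	StatusUnknown     Status = "unknown"
	StatusReachable   Status = "reachable"
	StatusUnreachable Status = "unreachable"
)

// state is a hypothetical in-memory view of one host_liveness row.
type state struct {
	Status              Status
	ConsecutiveFailures int
}

// next applies one probe outcome. The bool reports whether the status
// changed, which is when an audit event would be emitted.
func next(cur state, probeOK bool, threshold int) (state, bool) {
	if probeOK {
		// Success flips to reachable immediately and resets the counter.
		return state{Status: StatusReachable}, cur.Status != StatusReachable
	}
	n := state{Status: cur.Status, ConsecutiveFailures: cur.ConsecutiveFailures + 1}
	// A failure only flips the status once the threshold is reached.
	if n.ConsecutiveFailures >= threshold && cur.Status != StatusUnreachable {
		n.Status = StatusUnreachable
		return n, true
	}
	return n, false
}

func main() {
	s := state{Status: StatusUnknown}
	for _, ok := range []bool{true, false, false, true} {
		var changed bool
		s, changed = next(s, ok, 2)
		fmt.Printf("probeOK=%v -> %s failures=%d audit=%v\n", ok, s.Status, s.ConsecutiveFailures, changed)
	}
}
```

Keeping the transition logic free of I/O makes each acceptance criterion checkable without a database or a network connection.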

Local validation
  go build ./internal/liveness/         clean
  go vet ./internal/liveness/           clean
  go test -race ./internal/liveness/    25 sub-tests pass against
    real Postgres + migrations 0001-0013
  specter coverage system-liveness-loop  15/15 = 100%

Architectural choices worth flagging
  - Probe via raw net.DialTimeout + banner read (not crypto/ssh) to
    keep the credential-free guarantee. Source-inspection AC-14 catches
    any future reach for the SSH library.
  - Jitter is deterministic from FNV-1a(hostID) so the same host always
    lands at the same offset within the cadence — important for
    diagnosability.
  - Hysteresis (default 2 consecutive failures before flipping
    reachable→unreachable) avoids alert noise from one-off network
    blips. Configurable via Service struct field.
  - Audit emission on state transitions ONLY. Steady-state probes
    don't audit. Keeps audit volume bounded for stable fleets.

Slice B.2 status
  B.2a liveness loop      this PR — 15/15 ACs
  B.2b drift detector     pending (next chunk)
@remyluslosius remyluslosius force-pushed the feat/slice-b-b2a-liveness branch from 9d0f551 to 0030fd0 Compare May 29, 2026 13:24
@remyluslosius remyluslosius merged commit bcee9d6 into main May 29, 2026
14 checks passed