feat: implement operation state event log with JetStream and Postgres projection#279

Merged
l50 merged 5 commits into main from worktree-opstate-eventlog on May 12, 2026

@l50 l50 commented May 12, 2026

Key Changes:

  • Introduced a durable JetStream-backed operation state event log (ARES_OPSTATE)
  • All orchestrator state mutations now emit structured events to the event log
  • Added a Postgres projector consumer that upserts events into the archive in real time
  • Implemented replay and forensics tooling to reconstruct operation state from the event log

Added:

  • OpStateEvent and OpStateEventPayload types to model granular state mutations in ares-core/src/models/op_state_event.rs
  • OpStateRecorder abstraction for event log sinks (NATS, capturing, disabled) in ares-core/src/op_state_log.rs
  • JetStream stream and subject builders for ARES_OPSTATE event log in ares-core/src/nats.rs
  • Postgres projector consumer for streaming event log to database in ares-core/src/persistent_store/projector.rs
  • ares ops replay command and replay logic to build point-in-time state snapshots from the event log (ares-cli/src/cli/ops.rs, ares-cli/src/ops/replay.rs, ares-cli/src/orchestrator/state/replay.rs)
  • Test helpers and capturing logic for asserting event emission in orchestrator state unit tests

Changed:

  • Orchestrator state publishing methods (publish_credential, publish_hash, publish_user, publish_vulnerability, publish_host, etc.) now emit events to the op-state log after successful Redis writes (Phase 2 dual-write)
  • Orchestrator startup now installs a NATS-backed event recorder and can optionally replay state from JetStream instead of Redis (opt-in via ARES_USE_EVENT_LOG_REPLAY=1)
  • Postgres archive is now kept current by the projector consumer, removing the need for manual ops offload batch jobs
  • Updated orchestrator state logic and tests to use the new event log for deduplication, auditing, and forensics

l50 added 5 commits May 12, 2026 14:09
…sh API

**Added:**

- Introduced `op_state_event.rs` defining `OpStateEvent` and `OpStateEventPayload` for operation state mutations with subject hierarchy, JSON serialization, and deduplication via event IDs
- Added export and module wiring for `OpStateEvent` and `OpStateEventPayload` in `models/mod.rs`
- Added new NATS subject prefix and stream constants for operation state events (`OP_STATE_SUBJECT_PREFIX`, `OP_STATE_STREAM`)
- Implemented subject builders `op_state_subject` and `op_state_filter_for_op` for granular or wildcard subscription to operation state events
- Created `StreamSpec::op_state` for configuring the durable `ARES_OPSTATE` stream with 30-day retention, `Limits` policy, and file storage
- Added `NatsBroker::publish_op_state_event` for publishing op-state events with deduplication and optional optimistic concurrency control
- Defined `OpStatePublishError` and error classification for publish failures and concurrency conflicts
- Added comprehensive tests for subject formatting, stream configuration, and subject hierarchy disjointness in `nats.rs`
- Added unit tests for event construction, JSON serialization, and subject suffix logic in `op_state_event.rs`
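
The subject builders listed above can be sketched roughly as follows. This is a minimal illustration, not the code from `nats.rs`: the prefix value `ares.opstate`, the `<prefix>.<op_id>.<kind>` layout, and the `subject_matches` helper are all assumptions made for the example. Only the names `OP_STATE_SUBJECT_PREFIX`, `op_state_subject`, and `op_state_filter_for_op` come from the commit message.

```rust
/// Assumed value; the commit names the constant but does not show it.
pub const OP_STATE_SUBJECT_PREFIX: &str = "ares.opstate";

/// Concrete subject for one event kind within one operation.
/// The three-part layout is a guess made for this sketch.
pub fn op_state_subject(op_id: &str, kind: &str) -> String {
    format!("{}.{}.{}", OP_STATE_SUBJECT_PREFIX, op_id, kind)
}

/// Wildcard filter covering every event of one operation.
/// `>` is the NATS wildcard for one or more trailing tokens.
pub fn op_state_filter_for_op(op_id: &str) -> String {
    format!("{}.{}.>", OP_STATE_SUBJECT_PREFIX, op_id)
}

/// Hypothetical helper: a tiny NATS-style matcher used only to check
/// the sketch (`*` matches one token, `>` matches the rest).
pub fn subject_matches(filter: &str, subject: &str) -> bool {
    let mut f = filter.split('.');
    let mut s = subject.split('.');
    loop {
        match (f.next(), s.next()) {
            (Some(">"), Some(_)) => return true,
            (Some("*"), Some(_)) => continue,
            (Some(a), Some(b)) if a == b => continue,
            (None, None) => return true,
            _ => return false,
        }
    }
}
```

Under these assumptions, `op_state_filter_for_op("op-1")` matches any subject built by `op_state_subject("op-1", ...)` and nothing from another operation, which is the property a per-operation replay or projector subscription would rely on.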

**Changed:**

- Updated `VulnerabilityInfo` to derive `PartialEq` for use in event payloads and tests
- Refactored `NatsBroker::ensure_streams` to include the new op-state stream
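
To illustrate what "deduplication and optional optimistic concurrency control" can mean for a publish call, here is an in-memory stand-in for the stream. It does not use NATS at all; `InMemoryLog`, `PublishOutcome`, and `PublishError` are hypothetical names, and the order of the two checks is a choice made for this sketch rather than a statement about JetStream or `NatsBroker::publish_op_state_event`.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
pub enum PublishOutcome {
    /// The event was appended and received this 1-based sequence number.
    Stored { seq: u64 },
    /// An event with the same ID was already stored; nothing was appended.
    Duplicate,
}

#[derive(Debug, PartialEq)]
pub enum PublishError {
    /// The caller's view of the log tail was stale.
    WrongLastSequence { expected: u64, actual: u64 },
}

/// Hypothetical in-memory log standing in for a durable stream.
#[derive(Default)]
pub struct InMemoryLog {
    seen_ids: HashSet<String>,
    events: Vec<(String, String)>,
}

impl InMemoryLog {
    pub fn publish(
        &mut self,
        event_id: &str,
        payload: &str,
        expected_last_seq: Option<u64>,
    ) -> Result<PublishOutcome, PublishError> {
        // Deduplication: a repeated event ID is acknowledged but not stored.
        if self.seen_ids.contains(event_id) {
            return Ok(PublishOutcome::Duplicate);
        }
        // Optimistic concurrency: reject if the tail moved since the caller read it.
        let actual = self.events.len() as u64;
        if let Some(expected) = expected_last_seq {
            if expected != actual {
                return Err(PublishError::WrongLastSequence { expected, actual });
            }
        }
        self.seen_ids.insert(event_id.to_string());
        self.events.push((event_id.to_string(), payload.to_string()));
        Ok(PublishOutcome::Stored { seq: actual + 1 })
    }

    pub fn len(&self) -> usize {
        self.events.len()
    }
}
```

Separating `Duplicate` (a benign outcome) from `WrongLastSequence` (a conflict the caller must resolve) mirrors the kind of error classification the commit describes for `OpStatePublishError`, though the real variants are not shown in this PR text.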

**Added:**

- Introduced `OpStateRecorder` abstraction to emit operation state events to a NATS-backed JetStream log or in-memory buffer for tests (`ares-core/src/op_state_log.rs`)
- Implemented `emit_op_state` utility to handle event emission and error logging for all publish sites
- Emitted op-state events for credential, hash, user, vulnerability, exploited-vuln, host, and timeline event publishers in orchestrator state modules
- Provided capturing test recorders and comprehensive tests verifying event emission, deduplication, and disabled behavior

**Changed:**

- Updated `SharedState` to hold an `OpStateRecorder` and allow installing or replacing the recorder at runtime (`set_recorder`, `with_recorder`)
- Modified orchestrator modules to dual-write op-state events when a recorder is active, preserving Redis as authoritative for now
- Updated orchestrator state publishing and dedup logic to use new event emission mechanism after successful writes
- Extended test coverage to assert correct event emission across all entity publishers

**Removed:**

- Legacy event logging stubs in orchestrator modules now handled by the new dual-write mechanism
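
The dual-write pattern described in this commit might look roughly like the sketch below. It is a simplification under stated assumptions: `Recorder`, `Event`, and the `Vec<String>` standing in for Redis are hypothetical, and the real `OpStateRecorder` also has a NATS-backed variant that is omitted here. The names `emit_op_state` and `publish_credential` are borrowed from the PR, but their signatures are invented for the example.

```rust
use std::sync::{Arc, Mutex};

#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub op_id: String,
    pub kind: String,
    pub detail: String,
}

/// Hypothetical recorder with only the two test-friendly sinks.
#[derive(Clone)]
pub enum Recorder {
    Disabled,
    Capturing(Arc<Mutex<Vec<Event>>>),
}

impl Recorder {
    /// Returns a capturing recorder plus a handle to inspect what it captured.
    pub fn capturing() -> (Self, Arc<Mutex<Vec<Event>>>) {
        let buf = Arc::new(Mutex::new(Vec::new()));
        (Recorder::Capturing(buf.clone()), buf)
    }

    pub fn record(&self, event: Event) -> Result<(), String> {
        match self {
            Recorder::Disabled => Ok(()),
            Recorder::Capturing(buf) => {
                buf.lock().map_err(|e| e.to_string())?.push(event);
                Ok(())
            }
        }
    }
}

/// Emission failures are logged, never propagated: the primary store
/// stays authoritative during the dual-write phase.
pub fn emit_op_state(recorder: &Recorder, event: Event) {
    if let Err(err) = recorder.record(event) {
        eprintln!("op-state emit failed: {}", err);
    }
}

/// `primary` stands in for the Redis write. Returns false on a duplicate,
/// in which case nothing is written and no event is emitted.
pub fn publish_credential(
    primary: &mut Vec<String>,
    recorder: &Recorder,
    op_id: &str,
    cred: &str,
) -> bool {
    if primary.iter().any(|c| c == cred) {
        return false;
    }
    primary.push(cred.to_string());
    emit_op_state(
        recorder,
        Event { op_id: op_id.to_string(), kind: "credential".to_string(), detail: cred.to_string() },
    );
    true
}
```

Emitting only after the primary write succeeds means the log never records a mutation that the authoritative store rejected, at the cost of possibly missing an event if emission fails.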

**Added:**

- Introduced `OpStateProjector` for syncing `ARES_OPSTATE` events to Postgres, ensuring the archive stays current by tailing JetStream and upserting events into relevant tables
- Added durable JetStream consumer configuration and logic for idempotent event application using existing unique constraints
- Implemented error handling, schema migration, and conditional projector startup based on NATS and database availability
- Exported `OpStateProjector` and `PROJECTOR_CONSUMER_NAME` from `persistent_store` module
- Included unit tests for utility functions and projector consumer stability

**Changed:**

- Updated orchestrator to spawn the Postgres projector consumer when both NATS and a database URL are available
- Added `debug` import to orchestrator for enhanced logging during projector initialization
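
Idempotent application via unique constraints can be modelled without a database. In the sketch below a `BTreeSet` plays the role of a table whose whole row is the unique key, so inserting the same row twice behaves like an insert that does nothing on conflict. `Archive`, `CredentialRow`, and `project` are hypothetical names; the real projector's tables and keys are not shown in this PR text.

```rust
use std::collections::BTreeSet;

/// Hypothetical row; the whole tuple acts as the unique key.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct CredentialRow {
    pub op_id: String,
    pub username: String,
    pub domain: String,
}

/// Stand-in for the Postgres archive.
#[derive(Default)]
pub struct Archive {
    credentials: BTreeSet<CredentialRow>,
}

impl Archive {
    /// Returns true when a new row was inserted and false when the row
    /// already existed, so re-applying an event is harmless.
    pub fn upsert_credential(&mut self, row: CredentialRow) -> bool {
        self.credentials.insert(row)
    }

    pub fn row_count(&self) -> usize {
        self.credentials.len()
    }
}

/// Apply a batch of events and report how many rows were actually new.
/// Redelivered events contribute zero.
pub fn project(archive: &mut Archive, events: &[CredentialRow]) -> usize {
    let mut inserted = 0;
    for event in events {
        if archive.upsert_credential(event.clone()) {
            inserted += 1;
        }
    }
    inserted
}
```

Because a durable consumer may redeliver a message after a crash or a missed acknowledgement, making each apply a no-op on repeat lets the projector acknowledge freely without tracking which events it has already seen.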

**Added:**

- Introduced `SharedState::load_from_event_log` to replay operation state from JetStream event log as an opt-in startup path, controlled by `ARES_USE_EVENT_LOG_REPLAY`
- Added pure function `apply_event_to_state` for mutating state from `OpStateEvent` variants, supporting event log replay and tests
- Created `replay.rs` module with implementation and comprehensive tests for event replay logic
- Registered `replay` module in orchestrator state mod for inclusion in build

**Changed:**

- Updated orchestrator startup to conditionally replay from JetStream event log before falling back to Redis, preserving default behavior unless opt-in environment variable is set
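
A pure apply function plus a fold over the log is the core of this replay path. The following sketch assumes a two-variant `Payload` enum and a set-based `State`, both hypothetical; the real `OpStateEventPayload` and `SharedState` are richer. The `replay_enabled` helper assumes the variable must equal exactly `1`, which matches the PR description but may not be the full rule.

```rust
use std::collections::BTreeSet;

/// Hypothetical, heavily reduced payload type.
#[derive(Debug, Clone, PartialEq)]
pub enum Payload {
    CredentialAdded { username: String },
    HostAdded { ip: String },
}

/// Hypothetical state: sets make repeated events naturally idempotent.
#[derive(Debug, Default, PartialEq)]
pub struct State {
    pub credentials: BTreeSet<String>,
    pub hosts: BTreeSet<String>,
}

/// Pure mutation: no I/O, so it can back both startup replay and tests.
pub fn apply_event_to_state(state: &mut State, payload: &Payload) {
    match payload {
        Payload::CredentialAdded { username } => {
            state.credentials.insert(username.clone());
        }
        Payload::HostAdded { ip } => {
            state.hosts.insert(ip.clone());
        }
    }
}

/// Rebuild state from scratch by applying events in log order.
pub fn replay(events: &[Payload]) -> State {
    let mut state = State::default();
    for event in events {
        apply_event_to_state(&mut state, event);
    }
    state
}

/// Opt-in check. Takes the raw value so it can be tested without
/// touching the process environment.
pub fn replay_enabled(raw: Option<&str>) -> bool {
    raw == Some("1")
}
```

At startup a caller could pass `std::env::var("ARES_USE_EVENT_LOG_REPLAY").ok().as_deref()` to `replay_enabled` and fall back to the Redis path whenever it returns false, which keeps the default behavior unchanged.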

**Added:**

- Introduced `Replay` command to `OpsCommands` for replaying operation state event logs to reconstruct point-in-time snapshots, with options for cutoff by timestamp, event count, and JSON output
- Added `ops/replay.rs` implementing the `ops_replay` function to connect to NATS, fetch and apply event logs, and print human or JSON summaries of reconstructed operation state
- Added `ReplaySnapshot` and `ReplayCutoff` types to `orchestrator/state/replay.rs` for lightweight, serializable operation state snapshots and flexible replay stopping conditions
- Implemented event application and cutoff logic, as well as tests for `ReplaySnapshot` and cutoff behavior

**Changed:**

- Exposed `state` and `state::replay` modules as public to support replay tooling
- Integrated `Replay` command handling into `run_ops` to invoke the new replay functionality when requested

**Removed:**

- Private visibility of the internal replay module, which is now public so the CLI can access it
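
Point-in-time reconstruction with a cutoff can be sketched as a loop that stops at the first event the cutoff rejects. The type names `ReplayCutoff` and `ReplaySnapshot` come from the commit message, but every variant and field below is an assumption, as is treating the timestamp cutoff as inclusive. `LoggedEvent` and `replay_until` are hypothetical.

```rust
/// Assumed variants; the commit only says cutoffs by timestamp and event count exist.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ReplayCutoff {
    None,
    /// Apply events whose timestamp is at or before this value.
    AtTimestamp(u64),
    /// Apply at most this many events.
    AfterEvents(usize),
}

/// Hypothetical log entry: a timestamp plus an event kind.
#[derive(Debug, Clone, PartialEq)]
pub struct LoggedEvent {
    pub timestamp: u64,
    pub kind: String,
}

/// Assumed fields: a lightweight summary rather than full state.
#[derive(Debug, Default, PartialEq)]
pub struct ReplaySnapshot {
    pub events_applied: usize,
    pub credentials: usize,
    pub hosts: usize,
    pub last_timestamp: Option<u64>,
}

impl ReplayCutoff {
    fn allows(&self, applied: usize, event: &LoggedEvent) -> bool {
        match *self {
            ReplayCutoff::None => true,
            ReplayCutoff::AtTimestamp(ts) => event.timestamp <= ts,
            ReplayCutoff::AfterEvents(n) => applied < n,
        }
    }
}

/// Apply events in log order until the cutoff rejects one.
pub fn replay_until(events: &[LoggedEvent], cutoff: ReplayCutoff) -> ReplaySnapshot {
    let mut snap = ReplaySnapshot::default();
    for event in events {
        if !cutoff.allows(snap.events_applied, event) {
            break;
        }
        match event.kind.as_str() {
            "credential" => snap.credentials += 1,
            "host" => snap.hosts += 1,
            _ => {} // unknown kinds still count as applied
        }
        snap.events_applied += 1;
        snap.last_timestamp = Some(event.timestamp);
    }
    snap
}
```

Stopping at the first rejected event, rather than filtering, assumes the log is ordered by time; that holds for an append-only stream read in sequence order.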

codecov Bot commented May 12, 2026

Codecov Report

❌ Patch coverage is 64.32326% with 543 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.85%. Comparing base (2cb9af0) to head (5825087).
⚠️ Report is 4 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ares-core/src/persistent_store/projector.rs | 11.41% | 287 Missing ⚠️ |
| ares-cli/src/orchestrator/state/replay.rs | 76.30% | 109 Missing ⚠️ |
| ares-cli/src/orchestrator/mod.rs | 0.00% | 47 Missing ⚠️ |
| ares-cli/src/ops/replay.rs | 0.00% | 46 Missing ⚠️ |
| ares-core/src/nats.rs | 65.71% | 36 Missing ⚠️ |
| ares-core/src/op_state_log.rs | 90.32% | 9 Missing ⚠️ |
| ares-cli/src/ops/mod.rs | 0.00% | 5 Missing ⚠️ |
| ares-cli/src/orchestrator/state/shared.rs | 76.92% | 3 Missing ⚠️ |
| ares-cli/src/orchestrator/state/publishing/mod.rs | 92.30% | 1 Missing ⚠️ |

Additional details and impacted files


@@            Coverage Diff             @@
##             main     #279      +/-   ##
==========================================
- Coverage   76.06%   75.85%   -0.21%     
==========================================
  Files         387      392       +5     
  Lines       84347    85859    +1512     
==========================================
+ Hits        64157    65131     +974     
- Misses      20190    20728     +538     

| Files with missing lines | Coverage Δ |
|---|---|
| ares-cli/src/orchestrator/state/dedup.rs | 100.00% <100.00%> (ø) |
| ...i/src/orchestrator/state/publishing/credentials.rs | 86.59% <100.00%> (+3.21%) ⬆️ |
| ...-cli/src/orchestrator/state/publishing/entities.rs | 98.29% <100.00%> (+0.33%) ⬆️ |
| ...res-cli/src/orchestrator/state/publishing/hosts.rs | 96.01% <100.00%> (+0.38%) ⬆️ |
| ares-core/src/models/mod.rs | 100.00% <ø> (ø) |
| ares-core/src/models/op_state_event.rs | 100.00% <100.00%> (ø) |
| ares-core/src/models/task.rs | 100.00% <ø> (ø) |
| ares-cli/src/orchestrator/state/publishing/mod.rs | 97.08% <92.30%> (-0.28%) ⬇️ |
| ares-cli/src/orchestrator/state/shared.rs | 96.48% <76.92%> (-1.37%) ⬇️ |
| ares-cli/src/ops/mod.rs | 0.00% <0.00%> (ø) |

... and 6 more
@l50 l50 merged commit f59bb9e into main May 12, 2026
12 checks passed
@l50 l50 deleted the worktree-opstate-eventlog branch May 12, 2026 21:49