WAL compaction for long-running orchestrations #563

@chernistry

Description

For orchestrations that run 3+ hours with thousands of ticks, the WAL grows without bound. Implement WAL compaction that checkpoints committed entries, truncates the WAL to only uncommitted entries, and preserves a summary of the compacted entries for audit. This is distinct from ORCH-007 (WAL replay) and ORCH-019 (checkpoint/restore): the focus here is WAL size management.
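The three steps above could be sketched roughly as follows. This is only an illustration, not the actual `wal.py` API: the function name `compact_wal`, the JSON-lines layout, and the `seq`, `committed`, and `kind` fields are all assumptions, and the sketch assumes committed entries are already reflected in a checkpoint before they are dropped.

```python
import json
import os
import tempfile
from collections import Counter
from pathlib import Path


def compact_wal(wal_path: Path, summary_path: Path) -> dict:
    """Keep only uncommitted entries in wal_path; append an audit summary.

    Hypothetical format: one JSON object per line with an integer `seq`,
    a boolean `committed`, and a string `kind`.
    """
    kept, compacted = [], []
    with wal_path.open("r", encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            entry = json.loads(line)
            (compacted if entry.get("committed") else kept).append(entry)

    summary = {
        "compacted_count": len(compacted),
        "first_seq": min((e["seq"] for e in compacted), default=None),
        "last_seq": max((e["seq"] for e in compacted), default=None),
        "kinds": dict(Counter(e.get("kind", "unknown") for e in compacted)),
    }
    if not compacted:
        return summary  # nothing to drop, leave the WAL untouched

    # Persist the audit summary before any entry is removed.
    with summary_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(summary, sort_keys=True) + "\n")
        fh.flush()
        os.fsync(fh.fileno())

    # Write survivors to a temp file in the same directory, then swap it in
    # atomically so a crash never leaves a half-written WAL.
    fd, tmp_name = tempfile.mkstemp(
        dir=wal_path.parent, prefix=wal_path.name, suffix=".tmp"
    )
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        for entry in kept:
            fh.write(json.dumps(entry) + "\n")
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp_name, wal_path)
    return summary
```

Writing the summary before the swap means a crash between the two steps can leave a summary for entries that are still in the WAL, but never the reverse. A real implementation would also need to coordinate with any writer that is appending to the WAL during compaction.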

Metadata

Field       Value
Priority    P1
Scope       medium
Complexity  medium
Role        architect

Implementation Suggestions

  • Integrate with the existing merge queue (src/bernstein/core/merge_queue.py) and worktree management
  • Extend the audit trail (src/bernstein/core/audit.py) and policy engine (src/bernstein/core/policy_engine.py)
  • Build on the WAL (write-ahead log) in .sdd/runtime/ for replay capability
  • Schedule cleanup operations during low-activity windows detected by the adaptive tick
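One possible shape for the scheduling decision in the last suggestion is sketched below. It assumes the adaptive tick exposes its current interval and that a longer interval signals low activity. `CompactionPolicy`, `should_compact`, and every threshold value are hypothetical names and numbers chosen for illustration, not existing project code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompactionPolicy:
    """Illustrative thresholds; real values would need tuning."""

    min_wal_bytes: int = 8 * 1024 * 1024     # below this, compaction is not worth it
    max_wal_bytes: int = 256 * 1024 * 1024   # above this, compact even when busy
    idle_tick_seconds: float = 5.0           # tick interval treated as low activity


def should_compact(
    wal_bytes: int,
    tick_interval_s: float,
    policy: CompactionPolicy = CompactionPolicy(),
) -> bool:
    """Decide whether to run compaction on this tick."""
    if wal_bytes >= policy.max_wal_bytes:
        return True   # hard cap: size wins over activity level
    if wal_bytes < policy.min_wal_bytes:
        return False  # too small to bother
    # In between, wait for a low-activity window.
    return tick_interval_s >= policy.idle_tick_seconds
```

The hard cap is there so that an orchestration that never goes idle still has its WAL bounded, which is the failure mode this issue describes.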

Relevant Files

  • src/bernstein/core/orchestrator.py
  • src/bernstein/core/planner.py
  • src/bernstein/core/plan_loader.py
  • src/bernstein/cli/
  • src/bernstein/core/wal.py

Backlog: road-047-wal-compaction-for-long-running-orchestrations.yaml

Labels

P1 (High priority), enhancement (New feature or request), help wanted (Extra attention is needed), python (Python), reliability (Server resilience and fault tolerance), roadmap (Roadmap feature)
