Overview
Cluster operations perform multiple state changes without a transaction, so a partial failure leaves the cluster in an inconsistent state.
Problem
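Deploy, start, stop, and undeploy each update multiple systems (local state, etcd, and the scheduler). If one update fails after others have succeeded, the systems disagree and the cluster is left in a split state.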
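One common way to get "atomic or compensated" behavior without distributed transactions is the compensating-action (saga) pattern: model each system update as a step with an undo, and on failure undo what already happened in reverse order. A minimal Python sketch, assuming each system can be treated as a key-value store; `Step`, `run_compensated`, `OperationFailed`, and the dicts standing in for local state, etcd, and the scheduler are hypothetical, not the project's actual code:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    do: Callable[[], None]
    undo: Callable[[], None]  # must be idempotent: safe to call even if `do` never ran


class OperationFailed(Exception):
    pass


def run_compensated(steps: List[Step]) -> None:
    """Run steps in order; on any failure, undo them newest-first and re-raise."""
    done: List[Step] = []
    for step in steps:
        try:
            step.do()
        except Exception as exc:
            # The failing step may have partially applied, so compensate it too,
            # then every completed step in reverse order.
            for prev in reversed(done + [step]):
                prev.undo()
            raise OperationFailed(f"step {step.name!r} failed; rolled back") from exc
        done.append(step)


# Usage: a deploy whose scheduler update fails after local and etcd succeeded.
local, etcd, scheduler = {}, {}, {}


def scheduler_down() -> None:
    raise RuntimeError("scheduler unavailable")


deploy = [
    Step("local", lambda: local.update(app="deployed"), lambda: local.pop("app", None)),
    Step("etcd", lambda: etcd.update({"/apps/app": "deployed"}), lambda: etcd.pop("/apps/app", None)),
    Step("scheduler", scheduler_down, lambda: scheduler.pop("app", None)),
]

try:
    run_compensated(deploy)
except OperationFailed as err:
    print(err)
print(local, etcd, scheduler)  # {} {} {}
```

Compensation is weaker than true atomicity: other readers can observe the intermediate state while steps run, and an undo can itself fail, so undo steps need to be retryable.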
Examples
See issues: #251, #252, #253
Acceptance Criteria
- All cluster operations atomic or compensated
- Partial failures trigger rollback
- State always consistent
- Lost leadership handled
- Operations idempotent
- Recovery from any intermediate state
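For idempotency, lost leadership, and recovery from intermediate states, one possible approach is to journal an operation's intent before touching any system, write absolute target states (so reapplying a step is harmless), check leadership before each write, and let a new leader replay pending journal entries. A minimal sketch under those assumptions; `Cluster`, `apply_op`, `recover`, `LostLeadership`, and the `is_leader` callback are hypothetical, and the in-memory dicts stand in for real storage (in practice the journal must be durable, and a stale leader's in-flight writes would also need fencing):

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

STORES = ("local", "etcd", "scheduler")


class LostLeadership(Exception):
    """Raised when this node stops being leader mid-operation (hypothetical)."""


@dataclass
class Cluster:
    # In-memory stand-ins for each system; not a real etcd or scheduler client.
    local: Dict[str, str] = field(default_factory=dict)
    etcd: Dict[str, str] = field(default_factory=dict)
    scheduler: Dict[str, str] = field(default_factory=dict)
    # Pending operations: op id -> (app, target state). Assumed durable in practice.
    journal: Dict[str, Tuple[str, str]] = field(default_factory=dict)


def apply_op(cluster: Cluster, op_id: str, app: str, target: str,
             is_leader: Callable[[], bool]) -> None:
    # Record intent first; re-submitting the same op id keeps the original intent.
    app, target = cluster.journal.setdefault(op_id, (app, target))
    for name in STORES:
        if not is_leader():
            # Stop writing; the journal entry stays pending for the next leader.
            raise LostLeadership(op_id)
        # Absolute write ("set to target"), so repeating it is harmless.
        getattr(cluster, name)[app] = target
    # All systems agree; the operation is complete.
    cluster.journal.pop(op_id, None)


def recover(cluster: Cluster, is_leader: Callable[[], bool]) -> None:
    # Replay every pending operation from whatever intermediate state it reached.
    for op_id, (app, target) in list(cluster.journal.items()):
        apply_op(cluster, op_id, app, target, is_leader)


# Leadership is lost after the first write, leaving split state.
c = Cluster()
leadership = iter([True, False])
try:
    apply_op(c, "op-1", "app", "started", lambda: next(leadership))
except LostLeadership:
    pass
print(c.local, c.etcd, c.journal)

# A new leader replays the journal and converges all systems.
recover(c, lambda: True)
print(c.local, c.etcd, c.scheduler, c.journal)
```

Writing target states rather than toggles is what makes replay safe: the local store already holds `"started"` when recovery runs, and writing it again changes nothing.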
Estimated Impact
~5 major operations, 3-4 weeks of work; CRITICAL for data integrity