PRODUCTION READINESS: Implement transaction semantics for cluster operations #300

@sfloess

Description

Overview

Cluster operations make several state changes without transactional guarantees, so a partial failure leaves the cluster in an inconsistent state.

Problem

Deploy, start, stop, and undeploy each update multiple systems (local + etcd + scheduler). If one update fails after another has already succeeded, the systems are left with split state.
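
One possible way to address this is a compensating (saga-style) runner: each step pairs a state change with an undo action, and a failure triggers the undo actions of the completed steps in reverse order. The sketch below is illustrative only. `Step`, `run_compensated`, and `OperationFailed` are hypothetical names, and the `local` and `etcd` dicts are in-memory stand-ins, not the project's real clients.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    apply: Callable[[], None]       # performs the state change
    compensate: Callable[[], None]  # undoes it after a later step fails


class OperationFailed(Exception):
    """Raised after a failed step has been rolled back."""


def run_compensated(steps: List[Step]) -> None:
    done: List[Step] = []
    for step in steps:
        try:
            step.apply()
        except Exception as exc:
            # Undo the steps that already succeeded, newest first.
            # A real implementation would also have to handle a failing
            # compensate(); this sketch assumes compensation succeeds.
            for prev in reversed(done):
                prev.compensate()
            rolled_back = [s.name for s in done]
            raise OperationFailed(
                f"{step.name} failed; rolled back {rolled_back}"
            ) from exc
        done.append(step)


# Hypothetical in-memory stand-ins for local state and etcd.
local, etcd = {}, {}


def fail_scheduler() -> None:
    raise RuntimeError("scheduler unavailable")


steps = [
    Step("local", lambda: local.update(app="deployed"),
         lambda: local.pop("app", None)),
    Step("etcd", lambda: etcd.update(app="deployed"),
         lambda: etcd.pop("app", None)),
    Step("scheduler", fail_scheduler, lambda: None),
]

try:
    run_compensated(steps)
except OperationFailed as err:
    print(err)
```

In this run the scheduler step fails, so the local and etcd writes are undone and both stand-in stores end up empty again rather than holding a half-applied deploy.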

Examples

See issues: #251, #252, #253

Acceptance Criteria

  • All cluster operations atomic or compensated
  • Partial failures trigger rollback
  • State always consistent
  • Lost leadership handled
  • Operations idempotent
  • Recovery from any intermediate state
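
The idempotency and recovery criteria could be approached with a per-operation journal: each completed step is recorded under an operation ID, so a retry after a crash skips what is already done and resumes from the intermediate state. This is a minimal sketch under stated assumptions. `Journal` and `run_idempotent` are hypothetical names, and the journal here is an in-memory dict, whereas a real one would need to be durable (for example stored in etcd).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Journal:
    """Hypothetical intent log; in-memory here, durable in practice."""
    entries: Dict[str, List[str]] = field(default_factory=dict)

    def completed(self, op_id: str) -> List[str]:
        return self.entries.setdefault(op_id, [])

    def mark(self, op_id: str, step: str) -> None:
        self.completed(op_id).append(step)


def run_idempotent(
    op_id: str,
    steps: List[Tuple[str, Callable[[], None]]],
    journal: Journal,
) -> List[str]:
    """Run named steps in order, skipping those already journaled.

    Returns the names of the steps executed by this call.
    """
    executed: List[str] = []
    for name, action in steps:
        if name in journal.completed(op_id):
            continue
        # A crash between action() and mark() re-runs the action on
        # retry, so each action must itself be safe to repeat.
        action()
        journal.mark(op_id, name)
        executed.append(name)
    return executed


journal = Journal()
calls: List[str] = []
flaky = {"fail": True}


def update_etcd() -> None:
    # Fails once to simulate a partial failure, then succeeds.
    if flaky["fail"]:
        flaky["fail"] = False
        raise RuntimeError("etcd timeout")
    calls.append("etcd")


steps = [
    ("local", lambda: calls.append("local")),
    ("etcd", update_etcd),
    ("scheduler", lambda: calls.append("scheduler")),
]

try:
    run_idempotent("deploy-1", steps, journal)
except RuntimeError:
    pass  # simulated partial failure after the local step

resumed = run_idempotent("deploy-1", steps, journal)  # resumes at etcd
again = run_idempotent("deploy-1", steps, journal)    # nothing left to do
```

Resuming forward is only one recovery policy; the same journal could equally drive a rollback of the recorded steps. Handling lost leadership would additionally need a leadership or fencing check before each step, which this sketch does not cover.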

Estimated Impact

~5 major operations, 3-4 weeks of work, CRITICAL for data integrity

    Labels

    bug (Something isn't working)
