Restart handler restarts all nodes in parallel instead of rolling #121

@Oddly

Bug

When a configuration change triggers the Restart Elasticsearch handler (e.g. adding LDAP SSL settings to elasticsearch.yml), all cluster nodes are restarted simultaneously. This causes full cluster downtime.

The handler in roles/elasticsearch/handlers/main.yml includes restart_and_verify_elasticsearch.yml, which calls the generic restart_and_verify_service.yml. That task file simply runs systemctl restart elasticsearch on all nodes in parallel, with no shard allocation management and no wait for green status between nodes.

Expected behavior

Config-triggered restarts should use a rolling restart pattern:

  1. Disable shard allocation (cluster.routing.allocation.enable: none)
  2. Perform a synced flush (if applicable)
  3. Restart one node
  4. Wait for the node to rejoin and cluster to reach yellow/green
  5. Re-enable shard allocation
  6. Wait for green status
  7. Proceed to next node
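
As a rough illustration of the steps above, here is a minimal Ansible sketch of a per-node restart sequence. It is not the existing elasticsearch-rolling-upgrade.yml logic. The file name and the es_api_url variable are hypothetical, the tasks assume the play runs one host at a time (for example with serial: 1), and they assume the Elasticsearch HTTP API is reachable without authentication or TLS. A plain flush is used here because synced flush was removed in Elasticsearch 8.x.

```yaml
# Hypothetical task file, e.g. rolling_restart_elasticsearch.yml.
# Assumes: one host at a time (serial: 1) and an assumed variable
# es_api_url, e.g. "http://localhost:9200", with no auth or TLS.

- name: Disable shard allocation
  ansible.builtin.uri:
    url: "{{ es_api_url }}/_cluster/settings"
    method: PUT
    body_format: json
    body:
      persistent:
        cluster.routing.allocation.enable: "none"

- name: Flush indices before the restart (plain flush, not synced flush)
  ansible.builtin.uri:
    url: "{{ es_api_url }}/_flush"
    method: POST

- name: Restart elasticsearch on this node
  ansible.builtin.systemd:
    name: elasticsearch
    state: restarted

- name: Wait for the node to respond and the cluster to reach yellow or green
  ansible.builtin.uri:
    url: "{{ es_api_url }}/_cluster/health"
  register: es_health
  # Retries also cover connection errors while the service is starting.
  until: es_health.json is defined and es_health.json.status in ['yellow', 'green']
  retries: 60
  delay: 5

- name: Re-enable shard allocation (null resets the setting to its default)
  ansible.builtin.uri:
    url: "{{ es_api_url }}/_cluster/settings"
    method: PUT
    body_format: json
    body:
      persistent:
        cluster.routing.allocation.enable: null

- name: Wait for green status before moving to the next node
  ansible.builtin.uri:
    url: "{{ es_api_url }}/_cluster/health"
  register: es_health
  until: es_health.json is defined and es_health.json.status == 'green'
  retries: 120
  delay: 5
```

The retry counts and delays are placeholders and would need tuning to the cluster's recovery time. A secured cluster would also need credentials and certificate options on each uri task.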

Current behavior

All 9 nodes restart simultaneously:

RUNNING HANDLER [restart_and_verify_service | Restart elasticsearch service]
changed: [acc-elastic-master-001.rinis.cloud]
changed: [acc-elastic-master-002.rinis.cloud]
changed: [acc-elastic-master-003.rinis.cloud]
changed: [acc-elastic-ingest-001.rinis.cloud]
changed: [acc-elastic-ingest-002.rinis.cloud]
changed: [acc-elastic-warm-001.rinis.cloud]
changed: [acc-elastic-warm-002.rinis.cloud]
changed: [acc-elastic-hot-001.rinis.cloud]
changed: [acc-elastic-hot-002.rinis.cloud]

Note

The rolling upgrade flow (elasticsearch-rolling-upgrade.yml) already implements proper rolling restart logic with shard allocation management. The config-change handler should reuse this pattern.

Workaround

Set serial: 1 on the Elasticsearch cluster play to force sequential execution. This restarts one node at a time but still skips the shard allocation safety steps.
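
A minimal sketch of the workaround, assuming a play that applies the elasticsearch role. The play name and the host group name are hypothetical and should be replaced with the ones used in the actual playbook.

```yaml
# Hypothetical play; "elasticsearch" as a host group name is an assumption.
- name: Configure Elasticsearch cluster
  hosts: elasticsearch
  serial: 1          # process one host per batch
  become: true
  roles:
    - elasticsearch  # role from roles/elasticsearch
```

With serial, Ansible runs the play to completion for each batch, including notified handlers, so the restart handler fires on one node before the next node is touched. Nothing here waits for the cluster to recover between nodes.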
