
Recovery: Quick cluster state processing can cause relocation finalization to fail and delete both copies #9503

Closed
@bleskes

Description


#8570 added extra protection for the case where the source shard is closed during recovery. However, this introduces a race condition when the target shard has already moved to POST_RECOVERY, and the master processes the shard-started action and activates the target shard before the source node has completed the recovery. In that case the source node closes the source shard, which cancels the recovery. The target node receives the cancellation notification and deletes its local copy, even though that copy is still in POST_RECOVERY.
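To make the ordering concrete, the following minimal, single-threaded Java sketch replays the problematic sequence of events described above. It is a simplified model, not Elasticsearch code. The class `RelocationRaceSketch`, the `replay` method and the `ignoreCancelInPostRecovery` flag are hypothetical, and the source node closing its shard is modeled, as an assumption, as the source copy going away. The flag illustrates one conceivable guard (ignoring a cancellation once the target is in POST_RECOVERY); it is only an illustration and not necessarily how this issue was fixed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the race; names do not correspond to Elasticsearch classes.
public class RelocationRaceSketch {

    // Hypothetical target lifecycle; only POST_RECOVERY is named in the issue.
    enum TargetState { RECOVERING, POST_RECOVERY }

    static final class Outcome {
        final boolean sourceCopyExists;
        final boolean targetCopyExists;
        final List<String> log;

        Outcome(boolean sourceCopyExists, boolean targetCopyExists, List<String> log) {
            this.sourceCopyExists = sourceCopyExists;
            this.targetCopyExists = targetCopyExists;
            this.log = log;
        }
    }

    // Replays the problematic ordering. The hypothetical guard, when enabled,
    // makes the target ignore a cancellation that arrives in POST_RECOVERY.
    static Outcome replay(boolean ignoreCancelInPostRecovery) {
        List<String> log = new ArrayList<>();
        boolean sourceCopy = true;
        boolean targetCopy = true;
        TargetState targetState = TargetState.RECOVERING;

        // 1. Target finishes receiving data, enters POST_RECOVERY and
        //    sends the shard-started action to the master.
        targetState = TargetState.POST_RECOVERY;
        log.add("target enters POST_RECOVERY and sends shard started");

        // 2. Master processes shard started quickly and activates the target
        //    before the source node has completed the recovery.
        log.add("master activates target shard");

        // 3. Source node applies the new cluster state and closes its shard
        //    (assumption: modeled as the source copy going away).
        sourceCopy = false;
        log.add("source node closes source shard");

        // 4. The extra close listener from #8570 cancels the ongoing recovery.
        log.add("close listener cancels recovery");

        // 5. Target receives the cancellation while still in POST_RECOVERY.
        if (ignoreCancelInPostRecovery && targetState == TargetState.POST_RECOVERY) {
            log.add("target ignores cancellation and keeps its copy");
        } else {
            targetCopy = false;
            log.add("target deletes local copy");
        }
        return new Outcome(sourceCopy, targetCopy, log);
    }

    public static void main(String[] args) {
        for (boolean guard : new boolean[] {false, true}) {
            Outcome o = replay(guard);
            System.out.println("guard=" + guard);
            o.log.forEach(line -> System.out.println("  " + line));
            System.out.println("  source copy exists: " + o.sourceCopyExists
                    + ", target copy exists: " + o.targetCopyExists);
        }
    }
}
```

With the guard off, both copies end up gone, matching the issue title; with the hypothetical guard on, the target keeps its copy.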

The extra close listener has not been released yet; it is part of the 1.5 push.

See: http://build-us-00.elasticsearch.org/job/es_core_1x_debian/3474/
