Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve handling of failed primary replica handling #6816

Closed
wants to merge 2 commits into from

Conversation

kimchy
Copy link
Member

@kimchy kimchy commented Jul 10, 2014

Out of #6808, we improved the handling of a primary failing to make sure replicas that are initializing are properly failed as well. After double checking it, it has 2 problems, the first, if the same shard routing is failed again, there is no protection that we don't apply the failure (which we do in failed shard cases), and the other was that we already tried to handle it (wrongly) in the elect primary method.
This change fixes the handling to work correctly in the elect primary method, and adds unit tests to verify the behavior

Out of elastic#6808, we improved the handling of a primary failing to make sure replicas that are initializing are properly failed as well. After double checking it, it has 2 problems, the first, if the same shard routing is failed again, there is no protection that we don't apply the failure (which we do in failed shard cases), and the other was that we already tried to handle it (wrongly) in the elect primary method.
This change fixes the handling to work correctly in the elect primary method, and adds unit tests to verify the behavior
closes elastic#6816
@@ -310,6 +310,12 @@ public boolean allReplicasActive(ShardRouting shardRouting) {
for (RoutingNode routingNode : this) {
shards.addAll(routingNode.shardsWithState(state));
}
for (ShardRoutingState s : state) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move them into a utils class in the test package now we should have done that already :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can, but I will do it in another pull request, cause that one will be much bigger

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

@s1monw
Copy link
Contributor

s1monw commented Jul 10, 2014

left some cosmetic comments. This looks much better testing wise - I love the unit test :)

@s1monw s1monw removed the review label Jul 10, 2014
@s1monw
Copy link
Contributor

s1monw commented Jul 10, 2014

LGTM

@kimchy kimchy closed this in 75ed24f Jul 10, 2014
kimchy added a commit that referenced this pull request Jul 10, 2014
Out of #6808, we improved the handling of a primary failing to make sure replicas that are initializing are properly failed as well. After double checking it, it has 2 problems, the first, if the same shard routing is failed again, there is no protection that we don't apply the failure (which we do in failed shard cases), and the other was that we already tried to handle it (wrongly) in the elect primary method.
This change fixes the handling to work correctly in the elect primary method, and adds unit tests to verify the behavior
closes #6816
@kimchy kimchy deleted the better_primary_failure_handling branch July 10, 2014 16:52
@clintongormley clintongormley changed the title Improve handling of failed primary replica handling Resiliency: Improve handling of failed primary replica handling Jul 16, 2014
@clintongormley clintongormley changed the title Resiliency: Improve handling of failed primary replica handling Improve handling of failed primary replica handling Jun 7, 2015
@lcawl lcawl added :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. and removed :Allocation labels Feb 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. resiliency v1.3.0 v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants