Changes in unassigned info and version might not be transferred as part of cluster state diffs #12387
Conversation
@bleskes could you take a look when you have a chance? Somehow, adding version to the equals check triggers retry failures in RecoveryPercolatorTests. Any idea why?
elastic#12242 introduced a unique id for an assignment of a shard to a node. We should use these ids to drive the decisions made by IndicesClusterStateService when processing the new cluster state sent by the master. If the local shard has a different allocation id than the new cluster state, the shard will be removed and a new one will be created. This fixes a couple of subtle bugs, most notably that a node previously got confused if an incoming cluster state had a newly allocated shard in the initializing state while the local copy was started (which can happen if cluster state updates are bulk processed). In that case, the node would previously have re-used the local copy instead of initializing a new one. Also, a set of utility methods was introduced on ShardRouting to do various types of matching with other shard routings, giving control over what exactly should be matched (same shard id, same allocation id, all but version and shard info, etc.). This is useful here, but also prepares the ground for the change needed in elastic#12387 (making ShardRouting.equals strict and perform exact equality).
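The allocation-id-driven matching described above can be sketched roughly as follows. This is a hypothetical, simplified class, not the actual Elasticsearch ShardRouting; the method names mirror the kinds of matching the comment describes, but the real signatures may differ:

```java
import java.util.Objects;

// Simplified stand-in for ShardRouting: decisions key off the allocation id
// (the unique id of one shard-to-node assignment), not just the shard id.
final class ShardRoutingSketch {
    final String index;
    final int shardId;
    final String allocationId; // unique per assignment of this shard to a node
    final long version;

    ShardRoutingSketch(String index, int shardId, String allocationId, long version) {
        this.index = index;
        this.shardId = shardId;
        this.allocationId = allocationId;
        this.version = version;
    }

    /** Same logical shard (index + shard id), regardless of which assignment it is. */
    boolean isSameShard(ShardRoutingSketch other) {
        return index.equals(other.index) && shardId == other.shardId;
    }

    /** Same assignment of the shard to a node: the allocation ids must also match. */
    boolean isSameAllocation(ShardRoutingSketch other) {
        return isSameShard(other) && Objects.equals(allocationId, other.allocationId);
    }
}
```

Under this sketch, a node processing an incoming cluster state would keep its local shard only when `isSameAllocation` holds; a matching shard id with a different allocation id means the local copy belongs to an old assignment and must be removed and re-created.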
I had a look. I like the change (didn't do a proper review). The source of trouble was IndicesClusterStateService being confused by the new equals semantics. I fixed this in #12397. Once it's in we can try again.
Closes elastic#12397
Force-pushed from 1b23371 to 323f106
@dakrone rebased, force pushed, and this can use a review
- public static boolean mapsEqualIgnoringArrayOrder(Map<String, Object> first, Map<String, Object> second) {
+ public static String mapsEqualIgnoringArrayOrder(String path, Map<String, Object> first, Map<String, Object> second) {
This is kind of confusing. Instead of trying to use this method as both a comparing method and an explaining method, maybe create an additional method called differenceBetweenMaps (or something) that returns the explanation, and leave this one as a boolean method? Otherwise the name doesn't really match its return type, and in the comparison it's confusing to be using a string as the expected null value instead of the built-in JUnit explanation parameter on all the assert* methods.
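The suggested split could look roughly like this. A minimal sketch with a hypothetical class name, and with the array-order-insensitive comparison the real helper performs omitted for brevity (only flat key/value differences are reported here):

```java
import java.util.Map;
import java.util.Objects;

final class MapDiffSketch {
    /** Returns a human-readable description of the first difference, or null if the maps match. */
    static String differenceBetweenMaps(String path, Map<String, Object> first, Map<String, Object> second) {
        for (Map.Entry<String, Object> e : first.entrySet()) {
            String childPath = path + "/" + e.getKey();
            if (!second.containsKey(e.getKey())) {
                return "missing key at " + childPath;
            }
            if (!Objects.equals(e.getValue(), second.get(e.getKey()))) {
                return "values differ at " + childPath;
            }
        }
        if (!first.keySet().equals(second.keySet())) {
            return "extra keys at " + path;
        }
        return null; // no difference found
    }

    /** Boolean form, so the method name matches its return type again. */
    static boolean mapsEqualIgnoringArrayOrder(Map<String, Object> first, Map<String, Object> second) {
        return differenceBetweenMaps("", first, second) == null;
    }
}
```

A test could then assert the boolean form, and on failure print the explanation from differenceBetweenMaps via JUnit's message parameter, rather than asserting that a string equals null.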
Left a couple of comments about the testing methods, but the actual diff code looks good to me.
LGTM
…rt of cluster state diffs The equalTo logic of ShardRouting doesn't take version and unassignedInfo into account when comparing shard routings. Since cluster state diffs rely on equals to detect the changes that need to be sent out, this omission might lead to changes not being properly propagated to other nodes in the cluster. Closes elastic#12387
Force-pushed from a1efe96 to 3354a81
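The equals bug described in this thread can be reduced to a minimal sketch. This is a hypothetical simplified class, not the real ShardRouting: if version and unassignedInfo don't participate in equals, a version-only change compares equal to the old entry, so a diff-based cluster state update would consider it unchanged and never send it; making equals strict fixes that:

```java
import java.util.Objects;

// Simplified routing entry whose equals is strict: version and
// unassignedInfo participate, so diffs can detect version-only changes.
final class RoutingEntry {
    final String allocationId;
    final long version;
    final String unassignedInfo; // stands in for the real UnassignedInfo object

    RoutingEntry(String allocationId, long version, String unassignedInfo) {
        this.allocationId = allocationId;
        this.version = version;
        this.unassignedInfo = unassignedInfo;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof RoutingEntry)) return false;
        RoutingEntry other = (RoutingEntry) o;
        return Objects.equals(allocationId, other.allocationId)
                && version == other.version
                && Objects.equals(unassignedInfo, other.unassignedInfo);
    }

    @Override
    public int hashCode() {
        return Objects.hash(allocationId, version, unassignedInfo);
    }
}
```

With the old, lenient semantics the two entries below would have compared equal and the version bump would have been silently dropped from the diff; with strict equality the change is visible and gets propagated.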