Cluster state and versions - when should we increment which version and how often? #14158

brwe · 2015-10-16T13:36:32Z

While reviewing #14062 I found it somewhat difficult to reason about which version is incremented when in the cluster state. I open this issue to summarize how I think versions work right now and to maybe get some feedback on how this can be handled in a more easy to understand way.

This is the version numbers we maintain:

cluster state version:
- should be incremented only by one each time a cluster state update task is processed and actually yielded any change in the whole cluster state
- currently only changed in the cluster state update thread.
MetaData version:
- should increment each time something in the cluster settings or index meta data changes.
- currently only changed in the cluster state update thread.
IndexMetaData version:
- Should increment whenever some index specific property changes (mapping, settings, etc. expect for routing changes which we track in shard version).
- changed in many places, look at IndexMetaData.Builder.version(..) and where it is called.
RoutingTable version
- should increment each time any shard routing changes
- currently changed in the cluster state update thread and AllocationService.buildChangedResult(..).
Shard version:
- incremented each time something in the routing of a single shard changes (add/remove replica, relocate shard etc.).
- Shard version is stored in the ShardRoutings. It is updated when a single ShardRouting changes during allocation and then copied over to all ShardRoutings from the same copy in IndexRoutingTable.normalizeVersions() once allocation has finished.

Q: Cluster state version and MetaData version are updated only once per cluster state update task but the others are not or at least it is not immediately clear if they can potentially be incremented by > 1. It it OK if IndexMetaData version, RoutingTable version and shard version increment by more than one between cluster states?

While I think I understand why we do versioning now the way we do it I find it cumbersome to read and wonder if there is a cleaner way to maintain these versions.
For example: we only need the version increments before master sends the new cluster state to the other nodes. Can we build a new cluster state without incrementing any version in any of the components and then here check the difference between new and old cluster state and update all versions in one go? This would leave no questions open about when versions are updated and where. Chatted very briefly with @s1monw who thinks this will add complexity and not remove any but I have not yet given up hope and will give it a shot.

Any kind of feedback is more than welcome.

The text was updated successfully, but these errors were encountered:

bleskes · 2015-10-19T15:13:21Z

Thanks @brwe for the great write up. The way I see it the version are primarily used to allow to check for equality (with one big exception, see further). For example, this is used when updating cluster states on the nodes after receiving a new one from the master where we want to reuse local objects when the didn't change. Longer term I tend to say we should go with implementing equals methods where we need to so can avoid using this version variant (as we recently did in shard routing). We should look into how much effort this requires.

In the cluster state level, the version is also used to indicate "supersedes" relationship where a node can process only the cluster state with highest version from it's pending CS queue and it is guaranteed to include all the changes from the previous ones. It doesn't really matter at the moment if we skip a version, though I don't think it happens now.

Last - ShardRouting's version is used for selecting a primary after a full cluster restart, but we have plans to change that as you know :)

elasticmachine · 2018-03-15T14:04:05Z

Pinging @elastic/es-distributed

DaveCTurner · 2018-03-20T10:12:56Z

Closing this as there's no further action needed here. Possibly this will be revisited during the Zen 2 work.

brwe mentioned this issue Oct 16, 2015

Introduce Primary Terms #14062

Closed

clintongormley added discuss :Cluster labels Oct 16, 2015

clintongormley added :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. and removed :Cluster labels Feb 13, 2018

DaveCTurner added the :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) label Mar 15, 2018

DaveCTurner closed this as completed Mar 20, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cluster state and versions - when should we increment which version and how often? #14158

Cluster state and versions - when should we increment which version and how often? #14158

brwe commented Oct 16, 2015

bleskes commented Oct 19, 2015

elasticmachine commented Mar 15, 2018

DaveCTurner commented Mar 20, 2018

Cluster state and versions - when should we increment which version and how often? #14158

Cluster state and versions - when should we increment which version and how often? #14158

Comments

brwe commented Oct 16, 2015

bleskes commented Oct 19, 2015

elasticmachine commented Mar 15, 2018

DaveCTurner commented Mar 20, 2018