Distributed update API #8369
Comments
huge +1
+1
This issue can be avoided if read-modify-index operations are executed on the client side. Closing.
To be clear, for future readers: the issue this plan tries to sidestep still exists; updates put a disproportionate load on the primary. People who encounter it can work around it by doing updates from the client, as Nhat indicated. We're currently re-evaluating the plan as described above, which is why the issue is closed.
@bleskes Thanks for the note for future readers, as I've noticed this in our ES 6.1.1 cluster (on Windows VMs). We have 3 nodes and 2 replicas of every index, and yet, for some reason, all the primary shards for every single index are on one node. The VMs periodically need updates and restarts on a staggered schedule, and we'll see the primaries shift, but they always end up all on one node. I could understand the primaries for a single index not being split, but I can't see why every single index has all of its primary shards on the same node. Perhaps there's a way to control this? At any rate, this node gets very hot when we're doing updates, especially during the nightly background jobs we run. If at least some of the index primaries were on other nodes, we could share the disproportionate load among all 3 nodes, since the updates span a variety of indexes.
@tdoman https://www.elastic.co/blog/elasticsearch-versioning-support explains how to map what the update API does to equivalent get-then-index patterns. If you have any questions about how to do this with the .NET client, I suggest asking on the discuss forums.
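For readers who want to apply that workaround, below is a minimal sketch of a client-side read-modify-index loop built on the internal versioning described in that post. It uses Python's `requests` rather than any particular Elasticsearch client, and it assumes a 6.x-era cluster (where internal versioning is accepted on index requests); the endpoint, index name, document id, and `modify` callback are placeholders for illustration.

```python
# Minimal sketch of the client-side alternative to the update API:
# GET the document, modify it locally, then re-index it conditioned on the
# version we read. A 409 response means a concurrent update won; retry.
# The endpoint, index, and document id below are hypothetical.
import requests

ES = "http://localhost:9200"

def update_client_side(index, doc_id, modify, retries=3):
    for _ in range(retries):
        got = requests.get(f"{ES}/{index}/_doc/{doc_id}")
        got.raise_for_status()
        doc = got.json()
        new_source = modify(doc["_source"])
        put = requests.put(
            f"{ES}/{index}/_doc/{doc_id}",
            params={"version": doc["_version"]},  # internal optimistic concurrency
            json=new_source,
        )
        if put.status_code == 409:  # version conflict: someone else updated it first
            continue
        put.raise_for_status()
        return put.json()
    raise RuntimeError("gave up after repeated version conflicts")

# usage: increment a counter without sending a script to the primary
# update_client_side("myindex", "1", lambda s: {**s, "count": s.get("count", 0) + 1})
```

Because the GET can go to any shard copy and the script logic runs in the client, the primary only does the final (cheap) index operation, which is exactly what spreads the load off the hot node.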
@bleskes thanks, I will review that. Can you tell me where to go to find out why all the primary shards for every index are located on the same node? I have another cluster running the same version of ES where I have only one large index, and there the primary shards for that index are split between two of the three nodes. By the same token, is there a way to ask ES to distribute the primary shards among the nodes in the cluster? I'd assume it would do that by default, but in my case it's not.
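For readers with the same question, one quick way to see where the primaries actually live is the `_cat/shards` API. A minimal sketch, assuming the default localhost endpoint:

```python
# Minimal sketch: list shards and count primaries per node via _cat/shards.
# The localhost endpoint is an assumption; adjust for your cluster.
from collections import Counter

import requests

rows = requests.get(
    "http://localhost:9200/_cat/shards",
    params={"format": "json", "h": "index,shard,prirep,node"},
).json()

# prirep is "p" for a primary shard, "r" for a replica
primaries_per_node = Counter(r["node"] for r in rows if r["prirep"] == "p")
print(primaries_per_node)  # e.g. Counter({'node-1': 15}) -> all primaries on one node
```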
@tdoman as Jason said in another ticket, please open a topic on the forum, where we'd be happy to help. I think there are some basic misconceptions about the role of the primary. We keep GitHub for concrete issues and features.
The difference between a primary and a replica shard should be just a flag which indicates the current role that a shard has. The workload should be the same for all shards in a group, meaning that it shouldn't matter how many primary shards there are on any one node.

The only API which doesn't follow this principle is the `update` API, which can run a potentially heavy script on the primary shard. This can result in hotspots in the cluster, where one node happens to host many primary shards (see #8149).

@bleskes suggested a way to fix this: change the `update` API to perform the `GET` and `script` phases on any primary or replica in a shard group. This would change the characteristics of `update` to be more like a normal distributed get-and-reindex.

This increases the window for conflicting changes, so it is probably worth changing `retry_on_conflict` to default to `1` instead of `0`.

If the user is sure that their updates are light and won't cause hotspots on the primary, then they can opt in to primary-only updates with `?preference=_primary`.
/cc @s1monw
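For concreteness, here is a minimal sketch of the kind of update call this proposal targets, using the 6.x-style REST endpoint via Python's `requests`; the index name, document id, and `count` field are hypothetical. Today the script phase runs on the primary shard; under the proposal, the get-and-script phases could run on any shard copy, with `retry_on_conflict` absorbing the wider conflict window.

```python
# Minimal sketch of an _update request whose script currently runs on the
# primary shard. retry_on_conflict tells Elasticsearch how many times to
# retry the whole get-script-index cycle if the document was changed
# concurrently. Index, id, and the "count" field are hypothetical.
import requests

resp = requests.post(
    "http://localhost:9200/myindex/_doc/1/_update",  # 6.x-style endpoint
    params={"retry_on_conflict": 1},  # the proposed new default
    json={
        "script": {
            "source": "ctx._source.count += params.inc",
            "params": {"inc": 1},
        }
    },
)
resp.raise_for_status()
print(resp.json()["result"])  # "updated" on success
```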