
=cls #17846 Use CRDTs instead of PersistentActor to remember the state of the ShardCoordinator #17871

Merged: 1 commit merged into akka:master on Aug 20, 2015

Conversation

@oseval commented Jun 30, 2015

This is not a real PR yet; it only shows my attempt to change the cluster-sharding dependency from akka-persistence to akka-ddata. I will update it incrementally as I find time for this.

Idea:

  1. The distributed state of the ShardRegions is all we need in cluster sharding.
  2. That distributed state can be handled by the akka-ddata module.
  3. Inconsistency is only possible if we allow two ShardRegions to allocate the same Shard simultaneously.
  4. We can avoid that inconsistency with a singleton Coordinator that grants a ShardRegion permission to allocate a Shard: R1 asks C for permission to allocate S1, C grants it, R1 allocates S1 and then updates DData plus its inner state. The Coordinator receives these changes and updates its own inner state (see the sketch after this list).
  5. If the Coordinator fails, it can be recovered from DData, which is up to date on all nodes in the cluster.
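A rough sketch of the handshake described in point 4, with hypothetical message names (these are illustrative only and do not match the actual cluster-sharding protocol types):

// Hypothetical protocol messages for the allocation handshake; names are
// illustrative only, not the real cluster-sharding internals.
import akka.actor.ActorRef

final case class RequestShardAllocation(shardId: String, region: ActorRef) // R1 -> C
final case class ShardAllocationGranted(shardId: String)                   // C  -> R1
final case class ShardAllocated(shardId: String, region: ActorRef)         // R1 updates DData; C observes the change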

Comments are welcome!

It is quite possible that I am missing something in this concept, so please correct me.

And I am sorry for my English.

@akka-ci commented Jun 30, 2015

Can one of the repo owners verify this patch?

@oseval (Author) commented Jul 2, 2015

@patriknw I have some questions about cluster sharding; I need to understand the underlying idea. Only two questions for now:

  1. Is it critical that the Coordinator reallocates the shards of terminated regions immediately? Could a shard from a terminated region instead be allocated only when the next message is sent to that shard?
  2. Would it be acceptable for a terminating region to remove itself from the distributed state by sending a message to the replicator in postStop?

@patriknw (Member) commented Jul 2, 2015

Hi @SmLin,
I have not had time to look at your code yet; I will try to get to it tomorrow.

There are two things that are persisted.

  1. Coordinator state, the allocations of all shards
  2. The started entities within each shard, see the Remember Entities section in the documentation.

Let's try to figure out the coordinator state first.
For that, the allocation is always done on the first message to a shard, also after rebalance and crash.

It is possible to send a message to the replicator in postStop, but crashes must also be handled, and if the node is removed from the cluster the replicator might not be alive any more.

@oseval (Author) commented Jul 3, 2015

@patriknw The PR is not ready for review yet, so please don't waste your time on it. I will notify you when it is ready.

@patriknw (Member) commented Jul 3, 2015

@SmLin My thinking around the design.

I think the coordinator should remain a singleton, because it is important with a central decision maker of where the shards are to be allocated. We could start with just saving the coordinator state in a LWWRegister, using WriteMajority and ReadMajority.

The LWWRegister is pretty safe here since the singleton failover is not instant, but we can later create a more customized data type that makes use of akka.cluster.Member.isOlderThan, we want to use the data from the newest singleton. That can be rather useful for singletons in general.

That has the small consistency problems that I sketched in the issue, but perhaps I'm looking too much for the bad scenarios, and perhaps we can make it more safe somehow.

The state of the PersistentShard could be stored in one ORSet per shard id.
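A minimal sketch of what those two ideas could look like with the akka.cluster.ddata Replicator API; the key names, state shape and timeouts here are illustrative assumptions, not the actual implementation:

import scala.concurrent.duration._
import akka.actor.{ ActorRef, ActorSystem }
import akka.cluster.Cluster
import akka.cluster.ddata._
import akka.cluster.ddata.Replicator._

object DDataCoordinatorSketch {
  // Illustrative key and state shape; the real PR names and shapes these differently.
  final case class CoordinatorState(shards: Map[String, ActorRef])
  val CoordinatorStateKey = LWWRegisterKey[CoordinatorState]("ShardCoordinatorState")
  def shardKey(shardId: String) = ORSetKey[String]("shard-" + shardId)

  def example(system: ActorSystem, state: CoordinatorState, shardId: String, entityId: String): Unit = {
    implicit val node = Cluster(system)
    val replicator = DistributedData(system).replicator

    // Coordinator state kept in a single LWWRegister, written with majority consistency.
    replicator ! Update(CoordinatorStateKey, LWWRegister(state), WriteMajority(5.seconds))(_.withValue(state))

    // Remembered entities of one shard kept in an ORSet per shard id.
    replicator ! Update(shardKey(shardId), ORSet.empty[String], WriteMajority(5.seconds))(_ + entityId)
  }
}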

I will be on vacation the next 4 weeks, so if I'm unresponsive here that is not because I'm not interested. Thanks for working on this.

@oseval (Author) commented Jul 3, 2015

@patriknw Thank you too.
I have created a first version of the code. It is mostly incomplete and does not work yet, but I wrote a lot of comments everywhere. I have not worked on PersistentShard, because that is much more work in one go - I hope it can be done step by step. I will wait for your response.
Enjoy your holiday!
I will be working on this during my vacation next week :)

case g @ GetSuccess(RegionsKey, _) =>
  val regions: Map[ActorRef, Set[(ShardId, ActorRef)]] = g.get(RegionsKey).entries
  regions.foreach { case (region, shards) =>
    // TODO: Would the Coordinator be notified with a `Terminated` message if the region was terminated before `watch`?
Member: yes, it will

oseval added a commit to oseval/akka that referenced this pull request Jul 8, 2015
@oseval (Author) commented Jul 8, 2015

I have done the first step as @patriknw suggested. I will move the PersistentShard onto CRDTs in the next step, since that is possible.

The PR is now ready for review.
@rkuhn Could somebody review this PR while Patrik is on vacation?

import akka.cluster.ddata.Replicator.Update
import akka.cluster.ddata.Replicator.UpdateSuccess
import akka.cluster.ddata.Replicator.UpdateTimeout
import akka.cluster.ddata.Replicator.WriteMajority
Contributor: please use a wildcard import for these
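For example, collapsing the imports above into the wildcard form:

import akka.cluster.ddata.Replicator._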

@rkuhn (Contributor) commented Jul 9, 2015

Thanks a lot for this idea and PR, @SmLin! There are some aspects that I cannot currently think through in full, revolving around the semantics of shard allocation. Currently you send back the result immediately before knowing that the CRDT update has been disseminated. The old implementation makes sure that the allocation has been persisted before it can be handed out and acted upon, which ensures that two different regions cannot create the same shard when the coordinator’s machine fails and a new coordinator makes a different decision.
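A minimal sketch of the ordering concern, assuming hypothetical message and actor names: the coordinator only answers the requesting region once the Replicator has acknowledged the write, by carrying the pending reply along in the Update's request field.

import scala.concurrent.duration._
import akka.actor.{ Actor, ActorRef }
import akka.cluster.Cluster
import akka.cluster.ddata._
import akka.cluster.ddata.Replicator._

// Hypothetical types for illustration only.
final case class AllocateShard(shardId: String, region: ActorRef)
final case class ShardHomeAllocated(shardId: String, region: ActorRef)

class CoordinatorSketch(replicator: ActorRef, stateKey: LWWRegisterKey[Map[String, ActorRef]]) extends Actor {
  implicit val node = Cluster(context.system)

  def receive = {
    case AllocateShard(shardId, region) =>
      // Do NOT reply yet; attach the pending reply to the Update and wait for the ack.
      replicator ! Update(stateKey, LWWRegister(Map.empty[String, ActorRef]), WriteMajority(5.seconds),
        request = Some(ShardHomeAllocated(shardId, region)))(reg => reg.withValue(reg.value.updated(shardId, region)))

    case UpdateSuccess(_, Some(ShardHomeAllocated(shardId, region))) =>
      // Only now has the allocation been written with majority consistency; hand it out.
      region ! ShardHomeAllocated(shardId, region)
  }
}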

The other point is related: the old implementation pulled shard location information lazily into the regions when needed, but CRDT replication would allow us to just listen to the same data at all regions, saving roundtrips to the coordinator.

One thing I really like about this approach is that the CRDT’s lifetime is coupled to the lifetime of the cluster itself. Persistence means that allocations can outlive the whole cluster, but when starting from scratch we may as well just reallocate things instead. The only missing feature then is that instantiated entities will not automatically be resurrected, but my thinking goes in the direction of making that an optional add-on since not everybody needs that and then we would have completely separated the sharding from the persistence concerns.

@oseval (Author) commented Jul 10, 2015

@rkuhn I see two ways forward.
First: fix the "double shard allocation" issue and write a test for it, and then start the "next step" (in a follow-up PR) in which the distributed data is written only by its owners - the data about allocated shards inside the shards, and the data about started regions inside the regions. And of course I will remove all dependencies on persistence.
Second: do all of this inside the current PR, but that is a lot of changes in one go. Of course I will do whatever you say, but I think the first option is better.

@rkuhn (Contributor) commented Jul 10, 2015

Yes, I agree that the first proposal is the way to go; optimizing the use of the CRDTs can and should be a second step. Please create a ticket so that we have it properly on the radar.

# Settings for the coordinator singleton. Same layout as akka.cluster.singleton.

# Timeout for waiting for the initial distributed state (the initial state will be queried again if the timeout is hit)
waiting-for-state-timeout = 30 s
Member: a more reasonable default is 5 s

@patriknw (Member) commented:

@SmLin I took a first look. I think it is in the right direction. I appreciate the step-wise approach you take on this. Great work!

To set the expectations right, we will probably have to support both this and the persistent sharding and we need to discuss how to package those. Anyway, don't worry about that now. Please continue in this PR, and separate the different steps with separate commits.

@oseval (Author) commented Jul 14, 2015

@rkuhn @patriknw I am stuck on the consistency requirements for the duplicated-shards case. To avoid such duplication, the coordinator must either propagate its state changes to the whole cluster (WriteAll in CRDT terms) or synchronize state with all regions before moving to the active state. In the second step I will change the source of distributed state changes from the coordinator to the regions and shards, so after that step duplication will be avoided automatically "by design".
Now the question: is WriteMajority enough to merge this PR, or must it be implemented with guaranteed consistency?
Currently I think this step is finished and needs review before I start the second step.
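For reference, the two write consistency levels being weighed here; both exist in akka.cluster.ddata.Replicator, and the timeouts below are arbitrary:

import scala.concurrent.duration._
import akka.cluster.ddata.Replicator.{ WriteAll, WriteMajority }

object WriteConsistencyExamples {
  val majority = WriteMajority(timeout = 5.seconds) // acknowledged once a majority of nodes have stored the update
  val all      = WriteAll(timeout = 5.seconds)      // acknowledged only once every node has stored the update
}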

@oseval (Author) commented Aug 2, 2015

I have to point out one case where avoiding shard duplication is impossible. If the cluster becomes fragmented (due to a broken link between two datacenters, or in some other cases), the situation ends up with two identical coordinators, one in each partition.
When the coordinator extends PersistentActor, this crashes both coordinators (due to concurrent writes by two persistent actors) and can only be repaired by cleaning the persistent data from the database.
With CRDTs, this eventually leads to two identical shards/entities being started, and those entities may crash if they are persistent actors.

"The ShardCoordinator was unable to update a distributed state with error {}. Was retrying with event {}.",
error,
evt)
sendUpdate(evt)
Member: ModifyFailure only happens if the modify function in the Update message throws an exception, and that should never happen unless you have made a programming error. There is no point in trying again - it will probably fail again. Instead you should just re-throw the cause.

Author: Could it happen due to message/update reordering, and would it then be fixed after some time?

Member: I don't think so. A typical error would be something like a NullPointerException, i.e. it should never happen unless we have a bug somewhere, and then we don't want to continue anyway.
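A minimal sketch of the suggested handling: ModifyFailure carries the original exception as its cause, so the actor can simply escalate it instead of retrying, while a timeout can still be retried. The sendUpdate helper is hypothetical.

import akka.cluster.ddata.Replicator.{ ModifyFailure, UpdateTimeout }

object UpdateResponseHandling {
  // Hypothetical retry helper assumed by the sketch below.
  def sendUpdate(evt: Any): Unit = ()

  def receiveUpdateResponse: PartialFunction[Any, Unit] = {
    case UpdateTimeout(_, Some(evt)) =>
      // The write may still propagate via gossip; retrying the event is reasonable here.
      sendUpdate(evt)
    case f: ModifyFailure[_] =>
      // A bug in the modify function: don't retry, re-throw the cause and let supervision handle it.
      throw f.cause
  }
}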

@oseval (Author) commented Aug 3, 2015

We are attacking that issue from other angles.

Do I understand correctly that this means the issue does not have to be solved in this PR?

@patriknw (Member) commented Aug 3, 2015

Do I understand correctly that this means the issue does not have to be solved in this PR?

yes

@oseval (Author) commented Aug 8, 2015

@patriknw It turns out that a pure ddata implementation (without direct message exchanges between coordinator, regions and shards) is tremendously slow, since gossip rounds have a timeout. So I see two options:

  1. The Replicator would have to send gossip immediately to the nodes where subscribers exist.
  2. This PR currently seems to work, and I think it is in any case better than saving state to disk.
    Using ddata for more than just the coordinator state would give only slightly more safety, nothing more.

The probability of shard duplication, even at the moments of cluster failure, seems very small. I don't know exactly, but it is possible that such a failure can only happen if auto-down is enabled - and if auto-down is enabled, saving the coordinator state to disk has problems too.

I currently consider this a complete solution. Any better solution would require a completely different idea and implementation; sorry, I don't have time for that.

@@ -155,6 +155,12 @@ object ShardRegion {
@SerialVersionUID(1L) final case object GracefulShutdown extends ShardRegionCommand

/**
* We must be sure that a shard is initialized before we start sending messages to it.
Member: what is the reason for the changes in this file?

Member: as mentioned elsewhere: sometimes the persistent shard can fail during the replay process due to temporary journal unavailability. User messages sent to such a shard just after its creation, but before the failure, would be lost. This code removes that problem.

@patriknw (Member) commented Aug 9, 2015

@SmLin I agree that the solution in this PR is good, and it is a great advantage that it is so similar to the current persistent implementation, since we need to support both approaches in 2.4 (as described in my comment above). Would you be able to wrap up that part, or would you like me to make those adjustments?

@oseval (Author) commented Aug 10, 2015

I will try to do it myself. Thank you for the review.

@patriknw (Member) commented:

Thanks!

@patriknw (Member) commented:

To clarify my idea: we should make it possible to switch between the two implementations with a config setting. The dependency on distributed-data should be in provided scope, so the user must add the dependency and switch the config setting to use this implementation. Does that make sense?
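For illustration only, such a switch could look roughly like the setting that eventually shipped in Akka 2.4; the exact key and values below are an assumption, not taken from this PR:

akka.cluster.sharding {
  # "persistence" (default) or "ddata"; selecting ddata requires the
  # distributed-data module on the classpath.
  state-store-mode = ddata
}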

@oseval (Author) commented Aug 11, 2015

Yes, that makes sense. Perhaps the same could now also be done for the akka-persistence dependency.

@oseval (Author) commented Aug 17, 2015

@patriknw I wrote two implementations of the coordinator, selected by a config parameter. I duplicated the tests for the ddata coordinator - currently I don't know how to do that more elegantly. Perhaps you could help me.

context.watch(region)
sender() ! RegisterAck(self)
region ! RegisterAck(self)
Member: sender() and region are the same, right? But now we use region here and sender() a few lines above.

@patriknw (Member) commented:

@SmLin Excellent separation! I have a few comments, but it looks like this can be merged soon. We are releasing 2.4-RC1 on Friday, so it would be great if we can merge it on Thursday at the latest.

Could you also add a section about the alternative ddata mode to the docs:

  • akka-docs/rst/scala/cluster-sharding.rst
  • akka-docs/rst/java/cluster-sharding.rst

Regarding the tests I think you could split them up into an abstract class and two implementations. The only difference should be the config (and some initial creation of actors). See example of similar split:
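For illustration only (this is not the example referred to above; the spec names and config key are hypothetical), such a split could look roughly like this:

import akka.actor.ActorSystem
import akka.testkit.TestKit
import com.typesafe.config.ConfigFactory
import org.scalatest.WordSpecLike

// One abstract spec holds the shared test body; only the configuration differs.
abstract class ClusterShardingCoordinatorSpec(stateStoreMode: String)
  extends TestKit(ActorSystem("ShardingSpec",
    ConfigFactory.parseString(s"akka.cluster.sharding.state-store-mode = $stateStoreMode")))
  with WordSpecLike {

  "Cluster sharding" must {
    "allocate a shard for the first message sent to it" in {
      // shared assertions, identical for both coordinator implementations
    }
  }
}

// The two concrete specs differ only in the chosen coordinator implementation.
class PersistentCoordinatorSpec extends ClusterShardingCoordinatorSpec("persistence")
class DDataCoordinatorSpec extends ClusterShardingCoordinatorSpec("ddata")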

case evt: ClusterDomainEvent ⇒ receiveClusterEvent(evt)
case state: CurrentClusterState ⇒ receiveClusterState(state)
case msg: CoordinatorMessage ⇒ receiveCoordinatorMessage(msg)
case cmd: ShardRegionCommand ⇒ receiveCommand(cmd)
case msg if extractEntityId.isDefinedAt(msg) ⇒ deliverMessage(msg, sender())
case msg: RestartShard ⇒ deliverMessage(msg, sender())
Author: The RestartShard message handling did not exist before - it is a bug fix.

Member: great catch, it's scheduled by receiveTerminated

@oseval force-pushed the sharding-with-ddata branch 2 times, most recently from fc7c106 to 241f4d8 on August 18, 2015 22:26
@patriknw (Member) commented:

LGTM
@SmLin please squash into one commit and I will merge it. Excellent contribution!

I will add a few notes about experimental and such in the docs, but I can take care of that.

@oseval (Author) commented Aug 20, 2015

@patriknw Squashed.

@patriknw (Member) commented:

thanks!

patriknw added a commit that referenced this pull request Aug 20, 2015
=cls #17846 Use CRDTs instead of PersistentActor to remember the state of the ShardCoordinator
@patriknw merged commit 453a554 into akka:master Aug 20, 2015
@patriknw (Member) commented:

Refs #17846
