Replies: 2 comments
-
We have a system in place that attempts to do this today for Akka.Cluster.Sharding, but the problem comes down to a matter of timing - it's not really feasible to implement a global pause on message traffic to a particular node in a given cluster under all circumstances. Some nodes leave abruptly, some nodes lag due to non-software factors, and a global lock on a ShardRegion would need to be checked every time we messaged anything in the region (even during the happy path.) Because of that last factor, messages can get lost in transit today despite our buffering and hand-off system - even happy-path handoffs require a small amount of eventual consistency, and messages can be lost during that window.

The approach I'd recommend for solving this is to keep the Sharding infrastructure as-is, so it stays as simple and as fast as possible, and to layer a reliable delivery mechanism on top of it. That way more robust modes of delivery can be added as-needed without complicating the sharding infrastructure (which is already fairly complex.) One of the projects we still have planned for the Akka.NET 1.5 project lifecycle is a new package, "Akka.Delivery" or "Akka.ReliableDelivery" (we're open to suggestions on naming). We're getting some interest from users in this module already - so I think we'll likely start work on it sooner rather than later.
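To make the "reliable delivery layer on top of sharding" idea concrete, here is a minimal, language-agnostic sketch (in Python, since the point is the protocol, not the Akka.NET API - the `Producer`/`Consumer` names and all details here are invented for illustration and are not the Akka.Delivery design). The sender assigns sequence numbers and buffers every message until it is acknowledged; the receiver delivers in order and acks only what it has processed, so anything lost during a handoff is simply redelivered:

```python
class Producer:
    """Sender side: assigns sequence numbers, buffers unacked messages."""

    def __init__(self):
        self.next_seq = 0
        self.unacked = {}            # seq -> payload, kept until acked

    def send(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = payload  # survives a lost in-flight message
        return (seq, payload)

    def on_ack(self, seq):
        self.unacked.pop(seq, None)

    def pending(self):
        # Everything that must be redelivered after a timeout or handoff.
        return sorted(self.unacked.items())


class Consumer:
    """Receiver side: delivers in order, dedupes, acks processed seqs."""

    def __init__(self):
        self.delivered = []
        self.expected = 0            # next in-order sequence number

    def receive(self, seq, payload):
        if seq == self.expected:
            self.delivered.append(payload)
            self.expected += 1
        # Ack only messages we have actually processed.
        return seq if seq < self.expected else None


p, c = Producer(), Consumer()
msgs = [p.send(x) for x in ["a", "b", "c"]]

# Simulate "b" lost in transit during a handoff: only "a" and "c" arrive.
for seq, payload in (msgs[0], msgs[2]):
    ack = c.receive(seq, payload)
    if ack is not None:
        p.on_ack(ack)

# Redelivering everything still unacked recovers "b", then "c", in order.
for seq, payload in p.pending():
    ack = c.receive(seq, payload)
    if ack is not None:
        p.on_ack(ack)
```

The key design point is that the delivery guarantee lives entirely in this layer - the sharding transport underneath is free to drop messages during rebalances without any global locking.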
-
@Aaronontheweb maybe also porting the sharding pattern as part of #4720 would ease @Binelli's concerns.
-
Hi,
Sometimes, when I start a new node in the cluster and the rebalance occurs, some messages get dropped from the Shard that is being moved to another node. I know the reason for this is to avoid reordering.
Couldn't we avoid dropping these messages using the following strategy:
1 - ShardCoordinator sends a "CoordinatedBeginHandOff" message to the ShardRegion that owns the Shard, containing the ShardId and the IActorRefs of all the ShardRegions that are part of the cluster
2 - ShardRegion receives the message and sends the BeginHandOff message to each of the other ShardRegions. It then waits for all BeginHandOffAcks and forwards them to the ShardCoordinator
3 - When it receives the last BeginHandOffAck, it can perform the BeginHandOff process itself and send the final BeginHandOffAck back to the ShardCoordinator
4 - ShardCoordinator then sends the ShardRegion the HandOff
This way, I think we could be sure that the ShardRegion that owns the Shard being moved will not receive messages from other ShardRegions after they send their BeginHandOffAcks. Any messages buffered between processing BeginHandOff and receiving the HandOff message would all come from the same (local) origin, so once the HandOff process completes they can be forwarded to the new destination without any risk of reordering.
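The four steps above can be sketched as a small simulation. This is not Akka.Cluster.Sharding code - the class and message names just mirror the ones in the proposal, and the coordinator drives the fan-out directly for brevity instead of relaying through the owning ShardRegion. The invariant being illustrated is that once a region has acked BeginHandOff, it buffers messages for the frozen shard locally rather than forwarding them, so nothing can race the HandOff:

```python
class ShardRegion:
    def __init__(self, name):
        self.name = name
        self.frozen = set()    # shard ids for which we acked BeginHandOff
        self.buffered = []     # locally-originated messages held back
        self.inbox = []        # messages actually delivered to this region

    def begin_hand_off(self, shard_id):
        # Steps 2-3: stop forwarding to this shard, then ack.
        self.frozen.add(shard_id)
        return ("BeginHandOffAck", shard_id, self.name)

    def deliver(self, shard_id, msg, owner):
        # After acking, messages for the shard are buffered locally
        # instead of being forwarded, so they can be flushed in their
        # original (local) order once the handoff completes.
        if shard_id in self.frozen:
            self.buffered.append((shard_id, msg))
        else:
            owner.inbox.append((shard_id, msg, self.name))


class ShardCoordinator:
    def rebalance(self, shard_id, owner, other_regions):
        # Step 1: CoordinatedBeginHandOff (fan-out driven here directly).
        acks = [r.begin_hand_off(shard_id) for r in other_regions]  # step 2
        acks.append(owner.begin_hand_off(shard_id))                 # step 3
        assert all(a[0] == "BeginHandOffAck" for a in acks)
        return ("HandOff", shard_id)                                # step 4


coord = ShardCoordinator()
owner, a, b = ShardRegion("owner"), ShardRegion("a"), ShardRegion("b")
hand_off = coord.rebalance("shard-1", owner, [a, b])

# A message arriving at region "a" after its ack is buffered, not
# forwarded, so the old owner's inbox stays quiet during the handoff.
a.deliver("shard-1", "late-msg", owner)
```

What the simulation cannot show is the timing problem raised in the reply above: a real region can crash or lag before sending its ack, which is why the coordinated pause is hard to guarantee under all circumstances.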