Reliable delivery in Typed, #20984 #28155

Merged

patriknw merged 1 commit into master from wip-20984-reliable-delivery-patriknw

Mar 12, 2020

Member

patriknw commented Nov 8, 2019 (edited)

Approach

I'm taking a rather different approach than in classic AtLeastOnceDelivery because I would like to:

* support plain flow control, with a work pulling approach
* make it possible to use with or without persistence (without it, messages may be lost if the producer node crashes)
* detect lost messages on the consumer side and let that drive resends, instead of aggressively resending from the producer side
* deliver messages in order and deduplicate resent messages
* have an efficient protocol for acknowledgments over the network (not acking each message), but still have a simple one-by-one protocol for the end user
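
The consumer-driven resend, ordering and deduplication goals above can be sketched in plain Java. This is only an illustrative model, not the `ConsumerController` implementation: the class `SequencedConsumer`, its `Outcome` enum and its fields are hypothetical names, and the only thing taken from the PR is the idea of comparing an incoming `seqNr` with the expected one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of consumer-side sequencing; NOT the Akka ConsumerController API.
final class SequencedConsumer {
    enum Outcome { DELIVERED, DUPLICATE_DROPPED, GAP_RESEND_REQUESTED }

    private long receivedSeqNr = 0;                       // highest seqNr delivered in order so far
    final List<String> delivered = new ArrayList<>();     // payloads handed to the application
    final List<Long> resendRequests = new ArrayList<>();  // fromSeqNr of each resend request

    Outcome onMessage(long seqNr, String payload) {
        long expectedSeqNr = receivedSeqNr + 1;
        if (seqNr == expectedSeqNr) {
            // In order: deliver and advance.
            delivered.add(payload);
            receivedSeqNr = seqNr;
            return Outcome.DELIVERED;
        } else if (seqNr < expectedSeqNr) {
            // Already seen (for example a resent message): deduplicate.
            return Outcome.DUPLICATE_DROPPED;
        } else {
            // Gap detected by the consumer: ask the producer to resend from the first missing seqNr.
            resendRequests.add(expectedSeqNr);
            return Outcome.GAP_RESEND_REQUESTED;
        }
    }

    public static void main(String[] args) {
        SequencedConsumer consumer = new SequencedConsumer();
        consumer.onMessage(1, "a");
        consumer.onMessage(3, "c"); // 2 was lost, so a resend from 2 is requested
        consumer.onMessage(2, "b"); // resent
        consumer.onMessage(3, "c"); // resent
        consumer.onMessage(3, "c"); // duplicate, dropped
        System.out.println(consumer.delivered + " " + consumer.resendRequests); // prints "[a, b, c] [2]"
    }
}
```

The producer never resends on its own in this model; it only reacts to the `fromSeqNr` recorded in `resendRequests`.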

I have thought of three separate use cases:

* Point-to-point between one producer and one consumer. This is also the building block of the other use cases.
* Work pulling pattern, where workers register to one work producer master. Workers are equal and a message can be routed to any of the workers that have demand.
* Sharding. Routing to the destination is based on the entityId. Many producers can send to the same destination entity.

Those use cases are illustrated in the tests in ReliableDeliverySpec, WorkPullingSpec, ReliableDeliveryShardingSpec.

Point-to-point

ProducerController and ConsumerController are point-to-point with single producer and single consumer.

The ScalaDoc in ProducerController and ConsumerController describes how it works and the message protocol.

Work Pulling

ConsumerController (CC-1) is started for worker-1, and registered to the Receptionist.

The WorkPullingProducerController receives the Listing from the Receptionist and spawns a ProducerController (PC-1) for CC-1. The PC-1/CC-1 pair is used for delivery of messages selected for worker-1.

Another worker-2 is started and registered, and a corresponding ProducerController (PC-2) is spawned for CC-2. The PC-2/CC-2 pair is used for delivery of messages selected for worker-2.

The WorkPullingProducerController randomly selects a worker for each message (job) among the workers with demand (according to the request window of each PC/CC pair).
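
A minimal sketch of the selection step, assuming demand is tracked as a simple per-worker flag. In the PR demand follows from the request window of each PC/CC pair, so the boolean flag, the class name `WorkPullingSelector` and its methods are simplifications made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Random;

// Hypothetical model of "pick a random worker among those with demand"; not the Akka API.
final class WorkPullingSelector {
    private final Map<String, Boolean> demandByWorker = new LinkedHashMap<>();
    private final Random random;

    WorkPullingSelector(Random random) {
        this.random = random;
    }

    // A worker registers (in the PR this happens via the Receptionist listing).
    void register(String workerId) {
        demandByWorker.putIfAbsent(workerId, false);
    }

    // The worker signals that it can take one more job.
    void onDemand(String workerId) {
        if (demandByWorker.containsKey(workerId)) {
            demandByWorker.put(workerId, true);
        }
    }

    // Picks a random worker with demand, or empty if every worker is busy.
    Optional<String> selectWorker() {
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, Boolean> entry : demandByWorker.entrySet()) {
            if (entry.getValue()) {
                candidates.add(entry.getKey());
            }
        }
        if (candidates.isEmpty()) {
            return Optional.empty();
        }
        String chosen = candidates.get(random.nextInt(candidates.size()));
        demandByWorker.put(chosen, false); // demand is consumed until the worker asks again
        return Optional.of(chosen);
    }

    public static void main(String[] args) {
        WorkPullingSelector selector = new WorkPullingSelector(new Random());
        selector.register("worker-1");
        selector.register("worker-2");
        selector.onDemand("worker-2");
        System.out.println(selector.selectWorker()); // only worker-2 has demand at this point
    }
}
```

When `selectWorker` returns empty, the job has to wait (or be buffered) until some worker signals demand again, which is what gives the pattern its flow control.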

Sharding

A ShardingProducerController is started for each producer, typically one per node. Messages to any entity can be sent via this ShardingProducerController. For the first message for an entity it spawns a ProducerController (PC-entity-1), which is used for delivery of messages to that specific entity.

Sharding creates the ShardingConsumerController for the first message to that entity, which spawns the application's entity Behavior.

A ConsumerController (CC-a) is also spawned and is working in pair with the ProducerController (PC-entity-1).

A message for another entity means that the ShardingProducerController will spawn another ProducerController (PC-2) for that entity.

There can be many producers, typically one per node. The ShardingConsumerController will spawn a new ConsumerController for each producer.
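
The per-entity fan-out on the producer side can be modeled roughly as follows. This is a sketch under assumptions: `ShardingProducerModel` and `EntityChannel` are hypothetical names, and it assumes each per-entity ProducerController keeps its own sequence numbers starting at 1.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for one per-entity ProducerController; not the Akka API.
final class EntityChannel {
    final String entityId;
    private long nextSeqNr = 1; // assumption: every entity channel has its own sequence

    EntityChannel(String entityId) {
        this.entityId = entityId;
    }

    long nextSeqNr() {
        return nextSeqNr++;
    }
}

// Hypothetical stand-in for the ShardingProducerController of one producer (node).
final class ShardingProducerModel {
    private final String producerId;
    private final Map<String, EntityChannel> channels = new HashMap<>();

    ShardingProducerModel(String producerId) {
        this.producerId = producerId;
    }

    // The first message to an entity lazily creates its channel; later messages reuse it.
    String send(String entityId, String message) {
        EntityChannel channel = channels.computeIfAbsent(entityId, EntityChannel::new);
        return producerId + "/" + entityId + "#" + channel.nextSeqNr() + ": " + message;
    }

    int channelCount() {
        return channels.size();
    }

    public static void main(String[] args) {
        ShardingProducerModel producer = new ShardingProducerModel("producer-1");
        System.out.println(producer.send("entity-1", "first"));
        System.out.println(producer.send("entity-1", "second"));
        System.out.println(producer.send("entity-2", "first"));
    }
}
```

On the consumer side the mirror image applies: one channel per producer that sends to the entity, which is why the ShardingConsumerController spawns a ConsumerController for each producer.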

TODO:

akka-ci added validating needs-attention and removed validating labels

akka-ci commented Nov 8, 2019

Test FAILed.

johanandren reviewed

Member

johanandren left a comment

Through a first skim, looks very cool, but have lots of questions :)

akka-actor-typed/src/main/scala/akka/actor/typed/internal/delivery/ConsumerController.scala Outdated

+                            producer: ActorRef[ProducerController.InternalCommand],
+                            start: Start[A],
+                            firstSeqNr: Long): Behavior[InternalCommand] = {
+                          val requestedSeqNr = firstSeqNr - 1 + RequestWindow

Member

johanandren Nov 8, 2019

I guess we'd want to watch the consumer as well and tie the lifecycle of the CC to the consumer?

Member Author

patriknw Nov 9, 2019

yes, also restart of consumer is a fixme
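
For readers of the diff above: the `requestedSeqNr = firstSeqNr - 1 + RequestWindow` line is the initial demand. The following plain-Java model shows how such a window can give both flow control and batched acknowledgments. Renewing the window once half of it has been confirmed is an assumption made for this sketch, and `RequestWindowModel` is a hypothetical name.

```java
// Hypothetical model of a request window; not the Akka ConsumerController implementation.
final class RequestWindowModel {
    private final int requestWindow;
    private long requestedSeqNr; // producer may send up to and including this seqNr
    private long confirmedSeqNr; // highest seqNr confirmed by the consumer

    RequestWindowModel(long firstSeqNr, int requestWindow) {
        this.requestWindow = requestWindow;
        this.requestedSeqNr = firstSeqNr - 1 + requestWindow; // same formula as in the diff
        this.confirmedSeqNr = firstSeqNr - 1;
    }

    // Flow control: the producer must hold back anything beyond the requested seqNr.
    boolean maySend(long seqNr) {
        return seqNr <= requestedSeqNr;
    }

    // Called when the consumer confirms a message. Returns true when a new request
    // (carrying the ack and more demand) would go over the network.
    boolean onConfirmed(long seqNr) {
        confirmedSeqNr = seqNr;
        long remaining = requestedSeqNr - confirmedSeqNr;
        if (remaining <= requestWindow / 2) { // assumption: renew at half the window
            requestedSeqNr = confirmedSeqNr + requestWindow;
            return true;
        }
        return false;
    }

    long requestedSeqNr() {
        return requestedSeqNr;
    }

    public static void main(String[] args) {
        RequestWindowModel window = new RequestWindowModel(1, 20);
        int requests = 0;
        for (long seqNr = 1; seqNr <= 100; seqNr++) {
            if (window.onConfirmed(seqNr)) {
                requests++;
            }
        }
        System.out.println("network requests for 100 confirmations: " + requests);
    }
}
```

The consumer still confirms one message at a time locally, while only a fraction of those confirmations turn into messages on the wire.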

akka-actor-typed/src/main/scala/akka/actor/typed/internal/delivery/ConsumerController.scala Outdated

+                      checkProducerId(producerId, pid, seqNr)
+                      val expectedSeqNr = s.receivedSeqNr + 1
+                      if (seqNr == expectedSeqNr || (first && seqNr >= expectedSeqNr) || (first && seqMsg.producer != s.producer)) {
+                        logIfChangingProducer(s.producer, seqMsg, pid, seqNr)

Member

johanandren Nov 8, 2019

What is the scenario when we get a new producer?

Member Author

patriknw Nov 9, 2019

Maybe mostly when we introduce the durable producer, which may crash and then continue elsewhere.
Also, for the Sharding use case there is a loose coupling between PC and CC which could allow the PC to be stopped and started again (possibly elsewhere).

akka-actor-typed/src/main/scala/akka/actor/typed/internal/delivery/ConsumerController.scala Outdated

+                        context.log.infoN("from producer [{}], missing [{}], received [{}]", pid, expectedSeqNr, seqNr)
+                        if (resendLost) {
+                          seqMsg.producer ! Resend(fromSeqNr = expectedSeqNr)
+                          resending(s.copy(producer = seqMsg.producer))

Member

johanandren Nov 8, 2019

So we restart from there and drop the current message if resendLost is enabled. It is probably good to have different logging for the two scenarios, and maybe log at warning level when not resending.

Member

He-Pin Dec 15, 2019

Is there no time/size window here before sending the resend request?
I would prefer to change the behavior to do some kind of observing before that.

akka-actor-typed/src/main/scala/akka/actor/typed/internal/delivery/ConsumerController.scala Outdated

+                        } else {
+                          deliverTo ! Delivery(pid, seqNr, msg, context.self)
+                          waitingForConfirmation(s.copy(producer = seqMsg.producer, receivedSeqNr = seqNr), first)
+                        }

Member

johanandren Nov 8, 2019

Should there perhaps be 3 options, resend, drop and stop/fail on detected gap?

Member

He-Pin Dec 15, 2019

better a strategy?
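
The three options raised in this thread could be expressed as a strategy along these lines. Nothing here is from the PR: `GapHandler`, `Strategy` and `Action` are hypothetical names, and the sketch only covers the decision taken when `seqNr` is ahead of the expected one.

```java
// Hypothetical sketch of a gap-handling strategy; not part of the Akka API.
final class GapHandler {
    enum Strategy { RESEND, DROP, FAIL }

    enum Action { REQUEST_RESEND_FROM_EXPECTED, DELIVER_AND_ACCEPT_LOSS }

    // Decides what to do when seqNr is ahead of expectedSeqNr, i.e. messages are missing.
    static Action onGap(Strategy strategy, long expectedSeqNr, long seqNr) {
        if (seqNr <= expectedSeqNr) {
            throw new IllegalArgumentException(
                "not a gap: expected " + expectedSeqNr + ", received " + seqNr);
        }
        switch (strategy) {
            case RESEND:
                // Drop the current message and ask the producer to resend from expectedSeqNr.
                return Action.REQUEST_RESEND_FROM_EXPECTED;
            case DROP:
                // Accept the loss and continue with the current message.
                return Action.DELIVER_AND_ACCEPT_LOSS;
            case FAIL:
                // Treat a gap as a fatal error for this consumer.
                throw new IllegalStateException(
                    "missing messages [" + expectedSeqNr + ".." + (seqNr - 1) + "]");
            default:
                throw new AssertionError("unknown strategy " + strategy);
        }
    }

    public static void main(String[] args) {
        System.out.println(onGap(Strategy.RESEND, 5, 7));
        System.out.println(onGap(Strategy.DROP, 5, 7));
    }
}
```

With an enum the behavior in the diff corresponds to `RESEND` when `resendLost` is true and `DROP` when it is false; `FAIL` would be the additional third option.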

akka-actor-typed/src/main/scala/akka/actor/typed/internal/delivery/ConsumerController.scala Outdated

+                      Behaviors.unhandled
+                    case Start(_) =>
+                      Behaviors.unhandled

Member

johanandren Nov 8, 2019

Probably good to log warn here as well, since it would only happen when using the API incorrectly

akka-actor-typed/src/main/scala/akka/actor/typed/internal/delivery/ProducerController.scala Outdated

+               * to ignore lost messages, and then the `ProducerController` will not buffer unconfirmed messages.
+               * In that mode it provides only flow control but no reliable delivery.
+               */
+              object ProducerController {

Member

johanandren Nov 8, 2019

Is there a way for the producer to signal that it is done/reached the end of the messages it wants to send etc?

Member Author

patriknw Nov 9, 2019

no, good idea

Member Author

patriknw Dec 12, 2019

Changing my mind, better to have that as an application level end message. Because it needs the same delivery mechanism and the application consumer must be able to understand it. When it has been received the application consumer can tear things down.

Member

johanandren Dec 19, 2019

Ok, so that means the CC stopping on consumer termination?

Member Author

patriknw Dec 19, 2019

yes, that is implemented

on producer side I think the producer would have to wait for ack corresponding to that last message before stopping the PC

akka-actor-typed/src/main/scala/akka/actor/typed/internal/delivery/ProducerController.scala Outdated


		final case class RequestNextParam[A](currentSeqNr: Long, confirmedSeqNr: Long, sendNextTo: ActorRef[A])

		final case class MessageWithConfirmation[A](message: A, replyTo: ActorRef[Long])

Member

johanandren Nov 8, 2019

Perhaps don't take a boxed primitive as the reply message but a wrapping type?

So the idea is that the message is A but the response is the sequence number it got? How will a producer correlate those two?

Member Author

patriknw Nov 9, 2019

The primary purpose is to use this with ask from the P. The reply (seqNr: Long) is sent back when all is confirmed for that message: A. Having the seqNr as the reply might not be important, could be Done. I exposed seqNr to both P and C so far.
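
One way to picture the reply described here, using a wrapping type instead of a bare `Long`, is the following sketch. It models the ask with a `CompletableFuture` rather than Akka's ask pattern; `Confirmed` and `ConfirmationTracker` are hypothetical names, and completing every pending reply up to the acknowledged seqNr is an assumption about cumulative acks.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;

// Hypothetical wrapper reply, instead of replying with a boxed Long.
final class Confirmed {
    final long seqNr;

    Confirmed(long seqNr) {
        this.seqNr = seqNr;
    }
}

// Hypothetical producer-side bookkeeping; not the Akka ProducerController implementation.
final class ConfirmationTracker {
    private final TreeMap<Long, CompletableFuture<Confirmed>> pending = new TreeMap<>();
    private long nextSeqNr = 1;

    // Registers a message and returns a future that completes once it is confirmed.
    // The message itself is not stored in this sketch; only the reply is tracked.
    CompletableFuture<Confirmed> send(String message) {
        long seqNr = nextSeqNr++;
        CompletableFuture<Confirmed> reply = new CompletableFuture<>();
        pending.put(seqNr, reply);
        return reply;
    }

    // A cumulative ack confirms every pending message up to and including confirmedSeqNr.
    void onAck(long confirmedSeqNr) {
        Map<Long, CompletableFuture<Confirmed>> done = pending.headMap(confirmedSeqNr, true);
        done.forEach((seqNr, reply) -> reply.complete(new Confirmed(seqNr)));
        done.clear(); // removes the completed entries from the backing map
    }

    public static void main(String[] args) {
        ConfirmationTracker tracker = new ConfirmationTracker();
        CompletableFuture<Confirmed> first = tracker.send("a");
        CompletableFuture<Confirmed> second = tracker.send("b");
        tracker.onAck(1);
        System.out.println(first.isDone() + " " + second.isDone()); // prints "true false"
    }
}
```

The producer correlates a reply with its message through the future it got back from `send`, so the seqNr inside `Confirmed` is informational only.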

akka-actor-typed/src/main/scala/akka/actor/typed/internal/delivery/ProducerController.scala Outdated

+                final case class RegisterConsumer[A](
+                    consumerController: ActorRef[ConsumerController.Command[A]],
+                    replyTo: ActorRef[Done])
+                    extends Command[A]

Member

johanandren Nov 8, 2019

This is just here for testability?

Member Author

patriknw Nov 9, 2019

no, needed for dynamic registration/replacement of CC, for example when using sharding

...-typed/src/main/scala/akka/actor/typed/internal/delivery/WorkPullingProducerController.scala Outdated

+                    hasRequested: Boolean)
+                // TODO Now the workers have to be registered explicitly/manually.
+                // We could support automatic registration via Receptionist, similar to how routers work.

Member

johanandren Nov 8, 2019

Was just going to ask if you had thought about how to find the remote consumer. For this case it makes sense: one producer, many consumers/workers. But for a 1:1 reliable delivery between cluster nodes that is potentially short-lived, I don't think the receptionist is a great fit (maybe I just made that use case up though, not sure).

Member Author

patriknw Nov 9, 2019

I agree, the Receptionist shouldn't be used for short-lived actors.
The point-to-point scenario currently requires some application-specific way to exchange the CC/PC ActorRef to connect them.

...-typed/src/main/scala/akka/actor/typed/internal/delivery/WorkPullingProducerController.scala Outdated

+                        State(Map.empty, hasRequested = false))
+                    }
+                    .narrow
+                }

Member

johanandren Nov 8, 2019

This is very cool, kind of pre-baking much of the distributed workers sample.

Member Author

patriknw Nov 9, 2019

yup, was fun to write

Member

chbatey Dec 13, 2019

Yeah this is really nice, I hadn't quite got my head around how this was going to work with the P2P API but looks good

Member

He-Pin commented Nov 12, 2019

detect lost messages on the consumer side and let that drive resends, instead of aggressively resending from producer side

with a sequential seqNr or syncId?

akka-ci added validating tested and removed needs-attention validating labels

akka-ci commented Nov 13, 2019

Test PASSed.

akka-ci added validating needs-attention and removed tested validating labels

akka-ci commented Nov 16, 2019

Test FAILed.

akka-ci added validating tested and removed needs-attention validating labels

akka-ci commented Nov 17, 2019

Test PASSed.

akka-ci added validating tested and removed tested validating labels

akka-ci commented Nov 18, 2019

Test PASSed.

akka-ci added validating tested and removed tested validating labels

akka-ci added tested and removed tested validating labels

akka-ci commented Mar 11, 2020

Test PASSed.

akka-ci added validating and removed tested labels

Member Author

patriknw commented Mar 12, 2020

Ready for final approval.
I have completed the Java doc examples.
Created issues for remaining TODO/FIXME that can be worked on later: https://github.com/akka/akka/issues?q=is%3Aopen+is%3Aissue+label%3At%3Areliable-delivery

Let me squash this manually and craft the commit message before merging.

octonato reviewed

akka-docs/src/main/paradox/typed/reliable-delivery.md

+              while waiting for the confirmation are stashed by the `ConsumerController` and delivered when the previous
+              message is confirmed.
+              The consumer and the `ConsumerController` actors are supposed to be local so that these messages are fast

Member

octonato Mar 12, 2020

Suggested change

      
-             The consumer and the `ConsumerController` actors are supposed to be local so that these messages are fast
+             The consumer and the `ConsumerController` actors are guaranteed to be local so that these messages are fast

Member

johanandren Mar 12, 2020 (edited)

That doesn't seem right, it's a recommendation for the user and not an aspect built into the reliable delivery actors.

Member Author

patriknw Mar 12, 2020

It's enforced by a check. I'll reformulate this, but "guaranteed" is not the right word because the user can do the wrong thing (but will then notice at runtime).

Member Author

patriknw Mar 12, 2020

clarified in 64178fa

akka-ci added tested and removed validating labels

akka-ci commented Mar 12, 2020

Test PASSed.

johanandren reviewed

Member

johanandren left a comment

Java samples mostly looking good, two small things noted.

akka-cluster-sharding-typed/src/test/java/jdocs/delivery/ShardingDocExample.java Outdated

akka-cluster-sharding-typed/src/test/java/jdocs/delivery/ShardingDocExample.java Outdated

akka-ci added validating tested and removed tested validating labels

akka-ci commented Mar 12, 2020

Test PASSed.

johanandren approved these changes

Member

johanandren left a comment

LGTM, let's merge this behemoth!

Member Author

patriknw commented Mar 12, 2020

thanks, I'll squash


          Reliable delivery in Typed, #20984

1d16e84

Different approach than in classic AtLeastOnceDelivery because I would like:
* support flow control, with a work pulling approach
* be possible to use with or without persistence (without it may lose
  messages if producer node crashes)
* detect lost messages on the consumer side and let that drive resends,
  instead of aggressively resending from producer side
* deliver messages in order and deduplicate resent messages
* have an efficient protocol for acknowledgments over the network (not ack each message),
  but still have a simple one-by-one protocol for the end user

* support 3 use cases (building blocks)
  * point-to-point
  * work pulling
  * sharding
* optional durable queue, with one event sourced implementation
* protobuf serialization
* ApiMayChange
* reference docs and examples
* api docs
* doc example code missing so far

patriknw force-pushed the wip-20984-reliable-delivery-patriknw branch from dba92f3 to 1d16e84 Compare

March 12, 2020 14:37

akka-ci added validating tested and removed tested validating labels

akka-ci commented Mar 12, 2020

Test PASSed.

Member Author

patriknw commented Mar 12, 2020

I count @chbatey 's previous reviews as an approval

patriknw merged commit 7d790ef into master

patriknw deleted the wip-20984-reliable-delivery-patriknw branch

March 12, 2020 15:52

agapic reviewed

akka-cluster-sharding-typed/src/test/scala/docs/delivery/ShardingDocExample.scala

+                    val producerController =
+                      context.spawn(ShardingProducerController(producerId, region, durableQueueBehavior = None), "producerController")
+                    context.spawn(TodoService(producerController), "producer")

agapic Aug 26, 2020 (edited)

Wondering if you could also include an example of actually calling UpdateTodo in the TodoService? What would that look like? In particular, what would the entityRef look like?

The below doesn't work because I get an exception.

      EntityTypeKey<TodoService.Command> entityTypeKey1 =
          EntityTypeKey.create(TodoService.Command.class, "todo");

      EntityRef<TodoService.Command> entityRef1 =
          ClusterSharding.get(system).entityRefFor(entityTypeKey1, "123");
      CompletionStage<Response> r = entityRef1.ask(replyTo -> new UpdateTodo("123", "tire", false, replyTo),
                                                   Duration.ofSeconds(3));

java.lang.ClassCastException: com.agapic.TodoService$UpdateTodo cannot be cast to akka.actor.typed.delivery.ConsumerController$Command

Member Author

patriknw Aug 26, 2020

Try "todoService" in the EntityTypeKey instead, because I think the example is already using "todo" for the TodoList and here you are trying to interact with the TodoService.

However, I think you misunderstand how it is intended to be used. You are trying to use it with two levels of Sharding, one for the TodoList and one for the TodoService. The docs say "A single ShardingProducerController per ActorSystem (node) can be shared for sending to all entities of a certain entity type." See https://doc.akka.io/docs/akka/current/typed/reliable-delivery.html#sharding

If you have more questions I'd recommend switching over to https://discuss.akka.io/

Reviewers

He-Pin left review comments

octonato left review comments

chbatey left review comments

agapic left review comments

johanandren approved these changes