Document ActorScheduler Capabilities and Limitations
#9142
Labels
kind/epic
Description
Zeebe currently implements and uses the ActorScheduler, a system that coordinates work between different components that we call actors. There is a growing list of issues and missing features that we'd like to address by replacing the ActorScheduler with something else. To help us make an informed decision on what a replacement should look like, we should first properly document these issues and missing features.
Goals
- ActorScheduler capabilities #9145
- ActorScheduler limitations #9183
When picking a replacement system, the capability overview should allow us to judge how much we need to re-implement or re-design.
The limitation overview can act as a wish-list of new features.
Results
Overall, the actor scheduler is used to serialize work and thus prevent data races.
Since we have many concerns regarding usability, testability, maintainability and observability, we'd like to migrate away from the actor scheduler and replace it with something else.
Usage of the actor scheduler is fairly limited: in production code we count 27 different actors. Nevertheless, replacing its usage will be difficult since that usage is not well isolated and it is unclear how much we rely on some of the more subtle semantics such as ordering.
There are two different approaches we can take. The first option, and originally the motivation to look at the existing actor scheduler, is to switch to a full actor framework. In practice that'd mean Akka since there appears to be no good alternative. The second option is to go the same route as other similar projects such as Kafka and try to rely solely on java.util.concurrent.
These two approaches are very different and come with different advantages and drawbacks.
Switching to Akka would give us access to many desirable properties such as easier observability, potential for further usage of the Akka ecosystem and a programming model that is already familiar to some of us and that has proven useful for distributed systems. Testability should also improve dramatically, both through Akka's dedicated testing libraries and by allowing randomized property tests of actors and of the interactions between actors.
Switching to a more manual approach of using java.util.concurrent has the advantage that our code could end up being more idiomatic, which makes onboarding easier and potentially unlocks future improvements in the Java world such as Project Loom. It'd also align Zeebe with other projects in a similar space, such as Kafka. To quote from their coding guide:
We should take some time to validate both approaches and see how they'd work for us in practice.
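As a starting point for validating the java.util.concurrent option, here is a minimal sketch of how work could be serialized without an actor framework. It is not taken from Zeebe or from the ActorScheduler API: the class `CounterActor` and its methods are hypothetical names chosen for illustration. The idea is that all access to the mutable state goes through a single-threaded executor, so callers on any thread can submit work without introducing a data race.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical example, not part of Zeebe: an "actor-like" component whose
// state is only ever touched by one thread.
final class CounterActor implements AutoCloseable {
  // A single-threaded executor runs submitted tasks one at a time, in
  // submission order, which serializes all work on this component.
  private final ExecutorService executor = Executors.newSingleThreadExecutor();

  // Only read and written from tasks running on `executor`, so no lock,
  // `volatile` or atomic type is needed.
  private long count = 0;

  // Callable from any thread; the mutation itself runs on the executor thread.
  CompletableFuture<Void> increment() {
    return CompletableFuture.runAsync(() -> count++, executor);
  }

  // Reads also go through the executor so they observe all earlier updates.
  CompletableFuture<Long> get() {
    return CompletableFuture.supplyAsync(() -> count, executor);
  }

  @Override
  public void close() {
    // Stops accepting new tasks; already submitted tasks still run.
    executor.shutdown();
  }
}
```

Callers would use it as `actor.increment()` followed by `actor.get().join()`. The sketch deliberately leaves out what we would still have to build ourselves with this approach, such as timers, lifecycle management and scheduling many actors on a shared thread pool, which is exactly the kind of gap the capability overview should help us judge.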