Added aggregate distribution via Swarm #60

Closed
wants to merge 1 commit

Conversation

uberbrodt
Contributor

This pull request implements issue #39, allowing aggregates to run on a cluster of nodes.

Adds a protocol, DistributedAggregate, which defines hooks that an aggregate can implement to handle Swarm events related to cluster topology changes and netsplits. The default simply restarts processes on new nodes in the first case and drops duplicate/divergent states in the second. The reasoning is that whatever is loaded from the event store is most likely the correct version of the aggregate.

In order to support all of Swarm's clustering capabilities (i.e. rebalancing aggregates when nodes are added or removed), process registration has moved to the AggregateSupervisor.
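To make that registration change concrete, here is a minimal sketch of how an aggregate could be started and tracked via Swarm from the supervisor. The function names open_aggregate/2 and start_aggregate/2 are illustrative rather than the actual Commanded API, and the GenServer start call is simplified.

    # Sketch only: starting an aggregate under Swarm so it can be handed off
    # when the cluster topology changes. Function names are illustrative.
    defmodule Commanded.Aggregates.Supervisor do
      def open_aggregate(aggregate_module, aggregate_uuid) do
        name = {aggregate_module, aggregate_uuid}

        # Swarm.register_name/4 starts the process on whichever node Swarm
        # selects and tracks it for rebalancing and handoff.
        case Swarm.register_name(name, __MODULE__, :start_aggregate, [aggregate_module, aggregate_uuid]) do
          {:ok, pid} -> {:ok, pid}
          {:error, {:already_registered, pid}} -> {:ok, pid}
        end
      end

      def start_aggregate(aggregate_module, aggregate_uuid) do
        GenServer.start_link(Commanded.Aggregates.Aggregate, {aggregate_module, aggregate_uuid})
      end
    end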


Added proper registration
@slashdotdash
Member

Thanks for the pull request @uberbrodt, looks like a good start.

I would prefer that support for running on a cluster of nodes were optional, similar to the approach used to support multiple event stores. By default Commanded would only run on a single node, but a consumer could install and configure an extension package to support running on a cluster (e.g. commanded_swarm_cluster). This allows alternative distribution methods to be implemented, keeps the core slim, and minimises external dependencies.

It could be implemented as an Elixir behaviour; the API would be (a rough sketch follows the list):

  1. Locate a process by identifier (e.g. via_tuple for Registry, Swarm.whereis_name/1 for Swarm).
  2. Start process for a module and identifier.
  3. Provide a macro to be used by the Commanded.Aggregates.Aggregate module to hook in the handle_* callbacks needed (none for Registry; :begin_handoff, :end_handoff, etc. for Swarm).
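For illustration, a behaviour along those lines might look something like this; the module name Commanded.Registration and the callback names are placeholders, not an agreed API:

    # Illustrative only: names are placeholders, not a finalised Commanded API.
    defmodule Commanded.Registration do
      # 1. Locate a process by identifier
      #    (e.g. a via tuple for Registry, Swarm.whereis_name/1 for Swarm).
      @callback whereis_name(identity :: term) :: pid | :undefined

      # 2. Start a process for the given module and identifier, on whichever
      #    node the registry chooses.
      @callback start_child(module :: module, identity :: term) ::
                  {:ok, pid} | {:error, term}

      # 3. Each adapter would also provide a __using__/1 macro, used by
      #    Commanded.Aggregates.Aggregate, to inject the handle_* clauses it
      #    needs (none for Registry; :begin_handoff, :end_handoff,
      #    :resolve_conflict for Swarm).
    end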

We also need to consider how to deal with event handlers running distributed in a cluster. Ideally they would be distributed as per aggregates and reusing the same logic: only one instance running, but shared amongst the available nodes. The PostgreSQL-based event store does not currently support running on a cluster, so that's another blocker!

Finally, the split brain scenario must be addressed. For reference the Akka library provides a number of strategies for dealing with split brain (e.g. static quorum, keep majority, keep oldest, keep referee). This is needed to prevent temporary network partitions from causing write conflicts and running duplicate event handlers.

I'm not exactly sure how best to proceed with these changes. I don't think it would work if an application using Commanded was run on more than one node.

@uberbrodt
Contributor Author

Splitting it out seems reasonable. I'll work on that.

We also need to consider how to deal with event handlers running distributed in a cluster. Ideally they would be distributed as per aggregates and reusing the same logic: only one instance running, but shared amongst the available nodes.

I didn't think about event handlers; would you also want to do the same with process managers?

The PostgreSQL-based event store does not currently support running on a cluster, so that's another blocker!

As far as the PostgreSQL event store goes, I'm not too concerned about that. If it's down, your app is down; for most users this is an acceptable trade of availability for consistency. Unless you're talking about something specific to how the event_store application works, in which case we should dig in further.

Finally, the split brain scenario must be addressed. For reference the Akka library provides a number of strategies for dealing with split brain (e.g. static quorum, keep majority, keep oldest, keep referee). This is needed to prevent temporary network partitions from causing write conflicts and running duplicate event handlers.

TL;DR: Swarm has picked a strategy; :resolve_conflict is the hook for users to make decisions about it.

So, Swarm's strategy is to have both sides of a netsplit run and to send :resolve_conflict to the "right" process after the split is resolved. It's essentially skipping any notion of quorum in favor of availability. I implemented DistributedAggregate as a protocol so that implementers can decide what they want to do. They could drop the other state and scrub the event history, restart event handlers, or build a custom merge strategy. It really depends on the aggregate and what sort of application you have. For my purposes (and I would think for many use cases) this is fine, as the business cost of dropping some events or returning stale data is very low. Since we're going to split this out to its own package, those that need to prioritize availability will have to accept the trade-offs above.
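For concreteness, Swarm delivers the conflict as a cast to the process it keeps once the partition heals, so the default described above (trust the state rebuilt from the event store, drop the other copy) is a single clause. The delegation to a protocol function shown in the comment is illustrative; the exact callback name may differ.

    # Inside the aggregate GenServer: Swarm casts this message to the
    # surviving process after a netsplit is resolved.
    def handle_cast({:swarm, :resolve_conflict, _other_state}, state) do
      # Default: keep the state rebuilt from the event store, drop the copy
      # from the other side of the split.
      {:noreply, state}
    end

    # An application needing a custom merge could delegate to the protocol,
    # e.g. (callback name illustrative):
    #
    #   {:noreply, DistributedAggregate.resolve_conflict(state, other_state)}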

Expanding on the Akka clustering a bit, it should be pointed out that none of the strategies necessarily provides 100% consistency or removes the requirement for custom code to resolve conflicts. They merely provide some coarse-grained control to favor either consistency or availability in certain scenarios (which is super valuable, for sure). I think it would be useful to look at which strategies are a good fit for Commanded and specifically to look at riak_core, since it's going to provide slightly more advanced recovery options through read repair and handoffs. This article goes into that a bit more, but from what I can gather you're not going to have the sort of authoritative leader that you have in Akka (negating many of their strategies) and will actually defer quorum calculations to read time. It's not clear to me yet how that would work with the event_store.

@slashdotdash
Member

slashdotdash commented Jun 27, 2017

Let's look at what we're trying to achieve with adding support for running on a cluster of nodes:

  • Availability - allow our application to continue running when a node fails.
  • Scalability - add nodes to distribute load.
  • Deployment - no downtime deployment via rolling upgrades.

So yes, distribution should include aggregate instances, event handlers, and process managers.

I would prefer to design for consistency over availability. We use Elixir processes to provide guarantees about concurrency when running on a single node. I'd like similar semantics for clusters, if possible, so a consumer of Commanded writes the same code irrespective of the number of deployed nodes (one or many). Do you think that's a fair and useful goal?

In the case of an aggregate instance, you cannot easily merge together two disparate event streams to resolve a conflict. For event handlers and process managers, you will have problems if two instances run concurrently, on different nodes, handling the same events.

So we prevent duplicate aggregate instances, event handlers, and process managers from running during a net split. This follows the Akka approach to handling a net split: guarantee consistency at the cost of availability for one half of the cluster.
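As a sketch of what that could look like with a static quorum (the module name and config key are made up for illustration), each node would refuse to run these processes unless it can see a majority of the expected cluster:

    # Illustrative static-quorum check; not part of Commanded.
    defmodule Commanded.Cluster.Quorum do
      # quorum_size would come from application config, e.g. 2 for a
      # three-node cluster.
      def reached?(quorum_size \\ Application.get_env(:commanded, :quorum_size, 2)) do
        # Node.list/0 returns the other connected nodes; + 1 counts this node.
        length(Node.list()) + 1 >= quorum_size
      end
    end

A node on the minority side of a split would stop its aggregates, event handlers, and process managers rather than risk running duplicates.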

I think this is the simplest approach to implement to begin with. We could look at adding causal consistency afterwards (e.g. Eventuate and Don't Settle for Eventual Consistency).

Feel free to have a chat with me on the Commanded Gitter chat. I'm open to suggestions on alternative approaches, these are just some of my initial thoughts.

@slashdotdash
Member

Replaced by #80.
