Support running on a cluster of nodes #39
Any plans on this one?
@edimoldovan I don't have immediate plans to work on this, but want to provide support for clusters at some point. Feel free to provide any input you have. My initial thoughts were to look at an implementation using @bitwalker's Swarm library.
You'll have to make decisions about the various CAP trade-offs. Swarm makes trade-offs for the use case it was designed for, and others like Riak Core do the same, so depending on your use case you'll want to evaluate which fits best and then build an implementation on top of it. Swarm was designed to be a high-performance global process registry: it is eventually consistent, and is willing to duplicate everything in both partitions to stay available during a netsplit, then resolve conflicts when the partition is healed. That almost certainly means one side's changes will need to be dropped; how that's handled is up to you, but the default behaviour is that whichever node "owns" the key keeps its data and the other side's is dropped, so the behaviour is deterministic. If that level of consistency is acceptable, then Swarm might be a good fit; otherwise Riak Core is where I'd look.
Thanks for the feedback @bitwalker. I was aware of Riak Core but hadn't considered it as an option, so will investigate further. |
Just a thought: Is direct coordination between commanded nodes required to implement this? Hosting a commanded-based application on Heroku would probably not be possible if instances need to talk to each other. |
@aflatter Yes, it would be necessary for the nodes to communicate with each other to support running as a cluster. |
@slashdotdash Can you expand a bit on the features that are part of "running as a cluster"? Would it be enough to add optimistic writes for event streams and a reliable message bus in the event store to allow multiple instances of an application? |
@aflatter Commanded, and the Event Store, make use of OTP processes to guarantee serialised concurrent access to event streams and aggregates. They are designed to run as singleton processes, one process per logical stream/aggregate. For a cluster of nodes there needs to be a distributed process registry to locate the process and send it messages. This would require node-to-node communication.
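The distributed registry lookup described above could be sketched using Swarm's `:via` tuple support, which routes messages to a singleton process wherever it lives in the cluster. A minimal sketch, not Commanded's actual implementation; the `BankAccount` module and `{:deposit, amount}` message are illustrative names only:

```elixir
defmodule BankAccount do
  use GenServer

  # Start a singleton process for this aggregate, registered under a
  # cluster-wide name in Swarm's distributed registry.
  def start_link(account_number) do
    GenServer.start_link(__MODULE__, account_number, name: via(account_number))
  end

  # Calls made through the :via tuple are delivered to the node
  # currently hosting the process, wherever it runs in the cluster.
  def deposit(account_number, amount) do
    GenServer.call(via(account_number), {:deposit, amount})
  end

  defp via(account_number), do: {:via, :swarm, {__MODULE__, account_number}}

  @impl true
  def init(account_number), do: {:ok, %{account_number: account_number, balance: 0}}

  @impl true
  def handle_call({:deposit, amount}, _from, state) do
    {:reply, :ok, %{state | balance: state.balance + amount}}
  end
end
```

With this shape, callers never need to know which node hosts the aggregate; the registry resolves the name on every call.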
I've submitted a pull request to Swarm to support consistency during a network partition (#38). This ensures only a single instance of a process runs in the cluster, a requirement for distributing aggregate, event handler, and process manager processes.
@edimoldovan This issue is now under active development. |
Why... is this needed? EventStore itself cannot be clustered (at this time). I'm just wondering if this is adding extra complexity, without actual use cases (sorry to be a downer). |
Ben is working on making the EventStore clusterable. I'm waiting on this to have an easier way to manage memory consumption.
@drozzy I'm currently working on adding support to the Elixir Event Store, using Swarm for process distribution (#53). I plan to apply the same approach to Commanded. Greg's Event Store already supports running on a cluster. There have been a number of requests for this feature. The drivers are reliability (run multiple nodes so that your app continues running when a node crashes) and to support rolling deployments. |
I've been struggling with this for quite a while now. We are experimenting with event sourcing and we wanted to come up with an event store that could be scaled horizontally in the future. We tried to get around giving a total order to the events, but things get really complex.
@rosacris You might be interested to look at how Eventuate deals with replication using causal consistency (vector clocks). An event-sourced system only needs to guarantee ordering of events within a single stream (aggregate), so you could use that to shard on, as you mentioned. You could use an aggregate stream (e.g.
@rosacris We use Cassandra as an event store at Lix. Cassandra's lightweight transactions allow us to guarantee serialisability within one aggregate. Cassandra is very much horizontally scalable, if that's your need. We use an in-house event-sourcing framework which we haven't yet had the time to clean up enough to make ready for open source.
@slashdotdash thanks for the suggestions, that's what we ended up doing. The key observation here is that we only need a total order among the events of the same aggregate; the $all stream can be any serialisation of the global partial order. Is it OK to say that if there should ever be a causal dependency between events of different aggregates, it should be enforced by a process manager? (Thinking of the process manager as a way to specify a bound on all the valid event interleavings.) @ssboisen I was not aware of lightweight transactions! We ditched Cassandra for the moment and used MySQL instead (mainly because that's what our legacy system uses). We are also in the middle of cleaning up our ES framework to open source it.
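The per-aggregate ordering observation above can be sketched as an optimistic-concurrency append: each stream keeps its own version counter and an append succeeds only when the caller's expected version matches, while different streams never contend. A minimal in-memory sketch, not any library's actual API; `PerStreamOrdering` and its map-based store are illustrative assumptions:

```elixir
defmodule PerStreamOrdering do
  # The store is a map of stream_uuid => {version, events}. No global
  # total order is required: versions are tracked per stream, so the
  # $all stream can be any interleaving of the per-stream orders.
  def append(store, stream_uuid, expected_version, events) do
    {current, existing} = Map.get(store, stream_uuid, {0, []})

    if current == expected_version do
      # Expected version matches: accept the write and bump the version.
      {:ok, Map.put(store, stream_uuid, {current + length(events), existing ++ events})}
    else
      # A concurrent writer got there first: reject deterministically.
      {:error, :wrong_expected_version}
    end
  end
end
```

Two writers racing on the same aggregate conflict deterministically, and the loser can retry after reloading; writes to different aggregates never block each other.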
@ssboisen I was checking out Cassandra lightweight transactions, but I am unable to see how they help in the case of the event store. How do you model the tables? One row per aggregate?
@rosacris LWT works inside a partition, so one partition per aggregate and one row per batch. Something like
Commanded is currently restricted to run on a single node only.
Using a library such as Swarm would help to run on a cluster of nodes.
Must consider how to deal with split-brain scenarios.
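The split-brain concern surfaces concretely in Swarm: registered processes are expected to handle a small set of `{:swarm, ...}` messages so their state can be handed off when a process moves and merged when a partition heals. A minimal sketch; the counter state and the keep-the-higher-value merge are illustrative assumptions, not a recommended strategy:

```elixir
defmodule ClusterWorker do
  use GenServer

  def start_link(name), do: GenServer.start_link(__MODULE__, name)

  @impl true
  def init(name), do: {:ok, %{name: name, counter: 0}}

  # Swarm asks for the state before moving the process to another node.
  @impl true
  def handle_call({:swarm, :begin_handoff}, _from, state) do
    {:reply, {:resume, state}, state}
  end

  # The restarted process on the new node receives the handed-off state.
  @impl true
  def handle_cast({:swarm, :end_handoff, handed_off}, _state) do
    {:noreply, handed_off}
  end

  # After a netsplit heals, two copies may exist; the surviving one is
  # given the other's state to merge. Keeping the higher counter is an
  # arbitrary illustrative choice.
  @impl true
  def handle_cast({:swarm, :resolve_conflict, other}, state) do
    {:noreply, %{state | counter: max(state.counter, other.counter)}}
  end

  # The duplicate process is told to shut down.
  @impl true
  def handle_info({:swarm, :die}, state), do: {:stop, :shutdown, state}
end
```

Whatever merge strategy is chosen here is exactly the "deal with split brain" decision the issue body raises: one side's divergent changes must be reconciled or dropped.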