Static quorum ring distribution strategy #38

Merged (20 commits, Aug 23, 2017)

Commits

- `e87b06f` Fix ring strategy module alias (slashdotdash, Aug 14, 2017)
- `80614e2` Elixir v1.5 (slashdotdash, Aug 14, 2017)
- `3291a47` Fix broken registry tests (slashdotdash, Aug 14, 2017)
- `13e2dbe` Create `StaticQuorumRing` distribution strategy (slashdotdash, Aug 17, 2017)
- `df72d71` Handle pending track requests (slashdotdash, Aug 17, 2017)
- `d36fd31` Restart downed process due to unavailable node once quorum reached (slashdotdash, Aug 17, 2017)
- `4349d55` Track pending registration when node dies but no node available to re… (slashdotdash, Aug 17, 2017)
- `a3c63aa` Track pending registrations when node unavailable (slashdotdash, Aug 17, 2017)
- `46d596b` Static quorum ring strategy documentation (slashdotdash, Aug 17, 2017)
- `2eead4e` Count nodes inside inner ring (slashdotdash, Aug 17, 2017)
- `0744cdf` Split brain quorum test (slashdotdash, Aug 18, 2017)
- `57559f9` Add an optional `:timeout` to Swarm.register_name function (slashdotdash, Aug 21, 2017)
- `32d4410` Use `do_track/2` function instead of `handle_call` (slashdotdash, Aug 21, 2017)
- `f928e19` Include static quorum strategy in README (slashdotdash, Aug 21, 2017)
- `87f3aad` `Swarm.Distribution.Strategy.key_to_node/2` may return `:undefined` node (slashdotdash, Aug 21, 2017)
- `694eecc` Call `handle_topology_change/2` on monitor `:noconnection` (slashdotdash, Aug 21, 2017)
- `45e710b` Revert to Elixir v1.3 (slashdotdash, Aug 23, 2017)
- `9b146a6` Add typespec for `Swarm.register_name/4` (slashdotdash, Aug 23, 2017)
- `2333731` Include `sync_nodes_*` config in static quorum module docs (slashdotdash, Aug 23, 2017)
- `2213793` Pull request feedback (slashdotdash, Aug 23, 2017)

Changes from all commits

73 changes: 58 additions & 15 deletions README.md
@@ -54,27 +54,70 @@ end
## Restrictions

- auto-balancing of processes in the cluster requires registrations to be done via
`register_name/5`, which takes module/function/args params, and handles starting
the process for you. The MFA must return `{:ok, pid}`.
This is how Swarm handles process handoff between nodes, and automatic restarts when nodedown
events occur and the cluster topology changes.


## Consistency Guarantees

Like any distributed system, a choice must be made in terms of guarantees provided. You can choose between
availability or consistency during a network partition by selecting the appropriate process distribution strategy.

Swarm provides two strategies for you to use:

- #### `Swarm.Distribution.Ring`

This strategy favors availability over consistency; it is eventually consistent, as network
partitions, when healed, are resolved by asking any copies of a given name that live on nodes
where they don't belong to shut down.

Network partitions result in all partitions running an instance of processes created with Swarm.
Swarm was designed for use in an IoT platform, where process names are generally based on physical
device ids, and as such, the consistency issue is less of a problem. If events get routed to two
separate partitions, it's generally not an issue if those events are for the same device. However
this is clearly not ideal in all situations. Swarm also aims to be fast, so registrations and
lookups must be as low latency as possible, even when the number of processes in the registry grows
very large. This is achieved without consensus by using a consistent hash of the name which
deterministically defines which node a process belongs on, and all requests to start a process on
that node will be serialized through that node to prevent conflicts.

This is the default strategy and requires no configuration.

- #### `Swarm.Distribution.StaticQuorumRing`

A quorum is the minimum number of nodes that a distributed cluster must have connected in order to
be allowed to perform an operation. This can be used to enforce consistent operation in a
distributed system.

You configure the quorum size by defining the minimum number of nodes that must be connected in the
cluster to allow process registration and distribution. If there are fewer nodes currently
available than the quorum size, any calls to `Swarm.register_name/5` will block until enough nodes
have started.

In a network partition, the partition containing at least the quorum size number of nodes will
continue operation. Processes running on the other side of the split will be stopped and restarted
on the active side. This ensures that only one instance of a registered process is running in the
cluster.

You must configure this strategy and its minimum quorum size using the `:static_quorum_size` setting:

```elixir
config :swarm,
distribution_strategy: Swarm.Distribution.StaticQuorumRing,
static_quorum_size: 5
```

The quorum size should be set to half the cluster size, plus one node: a three node cluster has a
quorum size of two, a five node cluster three, and a nine node cluster five (see the sketch below).
You *must* not add more than (2 x quorum size) - 1 nodes to the cluster, as an even network split
could then leave both partitions with a quorum, allowing both to continue operation.

Processes are distributed amongst the cluster using the same consistent hash of their name as in
the ring strategy above.

This strategy is a good choice when you have a fixed number of nodes in the cluster.
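
The sizing rule above is plain integer arithmetic. As a minimal sketch (the `QuorumSizing` module
is illustrative only, not part of Swarm):

```elixir
defmodule QuorumSizing do
  # Majority quorum: just over half of the expected cluster size.
  def quorum_size(cluster_size) when cluster_size > 0, do: div(cluster_size, 2) + 1
end

QuorumSizing.quorum_size(3) #=> 2
QuorumSizing.quorum_size(5) #=> 3
QuorumSizing.quorum_size(9) #=> 5
```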

## Clustering

@@ -107,13 +150,13 @@
## Registration/Process Grouping

Swarm is intended to be used by registering processes *before* they are created, and letting Swarm start
them for you on the proper node in the cluster. This is done via `Swarm.register_name/5`. You may also register
processes the normal way, i.e. `GenServer.start_link({:via, :swarm, name}, ...)`. Swarm will manage these
registrations, and replicate them across the cluster, however these processes will not be moved in response
to cluster topology changes.
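
For example, a registration might look like the following sketch, where `MyApp.Worker` is a
hypothetical module whose `start_link/1` returns `{:ok, pid}`:

```elixir
# Swarm starts the process on the node chosen by the distribution strategy.
{:ok, pid} = Swarm.register_name({:worker, 42}, MyApp.Worker, :start_link, [42])

# An optional fifth argument bounds how long the call may block (default :infinity),
# which matters when the static quorum strategy is still waiting for enough nodes.
{:ok, pid} = Swarm.register_name({:worker, 43}, MyApp.Worker, :start_link, [43], 5_000)
```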

Swarm also offers process grouping, similar to the way `gproc` does properties. You "join" a process to a group
after it's started (beware of doing so in `init/1` outside of a Task, or it will deadlock), with `Swarm.join/2`.
You can then publish messages (i.e. `cast`) with
`Swarm.publish/2`, and/or call all processes in a group and collect results (i.e. `call`) with `Swarm.multi_call/2` or
`Swarm.multi_call/3`. Leaving a group can be done with `Swarm.leave/2`, but will automatically be done when a process
7 changes: 6 additions & 1 deletion lib/swarm.ex
@@ -36,9 +36,14 @@ defmodule Swarm do
This version also returns an ok tuple with the pid if it registers successfully,
or an error tuple if registration fails. You cannot use this with processes which
are already started; the process must be started by `:swarm`.

You can optionally provide a `:timeout` value to limit the duration of blocking calls.
The default value is `:infinity` to block indefinitely.
"""
- @spec register_name(term, atom(), atom(), [term]) :: {:ok, pid} | {:error, term}
- defdelegate register_name(name, m, f, a), to: Swarm.Registry, as: :register
@spec register_name(term, atom(), atom(), [term], non_neg_integer() | :infinity) :: {:ok, pid} | {:error, term}
def register_name(name, m, f, a, timeout \\ :infinity)
def register_name(name, m, f, a, timeout), do: Swarm.Registry.register(name, m, f, a, timeout)

@doc """
Unregisters the given name from the registry.
92 changes: 92 additions & 0 deletions lib/swarm/distribution/static_quorum_ring.ex
@@ -0,0 +1,92 @@
defmodule Swarm.Distribution.StaticQuorumRing do
@moduledoc """
A quorum is the minimum number of nodes that a distributed cluster must have connected in order to be
allowed to perform an operation. This can be used to enforce consistent operation in a distributed system.

## Quorum size

You must configure the distribution strategy and its quorum size using the `:static_quorum_size` setting:

config :swarm,
distribution_strategy: Swarm.Distribution.StaticQuorumRing,
static_quorum_size: 5

It defines the minimum number of nodes that must be connected in the cluster to allow process
registration and distribution.

If there are fewer nodes currently available than the quorum size, any calls to
`Swarm.register_name/5` will block until enough nodes have started.

**Owner** (inline review comment):

It may be worth discussing here the use of the kernel config options :sync_nodes_mandatory, :sync_nodes_optional, and :sync_nodes_timeout. These ensure the required and optional members of the cluster are connected when the runtime boots and before any applications start, it's particularly useful for use cases this strategy is designed around (i.e. the cluster members are known in advance). The mandatory and optional settings take a list of nodes, and the timeout setting takes an integer or :infinity. You can configure it like any other app, e.g.:

config :kernel,
  sync_nodes_mandatory: [:"node1@192.168.1.1", :"node2@192.168.1.2"],
  sync_nodes_timeout: 60_000

**Collaborator** (author):

That's a useful feature I was unaware of.

**Owner**:

It is for sure :). The only caveat to the above is that the configuration needs to be present when the VM boots, so running under mix, you need to pass --erl "-config path/to/sys.config" and convert the configuration I mentioned to Erlang terms, e.g.:

[{kernel, [{sync_nodes_mandatory, ['node1@192.168.1.1', ...]},
           {sync_nodes_timeout, 60000}]}].

Using the Mix config files works for releases though.


You can configure the `:kernel` application to wait for cluster formation before starting your
application during node startup. The `sync_nodes_optional` configuration specifies which nodes
to attempt to connect to within the `sync_nodes_timeout` window, defined in milliseconds, before
continuing with startup. There is also a `sync_nodes_mandatory` setting which can be used to
enforce that all listed nodes are connected within the timeout window, or else the node terminates.

config :kernel,
sync_nodes_optional: [:"node1@192.168.1.1", :"node2@192.168.1.2"],
sync_nodes_timeout: 60_000

The `sync_nodes_timeout` can be configured as `:infinity` to wait indefinitely for all nodes to
connect. All involved nodes must have the same value for `sync_nodes_timeout`.

### Example

In a 9 node cluster you would configure the `:static_quorum_size` as 5. If there is a network split
of 4 and 5 nodes, processes on the side with 5 nodes will continue running but processes on the
other 4 nodes will be stopped.

Be aware that in the running 5 node cluster, no more failures can be handled because the
remaining cluster size would be less than 5. In the case of another failure in that 5 node
cluster all running processes will be stopped.
"""

use Swarm.Distribution.Strategy

alias Swarm.Distribution.StaticQuorumRing

defstruct [:static_quorum_size, :ring]

def create do
%StaticQuorumRing{
static_quorum_size: Application.get_env(:swarm, :static_quorum_size, 2),
ring: HashRing.new(),
}
end

def add_node(quorum, node) do
%StaticQuorumRing{quorum |
ring: HashRing.add_node(quorum.ring, node),
}
end

def add_node(quorum, node, weight) do
%StaticQuorumRing{quorum |
ring: HashRing.add_node(quorum.ring, node, weight),
}
end

def add_nodes(quorum, nodes) do
%StaticQuorumRing{quorum |
ring: HashRing.add_nodes(quorum.ring, nodes),
}
end

def remove_node(quorum, node) do
%StaticQuorumRing{quorum |
ring: HashRing.remove_node(quorum.ring, node),
}
end

@doc """
Maps a key to a specific node via the current distribution strategy.

If there are fewer available nodes in the cluster than the static quorum size, it returns `:undefined`.
"""
def key_to_node(%StaticQuorumRing{static_quorum_size: static_quorum_size, ring: ring}, key) do
case length(ring.nodes) do
node_count when node_count < static_quorum_size -> :undefined
_ -> HashRing.key_to_node(ring, key)
end
end
end
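
Taken together with `create/0` and `add_node/2` above, the quorum check in `key_to_node/2` behaves
roughly as in this sketch (node names are made up, and `:static_quorum_size` is assumed to be unset
so the default of 2 applies):

```elixir
alias Swarm.Distribution.StaticQuorumRing

# With a single node the ring is below the quorum of 2, so no node is chosen.
quorum = StaticQuorumRing.create()
quorum = StaticQuorumRing.add_node(quorum, :"node1@127.0.0.1")
StaticQuorumRing.key_to_node(quorum, {:worker, 1})
#=> :undefined

# A second node satisfies the quorum, so keys map to a node via the consistent hash.
quorum = StaticQuorumRing.add_node(quorum, :"node2@127.0.0.1")
StaticQuorumRing.key_to_node(quorum, {:worker, 1})
#=> :"node1@127.0.0.1" or :"node2@127.0.0.1"
```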
4 changes: 2 additions & 2 deletions lib/swarm/distribution/strategy.ex
@@ -13,7 +13,7 @@ defmodule Swarm.Distribution.Strategy do
process for storing the distribution state, because it has the potential to become a bottleneck otherwise,
however this is really up to the needs of your situation, just know that you can go either way.
"""
- alias Swarm.Distribution.Strategy.Ring, as: RingStrategy
alias Swarm.Distribution.Ring, as: RingStrategy

defmacro __using__(_) do
quote do
@@ -34,7 +34,7 @@
@callback add_node(strategy, node, weight) :: strategy | {:error, reason}
@callback add_nodes(strategy, nodelist) :: strategy | {:error, reason}
@callback remove_node(strategy, node) :: strategy | {:error, reason}
- @callback key_to_node(strategy, key) :: node()
@callback key_to_node(strategy, key) :: node() | :undefined

def create(), do: strategy_module().create()
def create(node), do: strategy_module().add_node(create(), node)
2 changes: 1 addition & 1 deletion lib/swarm/registry.ex
@@ -9,7 +9,7 @@ defmodule Swarm.Registry do
## Public API

defdelegate register(name, pid), to: Tracker, as: :track
- defdelegate register(name, module, fun, args), to: Tracker, as: :track
defdelegate register(name, module, fun, args, timeout), to: Tracker, as: :track

@spec unregister(term) :: :ok
def unregister(name) do