Static quorum ring distribution strategy #38
Merged: bitwalker merged 20 commits into bitwalker:master from slashdotdash:feature/split-brain on Aug 23, 2017
Commits
e87b06f Fix ring strategy module alias (slashdotdash)
80614e2 Elixir v1.5 (slashdotdash)
3291a47 Fix broken registry tests (slashdotdash)
13e2dbe Create `StaticQuorumRing` distribution strategy (slashdotdash)
df72d71 Handle pending track requests (slashdotdash)
d36fd31 Restart downed process due to unavailable node once quorum reached (slashdotdash)
4349d55 Track pending registration when node dies but no node available to re… (slashdotdash)
a3c63aa Track pending registrations when node unavailable (slashdotdash)
46d596b Static quorum ring strategy documentation (slashdotdash)
2eead4e Count nodes inside inner ring (slashdotdash)
0744cdf Split brain quorum test (slashdotdash)
57559f9 Add an optional `:timeout` to Swarm.register_name function (slashdotdash)
32d4410 Use `do_track/2` function instead of `handle_call` (slashdotdash)
f928e19 Include static quorum strategy in README (slashdotdash)
87f3aad `Swarm.Distribution.Strategy.key_to_node/2` may return `:undefined` node (slashdotdash)
694eecc Call `handle_topology_change/2` on monitor `:noconnection` (slashdotdash)
45e710b Revert to Elixir v1.3 (slashdotdash)
9b146a6 Add typespec for `Swarm.register_name/4` (slashdotdash)
2333731 Include `sync_nodes_*` config in static quorum module docs (slashdotdash)
2213793 Pull request feedback (slashdotdash)
@@ -0,0 +1,92 @@
defmodule Swarm.Distribution.StaticQuorumRing do
  @moduledoc """
  A quorum is the minimum number of nodes that a distributed cluster has to obtain in order to be
  allowed to perform an operation. This can be used to enforce consistent operation in a distributed system.

  ## Quorum size

  You must configure the distribution strategy and its quorum size using the `:static_quorum_size` setting:

      config :swarm,
        distribution_strategy: Swarm.Distribution.StaticQuorumRing,
        static_quorum_size: 5

  It defines the minimum number of nodes that must be connected in the cluster to allow process
  registration and distribution.

  If there are fewer nodes currently available than the quorum size, any calls to
  `Swarm.register_name/5` will block until enough nodes have started.

  You can configure the `:kernel` application to wait for cluster formation before starting your
  application during node start up. The `sync_nodes_optional` configuration specifies which nodes
  to attempt to connect to within the `sync_nodes_timeout` window, defined in milliseconds, before
  continuing with startup. There is also a `sync_nodes_mandatory` setting which can be used to
  enforce all nodes are connected within the timeout window or else the node terminates.

      config :kernel,
        sync_nodes_optional: [:"node1@192.168.1.1", :"node2@192.168.1.2"],
        sync_nodes_timeout: 60_000

  The `sync_nodes_timeout` can be configured as `:infinity` to wait indefinitely for all nodes to
  connect. All involved nodes must have the same value for `sync_nodes_timeout`.

  ### Example

  In a 9 node cluster you would configure the `:static_quorum_size` as 5. If there is a network split
  of 4 and 5 nodes, processes on the side with 5 nodes will continue running but processes on the
  other 4 nodes will be stopped.

  Be aware that in the running 5 node cluster, no more failures can be handled because the
  remaining cluster size would be less than 5. In the case of another failure in that 5 node
  cluster all running processes will be stopped.
  """

  use Swarm.Distribution.Strategy

  alias Swarm.Distribution.StaticQuorumRing

  defstruct [:static_quorum_size, :ring]

  def create do
    %StaticQuorumRing{
      static_quorum_size: Application.get_env(:swarm, :static_quorum_size, 2),
      ring: HashRing.new(),
    }
  end

  def add_node(quorum, node) do
    %StaticQuorumRing{quorum |
      ring: HashRing.add_node(quorum.ring, node),
    }
  end

  def add_node(quorum, node, weight) do
    %StaticQuorumRing{quorum |
      ring: HashRing.add_node(quorum.ring, node, weight),
    }
  end

  def add_nodes(quorum, nodes) do
    %StaticQuorumRing{quorum |
      ring: HashRing.add_nodes(quorum.ring, nodes),
    }
  end

  def remove_node(quorum, node) do
    %StaticQuorumRing{quorum |
      ring: HashRing.remove_node(quorum.ring, node),
    }
  end

  @doc """
  Maps a key to a specific node via the current distribution strategy.

  If the available nodes in the cluster are fewer than the minimum node count it returns `:undefined`.
  """
  def key_to_node(%StaticQuorumRing{static_quorum_size: static_quorum_size, ring: ring}, key) do
    case length(ring.nodes) do
      node_count when node_count < static_quorum_size -> :undefined
      _ -> HashRing.key_to_node(ring, key)
    end
  end
end
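The 9-node example from the module documentation can be walked through with a small standalone sketch. This is not the code from the PR: `QuorumSketch` and the `node1@example` style node names are hypothetical, and the consistent hash ring is swapped for a plain `:erlang.phash2/2` lookup so the script runs without the `HashRing` dependency. Only the quorum check in `key_to_node/2` mirrors the diff above.

```elixir
defmodule QuorumSketch do
  @moduledoc false
  # Hypothetical stand-in for the strategy struct: a quorum size plus the
  # list of nodes currently visible on this side of the cluster.
  defstruct quorum_size: 2, nodes: []

  def new(quorum_size), do: %__MODULE__{quorum_size: quorum_size}

  def add_nodes(%__MODULE__{nodes: nodes} = quorum, new_nodes) do
    %__MODULE__{quorum | nodes: Enum.uniq(nodes ++ new_nodes)}
  end

  def remove_node(%__MODULE__{nodes: nodes} = quorum, node) do
    %__MODULE__{quorum | nodes: List.delete(nodes, node)}
  end

  # Same shape as the quorum check in the diff: below the quorum size no node
  # is returned. The node lookup itself is simplified (assumption): a modulo
  # hash over the sorted node list instead of a consistent hash ring.
  def key_to_node(%__MODULE__{quorum_size: size, nodes: nodes}, key) do
    case length(nodes) do
      count when count < size -> :undefined
      count -> nodes |> Enum.sort() |> Enum.at(:erlang.phash2(key, count))
    end
  end
end

# Nine hypothetical nodes, split 5/4 by a network partition.
all_nodes = for i <- 1..9, do: :"node#{i}@example"
{majority, minority} = Enum.split(all_nodes, 5)

majority_side = QuorumSketch.new(5) |> QuorumSketch.add_nodes(majority)
minority_side = QuorumSketch.new(5) |> QuorumSketch.add_nodes(minority)

# The side with 4 nodes is below the quorum size, so no node is assigned.
IO.inspect(QuorumSketch.key_to_node(minority_side, :worker))
# → :undefined

# The side with 5 nodes still maps keys to one of its own nodes.
IO.inspect(QuorumSketch.key_to_node(majority_side, :worker) in majority)
# → true

# One more failure on the 5 node side drops it below the quorum as well.
degraded = QuorumSketch.remove_node(majority_side, hd(majority))
IO.inspect(QuorumSketch.key_to_node(degraded, :worker))
# → :undefined
```

With a quorum size of 5 in a 9 node cluster, at most one side of any partition can reach the quorum, which is why the setting is typically chosen as a strict majority of the expected cluster size.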
It may be worth discussing here the use of the `kernel` config options `:sync_nodes_mandatory`, `:sync_nodes_optional`, and `:sync_nodes_timeout`. These ensure the required and optional members of the cluster are connected when the runtime boots and before any applications start. This is particularly useful for the use cases this strategy is designed around (i.e. the cluster members are known in advance). The `mandatory` and `optional` settings take a list of nodes, and the `timeout` setting takes an integer or `:infinity`. You can configure it like any other app.
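A minimal sketch of such a configuration, assuming a Mix config file and three hypothetical node names. `import Config` assumes Elixir 1.9 or later; the Elixir versions targeted by this PR would use `use Mix.Config` instead.

```elixir
# config/config.exs (sketch; node names are hypothetical)
import Config

config :kernel,
  # Nodes that must be connected within the timeout, or this node terminates.
  sync_nodes_mandatory: [:"node1@192.168.1.1"],
  # Nodes to attempt to connect to, without failing if they are unreachable.
  sync_nodes_optional: [:"node2@192.168.1.2", :"node3@192.168.1.3"],
  # Milliseconds to wait for the nodes above, or :infinity.
  sync_nodes_timeout: 60_000
```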
That's a useful feature I was unaware of.
It is for sure :). The only caveat to the above is that the configuration needs to be present when the VM boots, so running under mix, you need to pass `--erl "-config path/to/sys.config"` and convert the configuration I mentioned to Erlang terms. Using the Mix config files works for releases though.
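One way to do that conversion is to let the runtime print the terms itself. The sketch below is an assumption rather than the reviewer's snippet: it takes the `:kernel` settings as an Elixir keyword list (node names are hypothetical), formats them as Erlang terms with `:io_lib.format/2`, and writes them to a `sys.config` file in the system temporary directory.

```elixir
# The :kernel settings as an Elixir keyword list (node names are hypothetical).
config = [
  kernel: [
    sync_nodes_optional: [:"node1@192.168.1.1", :"node2@192.168.1.2"],
    sync_nodes_timeout: 60_000
  ]
]

# "~p" pretty-prints the value using Erlang term syntax; the trailing "."
# terminates the term, as a sys.config file requires.
contents = "~p.~n" |> :io_lib.format([config]) |> IO.iodata_to_binary()

# Written to the temp directory here; in a project you would choose the path
# you later pass via --erl "-config path/to/sys.config".
path = Path.join(System.tmp_dir!(), "sys.config")
File.write!(path, contents)

IO.puts(contents)
```

Reading the file back with `:file.consult/1` is a quick check that it contains a single valid Erlang term equal to the original keyword list.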