Akka.Cluster development - usable beta #400

Aaronontheweb · 2014-09-17T04:10:47Z

After a couple month's of work, @smalldave and I have an initial beta of Akka.Cluster available to use.

We're in the process of porting the multi-node unit tests and doing bugfixes on the ClusterDaemon, but the core of Akka.Cluster is running smoothly and includes a couple of examples.

I'm going to leave this PR open for a bit as it explains some of the conceptual underpinnings of how clustering works, but also because it includes many new additions. Most of this work has been going into the akka-cluster branch but I think now is the right time to begin making this work available to other developers for review, testing, and having fun.

How Clustering Works

Akka.Cluster is easy to use, but it's important to understand conceptually how it works as this will have an impact on how you design your clusters when using it in production. It's a straight port of Akka's clustering module, which itself borrows concepts from verifiably successful clustering implementations such as Cassandra, Riak, and others.

I'll explain some of the terminology as we go.

First, here's what the initial state of a new Akka cluster looks like:

There are two types of nodes in an Akka.Cluster:

Seed nodes, which have well-known addresses that are described inside your Config
Non-seed nodes, which are dynamically deployed nodes that join the cluster by establishing contact with a seed nodes.

The majority of your nodes will typically be non-seed nodes for production clusters - this is because they're easier to manage from a configuration standpoint and it also requires fewer static IP addresses and deployment overhead.

So here's why seed nodes are important:

Seed nodes are used to start the conversation (known as gossip) between all nodes - every node has to contact them in order to join the cluster. You can have a cluster with a single seed node - if it goes down then you can't deploy any new nodes without changing your configuration, which is why you typically see 2-3 seed nodes in live deployments.

For this example, I gave all of the nodes different deployment configurations just to show how establishing a cluster works - in real-life these nodes would all be deployed with the same configuration usually.

Once nodes establish connectivity to each other, they begin exchanging information using what's known as a Gossip protocol - regularly delivered peer-to-peer messages that contain time-versioned information about the status of all known peers in the network. Each of these gossip messages includes a version number expressed as a Vector clock which gives each node the ability to determine which information arriving from the network is actually more recent than what the node's most recent gossip state already contains.

After a period of initial gossip, the nodes have begun to establish connectivity with each other and know now about at least 1 neighbor each. In addition, a leader node has been elected for every role in the cluster (more on that later.) The leader's job is to dictate which nodes are up and which nodes are down, based on the consensus of gossip information from the other peers.

Once the gossip has had a chance to propagate across the cluster, all nodes are now aware of each other and the ring of nodes are all considered Up - therefore you can start doing things such as cluster-aware routers.

Whoops! Looks like a node just died! At least two of the other nodes in the cluster (usually nodes that are not adjacent to it) noticed either because (1) C started missing cluster heartbeats or (2) the TCP connection to C died. The remaining nodes will gossip this information and mark the node as unreachable for now and will eventually prune it from the set of known nodes if the node doesn't return after a period of time.

If the node that died was the leader, a new leader would be elected.

And thus the cluster's network topology changes. New nodes may join later and other nodes may still leave, but the gossip protocol and interconnectivity between peers is what keeps the wheels turning no matter how things change.

Using Akka.Cluster

I haven't added build support for Akka.Cluster NuGet packages just yet, but for the time being you can manually reference Akka.Cluster and its dependencies (Akka.Remote and Microsoft.Bcl.Immutable.)

To run Akka.Cluster inside your application, use this as a base configuration:

akka {
            actor {
              provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"
            }

            remote {
              log-remote-lifecycle-events = DEBUG
              helios.tcp {
                hostname = "127.0.0.1"
                port = 0
              }
            }

            cluster {
              seed-nodes = [
                "akka.tcp://ClusterSystem@127.0.0.1:2551",
                "akka.tcp://ClusterSystem@127.0.0.1:2552"]

              auto-down-unreachable-after = 10s
            }
          }

This gives you access to the Cluster actor system extension, specifies a couple of seed nodes, and the ClusterActorRefProvider gives you access to cluster routers, which are the primary way you integrate clusters into your existing applications.

Here's how you can configure a cluster router:

/myAppRouter {
 router = consistent-hashing-pool
      nr-of-instances = 100
      cluster {
        enabled = on
        max-nr-of-instances-per-node = 3
        allow-local-routees = off
      }
}

So this defines a consistent hashing pool router that automatically deploys actors on active nodes participating in the cluster, and it explicitly forbids running any actors locally on the calling machine (allow-local-routees = off).

I'll be including some more examples and better documentation on all of the cluster options available for each type of router.

Roles and Clusters

From one of the new samples added to examples/Cluster in this pull request:
akka.cluster.roles = [backend]

Clustering would not be very useful if you could only cluster identical pieces of software together - so that's why you can define roles for each node in your cluster. That way if you can have multiple different pieces of software built on Akka.NET still form a cluster together, but only assign work to nodes that fulfill specific roles - and this is something you can specify in your routing configuration:

/myAppRouter {
 router = consistent-hashing-pool
      nr-of-instances = 100
      cluster {
        enabled = on
        max-nr-of-instances-per-node = 3
        allow-local-routees = off
        use-role = backend
      }
}

Now this router will only assign work to nodes that have the backend role enabled in their configuration.

Wrap Up

@smalldave and I still have more work to do yet, and we'll be periodically submitting PRs to dev. In the meantime though, we would love it if you could run the samples, document bugs, or help us port the MultiNode testkit (located in Akka.Remote.TestKit.).

If you have any questions about this PR, put them here - if you have any questions about the design or usage please post on the Akka.NET users group or create an issue here.

… of Async<unit>

Akka cluster

Akka.NET v0.6.3 - Release Notes

…pass

- Implemented OnMemberUpListener - Implemented most of Cluster class - Working on implementing ClusterDaemon

…lue equality

rogeralsing · 2014-09-17T07:44:49Z

This is pure awesome sauce!

Aaronontheweb · 2014-09-17T07:49:45Z

but the core of Akka.Cluster is running smoothly

To clarify: there's a lot of easily discoverable bugs (RACE CONDITIONS YEEEEEEEEEEEAH) you'll see right away in the examples, but the system recovers and forms the cluster and does its job :p

Aaronontheweb · 2014-09-18T19:12:46Z

src/core/Akka.Cluster/ClusterRemoteWatcher.cs

+        /// </summary>
+        public void TakeOverResponsibility(Address address)
+        {
+            foreach (var watching in Watching)


cc @smalldave So this is the troublemaker from your comment - got it.

Instead of making the underlying Watching collection immutable, could we iterate over a copy of it here / use a projected collection instead?

maybe do foreach(var watching in Watching.Where(x => x.Key.Path.Address.Equals(address)){ ... } ?

Actually, I guess the projected collection may not work since it might still be yielded from the original collection as it iterates... We'll need to make a copy.

Ah, nevermind - I could just .ToList() the results and get a copy that way.

Yep sounds sensible. The immutable collection is likely to be more efficient if changes to the data are infrequent.
I doubt it really matters here though.

Aaronontheweb · 2014-09-19T09:41:51Z

Have a bit of a backup of Akka.Cluster stuff coming in - any objections if I merge this? We've documented a bunch of issues and we'll be fixing them subsequently.

HCanber · 2014-09-19T11:23:20Z

Are there any tests that fails? Remove those. It's irritating enough with those tests that occasionally fails :)

I'm going to review the testkit changes. I think I saw something strange in there....

HCanber · 2014-09-19T11:30:23Z

src/contrib/testkits/Akka.TestKit.Xunit/XunitAssertions.cs

+        public static void Equivalent<T>(IEnumerable<T> expected, IEnumerable<T> actual)
+        {
+            Assert.True(expected.All(x => actual.Contains(x)) && actual.All(y => expected.Contains(y)));
+        }


This is a helper function used in internal tests and should not belong to the TestKit. Move it to XAssert class in Akka.Tests.Shared.Internals/Helpers/XunitHelpers.cs

Disagree. It's a translation of a public member that developers regularly use in MSTest, which I ported to XUnit in order to support set-equivalence testing for vector clocks and gossip (partially ordered.)

Developers who are working with Akka.Cluster in their own application will benefit from being able to use this for testing for partially ordered network events, particularly in Actors that subscribe directly to Cluster. It has direct benefit for XUnit of the end-users for the TestKit.

I disagree. I don't think it's our place to extend xUnit, MSTest, nUnit, MSpec or any other test framework to make it equivalent to any of the other's. If the user wants to extend xUnit there are a lot of other's who have done that already.

And XUnitAssertions is not intended to be used directly by the user. It's only intended to implement the bare necessities to make TestKit work. I'm going to move this to Akka.Tests.Shared.Internals/Helpers/XunitHelpers.cs.

HCanber · 2014-09-19T12:21:55Z

Ok, I'm done reviewing.

Implement Remote Testkit controller

Aaronontheweb · 2014-09-19T17:19:53Z

@HCanber thanks Hakan, I'll make these changes.

Aaronontheweb · 2014-09-19T18:15:35Z

@HCanber a couple of the MultiJVM verification tests fail due to #399 at the moment

* Reversed some TestKit changes * Moved Akka/Extensions.cs into Akka.Util.Internal (reflected in namespace) * Marked ArrayExtensions.cs as internal and moved into Akka.Util.Internal * Made HCanber's recommended fixes for Skip, Until ArrayExtensions * Fixed but with ReservedActorRef equality operator

Aaronontheweb · 2014-09-19T20:12:52Z

Not surprised there's a merge conflict now :p

I'm going to fix some of the bugs @smalldave and I discussed and then do a local merge with dev.

….Cluster.Tests

Aaronontheweb · 2014-09-19T20:32:33Z

@smalldave made those changes to ClusterRemoteWatcher - the role-based Akka.Cluster example performs much better now. There's still a flurry of EndpointTerminated exceptions in the beginning but I think that's caused by the different actor systems racing to contact the seed nodes before the seed nodes have been able to start remoting themselves (fairly normal behavior given how the cluster's started in these examples.)

Aaronontheweb · 2014-09-19T20:32:51Z

@smalldave and also: the collection modified errors are no longer thrown.

Akka.Cluster development - usable beta

Horusiath and others added 30 commits July 16, 2014 23:28

Akka.FSharp api: fixed ask operator (<?) to return Async<obj> instead…

28132b2

… of Async<unit>

Port Member, VectorClock, Reachability and Gossip

17494bd

Merge pull request akkadotnet#219 from smalldave/akka-cluster

7ef0413

Akka cluster

ClusterDomainEventPublisher and more

94f2dc2

Merge pull request akkadotnet#268 from Aaronontheweb/master

90b7f5a

Akka.NET v0.6.3 - Release Notes

Implement ClusterDaemon (unfinished)

e38edb0

merged with dave's changes

1d203e7

fixed some post-merge compilation issues

aa6bd78

ported all Akka.Cluster tests to XUnit

1105144

Whoops, left out some files

8766884

Merge branch 'dev' into akka-cluster

02cf70f

filled in some missing assertion types to get other Cluster tests to …

bd05797

…pass

implemented Cluster methods

65ba882

- Implemented OnMemberUpListener - Implemented most of Cluster class - Working on implementing ClusterDaemon

implemented ClusterReadView

28d7597

implemented GossipStats and JoinSeedNodeProcess

3112be0

implemented FirstSeedNodeProcess

a14a8d2

implemented Cluster wire formats

9dabca3

adjusted the namespace on ClusterMessages

1ec297e

implemented the first half of the ClusterMessageSerializer

5bfa7be

finished most of the ClusterMessageSerializer for Gossip

ab07d3d

working on adding ClusterMessageSerializerSpec

bb83623

hooking up the reference.conf for cluster

075d8d0

Merge branch 'dev' into akka-cluster

820308b

integrated Cluster.conf into unit tests and Akka.Cluster

a2ff04c

overrode the Equals operator for a number of messages to check for va…

c0c54ea

…lue equality

finished implemented ClusterMessageSerializerSpec

5fe7224

implementing ClusterMetrics

bd4c91b

Merge branch 'dev' into akka-cluster

5a7f4f7

finished first implementation of the ClusterMetricsCollector

c665413

Merge remote-tracking branch 'upstream/dev' into akka-cluster

d52222b

Implement Remote Testkit controller

c225b3b

Aaronontheweb reviewed Sep 18, 2014
View reviewed changes

HCanber reviewed Sep 19, 2014
View reviewed changes

Merge pull request #7 from smalldave/akka-cluster-controller

7aecf3b

Implement Remote Testkit controller

Aaronontheweb added 2 commits September 19, 2014 13:15

resolved mergeconflict with dev (/src/core/Akka/Actor/Futures.cs)

56ff35d

integrated testkit changes (implicit sender akkadotnet#388) into Akka…

cca3063

….Cluster.Tests

resolved Collection modified while enumerating bug

2017979

Aaronontheweb added a commit that referenced this pull request Sep 19, 2014

Merge pull request #400 from Aaronontheweb/akka-cluster

b002fa8

Akka.Cluster development - usable beta

Aaronontheweb merged commit b002fa8 into akkadotnet:dev Sep 19, 2014

This was referenced Sep 20, 2014

TestKit changes #406

Merged

Configurable mailboxes 404 #405

Closed

This was referenced Oct 16, 2014

Akka.NET v0.7 #479

Merged

Akka v0.7 live release #483

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Akka.Cluster development - usable beta #400

Akka.Cluster development - usable beta #400

Aaronontheweb commented Sep 17, 2014

rogeralsing commented Sep 17, 2014

Aaronontheweb commented Sep 17, 2014

Aaronontheweb Sep 18, 2014

Aaronontheweb Sep 18, 2014

Aaronontheweb Sep 18, 2014

smalldave Sep 18, 2014

Aaronontheweb commented Sep 19, 2014

HCanber commented Sep 19, 2014

HCanber Sep 19, 2014

Aaronontheweb Sep 19, 2014

HCanber Sep 20, 2014

HCanber commented Sep 19, 2014

Aaronontheweb commented Sep 19, 2014

Aaronontheweb commented Sep 19, 2014

Aaronontheweb commented Sep 19, 2014

Aaronontheweb commented Sep 19, 2014

Aaronontheweb commented Sep 19, 2014

Akka.Cluster development - usable beta #400

Akka.Cluster development - usable beta #400

Conversation

Aaronontheweb commented Sep 17, 2014

How Clustering Works

Using Akka.Cluster

Roles and Clusters

Wrap Up

rogeralsing commented Sep 17, 2014

Aaronontheweb commented Sep 17, 2014

Aaronontheweb Sep 18, 2014

Choose a reason for hiding this comment

Aaronontheweb Sep 18, 2014

Choose a reason for hiding this comment

Aaronontheweb Sep 18, 2014

Choose a reason for hiding this comment

smalldave Sep 18, 2014

Choose a reason for hiding this comment

Aaronontheweb commented Sep 19, 2014

HCanber commented Sep 19, 2014

HCanber Sep 19, 2014

Choose a reason for hiding this comment

Aaronontheweb Sep 19, 2014

Choose a reason for hiding this comment

HCanber Sep 20, 2014

Choose a reason for hiding this comment

HCanber commented Sep 19, 2014

Aaronontheweb commented Sep 19, 2014

Aaronontheweb commented Sep 19, 2014

Aaronontheweb commented Sep 19, 2014

Aaronontheweb commented Sep 19, 2014

Aaronontheweb commented Sep 19, 2014