Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Akka.Cluster development - usable beta #400

Merged
merged 120 commits into from
Sep 19, 2014

Conversation

Aaronontheweb
Copy link
Member

After a couple month's of work, @smalldave and I have an initial beta of Akka.Cluster available to use.

We're in the process of porting the multi-node unit tests and doing bugfixes on the ClusterDaemon, but the core of Akka.Cluster is running smoothly and includes a couple of examples.

I'm going to leave this PR open for a bit as it explains some of the conceptual underpinnings of how clustering works, but also because it includes many new additions. Most of this work has been going into the akka-cluster branch but I think now is the right time to begin making this work available to other developers for review, testing, and having fun.

How Clustering Works

Akka.Cluster is easy to use, but it's important to understand conceptually how it works as this will have an impact on how you design your clusters when using it in production. It's a straight port of Akka's clustering module, which itself borrows concepts from verifiably successful clustering implementations such as Cassandra, Riak, and others.

I'll explain some of the terminology as we go.

First, here's what the initial state of a new Akka cluster looks like:

image

There are two types of nodes in an Akka.Cluster:

  • Seed nodes, which have well-known addresses that are described inside your Config
  • Non-seed nodes, which are dynamically deployed nodes that join the cluster by establishing contact with a seed nodes.

The majority of your nodes will typically be non-seed nodes for production clusters - this is because they're easier to manage from a configuration standpoint and it also requires fewer static IP addresses and deployment overhead.

So here's why seed nodes are important:

image

Seed nodes are used to start the conversation (known as gossip) between all nodes - every node has to contact them in order to join the cluster. You can have a cluster with a single seed node - if it goes down then you can't deploy any new nodes without changing your configuration, which is why you typically see 2-3 seed nodes in live deployments.

For this example, I gave all of the nodes different deployment configurations just to show how establishing a cluster works - in real-life these nodes would all be deployed with the same configuration usually.

image

Once nodes establish connectivity to each other, they begin exchanging information using what's known as a Gossip protocol - regularly delivered peer-to-peer messages that contain time-versioned information about the status of all known peers in the network. Each of these gossip messages includes a version number expressed as a Vector clock which gives each node the ability to determine which information arriving from the network is actually more recent than what the node's most recent gossip state already contains.

After a period of initial gossip, the nodes have begun to establish connectivity with each other and know now about at least 1 neighbor each. In addition, a leader node has been elected for every role in the cluster (more on that later.) The leader's job is to dictate which nodes are up and which nodes are down, based on the consensus of gossip information from the other peers.

image

Once the gossip has had a chance to propagate across the cluster, all nodes are now aware of each other and the ring of nodes are all considered Up - therefore you can start doing things such as cluster-aware routers.

image

Whoops! Looks like a node just died! At least two of the other nodes in the cluster (usually nodes that are not adjacent to it) noticed either because (1) C started missing cluster heartbeats or (2) the TCP connection to C died. The remaining nodes will gossip this information and mark the node as unreachable for now and will eventually prune it from the set of known nodes if the node doesn't return after a period of time.

If the node that died was the leader, a new leader would be elected.

image

And thus the cluster's network topology changes. New nodes may join later and other nodes may still leave, but the gossip protocol and interconnectivity between peers is what keeps the wheels turning no matter how things change.

Using Akka.Cluster

I haven't added build support for Akka.Cluster NuGet packages just yet, but for the time being you can manually reference Akka.Cluster and its dependencies (Akka.Remote and Microsoft.Bcl.Immutable.)

To run Akka.Cluster inside your application, use this as a base configuration:

akka {
            actor {
              provider = "Akka.Cluster.ClusterActorRefProvider, Akka.Cluster"
            }

            remote {
              log-remote-lifecycle-events = DEBUG
              helios.tcp {
                hostname = "127.0.0.1"
                port = 0
              }
            }

            cluster {
              seed-nodes = [
                "akka.tcp://ClusterSystem@127.0.0.1:2551",
                "akka.tcp://ClusterSystem@127.0.0.1:2552"]

              auto-down-unreachable-after = 10s
            }
          }

This gives you access to the Cluster actor system extension, specifies a couple of seed nodes, and the ClusterActorRefProvider gives you access to cluster routers, which are the primary way you integrate clusters into your existing applications.

Here's how you can configure a cluster router:

/myAppRouter {
 router = consistent-hashing-pool
      nr-of-instances = 100
      cluster {
        enabled = on
        max-nr-of-instances-per-node = 3
        allow-local-routees = off
      }
}

So this defines a consistent hashing pool router that automatically deploys actors on active nodes participating in the cluster, and it explicitly forbids running any actors locally on the calling machine (allow-local-routees = off).

I'll be including some more examples and better documentation on all of the cluster options available for each type of router.

Roles and Clusters

From one of the new samples added to examples/Cluster in this pull request:
akka.cluster.roles = [backend]

Clustering would not be very useful if you could only cluster identical pieces of software together - so that's why you can define roles for each node in your cluster. That way if you can have multiple different pieces of software built on Akka.NET still form a cluster together, but only assign work to nodes that fulfill specific roles - and this is something you can specify in your routing configuration:

/myAppRouter {
 router = consistent-hashing-pool
      nr-of-instances = 100
      cluster {
        enabled = on
        max-nr-of-instances-per-node = 3
        allow-local-routees = off
        use-role = backend
      }
}

Now this router will only assign work to nodes that have the backend role enabled in their configuration.

Wrap Up

@smalldave and I still have more work to do yet, and we'll be periodically submitting PRs to dev. In the meantime though, we would love it if you could run the samples, document bugs, or help us port the MultiNode testkit (located in Akka.Remote.TestKit.).

If you have any questions about this PR, put them here - if you have any questions about the design or usage please post on the Akka.NET users group or create an issue here.

Horusiath and others added 30 commits July 16, 2014 23:28
- Implemented OnMemberUpListener
- Implemented most of Cluster class
- Working on implementing ClusterDaemon
@rogeralsing
Copy link
Contributor

This is pure awesome sauce!

@Aaronontheweb
Copy link
Member Author

but the core of Akka.Cluster is running smoothly

To clarify: there's a lot of easily discoverable bugs (RACE CONDITIONS YEEEEEEEEEEEAH) you'll see right away in the examples, but the system recovers and forms the cluster and does its job :p

/// </summary>
public void TakeOverResponsibility(Address address)
{
foreach (var watching in Watching)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @smalldave So this is the troublemaker from your comment - got it.

Instead of making the underlying Watching collection immutable, could we iterate over a copy of it here / use a projected collection instead?

maybe do foreach(var watching in Watching.Where(x => x.Key.Path.Address.Equals(address)){ ... } ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I guess the projected collection may not work since it might still be yielded from the original collection as it iterates... We'll need to make a copy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, nevermind - I could just .ToList() the results and get a copy that way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep sounds sensible. The immutable collection is likely to be more efficient if changes to the data are infrequent.
I doubt it really matters here though.

@Aaronontheweb
Copy link
Member Author

Have a bit of a backup of Akka.Cluster stuff coming in - any objections if I merge this? We've documented a bunch of issues and we'll be fixing them subsequently.

@HCanber
Copy link
Contributor

HCanber commented Sep 19, 2014

Are there any tests that fails? Remove those. It's irritating enough with those tests that occasionally fails :)

I'm going to review the testkit changes. I think I saw something strange in there....

public static void Equivalent<T>(IEnumerable<T> expected, IEnumerable<T> actual)
{
Assert.True(expected.All(x => actual.Contains(x)) && actual.All(y => expected.Contains(y)));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a helper function used in internal tests and should not belong to the TestKit. Move it to XAssert class in Akka.Tests.Shared.Internals/Helpers/XunitHelpers.cs

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disagree. It's a translation of a public member that developers regularly use in MSTest, which I ported to XUnit in order to support set-equivalence testing for vector clocks and gossip (partially ordered.)

Developers who are working with Akka.Cluster in their own application will benefit from being able to use this for testing for partially ordered network events, particularly in Actors that subscribe directly to Cluster. It has direct benefit for XUnit of the end-users for the TestKit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree. I don't think it's our place to extend xUnit, MSTest, nUnit, MSpec or any other test framework to make it equivalent to any of the other's. If the user wants to extend xUnit there are a lot of other's who have done that already.

And XUnitAssertions is not intended to be used directly by the user. It's only intended to implement the bare necessities to make TestKit work. I'm going to move this to Akka.Tests.Shared.Internals/Helpers/XunitHelpers.cs.

@HCanber
Copy link
Contributor

HCanber commented Sep 19, 2014

Ok, I'm done reviewing.

Implement Remote Testkit controller
@Aaronontheweb
Copy link
Member Author

@HCanber thanks Hakan, I'll make these changes.

@Aaronontheweb
Copy link
Member Author

@HCanber a couple of the MultiJVM verification tests fail due to #399 at the moment

* Reversed some TestKit changes
* Moved Akka/Extensions.cs into Akka.Util.Internal (reflected in
namespace)
* Marked ArrayExtensions.cs as internal and moved into
Akka.Util.Internal
* Made HCanber's recommended fixes for Skip, Until ArrayExtensions
* Fixed but with ReservedActorRef equality operator
@Aaronontheweb
Copy link
Member Author

Not surprised there's a merge conflict now :p

I'm going to fix some of the bugs @smalldave and I discussed and then do a local merge with dev.

@Aaronontheweb
Copy link
Member Author

@smalldave made those changes to ClusterRemoteWatcher - the role-based Akka.Cluster example performs much better now. There's still a flurry of EndpointTerminated exceptions in the beginning but I think that's caused by the different actor systems racing to contact the seed nodes before the seed nodes have been able to start remoting themselves (fairly normal behavior given how the cluster's started in these examples.)

@Aaronontheweb
Copy link
Member Author

@smalldave and also: the collection modified errors are no longer thrown.

Aaronontheweb added a commit that referenced this pull request Sep 19, 2014
Akka.Cluster development - usable beta
@Aaronontheweb Aaronontheweb merged commit b002fa8 into akkadotnet:dev Sep 19, 2014
This was referenced Sep 20, 2014
This was referenced Oct 16, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants