
Cluster Deployment fails completely after only one disconnected Node: AssociationError #3887

Closed
Kenji-Tanaka opened this issue Aug 21, 2019 · 7 comments · Fixed by #4088

Comments

@Kenji-Tanaka

Greetings.
I implemented a simple cluster application in Akka with Scala, which I also implemented in Akka.NET with C# for testing purposes. As soon as one disconnects a non-seed cluster node in Akka.NET, the cluster fails to deploy to all nodes that join at a later time. Strangely enough, the Akka.NET version does not work, although it corresponds exactly to the Akka version, which works fine.

I am using DotNet Core 2.2.401 under Windows 10 and the following packages:

  • Akka: 1.3.14
  • Akka.Cluster: 1.3.14

I created a repository that contains the code to reproduce this behaviour:
https://github.com/Kenji-Tanaka/AkkaNetClusterSample.git

Steps to reproduce:

  1. git clone https://github.com/Kenji-Tanaka/AkkaNetClusterSample.git
  2. run one seed: dotnet run seed
  3. when the seed is UP, run a node: dotnet run node
  4. after a few seconds, both print the time they were started, once per second
  5. press any key to shutdown the node
  6. start a new node with dotnet run node

Expected behaviour:

  • The node joins, an actor is deployed to it, and both print the time they were started every second again.

Actual behaviour:

  • The cluster logs warnings (AssociationError) and no actor is deployed.
    The log for the seed node is attached: log.txt

I guess I might be doing something wrong here. However, I don't see what it could be. Any hint would be gratefully received. Akka.NET is great, keep up the good work!

Thank You!

@Aaronontheweb
Member

Thanks for submitting this - we should take a look at the sample and see if it's a configuration issue of some kind. That's usually the culprit.

@Kenji-Tanaka
Author

The problem described above is still present in Akka.NET 1.3.15 using DotNet Core 3.0.
I wonder whether it can really be a configuration issue, since I use the same configuration in the original actor system implemented in Scala, where it works correctly.

@valdisz
Contributor

valdisz commented Nov 23, 2019

I'll look into this problem

@valdisz
Contributor

valdisz commented Nov 23, 2019

After a quick investigation here are some details:

In the sample, seed and worker nodes are started on random ports. When the worker node exits, the seed marks it as unreachable, but when the worker tries to rejoin, it uses a new port, and that's when the problem occurs.

If the worker uses a static port, everything works as expected. It looks like the problem is in the cluster-joining code.
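For reference, pinning the worker to a static port is a one-line config change. A minimal HOCON sketch, assuming the sample uses Akka.NET 1.3.x with the default DotNetty transport (port 0 means "pick a random port", which is what triggers the behaviour described above); the hostname and port values here are placeholders, not the sample's actual settings:

```hocon
akka {
  remote {
    dot-netty.tcp {
      hostname = "127.0.0.1"
      # A fixed, non-zero port makes the worker rejoin with the
      # same address after a restart, instead of a fresh random one.
      port = 8081
    }
  }
}
```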

@Aaronontheweb
Member

Yes, this is just a configuration issue - on the JVM they might have akka.cluster.allow-weakly-up-members set to on by default, which isn't currently the case in Akka.NET. As @valdisz points out, having the child node restart on a different port makes the cluster think that a new node is joining, which it won't allow while an unreachable node is still missing (and thus the cluster can't vote on any changes in membership).

I explain this in a lot of depth here and offer a few different solutions for tackling these types of problems: https://petabridge.com/blog/proper-care-of-akkadotnet-clusters/

@Kenji-Tanaka
Author

Excellent, when adding allow-weakly-up-members = on to the configuration, this example works like the one in Akka (Scala). Thanks also for the link to the article!
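For anyone landing here later, the fix described above is a single HOCON setting. A minimal sketch of where it lives in the config tree:

```hocon
akka {
  cluster {
    # Allow nodes that join while another member is unreachable to be
    # promoted to WeaklyUp, instead of being stuck in the Joining state
    # until the unreachable member is downed or comes back.
    allow-weakly-up-members = on
  }
}
```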

@Aaronontheweb
Member

@Kenji-Tanaka we've now updated Akka.NET to automatically have this setting on by default via #4087
