Nodes Auto-downing themselves #2903
Do you have `cluster.auto-down-unreachable-after` set?
TL;DR: nodes can't Down themselves without being explicitly given a Down command. Akka.NET never does this by default. It can only happen if you explicitly turn this setting on.
Right, yes, I have cluster.auto-down-unreachable-after. So if I get rid of this, the error won't happen?
Yes. Your node will not be Downed. However, they will still Disassociate.
Ok. Thank you.
@Danthar When they "Disassociate", does that mean they cannot talk to each other, i.e. they are "logically" unreachable? Is there any way to associate them again, or does this happen automatically within Akka?
As you can see in the logs that were posted, it starts with one node marking another node as unreachable. This means that the first node has determined it has lost its connection to the other node. If you don't down the node, either manually or automatically, it will remain in the unreachable state until it becomes reachable again. So when can this happen? Generally network issues are the cause, but those are usually intermittent.
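To illustrate the setting under discussion, here is a minimal HOCON sketch, assuming you want the behavior described above (unreachable nodes stay unreachable until manually downed or reconnected, rather than being auto-downed):

```hocon
akka.cluster {
  # "off" disables auto-downing: an unreachable node is never Downed
  # automatically; it stays unreachable until it reconnects or an
  # operator issues an explicit Down command.
  auto-down-unreachable-after = off

  # Setting a duration instead (e.g. 10s) is what causes nodes to be
  # Downed automatically after that period of unreachability.
  # auto-down-unreachable-after = 10s
}
```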
I'm seeing lots of messages like this:
Is this normal? Or is there a problem in my network or Lighthouse? At Lighthouse, the event viewer shows these errors occurring ...
The problem is in your network. It seems like your connection is flaky, which causes these logs.
@ingted what does your cluster config look like? |
Because a single big message can't cause that all on its own |
Hi @Aaronontheweb , My config is this one:
So max payload size is set to 100 MB (100000000b)? Ditto with the send and receive buffer size?

edit:

dot-netty : {
  tcp : {
    public-hostname : 10.38.112.143
    hostname : 10.38.112.143
    public-port : 53316
    port : 53316
    message-frame-size : 100000000b
    send-buffer-size : 100000000b
    receive-buffer-size : 100000000b
    maximum-frame-size : 100000000b
  }
}
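For comparison, a hedged sketch of more conservative values: Akka.NET's shipped dot-netty defaults are far smaller than the 100 MB used above (on the order of 128 KB frames and 256 KB buffers); verify the exact numbers against the reference.conf of your Akka.NET version before relying on them:

```hocon
akka.remote.dot-netty.tcp {
  # Values below mirror the documented Akka.NET defaults (check your
  # version's reference.conf). Oversized frames and buffers can mask
  # serialization problems and increase memory pressure per connection.
  maximum-frame-size = 128000b
  send-buffer-size = 256000b
  receive-buffer-size = 256000b
}
```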
It seems like
could resolve the issue, but it is impossible to add a customized FsPickler extended pickler for every type without the [Serializable] attribute...
Is the issue in this case just a dropped message over the wire? Or a full blown disassociation? What's the error message exactly? |
There is no error message on either the publisher side or the subscriber side. Just a full-blown disassociation. The seed node would quarantine the subscriber node...
I am testing replacing
with
to see if it helps or not...
Both the ["Akka.Cluster.Tools.PublishSubscribe.IDistributedPubSubMessage, Akka.Cluster.Tools" : FSharpExpr] and [System.Object:hyperion] would lead to
The resolution is to specify a serializer for each type of message that is passed through pubsub, in the serialization-bindings section.
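A minimal sketch of what such a binding looks like, assuming a hypothetical marker interface `MyApp.Messages.IPubSubMessage` for your pubsub payloads (substitute your own fully-qualified type names):

```hocon
akka.actor {
  serializers {
    # Hyperion serializer shipped in the Akka.Serialization.Hyperion package
    hyperion = "Akka.Serialization.HyperionSerializer, Akka.Serialization.Hyperion"
  }
  serialization-bindings {
    # Hypothetical marker type: every message published via pubsub
    # implements it, so it gets a deterministic serializer instead of
    # falling back to whatever binds to System.Object.
    "MyApp.Messages.IPubSubMessage, MyApp" = hyperion
  }
}
```

Binding a shared marker interface rather than each concrete type keeps the config short while still avoiding the ambiguous System.Object fallback described above.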
BTW, "nodes can't Down themselves without being explicitly given a Down command. Akka.NET never does this by default. Can only happen if you explicitly turn this setting on." (I have down logic in my code, and it caused an unexpected batch of nodes to be downed... After reading this, I confirmed that my logic downed all of my nodes during peak CPU time.) (This time it was not the fault of F# object serialization...) Oops!!!
I should note that as of Akka.NET v1.5, the Split Brain Resolver (SBR) will actually down nodes automatically and is enabled by default, but it's highly configurable and can be turned off.
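A minimal sketch of the v1.5 SBR configuration, assuming the keep-majority strategy (the shipped default; check your version's documentation for the available strategies):

```hocon
akka.cluster {
  # SBR is the downing provider in v1.5; omit or clear this key to
  # disable automatic downing entirely.
  downing-provider-class = "Akka.Cluster.SBR.SplitBrainResolverProvider, Akka.Cluster"

  split-brain-resolver {
    # keep-majority downs the minority side of a partition; other
    # strategies include static-quorum, keep-oldest, and down-all.
    active-strategy = keep-majority

    # How long a partition must persist before the strategy acts.
    stable-after = 20s
  }
}
```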
I am experiencing a self-shutdown issue when I leave my ActorSystem running for a while (12 hours).
I can't reproduce the self-shutdown on demand. I've gone through the Akka.NET code and found the code path that produces some of the errors, but I can only reproduce this after leaving it running overnight.
We have:
-- These nodes have Lighthouse roles
-- 2 nodes have the role1 role, and 1 node has both the role1 and role2 roles
role1 - uses persistence and sharding
role2 - a singleton that inserts data into MSSQL
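For reference, a hedged sketch of how such per-node role assignments look in HOCON, using the role names above (the exact values would differ per node in this deployment):

```hocon
akka.cluster {
  # On the two persistence/sharding-only nodes:
  roles = ["role1"]

  # On the node that also hosts the MSSQL-inserting singleton, this
  # would instead be:
  # roles = ["role1", "role2"]
}
```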
Error Log of a node: