
Restarting a node with a different number of peers prevents it from connecting to the cluster #1174

Closed
ch1bo opened this issue Nov 22, 2023 · 1 comment · Fixed by #1179
ch1bo (Member) commented Nov 22, 2023

Context & versions

All recent versions, seen on 9abb099

Analyses

Misbehaviour

This bug was observed through the following sequence of operations:

  1. A group of 5 people share keys and network addresses to form a Head among themselves
  2. One of the parties (alice) inadvertently misconfigures their node by "forgetting" another party's (bob) configuration
  3. The nodes connect to each other and start sending Ping notifications
  4. alice's node is not seen by bob's
  5. alice stops her node, reconfigures it, then restarts it
  6. Now alice is not seen by any other party's node

Troubleshooting

Looking at their logs, alice's peers notice the following message appearing repeatedly:

{"timestamp":"2023-11-22T09:34:57.526354646Z","threadId":3675,"namespace":"HydraNode-\" bob\"","message":{"reliability":{"fromParty":{"vkey":"0f23a3124d401d89e8f6cfb724eb00ac073c74edc93a4db9b0889c999f2415fe"},"numberOfParties":5,"partyAcks":[0,0,0,0],"tag":"ReceivedMalformedAcks"},"tag":"Reliability"}}

This means that alice is still sending Reliability layer messages as if she only had 3 peers. Investigating the issue further, we realised this was caused by the acknowledgment persistence mechanism put in place as part of #1101:

  • In its first run, alice's node saves the acknowledged messages' vector for 4 parties
  • In the second run, alice's node loads the saved vector, which still has length 4, although the number of parties should now be 5 (see the sketch below)
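For illustration, here is a minimal sketch of the receiver-side check that produces this log. The type and function names are illustrative stand-ins, not the actual hydra-node code:

```haskell
import Data.Vector (Vector)
import qualified Data.Vector as V

-- Illustrative stand-in for the node's reliability-layer log type;
-- NOT the actual hydra-node definition.
data ReliabilityLog = ReceivedMalformedAcks
  { fromParty :: String
  , partyAcks :: Vector Int
  , numberOfParties :: Int
  }
  deriving (Show)

-- Receiver-side check: a message whose acknowledgement vector does not
-- have one entry per configured party is logged and dropped. After her
-- restart, alice keeps sending 4-element vectors into a 5-party network,
-- so every peer rejects her messages with ReceivedMalformedAcks.
checkMessageAcks :: Int -> String -> Vector Int -> Either ReliabilityLog (Vector Int)
checkMessageAcks nParties sender acks
  | V.length acks /= nParties =
      Left ReceivedMalformedAcks
        { fromParty = sender
        , partyAcks = acks
        , numberOfParties = nParties
        }
  | otherwise = Right acks
```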

Expected behaviour

The user should be warned that there is an inconsistency between the currently configured network peers and the saved state. It should probably not be possible to start a node in such a situation unknowingly: the mismatch could stem not only from a misconfiguration, but also from an unsuspecting party forming a new head with a different configuration while inadvertently reusing persisted state from a previous run.
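As a rough idea of what such a startup guard could look like (a sketch with hypothetical names, not necessarily the actual fix in #1179):

```haskell
import Control.Exception (Exception, throwIO)
import Data.Vector (Vector)
import qualified Data.Vector as V

-- Hypothetical error type for the inconsistency described above.
newtype InconsistentPersistedState = InconsistentPersistedState String
  deriving (Show)

instance Exception InconsistentPersistedState

-- On startup, compare any persisted acknowledgement vector against the
-- configured number of parties and refuse to start on a mismatch,
-- instead of silently reusing stale state.
loadAcks :: Int -> Maybe (Vector Int) -> IO (Vector Int)
loadAcks nParties persisted =
  case persisted of
    Nothing -> pure (V.replicate nParties 0) -- fresh start: all-zero acks
    Just acks
      | V.length acks == nParties -> pure acks -- consistent resume
      | otherwise ->
          throwIO . InconsistentPersistedState $
            "Persisted acks cover " <> show (V.length acks)
              <> " parties, but the node is configured with "
              <> show nParties
```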

In general, this issue highlights the need for a better strategy on how the hydra-node persists its state and what the user can do about it.

@ch1bo ch1bo added the bug 🐛 Something isn't working label Nov 22, 2023
@ffakenz ffakenz self-assigned this Nov 22, 2023
@ffakenz ffakenz added this to the 0.14.0 milestone Nov 22, 2023
ffakenz (Contributor) commented Nov 23, 2023

This problem was reproduced by running a node that is missing the --hydra-verification-key and --cardano-verification-key for one of its peers.

In the code we check that the number of hydra-verification-keys and cardano-verification-keys matches, but we do not check that they also match the number of configured --peer entries.

So here we have 2 different things going on:

  1. we should verify that the numbers of peers, hydra-verification-keys and cardano-verification-keys all match (see the sketch below)
  2. when we restart the network, we need to check that its configuration is consistent with the persisted acks
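A minimal sketch of the first check, assuming a hypothetical RunOptions record rather than the actual hydra-node options type:

```haskell
-- Illustrative option record; the actual hydra-node RunOptions differs.
data RunOptions = RunOptions
  { peers :: [String]                     -- --peer (one per remote party)
  , hydraVerificationKeys :: [FilePath]   -- --hydra-verification-key
  , cardanoVerificationKeys :: [FilePath] -- --cardano-verification-key
  }

-- Besides checking the two key lists against each other (the existing
-- check), also check them against the --peer list.
validatePeerConfiguration :: RunOptions -> Either String RunOptions
validatePeerConfiguration opts
  | nHydra /= nCardano =
      Left "--hydra-verification-key and --cardano-verification-key counts differ"
  | nPeers /= nHydra =
      Left "--peer count does not match the number of verification keys"
  | otherwise = Right opts
 where
  nPeers = length (peers opts)
  nHydra = length (hydraVerificationKeys opts)
  nCardano = length (cardanoVerificationKeys opts)
```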

@ghost ghost changed the title from "Network produces MalformedAcks even after fixing --peer list" to "Restarting a node with a different number of peers prevents it from connecting to the cluster" Nov 24, 2023
@ghost ghost closed this as completed in #1179 Nov 28, 2023