
Allow grouping of validators #2601

Open
MarkusTeufelberger opened this issue Jun 25, 2018 · 18 comments
Labels
Feature Request Used to indicate requests to add new features

Comments

@MarkusTeufelberger
Collaborator

I'd like to propose the following feature:

Instead of giving each individual validator on a UNL the same weight, I'd like to be able to give whole lists of validators the same weight.

Example:

Alice operates 3 validators (A1, A2 and A3)
Bob runs 2 of them (B1, B2)
Charlie runs only one (C)

Currently I can only add one of the A's, one of the B's, and the C validator to an UNL, e.g. [A1, B2, C]

I'd like to be able to have an UNL like this: [[A1, A2, A3], [B1, B2], C]

From each sub-list the first validator would be considered (in this example, even if A2 and A3 disagree with A1, as long as a validation from A1 reaches my node, it would count). A different approach might be to weigh all sub-validators the same but in relation to the global UNL (all the A validators are weighed 1/9 each, the Bs 1/6 and the C one 1/3). That would probably be closer to what might be expected, but might lead to more churn/work.
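The second weighting approach can be sketched as follows (illustrative Python only, not rippled code; `validator_weights` and the list-of-lists UNL encoding are hypothetical):

```python
# Sketch: each operator carries equal total weight, split evenly
# across that operator's validators. Names are illustrative.

def validator_weights(grouped_unl):
    """Map each validator to its weight: 1 / (num_groups * group_size)."""
    n_groups = len(grouped_unl)
    weights = {}
    for group in grouped_unl:
        for v in group:
            weights[v] = 1.0 / (n_groups * len(group))
    return weights

unl = [["A1", "A2", "A3"], ["B1", "B2"], ["C"]]
w = validator_weights(unl)
# A validators get 1/9 each, B validators 1/6 each, C gets 1/3,
# so each operator's total influence is 1/3.
```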

Anyways: Grouping validators together for failover, or just because some entity might choose to run more than one, is a useful feature to have and would also be helpful for decentralization efforts.

@bachase
Collaborator

bachase commented Jun 25, 2018

I agree that automating the failover is a very useful feature.

How do you propose distinguishing

  1. A1 crashed and is offline
  2. You just haven't yet received a validation from A1

@MarkusTeufelberger
Collaborator Author

Until any validator actually responds, you can't distinguish 1 and 2 currently either. If A1 hasn't sent anything by a certain point in time, it could simply be ignored for the current ledger or treated as offline.

@bachase
Collaborator

bachase commented Jun 25, 2018

So if you have heard from A2, but not A1, after some duration you will ignore A1 and go with A2's validation? I worry how best to choose that duration, given that A1 and A2 might disagree, but you do not know if other validators (which you might trust) have received A1's validation.

@MarkusTeufelberger
Collaborator Author

That's a problem in general - if these were independent the same situation could arise.
In the proposed case it would just be harder for an entity to be considered offline if they are operating more than one validator and just one of these goes down.

@bachase
Collaborator

bachase commented Jun 26, 2018

I agree that this is a problem in general. However, for a single UNL with A1 and A2 acting this way, you can consider the safety of this scenario relative to your fault tolerance threshold. Your proposal seems like using conditional UNLs, in which you switch to a new UNL based on validator activity. Managing the fork-safety and overlap requirements in that setting seems combinatorially challenging.

Granted, rippled does not currently provide a way for managing UNL overlap complexity anyway, so I am very sympathetic to your argument that this doesn't really introduce a new challenge.

@MarkusTeufelberger
Collaborator Author

Consensus is relatively fork-safe anyways, the bigger threat is not making forward progress. Also individual servers don't have any way of even querying the global state necessary to calculate these thresholds (#1751 recently celebrated its 2nd birthday with no reaction whatsoever). Even if there was some information, it would be relatively easy to feed someone false information designed to interrupt them.

Consensus doesn't really take into account the global state, I don't see that this would change it or how this proposal would require stronger guarantees. If anything, it would help with network stability.

@nbougalis nbougalis added the Feature Request Used to indicate requests to add new features label Nov 1, 2018
@nbougalis nbougalis changed the title [feature request] allow grouping of validators Allow grouping of validators Nov 1, 2018
@intelliot
Collaborator

@ChronusZ is this worth pursuing?

If not, let's close this issue.

@ChronusZ
Contributor

I think Brad is concerned about a situation where A1 submits a validation for some ledger b1 and A2 submits a validation for b2. Now suppose node C1 lists the two A nodes as belonging to the same operator, whereas node C2 lists the two A nodes as separate. Then from the perspective of C1, A2 validates b1, whereas from the perspective of C2, A2 validates b2. Thus A2 exhibits the worst kind of Byzantine behavior even though no one is behaving incorrectly and there's no way to identify that anything went wrong.

@ChronusZ
Contributor

Even if all nodes share the same UNL and agree on the grouping of validators in that UNL, there are semi-practical attacks that a Byzantine validator group could execute taking advantage of this mechanic to send contradictory validations without accountability. Now the consensus algorithm remains safe without the assumption of Byzantine accountability as long as the number of Byzantine validators does not go above 20%, but with the current algorithm we have a nice soft safeguard because even with >20% Byzantine validators, forking the ledger requires extremely careful control over the p2p network to avoid being immediately identified as faulty.

@MarkusTeufelberger
Collaborator Author

From the perspective of C1, A2's vote would just be ignored since A1 sent a validation (but if validations were actually logged, C1 would of course log A2 as voting for b2). I don't see how C1 would react any differently; the only difference is that currently C1 would have to drop A2 from their UNL completely to get the desired behavior of operator A having only one single vote, not two. With validator grouping, the A validators might have a higher availability from the perspective of C1.

With nUNLs I would actually expect C1 to vote for putting A2 on the poo-poo list in case it goes down, even if A1 is constantly up.

@ChronusZ
Contributor

My understanding of your suggestion was that all A validators are treated as having validated whatever ledger was validated by the lowest-index A-validator from whom we received a validation. Is the logic you're actually suggesting the following? After the timeout, let b be the ledger validated by the lowest-index A-validator from whom we received a validation. Then for each A-validator from whom we didn't receive a validation, pretend that validator actually validated b; and for each A-validator from whom we received a validation for a different ledger than b, pretend that validator didn't submit a validation.

If so, this scheme still has a similar exploit, although it's slightly harder to enact. A2 just needs to submit their validation to C2 such that it arrives just before the timeout. Then there won't be enough time for C1 to see the validation for b2 so C1 will treat A2 as having validated b1. Again there's no identifiably faulty behavior going on here, so there's essentially no risk in attempting this attack.

@MarkusTeufelberger
Collaborator Author

That's just "normal" Byzantine behavior though? From the perspective of C1, A2 was just offline or very late; from the perspective of C2, it was still in time. I don't see how that would be any better or worse if validators were grouped.

The idea in general is that I would like to move from "an UNL contains a list (actually a set...) of validators" to "an UNL contains a list (actually a set) of node-operating entities with their actual validators as sub-lists/sub-sets", with some easy-to-understand rules for how to resolve any conflicts within the nodes of a single operator (e.g. "take the first one in the list", "take the first one that actually arrives at my node", "take the majority within that operator" or even "take a random one").
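Two of those resolution rules could be sketched like this (illustrative Python, not rippled code; `resolve_group` and the `received` mapping are hypothetical):

```python
# Sketch: reduce one operator's validators to a single effective vote.
# `received` maps validator -> validated ledger identifier; validators
# absent from it sent nothing before the cutoff.
from collections import Counter

def resolve_group(group, received, rule="first_in_list"):
    votes = [(v, received[v]) for v in group if v in received]
    if not votes:
        return None  # operator considered offline this round
    if rule == "first_in_list":
        # take the vote of the first listed validator that responded
        return votes[0][1]
    if rule == "majority":
        # require a proper majority of the whole group, else no vote
        ledger, count = Counter(l for _, l in votes).most_common(1)[0]
        return ledger if count * 2 > len(group) else None
    raise ValueError(f"unknown rule: {rule}")

received = {"A1": "b1", "A2": "b2", "A3": "b1"}
resolve_group(["A1", "A2", "A3"], received, "first_in_list")  # "b1"
resolve_group(["A1", "A2", "A3"], received, "majority")       # "b1" (2 of 3)
```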

@ChronusZ
Contributor

It's way worse with validator grouping. Without grouping, C2 sees an A2 validation for b2 and C1 sees no validation from A2. This is fine and presents no safety threat, only a potential temporary loss of forward progress. With grouping, C2 still sees a validation for b2, but now C1 sees a validation for b1. This is a safety threat even though A2 is behaving no differently from a laggy honest node, which is a serious issue.

@MarkusTeufelberger
Collaborator Author

MarkusTeufelberger commented Mar 27, 2023

C2 would also see the validation for b1 from A1 though, so it would consider 2 opposing votes from the same operator, while a grouped UNL would only consider one. C1 would have at least one validator fewer than C2 in its UNL btw., it would not count the A1 vote twice.

I still fail to see how this would present any issue or be a safety threat; maybe the example is too simple, or the explanation of what happens is unclear?

With the validators in the original issue:

Alice operates 3 validators (A1, A2 and A3)
Bob runs 2 of them (B1, B2)
Charlie runs only one (C)

Now 2 validators or nodes D1 and D2 have 2 different UNLs: D1 is grouped (3 operators: [[A1, A2, A3], [B1, B2], C]), D2 is ungrouped (6 validators: [A1, A2, A3, B1, B2, C]). You are concerned that if B2 is rather late from the perspective of D1 and also conflicting with B1, this is somehow worse if D1 does grouping?

@ChronusZ
Contributor

Ok, then in that case the UNL overlap between D1 and D2 is effectively just [A1,B1,C] which is only 50% of D2's UNL, insufficient for guaranteeing safety even with 100% honest nodes.

Let's consider an example where there is a single UNL [A1,A2,B1,B2,C1,C2,D,E] and all nodes agree to use the obvious grouping structure. Suppose [A1,B1,C1,D] validate b1 and [A2,B2,C2,E] validate b2. Now if a node receives the validations from [A1,B1,C1,D] (and possibly E) but the validations from [A2,B2,C2] are delayed past the timeout, then it will see 80% support for b1 and fully validate. If instead a node receives the validations from [A2,B2,C2,E] (and possibly D) but the validations from [A1,B1,C1] are delayed past the timeout, then it will fully validate b2.

Thus with your proposal, even if all nodes agree on the UNL and its grouping structure, then the network can fork in the event of (1) an extremely rare accident even with all nodes behaving honestly, (2) an adversary with strong control over the p2p network even with all validators on the UNL behaving honestly, or (3) an adversary controlling 60% of the UNL (namely the A,B,C validators) with no significant control over the p2p network. In (2) and (3) the attack can be executed while maintaining plausible deniability for the attacker. Note that in the current algorithm, even an adversary controlling 100% of the validators cannot fork the network while maintaining plausible deniability.
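The split in that example can be checked numerically (illustrative Python, not rippled code; the first-arrival grouping rule and 80% quorum are taken from the scenario above, the function names are hypothetical):

```python
# Sketch: single agreed grouped UNL [[A1,A2],[B1,B2],[C1,C2],[D],[E]],
# quorum 80% of groups, each group's vote taken from the first of its
# validators whose validation arrived before the timeout.
GROUPS = [["A1", "A2"], ["B1", "B2"], ["C1", "C2"], ["D"], ["E"]]
QUORUM = 0.80

def support(groups, seen):
    """Fraction of groups voting for each ledger, per this node's view."""
    votes = []
    for g in groups:
        first = next((seen[v] for v in g if v in seen), None)
        if first is not None:
            votes.append(first)
    return {b: votes.count(b) / len(groups) for b in set(votes)}

# Node that received [A1, B1, C1, D] plus E before the timeout:
node1 = support(GROUPS, {"A1": "b1", "B1": "b1", "C1": "b1",
                         "D": "b1", "E": "b2"})
# Node that received [A2, B2, C2, E] plus D before the timeout:
node2 = support(GROUPS, {"A2": "b2", "B2": "b2", "C2": "b2",
                         "D": "b1", "E": "b2"})
# node1 sees 80% support for b1, node2 sees 80% for b2:
# both reach quorum and fully validate conflicting ledgers.
```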

@MarkusTeufelberger
Collaborator Author

Thanks, that example is much clearer to me.

Of course this can be pushed further towards case 1 with various methods, but that just makes it harder or take longer to exploit, not impossible. 🤔

@MarkusTeufelberger
Collaborator Author

One option might be to require all/most configured validators to actually cast a vote for something and then just drop some when calculating the outcome. One could also require validators from an operator to have a (simple?/super?) majority among themselves, with the current option of a single validator being the trivial case.

Still sounds a bit too hand-wavy for my liking, but I still think the problem is a relevant one unless there is a global agreement between validator operators to always operate at least a certain number of validators and add the same number per operator to recommended UNLs.

@ChronusZ
Contributor

ChronusZ commented Mar 29, 2023

True, I guess if you count a validator group as unresponsive until you receive validations from a proper majority (i.e., strictly greater than 50%), then you at least avoid the issue of deniable Byzantine behavior when everyone agrees on the same UNL with the same grouping structure.

Actually this new functionality can be achieved without making any direct changes to the consensus mechanism. Say there is an entity A that operates the validator group [A1,A2,...,An]. Let t be the smallest integer such that t > n/2, i.e. t = floor(n/2)+1. Now A distributes a (t,n) threshold secret among these nodes. Then instead of adding the public keys of A1,...,An to the UNL, we just add the public key of the threshold secret to the UNL.
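The proper-majority threshold above works out as follows (illustrative Python; `threshold` and `group_validates` are hypothetical names, not rippled or any threshold-signature library's API):

```python
# Sketch: t = floor(n/2) + 1, the smallest integer strictly greater
# than n/2, so a group validation needs a proper majority of shares.

def threshold(n):
    return n // 2 + 1

def group_validates(n, shares_received):
    """Enough threshold-signature shares arrived to form a group validation?"""
    return shares_received >= threshold(n)

# n=1 degenerates to the current single-validator case (t=1);
# for n=4 nodes, 3 shares are needed; for n=5, also 3.
[threshold(n) for n in (1, 2, 3, 4, 5)]  # [1, 2, 2, 3, 3]
```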

I guess we would still need to modify the p2p code to give a way for nodes to combine the threshold signatures to produce a single validation for the group. But not all p2p nodes would need to have that amendment enabled; the validation shares would be passed around like ordinary validations until at least t of them arrive at some node with the amendment enabled, and then that node would produce the threshold validation which would again be passed around like an ordinary validation.
