-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Draft] Implement a quorum balancing algorithm #684
Conversation
d67d233
to
3b132de
Compare
Codecov Report
@@ Coverage Diff @@
## v0.x.x #684 +/- ##
==========================================
- Coverage 91.10% 90.34% -0.77%
==========================================
Files 65 66 +1
Lines 5185 5425 +240
==========================================
+ Hits 4724 4901 +177
- Misses 461 524 +63
Continue to review full report at Codecov.
|
I started writing better tests and realized the node would not automatically add itself in the list of quorum nodes, which was a mistake. I will update the PR with the tests & fixes in the coming days.. |
4a327df
to
8c7dd1f
Compare
source/agora/consensus/Quorum.d
Outdated
} | ||
|
||
/// very simplistic way of reducing a 64-byte blob to an 8-byte seed | ||
private ulong hashToSeed (Hash hash) |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
This needs to be reworked a little bit. There is an edge-case. What if one node has 66% of the staked amount of the entire network? In the current algorithm most nodes will end up with a single node in its quorum. We should add some minimum amount of nodes that every node should include. We could make it something simple like "must include at least 5 nodes, and if those nodes do not reach a majority of stakes then add additional nodes". |
source/agora/consensus/Quorum.d
Outdated
|
||
We have two sets of tests, based on what's possible: | ||
|
||
- countQuorumMatches(): used when the number of nodes is low. |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
source/agora/consensus/Quorum.d
Outdated
Note: If you use AscendingQuadratic for too many nodes, it will | ||
assert as you will quickly run out of BOA. | ||
|
||
Note: Intersection checks take roughly ~2 minutes for a single configuration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder about why I hit this performance issue. The Stellar network has hundreds (thousands?) of nodes, it sounds like this algorithm would never finish if it attempted to do quorum intersection checking for the live system. Examine what's causing the performance issue. Maybe the bindings are wrong, the GC is kicking in, or something else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that stellar has implemented an interrupt method, it might just be that the algorithm is inherently NP-hard.
The comment mentions 2 minutes and "16 node quorums", does that mean there are 16 nodes in the network overall, or each quorums have 16 nodes ?
} | ||
|
||
/// limits the number of nodes in a quorum set | ||
private enum MAX_NODES_IN_QUORUM = 7; |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
source/agora/consensus/Quorum.d
Outdated
|
||
public QuorumConfig buildQuorumConfig ( PublicKey node_key, | ||
Enrollment[] enrolls, UTXOFinder finder, Hash rand_seed, | ||
size_t line = __LINE__ ) |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
source/agora/consensus/Quorum.d
Outdated
NodeStake[] stakes_by_price = orderStakesDescending(all_stakes); | ||
|
||
const Amount min_quorum_amount = Amount( | ||
cast(ulong)(10_000_000 * (0.67 * // todo: add multiply() support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turn this todo
into an issue and work on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to have to remove this piece of logic. There's two reasons:
- Floating point math could lead to inconsistent results across platforms. We need consistency here. Perhaps we could use fixed-point math, but it would require looking for a good library implementation.
- It doesn't entirely make sense that we're adding nodes into a quorum to make that quorum's monetary value 67% of the network, while at the same time having a 67% threshold value for those included nodes. That would actually turn into 44% with the threshold taken into account. Also, within that quorum set SCP treats every node as equal. This means nodes with 1/10th of the quorum stake and those with 9/10th of the value are treated the same. So monetary value != political (voting) power, as it should be. Finally, the node selection logic already makes sure that nodes with a higher stake are more frequently included in a quorum, so the
min_quorum_amount
is unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a standard for double-precision that is IEEE 754. Modern 32-bit computers implement double-precision with two 32-bit memory words and 64-bit computers can do it with its 64-bit memory word. So We don't need to worry about the output of the floating-point operation.
auto stakes = stake_map | ||
.byKeyValue | ||
.map!(pair => NodeStake(pair.key, pair.value)) | ||
.array; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This allocates. Replace with a non-allocating version.
|
||
*******************************************************************************/ | ||
|
||
private string toToml (QuorumConfig[PublicKey] quorums) |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
source/agora/consensus/Quorum.d
Outdated
return toUserReadable(input[].hashFull()); | ||
} | ||
|
||
// wrapper around Amount with some unsafe primitives |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
} | ||
|
||
/// for debugging | ||
private string pretty (int[PublicKey] counts) |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
// using pregenerated seeds to test the determenistic behavior of the algorithm | ||
// note: sorted for easier comparison with PublicKey[] | ||
version (unittest) | ||
private immutable string[64] pregen_seeds = |
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been wanting to add a module with "test data" for a while, I think it could be useful to have somewhere to get "valid, initialized data".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. It would be nice to have consistent keys in our test-suite, it helps with debugging.
8c7dd1f
to
4981382
Compare
source/agora/consensus/Quorum.d
Outdated
|
||
assert(isQuorumSetSane(scp_quorum, true, &reason), | ||
format("Key %s with %s fails sanity check before normalization: %s", | ||
key.nice, quorum.toToml, reason.to!string)); |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
5cd0bff
to
4459ce0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay did a first pass. There's obviously A LOT to review and discuss here, will continue soon.
source/agora/consensus/Quorum.d
Outdated
Note: If you use AscendingQuadratic for too many nodes, it will | ||
assert as you will quickly run out of BOA. | ||
|
||
Note: Intersection checks take roughly ~2 minutes for a single configuration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that stellar has implemented an interrupt method, it might just be that the algorithm is inherently NP-hard.
The comment mentions 2 minutes and "16 node quorums", does that mean there are 16 nodes in the network overall, or each quorums have 16 nodes ?
source/agora/consensus/Quorum.d
Outdated
|
||
Contains the quorum generator algorithm. | ||
|
||
Note: If you use AscendingQuadratic for too many nodes, it will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use Note:
, it generates bad doc IIRC.
Also this could use more actual documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also this could use more actual documentation.
Well it's a draft. If half the code gets thrown away I'd hate to throw away the documentation too. I'll add it as soon as it's finalized.
source/agora/consensus/Quorum.d
Outdated
we do not want to centralize the quorum configuration to only include | ||
this one node. This would lead to too much political (voting) power. | ||
|
||
The voting power of a Boa holder may be increased by spawning multiple nodes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The voting power of a Boa holder may be increased by spawning multiple nodes, | |
The voting power of a BOA holder may be increased by spawning multiple nodes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if they're holding snakes huh??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We send them a job offer
|
||
However, this will lead to the Boa holder to receive less rewards than if | ||
they spawned a single node with 10 units. It's their choice on how to | ||
distribute their voting power vs rewards ratio. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice explanation. We should add the general consideration in the module doc. But that can be left for later.
// using pregenerated seeds to test the determenistic behavior of the algorithm | ||
// note: sorted for easier comparison with PublicKey[] | ||
version (unittest) | ||
private immutable string[64] pregen_seeds = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been wanting to add a module with "test data" for a while, I think it could be useful to have somewhere to get "valid, initialized data".
} | ||
|
||
/// ditto | ||
private string pretty (PublicKey input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noice. Probably something we want to generalize in the long run. I mean Ocean has it, and I think stellar does too.
Something to consider: For easier memorization, you can use a noun and a verb, like what docker does. E.g. "patriot pangolin" and "mesmerizing door" might stand out more than "Andrew" and "Nathan".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes, the gifycat approach.
Or I could use https://github.com/bpfkorea/ocean/blob/dev-2020-04-02/src/ocean/io/digest/FirstName.d :D
source/agora/consensus/Quorum.d
Outdated
/// convenience because Amount's API sucks | ||
private Amount added (in Amount lhs, in Amount rhs) | ||
{ | ||
Amount amount = lhs; | ||
assert(amount.add(rhs)); | ||
return amount; | ||
} | ||
|
||
/// convenience | ||
private Amount subbed (in Amount lhs, in Amount rhs) | ||
{ | ||
Amount amount = lhs; | ||
assert(amount.sub(rhs)); | ||
return amount; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those modify the object. So I can't use them with const values.
E.g. I can't do Amount.MinFreezeAmount.mustAdd(Amount.MinFreezeAmount)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amount(Amount.MinFreezeAmount).mustAdd(Amount.MinFreezeAmount)
?
source/agora/consensus/Quorum.d
Outdated
static foreach (node_idx; 0 .. pregen_seeds.length) | ||
mixin("PublicKey n%s;".format(node_idx)); | ||
|
||
#line __LINE__ |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
16 nodes, where each node has some number of nodes in its quorum set. And yeah, it's unfortunate that the algorithm is inherently slow.. hence the interrupts. |
37f93c7
to
ad86ac4
Compare
Windows linker error:
internal screaming |
For those that are curious what we talked about during our whiteboarding session: I needed to define some constraints or properties of the algorithm. For example, knowing what the optimal quorum layout should look like. We want several properties of the algorithm:
This last part is my main focus for this sprint. |
ad86ac4
to
0c8ecb4
Compare
The generator uses a provided source of randomness to shuffle the nodes into different quorums. The source of randomness should be derived from the revealed preimages of the validators. Nodes which have a higher staken amount have a higher chance of being included in each node's quorum set. Part of: bosagora#240
0c8ecb4
to
9fadde7
Compare
This should be reworked based on what was introduced in #845. We could add the very first enhancements to the quorum generator. I'll keep this in mind as a background task. |
This will be replaced with a series of simpler PRs. |
I've had several different approaches. Initially I tried to split up the stakes into three categories, but this made the algorithm really hard to reason about.
Then I've realized that I can just use the stakes as the chances of each node being included in a quorum set. It simplifies the algorithm quite a bit.
What has been done:
What's still on the table:
MAX_NODES_IN_QUORUM
is reached, or 67% of the economic value of the entire network is reached.Requires:
Add rudimentary bindings to std::unordered_map #662Add the quorum intersection checker code from stellar-core #664Part of: #240