Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Draft] Implement a quorum balancing algorithm #684

Closed

Conversation

AndrejMitrovic
Copy link
Contributor

@AndrejMitrovic AndrejMitrovic commented Mar 20, 2020

I've had several different approaches. Initially I tried to split up the stakes into three categories, but this made the algorithm really hard to reason about.

Then I've realized that I can just use the stakes as the chances of each node being included in a quorum set. It simplifies the algorithm quite a bit.

What has been done:

  • SCP verifies the quorum set is correct
  • Incorporated quorum intersection checker from Stellar into the tests
  • Testing of potential quorum splits, this is also part of the quorum intersection checker.
  • There is a toml generator, and I am using it successfully with the standalone SCP implementation go-scp.

What's still on the table:

  • More tests need to be added with hardcoded keys to ensure no regressions.
  • Integration & byzantine node tests are missing.
  • Does it make sense to limit the number of nodes in a quorum? Currently the algorithm stops adding nodes to a queue as soon as MAX_NODES_IN_QUORUM is reached, or 67% of the economic value of the entire network is reached.

Requires:

Part of: #240

@AndrejMitrovic AndrejMitrovic added status-blocked Another issue/PR/external dependency is blocking completion of this PR/issue type-feature An addition to the system introducing new functionalities labels Mar 20, 2020
@AndrejMitrovic AndrejMitrovic added this to the 2. Validator milestone Mar 20, 2020
@codecov
Copy link

codecov bot commented Mar 20, 2020

Codecov Report

Merging #684 into v0.x.x will decrease coverage by 0.76%.
The diff coverage is 73.75%.

Impacted file tree graph

@@            Coverage Diff             @@
##           v0.x.x     #684      +/-   ##
==========================================
- Coverage   91.10%   90.34%   -0.77%     
==========================================
  Files          65       66       +1     
  Lines        5185     5425     +240     
==========================================
+ Hits         4724     4901     +177     
- Misses        461      524      +63     
Flag Coverage Δ
#integration 47.80% <0.00%> (-3.09%) ⬇️
#unittests 89.25% <74.05%> (-0.71%) ⬇️
Impacted Files Coverage Δ
source/agora/common/Amount.d 97.56% <33.33%> (-2.44%) ⬇️
source/agora/consensus/Quorum.d 74.26% <74.26%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9446bd3...9fadde7. Read the comment docs.

@AndrejMitrovic
Copy link
Contributor Author

I started writing better tests and realized the node would not automatically add itself in the list of quorum nodes, which was a mistake. I will update the PR with the tests & fixes in the coming days..

}

/// very simplistic way of reducing a 64-byte blob to an 8-byte seed
private ulong hashToSeed (Hash hash)

This comment was marked as resolved.

@AndrejMitrovic
Copy link
Contributor Author

This needs to be reworked a little bit.

There is an edge-case. What if one node has 66% of the staked amount of the entire network? In the current algorithm most nodes will end up with a single node in its quorum. We should add some minimum amount of nodes that every node should include. We could make it something simple like "must include at least 5 nodes, and if those nodes do not reach a majority of stakes then add additional nodes".


We have two sets of tests, based on what's possible:

- countQuorumMatches(): used when the number of nodes is low.

This comment was marked as resolved.

Note: If you use AscendingQuadratic for too many nodes, it will
assert as you will quickly run out of BOA.

Note: Intersection checks take roughly ~2 minutes for a single configuration
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder about why I hit this performance issue. The Stellar network has hundreds (thousands?) of nodes, it sounds like this algorithm would never finish if it attempted to do quorum intersection checking for the live system. Examine what's causing the performance issue. Maybe the bindings are wrong, the GC is kicking in, or something else.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that stellar has implemented an interrupt method, it might just be that the algorithm is inherently NP-hard.
The comment mentions 2 minutes and "16 node quorums", does that mean there are 16 nodes in the network overall, or each quorums have 16 nodes ?

}

/// limits the number of nodes in a quorum set
private enum MAX_NODES_IN_QUORUM = 7;

This comment was marked as resolved.


public QuorumConfig buildQuorumConfig ( PublicKey node_key,
Enrollment[] enrolls, UTXOFinder finder, Hash rand_seed,
size_t line = __LINE__ )

This comment was marked as resolved.

NodeStake[] stakes_by_price = orderStakesDescending(all_stakes);

const Amount min_quorum_amount = Amount(
cast(ulong)(10_000_000 * (0.67 * // todo: add multiply() support
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turn this todo into an issue and work on it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to have to remove this piece of logic. There's two reasons:

  • Floating point math could lead to inconsistent results across platforms. We need consistency here. Perhaps we could use fixed-point math, but it would require looking for a good library implementation.
  • It doesn't entirely make sense that we're adding nodes into a quorum to make that quorum's monetary value 67% of the network, while at the same time having a 67% threshold value for those included nodes. That would actually turn into 44% with the threshold taken into account. Also, within that quorum set SCP treats every node as equal. This means nodes with 1/10th of the quorum stake and those with 9/10th of the value are treated the same. So monetary value != political (voting) power, as it should be. Finally, the node selection logic already makes sure that nodes with a higher stake are more frequently included in a quorum, so the min_quorum_amount is unnecessary.

Copy link
Contributor

@linked0 linked0 Apr 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a standard for double-precision that is IEEE 754. Modern 32-bit computers implement double-precision with two 32-bit memory words and 64-bit computers can do it with its 64-bit memory word. So We don't need to worry about the output of the floating-point operation.

auto stakes = stake_map
.byKeyValue
.map!(pair => NodeStake(pair.key, pair.value))
.array;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allocates. Replace with a non-allocating version.


*******************************************************************************/

private string toToml (QuorumConfig[PublicKey] quorums)

This comment was marked as outdated.

return toUserReadable(input[].hashFull());
}

// wrapper around Amount with some unsafe primitives

This comment was marked as resolved.

}

/// for debugging
private string pretty (int[PublicKey] counts)

This comment was marked as resolved.

// using pregenerated seeds to test the determenistic behavior of the algorithm
// note: sorted for easier comparison with PublicKey[]
version (unittest)
private immutable string[64] pregen_seeds =

This comment was marked as outdated.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been wanting to add a module with "test data" for a while, I think it could be useful to have somewhere to get "valid, initialized data".

Copy link
Contributor Author

@AndrejMitrovic AndrejMitrovic Apr 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. It would be nice to have consistent keys in our test-suite, it helps with debugging.

#754


assert(isQuorumSetSane(scp_quorum, true, &reason),
format("Key %s with %s fails sanity check before normalization: %s",
key.nice, quorum.toToml, reason.to!string));

This comment was marked as resolved.

@AndrejMitrovic AndrejMitrovic force-pushed the quorum-balancing branch 8 times, most recently from 5cd0bff to 4459ce0 Compare April 14, 2020 09:39
Copy link
Collaborator

@Geod24 Geod24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay did a first pass. There's obviously A LOT to review and discuss here, will continue soon.

Note: If you use AscendingQuadratic for too many nodes, it will
assert as you will quickly run out of BOA.

Note: Intersection checks take roughly ~2 minutes for a single configuration
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that stellar has implemented an interrupt method, it might just be that the algorithm is inherently NP-hard.
The comment mentions 2 minutes and "16 node quorums", does that mean there are 16 nodes in the network overall, or each quorums have 16 nodes ?


Contains the quorum generator algorithm.

Note: If you use AscendingQuadratic for too many nodes, it will
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use Note: , it generates bad doc IIRC.

Also this could use more actual documentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this could use more actual documentation.

Well it's a draft. If half the code gets thrown away I'd hate to throw away the documentation too. I'll add it as soon as it's finalized.

source/agora/consensus/Quorum.d Show resolved Hide resolved
we do not want to centralize the quorum configuration to only include
this one node. This would lead to too much political (voting) power.

The voting power of a Boa holder may be increased by spawning multiple nodes,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The voting power of a Boa holder may be increased by spawning multiple nodes,
The voting power of a BOA holder may be increased by spawning multiple nodes,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if they're holding snakes huh??

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We send them a job offer


However, this will lead to the Boa holder to receive less rewards than if
they spawned a single node with 10 units. It's their choice on how to
distribute their voting power vs rewards ratio.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice explanation. We should add the general consideration in the module doc. But that can be left for later.

source/agora/consensus/Quorum.d Show resolved Hide resolved
// using pregenerated seeds to test the determenistic behavior of the algorithm
// note: sorted for easier comparison with PublicKey[]
version (unittest)
private immutable string[64] pregen_seeds =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been wanting to add a module with "test data" for a while, I think it could be useful to have somewhere to get "valid, initialized data".

}

/// ditto
private string pretty (PublicKey input)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noice. Probably something we want to generalize in the long run. I mean Ocean has it, and I think stellar does too.
Something to consider: For easier memorization, you can use a noun and a verb, like what docker does. E.g. "patriot pangolin" and "mesmerizing door" might stand out more than "Andrew" and "Nathan".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 923 to 937
/// convenience because Amount's API sucks
private Amount added (in Amount lhs, in Amount rhs)
{
Amount amount = lhs;
assert(amount.add(rhs));
return amount;
}

/// convenience
private Amount subbed (in Amount lhs, in Amount rhs)
{
Amount amount = lhs;
assert(amount.sub(rhs));
return amount;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those modify the object. So I can't use them with const values.

E.g. I can't do Amount.MinFreezeAmount.mustAdd(Amount.MinFreezeAmount).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amount(Amount.MinFreezeAmount).mustAdd(Amount.MinFreezeAmount) ?

static foreach (node_idx; 0 .. pregen_seeds.length)
mixin("PublicKey n%s;".format(node_idx));

#line __LINE__

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

@AndrejMitrovic
Copy link
Contributor Author

The comment mentions 2 minutes and "16 node quorums", does that mean there are 16 nodes in the network overall, or each quorums have 16 nodes ?

16 nodes, where each node has some number of nodes in its quorum set.

And yeah, it's unfortunate that the algorithm is inherently slow.. hence the interrupts.

@AndrejMitrovic AndrejMitrovic removed the status-blocked Another issue/PR/external dependency is blocking completion of this PR/issue label Apr 16, 2020
@AndrejMitrovic AndrejMitrovic force-pushed the quorum-balancing branch 5 times, most recently from 37f93c7 to ad86ac4 Compare April 16, 2020 07:36
@AndrejMitrovic
Copy link
Contributor Author

Windows linker error:

agora.obj : error LNK2019: unresolved external symbol "public: static class std::shared_ptr<class stellar::QuorumIntersectionChecker *> __cdecl stellar::QuorumIntersectionChecker::create(struct std::unordered_map<struct stellar::PublicKey,class std::shared_ptr<struct stellar::SCPQuorumSet>,struct std::hash<struct stellar::PublicKey>,struct std::equal_to<struct stellar::PublicKey>,class std::allocator<struct std::pair<struct stellar::PublicKey const ,class std::shared_ptr<struct stellar::SCPQuorumSet> > > > const &)" (?create@QuorumIntersectionChecker@stellar@@SA?AV?$shared_ptr@PEAVQuorumIntersectionChecker@stellar@@@std@@AEBU?$unordered_map@UPublicKey@stellar@@V?$shared_ptr@USCPQuorumSet@stellar@@@std@@U?$hash@UPublicKey@stellar@@@4@U?$equal_to@UPublicKey@stellar@@@4@V?$allocator@U?$pair@$$CBUPublicKey@stellar@@V?$shared_ptr@USCPQuorumSet@stellar@@@std@@@std@@@4@@4@@Z) referenced in function _D5agora9consensus6Quorum22verifyQuorumsIntersectFHSQBy6common6crypto3Key9PublicKeySQDeQBg5Types12QuorumConfigZv

internal screaming

@AndrejMitrovic
Copy link
Contributor Author

For those that are curious what we talked about during our whiteboarding session:

I needed to define some constraints or properties of the algorithm. For example, knowing what the optimal quorum layout should look like.

We want several properties of the algorithm:

  • It should be possible to shuffle the quorums. This can be done by using the validator's revealed preimages. The preimages are combined into a "seed", which is then fed to the quorum balancing algorithm.
  • Nodes which stake a higher amount should be included more often in quorum sets.
  • The amount of required messaging between nodes should be minimized.

This last part is my main focus for this sprint.

The generator uses a provided source of randomness
to shuffle the nodes into different quorums.

The source of randomness should be derived from
the revealed preimages of the validators.

Nodes which have a higher staken amount have a
higher chance of being included in each node's
quorum set.

Part of: bosagora#240
@AndrejMitrovic
Copy link
Contributor Author

This should be reworked based on what was introduced in #845. We could add the very first enhancements to the quorum generator. I'll keep this in mind as a background task.

@AndrejMitrovic
Copy link
Contributor Author

This will be replaced with a series of simpler PRs.

@AndrejMitrovic AndrejMitrovic deleted the quorum-balancing branch June 17, 2020 07:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-feature An addition to the system introducing new functionalities
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants