Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement shell quorum balancing #845

Merged
merged 4 commits into from
Jun 5, 2020

Conversation

AndrejMitrovic
Copy link
Contributor

@AndrejMitrovic AndrejMitrovic commented May 15, 2020

Requires:
#800, #840, #810, #841, #881

@AndrejMitrovic AndrejMitrovic added status-blocked Another issue/PR/external dependency is blocking completion of this PR/issue type-feature An addition to the system introducing new functionalities labels May 15, 2020
@AndrejMitrovic

This comment has been minimized.

@@ -112,7 +174,8 @@ public class Validator : FullNode, API

log.info("Doing network discovery..");
this.network.discover(required_peer_keys);
this.network.startPeriodicCatchup(this.ledger, &this.nominator.isNominating);
this.network.startPeriodicCatchup(this.ledger,

This comment was marked as resolved.

@AndrejMitrovic AndrejMitrovic removed the status-blocked Another issue/PR/external dependency is blocking completion of this PR/issue label May 18, 2020
@AndrejMitrovic

This comment has been minimized.

@codecov
Copy link

codecov bot commented May 18, 2020

Codecov Report

Merging #845 into v0.x.x will increase coverage by 0.03%.
The diff coverage is 90.71%.

Impacted file tree graph

@@            Coverage Diff             @@
##           v0.x.x     #845      +/-   ##
==========================================
+ Coverage   90.40%   90.43%   +0.03%     
==========================================
  Files          71       73       +2     
  Lines        5795     5868      +73     
==========================================
+ Hits         5239     5307      +68     
- Misses        556      561       +5     
Flag Coverage Δ
#integration 55.29% <71.42%> (+0.77%) ⬆️
#unittests 89.18% <90.71%> (+0.05%) ⬆️
Impacted Files Coverage Δ
source/agora/common/Config.d 91.52% <ø> (-0.09%) ⬇️
source/agora/test/Simple.d 81.81% <ø> (+0.33%) ⬆️
source/agora/node/Validator.d 76.31% <75.00%> (+2.12%) ⬆️
source/agora/test/Quorum.d 88.88% <88.88%> (ø)
source/agora/consensus/Quorum.d 90.47% <90.47%> (ø)
source/agora/test/EnrollmentManager.d 89.47% <90.62%> (-7.41%) ⬇️
source/agora/consensus/protocol/Nominator.d 88.70% <100.00%> (+0.73%) ⬆️
source/agora/node/FullNode.d 89.39% <100.00%> (+4.54%) ⬆️
source/agora/node/Ledger.d 97.51% <100.00%> (+0.17%) ⬆️
source/agora/test/Base.d 86.64% <100.00%> (-0.16%) ⬇️
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d2931dd...023698e. Read the comment docs.

@AndrejMitrovic
Copy link
Contributor Author

It seems the test-suite times out (30 minutes). I'm trying to bisect it.

@AndrejMitrovic AndrejMitrovic force-pushed the shell-quorum branch 3 times, most recently from d51c4e8 to fa23de1 Compare May 19, 2020 07:46

Returns:
The genesis block

*******************************************************************************/

private immutable(Block) makeGenesisBlock (in NodeConfig[] node_configs)
private immutable(Block) makeGenesisBlock (in KeyPair[] key_pairs)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change should be moved to the previous commit where the function was introduced.

@AndrejMitrovic AndrejMitrovic force-pushed the shell-quorum branch 4 times, most recently from 6f65b13 to 523d12a Compare May 20, 2020 05:10
@AndrejMitrovic
Copy link
Contributor Author

I've discovered that what is causing these failures is actually the code coverage mode. The test-suite works fine without it. But with it there are timeouts. Either the code coverage is significantly slowing down the tests, or there is a bug in the coverage generator code in DMD front-end.

@AndrejMitrovic
Copy link
Contributor Author

The feature I discovered in Geod24/localrest#81 came in useful. I rewrote one test.

@AndrejMitrovic AndrejMitrovic force-pushed the shell-quorum branch 5 times, most recently from ecc45cc to 1ab1b1e Compare June 4, 2020 04:53
@AndrejMitrovic
Copy link
Contributor Author

I further simplified the changeset. The diff is much smaller now.

@AndrejMitrovic AndrejMitrovic force-pushed the shell-quorum branch 4 times, most recently from 1c66d18 to 00996a3 Compare June 5, 2020 01:41
The generator currently takes the set of enrollments
and generates the same quorum set for all nodes.
In the same way that we remove transactions from the pool
only after they're externalized into a block,
we should also remove enrollments at this stage.
The quorum set is now derived from the active
set of enrollments in the blockchain.

After each block has been externalized,
the active validator set is checked.
If the set of validators has changed,
then the quorum set for any Validators
is reconfigured according to the
quorum generator algorithm.

Fixes bosagora#785
@AndrejMitrovic
Copy link
Contributor Author

AndrejMitrovic commented Jun 5, 2020

@bpfkorea/core please review when you have the time.

@@ -228,6 +228,9 @@ public class Ledger
this.enroll_man.clearExpiredValidators(block.header.height);

foreach (idx, ref enrollment; block.header.enrollments)
{
this.enroll_man.pool.remove(enrollment.utxo_key);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right. I think in my fixes for #836 I will remove the call within addValidator because that function can fail.

Copy link
Member

@TrustHenry TrustHenry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, it looks Great.

Comment on lines +128 to +129
return .buildQuorumConfig(this.config.node.key_pair.address,
enrollments, this.utxo_set.getUTXOFinder());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agora.consensus.Quorum.buildQuorumConfig ( const ref PublicKey node_key,
in Enrollment[] enrolls, in UTXOFinder finder )
I think it's pointing here.
There's no problem. But The code is difficult to read.

Comment on lines +232 to 235
this.enroll_man.pool.remove(enrollment.utxo_key);

if (auto r = this.enroll_man.addValidator(enrollment,
block.header.height, this.utxo_set.getUTXOFinder()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this.enroll_man.pool.remove
The remove is run twice here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do you see it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +106 to +116
// at block height 1007 the freeze tx's are available
nodes.enumerate.each!((idx, node) =>
retryFor(node.getBlockHeight() == 1007, 5.seconds,
format("Node %s has block height %s. Expected: %s",
idx, node.getBlockHeight(), 1007)));

// now we can create enrollments
Enrollment enroll_0 = nodes[$ - 2].createEnrollmentData();
Enrollment enroll_1 = nodes[$ - 1].createEnrollmentData();
nodes[2].enrollValidator(enroll_0);
nodes[3].enrollValidator(enroll_1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question about this: Should we add the ability to enroll as soon as we freeze ?
Pro:

  • Genesis does it;
  • It allows to potentially solve the censorship issue if paired with a rule that more validators => Higher priority;
    Cons:
  • Attack vector ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's file an issue for this. And yes, I wanted this feature too.

@Geod24
Copy link
Collaborator

Geod24 commented Jun 5, 2020

I don't think the double clear should block the merge.

@Geod24 Geod24 merged commit c27172a into bosagora:v0.x.x Jun 5, 2020

const bool ExtraChecks = true;
const(char)* fail_reason;
enforce(isQuorumSetSane(scp_quorum, ExtraChecks, &fail_reason),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice to have a test code for isQuorumSetSane.

format("Quorum %s fails sanity check before normalization: %s",
quorum, fail_reason.fromStringz));

normalizeQSet(scp_quorum);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nice to have a test code for normalizeQSet .

@AndrejMitrovic AndrejMitrovic deleted the shell-quorum branch June 5, 2020 09:36
Comment on lines +62 to +63
// test up to 1024 nodes
foreach (num_nodes; iota(1, 10).map!(n => 2 ^^ n))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// test up to 1022 nodes

unittest
{
    import std.algorithm;
    import std.stdio;
    import std.range;

    uint sum;
    foreach (num_nodes; iota(1, 10).map!(n => 2 ^^ n))
       sum+=num_nodes;
    writeln(sum);
}

Result : 1022

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not testing all node count configurations. That would be too expensive (especially once the algorithm is changed).

What I am testing here is this:

num_nodes : 2
num_nodes : 4
num_nodes : 8
num_nodes : 16
num_nodes : 32
num_nodes : 64
num_nodes : 128
num_nodes : 256
num_nodes : 512
num_nodes : 1024

See?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well actually it's not inclusive as I forgot to use 11 in the iota call. +1 points to you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-feature An addition to the system introducing new functionalities
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants