Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validator: Connect to all quorum set validators #643

Merged
merged 2 commits into from
Apr 5, 2020

Conversation

AndrejMitrovic
Copy link
Contributor

@AndrejMitrovic AndrejMitrovic commented Mar 10, 2020

Requires #648

@AndrejMitrovic AndrejMitrovic added the type-enhancement An improvement of existing functionalities label Mar 10, 2020
@codecov
Copy link

codecov bot commented Mar 10, 2020

Codecov Report

Merging #643 into v0.x.x will increase coverage by 0.02%.
The diff coverage is 95%.

Impacted file tree graph

@@            Coverage Diff             @@
##           v0.x.x     #643      +/-   ##
==========================================
+ Coverage   91.09%   91.12%   +0.02%     
==========================================
  Files          63       63              
  Lines        4786     4821      +35     
==========================================
+ Hits         4360     4393      +33     
- Misses        426      428       +2
Flag Coverage Δ
#integration 54.41% <93.75%> (+0.17%) ⬆️
#unittests 89.88% <92.5%> (+0.03%) ⬆️
Impacted Files Coverage Δ
source/agora/network/NetworkManager.d 80.12% <100%> (+1.14%) ⬆️
source/agora/test/Base.d 89.14% <100%> (+0.25%) ⬆️
source/agora/test/NetworkDiscovery.d 88.88% <88.88%> (ø) ⬆️
source/agora/node/Node.d 98.94% <88.88%> (-1.06%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 91943ce...4d6e10b. Read the comment docs.

@Geod24
Copy link
Collaborator

Geod24 commented Mar 10, 2020

Isn't it going against some of our goals to have validator special casing in network manager ?

@AndrejMitrovic
Copy link
Contributor Author

AndrejMitrovic commented Mar 10, 2020

Right, so we could move the list of required peers to the Node class (and ValidatorNode class once that's implemented). I'll rework it.

@AndrejMitrovic
Copy link
Contributor Author

AndrejMitrovic commented Mar 10, 2020

Actually, I've realized that the issue itself isn't so clear-cut.

The node doesn't have to be connected to all of its quorum peers in order to reach consensus. For example, let's say we have this node quorum layout:

A => (B, C, D)
D => (B, C)

A has (B, C, D) in its quorum set, while D has (B, C).

If A loses or does not establish its connection to B, it does not mean it won't receive its messages. As long as A is connected to D, then D will propagate B's messages back to A.


So changing minPeersConnected is incorrect here, although it's safer (but works against liveness). Changing peerLimitReached however is correct because we should always strive towards connecting to all of our quorum peers, even if their messages can be forwarded by other nodes in the network.

@AndrejMitrovic AndrejMitrovic added status-blocked Another issue/PR/external dependency is blocking completion of this PR/issue and removed status-blocked Another issue/PR/external dependency is blocking completion of this PR/issue labels Mar 11, 2020
@AndrejMitrovic
Copy link
Contributor Author

#648 was merged, this is now unblocked.

I've decided to keep the safe behavior by default. The validating node will attempt to connect to all of its quorum peers before discover() finishes. Later we can fine-tune this and make it have more liveness vs safety.

@AndrejMitrovic
Copy link
Contributor Author

Updated with better documentation. @bpfkorea/core ready for review.


auto quorum_keys =
node_confs
.filter!(conf => conf.is_validator)
Copy link
Member

@TrustHenry TrustHenry Mar 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If conf.is_validator is false, an exception may occur.

Module tests failed: agora.test.NetworkDiscovery
object.Exception@submodules/localrest/source/geod24/LocalRest.d-mixin-716(770): "Request timed-out"
----------------
/usr/local/opt/dmd/include/dlang/dmd/core/internal/array/equality.d:33 @safe void geod24.LocalRest.RemoteAPI!(agora.test.Base.TestAPI).RemoteAPI.start() [0x10e7876da]
/usr/local/opt/dmd/include/dlang/dmd/core/internal/array/equality.d:33 @safe void agora.test.Base.TestAPIManager.start().__lambda1!(agora.test.Base.NodePair).__lambda1(agora.test.Base.NodePair) [0x10e7965e0]
/usr/local/opt/dmd/include/dlang/dmd/core/internal/array/equality.d:33 @safe std.typecons.Flag!("each").Flag std.algorithm.iteration.each!(agora.test.Base.TestAPIManager.start().__lambda1).each!(agora.test.Base.NodePair[]).each(ref agora.test.Base.NodePair[]) [0x10e796657]
/usr/local/opt/dmd/include/dlang/dmd/core/internal/array/equality.d:33 void agora.test.Base.TestAPIManager.start() [0x10e783b7f]
/usr/local/opt/dmd/include/dlang/dmd/core/internal/array/equality.d:33 void agora.test.NetworkDiscovery.__unittest_L69_C1() [0x10e7ab21a]
source/agora/node/main.d:40 void agora.test.NetworkDiscovery.__modtest() [0x10e4edf4a]
/usr/local/opt/dmd/include/dlang/dmd/core/internal/array/equality.d:33 int agora.test.Base.customModuleUnitTester().__foreachbody3(ref agora.test.Base.customModuleUnitTester().ModTest) [0x10e7837df]
/usr/local/opt/dmd/include/dlang/dmd/core/internal/array/equality.d:33 void std.parallelism.ParallelForeach!(agora.test.Base.customModuleUnitTester().ModTest[]).ParallelForeach.opApply(scope int delegate(ref agora.test.Base.customModuleUnitTester().ModTest)).doIt() [0x10e566203]

Copy link
Contributor Author

@AndrejMitrovic AndrejMitrovic Mar 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if none of the node configs' is_validator is true, it will fail because map will try to invoke popFront on the empty range. But this makeMinimalNetQuorum should only be called for this specific test-case where all the nodes are validators.

This comment was marked as resolved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah it definitely does not throw. I wonder where I got this error from.

I'll add an assert.


// build the list of required quorum peers to connect to if we're a Validator
Set!PublicKey required_peer_keys;
if (this.config.node.is_validator)
Copy link
Member

@TrustHenry TrustHenry Mar 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this.config.node.is_validator is false, a unit test results in an error.

Module tests failed: agora.test.NetworkDiscovery
core.exception.AssertError@source/agora/test/NetworkDiscovery.d(89): Node 0 has 2 peers: ["GAYDGXP436SZ7LV6B5EXRZX27TB2D2GQRFV3SRN775RU6OQPMJT347F5", "GDD62CVWBKJE4NXYMELIV6GHZ5AKJ2TNY6MFASHUQY24RFAKJ4EZHCUF"]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are you setting it to false? Can you show me this test-case?

Copy link
Collaborator

@Geod24 Geod24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to only work for the initial connection, doesn't it ?
What happens if a required peer drops off, then a non-required peer connects, taking the "slot", and when the required peer comes back online, it wouldn't be accepted, would it ?

@@ -744,6 +749,31 @@ public APIManager makeTestNetwork (APIManager : TestAPIManager = TestAPIManager)
return conf;
}

// for disovery testing: full quorum, but only 1 node in the 'network' section

This comment was marked as resolved.


auto quorum_keys =
node_confs
.filter!(conf => conf.is_validator)

This comment was marked as resolved.

@AndrejMitrovic
Copy link
Contributor Author

What happens if a required peer drops off

True. The current design of NetworkManager does not take into account dropped connections. It's not really obvious how to deal with this while using the RestInterfaceClient. Maybe after #203 is implemented it will be much clearer how to handle connections?

@Geod24
Copy link
Collaborator

Geod24 commented Mar 18, 2020

Maybe after #203 is implemented it will be much clearer how to handle connections?

Then this is your lucky day.

@AndrejMitrovic
Copy link
Contributor Author

I think reworking the NetworkManager class could be a good issue to tackle in one of the next sprints. #587 and #644

@AndrejMitrovic
Copy link
Contributor Author

Then this is your lucky day.

Maybe today..?

@AndrejMitrovic
Copy link
Contributor Author

This seems to only work for the initial connection, doesn't it ?
What happens if a required peer drops off, then a non-required peer connects, taking the "slot", and when the required peer comes back online, it wouldn't be accepted, would it ?

I'll try to redesign the networking class to make it more flexible. Can we make this part of future work in #587?

A validator currently does not attempt to connect
to all of its quorum peers before it's ready
to start nominating.
AndrejMitrovic added a commit to AndrejMitrovic/agora that referenced this pull request Apr 1, 2020
@@ -767,11 +771,13 @@ public APIManager makeTestNetwork (APIManager : TestAPIManager = TestAPIManager)
return conf;
}

// for disovery testing: full quorum, but only 1 node in the 'network' section
// for dissovery testing: full quorum, but only 1 node in the 'network' section
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOL. Also wrong commit.

Previously the network discovery would be considered
complete even if the quorum which a node depends on
has not been connected to.
@Geod24 Geod24 merged commit 0c45867 into bosagora:v0.x.x Apr 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-enhancement An improvement of existing functionalities
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants