Rework NetworkManager using tasks #889
Updated from acf6375 to ad8f906.
Codecov Report
@@            Coverage Diff             @@
##           v0.x.x     #889      +/-   ##
==========================================
+ Coverage   90.34%   90.41%   +0.06%
==========================================
  Files          70       70
  Lines        5643     5652       +9
==========================================
+ Hits         5098     5110      +12
+ Misses        545      542       -3
Continue to review full report at Codecov.
Oh and
That's a big diff
source/agora/common/Config.d (outdated)
/// The maximum number of connection tasks to use during network discovery
public size_t connection_tasks = 10;
It's a bit eerie to expose this in a config file that might be touched by not-so-technical people.
Should we just make it a constant, then? I'm not sure which number to use, but I guess we can start with a low number like 10 and increase it as we see fit?
source/agora/node/FullNode.d (outdated)
scope (success)
    this.taskman.wait(5.seconds);
Why the scope (success)?
Because I don't want to suspend the fiber if an exception is thrown; I want it to kill the fiber instead.
And now that I said that out loud, I realize I should probably make discover nothrow.
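A minimal sketch of the nothrow idea, assuming discover() iterates over peers and that a failure for one peer can simply be counted and skipped. queryPeer, DiscoveryResult, and this free-function discover are hypothetical stand-ins, not Agora's API. Because every Exception is caught inside the function, the compiler accepts the nothrow attribute, and a caller looping on discover() can no longer have its fiber killed by one bad peer.

```d
/// Hypothetical stand-in for a per-peer network query that can fail.
void queryPeer(string address)
{
    if (address.length == 0)
        throw new Exception("cannot connect to an empty address");
}

/// Hypothetical summary of one discovery round.
struct DiscoveryResult
{
    size_t succeeded;
    size_t failed;
}

/// `nothrow`: each Exception raised by a peer query is caught here, so the
/// caller's discover()/wait() loop keeps running. (Errors still propagate,
/// which is what D's nothrow permits.)
DiscoveryResult discover(string[] peers) nothrow
{
    DiscoveryResult result;
    foreach (peer; peers)
    {
        try
        {
            queryPeer(peer);
            result.succeeded++;
        }
        catch (Exception e)
        {
            result.failed++;
        }
    }
    return result;
}
```

With this shape the caller needs no scope (success) at all: the plain discover(); wait(); loop is always safe.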
Buuuuuuuttttttttt... Why not simply:
while (true)
{
this.network.discover();
this.taskman.wait(5.seconds);
}
The wait is in a scope (success) in a loop with a single statement. Either the statement throws and the wait is never executed, or the statement doesn't throw and the wait is executed. What am I missing?
Ahh right, my bad. It was copied over from the other example usage where I have a continue. I'll fix it.
@@ -57,31 +57,288 @@ mixin AddLogger!();
/// Ditto
public class NetworkManager
{
    /// Node information
    private static struct Node
Should this be something less ambiguous, like NodeInfo?
I had that exact same thought. But then I realized we already have that struct, returned by getNodeInfo. I need to think of another name.
        key = client.getPublicKey();
        break;
    }
    catch (HTTPStatusException ex) // vibe.d (non-unittests)
What happens with LocalRest?
It returns PublicKey.init. This was implemented as a workaround when splitting up the full node and the validator node (Henry's PR).
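A sketch of how a caller could treat both failure modes the same way: an exception (the vibe.d HTTPStatusException case) and the PublicKey.init value that LocalRest returns. PublicKey, the one-method KeyAPI interface, tryGetKey, hasKey, and the three test-double classes are all hypothetical stand-ins; the real code catches HTTPStatusException specifically rather than any Exception.

```d
/// Hypothetical stand-in for Agora's PublicKey type.
struct PublicKey
{
    ubyte[32] data;
}

/// Hypothetical stand-in for the client API.
interface KeyAPI
{
    PublicKey getPublicKey();
}

/// Returns true only if the node answered with a real key.
bool tryGetKey(KeyAPI client, out PublicKey key)
{
    try
    {
        key = client.getPublicKey();
    }
    catch (Exception ex) // e.g. vibe.d's HTTPStatusException
    {
        return false;
    }
    // LocalRest case: PublicKey.init means "no key"
    return key != PublicKey.init;
}

bool hasKey(KeyAPI client)
{
    PublicKey key;
    return tryGetKey(client, key);
}

/// Test doubles for the three cases.
class Throws : KeyAPI
{
    PublicKey getPublicKey() { throw new Exception("HTTP 404"); }
}

class ReturnsInit : KeyAPI
{
    PublicKey getPublicKey() { return PublicKey.init; }
}

class ReturnsKey : KeyAPI
{
    PublicKey getPublicKey()
    {
        PublicKey k;
        k.data[0] = 1;
        return k;
    }
}
```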
***********************************************************************/

public this (void delegate (Set!Address addresses) onNewAddresses )
Suggested change:
- public this (void delegate (Set!Address addresses) onNewAddresses )
+ public this (void delegate (Set!Address addresses) onNewAddresses)
scope (success)
    this.outer.taskman.wait(1.seconds);
Ditto: why scope (success)?
In this specific case it's because of the continue below. I didn't want to duplicate the wait call.
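The semantics being relied on can be checked in isolation. In D, scope (success) runs whenever the enclosing scope is left without an exception, and that includes leaving a loop body via continue; it is skipped when the body throws. In this hypothetical model, attempt, run, and the integer outcomes are illustrative only, and the "wait" entry in trace stands in for this.outer.taskman.wait(1.seconds).

```d
/// Hypothetical retry loop. Outcome 0 = retry, negative = fatal, else ok.
void attempt(int[] outcomes, ref string[] trace)
{
    foreach (outcome; outcomes)
    {
        // Runs whenever this iteration's scope is left without an
        // exception: at the end of the body *and* on `continue`.
        scope (success) trace ~= "wait";

        if (outcome == 0)
        {
            trace ~= "retry";
            continue;
        }
        if (outcome < 0)
            throw new Exception("fatal"); // scope (success) is skipped

        trace ~= "ok";
    }
}

/// Runs attempt() and records whether it threw.
string[] run(int[] outcomes)
{
    string[] trace;
    try
    {
        attempt(outcomes, trace);
    }
    catch (Exception e)
    {
        trace ~= "threw";
    }
    return trace;
}
```

This is why one scope (success) covers both the continue path and the fall-through path, while in a body with a single statement and no continue it is equivalent to the plain two-statement loop.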
Updated from ad8f906 to cb94ff1.
Updated from cb94ff1 to 9939aa2.
Nice. Definitely the right direction. I'm not sure I can give this the review it deserves in the time remaining in the sprint, but I really, really want to get this through the door.
Typo in the commit message: "does not [missing 'need' I guess] to be re-triggered"
if (this.outer.banman.isBanned(this.address))
{
    this.onFailedConnection(this.address);
    return;
}
My inner developer suggests that we might want to make onFailedConnection return a boolean telling whether we should retry or abort the request, and that we could move the isBanned check in there.
I don't know how sensible this would be, and I'm not asking you to do it, just raising possibilities.
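A sketch of the suggested shape, assuming an in-memory ban set and a fixed retry budget. Connector, maxAttempts, and retryDecisions are hypothetical, and banned stands in for banman.isBanned(). The handler returns true to retry and false to abort, so the ban check no longer has to precede every call site.

```d
/// Hypothetical connector state.
struct Connector
{
    bool[string] banned;        // stand-in for banman.isBanned()
    size_t[string] failures;    // failed attempts per address
    size_t maxAttempts = 3;     // assumed retry budget

    /// Returns true if the caller should retry `address`, false to abort.
    bool onFailedConnection(string address)
    {
        if (address in banned)
            return false;
        const attempts = failures.get(address, 0) + 1;
        failures[address] = attempts;
        return attempts < maxAttempts;
    }
}

/// Feeds `attempts` consecutive failures and collects each decision.
bool[] retryDecisions(string address, bool[string] banned, size_t attempts)
{
    auto connector = Connector(banned);
    bool[] decisions;
    foreach (_; 0 .. attempts)
        decisions ~= connector.onFailedConnection(address);
    return decisions;
}
```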
Oh this is a good idea.
If there's anything missing (I'm sure there is), we can continue working on it. Filing another issue with the rest of the problems that have to be fixed.
Updated from e3e2eda to d0ade1d.
There are random failures due to #887, as well as the occasional failure caused by dub not finding a file, but otherwise it's ready to go.
Updated from d0ade1d to 4207610.
Fixed wrong docs.
This triggers quite a bit...
It was the only API endpoint wrapper that stored the key in an internal public field, but this is unnecessary, as the cached key is only used in one scope.
Grouping them together with the other sets.
Using the ban manager with an infinite ban time is awkward and further complicates the situation where the node's IP changes. The ban data is persistent, which is problematic if the node's IP changes but its old, no-longer-used IP remains in the ban list.
The new NetworkManager uses separate connection and address discovery tasks. The 'connection_tasks' config option has been added to fine-tune the number of connection tasks a node should use. The discover() method can now be called multiple times, which is required to support quorum reshuffling. Address discovery is now an endless background task that does not need to be re-triggered; it queries each client in turn every 5 seconds. The periodic catchup routine no longer waits for full node discovery before attempting to read blocks: there is no need to wait for minPeersConnected() to be true, since catchup can proceed as soon as we have at least one node connection. Fixes bosagora#587
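One pass of the endless address-discovery task can be modeled as below, assuming each client returns a list of peer addresses and that only previously unseen addresses are forwarded, in the spirit of the onNewAddresses delegate. discoveryPass, peersOf, known, and runPasses are hypothetical names; the real task would call taskman.wait(5.seconds) between passes and loop forever instead of running a fixed number of passes.

```d
/// One pass: query each client in turn and return only new addresses.
/// `peersOf` maps a client to the addresses it would report.
string[] discoveryPass(string[][string] peersOf, string[] clients,
    ref bool[string] known)
{
    string[] fresh;
    foreach (client; clients)
    {
        if (auto reported = client in peersOf)
        {
            foreach (addr; *reported)
            {
                if (addr !in known)
                {
                    known[addr] = true;
                    fresh ~= addr;
                }
            }
        }
    }
    return fresh;
}

/// Runs a fixed number of passes and returns what each one reported.
string[][] runPasses(string[][string] peersOf, string[] clients, size_t passes)
{
    bool[string] known;
    string[][] reports;
    foreach (_; 0 .. passes)
        reports ~= discoveryPass(peersOf, clients, known);
    return reports;
}
```

Because already-known addresses are filtered out, repeating the pass is harmless: a second pass over unchanged peers reports nothing.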
Rebased on v0.x.x
Works for me locally; let's see what the CI says.
There might be some missing scope/@safe attributes and so on; I'll add those if necessary.
Fixes #587