-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reveal a pre-image periodically #563
Conversation
Codecov Report
@@ Coverage Diff @@
## v0.x.x #563 +/- ##
==========================================
+ Coverage 88.52% 88.62% +0.09%
==========================================
Files 58 58
Lines 4306 4377 +71
==========================================
+ Hits 3812 3879 +67
- Misses 494 498 +4
Continue to review full report at Codecov.
|
source/agora/node/GossipProtocol.d
Outdated
@@ -80,6 +80,16 @@ public class GossipProtocol | |||
this.network.sendTransaction(tx); | |||
this.ledger.tryNominateTXSet(); | |||
} | |||
|
|||
if (this.enroll_man.needRevealPreimage(this.ledger.getBlockHeight())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The needRevealPreimage
check at that location causes unnecessary calls.
I think it would be better to check when the height of the block changes as a timing for checking next_reveal_height
Ready for review. |
9e4e5d0
to
25b8445
Compare
- needRevealPreimage: function for checking if a certain height is revelation timing - increaseNextRevealHeight: function for setting next block height to reveal pre-image
25b8445
to
78950c2
Compare
A validator reveals a pre-image every 1 hour, or 6 blocks and the pre-image has the data for double the period because it make the validator to be able to skip to next revelation timing in case of a failure of the revelation. The revelation period could be configurable sometime.
78950c2
to
2c1ab5c
Compare
Rebased on v0.x.x and fixed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Post-merge review because I didn't have the chance to review this before.
@@ -120,6 +127,9 @@ public class EnrollmentManager | |||
|
|||
if (!results.empty) | |||
this.enroll_key = results.oneValue!(ubyte[]).deserializeFull!(Hash); | |||
|
|||
// load next height for preimage revelation | |||
this.next_reveal_height = this.getNextRevealHeight(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come setNextRevealHeight
is not called here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At first, the next_reveal_height
is set by calling setNextRevealHeight
in updateEnrolledHeight
which is called in addValidatedBlock
of the ledger, which is being reviewed in the PR #504
@@ -309,6 +319,11 @@ public class EnrollmentManager | |||
enroll_hash, ex); | |||
return false; | |||
} | |||
|
|||
// set next height for revealing a pre-image | |||
if (enroll_hash == this.enroll_key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future we should separate these two functionalities of the EnrollmentManager
:
- Managing enrollments for other nodes
- Managing enrollments for this node
Otherwise the implementation will become complex and full of state, like it has started to show.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future we should separate these two functionalities of the EnrollmentManager
I totally agree with you. I have a plan to split the current EnrollmentManager
into EnrollmentManager
and ValidatorSet
|
||
public bool getNextPreimage (out PreimageInfo preimage) @safe | ||
{ | ||
auto height = this.next_reveal_height + PreimageRevealPeriod * 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the * 2
part? So will this actually reveal preimages for the next 12 blocks (2 hours), rather than 6 blocks (1 hour)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was discussed during the whiteboarding session. Currently it's a bit convoluted, as you mentioned, we should use a sliding windows instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I missed the first part of the whiteboarding session when this happened.
{ | ||
auto height = this.next_reveal_height + PreimageRevealPeriod * 2; | ||
if (height > ValidatorCycle - 1) | ||
height = ValidatorCycle - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks completely wrong? If the block height is 1009 then we will simply not reveal the preimage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the block height is 1009 then we will simply not reveal the preimage?
As previously mentioned, the next_reveal_height
is set based on a current block height by calling setNextRevealHeight in updateEnrolledHeight which is called in addValidatedBlock of the ledger which is being reviewed in the PR #504
|
||
***************************************************************************/ | ||
|
||
public bool getNextPreimage (out PreimageInfo preimage) @safe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where are the unittests for this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where are the unittests for this function?
I missed it. I will add it in a related PR.
public void increaseNextRevealHeight () @safe nothrow | ||
{ | ||
ulong next_height = this.getNextRevealHeight(); | ||
if (this.next_reveal_height < ulong.max) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if there is a database error then the node will silently stop emitting preimages? The only way the user will know this is if they're actively monitoring the logs.
In the meantime the node stops emitting preimages, it gets slashed for not following the protocol, and the node owner loses funds. This is not good.
If there is an exceptional error that makes the node misbehave then we have to treat it as an error, not as a diagnostical issue. If the node can't reveal its preimages because of a database error then the node should shutdown with an error. The node operator can get notified when its node shuts down. That way the node operator gets the chance to fix the error (faulty disk or whatever else is causing the issue).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The node operator can get notified when its node shuts down.
Right. There needs a way to notify an abnormal status of a node to the operator through client UI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The UI is a separate concern. Agora doesn't have to run with any UI at all. It just has to exit with a process code != 0
, and then whichever initialization system is in use (systemd or otherwise) can notify the operator - whether it's through e-mail alerts, push notifications, etc. All of that stuff is external to the Agora node. But we must notify the operating system that we failed, and that means shutting down with an exit code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the operators of nodes need to be familiar with command line environment. That's why I said about UI. Adding the code of assert or something in will be worked on one related PR or one issue.
} | ||
catch (Exception ex) | ||
{ | ||
log.error("Database operation error {}", ex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned before, this is a showstopper error. The node cannot reveal preimages, it should be shut down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned before, this is a showstopper error. The node cannot reveal preimages, it should be shut down
There needs to add the code of assert(0)
in a related PR.
} | ||
catch (Exception ex) | ||
{ | ||
log.error("Database operation error {}", ex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same problem here. We are logging errors and then proceeding as if there was no error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same problem here. We are logging errors and then proceeding as if there was no error.
There needs to add the code of assert(0)
in a related PR.
@@ -248,6 +248,16 @@ public class Node : API | |||
this.network.sendTransaction(tx); | |||
this.ledger.tryNominateTXSet(); | |||
} | |||
|
|||
if (this.enroll_man.needRevealPreimage(this.ledger.getBlockHeight())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the right place to put this. Why is it being triggered when we receive a Transaction? There might not be enough transactions in the pool to start a nomination. It should be triggered when the ledger's state changes (new block added / externalize was called), and only if we're not catching up on outdated blocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A validator reveals a pre-image periodically.
#492