Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reveal a pre-image periodically #563

Merged
merged 2 commits into from
Feb 24, 2020
Merged

Conversation

linked0
Copy link
Contributor

@linked0 linked0 commented Feb 19, 2020

A validator reveals a pre-image periodically.

  • Every 1 hour, or 6 blocks.
  • The revelation period could become configurable in another issue.
  • The pre-image has the data for double the period because it make the validator to be able to skip to next revelation timing In case of a failure of the revelation.
  • For unittest, the updateEnrolledHeight function is called manually because the PR Write Enrollments data to the block. #504 is being reviewed now. The PR has the code setting an enrolled height for an enrollment.

#492

@codecov
Copy link

codecov bot commented Feb 19, 2020

Codecov Report

Merging #563 into v0.x.x will increase coverage by 0.09%.
The diff coverage is 94.36%.

Impacted file tree graph

@@            Coverage Diff             @@
##           v0.x.x     #563      +/-   ##
==========================================
+ Coverage   88.52%   88.62%   +0.09%     
==========================================
  Files          58       58              
  Lines        4306     4377      +71     
==========================================
+ Hits         3812     3879      +67     
- Misses        494      498       +4
Flag Coverage Δ
#unittests 88.62% <94.36%> (+0.09%) ⬆️
Impacted Files Coverage Δ
source/agora/node/Node.d 90.36% <100%> (+0.61%) ⬆️
source/agora/test/Base.d 88.42% <100%> (+0.06%) ⬆️
source/agora/consensus/EnrollmentManager.d 86.62% <90%> (+0.37%) ⬆️
source/agora/test/EnrollmentManager.d 96.77% <97.14%> (+0.47%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 34ffcfa...2c1ab5c. Read the comment docs.

@@ -80,6 +80,16 @@ public class GossipProtocol
this.network.sendTransaction(tx);
this.ledger.tryNominateTXSet();
}

if (this.enroll_man.needRevealPreimage(this.ledger.getBlockHeight()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The needRevealPreimage check at that location causes unnecessary calls.
I think it would be better to check when the height of the block changes as a timing for checking next_reveal_height

@linked0
Copy link
Contributor Author

linked0 commented Feb 20, 2020

Ready for review.
All builds and tests have passed.

source/agora/consensus/EnrollmentManager.d Outdated Show resolved Hide resolved
- needRevealPreimage: function for checking if a certain height is revelation timing
- increaseNextRevealHeight: function for setting next block height to reveal pre-image
A validator reveals a pre-image every 1 hour, or 6 blocks and the pre-image has
the data for double the period because it make the validator to be able to skip
to next revelation timing in case of a failure of the revelation. The revelation
period could be configurable sometime.
@Geod24
Copy link
Collaborator

Geod24 commented Feb 23, 2020

Rebased on v0.x.x and fixed

@Geod24 Geod24 merged commit 0cdc928 into bosagora:v0.x.x Feb 24, 2020
Copy link
Contributor

@AndrejMitrovic AndrejMitrovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Post-merge review because I didn't have the chance to review this before.

@@ -120,6 +127,9 @@ public class EnrollmentManager

if (!results.empty)
this.enroll_key = results.oneValue!(ubyte[]).deserializeFull!(Hash);

// load next height for preimage revelation
this.next_reveal_height = this.getNextRevealHeight();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come setNextRevealHeight is not called here?

Copy link
Contributor Author

@linked0 linked0 Feb 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first, the next_reveal_height is set by calling setNextRevealHeight in updateEnrolledHeight which is called in addValidatedBlock of the ledger, which is being reviewed in the PR #504

@@ -309,6 +319,11 @@ public class EnrollmentManager
enroll_hash, ex);
return false;
}

// set next height for revealing a pre-image
if (enroll_hash == this.enroll_key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future we should separate these two functionalities of the EnrollmentManager:

  • Managing enrollments for other nodes
  • Managing enrollments for this node

Otherwise the implementation will become complex and full of state, like it has started to show.

Copy link
Contributor Author

@linked0 linked0 Feb 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future we should separate these two functionalities of the EnrollmentManager

I totally agree with you. I have a plan to split the current EnrollmentManager into EnrollmentManager and ValidatorSet


public bool getNextPreimage (out PreimageInfo preimage) @safe
{
auto height = this.next_reveal_height + PreimageRevealPeriod * 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the * 2 part? So will this actually reveal preimages for the next 12 blocks (2 hours), rather than 6 blocks (1 hour)?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was discussed during the whiteboarding session. Currently it's a bit convoluted, as you mentioned, we should use a sliding windows instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I missed the first part of the whiteboarding session when this happened.

{
auto height = this.next_reveal_height + PreimageRevealPeriod * 2;
if (height > ValidatorCycle - 1)
height = ValidatorCycle - 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks completely wrong? If the block height is 1009 then we will simply not reveal the preimage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the block height is 1009 then we will simply not reveal the preimage?

As previously mentioned, the next_reveal_height is set based on a current block height by calling setNextRevealHeight in updateEnrolledHeight which is called in addValidatedBlock of the ledger which is being reviewed in the PR #504


***************************************************************************/

public bool getNextPreimage (out PreimageInfo preimage) @safe
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are the unittests for this function?

Copy link
Contributor Author

@linked0 linked0 Feb 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are the unittests for this function?

I missed it. I will add it in a related PR.

public void increaseNextRevealHeight () @safe nothrow
{
ulong next_height = this.getNextRevealHeight();
if (this.next_reveal_height < ulong.max)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if there is a database error then the node will silently stop emitting preimages? The only way the user will know this is if they're actively monitoring the logs.

In the meantime the node stops emitting preimages, it gets slashed for not following the protocol, and the node owner loses funds. This is not good.

If there is an exceptional error that makes the node misbehave then we have to treat it as an error, not as a diagnostical issue. If the node can't reveal its preimages because of a database error then the node should shutdown with an error. The node operator can get notified when its node shuts down. That way the node operator gets the chance to fix the error (faulty disk or whatever else is causing the issue).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The node operator can get notified when its node shuts down.

Right. There needs a way to notify an abnormal status of a node to the operator through client UI.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The UI is a separate concern. Agora doesn't have to run with any UI at all. It just has to exit with a process code != 0, and then whichever initialization system is in use (systemd or otherwise) can notify the operator - whether it's through e-mail alerts, push notifications, etc. All of that stuff is external to the Agora node. But we must notify the operating system that we failed, and that means shutting down with an exit code.

Copy link
Contributor Author

@linked0 linked0 Feb 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the operators of nodes need to be familiar with command line environment. That's why I said about UI. Adding the code of assert or something in will be worked on one related PR or one issue.

}
catch (Exception ex)
{
log.error("Database operation error {}", ex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned before, this is a showstopper error. The node cannot reveal preimages, it should be shut down.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned before, this is a showstopper error. The node cannot reveal preimages, it should be shut down

There needs to add the code of assert(0) in a related PR.

}
catch (Exception ex)
{
log.error("Database operation error {}", ex);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same problem here. We are logging errors and then proceeding as if there was no error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same problem here. We are logging errors and then proceeding as if there was no error.

There needs to add the code of assert(0) in a related PR.

@@ -248,6 +248,16 @@ public class Node : API
this.network.sendTransaction(tx);
this.ledger.tryNominateTXSet();
}

if (this.enroll_man.needRevealPreimage(this.ledger.getBlockHeight()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the right place to put this. Why is it being triggered when we receive a Transaction? There might not be enough transactions in the pool to start a nomination. It should be triggered when the ledger's state changes (new block added / externalize was called), and only if we're not catching up on outdated blocks.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants