
High-watermark feature #696

Closed
beetrootkid opened this issue Jan 4, 2023 · 6 comments
Labels
TeamGroot Under active development by TeamGroot @PegaSys

@beetrootkid

You recently implemented the watermark-repair subcommand, which specifies that below a certain epoch, Web3Signer will refuse to sign. Wanted to ask: would it be simple to implement a similar feature, but where Web3Signer would instead refuse to sign beyond a certain epoch?

Context: we are running web3signer + vault + slashing db in an Azure cluster (let's call it the old cluster). Now we'd like to move it to a new cluster in AWS. In theory, we could achieve this safely with near-zero downtime if:

  • old cluster was running with --do-not-sign-after-epoch=160000
  • new cluster was running with --do-not-sign-before-epoch=160000 (not sure how to use watermark-repair, but I assume it accomplishes the same as this illustrative flag)

After epoch 160000, we would tear down the old cluster and keep only the new cluster. The end goal for us is to move keys from one vault to another. That means we'd have to load the same set of validator public keys into 2 different vaults and 1) not be slashed, 2) have minimal downtime, and 3) keep the process simple, without data migrations. We would expect to perform this scenario only when strictly necessary. If there's a better way to achieve the same result, let us know. Thanks!
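The key property of the scenario above is that the two clusters' signing windows never overlap. A minimal sketch (the flag names and `may_sign` predicates are illustrative assumptions, not Web3Signer's actual implementation):

```python
# Illustrative sketch of the proposed migration: the old cluster refuses
# to sign at or beyond the cutover epoch, the new cluster refuses to sign
# below it. Flag names below are the hypothetical ones from this issue.
CUTOVER_EPOCH = 160000

def old_cluster_may_sign(epoch: int) -> bool:
    # --do-not-sign-after-epoch=160000 (illustrative flag)
    return epoch < CUTOVER_EPOCH

def new_cluster_may_sign(epoch: int) -> bool:
    # --do-not-sign-before-epoch=160000 (illustrative flag)
    return epoch >= CUTOVER_EPOCH

# For any epoch, exactly one cluster is willing to sign, so the same
# validator keys loaded in both clusters cannot double-sign.
for epoch in (159999, 160000, 160001):
    assert old_cluster_may_sign(epoch) != new_cluster_may_sign(epoch)
```

Because the windows are disjoint, no message can be signed by both clusters, which is what makes the near-zero-downtime cutover slashing-safe.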

@jframe jframe added the TeamGroot Under active development by TeamGroot @PegaSys label Jan 4, 2023
@non-fungible-nelson
Contributor

@jframe let's look to prioritize this in the next few sprints.

@beetrootkid
Author

Hi, just following up: what is the status of this feature?

@non-fungible-nelson
Contributor

Prioritizing in our next release. @jframe FYSA

@siladu siladu self-assigned this Aug 28, 2023
@jframe jframe added the doc-change-required Indicates an issue or PR that requires doc to be updated label Aug 29, 2023
@siladu
Contributor

siladu commented Aug 30, 2023

@beetrootkid We are approaching a solution and would like some feedback please...

For safety reasons, specifically to protect against misconfiguration when two Web3Signer instances share the same database, we prefer to use the database for storing the high-watermark.

In order to protect against clock discrepancies in CL clients that are still running in the old cluster, we must leave the high-watermark in place until all instances using it are decommissioned.

Therefore the current preference is to use an 'offline' command similar to the current watermark-repair, for example:
web3signer eth2 set-high-watermark --epoch=<same source and target epoch> --slot=<slot>

This would only need to be executed once on the old cluster's database.
The existing web3signer eth2 watermark-repair --epoch=<same source and target epoch> --slot=<slot> would need to be executed on the new cluster's database to set the low-watermark.

This is captured in the following ACs...

Acceptance Criteria

  1. Web3signer won't sign anything at or beyond the high-watermark (Web3signer already won't sign anything below the low-watermark)
  2. Web3signer operator is responsible for setting a good high-watermark that is in the future (so good documentation is needed)
  3. Operations that would automatically set the low-watermark to be greater than the high-watermark after it is set should invalidate and abort
    i. Importing new slashing protection data (via subcommand or keymanager api) with a minimum slot/epoch that is greater than the high-watermark
    ii. Using watermark-repair to set low-watermark as greater than the existing high-watermark
  4. high-watermark should not be automatically removed as a side-effect of another operation to avoid unintended safety issues
  5. high-watermark must be the same value for all 'old instances' of web3signer
  6. low-watermark must be the same value for all 'new instances' of web3signer
  7. Should only be able to set high-watermark for all validators rather than a subset
  8. high-watermark must remain in place until all incoming sign requests from the old cluster are handled
  9. Exported slashing protection data will not include any watermark information
  10. Pruning should have no impact on the high-watermark since the signedData should not progress once the high-watermark is reached
  11. Disabled validators should not be affected by this functionality (they should continue to be disabled and never sign)
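Acceptance criterion 1 amounts to a gate on every signing request combining both watermarks. A minimal sketch, with names assumed for illustration rather than taken from Web3Signer's code:

```python
# Sketch of AC 1: refuse to sign below the low-watermark or at/beyond the
# high-watermark. Either watermark may be unset (None). Function and
# parameter names are illustrative, not Web3Signer's actual API.
from typing import Optional

def may_sign(epoch: int,
             low_watermark: Optional[int],
             high_watermark: Optional[int]) -> bool:
    if low_watermark is not None and epoch < low_watermark:
        return False  # existing behaviour: below the low-watermark is refused
    if high_watermark is not None and epoch >= high_watermark:
        return False  # proposed behaviour: at or beyond the high-watermark is refused
    return True
```

Note the asymmetry: the low-watermark bound is exclusive (signing at the low-watermark epoch is allowed), while per AC 1 the high-watermark bound is inclusive ("at or beyond").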

Notes

  • What if we set the high-watermark in the past?

    • by default, the validator(s) will probably just stop signing, since the high-watermark is checked on the next signature attempt
    • we have discussed solutions for validating this, but it is complicated to protect against this at the point when we set the high-watermark
  • We don't enforce setting low-watermark for all validators with watermark-repair - could that lead to issues if low is incorrectly set for only a subset of validators?

    • Should setting high-watermark follow this or should we have a more specific migrate command that sets both for all validators?
  • Can we reset or remove the high-watermark? Probably need to at least reset to account for user error.

  • Do we need to lock database when setting high-watermark?

Example command suggestions

web3signer eth2 watermark-repair --set-high-watermark --epoch=x --slot=y
web3signer eth2 watermark-repair --remove-high-watermark

web3signer eth2 set-high-watermark --epoch=x --slot=y
web3signer eth2 remove-high-watermark

web3signer eth2 migrate --epoch=x --slot=y --[old|set-high-watermark] (sets all validators to same high-watermark)
web3signer eth2 migrate --epoch=x --slot=y --[new|set-low-watermark] (sets all validators to same low-watermark)

@joaocenoura

Your proposed approach makes sense for our use case. It's worth considering adding a way to query the current high-watermark for confirmation purposes (could be a new GET endpoint).

@siladu
Contributor

siladu commented Aug 30, 2023

Tasks

  • Add high_watermark data to metadata db table
  • CRUD operations in MetadataDao
  • Prevent signing at or beyond high-watermark
  • Invalidate low-watermark operations that conflict with set high-watermark (slashing data import and watermark-repair)
  • New endpoint to query current high-watermark
  • Remove --validator-ids option from watermark-repair
  • New/Update existing subcommand to set/reset/remove high-watermark for all validators
  • Acceptance tests
  • Performance impact

6 participants