Make old Zebra versions eventually refuse to run #1870

Closed
7 tasks
Tracked by #3247 ...
teor2345 opened this issue Mar 9, 2021 · 10 comments · Fixed by #6351
Labels
A-rust Area: Updates to Rust code C-security Category: Security issues I-remote-node-overload Zebra can overload other nodes on the network I-usability Zebra is hard to understand or use NU-6 Network Upgrade: NU6 specific tasks S-needs-design Status: Needs a design decision

Comments

@teor2345
Collaborator

teor2345 commented Mar 9, 2021

Motivation

  1. Nodes stuck on outdated network upgrades can be attacked using a small amount of hash power. For example, this is a security issue for users if that Zebra instance backs a lightwalletd server.

  2. Each new release includes the latest checkpoints, which prevent long-range attacks under Proof of Work (this is even more important under Proof of Stake).

  3. Having an end of support halt puts an upper bound on how long security issues can persist on the network.

  4. If an old Zebra version launches after a network upgrade, it will keep trying each peer it gets from the DNS seeders, even though those peers have been upgraded and will reject its connections. This is a distributed denial of service risk, and it places extra load on the network.

Issue 4 is mitigated by:

Solution

Zebra should refuse to run on launch if a lot of blocks have been generated since the last release. The exact number of blocks isn't important, but it should be some fraction of the number of blocks (or the time) between network upgrades.

Zebra should exit when it reaches the end of support block height.
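
A minimal sketch of this check, assuming the release stores an estimated release height and a support window in blocks. The constant names, their values, and `support_status` are hypothetical and only illustrate the idea; they are not Zebra's actual API. The caller is assumed to supply the best known chain tip height.

```rust
/// Hypothetical height at which this release was tagged (illustrative value).
const ESTIMATED_RELEASE_HEIGHT: u32 = 2_000_000;
/// Hypothetical support window, in blocks (illustrative value).
const SUPPORT_WINDOW_BLOCKS: u32 = 200_000;

/// Whether the running release is still inside its support window.
#[derive(Debug, PartialEq, Eq)]
enum SupportStatus {
    /// The tip is below the end of support height.
    Supported { blocks_left: u32 },
    /// The tip is at or past the end of support height.
    Expired { blocks_past: u32 },
}

/// Compares the best known tip height with the end of support height.
/// A node would call this at launch and again as the tip advances,
/// and exit when it returns `Expired`.
fn support_status(tip_height: u32) -> SupportStatus {
    let end_of_support = ESTIMATED_RELEASE_HEIGHT + SUPPORT_WINDOW_BLOCKS;
    if tip_height < end_of_support {
        SupportStatus::Supported {
            blocks_left: end_of_support - tip_height,
        }
    } else {
        SupportStatus::Expired {
            blocks_past: tip_height - end_of_support,
        }
    }
}
```

Using a block height rather than the system clock means the check does not depend on the local time being correct, but it only triggers once the node has learned a recent tip height.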

Design Tasks:

  • work out the average time between recent network upgrades
  • decide how long Zebra should run before it requires an upgrade
  • decide which time we want to use as the "last change" time (this time must be stored somewhere in Zebra's source code):
    • an explicit support date constant
    • a date calculated from: a checkpoint date stored in the checkpoint files each time they are updated, and the target block time interval (this requires an update to zebra-checkpoints and the checkpoint parsing code)
  • get this design reviewed by other Zebra developers
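
The second "last change" option above can be sketched as follows, assuming a checkpoint is stored with a Unix timestamp and that blocks arrive at the 75-second post-Blossom target spacing. The function name and parameters are hypothetical, and the result is only an estimate because real block times vary.

```rust
/// Zcash target block spacing after the Blossom upgrade, in seconds.
const TARGET_SPACING_SECS: u64 = 75;

/// Estimates the current chain height from a stored checkpoint.
///
/// `checkpoint_height` and `checkpoint_time` (Unix seconds) are assumed to
/// come from the checkpoint file; `now` is the current Unix time in seconds.
fn estimated_height(checkpoint_height: u32, checkpoint_time: u64, now: u64) -> u32 {
    // If the clock is behind the checkpoint, assume no blocks have elapsed.
    let elapsed_secs = now.saturating_sub(checkpoint_time);
    let elapsed_blocks = elapsed_secs / TARGET_SPACING_SECS;
    // Clamp before narrowing so the conversion cannot wrap.
    let elapsed_blocks = elapsed_blocks.min(u32::MAX as u64) as u32;
    checkpoint_height.saturating_add(elapsed_blocks)
}
```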

Implementation Tasks:

  • make Zebra exit on launch if a long time has passed since the "last change" time chosen during the design process

Testing:

  • Make a test that fails if the end of support height is in the next few weeks, so we don't forget to update it
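
One possible shape for that test, as a sketch: a helper that reports whether the end of support height falls within a safety margin of an estimated tip. The helper name, the margin, and the way the tip is estimated are all assumptions; 1,152 blocks per day follows from the 75-second target spacing.

```rust
/// Blocks per day at the 75-second target spacing (86_400 / 75).
const BLOCKS_PER_DAY: u32 = 1_152;

/// Returns true if the end of support height is within `margin_days`
/// of the estimated tip, or has already passed. A release test could
/// assert that this is false for a margin of a few weeks.
fn end_of_support_is_near(estimated_tip: u32, end_of_support: u32, margin_days: u32) -> bool {
    let margin_blocks = margin_days.saturating_mul(BLOCKS_PER_DAY);
    estimated_tip.saturating_add(margin_blocks) >= end_of_support
}
```

A test built on this depends on when it is run, so it would start failing without any code change once the margin is reached. That is the intent here, since the failure is the reminder to update the height.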

Documentation:

  • Add "update the end of support halt" to the release checklist

Alternatives

This is a serious security issue, so we must do something. But it won't have a major impact until a lot of Zebra nodes are deployed and run for a long time.

Context

zcashd has a support interval of around 16 weeks between required upgrades, with new versions coming out every 6 weeks. It's important that the end of support halt is not a multiple of the release cycle.

This is similar to Zebra's 4 month peer address limit in #1865.

Here is the initial zcashd issue:

@teor2345 teor2345 added C-bug Category: This is a bug A-consensus Area: Consensus rule updates A-rust Area: Updates to Rust code S-needs-design Status: Needs a design decision S-needs-triage Status: A bug report needs triage NU-5 Network Upgrade: NU5 specific tasks P-Medium C-security Category: Security issues I-consensus Zebra breaks a Zcash consensus rule I-hang A Zebra component stops responding to requests I-heavy Problems with excessive memory, disk, or CPU usage I-slow Problems with performance or responsiveness I-usability Zebra is hard to understand or use I-remote-node-overload Zebra can overload other nodes on the network labels Mar 9, 2021
@teor2345 teor2345 added this to No Estimate in Effort Affinity Grouping via automation Mar 9, 2021
@teor2345 teor2345 added this to To Do in 🦓 via automation Mar 9, 2021
@teor2345 teor2345 moved this from No Estimate to S - 3 in Effort Affinity Grouping Mar 9, 2021
@mpguerra mpguerra removed the S-needs-triage Status: A bug report needs triage label Mar 10, 2021
@str4d
Contributor

str4d commented Mar 11, 2021

The standard zcashd support period is actually 16 weeks, not 6. This ensures we usually have two supported releases available given our 6-week release cycle (with some slack for slippage), which gives zcashd users around 3-4 months in which to perform upgrades.

We do usually shorten the support period (if necessary) for the last release prior to supporting a NU on mainnet, to ensure that the only supported zcashd releases at NU activation will follow the NU.

@teor2345
Collaborator Author

The standard zcashd support period is actually 16 weeks, not 6.

Thanks, I've fixed the ticket.

@teor2345
Collaborator Author

This is a serious security issue, but it's not required before NU5 activation.

@teor2345 teor2345 removed this from the 2021 Sprint 10 milestone Mar 24, 2021
@teor2345 teor2345 added A-rust Area: Updates to Rust code P-Medium and removed C-bug Category: This is a bug I-hang A Zebra component stops responding to requests I-slow Problems with performance or responsiveness I-consensus Zebra breaks a Zcash consensus rule A-rust Area: Updates to Rust code P-High labels Nov 8, 2021
@mpguerra mpguerra removed this from the 2021 Sprint 24 milestone Nov 8, 2021
@mpguerra
Contributor

One for the ziggurat team?

@teor2345
Collaborator Author

teor2345 commented Jun 2, 2022

We don't want to do this now, it seems to cause a lot of problems.

@teor2345 teor2345 closed this as not planned Jun 2, 2022
@teor2345 teor2345 added NU-6 Network Upgrade: NU6 specific tasks and removed A-consensus Area: Consensus rule updates NU-5 Network Upgrade: NU5 specific tasks I-heavy Problems with excessive memory, disk, or CPU usage labels Nov 7, 2022
@teor2345
Collaborator Author

teor2345 commented Nov 7, 2022

ECC devs asked us about adding an automated halt to each Zebra version, so I'm going to re-open this ticket.

Something to discuss between ECC and ZF in the future is what Zebrad is planning to do with respect to EOS halts. In zcashd, each release has a strictly limited lifetime, which is really useful for when we want to coordinate network upgrades. Is Zebrad interested in doing something similar, and if so should we consider codifying the approach as a ZIP?
str4d also mentioned zcashd's potential for using alert keys to put the network into a safe mode; what's zebrad doing there?

https://discord.com/channels/809218587167293450/809251050741170187/1037816158067363882

We might want to do this before the first stable release of Zebra, or before NU6 testnet activation.

@teor2345 teor2345 reopened this Nov 7, 2022
🦓 automation moved this from To Do to In progress Nov 7, 2022
@teor2345 teor2345 changed the title from "Decide if old Zebra versions should eventually refuse to run" to "Make old Zebra versions eventually refuse to run" Mar 1, 2023
@teor2345
Collaborator Author

teor2345 commented Mar 1, 2023

@mpguerra the ECC engineers have suggested that we make this change for security reasons. Can we please schedule it in before Zebra does its first stable release?

(If we change it after doing our first stable release, or after we pick up a lot of users, they will be very surprised.)

@oxarbitrage
Contributor

I created a pull request for this here

By default, this pull request ends support for a release 180 days (6 months, or around 12 releases) after its creation.

The last 2 network upgrades in Zcash, Canopy and NU5, are a year and a half apart, and I am assuming that the gap before NU6 will not be much less than that. The number can be adjusted. I am open to discussing it: 6 months sounds reasonable to me, but it is a more or less arbitrary choice.
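
For a sense of scale, here is a small sketch that converts a support period in days to a block count, assuming the 75-second post-Blossom target block spacing. The function name is hypothetical and is not taken from the pull request.

```rust
/// Zcash target block spacing after the Blossom upgrade, in seconds.
const TARGET_SPACING_SECS: u32 = 75;
const SECS_PER_DAY: u32 = 86_400;

/// Converts a support period in days to an approximate number of blocks.
const fn days_to_blocks(days: u32) -> u32 {
    days * (SECS_PER_DAY / TARGET_SPACING_SECS)
}
```

Under that assumption, `days_to_blocks(180)` is 207,360 blocks, and zcashd's 16-week period (112 days) is 129,024 blocks.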

@dconnolly
Contributor

Could go in the existing progress task?

@teor2345
Collaborator Author

I updated this ticket based on the sync discussion today.

@mergify mergify bot closed this as completed in #6351 Apr 28, 2023
5 participants