CIP-0047? | Hardfork safety mechanism #318

Closed
206 changes: 206 additions & 0 deletions CIP-hardforkSafeguard/README.md
@@ -0,0 +1,206 @@
---
CIP: ?
Title: Hardfork Safeguard via Stake Representation
Authors: Jared Corduan <jared.corduan@iohk.io>
Status: Draft
Type: Standards Track
Created: 2022-06-30
License: CC-BY-4.0
---

## Simple Summary / Abstract

This CIP replaces a manual safety check regarding the readiness of the network
for a hardfork with an automatic check.

Ever since the Shelley ledger era, block headers have included a protocol
version indicating the maximum supported protocol version that the block
producer is capable of supporting (see section 13, Software Updates, of the
[Shelley ledger specification](https://hydra.iohk.io/job/Cardano/cardano-ledger/shelleyLedgerSpec/latest/download-by-type/doc-pdf/ledger-spec)).

This (semantically meaningless) field provides a helpful metric for determining
how many blocks will be produced after a hardfork,
since nodes that have not upgraded will no longer produce blocks.
(Nodes that have not upgraded will fail the `chainChecks` check from Figure 74
of the Shelley ledger specification, since the major protocol version in the
ledger state will exceed the node's max major protocol version value,
and hence can no longer make blocks.)

If most of the blocks in the recent past (e.g. the last epoch) are
broadcasting their readiness for a hardfork,
we know that it is safe to propose an update to the major protocol version
which triggers a hardfork.

This CIP proposes automating this process,
and making the protocol version in the header semantically meaningful.
The ledger state will determine the stake
(represented as the proportion of the active stake) of all the block producers
whose last block contained the next major protocol version.
Moreover, a new protocol parameter `hardforkThreshold` will be used to reject any
protocol parameter update that proposes to change the major protocol version
but does not have enough backing stake.

Reviewer:

Can you clarify what "automating the process" means in this context?
I can see in your description how this could be used to prevent a hardfork if there is insufficient backing - but what isn't explained is how such a hardfork would be initiated. Who could make such a change proposal? Is that also automated? Is it anyone who can produce blocks (so any SPO for example)? Or just some specific entity / entities?

Contributor Author:

Let me see if I can explain it well enough here, and if it makes sense I'll add it to the CIP.

The current manual process is: around the time of a hard fork, humans use a tool like db-sync to see which stake pools are posting blocks with the current major protocol version in the block header, and which ones have the current major protocol version plus one. If the holders of the governance keys decide "yea, looks good", and they are otherwise ok with the hard fork (I do not myself know all that goes into the operational side of this, and am not proposing automating anything besides this single check), then they submit update proposals. If 5 of 7 agree on the exact proposals, the change occurs on the epoch boundary.

Reviewer:

Yup, that makes total sense... I guess what I asked for was some clarification on what exactly is meant by "automating the process" - so in the brave new world, how would this work. Is it still one of the seven proposing, at least five of seven agreeing, but THEN (after that) there is ALSO the technical requirement proposed here?
Or does this change the first part of the process as well, i.e. can others propose that change? Who needs to approve before this technical requirement comes into play.

I am thinking (but could be totally wrong) that right now you are only focused on the negative case, i.e. this CIP only deals with creating a hurdle that prevents a change and doesn't try to change any other part of the process. But I think this could be somewhat clearer in this paragraph of the explanation.

Contributor Author:

> I am thinking (but could be totally wrong) that right now you are only focused on the negative case, i.e. this CIP only deals with creating a hurdle that prevents a change and doesn't try to change any other part of the process.

That is exactly right! This CIP does not intend to change the off-chain process in any way. It's just a final blocker to stop a hard fork that is not properly endorsed by the SPOs.

Contributor Author:

I just realized I lied 😆

> This CIP does not intend to change the off-chain process in any way.

This CIP intends to lift one very specific off-chain process to the protocol itself, namely looking through block headers to see who is signaling their readiness.

Contributor Author:

I've tried to make this clear now.


## Motivation / History

Currently, the governance key holders collectively agree to increase the major
protocol version. This allows them to make a human judgment as to the
readiness of the network for a hardfork (using the mechanism described above).
Since only a few, well-aligned parties are involved, this is currently easy to
coordinate. As we move into the Voltaire phase, where the governance of the
network is decentralized,
it is imperative that we codify this human judgment in the protocol itself.

## Specification

### New Protocol Parameter

There will be a new protocol parameter named `hardforkThreshold`,
containing a rational number.

The bounds of `hardforkThreshold` need to be considered with care
so that unsafe values are not possible and to place checks and balances on the
governance mechanism.
The minimum value should be greater than a half, and the maximum value should be
less than one.
The exact bounds need further, careful consideration.

### Tracking Hardfork endorsements

The ledger state will maintain a set of stake pool IDs corresponding to the

Collaborator @SebastienGllmt (Aug 20, 2022):

One problem this doesn't address is what if you have competing proposals? The scheme described in this CIP only allows for sequential votes. The way this CIP is structured, I don't think this even allows you to propose two competing upgrades one after the other, because the current structure doesn't have an expiry on upgrade proposals (other than maybe skipping version numbers if a version is deemed to have failed).

Contributor Author:

There are strict rules on how the protocol version can be increased:

https://github.com/input-output-hk/cardano-ledger/blob/7e2f674d2a2d14752d4c2d5abf60b26ae015b9e2/eras/shelley/impl/src/Cardano/Ledger/Shelley/PParams.hs#L519-L520

`(m + 1, 0) == (m', n') || (m, n + 1) == (m', n')`

Either the major is increased by exactly one (and the minor reset to zero), or the minor is increased by exactly one (and the major remains unchanged).

Moreover, this proposal is only putting in a safeguard for hardforks (major number increase). So it is always clear what the broadcasted protocol version in the block header is referring to.

Maybe this is only clear if I also explain how the existing protocol parameter update system works? During the voting window, each governance key can propose a change (they can submit multiple proposals, but the latest one overrides the previous) for the end of the current epoch. If quorum is met, the change happens, otherwise nothing happens and the voting state resets. After the voting window, each governance key can stage a vote for the next epoch, which behaves exactly as though they waited until the next epoch and placed a vote during the next voting window.


> current structure doesn't have an expiry on upgrade proposals

I think I've explained this above as well. The current structure has harsh expirations. Or am I misunderstanding what you meant?

Contributor Author:

I've added a sentence to the end of this paragraph, let me know if it's clear.

Reviewer:

I don't think you actually addressed the real question - and this is related to my comment above.
Who gets to make such proposals? Let's say entity A wants to make a change to parameter X and therefore submits a proposal to change the protocol version to (m+1, 0) - and entity B wants to leave parameter X unchanged, but wants to change parameter Y instead. They also submit a proposal. Which protocol version would be assigned to that? How would a determination be made as to which of the changes to proceed with? How would the SPOs indicate which of the proposals they endorse?

Contributor Author:

> Let's say entity A wants to make a change to parameter X and therefore submits a proposal to change the protocol version to (m+1, 0) - and entity B wants to leave parameter X unchanged, but wants to change parameter Y instead. They also submit a proposal. Which protocol version would be assigned to that?

I guess there are two issues here:

  • how the protocol parameter update system works
  • how stake pool operators endorse a hard fork

The answer to the first question is:

Whether or not the governance system will move to change the protocol version to (m+1, 0) depends not just on A and B, but also on the other five governance entities (the quorum is 5 of 7 on mainnet). The current system is very basic: at least five of the keys must agree on the entire set of changes. So if entities A - G all want to change Y to 42, but only four of the entities want to change the protocol version to (m+1, 0), nothing is changed at all, not even Y.

The answer to the second question is:

Regardless of what the governance body is doing, if you are a stake pool operator and you are aware of a software update that prepares a hard fork, let's say introducing protocol version (m+1, 0), you can:

  • signal your willingness to enact the hardfork by placing m+1 in your block headers (the new software will actually do this for you)
  • signal your unwillingness to enact the hardfork by placing m in your block headers (the old software will actually do this for you)

If the block producers posting m+1 do not represent enough stake, no update proposal which changes the major version to m+1 can take effect, even if quorum is met and even if other protocol parameters were also slated to change.


I can try to summarize this in the CIP, since clearly it's still not clear.

Contributor Author:

I've tried again to make this point more clear. Let me know if it is still murky.

Reviewer:

What happens if 20% of SPOs upgrade, then a bug is found? 50% of the originally upgraded SPOs then install the new, fixed major release, and enough other SPOs do as well, so the hard fork is successful. But 10% of the SPOs are still running the buggy pre-fix release, which has the same version number?

Contributor Author:

That's an excellent point @WarriorField , and I'm embarrassed I did not think to address it in this CIP. It's far from just a theoretical concern, this came up quite recently, and this is what we did:

https://github.com/input-output-hk/cardano-node/blame/8832f86728ef6a425452b44f2f269acde149448c/cardano-node/src/Cardano/Node/Protocol/Cardano.hs#L201-L206

We fiddled with the minor version. And of course this only worked since the process is still manual. This CIP should address this, thank you!

block producers whose last block endorsed the next major protocol version.
Endorsing here means that the major protocol version in the block header is
exactly one more than the current major protocol version in the ledger state.
Note that the protocol version in the block header is set by the particular
cardano-node release being used by the block producer.
When no hardfork is anticipated, the node will be configured to place the
current major protocol version in the block header, indicating that the node
is not ready for any hardfork.
When a new release is introduced which can handle an upcoming hardfork,
the node will be configured to use the next major protocol version in the
block header.
Note also that there is no ambiguity regarding what the endorsement in the
block header is referring to, since the major protocol version is only allowed
to increase by one.

Whenever the major protocol version is updated, the set of endorsements is
reset to the empty set.

In order to track the endorsements, the `TICK` ledger rule will need two items
added to the environment (since the Shelley era, the `TICK` rule has had an
empty environment).
In particular, it will need the following from the block header:
* The pool ID of the block producer
* The major protocol version
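
As a rough illustration of the bookkeeping described in this section, the following Python sketch keeps the endorsement set up to date as block headers arrive. The names `EndorsementState`, `observe_header`, and `on_major_version_change` are hypothetical and do not correspond to anything in cardano-ledger. The sketch only assumes that each header exposes the producer's pool ID and a major protocol version, and that a pool is dropped from the set when its latest block no longer endorses.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class EndorsementState:
    """Hypothetical slice of the ledger state relevant to endorsements."""

    current_major: int  # major protocol version in the ledger state
    endorsements: Set[str] = field(default_factory=set)  # pools whose last block endorsed

    def observe_header(self, pool_id: str, header_major: int) -> None:
        """Process the two header fields the TICK rule would receive."""
        if header_major == self.current_major + 1:
            # Endorsing means exactly one more than the current major version.
            self.endorsements.add(pool_id)
        else:
            # Only the pool's most recent block counts (assumption of this sketch).
            self.endorsements.discard(pool_id)

    def on_major_version_change(self, new_major: int) -> None:
        """Reset the endorsement set whenever the major version is updated."""
        self.current_major = new_major
        self.endorsements.clear()
```

Keeping only pool IDs, rather than per-block history, is enough here because the rule depends solely on each producer's most recent block.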

### Rejecting Updates

The main point of the safeguard introduced in this CIP is the ability to reject
protocol parameter updates which propose to increase the major protocol version
when not enough block producers are prepared.
The rejection will happen in both the consensus layer and the ledger layer.

The timing of the rejection is critical, and requires understanding a bit about
the timing of the hardfork combinator (see the diagram below).
Ouroboros (both Praos and Genesis) has a notion of a stability window:
the number of slots after which the consensus mechanism will
no longer roll back a block.
The stability window is currently three tenths of the epoch length
(36 hours on mainnet).
The hardfork combinator requires that the changes to the ledger state which
enact a hardfork (confirmed proposals to increase the major protocol version)
be stable two stability windows before the end of the epoch.
See section 17.4, Ledger restrictions, of the
[consensus report](https://github.com/input-output-hk/ouroboros-network/tree/314845c4087bc6e662d7df0d376ab1910a5b5476/ouroboros-consensus/docs/report).
Therefore protocol parameter updates for the next epoch boundary must be
submitted during the first four tenths of the epoch.
Call this first four tenths of each epoch the "proposal window" for the
purposes of this document.
The consensus layer
[analyzes](https://github.com/input-output-hk/ouroboros-network/blob/314845c4087bc6e662d7df0d376ab1910a5b5476/ouroboros-consensus-shelley/src/Ouroboros/Consensus/Shelley/Ledger/Inspect.hs#L77-L96)
the ledger state one stability window after the proposal window has ended to
determine if the major protocol version will be increased at the next epoch
boundary.
The ledger itself does not apply the protocol parameter update until the
epoch boundary.
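
The arithmetic behind these windows can be checked with a short Python sketch. The fractions and the five-day mainnet epoch come from the text above; the function name and dictionary keys are illustrative only.

```python
from fractions import Fraction


def hardfork_timing(epoch_hours: int) -> dict:
    """Return the key offsets (in hours from epoch start) described above."""
    stability_window = Fraction(3, 10) * epoch_hours     # three tenths of the epoch
    proposal_window_end = Fraction(4, 10) * epoch_hours  # first four tenths of the epoch
    # Consensus inspects the ledger state one stability window after
    # the proposal window has ended.
    consensus_check = proposal_window_end + stability_window
    return {
        "stability_window": stability_window,
        "proposal_window_end": proposal_window_end,
        "consensus_check": consensus_check,
        "hours_left_after_check": epoch_hours - consensus_check,
    }


timing = hardfork_timing(120)  # 5 days * 24 hours
for name, hours in timing.items():
    print(f"{name}: {hours} hrs")
```

For a 120-hour epoch this gives a 36-hour stability window, a proposal window ending at 48 hours, and a consensus check at 84 hours, leaving 36 hours before the epoch boundary.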

To apply the new safeguard, the consensus layer will now use new logic for
determining if the major protocol version will be increased, and the ledger will
need to use the exact same logic on the epoch boundary.
The new logic will take the same parameters that are currently being
used to make the determination.
See [protocolUpdates](https://github.com/input-output-hk/ouroboros-network/blob/314845c4087bc6e662d7df0d376ab1910a5b5476/ouroboros-consensus-shelley/src/Ouroboros/Consensus/Shelley/Ledger/Inspect.hs#L106-L110),
and note that the set of endorsements, the stake distribution, and the
protocol parameters will all be included in what the consensus layer calls the
`LedgerState`, and what the ledger layer calls the `NewEpochState`
(the endorsements will be added to `LedgerState`, but the pool stake
distribution and the protocol parameters are already included).

The new logic for determining if the major protocol version will change is:
* Has quorum been met on the proposed protocol parameter updates?
  * If not, there is nothing else to do.
  * If so, proceed.
* Does the update modify the major protocol version?
  * If not, the update will be applied on the epoch boundary, and there is
    nothing else to do.
  * If so, proceed.
* What is the sum of the relative, active stake of the block producers listed
  in the endorsement set defined in the
  [previous section](#tracking-hardfork-endorsements)?
  Note that the stake distribution used here is the same as the stake
  distribution currently being used for block production.
* Is the sum computed above at least as large as the value of the
  `hardforkThreshold` protocol parameter?
  * If not, the entire update is rejected.
  * If so, the update will be applied on the epoch boundary.
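
The decision steps above can be sketched in Python as follows. This is a minimal illustration, not the ledger implementation: `update_outcome` and its parameters are hypothetical names, `pool_stake` is assumed to map each pool ID to its relative active stake, and a pool in the endorsement set with no entry in the stake distribution is assumed to contribute zero.

```python
from fractions import Fraction
from typing import Dict, Set


def update_outcome(
    quorum_met: bool,
    changes_major_version: bool,
    endorsements: Set[str],
    pool_stake: Dict[str, Fraction],
    hardfork_threshold: Fraction,
) -> str:
    """Return "no update", "apply", or "reject" for a proposed update."""
    if not quorum_met:
        return "no update"  # nothing else to do
    if not changes_major_version:
        return "apply"  # ordinary update, applied on the epoch boundary
    # Sum the relative active stake of every endorsing block producer.
    endorsing_stake = sum(
        (pool_stake.get(pool_id, Fraction(0)) for pool_id in endorsements),
        Fraction(0),
    )
    if endorsing_stake >= hardfork_threshold:
        return "apply"
    return "reject"  # the entire update is rejected, not just the version bump
```

Exact rationals are used so that the comparison against `hardforkThreshold` is not affected by floating-point rounding, which matters because the consensus layer and the ledger must reach the same answer.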

#### Timing diagram

The following diagram illustrates the timing described above,
using the durations on mainnet (five-day epochs).

```mermaid
sequenceDiagram
participant s0 as Epoch Start
participant s1 as 12 hrs
participant s2 as 24 hrs
participant s3 as 36 hrs
participant s4 as 48 hrs
participant s5 as 60 hrs
participant s6 as 72 hrs
participant s7 as 84 hrs
participant s8 as 96 hrs
participant s9 as 108 hrs
participant sA as Epoch End

s0->s4: Proposal Window
s4->s7: Ledger state stabilization Window

Note over s7: Consensus to<br>determine if<br>hardfork will occur
Note over sA: Non-rejected<br>updates are applied<br>to ledger state
```

## Rationale

The safeguard presented in this CIP aligns very closely with the manual check
currently performed before any hardfork.
Moreover, we have strived to make the minimal changes needed to automate

Reviewer:

This is actually only part of the criteria currently being used. SPO block creation ratio, defi TVL criterion, exchange adoption criterion. I'm not saying that the other two should be encoded here - but it would be reasonable to mention them in the Rationale

Contributor Author:

Maybe I can make it more clear that this CIP is only aiming to automate one very specific check? I myself do not know the whole process.

Reviewer:

Thanks for that clarification. Yes, that would be very appreciated. I clearly read way too much into what you were intending to change.

the check.

## Backwards compatibility

This change is not backwards compatible; it requires a hardfork.
Since it only adds a new safeguard to the ledger rules, however,
no changes are needed to the serialization or to any downstream components.

## Path to Active

A hardfork is required for these changes.
A new ledger era is needed, containing the changes described.
The consensus layer will require minimal changes, namely
support for the new ledger era and adopting the new logic for determining if a
hardfork is imminent.

## Copyright

This CIP is licensed under Apache-2.0

Reviewer:

I know this is a silly nit-pick, but you have a license under the Copyright header - and that license is inconsistent with the one you list in line 8 above.

Contributor Author:

Thank you for catching this! I will ask about what I am supposed to use, I admit I just copied those from an existing CIP created by an IOG employee.

Reviewer:

Speaking with my "I am not a lawyer but I have spent way too much time working on licenses" hat on... I'd suggest going with the license in line 8 (CC-BY-4.0), as that is a great license for documents. Apache v2 is a very good choice for code (as it makes it easy to reuse and clarifies patent concerns), but doesn't really make a lot of sense if you apply it to documents.

Contributor Author:

done!