
[WIP] The Merge #2229

Closed · wants to merge 9 commits

Conversation

@mkalinin (Collaborator) commented:

The Merge

Work in progress. Introduces spec changes implementing the executable beacon chain proposal.

The executable beacon chain is the version of Ethereum Mainnet whose PoS consensus is driven by the Beacon Chain. The consensus upgrade (from PoW to PoS) is the core change coming with the Merge.

Based on

Related

Implementations

TODO

  • make the spec executable
  • run existing tests with stubbed application payload
  • add tests covering application payload
  • rebase to Altair once ready
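The "stubbed application payload" item could look roughly like the sketch below. The field names here are illustrative assumptions, not the PR's actual `ApplicationPayload` definition; the point is only that a zeroed payload lets existing beacon chain tests run through the new code path without a real execution client.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative stub only: field names are assumptions, not the PR's actual
# ApplicationPayload container. A zeroed payload lets existing beacon chain
# tests exercise the new code path without a real execution client.
@dataclass
class ApplicationPayloadStub:
    block_hash: bytes = b"\x00" * 32
    gas_limit: int = 0
    transactions: List[bytes] = field(default_factory=list)

def stub_payload() -> ApplicationPayloadStub:
    # Existing tests can plug this into BeaconBlockBody unchanged.
    return ApplicationPayloadStub()
```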

@protolambda (Collaborator) commented:

Marking as "do not merge" on request by @mkalinin. Looking forward to the merge of the merge though!

@protolambda protolambda added the Bellatrix CL+EL Merge label Mar 11, 2021
```python
def process_application_payload(state: BeaconState, body: BeaconBlockBody) -> None:
    """
    Note: This function is designed to be able to be run in parallel with
```

A Contributor commented on this snippet:
It's worth noting that the other process_block functions in eth2 take ~2ms on average hardware, so pragmatically there is not much to be gained from parallelization here. I would guess this falls in the category of a "nice touch": it makes testing and reasoning easier, for example, when functions are "pure" in the order-independent sense.
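The "pure, order-independent" property described above can be sketched as follows. The sub-function names and toy state are hypothetical, not from the spec; the point is that functions reading the same pre-state and writing disjoint fields can be dispatched concurrently and merged in any order.

```python
from concurrent.futures import ThreadPoolExecutor
from copy import deepcopy

# Hypothetical toy sub-functions: each reads only the pre-state and
# produces writes to its own disjoint set of fields.
def process_slashings(state: dict) -> dict:
    return {"slashed": state["balance"] // 100}

def process_payload(state: dict) -> dict:
    return {"payload_root": hash(state["payload"]) & 0xFFFF}

def process_block_parallel(state: dict) -> dict:
    pre = deepcopy(state)  # every task sees the same immutable pre-state
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(f, pre) for f in (process_slashings, process_payload)]
        post = dict(state)
        for fut in futures:
            post.update(fut.result())  # disjoint keys, so merge order is irrelevant
    return post
```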

In general, making the eth1 state transition synchronous here leaves smaller margins for attestation production, an issue we are already seeing during finalization today, when the beacon chain is doing "nothing". It would be good to define behaviors in the validator guide for when eth1 is momentarily unable to produce a block on time (for whatever reason: a hiccup in the client, hardware, etc.), what the next proposer should do, and other failure scenarios, as well as to investigate and simulate tolerances for failures and delays.

The same Contributor added:

To clarify, the above comment does not take into account future eth2 data requirements, of course. The ~2ms time is from current load, which is mainly CPU-bound; with more data and shards these timings may change dramatically, but at that point eth1 and eth2 might end up competing for the same scarce resource, disk, which is also hard to parallelize.

The Author (Collaborator) replied:

The concerns you have listed above are important, and addressing them is part of the readiness criteria for the consensus upgrade. Transaction throughput (the size of the application block) is one of the evaluation criteria for the chain after the upgrade; attesters depend heavily on this property, and the time restriction we have within a slot turns into a restriction on block size. The other potential approach to pushing the limits is delayed execution, which comes with new complexity related to verifying the per-block gas limit.

Proposers may follow the pending block technique as currently implemented by Ethereum clients, which means that an application block is always ready in advance and the proposer just needs to pick it up.
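The pending block technique can be sketched minimally as below. The class and method names are hypothetical, not from any client; the idea is that payload assembly runs continuously in the background, so the proposal path is a cheap pickup rather than synchronous block building.

```python
import threading

# Hypothetical sketch of the "pending block" technique: the execution side
# keeps assembling a payload as the mempool changes, so at proposal time
# the proposer only picks up whatever is ready.
class PendingPayloadBuilder:
    def __init__(self):
        self._lock = threading.Lock()
        self._pending = None

    def on_new_transactions(self, txs):
        # Background path: rebuild the pending payload as transactions arrive.
        with self._lock:
            self._pending = {"txs": list(txs)}

    def get_payload(self):
        # Proposal path: O(1) pickup, no synchronous building in the slot.
        with self._lock:
            return self._pending
```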

The other side of the problem of synchronous interaction between the consensus and application chains is the resource exhaustion attack vector, and its analysis and mitigation.

All of the above is out of the scope of this spec but is definitely on the critical path to the upgrade.

A Contributor commented:

One thing to note is that the primary attestation issue today is seen on epoch boundaries where the epoch transition must be processed before the block is created and before the block is gossiped at each step (to get the correct proposer shuffling to validate the signature).

My estimation is that this is causing late receiving of blocks across the network and thus the incorrect/late attestations. This epoch (shuffling) processing problem is independent of computation you do after block is gossiped/forwarded, and the epoch processing issue is a major target for optimizations.

If the epoch boundary block can be gossiped quickly (same speed as non-epoch boundary blocks), then there should be ample time (3s+) to compute the block's payload to attest. Eth1 blocks today are processed in 200ms so as long as you don't do that processing as a gossip validation, I don't see this compounding our current issue.
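One way to make boundary blocks gossip as fast as others, in the spirit of the comment above, is to precompute the next epoch's proposer shuffling off the hot path so gossip-time signature validation becomes a cache lookup. This sketch is an assumption about client internals, with a stand-in for the real shuffling function:

```python
import hashlib

# Stand-in for the real proposer shuffling: deterministic from a seed.
def compute_proposers(seed: bytes, validators: list, slots_per_epoch: int) -> list:
    out = []
    for slot in range(slots_per_epoch):
        h = hashlib.sha256(seed + slot.to_bytes(8, "little")).digest()
        out.append(validators[int.from_bytes(h[:8], "little") % len(validators)])
    return out

class ShufflingCache:
    def __init__(self):
        self._cache = {}

    def prepare(self, epoch: int, seed: bytes, validators: list):
        # Run off the hot path, before the epoch boundary is reached.
        self._cache[epoch] = compute_proposers(seed, validators, 32)

    def proposer_for(self, epoch: int, slot_in_epoch: int):
        # Gossip validation path: a cheap lookup instead of an epoch transition.
        return self._cache[epoch][slot_in_epoch]
```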

A Contributor commented:

Note: another parallelization design goal is ensuring that the application layer could later be pulled out into a shard. By reducing same-slot processing dependencies between the beacon and application layers, we keep future designs open.

```python
    application_state = get_application_state(state.application_state_root)
    application_state_transition(application_state, beacon_chain_data, body.application_payload)
```
A Contributor commented on this snippet:

Is the beacon node holding the application state?

It seems to me that we can just do:

`application_state_transition(state.application_state_root, beacon_chain_data, body.application_payload)`

That saves one round trip of retrieving and passing the application state.

The Author (Collaborator) replied:

> Is the beacon node holding the application state?

This is implementation dependent. A client connected to both networks (consensus and application) has direct access to the application state. The other type of client has separate pieces of software representing the consensus and application parts, which interact via an RPC protocol. This is the reason for keeping the spec abstract in that regard.
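The two architectures could satisfy the same abstract interface, which is roughly what keeping `get_application_state` unspecified allows. The class names and the RPC method below are hypothetical illustrations, not part of the spec or any client:

```python
from abc import ABC, abstractmethod

# Hypothetical sketch: both a monolithic client and a split
# (consensus + execution over RPC) client can implement the same interface,
# which is why the spec leaves get_application_state abstract.
class ApplicationStateProvider(ABC):
    @abstractmethod
    def get_application_state(self, state_root: bytes): ...

class MonolithicProvider(ApplicationStateProvider):
    def __init__(self, local_db: dict):
        self.local_db = local_db  # direct, in-process access to application state

    def get_application_state(self, state_root: bytes):
        return self.local_db[state_root]

class SplitProvider(ApplicationStateProvider):
    def __init__(self, rpc_call):
        self.rpc_call = rpc_call  # e.g. a JSON-RPC client method (hypothetical)

    def get_application_state(self, state_root: bytes):
        return self.rpc_call("app_getState", state_root.hex())
```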

The other reason is the difference between the commitment schemes of the beacon state and the application state, that is, binary tree vs. MPT. If the schemes were the same, we could think of embedding the application state into the beacon state. The application state would then be part of the beacon state for monolithic clients, and could still be represented as a state root for clients with a separated architecture.

The Author (Collaborator) also replied:

> It seems to me that we can just do:
>
> `application_state_transition(state.application_state_root, beacon_chain_data, body.application_payload)`
>
> That saves 1 round trip of retrieving and passing the application state

Currently, it follows logic similar to the beacon chain's state_transition function, which accepts a (state, block) tuple and mutates the state by applying the state transition.
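The mutate-in-place convention being mirrored can be shown with a toy transition. This is not the spec's `application_state_transition`; it is an illustrative stand-in using a dict as the state, showing that the function receives a materialized state object and mutates it rather than taking a root and returning a new one:

```python
# Toy stand-in for the mutating style: like the beacon chain's
# state_transition(state, block), it takes a materialized state and
# mutates it in place instead of receiving and returning a root.
def toy_state_transition(app_state: dict, payload: list) -> None:
    for account, amount in payload:
        app_state[account] = app_state.get(account, 0) + amount
```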

@mratsim (Contributor) commented:

mratsim commented Mar 14, 2021

As a new miner I'm a bit bummed to see this expedited, but this 51% attack threat is ridiculous and short-sighted. For what it's worth - you have my support 100%.

Also, why not use a draft PR? :smile:

The expedited quick merge described by Vitalik (see https://notes.ethereum.org/@vbuterin/B1mUf6DXO) is different and would happen before this PR.

> Does this mean we're throwing out the plans to make the beacon chain interact with the application state more directly as Mikhail Kalinin and co are working on?

No. Rather, the minimal merge as described above would only be the first step. A post-merge hard fork would simplify the protocol, clean out unnecessary bits, and make the interaction between the application layer and the beacon chain more "natural". Eventually we would go even further and, for example, replace `application_block: bytes` with `application_txs: List[bytes]`, and even make the application state a "normal" part of the `BeaconState` (even though it's hashed in a non-SSZ way).

In fact, the minimal merge will make these things easier to work on, as it would no longer be needed to worry about merging two chains at the same time.
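The container evolution described above could be sketched as two stages. These dataclasses are illustrative assumptions, not the PR's actual SSZ containers: a minimal merge carries the execution block as opaque bytes, while a later fork splits it into a transaction list.

```python
from dataclasses import dataclass
from typing import List

# Illustrative only: not the PR's actual SSZ containers.
@dataclass
class BeaconBlockBodyMinimalMerge:
    application_block: bytes  # opaque, RLP-encoded eth1 block

@dataclass
class BeaconBlockBodyLaterFork:
    application_txs: List[bytes]  # individual transactions as a list
```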

@djrtwo (Contributor) left a review:

quick pass. a couple of nits and some comments

Review threads on specs/merge/beacon-chain.md (resolved).


@djrtwo (Contributor) commented:

djrtwo commented Mar 23, 2021

Closing in favor of #2257.
Will leave the branch open for now in case we need it for reference.

7 participants