Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nimbus 0.9 and Sign blocks in dev service #958

Merged
merged 27 commits into from
Dec 15, 2021

Conversation

JoshOrndorff
Copy link
Contributor

@JoshOrndorff JoshOrndorff commented Nov 4, 2021

What does it do?

Solves https://purestake.atlassian.net/browse/MOON-557
Companion for moonbeam-foundation/nimbus#15
Companion for moonbeam-foundation/nimbus#21

This PR updates Nimbus from roughly 0.8 to 0.9. This includes two breaking changes which are both described in detail in their upstream PRs linked above. Briefly, they are:

  • Use the preruntime digest rather than the author inherent
  • Use the NimbusApi rather than the AuthorFilterAPI

Changes to the Dev service

This PR updates the dev service to use the new NimbusManualSealConsensusDataProvider. This means that it performs proper slot prediction and block signing using the keystore even without a backing relay chain. Previously the dev service did not sign blocks, check signatures, do slot prediction, or even use the keys in the keystore at all. Previously the dev service just inserted Alice's author id, and everyone trusted it.

Is there something left for follow-up PRs?

This PR changes the moonbeam runtime to accept block author information through a new pre-runtime digest. This change was necessary to meet the assumptions of manual seal which uses the new pre-runtime digest. It also makes Nimbus more similar to Aura and Babe.

@joelamouche This will require a minor update to Polkadot JS to detect the author information properly in the dev service. The dev service will now use a pre-runtime digest rather than a consensus

What value does it bring to users

Anything that happens in the dev service naturally has better test coverage both because of CI and because we use the dev service a lot when testing PRs and runtime upgrades. This adds slot prediction to the dev service. That allows us to test slot prediction much more easily including for runtime upgrades.

@JoshOrndorff JoshOrndorff added A3-inprogress Pull request is in progress. No review needed at this stage. B7-runtimenoteworthy Changes should be noted in any runtime-upgrade release notes C1-low Does not elevate a release containing this beyond "low priority". D5-nicetohaveaudit⚠️ PR contains trivial changes to logic that should be properly reviewed. labels Nov 4, 2021
@JoshOrndorff JoshOrndorff changed the title Sign blocks dev service Sign blocks in dev service Nov 4, 2021
@girazoki
Copy link
Collaborator

girazoki commented Nov 5, 2021

I think the PR is a nice addition. Does this mean we can now write integration tets in the dev service for the slot prediction?

@joelamouche
Copy link
Contributor

@JoshOrndorff

This will require a minor update to Polkadot JS to detect the author properly

I'm afraid they might not approve a modification that is so Moonbeam specific. That being said, isn't this change just a change in the Block type returned, that should be detected by the api, and thus not require any modification?

@JoshOrndorff
Copy link
Contributor Author

@joelamouche The change on the UI side is about how we extract the block author. We already have nimbus-specific code there and this change makes it less of a special unicorn, and more like Aura and Babe. Here is the change in the consensus engine that needs to be reflected. https://github.com/PureStake/nimbus/pull/15/files#r744993190

@joelamouche
Copy link
Contributor

@JoshOrndorff Sounds good. Im creating a ticket for that

@JoshOrndorff
Copy link
Contributor Author

JoshOrndorff commented Dec 8, 2021

This is all updated with master and with the latest Nimbus now. In order to test it we will need to:

  1. Start a relay-para network with 2 validators and two collators using moonbeam master
  2. Upgrade one collator to this code. Verify:
    • The chain is still live
    • Both collators insert the original author-inherent
    • Upgraded collator also inserts the preruntime digest
    • Collators sync each others blocks
  3. Upgrade the other collator to this code
  4. Runtime upgrade to this code. Confirm:
    • The chain stays live
    • Both collators still insert preruntime digest
    • Both collators insert new empty inherent

@joelamouche
Copy link
Contributor

joelamouche commented Dec 8, 2021

@JoshOrndorff I created a JIRA ticket for this (I can assist)

@JoshOrndorff
Copy link
Contributor Author

JoshOrndorff commented Dec 13, 2021

I've tested this manually, and both the client and runtime portions of the upgrade succeeded as expected! I even recorded it in case we struggle to reproduce it https://www.twitch.tv/videos/1232376769

I'll get the unit tests cleaned up soon.

@JoshOrndorff JoshOrndorff changed the title Sign blocks in dev service Nimbus 0.9 and Sign blocks in dev service Dec 14, 2021
@JoshOrndorff
Copy link
Contributor Author

For some reason, this PR makes the dev service take significantly longer to start up. Nonetheless, our authorship task will still start quickly.

This results in logs like the following:

2021-12-14 10:06:54 🙌 Starting consensus session on top of parent 0x7e75a7259c8b9e8869ebad6fd99ac5180b3bc36d5af2ed5e6cf66d8ab91bb279    
2021-12-14 10:06:54 🔏 No Nimbus keys available. We will not be able to author.    
2021-12-14 10:06:54 Instant Seal encountered an error: no nimbus keys available to manual seal    
2021-12-14 10:06:55 🙌 Starting consensus session on top of parent 0x7e75a7259c8b9e8869ebad6fd99ac5180b3bc36d5af2ed5e6cf66d8ab91bb279    
2021-12-14 10:06:55 🔏 No Nimbus keys available. We will not be able to author.    
2021-12-14 10:06:55 Instant Seal encountered an error: no nimbus keys available to manual seal    
2021-12-14 10:06:55 📦 Highest known block at #0    
2021-12-14 10:06:55 〽️ Prometheus exporter started at 127.0.0.1:9615    
2021-12-14 10:06:55 Listening for new connections on 127.0.0.1:9944.    
2021-12-14 10:06:55 Development Service Ready    
2021-12-14 10:06:56 🙌 Starting consensus session on top of parent 0x7e75a7259c8b9e8869ebad6fd99ac5180b3bc36d5af2ed5e6cf66d8ab91bb279    
2021-12-14 10:06:56 🔮 Skipping candidate production because we are not eligible    
2021-12-14 10:06:56 Instant Seal encountered an error: no nimbus keys available to manual seal  

You can see that the authorship tries and fails to author blocks before the development service is ready. Once it is ready, the node begins behaving normally. I don't know whether this try-to-author-even-before-ready behavior existed previously because the node startup was too fast to tell previously.

@JoshOrndorff
Copy link
Contributor Author

@girazoki helped me find the problem. wasmtime, the web assembly compiler, takes a very long time to compile the wasm runtime. The underlying issue is very likely bytecodealliance/wasmtime#3523

I've confirmed this hypothesis in two ways:

  1. Using this tool https://github.com/girazoki/wasmtime-try which outputs:
Importing ./moonbase_runtime.compact.wasm...
imported in 352616 ms
  1. Running with --wasm-execution interpreted-i-know-what-i-do restores the desired behavior.

The fix is very likely to be putting #[inline(never)] on the part of code that is offending the wasm compiler, but first we need to find it.

My first idea is to see if the nimbus template has the same issue, and if so, figure out at which commit it started.

@JoshOrndorff
Copy link
Contributor Author

The dev service works just fine, with normal authorship starting almost immediately when I use the moonriver or moonbeam runtimes. It is only moonbase that has the problem.

# This command with the moonriver runtime works as expected
moonbeam --chain moonriver-dev --alice --tmp --sealing 1000

# But this one with the moonbase runtime takes a while at startup
moonbeam --chain moonbase-dev --alice --tmp --sealing 1000

But the wasmtime-try output indicates that both runtimes are taking far too long to compile

Importing ./moonbase_runtime.compact.wasm...
imported in 362149 ms

Importing ./moonriver_runtime.compact.wasm...
imported in 15476 ms

runtime/common/src/apis.rs Outdated Show resolved Hide resolved
@JoshOrndorff
Copy link
Contributor Author

The majority of the TS tests are fixed in 608b273 which brings in a hotfix to Substrate's manual seal. This change has been PRed upstream: paritytech/substrate#10498

@JoshOrndorff JoshOrndorff added C9-critical Elevates a release containing this PR to "critical priority". A8-mergeoncegreen Pull request is reviewed well. and removed C1-low Does not elevate a release containing this beyond "low priority". A3-inprogress Pull request is in progress. No review needed at this stage. labels Dec 15, 2021
@JoshOrndorff JoshOrndorff merged commit 6c9c17a into master Dec 15, 2021
@JoshOrndorff JoshOrndorff deleted the joshy-sign-blocks-dev-service branch December 15, 2021 18:31
@librelois librelois mentioned this pull request Dec 16, 2021
25 tasks
@crystalin crystalin added the B5-clientnoteworthy Changes should be mentioned in any downstream projects' release notes label Dec 17, 2021
@notlesh notlesh added D1-audited👍 PR contains changes to fund-managing logic that has been properly reviewed and externally audited and removed D5-nicetohaveaudit⚠️ PR contains trivial changes to logic that should be properly reviewed. labels Feb 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A8-mergeoncegreen Pull request is reviewed well. B5-clientnoteworthy Changes should be mentioned in any downstream projects' release notes B7-runtimenoteworthy Changes should be noted in any runtime-upgrade release notes C9-critical Elevates a release containing this PR to "critical priority". D1-audited👍 PR contains changes to fund-managing logic that has been properly reviewed and externally audited
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants