Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Khala v0.1.7 node got occasionally full network stale #551

Open
h4x3rotab opened this issue Nov 1, 2021 · 6 comments
Open

Khala v0.1.7 node got occasionally full network stale #551

h4x3rotab opened this issue Nov 1, 2021 · 6 comments
Labels
accident An accident affecting the network or some products P0 related-substrate-v0.9.11/12

Comments

@h4x3rotab
Copy link
Contributor

h4x3rotab commented Nov 1, 2021

This can happen around once per day. Usually the entire network got stuck. A restart (or sometimes two) can have them move forward.

Symptom

The entire network got stuck at a certain block (include full nodes and collators, expect one collator, usually). The nodes got panic when trying to import blocks. When it occurs, each node will fall into one of the following errors, repeat the same message, and never recover until a full restart.

Records:

  • Khala 711330
  • Khala 700324
  • (could have more events)
  • Khala 666541, Kusama parent 9920583
  • Khala 659927, Kusama parent 9904431

Type 1: Transaction will be valid in the future

Nov 01 08:46:50 khala-collators-asia collator_1.sh[23964]: 2021-11-01 08:46:50 [Parachain] ⚙️  Syncing  0.0 bps, target=#659935 (49 peers), best: #659927 (0x8b0f…f74d), finalized #659927 (0x8b0f…f74d), ⬇ 75.8kiB/s ⬆ 48.5kiB/s
Nov 01 08:46:51 khala-collators-asia collator_1.sh[23964]: 2021-11-01 08:46:51 [Parachain] panicked at 'Transaction will be valid in the future', /home/h4x/.cargo/git/checkouts/substrate-7e08433d4c370a21/420db18/frame/executive/src/lib.rs:363:17
Nov 01 08:46:51 khala-collators-asia collator_1.sh[23964]: 2021-11-01 08:46:51 [Parachain] Block prepare storage changes error:
Nov 01 08:46:51 khala-collators-asia collator_1.sh[23964]: RuntimeApiError(Application(Execution(Other("Wasm execution trapped: wasm trap: unreachable\nwasm backtrace:\n    0: 0x6458 - <unknown>!rust_begin_unwind\n    1: 0x1e62 - <unknown>!core::panicking::panic_fmt::h189230e171f1a9ba\n    2: 0x1568bf - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::hbc8a99a38c425195\n    3: 0x17e7ea - <unknown>!Core_execute_block\n"))))
Nov 01 08:46:51 khala-collators-asia collator_1.sh[23964]: 2021-11-01 08:46:51 [Parachain] 💔 Error importing block 0x683c5aeafe84a0e9716ffab02b07fefa2856e68d426e5f59af9713bdd927bc64: Err(Other(ClientImport("Error at calling runtime api: Execution failed: Other(\"Wasm execution trapped: wasm trap: unreachable\\nwasm backtrace:\\n    0: 0x6458 - <unknown>!rust_begin_unwind\\n    1: 0x1e62 - <unknown>!core::panicking::panic_fmt::h189230e171f1a9ba\\n    2: 0x1568bf - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::hbc8a99a38c425195\\n    3: 0x17e7ea - <unknown>!Core_execute_block\\n\")")))
Nov 01 08:46:51 khala-collators-asia collator_1.sh[23964]: 2021-11-01 08:46:51 [Relaychain] 💤 Idle (48 peers), best: #9904741 (0x81f7…bcdf), finalized #9904737 (0x59fc…0d0f), ⬇ 304.1kiB/s ⬆ 306.8kiB/s
Nov 01 08:46:54 khala-collators-asia collator_1.sh[23964]: 2021-11-01 08:46:54 [Parachain] panicked at 'Transaction will be valid in the future', /home/h4x/.cargo/git/checkouts/substrate-7e08433d4c370a21/420db18/frame/executive/src/lib.rs:363:17
Nov 01 08:46:54 khala-collators-asia collator_1.sh[23964]: 2021-11-01 08:46:54 [Parachain] Block prepare storage changes error:
Nov 01 08:46:54 khala-collators-asia collator_1.sh[23964]: RuntimeApiError(Application(Execution(Other("Wasm execution trapped: wasm trap: unreachable\nwasm backtrace:\n    0: 0x6458 - <unknown>!rust_begin_unwind\n    1: 0x1e62 - <unknown>!core::panicking::panic_fmt::h189230e171f1a9ba\n    2: 0x1568bf - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::hbc8a99a38c425195\n    3: 0x17e7ea - <unknown>!Core_execute_block\n"))))
Nov 01 08:46:54 khala-collators-asia collator_1.sh[23964]: 2021-11-01 08:46:54 [Parachain] 💔 Error importing block 0x683c5aeafe84a0e9716ffab02b07fefa2856e68d426e5f59af9713bdd927bc64: Err(Other(ClientImport("Error at calling runtime api: Execution failed: Other(\"Wasm execution trapped: wasm trap: unreachable\\nwasm backtrace:\\n    0: 0x6458 - <unknown>!rust_begin_unwind\\n    1: 0x1e62 - <unknown>!core::panicking::panic_fmt::h189230e171f1a9ba\\n    2: 0x1568bf - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::hbc8a99a38c425195\\n    3: 0x17e7ea - <unknown>!Core_execute_block\\n\")")))
Nov 01 08:46:54 khala-collators-asia collator_1.sh[23964]: 2021-11-01 08:46:54 [Relaychain] ✨ Imported #9904742 (0x0058…f3f6)
Nov 01 08:46:55 khala-collators-asia collator_1.sh[23964]: 2021-11-01 08:46:55 [Parachain] ⚙️  Syncing  0.0 bps, target=#659935 (49 peers), best: #659927 (0x8b0f…f74d), finalized #659927 (0x8b0f…f74d), ⬇ 45.5kiB/s ⬆ 3.7kiB/s
Nov 01 08:46:56 khala-collators-asia collator_1.sh[23964]: 2021-11-01 08:46:56 [Parachain] panicked at 'Transaction will be valid in the future', /home/h4x/.cargo/git/checkouts/substrate-7e08433d4c370a21/420db18/frame/executive/src/lib.rs:363:17
Nov 01 08:46:56 khala-collators-asia collator_1.sh[23964]: 2021-11-01 08:46:56 [Parachain] Block prepare storage changes error:
Nov 01 08:46:56 khala-collators-asia collator_1.sh[23964]: RuntimeApiError(Application(Execution(Other("Wasm execution trapped: wasm trap: unreachable\nwasm backtrace:\n    0: 0x6458 - <unknown>!rust_begin_unwind\n    1: 0x1e62 - <unknown>!core::panicking::panic_fmt::h189230e171f1a9ba\n    2: 0x1568bf - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::hbc8a99a38c425195\n    3: 0x17e7ea - <unknown>!Core_execute_block\n"))))
Nov 01 08:46:56 khala-collators-asia collator_1.sh[23964]: 2021-11-01 08:46:56 [Parachain] 💔 Error importing block 0x683c5aeafe84a0e9716ffab02b07fefa2856e68d426e5f59af9713bdd927bc64: Err(Other(ClientImport("Error at calling runtime api: Execution failed: Other(\"Wasm execution trapped: wasm trap: unreachable\\nwasm backtrace:\\n    0: 0x6458 - <unknown>!rust_begin_unwind\\n    1: 0x1e62 - <unknown>!core::panicking::panic_fmt::h189230e171f1a9ba\\n    2: 0x1568bf - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::hbc8a99a38c425195\\n    3: 0x17e7ea - <unknown>!Core_execute_block\\n\")")))

Type 2: Storage root must match that calculated

2021-11-02 11:50:45 [Parachain] panicked at 'Storage root must match that calculated.', /home/h4x/.cargo/git/checkouts/substrate-7e08433d4c370a21/420db18/frame/executive/src/lib.rs:473:9
2021-11-02 11:50:45 [Parachain] Block prepare storage changes error:
RuntimeApiError(Application(Execution(Other("Wasm execution trapped: wasm trap: unreachable\nwasm backtrace:\n    0: 0x6458 - <unknown>!rust_begin_unwind\n    1: 0x1e62 - <unknown>!core::panicking::panic_fmt::h189230e171f1a9ba\n    2: 0x153b - <unknown>!core::panicking::panic::h4aa999cda2b58621\n    3: 0x156951 - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::hbc8a99a38c425195\n    4: 0x17e7ea - <unknown>!Core_execute_block\n"))))
2021-11-02 11:50:45 [Parachain] 💔 Error importing block 0x801c1798a33bd69a94fef636594bf16dc8c208cb893c3e3f37161fbf56635202: Err(Other(ClientImport("Error at calling runtime api: Execution failed: Other(\"Wasm execution trapped: wasm trap: unreachable\\nwasm backtrace:\\n    0: 0x6458 - <unknown>!rust_begin_unwind\\n    1: 0x1e62 - <unknown>!core::panicking::panic_fmt::h189230e171f1a9ba\\n    2: 0x153b - <unknown>!core::panicking::panic::h4aa999cda2b58621\\n    3: 0x156951 - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::hbc8a99a38c425195\\n    4: 0x17e7ea - <unknown>!Core_execute_block\\n\")")))

Minor issue: NotInFinalizedChain

This one can happen after a restart of the node. It can also self recover after a few minutes.

2021-11-02 15:15:39 [Parachain] Block import error: NotInFinalizedChain
2021-11-02 15:15:39 [Parachain] 💔 Error importing block 0x0a1f6222b3a7235dd833038bf98151c01b8f8d99a7a332da3d0695f7646aab2d: Err(Other(ClientImport("Potential long-range attack: block not in finalized chain.")))

Other background information

We have encountered similar problems in the past when we were running our parachain testnet v1. It's suspicious that we were also running a parachain runtime based on a version of substrate (polkadot) older than the client's. After upgraded the runtime to match the substrate version of the client, we had never seen the problem again, until now.

@h4x3rotab h4x3rotab changed the title Collators reports "Transaction will be valid in the future" occasionally Khala v0.1.17 node got occasionally block import error Nov 2, 2021
@h4x3rotab h4x3rotab changed the title Khala v0.1.17 node got occasionally block import error Khala v0.1.17 node got occasionally full network stale Nov 2, 2021
@h4x3rotab h4x3rotab added accident An accident affecting the network or some products P0 related-substrate-v0.9.11/12 labels Nov 2, 2021
@h4x3rotab h4x3rotab changed the title Khala v0.1.17 node got occasionally full network stale Khala v0.1.7 node got occasionally full network stale Nov 2, 2021
@coder-chenda
Copy link

how to solve it?

@h4x3rotab
Copy link
Contributor Author

@coder-chenda No solution yet. Now it's mitigated by restarting every time it got stuck. You can also try to add --state-cache-size 0 to the node cli. It's reported to be helpful but we cannot confirm.

@yrong
Copy link

yrong commented Nov 15, 2021

same issue met, @h4x3rotab any progress?

2021-11-12 13:12:36 [Parachain🌈] 💤 Idle (17 peers), best: #737456 (0xccbf…34a7), finalized #737454 (0x5da9…457c), ⬇ 1.7kiB/s ⬆ 1.8kiB/s
2021-11-12 13:12:37 [Parachain🌈] panicked at 'Storage root must match that calculated.', /cargo-home/git/checkouts/substrate-7e08433d4c370a21/d76f399/frame/executive/src/lib.rs:503:9
2021-11-12 13:12:37 [Parachain🌈] Block prepare storage changes error:
RuntimeApiError(Application(Execution(Other("Wasm execution trapped: wasm trap: unreachable\nwasm backtrace:\n    0: 0x374566 - <unknown>!rust_begin_unwind\n    1: 0x2637 - <unknown>!core::panicking::panic_fmt::h3ab5417155b7ba3b\n    2: 0x1ccb - <unknown>!core::panicking::panic::h5bfdfaa3db9a4b4a\n    3: 0x24fadc - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::h11a86dd3b827ed26\n    4: 0x26a629 - <unknown>!Core_execute_block\n"))))
2021-11-12 13:12:37 [Parachain🌈] 💔 Error importing block 0x230d09903695c483424b273c39f02a7bddff7ef419c1324448aca38b69d87d3a: Err(Other(ClientImport("Error at calling runtime api: Execution failed: Other(\"Wasm execution trapped: wasm trap: unreachable\\nwasm backtrace:\\n    0: 0x374566 - <unknown>!rust_begin_unwind\\n    1: 0x2637 - <unknown>!core::panicking::panic_fmt::h3ab5417155b7ba3b\\n    2: 0x1ccb - <unknown>!core::panicking::panic::h5bfdfaa3db9a4b4a\\n    3: 0x24fadc - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::h11a86dd3b827ed26\\n    4: 0x26a629 - <unknown>!Core_execute_block\\n\")")))
2021-11-12 13:12:37 [Parachain🌈] panicked at 'Storage root must match that calculated.', /cargo-home/git/checkouts/substrate-7e08433d4c370a21/d76f399/frame/executive/src/lib.rs:503:9
2021-11-12 13:12:37 [Parachain🌈] Block prepare storage changes error:
RuntimeApiError(Application(Execution(Other("Wasm execution trapped: wasm trap: unreachable\nwasm backtrace:\n    0: 0x374566 - <unknown>!rust_begin_unwind\n    1: 0x2637 - <unknown>!core::panicking::panic_fmt::h3ab5417155b7ba3b\n    2: 0x1ccb - <unknown>!core::panicking::panic::h5bfdfaa3db9a4b4a\n    3: 0x24fadc - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::h11a86dd3b827ed26\n    4: 0x26a629 - <unknown>!Core_execute_block\n"))))
2021-11-12 13:12:37 [Parachain🌈] 💔 Error importing block 0x230d09903695c483424b273c39f02a7bddff7ef419c1324448aca38b69d87d3a: Err(Other(ClientImport("Error at calling runtime api: Execution failed: Other(\"Wasm execution trapped: wasm trap: unreachable\\nwasm backtrace:\\n    0: 0x374566 - <unknown>!rust_begin_unwind\\n    1: 0x2637 - <unknown>!core::panicking::panic_fmt::h3ab5417155b7ba3b\\n    2: 0x1ccb - <unknown>!core::panicking::panic::h5bfdfaa3db9a4b4a\\n    3: 0x24fadc - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::h11a86dd3b827ed26\\n    4: 0x26a629 - <unknown>!Core_execute_block\\n\")")))
2021-11-12 13:12:37 [Relaychain] 💤 Idle (8 peers), best: #10062243 (0x8298…e7e2), finalized #10062239 (0x7920…0f5e), ⬇ 270.1kiB/s ⬆ 114.8kiB/s
2021-11-12 13:12:38 [Parachain🌈] panicked at 'Storage root must match that calculated.', /cargo-home/git/checkouts/substrate-7e08433d4c370a21/d76f399/frame/executive/src/lib.rs:503:9
2021-11-12 13:12:38 [Parachain🌈] Block prepare storage changes error:
RuntimeApiError(Application(Execution(Other("Wasm execution trapped: wasm trap: unreachable\nwasm backtrace:\n    0: 0x374566 - <unknown>!rust_begin_unwind\n    1: 0x2637 - <unknown>!core::panicking::panic_fmt::h3ab5417155b7ba3b\n    2: 0x1ccb - <unknown>!core::panicking::panic::h5bfdfaa3db9a4b4a\n    3: 0x24fadc - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::h11a86dd3b827ed26\n    4: 0x26a629 - <unknown>!Core_execute_block\n"))))
2021-11-12 13:12:38 [Parachain🌈] 💔 Error importing block 0x230d09903695c483424b273c39f02a7bddff7ef419c1324448aca38b69d87d3a: Err(Other(ClientImport("Error at calling runtime api: Execution failed: Other(\"Wasm execution trapped: wasm trap: unreachable\\nwasm backtrace:\\n    0: 0x374566 - <unknown>!rust_begin_unwind\\n    1: 0x2637 - <unknown>!core::panicking::panic_fmt::h3ab5417155b7ba3b\\n    2: 0x1ccb - <unknown>!core::panicking::panic::h5bfdfaa3db9a4b4a\\n    3: 0x24fadc - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::h11a86dd3b827ed26\\n    4: 0x26a629 - <unknown>!Core_execute_block\\n\")")))
2021-11-12 13:12:38 [Parachain🌈] panicked at 'Storage root must match that calculated.', /cargo-home/git/checkouts/substrate-7e08433d4c370a21/d76f399/frame/executive/src/lib.rs:503:9
2021-11-12 13:12:38 [Parachain🌈] Block prepare storage changes error:
RuntimeApiError(Application(Execution(Other("Wasm execution trapped: wasm trap: unreachable\nwasm backtrace:\n    0: 0x374566 - <unknown>!rust_begin_unwind\n    1: 0x2637 - <unknown>!core::panicking::panic_fmt::h3ab5417155b7ba3b\n    2: 0x1ccb - <unknown>!core::panicking::panic::h5bfdfaa3db9a4b4a\n    3: 0x24fadc - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::h11a86dd3b827ed26\n    4: 0x26a629 - <unknown>!Core_execute_block\n"))))
2021-11-12 13:12:38 [Parachain🌈] 💔 Error importing block 0x230d09903695c483424b273c39f02a7bddff7ef419c1324448aca38b69d87d3a: Err(Other(ClientImport("Error at calling runtime api: Execution failed: Other(\"Wasm execution trapped: wasm trap: unreachable\\nwasm backtrace:\\n    0: 0x374566 - <unknown>!rust_begin_unwind\\n    1: 0x2637 - <unknown>!core::panicking::panic_fmt::h3ab5417155b7ba3b\\n    2: 0x1ccb - <unknown>!core::panicking::panic::h5bfdfaa3db9a4b4a\\n    3: 0x24fadc - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::h11a86dd3b827ed26\\n    4: 0x26a629 - <unknown>!Core_execute_block\\n\")")))
2021-11-12 13:12:41 [Relaychain] 🔍 Discovered new external address for our node: /ip4/100.114.237.0/tcp/30334/ws/p2p/12D3KooWCoh5WZGgPxo3tRt4sERrAA3CaDd8gCmx8BCvruaLJkdS

@h4x3rotab
Copy link
Contributor Author

@yrong No. According to the tracking issue paritytech/cumulus#607, there's nothing on our end can be done. If you have the control over all the collators, you can add --state-cache-size 0 to disable the state cache, and it seems the problem can be mitigated a bit, but it's not 100% guaranteed.

@yrong
Copy link

yrong commented Nov 15, 2021

@h4x3rotab thks will try and let you know

@h4x3rotab
Copy link
Contributor Author

Update: the current walkaround is to add --state-cache-size 0 to half of the collators, and apply auto restart script to all the collators. The network has been running for nearly one month without full network stale.

We are still waiting for paritytech/cumulus#607 as the final fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accident An accident affecting the network or some products P0 related-substrate-v0.9.11/12
Projects
None yet
Development

No branches or pull requests

3 participants