Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validator node crash. #93

Closed
xchainw opened this issue Jul 6, 2024 · 5 comments
Closed

Validator node crash. #93

xchainw opened this issue Jul 6, 2024 · 5 comments
Assignees
Labels
bug Something isn't working

Comments

@xchainw
Copy link

xchainw commented Jul 6, 2024

Since upgrading to v24/v25, the validator nodes has encountered several crashes within a few days, and the relevant logs were found as follows. Before the update the validator nodes were running normally for months without crashing once. What is the cause of this problem?

2024-07-06 02:19:00 ?? Starting consensus session on top of parent 0x5424875a2afeedfbdf83bcb259e196c31cd16b753859c6b87e5a6af084031c1b
====================
Version: 3.1.0-5ea548d-x86_64-linux-gnu
   0: sp_panic_handler::set::{{closure}}
   1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/alloc/src/boxed.rs:1999:9
      std::panicking::rust_panic_with_hook
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/panicking.rs:695:13
   2: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/panicking.rs:582:13
   3: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/sys_common/backtrace.rs:151:18
   4: rust_begin_unwind
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/panicking.rs:578:5
   5: core::panicking::panic_fmt
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/core/src/panicking.rs:67:14
   6: core::result::unwrap_failed
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/core/src/result.rs:1651:5
   7: core::result::Result<T,E>::unwrap
   8: <sc_consensus_poscan::PowBlockImport<B,I,C,S,Algorithm,CAW,CIDP,AccountId,BlockNumber> as sc_consensus::block_import::BlockImport<B>>::import_block::{{closure}}
   9: <alloc::boxed::Box<dyn sc_consensus::block_import::BlockImport<B>+Error = sp_consensus::error::Error+Transaction = Transaction+core::marker::Sync+core::marker::Send> as sc_consensus::block_import::BlockImport<B>>::import_block::{{closure}}
  10: <&mut F as core::future::future::Future>::poll
  11: sc_consensus::import_queue::basic_queue::BlockImportWorker<B>::new::{{closure}}
  12: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
  13: <futures_util::future::select::Select<A,B> as core::future::future::Future>::poll
  14: <tracing_futures::Instrumented<T> as core::future::future::Future>::poll
  15: tokio::park::thread::CachedParkThread::block_on
  16: tokio::runtime::handle::Handle::block_on
  17: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
  18: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
  19: tokio::runtime::task::harness::Harness<T,S>::poll
  20: tokio::runtime::blocking::pool::Inner::run
  21: std::sys_common::backtrace::__rust_begin_short_backtrace
  22: core::ops::function::FnOnce::call_once{{vtable.shim}}
  23: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/alloc/src/boxed.rs:1985:9
      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/alloc/src/boxed.rs:1985:9
      std::sys::unix::thread::Thread::new::thread_start
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/sys/unix/thread.rs:108:17
  24: <unknown>
  25: <unknown>
Thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: Application(UnknownBlock("State already discarded for BlockId::Hash(0x220d1d66679abdf1fda75ef093c631f79f700a85c03dac4e87f461f28001106c)"))', /home/runner/work/3DP/3DP/consensus/poscan/src/lib.rs:565
This is a bug. Please report it at:
	https://github.com/3Dpass/3DP/issues
@skyours
Copy link

skyours commented Jul 6, 2024

I'm having the same problem.

@PaulSff PaulSff added the bug Something isn't working label Jul 6, 2024
@xchainw
Copy link
Author

xchainw commented Jul 6, 2024

Another crash occurred about an hour ago, checking telemetry, a number of nodes in the validator set all crashed at the same time. Checking the logs it seems to be caused by an error when importing block #951440.

2024-07-06 17:12:20 💤 Idle (40 peers), best: #951438 (0x312e…050b), finalized #951435 (0xb16e…0385), ⬇ 284.8kiB/s ⬆ 805.7kiB/s
2024-07-06 17:12:25 💤 Idle (40 peers), best: #951438 (0x312e…050b), finalized #951436 (0x169f…8ad1), ⬇ 666.8kiB/s ⬆ 560.3kiB/s
2024-07-06 17:12:30 poscan: start validate
2024-07-06 17:12:30 >>> object is valid (runtime)
2024-07-06 17:12:30 >>> poscan: end validate
2024-07-06 17:12:30 >>> object is valid
2024-07-06 17:12:31 💤 Idle (40 peers), best: #951438 (0x312e…050b), finalized #951436 (0x169f…8ad1), ⬇ 535.7kiB/s ⬆ 624.4kiB/s
2024-07-06 17:12:32 ✨ Imported #951439 (0x0b68…7015)
2024-07-06 17:12:32 🙌 Starting consensus session on top of parent 0x0b688bc64f962f11caa031db782b847781dd0a338a60546d086f4f96d38a7015
2024-07-06 17:12:33 🎁 Prepared block for proposing at 951440 (518 ms) [hash: 0x8e877934a159151ce5d9822cfa9dd8442fd1ca18251c7ea663ba79f598d964b6; parent_hash: 0x0b68…7015; extrinsics (8): [0xe029…84fe, 0xbcd8…d2f0, 0x17e7…9117, 0x8734…7dd0, 0x7b1e…3160, 0xfac8…8358, 0xbf77…1bb6, 0xc14f…71a7]]
2024-07-06 17:12:35 poscan: start validate
2024-07-06 17:12:35 >>> object is valid (runtime)
2024-07-06 17:12:35 >>> poscan: end validate
2024-07-06 17:12:35 >>> object is valid
2024-07-06 17:12:36 💤 Idle (40 peers), best: #951439 (0x0b68…7015), finalized #951436 (0x169f…8ad1), ⬇ 471.6kiB/s ⬆ 477.6kiB/s
2024-07-06 17:12:37 >>> verify: seal.hist_hash != hist_hash
2024-07-06 17:12:37 >>> InvalidSeal after self.algorithm.verify
2024-07-06 17:12:37 💔 Error importing block 0x09309ed2ca24d7872337ebdc22fec4f68e0e8dc00f51809aa8b10dee98145655: consensus error: Import failed: PoScan validation error: invalid seal
====================
Version: 3.1.0-5ea548d-x86_64-linux-gnu
   0: sp_panic_handler::set::{{closure}}
   1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/alloc/src/boxed.rs:1999:9
      std::panicking::rust_panic_with_hook
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/panicking.rs:695:13
   2: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/panicking.rs:582:13
   3: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/sys_common/backtrace.rs:151:18
   4: rust_begin_unwind
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/panicking.rs:578:5
   5: core::panicking::panic_fmt
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/core/src/panicking.rs:67:14
   6: core::result::unwrap_failed
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/core/src/result.rs:1651:5
   7: core::result::Result<T,E>::unwrap
   8: <sc_consensus_poscan::PowBlockImport<B,I,C,S,Algorithm,CAW,CIDP,AccountId,BlockNumber> as sc_consensus::block_import::BlockImport<B>>::import_block::{{closure}}
   9: <alloc::boxed::Box<dyn sc_consensus::block_import::BlockImport<B>+Error = sp_consensus::error::Error+Transaction = Transaction+core::marker::Sync+core::marker::Send> as sc_consensus::block_import::BlockImport<B>>::import_block::{{closure}}
  10: <&mut F as core::future::future::Future>::poll
  11: sc_consensus::import_queue::basic_queue::BlockImportWorker<B>::new::{{closure}}
  12: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
  13: <futures_util::future::select::Select<A,B> as core::future::future::Future>::poll
  14: <tracing_futures::Instrumented<T> as core::future::future::Future>::poll
  15: tokio::park::thread::CachedParkThread::block_on
  16: tokio::runtime::handle::Handle::block_on
  17: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
  18: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
  19: tokio::runtime::task::harness::Harness<T,S>::poll
  20: tokio::runtime::blocking::pool::Inner::run
  21: std::sys_common::backtrace::__rust_begin_short_backtrace
  22: core::ops::function::FnOnce::call_once{{vtable.shim}}
  23: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/alloc/src/boxed.rs:1985:9
      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/alloc/src/boxed.rs:1985:9
      std::sys::unix::thread::Thread::new::thread_start
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/sys/unix/thread.rs:108:17
  24: <unknown>
  25: <unknown>
Thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: Application(UnknownBlock("State already discarded for BlockId::Hash(0xead911c9af64012557626d8cc228d4a8e1c0f041f9b699edfab5a08b73ee0035)"))', /home/runner/work/3DP/3DP/consensus/poscan/src/lib.rs:565
This is a bug. Please report it at:
	https://github.com/3Dpass/3DP/issues

@mishk
Copy link
Collaborator

mishk commented Jul 8, 2024

Will be fixed in 1-2 days

mishk pushed a commit that referenced this issue Jul 8, 2024
@PaulSff
Copy link
Collaborator

PaulSff commented Jul 8, 2024

Here is the fix: 70205fc

@PaulSff PaulSff closed this as completed Jul 8, 2024
@xchainw
Copy link
Author

xchainw commented Jul 13, 2024

2024-07-13 06:03:00 🎁 Prepared block for proposing at 960728 (426 ms) [hash: 0x27a81f71517c81177081c7198a7a8241cb0d113ab9ba44dd08bdf65c111ceef2; parent_hash: 0x37e9…65d9; extrinsics (2): [0xf6f7…0a75, 0xd7a2…3985]]
2024-07-13 06:03:01 💤 Idle (12 peers), best: #960727 (0x37e9…65d9), finalized #960725 (0x1d5d…cb0f), ⬇ 205.4kiB/s ⬆ 403.4kiB/s
2024-07-13 06:03:05 poscan: start validate
2024-07-13 06:03:05 >>> object is valid (runtime)
2024-07-13 06:03:05 >>> poscan: end validate
2024-07-13 06:03:05 >>> object is valid
2024-07-13 06:03:06 💤 Idle (12 peers), best: #960727 (0x37e9…65d9), finalized #960725 (0x1d5d…cb0f), ⬇ 290.2kiB/s ⬆ 444.2kiB/s
2024-07-13 06:03:10 ♻️  Reorg on #960727,0x37e9…65d9 to #960728,0x0a7e…9c56, common ancestor #960725,0x1d5d…cb0f
2024-07-13 06:03:10 ✨ Imported #960728 (0x0a7e…9c56)
2024-07-13 06:03:10 🙌 Starting consensus session on top of parent 0x0a7ecf7a550e1fc85521204bb408090bf3e6fa91bddcb42e6868798d2c369c56
2024-07-13 06:03:10 🎁 Prepared block for proposing at 960729 (346 ms) [hash: 0x3b54b5e7f904039f310bf935b98766c9c82b7bd4535bf429944baf0f7597ae31; parent_hash: 0x0a7e…9c56; extrinsics (1): [0xd74e…6de7]]
2024-07-13 06:03:11 💤 Idle (12 peers), best: #960728 (0x0a7e…9c56), finalized #960726 (0xad8e…569f), ⬇ 102.6kiB/s ⬆ 312.0kiB/s
====================
Version: 3.1.0-70205fc-x86_64-linux-gnu
   0: sp_panic_handler::set::{{closure}}
   1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/alloc/src/boxed.rs:1999:9
      std::panicking::rust_panic_with_hook
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/panicking.rs:695:13
   2: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/panicking.rs:582:13
   3: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/sys_common/backtrace.rs:151:18
   4: rust_begin_unwind
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/panicking.rs:578:5
   5: core::panicking::panic_fmt
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/core/src/panicking.rs:67:14
   6: core::result::unwrap_failed
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/core/src/result.rs:1651:5
   7: std::sys_common::backtrace::__rust_begin_short_backtrace
   8: core::ops::function::FnOnce::call_once{{vtable.shim}}
   9: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/alloc/src/boxed.rs:1985:9
      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/alloc/src/boxed.rs:1985:9
      std::sys::unix::thread::Thread::new::thread_start
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/sys/unix/thread.rs:108:17
  10: start_thread
             at ./nptl/./nptl/pthread_create.c:442:8
  11: __GI___clone3
             at ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
Thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: UnknownBlock("State already discarded for BlockId::Hash(0x37e90df3ce53362b1e133d9eb07f1611ae2165e33042b56dcc67ffd1416365d9)")', nodes/poscan-consensus/src/service.rs:463
This is a bug. Please report it at:
	https://github.com/3Dpass/3DP/issues
====================
Version: 3.1.0-70205fc-x86_64-linux-gnu
   0: sp_panic_handler::set::{{closure}}
   1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/alloc/src/boxed.rs:1999:9
      std::panicking::rust_panic_with_hook
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/panicking.rs:695:13
   2: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/panicking.rs:582:13
   3: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/sys_common/backtrace.rs:151:18
   4: rust_begin_unwind
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/panicking.rs:578:5
   5: core::panicking::panic_fmt
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/core/src/panicking.rs:67:14
   6: core::result::unwrap_failed
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/core/src/result.rs:1651:5
   7: std::sys_common::backtrace::__rust_begin_short_backtrace
   8: core::ops::function::FnOnce::call_once{{vtable.shim}}
   9: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/alloc/src/boxed.rs:1985:9
      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/alloc/src/boxed.rs:1985:9
      std::sys::unix::thread::Thread::new::thread_start
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/sys/unix/thread.rs:108:17
  10: start_thread
             at ./nptl/./nptl/pthread_create.c:442:8
  11: __GI___clone3
             at ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
Thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: UnknownBlock("State already discarded for BlockId::Hash(0x37e90df3ce53362b1e133d9eb07f1611ae2165e33042b56dcc67ffd1416365d9)")', nodes/poscan-consensus/src/service.rs:463
This is a bug. Please report it at:
	https://github.com/3Dpass/3DP/issues
====================

@xchainw xchainw mentioned this issue Jul 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants