Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Witness node cannot switch to correct fork on restart if was shut down on another long fork #1703

Closed
abitmore opened this issue Apr 5, 2019 · 2 comments

Comments

@abitmore
Copy link
Member

commented Apr 5, 2019

Bug Description
If we shutdown a witness_node which is not on the correct fork when the chain is forking, if both forks are long, when restarting, it is unable to switch to the correct fork, and reports unlinkable_block_exception.

If the correct fork is a soft fork, if we run the soft fork code with current data, it will crash on the forking block when replaying.

Related logs:

2019-04-05T11:11:54 th_a:invoke handle_block         handle_block ] Got block: #36311740 022a12bcb9fc73ea319c5ff01f4adfb3eb545722 time: 2019-04-05T11:11:54 transaction(s): 18 latency: 449 ms from: abc123  irreversible: 36310625 (-1115)                     application.cpp:515
2019-04-05T11:11:54 th_a:invoke handle_block           push_block ] Pushing block to fork database that failed to link: 022a12bcb9fc73ea319c5ff01f4adfb3eb545722, 36311740                      fork_database.cpp:64
2019-04-05T11:11:54 th_a:invoke handle_block           push_block ] Head: 36311670, 022a12765598724d7f8c7cd5439906df0519368c                    fork_database.cpp:65
2019-04-05T11:11:54 th_a:invoke handle_block         handle_block ] Error when pushing block:
3080000 unlinkable_block_exception: unlinkable block

As analysed by @pmconrad:

It happens because replay switches to undo_db + push_block 50 blocks before end of block database. For normal operation that's ok but not after such a long fork.

That said, the fixed 50 here is probably too small:

uint32_t undo_point = last_block_num < 50 ? 0 : last_block_num - 50;

On the other hand, we probably should NOT rely on this last_block to determine usable head block number, because _block_id_to_block may contain reversible blocks. But it's a bit tricky.

auto last_block = _block_id_to_block.last();

Impacts
Describe which portion(s) of BitShares Core may be impacted by this bug. Please tick at least one box.

  • API (the application programming interface)
  • Build (the build process or something prior to compiled code)
  • CLI (the command line wallet)
  • Deployment (the deployment process after building such as Docker, Travis, etc.)
  • DEX (the Decentralized EXchange, market engine, etc.)
  • P2P (the peer-to-peer network for transaction/block propagation)
  • Performance (system or user efficiency, etc.)
  • Protocol (the blockchain logic, consensus, validation, etc.)
  • Security (the security of system or user data, etc.)
  • UX (the User Experience)
  • Other (please add below)

Expected Behavior
Able to switch forks.

CORE TEAM TASK LIST

  • Evaluate / Prioritize Bug Report
  • Refine User Stories / Requirements
  • Define Test Cases
  • Design / Develop Solution
  • Perform QA/Testing
  • Update Documentation

@abitmore abitmore added this to the Future Feature Release milestone Apr 5, 2019

@abitmore abitmore added this to New -Awaiting Core Team Evaluation in Project Backlog via automation Apr 5, 2019

@pmconrad

This comment has been minimized.

Copy link
Contributor

commented Apr 6, 2019

A simple solution would be to enable undo_db GRAPHENE_MAX_UNDO_HISTORY before the last block in the index. That should cover all possible cases.
A more complex solution would write the last known LIB on disk somewhere, either in a separate file (probably best), or by modifying the index file structure or the blocks file structure. Not sure if it's worth it.

We should measure how much slower replaying 10k block with undo_db enabled really is, then decide how to solve.

@pmconrad pmconrad added this to To do in Feature Release (3.3.0) via automation Jun 25, 2019

@pmconrad pmconrad removed this from New -Awaiting Core Team Evaluation in Project Backlog Jun 25, 2019

@pmconrad pmconrad self-assigned this Jun 26, 2019

@pmconrad pmconrad moved this from To do to In development in Feature Release (3.3.0) Jun 26, 2019

@pmconrad pmconrad moved this from In development to In testing in Feature Release (3.3.0) Jun 26, 2019

@pmconrad

This comment has been minimized.

Copy link
Contributor

commented Jun 27, 2019

Resolved by #1832

@pmconrad pmconrad closed this Jun 27, 2019

Feature Release (3.3.0) automation moved this from In testing to Done Jun 27, 2019

@abitmore abitmore referenced this issue Jul 19, 2019
1 of 11 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.