[Bug] BFT fails to increment the storage round when syncing and gets stuck at the old round #2984

Closed
feezybabee opened this issue Jan 10, 2024 · 5 comments
Labels
bug Incorrect or unexpected behavior

Comments

@feezybabee

https://hackerone.com/reports/2287110

Summary

BFT fails to increment the storage round when syncing and gets stuck at the old round

Steps To Reproduce:

  1. Run ./devnet.sh.

  2. Kill node-3 when the BFT advances to round 10, wait several seconds, and restart the node.

  3. Check the logs in validator-3.log:

WARN BFT failed to increment to the next round from round 10 - Next round (11) is behind the current committee's starting round (14)

  4. The node is now stuck:

Proposed batch for round 10 is still valid
Resending batch proposal for round 10 to peer '127.0.0.1:5000'
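
To spot this condition in validator-3.log without reading it by eye, a small parser over the warning line can help. The following is a minimal sketch, assuming the warning keeps exactly the wording shown above; `StuckRounds`, `number_after`, and `parse_stuck_warning` are hypothetical names for illustration and are not part of snarkOS.

```rust
/// Rounds extracted from the "failed to increment" warning (hypothetical helper type).
#[derive(Debug, PartialEq, Eq)]
pub struct StuckRounds {
    pub next_round: u64,
    pub starting_round: u64,
}

/// Returns the unsigned integer that directly follows `marker` in `line`, if any.
fn number_after(line: &str, marker: &str) -> Option<u64> {
    let start = line.find(marker)? + marker.len();
    let digits: String = line[start..]
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    // An empty digit string fails to parse and yields `None`.
    digits.parse().ok()
}

/// Parses one log line and returns the rounds if it is the warning from this report.
/// Assumes the log wording shown in the steps above.
pub fn parse_stuck_warning(line: &str) -> Option<StuckRounds> {
    if !line.contains("BFT failed to increment to the next round") {
        return None;
    }
    let next_round = number_after(line, "Next round (")?;
    let starting_round = number_after(line, "starting round (")?;
    Some(StuckRounds { next_round, starting_round })
}
```

Running `parse_stuck_warning` over each line of the log and comparing `next_round` with `starting_round` gives the size of the gap the node would have to cross.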

Proof-of-Concept (PoC)

  1. When the node runs sync_with_certificate_from_peer, it will probably produce a block in send_primary_certificate_to_bft.

  2. ledger.current_committee is updated in advance_to_next_block.

  3. After that, increment_to_next_round can never succeed because of this check: https://github.com/AleoHQ/snarkOS/blob/testnet3/node/bft/src/helpers/storage.rs#L166
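
The interaction can be modeled in a few lines. This is a simplified sketch, not the snarkOS code: `Committee` and `Storage` here are hypothetical stand-ins, and the comparison `next_round < starting_round` is an assumption inferred from the warning text rather than copied from storage.rs.

```rust
/// Hypothetical stand-in for the committee returned by the ledger.
#[derive(Clone, Copy, Debug)]
pub struct Committee {
    pub starting_round: u64,
}

/// Hypothetical, heavily simplified stand-in for the BFT storage.
#[derive(Debug)]
pub struct Storage {
    pub current_round: u64,
}

impl Storage {
    /// Tries to advance by exactly one round.
    /// Assumed check: refuse if the next round is behind the committee's starting round.
    pub fn increment_to_next_round(&mut self, committee: Committee) -> Result<u64, String> {
        let next_round = self.current_round + 1;
        let starting_round = committee.starting_round;
        if next_round < starting_round {
            // The storage round is left unchanged, so every retry hits the same error.
            return Err(format!(
                "Next round ({next_round}) is behind the current committee's starting round ({starting_round})"
            ));
        }
        self.current_round = next_round;
        Ok(next_round)
    }
}
```

In this model, a storage at round 10 paired with a committee whose starting round is 14 returns the same error on every call, because each call only ever tries round 11. That matches the stuck state in the logs.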

Supporting Material/References:

Logs: https://github.com/ghostant-1017/logs/blob/master/logs-20231215175639.tar.gz

Impact

The vulnerability causes the node to get stuck.

@feezybabee feezybabee added the bug Incorrect or unexpected behavior label Jan 10, 2024
@ghostant-1017
Contributor

Still reproduced in the latest commit: 052457b

@raychu86
Contributor

This PR is not the final fix, but it should address this issue: https://github.com/AleoHQ/snarkOS/pull/3074

@howardwu
Contributor

howardwu commented Feb 11, 2024

@ghostant-1017 can you check if #3074 fixes the issue, and whether it introduces any new issues?

@ghostant-1017
Contributor

Yes! The PR seems to have fixed this issue.

@raychu86
Contributor

We also added a new change on top of the previous fix and noticed more stability: https://github.com/AleoHQ/snarkOS/pull/3105. It should also fix the issue where the validator node would be stuck on an old proposal ("Proposed batch for round X is still valid").

Closing the issue now, but please feel free to reopen if you notice the fixes did not fully resolve the problem.
