Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: making zoneconcierge resilient against long BTC reorg (again) #413

Merged
merged 8 commits into from
Jan 22, 2024

Conversation

SebastianElvis
Copy link
Member

@SebastianElvis SebastianElvis commented Jan 21, 2024

(Migrated from the private repo)


The new BTC data model changes the algorithm of finding BTC headers to send upon a finalised epoch in the sense that it no longer resilient against long reorg in Bitcoin -- if the last segment is entirely reverted due to the reorg, initHeader will be nil and the chain will panic (stack trace attached below).

This PR makes zoneconcierge resilient against long BTC reorg again. In particular, if zoneconcierge finds the last sent segment is reverted entirely, then it will send the last w+1 BTC headers, such that the 1st BTC header among them is w-deep and must be canonical assuming w-long reorg never happens.

In addition, this PR also restores the design that upon initialising a Babylon contract, the IBC packet will only include the last w+1 BTC headers rather than all the way down to genesis, such that the 1st BTC header among them is w-deep and will be the base header of Babylon contract.

This PR also introduces assertions in BTC light client about the last reorg point storage, and adds a fuzz test on getHeadersToBroadcast.

Appendix: stack trace upon a reorg reverting the entire LastSentSegment

To reproduce, comment out this in this PR and run the new fuzz test.

4:02PM ERR CONSENSUS FAILURE!!! err="runtime error: invalid memory address or nil pointer dereference" module=consensus stack="goroutine 494 [running]:\nruntime/debug.Stack()\n\truntime/debug/stack.go:24 +0x5e\ngithub.com/cometbft/cometbft/conse
nsus.(*State).receiveRoutine.func2()\n\tgithub.com/cometbft/cometbft@v0.38.2/consensus/state.go:800 +0x46\npanic({0x3856820?, 0x668e860?})\n\truntime/panic.go:914 +0x21f\ngithub.com/babylonchain/babylon/x/zoneconcierge/keeper.Keeper.getHeadersTo
Broadcast({{0x46744f0, 0xc001c87820}, {0x461c1c0, 0xc0014a0478}, {0x461bda0, 0xc00152b3b0}, {0x462cb88, 0xc0020ef540}, {0x46592c0, 0xc00152b490}, ...}, ...)\n\tgithub.com/babylonchain/babylon/x/zoneconcierge/keeper/ibc_packet_btc_timestamp.go:18
5 +0x14f\ngithub.com/babylonchain/babylon/x/zoneconcierge/keeper.Hooks.AfterRawCheckpointFinalized({{{0x46744f0, 0xc001c87820}, {0x461c1c0, 0xc0014a0478}, {0x461bda0, 0xc00152b3b0}, {0x462cb88, 0xc0020ef540}, {0x46592c0, 0xc00152b490}, ...}}, ..
.)\n\tgithub.com/babylonchain/babylon/x/zoneconcierge/keeper/hooks.go:35 +0x8b\ngithub.com/babylonchain/babylon/x/checkpointing/types.MultiCheckpointingHooks.AfterRawCheckpointFinalized({0xc0013059b0, 0x3, 0x31?}, {0x46591a8, 0xc00f5b5500}, 0x1?
)\n\tgithub.com/babylonchain/babylon/x/checkpointing/types/hooks.go:44 +0x7a\ngithub.com/babylonchain/babylon/x/checkpointing/keeper.Keeper.AfterRawCheckpointFinalized(...)\n\tgithub.com/babylonchain/babylon/x/checkpointing/keeper/hooks.go:38\ng
ithub.com/babylonchain/babylon/x/checkpointing/keeper.Keeper.SetCheckpointFinalized({{0x46744f0, 0xc001c87820}, {0x461c1c0, 0xc0014a0348}, {0x46591e0, 0xc00261a0e0}, {0x4665a48, 0xc00152b110}, {0x465c600, 0xc0011cf050}}, ...)\n\tgithub.com/babyl
onchain/babylon/x/checkpointing/keeper/keeper.go:308 +0x351\ngithub.com/babylonchain/babylon/x/btccheckpoint/keeper.Keeper.checkCheckpoints({{0x46744f0, 0xc001c87820}, {0x461c1c0, 0xc0014a0470}, {0x462ca48, 0xc001c87f30}, {0x462cb38, 0xc0020ef4f
0}, {0x465c540, 0xc0020eea50}, ...}, ...)\n\tgithub.com/babylonchain/babylon/x/btccheckpoint/keeper/keeper.go:403 +0x65f\ngithub.com/babylonchain/babylon/x/btccheckpoint/keeper.Keeper.OnTipChange(...)\n\tgithub.com/babylonchain/babylon/x/btcchec
kpoint/keeper/keeper.go:273\ngithub.com/babylonchain/babylon/x/btccheckpoint.EndBlocker({0x46591a8, 0xc00f5b5500}, {{0x46744f0, 0xc001c87820}, {0x461c1c0, 0xc0014a0470}, {0x462ca48, 0xc001c87f30}, {0x462cb38, 0xc0020ef4f0}, ...})\n\tgithub.com/b
abylonchain/babylon/x/btccheckpoint/abci.go:13 +0xa8\ngithub.com/babylonchain/babylon/x/btccheckpoint.AppModule.EndBlock(...)\n\tgithub.com/babylonchain/babylon/x/btccheckpoint/module.go:165\ngithub.com/cosmos/cosmos-sdk/types/module.(*Manager).
EndBlock(_, {{0x4659448, 0x68573c0}, {0x4675260, 0xc00e519080}, {{0x0, 0x0}, {0xc001c88c40, 0x9}, 0x6143, ...}, ...})\n\tgithub.com/cosmos/cosmos-sdk@v0.50.1/types/module/module.go:804 +0x2a9\ngithub.com/babylonchain/babylon/app.(*BabylonApp).En
dBlocker(...)\n\tgithub.com/babylonchain/babylon/app/app.go:1089\ngithub.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).endBlock(0xc000568b40, {0x68573c0?, 0x4675260?})\n\tgithub.com/cosmos/cosmos-sdk@v0.50.1/baseapp/baseapp.go:774 +0xf5\ngithub.com/c
osmos/cosmos-sdk/baseapp.(*BaseApp).internalFinalizeBlock(0xc000568b40, {0x4659448, 0x68573c0}, 0xc00e7cf8c0)\n\tgithub.com/cosmos/cosmos-sdk@v0.50.1/baseapp/abci.go:819 +0x13af\ngithub.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).FinalizeBlock(0xc0
00568b40, 0xc00e7cf8c0)\n\tgithub.com/cosmos/cosmos-sdk@v0.50.1/baseapp/abci.go:874 +0x125\ngithub.com/cosmos/cosmos-sdk/server.cometABCIWrapper.FinalizeBlock(...)\n\tgithub.com/cosmos/cosmos-sdk@v0.50.1/server/cmt_abci.go:44\ngithub.com/cometbf
t/cometbft/abci/client.(*localClient).FinalizeBlock(0xc002114840?, {0x4659f48?, 0x68573c0?}, 0xc0?)\n\tgithub.com/cometbft/cometbft@v0.38.2/abci/client/local_client.go:185 +0xd0\ngithub.com/cometbft/cometbft/proxy.(*appConnConsensus).FinalizeBlo
ck(0xc0009e71a0, {0x4659f48, 0x68573c0}, 0x1?)\n\tgithub.com/cometbft/cometbft@v0.38.2/proxy/app_conn.go:104 +0x16d\ngithub.com/cometbft/cometbft/state.(*BlockExecutor).ApplyBlock(_, {{{0xb, 0x0}, {0xc001bc6540, 0x6}}, {0xc001bc6546, 0x9}, 0x1,
0x6142, {{0xc0124ef5e0, ...}, ...}, ...}, ...)\n\tgithub.com/cometbft/cometbft@v0.38.2/state/execution.go:213 +0x622\ngithub.com/cometbft/cometbft/consensus.(*State).finalizeCommit(0xc000cee380, 0x6143)\n\tgithub.com/cometbft/cometbft@v0.38.2/co
nsensus/state.go:1770 +0xb7d\ngithub.com/cometbft/cometbft/consensus.(*State).tryFinalizeCommit(0xc000cee380, 0x6143)\n\tgithub.com/cometbft/cometbft@v0.38.2/consensus/state.go:1681 +0x2f6\ngithub.com/cometbft/cometbft/consensus.(*State).enterCo
mmit.func1()\n\tgithub.com/cometbft/cometbft@v0.38.2/consensus/state.go:1616 +0x9c\ngithub.com/cometbft/cometbft/consensus.(*State).enterCommit(0xc000cee380, 0x6143, 0x0)\n\tgithub.com/cometbft/cometbft@v0.38.2/consensus/state.go:1654 +0xc5b\ngi
thub.com/cometbft/cometbft/consensus.(*State).addVote(0xc000cee380, 0xc00aee6680, {0xc003df8450, 0x28})\n\tgithub.com/cometbft/cometbft@v0.38.2/consensus/state.go:2323 +0x1d4d\ngithub.com/cometbft/cometbft/consensus.(*State).tryAddVote(0xc000cee
380, 0xc00aee6680, {0xc003df8450?, 0xc005431c08?})\n\tgithub.com/cometbft/cometbft@v0.38.2/consensus/state.go:2055 +0x26\ngithub.com/cometbft/cometbft/consensus.(*State).handleMsg(0xc000cee380, {{0x46281a0, 0xc009552918}, {0xc003df8450, 0x28}})\
n\tgithub.com/cometbft/cometbft@v0.38.2/consensus/state.go:928 +0x3ce\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine(0xc000cee380, 0x0)\n\tgithub.com/cometbft/cometbft@v0.38.2/consensus/state.go:835 +0x3d1\ncreated by github.com
/cometbft/cometbft/consensus.(*State).OnStart in goroutine 349\n\tgithub.com/cometbft/cometbft@v0.38.2/consensus/state.go:397 +0x10c\n"

if initHeader == nil {
// if initHeader is nil, then this means a large reorg happens such that all headers
// in the last segment are reverted. In this case, send the last w+1 BTC headers
return k.getDeepEnoughBTCHeaders(ctx)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the smart contract do btc fork choice or does it just accepts whatever Babylon will send it ? becouse from what I understand with this apprach we will send competing fork to smart contract ?

Also doesn't this approach fail if there is no checkpoints finalized for over W blocks as then we will have gap in blocks we will send to smart contract ?

My current thinking for proper fix is:

  • store last W+1 blocks known by smart contract
  • with each finalised checkpoint, find lowest common ancestor between smart contract chain and current babylon tip.
  • send to smart contract chain extension from found lowest common ancestor to current babylon tip
  • update last W+1 blocks known by smart contract

This way practically do fork choice for smart contract, and even if there is no finalized checkpoints for long time, we can always extend smart contract chain. (as long as there is no rollback larger than W)

Copy link
Member Author

@SebastianElvis SebastianElvis Jan 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the smart contract do btc fork choice or does it just accepts whatever Babylon will send it ? becouse from what I understand with this apprach we will send competing fork to smart contract ?

So Babylon contract always follows the fork received from Babylon. This ensures consistency of BTC light client between Babylon and Babylon contract, and is secure as long as Babylon is secure. In the future it will support fork choice in order to allow any user to submit BTC headers to Babylon contract.

Also doesn't this approach fail if there is no checkpoints finalized for over W blocks as then we will have gap in blocks we will send to smart contract ?

No, because as long as one block in lastSentSegment is still canonical, Babylon will send all BTC headers since that canonical block in lastSentSegment, even if the number of BTC headers is larger than w.

Also in the case where Babylon does not have checkpoints over w BTC blocks, doesn't Babylon lose liveness and vigilante monitor will raise alarms? Babylon (and phase2 integration) has liveness only when there is a checkpoint in every w BTC blocks. No checkpoint in w consecutive BTC blocks implies that Babylon is censoring BTC checkpoints.

with each finalised checkpoint, find lowest common ancestor between smart contract chain and current babylon tip.

This was the exact design of phase2 integration. However, in the new data model of BTC light client, finding lowest common ancestor of two forks seems impossible, because every header that is not canonical is removed.

store last W+1 blocks known by smart contract

This will make the zoneconcierge stateful in the sense that it rememebers the state of each Babylon contract. Given that Babylon contract will allow anyone to submit BTC headers, Babylon won't precisely know the state of the Babylon contract. This will also introduce the O(w * n) storage overhead where n is the number of chains doing phase-2 integration.

In fact, Babylon will send exactly the same BTC segment to each chain with phase-2 inetrgation, so this mechanism effectively falls back to storing the last w+1 BTC headers in Babylon's BTC light client. The current implementation can be considered as an "optimisation" to this approach, i.e., if the last BTC segment is not entirely reverted then Babylon sends much fewer BTC headers to Babylon contract than w+1.

Overall, I think we can live with having the w assumption (which is already the case) in this part of the code for now, and add back the functionality to preserve forks and find common ancestors if there is a compelling use case. Wdyt?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will make the zoneconcierge stateful in the sense that it rememebers the state of each Babylon contract. Given that Babylon contract will allow anyone to submit BTC headers, Babylon won't precisely know the state of the Babylon contract. This will also introduce the O(w * n) storage overhead where n is the number of chains doing phase-2 integration.

Doesn't this whole logic go away when btc lc smart contract will be allow anybody to submit headers ? Then it true that every chain can have btc lc in different state, so that chain submitted by Babylon may be not the best i.e
babylon finalized checkpoint with btc chain with length - 10, but in the mean time chain has beed extended to 11, so whatever Babylon will send will be invalid.

Overall, I think we can live with having the w assumption (which is already the case) in this part of the code for now, and add back the functionality to preserve forks and find common ancestors if there is a compelling use case. Wdyt?

Yup, I think we need to live with W assumption as this is perversive in many places.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this whole logic go away when btc lc smart contract will be allow anybody to submit headers ?

Ideally, if there 1 honest guy who keeps sending new BTC headers to Babylon contract, then yes Babylon does not need to send it over again. Thing is that we cannot rely on random users for Babylon contract's liveness. Thus Babylon still needs to send over BTC headers. Some of those headers might be duplicated, so Babylon contract is implemented in a way that it ignores (but does not reject) duplicated BTC headers.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, if there 1 honest guy who keeps sending new BTC headers to Babylon contract, then yes Babylon does not need to send it over again. Thing is that we cannot rely on random users for Babylon contract's liveness. Thus Babylon still needs to send over BTC headers. Some of those headers might be duplicated, so Babylon contract is implemented in a way that it ignores (but does not reject) duplicated BTC headers.

So this is a choice in the sense that:

  • either we leave Babylon sending those headers
  • or we introduce assumption in smart contract that there is at least one header reporter for btc headers (similar assumption as in Babylon)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have have both of them, namely Babylon and a vigilante program submit BTC headers to Babylon contract simultaneously. This is to answer the concerns about censorship resistance of Babylon contract -- even Babylon is censoring BTC headers, Babylon contract can still receive BTC headers, and vigilante monitor on phase2 integration will panick if Babylon censors BTC timestamps as well.

Copy link
Collaborator

@KonradStaniec KonradStaniec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks good, given what I now know about smart contract.

Although new fuzz tests are failing.

@@ -182,6 +195,12 @@ func (k Keeper) getHeadersToBroadcast(ctx context.Context) []*btclctypes.BTCHead
}
}

if initHeader == nil {
// if initHeader is nil, then this means a large reorg happens such that all headers
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then this means a large reorg it not necessarily large reorg. In case when segment had length 1 which is possible when checkpoints finalizing different epoch are in adjacent btc blocks, this could be small re-org

@SebastianElvis
Copy link
Member Author

SebastianElvis commented Jan 22, 2024

Although new fuzz tests are failing.

There was a mistake in the assertions, making the test flaky. Fixed and fuzzed the test locally

@SebastianElvis SebastianElvis merged commit fa904d3 into dev Jan 22, 2024
4 checks passed
@SebastianElvis SebastianElvis deleted the fix-zc-ibc-panic branch January 22, 2024 08:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants