fix: making zoneconcierge resilient against long BTC reorg (again) #413

SebastianElvis · 2024-01-21T10:31:06Z

(Migrated from the private repo)

The new BTC data model changes the algorithm of finding BTC headers to send upon a finalised epoch in the sense that it no longer resilient against long reorg in Bitcoin -- if the last segment is entirely reverted due to the reorg, initHeader will be nil and the chain will panic (stack trace attached below).

This PR makes zoneconcierge resilient against long BTC reorg again. In particular, if zoneconcierge finds the last sent segment is reverted entirely, then it will send the last w+1 BTC headers, such that the 1st BTC header among them is w-deep and must be canonical assuming w-long reorg never happens.

In addition, this PR also restores the design that upon initialising a Babylon contract, the IBC packet will only include the last w+1 BTC headers rather than all the way down to genesis, such that the 1st BTC header among them is w-deep and will be the base header of Babylon contract.

This PR also introduces assertions in BTC light client about the last reorg point storage, and adds a fuzz test on getHeadersToBroadcast.

Appendix: stack trace upon a reorg reverting the entire `LastSentSegment`

To reproduce, comment out this in this PR and run the new fuzz test.

4:02PM ERR CONSENSUS FAILURE!!! err="runtime error: invalid memory address or nil pointer dereference" module=consensus stack="goroutine 494 [running]:\nruntime/debug.Stack()\n\truntime/debug/stack.go:24 +0x5e\ngithub.com/cometbft/cometbft/conse
nsus.(*State).receiveRoutine.func2()\n\tgithub.com/cometbft/cometbft@v0.38.2/consensus/state.go:800 +0x46\npanic({0x3856820?, 0x668e860?})\n\truntime/panic.go:914 +0x21f\ngithub.com/babylonchain/babylon/x/zoneconcierge/keeper.Keeper.getHeadersTo
Broadcast({{0x46744f0, 0xc001c87820}, {0x461c1c0, 0xc0014a0478}, {0x461bda0, 0xc00152b3b0}, {0x462cb88, 0xc0020ef540}, {0x46592c0, 0xc00152b490}, ...}, ...)\n\tgithub.com/babylonchain/babylon/x/zoneconcierge/keeper/ibc_packet_btc_timestamp.go:18
5 +0x14f\ngithub.com/babylonchain/babylon/x/zoneconcierge/keeper.Hooks.AfterRawCheckpointFinalized({{{0x46744f0, 0xc001c87820}, {0x461c1c0, 0xc0014a0478}, {0x461bda0, 0xc00152b3b0}, {0x462cb88, 0xc0020ef540}, {0x46592c0, 0xc00152b490}, ...}}, ..
.)\n\tgithub.com/babylonchain/babylon/x/zoneconcierge/keeper/hooks.go:35 +0x8b\ngithub.com/babylonchain/babylon/x/checkpointing/types.MultiCheckpointingHooks.AfterRawCheckpointFinalized({0xc0013059b0, 0x3, 0x31?}, {0x46591a8, 0xc00f5b5500}, 0x1?
)\n\tgithub.com/babylonchain/babylon/x/checkpointing/types/hooks.go:44 +0x7a\ngithub.com/babylonchain/babylon/x/checkpointing/keeper.Keeper.AfterRawCheckpointFinalized(...)\n\tgithub.com/babylonchain/babylon/x/checkpointing/keeper/hooks.go:38\ng
ithub.com/babylonchain/babylon/x/checkpointing/keeper.Keeper.SetCheckpointFinalized({{0x46744f0, 0xc001c87820}, {0x461c1c0, 0xc0014a0348}, {0x46591e0, 0xc00261a0e0}, {0x4665a48, 0xc00152b110}, {0x465c600, 0xc0011cf050}}, ...)\n\tgithub.com/babyl
onchain/babylon/x/checkpointing/keeper/keeper.go:308 +0x351\ngithub.com/babylonchain/babylon/x/btccheckpoint/keeper.Keeper.checkCheckpoints({{0x46744f0, 0xc001c87820}, {0x461c1c0, 0xc0014a0470}, {0x462ca48, 0xc001c87f30}, {0x462cb38, 0xc0020ef4f
0}, {0x465c540, 0xc0020eea50}, ...}, ...)\n\tgithub.com/babylonchain/babylon/x/btccheckpoint/keeper/keeper.go:403 +0x65f\ngithub.com/babylonchain/babylon/x/btccheckpoint/keeper.Keeper.OnTipChange(...)\n\tgithub.com/babylonchain/babylon/x/btcchec
kpoint/keeper/keeper.go:273\ngithub.com/babylonchain/babylon/x/btccheckpoint.EndBlocker({0x46591a8, 0xc00f5b5500}, {{0x46744f0, 0xc001c87820}, {0x461c1c0, 0xc0014a0470}, {0x462ca48, 0xc001c87f30}, {0x462cb38, 0xc0020ef4f0}, ...})\n\tgithub.com/b
abylonchain/babylon/x/btccheckpoint/abci.go:13 +0xa8\ngithub.com/babylonchain/babylon/x/btccheckpoint.AppModule.EndBlock(...)\n\tgithub.com/babylonchain/babylon/x/btccheckpoint/module.go:165\ngithub.com/cosmos/cosmos-sdk/types/module.(*Manager).
EndBlock(_, {{0x4659448, 0x68573c0}, {0x4675260, 0xc00e519080}, {{0x0, 0x0}, {0xc001c88c40, 0x9}, 0x6143, ...}, ...})\n\tgithub.com/cosmos/cosmos-sdk@v0.50.1/types/module/module.go:804 +0x2a9\ngithub.com/babylonchain/babylon/app.(*BabylonApp).En
dBlocker(...)\n\tgithub.com/babylonchain/babylon/app/app.go:1089\ngithub.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).endBlock(0xc000568b40, {0x68573c0?, 0x4675260?})\n\tgithub.com/cosmos/cosmos-sdk@v0.50.1/baseapp/baseapp.go:774 +0xf5\ngithub.com/c
osmos/cosmos-sdk/baseapp.(*BaseApp).internalFinalizeBlock(0xc000568b40, {0x4659448, 0x68573c0}, 0xc00e7cf8c0)\n\tgithub.com/cosmos/cosmos-sdk@v0.50.1/baseapp/abci.go:819 +0x13af\ngithub.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).FinalizeBlock(0xc0
00568b40, 0xc00e7cf8c0)\n\tgithub.com/cosmos/cosmos-sdk@v0.50.1/baseapp/abci.go:874 +0x125\ngithub.com/cosmos/cosmos-sdk/server.cometABCIWrapper.FinalizeBlock(...)\n\tgithub.com/cosmos/cosmos-sdk@v0.50.1/server/cmt_abci.go:44\ngithub.com/cometbf
t/cometbft/abci/client.(*localClient).FinalizeBlock(0xc002114840?, {0x4659f48?, 0x68573c0?}, 0xc0?)\n\tgithub.com/cometbft/cometbft@v0.38.2/abci/client/local_client.go:185 +0xd0\ngithub.com/cometbft/cometbft/proxy.(*appConnConsensus).FinalizeBlo
ck(0xc0009e71a0, {0x4659f48, 0x68573c0}, 0x1?)\n\tgithub.com/cometbft/cometbft@v0.38.2/proxy/app_conn.go:104 +0x16d\ngithub.com/cometbft/cometbft/state.(*BlockExecutor).ApplyBlock(_, {{{0xb, 0x0}, {0xc001bc6540, 0x6}}, {0xc001bc6546, 0x9}, 0x1,
0x6142, {{0xc0124ef5e0, ...}, ...}, ...}, ...)\n\tgithub.com/cometbft/cometbft@v0.38.2/state/execution.go:213 +0x622\ngithub.com/cometbft/cometbft/consensus.(*State).finalizeCommit(0xc000cee380, 0x6143)\n\tgithub.com/cometbft/cometbft@v0.38.2/co
nsensus/state.go:1770 +0xb7d\ngithub.com/cometbft/cometbft/consensus.(*State).tryFinalizeCommit(0xc000cee380, 0x6143)\n\tgithub.com/cometbft/cometbft@v0.38.2/consensus/state.go:1681 +0x2f6\ngithub.com/cometbft/cometbft/consensus.(*State).enterCo
mmit.func1()\n\tgithub.com/cometbft/cometbft@v0.38.2/consensus/state.go:1616 +0x9c\ngithub.com/cometbft/cometbft/consensus.(*State).enterCommit(0xc000cee380, 0x6143, 0x0)\n\tgithub.com/cometbft/cometbft@v0.38.2/consensus/state.go:1654 +0xc5b\ngi
thub.com/cometbft/cometbft/consensus.(*State).addVote(0xc000cee380, 0xc00aee6680, {0xc003df8450, 0x28})\n\tgithub.com/cometbft/cometbft@v0.38.2/consensus/state.go:2323 +0x1d4d\ngithub.com/cometbft/cometbft/consensus.(*State).tryAddVote(0xc000cee
380, 0xc00aee6680, {0xc003df8450?, 0xc005431c08?})\n\tgithub.com/cometbft/cometbft@v0.38.2/consensus/state.go:2055 +0x26\ngithub.com/cometbft/cometbft/consensus.(*State).handleMsg(0xc000cee380, {{0x46281a0, 0xc009552918}, {0xc003df8450, 0x28}})\
n\tgithub.com/cometbft/cometbft@v0.38.2/consensus/state.go:928 +0x3ce\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine(0xc000cee380, 0x0)\n\tgithub.com/cometbft/cometbft@v0.38.2/consensus/state.go:835 +0x3d1\ncreated by github.com
/cometbft/cometbft/consensus.(*State).OnStart in goroutine 349\n\tgithub.com/cometbft/cometbft@v0.38.2/consensus/state.go:397 +0x10c\n"

KonradStaniec · 2024-01-21T15:32:51Z

x/zoneconcierge/keeper/ibc_packet_btc_timestamp.go

+	if initHeader == nil {
+		// if initHeader is nil, then this means a large reorg happens such that all headers
+		// in the last segment are reverted. In this case, send the last w+1 BTC headers
+		return k.getDeepEnoughBTCHeaders(ctx)


does the smart contract do btc fork choice or does it just accepts whatever Babylon will send it ? becouse from what I understand with this apprach we will send competing fork to smart contract ?

Also doesn't this approach fail if there is no checkpoints finalized for over W blocks as then we will have gap in blocks we will send to smart contract ?

My current thinking for proper fix is:

store last W+1 blocks known by smart contract

with each finalised checkpoint, find lowest common ancestor between smart contract chain and current babylon tip.

send to smart contract chain extension from found lowest common ancestor to current babylon tip

update last W+1 blocks known by smart contract

This way practically do fork choice for smart contract, and even if there is no finalized checkpoints for long time, we can always extend smart contract chain. (as long as there is no rollback larger than W)

does the smart contract do btc fork choice or does it just accepts whatever Babylon will send it ? becouse from what I understand with this apprach we will send competing fork to smart contract ?

So Babylon contract always follows the fork received from Babylon. This ensures consistency of BTC light client between Babylon and Babylon contract, and is secure as long as Babylon is secure. In the future it will support fork choice in order to allow any user to submit BTC headers to Babylon contract.

Also doesn't this approach fail if there is no checkpoints finalized for over W blocks as then we will have gap in blocks we will send to smart contract ?

No, because as long as one block in lastSentSegment is still canonical, Babylon will send all BTC headers since that canonical block in lastSentSegment, even if the number of BTC headers is larger than w.

Also in the case where Babylon does not have checkpoints over w BTC blocks, doesn't Babylon lose liveness and vigilante monitor will raise alarms? Babylon (and phase2 integration) has liveness only when there is a checkpoint in every w BTC blocks. No checkpoint in w consecutive BTC blocks implies that Babylon is censoring BTC checkpoints.

with each finalised checkpoint, find lowest common ancestor between smart contract chain and current babylon tip.

This was the exact design of phase2 integration. However, in the new data model of BTC light client, finding lowest common ancestor of two forks seems impossible, because every header that is not canonical is removed.

store last W+1 blocks known by smart contract

This will make the zoneconcierge stateful in the sense that it rememebers the state of each Babylon contract. Given that Babylon contract will allow anyone to submit BTC headers, Babylon won't precisely know the state of the Babylon contract. This will also introduce the O(w * n) storage overhead where n is the number of chains doing phase-2 integration.

In fact, Babylon will send exactly the same BTC segment to each chain with phase-2 inetrgation, so this mechanism effectively falls back to storing the last w+1 BTC headers in Babylon's BTC light client. The current implementation can be considered as an "optimisation" to this approach, i.e., if the last BTC segment is not entirely reverted then Babylon sends much fewer BTC headers to Babylon contract than w+1.

Overall, I think we can live with having the w assumption (which is already the case) in this part of the code for now, and add back the functionality to preserve forks and find common ancestors if there is a compelling use case. Wdyt?

This will make the zoneconcierge stateful in the sense that it rememebers the state of each Babylon contract. Given that Babylon contract will allow anyone to submit BTC headers, Babylon won't precisely know the state of the Babylon contract. This will also introduce the O(w * n) storage overhead where n is the number of chains doing phase-2 integration.

Doesn't this whole logic go away when btc lc smart contract will be allow anybody to submit headers ? Then it true that every chain can have btc lc in different state, so that chain submitted by Babylon may be not the best i.e
babylon finalized checkpoint with btc chain with length - 10, but in the mean time chain has beed extended to 11, so whatever Babylon will send will be invalid.

Overall, I think we can live with having the w assumption (which is already the case) in this part of the code for now, and add back the functionality to preserve forks and find common ancestors if there is a compelling use case. Wdyt?

Yup, I think we need to live with W assumption as this is perversive in many places.

Doesn't this whole logic go away when btc lc smart contract will be allow anybody to submit headers ?

Ideally, if there 1 honest guy who keeps sending new BTC headers to Babylon contract, then yes Babylon does not need to send it over again. Thing is that we cannot rely on random users for Babylon contract's liveness. Thus Babylon still needs to send over BTC headers. Some of those headers might be duplicated, so Babylon contract is implemented in a way that it ignores (but does not reject) duplicated BTC headers.

Ideally, if there 1 honest guy who keeps sending new BTC headers to Babylon contract, then yes Babylon does not need to send it over again. Thing is that we cannot rely on random users for Babylon contract's liveness. Thus Babylon still needs to send over BTC headers. Some of those headers might be duplicated, so Babylon contract is implemented in a way that it ignores (but does not reject) duplicated BTC headers.

So this is a choice in the sense that:

either we leave Babylon sending those headers

or we introduce assumption in smart contract that there is at least one header reporter for btc headers (similar assumption as in Babylon)

We could have have both of them, namely Babylon and a vigilante program submit BTC headers to Babylon contract simultaneously. This is to answer the concerns about censorship resistance of Babylon contract -- even Babylon is censoring BTC headers, Babylon contract can still receive BTC headers, and vigilante monitor on phase2 integration will panick if Babylon censors BTC timestamps as well.

KonradStaniec

I think it looks good, given what I now know about smart contract.

Although new fuzz tests are failing.

KonradStaniec · 2024-01-22T08:12:45Z

x/zoneconcierge/keeper/ibc_packet_btc_timestamp.go

@@ -182,6 +195,12 @@ func (k Keeper) getHeadersToBroadcast(ctx context.Context) []*btclctypes.BTCHead
 		}
 	}

+	if initHeader == nil {
+		// if initHeader is nil, then this means a large reorg happens such that all headers


then this means a large reorg it not necessarily large reorg. In case when segment had length 1 which is possible when checkpoints finalizing different epoch are in adjacent btc blocks, this could be small re-org

SebastianElvis · 2024-01-22T08:49:43Z

Although new fuzz tests are failing.

There was a mistake in the assertions, making the test flaky. Fixed and fuzzed the test locally

SebastianElvis added 6 commits January 20, 2024 21:49

init

b607fdd

test

25917fc

make linter happy

c1c278a

only send last w+1 BTC headers upon init

54012e9

replace rollback point with last w+1 BTC headers

95168df

fix ci

bcf7644

SebastianElvis requested review from KonradStaniec and vitsalis January 21, 2024 10:31

KonradStaniec reviewed Jan 21, 2024

View reviewed changes

Merge branch 'dev' into fix-zc-ibc-panic

f29273b

KonradStaniec approved these changes Jan 22, 2024

View reviewed changes

fix ci test

b13ab52

SebastianElvis merged commit fa904d3 into dev Jan 22, 2024
4 checks passed

SebastianElvis deleted the fix-zc-ibc-panic branch January 22, 2024 08:58

SebastianElvis mentioned this pull request Jan 24, 2024

doc: more details in ZoneConcierge's documentation #422

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: making zoneconcierge resilient against long BTC reorg (again) #413

fix: making zoneconcierge resilient against long BTC reorg (again) #413

SebastianElvis commented Jan 21, 2024 •

edited

Loading

KonradStaniec Jan 21, 2024

SebastianElvis Jan 21, 2024 •

edited

Loading

KonradStaniec Jan 22, 2024

SebastianElvis Jan 22, 2024

KonradStaniec Jan 22, 2024

SebastianElvis Jan 22, 2024

KonradStaniec left a comment

KonradStaniec Jan 22, 2024

SebastianElvis commented Jan 22, 2024 •

edited

Loading

fix: making zoneconcierge resilient against long BTC reorg (again) #413

fix: making zoneconcierge resilient against long BTC reorg (again) #413

Conversation

SebastianElvis commented Jan 21, 2024 • edited Loading

Appendix: stack trace upon a reorg reverting the entire LastSentSegment

KonradStaniec Jan 21, 2024

Choose a reason for hiding this comment

SebastianElvis Jan 21, 2024 • edited Loading

Choose a reason for hiding this comment

KonradStaniec Jan 22, 2024

Choose a reason for hiding this comment

SebastianElvis Jan 22, 2024

Choose a reason for hiding this comment

KonradStaniec Jan 22, 2024

Choose a reason for hiding this comment

SebastianElvis Jan 22, 2024

Choose a reason for hiding this comment

KonradStaniec left a comment

Choose a reason for hiding this comment

KonradStaniec Jan 22, 2024

Choose a reason for hiding this comment

SebastianElvis commented Jan 22, 2024 • edited Loading

SebastianElvis commented Jan 21, 2024 •

edited

Loading

Appendix: stack trace upon a reorg reverting the entire `LastSentSegment`

SebastianElvis Jan 21, 2024 •

edited

Loading

SebastianElvis commented Jan 22, 2024 •

edited

Loading