
btcstaking: improved algorithm for voting power update #486

Merged

Conversation


@SebastianElvis SebastianElvis commented Feb 19, 2024

This PR introduces an improved algorithm for updating the voting power table and the voting power distribution cache. Instead of iterating over all finality providers and BTC delegations, the algorithm reconciles the previous voting power distribution with all new voting power update events. This reduces the complexity of BeginBlocker from O(# of all BTC delegations) to O(# of BTC delegations whose states are updated at this height).

The algorithm works as follows (a minimal sketch in Go follows this list):

  1. Find the BTC tip height h at the current Babylon height, and the BTC tip height h' at the prior Babylon height (i.e., current - 1)
  2. Find all voting power distribution update events during [h', h]
  • This includes newly active BTC delegations, newly unbonded BTC delegations, and slashed finality providers
  3. Reconcile the last voting power distribution with all these events, as follows:
  • exclude all finality providers that are newly slashed
  • exclude all newly unbonded BTC delegations
  • add all newly active BTC delegations
  • filter out the top N finality providers as the active fp set
  • record the new voting power table and cache
  • record all metrics
  • remove all processed events
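Here is that sketch. All type and field names are illustrative stand-ins rather than the module's actual protobuf types, and the top-N filtering, table/cache recording, and metrics steps are omitted:

package btcstakingsketch

import "sort"

// Illustrative stand-ins for the module's distribution-cache types.
type delDistInfo struct {
	stakingTxHash string
	votingPower   uint64
}

type fpDistInfo struct {
	btcPkHex string
	dels     []delDistInfo
}

func (fp *fpDistInfo) totalPower() uint64 {
	var sum uint64
	for _, d := range fp.dels {
		sum += d.votingPower
	}
	return sum
}

// reconcile applies the classified events to the previous distribution,
// touching only O(#events) delegations plus one pass over the cached
// finality providers.
func reconcile(
	prev []*fpDistInfo,
	slashedFPs map[string]struct{}, // newly slashed fp BTC PKs
	unbondedDels map[string]struct{}, // staking tx hashes of unbonded delegations
	activeDels map[string][]delDistInfo, // fp BTC PK -> newly active delegations
) []*fpDistInfo {
	next := make([]*fpDistInfo, 0, len(prev))
	for _, old := range prev {
		if _, slashed := slashedFPs[old.btcPkHex]; slashed {
			continue // exclude newly slashed finality providers
		}
		fp := &fpDistInfo{btcPkHex: old.btcPkHex}
		for _, d := range old.dels {
			if _, gone := unbondedDels[d.stakingTxHash]; !gone {
				fp.dels = append(fp.dels, d) // keep still-active delegations
			}
		}
		fp.dels = append(fp.dels, activeDels[old.btcPkHex]...) // add new ones
		delete(activeDels, old.btcPkHex)
		if fp.totalPower() > 0 {
			next = append(next, fp)
		}
	}
	// Entries left in activeDels belong to finality providers that were not
	// in the previous cache; iterate their keys in sorted order so the
	// result stays deterministic across validators.
	newPKs := make([]string, 0, len(activeDels))
	for pk := range activeDels {
		newPKs = append(newPKs, pk)
	}
	sort.Strings(newPKs)
	for _, pk := range newPKs {
		next = append(next, &fpDistInfo{btcPkHex: pk, dels: activeDels[pk]})
	}
	return next
}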

Along the way, this PR also simplifies and fixes all existing fuzz tests, which now cover most of the scenarios.

TODO (that can be in subsequent PRs):

  • update docs
  • add more fuzz tests

@SebastianElvis SebastianElvis marked this pull request as ready for review February 21, 2024 04:48
@SebastianElvis SebastianElvis changed the title new algorithm for voting power update btcstaking: improved algorithm for voting power update Feb 21, 2024
	NewState: types.BTCDelegationStatus_PENDING,
}
if err := sdkCtx.EventManager().EmitTypedEvent(event); err != nil {
	panic(fmt.Errorf("failed to emit EventBTCDelegationStateUpdate for the new pending BTC delegation: %w", err))
}
Contributor

IMO it is a little too harsh to panic if emitting the event fails; maybe just log or return err

Member Author

Given that events are part of the consensus, if some validators successfully emit events while others do not, then there will be consensus failure. Panicking here allows the validators with event issues to stop immediately, resolve the issues, and restart. Wdyt?

Contributor

Events do not affect the appHash. The error that can happen in EmitTypedEvent comes from marshaling the typed event, not from the EmitEvent part.

So, if one validator emits the event and another doesn't, it does not affect consensus...

My opinion is: panics should be avoided in almost all cases (apart from rare bad cases like consensus failures)

Member Author
@SebastianElvis commented Feb 22, 2024

It's not part of the app hash, but it is part of Tendermint consensus (i.e., an input to some hash values in Tendermint headers). There was a decision that Tendermint makes events part of consensus to make them deterministic (here and here). So if many validators fail to emit events, the Cosmos SDK app hash won't diverge, but Tendermint consensus will stall.

Contributor

I kinda agree with @RafilxTenfen in the sense that if the panic indeed happens, it happens on all the validators and the chain will completely stall until a fix is deployed to >2/3 of validators. But if we log an error/alert here, we won't have a stall.

As we have used panics in all event-emitting places, we can revisit this problem later.

Member Author
@SebastianElvis commented Feb 25, 2024

> it happens on all the validators and the chain will completely stall until a fix is deployed to >2/3 of validators. But if we log an error/alert here, we won't have a stall.

This is not necessarily the only possibility -- another possible scenario (which I think is more likely) is that some validators did something wrong in their configuration and fail to emit events, while other validators emit events without problems. In this case, if the validators with invalid configurations return an error while the other validators go through the process, then the two sets of validators will end up with different app hashes. If we panic here, these validators still have a chance to fix the configuration and restart the node to catch up, without doing any rollback.

Contributor
@RafilxTenfen RafilxTenfen left a comment

A couple of suggestions, but no impediments
utACK

@SebastianElvis SebastianElvis force-pushed the new-voting-power-update-algorithm branch from 176697b to e1a40d2 on February 21, 2024 23:35
Contributor
@gitferry gitferry left a comment

Sorry for the delay. Spent some time getting my head around it. In general, both the algorithm and the implementation lgtm. Awesome work!

One question regarding the necessity of introducing the ActiveFinalityProviders field: if we always store FinalityProviders in order of voting power, it would be easy to get the active set out of this. So do we really need ActiveFinalityProviders?

Some other comments:

x/btcstaking/keeper/btc_delegations.go (outdated; resolved)

// for each finality provider the delegation restakes to, update its index
for _, fpBTCPK := range btcDel.FpBtcPkList {
var btcDelIndex = types.NewBTCDelegatorDelegationIndex()
Contributor

Suggested change:
- var btcDelIndex = types.NewBTCDelegatorDelegationIndex()
+ var btcDelIndex *types.BTCDelegatorDelegationIndex

Is NewBTCDelegatorDelegationIndex needed?

Member Author

I think so, since later on we invoke btcDelIndex.Add. If it's nil then the invocation will panic.
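A tiny self-contained illustration of the point, assuming (as with the real Add) that the method writes through its receiver:

package main

import "fmt"

// Stand-in for types.BTCDelegatorDelegationIndex.
type index struct{ txHashes []string }

// Add writes through the receiver, so calling it on a nil *index panics.
func (i *index) Add(h string) { i.txHashes = append(i.txHashes, h) }

func main() {
	defer func() { fmt.Println("recovered:", recover()) }()
	var idx *index      // what `var btcDelIndex *types.BTCDelegatorDelegationIndex` would give
	idx.Add("deadbeef") // nil-pointer dereference -> panic
}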

Comment on lines +28 to +29
// if the previous height is not indexed (which might happen at genesis),
// use the base header
Contributor

I think genesis is the only case where the BTC height is not indexed, right? Maybe we can check whether babylonHeight == 0 and panic if not (actually Cosmos forbids access to height 0, so maybe we can panic either way)

Member Author

I'm not sure genesis is the only case where this might happen. Since the BTC light client is initialised earlier than the BTC staking module, I would say using the base header is the safest solution. Wdyt?

x/btcstaking/keeper/incentive.go (resolved)
x/btcstaking/keeper/power_dist_change.go (outdated; resolved)
@SebastianElvis
Member Author

> If we always store FinalityProviders in order of voting power, it would be easy to get the active set out of this.

Good point, you are right! Will revise
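A possible shape of that revision (illustrative names, not the module's API): keep the cache's finality providers sorted by voting power, so the active set is just a top-N prefix and no separate ActiveFinalityProviders field is needed.

package btcstakingsketch

import "sort"

type fpInfo struct {
	btcPkHex    string
	votingPower uint64
}

// sortByPower orders finality providers by voting power (descending),
// breaking ties by BTC PK hex so the order is deterministic.
func sortByPower(fps []*fpInfo) {
	sort.Slice(fps, func(i, j int) bool {
		if fps[i].votingPower != fps[j].votingPower {
			return fps[i].votingPower > fps[j].votingPower
		}
		return fps[i].btcPkHex < fps[j].btcPkHex
	})
}

// activeSet is then simply the first maxActive entries of the sorted list.
func activeSet(fps []*fpInfo, maxActive int) []*fpInfo {
	if len(fps) <= maxActive {
		return fps
	}
	return fps[:maxActive]
}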

x/btcstaking/keeper/power_dist_change.go (outdated; resolved)
/*
filter and classify all events into new/expired BTC delegations and slashed FPs
*/
for _, event := range events {
Contributor

this could probably be a separate function called filterAndClassifyAllEvents

On the other hand, why do we need this classification step?

Wouldn't it be enough to:

  1. Create a map fpBtcPkHex -> FinalityProviderDistInfo based on the old distribution cache
  2. Iterate over all events and apply the changes to this map based on the events
  3. Create the new distribution cache based on this new map

This maps a bit more nicely to the flow I had in mind for the whole thing, i.e. newDistCache = applyEvents(oldDistCache, events), but maybe I am missing some detail here.

Member Author

This is because the distribution cache uses lists for finality providers and BTC delegations, so it cannot support efficient queries. Say a BTC delegation expires: we would need an O(n) operation to find that BTC delegation and remove it, and this would have to be done for every expired BTC delegation.

Right now we classify these events and assign them to maps, such that we only need to iterate over the distribution cache once (see the sketch below).
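Schematically, that classification step looks like the following sketch (the flat event type here is hypothetical; the real events are protobuf messages):

package btcstakingsketch

// Hypothetical flat event standing in for the module's typed events.
type powerDistUpdateEvent struct {
	kind          string // "active", "unbonded", or "slashedFP"
	stakingTxHash string // set for delegation events
	fpBtcPkHex    string // set for all events
}

// classify buckets events into maps with O(1) lookup, so the subsequent
// reconciliation only walks the previous distribution cache once.
func classify(events []powerDistUpdateEvent) (
	slashedFPs map[string]struct{},
	unbondedDels map[string]struct{},
	activeDels map[string][]string, // fp BTC PK -> newly active staking tx hashes
) {
	slashedFPs = make(map[string]struct{})
	unbondedDels = make(map[string]struct{})
	activeDels = make(map[string][]string)
	for _, ev := range events {
		switch ev.kind {
		case "slashedFP":
			slashedFPs[ev.fpBtcPkHex] = struct{}{}
		case "unbonded":
			unbondedDels[ev.stakingTxHash] = struct{}{}
		case "active":
			activeDels[ev.fpBtcPkHex] = append(activeDels[ev.fpBtcPkHex], ev.stakingTxHash)
		}
	}
	return slashedFPs, unbondedDels, activeDels
}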

Contributor

hmm I think I understand.

On the other hand, with this map-based approach you need to iterate over all finality providers and all their delegations in this middle loop:

	// iterate over all finality providers and apply all events
	for i := range dc.FinalityProviders {
		// create a copy of the finality provider
		fp := *dc.FinalityProviders[i]
		fp.TotalVotingPower = 0
		fp.BtcDels = []*types.BTCDelDistInfo{}

		fpBTCPKHex := fp.BtcPk.MarshalHex()

		// if this finality provider is slashed, continue to avoid recording it
		if _, ok := slashedFPs[fpBTCPKHex]; ok {
			continue
		}

		// add all BTC delegations that are not unbonded to the new finality provider
		for j := range dc.FinalityProviders[i].BtcDels {
			btcDel := *dc.FinalityProviders[i].BtcDels[j]
			if _, ok := unbondedBTCDels[btcDel.StakingTxHash]; !ok {
				fp.AddBTCDelDistInfo(&btcDel)
			}
		}

		// process all new BTC delegations under this finality provider
		if fpActiveBTCDels, ok := activeBTCDels[fpBTCPKHex]; ok {
			// handle new BTC delegations for this finality provider
			for _, d := range fpActiveBTCDels {
				fp.AddBTCDel(d)
			}
			// remove the finality provider entry in activeBTCDels map, so that
			// after the for loop the rest entries in activeBTCDels belongs to new
			// finality providers with new BTC delegations
			delete(activeBTCDels, fpBTCPKHex)
		}

		// add this finality provider to the new cache if it has voting power
		if fp.TotalVotingPower > 0 {
			newDc.AddFinalityProviderDistInfo(&fp)
		}
	}

which seems wasteful. I think the main advantage of having the event-based model here is that we are able to calculate per-block reward and power distribution diffs.

In most cases the differences in rewards/power between Babylon blocks will be minimal, like a few new delegations and a few unbondings. Given that, it seems to me that we should avoid iterating over all data from the past distribution.

Not a blocker for me for now, we can improve it after adding more tests to this 👍

Member Author

> In most cases the differences in rewards/power between Babylon blocks will be minimal, like a few new delegations and a few unbondings. Given that, it seems to me that we should avoid iterating over all data from the past distribution.

Yep, agreed. Roughly speaking the current version has complexity O(n*x), while the optimal complexity is O(n log n), where n is the number of active finality providers and x is the number of active BTC delegations under each finality provider. To achieve the optimal value we would need a map to store every BTC delegation's voting power, which is not straightforward given that map iteration is non-deterministic and thus cannot be used directly in consensus code. Added a TODO for now.
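For context on the determinism constraint mentioned above: Go randomizes map iteration order, so consensus-critical code that uses a map typically iterates over a sorted copy of the keys, e.g.:

package btcstakingsketch

import "sort"

// sortedKeys returns the map's keys in ascending order, giving every
// validator the same iteration order over an otherwise unordered map.
func sortedKeys(m map[string]uint64) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}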

x/btcstaking/keeper/power_dist_change.go (resolved)
proto/babylon/btcstaking/v1/incentive.proto (outdated; resolved)
x/btcstaking/keeper/btc_delegations.go (outdated; resolved)
x/btcstaking/types/incentive.go (outdated; resolved)
@SebastianElvis
Member Author

Thanks all for the great comments! I have addressed or answered all comments, notably:

  • sorting finality providers in the voting power distribution cache and avoiding saving a copy of the active ones
  • only saving events that affect voting power distribution (i.e., excluding pending BTC delegations)
  • some refactoring of public/private functions

Please feel free to take a look again.

Contributor
@KonradStaniec KonradStaniec left a comment

Great work! Left only some potential suggestions for future improvements :)


x/btcstaking/keeper/btc_delegations.go (outdated; resolved)
delEvent := typedEvent.BtcDelStateUpdate
if delEvent.NewState == types.BTCDelegationStatus_ACTIVE {
// newly active BTC delegation
btcDel, err := k.GetBTCDelegation(ctx, delEvent.StakingTxHash)
Contributor

hmm, if our activation event contained all the necessary data to build:

type BTCDelDistInfo struct {
	// btc_pk is the Bitcoin secp256k1 PK of this BTC delegation
	// the PK follows encoding in BIP-340 spec
	BtcPk *github_com_babylonchain_babylon_types.BIP340PubKey `protobuf:"bytes,1,opt,name=btc_pk,json=btcPk,proto3,customtype=github.com/babylonchain/babylon/types.BIP340PubKey" json:"btc_pk,omitempty"`
	// babylon_pk is the Babylon public key of the BTC delegation
	BabylonPk *secp256k1.PubKey `protobuf:"bytes,2,opt,name=babylon_pk,json=babylonPk,proto3" json:"babylon_pk,omitempty"`
	// staking_tx_hash is the staking tx hash of the BTC delegation
	StakingTxHash string `protobuf:"bytes,3,opt,name=staking_tx_hash,json=stakingTxHash,proto3" json:"staking_tx_hash,omitempty"`
	// voting_power is the voting power of the BTC delegation
	VotingPower uint64 `protobuf:"varint,4,opt,name=voting_power,json=votingPower,proto3" json:"voting_power,omitempty"`
}

we could avoid this random DB read. Maybe a TODO?

Member Author

This is not straightforward and in some sense unavoidable. A BTC delegation can stake to multiple finality providers and thus has the field FpBtcPkList holding a list of finality providers' PKs, whereas BTCDelDistInfo does not care about finality providers' PKs.

Given that we limit the number of GetBTCDelegation invocations (the hotspot) to O(# of newly active BTC delegations) (which is optimal), maybe this is fine. Wdyt?

Contributor

hmm, maybe we will see when we do load testing again :)

I am wondering, as we will launch mainnet without restaking, whether we should just have a 1-to-1 mapping between delegation and finality provider? I really do not like having features just in case :)

}

// NOTE: we don't need to record events for pending BTC delegations since these
// do not affect voting power distribution
Contributor

Just food for thought.

After thinking on it a bit longer, maybe it is useful to have pending delegations in our VotingPowerDistCache?

If we did that, we could make the order of BTCDelDistInfo under each FinalityProviderDistInfo the same as the order of delegations in the finality provider's delegation index. This means the whole VotingPowerDistCache would be a bit like a current snapshot of our whole DB.

With such a setup, we could add the delegation index under each finality provider to each activation/unbonding event. Then during the voting power update, we could use these indexes to avoid linear scans.

Member Author

> If we did that, we could make the order of BTCDelDistInfo under each FinalityProviderDistInfo the same as the order of delegations in the finality provider's delegation index. This means the whole VotingPowerDistCache would be a bit like a current snapshot of our whole DB.

There is a problem: delegations in the finality provider delegation index in the KV store are out of order -- BTC delegations are ordered by their staking tx hash, and finality provider delegations are ordered by the finality provider's BTC PK. So their orders are different from the one in VotingPowerDistCache, and this unfortunately does not help with avoiding linear scans.

Contributor

hmm, from what I see, delegations in the index are ordered by time of arrival, i.e., with each new delegation we just append it to the index:

		if err := btcDelIndex.Add(stakingTxHash); err != nil {
			return types.ErrInvalidStakingTx.Wrapf(err.Error())
		}

So if we updated BtcDels []*BTCDelDistInfo just by processing events in order, then this field would have the same order as the delegation index of the validator.

What am I missing?

Member Author

Ah, you are right -- in the BTC delegation index the BTC delegations are indeed ordered by time. However, I do not see how this helps avoid the linear scan. The BTC delegation index is keyed by finality provider PK and BTC staker PK, while the events are keyed only by Babylon height. It needs some thought if we want to use the BTC delegation index to track state updates.

Contributor
@gitferry gitferry left a comment

Very nice work!


x/btcstaking/keeper/power_dist_change.go (outdated; resolved)
x/btcstaking/keeper/power_dist_change.go (resolved)
@SebastianElvis SebastianElvis merged commit 32ef3f5 into base/index-optimise-btcstaking Feb 26, 2024
4 checks passed
@SebastianElvis SebastianElvis deleted the new-voting-power-update-algorithm branch February 26, 2024 00:35