New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change liveness window storage method #2562

Open
ValarDragon opened this Issue Oct 23, 2018 · 6 comments

Comments

Projects
None yet
3 participants
@ValarDragon
Copy link
Member

ValarDragon commented Oct 23, 2018

Currently we store the liveness window per validator as a single bit array. On every block, we read and set to the liveness window bit array for every validator. On the current testnet, the liveness window is 10000 blocks (source: https://github.com/cosmos/cosmos-sdk/blob/master/x/slashing/params.go#L70). This is at least 1250 bytes (plus some additional overhead due to protobuf encoding)

This means that 1250B * 100 = 125 KB is being written and also read, every single block. Given an additional IAVL overhead, this may be a non-negligible contribution to #2131. When we scale to mainnet, one block every 5 seconds translates to ~ 35k blocks per 48 hours, hence are window will have to be about ~35k blocks. This means > 400KB are being read, and then written every block. (> 800KB of total IO activity)

I propose instead separating more of the bit array into separate leaves. I propose that in state, per validator, we split the bit array into separate bit arrays of 4096 bits. Now staking would need a get operation that would obtain the correct thing from the store, and then get the right index. This is relatively simple though, since note that every element in the bit_array has all bits the same, except the last 12 bits. So just index the bitarray in the IAVL tree by log_2(window_size) - 12 bits, and within the bit array use the last 12 bits to index the correct bit within there. This would keep the reads/writes 63KB respectively, regardless of window size. We can abstract this more, by allowing any power of 2 as the batch size.

There should be no issues with rebalancing, since the keys will all be created at once for each bonded validator.

@alexanderbez

This comment has been minimized.

Copy link
Contributor

alexanderbez commented Oct 23, 2018

This is relatively simple though, since note that every element in the bit_array has all bits the same, except the last 12 bits.

Can you elaborate on this?

@ValarDragon

This comment has been minimized.

Copy link
Member

ValarDragon commented Oct 23, 2018

Sure. If everything has an index, and is broken up into 2**12 blocks, the blocks can be aligned s.t. block 0 starts at index 0 and ends at index 2**12 - 1. In this case, the last 12 bits of every element within a block is the same. Then when you want to read a particular index i you do:

get_block(i >> 12).Get(i & 0xFFF)

@cwgoes

This comment has been minimized.

Copy link
Contributor

cwgoes commented Dec 14, 2018

@ValarDragon We split the array by key, and use an index offset - which part of our implementation scales with the size of the window?

@ValarDragon

This comment has been minimized.

Copy link
Member

ValarDragon commented Dec 14, 2018

Ah your right, I must have misread the code originally. We should still do what the suggestion is though, since were currently adding at least 32 bytes to the IAVL tree if not more, for 1 bit of data, which is quite silly / inefficient.

@cwgoes

This comment has been minimized.

Copy link
Contributor

cwgoes commented Dec 14, 2018

OK but this is quite minor - also I think we should see if we could isolate these sorts of optimizations in an intermediary layer instead of requiring that they be implemented at the application level; it makes things more confusing to read.

@ValarDragon

This comment has been minimized.

Copy link
Member

ValarDragon commented Dec 14, 2018

I don't think this is minor in the short term. At 35k blocks per window, that means there are 35 thousand state entries. We currently amino decode a length prefixed bool per entry. I'm unsure how length prefixing works on bools, so lets assume this is just 1 byte. The IAVL leaf for it is 32 bytes. This means at 100 validators, there are 100 * 35000 * (32 + 1) bytes = 116MB, accounting for the additional layer of inner nodes it creates, its more like 200MB. And this is for just 3.5 million bits of information (437KB)

We can generalize this by making a hash-based tree friendly bitarray type that takes a prefix, so the API within slashing would be as a normal bit array.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment