Add block time to block table #5

RiccardoM · 2020-09-02T05:36:33Z

Currently, when storing a new block, we save only the block timestamp. However, some applications might want the block time (the time elapsed since the previous block) as well. For this reason, we need to add a column containing the block time, computed as follows:

if no blocks exist, use the genesis time to compute the block time
if a previous block exists, use that block's timestamp to compute the block time
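
A minimal sketch of the rule above, written in Python purely for illustration. The function name `block_time` and its parameters are hypothetical and are not part of Juno.

```python
from datetime import datetime, timezone
from typing import Optional


def block_time(timestamp: datetime,
               previous_timestamp: Optional[datetime],
               genesis_time: datetime) -> float:
    """Return the seconds elapsed between a block and its reference point.

    The reference point is the previous block's timestamp when one exists,
    and the genesis time otherwise (that is, for the first block).
    """
    reference = previous_timestamp if previous_timestamp is not None else genesis_time
    return (timestamp - reference).total_seconds()


# Made-up timestamps, used only to exercise both branches.
genesis = datetime(2020, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
first = datetime(2020, 1, 1, 0, 0, 7, tzinfo=timezone.utc)
second = datetime(2020, 1, 1, 0, 0, 13, tzinfo=timezone.utc)

first_block_time = block_time(first, None, genesis)     # measured from genesis
second_block_time = block_time(second, first, genesis)  # measured from block 1
```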

References forbole/callisto#39

RiccardoM · 2020-09-02T09:34:18Z

@HarleyAppleChoi @kwunyeung While looking into this issue, I've found that we cannot implement this inside Juno.

The reason is that in order to compute the block time, you need:

the genesis time, and
the timestamp of the block preceding the one you are computing the block time for

These two requirements conflict with how Juno currently works. Currently, you can run Juno in three different modes:

1. listening for new blocks only;
2. parsing old blocks only until the current block height;
3. listening for new blocks and at the same time parsing old blocks.

While (2) does not have any problems, (1) and (3) do. In particular, listening for new blocks can cause a problem:

to compute the block time you have to have the previous block timestamp, but
the previous block might not exist yet due to it not being parsed.

This conflict would cause extremely incorrect block time calculations.

Solution

While thinking about how to solve this problem, I figured out that it might be a lot easier and faster to perform this kind of computation client side.

The idea is the following:

Subscribe to new blocks.
Each time a new block is created, get all the timestamps for the existing blocks.
Compute the average block time.

The time complexity of this solution is O(n), so the computation would take more than 1s only when the block count is very high (>= 10.000.000 blocks), as you can see from this NodeJS looping performance post.
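
The averaging step can be sketched as a single pass over the timestamps. This is only an illustration in Python, not the code from the repository mentioned in this comment; the function name `average_block_time` and the sample data are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Sequence


def average_block_time(timestamps: Sequence[datetime]) -> float:
    """Average the gaps between consecutive block timestamps in one O(n) pass.

    `timestamps` is assumed to be sorted by block height.
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two blocks to compute a block time")
    total = 0.0
    for previous, current in zip(timestamps, timestamps[1:]):
        total += (current - previous).total_seconds()
    return total / (len(timestamps) - 1)


# Made-up data: four blocks produced 5, 6 and 7 seconds apart.
start = datetime(2020, 9, 2, 9, 0, 0, tzinfo=timezone.utc)
sample = [start + timedelta(seconds=s) for s in (0, 5, 11, 18)]
average = average_block_time(sample)
```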

To better show my solution, I've created a repository: AvgBlockTimeCalculator. Inside it you can find the code and test it yourself.

Please let me know what you guys think about this.

kwunyeung · 2020-09-06T18:55:30Z

@RiccardoM I guess calculating each block time on the client side would be fine, but not the average block time. For example, the current block height of cosmoshub-3 is at 3.26m. It's not feasible to fetch every block timestamp and calculate the average each time a user makes a request. Consider there are 50 concurrent users, the server will be querying the database for hundreds of millions of block data.

From the repo, the average block time is only being calculated by the subscription which only taking the responded data into account. It will not have the average block time from genesis block.

Saving each block time in the database is good as this data won't be changed after a single processing. I'm thinking if the following solution is possible.

Parse the blocks without calculating the block time
Setup a timer, get all processed blocks without block time and calculate block time of H block against H-1.
If H-1 doesn't existing, skip H.
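
One pass of such a timer could look roughly like the sketch below. It is in Python for illustration only; the dictionaries stand in for database tables, and the function name `fill_missing_block_times` is hypothetical. Handling of the first block (which has no H-1) is not covered by the steps above, so the sketch simply skips it.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict


def fill_missing_block_times(timestamps: Dict[int, datetime],
                             block_times: Dict[int, float]) -> int:
    """One timer pass: compute block times for heights that lack one.

    `timestamps` maps height -> block timestamp for every parsed block.
    `block_times` maps height -> seconds since the previous block and is
    updated in place. Returns how many entries were added in this pass.
    """
    added = 0
    for height in sorted(timestamps):
        if height in block_times:
            continue  # already computed; the value never changes
        previous = timestamps.get(height - 1)
        if previous is None:
            continue  # H-1 has not been parsed yet: skip H until a later pass
        block_times[height] = (timestamps[height] - previous).total_seconds()
        added += 1
    return added


# Made-up data: heights 1, 2 and 4 are parsed, height 3 is still missing.
t0 = datetime(2020, 9, 6, 0, 0, 0, tzinfo=timezone.utc)
parsed = {1: t0, 2: t0 + timedelta(seconds=6), 4: t0 + timedelta(seconds=19)}
computed: Dict[int, float] = {}

first_pass = fill_missing_block_times(parsed, computed)   # only height 2 is filled
parsed[3] = t0 + timedelta(seconds=12)                    # height 3 arrives later
second_pass = fill_missing_block_times(parsed, computed)  # heights 3 and 4 are filled
```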

There are two types of AverageBlockTime, OverallAverage and AverageBy{TimeInterval}.

To calculate the OverallAverage block time, we don't need to know the block time of each block. This can be done by calculating

OverallAverage = (CurrentBlockTimestamp - GenesisBlockTime) / CurrentBlockHeight
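
As a worked example of this formula, here is a small Python sketch. The function name `overall_average` and the sample values are assumptions made for illustration.

```python
from datetime import datetime, timezone


def overall_average(current_timestamp: datetime,
                    genesis_time: datetime,
                    current_height: int) -> float:
    """Seconds per block since genesis, without any per-block data."""
    if current_height <= 0:
        raise ValueError("current_height must be positive")
    elapsed = (current_timestamp - genesis_time).total_seconds()
    return elapsed / current_height


# Made-up values: 600 blocks produced in the first hour after genesis.
genesis_time = datetime(2020, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
latest_timestamp = datetime(2020, 1, 1, 1, 0, 0, tzinfo=timezone.utc)
overall = overall_average(latest_timestamp, genesis_time, 600)
```

Only the genesis time and the latest block are needed, so the cost does not grow with the chain height.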

The AverageBy{TimeInterval} can be calculated by another timer process or cron job. For example, every day at midnight, make the following calculation

AverageByLast24Hrs = (TodayLastBlockTimestamp - YesterdayLastBlockTimestamp) / (TodayLastBlockHeight - YesterdayLastBlockHeight)
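
The same calculation works for any interval, given the last block of the previous period and the last block of the current one. A hedged Python sketch follows; the function name `average_over_interval` and the sample values are hypothetical.

```python
from datetime import datetime, timezone


def average_over_interval(start_timestamp: datetime, start_height: int,
                          end_timestamp: datetime, end_height: int) -> float:
    """Seconds per block between two stored (timestamp, height) snapshots."""
    blocks = end_height - start_height
    if blocks <= 0:
        raise ValueError("end_height must be greater than start_height")
    return (end_timestamp - start_timestamp).total_seconds() / blocks


# Made-up snapshots: the last block of yesterday and the last block of today.
yesterday_last = (datetime(2020, 9, 5, 23, 59, 58, tzinfo=timezone.utc), 1000)
today_last = (datetime(2020, 9, 6, 23, 59, 58, tzinfo=timezone.utc), 15400)
daily_average = average_over_interval(*yesterday_last, *today_last)
```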

RiccardoM · 2020-09-07T07:52:19Z

@kwunyeung Let me answer point by point.

> Consider there are 50 concurrent users, the server will be querying the database for hundreds of millions of block data.

I honestly don't think this is a problem at all. Hasura has shown us that it can scale pretty well, as described in their "See Hasura scale to 1 million active GraphQL subscriptions" article. They had 1 million clients and were updating 1 million rows per second. At this workload, Postgres was at only 28% load.

I don't think we will ever reach this number, so I think we can safely assume we will not have any problem about this.

> From the repo, the average block time is only being calculated by the subscription which only taking the responded data into account. It will not have the average block time from genesis block.

Actually, that subscription gets all the blocks ever created, from the first up to the latest, so the block time is as accurate as possible. It takes all the blocks, gets their timestamps and computes the average block time across all of them.

> I'm thinking if the following solution is possible.
>
> Parse the blocks without calculating the block time
>
> Setup a timer, get all processed blocks without block time and calculate block time of H block against H-1.
>
> If H-1 doesn't existing, skip H.

The problem with this approach is that we might have to run this process many times before getting all the block times, since it's recursive (to compute H you need H - 1).

Let's take an example. Let's assume that we now start BDJuno against a chain that already has 1.000.000 blocks. Let's also assume that BDJuno needs 1 hour per 10.000 blocks to parse them and reach the latest height, and that we update the block time once per hour.
In this case, we will have:

the block time computed considering only the first block for the first hour
the block time computed on the blocks 1 - 10.000 for the second hour
the block time computed on the blocks 10.001 - 20.000 for the third hour
... and so on

This will result in a block time that is never in real time, since it's only computed once every X amount of time.

kwunyeung · 2020-09-08T07:24:27Z

OK, then we don't have to store the block time of every block; just calculate it on the client side. The average block time can be computed with the simple math I mentioned, and the result should be saved in the table when a new block is processed. This data is treated as the latest status of the chain. The aggregated data would help us create more statistical analysis. In the current version, I have AverageBlockTimePerMin, AverageBlockTimePerHour and AverageBlockTimePerDay. The same goes for delegation ratio and number of transactions.

RiccardoM · 2020-09-08T08:39:52Z

@kwunyeung I think that can be done inside BDJuno then, am I correct? If so I will take care of creating an issue to describe how to implement this properly.

kwunyeung · 2020-09-08T08:41:20Z

@RiccardoM yes it can be done inside BDJuno. Thanks!

RiccardoM mentioned this issue Sep 2, 2020

Get average blocktime for chain forbole/callisto#39

Closed

RiccardoM added the enhancement New feature or request label Sep 2, 2020

RiccardoM self-assigned this Sep 2, 2020

RiccardoM mentioned this issue Sep 8, 2020

Track chain average block times forbole/callisto#50

Closed

RiccardoM closed this as completed Sep 9, 2020
