Add block time to block table #5

Closed
RiccardoM opened this issue Sep 2, 2020 · 6 comments
Labels
enhancement New feature or request

Comments

@RiccardoM
Contributor

Currently, when storing a new block, we save the block timestamp only. However, some applications might want to get the block time as well. For this reason, we need to add a column storing the block time, computed as follows:

  • if no blocks exist, use the genesis time to compute the block time
  • if a previous block exists, use that block's timestamp to compute the block time
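
A minimal sketch of the rule above, assuming timestamps are available as timezone-aware `datetime` values. The function name `block_time_seconds` and the sample timestamps are hypothetical and only illustrate the two cases; they are not taken from Juno.

```python
from datetime import datetime, timezone
from typing import Optional


def block_time_seconds(
    block_timestamp: datetime,
    previous_timestamp: Optional[datetime],
    genesis_time: datetime,
) -> float:
    """Seconds elapsed since the previous block, or since genesis for the first block."""
    # Fall back to the genesis time only when no previous block exists.
    reference = previous_timestamp if previous_timestamp is not None else genesis_time
    return (block_timestamp - reference).total_seconds()


# Hypothetical sample data: genesis, then two blocks.
genesis = datetime(2020, 9, 2, 12, 0, 0, tzinfo=timezone.utc)
first = datetime(2020, 9, 2, 12, 0, 7, tzinfo=timezone.utc)
second = datetime(2020, 9, 2, 12, 0, 13, tzinfo=timezone.utc)

first_block_time = block_time_seconds(first, None, genesis)     # measured against genesis
second_block_time = block_time_seconds(second, first, genesis)  # measured against block 1
```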

References forbole/callisto#39

@RiccardoM RiccardoM added the enhancement New feature or request label Sep 2, 2020
@RiccardoM RiccardoM self-assigned this Sep 2, 2020
@RiccardoM
Contributor Author

@HarleyAppleChoi @kwunyeung When taking a look at this issue, I've found that we cannot implement this inside Juno.

The reason is that in order to compute the block time, you need:

  1. the genesis time, and
  2. the time for the block previous to the one you are computing the difference for

These two requirements collide with the current working of Juno. Currently, you can run Juno in three different modes:

  1. listening for new blocks only;
  2. parsing old blocks only until the current block height;
  3. listening for new blocks and at the same time parsing old blocks.

While (2) does not have any problems, (1) and (3) do. In particular, listening for new blocks might cause some problems:

  • to compute the block time you have to have the previous block timestamp, but
  • the previous block might not exist yet due to it not being parsed.

This conflict would produce wildly incorrect block time calculations.

Solution

While thinking about how to solve this problem, I figured that it might be a lot easier and faster to perform this kind of computation client side.

The idea is the following:

  1. Subscribe to new blocks.
  2. Each time a new block is created, get all the timestamps for the existing blocks.
  3. Compute the average block time.

The time complexity of this solution is O(n), which would result in the computation taking > 1s only when the block count is very high (>= 10,000,000 blocks), as you can see from this NodeJS looping performance post.
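
To make the O(n) claim concrete, here is a rough sketch of the averaging step, assuming the block timestamps arrive as Unix seconds ordered by height. The helper `average_block_time` is hypothetical and is not the code from the AvgBlockTimeCalculator repository.

```python
from typing import Sequence


def average_block_time(timestamps: Sequence[float]) -> float:
    """Average the gaps between consecutive block timestamps in a single O(n) pass."""
    if len(timestamps) < 2:
        raise ValueError("need at least two blocks to compute a block time")
    total = 0.0
    # Each block's time is its timestamp minus the previous block's timestamp.
    for previous, current in zip(timestamps, timestamps[1:]):
        total += current - previous
    return total / (len(timestamps) - 1)


# Hypothetical timestamps (seconds) for five consecutive blocks.
sample = [0.0, 6.0, 13.0, 19.0, 26.0]
avg = average_block_time(sample)
```

The loop touches every timestamp once, so the cost grows linearly with the number of blocks.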

To better show my solution, I've created a repository: AvgBlockTimeCalculator. Inside it you can find and test the code by yourself.

Please let me know what you guys think about this.

@kwunyeung
Member

@RiccardoM I guess calculating each block time on the client side would be fine, but not the average block time. For example, the current block height of cosmoshub-3 is at 3.26m. It's not possible to fetch every block timestamp and calculate the average each time a user makes a request. Consider there are 50 concurrent users, the server will be querying the database for hundreds of millions of block data.

From the repo, the average block time is only calculated by the subscription, which only takes the returned data into account. It will not give the average block time from the genesis block.

Saving each block time in the database is good, as this data won't change after it has been processed once. I'm wondering whether the following solution is possible.

  1. Parse the blocks without calculating the block time
  2. Set up a timer that gets all processed blocks without a block time and calculates the block time of block H against H-1.
  3. If H-1 doesn't exist, skip H.
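
The timer pass described in these steps could look roughly like the sketch below. It assumes an in-memory mapping from height to timestamp (in seconds) stands in for the database; `fill_block_times` and both dictionaries are hypothetical names, and the first block (which would be measured against the genesis time) is not handled.

```python
from typing import Dict


def fill_block_times(timestamps: Dict[int, float], block_times: Dict[int, float]) -> int:
    """Run one timer pass and return how many block times were newly computed."""
    filled = 0
    for height in sorted(timestamps):
        if height in block_times:
            continue  # already processed, the value never changes
        previous = timestamps.get(height - 1)
        if previous is None:
            continue  # H-1 has not been parsed yet: skip H for now
        block_times[height] = timestamps[height] - previous
        filled += 1
    return filled


# Hypothetical state: block 12 has not been parsed yet.
stored = {10: 100.0, 11: 106.0, 13: 119.0}
times: Dict[int, float] = {}
first_pass = fill_block_times(stored, times)   # only height 11 can be computed

stored[12] = 112.0                             # block 12 arrives later
second_pass = fill_block_times(stored, times)  # heights 12 and 13 are now computable
```

Skipped heights are simply retried on the next pass, so the table fills in as the missing predecessors get parsed.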

There are two types of AverageBlockTime, OverallAverage and AverageBy{TimeInterval}.

To calculate the OverallAverage block time, we don't need to know the block time of each block. This can be done by calculating

OverallAverage = (CurrentBlockTimestamp - GenesisBlockTime) / CurrentBlockHeight
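
As a small worked example of this formula, assuming timestamps are Unix seconds (the function name and the sample numbers are hypothetical):

```python
def overall_average(current_timestamp: float, genesis_time: float, current_height: int) -> float:
    """Average block time since genesis, in seconds per block."""
    if current_height <= 0:
        raise ValueError("current_height must be positive")
    return (current_timestamp - genesis_time) / current_height


# Hypothetical chain: 3,000 blocks produced in the 21,000 seconds since genesis.
overall = overall_average(
    current_timestamp=1_600_021_000.0,
    genesis_time=1_600_000_000.0,
    current_height=3_000,
)
```

Only the latest block and the genesis time are needed, so no per-block data has to be read.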

The AverageBy{TimeInterval} can be calculated by another timer process or cron job. For example, every day at midnight, make the following calculation:

AverageByLast24Hrs = (TodayLastBlockTimestamp - YesterdayLastBlockTimestamp) / (TodayLastBlockHeight - YesterdayLastBlockHeight)
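
The same calculation works for any interval, given the last block at each end of it. A sketch under that assumption, with a hypothetical helper name and made-up sample values:

```python
def average_by_interval(
    start_timestamp: float,
    start_height: int,
    end_timestamp: float,
    end_height: int,
) -> float:
    """Average block time between two blocks, in seconds per block."""
    if end_height <= start_height:
        raise ValueError("end_height must be greater than start_height")
    return (end_timestamp - start_timestamp) / (end_height - start_height)


# Hypothetical day: 12,000 blocks produced between two midnights (86,400 seconds apart).
last_24h = average_by_interval(
    start_timestamp=1_600_000_000.0,  # yesterday's last block
    start_height=100_000,
    end_timestamp=1_600_086_400.0,    # today's last block
    end_height=112_000,
)
```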

@RiccardoM
Contributor Author

@kwunyeung Let me answer point by point.

> Consider there are 50 concurrent users, the server will be querying the database for hundreds of millions of block data.

I honestly don't think this is a problem at all. Hasura has shown us that it can scale pretty well, as described in their "See Hasura scale to 1 million active GraphQL subscriptions" article: they had 1 million clients and were updating 1 million rows per second. At this workload, Postgres was at only 28% load.

I don't think we will ever reach this number, so I think we can safely assume we will not have any problem with this.


> From the repo, the average block time is only calculated by the subscription, which only takes the returned data into account. It will not give the average block time from the genesis block.

Actually, that subscription gets all the blocks ever created, from the first up to the latest, so the block time is as accurate as possible. It takes all the blocks, gets their timestamps, and computes the average block time across all of them.


> I'm wondering whether the following solution is possible.
>
>   1. Parse the blocks without calculating the block time
>   2. Set up a timer that gets all processed blocks without a block time and calculates the block time of block H against H-1.
>   3. If H-1 doesn't exist, skip H.

The problem with this approach is that we might have to run this process many times before getting all the block times, since it's recursive (to compute H you need H - 1).

Let's take an example. Assume that we start BDJuno against a chain that already has 1,000,000 blocks, that BDJuno parses 10,000 blocks per hour while catching up to the latest height, and that we update the block time once per hour. In this case, we will have:

  • the block time computed considering only the first block for the first hour
  • the block time computed on blocks 1 - 10,000 for the second hour
  • the block time computed on blocks 10,001 - 20,000 for the third hour
  • ... and so on

This will result in a block time that is never in real time, since it's only computed once every X amount of time.

@kwunyeung
Member

Ok, then we don't have to store the block time of every block. Just calculate it on the client side. The average block time can be computed with the simple math I mentioned, and the result should be saved in the table when a new block is processed. This data is treated as the latest status of the chain. The aggregated data would help us create more statistical analysis. In the current version, I have AverageBlockTimePerMin, AverageBlockTimePerHour and AverageBlockTimePerDay. The same goes for delegation ratio and number of transactions.

@RiccardoM
Contributor Author

@kwunyeung I think that can be done inside BDJuno then, am I correct? If so I will take care of creating an issue to describe how to implement this properly.

@kwunyeung
Member

@RiccardoM yes it can be done inside BDJuno. Thanks!
