Progressbar based on time-based estimation of transactions. #2310

Merged
merged 2 commits into from Feb 22, 2013

Conversation

Projects
None yet
10 participants
Owner

sipa commented Feb 14, 2013

Improvement upon #2294.

Owner

sipa commented Feb 14, 2013

(added an own pull so I can try fixing the pulltester build issue)

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/69be30d7e64eba0c936d9221b73b2bc35bebc7e2 for binaries and test log.

qubez commented Feb 15, 2013

Interesting, I had a idea about five days ago of how to improve estimation (which can be network-bound or cpu-bound):

-Include with each checkpoint the size of the blockchain at that checkpoint (which will loosely correspond with CPU work as well as download bandwidth)
-Include with the last checkpoint a multiplier to estimate additional CPU expense per block to do full verification after last checkpoint (i.e. blocks will take 3x more time to verify with signature checking). This can also be a guess of future network growth: "how many more transactions per block will we see than between the last two checkpoints".

then:
-Obtain estimated (or actual) blockchain size at the current block, from interpolation between known sizes at checkpoints,
-Estimate the total size/cpu_work of the current blockchain by extrapolating beyond the last checkpoint to the current height,
-Progress bar percentage is current block's estimated size, divided by total estimated size.

progressbarformula

More advanced than I could write: curve fitting. The blockchain growth looks pretty darn exponential!

Checkpoint blockchain sizes:
block, size
(0, 293)
011111, 2569495
033333, 7820351
074000, 28029745
105000, 69840955
134444, 341554543
168000, 1032555361
193000, 2491771562
210000, 4099216740
216116, 4855459871

Contributor

gavinandresen commented Feb 15, 2013

nit: compiling with clang 3.3 on my Mac:
src/qt/bitcoingui.cpp:534:9: warning: unused variable 'totalSecs' [-Wunused-variable]
int totalSecs = genesisBlockDate.secsTo(currentDate);

And running against a pre-0.8 datadirectory gets a crash, passing a NULL *pblockindex to GuessVerificationProgress from BitcoinGUI::setNumBlocks

Owner

sipa commented Feb 15, 2013

updated

Contributor

schildbach commented Feb 15, 2013

Some feedback:

There is a space (or LF) missing before "Transactions after this will not yet be visible."

I'd prefer proper plurals. Using workarounds like "day(s)" are looking unprofessional. Easy fix: Drop to the next lower unit just before the switch from 2 to 1 would occur.

qubez commented Feb 15, 2013

I ran the win32 pull-tester build without issue, but was taken aback by the new message. "six hours behind" looks like it's going to take six hours to do something.

I don't know if this is the way forward as this communicates even less information. I would rather have an estimate of how long it's going to run before it's caught up. This would be more useful information to end users, they want to know how long until they can use Bitcoin.

Member

luke-jr commented Feb 15, 2013

@schildbach Qt will automatically change "day(s)" to "day" or "days" as appropriate.

Contributor

schildbach commented Feb 15, 2013

@luke-jr Hu? I'm almost sure I saw "day(s)" in the actual running software. I'm running Ubuntu Linux 12.10 in case this matters.

Owner

laanwj commented Feb 15, 2013

This isn't based on the last version of my commits. I fixed the totalSecs already, along with some other changes such as the LF before the transaction message.

@schildbach: in Qt that is avoided by adding the correct plurals in the english translation file, no need to hack around in the source code.

@qubez if you know better just submit a pull request to improve it, I'm pretty darn sick of all the complaining, whatever we change to the progress bar (or something else) there is at least someone that complains that the old situation was better. That's why I was reluctant to touch it, or do much development on bitcoin at all anymore for that matter.

Owner

sipa commented Feb 15, 2013

@laanwj In that case, pull my commits into your pullreq and I'll close this?

Owner

laanwj commented Feb 16, 2013

Yep good idea

Contributor

schildbach commented Feb 16, 2013

I just retested this on latest sipa/progressbar and I am definately seeing the "(s)" in day(s) and hour(s). I'm trying to do a screenshot.

Contributor

schildbach commented Feb 16, 2013

Screenshot from 2013-02-16 11:40:50

laanwj and others added some commits Feb 10, 2013

Change progress bar from block-based to time-based
This is less confusing to most people, and doesn't rely on estimates
of the total number of blocks received from other nodes.
Owner

sipa commented Feb 16, 2013

@laanwj Updated to your latest pull's code (I think...)

@qubez If you're keen on making graphs, can you compare with the method implemented in this pullreq? It's fairly similar to what you propose, except it uses transactions and not blockchain size, as transactions are already computed (so don't require extra checkpoints data)... Or, as said, feel free to write a patch yourself.

@laanwj @schildbach I still see week(s) in the current code. You say that is fixed by adding it to the translations - that isn't done yet then?

Diapolo commented Feb 16, 2013

@schildbach As long as the english translation master file is not updated in the correct way, you will see this generic string. Most of the time I update these file or laanwy does. This is a non-issue, really :).

Owner

laanwj commented Feb 16, 2013

@sipa yes this is the last version, and indeed the translations still have to be updated

Contributor

schildbach commented Feb 17, 2013

There is more. Sometimes it says phrases like "and 0 day(s)". Will the whole phrase also automagically get cut by the translations?

Owner

laanwj commented Feb 17, 2013

I'm pretty sure you cannot get 0 day(s) as under 2 days it shows hours, but it's possible to get 0 hour(s) if secs < 90_60 *but_ count < nTotalBlocks. I suppose something less than an hour could be displayed as an hour.

Contributor

schildbach commented Feb 17, 2013

@laanwj emphasis on the "and". Probably it was "and 0 hours", I just wanted to describe the scheme. Could in future as well happen with "and 0 years" if you decide to add centuries.

Owner

laanwj commented Feb 17, 2013

Fyi the "and" has been gone for a while.

Contributor

schildbach commented Feb 17, 2013

@laanwj Ok thanks for fixing.

With the current version, there is another problem: The progress bar does not appear to progress well. It started with about 100 week(s) behind, and the bar was nearly empty. Now I'm at 29 week(s) behind, and the bar is filled by perhaps 15%. I'd expect to be filled either at about 70% or at about 85%, depending on if the progress is relative to genesis or to app start.

Actually I'm not convinced that a progress bar is a suitable metaphor for blockchain download at all, because a) download never finishes and b) the blockchain is not a chain.

Owner

laanwj commented Feb 17, 2013

The progress bar was relative to app start a few versions ago, and that confused people, as they thought the download restarted every time.

The code in this pull request displays the absolute progress, in verification cycles computed from the number of transactions, from the genesis block to now. This is as precise as it gets: if it is 50% full then this means there is 50% download/verification time still to go.

Anyway, we had that discussion many times already. Believe me, it's a fruitless waste of time, better spent on things like payment protocols/workflows. Let's keep this thread restricted to concrete issues/showstoppers with the current code.

Contributor

schildbach commented Feb 17, 2013

@laanwj Please read my report. My arguement is that about 15% fits neither calculation. Genesis was 210 weeks ago, in my test case I had 29 weeks to go. In terms of time, I had 86% done, yet it showed only about 15%. In terms of transaction I cannot say, but if you look at
https://blockchain.info/charts/n-transactions
its pretty clear visually that the integral before "29 weeks ago" is more than 15% compared to the integral after.

Owner

sipa commented Feb 17, 2013

@schildbach That is because signature verification is only enabled after the last checkpoint, so transaction after that are given much higher weight in the calculation (15 times more). Feel free to suggest a better factor (or heuristic for determining it), but for a test I did, this resulted in a very accurate measurement of processing time.

qubez commented Feb 17, 2013

Signature verification gets "turned on" only for the last 5000 blocks - the problem @schildbach is describing is that most of the download time is downloading blocks 0-216116, the progress of which is figured as 50% by 108058, but the actual percentage of blockchain size/transactions downloaded and processed by then is only about 2% - (see area under curve in my graphic above).

That's the problem which my big unacknowledged post describes a way to fix. Such a fix needs to mess with checkpoints.cpp; you're messing with checkpoints.cpp...

This pull is two things:
1 - bar wording changed ("blocks remaining" -> "days behind")
2 - post-final-checkpoint "correction factor"

"# 2" being here has made it a place to discuss the other missing code:
3 - correction of progress bar percentage to reflect actual download progress (size MB) instead of block number.

If you had just changed the wording with this pull, then "lets also fix # 3" likely wouldn't be bikeshedded here (I searched instead of filing a new issue.)

I figure I could do # 3 with one very long weighted sum, but I'm at the "build env, installed git, now..." part of being able to code/submit.

Owner

sipa commented Feb 17, 2013

@qubez Using block chain size or transaction count shouldn't matter much - they are very correlated.

What this pull does is change the progressbar to reflect "verification time progress". Effectively, it counts/estimates transactions, weighing those after the last checkpoint higher, as those require higher much more CPU.

The factor used to scale post-checkpoint transactions should reflect the time factor (download + process + sigcheck) / (download + process). The factor that is currently hardcoded (15.0) was determined by running on some system, and indeed not taking download time into account. Doing so would probably result in a lower number. Feel free to do benchmarks so the factor can be improved.

If I understand you correctly, you suggest that your method (using block chain size, and hardcode sizes in the checkpoints) will better account for download time? I'm not sure about that - assuming transactions have a constant size (which, on average, is true) and the post-checkpoint time factor is reasonably, the estimation implemented here should be very accurate.

qubez commented Feb 17, 2013

I have no problem believing that sipafactor(tm) is correct for your system and for a reindex or import of blocks already downloaded. However:

1:
a. verification up to last checkpoint is not well multithreaded, vs
b. signature verification is well multithreaded

result: end user reindex on 1 core vs 16 core will have a very different "time factor" correction.

and

2: end user initial blockchain download may have p2p network speed as a limiting factor, not CPU, in which case block number + correction factor after checkpoint will be completely incorrect - the progress bar should be the percentage of blockchain data estimated to have been downloaded (where 100% is an estimate of the current total size).

combined with:

3: The progress bar is only useful for initial blockchain download - any other time, the progress bar is 99% when you restart Bitcoin the next day. Therefore, the progress bar should be accurate for initial blockchain download.

Storing the known size of the blockchain at several points makes for better estimation that the "factor" kicking in at one point near the end of the download:

sipapercent

There undoubtedly is room for some kind of factor, since we won't be storing and can't retrieve the exact size of the blockchain after the last checkpoint. Instead of simply using the same calculated block size from the last checkpoint interval, estimating data after the last checkpoint could have a factor like yours where we guess both additional CPU cost and future growth of block size.

Owner

sipa commented Feb 17, 2013

@qubez I think you misunderstand this patch, given the graph above. I do not have a constant time per block (with a different speed before/after the checkpoint) - it's a constant time per transaction (with a different speed before/after the checkpoint), and transactions have a pretty constant size on average, so it's proportional to the size in bytes of the chain.

Also, I do not claim the current implementation is correct for everyone. That's not possible with a constant slowdown factor after the checkpoint. For it to be exact, this factor would depend on whether or not your downloading (or reindexing from disk), the degree of parallellism you have for script validation, and a few other things probably.

What I do claim, is that given a good factor (specifcally for the user's system - abstracting from how that factor is obtained), it is accurate in the general case. For example, if download time dominates everything and local processing/verification can be ignored, setting the factor=1 should result in more or less the same progress you'd calculate.

Member

gmaxwell commented Feb 17, 2013

@qubez Please don't post graphs which are pure conjecture. If you're going to go through the trouble of making a graph please make it on an actual measurement. What you're proposing is actually just a less accurate version of what Sipa already implemented his change uses the transaction count which is very closely related to size and more strongly correlated with the work than size is).

Automatic sanity-testing: WARNING, see http://jenkins.bluematt.me/pull-tester/9f2467ad6241ce6cf0897ed30c676598d59441a7 for binaries and test log.

This pull decreases total test coverage, please add unit tests to test all new code and help us add test cases for existing code.
Coverage report can be found at http://jenkins.bluematt.me/pull-tester/9f2467ad6241ce6cf0897ed30c676598d59441a7/bitcoin/src/total.coverage/

gavinandresen added a commit that referenced this pull request Feb 22, 2013

Merge pull request #2310 from sipa/progressbar
Progressbar based on time-based estimation of transactions.

@gavinandresen gavinandresen merged commit 9dca719 into bitcoin:master Feb 22, 2013

@sipa sipa deleted the sipa:progressbar branch May 3, 2013

laudney pushed a commit to reddcoin-project/reddcoin that referenced this pull request Mar 19, 2014

Merge pull request #2310 from sipa/progressbar
Progressbar based on time-based estimation of transactions.

This function is returning 0.999999 or 1.000001 when it should be 1.000000. I know this is just an estimative, but the full progress isn't a special case? Related issue: #3963

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment