Autoprune #4701

rdponticelli · 2014-08-14T20:09:54Z

This pull implements a new mode of operation which automatically removes old block files trying to maintain at most a maximum amount of disk space used by the node. This amount is configured by the user with the -prune switch.

There's also a lightweight sanity check which executes periodically during runtime to make sure the minimum block files required for the node to be operative are present.

This should allow to lower the amount of resources needed to run a node.

See the individual commits, about all the changes introduced.

gmaxwell · 2014-08-14T22:06:57Z

Very cool. I went to go make some random suggestions but found you implemented them already. I'll give this more review soon.

sipa · 2014-08-23T11:28:20Z

src/init.cpp

@@ -225,6 +225,7 @@ std::string HelpMessage(HelpMessageMode mode)
    strUsage += "  -maxorphanblocks=<n>   " + strprintf(_("Keep at most <n> unconnectable blocks in memory (default: %u)"), DEFAULT_MAX_ORPHAN_BLOCKS) + "\n";
    strUsage += "  -par=<n>               " + strprintf(_("Set the number of script verification threads (%u to %d, 0 = auto, <0 = leave that many cores free, default: %d)"), -(int)boost::thread::hardware_concurrency(), MAX_SCRIPTCHECK_THREADS, DEFAULT_SCRIPTCHECK_THREADS) + "\n";
    strUsage += "  -pid=<file>            " + _("Specify pid file (default: bitcoind.pid)") + "\n";
+    strUsage += "  -pruned                " + _("Run in a pruned state") + "\n";


Nit: "Prune old blocks" may be an easier explanation for the flag.

gmaxwell · 2014-08-24T00:30:01Z

luke: wrt pruning depth, probably what would be good eventually is a size target and then the software can make use of the size target usefully... but I don't know that it makes sense at this point since we don't yet have a good way to make use of a sparse blockchain.

luke-jr · 2014-08-24T01:00:24Z

We have to use it for reorgs. Setting a default prune depth is probably dangerous enough to becoming an (inconsistent) consensus rule already.

gmaxwell · 2014-08-24T01:12:05Z

By sparse I mean containing any blocks other than the last N. If you'll note, the number 288 above comes from my comments on the prior PR as a minimum number I'd consider acceptable as an absolute minimum for the purpose of reorgs.

sipa · 2014-08-26T14:19:55Z

src/main.cpp

@@ -2881,11 +2884,17 @@ bool CheckDiskSpace(uint64_t nAdditionalBytes)
    return true;
 }

+boost::filesystem::path GetBlockFilePath(const CDiskBlockPos &pos, const char *prefix)
+{
+    boost::filesystem::path path = GetDataDir() / "blocks" / strprintf("%s%05u.dat", prefix, pos.nFile);


Nit: no need for the intermediate variable.

sipa · 2014-08-26T23:39:25Z

When a block is being disconnected due to a reorg, and its data cannot be loaded from disk, there is currently just a state.Abort with "Failed to read block". Exceedingly unlikely, but we need to be able to deal with such situations. I wonder whether crashing with some extra help/debug output may be enough, or whether we need to retry downloading the missing data...

EDIT: Downloading the missing data might work for block data, but not for undo data, so it will be unlikely to be useful.

gmaxwell · 2014-08-26T23:49:47Z

I'd like to re-download, and thought that would be interesting to explore with headers first in place— but the problem is that if the undo data is deleted we cannot usefully redownload.

Edit: ah, you noticed that. Yea, well— we could have different retention policies for undo date. I considered that future work. If ever we make the undo data normative we could just store hashes of it and fetch it from peers too.

sipa · 2014-08-26T23:53:43Z

Of course, we could for example delete blocks data at depth N, but only delete undo data at depth N_3 or so (undo data is 7-10 times smaller than block data). Of course, that just moves the problem further to what to do when an N_3 deep reorg is encountered.

sipa · 2014-08-27T00:49:31Z

Untested ACK. I guess I'm fine with resolving the missing-block/undo problem for reorgs later.

These are the main functional changes on this state: * Do not allow running with a wallet or txindex. * Check for data at startup is mandatory only up to the last 288 blocks. * NODE_NETWORK flag is unset. * Requests for pruned blocks from other peers is answered with "notfound" and they are disconnected, not to stall their IBD.

This mode introduces a configuration parameter to keep block files at less than a fixed amount of MiB.

We can do it now that the logic to avoid opening the files several times has been moved to their own functions and is handled mainly through variables.

rdponticelli · 2015-01-12T19:03:38Z

Rebased.

theuni · 2015-01-12T21:19:27Z

src/main.cpp

-                            BOOST_FOREACH(PairType& pair, merkleBlock.vMatchedTxn)
-                                if (!pfrom->setInventoryKnown.count(CInv(MSG_TX, pair.second)))
-                                    pfrom->PushMessage("tx", block.vtx[pair.first]);
+                    if (!ReadBlockFromDisk(block, (*mi).second)) {


It seems this doesn't make the distinction between missing a pruned block and a failed read. If a non-pruned block fails to read when pruning is enabled, shouldn't we fail as before?

Alternatively.. couldn't the pruning check happen before ReadBlockFromDisk(), to avoid the overhead entirely for pruned nodes? If we're comfortable with randomly answering with a notfound, why not do it constantly?

There is a distinction in the expectation of what a node does. If you enable pruning, the node does not promise to the network to behave as a full node, so it's fine to not answer. If a node advertizes as NODE_NETWORK, and can't answer a request for a block, it's buggy.

shivaenigma · 2015-01-29T12:47:21Z

Testing this from https://github.com/luke-jr/bitcoin/tree/0.10.0rc3.autoprune
Just curious on what is the desired behaviour of -reindex when run on already pruned blocks ?

Michagogo · 2015-01-29T13:48:00Z

I would assume reindexing would force it to redownload all the blocks from
scratch.

On Thursday, January 29, 2015, shivaenigma notifications@github.com wrote:

Testing this from
https://github.com/luke-jr/bitcoin/tree/0.10.0rc3.autoprune
Just curious on what is the desired behaviour of -reindex when run on
already pruned blocks ?

—
Reply to this email directly or view it on GitHub
#4701 (comment).

mrbandrews · 2015-01-29T19:59:36Z

Hi. I've been testing this also (building from source) and I think the latest commit may have re-introduced the issue of re-opening a block and undo file for each block in the active chain. Thus, on testnet (about 320k blocks) each call to CheckBlockFiles results in 640k calls to the file system. I know that @rdponticelli has a separate PR (#4515) which appears to still have the code which should prevent this (using setRequiredDataFilesAreOpenable) - and which autoprune may eventually be built on top of.

It seems from the comment on the last commit that it was intended that this check was moved into a different function, but if so, it doesn't seem to be working as intended?

ghost · 2015-02-01T19:35:52Z

This has been tagged as v0.11. What time frame is that indicative of?

gmaxwell · 2015-02-01T20:46:35Z

@21E14 presumably and hopefully in the next couple months. Right now much attention is focused on getting 0.10 out (as it should be), after that you should expect to see more attention on getting this merged from the rest of the contributors.

laanwj · 2015-02-01T20:56:59Z

@21E14 July 2015 is the time frame for 0.11. The tag is no guarantee that it will make it into that release though, but just a reminder. If it isn't ready to merge well before 0.11's release date it will be bumped to 0.12.

You can help by testing and reviewing the code.

ghost · 2015-02-02T00:29:00Z

@gmaxwell @laanwj I'm assuming a few minor releases in-between?

This PR is looking pretty good so far. Running the daemon though, just for kicks, with the prune option set to less than 300 MiB results in the following awkward message:

AppInit2 : parameter interaction: -prune -> setting -disablewallet=1
Autoprune configured below the minimum of 300MiB. Setting at the maximum possible of 17592186044415MiB, to avoid pruning too much. Please, check your configuration.

More to the point, why even let a 'misconfigured prune' carry on?

shivaenigma · 2015-02-04T17:40:47Z

Did more testing from https://github.com/luke-jr/bitcoin/tree/0.10.0rc3.autoprune
Tested using -prune=300 and blocks are getting deleted

But sometimes size of .bitcoin/blocks is more than much more than 300
2015-02-04 17:34:59 Data for blocks from 1 to 184525 has been pruned 2015-02-04 17:34:59 Undo data for blocks from 1 to 184525 has been pruned
du -sh ~/.bitcoin/blocks/ is 496MB

Switching from pruned mode to nonpruned mode causes this:
Error checking required block files. There must be missing or unreadable data. Do you want to rebuild the block database now?

Michagogo · 2015-02-04T18:29:49Z

@shivaenigma What's the size of the index/ directory? Perhaps that's the rest. I don't know if the index shrinks with a pruned node. A couple hundred megabytes is insignificant compared to 30+ GB, but indeed, with pruned nodes like this that does become a factor. And at the end, do you mean switching to non-pruned mode? In that case, yes, of course you're missing data -- you've just deleted most of the blockchain!

shivaenigma · 2015-02-04T18:53:59Z

@Michagogo
size of my blocks/index/ 34MB. I think the parameter -pruned=300MB is misleading, even if after pruning the size end of 496MB . How does the error scale, so if I set 2GB will it take 2.1GB or 3GB

so I guess its checking all the blocks at startup on non pruned mode and throws an error. I think there should be way to disable this check. Because now I can never switch from pruned mode to nonpruned mode even if I dont care about missing inital blocks

Michagogo · 2015-02-04T19:06:14Z

Uh, what? By definition, non-pruned means that you have the entire
blockchain. So yes, you do need to redownload it. It would be nice, though,
if it were smart enough to make the switch gracefully and just fill in the
history, with the headers-first mechanism. If you mean you don't want to
prune more, you can just set the threshold to something high (100000000
or whatever) and you'll be covered for the foreseeable future.

On Wed, Feb 4, 2015 at 8:54 PM, shivaenigma notifications@github.com
wrote:

@Michagogo https://github.com/Michagogo
size of my blocks/index/ 34MB. I think the parameter -pruned=300MB is
misleading, even if after pruning the size end of 496MB . How does the
error scale, so if I set 2GB will it take 2.1GB or 3GB

so I guess its checking all the blocks at startup on non pruned mode and
throws an error. I think there should be way to disable this check. Because
now I can never switch from pruned mode to nonpruned mode even if I dont
care about missing inital blocks

—
Reply to this email directly or view it on GitHub
#4701 (comment).

shivaenigma · 2015-02-05T06:07:52Z

If you mean you don't want to prune more, you can just set the threshold to something high (100000000 or whatever) and you'll be covered for the foreseeable future

Yes this is actually what I wanted . Thanks

sipa · 2015-02-16T20:45:02Z

src/init.cpp

+            LogPrintf("Autoprune configured to use less than %uMiB on disk for block files.\n", nPrune / 1024 / 1024);
+        else {
+            nPrune = ~0;
+            LogPrintf("Autoprune configured below the minimum of %uMiB. Setting at the maximum possible of %uMiB, to avoid pruning too much. Please, check your configuration.\n", MIN_BLOCK_FILES_SIZE / 1024 / 1024, nPrune / 1024 / 1024);


I think it's more clear if you just say something like "Leaving pruned mode enabled, but not deleting any more blocks for now".

sipa · 2015-03-01T12:21:07Z

Do you plan to work on this any more in the future? If not, I may try to maintain/update it.

laanwj · 2015-03-09T08:31:16Z

Closing in favor of #5863

laanwj added the Feature label Aug 18, 2014

sipa reviewed Aug 23, 2014
View reviewed changes

rdponticelli force-pushed the autoprune branch 3 times, most recently from 9b08385 to 8ba326b Compare August 26, 2014 13:45

sipa reviewed Aug 26, 2014
View reviewed changes

rdponticelli force-pushed the autoprune branch 2 times, most recently from 8e778c8 to c4d4ed9 Compare August 26, 2014 23:21

rdponticelli force-pushed the autoprune branch from c4d4ed9 to 8065cc4 Compare August 27, 2014 18:28

sipa mentioned this pull request Aug 27, 2014

Add a switch to allow running in a pruned state #4481

Closed

rdponticelli force-pushed the autoprune branch 2 times, most recently from 597ee8e to 5c93fb5 Compare September 2, 2014 01:47

rdponticelli added 5 commits January 12, 2015 13:44

Accept pruned blocks when loading the database.

1339b7b

Add a counter to log the range of pruned blocks.

41c1569

Implement autoprune.

c12d6e5

This mode introduces a configuration parameter to keep block files at less than a fixed amount of MiB.

Simplify CheckBlockFiles routine.

bbb769c

We can do it now that the logic to avoid opening the files several times has been moved to their own functions and is handled mainly through variables.

rdponticelli force-pushed the autoprune branch from f53d60e to bbb769c Compare January 12, 2015 17:58

theuni reviewed Jan 12, 2015
View reviewed changes

sipa reviewed Feb 16, 2015
View reviewed changes

sdaftuar mentioned this pull request Mar 6, 2015

Add autoprune functionality #5863

Merged

laanwj closed this Mar 9, 2015

bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Autoprune #4701

Autoprune #4701

rdponticelli commented Aug 14, 2014

gmaxwell commented Aug 14, 2014

sipa Aug 23, 2014

gmaxwell commented Aug 24, 2014

luke-jr commented Aug 24, 2014

gmaxwell commented Aug 24, 2014

sipa Aug 26, 2014

sipa commented Aug 26, 2014

gmaxwell commented Aug 26, 2014

sipa commented Aug 26, 2014

sipa commented Aug 27, 2014

rdponticelli commented Jan 12, 2015

theuni Jan 12, 2015

sipa Feb 16, 2015

shivaenigma commented Jan 29, 2015

Michagogo commented Jan 29, 2015

mrbandrews commented Jan 29, 2015

ghost commented Feb 1, 2015

gmaxwell commented Feb 1, 2015

laanwj commented Feb 1, 2015

ghost commented Feb 2, 2015

shivaenigma commented Feb 4, 2015

Michagogo commented Feb 4, 2015

shivaenigma commented Feb 4, 2015

Michagogo commented Feb 4, 2015

shivaenigma commented Feb 5, 2015

sipa Feb 16, 2015

sipa commented Mar 1, 2015

laanwj commented Mar 9, 2015

Autoprune #4701

Autoprune #4701

Conversation

rdponticelli commented Aug 14, 2014

gmaxwell commented Aug 14, 2014

sipa Aug 23, 2014

Choose a reason for hiding this comment

gmaxwell commented Aug 24, 2014

luke-jr commented Aug 24, 2014

gmaxwell commented Aug 24, 2014

sipa Aug 26, 2014

Choose a reason for hiding this comment

sipa commented Aug 26, 2014

gmaxwell commented Aug 26, 2014

sipa commented Aug 26, 2014

sipa commented Aug 27, 2014

rdponticelli commented Jan 12, 2015

theuni Jan 12, 2015

Choose a reason for hiding this comment

sipa Feb 16, 2015

Choose a reason for hiding this comment

shivaenigma commented Jan 29, 2015

Michagogo commented Jan 29, 2015

mrbandrews commented Jan 29, 2015

ghost commented Feb 1, 2015

gmaxwell commented Feb 1, 2015

laanwj commented Feb 1, 2015

ghost commented Feb 2, 2015

shivaenigma commented Feb 4, 2015

Michagogo commented Feb 4, 2015

shivaenigma commented Feb 4, 2015

Michagogo commented Feb 4, 2015

shivaenigma commented Feb 5, 2015

sipa Feb 16, 2015

Choose a reason for hiding this comment

sipa commented Mar 1, 2015

laanwj commented Mar 9, 2015