
Autoprune #4701

Closed
wants to merge 9 commits into from

Conversation

@rdponticelli
Contributor

This pull request implements a new mode of operation which automatically removes old block files, trying to keep the disk space used by the node's block data under a configured maximum. The amount is set by the user with the -prune switch.

There's also a lightweight sanity check which runs periodically during runtime to make sure the minimum block files required for the node to be operative are present.

This should lower the amount of resources needed to run a node.

See the individual commits for all the changes introduced.
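As a rough sketch of the mechanism described above (all names and parameters here are illustrative, not the PR's actual code), the pruning pass amounts to deleting the oldest block/undo file pairs while total usage exceeds the target, never touching a recent window needed for reorgs:

#include <boost/filesystem.hpp>
#include <cstdint>
#include <cstdio>

// Minimal sketch only: walk the block files from the oldest upward and delete
// blk/rev pairs until total usage drops under the prune target, always keeping
// the most recent nKeepRecent files.
void PruneBlockFiles(const boost::filesystem::path& blocksDir,
                     uint64_t nPruneTarget, int nLastBlockFile, int nKeepRecent)
{
    namespace fs = boost::filesystem;

    // Total size of everything currently under blocks/.
    uint64_t nTotalSize = 0;
    for (fs::directory_iterator it(blocksDir); it != fs::directory_iterator(); ++it)
        if (fs::is_regular_file(it->status()))
            nTotalSize += fs::file_size(it->path());

    // Delete the oldest block/undo file pairs first; never touch the recent window.
    for (int nFile = 0; nFile <= nLastBlockFile - nKeepRecent &&
                        nTotalSize > nPruneTarget; ++nFile) {
        for (const char* prefix : {"blk", "rev"}) {
            char name[16];
            std::snprintf(name, sizeof(name), "%s%05u.dat", prefix, (unsigned)nFile);
            fs::path p = blocksDir / name;
            if (fs::exists(p)) {
                nTotalSize -= fs::file_size(p);
                fs::remove(p);
            }
        }
    }
}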

@gmaxwell
Contributor

Very cool. I went to make some suggestions but found you had implemented them already. I'll give this more review soon.

@laanwj laanwj added the Feature label Aug 18, 2014
@@ -225,6 +225,7 @@ std::string HelpMessage(HelpMessageMode mode)
strUsage += " -maxorphanblocks=<n> " + strprintf(_("Keep at most <n> unconnectable blocks in memory (default: %u)"), DEFAULT_MAX_ORPHAN_BLOCKS) + "\n";
strUsage += " -par=<n> " + strprintf(_("Set the number of script verification threads (%u to %d, 0 = auto, <0 = leave that many cores free, default: %d)"), -(int)boost::thread::hardware_concurrency(), MAX_SCRIPTCHECK_THREADS, DEFAULT_SCRIPTCHECK_THREADS) + "\n";
strUsage += " -pid=<file> " + _("Specify pid file (default: bitcoind.pid)") + "\n";
strUsage += " -pruned " + _("Run in a pruned state") + "\n";
Member

Nit: "Prune old blocks" may be an easier explanation for the flag.

@gmaxwell
Contributor

luke: wrt pruning depth, what would probably be good eventually is a size target which the software can then make use of meaningfully... but I don't know that it makes sense at this point, since we don't yet have a good way to make use of a sparse blockchain.

@luke-jr
Member

luke-jr commented Aug 24, 2014

We have to use it for reorgs. Setting a default prune depth is probably dangerous enough to become an (inconsistent) consensus rule already.

@gmaxwell
Contributor

By sparse I mean containing any blocks other than the last N. If you'll note, the number 288 above comes from my comments on the prior PR as a minimum number I'd consider acceptable as an absolute minimum for the purpose of reorgs.
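(For context on the number: at an average of one block per ten minutes, about 144 blocks are produced per day, so 288 = 2 × 144 corresponds to roughly two days of chain history.)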

@rdponticelli rdponticelli force-pushed the autoprune branch 3 times, most recently from 9b08385 to 8ba326b Compare August 26, 2014 13:45
@@ -2881,11 +2884,17 @@ bool CheckDiskSpace(uint64_t nAdditionalBytes)
return true;
}

boost::filesystem::path GetBlockFilePath(const CDiskBlockPos &pos, const char *prefix)
{
boost::filesystem::path path = GetDataDir() / "blocks" / strprintf("%s%05u.dat", prefix, pos.nFile);
Member

Nit: no need for the intermediate variable.
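Presumably along these lines (a sketch of the suggested change, returning the expression directly):

boost::filesystem::path GetBlockFilePath(const CDiskBlockPos &pos, const char *prefix)
{
    return GetDataDir() / "blocks" / strprintf("%s%05u.dat", prefix, pos.nFile);
}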

@rdponticelli rdponticelli force-pushed the autoprune branch 2 times, most recently from 8e778c8 to c4d4ed9 Compare August 26, 2014 23:21
@sipa
Member

sipa commented Aug 26, 2014

When a block is being disconnected due to a reorg, and its data cannot be loaded from disk, there is currently just a state.Abort with "Failed to read block". Exceedingly unlikely, but we need to be able to deal with such situations. I wonder whether crashing with some extra help/debug output may be enough, or whether we need to retry downloading the missing data...

EDIT: Downloading the missing data might work for block data, but not for undo data, so it is unlikely to be useful.

@gmaxwell
Contributor

I'd like to re-download, and thought that would be interesting to explore with headers-first in place, but the problem is that if the undo data is deleted we cannot usefully redownload.

Edit: ah, you noticed that. Yeah, well, we could have different retention policies for undo data. I considered that future work. If we ever make the undo data normative we could just store hashes of it and fetch it from peers too.

@sipa
Member

sipa commented Aug 26, 2014

Of course, we could for example delete block data at depth N, but only delete undo data at depth N*3 or so (undo data is 7-10 times smaller than block data). That just moves the problem further, though, to what to do when an N*3-deep reorg is encountered.
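A minimal sketch of that tiered retention policy, with illustrative names and an example N:

// Sketch of the tiered idea above: prune block data at depth N, but keep the
// (much smaller) undo data down to depth N*3. Names and N are illustrative.
const int N = 288;  // example block-data retention depth

bool ShouldPruneBlockData(int nHeight, int nTipHeight) {
    return nTipHeight - nHeight > N;
}

bool ShouldPruneUndoData(int nHeight, int nTipHeight) {
    return nTipHeight - nHeight > N * 3;
}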

@sipa
Member

sipa commented Aug 27, 2014

Untested ACK. I guess I'm fine with resolving the missing-block/undo problem for reorgs later.

These are the main functional changes in this state:

* Do not allow running with a wallet or txindex.
* The startup check for block data is mandatory only for the last 288 blocks.
* The NODE_NETWORK flag is unset.
* Requests for pruned blocks from other peers are answered with "notfound", and the peers are disconnected so as not to stall their IBD.
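For the NODE_NETWORK item above, the interaction is presumably along these lines (a self-contained sketch; the fPruned flag and function name are illustrative):

#include <cstdint>

// A pruned node must not advertise NODE_NETWORK, since it cannot serve
// arbitrary historical blocks. (NODE_NETWORK = 1 as in the Bitcoin P2P protocol.)
static const uint64_t NODE_NETWORK = 1;

uint64_t GetLocalServices(bool fPruned)
{
    uint64_t nServices = NODE_NETWORK;
    if (fPruned)
        nServices &= ~NODE_NETWORK;  // unset when running with pruning enabled
    return nServices;
}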
This mode introduces a configuration parameter to keep block files under a fixed amount of MiB.
We can do this now that the logic to avoid opening the files several times has been moved into its own functions and is handled mainly through variables.
@rdponticelli
Contributor Author

Rebased.

BOOST_FOREACH(PairType& pair, merkleBlock.vMatchedTxn)
if (!pfrom->setInventoryKnown.count(CInv(MSG_TX, pair.second)))
pfrom->PushMessage("tx", block.vtx[pair.first]);
if (!ReadBlockFromDisk(block, (*mi).second)) {
Member

It seems this doesn't make the distinction between missing a pruned block and a failed read. If a non-pruned block fails to read when pruning is enabled, shouldn't we fail as before?

Alternatively... couldn't the pruning check happen before ReadBlockFromDisk(), to avoid the overhead entirely for pruned nodes? If we're comfortable with randomly answering with a notfound, why not do it constantly?

Member

There is a distinction in the expectation of what a node does. If you enable pruning, the node does not promise the network that it behaves as a full node, so it's fine not to answer. If a node advertises as NODE_NETWORK and can't answer a request for a block, it's buggy.
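To make the first comment's distinction concrete, a hedged fragment in the style of the diff above (the nStatus/BLOCK_HAVE_DATA check is modeled on Bitcoin Core conventions; exact names here are assumptions, not the PR's code):

CBlockIndex* pindex = (*mi).second;
if (!(pindex->nStatus & BLOCK_HAVE_DATA)) {
    // Pruned block: a pruned node made no promise to serve it, so answer
    // "notfound" without attempting the disk read at all.
    vNotFound.push_back(inv);
} else if (!ReadBlockFromDisk(block, pindex)) {
    // Non-pruned block that fails to read: a genuine error; fail as before.
}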

@shivaenigma

Testing this from https://github.com/luke-jr/bitcoin/tree/0.10.0rc3.autoprune.
Just curious: what is the desired behaviour of -reindex when run on already-pruned blocks?

@Michagogo
Contributor

I would assume reindexing would force it to redownload all the blocks from scratch.

@mrbandrews
Contributor

Hi. I've been testing this as well (building from source) and I think the latest commit may have re-introduced the issue of re-opening a block and undo file for each block in the active chain. Thus, on testnet (about 320k blocks), each call to CheckBlockFiles results in 640k calls to the file system. I know that @rdponticelli has a separate PR (#4515) which appears to still have the code that should prevent this (using setRequiredDataFilesAreOpenable), and which autoprune may eventually be built on top of.

The comment on the last commit suggests this check was meant to be moved into a different function, but if so, it doesn't seem to be working as intended?
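The caching approach being referred to would presumably look something like this sketch (the set name is taken from PR #4515; the helper is hypothetical):

#include <set>

// Sketch: remember which required block files have already been verified as
// openable, so a CheckBlockFiles pass makes at most one filesystem probe per
// file instead of one per block in the active chain.
std::set<int> setRequiredDataFilesAreOpenable;  // name as used in PR #4515

bool IsBlockFileOpenable(int nFile)  // illustrative helper
{
    if (setRequiredDataFilesAreOpenable.count(nFile))
        return true;                  // verified earlier; no filesystem call
    if (!TryOpenBlockFile(nFile))     // hypothetical probe opening the blk/rev pair
        return false;
    setRequiredDataFilesAreOpenable.insert(nFile);
    return true;
}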

@ghost

ghost commented Feb 1, 2015

This has been tagged as v0.11. What time frame is that indicative of?

@gmaxwell
Contributor

gmaxwell commented Feb 1, 2015

@21E14 presumably, and hopefully, in the next couple of months. Right now much attention is focused on getting 0.10 out (as it should be); after that you should expect to see more attention from the rest of the contributors on getting this merged.

@laanwj
Member

laanwj commented Feb 1, 2015

@21E14 July 2015 is the time frame for 0.11. The tag is no guarantee that it will make it into that release, though; it's just a reminder. If it isn't ready to merge well before 0.11's release date it will be bumped to 0.12.

You can help by testing and reviewing the code.

@ghost

ghost commented Feb 2, 2015

@gmaxwell @laanwj I'm assuming a few minor releases in-between?

This PR is looking pretty good so far. Running the daemon, though, just for kicks, with the prune option set to less than 300 MiB results in the following awkward message:


AppInit2 : parameter interaction: -prune -> setting -disablewallet=1
Autoprune configured below the minimum of 300MiB. Setting at the maximum possible of 17592186044415MiB, to avoid pruning too much. Please, check your configuration.


More to the point, why even let a 'misconfigured prune' carry on?

@shivaenigma

Did more testing from https://github.com/luke-jr/bitcoin/tree/0.10.0rc3.autoprune.
Tested using -prune=300 and blocks are getting deleted.

But sometimes the size of .bitcoin/blocks is much more than 300:

2015-02-04 17:34:59 Data for blocks from 1 to 184525 has been pruned
2015-02-04 17:34:59 Undo data for blocks from 1 to 184525 has been pruned

du -sh ~/.bitcoin/blocks/ reports 496MB.

Switching from pruned mode to non-pruned mode causes this:
Error checking required block files. There must be missing or unreadable data. Do you want to rebuild the block database now?

@Michagogo
Contributor

@shivaenigma What's the size of the index/ directory? Perhaps that accounts for the rest; I don't know whether the index shrinks on a pruned node. A couple hundred megabytes is insignificant compared to 30+ GB, but indeed, with pruned nodes like this it does become a factor. And at the end, do you mean switching to non-pruned mode? In that case, yes, of course you're missing data -- you've just deleted most of the blockchain!

@shivaenigma

@Michagogo
The size of my blocks/index/ is 34MB. I think the parameter -prune=300 is misleading if, after pruning, the size ends up at 496MB. How does the error scale: if I set 2GB, will it take 2.1GB or 3GB?

So I guess it's checking all the blocks at startup in non-pruned mode and throwing an error. I think there should be a way to disable this check, because now I can never switch from pruned mode to non-pruned mode, even if I don't care about the missing initial blocks.

@Michagogo
Contributor

Uh, what? By definition, non-pruned means that you have the entire blockchain. So yes, you do need to redownload it. It would be nice, though, if it were smart enough to make the switch gracefully and just fill in the history with the headers-first mechanism. If you mean you don't want to prune more, you can just set the threshold to something high (100000000 or whatever) and you'll be covered for the foreseeable future.


@shivaenigma

If you mean you don't want to prune more, you can just set the threshold to something high (100000000 or whatever) and you'll be covered for the foreseeable future

Yes, this is actually what I wanted. Thanks.

LogPrintf("Autoprune configured to use less than %uMiB on disk for block files.\n", nPrune / 1024 / 1024);
else {
nPrune = ~0;
LogPrintf("Autoprune configured below the minimum of %uMiB. Setting at the maximum possible of %uMiB, to avoid pruning too much. Please, check your configuration.\n", MIN_BLOCK_FILES_SIZE / 1024 / 1024, nPrune / 1024 / 1024);
Member

I think it's more clear if you just say something like "Leaving pruned mode enabled, but not deleting any more blocks for now".
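Folding that suggested wording into the diff above might give something like this (a sketch, not the final code; constants and LogPrintf usage as in the diff):

if (nPrune < MIN_BLOCK_FILES_SIZE) {
    nPrune = ~0;
    LogPrintf("Prune target below the minimum of %uMiB. "
              "Leaving pruned mode enabled, but not deleting any more blocks for now.\n",
              MIN_BLOCK_FILES_SIZE / 1024 / 1024);
}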

@sipa
Member

sipa commented Mar 1, 2015

Do you plan to work on this any more in the future? If not, I may try to maintain/update it.

@laanwj
Member

laanwj commented Mar 9, 2015

Closing in favor of #5863

@laanwj laanwj closed this Mar 9, 2015
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021