blockstorage: Adjust fastprune limit if block exceeds blockfile size #27191
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews: See the guideline for information on the review process. If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts: Reviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.
concept ACK
I think the assert is the appropriate move here to prevent the infinite memory loop! We might as well just crash early. It might also make sense to not allow -fastprune on anything besides regtest. I'm not sure if that is covered by ArgsManager::DEBUG_ONLY.
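For context: ArgsManager::DEBUG_ONLY only hides an option from -help output (it still shows with -help-debug); it does not restrict which networks the option can be used on. Below is a standalone sketch of the kind of explicit init-time check that would be needed; this is a hypothetical illustration, not Bitcoin Core code, and the function name is made up.

    // Hypothetical sketch, not Bitcoin Core code: DEBUG_ONLY only affects
    // help visibility, so limiting -fastprune to regtest would need an
    // explicit check along these lines at startup.
    #include <cstdio>
    #include <string>

    bool check_fastprune_allowed(bool fastprune_set, const std::string& chain)
    {
        if (fastprune_set && chain != "regtest") {
            std::fprintf(stderr, "Error: -fastprune is only supported on regtest\n");
            return false;
        }
        return true;
    }

    int main()
    {
        // Passes on regtest; fails on any other chain, e.g. signet.
        return check_fastprune_allowed(true, "signet") ? 0 : 1;
    }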
src/node/blockstorage.cpp
Outdated
      bool finalize_undo = false;
      if (!fKnown) {
    -     while (m_blockfile_info[nFile].nSize + nAddSize >= (gArgs.GetBoolArg("-fastprune", false) ? 0x10000 /* 64kb */ : MAX_BLOCKFILE_SIZE)) {
    +     const unsigned int max_blockfile_size{gArgs.GetBoolArg("-fastprune", false) ? 0x10000 : MAX_BLOCKFILE_SIZE};
This cleanup is already addressed in #27039 ;-)
Good, that should make rebasing easy! Here it's not a cleanup, but needed in order to use max_blockfile_size in the added assert.
Force-pushed fe6d866 to d526dbb
Not sure if suppressing options for specific networks is something we usually do (are there any other examples of this?). But I added some more explanation to the -fastprune help text.
Not sure how I feel about this change. It is trivial to create blocks exceeding this limit on regtest. If this happens, it should at least log a message explaining why it failed. The size limit constants used by fastprune seem arbitrary to me. What is their rationale, just fewer resources allocated/used while running the pruning functional tests? If so, why are the revision files not size-restricted as well?
I'll add a log message with my next push.
It was introduced in c286a22 (#15946) without much discussion. I'd say it's a quick-and-dirty way to allow testing the pruning code in functional tests. I stumbled upon this while reviewing #27125, when I had the great idea to quickly sync on signet with -fastprune.
Force-pushed d526dbb to e930814
I changed the approach to @TheCharlatan's idea (added you as coauthor): now we don't assert anymore but raise the size of the blockfile to the size of the added block if needed.
ACK e930814 fwiw :P
ACK e930814
I also wrote a test for this that freezes (infinite loop) on master but passes on this branch. Take it or leave it, I just wanted to see the mechanism work before acking:
ACK e930814fdb9261f180d5c436cefe52e9cf38fd67
Code review ACK e930814, but I think it would be great to add the test written by @pinheadmz #27191 (review) because the blockmanager code is fragile and a regression could happen, and because the test setup is very clean so it could probably be used to branch out and test other things.
    if (gArgs.GetBoolArg("-fastprune", false)) {
        max_blockfile_size = 0x10000; // 64kiB
        if (nAddSize >= max_blockfile_size) {
            max_blockfile_size = nAddSize + 1;
In commit "blockstorage: Adjust fastprune limit if block exceeds blockfile size" (e930814)
What is the +1 for? Would be helpful to add a code comment saying what the extra byte is for.
I checked this carefully again and no longer think that the + 1 is necessary.
I think it is necessary: as long as the while condition says

    while (m_blockfile_info[nFile].nSize + nAddSize >= max_blockfile_size)

it won't terminate if nAddSize == max_blockfile_size, because m_blockfile_info[nFile].nSize will be 0 if we skip to the next block.
Maybe we could lose the + 1 if we changed the >= in the while loop into a >; however, I really don't want to change this critical code in this PR, because it would mean we'd try to squeeze another byte into all blockfiles, affecting blockstorage in normal operation as well, not just the test-only -fastprune mode.
So I'll add a comment.
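A quick standalone check of that termination argument (a sketch with made-up values; only the condition mirrors the quoted while loop):

    // Standalone check of the +1 reasoning above, not the real FindBlockPos.
    #include <cassert>

    int main()
    {
        const unsigned int nAddSize = 0x10000; // block of exactly 64 KiB
        const unsigned int nSize = 0;          // a fresh blockfile is empty

        // Old fastprune limit: 0 + 0x10000 >= 0x10000 holds for every fresh
        // file, so the loop would advance to a new file forever.
        assert(nSize + nAddSize >= 0x10000);

        // Adjusted limit nAddSize + 1: 0 + 0x10000 >= 0x10001 is false, so
        // the loop terminates on the first fresh file.
        const unsigned int max_blockfile_size = nAddSize + 1;
        assert(!(nSize + nAddSize >= max_blockfile_size));
    }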
Makes sense the +1 is needed to be consistent with >= below. It does look like there is a minor bug here that will cause block files to be at most MAX_BLOCKFILE_SIZE-1 bytes and never reach the maximum size. But if this is a bug, it goes back to 5382bcf when this code was introduced. Agree it would not be good to change that behavior here.
The functional test covers the +1 nicely as well. Without the +1 (and with the assertion on L632 removed), the test hangs until it times out at 30s.
            max_blockfile_size = nAddSize + 1;
        }
    }
    while (m_blockfile_info[nFile].nSize + nAddSize >= max_blockfile_size) {
In commit "blockstorage: Adjust fastprune limit if block exceeds blockfile size" (e930814)
Would be good to add assert(nAddSize <= max_blockfile_size); here to make it clear what this function is assuming, and to catch the problem instead of going into an infinite loop if outside code changes.
done, although I think we should assert nAddSize < max_blockfile_size; see the explanation above.
I think throwing an exception or calling abort() would be ok, but better to reserve the assert macro for conditions which are actually impossible and not use it for error handling, because that would water down the meaning of other asserts.
Force-pushed e930814 to 19e9073
Thanks for the reviews! I pushed an update to address the outstanding comments and added the test by @pinheadmz (thanks!).
Force-pushed 19e9073 to af16fc9
unrelated: Red CI can be ignored, or if you care a lot, you can rebase
If the added block exceeds the blockfile size in test-only -fastprune mode, the node would get stuck in an infinite loop and run out of memory. Avoid this by raising the blockfile size to the size of the added block in this situation.

Co-authored-by: TheCharlatan <seb.kung@gmail.com>
Force-pushed af16fc9 to 8f14fc8
Code review ACK 8f14fc8. Added new assert, test, and comment since last review
ACK 8f14fc8
ACK 8f14fc8
ACK 8f14fc86225d8fe77353f61ebd6b0bdda8d13aa9
    height = self.nodes[0].getblockcount() + 1
    block = create_block(hashprev=tip, ntime=time, txlist=[tx], coinbase=create_coinbase(height=height))
    add_witness_commitment(block)
    block.solve()
Any reason to make this simple test more complicated than it needs to be? See #27553
Merge: blockstorage: Adjust fastprune limit if block exceeds blockfile size

8f14fc8 test: cover fastprune with excessive block size (Matthew Zipkin)
271c23e blockstorage: Adjust fastprune limit if block exceeds blockfile size (Martin Zumsande)

Pull request description: The debug-only -fastprune option used in several tests is not always safe to use: if a -fastprune node receives a block larger than the maximum blockfile size of 64kb, bad things happen: the while loop in BlockManager::FindBlockPos never terminates, and the node runs oom because memory for m_blockfile_info is allocated in each iteration of the loop. The same would happen if a naive user used -fastprune on anything other than regtest (so this can be tested by syncing on signet, for example; the first block that crashes the node is at height 2232). Change the approach by raising the blockfile size to the size of the block, if that block otherwise wouldn't fit (idea by TheCharlatan).

ACKs for top commit:
ryanofsky: Code review ACK 8f14fc8. Added new assert, test, and comment since last review
TheCharlatan: ACK 8f14fc8
pinheadmz: ACK 8f14fc8

Tree-SHA512: df2fea30613ef9d40ebbc2416eacb574f6d7d96847db5c33dda22a29a2c61a8db831aa9552734ea4477e097f253dbcb6dcb1395d43d2a090cc0588c9ce66eac3
Summary:
If the added block exceeds the blockfile size in test-only -fastprune mode, the node would get stuck in an infinite loop and run out of memory. Avoid this by raising the blockfile size to the size of the added block in this situation.

Co-authored-by: TheCharlatan <seb.kung@gmail.com>

This is a backport of [[bitcoin/bitcoin#27191 | core#27191]]. Depends on D16028.

Note that this fix does not address the non-test issue of blocks larger than 128 MiB. But at least now we get a meaningful assertion failure rather than an OOM crash.

Test Plan:
ninja all check-all
Check that without the change in blockstorage.cpp the new test slowly consumes all the memory and the test eventually fails.

Reviewers: #bitcoin_abc, Fabien
Reviewed By: #bitcoin_abc, Fabien
Differential Revision: https://reviews.bitcoinabc.org/D16023
The debug-only -fastprune option used in several tests is not always safe to use: if a -fastprune node receives a block larger than the maximum blockfile size of 64kb, bad things happen: the while loop in BlockManager::FindBlockPos never terminates, and the node runs oom because memory for m_blockfile_info is allocated in each iteration of the loop.

The same would happen if a naive user used -fastprune on anything other than regtest (so this can be tested by syncing on signet, for example; the first block that crashes the node is at height 2232).

Change the approach by raising the blockfile size to the size of the block, if that block otherwise wouldn't fit (idea by TheCharlatan).
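To make the failure mode and the fix concrete, here is a self-contained model of the loop. This is a sketch under stated assumptions: find_block_pos and blockfile_sizes are invented names, and only the loop shape and the limit adjustment mirror the quoted BlockManager::FindBlockPos logic.

    // Self-contained model of the bug and the fix, NOT the actual
    // blockstorage.cpp code.
    #include <cassert>
    #include <cstdio>
    #include <vector>

    constexpr unsigned int FASTPRUNE_FILE_SIZE = 0x10000; // 64 KiB

    // Returns the index of the first file that can hold add_size more bytes,
    // growing the vector of per-file sizes as needed.
    unsigned int find_block_pos(std::vector<unsigned int>& blockfile_sizes,
                                unsigned int add_size, bool fastprune)
    {
        if (blockfile_sizes.empty()) blockfile_sizes.push_back(0);
        unsigned int max_blockfile_size = 0x8000000; // 128 MiB, as on mainnet
        if (fastprune) {
            max_blockfile_size = FASTPRUNE_FILE_SIZE;
            // The fix from this PR: if the block alone exceeds the fastprune
            // limit, raise the limit so the block fits. +1 because the loop
            // condition below uses >=.
            if (add_size >= max_blockfile_size) {
                max_blockfile_size = add_size + 1;
            }
        }
        assert(add_size < max_blockfile_size); // assert added in this PR
        unsigned int file = 0;
        // Without the adjustment above, a block with add_size >= 64 KiB makes
        // this condition true for every fresh (size 0) file: the loop never
        // terminates and keeps allocating per-file entries (the OOM on master).
        while (blockfile_sizes[file] + add_size >= max_blockfile_size) {
            ++file;
            if (file == blockfile_sizes.size()) blockfile_sizes.push_back(0);
        }
        blockfile_sizes[file] += add_size;
        return file;
    }

    int main()
    {
        std::vector<unsigned int> sizes{0};
        // A 100 KiB block exceeds the 64 KiB fastprune limit; with the
        // adjustment it still lands in a (temporarily larger) file.
        unsigned int file = find_block_pos(sizes, 100 * 1024, /*fastprune=*/true);
        std::printf("stored in file %u, file size %u\n", file, sizes[file]);
    }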