New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rpc: Optimize serialization and enhance metadata of dumptxoutset output #29612
base: master
Are you sure you want to change the base?
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code CoverageFor detailed information about the code coverage, see the test coverage report. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the Possibly this is due to a silent merge conflict (the changes in this pull request being Leave a comment here, if you need help tracking down a confusing failure. |
714b3dd
to
8d67b35
Compare
8d67b35
to
b0cfbce
Compare
Well, I'm not on holiday :-) tACK b0cfbce I do suggest adding a comment to
I generated a snapshot and loaded it using the patch in #29551 (only waited for it to reach the tip, not a full background sync). For those testing who don't want to wait out the long flush, see #28358. |
33aa19d
to
97eb214
Compare
Thanks for the review and testing!
As it is worded, wouldn't this be more appropriate to add in I have now amended the comment a bit and added a version of it to both |
97eb214
to
923ec90
Compare
re-ACK 923ec90 That looks good. The reason I suggested (also) writing something in the header is because it's more likely to be noticed by some future dev looking through existing leveldb uses, seeing if they can replace it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, ACK 923ec90
Just some comment nits that could easily be done in a follow-up too.
Coin coin; | ||
unsigned int iter{0}; | ||
std::vector<std::pair<uint32_t, Coin>> coins; | ||
|
||
// To reduce space the serialization format of the snapshot avoids |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Might it be useful to add a small diagram of the format to the dumptxoutset
help output?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, will think about what that could look like if I have to retouch or in the follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent some time thinking about a diagram that would work but I wasn't very happy what I could come up with. I instead copied the expression used in the issue. It shows the format in a concise way that I find understandable: list(Txid, list((vout,Coin))
. Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also add the size of the coins list list(Txid, number_of_coins, list((vout,Coin)))
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add this if I have to retouch, but if not I can do this as a follow-up in one of my other assumeutxo PRs.
843b95e
to
71d41d5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-ACK 71d41d5
71d41d5
to
b411eb8
Compare
So far I have addressed the minor feedback in the latest push and I am now working on the extended metadata for the snapshot. I am currently planning to add the following:
Let me know if you disagree with this choice or have something else to add, otherwise I will have the code ready soon. |
🚧 At least one of the CI tasks failed. Make sure to run all tests locally, according to the Possibly this is due to a silent merge conflict (the changes in this pull request being Leave a comment here, if you need help tracking down a confusing failure. |
@fjahr sounds good. Let me know when that's ready and I'll bake a halving block snapshot... |
4aae939
to
cfc780f
Compare
Alright, ready for review again. I hope I have addressed all the ideas that were mentioned as intended. Sorry but the enhancement of the metadata is now one big commit for now, I can try to split it up if it makes review easier. |
cfc780f
to
6ed6bff
Compare
Doesn't matter that much, but it's probably to put the Here's some new snapshot torrents (not sure if I'm seeding them correctly...). Torrent for testnet: Torrent for signet: Torrent for mainnet at the halving block (I'll update #28553 to use this): |
src/kernel/chainparams.h
Outdated
@@ -183,4 +185,6 @@ class CChainParams | |||
ChainTxData chainTxData; | |||
}; | |||
|
|||
std::optional<std::string> GetNetworkForMagic(MessageStartChars& pchMessageStart); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
6ed6bff: this doesn't seem kernel-worthy. chaintype.h
is a better home for it, next to ChainTypeToString
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine where it is, I like keeping these single type utilities small.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK on the enhanced metadata, I think this includes now everything we need. Will update #27432 in a bit to the new format and test.
Co-authored-by: Aurèle Oulès <aurele@oules.com> Co-authored-by: TheCharlatan <seb.kung@gmail.com>
The following data is added: - A newly introduced utxo set magic - A version number - The network magic - The block height
6ed6bff
to
b2cf3f7
Compare
Rebased, resolved conflicts (the functional tests have changed a bit now), and addressed the open feedback. Thanks everyone for reviewing! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, just some small comments, will review again quickly once they are addressed.
const auto mainnet_msg = CChainParams::Main()->MessageStart(); | ||
const auto testnet_msg = CChainParams::TestNet()->MessageStart(); | ||
auto regtest_opts = CChainParams::RegTestOptions{}; | ||
const auto regtest_msg = CChainParams::RegTest(regtest_opts)->MessageStart(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit b2cf3f7:
Nit: Could just do const auto regtest_msg = CChainParams::RegTest({})->MessageStart();
here and for the signet_msg
.
@@ -14,6 +14,7 @@ | |||
#include <logging.h> | |||
#include <primitives/block.h> | |||
#include <primitives/transaction.h> | |||
#include <protocol.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit b2cf3f7:
Is this include from a prior iteration? It seems unused (also in the header file).
@@ -2712,7 +2712,7 @@ UniValue CreateUTXOSnapshot( | |||
auto write_coins_to_file = [&](AutoFile& afile, const Txid& last_hash, const std::vector<std::pair<uint32_t, Coin>>& coins, size_t& written_coins_count) { | |||
afile << last_hash; | |||
WriteCompactSize(afile, coins.size()); | |||
for (auto [n, coin] : coins) { | |||
for (const auto& [n, coin] : coins) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit b2cf3f7:
Nit: I think this should be moved to the first commit.
@@ -542,3 +543,33 @@ std::unique_ptr<const CChainParams> CChainParams::TestNet() | |||
{ | |||
return std::make_unique<const CTestNetParams>(); | |||
} | |||
|
|||
std::vector<int> CChainParams::GetAvailableSnapshotHeights() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit b2cf3f7
Nit (clang-format-diff): Open braces on new line (here and the other new functions and methods below).
std::array<uint8_t, SNAPSHOT_MAGIC_BYTES.size()> snapshot_magic; | ||
s >> snapshot_magic; | ||
if (snapshot_magic != SNAPSHOT_MAGIC_BYTES) { | ||
throw std::ios_base::failure("Invalid UTXO set snapshot magic bytes. Please check if this is indeed a snapshot file or if you are using an outdated snapshot format.", std::error_code{}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: seems like the second parameter (std::error_code{}
) isn't needed (here and in all other instances below)
throw std::ios_base::failure("Invalid UTXO set snapshot magic bytes. Please check if this is indeed a snapshot file or if you are using an outdated snapshot format.", std::error_code{}); | |
throw std::ios_base::failure("Invalid UTXO set snapshot magic bytes. Please check if this is indeed a snapshot file or if you are using an outdated snapshot format."); |
std::ostringstream oss; | ||
for (auto it = available_heights.begin(); it != available_heights.end(); ++it) { | ||
oss << (it != available_heights.begin() ? ", " : "") << *it; | ||
} | ||
std::string heights_formatted = oss.str(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could use our fancy Join
helper here (available in util/string.h)
std::ostringstream oss; | |
for (auto it = available_heights.begin(); it != available_heights.end(); ++it) { | |
oss << (it != available_heights.begin() ? ", " : "") << *it; | |
} | |
std::string heights_formatted = oss.str(); | |
std::string heights_formatted = Join(available_heights, ", ", [&](const auto& i) { return ToString(i); }); |
The second attempt at implementing the
dumptxoutset
space optimization as suggested in #25675. Closes #25675.This builds on the work done in #26045, addresses open feedback, adds some further improvements (most importantly usage of compact size), documentation, and an additional test.
The original snapshot at height 830,000 came in at 10.82 GB. With this change, the same snapshot is 8.94 GB, a reduction of 17.4%.
This also enhances the metadata of the output file and adds the following data to allow for better error handling and make future upgrades easier: