Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rpc: allow dumptxoutset to dump human-readable data #18689

Closed
wants to merge 1 commit into from

Conversation

pierreN
Copy link
Contributor

@pierreN pierreN commented Apr 17, 2020

Adds additional optional arguments to dumptxoutset. If any are present, a human-readable file is written to disk instead of the compact binary serialized form currently in use. This does not change the current default behavior of dumptxoutset.


Thanks to the future assumeutxo feature (#15605), we now have a dumptxoutset RPC (#16899) which can write the whole UTXO set to disk. However, the current format, although compact, is not easily readable by standard tools (e.g. for someone who would like to study the UTXO set). Plus this binary format might change in the future AFAIK.

Providing power users an easy way to have a human-readable dump of the UTXOs would be a useful feature. We would this way replace 3rd party hackish tools with possible side effects.

On my machine (slow SSD):

  • dumping the whole original 4GB binary UTXO set takes around 1mn40
  • dumping the set in whole ASCII form takes less than 9GB and 3mn30 (ofc file size/time depends on which ASCII data you write to disk; you can select them via the format argument).

Thanks!

@hebasto
Copy link
Member

hebasto commented Apr 17, 2020

What are possible/expected use cases?

@MarcoFalke
Copy link
Member

MarcoFalke commented Apr 17, 2020

Concept ACK

@DrahtBot
Copy link
Contributor

DrahtBot commented Apr 17, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@pierreN
Copy link
Contributor Author

pierreN commented Apr 18, 2020

@hebasto the most common use case will be for users to easily study the whole UTXO set in only a few minutes.

With this PR it should be really easy to dump any format you want via the format parameter (type of data, number of occurrences and order are all respected).
Also, just adding one element to the ascii_types vector allows you to dump any new type of ASCII data.

For example, you can now trivially plot a graph of when current UTXOs were created (took 3mn30 on my machine):

$ bitcoin-cli dumptxoutset utxos.dat '["height","value"]' false ' '
$ awk '{if($1 in map) { map[$1] += $2; } else { map[$1] = $2 }} END { for(height in map) { print height, map[height]; }}' utxos.dat | sort -n > plot.dat
$ gnuplot -e "plot 'plot.dat' w l; pause -1;"

image

Another example would be to sum the amount of coinbase values in the UTXO set (took 2mn30):

$ bitcoin-cli dumptxoutset utxos.dat '["coinbase","value"]' false ' '
$ fgrep "1 " utxos.dat | awk '{sum += $2} END {print sum/100000000, "btc unspent coinbase"}'
1.76077e+06 btc unspent coinbase

Or really, just anything the user wants. With some more work from the user, it could also make it easier to track some indicators (such as bootstrapping/syncing SOPR).

Another example: doing bitcoin file archaeology.
Since all methods using scriptPubSig to etch data on the blockchain spam the UTXO set, you can retrieve a superset of all TXIDs of "files stored in the blockchain" via this RPC call.

@pierreN
Copy link
Contributor Author

pierreN commented May 1, 2020

rebased cd20cb8

@brakmic
Copy link
Contributor

brakmic commented May 1, 2020

ACK cd20cb8

Built, run and tested on macOS Catalina 10.15.4

./test/functional/rpc_dumptxoutset.py
2020-05-01T13:40:08.556000Z TestFramework (INFO): Initializing test directory /var/folders/7q/4ffytzk562dd2ky4bfg9_w7h0000gn/T/bitcoin_func_test_oc2tzp0y
2020-05-01T13:40:11.072000Z TestFramework (INFO): no_option
2020-05-01T13:40:11.113000Z TestFramework (INFO): all_data
2020-05-01T13:40:11.216000Z TestFramework (INFO): partial_data_1
2020-05-01T13:40:11.304000Z TestFramework (INFO): partial_data_order
2020-05-01T13:40:11.369000Z TestFramework (INFO): partial_data_double
2020-05-01T13:40:11.447000Z TestFramework (INFO): no_header
2020-05-01T13:40:11.537000Z TestFramework (INFO): separator
2020-05-01T13:40:11.617000Z TestFramework (INFO): all_options
2020-05-01T13:40:11.748000Z TestFramework (INFO): Stopping nodes
2020-05-01T13:40:12.313000Z TestFramework (INFO): Cleaning up /var/folders/7q/4ffytzk562dd2ky4bfg9_w7h0000gn/T/bitcoin_func_test_oc2tzp0y on exit
2020-05-01T13:40:12.313000Z TestFramework (INFO): Tests successful
./src/bitcoin-cli -regtest dumptxoutset dump.dat '["txid", "vout"]' false ':'
{
  "coins_written": 407,
  "base_hash": "01ba165996f7a7899e56b37584398adb892a5df7566b95e8de457ab588784740",
  "base_height": 407,
  "path": "/Users/brakmic/Library/Application Support/Bitcoin/regtest/dump.dat"
}
cat "/Users/brakmic/Library/Application Support/Bitcoin/regtest/dump.dat"
208c48f15ed2971709d81da915b72255e50b9251c558dc45981632ed6e4cd300:0
338e1fde4b86e2daaba2bd7cb4f8d77e600f47e7814645aafb480f56f4f41103:0
e73b0564bd56d359bd8df64fa3b9fd8586c3ff0430081aff1f97a9600c834403:0
23ff11ec2801f1c4838fc19863f7fa8d9283ac29e644180a6eaba160fd2a9c03:0
03ba026c466ab490a19b0aa8a39abeeccc6cff24d4a24a34d5f4304ae21e5304:0
98e8ceec62fb6442acaa939461482a46d7c03968082430661854815571eca204:0
4b84d555d8a19ab3cc38152e446fdbd059ec535ab67806a61628f238e495ff04:0
[...snip...]

@DrahtBot DrahtBot mentioned this pull request May 7, 2020
16 tasks
luke-jr pushed a commit to bitcoinknots/bitcoin that referenced this issue Jun 9, 2020
if (!is_compact) {
const auto& arr = request.params[1].get_array();
const std::unordered_map<std::string, cb_t> ascii_map(std::begin(ascii_types), std::end(ascii_types));
for(auto i = 0; i < arr.size(); ++i) {
Copy link
Member

@luke-jr luke-jr Jun 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto doesn't really work in this context...

warning: comparison of integer expressions of different signedness: ‘int’ and ‘size_t’ {aka ‘long unsigned int’} [-Wsign-compare]

Copy link
Contributor Author

@pierreN pierreN Jun 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, funny that a compiler can get the warning but don't properly choose the type of i. I guess the 0 must confuse it.

Thanks for catching this, I've just updated the branch.

@jamesob
Copy link
Member

jamesob commented Aug 26, 2020

Cool, I'll take a look in the next few days.

@jamesob
Copy link
Member

jamesob commented Nov 19, 2020

Concept ACK - will review soon.

@MarcoFalke MarcoFalke removed the Tests label Nov 20, 2020

std::vector<RPCArg> ascii_args;
std::transform(std::begin(ascii_types), std::end(ascii_types), std::back_inserter(ascii_args),
[](const std::pair<std::string, cb_t>& t) { return RPCArg{t.first, RPCArg::Type::STR, RPCArg::Optional::OMITTED, "Info to write for a given UTXO"}; });
Copy link
Contributor

@benthecarman benthecarman Mar 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice if these were more descriptive. It's unclear exactly what the serialization is for these argsr

@MarcoFalke
Copy link
Member

MarcoFalke commented Apr 13, 2021

Still needs rebase

@pierreN
Copy link
Contributor Author

pierreN commented May 6, 2021

Sorry for the few months delay.
I have a bit more time now and will try to follow through with this PR.

I'll update the branch in a few days (I was syncing when my old SSD died).

@Sjors
Copy link
Member

Sjors commented May 21, 2021

Consider moving this functionality to the new bitcoin-util instead. You could add a command that converts the binary format to human readable.

@DrahtBot
Copy link
Contributor

DrahtBot commented May 24, 2021

🐙 This pull request conflicts with the target branch and needs rebase.

Want to unsubscribe from rebase notifications on this pull request? Just convert this pull request to a "draft".

@@ -2564,7 +2608,7 @@ static RPCHelpMan dumptxoutset()
};
}

UniValue CreateUTXOSnapshot(NodeContext& node, CChainState& chainstate, CAutoFile& afile)
UniValue CreateUTXOSnapshot(const bool is_compact, const bool show_header, const std::string& separator, NodeContext& node, CChainState& chainstate, CAutoFile& afile, const std::vector<std::pair<std::string, coinascii_cb_t>>& requested)
Copy link
Member

@luke-jr luke-jr Oct 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it'd be nicer to avoid the two mutually-exclusive bools. Maybe a good case for a class enum?

@@ -183,7 +183,7 @@ CreateAndActivateUTXOSnapshot(NodeContext& node, const fs::path root, F malleati
FILE* outfile{fsbridge::fopen(snapshot_path, "wb")};
CAutoFile auto_outfile{outfile, SER_DISK, CLIENT_VERSION};

UniValue result = CreateUTXOSnapshot(node, node.chainman->ActiveChainstate(), auto_outfile);
UniValue result = CreateUTXOSnapshot(false, false, "", node, node.chainman->ActiveChainstate(), auto_outfile, {});
Copy link
Member

@luke-jr luke-jr Oct 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first false here should be true, as ActivateSnapshot can only handle the binary/compact format.

@josibake
Copy link
Contributor

josibake commented Dec 28, 2021

Concept ACK

@pierreN are you still working on this? I'm happy to try and take it over the finish line if you're not.

@jamesob
Copy link
Member

jamesob commented Jan 3, 2022

re-Concept ACK and at a high-level the code looks pretty good. Nice job on the tests. In need of a rebase though.

luke-jr pushed a commit to luke-jr/bitcoin that referenced this issue Feb 8, 2022
luke-jr pushed a commit to luke-jr/bitcoin that referenced this issue Feb 8, 2022
luke-jr pushed a commit to luke-jr/bitcoin that referenced this issue Feb 8, 2022
luke-jr pushed a commit to luke-jr/bitcoin that referenced this issue Feb 8, 2022
@luke-jr
Copy link
Member

luke-jr commented Feb 8, 2022

If you decide to revive this PR, I've done an extensive rebase at master...luke-jr:rpc_dumptxoutset_hr (leave off the last commit), rebasing it on top of (but not compatible with) #24202, and splitting up the different functionality across multiple commits.

@MarcoFalke
Copy link
Member

MarcoFalke commented Mar 22, 2022

I think this was picked up in ##24202 , so can be closed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet