RPC: Introduce getblockstats to plot things #10757

Open
wants to merge 4 commits into
from

Conversation

Projects
None yet
5 participants
Member

jtimon commented Jul 6, 2017 edited

It returns per block statistics about several things. It should be easy to add more if people think of other things to add or remove some if I went too far (but once written, why not keep it?).

The currently available options are: minfee,maxfee,totalfee,minfeerate,maxfeerate,avgfee,avgfeerate,txs,ins,outs

For the x axis, one can use height or block.nTime (I guess I could add mediantime if there's interest).

To calculate fees, -txindex is required.

jtimon referenced this pull request in ElementsProject/elements Jul 6, 2017

Open

Rpc: Introduce call for per-block fee statistics #185

Some ideas for additions:

  • Non-fee total output amount
  • Coinbase reward
  • Money supply including this block
  • Transaction weight txweight (it can be derived from existing fields, however)

I would prefer to see both time and mediantime returned, since they are available.

Should we return non-independent fields, such as avgfee when also including totalfee and txs?

I find that for bitcoin-related data, the median is often more useful than the average of a distribution. Including medianweight, medianfee, medianfeerate, medianoutput etc would expose these useful quantities to the user.

Member

jnewbery commented Jul 7, 2017

but once written, why not keep it?

Because more code => more bugs and more maintenance effort. I prefer:

If it's not really needed, why add it?

This is perhaps a nice-to-have, but since #8704, getblock can return all transactions in a block (without requiring txindex). Those can then be parsed and analysed offline.

Is there a compelling use-case I'm missing here? This seems like a feature only a small subset of users would be interested in, in which case an offline tools seems more appropriate.

Sorry - not meaning to be negative, but my default reaction to new RPCs/arguments tends towards NACK unless I can see a compelling and widespread use-case.

This is perhaps a nice-to-have, but since #8704, getblock can return all transactions in a block (without requiring txindex). Those can then be parsed and analysed offline.

This code pulls each transaction input's previous outpoint in order to compute transaction fees. Replicating that in RPC would require thousands of calls for most blocks.

Member

jnewbery commented Jul 7, 2017

This code pulls each transaction input's previous outpoint

Ah yes, of course. Concept ACK in that case. Doing this with getblock / getrawtransaction is infeasible.

Member

jtimon commented Jul 7, 2017 edited

Because more code => more bugs and more maintenance effort. I prefer:

Sure, but I mean, removing for example the avgfee or avgfeerate won't safe much code or testing code, just a few lines. Forget I said this, if there's specific functions to remove because nobody will want them, let's remove those and focus on the ones people want. Adding specific things only a few people want can also happen in their own branches, so it's no big deal.

The only use case is gather statistics, presumably to plot things, create charts. That is, at least, compelling to me, but I don't think that will have widespread usage. I also don't think all rpc calls have it. Is getchaintxstats, for example, a widespread use case?

If that's enough reason not to merge this, it's fine, I can maintain it as a separate branch that I periodically rebase, it is simple enough, so that won't be a big deal. On the other hand, if I can get it reviewed and merged it'll be less work for me in the long run and I also get the review.

Non-fee total output amount
Coinbase reward

Sounds good.

Money supply including this block

Mhmm, it would be simpler to calculate here from start to end here than from genesis. But it's pretty trivial to write a function in any language that returns the total supply for a given height without access to any historic data. Unless you are talking about discounting op_return outputs or something like that. I don't think this is very interesting here. Perhaps that can be done in getchaintxstats ?

Transaction weight txweight (it can be derived from existing fields, however)

In fact I'm using weight for everything. I should s/size/weight/ and probably also show size separately.
Maybe separate feerates in by weight and serialize size? I don't know...

I would prefer to see both time and mediantime returned, since they are available.

Yeah, the mediantime takes a little bit longer to be calculated but not much and one can always disable anything. In fact, the height and time shouldn't be treated in any special way for being "the x axis" and should be allowed to be disabled like the rest.

Should we return non-independent fields, such as avgfee when also including totalfee and txs?

This is a good question. This is mostly what I meant by "why not if it's this easy?".
But yeah, I guess non-independent are good candidates to be removed.

re median: yeah, that sounds interesting too, good idea!

Mhmm, it would be simpler to calculate here from start to end here than from genesis. But it's pretty trivial to write a function in any language that returns the total supply for a given height without access to any historic data. Unless you are talking about discounting op_return outputs or something like that. I don't think this is very interesting here. Perhaps that can be done in getchaintxstats ?

I was thinking of the more trivial version, rather than the supply - provably_unspendable version, so keeping that as external code makes more sense. Maintaining the sum of spendable outputs against block height is a much more ambitious idea, and it may make sense in the future. However, it is probably out of scope of this PR.

@promag

Partial review, suggestion to use std::set.

Nit, rename allowed_plot_values to valid_plot_values.
Nit, rename getperblockstats to just getblockstats?

src/rpc/blockchain.cpp
+ }
+}
+
+static bool IsAllowedPlotValue(const std::string& plot_value, std::vector<std::string>& allowed_plot_values)
@promag

promag Jul 7, 2017

Contributor

Remove.

src/rpc/blockchain.cpp
+
+UniValue getperblockstats(const JSONRPCRequest& request)
+{
+ std::string str_allowed_plot_values = "minfee,maxfee,totalfee,minfeerate,maxfeerate,avgfee,avgfeerate,txs,ins,outs";
@promag

promag Jul 7, 2017

Contributor

Remove.

src/rpc/blockchain.cpp
+UniValue getperblockstats(const JSONRPCRequest& request)
+{
+ std::string str_allowed_plot_values = "minfee,maxfee,totalfee,minfeerate,maxfeerate,avgfee,avgfeerate,txs,ins,outs";
+ std::vector<std::string> allowed_plot_values;
@promag

promag Jul 7, 2017

Contributor
std::set<std::string> allowed_plot_values = {"minfee", "maxfee", "..."};
src/rpc/blockchain.cpp
+{
+ std::string str_allowed_plot_values = "minfee,maxfee,totalfee,minfeerate,maxfeerate,avgfee,avgfeerate,txs,ins,outs";
+ std::vector<std::string> allowed_plot_values;
+ boost::split(allowed_plot_values, str_allowed_plot_values, boost::is_any_of(","));
@promag

promag Jul 7, 2017

Contributor

Remove.

src/rpc/blockchain.cpp
+ "\nArguments:\n"
+ "1. \"start\" (numeric, required) The height of the block that starts the window.\n"
+ "2. \"end\" (numeric, optional) The height of the block that ends the window (default: current tip).\n"
+ "3. \"plotvalues\" (string, optional) Values to plot (comma separated), default(all): " + str_allowed_plot_values +
@promag

promag Jul 7, 2017

Contributor
"..." + boost::join(allowed_plot_values, ",")
src/rpc/blockchain.cpp
+ "3. \"plotvalues\" (string, optional) Values to plot (comma separated), default(all): " + str_allowed_plot_values +
+ "\nResult:\n"
+ "{\n"
+ "}\n"
@promag

promag Jul 7, 2017

Contributor

Missing result example.

src/rpc/blockchain.cpp
+ "}\n"
+ "\nExamples:\n"
+ + HelpExampleCli("getperblockstats", "1000 1000 \"minfeerate,avgfeerate\"")
+ + HelpExampleRpc("getperblockstats", "1000 1000 \"maxfeerate,avgfeerate\"")
@promag

promag Jul 7, 2017

Contributor

Duplicate.

@jtimon

jtimon Jul 8, 2017

Member

One is HelpExampleCli and the other is HelpExampleRpc

@promag

promag Jul 8, 2017

Contributor

Ops sorry, overlooked it.

src/rpc/blockchain.cpp
+ if (request.params.size() > 2) {
+ str_plot_values = request.params[2].get_str();
+ }
+ std::vector<std::string> plot_values;
@promag

promag Jul 7, 2017

Contributor
std::set<std::string> plot_values;
if (request.params.size() > 2) {
  boost::split(plot_values, request.params[2].get_str(), boost::is_any_of(","));

  // only validate in this case
  // ... 
} else {
  plot_values = allowed_plot_values;
}
src/rpc/blockchain.cpp
+ std::vector<std::string> plot_values;
+ boost::split(plot_values, str_plot_values, boost::is_any_of(","));
+ for (const std::string plot_value : plot_values) {
+ if (!IsAllowedPlotValue(plot_value, allowed_plot_values)) {
@promag

promag Jul 7, 2017

Contributor
if (allowed_plot_values.count(plot_value) == 0) {
@jtimon

jtimon Jul 8, 2017

Member

duh, I was so much over-complicating things so much for no good reason...thank you!

Member

jtimon commented Jul 8, 2017 edited

Thanks again for the great feedback!

@promag I think I solved all your nits except for #10757 (comment)

@clarkmoody I think I added most of your suggestions, explicitly excluding anything that involved accumulations neither from height=1 nor from height=start.
The former potentially implies a world of complexity and the latter can be trivially calculated on the visual side: I would completely discard any accumulator redundancy in this rpc beforehand.

And for the rest of the redundancies, @jnewbery and @clarkmoody - thanks again for pointing it out -, it's never too late to remove them before merging like a trivial squash and it's never too soon to start saying which ones you would bikesay* out first. Also bikesay the names for the curves and even the order in the list (duplicated for c++ and python).

In the meantime, I embraced redundancy since, as said, it will be trivial for me to remove later. And also the pertinent optimizations to skip calculations when plot_values.count("minfee") == 0 or actually only when the extra calculation is more expensive than the searching in plot_values which is a set of strings.

For example, we have blockfees, reward, subsidy, complying with consensus rule reward == blockfees + subsidy. Only 2 of the 3 are necessary, at least one is redundant. My personal preference is removing either subsidy or reward or subsidy, but not blockfees. But at said once written there's no problem with me in just making sure their tests don't surprise me until we decide which ones didn't deserve it.

Which one seems bikesaying in principle. But not in this case.
blockfees/total_fees serves for other calculations like avgfeerate. Let's not remove that one, just rename it.

But it is more interesting to propose new ones than to rename or vote for removal IMO. I believe the most interesting addition to this point was utxo_size_inc, which would welcomed some review from people who measures sizes more carefully like @sipa , since this doesn't use GetSerializeSize for Coin intentionally, independently of the optimization to read Coin if available in the utxo before calling RpcGetTx. I'm still not sure what to do with pre/post segwit feerates, does anybody care about the pre ones? which one needs the scale factor? none?

REM CalculateTruncatedMedian doesn't need to be a template at this point, but there's no harm being static IMO

EDIT: still some TODOs, mostly documentation and pending decisions

Contributor

promag commented Jul 8, 2017

@jtimon no problem. There are some nits to fix but I'll review more in depth later.

src/policy/feerate.cpp
@@ -20,20 +20,22 @@ CFeeRate::CFeeRate(const CAmount& nFeePaid, size_t nBytes_)
nSatoshisPerK = 0;
}
-CAmount CFeeRate::GetFee(size_t nBytes_) const
+CAmount CFeeRate::GetTruncatedFee(size_t nBytes_) const
@promag

promag Jul 10, 2017

Contributor

Drop _? Same below.

@jtimon

jtimon Jul 11, 2017

Member

Well, yes, conserving the old name only saves 1 line of extra disruption. But I guess if we're touching the variable name we should use the new style. just bytes?

src/rpc/blockchain.cpp
@@ -685,6 +686,22 @@ UniValue getblockheader(const JSONRPCRequest& request)
return blockheaderToJSON(pblockindex);
}
+static void ReadBlockCheckPruned(const CBlockIndex* pblockindex, CBlock& block)
@promag

promag Jul 10, 2017

Contributor

Keep argument order as ReadBlockFromDisk? Is there a convention for where the output arguments should be?

@jtimon

jtimon Jul 10, 2017

Member

Not that I know of, but your proposed change sounds good to me.

src/rpc/blockchain.cpp
+{
+ T median;
+ size_t size = scores.size();
+ std::sort(scores.begin(), scores.end());
@promag

promag Jul 10, 2017

Contributor

Nit, could sort only after size == 1 case.

src/rpc/blockchain.cpp
+}
+
+// outpoint (needed for the utxo index) + nHeight + fCoinBase
+static const size_t PER_UTXO_OVERHEAD = sizeof(COutPoint) + sizeof(uint32_t) + sizeof(bool);
@promag

promag Jul 10, 2017 edited

Contributor

static constexpr ...

@jtimon

jtimon Jul 10, 2017

Member

what's the gain? https://stackoverflow.com/a/41132221/935325 says it's the same...

src/rpc/blockchain.cpp
+ ReadBlockCheckPruned(pindex, block);
+
+ for (const auto& tx : block.vtx) {
+
@promag

promag Jul 10, 2017

Contributor

Remove empty line.

src/rpc/blockchain.cpp
+ "\nArguments:\n"
+ "1. \"start\" (numeric, required) The height of the block that starts the window.\n"
+ "2. \"end\" (numeric, optional) The height of the block that ends the window (default: current tip).\n"
+ "3. \"plotvalues\" (string, optional) Values to plot (comma separated), default(all): " + boost::join(valid_plot_values, ",") +
@promag

promag Jul 10, 2017

Contributor

Replace plotvalues with stats? Also, 3rd argument could be object options?

@jtimon

jtimon Jul 10, 2017

Member

Isn't the string simpler?

@promag

promag Jul 11, 2017

Contributor

Ignore options suggestion.

src/rpc/blockchain.cpp
+ int end;
+ if (request.params.size() > 1) {
+ end = request.params[1].get_int();
+ if (end < 0 || end > chainActive.Height()) {
@promag

promag Jul 10, 2017

Contributor

Nit, negative block could mean end = height - end to avoid early blocks (not new concept here I believe)?

@jtimon

jtimon Jul 10, 2017

Member

Mhmm, interesting. To be clear you mean start=-10 end=200 would be equivalent to start=190 end=200, right?

@promag

promag Jul 11, 2017

Contributor

No, I meant negative values are relative to the tip. To get the stats for the last 10 blocks you would pass start = -10 without querying the current block height.

src/rpc/blockchain.cpp
+ map_stats[plot_value] = UniValue(UniValue::VARR);
+ }
+
+ CBlockIndex* pindex = chainActive[end];
@promag

promag Jul 10, 2017

Contributor

Remove.

src/rpc/blockchain.cpp
+
+ CBlockIndex* pindex = chainActive[end];
+ for (int i = end; i >= start; i--) {
+ UpdateBlockStats(pindex, plot_values, map_stats);
@promag

promag Jul 10, 2017

Contributor
UpdateBlockStats(chainActive[i], ...);
@jtimon

jtimon Jul 10, 2017

Member

This will be slightly less efficient, no?

@promag

promag Jul 11, 2017

Contributor

I guess it takes few more cycles but non critical code should be cleaner?

Member

jtimon commented Jul 11, 2017 edited

Here are some images generated using this branch in combination with (WIP): https://github.com/jtimon/rpc-explorer

GUI detail:

screenshot_plotter

GUI detail zoom:

screenshot_plotter_zoom

Hide some:

plotter1

Hide more:

plotter2

Fees:

plotterfees

Utxo size increase:

plotter_utxo

jtimon changed the title from RPC: Introduce getperblockstats to plot things to RPC: Introduce getblockstats to plot things Jul 11, 2017

jtimon added some commits Jul 8, 2017

@jtimon jtimon Avoid special case for truncated zeros with new CFeeRate::GetTruncate…
…dFee
fc389a4
@jtimon jtimon RPC: Separate ReadBlockCheckPruned() from getblock() 95128f8
@jtimon jtimon RPC: Introduce getblockstats 7158154
@jtimon jtimon QA: Test new getblockstats RPC 90c7264
Member

jtimon commented Jul 12, 2017

Without the documentation for the result it was impossible to distinguish a weird choice to spring discussion from an implementation mistake. Removed the other TODO comments.
Coded more pending suggestions by @promag (hopefully all pending ones? if not, please insist) with some extra bikeshedding derived from s/plotvalues/stats/ and adapt tests to start and end being allowed to be negative.

More cleanups can be done, specially in the tests if we go further with #10757 (comment) and not calculate in inverse order (there's no point if we don't get the slight optimization).

jnewbery referenced this pull request Jul 13, 2017

Closed

Add histunspent RPC #10804

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment