-
Notifications
You must be signed in to change notification settings - Fork 35.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add feerate histogram to getmempoolinfo #21422
Conversation
Hey @kiminuo, can you amend the commit and remove my mention? Keep in mind that mentions in commits lead to notifications. Ty. |
Yes, sorry for that. |
e4abe8f
to
d114770
Compare
Thanks for picking this up. |
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
476f660
to
81cd4bb
Compare
Concept ACK. For the test commit, maybe a more descriptive title that can be understood on its own. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few initial comments.
@jonatack Thanks for the superb review! I'm working on incorporating your suggestions. |
b2f8ddb
to
3d923f4
Compare
2a05b39
to
fb0824c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for switching it from BTC/kvB to sat/vB!
I looked over this more thoroughly and saw quite a few remaining fixups and suggestions. One thing to ensure is that the help docs are the same as the actual output and use the correct fee rate units and arg types.
It would be good if the functional test actually verified that the values are correctly calculated after creating a few txns; ATM it is only really a smoke test that verifies the output structure.
suggestions
diff --git a/src/rpc/blockchain.cpp b/src/rpc/blockchain.cpp
index 8d7ce13e57..fbae7cdbaf 100644
--- a/src/rpc/blockchain.cpp
+++ b/src/rpc/blockchain.cpp
@@ -1512,7 +1512,7 @@ UniValue MempoolInfoToJSON(const CTxMemPool& pool, const std::optional<MempoolHi
* ... cumulated fees */
std::vector<uint64_t> sizes(limits.size(), 0);
std::vector<uint64_t> count(limits.size(), 0);
- std::vector<uint64_t> fees(limits.size(), 0);
+ std::vector<CAmount> fees(limits.size(), 0);
for (const CTxMemPoolEntry& e : pool.mapTx) {
const CAmount fee{e.GetFee()};
@@ -1528,15 +1528,19 @@ UniValue MempoolInfoToJSON(const CTxMemPool& pool, const std::optional<MempoolHi
CHECK_NONFATAL(dsize > 0);
CHECK_NONFATAL(asize + dsize - size > 0);
- CAmount fpb{fee / size}; // fee per byte
- CAmount afpb{afees / asize}; // fee per byte including ancestors
- CAmount dfpb{dfees / dsize}; // fee per byte including descendants
- CAmount tfpb{(afees + dfees - fee) / (asize + dsize - size)};
- CAmount feeperbyte{std::max(std::min(dfpb, tfpb), std::min(fpb, afpb))};
+ const CAmount fpb{fee / size}; // fee per byte
+ const CAmount afpb{afees / asize}; // fee per byte including ancestors
+ const CAmount dfpb{dfees / dsize}; // fee per byte including descendants
+ const CAmount tfpb{(afees + dfees - fee) / (asize + dsize - size)};
+ const CAmount fee_per_byte{std::max(std::min(dfpb, tfpb), std::min(fpb, afpb))};
+
+ assert(fpb == CFeeRate(fee, size).GetFee(1));
+ assert(afpb == CFeeRate(afees, asize).GetFee(1));
+ assert(dfpb == CFeeRate(dfees, dsize).GetFee(1));
+ assert(tfpb == CFeeRate(afees + dfees - fee, asize + dsize - size).GetFee(1));
+
- // Distribute feerates into feelimits
+ // Distribute fee rates into fee limits
for (int i = limits.size() - 1; i >= 0; --i) {
- if (feeperbyte >= limits[i]) {
+ if (fee_per_byte >= limits[i]) {
sizes[i] += size;
++count[i];
fees[i] += fee;
@@ -1545,23 +1549,23 @@ UniValue MempoolInfoToJSON(const CTxMemPool& pool, const std::optional<MempoolHi
}
}
UniValue ranges(UniValue::VOBJ);
for (size_t i = 0; i < limits.size(); ++i) {
UniValue info_sub(UniValue::VOBJ);
- info_sub.pushKV("sizes", sizes[i]);
- info_sub.pushKV("count", count[i]);
- info_sub.pushKV("fees", fees[i]);
- info_sub.pushKV("from_feerate", limits[i]);
+ info_sub.pushKV("size", sizes.at(i));
+ info_sub.pushKV("count", count.at(i));
+ info_sub.pushKV("fees", fees.at(i));
+ info_sub.pushKV("from", limits.at(i));
if (i == limits.size() - 1) {
- info_sub.pushKV("to_feerate", "Max"); // TODO.
+ info_sub.pushKV("to", "max"); // TODO.
} else {
- info_sub.pushKV("to_feerate", limits[i + 1]);
+ info_sub.pushKV("to", limits[i + 1]);
}
- total_fees += fees[i];
- ranges.pushKV(ToString(limits[i]), info_sub);
+ total_fees += fees.at(i);
+ ranges.pushKV(ToString(limits.at(i)), info_sub);
}
UniValue info(UniValue::VOBJ);
@@ -1580,7 +1584,7 @@ static RPCHelpMan getmempoolinfo()
{
{"fee_histogram", RPCArg::Type::ARR, RPCArg::Optional::OMITTED_NAMED_ARG, "Fee statistics grouped by fee rate ranges",
{
- {"fee_rate", RPCArg::Type::NUM, RPCArg::Optional::NO, "Fee rate (in satoshis) to group the fees by"},
+ {"fee_rate", RPCArg::Type::NUM, RPCArg::Optional::NO, "Fee rate (in " + CURRENCY_ATOM + "/vB) to group the fees by"},
},
},
},
@@ -1591,31 +1595,31 @@ static RPCHelpMan getmempoolinfo()
{RPCResult::Type::NUM, "size", "Current tx count"},
{RPCResult::Type::NUM, "bytes", "Sum of all virtual transaction sizes as defined in BIP 141. Differs from actual serialized size because witness data is discounted"},
{RPCResult::Type::NUM, "usage", "Total memory usage for the mempool"},
- {RPCResult::Type::STR_AMOUNT, "total_fee", "Total fees for the mempool in " + CURRENCY_UNIT + ", ignoring modified fees through prioritizetransaction"},
+ {RPCResult::Type::STR_AMOUNT, "total_fee", "Total fees for the mempool in " + CURRENCY_UNIT + "/kvB, ignoring modified fees through prioritizetransaction"},
{RPCResult::Type::NUM, "maxmempool", "Maximum memory usage for the mempool"},
- {RPCResult::Type::STR_AMOUNT, "mempoolminfee", "Minimum fee rate in " + CURRENCY_UNIT + "/kB for tx to be accepted. Is the maximum of minrelaytxfee and minimum mempool fee"},
+ {RPCResult::Type::STR_AMOUNT, "mempoolminfee", "Minimum fee rate in " + CURRENCY_UNIT + "/kvB for tx to be accepted. Is the maximum of minrelaytxfee and minimum mempool fee"},
{RPCResult::Type::STR_AMOUNT, "minrelaytxfee", "Current minimum relay fee for transactions"},
{RPCResult::Type::NUM, "unbroadcastcount", "Current number of transactions that haven't passed initial broadcast yet"},
{RPCResult::Type::OBJ, "fee_histogram", "",
{
- {RPCResult::Type::OBJ_DYN, "ranges", "Feerate groups",
+ {RPCResult::Type::OBJ_DYN, "ranges", "",
{
- {RPCResult::Type::OBJ, "<feerate-group>", "Object per feerate group",
+ {RPCResult::Type::OBJ, "<fee_rate_group>", "Fee rate group named by its lower bound (in " + CURRENCY_ATOM + "/vB), identical to the \"from\" field below",
{
- {RPCResult::Type::NUM, "size", "Cumulated size of all transactions in feerate group"},
- {RPCResult::Type::NUM, "count", "Amount of transactions in feerate group"},
- {RPCResult::Type::NUM, "fees", "Cumulated fees of all transactions in feerate group (in satoshis)"},
- {RPCResult::Type::NUM, "from_feerate", "Group contains transactions with feerates equal or greater than this value (in satoshis)"},
- {RPCResult::Type::NUM, "to_feerate", "Group contains transactions with feerates less than this value (in satoshis)"},
+ {RPCResult::Type::NUM, "size", "Cumulative size of all transactions in the fee rate group"},
+ {RPCResult::Type::NUM, "count", "Number of transactions in the fee rate group"},
+ {RPCResult::Type::NUM, "fees", "Cumulative fees of all transactions in the fee rate group (in " + CURRENCY_ATOM + "/vB)"},
+ {RPCResult::Type::NUM, "from", "Group contains transactions with fee rates equal or greater than this value (in " + CURRENCY_ATOM + "/vB)"},
+ {RPCResult::Type::NUM, "to", "Group contains transactions with fee rates less than this value (in " + CURRENCY_ATOM + "/vB)"},
}}}},
- {RPCResult::Type::NUM, "total_fees", "Total available fees in mempool (in satoshis)"},
+ {RPCResult::Type::NUM, "total_fees", "Total available fees in mempool (in " + CURRENCY_ATOM + "/vB)"},
}},
}},
RPCExamples{
HelpExampleCli("getmempoolinfo", "") +
- HelpExampleCli("getmempoolinfo", R"([1, 10, 100, 200, 400, 800])") +
+ HelpExampleCli("getmempoolinfo", R"('[1, 5, 10, 25, 50, 100]')") +
HelpExampleRpc("getmempoolinfo", "") +
- HelpExampleRpc("getmempoolinfo", R"([1, 10, 100, 200, 400, 800])")
+ HelpExampleRpc("getmempoolinfo", R"([1, 5, 10, 25, 50, 100])")
},
diff --git a/test/functional/mempool_fee_histogram.py b/test/functional/mempool_fee_histogram.py
index 0cbaa97bfc..91c6dccc30 100755
--- a/test/functional/mempool_fee_histogram.py
+++ b/test/functional/mempool_fee_histogram.py
@@ -31,15 +31,15 @@ class MempoolFeeHistogramTest(BitcoinTestFramework):
total_fees = 0
for key, bin in info['fee_histogram']['ranges'].items():
- assert_equal(int(key), bin['from_feerate'])
+ assert_equal(int(key), bin['from'])
if bin['fees'] > 0:
assert_greater_than(bin['count'], 0)
else:
assert_equal(bin['count'], 0)
assert_greater_than_or_equal(bin['fees'], 0)
- assert_greater_than_or_equal(bin['sizes'], 0)
- if bin['to_feerate'] != 'Max':
- assert_greater_than(bin['to_feerate'], bin['from_feerate'])
+ assert_greater_than_or_equal(bin['size'], 0)
+ if bin['to'] != 'max':
+ assert_greater_than(bin['to'], bin['from'])
total_fees += bin['fees']
fb0824c
to
ccb043c
Compare
This is on my TODO list. So I will improve it over coming days. |
The Bitcoin Core wallet does not use this feature? |
The original idea of this patch was to add a small and easy to maintain efficient way to generate a fee histogram to bitcoin core (see original PR description #15836 (comment)). As described there, it was always possible to use However, it is not efficient to dumb out all transactions of the complete mempool in a json format just to calculate the fee histogram with a python script. Thus I think this is a valuable addition to Bitcoin Core. The patch size is minimal and it seems like there is some interest in this feature. EDIT: this PR has nothing to do with the Bitcoin cores internal wallet. It's pure mempool statistics. |
Does not use yet? |
My point is that a wallet without fee estimation is not very usable and harms the overall network. If we don't want to have an effective way of determining the fee in Bitcoin Core, we might as well remove the entire wallet from the codebase. |
There seems to be confusion here. This PR does not touch the fee estimator or the wallet in any way. The fee estimator already keeps track of how many transactions at specific feerates {in the mempool, have confirmed within N blocks, haven't confirmed within N blocks} there are internally; the wallet queries it directly for feerate estimates. I don't know of any examples where the wallet needs to see a graph of something. If you feel that the fee estimation algorithms need improvement, by all means I'd say some review/PRs there would be welcome. Maybe take a look at #25380 or #21161. For those that feel very passionately about improving how the wallet calculates fees, my personal recommendation would be to review #26152 to have the strongest impact. |
It's confusing to me why this PR, tagged as edit: glozow was 1min faster |
That would require just moving fee histogram code out of |
Bitcoin Core has had fee estimation logic for a long time ( The Bitcoin Core wallet automatically uses this internal feerate estimation logic to decide on the fees of transactions. You need to override things, or use more low-level RPCs, to not use it. That feerate estimation logic works by looking at how quickly mempool transactions get confirmed, over longer time windows, not by looking at the current mempool composition (looking at just your own mempool may give very bad estimates if your relay policy isn't exactly what the network and miners use, which is very hard to guarantee). This PR is about exposing mempool feerate histogram data through RPC. It's not a feerate estimator, and is entirely unrelated to the wallet. While the logic could in theory be used for feerate estimation too, that's not what this PR does, and I'm concerned that doing so would risk giving very inaccurate results when faced with diverging network policies. Conceptually, however, I think it's reasonable to add something like this, as just RPC (or GUI) output. It can be informative for users, even if it can't be directly used for automatic fee estimation. Of course, that's subject to reviewer interest. |
I am not against sipa as I understand fee estimation is difficult however There are lot of occasions when I found it to be irrelevant and just look at mempool.space or multiple explorers One of the project that I found more useful: https://github.com/TrueLevelSA/btc-congestion-manager Anyway, this PR tries to improve how projects use mempool info and should not have been controversial at any point. Request to all developers including maintainers to re consider their opinion. |
For Wasabi's purposes this output is needed purely for fee estimation, so it'd be ideal if Core would estimate fees properly. Here's how I see things:
|
@nopara73 I'm aware of As I said, I think an RPC like this is valuable for people who want additional information for fee estimation, but using it automatically without at least someone looking at the result seems dangerous to me. E.g. future softforks your node doesn't know about could permit an attacker to stuff your mempool with high-fee unminable transactions. |
@sipa By not providing sort-of accurate estimations from the current mempool, we force people to rely on third-party estimators, which is arguably much worse than having an independent estimation which works most of the time, although it may be games under certain specific conditions. I think that "perfect is enemy of good" rule applies here perfectly. |
@sipa sorry for the late reply, at first I was like "yes, that makes perfect sense," however an argument came to my mind - this morning during shower :) - in favor of it, so I'd like to share that. Can't we assume miners are seeking the highest fee transactions at all times? In other words: maximize their income from fees. If we could, then adjusting Now I may argue against this with "it is not guaranteed that the miners have the highest fee txs that my node does as well." To this, first I'd say: we can guarantee they want to have them and isn't that enough? In summary, unless I am missing the point and you're referring to another kind of game/attack here that does not involve the attacker creating blocks of high fee transactions, we can be reassured the attack is too expensive to be worth trying. |
Using One workaround is to use It's a pity the PR has been closed since the complexity is low, there is clearly wide interest for it across many wallets, and a fee rate histogram is a useful tool in the toolbox when estimating fees. In addition, any middleware implementing the Electrum protocol needs to use one of the inefficient methods described to implement the
+1. |
Co-authored-by: Jonas Schnelli <dev@jonasschnelli.ch> Co-authored-by: Jon Atack <jon@atack.com> Github-Pull: bitcoin#21422 Rebased-From: 0b87ba9
I think the discussion here is conflating two things:
It's clear that the existing fee estimation algorithms make it inaccurate for many real use cases. There are good reasons for why it works the way it does, but that doesn't change the fact that the outcome is bad. We can certainly discuss what can be done about that. I've posted about one possible approach in #27995. However, this PR is about something else; adding a feerate histogram. That seems to me like a vaguely reasonable, independent feature, but it has nothing to do with our fee estimation algorithm. And wanting a feerate histogram so you can implement fee estimation yourself just because the fee estimation feature itself does a bad job seems to be the wrong way around. If that's the goal we should just think about how to improve the algorithm. |
To be clear, my desire for an efficient way to retrieve fee rate histogram data has nothing to do with Bitcoin Core's fee estimation algorithm. Because fee estimation is inherently difficult and no algorithm can know the user's time preference for every transaction, Sparrow Wallet displays the fee rate histogram chart in the UI as another "tool in the toolbox". This allows a user to observe trends and adjust the fee rate to place a transaction in a particular fee rate band to suit their time preference. This is not to suggest such an approach is always superior to algorithmic fee rate estimation, merely that more data is generally useful when trying to predict the future. |
I think the Even if this pull was merged, it will just give one answer and I don't think it is clear whether this is the answer that users actually want? See #21422 (comment) and the previous discussion threads. |
I think it's worth noting that a feerate histogram built from individual transaction feerates can be very misleading for fee estimation. A CPFP could look like n low feerate data points + 1 extremely high feerate data point. A few dozen 100kvB high-feerate transactions may dominate the block you are trying to get into, but they look identical to 100vB transactions as data points. As a user, I have no idea how I would look at a feerate histogram and get an idea of what feerate to put on my transaction. I agree that fee estimation using the current contents of the mempool can be very useful. However, the process of taking current mempool contents and meaningfully translating them into high/med/low priority bands is quite involved. Linking one implementation which is a lot more accurate than using individual feerates, but quite computationally heavy and not necessarily appropriate for the oom'd raspis either. Perhaps with cluster mempool we can expose a mempool entry's mining score, but that score still doesn't tell you much about how much block space is taken up by each data point.
This is the most appropriate solution 👍. A simple map from txid to feerate would do if you're trying to keep a feerate histogram.
Yeah, the nice thing about only needing fee and vsize is these will never change for a given txid in your mempool. You only need to query the entries you don't already have in your external data structure.
On Core's side this idea seems the most worth pursuing, if somebody wants to open a pull. |
Hey, developer of matrix here. I think a neural net fee estimator is going to be pretty performant compared to mempool space. My logger already takes into account things like CPFP and calculates the total package fee rate, and with approx 50MB of training data it's capable of estimating fees such that if the necessary rate goes down, it reacts more quickly that estimatesmartfee. Like the built-in fee estimator, I do not use the current contents of the mempool. Rather, the training data packages up the time of tx seen, block count from seen to confirmed, and total package feerate. When it comes to inference-time, lighter nodes like RPis can download and validate a pretrained model. Hence, no fear of OOMing. |
The points regarding CPFP etc are well noted, although it's possible these are in practice more applicable to higher time preference fee estimations? Estimation aside, there are not insignificant client costs to maintaining a "txid to vsize and fee rate" mempool replica. By my measurement, this data structure with ~125,000 transactions requires ~14Mb of RAM. Further, the initial load using |
If memory usage is the only concern, I am sure you could toss the transactions into (lossy) buckets and then subscribe via zmq to avoid duplicates, without having to remember txids, or individual size/fee details.
If you only care about the top N blocks worth of transactions, the memory usage likely could also be reduced. |
I don't see the use of the histogram for fee estimation. You can have a lot of transactions with minimum fee that generate a large bar dominating the histogram. I see more useful a representation like mempool.space showing the list of transactions ordered by decreasing fee and grouped by blocks. That is a list of pending blocks with associated feerate ranges. |
@santochibtc "Feerate histogram" here is a bytes vs fee diagram, not a #tx vs fee diagram. |
Thanks @sipa, yes, it is "a histogram of the fee rates paid by transactions in the memory pool, weighted by transaction size", but my point is still valid. If you want to include your tx in the next block you must estimate your fee based on the pending txs that are paying more per byte. I think that the histogram is not always a good picture of the mempool to estimate fees. |
It's good enough. |
Co-authored-by: Jonas Schnelli <dev@jonasschnelli.ch> Co-authored-by: Jon Atack <jon@atack.com> Github-Pull: bitcoin#21422 Rebased-From: 0b87ba9
Co-authored-by: Jonas Schnelli <dev@jonasschnelli.ch> Co-authored-by: Jon Atack <jon@atack.com> Github-Pull: bitcoin#21422 Rebased-From: 0b87ba9
Co-authored-by: Jonas Schnelli <dev@jonasschnelli.ch> Co-authored-by: Jon Atack <jon@atack.com> Github-Pull: bitcoin#21422 Rebased-From: 0b87ba9
This PR is a slightly modified version of #15836 (jonasschnelli):
My attempts to contact jonasschnelli were, unfortunately, unsuccessful so I decided to create this PR in an attempt to move this forward. If this is somehow problematic, please let me know to work it out.edit: Jonas is happy the work continues.This PR
Note that REST support which is in #15836 is not included in this PR. It can be improved in this PR or in a follow-up one if it is deemed useful/required/etc.
Applied review comments from the old PR
rest_mempool_info_with_fee_histogram
is never called."if (feeperbyte >= feelimits[i] && (i == feelimits.size() - 1 || feeperbyte < feelimits[i + 1])) {
std::to_string(...)
replaced with `ToString(...) as done in Replace std::to_string with locale-independent alternative #18134Test commands
$ test/functional/test_runner.py mempool_fee_histogram.py # To run the new test
Output on testnet (2022-07-09)
./bitcoin-cli -testnet -rpcuser=test -rpcpassword=test getmempoolinfo "[0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 17, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 140, 170, 200]"
JSON output
Various inputs