Add feerate histogram to getmempoolinfo #21422

kiminuo · 2021-03-12T10:32:00Z

This PR is a slightly modified version of #15836 (jonasschnelli):

This follows the approach of adding statistical information to Bitcoin Core that would otherwise be inefficient to calculate outside of the codebase.

Adds an optional feerate histogram to getmempoolinfo.

The concept and code is heavily inspired by the stats jhoenicke runs (https://github.com/jhoenicke/mempool, http://bitcoin-mempool.info).

If someone has a good idea how to make the feerate-groups dynamic but also semi-constant for similar fee environments, please comment.

If this is feature we'd like to have in master (concept ACKs), I'd continue this with writing tests.

A simple plot of the data is here.
RPC output sample is here.

My attempts to contact jonasschnelli were, unfortunately, unsuccessful so I decided to create this PR in an attempt to move this forward. If this is somehow problematic, please let me know to work it out. edit: Jonas is happy the work continues.

This PR

Note that REST support which is in #15836 is not included in this PR. It can be improved in this PR or in a follow-up one if it is deemed useful/required/etc.

Applied review comments from the old PR

Add feerate histogram to getmempoolinfo #15836 (comment) - "Remove these files and maybe update .gitignore?"
Add feerate histogram to getmempoolinfo #15836 (comment) - "This must be before the above line (order is important) otherwise rest_mempool_info_with_fee_histogram is never called."
Add feerate histogram to getmempoolinfo #15836 (comment) - "This gives me a new warning on build: [...]" This is already addressed, I believe.
Add feerate histogram to getmempoolinfo #15836 (comment) - Simplify if (feeperbyte >= feelimits[i] && (i == feelimits.size() - 1 || feeperbyte < feelimits[i + 1])) {
0b6ba66 - Added test proposed by promag
Add feerate histogram to getmempoolinfo #15836 (comment) - std::to_string(...) replaced with `ToString(...) as done in Replace std::to_string with locale-independent alternative #18134

Test commands

$ ./bitcoin-cli -testnet getmempoolinfo "[0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 17, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 140, 170, 200]" # To test the new behavior

$ test/functional/test_runner.py mempool_fee_histogram.py # To run the new test

$ ./bitcoin-cli -testnet -rpcuser=test -rpcpassword=test help getmempoolinfo
getmempoolinfo ( [fee_rate,...] )

Returns details on the active state of the TX memory pool.

Arguments:
1. fee_histogram    (json array, optional) Fee statistics grouped by fee rate ranges
     [
       fee_rate,    (numeric, required) Fee rate (in sat/vB) to group the fees by
       ...
     ]

Result:
{                               (json object)
  "loaded" : true|false,        (boolean) True if the mempool is fully loaded
  "size" : n,                   (numeric) Current tx count
  "bytes" : n,                  (numeric) Sum of all virtual transaction sizes as defined in BIP 141. Differs from actual serialized size because witness data is discounted
  "usage" : n,                  (numeric) Total memory usage for the mempool
  "total_fee" : n,              (numeric) Total fees for the mempool in BTC, ignoring modified fees through prioritizetransaction
  "maxmempool" : n,             (numeric) Maximum memory usage for the mempool
  "mempoolminfee" : n,          (numeric) Minimum fee rate in BTC/kvB for tx to be accepted. Is the maximum of minrelaytxfee and minimum mempool fee
  "minrelaytxfee" : n,          (numeric) Current minimum relay fee for transactions
  "unbroadcastcount" : n,       (numeric) Current number of transactions that haven't passed initial broadcast yet
  "fee_histogram" : {           (json object)
    "fee_rate_groups" : {       (json object)
      "<fee_rate_group>" : {    (json object) Fee rate group named by its lower bound (in sat/vB), identical to the "from" field below
        "size" : n,             (numeric) Cumulative size of all transactions in the fee rate group (in vBytes)
        "count" : n,            (numeric) Number of transactions in the fee rate group
        "fees" : n,             (numeric) Cumulative fees of all transactions in the fee rate group (in sat)
        "from" : n             (numeric) Group contains transactions with fee rates equal or greater than this value (in sat/vB)
      },
      ...
    },
    "total_fees" : n            (numeric) Total available fees in mempool (in sat)
  }
}

Examples:
> bitcoin-cli getmempoolinfo
> bitcoin-cli getmempoolinfo "[0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 17, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 140, 170, 200]"
> curl --user myusername --data-binary '{"jsonrpc": "1.0", "id": "curltest", "method": "getmempoolinfo", "params": []}' -H 'content-type: text/plain;' http://127.0.0.1:8332/
> curl --user myusername --data-binary '{"jsonrpc": "1.0", "id": "curltest", "method": "getmempoolinfo", "params": [[0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 17, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 140, 170, 200]]}' -H 'content-type: text/plain;' http://127.0.0.1:8332/

Output on testnet (2022-07-09)

./bitcoin-cli -testnet -rpcuser=test -rpcpassword=test getmempoolinfo "[0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 17, 20, 25, 30, 40, 50, 60, 70, 80, 100, 120, 140, 170, 200]"

JSON output

{
"loaded": true,
"size": 10,
"bytes": 2652,
"usage": 13504,
"total_fee": 0.00363010,
"maxmempool": 300000000,
"mempoolminfee": 0.00001000,
"minrelaytxfee": 0.00001000,
"unbroadcastcount": 0,
"fee_histogram": {
  "fee_rate_groups": {
    "0": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 0
    },
    "1": {
      "size": 403,
      "count": 2,
      "fees": 403,
      "from": 1
    },
    "2": {
      "size": 554,
      "count": 2,
      "fees": 561,
      "from": 2
    },
    "3": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 3
    },
    "4": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 4
    },
    "5": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 5
    },
    "6": {
      "size": 255,
      "count": 1,
      "fees": 1345,
      "from": 6
    },
    "7": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 7
    },
    "8": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 8
    },
    "10": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 10
    },
    "12": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 12
    },
    "14": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 14
    },
    "17": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 17
    },
    "20": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 20
    },
    "25": {
      "size": 352,
      "count": 1,
      "fees": 9505,
      "from": 25
    },
    "30": {
      "size": 144,
      "count": 1,
      "fees": 4520,
      "from": 30
    },
    "40": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 40
    },
    "50": {
      "size": 374,
      "count": 1,
      "fees": 20000,
      "from": 50
    },
    "60": {
      "size": 351,
      "count": 1,
      "fees": 23937,
      "from": 60
    },
    "70": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 70
    },
    "80": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 80
    },
    "100": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 100
    },
    "120": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 120
    },
    "140": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 140
    },
    "170": {
      "size": 0,
      "count": 0,
      "fees": 0,
      "from": 170
    },
    "200": {
      "size": 219,
      "count": 1,
      "fees": 302739,
      "from": 200
    }
  },
  "total_fees": 363010
}
}

Various inputs

./bitcoin-cli -testnet -rpcuser=test -rpcpassword=test getmempoolinfo "[0]"       # OK, one fee rate group covering all possible transaction fee rates (i.e. `>=0`); total_fee = fee_histogram.total_fees
./bitcoin-cli -testnet -rpcuser=test -rpcpassword=test getmempoolinfo "[-1]"      # reports: Non-negative values are expected
./bitcoin-cli -testnet -rpcuser=test -rpcpassword=test getmempoolinfo "[0,0]"     # reports: Strictly increasing values are expected
./bitcoin-cli -testnet -rpcuser=test -rpcpassword=test getmempoolinfo "[0,1,0]"   # reports: Strictly increasing values are expected
./bitcoin-cli -testnet -rpcuser=test -rpcpassword=test getmempoolinfo "[0, 1000000000000000000000000000000000]" # reports: JSON integer out of range
./bitcoin-cli -testnet -rpcuser=test -rpcpassword=test getmempoolinfo '[1.2,2,3]' # reports: JSON integer out of range
./bitcoin-cli -testnet -rpcuser=test -rpcpassword=test getmempoolinfo '["2"]'     # reports: Error parsing JSON: ['2']

promag · 2021-03-12T10:45:10Z

Hey @kiminuo, can you amend the commit and remove my mention? Keep in mind that mentions in commits lead to notifications. Ty.

kiminuo · 2021-03-12T10:46:35Z

Hey @kiminuo, can you amend the commit and remove my mention? Keep in mind that mentions in commits lead to notifications. Ty.

Yes, sorry for that.

jonasschnelli · 2021-03-12T19:34:03Z

Thanks for picking this up.

DrahtBot · 2021-03-16T11:09:02Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.

Type	Reviewers
Concept NACK	0xB10C
Concept ACK	jonatack, JeremyRubin, ghost, jamesob, molnard, sipa
Stale ACK	stratospher, kristapsk
Ignored review	glozow

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#26525 (Remove -mempoolfullrbf option by BitcoinErrorLog)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

jonatack · 2021-03-21T10:31:34Z

Concept ACK. For the test commit, maybe a more descriptive title that can be understood on its own.

jonatack

A few initial comments.

src/rpc/blockchain.cpp

test/functional/mempool_fee_histogram.py

test/functional/test_runner.py

src/rpc/blockchain.cpp

kiminuo · 2021-03-21T20:23:22Z

@jonatack Thanks for the superb review! I'm working on incorporating your suggestions.

src/rpc/blockchain.cpp

jonatack

Thanks for switching it from BTC/kvB to sat/vB!

I looked over this more thoroughly and saw quite a few remaining fixups and suggestions. One thing to ensure is that the help docs are the same as the actual output and use the correct fee rate units and arg types.

It would be good if the functional test actually verified that the values are correctly calculated after creating a few txns; ATM it is only really a smoke test that verifies the output structure.

suggestions

diff --git a/src/rpc/blockchain.cpp b/src/rpc/blockchain.cpp
index 8d7ce13e57..fbae7cdbaf 100644
--- a/src/rpc/blockchain.cpp
+++ b/src/rpc/blockchain.cpp
@@ -1512,7 +1512,7 @@ UniValue MempoolInfoToJSON(const CTxMemPool& pool, const std::optional<MempoolHi
          * ... cumulated fees */
         std::vector<uint64_t> sizes(limits.size(), 0);
         std::vector<uint64_t> count(limits.size(), 0);
-        std::vector<uint64_t> fees(limits.size(), 0);
+        std::vector<CAmount> fees(limits.size(), 0);
 
         for (const CTxMemPoolEntry& e : pool.mapTx) {
             const CAmount fee{e.GetFee()};
@@ -1528,15 +1528,19 @@ UniValue MempoolInfoToJSON(const CTxMemPool& pool, const std::optional<MempoolHi
             CHECK_NONFATAL(dsize > 0);
             CHECK_NONFATAL(asize + dsize - size > 0);

-            CAmount fpb{fee / size};     // fee per byte
-            CAmount afpb{afees / asize}; // fee per byte including ancestors
-            CAmount dfpb{dfees / dsize}; // fee per byte including descendants
-            CAmount tfpb{(afees + dfees - fee) / (asize + dsize - size)};
-            CAmount feeperbyte{std::max(std::min(dfpb, tfpb), std::min(fpb, afpb))};
+            const CAmount fpb{fee / size};     // fee per byte
+            const CAmount afpb{afees / asize}; // fee per byte including ancestors
+            const CAmount dfpb{dfees / dsize}; // fee per byte including descendants
+            const CAmount tfpb{(afees + dfees - fee) / (asize + dsize - size)};
+            const CAmount fee_per_byte{std::max(std::min(dfpb, tfpb), std::min(fpb, afpb))};
+ 
+            assert(fpb == CFeeRate(fee, size).GetFee(1));
+            assert(afpb == CFeeRate(afees, asize).GetFee(1));
+            assert(dfpb == CFeeRate(dfees, dsize).GetFee(1));
+            assert(tfpb == CFeeRate(afees + dfees - fee, asize + dsize - size).GetFee(1));
+
-            // Distribute feerates into feelimits
+            // Distribute fee rates into fee limits
             for (int i = limits.size() - 1; i >= 0; --i) {
-                if (feeperbyte >= limits[i]) {
+                if (fee_per_byte >= limits[i]) {
                     sizes[i] += size;
                     ++count[i];
                     fees[i] += fee;
@@ -1545,23 +1549,23 @@ UniValue MempoolInfoToJSON(const CTxMemPool& pool, const std::optional<MempoolHi
             }
         }
 
         UniValue ranges(UniValue::VOBJ);
         for (size_t i = 0; i < limits.size(); ++i) {
             UniValue info_sub(UniValue::VOBJ);
-            info_sub.pushKV("sizes", sizes[i]);
-            info_sub.pushKV("count", count[i]);
-            info_sub.pushKV("fees", fees[i]);
-            info_sub.pushKV("from_feerate", limits[i]);
+            info_sub.pushKV("size", sizes.at(i));
+            info_sub.pushKV("count", count.at(i));
+            info_sub.pushKV("fees", fees.at(i));
+            info_sub.pushKV("from", limits.at(i));
 
             if (i == limits.size() - 1) {
-                info_sub.pushKV("to_feerate", "Max"); // TODO.
+                info_sub.pushKV("to", "max"); // TODO.
             } else {
-                info_sub.pushKV("to_feerate", limits[i + 1]);
+                info_sub.pushKV("to", limits[i + 1]);
             }
 
-            total_fees += fees[i];
-            ranges.pushKV(ToString(limits[i]), info_sub);
+            total_fees += fees.at(i);
+            ranges.pushKV(ToString(limits.at(i)), info_sub);
         }
 
         UniValue info(UniValue::VOBJ);
@@ -1580,7 +1584,7 @@ static RPCHelpMan getmempoolinfo()
                 {
                     {"fee_histogram", RPCArg::Type::ARR, RPCArg::Optional::OMITTED_NAMED_ARG, "Fee statistics grouped by fee rate ranges",
                         {
-                            {"fee_rate", RPCArg::Type::NUM, RPCArg::Optional::NO, "Fee rate (in satoshis) to group the fees by"},
+                            {"fee_rate", RPCArg::Type::NUM, RPCArg::Optional::NO, "Fee rate (in " + CURRENCY_ATOM + "/vB) to group the fees by"},
                         },
                     },
                 },
@@ -1591,31 +1595,31 @@ static RPCHelpMan getmempoolinfo()
                         {RPCResult::Type::NUM, "size", "Current tx count"},
                         {RPCResult::Type::NUM, "bytes", "Sum of all virtual transaction sizes as defined in BIP 141. Differs from actual serialized size because witness data is discounted"},
                         {RPCResult::Type::NUM, "usage", "Total memory usage for the mempool"},
-                        {RPCResult::Type::STR_AMOUNT, "total_fee", "Total fees for the mempool in " + CURRENCY_UNIT + ", ignoring modified fees through prioritizetransaction"},
+                        {RPCResult::Type::STR_AMOUNT, "total_fee", "Total fees for the mempool in " + CURRENCY_UNIT + "/kvB, ignoring modified fees through prioritizetransaction"},
                         {RPCResult::Type::NUM, "maxmempool", "Maximum memory usage for the mempool"},
-                        {RPCResult::Type::STR_AMOUNT, "mempoolminfee", "Minimum fee rate in " + CURRENCY_UNIT + "/kB for tx to be accepted. Is the maximum of minrelaytxfee and minimum mempool fee"},
+                        {RPCResult::Type::STR_AMOUNT, "mempoolminfee", "Minimum fee rate in " + CURRENCY_UNIT + "/kvB for tx to be accepted. Is the maximum of minrelaytxfee and minimum mempool fee"},
                         {RPCResult::Type::STR_AMOUNT, "minrelaytxfee", "Current minimum relay fee for transactions"},
                         {RPCResult::Type::NUM, "unbroadcastcount", "Current number of transactions that haven't passed initial broadcast yet"},
                         {RPCResult::Type::OBJ, "fee_histogram", "",
                             {
-                                {RPCResult::Type::OBJ_DYN, "ranges", "Feerate groups",
+                                {RPCResult::Type::OBJ_DYN, "ranges", "",
                                 {
-                                    {RPCResult::Type::OBJ, "<feerate-group>", "Object per feerate group",
+                                    {RPCResult::Type::OBJ, "<fee_rate_group>", "Fee rate group named by its lower bound (in " + CURRENCY_ATOM + "/vB), identical to the \"from\" field below",
                                     {
-                                        {RPCResult::Type::NUM, "size", "Cumulated size of all transactions in feerate group"},
-                                        {RPCResult::Type::NUM, "count", "Amount of transactions in feerate group"},
-                                        {RPCResult::Type::NUM, "fees", "Cumulated fees of all transactions in feerate group (in satoshis)"},
-                                        {RPCResult::Type::NUM, "from_feerate", "Group contains transactions with feerates equal or greater than this value (in satoshis)"},
-                                        {RPCResult::Type::NUM, "to_feerate", "Group contains transactions with feerates less than this value (in satoshis)"},
+                                        {RPCResult::Type::NUM, "size", "Cumulative size of all transactions in the fee rate group"},
+                                        {RPCResult::Type::NUM, "count", "Number of transactions in the fee rate group"},
+                                        {RPCResult::Type::NUM, "fees", "Cumulative fees of all transactions in the fee rate group (in " + CURRENCY_ATOM + "/vB)"},
+                                        {RPCResult::Type::NUM, "from", "Group contains transactions with fee rates equal or greater than this value (in " + CURRENCY_ATOM + "/vB)"},
+                                        {RPCResult::Type::NUM, "to", "Group contains transactions with fee rates less than this value (in " + CURRENCY_ATOM + "/vB)"},
                                     }}}},
-                                {RPCResult::Type::NUM, "total_fees", "Total available fees in mempool (in satoshis)"},
+                                {RPCResult::Type::NUM, "total_fees", "Total available fees in mempool (in " + CURRENCY_ATOM + "/vB)"},
                             }},
                     }},
                 RPCExamples{
                     HelpExampleCli("getmempoolinfo", "") +
-                    HelpExampleCli("getmempoolinfo", R"([1, 10, 100, 200, 400, 800])") +
+                    HelpExampleCli("getmempoolinfo", R"('[1, 5, 10, 25, 50, 100]')") +
                     HelpExampleRpc("getmempoolinfo", "") +
-                    HelpExampleRpc("getmempoolinfo", R"([1, 10, 100, 200, 400, 800])")
+                    HelpExampleRpc("getmempoolinfo", R"([1, 5, 10, 25, 50, 100])")
                 },

diff --git a/test/functional/mempool_fee_histogram.py b/test/functional/mempool_fee_histogram.py
index 0cbaa97bfc..91c6dccc30 100755
--- a/test/functional/mempool_fee_histogram.py
+++ b/test/functional/mempool_fee_histogram.py
@@ -31,15 +31,15 @@ class MempoolFeeHistogramTest(BitcoinTestFramework):
         total_fees = 0
 
         for key, bin in info['fee_histogram']['ranges'].items():
-            assert_equal(int(key), bin['from_feerate'])
+            assert_equal(int(key), bin['from'])
             if bin['fees'] > 0:
                 assert_greater_than(bin['count'], 0)
             else:
                 assert_equal(bin['count'], 0)
             assert_greater_than_or_equal(bin['fees'], 0)
-            assert_greater_than_or_equal(bin['sizes'], 0)
-            if bin['to_feerate'] != 'Max':
-                assert_greater_than(bin['to_feerate'], bin['from_feerate'])
+            assert_greater_than_or_equal(bin['size'], 0)
+            if bin['to'] != 'max':
+                assert_greater_than(bin['to'], bin['from'])
             total_fees += bin['fees']

src/rpc/blockchain.cpp

kiminuo · 2021-03-26T09:07:05Z

It would be good if the functional test actually verified that the values are correctly calculated after creating a few txns; ATM it is only really a smoke test that verifies the output structure.

This is on my TODO list. So I will improve it over coming days.

src/rpc/blockchain.cpp

maflcko · 2023-03-21T07:41:26Z

How can we expect the Bitcoin Core wallet to behave reasonably if it can't predict the fees well?

The Bitcoin Core wallet does not use this feature?

jonasschnelli · 2023-03-21T08:15:20Z

The original idea of this patch was to add a small and easy to maintain efficient way to generate a fee histogram to bitcoin core (see original PR description #15836 (comment)).

As described there, it was always possible to use getrawmempool (@jhoenicke script) to achieve this goal.

However, it is not efficient to dumb out all transactions of the complete mempool in a json format just to calculate the fee histogram with a python script.

Thus I think this is a valuable addition to Bitcoin Core.

The patch size is minimal and it seems like there is some interest in this feature.

EDIT: this PR has nothing to do with the Bitcoin cores internal wallet. It's pure mempool statistics.

kristapsk · 2023-03-21T09:43:50Z

How can we expect the Bitcoin Core wallet to behave reasonably if it can't predict the fees well?

The Bitcoin Core wallet does not use this feature?

Does not use yet?

prusnak · 2023-03-21T10:03:24Z

The Bitcoin Core wallet does not use this feature?

My point is that a wallet without fee estimation is not very usable and harms the overall network. If we don't want to have an effective way of determining the fee in Bitcoin Core, we might as well remove the entire wallet from the codebase.

glozow · 2023-03-21T10:19:36Z

There seems to be confusion here. This PR does not touch the fee estimator or the wallet in any way. The fee estimator already keeps track of how many transactions at specific feerates {in the mempool, have confirmed within N blocks, haven't confirmed within N blocks} there are internally; the wallet queries it directly for feerate estimates. I don't know of any examples where the wallet needs to see a graph of something.

If you feel that the fee estimation algorithms need improvement, by all means I'd say some review/PRs there would be welcome. Maybe take a look at #25380 or #21161.

For those that feel very passionately about improving how the wallet calculates fees, my personal recommendation would be to review #26152 to have the strongest impact.

0xB10C · 2023-03-21T10:21:47Z

It's confusing to me why this PR, tagged as [RPC/REST/ZMQ] and only touching src/rest.cpp and files in src/rpc/*, should have anything to do with the wallet or the GUI. AFAIK the wallet doesn't use the RPC interface to communicate with bitcoind. If the wallet or the GUI would want a histogram, it couldn't use the code added in this PR.

edit: glozow was 1min faster

kristapsk · 2023-03-21T10:27:45Z

If the wallet or the GUI would want a histogram, it couldn't use the code added in this PR.

That would require just moving fee histogram code out of MempoolInfoToJSON() to a separate function somewhere out of rpc/, code itself seems to me reusable.

sipa · 2023-03-21T11:54:04Z

Bitcoin Core has had fee estimation logic for a long time (estimatefee RPC was added in 2014, replaced with estimatesmartfee in 2015).

The Bitcoin Core wallet automatically uses this internal feerate estimation logic to decide on the fees of transactions. You need to override things, or use more low-level RPCs, to not use it.

That feerate estimation logic works by looking at how quickly mempool transactions get confirmed, over longer time windows, not by looking at the current mempool composition (looking at just your own mempool may give very bad estimates if your relay policy isn't exactly what the network and miners use, which is very hard to guarantee).

This PR is about exposing mempool feerate histogram data through RPC. It's not a feerate estimator, and is entirely unrelated to the wallet. While the logic could in theory be used for feerate estimation too, that's not what this PR does, and I'm concerned that doing so would risk giving very inaccurate results when faced with diverging network policies.

Conceptually, however, I think it's reasonable to add something like this, as just RPC (or GUI) output. It can be informative for users, even if it can't be directly used for automatic fee estimation. Of course, that's subject to reviewer interest.

ghost · 2023-03-21T17:06:37Z

I am not against sipa as I understand fee estimation is difficult however estimatesmartfee sucks and could be improved.

There are lot of occasions when I found it to be irrelevant and just look at mempool.space or multiple explorers

One of the project that I found more useful: https://github.com/TrueLevelSA/btc-congestion-manager

Anyway, this PR tries to improve how projects use mempool info and should not have been controversial at any point.

Request to all developers including maintainers to re consider their opinion.

nopara73 · 2023-03-22T07:54:42Z

For Wasabi's purposes this output is needed purely for fee estimation, so it'd be ideal if Core would estimate fees properly. Here's how I see things:

estimatesmartfee estimates fees based on the PAST
This PR enables us to put minimums on Core's fee estimations. Eg. if Core estimates 6 hours at 10 sat/b, based on the past, but the mempool is already filled with 12 hours of 11 sat/b estimations, then what we do (currently with Knots) is that based on the mempool histogram we upgrade Core's estimations accordingly. This'd mean we're finally considering the PRESENT
Finally, for future work we can also aim somehow at predicting the FUTURE (neuro net, AI stuff? Update: I think this repo by @djkazic wanna do something like this: https://github.com/djkazic/matrix)

sipa · 2023-03-22T13:36:53Z

@nopara73 I'm aware of estimatesmartfee's limitations, and understand the desire for something based on current mempool composition that can react faster to current conditions. The problem is that looking your own mempool only gives a reliable result in case you're running with policies that more or less match what miners are doing, and if not, becomes easily gameable. That is the reason why estimatesmartfee intentionally does not use mempool composition.

As I said, I think an RPC like this is valuable for people who want additional information for fee estimation, but using it automatically without at least someone looking at the result seems dangerous to me. E.g. future softforks your node doesn't know about could permit an attacker to stuff your mempool with high-fee unminable transactions.

prusnak · 2023-03-22T14:25:27Z

@sipa By not providing sort-of accurate estimations from the current mempool, we force people to rely on third-party estimators, which is arguably much worse than having an independent estimation which works most of the time, although it may be games under certain specific conditions. I think that "perfect is enemy of good" rule applies here perfectly.

nopara73 · 2023-03-30T04:10:34Z

@sipa sorry for the late reply, at first I was like "yes, that makes perfect sense," however an argument came to my mind - this morning during shower :) - in favor of it, so I'd like to share that.

Can't we assume miners are seeking the highest fee transactions at all times? In other words: maximize their income from fees. If we could, then adjusting estimatesmartfee results based on the possibility of block acceptance of the current mempool state of the node does not seem gameable to me, because by extension we could assume that the highest fee transactions we have are precisely the ones that the miners seek.

Now I may argue against this with "it is not guaranteed that the miners have the highest fee txs that my node does as well."

To this, first I'd say: we can guarantee they want to have them and isn't that enough?
Then I'd turn this around and look at it from an attacker's point of view: the way to game this would be to create transactions those have high fees, get to the targeted node, but don't get to the miners. And now I'd point out the attacker cannot guarantee his high fee transactions don't get to the miners, especially with fullrbf around. This makes the attack super expensive, since in order to make the node think the fees are very high, the attacker has to create many blocks of transactions and guarantee that only the targeted node will get to know of these transactions.

In summary, unless I am missing the point and you're referring to another kind of game/attack here that does not involve the attacker creating blocks of high fee transactions, we can be reassured the attack is too expensive to be worth trying.

craigraw · 2023-06-27T12:48:15Z

Using getrawmempool true is unfortunately not a solution that can be applied generally to solve this use case. On systems with low memory (4Gb or less, for example a Raspberry Pi or the Futurebit Apollo) this RPC call causes the kernel to terminate the Bitcoin Core process (OOM) when the mempool is reasonably full. This occurs consistently on all released versions AFAIK.

One workaround is to use getrawmempool false followed by repeated calls to getmempoolentry. This of course is even more inefficient, leading any reasonable client implementation to maintain a replica of mempool entries to avoid having to load them all, one by one, every time a fee rate histogram is required. This trades better performance for significantly increased memory utilisation.

It's a pity the PR has been closed since the complexity is low, there is clearly wide interest for it across many wallets, and a fee rate histogram is a useful tool in the toolbox when estimating fees. In addition, any middleware implementing the Electrum protocol needs to use one of the inefficient methods described to implement the mempool.get_fee_histogram call.

Request to all developers including maintainers to re consider their opinion.

+1.

Co-authored-by: Jonas Schnelli <dev@jonasschnelli.ch> Co-authored-by: Jon Atack <jon@atack.com> Github-Pull: bitcoin#21422 Rebased-From: 0b87ba9

sipa · 2023-06-28T17:47:41Z

I think the discussion here is conflating two things:

Problems with Bitcoin Core's fee estimation algorithm.
The lack of a fee histogram RPC.

It's clear that the existing fee estimation algorithms make it inaccurate for many real use cases. There are good reasons for why it works the way it does, but that doesn't change the fact that the outcome is bad. We can certainly discuss what can be done about that. I've posted about one possible approach in #27995.

However, this PR is about something else; adding a feerate histogram. That seems to me like a vaguely reasonable, independent feature, but it has nothing to do with our fee estimation algorithm. And wanting a feerate histogram so you can implement fee estimation yourself just because the fee estimation feature itself does a bad job seems to be the wrong way around. If that's the goal we should just think about how to improve the algorithm.

craigraw · 2023-06-29T06:12:00Z

To be clear, my desire for an efficient way to retrieve fee rate histogram data has nothing to do with Bitcoin Core's fee estimation algorithm. Because fee estimation is inherently difficult and no algorithm can know the user's time preference for every transaction, Sparrow Wallet displays the fee rate histogram chart in the UI as another "tool in the toolbox". This allows a user to observe trends and adjust the fee rate to place a transaction in a particular fee rate band to suit their time preference. This is not to suggest such an approach is always superior to algorithmic fee rate estimation, merely that more data is generally useful when trying to predict the future.

maflcko · 2023-06-29T06:32:25Z

One workaround is to use getrawmempool false followed by repeated calls to getmempoolentry. This of course is even more inefficient, leading any reasonable client implementation to maintain a replica of mempool entries to avoid having to load them all, one by one, every time a fee rate histogram is required. This trades better performance for significantly increased memory utilisation.

I think the getmempoolentry can be batched for a slight improvement. Also, you don't have to store the full entry in the application, just the txid and fee info would be enough? Finally, you'd only have to call getmempoolentry once per transaction.

Even if this pull was merged, it will just give one answer and I don't think it is clear whether this is the answer that users actually want? See #21422 (comment) and the previous discussion threads.
So anyone who wanted to give a better answer would have to implement some smarter algorithm in the application on a full copy of the mempool anyway, and this RPC wouldn't help with that.

glozow · 2023-06-29T09:30:40Z

Sparrow Wallet displays the fee rate histogram chart in the UI as another "tool in the toolbox". This allows a user to observe trends and adjust the fee rate to place a transaction in a particular fee rate band to suit their time preference.

a fee rate histogram is a useful tool in the toolbox when estimating fees

I think it's worth noting that a feerate histogram built from individual transaction feerates can be very misleading for fee estimation. A CPFP could look like n low feerate data points + 1 extremely high feerate data point. A few dozen 100kvB high-feerate transactions may dominate the block you are trying to get into, but they look identical to 100vB transactions as data points. As a user, I have no idea how I would look at a feerate histogram and get an idea of what feerate to put on my transaction.

I agree that fee estimation using the current contents of the mempool can be very useful. However, the process of taking current mempool contents and meaningfully translating them into high/med/low priority bands is quite involved. Linking one implementation which is a lot more accurate than using individual feerates, but quite computationally heavy and not necessarily appropriate for the oom'd raspis either.

Perhaps with cluster mempool we can expose a mempool entry's mining score, but that score still doesn't tell you much about how much block space is taken up by each data point.

leading any reasonable client implementation to maintain a replica of mempool entries

This is the most appropriate solution 👍. A simple map from txid to feerate would do if you're trying to keep a feerate histogram.

Also, you don't have to store the full entry in the application, just the txid and fee info would be enough? Finally, you'd only have to call getmempoolentry once per transaction.

Yeah, the nice thing about only needing fee and vsize is these will never change for a given txid in your mempool. You only need to query the entries you don't already have in your external data structure.

I think the getmempoolentry can be batched for a slight improvement.

On Core's side this idea seems the most worth pursuing, if somebody wants to open a pull.

djkazic · 2023-06-29T11:09:47Z

Hey, developer of matrix here.

I think a neural net fee estimator is going to be pretty performant compared to mempool space. My logger already takes into account things like CPFP and calculates the total package fee rate, and with approx 50MB of training data it's capable of estimating fees such that if the necessary rate goes down, it reacts more quickly that estimatesmartfee.

Like the built-in fee estimator, I do not use the current contents of the mempool. Rather, the training data packages up the time of tx seen, block count from seen to confirmed, and total package feerate. When it comes to inference-time, lighter nodes like RPis can download and validate a pretrained model. Hence, no fear of OOMing.

craigraw · 2023-06-29T14:33:58Z

The points regarding CPFP etc are well noted, although it's possible these are in practice more applicable to higher time preference fee estimations? Estimation aside, there are not insignificant client costs to maintaining a "txid to vsize and fee rate" mempool replica. By my measurement, this data structure with ~125,000 transactions requires ~14Mb of RAM.

Further, the initial load using getmempoolentry takes around 20 seconds on a locally running node, and would certainly be slower if the node was remote or running on an RPi. Because of this, I find it better to measure the time required for getrawmempool false and make a decision on whether to risk calling getrawmempool true instead of the getmempoolentry workaround. If a fee rate histogram RPC is not likely, I'd certainly be interested in a better solution to this.

maflcko · 2023-06-29T15:03:31Z

By my measurement, this data structure with ~125,000 transactions requires ~14Mb of RAM.

If memory usage is the only concern, I am sure you could toss the transactions into (lossy) buckets and then subscribe via zmq to avoid duplicates, without having to remember txids, or individual size/fee details.

... applicable to higher time preference fee estimations?

If you only care about the top N blocks worth of transactions, the memory usage likely could also be reduced.

santochibtc · 2023-06-30T20:43:12Z

I don't see the use of the histogram for fee estimation. You can have a lot of transactions with minimum fee that generate a large bar dominating the histogram. I see more useful a representation like mempool.space showing the list of transactions ordered by decreasing fee and grouped by blocks. That is a list of pending blocks with associated feerate ranges.

sipa · 2023-07-06T21:17:35Z

@santochibtc "Feerate histogram" here is a bytes vs fee diagram, not a #tx vs fee diagram.

santochibtc · 2023-07-07T17:51:06Z

Thanks @sipa, yes, it is "a histogram of the fee rates paid by transactions in the memory pool, weighted by transaction size", but my point is still valid. If you want to include your tx in the next block you must estimate your fee based on the pending txs that are paying more per byte. I think that the histogram is not always a good picture of the mempool to estimate fees.

nopara73 · 2023-07-08T08:02:39Z

It's good enough.

Co-authored-by: Jonas Schnelli <dev@jonasschnelli.ch> Co-authored-by: Jon Atack <jon@atack.com> Github-Pull: bitcoin#21422 Rebased-From: 0b87ba9

DrahtBot added the RPC/REST/ZMQ label Mar 12, 2021

kiminuo force-pushed the feature/2021-03-Feerate-histogram branch from e4abe8f to d114770 Compare March 12, 2021 13:43

kiminuo mentioned this pull request Mar 12, 2021

Add feerate histogram to getmempoolinfo #15836

Closed

DrahtBot mentioned this pull request Mar 16, 2021

refactor: replace util::Ref with std::any (C++17) #21366

Merged

kiminuo force-pushed the feature/2021-03-Feerate-histogram branch 3 times, most recently from 476f660 to 81cd4bb Compare March 21, 2021 09:55

jonatack reviewed Mar 21, 2021

View reviewed changes