Coin Selection with Murch's algorithm #10637
Xekyo
suggested changes
Jun 20, 2017
Concept ACK, looks good to me. AFAICT, there's still an issue with the lookahead that needs to be addressed.
+ // Calculate remaining
+ CAmount remaining = 0;
+ for (CInputCoin utxo : utxo_pool) {
+ remaining += utxo.txout.nValue;
Xekyo
Jun 20, 2017
Contributor
Have you filtered utxo_pool to exclude UTXOs that have a net-negative value? Otherwise you're underestimating the lookahead here. To get an accurate figure for what you may still collect further down the tree, you should only add a UTXO's value when utxo.txout.nValue >= 0.
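A minimal sketch of the suggested lookahead, assuming the pool is reduced to a plain list of effective values in satoshis (the PR itself iterates over CInputCoin objects, and `LookaheadRemaining` is a hypothetical name):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for Bitcoin Core's CAmount (an amount in satoshis).
using CAmount = std::int64_t;

// Sum of the effective values that could still be collected further down
// the tree. Net-negative entries are skipped: adding them would only lower
// the sum, so the lookahead would underestimate what is reachable and could
// prune branches that can still hit the target.
CAmount LookaheadRemaining(const std::vector<CAmount>& effective_values) {
    CAmount remaining = 0;
    for (const CAmount value : effective_values) {
        if (value >= 0) {
            remaining += value;
        }
    }
    return remaining;
}
```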
instagibbs
Jun 20, 2017
Member
@gmaxwell is concerned that the Core wallet only achieves semi-sane UTXO handling by spending these. With exact matching plus a sane backoff algorithm, might this concern be alleviated?
achow101
Jun 20, 2017
Contributor
Indeed, that may be a problem. I will add that check, as it is still good to have additional checks here even if this is done elsewhere.
gmaxwell
Jun 20, 2017
Member
I don't have much of a concern here about the 0/negative effective value inputs: Failing to select negative effective value inputs for an exact match won't lead to a UTXO count inflation, because changeless transactions are by definition strictly UTXO reducing.
Xekyo
Jun 20, 2017
Contributor
@instagibbs: I'm not completely opposed to spending net-negative UTXO, my concern here is primarily that it actually may cause the lookahead to be underestimated causing valid solutions not to be found.
I realize now that the knapsack algorithm would also no longer select uneconomic UTXOs: if it had selected enough value before reaching them, it would already have returned the set, and if it does start exploring them, they cannot add any value in the first place.
Advocatus Diaboli: Would it be that terrible, though, if UTXOs were only considered when they actually have a net-positive value? During times of low fees, they'd be used by both BnB and knapsack; during times of high fees, they wouldn't bloat the blocks and lose their owner money.
gmaxwell
Jun 22, 2017
Member
@Xekyo we should assume that it would be terrible unless someone can show that it will not cause another massive UTXO bloat event... but that's off-topic here, as I don't think anyone has this concern with exact matches.
runn1ng
Jul 1, 2017
The UTXOs with negative effective values are filtered out anyway in wallet/wallet.cpp, which is the only place (except for tests) from which SelectCoinsBnB is called.
+ CoinSet actual_selection;
+ CAmount value_ret = 0;
+
+ /////////////////////////
Xekyo
Jun 20, 2017
Contributor
I would perhaps add a test that checks what happens if the utxo_pool includes a UTXO that is more costly to spend than its own value. As far as I can tell, this would currently reduce your lookahead and may cause a premature search failure.
+
+ // Select 5 Cent
+ add_coin(4 * CENT, 4, actual_selection);
+ add_coin(1 * CENT, 1, actual_selection);
Xekyo
Jun 20, 2017
Contributor
AFAICT utxo_pool contains 4, 3, 2, and 1. Since you're exploring randomly, selecting 5 then has two possible solutions: 4+1 and 3+2.
achow101
Jun 20, 2017
Contributor
It is forced to try inclusion first in these tests, so the solution is deterministic.
+ add_coin(5 * CENT, 5, utxo_pool);
+ add_coin(5 * CENT, 5, actual_selection);
+ add_coin(4 * CENT, 4, actual_selection);
+ add_coin(1 * CENT, 1, actual_selection);
Xekyo
Jun 20, 2017
Contributor
Under the above assumptions, there are two solutions here as well: 5+4+1 or 5+3+2.
achow101
Jun 20, 2017
Contributor
It is forced to try inclusion first in these tests, so the solution is deterministic.
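To illustrate why include-first exploration makes these tests deterministic, here is a hypothetical recursive sketch (not the PR's iterative code) over plain positive amounts sorted in descending order, as in the tests. With the pool 4, 3, 2, 1 and a target of 5 it reaches 4+1 before 3+2, and with 5, 4, 3, 2, 1 and a target of 10 it reaches 5+4+1 before 5+3+2:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using CAmount = std::int64_t;  // hypothetical stand-in for Bitcoin Core's CAmount

// Depth-first search that always tries the "include" branch before the
// "omit" branch. Returns true and fills `selection` with the first subset
// (in that exploration order) whose sum is exactly `target`.
// Assumes all pool values are positive, so a negative target can be pruned.
bool FirstExactMatch(const std::vector<CAmount>& pool, std::size_t depth,
                     CAmount target, std::vector<CAmount>& selection) {
    if (target == 0) return true;                      // exact match found
    if (target < 0 || depth == pool.size()) return false;
    selection.push_back(pool[depth]);                  // include branch first
    if (FirstExactMatch(pool, depth + 1, target - pool[depth], selection)) {
        return true;
    }
    selection.pop_back();                              // then the omit branch
    return FirstExactMatch(pool, depth + 1, target, selection);
}
```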
- LogPrint(BCLog::SELECTCOINS, "total %s\n", FormatMoney(nBest));
- }
+ CInputCoin coin(pcoin, i);
+ coin.txout.nValue -= (output.nInputBytes < 0 ? 0 : effective_fee.GetFee(output.nInputBytes));
Xekyo
Jun 20, 2017
Contributor
It seems to me that you're also collecting coins that have a net-negative here. This will cause your lookahead to be underestimated, unless you cater to that case when calculating the remainder.
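For reference, a sketch of the effective-value computation from the diff together with the net-negative filter discussed above. It assumes a simplified coin type and a feerate in sat/byte rather than Core's `CFeeRate` (which is expressed per kilobyte); `Coin`, `EffectiveValue`, and `PositiveEffectiveValues` are hypothetical names:

```cpp
#include <cstdint>
#include <vector>

using CAmount = std::int64_t;  // hypothetical stand-in for Bitcoin Core's CAmount

// Hypothetical simplified coin: nominal value and estimated spending size.
struct Coin {
    CAmount value;    // satoshis
    int input_bytes;  // estimated size of the input that spends it; < 0 if unknown
};

// Effective value = nominal value minus the fee for spending the input.
// Like the diff, no fee is subtracted when the input size is unknown.
CAmount EffectiveValue(const Coin& coin, CAmount feerate_sat_per_byte) {
    const CAmount input_fee =
        coin.input_bytes < 0 ? 0 : feerate_sat_per_byte * coin.input_bytes;
    return coin.value - input_fee;
}

// Keep only strictly positive effective values, so the search (and its
// lookahead) never sees net-negative inputs.
std::vector<CAmount> PositiveEffectiveValues(const std::vector<Coin>& coins,
                                             CAmount feerate_sat_per_byte) {
    std::vector<CAmount> out;
    for (const Coin& coin : coins) {
        const CAmount ev = EffectiveValue(coin, feerate_sat_per_byte);
        if (ev > 0) out.push_back(ev);
    }
    return out;
}
```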
Have you tested the effect of Random Exploration vs. Largest First Exploration?
I am not sure there is a significant privacy benefit to Random Exploration, as with either selection method an attacker would already need to know about another eligible input that would achieve an exact match when swapped in for one of the selected inputs. What benefit do you expect from using Random Exploration?
@Xekyo I was thinking that Random Exploration would be better for privacy, but I see that it probably wouldn't help. If you think it would be better to switch to Largest First Exploration (LFE), I can certainly do that.
@achow101: I don't know how strong the effect is, but I'd expect Random Exploration to increase the required computational effort.
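The two exploration orders being compared differ only in how the pool is ordered before the include-first search starts. A hypothetical sketch over plain amounts (the function names are illustrative, not from the PR):

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <random>
#include <vector>

using CAmount = std::int64_t;  // hypothetical stand-in for Bitcoin Core's CAmount

// Largest First Exploration: sort descending so the include-first search
// tries the largest amounts first.
void OrderLargestFirst(std::vector<CAmount>& pool) {
    std::sort(pool.begin(), pool.end(), std::greater<CAmount>());
}

// Random Exploration: shuffle the pool before searching.
void OrderRandom(std::vector<CAmount>& pool, std::mt19937& rng) {
    std::shuffle(pool.begin(), pool.end(), rng);
}
```

With Largest First Exploration the same pool always yields the same search order, which also keeps test expectations reproducible.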
Noting that this PR has fairly heavy overlap with #10360. From chatting with @achow101, the intention of this PR is to touch as little as possible while still getting Branch and Bound coin selection. To make this successful, it should really only be run on the first iteration of the loop in CreateTransaction, when
I believe this PR will still create just-over-dust change outputs when BnB finds an exact match. Whenever we allow BnB matches (first iteration), we should not create change outputs smaller than the exact-match slack value.
+ // Calculate cost of change
+ // TODO: In the future, we should use the change output actually made for the transaction and calculate the cost
+ // required to spend it.
+ CAmount cost_of_change = effective_fee.GetFee(148+34); // 148 bytes for the input, 34 bytes for making the output
Xekyo
Jun 20, 2017
Contributor
This assumes that the input will be spent at a feerate at least as high as the current one. This was a valid assumption in my thesis, as I was using a fixed fee rate. I'm not sure whether it is a valid assumption for coin selection on the real network, as we've literally seen fees between 8 and 540 sat/byte in the past two weeks. We might want to consider discounting the cost of the input slightly.
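To put numbers on that spread, here is the diff's `cost_of_change` formula (148-byte P2PKH input plus 34-byte output) evaluated at both ends of the observed range, treating `GetFee` simply as feerate times size with the feerate $r$ in sat/byte:

```latex
\begin{aligned}
\text{cost\_of\_change}(r) &= r \cdot (148 + 34) = 182\,r \\
\text{cost\_of\_change}(8~\text{sat/byte}) &= 182 \cdot 8 = 1\,456~\text{sat} \\
\text{cost\_of\_change}(540~\text{sat/byte}) &= 182 \cdot 540 = 98\,280~\text{sat}
\end{aligned}
```

So the exact-match window moved by a factor of 540 / 8 = 67.5 over that period.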
instagibbs
Jun 20, 2017
Member
Depends on user time preferences. Could be an option that is set for those who regularly consolidate.
@instagibbs: In fact, BnB is designed to work only when creating a transaction without a change output. If we were creating a change output in the first place, the extensive search would be unnecessarily wasteful.
To add to my previous comments, any effective value match attempt should account for the fees just obtained by
fanquake added the Wallet label Jun 20, 2017
I have made the BnB selector run only on the first pass of the coin selection loop. Effective values are now used only for the BnB selector and not for the knapsack one. I have also added the negative effective value check and a test, just as a belt-and-suspenders measure. I also made BnB use Largest First Exploration instead of Random Exploration.
+ backtrack = true;
+ } else if (value_ret >= target_value) { // Selected value is within range
+ done = true;
+ } else if (tries <= 0) { // Too many tries, exit
Xekyo
Jun 21, 2017
Contributor
Here's an unexpected behavior in my algorithm: if, when tries reaches 0, the search is exploring a number of input combinations whose value_ret all exceed target_value, the earlier backtrack branch is taken instead of the tries check, so tries can go negative.
The tries check should be moved to the top of the checks.
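A hypothetical, simplified sketch of the search loop with the tries check moved to the top, as suggested. It is not the PR's code: it works on plain effective values assumed positive and sorted in descending order, always tries inclusion first, accepts any total in [target, target + cost_of_change], and keeps only one inclusion flag per entry plus the current depth. `SelectCoinsBnBSketch` and its parameters are illustrative names:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using CAmount = std::int64_t;  // hypothetical stand-in for Bitcoin Core's CAmount

bool SelectCoinsBnBSketch(const std::vector<CAmount>& pool, CAmount target,
                          CAmount cost_of_change, int max_tries,
                          std::vector<bool>& selected) {
    selected.assign(pool.size(), false);
    CAmount remaining = 0;  // sum of entries not yet decided on (lookahead)
    for (const CAmount v : pool) remaining += v;
    CAmount value = 0;      // sum of entries decided as "include"
    std::size_t depth = 0;  // entries [0, depth) have been decided

    for (int tries = max_tries; ; --tries) {
        // Budget check first, so tries can never go negative.
        if (tries <= 0) return false;

        const bool overshot = value > target + cost_of_change;
        const bool cannot_reach = value + remaining < target;
        if (!overshot && !cannot_reach) {
            if (value >= target) return true;  // inside the exact-match window
            // Include branch first. remaining > 0 here, so depth < pool.size().
            selected[depth] = true;
            value += pool[depth];
            remaining -= pool[depth];
            ++depth;
            continue;
        }

        // Backtrack: un-decide trailing omitted entries...
        while (depth > 0 && !selected[depth - 1]) {
            --depth;
            remaining += pool[depth];
        }
        if (depth == 0) return false;  // whole tree explored
        // ...then switch the last included entry to the omit branch.
        selected[depth - 1] = false;
        value -= pool[depth - 1];
    }
}
```

Checking the budget before anything else means tries can no longer drop below zero, and the separate `done` and `backtrack` flags turn into early returns and a fall-through to the backtracking step.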
Perhaps, generically, we should never create change if the amount is less than the cost of creating and spending it (regardless of whether BnB was used to find the inputs or not)?
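A hypothetical sketch of that rule, assuming `leftover` is the excess after paying the recipients and the fee of the changeless transaction, and (for brevity) ignoring the extra fee the change output itself would add. `DecideChange` and `ChangeDecision` are illustrative names:

```cpp
#include <cstdint>

using CAmount = std::int64_t;  // hypothetical stand-in for Bitcoin Core's CAmount

// Either a change output is made, or the leftover is given up as extra fee.
struct ChangeDecision {
    CAmount change;     // value of the change output (0 = no change output)
    CAmount extra_fee;  // leftover added to the fee instead
};

// Never create change worth less than it costs to create and later spend.
ChangeDecision DecideChange(CAmount leftover, CAmount cost_of_change) {
    if (leftover >= cost_of_change) {
        return {leftover, 0};
    }
    return {0, leftover};
}
```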
@sipa one question is whether we should allow the wallet to consider consolidation-level fee rates for that change. Perhaps the user is in a hurry now, but would be willing to spend that change at a much slower pace. Maybe for a first pass we only consider the selected feerate, and as future work add a parameter that gives more aggressive change protection over longer timescales.
@instagibbs Yes, I agree; we should use long-term estimates for the spending part of the change rather than the actual feerate the user is willing to pay now. Perhaps we can make it more conservative without doing that by reducing it by a factor of 2 or 3?
+ // Calculate cost of change
+ // TODO: In the future, we should use the change output actually made for the transaction and calculate the cost
+ // required to spend it.
+ CAmount cost_of_change = effective_fee.GetFee(148+34); // 148 bytes for the input, 34 bytes for making the output
gmaxwell
Jun 22, 2017
Member
This is not correct for segwit. If this code ends up being changed to follow Pieter's suggestion of dividing the rate by two or three, the result should be bounded below by the min relay fee. (I'm not super fond of that suggestion.)
@sipa @achow101 it would be very easy in the current PR to ask for another fee estimate for the change: I think it's roughly a two-line addition, plus a minor addition to the SelectCoins arguments to pass down a second fee rate. I think this would be much more desirable than a fixed division. Future work could do things like make that second confirmation target configurable.
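A sketch of that two-feerate variant, with the long-term rate bounded below by the minimum relay feerate as suggested above. The function name, sat/byte units, and parameters are assumptions for illustration; the sizes are passed in because the hard-coded 148 + 34 bytes only fit P2PKH (a P2WPKH output is 31 bytes and its input roughly 68 vbytes):

```cpp
#include <algorithm>
#include <cstdint>

using CAmount = std::int64_t;  // hypothetical stand-in for Bitcoin Core's CAmount

// The change output is paid for now at the current feerate; the input that
// later spends it is priced at a separate long-term feerate, never below the
// minimum relay feerate. Feerates in sat/byte, sizes in (virtual) bytes.
CAmount CostOfChange(CAmount current_feerate, CAmount long_term_feerate,
                     CAmount min_relay_feerate, int output_bytes, int input_bytes) {
    const CAmount spend_feerate = std::max(long_term_feerate, min_relay_feerate);
    return current_feerate * output_bytes + spend_feerate * input_bytes;
}
```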
@@ -2562,7 +2562,7 @@ bool CWallet::CreateTransaction(const std::vector<CRecipient>& vecSend, CWalletT
 }
 const CAmount nChange = nValueIn - nValueToSelect;
- if (nChange > 0)
+ if (nChange > 0 && (!first_pass || nFeeRet == 0)) // nFeeRet is only 0 on the first pass if BnB was not used.
gmaxwell
Jun 22, 2017
Member
Using nFeeRet to signal BnB usage is ugly. I think you shouldn't pass in nFeeRet at all, but have some explicit signal (e.g. a boolean return value) for BnB usage; if it's set, set nFeeRet to nChange after SelectCoins, and use the same signal to bypass this branch.
I also think this condition is slightly incorrect, though benign in the current code: let's say our configured feerate were zero; then BnB could find a solution and leave nFeeRet == 0. (Though nChange would currently be zero too, so it would be harmless, but it seems like the kind of thing that would become brittle with future changes.)
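A sketch of the explicit signal suggested above, with hypothetical type and function names: the selector reports `used_bnb` directly, and the caller routes the whole leftover of a BnB selection to the fee instead of inferring BnB use from `nFeeRet == 0`:

```cpp
#include <cstdint>

using CAmount = std::int64_t;  // hypothetical stand-in for Bitcoin Core's CAmount

// Hypothetical selection result carrying an explicit BnB flag.
struct SelectionResult {
    bool success;
    bool used_bnb;
    CAmount value_in;  // total value of the selected inputs
};

struct LeftoverSplit {
    CAmount fee;     // leftover taken as fee (the "nFeeRet = nChange" case)
    CAmount change;  // leftover that is a candidate for a change output
};

LeftoverSplit SplitLeftover(const SelectionResult& result, CAmount value_to_select) {
    const CAmount leftover = result.value_in - value_to_select;
    if (result.used_bnb) {
        return {leftover, 0};  // BnB match: never make change
    }
    return {0, leftover > 0 ? leftover : 0};
}
```

With a zero feerate, a BnB match now simply yields a zero fee and no change, without relying on nFeeRet as a sentinel.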
The Travis failure seems to be unrelated.
This was referenced Jun 30, 2017
runn1ng
commented
Jul 2, 2017
Just FYI, I have used your code as a reference for this code
runn1ng
commented
Jul 2, 2017
I have to say, I don't understand the target size; maybe there is a bug there. In wallet.cpp, in This is then used as the exact target in the BnB algorithm. However, you should add the cost of the outputs plus the small cost of the transaction overhead to the target (this is done here for the simple case of one output: https://github.com/Xekyo/CoinSelectionSimulator/blob/master/src/main/scala/one/murch/bitcoin/coinselection/StackEfficientTailRecursiveBnB.scala#L28). Maybe it's done somewhere, but I don't see it.
@runn1ng BnB uses effective values for the inputs, so the fee is accounted for when coins are selected. The effective values are calculated in
runn1ng
commented
Jul 2, 2017
That effective value accounts for the inputs of the new transaction, but not for the outputs (plus the overhead of the transaction itself, but that is only about 10 bytes). In
Ah, yes. That is a bug. Thanks for finding that!
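A sketch of the corrected target, assuming (as described above) that inputs are already counted at their effective values, so only the recipients' amounts, the output fees, and the roughly 10-byte transaction overhead need to be added. `BnBTarget` and the sat/byte feerate are assumptions:

```cpp
#include <cstdint>
#include <vector>

using CAmount = std::int64_t;  // hypothetical stand-in for Bitcoin Core's CAmount

// Target for the BnB search: payments plus the fee for the outputs and the
// fixed transaction overhead. Input fees are not added here because the
// inputs are already measured by their effective values.
CAmount BnBTarget(const std::vector<CAmount>& payment_amounts,
                  const std::vector<int>& output_bytes,
                  int overhead_bytes, CAmount feerate_sat_per_byte) {
    CAmount target = 0;
    for (const CAmount amount : payment_amounts) target += amount;
    int bytes = overhead_bytes;
    for (const int size : output_bytes) bytes += size;
    return target + feerate_sat_per_byte * bytes;
}
```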
instagibbs referenced this pull request Jul 3, 2017
Merged: Add change output if necessary to reduce excess fee #10712
instagibbs
reviewed
Jul 3, 2017
In general the semantics of first_run and used_bnb seem tightly linked, and are seemingly used interchangeably. Perhaps something to revisit.
@@ -2517,6 +2532,9 @@ bool CWallet::CreateTransaction(const std::vector<CRecipient>& vecSend, CWalletT
 fFirst = false;
 txout.nValue -= nFeeRet % nSubtractFeeFromAmount;
 }
+ } else if (first_pass){
+ // On the first pass BnB selector, include the fee cost for outputs
+ output_fees += nFeeRateNeeded.GetFee(recipient.scriptPubKey.size());
instagibbs
Jul 3, 2017
Member
I think it may be better to directly check on serialized size of an output based on that pubkey
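For illustration, the serialized size of an output is its 8-byte value, a CompactSize length prefix, and the scriptPubKey itself, so `scriptPubKey.size()` alone undercounts by at least 9 bytes (25 vs. 34 bytes for P2PKH). A standalone sketch with hypothetical helper names; Core's own serialization code would be the natural thing to call instead:

```cpp
#include <cstddef>

// Length of the CompactSize prefix encoding n.
std::size_t CompactSizeLength(std::size_t n) {
    if (n < 253) return 1;
    if (n <= 0xFFFF) return 3;
    if (n <= 0xFFFFFFFFULL) return 5;
    return 9;
}

// 8-byte value + CompactSize(script length) + script bytes.
std::size_t SerializedOutputSize(std::size_t script_pubkey_size) {
    return 8 + CompactSizeLength(script_pubkey_size) + script_pubkey_size;
}
```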
 bool CWallet::SelectCoinsMinConf(const CAmount& nTargetValue, const int nConfMine, const int nConfTheirs, const uint64_t nMaxAncestors, std::vector<COutput> vCoins,
- std::set<CInputCoin>& setCoinsRet, CAmount& nValueRet) const
+ std::set<CInputCoin>& setCoinsRet, CAmount& nValueRet, CAmount& fee_ret, const CFeeRate effective_fee, bool& used_bnb, bool only_knapsack, int change_size) const
instagibbs
Jul 3, 2017
Member
right now it only uses one or the other, so !only_knapsack means used_bnb. I assume this interface is future-looking to where we may try multiple strategies?
achow101
Jul 3, 2017
Contributor
The idea behind this was to have BnB be just strictly on top of the current behavior, and separating it like this makes that possible. The first time through the loop uses BnB, but then every time after that uses only the current selector. The loop behavior also stays the same since nFeeRet will remain 0 if the BnB fails.
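A hypothetical sketch of the pass structure described above, with the selectors abstracted as callbacks. `Selector` and `RunSelectionPasses` are illustrative names, and the fee recalculation the real CreateTransaction loop performs between passes is omitted:

```cpp
#include <cstdint>
#include <functional>

using CAmount = std::int64_t;  // hypothetical stand-in for Bitcoin Core's CAmount

// Returns true on success and writes the selected input value to value_in.
using Selector = std::function<bool(CAmount target, CAmount& value_in)>;

bool RunSelectionPasses(CAmount target, const Selector& bnb, const Selector& knapsack,
                        int max_passes, CAmount& value_in, bool& used_bnb) {
    used_bnb = false;
    for (int pass = 0; pass < max_passes; ++pass) {
        if (pass == 0) {
            // First pass: BnB only. On failure nothing is changed, so the
            // following passes behave like the pre-BnB loop.
            if (bnb(target, value_in)) {
                used_bnb = true;
                return true;
            }
            continue;
        }
        // Later passes: only the existing knapsack selector.
        if (knapsack(target, value_in)) return true;
    }
    return false;
}
```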
+ // Get the fee rate to use for the change fee rate
+ CFeeRate change_feerate;
+ FeeCalculation feeCalc;
+ change_feerate = GetMinimumFeeRate(1008, ::mempool, ::feeEstimator, &feeCalc);
instagibbs
Jul 3, 2017
Member
just set it when declaring the variable two lines above and make it const
@@ -2544,6 +2490,7 @@ bool CWallet::CreateTransaction(const std::vector<CRecipient>& vecSend, CWalletT
 AvailableCoins(vAvailableCoins, true, coinControl);
 nFeeRet = 0;
+ bool first_pass = true;
instagibbs
Jul 3, 2017
Member
Add a comment saying this triggers BnB to be the only type tried when true
@@ -2556,7 +2503,22 @@ bool CWallet::CreateTransaction(const std::vector<CRecipient>& vecSend, CWalletT
 CAmount nValueToSelect = nValue;
 if (nSubtractFeeFromAmount == 0)
 nValueToSelect += nFeeRet;
+
+ // Get the fee rate to use effective values in coin selection
instagibbs
Jul 3, 2017
Member
Since we're moving it already, there's no reason to not just move this block outside the loop, right? See: https://github.com/bitcoin/bitcoin/pull/10360/files#diff-b2bb174788c7409b671c46ccc86034bdR2476
+ return false;
+ }
+ }
+ if (first_pass) {
instagibbs
Jul 3, 2017
Member
this should be used_bnb? Kind of unclear what the difference is currently.
@@ -837,7 +850,8 @@ class CWallet : public CCryptoKeyStore, public CValidationInterface
 * completion the coin set and corresponding actual target value is
 * assembled
 */
- bool SelectCoinsMinConf(const CAmount& nTargetValue, int nConfMine, int nConfTheirs, uint64_t nMaxAncestors, std::vector<COutput> vCoins, std::set<CInputCoin>& setCoinsRet, CAmount& nValueRet) const;
+ // TODO: Change the hard coded change_size when we aren't only using P2PKH change outputs
instagibbs
Jul 3, 2017
Member
if we're going to change it later to something without a default/dynamic value, maybe just get rid of the default arg and pass it each time.
@@ -962,11 +976,23 @@ class CWallet : public CCryptoKeyStore, public CValidationInterface
 */
 static CAmount GetMinimumFee(unsigned int nTxBytes, unsigned int nConfirmTarget, const CTxMemPool& pool, const CBlockPolicyEstimator& estimator, FeeCalculation *feeCalc = nullptr, bool ignoreGlobalPayTxFee = false);
 /**
+ * Estimate the minimum fee rate considering user set parameters
+ * and the required fee
instagibbs
Jul 3, 2017
Member
perhaps note it doesn't have the maxtxfee check inside it, making it slightly asymmetrical to the total fee one.
runn1ng
commented
Jul 3, 2017
@achow101 for some reason, when I run simulations either on @Xekyo's dataset (in Scala) or on randomly generated bitcoinjs data (with the algorithm rewritten in JavaScript), the total fees are actually lower when I make the target lower (that is, when I do not include the output cost in the target). So maybe tightening the target rejects more transactions, and the fallbacks then somehow produce better results. I will investigate more when I have time and post the results here: Xekyo/CoinSelectionSimulator#5
instagibbs and others added some commits May 5, 2017
@runn1ng if you wouldn't mind, I'd also like to know the difference in the rate of change creation for each of those experiments.
@runn1ng: Um wait. "Target" is the amount to be selected. We are talking about the "cost of change" parameter that gives the leniency window for the exact match, right? Also, do you mean "input cost" instead of "output cost"? It would be lovely if you could post your experiment's results somewhere, so we all have the same dataset to discuss. |
runn1ng
commented
Jul 11, 2017
@Xekyo The problem with your experiment is that it's non-deterministic... but maybe I could set a fixed random seed there.
runn1ng
commented
Jul 13, 2017
edit: ignore the graphs, see comment below
runn1ng
commented
Jul 13, 2017
edit: ignore the graphs, see comment below
runn1ng
commented
Jul 14, 2017
edit: ignore the graphs, see comment below
@runn1ng would you be able to try the strategy with Core's current selector as the fallback? The easiest way to do that would be to add or modify the test cases for coin selection.
@runn1ng: Re random data: I'd surmise that BnB doesn't perform well on small datasets, as there are too few possible combinations. That could easily cause the fallback algorithm to dominate. Re MoneyPot: I haven't comprehensively tested all possible fallback algorithms; it is possible that Largest First selection as a fallback to BnB is more efficient, as it doesn't use up as many small UTXOs that could be used to create combinations. Do I understand correctly that you calculated "cost of change" and then took a percentage of that, or is this percentage only of the cost of the input? If you did the former, it appears that using just the cost of an additional output as "cost of change" leads to a minimum, considering that 34 bytes is 18.7% of what I proposed as "cost of change" with output+input.
runn1ng
commented
Jul 15, 2017
That is weird indeed. I am running code from your repo. To be sure, I reverted all my local changes, and I still get 72506973. When I make the correction in Xekyo/CoinSelectionSimulator#9, I get a total cost of 70858076. I use only
runn1ng
commented
Jul 15, 2017
I am running the code through
runn1ng
commented
Jul 15, 2017
I get totally different numbers from those in your paper for the other scenarios too. Unfortunately, the numbers don't correspond to any of the three tables. edit: oooh, that's because I am running "MoneyPot After LF", which was the default scenario, but it actually starts with additional UTXOs from a previous run. The actual scenario from the paper (the first one) is TestCaseMoneyPotEmpty, right?
Yes, correct. MoneyPot After LF runs the MoneyPot scenario starting from the UTXO pool that results from first running it with Largest First selection.
runn1ng
commented
Jul 17, 2017
I found out that the two repos for coin selection simulation returned different results for the same strategy, so I painstakingly went through both of them, found where they differ... and opened tons of PRs against both, so they now return the same results given the same fees and setup. The differences in the simulations were:
...and those added up to significant differences. Anyway, after fixing all the issues, these are the results I get:
[Graph: MoneyPot scenario, fees 10 sat/bitcoin]
[Graph: MoneyPot scenario with the fee increased to 200 sat/bitcoin, values unchanged, so more UTXOs become unspendable]
In both, rand is slightly better. I am not sure whether that's because the scenario is different (without the large UTXO set) or because of subtle differences in the benchmarking. The shape is similar, though.
[Graph: small randomly generated wallets]
In this scenario BnB+LF performs better, with the optimum at about 50% of the cost of change.
So different strategies and parameters are better in different scenarios. Again, there is a danger of overfitting to one scenario, and there might be more subtle bugs in the benchmark code... In my wallet code, I will probably just use BnB+LF with 50% of the cost and call it a day :) I haven't shown this in the graphs, but having BnB is always better than not having it. :) If you want to repeat the tests, my forks of the repos are here: 1 2
runn1ng
commented
Jul 17, 2017
@Xekyo Hm, that doesn't seem to be the case; the minimum is not at 18%, but at 30% to 50% in these two scenarios.
@runn1ng is there a plausible explanation why not accounting for the full cost of the change is cheaper overall?
runn1ng
commented
Jul 18, 2017
Hm, I already spent too much time on this... :/ I will see if I have time to look into the bitcoin coin selection tests and how to add benchmarks there, but not promising anything. |
runn1ng
commented
Jul 18, 2017
I think (and this is speculation) that it's because the target is "tighter", so BnB rejects more "loose" matches and keeps searching until it finds a better match. So less is spent on fees, even though some matches are rejected that didn't have to be (and those would have spent more on fees). By the way, an interesting thing I just noticed: in the "small random" example, there aren't that many BnB matches in the first place! Around 30 (out of 10,000 transactions). It still has an effect on the result...
+ while (selection.at(depth).second) {
+ // Reset this utxo's selection
+ if (selection.at(depth).first) {
+ value_ret -= utxo_pool.at(depth).txout.nValue;
runn1ng
Jul 18, 2017
This line never fires.
It never happens, that an utxo is at the same time in an exclusion branch (which is what .second does) and is also selected (what .first does). Which makes sense; you never at the same time select and not select an utxo :)
With all my simulations, this line never seems to fire (when I rewrote this to JS).
So the other line after the if can also be deleted.
runn1ng
Jul 18, 2017
I also think that .second is not needed at all; all the information necessary is in the first and depth; the only situation where .first != !(.second) is after we backtrack here, but the information in .second is useless anyway (since we will change it anyway before we backtrack to it again).
achow101
Jul 18, 2017
Contributor
Right. That appears to be a relic of when this randomly selected which branch to try first before I changed it to always try including first.
runn1ng added a commit to runn1ng/coinselect that referenced this pull request Jul 18, 2017 (3d806ec)
+ }
+ }
+
+ if (!done) {
runn1ng
Jul 18, 2017
This is never true here. done is never true when backtrack is true. (Istanbul caught that :))
Xekyo
Jul 18, 2017
Contributor
This block is skipped when backtrack is false and done is true, which happens when a solution is found.
runn1ng
Jul 18, 2017
In the if at the start of the while cycle, either backtrack or done is set, never both. We got here only when backtrack == true, so done cannot be true.






achow101 commented Jun 20, 2017
This is an implementation of the Branch and Bound coin selection algorithm written by Murch (@Xekyo). I have it set so this algorithm will run first and if it fails, it will fall back to the current coin selection algorithm. The coin selection algorithms and tests have been refactored to separate files instead of having them all in wallet.cpp.
I have added some tests for the new algorithm and a test for all of coin selection in general. However, more tests may be needed, but I will need help with coming up with more test cases.
This PR uses some code borrowed from #10360 to use effective values when selecting coins.