
Revisit Coin Selection #7664

Closed
morcos opened this issue Mar 9, 2016 · 26 comments
@morcos
Member

morcos commented Mar 9, 2016

There is concern that the current coin selection algorithm is too biased towards creating small inputs. We should look at some data after the change for 0.12 to see whether that is actually the case, and regardless work on improving the coin selection algorithm to address that concern before the 0.13 release.

See #7657 and #4906 for back story.

Tag with 0.13 milestone please

@jonasschnelli
Contributor

Agree with @morcos.
Something we might want to work towards is flexible/modular coin selection. Coin selection depends on the use case, and could therefore be made modular.

We could allow different coin selection classes (some OO magic / interfacing) and let users select them in fundrawtransaction, or via a global option only adjustable at startup: -coinselection=*class*.
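
A minimal sketch of what such an interface could look like (names hypothetical, not actual Bitcoin Core code):

    #include <algorithm>
    #include <vector>

    using CAmount = long long;           // stand-in for the wallet's amount type
    struct COutput { CAmount value; };   // simplified stand-in

    // Abstract strategy that a -coinselection=<class> option could pick between.
    class CoinSelector {
    public:
        virtual ~CoinSelector() = default;
        // Choose a subset of candidates summing to at least nTarget;
        // returns true and fills vSelected on success.
        virtual bool Select(const std::vector<COutput>& vCandidates,
                            CAmount nTarget,
                            std::vector<COutput>& vSelected) const = 0;
    };

    // One concrete strategy, purely for illustration: largest-first.
    class LargestFirstSelector : public CoinSelector {
    public:
        bool Select(const std::vector<COutput>& vCandidates, CAmount nTarget,
                    std::vector<COutput>& vSelected) const override {
            std::vector<COutput> sorted = vCandidates;
            std::sort(sorted.begin(), sorted.end(),
                      [](const COutput& a, const COutput& b) { return a.value > b.value; });
            CAmount nTotal = 0;
            for (const COutput& out : sorted) {
                vSelected.push_back(out);
                if ((nTotal += out.value) >= nTarget) return true;
            }
            return false; // insufficient funds
        }
    };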

Bitcoinj has a similar approach.

@gmaxwell
Contributor

In my experience the biggest impediment to doing anything with coin selection is the current testing approach, which simply fails if anything changes at all. So when you intend to change it, you just get a pile of failures and nothing that tells you whether the change is behaving sanely.

I don't see any problem with having a selection, but I'm somewhat doubtful of the utility. Having a selection doesn't replace having a good default-- and right now we don't even have a good default; and most of the time the user doesn't really have much useful information here (nothing the user would call a "use case" probably directly translates into the right strategy)-- keep in mind that most users have no idea about the underlying "coins" behavior in the system.

Perhaps the primary strategy distinguisher would be the importance of privacy-- e.g. should sweeping be kept to "do no privacy harm" levels, or be more aggressive-- but the user is often poorly equipped to weigh the privacy considerations (e.g. they don't know what their future risks are, and may not know how powerful transaction analysis is), nor are they usually able to weigh the privacy impacts on their counterparties or the ecosystem as a whole.

Another thing to keep in mind is that implementation of GROUPCHECKSIGVERIFY would make strategies that try to spend many inputs at once much more attractive. (beyond the argument I've made before that you generally want to pay fees sooner rather than later, since fees tend to go up over time)

@RHavar
Contributor

RHavar commented Mar 10, 2016

One untested idea that I haven't seen suggested before (quite possibly because I'm overlooking something): try to minimize the value of the change output. If it's less than the dust threshold, it can be safely omitted and donated to miners.

It should help keep the unspent set in check, while generating outputs that are actually useful for the next transaction(s). It also has the benefit of actively avoiding pointlessly large inputs (which are best saved for another transaction, especially if the user is sending multiple transactions before the next block confirms), and will also help create outputs that are useful for exact-change sends.

The biggest disadvantage I can see is that it'll tend to make the change very distinguishable from the destination, but that's a problem that already exists.
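
A rough sketch of the decision at the end of selection, assuming a stand-in dust threshold (the wallet's real dust rule is derived from the feerate):

    using CAmount = long long;

    static const CAmount DUST_THRESHOLD = 546; // stand-in value, not the wallet's actual rule

    // After selecting inputs: compute the would-be change and decide whether to
    // create an output for it. Returning 0 means "omit the change output and
    // let the remainder go to miners as fee".
    CAmount ChangeToCreate(CAmount nSelectedTotal, CAmount nTarget, CAmount nFee)
    {
        CAmount nChange = nSelectedTotal - nTarget - nFee;
        if (nChange < DUST_THRESHOLD) return 0; // donate the sliver instead of making dust
        return nChange;
    }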

@gmaxwell
Contributor

@RHavar technically that's what Bitcoin has always done. The subset sum tries to minimize the change (which generally minimizes the input set), and tiny amounts of change are just turned into fees. This has a bad behavior, though, of often resulting in small change (above the threshold, but not very useful for spending in the future); so big coins get spent and ground down to near-dust.

See ApproximateBestSubset in wallet.cpp.

@laanwj laanwj added this to the 0.13.0 milestone Mar 11, 2016
@laanwj
Member

laanwj commented Mar 11, 2016

#1643 should also be mentioned here. Bad coin selection with lots of small outputs has been a long-running issue.

@maflcko
Member

maflcko commented Mar 14, 2016

which generally minimizes the input set

I doubt this is true. The algorithm will generally start by selecting some appropriately sized large coins (vValues is sorted) and then "fill up the gaps" with small coins such that it either hits the target value or overshoots the target value + MIN_CHANGE by the smallest amount possible. (see e.g. the issue @laanwj linked to)
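
A loose paraphrase of that behavior (illustrative only, not the actual ApproximateBestSubset code in wallet.cpp):

    #include <limits>
    #include <random>
    #include <vector>

    using CAmount = long long;

    // Randomly include coins from the descending-sorted value list over many
    // passes, keeping whichever combination reaches the target with the least
    // overshoot.
    CAmount ApproximateSubsetSketch(const std::vector<CAmount>& vValueSortedDesc,
                                    CAmount nTargetValue, std::vector<bool>& vfBest,
                                    int nIterations = 1000)
    {
        std::mt19937 rng{std::random_device{}()};
        std::bernoulli_distribution fCoinFlip(0.5);
        CAmount nBest = std::numeric_limits<CAmount>::max();

        for (int i = 0; i < nIterations && nBest != nTargetValue; ++i) {
            std::vector<bool> vfIncluded(vValueSortedDesc.size(), false);
            CAmount nTotal = 0;
            for (std::size_t j = 0; j < vValueSortedDesc.size(); ++j) {
                if (!fCoinFlip(rng)) continue; // randomly skip this coin
                vfIncluded[j] = true;
                nTotal += vValueSortedDesc[j];
                if (nTotal >= nTargetValue && nTotal < nBest) {
                    nBest = nTotal;     // new closest-from-above total
                    vfBest = vfIncluded;
                }
            }
        }
        return nBest;
    }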

@RHavar
Contributor

RHavar commented Mar 14, 2016

One parameter you could add without introducing a huge amount of complexity would be an "ideal" number of outputs in your wallet; let's call it N. Having a low N would mean you can easily/cheaply sweep your wallet (at the expense of linking your coins more heavily, and big transaction chains). Having a high N would give you more coins to select from, hopefully resulting in a closer match.

N could be a configuration option. When the number of outputs is less than N, the wallet could actively try to increase it (spending from the largest coins first is a good way), and when it is greater than N, it could actively try to decrease it (sourcing extraneous inputs is a good way).
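
A sketch of that policy switch (the -targetutxos=<n> idea is hypothetical, not an existing option):

    #include <cstddef>

    enum class SelectionBias { GrowUtxoSet, ShrinkUtxoSet, Neutral };

    SelectionBias ChooseBias(std::size_t nCurrentUtxos, std::size_t N)
    {
        if (nCurrentUtxos < N)
            return SelectionBias::GrowUtxoSet;   // spend largest coins first, creating change
        if (nCurrentUtxos > N)
            return SelectionBias::ShrinkUtxoSet; // sweep extra small inputs into each spend
        return SelectionBias::Neutral;
    }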

@murchandamus
Contributor

Would it perhaps make sense to increase the limit of what gets converted to fees? It would seem to me that this could reduce the number of tiny UTXOs being created.

E.g. if the change is an order of magnitude smaller than the fee and a small portion of the total value of the transaction, add it to the fee:

    if (change < 0.1 * fee && change < 0.001 * target)
        fee += change;

That way, it would only apply to transactions where the change is a tiny portion of the total value transacted, and it would still never donate more than fee-sized amounts. I.e. really large transactions wouldn't suddenly donate significant money, and micro-transactions wouldn't lose a significant portion of the transacted value.

I haven't tested whether it would have significant impact, and these numbers are just off the top of my head as well.

@maflcko maflcko modified the milestones: 0.13.0, 0.14 Jun 15, 2016
@maflcko
Member

maflcko commented Jun 15, 2016

Changed the milestone to 0.14

laanwj added a commit to laanwj/bitcoin that referenced this issue Jul 1, 2016
This reverts PR bitcoin#4906, "Coinselection prunes extraneous inputs from
ApproximateBestSubset".

Apparently the previous behavior of slightly over-estimating the set of
inputs was useful in cleaning up UTXOs.

See also bitcoin#7664, bitcoin#7657, as well as 2016-07-01 discussion on #bitcoin-core-dev IRC.
@dooglus
Contributor

dooglus commented Jul 4, 2016

@gmaxwell Re "technically thats what Bitcoin has always done. The subset sum tries to minimize the change (which generally minimizes the input set), and tiny amounts of change are just turned to fees" I don't think that's true. From CWallet::SelectCoinsMinConf():

    ApproximateBestSubset(vValue, nTotalLower, nTargetValue, vfBest, nBest);
    if (nBest != nTargetValue && nTotalLower >= nTargetValue + MIN_CHANGE)
        ApproximateBestSubset(vValue, nTotalLower, nTargetValue + MIN_CHANGE, vfBest, nBest);

The code is trying to make exactly the nTargetValue, and if it is unable to make that exact amount it instead tries to make nTargetValue + MIN_CHANGE instead.

So most transactions end up with a change amount of 0.010xxxxx and you end up with a wallet full of these individual bitcents which are expensive to spend. See this tx for an example of a high fee caused by this 0.01* pollution.

[Edit: this one is even crazier]

In the tx linked above, notice that the change output is 0.01000109 BTC, while two of the inputs are 0.01000112 BTC. So by donating just three satoshis to the miner we could have avoided the change output completely in this example. I suspect (based on how change outputs are very often less than a thousand satoshis over 0.01 BTC) that the majority of change outputs could be eliminated if the coin selection code was changed to donate up to 1000 satoshis to the miner if it allows change to be avoided.
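
A sketch of that check, using the 1000-satoshi cap suggested above:

    using CAmount = long long;

    static const CAmount MAX_EXTRA_FEE = 1000; // satoshis, the cap suggested above

    // If the selected inputs overshoot (payment + fee) by no more than the cap,
    // skip the change output entirely and let the overshoot go to the miner.
    bool ShouldOmitChange(CAmount nSelectedTotal, CAmount nTargetPlusFee)
    {
        CAmount nOvershoot = nSelectedTotal - nTargetPlusFee;
        return nOvershoot >= 0 && nOvershoot <= MAX_EXTRA_FEE;
    }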

@gmaxwell
Contributor

gmaxwell commented Jul 5, 2016

@dooglus indeed, I forgot that. A legacy of old code that overly prioritized free transactions. If there is no change it should treat up to some threshold over as being just as good as an exact match, and turn it into change. The threshold could be something like half the cost of spending a change output at a current feerate-- e.g. at 50 base units per byte that threshold would be about 3600 base units.
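
The arithmetic behind that figure, assuming the ~144-byte marginal p2pkh input size given two comments below:

    using CAmount = long long;

    // Half the cost of spending a change output at the current feerate.
    // 144 bytes = 32 (txid) + 4 (vout) + 1 (script length) + 73 (signature)
    // + 34 (pubkey push), the marginal size of a p2pkh input.
    CAmount ExactMatchWindow(CAmount nFeePerByte)
    {
        const int nInputBytes = 144;
        return nFeePerByte * nInputBytes / 2; // at 50/byte: 50 * 144 / 2 = 3600
    }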

@dooglus
Contributor

dooglus commented Jul 5, 2016

@gmaxwell "and turn it into change" .. you mean turn it into fee, right? We're talking about avoiding creating change in this instance.

How are we to determine the current feerate? Do we just use CWallet::GetMinimumFee() for some fixed nTxBytes size? Using a hardcoded '3600 base units' seems like the wrong way to go about it.

@gmaxwell
Contributor

gmaxwell commented Jul 5, 2016

Fee yes. :)

The size should depend on the kind of keys being used for change and reflect the marginal size increase from adding the input, e.g. ~32+4+1+73+34 bytes for a regular p2pkh change address. Effectively, a coin worth less than the fee for that size costs more to spend than it's worth-- so we really shouldn't be making outputs smaller than that, and preferably not smaller than 100 times that value, since presumably people would like to keep their fees a small percentage of the funds moved. :)
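
A sketch of that economic floor (illustrative helper names):

    using CAmount = long long;

    const int nP2pkhInputBytes = 32 + 4 + 1 + 73 + 34; // = 144, per the comment above

    // Below this, an output costs more to spend than it is worth.
    CAmount MinEconomicOutput(CAmount nFeePerByte)
    {
        return nFeePerByte * nP2pkhInputBytes;
    }

    // Preferred floor: keep the spending fee at or below 1% of the output's value.
    CAmount PreferredMinOutput(CAmount nFeePerByte)
    {
        return 100 * MinEconomicOutput(nFeePerByte);
    }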

@dooglus
Contributor

dooglus commented Jul 11, 2016

we really shouldn't be making outputs smaller than that, and preferably not smaller than 100 times that value, since presumably people would like to keep their fees a small percentage of the funds moved. :)

I'm not sure this logic is sound. We're looking for the threshold at which we decide to just include the 'change' in the miner fee instead of making a new output. That is effectively giving 100% of the change to the miner.

Suppose the cost (in fees) of spending an output is 0.0001 and the value of the output is 0.0010. We are choosing between:

A) create a change output; spending it will involve a 10% fee, but the user will still get a 0.0009 value from spending it
B) don't create a change output, just immediately give it all in extra fees.

While A) involves a 10% fee when the output is spent, B) involves an immediate 100% fee. Neither choice lets the user keep their fee as a small percentage. Best for the user would be if we simply never created a change output worth less than the expected cost of spending it.

@gmaxwell
Contributor

gmaxwell commented Jul 13, 2016

There are two ways to handle change that is too small: give it as fees, or add more inputs to grow it. Whichever we do, it should result in making change only if it is a large multiple of the cost to spend it. If we're unable to address it some other way, I think it wouldn't be unreasonable to turn it into fees, even right up to that threshold: better to spend it and get faster confirmation than to end up with a worthless output that you'll just abandon, bloating up the txo set.

@luke-jr
Member

luke-jr commented Jul 14, 2016

I don't see how bitcent change is "expensive to spend". Seems like a reasonable decision to make it configurable, though - ideally you'd want to target your average spend amount, which varies from user to user. Another possibility there is to try to guess based on historical spends.

@RHavar
Contributor

RHavar commented Jul 14, 2016

I don't see how bitcent change is "expensive to spend".

Well, even if you only have 10 BTC, if that gets churned into 1000 outputs over time (as currently happens in 0.12, depending on your spend:receive habits), it's pretty painful to spend from, regardless of whether your normal spend amount is 0.01 BTC. For instance, even though I normally send tiny transactions, I'm in a position where trying to sweep 50% of my wallet results in a "transaction too large" error. And when I lower that amount so I can actually make a transaction, it costs $8 (?) in transaction fees or something.

I really do think that if it were just a tiny bit more aggressive in creating change-less transactions, most of the problems would go away (and privacy would improve).

@rebroad
Contributor

rebroad commented Aug 1, 2016

Surely the maths behind this has existed for years - it feels like a wheel that has already been invented. I'd have thought a variety of input sizes would be ideal, perhaps binary based, i.e. 16, 8, 4, 2, 1, etc., but stopping (when creating them) at the point where they are close to current tx fees. The wallet should therefore perhaps take into account all inputs it currently tracks when deciding what sized inputs to create next. The age of an input no longer plays such a part (i.e. priority), as I suspect most miners are simply focused on getting the maximum for their blocks these days.
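
A sketch of the power-of-two decomposition idea (purely illustrative):

    #include <vector>

    using CAmount = long long;

    // Split an amount into descending power-of-two denominations, stopping
    // once a piece would fall below min_piece (e.g. a value near current tx
    // fees); any remainder is folded into the last piece.
    std::vector<CAmount> BinaryDenominations(CAmount nAmount, CAmount nMinPiece)
    {
        std::vector<CAmount> vPieces;
        CAmount nDenom = 1;
        while (nDenom * 2 <= nAmount) nDenom *= 2; // largest power of two <= amount
        while (nAmount > 0 && nDenom >= nMinPiece) {
            if (nAmount >= nDenom) {
                vPieces.push_back(nDenom);
                nAmount -= nDenom;
            }
            nDenom /= 2;
        }
        if (nAmount > 0 && !vPieces.empty()) vPieces.back() += nAmount; // remainder
        return vPieces;
    }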

@murchandamus
Contributor

murchandamus commented Nov 4, 2016

I finished my thesis on coin selection on Monday.

I describe an algorithm for coin selection on pages 46-49 of the document, which I call "Branch'n'Bound". (I only came up with it after Scaling Bitcoin, so it wasn't included in my presentation there.)

It is a two-step scheme.

  1. A randomized depth-first search on a binary tree generated by omitting or including the effective values of UTXOs, to find an exact match (sketched below).
  2. If no exact match is found, equi-probable random selection from the UTXO pool. (This provides differently sized outputs for a better chance of exact matches, and has performed quite nicely by itself in my simulations.)
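
A minimal sketch of step 1, under the simplifying assumption that an exact match is required to the base unit (the thesis allows a small window and randomizes branch order; the actual implementation landed later in #10637):

    #include <cstddef>
    #include <vector>

    using CAmount = long long;

    // Depth-first search over the include/omit tree of effective values.
    // nRemaining is the sum of all effective values from nDepth onward and
    // enables the "cannot reach the target anymore" bound.
    bool BnBSearch(const std::vector<CAmount>& vEffectiveValues, std::size_t nDepth,
                   CAmount nSelected, CAmount nRemaining, CAmount nTarget,
                   std::vector<bool>& vfIncluded)
    {
        if (nSelected == nTarget) return true;              // exact match found
        if (nSelected > nTarget) return false;              // overshoot: prune branch
        if (nSelected + nRemaining < nTarget) return false; // unreachable: prune branch
        if (nDepth == vEffectiveValues.size()) return false;

        CAmount nValue = vEffectiveValues[nDepth];
        // Branch 1: include this UTXO.
        vfIncluded[nDepth] = true;
        if (BnBSearch(vEffectiveValues, nDepth + 1, nSelected + nValue,
                      nRemaining - nValue, nTarget, vfIncluded))
            return true;
        // Branch 2: omit it.
        vfIncluded[nDepth] = false;
        return BnBSearch(vEffectiveValues, nDepth + 1, nSelected,
                         nRemaining - nValue, nTarget, vfIncluded);
    }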

Simulated on the dataset provided by RHavar in the other thread, and compared to Bitcoin Core's current strategy, it:

  • has a smaller tendency to produce large input sets
  • increases exact matches (transactions without a change output) by more than 50×
  • maintains a smaller number of UTXOs in the wallet in the face of 2.06 incoming transactions per outgoing transaction
  • has a smaller average input set size (and thus lower user fees)
  • is less computationally intensive
  • fixes an issue with Bitcoin Core's fee estimation

This algorithm could be the basis for a new coin selection algorithm in Bitcoin Core, and I would like to invite questions and commentary. (If this thread is perhaps not the best medium for such a discussion, I'd be open to another format, e.g. the mailing list?)

@laanwj
Member

laanwj commented Nov 11, 2016

@xekyo Very interesting work! So that algorithm does coin selection better (at least on that dataset), implementing it would untangle the current coin selection code mess, and it is also less computationally intensive. Sounds great to me. Looking forward to it.

To avoid having a pull request wither forever, I think we should get some people to commit to testing it. Also at least one large site that uses Bitcoin Core for its wallet.

@RHavar
Contributor

RHavar commented Nov 11, 2016

To avoid having a pull request wither forever, I think we should get some people to commit to testing it. Also at least one large site that uses Bitcoin Core for its wallet.

I'd be happy to test a PR on a production site once it's reasonably stable. My site has made 2583 outgoing transactions in the last 7 days (and received 4774 in the same time period), so it's probably a reasonable stress test. (If someone could just ping or mention me on the PR, so I don't miss it.)

@rebroad
Contributor

rebroad commented Nov 13, 2016

@xekyo Just reading your paper. It says "Transaction malleability was most notably blamed to have contributed to the demise of MtGox" - I think you'll find that the only person who blamed the demise on this was the owner of MtGox - and it was later shown not to be a significant contributor by various researchers with greater credibility than the person who made the claim you refer to. ;) Anyway, sorry, off-topic.

@murchandamus
Contributor

murchandamus commented Nov 13, 2016

@RHavar: Thank you very much, I'll be happy to ping you. Thank you also for providing the dataset that I used to evaluate my thesis on. :)

@rebroad: Yeah, I'm aware of that, but I agree that I should have been clearer in that paragraph.

@maflcko maflcko modified the milestones: 0.15.0, 0.14.0 Jan 6, 2017
@murchandamus
Contributor

I just saw the proposed changes in #9404, sorry I didn't get my proposal done for 0.14. :(

The proposed fix is much better than the previous behavior (#4082), especially in computational effort, because unnecessary repetitions are forestalled.

However, the issue is due to the order in which a transaction is created: since the fee is estimated before the inputs are selected, the fee estimate is off whenever the number of selected inputs deviates from the initial estimate. The problem stems from the effective target of the selection growing as inputs are collected, because each selected input adds the cost of one input to the target.

Instead we should estimate the fee rate first, before the inputs are selected, and deduct the cost of spending an input from each input's value upon selection.

effectiveValue = utxo.value - feePerByte × bytesPerInput

We'd measure each input's "effective contribution" toward the fixed cost of target + transaction overhead + output costs. That way, we always get the fee right.
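
A sketch of that computation (hypothetical names):

    using CAmount = long long;

    struct Utxo { CAmount nValue; };

    // Net each input's own spending cost out of its value at selection time...
    CAmount EffectiveValue(const Utxo& utxo, CAmount nFeePerByte, int nBytesPerInput)
    {
        return utxo.nValue - nFeePerByte * nBytesPerInput;
    }

    // ...so the selection target stays fixed, independent of how many inputs
    // end up being selected.
    CAmount FixedTarget(CAmount nPayment, int nOverheadBytes, int nOutputBytes,
                        CAmount nFeePerByte)
    {
        return nPayment + nFeePerByte * (nOverheadBytes + nOutputBytes);
    }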

@murchandamus
Contributor

@RHavar: The Branch and Bound algorithm is being implemented in this patch by Andrew Chow: #10637

@morcos
Member Author

morcos commented Jul 6, 2017

This can bump to 0.16

@laanwj laanwj modified the milestones: 0.16.0, 0.15.0 Jul 6, 2017
lateminer pushed a commit to lateminer/bitcoin that referenced this issue Jan 7, 2018
This reverts PR bitcoin#4906, "Coinselection prunes extraneous inputs from
ApproximateBestSubset".

Apparently the previous behavior of slightly over-estimating the set of
inputs was useful in cleaning up UTXOs.

See also bitcoin#7664, bitcoin#7657, as well as 2016-07-01 discussion on #bitcoin-core-dev IRC.
@laanwj laanwj modified the milestones: 0.16.0, 0.17.0 Jan 11, 2018
@maflcko maflcko closed this as completed Mar 22, 2018
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022