
Issue#1643: Coinselection prunes extraneous inputs from ApproximateBestSubset #4906

Merged: 3 commits, Dec 8, 2015


murchandamus
Contributor

Improvement for Issue#1643:
A further pass over the available inputs has been added to ApproximateBestSubset after a candidate set has been found. It will prune any extraneous inputs in the selected subset, in order to decrease the number of inputs and the resulting change.

@murchandamus
Contributor Author

This is the first time I am trying to collaborate on an open-source project, so please feel free to point out when I am doing something wrong.

@BitcoinPullTester

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/p4906_3b92ce7ad950ceba14f1cc6ad637923f71ca61be/ for binaries and test log.
This test script verifies pulls every time they are updated. It, however, dies sometimes and fails to test properly. If you are waiting on a test, please check timestamps to verify that the test.log is moving at http://jenkins.bluematt.me/pull-tester/current/
Contact BlueMatt on freenode if something looks broken.

@murchandamus
Contributor Author

I was made aware by @sipa that a candidate set that ended up matching the targetValue with the addition of a dust UTXO would still be picked immediately. That seems to be correct.

However, unless one is trying to make a minuscule payment, the likelihood of adding a bigger UTXO that exceeds the target at some point should be sufficiently larger than the chance of only adding inputs that inch closer and closer to the target until they finally match it without exceeding it. This should be unlikely to happen even in 1000 iterations (gut feeling), so the patch should work fine in the majority of situations.

@gmaxwell
Contributor

Have you simulated this, e.g. how a wallet progressed over time? I would expect this to result in grinding down the wallet into lots of dust change over time even worse than the current approach.

Generally, so long as it doesn't bloat things to the point where the transaction confirms slowly, we should prefer to make transactions bigger (under the rationale that transaction fees will increase over time or, at worst, stay constant). E.g., if you already have a change output, a pass that looks at the scriptPubKeys you're spending and keeps adding more inputs assigned to the same pubkeys until the fee increase reaches some threshold or the change exceeds 2x the payment value (or something like that) would probably result in lower total transaction fees over time (and better privacy).
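A rough sketch of the pass suggested above, assuming a fixed fee per added input and treating each scriptPubKey as a plain string. `Utxo`, `addSameScriptInputs`, and both stopping thresholds are hypothetical names and parameters for illustration, not wallet code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical coin record; `script` stands in for the scriptPubKey.
struct Utxo {
    int64_t value;       // amount in satoshis
    std::string script;  // scriptPubKey, simplified to a string for this sketch
};

// Starting from a selection that already has change, adds coins from `pool`
// that pay to a scriptPubKey already being spent. Stops once the extra fee
// would exceed `max_extra_fee` or the change already exceeds twice the payment.
// Returns the indices (into `pool`) of the coins that were added.
std::vector<std::size_t> addSameScriptInputs(const std::set<std::string>& spent_scripts,
                                             const std::vector<Utxo>& pool,
                                             int64_t payment, int64_t change,
                                             int64_t fee_per_input,
                                             int64_t max_extra_fee) {
    std::vector<std::size_t> added;
    int64_t extra_fee = 0;
    for (std::size_t i = 0; i < pool.size(); ++i) {
        if (change > 2 * payment) break;                          // change is large enough
        if (extra_fee + fee_per_input > max_extra_fee) break;     // fee budget used up
        if (spent_scripts.count(pool[i].script) == 0) continue;   // other key: skip for privacy
        added.push_back(i);
        extra_fee += fee_per_input;
        change += pool[i].value - fee_per_input;  // coin value minus its own fee goes to change
    }
    return added;
}
```

Only coins whose script is already in the spent set are considered, so the extra inputs reveal no new keys; the fee cap and the 2x-payment cap are the two knobs the comment leaves open.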

@murchandamus
Contributor Author

@gmaxwell I had not simulated it in advance. I will, though; I just haven't gotten to it yet. Perhaps I'll get to it this evening, otherwise probably tomorrow.

@murchandamus
Contributor Author

@gmaxwell So, I have done some simulations.
My approach was:
• To recreate the wallet logic in Python.
• To add the pruning feature conditionally
• To have one wallet of each type, starting with the same UTXO pool, execute and receive a number of payments.
• They obviously end up with the same amount, and I can compare the number and sizes of final utxo.

So far I have tried a few different scenarios, but I was limited to around 10k UTXO and 1000 operations, as my Python implementation was not as fast as I had hoped.
My initial experiments suggest that the pruning wallet does end up with a larger number of UTXO, but only slightly larger than the regular wallet. (I'll add the results later, when I am on the right computer.)
I have started implementing the simulation in a compiled language to speed it up in order to do some more and bigger experiments.

@laanwj laanwj added the Wallet label Sep 25, 2014
@gmaxwell
Contributor

gmaxwell commented Oct 8, 2014

Please keep us updated on your progress. If you'd like some more eyes on your testing code, feel free to point it out.

@murchandamus
Contributor Author

@gmaxwell The code for my simulation can be found here: https://github.com/Xekyo/CoinSelectionSimulator

It works now and runs in bearable time, but the code could be a bit clearer. I am currently in my exam period and haven't had as much time to work on it as I would like.

I have tried a few different strategies to select coins, and have been experimenting with different distributions of incoming and outgoing payments.

Different strategies:

  • "Regular Wallet" implements the algorithm currently in use as I understand it, to be used as a baseline.
  • "Pruning Wallet" selects the coins just like the regular wallet, but afterwards discards inputs that are smaller than the remaining change in the order they were selected.
  • "Pruning Wallet with minimum number of Inputs": like PW, but keeps at least the minimum number of inputs when pruning.
  • "Double Wallet": Like PW, but when unable to directly match the target, aims for twice the target, in order to create change in the magnitude of what the user was trying to transfer instead of dust.
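A sketch of the two variations described above, assuming the pruning walks the inputs in the order they were selected and drops any input no larger than the current change (dropping such an input keeps the total at or above the target). `pruneInSelectionOrder` and `doubleWalletTarget` are hypothetical names; `min_inputs = 0` gives the plain Pruning Wallet:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// "Pruning Wallet" step: walk the inputs in selection order and drop each one
// that fits inside the current change, but never go below `min_inputs` inputs.
std::vector<int64_t> pruneInSelectionOrder(const std::vector<int64_t>& selected,
                                           int64_t target, std::size_t min_inputs) {
    int64_t change = -target;
    for (int64_t v : selected) change += v;  // surplus of the selection over the target

    std::vector<int64_t> kept;
    std::size_t remaining = selected.size();
    for (int64_t v : selected) {
        // Dropping v keeps the total >= target exactly when v <= change.
        if (v <= change && remaining > min_inputs) {
            change -= v;
            --remaining;
        } else {
            kept.push_back(v);
        }
    }
    return kept;
}

// "Double Wallet" target: without an exact match, aim for twice the payment so
// that the change ends up roughly the size of the payment instead of dust.
int64_t doubleWalletTarget(int64_t payment, bool exact_match_found) {
    return exact_match_found ? payment : 2 * payment;
}
```

For example, pruning the selection `{10, 10, 1000}` against a target of 251 keeps only `{1000}`, while requiring at least two inputs keeps `{10, 1000}`.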

Some results:

Experiment 1:
Start: 5000 random UTXO from range(2500, 250000) satoshis
10000 transactions (randomly 50/50 distributed incoming/outgoing) in the range(540,250000) satoshis
Regular Wallet: 4853 changes created, in range(1,1591) satoshis, average change 63.01±97.78 satoshi, spent (1,12) inputs per transaction sent, average 1.44±1.15

... While writing up my results, I realized that I want to produce a .csv instead of the current text output. Sorry, I have to go back to studying now, but I'll probably be able to do it in my next break.


Some questions I haven't been able to answer satisfyingly:

  • What would constitute realistic data for incoming and outgoing transactions of one wallet (how many incoming/outgoing transactions, what average size and distributions for each direction, is it necessary to regard the change in value over time)?
  • Haven't researched yet: When does the required transaction fee increase?

@murchandamus
Contributor Author

Hi,
Finished with my exams for this semester, finally had time to pick this up again.

(When I created the .csv files with my output, I realized that my random models would sometimes generate spending instructions that asked for more than the wallet's current value. While those were ignored as impossible, they still got into the statistics, so I wanted to fix that first.)

I've been thinking about what one might want to be optimizing for. This is what I got so far:

  1. Non-dust change: The creation of small change outputs should be avoided. They bloat the blockchain and are expensive to spend.
    -> If a change output has to be created, it would be preferable to create a change output of the magnitude of the payment value. (Which would also help obscure recipient output and change output.)
  2. Privacy: UTXO should be picked non-deterministically, and as few different pubkeys as possible should be involved.
    -> Whenever the CoinSelection selects a UTXO, any UTXO assigned to the same pubkey should be preferred in selection.
    -> Small inputs that are significantly smaller than the size of the change and don't share their pubkey with a larger input should be pruned, as they increase transaction fee and decrease privacy.
  3. Minimize transaction fee: A transaction input set should be preferred when it costs less to send.
    -> If spending some input costs more than what it contributes to the outputs it should not be added. Also, inputs that are extraneous should be pruned if that would lower the transaction fee.
  4. Consolidate small UTXO?
    Once created, very small UTXO are in the blockchain anyway.
    Is it preferable to spend very small UTXO, in order to remove them from the UTXO-pool, or to ignore them until they become valuable enough to spend in their own right?
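The rule in point 3 ("if spending some input costs more than what it contributes, it should not be added") can be sketched as a filter on each input's value net of its own fee. The 148-byte input size (roughly a legacy P2PKH input) and the per-byte fee rate are assumptions for illustration, and `effectiveValue`/`filterUneconomical` are hypothetical helpers:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: a legacy P2PKH input takes roughly 148 bytes.
constexpr int64_t kAssumedInputBytes = 148;

// Value an input adds after paying for its own bytes at the given fee rate.
int64_t effectiveValue(int64_t value, int64_t fee_rate_sat_per_byte) {
    return value - fee_rate_sat_per_byte * kAssumedInputBytes;
}

// Keeps only the inputs that contribute more than it costs to spend them.
std::vector<int64_t> filterUneconomical(const std::vector<int64_t>& utxos,
                                        int64_t fee_rate_sat_per_byte) {
    std::vector<int64_t> kept;
    for (int64_t v : utxos) {
        if (effectiveValue(v, fee_rate_sat_per_byte) > 0) kept.push_back(v);
    }
    return kept;
}
```

At 10 satoshis/byte such an input costs 1480 satoshis, so a 1000-satoshi UTXO would be skipped while a 50000-satoshi one is kept. Point 4 is the counterweight: skipping these inputs leaves them in the UTXO pool.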

Here are the statistics from my latest experiment.
[Image: ex19]

@gmaxwell
Contributor

Very interesting results!

@gmaxwell
Contributor

Consolidate small UTXO? Once created, very small UTXO are in the blockchain anyway. Is it preferable to spend very small UTXO, in order to remove them from the UTXO-pool

It's preferable to spend them, since that reduces the storage required for a minimal full node (see the pruning patches in #4701), subject to the restriction that you don't want someone to be able to gratuitously increase your transaction costs by sending you tiny UTXO.

@RHavar
Contributor

RHavar commented Feb 18, 2015

Some questions I haven't been able to answer satisfyingly:

What would constitute realistic data for incoming and outgoing transactions of one wallet (how many incoming/outgoing transactions, what average size and distributions for each direction, is it necessary to regard the change in value over time)?

Perhaps I can help. I have data from MoneyPot.com's hot wallet which you can use for simulation: https://gist.github.com/RHavar/7cd6f3fcf2bd3e485458

The positive amounts are amounts deposited, the negative amounts are sends. The data is sorted over time (oldest at the top, newest at the bottom) denominated in bitcoins.

Things to note:

  • deposits are only added to the list when they get 1 confirmation (so they tend to come in batches), while send amounts are added to the list instantly
  • At various times, the cumulative total dips below 0. This happens when people have won enough money to be able to withdraw more than the total people have put in the site. For the purpose of simulation, you might want to assume there is an additional 50 BTC input at the very start, to avoid this case.
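A minimal sketch of replaying this data, assuming the amounts have already been read into a list of BTC values in file order (the gist's exact file layout is not described here). It converts to satoshis and reports the lowest running balance, which shows whether the suggested extra 50 BTC at the start avoids the negative dips. `toSatoshis` and `lowestBalance` are hypothetical helpers:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

constexpr double kSatoshisPerBtc = 100000000.0;

// Converts a BTC amount to satoshis, rounding to the nearest satoshi.
int64_t toSatoshis(double btc) {
    return static_cast<int64_t>(std::llround(btc * kSatoshisPerBtc));
}

// Replays deposits (positive) and sends (negative), given in BTC and in time
// order, on top of a starting balance; returns the lowest balance in satoshis.
int64_t lowestBalance(const std::vector<double>& amounts_btc, double starting_btc) {
    int64_t balance = toSatoshis(starting_btc);
    int64_t lowest = balance;
    for (double amount : amounts_btc) {
        balance += toSatoshis(amount);
        if (balance < lowest) lowest = balance;
    }
    return lowest;
}
```

With the real data, checking `lowestBalance(amounts, 50.0) >= 0` before running a strategy confirms that no send asks for more than the wallet holds.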

@jgarzik
Contributor

jgarzik commented Jul 23, 2015

This PR has been open for a while, but garnered no ACKs. The author seems to have put a fair amount of time and thought into it. However, this definitely needs more review and testing.

Ping, maintainers/testers/helpers?

@murchandamus
Contributor Author

@jgarzik: Hi Jeff,
Thanks for the ping.
I had lost track of this, but I see now that @RHavar posted data that might help to explore what the patch would do with some realistic data for transaction sizes. I'll have a look this weekend.

@murchandamus
Contributor Author

I've created another test case using the MoneyPot.com data. On that one, I get different results yet again: this time the Regular Wallet has the fewest UTXO in the end.

Looking over the results, I noticed that all wallets created a change output in the single digit satoshis. Do I remember correctly that the wallet shouldn't create Dust Outputs? Shouldn't the change be at least 540 satoshis? If so, I should probably fix that behavior still and then have another look.

However, if it ends up looking as promising as today, I would propose to just expire this Pull-Request.
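For reference, a sketch of a dust check for the simulator's change outputs. It assumes the dust rule Bitcoin Core applied around this time: an output is dust if it is worth less than three times the minimum relay fee for its own bytes plus an assumed 148-byte input that later spends it, at 1000 satoshis/kB. Under those assumptions a 34-byte P2PKH output has a threshold of 546 satoshis. The helper names and the "leave dust change to the fee" policy are assumptions of this sketch:

```cpp
#include <cassert>
#include <cstdint>

// Assumed relay policy parameters (see the lead-in).
constexpr int64_t kMinRelayFeePerKb = 1000;  // satoshis per 1000 bytes
constexpr int64_t kAssumedSpendingInputBytes = 148;

// Smallest non-dust value for an output of the given size: three times the
// relay fee for the output plus the input that would later spend it.
int64_t dustThreshold(int64_t output_bytes) {
    int64_t fee = kMinRelayFeePerKb * (output_bytes + kAssumedSpendingInputBytes) / 1000;
    return 3 * fee;
}

// Change below the dust threshold is not created; it is left to the fee.
int64_t changeToCreate(int64_t change, int64_t change_output_bytes) {
    return change < dustThreshold(change_output_bytes) ? 0 : change;
}
```

Applied in the simulator, the single-digit-satoshi change outputs would disappear and their value would go to the fee instead.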

@maflcko
Member

maflcko commented Sep 23, 2015

@xekyo:

prune extraneous inputs

Wouldn't "extraneous inputs" imply an issue in the coin selection algorithm which should be fixed instead? I invite you to look at #6696 which is a different approach. I am happy to shoot up some simulations if anyone is interested. (Ping me at the other PR)

@laanwj
Member

laanwj commented Nov 20, 2015

From what I understand ApproximateBestSubset is an approximate algorithm for the following:

Input: [sizes], nLow, nTarget
Output: X=subset([sizes]) for which nTargetValue <= sum(X) < nLow
        and sum(X)-nTargetValue as small as possible

The "approximate" part means that it may select a solution which is not the optimal one (e.g. sum(X)-nTargetValue is not really as small as possible).

Your fix drops elements i from the result X for which sum(X)-i is still >= nTargetValue, which, where possible, leads to a trivially better solution.

From what I understand, this can improve the result because vValue is sorted from low to high and elements are added in that order: an element may be added that gets the sum above nTargetValue but makes earlier-added small elements redundant.

  • What about moving the post-processing step to the end? This removes the performance impact per checked subset, and still makes sure redundant outputs are removed from the final result.

I've managed to reproduce this as well. For e.g. these input values:

vValues = [10, 10, 10, 10, 10, 10, 10, 10, 10, 100, 1000]
nTargetValue = 251

ApproximateBestSubset sometimes selects [10, 10, 1000]. Postprocessing like this can drop the redundant 10 inputs.
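The arithmetic behind this example, as a quick check that the 1000 is the only input needed:

$$
\begin{aligned}
9\cdot 10 + 100 &= 190 < 251 &&\Rightarrow\ \text{every valid subset must contain the } 1000,\\
10 + 10 + 1000 &= 1020 \ge 251 &&\text{(surplus } 769\text{)},\\
1020 - 10 - 10 &= 1000 \ge 251 &&\text{(surplus } 749\text{)}.
\end{aligned}
$$

Dropping both 10s still covers the target, while dropping the 1000 would leave only 20 < 251.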

Concept ACK - this cannot make things worse. Would be nice to have a unit test.
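A sketch of ApproximateBestSubset with the pruning moved to the end, as suggested above. It is loosely modeled on the randomized two-pass search Bitcoin Core used at the time; the function name, signature, the `std::mt19937` RNG and the high-to-low ordering of `values` are assumptions of this sketch, not the real code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical, simplified stand-in for ApproximateBestSubset. `values` is
// assumed sorted from high to low. Returns the best total found and marks the
// chosen inputs in `best`.
int64_t approximateBestSubset(const std::vector<int64_t>& values, int64_t target,
                              std::vector<bool>& best, std::mt19937& rng,
                              int iterations = 1000) {
    int64_t best_sum = 0;
    for (int64_t v : values) best_sum += v;
    best.assign(values.size(), true);  // start from "spend everything"
    std::bernoulli_distribution coin(0.5);

    for (int rep = 0; rep < iterations && best_sum != target; ++rep) {
        std::vector<bool> included(values.size(), false);
        int64_t sum = 0;
        bool reached = false;
        // Pass 0 takes each input with probability 1/2; pass 1 adds whatever
        // was left out, so a reachable target is always reached.
        for (int pass = 0; pass < 2 && !reached; ++pass) {
            for (std::size_t i = 0; i < values.size(); ++i) {
                bool take = (pass == 0) ? coin(rng) : !included[i];
                if (!take) continue;
                sum += values[i];
                included[i] = true;
                if (sum >= target) {
                    reached = true;
                    if (sum < best_sum) {
                        best_sum = sum;
                        best = included;
                    }
                    // Back the input out again and keep looking for a tighter fit.
                    sum -= values[i];
                    included[i] = false;
                }
            }
        }
    }

    // Post-processing, once, after all iterations: drop every selected input
    // whose removal still leaves the total at or above the target.
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (best[i] && best_sum - values[i] >= target) {
            best[i] = false;
            best_sum -= values[i];
        }
    }
    return best_sum;
}
```

Running the pruning once at the end keeps the per-iteration cost unchanged; only the final result is post-processed. With the values from the example above (sorted high to low) and a target of 251, the sketch returns 1000 regardless of the random choices, because every valid subset contains the 1000 and the final pass drops everything else.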

@maflcko
Member

maflcko commented Nov 20, 2015

@laanwj Great that you had a look at this.

vValue is sorted from low to high ...

Guess I missed that during review.

unit test

@xekyo are you still working on this? Should I take over?

@murchandamus
Contributor Author

Hi,

On 20.11.2015 11:11, "MarcoFalke" notifications@github.com wrote:

@xekyo are you still working on this? Should I take over?

I'd be interested in taking another look, but I'm currently traveling. Please feel free to take over. Otherwise, I'll check it out in December.

Xekyo

@murchandamus
Contributor Author

@laanwj I've edited my fork to move the post-processing step to the end of ApproximateBestSubset. However, this patch may cause fewer dust outputs to be spent, which contradicts Greg's assessment above. Are you sure about "cannot make things worse"? I feel my simulations have been somewhat inconclusive on that point.

A further pass over the available inputs has been added to ApproximateBestSubset after a candidate set has been found. It will prune any extraneous inputs in the selected subset, in order to decrease the number of inputs and the resulting change.
@murchandamus
Contributor Author

@MarcoFalke I realized my mistake and fixed it. After all my commits, I did the rebase as you suggested, and pushed to "Fix-#1643". I hope I did that right. :)

@murchandamus
Contributor Author

I've added a simple test with

   vValues = [10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 100, 100]
   nTargetValue = 222

with an expected nBest == 230 on one iteration.

Sorry, I'm not sure how I would run the test on my machine right now, so I figured I'd just push it. I'll check back tomorrow.
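Why 230 is the expected value, assuming pruning drops an input whenever the rest still reaches the target (k is the number of 10s selected):

$$
\begin{aligned}
12\cdot 10 + 100 &= 220 < 222 &&\Rightarrow\ \text{both 100s are required},\\
200 + 10k \ge 222 &\iff k \ge 3 &&(k \text{ an integer}),\\
n_{\text{Best}} &= 200 + 3\cdot 10 = 230.
\end{aligned}
$$

Since the 100s can never be dropped and 10s are dropped until only three remain, any valid selection from the single iteration ends at 230.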


BOOST_FIXTURE_TEST_SUITE(coinselection_tests, BasicTestingSetup)

BOOST_AUTO_TEST_CASE(sanity)
Member


I think you can just add this to src/wallet/test/wallet_tests.cpp.

Contributor Author


Ah, thanks. I had been wondering why there were no wallet tests in the regular test directory.

@laanwj
Member

laanwj commented Dec 7, 2015

ACK

@murchandamus
Contributor Author

I rebased and used fixup on the commits that only fixed previous mistakes.

@murchandamus
Contributor Author

@laanwj Travis had cleared my patch just when I realized that I had also messed up the indentation on the test. I've rebased it into two commits, one commit for the code, and one for the test. I expect this to turn green again momentarily. Sorry for keeping Travis so busy. :(

Is there anything else that needs to be done about this PR?

@laanwj laanwj merged commit fc0f52d into bitcoin:master Dec 8, 2015
laanwj added a commit that referenced this pull request Dec 8, 2015
fc0f52d Added a test for the pruning of extraneous inputs after ApproximateBestSet (Murch)
af9510e Moved set reduction to the end of ApproximateBestSubset to reduce performance impact (Murch)
5c03483 Coinselection prunes extraneous inputs from ApproximateBestSubset (AlSzacrel)
@maflcko
Member

maflcko commented Dec 9, 2015

Wallet code itself looks good! Post-merge Tested ACK.

@xekyo Thanks for sticking with this so long! There seems to be a small issue with the test code but I will create another PR for this...

Note @laanwj: vValue is not sorted from low to high but from high to low but I think you meant it the right way. ;)

@maflcko maflcko mentioned this pull request Dec 9, 2015
@maflcko
Member

maflcko commented Dec 9, 2015

Oh, and you mentioned this would improve cases such as #1643. That is not true, at least for the specific transaction mentioned in #1643.

Still, I assume pruning will help really large wallets (like those of exchanges or heavy users) when they have an odd distribution of input amounts.

laanwj pushed a commit that referenced this pull request Dec 10, 2015
This is a combination of 3 commits.

- Coinselection prunes extraneous inputs from ApproximateBestSubset
  A further pass over the available inputs has been added to ApproximateBestSubset after a candidate set has been found. It will prune any extraneous inputs in the selected subset, in order to decrease the number of inputs and the resulting change.
- Moved set reduction to the end of ApproximateBestSubset to reduce performance impact
- Added a test for the pruning of extraneous inputs after ApproximateBestSet

Github-Pull: #4906
Rebased-From: 5c03483 af9510e fc0f52d
@morcos morcos mentioned this pull request Mar 9, 2016
laanwj added a commit to laanwj/bitcoin that referenced this pull request Jul 1, 2016
This reverts PR bitcoin#4906, "Coinselection prunes extraneous inputs from
ApproximateBestSubset".

Apparently the previous behavior of slightly over-estimating the set of
inputs was useful in cleaning up UTXOs.

See also bitcoin#7664, bitcoin#7657, as well as 2016-07-01 discussion on #bitcoin-core-dev IRC.
lateminer pushed a commit to lateminer/bitcoin that referenced this pull request Jan 7, 2018
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022
@murchandamus murchandamus deleted the Fix-#1643 branch October 6, 2023 19:03
7 participants