client/dcr: Improved UTXO selection algorithm #2169

Merged
merged 3 commits into from Feb 28, 2023
114 changes: 52 additions & 62 deletions client/asset/dcr/coin_selection.go
@@ -4,6 +4,7 @@
package dcr

import (
"math/rand"
"sort"

"decred.org/dcrdex/dex/calc"
@@ -71,69 +72,62 @@ func sumUTXOs(set []*compositeUTXO) (tot uint64) {
return tot
}

// In the following utxo selection functions, the compositeUTXO slice MUST be
// sorted in ascending order (smallest first, largest last).

// subsetLargeBias begins by summing from the largest UTXO down until the sum is
// just below the requested amount. Then it picks the (one) final UTXO that
// reaches the amount with the least excess. Each utxo in the input must be
// smaller than amt - use via leastOverFund only!
func subsetLargeBias(amt uint64, utxos []*compositeUTXO) []*compositeUTXO {
// Add from largest down until sum is *just under*. Resulting sum will be
// less, and index will be the next (smaller) element that would hit amt.
var sum uint64
var i int
for i = len(utxos) - 1; i >= 0; i-- {
this := toAtoms(utxos[i].rpc.Amount) // must not be >= amt
if sum+this >= amt {
break
// subsetWithLeastSumGreaterThan attempts to select the subset of UTXOs with
// the smallest total value greater than amt. It does this by making
// 1000 random selections and returning the best one. Each selection
// involves two passes over the UTXOs. The first pass randomly selects
// each UTXO with 50% probability. Then, the second pass selects any
// unused UTXOs until the total value is greater than or equal to amt.
func subsetWithLeastSumGreaterThan(amt uint64, utxos []*compositeUTXO) []*compositeUTXO {
best := uint64(1 << 62)
var bestIncluded *[]bool
bestNumIncluded := 0

iterations := 1000
for nRep := 0; nRep < iterations; nRep++ {
Member

The only other thing from the C++ code that might make sense is to break out of the nRep loop (the outermost loop) if we happen to hit nTotal == amt.

included := make([]bool, len(utxos))
Member

chappjc, Feb 24, 2023

This is the main bottleneck, causing ~2000 allocs per call. It may seem silly, but if we allocate only once and instead zero out the slice after each iteration, it screams. Zeroing and copying memory is much faster than allocating memory.

diff --git a/client/asset/dcr/coin_selection.go b/client/asset/dcr/coin_selection.go
index af90c5a79..2be8a5dbc 100644
--- a/client/asset/dcr/coin_selection.go
+++ b/client/asset/dcr/coin_selection.go
@@ -80,21 +80,24 @@ func sumUTXOs(set []*compositeUTXO) (tot uint64) {
 // unused UTXOs until the total value is greater than or equal to amt.
 func subsetWithLeastSumGreaterThan(amt uint64, utxos []*compositeUTXO) []*compositeUTXO {
        best := uint64(1 << 62)
-       var bestIncluded *[]bool
+       var bestIncluded []bool
        bestNumIncluded := 0
 
-       iterations := 1000
+       rnd := rand.New(rand.NewSource(rand.Int63()))
+
+       included := make([]bool, len(utxos))
+
+       const iterations = 1000
        for nRep := 0; nRep < iterations; nRep++ {
-               included := make([]bool, len(utxos))
 
-               var found bool
                var nTotal uint64
                var numIncluded int
-               for nPass := 0; nPass < 2 && !found; nPass++ {
+       passes:
+               for nPass := 0; nPass < 2; nPass++ {
                        for i := 0; i < len(utxos); i++ {
                                var use bool
                                if nPass == 0 {
-                                       use = rand.Uint32()&1 == 1
+                                       use = rnd.Int63()&1 == 1
                                } else {
                                        use = !included[i]
                                }
@@ -105,15 +108,21 @@ func subsetWithLeastSumGreaterThan(amt uint64, utxos []*compositeUTXO) []*compos
                                        if nTotal >= amt {
                                                if nTotal < best || (nTotal == best && numIncluded < bestNumIncluded) {
                                                        best = nTotal
-                                                       bestIncluded = &included
+                                                       if bestIncluded == nil {
+                                                               bestIncluded = make([]bool, len(utxos))
+                                                       }
+                                                       copy(bestIncluded, included)
                                                        bestNumIncluded = numIncluded
-                                                       found = true
+                                                       break passes // next iter
                                                }
-                                               break
+                                               break // next pass
                                        }
                                }
                        }
                }
+               for i := range included {
+                       included[i] = false
+               }
        }
 
        if bestIncluded == nil {
@@ -121,7 +130,7 @@ func subsetWithLeastSumGreaterThan(amt uint64, utxos []*compositeUTXO) []*compos
        }
 
        set := make([]*compositeUTXO, 0, len(utxos))
-       for i, inc := range *bestIncluded {
+       for i, inc := range bestIncluded {
                if inc {
                        set = append(set, utxos[i])
                }

bench:

func Benchmark_subsetFund(b *testing.B) {
	amt := uint64(10e8)
	s := make([]*compositeUTXO, 4000)
	newU := func(amt float64) *compositeUTXO {
		return &compositeUTXO{
			rpc: &walletjson.ListUnspentResult{Amount: amt},
		}
	}

	for i := 0; i < b.N; i++ {
		b.StopTimer()

		for j := range s {
			v := rand.Float64() + float64(rand.Int63n(2))
			v = math.Round(v*1e8) / 1e8
			s[j] = newU(v)
		}
		// leastOverFund requires ascending order; compare s[i] with s[j].
		sort.Slice(s, func(i, j int) bool {
			return s[i].rpc.Amount < s[j].rpc.Amount
		})

		b.StartTimer()

		leastOverFund(amt, s)
	}
}

before

Benchmark_subsetFund
Benchmark_subsetFund-32    	     727	   1637138 ns/op	 4152855 B/op	    2001 allocs/op

after

Benchmark_subsetFund
Benchmark_subsetFund-32    	    3734	    332022 ns/op	   46345 B/op	       4 allocs/op

Contributor Author

Nice, looks better. I copied everything you had, except that I put a single break passes statement in the outer if block.

Member

except I only put one break passes in the outer if block.

That would seem to change the original method then. I was just eliminating the found bool, but I think it behaves differently now.

Member

It does seem to match the C++ code now, however. Was it just a deviation before?

Contributor Author

The deviation was because I was assuming the slice is sorted in our case. In the C++ code the array is not sorted, and if they go over the required amount, they remove that element and keep trying. This actually works much better, but is a bit slower.

I've created a commit here that compares the results and shows the runtime of the shuffled one: cc28045
I think we should go for the slower but more accurate one.

Member

It seems strange to shuffle first, but OK, let's do that and follow the same approach of removing the element and continuing to try when the sum goes over. Wouldn't this keep-trying approach make the best == amt break of the outer nRep loop more important, though?

Contributor Author

Hm, yeah, a best == amt break would definitely not hurt.


var found bool
var nTotal uint64
var numIncluded int
for nPass := 0; nPass < 2 && !found; nPass++ {
for i := 0; i < len(utxos); i++ {
var use bool
if nPass == 0 {
use = rand.Uint32()&1 == 1
} else {
use = !included[i]
}
if use {
included[i] = true
numIncluded++
nTotal += toAtoms(utxos[i].rpc.Amount)
if nTotal >= amt {
if nTotal < best || (nTotal == best && numIncluded < bestNumIncluded) {
best = nTotal
bestIncluded = &included
bestNumIncluded = numIncluded
found = true
}
break
}
}
}
}
sum += this
}

if i == -1 { // full set used
// if sum >= amt { return utxos } // would have been i>=0 after break above
if bestIncluded == nil {
return nil
}
// if i == len(utxos)-1 { return utxos[i:] } // shouldn't happen if each in set are < amt

// Find the last one to meet amt, as small as possible.
rem, set := utxos[:i+1], utxos[i+1:]
idx := sort.Search(len(rem), func(i int) bool {
return sum+toAtoms(rem[i].rpc.Amount) >= amt
})

return append([]*compositeUTXO{rem[idx]}, set...)
}

// subsetSmallBias begins by summing from the smallest UTXO up until the sum is
// at least the requested amount. Then it drops the smallest ones it had
// selected if they are not required to reach the amount.
func subsetSmallBias(amt uint64, utxos []*compositeUTXO) []*compositeUTXO {
// Add from smallest up until sum is enough.
var sum uint64
var idx int
for i, utxo := range utxos {
sum += toAtoms(utxo.rpc.Amount)
if sum >= amt {
idx = i
break
}
}
if sum < amt {
return nil
}
set := utxos[:idx+1]

// Now drop excess small ones.
for i, utxo := range set {
sum -= toAtoms(utxo.rpc.Amount)
if sum < amt {
idx = i // needed this one
break
set := make([]*compositeUTXO, 0, len(utxos))
for i, inc := range *bestIncluded {
if inc {
set = append(set, utxos[i])
}
}
return set[idx:]

return set
}

// leastOverFund attempts to pick a subset of the provided UTXOs to reach the
@@ -147,8 +141,8 @@ func subsetSmallBias(amt uint64, utxos []*compositeUTXO) []*compositeUTXO {
// large enough to fully fund the requested amount, if it exists. If the smaller
// set is insufficient, the single largest UTXO is returned. If instead the set
// of smaller UTXOs has enough total value, it will search for a subset that
// reaches the amount with least over-funding (see subsetSmallBias and
// subsetLargeBias). If that subset has less combined value than the single
// reaches the amount with least over-funding (see subsetWithLeastSumGreaterThan).
// If that subset has less combined value than the single
// sufficiently-large UTXO (if it exists), the subset will be returned,
// otherwise the single UTXO will be returned.
//
@@ -176,11 +170,7 @@ func leastOverFund(amt uint64, utxos []*compositeUTXO) []*compositeUTXO {
// Find a subset of the small UTXO set with smallest combined amount.
var set []*compositeUTXO
if sumUTXOs(small) >= amt {
set = subsetLargeBias(amt, small)
setX := subsetSmallBias(amt, small)
if sumUTXOs(setX) < sumUTXOs(set) {
set = setX
}
set = subsetWithLeastSumGreaterThan(amt, small)
} else if single != nil {
return []*compositeUTXO{single}
}
128 changes: 10 additions & 118 deletions client/asset/dcr/coin_selection_test.go
@@ -8,110 +8,6 @@ import (
walletjson "decred.org/dcrwallet/v2/rpc/jsonrpc/types"
)

func Test_subsetLargeBias(t *testing.T) {
amt := uint64(10e8)
newU := func(amt float64) *compositeUTXO {
return &compositeUTXO{
rpc: &walletjson.ListUnspentResult{Amount: amt},
}
}
tests := []struct {
name string
utxos []*compositeUTXO
want []*compositeUTXO
}{
{
"1,3 exact",
[]*compositeUTXO{newU(1), newU(8), newU(9)},
[]*compositeUTXO{newU(1), newU(9)},
},
{
"subset large bias",
[]*compositeUTXO{newU(1), newU(3), newU(6), newU(7)},
[]*compositeUTXO{newU(3), newU(7)},
},
{
"subset large bias",
[]*compositeUTXO{newU(1), newU(3), newU(5), newU(7), newU(8)},
[]*compositeUTXO{newU(3), newU(8)},
},
{
"insufficient",
[]*compositeUTXO{newU(1), newU(8)},
nil,
},
{
"all exact",
[]*compositeUTXO{newU(1), newU(9)},
[]*compositeUTXO{newU(1), newU(9)},
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
if got := subsetLargeBias(amt, tt.utxos); !reflect.DeepEqual(got, tt.want) {
t.Errorf("subset() = %v, want %v", got, tt.want)
}
})
}
}

func Test_subsetSmallBias(t *testing.T) {
amt := uint64(10e8)
newU := func(amt float64) *compositeUTXO {
return &compositeUTXO{
rpc: &walletjson.ListUnspentResult{Amount: amt},
}
}
tests := []struct {
name string
utxos []*compositeUTXO
want []*compositeUTXO
}{
{
"1,3",
[]*compositeUTXO{newU(1), newU(8), newU(9)},
[]*compositeUTXO{newU(8), newU(9)},
},
{
"subset",
[]*compositeUTXO{newU(1), newU(9), newU(11)},
[]*compositeUTXO{newU(1), newU(9)},
},
{
"subset small bias",
[]*compositeUTXO{newU(1), newU(3), newU(6), newU(7)},
[]*compositeUTXO{newU(1), newU(3), newU(6)},
},
{
"subset small bias",
[]*compositeUTXO{newU(1), newU(3), newU(5), newU(7), newU(8)},
[]*compositeUTXO{newU(5), newU(7)},
},
{
"ok nil",
[]*compositeUTXO{newU(1), newU(8)},
nil,
},
{
"two, over",
[]*compositeUTXO{newU(5), newU(7), newU(11)},
[]*compositeUTXO{newU(5), newU(7)},
},
{
"insufficient",
[]*compositeUTXO{newU(1), newU(8)},
nil,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
if got := subsetSmallBias(amt, tt.utxos); !reflect.DeepEqual(got, tt.want) {
t.Errorf("subset() = %v, want %v", got, tt.want)
}
})
}
}

func Test_leastOverFund(t *testing.T) {
amt := uint64(10e8)
newU := func(amt float64) *compositeUTXO {
Member

Thanks for this!

I guess the test cases for the dumber leastOverFund were too easy? We chatted a bit about replacing the algo, and it sounded like the new one got better results ~1/2 the time. Would it take much larger sets to test a case where it gives a better result?

Contributor Author

The better results ~1/2 the time were with the dynamic programming solution, which only worked with very small numbers of UTXOs. When I tested this one with large numbers of UTXOs and compared it with the old solution, it got better results every single time.

The two solutions (this one vs. the small- and large-bias subsets) work the same way in principle, but the old one tries only 2 candidate subsets and this one tries 1000.
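The dynamic programming solution mentioned above isn't shown in this thread. As an assumption, here is one textbook exact approach (0/1 subset sum over reachable totals) that returns the least achievable sum >= amt; exactLeastSumAtLeast is a hypothetical name. Its table has one entry per atom up to the combined value of the inputs, so this particular sketch is only practical for small inputs, but it can serve as an oracle when checking the randomized selection on small sets.

```go
package main

import "fmt"

// exactLeastSumAtLeast returns the smallest subset sum of amounts that is
// >= amt, with ok=false if even the full set falls short. It tracks every
// reachable subset sum up to the total, so memory grows with the values.
func exactLeastSumAtLeast(amt uint64, amounts []uint64) (sum uint64, ok bool) {
	var total uint64
	for _, v := range amounts {
		total += v
	}
	if total < amt {
		return 0, false
	}
	// reachable[s] reports whether some subset of amounts sums to exactly s.
	reachable := make([]bool, total+1)
	reachable[0] = true
	for _, v := range amounts {
		if v == 0 {
			continue // a zero amount never changes a sum
		}
		// Walk downward so each amount is used at most once per subset.
		for s := total; s >= v; s-- {
			if reachable[s-v] {
				reachable[s] = true
			}
		}
	}
	for s := amt; s <= total; s++ {
		if reachable[s] {
			return s, true
		}
	}
	return 0, false // not reached in practice: total itself is always reachable
}

func main() {
	fmt.Println(exactLeastSumAtLeast(10, []uint64{1, 3, 6, 7})) // 10 true
}
```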

Member

Tests well: ~16 ms for a set of 4000 UTXOs between 0 and 2 DCR, searching for a subset totaling 10 DCR.

func Test_subsetFund(t *testing.T) {
	amt := uint64(10e8)
	s := make([]*compositeUTXO, 4000)
	newU := func(amt float64) *compositeUTXO {
		return &compositeUTXO{
			rpc: &walletjson.ListUnspentResult{Amount: amt},
		}
	}
	for i := range s {
		v := rand.Float64() + float64(rand.Int63n(2))
		v = math.Round(v*1e8) / 1e8
		fmt.Println(v)
		s[i] = newU(v)
	}
	sort.Slice(s, func(i, j int) bool {
		return s[i].rpc.Amount < s[j].rpc.Amount
	})
	r := subsetWithLeastSumGreaterThan(amt, s)
	fmt.Println(len(r), sumUTXOs(r))
}

@@ -190,22 +86,18 @@ func Test_leastOverFund(t *testing.T) {
}

func Fuzz_leastOverFund(f *testing.F) {
seeds := []struct {
type seed struct {
amt uint64
n int
}{{
amt: 200,
n: 2,
}, {
amt: 20,
n: 1,
}, {
amt: 20,
n: 20,
}, {
amt: 2,
n: 40,
}}
}

seeds := make([]seed, 0, 40)
for i := 0; i < 100; i++ {
seeds = append(seeds, seed{
amt: uint64(rand.Intn(40)),
n: rand.Intn(65000),
})
}

for _, seed := range seeds {
f.Add(seed.amt, seed.n)