Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

client/dcr: Improved UTXO selection algorithm #2169

Merged
merged 3 commits into from Feb 28, 2023

Conversation

martonp
Copy link
Contributor

@martonp martonp commented Feb 24, 2023

This updates the algorithm used to find the UTXOs with the least over fund. It does 1000 random selections of UTXOs, and picks the most optimal one.

Inspired by this:
https://github.com/bitcoin/bitcoin/blob/3015e0bca6bc2cb8beb747873fdf7b80e74d679f/src/wallet.cpp#L1129

This updates the algorithm used to find the UTXOs with the least
over fund. It does 1000 random selections of UTXOs, and picks
the most optimal one.
})
}
}

func Test_leastOverFund(t *testing.T) {
amt := uint64(10e8)
newU := func(amt float64) *compositeUTXO {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this!

I guess the test cases for the dumber leastOverFund were too easy? We chatted a bit about replacing the algo and it sounded like the new one got better results ~1/2 the time. Would it require much larger sets to test a case where it's a better result?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The better results ~1/2 the time was with the dynamic programming solution which only worked with very small amounts of UTXOs. When I tested this one with large amounts of UTXOs and compared it with the old solution, it got better results every single time.

The two solutions (this vs small and large bias subset) are equivalent, but the old one tries 2 random possibilities and this one tries 1000.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests well. ~16ms for a set of 4000 UTXOs between 0 and 2 DCR in search of a subset totaling 10DCR

func Test_subsetFund(t *testing.T) {
	amt := uint64(10e8)
	s := make([]*compositeUTXO, 4000)
	newU := func(amt float64) *compositeUTXO {
		return &compositeUTXO{
			rpc: &walletjson.ListUnspentResult{Amount: amt},
		}
	}
	for i := range s {
		v := rand.Float64() + float64(rand.Int63n(2))
		v = math.Round(v*1e8) / 1e8
		fmt.Println(v)
		s[i] = newU(v)
	}
	sort.Slice(s, func(i, j int) bool {
		return s[i].rpc.Amount < s[i].rpc.Amount
	})
	r := subsetWithLeastSumGreaterThan(amt, s)
	fmt.Println(len(r), sumUTXOs(r))
}

client/asset/dcr/coin_selection.go Outdated Show resolved Hide resolved
client/asset/dcr/coin_selection.go Outdated Show resolved Hide resolved
client/asset/dcr/coin_selection.go Outdated Show resolved Hide resolved
Comment on lines 86 to 88
iterations := 1000
for nRep := 0; nRep < iterations; nRep++ {
included := make([]bool, len(utxos))
Copy link
Member

@chappjc chappjc Feb 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the main bottleneck, causing ~2000 allocs per call. It may seem silly, but if we only allocate once and instead zero out the slice after each iteration, it screams. Copying memory is much faster than allocating memory.

diff --git a/client/asset/dcr/coin_selection.go b/client/asset/dcr/coin_selection.go
index af90c5a79..2be8a5dbc 100644
--- a/client/asset/dcr/coin_selection.go
+++ b/client/asset/dcr/coin_selection.go
@@ -80,21 +80,24 @@ func sumUTXOs(set []*compositeUTXO) (tot uint64) {
 // unused UTXOs until the total value is greater than or equal to amt.
 func subsetWithLeastSumGreaterThan(amt uint64, utxos []*compositeUTXO) []*compositeUTXO {
        best := uint64(1 << 62)
-       var bestIncluded *[]bool
+       var bestIncluded []bool
        bestNumIncluded := 0
 
-       iterations := 1000
+       rnd := rand.New(rand.NewSource(rand.Int63()))
+
+       included := make([]bool, len(utxos))
+
+       const iterations = 1000
        for nRep := 0; nRep < iterations; nRep++ {
-               included := make([]bool, len(utxos))
 
-               var found bool
                var nTotal uint64
                var numIncluded int
-               for nPass := 0; nPass < 2 && !found; nPass++ {
+       passes:
+               for nPass := 0; nPass < 2; nPass++ {
                        for i := 0; i < len(utxos); i++ {
                                var use bool
                                if nPass == 0 {
-                                       use = rand.Uint32()&1 == 1
+                                       use = rnd.Int63()&1 == 1
                                } else {
                                        use = !included[i]
                                }
@@ -105,15 +108,21 @@ func subsetWithLeastSumGreaterThan(amt uint64, utxos []*compositeUTXO) []*compos
                                        if nTotal >= amt {
                                                if nTotal < best || (nTotal == best && numIncluded < bestNumIncluded) {
                                                        best = nTotal
-                                                       bestIncluded = &included
+                                                       if bestIncluded == nil {
+                                                               bestIncluded = make([]bool, len(utxos))
+                                                       }
+                                                       copy(bestIncluded, included)
                                                        bestNumIncluded = numIncluded
-                                                       found = true
+                                                       break passes // next iter
                                                }
-                                               break
+                                               break // next pass
                                        }
                                }
                        }
                }
+               for i := range included {
+                       included[i] = false
+               }
        }
 
        if bestIncluded == nil {
@@ -121,7 +130,7 @@ func subsetWithLeastSumGreaterThan(amt uint64, utxos []*compositeUTXO) []*compos
        }
 
        set := make([]*compositeUTXO, 0, len(utxos))
-       for i, inc := range *bestIncluded {
+       for i, inc := range bestIncluded {
                if inc {
                        set = append(set, utxos[i])
                }

bench:

func Benchmark_subsetFund(b *testing.B) {
	amt := uint64(10e8)
	s := make([]*compositeUTXO, 4000)
	newU := func(amt float64) *compositeUTXO {
		return &compositeUTXO{
			rpc: &walletjson.ListUnspentResult{Amount: amt},
		}
	}

	for i := 0; i < b.N; i++ {
		b.StopTimer()

		for i := range s {
			v := rand.Float64() + float64(rand.Int63n(2))
			v = math.Round(v*1e8) / 1e8
			s[i] = newU(v)
		}
		sort.Slice(s, func(i, j int) bool {
			return s[i].rpc.Amount < s[i].rpc.Amount
		})

		b.StartTimer()

		leastOverFund(amt, s)
	}
}

before

Benchmark_subsetFund
Benchmark_subsetFund-32    	     727	   1637138 ns/op	 4152855 B/op	    2001 allocs/op

after

Benchmark_subsetFund
Benchmark_subsetFund-32    	    3734	    332022 ns/op	   46345 B/op	       4 allocs/op

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, looks better. I copied everything you had, except I only put one break passes in the outer if block.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

except I only put one break passes in the outer if block.

That would seem to change the original method then. I was just eliminating the found bool, but I think it behaves differently now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does seem to match the cpp now however. Was just a deviation before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The deviation was because I was considering that the slice was sorted in our case. In the cpp their array is not sorted and if they go over the required amount, they remove the element and keep trying. This actually works much better, but is a bit slower.

I've created a commit here that compares the results and shows the runtime of the shuffled one: cc28045
I think we should go for the slower but more accurate one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems strange to shuffle first, but OK, let's do that and follow the same approach of remove and keep trying if it goes over. This keep trying approach though would seem to make the best == amt break of the outer nRep loop more important though, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm yeah a best == amt break would definitely not hurt.

@chappjc chappjc added this to the 0.6 milestone Feb 24, 2023
rnd := rand.New(rand.NewSource(rand.Int63()))
included := make([]bool, len(utxos))
const iterations = 1000
for nRep := 0; nRep < iterations; nRep++ {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only other thing from the cpp that might make sense is to break nreps (this outermost loop) if we happen to hit nTotal == amt.

Comment on lines 94 to 101
for i := 0; i < len(utxos); i++ {
var use bool
if nPass == 0 {
use = rnd.Int63()&1 == 1
} else {
use = !included[i]
}
if use {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It still bugs me that there's no ternary operator in Go. The arguments against it are so lame.
We could do this, but it's less clear and concise: if (nPass == 0 && rnd.Int63()&1 == 1) || (nPass > 0 && !included[i]) {.
We'll never get a ternary and probably not a context.Merge either.

Copy link
Member

@JoeGruffins JoeGruffins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I made a benchmark test that you can throw in the tests if it looks ok:

func BenchmarkLeastOverFund(b *testing.B) {
	// Same amounts every time.
	rnd := rand.New(rand.NewSource(1))
	utxos := make([]*compositeUTXO, 2_000)
	for i := range utxos {
		utxo := &compositeUTXO{
			rpc: &walletjson.ListUnspentResult{
				Amount: float64(1+rnd.Int31n(100)) / float64(1e8),
			},
		}
		utxos[i] = utxo
	}
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		leastOverFund(10_000, utxos)
	}
}

@JoeGruffins
Copy link
Member

This is probably fine, but I noticed while writing the above test, if you allow zero amounts in the utxo values, they make it into the final utxos. I guess we can assume there will be no zero amounts however.

@chappjc chappjc merged commit 7363fe8 into decred:master Feb 28, 2023
@chappjc chappjc added the bonds fidelity bonds label Mar 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bonds fidelity bonds
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants