Skip to content

Conversation

yancyribbens
Copy link
Contributor

@yancyribbens yancyribbens commented Sep 11, 2022

The SelectCoinsBnB algorithm computes the waste score while it's trying to find a match. Then, if it finds a match, it computes the waste score a second time here. To simplify the code (particularly since this is a complex section) its easier to digest and understand if it does one thing, to find a possible selection. I don't think having it calculate the waste score twice adds much and we can depend on the test harness to make sure the waste score is correct.

@yancyribbens yancyribbens force-pushed the remove-redundant-waste-calculation branch 3 times, most recently from eeb4b5e to d785893 Compare September 12, 2022 08:57
@glozow glozow requested a review from murchandamus September 12, 2022 13:32
@DrahtBot
Copy link
Contributor

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #25932 (refactor: Simplify backtrack logic by yancyribbens)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@S3RK
Copy link
Contributor

S3RK commented Sep 14, 2022

There are two uses for waste metric in coin selection:

  1. BnB algorithm is designed to optimise for the waste. This is why keeping track of the current and best waste is required to make algorithm work
  2. waste is used to compare solutions produced by different algorithms. This is why there is a generic waste calculation for the SelectionResult

Also, the two definitions of waste are coincidentally equal right now, but they don't have to. There is a possibility that we will use additional heuristics for 2) without changing 1).

bool backtrack = false;
if (curr_value + curr_available_value < selection_target || // Cannot possibly reach target with the amount remaining in the curr_available_value.
curr_value > selection_target + cost_of_change || // Selected value is out of range, go back and try other branch
(curr_waste > best_waste && (utxo_pool.at(0).fee - utxo_pool.at(0).long_term_fee) > 0)) { // Don't select things which we know will be more wasteful if the waste is increasing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By removing this criteria for backtracking you significantly expand the search space that will lead a) to longer execution time b) overall less less solutions found due to hard cap of TOTAL_TRIES.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't change the search space. With this extra condition removed, this code path is used instead. And in the case where curr_waste > best_waste using the code path L98, nothing different happens because of this condition. Since best_waste is better than curr_waste, nothing happens and in both branches, backtrack=true is set. Running the benchmarks will show there's no difference.

Copy link
Contributor Author

@yancyribbens yancyribbens Sep 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@S3RK The search space is not changed. The code is now simpler, however, the change could be pulled from this PR and added to a new PR since it's independent of other improvements.

Copy link
Contributor

@S3RK S3RK Sep 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that without limitation to the number of iterations you'd get the same result with your algorithm. But we actually have a hard limit of TOTAL_TRIES. That means if there are more combinations of coins than TOTAL_TRIES we're not going to explore all of them. This is why we want to exit as early as possible, in particular this condition allows to cut off branches where we didn't reach the target yet but the solution is already worse than our current best.

curr_waste += (curr_value - selection_target); // This is the excess value which is added to the waste for the below comparison

// This is the excess waste for the current iteration.
curr_waste = (curr_value - selection_target);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just the excess not the waste. Check line42 in this file or GetSelectionWaste for the waste formula.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@S3RK shouldn't the variable be called excess_val or something instead of curr_waste then? To me either would make sense although as the variable is named right now, it doesn't seem to be called excess.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, waste Contains excess but there are other components to it

@yancyribbens
Copy link
Contributor Author

@S3RK thanks for taking a look and your comment. I understand the idea to formalize a scoring system so we can have a metric that's algorithm independent. However, the boolean check done here is all that's needed to ensure we minimize our UTXO set when fees are expensive:

(utxo_pool.at(0).fee - utxo_pool.at(0).long_term_fee) > 0

Which as I pointed out in a previous PR is a more confusing way of saying:
utxo_pool.at(0).fee > utxo_pool.at(0).long_term_fee

I guess we could leave the code as is where it independently computes the waste, although as you can see by this PR, it's not actually needed. IE if you notice, all tests still pass even though it no longer tracks the "waste score". Instead I suppose we get rid of the other waste calulation.

Also, the two definitions of waste are coincidentally equal right now, but they don't have to. There is a possibility that we will use additional heuristics for 2) without changing 1).

I think what you're saying is we don't acctually need this assert then to check for equality?

@S3RK
Copy link
Contributor

S3RK commented Sep 14, 2022

However, the boolean check done here is all that's needed to ensure we minimize our UTXO set when fees are expensive:

I addressed the question about the Boolean check in the inline thread. Here however, I want to warn against mis-representing the BnB algo. BnB doesn't just minimize UTXO set when fees are high, it optimizes for the waste metric that consists of two parts 1) excess 2) opportunity cost of spending inputs at long_term_fee_rate

I guess we could leave the code as is where it independently computes the waste, although as you can see by this PR, it's not actually needed. IE if you notice, all tests still pass even though it no longer tracks the "waste score".

If I remember the tests correctly, then this rather shows deficiency in tests rather than the correctness of implemented approach. 1) There is only one test where we hit TOTAL_TRIES limit and in this test we just verify there is no solution found 2) There are no tests asserting waste that has both components excess and input opportunity costs

I'd recommend you contribute these tests first before modifying the algorithm.

Instead I suppose we get rid of the other waste calulation.

The other waste calculation is used to set waste score for the BnB SelectionResult for the purpose of comparing it to other solution. See use-case 2 from my previous comment.

I think what you're saying is we don't acctually need this assert then to check for equality?

While I'd say we don't really need this assert, others might disagree and there's no harm in it.

@achow101
Copy link
Member

NACK

@S3RK is correct and this change would be detrimental. Waste in BnB is used as an optimizing parameter - it allows us to prune branches when searching. Removing it means that we are searching more branches (which impacts the iteration limit), and the change which makes it only the excess means that we may be pruning branches that we were not pruning previously, in addition to not pruning many branches that we would have.

@achow101 achow101 closed this Sep 14, 2022
@murchandamus
Copy link
Contributor

Yeah, @S3RK and @achow101 are correct. In a high-feerate situation (i.e. feerate > long_term_feerate_estimate) adding more inputs will only increase the waste. So if a partial input set already has a higher waste than the current best solution we cannot find a better solution (one with a lower waste) by adding more inputs. Thus, if the waste of a partial input set is higher than the best waste, we can skip the entire subtree from the search space. I believe this backtrack clause to be a major speed-up, because otherwise evaluating all possible subsets of the cut subtree would take up to exponential work in the size of the subtree.

I think that you were onto this optimization though:
#26104

@bitcoin bitcoin locked and limited conversation to collaborators Sep 15, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants