Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replaceable txs fee bumping #2113

Merged
merged 24 commits into from
Jan 18, 2022
Merged

Replaceable txs fee bumping #2113

merged 24 commits into from
Jan 18, 2022

Conversation

t-bast
Copy link
Member

@t-bast t-bast commented Dec 21, 2021

Alright, a word of advice before we start: this PR is not for the faint of heart.
If you're thinking "I have 30 minutes to spare, let's give this a quick review", you're going to be disappointed.
This is the kind of PR that will keep you awake at night, wondering why you even got into bitcoin in the first place (maybe it wasn't that bad doing boring API plumbing work in a SaaS company after all!).

But if you're one of the brave few who like a challenge, clear up your agenda for the next few days, grab a lot of coffee, get into your cozy don't-disturb-me-until-I'm-done clothes, and read on. 😎 ⚡

huge pr

In this PR, we introduce a deadline to replaceable transactions, and do the best we can to get these transactions confirmed before we reach that deadline.

For htlc transactions, that deadline is the htlc expiry:

  • if you have the preimage, you must absolutely get your transaction confirmed before your peer can use the timeout path, otherwise you may lose funds
  • if you're using the timeout path, you must get your transaction confirmed as soon as possible, to get your money back even if your peer reveals the preimage late

For commitment transactions, there are two cases:

  • there are no htlc outputs you can claim: in that case there's no reason to rush, no funds are at risk, you can wait for a few blocks to see your commit tx confirmed
  • there are htlc outputs you can claim: in that case the commit tx must absolutely be confirmed before the closest htlc expiry, to ensure that you can then claim those htlcs

You may be a bit lost in all the actors in the publish package, so here is a rough explanation of why the existing layers are important. We need to evaluate the feerate based on the deadline and the current block height only once we're ready to broadcast the transaction: this means it has to be inside the ReplaceableTxPublisher actor, once all the timelocks and preconditions have been fulfilled. But then why do we also need a retry mechanism in the TxPublisher actor? There are a few reasons for that. The first reason is to handle broadcast failures: if our transaction is evicted from the mempool (e.g. replaced by an external transaction), this bubbles up to the TxPublisher, who will then choose whether we should retry or not (depending on whether the external conflicting transaction gets confirmed or not). The second reason is to handle node restarts: we then lose the intermediate publishing state, but we may have put some transactions in the mempool. The TxPublisher will ensure we try different feerates as we get closer to the deadline, and eventually replace the mempool transaction if its feerate was too low. A third reason is that deadlines can change: we may initially have no htlcs to claim from a commit tx, but then we later receive a preimage, and realize that we now need to make the commit tx confirm more quickly than we thought. This request will go through the TxPublisher (which is the entry point for channel requests), which can create a new ReplaceableTxPublisher to replace the pending one.

Each commit is self-contained a has a detailed commit message. I recommend reviewing commit-by-commit to see each step taken towards the end goal. However, after that it may make sense to review everything at once, to zoom out and ensure that this doesn't have any fundamental design flaw.

I thought about splitting this PR into multiple PRs, but the issue is that using only parts of it is less safe than the current master branch: you really need to have the fee-bumping mechanism (that comes in the last commit) to make what the previous commits do safe. Since we know of a few node operators running master on mainnet, I think it's better to merge everything at once.

The things we do in the ReplaceableTxFunder actor to reach the right feerate are very ugly, but bitcoind doesn't give us the tools yet to make it better (because we have inputs that are external to the bitcoind wallet). Please help me lobby for bitcoin/bitcoin#23201 (and a follow-up PR to add the same arguments to the bumpfee RPCs) to allow us to remove those horrible hacks!

@codecov-commenter
Copy link

codecov-commenter commented Dec 21, 2021

Codecov Report

Merging #2113 (25fa5e6) into master (27579a5) will decrease coverage by 0.16%.
The diff coverage is 84.27%.

❗ Current head 25fa5e6 differs from pull request most recent head 6d1b34e. Consider uploading reports for the commit 6d1b34e to get more accurate results

@@            Coverage Diff             @@
##           master    #2113      +/-   ##
==========================================
- Coverage   87.62%   87.46%   -0.17%     
==========================================
  Files         166      169       +3     
  Lines       12886    13139     +253     
  Branches      543      559      +16     
==========================================
+ Hits        11292    11492     +200     
- Misses       1594     1647      +53     
Impacted Files Coverage Δ
.../fr/acinq/eclair/blockchain/fee/FeeEstimator.scala 100.00% <ø> (ø)
...in/scala/fr/acinq/eclair/channel/ChannelData.scala 100.00% <ø> (ø)
...n/scala/fr/acinq/eclair/json/JsonSerializers.scala 87.87% <ø> (ø)
...e/src/main/scala/fr/acinq/eclair/BlockHeight.scala 60.00% <60.00%> (ø)
...ir/channel/publish/ReplaceableTxPrePublisher.scala 66.66% <66.66%> (ø)
...cinq/eclair/channel/publish/FinalTxPublisher.scala 84.21% <75.00%> (+1.71%) ⬆️
...clair/channel/publish/ReplaceableTxPublisher.scala 81.57% <81.31%> (-3.38%) ⬇️
...q/eclair/channel/publish/ReplaceableTxFunder.scala 86.91% <86.91%> (ø)
...cinq/eclair/channel/publish/MempoolTxMonitor.scala 88.88% <88.88%> (+6.61%) ⬆️
...la/fr/acinq/eclair/transactions/Transactions.scala 96.80% <93.75%> (+0.25%) ⬆️
... and 35 more

When a channel is force-closed, we tell the `TxPublisher` the block before
which the transaction should be confirmed. Then it will be up to the
`TxPublisher` to set the feerate appropriately and fee-bump whenever
it becomes necessary.
We previously evaluated the feerate when we issued the command to publish
a transaction. However, the transaction may actually be published many
blocks later (especially for HTLC-timeout transactions, for which we may
issue commands long before the timeout has been reached).

We change the command to include the deadline, and evaluate the
feerate to meet that deadline when we're ready to broadcast a valid
transaction.
We should retry replaceable transactions whenever we find a conflict in
the mempool: as we get closer to the deadline, we will try higher fees.

This ensures we either succeed in replacing the mempool transaction, or
it gets confirmed before the deadline, which is good as well.

In particular, this handles the case where we restart our node. The mempool
conflict can be a transaction that we submitted ourselves, but we've
lost the context in the restart. We want to keep increasing the fees of
that transaction as we get closer to the deadline, so we need this retry
mechanism.
This commit contains very minimal logic changes: it mostly moves code
from the ReplaceableTxPublisher to two new actors (the first one to check
preconditions, the second one to encapsulate the logic for funding and
signing replaceable transactions).

This makes it easier to understand the state machine involved in each of
these steps, and will make it easier to introduce fee-bumping by reusing
these actors with minimal changes.
@t-bast t-bast marked this pull request as ready for review January 7, 2022 16:48
@t-bast t-bast requested a review from pm47 January 7, 2022 16:48
Once we've published a first version of a replaceable transaction, at every
new block we evaluate whether we should RBF it.

When we choose to RBF, we try to reuse the previous wallet inputs and
simply lower the change amount. When that's not possible, we ask bitcoind
to add more wallet inputs.
@t-bast
Copy link
Member Author

t-bast commented Jan 10, 2022

I have made additional E2E tests on regtest, with pending htlc-success and htlc-timeout. I have run a modified version of eclair where I increased the feerate every minute (using the CurrentFeerates event and a fake CurrentBlockCount event, without creating new blocks so that transactions stay in the mempool).

The transactions are correctly regularly fee-bumped, the state machine seems to correctly handle both cases:

  • the previous tx is replaced by the fee-bumped one
  • the fee-bumped one fails to replace the previous tx (fee increment too low)

It is working ok, nothing strange is happening. When we run out of utxos, funding simply fails and sends event to the node operator (at that point eclair cannot do anything until the bitcoind wallet receives new utxos to use for fee-bumping).

@pm47
Copy link
Member

pm47 commented Jan 10, 2022

commando

Copy link
Member

@pm47 pm47 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review for 3b1537e.

* Add a BlockHeight type
* Rename deadline to confirmation target
* Add comments on HtlcTx type inconsistency
Copy link
Member

@pm47 pm47 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review for 9bfdd90.

* Remove toString on BlockHeight
* Add confirmBefore to ReplaceableTransactionWithInputInfo
* Add new codecs to add confirmation target to txs
Copy link
Member

@pm47 pm47 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review for cc65aee, not much to say.

Copy link
Member

@pm47 pm47 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass on d1209ad.

This avoids passing around fields that are unchanged and repeating a long
list of arguments in each behavior.
* Protect against herd effect on new blocks
* MempoolTxMonitor broadcasts tx status at every block
We only need a Stop message at the `FinalTxPublisher` and
`ReplaceableTxPublisher` level, their child actors will be automatically
stopped whenever these parent actors stop themselves.
Instead of letting the TxPublisher create a new, competing replaceable
attempt, we can simply tell the existing one to update its confirmation
target.
Copy link
Member

@pm47 pm47 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

Phew, that was something... but the end result is worth the effort!

Props for having gone the extra mile to facilitate the work of the reviewer. 👍

@t-bast
Copy link
Member Author

t-bast commented Jan 18, 2022

reviewers-mvp

@t-bast t-bast merged commit 40f7ff4 into master Jan 18, 2022
@t-bast t-bast deleted the tx-publisher-deadline branch January 18, 2022 17:50
@pm47
Copy link
Member

pm47 commented Jan 18, 2022

Me after a week-long deep dive

image

@t-bast t-bast mentioned this pull request Jan 19, 2022
t-bast added a commit that referenced this pull request Jan 19, 2022
This feature is now ready to be widely deployed.
The transaction format has been tested for a long time
(between Phoenix iOS and our node) and automatic
fee-bumping has been implemented in #2113.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants