htlcswitch: remove the bucket batchReplayBucket #9929


Merged: 5 commits merged into lightningnetwork:master on Jun 20, 2025

Conversation

yyforyongyu (Member, PR author):

This PR simplifies the htlc reforwarding logic by dropping the batch replay bucket.

Context

When an htlc is received, we call processRemoteAdds, which invokes DecodeHopIterators to decode the onion.

lnd/htlcswitch/link.go

Lines 3771 to 3773 in 92a5d35

decodeResps, sphinxErr := l.cfg.DecodeHopIterators(
fwdPkg.ID(), decodeReqs,
)

We then create a Tx, which holds an in-memory batch created in BeginTxn, and process the onion in ProcessOnionPacket,

tx := p.router.BeginTxn(id, batchSize)

err = tx.ProcessOnionPacket(

ProcessOnionPacket then calls b.Put, which records the htlc (a simplified sketch follows the list):

  • if IsCommitted is true, error out.
  • if this is a new htlc, save its payment hash to the map b.replayCache, and use its seqNum as the key and a batchEntry as the value in the map b.entries. The batchEntry has the fields hashPrefix and cltv.
  • if this is not a new htlc, add the seqNum to b.ReplaySet, which is a map.
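
For illustration, a minimal sketch of that Put logic - the types and field names below are simplified stand-ins, not the sphinx package's actual API:

import "errors"

// batchEntry and memoryBatch roughly mirror the in-memory batch state
// described above.
type batchEntry struct {
    hashPrefix [20]byte // prefix of the hashed shared secret
    cltv       uint32   // expiry height, used later for garbage collection
}

type memoryBatch struct {
    IsCommitted bool
    ReplaySet   map[uint16]struct{}   // seqNums flagged as replays
    entries     map[uint16]batchEntry // new htlcs, keyed by seqNum
    replayCache map[[32]byte]struct{} // payment hashes seen in this batch
}

// put handles the three cases listed above.
func (b *memoryBatch) put(seqNum uint16, payHash [32]byte,
    prefix [20]byte, cltv uint32) error {

    // A batch that has already been committed must not be mutated.
    if b.IsCommitted {
        return errors.New("batch already committed")
    }

    // Already seen within this batch: record the seqNum as a replay so
    // the caller can fail it later.
    if _, ok := b.replayCache[payHash]; ok {
        b.ReplaySet[seqNum] = struct{}{}
        return nil
    }

    // A new htlc: cache it so Commit/PutBatch can persist it.
    b.replayCache[payHash] = struct{}{}
    b.entries[seqNum] = batchEntry{hashPrefix: prefix, cltv: cltv}
    return nil
}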

We then proceed to tx.Commit, which calls PutBatch under the hood,

packets, replays, err := tx.Commit()

It will also be a no-op if IsCommitted is true.

Now, in this PutBatch (sketched below),

  • we start with an empty ReplaySet,
  • query the batchReplayBucket; if b.ID, which is the fwdPkg.ID(), is found, we return the ReplaySet saved on disk - most of the time it's an empty set. This is also the mechanism that skips the replay check when reforwarding htlcs during a restart.
  • otherwise we iterate this memory batch via b.ForEach, walking the map b.entries:
    • we query the sharedHashes bucket; if we find an item there, we add this entry's seqNum to the ReplaySet. An item found in the sharedHashes bucket means it's a replay.
    • if we cannot find it in the sharedHashes bucket, we save the item's hashPrefix and cltv to disk.
  • we then merge the ReplaySet with the batch's internal replay set, b.ReplaySet,
  • and finally we save this ReplaySet on disk in the batchReplayBucket.
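
A hedged sketch of that PutBatch flow, reusing the memoryBatch stand-in above; the bucket names, encodings and helpers here are made up for illustration rather than copied from lnd's kvdb-backed DecayedLog:

import (
    "encoding/binary"

    bolt "go.etcd.io/bbolt"
)

// encodeReplaySet/decodeReplaySet are illustrative helpers: a replay
// set is stored as a flat list of big-endian uint16 seqNums.
func encodeReplaySet(s map[uint16]struct{}) []byte {
    buf := make([]byte, 0, 2*len(s))
    for seqNum := range s {
        var tmp [2]byte
        binary.BigEndian.PutUint16(tmp[:], seqNum)
        buf = append(buf, tmp[:]...)
    }
    return buf
}

func decodeReplaySet(buf []byte) map[uint16]struct{} {
    s := make(map[uint16]struct{}, len(buf)/2)
    for i := 0; i+2 <= len(buf); i += 2 {
        s[binary.BigEndian.Uint16(buf[i:i+2])] = struct{}{}
    }
    return s
}

// putBatch is a simplified pre-PR PutBatch: consult the batch replay
// bucket first, then the shared-hash bucket, and persist the resulting
// replay set under the batch ID.
func putBatch(tx *bolt.Tx, b *memoryBatch,
    id []byte) (map[uint16]struct{}, error) {

    batchReplayBkt := tx.Bucket([]byte("batch-replay"))
    sharedHashes := tx.Bucket([]byte("shared-hash"))

    // The batch was committed before under this fwdPkg ID: return the
    // replay set saved back then and skip the per-hash check entirely.
    // This is what made reforwarding after a restart work pre-PR.
    if replayBytes := batchReplayBkt.Get(id); replayBytes != nil {
        return decodeReplaySet(replayBytes), nil
    }

    replays := make(map[uint16]struct{})
    for seqNum, entry := range b.entries {
        // A hash prefix already on disk means this htlc is a replay.
        if sharedHashes.Get(entry.hashPrefix[:]) != nil {
            replays[seqNum] = struct{}{}
            continue
        }

        // First sighting: persist hashPrefix -> cltv so the decayed
        // log can garbage-collect the entry once the height passes.
        var cltv [4]byte
        binary.BigEndian.PutUint32(cltv[:], entry.cltv)
        if err := sharedHashes.Put(entry.hashPrefix[:], cltv[:]); err != nil {
            return nil, err
        }
    }

    // Merge in the replays already detected in memory, then persist
    // the merged set so re-running the same batch is idempotent.
    for seqNum := range b.ReplaySet {
        replays[seqNum] = struct{}{}
    }
    if err := batchReplayBkt.Put(id, encodeReplaySet(replays)); err != nil {
        return nil, err
    }

    return replays, nil
}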

The actual replay attack protection comes in the last step in ProcessOnionPacket, where we would check whether any of the decoded htlcs is found in the ReplaySet.

if replays.Contains(uint16(i)) {
log.Errorf("unable to process onion packet: %v",
sphinx.ErrReplayedPacket)

The Fix

We now drop this bucket and explicitly signal reforwarding during startup to skip the replay check (see the sketch below). This means:

  • no more extra data is saved on disk, and,
  • no new GC mechanism needed.

In combination with #9906 we can sunset this bucket completely.
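
A minimal sketch of how the new boolean gates the check; only the `!reforward && replays.Contains(...)` condition mirrors the actual diff, the rest (types and the helper) is illustrative:

// replaySet stands in for the sphinx replay set used in the link.
type replaySet map[uint16]struct{}

func (r replaySet) Contains(i uint16) bool {
    _, ok := r[i]
    return ok
}

// filterAdds keeps the indices of ADDs that may proceed. On the first
// forward (reforward == false) any index found in the replay set is
// dropped and failed back; on a reforward after a restart the replay
// check is skipped.
func filterAdds(indices []uint16, replays replaySet,
    reforward bool) []uint16 {

    kept := make([]uint16, 0, len(indices))
    for _, i := range indices {
        if !reforward && replays.Contains(i) {
            // Replayed htlc on the first pass: do not forward it.
            continue
        }
        kept = append(kept, i)
    }
    return kept
}

With replays = {1}, filterAdds([]uint16{0, 1, 2}, replays, false) keeps only 0 and 2, while the reforward call keeps all three - which is exactly the edge case discussed in the review threads below.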

@yyforyongyu added the database, htlcswitch, bug fix and size/micro labels on Jun 11, 2025
// failure error code. We infer that the offending error was due
// to a replayed packet because this index was found in the
// replay set.
if !reforward && replays.Contains(uint16(i)) {
@ziggie1984 (Collaborator) commented Jun 11, 2025:

but what if it is a reforward and was a replay in the first run - we would now pass it through and forward it?

yyforyongyu (Member, PR author):

yeah i thought about it too, unless the peer can crash all the nodes down the path I don't think the attack can succeed tho. The first time a replayed htlc is received, the node will fail back and wait for the remote's revoke_and_ack. Unless the node crashes and restarts to reload the htlc pkg, the replayed htlc won't be forwarded. Even if this edge case happens, the next hop will fail it, so I think it's extremely unlikely? btw lnd_sweep_test.go is a good place to observe the overall flow.

Collaborator:

It does not need to crash - the link only has to flap and then we are in that situation.

yyforyongyu (Member, PR author):

Can you elaborate?

Collaborator:

this is called when a Link starts:

lnd/htlcswitch/link.go

Lines 1327 to 1328 in 35102e7

if l.ShortChanID() != hop.Source {
err := l.resolveFwdPkgs(ctx)
which means if a link flaps this is called again.

yyforyongyu (Member, PR author):

explained in this comment

@ziggie1984 (Collaborator) left a comment:

It seems like an acceptable way forward, however we are accepting a case where we might flap on the incoming link and therefore would reforward a potential "replayed" HTLC (which will very likely be detected by the next node as a replay if it is LND). It might be very unlikely that this happens, so I am ok with going with this approach. The proper solution would be to add a new Relayed-Fwd-Filter into the forwarding logic and query it when the above happens.


}
}

// If the fwdPkg has already been processed, it means we are
// reforwarding the packets again, which happens only on a restart.
Collaborator:

On restart of the channel, not on restart of the LND server - or do you mean the same?

yyforyongyu (Member, PR author):

restart of lnd, tho link restart is a superset?

Collaborator:

yes but what I wanted to say is, flapping of a link is way more likely than crashing a node, so I just want to make sure we take this into account when talking about the edge case. But then can you say it happens only on restart of the link?

yyforyongyu (Member, PR author):

Yes, if it's just the flapping of a link, it means it will go through the full cycle of start/stop, which has all the necessary data persisted. If a replay has been detected before, we will call sendMalformedHTLCError to store it in updateLogs, and update the commitment via AppendRemoteCommitChain, which will remove the Adds. Then, if the link restarts, it will restore the state, but the fwdPkg will have empty logs, as shown in the experiment here yyforyongyu#47. So as long as the link is going through normal restarts, we are protected. The only possible way is that somehow the remote node can crash our node, causing us to skip AppendRemoteCommitChain before the link shuts down, which is a different level of assumption I think, also don't think it's feasible.

@ziggie1984 (Collaborator) commented Jun 13, 2025:

There is a time period between processing the incoming ADD and updating the channel via AppendRemoteCommitChain (which is called within the ChannelLink object) where we might run into the scenario described above.

Scenario:

  1. Incoming replayed ADD comes in on the incoming link, we process it (DecodeHopIterator etc.)
  2. We figure out that it is a replay.
  3. Call sendMalformedHTLCError.
  4. Before we can call AppendRemoteCommitChain (batch timer hits => default 50ms) => the link flaps.
  5. The malformed msg is never persisted in the state => meaning we will forward the incoming replayed ADD the next time we start up the link.

yyforyongyu (Member, PR author):

I think it will be a good argument if we assume the user will use a different batch commit interval as 50ms is a bit too short. That being said I think we do need to perform a commitment update, which is missing from this method when we fail some of the ADDs in processRemoteAdds.

The replay check also seems to be performed a bit late - we should instead perform this check when we receive an update_add_htlc, not revoke_and_ack. I'll see how easy it is to change that.

@ziggie1984 (Collaborator) commented Jun 13, 2025:

I would keep it as is, I would propose that we add a comment that we are aware of this edge case and are willing to accept it (see my other comment #9929 (comment))

yyforyongyu (Member, PR author):

Actually I don't think it relies on the batch ticker as there will be an immediate commitment update after calling processRemoteAdds,

lnd/htlcswitch/link.go

Lines 2541 to 2568 in 35102e7

if l.quiescer.CanSendUpdates() {
l.processRemoteAdds(fwdPkg)
} else {
l.quiescer.OnResume(func() {
l.processRemoteAdds(fwdPkg)
})
}
l.processRemoteSettleFails(fwdPkg)
// If the link failed during processing the adds, we must
// return to ensure we won't attempted to update the state
// further.
if l.failed {
return
}
// The revocation window opened up. If there are pending local
// updates, try to update the commit tx. Pending updates could
// already have been present because of a previously failed
// update to the commit tx or freshly added in by
// processRemoteAdds. Also in case there are no local updates,
// but there are still remote updates that are not in the remote
// commit tx yet, send out an update.
if l.channel.OweCommitment() {
if !l.updateCommitTxOrFail(ctx) {
return
}
}

Think I will keep this PR as is and create a new one to address that we need to check replay when receiving update_add_htlc, maybe fix it in the dyn series.

Collaborator:

great analysis !

@yyforyongyu force-pushed the fix-decalylog branch 2 times, most recently from 2ce3831 to cd880a8, on June 11, 2025 22:30
@yyforyongyu (Member, PR author):

It seems like an acceptable way forward however we are accepting a case where we might flap on the incoming link and therefore would reforward a potential "replayed" HTLC (which very likely be detected by the next node as a replay if it is LND).

I think it's better than keeping the batch replay db and then adding another key to bloat the disk. Also, can you think of a case where a replay can happen without crashing the node?


DecodeHopIterators func([]byte, []hop.DecodeHopIteratorRequest) (
// are always presented for the same identifier. The last boolean is
// used to decide whether this is a reforwarding or not - when it's
// reforwarding, we skip the replay check enforced in our decaly log.
Collaborator:

Here we should mention the edge case which we talked about, that in a very unlikely scenario an ADD can be reforwarded and, if it is a replay, it will be forwarded to the next node.

Comment on lines 318 to 320
// PutBatch accepts a pending batch of hashed secret entries to write to disk.
// Each hashed secret is inserted with a corresponding time value, dictating
// when the entry will be evicted from the log.
Collaborator:

I think we also need to properly clean up the underlying replay set. No batch is written here anymore, and the tx.Commit() call does not commit anything anymore. So I think we should clean that up in either this PR or a follow-up.

yyforyongyu (Member, PR author):

Discussed offline and @ziggie1984 will create a follow-up PR to clean this.

@ziggie1984 (Collaborator) left a comment:

LGTM

@saubyk saubyk added this to the v0.19.2 milestone Jun 12, 2025
// resulting and return it to ensure calls to put batch are
// idempotent.
replayBytes := batchReplayBkt.Get(b.ID)
if replayBytes != nil {
@ziggie1984 (Collaborator) commented Jun 12, 2025:

Just want to point out that for reforwards we will have more db roundtrips compared to the previous implementation if we have more than 1 incoming ADD in the fwd package.

Collaborator:

Will be fixed in a follow-up PR. I think we can skip the whole fetching of shared_secrets if it is a replay, because we will mark them as replayed anyway in the refwd case,

so it should probably be something like tx.Commit(reforward) => return all replays.

DecodeHopIterators func([]byte, []hop.DecodeHopIteratorRequest) (
// are always presented for the same identifier. The last boolean is
// used to decide whether this is a reforwarding or not - when it's
// reforwarding, we skip the replay check enforced in our decaly log.
Member:

decaly -> decay

yyforyongyu (Member, PR author):

fixed

@@ -3632,6 +3632,13 @@ func (l *channelLink) processRemoteSettleFails(fwdPkg *channeldb.FwdPkg) {
return
}

// Exit early if the fwdPkg is already processed.
if fwdPkg.State == channeldb.FwdStateCompleted {
Member:

Cool. This is fine as a short circuit: if the state is channeldb.FwdStateCompleted, then the check to fwdPkg.SettleFailFilter.Contains(uint16(i)) in the inner loop will always return true. Same for when we go to process adds.

reforward := fwdPkg.State != channeldb.FwdStateLockedIn
// If the fwdPkg has already been processed, it means we are
// reforwarding the packets again, which happens only on a restart.
reforward := fwdPkg.State == channeldb.FwdStateProcessed
Member:

Just defined in this commit, but the usage in the call site below was already present.

yyforyongyu (Member, PR author):

This is actually introduced in a previous commit, where we added reforward := fwdPkg.State != channeldb.FwdStateLockedIn, and since FwdStateCompleted is skipped, we can now only check fwdPkg.State == channeldb.FwdStateProcessed.

@@ -138,11 +132,6 @@ func (d *DecayedLog) initBuckets() error {
return ErrDecayedLogInit
}

_, err = tx.CreateTopLevelBucket(batchReplayBucket)
Member:

Should we modify this to instead always try to delete the bucket on start up? Or will we reserve that for a distinct PR that handles the optional migration?

yyforyongyu (Member, PR author):

Correct - the migration is done in #9945.

replayBytes := batchReplayBkt.Get(b.ID)
if replayBytes != nil {
replays = sphinx.NewReplaySet()
return replays.Decode(bytes.NewReader(replayBytes))
Member:

Why do we return replays at all, if we no longer decode the bytes from disk?

Member:

Nvm, the sharedHashes bucket is just the primary source of truth.

Is there any situation wherein by only consulting sharedHashes we'd let through a replay that was already captured in the batchReplayBkt bucket?

IIRC historically, all onion replay processing was serial. We then introduced the batch replay bucket and processing to speed things up. Compared to the shared hashes, the batch replay bucket uses an ID derived from the source channel ID and the height of the outbound commitment that locked in those changes.

In retrospect, it's clear now that we were duplicating data across both sources, as replaying an HTLC is defined as using the very same shared secret, and assuming the same sender, that forces re-use of the payment hash as it's included in the associated data.
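
For illustration, a fwdPkg-style ID in the spirit of "source channel ID + commitment height" - this is not lnd's exact encoding, just a sketch of the idea:

import "encoding/binary"

// fwdPkgID concatenates the source short channel ID and the height of
// the commitment that locked in the changes, yielding a key that is
// unique per batch rather than per payment hash.
func fwdPkgID(scid, height uint64) []byte {
    id := make([]byte, 16)
    binary.BigEndian.PutUint64(id[:8], scid)
    binary.BigEndian.PutUint64(id[8:], height)
    return id
}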

yyforyongyu (Member, PR author):

Is there any situation wherein by only consulting sharedHashes we'd let through a replay that was already captured in the batchReplayBkt bucket?

Don't think so. For a replay to happen it must use a new commitment height, which means it will create a new key to query batchReplayBkt, find nothing, and continue to query sharedHashes. So in the end sharedHashes is the source of truth. The batchReplayBkt is only helpful for a very specific case - the node receives replays, but goes offline before it persists its forwarding pkg. When it comes online again, it will reload the forwarding logs, then query batchReplayBkt instead of sharedHashes to find the replays and fail the HTLCs back.

In retrospect, it's clear now that we were duplicating data across both sources, as replaying an HTLC is defined as using the very same shared secret, and assuming the same sender, that forces re-use of the payment hash as it's included in the associated data.

Totally - I think eventually we can move the replay attack protection to be handled at the forwarding logs instead. Also, the sharedHashes entries are GCed based on their CLTV values, which means replays become possible again after CLTV blocks. So as long as we are within CLTV blocks we are protected.
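
For reference, a hedged sketch of that CLTV-based garbage collection, using the same illustrative bucket layout as the sketch in the PR description rather than lnd's actual DecayedLog code:

import (
    "encoding/binary"

    bolt "go.etcd.io/bbolt"
)

// gcExpiredHashes removes shared-hash entries whose CLTV expiry has
// passed; after that point a replay of the same htlc would no longer
// be detected, which is why the protection window is "CLTV blocks".
func gcExpiredHashes(tx *bolt.Tx, currentHeight uint32) error {
    sharedHashes := tx.Bucket([]byte("shared-hash"))

    c := sharedHashes.Cursor()
    for k, v := c.First(); k != nil; k, v = c.Next() {
        cltv := binary.BigEndian.Uint32(v)
        if cltv < currentHeight {
            if err := c.Delete(); err != nil {
                return err
            }
        }
    }

    return nil
}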

We now rely on the forwarding package's state to decide whether a given
packet is a reforwarding or not. If we know it's a reforwarding packet,
there's no need to check for replays in the `sharedHashes` bucket, which
behaves the same as if we are querying the `batchReplayBkt`.
This commit removes the `batchReplayBkt` as its only effect is to allow
reforwarding htlcs during startup.

Normally, for every incoming htlc added, its shared secret is used as the key saved into the `sharedHashBucket`, which is then used to check for replays. In addition, the fwdPkg's ID, which is SCID+height, is also saved to the bucket `batchReplayBkt`. Since replays of HTLCs cannot happen at the same commitment height, when a replay happens `batchReplayBkt` simply doesn't have this info, and we again rely on `sharedHashBucket` to detect it. This means most of the time the `batchReplayBkt` is a list of SCID+height keys with empty values.

The `batchReplayBkt` was previously used as a mechanism to detect reforwardings during startup - when reforwarding htlcs, we query this bucket, find the previously saved (usually empty) replay set, and thus know this is a reforwarding and skip the check in `sharedHashBucket`. Given that we now use a bool flag to explicitly skip the replay check, this bucket is no longer useful.
We now move the check earlier in the loop so we don't even need to
decode the hop for already processed ADDs.
@Roasbeef (Member) left a comment:

LGTM 🦩

@Roasbeef Roasbeef merged commit 1756f40 into lightningnetwork:master Jun 20, 2025
37 of 38 checks passed
@yyforyongyu yyforyongyu deleted the fix-decalylog branch June 21, 2025 06:59