plugin: Expand stuck HSM detection to cover onchaind signing types #724

Merged: cdecker merged 1 commit into main from detect-stuck-bgsync-nodes on Jun 3, 2026

@cdecker (Collaborator) commented Jun 3, 2026

Summary

  • is_stuck() previously only checked for SIGN_COMMITMENT_TX (5) and CHECK_PUBKEY (28), missing all onchaind signing types. A bgsync node stuck waiting for e.g. SIGN_ANY_REMOTE_HTLC_TO_US (143) would spin silently for the full 10-minute session window without advancing its blockheight.
  • Adds a stuck_request_types() -> Vec<u16> method that returns the blocking type numbers; is_stuck() now delegates to it.
  • Expands the set of request types treated as stuck to cover all onchaind signing operations: SIGN_DELAYED_PAYMENT_TO_US (12), SIGN_REMOTE_HTLC_TO_US (13), SIGN_PENALTY_TO_US (14), SIGN_REMOTE_HTLC_TX (20), SIGN_MUTUAL_CLOSE_TX (21), SIGN_ANY_DELAYED_PAYMENT_TO_US (142), SIGN_ANY_REMOTE_HTLC_TO_US (143), SIGN_ANY_PENALTY_TO_US (144), SIGN_ANY_LOCAL_HTLC_TX (146), SIGN_ANCHORSPEND (147), SIGN_HTLC_TX_MINGLE (149).
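
For readers without the diff at hand, the shape of the change can be sketched roughly as follows. This is an illustration only, not the merged code: `PendingRequests`, its `types` field, the `new` constructor, and the `STUCK_TYPES` constant are hypothetical names, and the sketch assumes pending requests are tracked by their numeric type. Only `is_stuck()`, `stuck_request_types() -> Vec<u16>`, and the type numbers come from this PR.

```rust
/// Request type numbers that indicate a node is blocked on the signer.
/// The values are those listed in this PR; the constant name is hypothetical.
const STUCK_TYPES: &[u16] = &[
    5,   // SIGN_COMMITMENT_TX (checked before this PR)
    28,  // CHECK_PUBKEY (checked before this PR)
    12,  // SIGN_DELAYED_PAYMENT_TO_US
    13,  // SIGN_REMOTE_HTLC_TO_US
    14,  // SIGN_PENALTY_TO_US
    20,  // SIGN_REMOTE_HTLC_TX
    21,  // SIGN_MUTUAL_CLOSE_TX
    142, // SIGN_ANY_DELAYED_PAYMENT_TO_US
    143, // SIGN_ANY_REMOTE_HTLC_TO_US
    144, // SIGN_ANY_PENALTY_TO_US
    146, // SIGN_ANY_LOCAL_HTLC_TX
    147, // SIGN_ANCHORSPEND
    149, // SIGN_HTLC_TX_MINGLE
];

/// Hypothetical stand-in for whatever structure tracks queued HSM requests.
pub struct PendingRequests {
    /// Numeric message type of each queued request, in arrival order.
    types: Vec<u16>,
}

impl PendingRequests {
    pub fn new(types: Vec<u16>) -> Self {
        Self { types }
    }

    /// Returns the type numbers of queued requests that count as blocking.
    pub fn stuck_request_types(&self) -> Vec<u16> {
        self.types
            .iter()
            .copied()
            .filter(|t| STUCK_TYPES.contains(t))
            .collect()
    }

    /// True if at least one blocking request is queued.
    pub fn is_stuck(&self) -> bool {
        !self.stuck_request_types().is_empty()
    }
}
```

Having `is_stuck()` delegate to `stuck_request_types()` keeps the type list in one place and lets callers log which request types are blocking, rather than only learning that something is.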

Context

Discovered by inspecting bgsync session logs for a node stuck 124k blocks behind chain tip. The node had a unilateral close on one of its channels; onchaind fired when block 828110 was added and queued a SIGN_ANY_REMOTE_HTLC_TO_US (143) request to claim an expired HTLC timeout output. The signerproxy could not fulfil it, so the node blocked for ~9 minutes until the 10-minute preemption kicked in — making zero block progress per session.
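
The early-exit behaviour this is meant to enable could look roughly like the decision function below. It is a sketch under assumptions: `SessionAction` and `next_action` are hypothetical names, the bgsync loop is assumed to poll for blocking request types periodically, and the 600-second limit used in the usage note simply mirrors the 10-minute window described above.

```rust
use std::time::Duration;

/// Hypothetical outcome of one check inside a bgsync session loop.
#[derive(Debug, PartialEq)]
pub enum SessionAction {
    /// Nothing is blocking and the window has not elapsed: keep syncing.
    Continue,
    /// The signer is needed but unavailable: end the session now and
    /// report which request types were blocking.
    EndEarly(Vec<u16>),
    /// The session window has elapsed: preempt as before.
    Preempt,
}

/// Decide what to do given the currently blocking request types
/// (e.g. the result of `stuck_request_types()`) and the elapsed session time.
pub fn next_action(stuck_types: Vec<u16>, elapsed: Duration, limit: Duration) -> SessionAction {
    if !stuck_types.is_empty() {
        SessionAction::EndEarly(stuck_types)
    } else if elapsed >= limit {
        SessionAction::Preempt
    } else {
        SessionAction::Continue
    }
}
```

With a limit of `Duration::from_secs(600)`, a node blocked on type 143 thirty seconds into a session would yield `EndEarly(vec![143])` instead of waiting out the remaining window.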

Test plan

  • Verify is_stuck() still returns false when no requests are queued
  • Verify stuck_request_types() returns the correct type numbers when onchaind signing requests are pending
  • Confirm bgsync sessions for nodes with open/closed channels terminate early when the signer is unavailable

🤖 Generated with Claude Code

`is_stuck()` previously only checked for SIGN_COMMITMENT_TX (5) and
CHECK_PUBKEY (28), missing all onchaind signing types. A node stuck on
SIGN_ANY_REMOTE_HTLC_TO_US (143) would spin for the full 10-minute
bgsync session without making progress.

Add a `stuck_request_types()` method returning the blocking type
numbers, and expand the set to include all onchaind signing types:
SIGN_DELAYED_PAYMENT_TO_US (12), SIGN_REMOTE_HTLC_TO_US (13),
SIGN_PENALTY_TO_US (14), SIGN_REMOTE_HTLC_TX (20),
SIGN_MUTUAL_CLOSE_TX (21), SIGN_ANY_DELAYED_PAYMENT_TO_US (142),
SIGN_ANY_REMOTE_HTLC_TO_US (143), SIGN_ANY_PENALTY_TO_US (144),
SIGN_ANY_LOCAL_HTLC_TX (146), SIGN_ANCHORSPEND (147),
SIGN_HTLC_TX_MINGLE (149).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@cdecker merged commit 4c699c1 into main on Jun 3, 2026
16 checks passed
@cdecker deleted the detect-stuck-bgsync-nodes branch on June 3, 2026 21:01