Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

blockheight bug #6434

Merged
merged 2 commits into from Jul 25, 2023
Merged

Conversation

niftynei
Copy link
Collaborator

At cold start, if your node is behind the blocktip and you've sent your peer a blockheight counter from the future, we shouldn't confuse ourselves with our rollback/replay.

Should fix flakes in CI that were spotting BROKEN blockheight updates.

Also includes small patch to use the same GRPC paths for the rm cleanup.

Copy link
Contributor

@rustyrussell rustyrussell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack bb2c900

Makefile Outdated
find contrib/pyln-grpc-proto/pyln/ -type f -name "*.py" -print0 | xargs -0 sed -i'.bak' -e 's/^import \(.*\)_pb2 as .*__pb2/from pyln.grpc import \1_pb2 as \1__pb2/g'
rm -f contrib/pyln-grpc-proto/pyln/*.py.bak
find $(GRPC_PATH)/ -type f -name "*.py" -print0 | xargs -0 sed -i'.bak' -e 's/^import \(.*\)_pb2 as .*__pb2/from pyln.grpc import \1_pb2 as \1__pb2/g'
rm -f $(GRPC_PATH)/*.py.bak
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this last line is the real fix, since the find kinda worked, but the rm didn't?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh no, I missed that the find uses a different path (up one level)... and there's even a comment, oops.

seems like the real fix would be to have rm do a recursive hunt also? ok let me try again here. rip.

At cold start, if your node is behind the blocktip and you've sent your
peer a blockheight counter from the future, we shouldn't confuse ourselves
with our rollback/replay.

Should fix flakes in CI that were spotting BROKEN blockheight updates.
Logs below from a previuos CI fail (edited for relative clarity)

The one that sasy "{ SENT_ADD_ACK_REVOCATION:111 }, our current 108` is
the tell; the last line is the node finally catching up to the tip.

In the test we get into this state by stopping and restarting the node.

```
2023-07-22T11:24:28.2754533Z lightningd-1 2023-07-22T11:19:34.188Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#2: Already have funding locked in
2023-07-22T11:24:28.2755486Z lightningd-1 2023-07-22T11:19:34.188Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-chan#2: attempting update blockheight a5b23dff5177badd6df725c
efeb83ceccbfc52dc64a16b38894a41f0ad8fa181
2023-07-22T11:24:28.2755778Z lightningd-1 2023-07-22T11:19:34.188Z DEBUG   lightningd: update_blockheight: height = 108
2023-07-22T11:24:28.2766210Z lightningd-1 2023-07-22T11:19:34.210Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#2: init LOCAL: remote_per_commit = 029563e7c898
5d8b95bdfe19e47e494bb8ec8d53ff4edb93f156be57667bfee8c9, old_remote_per_commit = 02bf3117c149d324361f0b418db8984b1e29af70c773eb2865a41ff7f583c7c9ed next_idx_local = 3 next_idx_remote = 3 revocations_recei
ved = 2 feerates { SENT_ADD_ACK_REVOCATION:3750 } range 253-150000 blockheights { SENT_ADD_ACK_REVOCATION:111 }, our current 108
2023-07-22T11:24:28.2768866Z lightningd-1 2023-07-22T11:19:34.211Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#2: peer_out WIRE_CHANNEL_REESTABLISH
2023-07-22T11:24:28.2769416Z lightningd-1 2023-07-22T11:19:34.211Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#2: billboard: Sent reestablish, waiting for the
irs
2023-07-22T11:24:28.2771115Z lightningd-1 2023-07-22T11:19:34.212Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#2: peer_in WIRE_CHANNEL_REESTABLISH
2023-07-22T11:24:28.2774150Z lightningd-1 2023-07-22T11:19:34.212Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#2: Got reestablish commit=3 revoke=2
2023-07-22T11:24:28.2776056Z lightningd-1 2023-07-22T11:19:34.212Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#2: next_revocation_number = 2
2023-07-22T11:24:28.2805639Z lightningd-1 2023-07-22T11:19:34.239Z **BROKEN** 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#2: current blockheight 108 less than last 111
2023-07-22T11:24:28.2823960Z lightningd-1 2023-07-22T11:19:34.240Z DEBUG   lightningd: Adding block 109: 5f67b6e110eb3c3457bea4fcf0d04ce9be90efeee5df8e083ed4266074ca911f
2023-07-22T11:24:28.2833154Z lightningd-1 2023-07-22T11:19:34.251Z **BROKEN** 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#2: current blockheight 108 less than last 111
2023-07-22T11:24:28.2833630Z lightningd-1 2023-07-22T11:19:34.252Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#2: Trying commit
2023-07-22T11:24:28.2834165Z lightningd-1 2023-07-22T11:19:34.252Z **BROKEN** 022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#2: current blockheight 108 less than last 111
2023-07-22T11:24:28.2835070Z lightningd-1 2023-07-22T11:19:34.252Z DEBUG   022d223620a359a47ff7f7ac447c85c46c923da53389221a0054c11c1e3ca31d59-channeld-chan#2: Can't send commit: nothing to send, feechange not wanted ({ SENT_ADD_ACK_REVOCATION:3750 }) blockheight not wanted ({ SENT_ADD_ACK_REVOCATION:111 })
2023-07-22T11:24:28.2835516Z lightningd-1 2023-07-22T11:19:34.350Z DEBUG   lightningd: Adding block 110: 5f43f3ac9d808e3a309720d1b0727a00d5a3d3ddca71d97401e233637e87639c
2023-07-22T11:24:28.2835962Z lightningd-1 2023-07-22T11:19:34.476Z DEBUG   lightningd: Adding block 111: 55b0d1e0a08ff6233e186e6735cb1cbec33e2b0a6e7d08f2622e8c1db30b54b9
```
Make this a bit less difficult to see what's going on; use common
variables for the paths etc.

Also make the final `rm` step recursive, same as the above find step
which patches the files.

Files are in `contrib/pyln-grpc-proto/pyln/grpc`, but the upper level
find command acts at a level above.
@niftynei
Copy link
Collaborator Author

Changes:

  • improved the commit message for the commit about the blockheight breaks
  • Updated the Makefile patch to use find which can do a recursive delete instead of rm glob matches, as to match the find that's creating the errant files in the first place (should be more robust if anything changes).

@rustyrussell rustyrussell merged commit f5bac83 into ElementsProject:master Jul 25, 2023
38 checks passed
@rustyrussell rustyrussell added this to the v23.08 milestone Jul 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants