Fix gossip code when channels are marked dying #7057

Merged: 12 commits into ElementsProject:master, Feb 12, 2024

@rustyrussell (Contributor) commented Feb 9, 2024

The "bad gossip" flakes were caused by the fact that:

  1. When we see a channel close, we simply mark the channel_announcement as dying, which means we don't propagate it.
  2. We did not maintain this dying bit in various cases:
    1. We did not set it on the node_announcement once all channels were dying.
    2. We did not unset it on the node_announcement if we saw another channel confirmed.
    3. Nor did we set it if a new channel_update came in on a dying channel (we still accept updates!).
    4. Nor did we preserve it in node_announcements if we had to move them when all preceding channels were eliminated.
    5. We did not set it on fresh node_announcements if all channels were dying.
    6. We did not ignore dying channel_announcements when figuring out if we had to move the node_announcement.

I added a dev sanity check which helped test all of these!
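
The invariant such a sanity check enforces can be sketched roughly as follows. This is a hypothetical Python illustration, not the gossipd code: the `Channel` and `NodeAnnouncement` types and both function names are assumptions made for the example, and the message text only imitates the style of the CI log line quoted later in this thread.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Channel:
    scid: str            # short channel id, illustrative only
    dying: bool = False  # assumed to be set once the channel close is seen


@dataclass
class NodeAnnouncement:
    node_id: str
    dying: bool = False


def expected_node_dying(channels: List[Channel]) -> bool:
    """A node_announcement should be dying exactly when every channel is dying."""
    # all([]) is True in Python, so require at least one channel explicitly.
    return bool(channels) and all(c.dying for c in channels)


def check_dying_flag(ann: NodeAnnouncement, channels: List[Channel]) -> Optional[str]:
    """Return None when the flag is consistent, otherwise a short complaint."""
    want = expected_node_dying(channels)
    if ann.dying == want:
        return None
    if want:
        return "node_announce says not dying, all channels dying"
    return "node_announce says dying, not all channels dying"
```

Running a check like this over the whole store after every change is expensive, which is presumably why it is a developer-only option.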

@rustyrussell added the gossip and flake (Known CI flakes) labels Feb 9, 2024
@rustyrussell added this to the v24.02 milestone Feb 9, 2024
@rustyrussell force-pushed the guilt/fix-dead-gossip branch 2 times, most recently from 1ec6143 to 6ca04bb, February 9, 2024 04:56
@cdecker force-pushed the guilt/fix-dead-gossip branch 2 times, most recently from 9f49ef6 to 8b10f13, February 9, 2024 13:50
@cdecker (Member) commented Feb 9, 2024

Looks like there are still some bad gossip flakes, as well as a new valgrind error:

==39528== Invalid read of size 8
==39528==    at 0x127C4C: gossmap_find_node (gossmap.c:265)
==39528==    by 0x116E6A: gossmap_manage_handle_get_txout_reply (gossmap_manage.c:643)
==39528==    by 0x113298: recv_req (gossipd.c:589)
==39528==    by 0x123E3E: handle_read (daemon_conn.c:35)
==39528==    by 0x285A73: next_plan (io.c:59)
==39528==    by 0x2866A8: do_plan (io.c:407)
==39528==    by 0x2866EA: io_ready (io.c:417)
==39528==    by 0x288A86: io_loop (poll.c:453)
==39528==    by 0x113549: main (gossipd.c:687)
==39528==  Address 0x4af16d8 is 72 bytes inside a block of size 136 free'd
==39528==    at 0x484B27F: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==39528==    by 0x28FA75: del_tree (tal.c:456)
==39528==    by 0x28FD49: tal_free (tal.c:521)
==39528==    by 0x116E03: gossmap_manage_handle_get_txout_reply (gossmap_manage.c:633)
==39528==    by 0x113298: recv_req (gossipd.c:589)
==39528==    by 0x123E3E: handle_read (daemon_conn.c:35)
==39528==    by 0x285A73: next_plan (io.c:59)
==39528==    by 0x2866A8: do_plan (io.c:407)
==39528==    by 0x2866EA: io_ready (io.c:417)
==39528==    by 0x288A86: io_loop (poll.c:453)
==39528==    by 0x113549: main (gossipd.c:687)
==39528==  Block was alloc'd at
==39528==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==39528==    by 0x28F444: allocate (tal.c:256)
==39528==    by 0x28FABF: tal_alloc_ (tal.c:463)
==39528==    by 0x11676C: gossmap_manage_channel_announcement (gossmap_manage.c:516)
==39528==    by 0x112494: handle_recv_gossip (gossipd.c:212)
==39528==    by 0x1126BF: connectd_req (gossipd.c:313)
==39528==    by 0x123E3E: handle_read (daemon_conn.c:35)
==39528==    by 0x285A73: next_plan (io.c:59)
==39528==    by 0x2866A8: do_plan (io.c:407)
==39528==    by 0x2866EA: io_ready (io.c:417)
==39528==    by 0x288A86: io_loop (poll.c:453)
==39528==    by 0x113549: main (gossipd.c:687)
==39528== 

And here is what I believe is the same use-after-free from the point of view of AddressSanitizer:

2024-02-09T14:24:42.5416291Z =================================================================
2024-02-09T14:24:42.5416885Z ==7714==ERROR: AddressSanitizer: heap-use-after-free on address 0x60d000000bd8 at pc 0x563e8722d857 bp 0x7ffcaeb931f0 sp 0x7ffcaeb929c0
2024-02-09T14:24:42.5417019Z READ of size 33 at 0x60d000000bd8 thread T0
2024-02-09T14:24:42.5417147Z =================================================================
2024-02-09T14:24:42.5417736Z ==7733==ERROR: AddressSanitizer: heap-use-after-free on address 0x60d000000628 at pc 0x56463f4ed857 bp 0x7fff2738ecd0 sp 0x7fff2738e4a0
2024-02-09T14:24:42.5417867Z READ of size 33 at 0x60d000000628 thread T0
2024-02-09T14:24:42.5418546Z     #0 0x563e8722d856 in __asan_memcpy (/home/runner/work/lightning/lightning/lightningd/lightning_gossipd+0x18a856) (BuildId: 8f255effbefb9a4176f1f884c54a8aca0f7e691a)
2024-02-09T14:24:42.5418924Z     #1 0x563e872e0fa0 in gossmap_find_node /home/runner/work/lightning/lightning/common/gossmap.c:265:17
2024-02-09T14:24:42.5419399Z     #2 0x563e87288127 in gossmap_manage_handle_get_txout_reply /home/runner/work/lightning/lightning/gossipd/gossmap_manage.c:643:28
2024-02-09T14:24:42.5419720Z     #3 0x563e8727633b in recv_req /home/runner/work/lightning/lightning/gossipd/gossipd.c:589:3
2024-02-09T14:24:42.5420070Z     #4 0x563e872d3f38 in handle_read /home/runner/work/lightning/lightning/common/daemon_conn.c:35:9
2024-02-09T14:24:42.5420385Z     #5 0x563e8756f445 in next_plan /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:59:9
2024-02-09T14:24:42.5420820Z     #6 0x563e87573b04 in do_plan /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:407:10
2024-02-09T14:24:42.5421139Z     #7 0x563e875730f3 in io_ready /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:417:8
2024-02-09T14:24:42.5421470Z     #8 0x563e8757f5de in io_loop /home/runner/work/lightning/lightning/ccan/ccan/io/poll.c:453:5
2024-02-09T14:24:42.5421814Z     #9 0x563e87276195 in main /home/runner/work/lightning/lightning/gossipd/gossipd.c:687:3
2024-02-09T14:24:42.5422122Z     #10 0x7fda12029d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
2024-02-09T14:24:42.5422404Z     #11 0x7fda12029e3f in __libc_start_main csu/../csu/libc-start.c:392:3
2024-02-09T14:24:42.5423064Z     #12 0x563e871ab6e4 in _start (/home/runner/work/lightning/lightning/lightningd/lightning_gossipd+0x1086e4) (BuildId: 8f255effbefb9a4176f1f884c54a8aca0f7e691a)
2024-02-09T14:24:42.5423075Z 
2024-02-09T14:24:42.5423462Z 0x60d000000bd8 is located 56 bytes inside of 136-byte region [0x60d000000ba0,0x60d000000c28)
2024-02-09T14:24:42.5423575Z freed by thread T0 here:
2024-02-09T14:24:42.5424215Z     #0 0x563e8722e282 in free (/home/runner/work/lightning/lightning/lightningd/lightning_gossipd+0x18b282) (BuildId: 8f255effbefb9a4176f1f884c54a8aca0f7e691a)
2024-02-09T14:24:42.5424548Z     #1 0x563e875a76e4 in del_tree /home/runner/work/lightning/lightning/ccan/ccan/tal/tal.c:456:9
2024-02-09T14:24:42.5424876Z     #2 0x563e875a6e00 in tal_free /home/runner/work/lightning/lightning/ccan/ccan/tal/tal.c:521:3
2024-02-09T14:24:42.5425349Z     #3 0x563e87287e7b in gossmap_manage_handle_get_txout_reply /home/runner/work/lightning/lightning/gossipd/gossmap_manage.c:633:2
2024-02-09T14:24:42.5425670Z     #4 0x563e8727633b in recv_req /home/runner/work/lightning/lightning/gossipd/gossipd.c:589:3
2024-02-09T14:24:42.5426013Z     #5 0x563e872d3f38 in handle_read /home/runner/work/lightning/lightning/common/daemon_conn.c:35:9
2024-02-09T14:24:42.5426320Z     #6 0x563e8756f445 in next_plan /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:59:9
2024-02-09T14:24:42.5426645Z     #7 0x563e87573b04 in do_plan /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:407:10
2024-02-09T14:24:42.5426956Z     #8 0x563e875730f3 in io_ready /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:417:8
2024-02-09T14:24:42.5427422Z     #9 0x563e8757f5de in io_loop /home/runner/work/lightning/lightning/ccan/ccan/io/poll.c:453:5
2024-02-09T14:24:42.5427733Z     #10 0x563e87276195 in main /home/runner/work/lightning/lightning/gossipd/gossipd.c:687:3
2024-02-09T14:24:42.5428037Z     #11 0x7fda12029d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
2024-02-09T14:24:42.5428043Z 
2024-02-09T14:24:42.5428175Z previously allocated by thread T0 here:
2024-02-09T14:24:42.5428845Z     #0 0x56463f4ed856 in __asan_memcpy (/home/runner/work/lightning/lightning/lightningd/lightning_gossipd+0x18a856) (BuildId: 8f255effbefb9a4176f1f884c54a8aca0f7e691a)
2024-02-09T14:24:42.5429217Z     #1 0x56463f5a0fa0 in gossmap_find_node /home/runner/work/lightning/lightning/common/gossmap.c:265:17
2024-02-09T14:24:42.5429696Z     #2 0x56463f548127 in gossmap_manage_handle_get_txout_reply /home/runner/work/lightning/lightning/gossipd/gossmap_manage.c:643:28
2024-02-09T14:24:42.5430020Z     #3 0x56463f53633b in recv_req /home/runner/work/lightning/lightning/gossipd/gossipd.c:589:3
2024-02-09T14:24:42.5430365Z     #4 0x56463f593f38 in handle_read /home/runner/work/lightning/lightning/common/daemon_conn.c:35:9
2024-02-09T14:24:42.5430678Z     #5 0x56463f82f445 in next_plan /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:59:9
2024-02-09T14:24:42.5430990Z     #6 0x56463f833b04 in do_plan /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:407:10
2024-02-09T14:24:42.5431309Z     #7 0x56463f8330f3 in io_ready /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:417:8
2024-02-09T14:24:42.5431628Z     #8 0x56463f83f5de in io_loop /home/runner/work/lightning/lightning/ccan/ccan/io/poll.c:453:5
2024-02-09T14:24:42.5432045Z     #9 0x56463f536195 in main /home/runner/work/lightning/lightning/gossipd/gossipd.c:687:3
2024-02-09T14:24:42.5432353Z     #10 0x7f5b34c29d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
2024-02-09T14:24:42.5432637Z     #11 0x7f5b34c29e3f in __libc_start_main csu/../csu/libc-start.c:392:3
2024-02-09T14:24:42.5433292Z     #12 0x56463f46b6e4 in _start (/home/runner/work/lightning/lightning/lightningd/lightning_gossipd+0x1086e4) (BuildId: 8f255effbefb9a4176f1f884c54a8aca0f7e691a)
2024-02-09T14:24:42.5433298Z 
2024-02-09T14:24:42.5433676Z 0x60d000000628 is located 56 bytes inside of 136-byte region [0x60d0000005f0,0x60d000000678)
2024-02-09T14:24:42.5433783Z freed by thread T0 here:
2024-02-09T14:24:42.5434428Z     #0 0x563e8722e52e in malloc (/home/runner/work/lightning/lightning/lightningd/lightning_gossipd+0x18b52e) (BuildId: 8f255effbefb9a4176f1f884c54a8aca0f7e691a)
2024-02-09T14:24:42.5434774Z     #1 0x563e875a5948 in allocate /home/runner/work/lightning/lightning/ccan/ccan/tal/tal.c:256:14
2024-02-09T14:24:42.5435127Z     #2 0x563e875a565f in tal_alloc_ /home/runner/work/lightning/lightning/ccan/ccan/tal/tal.c:463:17
2024-02-09T14:24:42.5435593Z     #3 0x563e8728547f in gossmap_manage_channel_announcement /home/runner/work/lightning/lightning/gossipd/gossmap_manage.c:516:8
2024-02-09T14:24:42.5435965Z     #4 0x563e8727a276 in handle_recv_gossip /home/runner/work/lightning/lightning/gossipd/gossipd.c:212:12
2024-02-09T14:24:42.5436307Z     #5 0x563e87279a7b in connectd_req /home/runner/work/lightning/lightning/gossipd/gossipd.c:313:3
2024-02-09T14:24:42.5436643Z     #6 0x563e872d3f38 in handle_read /home/runner/work/lightning/lightning/common/daemon_conn.c:35:9
2024-02-09T14:24:42.5436958Z     #7 0x563e8756f445 in next_plan /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:59:9
2024-02-09T14:24:42.5437274Z     #8 0x563e87573b04 in do_plan /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:407:10
2024-02-09T14:24:42.5437583Z     #9 0x563e875730f3 in io_ready /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:417:8
2024-02-09T14:24:42.5437920Z     #10 0x563e8757f5de in io_loop /home/runner/work/lightning/lightning/ccan/ccan/io/poll.c:453:5
2024-02-09T14:24:42.5438222Z     #11 0x563e87276195 in main /home/runner/work/lightning/lightning/gossipd/gossipd.c:687:3
2024-02-09T14:24:42.5438642Z     #12 0x7fda12029d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
2024-02-09T14:24:42.5438648Z 
2024-02-09T14:24:42.5439601Z SUMMARY: AddressSanitizer: heap-use-after-free (/home/runner/work/lightning/lightning/lightningd/lightning_gossipd+0x18a856) (BuildId: 8f255effbefb9a4176f1f884c54a8aca0f7e691a) in __asan_memcpy
2024-02-09T14:24:42.5439723Z Shadow bytes around the buggy address:
2024-02-09T14:24:42.5439933Z   0x0c1a7fff8120: fa fa fa fa fa fa fd fd fd fd fd fd fd fd fd fd
2024-02-09T14:24:42.5440130Z   0x0c1a7fff8130: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
2024-02-09T14:24:42.5440322Z   0x0c1a7fff8140: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
2024-02-09T14:24:42.5440524Z   0x0c1a7fff8150: fd fa fa fa fa fa fa fa fa fa fd fd fd fd fd fd
2024-02-09T14:24:42.5440710Z   0x0c1a7fff8160: fd fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa
2024-02-09T14:24:42.5440905Z =>0x0c1a7fff8170: fa fa fa fa fd fd fd fd fd fd fd[fd]fd fd fd fd
2024-02-09T14:24:42.5441095Z   0x0c1a7fff8180: fd fd fd fd fd fa fa fa fa fa fa fa fa fa fd fd
2024-02-09T14:24:42.5441278Z   0x0c1a7fff8190: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
2024-02-09T14:24:42.5441469Z   0x0c1a7fff81a0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
2024-02-09T14:24:42.5441758Z   0x0c1a7fff81b0: fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa
2024-02-09T14:24:42.5441934Z   0x0c1a7fff81c0: fa fa 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2024-02-09T14:24:42.5442175Z Shadow byte legend (one shadow byte represents 8 application bytes):
2024-02-09T14:24:42.5442274Z   Addressable:           00
2024-02-09T14:24:42.5442409Z   Partially addressable: 01 02 03 04 05 06 07 
2024-02-09T14:24:42.5442520Z   Heap left redzone:       fa
2024-02-09T14:24:42.5442765Z   Freed heap region:       fd
2024-02-09T14:24:42.5442875Z   Stack left redzone:      f1
2024-02-09T14:24:42.5442973Z   Stack mid redzone:       f2
2024-02-09T14:24:42.5443078Z   Stack right redzone:     f3
2024-02-09T14:24:42.5443182Z   Stack after return:      f5
2024-02-09T14:24:42.5443282Z   Stack use after scope:   f8
2024-02-09T14:24:42.5443378Z   Global redzone:          f9
2024-02-09T14:24:42.5443481Z   Global init order:       f6
2024-02-09T14:24:42.5443582Z   Poisoned by user:        f7
2024-02-09T14:24:42.5443682Z   Container overflow:      fc
2024-02-09T14:24:42.5443785Z   Array cookie:            ac
2024-02-09T14:24:42.5443884Z   Intra object redzone:    bb
2024-02-09T14:24:42.5443981Z   ASan internal:           fe
2024-02-09T14:24:42.5444092Z   Left alloca redzone:     ca
2024-02-09T14:24:42.5444192Z   Right alloca redzone:    cb
2024-02-09T14:24:42.5444276Z ==7714==ABORTING
2024-02-09T14:24:42.5444932Z     #0 0x56463f4ee282 in free (/home/runner/work/lightning/lightning/lightningd/lightning_gossipd+0x18b282) (BuildId: 8f255effbefb9a4176f1f884c54a8aca0f7e691a)
2024-02-09T14:24:42.5445263Z     #1 0x56463f8676e4 in del_tree /home/runner/work/lightning/lightning/ccan/ccan/tal/tal.c:456:9
2024-02-09T14:24:42.5445599Z     #2 0x56463f866e00 in tal_free /home/runner/work/lightning/lightning/ccan/ccan/tal/tal.c:521:3
2024-02-09T14:24:42.5446067Z     #3 0x56463f547e7b in gossmap_manage_handle_get_txout_reply /home/runner/work/lightning/lightning/gossipd/gossmap_manage.c:633:2
2024-02-09T14:24:42.5446384Z     #4 0x56463f53633b in recv_req /home/runner/work/lightning/lightning/gossipd/gossipd.c:589:3
2024-02-09T14:24:42.5446731Z     #5 0x56463f593f38 in handle_read /home/runner/work/lightning/lightning/common/daemon_conn.c:35:9
2024-02-09T14:24:42.5447044Z     #6 0x56463f82f445 in next_plan /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:59:9
2024-02-09T14:24:42.5447366Z     #7 0x56463f833b04 in do_plan /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:407:10
2024-02-09T14:24:42.5447675Z     #8 0x56463f8330f3 in io_ready /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:417:8
2024-02-09T14:24:42.5448000Z     #9 0x56463f83f5de in io_loop /home/runner/work/lightning/lightning/ccan/ccan/io/poll.c:453:5
2024-02-09T14:24:42.5448524Z     #10 0x56463f536195 in main /home/runner/work/lightning/lightning/gossipd/gossipd.c:687:3
2024-02-09T14:24:42.5448828Z     #11 0x7f5b34c29d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
2024-02-09T14:24:42.5448835Z 
2024-02-09T14:24:42.5448963Z previously allocated by thread T0 here:
2024-02-09T14:24:42.5449637Z     #0 0x56463f4ee52e in malloc (/home/runner/work/lightning/lightning/lightningd/lightning_gossipd+0x18b52e) (BuildId: 8f255effbefb9a4176f1f884c54a8aca0f7e691a)
2024-02-09T14:24:42.5449982Z     #1 0x56463f865948 in allocate /home/runner/work/lightning/lightning/ccan/ccan/tal/tal.c:256:14
2024-02-09T14:24:42.5450327Z     #2 0x56463f86565f in tal_alloc_ /home/runner/work/lightning/lightning/ccan/ccan/tal/tal.c:463:17
2024-02-09T14:24:42.5450796Z     #3 0x56463f54547f in gossmap_manage_channel_announcement /home/runner/work/lightning/lightning/gossipd/gossmap_manage.c:516:8
2024-02-09T14:24:42.5451169Z     #4 0x56463f53a276 in handle_recv_gossip /home/runner/work/lightning/lightning/gossipd/gossipd.c:212:12
2024-02-09T14:24:42.5451516Z     #5 0x56463f539a7b in connectd_req /home/runner/work/lightning/lightning/gossipd/gossipd.c:313:3
2024-02-09T14:24:42.5451848Z     #6 0x56463f593f38 in handle_read /home/runner/work/lightning/lightning/common/daemon_conn.c:35:9
2024-02-09T14:24:42.5452169Z     #7 0x56463f82f445 in next_plan /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:59:9
2024-02-09T14:24:42.5452485Z     #8 0x56463f833b04 in do_plan /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:407:10
2024-02-09T14:24:42.5452798Z     #9 0x56463f8330f3 in io_ready /home/runner/work/lightning/lightning/ccan/ccan/io/io.c:417:8
2024-02-09T14:24:42.5453240Z     #10 0x56463f83f5de in io_loop /home/runner/work/lightning/lightning/ccan/ccan/io/poll.c:453:5
2024-02-09T14:24:42.5453556Z     #11 0x56463f536195 in main /home/runner/work/lightning/lightning/gossipd/gossipd.c:687:3
2024-02-09T14:24:42.5453878Z     #12 0x7f5b34c29d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
2024-02-09T14:24:42.5453884Z 
2024-02-09T14:24:42.5454842Z SUMMARY: AddressSanitizer: heap-use-after-free (/home/runner/work/lightning/lightning/lightningd/lightning_gossipd+0x18a856) (BuildId: 8f255effbefb9a4176f1f884c54a8aca0f7e691a) in __asan_memcpy
2024-02-09T14:24:42.5454962Z Shadow bytes around the buggy address:
2024-02-09T14:24:42.5455172Z   0x0c1a7fff8070: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
2024-02-09T14:24:42.5455368Z   0x0c1a7fff8080: fd fd fa fa fa fa fa fa fa fa fd fd fd fd fd fd
2024-02-09T14:24:42.5455559Z   0x0c1a7fff8090: fd fd fd fd fd fd fd fd fd fd fd fd fa fa fa fa
2024-02-09T14:24:42.5455761Z   0x0c1a7fff80a0: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
2024-02-09T14:24:42.5455954Z   0x0c1a7fff80b0: fd fd fd fd fd fa fa fa fa fa fa fa fa fa fd fd
2024-02-09T14:24:42.5456148Z =>0x0c1a7fff80c0: fd fd fd fd fd[fd]fd fd fd fd fd fd fd fd fd fa
2024-02-09T14:24:42.5456334Z   0x0c1a7fff80d0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
2024-02-09T14:24:42.5456522Z   0x0c1a7fff80e0: fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa
2024-02-09T14:24:42.5456713Z   0x0c1a7fff80f0: fa fa fd fd fd fd fd fd fd fd fd fd fd fd fd fd
2024-02-09T14:24:42.5456897Z   0x0c1a7fff8100: fd fd fd fa fa fa fa fa fa fa fa fa fd fd fd fd
2024-02-09T14:24:42.5457083Z   0x0c1a7fff8110: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa fa
2024-02-09T14:24:42.5457322Z Shadow byte legend (one shadow byte represents 8 application bytes):
2024-02-09T14:24:42.5457421Z   Addressable:           00
2024-02-09T14:24:42.5457557Z   Partially addressable: 01 02 03 04 05 06 07 
2024-02-09T14:24:42.5457666Z   Heap left redzone:       fa
2024-02-09T14:24:42.5457766Z   Freed heap region:       fd
2024-02-09T14:24:42.5457875Z   Stack left redzone:      f1
2024-02-09T14:24:42.5457973Z   Stack mid redzone:       f2
2024-02-09T14:24:42.5458072Z   Stack right redzone:     f3
2024-02-09T14:24:42.5458178Z   Stack after return:      f5
2024-02-09T14:24:42.5458396Z   Stack use after scope:   f8
2024-02-09T14:24:42.5458493Z   Global redzone:          f9
2024-02-09T14:24:42.5458597Z   Global init order:       f6
2024-02-09T14:24:42.5458696Z   Poisoned by user:        f7
2024-02-09T14:24:42.5458794Z   Container overflow:      fc
2024-02-09T14:24:42.5458896Z   Array cookie:            ac
2024-02-09T14:24:42.5458994Z   Intra object redzone:    bb
2024-02-09T14:24:42.5459090Z   ASan internal:           fe
2024-02-09T14:24:42.5459194Z   Left alloca redzone:     ca
2024-02-09T14:24:42.5459291Z   Right alloca redzone:    cb
2024-02-09T14:24:42.5459375Z ==7733==ABORTING

Very detailed, but almost bludgeoning devs with verbosity xD

As far as I can see the bad gossip is not triggering directly, but causing the **BROKEN** postprocessing step to fail:

2024-02-09T14:32:05.2334599Z lightningd-4 2024-02-09T14:24:11.068Z **BROKEN** gossipd: Bad gossip order: node_announce at 629 says not dying, all channels dying

@rustyrussell (Contributor, Author)

Yep, trivial use-after-free! Obvious fix, may or may not prevent that broken bad gossip msg...

@rustyrussell (Contributor, Author)

Found one place I missed, added final patch with much debug info, so we will get more info if there's another failure in CI.

@rustyrussell force-pushed the guilt/fix-dead-gossip branch 3 times, most recently from 8e0c37a to 4de1a2b, February 11, 2024 05:42
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It prints a message to stderr, but actually it's fine with this version:

```
dump-gossipstore: UNKNOWN GOSSIP minor VERSION 14 (expected 12)
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
And fix up gossip_store backwards comment!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We want to be able to clear them, and fetch them.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
…ing.

This avoids us gossiping about nodes which don't have live channels.

Interestingly, we previously tested that we *did* gossip such node
announcements, and now we fix that test.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
…channel!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We can update dying channels, though it seems weird!  We accept gossip about them,
we just don't propagate it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
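
As a rough illustration of "accept but don't propagate", the sketch below stores every channel_update but tags it with its channel's dying state, so the broadcast path can skip it. The list-of-dicts store and both function names are hypothetical, chosen only for this example; they are not the gossip_store format.

```python
from typing import Dict, List


def store_channel_update(store: List[Dict], update: bytes, channel_dying: bool) -> Dict:
    """Append the update; it inherits the dying flag from its channel."""
    record = {"msg": update, "dying": channel_dying}
    store.append(record)
    return record


def messages_to_broadcast(store: List[Dict]) -> List[bytes]:
    """Only records that are not marked dying are offered to peers."""
    return [r["msg"] for r in store if not r["dying"]]
```

The update is still kept locally, so the node's own view of the channel stays current until the channel is removed.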
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We accept node_announcements on dying channels, but make sure we
set the dying flag if channels are all dying.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
…l_announcements.

We make sure a node_announcement is preceded by at least one channel_announcement,
but dying ones don't count (as they are not broadcast!).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
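
The ordering rule in the commit above could be checked along these lines. This is a sketch under stated assumptions: records are plain dicts in store order, the field names (`type`, `nodes`, `node`, `dying`) are invented for the example, and dying node_announcements are assumed to be exempt because they are not broadcast either.

```python
from typing import Dict, List, Optional


def first_unanchored_node_announcement(store: List[Dict]) -> Optional[int]:
    """Return the index of the first live node_announcement that is not
    preceded by a live channel_announcement for that node, or None."""
    nodes_with_live_channel = set()
    for idx, rec in enumerate(store):
        dying = rec.get("dying", False)
        if rec["type"] == "channel_announcement":
            # Dying channel_announcements are not broadcast, so they don't count.
            if not dying:
                nodes_with_live_channel.update(rec["nodes"])
        elif rec["type"] == "node_announcement":
            if not dying and rec["node"] not in nodes_with_live_channel:
                return idx
    return None
```

A peer that receives only the non-dying records would otherwise see a node_announcement for a node it has no channel for.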
…move last channel.

Normally, channels are marked dying, then 12 blocks later, removed.
But for local channels, we can access any spliced channel already, so
we remove them immediately from our local gossip.  This left a hole in
our logic, if that channel was the last one keeping a
node_announcement alive.

Solution is to unify with the "moved node_announcement" path.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
…yet.

This happens if:
1. The peer sets a timestamp filter to non-zero, and
2. We have a channel_announcement without a channel_update.

The timestamp is 0 as a placeholder as part of the recent gossip rework
(we used to hold these channel_announcement in memory, which was complex).

But this means we won't send it in this case, and if we later send the
channel_update, CI will complain about 'Bad gossip order'.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
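
To make the failure mode above concrete, here is a small sketch of a timestamp filter in the style of BOLT 7's `gossip_timestamp_filter` (a message passes when `first_timestamp <= timestamp < first_timestamp + timestamp_range`). The function names and the `PLACEHOLDER_TIMESTAMP` constant are hypothetical, and the lenient variant is only one possible way to avoid the problem, not necessarily what this PR does.

```python
PLACEHOLDER_TIMESTAMP = 0  # assumed marker for "no channel_update seen yet"


def in_filter_strict(timestamp: int, first_timestamp: int, timestamp_range: int) -> bool:
    """Plain range check: a placeholder of 0 fails any non-zero filter."""
    return first_timestamp <= timestamp < first_timestamp + timestamp_range


def in_filter_lenient(timestamp: int, first_timestamp: int, timestamp_range: int) -> bool:
    """Let placeholder-stamped channel_announcements through, so a later
    channel_update never arrives before its channel_announcement."""
    if timestamp == PLACEHOLDER_TIMESTAMP:
        return True
    return in_filter_strict(timestamp, first_timestamp, timestamp_range)
```

With a filter starting at, say, 1700000000, the strict check drops the placeholder-stamped announcement, while the lenient one sends it.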
@rustyrussell (Contributor, Author)

I found it! Our timestamp filtering, which we still kind of support, was omitting channels that don't have an update yet, which is a narrow window.

@cdecker merged commit e7f1f29 into ElementsProject:master Feb 12, 2024
36 checks passed
@cdecker deleted the guilt/fix-dead-gossip branch February 12, 2024 10:43