Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

master_sync_reply is racy #266

Closed
cdecker opened this issue Sep 7, 2017 · 1 comment
Closed

master_sync_reply is racy #266

cdecker opened this issue Sep 7, 2017 · 1 comment
Assignees
Labels

Comments

@cdecker
Copy link
Member

cdecker commented Sep 7, 2017

master_sync_reply in channeld assumes that there is only one such call pending at any point in time. This may not always be true, since we have 3 places in which this is called in response to timers and/or incoming messages from the peer. io_wait is used on the peer messages, so concurrent peer triggered calls should be safe, but if we have a commit timer running, followed by a call triggered by a peer message we'll run into the following error:

lightning_channeld: channeld/channel.c:518: master_sync_reply: Assertion `!peer->handle_master_reply' failed.

This'll kill channeld, requiring a reconnect.

@rustyrussell
Copy link
Contributor

Yep, I hit the same bug stress testing.

rustyrussell added a commit to rustyrussell/lightning that referenced this issue Sep 8, 2017
We hit:
	assert(!peer->handle_master_reply);

#4  0x000055bba3b030a0 in master_sync_reply (peer=0x55bba41c0030, 
    msg=0x55bba41c6a80 "", replytype=WIRE_CHANNEL_GOT_COMMITSIG_REPLY, 
    handle=0x55bba3b041cf <handle_reply_wake_peer>) at channeld/channel.c:518
#5  0x000055bba3b049bc in handle_peer_commit_sig (conn=0x55bba41c10d0, 
    peer=0x55bba41c0030, msg=0x55bba41c6a80 "") at channeld/channel.c:959
#6  0x000055bba3b05c69 in peer_in (conn=0x55bba41c10d0, peer=0x55bba41c0030, 
    msg=0x55bba41c67c0 "") at channeld/channel.c:1339
#7  0x000055bba3b123eb in peer_decrypt_body (conn=0x55bba41c10d0, 
    pcs=0x55bba41c0030) at common/cryptomsg.c:155
#8  0x000055bba3b2c63b in next_plan (conn=0x55bba41c10d0, plan=0x55bba41c1100)
    at ccan/ccan/io/io.c:59

We got a commit_sig from the peer while waiting for the master to
reply to acknowledge the commitsig we want to send
(handle_sending_commitsig_reply).

The fix is to go always talk to the master synchronous, and not try to
process anything but messages from the master daemon.  This avoids the
whole class of problems.

There's a fairly simple way to do this, as ccan/io lets you override
its poll call: we process any outstanding master requests there, or
add the master fd to the pollfds array.

Fixes: ElementsProject#266
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants