channeld: prioritize read from peer over write to peer #2297

Merged
merged 3 commits into ElementsProject:master from SimonVrouwe:26jan_channeld_read_peer_before_write on Jan 29, 2019

Conversation

3 participants
@SimonVrouwe
Collaborator

SimonVrouwe commented Jan 26, 2019

channeld: prioritize read from peer over write to peer

This solves (or at least reduces the probability of) a deadlock in channeld
when there is a lot of gossip traffic; see issue #2286. That issue is
almost identical to #1943 (a deadlock in openingd), and so is the fix.
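To make the failure mode concrete, here is a minimal poll(2) sketch of the priority rule in plain C (illustrative only, not the actual channeld code; handle_peer_msg and the outq buffer are stand-ins for channeld's real message handling and output queue): incoming peer data is always serviced before we flush more of our own output to the peer.

/* Minimal sketch, not channeld code: when the peer socket is both readable
 * and writable, drain the input first, so neither side ends up blocked on
 * write with a full socket buffer at each end. */
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static char outq[65536];          /* hypothetical pending-output queue */
static size_t outq_len;           /* (how it gets filled is elided here) */

static void handle_peer_msg(const char *buf, size_t len)
{
	/* Placeholder for real incoming-message handling. */
	printf("got %zu bytes from peer\n", len);
}

void service_peer_once(int peer_fd)
{
	struct pollfd pfd = { .fd = peer_fd, .events = POLLIN };
	char in[4096];
	ssize_t n;

	if (outq_len > 0)
		pfd.events |= POLLOUT;

	if (poll(&pfd, 1, -1) <= 0)
		return;

	/* Read has priority: handle the peer's traffic before pushing more
	 * of our own, then return so the next iteration re-polls. */
	if (pfd.revents & POLLIN) {
		n = read(peer_fd, in, sizeof(in));
		if (n > 0)
			handle_peer_msg(in, (size_t)n);
		return;
	}

	if ((pfd.revents & POLLOUT) && outq_len > 0) {
		n = write(peer_fd, outq, outq_len);
		if (n > 0) {
			memmove(outq, outq + n, outq_len - n);
			outq_len -= (size_t)n;
		}
	}
}

The point of returning straight after the read is that the next poll iteration re-checks readability first, so a burst of traffic from the peer can never be starved behind our own backlog.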

I did a simple test during startup, monitoring the socket Recv-Q and Send-Q at a 1-second interval using ss -4 -t -p | egrep "lightning|State". The node has one channel with satoshisplace, so I don't know whether this problem is specific to that node, but this solved the deadlock for me.
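For anyone who wants to repeat the measurement, here is a throwaway C wrapper around the same ss invocation, sampling once per second (illustrative only; the ss one-liner above is all that really matters):

/* Poll the kernel socket queues once per second, the same way the
 * ss one-liner does, keeping only the header and lightning sockets. */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	for (;;) {
		FILE *p = popen("ss -4 -t -p", "r");
		char line[512];

		if (!p)
			return 1;
		while (fgets(line, sizeof(line), p)) {
			if (strstr(line, "lightning") || strstr(line, "State"))
				fputs(line, stdout);
		}
		pclose(p);
		sleep(1);
	}
}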

Result after this commit:

State   Recv-Q Send-Q Local Address:Port                 Peer Address:Port                
ESTAB 0 107 192.168.1.166:44902 88.98.213.235:9735 users:(("lightning_conne",pid=30591,fd=6))
ESTAB 0 106833 192.168.1.166:44902 88.98.213.235:9735 users:(("lightning_chann",pid=30628,fd=3))
ESTAB 0 2242 192.168.1.166:44902 88.98.213.235:9735 users:(("lightning_chann",pid=30628,fd=3))
ESTAB 0 118931 192.168.1.166:44902 88.98.213.235:9735 users:(("lightning_chann",pid=30628,fd=3))
ESTAB 0 192867 192.168.1.166:44902 88.98.213.235:9735 users:(("lightning_chann",pid=30628,fd=3))
ESTAB 0 60478 192.168.1.166:44902 88.98.213.235:9735 users:(("lightning_chann",pid=30628,fd=3))
ESTAB 0 191260 192.168.1.166:44902 88.98.213.235:9735 users:(("lightning_chann",pid=30628,fd=3))
ESTAB 0 71770 192.168.1.166:44902 88.98.213.235:9735 users:(("lightning_chann",pid=30628,fd=3))
ESTAB 0 0 192.168.1.166:44902 88.98.213.235:9735 users:(("lightning_chann",pid=30628,fd=3))
ESTAB 0 65781 192.168.1.166:44902 88.98.213.235:9735 users:(("lightning_chann",pid=30628,fd=3))
ESTAB 0 68496 192.168.1.166:44902 88.98.213.235:9735 users:(("lightning_chann",pid=30628,fd=3))
ESTAB 0 622707 192.168.1.166:44902 88.98.213.235:9735 users:(("lightning_chann",pid=30628,fd=3))
ESTAB 0 2916 192.168.1.166:44902 88.98.213.235:9735 users:(("lightning_chann",pid=30628,fd=3))
ESTAB 0 84782 192.168.1.166:44902 88.98.213.235:9735 users:(("lightning_chann",pid=30628,fd=3))
ESTAB 0 24917 192.168.1.166:44902 88.98.213.235:9735 users:(("lightning_chann",pid=30628,fd=3))
ESTAB 0 36609 192.168.1.166:44902 88.98.213.235:9735 users:(("lightning_chann",pid=30628,fd=3))
ESTAB 0 10866 192.168.1.166:44902 88.98.213.235:9735 users:(("lightning_chann",pid=30628,fd=3))
ESTAB 0 794 192.168.1.166:44902 88.98.213.235:9735 users:(("lightning_chann",pid=30628,fd=3))
ESTAB 0 76226 192.168.1.166:44902 88.98.213.235:9735 users:(("lightning_chann",pid=30628,fd=3))
ESTAB 1245 5648 192.168.1.166:44902 88.98.213.235:9735 users:(("lightning_chann",pid=30628,fd=3))
ESTAB 0 630629 192.168.1.166:44902 88.98.213.235:9735 users:(("lightning_chann",pid=30628,fd=3))
ESTAB 0 626984 192.168.1.166:44902 88.98.213.235:9735 users:(("lightning_chann",pid=30628,fd=3))
ESTAB 0 486528 192.168.1.166:44902 88.98.213.235:9735 users:(("lightning_chann",pid=30628,fd=3))
ESTAB 0 373345 192.168.1.166:44902 88.98.213.235:9735 users:(("lightning_chann",pid=30628,fd=3))
ESTAB 0 0 192.168.1.166:44902 88.98.213.235:9735 users:(("lightning_chann",pid=30628,fd=3))
ESTAB 0 0 192.168.1.166:44902 88.98.213.235:9735 users:(("lightning_chann",pid=30628,fd=3))
ESTAB 0 0 192.168.1.166:44902 88.98.213.235:9735 users:(("lightning_chann",pid=30628,fd=3))
ESTAB 0 0 192.168.1.166:44902 88.98.213.235:9735 users:(("lightning_chann",pid=30628,fd=3))
ESTAB 0 0 192.168.1.166:44902 88.98.213.235:9735 users:(("lightning_chann",pid=30628,fd=3))
...

Result before this commit:

State   Recv-Q Send-Q Local Address:Port                 Peer Address:Port                
ESTAB 133145 124528 192.168.1.166:45256 88.98.213.235:9735 users:(("lightning_chann",pid=31325,fd=3))
ESTAB 249555 162176 192.168.1.166:45256 88.98.213.235:9735 users:(("lightning_chann",pid=31325,fd=3))
ESTAB 250835 193264 192.168.1.166:45256 88.98.213.235:9735 users:(("lightning_chann",pid=31325,fd=3))
ESTAB 250835 193264 192.168.1.166:45256 88.98.213.235:9735 users:(("lightning_chann",pid=31325,fd=3))
ESTAB 250835 193264 192.168.1.166:45256 88.98.213.235:9735 users:(("lightning_chann",pid=31325,fd=3))
ESTAB 250835 193264 192.168.1.166:45256 88.98.213.235:9735 users:(("lightning_chann",pid=31325,fd=3))
...
channeld: prioritize read from peer over (read from gossipd and) write to peer

This solves (or at least reduces the probability of) a deadlock in channeld
when there is a lot of gossip traffic; see issue #2286. That issue is
almost identical to #1943 (a deadlock in openingd), and so is the fix.

@SimonVrouwe SimonVrouwe force-pushed the SimonVrouwe:26jan_channeld_read_peer_before_write branch from a5d960b to 1ebe311 Jan 27, 2019

common: handle peer input before gossipd input (for closingd, openingd)
Similar to the previous "handle peer input before gossip input" change, this
fixes the same kind of potential deadlock in closingd and openingd, which use
peer_or_gossip_sync_read.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
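A rough sketch of that ordering (not the real peer_or_gossip_sync_read, just the priority rule it relies on): wait on both file descriptors, and when both have data, always return the peer fd first.

/* Illustrative only: peer input takes priority over gossipd input. */
#include <poll.h>

/* Returns the fd that should be read next, or -1 on poll error. */
int next_readable_fd(int peer_fd, int gossip_fd)
{
	struct pollfd pfds[2] = {
		{ .fd = peer_fd,   .events = POLLIN },
		{ .fd = gossip_fd, .events = POLLIN },
	};

	if (poll(pfds, 2, -1) <= 0)
		return -1;

	/* When both are readable, serve the peer before gossipd. */
	if (pfds[0].revents & POLLIN)
		return peer_fd;
	if (pfds[1].revents & POLLIN)
		return gossip_fd;
	return -1;
}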
@rustyrussell

Contributor

rustyrussell commented Jan 29, 2019

Added the same fix for the other daemons.

CHANGELOG.md: document deadlock fix.
Useful if others hit it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@cdecker

Member

cdecker commented Jan 29, 2019

ACK b6b9246

@cdecker cdecker merged commit f1a837e into ElementsProject:master Jan 29, 2019

2 checks passed

ackbot PR ack'd by cdecker
continuous-integration/travis-ci/pr The Travis CI build passed

@SimonVrouwe SimonVrouwe deleted the SimonVrouwe:26jan_channeld_read_peer_before_write branch Feb 4, 2019
