Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lightningd hanging, not opening RPC port, not syncing, reducing latest block by 15 #2687

Closed
svewa opened this issue May 28, 2019 · 8 comments · Fixed by #2690
Closed

lightningd hanging, not opening RPC port, not syncing, reducing latest block by 15 #2687

svewa opened this issue May 28, 2019 · 8 comments · Fixed by #2690

Comments

@svewa
Copy link

svewa commented May 28, 2019

I have c-lightning running for over a year now, but some time maybe two month ago I noticed it stopped and would not properly start again. I tried updating to 0.7.0 and now to latest git, does not help.

config file:

log-level=io
network=bitcoin

starting lightningd:

ln@c-lightning:~$ lightning/lightningd/lightningd 
2019-05-28T22:08:01.236Z plugin-autoclean[OUT] 7b226a736f6e727063223a22322e30222c226964223a312c226d6574686f64223a226765746d616e6966657374222c22706172616d73223a7b7d7d0a0a
2019-05-28T22:08:01.236Z plugin-pay[OUT] 7b226a736f6e727063223a22322e30222c226964223a302c226d6574686f64223a226765746d616e6966657374222c22706172616d73223a7b7d7d0a0a
2019-05-28T22:08:01.297Z lightningd(3740): testing /home/ln/lightning/lightningd/lightning_channeld
2019-05-28T22:08:01.301Z lightningd(3740): testing /home/ln/lightning/lightningd/lightning_closingd
2019-05-28T22:08:01.305Z lightningd(3740): testing /home/ln/lightning/lightningd/lightning_connectd
2019-05-28T22:08:01.311Z lightningd(3740): testing /home/ln/lightning/lightningd/lightning_gossipd
2019-05-28T22:08:01.315Z lightningd(3740): testing /home/ln/lightning/lightningd/lightning_hsmd
2019-05-28T22:08:01.319Z lightningd(3740): testing /home/ln/lightning/lightningd/lightning_onchaind
2019-05-28T22:08:01.323Z lightningd(3740): testing /home/ln/lightning/lightningd/lightning_openingd
2019-05-28T22:08:01.475Z lightning_hsmd(3750): pid 3750, msgfd 17
2019-05-28T22:08:01.508Z lightning_connectd(3751): pid 3751, msgfd 20
2019-05-28T22:08:01.509Z lightning_hsmd(3750): Client: Received message 11 from client
2019-05-28T22:08:01.509Z lightning_hsmd(3750): Client: Received message 9 from client
2019-05-28T22:08:01.509Z lightning_hsmd(3750): new_client: 0
2019-05-28T22:08:01.725Z lightning_connectd(3751): Created IPv6 listener on port 9735
2019-05-28T22:08:01.725Z lightning_connectd(3751): Failed to connect 10 socket: Network is unreachable
2019-05-28T22:08:01.725Z lightning_connectd(3751): Created IPv4 listener on port 9735
2019-05-28T22:08:01.725Z lightning_connectd(3751): REPLY WIRE_CONNECTCTL_INIT_REPLY with 0 fds
2019-05-28T22:08:01.727Z lightning_gossipd(3752): pid 3752, msgfd 19
2019-05-28T22:08:01.762Z lightningd(3740): Loaded 0 HTLC signatures from DB
2019-05-28T22:08:01.766Z lightningd(3740): Loaded 0 HTLC signatures from DB
2019-05-28T22:08:01.772Z lightningd(3740): Loaded 0 HTLC signatures from DB
2019-05-28T22:08:01.779Z lightningd(3740): Loaded 0 HTLC signatures from DB
2019-05-28T22:08:01.785Z lightningd(3740): Loaded 0 HTLC signatures from DB
2019-05-28T22:08:01.789Z lightningd(3740): Loaded 0 HTLC signatures from DB
2019-05-28T22:08:01.792Z lightningd(3740): Loaded 6 channels from DB
2019-05-28T22:08:01.793Z lightningd(3740): Loading HTLCs for channel 14298
2019-05-28T22:08:01.794Z lightningd(3740): Restored 0 incoming and 0 outgoing HTLCS
2019-05-28T22:08:01.794Z lightningd(3740): Loading HTLCs for channel 15388
2019-05-28T22:08:01.795Z lightningd(3740): Restored 0 incoming and 0 outgoing HTLCS
2019-05-28T22:08:01.796Z lightningd(3740): Loading HTLCs for channel 15425
2019-05-28T22:08:01.797Z lightningd(3740): Restored 0 incoming and 0 outgoing HTLCS
2019-05-28T22:08:01.798Z lightningd(3740): Loading HTLCs for channel 18348
2019-05-28T22:08:01.799Z lightningd(3740): Restored 0 incoming and 0 outgoing HTLCS
2019-05-28T22:08:01.799Z lightningd(3740): Loading HTLCs for channel 21447
2019-05-28T22:08:01.800Z lightningd(3740): Restored 0 incoming and 0 outgoing HTLCS
2019-05-28T22:08:01.801Z lightningd(3740): Loading HTLCs for channel 23172
2019-05-28T22:08:01.801Z lightningd(3740): Restored 0 incoming and 0 outgoing HTLCS
2019-05-28T22:08:02.525Z lightning_hsmd(3750): Client: Received message 9 from client
2019-05-28T22:08:02.526Z lightning_hsmd(3750): new_client: 0
2019-05-28T22:08:02.526Z lightning_hsmd(3750): Client: Received message 10 from client
2019-05-28T22:08:02.527Z lightning_hsmd(3750): Client: Received message 10 from client
2019-05-28T22:08:02.527Z lightning_gossipd(3752): total store load time: 0 msec
2019-05-28T22:08:02.528Z lightning_hsmd(3750): Client: Received message 10 from client
2019-05-28T22:08:02.528Z lightning_gossipd(3752): gossip_store: Read 0/0/0/0 cannounce/cupdate/nannounce/cdelete from store in 1 bytes
2019-05-28T22:08:02.528Z lightning_hsmd(3750): Client: Received message 10 from client
2019-05-28T22:08:02.528Z lightning_hsmd(3750): Client: Received message 10 from client
2019-05-28T22:08:02.529Z lightning_hsmd(3750): Client: Received message 10 from client
2019-05-28T22:08:03.684Z lightningd(3740): bitcoin-cli estimatesmartfee 2 CONSERVATIVE exited with status 1
2019-05-28T22:08:07.783Z lightningd(3740): Adding block 556142: 000000000000000000173e47871cbacd921ab24034ba9de80e40af84d3d703d5
@ZmnSCPxj
Copy link
Collaborator

The last thing I am seeing on strace is that we are polling on fds 5, 7, 17, 20, and 19.

  • 5 = pay plugin. Seems to have gotten its manifest out and is now waiting for commands, so fine.
  • 7 = autoclean plugin. Same as above.
  • 17 = hsmd.
  • 20 = connectd.
  • 19 = gossipd.

This poll is thus the main io-loop.

The issue is that topology should be scanning forward for new blocks continuously, but strace shows it has only scanned one block, then after reading it, topology updates the db, but suddenly we enter the main loop (which we should not, because, topology should have its own loop....?).

@cdecker there may be a race condition? The main io_loop blocks on poll, but the topology loop should be running?

@ZmnSCPxj
Copy link
Collaborator

I believe I have cracked this issue.

After starting up plugins and the core daemons (hsmd, gossipd, connectd), one of the things we do is setup_topology:

/*~ Initialize block topology. This does its own io_loop to
* talk to bitcoind, so does its own db transactions. */
setup_topology(ld->topology, &ld->timers,
min_blockheight, max_blockheight);

As the comment says, it does an io_loop, which (in the absence of struct timers) will only exit if we execute a io_break.

Looking inside setup_topology, we see that io_loop is called without any struct timers (i.e. the NULL arguments):

/* Once it gets initial block, it calls io_break() and we return. */
io_loop(NULL, NULL);

Now, just prior to the above io_loop, we see that we initiate fee estimation:

start_fee_estimate(topo);

In particular, we only execute io_break if we get our first block and we get our first fee estimation. That's this code below:

/* get_init_blockhash sets topo->root, start_fee_estimate clears
* feerate_uninitialized (even if unsuccessful) */
static void maybe_completed_init(struct chain_topology *topo)
{
if (topo->feerate_uninitialized)
return;
if (!topo->root)
return;
io_break(topo);
}

And if we look at the logs:

2019-05-28T22:08:03.684Z lightningd(3740): bitcoin-cli estimatesmartfee 2 CONSERVATIVE exited with status 1

Looks like fee estimation from bitcoin-cli exited with status 1.

We search for this text exited with status and see:

log_unusual(bitcoind->log,
"%s exited with status %u",
bcli_args(tmpctx, bcli), exitstatus);

And we scroll down a little and also see that, in case of such a failure, the bitcoind system will helpfully automatically retry the command 1 second later (in particular, it will not invoke the callback inside chaintopology yet):

/* Retry in 1 second (not a leak!) */
new_reltimer(&bitcoind->ld->timers, notleak(bcli), time_from_sec(1),
retry_bcli, bcli);

Of course, it uses a timer..... which the setup_topology loop does not enable. Remember?

/* Once it gets initial block, it calls io_break() and we return. */
io_loop(NULL, NULL);

TLDR: If fee estimation completely fails, we retry 1 second later. Unfortunately, the initial setup_topology creates a io_loop that ignores timers. So we never retry fee estimation and just end up hanging.

@cdecker @rustyrussell it seems to me that the simplest solution is to write a io_loop_with_timers that handles timers, and merge the mainloop and the tiny io_loop in setup_topology to io_loop_with_timers. Is this safe? Why does the chaintopology io_loop ignore timers?

ZmnSCPxj added a commit to ZmnSCPxj/lightning that referenced this issue May 30, 2019
ZmnSCPxj added a commit to ZmnSCPxj/lightning that referenced this issue May 30, 2019
cdecker added a commit to ZmnSCPxj/lightning that referenced this issue May 31, 2019
Signed-off-by: Christian Decker <decker.christian@gmail.com>
cdecker pushed a commit to ZmnSCPxj/lightning that referenced this issue May 31, 2019
cdecker added a commit that referenced this issue May 31, 2019
Signed-off-by: Christian Decker <decker.christian@gmail.com>
cdecker pushed a commit that referenced this issue May 31, 2019
@svewa
Copy link
Author

svewa commented May 31, 2019

https://drop.geekbox.info/feeestimateerror.log

It seems the error was the whole time, that my bitcoind was 17.1 and did not support the 2nd paramter to the getsmartfeeestimate.... your patch made the verbose error message visible, but a check if the available bitcoind has the required features or minimum version would be helpful for sure, too.

@ZmnSCPxj
Copy link
Collaborator

ZmnSCPxj commented Jun 1, 2019

@svewa okay, I shall open a separate new issue.

@ZmnSCPxj
Copy link
Collaborator

ZmnSCPxj commented Jun 1, 2019

#2697

@darosior
Copy link
Collaborator

darosior commented Jun 3, 2019

It seems the error was the whole time, that my bitcoind was 17.1 and did not support the 2nd paramter to the getsmartfeeestimate.... your patch made the verbose error message visible, but a check if the available bitcoind has the required features or minimum version would be helpful for sure, too.

No, you get that because bitcoind could not estimatesmartfee (not because of a bad JSONRPC arg), I had this issue before I don't remember how I solved it. I'll search and keep you updated.

@darosior
Copy link
Collaborator

darosior commented Jun 3, 2019

Did you change your datetime (had a powercut or something) recently ? What are the bitcoin logs on an estimatesmartfee ?

@svewa
Copy link
Author

svewa commented Jun 3, 2019

No, you get that because bitcoind could not estimatesmartfee (not because of a bad JSONRPC arg)

Yes, I did. Just see the output in https://drop.geekbox.info/feeestimateerror.log

2019-05-31T18:22:32.564Z lightningd(10275): bitcoin-cli estimatesmartfee 2 CONSERVATIVE exited with status 1

When you run this very command bitcoin-cli estimatesmartfee 2 CONSERVATIVE against bitcoind 0.17.1, you get the exact same error as I got. It's bitcoind telling you, that you gave the wrong parameters. If you run it against 0.18.0 it works, also the help text of bitcoind describing the estimatesmartfee reflects that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants