Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flushing database cache causes p2p connections ping timeout #16008

Closed
hebasto opened this issue May 11, 2019 · 8 comments
Closed

Flushing database cache causes p2p connections ping timeout #16008

hebasto opened this issue May 11, 2019 · 8 comments
Labels

Comments

@hebasto
Copy link
Member

hebasto commented May 11, 2019

System:

  • ODROID-HC1 (ARMv7)
  • datadir resides on microSD
  • blocksdir resides on HDD (via USB-to-SATA bridge)
  • version 0.18.0 with default memory and network settings

debug.log:

2019-05-11T16:03:27Z UpdateTip: new best=0000000000000000004fe45d686e49f6279c9b1f42c3f6409ac05eaaa236479c height=507493 version=0x20000000 log2_work=88.035066 tx=297395773 date='2018-02-03T22:17:12Z' progress=0.724774 cache=718.8MiB(6456339txo)
2019-05-11T16:03:30Z UpdateTip: new best=000000000000000000312ef010e5e5fbe40de748bb5641d18589f7ccaad8c67a height=507494 version=0x20000000 log2_work=88.035117 tx=297397617 date='2018-02-03T22:30:46Z' progress=0.724779 cache=719.5MiB(6463104txo)
2019-05-11T16:10:45Z Pre-allocating up to position 0x4000000 in blk01165.dat
2019-05-11T16:11:54Z UpdateTip: new best=00000000000000000036468a6fd2b45bf0a1f59986829ae081b1bfdc084a98e1 height=507495 version=0x20000000 log2_work=88.035168 tx=297399843 date='2018-02-03T22:54:06Z' progress=0.724781 cache=30.0MiB(11097txo)
2019-05-11T16:12:27Z UpdateTip: new best=00000000000000000004456513b3ba2a6fdd92b615917870d068e8541b77d809 height=507496 version=0x20000000 log2_work=88.035219 tx=297400286 date='2018-02-03T22:54:24Z' progress=0.724782 cache=30.4MiB(14539txo)
2019-05-11T16:12:51Z Pre-allocating up to position 0x600000 in rev01165.dat
2019-05-11T16:12:51Z UpdateTip: new best=000000000000000000251354c53288c8f04467ddaa22c7fab0ccd6ed19faa422 height=507497 version=0x20000000 log2_work=88.035269 tx=297402384 date='2018-02-03T23:13:23Z' progress=0.724787 cache=31.2MiB(22254txo)
2019-05-11T16:13:43Z UpdateTip: new best=0000000000000000002992a54fbd1bcf3726ce318c18c2bb932b656638a0cbe5 height=507498 version=0x20000000 log2_work=88.03532 tx=297404121 date='2018-02-03T23:27:52Z' progress=0.724791 cache=32.0MiB(29607txo)
2019-05-11T16:13:45Z UpdateTip: new best=0000000000000000001ce55cca81380c6d006d043ab703d5152b1ff63be9e91c height=507499 version=0x20000000 log2_work=88.035371 tx=297404344 date='2018-02-03T23:30:00Z' progress=0.724791 cache=32.0MiB(30027txo)
2019-05-11T16:15:56Z UpdateTip: new best=00000000000000000011488194b6248a3bbfa45874eb0a363dcd22aa420d10f5 height=507500 version=0x20000000 log2_work=88.035422 tx=297404612 date='2018-02-03T23:32:12Z' progress=0.724791 cache=32.8MiB(36870txo)
2019-05-11T16:15:56Z socket send error Connection reset by peer (104)
2019-05-11T16:15:56Z socket send error Connection reset by peer (104)
2019-05-11T16:15:57Z socket recv error Connection reset by peer (104)
2019-05-11T16:19:48Z UpdateTip: new best=00000000000000000035774b92b534b73688af6abbdfb1e6d46c5e6517fc4742 height=507501 version=0x20000000 log2_work=88.035473 tx=297405258 date='2018-02-03T23:37:02Z' progress=0.724791 cache=33.5MiB(43894txo)
2019-05-11T16:20:35Z UpdateTip: new best=00000000000000000045d65ec1685b4fa65b096e60407850ac19db8c2c6ec641 height=507502 version=0x20000000 log2_work=88.035524 tx=297407473 date='2018-02-04T00:04:32Z' progress=0.724796 cache=34.3MiB(51334txo)
2019-05-11T16:23:09Z UpdateTip: new best=0000000000000000000477ccaee4b89c6c965e2c35a65b9da59df1cb8581f727 height=507503 version=0x20000000 log2_work=88.035575 tx=297408190 date='2018-02-04T00:05:53Z' progress=0.724797 cache=35.1MiB(59023txo)
2019-05-11T16:23:09Z ping timeout: 1381.505446s
2019-05-11T16:23:09Z ping timeout: 1493.637354s
2019-05-11T16:23:09Z ping timeout: 1493.782631s
2019-05-11T16:23:09Z ping timeout: 1336.238219s
2019-05-11T16:23:09Z ping timeout: 1493.785312s
2019-05-11T16:23:32Z New outbound peer connected: version: 70015, blocks=575558, peer=922
2019-05-11T16:23:33Z New outbound peer connected: version: 70015, blocks=575558, peer=923

Could it be related to #2784?

@jonasschnelli
Copy link
Contributor

Have you set a custom dbcache?

@hebasto
Copy link
Member Author

hebasto commented May 11, 2019

@jonasschnelli

Have you set a custom dbcache?

No custom -dbcache option. No custom -maxmempool option.

@gmaxwell
Copy link
Contributor

The ping timeouts look like they were >10 minutes after the flush. Is there something I'm missing connecting them?

@hebasto
Copy link
Member Author

hebasto commented May 17, 2019

@gmaxwell

The ping timeouts look like they were >10 minutes after the flush. Is there something I'm missing connecting them?

The ping timeout happens in ~20 minutes after the flush.

The following code was introduced in #2784 by @sipa:

bitcoin/src/net.h

Lines 44 to 45 in 277abed

/** Time after which to disconnect, after waiting for a ping response (or inactivity). */
static const int TIMEOUT_INTERVAL = 20 * 60;

bitcoin/src/net.cpp

Lines 1071 to 1075 in 277abed

else if (pnode->nPingNonceSent && pnode->nPingUsecStart + TIMEOUT_INTERVAL * 1000000 < GetTimeMicros())
{
LogPrintf("ping timeout: %fs\n", 0.000001 * (GetTimeMicros() - pnode->nPingUsecStart));
pnode->fDisconnect = true;
}

Here is additional info from the debug.log: there is a strong correlation between flushing events and the following in 20 minutes ping timeout events during IBD.

@laanwj laanwj added the P2P label Jun 5, 2019
@laanwj
Copy link
Member

laanwj commented Jun 5, 2019

I think the problem is that the P2P threads get starved while verification is in progress (esp. when there are multiple blocks in a row), so they don't get a chance to run (not even respond to pings). This is similar to issues we've had with the GUI startup.

@pstratem
Copy link
Contributor

pstratem commented Jun 6, 2019

either the flush can be made incremental or ping/pong (and maybe some others) can be handled out of order.

@hebasto
Copy link
Member Author

hebasto commented Apr 4, 2020

The following approach makes bitcoind behavior on ARMv7 much more stable:
https://gist.github.com/hebasto/9d64e1d5682b1ec46cf6981945b435df

@hebasto
Copy link
Member Author

hebasto commented Apr 16, 2020

It seems related to #18538 (comment).

@hebasto hebasto closed this as completed Apr 16, 2020
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Feb 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

5 participants