fix(core-p2p): frequent peer timeout errors #2363

Merged: 3 commits into ArkEcosystem:develop on Apr 4, 2019

4 participants

supaiku0 commented Apr 3, 2019

Proposed changes

Every request goes through the accept-request handler, which waits for acceptNewPeer to complete before returning a response. If peer A makes a request to peer B and peer B does not know peer A, peer B tries to ping peer A in acceptNewPeer. Depending on various factors, peer A might not respond, so the ping times out. When that happens, peer A's original request times out as well: peer B cannot respond in time because it is still waiting for peer A to answer the ping. Afterwards, peer B suspends peer A because acceptNewPeer failed. The next time peer A makes a request, peer B does not try to ping it again, so the request completes without timing out.

This can be easily observed when running a relay behind a firewall.

For example:

[2019-03-27 00:02:34][DEBUG]: Could not accept new peer 5.196.105.32:4001: Error: Peer 5.196.105.32: could not get status response
[2019-03-27 00:02:34][DEBUG]: Suspended 5.196.105.32 for 2 minutes because of "Timeout"
[2019-03-27 00:02:34][DEBUG]: Request to http://5.196.105.38:4001/peer/status failed: timeout of 3000ms exceeded
[2019-03-27 00:02:34][DEBUG]: Request to http://178.32.65.139:4001/peer/status failed: timeout of 3000ms exceeded
[2019-03-27 00:02:34][DEBUG]: Request to http://5.196.105.39:4001/peer/status failed: timeout of 3000ms exceeded
[2019-03-27 00:02:34][DEBUG]: Request to http://5.196.105.37:4001/peer/status failed: timeout of 3000ms exceeded
[2019-03-27 00:02:34][DEBUG]: Request to http://178.32.65.143:4001/peer/status failed: timeout of 3000ms exceeded
[2019-03-27 00:02:34][DEBUG]: Request to http://178.32.65.137:4001/peer/status failed: timeout of 3000ms exceeded
[2019-03-27 00:02:34][DEBUG]: Request to http://178.32.65.141:4001/peer/status failed: timeout of 3000ms exceeded
[2019-03-27 00:02:34][DEBUG]: Request to http://178.32.65.136:4001/peer/status failed: timeout of 3000ms exceeded
[2019-03-27 00:02:34][DEBUG]: Request to http://178.32.65.140:4001/peer/status failed: timeout of 3000ms exceeded
[2019-03-27 00:02:34][DEBUG]: Request to http://178.32.65.142:4001/peer/status failed: timeout of 3000ms exceeded
[2019-03-27 00:02:34][DEBUG]: Request to http://178.32.65.138:4001/peer/status failed: timeout of 3000ms exceeded
[2019-03-27 00:02:34][DEBUG]: Request to http://54.36.121.115:4001/peer/status failed: timeout of 3000ms exceeded
[2019-03-27 00:02:34][DEBUG]: Request to http://46.105.160.106:4001/peer/status failed: timeout of 3000ms exceeded
[2019-03-27 00:02:34][DEBUG]: Request to http://54.38.120.34:4001/peer/status failed: timeout of 3000ms exceeded
[2019-03-27 00:02:34][DEBUG]: Request to http://151.80.125.38:4001/peer/status failed: timeout of 3000ms exceeded
[2019-03-27 00:02:34][DEBUG]: Request to http://45.32.238.137:4001/peer/status failed: timeout of 3000ms exceeded
[2019-03-27 00:02:34][DEBUG]: Request to http://51.255.105.53:4001/peer/status failed: timeout of 3000ms exceeded
[2019-03-27 00:02:34][DEBUG]: Request to http://54.38.120.36:4001/peer/status failed: timeout of 3000ms exceeded
[2019-03-27 00:02:34][DEBUG]: Request to http://46.105.160.104:4001/peer/status failed: timeout of 3000ms exceeded

The fix is to call acceptNewPeer without waiting for it on GET requests. This is safe because the result is only used for POST requests.
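A minimal sketch of that idea, not the actual core-p2p code: only the name acceptNewPeer comes from this PR. `PeerMonitor`, `unreachablePeerMonitor`, `handleRequest` and the returned strings are hypothetical stand-ins, and rejecting a POST when the ping fails is an assumption made for illustration.

```typescript
// Hypothetical sketch: everything except the name `acceptNewPeer` is an assumption.
type Method = "GET" | "POST";

interface PeerMonitor {
    acceptNewPeer(ip: string): Promise<void>;
}

const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms));

// Stand-in monitor whose ping is never answered and fails after `pingTimeoutMs`.
function unreachablePeerMonitor(pingTimeoutMs: number): PeerMonitor {
    return {
        async acceptNewPeer(ip: string): Promise<void> {
            await sleep(pingTimeoutMs);
            throw new Error(`Peer ${ip}: could not get status response`);
        },
    };
}

async function handleRequest(monitor: PeerMonitor, method: Method, ip: string): Promise<string> {
    if (method === "POST") {
        // POST needs the outcome, so it still waits for the ping.
        try {
            await monitor.acceptNewPeer(ip);
        } catch {
            return "rejected"; // assumed behaviour, for illustration only
        }
        return "accepted";
    }

    // GET: start the ping but do not wait for it. The catch handler keeps a
    // failed ping from surfacing as an unhandled promise rejection.
    monitor.acceptNewPeer(ip).catch(() => undefined);
    return "ok";
}
```

With this shape, a GET from an unreachable peer is answered right away while the ping fails in the background. A POST still takes as long as the ping does.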

While testing, I also noticed that buildPeers is broken. It failed silently because it was swallowing thrown errors.
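To illustrate why swallowed errors hide this kind of breakage, here is a generic sketch. It does not reproduce the real buildPeers: `loadPeersSilently`, `loadPeersLoudly`, `Logger` and the `load` callback are hypothetical names.

```typescript
// Hypothetical stand-ins: neither function mirrors the real buildPeers.
type Logger = { error(message: string): void };

// Swallows the error: callers cannot tell "no peers" from "loading is broken".
async function loadPeersSilently(load: () => Promise<string[]>): Promise<string[]> {
    try {
        return await load();
    } catch {
        return [];
    }
}

// Logs and rethrows: the failure is visible in the logs and to the caller.
async function loadPeersLoudly(load: () => Promise<string[]>, logger: Logger): Promise<string[]> {
    try {
        return await load();
    } catch (error) {
        const message = error instanceof Error ? error.message : String(error);
        logger.error(`Could not load peers: ${message}`);
        throw error;
    }
}
```

In the silent variant, a broken loader looks the same as an empty peer list, which is how a bug like this can go unnoticed.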

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Refactoring (improve a current implementation without adding a new feature or fixing a bug)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Build (changes that affect the build system)
  • Docs (documentation only changes)
  • Test (adding missing tests or fixing existing tests)
  • Other... Please describe:

Checklist

  • I have read the CONTRIBUTING documentation
  • Lint and unit tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

codecov-io commented Apr 3, 2019

Codecov Report

Merging #2363 into develop will decrease coverage by 0.05%.
The diff coverage is 0%.

@@             Coverage Diff             @@
##           develop    #2363      +/-   ##
===========================================
- Coverage    66.04%   65.99%   -0.06%     
===========================================
  Files          400      400              
  Lines         8512     8519       +7     
  Branches       377      417      +40     
===========================================
  Hits          5622     5622              
- Misses        2847     2853       +6     
- Partials        43       44       +1
| Impacted Files | Coverage Δ |
|---|---|
| ...ages/core-p2p/src/server/plugins/accept-request.ts | 0% <0%> (ø) ⬆️ |
| packages/core-p2p/src/monitor.ts | 38.65% <0%> (-0.5%) ⬇️ |
| ...s/core-container/src/config/loaders/file-loader.ts | 18.18% <0%> (-2.51%) ⬇️ |
| packages/core-logger-winston/src/formatter.ts | 42.85% <0%> (ø) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 52b229b...c794595.

@faustbrian faustbrian merged commit 39da913 into ArkEcosystem:develop Apr 4, 2019

10 checks passed

ci/circleci: test-node10-functional Your tests passed on CircleCI!
ci/circleci: test-node10-integration-0 Your tests passed on CircleCI!
ci/circleci: test-node10-integration-1 Your tests passed on CircleCI!
ci/circleci: test-node10-integration-2 Your tests passed on CircleCI!
ci/circleci: test-node10-unit Your tests passed on CircleCI!
ci/circleci: test-node11-functional Your tests passed on CircleCI!
ci/circleci: test-node11-integration-0 Your tests passed on CircleCI!
ci/circleci: test-node11-integration-1 Your tests passed on CircleCI!
ci/circleci: test-node11-integration-2 Your tests passed on CircleCI!
ci/circleci: test-node11-unit Your tests passed on CircleCI!

@faustbrian faustbrian referenced this pull request Apr 5, 2019

Merged

refactor(core-p2p): separate responsibilities of classes #2364

19 of 19 tasks complete