
Tests and guards for failure cases #8

Closed · wants to merge 7 commits


@kumavis commented Nov 16, 2016

Pull streams signal the end of a stream by passing `true` (normal end) or an error as the first callback argument.
(See https://github.com/pull-stream/pull-stream#source-aka-readable.)

I was hitting a case where the source ended unexpectedly and `true` was being passed up as the error. `decodeFromReader` seemed like the right place to handle this, since it is where the pull-stream API ends and an error-first callback API begins.

I fixed that issue and added some additional protections.
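For reference, a pull-stream source is a function `read(abort, cb)` that calls `cb(end, data)`, where `end` is `null` while data flows, `true` on a normal end, or an `Error`. The sketch below shows the kind of guard described, assuming a hypothetical helper `readOne` (this is not the actual `decodeFromReader` code): when the caller expects another item, a `true` end is turned into a real `Error` before it reaches an error-first callback.

```javascript
// Hypothetical helper (not pull-length-prefixed's real decodeFromReader):
// read one item from a pull-stream source and report it through an
// error-first callback.
function readOne (source, cb) {
  source(null, (end, data) => {
    if (end === true) {
      // Normal end, but the caller expected another item. Never pass `true`
      // to an error-first callback; wrap it in a real Error instead.
      return cb(new Error('unexpected end of input'))
    }
    if (end) return cb(end) // the source failed with an Error
    cb(null, data)
  })
}

// Tiny in-memory source: emits each item, then ends with `true`.
function values (items) {
  let i = 0
  return function read (abort, cb) {
    if (abort) return cb(abort)
    if (i >= items.length) return cb(true)
    cb(null, items[i++])
  }
}

const src = values(['chunk'])
readOne(src, (err, data) => console.log(err, data)) // null chunk
readOne(src, (err) => console.log(err.message)) // unexpected end of input
```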

note: the invalid prefix test is skipped until resolution of this issue: dominictarr/pull-reader#5

@dignifiedquire (Owner)

Thank you @kumavis :) will wait for response in the pull-reader issue for merging this.

@kumavis (Author) commented Nov 16, 2016

just noticed the lint errors, addressing

olizilla added a commit to libp2p/js-libp2p-switch that referenced this pull request May 30, 2018
Pull streams pass true in the error position when the stream ends.
In https://github.com/multiformats/js-multistream-select/blob/5b19358b91850b528b3f93babd60d63ddcf56a99/src/select.js#L18-L21
...we're getting lots of instances of pull-length-prefixed stream
erroring early with `true` and it's passed back up to the dialer
in https://github.com/libp2p/js-libp2p-switch/blob/fef2d11850379a4720bb9c736236a81a067dc901/src/dial.js#L238-L241

The `_createMuxedConnection` method assumes that any error
that occurs during `_attemptMuxerUpgrade` is OK, and keeps the
relevant baseConnection in the cache. If the pull-stream has ended
unexpectedly, keeping the connection around causes
"already piped" errors when we try to use it later.

This PR adds a guard to avoid putting the connection back into the
cache if the stream has ended.
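A minimal sketch of that guard, using hypothetical names (`onMuxerUpgradeFailed`, a `Map` used as the connection cache, a `close()` method on the connection); it is not the actual `_createMuxedConnection` code in js-libp2p-switch, only the decision it needs to make:

```javascript
// Hypothetical sketch: decide what to do with a base connection after a muxer
// upgrade attempt fails. `connCache` maps peer id -> base connection.
function onMuxerUpgradeFailed (err, peerId, baseConn, connCache) {
  if (err === true) {
    // `true` in the error position means the underlying pull-stream ended.
    // The connection cannot be reused, so drop it instead of caching it;
    // reusing it later is what surfaced as "already piped" errors.
    connCache.delete(peerId)
    if (typeof baseConn.close === 'function') baseConn.close()
    return false
  }
  // Any other upgrade failure: keep the unmuxed base connection for reuse,
  // which is the pre-existing behaviour.
  connCache.set(peerId, baseConn)
  return true
}
```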

There is related work in an old PR to add a check for exactly this issue in
pull-length-prefixed dignifiedquire/pull-length-prefixed#8
...but it's still open, so this PR adds a check for true in
the error position at the site where the "already piped" errors
were appearing. Once the PR on pull-length-prefixed is merged this
check can be removed. It's not ideal to have it in this code as it
is far removed from the source, but it fixes the issue for now.

Arguably anywhere that `msDialer.handle` is called should do the
same check, but we're not seeing this error occur anywhere else so
to keep this PR small, I've left it as the minimal changeset to
fix the issue.

Of note, we had to add '/dns4/ws-star.discovery.libp2p.io/tcp/443/wss/p2p-websocket-star'
to the swarm config to trigger the "already piped" errors. There
is a minimal test app here https://github.com/tableflip/js-ipfs-already-piped-error

Manual testing shows ~50 streams fail in the first 2 minutes of
running a node; after that, things stabilise at ~90 active muxed
connections.

Fixes #235
Fixes ipfs/js-ipfs#1366
See dignifiedquire/pull-length-prefixed#8

License: MIT
Signed-off-by: Oli Evans <oli@tableflip.io>
@olizilla

This just turned up in a hard-to-debug issue in libp2p-switch. It looks like the suggestions in dominictarr/pull-reader#5 are agreed, but it's waiting on someone to do the work. Could we get this PR merged in the meantime?

jacobheun pushed a commit to libp2p/js-libp2p-switch that referenced this pull request May 31, 2018
@jacobheun (Collaborator)

@kumavis if you rebase with master you should be able to add the skipped test in for this. pull-reader ^1.3.0 should have the needed fix.

@jacobheun (Collaborator)

@dignifiedquire we should be able to close this in favor of the rebased version #16

@jacobheun (Collaborator)

Merged via #16

@jacobheun jacobheun closed this Mar 6, 2019
jacobheun added a commit to jacobheun/js-libp2p that referenced this pull request Jul 29, 2019
* chore: update contributors

* chore: release version v0.29.0

* fix: move emitters to last thing in the method (libp2p#218)

* fix: move emitters to last thing in the method

* fix: setImmediate everything

* chore: update contributors

* chore: release version v0.29.1

* fix: move 'pull-stream' from devDependencies to dependencies (libp2p#220)

'pull-stream' package is needed in dependencies because it is used in './src/limit-dialer/queue.js'.

* chore: update deps

* chore: update contributors

* chore: release version v0.29.2

* feat: dial to PeerId and/or Multiaddr in addition to PeerInfo (libp2p#222)

* chore: update deps

* feat: support dial to peerId and/or multiaddr in addition to peerInfo

* chore: update CI

* chore: update contributors

* chore: release version v0.30.0

* chore: no sauce

* chore: update deps

* chore: update contributors

* chore: release version v0.31.0

* fix: use the right callback

* chore: update deps

* chore: update contributors

* chore: release version v0.31.1

* feat: increase maxListeners to Infinity (libp2p#226)

* feat: increase maxListeners to Infinity

ipfs/js-ipfs-bitswap#142 (comment)

* fix linting

* chore: update deps

* chore: update contributors

* chore: release version v0.31.2

* feat: p2p addrs situation (libp2p#229)

* chore: update gitignore and CI

* chore: update deps

* test: update tests to new p2p-webrtc-star multiaddr format

* chore: update contributors

* chore: release version v0.32.0

* chore: update deps

* chore: update contributors

* chore: release version v0.32.1

* fix: remove unused protocol-buffers dep (libp2p#230)

* chore: update contributors

* chore: release version v0.32.2

* chore: update deps

* chore: update contributors

* chore: release version v0.32.3

* chore: update deps

* fix: increase dial timeout

* chore: update contributors

* chore: release version v0.32.4

* feat: Circuit Relay (libp2p#224)

* chore: update deps

* chore: update contributors

* chore: release version v0.33.0

* fix: don't dial on relay if not enabled (libp2p#234)

* chore: update deps

* chore: fix package.json

* chore: update contributors

* chore: release version v0.33.1

* chore: update deps

* fix: don't dial circuit if no transports available (libp2p#236)

* chore: update contributors

* chore: release version v0.33.2

* fix: circuit dialing

* feat: fix circuit dialing

* chore: upgrade deps

* chore: update circle ci config

* chore: adding missing dev dependency

* fix: removing unused dependency

* test: adding tests

* fix: remove unused dep

* chore: updating CI files (libp2p#238)

* chore: update contributors

* chore: release version v0.34.0

* chore: use latest SECIO API

* chore: update deps

* feat: use latest secio API

* chore: update deps

* chore: update contributors

* chore: release version v0.35.0

* chore: update deps

* chore: update contributors

* chore: release version v0.35.1

* docs: update name references and API touches

* chore: update name references

* refactor: update name to switch, make it a class and rename start and stop methods

* test: refactor tcp transport tests to avoid code duplication

* test: reuse same test code for Websockets, remove code duplication

* test: update aegir pre and post hooks

* chore: use pre-push instead

* test: update and deduplicate code on stream muxing tests

* test: restructure test suites

* test: refactor swarm-no-muxing tests

* test: refactor circuit-relay tests

* test: refactor browser tests too

* style: fix linting

* fix: enableCircuitRelay is async and therefore needs a callback

* fix: transports.add does not need to be async at all

* docs: fix badges

* test: Linux does not like that we use multiple sockets with port 0

* test: fix test

* chore: update contributors

* chore: release version v0.36.0

* chore: update deps

* chore: update contributors

* chore: release version v0.36.1

* feat: use mplex, update CI

* docs: typo

* feat: observe traffic and expose statistics (libp2p#243)

* chore: update deps

* chore: update contributors

* chore: release version v0.37.0

* fix: for when handler func is not defined

* fix: for when peerinfo resolves to undefined

* chore: update contributors

* chore: release version v0.37.1

* chore: update deps

* chore: update contributors

* chore: release version v0.37.2

* fix: one more observer edge case

* chore: update deps

* chore: fix linting

* test: fix transport tests before all step by increasing the timeout

* chore: update contributors

* chore: release version v0.37.3

* chore: update deps

I think this chore also fixes this issue:
https://github.com/libp2p/js-libp2p-switch/issues/235

* fix: revert version back to the current release

fix for https://github.com/libp2p/js-libp2p-switch/pull/249/files#r178832198

* chore: update deps

* chore: update deps

* chore: update contributors

* chore: update deps

* test: timeout

* chore: update contributors

* chore: release version v0.39.0

* chore: update deps

* chore: update contributors

* chore: update deps

* chore: update contributors

* chore: release version v0.39.2

* feat: improve circuit err messages (libp2p#250)

* feat: improve circuit err handling

* feat: add test to validate err when circuit not enabled

* refactor: update files and add jsdocs to improve readability

refactor: initial refactor of dial.js

refactor: add more jsdocs to dial and clean up some code

refactor: make get-peer-info more readable

fix: jsdocs in dial

docs: update some jsdocs

refactor: make dial.js a bit easier to consume

fix: fix linting

docs: add more jsdocs and comments

refactor: clean up dial methods and encryption order

* test: add tests for get-peer-info

* docs: remove answered todo comment

answered at libp2p/js-libp2p-switch#252 (comment)

* fix: dont create base conn when muxed exists

* fix: tests and conflicts

* chore: update deps

* chore: update contributors

* chore: release version v0.40.0

* test: fix require of multiplex

* fix: libp2p#189 Prevent self-dial

* test: add selfdial test

* chore: add lead maintainer

* chore: update contributors

* chore: update contributors

* chore: release version v0.40.1

* fix: return on call to nextMuxer

When the call to multistream.Dialer.select is unsuccessful, call nextMuxer to try to select the next one in the list, but do not continue executing the callback afterwards.

License: MIT
Signed-off-by: Alan Shaw <alan@tableflip.io>
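The bug class behind this fix: after handing off to the retry path, the function kept running and invoked the callback a second time. A sketch with hypothetical names (`selectMuxer`, and `trySelect` standing in for `multistream.Dialer.select`); the point is the `return` in front of `nextMuxer(...)`:

```javascript
// Hypothetical sketch: try each muxer protocol in order until one is accepted.
// `trySelect(protocol, cb)` stands in for multistream's Dialer.select.
function selectMuxer (protocols, trySelect, callback) {
  function nextMuxer (i) {
    if (i >= protocols.length) {
      return callback(new Error('could not upgrade to a stream muxer'))
    }
    trySelect(protocols[i], (err, conn) => {
      if (err) {
        // Without this `return`, execution would fall through to the success
        // path below, and `callback` could end up being called twice.
        return nextMuxer(i + 1)
      }
      callback(null, protocols[i], conn)
    })
  }
  nextMuxer(0)
}
```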

* fix: drop connection when stream ends unexpectedly


* fix: add utility methods to prevent already piped error

* chore: update contributors

* chore: release version v0.40.2

* fix: prevent undefined error during a mutual hangup

* chore: update contributors

* chore: release version v0.40.3

* feat: swap quick-lru by hashlru

This removes the only dependency using generators in the ipfs/libp2p ecosystem.
Next version of create-react-app will support ipfs out-of-box with this change.
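hashlru is known for a two-generation design instead of a linked list: new entries go into a "current" hash; when it fills up, it becomes the "old" generation and a fresh hash is started; reads that hit the old generation are promoted. A simplified sketch of that idea, written from the description rather than from hashlru's source (the class name `TwoGenCache` is hypothetical):

```javascript
// Simplified two-generation cache in the style of hashlru (not its real code).
// Holds at most about 2 * max entries; eviction drops a whole generation.
class TwoGenCache {
  constructor (max) {
    this.max = max
    this.size = 0
    this.current = new Map()
    this.old = new Map()
  }

  _update (key, value) {
    this.current.set(key, value)
    this.size++
    if (this.size >= this.max) {
      // Current generation is full: it becomes the old one, and the previous
      // old generation (the least recently used entries) is discarded.
      this.size = 0
      this.old = this.current
      this.current = new Map()
    }
  }

  get (key) {
    if (this.current.has(key)) return this.current.get(key)
    if (this.old.has(key)) {
      const value = this.old.get(key)
      this._update(key, value) // promote recently read entries
      return value
    }
    return undefined
  }

  set (key, value) {
    if (this.current.has(key)) this.current.set(key, value)
    else this._update(key, value)
  }

  has (key) {
    return this.current.has(key) || this.old.has(key)
  }
}
```

The trade-off is coarse eviction (a whole generation at once) in exchange for plain hash operations and no per-entry bookkeeping.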

* chore: update contributors

* chore: release version v0.40.4

* fix: stats - observer expects protocolTag

* fix: re-enable stats tests in node

* chore: Upgrade big.js to 5.1.2

* chore: Change require('big.js') to require('big.js').Big

* chore: update contributors

* chore: release version v0.40.5

* fix: no stats on multistream proto dial

* fix: adjust test values

* fix: handle error in protocol handshake

* chore: update contributors

* chore: release version v0.40.6

* chore: remove travis and circleci

* Add private network support (libp2p#266)

* feat: add support for private networks

fix: update protector.protect usage
chore: fix linting and update deps
test: add secio to pnet tests
docs: add private network info to the readme
chore: update pnet package version
test: add skipped test back in and update it

* fix: improve erroring around invalid peers

docs: add some comments
chore: update deps
test: simplify identify test

* chore: update contributors

* chore: release version v0.40.7

* test: add sample network circuit relay tests (libp2p#275)

* test: add sample network circuit relay tests

* test: use ephemeral ports

* chore: update deps

chore: remove test pre-push
chore: update test ports

* chore: update contributors

* chore: release version v0.40.8

* chore: update mplex and stats test numbers

* feat: make switch a state machine (libp2p#278)

* feat: add basic state machine functionality to switch

* feat: make connections state machines

* refactor: clean up logs

* feat: add dialFSM to the switch

* feat: add better support for closing connections

* test: add tests for some uncovered lines

* feat: add warning emitter for muxer upgrade failed

* docs: update readme

* chore: update contributors

* chore: release version v0.41.0

* fix: ignore dial request when one is in progress (libp2p#283)

* chore: update contributors

* chore: release version v0.41.1

* fix: improve connection closing and error handling (libp2p#285)

* fix: improve connection closing and error handling

* test: improve identify test

* chore: update deps

* fix: only emit from connections if there is a listener

* test: add more connection tests

* chore: update libp2p-mplex

* fix: dont dial an address that we have

* fix: ensure circuit listens last on start

* chore: update npm publish files

* chore: update contributors

* chore: release version v0.41.2

* fix: use retimer to avoid creating so many timers (libp2p#289)

* use retimer to avoid scheduling so many timers

* Fixed linting

* fix: improve connection tracking and closing (libp2p#291)

* chore: update deps

* fix: check we have a proper transport before filtering addresses

* fix: improve connection close on stop

* fix: improve stat stopping

* test: fix stats test

* fix: improve tracking of open connections

* chore: remove log

* fix: stats stop in browser

chore: fix linting and browser tests
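On the retimer entry (libp2p#289) above: rescheduling a timeout with `clearTimeout`/`setTimeout` on every bit of activity creates many timers when it happens per message. A retimer-style approach keeps one underlying timer and usually just records the new deadline; if the timer fires early, it re-arms for the remaining time. A sketch of that technique; `createRetimer` is a hypothetical name, not the `retimer` package's API:

```javascript
// Hypothetical retimer-style helper (not the `retimer` package's API).
// reschedule() usually just moves the deadline; the single underlying timer
// re-arms itself for the remaining time when it fires too early.
function createRetimer (fn, ms) {
  let deadline = 0
  let armedUntil = 0
  let timer = null

  function arm (delay) {
    armedUntil = Date.now() + delay
    timer = setTimeout(onTimeout, delay)
  }

  function onTimeout () {
    timer = null
    const remaining = deadline - Date.now()
    if (remaining > 0) return arm(remaining) // deadline moved later: wait the rest
    fn()
  }

  function reschedule (newMs) {
    deadline = Date.now() + newMs
    // Only touch the real timer if none is pending or the new deadline is
    // earlier than the pending one; pushing the deadline later is free.
    if (timer === null || deadline < armedUntil) {
      if (timer !== null) clearTimeout(timer)
      arm(newMs)
    }
  }

  reschedule(ms)

  return {
    reschedule,
    clear () {
      if (timer !== null) clearTimeout(timer)
      timer = null
    }
  }
}
```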

* fix: remove unneeded set peer info

* fix: abort the base connection on close

* fix: catch edge cases of dialTimeout calling back twice

* fix: close all connections instead of checking peerbook peers

* test: update dial fsm test waits

* test: make parallel dial tests deterministic

fix: improve logic around disconnecting

fix: remove duplicate event handling logic

* chore: fix lint

* test: improve test reliability

* chore: update contributors

* chore: release version v0.41.3

* refactor: stat use for over forEach (libp2p#295)

forEach is 10x slower than a regular for(;;) loop, and it should
be avoided in hot code paths.

* fix: avoid sync callback in async functions (libp2p#297)

* fix: avoid sync callback in async functions

* test: add error check

* refactor: clean up async usage

* chore: clean up

* refactor: remove async waterfall usage on identify

* chore: fix linting

* chore: update contributors

* chore: release version v0.41.4

* fix: peerBook undefined libp2p#299

* fix: reduce bundle size (libp2p#292)

* fix: reduce bundle size

* fix: use bignumber everywhere

* chore: update deps

* chore: update contributors

* chore: release version v0.41.5

* fix: import async/setImmediate to avoid webpack errors (libp2p#303)

* test: add pull-mplex to test suite (libp2p#305)

* chore: use travis

* chore: update dependencies

* fix: dial in series until we have proper abort support (libp2p#306)

refactor: simplify the circuit dial logic

chore: remove travis windows cache

refactor: clean up dial many error logic

test: explicitly set correct address

test(refactor): update order of echo logic and add after

refactor: cleanup per feedback
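On the "avoid sync callback in async functions" fix (libp2p#297) listed earlier: if a callback-based function sometimes calls back synchronously (for example on a cache hit) and sometimes asynchronously, callers can observe either ordering. A sketch of the usual remedy with a hypothetical cached lookup (`createLookup` and its `fetch` parameter are illustrative, not switch code), assuming `fetch` itself always calls back asynchronously:

```javascript
// Hypothetical cached lookup with a callback API. The cache-hit path is
// deferred with setImmediate so the callback never runs before lookup()
// returns, matching the (asynchronous) cache-miss path.
function createLookup (fetch) {
  const cache = new Map()
  return function lookup (key, callback) {
    if (cache.has(key)) {
      const value = cache.get(key)
      setImmediate(() => callback(null, value))
      return
    }
    // Assumes `fetch` always calls back asynchronously.
    fetch(key, (err, value) => {
      if (err) return callback(err)
      cache.set(key, value)
      callback(null, value)
    })
  }
}
```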

* chore: update contributors

* chore: release version v0.41.6

* fix: peer disconnect event and improve logging performance (libp2p#309)

* fix: only emit disconnects from muxed conns

* fix: update disconnect logic

* chore: clean up logging to prevent unneeded string formatting

* chore: fix spelling

* chore: update contributors

* chore: release version v0.41.7

* feat: add basic dial queue to avoid many connections to peer (libp2p#310)

BREAKING CHANGE: This adds a very basic dial queue per peer.
This will prevent multiple, simultaneous dial requests to the same
peer from creating multiple connections. The requests will be queued
per peer, and will leverage the same connection when possible.
The breaking change here is that `.dial` will no longer return a
connection. js-libp2p, circuit relay, and kad-dht, which use `.dial`
were not using the returned connection. So while this is a breaking change
it should not break the existing libp2p stack. If custom applications
are leveraging the returned connection, they will need to convert to only
using the connection returned via the callback.
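A minimal sketch of per-peer dial coalescing as described: concurrent requests for the same peer share one in-flight dial, and later requests reuse the established connection. `DialQueueManager`, `dialFn`, and string peer ids are hypothetical; this is not the switch's queue implementation, and it omits the global limits, timeouts, and abort handling added in later entries. As in the change described, `dial` returns nothing and the connection is delivered only through the callback.

```javascript
// Hypothetical per-peer dial coalescing. `dialFn(peerId, cb)` opens a new
// connection; it is only called when no connection or dial is in progress.
class DialQueueManager {
  constructor (dialFn) {
    this.dialFn = dialFn
    this.connections = new Map() // peerId -> established connection
    this.pending = new Map() // peerId -> callbacks waiting on one dial
  }

  dial (peerId, callback) {
    const existing = this.connections.get(peerId)
    if (existing) {
      setImmediate(() => callback(null, existing))
      return
    }

    const waiting = this.pending.get(peerId)
    if (waiting) {
      waiting.push(callback) // join the dial already in progress
      return
    }

    this.pending.set(peerId, [callback])
    this.dialFn(peerId, (err, conn) => {
      const callbacks = this.pending.get(peerId)
      this.pending.delete(peerId)
      if (!err) this.connections.set(peerId, conn)
      for (const cb of callbacks) cb(err, err ? undefined : conn)
    })
  }
}
```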

* chore: dont log privatized unless it actually happened

* refactor: only get our addresses for filtering once

* feat: update identify to include supported protocols (libp2p#311)

* chore: update contributors

* chore: release version v0.42.0

* fix: ensure dials always use the latest PeerInfo from the PeerBook (libp2p#312)

* fix: ensure dials always use the latest PeerInfo from the PeerBook

This fixes an issue where if dial is called with a new instance
of PeerInfo, if it is the first dial to that peer, the queue was
forever associated with that instance. This is currently the case
when Circuit checks the HOP status of a potential relay. This ensures
that whenever we dial, we are updating the peer book and using the
latest PeerInfo in that dial request.

* test: add test for get peer info

* refactor: just use id with dialer queue

* chore: update contributors

* chore: release version v0.42.1

* fix: identify on dial (libp2p#313)

* chore: update contributors

* chore: release version v0.42.2

* feat: global dial queue (libp2p#314)

* feat: add a general queue to limit all dials

* fix: improve queue count logic and add better abort

* feat: add a basic blacklist

* fix: abort dial queue on error instead of stop

* feat: add a crude priority lane

* test: add test for blacklist error

* fix: make blacklist and max dials configurable

* refactor: blacklist after callback

* test: improve testings around blacklisting

* chore: update contributors

* chore: release version v0.42.3

* fix: improve dial queue and parallel dials (libp2p#315)

* feat: allow dialer queues to do many requests to a peer

* fix: parallel dials and validate cancelled conns

* feat: make dial timeout configurable

* fix: allow already connected peers to dial immediately

* refactor: add dial timeout to consts file

* fix: keep better track of in progress queues

* refactor: make dials race

* chore: update contributors

* chore: release version v0.42.4

* feat: limit the number of cold calls we can do (libp2p#316)

* feat: limit the number of cold calls we can do

* feat: add a backoff to blacklisting

* refactor: make cold calls configurable

* fix: make blacklist duration longer

* fix: improve blacklisting

* test: add some tests for queue

* feat: add jitter to blacklist ttl

* test: validate cold queue is removed

* feat: purge old queues every hour

* test: fix aegir post script node shutdown

* fix: abort the cold call queue on manager abort

* fix: improve queue cleanup and lower interval to 15 mins

* fix: improve connection tracking (libp2p#318)

* fix: centralize connection events and peer connects

* fix: remove unneeded peerBook put

* chore: update contributors

* chore: release version v0.42.5

* fix: dont blacklist good peers (libp2p#319)

* fix: revert to try each (libp2p#320)

* chore: update contributors

* chore: release version v0.42.6

* fix: missing queue (libp2p#323)

* fix: improve stopping logic (libp2p#324)

* chore: update contributors

* chore: release version v0.42.7

* chore: add discourse badge (libp2p#327)

* fix: dial self (libp2p#329)

* feat: support a priority queue for dials (libp2p#325)

* chore: update contributors

* chore: release version v0.42.8

* fix: dont compare empty strings (libp2p#330)

* chore: update contributors

* chore: release version v0.42.9

* fix: resolve transport sort order in browsers (libp2p#333)

* fix: resolve transport sort order in browsers

* fix: update sort logic

* fix: dont use peerinfo distinct (libp2p#334)

* fix: dont use peerinfo distinct

* refactor: remove unneeded code

* refactor: clean up

* refactor: fix feedback

* chore: update contributors

* chore: release version v0.42.10

* fix(stats): prevent 0ms timeDiff breaking movingAverage (libp2p#336)

* stats - stat - prevent 0ms timeDiff breaking movingAverage

* chore: remove commitlint

* chore: update contributors

* chore: release version v0.42.11

* fix: dont blindly add observed addresses to our list (libp2p#337)

Until we can properly validate the observed address our
peer tells us about, we shouldn't blindly add it to our
address list. Until we have better NAT management we can't
reliably validate that we're adding an appropriate address
for ourselves.
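On the "prevent 0ms timeDiff breaking movingAverage" fix (libp2p#336) above: if a rate is computed as `bytes / timeDiff`, two samples in the same millisecond give `Infinity` (or `NaN` for 0/0), and an exponential moving average fed that value never recovers. A sketch of guarding that case; `createRateAverage` and its time-weighted formula are illustrative assumptions, not the libp2p stats code:

```javascript
// Illustrative time-weighted exponential moving average of a rate
// (bytes per second). Not the actual libp2p stats implementation.
function createRateAverage (intervalMs, startTime) {
  let average = 0
  let lastTime = startTime
  let pendingBytes = 0

  return function push (bytes, now) {
    pendingBytes += bytes
    const timeDiff = now - lastTime
    // Guard: a 0 ms (or negative) gap would make the rate Infinity/NaN and
    // permanently poison the average, so keep accumulating until time advances.
    if (timeDiff <= 0) return average
    const rate = pendingBytes / (timeDiff / 1000)
    const alpha = 1 - Math.exp(-timeDiff / intervalMs)
    average = alpha * rate + (1 - alpha) * average
    pendingBytes = 0
    lastTime = now
    return average
  }
}
```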

* fix: clear blacklist for peer when connection is established (libp2p#340)

* chore: update contributors

* chore: release version v0.42.12

* refactor: move switch into src/switch

* refactor: cleanup switch and move tests into test dir
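Several entries above describe blacklisting peers after failed dials, with a backoff and jitter on the TTL (libp2p#316) and clearing the entry once a connection is established (libp2p#340). A sketch of that combination; `Blacklist`, the doubling backoff, the cap, and the ±10% jitter are assumptions for illustration, not the switch's actual code or values:

```javascript
// Hypothetical per-peer blacklist with exponential backoff and jitter.
class Blacklist {
  constructor ({ baseTtlMs = 2 * 60 * 1000, maxTtlMs = 60 * 60 * 1000, random = Math.random } = {}) {
    this.baseTtlMs = baseTtlMs
    this.maxTtlMs = maxTtlMs
    this.random = random // injectable for deterministic tests
    this.entries = new Map() // peerId -> { failures, expires }
  }

  // Record a failed dial: each consecutive failure doubles the TTL (capped),
  // and +/-10% jitter keeps many peers from expiring at the same moment.
  fail (peerId, now = Date.now()) {
    const prev = this.entries.get(peerId)
    const failures = (prev ? prev.failures : 0) + 1
    const ttl = Math.min(this.baseTtlMs * 2 ** (failures - 1), this.maxTtlMs)
    const jitter = ttl * 0.1 * (this.random() * 2 - 1)
    this.entries.set(peerId, { failures, expires: now + ttl + jitter })
  }

  isBlacklisted (peerId, now = Date.now()) {
    const entry = this.entries.get(peerId)
    return entry !== undefined && now < entry.expires
  }

  // A successful connection wipes the peer's failure history.
  clear (peerId) {
    this.entries.delete(peerId)
  }
}
```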