Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

comm: fix flakes in test_pingpong #2301

Merged
merged 1 commit into from Mar 13, 2020
Merged

Conversation

@benesch
Copy link
Member

benesch commented Mar 13, 2020

When a comm channel closes, the switchboard attempts to recycle the
underlying TCPStream for use by a future comm channel. Recycling
involves calling peer_addr() on this TCPStream so that the
connection can be sorted into the correct pool.

If the peer closes the connection before the call to peer_addr(), the
call to peer_addr() will fail with an error like "transport endpoint
is not connected." We would previously panic upon receiving this error,
which caused occasional flakes in test_pingpong in CI (see #2291).

This wasn't a problem in actual deployments of Materialize, as all
workers are long-lived and no peers ever attempt to terminate
connections.

This patch allows the comm::Connection::addr method to return an error,
and the connection pool simply declines to recycle a connection if
retriving its peer address results in an error.

Fix #2291.

Copy link
Member

frankmcsherry left a comment

This seems totally reasonable to me. Thanks!

.as_pathname()
.unwrap()
.to_path_buf()
.map(|a| a.as_pathname().unwrap().to_path_buf())

This comment has been minimized.

Copy link
@ruchirK

ruchirK Mar 13, 2020

Member

I know this isnt the context where we found the error but should we use expect / check for error instead of this call to unwrap too?

This comment has been minimized.

Copy link
@frankmcsherry

frankmcsherry Mar 13, 2020

Member

I'm a fan of this too. I tried to look up how as_pathname() could fail, couldn't figure it out in a few minutes, walked away. But, an expect("something") would at least document what is up locally so that readers don't need to wonder.

This comment has been minimized.

Copy link
@benesch

benesch Mar 13, 2020

Author Member

Ack, rewritten with an explanatory comment about why we expect as_pathname to return Some.

When a comm channel closes, the switchboard attempts to recycle the
underlying `TCPStream` for use by a future comm channel. Recycling
involves calling `peer_addr()` on this `TCPStream` so that the
connection can be sorted into the correct pool.

If the peer closes the connection before the call to `peer_addr()`, the
call to `peer_addr()` will fail with an error like "transport endpoint
is not connected." We would previously panic upon receiving this error,
which caused occasional flakes in test_pingpong in CI (see #2291).

This wasn't a problem in actual deployments of Materialize, as all
workers are long-lived and no peers ever attempt to terminate
connections.

This patch allows the comm::Connection::addr method to return an error,
and the connection pool simply declines to recycle a connection if
retriving its peer address results in an error.

Fix #2291.
@benesch benesch force-pushed the benesch:comm-flake branch from 3f60908 to 27b586c Mar 13, 2020
@benesch benesch merged commit bf77de5 into MaterializeInc:master Mar 13, 2020
14 checks passed
14 checks passed
buildkite/tests Build #5920 passed (16 minutes, 22 seconds)
Details
buildkite/tests/bath-lint-and-rustfmt Passed (1 minute, 11 seconds)
Details
buildkite/tests/bulb-bulb-full-sql-logic-tests Passed (0 seconds)
Details
buildkite/tests/bulb-short-sql-logic-tests Passed (1 minute, 17 seconds)
Details
buildkite/tests/cargo-test Passed (53 seconds)
Details
buildkite/tests/docker-build Passed (11 minutes, 35 seconds)
Details
buildkite/tests/face-with-monocle-miri-test Passed (2 minutes, 4 seconds)
Details
buildkite/tests/metabase-demo Passed (2 minutes, 7 seconds)
Details
buildkite/tests/paperclip-clippy-and-doctests Passed (5 minutes)
Details
buildkite/tests/pipeline Passed (15 seconds)
Details
buildkite/tests/racing-car-testdrive Passed (4 minutes, 29 seconds)
Details
buildkite/tests/shower-streaming-demo Passed (1 minute, 35 seconds)
Details
license/cla Contributor License Agreement is signed.
Details
netlify/materializeinc/deploy-preview Deploy preview canceled.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

3 participants
You can’t perform that action at this time.