Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integration test and gating config fixes #10188

Merged
merged 15 commits into from
Mar 1, 2022

Conversation

georgeee
Copy link
Member

@georgeee georgeee commented Feb 8, 2022

This is the same fix as #10229 and #10276. So many PRs were created due branch compatibility issues.

This PR:

  • Fixes peers reliability test
    • Revert "remove peer reliability"
    • Whitelist ip nets from client trustlist as known
  • Fixes some issues with gating config
    • Do not reset known private addresses
    • Fix bug in trustedPeers of gating config
  • Fixes a race condition in Bitswap implementation
    • Fix concurrent error in bitswap implementation
  • Removes some flakiness from test suite
    • Update MDNS unit test
    • Update libp2p_helper test launching
    • Don't exit silently on fatal error
  • Nits
    • Print reason of disallowed connection: rewrote peer gating logic
    • Use mdns instead of mdns_legacy

Most fixes are result of the investigation into why is peers reliability test failing. This investigation revealed that mDNS discovery upon which the test relied stopped working in kubernetes cluster after recent change of firewall policy. Change in firewall policy can not be reverted as it was dictated by some security concerns. This PR implements an alternative solution: it allows to connect to peers from the 10.0.0.0/8 network even when they are not discovered through mDNS.

Explain how you tested your changes:
Existing tests provide a good coverage.

Checklist:

  • Modified the current draft of release notes with details on what is completed or incomplete within this project
  • Document code purpose, how to use it
    • Mention expected invariants, implicit constraints
  • Tests were added for the new behavior
    • Document test purpose, significance of failures
    • Test names should reflect their purpose
  • All tests pass (CI will check this if you didn't)
  • Serialized types are in stable-versioned modules
  • Does this close issues? debug peer-reliability test #10107

@georgeee georgeee added the ci-build-me Add this label to trigger a circle+buildkite build for this branch label Feb 8, 2022
@georgeee georgeee requested review from a team as code owners February 8, 2022 20:58
@georgeee georgeee force-pushed the georgeee/fix-testnet-bootstrapping-issue branch from c5e302d to 1973a61 Compare February 9, 2022 15:09
@georgeee georgeee force-pushed the georgeee/fix-testnet-bootstrapping-issue branch 5 times, most recently from 294a4f8 to 53ed907 Compare February 17, 2022 18:31
@georgeee georgeee requested a review from a team as a code owner February 17, 2022 18:31
@georgeee georgeee removed the ci-build-me Add this label to trigger a circle+buildkite build for this branch label Feb 17, 2022
@georgeee georgeee force-pushed the georgeee/fix-testnet-bootstrapping-issue branch 2 times, most recently from 7014f21 to 4d31043 Compare February 17, 2022 19:39
@nholland94 nholland94 force-pushed the georgeee/fix-testnet-bootstrapping-issue branch from 4d31043 to cec08fb Compare February 22, 2022 23:35
@QuiteStochastic QuiteStochastic added the ci-build-me Add this label to trigger a circle+buildkite build for this branch label Feb 23, 2022
@nholland94 nholland94 changed the title [WIP] fix testnet bootstrapping issue fix testnet bootstrapping issue Feb 23, 2022
@nholland94 nholland94 force-pushed the georgeee/fix-testnet-bootstrapping-issue branch from 16e3e0d to 7515d1b Compare February 24, 2022 01:46
This reverts commit 4af19fc.

Peers reliability test was switched off temorarily because of flakiness.
Problem: known private address set is overwritten when gating config is
set.

Solution: do not overwrite known private addresses.
Problem: when a connection is refused, reason for refusal is not printed
which complicates debugging of such occasions.

Solution: rewrite connection gating logic to keep reason of
accepting/denying of connection and print it.
Problem: when logging is configured for the libp2p helper executable,
errors are ignored.

Solution: log errors of logging configuration.
Problem: log level was set for an absent logger.

Solution: remove the faulty line.
georgeee and others added 9 commits March 1, 2022 03:20
Problem: legacy implementation of mDNS service is used, although there
is no obstacle to using a newer implementation.

Solution: use newer implementation.
Problem: test is using a wrong (potentionally faulty) logic for taking
timeouts into account.

Solution: refactor MDNS test.
Problem: a deprecated env variable is declared in the helm chart.

Solution: remove unused env variable.
Problem: in k8s context mdns support is fragile and
may sometimes be an obstacle in running integration tests.

Solution: whitelist ip nets from MINA_CLIENT_TRUSTLIST
as known private ip subnets. This would ensure peers won't be discarded
when discovered via routing discovery.
Problem: libp2p_helper tests sometimes timeout without good reason for
it.

Solution: extend timeout for the test to 25 minutes.
Problem: quckicheck tests finished with "fatal error: concurrent map
read and map write".

Solution: remove unsupervised concurrent access that triggered the fatal error.
Problem: in some cases, integration tests report to pass when they
actually exited with an error.

Solution: report an error to CI so that test is marked a failure.
@georgeee georgeee force-pushed the georgeee/fix-testnet-bootstrapping-issue branch from 03857aa to e650746 Compare March 1, 2022 00:23
@georgeee georgeee force-pushed the georgeee/fix-testnet-bootstrapping-issue branch from e650746 to c28464a Compare March 1, 2022 13:00
@georgeee georgeee changed the title fix testnet bootstrapping issue Integration test and gating config fixes Mar 1, 2022
@georgeee georgeee merged commit 7ad7f83 into compatible Mar 1, 2022
@georgeee georgeee deleted the georgeee/fix-testnet-bootstrapping-issue branch March 1, 2022 15:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci-build-me Add this label to trigger a circle+buildkite build for this branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants