Tests remain sensitive to timing when on slower hardware #174

Open
gibmat opened this issue Dec 1, 2021 · 5 comments
Labels: Bug (Confirmed to be a bug)

Comments

@gibmat

gibmat commented Dec 1, 2021

While working on the Debian packaging for this library, I've found that builds on several architectures fail semi-randomly because of timing-dependent test failures. The recent pull request #167 makes things much better, but there are still occasional failures when building and running on slower hardware such as an ARM box.

A reliable way I've found to expose this issue is to build the library and run its tests on a Raspberry Pi 3B that I have locally (arm64, running Debian bullseye from a microSD card). Building v1.10.1 with that pull request cherry-picked, I typically see one or two test failures. It's not always the same test that fails, and the tests don't seem to fail with equal probability. I haven't kept rigorous notes on which tests fail, but some of the more frequent ones are:

  • TestHandover_TransferLeadership
  • TestRolesAdjustment_ReplaceVoter
  • TestRolesAdjustment_ReplaceVoterHonorFailureDomain
  • TestRolesAdjustment_ReplaceVoterHonorWeight
  • TestRolesAdjustment_ReplaceStandByHonorFailureDomains

If there's other information that I can provide to help resolve this issue, just let me know!

@MathieuBordere

I would expect #170 and #168 to help a lot in that case too, especially the first one.

@gibmat

gibmat commented Dec 3, 2021

#170 does indeed help, although I still get random test failures on my Raspberry Pi. I ran 20 builds using sbuild (so each run is in a clean, fresh environment), and only 2 of them passed all tests; the others each had at least one test failure. On my normal build server (amd64), there are no issues at all: the tests pass run after run.

@MathieuBordere

I think it has to do with the TLS implementation in older Go versions. Can you (if you have time) experiment with Go 1.17 and see whether you get the same behaviour? I'm not quite sure how to go forward: maybe I won't use TLS in the tests on armhf and will accept that it's going to be slow there, or I'll search for a faster TLS library.

@gibmat

gibmat commented Dec 12, 2021

This weekend I built v1.10.2 of this library on an arm64 system (Raspberry Pi 3B) running Debian unstable and Go 1.17.5. I performed 25 builds using sbuild and observed the following test failures. (None of the builds had all tests pass.)

  • TestClient_Dump (11x)
  • TestClient_Transfer (9x)
  • TestClient_Transfer (1x)
  • TestHandover_TransferLeadership (2x)
  • TestIntegration_ExecBindError (2x)
  • TestIntegration_LeadershipTransfer (2x)
  • TestMembership (9x)
  • TestNew_ClusteredKvReadWrite (11x)
  • TestNew_ClusteredTimeout (11x)
  • TestNew_Default (1x)
  • TestNew_KvReadWrite (1x)
  • TestProtocol_RequestWithDynamicBuffer (21x)
  • TestRolesAdjustment_ReplaceVoterHonorFailureDomain (1x)
  • TestRolesAdjustment_ReplaceVoterHonorWeight (1x)

@EKivutha

Assign

MathieuBordere added the Bug (Confirmed to be a bug) label on Jun 12, 2023