Tests remain sensitive to timing when on slower hardware #174

gibmat · 2021-12-01T18:05:00Z

While working on the Debian packaging for this library, I've found that several builds for various architectures fail semi-randomly because of timing-related test failures. The recent pull request #167 makes things much better, but there are still occasional failures when building and running on "slower" hardware like an arm box.

One setup that I've found reliably exposes this issue is building the library and running its tests on a Raspberry Pi 3B that I have locally (arm64, running Debian bullseye off a microSD card). Building v1.10.1 with #167 cherry-picked, I typically see one or maybe two test failures. It's not always the same test that fails, nor do the tests seem to fail with equal probability. I haven't taken rigorous notes on the failing tests, but some of the more frequent ones are:

TestHandover_TransferLeadership
TestRolesAdjustment_ReplaceVoter
TestRolesAdjustment_ReplaceVoterHonorFailureDomain
TestRolesAdjustment_ReplaceVoterHonorWeight
TestRolesAdjustment_ReplaceStandByHonorFailureDomains

If there's other information that I can provide to help resolve this issue, just let me know!

MathieuBordere · 2021-12-02T07:18:43Z

I would expect #170 and #168 to help a lot in that case too, especially the first one.

gibmat · 2021-12-03T19:11:51Z

#170 does indeed help, although I still get random test failures on my Raspberry Pi. I ran 20 builds using sbuild (so each run happens in a clean, fresh environment), and only 2 runs passed all tests; each of the others had at least one test failure. On my normal build server (amd64), the tests pass run after run without any issues.

MathieuBordere · 2021-12-07T11:14:06Z

I think it has to do with the TLS implementation in older Go versions. Can you (if you have time) experiment with Go 1.17 and see whether you get the same behaviour? I'm not quite sure how to go forward: maybe I won't use TLS in the tests on armhf and accept that it's going to be slow, or I'll search for a faster TLS library.
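
A minimal sketch of the "don't use TLS in the tests on armhf" idea, assuming a hypothetical helper `useTLSInTests` (not part of the library) keyed on `runtime.GOARCH`. Debian's armhf builds use GOARCH `arm`, so a check like this covers 32-bit ARM only; it would not change anything on an arm64 board such as the Raspberry Pi 3B mentioned earlier in this thread.

```go
package main

import (
	"fmt"
	"runtime"
)

// useTLSInTests is a hypothetical policy helper: it reports whether a test
// suite should enable TLS for the given GOARCH. It skips TLS only on 32-bit
// ARM ("arm", which Debian armhf maps to), where TLS overhead was suspected
// of pushing tests past their timeouts.
func useTLSInTests(goarch string) bool {
	return goarch != "arm"
}

func main() {
	// Report the decision for the architecture this binary was built for.
	fmt.Printf("GOARCH=%s, TLS in tests: %v\n", runtime.GOARCH, useTLSInTests(runtime.GOARCH))
}
```

Keying on a compile-time constant keeps the decision deterministic per build, at the cost of not adapting to how fast a particular machine actually is.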

gibmat · 2021-12-12T19:18:57Z

This weekend I built v1.10.2 of this library on an arm64 system (Raspberry Pi 3B) running Debian unstable and Go 1.17.5. I performed 25 builds using sbuild and observed the following tests failing. (None of the builds had all tests pass.)

TestClient_Dump (11x)
TestClient_Transfer (9x)
TestClient_Transfer (1x)
TestHandover_TransferLeadership (2x)
TestIntegration_ExecBindError (2x)
TestIntegration_LeadershipTransfer (2x)
TestMembership (9x)
TestNew_ClusteredKvReadWrite (11x)
TestNew_ClusteredTimeout (11x)
TestNew_Default (1x)
TestNew_KvReadWrite (1x)
TestProtocol_RequestWithDynamicBuffer (21x)
TestRolesAdjustment_ReplaceVoterHonorFailureDomain (1x)
TestRolesAdjustment_ReplaceVoterHonorWeight (1x)

EKivutha · 2022-01-29T07:11:25Z

Assign

MathieuBordere added the Bug label (Confirmed to be a bug) on Jun 12, 2023