Flaky Tests on CI #2494

apruden2008 · 2024-06-14T17:23:56Z

🐛 Bug Report

Issue: Flaky Tests on CI
Severity: Medium

Description

We have observed that some tests in our CI pipeline exhibit flaky behavior, requiring multiple runs to pass. This inconsistency is affecting the reliability and efficiency of our development process.

Affected Tests

algorithms: the test suite for the algorithms crate often fails unpredictably, and the root cause is currently unknown.

Other tests, not listed here, may also behave this way and require multiple attempts to pass.

Steps to Reproduce

1. Run the CI pipeline.
2. Observe that the algorithms test (and potentially others) fails intermittently.
3. Re-run the failed tests.
4. Notice that the tests may pass on subsequent attempts.

Expected Behavior

All tests should pass consistently on the first run, provided that the code is correct.

Actual Behavior

The algorithms test (and potentially others) fails intermittently without any changes to the code.
These tests often require multiple attempts to pass, leading to wasted time and resources.

Impact

Decreases confidence in the CI results.
Slows down the development process due to the need for re-running tests.
Makes it difficult to identify genuine issues in the codebase.

Possible Causes

Race conditions or timing issues within the tests or the code being tested.
Environmental issues related to the CI infrastructure.
Dependencies on external services or resources that may not be consistently available.

Suggested Actions

Investigation and Diagnosis
- Conduct a thorough investigation to identify the root cause of the flakiness in the algorithms test.
- Review the test code and the associated application code for potential issues.

Test Stabilization
- Implement fixes to address any identified issues causing the flakiness.
- Ensure that tests do not have hidden dependencies on external resources or timing conditions.
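One way to remove a hidden dependency on a downloaded resource is to read it from a local cache first and hit the network only on a cache miss. The sketch below assumes the resource is a single file and that the actual download is supplied as a closure. `load_or_fetch` and `TMP_COUNTER` are hypothetical names, not snarkVM APIs. The write goes to a uniquely named temporary file that is then renamed into place. On POSIX filesystems, a rename within one directory replaces the target atomically, so parallel tests never observe a half-written cache file.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::sync::atomic::{AtomicU64, Ordering};

// Per-process counter so concurrent callers never share a temporary file name.
static TMP_COUNTER: AtomicU64 = AtomicU64::new(0);

/// Returns the bytes cached at `cache_path`, calling `fetch` only when the
/// cache file cannot be read (treated as a cache miss).
fn load_or_fetch<F>(cache_path: &Path, fetch: F) -> io::Result<Vec<u8>>
where
    F: FnOnce() -> io::Result<Vec<u8>>,
{
    // Cache hit: no network access needed.
    if let Ok(bytes) = fs::read(cache_path) {
        return Ok(bytes);
    }
    // Cache miss: fetch once, then persist for later tests and runs.
    let bytes = fetch()?;
    if let Some(dir) = cache_path.parent() {
        fs::create_dir_all(dir)?;
    }
    // Write to a unique sibling temp file, then rename it over the final path,
    // so a reader in another thread or process never sees a partial file.
    let unique = TMP_COUNTER.fetch_add(1, Ordering::Relaxed);
    let tmp_path = cache_path.with_extension(format!("tmp.{}.{}", std::process::id(), unique));
    fs::write(&tmp_path, &bytes)?;
    fs::rename(&tmp_path, cache_path)?;
    Ok(bytes)
}
```

On CI, the cache directory would also need to persist between jobs (for example, through the CI provider's cache feature) for this to reduce the number of downloads.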

Enhancement of CI Infrastructure
- Ensure that the CI environment is consistent and reliable.
- Consider introducing additional logging or diagnostics to capture more information about the failures.

Documentation and Communication
- Document the findings and the steps taken to address the flaky tests.
- Communicate any changes to the team to ensure that everyone is aware of the improvements and any new best practices.

Additional Information

Please provide any logs or additional context that might help in diagnosing the issue.
If you have observed flaky behavior in other tests, please list them here as well.


zosorock · 2024-06-14T17:27:23Z

Not sure if it helps, but could we upgrade to Rust 1.79.0?

vicsn · 2024-06-14T19:20:27Z

Some comments:

- I don't think a Rust upgrade will help; flakiness has been an issue for a while.
- One frequent cause of flakiness across all crates is that parameter downloads fail, perhaps because of AWS rate-limiting.
- Separately from the failing downloads, resource usage for the algorithms crate does seem too high. Since this depends heavily on the particular environment, Provable will triage it independently on our own CI.
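If parameter downloads fail because of rate-limiting, a common mitigation is to retry with exponential backoff rather than failing on the first error. The following is a minimal sketch, assuming the download can be wrapped in a closure that returns a `Result`. `retry_with_backoff` is a hypothetical helper, not an existing snarkVM function, and the attempt count and base delay are placeholders.

```rust
use std::thread;
use std::time::Duration;

/// Calls `op` up to `max_attempts` times, passing the 1-based attempt number.
/// After each failure except the last, sleeps for base_delay * 2^(attempt - 1),
/// i.e. base_delay, 2 * base_delay, 4 * base_delay, ...
/// Returns the first `Ok`, or the error from the final attempt.
fn retry_with_backoff<T, E, F>(max_attempts: u32, base_delay: Duration, mut op: F) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    assert!(max_attempts >= 1, "max_attempts must be at least 1");
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                // Saturating arithmetic keeps very large attempt counts from panicking.
                let factor = 2u32.saturating_pow(attempt - 1);
                thread::sleep(base_delay.saturating_mul(factor));
                attempt += 1;
            }
        }
    }
}
```

In a test setup, `op` would wrap whatever call fetches a parameter file. Adding random jitter to the delay would further reduce the chance of many parallel CI jobs retrying in lockstep and tripping the same rate limit again.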

zosorock · 2024-06-22T16:27:58Z

Funnily enough, after lowering the resource class (or perhaps thanks to a fix in one of the PRs), CI is now passing for algorithms:
https://app.circleci.com/pipelines/github/AleoNet/snarkVM/13211/workflows/44a17171-197b-4df2-95c3-58e4180b57f8/jobs/576904

apruden2008 added the "bug" and "does not block mainnet" labels Jun 14, 2024

apruden2008 assigned apruden2008 and unassigned apruden2008 Jun 14, 2024

This was referenced Jul 31, 2024:
- [Fix] Cache parameter downloads and limit test parallelization #2523 (Merged)
- Limit parallelization of test_vm_execute_and_finalize #2527 (Closed)

aleojohn closed this as completed in #2523 Sep 10, 2024
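The fix referenced above mentions limiting test parallelization. `cargo test -- --test-threads=N` caps the number of test threads for an entire test binary. A finer-grained alternative, sketched here on the assumption that only a few resource-heavy tests need throttling, is a counting semaphore that those tests acquire before doing heavy work. `ConcurrencyLimit`, `Permit`, and `HEAVY_TESTS` are hypothetical names; this is not the code from #2523 or #2527.

```rust
use std::sync::{Condvar, Mutex};

/// A minimal counting semaphore built from a Mutex and a Condvar:
/// at most `limit` permits can be held at the same time.
struct ConcurrencyLimit {
    in_use: Mutex<usize>,
    freed: Condvar,
    limit: usize,
}

/// RAII permit: the slot is released automatically when this is dropped.
struct Permit<'a> {
    owner: &'a ConcurrencyLimit,
}

impl ConcurrencyLimit {
    const fn new(limit: usize) -> Self {
        ConcurrencyLimit { in_use: Mutex::new(0), freed: Condvar::new(), limit }
    }

    /// Blocks until a slot is free, then takes it.
    fn acquire(&self) -> Permit<'_> {
        let mut in_use = self.in_use.lock().unwrap();
        while *in_use >= self.limit {
            in_use = self.freed.wait(in_use).unwrap();
        }
        *in_use += 1;
        Permit { owner: self }
    }
}

impl Drop for Permit<'_> {
    fn drop(&mut self) {
        let mut in_use = self.owner.in_use.lock().unwrap();
        *in_use -= 1;
        // Wake one waiting thread now that a slot is free.
        self.owner.freed.notify_one();
    }
}

// Shared limit for heavy tests. Usage inside a #[test] function:
//     let _permit = HEAVY_TESTS.acquire(); // held until the test returns
static HEAVY_TESTS: ConcurrencyLimit = ConcurrencyLimit::new(2);
```

Because the permit is released in `Drop`, a heavy test that panics still frees its slot during unwinding, so one failing test does not stall the rest of the run.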

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Flaky Tests on CI #2494

Flaky Tests on CI #2494

apruden2008 commented Jun 14, 2024

zosorock commented Jun 14, 2024

vicsn commented Jun 14, 2024 •

edited

Loading

zosorock commented Jun 22, 2024 •

edited

Loading

Flaky Tests on CI #2494

Flaky Tests on CI #2494

Comments

apruden2008 commented Jun 14, 2024

🐛 Bug Report

Description

Affected Tests

Steps to Reproduce

Expected Behavior

Actual Behavior

Impact

Possible Causes

Suggested Actions

Additional Information

zosorock commented Jun 14, 2024

vicsn commented Jun 14, 2024 • edited Loading

zosorock commented Jun 22, 2024 • edited Loading

vicsn commented Jun 14, 2024 •

edited

Loading

zosorock commented Jun 22, 2024 •

edited

Loading