Automatically retry failed MIRI runs to work around intermittent failures #922

Merged
alamb merged 2 commits into apache:master from alamb/loop_miri
Nov 6, 2021

@alamb alamb commented Nov 5, 2021

Which issue does this PR close?

Closes #879

Rationale for this change

We are seeing intermittent failures when running the MIRI checks on CI. They are not reproducible locally, and their symptoms are consistent with MIRI being killed by the OOM killer (for example, GitHub may have overprovisioned its runners, so the memory actually available to a job is not consistent). See here for details.

When we manually rerun the MIRI check, it often passes.

What changes are included in this PR?

Automatically re-run MIRI up to 5 times, stopping at the first clean run.
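
A retry loop of this kind could be sketched roughly as follows. This is an illustration only, not the script added by this PR: the `retry` function name and its variables are hypothetical, and the commented-out `cargo miri test` invocation is an assumption about what the wrapped command might look like. The only detail taken from the description above is the limit of 5 attempts.

```shell
#!/bin/sh
# Hypothetical sketch of a retry wrapper; not the actual script from this PR.

# retry MAX_ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND until it exits with status 0 or MAX_ATTEMPTS runs have failed.
# Returns 0 on the first successful run, 1 if every attempt failed.
retry() {
    retry_max="$1"
    shift
    retry_attempt=1
    while [ "$retry_attempt" -le "$retry_max" ]; do
        echo "Attempt ${retry_attempt} of ${retry_max}: $*" >&2
        if "$@"; then
            echo "Succeeded on attempt ${retry_attempt}" >&2
            return 0
        fi
        echo "Attempt ${retry_attempt} failed" >&2
        retry_attempt=$((retry_attempt + 1))
    done
    echo "Giving up after ${retry_max} failed attempts" >&2
    return 1
}

# Assumed usage (the exact MIRI command line is not shown in this PR):
# retry 5 cargo miri test
```

Note that a loop like this retries on any non-zero exit, so it cannot tell an OOM kill apart from a genuine MIRI failure; a real failure would simply fail on every attempt and still fail the job.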

Are there any user-facing changes?

@codecov-commenter

Codecov Report

Merging #922 (2ca81ed) into master (bb05b00) will increase coverage by 0.00%.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #922   +/-   ##
=======================================
  Coverage   82.30%   82.30%           
=======================================
  Files         168      168           
  Lines       48028    48028           
=======================================
+ Hits        39529    39530    +1     
+ Misses       8499     8498    -1     
Impacted Files Coverage Δ
parquet/src/encodings/encoding.rs 93.71% <0.00%> (+0.19%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

alamb commented Nov 5, 2021

The first run passed, but it doesn't appear that the retry kicked in.
https://github.com/apache/arrow-rs/runs/4116270188?check_suite_focus=true

Running again

alamb commented Nov 5, 2021

https://github.com/apache/arrow-rs/runs/4117071438?check_suite_focus=true also passed, but no retry kicked in. Will try again.

alamb commented Nov 5, 2021

I want to see at least one PR where MIRI doesn't pass and is rerun to success before saying this is ready to merge.

alamb commented Nov 5, 2021

Ooh, here is a run where MIRI crashed the first time but succeeded the second: https://github.com/apache/arrow-rs/runs/4118067989?check_suite_focus=true 🎉

Will keep rerunning and see how it goes.

@alamb alamb changed the title from "(WIP) Automatically retry failed MIRI runs to work around intermittent failures" to "Automatically retry failed MIRI runs to work around intermittent failures" Nov 5, 2021
@alamb alamb marked this pull request as ready for review November 5, 2021 16:49
alamb commented Nov 5, 2021

@houqp and @jimexist, I think this approach is looking promising (even though it is not intellectually satisfying).

@alamb alamb requested review from houqp and jimexist November 5, 2021 18:26
@alamb alamb merged commit 62934e9 into apache:master Nov 6, 2021
@alamb alamb deleted the alamb/loop_miri branch November 6, 2021 09:55
alamb added a commit that referenced this pull request Nov 9, 2021
…ures (#922)

* Move MIRI checks into a shell script

* add retry loop
alamb added a commit that referenced this pull request Nov 9, 2021
…mittent failures (#934)

* Automatically retry failed MIRI runs to work around intermittent failures (#922)

* Move MIRI checks into a shell script

* add retry loop

* Do not use cache for miri
Successfully merging this pull request may close these issues.

MIRI check is failing on master