Flaky Tests Tracking Issue #9412
Comments
I am taking #8928 (Perl issue)
On Sat, Jan 13, 2018 at 1:44 PM, Sheng Zha <***@***.***> wrote:
What
Thanks to community members, we identified tests that are flaky and need
fixing. I'm putting together the list of open issues for tracking them and
calling for help on fixing them.
We use this issue for tracking progress and coordinating efforts.
| Issue | Requester | Category | Cause | Status |
| --- | --- | --- | --- | --- |
| #7645 | @rahul003 | Operator | numerical stability | Disabled |
| #8211 | @indhub | Autograd | autograd memory footprint | Disabled |
| #8230 | @indhub | Operator | numerical stability | Disabled |
| #8283 | @indhub | Utility | external dependency | Disabled |
| #8288 | @indhub | Operator | numerical stability (?) | Disabled |
| #8299 | @indhub | Operator | testing through training. randomness | Disabled |
| #8892 | @marcoabreu | Operator | testing through training. randomness | Disabled |
| #8934 | @marcoabreu | Operator | segfault in MKL version | Disabled |
| #9295 | @marcoabreu | Operator | laop hangs in MKL version | Disabled |
| #9384 | @eric-haibin-lin | Sparse/KVStore | segfault for sparse | Disabled |
| #8834 | @marcoabreu | Scala | Operator numerical stability | Disabled |
| #8928 | @marcoabreu | Perl | CPU segfault | Disabled |
| #9332 | @KellenSunderland | R | external dependency | Disabled |
Meaning of status:
- Disabled: temporarily disabled after discovery. Fix is needed.
- Flaky: test is enabled with retries. Fix is still needed.
- Fixed: fix has finished and test is no longer flaky.
To add a newly discovered flaky test
Create a new issue for the test, then comment here with a reference to the new issue.
To help fix the tests
Pick an issue that hasn't been taken. Comment here to say which issue you
are working on, and I will update the status in the table. Then start working
on the issue, and put details, findings and resolutions in the *original
issue*.
The requester of the original issue, as well as @apache/mxnet-committers
<https://github.com/orgs/apache/teams/mxnet-committers>, should make sure
that as a result of the fix, the tests:
- Pass reliably
- Avoid randomness if possible
- Avoid external dependencies if possible
- Have their root cause found and fixed if it is actually a problem in the
code base
- Are not resource-intensive
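As a rough sketch of the "avoid randomness" point: a test that needs random inputs can fix its seed, or at least report it on failure, so that a failing run can be reproduced. The `seeded` decorator and `sample_inputs` function below are hypothetical names for illustration and use only Python's standard `random` module; a real MXNet test would also need to seed NumPy and MXNet, which this sketch does not attempt.

```python
import functools
import random


def seeded(seed=None):
    """Hypothetical helper: seed the RNG before a test and report the seed on failure."""
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            # Use the fixed seed if given; otherwise draw one and remember it,
            # so that a failing run can still be reproduced.
            this_seed = seed if seed is not None else random.SystemRandom().randrange(2**31)
            random.seed(this_seed)
            try:
                return test_fn(*args, **kwargs)
            except AssertionError:
                print(f"{test_fn.__name__} failed with seed {this_seed}")
                raise
        return wrapper
    return decorator


@seeded(seed=42)
def sample_inputs():
    # Stand-in for a test body that generates random input data.
    return [random.uniform(-1.0, 1.0) for _ in range(5)]


first = sample_inputs()
second = sample_inputs()
# With a fixed seed, both calls produce identical inputs.
```

Pinning a seed makes a test deterministic but can also mask a real numerical problem, so logging the seed for unseeded runs is often the more informative choice.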
|
A few related PRs for reference on the randomness problem in CI tests:
Info from @szha: https://pypi.python.org/pypi/flaky
See also: email threads on dev@ titled "Improving and rationalizing unit tests" and "Call for Help for Fixing Flaky Tests".
@szha It may help to add the above information to the list above for easy reference. |
@bhavinthaker good suggestions. I added these references. |
Working on #8283 |
One more issue to be tracked - #10087 |
test_layer_norm has precision issues - #10114 |
@eric-haibin-lin is there an open issue for test_correlation? |
I'm always adding them to https://github.com/apache/incubator-mxnet/projects/9#card-6995282 - do I have to call them out here as well? |
We now use GitHub's project functionality for issue tracking: https://github.com/apache/incubator-mxnet/projects/9 |
What
Thanks to community members, we identified tests that are flaky and need fixing. I'm putting together the list of open issues for tracking them and calling for help on fixing them.
We use this issue for tracking progress and coordinating efforts.
TODO
Completed
Meaning of status:
How
To add a newly discovered flaky test
Create a new issue for the test, then comment here with a reference to the new issue.
To help fix the tests
Pick an issue that hasn't been taken. Comment here to say which issue you are working on, and I will update the status in the table. Then start working on the issue, and put details, findings and resolutions in the original issue. The people who wrote the feature and the tests are also a good resource for understanding the issue; they can be identified from the commit history and pinged for help.
Requester of the original issue, as well as @apache/mxnet-committers should make sure that as a result of the fix, the tests are:
Reference
Discussions on dev
On GPU Randomness