Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Mark tests that should only be run nightly. #8689

Closed

Conversation

KellenSunderland
Copy link
Contributor

Sorry for some reason I can't re-open #8554. Here's a new PR.

As discussed here in a few threads, segmenting tests will go a long way in stabilizing our CI. This PR is a WIP that aims to remove some of the most problematic tests. All tests ignored here either crashed on a P3 instance with the release version (r 0.12) of mxnet from the deep learning AMI, or they took longer to run than a minute. With these tests removed all tests were running in less than 2 minutes on a P3.

Tests are removed with test annotations (aka decorators). Initially I've only used @nightly to indicate a test that would best be run on a nightly basis and @crashing to indicate a crashing test. Once we've decorated tests with annotations we can selectively choose to run them by passing the -a arg to nose (as shown in the new Jenkinsfile).

Also made a few changes to speed up GPU builds on CI.

Checklist

Essentials

  • Passed code style checking (make lint)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage (coverage will now be nightly)
  • For user-facing API changes, API doc string has been updated. For new C++ functions in header files, their functionalities and arguments are well-documented.
  • To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change



# This test takes several minutes to run, marking as nightly.
@attr('nightly')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't move the sparse tests to nightly. Try reduce the array dimensions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do...

@@ -1266,6 +1267,7 @@ def test_bneq(a, b):
test_bneq(a, b)


@attr('nightly')
def test_broadcast_binary_op():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to stay in unittests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do...

@szha
Copy link
Member

szha commented Nov 17, 2017

I understand the motivation for moving tests to nightly. The concern I still have is if we disable certain tests on PR-level and only run them nightly, the problems in code may not surface in PR as they should. One might submit a patch for optimization or fix on existing op which actually contains bug, whose tests only run nightly. Reviewers might assume that the tests are run by the CI while in fact they are not. In general, removing tests from CI in PR is equivalent of removing the guarantee that the particular functionality still works after the PR.

Admittedly some tests, including some of the tests I wrote, may be simple fuzzy tests with randomized inputs, that should have been designed as small fixtures to catch specific edge cases. That way, the each of the tests won't run for longer than seconds which is ideal. I agree we should do better at writing tests. Given the current situation in the test cases, it will be major endeavor moving from where we are to where the ideal world is. Given the constraint that we shouldn't be breaking mxnet in doing so, I think we should only replace existing long-running tests with carefully designed test cases instead of disabling them.

@KellenSunderland
Copy link
Contributor Author

@szha For what it's worth I agree with all of your points. I hope we can work together on updating tests soon (I plan to try and work on this full time).

@szha
Copy link
Member

szha commented Nov 17, 2017

@KellenSunderland thanks. The information you collected in this PR while analyzing the tests are valuable. It would be great if we could have a summary and a checklist of the test problems in an issue, so that everyone can pitch in on helping and see the progress.

@KellenSunderland KellenSunderland force-pushed the disable_problem_tests branch 2 times, most recently from fcbfca8 to 807e965 Compare November 18, 2017 02:15
@KellenSunderland
Copy link
Contributor Author

@piiswrong updated.
@szha You alright with this PR now, or would you still prefer we hold off for the time being?

@@ -3365,6 +3368,7 @@ def test_log_softmax():
check_numeric_gradient(sym, [data], rtol=0.05, atol=1e-3)


@attr('nightly')
def test_pick():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please leave this test as is. it's an operator that's not in contrib, so we shouldn't leave any chance of breaking it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Roger.

@@ -18,6 +18,7 @@
import os
import mxnet as mx
import numpy as np
from nose.plugins.attrib import attr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused import

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ty

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually attr is used on line 192 right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, wrong file. I meant for the sparse tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Updated.

@KellenSunderland
Copy link
Contributor Author

@szha updated.

@KellenSunderland
Copy link
Contributor Author

FYI gpu test runtimes are: 14m20.971s on a p3.2xlarge.

@szha
Copy link
Member

szha commented Nov 20, 2017

@KellenSunderland please rebase and resolve conflict.

Setup Jenkins to ignore slow tests during PR builds.
Also marking one crashing test.  Github issue has been raised.
@piiswrong
Copy link
Contributor

closing due to outdated

@piiswrong piiswrong closed this Dec 12, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants