Mark tests that should only be run nightly. #8689

KellenSunderland · 2017-11-17T06:06:29Z

Sorry, for some reason I can't re-open #8554. Here's a new PR.

As discussed here in a few threads, segmenting tests will go a long way toward stabilizing our CI. This PR is a WIP that aims to remove some of the most problematic tests. Every test ignored here either crashed on a P3 instance with the release version (r 0.12) of mxnet from the deep learning AMI, or took longer than a minute to run. With these tests removed, the remaining tests ran in less than 2 minutes on a P3.

Tests are removed with test annotations (aka decorators). Initially I've only used @nightly to indicate a test that would best be run on a nightly basis and @crashing to indicate a crashing test. Once tests are decorated with annotations, we can choose which ones to run by passing the -a arg to nose (as shown in the new Jenkinsfile).
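To make the tagging idea concrete, here is a minimal sketch of how attribute-based selection works. It does not import nose: the `attr` function below is a stand-in written to behave roughly like `nose.plugins.attrib.attr` (which sets attributes on the test function), and `select`, `test_slow_case` and `test_fast_case` are hypothetical names used only for illustration. With nose itself, the equivalent selection would be something like `nosetests -a '!nightly'` for PR builds and `nosetests -a nightly` for the nightly run.

```python
# Stand-in for nose.plugins.attrib.attr, written out so the sketch runs without nose.
def attr(*names, **pairs):
    """Tag a test function with attributes, roughly as nose's attr decorator does."""
    def wrap(func):
        for name in names:
            setattr(func, name, True)
        for name, value in pairs.items():
            setattr(func, name, value)
        return func
    return wrap


@attr('nightly')
def test_slow_case():
    # Hypothetical long-running test, tagged so PR builds can skip it.
    assert sum(range(1000)) == 499500


def test_fast_case():
    # Hypothetical quick test, left untagged so it runs on every PR build.
    assert 1 + 1 == 2


def select(tests, exclude=None, include=None):
    """Hypothetical helper: a rough analogue of nose's -a '!tag' and -a tag filters."""
    chosen = []
    for test in tests:
        if exclude and getattr(test, exclude, False):
            continue
        if include and not getattr(test, include, False):
            continue
        chosen.append(test)
    return chosen


all_tests = [test_slow_case, test_fast_case]
pr_tests = select(all_tests, exclude='nightly')       # what a PR build would run
nightly_tests = select(all_tests, include='nightly')  # what only the nightly build adds
```

The point of the design is that the tag lives next to the test, so moving a test back into the PR build is a one-line change to the test file and needs no change to the CI configuration.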

Also made a few changes to speed up GPU builds on CI.

Checklist

Essentials

Passed code style checking (make lint)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage (coverage will now be nightly)
For user-facing API changes, API doc string has been updated. For new C++ functions in header files, their functionalities and arguments are well-documented.
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

piiswrong · 2017-11-17T18:43:30Z

tests/python/unittest/test_sparse_operator.py

-
+
+# This test takes several minutes to run, marking as nightly.
+@attr('nightly')


We can't move the sparse tests to nightly. Try reducing the array dimensions.

piiswrong · 2017-11-17T18:44:32Z

tests/python/unittest/test_operator.py

@@ -1266,6 +1267,7 @@ def test_bneq(a, b):
    test_bneq(a, b)


+@attr('nightly')
 def test_broadcast_binary_op():


This needs to stay in unittests

szha · 2017-11-17T19:06:30Z

I understand the motivation for moving tests to nightly. The concern I still have is that if we disable certain tests at the PR level and only run them nightly, problems in the code may not surface in the PR as they should. Someone might submit a patch that optimizes or fixes an existing op but actually contains a bug, and the tests that would catch it only run nightly. Reviewers might assume that the tests are run by the CI when in fact they are not. In general, removing tests from the PR-level CI is equivalent to removing the guarantee that the particular functionality still works after the PR.

Admittedly some tests, including some of the tests I wrote, may be simple fuzzy tests with randomized inputs that should have been designed as small fixtures to catch specific edge cases. That way, each test wouldn't run for longer than a few seconds, which is ideal. I agree we should do better at writing tests. Given the current state of the test cases, it will be a major endeavor to move from where we are to the ideal world. Given the constraint that we shouldn't break mxnet in doing so, I think we should only replace existing long-running tests with carefully designed test cases instead of disabling them.
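As a rough illustration of the "small fixtures" idea, the sketch below replaces a large randomized input with a handful of fixed cases, each aimed at one edge case. Everything in it is an assumption made for the example: `pick_rows` is a hypothetical pure-Python stand-in, not mxnet's operator, and the case table is invented to show the shape of such a test rather than real coverage.

```python
def pick_rows(data, index):
    """Hypothetical stand-in for the operator under test: take data[i][index[i]] for each row."""
    if len(data) != len(index):
        raise ValueError("data and index must have the same number of rows")
    return [row[i] for row, i in zip(data, index)]


# Small, fixed fixtures: each one targets a specific edge case and runs instantly.
CASES = [
    ("single element", [[7]], [0], [7]),
    ("first column", [[1, 2, 3], [4, 5, 6]], [0, 0], [1, 4]),
    ("last column", [[1, 2, 3], [4, 5, 6]], [2, 2], [3, 6]),
    ("mixed indices", [[1, 2, 3], [4, 5, 6]], [1, 2], [2, 6]),
    ("empty input", [], [], []),
]


def test_pick_rows_fixtures():
    # The case name is used as the assertion message so a failure points at the edge case.
    for name, data, index, expected in CASES:
        assert pick_rows(data, index) == expected, name


def test_pick_rows_rejects_mismatched_lengths():
    try:
        pick_rows([[1, 2]], [0, 1])
    except ValueError:
        return
    raise AssertionError("expected ValueError for mismatched lengths")
```

Tests written this way stay fast enough to run on every PR, so they would not need a nightly tag at all.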

KellenSunderland · 2017-11-17T19:13:54Z

@szha For what it's worth I agree with all of your points. I hope we can work together on updating tests soon (I plan to try and work on this full time).

szha · 2017-11-17T19:15:44Z

@KellenSunderland thanks. The information you collected in this PR while analyzing the tests is valuable. It would be great if we could have a summary and a checklist of the test problems in an issue, so that everyone can pitch in and see the progress.

KellenSunderland · 2017-11-18T02:40:06Z

@piiswrong updated.
@szha You alright with this PR now, or would you still prefer we hold off for the time being?

szha · 2017-11-18T02:50:50Z

tests/python/unittest/test_operator.py

@@ -3365,6 +3368,7 @@ def test_log_softmax():
            check_numeric_gradient(sym, [data], rtol=0.05, atol=1e-3)


+@attr('nightly')
 def test_pick():


please leave this test as is. it's an operator that's not in contrib, so we shouldn't leave any chance of breaking it.

szha · 2017-11-18T02:51:06Z

tests/python/unittest/test_random.py

@@ -18,6 +18,7 @@
 import os
 import mxnet as mx
 import numpy as np
+from nose.plugins.attrib import attr


unused import

Actually attr is used on line 192 right?

oops, wrong file. I meant for the sparse tests.

Gotcha. Updated.

KellenSunderland · 2017-11-18T03:06:26Z

@szha updated.

KellenSunderland · 2017-11-18T03:47:14Z

FYI gpu test runtimes are: 14m20.971s on a p3.2xlarge.

szha · 2017-11-20T07:09:05Z

@KellenSunderland please rebase and resolve the conflict.

Setup Jenkins to ignore slow tests during PR builds. Also marking one crashing test. Github issue has been raised.

piiswrong · 2017-12-12T22:42:21Z

Closing as outdated.

piiswrong reviewed Nov 17, 2017


KellenSunderland force-pushed the disable_problem_tests branch 2 times, most recently from fcbfca8 to 807e965 Compare November 18, 2017 02:15

szha reviewed Nov 18, 2017


KellenSunderland force-pushed the disable_problem_tests branch from 807e965 to 6cb6e16 Compare November 18, 2017 03:05

KellenSunderland force-pushed the disable_problem_tests branch from 6cb6e16 to 4d065b5 Compare November 18, 2017 03:44

KellenSunderland force-pushed the disable_problem_tests branch from 4d065b5 to 31c066c Compare November 20, 2017 22:09

Mark tests that should only be run nightly.

7657c37

Setup Jenkins to ignore slow tests during PR builds. Also marking one crashing test. Github issue has been raised.

KellenSunderland force-pushed the disable_problem_tests branch from 31c066c to 7657c37 Compare November 21, 2017 02:27

piiswrong closed this Dec 12, 2017
