
flaky test: test_operator.test_activation #13915

Closed
wkcn opened this issue Jan 17, 2019 · 5 comments

Comments

@wkcn
Member

wkcn commented Jan 17, 2019

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fwindows-gpu/detail/PR-13609/7/pipeline

======================================================================
FAIL: test_operator.test_activation
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Anaconda3\envs\py3\lib\site-packages\nose\case.py", line 197, in runTest
    self.test(*self.arg)
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\common.py", line 173, in test_new
    orig_test(*args, **kwargs)
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\test_operator.py", line 6820, in test_activation
    name, op[0], shape, op[3], op[4], rtol_fd, atol_fd, num_eps)
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\test_operator.py", line 6086, in finite_diff_unary_op
    check_grad(op_ex, [data_np])
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\test_operator.py", line 6079, in <lambda>
    atol=atol, dtype=dtype)
  File "C:\jenkins_slave\workspace\ut-python-gpu\windows_package\python\mxnet\test_utils.py", line 921, in check_numeric_gradient
    ("NUMERICAL_%s"%name, "BACKWARD_%s"%name))
  File "C:\jenkins_slave\workspace\ut-python-gpu\windows_package\python\mxnet\test_utils.py", line 495, in assert_almost_equal
    raise AssertionError(msg)
AssertionError:
Items are not equal:
Error 113966.931334 exceeds tolerance rtol=0.000010, atol=0.000001.  Location of maximum error:(0, 4, 8, 3), a=0.113967, b=0.000000
 NUMERICAL_data: array([[[[ 0.        ,  0.68042432,  0.        , ...,  0.34353021,
           0.13880596,  0.94525056],
         [ 0.11517657,  0.18770058,  0.30324909, ...,  0.97787645,...
 BACKWARD_data: array([[[[ 0.        ,  0.68042432,  0.        , ...,  0.34353021,
           0.13880596,  0.94525056],
         [ 0.11517657,  0.18770058,  0.30324909, ...,  0.97787645,...
-------------------- >> begin captured logging << --------------------
common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=1553516940 to reproduce.
--------------------- >> end captured logging << ---------------------
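
For context: check_numeric_gradient compares a central finite-difference estimate of the gradient against the operator's backward pass under the given rtol/atol. Below is a minimal NumPy sketch of that comparison for a softplus-style ("softrelu") activation; the eps, shapes, and tolerances are illustrative assumptions, not the exact values used by test_activation.

import numpy as np

def softrelu(x):
    # softplus: log(1 + exp(x)), with the same large-input shortcut as the C++ kernel
    return np.where(x > 20.0, x, np.log1p(np.exp(np.minimum(x, 20.0))))

def softrelu_backward(x):
    # analytic gradient of softplus is the logistic sigmoid
    return 1.0 / (1.0 + np.exp(-x))

def numeric_grad(f, x, eps=1e-4):
    # central finite differences, one element at a time
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=["multi_index"])
    while not it.finished:
        idx = it.multi_index
        orig = x[idx]
        x[idx] = orig + eps
        f_plus = f(x).sum()
        x[idx] = orig - eps
        f_minus = f(x).sum()
        x[idx] = orig
        grad[idx] = (f_plus - f_minus) / (2 * eps)
        it.iternext()
    return grad

rng = np.random.RandomState(1553516940)   # seed taken from the captured log above
data = rng.uniform(-1.0, 1.0, size=(2, 3, 4, 5))

numerical = numeric_grad(softrelu, data.copy())
backward = softrelu_backward(data)

# In float64 this comparison passes with a large margin; lower-precision dtypes
# combined with tight rtol/atol make the finite-difference estimate noisy, which
# is one way a numerically correct operator can still yield a flaky test.
np.testing.assert_allclose(numerical, backward, rtol=1e-5, atol=1e-6)
print("max abs diff:", np.abs(numerical - backward).max())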
@perdasilva
Contributor

and another: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fcentos-gpu/detail/master/619/pipeline

Creating a PR to disable it until a fix can be provided.
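
For reference, one common way such a disable looks is a skip decorator on the test pointing back at this issue; a minimal sketch (the decorator message and placement are illustrative, not the actual PR):

import unittest

@unittest.skip("Flaky test, tracked in #13915")
def test_activation():
    ...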

@wkcn
Member Author

wkcn commented May 22, 2019

Closing this since the flaky test has been disabled.

@wkcn wkcn closed this as completed May 22, 2019
@asmushetzel
Contributor

So this got closed because the fundamental tests (that should pass) are disabled. That does not really seem appropriate, given that these are basic computations that should always work.

Well, this is what the code for the activation of type "softrelu" looks like (I traced it to ensure that this is exactly the code that gets executed). The gradient computation is just plain wrong!

/*! \brief SoftReLU, also known as softplus activation */
struct softrelu : public mxnet_op::tunable {
  template<typename DType>
  MSHADOW_XINLINE static DType Map(DType a) {
    // Avoid overflow of exp for large inputs.
    // Thresholds 20.0 is chosen such that softrelu(a) = a
    // for a > 20 using floating precision
    if (a > DType(20.0f)) {
      return a;
    } else {
      return DType(math::log1p(math::exp(a)));
    }
  }
};

MXNET_UNARY_MATH_OP(softrelu_grad, -math::expm1(-a));

@asmushetzel
Contributor

Realized I misunderstood the outermost logic. Went back to the original pull request from 2015 and figured out that the argument supplied to the _grad function is the computed value of the forward pass, not the original argument. So the current code is correct.
Still, this kind of math must be tested, and we should not simply switch tests off.
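
For the record, the identity behind that: with y = softrelu(x) = log(1 + exp(x)), we get exp(-y) = 1/(1 + exp(x)), so -expm1(-y) = 1 - exp(-y) = exp(x)/(1 + exp(x)) = sigmoid(x), which is exactly d/dx softplus(x). A quick NumPy check of the identity (illustrative, not part of the test suite):

import numpy as np

x = np.linspace(-10.0, 10.0, 101)
y = np.log1p(np.exp(x))              # forward pass: softrelu(x)
grad_from_output = -np.expm1(-y)     # softrelu_grad applied to the forward output
sigmoid = 1.0 / (1.0 + np.exp(-x))   # analytic d/dx softplus(x)

np.testing.assert_allclose(grad_from_output, sigmoid, rtol=1e-10, atol=1e-12)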
