
flaky test: test_operator.test_activation #13915

Closed
wkcn opened this issue Jan 17, 2019 · 5 comments

Comments

@wkcn
Member

wkcn commented Jan 17, 2019

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fwindows-gpu/detail/PR-13609/7/pipeline

======================================================================
FAIL: test_operator.test_activation
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Anaconda3\envs\py3\lib\site-packages\nose\case.py", line 197, in runTest
    self.test(*self.arg)
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\common.py", line 173, in test_new
    orig_test(*args, **kwargs)
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\test_operator.py", line 6820, in test_activation
    name, op[0], shape, op[3], op[4], rtol_fd, atol_fd, num_eps)
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\test_operator.py", line 6086, in finite_diff_unary_op
    check_grad(op_ex, [data_np])
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\unittest\test_operator.py", line 6079, in <lambda>
    atol=atol, dtype=dtype)
  File "C:\jenkins_slave\workspace\ut-python-gpu\windows_package\python\mxnet\test_utils.py", line 921, in check_numeric_gradient
    ("NUMERICAL_%s"%name, "BACKWARD_%s"%name))
  File "C:\jenkins_slave\workspace\ut-python-gpu\windows_package\python\mxnet\test_utils.py", line 495, in assert_almost_equal
    raise AssertionError(msg)
AssertionError:
Items are not equal:
Error 113966.931334 exceeds tolerance rtol=0.000010, atol=0.000001.  Location of maximum error:(0, 4, 8, 3), a=0.113967, b=0.000000
 NUMERICAL_data: array([[[[ 0.        ,  0.68042432,  0.        , ...,  0.34353021,
           0.13880596,  0.94525056],
         [ 0.11517657,  0.18770058,  0.30324909, ...,  0.97787645,...
 BACKWARD_data: array([[[[ 0.        ,  0.68042432,  0.        , ...,  0.34353021,
           0.13880596,  0.94525056],
         [ 0.11517657,  0.18770058,  0.30324909, ...,  0.97787645,...
-------------------- >> begin captured logging << --------------------
common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=1553516940 to reproduce.
--------------------- >> end captured logging << ---------------------
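
For context: check_numeric_gradient compares a central finite-difference estimate of the gradient against the operator's backward pass under the given rtol/atol. Below is a minimal NumPy sketch of that comparison for a softplus-style ("softrelu") activation; the eps, shapes, and tolerances are illustrative assumptions, not the exact values used by test_activation.

import numpy as np

def softrelu(x):
    # softplus: log(1 + exp(x)), with the same large-input shortcut as the C++ kernel
    return np.where(x > 20.0, x, np.log1p(np.exp(np.minimum(x, 20.0))))

def softrelu_backward(x):
    # analytic gradient of softplus is the logistic sigmoid
    return 1.0 / (1.0 + np.exp(-x))

def numeric_grad(f, x, eps=1e-4):
    # central finite differences, one element at a time
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=["multi_index"])
    while not it.finished:
        idx = it.multi_index
        orig = x[idx]
        x[idx] = orig + eps
        f_plus = f(x).sum()
        x[idx] = orig - eps
        f_minus = f(x).sum()
        x[idx] = orig
        grad[idx] = (f_plus - f_minus) / (2 * eps)
        it.iternext()
    return grad

rng = np.random.RandomState(1553516940)   # seed taken from the captured log above
data = rng.uniform(-1.0, 1.0, size=(2, 3, 4, 5))

numerical = numeric_grad(softrelu, data.copy())
backward = softrelu_backward(data)

# In float64 this comparison passes with a large margin; lower-precision dtypes
# combined with tight rtol/atol make the finite-difference estimate noisy, which
# is one way a numerically correct operator can still yield a flaky test.
np.testing.assert_allclose(numerical, backward, rtol=1e-5, atol=1e-6)
print("max abs diff:", np.abs(numerical - backward).max())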
@perdasilva
Contributor

and another: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fcentos-gpu/detail/master/619/pipeline

Creating a PR to disable it until a fix can be provided.
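
For reference, one common way such a disable looks is a skip decorator on the test pointing back at this issue; a minimal sketch (the decorator message and placement are illustrative, not the actual PR):

import unittest

@unittest.skip("Flaky test, tracked in #13915")
def test_activation():
    ...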

@wkcn
Member Author

wkcn commented May 22, 2019

Closing this since the flaky test has been disabled.

@wkcn wkcn closed this as completed May 22, 2019
@asmushetzel
Contributor

So this got closed because the fundamental tests (that should pass) are disabled. That does not really seem appropriate, given that these are basic computations that should always work.

Well, this is what the code for the activation of type "softrelu" looks like (I traced it to ensure that this is exactly the code that gets executed). The gradient computation is just plain wrong!

/*! \brief SoftReLU, also known as softplus activation */
struct softrelu : public mxnet_op::tunable {
  template<typename DType>
  MSHADOW_XINLINE static DType Map(DType a) {
    // Avoid overflow of exp for large inputs.
    // Thresholds 20.0 is chosen such that softrelu(a) = a
    // for a > 20 using floating precision
    if (a > DType(20.0f)) {
      return a;
    } else {
      return DType(math::log1p(math::exp(a)));
    }
  }
};

MXNET_UNARY_MATH_OP(softrelu_grad, -math::expm1(-a));

@asmushetzel
Contributor

Realized I misunderstood the outermost logic. Went back to the original pull request from 2015 and figured out that the argument supplied to the _grad function is the computed value of the forward pass, not the original argument. So the current code is correct.
Still, this kind of math must be tested, and we should not simply switch tests off.
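
For the record, the identity behind that: with y = softrelu(x) = log(1 + exp(x)), we get exp(-y) = 1/(1 + exp(x)), so -expm1(-y) = 1 - exp(-y) = exp(x)/(1 + exp(x)) = sigmoid(x), which is exactly d/dx softplus(x). A quick NumPy check of the identity (illustrative, not part of the test suite):

import numpy as np

x = np.linspace(-10.0, 10.0, 101)
y = np.log1p(np.exp(x))              # forward pass: softrelu(x)
grad_from_output = -np.expm1(-y)     # softrelu_grad applied to the forward output
sigmoid = 1.0 / (1.0 + np.exp(-x))   # analytic d/dx softplus(x)

np.testing.assert_allclose(grad_from_output, sigmoid, rtol=1e-10, atol=1e-12)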
