Inconsistent results between test_batchnorm_fallback and test_batchnorm_training #11758

anirudh2290 · 2018-07-14T04:13:43Z

Description

Please see: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-11630/3/pipeline

nosetests -v tests/python/unittest/test_sparse_operator.py:test_batchnorm_fallback

======================================================================
FAIL: test_sparse_operator.test_batchnorm_fallback
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/ubuntu/sparse_support/mxnet/tests/python/unittest/common.py", line 175, in test_new
    orig_test(*args, **kwargs)
  File "/home/ubuntu/sparse_support/mxnet/tests/python/unittest/test_sparse_operator.py", line 2168, in test_batchnorm_fallback
    check_numeric_gradient(test, in_location, xmean_std, numeric_eps=1e-2, rtol=0.2, atol=0.01)
  File "/home/ubuntu/sparse_support/mxnet/python/mxnet/test_utils.py", line 914, in check_numeric_gradient
    ("NUMERICAL_%s"%name, "BACKWARD_%s"%name))
  File "/home/ubuntu/sparse_support/mxnet/python/mxnet/test_utils.py", line 493, in assert_almost_equal
    raise AssertionError(msg)
AssertionError:
Items are not equal:
Error 1.527669 exceeds tolerance rtol=0.200000, atol=0.010000.  Location of maximum error:(1, 0), a=-0.024199, b=-0.006835
 NUMERICAL_data: array([[ -1.8939614 ,   1.3321757 ,   0.5621314 ],
       [ -0.02419949,  17.293682  , -17.25668   ]], dtype=float32)
 BACKWARD_data: array([[ -1.8941365 ,   1.3321089 ,   0.56202877],
       [ -0.00683459,  17.263582  , -17.256752  ]], dtype=float32)
-------------------- >> begin captured logging << --------------------
common: INFO: Setting module np/mx/python random seeds, use MXNET_MODULE_SEED=569235190 to reproduce.
common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=905120541 to reproduce.
--------------------- >> end captured logging << ---------------------

----------------------------------------------------------------------
Ran 1 test in 0.295s
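
For anyone reading the numbers in the failure above: the reported "Error 1.527669" looks consistent with a scaled difference of the form |a - b| / (atol + rtol * |b|), where anything above 1 fails. The sketch below is only a check of that arithmetic against the printed values. The helper name `tolerance_violation` is hypothetical, and the formula is an assumption inferred from the log, not a copy of what `assert_almost_equal` does in `test_utils.py`.

```python
def tolerance_violation(a, b, rtol, atol):
    """Assumed measure: |a - b| / (atol + rtol * |b|); values above 1 fail."""
    return abs(a - b) / (atol + rtol * abs(b))


# Values printed at location (1, 0) in the failure output.
numerical = -0.02419949  # NUMERICAL_data[1][0]
backward = -0.00683459   # BACKWARD_data[1][0]

violation = tolerance_violation(numerical, backward, rtol=0.2, atol=0.01)
print(round(violation, 4))  # close to the reported 1.527669

# The largest-magnitude entry, by contrast, passes comfortably.
large = tolerance_violation(17.293682, 17.263582, rtol=0.2, atol=0.01)
print(large < 1.0)
```

Under that assumption, the failing entry differs by only about 0.017 in absolute terms, but both values are close to zero, so the allowed margin is dominated by `atol=0.01` and the check trips. The much larger entries in the same row differ by more in absolute terms and still pass.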



haojin2 · 2018-07-14T05:00:30Z

Taking a look.

haojin2 · 2018-07-14T07:31:45Z

Fix in #11759 @anirudh2290

anirudh2290 mentioned this issue Jul 15, 2018: Fix eps value for test_batchnorm_fallback to get rid of flakiness #11759 (Merged)

anirudh2290 changed the title ~~Flaky test test_batchnorm_fallback~~ Inconsistent results between test_batchnorm_fallback and test_batchnorm_training Jul 17, 2018

nswamy added the Test and Flaky labels Jul 21, 2018

szha added this to To Do in Tests Improvement via automation Aug 1, 2018
