This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Inconsistent results between test_batchnorm_fallback and test_batchnorm_training #11758

Open
anirudh2290 opened this issue Jul 14, 2018 · 2 comments

Comments

@anirudh2290
Member

anirudh2290 commented Jul 14, 2018

Description

Please see: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-11630/3/pipeline

nosetests -v tests/python/unittest/test_sparse_operator.py:test_batchnorm_fallback
======================================================================
FAIL: test_sparse_operator.test_batchnorm_fallback
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/ubuntu/sparse_support/mxnet/tests/python/unittest/common.py", line 175, in test_new
    orig_test(*args, **kwargs)
  File "/home/ubuntu/sparse_support/mxnet/tests/python/unittest/test_sparse_operator.py", line 2168, in test_batchnorm_fallback
    check_numeric_gradient(test, in_location, xmean_std, numeric_eps=1e-2, rtol=0.2, atol=0.01)
  File "/home/ubuntu/sparse_support/mxnet/python/mxnet/test_utils.py", line 914, in check_numeric_gradient
    ("NUMERICAL_%s"%name, "BACKWARD_%s"%name))
  File "/home/ubuntu/sparse_support/mxnet/python/mxnet/test_utils.py", line 493, in assert_almost_equal
    raise AssertionError(msg)
AssertionError:
Items are not equal:
Error 1.527669 exceeds tolerance rtol=0.200000, atol=0.010000.  Location of maximum error:(1, 0), a=-0.024199, b=-0.006835
 NUMERICAL_data: array([[ -1.8939614 ,   1.3321757 ,   0.5621314 ],
       [ -0.02419949,  17.293682  , -17.25668   ]], dtype=float32)
 BACKWARD_data: array([[ -1.8941365 ,   1.3321089 ,   0.56202877],
       [ -0.00683459,  17.263582  , -17.256752  ]], dtype=float32)
-------------------- >> begin captured logging << --------------------
common: INFO: Setting module np/mx/python random seeds, use MXNET_MODULE_SEED=569235190 to reproduce.
common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=905120541 to reproduce.
--------------------- >> end captured logging << ---------------------

----------------------------------------------------------------------
Ran 1 test in 0.295s
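The reported error of 1.527669 at location (1, 0) can be reproduced from the two arrays in the log. The sketch below assumes the error is the elementwise ratio |a - b| / (atol + rtol * |b|), which is consistent with the numbers printed above; the helper name `violation` is hypothetical and is not claimed to be the exact code in `mxnet/test_utils.py`.

```python
import numpy as np

def violation(a, b, rtol, atol):
    """Elementwise |a - b| / (atol + rtol * |b|); a value above 1 exceeds tolerance.

    Assumed formula, chosen because it matches the figures in the log.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return np.abs(a - b) / (atol + rtol * np.abs(b))

# Values copied from the NUMERICAL_data and BACKWARD_data arrays in the log.
numerical = np.array([[-1.8939614, 1.3321757, 0.5621314],
                      [-0.02419949, 17.293682, -17.25668]])
backward = np.array([[-1.8941365, 1.3321089, 0.56202877],
                     [-0.00683459, 17.263582, -17.256752]])

v = violation(numerical, backward, rtol=0.2, atol=0.01)
worst = np.unravel_index(np.argmax(v), v.shape)
print("location of maximum error:", worst)
print("error ratio:", v[worst])
```

Only the entry at (1, 0) fails: both values are close to zero, so the rtol term contributes little and the absolute difference of about 0.017 is measured almost entirely against atol=0.01.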


@haojin2
Contributor

haojin2 commented Jul 14, 2018

Taking a look.

@haojin2
Contributor

haojin2 commented Jul 14, 2018

Fix in #11759 @anirudh2290

@anirudh2290 changed the title from "Flaky test test_batchnorm_fallback" to "Inconsistent results between test_batchnorm_fallback and test_batchnorm_training" on Jul 17, 2018
@szha added this to To Do in Tests Improvement via automation on Aug 1, 2018