This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Dropout may mask values even when ratio=0.0 #9816

Closed
DickJC123 opened this issue Feb 17, 2018 · 9 comments

Comments

@DickJC123
Contributor

DickJC123 commented Feb 17, 2018

One reasonable expectation is that a dropout layer would pass no values when the dropout ratio=1.0 and pass all values when the dropout ratio=0.0. However, the current dropout test fails a ratio=0.0 test under some seeds because some values are masked. Adapting from the current nn/dropout-inl.h, we have essentially:

prob_keep = 1.0 - ratio;                    // probability of keeping a value
rand_pick = random_uniform_pick(0.0, 1.0);  // uniform draw; endpoint inclusion depends on the rng
mask_out = (rand_pick < prob_keep ? 1.0 : 0.0) * (1.0 / prob_keep);  // keep-and-rescale mask
dropout_out = input_data * mask_out;

It must be that rand_pick above can include the 1.0 endpoint, so mask_out becomes 0.0. A quick fix might change the '<' to '<=', but then some values might be passed when the dropout ratio = 1.0 if rand_pick can be 0.0. The differences in RNG properties between CPU and GPU should also be considered: cuDNN RNGs return values in (0.0, 1.0], while most others return values in [0.0, 1.0). This can be reproduced on any commit of the ci_test_randomness3 PR #9791 by editing the line above 'def test_dropout()' in test_operator.py to be only "@with_seed()", then executing (output shown):

MXNET_TEST_SEED=990952066 nosetests --verbose tests/python/unittest/test_operator.py:test_dropout
[INFO] Setting module np/mx/python random seeds, use MXNET_MODULE_SEED=881809903 to reproduce.
[WARNING] *** test-level seed set: all "@with_seed()" tests run deterministically ***
test_operator.test_dropout ... [INFO] Setting test np/mx/python random seeds, use MXNET_TEST_SEED=990952066 to reproduce.
FAIL
======================================================================
FAIL: test_operator.test_dropout
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/dcarter/mxnet_dev/dgx/mxnet/tests/python/unittest/common.py", line 152, in test_new
    orig_test(*args, **kwargs)
  File "/home/dcarter/mxnet_dev/dgx/mxnet/tests/python/unittest/test_operator.py", line 4596, in test_dropout
    check_dropout_ratio(0.0, shape)
  File "/home/dcarter/mxnet_dev/dgx/mxnet/tests/python/unittest/test_operator.py", line 4561, in check_dropout_ratio
    assert exe.outputs[0].asnumpy().min() == min_value
AssertionError: 
-------------------- >> begin captured logging << --------------------
common: INFO: Setting module np/mx/python random seeds, use MXNET_MODULE_SEED=881809903 to reproduce.
common: WARNING: *** test-level seed set: all "@with_seed()" tests run deterministically ***
common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=990952066 to reproduce.
--------------------- >> end captured logging << ---------------------
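
As a minimal standalone sketch (editor's illustration, not MXNet code; the uniform draw is hard-coded rather than generated), the program below exercises the same keep-and-rescale logic at the upper endpoint and shows a draw of exactly 1.0 masking a value even at ratio=0.0:

#include <cstdio>

// Same logic as the dropout-inl.h excerpt quoted in the issue body.
float dropout_mask(float rand_pick, float ratio) {
    float prob_keep = 1.0f - ratio;
    return (rand_pick < prob_keep ? 1.0f : 0.0f) * (1.0f / prob_keep);
}

int main() {
    // ratio = 0.0 means every value should be kept, but a draw of exactly
    // 1.0 fails the strict '<' test and the value is masked to 0.
    std::printf("ratio=0.0, rand=0.999 -> mask=%f\n", dropout_mask(0.999f, 0.0f));
    std::printf("ratio=0.0, rand=1.0   -> mask=%f\n", dropout_mask(1.0f, 0.0f));
    return 0;
}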
@piiswrong
Contributor

This needs to be fixed.
One possible fix is to use an identity pass-through when ratio=0.

@DickJC123
Contributor Author

I like making ratio=0 a special case with an identity pass-through. During the same fix, how about making ratio=1 a special case as well, outputting all 0's (the current behavior is to pass inf or nan, I think)?
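
A hedged sketch of the special-casing proposed here (illustrative only, not the actual MXNet implementation; dropout_forward and rng are placeholder names) bypasses the RNG entirely at the two degenerate ratios, so the output no longer depends on which endpoint the generator can produce:

#include <cstddef>
#include <vector>

// 'rng' stands in for whatever uniform generator the backend provides;
// its endpoint behaviour no longer matters for ratio = 0.0 or 1.0.
std::vector<float> dropout_forward(const std::vector<float>& in, float ratio,
                                   float (*rng)()) {
    std::vector<float> out(in.size());   // value-initialized to 0.0f
    if (ratio == 0.0f) {
        out = in;                        // identity pass-through
    } else if (ratio == 1.0f) {
        // drop everything: leave the zeros, never compute 1/0
    } else {
        float prob_keep = 1.0f - ratio;
        for (std::size_t i = 0; i < in.size(); ++i) {
            float mask = (rng() < prob_keep ? 1.0f : 0.0f) / prob_keep;
            out[i] = in[i] * mask;
        }
    }
    return out;
}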

@sxjscience
Member

There was an issue about this in mshadow: dmlc/mshadow#213. The CUDA RNG generates values in (0.0, 1.0].

@samskalicky
Contributor

@DickJC123 re-running the test with your scenario appears to be working for me (today). Are you still experiencing a problem or can we close this issue?

@piyushghai
Contributor

piyushghai commented Jul 31, 2018

@samskalicky Able to reproduce the issue with seeds 111913613, 211508467, 1329041279

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/lib/python3.6/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/ubuntu/incubator-mxnet/tests/python/unittest/common.py", line 172, in test_new
    orig_test(*args, **kwargs)
  File "/home/ubuntu/incubator-mxnet/tests/python/unittest/test_operator.py", line 5610, in test_dropout
    check_dropout_ratio(0.0, shape)
  File "/home/ubuntu/incubator-mxnet/tests/python/unittest/test_operator.py", line 5554, in check_dropout_ratio
    assert exe.outputs[0].asnumpy().min() == min_value
AssertionError:

@piyushghai
Contributor

When run on CPU, the flaky error is reproducible, as mentioned in the comments above.
When run on GPU 100k times, the test does not fail.

MXNET_TEST_COUNT=100000  nosetests --logging-level=DEBUG --verbose -s tests/python/gpu/test_operator_gpu.py:test_dropout
/home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
/home/ubuntu/anaconda3/lib/python3.6/site-packages/nose/util.py:453: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() or inspect.getfullargspec()
  inspect.getargspec(func)
[INFO] Setting module np/mx/python random seeds, use MXNET_MODULE_SEED=1058042572 to reproduce.
test_operator_gpu.test_dropout ... [DEBUG] 1 of 100000: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=1077219613 to reproduce.
[DEBUG] 2 of 100000: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=1986463962 to reproduce.
[DEBUG] 3 of 100000: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=1446820082 to reproduce.
[DEBUG] 4 of 100000: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=1300194532 to reproduce.
[DEBUG] 5 of 100000: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=576630976 to reproduce.
[DEBUG] 6 of 100000: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=1613399106 to reproduce.
.
.
.

[DEBUG] 99996 of 100000: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=92629950 to reproduce.
[DEBUG] 99997 of 100000: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=104203650 to reproduce.
[DEBUG] 99998 of 100000: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=1609587233 to reproduce.
[DEBUG] 99999 of 100000: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=194799359 to reproduce.
[DEBUG] 100000 of 100000: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=675994555 to reproduce.
ok

----------------------------------------------------------------------
Ran 1 test in 7130.900s

OK

When run on GPU with the seeds that fail on CPU (111913613, 211508467, 1329041279), the test still does not fail.

MXNET_TEST_SEED=1329041279 nosetests --logging-level=DEBUG --verbose -s tests/python/gpu/test_operator_gpu.py:test_dropout
/home/ubuntu/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
/home/ubuntu/anaconda3/lib/python3.6/site-packages/nose/util.py:453: DeprecationWarning: inspect.getargspec() is deprecated, use inspect.signature() or inspect.getfullargspec()
  inspect.getargspec(func)
[INFO] Setting module np/mx/python random seeds, use MXNET_MODULE_SEED=1997611840 to reproduce.
/home/ubuntu/incubator-mxnet/tests/python/gpu/../unittest/common.py:244: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  logger.warn('*** test-level seed set: all "@with_seed()" tests run deterministically ***')
[WARNING] *** test-level seed set: all "@with_seed()" tests run deterministically ***
test_operator_gpu.test_dropout ... [INFO] Setting test np/mx/python random seeds, use MXNET_TEST_SEED=1329041279 to reproduce.
ok

----------------------------------------------------------------------
Ran 1 test in 2.103s

OK

@samskalicky
Contributor

samskalicky commented Aug 8, 2018

The bug can be worked around by adding a threshold_eq (<=) operator in mshadow_op.h and calling it instead of threshold in dropout-inl.h.

Tested and working with seed 111913613.
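
A standalone sketch of that workaround is shown below (editor's illustration: the functors follow the Map() pattern used in mshadow_op.h, but the mshadow macros and exact signatures are omitted); the only difference between the two is the comparison at the upper endpoint:

#include <cstdio>

struct threshold {       // existing behaviour: strict less-than
  template <typename DType>
  static DType Map(DType a, DType b) { return a < b ? DType(1) : DType(0); }
};

struct threshold_eq {    // workaround: also accept the upper endpoint
  template <typename DType>
  static DType Map(DType a, DType b) { return a <= b ? DType(1) : DType(0); }
};

int main() {
  // With prob_keep = 1.0 (ratio = 0.0) and a draw of exactly 1.0,
  // 'threshold' drops the value while 'threshold_eq' keeps it.
  std::printf("threshold   (1.0, 1.0) = %f\n", threshold::Map(1.0f, 1.0f));
  std::printf("threshold_eq(1.0, 1.0) = %f\n", threshold_eq::Map(1.0f, 1.0f));
  return 0;
}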

anirudh2290 pushed a commit that referenced this issue Aug 20, 2018
* added mshadow op for threshold_eq (threshold currently does <, this will do <=)

modified dropout operator to use threshold_eq instead of threshold; this will ensure equivalent behavior for the random numbers generated on CPU [0, 1) and GPU (0, 1]

removed fixed seed for test_dropout

* removed comment about flaky test
@piyushghai
Contributor

@sandeep-krishnamurthy The related fix for this issue has been merged; this issue should be good to close.

XinYao1994 pushed a commit to XinYao1994/incubator-mxnet that referenced this issue Aug 29, 2018
…pache#12091)

* added mshadow op for threshold_eq (threshold currently does <, this will do <=)

modified dropout operator to use threshold_eq instead of threshold; this will ensure equivalent behavior for the random numbers generated on CPU [0, 1) and GPU (0, 1]

removed fixed seed for test_dropout

* removed comment about flaky test
anirudh2290 pushed a commit to anirudh2290/mxnet that referenced this issue Sep 19, 2018
…pache#12091)

* added mshadow op for threshold_eq (threshold currently does <, this will do <=)

modified dropout operator to use threshold_eq instead of threshold; this will ensure equivalent behavior for the random numbers generated on CPU [0, 1) and GPU (0, 1]

removed fixed seed for test_dropout

* removed comment about flaky test
@ptrendx
Member

ptrendx commented Jan 16, 2019

Unfortunately, the issue with the dropout operator is not fully solved: when using seed 579061237 on the GPU, the RNG produces 0 for one of the outputs in the ratio=1.0 case, which, because of the use of <=, ends up being inf instead of nan. I encountered the issue in CI for my unrelated PR (and I tested without the PR; it is reproducible): http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-13890/2/pipeline

@samskalicky
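
The boundary arithmetic behind this report can be sketched as follows (editor's illustration with hypothetical values, assuming IEEE-754 float semantics): with ratio=1.0, prob_keep is 0.0, so the 1/prob_keep rescale is infinite and neither comparison yields the expected all-zero output once the GPU RNG can return exactly 0.0.

#include <cstdio>

int main() {
    float ratio = 1.0f, rand_pick = 0.0f;
    float prob_keep = 1.0f - ratio;   // 0.0
    float scale = 1.0f / prob_keep;   // inf
    float mask_lt = (rand_pick <  prob_keep ? 1.0f : 0.0f) * scale;  // 0 * inf = nan
    float mask_le = (rand_pick <= prob_keep ? 1.0f : 0.0f) * scale;  // 1 * inf = inf
    std::printf("ratio=1.0, rand=0.0: '<' -> %f, '<=' -> %f\n", mask_lt, mask_le);
    return 0;
}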

ptrendx mentioned this issue Jan 16, 2019