Dropout inconsistency bug #16705
Also, I've confirmed that the CPU side does not have this problem.
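A minimal sketch of the kind of CPU-side check this implies (the helper function, shapes, and seed are assumptions, not code from the thread):

import mxnet as mx
from numpy.testing import assert_allclose

def recorded_dropout_mask(warmup):
    mx.random.seed(123)
    x = mx.nd.ones((3, 3), ctx=mx.cpu())
    if warmup:
        _ = mx.nd.Dropout(x)  # extra inference-mode dropout before recording
    with mx.autograd.record():
        y = mx.nd.Dropout(x)  # recorded dropout whose mask we compare
    return y.asnumpy()

# On CPU, the reported behavior is that the recorded mask does not depend on the warm-up call:
assert_allclose(recorded_dropout_mask(False), recorded_dropout_mask(True))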
What behavior do we expect from a model that has two Dropouts, where no seeds have been set explicitly in advance? Are the dropout patterns identical or different? If the answer is 'different', then I would think that by setting the seeds in advance, the two-Dropout model would then have repeatable behavior, but the Dropouts would continue to be different. Also, feel free @sxjscience to chime in on the discussion of PR #16532.
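A minimal sketch of the two-Dropout scenario being asked about (it assumes a GPU context; the helper function and shapes are illustrative, not from the thread):

import mxnet as mx

def two_dropout_masks():
    mx.random.seed(0)
    x = mx.nd.ones((2, 2), ctx=mx.gpu())
    with mx.autograd.record():
        y1 = mx.nd.Dropout(x)  # first dropout
        y2 = mx.nd.Dropout(x)  # second dropout
    return y1.asnumpy(), y2.asnumpy()

a1, a2 = two_dropout_masks()
b1, b2 = two_dropout_masks()
# Expectation described above: a1 != a2 (the two dropouts draw different masks within
# one run), while a1 == b1 and a2 == b2 (seeding makes each run repeatable).
# The bug discussed in this issue means the GPU backend may not actually satisfy
# the second property.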
@DickJC123 The answer should be different, because the two dropouts share the same internal random number generator and the random state is updated accordingly. The inconsistency bug reported in this issue is not exactly related to the seeding problem. For example, consider the following script:

import mxnet as mx
mx.random.seed(123)
x = mx.nd.ones((10, 10))
y = mx.nd.Dropout(x, cudnn_off=True)
with mx.autograd.record():
    y = mx.nd.Dropout(x, cudnn_off=True)

The first Dropout call runs in inference mode, yet it still advances the random state. This means the following two code snippets will obtain different results:
import mxnet as mx
mx.random.seed(123)
x = mx.nd.ones((3, 3), ctx=mx.gpu())
y = mx.nd.Dropout(x, cudnn_off=True)
with mx.autograd.record():
    y = mx.nd.Dropout(x, cudnn_off=True)
print(y)

import mxnet as mx
mx.random.seed(123)
x = mx.nd.ones((3, 3), ctx=mx.gpu())
with mx.autograd.record():
    y = mx.nd.Dropout(x, cudnn_off=True)
print(y)
@DickJC123 You may see that I've manually set the random seed.
Clearly, dropout in inference mode affects the random state.
With the help of @xidulu, we have located the root cause of the issue. The bug is triggered because we have multiple parallel GPU random resources:

https://github.com/apache/incubator-mxnet/blob/c583e44816a5e383493f35e69daaa92a47e40e39/src/resource.cc#L93-L94

When we create a new Dropout node, we attach a random resource to the node:

https://github.com/apache/incubator-mxnet/blob/c583e44816a5e383493f35e69daaa92a47e40e39/src/operator/nn/dropout.cc#L148-L164

Since there are multiple random resources, we select one in a round-robin fashion. Each resource has its own seed, which results in the inconsistent behavior:

https://github.com/apache/incubator-mxnet/blob/c583e44816a5e383493f35e69daaa92a47e40e39/src/resource.cc#L344-L351

The simplest fix is to use a single GPU random generator. Thus, setting MXNET_GPU_PARALLEL_RAND_COPY to 1 works around the problem, as the following script verifies:
import os
os.environ['MXNET_GPU_PARALLEL_RAND_COPY'] = '1'
import mxnet as mx
import numpy as np
import random
from numpy.testing import assert_allclose

base_y_np = None
for nrepeat in [1, 2, 3, 4]:
    seed = 123
    mx.random.seed(seed)
    np.random.seed(seed)
    random.seed(seed)
    x = mx.nd.ones((3, 3), ctx=mx.gpu())
    for _ in range(nrepeat):
        y = mx.nd.Dropout(x, cudnn_off=True)
    with mx.autograd.record():
        y = mx.nd.Dropout(x, cudnn_off=True)
    y_np = y.asnumpy()
    if base_y_np is None:
        base_y_np = y_np
    else:
        assert_allclose(base_y_np, y_np)
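To make the round-robin explanation above concrete, here is a toy model in plain Python. It is not MXNet's actual resource code; the names, seeds, and number of resource copies are invented for illustration:

import numpy as np

# Each "resource" owns its own seeded generator, mimicking the multiple
# parallel GPU random resources created at startup.
resources = [np.random.RandomState(seed) for seed in (11, 22, 33, 44)]
counter = 0

def attach_resource():
    # Round-robin selection, analogous to attaching a random resource
    # to each newly created Dropout node.
    global counter
    rng = resources[counter % len(resources)]
    counter += 1
    return rng

# If the recorded dropout is the first node created, it draws its mask from
# resources[0]; if one inference-mode dropout was created before it, it draws
# from resources[1] instead, so the mask changes with the number of warm-ups.
mask_if_first_node = attach_resource().uniform(size=3)
mask_if_second_node = attach_resource().uniform(size=3)
print(mask_if_first_node)
print(mask_if_second_node)  # different, because the seeds differ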
@larroy Please comment if you want to take this issue.
Hi
In the following script, we should obtain the same dropout mask, but currently the result depends on nrepeat. Note that I've turned off cuDNN dropout by setting cudnn_off=True.
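A minimal sketch of such a script, modeled on the verification script earlier in the thread but without setting MXNET_GPU_PARALLEL_RAND_COPY (the exact shapes, seed, and nrepeat values are assumptions):

import mxnet as mx

for nrepeat in [1, 2, 3]:
    mx.random.seed(123)
    x = mx.nd.ones((3, 3), ctx=mx.gpu())
    for _ in range(nrepeat):
        y = mx.nd.Dropout(x, cudnn_off=True)  # inference-mode dropout(s)
    with mx.autograd.record():
        y = mx.nd.Dropout(x, cudnn_off=True)  # recorded dropout
    print('nrepeat =', nrepeat)
    print(y)  # the mask should not depend on nrepeat, but currently it does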
Output:
If we set nrepeat to the same value, the result is consistent.