Conversation
LGTM. Thank you! : )
@wkcn Thanks for the review! On local GPU, running the following passed:
However, this test failed on CI with both. I will try to reproduce the CI failure locally first, or add this to the nightly tests, where fewer nosetests are executed at the same time. I suspect other nosetests running in parallel on the CI workers affect the result.
Force-pushed from 1cc69b4 to 2f48295
I am able to reproduce the CI failure locally now by running the following on a P3.8x instance with the DLAMI. Result:
However, running the test standalone with the same seed in the CI environment passed:
Hi @roywei, could you please add a unit test for multi-GPU? The Dropout results on different GPUs should be different.
Thanks a lot : )
src/operator/nn/dropout-inl.h
Outdated
Tensor<xpu, 1, unsigned>(reinterpret_cast<unsigned *>(workspace_ptr),
                         Shape1(1), s);
prnd->GetRandInt(random_number);
uint64_t seed_ = 17 + reinterpret_cast<uint64_t>(&random_number[0]) % 4096;
Why does it need the modulus operator %? The modulus makes seed_ fall between 0+17 and 4096+17.
I'm just keeping the original logic here:
https://github.com/roywei/incubator-mxnet/blob/master/src/operator/nn/dropout-inl.h#L95
and here:
https://github.com/roywei/incubator-mxnet/blob/master/src/operator/nn/dropout-inl.h#L491
while fixing the seed so that it respects the MXNet seed.
I think it should be uint64_t seed_ = 17 + static_cast<uint64_t>(random_number[0]) % 4096;, because the type of random_number[0] is unsigned.
https://github.com/apache/incubator-mxnet/blob/master/3rdparty/mshadow/mshadow/tensor.h#L591
This will give a segfault during dropout. Also, why would dropout on multi-GPU return different results? I thought the seed is fixed globally, so dropout on different GPUs will use the same seed and thus return the same result?
Sorry that I didn't express it clearly. If different GPUs use the same seed, the dropout results on different GPUs should be the same. When training a model, do GPUs use different random seeds?
I believe the GPU unit tests are running on instances with 1 GPU. I will try to move the entire test to the nightly tests, which use P3 instances with 4 GPUs; I can add a multi-GPU test there. Hopefully the seed can be properly fixed with fewer parallel jobs on the CI workers.
from mxnet.test_utils import assert_almost_equal


def test_dropout_with_seed():
Add the with_seed annotation.
I'm manually choosing a random seed and setting it before each forward, so the with_seed decorator will not take effect. See comment: #16532 (comment)
In line 29 you're generating a random number to feed the seed, though, so that generator needs to be fed with a seed as well.
CI runs on g3.8xlarge with 2 GPUs.
src/operator/nn/dropout-inl.h
Outdated
Tensor<xpu, 1, unsigned>(reinterpret_cast<unsigned *>(workspace_ptr),
                         Shape1(1), s);
prnd->GetRandInt(random_number);
uint64_t seed_ = 17 + static_cast<uint64_t>(random_number[0]) % 4096;
The tensor is on the GPU; we need to explicitly copy it back to the CPU using cudaMemcpy. You might get a garbage value if you just access the GPU memory address directly.
tests/nightly/test_dropout.py
Outdated
with mx.autograd.record():
    result2 = dropout(data1)

mx.random.seed(seed, ctx=mx.gpu(0))
Should it be mx.random.seed(seed, ctx=mx.gpu(1))?
I'm trying to fix the seed on gpu 0 only, so gpu 1 still has a random seed and result3 and result2 will be different. Otherwise, I would need to create a different seed2 and fix it on gpu 1, which would have the same effect (different seeds on gpu 0 and gpu 1 lead result2 and result3 to be different).
updated to use a different seed on gpu1
src/operator/nn/dropout-inl.h
Outdated
// copy generated random int to cpu
unsigned data = 0;
CUDA_CALL(cudaMemcpy(&data, &random_number[0], sizeof(unsigned), cudaMemcpyDeviceToHost));
uint64_t seed_ = 17 + static_cast<uint64_t>(data) % 4096;
@ptrendx @DickJC123 any concern for the fix?
So there are multiple problems with this:
- Why are you trying to seed the dropout every time you do a forward? I believe it will not actually do anything, because after the first call get_cudnn_dropout_desc calls cudnnRestoreDropoutDescriptor, which seems to ignore the seed value. If you actually want to seed the dropout every time (via cudnnSetDropoutDescriptor), that would be super costly, as seeding a random number generator is way more expensive than actually using it.
- You should never use cudaMemcpy in operator code. If you really need to copy values from GPU to CPU, you should use cudaMemcpyAsync followed by cudaStreamSynchronize. The difference is that cudaMemcpy synchronizes on all streams (so it waits on all GPU activity and prevents all subsequent work on all worker threads), whereas cudaStreamSynchronize waits only on the stream that you pass as an argument (so other GPU workers are not affected).
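For illustration only, a minimal sketch of the async copy pattern described above; error checking is omitted, and device_ptr / stream are placeholders for whatever device pointer and operator stream the real code would use:

#include <cuda_runtime.h>

// Sketch only: copy a single unsigned int from device to host without
// stalling other GPU workers. `device_ptr` and `stream` are assumed to be
// a valid device pointer and the operator's own CUDA stream.
unsigned CopyScalarToHost(const unsigned* device_ptr, cudaStream_t stream) {
  unsigned host_value = 0;
  // Asynchronous copy enqueued on this operator's stream only.
  cudaMemcpyAsync(&host_value, device_ptr, sizeof(unsigned),
                  cudaMemcpyDeviceToHost, stream);
  // Wait only for this stream; other streams keep running, unlike a plain
  // cudaMemcpy on the default stream, which serializes with other blocking streams.
  cudaStreamSynchronize(stream);
  return host_value;
}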
What about having the following logic inside get_cudnn_dropout_desc:

if (!state_space->handle.size) {
  request = ResourceManager::Get()->Request(cpu_ctx, ResourceRequest::kRandom);
  seed = request.GetRandInt();
  CUDNN_CALL(cudnnSetDropoutDescriptor(..., seed));
} else {
  // use a dummy seed (e.g. 0) for cudnnRestoreDropoutDescriptor
}

and remove the seed argument from get_cudnn_dropout_desc?
On a separate note, I see 34 occurrences of cudaMemcpy in various places (mainly operators) in the codebase; we probably need to do some cleanup:
src/kvstore/kvstore_utils.cu: CUDA_CALL(cudaMemcpy(sort_output_ptr, dptr, sort_output_bytes,
src/kvstore/kvstore_utils.cu: CUDA_CALL(cudaMemcpy(&num_selected_out, num_selected_ptr, num_selected_bytes,
src/ndarray/ndarray_function.cu: CUDA_CALL(cudaMemcpy(&nnr_out, &row_flg[num_rows-1], sizeof(dim_t),
src/operator/contrib/adamw.cu: CUDA_CALL(cudaMemcpy(&scale, scale_blob.dptr<DType>(), sizeof(DType),
src/operator/contrib/boolean_mask.cu: CUDA_CALL(cudaMemcpy(&valid_num, &prefix_sum[idx_size - 1], sizeof(int32_t),
src/operator/contrib/index_array.cu: CUDA_CALL(cudaMemcpy(workspace.dptr_, cpu_workspace.data(), sizeof(int64_t) * (2 * naxes),
src/operator/contrib/index_array.cu: CUDA_CALL(cudaMemcpy(workspace.dptr_, inshape.data(), sizeof(dim_t) * ndim,
src/operator/contrib/multi_proposal.cu: FRCNN_CUDA_CHECK(cudaMemcpy(&mask_host[0],
src/operator/contrib/multi_proposal.cu: FRCNN_CUDA_CHECK(cudaMemcpy(workspace_proposals.dptr_, &anchors[0],
src/operator/contrib/multi_proposal.cu: FRCNN_CUDA_CHECK(cudaMemcpy(keep, &_keep[0], sizeof(int) * _keep.size(),
src/operator/contrib/proposal.cu: FRCNN_CUDA_CHECK(cudaMemcpy(&mask_host[0],
src/operator/contrib/proposal.cu: FRCNN_CUDA_CHECK(cudaMemcpy(workspace_proposals.dptr_,
src/operator/contrib/proposal.cu: FRCNN_CUDA_CHECK(cudaMemcpy(&cpu_im_info[0], im_info.dptr_,
src/operator/contrib/proposal.cu: FRCNN_CUDA_CHECK(cudaMemcpy(keep, &_keep[0], sizeof(int) * _keep.size(),
src/operator/numpy/np_boolean_mask_assign.cu: CUDA_CALL(cudaMemcpy(&valid_num, &prefix_sum[mask_size], sizeof(size_t),
src/operator/numpy/np_nonzero_op.cu: CUDA_CALL(cudaMemcpy(&valid_num, &prefix_sum[in_size - 1], sizeof(int32_t),
src/operator/numpy/np_nonzero_op.cu: CUDA_CALL(cudaMemcpy(out.data().dptr<int64_t>(), &temp, sizeof(int64_t),
src/operator/numpy/np_unique_op.cu: CUDA_CALL(cudaMemcpy(&valid_num, thrust::raw_pointer_cast(&prefix_sum[input_size - 1]),
src/operator/numpy/np_unique_op.cu: CUDA_CALL(cudaMemcpy(&valid_num, thrust::raw_pointer_cast(&prefix_sum[temp_shape[0] - 1]),
src/operator/numpy/np_unique_op.cu: CUDA_CALL(cudaMemcpy(outputs[0].data().dptr<DType>(), inputs[0].data().dptr<DType>(),
src/operator/numpy/random/dist_common.cu:CUDA_CALL(cudaMemcpy(dst, src, sizeof(float), cudaMemcpyDeviceToHost));
src/operator/numpy/random/dist_common.cu:CUDA_CALL(cudaMemcpy(dst, src, sizeof(double), cudaMemcpyDeviceToHost));
src/operator/numpy/random/np_multinomial_op.cu: CUDA_CALL(cudaMemcpy(&pvals_[0], input, sizeof(DType) * prob_length,
src/operator/rnn-inl.h: CUDA_CALL(cudaMemcpy(sequence_length_cpu_itype, sequence_length_ptr_gpu,
src/operator/tensor/cast_storage-inl.cuh: CUDA_CALL(cudaMemcpy(&nnr, &row_flg[num_rows - 1], sizeof(dim_t), cudaMemcpyDeviceToHost));
src/operator/tensor/cast_storage-inl.cuh: CUDA_CALL(cudaMemcpy(&nnz, &(indptr[num_rows]), sizeof(IType), cudaMemcpyDeviceToHost));
src/operator/tensor/dot-inl.cuh: CUDA_CALL(cudaMemcpy(&nnr, nnr_ptr, nnr_bytes, cudaMemcpyDeviceToHost));
src/operator/tensor/dot-inl.cuh: CUDA_CALL(cudaMemcpy(&nnr_out, &row_flg_out[num_cols_l-1], sizeof(dim_t),
src/operator/tensor/elemwise_binary_op_basic.cu: CUDA_CALL(cudaMemcpy(&nnr_out, &common_row_table[num_rows-1], sizeof(nnvm::dim_t),
src/operator/tensor/indexing_op.cu: CUDA_CALL(cudaMemcpy(&is_valid, is_valid_ptr, sizeof(char),
src/operator/tensor/indexing_op.cu: CUDA_CALL(cudaMemcpy(&nnr, grad_row_idx + data_size, sizeof(RType),
src/operator/tensor/indexing_op.cu: CUDA_CALL(cudaMemcpy(&nnr, &prefix_sum[num_rows-1], sizeof(dim_t),
src/operator/tensor/matrix_op.cu: CUDA_CALL(cudaMemcpy(&nnr, &out_indptr[indptr_len-1], sizeof(RType),
src/operator/tensor/square_sum.cu: CUDA_CALL(cudaMemcpy(&is_diff, is_diff_ptr, sizeof(int32_t), cudaMemcpyDeviceToHost));
@reminisce @haojin2 can we fix the cudaMemcpy calls in the numpy ops? They impact GPU performance.
@eric-haibin-lin let's create an issue for tracking
I already created one #16583
Force-pushed from 8e030d2 to ec298ec
Passed the nightly test locally. Result:
Just noticed our ... I have verified locally that this test passes with my PR. The test is still flaky though (failed within 10000 runs; use MXNET_TEST_SEED=107821594 to reproduce).
Force-pushed from d9d3b84 to 4dec974
With the updated implementation, the dropout seed comes from the CPU, so setting the seed on a particular GPU won't work (it did not work before, either).
@ptrendx any concerns about the new fix?
CUDNN_CALL(cudnnSetDropoutDescriptor(dropout_desc_, s->dnn_handle_,
                                     param_.p,  // discard probability
                                     dropout_states, dropout_bytes,
                                     seed_));
                                     0));
At a minimum, the comment should be updated. The way these calls work is:

cudnnSetDropoutDescriptor(..., dropout_states==NULL, ...)  // Set dropout probability and seed, leave states alone.
cudnnSetDropoutDescriptor(..., dropout_states!=NULL, ...)  // Set dropout probability and seed, init states based on these values.
cudnnRestoreDropoutDescriptor()                            // Set dropout probability, seed, and states ptr from provided args.
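For reference, a hedged sketch of those three call patterns against the cuDNN API; the handle, descriptor, and state-buffer names are placeholders, the seed values are illustrative, and error checking (MXNet's CUDNN_CALL) is omitted:

#include <cudnn.h>

// Sketch of the three usage patterns for a cuDNN dropout descriptor.
// `handle`, `desc`, `states`, and `state_bytes` are assumed to already exist.
void DropoutDescriptorPatterns(cudnnHandle_t handle,
                               cudnnDropoutDescriptor_t desc,
                               void* states, size_t state_bytes) {
  const float p = 0.5f;  // discard probability

  // 1) states == NULL: only probability and seed are recorded on the
  //    descriptor; the existing RNG states are left alone.
  cudnnSetDropoutDescriptor(desc, handle, p,
                            /*states=*/nullptr, 0, /*seed=*/1234ULL);

  // 2) states != NULL: probability and seed are set and the state buffer
  //    is (re)initialized from them -- the costly path.
  cudnnSetDropoutDescriptor(desc, handle, p,
                            states, state_bytes, /*seed=*/1234ULL);

  // 3) Restore: point the descriptor at an already-initialized state buffer;
  //    the states are not re-initialized, so the effective RNG sequence is
  //    whatever the states already encode.
  cudnnRestoreDropoutDescriptor(desc, handle, p,
                                states, state_bytes, /*seed=*/0ULL);
}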
OK, and I found another issue: the seed is fixed during resource initialization. So once a dropout layer is created, even if we don't fix the seed, each dropout result will be the same.
If we want each forward to produce a different result while the random result still respects the MXNet random seed, the only solution is to get the MXNet seed during each forward. Is that right?
New test case:

@with_seed()
def test_dropout_with_seed():
    info = np.iinfo(np.int32)
    seed = np.random.randint(info.min, info.max)
    _test_dropout(mx.cpu(), seed)
    _test_dropout(mx.gpu(), seed)
    _test_dropout(mx.cpu())
    _test_dropout(mx.gpu())

def _test_dropout(ctx, seed=None):
    data = mx.nd.ones((100, 100), ctx=ctx)
    dropout = mx.gluon.nn.Dropout(0.5)

    if seed is not None:
        mx.random.seed(seed)
    with mx.autograd.record():
        result1 = dropout(data)

    if seed is not None:
        mx.random.seed(seed)
    with mx.autograd.record():
        result2 = dropout(data)

    if seed is not None:
        # dropout with a fixed seed should return the same result
        assert_almost_equal(result1.asnumpy(), result2.asnumpy())
    else:
        # seed not fixed, results should be different
        with assert_raises(AssertionError):
            assert_almost_equal(result1.asnumpy(), result2.asnumpy())
Dick is looking into it, but I don't think the sentence "even if we don't fix seed, each dropout result will be the same" is true. What is true, though, is that the dropout as implemented now does not respond to a change of seed on the MXNet side after the initial creation of the op. Seeding each forward would destroy performance, so how about a solution like this: if you know that you will cache the value of the seed (as in the dropout descriptor resource case), every time you get the descriptor it should internally ask the random resource "did the user set a new seed?" (which could be implemented as a set in the random resource that keeps track of who has already asked, reset whenever the user calls mxnet.random.seed). If the answer is "no", no reseeding is required; if the answer is "yes", the dropout descriptor should be reseeded with a new value.
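A minimal sketch of the bookkeeping suggested above, assuming a hypothetical seed "epoch" counter on the random resource; the type and member names are illustrative, not the actual MXNet API:

#include <atomic>
#include <cstdint>

// Hypothetical sketch: the random resource bumps a counter every time the
// user calls mxnet.random.seed(); consumers that cache a seed (e.g. the
// cuDNN dropout descriptor) remember which epoch they last saw and reseed
// only when the epoch has changed.
struct RandomResourceSketch {
  std::atomic<uint64_t> seed_epoch{0};
  void OnUserSeed(uint64_t /*new_seed*/) { seed_epoch.fetch_add(1); }
};

struct CachedDropoutDescSketch {
  uint64_t seen_epoch = ~0ULL;  // force a reseed on first use
  bool NeedsReseed(const RandomResourceSketch& rnd) {
    uint64_t current = rnd.seed_epoch.load();
    if (current != seen_epoch) {
      seen_epoch = current;
      return true;   // caller should reseed via cudnnSetDropoutDescriptor
    }
    return false;    // fast path: keep the cached descriptor and states
  }
};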
Echoing @ptrendx's comment, we should implement SetSeed() for the cudnn resource in https://github.com/apache/incubator-mxnet/blob/master/src/resource.cc#L174-L201
@roywei I've been playing with a variant of your proposed test in which I set the seed to two different values for the two models and expect the results to be different. This fails, because the results are identical even with the differing seeds. The two models each get their own GPU random resource, but the two are seeded by CPU random number generators that are identical. The problem here is that the CPU RNGs are not responding to mx.random.seed(), and instead have their seed set to 0. The reason is that CPU RNGs are requested from the ResourceManager, and the ResourceManager is a thread-local variable. The main Python thread (performing the mx.random.seed()) only affects the global_seed_ data member of its own ResourceManager instance, which does not affect the seeds of the CPU RNGs requested from the worker thread's ResourceManager.
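As a generic illustration of the thread-local issue described above (plain C++, not MXNet code): seeding the main thread's thread_local generator never touches the worker thread's instance, which keeps its default seed.

#include <iostream>
#include <random>
#include <thread>

// Generic analogy for the ResourceManager issue: each thread gets its own
// thread_local generator, so seeding it from the main thread (the analogue
// of mx.random.seed()) does not touch the worker thread's copy.
thread_local std::mt19937 tls_rng{0};  // every thread starts from seed 0

int main() {
  tls_rng.seed(42);  // "seeds" only the main thread's instance
  std::cout << "main thread draw:   " << tls_rng() << "\n";

  std::thread worker([] {
    // This is a different thread_local object, still seeded with 0.
    std::cout << "worker thread draw: " << tls_rng() << "\n";
  });
  worker.join();
  return 0;
}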
Yeah, this is going to be tricky to fix solidly, particularly considering models with multiple dropouts and RNNs. If all the GPU RNGs of a worker indeed share the same RNG state, then only the seed from the first RNG in a model affects the initialization (the other seeds are ignored). Also, I believe it is non-deterministic which GPU worker handles an operator during execution, so this represents a problem (it could be the cause of some low-probability test failures). I believe moving the GPU RNG state to a resource was motivated by the high initialization overhead, which is particularly painful for imperative models. Would it be possible to add a seed argument at the Python level to the dropout and RNN operators, with the understanding that by setting the seed, the operator gets its own RNG state (at some memory and initialization expense)? Not setting a seed would grab the global resource; I'm not sure how the determinism would work, but it would be fast.
Closing in favor of #17547.
Description
fix #15662