
fix dropout gpu seed #16532

Closed · wants to merge 16 commits

Conversation

roywei (Member) commented Oct 18, 2019

Description

fix #15662

@wkcn (Member) left a comment

LGTM. Thank you! : )

src/operator/nn/dropout-inl.h (outdated review comments, resolved)
roywei (Member Author) commented Oct 21, 2019

@wkcn Thanks for the review!
Still investigating why the local unit test passes but CI constantly fails; it seems the seed is not fixed in CI.

On local GPU, running the following passed:

  1. nosetests on the single test passed 10,000 times
  2. nosetests on all of test_operator_gpu passed
  3. running cudnn Dropout reproducibility #15662 (comment) directly from Python passed

However, this test failed on CI with both mx.random.seed(fixed_seed) and the @with_seed(fixed_seed) decorator.

I will try to reproduce the CI failure locally first, or try to add this to the nightly tests where fewer nosetests are executed at the same time. I suspect other nosetests running in parallel on the CI workers affect the result.

roywei (Member Author) commented Oct 21, 2019

cc @eric-haibin-lin @sxjscience

roywei (Member Author) commented Oct 21, 2019

I am able to reproduce the CI failure locally now by running the following on a p3.8xlarge with the DLAMI:
ci/build.py --docker-registry mxnetci --nvidiadocker --platform ubuntu_gpu_cu101 --docker-build-retries 3 --shm-size 500m /work/runtime_functions.sh unittest_ubuntu_python2_gpu

result:

======================================================================
FAIL: test_operator_gpu.test_dropout_with_seed
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/usr/local/lib/python2.7/dist-packages/nose/util.py", line 620, in newfunc
    return func(*arg, **kw)
  File "/work/mxnet/tests/python/gpu/../unittest/common.py", line 177, in test_new
    orig_test(*args, **kwargs)
  File "/work/mxnet/tests/python/gpu/../unittest/test_operator.py", line 6946, in test_dropout_with_seed
    assert_almost_equal(b.asnumpy(), c.asnumpy())
  File "/work/mxnet/python/mxnet/test_utils.py", line 624, in assert_almost_equal
    raise AssertionError(msg)
AssertionError:
Items are not equal:
Error 100000002004087734272.000000 exceeds tolerance rtol=1.000000e-05, atol=1.000000e-20 (mismatch at least 0.110000%).
Location of maximum error: (0, 1), a=2.00000000, b=0.00000000
 ACTUAL: array([[0., 2., 2., ..., 2., 0., 0.],
       [0., 2., 2., ..., 0., 0., 2.],
       [2., 2., 2., ..., 0., 0., 2.],...
 DESIRED: array([[2., 0., 2., ..., 2., 0., 2.],
       [2., 2., 2., ..., 2., 2., 2.],
       [2., 0., 0., ..., 2., 2., 2.],...
-------------------- >> begin captured stdout << ---------------------

*** Maximum errors for vector of size 10000:  rtol=1e-05, atol=1e-20
--------------------- >> end captured stdout << ----------------------
-------------------- >> begin captured logging << --------------------
common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=179619306 to reproduce.
--------------------- >> end captured logging << ---------------------

However, running the test standalone with the same seed, inside the CI environment, passed:

MXNET_TEST_SEED=179619306 nosetests --logging-level=DEBUG --verbose -s  tests/python/gpu/test_operator_gpu.py:test_dropout_with_seed
[INFO] Setting module np/mx/python random seeds, use MXNET_MODULE_SEED=980748466 to reproduce.
[WARNING] *** test-level seed set: all "@with_seed()" tests run deterministically ***
test_operator_gpu.test_dropout_with_seed ... [INFO] Setting test np/mx/python random seeds, use MXNET_TEST_SEED=179619306 to reproduce.
[07:36:44] ../src/base.cc:84: Upgrade advisory: this mxnet has been built against cuDNN lib version 7401, which is older than the oldest version tested by CI (7600).  Set MXNET_CUDNN_LIB_CHECKING=0 to quiet this warning.
ok

----------------------------------------------------------------------
Ran 1 test in 13.896s

OK

@wkcn (Member) left a comment

Hi @roywei, could you please add a unit test for multi-GPU? The results of Dropout on different GPUs should be different.
Thanks a lot : )

Tensor<xpu, 1, unsigned>(reinterpret_cast<unsigned *>(workspace_ptr),
Shape1(1), s);
prnd->GetRandInt(random_number);
uint64_t seed_ = 17 + reinterpret_cast<uint64_t>(&random_number[0]) % 4096;
Member:

Why does it need the modulus operator %? The modulus constrains seed_ to the range 17 to 17 + 4095.


Member:

I think it should be uint64_t seed_ = 17 + static_cast<uint64_t>(random_number[0]) % 4096;, because the type of random_number[0] is unsigned.

https://github.com/apache/incubator-mxnet/blob/master/3rdparty/mshadow/mshadow/tensor.h#L591

roywei (Member Author):

This will give a segfault during dropout. Also, why would dropout on multiple GPUs return different results? I thought the seed is fixed globally, so dropout on different GPUs would use the same seed and thus return the same result?

Member:

Sorry that I didn't express it clearly. If different GPUs use the same seed, the dropout results on different GPUs should be the same. When training a model, do the GPUs use different random seeds?

roywei (Member Author) commented Oct 21, 2019

Hi @roywei , could you please add a unit-test for multi-GPU? The result of Dropout on multi-GPU should be different.
Thanks a lot : )

I believe the GPU unit tests are running on instances with 1 GPU. I will try to move the entire test to the nightly tests, which use P3 instances with 4 GPUs; I can add a multi-GPU test there. Hopefully the seed can be properly fixed with fewer parallel jobs on the CI workers.

from mxnet.test_utils import assert_almost_equal


def test_dropout_with_seed():
Contributor:

Add the @with_seed annotation.

roywei (Member Author):

I'm manually choosing a random seed and setting it before each forward, so the with_seed decorator will not take effect. See comment: #16532 (comment)

Contributor:

On line 29 you're generating a random number to feed the seed, though, so that generator needs to be seeded as well.

marcoabreu (Contributor):
CI runs on g3.8xlarge instances with 2 GPUs.

Tensor<xpu, 1, unsigned>(reinterpret_cast<unsigned *>(workspace_ptr),
Shape1(1), s);
prnd->GetRandInt(random_number);
uint64_t seed_ = 17 + static_cast<uint64_t>(random_number[0]) % 4096;
Member:

The tensor is on the GPU; we need to explicitly copy it back to the CPU using cudaMemcpy. You might get a garbage value if you access the GPU memory address directly.

with mx.autograd.record():
result2 = dropout(data1)

mx.random.seed(seed, ctx=mx.gpu(0))
@wkcn (Member) commented Oct 22, 2019

should it be mx.random.seed(seed, ctx=mx.gpu(1)) ?

roywei (Member Author):

I'm trying to fix the seed on gpu(0) only, so gpu(1) still has a random seed and result3 and result2 will be different. Otherwise, I would need to create a different seed2 and fix it on gpu(1), which would have the same effect (different seeds on gpu(0) and gpu(1) lead result2 and result3 to differ).

@roywei (Member Author) commented Oct 22, 2019

updated to use a different seed on gpu1

eric-haibin-lin dismissed their stale review on October 22, 2019, 17:14:

comments addressed.

// copy generated random int to cpu
unsigned data = 0;
CUDA_CALL(cudaMemcpy(&data, &random_number[0], sizeof(unsigned), cudaMemcpyDeviceToHost));
uint64_t seed_ = 17 + static_cast<uint64_t>(data) % 4096;
Member:

@ptrendx @DickJC123 any concern for the fix?

Member:

So there are multiple problems with this:

  • Why are you trying to seed the dropout on every forward? I believe it will not actually do anything, because after the first call get_cudnn_dropout_desc calls cudnnRestoreDropoutDescriptor, which seems to ignore the seed value. And if you actually want to seed the dropout every time (via cudnnSetDropoutDescriptor), that would be super costly, since seeding the random number generator is far more expensive than actually using it.
  • You should never use cudaMemcpy in operator code. If you really need to copy values from the GPU to the CPU, use cudaMemcpyAsync followed by cudaStreamSynchronize. The difference is that cudaMemcpy synchronizes on all streams (so it waits on all GPU activity and blocks subsequent work on all worker threads), whereas cudaStreamSynchronize waits only on the stream you pass as an argument, so other GPU workers are not affected; a minimal sketch follows below.
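
For illustration only, a hedged sketch of that async-copy pattern (not the PR's actual code), assuming s is the operator's mshadow::Stream<gpu>* and random_number is the one-element device tensor from the snippet above:

// Copy a single device value to the host without stalling other GPU workers.
unsigned host_value = 0;
cudaStream_t cu_stream = mshadow::Stream<gpu>::GetStream(s);
// Enqueue the copy on this operator's stream only ...
CUDA_CALL(cudaMemcpyAsync(&host_value, random_number.dptr_, sizeof(unsigned),
                          cudaMemcpyDeviceToHost, cu_stream));
// ... and wait on just that stream, instead of cudaMemcpy's device-wide sync.
CUDA_CALL(cudaStreamSynchronize(cu_stream));
// Derive the seed from the copied value, as in the PR's snippet.
uint64_t seed_ = 17 + static_cast<uint64_t>(host_value) % 4096;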

Member:

What about having the following logic inside get_cudnn_dropout_desc:

  if (!state_space->handle.size) {
    request = ResourceManager::Get()->Request(cpu_ctx, ResourceRequest::kRandom)
    seed = request.GetRandInt()
    CUDNN_CALL(cudnnSetDropoutDescriptor(..., seed))
  } else {
    // use a dummy seed (e.g. 0) for cudnnRestoreDropoutDescriptor
  }

and we remove the seed argument for get_cudnn_dropout_desc?

Member:

On a separate note, I see 34 occurrences of cudaMemcpy in various places (mainly operators) in the codebase; we probably need to do some cleanup:

src/kvstore/kvstore_utils.cu:  CUDA_CALL(cudaMemcpy(sort_output_ptr, dptr, sort_output_bytes,
src/kvstore/kvstore_utils.cu:  CUDA_CALL(cudaMemcpy(&num_selected_out, num_selected_ptr, num_selected_bytes,
src/ndarray/ndarray_function.cu:      CUDA_CALL(cudaMemcpy(&nnr_out, &row_flg[num_rows-1], sizeof(dim_t),
src/operator/contrib/adamw.cu:    CUDA_CALL(cudaMemcpy(&scale, scale_blob.dptr<DType>(), sizeof(DType),
src/operator/contrib/boolean_mask.cu:  CUDA_CALL(cudaMemcpy(&valid_num, &prefix_sum[idx_size - 1], sizeof(int32_t),
src/operator/contrib/index_array.cu:    CUDA_CALL(cudaMemcpy(workspace.dptr_, cpu_workspace.data(), sizeof(int64_t) * (2 * naxes),
src/operator/contrib/index_array.cu:    CUDA_CALL(cudaMemcpy(workspace.dptr_, inshape.data(), sizeof(dim_t) * ndim,
src/operator/contrib/multi_proposal.cu:  FRCNN_CUDA_CHECK(cudaMemcpy(&mask_host[0],
src/operator/contrib/multi_proposal.cu:    FRCNN_CUDA_CHECK(cudaMemcpy(workspace_proposals.dptr_, &anchors[0],
src/operator/contrib/multi_proposal.cu:        FRCNN_CUDA_CHECK(cudaMemcpy(keep, &_keep[0], sizeof(int) * _keep.size(),
src/operator/contrib/proposal.cu:  FRCNN_CUDA_CHECK(cudaMemcpy(&mask_host[0],
src/operator/contrib/proposal.cu:    FRCNN_CUDA_CHECK(cudaMemcpy(workspace_proposals.dptr_,
src/operator/contrib/proposal.cu:    FRCNN_CUDA_CHECK(cudaMemcpy(&cpu_im_info[0], im_info.dptr_,
src/operator/contrib/proposal.cu:    FRCNN_CUDA_CHECK(cudaMemcpy(keep, &_keep[0], sizeof(int) * _keep.size(),
src/operator/numpy/np_boolean_mask_assign.cu:    CUDA_CALL(cudaMemcpy(&valid_num, &prefix_sum[mask_size], sizeof(size_t),
src/operator/numpy/np_nonzero_op.cu:  CUDA_CALL(cudaMemcpy(&valid_num, &prefix_sum[in_size - 1], sizeof(int32_t),
src/operator/numpy/np_nonzero_op.cu:      CUDA_CALL(cudaMemcpy(out.data().dptr<int64_t>(), &temp, sizeof(int64_t),
src/operator/numpy/np_unique_op.cu:    CUDA_CALL(cudaMemcpy(&valid_num, thrust::raw_pointer_cast(&prefix_sum[input_size - 1]),
src/operator/numpy/np_unique_op.cu:    CUDA_CALL(cudaMemcpy(&valid_num, thrust::raw_pointer_cast(&prefix_sum[temp_shape[0] - 1]),
src/operator/numpy/np_unique_op.cu:      CUDA_CALL(cudaMemcpy(outputs[0].data().dptr<DType>(), inputs[0].data().dptr<DType>(),
src/operator/numpy/random/dist_common.cu:CUDA_CALL(cudaMemcpy(dst, src, sizeof(float), cudaMemcpyDeviceToHost));
src/operator/numpy/random/dist_common.cu:CUDA_CALL(cudaMemcpy(dst, src, sizeof(double), cudaMemcpyDeviceToHost));
src/operator/numpy/random/np_multinomial_op.cu:  CUDA_CALL(cudaMemcpy(&pvals_[0], input, sizeof(DType) * prob_length,
src/operator/rnn-inl.h:      CUDA_CALL(cudaMemcpy(sequence_length_cpu_itype,  sequence_length_ptr_gpu,
src/operator/tensor/cast_storage-inl.cuh:  CUDA_CALL(cudaMemcpy(&nnr, &row_flg[num_rows - 1], sizeof(dim_t), cudaMemcpyDeviceToHost));
src/operator/tensor/cast_storage-inl.cuh:        CUDA_CALL(cudaMemcpy(&nnz, &(indptr[num_rows]), sizeof(IType), cudaMemcpyDeviceToHost));
src/operator/tensor/dot-inl.cuh:          CUDA_CALL(cudaMemcpy(&nnr, nnr_ptr, nnr_bytes, cudaMemcpyDeviceToHost));
src/operator/tensor/dot-inl.cuh:            CUDA_CALL(cudaMemcpy(&nnr_out, &row_flg_out[num_cols_l-1], sizeof(dim_t),
src/operator/tensor/elemwise_binary_op_basic.cu:        CUDA_CALL(cudaMemcpy(&nnr_out, &common_row_table[num_rows-1], sizeof(nnvm::dim_t),
src/operator/tensor/indexing_op.cu:  CUDA_CALL(cudaMemcpy(&is_valid, is_valid_ptr, sizeof(char),
src/operator/tensor/indexing_op.cu:  CUDA_CALL(cudaMemcpy(&nnr, grad_row_idx + data_size, sizeof(RType),
src/operator/tensor/indexing_op.cu:        CUDA_CALL(cudaMemcpy(&nnr, &prefix_sum[num_rows-1], sizeof(dim_t),
src/operator/tensor/matrix_op.cu:        CUDA_CALL(cudaMemcpy(&nnr, &out_indptr[indptr_len-1], sizeof(RType),
src/operator/tensor/square_sum.cu:    CUDA_CALL(cudaMemcpy(&is_diff, is_diff_ptr, sizeof(int32_t), cudaMemcpyDeviceToHost));

Member:

@reminisce @haojin2 can we fix the cudaMemcpy calls in the numpy ops? They impact GPU performance.

Member:

@eric-haibin-lin let's create an issue for tracking

Member:

I already created one #16583

roywei (Member Author) commented Oct 22, 2019

Passed nightly test locally:
command:

ci/build.py --docker-registry mxnetci --nvidiadocker --platform ubuntu_nightly_gpu --docker-build-retries 3 --shm-size 500m /work/runtime_functions.sh nightly_python

result:

----------------------------------------------------------------------
Ran 2 tests in 351.086s

OK
build.py: 2019-10-22 17:05:18,213Z INFO Waiting for status of container 5d41b56c614c for 600 s.
build.py: 2019-10-22 17:05:18,443Z INFO Container exit status: {'StatusCode': 0, 'Error': None}
build.py: 2019-10-22 17:05:18,443Z INFO Container exited with success 👍
build.py: 2019-10-22 17:05:18,443Z INFO Stopping container: 5d41b56c614c
build.py: 2019-10-22 17:05:18,445Z INFO Removing container: 5d41b56c614c

roywei (Member Author) commented Oct 22, 2019

Just noticed our test_dropout unit test is disabled due to flakiness (#14288).

I have verified locally that this test passes with my PR; the test is still flaky though (it failed within 10,000 runs; use MXNET_TEST_SEED=107821594 to reproduce).

python3 -m "nose" tests/python/gpu/test_operator_gpu.py:test_dropout
[INFO] Setting module np/mx/python random seeds, use MXNET_MODULE_SEED=1842556802 to reproduce.
[22:22:55] ../src/base.cc:84: Upgrade advisory: this mxnet has been built against cuDNN lib version 7401, which is older than the oldest version tested by CI (7600).  Set MXNET_CUDNN_LIB_CHECKING=0 to quiet this warning.
.
----------------------------------------------------------------------
Ran 1 test in 16.091s

OK

roywei (Member Author) commented Oct 23, 2019

With the updated implementation, the dropout seed comes from the CPU, so setting the seed on a specific GPU won't work (it did not work before this change either).
Tested the nightly and unit tests again and they passed. Ran a performance test and got similar speed as in #13896:

In [1]: import mxnet as mx
   ...: a = mx.nd.ones((10, 200, 300, 500), ctx=mx.gpu(0))
   ...: a.attach_grad()
   ...: mx.autograd.set_recording(True)
   ...: %timeit mx.nd.Dropout(a, 0.5, mode='always').wait_to_read()

[10:00:10] ../src/base.cc:84: Upgrade advisory: this mxnet has been built against cuDNN lib version 7401, which is older than the oldest version tested by CI (7600).  Set MXNET_CUDNN_LIB_CHECKING=0 to quiet this warning.
4.51 ms ± 6.66 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

roywei mentioned this pull request Oct 24, 2019
eric-haibin-lin (Member):
@ptrendx any concerns on the new fix?

CUDNN_CALL(cudnnSetDropoutDescriptor(dropout_desc_, s->dnn_handle_,
param_.p, // discard probability
dropout_states, dropout_bytes,
seed_));
0));
Contributor:

At a minimum, the comment should be updated. The way these calls work is:
cudnnSetDropoutDescriptor(..., dropout_states==NULL,...) // Set dropout probability and seed, leave states alone.
cudnnSetDropoutDescriptor(..., dropout_states!=NULL,...) // Set dropout probability and seed, init states based on these values.

cudnnRestoreDropoutDescriptor() // Set dropout probability, seed and states ptr from provided args.
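
For illustration only (plain cuDNN API rather than MXNet's wrapper; handle, seed, and new_seed are assumed to already exist; error checking omitted), the three call patterns above look roughly like this:

cudnnDropoutDescriptor_t desc;
cudnnCreateDropoutDescriptor(&desc);

size_t state_bytes = 0;
cudnnDropoutGetStatesSize(handle, &state_bytes);
void* states = nullptr;
cudaMalloc(&states, state_bytes);

// states != NULL: set probability and seed, and (expensively) initialize the
// RNG states buffer from that seed.
cudnnSetDropoutDescriptor(desc, handle, 0.5f, states, state_bytes, seed);

// states == NULL: update probability and seed only, leave existing states alone.
cudnnSetDropoutDescriptor(desc, handle, 0.5f, nullptr, 0, new_seed);

// Restore: point the descriptor at an already-initialized states buffer
// (probability, seed, and states pointer are all taken from the arguments).
cudnnRestoreDropoutDescriptor(desc, handle, 0.5f, states, state_bytes, seed);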

@roywei (Member Author) commented Oct 25, 2019

OK, and I found another issue: the seed is fixed during resource initialization. So once a dropout layer is created, even if we don't fix the seed, each dropout result will be the same.

If we want each forward to produce a different result, while the random result must still respect the MXNet random seed, then the only solution is to fetch the MXNet seed on each forward. Is that right?

roywei (Member Author):

new test case

@with_seed()
def test_dropout_with_seed():
    info = np.iinfo(np.int32)
    seed = np.random.randint(info.min, info.max)
    _test_dropout(mx.cpu(), seed)
    _test_dropout(mx.gpu(), seed)
    _test_dropout(mx.cpu())
    _test_dropout(mx.gpu())

def _test_dropout(ctx, seed=None):
    data = mx.nd.ones((100, 100), ctx=ctx)
    dropout = mx.gluon.nn.Dropout(0.5)

    if seed is not None:
        mx.random.seed(seed)
    with mx.autograd.record():
        result1 = dropout(data)

    if seed is not None:
        mx.random.seed(seed)
    with mx.autograd.record():
        result2 = dropout(data)
    if seed is not None:
        # with a fixed seed, the two dropout forwards should return the same result
        assert_almost_equal(result1.asnumpy(), result2.asnumpy())
    else:
        # seed not fixed, results should be different
        with assert_raises(AssertionError):
            assert_almost_equal(result1.asnumpy(), result2.asnumpy())

Member:

Dick is looking into it, but I don't think the sentence "even if we don't fix the seed, each dropout result will be the same" is true. What is true, though, is that dropout as implemented now does not respond to a change of seed on the MXNet side after the initial creation of the op. Seeding on each forward would destroy performance, so how about a solution like this: if you know that you will cache the value of the seed (as in the dropout descriptor resource case), then every time you get the descriptor it should internally ask the random resource "did the user set a new seed?" (which could be implemented as a set in the random resource that keeps track of who has already asked, reset whenever the user calls mxnet.random.seed). If the answer is "no", no reseeding is required; if the answer is "yes", the dropout descriptor should be reseeded with a new value. A rough sketch of this bookkeeping follows below.
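
A rough, hypothetical sketch of that bookkeeping (SeedTracker and its members are invented names for illustration, not MXNet API):

#include <atomic>
#include <cstdint>
#include <mutex>
#include <unordered_map>

class SeedTracker {
 public:
  // Called when the user calls mxnet.random.seed(): bump the epoch so every
  // cached consumer (e.g. a cudnn dropout descriptor) knows it must reseed.
  void OnGlobalSeed() { epoch_.fetch_add(1, std::memory_order_relaxed); }

  // Called by a consumer each time it fetches its cached descriptor.
  // Returns true at most once per consumer per global seed change.
  bool NeedsReseed(const void* consumer) {
    const uint64_t current = epoch_.load(std::memory_order_relaxed);
    std::lock_guard<std::mutex> lock(mu_);
    uint64_t& seen = seen_epoch_[consumer];  // default-initialized to 0
    if (seen == current) return false;
    seen = current;
    return true;
  }

 private:
  std::atomic<uint64_t> epoch_{1};  // start at 1 so new consumers reseed once
  std::mutex mu_;
  std::unordered_map<const void*, uint64_t> seen_epoch_;
};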

Member:

Echoing @ptrendx's comment, we should implement SetSeed() for the cudnn resource in https://github.com/apache/incubator-mxnet/blob/master/src/resource.cc#L174-L201

DickJC123 (Contributor):

@roywei I've been playing with a variant of your proposed test in which I set the seed to two different values for the two models and expect the results to be different. This fails, because the results are identical even with the differing seeds. The two models each get their own gpu random resource, but the two are seeded by cpu random number generators that are identical.

The problem here is that the CPU RNGs are not responding to mx.random.seed() and instead have their seed set to 0. The reason is that CPU RNGs are requested from the ResourceManager, and the ResourceManager is a thread-local variable. The main Python thread (performing the mx.random.seed()) only affects the global_seed_ data member of its own ResourceManager instance, which does not affect the seeds of the CPU RNGs requested from the worker thread's ResourceManager. A minimal illustration of this pitfall is below.
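
A minimal standalone illustration of that thread-local pitfall (not MXNet code; FakeResourceManager is a made-up stand-in for the ResourceManager singleton):

#include <iostream>
#include <thread>

struct FakeResourceManager {
  unsigned global_seed = 0;
  // Thread-local singleton: each thread gets its own instance.
  static FakeResourceManager* Get() {
    static thread_local FakeResourceManager inst;
    return &inst;
  }
};

int main() {
  // Analogue of mx.random.seed(42) on the main thread: touches only this thread's instance.
  FakeResourceManager::Get()->global_seed = 42;
  std::thread worker([] {
    // The worker thread sees its own, still-default instance: prints 0.
    std::cout << "worker sees seed = " << FakeResourceManager::Get()->global_seed << "\n";
  });
  worker.join();
  std::cout << "main sees seed = " << FakeResourceManager::Get()->global_seed << "\n";  // prints 42
}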

DickJC123 (Contributor):

Yeah, this is going to be tricky to fix solidly, particularly considering models with multiple dropouts and RNNs.

If all the GPU RNGs of a worker indeed share the same RNG state, then only the seed from the first RNG in a model affects the initialization (the other seeds are ignored). Also, I believe it is non-deterministic which GPU worker handles an operator during execution, so this represents a problem (it could be the cause of some low-probability test failures).

I believe moving the GPU RNG state into a resource was motivated by the high initialization overhead, which is particularly painful for imperative models. Would it be possible to add a seed argument at the Python level to the dropout and RNN operators, with the understanding that by setting the seed, the operator gets its own RNG state (at some memory and initialization expense)? Not setting a seed would grab the global resource; I am not sure how determinism would work there, but it would be fast.

roywei (Member Author) commented Feb 7, 2020

closing in favor of #17547

@roywei roywei closed this Feb 7, 2020
Labels: none yet
Projects: none yet
Development: successfully merging this pull request may close these issues: cudnn Dropout reproducibility (#15662)
7 participants