rocRAND's tests test_hiprand_kernel, test_hiprand_api, and test_rocrand_kernel_philox4x32_10 fail intermittently on AMD MI25 with ROCm 1.6.4. These tests don't fail on ROCm 1.6.3 or on CUDA 8/9. As far as we know right now, they also don't fail on any other device with ROCm 1.6.4. Currently, we suspect the problem is in ROCm, not in rocRAND.
After investigation, we think it's some kind of synchronisation bug that shows up only in very specific situations. Until it's fixed, you can use the temporary workarounds from branch rocm_164_mi25_workarounds.
Most features, including the most popular ones, should not be affected by this bug.
Environment
Hardware: AMD Radeon Instinct MI25

| Software | Version |
|---|---|
| ROCm | 1.6.4 |
| HIP | 1.3.17385 |
| HCC | clang version 6.0.0 (based on HCC 1.0.17412-f590a25-821e6d8-64e7fc7) |
| rocRAND | master (452ef66) |
Workarounds
The possible workarounds for this bug are:
- adding additional synchronization after kernels and before copying the memory (as shown in branch rocm_164_mi25_workarounds; you can try hipStreamWaitEvent() or hipStreamSynchronize(), which should have less impact on performance),
- setting the environment variable HCC_OPT_FLUSH to 0, or
- setting HIP_LAUNCH_BLOCKING to 1.
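
For the two environment-variable workarounds, a small shell helper can apply one of them to a single command without changing the rest of the session. This is only a sketch: the function name `with_workaround` and the test binary path in the usage note are assumptions for illustration, not part of rocRAND or ROCm.

```shell
# Hypothetical helper: run a command with one of the two
# environment-variable workarounds applied to that command only.
# Usage: with_workaround flush|blocking <command> [args...]
with_workaround() {
  case "$1" in
    flush)
      shift
      # Workaround: set HCC_OPT_FLUSH to 0 for this command.
      HCC_OPT_FLUSH=0 "$@"
      ;;
    blocking)
      shift
      # Workaround: set HIP_LAUNCH_BLOCKING to 1 for this command.
      HIP_LAUNCH_BLOCKING=1 "$@"
      ;;
    *)
      echo "unknown workaround: $1" >&2
      return 2
      ;;
  esac
}
```

For example, `with_workaround flush ./test_hiprand_api` would re-run one of the affected tests with HCC_OPT_FLUSH set to 0, assuming the test binary is in the current directory. Applying the variable per command keeps other runs in the same shell unaffected, which makes it easier to compare behaviour with and without the workaround.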
Please comment if you have problems applying the workarounds, or if you experience a similar bug in a different place or on a different device.