Regardless of the model, data, or any other condition, I've never been able to use the built-in Flux.dice_coeff_loss() function. A very long error dump shows up, apparently tied to CUDA and memory usage.
The issue has been confirmed and reproduced on the Discourse forum. For details, please check this link.
julia> using Flux, CUDA

julia> let x = randn(3, 5) |> cu
           y = Flux.onehotbatch("abcab", 'a':'c') |> cu
           Flux.dice_coeff_loss(x, y)  # works forward
       end
1.1841338f0

julia> let x = randn(3, 5) |> cu
           y = Flux.onehotbatch("abcab", 'a':'c') |> cu
           gradient(Flux.mse, x, y)  # some gradients work
       end
(Float32[-0.16939788 -0.19461282 … -0.30000073 -0.017194644; 0.07464689 -0.15628384 … -0.17090265 -0.007114268; -0.22359066 -0.06903434 … 0.1566836 -0.022250716], nothing)

julia> let x = randn(3, 5) |> cu
           y = Flux.onehotbatch("abcab", 'a':'c') |> cu
           gradient(Flux.dice_coeff_loss, x, y)
       end
ERROR: a exception was thrown during kernel execution.
Run Julia on debug level 2 for device stack traces.
...
...
ERROR: KernelException: exception thrown during kernel execution on device Tesla V100-PCIE-16GB
Stacktrace:
[1] check_exceptions()
@ CUDA ~/.julia/packages/CUDA/htRwP/src/compiler/exceptions.jl:34
[2] device_synchronize(; blocking::Bool, spin::Bool)
@ CUDA ~/.julia/packages/CUDA/htRwP/lib/cudadrv/synchronization.jl:180
(@v1.10) pkg> st Flux CUDA
Status `~/.julia/environments/v1.10/Project.toml`
[052768ef] CUDA v5.2.0
[587475ba] Flux v0.14.11
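In the meantime, one possible workaround is to take the gradient of this particular loss on the CPU, since only the GPU kernel seems to fail. The sketch below assumes the Dice loss follows the usual V-Net formula that Flux documents, 1 - (2·Σ(y·ŷ) + s) / (Σŷ² + Σy² + s); the helper name `dice_cpu` is made up for illustration, and this is untested against the exact setup above:

```
julia> using Flux, CUDA

# Hypothetical workaround: a hand-written Dice loss, differentiated on the CPU.
julia> dice_cpu(ŷ, y; smooth=1f0) =
           1 - (2 * sum(y .* ŷ) + smooth) / (sum(abs2, ŷ) + sum(abs2, y) + smooth)

julia> let x = randn(Float32, 3, 5) |> cu
           y = Flux.onehotbatch("abcab", 'a':'c') |> cu
           # Array(...) copies back to host memory, so the kernel is never launched
           gradient(dice_cpu, Array(x), Array(y))
       end
```

This sidesteps the crash at the cost of a device-to-host copy per call, so it is only a stopgap for small arrays, not a fix.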
I don't know if this is the same error as yours, but it's surprising, and it is a bug.
What "Run Julia on debug level 2 for device stack traces" means is that starting the REPL with julia -g2 will capture more information, which may help narrow this down. Can you try this and paste here as much information as possible?
Cheers,