Math ops like `sqrt` and `log` do not seem to be implemented for half precision (e.g. `Float16` and `BFloat16`):
julia> sqrt.(cu(Float16[1,2]))
┌ Warning: Performing scalar operations on GPU arrays: This is very slow, consider disallowing these operations with `allowscalar(false)`
└ @ GPUArrays E:\Programs\julia\.julia\packages\GPUArrays\jhRU7\src\host\indexing.jl:43
ERROR: MethodError: no method matching sqrt(::Float16)
You may have intended to import Base.sqrt
Closest candidates are:
  sqrt(::Float32) at E:\Programs\julia\.julia\packages\CUDA\YeS8q\src\device\intrinsics\math.jl:193
  sqrt(::Float64) at E:\Programs\julia\.julia\packages\CUDA\YeS8q\src\device\intrinsics\math.jl:192
Stacktrace:
[1] _broadcast_getindex_evalf at .\broadcast.jl:648 [inlined]
[2] _broadcast_getindex at .\broadcast.jl:621 [inlined]
[3] getindex at .\broadcast.jl:575 [inlined]
[4] copy at .\broadcast.jl:876 [inlined]
[5] materialize(::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1},Nothing,typeof(CUDA.sqrt),Tuple{CuArray{Float16,1}}}) at .\broadcast.jl:837
[6] top-level scope at REPL[2]:1
julia> sqrt.(cu(CUDA.BFloat16s.BFloat16[1,2]))
┌ Warning: Performing scalar operations on GPU arrays: This is very slow, consider disallowing these operations with `allowscalar(false)`
└ @ GPUArrays E:\Programs\julia\.julia\packages\GPUArrays\jhRU7\src\host\indexing.jl:43
ERROR: MethodError: no method matching sqrt(::BFloat16s.BFloat16)
You may have intended to import Base.sqrt
Closest candidates are:
  sqrt(::Float32) at E:\Programs\julia\.julia\packages\CUDA\YeS8q\src\device\intrinsics\math.jl:193
  sqrt(::Float64) at E:\Programs\julia\.julia\packages\CUDA\YeS8q\src\device\intrinsics\math.jl:192
Stacktrace:
[1] _broadcast_getindex_evalf at .\broadcast.jl:648 [inlined]
[2] _broadcast_getindex at .\broadcast.jl:621 [inlined]
[3] getindex at .\broadcast.jl:575 [inlined]
[4] copy at .\broadcast.jl:876 [inlined]
[5] materialize(::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1},Nothing,typeof(CUDA.sqrt),Tuple{CuArray{BFloat16s.BFloat16,1}}}) at .\broadcast.jl:837
[6] top-level scope at REPL[10]:1
Describe the solution you'd like
That correct values are returned.
Describe alternatives you've considered
I tried simply defining `@inline CUDA.sqrt(x::Float16) = ccall("extern hsqrt", llvmcall, Float16, (Float16,), x)` after finding this doc, but it seems something more is required (maybe the header file is not included?):
julia> sqrt.(cu(Float16[1,2]))
ERROR: InvalidIRError: compiling kernel broadcast_kernel(CUDA.CuKernelContext, CuDeviceArray{Float16,1,1}, Base.Broadcast.Broadcasted{Nothing,Tuple{Base.OneTo{Int64}},typeof(CUDA.sqrt),Tuple{Base.Broadcast.Extruded{CuDeviceArray{Float16,1,1},Tuple{Bool},Tuple{Int64}}}}, Int64) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to hsqrt)
Additional context
I might be able to submit a PR if I get some pointers as to what is required. I'll poke around a bit to see if I can spot something in the meantime. Not sure about the broadcast though. Would it solve itself if one creates the correct math functions?
Yeah these are tricky, and nobody is actively working on them right now. The CUDA implementations often just cast to Float32 though, so you could do the same for now.
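As a rough illustration of the cast-to-Float32 workaround suggested above, here is a minimal sketch. The helper names (`via_float32`, `sqrt_f32`, `log_f32`) are hypothetical and not part of CUDA.jl, and the example only runs on the CPU; on the GPU one would presumably broadcast the same helper over a `CuArray`, and whether the inner call should be Base's `sqrt` or `CUDA.sqrt` may depend on the CUDA.jl version, which is not tested here.

```julia
# Hypothetical helpers, not part of CUDA.jl.
# Widen the input to Float32, apply the operation, then round back
# to the original floating-point type (e.g. Float16).
via_float32(f, x::T) where {T<:AbstractFloat} = T(f(Float32(x)))

sqrt_f32(x) = via_float32(sqrt, x)
log_f32(x)  = via_float32(log, x)

xs = Float16[1, 4]
ys = sqrt_f32.(xs)   # assumption: on the GPU this would be sqrt_f32.(cu(xs))
zs = log_f32.(xs)
```

The result keeps the input element type, so code downstream still sees `Float16`; the cost is that the arithmetic itself runs in single precision rather than using native half-precision instructions.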