The following code fails to correctly iterate the elements of the array:
using CUDA

struct Box{T<:AbstractFloat}
    x::T
    y::T
    z::T
end

struct Sphere{T<:AbstractFloat}
    r::T
end

struct Tube{T<:AbstractFloat}
    r::T
    z::T
end

struct Cone{T<:AbstractFloat}
    r::T
    z::T
end

volume(b::Box{T}) where T = b.x * b.y * b.z
volume(s::Sphere{T}) where T = T(4)/3 * π * s.r^3
volume(t::Tube{T}) where T = T(π) * t.r^2 * t.z
volume(c::Cone{T}) where T = T(1)/3 * π * c.r^2 * c.z

function kernel(::Type{T}, shapes) where {T}
    for s in shapes
        if s isa Box{Float32}
            @cuprintln "Box: $(volume(s))"
        elseif s isa Sphere{T}
            @cuprintln "Sphere: $(volume(s))"
        elseif s isa Tube{T}
            @cuprintln "Tube: $(volume(s))"
        elseif s isa Cone{T}
            @cuprintln "Cone: $(volume(s))"
        else
            @cuprintln "Unknown shape"
        end
    end
    return nothing
end

function main(T=Float32)
    shapes = Vector{Union{Box{T}, Sphere{T}, Tube{T}, Cone{T}}}()
    #shapes = Vector{Union{Box{T}, Sphere{T}, Tube{T}}}()
    push!(shapes, Box{T}(1,2,3))
    push!(shapes, Sphere{T}(1))
    cu_shapes = CuVector(shapes)
    @cuda kernel(T, cu_shapes)
end
This is related to the union splitting limit of 3: uncommenting the alternative `shapes` allocation leaves only 3 element types, and the generated code is then correct. Note that the manual splitting is required because of that limit, but even with it the code still contains allocations, so this isn't a viable pattern for GPU programming.
Don't pass such containers (with more than 3 types in a union), so that automatic union splitting kicks in. Code like this is always going to be fragile, because the Julia compiler is at liberty to generate dynamic code.
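For illustration only, here is one way to avoid the union altogether: keep one homogeneous `CuVector` per concrete shape type and launch the kernel once per array. This is a sketch under assumptions, not something proposed in this thread. The name `volumes_kernel!` and the per-type array layout are hypothetical, and only two of the shapes are repeated here to keep it short.

```julia
using CUDA

struct Box{T<:AbstractFloat}
    x::T
    y::T
    z::T
end

struct Sphere{T<:AbstractFloat}
    r::T
end

volume(b::Box) = b.x * b.y * b.z
volume(s::Sphere{T}) where {T} = T(4) / 3 * T(π) * s.r^3

# Hypothetical kernel: each launch sees a single concrete element type,
# so `volume` is resolved statically and no union splitting is involved.
function volumes_kernel!(out, shapes)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(shapes)
        @inbounds out[i] = volume(shapes[i])
    end
    return nothing
end

function main(T=Float32)
    # Assumed layout: one homogeneous device array per shape type.
    boxes = CuVector([Box{T}(1, 2, 3)])
    spheres = CuVector([Sphere{T}(1)])
    for shapes in (boxes, spheres)
        out = CUDA.zeros(T, length(shapes))
        @cuda threads=length(shapes) volumes_kernel!(out, shapes)
        @show Array(out)
    end
end
```

The trade-off is that the original ordering of mixed shapes is lost unless an index is stored alongside each element, so this only fits workloads where shapes can be processed grouped by type.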