Structs with tuple fields as broadcast arguments #38
Comments
Are you using this with an OpenCL CPU driver? I have been fighting with issues related to different alignment in the Intel/AMD CPU OpenCL drivers...
|
I'm not totally sure, I was hoping it was on the (somewhat meagre) GPU! Two driver versions show up on my setup:
OpenCL.Device(Intel(R) HD Graphics on Intel(R) OpenCL @0x000055ad4d6cb990)
OpenCL.Device(Intel(R) HD Graphics 5500 BroadWell U-Processor GT2 on Intel Gen OCL Driver @0x00007f573ac89600)
Using the first one just kills Julia, so I figure it's an artefact of some kind. It should be the latest package clones and Julia; this is a fresh install from yesterday. Let me know if you need any other details. |
I'm mostly trying to test that OpenCL will actually run the kernels that I was running on a regular CPU. The main GPU work will be on CUDA, which I'm just getting set up. I'm aiming to build a simulation framework that will run user-defined inner kernels on CPU/OpenCL/CUDA interchangeably, but I'm not sure how practical that is yet. |
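For reference, a quick way to see which drivers and devices CLArrays picks up, using only the CLArrays calls that appear later in this thread (which device index works is machine-specific):

```julia
using CLArrays

# List every OpenCL device the installed drivers expose; the same
# physical GPU can show up once per driver, as it does above.
for (i, dev) in enumerate(CLArrays.devices())
    println(i, ": ", dev)
end

# Select the device/driver combination that actually works.
CLArrays.init(CLArrays.devices()[1])
```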
You might be able to get around this problem if you can insert some padding ;)

struct WithTuple
a::Int32
pad::Int32 # might be even more efficient
b::Tuple{Int32,Int32}
end |
Unfortunately not so easy! What would I be aiming for, padding it out to 64-bit multiples? I'm also wondering if there is a long-term solution for this, as in, is it possible to repack to a correct struct automatically? I was hoping user-supplied isbits structs would eventually work without this kind of step. |
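As a sketch of what the padding suggestion changes on the Julia side (the driver-side layout still depends on the OpenCL compiler, and `WithTuplePadded` is just a name invented here), you can compare `sizeof` for the padded and unpadded structs:

```julia
struct WithTuple
    a::Int32
    b::Tuple{Int32,Int32}
end

struct WithTuplePadded
    a::Int32
    pad::Int32                 # explicit padding field, as suggested above
    b::Tuple{Int32,Int32}
end

sizeof(WithTuple)        # 12 bytes: every field aligns to 4
sizeof(WithTuplePadded)  # 16 bytes: a multiple of 8
```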
I put a lot of effort into this, and I thought I had it working with most GPU OpenCL drivers. I posted a question on Stack Overflow a while ago: https://stackoverflow.com/questions/47076012/opencl-only-on-amd-cl-invalid-arg-size There was actually a suggestion in there:
I'm not 100% sure if that is a valid workaround for your specific issue, but it's definitely worth a try. |
Btw your example works on my GPUs!

julia> using CLArrays
julia> struct WithTuple
a::Int32
b::Tuple{Int32,Int32}
end
julia> Base.:(+)(x::Integer, y::WithTuple) = x + y.b[2]
julia> x = CLArray(Int32[1,2,3,4])
julia> x .+ WithTuple(1, (2,3))
GPU: 4-element Array{Int32,1}:
4
5
6
7
julia> CLArrays.device(x)
OpenCL.Device(Intel(R) HD Graphics 630 on Intel(R) OpenCL @0x00000000076a8ed0)
julia> CLArrays.init(CLArrays.devices()[2])
OpenCL context with:
CL version: OpenCL 1.2 CUDA
Device: CL GeForce GTX 1060
threads: 1024
blocks: (1024, 1024, 64)
global_memory: 6442.450944 mb
free_global_memory: NaN mb
local_memory: 0.049152 mb
julia> CLArray(Int32[1,2,3,4]) .+ WithTuple(1, (2,3))
GPU: 4-element Array{Int32,1}:
4
5
6
7

I'm kind of sick of dealing with buggy / inconsistent OpenCL drivers :P |
Oh god, I didn't realise it was like that... these are both Intel HD cards...

julia> using CLArrays
julia> CLArrays.init(CLArrays.devices()[2])
OpenCL context with:
CL version: OpenCL 1.2 beignet 1.3
Device: CL Intel(R) HD Graphics 5500 BroadWell U-Processor GT2
threads: 512
blocks: (512, 512, 512)
global_memory: 4119.855104 mb
free_global_memory: NaN mb
local_memory: 0.065536 mb
julia> struct WithTuple
a::Int32
b::Tuple{Int32,Int32}
end
julia> Base.:(+)(x::Integer, y::WithTuple) = x + y.b[2]
julia> x = CLArray(Int32[1,2,3,4])
GPU: 4-element Array{Int32,1}:
1
2
3
4
julia> x .+ WithTuple(1, (2,3))
ERROR: Julia and OpenCL type don't match at kernel argument 6: Found Tuple{CLArrays.DeviceArray{Int32,1,CLArrays.HostPtr{Int32}},WithTuple}.
Please make sure to define OpenCL structs correctly!
You should be generally fine by using `__attribute__((packed))`, but sometimes the alignment of fields is different from Julia.
Consider the following example:
```
//packed
// Tuple{NTuple{3, Float32}, Void, Float32}
struct __attribute__((packed)) Test{
float3 f1;
int f2; // empty type gets replaced with Int32 (no empty types allowed in OpenCL)
// you might need to define the alignement of fields to match julia's layout
float f3; // for the types used here the alignement matches though!
};
// this is a case where Julia and OpenCL packed alignment would differ, so we need to specify it explicitely
// Tuple{Int64, Int32}
struct __attribute__((packed)) Test2{
long f1;
int __attribute__((aligned (8))) f2; // opencl would align this to 4 in packed layout, while Julia uses 8!
};
```
You can use `c.datatype_align(T)` to figure out the alignment of a Julia type!
Stacktrace:
[1] set_arg!(::OpenCL.cl.Kernel, ::Int64, ::Tuple{CLArrays.DeviceArray{Int32,1,CLArrays.HostPtr{Int32}},WithTuple}) at /home/raf/.julia/v0.6/OpenCL/src/kernel.jl:186
[2] (::CLArrays.CLFunction{GPUArrays.#broadcast_kernel!,Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}},Tuple{(3, :ptr),(6, 1, :ptr)}})(::Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Tuple{Int64}, ::Tuple{Int64}, ::OpenCL.cl.CmdQueue) at /home/raf/.julia/v0.6/CLArrays/src/compilation.jl:279
[3] (::CLArrays.CLFunction{GPUArrays.#broadcast_kernel!,Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}},Tuple{(3, :ptr),(6, 1, :ptr)}})(::Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Tuple{Int64}, ::Tuple{Int64}) at /home/raf/.julia/v0.6/CLArrays/src/compilation.jl:272
[4] _gpu_call(::Function, ::CLArrays.CLArray{Int32,1}, ::Tuple{Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Tuple{Tuple{Int64},Tuple{Int64}}) at /home/raf/.julia/v0.6/CLArrays/src/compilation.jl:18
[5] gpu_call(::Function, ::CLArrays.CLArray{Int32,1}, ::Tuple{Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Int64) at /home/raf/.julia/v0.6/GPUArrays/src/abstract_gpu_interface.jl:151
[6] _broadcast!(::Function, ::CLArrays.CLArray{Int32,1}, ::Tuple{Tuple{Bool},Tuple{}}, ::Tuple{Tuple{Int64},Tuple{}}, ::CLArrays.CLArray{Int32,1}, ::Tuple{WithTuple}, ::Type{Val{1}}, ::CartesianRange{CartesianIndex{1}}) at /home/raf/.julia/v0.6/GPUArrays/src/broadcast.jl:89
[7] broadcast_t(::Function, ::Type{Int32}, ::Tuple{Base.OneTo{Int64}}, ::CartesianRange{CartesianIndex{1}}, ::CLArrays.CLArray{Int32,1}, ::WithTuple) at /home/raf/.julia/v0.6/GPUArrays/src/broadcast.jl:49
[8] broadcast_c at ./broadcast.jl:316 [inlined]
[9] broadcast(::Function, ::CLArrays.CLArray{Int32,1}, ::WithTuple) at ./broadcast.jl:455
[10] macro expansion at ./REPL.jl:97 [inlined]
[11] (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:73 |
Could it be beignet? Device [1] actually just segfaults on that last line. Or some other compiler version issue? I'm on Arch Linux, so I occasionally get bitten by bleeding-edge compiler releases breaking things. |
Yeah, definitely... the last time I tried beignet on Linux, it failed with a self-test saying:
:D So I lost a bit of trust in beignet, although it seems like they have improved a lot recently! |
Do you have a snippet of code that fails using the OpenCL provided by beignet? |
For my Intel HD 5500, the simple demo above fails with the error shown, or a segfault, depending on the driver I select, as for some reason there are two. So far Intel's compute-runtime and the older intel-opencl drivers also just segfault when I run that code. |
Anyway, thanks @SimonDanisch for all your work on these things, especially now that I know what a mess you have to deal with behind the scenes!!! |
I actually tested the code and it failed just like yours:

x .+ WithTuple(1, (2,3))
ERROR: Julia and OpenCL type don't match at kernel argument 6: Found Tuple{CLArrays.DeviceArray{Int32,1,CLArrays.HostPtr{Int32}},WithTuple}.
Please make sure to define OpenCL structs correctly!
You should be generally fine by using `__attribute__((packed))`, but sometimes the alignment of fields is different from Julia.
Consider the following example:
```
//packed
// Tuple{NTuple{3, Float32}, Void, Float32}
struct __attribute__((packed)) Test{
float3 f1;
int f2; // empty type gets replaced with Int32 (no empty types allowed in OpenCL)
// you might need to define the alignement of fields to match julia's layout
float f3; // for the types used here the alignement matches though!
};
// this is a case where Julia and OpenCL packed alignment would differ, so we need to specify it explicitely
// Tuple{Int64, Int32}
struct __attribute__((packed)) Test2{
long f1;
int __attribute__((aligned (8))) f2; // opencl would align this to 4 in packed layout, while Julia uses 8!
};
```
You can use `c.datatype_align(T)` to figure out the alignment of a Julia type!
Stacktrace:
[1] set_arg!(::OpenCL.cl.Kernel, ::Int64, ::Tuple{CLArrays.DeviceArray{Int32,1,CLArrays.HostPtr{Int32}},WithTuple}) at /home/david/.julia/v0.6/OpenCL/src/kernel.jl:186
[2] (::CLArrays.CLFunction{GPUArrays.#broadcast_kernel!,Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}},Tuple{(3, :ptr),(6, 1, :ptr)}})(::Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Tuple{Int64}, ::Tuple{Int64}, ::OpenCL.cl.CmdQueue) at /home/david/.julia/v0.6/CLArrays/src/compilation.jl:279
[3] (::CLArrays.CLFunction{GPUArrays.#broadcast_kernel!,Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}},Tuple{(3, :ptr),(6, 1, :ptr)}})(::Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Tuple{Int64}, ::Tuple{Int64}) at /home/david/.julia/v0.6/CLArrays/src/compilation.jl:272
[4] _gpu_call(::Function, ::CLArrays.CLArray{Int32,1}, ::Tuple{Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Tuple{Tuple{Int64},Tuple{Int64}}) at /home/david/.julia/v0.6/CLArrays/src/compilation.jl:18
[5] gpu_call(::Function, ::CLArrays.CLArray{Int32,1}, ::Tuple{Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Int64) at /home/david/.julia/v0.6/GPUArrays/src/abstract_gpu_interface.jl:151
[6] _broadcast!(::Function, ::CLArrays.CLArray{Int32,1}, ::Tuple{Tuple{Bool},Tuple{}}, ::Tuple{Tuple{Int64},Tuple{}}, ::CLArrays.CLArray{Int32,1}, ::Tuple{WithTuple}, ::Type{Val{1}}, ::CartesianRange{CartesianIndex{1}}) at /home/david/.julia/v0.6/GPUArrays/src/broadcast.jl:89
[7] broadcast_t(::Function, ::Type{Int32}, ::Tuple{Base.OneTo{Int64}}, ::CartesianRange{CartesianIndex{1}}, ::CLArrays.CLArray{Int32,1}, ::WithTuple) at /home/david/.julia/v0.6/GPUArrays/src/broadcast.jl:49
[8] broadcast_c at ./broadcast.jl:316 [inlined]
[9] broadcast(::Function, ::CLArrays.CLArray{Int32,1}, ::WithTuple) at ./broadcast.jl:455

Yet CLArrays seems to work
|
On my GPU it didn't fail?
Yeah this should work, since there isn't any struct involved? |
Did you try with the device equal to the Iris Pro?

julia> using CLArrays
julia> struct WithTuple
a::Int32
b::Tuple{Int32,Int32}
end
julia> CLArrays.init(CLArrays.devices()[2])
OpenCL context with:
CL version: OpenCL 1.2 (Build 43)
Device: CL Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz
threads: 8192
blocks: (8192, 8192, 8192)
global_memory: 16728.113152 mb
free_global_memory: NaN mb
local_memory: 0.032768 mb
julia> CLArray([WithTuple(1, (2,3))])
GPU: 1-element Array{WithTuple,1}:
WithTuple(1, (2, 3))
julia> Base.:(+)(x::Integer, y::WithTuple) = x + y.b[2]
julia> x = CLArray(Int32[1,2,3,4])
GPU: 4-element Array{Int32,1}:
1
2
3
4
julia> x .+ WithTuple(1, (2,3))
GPU: 4-element Array{Int32,1}:
4
5
6
7
julia> CLArrays.init(CLArrays.devices()[1])
OpenCL context with:
CL version: OpenCL 1.2 beignet 1.4 (git-591d387)
Device: CL Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile
threads: 512
blocks: (512, 512, 512)
global_memory: 2147.483648 mb
free_global_memory: NaN mb
local_memory: 0.065536 mb
julia> x = CLArray(Int32[1,2,3,4])
GPU: 4-element Array{Int32,1}:
1
2
3
4
julia> x .+ WithTuple(1, (2,3))
ERROR: Julia and OpenCL type don't match at kernel argument 6: Found Tuple{CLArrays.DeviceArray{Int32,1,CLArrays.HostPtr{Int32}},WithTuple}.
Please make sure to define OpenCL structs correctly!
You should be generally fine by using `__attribute__((packed))`, but sometimes the alignment of fields is different from Julia.
Consider the following example:
```
//packed
// Tuple{NTuple{3, Float32}, Void, Float32}
struct __attribute__((packed)) Test{
float3 f1;
int f2; // empty type gets replaced with Int32 (no empty types allowed in OpenCL)
// you might need to define the alignement of fields to match julia's layout
float f3; // for the types used here the alignement matches though!
};
// this is a case where Julia and OpenCL packed alignment would differ, so we need to specify it explicitely
// Tuple{Int64, Int32}
struct __attribute__((packed)) Test2{
long f1;
int __attribute__((aligned (8))) f2; // opencl would align this to 4 in packed layout, while Julia uses 8!
};
```
You can use `c.datatype_align(T)` to figure out the alignment of a Julia type!
Stacktrace:
[1] set_arg!(::OpenCL.cl.Kernel, ::Int64, ::Tuple{CLArrays.DeviceArray{Int32,1,CLArrays.HostPtr{Int32}},WithTuple}) at /home/david/.julia/v0.6/OpenCL/src/kernel.jl:186
[2] (::CLArrays.CLFunction{GPUArrays.#broadcast_kernel!,Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}},Tuple{(3, :ptr),(6, 1, :ptr)}})(::Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Tuple{Int64}, ::Tuple{Int64}, ::OpenCL.cl.CmdQueue) at /home/david/.julia/v0.6/CLArrays/src/compilation.jl:279
[3] (::CLArrays.CLFunction{GPUArrays.#broadcast_kernel!,Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}},Tuple{(3, :ptr),(6, 1, :ptr)}})(::Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Tuple{Int64}, ::Tuple{Int64}) at /home/david/.julia/v0.6/CLArrays/src/compilation.jl:272
[4] _gpu_call(::Function, ::CLArrays.CLArray{Int32,1}, ::Tuple{Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Tuple{Tuple{Int64},Tuple{Int64}}) at /home/david/.julia/v0.6/CLArrays/src/compilation.jl:18
[5] gpu_call(::Function, ::CLArrays.CLArray{Int32,1}, ::Tuple{Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Int64) at /home/david/.julia/v0.6/GPUArrays/src/abstract_gpu_interface.jl:151
[6] _broadcast!(::Function, ::CLArrays.CLArray{Int32,1}, ::Tuple{Tuple{Bool},Tuple{}}, ::Tuple{Tuple{Int64},Tuple{}}, ::CLArrays.CLArray{Int32,1}, ::Tuple{WithTuple}, ::Type{Val{1}}, ::CartesianRange{CartesianIndex{1}}) at /home/david/.julia/v0.6/GPUArrays/src/broadcast.jl:89
[7] broadcast_t(::Function, ::Type{Int32}, ::Tuple{Base.OneTo{Int64}}, ::CartesianRange{CartesianIndex{1}}, ::CLArrays.CLArray{Int32,1}, ::WithTuple) at /home/david/.julia/v0.6/GPUArrays/src/broadcast.jl:49
[8] broadcast_c at ./broadcast.jl:316 [inlined]
[9] broadcast(::Function, ::CLArrays.CLArray{Int32,1}, ::WithTuple) at ./broadcast.jl:455

Oddly enough, if I broadcast |
Yeah, that's the same behaviour I'm seeing: the simple case works fine, but a struct or tuple gives that error. I haven't tried it on an OpenCL CPU, but it's interesting that one works and the other doesn't. |
Building the CLArray is totally fine:

CLArray([WithTuple(1, (2,3))])
GPU: 1-element Array{WithTuple,1}:
WithTuple(1, (2, 3)) |
Is Julia getting the redefinition of `+` correctly?
|
All your examples just work on all my GPUs. The problem is your beignet driver, which seems to choose a different alignment for WithTuple, possibly as part of a bug!
What I meant for you to try is something like:

x .+ CLArray(fill(WithTuple(1, (2,3)), length(x))) |
This actually works:

julia> a = CLArray([WithTuple(1, (2,3))])
GPU: 1-element Array{WithTuple,1}:
WithTuple(1, (2, 3))
julia> x .+ a
GPU: 4-element Array{Int32,1}:
4
5
6
7 |
Or, using your example:

julia> x .+ CLArray(fill(WithTuple(1, (2,3)), length(x)))
GPU: 4-element Array{Int32,1}:
4
3
5
4 |
Cool! :) So the tip from Stack Overflow actually works :-O So this is a bug in uploading structs by value to GPU kernels, which seems to be ill-defined in OpenCL. |
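The workaround can be wrapped in a small hypothetical helper, so scalar struct arguments are always uploaded through a buffer rather than passed by value (`wrap_arg` is a name invented here, not part of CLArrays):

```julia
using CLArrays

struct WithTuple
    a::Int32
    b::Tuple{Int32,Int32}
end
Base.:(+)(x::Integer, y::WithTuple) = x + y.b[2]

# Hypothetical helper: replicate a scalar broadcast argument into a
# CLArray of the same length, so it reaches the kernel through a
# buffer instead of as a by-value kernel argument.
wrap_arg(x::CLArray, arg) = CLArray(fill(arg, length(x)))

x = CLArray(Int32[1, 2, 3, 4])
x .+ wrap_arg(x, WithTuple(1, (2, 3)))
```

This trades a small extra upload for sidestepping the driver's by-value struct alignment entirely.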
Great to get that narrowed down!! Do you have the SO link? It would be good to understand what is happening. |
|
Thanks |
@davidbp sorry, I thought you were actually talking to me - I see now, you just reproduced the failure :) |
No problem. I hope OpenCL gets better future support from vendors. |
I've been trying to broadcast with an argument like this:
But it breaks with an error like this, somehow to do with differences in the packing of the tuple: