This repository has been archived by the owner on Sep 27, 2021. It is now read-only.

Structs with tuple fields as broadcast arguments #38

Open
rafaqz opened this issue Jul 22, 2018 · 28 comments

rafaqz commented Jul 22, 2018

I've been trying to broadcast with an argument like this:

struct WithTuple
    a::Int32
    b::Tuple{Int32,Int32}
end

But it breaks with an error like this, seemingly to do with differences in how the tuple is packed:

ERROR:     Julia and OpenCL type don't match at kernel argument 6: Found Tuple{CLArrays.DeviceArray{UInt32,2,CLArrays.HostPtr{UInt32}},Cellular.Life{Cellular.RadialNeighborhood{:test,Cellular.Skip},Int32,Tuple{Int32,Int32}}}. 
    Please make sure to define OpenCL structs correctly!
    You should be generally fine by using `__attribute__((packed))`, but sometimes the alignment of fields is different from Julia.
    Consider the following example:
        ```
        //packed
        // Tuple{NTuple{3, Float32}, Void, Float32}
        struct __attribute__((packed)) Test{
            float3 f1;
            int f2; // empty type gets replaced with Int32 (no empty types allowed in OpenCL)
            // you might need to define the alignement of fields to match julia's layout
            float f3; // for the types used here the alignement matches though!
        };
        // this is a case where Julia and OpenCL packed alignment would differ, so we need to specify it explicitely
        // Tuple{Int64, Int32}
        struct __attribute__((packed)) Test2{
            long f1;
            int __attribute__((aligned (8))) f2; // opencl would align this to 4 in packed layout, while Julia uses 8!
        };
        ```
    You can use `c.datatype_align(T)` to figure out the alignment of a Julia type!
@SimonDanisch (Member) commented

Are you using this with an OpenCL CPU driver? I have been fighting with issues related to different alignment on Intel/AMD CPU OpenCL drivers...
What's the output of this:

julia> using CLArrays
julia> x = CLArray([0]) |> CLArrays.device
OpenCL.Device(Intel(R) HD Graphics 630 on Intel(R) OpenCL @0x000000000772e440)


rafaqz commented Jul 22, 2018

I'm not totally sure; I was hoping it was on the (somewhat meagre) GPU! Two drivers show up on my setup:

OpenCL.Device(Intel(R) HD Graphics on Intel(R) OpenCL @0x000055ad4d6cb990)                                    
OpenCL.Device(Intel(R) HD Graphics 5500 BroadWell U-Processor GT2 on Intel Gen OCL Driver @0x00007f573ac89600)

Using the first one just kills Julia, so I figure it's an artefact of some kind, and I init() the second instead. It runs fine for simpler problems, but it doesn't handle tuples anywhere, or structs broadcast over an array, giving a similar message for both.

These should be the latest package clones and Julia; it was a fresh install yesterday. Let me know if you need any other details.


rafaqz commented Jul 22, 2018

I'm mostly trying to test that OpenCL will actually run the kernels I was running on the regular CPU. The main GPU work will be on CUDA, which I'm just getting set up. I'm aiming to build a simulation framework that will run user-defined inner kernels on CPU/OpenCL/CUDA interchangeably, but I'm not sure how practically possible that is yet.

@SimonDanisch (Member) commented

You might be able to get around this problem if you can insert some padding ;)
E.g. try:

struct WithTuple
    a::Int32
    pad::Int32 # might be even more efficient
    b::Tuple{Int32,Int32}
end
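
To check what layout Julia actually uses here (and so how much padding you'd need), Base reflection is enough; fieldoffset and sizeof are plain Base functions, and the values below are what I'd expect for the original definition:

julia> fieldoffset(WithTuple, 2)   # byte offset of the tuple field
0x0000000000000004

julia> sizeof(WithTuple)           # 12 bytes: Julia leaves no padding between a and b
12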


rafaqz commented Jul 23, 2018

Unfortunately it's not so easy! What would I be aiming for: padding it out to 64-bit multiples?

I'm also wondering if there is a long-term solution for this, as in: would it be possible to repack into a correctly aligned struct automatically? I was hoping user-supplied isbits structs would eventually work without this kind of step.

@SimonDanisch (Member) commented

I was hoping user-supplied isbits structs would eventually work without this kind of step.

I put a lot of effort into this, and I thought I had it working with most GPU OpenCL drivers.
The problem is that the OpenCL spec doesn't seem to guarantee any particular alignment, so it can be pretty much vendor-specific. As far as I know, one can't actually query the alignment of an OpenCL struct, so it would be a lot of work to support all the different vendors. I also found some alignment bugs which they probably won't fix, so this whole thing is a mess.
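
For a concrete picture of that, the Tuple{Int64, Int32} case from the error message looks like this on the Julia side (a rough REPL check; Base.datatype_alignment is an internal Base function, so treat the values as illustrative):

julia> sizeof(Tuple{Int64,Int32})   # 16, not 12: the Int32 gets padded up to 8-byte alignment
16

julia> Base.datatype_alignment(Tuple{Int64,Int32})
8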

I posted a question on stackoverflow a while ago:

https://stackoverflow.com/questions/47076012/opencl-only-on-amd-cl-invalid-arg-size

There was actually a suggestion in there:

As a workaround, you could copy structs into an OpenCL memory buffer and pass them by reference?

I'm not 100% sure if that is a valid workaround for your specific issue, but it's definitely worth a try.
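
In CLArrays terms, that buffer-based workaround would look roughly like this (an untested sketch reusing the WithTuple example from above; fill just replicates the struct so it travels in a device buffer instead of as a by-value kernel argument):

julia> x .+ CLArray(fill(WithTuple(1, (2,3)), length(x)))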

@SimonDanisch (Member) commented

Btw, your example works on my GPUs!

julia> using CLArrays
julia> struct WithTuple
           a::Int32
           b::Tuple{Int32,Int32}
       end
julia> Base.:(+)(x::Integer, y::WithTuple) = x + y.b[2]
julia> x = CLArray(Int32[1,2,3,4])
julia> x .+  WithTuple(1, (2,3))
GPU: 4-element Array{Int32,1}:
 4
 5
 6
 7
julia> CLArrays.device(x)
OpenCL.Device(Intel(R) HD Graphics 630 on Intel(R) OpenCL @0x00000000076a8ed0)
julia> CLArrays.init(CLArrays.devices()[2])
OpenCL context with:
CL version: OpenCL 1.2 CUDA
Device: CL GeForce GTX 1060
            threads: 1024
             blocks: (1024, 1024, 64)
      global_memory: 6442.450944 mb
 free_global_memory: NaN mb
       local_memory: 0.049152 mb
julia> CLArray(Int32[1,2,3,4]) .+ WithTuple(1, (2,3))
GPU: 4-element Array{Int32,1}:
 4
 5
 6
 7

I'm kind of sick of dealing with buggy / inconsistent OpenCL drivers :P
Last time I complained to Intel about driver bugs, they told me that they fix "obscure" bugs like this only for the newest generation.


rafaqz commented Jul 23, 2018

Oh god I didn't realise it was like that... these are both Intel HD cards...

julia> using CLArrays                                                           
                                                                                
julia> CLArrays.init(CLArrays.devices()[2])                                     
OpenCL context with:                                                            
CL version: OpenCL 1.2 beignet 1.3                                              
Device: CL Intel(R) HD Graphics 5500 BroadWell U-Processor GT2                  
            threads: 512                                                        
             blocks: (512, 512, 512)                                            
      global_memory: 4119.855104 mb                                             
 free_global_memory: NaN mb                                                     
       local_memory: 0.065536 mb                                                


julia> struct WithTuple
           a::Int32
           b::Tuple{Int32,Int32}
       end                                                                      
                                                                                
julia> Base.:(+)(x::Integer, y::WithTuple) = x + y.b[2]                         
                                                                                
julia> x = CLArray(Int32[1,2,3,4])                                              
GPU: 4-element Array{Int32,1}:                                                  
 1                                                                              
 2                                                                              
 3                                                                              
 4                                                                              
                                                                                
julia> x .+  WithTuple(1, (2,3))                                                
ERROR:     Julia and OpenCL type don't match at kernel argument 6: Found Tuple{CLArrays.DeviceArray{Int32,1,CLArrays.HostPtr{Int32}},WithTuple}. 
    [... same struct-alignment hint as in the error message quoted above ...]
                                                                                
Stacktrace:                                                                     
 [1] set_arg!(::OpenCL.cl.Kernel, ::Int64, ::Tuple{CLArrays.DeviceArray{Int32,1,CLArrays.HostPtr{Int32}},WithTuple}) at /home/raf/.julia/v0.6/OpenCL/src/kernel.jl:186                                                                            
 [2] (::CLArrays.CLFunction{GPUArrays.#broadcast_kernel!,Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}},Tuple{(3, :ptr),(6, 1, :ptr)}})(::Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Tuple{Int64}, ::Tuple{Int64}, ::OpenCL.cl.CmdQueue) at /home/raf/.julia/v0.6/CLArrays/src/compilation.jl:279
 [3] (::CLArrays.CLFunction{GPUArrays.#broadcast_kernel!,Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}},Tuple{(3, :ptr),(6, 1, :ptr)}})(::Tuple{CLArrays.KernelState,Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Tuple{Int64}, ::Tuple{Int64}) at /home/raf/.julia/v0.6/CLArrays/src/compilation.jl:272
 [4] _gpu_call(::Function, ::CLArrays.CLArray{Int32,1}, ::Tuple{Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Tuple{Tuple{Int64},Tuple{Int64}}) at /home/raf/.julia/v0.6/CLArrays/src/compilation.jl:18
 [5] gpu_call(::Function, ::CLArrays.CLArray{Int32,1}, ::Tuple{Base.#+,CLArrays.CLArray{Int32,1},Tuple{UInt32},Tuple{GPUArrays.BInfo{Array,1},GPUArrays.BInfo{Any,0}},Tuple{CLArrays.CLArray{Int32,1},WithTuple}}, ::Int64) at /home/raf/.julia/v0.6/GPUArrays/src/abstract_gpu_interface.jl:151
 [6] _broadcast!(::Function, ::CLArrays.CLArray{Int32,1}, ::Tuple{Tuple{Bool},Tuple{}}, ::Tuple{Tuple{Int64},Tuple{}}, ::CLArrays.CLArray{Int32,1}, ::Tuple{WithTuple}, ::Type{Val{1}}, ::CartesianRange{CartesianIndex{1}}) at /home/raf/.julia/v0.6/GPUArrays/src/broadcast.jl:89                                                
 [7] broadcast_t(::Function, ::Type{Int32}, ::Tuple{Base.OneTo{Int64}}, ::CartesianRange{CartesianIndex{1}}, ::CLArrays.CLArray{Int32,1}, ::WithTuple) at /home/raf/.julia/v0.6/GPUArrays/src/broadcast.jl:49                                     
 [8] broadcast_c at ./broadcast.jl:316 [inlined]                                
 [9] broadcast(::Function, ::CLArrays.CLArray{Int32,1}, ::WithTuple) at ./broadcast.jl:455                                                                       
 [10] macro expansion at ./REPL.jl:97 [inlined]                                 
 [11] (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:73 


rafaqz commented Jul 23, 2018

Could it be beignet? devices()[1] actually just segfaults on that last line.

Or some other compiler version issue? I'm on Arch Linux, so I occasionally get bitten by bleeding-edge compiler releases breaking things.

@SimonDanisch (Member) commented

Yeah, definitely... the last time I tried beignet on Linux, it failed a self-test saying:

test failed: (3 + 1) != 4

:D So I lost a bit of trust in beignet, although it seems like they improved a lot recently!


davidbp commented Jul 24, 2018

Do you have a snippet of code that fails using the OpenCL provided by beignet?
I would like to try it and provide feedback. On my computer all tests passed.


rafaqz commented Jul 25, 2018

For my Intel HD 5500, the simple demo above fails with the error shown, or a segfault, depending on the driver I select; for some reason there are two.

So far, Intel's compute-runtime and the older intel-opencl drivers also just segfault when I run that code.


rafaqz commented Jul 25, 2018

Anyway, thanks @SimonDanisch for all your work on these things, especially now that I know what a mess you have to deal with behind the scenes!!!


davidbp commented Jul 26, 2018

I actually tested the code and it failed just like yours

x .+  WithTuple(1, (2,3))

ERROR:     Julia and OpenCL type don't match at kernel argument 6: Found Tuple{CLArrays.DeviceArray{Int32,1,CLArrays.HostPtr{Int32}},WithTuple}. 
    [... same struct-alignment hint as in the error messages quoted above ...]

Stacktrace:
 [stack trace identical to the one in rafaqz's comment above, with paths under /home/david/.julia/v0.6/]

Yet CLArrays themselves seem to work:

 x .+  x
GPU: 4-element Array{Int32,1}:
 2
 4
 6
 8

@SimonDanisch (Member) commented

I actually tested the code and it failed just like yours

On my GPU it didn't fail?

Yet CLArrays seem to work

Yeah this should work, since there isn't any struct involved?
What would be interesting to try is CLArray([WithTuple(1, (2,3))])!


davidbp commented Jul 26, 2018

Did you try with the device set to the Iris Pro?
Here's the whole test: it works with CLArrays on the CPU OpenCL driver, but not on the GPU OpenCL driver.

julia> using CLArrays

julia> struct WithTuple
                 a::Int32
                 b::Tuple{Int32,Int32}
             end 

julia> CLArrays.init(CLArrays.devices()[2]) 
OpenCL context with:
CL version: OpenCL 1.2 (Build 43)
Device: CL Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz
            threads: 8192
             blocks: (8192, 8192, 8192)
      global_memory: 16728.113152 mb
 free_global_memory: NaN mb
       local_memory: 0.032768 mb

julia> CLArray([WithTuple(1, (2,3))])
GPU: 1-element Array{WithTuple,1}:
 WithTuple(1, (2, 3))

julia> Base.:(+)(x::Integer, y::WithTuple) = x + y.b[2]   

julia> x = CLArray(Int32[1,2,3,4])                  
GPU: 4-element Array{Int32,1}:
 1
 2
 3
 4

julia> x .+  WithTuple(1, (2,3))      
GPU: 4-element Array{Int32,1}:
 4
 5
 6
 7

julia> CLArrays.init(CLArrays.devices()[1]) 
OpenCL context with:
CL version: OpenCL 1.2 beignet 1.4 (git-591d387)
Device: CL Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile
            threads: 512
             blocks: (512, 512, 512)
      global_memory: 2147.483648 mb
 free_global_memory: NaN mb
       local_memory: 0.065536 mb

julia> x = CLArray(Int32[1,2,3,4])
GPU: 4-element Array{Int32,1}:
 1
 2
 3
 4

julia> x .+  WithTuple(1, (2,3)) 
ERROR:     Julia and OpenCL type don't match at kernel argument 6: Found Tuple{CLArrays.DeviceArray{Int32,1,CLArrays.HostPtr{Int32}},WithTuple}. 
    [... same struct-alignment hint as in the error messages quoted above ...]

Stacktrace:
 [stack trace identical to the one in the previous comment]

Oddly enough, if I broadcast x .+ 3 it also works on the Iris graphics, and that's doing essentially the same operation. Is it something related to the transpiler?


rafaqz commented Jul 26, 2018

Yeah, that's the same behaviour I'm seeing: the simple case works fine, but a struct or tuple gives that error. I haven't tried it on an OpenCL CPU, but it's interesting that one works and the other doesn't.


rafaqz commented Jul 26, 2018

Building the CLArray is totally fine:

CLArray([WithTuple(1, (2,3))])                                                      
GPU: 1-element Array{WithTuple,1}:                                                         
 WithTuple(1, (2, 3)) 


davidbp commented Jul 26, 2018

Is Julia picking up the redefinition of "+" correctly?

@which  x .+  WithTuple(1, (2,3)) 
(::Base.##715#716)(a, b) in Base at deprecated.jl:354

@SimonDanisch (Member) commented

Oddly enough, if I broadcast x .+ 3 it also works on the Iris graphics

The problem is WithTuple, so if you don't use it, there is no problem, right?!

All your examples just work on all my GPUs. It's your beignet driver that seems to choose a different alignment for WithTuple, possibly due to a bug!

Building the CLArray is totally fine:
CLArray([WithTuple(1, (2,3))])

What I meant for you to try is something like:

x .+ CLArray(fill(WithTuple(1, (2,3)) , length(x)))


rafaqz commented Jul 26, 2018

This actually works:

julia> a = CLArray([WithTuple(1, (2,3))])                                                                                                                                              
GPU: 1-element Array{WithTuple,1}:                                                         
 WithTuple(1, (2, 3))                                                                      

julia> x .+ a                                                                              
GPU: 4-element Array{Int32,1}:                                                             
 4                                                                                         
 5                                                                                         
 6                                                                                         
 7  


rafaqz commented Jul 26, 2018

Or using your example:

julia> x .+ CLArray(fill(WithTuple(1, (2,3)) , length(x)))                                 
GPU: 4-element Array{Int32,1}:                                                             
 4                                                                                         
 3                                                                                         
 5                                                                                         
 4    

@SimonDanisch (Member) commented

Cool! :) So the tip from Stack Overflow actually works :-O This means it's a bug in uploading structs by value to GPU kernels, which seems to be ill-defined in OpenCL.


rafaqz commented Jul 26, 2018

Great to get that narrowed down!! Do you have the SO link? It would be good to understand what is happening.


davidbp commented Jul 26, 2018

https://stackoverflow.com/questions/15639197/passing-struct-to-gpu-with-opencl-that-contains-an-array-of-floats


rafaqz commented Jul 26, 2018

Thanks

@SimonDanisch (Member) commented

@davidbp sorry, I thought you were actually talking to me; I see now you just reproduced the failure :)


davidbp commented Jul 28, 2018

No problem. I hope OpenCL gets better support from vendors in the future, but I feel it's not going to happen. Maybe with Vulkan...
