In [1]:
using CLArrays
using GPUArrays

In [3]:
X =  CLArray(rand(Float32, 1000));

In [6]:
# Compare GPU arrays with isapprox by first copying them back to host memory
Base.isapprox(X::GPUArray, Y::GPUArray) = isapprox(Array(X), Array(Y))
Base.isapprox(X::Array, Y::GPUArray)    = isapprox(X, Array(Y))
Base.isapprox(X::GPUArray, Y::Array)    = isapprox(Array(X), Y)

In [7]:
isapprox(X,X)

true

## CLArrays.Devices


With CLArrays you can select the device on which arrays are stored and operated on.

The package is not ready to do this on a CPU; use a GPU, or otherwise fall back to standard Arrays.

In [8]:
devices = CLArrays.devices()

2-element Array{OpenCL.cl.Device,1}:
 OpenCL.Device(AMD Radeon Pro 580 Compute Engine on Apple @0x0000000001021c00)       
 OpenCL.Device(Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz on Apple @0x00000000ffffffff)

In [9]:
CLArrays.init(devices[1])

OpenCL context with:
CL version: OpenCL 1.2 
Device: CL AMD Radeon Pro 580 Compute Engine
            threads: 256
             blocks: (256, 256, 256)
      global_memory: 8589.934592 mb
 free_global_memory: NaN mb
       local_memory: 0.032768 mb


## CLArrays.CLArray

In [10]:
using CLArrays

In [11]:
X = rand(Float32, 1000);

In [12]:
sizeof(Array(X))

4000

In [13]:
Mem.current_allocated_mem[]

8000

In [14]:
X_gpu = CLArray(X);

Notice that `sizeof(X_gpu)` returns only the size of the pointer to the array allocated in device memory, not the size of the data:

In [15]:
sizeof(X_gpu)

16

To find out how much memory the array data occupies, copy it back to the CPU:

In [16]:
sizeof(Array(X_gpu))

4000
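
The same byte count can be computed from the length and element type, which avoids the device-to-host copy. A minimal sketch on a plain `Array`; the variable name `nbytes` is illustrative, and applying the same expression to `X_gpu` assumes that `length` and `eltype` work on a `CLArray` as they do on any `AbstractArray`:

```julia
X = rand(Float32, 1000)

# Number of elements times the size of one element, in bytes
nbytes = length(X) * sizeof(eltype(X))

# For a plain Array this matches sizeof(X)
println(nbytes == sizeof(X))
```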

In [17]:
Mem.current_allocated_mem[]

12000

In [18]:
Y_gpu = CLArray(rand(Float32, 1000));
Mem.current_allocated_mem[]

16000

In [19]:
# Notice this clears the symbol from the namespace, but the memory associated with it
# is not necessarily freed until GC takes place
clear!(:X_gpu)

In [20]:
Mem.current_allocated_mem[]

16000

## Operations on CLArrays

In [21]:
X  = rand(Float32, 2000,2000)
X_gpu = CLArray(X);

In [22]:
@time res1 = X * X;

  0.482350 seconds (245.57 k allocations: 26.727 MiB, 2.88% gc time)


In [23]:
@time res2 = X_gpu * X_gpu;

  1.206912 seconds (1.87 M allocations: 64.243 MiB, 6.05% gc time)


In [24]:
isapprox(res1, res2)

true
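
The `@time` results above come from the first call of each multiplication, so they likely include JIT compilation time (the large allocation counts point in that direction). A fairer comparison runs each operation once as a warm-up and times the second call. A minimal sketch on a small plain `Array`, with an illustrative variable name `t`; the same pattern should apply to `X_gpu * X_gpu`, although GPU work may also run asynchronously, which this sketch does not account for:

```julia
A = rand(Float32, 200, 200)

A * A               # warm-up call: triggers compilation of the method
t = @elapsed A * A  # second call: measures mostly the multiplication itself

println("elapsed seconds: ", t)
```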

In [25]:
placeholder = similar(X_gpu)
A_mul_B!(placeholder, X_gpu, X_gpu)

GPU: 2000×2000 Array{Float32,2}:
 506.631  506.116  498.83   505.705  …  503.192  513.416  497.145  497.992
 501.592  509.815  501.163  505.556     506.836  515.723  486.877  496.849
 503.923  517.568  499.204  510.654     511.311  520.884  497.265  510.314
 509.489  507.65   502.406  505.374     504.578  518.595  486.062  500.619
 515.931  522.089  503.239  519.882     511.49   525.842  498.806  504.497
 493.705  493.535  494.516  498.994  …  493.996  505.536  483.468  488.265
 491.414  497.474  481.038  496.413     488.849  499.804  475.888  479.88 
 497.547  493.959  482.245  496.976     489.811  507.436  481.075  487.96 
 503.938  504.999  491.126  504.626     499.125  514.643  492.453  490.259
 503.577  497.366  482.871  502.151     491.104  509.409  488.052  486.232
 485.118  492.888  481.287  490.346  …  490.21   496.001  474.55   477.822
 490.282  494.825  480.65   485.441     487.034  499.849  477.809  480.038
 496.929  505.322  490.93   501.832     498.354  510.137  490.226  

In [26]:
isapprox(res2, placeholder)

true
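
In Julia 0.7 and later, `A_mul_B!` was replaced by `mul!` from the `LinearAlgebra` standard library. A minimal sketch of the same in-place pattern on plain Arrays, assuming Julia 1.x; whether a given CLArrays release supports `mul!` is not confirmed here:

```julia
using LinearAlgebra  # provides mul! on Julia 1.x

A = Float32[1 2; 3 4]
out = similar(A)     # preallocated output with the same size and eltype as A
mul!(out, A, A)      # writes A * A into out instead of allocating a new result

println(out)
```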

### CLBLAS

We have access to equivalents of the `Base.LinAlg.BLAS` methods in `CLBLAS`.


In [47]:
?CLBLAS.gemm!

```
gemm!(tA, tB, alpha, A, B, beta, C)
```

Update `C` as `alpha*A*B + beta*C` or the other three variants according to [`tA`](@ref stdlib-blas-trans) and `tB`. Returns the updated `C`.


In [48]:
?BLAS.gemm!

```
gemm!(tA, tB, alpha, A, B, beta, C)
```

Update `C` as `alpha*A*B + beta*C` or the other three variants according to [`tA`](@ref stdlib-blas-trans) and `tB`. Returns the updated `C`.
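
To make the `alpha*A*B + beta*C` update concrete, here is a small sketch using the CPU `BLAS.gemm!` on plain Arrays. It assumes Julia 1.x, where BLAS lives in `LinearAlgebra` (on Julia 0.6 the module is `Base.LinAlg.BLAS`). Since the docstrings are identical, `CLBLAS.gemm!` presumably takes the same arguments with CLArrays, but that is an assumption and is not exercised here:

```julia
using LinearAlgebra  # Julia 1.x; on Julia 0.6 use Base.LinAlg.BLAS instead

A = Float32[1 2; 3 4]
B = Float32[5 6; 7 8]
C = ones(Float32, 2, 2)
C0 = copy(C)  # keep the original C to check the result

# 'N', 'N' means neither A nor B is transposed.
# alpha and beta must have the same element type as the matrices.
BLAS.gemm!('N', 'N', 2f0, A, B, 0.5f0, C)

# C now holds 2*A*B + 0.5*C0
println(C == 2f0 * A * B + 0.5f0 * C0)
```

Because `C` is overwritten in place, no new result matrix is allocated, which is the same motivation as the `A_mul_B!` call earlier.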


### Other methods