Several benchmarks fail when running on MacOS due to ambiguous functions #66
Comments
It seems that the OpenCL compiler on macOS cannot tell which overload of the function is being called. We might need to explicitly typecast the arguments for the different variants of these intrinsics. https://stackoverflow.com/questions/28851608/opencl-call-to-a-built-in-function-is-ambiguous We will work on this. |
Does this mean that no trigonometric functions can currently be used on macOS? |
(Same behaviour on Monterey):
|
Hi @yazun, if you run on Linux it should be OK to use the trigonometric functions. We have seen some inconsistencies when deploying benchmarks and unit tests on macOS, mainly because of the old version of the OpenCL drivers for this OS. For example, one of the latest issues we encountered is the lack of support for native trigonometric functions. This would affect you only if you run tests on your development setup. Is this the case? |
Yes, we see sin/cos work fine on Linux and all tests/benchmarks pass ok (see #145) with JDK17. |
I see. Based on the information about your setup it seems that your driver supports OpenCL 1.2. This version offers support for the native trigonometric functions based on the standard (Page 252). So, it should not be a problem for you. In my case, I have an older version of the driver and this is what causes the problem. Otherwise, the generated code from the TornadoVM JIT compiler should work as it respects the OpenCL standard. Can you please confirm if |
Good to hear it's not a limitation of the driver. I'm not sure how to affect their usage; I tried both
Then with native enabled
|
Specifying the device does not change much, e.g. for the discrete GPU (but it worked for OpenCL on the CPU)
|
So can the ambiguity come from the fact that there are two GPUs? Is it possible to target just one to avoid this problem? |
Can you print the kernel please, with The error that you get ( In this case, my next step would be to try an OpenCL kernel that uses the same function calls from C++, and see if the program will be built by the OpenCL driver. I do not believe that the problem is related to the number of GPUs. If you want to force execution on a particular device, you can do that by using this flag: |
Could you clarify which kernel should be included? Here is the run with native functions disabled:
tornado --printKernel -Dtornado.enable.nativeFunctions=False -m tornado.benchmarks/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner dft 8192 4096 --iterations 1
WARNING: Using incubator modules: jdk.incubator.foreign, jdk.incubator.vector
bm=dft-15-8192 , id=java-reference , average=3.694428e+09, median=3.697348e+09, firstIteration=3.965296e+09, best=3.548463e+09
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable
__kernel void lookupBufferAddress(__global uchar *_heap_base, uint _frame_base, __constant uchar *_constant_region, __local uchar *_local_region, __global int *_atomics)
{
__global ulong *_frame = (__global ulong *) &_heap_base[_frame_base];
// BLOCK 0
_frame[0] = (ulong) _heap_base;
} // kernel
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable
__kernel void computeDFT(__global uchar *_heap_base, uint _frame_base, __constant uchar *_constant_region, __local uchar *_local_region, __global int *_atomics)
{
ulong ul_21, ul_40, ul_41, ul_0, ul_1, ul_2, ul_3, ul_19;
long l_17, l_18, l_16, l_37, l_38, l_39;
double d_24, d_23, d_26, d_25, d_28, d_27, d_30, d_29, d_20, d_22, d_12, d_14, d_13, d_32, d_31, d_34, d_33, d_35;
int i_9, i_8, i_7, i_6, i_5, i_4, i_36, i_15, i_11, i_10, i_42;
__global ulong *_frame = (__global ulong *) &_heap_base[_frame_base];
// BLOCK 0
ul_0 = (ulong) _frame[3];
ul_1 = (ulong) _frame[4];
ul_2 = (ulong) _frame[5];
ul_3 = (ulong) _frame[6];
i_4 = get_global_size(0);
i_5 = i_4 + 8191;
i_6 = i_5 / i_4;
i_7 = get_global_id(0);
i_8 = i_6 * i_7;
i_9 = i_8 + i_6;
i_10 = min(i_9, 8192);
// BLOCK 1 MERGES [0 5 ]
i_11 = i_8;
for(;i_11 < i_10;)
{
// BLOCK 2
d_12 = (double) i_11;
// BLOCK 3 MERGES [2 4 ]
d_13 = 0.0;
d_14 = 0.0;
i_15 = 0;
for(;i_15 < 8192;)
{
// BLOCK 4
l_16 = (long) i_15;
l_17 = l_16 << 3;
l_18 = l_17 + 24L;
ul_19 = ul_0 + l_18;
d_20 = *((__global double *) ul_19);
ul_21 = ul_1 + l_18;
d_22 = *((__global double *) ul_21);
d_23 = (double) i_15;
d_24 = d_23 * 6.283185307179586;
d_25 = d_24 * d_12;
d_26 = d_25 / 8192.0;
d_27 = sin(d_26);
d_28 = -d_20;
d_29 = cos(d_26);
d_30 = d_29 * d_22;
d_31 = fma(d_27, d_28, d_30);
d_32 = d_14 + d_31;
d_33 = d_27 * d_22;
d_34 = fma(d_29, d_20, d_33);
d_35 = d_13 + d_34;
i_36 = i_15 + 1;
d_13 = d_35;
d_14 = d_32;
i_15 = i_36;
} // B4
// BLOCK 5
l_37 = (long) i_11;
l_38 = l_37 << 3;
l_39 = l_38 + 24L;
ul_40 = ul_2 + l_39;
*((__global double *) ul_40) = d_13;
ul_41 = ul_3 + l_39;
*((__global double *) ul_41) = d_14;
i_42 = i_11 + 1;
i_11 = i_42;
} // B5
// BLOCK 6
return;
} // kernel
bm=dft-15-8192 , device=0:0 , average=2.073738e+08, median=2.048341e+08, firstIteration=2.683069e+08, best=1.915486e+08, speedupAvg=17.8153, speedupMedian=18.0505, speedupFirstIteration=14.7790, CV=-0.0000%, deviceName= [Apple] -- Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
__kernel void lookupBufferAddress(__global uchar *_heap_base, uint _frame_base, __constant uchar *_constant_region, __local uchar *_local_region, __global int *_atomics)
{
__global ulong *_frame = (__global ulong *) &_heap_base[_frame_base];
// BLOCK 0
_frame[0] = (ulong) _heap_base;
} // kernel
__kernel void computeDFT(__global uchar *_heap_base, uint _frame_base, __constant uchar *_constant_region, __local uchar *_local_region, __global int *_atomics)
{
ulong ul_36, ul_14, ul_0, ul_16, ul_1, ul_2, ul_3, ul_35;
long l_13, l_11, l_12, l_33, l_34, l_32;
double d_24, d_23, d_26, d_25, d_28, d_27, d_30, d_29, d_15, d_18, d_17, d_20, d_19, d_22, d_21, d_8, d_7, d_9;
int i_6, i_5, i_37, i_4, i_31, i_10;
__global ulong *_frame = (__global ulong *) &_heap_base[_frame_base];
// BLOCK 0
ul_0 = (ulong) _frame[3];
ul_1 = (ulong) _frame[4];
ul_2 = (ulong) _frame[5];
ul_3 = (ulong) _frame[6];
i_4 = get_global_size(0);
i_5 = get_global_id(0);
// BLOCK 1 MERGES [0 5 ]
i_6 = i_5;
for(;i_6 < 8192;)
{
// BLOCK 2
d_7 = (double) i_6;
// BLOCK 3 MERGES [2 4 ]
d_8 = 0.0;
d_9 = 0.0;
i_10 = 0;
for(;i_10 < 8192;)
{
// BLOCK 4
l_11 = (long) i_10;
l_12 = l_11 << 3;
l_13 = l_12 + 24L;
ul_14 = ul_0 + l_13;
d_15 = *((__global double *) ul_14);
ul_16 = ul_1 + l_13;
d_17 = *((__global double *) ul_16);
d_18 = (double) i_10;
d_19 = d_18 * 6.283185307179586;
d_20 = d_19 * d_7;
d_21 = d_20 / 8192.0;
d_22 = sin(d_21);
d_23 = -d_15;
d_24 = cos(d_21);
d_25 = d_24 * d_17;
d_26 = fma(d_22, d_23, d_25);
d_27 = d_9 + d_26;
d_28 = d_22 * d_17;
d_29 = fma(d_24, d_15, d_28);
d_30 = d_8 + d_29;
i_31 = i_10 + 1;
d_8 = d_30;
d_9 = d_27;
i_10 = i_31;
} // B4
// BLOCK 5
l_32 = (long) i_6;
l_33 = l_32 << 3;
l_34 = l_33 + 24L;
ul_35 = ul_2 + l_34;
*((__global double *) ul_35) = d_8;
ul_36 = ul_3 + l_34;
*((__global double *) ul_36) = d_9;
i_37 = i_4 + i_6;
i_6 = i_37;
} // B5
// BLOCK 6
return;
} // kernel
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> [CL_DEVICE_NOT_AVAILABLE] : OpenCL Error : Error: Build Program driver returned (10015)
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> OpenCL Warning : clBuildProgram failed: could not build program for 0x1024500 (Intel(R) UHD Graphics 630) (err:-2)
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> [CL_BUILD_ERROR] : OpenCL Build Error : Compiler build log:
<program source>:42:16: error: call to '__fast_relax_sin' is ambiguous
d_22 = sin(d_21);
^~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4765:22: note: expanded from macro 'sin'
#define sin(__x) __fast_relax_sin(__x)
^~~~~~~~~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
__CLFN_FD_1FD_FAST_RELAX(__fast_relax_sin, native_sin, __cl_sin);
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:417:27: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float __OVERLOAD__ _name(float x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:418:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float2 __OVERLOAD__ _name(float2 x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:419:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float3 __OVERLOAD__ _name(float3 x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:420:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float4 __OVERLOAD__ _name(float4 x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:421:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float8 __OVERLOAD__ _name(float8 x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:422:29: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float16 __OVERLOAD__ _name(float16 x){ return _default_name(x); }
^
<program source>:44:16: error: call to '__fast_relax_cos' is ambiguous
d_24 = cos(d_21);
^~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4758:22: note: expanded from macro 'cos'
#define cos(__x) __fast_relax_cos(__x)
^~~~~~~~~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
__CLFN_FD_1FD_FAST_RELAX(__fast_relax_cos, native_cos, __cl_cos);
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:417:27: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float __OVERLOAD__ _name(float x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:418:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float2 __OVERLOAD__ _name(float2 x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:419:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float3 __OVERLOAD__ _name(float3 x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:420:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float4 __OVERLOAD__ _name(float4 x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:421:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float8 __OVERLOAD__ _name(float8 x) { return _default_name(x); } \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:422:29: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float16 __OVERLOAD__ _name(float16 x){ return _default_name(x); }
^
<program source>:46:16: error: call to '__cl_fma' is ambiguous
d_26 = fma(d_22, d_23, d_25);
^~~~~~~~~~~~~~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4590:32: note: expanded from macro 'fma'
#define fma(__x, __y, __z) __cl_fma(__x, __y, __z)
^~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
__CLFN_FD_3FD(__cl_fma);
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:987:48: note: expanded from macro '__CLFN_FD_3FD'
#define __CLFN_FD_3FD(name) float __OVERLOAD__ name(float x, float y, float z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:988:21: note: expanded from macro '__CLFN_FD_3FD'
float2 __OVERLOAD__ name(float2 x, float2 y, float2 z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:989:21: note: expanded from macro '__CLFN_FD_3FD'
float3 __OVERLOAD__ name(float3 x, float3 y, float3 z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:990:21: note: expanded from macro '__CLFN_FD_3FD'
float4 __OVERLOAD__ name(float4 x, float4 y, float4 z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:991:21: note: expanded from macro '__CLFN_FD_3FD'
float8 __OVERLOAD__ name(float8 x, float8 y, float8 z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:992:22: note: expanded from macro '__CLFN_FD_3FD'
float16 __OVERLOAD__ name(float16 x, float16 y, float16 z);
^
<program source>:49:16: error: call to '__cl_fma' is ambiguous
d_29 = fma(d_24, d_15, d_28);
^~~~~~~~~~~~~~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4590:32: note: expanded from macro 'fma'
#define fma(__x, __y, __z) __cl_fma(__x, __y, __z)
^~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
__CLFN_FD_3FD(__cl_fma);
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:987:48: note: expanded from macro '__CLFN_FD_3FD'
#define __CLFN_FD_3FD(name) float __OVERLOAD__ name(float x, float y, float z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:988:21: note: expanded from macro '__CLFN_FD_3FD'
float2 __OVERLOAD__ name(float2 x, float2 y, float2 z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:989:21: note: expanded from macro '__CLFN_FD_3FD'
float3 __OVERLOAD__ name(float3 x, float3 y, float3 z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:990:21: note: expanded from macro '__CLFN_FD_3FD'
float4 __OVERLOAD__ name(float4 x, float4 y, float4 z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:991:21: note: expanded from macro '__CLFN_FD_3FD'
float8 __OVERLOAD__ name(float8 x, float8 y, float8 z); \
^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:992:22: note: expanded from macro '__CLFN_FD_3FD'
float16 __OVERLOAD__ name(float16 x, float16 y, float16 z);
^
[TornadoVM-OCL-JNI] ERROR : clBuildProgram -> Returned: -11
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> [CL_INVALID_VALUE] : OpenCL Error : clGetProgramBuildInfo failed: return buffer size (8192 bytes) was too small to hold the result: 11019 bytes
[TornadoVM-OCL-JNI] ERROR : clGetProgramBuildInfo -> Returned: -30
Error during code compilation with the OpenCL driver
uk.ac.manchester.tornado.api.exceptions.TornadoBailoutRuntimeException: Unable to compile task task benchmark.t0 - computeDFT
[tornado.drivers.opencl@0.13-dev/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.compileTask(OCLTornadoDevice.java:289), tornado.drivers.opencl@0.13-dev/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.compileJavaToAccelerator(OCLTornadoDevice.java:321), tornado.drivers.opencl@0.13-dev/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.installCode(OCLTornadoDevice.java:449), tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.compileTaskFromBytecodeToBinary(TornadoVM.java:467), tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.execute(TornadoVM.java:743), tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.warmup(TornadoVM.java:218), tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskSchedule.warmup(TornadoTaskSchedule.java:748), tornado.api@0.13-dev/uk.ac.manchester.tornado.api.TaskSchedule.warmup(TaskSchedule.java:326), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.dft.DFTTornado.setUp(DFTTornado.java:58), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.benchmark(BenchmarkDriver.java:94), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:120), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)]
at tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.compileTaskFromBytecodeToBinary(TornadoVM.java:471)
at tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.execute(TornadoVM.java:743)
at tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.warmup(TornadoVM.java:218)
at tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskSchedule.warmup(TornadoTaskSchedule.java:748)
at tornado.api@0.13-dev/uk.ac.manchester.tornado.api.TaskSchedule.warmup(TaskSchedule.java:326)
at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.dft.DFTTornado.setUp(DFTTornado.java:58)
at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.benchmark(BenchmarkDriver.java:94)
at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:120)
at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86)
at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)
Exception in thread "main" java.lang.NullPointerException: Cannot read the array length because "arr" is null
at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getAverage(BenchmarkDriver.java:205)
at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getAverage(BenchmarkDriver.java:215)
at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getPreciseSummary(BenchmarkDriver.java:248)
at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:129)
at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86)
at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190) |
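The two computeDFT kernels printed above differ mainly in how work-items are mapped to output bins: the first one (apparently built for the CPU device) gives each work-item a contiguous chunk of indices, while the second one (the one the Intel UHD Graphics 630 driver rejects) strides through the indices by the global size. As a rough illustration only, the plain-Java sketch below mirrors the arithmetic of both loops and checks that the two mappings give the same result. The class and method names (`DftPartitionSketch`, `bin`, `runBlockWise`, `runStrided`) and the simulated `globalSize` are assumptions made for this example; this is not TornadoVM code, and it needs Java 9 or later for `Math.fma`.

```java
import java.util.Arrays;

// Hypothetical sketch: plain Java that imitates the two loop shapes seen in the printed kernels.
final class DftPartitionSketch {
    // Same constant as in the printed kernels (2 * pi).
    static final double TWO_PI = 6.283185307179586;

    // Computes one output bin k, following the inner loop (BLOCK 4) of computeDFT.
    static void bin(int k, double[] inReal, double[] inImag, double[] outReal, double[] outImag) {
        int n = inReal.length;
        double sumReal = 0.0;
        double sumImag = 0.0;
        for (int t = 0; t < n; t++) {
            double angle = t * TWO_PI * k / n;
            double s = Math.sin(angle);
            double c = Math.cos(angle);
            // Same fused multiply-add structure as the fma(...) calls in the kernel.
            sumImag += Math.fma(s, -inReal[t], c * inImag[t]);
            sumReal += Math.fma(c, inReal[t], s * inImag[t]);
        }
        outReal[k] = sumReal;
        outImag[k] = sumImag;
    }

    // Block-wise mapping: each simulated work-item handles one contiguous chunk.
    static void runBlockWise(int globalSize, double[] inReal, double[] inImag,
                             double[] outReal, double[] outImag) {
        int n = inReal.length;
        int chunk = (globalSize + n - 1) / globalSize; // ceil(n / globalSize)
        for (int id = 0; id < globalSize; id++) {
            int start = chunk * id;
            int end = Math.min(start + chunk, n);
            for (int k = start; k < end; k++) {
                bin(k, inReal, inImag, outReal, outImag);
            }
        }
    }

    // Strided mapping: each simulated work-item starts at its id and steps by globalSize.
    static void runStrided(int globalSize, double[] inReal, double[] inImag,
                           double[] outReal, double[] outImag) {
        int n = inReal.length;
        for (int id = 0; id < globalSize; id++) {
            for (int k = id; k < n; k += globalSize) {
                bin(k, inReal, inImag, outReal, outImag);
            }
        }
    }

    public static void main(String[] args) {
        int n = 64;
        int globalSize = 10; // deliberately not a divisor of n
        double[] inReal = new double[n];
        double[] inImag = new double[n];
        for (int t = 0; t < n; t++) {
            inReal[t] = Math.sin(0.1 * t);
            inImag[t] = 0.5 * t / n;
        }
        double[] r1 = new double[n];
        double[] i1 = new double[n];
        double[] r2 = new double[n];
        double[] i2 = new double[n];
        runBlockWise(globalSize, inReal, inImag, r1, i1);
        runStrided(globalSize, inReal, inImag, r2, i2);
        boolean same = Arrays.equals(r1, r2) && Arrays.equals(i1, i2);
        System.out.println("block-wise and strided mappings agree: " + same);
    }
}
```

Since both mappings visit every index exactly once and call the same per-bin routine, the difference between the two kernels is only in scheduling. The math calls (`sin`, `cos`, `fma` on `double` values) are the same in both, and those are the calls the build log complains about.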
And the native kernel:
tornado --printKernel -Dtornado.enable.nativeFunctions=True -m tornado.benchmarks/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner dft 8192 4096 --iterations 1
WARNING: Using incubator modules: jdk.incubator.foreign, jdk.incubator.vector
bm=dft-15-8192 , id=java-reference , average=3.819124e+09, median=3.791125e+09, firstIteration=4.083596e+09, best=3.519395e+09
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable
__kernel void lookupBufferAddress(__global uchar *_heap_base, uint _frame_base, __constant uchar *_constant_region, __local uchar *_local_region, __global int *_atomics)
{
__global ulong *_frame = (__global ulong *) &_heap_base[_frame_base];
// BLOCK 0
_frame[0] = (ulong) _heap_base;
} // kernel
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable
__kernel void computeDFT(__global uchar *_heap_base, uint _frame_base, __constant uchar *_constant_region, __local uchar *_local_region, __global int *_atomics)
{
ulong ul_21, ul_40, ul_41, ul_0, ul_1, ul_2, ul_3, ul_19;
long l_17, l_18, l_16, l_37, l_38, l_39;
double d_24, d_23, d_26, d_25, d_28, d_27, d_30, d_29, d_20, d_22, d_12, d_14, d_13, d_32, d_31, d_34, d_33, d_35;
int i_9, i_8, i_7, i_6, i_5, i_4, i_36, i_15, i_11, i_10, i_42;
__global ulong *_frame = (__global ulong *) &_heap_base[_frame_base];
// BLOCK 0
ul_0 = (ulong) _frame[3];
ul_1 = (ulong) _frame[4];
ul_2 = (ulong) _frame[5];
ul_3 = (ulong) _frame[6];
i_4 = get_global_size(0);
i_5 = i_4 + 8191;
i_6 = i_5 / i_4;
i_7 = get_global_id(0);
i_8 = i_6 * i_7;
i_9 = i_8 + i_6;
i_10 = min(i_9, 8192);
// BLOCK 1 MERGES [0 5 ]
i_11 = i_8;
for(;i_11 < i_10;)
{
// BLOCK 2
d_12 = (double) i_11;
// BLOCK 3 MERGES [2 4 ]
d_13 = 0.0;
d_14 = 0.0;
i_15 = 0;
for(;i_15 < 8192;)
{
// BLOCK 4
l_16 = (long) i_15;
l_17 = l_16 << 3;
l_18 = l_17 + 24L;
ul_19 = ul_0 + l_18;
d_20 = *((__global double *) ul_19);
ul_21 = ul_1 + l_18;
d_22 = *((__global double *) ul_21);
d_23 = (double) i_15;
d_24 = d_23 * 6.283185307179586;
d_25 = d_24 * d_12;
d_26 = d_25 / 8192.0;
d_27 = native_sin(d_26);
d_28 = -d_20;
d_29 = native_cos(d_26);
d_30 = d_29 * d_22;
d_31 = fma(d_27, d_28, d_30);
d_32 = d_14 + d_31;
d_33 = d_27 * d_22;
d_34 = fma(d_29, d_20, d_33);
d_35 = d_13 + d_34;
i_36 = i_15 + 1;
d_13 = d_35;
d_14 = d_32;
i_15 = i_36;
} // B4
// BLOCK 5
l_37 = (long) i_11;
l_38 = l_37 << 3;
l_39 = l_38 + 24L;
ul_40 = ul_2 + l_39;
*((__global double *) ul_40) = d_13;
ul_41 = ul_3 + l_39;
*((__global double *) ul_41) = d_14;
i_42 = i_11 + 1;
i_11 = i_42;
} // B5
// BLOCK 6
return;
} // kernel
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> [CL_DEVICE_NOT_AVAILABLE] : OpenCL Error : Error: build program driver returned (-1)
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> OpenCL Warning : clBuildProgram failed: could not build program for 0xffffffff (Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz) (err:-1)
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> [CL_BUILD_ERROR] : OpenCL Build Error : Compiler build log:
kernel referenced external symbol '_Z10native_cosd' which could not be found.
kernel referenced external symbol '_Z10native_sind' which could not be found.
[TornadoVM-OCL-JNI] ERROR : clBuildProgram -> Returned: -11
Error during code compilation with the OpenCL driver
uk.ac.manchester.tornado.api.exceptions.TornadoBailoutRuntimeException: Unable to compile task task benchmark.t0 - computeDFT
[tornado.drivers.opencl@0.13-dev/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.compileTask(OCLTornadoDevice.java:289), tornado.drivers.opencl@0.13-dev/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.compileJavaToAccelerator(OCLTornadoDevice.java:321), tornado.drivers.opencl@0.13-dev/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.installCode(OCLTornadoDevice.java:449), tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.compileTaskFromBytecodeToBinary(TornadoVM.java:467), tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.execute(TornadoVM.java:743), tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.warmup(TornadoVM.java:218), tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskSchedule.warmup(TornadoTaskSchedule.java:748), tornado.api@0.13-dev/uk.ac.manchester.tornado.api.TaskSchedule.warmup(TaskSchedule.java:326), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.dft.DFTTornado.setUp(DFTTornado.java:58), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.benchmark(BenchmarkDriver.java:94), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:120), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)]
at tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.compileTaskFromBytecodeToBinary(TornadoVM.java:471)
at tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.execute(TornadoVM.java:743)
at tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.warmup(TornadoVM.java:218)
at tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskSchedule.warmup(TornadoTaskSchedule.java:748)
at tornado.api@0.13-dev/uk.ac.manchester.tornado.api.TaskSchedule.warmup(TaskSchedule.java:326)
at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.dft.DFTTornado.setUp(DFTTornado.java:58)
at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.benchmark(BenchmarkDriver.java:94)
at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:120)
at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86)
at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)
Exception in thread "main" java.lang.NullPointerException: Cannot read the array length because "arr" is null
at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getAverage(BenchmarkDriver.java:205)
at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getAverage(BenchmarkDriver.java:215)
at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getPreciseSummary(BenchmarkDriver.java:248)
at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:129)
at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86)
at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190) |
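The missing symbols in the build log above, `_Z10native_cosd` and `_Z10native_sind`, look like Itanium-style mangled names. Read that way, the trailing `d` says the kernel asked for `native_sin(double)` and `native_cos(double)`. As far as I can tell from the OpenCL 1.2 specification, the `native_` variants are defined for `float` types only, which would be consistent with the driver not finding a `double` version. The sketch below decodes only the simplest form of such a name (an unqualified function name followed by built-in scalar parameter codes). `MangledNameSketch` and `demangle` are hypothetical names for this example, not part of TornadoVM or any OpenCL tooling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: decodes names of the form _Z<length><name><builtin type codes>.
final class MangledNameSketch {
    // A subset of the Itanium C++ ABI codes for built-in types.
    private static final Map<Character, String> BUILTIN = Map.of(
            'd', "double",
            'f', "float",
            'i', "int",
            'j', "unsigned int",
            'l', "long",
            'm', "unsigned long",
            'c', "char",
            'h', "unsigned char",
            's', "short",
            't', "unsigned short");

    static String demangle(String symbol) {
        if (!symbol.startsWith("_Z")) {
            throw new IllegalArgumentException("not a mangled name: " + symbol);
        }
        int pos = 2;
        int digitsStart = pos;
        int length = 0;
        // Read the decimal length prefix of the function name.
        while (pos < symbol.length() && Character.isDigit(symbol.charAt(pos))) {
            length = length * 10 + (symbol.charAt(pos) - '0');
            pos++;
        }
        if (pos == digitsStart || pos + length > symbol.length()) {
            throw new IllegalArgumentException("unsupported or malformed name: " + symbol);
        }
        String name = symbol.substring(pos, pos + length);
        pos += length;
        // Every remaining character is treated as one built-in parameter type code.
        List<String> params = new ArrayList<>();
        for (; pos < symbol.length(); pos++) {
            String type = BUILTIN.get(symbol.charAt(pos));
            if (type == null) {
                throw new IllegalArgumentException("unsupported type code: " + symbol.charAt(pos));
            }
            params.add(type);
        }
        return name + "(" + String.join(", ", params) + ")";
    }

    public static void main(String[] args) {
        System.out.println(demangle("_Z10native_sind")); // prints native_sin(double)
        System.out.println(demangle("_Z10native_cosd")); // prints native_cos(double)
    }
}
```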
I remembered that OpenCL has been a dead end for Macs (e.g. https://www.reddit.com/r/OpenCL/comments/qevdbg/opencl_30_on_macos/). Something to keep in mind during the next round of purchases for any scientific team. Nevertheless, it seems we can work on OpenCL with |
Yes, this is unfortunate news. |
Describe the bug
Running tornado-benchmarks.py gives failures with calls to ambiguous functions:
nbody : call to '__cl_sqrt' is ambiguous
dgemm: call to '__cl_fma' is ambiguous
dft: call to '__fast_relax_sin' is ambiguous
call to '__fast_relax_cos' is ambiguous
There are a couple of other failures (see output), but I thought I would limit this bug report to the ambiguous function failures.
How To Reproduce
Run tornado-benchmarks.py
Expected behavior
All benchmarks should run
benchmark.txt
Computing system setup (please complete the following information):
Additional context