Several benchmarks fail when running on MacOS due to ambiguous functions #66

Open
brucejohnson opened this issue Feb 19, 2021 · 14 comments
Labels
bug Something isn't working

Comments

@brucejohnson

Describe the bug
Running tornado-benchmarks.py gives failures with calls to ambiguous functions:

nbody: call to '__cl_sqrt' is ambiguous
dgemm: call to '__cl_fma' is ambiguous
dft: call to '__fast_relax_sin' is ambiguous
dft: call to '__fast_relax_cos' is ambiguous

There are a couple of other failures (see the output), but I thought I would limit this bug report to the ambiguous-function failures.

How To Reproduce
Run tornado-benchmarks.py

Expected behavior
All benchmarks should run
benchmark.txt

Computing system setup (please complete the following information):

  • OS: macOS Catalina 10.15.7, MacBook Pro, 6-core Intel i7, 16 GB memory, Radeon Pro 560X 4 GB
  • OpenCL Version : 1.2
  • TornadoVM commit id : e8c3161

Additional context

@jjfumero
Member

It seems that the OpenCL compiler on macOS cannot determine which overload of the function is being called. We might need to explicitly typecast the arguments for the different variants of these intrinsics.

https://stackoverflow.com/questions/28851608/opencl-call-to-a-built-in-function-is-ambiguous

We will work on this.

@jjfumero added the bug label on Feb 22, 2021
@yazun
Contributor

yazun commented Nov 26, 2021

Does this mean that no trigonometric functions can currently be used on macOS?
We develop on macOS and run on Linux, so it would be nice to have a workaround if one exists.

@yazun
Contributor

yazun commented Nov 26, 2021

(Same behaviour on Monterey):

 tornado --devices
WARNING: Using incubator modules: jdk.incubator.foreign, jdk.incubator.vector

Number of Tornado drivers: 1
Driver: OpenCL
  Total number of OpenCL devices  : 3
  Tornado device=0:0
	OpenCL --  [Apple] -- Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
		Global Memory Size: 32.0 GB
		Local Memory Size: 32.0 KB
		Workgroup Dimensions: 3
		Total Number of Block Threads: 1024
		Max WorkGroup Configuration: [1024, 1, 1]
		Device OpenCL C version: OpenCL C 1.2

  Tornado device=0:1
	OpenCL --  [Apple] -- Intel(R) UHD Graphics 630
		Global Memory Size: 1.5 GB
		Local Memory Size: 64.0 KB
		Workgroup Dimensions: 3
		Total Number of Block Threads: 256
		Max WorkGroup Configuration: [256, 256, 256]
		Device OpenCL C version: OpenCL C 1.2

  Tornado device=0:2
	OpenCL --  [Apple] -- AMD Radeon Pro 5500M Compute Engine
		Global Memory Size: 8.0 GB
		Local Memory Size: 64.0 KB
		Workgroup Dimensions: 3
		Total Number of Block Threads: 256
		Max WorkGroup Configuration: [256, 256, 256]
		Device OpenCL C version: OpenCL C 1.2

@stratika
Collaborator

> Does this mean that no trigonometric functions can currently be used on macOS? We develop on macOS and run on Linux, so it would be nice to have a workaround if one exists.

Hi @yazun, if you run on Linux it should be ok to use the trigonometric functions.

We have seen some inconsistencies when running benchmarks and unit tests on macOS, mainly because of the old OpenCL drivers shipped for this OS. For example, one of the latest issues we encountered is the lack of support for native trigonometric functions. This would affect you only if you run tests on your development setup. Is this the case?

@yazun
Contributor

yazun commented Nov 26, 2021

Yes, we see sin/cos work fine on Linux and all tests/benchmarks pass ok (see #145) with JDK17.
Since we rely heavily on the trigonometric support it is an issue for development and indeed affects us.

@stratika
Collaborator

> Yes, we see sin/cos work fine on Linux and all tests/benchmarks pass ok (see #145) with JDK17. Since we rely heavily on the trigonometric support it is an issue for development and indeed affects us.

I see. Based on the information about your setup, your driver supports OpenCL 1.2. According to the standard (page 252), this version supports the native trigonometric functions, so it should not be a problem for you.

In my case, I have an older version of the driver, and this is what causes the problem. Otherwise, the code generated by the TornadoVM JIT compiler should work, as it respects the OpenCL standard.

Can you please confirm whether sin/cos or native_sin/native_cos fail on your macOS platform?

@yazun
Contributor

yazun commented Nov 26, 2021

Good to hear it's not the limitation of the driver.

I'm not sure how to select which variant is used. I tried both settings, -Dtornado.enable.nativeFunctions=[False|True]. First with native functions disabled:

tornado -Dtornado.enable.nativeFunctions=False -m tornado.benchmarks/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner dft 8192 4096 --iterations 1
WARNING: Using incubator modules: jdk.incubator.vector, jdk.incubator.foreign
bm=dft-15-8192    , id=java-reference      , average=3.876110e+09, median=3.855491e+09, firstIteration=4.149909e+09, best=3.756110e+09
bm=dft-15-8192    , device=0:0  , average=2.482360e+08, median=2.457139e+08, firstIteration=3.457622e+08, best=2.258864e+08, speedupAvg=15.6146, speedupMedian=15.6910, speedupFirstIteration=12.0022, CV=-0.0000%, deviceName= [Apple] -- Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> [CL_DEVICE_NOT_AVAILABLE] : OpenCL Error : Error: Build Program driver returned (10015)
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> OpenCL Warning : clBuildProgram failed: could not build program for 0x1024500 (Intel(R) UHD Graphics 630) (err:-2)
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> [CL_BUILD_ERROR] : OpenCL Build Error : Compiler build log:
<program source>:42:16: error: call to '__fast_relax_sin' is ambiguous
      d_22  =  sin(d_21);
               ^~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4765:22: note: expanded from macro 'sin'
    #define sin(__x) __fast_relax_sin(__x)
                     ^~~~~~~~~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
    __CLFN_FD_1FD_FAST_RELAX(__fast_relax_sin, native_sin, __cl_sin);
                             ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:417:27: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float __OVERLOAD__ _name(float x) { return _default_name(x); } \
                          ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:418:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float2 __OVERLOAD__ _name(float2 x) { return _default_name(x); } \
                           ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:419:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float3 __OVERLOAD__ _name(float3 x) { return _default_name(x); } \
                           ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:420:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float4 __OVERLOAD__ _name(float4 x) { return _default_name(x); } \
                           ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:421:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float8 __OVERLOAD__ _name(float8 x) { return _default_name(x); } \
                           ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:422:29: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float16 __OVERLOAD__ _name(float16 x){ return _default_name(x); }
                            ^
<program source>:44:16: error: call to '__fast_relax_cos' is ambiguous
      d_24  =  cos(d_21);
               ^~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4758:22: note: expanded from macro 'cos'
    #define cos(__x) __fast_relax_cos(__x)
                     ^~~~~~~~~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
    __CLFN_FD_1FD_FAST_RELAX(__fast_relax_cos, native_cos, __cl_cos);
                             ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:417:27: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float __OVERLOAD__ _name(float x) { return _default_name(x); } \
                          ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:418:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float2 __OVERLOAD__ _name(float2 x) { return _default_name(x); } \
                           ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:419:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float3 __OVERLOAD__ _name(float3 x) { return _default_name(x); } \
                           ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:420:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float4 __OVERLOAD__ _name(float4 x) { return _default_name(x); } \
                           ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:421:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float8 __OVERLOAD__ _name(float8 x) { return _default_name(x); } \
                           ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:422:29: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float16 __OVERLOAD__ _name(float16 x){ return _default_name(x); }
                            ^
<program source>:46:16: error: call to '__cl_fma' is ambiguous
      d_26  =  fma(d_22, d_23, d_25);
               ^~~~~~~~~~~~~~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4590:32: note: expanded from macro 'fma'
    #define fma(__x, __y, __z) __cl_fma(__x, __y, __z)
                               ^~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
__CLFN_FD_3FD(__cl_fma);
              ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:987:48: note: expanded from macro '__CLFN_FD_3FD'
#define __CLFN_FD_3FD(name) float __OVERLOAD__ name(float x, float y, float z); \
                                               ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:988:21: note: expanded from macro '__CLFN_FD_3FD'
float2 __OVERLOAD__ name(float2 x, float2 y, float2 z); \
                    ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:989:21: note: expanded from macro '__CLFN_FD_3FD'
float3 __OVERLOAD__ name(float3 x, float3 y, float3 z); \
                    ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:990:21: note: expanded from macro '__CLFN_FD_3FD'
float4 __OVERLOAD__ name(float4 x, float4 y, float4 z); \
                    ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:991:21: note: expanded from macro '__CLFN_FD_3FD'
float8 __OVERLOAD__ name(float8 x, float8 y, float8 z); \
                    ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:992:22: note: expanded from macro '__CLFN_FD_3FD'
float16 __OVERLOAD__ name(float16 x, float16 y, float16 z);
                     ^
<program source>:49:16: error: call to '__cl_fma' is ambiguous
      d_29  =  fma(d_24, d_15, d_28);
               ^~~~~~~~~~~~~~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4590:32: note: expanded from macro 'fma'
    #define fma(__x, __y, __z) __cl_fma(__x, __y, __z)
                               ^~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
__CLFN_FD_3FD(__cl_fma);
              ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:987:48: note: expanded from macro '__CLFN_FD_3FD'
#define __CLFN_FD_3FD(name) float __OVERLOAD__ name(float x, float y, float z); \
                                               ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:988:21: note: expanded from macro '__CLFN_FD_3FD'
float2 __OVERLOAD__ name(float2 x, float2 y, float2 z); \
                    ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:989:21: note: expanded from macro '__CLFN_FD_3FD'
float3 __OVERLOAD__ name(float3 x, float3 y, float3 z); \
                    ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:990:21: note: expanded from macro '__CLFN_FD_3FD'
float4 __OVERLOAD__ name(float4 x, float4 y, float4 z); \
                    ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:991:21: note: expanded from macro '__CLFN_FD_3FD'
float8 __OVERLOAD__ name(float8 x, float8 y, float8 z); \
                    ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:992:22: note: expanded from macro '__CLFN_FD_3FD'
float16 __OVERLOAD__ name(float16 x, float16 y, float16 z);
                     ^


[TornadoVM-OCL-JNI] ERROR : clBuildProgram -> Returned: -11
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> [CL_INVALID_VALUE] : OpenCL Error : clGetProgramBuildInfo failed: return buffer size (8192 bytes) was too small to hold the result: 11019 bytes
[TornadoVM-OCL-JNI] ERROR : clGetProgramBuildInfo -> Returned: -30
Error during code compilation with the OpenCL driver
uk.ac.manchester.tornado.api.exceptions.TornadoBailoutRuntimeException: Unable to compile task task benchmark.t0 - computeDFT
[tornado.drivers.opencl@0.13-dev/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.compileTask(OCLTornadoDevice.java:289), tornado.drivers.opencl@0.13-dev/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.compileJavaToAccelerator(OCLTornadoDevice.java:321), tornado.drivers.opencl@0.13-dev/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.installCode(OCLTornadoDevice.java:449), tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.compileTaskFromBytecodeToBinary(TornadoVM.java:467), tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.execute(TornadoVM.java:743), tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.warmup(TornadoVM.java:218), tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskSchedule.warmup(TornadoTaskSchedule.java:748), tornado.api@0.13-dev/uk.ac.manchester.tornado.api.TaskSchedule.warmup(TaskSchedule.java:326), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.dft.DFTTornado.setUp(DFTTornado.java:58), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.benchmark(BenchmarkDriver.java:94), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:120), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)]
	at tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.compileTaskFromBytecodeToBinary(TornadoVM.java:471)
	at tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.execute(TornadoVM.java:743)
	at tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.warmup(TornadoVM.java:218)
	at tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskSchedule.warmup(TornadoTaskSchedule.java:748)
	at tornado.api@0.13-dev/uk.ac.manchester.tornado.api.TaskSchedule.warmup(TaskSchedule.java:326)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.dft.DFTTornado.setUp(DFTTornado.java:58)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.benchmark(BenchmarkDriver.java:94)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:120)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)
Exception in thread "main" java.lang.NullPointerException: Cannot read the array length because "arr" is null
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getAverage(BenchmarkDriver.java:205)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getAverage(BenchmarkDriver.java:215)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getPreciseSummary(BenchmarkDriver.java:248)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:129)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)
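
The build log above repeats the same diagnostic once per overload candidate, so the distinct failing symbols are easy to lose. A minimal Java sketch, assuming the log text is available as a string, that lists them; `AmbiguousCallScanner` is a hypothetical helper and not part of TornadoVM.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a TornadoVM class): collects the distinct symbols
// that an OpenCL build log reports as ambiguous calls.
final class AmbiguousCallScanner {
    // Matches clang diagnostics of the form:
    //   error: call to '<symbol>' is ambiguous
    private static final Pattern AMBIGUOUS =
            Pattern.compile("error: call to '([^']+)' is ambiguous");

    static Set<String> ambiguousSymbols(String buildLog) {
        Set<String> symbols = new LinkedHashSet<>(); // keeps first-seen order
        Matcher matcher = AMBIGUOUS.matcher(buildLog);
        while (matcher.find()) {
            symbols.add(matcher.group(1));
        }
        return symbols;
    }

    public static void main(String[] args) {
        // Error lines copied from the build log in this thread.
        String log = String.join("\n",
                "<program source>:42:16: error: call to '__fast_relax_sin' is ambiguous",
                "<program source>:44:16: error: call to '__fast_relax_cos' is ambiguous",
                "<program source>:46:16: error: call to '__cl_fma' is ambiguous",
                "<program source>:49:16: error: call to '__cl_fma' is ambiguous");
        System.out.println(ambiguousSymbols(log));
    }
}
```

For this log the distinct symbols are `__fast_relax_sin`, `__fast_relax_cos`, and `__cl_fma`, which the macro expansions in the log show to be the names behind `sin`, `cos`, and `fma` in Apple's `cl_kernel.h`.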

Then with native enabled

tornado -Dtornado.enable.nativeFunctions=True -m tornado.benchmarks/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner dft 8192 4096 --iterations 1
WARNING: Using incubator modules: jdk.incubator.vector, jdk.incubator.foreign
bm=dft-15-8192    , id=java-reference      , average=3.759722e+09, median=3.715212e+09, firstIteration=4.117996e+09, best=3.579663e+09
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> [CL_DEVICE_NOT_AVAILABLE] : OpenCL Error : Error: build program driver returned (-1)
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> OpenCL Warning : clBuildProgram failed: could not build program for 0xffffffff (Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz) (err:-1)
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> [CL_BUILD_ERROR] : OpenCL Build Error : Compiler build log:
kernel referenced external symbol '_Z10native_cosd' which could not be found.
kernel referenced external symbol '_Z10native_sind' which could not be found.


[TornadoVM-OCL-JNI] ERROR : clBuildProgram -> Returned: -11
Error during code compilation with the OpenCL driver
uk.ac.manchester.tornado.api.exceptions.TornadoBailoutRuntimeException: Unable to compile task task benchmark.t0 - computeDFT
[tornado.drivers.opencl@0.13-dev/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.compileTask(OCLTornadoDevice.java:289), tornado.drivers.opencl@0.13-dev/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.compileJavaToAccelerator(OCLTornadoDevice.java:321), tornado.drivers.opencl@0.13-dev/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.installCode(OCLTornadoDevice.java:449), tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.compileTaskFromBytecodeToBinary(TornadoVM.java:467), tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.execute(TornadoVM.java:743), tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.warmup(TornadoVM.java:218), tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskSchedule.warmup(TornadoTaskSchedule.java:748), tornado.api@0.13-dev/uk.ac.manchester.tornado.api.TaskSchedule.warmup(TaskSchedule.java:326), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.dft.DFTTornado.setUp(DFTTornado.java:58), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.benchmark(BenchmarkDriver.java:94), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:120), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)]
	at tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.compileTaskFromBytecodeToBinary(TornadoVM.java:471)
	at tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.execute(TornadoVM.java:743)
	at tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.warmup(TornadoVM.java:218)
	at tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskSchedule.warmup(TornadoTaskSchedule.java:748)
	at tornado.api@0.13-dev/uk.ac.manchester.tornado.api.TaskSchedule.warmup(TaskSchedule.java:326)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.dft.DFTTornado.setUp(DFTTornado.java:58)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.benchmark(BenchmarkDriver.java:94)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:120)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)
Exception in thread "main" java.lang.NullPointerException: Cannot read the array length because "arr" is null
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getAverage(BenchmarkDriver.java:205)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getAverage(BenchmarkDriver.java:215)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getPreciseSummary(BenchmarkDriver.java:248)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:129)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)
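
The two missing symbols in the build log above, `_Z10native_cosd` and `_Z10native_sind`, look like Itanium-style mangled names, where the trailing letter encodes the parameter type. Assuming that mangling scheme, the sketch below decodes the simplest form (one builtin parameter). `MangledSymbol` is a hypothetical illustration, not a TornadoVM or OpenCL API.

```java
import java.util.Map;

// Illustrative decoder for the simplest Itanium-style mangled names:
//   "_Z" + <name length> + <name> + <one builtin parameter code>
// Assumption: the driver uses this mangling scheme for kernel symbols.
final class MangledSymbol {
    // Small subset of the Itanium C++ ABI builtin type codes.
    private static final Map<Character, String> BUILTIN = Map.of(
            'f', "float", 'd', "double", 'i', "int",
            'j', "unsigned int", 'l', "long", 'm', "unsigned long");

    static String describe(String symbol) {
        if (!symbol.startsWith("_Z")) {
            throw new IllegalArgumentException("not a mangled name: " + symbol);
        }
        int digitsEnd = 2;
        while (digitsEnd < symbol.length() && Character.isDigit(symbol.charAt(digitsEnd))) {
            digitsEnd++;
        }
        if (digitsEnd == 2) {
            throw new IllegalArgumentException("missing name length: " + symbol);
        }
        int nameEnd = digitsEnd + Integer.parseInt(symbol.substring(2, digitsEnd));
        // This sketch only handles exactly one single-letter parameter code.
        if (nameEnd + 1 != symbol.length()) {
            throw new IllegalArgumentException("unsupported form: " + symbol);
        }
        String type = BUILTIN.get(symbol.charAt(nameEnd));
        if (type == null) {
            throw new IllegalArgumentException("unknown type code in: " + symbol);
        }
        return symbol.substring(digitsEnd, nameEnd) + "(" + type + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe("_Z10native_cosd"));
        System.out.println(describe("_Z10native_sind"));
    }
}
```

Both symbols decode to a `double` parameter. The OpenCL C specification defines the `native_` math functions for `float` types only, so a missing `native_cos(double)` would be expected on any conforming driver. This suggests, without confirmation from the maintainers in this thread, that the generated DFT kernel computes in double precision.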

@yazun
Contributor

yazun commented Nov 26, 2021

Specifying the device does not change much. For example, targeting the discrete GPU still fails (although OpenCL on the CPU worked):

 tornado  -Ds0.t0.device=0:2  -Dtornado.enable.nativeFunctions=False  -m tornado.benchmarks/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner dft 8192 4096 --iterations 1
WARNING: Using incubator modules: jdk.incubator.foreign, jdk.incubator.vector
bm=dft-15-8192    , id=java-reference      , average=3.774364e+09, median=3.777630e+09, firstIteration=3.948355e+09, best=3.605918e+09
bm=dft-15-8192    , device=0:0  , average=2.213122e+08, median=2.150051e+08, firstIteration=2.796697e+08, best=2.063883e+08, speedupAvg=17.0545, speedupMedian=17.5700, speedupFirstIteration=14.1179, CV=-0.0000%, deviceName= [Apple] -- Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> [CL_DEVICE_NOT_AVAILABLE] : OpenCL Error : Error: Build Program driver returned (10015)
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> OpenCL Warning : clBuildProgram failed: could not build program for 0x1024500 (Intel(R) UHD Graphics 630) (err:-2)
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> [CL_BUILD_ERROR] : OpenCL Build Error : Compiler build log:
<program source>:42:16: error: call to '__fast_relax_sin' is ambiguous
      d_22  =  sin(d_21);
               ^~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4765:22: note: expanded from macro 'sin'
    #define sin(__x) __fast_relax_sin(__x)
                     ^~~~~~~~~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
    __CLFN_FD_1FD_FAST_RELAX(__fast_relax_sin, native_sin, __cl_sin);
                             ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:417:27: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float __OVERLOAD__ _name(float x) { return _default_name(x); } \
                          ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:418:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float2 __OVERLOAD__ _name(float2 x) { return _default_name(x); } \
                           ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:419:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float3 __OVERLOAD__ _name(float3 x) { return _default_name(x); } \
                           ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:420:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float4 __OVERLOAD__ _name(float4 x) { return _default_name(x); } \
                           ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:421:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float8 __OVERLOAD__ _name(float8 x) { return _default_name(x); } \
                           ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:422:29: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float16 __OVERLOAD__ _name(float16 x){ return _default_name(x); }
                            ^
<program source>:44:16: error: call to '__fast_relax_cos' is ambiguous
      d_24  =  cos(d_21);
               ^~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4758:22: note: expanded from macro 'cos'
    #define cos(__x) __fast_relax_cos(__x)
                     ^~~~~~~~~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
    __CLFN_FD_1FD_FAST_RELAX(__fast_relax_cos, native_cos, __cl_cos);
                             ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:417:27: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float __OVERLOAD__ _name(float x) { return _default_name(x); } \
                          ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:418:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float2 __OVERLOAD__ _name(float2 x) { return _default_name(x); } \
                           ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:419:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float3 __OVERLOAD__ _name(float3 x) { return _default_name(x); } \
                           ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:420:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float4 __OVERLOAD__ _name(float4 x) { return _default_name(x); } \
                           ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:421:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float8 __OVERLOAD__ _name(float8 x) { return _default_name(x); } \
                           ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:422:29: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float16 __OVERLOAD__ _name(float16 x){ return _default_name(x); }
                            ^
<program source>:46:16: error: call to '__cl_fma' is ambiguous
      d_26  =  fma(d_22, d_23, d_25);
               ^~~~~~~~~~~~~~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4590:32: note: expanded from macro 'fma'
    #define fma(__x, __y, __z) __cl_fma(__x, __y, __z)
                               ^~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
__CLFN_FD_3FD(__cl_fma);
              ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:987:48: note: expanded from macro '__CLFN_FD_3FD'
#define __CLFN_FD_3FD(name) float __OVERLOAD__ name(float x, float y, float z); \
                                               ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:988:21: note: expanded from macro '__CLFN_FD_3FD'
float2 __OVERLOAD__ name(float2 x, float2 y, float2 z); \
                    ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:989:21: note: expanded from macro '__CLFN_FD_3FD'
float3 __OVERLOAD__ name(float3 x, float3 y, float3 z); \
                    ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:990:21: note: expanded from macro '__CLFN_FD_3FD'
float4 __OVERLOAD__ name(float4 x, float4 y, float4 z); \
                    ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:991:21: note: expanded from macro '__CLFN_FD_3FD'
float8 __OVERLOAD__ name(float8 x, float8 y, float8 z); \
                    ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:992:22: note: expanded from macro '__CLFN_FD_3FD'
float16 __OVERLOAD__ name(float16 x, float16 y, float16 z);
                     ^
<program source>:49:16: error: call to '__cl_fma' is ambiguous
      d_29  =  fma(d_24, d_15, d_28);
               ^~~~~~~~~~~~~~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4590:32: note: expanded from macro 'fma'
    #define fma(__x, __y, __z) __cl_fma(__x, __y, __z)
                               ^~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
__CLFN_FD_3FD(__cl_fma);
              ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:987:48: note: expanded from macro '__CLFN_FD_3FD'
#define __CLFN_FD_3FD(name) float __OVERLOAD__ name(float x, float y, float z); \
                                               ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:988:21: note: expanded from macro '__CLFN_FD_3FD'
float2 __OVERLOAD__ name(float2 x, float2 y, float2 z); \
                    ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:989:21: note: expanded from macro '__CLFN_FD_3FD'
float3 __OVERLOAD__ name(float3 x, float3 y, float3 z); \
                    ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:990:21: note: expanded from macro '__CLFN_FD_3FD'
float4 __OVERLOAD__ name(float4 x, float4 y, float4 z); \
                    ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:991:21: note: expanded from macro '__CLFN_FD_3FD'
float8 __OVERLOAD__ name(float8 x, float8 y, float8 z); \
                    ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:992:22: note: expanded from macro '__CLFN_FD_3FD'
float16 __OVERLOAD__ name(float16 x, float16 y, float16 z);
                     ^


[TornadoVM-OCL-JNI] ERROR : clBuildProgram -> Returned: -11
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> [CL_INVALID_VALUE] : OpenCL Error : clGetProgramBuildInfo failed: return buffer size (8192 bytes) was too small to hold the result: 11019 bytes
[TornadoVM-OCL-JNI] ERROR : clGetProgramBuildInfo -> Returned: -30
Error during code compilation with the OpenCL driver
uk.ac.manchester.tornado.api.exceptions.TornadoBailoutRuntimeException: Unable to compile task task benchmark.t0 - computeDFT
[tornado.drivers.opencl@0.13-dev/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.compileTask(OCLTornadoDevice.java:289), tornado.drivers.opencl@0.13-dev/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.compileJavaToAccelerator(OCLTornadoDevice.java:321), tornado.drivers.opencl@0.13-dev/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.installCode(OCLTornadoDevice.java:449), tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.compileTaskFromBytecodeToBinary(TornadoVM.java:467), tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.execute(TornadoVM.java:743), tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.warmup(TornadoVM.java:218), tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskSchedule.warmup(TornadoTaskSchedule.java:748), tornado.api@0.13-dev/uk.ac.manchester.tornado.api.TaskSchedule.warmup(TaskSchedule.java:326), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.dft.DFTTornado.setUp(DFTTornado.java:58), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.benchmark(BenchmarkDriver.java:94), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:120), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)]
	at tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.compileTaskFromBytecodeToBinary(TornadoVM.java:471)
	at tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.execute(TornadoVM.java:743)
	at tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.warmup(TornadoVM.java:218)
	at tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskSchedule.warmup(TornadoTaskSchedule.java:748)
	at tornado.api@0.13-dev/uk.ac.manchester.tornado.api.TaskSchedule.warmup(TaskSchedule.java:326)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.dft.DFTTornado.setUp(DFTTornado.java:58)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.benchmark(BenchmarkDriver.java:94)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:120)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)
Exception in thread "main" java.lang.NullPointerException: Cannot read the array length because "arr" is null
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getAverage(BenchmarkDriver.java:205)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getAverage(BenchmarkDriver.java:215)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getPreciseSummary(BenchmarkDriver.java:248)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:129)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)

@yazun
Contributor

yazun commented Nov 26, 2021

So could the ambiguity come from the fact that there are two GPUs? Is it possible to target just one to avoid this problem?

@stratika
Collaborator

Can you print the kernel please, with --printKernel?

The error that you get ([CL_BUILD_ERROR] : OpenCL Build Error) occurs because the OpenCL driver cannot compile the kernel. However, if the same kernel works on Linux for a similar device type, the problem is most likely an OS/driver issue.

In this case, my next step would be to build an OpenCL kernel that uses the same function calls from a standalone C++ program, and see whether the OpenCL driver can build it.

I do not believe that the problem is related to the number of GPUs. If you want to force execution on a particular device, you can do that by using this flag: -D<s>.<t>.device=<driverNumber>:<deviceNumber>. See more here.
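
As a concrete illustration of the flag's shape: the build log in this thread names the failing task as benchmark.t0, so, assuming <s> and <t> stand for the task-schedule name and the task name, the placeholders might expand as in the sketch below. The class and method names (DeviceFlag, deviceFlag) are hypothetical helpers for illustration only, not part of TornadoVM.

```java
public class DeviceFlag {
    // Hypothetical helper: fills in -D<s>.<t>.device=<driverNumber>:<deviceNumber>.
    public static String deviceFlag(String schedule, String task, int driver, int device) {
        return String.format("-D%s.%s.device=%d:%d", schedule, task, driver, device);
    }

    public static void main(String[] args) {
        // "benchmark" and "t0" are taken from the log line
        // "Unable to compile task task benchmark.t0 - computeDFT".
        // Driver 0, device 1 is just an example target.
        System.out.println(deviceFlag("benchmark", "t0", 0, 1));
        // prints: -Dbenchmark.t0.device=0:1
    }
}
```

If that assumption holds, the printed string would be passed to the tornado command alongside the other -D options.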

@yazun
Contributor

yazun commented Nov 26, 2021

Could you clarify what -D<s>.<t>.device should be for the benchmarks, say for device 0:1?

Kernel included:

 tornado  --printKernel  -Dtornado.enable.nativeFunctions=False  -m tornado.benchmarks/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner dft 8192 4096 --iterations 1
WARNING: Using incubator modules: jdk.incubator.foreign, jdk.incubator.vector
bm=dft-15-8192    , id=java-reference      , average=3.694428e+09, median=3.697348e+09, firstIteration=3.965296e+09, best=3.548463e+09
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable
__kernel void lookupBufferAddress(__global uchar *_heap_base, uint _frame_base, __constant uchar *_constant_region, __local uchar *_local_region, __global int *_atomics)
{

  __global ulong *_frame = (__global ulong *) &_heap_base[_frame_base];


  // BLOCK 0
  _frame[0]  =  (ulong) _heap_base;
}  //  kernel

#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable
__kernel void computeDFT(__global uchar *_heap_base, uint _frame_base, __constant uchar *_constant_region, __local uchar *_local_region, __global int *_atomics)
{
  ulong ul_21, ul_40, ul_41, ul_0, ul_1, ul_2, ul_3, ul_19;
  long l_17, l_18, l_16, l_37, l_38, l_39;
  double d_24, d_23, d_26, d_25, d_28, d_27, d_30, d_29, d_20, d_22, d_12, d_14, d_13, d_32, d_31, d_34, d_33, d_35;
  int i_9, i_8, i_7, i_6, i_5, i_4, i_36, i_15, i_11, i_10, i_42;

  __global ulong *_frame = (__global ulong *) &_heap_base[_frame_base];


  // BLOCK 0
  ul_0  =  (ulong) _frame[3];
  ul_1  =  (ulong) _frame[4];
  ul_2  =  (ulong) _frame[5];
  ul_3  =  (ulong) _frame[6];
  i_4  =  get_global_size(0);
  i_5  =  i_4 + 8191;
  i_6  =  i_5 / i_4;
  i_7  =  get_global_id(0);
  i_8  =  i_6 * i_7;
  i_9  =  i_8 + i_6;
  i_10  =  min(i_9, 8192);
  // BLOCK 1 MERGES [0 5 ]
  i_11  =  i_8;
  for(;i_11 < i_10;)
  {
    // BLOCK 2
    d_12  =  (double) i_11;
    // BLOCK 3 MERGES [2 4 ]
    d_13  =  0.0;
    d_14  =  0.0;
    i_15  =  0;
    for(;i_15 < 8192;)
    {
      // BLOCK 4
      l_16  =  (long) i_15;
      l_17  =  l_16 << 3;
      l_18  =  l_17 + 24L;
      ul_19  =  ul_0 + l_18;
      d_20  =  *((__global double *) ul_19);
      ul_21  =  ul_1 + l_18;
      d_22  =  *((__global double *) ul_21);
      d_23  =  (double) i_15;
      d_24  =  d_23 * 6.283185307179586;
      d_25  =  d_24 * d_12;
      d_26  =  d_25 / 8192.0;
      d_27  =  sin(d_26);
      d_28  =  -d_20;
      d_29  =  cos(d_26);
      d_30  =  d_29 * d_22;
      d_31  =  fma(d_27, d_28, d_30);
      d_32  =  d_14 + d_31;
      d_33  =  d_27 * d_22;
      d_34  =  fma(d_29, d_20, d_33);
      d_35  =  d_13 + d_34;
      i_36  =  i_15 + 1;
      d_13  =  d_35;
      d_14  =  d_32;
      i_15  =  i_36;
    }  // B4
    // BLOCK 5
    l_37  =  (long) i_11;
    l_38  =  l_37 << 3;
    l_39  =  l_38 + 24L;
    ul_40  =  ul_2 + l_39;
    *((__global double *) ul_40)  =  d_13;
    ul_41  =  ul_3 + l_39;
    *((__global double *) ul_41)  =  d_14;
    i_42  =  i_11 + 1;
    i_11  =  i_42;
  }  // B5
  // BLOCK 6
  return;
}  //  kernel

bm=dft-15-8192    , device=0:0  , average=2.073738e+08, median=2.048341e+08, firstIteration=2.683069e+08, best=1.915486e+08, speedupAvg=17.8153, speedupMedian=18.0505, speedupFirstIteration=14.7790, CV=-0.0000%, deviceName= [Apple] -- Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
__kernel void lookupBufferAddress(__global uchar *_heap_base, uint _frame_base, __constant uchar *_constant_region, __local uchar *_local_region, __global int *_atomics)
{

  __global ulong *_frame = (__global ulong *) &_heap_base[_frame_base];


  // BLOCK 0
  _frame[0]  =  (ulong) _heap_base;
}  //  kernel

__kernel void computeDFT(__global uchar *_heap_base, uint _frame_base, __constant uchar *_constant_region, __local uchar *_local_region, __global int *_atomics)
{
  ulong ul_36, ul_14, ul_0, ul_16, ul_1, ul_2, ul_3, ul_35;
  long l_13, l_11, l_12, l_33, l_34, l_32;
  double d_24, d_23, d_26, d_25, d_28, d_27, d_30, d_29, d_15, d_18, d_17, d_20, d_19, d_22, d_21, d_8, d_7, d_9;
  int i_6, i_5, i_37, i_4, i_31, i_10;

  __global ulong *_frame = (__global ulong *) &_heap_base[_frame_base];


  // BLOCK 0
  ul_0  =  (ulong) _frame[3];
  ul_1  =  (ulong) _frame[4];
  ul_2  =  (ulong) _frame[5];
  ul_3  =  (ulong) _frame[6];
  i_4  =  get_global_size(0);
  i_5  =  get_global_id(0);
  // BLOCK 1 MERGES [0 5 ]
  i_6  =  i_5;
  for(;i_6 < 8192;)
  {
    // BLOCK 2
    d_7  =  (double) i_6;
    // BLOCK 3 MERGES [2 4 ]
    d_8  =  0.0;
    d_9  =  0.0;
    i_10  =  0;
    for(;i_10 < 8192;)
    {
      // BLOCK 4
      l_11  =  (long) i_10;
      l_12  =  l_11 << 3;
      l_13  =  l_12 + 24L;
      ul_14  =  ul_0 + l_13;
      d_15  =  *((__global double *) ul_14);
      ul_16  =  ul_1 + l_13;
      d_17  =  *((__global double *) ul_16);
      d_18  =  (double) i_10;
      d_19  =  d_18 * 6.283185307179586;
      d_20  =  d_19 * d_7;
      d_21  =  d_20 / 8192.0;
      d_22  =  sin(d_21);
      d_23  =  -d_15;
      d_24  =  cos(d_21);
      d_25  =  d_24 * d_17;
      d_26  =  fma(d_22, d_23, d_25);
      d_27  =  d_9 + d_26;
      d_28  =  d_22 * d_17;
      d_29  =  fma(d_24, d_15, d_28);
      d_30  =  d_8 + d_29;
      i_31  =  i_10 + 1;
      d_8  =  d_30;
      d_9  =  d_27;
      i_10  =  i_31;
    }  // B4
    // BLOCK 5
    l_32  =  (long) i_6;
    l_33  =  l_32 << 3;
    l_34  =  l_33 + 24L;
    ul_35  =  ul_2 + l_34;
    *((__global double *) ul_35)  =  d_8;
    ul_36  =  ul_3 + l_34;
    *((__global double *) ul_36)  =  d_9;
    i_37  =  i_4 + i_6;
    i_6  =  i_37;
  }  // B5
  // BLOCK 6
  return;
}  //  kernel

[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> [CL_DEVICE_NOT_AVAILABLE] : OpenCL Error : Error: Build Program driver returned (10015)
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> OpenCL Warning : clBuildProgram failed: could not build program for 0x1024500 (Intel(R) UHD Graphics 630) (err:-2)
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> [CL_BUILD_ERROR] : OpenCL Build Error : Compiler build log:
<program source>:42:16: error: call to '__fast_relax_sin' is ambiguous
      d_22  =  sin(d_21);
               ^~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4765:22: note: expanded from macro 'sin'
    #define sin(__x) __fast_relax_sin(__x)
                     ^~~~~~~~~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
    __CLFN_FD_1FD_FAST_RELAX(__fast_relax_sin, native_sin, __cl_sin);
                             ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:417:27: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float __OVERLOAD__ _name(float x) { return _default_name(x); } \
                          ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:418:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float2 __OVERLOAD__ _name(float2 x) { return _default_name(x); } \
                           ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:419:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float3 __OVERLOAD__ _name(float3 x) { return _default_name(x); } \
                           ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:420:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float4 __OVERLOAD__ _name(float4 x) { return _default_name(x); } \
                           ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:421:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float8 __OVERLOAD__ _name(float8 x) { return _default_name(x); } \
                           ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4763:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:422:29: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float16 __OVERLOAD__ _name(float16 x){ return _default_name(x); }
                            ^
<program source>:44:16: error: call to '__fast_relax_cos' is ambiguous
      d_24  =  cos(d_21);
               ^~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4758:22: note: expanded from macro 'cos'
    #define cos(__x) __fast_relax_cos(__x)
                     ^~~~~~~~~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
    __CLFN_FD_1FD_FAST_RELAX(__fast_relax_cos, native_cos, __cl_cos);
                             ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:417:27: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float __OVERLOAD__ _name(float x) { return _default_name(x); } \
                          ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:418:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float2 __OVERLOAD__ _name(float2 x) { return _default_name(x); } \
                           ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:419:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float3 __OVERLOAD__ _name(float3 x) { return _default_name(x); } \
                           ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:420:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float4 __OVERLOAD__ _name(float4 x) { return _default_name(x); } \
                           ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:421:28: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float8 __OVERLOAD__ _name(float8 x) { return _default_name(x); } \
                           ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4756:30: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:422:29: note: expanded from macro '__CLFN_FD_1FD_FAST_RELAX'
inline float16 __OVERLOAD__ _name(float16 x){ return _default_name(x); }
                            ^
<program source>:46:16: error: call to '__cl_fma' is ambiguous
      d_26  =  fma(d_22, d_23, d_25);
               ^~~~~~~~~~~~~~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4590:32: note: expanded from macro 'fma'
    #define fma(__x, __y, __z) __cl_fma(__x, __y, __z)
                               ^~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
__CLFN_FD_3FD(__cl_fma);
              ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:987:48: note: expanded from macro '__CLFN_FD_3FD'
#define __CLFN_FD_3FD(name) float __OVERLOAD__ name(float x, float y, float z); \
                                               ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:988:21: note: expanded from macro '__CLFN_FD_3FD'
float2 __OVERLOAD__ name(float2 x, float2 y, float2 z); \
                    ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:989:21: note: expanded from macro '__CLFN_FD_3FD'
float3 __OVERLOAD__ name(float3 x, float3 y, float3 z); \
                    ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:990:21: note: expanded from macro '__CLFN_FD_3FD'
float4 __OVERLOAD__ name(float4 x, float4 y, float4 z); \
                    ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:991:21: note: expanded from macro '__CLFN_FD_3FD'
float8 __OVERLOAD__ name(float8 x, float8 y, float8 z); \
                    ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:992:22: note: expanded from macro '__CLFN_FD_3FD'
float16 __OVERLOAD__ name(float16 x, float16 y, float16 z);
                     ^
<program source>:49:16: error: call to '__cl_fma' is ambiguous
      d_29  =  fma(d_24, d_15, d_28);
               ^~~~~~~~~~~~~~~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4590:32: note: expanded from macro 'fma'
    #define fma(__x, __y, __z) __cl_fma(__x, __y, __z)
                               ^~~~~~~~
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
__CLFN_FD_3FD(__cl_fma);
              ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:987:48: note: expanded from macro '__CLFN_FD_3FD'
#define __CLFN_FD_3FD(name) float __OVERLOAD__ name(float x, float y, float z); \
                                               ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:988:21: note: expanded from macro '__CLFN_FD_3FD'
float2 __OVERLOAD__ name(float2 x, float2 y, float2 z); \
                    ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:989:21: note: expanded from macro '__CLFN_FD_3FD'
float3 __OVERLOAD__ name(float3 x, float3 y, float3 z); \
                    ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:990:21: note: expanded from macro '__CLFN_FD_3FD'
float4 __OVERLOAD__ name(float4 x, float4 y, float4 z); \
                    ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:991:21: note: expanded from macro '__CLFN_FD_3FD'
float8 __OVERLOAD__ name(float8 x, float8 y, float8 z); \
                    ^
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:4587:15: note: candidate function
/System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/A/lib/clang/3.5/include/cl_kernel.h:992:22: note: expanded from macro '__CLFN_FD_3FD'
float16 __OVERLOAD__ name(float16 x, float16 y, float16 z);
                     ^


[TornadoVM-OCL-JNI] ERROR : clBuildProgram -> Returned: -11
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> [CL_INVALID_VALUE] : OpenCL Error : clGetProgramBuildInfo failed: return buffer size (8192 bytes) was too small to hold the result: 11019 bytes
[TornadoVM-OCL-JNI] ERROR : clGetProgramBuildInfo -> Returned: -30
Error during code compilation with the OpenCL driver
uk.ac.manchester.tornado.api.exceptions.TornadoBailoutRuntimeException: Unable to compile task task benchmark.t0 - computeDFT
[tornado.drivers.opencl@0.13-dev/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.compileTask(OCLTornadoDevice.java:289), tornado.drivers.opencl@0.13-dev/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.compileJavaToAccelerator(OCLTornadoDevice.java:321), tornado.drivers.opencl@0.13-dev/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.installCode(OCLTornadoDevice.java:449), tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.compileTaskFromBytecodeToBinary(TornadoVM.java:467), tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.execute(TornadoVM.java:743), tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.warmup(TornadoVM.java:218), tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskSchedule.warmup(TornadoTaskSchedule.java:748), tornado.api@0.13-dev/uk.ac.manchester.tornado.api.TaskSchedule.warmup(TaskSchedule.java:326), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.dft.DFTTornado.setUp(DFTTornado.java:58), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.benchmark(BenchmarkDriver.java:94), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:120), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)]
	at tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.compileTaskFromBytecodeToBinary(TornadoVM.java:471)
	at tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.execute(TornadoVM.java:743)
	at tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.warmup(TornadoVM.java:218)
	at tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskSchedule.warmup(TornadoTaskSchedule.java:748)
	at tornado.api@0.13-dev/uk.ac.manchester.tornado.api.TaskSchedule.warmup(TaskSchedule.java:326)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.dft.DFTTornado.setUp(DFTTornado.java:58)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.benchmark(BenchmarkDriver.java:94)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:120)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)
Exception in thread "main" java.lang.NullPointerException: Cannot read the array length because "arr" is null
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getAverage(BenchmarkDriver.java:205)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getAverage(BenchmarkDriver.java:215)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getPreciseSummary(BenchmarkDriver.java:248)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:129)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)

@yazun
Contributor

yazun commented Nov 26, 2021

and the native kernel:

tornado  --printKernel  -Dtornado.enable.nativeFunctions=True  -m tornado.benchmarks/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner dft 8192 4096 --iterations 1
WARNING: Using incubator modules: jdk.incubator.foreign, jdk.incubator.vector
  bm=dft-15-8192    , id=java-reference      , average=3.819124e+09, median=3.791125e+09, firstIteration=4.083596e+09, best=3.519395e+09
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable
__kernel void lookupBufferAddress(__global uchar *_heap_base, uint _frame_base, __constant uchar *_constant_region, __local uchar *_local_region, __global int *_atomics)
{

  __global ulong *_frame = (__global ulong *) &_heap_base[_frame_base];


  // BLOCK 0
  _frame[0]  =  (ulong) _heap_base;
}  //  kernel

#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable
__kernel void computeDFT(__global uchar *_heap_base, uint _frame_base, __constant uchar *_constant_region, __local uchar *_local_region, __global int *_atomics)
{
  ulong ul_21, ul_40, ul_41, ul_0, ul_1, ul_2, ul_3, ul_19;
  long l_17, l_18, l_16, l_37, l_38, l_39;
  double d_24, d_23, d_26, d_25, d_28, d_27, d_30, d_29, d_20, d_22, d_12, d_14, d_13, d_32, d_31, d_34, d_33, d_35;
  int i_9, i_8, i_7, i_6, i_5, i_4, i_36, i_15, i_11, i_10, i_42;

  __global ulong *_frame = (__global ulong *) &_heap_base[_frame_base];


  // BLOCK 0
  ul_0  =  (ulong) _frame[3];
  ul_1  =  (ulong) _frame[4];
  ul_2  =  (ulong) _frame[5];
  ul_3  =  (ulong) _frame[6];
  i_4  =  get_global_size(0);
  i_5  =  i_4 + 8191;
  i_6  =  i_5 / i_4;
  i_7  =  get_global_id(0);
  i_8  =  i_6 * i_7;
  i_9  =  i_8 + i_6;
  i_10  =  min(i_9, 8192);
  // BLOCK 1 MERGES [0 5 ]
  i_11  =  i_8;
  for(;i_11 < i_10;)
  {
    // BLOCK 2
    d_12  =  (double) i_11;
    // BLOCK 3 MERGES [2 4 ]
    d_13  =  0.0;
    d_14  =  0.0;
    i_15  =  0;
    for(;i_15 < 8192;)
    {
      // BLOCK 4
      l_16  =  (long) i_15;
      l_17  =  l_16 << 3;
      l_18  =  l_17 + 24L;
      ul_19  =  ul_0 + l_18;
      d_20  =  *((__global double *) ul_19);
      ul_21  =  ul_1 + l_18;
      d_22  =  *((__global double *) ul_21);
      d_23  =  (double) i_15;
      d_24  =  d_23 * 6.283185307179586;
      d_25  =  d_24 * d_12;
      d_26  =  d_25 / 8192.0;
      d_27  =  native_sin(d_26);
      d_28  =  -d_20;
      d_29  =  native_cos(d_26);
      d_30  =  d_29 * d_22;
      d_31  =  fma(d_27, d_28, d_30);
      d_32  =  d_14 + d_31;
      d_33  =  d_27 * d_22;
      d_34  =  fma(d_29, d_20, d_33);
      d_35  =  d_13 + d_34;
      i_36  =  i_15 + 1;
      d_13  =  d_35;
      d_14  =  d_32;
      i_15  =  i_36;
    }  // B4
    // BLOCK 5
    l_37  =  (long) i_11;
    l_38  =  l_37 << 3;
    l_39  =  l_38 + 24L;
    ul_40  =  ul_2 + l_39;
    *((__global double *) ul_40)  =  d_13;
    ul_41  =  ul_3 + l_39;
    *((__global double *) ul_41)  =  d_14;
    i_42  =  i_11 + 1;
    i_11  =  i_42;
  }  // B5
  // BLOCK 6
  return;
}  //  kernel

[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> [CL_DEVICE_NOT_AVAILABLE] : OpenCL Error : Error: build program driver returned (-1)
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> OpenCL Warning : clBuildProgram failed: could not build program for 0xffffffff (Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz) (err:-1)
[JNI] uk.ac.manchester.tornado.drivers.opencl> notify error:
[JNI] uk.ac.manchester.tornado.drivers.opencl> [CL_BUILD_ERROR] : OpenCL Build Error : Compiler build log:
kernel referenced external symbol '_Z10native_cosd' which could not be found.
kernel referenced external symbol '_Z10native_sind' which could not be found.


[TornadoVM-OCL-JNI] ERROR : clBuildProgram -> Returned: -11
Error during code compilation with the OpenCL driver
uk.ac.manchester.tornado.api.exceptions.TornadoBailoutRuntimeException: Unable to compile task task benchmark.t0 - computeDFT
[tornado.drivers.opencl@0.13-dev/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.compileTask(OCLTornadoDevice.java:289), tornado.drivers.opencl@0.13-dev/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.compileJavaToAccelerator(OCLTornadoDevice.java:321), tornado.drivers.opencl@0.13-dev/uk.ac.manchester.tornado.drivers.opencl.runtime.OCLTornadoDevice.installCode(OCLTornadoDevice.java:449), tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.compileTaskFromBytecodeToBinary(TornadoVM.java:467), tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.execute(TornadoVM.java:743), tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.warmup(TornadoVM.java:218), tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskSchedule.warmup(TornadoTaskSchedule.java:748), tornado.api@0.13-dev/uk.ac.manchester.tornado.api.TaskSchedule.warmup(TaskSchedule.java:326), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.dft.DFTTornado.setUp(DFTTornado.java:58), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.benchmark(BenchmarkDriver.java:94), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:120), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86), tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)]
	at tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.compileTaskFromBytecodeToBinary(TornadoVM.java:471)
	at tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.execute(TornadoVM.java:743)
	at tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.TornadoVM.warmup(TornadoVM.java:218)
	at tornado.runtime@0.13-dev/uk.ac.manchester.tornado.runtime.tasks.TornadoTaskSchedule.warmup(TornadoTaskSchedule.java:748)
	at tornado.api@0.13-dev/uk.ac.manchester.tornado.api.TaskSchedule.warmup(TaskSchedule.java:326)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.dft.DFTTornado.setUp(DFTTornado.java:58)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.benchmark(BenchmarkDriver.java:94)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:120)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)
Exception in thread "main" java.lang.NullPointerException: Cannot read the array length because "arr" is null
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getAverage(BenchmarkDriver.java:205)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getAverage(BenchmarkDriver.java:215)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkDriver.getPreciseSummary(BenchmarkDriver.java:248)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.benchmarkAll(BenchmarkRunner.java:129)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.run(BenchmarkRunner.java:86)
	at tornado.benchmarks@0.13-dev/uk.ac.manchester.tornado.benchmarks.BenchmarkRunner.main(BenchmarkRunner.java:190)

@yazun
Contributor

yazun commented Nov 26, 2021

I remembered that OpenCL has been a dead end on Macs (see e.g. https://www.reddit.com/r/OpenCL/comments/qevdbg/opencl_30_on_macos/)

Something to keep in mind during the next round of purchases for any scientific team.

Nevertheless, it seems we can use OpenCL on the CPU with -Dtornado.enable.nativeFunctions=False, so it should be good enough for development for now.
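
A note on why the flag probably matters: the symbol in the build log, '_Z10native_cosd', reads under Itanium-style mangling as native_cos(double), and as far as I know the OpenCL 1.2 spec only defines the native_ math functions for float types. With the flag set to False the kernel presumably falls back to plain sin/cos, which do have double versions. For anyone who wants to sanity-check results on the CPU, here is a rough plain-Java rendering of the loop structure in the printed computeDFT kernel. It is a sketch reconstructed from the kernel dump above, not the benchmark's actual source; the class and method names are hypothetical.

```java
// Sketch only: reconstructed from the printed computeDFT kernel.
// DftSketch and computeDft are hypothetical names, not TornadoVM API.
final class DftSketch {

    /** Naive O(n^2) DFT; writes the result into outReal and outImag. */
    static void computeDft(double[] inReal, double[] inImag,
                           double[] outReal, double[] outImag) {
        int n = inReal.length;
        for (int k = 0; k < n; k++) {            // outer loop (i_11 in the kernel)
            double sumReal = 0.0;                // d_13
            double sumImag = 0.0;                // d_14
            for (int t = 0; t < n; t++) {        // inner loop (i_15)
                double angle = (2 * Math.PI * t * k) / n;  // d_26
                double s = Math.sin(angle);      // native_sin in the failing kernel
                double c = Math.cos(angle);      // native_cos in the failing kernel
                sumReal += inReal[t] * c + inImag[t] * s;
                sumImag += -inReal[t] * s + inImag[t] * c;
            }
            outReal[k] = sumReal;
            outImag[k] = sumImag;
        }
    }

    public static void main(String[] args) {
        // A constant input signal should put all energy in bin 0.
        double[] re = {1.0, 1.0, 1.0, 1.0};
        double[] im = new double[4];
        double[] outRe = new double[4];
        double[] outIm = new double[4];
        computeDft(re, im, outRe, outIm);
        System.out.println(java.util.Arrays.toString(outRe));
        System.out.println(java.util.Arrays.toString(outIm));
    }
}
```

Math.sin and Math.cos are double-precision, so this mirrors what the non-native kernel should compute, up to rounding differences from the fma calls in the generated code.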

@stratika
Collaborator

I remembered OpenCL has been a dead-end for Macs (i.e. https://www.reddit.com/r/OpenCL/comments/qevdbg/opencl_30_on_macos/)

Yes, this is unfortunate news.
