Segmentation Faults in tests with POCL OpenCL driver #16

Open
AvtechScientific opened this issue Aug 27, 2015 · 7 comments

@AvtechScientific
Owner

No description provided.

@opoplawski
Contributor

Fedora build errors with pocl:

/builddir/build/BUILD/ASL-248ad2de1fcddffd38405008779fb1ee292184e6/build-x86-64.fc26/test/testACL/testPrivateVar
Test of "Simple kernel" function.../builddir/build/BUILD/ASL-248ad2de1fcddffd38405008779fb1ee292184e6/build-x86-64.fc26/test/testACL/testPrivateVar: symbol lookup error: /builddir/.cache/pocl/kcache/PG/ODNNMOOCFAMCGHOHACMHAHGLIHDIALKHNEKIC/compute_0/1-1-1/compute_0.so: undefined symbol: _Z16_cl_native_rsqrtd
[message printed 7 times, interleaved, in the original output]

and

# /builddir/build/BUILD/ASL-248ad2de1fcddffd38405008779fb1ee292184e6/build-x86-64.fc26/test/testMath/testReductionFunction
testSum...Segmentation fault (core dumped)

This is with 248ad2d
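
The undefined symbol in the first log follows the Itanium C++ name-mangling layout: `_Z`, the length of the name (16), the name `_cl_native_rsqrt`, and `d` for a single `double` parameter, so the kernel is presumably looking for a double-precision `native_rsqrt` implementation. Running the symbol through `c++filt` is the reliable way to decode it. The Python sketch below only illustrates the layout. As an assumption of this example, it handles just unqualified function names with a few builtin parameter types, and `demangle_simple` is a hypothetical helper, not part of ASL or POCL.

```python
import re

# Itanium C++ ABI codes for a few builtin parameter types (subset only).
BUILTIN_TYPES = {
    "v": "void", "b": "bool", "c": "char", "h": "unsigned char",
    "s": "short", "t": "unsigned short",
    "i": "int", "j": "unsigned int", "l": "long", "m": "unsigned long",
    "f": "float", "d": "double",
}


def demangle_simple(symbol: str) -> str:
    """Decode `_Z<length><name><type codes>` for unqualified function names.

    Namespaces, templates, pointers and vector types are not handled;
    such symbols raise ValueError instead of being guessed at.
    """
    match = re.fullmatch(r"_Z(\d+)(.+)", symbol)
    if match is None:
        raise ValueError(f"not a simple mangled name: {symbol!r}")
    length = int(match.group(1))
    rest = match.group(2)
    if length == 0 or len(rest) <= length:
        raise ValueError(f"name length does not fit: {symbol!r}")
    name, codes = rest[:length], rest[length:]
    if codes == "v":  # a lone `v` means an empty parameter list
        return f"{name}()"
    try:
        params = [BUILTIN_TYPES[code] for code in codes]
    except KeyError as exc:
        raise ValueError(
            f"unsupported type code {exc.args[0]!r} in {symbol!r}"
        ) from None
    return f"{name}({', '.join(params)})"


print(demangle_simple("_Z16_cl_native_rsqrtd"))  # _cl_native_rsqrt(double)
```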

@AvtechScientific
Owner Author

Dear @opoplawski,

Thank you for the report! Could you please specify which version of POCL you are using?
According to the ASL deployment info, the issue should have been fixed by the POCL team in v0.11.

Do you see a similar issue with other OpenCL drivers?

Thank you!

@AvtechScientific changed the title from "Segmentation Fualts in tests with POCL OpenCL driver" to "Segmentation Faults in tests with POCL OpenCL driver" on Nov 8, 2016

@opoplawski
Contributor

This is with POCL 0.14-0.1.fc26. Tests run fine with nvidia binary drivers on my local machine.

@opoplawski
Contributor

Additional failures on aarch64, see https://kojipkgs.fedoraproject.org//work/tasks/5670/16375670/build.log

The following tests FAILED:
      4 - testKernel (OTHER_FAULT)
      7 - testASLData (OTHER_FAULT)
      8 - testDistanceFunction (OTHER_FAULT)

@AvtechScientific
Owner Author

AvtechScientific commented Nov 9, 2016

> This is with POCL 0.14-0.1.fc26. Tests run fine with nvidia binary drivers on my local machine.

Then it wasn't fixed, and it looks like a POCL issue. Do you have detailed error output that could be used to file an issue with the POCL team at https://github.com/pocl/pocl/issues? If you do report it there, please let us know so we can provide our input.

> Additional failures on aarch64
>
> 4 - testKernel (OTHER_FAULT)
> 8 - testDistanceFunction (OTHER_FAULT)

Those two might fail due to the lack of double-precision support.
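
Whether a device can compile `double` kernels is visible in its extension list, which a host program can query with `clGetDeviceInfo` and `CL_DEVICE_EXTENSIONS`. The following is a minimal sketch of that check, assuming the extension string has already been fetched. `has_extension` and the sample strings are hypothetical and not taken from ASL or from the failing builders.

```python
def has_extension(extensions: str, name: str) -> bool:
    """Return True if `name` is one of the space-separated extension tokens.

    Matching whole tokens avoids false positives from substrings.
    """
    return name in extensions.split()


# Made-up extension strings, for illustration only.
with_fp64 = "cl_khr_byte_addressable_store cl_khr_fp64 cl_khr_icd"
without_fp64 = "cl_khr_byte_addressable_store cl_khr_icd"

for label, extensions in (("device A", with_fp64), ("device B", without_fp64)):
    if has_extension(extensions, "cl_khr_fp64"):
        print(f"{label}: double kernels can be built")
    else:
        print(f"{label}: skip or downgrade double-precision tests")
```

A test harness could use such a check to skip the double-precision cases up front instead of aborting in `Program::build()`.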

As for '7 - testASLData (OTHER_FAULT)', we need a more informative error message to find the cause of the problem.

Thank you!

@ghisvail
Contributor

Has pocl 0.14 been released already? The milestone is not complete and it does not appear to be tagged.

It would be nice to try with 0.13 first, which is the latest tagged release.

@opoplawski
Contributor

More verbose output with pocl 0.14 on 32-bit arm:

test 4
    Start 4: testKernel
4: Test command: /builddir/build/BUILD/ASL-0.1.7/build-armv7hl-32.fc26/test/testACL/testKernel
4: Test timeout computed to be: 9.99988e+06
4: Test of "copy" function... Ok
4: Test of Kernel with double...1 warning and 4 errors generated.
4:  
4: 			BUILD LOG
4:  ************************************************
4: error: /builddir/.cache/pocl/kcache/temp_v4yQlu.cl:8:7: subscript of pointer to incomplete type '__global error_undefined_type_double' (aka '__global struct error_undefined_type_double')
4: error: /builddir/.cache/pocl/kcache/temp_v4yQlu.cl:9:7: subscript of pointer to incomplete type '__global error_undefined_type_double' (aka '__global struct error_undefined_type_double')
4: error: /builddir/.cache/pocl/kcache/temp_v4yQlu.cl:9:28: subscript of pointer to incomplete type '__global error_undefined_type_double' (aka '__global struct error_undefined_type_double')
4: error: /builddir/.cache/pocl/kcache/temp_v4yQlu.cl:10:7: subscript of pointer to incomplete type '__global error_undefined_type_double' (aka '__global struct error_undefined_type_double')
4: warning: /builddir/.cache/pocl/kcache/temp_v4yQlu.cl:1:26: unsupported OpenCL extension 'cl_khr_fp64' - ignoring
4: 
4: 
4:  ************************************************
4:  
4: 			KERNEL SOURCE CODE
4:  ------------------------------------------------
4: #pragma OPENCL EXTENSION cl_khr_fp64 : enable
4: 
4: __kernel void compute_0(__global double *a_d1,
4:                         __global double *a_d2,
4:                         __global double *a_d3)
4: {
4: 	uint index = get_global_id(0);
4: 	(a_d3[index]=2.);
4: 	(a_d1[index]=(2.+pown(a_d3[index], 3)));
4: 	(a_d2[index]=index);
4: }
4: terminate called after throwing an instance of 'std::logic_error'
4:   what():  ASL ERROR: Program::build() (-11).
4:  ------------------------------------------------
4/8 Test #4: testKernel .......................***Exception: Other  1.47 sec

    Start 7: testASLData
7: Test command: /builddir/build/BUILD/ASL-0.1.7/build-armv7hl-32.fc26/test/testMath/testASLData
7: Test timeout computed to be: 9.99988e+06
7: 1 warning and 2 errors generated.
7:  
7: 			BUILD LOG
7:  ************************************************
7: error: /builddir/.cache/pocl/kcache/temp_UBKSin.cl:7:7: subscript of pointer to incomplete type '__global error_undefined_type_double' (aka '__global struct error_undefined_type_double')
7: error: /builddir/.cache/pocl/kcache/temp_UBKSin.cl:8:7: subscript of pointer to incomplete type '__global error_undefined_type_double' (aka '__global struct error_undefined_type_double')
7: warning: /builddir/.cache/pocl/kcache/temp_UBKSin.cl:1:26: unsupported OpenCL extension 'cl_khr_fp64' - ignoring
7: 
7: 
7:  ************************************************
7:  
7: 			KERNEL SOURCE CODE
7:  ------------------------------------------------
7: #pragma OPENCL EXTENSION cl_khr_fp64 : enable
7: 
7: __kernel void compute_0(__global double *a_d1,
7:                         __global double *a_d2)
7: {
7: 	uint index = get_global_id(0);
7: 	(a_d1[index]=0.);
7: 	(a_d2[index]=0.);
7: }
7: terminate called after throwing an instance of 'std::logic_error'
7:   what():  ASL ERROR: Program::build() (-11).
7:  ------------------------------------------------
7/8 Test #7: testASLData ......................***Exception: Other  1.20 sec

test 8
    Start 8: testDistanceFunction
8: Test command: /builddir/build/BUILD/ASL-0.1.7/build-armv7hl-32.fc26/test/testMath/testDistanceFunction
8: Test timeout computed to be: 9.99988e+06
8: 2 warnings and 5 errors generated.
8:  
8: 			BUILD LOG
8:  ************************************************
8: error: /builddir/.cache/pocl/kcache/temp_5n5UQU.cl:6:9: variable has incomplete type 'error_undefined_type_double' (aka 'struct error_undefined_type_double')
8: error: /builddir/.cache/pocl/kcache/temp_5n5UQU.cl:7:9: variable has incomplete type 'error_undefined_type_double' (aka 'struct error_undefined_type_double')
8: error: /builddir/.cache/pocl/kcache/temp_5n5UQU.cl:8:9: cast to incomplete type 'error_undefined_type_double' (aka 'struct error_undefined_type_double')
8: error: /builddir/.cache/pocl/kcache/temp_5n5UQU.cl:9:9: cast to incomplete type 'error_undefined_type_double' (aka 'struct error_undefined_type_double')
8: error: /builddir/.cache/pocl/kcache/temp_5n5UQU.cl:10:6: subscript of pointer to incomplete type '__global error_undefined_type_double' (aka '__global struct error_undefined_type_double')
8: warning: /builddir/.cache/pocl/kcache/temp_5n5UQU.cl:1:26: unsupported OpenCL extension 'cl_khr_fp64' - ignoring
8: warning: /builddir/.cache/pocl/kcache/temp_5n5UQU.cl:8:23: implicit declaration of function 'convert_double' is invalid in C99
8: 
8: 
8:  ************************************************
8:  
8: 			KERNEL SOURCE CODE
8:  ------------------------------------------------
8: #pragma OPENCL EXTENSION cl_khr_fp64 : enable
8: 
8: __kernel void compute_0(__global double *a_d1)
8: {
8: 	uint index = get_global_id(0);
8: 	double pv_d1;
8: 	double pv_d2;
8: 	(pv_d1=(double)((-1.+convert_double((index/102)))));
8: 	(pv_d2=(double)((-1.+convert_double((index%102)))));
8: 	a_d1[index] = max(min((sqrt(((pv_d1 - 50.)*(pv_d1 - 50.)+(pv_d2 - 50.)*(pv_d2 - 50.))) - 10.), (sqrt(((pv_d1 - 40.)*(pv_d1 - 40.)+(pv_d2 - 40.)*(pv_d2 - 40.))) - 10.)), (sqrt(((pv_d1 - 50.)*(pv_d1 - 50.)+(pv_d2 - 50.)*(pv_d2 - 50.))) - 20.));
8: }
8: terminate called after throwing an instance of 'std::logic_error'
8:   what():  ASL ERROR: Program::build() (-11).
8:  ------------------------------------------------
8/8 Test #8: testDistanceFunction .............***Exception: Other  2.53 sec

The testDistanceFunction error is new with pocl 0.14, but the other two tests fail with 0.13 as well, though in different ways:

test 4
    Start 4: testKernel
4: Test command: /builddir/build/BUILD/ASL-0.1.7/build-armv7hl-32.fc25/test/testACL/testKernel
4: Test timeout computed to be: 9.99988e+06
4: Test of "copy" function... Ok
4: Test of Kernel with double... Ok
4: Test of KernelSIMD... Ok
4: Test of KernelSIMDUA... Ok
4: Test of kernel with PrivateVariable... Ok
4: Test of kernel with PrivateArray... Ok
4: Test of Variable functionality... Ok
4: Test of VariableReference functionality... Ok
4: Test of select function... Ok
4: Test of Swap functionality... Ok
4: Test of LocalArray and syncCopy with barrier()...6 errors generated.
4:  
4: 			BUILD LOG
4:  ************************************************
4: error: /builddir/.cache/pocl/kcache/temp_JTb38I.cl:10:20: used type 'event_t' where arithmetic or pointer type is required
4: error: /builddir/.cache/pocl/kcache/temp_JTb38I.cl:11:106: used type 'event_t' where arithmetic or pointer type is required
4: error: /builddir/.cache/pocl/kcache/temp_JTb38I.cl:13:20: used type 'event_t' where arithmetic or pointer type is required
4: error: /builddir/.cache/pocl/kcache/temp_JTb38I.cl:14:106: used type 'event_t' where arithmetic or pointer type is required
4: error: /builddir/.cache/pocl/kcache/temp_JTb38I.cl:19:20: used type 'event_t' where arithmetic or pointer type is required
4: error: /builddir/.cache/pocl/kcache/temp_JTb38I.cl:20:106: used type 'event_t' where arithmetic or pointer type is required
4: 
4: 
4:  ************************************************
4:  
4: 			KERNEL SOURCE CODE
4:  ------------------------------------------------
4: #pragma OPENCL EXTENSION cl_khr_fp64 : disable
4: 
4: __kernel void compute_8(__global float *a_f14,
4:                         __global float *a_f13)
4: {
4: 	uint index = get_local_id(0);
4: 	uint groupID = get_group_id(0);
4: 	__local float la_f2[2];
4: 	__local float la_f1[2];
4: 	event_t event_0 = (event_t)0;
4: 	event_0 = async_work_group_copy(&((__local float *)la_f1)[0], &((__global float *)a_f13)[2*groupID], 2, (event_t)0);
4: 	wait_group_events (1, &event_0);
4: 	event_t event_1 = (event_t)0;
4: 	event_1 = async_work_group_copy(&((__local float *)la_f2)[0], &((__global float *)a_f14)[2*groupID], 2, (event_t)0);
4: 	wait_group_events (1, &event_1);
4: 	barrier(CLK_LOCAL_MEM_FENCE);
4: 	(la_f2[index]=(la_f2[index] - la_f1[index]));
4: 	barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE);
4: 	event_t event_2 = (event_t)0;
4: 	event_2 = async_work_group_copy(&((__global float *)a_f14)[2*groupID], &((__local float *)la_f2)[0], 2, (event_t)0);
4: 	wait_group_events (1, &event_2);
4: }
4: terminate called after throwing an instance of 'std::logic_error'
4:   what():  ASL ERROR: Program::build() (-11).
4:  ------------------------------------------------
4/8 Test #4: testKernel .......................***Exception: Other 22.44 sec

7: #pragma OPENCL EXTENSION cl_khr_fp64 : enable
7: 
7: __kernel void compute_0(__global double *a_d1,
7:                         __global double *a_d2)
7: {
7: 	uint index = get_global_id(0);
7: 	(a_d1[index]=0.);
7: 	(a_d2[index]=0.);
7: }
7: Test of UploadToLocalMem()...4 errors generated.
7:  
7: 			BUILD LOG
7:  ************************************************
7: error: /builddir/.cache/pocl/kcache/temp_H7TiiK.cl:26:20: used type 'event_t' where arithmetic or pointer type is required
7: error: /builddir/.cache/pocl/kcache/temp_H7TiiK.cl:27:109: used type 'event_t' where arithmetic or pointer type is required
7: error: /builddir/.cache/pocl/kcache/temp_H7TiiK.cl:29:20: used type 'event_t' where arithmetic or pointer type is required
7: error: /builddir/.cache/pocl/kcache/temp_H7TiiK.cl:30:109: used type 'event_t' where arithmetic or pointer type is required
7: 
7: 
7:  ************************************************
7:  
7: 			KERNEL SOURCE CODE
7:  ------------------------------------------------
7: #pragma OPENCL EXTENSION cl_khr_fp64 : disable
7: 
7: __kernel void compute_2(__global float *a_f2,
7:                         __global float *a_f1,
7:                         __global float *a_f3,
7:                         __global float *a_f4)
7: {
7: 	uint index = get_local_id(0);
7: 	uint groupID = get_group_id(0);
7: 	__local float la_f1[125];
7: 	__local float la_f2[125];
7: 	(la_f1[index]=a_f1[((((index/25)+(groupID/4)*5)*100+(((index%25)/5)+((groupID%4)/2)*5)*10)+(((index%25)%5)+((groupID%4)%2)*5))]);
7: 	(la_f2[index]=a_f2[((((index/25)+(groupID/4)*5)*100+(((index%25)/5)+((groupID%4)/2)*5)*10)+(((index%25)%5)+((groupID%4)%2)*5))]);
7: 	(la_f1[(27+index)]=a_f1[(((((27+index)/25)+(groupID/4)*5)*100+((((27+index)%25)/5)+((groupID%4)/2)*5)*10)+((((27+index)%25)%5)+((groupID%4)%2)*5))]);
7: 	(la_f2[(27+index)]=a_f2[(((((27+index)/25)+(groupID/4)*5)*100+((((27+index)%25)/5)+((groupID%4)/2)*5)*10)+((((27+index)%25)%5)+((groupID%4)%2)*5))]);
7: 	(la_f1[(54+index)]=a_f1[(((((54+index)/25)+(groupID/4)*5)*100+((((54+index)%25)/5)+((groupID%4)/2)*5)*10)+((((54+index)%25)%5)+((groupID%4)%2)*5))]);
7: 	(la_f2[(54+index)]=a_f2[(((((54+index)/25)+(groupID/4)*5)*100+((((54+index)%25)/5)+((groupID%4)/2)*5)*10)+((((54+index)%25)%5)+((groupID%4)%2)*5))]);
7: 	(la_f1[(81+index)]=a_f1[(((((81+index)/25)+(groupID/4)*5)*100+((((81+index)%25)/5)+((groupID%4)/2)*5)*10)+((((81+index)%25)%5)+((groupID%4)%2)*5))]);
7: 	(la_f2[(81+index)]=a_f2[(((((81+index)/25)+(groupID/4)*5)*100+((((81+index)%25)/5)+((groupID%4)/2)*5)*10)+((((81+index)%25)%5)+((groupID%4)%2)*5))]);
7: 	if ((index<17))
7: 	{
7: 		(la_f1[(108+index)]=a_f1[(((((108+index)/25)+(groupID/4)*5)*100+((((108+index)%25)/5)+((groupID%4)/2)*5)*10)+((((108+index)%25)%5)+((groupID%4)%2)*5))]);
7: 		(la_f2[(108+index)]=a_f2[(((((108+index)/25)+(groupID/4)*5)*100+((((108+index)%25)/5)+((groupID%4)/2)*5)*10)+((((108+index)%25)%5)+((groupID%4)%2)*5))]);
7: 	};
7: 	barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE);
7: 	event_t event_0 = (event_t)0;
7: 	event_0 = async_work_group_copy(&((__global float *)a_f3)[125*groupID], &((__local float *)la_f1)[0], 125, (event_t)0);
7: 	wait_group_events (1, &event_0);
7: 	event_t event_1 = (event_t)0;
7: 	event_1 = async_work_group_copy(&((__global float *)a_f4)[125*groupID], &((__local float *)la_f2)[0], 125, (event_t)0);
7: 	wait_group_events (1, &event_1);
7: }
7: terminate called after throwing an instance of 'std::logic_error'
7:   what():  ASL ERROR: Program::build() (-11).
7:  ------------------------------------------------
7/8 Test #7: testASLData ......................***Exception: Other 12.22 sec
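
The logs above contain two recurring kinds of compiler errors: diagnostics mentioning `error_undefined_type_double` (pocl 0.14) and diagnostics about `event_t` (pocl 0.13). When comparing builders, it can help to tally these per test. The Python sketch below does that for verbose ctest output in the `N: error: file:line:col: message` shape seen here. The function names and the category labels are assumptions of this example, not ASL or POCL terminology.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   4: error: /path/temp_x.cl:8:7: subscript of pointer to incomplete type ...
DIAG = re.compile(r"^(\d+): (error|warning): (\S+):(\d+):(\d+): (.*)$")


def classify(message: str) -> str:
    """Map a compiler message to a coarse, hypothetical category label."""
    if "error_undefined_type_double" in message:
        return "fp64-unsupported"
    if "'event_t'" in message:
        return "event_t-usage"
    return "other"


def summarize_errors(log_text: str) -> dict:
    """Count error diagnostics per ctest test number and category."""
    counts = defaultdict(lambda: defaultdict(int))
    for line in log_text.splitlines():
        match = DIAG.match(line.strip())
        if match and match.group(2) == "error":  # warnings are ignored
            test_id = int(match.group(1))
            counts[test_id][classify(match.group(6))] += 1
    return {test: dict(causes) for test, causes in counts.items()}
```

Feeding it the saved ctest output, for example `summarize_errors(open("build.log").read())` with a hypothetical file name, gives a dictionary keyed by test number.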
