arfoll/occcl
Folders and files
| Name | Name | Last commit date | ||
|---|---|---|---|---|
Repository files navigation
OCCCL - OCCAM-Pi with openCL
ABSTRACT
AIM
Investiage the feasibility and the performance of opencl/C acceleration on existing pure occam programs.
METHODOLOGY
All startup is done by initial.c. Two methods are provided. initialise() returns a cl_context and cl_device_id which is used to compile the program from src. Each CL program creates its own cl_program_queue at every run of the CL program.
buildcl() builds from src a program which can then be executed by the cl_program_queue
All CL programs are in one .c file with the _init call and the call seperated. The _init needs to run once before the CL program can be executed. There is only a check if it is enabled at compile time.
HARDWARE
CL_DEVICE_TYPE_GPU:
ATI RV770 (HD4850) (cl_khr_byte_addressable_store not supported)
OpenCL 1.1 SDK = AMD APP SDK 2.5
Extensions = cl_amd_fp64 cl_khr_gl_sharing cl_amd_device_attribute_query
Num Compute units = 10
Nvidia GTX560 (GF114)
Extensions = OpenCL device : GeForce GTX 560
cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64
Nvidia 8600GT
Nvidia GT430
Nvidia ION
OpenCL 1.1 SDK = Nvidia CUDA 4.0.1
Extensions = cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
CL_DEVICE_TYPE_CPU:
2x Intel E5420
Intel Core i7-2600
RESULTS
mandelbrot (fp64, 32workers, 128frames) benchmark:
Occam (Intel Q6600 @ 3.0ghz):
real 0m17.877s
user 0m35.928s
sys 0m0.773s
OpenCL (Intel Q6600 @ 3.0ghz + nvidia GTX560):
real 0m3.467s
user 0m0.780s
sys 0m1.013s
C (Intel Q6600 @ 3.0ghz):
real 0m4.323s
user 0m3.230s
sys 0m0.670s
mandebrot no disp (gcc 4.6.2 with -O2
(Intel Q6600 @ 3.0ghz + nvidia GTX560)
OpenCL implementation (frame - with opts -D USE_DOUBLE -cl-fast-relaxed-math -cl-mad-enable)
438ms
C implementation (frame)
2663ms
Occam implementation (line)
13413ms
Occoids-single:
(Intel Q6600 @ 3.0Ghz + nvidia GTX560)
Occam implementation
Cycle 50; cycle time = 14719 us; 2180 processes
Cycle 100; cycle time = 14730 us; 2180 processes
Cycle 150; cycle time = 14885 us; 2180 processes
Cycle 200; cycle time = 15706 us; 2180 processes
Cycle 250; cycle time = 15967 us; 2180 processes
Cycle 300; cycle time = 17533 us; 2180 processes
Cycle 350; cycle time = 16094 us; 2180 processes
Cycle 400; cycle time = 16547 us; 2180 processes
Cycle 450; cycle time = 16460 us; 2180 processes
Cycle 500; cycle time = 16140 us; 2180 processes
Cycle 550; cycle time = 16998 us; 2180 processes
Cycle 600; cycle time = 17053 us; 2180 processes
Cycle 650; cycle time = 16883 us; 2180 processes
Cycle 700; cycle time = 16191 us; 2180 processes
Cycle 750; cycle time = 16688 us; 2180 processes
Cycle 800; cycle time = 16797 us; 2180 processes
Cycle 850; cycle time = 17053 us; 2180 processes
Cycle 900; cycle time = 17501 us; 2180 processes
Cycle 950; cycle time = 18061 us; 2180 processes
CONCLUSION
READING