OlivierSohn/gpgpu-experiments

What is it?

This is a project to experiment with GPGPU programming using OpenCL.

Editing the main.cpp file lets you switch between the different examples.

The code of the first example is a slightly modified version of this excellent tutorial.

Why?

First, parallelizing algorithms is fun!

And in another project, I implemented a convolution reverb algorithm using FFTs. The FFTs are the bottleneck of that implementation, so I wanted to see whether offloading them to the GPU would improve performance, and by how much.

How?

As always, I try to tackle "one small problem at a time":

First, I started just with the tutorial, and verified I could make it run on my machine.

Then, I iteratively made the tutorial more complex, adding one small new aspect at a time, before implementing the FFT (the Cooley-Tukey algorithm, without bit-reversal of the input).
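For reference, here is what a radix-2 decimation-in-time Cooley-Tukey FFT looks like on the CPU. This is not the repository's kernel, just a minimal sketch of the algorithm; the recursive formulation reorders the input implicitly, which is why no explicit bit-reversal pass is needed:

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

// Recursive radix-2 Cooley-Tukey FFT, decimation in time.
// The even/odd split of the recursion does the reordering implicitly,
// so no explicit bit-reversal pass is needed. a.size() must be a
// power of two.
void fft(std::vector<std::complex<double>> &a) {
  const std::size_t n = a.size();
  if (n <= 1) return;
  std::vector<std::complex<double>> even(n / 2), odd(n / 2);
  for (std::size_t i = 0; i < n / 2; ++i) {
    even[i] = a[2 * i];
    odd[i]  = a[2 * i + 1];
  }
  fft(even);
  fft(odd);
  const double pi = std::acos(-1.0);
  for (std::size_t k = 0; k < n / 2; ++k) {
    // Twiddle factor e^{-2*pi*i*k/n}
    const std::complex<double> w = std::polar(1.0, -2.0 * pi * static_cast<double>(k) / static_cast<double>(n));
    a[k]         = even[k] + w * odd[k];
    a[k + n / 2] = even[k] - w * odd[k];
  }
}
```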

At every iteration, I checked that the kernel code behaved as intended by comparing its results with an equivalent CPU-based implementation.
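A check of this kind can be as simple as an element-wise comparison with a tolerance; floating-point results differ slightly between devices, so an exact comparison would spuriously fail. This is a sketch, not code from the repository; the function name and tolerance value are illustrative:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Compare a GPU result buffer against a CPU reference, element-wise,
// within a small absolute tolerance.
bool resultsMatch(const std::vector<float> &gpu,
                  const std::vector<float> &cpu,
                  float tolerance = 1e-4f) {
  if (gpu.size() != cpu.size()) return false;
  for (std::size_t i = 0; i < gpu.size(); ++i) {
    if (std::fabs(gpu[i] - cpu[i]) > tolerance) return false;
  }
  return true;
}
```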

And using the environment variable CL_LOG_ERRORS=stdout made debugging kernel compilation errors a lot easier!

Next Steps

  • experiment with changing the number of items in the work-group (compensating with the number of local butterflies): is it better to have many items or many local butterflies? Should we auto-tune that?
  • experiment with changing the radix, and auto-tune it.
  • use images to have faster access to global memory:
    • for faster read-only access to inputs, use an image and float4 read_imagef
    • for faster writes to the output, use an image and write_imagef
    • read/write images are OpenCL 2.0 only, but in practice passing the same image twice with different qualifiers can work, depending on the driver and hardware.
  • use images for the twiddles, and see if that is faster than computing them on the fly (especially at high precision, and for double).
  • Alternate global memory reads with computations in the first level, to overlap compute time with memory latency.
  • use the idea in https://mc.stanford.edu/cgi-bin/images/7/75/SC08_FFT_on_GPUs.pdf where private memory is used
  • instead of doing all levels in a single kernel, try one kernel per level, and use images to store intermediate results. The code could be faster because more values can be precomputed, and fewer registers may be used.
  • Try the Stockham formulation for big FFTs.
  • Compare with other open-source GPU FFT implementations (for example, https://github.com/clMathLibraries/clFFT).
  • Implement an in-place FFT.
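As a starting point for the twiddle-table idea above, the factors can be precomputed once on the host; this sketch (the function name is illustrative, not from the repository) builds the table that could then be uploaded to an image or buffer instead of calling sin/cos in every work-item:

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

// Precompute the twiddle factors e^{-2*pi*i*k/n} for k = 0..n/2-1.
// For a radix-2 FFT of size n (a power of two), only the first half
// of the unit circle is needed.
std::vector<std::complex<float>> makeTwiddles(std::size_t n) {
  const double pi = std::acos(-1.0);
  std::vector<std::complex<float>> t(n / 2);
  for (std::size_t k = 0; k < n / 2; ++k) {
    const double angle = 2.0 * pi * static_cast<double>(k) / static_cast<double>(n);
    t[k] = std::complex<float>(static_cast<float>(std::cos(angle)),
                               static_cast<float>(-std::sin(angle)));
  }
  return t;
}
```

Computing the table in double and storing it in float keeps the host-side cost negligible while giving slightly better-rounded single-precision twiddles.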

Platforms

Using CMake, you can build and run it on a recent version of macOS.

Other platforms are not supported yet, but it should mostly be a matter of making the CMakeLists.txt more general in the way it links to the OpenCL library.
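One plausible direction, assuming a reasonably recent CMake: the bundled FindOpenCL module locates the library portably on macOS, Linux and Windows, so the link step could be written against its imported target (the target name below is illustrative, not the project's actual one):

```cmake
# FindOpenCL ships with CMake and defines the OpenCL::OpenCL imported
# target (CMake >= 3.7), abstracting over -framework OpenCL on macOS
# versus -lOpenCL elsewhere.
find_package(OpenCL REQUIRED)
add_executable(gpgpu-experiments main.cpp)
target_link_libraries(gpgpu-experiments PRIVATE OpenCL::OpenCL)
```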

Contributions

PRs are welcome, for example to generalize the CMakeLists.txt file to make it build and run on Linux or Windows.
