New OpenCL FFT implementation #144

Closed
GoogleCodeExporter opened this issue Aug 12, 2015 · 2 comments
Labels
comp-Logic (Related to internal code logic) · feature (Allows new functionality) · OpenCL (Running on GPUs and similar devices) · performance (Simulation speed, memory consumption) · pri-Medium (Worth assigning to a milestone)

Comments

@GoogleCodeExporter

Currently, the performance of adda_ocl is limited by the Apple clFFT it uses,
which was originally created mostly as a proof of principle. An emerging
alternative is the AMD implementation. It should be faster than the Apple one
and support radices 3 and 5 (like the Temperton FFT). It remains unclear
whether it can be used with Nvidia cards.
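Support for radices 3 and 5 matters because the grid then does not have to be padded to a power of 2: any size whose only prime factors are 2, 3 and 5 can be transformed directly. A minimal sketch in plain Python, using hypothetical helper names (`is_235_smooth`, `next_235_size`, `next_pow2_size`) that are not part of adda_ocl or any clFFT library, of how the admissible sizes compare:

```python
def is_235_smooth(n: int) -> bool:
    """Return True if n >= 1 has no prime factors other than 2, 3 and 5."""
    if n < 1:
        return False
    for p in (2, 3, 5):
        # Strip every factor of p from n.
        while n % p == 0:
            n //= p
    return n == 1


def next_235_size(n: int) -> int:
    """Smallest m >= max(n, 1) that a radix-2/3/5 FFT can handle directly."""
    m = max(n, 1)
    while not is_235_smooth(m):
        m += 1
    return m


def next_pow2_size(n: int) -> int:
    """Smallest power of 2 that is >= n (what a radix-2-only FFT would need)."""
    m = 1
    while m < n:
        m *= 2
    return m
```

For instance, `next_235_size(513)` is 540, whereas a power-of-2-only FFT would need 1024 along that dimension.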

Overall, finding and linking to more advanced OpenCL FFT implementations is 
definitely the main direction for development of adda_ocl.

Original issue reported on code.google.com by yurkin on 16 Apr 2012 at 1:33

@GoogleCodeExporter

Use of the AMD FFT library is controlled via ocl/Makefile since r1155.
The backward FFT of the AMD library still seems to produce errors, which was a
known problem for certain power-of-2 sizes where radix-4 and radix-8 passes are
involved. The problem persists with an AMD Radeon HD 5870 (2 GB), the
proprietary AMD Catalyst driver 12.4 and 12.6, and AMD APPML FFT versions
1.6.244 and 1.8 Beta.
Timings were measured using the AMD FFT as the forward FFT and the Apple FFT as
the backward FFT.
If the AMD backward FFT can also be used inside ADDA, then with a Radeon HD 5870
it will speed up the FFT part to about a factor of 10 of the arithmetic part.
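A faulty backward transform like the one described above can be isolated by testing it alone against a slow reference DFT for the affected power-of-2 sizes, and separately checking the forward-then-backward round trip. Mixing libraries (one forward, another backward) only works if both use the same sign and scaling conventions. A minimal sketch in plain Python, assuming unnormalized transforms with exp(-2πi jk/N) forward and exp(+2πi jk/N) backward; the helper names are hypothetical, and no AMD or Apple library is called here (a real test would wrap the library's backward transform as the `backward` callable):

```python
import cmath
import random


def naive_dft(x, sign):
    """O(N^2) reference DFT; sign=-1 for forward, +1 for backward (unnormalized)."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def max_rel_error(a, b):
    """Largest |a_k - b_k|, relative to the largest |b_k|."""
    scale = max(abs(v) for v in b) or 1.0
    return max(abs(u - v) for u, v in zip(a, b)) / scale


def check_backward(backward, n, tol=1e-9, seed=0):
    """Test a candidate unnormalized backward FFT of size n.

    Returns (backward_error, roundtrip_error, ok): the first compares the
    candidate with the reference backward DFT, the second checks that
    reference forward + candidate backward, divided by n, restores the input.
    """
    rng = random.Random(seed)
    x = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    spectrum = naive_dft(x, -1)
    candidate = backward(spectrum)
    back_err = max_rel_error(candidate, naive_dft(spectrum, +1))
    rt_err = max_rel_error([v / n for v in candidate], x)
    return back_err, rt_err, back_err < tol and rt_err < tol
```

For example, `check_backward(lambda s: naive_dft(s, +1), 16)` reports `ok == True`, while a transform with the wrong exponent sign reports a large error in both measures.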

r1155 - bf6ae18

Original comment by Marcus.H...@gmail.com on 15 Aug 2012 at 10:09

@GoogleCodeExporter

This issue was closed by revision r1178.

r1178 - 746455d

Original comment by yurkin on 18 Jan 2013 at 7:46

  • Changed state: Fixed

GoogleCodeExporter added the OpSys-All, comp-Logic, performance, pri-Medium and OpenCL labels on Aug 12, 2015
myurkin modified the milestone: 1.2 (Aug 13, 2015)
myurkin added the feature label on Aug 13, 2015