rocFFT 1.0.15 for ROCm 5.0.0

@lawruble13 released this 09 Feb 21:45
fb0d3f8

Changed

  • Re-aligned split device library into 4 roughly equal libraries.
  • Implemented the FuseShim framework to replace the original OptimizePlan.
  • Implemented the generic buffer-assignment framework. Buffer assignment
    is no longer performed by each node; instead, a generic algorithm
    tests the candidate assignment paths and picks the best one.
    With the help of FuseShim, this allows more kernel fusions.
  • Do not read the imaginary part of the DC and Nyquist modes for even-length
    complex-to-real transforms.
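
To illustrate why the imaginary parts of the DC and Nyquist modes can be skipped, here is a minimal pure-Python sketch of an unnormalized even-length complex-to-real inverse DFT. It is not rocFFT code; the function name `c2r_even` and the direct O(N^2) summation are assumptions chosen for clarity. For a real output signal, both modes are mathematically real, so the sketch reads only their real parts.

```python
import cmath


def c2r_even(half_spectrum):
    """Unnormalized complex-to-real inverse DFT for even length N.

    half_spectrum holds the N/2 + 1 non-redundant modes X[0..N/2].
    Hypothetical helper for illustration only, not part of rocFFT.
    """
    half = len(half_spectrum) - 1
    n_total = 2 * half
    dc = half_spectrum[0].real          # imaginary part is never read
    nyquist = half_spectrum[half].real  # imaginary part is never read
    out = []
    for n in range(n_total):
        # The Nyquist mode contributes (-1)^n * Re(X[N/2]).
        acc = dc + (nyquist if n % 2 == 0 else -nyquist)
        # Each interior mode stands for itself and its conjugate mirror.
        for k in range(1, half):
            w = cmath.exp(2j * cmath.pi * k * n / n_total)
            acc += 2.0 * (half_spectrum[k] * w).real
        out.append(acc)
    return out


# Forward DFT of [1, 2, 3, 4] is [10, -2+2j, -2, -2-2j]; keep the first 3 modes.
clean = [10 + 0j, -2 + 2j, -2 + 0j]
# Same spectrum with arbitrary values in the ignored imaginary slots.
dirty = [10 + 5j, -2 + 2j, -2 - 7j]
```

Calling `c2r_even(clean)` and `c2r_even(dirty)` gives the same result (the original signal scaled by N = 4, up to rounding), because the stray imaginary values never enter the computation.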

Optimizations

  • Optimized twiddle-conjugation; complex-to-complex inverse transforms should now perform similarly to forward transforms.
  • Improved performance of single-kernel small 2D transforms.
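
As background for the twiddle-conjugation item above: an inverse DFT uses the complex conjugates of the forward twiddle factors, so one twiddle table can serve both directions. The pure-Python sketch below shows that relationship with a direct O(N^2) DFT. The names `make_twiddles` and `dft` are hypothetical, and this says nothing about how rocFFT's kernels actually implement or optimize the conjugation.

```python
import cmath


def make_twiddles(n):
    """Forward twiddle table: w[k] = exp(-2*pi*i*k/n)."""
    return [cmath.exp(-2j * cmath.pi * k / n) for k in range(n)]


def dft(x, twiddles, inverse=False):
    """Unnormalized DFT that reuses the forward twiddle table.

    For the inverse direction each twiddle is conjugated on the fly
    instead of building a second table. Illustration only.
    """
    n = len(x)
    out = []
    for k in range(n):
        acc = 0j
        for j in range(n):
            w = twiddles[(j * k) % n]
            if inverse:
                w = w.conjugate()
            acc += x[j] * w
        out.append(acc)
    return out


signal = [1 + 2j, -1 + 0.5j, 3 - 1j, 0 + 4j]
table = make_twiddles(len(signal))
spectrum = dft(signal, table)
# Unnormalized round trip: divide by N to recover the input.
roundtrip = [v / len(signal) for v in dft(spectrum, table, inverse=True)]
```

Here `roundtrip` matches `signal` up to floating-point rounding, which confirms that conjugating the forward twiddles yields the inverse transform.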