Add optimizations for AMD GPUs to VkFFT
* Do not load from shared memory inside if branches

    hipcc generates slower jumps and duplicated loads
    instead of simple conditional moves (v_cndmask_b32).

* Use vector operations to make better use of gfx90a's packed f32 math

* Add input and output wrappers for faster load and store address computation

* Replace blockDim with actual values
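
The first item can be illustrated with a minimal host-side C++ sketch. This is not the code VkFFT generates: the buffer `sdata`, the size `kSize`, and both function names are hypothetical, and a plain array stands in for shared memory. The idea is to issue both loads unconditionally and pick the result afterwards, so the compiler only has to select between two values that are already in registers (which on AMD hardware can map to v_cndmask_b32) rather than emit a jump around each load.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a shared-memory buffer (illustrative only).
constexpr std::size_t kSize = 8;

// Branchy form: each branch contains its own load, so the compiler may
// emit a jump and duplicate the load instructions.
float load_in_branch(const std::array<float, kSize>& sdata, std::size_t i, bool swap) {
    float v;
    if (swap) {
        v = sdata[(i + kSize / 2) % kSize];
    } else {
        v = sdata[i];
    }
    return v;
}

// Branch-free form: both loads happen unconditionally, and the condition
// only selects between two values that are already loaded.
float load_then_select(const std::array<float, kSize>& sdata, std::size_t i, bool swap) {
    const float direct  = sdata[i];
    const float swapped = sdata[(i + kSize / 2) % kSize];
    return swap ? swapped : direct;
}
```

Both functions return the same value for every input. The second form trades one extra load for the absence of control flow, which is the trade-off this commit describes as favourable under hipcc.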
ex-rzr committed Sep 8, 2022
1 parent 092286b commit d97c6bd
Showing 1 changed file with 125 additions and 80 deletions.