908a1ff to 2b0dcb8
In comparison, lud.fut performs about the same as the old lud-clean.fut, whereas the Rodinia implementation is a lot faster than ours.
What is the difference between lud.fut and lud-clean.fut? And if we really want a version that uses this kind of hack, should it then go in the "clean" version?
I'm not sure there's much point in keeping both lud.fut and lud-clean.fut, as both seem to perform about the same (before this PR). I believe the original intent was to have one implementation that was easy (or easier) to understand and modify, and one that ran fast, but that doesn't seem necessary any more.
I'll merge this if you create one single |
lud-clean was originally created as a nicer but slower implementation of lud. However, it is not actually any slower any more, so we should replace lud with lud-clean. This commit does so.
This version of lud_diagonal uses intra-group parallelism and is much faster with the right tuning parameters.
81caf4c to 305f771
I've pushed new commits that merge lud-clean and lud, and apply the changes to lud_diagonal that I've been working on.
Use faster lud_diagonal Former-commit-id: 26964cb
This version of lud_diagonal uses intra-group parallelism and is faster
with the right tuning parameters and incremental flattening:
Before:
After:
on gpu04-diku-apl.