Use faster lud_diagonal #11
908a1ff to 2b0dcb8
In comparison, lud.fut performs about the same as the old lud-clean.fut:
whereas the Rodinia implementation is a lot faster than ours:
What is the difference between lud.fut and lud-clean.fut? And if we really want a version that uses this kind of hack, should it then go in the "clean" version?
I'm not sure there's much point in keeping both lud.fut and lud-clean.fut, as both seem to perform about the same (before this PR). I believe the original intent was to have a nice implementation that was easy (or easier) to understand and modify and one that ran fast, but that doesn't seem necessary any more.
I'll merge this if you create one single |
lud-clean was originally created as a nicer but slower implementation of lud. However, it is no longer any slower, so we should replace lud with lud-clean. This commit does so.
This version of lud_diagonal uses intra-group parallelism and is much faster with the right tuning parameters.
81caf4c to 305f771
I've pushed new commits that merge lud-clean and lud, and apply the changes to lud_diagonal that I've been working on.
Use faster lud_diagonal Former-commit-id: 26964cb
This version of lud_diagonal uses intra-group parallelism and is faster
with the right tuning parameters and incremental flattening:
Before:
After:
on gpu04-diku-apl.