Add Futhark implementation #146
Signed-off-by: Jeff Hammond <jehammond@nvidia.com>
Since Futhark compiles to C code, it can use the same main.cpp as the various C++ implementations.
Perhaps the CI scripts should be updated as well?
The more exotic implementations also aren't tested in CI, so I intentionally didn't add that to keep the invasiveness of this PR low. But I can do so if desired, no problem. (Incidentally, someone really ought to make a GA action for Futhark.)
We do have CI set up for the Rust, Julia, and Java variants, so it would be good to have it for Futhark too.
Alright, I'll add a CI step for Futhark.
I have added a very simple Futhark action. It only tests a single version of cmake (the preinstalled one). If you want, I can also try to fit it into the C++ framework, but I don't think it's worth it (and might make it more difficult to test other Futhark backends).
Thanks for the PR, and sorry for the late reply. I'm taking a look now (running benchmarks, etc.). One thing I've noticed is the lack of a device enumeration API in the Futhark runtime (although apparently you can set devices, or even be presented with a dialog in the OpenCL case). This was a bit problematic, as I'm testing on machines with more than one OpenCL platform. As a workaround, we may have to implement device enumeration by replicating the logic in the OpenCL and CUDA models.
It wouldn't be difficult to add. I can take a swing at it.
@athas I do have to say, the generated C code for multicore CPU is more readable than a good portion of the C libraries out there!
Then the state of C libraries is more dire than I thought. The Futhark-generated OpenCL/CUDA APIs allow one to select a device by index, but not to enumerate all devices (except through the menu). Would it be OK to only implement the selection, but not the enumeration?
I can just copy the device enumeration code from the |
I have added device selection now.
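For readers unfamiliar with selection by index, here is a minimal sketch of what it can look like from the C++ side. It assumes the Futhark-generated OpenCL/CUDA library exposes `futhark_context_config_set_device` and that this call accepts a `#k` string to pick the k-th device (numbered from zero). The helper name `device_selector` is hypothetical, and this is not necessarily the code added in this PR.

```cpp
#include <string>

// Hypothetical helper: build the "#k" selector string for the k-th device
// (zero-based). Assumption: the Futhark-generated OpenCL/CUDA API treats a
// device string of this form as "pick the device at this index".
std::string device_selector(int index) {
    return "#" + std::to_string(index);
}

// Sketch of the intended use with a Futhark-generated library. This part is
// an assumption and is left as a comment because it needs the generated
// header to compile:
//
//   struct futhark_context_config *cfg = futhark_context_config_new();
//   futhark_context_config_set_device(cfg, device_selector(1).c_str());
//   struct futhark_context *ctx = futhark_context_new(cfg);
```

Selecting by index keeps the benchmark driver independent of the backend: the index is forwarded as a string, and the backend decides which physical device it maps to.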
I've got benchmark results for a few platforms. Which is excellent.
>./build/omp-stream --arraysize 536870912
BabelStream
Version: 4.0
Implementation: OpenMP
Running kernels 100 times
Precision: double
Array size: 4295.0 MB (=4.3 GB)
Total size: 12884.9 MB (=12.9 GB)
Function    MBytes/sec  Min (sec)   Max         Average
Copy        25610.437   0.33541     0.36010     0.34216
Mul         25402.003   0.33816     0.35810     0.34445
Add         29010.697   0.44414     0.47372     0.45097
Triad       28878.115   0.44618     0.47433     0.45268
Dot         44377.658   0.19356     0.21538     0.20067
>./build/futhark-stream --arraysize 536870912
BabelStream
Version: 4.0
Implementation: Futhark (parallel CPU)
Running kernels 100 times
Precision: double
Array size: 4295.0 MB (=4.3 GB)
Total size: 12884.9 MB (=12.9 GB)
Function    MBytes/sec  Min (sec)   Max         Average
Copy        26107.059   0.32903     0.35351     0.33425
Mul         25928.510   0.33129     0.35788     0.33712
Add         29328.709   0.43933     0.46747     0.44469
Triad       28930.892   0.44537     0.47550     0.45077
Dot         44656.475   0.19236     0.20958     0.19728
Performance is still good on a dual-socket Intel Xeon Gold 6338 (2 NUMA domains) system compared to OpenMP:
But for a dual-socket AMD EPYC 7713 (8 NUMA domains), the performance is quite poor:
I didn't do the scaling here due to time constraints.
Yes, the multicore backend is completely NUMA-ignorant. The GPU backends are much more mature.
LGTM, I'm trying to validate the CI by merging this with |
Apparently it is impossible for me to do so for weird GitHub reasons. I can see about resolving the conflicts myself.
Thanks for the merge, I've approved it now, let's wait for CI.
Minor changes to get the CI going again
CI fails due to an unrelated job running out of disk space. I wonder why that is not an issue on the |
No worries, it's probably because we ran out of cache due to how big the compilers are (we download and untar NVHPC in the setup). Thanks again and sorry about the slow turnaround.
* Add Futhark.
I'm at SC22 and noticed that BabelStream is pretty popular, and I really like the idea of polyglot benchmark suites. So, here's an implementation in Futhark. I don't know if this is too obscure a language, but it's easy to invoke from C and C++ and so can use the same tooling as all the C++ implementations.