More GPU Performance Optimization #1399
For the first question: see https://github.com/XFluids/XFluids/blob/master/src/solver_Reconstruction/FDM_Method/Reconstruction_kernels.hpp, line 8, where the tested SYCL kernel function, ReconstructFluxX, is defined. For the CUDA/HIP code, we wrap the ReconstructFluxX kernel in a CUDA/HIP kernel function named ReconstructFluxXVendorWrapper, and we also add the __launch_bounds__() attribute through a macro, VENDOR_KERNEL_LB(__LBMt, 1), where __LBMt is set to 256 for both CUDA and HIP. The CMake option VENDOR_SUBMIT in https://github.com/XFluids/XFluids/blob/master/CMakeLists.txt selects either SYCL parallelism (VENDOR_SUBMIT OFF) or CUDA/HIP parallelism (VENDOR_SUBMIT ON). Additionally, we tested the kernel to find an optimal local block shape (the work-group size in SYCL) for its execution; XFLUIDS does this automatically. For the second question: it is a little complex. You run XFLUIDS until it writes out a binary file, then rerun XFLUIDS, which reads that binary file to get the best performance. In short, you compile XFLUIDS in build (the platform and architecture are set in lines 7 and 8 of https://github.com/XFluids/XFluids/blob/master/CMakeLists.txt), assuming build is the current working directory.
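The two-run workflow described above (tune the block shape once, store it in a binary file, read it back on later runs) can be sketched roughly as follows. This is a minimal illustration under assumptions, not XFLUIDS code: the `BlockShape` struct, the helpers `tune_block_size` and `load_or_tune`, the candidate list, and the cache path are all hypothetical. Timing is injected as a `Measure` callable so the selection logic is independent of whether the kernel is launched through SYCL or CUDA/HIP.

```cpp
#include <cstdio>
#include <fstream>
#include <functional>
#include <limits>
#include <string>
#include <vector>

// Hypothetical local block shape (work-group size in SYCL terms).
struct BlockShape {
    int x, y, z;
};

// Returns the elapsed time of one launch with the given shape. In real code this
// would submit the kernel, wait for completion, and read a std::chrono timer.
using Measure = std::function<double(const BlockShape&)>;

// First run: time every candidate shape and keep the fastest one.
BlockShape tune_block_size(const std::vector<BlockShape>& candidates, const Measure& measure) {
    BlockShape best = candidates.front();
    double best_time = std::numeric_limits<double>::max();
    for (const BlockShape& s : candidates) {
        const double t = measure(s);
        if (t < best_time) {
            best_time = t;
            best = s;
        }
    }
    return best;
}

// Reuse the shape stored in the binary file if it can be read;
// otherwise tune now and write the result for the next run.
BlockShape load_or_tune(const std::string& path,
                        const std::vector<BlockShape>& candidates,
                        const Measure& measure) {
    BlockShape s{};
    {
        std::ifstream in(path, std::ios::binary);
        if (in.read(reinterpret_cast<char*>(&s), sizeof s)) {
            return s;  // later run: skip tuning
        }
    }
    s = tune_block_size(candidates, measure);  // first run: tune
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(&s), sizeof s);
    return s;
}
```

Passing the measurement in as a callable keeps the cache logic testable with a fake cost function; the raw `BlockShape` dump is only portable between runs on the same machine and build, which matches the "tune, then rerun" usage.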
The timer is implemented with std::chrono; see https://github.com/XFluids/XFluids/blob/master/src/solver_Reconstruction/FDM_Method/ConVenction_block.hpp, line 219. A time point is created, then we submit the kernel and wait for it to synchronize; once the wait returns, we take the duration between the earlier time point and the current one. For the second question: we have never tried this before, but we will try it later.
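The measurement pattern described above (time point, submit, wait, difference) can be sketched in a few lines of standard C++. This is a hedged illustration, not the code at line 219 of ConVenction_block.hpp: the helper name `time_kernel_us` is hypothetical, the choice of `std::chrono::steady_clock` is an assumption, and `submit_and_wait` stands in for whatever launches the kernel and blocks until it finishes.

```cpp
#include <chrono>
#include <functional>

// Measures the wall-clock time, in microseconds, of a blocking kernel submission.
// `submit_and_wait` must not return until the device work has finished;
// otherwise only the asynchronous launch overhead is measured.
double time_kernel_us(const std::function<void()>& submit_and_wait) {
    const auto start = std::chrono::steady_clock::now();  // timer spot
    submit_and_wait();                                     // submit + synchronize
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(end - start).count();
}
```

With a SYCL queue, the callable could submit the kernel and call `.wait()` on the returned event; for CUDA/HIP it would launch the kernel and then call `cudaDeviceSynchronize()` or `hipDeviceSynchronize()`. `steady_clock` is used here because it is monotonic, so the measured interval cannot be distorted by system clock adjustments.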
A range of GPUs has been tested with the following kernel, which is part of our open-source CFD framework XFLUIDS:
LLVM 16.0.6 is used on the A100 and AMD ROCm 5.4.1 on the RX 6800 XT. Because this did not give performance competitive with the CUDA/HIP model, we are now using a multi-pass compile system.
The performance results are listed here:
The kernel code is listed here: