Add RAJA view performance test to benchmark #1728
@artv3 Move this to the benchmark folder.
I ran the code on Pascal and got the following results:
[results not preserved]
Which GPUs are on pascal? It seems strange that they would be ~24x slower than the V100s on lassen.
Tesla P100-PCIE-16GB, I believe.
Are we running the same code in the same way on both of these platforms? How did you build?
We found the issue: my code was built for debugging, which caused the slowdown.
LGTM, just some small nits about making sure the benchmark makes sense for all backends.
@johnbowen42 @rhornung67 Can I get another review? I just pushed the changes I thought I had already pushed.
After resolving issue #1718, this PR now adds the performance test to the benchmark folder.
//------------
This PR adds the code provided in issue #1718 in an effort to reproduce the slowdown.
I don't have access to pascal, but on lassen I see comparable performance with and without the RAJA view:
Elapsed time with RAJA view : 0.0951086
Elapsed time with NO RAJA view : 0.0952884
To avoid measuring stream initialization, I added a basic forall at the start of the program.
Compiler setup:
nvcc: nvcc11.2.0, cuda_arch=70, gcc8.3.1
I also tried nvcc11.8.0, cuda_arch=70, gcc8.3.1, with these results:
Elapsed time with RAJA view : 0.0949394
Elapsed time with NO RAJA view : 0.0949237
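Below is a host-only C++ sketch of the measurement pattern described in this PR. It does three things: an untimed warm-up run before each timed run (the role the initial forall plays in the benchmark), the same kernel written once through a view-style index wrapper and once with manual indexing, and an `NDEBUG` check as a cheap guard against the debug-build pitfall found earlier in this thread. `View2D`, `scale_with_view`, `scale_raw`, `time_after_warmup`, `ndebug_defined`, and `compare_view_vs_raw` are hypothetical names. This is not RAJA's `View`/`Layout` API, and it does not exercise the CUDA backend.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical row-major 2D view over a raw pointer: (i, j) -> i * ncols + j.
// Illustrative only; this is NOT RAJA::View / RAJA::Layout.
template <typename T>
class View2D {
 public:
  View2D(T* data, std::size_t nrows, std::size_t ncols)
      : data_(data), nrows_(nrows), ncols_(ncols) {}
  T& operator()(std::size_t i, std::size_t j) const { return data_[i * ncols_ + j]; }
  std::size_t rows() const { return nrows_; }
  std::size_t cols() const { return ncols_; }

 private:
  T* data_;
  std::size_t nrows_;
  std::size_t ncols_;
};

// True when NDEBUG is defined. CMake's Release configuration adds -DNDEBUG by
// default and Debug does not, so this is only a hint that a debug build is
// being timed; it does not inspect optimization flags directly.
inline bool ndebug_defined() {
#ifdef NDEBUG
  return true;
#else
  return false;
#endif
}

// Kernel written through the view.
inline void scale_with_view(View2D<double> v, double a) {
  for (std::size_t i = 0; i < v.rows(); ++i)
    for (std::size_t j = 0; j < v.cols(); ++j) v(i, j) *= a;
}

// Same kernel with manual index arithmetic on the raw pointer.
inline void scale_raw(double* data, std::size_t nrows, std::size_t ncols, double a) {
  for (std::size_t i = 0; i < nrows; ++i)
    for (std::size_t j = 0; j < ncols; ++j) data[i * ncols + j] *= a;
}

// Calls f once untimed (warm-up, so one-time startup costs are excluded),
// then returns the elapsed seconds of a second, timed call.
template <typename F>
double time_after_warmup(F&& f) {
  f();
  const auto start = std::chrono::steady_clock::now();
  f();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}

struct Timings {
  double with_view;
  double no_view;
};

// Times both kernels on identical inputs and checks that they agree.
inline Timings compare_view_vs_raw(std::size_t nrows, std::size_t ncols) {
  if (!ndebug_defined()) {
    std::fprintf(stderr, "warning: NDEBUG not defined; timings may reflect a debug build\n");
  }
  std::vector<double> a(nrows * ncols, 1.0);
  std::vector<double> b(nrows * ncols, 1.0);
  View2D<double> va(a.data(), nrows, ncols);
  Timings t{};
  t.with_view = time_after_warmup([&] { scale_with_view(va, 1.5); });
  t.no_view = time_after_warmup([&] { scale_raw(b.data(), nrows, ncols, 1.5); });
  assert(a == b);  // both arrays were scaled twice (warm-up + timed run)
  return t;
}
```

With optimizations enabled, compilers typically inline a small `operator()` like this one, so the two timings would be expected to be close, mirroring the comparable view / no-view numbers reported above for lassen.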