Alternate implementation of Intel PRK DGEMM #15835
I have some detailed comments inline. A couple of higher-level comments:
- I think this shouldn't create a new dir under test/studies/prk. You can create a dir under test/studies/prk/DGEMM named distributed, summa, etc. Or you can also dump these files with some suffix in test/studies/prk.
- We discussed with @ben-albrecht that we want this to be part of the nightly performance tracking suite. For example, see the XC suite here. Also see the multilocale performance testing doc. But doing that can be a follow-up. The sooner we begin measuring the performance, the better. You can track the impact of your future optimizations there.
@e-kayrakli I think it's ready for a review.
I think this looks good. I asked for a few things to be adjusted for better testing runs.
Moreover, it is important to talk about the deviation from SUMMA in a comment somewhere.
I tested this on our correctness testing machine and our 16-locale XC performance testing machine. @ben-albrecht could you also do a quick sanity check, especially w.r.t. the multilocale testing files and BLAS testing? ML tests run only on Crays, so I don't think we need a skipif. But I have been wrong on similar topics before...
If someone can do a sanity check on the testing, I think this PR can be merged now.
An alternate implementation of Intel PRK DGEMM
NOTE - Currently does not perform pipelined communication, which is inherent to SUMMA
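For readers unfamiliar with the algorithm, here is a rough, serial Python sketch of the SUMMA-style block outer-product structure with no pipelining of the broadcasts. It is not the Chapel code in this PR: the names `summa_unpipelined`, `split_blocks`, and `matmul_acc` are hypothetical, the p x p "process grid" is simulated with nested lists, and the panel width is assumed to equal the block size n/p.

```python
def matmul_acc(c, a, b):
    """Accumulate c += a @ b for dense blocks stored as lists of rows."""
    for i in range(len(a)):
        for k in range(len(b)):
            aik = a[i][k]
            for j in range(len(b[0])):
                c[i][j] += aik * b[k][j]


def split_blocks(m, p):
    """Split an n x n matrix into a p x p grid of (n/p) x (n/p) blocks."""
    bs = len(m) // p
    return [[[row[j * bs:(j + 1) * bs] for row in m[i * bs:(i + 1) * bs]]
             for j in range(p)]
            for i in range(p)]


def summa_unpipelined(a, b, p):
    """Simulate SUMMA on a p x p grid; each step waits for whole panels."""
    n = len(a)
    assert n % p == 0, "assumption: n divisible by the grid dimension"
    bs = n // p
    A, B = split_blocks(a, p), split_blocks(b, p)
    # C[i][j] is the block owned by simulated process (i, j).
    C = [[[[0.0] * bs for _ in range(bs)] for _ in range(p)] for _ in range(p)]

    for k in range(p):
        # Stand-in for the broadcasts: grid row i receives A[i][k],
        # grid column j receives B[k][j]. Nothing overlaps with compute.
        a_panel = [A[i][k] for i in range(p)]
        b_panel = [B[k][j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                matmul_acc(C[i][j], a_panel[i], b_panel[j])

    # Gather the blocks back into one n x n matrix.
    out = []
    for i in range(p):
        for r in range(bs):
            row = []
            for j in range(p):
                row.extend(C[i][j][r])
            out.append(row)
    return out
```

A pipelined variant would start forwarding the panels for step k+1 while the local multiply for step k is still running; the sketch above completes each step's data movement before computing, which is the deviation the NOTE refers to.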