General mostly performance-neutral improvements #97
5527530 to 0044721
I think we might as well get this merged before it grows too big.
These all look like great improvements, in both code and documentation.
Looks all good to me
Also remove unnecessary duplicate of JGL.
Apparently this is unnecessary, so we remove it, and all ranks will go from north to south.
The parent scope (BUTTERFLY_ALG_MOD) has this already.
Also remove the copy to ZVECIN; it served no purpose.
In both cases we have a temporary variable with the same type and shape as what is being copied into it. These can be deleted.
The previous combination of IF statements led to an unnecessarily large number of distinct calls to GEMM. GEMM now wraps both the double- and single-precision versions, so we can take advantage of this to reduce the amount of code repetition.
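For readers unfamiliar with the pattern, here is a minimal sketch of a precision-generic GEMM wrapper. It is not the ecTrans code: ecTrans is Fortran, where the same effect is typically obtained with a generic INTERFACE block, and the names `gemm` and `gemm_impl` plus the naive matrix product standing in for the BLAS routines are assumptions made for illustration only.

```cpp
#include <cassert>

// Hypothetical stand-in for a precision-specific BLAS routine.
// Naive column-major product C = A * B, with A (m x k), B (k x n), C (m x n).
template <typename T>
static void gemm_impl(int m, int n, int k, const T* a, const T* b, T* c) {
    assert(m > 0 && n > 0 && k > 0);
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            T sum = T(0);
            for (int l = 0; l < k; ++l) {
                sum += a[i + l * m] * b[l + j * k];
            }
            c[i + j * m] = sum;
        }
    }
}

// One generic name, resolved by argument type at compile time.
// Single-precision version.
void gemm(int m, int n, int k, const float* a, const float* b, float* c) {
    gemm_impl<float>(m, n, k, a, b, c);
}

// Double-precision version.
void gemm(int m, int n, int k, const double* a, const double* b, double* c) {
    gemm_impl<double>(m, n, k, a, b, c);
}
```

With a wrapper of this kind, the caller writes a single `gemm(...)` call and the working precision is picked from the argument types, so no IF branch on precision is needed at the call site.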
36a6734 to 2a2581a
This PR will collect suggested changes that came as a result of the deep-dive into ecTrans with Nvidia colleagues.