- 
                Notifications
    You must be signed in to change notification settings 
- Fork 1.6k
Fix instabilities in CGEMM/CTRMM/DNRM2 on Apple M1/M2 under OSX #4003
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
        
          
                kernel/arm64/sgemm_kernel_sve_v2x8.S
              
                Outdated
          
        
      | #define alpha w18 | ||
| #define vec_len x19 | ||
| #define alpha w19 | ||
| #define vec_len x20 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This overlaps with the below register, and I don't believe this kernel is in scope of this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potentially we could run ARMV8SVE on one of the AWS nodes for testing this in CI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, I missed moving veclen2 to x21. And not exactly in scope, but same problem (once Apple releases a Vortex cpu with SVE - I was not sure about the exact capabilities of M2 at the time) - but I can drop this file from the PR if you prefer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The future cores might not have the same requirements so may be better to hold off?
Also, let me know if I can help with CI configuration, I have a history of missing things after all...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverted, thanks. And sadly none of our current CI providers offers SVE-capable hardware as far as I know - my blunder should eventually have been caught by running tests locally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
for #3995 - M2 cpuid was still not matched correctly, the workaround for a corner case in DNRM2 could be optimized out by the compiler, and most importantly several ARM64 assembly kernels were still using (half of) register 18 that is reserved on OSX (as the previous round of fixes had only matched on x18, not w18)