Collaborator

@martin-frbg commented Apr 13, 2023

For #3995: the M2 cpuid was still not matched correctly, the workaround for a corner case in DNRM2 could be optimized out by the compiler, and, most importantly, several ARM64 assembly kernels were still using (the 32-bit lower half of) register 18, which is reserved on OSX. The previous round of fixes had only matched on x18, not w18.
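
The x18/w18 slip is easy to catch mechanically, because w18 is just the 32-bit view of x18 and a search for "x18" alone never sees it. Below is a minimal sketch of such a check. It assumes the kernels are plain-text `.S` files that use `//` line comments, and `find_reg18_uses` is a hypothetical helper, not something that exists in the OpenBLAS tree.

```python
import re

# Matches both the 64-bit (x18) and 32-bit (w18) names of register 18,
# as whole tokens only, so that e.g. "x180" is not reported.
REG18 = re.compile(r"\b[xXwW]18\b")


def find_reg18_uses(source: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for each line using x18 or w18.

    Anything after '//' is treated as a comment and ignored.
    """
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = line.split("//", 1)[0]  # drop the trailing comment, if any
        if REG18.search(code):
            hits.append((lineno, line.strip()))
    return hits


if __name__ == "__main__":
    kernel = (
        "#define alpha w18\n"
        "#define vec_len x19\n"
        "    mov x18, x0  // uses x18\n"
        "    add x0, x1, x2 // x18 only in comment\n"
    )
    for lineno, text in find_reg18_uses(kernel):
        print(lineno, text)
```

Here line 1 (`w18`) and line 3 (`x18`) are flagged, while the mention of x18 in the comment on line 4 is not. A pattern that only looked for `x18` would have missed the `#define alpha w18` line, which is exactly the kind of case that slipped through earlier.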

@martin-frbg changed the title from "Issue3995" to "Fix instabilities in CGEMM/CTRMM/DNRM2 on Apple M1/M2 under OSX" on Apr 13, 2023
@martin-frbg added this to the 0.3.24 milestone on Apr 14, 2023
#define alpha w18
#define vec_len x19
#define alpha w19
#define vec_len x20
Contributor

This overlaps with the register defined below, and I don't believe this kernel is in the scope of this PR?
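
Renumbering registers one step at a time (w18 to w19, x19 to x20, ...) makes it easy to land two macros on the same physical register, since `wN` and `xN` are two views of register N. The following is a minimal sketch of a check for that, assuming the register assignments are all simple `#define NAME xN` / `#define NAME wN` lines. `find_register_aliases` is hypothetical, and the `veclen2` define in the usage example is an illustration, not the actual contents of the kernel under review.

```python
import re
from collections import defaultdict

# "#define NAME xN" or "#define NAME wN"; wN is the low 32 bits of xN,
# so both spellings refer to the same general-purpose register N.
DEFINE_REG = re.compile(r"^\s*#define\s+(\w+)\s+[xXwW](\d+)\b")


def find_register_aliases(source: str) -> dict[int, list[str]]:
    """Map register number -> macro names, for registers bound more than once."""
    users: dict[int, list[str]] = defaultdict(list)
    for line in source.splitlines():
        m = DEFINE_REG.match(line)
        if m:
            name, num = m.groups()
            users[int(num)].append(name)
    return {reg: names for reg, names in users.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical defines after shifting everything up by one register,
    # with a later define that was not moved along.
    defines = (
        "#define alpha w19\n"
        "#define vec_len x20\n"
        "#define veclen2 x20\n"
    )
    print(find_register_aliases(defines))
```

This reports register 20 as bound to both `vec_len` and `veclen2`. The sketch only looks at general-purpose `x`/`w` names; SIMD/FP registers (`v`, `d`, `s`, ...) and registers used directly in instructions without a macro are out of its scope.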

Contributor

Potentially we could run ARMV8SVE on one of the AWS nodes for testing this in CI?

Collaborator Author

Indeed, I missed moving veclen2 to x21. It is not exactly in scope, but it is the same problem (once Apple releases a Vortex CPU with SVE; I was not sure about the exact capabilities of the M2 at the time). I can drop this file from the PR if you prefer, though.

Contributor

Future cores might not have the same requirements, so it may be better to hold off?

Also, let me know if I can help with the CI configuration; I have a history of missing things, after all...

Collaborator Author

Reverted, thanks. Sadly, none of our current CI providers offers SVE-capable hardware as far as I know; my blunder should eventually have been caught by running the tests locally.

Contributor

@Mousius left a comment

LGTM!
