-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Regression in DGESDD / DLASCL on macOS Vortex M1 #3654
Comments
Sadly I cannot test on macOS/M1 with the gccfarm machine unavailable. I see no obvious error in the parameters, and all usual BLAS and LAPACK tests pass on the Linux/M1 machine there. (Getting numpy installed there may be a challenge). |
Adapted Fortran test code from #348 also runs fine on Linux/M1 |
@martin-frbg could you paste the file/commands here so a Fortran newbie (me) could compile + run a test locally? |
(Also, this might only be a bug with the 32-bit interface, if that matters -- TBD) |
gesdd.f.txt |
@martin-frbg indeed I have no problem running that code snippet on |
Oh I'm easily upset, especially by someone with an actual math degree. Just in this case there is not much to go on and no way to reproduce this right now. (Guess I could try running the Fortran testcase on the neoverse in Travis CI though) |
That was a long time ago, I doubt it's helping here :) One other interesting point is that there is no problem on M1 when using https://en.wikipedia.org/wiki/Comparison_of_ARMv8-A_processors This idea was brought up by @hoechenberger in numpy/numpy#21756 (comment) |
VORTEX is the name we picked for the Apple-developed arm64 cpus, but there are no actual M1-specific BLAS kernels in OpenBLAS so far anyway, that TARGET primarily adds the compiler flags for using the 8.3 instruction set. ARMV8 on the other hand is the most basic arm64 target,its assembly kernels do not require more than the 8.0 instruction set (or what e.g. a Cortex A57 from ten years ago has to offer). And before #3399, the kernel selection for VORTEX was an exact |
@larsoner it would help if you troubleshoot these ways:
Not having hardware or even QEMU equivalent here - all hope on you ;-) |
@brada4
after the |
If you can point me to a tutorial or step-by-step for how to do this I'm happy to. But I'll start with trying to follow @martin-frbg 's suggestion since I think it covers what you suggest as well ...
Okay so digging into FAILS:
SUCCEEDS (n.b. the same as
SUCCEEDS:
FAILS (makes sense because this is a double computation)
SUCCEEDS:
So it seems to be the I'm happy to debug further if someone can give me some instructions -- I'm barely a C programmer nowadays so I'm hopeless in assembly :(
So another way to go here is to make it clear what code path NumPy is using that hits the bug. But before that, @martin-frbg I took another look at the pure-fortran example and noticed what I think is an error -- the second call was to SGESDD rather than DGESDD. When I change it to use DGESDD, I can replicate the error:
@martin-frbg maybe you can replicate locally now, too? Here is the modified file (but the diff is just the one line changed): |
Oh sh.... sorry for that. Reproducible now on M1/Linux. Slightly improved solution is to set the DNRM2 (and SNRM2) kernel to "nrm2.S" (as used by most arm64 targets - and corresponding znrm2.S for CNRM2/ZNRM2 ). ThunderX2 kernel must be taking a shortcut or missing an initilalization somewhere that does not show up in the regular BLAS/LAPACK tests. Maybe even something introduced by its most recent fix to handle Inf in input. |
So @martin-frbg for now do you suggest that we put in this little workaround? It looks like you're already working on something in #3657 so happy to let you move forward however you think is best... |
Using the nrm2.S kernels will definitely fix the bug and should be more performant than the plain-C "generic" kernels. (Though a fixed thunderx2t99-style kernel should give best performance - but I am still horribly out of my depth with ARM assembly) |
Okay so maybe until someone can fix the thunderx2t99 code to work on M1 / VORTEX we should just switch. +1 for putting this change into #3657, especially if your test case in that PR does turn things red on CIs so we can see it switch to green! |
I commented this over in #3657 (comment) , but just to have all the testing in one place, this KERNEL.VORTEX also appears to be okay, as suggested by @martin-frbg :
|
On 8c20ca3 using OpenBLAS to build and compile SciPy I get problems when I try to take use DGESDD (or DGESVD) to take the SVD of a matrix of ones of shape
(41, 50)
:On the previous commit 8e4c209, things seem to be okay (SVD decomp matches that on x86_64 Linux). Same thing with being on latest
develop
/ 7060ca5 and doinggit revert 8c20ca3
to revert just one of the two commits (8c20ca3) from #3399 things are okay.@martin-frbg looks like #3399 this was your PR, can you replicate?
cross-ref numpy/numpy#21756
The text was updated successfully, but these errors were encountered: