The BLAS/LAPACK surgery#229
Merged
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The BLAS/LAPACK surgery
Here's the situation: until now, MYSTRAN had, within its source tree, somewhat-modified versions of the reference (Netlib) BLAS and LAPACK implementations. This caused some issues:
So... here are the steps I've taken. I've called this whole effect the "BLAS/LAPACK surgery" due to the need to extricate something very intrisic.
Phase 1: map out the incompatibilities.
Merely deleting the BLAS and LAPACK folders from the source tree and linking against, say OpenBLAS, wasn't enough.
We had modded functions with different signatures for convenience/logging, and some inconsequential changes to the math.
After mapping out these mods, I was ready to begin the "surgery".
Phase 2: update imports and move mods out.
Whereas BLAS functions had no actual math modifications, some LAPACK ones did. These mods were moved to the
MYSTRAN_LAPACK_EXTmodule.Also, LAPACK imports were all changed to take into account the functions all coming from "outside" of the current source tree.
I also had to restore some
EXTERNALdeclarations in the embedded ARPACK module that were commented out. That was breaking buckling problems. Major props to @victorkemp for helping me figure out what was going on!Phase 3: move submodules and modify the build script.
We got too many submodules. This was true before I even added the Netlib
lapacksubmodule (also includes BLAS). I moved all submodules (and patches directories) to thesubmodulesdir.Once that was done, the build script now has the
MYSTRAN_BLAS_LAPACKinput variable, which can beEMBEDDED(builds the Netlib reference implementations from the submodules),SYSTEM(locates a system-provided BLAS/LAPACK implementation, like OpenBLAS, Intel MKL, Apple Accelerate, APL, etc.), orAUTO. The otherenable_internal_blaslibvariable is now deprecated.Since we'll always link against some BLAS, that means SuperLU will never ever need to link against its CBLAS. Therefore,
f2ccan be fully removed from the build script, it is no longer a dependency.Also, I added the licenses for the most common system BLAS implementations I could find, so they can be statically linked against without violating the licenses' "must include a copy of this copyright notice" clause. The missing ones, like APL and Apple Accelerate, are the ones I couldn't find honest-to-god license TXTs for. They say TXTs are included in the local installation, so I call upon Arm and Apple uses to add the licenses. If in doubt, ping me on Discord, and I'll handle it myself. All I'll need is the license text file.
Bonus: Handle OpenBLAS threads
I also added a
SET_BLAS_THREADSsubroutine that can call the OpenBLAS function to force the threads to be 1. If we find out that other BLAS providers also perform better with different numbers of threads, we can add more macros to that file so we have better control of the number of threads. For now, the only one I could add support for was OpenBLAS.Okay, so all that's done. The end-result is: MYSTRAN now only uses fully-standard BLAS and LAPACK calls, allowing us to link against any BLAS/LAPACK-providing library, such as Apple Accelerate, Intel MKL, Arm Performance Libraries, etc. This also prevents future issues in case we ever decide to link against other libraries.
And, of course, when we link against system-provided BLAS, we get better performance. That's very much expected.