New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restore the number of threads for OpenBLAS to what it used to be on 1.6.4 and 1.7. #3996
Comments
Maintainability is also a factor though: we have to keep something like 6 (or 9 if including the HighCoreCount variant) builds at the same time, having each of them with their own script is error-prone. LLVM is even worse, keeping each version with their own independent build script (as we used to do before) and remember to propagate all changes everywehre is really hard. That said, introducing new features without changing the current default behaviour would be a sensible approach. |
I guess the common file could be there but not have any default arguments (so that each build specifies that fully). Although, that doesn't protect against changes in the actual script, it avoids the case where default input arguments are changed at least. |
To be clear, the following is not intended to assign "blame" to anyone, just to reflect on what happened: I think the crucial mistake was to modify the default argument value (and to accept it past review); I am not sure why that was done instead of just actually passing alternate values? After all, it exists precisely to allow overriding it. Of course mistakes happen: I can absolutely see myself both of making that mistake, and also overlooking it in a review. So, what can we do to reduce the risk of mistakes? Perhaps we can somehow detect the situation were a |
Having a |
It is also likely the case that the openblas version we have on 1.8-dev is newer enough and better at dealing with more threads than the versions of openblas that we ship with 1.6 and 1.7. In any case, we should add a test to linear algebra to test |
As a "regression test" on 1.8 the number of threads on OpenLAS was experimentally bumped to a large number (JuliaLang/julia#42584):
However, this thread bump has managed to sneak into both the LTS and the latest 1.7 release(!). This has caused multiple regressions. Looking at #3732, the reason seems to be that there is a shared file
OpenBLAS/common.jl
that determines a bunch of settings for ALL builds and the thread bump then caused every successive OpenBLAS builds (even those with bug fixes for old OpenBLAS versions), to use this experimental high thread count.This needs to be put back to the original thread count, new OpenBLAS releases need to be built and we need to release 1.7.1 and 1.6.5 with the fix.
As a note, I think the presence of this
common.jl
file is quite dangerous for reproducibility reasons. Imo, each build should be more or less contained and there should not be stuff like default arguments that are updated for every build.cc @ViralBShah
The text was updated successfully, but these errors were encountered: