-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for LLVM Flang #4016
Conversation
Is it ok to change that as part of this PR (even if it's not related to LLVM Flang at all). Or do you prefer if I opened a different PR for that? |
Thanks - the new flang still looks like a moving target to me, can we be certain that it does not have the reentrancy issue of previous implementations (and gfortran) without a specific commandline option ? |
I agree. It still feels like a work in progress, see the limited support for OpenMP. I'm not sure if such a flag is needed. I tried to search the documentation of Flang and googled for information on that. But I wasn't successful in that search. Here is what ChatGPT has to say about that:
But I don't know how reliable that is. Fwiw, |
Well, I have seen too many examples of ChatGPT simply confabulating answers to trust it over a conventional search... |
I completely agree.
In the light of that, I wouldn't mind if you prefer to not merge this change just yet. Do you prefer to close this PR? Or leave it hanging around and maybe revisit for LLVM 17? |
Seems to be failing (all) the BLAS3 tests specifically. I have restarted one of the CI jobs just to see if it fails consistently. Perhaps "convert to draft" and leave it hanging around (unless having that PR branch around bothers you) ? |
It looks like it still failed at the same 4 tests. Not sure why it is passing for me locally.
For the CI, there are different runners. Most of them are Intel CPUs. E.g., in the last run:
CLANG64, int64:
Done. It doesn't bother me. I just need to remember not to force push that branch with something else. |
Zen3 is mostly equivalent to Haswell, the Xeons would appear to be SKYLAKEX target (AVX512) so indeed different GEMM kernels - but same code for the infrastructure bits like thread management. I am away from my SKX system for the weekend, but IIRC the Windows jobs in Azure CI alternate between Haswell and SkylakeX hosts so it may be possible to repeat the test there |
See here and here. Apparently, this is a Fortran 2008 feature; this means it'll probably still take a while... |
We could try to ifdef that based on advertised OpenMP support capability (if I did not do that already?) but given the other oddities it is probably not worth doing it now |
I added an additional step that will hopefully display more information if the tests failed. |
So some of them complain about a division by zero "somewhere", and all report at least one result that is "less than half accurate" (leading to the "Error" statement output by the cmake test helper script). At that point it might make sense to |
I added commands that display the SUMM files for those tests. See, e.g.: https://github.com/xianyi/OpenBLAS/actions/runs/4798792488/jobs/8537546152#step:13:141 (until that log is purged) |
For some reason, one of the LLVM Flang runners passed the last run. Used processor:
Compared to one of the ones that failed previously: Maybe, AVX2 or AVX-512? |
Quite likely - #4016 (comment) but then this would either mean the new flang is trying to use AVX512 instructions/registers or there is a latent bug in (all) the AVX512 GEMM kernels that somehow only shows up in this particular combination of compilers |
Looking for Like you already suspected, the debug build passed its tests even though it was running on a I don't know how to track this down any further without being able to reproduce this on a machine to which I've physical access... |
So probably some kind of over-optimization in AVX512-enabled builds (and cmake unhelpfully defaults to |
Enforcing I'm also seeing these warnings during compilation:
Could that be related? |
Not related the tests only encompass the standard BLAS functions, not extensions like GEMMT. (And I'm pretty sure the warning is harmless, just an extra set of parentheses and some variables that are initialized in an if-else where there are no other options for the condition. |
I added some changes that enforces |
Not sure if it is a good idea to move the enable_language to the toplevel CMakelists.txt (apart from it being outside the - announced - scope of the PR, there may have been a reason for putting it in a separate script) |
I was mostly interested to see whether the tests would pass in CI with that change. Most projects using Cmake include the steps to enable languages pretty early on. Would that be ok for you? |
The additional guard would not change that much - I'm questioning if it is wise to change known working code that every build uses, just to accomodate an unfinished, experimental and arguably broken compiler. |
I'd argue that it is similar to what was needed for the Having the option to use that compiler doesn't force people to use it. But imho it would be "kind" to the people that venture that way if they'd end up with something that kind of works. Moving that compiler check earlier should come with little to no risk. So, the only variable we need to be careful about is |
Staring at the .cmake files for longer, there might be an existing minor inconsistency if With the changes from here, the latter would happen in both cases. (Which feels more consistent tbh.) |
Interesting - but there is something wrong with ONLY_CBLAS in the CMAKE builds that I need to look into anyway. |
I had both HLFIRDialect.h.inc and HLFIRenums.h.inc missing in my build, but both should apparently be generated and my VM ran out of disk space once during the build, so it could still be a local problem. |
Issue created as llvm/llvm-project#64268 after reproducing the problem in a clean build. |
So with the build problem fixed, LLVM 17.0-rc1 builds, and also builds OpenBLAS - the earlier OpenMP problem is indeed gone. Interestingly, the Fortran compiler is still called |
closing&reopening to rerun CI |
I stripped the commits that I only did for testing (that didn't make a difference) and rebased the changes on a current head of the develop branch. The Flang compiler for MSYS2/CLANG64 is still at version 16.0.5. So, I don't expect any difference when it comes to the failing tests on hardware that uses AVX512 instructions. |
Added another commit that should be de-activating AVX512 only on the CI-runners that build with LLVM Flang 16 (currently). Maybe, that's less intrusive than integrating that logic into the build system itself. |
With that change to the build rules in the CI, the tests passed running on "Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz". This is ready to land imho. (Note to future me or anyone else coming back to this when the CI logs are gone: The tests on CLANG64 still failed without |
Thanks. I saw v16 of flang-new as more of a transient nuisance, given the other issue with openmp. Maybe I should reconsider this due to the likely inertia until everybody has upgraded to 17+ |
I don't have access to hardware with AVX-512 support (other than the CI here). So, I don't know how to test if Flang 17 would actually behave differently compared to Flang 16. But imho these changes would still be helpful for anyone trying to use LLVM Flang. |
I have built and tested with 17.0.0-rc1 on Xeon, though not specifically on Windows (idk if the OS is expected to make a difference in this regard) |
Thank you for merging this. |
Fwiw, I tried to rebuild with "CI: Build with NO_AVX512 for the runners that use Flang 16." reverted now that MSYS2 updated to Flang 17.
Log from these tests
|
Interesting, any idea what's in the ?BLAT3.SUMM files ? (Just to exclude problems with writing those files, or searching them for error messages) |
See, e.g.: https://github.com/mmuetzel/OpenBLAS/actions/runs/6301097094/job/17109175034 Content of test/SBLAT3.SUMM
Content of test/DBLAT3.SUMM
Content of test/CBLAT3.SUMM
Content of test/ZBLAT3.SUMM
|
Funny - it is claiming that the BLAS kernels are trashing input-only arguments, but more likely it is simply not reading the returned values correctly. Not sure what to do about that as it only happens in BLAS3, so unlikely to be a general C/Fortran ABI problem. And it is only the "clang64-int32" build that fails (or is that the only one using flang17 ?) |
Both CLANG64 runners should be affected. (There is no 32-bit Flang, and the MINGW* runners are using gfortran.) But it's a bit of a lottery whether a runner with AVX512 instructions is picking up the job. Edit: It fails very similarly: |
@mmuetzel, we're still running into the It seems to me from this PR that you researched this issue in more depths than I have, but I'm happy to raise the issue myself in case you're not in a position to do it. |
@h-vetinari: Feel free to raise that issue upstream. So, I'm not sure I could contribute anything more than you. |
LLVM Flang is now at a point where it can actually produce usable code.
This PR changes the cmake rules to use appropriate flags for that compiler.
Additionally, it installs LLVM Flang for the CLANG64 runners using MSYS2.
To continue testing CLAPACK in the CI, a new CLANG32 runner is added to the CI. It is highly likely that no Fortran compiler will be available for that build environment.
I've also noticed that none of the runners on GitHub is building with OpenMP. It's likely that that won't work with LLVM Flang in its current form.
But I haven't tested yet.Should some of the runners be switched to test OpenMP support?
Edit: Trying to compile with
-DUSE_OPENMP=ON
leads to the following error for me:Looks like that is hitting a current limitation of LLVM Flang.