Skip to content

Conversation

@junyuan-chen
Copy link
Contributor

@junyuan-chen junyuan-chen commented Aug 19, 2025

I am trying to build AOCL-BLAS (a customized version of BLIS from AMD).

Update: After some trial and errors on my local machine, it seems that the _64 suffix is finally attached to the function names in the generated output for Linux build. However, the Windows build fails with the patches for _64 suffix for reasons that's not obvious to me.

Update2: Ideally I would wish to have someone who understands libblastramponline to take a look at the way how the _64 suffix is hanlded as I cannot tell for sure whether my hack would work with libblastramponline.

Update: After some local tests, I believe that the patches for _64 suffix is unnecessary and a misunderstanding of what the libblastramponline README says. I followed what was done with blis_jll.jl earlier and suspect the _64_ suffix appended there was also unnecessary.

@junyuan-chen junyuan-chen changed the title [AOCL-BLAS] Add version 5.1.0 as an initial attempt [AOCL-BLAS] [HELP NEEDED] Add version 5.1.0 as an initial attempt Aug 19, 2025
@giordano giordano requested a review from imciner2 August 20, 2025 09:18
@gbaraldi
Copy link
Contributor

You can build this for every target, and maybe we should just so the package is downloadable everywhere (not sure about riscv but aarch64 for sure)

@imciner2
Copy link
Member

You can build this for every target, and maybe we should just so the package is downloadable everywhere (not sure about riscv but aarch64 for sure)

I believe the JLL package should still be downloadable/installable on all the platforms, but it just will return false from is_available if you try to actually use it on a platform that doesn't have any artifacts built for it. We could use that in a higher level package, like an AOCLBLAS.jl package that would setup the auto-forwarding in LBT, etc.

@imciner2
Copy link
Member

I think it would be better to build a folder structure of: A/AOCL/AOCL_BLAS, because we eventually will also get the LAPACK libraries, and others, so then they would all live under A/AOCL.

@junyuan-chen
Copy link
Contributor Author

I think it would be better to build a folder structure of: A/AOCL/AOCL_BLAS, because we eventually will also get the LAPACK libraries, and others, so then they would all live under A/AOCL.

That makes sense. I also wish to get AOCL_LAPACK work but that will require AOCL_BLAS as a dependency so started with the BLAS first. Let me rename the outer folder to AOCL.

@junyuan-chen
Copy link
Contributor Author

@giordano Somehow buildkite is not rebuilding the artifacts? I guess I am otherwise done with the recipe unless new issue is captured.

@junyuan-chen junyuan-chen changed the title [AOCL-BLAS] [HELP NEEDED] Add version 5.1.0 as an initial attempt [AOCL-BLAS] Add version 5.1.0 as an initial attempt Aug 20, 2025
@giordano
Copy link
Member

#11896 (comment). See https://github.com/JuliaPackaging/Yggdrasil/blob/78cffa5f0f85ce86a7cc72512162df210646e1b7/CONTRIBUTING.md#understanding-build-cache-on-yggdrasil for the explanation. Basically if you edit only the common.jl file, you also need to edit the build_tarballs.jl (e.g. by increasing the build trigger) to actually rebuild.

@junyuan-chen
Copy link
Contributor Author

You can build this for every target, and maybe we should just so the package is downloadable everywhere (not sure about riscv but aarch64 for sure)

I specified the "amdzen" option to build the AMD customized version of BLIS. From what I understood, the other configuration options are just directly coming from the original BLIS. Hence, I am leaving these non-AMD options to blis_jll, assuming that nobody would bother to get the AMD-Zen version to run on Macs.

@junyuan-chen
Copy link
Contributor Author

#11896 (comment). See https://github.com/JuliaPackaging/Yggdrasil/blob/78cffa5f0f85ce86a7cc72512162df210646e1b7/CONTRIBUTING.md#understanding-build-cache-on-yggdrasil for the explanation. Basically if you edit only the common.jl file, you also need to edit the build_tarballs.jl (e.g. by increasing the build trigger) to actually rebuild.

@giordano Thank you! I wasn't aware that it's already explained there.

Would you also think it's fine to set preferred_gcc_version=v"14" as how it is now? This v"14" requirement seems to be vary rarely used. But, if I didn't misunderstand it, I thought the Zen 5 configuration (part of the amdzen specification currently used) would require GCC v14? (The AMD user guide only asks for GCC 12.2. But I see in the configure folder they have ifelse branching that seems to require GCC 14 for Zen 5.) Does it have any implication on the value for julia_compat?

@imciner2
Copy link
Member

I specified the "amdzen" option to build the AMD customized version of BLIS. From what I understood, the other configuration options are just directly coming from the original BLIS. Hence, I am leaving these non-AMD options to blis_jll, assuming that nobody would bother to get the AMD-Zen version to run on Macs.

In my opinion, it is perfectly fine to not build the other platforms here, and as I said in my previous comment, that won't cause any problems for users.

Would you also think it's fine to set preferred_gcc_version=v"14" as how it is now? This v"14" requirement seems to be vary rarely used. But, if I didn't misunderstand it, I thought the Zen 5 configuration (part of the amdzen specification currently used) would require GCC v14? (The AMD user guide only asks for GCC 12.2. But I see in the configure folder they have ifelse branching that seems to require GCC 14 for Zen 5.) Does it have any implication on the value for julia_compat?

The preferred_gcc_version won't affect the julia_compat at all, but it will cause it to link against a newer libstdc++, which would limit compatibility with older operating systems. It's hard to say exactly how much that would be a problem, though. While we generally prefer to compile with the oldest possible compiler version to improve that compatibility, sometimes it can be fine to use newer ones.

Looking at the logic here: https://github.com/amd/blis/blob/16f852a065e76e824d77bc39e2baa82ac19ed419/config/zen5/make_defs.mk#L80, there is some fallback logic to still try and compile the Zen5 kernels on older compilers, but it is just that GCC 14 adds the native support for znver5. I'm not sure how much performance for the Zen4/5 kernels we would be missing if we dropped back to say GCC 11 (which is znver3), but since they are assembly kernels probably, I don't think we would lose very much?

@junyuan-chen junyuan-chen changed the title [AOCL-BLAS] Add version 5.1.0 as an initial attempt [AOCL-BLAS] [Help Needed] Add version 5.1.0 as an initial attempt Aug 20, 2025
@junyuan-chen
Copy link
Contributor Author

@imciner2 Thank you for the explanations! I just witched to preferred_gcc_version=v"12", which is still quite new but hopefully "old enough".

The remaining issue would mostly be about getting the Windows version works. After patching the _64 suffix, the linking process fails for Windows. I am not familiar with how that should be fixed and hence will leave it here and hopefully someone would have some insights on what to do with that.

@imciner2
Copy link
Member

Update: After some trial and errors on my local machine, it seems that the _64 suffix is finally attached to the function names in the generated output for Linux build. However, the Windows build fails with the patches for _64 suffix for reasons that's not obvious to me.

So, I think we should be putting the trailing underscore on all of these for the Linux builds. That forms part of the Fortran calling convention technically (Linux compilers generally append an _ at the end of Fortran symbols, and Windows compilers do not). That is slightly complicated then because we probably shouldn't need that on Windows.

I'll have to dive into their build system a bit more tomorrow to make sense of what they are doing, because they have some logic already in there for doing this (at least in the CMake build system), but it doesn't fully work.

@junyuan-chen junyuan-chen changed the title [AOCL-BLAS] [Help Needed] Add version 5.1.0 as an initial attempt [AOCL-BLAS] Add version 5.1.0 as an initial attempt Aug 22, 2025
@junyuan-chen
Copy link
Contributor Author

junyuan-chen commented Aug 22, 2025

@giordano @imciner2 I think this recipe is ready for final review (at least from my aspect things are done).

  • There is no need to add _64_ suffix in BLAS at all. That was a misunderstanding.
  • I have tried the Linux IPL64 version locally on my own machines. The Julia tests for BLAS succeeded for Float32/Float64 but there are a few errors for ComplexF32/ComplexF64. I think that's just how it is. The patched get_num_threads and set_num_threads both work. So, from my side, things seem to be working fine now.

@junyuan-chen
Copy link
Contributor Author

@giordano @imciner2 The recipe should be ready now. Could you please take a look for the merge? Thank you!

@junyuan-chen
Copy link
Contributor Author

@imciner2 Sorry to bother you again. But is there any chance that this PR can be merged soon?

@junyuan-chen
Copy link
Contributor Author

Any chance for the PR to get merged so that things can move on with the next AOCL components?

@giordano
Copy link
Member

giordano commented Sep 1, 2025

I believe @imciner2 wanted to have a chance to have another look, maybe didn't have the time for it yet (but a gentle bump is good).

@junyuan-chen
Copy link
Contributor Author

@giordano Thank you for the reply. Just wanted to make sure this PR wasn't forgotten, as people suddenly became quite (even though I thought things are already working).

@imciner2
Copy link
Member

imciner2 commented Sep 2, 2025

Yea, @junyuan-chen I need to think a bit more about the library naming and symbol naming. I am actually starting to think we shouldn't be renaming the actual 32-bit library (we don't do that for OpenBLAS actually, so I think it is just a vestigial thing that BLIS has kept), and I want to do some experiments to see if we can correctly load both the 64-bit and 32-bit symbols into LBT correctly.

@junyuan-chen
Copy link
Contributor Author

Yea, @junyuan-chen I need to think a bit more about the library naming and symbol naming. I am actually starting to think we shouldn't be renaming the actual 32-bit library (we don't do that for OpenBLAS actually, so I think it is just a vestigial thing that BLIS has kept), and I want to do some experiments to see if we can correctly load both the 64-bit and 32-bit symbols into LBT correctly.

Sounds good. In case it's relevant, AOCL-LAPACK may call BLAS with special internal API (introduced by AMD instead of BLIS) when it is built with AOCL-BLAS instead of standard BLAS symbols. So, this seems to suggest that libblastrampoline is relevant in use cases outside of AOCL-LAPACK. But when using AOCL-LAPACK, the internal symbals from AOCL-BLAS still need to work. I imagine this could make things more complicated if we want to keep both the 64-bit and 32-bit symbols together in one package.

@ViralBShah
Copy link
Member

What do we need to do to get this merged?

@giordano
Copy link
Member

I think we want to go with https://github.com/amd/AOCL_jll.jl instead.

@junyuan-chen
Copy link
Contributor Author

@ViralBShah Just a short while after this PR, both AOCL_jll.jl and AOCL.jl become available directly from AMD for Linux with ILP64 interface, covering the BLAS and LAPACK functions. They are not registered with Julia General registry though. Considering that someone from AMD is now aware of the need of such packages for Julia, I guess this PR would serve as a reference point without being merged.

@ViralBShah ViralBShah marked this pull request as draft October 15, 2025 20:19
@ViralBShah
Copy link
Member

ViralBShah commented Nov 5, 2025

I think it is perhaps better to see this one through and have a source build - since the other one is repackaging precompiled binaries. The big thing that would need to be confirmed is that these built from source binaries have the same performance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants