
mayeut (Contributor) commented Nov 22, 2025

  • I updated the package version in pyproject.toml and made sure the first 3 numbers match git describe --tags --abbrev=8 in OpenBLAS at the OPENBLAS_COMMIT. If I did not update OPENBLAS_COMMIT, I incremented the wheel build number (e.g. 0.3.29.0.0 to 0.3.29.0.1).
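The version rule in this checklist item can be checked mechanically. Below is a minimal sketch, assuming OpenBLAS tags of the form `vX.Y.Z` (so `git describe --tags --abbrev=8` prints something like `v0.3.29-10-gabcdef12`); `describe_base`, `version_matches` and `openblas_describe` are hypothetical helpers, not part of this repository.

```python
import re
import subprocess


def describe_base(describe_output: str) -> str:
    """Extract 'X.Y.Z' from `git describe` output such as 'v0.3.29-10-gabcdef12'.

    Assumes tags of the form 'vX.Y.Z' (hypothetical; adjust if the tags differ).
    """
    m = re.match(r"v?(\d+)\.(\d+)\.(\d+)", describe_output.strip())
    if m is None:
        raise ValueError(f"unexpected describe output: {describe_output!r}")
    return ".".join(m.groups())


def version_matches(package_version: str, describe_output: str) -> bool:
    """True if the first three numbers of the wheel version match the OpenBLAS tag."""
    first_three = ".".join(package_version.split(".")[:3])
    return first_three == describe_base(describe_output)


def openblas_describe(repo_dir: str) -> str:
    # Runs the checklist command inside an OpenBLAS checkout that is assumed
    # to already be at OPENBLAS_COMMIT.
    return subprocess.run(
        ["git", "describe", "--tags", "--abbrev=8"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout


if __name__ == "__main__":
    print(version_matches("0.3.29.0.1", "v0.3.29-10-gabcdef12"))  # True
```

Only the first three components are compared, so bumping the trailing wheel build number (the last two components) never breaks the match.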

Builds on top of #230

Using clang instead of gcc provides:

  • a very recent compiler that supports recent SIMD extensions, without having to wait for a new gcc-toolset or a manylinux base image update.
  • faster builds under QEMU.

The clang install script might end up being included in the manylinux images (see pypa/manylinux#1871); for now it has been copied directly from https://github.com/scikit-build/ninja-python-distributions/blob/master/scripts/install-static-clang.sh.

mayeut (Contributor, Author) commented Nov 23, 2025

I thought the fork test hangs would disappear after an OpenBLAS update (it looked like the issue mentioned in #229), but there are still random deadlocks in the fork test under QEMU. Whether this is a QEMU bug, or running under QEMU just increases the chance of hitting an existing race condition, is yet to be determined.

It seems that aarch64 runners are much faster than x86_64 ones for this workload: QEMU builds went down from 1 hour to 40 minutes.

mattip (Collaborator) commented Nov 23, 2025

> I thought the fork test hangs would disappear

One of the ppc64le runs succeeds and the other fails. The failed run prints

2025-11-23T08:09:19.4979912Z TEST 122/127 zgemv:2_0_nan_1_inf_1_incy_2 [OK]
2025-11-23T08:09:19.5031453Z TEST 123/127 potrf:bug_695 [OK]
2025-11-23T08:09:19.5115065Z TEST 124/127 potrf:smoketest_trivial [OK]
2025-11-23T08:09:19.7959236Z TEST 125/127 kernel_regress:skx_avx [OK]
2025-11-23T08:35:46.7868483Z ##[error]The action has timed out.

The successful run prints

2025-11-23T07:58:44.4121838Z TEST 122/127 zgemv:2_0_nan_1_inf_1_incy_2 [OK]
2025-11-23T07:58:44.4172979Z TEST 123/127 potrf:bug_695 [OK]
2025-11-23T07:58:44.4255760Z TEST 124/127 potrf:smoketest_trivial [OK]
2025-11-23T07:58:44.7111789Z TEST 125/127 kernel_regress:skx_avx [OK]
2025-11-23T07:59:10.0551932Z TEST 126/127 fork:safety [OK]
2025-11-23T07:59:10.0740217Z TEST 127/127 fork:safety_after_fork_in_parent [OK]

which suggests the problem is in fork:safety, the test that ran next (126/127) in the successful run.
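The reasoning above (the failed run stops right after test 125/127, so the likely culprit is whatever ran next in the successful run) can be automated when comparing CI logs. A minimal sketch, assuming the `TEST n/m name [OK]` line format shown in these logs and that both runs execute the tests in the same order; `passed_tests` and `first_missing_test` are hypothetical helpers.

```python
from __future__ import annotations

import re

# Matches log lines such as "TEST 126/127 fork:safety [OK]".
TEST_LINE = re.compile(r"TEST (\d+)/(\d+) (\S+) \[OK\]")


def passed_tests(log: str) -> list[str]:
    """Names of the tests reported as [OK], in log order."""
    return [m.group(3) for m in TEST_LINE.finditer(log)]


def first_missing_test(good_log: str, bad_log: str) -> str | None:
    """First test that passed in good_log but never reported [OK] in bad_log.

    Assumes both runs execute the tests in the same order.
    """
    finished = set(passed_tests(bad_log))
    for name in passed_tests(good_log):
        if name not in finished:
            return name
    return None


if __name__ == "__main__":
    good = (
        "TEST 125/127 kernel_regress:skx_avx [OK]\n"
        "TEST 126/127 fork:safety [OK]\n"
        "TEST 127/127 fork:safety_after_fork_in_parent [OK]\n"
    )
    bad = (
        "TEST 125/127 kernel_regress:skx_avx [OK]\n"
        "##[error]The action has timed out.\n"
    )
    print(first_missing_test(good, bad))  # fork:safety
```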

The test itself is the one from the SciPy issue, which is also the test in #229. I will try to debug it in a QEMU Docker container.

mattip (Collaborator) commented Nov 23, 2025

Another problem: it seems the compiled shared object from the wheels-macos-latest-arm64-1-macosx- artifact suffers from the same segfault as issue #233 when testing the zladiv interface. Did something change in the way gfortran exports functions?

mayeut (Contributor, Author) commented Nov 23, 2025

> It seems this compiled shared object from the wheels-macos-latest-arm64-1-macosx- artifact suffers from the same segfault from issue #233 when testing the zladiv interface. Did something change in the way gfortran exports functions?

This PR does not touch the macOS build except for the OpenBLAS update, which has only a limited diff compared to what's in main. The only Fortran-related change is OpenMathLib/OpenBLAS#5540, which seems right. Does main pass (or the current nightly build, which uses the latest develop)?

As a side note, this PR still uses gfortran on Linux.

mattip (Collaborator) commented Nov 23, 2025

> I will try to debug it in a QEMU Docker container.

Reproducing the cibuildwheel build is a little convoluted since it uses build isolation, so my setup may not be 1:1 accurate, but:

When I run the make command without QUIET_MAKE=1, I see it is using cc as the C compiler. Even when setting PATH=/opt/clang/bin:$PATH, cc is /opt/rh/gcc-toolset-14/root/usr/bin/cc.

mayeut (Contributor, Author) commented Nov 23, 2025

> I see it is using cc as the C compiler

CC, CXX and LDFLAGS are overridden by cibuildwheel, via the settings at the end of pyproject.toml. This may not be the best approach for openblas-libs given how the install script is called, but it makes overriding easy in pyproject.toml if needed.

mattip (Collaborator) commented Nov 23, 2025

Maybe we could patch the Makefile to print out the compiler locations and versions, just to be sure we are using the right ones.

mayeut (Contributor, Author) commented Nov 23, 2025

> Maybe we could patch the Makefile to print out the compiler locations and versions just to be sure we are using the right ones

It's done at the end, once the build succeeds; that's how I found out gfortran was not found on macOS arm64. We might want to ask for this to also be printed early on.
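A report like that could be emitted before the build starts rather than only after it succeeds. A minimal sketch, not tied to the repository's actual build scripts: it resolves each compiler roughly the way a shell would (the `CC`/`CXX`/`FC` environment variable if set, otherwise a default name looked up on `PATH`) and prints the first line of its `--version` output. The default names `cc`, `c++` and `gfortran` are assumptions, and a missing compiler (like gfortran on macOS arm64) is reported as `NOT FOUND` instead of raising.

```python
from __future__ import annotations

import os
import shlex
import shutil
import subprocess
from collections.abc import Mapping

# Hypothetical fallback names used when the variable is unset.
DEFAULTS = {"CC": "cc", "CXX": "c++", "FC": "gfortran"}


def resolve(var: str, env: Mapping[str, str] | None = None) -> str | None:
    """Path of the compiler named by `var` (or its default), None if not found."""
    env = os.environ if env is None else env
    words = shlex.split(env.get(var) or DEFAULTS[var]) or [DEFAULTS[var]]
    # Only the first word is looked up, so "ccache clang" resolves to ccache.
    return shutil.which(words[0], path=env.get("PATH"))


def report(env: Mapping[str, str] | None = None) -> dict[str, str]:
    """One line per compiler: resolved path plus first line of `--version`."""
    lines = {}
    for var in DEFAULTS:
        path = resolve(var, env)
        if path is None:
            lines[var] = "NOT FOUND"
            continue
        out = subprocess.run([path, "--version"], capture_output=True, text=True)
        first = (out.stdout or out.stderr).splitlines()
        lines[var] = f"{path}: {first[0] if first else '?'}"
    return lines


if __name__ == "__main__":
    for var, line in report().items():
        print(f"{var} = {line}")
```

Because `cc` is looked up on `PATH` by name, prepending a directory such as /opt/clang/bin only changes the result if that directory itself contains an executable called `cc`.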

mattip (Collaborator) commented Nov 23, 2025

> It's done at the end, once the build succeeds

+1, thanks

I reproduced the build locally on an x86_64 VM host and ran the test 100 times. It doesn't segfault.
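Since the failures in this thread are intermittent (random deadlocks in the fork test, a segfault that may not reproduce locally), a single run proves little. A minimal sketch of a repeat-run harness, assuming the test can be started as a standalone command; the default command below is a placeholder, not the actual OpenBLAS test invocation. A timeout is counted as a hang, and a negative return code (the process was killed by a signal, e.g. SIGSEGV) is reported by signal name.

```python
from __future__ import annotations

import signal
import subprocess
import sys
from collections import Counter


def classify(cmd: list[str], timeout: float) -> str:
    """Run cmd once; return "ok", "fail", "hang", or the name of the killing signal."""
    try:
        proc = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child before re-raising;
        # processes it forked itself are not tracked.
        return "hang"
    if proc.returncode < 0:
        # On POSIX, a return code of -N means the process was killed by signal N.
        try:
            return signal.Signals(-proc.returncode).name  # e.g. "SIGSEGV"
        except ValueError:
            return f"signal {-proc.returncode}"
    return "ok" if proc.returncode == 0 else "fail"


def repeat(cmd: list[str], runs: int = 100, timeout: float = 120.0) -> Counter:
    """Run cmd `runs` times and tally the outcomes."""
    return Counter(classify(cmd, timeout) for _ in range(runs))


if __name__ == "__main__":
    # Placeholder: pass the real test command on the command line.
    cmd = sys.argv[1:] or [sys.executable, "-c", "pass"]
    print(repeat(cmd, runs=100))
```

Tallying outcomes rather than stopping at the first failure gives a rough failure rate, which is more useful for comparing native runs against QEMU runs.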
