{lang}[iimkl/2023a] SciPy-bundle v2023.07 #18875

Merged

schiotz
Contributor

@schiotz schiotz commented Sep 27, 2023

(created using eb --new-pr)

@schiotz schiotz changed the title {lang}[iimkl/2023a] SciPy-bundle v2023.07 WIP: {lang}[iimkl/2023a] SciPy-bundle v2023.07 Sep 27, 2023
@schiotz
Contributor Author

schiotz commented Sep 27, 2023

I am trying to make a SciPy-bundle for the newest iimkl toolchain. I have copied the differences between the foss and intel versions from previous bundles (mainly just a single patch). However, it segfaults in the numpy self-test, in a new test, f2py/tests/test_value_attrspec.py, that was not present in earlier versions of numpy. It appears to test that functions written in FORTRAN can be used from numpy, so the problem could be related to how FORTRAN is linked to Python. I know that is a thorny issue, but my FORTRAN expertise ends there...

I am tempted to patch out the test and hope for the best, but I suspect somebody with more expertise than me could perhaps debug and fix it.

The test looks like this:

import os
import pytest

from . import util

class TestValueAttr(util.F2PyTest):
    sources = [util.getpath("tests", "src", "value_attrspec", "gh21665.f90")]

    # gh-21665
    def test_long_long_map(self):
        inp = 2
        out = self.module.fortfuncs.square(inp)
        exp_out = 4
        assert out == exp_out

and the fortfuncs module is also very short (in lib/python3.11/site-packages/numpy/f2py/tests/src/value_attrspec/gh21665.f90):

module fortfuncs
  implicit none
contains
  subroutine square(x,y)
    integer, intent(in), value :: x
    integer, intent(out) :: y
    y = x*x
  end subroutine square
end module fortfuncs

@schiotz
Contributor Author

schiotz commented Sep 27, 2023

OK, patching out the segfaulting test leads to a bunch of other FORTRAN errors. It looks like something is wrong in how FORTRAN and Python are interfaced with the Intel compilers. I have no idea how to proceed.

The errors are of this kind:

E                                  INFO: compiling Fortran sources
E                                  INFO: Fortran f77 compiler: ifort -FI -fPIC -O1 -xHost -ftz -fp-speculation=safe -fp-model precise -fPIC -fp-model strict -O1 -assume minus0 -qopenmp
E                                  Fortran f90 compiler: ifort -FR -O1 -xHost -ftz -fp-speculation=safe -fp-model precise -fPIC -fPIC -O1 -xHost -ftz -fp-speculation=safe -fp-model precise -fPIC -fp-model strict -O1 -assume minus0 -qopenmp
E                                  Fortran fix compiler: ifort -FI -O1 -xHost -ftz -fp-speculation=safe -fp-model precise -fPIC -fPIC -O1 -xHost -ftz -fp-speculation=safe -fp-model precise -fPIC -fp-model strict -O1 -assume minus0 -qopenmp
E                                  creating /tmp/eb-24h027op/tmpotvsz2qn/tmp/eb-24h027op/tmpyjwyqcsz
E                                  INFO: compile options: '-I/tmp/eb-24h027op/tmpotvsz2qn/src.linux-x86_64-3.11 -I/tmp/eb-24h027op/tmpkdae_ay6/lib/python3.11/site-packages/numpy/core/include -I/home/niflheim/schiotz/easybuild_2023a/icelake/software/Python/3.11.3-GCCcore-12.3.0/include/python3.11 -c'
E                                  INFO: ifort:f90: /tmp/eb-24h027op/tmpyjwyqcsz/gh17797.f90
E                                  ifort: command line warning #10121: overriding '-xHost' with '-xHost'
E                                  ifort: command line warning #10121: overriding '-fp-model precise' with '-fp-model strict'
E                                  /tmp/eb-24h027op/tmpyjwyqcsz/gh17797.f90: warning #5425: Qualifier 'fp-speculation' conflicts with floating-point mode and will be ignored
E                                  INFO: ifort:f90: /tmp/eb-24h027op/tmpotvsz2qn/src.linux-x86_64-3.11/_test_callback_TestF90Callback_ext_module-f2pywrappers2.f90
E                                  ifort: command line warning #10121: overriding '-xHost' with '-xHost'
E                                  ifort: command line warning #10121: overriding '-fp-model precise' with '-fp-model strict'
E                                  /tmp/eb-24h027op/tmpotvsz2qn/src.linux-x86_64-3.11/_test_callback_TestF90Callback_ext_module-f2pywrappers2.f90: warning #5425: Qualifier 'fp-speculation' conflicts with floating-point mode and will be ignored
E                                  /tmp/eb-24h027op/tmpotvsz2qn/src.linux-x86_64-3.11/_test_callback_TestF90Callback_ext_module-f2pywrappers2.f90(12): error #6428: This name has already been used as a dummy procedure name.   [F]
E                                            external f
E                                  -------------------^
E                                  /tmp/eb-24h027op/tmpotvsz2qn/src.linux-x86_64-3.11/_test_callback_TestF90Callback_ext_module-f2pywrappers2.f90(23): error #6633: The type of the actual argument differs from the type of the dummy argument.   [F]
E                                        gh17797f2pywrap = gh17797(f, y)
E                                  --------------------------------^
E                                  compilation aborted for /tmp/eb-24h027op/tmpotvsz2qn/src.linux-x86_64-3.11/_test_callback_TestF90Callback_ext_module-f2pywrappers2.f90 (code 1)

and the code it tries to compile is this autogenerated code:

!     -*- f90 -*-
!     This file is autogenerated with f2py (version:1.25.1)
!     It contains Fortran 90 wrappers to fortran functions.

      subroutine f2pywrapgh17797 (gh17797f2pywrap, f, y, f2py_y_d0)
      external f
      integer f2py_y_d0
      integer(kind=8) y(f2py_y_d0)
      integer(kind=8) gh17797f2pywrap
      interface
      function gh17797(f,y) result (r) 
          external f
          integer(kind=8), dimension(:) :: y
          integer(kind=8) :: r
          interface  
              function f(e_0_e) result (r) 
                  integer :: e_0_e
                  integer(kind=8) :: r
              end function f
          end interface 
      end function gh17797
      end interface
      gh17797f2pywrap = gh17797(f, y)
      end

@Micket
Contributor

Micket commented Sep 27, 2023

At least this error message does not seem to be new, see numpy/numpy#20157. @akesandgren, how have we circumvented this issue with SciPy-bundle so far?

@schiotz
Contributor Author

schiotz commented Sep 27, 2023

Now there is a C compile error. It looks like a function has changed signature from accepting an int* to accepting a long* and that has broken some other code. It is one of those things that it is scary to change without understanding it, but I'll dig into it a bit more...

The error is this:

E                                  creating /tmp/eb-fl3ghl7s/tmptcqqxl_m/tmp/eb-fl3ghl7s/tmptcqqxl_m/src.linux-x86_64-3.11
E                                  INFO: compile options: '-DNPY_DISABLE_OPTIMIZATION=1 -I/tmp/eb-fl3ghl7s/tmptcqqxl_m/src.linux-x86_64-3.11 -I/tmp/eb-fl3ghl7s/tmp2dxr_uw5/lib/python3.11/site-packages/numpy/core/include -I/home/niflheim/schiotz/easybuild_2023a/icelake/software/Python/3.11.3-GCCcore-12.3.0/include/python3.11 -c'
E                                  INFO: icx: /tmp/eb-fl3ghl7s/tmptcqqxl_m/src.linux-x86_64-3.11/_test_module_doc_TestModuleDocString_ext_modulemodule.c
E                                  INFO: icx: /tmp/eb-fl3ghl7s/tmptcqqxl_m/src.linux-x86_64-3.11/fortranobject.c
E                                  icxicx: : warning: warning: overriding '-march=native' option with '-x Host' [-Woverriding-t-option]overriding '-march=native' option with '-x Host' [-Woverriding-t-option]
E                                  
E                                  /tmp/eb-fl3ghl7s/tmptcqqxl_m/src.linux-x86_64-3.11/_test_module_doc_TestModuleDocString_ext_modulemodule.c:192:31: error: incompatible function pointer types assigning to 'f2py_init_func' (aka 'void (*)(int *, long *, void (*)(char *, long *), int *)') from 'void (*)(int *, int *, void (*)(char *, int *), int *)' [-Wincompatible-function-pointer-types]
E                                    f2py_mod_def[i_f2py++].func = b;
E                                                                ^ ~
E                                  1 error generated.
E                                  error: Command "icx -DNDEBUG -g -fwrapv -O3 -Wall -O2 -ftree-vectorize -march=native -fno-math-errno -fPIC -O2 -ftree-vectorize -march=native -fno-math-errno -fPIC -O1 -xHost -ftz -fp-speculation=safe -fp-model precise -fPIC -I/home/niflheim/schiotz/easybuild_2023a/icelake/software/imkl/2023.1.0/mkl/2023.1.0/include -fPIC -DNPY_DISABLE_OPTIMIZATION=1 -I/tmp/eb-fl3ghl7s/tmptcqqxl_m/src.linux-x86_64-3.11 -I/tmp/eb-fl3ghl7s/tmp2dxr_uw5/lib/python3.11/site-packages/numpy/core/include -I/home/niflheim/schiotz/easybuild_2023a/icelake/software/Python/3.11.3-GCCcore-12.3.0/include/python3.11 -c /tmp/eb-fl3ghl7s/tmptcqqxl_m/src.linux-x86_64-3.11/_test_module_doc_TestModuleDocString_ext_modulemodule.c -o /tmp/eb-fl3ghl7s/tmptcqqxl_m/tmp/eb-fl3ghl7s/tmptcqqxl_m/src.linux-x86_64-3.11/_test_module_doc_TestModuleDocString_ext_modulemodule.o -MMD -MF /tmp/eb-fl3ghl7s/tmptcqqxl_m/tmp/eb-fl3ghl7s/tmptcqqxl_m/src.linux-x86_64-3.11/_test_module_doc_TestModuleDocString_ext_modulemodule.o.d" failed with exit status 1

and the relevant function in /tmp/eb-fl3ghl7s/tmptcqqxl_m/src.linux-x86_64-3.11/_test_module_doc_TestModuleDocString_ext_modulemodule.c is this:

static void f2py_setup_mod(char *i,char *x,char *a,void (*b)(int*,int*,void(*)(char*,int*),int*),char *foo) {
  int i_f2py=0;
  f2py_mod_def[i_f2py++].data = i;
  f2py_mod_def[i_f2py++].data = x;
  f2py_mod_def[i_f2py++].data = a;
  f2py_mod_def[i_f2py++].func = b;
  f2py_mod_def[i_f2py++].data = foo;
}

The error is when b is assigned to f2py_mod_def[i_f2py++].func.

@schiotz
Contributor Author

schiotz commented Sep 27, 2023

It looks like the second pointer should be npy_intp *. But it is autogenerated code, so hard to trace where it comes from. It seems to be related to this definition on line 54 in fortranobject.h:

typedef void (*f2py_init_func)(int *, npy_intp *, f2py_set_data_func, int *);
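
To illustrate why an int * versus npy_intp * mismatch is more than a cosmetic warning, here is a minimal Python sketch using ctypes. It assumes that ctypes.c_ssize_t is an adequate stand-in for npy_intp (a pointer-sized signed integer); the helper name view_as_c_ints is made up for this illustration and is not part of f2py.

```python
import ctypes

INT_SIZE = ctypes.sizeof(ctypes.c_int)
# Assumption: c_ssize_t stands in for npy_intp (a pointer-sized signed integer).
INTP_SIZE = ctypes.sizeof(ctypes.c_ssize_t)


def view_as_c_ints(dims):
    """Return what code expecting `int *` would read from an npy_intp-like array."""
    arr = (ctypes.c_ssize_t * len(dims))(*dims)
    # Number of C ints that fit in the same memory block.
    n_ints = ctypes.sizeof(arr) // INT_SIZE
    as_ints = ctypes.cast(arr, ctypes.POINTER(ctypes.c_int))
    return [as_ints[i] for i in range(n_ints)]


if __name__ == "__main__":
    print("sizeof(int) =", INT_SIZE, "sizeof(intp) =", INTP_SIZE)
    print(view_as_c_ints([3, 5]))
```

Where the two sizes differ, as they usually do on 64-bit Linux, the reinterpreted values no longer match the originals, which is presumably why icx refuses to treat the two function pointer types as interchangeable.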

In any case, there are two other tests that fail. I suspect that has something to do with FORTRAN data types as well:

========================================================================== FAILURES ==========================================================================
_____________________________________________________________________ TestKind.test_int ______________________________________________________________________

self = <numpy.f2py.tests.test_kind.TestKind object at 0x7ff9f9dd9b90>

    def test_int(self):
        """Test `int` kind_func for integers up to 10**40."""
        selectedintkind = self.module.selectedintkind
    
        for i in range(40):
>           assert selectedintkind(i) == selected_int_kind(
                i
            ), f"selectedintkind({i}): expected {selected_int_kind(i)!r} but got {selectedintkind(i)!r}"
E           AssertionError: selectedintkind(19): expected 16 but got -1
E           assert -1 == 16
E            +  where -1 = <fortran function selectedintkind>(19)
E            +  and   16 = selected_int_kind(19)

i          = 19
selectedintkind = <fortran function selectedintkind>
self       = <numpy.f2py.tests.test_kind.TestKind object at 0x7ff9f9dd9b90>

/tmp/eb-fl3ghl7s/tmp2dxr_uw5/lib/python3.11/site-packages/numpy/f2py/tests/test_kind.py:20: AssertionError
_____________________________________________________________________ TestKind.test_real _____________________________________________________________________

self = <numpy.f2py.tests.test_kind.TestKind object at 0x7ff9f9dda510>

    def test_real(self):
        """
        Test (processor-dependent) `real` kind_func for real numbers
        of up to 31 digits precision (extended/quadruple).
        """
        selectedrealkind = self.module.selectedrealkind
    
        for i in range(32):
>           assert selectedrealkind(i) == selected_real_kind(
                i
            ), f"selectedrealkind({i}): expected {selected_real_kind(i)!r} but got {selectedrealkind(i)!r}"
E           AssertionError: selectedrealkind(16): expected 10 but got 16
E           assert 16 == 10
E            +  where 16 = <fortran function selectedrealkind>(16)
E            +  and   10 = selected_real_kind(16)

i          = 16
selectedrealkind = <fortran function selectedrealkind>
self       = <numpy.f2py.tests.test_kind.TestKind object at 0x7ff9f9dda510>

/tmp/eb-fl3ghl7s/tmp2dxr_uw5/lib/python3.11/site-packages/numpy/f2py/tests/test_kind.py:32: AssertionError

I must admit that I do not have the expertise to take this further.

@boegel boegel added the update label Sep 27, 2023
@boegel boegel added this to the 4.x milestone Sep 27, 2023
@branfosj branfosj marked this pull request as draft October 3, 2023 15:28
@schiotz
Contributor Author

schiotz commented Oct 23, 2023

Progress: numpy now builds. The main problem was that Intel Fortran supports neither 10-byte reals nor 16-byte integers, and the Python code in f2py needs to know which data types are supported in order to build extensions that do not crash. Perhaps this was not tested properly before.
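
The effect of the missing kinds on the two failing TestKind tests can be sketched in plain Python. This is an illustration only, not numpy's actual implementation: the function names and the max_int_bytes / has_real10 parameters are hypothetical, and the precision table assumes the usual IEEE single, double, x87 extended and quad formats.

```python
def sketch_selected_int_kind(r, max_int_bytes=16):
    """Smallest integer kind (in bytes) holding every n with |n| < 10**r, else -1."""
    for nbytes in (1, 2, 4, 8, 16):
        if nbytes > max_int_bytes:
            break
        # A signed integer of nbytes bytes holds values up to 2**(8*nbytes - 1) - 1.
        if 10 ** r <= 2 ** (8 * nbytes - 1):
            return nbytes
    return -1


def sketch_selected_real_kind(p, has_real10=True):
    """Smallest real kind (in bytes) with at least p decimal digits, else -1."""
    # (kind in bytes, decimal precision): single, double, x87 extended, quad.
    kinds = [(4, 6), (8, 15), (10, 18), (16, 33)]
    for nbytes, precision in kinds:
        if nbytes == 10 and not has_real10:
            continue  # assumption: compiler without a 10-byte real
        if p <= precision:
            return nbytes
    return -1


if __name__ == "__main__":
    # Default settings versus a compiler without 16-byte integers / 10-byte reals.
    print(sketch_selected_int_kind(19), sketch_selected_int_kind(19, max_int_bytes=8))
    print(sketch_selected_real_kind(16), sketch_selected_real_kind(16, has_real10=False))
```

With the restricted settings the sketch gives -1 for 19 digits and kind 16 for a precision of 16, the same values the compiled Fortran functions returned in the failing tests, while the defaults give the 16 and 10 that the Python side expected.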

Now scipy fails to build as meson cannot find mkl. I'll look at that later, unless somebody has some cool insight that could help me...

@schiotz
Contributor Author

schiotz commented Oct 23, 2023

Mayday! I will need help with this from somebody who knows easybuild a lot better than me, perhaps @Micket or @boegel?

I finally got numpy to build with intel/2023a, but scipy fails to build. I suspect it may be as simple as passing the right option to the build step.

The Meson build system cannot find MKL, it says that it fails to find it using pkg-config or cmake:

== 2023-10-23 20:52:34,943 run.py:246 INFO running cmd:  meson setup --prefix /home/niflheim/schiotz/easybuild/icelake/software/SciPy-bundle/2023.07-iimkl-2023a  -Dblas=mkl  -Dlapack=mkl  -Dlibdir=lib  /scratch/schiotz/eb_build/SciPybundle/2023.07/iimkl-2023a/scipy/scipy-1.11.1 
== 2023-10-23 20:52:42,053 build_log.py:171 ERROR EasyBuild crashed with an error (at easybuild/base/exceptions.py:126 in __init__): cmd " meson setup --prefix /home/niflheim/schiotz/easybuild/icelake/software/SciPy-bundle/2023.07-iimkl-2023a  -Dblas=mkl  -Dlapack=mkl  -Dlibdir=lib  /scratch/schiotz/eb_build/SciPybundle/2023.07/iimkl-2023a/scipy/scipy-1.11.1" exited with exit code 1 and output:
The Meson build system
Version: 1.1.1
    [... snip ...]
Run-time dependency pybind11 found: YES 2.11.1
WARNING: CMake Toolchain: Failed to determine CMake compilers state
Run-time dependency mkl found: NO (tried pkgconfig and cmake)

../scipy/scipy-1.11.1/scipy/meson.build:161:9: ERROR: Dependency "mkl" not found, tried pkgconfig and cmake

And indeed pkg-config cannot find mkl, there is no mkl.pc file. There are "only" these:

$ find $EBROOTIMKL -name '*.pc'
/home/modules/software/imkl/2023.1.0/compiler/2023.1.0/lib/pkgconfig/openmp.pc
/home/modules/software/imkl/2023.1.0/compiler/2023.1.0/lib/pkgconfig/openmp32.pc
/home/modules/software/imkl/2023.1.0/mkl/2023.1.0/lib/pkgconfig/mkl-dynamic-ilp64-gomp.pc
/home/modules/software/imkl/2023.1.0/mkl/2023.1.0/lib/pkgconfig/mkl-dynamic-ilp64-iomp.pc
/home/modules/software/imkl/2023.1.0/mkl/2023.1.0/lib/pkgconfig/mkl-dynamic-ilp64-seq.pc
/home/modules/software/imkl/2023.1.0/mkl/2023.1.0/lib/pkgconfig/mkl-dynamic-ilp64-tbb.pc
/home/modules/software/imkl/2023.1.0/mkl/2023.1.0/lib/pkgconfig/mkl-dynamic-lp64-gomp.pc
/home/modules/software/imkl/2023.1.0/mkl/2023.1.0/lib/pkgconfig/mkl-dynamic-lp64-iomp.pc
/home/modules/software/imkl/2023.1.0/mkl/2023.1.0/lib/pkgconfig/mkl-dynamic-lp64-seq.pc
/home/modules/software/imkl/2023.1.0/mkl/2023.1.0/lib/pkgconfig/mkl-dynamic-lp64-tbb.pc
/home/modules/software/imkl/2023.1.0/mkl/2023.1.0/lib/pkgconfig/mkl-sdl.pc
/home/modules/software/imkl/2023.1.0/mkl/2023.1.0/lib/pkgconfig/mkl-static-ilp64-gomp.pc
/home/modules/software/imkl/2023.1.0/mkl/2023.1.0/lib/pkgconfig/mkl-static-ilp64-iomp.pc
/home/modules/software/imkl/2023.1.0/mkl/2023.1.0/lib/pkgconfig/mkl-static-ilp64-seq.pc
/home/modules/software/imkl/2023.1.0/mkl/2023.1.0/lib/pkgconfig/mkl-static-ilp64-tbb.pc
/home/modules/software/imkl/2023.1.0/mkl/2023.1.0/lib/pkgconfig/mkl-static-lp64-gomp.pc
/home/modules/software/imkl/2023.1.0/mkl/2023.1.0/lib/pkgconfig/mkl-static-lp64-iomp.pc
/home/modules/software/imkl/2023.1.0/mkl/2023.1.0/lib/pkgconfig/mkl-static-lp64-seq.pc
/home/modules/software/imkl/2023.1.0/mkl/2023.1.0/lib/pkgconfig/mkl-static-lp64-tbb.pc

I do not know meson at all, and mkl only very superficially, so I do not know how to make pip install scipy correctly. Probably just some option that needs to be passed.

@Micket
Contributor

Micket commented Oct 23, 2023

From a quick look, MKL has never supplied anything named plainly mkl.pc, so if that meson script is actually trying to use pkg-config with the name mkl, it's just wrong.

A quick check with this toolchain + pkgconf

$ pkg-config mkl-dynamic-lp64-seq --libs
-L/apps/Common/software/imkl/2023.1.0/mkl/latest/lib/pkgconfig/../../lib/intel64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lpthread -lm -ldl

it does work.

Skimming the meson.build a bit it looks like they are aware of that, and it should be possible to tell it to use blas_name == mkl-dynamic-lp64-seq.

Looks to me that the easyblock just sets "mkl" and expects the meson build to figure it out (a fair assumption, most build tools wouldn't rely on pkg-config for this, but just look at MKL_ROOT and pick the build config themselves)

https://github.com/easybuilders/easybuild-easyblocks/blob/0d2bb58e931e927feb70945b27a03eb062a5475a/easybuild/easyblocks/s/scipy.py#L127

I guess this worked in the past, but it no longer does.
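
The .pc files listed earlier follow a mkl-<linking>-<interface>-<threading> pattern (plus the separate mkl-sdl). A small Python sketch of composing such a name is shown below; the function mkl_pc_name and its parameters are hypothetical and not part of EasyBuild, Meson or MKL, and the allowed values are simply read off that file listing.

```python
# Allowed values, taken from the .pc file names in the listing above.
LINKING = ("dynamic", "static")
INTERFACE = ("lp64", "ilp64")
THREADING = ("seq", "iomp", "gomp", "tbb")


def mkl_pc_name(linking="dynamic", interface="lp64", threading="seq"):
    """Compose a pkg-config module name such as 'mkl-dynamic-lp64-seq'."""
    for value, allowed in ((linking, LINKING),
                           (interface, INTERFACE),
                           (threading, THREADING)):
        if value not in allowed:
            raise ValueError(f"{value!r} is not one of {allowed}")
    return f"mkl-{linking}-{interface}-{threading}"


if __name__ == "__main__":
    print(mkl_pc_name())
```

The defaults reproduce the variant discussed in this thread; whether that variant is the right choice for a given site (threading layer, integer interface) is a separate question that the sketch does not answer.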

@schiotz
Contributor Author

schiotz commented Oct 24, 2023

Thank you, @Micket.

Can I assume that mkl-dynamic-lp64-seq is the right mkl variant to choose? Then I can probably hack the setup script to hardcode it.

Is the scipy.py easyblock still used, when SciPy-bundle is a PythonBundle - I mean, does the PythonBundle easyblock call sub-easyblocks so to speak? It is relevant for where to hack :-)

@schiotz
Contributor Author

schiotz commented Oct 24, 2023

OK, this is an ugly hack, but it actually gets scipy compiled. Now it just needs to pass its test suite. The saga continues...

@boegelbot
Collaborator

@schiotz: Tests failed in GitHub Actions, see https://github.com/easybuilders/easybuild-easyconfigs/actions/runs/6623690740
Output from first failing test suite run:

FAIL: test_pr_patch_descr (test.easyconfigs.easyconfigs.EasyConfigTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/easybuild-easyconfigs/easybuild-easyconfigs/test/easyconfigs/easyconfigs.py", line 1293, in test_pr_patch_descr
    self.assertFalse(no_descr_patches, "No description found in patches: %s" % ', '.join(no_descr_patches))
AssertionError: ['/home/runner/work/easybuild-easyconfigs/easybuild-easyconfigs/easybuild/easyconfigs/s/SciPy-bundle/scipy-1.11.1_meson-build-mkl-name.patch'] is not false : No description found in patches: /home/runner/work/easybuild-easyconfigs/easybuild-easyconfigs/easybuild/easyconfigs/s/SciPy-bundle/scipy-1.11.1_meson-build-mkl-name.patch

----------------------------------------------------------------------
Ran 18380 tests in 1085.909s

FAILED (failures=1)
ERROR: Not all tests were successful

bleep, bloop, I'm just a bot (boegelbot v20200716.01)
Please talk to my owner @boegel if you notice me acting stupid,
or submit a pull request to https://github.com/boegel/boegelbot to fix the problem.

@Micket
Contributor

Micket commented Oct 24, 2023

> Is the scipy.py easyblock still used, when SciPy-bundle is a PythonBundle - I mean, does the PythonBundle easyblock call sub-easyblocks so to speak? It is relevant for where to hack :-)

Yes. The rule for PythonBundle is: if a custom easyblock matching the extension name exists (and it does for scipy, numpy and some others), it is used; otherwise it defaults to PythonPackage.
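
That selection rule can be sketched in a few lines of Python. This is not EasyBuild's real code: the registry CUSTOM_EASYBLOCKS, its contents and the function pick_easyblock are hypothetical names used only to show the fallback logic.

```python
# Hypothetical registry: extension name -> name of a matching custom easyblock.
CUSTOM_EASYBLOCKS = {"numpy": "EB_numpy", "scipy": "EB_scipy"}


def pick_easyblock(ext_name, custom=CUSTOM_EASYBLOCKS, default="PythonPackage"):
    """Use the custom easyblock for this extension if one exists, else the default."""
    return custom.get(ext_name, default)


if __name__ == "__main__":
    for name in ("numpy", "scipy", "pandas"):
        print(name, "->", pick_easyblock(name))
```

Under this rule a change to how MKL is passed to Meson belongs in the scipy easyblock, since that is the code that runs for the scipy extension inside the bundle.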

> Can I assume that mkl-dynamic-lp64-seq is the right mkl variant to choose? Then I can probably hack the setup script to hardcode it.

That is what scipy mentions in their meson file, though it's not strict.

@schiotz
Contributor Author

schiotz commented Oct 24, 2023

Now scipy compiles, and gets two thirds through its test suite, before Python crashes with a segmentation fault:

scipy/sparse/linalg/tests/test_propack.py::test_svdp[LM-True-complex64-array] Fatal Python error: Segmentation fault

Current thread 0x00002b964e088980 (most recent call first):
  File "/tmp/eb-lrn__2st/tmpbbkzqh13/install/lib/python3.11/site-packages/scipy/sparse/linalg/_svdp.py", line 302 in _svdp
  File "/tmp/eb-lrn__2st/tmpbbkzqh13/install/lib/python3.11/site-packages/scipy/sparse/linalg/tests/test_propack.py", line 76 in check_svdp
  File "/tmp/eb-lrn__2st/tmpbbkzqh13/install/lib/python3.11/site-packages/scipy/sparse/linalg/tests/test_propack.py", line 110 in test_svdp
  File "/home/modules/software/Python-bundle-PyPI/2023.06-GCCcore-12.3.0/lib/python3.11/site-packages/_pytest/python.py", line 194 in pytest_pyfunc_call
  File "/home/modules/software/Python-bundle-PyPI/2023.06-GCCcore-12.3.0/lib/python3.11/site-packages/pluggy/_callers.py", line 80 in _multicall

The crashing function seems to be clansvd_irl in scipy/sparse/linalg/_propack/PROPACK/complex8/clansvd_irl.F.

The crashing function seems to be zlansvd_irl in scipy/sparse/linalg/_propack/PROPACK/complex16/zlansvd_irl.F

I am not sure why the test suite refers to the type as complex64 whereas the path to the Fortran function refers to complex16. 16 bytes would be 128 bits, not 64.

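Edit: I am hallucinating... complex64 is 8 bytes in total (two 4-byte floats), so it matches the complex8 directory and clansvd_irl, not complex16.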
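
The naming confusion comes from two conventions: NumPy dtype names count bits, while the directory names in the paths above (complex8, complex16) appear to count bytes. A short Python sketch of the conversion follows; the function name fortran_style_name is made up for this illustration.

```python
import re


def fortran_style_name(numpy_name):
    """Convert a bit-counted name like 'complex64' to a byte-counted 'complex8'."""
    match = re.fullmatch(r"([a-z]+)(\d+)", numpy_name)
    if match is None:
        raise ValueError(f"unexpected dtype name: {numpy_name!r}")
    kind, bits = match.group(1), int(match.group(2))
    if bits % 8 != 0:
        raise ValueError(f"{bits} bits is not a whole number of bytes")
    return f"{kind}{bits // 8}"


if __name__ == "__main__":
    for name in ("complex64", "complex128"):
        print(name, "->", fortran_style_name(name))
```

So a test parametrized with complex64 exercises the single-precision complex routine, and complex128 the double-precision one.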

Comment on lines 11 to 17
+# Hack for MKL in easybuild
+if blas_name == 'mkl'
+ blas_name = 'mkl-dynamic-lp64-seq'
+endif
+if lapack_name == 'mkl'
+ lapack_name = 'mkl-dynamic-lp64-seq'
+endif
Contributor

I think we should just fix the scipy easyblock instead. The most annoying part would be checking whether this behavior is backwards compatible with older SciPy versions, or whether we need a LooseVersion check so that the full "mkl-dynamic-lp64-seq" name is only used going forward.

I also wonder how this plays together with RPATH users... sigh. Maybe more involved logic that uses the environment variables set up by the toolchain should be attempted instead of relying on these prepackaged .pc files.
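
A version gate of the kind mentioned here could look roughly like the Python sketch below. It compares plain integer tuples instead of using LooseVersion, and the cutoff is deliberately left as a caller-supplied parameter because this thread does not settle on one; blas_option_value, parse_version and full_name_since are hypothetical names, not part of the scipy easyblock.

```python
def parse_version(version):
    """Turn a dotted version string such as '1.11.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def blas_option_value(scipy_version, full_name_since,
                      full_name="mkl-dynamic-lp64-seq"):
    """Pick the value passed as -Dblas=... / -Dlapack=... for an MKL toolchain.

    full_name_since is the (assumed, caller-supplied) first SciPy version for
    which the full pkg-config name should be used instead of plain 'mkl'.
    """
    if parse_version(scipy_version) >= parse_version(full_name_since):
        return full_name
    return "mkl"
```

Comparing tuples avoids string ordering pitfalls such as '1.9' sorting after '1.11', but it only handles purely numeric versions; release candidates or development suffixes would need a more complete parser.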

Contributor Author

scipy only builds with Meson from SciPy-bundle-2023.02 and later, and there are no Intel versions of these, so I do not think there will be any problem with backwards compatibility.

I have no clue about RPATH, but usually pkg-config sets that correctly.

Contributor Author

I have made a PR for the easyblock, I'll test it later today.

easybuilders/easybuild-easyblocks#3024

@schiotz
Contributor Author

schiotz commented Oct 24, 2023

It seems that the above-mentioned test is already skipped on 32-bit CPUs, and that on Windows the corresponding complex128 test is skipped because it crashes. Since upstream simply skips the tests in these cases, perhaps it is OK for us to skip them, too.

From test_propack.py:

_dtypes = []
for dtype_flavour in TOLS.keys():
    marks = []
    if is_complex_type(dtype_flavour):
        if is_32bit():
            # PROPACK has issues w/ complex on 32-bit; see gh-14433
            marks = [pytest.mark.skip]
        elif is_windows() and np.dtype(dtype_flavour).itemsize == 16:
            # windows crashes for complex128 (so don't xfail); see gh-15108
            marks = [pytest.mark.skip]
        else:
            marks = [pytest.mark.slow]  # type: ignore[list-item]
    _dtypes.append(pytest.param(dtype_flavour, marks=marks,
                                id=dtype_flavour.__name__))
_dtypes = tuple(_dtypes)  # type: ignore[assignment]

@schiotz
Contributor Author

schiotz commented Oct 24, 2023

Now it works. Updates:

@schiotz schiotz marked this pull request as ready for review October 24, 2023 16:41
@schiotz schiotz changed the title WIP: {lang}[iimkl/2023a] SciPy-bundle v2023.07 {lang}[iimkl/2023a] SciPy-bundle v2023.07 Oct 24, 2023
@boegel
Member

boegel commented Jan 26, 2024

Test report by @boegel
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
node4228.shinx.os - Linux RHEL 8.8, x86_64, AMD EPYC 9654 96-Core Processor (zen4), Python 3.6.8
See https://gist.github.com/boegel/e9d075c3e513c6c5c408fe1bdffca681 for a full test report.

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
cns2 - Linux Rocky Linux 8.5, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/8e2e56544e598e30de02ccc73db5b5a4 for a full test report.

@boegel
Member

boegel commented Jan 26, 2024

Test report by @boegel
SUCCESS
Build succeeded for 2 out of 2 (1 easyconfigs in total)
node3128.skitty.os - Linux RHEL 8.8, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz, Python 3.6.8
See https://gist.github.com/boegel/8f61adc2f9a6130468346f47f2b3f60b for a full test report.

Co-authored-by: Simon Branford <4967+branfosj@users.noreply.github.com>
@boegel
Member

boegel commented Jan 26, 2024

> Test report by @branfosj
> FAILED
> Build succeeded for 1 out of 2 (1 easyconfigs in total)
> bear-pg0207u28a.bear.cluster - Linux RHEL 8.6, x86_64, AMD EPYC 9554 64-Core Processor (zen4), Python 3.6.8
> See https://gist.github.com/branfosj/265ecb2f92bcca2b37d2a2d28d216b7b for a full test report.

@branfosj I used --optarch=Intel:march=rocketlake on AMD Genoa (Zen4) and didn't see any test failures, whereas you observed 1 failure when using --optarch=Intel:march=common-avx512:

FAILED scipy/special/tests/test_exponential_integrals.py::TestExp1::test_branch_cut - AssertionError: assert -3.141592653589793 == --3.141592653589793
================================================================================================== FAILURES ==================================================================================================
__________________________________________________________________________________________ TestExp1.test_branch_cut __________________________________________________________________________________________
scipy/special/tests/test_exponential_integrals.py:12: in test_branch_cut
    assert sc.exp1(complex(-1, 0)).imag == (
E   AssertionError: assert -3.141592653589793 == --3.141592653589793
E    +  where -3.141592653589793 = (-1.8951178163559368-3.141592653589793j).imag
E    +    where (-1.8951178163559368-3.141592653589793j) = <ufunc 'exp1'>((-1+0j))
E    +      where <ufunc 'exp1'> = sc.exp1
E    +      and   (-1+0j) = complex(-1, 0)
E    +  and   -3.141592653589793 = (-1.8951178163559368-3.141592653589793j).imag
E    +    where (-1.8951178163559368-3.141592653589793j) = <ufunc 'exp1'>((-1-0j))
E    +      where <ufunc 'exp1'> = sc.exp1
E    +      and   (-1-0j) = complex(-1, -0.0)
        self       = <scipy.special.tests.test_exponential_integrals.TestExp1 object at 0x7f750aa658d0>

Others have definitely seen this too, see scipy/scipy#17075 and scipy/scipy#11339
Perhaps this is a bit of a fluke error?
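
The failing assertion depends on the sign of a zero imaginary part: complex(-1, 0) and complex(-1, -0.0) lie on opposite sides of a branch cut. The following Python sketch shows the same mechanism with cmath.log, used here purely as a stand-in for sc.exp1; that the two functions behave analogously on the negative real axis is an assumption of this illustration.

```python
import cmath

above = complex(-1.0, 0.0)   # imaginary part is +0.0
below = complex(-1.0, -0.0)  # imaginary part is -0.0

# The two inputs compare equal, yet the sign of the zero decides which side
# of the branch cut along the negative real axis the function is evaluated on.
print(above == below)
print(cmath.log(above).imag, cmath.log(below).imag)
```

If compiled code loses the sign of the zero somewhere along the way, both calls end up on the same side of the cut, which would match the symptom in the failure above where both imaginary parts come out with the same sign.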

@boegel
Member

boegel commented Jan 26, 2024

@boegelbot please test @ generoso

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on login1

PR test command 'EB_PR=18875 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_18875 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 12750

Test results coming soon (I hope)...

- notification for comment with ID 1912382353 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@branfosj
Member

> > Test report by @branfosj
> > FAILED
> > Build succeeded for 1 out of 2 (1 easyconfigs in total)
> > bear-pg0207u28a.bear.cluster - Linux RHEL 8.6, x86_64, AMD EPYC 9554 64-Core Processor (zen4), Python 3.6.8
> > See https://gist.github.com/branfosj/265ecb2f92bcca2b37d2a2d28d216b7b for a full test report.
>
> @branfosj I used --optarch=Intel:march=rocketlake on AMD Genoa (Zen4) and didn't see any test failures, whereas you observed 1 failure when using --optarch=Intel:march=common-avx512:

I've run it twice with common-avx512 and it failed both times. So, I've switched to rocketlake to see if that is better.

@branfosj
Member

Test report by @branfosj
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
bear-pg0105u03a - Linux RHEL 8.6, x86_64, Intel(R) Xeon(R) Platinum 8360Y CPU @ 2.40GHz (icelake), Python 3.6.8
See https://gist.github.com/branfosj/67bc6dc0f4d99c47f6dcd79dda9bd479 for a full test report.

@branfosj
Member

Test report by @branfosj
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
bear-pg0207u20a - Linux RHEL 8.6, x86_64, Intel(R) Xeon(R) Platinum 8480CL (sapphirerapids), Python 3.6.8
See https://gist.github.com/branfosj/f422fe4fa85d8f63d7ad4d8eb488b3ae for a full test report.

@boegel
Member

boegel commented Jan 26, 2024

@boegelbot please test @ jsc-zen3

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=18875 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_18875 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 3492

Test results coming soon (I hope)...

- notification for comment with ID 1912427908 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegel
Member

boegel commented Jan 26, 2024

Test report by @boegel
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
node4003.donphan.os - Linux RHEL 8.8, x86_64, Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz (cascadelake), 1 x NVIDIA NVIDIA A2, 535.129.03, Python 3.6.8
See https://gist.github.com/boegel/81d2653d98bbd2defc91bc9a4d6ada4f for a full test report.

@branfosj
Member

Test report by @branfosj
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
bear-pg0207u28a.bear.cluster - Linux RHEL 8.6, x86_64, AMD EPYC 9554 64-Core Processor (zen4), Python 3.6.8
See https://gist.github.com/branfosj/80ac79b9c8cebc50fef625eeaf1b0bdd for a full test report.

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
cns2 - Linux Rocky Linux 8.5, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/31dda491b7403a22b8fba60221b1a03c for a full test report.

@boegel
Member

boegel commented Jan 26, 2024

Test report by @boegel
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
node4228.shinx.os - Linux RHEL 8.8, x86_64, AMD EPYC 9654 96-Core Processor (zen4), Python 3.6.8
See https://gist.github.com/boegel/6e63cf3942998e8bbbd2210138c1e595 for a full test report.

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.3, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.18
See https://gist.github.com/boegelbot/a88ab97ea15ecaf2476b8078fb23dc07 for a full test report.

@boegel
Member

boegel commented Jan 26, 2024

Test report by @boegel
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
node3613.doduo.os - Linux RHEL 8.8, x86_64, AMD EPYC 7552 48-Core Processor, Python 3.6.8
See https://gist.github.com/boegel/c855574920b3524cd964bd8bcdff5e08 for a full test report.

@boegel boegel dismissed stale reviews from Micket and branfosj January 26, 2024 18:26

requested changes made

Member

@boegel boegel left a comment

lgtm

@boegel
Member

boegel commented Jan 26, 2024

Test report by @boegel
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
node4112.gallade.os - Linux RHEL 8.8, x86_64, AMD EPYC 7773X 64-Core Processor (zen3), Python 3.6.8
See https://gist.github.com/boegel/c38a66b9d32a1597cac47a6e9aff06a6 for a full test report.

@boegel
Member

boegel commented Jan 26, 2024

It's time to call this one, we've burned enough dead dinosaurs over this.

Test suite passes, with a handful of tests tweaked/disabled (see patches), on:

  • Intel Haswell (Rocky 8) @ generoso
  • Intel Broadwell (cfr. test reports by @schiotz + @akesandgren)
  • Intel Skylake (RHEL 8) @ HPC-UGent skitty
  • Intel Cascade Lake (RHEL 8) @ HPC-UGent donphan
  • Intel Sapphire Rapids (RHEL 8) @ BEAR
  • AMD Zen2 (RHEL 8) @ HPC-UGent doduo
  • AMD Zen3 (Rocky 9) @ jsc-zen3
  • AMD Zen4 (RHEL 8) @ HPC-UGent shinx + BEAR

Should any more issues arise, they'll have to be dealt with in a subsequent PR...

@boegel
Member

boegel commented Jan 26, 2024

Going in, thanks @schiotz!

@boegel boegel modified the milestones: 4.x, release after 4.9.0 Jan 26, 2024
@boegel boegel merged commit 210f38a into easybuilders:develop Jan 26, 2024
9 checks passed
@boegel
Member

boegel commented Jan 26, 2024

Test report by @boegel
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
node3128.skitty.os - Linux RHEL 8.8, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz, Python 3.6.8
See https://gist.github.com/boegel/b92e23cd7c0d0849d03a340d0c42e883 for a full test report.

@akesandgren
Contributor

akesandgren commented Jan 29, 2024

After trying this without lowopt/strict, using the Intel compilers with FlexiBLAS/OpenBLAS, I see the same type of problems.
Using lowopt+strict solves it. So this is a compiler-related problem, not an MKL one.
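
For reference, lowopt and strict are EasyBuild toolchain options that are set in the easyconfig, which uses Python syntax. The fragment below is a sketch that assumes the easyconfig in this PR sets them this way; that assumption is consistent with the -O1 and -fp-model strict flags visible in the compiler command lines earlier in the thread, but the exact line should be checked against the merged file.

```python
# Easyconfig fragment (sketch, not copied from the merged easyconfig).
# 'lowopt' lowers the optimization level, 'strict' requests strict
# floating-point semantics from the compiler.
toolchainopts = {'lowopt': True, 'strict': True}
```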
