0.3.22 breaks functions downstream in GNU Octave #3976

Closed
arungiridhar opened this issue Mar 28, 2023 · 29 comments

Comments

@arungiridhar

arungiridhar commented Mar 28, 2023

OpenBLAS 0.3.21 works perfectly with GNU Octave, but 0.3.22 causes several test failures in matrix inverses, eigenvector calculations, and sparse matrix operations. These include returning Inf instead of NaN and giving an answer that is wrong by several orders of magnitude. Affected Octave functions include inv(), eigs(), residue(), and det(), and any user code using those functions.

Thread: https://octave.discourse.group/t/make-check-fails-with-openblas-0-3-22/4289

How to reproduce:

  1. Download & build a current version of GNU Octave from source: https://octave.org/download.
  2. Run make check.
  3. Observation: OpenBLAS 0.3.22 will give 10 test failures in the above functions. 0.3.21 will give zero test failures.

Workaround: I have reverted my system OpenBLAS to 0.3.21 for now, and Octave works properly again.

EDIT: Build settings for OpenBLAS, both versions:

FC=gfortran USE_OPENMP=1 USE_THREAD=1 USE_TLS=0 NO_STATIC=1 CPP_THREAD_SAFETY_TEST=1
@arungiridhar
Author

If it helps: https://octave.discourse.group/t/make-check-fails-with-openblas-0-3-22/4289/4

The problem might be in dpotrf or dpstrf due to the specific mix of Octave test failures.

@martin-frbg
Collaborator

Looks like I broke a division-by-zero test in zgetf2 with a recent quick fix (#3941; I managed to change OR into AND and nobody/no test noticed).
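
For illustration, here is a minimal Python sketch of how an OR-to-AND slip can break a zero test on a complex pivot. It is not the actual zgetf2 code; the function names and the (re, im) representation are assumptions made for this example.

```python
def pivot_is_nonzero_or(re, im):
    # A complex pivot is nonzero when either component is nonzero.
    return re != 0.0 or im != 0.0


def pivot_is_nonzero_and(re, im):
    # The slip: requiring both components to be nonzero rejects
    # purely real and purely imaginary pivots.
    return re != 0.0 and im != 0.0


pivots = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (0.0, 0.0)]
for re, im in pivots:
    print((re, im), pivot_is_nonzero_or(re, im), pivot_is_nonzero_and(re, im))
```

With AND, pivots such as 1+0i or 0+1i are classified as zero, which would matter most for matrices whose entries are purely real or purely imaginary.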

@mmuetzel
Contributor

mmuetzel commented Mar 29, 2023

That's probably off-topic to this issue. But @MehdiChinoune is probably referring to the following symbols that are no longer being exported (or exported with a different signature) by the OpenBLAS library compared to version 0.3.21:

  {b'libopenblas.dll': {b'dzamin_',
                        b'dzasum_',
                        b'dznrm2_',
                        b'dzsum1_',
                        b'dzsum_',
                        b'exec_blas',
                        b'gemm_thread_m',
                        b'gemm_thread_mn',
                        b'gemm_thread_n',
                        b'gemm_thread_variable',
                        b'get_num_procs',
                        b'goto_get_num_procs',
                        b'goto_set_num_threads',
                        b'gotoblas',
                        b'gotoblas_ATHLON',
                        b'gotoblas_BANIAS',
                        b'gotoblas_BARCELONA',
                        b'gotoblas_BULLDOZER',
                        b'gotoblas_COOPERLAKE',
                        b'gotoblas_COPPERMINE',
                        b'gotoblas_CORE2',
                        b'gotoblas_EXCAVATOR',
                        b'gotoblas_HASWELL',
                        b'gotoblas_KATMAI',
                        b'gotoblas_NEHALEM',
                        b'gotoblas_NORTHWOOD',
                        b'gotoblas_PILEDRIVER',
                        b'gotoblas_PRESCOTT',
                        b'gotoblas_SANDYBRIDGE',
                        b'gotoblas_SKYLAKEX'},

Is that intentional?

Edit: Inspecting the updated library, all of those symbols seem to be exported still. Not sure why the package grokker picked them out. Maybe, just ignore...

Edit 2: That was an error in the package grokker in MSYS2's CI. All symbols are still exported correctly. Please, ignore this.

@dimpase

dimpase commented Mar 29, 2023

> Looks like I broke a division-by-zero test in zgetf2 with a recent quick fix (#3941; I managed to change OR into AND and nobody/no test noticed).

Would you like a test in Fortran which would have captured that? As scipy downstream points out, for them the regression comes from https://github.com/scipy/scipy/blob/main/scipy/linalg/src/det.f#L94
(cf. scipy/scipy#18208 (comment))

@martin-frbg
Collaborator

A test in C would be best (for integrating into "utest") but can wait until later; I can run the scipy reproducer for now. (Anything is better than the Octave approach of "just build Octave from source and you will see", but I can't blame them really.) It will be interesting to see whether the other fallout comes from similar blunders or is just a consequence of changed algorithms in LAPACK 3.11.

@dasergatskov

dasergatskov commented Mar 29, 2023

bug-63384.f.gz
Using bug-63384.f (see Reference-LAPACK/lapack#763):

$ gfortran bug-63384.f -llapack -lblas
$ ./a.out 
 x = 
                       NaN                       NaN
                       NaN                       NaN
 anorm =                        NaN
 dgetrf info:            0
 ipiv = 
           1           2
 dgecon info:            0
 rcond =    0.0000000000000000     
 dgetri info:            0
 xinv = 
                       NaN                       NaN
                       NaN                       NaN

That looks good.
Now:

$ LD_PRELOAD=./libopenblas_zenp-r0.3.22.so ./a.out 
 x = 
                       NaN                       NaN
                       NaN                       NaN
 anorm =                        NaN
 dgetrf info:            1
 ipiv = 
           1           2
 ** On entry to DGECON parameter number  5 had an illegal value
 dgecon info:           -5
 rcond =   -1.2345670461654663     
 dgetri info:            0
 xinv = 
                       NaN                       NaN
                       NaN                       NaN

@martin-frbg
Collaborator

Hmm. I wonder if the GETF2 fixes are trading a NumPy issue (numpy/numpy#22025) for an Octave one (the DGECON fix for Reference-LAPACK issue 763 is carried in OpenBLAS 0.3.22).

@angsch
Contributor

angsch commented Mar 29, 2023

Why does dgetrf return info=1 (and not 0)?

@martin-frbg
Collaborator

martin-frbg commented Mar 29, 2023

Good question. It seems temp was below DBL_MIN, so singularity gets reported (the original code before the fix used "is unequal ZERO" there, as does the original dgetf2.f of the reference LAPACK). A trap: is DLAMCH('S') something other than DBL_MIN?
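
To make the difference between the two tests concrete, here is a small Python sketch. It assumes, hypothetically, that the threshold variant reports singularity whenever `|pivot| >= DBL_MIN` fails; the function names are illustrative and are not taken from the OpenBLAS sources.

```python
import math
import sys

DBL_MIN = sys.float_info.min  # smallest positive normal double


def singular_by_threshold(pivot):
    # Hypothetical threshold-style test: anything failing |pivot| >= DBL_MIN
    # is reported as singular. Comparisons involving NaN are always false.
    return not (abs(pivot) >= DBL_MIN)


def singular_by_exact_zero(pivot):
    # Reference-LAPACK-style test: only an exact zero counts as singular.
    return pivot == 0.0


for pivot in (0.0, 5e-324, 1.0, math.nan):
    print(pivot, singular_by_threshold(pivot), singular_by_exact_zero(pivot))
```

Under these assumptions the two tests disagree on subnormal values such as 5e-324 and on NaN. That would be consistent with the all-NaN matrix above reporting dgetrf info = 1 with 0.3.22 but info = 0 with the reference library.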

@dasergatskov

This is directly in octave.
With 0.3.21:

octave:1> C = [   1.0000 +      0i        0 +      0i   1.0000 +      0i        0 +      0i
        0 + 1.0000i   0.3333 +      0i        0 - 1.0000i   0.3333 +      0i
   1.0000 +      0i        0 + 0.6667i   1.0000 +      0i        0 - 0.6667i
        0 + 1.0000i  -0.3333 +      0i        0 - 1.0000i  -0.3333 +      0i

];
octave:2> rcond(C)
ans = 0.1250

With 0.3.22:

octave:1> C = [  1.0000 +      0i        0 +      0i   1.0000 +      0i        0 +      0i
        0 + 1.0000i   0.3333 +      0i        0 - 1.0000i   0.3333 +      0i
   1.0000 +      0i        0 + 0.6667i   1.0000 +      0i        0 - 0.6667i
        0 + 1.0000i  -0.3333 +      0i        0 - 1.0000i  -0.3333 +      0i

];
octave:2> rcond(C)
ans = 0

I attached t_rcond.m with those two lines (to avoid cut-n-paste issues).
t_rcond.m.gz

@martin-frbg
Collaborator

@angsch what if I stack the old and new conditionals: if temp1 is not exactly zero, then check if it is large enough to do the division, swap & scale as needed, else set info? I know it's an engineering solution to a mathematical problem, but...

@angsch
Contributor

angsch commented Mar 29, 2023

> @angsch what if I stack the old and new conditionals: if temp1 is not exactly zero, then check if it is large enough to do the division, swap & scale as needed, else set info? I know it's an engineering solution to a mathematical problem, but...

My understanding of DLAMCH('S') alias DBL_MIN is that this is the smallest normal number. Of course, there are subnormals, but for the LU factorization there is no point in continuing here. So the check against DBL_MIN is in my opinion correct. Every subnormal entry (fabs(a) < DBL_MIN) should be treated as zero, i.e., set info. I do not know exactly what the criterion for complex numbers should look like.

Do we know if the issue with rcond is related to the LU factorization or to some other problem?

Edit: I take that back: Reference-LAPACK says

INFO > 0: if INFO = i, U(i,i) is exactly zero.

So subnormals should be processed and possibly introduce Infinity. So to follow Reference-LAPACK I suppose two checks are needed. One for exactly zero (then set info and exit) and one fabs(a) >= DBL_MIN to do the computation.
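
The two-check scheme described above can be sketched as follows. This is a minimal pure-Python illustration for real matrices, not the OpenBLAS or Reference-LAPACK code; the function name `getf2_sketch` and the list-of-lists layout are assumptions. Like the reference routine, it records info and keeps going rather than stopping at the first zero pivot.

```python
import sys


def getf2_sketch(a):
    """Unblocked LU with partial pivoting, in place on a list of rows.

    Returns (ipiv, info). info = i (1-based) if U(i,i) is exactly zero,
    otherwise 0.
    """
    m, n = len(a), len(a[0])
    sfmin = sys.float_info.min  # smallest normal double (DBL_MIN)
    ipiv, info = [], 0
    for j in range(min(m, n)):
        # Partial pivoting: row with the largest |entry| in column j.
        p = max(range(j, m), key=lambda i: abs(a[i][j]))
        ipiv.append(p + 1)
        if a[p][j] != 0.0:  # check 1: is the pivot exactly zero?
            a[j], a[p] = a[p], a[j]
            pivot = a[j][j]
            if abs(pivot) >= sfmin:  # check 2: reciprocal is safe
                r = 1.0 / pivot
                for i in range(j + 1, m):
                    a[i][j] *= r
            else:  # subnormal pivot: divide element by element
                for i in range(j + 1, m):
                    a[i][j] /= pivot
        elif info == 0:
            info = j + 1
        # Rank-1 update of the trailing submatrix.
        for i in range(j + 1, m):
            for k in range(j + 1, n):
                a[i][k] -= a[i][j] * a[j][k]
    return ipiv, info


print(getf2_sketch([[4.0, 3.0], [6.0, 3.0]]))     # regular matrix
print(getf2_sketch([[0.0, 1.0], [0.0, 1.0]]))     # exactly singular
print(getf2_sketch([[1e-310, 1.0], [0.0, 1.0]]))  # subnormal pivot
```

In this sketch only the exactly singular matrix yields a nonzero info; the subnormal pivot is processed through the element-wise division branch.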

@martin-frbg
Collaborator

Ok, thanks, I'm following through with that already - will also check if this takes care of the Octave issues once I've got it built

@dasergatskov

You do not need to build a new Octave; you can use the binary provided by your distro and use LD_PRELOAD to override the BLAS libs.

LU also appears broken in octave (though the tests did not notice that):

octave:33> C
C =

   1.0000 +      0i        0 +      0i   1.0000 +      0i        0 +      0i
        0 + 1.0000i   0.3333 +      0i        0 - 1.0000i   0.3333 +      0i
   1.0000 +      0i        0 + 0.6667i   1.0000 +      0i        0 - 0.6667i
        0 + 1.0000i  -0.3333 +      0i        0 - 1.0000i  -0.3333 +      0i

octave:34> [L, U] = lu(C)
L =

   1.0000 +      0i        0 +      0i        0 +      0i        0 +      0i
   1.0000 +      0i        0 + 0.6667i   1.0000 +      0i        0 +      0i
        0 + 1.0000i   1.0000 +      0i        0 +      0i        0 +      0i
        0 + 1.0000i  -0.3333 +      0i   0.6000 + 0.8000i   1.0000 +      0i

U =

   1.0000 +      0i        0 +      0i   1.0000 +      0i        0 +      0i
        0 +      0i   0.3333 +      0i   1.0000 - 1.0000i        0 - 0.6667i
        0 +      0i        0 +      0i  -1.6667 - 1.6667i  -0.1112 +      0i
        0 +      0i        0 +      0i        0 +      0i  -0.2666 - 0.1333i

octave:35> L*U - C 
ans =

        0 +      0i        0 +      0i        0 +      0i        0 +      0i
   1.0000 - 1.0000i  -0.3333 + 0.2222i  -0.0000 +      0i        0 +      0i
  -1.0000 + 1.0000i   0.3333 - 0.6667i        0 +      0i        0 +      0i
        0 +      0i   0.2222 +      0i        0 - 0.0000i        0 +      0i

with 0.3.21 (works as expected):

octave:26> C
C =

   1.0000 +      0i        0 +      0i   1.0000 +      0i        0 +      0i
        0 + 1.0000i   0.3333 +      0i        0 - 1.0000i   0.3333 +      0i
   1.0000 +      0i        0 + 0.6667i   1.0000 +      0i        0 - 0.6667i
        0 + 1.0000i  -0.3333 +      0i        0 - 1.0000i  -0.3333 +      0i

octave:27> [L, U] = lu(C)
L =

   1.0000 +      0i        0 +      0i        0 +      0i        0 +      0i
        0 + 1.0000i        0 - 0.4999i   1.0000 +      0i        0 +      0i
   1.0000 +      0i   1.0000 +      0i        0 +      0i        0 +      0i
        0 + 1.0000i        0 + 0.4999i   1.0000 +      0i   1.0000 +      0i

U =

   1.0000 +      0i        0 +      0i   1.0000 +      0i        0 +      0i
        0 +      0i        0 + 0.6667i        0 +      0i        0 - 0.6667i
        0 +      0i        0 +      0i        0 - 2.0000i   0.6666 +      0i
        0 +      0i        0 +      0i        0 +      0i  -1.3332 +      0i

octave:28> L*U - C
ans =

   0   0   0   0
   0   0   0   0
   0   0   0   0
   0   0   0   0

@angsch
Contributor

angsch commented Mar 29, 2023

@dasergatskov Looks like something is wrong with the tiebreaker when pivoting is applied... Can you print P as well?

@dasergatskov

I assume P is the permutation matrix. It is the same in both cases, but lu is still broken.
0.3.22:

octave:36> [L, U, P] = lu(C)
L =

   1.0000 +      0i        0 +      0i        0 +      0i        0 +      0i
        0 + 1.0000i   1.0000 +      0i        0 +      0i        0 +      0i
   1.0000 +      0i        0 + 0.6667i   1.0000 +      0i        0 +      0i
        0 + 1.0000i  -0.3333 +      0i   0.6000 + 0.8000i   1.0000 +      0i

U =

   1.0000 +      0i        0 +      0i   1.0000 +      0i        0 +      0i
        0 +      0i   0.3333 +      0i   1.0000 - 1.0000i        0 - 0.6667i
        0 +      0i        0 +      0i  -1.6667 - 1.6667i  -0.1112 +      0i
        0 +      0i        0 +      0i        0 +      0i  -0.2666 - 0.1333i

P =

Permutation Matrix

   1   0   0   0
   0   0   1   0
   0   1   0   0
   0   0   0   1

octave:37> P'*L*U - C 
ans =

        0 +      0i        0 +      0i        0 +      0i        0 +      0i
   1.0000 - 1.0000i  -0.3333 + 0.2222i  -0.0000 +      0i        0 +      0i
  -1.0000 + 1.0000i   0.3333 - 0.6667i        0 +      0i        0 +      0i
        0 +      0i   0.2222 +      0i        0 - 0.0000i        0 +      0i

0.3.21:

octave:33> [L, U, P] = lu(C)
L =

   1.0000 +      0i        0 +      0i        0 +      0i        0 +      0i
   1.0000 +      0i   1.0000 +      0i        0 +      0i        0 +      0i
        0 + 1.0000i        0 - 0.4999i   1.0000 +      0i        0 +      0i
        0 + 1.0000i        0 + 0.4999i   1.0000 +      0i   1.0000 +      0i

U =

   1.0000 +      0i        0 +      0i   1.0000 +      0i        0 +      0i
        0 +      0i        0 + 0.6667i        0 +      0i        0 - 0.6667i
        0 +      0i        0 +      0i        0 - 2.0000i   0.6666 +      0i
        0 +      0i        0 +      0i        0 +      0i  -1.3332 +      0i

P =

Permutation Matrix

   1   0   0   0
   0   0   1   0
   0   1   0   0
   0   0   0   1

octave:34> P'*L*U - C
ans =

   0   0   0   0
   0   0   0   0
   0   0   0   0
   0   0   0   0
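
The check used above, P'*L*U - C, can also be written as a small standalone helper. The following is a pure-Python sketch with hypothetical helper names (`matmul`, `transpose`, `lu_residual`); it uses a made-up 2x2 complex example rather than the matrices from this thread.

```python
def matmul(a, b):
    # Plain triple-loop matrix product on lists of rows.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def lu_residual(p, l, u, c):
    """Largest |entry| of P'*L*U - C (0 for an exact factorization)."""
    plu = matmul(transpose(p), matmul(l, u))
    return max(abs(plu[i][j] - c[i][j])
               for i in range(len(c)) for j in range(len(c[0])))


c = [[1, 1j], [2, 0]]
p = [[0, 1], [1, 0]]       # swaps the two rows
l = [[1, 0], [0.5, 1]]
u = [[2, 0], [0, 1j]]
print(lu_residual(p, l, u, c))         # consistent factors

u_bad = [[2, 0], [0, 1.0]]             # deliberately wrong U(2,2)
print(lu_residual(p, l, u_bad, c))     # nonzero residual
```

A residual on the order of 1, as in the 0.3.22 output above, points at wrong factors rather than rounding noise.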

@dasergatskov

dasergatskov commented Mar 29, 2023

This is also a repost from octave https://octave.discourse.group/t/make-check-fails-with-openblas-0-3-22/4289/7
I noticed that a few test failures in Octave are due to what appears to be a small (~eps) mutual Re <--> Im spillage.

E.g. with residue():

octave:1> b = [1, 0, 1];
octave:2> a = [1, 0, 18, 0, 81];
octave:3> [r, p, k, e] = residue (b, a)
warning: matrix singular to machine precision
warning: called from
    residue at line 257 column 5

r =

  -5.6656e-18 - 9.2593e-02i
   2.2222e-01 + 5.9111e-17i
  -6.6672e-17 + 9.2593e-02i
   2.2222e-01 + 1.3878e-16i
# w/ openblas 0.3.21
# r =
#
#        0 - 0.0926i
#   0.2222 +      0i
#        0 + 0.0926i
#  0.2222 +      0i
#

p =

   0 + 3i
   0 + 3i
   0 - 3i
   0 - 3i

k = [](0x0)
e =

   1
   2
   1
   2
octave:4> [br, ar] = residue (r, p, k)
br =

 Columns 1 and 2:

  -7.2338e-17 + 1.1102e-16i   1.0000e+00 + 3.8091e-16i

 Columns 3 and 4:

  -1.7304e-16 + 6.6613e-16i   1.0000e+00 - 1.3382e-16i

# 
# br =
#
#   1   0   1

And the test fails because br is now 1x4 instead of 1x3.
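
One way to picture the size change is with a hypothetical trimming helper, which is not Octave's residue implementation. If leading coefficients are dropped only when they are exactly zero, rounding noise of order eps keeps an extra leading term. The `tol` parameter and the function name are assumptions for illustration.

```python
def trim_leading_zeros(coeffs, tol=0.0):
    """Drop leading coefficients with magnitude <= tol, keeping at least one."""
    i = 0
    while i < len(coeffs) - 1 and abs(coeffs[i]) <= tol:
        i += 1
    return coeffs[i:]


clean = [0, 1, 0, 1]
# Values copied from the br output above (0.3.22).
noisy = [complex(-7.2338e-17, 1.1102e-16), complex(1.0, 3.8091e-16),
         complex(-1.7304e-16, 6.6613e-16), complex(1.0, -1.3382e-16)]

print(len(trim_leading_zeros(clean)))              # exact zero is trimmed
print(len(trim_leading_zeros(noisy)))              # noise survives
print(len(trim_leading_zeros(noisy, tol=1e-15)))   # trimmed with a tolerance
```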

@angsch
Contributor

angsch commented Mar 29, 2023

@martin-frbg
https://github.com/xianyi/OpenBLAS/blob/23f2c4ca5b0ee7f9a5bd1ec25206f4db19453c36/lapack/getf2/getf2_k.c#L104
My understanding of the build system is that the code gets instantiated for float and double. Does the constant change, too? DBL_MIN has its FLT_MIN counterpart. The compiler may promote the data type from float to double.
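
To illustrate the concern, the sketch below uses only the Python standard library to show that a single-precision subnormal can still pass a comparison against the double-precision constant. The helper name `to_float32` is an assumption for this example; nothing here is taken from getf2_k.c.

```python
import struct
import sys

FLT_MIN = 2.0 ** -126          # smallest positive normal float32
DBL_MIN = sys.float_info.min   # smallest positive normal float64


def to_float32(x):
    # Round a Python float (a double) to the nearest float32 value.
    return struct.unpack("f", struct.pack("f", x))[0]


x = to_float32(2.0 ** -140)    # a subnormal in single precision
print(x < FLT_MIN)             # subnormal relative to float32
print(x >= DBL_MIN)            # yet far above the float64 threshold
```

So if the threshold were DBL_MIN in a single-precision build, single-precision subnormals would not be caught by it.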

@martin-frbg
Collaborator

martin-frbg commented Mar 29, 2023

@angsch yes, that may have been a bit sloppy but I think it is unrelated to the (main) problem. With the checks fixed to raise info on exact zero only, the Octave testsuite appears to pass (but I have not tried the separate testcases posted above yet)

@martin-frbg
Collaborator

#3980 fixes the scipy error and also the rcond, lu and spurious residue errors from Octave. I still see one scipy testsuite error:
TestFBLAS2Simple::test_spr_hpr AssertionError: Arrays are not almost equal to 6 decimals. Mismatched elements 9/9
but have not checked if it is/was already present with 0.3.21

@arungiridhar
Author

With the latest patch, using d708951, it works again in Octave:

$ LD_PRELOAD=./libopenblas_zenp-r0.3.22.dev.so octave -q

octave:1> version -blas
ans = OpenBLAS (config: OpenBLAS 0.3.22.dev NO_AFFINITY USE_OPENMP ZEN MAX_THREADS=32)

octave:2> C = [1, 0, 1, 0;  i, 1/3, -i, 1/3; 1, 2*i/3, 1, -2*i/3; i, -1/3, -i, -1/3]
C =
   1.0000 +      0i        0 +      0i   1.0000 +      0i        0 +      0i
        0 + 1.0000i   0.3333 +      0i        0 - 1.0000i   0.3333 +      0i
   1.0000 +      0i        0 + 0.6667i   1.0000 +      0i        0 - 0.6667i
        0 + 1.0000i  -0.3333 +      0i        0 - 1.0000i  -0.3333 +      0i

octave:3> [L, U, P] = lu(C), residual = P'*L*U - C
L =
   1.0000 +      0i        0 +      0i        0 +      0i        0 +      0i
   1.0000 +      0i   1.0000 +      0i        0 +      0i        0 +      0i
        0 + 1.0000i        0 - 0.5000i   1.0000 +      0i        0 +      0i
        0 + 1.0000i        0 + 0.5000i   1.0000 -      0i   1.0000 +      0i

U =
   1.0000 +      0i        0 +      0i   1.0000 +      0i        0 +      0i
        0 +      0i        0 + 0.6667i        0 +      0i        0 - 0.6667i
        0 +      0i        0 +      0i        0 - 2.0000i   0.6667 +      0i
        0 +      0i        0 +      0i        0 +      0i  -1.3333 +      0i

P =
Permutation Matrix
   1   0   0   0
   0   0   1   0
   0   1   0   0
   0   0   0   1

residual =
   0   0   0   0
   0   0   0   0
   0   0   0   0
   0   0   0   0

@martin-frbg
Collaborator

Thanks for confirming. I am currently bisecting to find the cause of the lone remaining scipy test error

ahesford added a commit to ahesford/void-packages that referenced this issue Apr 1, 2023
Version 0.3.22 apparently breaks Octave:

    OpenMathLib/OpenBLAS#3976

This reverts commit 0ef7a0e.
ahesford added a commit to void-linux/void-packages that referenced this issue Apr 1, 2023
Version 0.3.22 apparently breaks Octave:

    OpenMathLib/OpenBLAS#3976

This reverts commit 0ef7a0e.
atweiden added a commit to atweiden/voidpkgs that referenced this issue Apr 1, 2023
@martin-frbg
Collaborator

closing as fixed by #3980 and #3984 - also received confirmation that the sagemath testsuite passes again with these (as do both numpy and scipy testsuites)

@thesamesam

thesamesam commented Apr 1, 2023

Thanks. Please consider a new release to avoid distros relying on word of mouth to backport patches or avoid the last release.

@martin-frbg
Collaborator

Release preparations already in progress, only interrupted by dinner due to domestic misunderstanding...

@martin-frbg
Collaborator

0.3.23 released now, Windows binaries may have to wait until Monday noon (CET) unless I convince myself that the MXE (mingw) dlltool is fully equivalent to a native lib.exe (for creating the export file and import library from the generated .def)

@dasergatskov

dasergatskov commented Apr 2, 2023

I guess I am late to the party, but I am still getting higher errors in complex chol (in octave)
I compiled 0.3.21 and 0.3.23 with the same compiler and with default params (Just typed make):

$ LD_PRELOAD=~/src/OpenBLAS-0.3.21/libopenblas_zenp-r0.3.21.so NUM_THREADS=16 octave -qf
octave:1> c1
ans = 4.8179e-16
octave:2> version -blas
ans = FlexiBLAS Version 3.3.0
OpenBLAS (config: OpenBLAS 0.3.21 NO_AFFINITY ZEN MAX_THREADS=32)
octave:3> 

$ LD_PRELOAD=~/src/OpenBLAS-0.3.23/libopenblas_zenp-r0.3.23.so NUM_THREADS=16 octave -qf
octave:1> c1
ans = 2.2369e-15
error: assert (norm (R1 - R, Inf) < 1e1 * eps) failed
error: called from
    assert at line 107 column 11
    c1 at line 19 column 1
octave:2> version -blas
ans = FlexiBLAS Version 3.3.0
OpenBLAS (config: OpenBLAS 0.3.23 NO_AFFINITY ZEN MAX_THREADS=32)
octave:3> 

The original test used a tolerance of 10*eps; it now passes only at 11*eps, and the error has increased by a factor of ~5.
Test script is attached.
c1.m.gz

$ cat c1.m 
Ac = [  0.5585528 + 0.0000000i  -0.1662088 - 0.0315341i   0.0107873 + 0.0236411i  -0.0276775 - 0.0186073i ;
-0.1662088 + 0.0315341i   0.6760061 + 0.0000000i   0.0011452 - 0.0475528i   0.0145967 + 0.0247641i ;
0.0107873 - 0.0236411i   0.0011452 + 0.0475528i   0.6263149 - 0.0000000i  -0.1585837 - 0.0719763i ;
-0.0276775 + 0.0186073i   0.0145967 - 0.0247641i  -0.1585837 + 0.0719763i   0.6034234 - 0.0000000i ];

uc = [ 0.54267 + 0.91519i ;
0.99647 + 0.43141i ;
0.83760 + 0.68977i ;
0.39160 + 0.90378i ];
%!test
R = chol (Ac);
R1 = cholupdate (R, uc);
assert (norm (triu (R1)-R1, Inf), 0);
assert (norm (R1'*R1 - R'*R - uc*uc', Inf) < 1e1*eps);

[R1, R1info] = cholupdate (R1, uc, "-");
assert (norm (triu (R1)-R1, Inf), 0);
norm (R1 - R, Inf) 
assert (norm (R1 - R, Inf) < 1e1*eps);
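
The tolerance comparison in this comment can be reproduced with a few lines of arithmetic. The sketch below uses Python's double-precision machine epsilon (2^-52, the same value as Octave's eps) and the two error norms reported above; the variable names are illustrative.

```python
import sys

eps = sys.float_info.epsilon   # 2**-52
err_old = 4.8179e-16           # reported with 0.3.21
err_new = 2.2369e-15           # reported with 0.3.23

print(err_old < 10 * eps)      # 0.3.21 against the original tolerance
print(err_new < 10 * eps)      # 0.3.23 against the original tolerance
print(err_new < 11 * eps)      # 0.3.23 against the relaxed tolerance
print(err_new / err_old)       # growth factor between the two runs
```

With these numbers the 0.3.23 result misses the 10*eps bound by a narrow margin, and the growth factor comes out at roughly 4.6.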

@arungiridhar
Author

@dasergatskov Mine is a little bit better, doesn't fail the assertion:

0.3.21:

$ octave -q
octave:1> version -blas
ans = OpenBLAS (config: OpenBLAS 0.3.21 NO_AFFINITY USE_OPENMP ZEN MAX_THREADS=32)
octave:2> c1
ans = 4.1489e-16
octave:3> exit

0.3.23:

$ LD_PRELOAD="./libopenblas_zenp-r0.3.23.so" octave -q
octave:1> version -blas
ans = OpenBLAS (config: OpenBLAS 0.3.23 NO_AFFINITY USE_OPENMP ZEN MAX_THREADS=32)
octave:2> c1
ans = 7.6690e-16
octave:3> exit

The options passed to building 0.3.23 didn't make a difference for me; I get the same results with these two sets of parameters. This:

CFLAGS="-pipe"  CXXFLAGS="-pipe"  FFLAGS="-pipe"  make -j all FC=gfortran USE_OPENMP=1 USE_THREAD=1 USE_TLS=0 NO_LAPACK=0 BUILD_LAPACK_DEPRECATED=1 NO_STATIC=1 BUILD_RELAPACK=1 CPP_THREAD_SAFETY_TEST=1 CONSISTENT_FPCSR=1

as well as this:

CFLAGS="-pipe"  CXXFLAGS="-pipe"  FFLAGS="-pipe"  make -j all

Compiler is gcc 12.2.1.

@martin-frbg
Collaborator

martin-frbg commented Apr 2, 2023

Getting ans = 5.4624e-16 from Octave 8.1.0 for a ZEN binary built with gcc 10.2.1 (the default in Debian 11), same as with 0.3.21, on a six-core Ryzen5-4600H running Linux - cpu model (Zen3 vs Zen4), compiler version and operating system will probably play a role. EDIT: no change with gcc 12.2 or a mid-February snapshot of gcc 13
