Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rcond is 0 for NaN matrix in lapack-3.11 #763

Closed
1 task done
dasergatskov opened this issue Nov 21, 2022 · 9 comments · Fixed by #765
Closed
1 task done

rcond is 0 for NaN matrix in lapack-3.11 #763

dasergatskov opened this issue Nov 21, 2022 · 9 comments · Fixed by #765

Comments

@dasergatskov
Copy link

dasergatskov commented Nov 21, 2022

Description
rcond for matrix of NaN values returns 0 while it should (and used to in previous version) return NaN
Checklist

  • I've included a minimal example to reproduce the issue

This came up as a bug in octave: https://savannah.gnu.org/bugs/?63384
With the fortran test file (attached):

$ gfortran bug-63384.f -llapack -lblas

$ ./a.out 
 x = 
                       NaN                       NaN
                       NaN                       NaN
 anorm =                        NaN
 dgetrf info:            0
 ipiv = 
           1           2
 dgecon info:            0
 rcond =    0.0000000000000000     
 dgetri info:            0
 xinv = 
                       NaN                       NaN
                       NaN                       NaN
$ LD_PRELOAD=/usr/lib64/libopenblaso.so.0 ./a.out 
 x = 
                       NaN                       NaN
                       NaN                       NaN
 anorm =                        NaN
 dgetrf info:            0
 ipiv = 
           1           2
 dgecon info:            0
 rcond =                        NaN
 dgetri info:            0
 xinv = 
                       NaN                       NaN
                       NaN                       NaN

Openblas is 0.3.21 and is linked with lapack 3.10.1

$ cat bug-63384.f 
      program main
      implicit none
      double precision zero, nan, x(2,2)
      integer i, j, ipiv(2), iwork(2), info
      double precision work(8), anorm, rcond
      rcond = -1.234567
      zero = 0.0
      nan = zero / zero
      anorm = nan
      do j = 1, 2
        ipiv(j) = -j
        do i = 1, 2
          x(i,j) = nan
        enddo
      enddo
      print *, 'x = '
      do i = 1, 2
        print *, x(i,1), x(i,2)
      enddo
      print *, 'anorm = ', anorm
      call dgetrf (2, 2, x, 2, ipiv, info);
      print *, 'dgetrf info: ', info
      print *, 'ipiv = '
      print *, ipiv(1), ipiv(2)
      call dgecon ('1', 2, x, 2, anorm, rcond, work, iwork, info)
      print *, 'dgecon info: ', info
      print *, 'rcond = ', rcond
      call dgetri (2, x, 2, ipiv, work, 8, info)
      print *, 'dgetri info: ', info
      print *, 'xinv = '
      do i = 1, 2
        print *, x(i,1), x(i,2)
      enddo
      end

Dmitri.
p.s. tagging @mmuetzel
bug-63384.f.gz

@martin-frbg
Copy link
Collaborator

Note that neither dgecon.f nor dgetrf.f were changed in recent years, also OpenBLAS uses its own reimplementation of GETRF.

@langou
Copy link
Contributor

langou commented Nov 21, 2022

Thanks @martin-frbg

Related past issues, comments:
numpy/numpy#18914, #471 and #567.

@dasergatskov
Copy link
Author

dasergatskov commented Nov 21, 2022

I downgraded back to lapack-3.10.1:

$ gfortran bug-63384.f -llapack -lblas
$ ./a.out 
 x = 
                       NaN                       NaN
                       NaN                       NaN
 anorm =                        NaN
 dgetrf info:            0
 ipiv = 
           1           2
 dgecon info:            0
 rcond =                        NaN
 dgetri info:            0
 xinv = 
                       NaN                       NaN
                       NaN                       NaN
$ ldd a.out 
	linux-vdso.so.1 (0x00007ffc99bcc000)
	liblapack.so.3 => /lib64/liblapack.so.3 (0x00007f4813e00000)
	libblas.so.3 => /lib64/libblas.so.3 (0x00007f481455b000)
	libgfortran.so.5 => /lib64/libgfortran.so.5 (0x00007f4813a00000)
	libm.so.6 => /lib64/libm.so.6 (0x00007f4813d24000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f4814540000)
	libquadmath.so.0 => /lib64/libquadmath.so.0 (0x00007f4813cda000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f4813600000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f481460d000)

$ rpm -qf /usr/lib64/liblapack.so
lapack-devel-3.10.1-1.fc35.x86_64

@weslleyspereira
Copy link
Collaborator

I suggest we fix the NaN issue with the patch:

diff --git a/SRC/dgecon.f b/SRC/dgecon.f
index aa10dee9a..103457b3e 100644
--- a/SRC/dgecon.f
+++ b/SRC/dgecon.f
@@ -106,6 +106,7 @@
 *>          INFO is INTEGER
 *>          = 0:  successful exit
 *>          < 0:  if INFO = -i, the i-th argument had an illegal value
+*>          =-5:  if ANORM is NAN.
 *> \endverbatim
 *
 *  Authors:
@@ -152,10 +153,10 @@
       INTEGER            ISAVE( 3 )
 *     ..
 *     .. External Functions ..
-      LOGICAL            LSAME
+      LOGICAL            LSAME, DISNAN
       INTEGER            IDAMAX
       DOUBLE PRECISION   DLAMCH
-      EXTERNAL           LSAME, IDAMAX, DLAMCH
+      EXTERNAL           LSAME, IDAMAX, DLAMCH, DISNAN
 *     ..
 *     .. External Subroutines ..
       EXTERNAL           DLACN2, DLATRS, DRSCL, XERBLA
@@ -175,7 +176,7 @@
          INFO = -2
       ELSE IF( LDA.LT.MAX( 1, N ) ) THEN
          INFO = -4
-      ELSE IF( ANORM.LT.ZERO ) THEN
+      ELSE IF( ANORM.LT.ZERO .OR. DISNAN( ANORM ) ) THEN
          INFO = -5
       END IF
       IF( INFO.NE.0 ) THEN

Same fix applied in #471 and #567.

@langou
Copy link
Contributor

langou commented Nov 21, 2022

Thanks for the patch, @weslleyspereira. The patch sounds great to me.

This patch will fix this bug and this is the way to go. We do want these quick return when we have a NaN in input and when we compute ANORM early in the routine. So this is perfect.

That being said, this patch masks the bug. We might want to know how the bug came in 3.11. It would be good to do some forensic work and try to understand why the bug is not in 3.10 and is in 3.11.

@langou
Copy link
Contributor

langou commented Nov 23, 2022

Hi @angsch, do you think this issue and #712 can be related? We did not check #712 for NaN propagation. Julien.

@dasergatskov
Copy link
Author

With the patch (allegedly) in Fedora's lapack-3.11.0-4 rpm I see:

$ ./a.out 
 x = 
                       NaN                       NaN
                       NaN                       NaN
 anorm =                        NaN
 dgetrf info:            0
 ipiv = 
           1           2
 ** On entry to DGECON parameter number  5 had an illegal value
Note: The following floating-point exceptions are signalling: IEEE_INVALID_FLAG

Is this a desired result? The way OpenBlas propagates NaN looks better to me.

@langou
Copy link
Contributor

langou commented Jun 5, 2023

Hi @dasergatskov,

I think you are saying that, for a 2x2 matrix with 4 NaNs, you would prefer the output of DGECON to be NaN as opposed to DGECON throwing the error message ** On entry to DGECON parameter number 5 had an illegal value. I think returning an error message is more consistent with where we are trying to go. That is we are trying to go where we return error messages when we have NaN in input parameter. (Whenever we can easily detect them.) We can speak more about it.

What bothers me the most about this is example is that DGETRF returns INFO = 0. This is also inline with only detect NaN when it is easy to do so. Maybe we can at least add a DISNAN in the code DGETF2 (and DGETRF2) at:

IF( A( JP, J ).NE.ZERO ) THEN

IF( A( I, 1 ).NE.ZERO ) THEN

Julien.

@weslleyspereira
Copy link
Collaborator

Hi @dasergatskov. Please, take a look at #926. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants