Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DGESDD with nan exits with: On entry to DLASCL parameter number 4 had an illegal value #469

Closed
jschueller opened this issue Jan 8, 2021 · 7 comments · Fixed by #471 or #567
Closed
Milestone

Comments

@jschueller
Copy link
Contributor

jschueller commented Jan 8, 2021

in version 3.9.0 and master, dgesdd with a nan value calls exit(0) instead of returning an error code on a 3x3 diagonal matrix plus a nan value and with jobz=N
(xref http://www.netlib.org/lapack/explore-html/d1/d7e/group__double_g_esing_gad8e0f1c83a78d3d4858eaaa88a1c5ab1.html)
it seems to happen with gcc>=9 (confirmed with 9.3 and 10.2, but not 7.5)

M=3
N=3
A = {1,NAN,0, 0,1,0, 0,0,1}

full C code to reproduce:

#include <stdlib.h>
#include <stdio.h>
#include <math.h>
/* DGESDD prototype */
extern void dgesdd( char* jobz, int* m, int* n, double* a,
                int* lda, double* s, double* u, int* ldu, double* vt, int* ldvt,
                double* work, int* lwork, int* iwork, int* info );
/* Auxiliary routines prototypes */
extern void print_matrix( char* desc, int m, int n, double* a, int lda );

/* Parameters */
#define M 3
#define N 3
#define LDA M
#define LDU M
#define LDVT N

/* Main program */
int main() {
        /* Locals */
        int m = M, n = N, lda = LDA, ldu = LDU, ldvt = LDVT, info, lwork;
        double wkopt;
        double* work;
        /* Local arrays */
   /* iwork dimension should be at least 8*min(m,n) */
   int iwork[8*N];
    double s[N], u[LDU*M], vt[LDVT*N];
    double a[LDA*N] = {1,NAN,0, 0,1,0, 0,0,1};
    /* Query and allocate the optimal workspace */
    lwork = -1;
    char jobz= 'N';// dont ask for vectors, only singular values
    dgesdd_( &jobz, &m, &n, a, &lda, s, u, &ldu, vt, &ldvt, &wkopt,
      &lwork, iwork, &info );
    lwork = (int)wkopt;
    printf( " lwork=%d\n",  lwork);
    work = (double*)malloc( lwork*sizeof(double) );
    /* Compute SVD */
    dgesdd_( &jobz, &m, &n, a, &lda, s, u, &ldu, vt, &ldvt, work,
      &lwork, iwork, &info );
    printf( " info=%d\n",  info);
    /* Check for convergence */
    if( info > 0 ) {
      printf( "The algorithm computing SVD failed to converge.\n" );
      exit( 1 );
    }
    /* Print singular values */
    print_matrix( "Singular values", 1, n, s, 1 );

    /* Free workspace */
    free( (void*)work );
    return 0;
} /* End of DGESDD Example */

/* Auxiliary routine: printing a matrix */
void print_matrix( char* desc, int m, int n, double* a, int lda ) {
        int i, j;
        printf( "\n %s\n", desc );
        for( i = 0; i < m; i++ ) {
                for( j = 0; j < n; j++ ) printf( " %6.2f", a[i+j*lda] );
                printf( "\n" );
        }
}

possible xrefs:
statsmodels/statsmodels#3596
statsmodels/statsmodels#5396
https://pystatsmodels.narkive.com/2IKxOFVo/valueerror-on-entry-to-dlascl-parameter-number-4-had-an-illegal-value
https://forge.scilab.org/index.php/p/arpack-ng/issues/641/
https://www.ljll.math.upmc.fr/pipermail/freefempp/2015-November/004288.html

@martin-frbg
Copy link
Collaborator

Which platform and compiler ? I doubt that the code in question changed between recent versions (or in almost a decade), perhaps you are at the mercy of undefined or implementation-defined behaviour regarding the comparisons of the (NaN) matrix norm that precede the DLASCL call at https://github.com/Reference-LAPACK/lapack/blob/master/SRC/dgesdd.f#L603.
(it might make sense to add an ISNAN() check there to ensure consistent behaviour though - I see the NaN get propagated to the "Singular values" on Linux with gcc)

@jschueller
Copy link
Contributor Author

jschueller commented Jan 8, 2021

it happens on ubuntu 20.04 (gcc 9.3.0)

here is a possible patch, as a nan value is always different than itself :

diff --git a/SRC/dgesdd.f b/SRC/dgesdd.f
index 0218900d..0bc33e8b 100644
--- a/SRC/dgesdd.f
+++ b/SRC/dgesdd.f
@@ -599,6 +599,10 @@
 *     Scale A if max element outside range [SMLNUM,BIGNUM]
 *
       ANRM = DLANGE( 'M', M, N, A, LDA, DUM )
+      IF( ANRM.NE.ANRM ) THEN
+          INFO = 1
+          RETURN
+      END IF
       ISCL = 0
       IF( ANRM.GT.ZERO .AND. ANRM.LT.SMLNUM ) THEN
          ISCL = 1

jschueller added a commit to jschueller/lapack that referenced this issue Jan 8, 2021
jschueller added a commit to jschueller/lapack that referenced this issue Jan 8, 2021
@martin-frbg
Copy link
Collaborator

FWIW there is an auxiliary DISNAN() function in LAPACK (which also uses the self-inequality test under the hood). (And offhand I am not sure if doing an early return, reporting a non-convergence error here is the right thing to do, or if we should just enclose the subsequent IF block that wants to do the scaling in an "IF NOT DISNAN(ANRM)"..."ENDIF")

@jschueller
Copy link
Contributor Author

jschueller commented Jan 8, 2021

yes, i should use the isnan fonctions
I think its best to stop here than silently dropping the scaling

jschueller added a commit to jschueller/lapack that referenced this issue Jan 8, 2021
jschueller added a commit to jschueller/lapack that referenced this issue Jan 8, 2021
jschueller added a commit to jschueller/lapack that referenced this issue Jan 8, 2021
jschueller added a commit to jschueller/lapack that referenced this issue Jan 8, 2021
@julielangou julielangou added this to the LAPACK 3.9.1 milestone Mar 25, 2021
@martin-frbg
Copy link
Collaborator

Reopening for discussion as this change is causing problems in NumPy numpy/numpy#18914 , suggesting that there may be other high-profile codes that rely on the "time honored" behaviour. The documentations for xGESDD do not specifically mention NaN as an "illegal value" that would trigger an early return with INFO<0

@h-vetinari
Copy link
Contributor

Reopening for discussion

@martin-frbg, this issue wasn't actually reopened. Maybe it's better to open a new issue for this regression?

@martin-frbg martin-frbg reopened this May 11, 2021
@h-vetinari
Copy link
Contributor

This is still milestoned for 3.9.1 - I think it'd be cleaner to open a new one, and would be happy to do it if that's welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants