-
Notifications
You must be signed in to change notification settings - Fork 464
Possible bug in dgesdd
routine
#672
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
It is worth mentioning that the data that crashes |
…LaPack error again ( ??? the same as Reference-LAPACK/lapack#672 ??? ) without it; unfortunately it was without a seed and I cannot reproduce it. This time, with seed(0) it completed
I found and extracted another case for this possible bug while working with Insurance data set. It is a full rank 5344x255 matrix. To reproduce it, please run:
|
I found and extracted another case for this possible bug while working with Insurance data set some more. It is a full rank 5344x258 matrix this time. This is the last one I report here: I don't want to create a mess out of this thread, but I hope that 3 different (but no doubt: similar) cases may add something meaningful to the analysis of the underlying problem. The next ones would undoubtedly add very little value, if any. The way they were all stumbled upon concerns a sample of a random 10% portion of the Insurance data set (hence equal number of rows in all 3 examples) and further analysing it (including attempting the SVD). So it seems that quite often the SVD would fail (out of 200 attempted random runs, I usually get this error in the first 20 runs, no matter the seed value at the beginning). You can reproduce this final example with
I have verified, that (in general) removing random rows or removing columns of that special matrix makes it pass the SVD, but one can also remove rows of that matrix in a certain order, obtaining a sequence of matrices failing SVD computation:
|
I couldn't reproduce the error on my Linux machine with LAPACK 3.9.0 through OpenBLAS. Does your R also use OpenBLAS? My system:> La_version()
[1] "3.9.0"
> print(sessionInfo())
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3 I tried:
The singular values:
> load_github_data("https://github.com/SzymonNowakowski/DMRnet/blob/testing_branch/data/crashes_svd.RData")
> dim(crashes_svd)
[1] 5344 263
> s <- svd(crashes_svd)
> s$d
[1] 80.237581 78.035925 76.401927 75.156859 74.909977 74.682839 74.405535
[8] 74.182772 73.978817 73.748076 73.715598 73.658436 73.596700 73.547603
[15] 73.484632 73.414763 73.399336 73.384719 73.375879 73.354423 73.330749
[22] 73.314610 73.292720 73.285536 73.278840 73.274262 73.266730 73.257655
[29] 73.250795 73.244416 73.239847 73.235621 73.229620 73.223629 73.219221
[36] 73.215167 73.209824 73.205479 73.201319 73.191746 73.191746 73.191746
[43] 73.191746 73.191746 73.187600 73.184882 73.184882 73.184882 73.179223
[50] 73.174548 73.171161 73.168689 73.164303 73.164303 73.164303 73.164303
[57] 73.160428 73.157447 73.157447 73.157447 73.154329 73.150592 73.150592
[64] 73.150592 73.150592 73.150592 73.150592 73.150592 73.145491 73.143740
[71] 73.143740 73.140924 73.136890 73.136890 73.136890 73.136890 73.136890
[78] 73.136890 73.136890 73.136890 73.136890 73.136890 73.132298 73.130042
[85] 73.130042 73.130042 73.130042 73.130042 73.130042 73.126859 73.123195
[92] 73.123195 73.123195 73.123195 73.123195 73.123195 73.123195 73.123195
[99] 73.123195 73.123195 73.123195 73.123195 73.123195 73.123195 73.123195
[106] 73.120294 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[113] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[120] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[127] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[134] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[141] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[148] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[155] 73.116351 73.116351 73.111979 73.109508 73.109508 73.109508 73.109508
[162] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[169] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[176] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[183] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[190] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[197] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[204] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[211] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[218] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[225] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[232] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[239] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[246] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[253] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[260] 73.109508 73.109508 73.109508 2.081634
> load_github_data("https://github.com/SzymonNowakowski/DMRnet/blob/testing_branch/data/crashes_svd_1.RData")
> dim(crashes_svd)
[1] 5344 255
> s <- svd(crashes_svd)
> s$d
[1] 80.338635 78.125716 76.424973 75.044972 74.936751 74.684774 74.394142
[8] 74.146391 73.951330 73.844023 73.755771 73.678289 73.598102 73.571031
[15] 73.532607 73.463722 73.374985 73.361678 73.338794 73.326327 73.313453
[22] 73.288042 73.285106 73.281151 73.273964 73.265097 73.260490 73.253863
[29] 73.246726 73.242377 73.234456 73.224807 73.219221 73.219221 73.219221
[36] 73.219221 73.208610 73.200970 73.191746 73.189781 73.184882 73.184882
[43] 73.184882 73.181404 73.178021 73.178021 73.175230 73.171161 73.171161
[50] 73.171161 73.171161 73.171161 73.165983 73.164303 73.160617 73.157447
[57] 73.157447 73.154335 73.150592 73.150592 73.150592 73.148074 73.143740
[64] 73.143740 73.143740 73.143740 73.143740 73.143740 73.143740 73.143740
[71] 73.139859 73.136890 73.136890 73.136890 73.136890 73.136890 73.136890
[78] 73.134036 73.130042 73.130042 73.130042 73.130042 73.130042 73.130042
[85] 73.130042 73.130042 73.130042 73.130042 73.130042 73.130042 73.130042
[92] 73.130042 73.126754 73.123195 73.123195 73.123195 73.123195 73.123195
[99] 73.123195 73.123195 73.123195 73.123195 73.123195 73.123195 73.123195
[106] 73.123195 73.123195 73.123195 73.123195 73.123195 73.123195 73.123195
[113] 73.123195 73.123195 73.123195 73.123195 73.123195 73.123195 73.123195
[120] 73.119217 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[127] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[134] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[141] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[148] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[155] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[162] 73.116351 73.116351 73.116351 73.111642 73.109508 73.109508 73.109508
[169] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[176] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[183] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[190] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[197] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[204] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[211] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[218] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[225] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[232] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[239] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[246] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[253] 73.109508 73.109508 1.041707
> load_github_data("https://github.com/SzymonNowakowski/DMRnet/blob/testing_branch/data/crashes_svd_2.RData")
> dim(crashes_svd)
[1] 5344 258
> s <- svd(crashes_svd)
> s$d
[1] 80.027683 78.131810 76.515927 75.051548 75.004976 74.826565 74.446285
[8] 74.117935 73.882036 73.822317 73.771688 73.720786 73.636142 73.559170
[15] 73.533816 73.479562 73.420415 73.345030 73.314114 73.308726 73.305097
[22] 73.298845 73.292099 73.285671 73.281151 73.276671 73.270169 73.261310
[29] 73.253607 73.249163 73.240953 73.232969 73.229093 73.223585 73.219221
[36] 73.215430 73.210622 73.205479 73.205479 73.205479 73.205479 73.195552
[43] 73.191746 73.187204 73.182422 73.178021 73.178021 73.174248 73.171161
[50] 73.171161 73.166166 73.161889 73.157447 73.155778 73.150592 73.150592
[57] 73.150592 73.150592 73.150592 73.147917 73.143740 73.143740 73.143740
[64] 73.143740 73.143740 73.143740 73.143740 73.143740 73.143740 73.139893
[71] 73.136890 73.136890 73.136890 73.136890 73.136890 73.136890 73.136890
[78] 73.136890 73.133319 73.130042 73.130042 73.130042 73.130042 73.130042
[85] 73.130042 73.130042 73.130042 73.130042 73.130042 73.130042 73.130042
[92] 73.126359 73.123195 73.123195 73.123195 73.123195 73.123195 73.123195
[99] 73.123195 73.123195 73.123195 73.123195 73.123195 73.123195 73.123195
[106] 73.123195 73.123195 73.123195 73.123195 73.123195 73.119713 73.116351
[113] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[120] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[127] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[134] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[141] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[148] 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351 73.116351
[155] 73.116351 73.111963 73.109508 73.109508 73.109508 73.109508 73.109508
[162] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[169] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[176] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[183] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[190] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[197] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[204] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[211] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[218] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[225] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[232] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[239] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[246] 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508 73.109508
[253] 73.109508 73.109508 73.109508 73.109508 73.109508 1.040376 |
No, on the Windows machine I don't have OpenBLAS. The Ubuntu machine is currently down, I will check & let you know when it gets up.
|
But I had the same error on another machine with an older R version (3.6.3) in Ubuntu with OpenBLAS:
|
Ok. I could reproduce the issue using https://github.com/SzymonNowakowski/DMRnet/blob/testing_branch/data/crashes_svd.RData in my machine: > La_version()
[1] "3.9.0"
> print(sessionInfo())
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0 So, no issue using OpenBLAS; and |
Thank you. I installed OpenBlas on the Ubuntu machine and I confirm that this workaround works for all 3 isolated cases. For me it is sufficient as I can push my research ahead. Great thanks! |
Unfortunately, OpenBLAS creates problems of its own it seems. I have isolated another matrix which passes SVD on a machine without OpenBLAS, but fails SVD with OpenBLAS. Please check this code:
My system
|
Thanks for collecting this additional information! We are working to try to find the problem. |
No crash with current OpenBLAS (0.3.20) - Ubuntu 20.04 LTS apparently ships with outdated 0.3.8 (that probably hits some unrelated "historic" bug). DNRM2 may be a good candidate for different corner case behaviour in OpenBLAS vs netlib. |
All this is very interesting. Some thoughts: (1) we should keep these matrices in our torture test suite, it seems like they trigger some difficulties; (2) maybe getting the bidiagonal matrices is better than the full matrices; (3) one day, some days, in addition to always testing the current version of LAPACK, we should test older version, count the number of bugs we fixed, and the speedup we got, I think, we'll feel good about our work for the last x years; (4) let us keep on digging to try to identify the issue |
I managed to isolate the matrix from https://github.com/SzymonNowakowski/DMRnet/blob/testing_branch/data/crashes_svd.RData into a text file, and then run an example in using C. Let me go step-by-step so we can use it to gather more information about this issue.
> library(Rfssa)
> load_github_data("https://github.com/SzymonNowakowski/DMRnet/blob/testing_branch/data/crashes_svd.RData")
> write.table(format(t(crashes_svd), scientific=TRUE, digits=22), file="crashes_svd.txt") The following command in R shows the undesirable output ``: > print(sessionInfo())
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
(...)
> crashes_svd <- read.table(file="crashes_svd.txt")
> svd(t(crashes_svd))
Error in La.svd(x, nu, nv) : error code 1 from Lapack routine 'dgesdd'
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#ifndef lapack_int
#define lapack_int int
#endif
lapack_int min(lapack_int a, lapack_int b)
{
return (a > b) ? b : a;
}
void dgesdd_(
char const* jobz,
lapack_int const* m, lapack_int const* n,
double* A, lapack_int const* lda,
double* S,
double* U, lapack_int const* ldu,
double* VT, lapack_int const* ldvt,
double* work, lapack_int const* lwork,
lapack_int* iwork,
lapack_int* info );
int main(int argc, char *argv[])
{
// sizes
lapack_int n, m;
scanf( "%d", &m );
scanf( "%d", &n );
// allocate
double *A = (double*) malloc( (m*n) * sizeof(double) );
double *S = (double*) malloc( min(m,n) * sizeof(double) );
double *U = (double*) malloc( (m*m) * sizeof(double) );
double *V = (double*) malloc( (n*n) * sizeof(double) );
// read col-major data
for (int i = 0; i < m*n; i++)
{
long double aux;
scanf( "%Lf", &aux );
A[i] = aux;
}
// workspaces
lapack_int *iwork = (lapack_int*) malloc( (8*min(m,n)) * sizeof(lapack_int) );
lapack_int lwork = -1;
// work query
lapack_int info = 0;
double work_size = 0;
dgesdd_(
"A",
&m,&n,A,&m,
S,
U,&m,
V,&n,
&work_size,&lwork,iwork,
&info
);
// allocate workspace
double *work = (double*) malloc( work_size * sizeof(double) );
lwork = work_size;
// run
dgesdd_(
"A",
&m,&n,A,&m,
S,
U,&m,
V,&n,
work,&lwork,iwork,
&info
);
// print info
printf( "%d\n", info );
// print singular values
printf( "\nSingular values:\n" );
for (int i = 0; i < min(m,n); i++)
{
printf( "%lf ", S[i] );
}
return 0;
}
$ gcc test_dgesdd.c -o test_dgesdd /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0 /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
$ ./test_dgesdd < crashes_svd2.txt
0
Singular values:
138.921459 135.819739 135.520150 134.976011 127.236185 126.719818 124.754208 123.587351 120.010983 116.994755 116.684704 114.853547 113.646145 112.657777 111.713019 111.507807 111.179755 111.150557 110.816460 110.293399 109.425710 109.330495 109.061578 108.966178 108.651045 107.629418 106.823530 106.648234 106.364113 106.315408 106.131774 105.611349 105.252831 105.161095 105.128230 104.798008 104.504731 104.276730 101.742746 101.102745 98.589703 97.881927 97.539067 97.362174 97.326156 97.171976 96.958740 96.323631 95.481025 95.164000 95.062074 94.834365 94.535047 94.206671 93.801462 93.242641 92.775750 92.380965 91.346124 90.589142 88.894442 88.663396 87.867799 87.500949 87.314072 87.046753 86.995633 86.766978 86.407967 86.138031 86.041301 85.889684 85.769190 85.707008 85.574748 85.186597 84.857056 84.773774 84.644520 84.027815 83.792789 83.721602 83.210783 83.134147 82.830668 82.645403 82.531031 82.074604 81.602327 80.878397 80.327302 79.982995 79.624131 79.472658 79.282894 78.884349 78.735949 78.506006 78.253994 78.044852 77.624866 77.304068 77.067689 77.004966 76.340971 76.181077 75.343331 74.474760 73.278096 72.919688 72.328948 72.000425 71.693862 71.459548 70.696953 70.590415 69.861243 68.660174 68.221902 68.131634 67.432133 67.371519 66.960725 66.421092 66.079473 65.643671 65.518510 65.461814 65.096926 64.643686 64.416828 64.087087 63.977570 63.404865 62.540327 62.146893 61.926394 61.695142 61.137044 60.897390 60.621789 60.281927 60.144524 59.992570 59.833020 59.278489 59.071738 59.015457 58.538994 58.401949 58.217545 58.081582 57.900982 57.808147 57.756721 56.721582 56.373733 56.089498 55.675927 55.536675 55.171722 55.131030 54.755947 54.663055 54.480378 54.158751 54.029876 53.859765 53.723131 53.544550 53.217714 52.860974 52.193011 51.649154 51.272390 51.188479 50.745404 50.594944 50.268060 50.214799 50.122710 49.857880 49.547364 49.275911 48.780430 48.446466 47.564610 47.367113 46.953331 46.691890 46.293246 46.279272 45.995560 45.647042 45.460969 45.201356 45.102905 45.047106 44.929637 44.629221 44.298290 44.044366 44.020477 43.704117 43.401700 42.621339 42.579407 42.416868 42.018328 41.691898 41.425460 40.498291 40.395045 40.088446 39.280906 38.805429 38.572036 37.916318 37.633313 37.285791 36.482753 36.067144 35.765946 35.458111 35.046091 34.473421 34.380670 34.248538 34.057086 33.664821 33.199721 32.948524 32.346531 32.150851 31.206320 30.873987 30.561778 30.127724 29.786624 29.356329 28.791116 28.582062 28.182265 28.084910 27.701852 27.397549 27.187937 26.449320 26.041613 25.685849 25.395753 25.214940 24.436499 23.799655 22.067128 20.960366 19.984624 19.936679 19.558175 18.505146 17.745319 16.961943 11.628948
|
Hi @SzymonNowakowski. Could you try to use my input (https://github.com/Reference-LAPACK/lapack/files/9057573/crashes_svd2.zip) in your machines? Thanks |
Hi Weslley,
The two machines are the ones with very different R versions:
and
But obviously the If you want me to check something more, I gladly will, but in the morning (it is midnight in Poland). Thank you for having a look into this issue. |
The matrix I provided before, crashes_svd2.zip, had an error. I switched $ gcc test_dgesdd.c -o test_dgesdd /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0 /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
weslleyp@weslleyp-XPS-15-9510:~/storage/ballistic/lapack-issue-672$ ./test_dgesdd < crashes_svd3.txt
1
Singular values:
0.956577 0.956577 0.956577 0.956577 0.956577 0.956577 0.956577 0.956577 0.956577 0.956577 0.956577 0.956577 0.956577 0.956666 0.956666 0.956666 0.956666 0.956666 0.956666 0.956666 0.956666 0.956666 0.956666 0.956666 0.956666 0.956756 0.956756 0.956756 0.956756 0.956756 0.956756 0.956756 0.956756 0.956756 0.956756 0.956845 0.956845 0.956845 0.956845 0.956845 0.956935 0.956935 0.956935 0.956935 0.956935 0.956935 0.956935 0.956935 0.956935 0.957025 0.957025 0.957114 0.957114 0.957114 0.957114 0.957114 0.957114 0.957204 0.957204 0.957204 0.957294 0.957294 0.957294 0.957294 0.957351 0.957383 0.957563 0.957563 0.957563 0.957653 0.957653 0.957653 0.957653 0.957833 0.958012 0.958282 0.958732 1.049842 1.021035 0.999655 0.983365 0.980134 0.977163 0.973534 0.970620 0.967951 0.964932 0.964507 0.963759 0.962951 0.962309 0.961485 0.960571 0.960369 0.960178 0.960062 0.959781 0.959472 0.959260 0.958974 0.958880 0.958792 0.958634 0.958515 0.958425 0.958342 0.958227 0.958148 0.958070 0.957959 0.957889 0.957778 0.957653 0.957599 0.957489 0.957428 0.957243 0.957163 0.957114 0.957048 0.956988 0.956935 0.956875 0.956845 0.956804 0.956756 0.956756 0.956718 0.956609 0.956577 0.027236 0.956666 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 1.000000 1.000000 1.000000 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999906 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.999813 0.956577
|
More tests:
|
Thanks for those comparisons. You are working at the levels too deep for me to follow. Thus I just want to add one more piece of information from my side, since I am not sure if you have it already covered in those 4 txt files: I had the same error in LAPACK 3.10.0 but in Windows:
|
More information: I still couldn't reproduce this problem on Windows. On my Linux machine, using your data https://github.com/SzymonNowakowski/DMRnet/blob/testing_branch/data/crashes_svd.RData or, equivalently, https://github.com/Reference-LAPACK/lapack/files/9059133/crashes_svd3.zip:
|
Thank you so much for your work with this. So basically you are saying that (apart from my oldish Windows 7 machine) on LAPACK 3.10.0 it is a non-issue. After I am back from my vacations in Sep I will upgrade one of the Linux machines and rerun full-scale tests but with LAPACK 3.10.0, we'll see if no new problematic matrices pop out. It was the case with OpenBLAS workaround earlier. |
I just couldn't reproduce your problem in my machine, and had no time to try it on Windows.
That would be very helpful. Thanks! It would also be helpful if you could isolate in a file one matrix where your problem appear. Or if you could test this matrix out of R, using dgesdd directly from LAPACK 3.10.0 or more recent versions. |
Dear Weslley, Previously identified matrices are OK in LAPACK 3.10.3I upgraded a Linux Machine with Ubuntu 20.04 LTS to the newest LAPACK 3.10.3. I confirm that the problematic matrices pass successfully throughout an SVD() call. This I have confirmed for all previously considered matrices: New problematic matrices - problem persistsHowever, as agreed, I've also rerun the full scale tests. Very quickly (within roughly a promile of all runs) a new problematic matrix was identified. I uploaded it to my github data folder as
Summary of my findings so farA family of numerically unstable matrices was identified as far as SVD is concerned. Each individual version of the algorithm (Lapack+BLAS 3.9.0, OpenBLAS, Lapack+BLAS 3.10.3) handles those matrices a bit differently, it seems, so it is able to handle the matrices that are problematic to other versions. However, a fundamental reason seems to have remained so far unsolved, and I have provided matrices causing problems for each of those 3 versions. Unfortunately it is unclear for me how to isolate the matrix that gets passed directly into |
The core of my work lately is with large matrices being close to ill-defined. Thus I have been working with another dataset lately (airbnb dataset) and came upon a very similar behaviour in terms of SVD. This may or may not be related to the problem discussed in this thread. The similarity of behaviour convinced me to report it here. This dataset is completely unrelated, so the below matrices would be completely independent from the previous matrices, sheding possibly some new light upon the problem considered in this issue. I have isolated and uploaded two matrices:
Reproducing the issueLAPACK 3.9.0
LAPACK 3.10.3
|
Thanks for the follow up! I will repeat my tests on those matrices and see if I can get any sense of what is happening. |
I came across a similar behaviour with a different matrix. In R, I am doing the following: base::La.svd(my.mat, nu = 0) and get: Error in La.svd(x = my.mat, nu = 0) : This occurs on my machine, with R sessionInfo(): R version 4.3.0 (2023-04-21) Matrix products: default Random number generation: locale: time zone: Europe/Zurich attached base packages: loaded via a namespace (and not attached): However, the same code works on a different machine, with R sessionInfo(): R version 4.0.5 (2021-03-31) Matrix products: default locale: attached base packages: loaded via a namespace (and not attached): and Lapack Version 3.9.0 I attached my.mat as a .csv file for reference. |
For the sake of information, I have tried the test code using MKL, and it worked fine! |
Thank you - "the test code" here being which of the several matrices that got posted in this issue thread over the past (almost) two years ? A lot here seems to depend on tiny numerical differences between code changes, making some versions more likely to handle a given matrix, but so far nobody has confidently identified the (or a) vulnerable part of the algorithm |
I was mentioning one of the first comments:
|
I need to mention that my matrix is probably ill-specified. It originated due to a bug in my code. It could well be that this bug led to a numerical instability in the |
@SzymonNowakowski I am having the same problems, in order to fix the crashing, do you therefore recommend trying to use the OpenBLAS library. Thanks. |
@igorkf what is MKL, and how can I use this in R. Should I try MKL instead of BLAS? |
I don't know which one you should try, but here is a tutorial on how to install different BLAS libraries: https://csantill.github.io/RPerformanceWBLAS/ |
Dear All
I have discovered that my matrix was the result of a model misspecification, so it is probably ill-specified.
Numerical difficulties with this kind of matrix have to be expected.
Please consider the case with my matrix solved and remove me from this mailing list.
Best regards
… On 30 Apr 2024, at 15:18, Dylan Dijk ***@***.***> wrote:
@igorkf <https://github.com/igorkf> what is MKL, and how can I use this in R. Should I try MKL instead of BLAS?
—
Reply to this email directly, view it on GitHub <#672 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/A5Z7GKBURRAP52TXXX3F73TY76K3FAVCNFSM5WLSEVRKU5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TEMBYGUZTCNRSGEYQ>.
You are receiving this because you commented.
|
Using Weslley's reproducer, I bisected the issue with the crashes_svd3 matrix to stop with commit 4e40cbe06058e34bea9ed37c4e0dc39bed5d94a0. That is the commit when Further, I could verify that this commit is also the one where |
Thanks for catching that @angsch. Do you know if this is an accuracy issue with NRM2? If so, would you be able to obtain a matrix whose norm changes by using the different implementations of NRM2? Also, would you know if this is a problem of DNRM2 or ZNRM2? |
Another MWE from Julia that reproduces the crash in LAPACK: using LinearAlgebra
using DelimitedFiles
f = download("https://gist.githubusercontent.com/juliohm/575a6eca8e56525c3ba2a03423d773e9/raw/6f6d51abeefc4d4cd0472520952ba7582a97d453/matrix.tsv")
A = readdlm(f)
svd(A) ERROR: LAPACKException(1)
Stacktrace:
[1] chklapackerror_positive(::Int64)
@ LinearAlgebra.LAPACK ~/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/lapack.jl:44
[2] chklapackerror
@ ~/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/lapack.jl:40 [inlined]
[3] gesdd!(job::Char, A::Matrix{Float64})
@ LinearAlgebra.LAPACK ~/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/lapack.jl:1714
[4] _svd!
@ ~/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/svd.jl:125 [inlined]
[5] svd!(A::Matrix{Float64}; full::Bool, alg::LinearAlgebra.DivideAndConquer)
@ LinearAlgebra ~/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/svd.jl:105
[6] svd!
@ ~/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/svd.jl:100 [inlined]
[7] svd(A::Matrix{Float64}; full::Bool, alg::LinearAlgebra.DivideAndConquer)
@ LinearAlgebra ~/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/svd.jl:179
[8] svd(A::Matrix{Float64})
@ LinearAlgebra ~/.julia/juliaup/julia-1.11.5+0.x64.linux.gnu/share/julia/stdlib/v1.11/LinearAlgebra/src/svd.jl:178
[9] top-level scope
@ REPL[5]:1
My platform information: julia> versioninfo()
Julia Version 1.11.5
Commit 760b2e5b739 (2025-04-14 06:53 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 8 × Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, skylake)
Threads: 8 default, 0 interactive, 4 GC (on 8 virtual cores)
Environment:
JULIA_NUM_THREADS = 8 |
Hi Júlio, For information, I tried to reproduce the issue on my laptop (MacOS) and I was not able to. I tried to use 8 threads. (I thought this could be the root of the issue.) My platform information is given below. But thanks for sending the info to reproduce the bug. Julien.
|
Uh oh!
There was an error while loading. Please reload this page.
I believe I found a bug in
LAPACK
insvd(.)
/dgesdd
function:Problem description
R routine
svd(.)
fromLAPACK
crashes indgesdd
on a full rank matrix with dimensions 5344x263.Problem replication
But
crashes_svd
is a full rank matrix, it should decompose insvd(.)
:> qr(crashes_svd)$rank
[1] 263
> dim(crashes_svd)
[1] 5344 263
EDIT 07/04/2022:
The above example (a matrix 5344x263) is minimal to reproduce the bug - If you remove a row or a column from that matrix,svd(.)
successfully decomposes a resulting matrix into SVD components.I have verified, that (in general) removing random rows or columns of that special matrix makes it pass the SVD, but one can also remove them in a certain order, obtaining a sequence of matrices failing SVD computation:
END OF EDIT 07/04/2022
Software versions
Windows 7,
R
4.2.0,LAPACK
3.10.0:> La_version()
[1] "3.10.0"
> print(sessionInfo())
Windows 7,
R
4.1.0,LAPACK
3.9.0:> La_version()
[1] "3.9.0"
> print(sessionInfo())
Ubuntu 20.04,
R
4.1.2,LAPACK
3.9.0:> La_version()
[1] "3.9.0"
> print(sessionInfo())
Checklist
The text was updated successfully, but these errors were encountered: