cuSOLVER fails to compute eigenvalues for RMT #86

Open
spficklin opened this issue May 12, 2019 · 5 comments

@spficklin
Member

When I run RMT I see what looks like odd behavior. I start thresholding around 0.95. As the threshold ticks down, every step reports "unique eigenvalues: 1", even though the pruned matrix in the example below has size 1360. It seems odd that a matrix that big would have only one unique eigenvalue, especially considering that the next threshold (0.900) suddenly has 1398 eigenvalues (the size of the pruned matrix).

I think there may be a bug....

threshold:    0.901
prune matrix: 1360
warning: cuSOLVER ssyev returned 1358
eigenvalues: 1360
unique eigenvalues: 1
chi-squared: -1

threshold:    0.900
prune matrix: 1398
eigenvalues: 1398
unique eigenvalues: 1221
pace: 10, chi-squared: 264.024
pace: 11, chi-squared: 244.507
pace: 12, chi-squared: 221.733
@bentsherman
Member

You got a warning from cuSOLVER: ssyev returned 1358. That means the eigenvalue solver failed to compute all of the eigenvalues for the pruned matrix. The "unique eigenvalues: 1" line is just an artifact of that failure; really, the warning should cause RMT to skip that threshold.

RMT can now offload the eigenvalue computation to the GPU using cuSOLVER, or it can use OpenBLAS, which runs on multiple CPU threads. Try running RMT with the CPU implementation:

kinc settings set cuda none
kinc run rmt [...] --threads [num-cpus]

If the CPU implementation doesn't show this error then there's probably just a bug with the GPU code.
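
The skip-on-failure behavior suggested above might look like the following sketch. It is written in Python for brevity and is not KINC's code: solve_threshold, count_unique, and the 1e-6 tolerance are hypothetical names and values chosen for illustration. The only fact it relies on is the LAPACK convention that ssyev reports info = 0 on success and info > 0 when the algorithm failed to converge.

```python
from typing import Optional, Sequence


def count_unique(eigenvalues: Sequence[float], tol: float = 1e-6) -> int:
    """Count eigenvalues that differ from their sorted predecessor by more than tol."""
    ordered = sorted(eigenvalues)
    if not ordered:
        return 0
    unique = 1
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > tol:
            unique += 1
    return unique


def solve_threshold(info: int, eigenvalues: Sequence[float]) -> Optional[int]:
    """Return the unique-eigenvalue count, or None if the solver reported failure.

    `info` is the solver status code (hypothetical parameter; 0 means success
    under the LAPACK convention).
    """
    if info != 0:
        # The solver did not finish cleanly, so the eigenvalue buffer cannot
        # be trusted. Skip this threshold instead of reporting a bogus count.
        return None
    return count_unique(eigenvalues)
```

With this shape, a call such as solve_threshold(1358, [0.0] * 1360) yields None, so the caller can move on to the next threshold rather than printing "unique eigenvalues: 1" and a chi-squared of -1.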

@spficklin
Member Author

Yes, that fixed the problem. I suppose we should leave this open if this is a potential bug in the GPU code.

@bentsherman changed the title from "RMT: unique eigenvalues: 1" to "cuSOLVER fails to compute eigenvalues for RMT" on May 17, 2019
@spficklin
Member Author

@bentsherman I'm still getting this error on huge chunks of correlation values... say from 0.92 to 9.71. When I turn off the cuda setting I see a similar issue but just get -1 values. Here's the output file:

0.897   1317    985     78.3222
0.896   1371    1031    75.5462
0.895   1418    1078    72.0155
0.894   1460    1107    78.9224
0.893   1499    1153    80.4571
0.892   1544    1193    82.0969
0.891   1579    1       -1
0.890   1624    1       -1
0.889   1672    1       -1
0.888   1722    1       -1
0.887   1774    1       -1
0.886   1825    1       -1
0.885   1861    1       -1
0.884   1928    1       -1
0.883   1992    1       -1
0.882   2058    1       -1
0.881   2110    1       -1
0.880   2177    1       -1
0.879   2242    1       -1
0.878   2306    1       -1
0.877   2368    1       -1
0.876   2425    1       -1
0.875   2487    1       -1
0.874   2563    1       -1
0.873   2619    1       -1
0.872   2689    1       -1
0.871   2756    2347    485.371
0.892001
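
To see exactly which thresholds are affected, a listing like the one above can be scanned with a short script. This is a sketch under an assumption: the four columns are taken to be threshold, pruned matrix size, unique eigenvalue count, and chi-squared, which is inferred from the log format earlier in this thread and not confirmed. The function name failed_thresholds is hypothetical.

```python
from typing import List, Tuple


def failed_thresholds(text: str) -> List[Tuple[float, int]]:
    """Return (threshold, matrix_size) for rows with 1 unique eigenvalue and chi-squared -1."""
    failed = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank lines and the trailing single-value line
        threshold = float(fields[0])
        size = int(fields[1])      # assumed: pruned matrix size
        unique = int(fields[2])    # assumed: unique eigenvalue count
        chi = float(fields[3])     # assumed: chi-squared, -1 when not computed
        if unique == 1 and chi == -1.0:
            failed.append((threshold, size))
    return failed


# A few rows copied from the listing above.
sample = """\
0.892   1544    1193    82.0969
0.891   1579    1       -1
0.890   1624    1       -1
0.871   2756    2347    485.371
0.892001
"""
print(failed_thresholds(sample))
```

Run over the full listing, this would flag the 20 consecutive rows from 0.891 through 0.872, which gives a concrete set of pruned matrices to dump and examine.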

Thoughts?

@bentsherman
Member

So at those thresholds the pruned matrix has 1 unique eigenvalue... That could also just be an artifact of the LAPACKE solver failing, but LAPACKE and cuSOLVER are both configured to print a warning if they fail. I think we'll have to look at the pruned matrix at these thresholds to see if something looks unusual. Something is going wrong with the eigenvalue computation.
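
As a starting point for looking at the pruned matrix, here is a minimal sketch of a few sanity checks. It assumes the matrix has been dumped into a plain list of rows; inspect_matrix, the report keys, and the 1e-6 tolerance are hypothetical, and nothing here is known to be the actual cause of the failure. It only counts things worth ruling out before digging into the solver itself.

```python
import math
from typing import Dict, List


def inspect_matrix(matrix: List[List[float]], tol: float = 1e-6) -> Dict[str, int]:
    """Count non-finite entries, asymmetric pairs, and rows with no off-diagonal values."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("matrix is not square")
    report = {"non_finite": 0, "asymmetric_pairs": 0, "isolated_rows": 0}
    for i, row in enumerate(matrix):
        # NaN or infinity anywhere in the row
        report["non_finite"] += sum(1 for v in row if not math.isfinite(v))
        # Entries above the diagonal that disagree with their mirror entry
        for j in range(i + 1, n):
            a, b = row[j], matrix[j][i]
            if math.isfinite(a) and math.isfinite(b) and abs(a - b) > tol:
                report["asymmetric_pairs"] += 1
        # Rows whose off-diagonal entries are all exactly zero
        if all(v == 0 for j, v in enumerate(row) if j != i):
            report["isolated_rows"] += 1
    return report
```

Comparing the report for a threshold that works (for example 0.892) against one that fails (for example 0.891) would show whether the input changes character at that point or whether the problem lies in the solver call.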

@spficklin
Member Author

This issue is almost a year old, but it is still a problem and needs fixing for folks who want to use RMT, so I'm going to leave it open.
