Error new initialization #20

Open
adminnu opened this issue Aug 24, 2018 · 11 comments
Labels
bug Something isn't working

Comments


adminnu commented Aug 24, 2018

Screenshot attached:
[screenshot: 192 168 0 100 2018-08-24 17-42-05]

Author

adminnu commented Aug 24, 2018

The error occurs with -d 0, but with -d 1 there is no error. I am investigating further.

Author

adminnu commented Aug 24, 2018

I tried -d 2 and it also works. I restarted the system, but it still does not see video card 0. After updating to the latest commit, I did not change anything.

brichard19 added the bug label Aug 25, 2018
Author

adminnu commented Aug 29, 2018

If you revert this change:
if(cuda::getDeviceCount() == 0) {
back to:
if(cuda::getDeviceCount == 0) {

the code gets further, but a new error appears:

[2018-08-29.19:15:52] [Debug:] Verifying points on device. This will take a while...
[2018-08-29.19:15:52] [Debug:] Validation failed: invalid point

Calling cuda::getDeviceCount() returns the correct value: I currently have 3 cards connected.
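A possible explanation for why dropping the parentheses lets the code get further: in C++, a function name written without () is not a call. It decays to a pointer to the function, and comparing that pointer with 0 tests whether it is null, which is never the case, so the "no devices" branch is skipped regardless of the real device count. The sketch below illustrates this with a hypothetical stand-in for cuda::getDeviceCount; the real function's signature and the surrounding code in the project are assumptions.

```cpp
#include <cassert>

namespace demo {

// Hypothetical stand-in for the project's cuda::getDeviceCount().
// It reports zero devices so the two checks below can be told apart.
int getDeviceCount() { return 0; }

// Calls the function and compares the returned count with zero.
bool noDevicesWithCall() {
    return getDeviceCount() == 0;
}

// Omits the call: the function name decays to a function pointer, and
// the comparison asks whether that pointer is null. It never is, so the
// result is false no matter how many devices exist.
bool noDevicesWithoutCall() {
    int (*fn)() = getDeviceCount;
    return fn == nullptr;
}

// Small self-check showing that the two forms disagree.
void selfCheck() {
    assert(noDevicesWithCall());
    assert(!noDevicesWithoutCall());
}

}  // namespace demo
```

If this reading is right, the version without parentheses does not fix device detection; it only disables the check, which would be consistent with the failure moving to the later point validation step.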

Owner

brichard19 commented Aug 30, 2018

Hi,

Does the card show up in cudaInfo.exe?

Author

adminnu commented Aug 30, 2018

Yes, it shows up. There are three 1080 Ti cards, numbered 0, 1, and 2.
I can send the output of this program a little later.

Author

adminnu commented Aug 30, 2018

By the way, the first 4 lines of the screenshot in the first message show that the program sees the card. That is, it sees the card but does not work with it. The error appeared right after the 0.15 release.

Author

adminnu commented Aug 31, 2018

Found 3 devices

ID: 0
Name: GeForce GTX 1080 Ti
Capability: 6.1
MP: 28
Cores: 1792 (64 per MP)
Memory: 11264MB

ID: 1
Name: GeForce GTX 1080 Ti
Capability: 6.1
MP: 28
Cores: 1792 (64 per MP)
Memory: 11264MB

ID: 2
Name: GeForce GTX 1080 Ti
Capability: 6.1
MP: 28
Cores: 1792 (64 per MP)
Memory: 11264MB

Author

adminnu commented Aug 31, 2018

I tried the new release, 0.17. Now the output looks like this:

[2018-08-31.21:45:51] [Info] Compression: compressed
[2018-08-31.21:45:51] [Info] Starting at: ....
[2018-08-31.21:45:51] [Info] Initializing GeForce GTX 1080 Ti
[2018-08-31.21:45:52] [Info] Generating 29,360,128 starting points (1120.0MB)
[2018-08-31.21:46:23] [Info] 10.0%
[2018-08-31.21:46:23] [Info] 20.0%
[2018-08-31.21:46:23] [Info] 30.0%
[2018-08-31.21:46:23] [Info] 40.0%
[2018-08-31.21:46:23] [Info] 50.0%
[2018-08-31.21:46:23] [Info] 60.0%
[2018-08-31.21:46:23] [Info] 70.0%
[2018-08-31.21:46:23] [Info] 80.0%
[2018-08-31.21:46:23] [Info] 90.0%
[2018-08-31.21:46:23] [Info] 100.0%
[2018-08-31.21:46:23] [Info] Done
[2018-08-31.21:46:25] [Info] Error: unspecified launch failure Exiting.

One of my power supplies burned out. Previously, one supply powered 2 cards and the system, and the other powered 3 cards. Now there are 3 cards (2 working, one not) and the system, so I thought the cause might be insufficient power.
So I disconnected one card and ran with 2, but the error is the same: it does not see card 0.

Author

adminnu commented Aug 31, 2018

Something interesting happened: by chance, the first card started. I tried again and got the error, then it started on the fifth attempt, and after that it failed to start again. I noticed that it starts when the initialization percentage advances more slowly, which suggests some kind of timeout.

I looked into this and found this thread: https://devtalk.nvidia.com/default/topic/1028094/unspecified-launch-failure-error/
It also discusses a timeout and TdrDelay.

Maybe this will help, thank you.
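For context on the timeout mentioned above: on Windows, Timeout Detection and Recovery (TDR) resets the GPU when a task runs longer than TdrDelay, which defaults to 2 seconds, and the application then typically sees a launch failure. Besides raising TdrDelay, a common mitigation is to split one long launch into several shorter ones. The sketch below shows only the host-side arithmetic for such a split. The names Chunk, splitWork, and checkCoverage are hypothetical, and whether TDR is the cause here, or how the project actually schedules its launches, is not confirmed in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one launch: which items it covers.
struct Chunk {
    std::uint64_t offset;  // index of the first item in this launch
    std::uint64_t count;   // number of items in this launch
};

// Splits `total` items into consecutive chunks of at most `maxPerLaunch`
// items. Returns an empty list when `maxPerLaunch` is zero.
std::vector<Chunk> splitWork(std::uint64_t total, std::uint64_t maxPerLaunch) {
    std::vector<Chunk> chunks;
    if (maxPerLaunch == 0) {
        return chunks;
    }
    std::uint64_t offset = 0;
    while (offset < total) {
        std::uint64_t remaining = total - offset;
        std::uint64_t count = remaining < maxPerLaunch ? remaining : maxPerLaunch;
        chunks.push_back({offset, count});
        offset += count;  // cannot overflow: count <= remaining
    }
    return chunks;
}

// Sanity check: the chunks must be contiguous and cover every item once.
void checkCoverage(const std::vector<Chunk>& chunks, std::uint64_t total) {
    std::uint64_t next = 0;
    for (const Chunk& c : chunks) {
        assert(c.offset == next);
        next += c.count;
    }
    assert(next == total);
}
```

For the 29,360,128 starting points shown in the 0.17 log, splitWork(29360128, 4194304) yields 7 chunks of 4,194,304 items each. The chunk size is an arbitrary illustration, not a tuned value.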

Author

adminnu commented Sep 2, 2018

Now that I know this, I run it many times, and on some attempts it starts. The screenshot shows that it takes 5 to 20 attempts. Maybe it started because the 3rd card dropped out at that moment; by the way, that card has since come up on the first attempt.
I think what happens is this: initialization begins, the percentages scroll by in the console, and the application verifies the initialized points. In some cases that I do not yet understand (for me it is card 0), initialization does not finish in time, and because the verification runs at that moment, an error occurs. That is my guess, thank you.

[screenshot: 192 168 0 100 2018-09-02 16-13-19]

Author

adminnu commented Sep 17, 2018

@brichard19, did my investigation of this error help?
