Radeon VII recurring error with gpuOwl PRP #873
Comments
@valeriob01, can you provide the full dmesg output after the issue has been observed? You can use this cmd:
I am sorry, no. I have since moved on and installed Debian 10.1 and ROCm 2.9, and the error seems to have disappeared since then.
Regarding the steps to trigger the issue: just run gpuowl and watch the output. I would go so far as to admit that I use gpuowl as an indicator of GPU health; it may very well someday become the GPU equivalent of the "mprime stress test".
Is this issue still reproducible? If not, can we please close it? Thanks!
The original ticket is more than a year old, and the person who opened it has not responded to the latest request. If this is still an issue, please file a new ticket and we will be happy to investigate it. Thanks!
At first I ignored this error, thinking it was a faulty GPU, but now exactly the same error is happening with a second Radeon VII: all-zero residues, as if the program were reading from a wrong location, or a memory page had been evicted from underneath it.
(non-zero residues redacted).
System: Debian 10
2019-08-23 07:08:23 90166123 73860000 81.91%; 996 us/sq; ETA 0d 04:31; xxxxxxxxxxxxxxxx
2019-08-23 07:08:33 90166123 73870000 81.93%; 996 us/sq; ETA 0d 04:30; xxxxxxxxxxxxxxxx
2019-08-23 07:08:43 90166123 73880000 81.94%; 993 us/sq; ETA 0d 04:29; 0000000000000000
2019-08-23 07:08:53 90166123 73890000 81.95%; 992 us/sq; ETA 0d 04:29; 0000000000000000
2019-08-23 07:09:03 90166123 73900000 81.96%; 992 us/sq; ETA 0d 04:29; 0000000000000000
2019-08-23 07:09:13 90166123 73910000 81.97%; 991 us/sq; ETA 0d 04:29; 0000000000000000
2019-08-23 07:09:23 90166123 73920000 81.98%; 992 us/sq; ETA 0d 04:29; 0000000000000000
2019-08-23 07:09:33 90166123 73930000 81.99%; 992 us/sq; ETA 0d 04:28; 0000000000000000
2019-08-23 07:09:43 90166123 73940000 82.00%; 992 us/sq; ETA 0d 04:28; 0000000000000000
2019-08-23 07:09:53 90166123 73950000 82.01%; 992 us/sq; ETA 0d 04:28; 0000000000000000
2019-08-23 07:10:03 90166123 73960000 82.03%; 992 us/sq; ETA 0d 04:28; 0000000000000000
2019-08-23 07:10:12 90166123 73970000 82.04%; 992 us/sq; ETA 0d 04:28; 0000000000000000
2019-08-23 07:10:22 90166123 73980000 82.05%; 991 us/sq; ETA 0d 04:27; 0000000000000000
2019-08-23 07:10:32 90166123 73990000 82.06%; 991 us/sq; ETA 0d 04:27; 0000000000000000
2019-08-23 07:10:43 90166123 EE 74000000 82.07%; 992 us/sq; ETA 0d 04:27; 0000000000000000 (check 1.10s)
2019-08-23 07:10:43 90166123.owl loaded: k 73000000, block 1000, res64 xxxxxxxxxxxxxxxx
2019-08-23 07:10:55 90166123 73010000 80.97%; 1133 us/sq; ETA 0d 05:24; xxxxxxxxxxxxxxxx
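For anyone who wants to catch this symptom without watching the terminal, here is a rough sketch of a log scanner. It is not part of gpuowl; the function name `find_zero_residues` and the regular expression are my own assumptions, and the line format is inferred only from the excerpt above, so other gpuowl versions may print something different.

```python
import re

# Assumed progress-line shape, inferred from the log excerpt in this report:
# "<date> <time> <exponent> [EE] <iteration> <pct>%; <n> us/sq; ETA <...>; <res64>"
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<exponent>\d+) (?P<flag>EE )?(?P<iteration>\d+) "
    r"[\d.]+%; \d+ us/sq; ETA [^;]+; (?P<res64>[0-9a-fA-Fx]{16})\b"
)


def find_zero_residues(lines):
    """Return (iteration, has_ee_flag) for each progress line whose res64 is all zeros.

    Hypothetical helper: lines that do not look like progress lines
    (for example the ".owl loaded" line) are skipped silently.
    """
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and set(match.group("res64")) == {"0"}:
            hits.append((int(match.group("iteration")), match.group("flag") is not None))
    return hits


if __name__ == "__main__":
    import sys

    # Usage (assumed): pipe a saved gpuowl log into this script.
    for iteration, has_ee in find_zero_residues(sys.stdin):
        marker = " (EE line)" if has_ee else ""
        print(f"all-zero residue at iteration {iteration}{marker}")
```

Run against the excerpt above, this should flag the lines from iteration 73880000 up to the EE line at 74000000. The redacted `x` residues are accepted by the pattern only so that this pasted log parses; real logs would contain hex digits there.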