amdgpu 0000:0a:00.0: GPU reset begin! after some time played #321

Closed
zaggynl opened this issue Sep 4, 2019 · 9 comments

@zaggynl

zaggynl commented Sep 4, 2019

Your system information

Please describe your issue in as much detail as possible:

Describe what you expected should happen and what did happen. Please link any large pastes as a Github Gist.
After some time playing, the screen freezes while sound and network continue (I am using voice chat with friends).
syslog shows: https://gist.github.com/zaggynl/87c57f9ee0ffffbfadaa5afb7367cfb7
Using kernel 5.0.0-27-generic (Ubuntu 18.04 HWE kernel)
ACO PPA for Mesa per https://steamcommunity.com/app/221410/discussions/0/1640915206474070669/
Videocard: RX Vega 56 reference model with everything stock.
PSU: Seasonic Focus Plus 550W Platinum

Steps for reproducing this issue:

  1. Play a couple of rounds of Dota 2 against bots
  2. ???
  3. GPU driver resets; I have to reset the PC
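
A quick way to confirm how often the reset from the issue title shows up is to count the matching lines in a saved log. This is a minimal sketch: `count_gpu_resets` is a hypothetical helper name, and the `/var/log/syslog` path in the usage comment assumes a stock Ubuntu setup.

```shell
#!/bin/bash
# Count amdgpu GPU reset messages in a saved kernel log.
# count_gpu_resets is a hypothetical helper name, not part of any tool.
count_gpu_resets() {
    local logfile="$1"
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are none, so "|| true" keeps the function from failing.
    grep -c 'amdgpu .*GPU reset begin' "$logfile" || true
}

# Example usage (assumes syslog lives at /var/log/syslog, as on Ubuntu):
# count_gpu_resets /var/log/syslog
```

Running it before and after a play session gives a rough count of resets per session.
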
@kisak-valve
Member

Hello @zaggynl, can you reproduce the issue with RADV_PERFTEST=llvm %command% in the game's launch options?

If the game works fine with that launch option, then that hints you've encountered an issue with the experimental driver and it should be reported over at https://github.com/daniel-schuermann/mesa/issues.

@zaggynl
Author

zaggynl commented Sep 7, 2019

I can confirm this occurs both with and without the RADV_PERFTEST=llvm %command% launch option.
I am adding temperature monitoring with the script below:

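#!/bin/bash
# Log the GPU temperature once per second.
# temp1_input reports millidegrees Celsius; the hwmonN index may differ per system.
file="/sys/class/drm/card0/device/hwmon/hwmon1/temp1_input"
divide=1000
CEL=$'\xe2\x84\x83'   # UTF-8 bytes for the degree Celsius sign
while true
do
        value=$(cat "$file")
        temp=$((value / divide))
        timestamp=$(date +"%Y-%m-%d %T")
        # Quote the variables so the log line is written exactly as built
        echo "$timestamp $temp$CEL" >> gputemplog.txt
        echo "$timestamp $temp$CEL and logging to $PWD/gputemplog.txt"
        sleep 1
done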
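
The script hard-codes `hwmon1`, but the `hwmonN` index can change between boots or systems. A minimal sketch of looking it up instead, assuming the same `card0` device path; `find_temp_input` and `read_temp_celsius` are hypothetical helper names:

```shell
#!/bin/bash
# find_temp_input is a hypothetical helper: it prints the first readable
# temp1_input file under a device's hwmon directory, whatever the hwmonN
# index happens to be.
find_temp_input() {
    local device_dir="${1:-/sys/class/drm/card0/device}"
    local candidate
    for candidate in "$device_dir"/hwmon/hwmon*/temp1_input; do
        if [ -r "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# read_temp_celsius converts the millidegree value in the given file to
# whole degrees Celsius.
read_temp_celsius() {
    local value
    value=$(cat "$1") || return 1
    echo $((value / 1000))
}

# Example usage:
# file=$(find_temp_input) && read_temp_celsius "$file"
```
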

Edit:
It seems to hover around 75°C, so I'll try forcing the fan speed to 120/255 with:

echo 1 > /sys/class/drm/card0/device/hwmon/hwmon1/pwm1_enable
echo 120 > /sys/class/drm/card0/device/hwmon/hwmon1/pwm1
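
For reference, the two writes above can be wrapped so the fan can also be handed back to the driver afterwards. This is a sketch, not a tested tool: the helper names are hypothetical, the writes need root, and it assumes the amdgpu hwmon convention that `pwm1_enable` takes 1 for manual and 2 for automatic control.

```shell
#!/bin/bash
# pwm_from_percent is a hypothetical helper that maps 0-100 % to the
# 0-255 range used by the pwm1 file (integer arithmetic, rounded down).
pwm_from_percent() {
    local percent="$1"
    if [ "$percent" -lt 0 ] || [ "$percent" -gt 100 ]; then
        echo "percent must be between 0 and 100" >&2
        return 1
    fi
    echo $((percent * 255 / 100))
}

# set_manual_fan writes the two sysfs files; on real hardware it needs root.
set_manual_fan() {
    local hwmon_dir="$1" pwm="$2"
    echo 1 > "$hwmon_dir/pwm1_enable"   # 1 = manual fan control (assumed)
    echo "$pwm" > "$hwmon_dir/pwm1"
}

# restore_auto_fan hands fan control back to the driver.
restore_auto_fan() {
    echo 2 > "$1/pwm1_enable"           # 2 = automatic fan control (assumed)
}

# Example usage (as root):
# set_manual_fan /sys/class/drm/card0/device/hwmon/hwmon1 120
# restore_auto_fan /sys/class/drm/card0/device/hwmon/hwmon1
```
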

@zaggynl
Author

zaggynl commented Sep 16, 2019

Forcing the fan speed made no difference.

I will now try the OpenGL renderer instead of Vulkan.

@zaggynl
Author

zaggynl commented Sep 23, 2019

No difference between the renderers.
Now trying borderless window instead of desktop-friendly fullscreen.

@zaggynl
Author

zaggynl commented Sep 24, 2019

No difference with borderless window mode; I have now enabled v-sync.

@zaggynl
Author

zaggynl commented Sep 26, 2019

It still crashes with v-sync enabled, and I'm not sure what else to try.
Should I provide any other logs to debug?

I tried the replay from https://medium.com/layerth/benchmarking-dota-2-83c4322b12c0.
I started Dota 2 with the launch options +exec_async benchmark +demo_quitafterplayback,
but it did not crash.
If it crashes again in one of my games I'll save the replay; maybe I can reproduce the crash that way.

Edit: some people are having luck with R600_DEBUG=nodcc %command%, so I will try that.

@zaggynl
Author

zaggynl commented Oct 2, 2019

R600_DEBUG=nodcc %command% made no difference.

I appear to be unable to download the match replay; it remains stuck at downloading: http://i.imgur.com/kG3zmUf.png

Tried through https://www.opendota.com/matches/5051516611, also no luck.

I can't find the match ID in: ~/.steam/steam/steamapps/common/dota 2 beta/game/dota/replays/
Manually placed replays do work.

@zaggynl
Author

zaggynl commented Oct 5, 2019

I tried using Proton 4.11-6; the GPU also appears to crash a couple of minutes into a bot match. No entries are left in syslog, just a line of @ symbols.

@zaggynl
Author

zaggynl commented Oct 7, 2019

> No difference between the renderers.
> Now trying borderless window instead of desktop-friendly fullscreen.

Reading back, seeing as my GPU also crashed with OpenGL, this is no longer relevant to this git repo.
This is likely a heat issue with the card itself: I've swapped it with a loaner GeForce GTX 1070 Ti and all issues disappeared.

I gave some thought to undervolting, but apparently this proves troublesome under Linux.
It seems you have to fool around with powertables?
https://forum.level1techs.com/t/how-to-overclock-vega-on-linux/132771/66
ROCm/ROCm#463
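
Before touching power tables, it may help to check whether the driver's overclocking interface is enabled at all. The sketch below rests on assumptions that are not stated in this thread: that amdgpu gates OverDrive behind bit 0x4000 of its `ppfeaturemask` module parameter, and that the current mask is readable from `/sys/module/amdgpu/parameters/ppfeaturemask`. `overdrive_bit_set` is a hypothetical helper name.

```shell
#!/bin/bash
# overdrive_bit_set is a hypothetical helper: it succeeds when bit 0x4000
# (assumed here to be the OverDrive bit) is set in the given feature mask.
# The mask may be written in decimal or as 0x-prefixed hex.
overdrive_bit_set() {
    local mask="$1"
    (( (mask & 0x4000) != 0 ))
}

# Example usage (the sysfs path is an assumption):
# mask=$(cat /sys/module/amdgpu/parameters/ppfeaturemask)
# if overdrive_bit_set "$mask"; then echo "OverDrive bit set"; fi
```

This only reads the mask; it does not change voltages or clocks.
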
