Corrupted "snow"-looking artifacts and glitches, only on SteamVR system menus in VR grid mode #202

Closed
interfect opened this issue Jun 16, 2019 · 13 comments

@interfect

interfect commented Jun 16, 2019

Your system information

  • Steam client version (build number or date): June 14, 2019
  • Distribution (e.g. Ubuntu): Ubuntu 18.04 with Gnome Shell under X11.
  • Graphics driver version (run nvidia-settings): amdgpu as shipped in kernel 4.15.0-51-generic #55-Ubuntu
  • Gist for SteamVR System Information: https://gist.github.com/interfect/efe60d5b43bada4d850d6e96aeba1229
  • Opted into Steam client beta?: No
  • Opted into SteamVR beta?: No
  • Have you checked for system updates?: No

Please describe your issue in as much detail as possible:

When SteamVR tries to display system menus or messages when you are in the "grid" view that you get when you are between applications, the menus are corrupted and illegible. It looks like the texture is not getting interpreted correctly by the GPU or something; it's covered in weird semi-random snow.

Here's a screenshot:

image

Notably, the artifacts aren't random or changing every frame. It looks like there's a real dialog back there with some UI animation happening; I just can't see it properly.

Through random button-mashing, I've even managed to get a few corrupted dialogs open at once. I can't read any of them.

Possibly relevant is that I'm using an OpenHMD HMD driver for an Oculus DK1 HMD, but I get the artifacts even if the vr compositor window is on my normal screen.

Actual VR games like SteamVR Home work fine, and their in-game menus display properly.

This is not #183. Other applications are displaying just fine. It's only these VR system menus that are corrupted.

Steps for reproducing this issue:

  1. Open SteamVR Home
  2. From the middle screen in SteamVR Home, try to launch a game that is still downloading. Here I am trying to launch Google Earth VR.
  3. You end up going to the grid world, and it tries to display a dialog or something.
  4. On my system, the dialog will be corrupted and unreadable.
@kisak-valve
Member

Hello @interfect, can you check if you see the same quirk with the SteamVR Experimental Graphics PPA?

@interfect
Author

@kisak-valve Unfortunately, my system is all running off of ZFS, and it looks like your kernels in that PPA specifically don't have ZFS support.

Would it be useful/advisable to not install the kernel packages from the PPA and try the other ones?

@kisak-valve
Member

Thanks for mentioning the ZFS note; it was outdated and has been removed.

@interfect
Author

OK, I've tried the packages from the PPA. There's no apparent improvement in the system dialogs, and now rendering seems to be out of sync with the display, so the images in each eye flicker. I grabbed a screenshot, and it looks like many blocks of the image are simply not rendered by the time the frame is displayed.

image

@interfect
Author

I just got a Valve Index, and this issue with the illegible menus persists with the Index as my display.

@lostgoat
Collaborator

@interfect Can you provide a system report with the PPA video drivers installed?

@interfect
Author

OK, so I installed the PPA's packages by following the instructions on this repo's README (not on the PPA itself).

sudo add-apt-repository ppa:kisak/steamvr
sudo apt update
sudo apt dist-upgrade
sudo apt install linux-generic-steamvr-18.04 xserver-xorg-hwe-18.04 mesa-vulkan-drivers mesa-vulkan-drivers:i386

I don't know whether the drivers were updated, whether I installed with a different set of commands this time, or whether the magic trick was forcing my extra GPU to bind to the vfio-pci driver instead of amdgpu, but (after a few rounds of unplugging and replugging the cable to get all the USB devices detected) I can now get SteamVR to start in direct mode about half the time. If it comes up in extended mode (with the compositor just on the desktop), I can stop and restart it until it eventually picks direct mode (and puts the compositor on the headset). It may need a minute or so after the last run before it will start in direct mode again.

Moreover, I can no longer reproduce the rendering-in-blocks effect, or the corrupted system menus.

Here's a system report from it working in direct mode:

https://gist.github.com/interfect/b2e18d33c52887b98ea3ebd2ab3b022b

And here's one from it stuck in extended mode:

https://gist.github.com/interfect/2e72caebd8ab89ccc6b3d3945d4a1cf7

I'm still having some trouble with latency, though it is less severe and more intermittent than in #211. Interestingly, some frames seem to take ~infinity time (see the numbers in the screenshot below). The in-headset frame graph has regions that are empty, but they don't show as empty in the one I can take a screenshot of.

image

Sadly, performance is still poor enough that I don't think I can really use it with this setup. I've also noticed a couple of minor quirks:

  • The wood floor of the default SteamVR home is weirdly blurry and flickering, but I can't get a screenshot of that either.
  • The firmware updater insists on popping up to claim all my firmware is up to date as of today (since I used it to update the firmware around Friday). In Windows, the firmware updater yesterday thought my HMD had an available update (which I didn't install yet).

The original issue here is clearly resolved, but there are some new ones. Should I open new tickets for those?

@kisak-valve
Member

Can you check if sudo mv /usr/share/vulkan/icd.d/intel_icd.x86_64.json /usr/share/vulkan/icd.d/intel_icd.x86_64.json.disabled has an effect on SteamVR's behavior?
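
If the rename above turns out to matter, it helps to be able to undo it just as easily. The following is a minimal sketch of a reversible toggle. `icd_toggle` and the `ICD_DIR` override are names invented for this example, and the default directory is simply the one from the command above. On a real system it would need to run as root.

```shell
# Hypothetical helper: flip a Vulkan ICD manifest between enabled and disabled
# by renaming it, mirroring the manual mv suggested above.
# ICD_DIR is an assumed override so the function can be tried on a scratch directory.
icd_toggle() {
    icd="${ICD_DIR:-/usr/share/vulkan/icd.d}/$1"
    if [ -e "$icd" ]; then
        mv "$icd" "$icd.disabled" && echo "disabled $1"
    elif [ -e "$icd.disabled" ]; then
        mv "$icd.disabled" "$icd" && echo "enabled $1"
    else
        echo "no such ICD manifest: $1" >&2
        return 1
    fi
}

# Example (as root): icd_toggle intel_icd.x86_64.json
```

Calling it a second time with the same file name restores the original state.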

@lostgoat
Collaborator

Hi @interfect

In cases where direct mode fails, it is because X.org isn't properly enumerating your HMD. That is why you see a fallback to extended mode, with the extended display surface on your monitor instead of on your HMD.

You should be able to verify this by running xrandr and looking for your HMD in the output.

For the good cases you should see the following for your HMD connector:

  • Status: disconnected (yes, disconnected is normal since X isn't using it as part of the desktop)
  • A list of modes

E.g.

DisplayPort-2 disconnected (normal left inverted right x axis y axis)
   2880x1600     90.00 + 144.00   120.02    80.00  
   1920x1200     90.00  
   1920x1080     90.00  
   1600x1200     90.00  
   1680x1050     90.00  
   1280x1024     90.00  
   1440x900      90.00  
   1280x800      90.00  
   1280x720      90.00  
   1024x768      90.00  
   800x600       90.00  
   640x480       90.00 

For the bad case you will see:

  • Status: disconnected
  • No modes listed

E.g.

DisplayPort-2 disconnected (normal left inverted right x axis y axis)

Checking with xrandr will probably be the best way to narrow down what is happening since it is a lot faster than launching SteamVR.
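
The check described above can be scripted. This is a minimal sketch, not an official tool: `hmd_modes` is a made-up function name, and `DisplayPort-2` is only the connector name from the example output, so substitute whatever your HMD shows up as. It reads `xrandr` output on stdin and counts the mode lines listed under one connector.

```shell
# Hypothetical helper: count the mode lines xrandr lists under one connector.
# Usage: xrandr | hmd_modes DisplayPort-2
hmd_modes() {
    awk -v out="$1" '
        $1 == out { in_block = 1; next }          # header line of our connector
        /^[^ ]/   { in_block = 0 }                # any other unindented line ends the block
        in_block && /^ +[0-9]+x[0-9]+/ { n++ }    # indented WIDTHxHEIGHT line = one mode
        END       { print n + 0 }
    '
}
```

A result of 0 corresponds to the bad case (connector present, no modes listed); anything greater than 0 corresponds to the good case.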

@lostgoat
Collaborator

Some snippets from the system report of the bad case:

Mon Jul 15 2019 21:12:33.027328 - Looking for direct display through RandR
Mon Jul 15 2019 21:12:33.027350 -  - Root 0x6c4
Mon Jul 15 2019 21:12:33.028477 -    - Output 0x55 - 0 modes, 0 preferred
Mon Jul 15 2019 21:12:33.028530 -    - Output 0x56 - 0 modes, 0 preferred
Mon Jul 15 2019 21:12:33.028572 -    - Output 0x57 - 19 modes, 1 preferred
Mon Jul 15 2019 21:12:33.028599 -      - Mode 0 0x67 1920x1080 (looking for 2880x1600)
Mon Jul 15 2019 21:12:33.028646 -    - Output 0x58 - 0 modes, 0 preferred
Mon Jul 15 2019 21:12:33.028723 -    - Output 0x59 - 0 modes, 0 preferred
Mon Jul 15 2019 21:12:33.028751 - Tried to find direct display through RandR: (nil)
Mon Jul 15 2019 21:12:33.028768 - Looking for direct display through Vulkan WSI
Mon Jul 15 2019 21:12:33.028786 - Tried to find direct display through Vulkan WSI: (nil)
Mon Jul 15 2019 21:12:33.028801 - CHmdWindowSDL: Failed to create direct mode surface

And the good case:

Mon Jul 15 2019 21:13:42.147091 - Looking for direct display through RandR
Mon Jul 15 2019 21:13:42.147162 -  - Root 0x6c4
Mon Jul 15 2019 21:13:42.151723 -    - Output 0x55 - 0 modes, 0 preferred
Mon Jul 15 2019 21:13:42.151830 -    - Output 0x56 - 15 modes, 1 preferred
Mon Jul 15 2019 21:13:42.151862 -      - Mode 0 0x5b 2880x1600 (looking for 2880x1600)
Mon Jul 15 2019 21:13:42.151886 -        - Found matching output 86
Mon Jul 15 2019 21:13:42.151912 - Found candidate direct display as RandR output 0x56
Mon Jul 15 2019 21:13:42.155099 - Tried to find direct display through RandR: 0x104fdb0
Mon Jul 15 2019 21:13:42.155220 - Trying to match desired rate of 90.000000Hz.
Mon Jul 15 2019 21:13:42.155248 - 15 modes on display:
Mon Jul 15 2019 21:13:42.155272 -  - 0: 2880x1600@90.001007Hz
Mon Jul 15 2019 21:13:42.155295 -  - 1: 2880x1600@143.998001Hz
Mon Jul 15 2019 21:13:42.155318 -  - 2: 2880x1600@120.017006Hz
Mon Jul 15 2019 21:13:42.155340 -  - 3: 2880x1600@79.998001Hz
Mon Jul 15 2019 21:13:42.155361 -  - 4: 1920x1200@90.001007Hz
Mon Jul 15 2019 21:13:42.155383 -  - 5: 1920x1080@90.001007Hz
Mon Jul 15 2019 21:13:42.155424 -  - 6: 1600x1200@90.001007Hz
Mon Jul 15 2019 21:13:42.155448 -  - 7: 1680x1050@90.001007Hz
Mon Jul 15 2019 21:13:42.155470 -  - 8: 1280x1024@90.001007Hz
Mon Jul 15 2019 21:13:42.155491 -  - 9: 1440x900@90.001007Hz
Mon Jul 15 2019 21:13:42.155512 -  - 10: 1280x800@90.001007Hz
Mon Jul 15 2019 21:13:42.155534 -  - 11: 1280x720@90.001007Hz
Mon Jul 15 2019 21:13:42.155557 -  - 12: 1024x768@90.001007Hz
Mon Jul 15 2019 21:13:42.155578 -  - 13: 800x600@90.001007Hz
Mon Jul 15 2019 21:13:42.155618 -  - 14: 640x480@90.001007Hz
Mon Jul 15 2019 21:13:42.155642 - Selected mode 0.
Mon Jul 15 2019 21:13:42.159651 - Direct mode surface: 0x104fcf0
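
To compare the two cases without reading the whole report, the relevant lines can be pulled out mechanically. The sketch below assumes only the message wording quoted in these snippets (other SteamVR versions may phrase things differently), and `direct_mode_status` is a hypothetical name. It reads log text on stdin, prints each RandR output with its mode count, and ends with whether a direct mode surface was created.

```shell
# Hypothetical helper: summarise a SteamVR log excerpt read from stdin.
# Relies only on the message wording quoted in this thread.
direct_mode_status() {
    awk '
        /Failed to create direct mode surface/ { status = "extended" }
        /Direct mode surface: 0x/              { status = "direct" }
        / - Output 0x[0-9a-f]+ - [0-9]+ modes/ {
            # Print each RandR output with its mode count, e.g. "0x56 15"
            for (i = 1; i <= NF; i++)
                if ($i == "Output") print $(i + 1), $(i + 3)
        }
        END { print "result:", (status ? status : "unknown") }
    '
}
```

If the excerpt contains several start-up attempts, the last one determines the reported result.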

@T-X

T-X commented Jun 13, 2023

Hi, I think I might be running into this issue, too. It looks like this for me:

https://gitlab.freedesktop.org/monado/monado/-/issues/267

I also have the laptop's internal GPU (AMD/ATI Radeon 680M) and a Thunderbolt eGPU (AMD/ATI Radeon RX 6650 XT). I'm running SteamVR over the latter.

However, I wasn't able to completely disable the iGPU. I added vfio-pci.ids=1022:1681 to Grub and I see this in dmesg: [ 3.903528] vfio_pci: add [1022:1681[ffffffff:ffffffff]] class 0x000000/00000000. I still get those artifacts, though. Trying to unbind via echo "0000:33:00.0" | sudo tee /sys/bus/pci/drivers/amdgpu/unbind doesn't work either; I get crashes in dmesg then.
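
When vfio-pci does not pick a device up, one thing worth ruling out is a mismatch between the ID on the command line and the ID the device actually reports. The configuration posted later in this thread uses vendor 1002 for its AMD GPUs, whereas the ID above starts with 1022, so a comparison may be worthwhile. This is a minimal sketch that reads the standard `vendor` and `device` files from sysfs; `pci_id` and the `SYSFS_PCI` override are names invented for this example, and what the device at 0000:33:00.0 really reports is not known from this thread.

```shell
# Hypothetical helper: print "vendor:device" for a PCI address as sysfs reports it.
# SYSFS_PCI is an assumed override so the function can be tried without real hardware.
pci_id() {
    dev="${SYSFS_PCI:-/sys/bus/pci/devices}/$1"
    if [ ! -r "$dev/vendor" ] || [ ! -r "$dev/device" ]; then
        echo "no such PCI device: $1" >&2
        return 1
    fi
    v=$(cat "$dev/vendor")   # e.g. 0x1002
    d=$(cat "$dev/device")   # e.g. 0x67df
    # Strip the 0x prefixes to get the form used by vfio-pci.ids
    printf '%s:%s\n' "${v#0x}" "${d#0x}"
}

# Example: pci_id 0000:33:00.0
```

The printed value is in the same `vendor:device` form that `vfio-pci.ids=` expects, so the two can be compared directly.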

@interfect, any pointers on how you got rid of these rendering issues would be much appreciated. I guess in your case you have an Intel iGPU and an extra PCIe-connected AMD/ATI GPU?

@SpookySkeletons

You're not all running your apps on the same GPU.

Run everything with DRI_PRIME=1 or such to eliminate the fruit salad due to one GPU accessing the resources of another.
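
As a minimal sketch of the suggestion above, a small wrapper can set `DRI_PRIME` for one command without exporting it into the whole session. `with_prime` and `PRIME_GPU` are hypothetical names; the default of 1 follows the comment above, and which GPU that value selects depends on the system.

```shell
# Hypothetical wrapper: run a command with DRI_PRIME set for that process only.
# PRIME_GPU is an assumed knob for this example; it defaults to 1 as suggested above.
with_prime() {
    env DRI_PRIME="${PRIME_GPU:-1}" "$@"
}

# Example: with_prime steam
```

For individual Steam titles, the same effect should be achievable by putting `DRI_PRIME=1 %command%` in the game's launch options.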

@interfect
Author

I had a weird config with two nearly equivalent AMD GPUs in the system. I think I managed to get one on vfio and one on amdgpu by:

  • Putting amdgpu on the list of modules not to load at boot ("blacklist")
  • Doing something to the kernel command line to not load amdgpu and to load vfio at boot for both GPUs instead, for the PCI ID of the GPUs
  • Loading amdgpu manually with a systemd unit, and binding the GPU I actually want to use to that. And also attaching the right sound driver to its integrated audio device.

Here's what that vaguely looks like:

[anovak@octagon ~]$ cat /etc/systemd/system/unbind-devices.service 
[Unit]
Description=unbind devices for guest VMs
Before=gpu-manager.service

[Service]
Type=oneshot
ExecStart=/etc/unbind-devices.sh
RemainAfterExit=yes

[Install]
WantedBy=gpu-manager.service
[anovak@octagon ~]$ cat /etc/unbind-devices.sh 
#!/usr/bin/env bash
# Set up after boot

# Make sure vfio-pci had a chance to grab the GPUs
if modprobe vfio-pci ; then
    PASSTHROUGH="vfio-pci"
else
    # Probably on Xen
    modprobe xen-pciback
    PASSTHROUGH="xen-pciback"
fi

# This is the GPU we want Linux to have
GPUS=("0000:08:00.0")

# And the sound devices
SOUNDS=("0000:0b:00.3" "0000:08:00.1")

# And the sound devices to not take
NONSOUNDS=("0000:09:00.1")

for GPU in "${GPUS[@]}" ; do
    if [[ -e "/sys/bus/pci/devices/${GPU}/driver" ]] && [[ "$(basename "$(readlink "/sys/bus/pci/devices/${GPU}/driver")")" == "amdgpu" ]] ; then
        echo 1>&2 "Error: amdgpu driver already loaded for ${GPU}"
        exit 1
    fi
done

# Change it over to the amdgpu driver
modprobe amdgpu
for GPU in "${GPUS[@]}" ; do
    echo 'amdgpu' > "/sys/bus/pci/devices/${GPU}/driver_override"
    # Only unbind if some driver (e.g. vfio-pci) currently holds the device
    if [[ -e "/sys/bus/pci/devices/${GPU}/driver" ]] ; then
        echo "${GPU}" > "/sys/bus/pci/devices/${GPU}/driver/unbind"
    fi
    echo "${GPU}" > "/sys/bus/pci/drivers/amdgpu/bind"
done

for SOUND in "${SOUNDS[@]}" ; do
    if [[ -e "/sys/bus/pci/devices/${SOUND}/driver" ]] && [[ "$(basename "$(readlink "/sys/bus/pci/devices/${SOUND}/driver")")" == "snd_hda_intel" ]] ; then
        echo 1>&2 "Error: snd_hda_intel driver already loaded for ${SOUND}"
        exit 1
    fi
done

for NONSOUND in "${NONSOUNDS[@]}" ; do
    echo "${PASSTHROUGH}" > "/sys/bus/pci/devices/${NONSOUND}/driver_override"
    if [[ -e "/sys/bus/pci/devices/${NONSOUND}/driver" ]] ; then
        echo "${NONSOUND}" > "/sys/bus/pci/devices/${NONSOUND}/driver/unbind"
    fi
    echo "${NONSOUND}" > "/sys/bus/pci/drivers/${PASSTHROUGH}/bind"
done

modprobe snd_hda_intel
for SOUND in "${SOUNDS[@]}" ; do
    echo 'snd_hda_intel' > "/sys/bus/pci/devices/${SOUND}/driver_override"
    # Only unbind if some driver currently holds the device
    if [[ -e "/sys/bus/pci/devices/${SOUND}/driver" ]] ; then
        echo "${SOUND}" > "/sys/bus/pci/devices/${SOUND}/driver/unbind"
    fi
    echo "${SOUND}" > "/sys/bus/pci/drivers/snd_hda_intel/bind"
done
[anovak@octagon ~]$ cat /etc/default/grub
# If you change this file, run 'update-grub' afterwards to update
# /boot/grub/grub.cfg.
# For full documentation of the options in this file, see:
#   info -f grub -n 'Simple configuration'

GRUB_DEFAULT="Ubuntu"
#GRUB_HIDDEN_TIMEOUT="0"
#GRUB_HIDDEN_TIMEOUT_QUIET="true"
GRUB_TIMEOUT="-1"
GRUB_DISTRIBUTOR="`lsb_release -i -s 2> /dev/null || echo Debian`"
# removed: quiet splash
# In addition to blacklisting amdgpu here, we have to prevent X
# from loading it.
GRUB_CMDLINE_LINUX_DEFAULT="modprobe.blacklist=amdgpu usb_storage.quirks=0bc2:ab38: amd_iommu=on vfio-pci.ids=1002:67df"
GRUB_CMDLINE_LINUX=""
...
[anovak@octagon ~]$ cat /etc/modprobe.d/local.conf 
options vfio vfio_iommu_type1 vfio_pci vfio_virqfd
options vfio-pci ids=1002:67df,1002:aaf0
[anovak@octagon modprobe.d]$ cat /etc/modprobe.d/blacklist.conf
...
# Don't load amdgpu at all
blacklist amdgpu
# Or the intel sound driver
blacklist snd_hda_intel

Instead of all this garbage, you might be able to just turn off the iGPU in the BIOS?

To check whether your setup is working, you can look at where the driver symlink points for the device in its directory under /sys. That tells you which driver is driving the device:

[anovak@octagon modprobe.d]$ readlink /sys/bus/pci/devices/0000\:08\:00.0/driver
../../../../bus/pci/drivers/amdgpu
[anovak@octagon modprobe.d]$ readlink /sys/bus/pci/devices/0000\:09\:00.0/driver
../../../../bus/pci/drivers/vfio-pci

You can also look in the directory for the driver and see if there's a symlink for the device or not:

[anovak@octagon modprobe.d]$ ls /sys/bus/pci/drivers/amdgpu/
0000:08:00.0  bind  module  new_id  remove_id  uevent  unbind
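
The two checks above can be rolled into one loop over several devices. This is a minimal sketch: `bound_driver` and the `SYSFS_PCI` override are names invented for this example, and the addresses in the usage line are just the ones from this machine.

```shell
# Hypothetical helper: print the driver currently bound to each given PCI address.
# SYSFS_PCI is an assumed override so the function can be tried without real hardware.
bound_driver() {
    for addr in "$@"; do
        link="${SYSFS_PCI:-/sys/bus/pci/devices}/$addr/driver"
        if [ -L "$link" ]; then
            # The symlink target ends in the driver name, e.g. .../drivers/amdgpu
            printf '%s %s\n' "$addr" "$(basename "$(readlink "$link")")"
        else
            printf '%s (none)\n' "$addr"
        fi
    done
}

# Example: bound_driver 0000:08:00.0 0000:09:00.0
```

A device with no `driver` symlink is reported as `(none)`, which is what an unbound device looks like in sysfs.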
