Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\Vulkan\ExplicitLayers registry key is purged after driver updates on Windows #38

Closed
kondrak opened this Issue Jul 27, 2018 · 32 comments

8 participants

kondrak commented Jul 27, 2018

This is an ongoing issue that I've been running into with each consecutive GPU driver update on Windows. Reproduced on 3 different machines with NVIDIA cards, but it may well be vendor-independent.

Each time I perform a driver update (either through Windows Update or by downloading the drivers directly from NVidia's website), the Computer\HKEY_LOCAL_MACHINE\SOFTWARE\Khronos\Vulkan\ExplicitLayers registry key is purged. This results in Validation Layers not working at all - other Vulkan functionality works perfectly fine. The only known solution to this problem is to reinstall the Vulkan SDK which repopulates all necessary registry keys.

This problem has been encountered by several other people but it seems there's no obvious pattern to reproduce this, so it's not even clear what causes this - either it's an OS issue or something not quite right with the driver installers.

I'll be happy to provide further information that might help identify the root of the problem.
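For anyone who wants to check the state of the key after a driver update without opening regedit, here is a minimal Python sketch (not part of the original report). It assumes a 64-bit Python on Windows, so that `winreg` reads the same registry view that regedit shows, and it relies on the Vulkan loader convention that each value name under the key is a layer manifest path whose DWORD is 0 when the layer is enabled. The function names are made up for illustration.

```python
import sys

EXPLICIT_LAYERS_KEY = r"SOFTWARE\Khronos\Vulkan\ExplicitLayers"


def read_explicit_layers():
    """Return {manifest_path: dword} for the ExplicitLayers key, or {} if the key is missing."""
    if sys.platform != "win32":
        raise RuntimeError("The registry can only be read on Windows")
    import winreg  # Windows-only standard library module

    try:
        key = winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, EXPLICIT_LAYERS_KEY)
    except FileNotFoundError:
        return {}
    with key:
        value_count = winreg.QueryInfoKey(key)[1]  # (subkeys, values, last_modified)
        entries = {}
        for index in range(value_count):
            name, data, _value_type = winreg.EnumValue(key, index)
            entries[name] = data
        return entries


def enabled_manifests(entries):
    """Keep only the manifests whose DWORD is 0 (assumed to mean 'enabled')."""
    return sorted(path for path, flag in entries.items() if flag == 0)


if __name__ == "__main__" and sys.platform == "win32":
    layers = enabled_manifests(read_explicit_layers())
    print("ExplicitLayers is empty" if not layers else "\n".join(layers))
```

An empty result right after a driver update would match the symptom described above.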


Tobski commented Jul 27, 2018

I've seen a bunch of devs talk about this issue too - it's a pain in the butt and constantly causes issues for developers, which is not a great dev experience. Would love to see this fixed!


kayru commented Jul 27, 2018

I've encountered this too. Validation layers are reported to be present at runtime, but no validation messages are ever logged. This is quite confusing, because things appear to work correctly when stepping through the code. It can lead to errors sneaking in, as the developer is not aware of anything wrong until it's too late.


Jasper-Bekkers commented Jul 27, 2018

This is a really annoying workflow to have - I ran into it so often that it became second nature to reinstall the registry keys or SDK after a driver update. If this turns out to be an IHV issue we should make sure it ends up there as well.

Contributor

krOoze commented Jul 27, 2018

Same on AMD.

It is time this became mature and stopped deleting, doubling, or otherwise corrupting the entries.


lenny-lunarg commented Jul 27, 2018

What SDK version do you have installed?

There is a known bug that can cause those registry entries to get deleted when installing or removing a pre-1.1.73.0 runtime while a 1.1.73.0 or later SDK is already installed. Furthermore, since driver installers should be removing old runtime installers when they are replaced, the first time you upgrade to a newer runtime, this issue will come up. This is the same issue that was at the root of KhronosGroup/Vulkan-ValidationLayers#143.

The long-term solution is to upgrade to 1.1.73 or later in both the runtime that the driver installs and the SDK. I just installed Nvidia driver 398.36 and my layer registry entries (from SDK 1.1.77.0) were left alone (the driver installed 1.1.73.0). But it's also possible that this driver would break things if I had an older SDK installed. I didn't try AMD, and I don't know what version they're installing.

The short-term workaround is to remove all 1.1.70 and earlier SDKs and runtimes, and replace them with the latest SDK. That should solve the problem as long as drivers don't install old runtimes again.

Also, in the validation layer issue, we came to the conclusion that documenting the issue and communicating should be enough. Obviously, based on this issue, we haven't done that well enough. Does anyone have thoughts on the best way to do that? We can't put the documentation in old SDKs that have shipped, and it feels like it would be more useful to document it in old SDKs than new ones, since those are the ones that cause the problem. So where would the best place to document this be?

Author

kondrak commented Jul 27, 2018

This happened even when updating to the 398.36 driver with only 1.1.77.0 installed at the time. I had no older SDKs installed.


lenny-lunarg commented Jul 27, 2018

@kondrak, do you know if you had any other runtime installed? The runtimes wouldn't be nearly as obvious as the SDKs. The way to be absolutely sure what's installed is to check your System32 directory (by default it's C:\Windows\System32) and look for files in the format vulkan-1-x-x-x-x.dll. The numbers in place of x's identify the version of the runtime. If you did have any other runtimes installed at the time, it's possible that the driver removed them during the installation, which would cause this problem. Unfortunately, it's likely too late to know for sure if that was the case before the driver install.

Also, do you have any other graphics drivers installed? Windows Update has been known to install other drivers and it's possible that these other drivers are causing trouble.

But if you have only one Nvidia driver and no other runtimes installed, it's possible that there's another bug here separate from the known one. I'll have to look into that a little more.
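The System32 check described above can be scripted. The following is a rough Python sketch, not an official tool: it assumes the file names follow the vulkan-1-x-x-x-x.dll pattern exactly, and it uses 1.1.73.0 as the cutoff because that is the version named earlier in this thread as the first one without the known bug. The function names are hypothetical.

```python
import re

# Matches versioned loader copies such as "vulkan-1-1-0-65-1.dll".
# "vulkan-1" is the library name; the four numbers after it are the runtime version.
RUNTIME_DLL = re.compile(r"^vulkan-1-(\d+)-(\d+)-(\d+)-(\d+)\.dll$", re.IGNORECASE)

# Cutoff taken from the discussion above (assumption: compared component-wise).
FIRST_FIXED_RUNTIME = (1, 1, 73, 0)


def runtime_versions(filenames):
    """Return the sorted runtime versions found in a list of file names."""
    versions = []
    for name in filenames:
        match = RUNTIME_DLL.match(name)
        if match:
            versions.append(tuple(int(part) for part in match.groups()))
    return sorted(versions)


def old_runtimes(filenames):
    """Return the versions that predate the cutoff and may trigger the known bug."""
    return [v for v in runtime_versions(filenames) if v < FIRST_FIXED_RUNTIME]
```

On a real machine the input would come from something like `os.listdir(r"C:\Windows\System32")`. Note that a 32-bit Python on 64-bit Windows is redirected to SysWOW64 when it lists that directory.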

Contributor

krOoze commented Jul 27, 2018

@lenny-lunarg IMO the documentation is not a problem. I think it is obvious reinstalling SDK will fix this (and OP figured so).
It is more how long it drags on (similar issues have dragged on since 1.0.0). Entries have been corrupted, doubled, or erased, and the RT has failed to uninstall or failed to install, for as long as I can remember. It has simply been a resurfacing issue for too long.

TBF I just do this workaround automatically after each driver update now. AMD is supposed to already be on 1.1.73 in beta, so hopefully it should work correctly from next update on...
Still, if the Windows Update driver version interferes, that is a problem.

> Furthermore, since driver installers should be removing old runtime installers when they are replaced, the first time you upgrade to a newer runtime, this issue will come up.

Wait, I thought they are supposed to coexist. Was that changed?
The driver should uninstall the RT, and only the RT it installed, no?


lenny-lunarg commented Jul 27, 2018

When the runtime installer was created over two years ago, the behavior that was settled on was over-complicated. This has been causing us trouble for some time. The runtime installer used to keep a copy of the loader and vulkaninfo for every single runtime that got installed. On uninstallation, the runtime would remove the file that it installed, and change the file that doesn't have the version embedded into it to be the latest version that is still installed to the machine. This caused all sorts of trouble because even if you fixed a bug, uninstallers that were triggered by driver installs would remove old runtimes, causing the bug to happen again. On top of that, the logic wasn't particularly useful, as there was no real need to keep around the old versioned copies of the runtime files.

On top of that, the logic to configure layers was put into the runtime installer/uninstaller and not in the SDK installer/uninstaller. This was done to ensure that the layers would only be configured if their version matched the loader version. But that's not useful behavior as those two components are supposed to work even when they're separate versions. And SDK logic should never have made it into the runtime in the first place.

As a result, when Windows changed the requirements for drivers so that they could not use the old runtime installer in future drivers, we tried to redesign this to a much simpler and better system, but the convoluted older behavior proved problematic because we didn't want to break backwards compatibility.

> Wait, I thought they are supposed to coexist. Was that changed?

Old runtimes are supposed to coexist, and we went out of our way to design a solution where old runtime uninstallers would not downgrade the loader because of the change. But I forgot to account for the fact that old runtime uninstallers would be configuring layers. As a result, our solution didn't take into account validation layer configuration and we broke it. We didn't catch this until after release and we haven't come up with a way to change that, without changing behavior (again).

> It is more how long it drags on

The hope is that this overhaul will prevent these issues from coming up again. I am not aware of any problems reported with the new scheme that weren't compatibility issues. That's part of why I want to establish if this really is a compatibility issue or not.

Contributor

krOoze commented Jul 27, 2018

> The way to be absolutely sure what's installed is to check your System32 directory (by default it's C:\Windows\System32) and look for files in the format vulkan-1-x-x-x-x.dll.

OK, I have a vulkan-1-999-0-0-0.dll :p
PS: Am just gonna nuke it; what's the worst that can happen...


Jasper-Bekkers commented Jul 27, 2018

> @lenny-lunarg IMO the documentation is not a problem. I think it is obvious reinstalling SDK will fix this (and OP figured so).

This is still not a great workflow.


lenny-lunarg commented Jul 27, 2018

> OK, I have a vulkan-1-999-0-0-0.dll :p

999 is used to ensure that the runtime installed by the new mechanism will always be considered newer than the old ones. What I meant is that this check shows which old runtimes you have installed.
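To illustrate why a 999 version always wins, here is a small Python sketch. It assumes versions are compared numerically, component by component; the thread does not say how the installers actually compare them, and the helper names are invented for this example.

```python
def parse_runtime_version(filename):
    """Turn 'vulkan-1-999-0-0-0.dll' into (999, 0, 0, 0)."""
    stem = filename.rsplit(".", 1)[0]   # drop the extension
    parts = stem.split("-")[2:]         # skip 'vulkan' and the library's '1'
    return tuple(int(part) for part in parts)


def newest_runtime(filenames):
    """Pick the file whose embedded version compares highest."""
    return max(filenames, key=parse_runtime_version)
```

Under that assumption, any real release number such as 1.0.65.1 sorts below 999.0.0.0, so the copy installed by the new mechanism is always treated as the latest.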

Contributor

krOoze commented Jul 27, 2018

@lenny-lunarg Nice hax :p! Anyway, it does not seem to be cleaned up after uninstalling everything... I just deleted it; hope it does not linger somewhere in registry too.

Contributor

krOoze commented Jul 27, 2018

OK, I tried from what I assume is a clean state. AMD reports 1.1.73, but apparently installs 1.1.70 RT, sigh... And yeah, the layers get deleted from registry.

Author

kondrak commented Jul 28, 2018

@lenny-lunarg I just checked the contents of my System32 folder and here's what I have:

$ ls Windows/System32 | grep vulkan
vulkan-1.dll
vulkan-1-1-0-54-1.dll
vulkan-1-1-0-65-1.dll
vulkan-1-999-0-0-0.dll
vulkaninfo.exe
vulkaninfo-1-1-0-54-1.exe
vulkaninfo-1-1-0-65-1.exe
vulkaninfo-1-999-0-0-0.exe

So it seems there was still some leftover garbage. I'm confused, before upgrading the SDK I always uninstalled the existing one so I'd expect the dlls to be cleared too - or is that something provided by the driver updates?
And yes, I only have one set of NVidia drivers, nothing else.

Contributor

krOoze commented Jul 28, 2018

@kondrak At some point they decided to hide the uninstallers from the users. Go to C:\Program Files (x86)\VulkanRT\version\ where you should find the uninstaller executable for the older RT versions.

Author

kondrak commented Jul 29, 2018

I navigated to VulkanRT and have indeed found older runtimes for 1.0.54.1 and 1.0.65.1.
What's still not quite clear to me is whether this broken layers behavior has been fixed in the latest SDKs, as @lenny-lunarg said, because it seems it's still broken for @krOoze?

Contributor

krOoze commented Jul 30, 2018

@kondrak My driver installs 1.1.70 (despite reporting 1.1.73). Since @lenny-lunarg says everything has to be >= 1.1.73, I would not experience the fixed behavior.


lenny-lunarg commented Jul 31, 2018

> I'm confused, before upgrading the SDK I always uninstalled the existing one so I'd expect the dlls to be cleared too - or is that something provided by the driver updates?

I'm not sure where the old runtime installers come from. We've usually seen that happen when drivers install a runtime and then don't remove it. But to my knowledge, Nvidia drivers haven't had any trouble with that, so I don't know how that would be happening on your system. Unfortunately, we don't have any way to track where those runtimes came from, so I can't really say anything about them with confidence.

Author

kondrak commented Aug 1, 2018

On that particular computer I had older Vulkan SDKs installed so chances are these are just leftovers from the older uninstallers not working correctly.

However, I just checked another machine which has the same problem but only had 1.1.70 SDK installed prior to updating to 1.1.77 and here's what I have:

$ ls /cygdrive/c/Windows/System32/ | grep vulkan
vulkan-1.dll
vulkan-1-999-0-0-0.dll
vulkaninfo.exe
vulkaninfo-1-999-0-0-0.exe

I manually uninstalled 1.1.70 before updating to 1.1.77. Then I updated NVidia drivers (using their official installer) and the problem still persisted. My VulkanRT folder now only contains this:

$ ls -l /cygdrive/c/Program\ Files\ \(x86\)/VulkanRT/
install.log
LICENSE.txt
VULKANRT_LICENSE.rtf
VulkanRT-License.txt

pdaniell-nv commented Aug 3, 2018

@kondrak Which NVIDIA driver version did you install?

Author

kondrak commented Aug 4, 2018

I checked that with the latest 398.82 drivers for a GTX 970 on Windows 10 64-bit.


pdaniell-nv commented Aug 7, 2018

That driver has VulkanRT-1.1.73, which shouldn't have the issue. Hmm.

Contributor

krOoze commented Aug 7, 2018

AMD beta is now on 1.1.77, and the layers seem to survive driver uninstall now.


pdaniell-nv commented Aug 23, 2018

I've tried to reproduce what @kondrak is seeing locally, but I'm having no luck. For me, with SDK-1.1.82.0 installed, when I install 398.82, which has RT-1.1.73.0, the SDK remains usable and the registry entries for the layers remain.

I'm curious, do you have any "VulkanRT" entries in:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall

If so, you can execute the "UninstallString" for each one until they all disappear. You can hit F5 in regedit after each uninstall and see the list shrink. With these all gone there is no chance a stale uninstaller gets called by accident.
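The lookup can also be done from a script. Below is a hedged Python sketch that only lists candidate entries and runs nothing. It assumes the stale runtimes can be recognised by "VulkanRT" appearing in the subkey name or in `DisplayName`, which is a guess based on the wording above, and that Python reads the same registry view as regedit. The function names are illustrative.

```python
import sys

UNINSTALL_KEY = r"SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall"


def read_uninstall_entries():
    """Return {subkey_name: {value_name: data}} for the Uninstall key (Windows only)."""
    if sys.platform != "win32":
        raise RuntimeError("The registry can only be read on Windows")
    import winreg  # Windows-only standard library module

    entries = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, UNINSTALL_KEY) as root:
        for i in range(winreg.QueryInfoKey(root)[0]):      # number of subkeys
            name = winreg.EnumKey(root, i)
            with winreg.OpenKey(root, name) as sub:
                values = {}
                for j in range(winreg.QueryInfoKey(sub)[1]):  # number of values
                    value_name, data, _value_type = winreg.EnumValue(sub, j)
                    values[value_name] = data
                entries[name] = values
    return entries


def vulkanrt_uninstallers(entries):
    """Return (subkey, UninstallString) pairs for entries that look like VulkanRT."""
    found = []
    for subkey in sorted(entries):
        values = entries[subkey]
        label = f"{subkey} {values.get('DisplayName', '')}".lower()
        command = values.get("UninstallString")
        if "vulkanrt" in label and command:
            found.append((subkey, command))
    return found
```

Printing `vulkanrt_uninstallers(read_uninstall_entries())` before and after each manual uninstall should show the list shrinking, much like pressing F5 in regedit.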

Another thing I'm curious about. When you install for example 398.82 does it ask you to reboot at the end? Do you ever do a "clean install"? The reason I ask this is because I wonder if on your system the install of 398.82 is going through a currentDriver->someOldDriver->398.82 sequence and the install of someOldDriver is what's messing up the registry. If this is happening one thing you could try is to purge all drivers from your system with a tool like https://www.guru3d.com/files-details/display-driver-uninstaller-download.html so you know the only possible version on your system is 398.82.

Author

kondrak commented Aug 24, 2018

My HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall entry is empty. Frankly, I can't remember if the driver installer asked me to reboot, but I know for certain that I never did a clean install of the drivers with recent updates.
As of today, I'm running the latest SDK and latest NVidia drivers. As soon as new drivers show up I'll perform an update and report back if the problem persists. Alternatively, I can try to reinstall the current drivers if it helps you - just let me know what steps I should follow (i.e. a clean install/upgrade/other?).


pdaniell-nv commented Aug 24, 2018

I think waiting for the next driver update makes sense. It should have RT-1.1.77.0 and should be available very soon. Thanks again for your help isolating this issue.

Author

kondrak commented Aug 27, 2018

I have now updated my drivers to 399.07 (performing an update, not a clean install) and for the first time I can see that ExplicitLayers had not been removed from the registry. It seems the problem no longer occurs. Can anyone else confirm this? @kayru @Jasper-Bekkers I know you ran into this too.


pdaniell-nv commented Aug 27, 2018

Awesome. Thanks for trying it out and reporting your findings.

Contributor

KarenGhavam-lunarG commented Sep 14, 2018

@kayru @Jasper-Bekkers Have you had a chance to verify that updating to 399.07 does not have a problem? I am thinking that this issue can be closed but would like a verification from a few more people.

Thanks!


kayru commented Sep 28, 2018

Haven't experienced the issue so far.

Contributor

KarenGhavam-lunarG commented Oct 18, 2018

Closing this issue.
