Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vkBindImageMemory with NULL VkDeviceMemory handle does not trigger a validation warning #143

Closed
nsubtil opened this issue May 27, 2018 · 9 comments
Assignees
Milestone

Comments

@nsubtil
Copy link
Member

nsubtil commented May 27, 2018

Recently ran into an app bug where a NULL memory handle was being passed in to vkBindImageMemory. This did not trigger validation errors, even though the spec language seems to suggest it should.

Comments around ValidateSetMemBinding seem to suggest that this should cause a warning, but I couldn't find the actual warning anywhere --- it looks like if mem == VK_NULL_HANDLE the function ends up doing nothing. Maybe we're just missing the corresponding log_msg call for that case?

@karl-lunarg
Copy link
Contributor

That case would have been caught and reported by the parameter validation layer. Did you have this layer enabled, either explicitly or via enabling the standard_validation layer?

It's a fairly common pattern for the "later" layers like core_validation to assume that simple parameter checking and error reporting was handled by "earlier" layers like parameter validation. In this case, the NULL check in core_validation is to avoid crashing.

@nsubtil
Copy link
Member Author

nsubtil commented May 28, 2018

VK_LAYER_LUNARG_standard_validation is enabled on the instance, and I'm using debug_utils to get callbacks with error messages. The callback itself seems to work fine, I see info messages with a list of supported extensions on device creation.

However, I just realized that I don't seem to be getting any validation messages --- I dropped a few layout transitions which I know the validation layers will normally catch, and I don't get any callbacks for that either.

I tried via.exe and it looks like it's showing some installation problems. The initial run of cube.exe works fine, but the second one (with --validate) throws a message box that reads "Registry lookup failed to get layer manifest files". It then crashes with a call stack pointing at CmdBeginDebugUtilsLabelEXT in the loader.

Looking into the registry, HKLM\SOFTWARE\Khronos\Vulkan\ExplicitLayers is empty. Is this supposed to be filled in by the runtime or SDK installers? It sounds a bit like an installation problem, which I'd like to understand a bit better.

@lenny-lunarg
Copy link
Contributor

Prior to the 1.1.73.0 SDK, the layers were set up by the runtime installer. Since then, they've been set up by the SDK installer. The problem you're seeing stems from changes we made to the runtime installer in SDK 1.1.73.0. These changes were the result of IHVs wanting to be able to install the runtime contents from an INF file.

What happens is that the new runtime installer drops files with a 999 version so that the old runtime installer will always treat the new one as the newest. However, this had the unintended side effect that since those old runtimes also set up the layers, the runtimes would never believe that the layer version matches the loader version and they would never set up the layers right. You probably encountered the issue by either installing or uninstalling an old runtime.

If you want to get your layers set up again, the solution should be to reinstall the 1.1.73.0 SDK.

You did mention one other thing. It sounds like you enabled standard validation without an error, but the layers were not actually present. If you're sure that's what happened an you're able to repro the problem, it would be a good idea to open an issue in the loader repo. It would also probably mean that this issue can be closed.

@nsubtil
Copy link
Member Author

nsubtil commented May 30, 2018

Reinstalled the SDK and the explicit layers now load correctly. I now see a warning when the memory handle is null in vkBindImageMemory, so I think we're good here.

I did not change anything in the instance / device creation code, so I am fairly sure that layers are enabled correctly and there is no error reported in the case where the layers are not actually present. It sounds like we should trigger an error in this case, so I'll file a loader bug for this as suggested.

I am worried about the SDK install issue however, since I've seen reports from ISVs running into this same problem. We can't unship the old installers that are present in the wild, but this creates a problem for developers --- I wound up wasting a half day or more on this. Adding an error return in this case is useful but not a complete fix, since I expect developers to still be confused as to why their layers suddenly went away after switching drivers.

Might be worth considering what else we can do here. Either a technical solution that would prevent this problem if possible, or at least documenting this and communicating it more proactively?

@jjulianoatnv
Copy link
Contributor

Could a viable workaround be for new-style (incoming) IHV driver being installed on top of an old-style (outgoing) IHV driver to skip executing the uninstaller for the old-style vulkanrt-UNinstaller that was registered by the old (outgoing) driver?

@lenny-lunarg
Copy link
Contributor

Skipping the old runtime uninstaller should resolve the problem, at least as far as new driver installations are concerned. The problem will still occur when removing an old SDK, but I don't think there's a ton we can do about that.

We've been trying to come up technical solution, but haven't been able to do so without reverting the runtime behavior (which is not something we want to do). It might be possible to document this better, though we don't really have a standard way of doing that.

@nsubtil
Copy link
Member Author

nsubtil commented May 30, 2018

Documenting + communicating is probably enough here, we just need to make sure people know about this.

Perhaps as part of fixing KhronosGroup/Vulkan-Loader#23 we could add a message specifically about this problem while we're in this transition period?

@jjulianoatnv
Copy link
Contributor

jjulianoatnv commented May 31, 2018

Agreed with notion that documenting+communicating can probably be adequate.

Skipping the old runtime uninstaller would leave extra files in system32 "forever", which might not be a big deal because the files are small. At least there is no longer an entry in Programs&Features (Add/Remove Programs). Still, I'd prefer to clean up the old files from system32 by actually running the outgoing driver's runtime uninstaller. My driver is configured to run the outgoing runtime uninstaller in this situation.

@Andreyogld3d
Copy link
Contributor

I see error message from Validation layers:
ERROR: [Validation] Code 0 : [ UNASSIGNED-GeneralParameterError-RequiredParameter ] Object: VK_NULL_HANDLE (Type = 0) | vkBindImageMemory: required parameter memory specified as VK_NULL_HANDLE
I think issue can be closed.

@shannon-lunarg shannon-lunarg added this to the sdk-1.1.temp.0 milestone Jun 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants