Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some _device_ extensions are loading (and failing) _instance_ functions via get_device_proc_addr() #727

Closed
MarijnS95 opened this issue Mar 28, 2023 · 17 comments · Fixed by #734

Comments

@MarijnS95
Copy link
Collaborator

MarijnS95 commented Mar 28, 2023

Found a couple extensions that are erroneously loading all functions through get_device_proc_addr() despite containing some instance-level functions (i.e. not taking a device or device child as first argument, see also fn function_type() in the generator for fishing this out in a crude way), which cannot be loaded. Concretely found these extensions (by looking for explicit PhysicalDevice argument, and later validated using the generator):

Quick solution

Unfortunately the fix (a pattern which we apply more rigorously in other extensions) implies a slower path for device-level functions which now imply an extra dispatch table per device, "injected" by the ICD. Worse, it's a breaking change so it might be about time to start thinking about ash 0.38.

Long-term correctness

As for the fix itself, I've said it before and I'll use this issue to track it: I'd rather have the generator distinguish instance-level and device-level extension functions, either in the loader or via separate *Fn structs entirely (anything to force us to load both in a separate way). @oddhack what's the desired way to figure this out if a function operates on device (or childs thereof)? I'm not confident this hardcoded list of "VkDevice" | "VkCommandBuffer" | "VkQueue" for the first-parameter-type will hold, does it need an extra attribute in vk.xml?

ash/generator/src/lib.rs

Lines 561 to 566 in a9fbc71

let is_first_param_device = self.params.get(0).map_or(false, |field| {
matches!(
field.definition.type_name.as_deref(),
Some("VkDevice" | "VkCommandBuffer" | "VkQueue")
)
});

Random weirdness

VK_KHR_device_group_creation, added in #630, has a function named fn device() that returns vk::Instance 😅

@MarijnS95
Copy link
Collaborator Author

MarijnS95 commented Apr 3, 2023

For backporting this to 0.37, I intend to temporarily add new_with_instance functions to the function loaders and #[deprecate] the rest. Could still use some help determining whether all other loaders are fine though!

@Ralith
Copy link
Collaborator

Ralith commented Apr 3, 2023

either in the loader or via separate *Fn structs entirely

I have mixed feelings about proliferating tons of loader structs, but I think it's specifically needed here as you might want to call these before you have a device at all.

@Ralith
Copy link
Collaborator

Ralith commented Apr 3, 2023

I guess we could accept Option<Device> instead. Not sure if I like that.

@MarijnS95
Copy link
Collaborator Author

MarijnS95 commented Apr 3, 2023

I have mixed feelings about proliferating tons of loader structs, but I think it's specifically needed here as you might want to call these before you have a device at all.

Yup. I was initially thinking about cheating our way out of this by having a match in the affected high-level wrappers to delegate to one of the two function loaders (still need them as arguments), or even have the generator emit the exact same *Fn::load() function with two closures to clearly tell them apart. In those cases the user would have to load the exact same struct twice, either via different high-level wrapper functions or via an Option<Device> and figure out themselves which functions may panic.

I think I'll start by generating into separate DeviceFn and InstanceFn structs first, then see how it looks. On the high-level wrapper side we'll probably end up keeping the original "untagged" struct name for most extensions, except if there needs to be both a device and instance struct.

@MarijnS95
Copy link
Collaborator Author

MarijnS95 commented Apr 3, 2023

I just cobbled together the generator code in ed2e5e2 (again, assuming fn function_type() is still accurate 1), and found only 10 extensions (in 1.3.246) that have mixed function pointer types:

  • VK_KHR_swapchain 💥
  • VK_KHR_video_queue
  • VK_KHR_device_group 💥
  • VK_KHR_performance_query 🧹
  • VK_EXT_debug_utils 🧹
  • VK_EXT_sample_locations 🧹
  • VK_EXT_calibrated_timestamps 🧹
  • VK_KHR_fragment_shading_rate
  • VK_EXT_full_screen_exclusive 💥
  • VK_NV_optical_flow

We have high-level wrappers for seven of these (marked 🧹), three of which (marked 💥) are loading via get_device_proc_addr() and have instance function(s) that always panic: exactly the three found in the OP!

I'll go ahead and finalize the manual code changes and make a PR for this, then we can decide how to progress - and how to backport this to 0.37 🥶 (or decide to call it quits on the backports and release the many nice improvements in 0.38 already).

Footnotes

  1. Pinging @oddhack again: is "VkDevice" | "VkCommandBuffer" | "VkQueue" as first argument still accurate to tell instance and device functions apart?

@Ralith
Copy link
Collaborator

Ralith commented Apr 5, 2023

or decide to call it quits on the backports and release the many nice improvements in 0.38 already

I'd be pretty happy to get that stuff out, personally! Judging by the lack of clamor I don't think any existing major users urgently need the fix here.

@MarijnS95
Copy link
Collaborator Author

Definitely! Hope we won't get the same push-back on "ash releasing yet another breaking change" (which is just a year and a week or two ago now), though this release comes with trivial yet cumbersome changes.

However, I did keep the backports branch alive and up-to-date, even made some specific fixes. and to "solve" this issue I'd simply deprecate fn new() explaining the issue and pointing folks to an fn new_from_instance(). Might be annoying though as 99% of the vulkan users would be calling Swapchain::new(), and the affected function has only been added to that extension much later 😬

Will perhaps make a 37.3 release after all and start working towards 0.38 soon-ish, after having this issue fleshed out and existing PR's cleared too.

@codecnotsupported

This comment was marked as off-topic.

@MarijnS95

This comment was marked as off-topic.

@codecnotsupported

This comment was marked as off-topic.

@codecnotsupported

This comment was marked as off-topic.

@MarijnS95

This comment was marked as off-topic.

@codecnotsupported

This comment was marked as off-topic.

MarijnS95 added a commit that referenced this issue May 6, 2023
…ctions

This is a minimal, semver-compatible backport of #734 to the
`0.37-stable` branch, warning Ash users of the problem outlined below
while the issue is properly being solved in the next breaking Ash
release (by separating `Instance` and `Device` functions in the
generator to avert this problem entirely while also always providing
optimal `Device`-specific functions for extension wrappers that are
currently already loading _everything_ via `Instance` to forgo the
problem).

As discovered and detailed in #727 a few extension wrappers were loading
and calling `Instance` functions via `Device` and
`get_device_proc_addr()` which is [defined] to only return non-`NULL`
function pointers for `Device` functions.  Those wrapper functions will
always call into Ash's panicking NULL-stub functions as the desired
`Instance` function could not be loaded.

Deprecate the `new()` functions for extension wrappers that were doing
this, while pointing the reader to `new_from_instance()` and explaining
in the docs what function will always `panic!()` when the struct was
loaded using `new()` instead.

This function always takes a raw `vk::Device` directly to fill `handle`
(rather than `ash::Device` to retrieve `handle()` from), allowing users
to pass `vk::Device::null()` when they do intend to load this extension
wrapper just for calling the `Instance` function.

[defined]: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/vkGetDeviceProcAddr.html#_description
MarijnS95 added a commit that referenced this issue May 6, 2023
…ctions

This is a minimal, semver-compatible backport of #734 to the
`0.37-stable` branch, warning Ash users of the problem outlined below
while the issue is properly being solved in the next breaking Ash
release (by separating `Instance` and `Device` functions in the
generator to avert this problem entirely while also always providing
optimal `Device`-specific functions for extension wrappers that are
currently already loading _everything_ via `Instance` to forgo the
problem).

As discovered and detailed in #727 a few extension wrappers were loading
and calling `Instance` functions via `Device` and
`get_device_proc_addr()` which is [defined] to only return non-`NULL`
function pointers for `Device` functions.  Those wrapper functions will
always call into Ash's panicking NULL-stub functions as the desired
`Instance` function could not be loaded.

Deprecate the `new()` functions for extension wrappers that were doing
this, while pointing the reader to `new_from_instance()` and explaining
in the docs what function will always `panic!()` when the struct was
loaded using `new()` instead.

This function always takes a raw `vk::Device` directly to fill `handle`
(rather than `ash::Device` to retrieve `handle()` from), allowing users
to pass `vk::Device::null()` when they do intend to load this extension
wrapper just for calling the `Instance` function.

[defined]: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/vkGetDeviceProcAddr.html#_description
MarijnS95 added a commit that referenced this issue May 6, 2023
…ctions

This is a minimal, semver-compatible backport of #734 to the
`0.37-stable` branch, warning Ash users of the problem outlined below
while the issue is properly being solved in the next breaking Ash
release (by separating `Instance` and `Device` functions in the
generator to avert this problem entirely while also always providing
optimal `Device`-specific functions for extension wrappers that are
currently already loading _everything_ via `Instance` to forgo the
problem).

As discovered and detailed in #727 a few extension wrappers were loading
and calling `Instance` functions via `Device` and
`get_device_proc_addr()` which is [defined] to only return non-`NULL`
function pointers for `Device` functions.  Those wrapper functions will
always call into Ash's panicking NULL-stub functions as the desired
`Instance` function could not be loaded.

Deprecate the `new()` functions for extension wrappers that were doing
this, while pointing the reader to `new_from_instance()` and explaining
in the docs what function will always `panic!()` when the struct was
loaded using `new()` instead.

This function always takes a raw `vk::Device` directly to fill `handle`
(rather than `ash::Device` to retrieve `handle()` from), allowing users
to pass `vk::Device::null()` when they do intend to load this extension
wrapper just for calling the `Instance` function.

[defined]: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/vkGetDeviceProcAddr.html#_description
MarijnS95 added a commit that referenced this issue May 6, 2023
…`Instance` functions (#754)

extensions: Provide `new_from_instance()` fallback for `Instance` functions

This is a minimal, semver-compatible backport of #734 to the
`0.37-stable` branch, warning Ash users of the problem outlined below
while the issue is properly being solved in the next breaking Ash
release (by separating `Instance` and `Device` functions in the
generator to avert this problem entirely while also always providing
optimal `Device`-specific functions for extension wrappers that are
currently already loading _everything_ via `Instance` to forgo the
problem).

As discovered and detailed in #727 a few extension wrappers were loading
and calling `Instance` functions via `Device` and
`get_device_proc_addr()` which is [defined] to only return non-`NULL`
function pointers for `Device` functions.  Those wrapper functions will
always call into Ash's panicking NULL-stub functions as the desired
`Instance` function could not be loaded.

Deprecate the `new()` functions for extension wrappers that were doing
this, while pointing the reader to `new_from_instance()` and explaining
in the docs what function will always `panic!()` when the struct was
loaded using `new()` instead.

This function always takes a raw `vk::Device` directly to fill `handle`
(rather than `ash::Device` to retrieve `handle()` from), allowing users
to pass `vk::Device::null()` when they do intend to load this extension
wrapper just for calling the `Instance` function.

[defined]: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/vkGetDeviceProcAddr.html#_description
@06393993
Copy link

06393993 commented Jun 5, 2023

A related feature request: I am wondering if it's also possible to introduce ash::vk::<ExtensionName>Fn::load_from_instance_only and ash::vk::<ExtensionName>Fn::load_from_device to only load the instance-level dispatchable command and the device-level dispatchable command respectively?

e.g. ash::vk::KhrSwapchainFn::load_from_instance_only will only try to load get_physical_device_present_rectangles_khr, while ash::vk::KhrSwapchainFn::load_from_device will load the rest entries.

If we don't have enough bandwidth working on that, will the upstream push back a PR that implements these 2 interfaces? Thanks.

@MarijnS95
Copy link
Collaborator Author

@06393993 such a compatibility fix was already merged and published to 0.37.3 for the affected extensions.

@06393993
Copy link

06393993 commented Jun 5, 2023

@06393993 such a compatibility fix was already merged and published to 0.37.3 for the affected extensions.

Thanks for the reply. I know the fix which adds a new_from_instance for the extension wrappers under the ash::extensions module, but my use case is just using the raw function pointer structs in ash::vk, because I am writing Vulkan layers, and

  • ash::extensions wrappers don't cover all extensions, while ash::vk's raw structs do. e.g. VK_ANDROID_external_memory_android_hardware_buffer doesn't exist in ash::extensions while ash::vk::AndroidExternalMemoryAndroidHardwareBufferFn exists. And this extension is essential to my use case.
  • It's easy for me to forward the raw parameters with raw function pointers
  • The naming of the raw structs in ash::vk is easier for my codegen to generate the code to load function pointers.

Therefore, the feature request is about raw function pointer structs like ash::vk::KhrSwapchainFn, instead of extension wrappers in the ash::extensions module.

@MarijnS95
Copy link
Collaborator Author

MarijnS95 commented Jun 5, 2023

@06393993 Feel free to contribute a wrapper for missing extensions, request it, or wait until our new generator is capable of emitting sensible extension wrappers automatically. Likewise for layers, I'll be adding a helper library to ash for this and also utilize the new generator to help with dispatch tables and the like.

For now you can achieve a similar thing as #754 by calling the ::load() function twice but it'll still get called with every device and instance function. Properly separating them out to only load device or instance functions is exactly what this issue is about and what is already implemented in a working draft over at #734 (see ash::vk::{KhrSwapchainDeviceFn, KhrSwapchainInstanceFn}) so you're not requesting anything new.

However, that PR will only appear in the upcoming breaking release because this struct separation is inherently breaking, and I don't think it is very useful to create separate loader functions that still return the same struct with known-NULL members as an "interim" workaround (to facilitate #754 and your use-case).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants