Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add locking to protect against multithreaded conformance test usage. #252

Merged

Conversation

brycehutchings
Copy link
Contributor

The "multithreaded" conformance test is intended to test the runtime but it has exposed two race conditions in the loader:

  1. If debug utils extension is enabled, and instance-specific logger will be added and removed when an XrInstance is created. However, if another thread is trying to log while the list of loggers changes then it may crash. So I added a read/write lock in the logger class to protect this.
  2. If an XrInstance is being destroyed while xrEnumerateInstanceProperties is called, xrDestroyInstance may unload the runtime while xrEnumerateInstanceProperties is calling into the runtime to get instance properties. I consolidated the two existing locks into a "global loader lock" and expanded its use to a few more areas to solve this. I didn't want to introduce a third lock because then you can easily get into accidental dead lock issues if the locks are acquired in the wrong order, so a single lock is safer and the functions that use the lock don't require special concerns around concurrency anyway.

@rpavlik
Copy link
Contributor

rpavlik commented Apr 14, 2021

@ChristophHaag can you help out with a review on this?

Copy link
Contributor

@ChristophHaag ChristophHaag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't checked if those locks really lock everything that needs to be locked but I tried it with the gcc thread sanitizer and this PR fixes almost all of the issues it reports.

The code looks good safe for the two minor possible improvements I noted.

@@ -248,7 +245,6 @@ XRAPI_ATTR XrResult XRAPI_CALL LoaderXrCreateInstance(const XrInstanceCreateInfo

// Make sure only one thread is attempting to read the JSON files and use the instance.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment now stale?
If the reason for the block below was the scope for locking, it doesn't need to be a block anymore now.

@@ -117,8 +113,8 @@ XRAPI_ATTR XrResult XRAPI_CALL LoaderXrEnumerateInstanceExtensionProperties(cons
XrResult result;

{
// Make sure only one thread is attempting to read the JSON files at a time.
std::unique_lock<std::mutex> json_lock(GetLoaderJsonMutex());
// Make sure the runtime isn't unloaded while this call is in progress.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nitpick: Maybe the original comment could be preserved since that lock does both of these things now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neither Bryce nor I could figure out why it mattered that only one thread would read the json at a time, though. We suspect it was left over from when the loader was more ambitious and trying to do multiple instances and comparing runtimes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like the only real reason would be if the json parser was not threadsafe/had global state. I suppose that is not the case?`

@rpavlik rpavlik merged commit e5432f7 into KhronosGroup:master May 10, 2021
rpavlik added a commit that referenced this pull request May 11, 2021
This release contains an update to define a new error code,
XR_ERROR_RUNTIME_UNAVAILABLE, now returned by the loader at
xrCreateInstance and xrEnumerateInstanceProperties when it cannot find
or load a runtime for some reason. This should be more clear for
developers when encountering it, as well as helpful when troubleshooting
errors hit by users. (The previously-returned error was typically
XR_ERROR_INSTANCE_LOST, which is confusing when returned when trying to
create an instance.) This release also includes a new multi-vendor
extension, a new vendor extension, and improved concurrency handling in
the loader, among smaller fixes.

-   Registry
    -   Add new XR_ERROR_RUNTIME_UNAVAILABLE error code, add
        XR_ERROR_RUNTIME_UNAVAILABLE as a supported error code to
        xrCreateInstance and xrEnumerateInstanceProperties, and remove
        XR_ERROR_INSTANCE_LOST as a supported error code from
        xrCreateInstance. (internal MR 2024, internal issue 1552,
        OpenXR-SDK-Source/#177)
    -   Add XR_EXT_hand_joint_motion_range multi-vendor extension.
        (internal MR 1995)
    -   Add XR_FB_swapchain_update_state vendor extension. (internal MR
        1997)
    -   Fix missing XR_ERROR_INSTANCE_LOST return codes for extension
        functions in XR_EXT_performance_settings, XR_EXT_debug_utils,
        XR_EXT_conformance_automation, and XR_EXT_thermal_query.
        (internal MR 2023, OpenXR-Docs/#10, internal issue 1256)
    -   Reserve extension 166 for working group use. (internal MR 2025)
-   SDK
    -   Loader: Change runtime part to return
        XR_ERROR_RUNTIME_UNAVAILABLE when there is an error loading a
        runtime. (internal MR 2024, internal issue 1552,
        OpenXR-SDK-Source/#177)
    -   Loader: Simplify in areas where code paths were dead. (internal
        MR 2024)
    -   Loader: Improved locking around a few areas of the loader that
        aren’t robust against usual concurrent calls.
        (OpenXR-SDK-Source/#252)
    -   validation layer: Fix generated code when a protected extension
        contains a base header type. (internal MR 1997)
rhabacker pushed a commit to rhabacker/OpenXR-SDK-Source that referenced this pull request Nov 16, 2022
This release contains an update to define a new error code,
XR_ERROR_RUNTIME_UNAVAILABLE, now returned by the loader at
xrCreateInstance and xrEnumerateInstanceProperties when it cannot find
or load a runtime for some reason. This should be more clear for
developers when encountering it, as well as helpful when troubleshooting
errors hit by users. (The previously-returned error was typically
XR_ERROR_INSTANCE_LOST, which is confusing when returned when trying to
create an instance.) This release also includes a new multi-vendor
extension, a new vendor extension, and improved concurrency handling in
the loader, among smaller fixes.

-   Registry
    -   Add new XR_ERROR_RUNTIME_UNAVAILABLE error code, add
        XR_ERROR_RUNTIME_UNAVAILABLE as a supported error code to
        xrCreateInstance and xrEnumerateInstanceProperties, and remove
        XR_ERROR_INSTANCE_LOST as a supported error code from
        xrCreateInstance. (internal MR 2024, internal issue 1552,
        OpenXR-SDK-Source/KhronosGroup#177)
    -   Add XR_EXT_hand_joint_motion_range multi-vendor extension.
        (internal MR 1995)
    -   Add XR_FB_swapchain_update_state vendor extension. (internal MR
        1997)
    -   Fix missing XR_ERROR_INSTANCE_LOST return codes for extension
        functions in XR_EXT_performance_settings, XR_EXT_debug_utils,
        XR_EXT_conformance_automation, and XR_EXT_thermal_query.
        (internal MR 2023, OpenXR-Docs/KhronosGroup#10, internal issue 1256)
    -   Reserve extension 166 for working group use. (internal MR 2025)
-   SDK
    -   Loader: Change runtime part to return
        XR_ERROR_RUNTIME_UNAVAILABLE when there is an error loading a
        runtime. (internal MR 2024, internal issue 1552,
        OpenXR-SDK-Source/KhronosGroup#177)
    -   Loader: Simplify in areas where code paths were dead. (internal
        MR 2024)
    -   Loader: Improved locking around a few areas of the loader that
        aren’t robust against usual concurrent calls.
        (OpenXR-SDK-Source/KhronosGroup#252)
    -   validation layer: Fix generated code when a protected extension
        contains a base header type. (internal MR 1997)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants