Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(mvk): resumeLostDevice #4800

Merged
merged 1 commit into from May 10, 2023
Merged

Conversation

AaronO
Copy link
Contributor

@AaronO AaronO commented May 4, 2023

Command buffer errors currently trigger an exception "DeviceLost" crashing the process.

Looking at MKV's code we observe that:

  • It hard fails if error is:
    MTLCommandBufferErrorBlacklisted || MTLCommandBufferErrorNotPermitted || MTLCommandBufferErrorDeviceRemoved
    
  • Otherwise fails conditionally if config.resumeLostDevice == false (current default)

For Ryujinx's use-case it's more graceful to resume on those errors rather than crashing the app, the error isn't totally silenced since mvk still logs it

Fixes #4704, #4575

Notes

  • Users can currently workaround this by setting this env MVK_CONFIG_RESUME_LOST_DEVICE=1, but IMO this should be the default behaviour

Command buffer errors currently trigger an exception "DeviceLost" crashing the process.

Looking at [MKV's code](https://github.com/KhronosGroup/MoltenVK/blob/53a4eb26f2fbd7322eb087eb4af263fe466543b0/MoltenVK/MoltenVK/GPUObjects/MVKQueue.mm#L392-L408) we observe that:
- It hard fails if error is:
  ```
   MTLCommandBufferErrorBlacklisted || MTLCommandBufferErrorNotPermitted || MTLCommandBufferErrorDeviceRemoved
   ```
- Otherwise fails conditionally if `config.resumeLostDevice == false` (current default)

For Ryujinx's use-case it's more graceful to resume on those errors rather than crashing the app, the error isn't totally silenced since `mvk` still logs it

Fixes Ryujinx#4704, Ryujinx#4575
@gdkchan
Copy link
Member

gdkchan commented May 4, 2023

The device lost error is usually returned when there's something wrong, like a Metal error. For example, it is returned when Metal reports a InvalidResource error. Ignoring it does not fix the underlying issue, and the scene will not render correctly since it will likely just drop the problematic command buffer. It is better to fix the root issue than to do this. It could be useful for the few cases where the game is still playable despite it triggering errors here and there.

@IsaacMarovitz
Copy link
Member

IsaacMarovitz commented May 4, 2023

The device lost error is usually returned when there's something wrong, like a Metal error. For example, it is returned when Metal reports a InvalidResource error. Ignoring it does not fix the underlying issue, and the scene will not render correctly since it will likely just drop the problematic command buffer. It is better to fix the root issue than to do this. It could be useful for the few cases where the game is still playable despite it triggering errors here and there.

It's worth testing the specific game listed. If it looks fine IMO this is something we should consider. It's pretty common to use this env var.

@gdkchan
Copy link
Member

gdkchan commented May 4, 2023

Those games do have InvalidResource error, so I doubt they will look fine. But it shouldn't be hard to test.

@AaronO
Copy link
Contributor Author

AaronO commented May 4, 2023

I agree it would be best to fix the root issue, but in my experience apps/games with these issues are still useable/tolerable, so this could be a reasonable patch until the root issue is resolved, landing this patch wouldn't hinder investigating it (errors are still logged and devs can simply set this to false)

A visual glitch / misrender is preferable than a crash to the end-user, if an app is glitchy they can choose to stop using it, if the emulator crashes they have no choice, so it's a "lesser evil" IMO

@AaronO
Copy link
Contributor Author

AaronO commented May 4, 2023

I haven't tested many applications but in my experience this error was relatively infrequent and "random" in appearance. In practice this isn't failing every draw call but once every 10mn (or more)

@riperiperi
Copy link
Member

This option is enabled on the macos1 build, but there are lots of mitigations that avoid the moltenvk bug that causes the Invalid Resource errors. I've explained why I think the error appears over here:

KhronosGroup/MoltenVK#1870

@AcK77 AcK77 added gpu Related to Ryujinx.Graphics fix Fix something labels May 5, 2023
Copy link
Member

@riperiperi riperiperi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is reasonable to enable, but we should probably at least partially work around Invalid Resource errors.

@AcK77 AcK77 merged commit a7c6e6a into Ryujinx:master May 10, 2023
6 checks passed
@AaronO AaronO deleted the fix/mvk-resumeLostDevice branch May 10, 2023 08:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix Fix something gpu Related to Ryujinx.Graphics
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Mario Strikers: Battle League - Crash when starting any match or training
5 participants