Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an alternative backend parallel compilation queue for macOS #5291

Closed
wants to merge 2 commits into from

Conversation

gdkchan
Copy link
Member

@gdkchan gdkchan commented Jun 12, 2023

Due to the way how SPIRV-Cross works, some shaders with deeply nested if/else blocks can cause a stack overflow depending on the stack size. SPIRV-Cross is apparently using recursive function calls in some function where it visits blocks in the code (which is bad in general IMO). To work around this issue, this change basically implements a custom thread pool. Since we create the threads ourselves, we can specify a custom stack size that is large enough to make SPIRV-Cross happy.

The obvious downside with that change is that we need to create our own thread pool, which is most likely going to perform worse than .NET one (also the current implementation is pretty basic, it has 8 queues and adds compilation requests to them in a rotating way, maybe it should have a better method for balancing the load).

An alternative to this would passing some value to the backend that indicates the amount of nesting the shader has. The backend would use that value to decide if it should use Tasks to compile shaders in parallel, or if it should just compile it on the render thread (which already has an increased stack size).

Suggestions are welcome. I didn't really want to upstream that since I think not using Tasks here is a pretty significant downside. But as we come very close to upstreaming all the changes, and there's a general desire to tell everyone to switch to more up-to-date builds instead of using macos1 for specific games, I felt it was worth PRing it, even if just to start discussion about a better way to solve this problem.

We will need tests to ensure shader compilation is not slower on macOS with this change.

Affected games

This fixes a stack overflow on Mortal Kombat 11. I also opened #5290 to fix a different crash that I found while testing it. For this game, I had to increase the stack size to 2MB (on macos1, I used 1MB as that was enough for Splatoon 3).
Captura de Tela 2023-06-12 às 00 44 35
Also fixes the stack overflow crash when starting Splatoon 3.
Captura de Tela 2023-06-12 às 00 47 00
Captura de Tela 2023-06-12 às 00 48 15
Captura de Tela 2023-06-12 às 00 48 46
Contributes to #4062, closes #5282.

@gdkchan gdkchan added gpu Related to Ryujinx.Graphics fix Fix something graphics-backend:vulkan Graphical bugs when using the Vulkan API os:macOS An issue or feature request exclusively relating to macOS labels Jun 12, 2023
@@ -134,7 +134,7 @@ internal class AppHost
_inputManager = inputManager;
_accountManager = accountManager;
_userChannelPersistence = userChannelPersistence;
_renderingThread = new Thread(RenderLoop, 1 * 1024 * 1024) { Name = "GUI.RenderThread" };
_renderingThread = new Thread(RenderLoop, 2 * 1024 * 1024) { Name = "GUI.RenderThread" };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do the same change in SDL2 headless and GTK for consistency.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be in a common place, but I'm not sure where to put it. It will be annoying to change 4 files every time we want to change the stack size.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess maybe some helper in the common project to create a thread with a given name and stack size would be nice?

Then we would have other helper around for the common thread that are duplicated for all projects that have the right values?

Copy link
Member Author

@gdkchan gdkchan Jun 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, the render thread should be created by the ThreadedRenderer and the GPU thread should be created on the GpuContext. But it's not that simple since the OpenGL context is tied to the thread it is created on, and there's also some coupling with the UI and a few places that needs to know if the current thread is the "render" thread. Not something that I will change here though, so I guess I will just increase the stack size of the other threads for now.

@gdkchan
Copy link
Member Author

gdkchan commented Jun 12, 2023

Does this new custom thread pool is only for MacOS, or it's for all platforms? I hope it won't affect other platform, or at least be sure the perf hit is insignificant on modern pc

It is only used on macOS, nothing should change for other platforms.

@AcK77
Copy link
Member

AcK77 commented Jun 14, 2023

It is only used on macOS, nothing should change for other platforms.

Maybe it could be nice to notice that somewhere in the code or at least add an OS check (even if it's useless) just in case?

@gdkchan
Copy link
Member Author

gdkchan commented Jun 14, 2023

It is only used on macOS, nothing should change for other platforms.

Maybe it could be nice to notice that somewhere in the code or at least add an OS check (even if it's useless) just in case?

It already has both the OS check and a comment explaining why it is done:

            if (OperatingSystem.IsMacOS())
            {
                MVKInitialization.Initialize();

                // Any device running on MacOS is using MoltenVK, even Intel and AMD vendors.
                IsMoltenVk = true;

                // The default thread stack size on MacOS is low, and can cause stack overflow
                // on SPIR-V Cross during shader compilation.
                // As a workaround, we use this custom queue which allows us to specify the stack
                // size of the threads used for compilation.
                ShaderCompilationQueue = new ShaderCompilationQueue();
            }

Copy link
Member

@AcK77 AcK77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah the check is already in the existing code, sorry! LGTM then (In cases there is no performance issue in the shader compilation)!

@@ -134,7 +134,7 @@ internal class AppHost
_inputManager = inputManager;
_accountManager = accountManager;
_userChannelPersistence = userChannelPersistence;
_renderingThread = new Thread(RenderLoop, 1 * 1024 * 1024) { Name = "GUI.RenderThread" };
_renderingThread = new Thread(RenderLoop, 2 * 1024 * 1024) { Name = "GUI.RenderThread" };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the sake of readability, it might be nice to make it a named constant - either the whole stack size or the multiplier

@patrick-hovsepian
Copy link
Contributor

An alternative to this would passing some value to the backend that indicates the amount of nesting the shader has. The backend would use that value to decide if it should use Tasks to compile shaders in parallel, or if it should just compile it on the render thread (which already has an increased stack size).

This sounds cleaner to me at face value. Granted, I have very little experience with gpu code myself but if it's something that I'd like to take a crack at. Just not sure if priority/timing is a concern as I'd probably fumble through it.

@riperiperi
Copy link
Member

riperiperi commented Jun 18, 2023

I really don't think that's cleaner, the level of nesting it would split at is arbitrary, and the information is not useful for most platforms. An alternate task queue changes the least amount of shared code, and keeps background compilation active for all shaders.

@patrick-hovsepian
Copy link
Contributor

It seems to me that using framework managed Tasks would be cleaner than hand rolling custom thread management. You both know the space way better than I do though 😅

@Shihta
Copy link
Contributor

Shihta commented Jun 21, 2023

hope it's not a stupid question... if it's just because the stack size is too small, why not just increase it?

refer to dotnet/fsharp#9231 (comment)

export COMPlus_DefaultStackSize=800000 and then start Ryujinx, Splatoon 3 seems to start as normal

@gdkchan
Copy link
Member Author

gdkchan commented Jun 21, 2023

hope it's not a stupid question... if it's just because the stack size is too small, why not just increase it?

refer to dotnet/fsharp#9231 (comment)

export COMPlus_DefaultStackSize=800000 and then start Ryujinx, Splatoon 3 seems to start as normal

I tried that on macOS some time ago and got a crash, maybe I did something wrong. Anyway, it is undesirable to tell users to set an environment variable before running the emulator, so we need a solution that doesn't require any user intervention.

@Shihta
Copy link
Contributor

Shihta commented Jun 22, 2023

hope it's not a stupid question... if it's just because the stack size is too small, why not just increase it?
refer to dotnet/fsharp#9231 (comment)
export COMPlus_DefaultStackSize=800000 and then start Ryujinx, Splatoon 3 seems to start as normal

I tried that on macOS some time ago and got a crash, maybe I did something wrong. Anyway, it is undesirable to tell users to set an environment variable before running the emulator, so we need a solution that doesn't require any user intervention.

environment variable can be set in Ryujinx.app/Contents/Info.plist, refer to here
so we can add these:

    <key>LSEnvironment</key>
    <dict>
        <key>COMPlus_DefaultStackSize</key>
        <string>800000</string>
    </dict>

in this way, users can start Ryujinx in the same way (simply click on the icon)

@sunshineinabox
Copy link
Contributor

I am wondering if this is a reasonable alternative
master...sunshineinabox:Ryujinx:AltSplat

would be better than changing stacksize for all threads.

@gdkchan
Copy link
Member Author

gdkchan commented Jun 22, 2023

hope it's not a stupid question... if it's just because the stack size is too small, why not just increase it?
refer to dotnet/fsharp#9231 (comment)
export COMPlus_DefaultStackSize=800000 and then start Ryujinx, Splatoon 3 seems to start as normal

I tried that on macOS some time ago and got a crash, maybe I did something wrong. Anyway, it is undesirable to tell users to set an environment variable before running the emulator, so we need a solution that doesn't require any user intervention.

environment variable can be set in Ryujinx.app/Contents/Info.plist, refer to here so we can add these:

    <key>LSEnvironment</key>
    <dict>
        <key>COMPlus_DefaultStackSize</key>
        <string>800000</string>
    </dict>

in this way, users can start Ryujinx in the same way (simply click on the icon)

Nice, I didn't know you could do that. Still, maybe increasing the stack size of all threads is a bit excessive (but it would have the advantage that we can match the default stack size of other platforms, which is usually higher than macOS).

I am wondering if this is a reasonable alternative master...sunshineinabox:Ryujinx:AltSplat

would be better than changing stacksize for all threads.

Isn't this basically what this PR does. Although the implementation doesn't look correct, it creates new threads every time a new shader needs to be compiled which I guess is not going to be very efficient. Also it waits for the threads to exit right after which makes it kinda pointless. It should only need to wait when the shaders are actually used.

@sunshineinabox
Copy link
Contributor

That was my bad I was not paying attention and reviewing the changes and yes I would trust your changes more as I am way out of my depth here. My mistake sorry!

@Shihta
Copy link
Contributor

Shihta commented Jun 23, 2023

environment variable can be set in Ryujinx.app/Contents/Info.plist, refer to here so we can add these:

    <key>LSEnvironment</key>
    <dict>
        <key>COMPlus_DefaultStackSize</key>
        <string>800000</string>
    </dict>

in this way, users can start Ryujinx in the same way (simply click on the icon)

Nice, I didn't know you could do that. Still, maybe increasing the stack size of all threads is a bit excessive (but it would have the advantage that we can match the default stack size of other platforms, which is usually higher than macOS).

OK! I think I could look into how to add this to the CI process
I'll send PR when I'm done

@gdkchan
Copy link
Member Author

gdkchan commented Jun 25, 2023

Superseded by #5349.

@gdkchan gdkchan closed this Jun 25, 2023
@gdkchan gdkchan deleted the shader-queue branch June 25, 2023 14:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix Fix something gpu Related to Ryujinx.Graphics graphics-backend:vulkan Graphical bugs when using the Vulkan API os:macOS An issue or feature request exclusively relating to macOS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] apple silicon running splatoon 3 crashing
7 participants