Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a specific core migration bug on the scheduler #2271

Merged
merged 1 commit into from
May 11, 2021

Conversation

gdkchan
Copy link
Member

@gdkchan gdkchan commented May 7, 2021

This fixes a bug that could occur during context switching if the scheduler decided to move a thread to another core, and then back.
Triggering the bug requires a very specific scenario:

  • Core 1 must pick a new thread "B" on the Schedule method, while already executing a guest thread "A".
  • Core 2 must pick the thread "A" currently executing on core 1.
  • The scheduler must move the thread "A" back to core 1, before thread "A" has a chance to switch to the next thread.

Due to the if (currentThread == nextThread) check, the SwitchTo method will exit early without updating the CurrentCore of the next thread to run. This means that on the above scenario, thread "A" will be running on core 1, but CurrentCore will be set to 2 which is incorrect. The CurrentCore value is used by the thread to pass control to the next thread scheduled on that core for example, so having the wrong value there could lead to other issues.

Triggering in pratice:

I tried a bunch of different things to try triggering this bug, mainly to prove that it can actually happen in pratice. Unfortunately I was not very successful. First I tried to check if it was happening on some games, by adding extra checks to assert that the thread CurrentCore is correct. It did not trigger on the few games I tried. Then I wrote targeted tests that basically waits/signals CVs non-stop while also changing the thread affinity on each iteration, to increase the likelyhood of the scenario described above happening. It also did not trigger the assert. Finally, I decided to "cheat" and just modified the scheduler to swap the threads and create the required scenario artificially. Doing this I was finally able to trigger the assert.

The fix:

The fix was basically moving the if (currentThread == nextThread) so that the CurrentCore value is still set regardless. It was validated using the setup described above. Again, I don't know for sure which games it may fix since I was not able to trigger it without changing the scheduler a little bit to create the scenario required to trigger the bug.

Other changes:

While I was at it, I made two other changes:

  • Added volatile to the scheduler state fields, since they are read/written from different threads, and all of them must be able to see those changes.
  • Made the Yield functions return early if the thread is not "schedulable", which is the case for HLE service threads. So instead of throwing, they just don't do anything on yield. This might be useful in the future if we decide to call Yield from HLE service threads to match the original code.

@gdkchan gdkchan added fix Fix something kernel Related to the kernel labels May 7, 2021
@gdkchan gdkchan marked this pull request as ready for review May 7, 2021 16:38
Copy link
Contributor

@marysaka marysaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch 👍

Copy link
Member

@riperiperi riperiperi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice.

@gdkchan gdkchan merged commit ebdbaa6 into Ryujinx:master May 11, 2021
@gdkchan gdkchan deleted the fix-sched-mig branch May 11, 2021 16:18
@gdkchan gdkchan added the horizon Related to Ryujinx.HLE label May 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix Fix something horizon Related to Ryujinx.HLE kernel Related to the kernel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants