Fix a specific core migration bug on the scheduler #2271
Merged
This fixes a bug that could occur during context switching if the scheduler decided to move a thread to another core, and then back.
Triggering the bug requires a very specific scenario:
`Schedule` method, while already executing a guest thread "A".

Due to the `if (currentThread == nextThread)` check, the `SwitchTo` method will exit early without updating the `CurrentCore` of the next thread to run. This means that in the above scenario, thread "A" will be running on core 1, but `CurrentCore` will be set to 2, which is incorrect. The thread uses the `CurrentCore` value to, for example, pass control to the next thread scheduled on that core, so having the wrong value there could lead to other issues.

Triggering in practice:
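A heavily simplified Java model makes the trigger condition concrete. Everything in it is hypothetical (`BuggySwitchToModel`, `ModelThread`, `switchTo`, and `checkCurrentCore` are illustrative names, not the scheduler's real code). It assumes thread "A" is already executing on core 1 while its `currentCore` still says 2, which is the state the scenario above leaves behind, and shows that the early exit never corrects it. `checkCurrentCore` sketches the kind of assertion described in the next paragraph.

```java
// Hypothetical, heavily simplified model of the buggy ordering in SwitchTo.
// None of these names come from the real scheduler.
final class BuggySwitchToModel {
    static final class ModelThread {
        final String name;
        int currentCore; // core this thread believes it is running on

        ModelThread(String name, int currentCore) {
            this.name = name;
            this.currentCore = currentCore;
        }
    }

    // Buggy ordering: the early exit skips the currentCore update.
    static void switchTo(int coreId, ModelThread current, ModelThread next) {
        if (current == next) {
            return; // next.currentCore keeps whatever (possibly stale) value it had
        }
        next.currentCore = coreId;
        // ... the actual context switch would happen here ...
    }

    // Debug assertion: the thread running on coreId must agree on its core.
    static void checkCurrentCore(int coreId, ModelThread running) {
        if (running.currentCore != coreId) {
            throw new IllegalStateException(running.name + " runs on core " + coreId
                    + " but currentCore = " + running.currentCore);
        }
    }

    public static void main(String[] args) {
        // Assumed state: "A" executes on core 1, but currentCore was left at 2.
        ModelThread a = new ModelThread("A", 2);
        // Core 1 reschedules and selects "A", which is already running there.
        switchTo(1, a, a);
        try {
            checkCurrentCore(1, a);
            System.out.println("check passed");
        } catch (IllegalStateException e) {
            System.out.println("check failed: " + e.getMessage());
        }
    }
}
```

Running `main` prints `check failed: A runs on core 1 but currentCore = 2`, the same kind of mismatch that only showed up once the scenario was created artificially.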
I tried a number of different things to trigger this bug, mainly to prove that it can actually happen in practice. Unfortunately, I was not very successful. First, I checked whether it was happening in some games by adding extra checks asserting that the thread's `CurrentCore` is correct. The checks did not trigger in the few games I tried. Then I wrote targeted tests that wait on and signal condition variables (CVs) non-stop while also changing the thread affinity on each iteration, to increase the likelihood of the scenario described above. These did not trigger the assert either. Finally, I decided to "cheat" and modified the scheduler to swap the threads, creating the required scenario artificially. Doing this, I was finally able to trigger the assert.

The fix:
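In a simplified Java model, the fix amounts to recording the core before the early-exit check, so the update happens whether or not a context switch follows. This is a sketch under stated assumptions: `FixedSwitchToModel`, `ModelThread`, and `switchTo` are hypothetical names, and the real method also performs the actual context switch, which is omitted here.

```java
// Hypothetical model of the fixed ordering: currentCore is set unconditionally.
final class FixedSwitchToModel {
    static final class ModelThread {
        int currentCore; // core this thread believes it is running on

        ModelThread(int currentCore) {
            this.currentCore = currentCore;
        }
    }

    static void switchTo(int coreId, ModelThread current, ModelThread next) {
        next.currentCore = coreId; // always record where next runs
        if (current == next) {
            return; // same thread keeps running; no context switch needed
        }
        // ... the actual context switch would happen here ...
    }

    public static void main(String[] args) {
        // Assumed stale state: "A" executes on core 1, but currentCore is 2.
        ModelThread a = new ModelThread(2);
        switchTo(1, a, a);
        System.out.println("A.currentCore = " + a.currentCore);
    }
}
```

Running `main` prints `A.currentCore = 1`. Doing the assignment first costs only one extra store on the no-switch path.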
The fix was basically moving the `if (currentThread == nextThread)` check so that the `CurrentCore` value is still set regardless. It was validated using the setup described above. Again, I don't know for sure which games it may fix, since I was not able to trigger the bug without modifying the scheduler to artificially create the required scenario.

Other changes:
While I was at it, I made two other changes:
- Added `volatile` to the scheduler state fields, since they are read and written from different threads, and all of them must be able to see those changes.
- `Yield` functions now return early if the thread is not "schedulable", which is the case for HLE service threads. Instead of throwing, they simply do nothing on yield. This might be useful in the future if we decide to call `Yield` from HLE service threads to match the original code.
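Both side changes can be sketched together in Java. The model is hypothetical (`SchedulerSideChangesModel`, `SchedulerState`, `needsScheduling`, `SchedThread`, `schedulable`, and `yieldThread` are illustrative names): one thread spins on a `volatile` flag that another thread writes, and a yield returns early for non-schedulable threads instead of throwing. The exact guarantees of `volatile` differ between languages, so treat this only as an illustration of cross-thread visibility.

```java
// Hypothetical sketch of the two side changes; not the real scheduler code.
final class SchedulerSideChangesModel {
    // Scheduler state shared between threads. Without volatile, the spinning
    // reader in main() is not guaranteed to ever observe the write.
    static final class SchedulerState {
        volatile boolean needsScheduling;
    }

    static final class SchedThread {
        final boolean schedulable; // false models an HLE service thread

        SchedThread(boolean schedulable) {
            this.schedulable = schedulable;
        }
    }

    // Yield on a non-schedulable thread is a no-op instead of an error.
    static void yieldThread(SchedulerState state, SchedThread thread) {
        if (!thread.schedulable) {
            return; // nothing to reschedule for this thread
        }
        state.needsScheduling = true; // request that the scheduler run
    }

    public static void main(String[] args) throws InterruptedException {
        SchedulerState state = new SchedulerState();

        yieldThread(state, new SchedThread(false)); // HLE-style thread: ignored
        System.out.println("after non-schedulable yield: " + state.needsScheduling);

        // Another thread waits until it observes the volatile write.
        Thread observer = new Thread(() -> {
            while (!state.needsScheduling) {
                Thread.onSpinWait();
            }
            System.out.println("observer saw needsScheduling = true");
        });
        observer.start();
        yieldThread(state, new SchedThread(true)); // guest-style thread
        observer.join();
    }
}
```

The early return makes `yieldThread` safe to call for any kind of thread, which is what would allow calling `Yield` from HLE service threads later without special-casing the caller.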