New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NetPlay/Jit64: Avoid using software FMA #9804
Conversation
09883ec
to
23786a3
Compare
When I added the software FMA path in 2c38d64 and made us use it when determinism is enabled, I was assuming that either the performance impact of software FMA wouldn't be too large or CPUs that were too old to have FMA instructions were too slow to run Dolphin well anyway. This was wrong. To give an example, the netplay performance went from 60 FPS to 30 FPS in one case. This change makes netplay clients negotiate whether FMA should be used. If all clients use an x64 CPU that supports FMA, or AArch64, then FMA is enabled, and otherwise FMA is disabled. In other words, we sacrifice accuracy if needed to avoid massive slowdown, but not otherwise. When not using netplay, whether to enable FMA is simply based on whether the host CPU supports it. The only remaining case where the software FMA path gets used under normal circumstances is when an input recording is created on a CPU with FMA support and then played back on a CPU without. This is not an especially common scenario (though it can happen), and TASers are generally less picky about performance and more picky about accuracy than other users anyway. With this change, FMA desyncs are avoided between AArch64 and modern x64 CPUs (unlike before 2c38d64), but we do get FMA desyncs between AArch64 and old x64 CPUs (like before 2c38d64). This desync can be avoided by adding a non-FMA path to JitArm64 as an option, which I will wait with for another pull request so that we can get the performance regression fixed as quickly as possible. https://bugs.dolphin-emu.org/issues/12542
| @@ -272,7 +275,7 @@ void INIGameConfigLayerLoader::Save(Config::Layer* layer) | |||
| const Config::Location& location = config.first; | |||
| const std::optional<std::string>& value = config.second; | |||
|
|
|||
| if (!IsSettingSaveable(location)) | |||
| if (!IsSettingSaveable(location) || location.system == Config::System::Session) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we just modify IsSettingSaveable to return false for location.system == Config::System::Session instead? Maybe it already does that, which would make this check redundant I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IsSettingSaveable marks settings that are still handled by the old config system, and based on code comments, it seems like we intend to drop IsSettingSaveable when the old config system is dead. The Session system is already handled by the new config system, and we want to keep not saving it in the future when the old config system is gone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yeah okay. I guess Session is where anything that we don't want to be persistent should go. I'm gonna make a follow-up PR that moves a few of such settings there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It still feels a bit weird that Config::System::Session shows up scattered around everywhere. Isn't this what the layered configuration system was designed to solve in first place? Have a layer that is session-only, doesn't save, and has that single setting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The layered config system supports session-only values, but not session-only settings. I want to make it impossible for this setting to get loaded from Dolphin.ini so that nobody overrides the default by accident.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't tested this yet, but code looks good. I don't have pre-Haswell hardware in service to properly test anyways, just decommissioned my Sandy Bridge hardware last month.
|
I asked people in the netplay community that I got the performance regression report from to test it, and they said things work with good performance and without desyncs regardless of whether the players' CPUs support FMA. I've also verified on my own that when you run netplay with only CPUs that support FMA, you do take the hardware FMA code path. |
When I added the software FMA path in 2c38d64 and made us use it when determinism is enabled, I was assuming that either the performance impact of software FMA wouldn't be too large or CPUs that were too old to have FMA instructions were too slow to run Dolphin well anyway. This was wrong. To give an example, the netplay performance went from 60 FPS to 30 FPS in one case.
This change makes netplay clients negotiate whether FMA should be used. If all clients use an x64 CPU that supports FMA, or AArch64, then FMA is enabled, and otherwise FMA is disabled. In other words, we sacrifice accuracy if needed to avoid massive slowdown, but not otherwise. When not using netplay, whether to enable FMA is simply based on whether the host CPU supports it.
The only remaining case where the software FMA path gets used under normal circumstances is when an input recording is created on a CPU with FMA support and then played back on a CPU without. This is not an especially common scenario (though it can happen), and TASers are generally less picky about performance and more picky about accuracy than other users anyway.
With this change, FMA desyncs are avoided between AArch64 and modern x64 CPUs (unlike before 2c38d64), but we do get FMA desyncs between AArch64 and old x64 CPUs (like before 2c38d64). This desync can be avoided by adding a non-FMA path to JitArm64 as an option, which I will wait with for another pull request so that we can get the performance regression fixed as quickly as possible.
https://bugs.dolphin-emu.org/issues/12542