Block gpu thread #2172
I'm not really liking the proliferation of mutexes and condition variables... Fifo.cpp has four mutexes (m_csHWVidOccupied, s_video_buffer_lock, s_fifo_mutex, s_gpu_flush_mutex), and no comments describing what state they actually protect. Is isGpuReadingData supposed to be atomic?
@@ -58,6 +60,15 @@ static u8* s_video_buffer_pp_read_ptr;
// polls, it's just atomic.
// - The pp_read_ptr is the CPU preprocessing version of the read_ptr.

// events between cpu and gpu
static std::atomic<int> s_gpu_is_running;
I don't think s_gpu_is_running is needed. You should be able to wait on s_fifo_cond_var directly.
@magumagu I've tried to keep the usage of those condition variables and mutexes separate. If you want, I'll just use the same one for everything, but in my opinion the communication will be harder to understand if those variables are used in more than two places. And yes, isGpuReadingData should be atomic, but on x86, volatile also works. As it isn't touched in this PR, I'll leave that cleanup for another one. Maybe a merge of Fifo.cpp and CommandProcessor.cpp?

@phire This atomic is used to keep the condition variable out of the hot RunGpu code. As long as this atomic is set, the GPU thread will recheck everything. Only if it's unset might we need to wake up the GPU thread.
https://dl.dropboxusercontent.com/u/484730/GPUBlock.jpg from @JMC47
So in my opinion, all latencies are fine, but we wake up the GPU thread too often. Do you think we should add a heuristic to not sleep, or to not signal as early? The former may be faster; the latter may save more power.
Has it been determined how the GPU is affected when you lock it to maximum performance (max clock speeds)?
@Sonicadvance1 This PR shouldn't affect the GPU itself, just the GPU thread. And configuring other CPU priorities didn't affect the latency here :/
@@ -58,6 +60,26 @@ static u8* s_video_buffer_pp_read_ptr;
// polls, it's just atomic.
// - The pp_read_ptr is the CPU preprocessing version of the read_ptr.

// The next three variables are used to wake up the GPU thread
// - There is a fast-path if the GPU is already running:
// If the atomic is set, the polling loop is guaranteed to execute the fifo.
This is still up to 25% slower in Super Smash Bros. Melee.
static std::mutex s_fifo_mutex;
static std::condition_variable s_fifo_cond_var;

// The next two variables and "isGpuReadingData" are used to block the CPU thread until the GPU is idle
Rewritten with events. The logic should be much simpler now.
The latest build is actually faster for me than master. In Rogue Squadron 2's opening: 76-78 FPS on master, 78-81 FPS on this build. It is also faster in Melee on Dreamland: 400 FPS on master versus 420 FPS with this pull request. CPU usage is down about 25% in almost all games. Tested on two quad-core Intel + NVIDIA machines.
I've chosen to limit the number of wait() calls. The GPU thread is now only allowed to sleep once per millisecond of emulated time. So we may spend some time in the busy loop, but we'll fall asleep quite soon during idle skipping. Ready for review and merging, in my opinion.
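The "sleep at most once per millisecond of emulated time" rule can be pictured with a small sketch like the one below. This is an assumption about the shape of such a limiter, not the code in this PR: `SleepLimiter`, `TryAcquire` and `ticks_per_ms` are hypothetical names, and the tick counter is whatever emulated clock the caller has available.

```cpp
#include <cstdint>

// Hypothetical illustration only; none of these names exist in Dolphin.
class SleepLimiter
{
public:
  // ticks_per_ms: how many emulated clock ticks make up one millisecond.
  explicit SleepLimiter(std::uint64_t ticks_per_ms) : m_ticks_per_ms(ticks_per_ms) {}

  // Returns true if the caller may block (wait) now. Returns false if it
  // already slept within the last emulated millisecond and should keep
  // spinning in its busy loop instead.
  bool TryAcquire(std::uint64_t now_ticks)
  {
    if (m_has_slept && now_ticks - m_last_sleep_ticks < m_ticks_per_ms)
      return false;
    m_has_slept = true;
    m_last_sleep_ticks = now_ticks;
    return true;
  }

private:
  std::uint64_t m_ticks_per_ms;
  std::uint64_t m_last_sleep_ticks = 0;
  bool m_has_slept = false;
};
```

A polling thread would call `TryAcquire` with the current emulated tick count whenever it runs out of work, and only enter a blocking wait when it returns true. The trade-off is the one described above: some time is spent busy-looping, but the thread still goes to sleep quickly once the emulated CPU is idle.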
Updated with a fix for syncgpu and deterministic dual core.
My Xenoblade save that runs at 17-18 FPS on master runs at 20-21 FPS on this build. Tested on an Athlon II X2 250.
The same Xenoblade save and settings profile as above runs at 29-31 FPS on the latest build of this PR. Tested on an Athlon II X2 250. EDIT: was this jump from the merge of PR-2192?
@Ofunniku Yes, so we also need numbers from master to compare against the performance of this PR.
For Xenoblade: Although I do have some interesting results from SSBM (FoD, 4CPU, Link, Ness, Ice Climbers, Capt. Falcon):
I did this to compare CPU usage between builds. This was run with [Framelimit: Auto].
Xenoblade, Bionis Leg (CPU usage)
SSBM (FoD, 4CPU, Link, Ness, Ice Climbers, Capt. Falcon) (CPU usage)
I can't find any new slowdown in New Super Mario Bros. Wii with EFB2RAM.
6x IR, master: 43 fps
Benchmarks I've gathered so far: http://pastebin.com/vvbPX2tx
Now it's done without a busy loop.
This lock isn't required any more, as our FlushGpu guarantees to block until the GPU is idle.
@@ -51,6 +52,7 @@ void AsyncRequests::PushEvent(const AsyncRequests::Event& event, bool blocking)

if (blocking)
{
RunGpu();
LGTM
FifoCI detected that this change impacts graphical rendering. Here are the behavior differences detected by the system:
automated-fifoci-reporter
Breaks AArch64.
Causes flickering in games ever since PR dolphin-emu#2172. No idea why.
This PR tries to use Common::Event for CPU <-> GPU synchronization. It tries to reach the same speed without using busy loops all over the code, so it should run much faster on thermally bottlenecked devices and possibly on dual-core CPUs.