Block gpu thread #2172

degasus · 2015-03-05T19:00:28Z

This PR tries to use Common::Event for CPU <-> GPU synchronization. The goal is to keep the same speed without busy loops all over the code, so it should run much faster on thermally bottlenecked devices and maybe on dual-core CPUs.

magumagu · 2015-03-05T20:39:49Z

I'm not really liking the proliferation of mutexes and condition variables... Fifo.cpp has four mutexes (m_csHWVidOccupied, s_video_buffer_lock, s_fifo_mutex, s_gpu_flush_mutex), and no comments describing what state they actually protect.

Is isGpuReadingData supposed to be atomic?

Source/Core/VideoCommon/Fifo.cpp

@@ -58,6 +60,15 @@ static u8* s_video_buffer_pp_read_ptr;
 // polls, it's just atomic.
 // - The pp_read_ptr is the CPU preprocessing version of the read_ptr.

+// events between cpu and gpu
+static std::atomic<int> s_gpu_is_running;


phire · 2015-03-05T20:45:11Z

I don't think s_gpu_is_running is needed.

Should be able to wait for s_fifo_cond_var directly.

degasus · 2015-03-08T08:12:45Z

@magumagu I've tried to separate the usage of those condition variables and mutexes. If you want, I'll just use the same one for everything, but imo it will be harder to understand the communication if those variables are used in more than two places.

And yes, isGpuReadingData should be atomic, but on x86, volatile also works in practice. As it isn't touched in this PR, I'll leave the cleanup for another one. Maybe a merge of Fifo.cpp and CommandProcessor.cpp?

@phire This atomic is used to keep the condition variable out of the hot RunGpu code. As long as this atomic is set, the GPU thread will recheck everything. Only if it's unset might we need to wake up the GPU thread.
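The fast path described here can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the struct `GpuWakeup`, its members, `Drain()` and `SleepUntilWork()` are hypothetical names, and `pending` is an assumed stand-in for the fifo fill level. Only the name `RunGpu` is borrowed from the discussion.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical sketch: an atomic flag keeps the mutex and condition variable
// out of the producer's hot path.
struct GpuWakeup
{
  std::atomic<int> pending{0};       // assumed stand-in for "data available in the fifo"
  std::atomic<bool> running{false};  // true while the GPU thread promises to recheck
  std::mutex mutex;
  std::condition_variable cond;
  int slow_path_wakeups = 0;         // illustration only, guarded by mutex

  // CPU thread: publish work, then wake the GPU thread only if it may be asleep.
  void RunGpu(int work)
  {
    assert(work > 0);
    pending.fetch_add(work);
    if (running.exchange(true))
      return;  // fast path: flag was already set, no lock and no notify
    {
      std::lock_guard<std::mutex> lock(mutex);
      ++slow_path_wakeups;
    }
    cond.notify_one();
  }

  // GPU thread: take everything that is currently queued.
  int Drain() { return pending.exchange(0); }

  // GPU thread: call after Drain() returned 0. Blocks until RunGpu() sets the flag.
  void SleepUntilWork()
  {
    running.store(false);
    if (pending.load() != 0)
    {
      // Work raced in after the last Drain(): stay awake instead of sleeping.
      running.store(true);
      return;
    }
    std::unique_lock<std::mutex> lock(mutex);
    cond.wait(lock, [this] { return running.load(); });
  }
};
```

The recheck of `pending` after clearing the flag is what prevents a lost wakeup in this sketch: a producer that saw the flag still set has already published its work, so the consumer notices it before going to sleep.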

degasus · 2015-03-08T16:04:57Z

https://dl.dropboxusercontent.com/u/484730/GPUBlock.jpg from @JMC47

The latency of the condition variables (from signaling to wakeup) averages about 4 µs, which is fine.
The CPU thread spent 3% of the real time waiting for the GPU, likely because of the idle-skipping sync hack. This includes all latency issues, so it's almost nothing. But this game only syncs once per frame; others may sync more often.
The GPU went idle about 80 times per frame. The average signaling time is also about 4 µs, but this sums up to 8% of the real time on the CPU thread. This explains the slowdown from 270 fps in master to 246 fps here.

So in my opinion, all latencies are fine, but we wake up the GPU too often. Do you think we should add a heuristic to not sleep, or to not signal as early? The former may be faster, the latter may save more power.
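As a rough cross-check of these figures, the arithmetic can be written out. The sketch below assumes that each wakeup costs the CPU thread the full 4 µs and that this cost is simply added to every frame; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Fraction of one frame that the CPU thread spends signaling the GPU thread.
double SignalingShare(double wakeups_per_frame, double signal_us, double fps)
{
  assert(fps > 0.0);
  const double frame_us = 1e6 / fps;  // frame time in microseconds
  return wakeups_per_frame * signal_us / frame_us;
}

// Frame rate expected when a fixed per-frame cost is added to a baseline.
double FpsWithOverhead(double baseline_fps, double overhead_us_per_frame)
{
  assert(baseline_fps > 0.0);
  const double frame_us = 1e6 / baseline_fps + overhead_us_per_frame;
  return 1e6 / frame_us;
}
```

With the numbers from the comment, `SignalingShare(80, 4, 246)` is about 0.079 (80 × 4 µs = 320 µs out of a frame of roughly 4065 µs), and `FpsWithOverhead(270, 320)` is about 248.5 fps, which is close to the measured 246 fps.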

Sonicadvance1 · 2015-03-08T20:51:15Z

Has it been determined how the GPU is affected when you lock it to maximum performance (max clock speeds)?

degasus · 2015-03-08T22:15:05Z

@Sonicadvance1 This PR shouldn't affect the GPU itself, just the GPU thread. And configuring other CPU priorities didn't affect the latency here :/

Source/Core/VideoCommon/Fifo.cpp

@@ -58,6 +60,26 @@ static u8* s_video_buffer_pp_read_ptr;
 // polls, it's just atomic.
 // - The pp_read_ptr is the CPU preprocessing version of the read_ptr.

+// The next three variables are used to wake up the GPU thread
+// - There is a fast path if the GPU is already running:
+//   If the atomic is set, the polling loop is guaranteed to execute the fifo.


JMC47 · 2015-03-11T23:22:26Z

This is still up to 25% slower in Super Smash Bros. Melee.

Source/Core/VideoCommon/Fifo.cpp

+static std::mutex s_fifo_mutex;
+static std::condition_variable s_fifo_cond_var;
+
+// The next two variables and "isGpuReadingData" are used to block the CPU thread until the GPU is idle


degasus · 2015-03-13T21:54:25Z

Rewritten with events. The logic should be much simpler now.
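For readers unfamiliar with the pattern, an event bundles a flag, a mutex and a condition variable behind a small interface. The class below is a minimal, hypothetical auto-reset event written for illustration only; it is not a copy of Common::Event, and its method names and exact semantics are assumptions.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical auto-reset event: Set() releases one Wait(), which clears the flag again.
class Event
{
public:
  void Set()
  {
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      m_is_set = true;
    }
    m_cond.notify_one();
  }

  // Blocks until the event is set, then resets it.
  void Wait()
  {
    std::unique_lock<std::mutex> lock(m_mutex);
    m_cond.wait(lock, [this] { return m_is_set; });
    m_is_set = false;
  }

  // Like Wait(), but gives up after the timeout. Returns true if the event was set.
  bool WaitFor(std::chrono::milliseconds timeout)
  {
    std::unique_lock<std::mutex> lock(m_mutex);
    const bool signalled = m_cond.wait_for(lock, timeout, [this] { return m_is_set; });
    m_is_set = false;
    return signalled;
  }

private:
  std::mutex m_mutex;
  std::condition_variable m_cond;
  bool m_is_set = false;
};
```

Because the flag is remembered, a `Set()` that arrives before the matching `Wait()` is not lost, which is the property that makes the calling code simpler than hand-written mutex and condition variable pairs.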

JMC47 · 2015-03-13T22:53:15Z

Latest build is actually faster for me than master.

On Rogue Squadron 2's opening, 76 - 78 fps on Master, 78 - 81 FPS on this build. Also faster in Melee, 400 fps in master to 420 fps on Dreamland in this Pull Request.

CPU usage is down about 25% in almost all games.

Tested on two Intel Quadcore + NVIDIA machines.

degasus · 2015-03-13T23:06:46Z

I've chosen to limit the number of wait() calls. Now the GPU thread is only allowed to sleep once per millisecond of emulated time. So we may spend some time in the busy loop, but we'll fall asleep quite soon on idle skipping.

Ready for reviewing and merging in my opinion.
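One way to read the "sleep once per millisecond of emulated time" rule is as a simple rate limiter on the decision to block. The helper below is a hypothetical sketch under that assumption; `SleepLimiter`, `MaySleep` and the tick-based clock are illustrative names, not the PR's code, and the PR may measure the window differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: tells an idle GPU thread whether it may block now
// or has to keep polling in the busy loop.
class SleepLimiter
{
public:
  explicit SleepLimiter(std::uint64_t ticks_per_ms) : m_ticks_per_ms(ticks_per_ms)
  {
    assert(ticks_per_ms > 0);
  }

  // now_ticks is the current emulated time and must not go backwards.
  // Returns true at most once per emulated millisecond.
  bool MaySleep(std::uint64_t now_ticks)
  {
    if (m_has_slept && now_ticks - m_last_sleep_ticks < m_ticks_per_ms)
      return false;  // slept too recently: stay in the busy loop
    m_has_slept = true;
    m_last_sleep_ticks = now_ticks;
    return true;
  }

private:
  std::uint64_t m_ticks_per_ms;
  std::uint64_t m_last_sleep_ticks = 0;
  bool m_has_slept = false;
};
```

The trade-off matches the one discussed earlier in the thread: a shorter window saves more power, a longer one avoids paying the wakeup latency as often.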

degasus · 2015-03-14T07:58:31Z

Updated with a fix for syncgpu and deterministic dual core

Ofunniku · 2015-03-16T06:02:47Z

My Xenoblade save that runs at 17-18 fps on master is 20-21 FPS on this build.

Tested on an Athlon II X2 250

Ofunniku · 2015-03-30T11:39:28Z

Same save and settings profile as above for Xenoblade runs at 29-31 FPS on the latest build of this PR.

Tested on an Athlon II X2 250

EDIT: was this jump from the merge of PR-2192?

degasus · 2015-03-30T14:20:35Z

@Ofunniku yes, so we also need master to compare the performance of this PR

Ofunniku · 2015-03-30T15:40:28Z

For Xenoblade:
4.0-5762: 17-18
This PR (Old): 20-21
4.0-5952: 29-31
This PR (2015-03-29): 29-31

Although I do have some interesting results from SSBM (FoD, 4CPU, Link, Ness, Ice Climbers, Capt. Falcon):
4.0-5762: 43-45
This PR (Old): 53-54
4.0-5952: 55-59
This PR (2015-03-29): 57-63

Ofunniku · 2015-03-30T16:14:50Z

I did this to compare CPU usage between builds. It was run with [Framelimit: Auto].

Xenoblade, Bionis Leg (CPU usage)
4.0-5952: CPU1 ~68% CPU2 98%
PR-2172 (2015-03-29): CPU1 59% CPU2 48%

SSBM (FoD, 4CPU, Link, Ness, Ice Climbers, Capt. Falcon) (CPU usage)
4.0-5952: CPU1 100% CPU2 100%
PR-2172 (2015-03-29): CPU1 ~83% CPU2 ~83%

JMC47 · 2015-04-03T05:37:03Z

I can't find any new slowdown in New Super Mario Bros. Wii with EFB2RAM.

6x IR Master - 43 fps
6x IR PR2172 - 43 fps

Ofunniku · 2015-04-03T05:44:14Z

Benchmarks I've gathered so far: http://pastebin.com/vvbPX2tx
Win7, Athlon II x2 250, Radeon HD 6670


Source/Core/VideoCommon/AsyncRequests.cpp

@@ -51,6 +52,7 @@ void AsyncRequests::PushEvent(const AsyncRequests::Event& event, bool blocking)

 	if (blocking)
 	{
+		RunGpu();


phire · 2015-04-06T11:08:15Z

LGTM

dolphin-emu-bot · 2015-04-06T13:14:43Z

FifoCI detected that this change impacts graphical rendering. Here are the behavior differences detected by the system:

rs2-glass on ogl-lin-nv: diff

automated-fifoci-reporter

Sonicadvance1 · 2015-04-07T23:05:09Z

Breaks AArch64.

degasus force-pushed the block_gpu_thread branch 2 times, most recently from 3ef061e to 4adadef Compare March 5, 2015 20:18

lioncash reviewed Mar 5, 2015

degasus force-pushed the block_gpu_thread branch 5 times, most recently from e8ba439 to 8b02629 Compare March 7, 2015 16:00

degasus force-pushed the block_gpu_thread branch from 8b02629 to 5c9ce69 Compare March 8, 2015 16:34

degasus force-pushed the block_gpu_thread branch from 5c9ce69 to 1fe7067 Compare March 11, 2015 22:28

phire reviewed Mar 11, 2015

degasus force-pushed the block_gpu_thread branch 4 times, most recently from a24761d to 44597bc Compare March 13, 2015 21:53

degasus force-pushed the block_gpu_thread branch from 82e1999 to ed1fe40 Compare March 14, 2015 07:52

degasus force-pushed the block_gpu_thread branch 5 times, most recently from f3dd175 to ffcbaaa Compare March 15, 2015 17:26

degasus force-pushed the block_gpu_thread branch 2 times, most recently from b41e791 to bc38f7b Compare March 29, 2015 13:23

degasus and others added 6 commits April 6, 2015 12:35

Fifo: Replace busy loop with condition variable

279c657

Fifo: use the outer loop on sync GPU

9bdaa00

Fifo: rewrite sync on idle skipping hack

b020ae1

Now it's done without a busy loop

Fifo: only sleep once within every ms of emulated time

d2c62b1

Fifo: only touch the SIMD state once in the single core loop

b1ffd32

Fifo: rewrite Fifo_PauseAndLock

74795b4

This lock isn't required any more, as our FlushGpu guarantees to block until the GPU is idle.

phire reviewed Apr 6, 2015

degasus force-pushed the block_gpu_thread branch from bc38f7b to 74795b4 Compare April 6, 2015 10:51

degasus added a commit that referenced this pull request Apr 6, 2015

Merge pull request #2172 from degasus/block_gpu_thread

4669b50

Block gpu thread

degasus merged commit 4669b50 into dolphin-emu:master Apr 6, 2015

Sonicadvance1 added a commit to Sonicadvance1/dolphin that referenced this pull request May 11, 2015

[AArch64] Disable psq_l.

2d47a15

Causes flickering in games ever since PR dolphin-emu#2172. No idea why

Sonicadvance1 mentioned this pull request May 11, 2015

[AArch64] Disable psq_l. #2397

Merged

degasus deleted the block_gpu_thread branch August 9, 2015 08:59
