Use Linux timers for sleeps up to 1ms #6697
Conversation
As with any good timer patch for rpcs3, please include comparison results with this reference testcase.
What about the BSDs? Couldn't this break builds on other Unices that don't support Linux timers?
@hardBSDk No, it can't break anything.
@Nekotekina Thanks! This will be the first time I run RPCS3 since I got a computer. Love your work @kd-11 @Nekotekina
@kd-11 Thanks for the testcase. I will give it a try and report the results back. @Nekotekina You are right, timerfd does not support signalling; the thread will block during execution of the sleeps. For that reason I only implemented waits for small intervals. I'm totally fine with reducing the maximum sleep time from 1000us to, let's say, 250us or even down to 100us. This is essentially the time interval the optimization aims for.
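For reference, a minimal sketch of the kind of blocking timerfd wait being discussed; the helper name timerfd_sleep and its shape are illustrative, not the patch's actual code:

#include <sys/timerfd.h>
#include <unistd.h>
#include <cstdint>

// Block the calling thread for `usec` microseconds using a one-shot timerfd.
// The read() blocks until the timer expires, which is exactly why this path
// cannot be signalled early and only suits short, non-alertable waits.
static void timerfd_sleep(int fd, std::uint64_t usec)
{
	itimerspec spec{};
	spec.it_value.tv_sec  = usec / 1000000;
	spec.it_value.tv_nsec = (usec % 1000000) * 1000;
	// it_interval stays zero: the timer fires once

	timerfd_settime(fd, 0, &spec, nullptr);

	std::uint64_t expirations = 0;
	(void)read(fd, &expirations, sizeof(expirations)); // blocks until expiry
}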
Here are the test results:
Btw, I got a mail but did not find anything here... Someone mentioned that the patch must increase the PPU cache version. But where?
Elad made that comment but deleted it afterwards.
Sorry, looks like I forgot to attach the base test (0-600).
Seems that at idle the accuracy is approx. 16us, but if I switch to the performance governor the accuracy is 10us. Since the load of RPCS3 will push any system to its maximum power state, it makes sense to test the accuracy at maximum power state as well. So I suggest simply setting the min quantum to 10us rather than 16us; this will also have the benefit of "fixing" the behavior when the accuracy is set to usleep for values of 30us, since …
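To make the quantum suggestion concrete, a rough sketch of the gating it implies, reusing the hypothetical timerfd_sleep helper from above; subtracting the quantum as an accuracy margin, and the name g_quantum, are my assumptions, not the patch's actual code:

#include <cstdint>

// Assumed minimum timer accuracy in microseconds (16us at idle,
// ~10us under the performance governor, per the measurements above).
constexpr std::uint64_t g_quantum = 10;

void short_sleep(int timer_fd, std::uint64_t usec)
{
	if (usec > g_quantum)
	{
		// Let the kernel timer absorb everything above the accuracy margin,
		// so a 30us wait in usleep mode no longer busy-loops the full interval.
		timerfd_sleep(timer_fd, usec - g_quantum);
	}
	// The remaining <= g_quantum microseconds are left to the existing
	// strategy: a yielding loop (usleep mode) or accepted as jitter (host mode).
}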
I initially started with a 15us quantum but felt that 16us is a little more accurate. As requested, here are all the numbers. I ran the test with 16us/50us quantums from the last commit.
Host idle:
Single-thread load of dd if=/dev/zero of=/dev/null:
Looks like the timer drifts a lot more on your computer than mine.
host, this PR:
I added an "alert" parameter to the wait_for function. The timer should only be used if "alert" is false; otherwise it'll significantly increase synchronization latency.
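Sketching that constraint (the alert parameter and the wait_for name come from the comment above; the body, the 1000us bound, and the free-function shape are illustrative):

#include <cstdint>

// Only take the blocking timerfd path for short, non-alertable waits:
// a thread stuck in read() on the timer cannot be woken by a notification.
void wait_for(std::uint64_t usec, bool alert, int timer_fd)
{
	if (!alert && usec <= 1000 && timer_fd != -1)
	{
		timerfd_sleep(timer_fd, usec); // helper sketched earlier
		return;
	}
	// Alertable (or longer) waits keep using the condition-variable path,
	// which a notification can interrupt at any time.
}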
Hopefully I rebased the pull request correctly to fit Nekotekina's additions.
You rebased the wrong way around; you should rebase rpcs3/master onto master.
Urgh.
Better now?
Nope, the commit list still has 20 commits instead of 2.
You may want a clean start here.
I think it's three commits though?
Forgot to mention you have to checkout master between 1 and 2 to avoid losing your work.
I properly rebased it on my plappermaul branch if you need help.
@kd-11 Thanks, I will try your instructions.
v1: Initial version
v2: Implement review comments
v3: Adapt to new API

The current sleep timer implementation basically offers two variants: either wait the specified time exactly with a condition variable (host), or use a combination of it with a thread-yielding busy loop afterwards (usleep timer). While the second one is very precise, it consumes CPU cycles for each wait call below 50us. Games like Bomberman Ultra spam 30us waits, and the emulator hogs low-power CPUs. Switching to host mode reduces CPU consumption but adds a ~50us penalty to each wait call, thus extending all sleeps by a factor of more than two.

The following bugfix tries to improve the system timer for Linux by using native Linux timers for small wait calls below 1ms. This has two effects:
- The host wait setting has much less wait overhead
- The usleep wait setting produces lower CPU overhead

Some numbers for the host timer setting, from my tests on a Pentium G5600 with UHD 630, waiting in the Bomberman welcome screen. I shortened/lengthened the game timer inside the emulator to get a better picture for different wait lengths. As you can see, the current implementation always produces a ~50us overhead while the new implementation mostly stays below 10us. us(er), sy(stem) and id(le) have been taken from vmstat during the tests.

sleeps of 70usec
                   Calls    >=120us  <120us  <95us   <80us   <73us   us  sy  id
Master run 1:      1000000  708599   144933  114607  27954   3906    44  12  15
Master run 2:      1000000  707853   145802  114613  27757   3975    45  12  43
Patch run 1:       1000000  24478    37779   122771  679292  135679  46  13  41
Patch run 2:       1000000  27544    38647   120150  676306  137353  45  13  42

sleeps of 60usec
                   Calls    >=110us  <110us  <85us   <70us   <63us   us  sy  id
Master run 1:      1000000  695187   167665  107111  26767   3269    42  11  47
Master run 2:      1000000  698397   166151  106322  25889   3241    42  11  46
Patch run 1:       1000000  23266    36454   131397  651232  157650  44  12  44
Patch run 2:       1000000  27780    41361   141313  636585  152961  45  12  42

sleeps of 50usec
                   Calls    >=100us  <100us  <75us   <60us   <53us   us  sy  id
Master run 1:      1000000  690729   183766  97207   25160   3137    43  12  46
Master run 2:      1000000  689518   184570  97716   25131   3065    42  11  47
Patch run 1:       1000000  21068    34504   124814  646399  173214  45  13  42
Patch run 2:       1000000  22531    36852   130585  638397  171635  44  12  44

sleeps of 40usec
                   Calls    >=90us   <90us   <65us   <50us   <43us   us  sy  id
Master run 1:      1000000  688084   176572  111680  20357   3306    45  12  44
Master run 2:      1000000  687553   177216  111599  20409   3223    46  12  42
Patch run 1:       1000000  18164    31248   113778  643851  192958  44  12  44
Patch run 2:       1000000  20985    34841   120508  633031  190635  45  12  43

sleeps of 30usec
                   Calls    >=80us   <80us   <55us   <40us   <33us   us  sy  id
Master run 1:      1000000  721705   205084  60793   12060   357     44  12  45
Master run 2:      1000000  720323   205960  61524   11884   309     43  11  46
Patch run 1:       1000000  15139    16863   101604  629094  227299  44  12  44
Patch run 2:       1000000  18560    30207   110159  617093  223981  45  12  43

sleeps of 20usec
                   Calls    >=70us   <70us   <45us   <30us   <23us   us  sy  id
Master run 1:      1000000  813648   144746  36458   5111    36      43  12  45
Master run 2:      1000000  813322   144917  36618   5097    46      45  12  43
Patch run 1:       1000000  14073    23076   83921   635412  243517  45  13  42
Patch run 2:       1000000  13769    23460   86245   632826  243700  44  13  43

sleeps of 10usec
                   Calls    >=60us   <60us   <35us   <20us   <13us   us  sy  id
Master run 1:      1000000  864216   101101  29002   5651    29      43  12  45
Master run 2:      1000000  864896   100595  28941   5550    18      42  11  47
Patch run 1:       1000000  7613     13301   52335   640861  285889  46  13  41
Patch run 2:       1000000  7223     13280   52123   644643  282731  47  13  40

Comparison between host and usleep setting for game defaults of 30us waits:
                   fps  us  sy  id
Master run host  : 53   43  11  46
Patch run host   : 52   44  12  44
Master run usleep: 49   51  18  31
Patch run usleep : 51   48  15  37
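For readers unfamiliar with the two modes the commit message contrasts, a rough sketch of their shapes (illustrative, not the emulator's actual code):

#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// "host" mode: a single condition-variable wait; cheap on the CPU, but
// scheduler wakeup latency added roughly 50us per sleep on the tested system.
void host_wait(std::condition_variable& cv, std::mutex& m, std::chrono::microseconds usec)
{
	std::unique_lock lock(m);
	cv.wait_for(lock, usec);
}

// "usleep" mode: burn the tail of the interval in a yielding loop after the
// condition-variable wait; very precise, but costs CPU cycles for every wait
// below ~50us.
void usleep_wait(std::chrono::steady_clock::time_point deadline)
{
	while (std::chrono::steady_clock::now() < deadline)
		std::this_thread::yield();
}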
Got it somehow ...
Utilities/Thread.h
Outdated
@@ -118,6 +118,11 @@ class thread_base
using native_entry = void*(*)(void* arg);
#endif

#ifdef __linux__
// Linux thread timer
int m_timer;
Initialize as -1
Applied.
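Presumably the declaration from the Thread.h hunk above now reads as follows; the in-class initializer is my assumption about how the request was applied:

#ifdef __linux__
	// Linux thread timer; -1 marks "not allocated", so cleanup code can
	// check the descriptor before closing it
	int m_timer = -1;
#endif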
Utilities/Thread.cpp
Outdated
m_timer = timerfd_create(CLOCK_MONOTONIC, 0);
if (m_timer != -1)
{
	LOG_SUCCESS(GENERAL, "allocated high precision Linux timer");
Remove log on success
Applied
Implement Nekotekina's requests.
Thanks
* Use Linux timers for sleeps up to 1ms (v3)
This patch is for review/discussion. I do not know if I got everything in the
right place.