Use NET_Sleep() (or Sys_Sleep in SP) to avoid busy-waiting #695
Conversation
This works as expected on my Windows computer! However, testing on OS X, I'm not finding any difference between using
I prefer having the extra cvars around and not setting them for the dedicated server. But I don't see these changes in your PR. Was this intentional?
Odd... I would have expected OS X to behave essentially the same as Linux here, it's using the same source code after all. How new is your Mac? If you turn com_maxfps down to some stupidly low value (I used 5fps) do you start to see a difference? OS X is the platform I can't currently test on - I have an old Mac which can just about manage Q3-engine games, but it's currently Linux-only. I'll see if I can find OS X installation media. Hopefully a self-compiled SDL and OpenJK will work on something as ancient as 10.2.
It seems to work well for me on Windows. The changed code regarding network packet handling makes me wary of regression, but that's me being paranoid, I guess.
I would really like to see this get tested and working if someone can check all OSes. @dionrhys: it doesn't have to retain the same logic, because I don't see why Sleep should be necessary if it's throttling while minimized anyway. @smcv: are the stdin checks in NET_Sleep still necessary here? They don't exist in ioq3, where this is handled in the CON_Input functions.
There also seem to be some discrepancies in the stdin code even without this pull request. stdin_active is defined in sys_unix.cpp, which is what is extern'd in net_ip, but it's never assigned anything beyond the initial qtrue. There is a file-local stdin_active in con_tty.cpp which is used in that file; however, no default value is set (presumably it defaults to qfalse/0 because it's a global).
The patch has conflicts with current master, but they look easy to resolve; I'll try to re-test sometime soon. I can test Linux and (with some difficulty) Windows, but not OS X. If you're particularly concerned we could make
I think we should either leave them at 0 by default (so unfocused and minimized have the same 125fps framerate cap as normal gameplay), or set something actually noticeable. If retail JKA's logic was essentially a hard-coded 5ms sleep per frame,
then that doesn't actually alter the framerate much. Suppose your computer could produce f fps without the sleep. With the sleep, it would produce f' = 1/(0.005 + 1/f) fps instead. This doesn't actually make a whole lot of difference, unless your framerate is already so high that it's rendering wasted frames when outputting to a typical 60Hz display:
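Plugging numbers into that formula shows the pattern (a quick sketch; capped_fps is just an illustrative name, not anything from the engine):

```python
def capped_fps(f, sleep_s=0.005):
    """Effective framerate if every frame gains a fixed extra sleep:
    each frame now takes (1/f + sleep_s) seconds instead of 1/f."""
    return 1.0 / (sleep_s + 1.0 / f)

for f in (60, 125, 250, 1000):
    print(f"{f:4d} fps uncapped -> {capped_fps(f):5.1f} fps with a 5ms sleep")
```

The higher the uncapped framerate, the larger the relative effect of the fixed 5ms sleep, which matches the point above: it mostly bites when the engine is rendering wasted frames anyway.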
It's as similar as I could get it to ioquake3 - that seemed more likely to be good than trying to stay as close as possible to late 90s Quake 3 engine technology :-)
You're right that 0 makes more logical sense, but this is how I found it (in ioquake3). I'm not sure how valuable it is to treat simulated packet loss as cheating; if it was a viable way to cheat, people could equally easily generate enough irrelevant traffic in parallel to cause non-simulated packet loss :-)
Indeed, well spotted. I think the intention may have been that all the uses of stdin_active were the same variable, which would be set true if standard input was not a terminal (a pipe or socket or something).

If that had been done correctly, the practical effect of monitoring stdin in NET_Sleep would be that any pending input on a non-interactive standard input would wake the engine up to process it during the idle time between frames, instead of it remaining asleep until either there's new network data or it's time to start work on the next frame. I don't think that actually matters in practice: the input isn't lost, it gets queued up in the kernel or in the sending process instead of in the Q3 engine, and is processed "soon". If we're running at, say, 10fps or better, stdin input wouldn't have to wait more than 100ms to be processed, which seems fine. Entering text into the server console is rarely time-critical :-)

In ioquake3, it seems to have worked the way I think it was intended to in the original id Software code-drop, then became the same since e46fe244 ("rewrite of the win32 dedicated console") in 2007.
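For reference, the sort of thing NET_Sleep does can be sketched like this (a simplified Python illustration of select()-based waiting, not the actual engine code; net_sleep is a made-up name):

```python
import select
import socket
import sys

def net_sleep(sock, msec, monitor_stdin=False):
    """Block for up to msec milliseconds, waking early if the socket
    (or, optionally, standard input) becomes readable."""
    fds = [sock]
    if monitor_stdin:
        # Only useful when stdin is a pipe/socket rather than a terminal,
        # as discussed above for the stdin_active flag.
        fds.append(sys.stdin)
    ready, _, _ = select.select(fds, [], [], msec / 1000.0)
    return ready  # empty if the full timeout elapsed with no activity
```

With monitor_stdin disabled, pending console input simply waits in the kernel until the next frame, which is the "processed soon" behaviour described above.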
Also, in ioquake3, Sys_Sleep is never called except in cases of no mapname during SV_Frame.
I can test on Mac and Windows.
I can test on Mac and Linux :p
Right, but ioquake3 is always network-capable, whereas JASP/JK2SP is single-player-only. If you'd prefer me to link the MP netcode into SP, and replace the call to Sys_Sleep() in SP by a call to NET_Sleep() with no socket to monitor, that would work - it would effectively be emulating |
My mistake, Sys_Sleep is still used in main() in our codebase but not in ioquake3 if dedicated or minimized.
Ah, I think I might understand what you meant by this now: the Windows
These are not mentioned in any of the CMakefiles, so they can't be compiled by any supported configuration.
Backported and simplified from an assortment of ioquake3 commits, mostly by Thilo Schulz, with contributions from Zack Middleton and Özkan Sezer.

The single-player engines don't have any netcode, and in particular no NET_Sleep(); if they had a straightforward port of NET_Sleep(), it would be roughly equivalent to Sys_Sleep() anyway, since they don't have anything to receive from the network. As a result, I've used Sys_Sleep() instead of NET_Sleep() there.

This makes the hard-coded 5ms Sys_Sleep() calls in main() (and the 50ms equivalents in a couple of unused Windows equivalents) unnecessary, so I've made them conditional on this logic being disabled (com_busyWait 1).

The practical effect of those 5ms Sys_Sleep() calls in terms of framerate-limiting varied according to the framerate your computer would be capable of without the limit, and because of the way this framerate limit works, it's best if the limit is an integer divisor of 1000. I've arbitrarily chosen com_maxfpsMinimized to be 50fps, which is what would have happened on a PC capable of slightly less than 60fps in the old implementation.

Fixes JACoders#507.

Signed-off-by: Simon McVittie <smcv@debian.org>
This was only used in the old NET_Sleep() implementation, which I've just removed.

Signed-off-by: Simon McVittie <smcv@debian.org>
Branch updated and seems to work as intended on Windows (10, cross-compiling with mingw on Linux) and Linux. @mrwonko, @Razish: please test on Mac? (My branch at https://github.com/smcv/OpenJK/tree/sleep%2Btravis is what I actually tested, because I can't compile on mingw without also fixing #741, but the one proposed here just has the sleep-related commits.)
As described in my previous comment, the effect of a 5ms sleep per frame varies according to what your uncapped framerate would have been, so there is no value for com_maxfpsWhatever that can have exactly this effect for everyone. I've arbitrarily chosen to cap at 50fps, which is approximately what the result would be if your uncapped framerate was a little less than 60fps. If you don't like 50 as a default, it's probably best to choose a value that is a divisor of 1000: 10, 20 or 25 might make sense.
Well spotted. I've removed this one from both sys_unix and net_ip. (I think it would be a good goal to remove all
I've left this one as-is; it seemed more or less correct. Static variables (file-local) are implicitly initialized to zero by the compiler/linker.
I've left it in the shared code. The Windows-specific copies of
Perfect timing, I will test this and review the Clang breakage too tomorrow.
I have confirmed it. So that's Windows, Linux and Mac confirmed working? If so, I am very happy to merge this :)
With busy wait 0?
With all permutations of the above cvars, working as intended.
The fps cap works as intended?
Yep.
And on Windows?
Haven't tested; I don't really have the time/energy at home to boot into Windows and test there lately.
Windows is where the issue was, afaik. When I was initially working on this, that was my issue on Windows.
Sorry, what is "that" here? I've lost track of which configuration you've had trouble with.
The new behavior had issues long ago.
Sorry, I can't test for "issues". Please could you be more specific? If the test steps from my first comment aren't sufficient to cover the situations that you're concerned about, please describe the configurations/situations that have had problems in the past, and what those problems were.
The FPS cap (com_maxFPS) was no longer being respected with the new behavior (busyWait 0).
Just to be sure, I've re-tested all combinations of:
with the multiplayer engine, and it still seems to work: the framerate is approximately at the cap, and when I reduce the cap to 10fps the animation is visibly bad (as I'd expect). (With a 60fps cap I actually get 62-63fps, which is because the engine converts frames per second into milliseconds per frame, and 1000 isn't evenly divisible by 60.) I've also re-tested the single-player engine.
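That 62-63fps observation is easy to reproduce numerically (a model of Q3-style frame limiting, assuming the cap is stored as whole milliseconds per frame; effective_fps is an illustrative name, not an engine function):

```python
def effective_fps(com_maxfps):
    """Model of Q3-family frame limiting: the cap is stored as an
    integer number of milliseconds per frame, so the cap you actually
    get is 1000 / (1000 // com_maxfps)."""
    msec = 1000 // com_maxfps   # integer milliseconds per frame
    return 1000.0 / msec

print(effective_fps(60))    # 16ms/frame -> 62.5 fps, not 60
print(effective_fps(50))    # 20ms/frame -> exactly 50 fps
print(effective_fps(125))   # 8ms/frame  -> exactly 125 fps
```

This is also why divisors of 1000 (10, 20, 25, 50, 125) were suggested earlier as the values where the cap is exact.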
Then there are no issues anymore.
Hooray! |
@dionrhys proposes we back this out until the CPU usage regression in servers is fixed.
This was removed while introducing com_busyWait (PR JACoders#695).

Before merging PR JACoders#695, client behaviour was equivalent to com_busyWait 1, while dedicated server behaviour was broadly similar to com_busyWait 0 (run when there is network I/O, or as much as is necessary for sv_fps). However, dedicated servers also had a hard-coded 5ms sleep before each frame during which they did nothing, not even processing network I/O, resulting in the dedicated server never using as much as a full CPU core per process even if the operating system scheduler would have allowed it. That sleep was not reflected in the new code path for com_busyWait 0.

This commit makes the sleep time configurable via a new cvar. com_dedicated_sleep 0 is the same as PR JACoders#695 and ioquake3 (best performance), while com_dedicated_sleep 5 is the same as historical OpenJK behaviour (reduced performance, increased latency, ensures that CPU time is made available to other processes). Larger values result in greater reductions to both CPU usage and performance.
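The effect of a hard per-frame sleep on a dedicated server can be modelled with a few lines of arithmetic (illustration only; the function name and the 1ms work figure below are assumptions, not measurements from OpenJK):

```python
def dedicated_frame_stats(work_ms, sleep_ms):
    """Model one dedicated-server frame: work_ms of real work followed
    by a hard sleep of sleep_ms during which nothing is processed.
    Returns (max frames per second, fraction of one core used)."""
    frame_ms = work_ms + sleep_ms
    return 1000.0 / frame_ms, work_ms / frame_ms

# With the historical 5ms sleep, even a frame needing only 1ms of work
# is limited to ~167 frames/sec and roughly 17% of a core, which is why
# the old server could never saturate a core:
fps, cpu = dedicated_frame_stats(1, 5)
```

Setting sleep_ms to 0 in this model recovers the com_dedicated_sleep 0 behaviour: nothing stops the loop from using a full core if there is enough work.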
Sorry, I didn't realise the original method had explicit Sleeps of 5 milliseconds. This would explain the "regression" 😉
@dionrhys, @ensiform: are you sufficiently happy with my explanation on #823 to not want to back this out any more? The thing I primarily wanted to fix is that before merging #695, a client system capable of rendering at more than the framerate cap would busy-wait, burning a full CPU core for no benefit. My preference order for our options for how to deal with this goes something like:
I would appreciate it if people who are concerned about this could try smcv/OpenJK@e1ee901 with
I think this is suitable as-is right now. If someone comes along and finds any optimisation whatsoever in this implementation in either ioq3 or OpenJK, we can be sure to integrate that in :)
I have a bright idea - how about we profile the whole code and optimise it to reduce CPU usage :P
I've finally assembled enough of a build system (mingw-w64, #688) and test system (a spare laptop) to test this on Windows as well as Linux. It seems to work fine either way.

Test procedure, watching CPU usage (top on Linux) and framerate (cg_drawfps 1):

- check the defaults: com_maxfps is 125, com_busywait is 0
- set com_maxfps 5, then set com_maxfps 125
- set com_busywait 1, then repeat set com_maxfps 5 and set com_maxfps 125

Expected results:

- with com_maxfps 5, framerate and CPU usage are noticeably lower
- with com_maxfps 125, framerate and CPU usage both go back up
- with com_busywait 1, CPU usage is close to 100% of 1 thread

Results on the laptop:

- com_busywait 0, com_maxfps 5 => 4-5 fps, 8-12% of 1 core
- com_busywait 0, com_maxfps 30 => 30 fps, 40% of 1 core
- com_busywait 0, com_maxfps 125 => 80-85 fps, 100% of 1 core (default settings)
- com_busywait 1, com_maxfps 5 => 4-5 fps, 100% of 1 core
- com_busywait 1, com_maxfps 30 => 30 fps, 100% of 1 core
- com_busywait 1, com_maxfps 125 => 80-85 fps, 100% of 1 core

(The X220 has a better CPU and graphics, and only needs ~70% of a core with the default settings.)

I don't have OS X, but OS X uses basically the same code paths as Linux, so I expect it to work there.
Summarizing negative comments on this change from #507: if necessary, we could default com_busywait to 1 on Windows and 0 elsewhere.