
Use NET_Sleep() (or Sys_Sleep in SP) to avoid busy-waiting #695

Merged
Razish merged 3 commits into JACoders:master from smcv:sleepy on Mar 20, 2016

Conversation

smcv
Contributor

@smcv smcv commented Aug 4, 2015

Backported and simplified from an assortment of ioquake3 commits,
mostly by Thilo Schulz, with contributions from Zack Middleton and
Özkan Sezer.

Fixes #507.


I've finally assembled enough of a build system (mingw-w64, #688) and test system (a spare laptop) to test this on Windows as well as Linux. It seems to work fine either way:

  • test systems: Debian Linux on Lenovo X220, Windows 7 on Lenovo X201 (both 2 cores × 2 threads, Intel integrated graphics)
  • steps to test:
    • watch CPU load with Task Manager on Windows or top on Linux
    • start JA SP (I used the autosave at the beginning of yavin1b) or MP (I used "Create Game" with no bots on Tatooine)
    • navigate to a location where something is animated (I used the river in yavin1b and a spinning Holocron in MP)
    • cg_drawfps 1
    • com_busywait (check the default value)
    • com_maxfps (check the default value)
    • set com_maxfps 5
    • set com_maxfps 125
    • set com_busywait 1
    • set com_maxfps 5
    • set com_maxfps 125
  • expected results:
    • default com_maxfps is 125, default com_busywait is 0
    • with com_maxfps 5, framerate and CPU usage are noticeably lower
    • with com_maxfps 125, framerate and CPU usage both go back up
    • with com_busywait 1, CPU usage is close to 100% of 1 thread
  • actual results: as expected. Representative numbers from Windows on my X201:
    • com_busywait 0, com_maxfps 5 => 4-5 fps, 8-12% of 1 core
    • com_busywait 0, com_maxfps 30 => 30 fps, 40% of 1 core
    • com_busywait 0, com_maxfps 125 => 80-85 fps, 100% of 1 core (default settings)
    • com_busywait 1, com_maxfps 5 => 4-5 fps, 100% of 1 core
    • com_busywait 1, com_maxfps 30 => 30 fps, 100% of 1 core
    • com_busywait 1, com_maxfps 125 => 80-85 fps, 100% of 1 core

(The X220 has a better CPU and graphics, and only needs ~70% of a core with the default settings.)

I don't have OS X, but OS X uses basically the same code paths as Linux, so I expect it to work there.


Summarizing negative comments on this change from #507:

@xycaleth
Member

xycaleth commented Aug 9, 2015

This works as expected on my Windows computer! However, testing on OS X, I'm not finding any difference between using com_busyWait and not using it.

@ensiform pointed out that this patch results in com_minimized and com_unfocused existing in the dedicated server (with their values always 0). This matches what ioquake3 does, and I think it makes the code more maintainable; but if people have a strong enough objection to this to consider it a merge blocker, I can sprinkle #ifdefs around it.

I prefer having the extra cvars around and not setting them for the dedicated server. But I don't see these changes in your PR. Was this intentional?

@smcv
Contributor Author

smcv commented Aug 9, 2015

However, testing on OS X, I'm not finding any difference between using com_busyWait and not using it.

Odd... I would have expected OS X to behave essentially the same as Linux here; it's using the same source code, after all. How new is your Mac? If you turn com_maxfps down to some stupidly low value (I used 5fps), do you start to see a difference?

OS X is the platform I can't currently test on - I have an old Mac which can just about manage Q3-engine games, but it's currently Linux-only. I'll see if I can find OS X installation media. Hopefully a self-compiled SDL and OpenJK will work on something as ancient as 10.2.

I prefer having the extra cvars around and not setting them for the dedicated server. But I don't see these changes in your PR.

@dionrhys
Member

dionrhys commented Sep 4, 2015

It seems to work well for me on Windows. It would be nice if com_maxfpsUnfocused and com_maxfpsMinimized matched the retail JKA equivalent of 5ms sleeps at the main loop level by default. Fixed sleeps per frame aren't the same as limiting the max FPS, though, so it would need different logic. (CPU usage would then drop right down when the game isn't in focus.)

The changed code regarding network packet handling makes me wary of regression, but that's me being paranoid I guess. The new net_dropsim cvar is initialized differently to how it was before though; default value "" and CVAR_TEMP instead of retail JKA default value "0" and CVAR_CHEAT (which made a bit more sense).

@ensiform
Member

Would really like to see this get tested and working if someone can check all OSes.

@dionrhys it doesn't have to retain the same logic, because I don't see why Sleep should be necessary if it's throttling while minimized anyway?

@smcv are the stdin checks in NET_Sleep still necessary in this? They don't exist in ioq3, where this is handled in the CON_Input functions.

@ensiform
Member

There also seem to be some discrepancies in the stdin code even without this pull request. stdin_active is defined in sys_unix.cpp, which is what is extern'd in net_ip, but it's never assigned anything beyond the initial qtrue. There is a file-local stdin_active in con_tty.cpp which is used in that file, but no default value is set (presumably it defaults to qfalse/0 because it's a global).

@smcv
Contributor Author

smcv commented Jan 23, 2016

Would really like to see this get tested and working if someone can check all OSes.

The patch has conflicts with current master, but they look easy to resolve; I'll try to re-test sometime soon. I can test Linux and (with some difficulty) Windows, but not OS X. If you're particularly concerned we could make com_busywait default to 1 (equivalent to the old behaviour) on OS X.

It would be nice if com_maxfpsUnfocused and com_maxfpsMinimized matched the retail JKA equivalent of 5ms sleeps at the main loop level by default.

I think we should either leave them at 0 by default (so unfocused and minimized have the same 125fps framerate cap as normal gameplay), or set something actually noticeable.

If retail JKA's logic was essentially

    while (1) {
        render 1 frame, however long that takes
        sleep 5ms
    }

then that doesn't actually alter the framerate much. Suppose your computer could produce f fps without the sleep; with the sleep, it would produce f' = 1/(0.005 + 1/f) fps instead. That makes little difference unless your framerate is already so high that it's rendering wasted frames when outputting to a typical 60Hz display:

f   | f'
20  | 18.2
30  | 26.1
50  | 40.0
60  | 46.2
100 | 66.7
200 | 100.0
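
For concreteness, this small standalone C program (not engine code) reproduces the table above from that formula:

    #include <stdio.h>

    /* Effective framerate when every frame is followed by a fixed sleep. */
    static double capped_fps(double uncapped_fps, double sleep_sec)
    {
        return 1.0 / (sleep_sec + 1.0 / uncapped_fps);
    }

    int main(void)
    {
        const double rates[] = { 20, 30, 50, 60, 100, 200 };

        for (size_t i = 0; i < sizeof rates / sizeof rates[0]; i++)
            printf("%3.0f fps uncapped -> %5.1f fps with a 5ms sleep per frame\n",
                   rates[i], capped_fps(rates[i], 0.005));
        return 0;
    }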

The changed code regarding network packet handling makes me wary of regression

I kept it as similar to ioquake3 as I could - that seemed more likely to be good than trying to stay as close as possible to late-90s Quake 3 engine technology :-)

The new net_dropsim cvar is initialized differently to how it was before though; default value "" and CVAR_TEMP instead of retail JKA default value "0" and CVAR_CHEAT (which made a bit more sense).

You're right that 0 makes more logical sense, but this is how I found it (in ioquake3). "" is effectively interpreted as 0 anyway, when being parsed as a float (it's put through atof() which has no error-detection, and just returns 0 for non-numbers).
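
A quick standalone demonstration of that atof() behaviour:

    #include <stdio.h>
    #include <stdlib.h>

    /* atof() performs no error reporting: anything it can't parse comes out
     * as 0.0, so a default of "" behaves the same as "0" when read as a float. */
    int main(void)
    {
        printf("%f %f %f\n", atof(""), atof("0"), atof("not a number"));
        /* prints: 0.000000 0.000000 0.000000 */
        return 0;
    }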

I'm not sure how valuable it is to treat simulated packet loss as cheating; if it were a viable way to cheat, people could just as easily generate enough irrelevant traffic in parallel to cause non-simulated packet loss :-)

There also seem to be some discrepancies in the stdin code even without this pull request

Indeed, well spotted. I think the intention may have been that all the uses of stdin_active were the same variable, which would be set true if standard input was not a terminal (a pipe or socket or something). If that had been done correctly, then the practical effect of monitoring stdin in NET_Sleep would be that any pending input on a non-interactive standard input would wake the engine up to process it during the idle time between frames, instead of remaining asleep until either there's new network data or it's time to start work on the next frame.
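
In rough C terms, the mechanism would look something like this - a sketch of the idea only, not OpenJK's actual NET_Sleep(); net_socket and msec stand in for whatever the engine would pass:

    #include <sys/select.h>
    #include <sys/time.h>
    #include <unistd.h>

    /* Wait until the network socket or (non-interactive) stdin becomes
     * readable, or until it's time to start the next frame. */
    void idle_between_frames(int net_socket, int msec)
    {
        fd_set readfds;
        struct timeval timeout;
        int nfds = (net_socket > STDIN_FILENO ? net_socket : STDIN_FILENO) + 1;

        FD_ZERO(&readfds);
        FD_SET(net_socket, &readfds);
        FD_SET(STDIN_FILENO, &readfds);   /* only when stdin is a pipe/socket */

        timeout.tv_sec = msec / 1000;
        timeout.tv_usec = (msec % 1000) * 1000;

        /* Returns early if either fd has pending data; otherwise sleeps for msec. */
        select(nfds, &readfds, NULL, NULL, &timeout);
    }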

I don't think that actually matters in practice: the input isn't lost; it gets queued up in the kernel or in the sending process instead of in the Q3 engine, and is processed "soon". If we're running at, say, 10fps or better, then stdin input wouldn't have to wait more than 100ms to be processed, which seems fine. Entering text into the server console is rarely time-critical :-)

In ioquake3, it seems to have worked the way I think it was intended to in the original id Software code-drop, then became the same #ifdef as in current OpenJK at some later point, then started to always monitor stdin in 672cfbf1 "Merge unified-sdl to trunk" (unfortunately the full history has been lost by the svn-to-git conversion). The same merge seems to have broken the intended functionality of stdin_active by making the version of it in the console code static.

With e46fe244 "rewrite of the win32 dedicated console" in 2007, the #ifdef was replaced by simply never monitoring stdin. I think we can probably conclude from the lack of a fix in the following 8 years that it was never actually very important, so you're probably right that I should drop that part.

@ensiform
Member

Also in ioquake3, Sys_Sleep is never called except in cases of no mapname during SV_Frame.

@mrwonko
Contributor

mrwonko commented Jan 23, 2016

I can test on mac and Windows.

@Razish
Member

Razish commented Jan 23, 2016

I can test on Mac and Linux :p

@smcv
Contributor Author

smcv commented Jan 23, 2016

Also in ioquake3, Sys_Sleep is never called except in cases of no mapname during SV_Frame.

Right, but ioquake3 is always network-capable, whereas JASP/JK2SP is single-player-only.

If you'd prefer me to link the MP netcode into SP, and replace the call to Sys_Sleep() in SP by a call to NET_Sleep() with no socket to monitor, that would work - it would effectively be emulating sleep() with a call to select() with empty fd_sets, which is a standard Unix idiom.
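
That idiom is essentially the following (a minimal sketch, not a proposed patch):

    #include <stddef.h>
    #include <sys/select.h>
    #include <sys/time.h>

    /* A plain sleep implemented as select() with no file descriptors to
     * watch; a NET_Sleep() with no socket to monitor degenerates to this. */
    void sleep_msec(int msec)
    {
        struct timeval tv;

        tv.tv_sec = msec / 1000;
        tv.tv_usec = (msec % 1000) * 1000;
        select(0, NULL, NULL, NULL, &tv);
    }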

@ensiform
Member

My mistake, Sys_Sleep is still used in main() in our codebase but not in ioquake3 if dedicated or minimized.

@smcv
Contributor Author

smcv commented Mar 12, 2016

Sys_Sleep is still used in main() in our codebase

Ah, I think I might understand what you meant by this now: the Windows main() still has the hard-coded Sleep( 5 ) and/or Sleep( 50 ), even though my changes in this PR should make it unnecessary. I'll remove that.

These are not mentioned in any of the CMakefiles, so they can't be
compiled by any supported configuration.

Backported and simplified from an assortment of ioquake3 commits,
mostly by Thilo Schulz, with contributions from Zack Middleton and
Özkan Sezer.

The single-player engines don't have any netcode, and in particular
no NET_Sleep(); if they had a straightforward port of NET_Sleep(),
it would be roughly equivalent to Sys_Sleep() anyway, since they
don't have anything to receive from the network. As a result, I've
used Sys_Sleep() instead of NET_Sleep() there.

This makes the hard-coded 5ms Sys_Sleep() calls in main()
(and the 50ms equivalents in a couple of unused Windows equivalents)
unnecessary, so I've made them conditional on this logic being
disabled (com_busyWait 1).

The practical effect of those 5ms Sys_Sleep() calls in terms of
framerate-limiting varied according to the framerate your computer
would be capable of without the limit, and because of the way this
framerate limit works, it's best if the limit is an integer divisor of
1000. I've arbitrarily chosen com_maxfpsMinimized to be 50fps, which is
what would have happened on a PC capable of slightly less than 60fps
in the old implementation.

Fixes JACoders#507.

Signed-off-by: Simon McVittie <smcv@debian.org>

This was only used in the old NET_Sleep() implementation, which I've
just removed.

Signed-off-by: Simon McVittie <smcv@debian.org>
@smcv
Contributor Author

smcv commented Mar 13, 2016

Branch updated; it seems to work as intended on Windows 10 (cross-compiled with mingw on Linux) and on Linux. @mrwonko, @Razish: please test on Mac?

(My branch at https://github.com/smcv/OpenJK/tree/sleep%2Btravis is what I actually tested, because I can't compile on mingw without also fixing #741, but the one proposed here just has the sleep-related commits.)

It would be nice if com_maxfpsUnfocused and com_maxfpsMinimized matched the retail JKA equivalent of 5ms sleeps at the main loop level by default. Fixed sleeps per frame aren't the same as limiting the max FPS, though, so it would need different logic. (CPU usage would then drop right down when the game isn't in focus.)

As described in my previous comment, the effect of a 5ms sleep per frame varies according to what your uncapped framerate would have been, so there is no value for com_maxfpsWhatever that can have exactly this effect for everyone? I've arbitrarily chosen to cap at 50fps, which is approximately what the result would be if your uncapped framerate was a little less than 60fps. If you don't like 50 as a default, it's probably best to choose a value that is a divisor of 1000: 10, 20 or 25 might make sense.

stdin_active is defined in sys_unix.cpp, which is what is extern'd in net_ip, but it's never assigned anything beyond the initial qtrue.

Well spotted. I've removed this one from both sys_unix and net_ip.

(I think it would be a good goal to remove all extern declarations from *.c, *.cpp: if something in a different module is API then it should be in a header file, and if it isn't API it shouldn't be used via explicit extern.)

There is a file-local stdin_active in con_tty.cpp which is used in that file, but no default value is set (presumably it defaults to qfalse/0 because it's a global)

I've left this one as-is; it seemed more or less correct. Static variables (file-local) are implicitly initialized to zero by the compiler/linker.
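
For reference, a trivial standalone example of that guarantee (stdin_active_example is just a stand-in name):

    #include <stdio.h>

    /* File-scope objects with static storage duration are zero-initialized
     * when no initializer is given, so a qfalse-like default is well-defined. */
    static int stdin_active_example;

    int main(void)
    {
        printf("%d\n", stdin_active_example);   /* prints 0 */
        return 0;
    }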

the Windows main() still has the hard-coded Sleep( 5 ) and/or Sleep( 50 ), even though my changes in this PR should make it unnecessary. I'll remove that.

I've left it in the shared main(), but made it conditional on com_busyWait 1 (so you get the 5ms pauses if you are in the old-style code path, but not if you are in the framerate-capped code path). Does that make sense?
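
In outline, the control flow being described is roughly the following (a self-contained sketch with stand-in names, not the actual OpenJK diff):

    #include <stdio.h>
    #include <unistd.h>

    static int com_busyWait = 0;   /* stand-in for the cvar; 1 = old behaviour */

    static void run_one_frame(void)
    {
        /* With com_busyWait 0 the real Com_Frame() sleeps off any idle time
         * itself (NET_Sleep in MP, Sys_Sleep in SP), so main() adds nothing. */
        puts("frame");
    }

    int main(void)
    {
        for (int i = 0; i < 3; i++)   /* stand-in for the endless main loop */
        {
            if (com_busyWait)
                usleep(5 * 1000);     /* old-style path keeps the 5ms pause */
            run_one_frame();
        }
        return 0;
    }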

The Windows-specific copies of main() were in files that turn out not to have even been compiled, so I deleted those files to reduce confusion.

@Razish
Member

Razish commented Mar 13, 2016

Perfect timing, I will test this and review the Clang breakage too tomorrow.
Thanks.

@Razish
Member

Razish commented Mar 17, 2016

I have confirmed com_busyWait and com_maxFPS*** working perfectly (WRT CPU usage and reported framerate) on OS X Yosemite built with Clang (also incorporating pr/741 to fix the build)

So that's Windows, Linux and Mac confirmed working? If so, I am very happy to merge this :)

@ensiform
Member

With busy wait 0?

@Razish
Member

Razish commented Mar 18, 2016

With all permutations of the above cvars, working as intended.

@ensiform
Member

The fps cap works as intended?

@Razish
Member

Razish commented Mar 19, 2016

Yep.

@ensiform
Member

And on Windows?

@Razish
Member

Razish commented Mar 19, 2016

Haven't tested, I don't really have time/effort at home to boot into Windows and test there lately.

@ensiform
Member

Windows is where the issue was afaik. When I was initially working on this that was my issue on Windows.

@smcv
Contributor Author

smcv commented Mar 19, 2016

that was my issue on Windows

Sorry, what is "that" here? I've lost track of which configuration you've had trouble with.

@ensiform
Member

Having the new behavior has issues long ago.

@smcv
Contributor Author

smcv commented Mar 19, 2016

Having the new behavior has issues long ago.

Sorry, I can't test for "issues". Please could you be more specific?

If the test steps from my first comment aren't sufficient to cover the situations that you're concerned about, please describe the configurations/situations that have had problems in the past, and what those problems were.

@ensiform
Member

The FPS cap (com_maxFPS) was no longer taking effect with the new behavior (busyWait 0)

@smcv
Contributor Author

smcv commented Mar 20, 2016

The FPS cap (com_maxFPS) was no longer taking effect with the new behavior (busyWait 0)

Just to be sure, I've re-tested all combinations of:

  • com_maxfps 10, 60 or 125
  • com_busywait 0 or 1
  • Windows (32-bit OpenJK on 64-bit Windows 10, Lenovo X201) or Linux (64-bit Debian 8, Lenovo X220)

with the multiplayer engine, and it still seems to work: the framerate stays approximately at the cap; when I reduce the cap to 10fps the animation is visibly bad (as I'd expect); and with com_busywait 0 the lower cap results in lower CPU use.

(With a 60fps cap I actually get 62-63fps, which is because the engine converts frames per second into milliseconds per frame, and 1000 isn't evenly divisible by 60.)
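
For illustration, assuming the cap is stored as whole milliseconds per frame (which is consistent with the 62-63fps observation), the rounding looks like this:

    #include <stdio.h>

    int main(void)
    {
        const int caps[] = { 10, 60, 125 };

        for (int i = 0; i < 3; i++)
        {
            int msec = 1000 / caps[i];   /* integer truncation */
            printf("com_maxfps %3d -> %3d ms/frame -> %.1f fps effective\n",
                   caps[i], msec, 1000.0 / msec);
        }
        return 0;
    }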

I've also re-tested the single-player engine with com_busywait 0 on Windows, and that behaves as expected at the three framerates I tried.

@ensiform
Member

Then there are no issues anymore.

@Razish
Member

Razish commented Mar 20, 2016

Hooray!

Razish added a commit that referenced this pull request Mar 20, 2016
Use NET_Sleep() (or Sys_Sleep in SP) to avoid busy-waiting
@Razish Razish merged commit e8453d7 into JACoders:master Mar 20, 2016
@smcv smcv deleted the sleepy branch March 21, 2016 00:11
@ensiform
Member

ensiform commented Apr 2, 2016

@dionrhys proposes we back out of this until the CPU usage regression in servers is fixed.

smcv added a commit to smcv/OpenJK that referenced this pull request Apr 5, 2016
This was removed while introducing com_busyWait (PR JACoders#695). Before
merging PR JACoders#695, client behaviour was equivalent to com_busyWait 1,
while dedicated server behaviour was broadly similar to com_busyWait 0
(run when there is network I/O, or as much as is necessary for sv_fps).
However, dedicated servers also had a hard-coded 5ms sleep before each
frame during which they did nothing, not even processing network I/O,
resulting in the dedicated server never using as much as a full CPU
core per process even if the operating system scheduler would have
allowed it. That sleep was not reflected in the new code path for
com_busyWait 0.

This commit makes the sleep time configurable via a new cvar.
com_dedicated_sleep 0 is the same as PR JACoders#695 and ioquake3 (best
performance), while com_dedicated_sleep 5 is the same as historical
OpenJK behaviour (reduced performance, increased latency, ensures
that CPU time is made available to other processes). Larger values
result in greater reductions to both CPU usage and performance.
@dionrhys
Member

dionrhys commented Apr 7, 2016

Sorry, I didn't realise the original method had explicit Sleeps of 5 milliseconds. This would explain the "regression" 😉

@smcv
Contributor Author

smcv commented Apr 11, 2016

@dionrhys, @ensiform: are you sufficiently happy with my explanation on #823 to not want to back this out any more?

The thing I primarily wanted to fix is that before merging #695, on a client system capable of rendering at more than cl_maxfps, OpenJK would push the CPU usage right up to 100% of a core by busy-waiting. If you do that for too long on a laptop, you'll tend to run into CPU throttling, where the firmware forces the clock speed right down (to avoid hardware damage or user injury) and performance abruptly drops - the opposite of what we wanted. A framerate cap is a sensible way to avoid that.
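
To make the trade-off concrete, here is a small self-contained C sketch of the two idle strategies (generic stand-ins, not OpenJK code):

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/time.h>

    static long long now_ms(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return (long long)tv.tv_sec * 1000 + tv.tv_usec / 1000;
    }

    /* com_busyWait 1: spin until the next frame is due (burns a whole core). */
    static void wait_busy(long long next_frame_ms)
    {
        while (now_ms() < next_frame_ms)
            ;   /* busy loop: 100% CPU for the whole idle period */
    }

    /* com_busyWait 0: hand the idle time back to the OS scheduler instead.
     * The real engine uses NET_Sleep()/Sys_Sleep(), which can also wake
     * early on network I/O. */
    static void wait_sleeping(long long next_frame_ms)
    {
        long long remaining = next_frame_ms - now_ms();
        if (remaining > 0)
            usleep((useconds_t)(remaining * 1000));
    }

    int main(void)
    {
        wait_sleeping(now_ms() + 8);   /* ~125fps cap: 8ms idle, low CPU */
        wait_busy(now_ms() + 8);       /* same wait, but at 100% CPU */
        printf("done at %lld\n", now_ms());
        return 0;
    }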

My preference order for our options for how to deal with this goes something like:

I would appreciate it if people who are concerned about this could try smcv/OpenJK@e1ee901 with com_dedicated_sleep set to 5, as described on #823.

@dionrhys
Member

I think this is suitable as-is right now. If someone comes along and finds any optimisation whatsoever in this implementation in either ioq3 or OpenJK, we can be sure to integrate that in :)

@Razish
Member

Razish commented Apr 11, 2016

I have a bright idea - how about we profile the whole code and optimise it to reduce CPU usage :P
