Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

High CPU usage on Mac #228

Closed
singalen opened this issue Sep 12, 2021 · 28 comments
Closed

High CPU usage on Mac #228

singalen opened this issue Sep 12, 2021 · 28 comments
Assignees
Labels
bug Something isn't working

Comments

@singalen
Copy link
Contributor

Looks like on Mac my game is running the main loop as fast as it can.
What can I do about it?
How can I help to diagnose it?
Ideally, it should only redraw in response to keyboard and, optionally, mouse events.

@singalen
Copy link
Contributor Author

I've used this code from BTerm examples:

let mut input = INPUT.lock();
if let Some(ev) = input.pop() ...

but thing is, I don't want to consume the event. I want to let BTerm keep track of the keys, like shift and other state like mouse_pos.
Currently, there is no way to check if an event is available. pop() and for_each_message() consume events.

Can we please have peek() -> Option<BEvent> that would return the clone of the current event?
Or for_each_message_ref(&BEvent)?

Thank you!

@singalen
Copy link
Contributor Author

Which is probably even more important – I don't see a way to put the process to sleep while waiting for the input event in input_hander.rs.
Looks like it's bound to consume as much CPU as it can?

@BiosElement
Copy link

BiosElement commented Oct 15, 2021

Hey @singalen, while we wait for @thebracket to take a look at this, I tried fix which, while not polished, isn't that awful. https://github.com/BiosElement/bracket-lib/commit/e8cd34112ade911bd6d1d460a209301b8e660e67
Currently the fps cap is hard-coded and the exit event doesn't fire correctly, but I think is just a matter of handling it properly in my game.

Basically just using winit's waituntil instead of polling, which means the sleep which adds input latency is removed. CPU usage and input latency both seem reasonable now.

@thebracket
Copy link
Collaborator

The approach in BiosElement's file is really good. I didn't know winit had a WaitUntil function! (I wonder if that was added after I wrote the initial loop?). I'm currently testing a fix based on this approach. I'll push a branch for this issue when I've finished testing locally; I don't have easy access to a Mac to check it with.

@thebracket
Copy link
Collaborator

thebracket commented Oct 16, 2021

On Windows, I'm getting some odd results - WaitUntil waits properly, unless an input event fires - and then the program zooms at full speed. So you can chug along at a limited FPS, and speed up by wiggling the mouse. Will keep debugging, but it may be a bit longer before the branch arrives. Seems to be related to rust-windowing/winit#1610

thebracket added a commit that referenced this issue Oct 16, 2021
…. Then change control flow to WaitUntil, pausing until the next frame - OR waking up when a new event arrives. #228
@thebracket
Copy link
Collaborator

There's a potential fix in the branch https://github.com/amethyst/bracket-lib/tree/issue-228-high-cpu
I ended up combining a few approaches:

  • The "redraw" event now checks to see if sufficient time has elapsed to be worth running the game logic and drawing the consoles. So resize events (which rapid-fire when you drag a window around) won't cause the redrawing to happen - but will still recalculate terminal sizes when needed. Hopefully, that part isn't too slow.
  • If the game isn't exiting, it sets control_flow to WaitUntil with a pause (unless it's going slow). On Windows, this only does something useful if you aren't doing anything that causes other events to fire - that seems to be a known issue with winit. On my test Linux setup, it made the loop pause until the time had elapsed.

I could use some testing on this, if you don't mind.

@BiosElement
Copy link

I'll be looking at this later today, but in response to the 'zooming' @thebracket , this is by design I believe. WaitUntil 'sleeps' until a certain time OR until an event triggers. So this could be considered 'good' in terms of responsiveness since I'd want the game to be responsive, but within that event trigger there should probably be tickrate limiting.

I did notice it capped at my Desktop's vsync, so this isn't the worst behavior, however things like animations would suffer from this rapidly changing fps.

@thebracket
Copy link
Collaborator

That's why I went with capping the render/update FPS and WaitUntil - it should still respond to messages rapidly, but without suddenly accelerating your game. It seems to work ok on a few tests I ran, but I imagine it'll need some tweaking.

@BiosElement
Copy link

There's a potential fix in the branch https://github.com/amethyst/bracket-lib/tree/issue-228-high-cpu I ended up combining a few approaches:

I could use some testing on this, if you don't mind.

I think your solution is a bit more elegant than what I came up with, however I am getting the following: thread 'main' panicked at 'attempt to subtract with overflow', C:\Users\William\Documents\Projects\bracket-lib-228\bracket-terminal\src\hal\native\mainloop.rs:136:129

Adding some debug prints, it's trying to subtract wait_time: 33 from next_tick: 83 and failing in spectacular fashion. Since there is already a check for next_tick >= wait_time, swapping the order resolves this panic.

From a few minutes of testing, the FPS is remarkably stable, however it misses almost every keyboard input using the keyboard example and my own project.

That's why I went with capping the render/update FPS and WaitUntil - it should still respond to messages rapidly, but without suddenly accelerating your game. It seems to work ok on a few tests I ran, but I imagine it'll need some tweaking.

Indeed that makes sense, apologies if I was stating the obvious! :)

@BiosElement
Copy link

Looking more closely, it looks like TICK_TYPE is still being set to Poll which only gets overridden with WaitUntil on the 'requested' Poll timeframe. This explains why the input is so delayed. I removed the control flow declaration on like line 181 entirely and that solved the input latency, however it looks like the next_tick variable is never getting reset meaning it gets slower and slower until a few frames in, nothing ticks without input.

@singalen
Copy link
Contributor Author

singalen commented Oct 18, 2021

Trying the latest issue-228-high-cpu. So far, CPU is still over 100%.
ControlFlow::WaitUntil does get assigned, according to my debugger.
If I add .with_fps_cap(30.0) to main console builder, then only a few first keystrokes gets processed after the game start, and then it freezes, with a high CPU.
I'm trying to debug the change, but failing so far. I'll write a few questions on the commit: 60a127b

@BiosElement
Copy link

After a bit of additional research, it also looks like Winit expects OnDrawRequested to be used for static GUI's while MainEventsCleared is intended for games that are redrawing regularly. There has been some discussion with Bevy and that's the path they are leaning towards anyway.

@thebracket
Copy link
Collaborator

I just pushed a couple of commits that work really well on my Windows system (I'll get to Linux later today - have publisher meetings). There are two major changes:

  • RedrawRequested doesn't do anything, now.
  • MainEventsCleared holds the game logic.
  • Moving and resizing set a variable to Some(details) - which is applied (and then cleared) when the game logic runs. So resizing isn't quite as fluid - because its locked the to the game's framerate - but it also doesn't get to eat much CPU time.

On Windows at least, I see CPU usage in the task manager remain constant while I zoom the window around or resize it like crazy.

@singalen
Copy link
Contributor Author

On Mac, hello_terminal from the branch still runs at 93% CPU 8(

@thebracket
Copy link
Collaborator

I just pushed a commit that significantly changes the winit event loop. Testing is very positive on my Windows setup, and input lag seems improved.

  • WindowEvent checks the window's ID, because you sometimes get events for other windows - which you can drop. This appears to be more prevalent on some platforms than others.
  • clear_input_state is now only called at the end of a frame. I'm really not sure why it was in NewEvents, but that seems to have been killing performance.
  • The various Moved, Resized and so-on store the new value and don't apply it until the next frame tick.
  • The frame tick is attached to MainEventsCleared. It checks to see if the wait_time for a frame has elapsed before doing anything. It then starts a new instant and checks the time after the render/logic is complete. If there is remaining time before the wait_time, it switches the control flow mode to WaitUntil with the desired time. On Windows, that seems to not do a lot - but double-checking the timings, I'm pretty sure it's waiting for the right time period now. If the frame took longer than intended, it switches back to Poll to ensure that it instantly starts the next frame.

If that doesn't do the trick, I'm going to have to find a Mac and a profiler.

@singalen
Copy link
Contributor Author

singalen commented Oct 21, 2021

Thank you so much for your help!
Here's a profile from hello_terminal, if it helps. I personally cannot see why the loop runs so frequently.
I added .with_fps_cap(30.0) to it.

Screen Shot 2021-10-21 at 11 14 27

  46  22399.0  hello_terminal (34275) :0
  45  22399.0  Main Thread  0x3fe788 :0
  44 libdyld.dylib 22399.0  start
  43 hello_terminal 22399.0  main
  42 hello_terminal 22399.0  std::rt::lang_start::h7d29aacc602424af /Users/vsergiienko/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:65
  41 hello_terminal 22399.0  std::rt::lang_start_internal::h0c37a46739a0311d /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/rt.rs:51
  40 hello_terminal 22399.0  std::panic::catch_unwind::h191bc002afc126a7 /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panic.rs:431
  39 hello_terminal 22399.0  std::panicking::try::ha45bc5eab1f103eb /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:343
  38 hello_terminal 22399.0  std::panicking::try::do_call::h51a4853c94b1bdea /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:379
  37 hello_terminal 22399.0  core::ops::function::impls::_$LT$impl$u20$core..ops..function..FnOnce$LT$A$GT$$u20$for$u20$$RF$F$GT$::call_once::h3b22ce68aa2879c2 /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/core/src/ops/function.rs:259
  36 hello_terminal 22399.0  std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::h5fc701173379e5e7 /Users/vsergiienko/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:66
  35 hello_terminal 22399.0  std::sys_common::backtrace::__rust_begin_short_backtrace::hc6b436e8e451fedf /Users/vsergiienko/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:125
  34 hello_terminal 22399.0  core::ops::function::FnOnce::call_once::ha3beb17d36cecbeb /Users/vsergiienko/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227
  33 hello_terminal 22399.0  hello_terminal::main::h52d6ae211d0f4e91 /Users/vsergiienko/src/amusement/bracket-lib/bracket-terminal/examples/hello_terminal.rs:106
  32 hello_terminal 22399.0  bracket_terminal::bterm::main_loop::hb257a2e5ed030ddc /Users/vsergiienko/src/amusement/bracket-lib/bracket-terminal/src/bterm.rs:1062
  31 hello_terminal 22399.0  bracket_terminal::hal::native::mainloop::main_loop::he0f41eabe10c855c /Users/vsergiienko/src/amusement/bracket-lib/bracket-terminal/src/hal/native/mainloop.rs:106
  30 hello_terminal 22399.0  winit::event_loop::EventLoop$LT$T$GT$::run::h29676a0a4cb1dd58 /Users/vsergiienko/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.24.0/src/event_loop.rs:154
  29 hello_terminal 22399.0  winit::platform_impl::platform::event_loop::EventLoop$LT$T$GT$::run::h280f4ebe2ab3700e /Users/vsergiienko/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.24.0/src/platform_impl/macos/event_loop.rs:93
  28 hello_terminal 22399.0  winit::platform_impl::platform::event_loop::EventLoop$LT$T$GT$::run_return::h65e0ecbf093392f1 /Users/vsergiienko/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.24.0/src/platform_impl/macos/event_loop.rs:106
  27 hello_terminal 22399.0  objc::message::send_message::h4d3d535e8f483d3d /Users/vsergiienko/.cargo/registry/src/github.com-1ecc6299db9ec823/objc-0.2.7/src/message/mod.rs:178
  26 hello_terminal 22399.0  objc::message::platform::send_unverified::h32b6afbb3ec1de3f /Users/vsergiienko/.cargo/registry/src/github.com-1ecc6299db9ec823/objc-0.2.7/src/message/apple/mod.rs:27
  25 hello_terminal 22399.0  _$LT$$LP$$RP$$u20$as$u20$objc..message..MessageArguments$GT$::invoke::he09246ba4dbeac19 /Users/vsergiienko/.cargo/registry/src/github.com-1ecc6299db9ec823/objc-0.2.7/src/message/mod.rs:128
  24 AppKit 22399.0  -[NSApplication run]
  23 AppKit 22399.0  -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:]
  22 AppKit 22399.0  _DPSNextEvent
  21 HIToolbox 22399.0  _BlockUntilNextEventMatchingListInModeWithFilter
  20 HIToolbox 22399.0  ReceiveNextEventCommon
  19 HIToolbox 22399.0  RunCurrentEventLoopInMode
  18 CoreFoundation 22399.0  CFRunLoopRunSpecific
  17 CoreFoundation 22367.0  __CFRunLoopRun
  16 CoreFoundation 13098.0  __CFRunLoopDoObservers
  15 CoreFoundation 12562.0  __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__
  14 hello_terminal 11622.0  winit::platform_impl::platform::observer::control_flow_end_handler::h33263b830f63805c /Users/vsergiienko/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.24.0/src/platform_impl/macos/observer.rs:135
  13 hello_terminal 11596.0  winit::platform_impl::platform::app_state::AppState::cleared::he5339b1983da3475 /Users/vsergiienko/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.24.0/src/platform_impl/macos/app_state.rs:331
  12 hello_terminal 8783.0  winit::platform_impl::platform::app_state::Handler::handle_nonuser_event::h34f85d3c91df5522 /Users/vsergiienko/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.24.0/src/platform_impl/macos/app_state.rs:173
  11 hello_terminal 8347.0  _$LT$winit..platform_impl..platform..app_state..EventLoopHandler$LT$T$GT$$u20$as$u20$winit..platform_impl..platform..app_state..EventHandler$GT$::handle_nonuser_event::hdc699a8d8b2f16f3 /Users/vsergiienko/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.24.0/src/platform_impl/macos/app_state.rs:71
  10 hello_terminal 8183.0  _$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..FnMut$LT$Args$GT$$GT$::call_mut::h56b585e7e8707bb0 /Users/vsergiienko/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/alloc/src/boxed.rs:1553
   9 hello_terminal 8142.0  bracket_terminal::hal::native::mainloop::main_loop::_$u7b$$u7b$closure$u7d$$u7d$::h3968b3924eccdd39 /Users/vsergiienko/src/amusement/bracket-lib/bracket-terminal/src/hal/native/mainloop.rs:134
   8 hello_terminal 7539.0  bracket_terminal::hal::native::mainloop::tock::he668db854cec452c /Users/vsergiienko/src/amusement/bracket-lib/bracket-terminal/src/hal/native/mainloop.rs:295
   7 hello_terminal 7132.0  bracket_terminal::hal::gl_common::backing::shared_main_loop::rebuild_consoles::h3c083486b0360b5b /Users/vsergiienko/src/amusement/bracket-lib/bracket-terminal/src/hal/gl_common/backing/shared_main_loop.rs:70
   6 hello_terminal 7113.0  bracket_terminal::hal::gl_common::backing::simple_console_backing::SimpleConsoleBackend::rebuild_vertices::h085fd2d051e8643b /Users/vsergiienko/src/amusement/bracket-lib/bracket-terminal/src/hal/gl_common/backing/simple_console_backing.rs:136
   5 hello_terminal 5998.0  bracket_terminal::hal::gl_common::backing::simple_console_backing::SimpleConsoleBackend::push_point::h563443ab4fded306 /Users/vsergiienko/src/amusement/bracket-lib/bracket-terminal/src/hal/gl_common/backing/simple_console_backing.rs:70
   4 hello_terminal 5148.0  _$LT$alloc..vec..Vec$LT$T$C$A$GT$$u20$as$u20$core..ops..index..IndexMut$LT$I$GT$$GT$::index_mut::hef6550cf91188e38 /Users/vsergiienko/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:2393
   3 hello_terminal 3826.0  _$LT$alloc..vec..Vec$LT$T$C$A$GT$$u20$as$u20$core..ops..deref..DerefMut$GT$::deref_mut::h68ef25ce6e1a2c3c /Users/vsergiienko/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:2327
   2 hello_terminal 2198.0  alloc::vec::Vec$LT$T$C$A$GT$::as_mut_ptr::hb478dcefa0cc760e /Users/vsergiienko/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:1135
   1 hello_terminal 1223.0  core::ptr::mut_ptr::_$LT$impl$u20$$BP$mut$u20$T$GT$::is_null::h5fee71d7c4a78de8 /Users/vsergiienko/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ptr/mut_ptr.rs:39

@thebracket
Copy link
Collaborator

I'm beginning to think that winit timers are just plain broken. As a test, I set control flow to Now, and changed the event loop to only run on Event::NewEvents(StartCause::ResumeTimeReached{..}). I think printed "tick" to the console when it runs, and "tock" when sufficient time has elapsed to run the frame. It then never changes control_flow.

The result is kinda alarming:

Tick
Tick
Tick
Tick
Tick
Tick
Tick
Tick
Tick
Tick
Tick
Tock

In other words, even though I only set the timer once - it ran over and over again as fast as it could! Peeking at requested_resume on the event shows the current time, even though I never requested a resume. That seems to go against what the winit docs have to say, which guarantee that it will be later than the requested time.

thebracket added a commit that referenced this issue Oct 21, 2021
…eeded in cutting CPU usage from 12% to 4% on hello_terminal, and 18% to about 2% on dwarfmap.
@thebracket
Copy link
Collaborator

Even stranger, moving the control flow command to the top and using RedrawEventsCleared seems to honor the display. I just went with a short delay, but it dramatically cut CPU usage on my Windows box.

@singalen
Copy link
Contributor Author

singalen commented Oct 21, 2021

With the latest change, hello_terminal is down to 25% CPU, which is a great progress for Mac :)
It should be possible to write a model example that tests winit timers alone?
I'm available to run a test on Mac.

@thebracket
Copy link
Collaborator

It definitely sounds like a good idea to make a simpler testbed, but that'll have to wait a bit - I have limited time. What FPS does hello_terminal example show you? On Windows, with the delay at the full wait_time I'm seeing it fluctuate between 23 and 26 (target is 30), so it's sleeping longer than it should.

@singalen
Copy link
Contributor Author

singalen commented Oct 21, 2021

When I cap it to 30, it's similar, 21-25.
Unrestricted, it's 49-60FPS, and CPU is around 70%, which is still better than it was.

@thebracket
Copy link
Collaborator

Thank you. So it's behaving similarly - that means I can do the hard part of trying to make the numbers right. The uncapped CPU is going to be higher than it should, the library itself is quite inefficient at repeatedly redrawing the whole console. That part is on my "for the future" list.

@BiosElement
Copy link

When capped at 30, I'm seeing it around 20-21 FPS, however when rapid keystrokes or mouse movements occur, it locks at 30, sometimes 31. So there's certainly progress here... I'm also trying to tear about Winit to see if something is broken there as well.

thebracket added a commit that referenced this issue Oct 22, 2021
…s specified. Set the benchmark examples to set a higher FPS, which is what you should do if you want to go really fast.
@thebracket
Copy link
Collaborator

The latest commit takes a different approach.

My testing showed that WaitUntil with a 1ms delay was almost instant, and a 2ms delay could be as long as 30ms on Windows 10! I did some reading, and this article pointed me to the underlying issue: the Windows default timer really isn't very accurate at all. With a resolution of 16ms, it's not going to give me the kind of control required for a steady frame rate. Testing on Linux showed that it tended to be really accurate on timings, so I left the WaitUntil logic in place - I may have to conditionally compile one path on Windows and one on everything else.

So I grabbed the spin_sleep crate (it's tiny, so not too worried about dependency size) and it's dead on the money on Windows. CPU usage without sleeping is around 10% for hello_terminal (way too high), around 1.5% with a spinning sleep in the loop - and 29-30FPS for both. I tested most examples, and walking and walking_with_is_pressed are both better - the latter still being the better way to achieve low latency inputs.

I then went and set the default FPS cap to 30, and adjusted the benchmark examples to request 120 instead - so they still go really fast.

I'm hoping this will work as well on other platforms. The timer precision explains why my original code (which had a thread sleep) was so flakey.

@thebracket
Copy link
Collaborator

thebracket commented Oct 22, 2021

Happy to report that this version is also running very well in a virtual Ubuntu (what I have handy). CPU usage is way down, and frame-rate is about right (not perfect, but my VirtualBox setup on this PC is also mind-numbingly slow). My virtual hackintosh for testing is refusing to boot at all, so untested there. I really need to pickup a Mac Mini or something for testing.

@thebracket thebracket self-assigned this Oct 22, 2021
@thebracket thebracket added the bug Something isn't working label Oct 22, 2021
@BiosElement
Copy link

Apologies I've been so quiet, work has been hectic but I've been following along and doing research/testing fix ideas on my own as well with little success.

I can say the latest 228 looks to be locked at 30 for hello_terminal, but drops 1-2 fps during mouse movement/input, so nearly perfect. Also, the input_harness is locked at about 30, but also handling keyboard inputs without noticeable latency.

Finally, and almost as importantly as the input latency, CPU usage is around 2%, down from 10% previously, so great work and thank you so much for helping with this mind-bending issue @thebracket @singalen ! 👍

@thebracket
Copy link
Collaborator

Thank you! (And don't worry about being quiet; I have long periods of work keeping me busy). I'm not sure what I can about the small drop from input (downside of sleeping, the whole input thread is waiting - so it doesn't know to wake up for events and processes them all at once when tick-time arrives) - so I'll call this good enough for now and start getting it merged into the mainline. I have quite a few other issues I want to look at before rolling out the next crate version.

Thanks for all your help, glad we've got the CPU under control!

@thebracket
Copy link
Collaborator

Ok, this is merged and passing all the tests I threw at it. Most importantly, the Hands-on Rust and roguelike tutorials work with it. In particular, the final dungeon crawler example from Hands-on Rust drops CPU usage from 12% (I have 8 cores, so basically pegging one core) to 1.5%.

I'll go ahead and call this a win and close the issue. :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants