Add yield() to spinlock #1127
Wait, C++ STL uses spinlock?! That's… weird to say the least. We need comparison tests on Linux and Windows as well. As for now: fix the formatting, please. |
Will test this on the Pi and x86 Linux systems set to limited speeds. Thanks, @kklobe.
The |
Reminder about squashing commits and writing a good commit message. :)
I thought that was a pretty comprehensive commit message, please advise on desired changes :) |
A fair amount of time is spent in the spinlock on systems that use DelayPrecise, so add the yield() hint for systems that can take advantage of it. Example: https://en.cppreference.com/w/cpp/thread/yield
9aa9d25
to
158e3ef
Compare
Ehh… I had no idea we added spinlocks to our code… @kklobe spinlocks in userspace code are evil, I don't care what somebody wrote on some random blog. We need to find a way to avoid it - preferably using platform-specific APIs. But in my opinion we should stick to what's provided to us by SDL. |
The change increased framerate and also reduced CPU usage by ~50% on at least my M1 Mac (#1084). I am happy to revert those changes if they are unwanted. EDIT: I should clarify, I'm probably using the term |
@kklobe we need to make sure your increased framerate on M1 Mac did not break e.g. GUS or SB emulation on Linux or Windows (or other things)… Generally: spinlocks never should be used in userspace, because they create an indirect, hidden dependency on the behaviour of OS kernel. Maybe today you get increased framerate, but then an OS update happens and you get a regression instead. Basically we need to use proper APIs for sleeping, like POSIX |
That is why I switched the base code to EDIT: if, on the other hand, we don't want to trust |
PR 1084 was a net benefit based on actual measurements: holding performance at par or better on all supported platforms, and simultaneously improving the 1ms timing delay without the typical 10% overage due to using the SDL coarse delay timer. Chrono is vastly superior to SDL when it comes to timing: it's type- and unit-safe, offers finer resolutions, and is directly supported by the language. Compiler implementations continue to improve it (as has happened for Chrono from C++11 to C++14). If the code was an unmaintainable mess or had functional corner cases or performance regressions, then I fully agree we would want to hold to SDL. If we can refine it further with more OS-friendly mechanisms like nanosleep, then I'm all for that! |
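To illustrate the type- and unit-safety point, here is a minimal sketch. The helper names `total_wait` and `whole_ms` are made up for this example and are not taken from the DOSBox code; it assumes a C++14 compiler for the duration literals.

```cpp
#include <chrono>

using namespace std::chrono_literals;

// Hypothetical helper (not from the project): total wait for a number
// of 1 ms ticks plus a sub-millisecond remainder.
constexpr std::chrono::microseconds total_wait(int ticks,
                                               std::chrono::microseconds extra)
{
    // int * milliseconds stays in milliseconds; adding microseconds
    // converts to the finer unit automatically and without loss.
    return ticks * 1ms + extra;
}

// Converting to a coarser unit can lose precision, so it has to be
// spelled out with duration_cast; an implicit conversion from
// microseconds to milliseconds does not compile.
constexpr std::chrono::milliseconds whole_ms(std::chrono::microseconds us)
{
    return std::chrono::duration_cast<std::chrono::milliseconds>(us);
}

static_assert(total_wait(3, 250us) == 3250us, "units are tracked by the type");
static_assert(whole_ms(3250us) == 3ms, "duration_cast truncates toward zero");
```

With a plain integer delay, as in SDL's millisecond-based API, mixing up milliseconds and microseconds compiles silently; with chrono, the lossy direction is rejected at compile time.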
just a reminder, |
Let me back up for a minute and refresh the context for these changes. The first change I looked at was replacing the SDL-specific After that was done, @kcgen and I were discussing the potential benefits of So the algorithm in I added the quick heuristic All of this was done while carefully measuring CPU usage and performance on Windows, Linux (Fedora 33/34 for arm64 and x64), and macOS (M1). |
Quake timedemo at 120k cycles running on an 800 MHz Linux x86-64 PC. Both sets of runs produced identical FPS (39.8, all 10 runs). Branch w/
Main branch w/ spinlock:
Branch delivers the same runtime performance with ~0.34% lower host-side cost. |
@kklobe, thanks for confirming that yield() boils down to platform-specific cooperative calls (like nanosleep) that @dreamer mentioned.
Benchmarks on my side are all good: at par or slightly lower CPU overhead to deliver the same performance.
Formatting and commit message are good.
I think this is OK to merge. @dreamer, what do you think?
Will plan to give it another day or so and then go-ahead.
Thanks for this improvement, @kklobe! |
@kklobe Thanks! Next time, please follow this guideline when it comes to commit messages: https://chris.beams.io/posts/git-commit/ @kcgen gentle reminder that reviewing the commit message is part of the review. Aside from that, my previous worries about spinlock were a bit overblown: this code does not use a spinlock as a thread synchronization mechanism after all; it's more akin to active waiting. We should still find a good solution using a proper high-resolution timer and an appropriate API (maybe |
I would appreciate specific feedback on the commit message, because I thought I followed the guidelines. I believe the subject line meets #1-#5, and I described what I did and why in the body. I'm interested in good messages and would like to know how I could have made that better.
These are legitimate concerns, and it was me correcting an oversight from what I originally had in the first changes. In general (and especially in embedded / firmware, which is most of my dev work), you never want to "busy wait" without giving the CPU a chance to do something else. For example, on a Cortex M4, you use WFE or WFI to avoid eating 100% CPU while you wait, so this wasn't some cargo-cult change I slapped in for no reason, but was based on a reasonable amount of past experience. From the cppreference link I included in the original commit, this type of situation is one big reason why yield() was added to the standard, so I had no reason to distrust it, but I do also like that this is a single commit, so we can bisect if we do run into issues. Thank you - I appreciate the detailed feedback! |
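For readers following along, a minimal sketch of the "busy wait, but give the CPU a chance to do something else" pattern looks roughly like this. It is not the project's DelayPrecise implementation; `spin_until` and `spin_for` are hypothetical names used only for illustration.

```cpp
#include <chrono>
#include <thread>

// Hypothetical sketch (not the project's code): wait until a deadline
// by polling a monotonic clock, yielding the time slice between polls.
// Returns how many times the loop body ran.
inline long spin_until(std::chrono::steady_clock::time_point deadline)
{
    long iterations = 0;
    while (std::chrono::steady_clock::now() < deadline) {
        // A hint to the scheduler that other ready threads may run now.
        // What it actually does is implementation-defined.
        std::this_thread::yield();
        ++iterations;
    }
    return iterations;
}

// Convenience wrapper: spin for a relative duration.
inline long spin_for(std::chrono::steady_clock::duration duration)
{
    return spin_until(std::chrono::steady_clock::now() + duration);
}
```

Calling `spin_for(std::chrono::microseconds(500))` returns once at least 500 µs have passed. Because `yield()` is only a hint, the iteration count and CPU cost will differ between platforms and schedulers, which is why measuring on each target matters.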
It's rule 6: wrap your commit message body at 72 columns. When you're writing your commit message in vim, the editor will wrap the lines automatically for you :) Avoid using the GitHub interface for commit message editing. |
Do you think we should cherry-pick this change to release branch for 0.77.1 release? |
Sorry @kklobe, I thought the commit only had a title. GitHub didn't give me the usual At least it's currently indicating properly: I was happy with the commit title - and certainly reviewed that aspect. Next time I won't trust the presence of the dots (or I'll inspect the branch locally with |
On that topic - does anyone know of a work-around to make vscodium wrap commit messages at the 72 column boundary? (There is an open ticket, but no progress microsoft/vscode#2718). I currently do my git operations and builds on the console using first-letter aliases (gists: git & meson), and use |
The DOSBox authors (and the code) expect ongoing slippage where each 1ms tick through the emulator actually is much more than 1ms.
The existing DOSBox code expects this slippage by measuring and adjusting for it after all of the above have happened. It then adds a small number of ticks (between 2 and 5, if I recall), to force DOSBox to always stay ahead of the ragged-edge by just a couple ms. This PR only addresses the third bullet - trying to make the 1ms sleep really 1ms instead of 1.2ms. It's a bit like running DOSBox on an RT-patched Linux kernel and giving the process FIFO scheduling (which I've done.. timing becomes highly accurate). Given DOSBox already handles large variations in delay inaccuracies, this move toward more accurate 1-ms delaying just moves the error bars inward a bit more.
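The "1ms sleep that is really 1.2ms" effect can be observed with a small measurement like the one below. This is only a sketch under my own assumptions: `measure_overshoot` is a hypothetical helper, not something in DOSBox, and the amount of overshoot depends entirely on the OS, scheduler, and system load.

```cpp
#include <chrono>
#include <thread>

// Hypothetical helper (not from DOSBox): sleep for the requested time
// and report how much longer than requested the sleep actually took.
inline std::chrono::microseconds measure_overshoot(std::chrono::microseconds requested)
{
    using clock = std::chrono::steady_clock;

    const auto start = clock::now();
    std::this_thread::sleep_for(requested); // blocks for at least 'requested'
    const auto elapsed = clock::now() - start;

    // Difference between what was asked for and what was delivered.
    return std::chrono::duration_cast<std::chrono::microseconds>(elapsed - requested);
}
```

Calling `measure_overshoot(std::chrono::milliseconds(1))` in a loop and averaging the results gives a rough picture of the slippage a given platform adds to each 1ms tick.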
I did extensive testing of all audio devices, all cores, all machine-types prior to 0.77, including running tests for all of the video modes (vgatest.exe.zip).
Couldn't hurt! I'll add it. I think there are a couple more little fixes we've done that could go into 0.77.1 now too. |
In progress: #1138 Will merge when it passes CI. |
A fair amount of time is spent in the spinlock on systems that use DelayPrecise, so add the yield hint for systems that can take advantage of it. Example: https://en.cppreference.com/w/cpp/thread/yield
The yield was causing problems on Win32, so I had removed it initially, but after adding CanDelayPrecise(), which effectively bypasses Win32, we can add it back in and test.
This probably won't make much difference for manycore systems, but since spinlocking is relatively evil to begin with, the yield makes it a more well-behaved citizen, especially on lower-core systems.
I tested the number of loop iterations in the spinlock (M1 MacBook):