New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CoreTiming: Throttle Before Every Event Using std::chrono #11348
Conversation
|
Does this cause any kind of increased latency? |
|
@JMC47 I could be wrong but I do believe that it reduces latency. Basically the old throttling method would run every 1ms, and it would either wait 1ms or not in order to maintain speed. This effectively made the timing random. The new throttling looks at how long each event should take, and if it is faster than it should, it waits. If it is slower, it will eat into the next events time. This basically means that each event is happening at the EXACT time that it should, rather than spread out through the weird throttle function. |
759621f
to
0b5547a
Compare
3b0c9b9
to
9416a95
Compare
|
It looks like this fixes #11326 (comment), but it would be good to extract that fix into a standalone pull request so that it can be included in a beta release. |
024ecae
to
15455dd
Compare
|
This branch also contains additions that more optimally fit the performance information on the screen. It will now stack the information vertically when not using the performance graph. It also has a new counter which is equally as important as Speed, called Max Speed. This counts the amount of real cpu time compared to the emulated CPU time and tells you what your theoretical max speed would be if running at an uncapped framerate. This allows you to create a performance buffer by seeing how much lag your system could endure before you saw a stutter. This information was calculated before, but was less useful because of the sloppiness in the throttling. |
15455dd
to
8c7fd53
Compare
Source/Core/Core/CoreTiming.cpp
Outdated
| // timings on its own and we are safe to engage in VSync if we want. | ||
| if (m_throttle_disable_vsync) | ||
| { | ||
| WARN_LOG_FMT(COMMON, "System Lag Resolved! VSync Capability Reenabled!"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't thisbe an INFO_LOG?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure but I an definitely update it.
bc0cbca
to
667806a
Compare
|
Can you please split out the refactoring from the actual change here? I have a hard time telling what is relevant and what is just noise from changing |
Source/Core/Core/CoreTiming.cpp
Outdated
| @@ -305,6 +310,8 @@ void CoreTimingManager::Advance() | |||
| Event evt = std::move(m_event_queue.front()); | |||
| std::pop_heap(m_event_queue.begin(), m_event_queue.end(), std::greater<Event>()); | |||
| m_event_queue.pop_back(); | |||
|
|
|||
| Throttle(evt.time, emulation_speed); | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is the right place for the Throttle() call. The way CoreTiming works -- from what I remember, anyway -- is that CoreTiming::Advance() is called periodically by the JIT, and by the time it gets control multiple events could have logically already happened but haven't been executed yet (which this loop then executes in order to 'catch up' the event state). So by the time this code runs, we're already at m_globals.global_timer, which makes it odd to only throttle until evt.time, which may be earlier than current -- and also means you may throttle multiple times in quick succession when once would have sufficed.
It probably makes more sense to put a Throttle(m_globals.global_timer, emulation_speed) after this loop instead.
This is also most likely the most significant change here, so you should probably mention it in the PR title and description.
e: Actually... well, logically from the CPU we're at m_globals.global_timer, but in terms of real-world time we're likely ahead of where we want to be, so I can see the logic of throttling here if the event itself is user-observable in some way (like, say, presentation of the rendered image). So I guess this is probably fine as-is.
In the future it might also make sense to introduce a flag for an event whether it wants to execute as close to the correct real time as possible or not, since I suspect most events don't actually care.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I understand the CPU doesnt actually need to throttle, but throttling here just makes sure that every event happens exactly when it's supposed to. The way the old throttler worked is that it created an event every GetTicksPerSecond() / 1000 that would sleep for about 1ms depending on the timer. This one is a little bit more precise. (having it here is also used in a future PR which uses this throttler to detect lag and skip VI's if needed)
163506e
to
73c5d25
Compare
Wait... as far as I can tell, And the nanosleep man page guarantees that it will sleep for AT LEAST that time. If anything, it provides a guarantee that it will oversleep whenever the OS's scheduler feels like it.
Throttling before every single event is extremely excessive, will probably result in worse latency, and will not improve actual timing accuracy. Might actually be very detrimental for performance on lower-end devices. Your current implementation tries to make every single event happen somewhat close to the correct time. But that's kind of pointless when most events don't have any side effects which the user can even observe. Ideally, you only want to throttle to make sure the Input/Output events. It doesn't matter if the emulator races though to complete rendering of half the frame as fast as possible, as long as it stops before reading input or presenting the result to screen (or audio, though we delay that) |
Sorry for the delay in response. I modeled this after the original throttling behavior. The previous throttle ran at 1000hz, and would sleep in 1ms increments. It didn't matter if there was an observable event or not, it would just sleep at different intervals at 1000hz. However because it was only able to sleep in increments of 1ms, it ended up sleeping a jittery, unpredictable way. What I meant by guaranteed to return before is that, if I request to sleep for 10µs, and windows is only able to sleep in 1ms increments, it will skip sleeping. I understand the concerns about performance, however from testing (the SkipVI branch uses this branch) I believe that there has not been a substantial drop in performance. This is likely because if you are lagging, then this whole process of sleeping is already ignored. It is probably smart to only throttle if the event requires it, however that is a little out of the scope of this PR. I could however add a "MIN_SLEEP" variable that will skip the sleep call if it is less than say 1ms, which would end up calling sleep at similar increments to the original throttling system, with the added benefits that this PR also brings. I was also considering completely disabling this throttling system when VSync is active to use that for timing instead of this, but I don't know if there is a good way to ensure that VSync is running at the right speed. |
0a459b1
to
8364f4c
Compare
968dd29
to
93d428b
Compare
I thought I had read it somewhere but looking into it further, that will actually not effect this PR at all. If the throttling oversleeps by say 5ms, this system will simply not call sleep again until it has caught up to the timings. The tolerance it has for this is +/- 40ms. So it will dynamically adapt to the best resolution that the system can provide. It is an operating system thing, and while it might not provide the same level of consistency as macos, having throttling in this part of the code makes life a lot easier, especially for the VISkip branch. It also is able to provide a lot more accurate estimated performance speed, which it was unable to do previously. It also opens the door to using multimedia timers if timing is that important, but I find this unlikely. On systems like MacOS and Linux that support nanosleep, calling sleep more often will probably be able to give the scheduler more appropriate times to hand control back, however the effect on performance is unknown. If calling sleep alot does cause issues, it is very simple to add a 1ms min sleep requirement to limit the sleep calls to 1000hz at the expense of accuracy. |
9a57866
to
bb629e9
Compare
|
Okay, how about this:
Does this still give you the 'good' frame times? |
|
Im going to test this right now, but I worry about how it might create issues that didnt exist before due to it being too different. But i will comment the results of doing what you said. |
|
If you check the discord youll see that the timings are worse if we only throttle in those places, and it seems like the performance difference is negligible. (it seems to show higher performance when throttling before every event) |
…actorings Performance tracker refactorings extracted from PR #11348.
|
shrug Weird, but if it's like that then it's like that. I'm not sure what else to do then. |
bb629e9
to
49a05d2
Compare
| if (stack_vertically) | ||
| window_y += window_height + window_padding; | ||
| else | ||
| window_x -= window_width + window_padding; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These commits are mainly just moving things around since the Improved Pacing allows us to get some new data points
49a05d2
to
bc1cc9e
Compare
|
Anyway, I remain unconvinced that this is a positive change in all instances, but at the same time I'm not gonna block @JMC47 or whoever from merging this if they really want it. It's not like we can't change it back if it turns out to cause problems... |
|
I agree that it shouldnt be regarded as final, but it is certainly more flexible that previously. (the changes you suggested were not possible with the old timing system) |
adfc882
to
f7de0e4
Compare
|
Overall, I'm happy with the code. It's a good cleanup and an improvement over what was there previously.
This is a dubious claim, and certainly not a guarantee. The only reason why windows will skip sleeping is that 10µs rounds down to 0ms, which skips sleeping. On other operating systems with Relying on rounding down to 0 is a weird edge case. If you actually want that behaviour, perhaps you should be implementing it in Throttle instead? Measure how long calls to nanosleep/sleep actually take (compared to requested time) to determine what timer resolution the OS is actually giving you, and then only call nanosleep only call Maybe on windows, we we should be switching to using The actual code is fine, but can you remove any mention of
Sigh. Discord isn't good for findability. In future, it would be helpful to duplicate any relevant infomation into the PR, rather than making me dig though discord history for it: That result doesn't feel right to me. When are you actually measuring frametime here? Is it in VI's present, or when the GPU finishes rendering the frame? The reason why I'm a bit concerned is that there is no guarantee that So because of concern about potential performance regressions, it's worth trying to reduce the number of calls to Clock::now(). And there is absolutely no reason to have every single event with accurate timing. Accurate timing only matters for input and video/sound out. Throttling before every single event seems like massive overkill, and I worry it might cause performance on certain OS configurations or cases where the hardware was already marginal. Actual requests:
|
|
@phire I updated the PR to reflect that the guarantee does not really exist. I must have misread that somewhere. However if it does happen to oversleep, it quickly compensates for that by not sleeping until the timings catch up. It can account for ~40ms of oversleep before there is a slow down. I believe that the reason that the timing are more accurate when sleeping before every event just comes down to the fact that its sleeping by small amounts more often, and is able to correct for the small errors in the sleep. I implemented the minimum sleep feature, but did it in a way that only calls Hopefully this resolves your concerns! |
6c29d1a
to
e95c1d5
Compare
|
@phire and @AdmiralCurtiss said I could merge this. We'll hopefully get some more testing on a wider variety of machines with this now. |






What it does
std::this_thread::sleep_untilCoreTimingManagernow happen the instant they are supposed toComparisons
This commit can turn these frame times (Wii Sports Resort):

Into these frame times (Wii Sports Resort - Immediate XMB ON): (just stunning)

Wii Sports Resort - Immediate XMB OFF: (notice that vblanks have an std of 0ms)

Here is with the metal frame time viewer:
Before:

After:
