Macos Optimization #576

Merged
merged 1 commit into from Dec 15, 2022

Conversation

Tillsunset
Contributor

@Tillsunset Tillsunset commented Dec 15, 2022

clock_gettime_nsec_np is an optimization of clock_gettime: in some testing it needed roughly 90% of the time. It is specific to macOS, though.
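For readers unfamiliar with the two calls, here is a minimal sketch of the difference. It is not the code from this PR: `monotonicNowNs` is a hypothetical helper name, and the non-macOS branch is only a portable fallback so the snippet compiles elsewhere.

```cpp
#include <cassert>
#include <cstdint>
#include <time.h>

// Hypothetical helper (not Cemu's actual code): monotonic time in nanoseconds.
inline uint64_t monotonicNowNs()
{
#if defined(__APPLE__)
    // macOS-only call: returns nanoseconds directly, so no timespec
    // has to be filled in and converted.
    return clock_gettime_nsec_np(CLOCK_MONOTONIC_RAW);
#else
    // Portable fallback: fill a timespec, then convert it to nanoseconds.
    timespec ts{};
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc; // silence unused-variable warnings when NDEBUG is defined
    return static_cast<uint64_t>(ts.tv_sec) * 1000000000ull
         + static_cast<uint64_t>(ts.tv_nsec);
#endif
}
```

Calling `monotonicNowNs()` twice and subtracting the results gives an elapsed time in nanoseconds on either branch.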

CLOCK_MONOTONIC_RAW_APPROX is another potential optimization for macOS, needing roughly 50% of the time compared to CLOCK_MONOTONIC_RAW, but it typically only guarantees 1 millisecond accuracy. Its potential usage and overall speedup are extremely debatable. I could see it potentially being used in GetTickCount() and tick_cached().
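To illustrate what such a usage could look like, here is a rough sketch of a millisecond tick built on the approximate clock. `coarseTickMs` is a hypothetical name, not an existing Cemu function, and the Linux branch uses CLOCK_MONOTONIC_COARSE purely as a stand-in so the example builds outside macOS.

```cpp
#include <cstdint>
#include <time.h>

// Hypothetical helper: a cheap, coarse monotonic tick in milliseconds.
inline uint64_t coarseTickMs()
{
#if defined(__APPLE__)
    // Approximate clock: cheaper to read, but only roughly millisecond-accurate.
    return clock_gettime_nsec_np(CLOCK_MONOTONIC_RAW_APPROX) / 1000000ull;
#else
    timespec ts{};
#if defined(CLOCK_MONOTONIC_COARSE)
    // Linux stand-in: faster, lower-precision variant of CLOCK_MONOTONIC.
    clock_gettime(CLOCK_MONOTONIC_COARSE, &ts);
#else
    clock_gettime(CLOCK_MONOTONIC, &ts);
#endif
    return static_cast<uint64_t>(ts.tv_sec) * 1000ull
         + static_cast<uint64_t>(ts.tv_nsec) / 1000000ull;
#endif
}
```

Since the result is truncated to milliseconds anyway, the reduced accuracy of the approximate clock would not be visible to callers of a tick function like this.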

clock_getres also always returns 1 nanosecond. I'm not sure if that is the intended purpose when used in HighResolutionTimer::getFrequency(). Feedback needed.

@Tillsunset Tillsunset marked this pull request as draft December 15, 2022 05:29
@Exzap
Contributor

Exzap commented Dec 15, 2022

Thanks for looking into this. I think some additional context and history on our timer functions would help.

First, we consider all time sources except for HighResolutionTimer to be deprecated. Using std::chrono::steady_clock for code where performance isn't a high priority is ok, but personally I would discourage its use in critical code because it can be difficult to predict the latency and resolution of the various implementations.

Before HighResolutionTimer was introduced, we had tick_cached and now_cached as a workaround for Microsoft's slow implementation of std::chrono::steady_clock::now() (and high_resolution_clock). As far as I know, this has been resolved by now, but other implementations might still be slow.

Later, we added PPCTimer as a timer source specifically for CPU emulation, where we need maximum performance and resolution, with a small amount of drift acceptable. This made it less important for tick_cached and now_cached to be as fast as possible, since they were no longer frequently called on the hot path.

Pretty recently we then introduced HighResolutionTimer with the goal of providing a unified, simple, high-performance, high-accuracy, and high-resolution interface for our timing needs. In the long term, the goal is to replace all other timers with HighResolutionTimer.

Anyway, your PR looks good. As mentioned, there isn't really a need to hyper-optimize tick_cached/now_cached anymore, so using something like CLOCK_MONOTONIC_RAW_APPROX would be overkill.

> clock_getres also always returns 1 nanosecond. I'm not sure if that is the intended purpose when used in HighResolutionTimer::getFrequency(). Feedback needed.

That matches the man page and expected behavior. The resolution is 1 ns (a frequency of 10^9 ticks per second), and clock_gettime_nsec_np returns the time in nanoseconds. On Windows the frequency is dynamic, but in theory for macOS we could leverage the fact that it's fixed and define m_freq as a compile-time constant. The performance gain would likely be pretty insignificant, though.
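As a sketch of the two options described here (querying the resolution at runtime versus hard-coding it), something like the following could work. All names are hypothetical and this is not how HighResolutionTimer is actually written; it only assumes that the tick unit is nanoseconds when the constant is used.

```cpp
#include <cstdint>
#include <time.h>

constexpr uint64_t kNanosecondsPerSecond = 1000000000ull;

// Runtime variant: derive ticks per second from the resolution the OS reports.
inline uint64_t queryTimerFrequency()
{
    timespec res{};
    if (clock_getres(CLOCK_MONOTONIC, &res) != 0 || res.tv_sec != 0 || res.tv_nsec <= 0)
        return kNanosecondsPerSecond; // assumption: fall back to nanosecond ticks
    return kNanosecondsPerSecond / static_cast<uint64_t>(res.tv_nsec);
}

// Compile-time variant: valid only where ticks are known to be nanoseconds,
// as is the case for the values returned by clock_gettime_nsec_np.
constexpr uint64_t kFixedTimerFrequency = kNanosecondsPerSecond;

// Convert a tick count to microseconds for a given frequency (ticks per second).
// Whole seconds and the remainder are handled separately to avoid overflow.
constexpr uint64_t ticksToMicroseconds(uint64_t ticks, uint64_t freq)
{
    return (ticks / freq) * 1000000ull + (ticks % freq) * 1000000ull / freq;
}

static_assert(ticksToMicroseconds(2500000000ull, kFixedTimerFrequency) == 2500000ull,
              "2.5 s of nanosecond ticks is 2,500,000 microseconds");
```

With the constant, conversions like `ticksToMicroseconds` can be folded at compile time, although the saving over one cached runtime query is likely small.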

@Tillsunset Tillsunset marked this pull request as ready for review December 15, 2022 07:52
@Exzap Exzap merged commit 058d11b into cemu-project:main Dec 15, 2022