ruby: Add Metal display backend (#1431)
This PR adds a Metal display backend to ares.

<img width="1482" alt="Screenshot 2024-03-31 at 7 02 40 PM" src="https://github.com/ares-emulator/ares/assets/6864788/48718f19-9916-491b-8301-c16b95f95c19">

*ares N64 core on a 60Hz P3 display. Shader: "crt-maximus-royale".*

## Context

With ares's recent introduction of librashader, ares users have enjoyed access to the sizable library of shaders in the slang format, such as those used by the RetroArch frontend. Until now, the CGL OpenGL backend shipped by ares for Mac users has offered enough functionality to do the (relatively) simple job of an ares display backend: display software-rendered frame buffers and pace them appropriately. librashader's advanced postprocessing, however, asks a bit more of the display backend and, in the case of OpenGL, has laid bare various deficiencies in macOS's OpenGL driver. These deficiencies can kneecap ares's librashader support on macOS with compiler errors and broken shader behavior.

Rather than chase down these errors and try to work around what is fundamentally a broken driver shipped by Apple, we take advantage of librashader's native Metal support by adding a Metal backend to ares. This greatly increases baseline shader compatibility, with the added benefit of better documentation and more platform debugging information when issues arise. The Metal backend will also generally future-proof ares on macOS, given OpenGL's uncertain future on the platform.

## Basics

The first iteration of this Metal driver mostly offers feature parity with the OpenGL driver. More advanced features, particularly in the realm of frame pacing on ProMotion displays, will arrive in future iterations. The priority with this PR is to start getting the driver into the hands of users with basic features and greater librashader compatibility. Host refresh rate sync is still a work in progress and in this iteration will only be enabled for users on macOS versions above 10.15.4 with non-VRR displays (discussed more below).

Explicit sRGB color matching is offered as a new option for those on wide gamut displays (by default, the Metal driver will map to the native color space, conforming to OpenGL driver behavior). This option is only exposed on macOS, since other operating systems lack per-surface color matching.

<img width="400" alt="Screenshot 2024-03-31 at 6 37 38 PM" src="https://github.com/ares-emulator/ares/assets/6864788/e177a256-bba2-4a02-ad5b-9ad84cb09740"> <img width="400" alt="Screenshot 2024-03-31 at 6 37 46 PM" src="https://github.com/ares-emulator/ares/assets/6864788/dd517394-e97e-40f6-98de-4562d62bda98">

*Left: ares presenting in Display P3. Right: ares presenting more accurately in sRGB with "Force sRGB" enabled. Shader: crt-hyllian.*

A simple vertex and fragment shader are compiled at runtime, so we avoid the need to add another compiler toolchain to the ares build process. There is unfortunately some code duplicated at present; `metal.cpp` and `Shaders.metal` both include types defined in `ShaderTypes.h`, but we also need to place `Shaders.metal` inside the .app bundle for runtime compilation, making for awkward `#include`s. Presently, we just bundle a copy of `Shaders.metal` appended with `ShaderTypes.h`, inside `desktop-ui/resource`. This will be cleaned up in future work.

The driver's draw implementation itself is fairly simple: one render pass renders to an offscreen texture, librashader performs its work on that offscreen texture, and a second ares Metal render pass composites the finished texture inside ares's viewport. Since we do not use `MTKViewDelegate`, our output function finishes with a `[view draw]` call and the system presents at the earliest opportunity.

## Details

Implementing this driver surfaced some nontrivial issues. Some of these will need solving in separate PRs before this driver is feature-complete.

### ares vs. VRR

Users on fixed refresh rate displays should enjoy good frame pacing with this driver. Unfortunately, users of more recent Mac machines with "ProMotion" refresh rates will not have an ideal experience in terms of pacing. To understand why, we need to take a brief detour into how ares works and then discuss some current limitations with ares's macOS integration.

In ares's "synchronize to audio" mode, ares creates and delivers video frames as audio frames are created. This means that the video frame timing is completely dependent on when exactly the audio driver processes audio frames. For display modes with a refresh rate at or close to the core refresh rate, this is mostly no problem; the system seems to naturally present frames in a FIFO-esque fashion, and every once in a while the system will just drop or duplicate a frame if two or no draw calls fall within one refresh interval.

For more recent Mac displays with an "up to 120Hz" refresh rate, the story is more complicated. We have to explicitly tell the system when we want it to draw the frame once available. It is tempting to answer "now"; after all, if our audio timings are correct, then video frames should be generated precisely when they need to be shown. Unfortunately, in higher latency modes of OpenAL or with SDL audio in general on macOS, audio frames are processed in large batches.
That means that we end up emitting several video frames in quick succession at 8ms intervals on a 120Hz display, then waiting as long as 75ms for a new batch of audio (and thus video) frames:

<img width="265" alt="Screenshot 2024-03-31 at 4 55 25 PM" src="https://github.com/ares-emulator/ares/assets/6864788/69f887a0-8ded-4f47-ac4c-4cf2b58283a7"> <img width="265" alt="Screenshot 2024-03-31 at 4 55 35 PM" src="https://github.com/ares-emulator/ares/assets/6864788/2f7a417a-0378-44b5-b0c0-f894542c0371"> <img width="265" alt="Screenshot 2024-03-31 at 4 55 56 PM" src="https://github.com/ares-emulator/ares/assets/6864788/13b471d9-3e77-45cd-baf4-0d786ae83e22">

*VRR macOS frame pacing across audio driver settings in v0.1 of the Metal driver. The graph in blue shows frame present intervals over time; the values in red show the minimum and maximum present intervals over the graph duration.*

If we do not answer "now," we have to decide when to present. Unfortunately, there is currently no satisfying way to answer that question. Core refresh rates vary somewhat widely, sometimes during runtime, and there is no mechanism in ares by which to inform the graphics driver of a core's desired refresh rate. We could elect to just duplicate the behavior for a fixed display refresh rate and pick, e.g., 60Hz, but unfortunately even that option is not available, because we currently have no way of receiving callbacks when a frame is actually presented. Why not `CAMetalDisplayLink` or even `MTKViewDelegate`, you ask? Well...

### ares vs. macOS

For most of its cores, ares performs much of its work on a single dedicated main thread that blocks for audio and video presentation to drive hardware-accurate timing. Unfortunately, all of this work occurs on the macOS main thread, with lots of blocking and CPU-intensive activity. This interferes with the macOS application run loop's ability to perform its callbacks and call out to observers.
In practice, this means that system delegates that could send a callback when a frame is presented, or tell ares the exact moment a frame needs to be presented, cannot actually make these calls in time between bursts of ares's main thread activity; upwards of 50% of `MTKViewDelegate` callbacks are lost, for example. Tools like `MTKViewDelegate` or `CAMetalDisplayLink` that would help us solve the frame pacing problem are therefore, unfortunately, useless to us as ares is currently architected.

To get around these issues, we will need one of two things: less main thread blocking, so that delegates can interface with ares on the main thread, or an audio driver with a processing tolerance that falls within the display's minimum refresh interval. Our best bet for now is to emit frames to the system within ares's main thread work as they become available, let the system draw them as it will, and hope that our audio driver is doing a good job pacing them. In practice, Metal driver users on VRR displays that cannot be set to a fixed rate should use the OpenAL driver and set its latency to the lowest value possible.

## Future Work

The future for the Metal driver in ares takes us down a few different paths. The main issue at present is making macOS system delegates work well with ares, which is the ideal path forward. Ideally, we could move all of the emulation-intensive work off of the main thread in macOS and into a high priority dedicated thread, reserving the main thread for actual UI and rendering and giving the system plenty of headroom with which to communicate.

For the future of VRR in ares, it would be good to create a mechanism to tell the graphics driver what refresh rate the core wants to present at. This would be one way to pace draw calls appropriately in the absence of reliable feedback from the system about the state of the display.
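To make the idea of pacing by core refresh rate concrete, here is a minimal, self-contained C++ sketch. It is a simulation only, not code from this PR or from ares: `burstyEmitTimes`, `pacePresentTimes`, and `intervalRange` are hypothetical names, and the burst pattern is an assumed stand-in for the batched audio behavior described above. It assumes the driver is told the core's refresh rate and spaces presents one core interval apart, never scheduling a frame before the core has emitted it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical simulation input, not ares code: frame emit times (ms) when the
// audio driver wakes once per batch and the core emits a burst of frames.
std::vector<double> burstyEmitTimes(int batches, int framesPerBatch,
                                    double burstGapMs, double batchPeriodMs) {
  std::vector<double> emitMs;
  for (int b = 0; b < batches; ++b) {
    for (int f = 0; f < framesPerBatch; ++f) {
      emitMs.push_back(b * batchPeriodMs + f * burstGapMs);
    }
  }
  return emitMs;
}

// Hypothetical pacer: space presents one core refresh interval apart, but
// never schedule a frame before the core has emitted it.
std::vector<double> pacePresentTimes(const std::vector<double>& emitMs,
                                     double coreHz) {
  assert(coreHz > 0.0);
  const double intervalMs = 1000.0 / coreHz;
  std::vector<double> presentMs;
  presentMs.reserve(emitMs.size());
  for (double emitted : emitMs) {
    const double target =
        presentMs.empty() ? emitted : presentMs.back() + intervalMs;
    presentMs.push_back(std::max(target, emitted));
  }
  return presentMs;
}

// Smallest and largest gap between consecutive timestamps.
std::pair<double, double> intervalRange(const std::vector<double>& timesMs) {
  assert(timesMs.size() >= 2);
  double lo = timesMs[1] - timesMs[0];
  double hi = lo;
  for (std::size_t i = 2; i < timesMs.size(); ++i) {
    const double gap = timesMs[i] - timesMs[i - 1];
    lo = std::min(lo, gap);
    hi = std::max(hi, gap);
  }
  return {lo, hi};
}
```

For a 60Hz core emitting batches of four frames 8ms apart, `intervalRange(burstyEmitTimes(3, 4, 8.0, 4 * 1000.0 / 60.0))` spans 8ms to roughly 42.7ms, while the same frames passed through `pacePresentTimes(..., 60.0)` come out at a steady interval of roughly 16.7ms. The cost is added latency for frames late in each burst (the fourth frame is emitted at 24ms but presented at 50ms), which is a tradeoff a real implementation would need to weigh.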

It has gone unmentioned so far due to the other issues, but in the long term it would also be good for ares or librashader to have some way of utilizing the entire viewport for shaders; currently, shaders are limited to the output width and height area rather than the entire window view size. This is limiting for "bezel"-style shaders that want to use the entire screen in fullscreen, for example.

Co-authored-by: jcm <butt@butts.com>