ruby: Add Metal display backend (#1431)
This PR adds a Metal display backend to ares.

<img width="1482" alt="Screenshot 2024-03-31 at 7 02 40 PM"
src="https://github.com/ares-emulator/ares/assets/6864788/48718f19-9916-491b-8301-c16b95f95c19">

*ares N64 core on a 60Hz P3 display. Shader: "crt-maximus-royale".*

## Context

With ares's recent introduction of librashader, ares users have enjoyed
access to the sizable library of shaders in the slang format, such as
those used by the RetroArch frontend. Until now, the CGL OpenGL backend
that ares ships for Mac users has offered enough functionality to do the
(relatively) simple job of an ares display backend: display
software-rendered frame buffers and pace them appropriately.

librashader's advanced postprocessing, however, has asked a bit more of
the display backend, and in the case of OpenGL, has laid bare various
deficiencies in macOS's OpenGL driver. These deficiencies can kneecap
ares's librashader support on macOS with compiler errors and broken
shader behavior.

Rather than chase down these errors and try to work around what is
fundamentally a broken driver shipped by Apple, we take advantage of
librashader's native Metal support by adding a Metal backend to ares.
This greatly increases baseline shader compatibility, with the added
benefit of documentation and greater platform debugging information when
issues arise. The Metal backend also helps future-proof ares on macOS,
given OpenGL's uncertain future on the platform.

## Basics

The first iteration of this Metal driver mostly offers feature parity
with the OpenGL driver. More advanced features, particularly in the
realm of frame pacing on ProMotion displays, will arrive in future
iterations. The priority with this PR is to start getting the driver in
the hands of users with basic features and greater librashader
compatibility.

Host refresh rate sync is still a work in progress and in this iteration
will only be enabled for users on macOS versions later than 10.15.4 with
non-VRR displays (discussed more below).

Explicit sRGB color matching is offered as a new option for those on
wide gamut displays (by default, the Metal driver will map to the native
color space, conforming to OpenGL driver behavior). This option is only
exposed on macOS, since other operating systems lack per-surface color
matching.
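
The toggle is gated by a per-driver capability query (`hasForceSRGB()`) rather than by a driver-name check. A minimal sketch of that pattern follows; `DriverSketch`, `CGLSketch`, `MetalSketch`, and `toggleEnabled` are illustrative stand-ins, not the ruby classes themselves, and the Metal driver returning `true` is an assumption based on the description above:

```cpp
// Simplified stand-in for ruby's video driver interface; only the
// capability query added in this commit (hasForceSRGB) is modeled.
struct DriverSketch {
  virtual ~DriverSketch() = default;
  virtual auto hasForceSRGB() const -> bool { return false; }
};

// The CGL driver reports no support, as in ruby/video/cgl.cpp.
struct CGLSketch : DriverSketch {
  auto hasForceSRGB() const -> bool override { return false; }
};

// Assumption: the Metal driver reports support for the option.
struct MetalSketch : DriverSketch {
  auto hasForceSRGB() const -> bool override { return true; }
};

// The settings UI enables the toggle only when the driver reports support.
auto toggleEnabled(const DriverSketch& driver) -> bool {
  return driver.hasForceSRGB();
}
```

With this shape, the settings panel does not need to know which driver is active; it only asks whether the option applies.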

<img width="400" alt="Screenshot 2024-03-31 at 6 37 38 PM"
src="https://github.com/ares-emulator/ares/assets/6864788/e177a256-bba2-4a02-ad5b-9ad84cb09740">
<img width="400" alt="Screenshot 2024-03-31 at 6 37 46 PM"
src="https://github.com/ares-emulator/ares/assets/6864788/dd517394-e97e-40f6-98de-4562d62bda98">

*Left: ares presenting in Display P3. Right: ares presenting more
accurately in sRGB with "Force sRGB" enabled. Shader: crt-hyllian.*

A simple vertex and fragment shader are compiled at runtime, so we avoid
the need to add another compiler toolchain to the ares build process.
There is unfortunately some code duplicated at present; `metal.cpp` and
`Shaders.metal` both include types defined in `ShaderTypes.h`, but we
also need to place `Shaders.metal` inside the .app bundle for runtime
compilation, making for awkward `#include`s. Presently, we just bundle a
copy of `Shaders.metal` appended with `ShaderTypes.h`, inside
`desktop-ui/resource`. This will be cleaned up in future work.
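
One hypothetical way to avoid maintaining the bundled copy by hand would be to generate it by replacing the `#include` line with the header's contents. The sketch below is not part of this PR; `inlineHeader` and its parameters are illustrative names only:

```cpp
#include <string>

// Hypothetical helper: returns `shader` with the first occurrence of
// `includeLine` replaced by the text of `header`. If the include line is
// not found, the shader source is returned unchanged.
auto inlineHeader(std::string shader, const std::string& includeLine,
                  const std::string& header) -> std::string {
  auto pos = shader.find(includeLine);
  if(pos == std::string::npos) return shader;
  shader.replace(pos, includeLine.size(), header);
  return shader;
}
```

A build step could call this with the contents of `Shaders.metal`, the line `#include "ShaderTypes.h"`, and the contents of `ShaderTypes.h`, then write the result into the .app bundle.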

The driver's draw implementation itself is fairly simple; one render
pass renders to an offscreen texture, librashader performs its work on
that offscreen texture, and a second ares Metal render pass composites
the finished texture inside ares's viewport. Since we do not use
`MTKViewDelegate`, our output function finishes with a `[view draw]`
call and the system presents at the earliest opportunity.
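
The bundled vertex shader (`vertexShader` in `Shaders.metal`) converts pixel-space vertex positions to clip space by dividing by half the viewport size. The same arithmetic, sketched in plain C++ for illustration (`Float2` and `pixelToClip` are names invented here, not part of the driver):

```cpp
// Stand-in for a two-component float vector.
struct Float2 {
  float x;
  float y;
};

// Mirrors the shader's conversion: clip = pixel / (viewport / 2).
// Pixel positions are measured from the center of the viewport, so a
// position of half the viewport width lands on the clip-space edge (1.0).
auto pixelToClip(Float2 pixel, Float2 viewport) -> Float2 {
  return {pixel.x / (viewport.x / 2.0f), pixel.y / (viewport.y / 2.0f)};
}
```

For a 640x480 viewport, the pixel position (320, -240) maps to the clip-space corner (1, -1).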

## Details

While implementing this driver, we encountered some nontrivial issues.
Some of these will need solving in separate PRs before this driver is
feature-complete.

### ares vs. VRR

Users on fixed refresh rate displays should enjoy good frame pacing with
this driver. Unfortunately, users of more recent Mac machines with
"ProMotion" refresh rates will not have an ideal experience in terms of
pacing. To understand why, we need to take a brief detour into how ares
works and then discuss some current limitations with ares's macOS
integration.

In ares's "synchronize to audio" mode, ares creates and delivers video
frames as audio frames are created. This means that the video frame
timing is completely dependent on when exactly the audio driver
processes audio frames. For display modes with a refresh rate at or
close to the core refresh rate, this is mostly no problem; the system
seems to naturally present frames in a FIFO-esque fashion, and every
once in a while the system will just drop or duplicate a frame if two or
no draw calls fall within one refresh interval.

For more recent, more advanced Mac displays with "up to 120Hz" refresh rates,
the story is more complicated. We have to explicitly tell the system
when we want it to draw the frame once available. It is tempting to
answer "now"; after all, if our audio timings are correct, then video
frames should be generated precisely when they need to be shown.
Unfortunately, in higher latency modes of OpenAL or with SDL audio in
general on macOS, audio frames are processed in large batches. That
means that we end up emitting several video frames in quick succession
at 8ms intervals on a 120Hz display, then waiting as long as 75ms for a
new batch of audio (and thus video) frames:

<img width="265" alt="Screenshot 2024-03-31 at 4 55 25 PM"
src="https://github.com/ares-emulator/ares/assets/6864788/69f887a0-8ded-4f47-ac4c-4cf2b58283a7">
<img width="265" alt="Screenshot 2024-03-31 at 4 55 35 PM"
src="https://github.com/ares-emulator/ares/assets/6864788/2f7a417a-0378-44b5-b0c0-f894542c0371">
<img width="265" alt="Screenshot 2024-03-31 at 4 55 56 PM"
src="https://github.com/ares-emulator/ares/assets/6864788/13b471d9-3e77-45cd-baf4-0d786ae83e22">

*VRR macOS frame pacing across audio driver settings in v0.1 of the
Metal driver. The graph in blue shows frame present intervals over time;
the values in red show the minimum and maximum present intervals over
the graph duration.*
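
The burst pattern can be reproduced with a small model. The sketch below assumes each audio batch releases a fixed number of video frames at once and that the display shows them on consecutive refreshes; the function names and the example numbers are illustrative, not measurements from ares:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical model of "present now" on a VRR display. Each audio batch
// arrives every `batchPeriodMs` and releases `framesPerBatch` video frames,
// which are shown `refreshMs` apart. Assumes the burst finishes before the
// next batch arrives (framesPerBatch * refreshMs < batchPeriodMs).
auto presentIntervalsMs(double batchPeriodMs, int framesPerBatch,
                        double refreshMs, int batches) -> std::vector<double> {
  std::vector<double> times;
  for(int b = 0; b < batches; b++) {
    for(int f = 0; f < framesPerBatch; f++) {
      times.push_back(b * batchPeriodMs + f * refreshMs);
    }
  }
  // Convert present timestamps into the gaps between consecutive presents.
  std::vector<double> intervals;
  for(std::size_t i = 1; i < times.size(); i++) {
    intervals.push_back(times[i] - times[i - 1]);
  }
  return intervals;
}

struct Range {
  double min;
  double max;
};

// Smallest and largest present interval; `intervals` must be non-empty.
auto intervalRange(const std::vector<double>& intervals) -> Range {
  auto mm = std::minmax_element(intervals.begin(), intervals.end());
  return {*mm.first, *mm.second};
}
```

With a 100 ms batch period, 6 frames per batch, and an 8 ms refresh interval, the intervals alternate between runs of 8 ms and a single 60 ms gap, which is the same shape as the minimum and maximum values in the graphs.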

If we do not answer "now," we have to decide when to present.
Unfortunately, there is currently no satisfying way to answer that
question. Core refresh rates vary somewhat widely, sometimes during
runtime, and there is no mechanism in ares by which to inform the
graphics driver of a core's desired refresh rate.

We could elect to just duplicate the behavior for a fixed display
refresh rate and pick, e.g., 60 Hz, but unfortunately even that option is
not available, because we currently have no way of receiving callbacks
when a frame is actually presented. Why not `CAMetalDisplayLink` or even
`MTKViewDelegate`, you ask? Well...

### ares vs. macOS

For most of its cores, ares performs much of its work on a single
dedicated main thread that blocks for audio and video presentation to
drive hardware-accurate timing. Unfortunately, all of this work occurs
on the macOS main thread, with lots of blocking and CPU-intensive
activity. This interferes with the macOS application run loop's ability
to perform its callbacks and call out to observers.

In practice, this means that if we employ system delegates that could
send a callback when a frame is presented, or tell ares the exact moment
a frame needs to be presented, those delegates cannot make their calls
in time between bursts of ares's main thread activity; upwards of 50% of
`MTKViewDelegate` callbacks are lost, for example.

This means that tools like `MTKViewDelegate` or `CAMetalDisplayLink`
that would help us solve the frame pacing problem are, unfortunately,
useless to us. We cannot leverage these tools as ares is currently
architected.

To get around these issues, we will need one of: less main thread
blocking, so delegates can interface with ares on the main thread, or an
audio driver with a processing tolerance that falls within the display's
minimum refresh interval. Our best bet for now is to emit frames to the
system within ares's main thread work as they become available, let the
system draw them as it will, and hope that our audio driver is doing a
good job pacing them.

In practice, for Metal driver users on VRR displays that cannot be set to
a fixed rate, this means using the OpenAL driver with the latency set to
the lowest value possible.

## Future Work

The future for the Metal driver in ares takes us down a few different
paths.

The main issue at present is making macOS system delegates work well
with ares, which is the ideal path forward. Ideally, we could move all
of the emulation-intensive work off of the main thread in macOS and into
a high priority dedicated thread, reserving the main thread for actual
UI and rendering, giving the system plenty of headroom with which to
communicate.

For the future of VRR in ares, it would be good to create a mechanism to
tell the graphics driver what refresh rate the core wants to present at.
This would be one way to pace draw calls appropriately in the absence of
reliable feedback from the system about the state of the display.
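
As a rough illustration of what such a mechanism could enable, the sketch below spaces presents by the core's requested refresh interval regardless of when frames arrive. Everything here is hypothetical (`PacerSketch`, `coreRefreshHz`, and `schedule` do not exist in ares), and a real implementation would also need to bound the added latency:

```cpp
// Hypothetical pacer driven by a core-supplied refresh rate.
struct PacerSketch {
  double coreRefreshHz;        // rate the core asks the driver to present at
  double nextPresentMs = 0.0;  // earliest time the next frame may be shown

  // Returns the time at which a frame arriving at `arrivalMs` should be
  // presented. Frames arriving in a burst are spread out at the core's
  // interval; frames arriving late are shown immediately.
  auto schedule(double arrivalMs) -> double {
    double intervalMs = 1000.0 / coreRefreshHz;
    double presentMs = arrivalMs > nextPresentMs ? arrivalMs : nextPresentMs;
    nextPresentMs = presentMs + intervalMs;
    return presentMs;
  }
};
```

For a 50 Hz core, frames arriving at 0 ms, 1 ms, and 2 ms would be scheduled at 0 ms, 20 ms, and 40 ms, while a frame arriving at 100 ms would be shown right away.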

It has gone unmentioned so far due to the other issues, but long
term, it would also be good for ares or librashader to have some way of
utilizing the entire viewport for shaders; currently, shaders are
limited to the output width and height area rather than the entire
window view size. This is limiting for "bezel"-style shaders that want
to use the entire screen in fullscreen, for example.

Co-authored-by: jcm <butt@butts.com>
jcm93 and jcm committed Apr 1, 2024
1 parent ba31eaf commit 72fa7d3
Showing 17 changed files with 897 additions and 5 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -1,5 +1,6 @@
*.moc
*.user
*.xcuserdata
.vs/
.vscode/
.idea/
@@ -12,3 +13,6 @@ out-amd64/
out-arm64/
thirdparty/SDL/SDL
thirdparty/SDL/libSDL2-2.0.0.dylib
macos-xcode/
.swiftpm
*.xcodeproj
1 change: 1 addition & 0 deletions desktop-ui/GNUmakefile
@@ -129,6 +129,7 @@ endif
cp resource/$(name).plist $(output.path)/$(name).app/Contents/Info.plist
$(call mkdir,$(output.path)/$(name).app/Contents/Resources/Shaders/)
$(call mkdir,$(output.path)/$(name).app/Contents/Resources/Database/)
cp ../ruby/video/metal/Shaders.metal $(output.path)/$(name).app/Contents/Resources/Shaders/Shaders.metal
$(call rcopy,$(thirdparty.path)/slang-shaders/*,$(output.path)/$(name).app/Contents/Resources/Shaders/)
$(call rcopy,$(mia.path)/Database/*,$(output.path)/$(name).app/Contents/Resources/Database/)
sips -s format icns resource/$(name).png --out $(output.path)/$(name).app/Contents/Resources/$(name).icns
2 changes: 1 addition & 1 deletion desktop-ui/presentation/presentation.cpp
@@ -594,7 +594,7 @@ auto Presentation::loadShaders() -> void {

auto location = locate("Shaders/");

-if(ruby::video.driver() == "OpenGL 3.2") {
+if(ruby::video.hasShader()) {
auto files = directory::files(location, "*.slangp");
for(auto file : files) {
MenuCheckItem item{&videoShaderMenu};
82 changes: 82 additions & 0 deletions desktop-ui/resource/Shaders.metal
@@ -0,0 +1,82 @@
//
// Shaders.metal
// ares
//
// Created by jcm on 2024-03-07.
//

#include <metal_stdlib>

#include "ShaderTypes.h"

using namespace metal;

// Vertex shader outputs and fragment shader inputs
struct RasterizerData
{
// The [[position]] attribute of this member indicates that this value
// is the clip space position of the vertex when this structure is
// returned from the vertex function.
float4 position [[position]];

// Since this member does not have a special attribute, the rasterizer
// interpolates its value with the values of the other triangle vertices
// and then passes the interpolated value to the fragment shader for each
// fragment in the triangle.
float2 textureCoordinate;
};

vertex RasterizerData
vertexShader(uint vertexID [[vertex_id]],
constant AAPLVertex *vertices [[buffer(AAPLVertexInputIndexVertices)]],
constant vector_uint2 *viewportSizePointer [[buffer(AAPLVertexInputIndexViewportSize)]])
{
RasterizerData out;

// Index into the array of positions to get the current vertex.
// The positions are specified in pixel dimensions (i.e. a value of 100
// is 100 pixels from the origin).
float2 pixelSpacePosition = vertices[vertexID].position.xy;

// Get the viewport size and cast to float.
vector_float2 viewportSize = vector_float2(*viewportSizePointer);


// To convert from positions in pixel space to positions in clip-space,
// divide the pixel coordinates by half the size of the viewport.
out.position = vector_float4(0.0, 0.0, 0.0, 1.0);
out.position.xy = pixelSpacePosition / (viewportSize / 2.0);

// Pass the input color directly to the rasterizer.
out.textureCoordinate = vertices[vertexID].textureCoordinate;

return out;
}

fragment float4
samplingShader(RasterizerData in [[stage_in]],
texture2d<half> colorTexture [[ texture(AAPLTextureIndexBaseColor) ]])
{
constexpr sampler textureSampler (mag_filter::nearest,
min_filter::nearest);

// Sample the texture to obtain a color
const half4 colorSample = colorTexture.sample(textureSampler, in.textureCoordinate);

// return the color of the texture
return float4(colorSample);
}

fragment float4
drawableSamplingShader(RasterizerData in [[stage_in]],
texture2d<half> colorTexture [[ texture(AAPLTextureIndexBaseColor) ]])
{
constexpr sampler textureSampler (mag_filter::linear,
min_filter::linear);

// Sample the texture to obtain a color
const half4 colorSample = colorTexture.sample(textureSampler, in.textureCoordinate);

// return the color of the texture
return float4(colorSample);
}
9 changes: 9 additions & 0 deletions desktop-ui/settings/drivers.cpp
@@ -39,6 +39,12 @@ auto DriverSettings::construct() -> void {
settings.video.flush = videoFlushToggle.checked();
ruby::video.setFlush(settings.video.flush);
});
#if defined(PLATFORM_MACOS)
videoColorSpaceToggle.setText("Force sRGB").onToggle([&] {
settings.video.forceSRGB = videoColorSpaceToggle.checked();
ruby::video.setForceSRGB(settings.video.forceSRGB);
});
#endif

audioLabel.setText("Audio").setFont(Font().setBold());
audioDriverList.onChange([&] {
@@ -147,6 +153,9 @@ auto DriverSettings::videoRefresh() -> void {
videoFormatList.setEnabled(0 && videoFormatList.itemCount() > 1);
videoExclusiveToggle.setChecked(ruby::video.exclusive()).setEnabled(ruby::video.hasExclusive());
videoBlockingToggle.setChecked(ruby::video.blocking()).setEnabled(ruby::video.hasBlocking());
#if defined(PLATFORM_MACOS)
videoColorSpaceToggle.setChecked(ruby::video.forceSRGB()).setEnabled(ruby::video.hasForceSRGB());
#endif
videoFlushToggle.setChecked(ruby::video.flush()).setEnabled(ruby::video.hasFlush());
VerticalLayout::resize();
}
1 change: 1 addition & 0 deletions desktop-ui/settings/settings.cpp
@@ -56,6 +56,7 @@ auto Settings::process(bool load) -> void {
bind(string, "Video/Format", video.format);
bind(boolean, "Video/Exclusive", video.exclusive);
bind(boolean, "Video/Blocking", video.blocking);
bind(boolean, "Video/PresentSRGB", video.forceSRGB);
bind(boolean, "Video/Flush", video.flush);
bind(string, "Video/Shader", video.shader);
bind(natural, "Video/Multiplier", video.multiplier);
4 changes: 4 additions & 0 deletions desktop-ui/settings/settings.hpp
@@ -11,6 +11,7 @@ struct Settings : Markup::Node {
string format;
bool exclusive = false;
bool blocking = false;
bool forceSRGB = false;
bool flush = false;
string shader = "None";
u32 multiplier = 2;
@@ -334,6 +335,9 @@ struct DriverSettings : VerticalLayout {
CheckLabel videoExclusiveToggle{&videoToggleLayout, Size{0, 0}};
CheckLabel videoBlockingToggle{&videoToggleLayout, Size{0, 0}};
CheckLabel videoFlushToggle{&videoToggleLayout, Size{0, 0}};
#if defined(PLATFORM_MACOS)
CheckLabel videoColorSpaceToggle{&videoToggleLayout, Size{0, 0}};
#endif
//
Label audioLabel{this, Size{~0, 0}, 5};
HorizontalLayout audioDriverLayout{this, Size{~0, 0}};
3 changes: 3 additions & 0 deletions ruby/GNUmakefile
@@ -20,6 +20,7 @@ ifeq ($(ruby),)
ruby += video.cgl
ruby += audio.openal
ruby += input.quartz #input.carbon
ruby += video.metal
ifeq ($(sdl2),true)
macsdl = ../thirdparty/SDL/libSDL2-2.0.0.dylib
ares.dylibs += $(macsdl)
@@ -118,6 +119,8 @@ ifeq ($(platform),windows)
endif

ifeq ($(platform),macos)
ruby.options += -framework Metal
ruby.options += -framework MetalKit
ruby.options += -framework IOKit
ruby.options += $(if $(findstring audio.openal,$(ruby)),-framework OpenAL)
ifeq ($(sdl2),true)
1 change: 1 addition & 0 deletions ruby/video/cgl.cpp
@@ -36,6 +36,7 @@ struct VideoCGL : VideoDriver, OpenGL {
auto hasFullScreen() -> bool override { return true; }
auto hasContext() -> bool override { return true; }
auto hasBlocking() -> bool override { return true; }
auto hasForceSRGB() -> bool override { return false; }
auto hasFlush() -> bool override { return true; }
auto hasShader() -> bool override { return true; }

30 changes: 30 additions & 0 deletions ruby/video/metal/ShaderTypes.h
@@ -0,0 +1,30 @@
/*
See LICENSE folder for this sample’s licensing information.
Abstract:
Header containing types and enum constants shared between Metal shaders and C/ObjC source
*/
#include <simd/simd.h>

// Buffer index values shared between shader and C code to ensure Metal shader buffer inputs
// match Metal API buffer set calls.
typedef enum MetalVertexInputIndex
{
MetalVertexInputIndexVertices = 0,
MetalVertexInputIndexViewportSize = 1,
} MetalVertexInputIndex;

typedef enum MetalTextureIndex
{
MetalTextureIndexBaseColor = 0,
} MetalTextureIndex;

// This structure defines the layout of vertices sent to the vertex
// shader. This header is shared between the .metal shader and C code, to guarantee that
// the layout of the vertex array in the C code matches the layout that the .metal
// vertex shader expects.
typedef struct
{
vector_float2 position;
vector_float2 textureCoordinate;
} MetalVertex;
109 changes: 109 additions & 0 deletions ruby/video/metal/Shaders.metal
@@ -0,0 +1,109 @@
//
// Shaders.metal
// ares
//
// Created by jcm on 2024-03-07.
//

#include <metal_stdlib>
#include <simd/simd.h>

// Buffer index values shared between shader and C code to ensure Metal shader buffer inputs
// match Metal API buffer set calls.
typedef enum MetalVertexInputIndex
{
MetalVertexInputIndexVertices = 0,
MetalVertexInputIndexViewportSize = 1,
} MetalVertexInputIndex;

typedef enum MetalTextureIndex
{
MetalTextureIndexBaseColor = 0,
} MetalTextureIndex;

// This structure defines the layout of vertices sent to the vertex
// shader. This header is shared between the .metal shader and C code, to guarantee that
// the layout of the vertex array in the C code matches the layout that the .metal
// vertex shader expects.
typedef struct
{
vector_float2 position;
vector_float2 textureCoordinate;
} MetalVertex;

using namespace metal;

// Vertex shader outputs and fragment shader inputs
struct RasterizerData
{
// The [[position]] attribute of this member indicates that this value
// is the clip space position of the vertex when this structure is
// returned from the vertex function.
float4 position [[position]];

// Since this member does not have a special attribute, the rasterizer
// interpolates its value with the values of the other triangle vertices
// and then passes the interpolated value to the fragment shader for each
// fragment in the triangle.
float2 textureCoordinate;
};

vertex RasterizerData
vertexShader(uint vertexID [[vertex_id]],
constant MetalVertex *vertices [[buffer(MetalVertexInputIndexVertices)]],
constant vector_uint2 *viewportSizePointer [[buffer(MetalVertexInputIndexViewportSize)]])
{
RasterizerData out;

// Index into the array of positions to get the current vertex.
// The positions are specified in pixel dimensions (i.e. a value of 100
// is 100 pixels from the origin).
float2 pixelSpacePosition = vertices[vertexID].position.xy;

// Get the viewport size and cast to float.
vector_float2 viewportSize = vector_float2(*viewportSizePointer);


// To convert from positions in pixel space to positions in clip-space,
// divide the pixel coordinates by half the size of the viewport.
out.position = vector_float4(0.0, 0.0, 0.0, 1.0);
out.position.xy = pixelSpacePosition / (viewportSize / 2.0);

// Pass the input color directly to the rasterizer.
out.textureCoordinate = vertices[vertexID].textureCoordinate;

return out;
}

fragment float4
samplingShader(RasterizerData in [[stage_in]],
texture2d<half> colorTexture [[ texture(MetalTextureIndexBaseColor) ]])
{
constexpr sampler textureSampler (mag_filter::nearest,
min_filter::nearest);

// Sample the texture to obtain a color
const half4 colorSample = colorTexture.sample(textureSampler, in.textureCoordinate);

// return the color of the texture
return float4(colorSample);
}

fragment float4
drawableSamplingShader(RasterizerData in [[stage_in]],
texture2d<half> colorTexture [[ texture(MetalTextureIndexBaseColor) ]])
{
// We use this shader to sample the intermediate texture onto the screen texture;
// both textures are identical in size. Despite that, if we use nearest neighbor
// filtering, we end up with significant interference patterns at some scales,
// probably due to float rounding lower down in the system, if I were to guess.
// Linear filtering solves this problem. We could also blit, probably.
constexpr sampler textureSampler (mag_filter::linear,
min_filter::linear);

// Sample the texture to obtain a color
const half4 colorSample = colorTexture.sample(textureSampler, in.textureCoordinate);

// return the color of the texture
return float4(colorSample);
}