Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSL: RWBuffer<uint> is translated into a texture2d and buffer of atomic_int even though SPIRV-Cross can tell the original type #1362

Closed
Dredhog opened this issue May 6, 2020 · 18 comments
Labels
question Further progress depends on answer from issue creator. rejected Feature does not belong in SPIRV-Cross workaround Workaround needed for bad driver implementations or C++ compilers

Comments

@Dredhog
Copy link
Contributor

Dredhog commented May 6, 2020

We're using DXC to turn the following HLSL:

RWBuffer<uint> g_DispatchIndirectBuffer : register( u0 );

[numthreads(1, 1, 1)]
void CSMain()
{
    uint prevGroupCnt;
    InterlockedAdd(g_DispatchIndirectBuffer[0], 4, prevGroupCnt);
}

Into the following SPIR-V

; SPIR-V
; Version: 1.0
; Generator: Google spiregg; 0
; Bound: 15
; Schema: 0
               OpCapability Shader
               OpCapability SampledBuffer
               OpCapability ImageBuffer
               OpMemoryModel Logical GLSL450
               OpEntryPoint GLCompute %CSMain "CSMain"
               OpExecutionMode %CSMain LocalSize 1 1 1
               OpSource HLSL 630
               OpName %type_buffer_image "type.buffer.image"
               OpName %g_DispatchIndirectBuffer "g_DispatchIndirectBuffer"
               OpName %CSMain "CSMain"
               OpDecorate %g_DispatchIndirectBuffer DescriptorSet 0
               OpDecorate %g_DispatchIndirectBuffer Binding 0
       %uint = OpTypeInt 32 0
     %uint_0 = OpConstant %uint 0
     %uint_4 = OpConstant %uint 4
%type_buffer_image = OpTypeImage %uint Buffer 2 0 0 2 R32ui
%_ptr_UniformConstant_type_buffer_image = OpTypePointer UniformConstant %type_buffer_image
       %void = OpTypeVoid
          %9 = OpTypeFunction %void
%_ptr_Image_uint = OpTypePointer Image %uint
     %uint_1 = OpConstant %uint 1
%g_DispatchIndirectBuffer = OpVariable %_ptr_UniformConstant_type_buffer_image UniformConstant
     %CSMain = OpFunction %void None %9
         %12 = OpLabel
         %13 = OpImageTexelPointer %_ptr_Image_uint %g_DispatchIndirectBuffer %uint_0 %uint_0
         %14 = OpAtomicIAdd %uint %13 %uint_1 %uint_0 %uint_4
               OpReturn
               OpFunctionEnd

and then we use SPIRV-Cross to generate MSL:

#pragma clang diagnostic ignored "-Wunused-variable"

#include <metal_stdlib>
#include <simd/simd.h>
#include <metal_atomic>

using namespace metal;

kernel void CSMain(texture2d<uint> g_DispatchIndirectBuffer [[texture(0)]], device atomic_uint* g_DispatchIndirectBuffer_atomic [[buffer(0)]])
{
    uint _14 = atomic_fetch_add_explicit((device atomic_uint*)&g_DispatchIndirectBuffer_atomic[0u], 4u, memory_order_relaxed);
}

However, it creates a redundant texture2d and also changes the name of the buffer resource by appending _atomic at the end. Both of these could be avoided by doing something similar as done for RWStructuredBuffer, which would produce a regular MSL buffer of uint (not atomic_int) and would only cast to atomic where necessary (seen here). If I use SPIRV-Cross to generate HLSL instead of MSL it will produce the RWBuffer as in the original source:

RWBuffer<uint> g_DispatchIndirectBuffer : register(u0, space0);

void comp_main()
{
    uint _14;
    InterlockedAdd(g_DispatchIndirectBuffer[0u], 4u, _14);
}

[numthreads(1, 1, 1)]
void main()
{
    comp_main();
}

so it seems to have enough context to know that the texture won't be needed. This issue does not seem to be specific to DXC as I get the same output with glslang as seen here. Also to the point about context - using SPIRV-Cross reflection on the compiled MSL reports this resource as a SPVC_RESOURCE_TYPE_STORAGE_IMAGE and a dimensionality of buffer

spvc_type imageType = spvc_compiler_get_type_handle(spvCompiler, textureResource.base_type_id);        
bool isBuffer = textureDimensionality == SpvDimBuffer;  //This is true for RWBuffers and not for RWTextures

The issue seems vaguely related to #950.

However I don't agree with the recommendation that instead of a RWBuffer we should use a RWStructuredBuffer or RWByteAddressBuffer since they produce the desired MSL buffer. This is not an option for us at Unity as our users can write their own shaders and thus the issue can't be fixed at the HLSL level.

@HansKristian-Work
Copy link
Contributor

HansKristian-Work commented May 6, 2020

This is a typed buffer, and not a raw buffer. MSL does not support typed buffers, and thus is emulated as a texture2D. In SPIR-V a typed buffer is expressed an image with buffer dimension, so this is all expected.

There is an option which enables native texture buffer support (minspec is Metal 2.1): https://github.com/KhronosGroup/SPIRV-Cross/blob/master/spirv_msl.hpp#L311

However, even with that, MSL does not support atomics on typed buffers/images, and thus we get the weirdness you see here. Another contributor added the atomic side buffer as a workaround for this problem, but it's not pretty.

I'm not sure what I can do, MSL is making it needlessly difficult to implement this as you'd expect.

The only viable workaround would be to detect this special case and rewrite a texture buffer to raw buffer and deduce that the type is u32/i32 from that.

@HansKristian-Work HansKristian-Work added the question Further progress depends on answer from issue creator. label May 6, 2020
@HansKristian-Work
Copy link
Contributor

Ok, I see another avenue here. The OpTypeImage format is declared as R32UI by DXC, that helps a lot. That might be enough to make this work. However, this can only possibly work for texel buffers, not RWTexture, just in case you were expecting that to work as well.

@HansKristian-Work
Copy link
Contributor

Actually, that won't work either ... Texture buffers in MSL are declared with [[texture()]] and not [[buffer()]]. We'd be silently rewriting the binding type, which breaks API side.

@HansKristian-Work
Copy link
Contributor

Here's an equivalent:

#version 450
layout(local_size_x = 1) in;

layout(set = 0, binding = 0, r32ui) uniform uimageBuffer uImg;

void main()
{
	imageAtomicAdd(uImg, 1, 2u);
}

With --msl-native-texture-buffer I get:

#pragma clang diagnostic ignored "-Wunused-variable"

#include <metal_stdlib>
#include <simd/simd.h>
#include <metal_atomic>

using namespace metal;

constant uint3 gl_WorkGroupSize [[maybe_unused]] = uint3(1u);

kernel void main0(texture_buffer<uint> uImg [[texture(0)]] /* :( */, device atomic_uint* uImg_atomic [[buffer(0)]])
{
    uint _17 = atomic_fetch_add_explicit((device atomic_uint*)&uImg_atomic[1], 2u, memory_order_relaxed);
}

@HansKristian-Work HansKristian-Work added the workaround Workaround needed for bad driver implementations or C++ compilers label May 6, 2020
@Dredhog
Copy link
Contributor Author

Dredhog commented May 6, 2020

From my perspective it's not clear why use a texture resource to emulate the buffer in the first place (and this might be lack of experience with Metal on my part), is it just that it's simpler to translate the SPIR-V that way? For more context, a snippet like this, which uses both atomic and regular writes:

RWBuffer<uint> g_DispatchIndirectBuffer : register( u0 );

#pragma never_use_dxc

[numthreads(1, 1, 1)]
void CSMain()
{
    uint prevGroupCnt;
    InterlockedAdd(g_DispatchIndirectBuffer[0], 4, prevGroupCnt);
    g_DispatchIndirectBuffer[5] += 6;
}

When passed through our existing MSL backend (which does HLSL -> [FXC]-> DXBC -> [HLSLcc] -> MSL, so there's no SPIR-V in the middle) produces the following MSL:

generates this MSL:
#include <metal_stdlib>
#include <metal_texture>
using namespace metal;

#if !(__HAVE_FMA__)
#define fma(a,b,c) ((a) * (b) + (c))
#endif

kernel void computeMain(
    device uint *g_DispatchIndirectBuffer [[ buffer(0) ]])
{
    int u_xlati0;
    atomic_fetch_add_explicit(reinterpret_cast<device atomic_uint *>(&g_DispatchIndirectBuffer[0x0]), 0x4u, memory_order::memory_order_relaxed);
    u_xlati0 = int(g_DispatchIndirectBuffer[(0x5)]);
    u_xlati0 = u_xlati0 + 0x6;
    g_DispatchIndirectBuffer[(0x5)] = uint(u_xlati0);
    g_DispatchIndirectBuffer[(0x5)] = uint(u_xlati0);
    g_DispatchIndirectBuffer[(0x5)] = uint(u_xlati0);
    g_DispatchIndirectBuffer[(0x5)] = uint(u_xlati0);
    return;
}

And I'm not clear why couldn't SPIRV-Cross produce the same/similar (at least in the fact that is would use [[buffer ()]] and not [[texture()]]), given that it can tell that the resource was originally a buffer. We'd highly prefer if we didn't have to change our resource types depending on what compiler backend we're using as shaders from both might need be used together.

@cdavis5e
Copy link
Contributor

cdavis5e commented May 6, 2020

Because again, it's a typed buffer. That means that pixel conversion is applied to the buffer's contents on every load and store. In Vulkan, you would create a VkBufferView and bind it to a descriptor of type STORAGE_TEXEL_BUFFER. In MoltenVK, this is implemented by creating an MTLTexture on the buffer and passing that to the shader; that's why we need a texture2d or texture_buffer parameter here.

@HansKristian-Work
Copy link
Contributor

Either HLSLCC or FXC must be applying some very special workaround here, because that MSL code rewrites the fact that the buffer is typed, and I don't see how that would work in a more interesting case. I'm open to adding support for a similar workaround in SPIRV-Cross, but it would have to be opt-in. E.g. something like, typed buffer with R32UI/I format declared will be treated as a plain buffer, but that's a very narrow workaround and not a general purpose solution.

@aras-p
Copy link

aras-p commented May 7, 2020

I'm open to adding support for a similar workaround in SPIRV-Cross, but it would have to be opt-in. E.g. something like, typed buffer with R32UI/I format declared will be treated as a plain buffer

Opt-in flag to get that behavior would work just fine for our (Unity) use case.

@Dredhog
Copy link
Contributor Author

Dredhog commented May 7, 2020

I'm open to adding support for a similar workaround in SPIRV-Cross, but it would have to be opt-in.

As @aras-p said, having an opt-in would be great for us.

E.g. something like, typed buffer with R32UI/I format declared will be treated as a plain buffer, but that's a very narrow workaround and not a general purpose solution.

You might have already had this in mind, but to be explicit about it - we would need regular Buffer<> types to produce [[buffer()]] bindings as well (not just RWBuffer<> types)

@HansKristian-Work
Copy link
Contributor

HansKristian-Work commented May 7, 2020

All Buffer<> types? How would that work? Now I'm very confused.

E.g. for a Buffer<float4> type, you could bind almost any pixel format for it in the API, and get implicit format conversion through the descriptor when you load a value. There is no way we can work around that with a plain buffer, what kind of code do you emit in hlslcc for that case? Note that we won't know the underlying format in SPIRV-Cross, Buffer<> is a texel buffer, basically a 1D texture. No IR will have any knowledge about any underlying format.

This workaround can only work iff:

  • For RWBuffer, shader seems to declare R32UI/R32I types in the SPIR-V. The SPIR-V you posted does this. This is helpful, because otherwise we would need ReadsStorageImageWithoutFormat, which is not widely supported. I think this makes use of the D3D11 limitation that read/write storage images can only be R32F/R32UI/R32I.
  • The type system could rewrite that type to be a buffer, but it can only meaningfully work for that particular type.

There is also the question of alignment. Texel buffers can support very tight alignment, but buffer objects often don't. Does Metal guarantee support for 4-byte aligned buffer descriptors?

@aras-p
Copy link

aras-p commented May 7, 2020

I think all of this is actually for a fairly narrow case of: original HLSL source has a Buffer or RWBuffer declaration of int or uint type, and only ever does reads/writes into it (maybe even "all of the reads/writes are atomic" even).

Yes that's a very "unorthodox" use case, and arguably the original HLSL source should be rewritten to use structured buffer or byte address buffer. But as it is right now, whoever wrote original HLSL ("anyone can write it!" is an issue we have at Unity :)), would be "oh this works everywhere, except on Metal, Metal must be broken!". Or even more likely, not even test on Metal, only to find out their end users or someone entirely else find that it's broken on Metal.

Yes that's a very "non generic" use case, and would be kind of a gross hack. But a flag to SPIRV-Cross that would essentially allow saying "if you find buffer or rwbuffer with 4-byte integer type that only ever does atomic operations on it, please output that as a buffer and not a texture" would probably cover this specific case.

@HansKristian-Work
Copy link
Contributor

Buffer<> cannot use atomics, it's a read-only type. It should translate perfectly to a texture_buffer. We won't know the underlying format, so I cannot support that anyways. Even if HLSL declares it as Buffer all sampling still happens on float4 types. It will just pick the x component for convenience.

RWBuffer<> with explicitly declared R32UI/R32I types in the SPIR-V is narrow enough that it can be done. However, I assume you do some funky detection logic in engine side to see "This is a typed UAV with R32UI type, let's make this a plain buffer instead of a typed buffer".

@Dredhog
Copy link
Contributor Author

Dredhog commented May 7, 2020

However, I assume you do some funky detection logic in engine side to see "This is a typed UAV with R32UI type, let's make this a plain buffer instead of a typed buffer".

Not exactly, our current backend has the limitation that it tries to produce output from DXBC which essentially has very little type information when compared to SPIR-V from what I've seen. This results in minimal type information being left in the output MSL, the buffer is assumed to be packed in memory as it would be in DirectX and casts are performed on every access

There is no way we can work around that with a plain buffer, what kind of code do you emit in hlslcc for that case?

So for a Buffer<float4> from source like this:

Buffer<float4> roBuffer : register( t0 );
RWBuffer<uint> g_DispatchIndirectBuffer : register( u0 );

[numthreads(1, 1, 1)]
void CSMain()
{
    uint prevGroupCnt;
    InterlockedAdd(g_DispatchIndirectBuffer[0], uint(roBuffer[1].y), prevGroupCnt);
    g_DispatchIndirectBuffer[5] += 6;
}

it would still generate a buffer of uint and output the following:

#include <metal_stdlib>
#include <metal_texture>
using namespace metal;

#if !(__HAVE_FMA__)
#define fma(a,b,c) ((a) * (b) + (c))
#endif

kernel void computeMain(
    const device uint *roBuffer [[ buffer(1) ]],
    device uint *g_DispatchIndirectBuffer [[ buffer(0) ]])
{
    float u_xlat0;
    int u_xlati0;
    uint u_xlatu0;
    u_xlat0 = as_type<float>(roBuffer[(0x1)]);
    u_xlatu0 = uint(u_xlat0);
    atomic_fetch_add_explicit(reinterpret_cast<device atomic_uint *>(&g_DispatchIndirectBuffer[0x0]), u_xlatu0, memory_order::memory_order_relaxed);
    u_xlati0 = int(g_DispatchIndirectBuffer[(0x5)]);
    u_xlati0 = u_xlati0 + 0x6;
    g_DispatchIndirectBuffer[(0x5)] = uint(u_xlati0);
    g_DispatchIndirectBuffer[(0x5)] = uint(u_xlati0);
    g_DispatchIndirectBuffer[(0x5)] = uint(u_xlati0);
    g_DispatchIndirectBuffer[(0x5)] = uint(u_xlati0);
    return;
}

@HansKristian-Work
Copy link
Contributor

So that code translates roBuffer[1].y to as_type<float>(roBuffer[(0x1)]). How can that be correct? It does not take into account texel strides either, I'm somewhat puzzled by this.

@Dredhog
Copy link
Contributor Author

Dredhog commented May 7, 2020

Yes, I just noticed that myself, so as @aras-p said we're most likely only meaningfully using Buffer<uint> in our code and Buffer<float4> seems like an untested case, just tried non zero offset with a Buffer<uint> and it's correct

@HansKristian-Work
Copy link
Contributor

So in that case, a workaround would look something like:

  • Any Buffer<> / UNIFORM_TEXEL_BUFFER or RWBuffer<> / STORAGE_TEXEL_BUFFER, no matter what is treated as being a scalar 32-bit value. Client side guarantees that the only texel formats will only be 32-bit scalar.
  • Any access to such a resource goes through a plain buffer pointer.
  • Certain features like GetResourceDimensions() will be unavailable (it's just a pointer).
  • Robustness is out the window (no resource size information, again, it's just a pointer).

It is still unclear to me why you would need to emit pointer types for Buffer though. There are several mechanisms you could use in Metal to implement this "correctly". The only real problem case I can see where this workaround would have made sense is RWBuffer with atomics, since Metal has no reasonable way to express this without ugly hackery.

@aras-p
Copy link

aras-p commented May 7, 2020

Yeah that sounds like it would work for our case.

But all the discussion above has good points, our own particular (very occasional) usage of Buffer<uint> / RWBuffer<uint> in HLSL code is very much "unusual" and I'm actually surprised that we even use it in that way. Apparently someone did, and it has happened to (by accident) work with the previous compiler toolchain, but now with a "more proper" DXC + SPIRV-Cross one it has uncovered that our usage was kinda wrong. Maybe not worth doing this workaround, and instead we'll try to fix all the offending HLSL source code instead.

@HansKristian-Work
Copy link
Contributor

HansKristian-Work commented May 7, 2020

That would be much appreciated. Working around this in SPIRV-Cross is a last resort and I really don't want to do it, especially if it starts to affect all typed buffers like this. At that point it becomes working around app bugs instead of workaround Metal limitations. A SPIR-V -> SPIR-V transformation is more appropriate if possible, but the best fix is to fix the invalid shader code of course.

I'll mark the issue as rejected for the time being. Can be re-evaluated later.

@HansKristian-Work HansKristian-Work added the rejected Feature does not belong in SPIRV-Cross label May 7, 2020
Dredhog added a commit to Unity-Technologies/Graphics that referenced this issue Jul 30, 2020
*Replace (RW)Buffer resources as these no longer work as before - HLSLcc
would generate buffer bindings, whereas DXC+SPIRV-Cross outputs textures
(see KhronosGroup/SPIRV-Cross#1362 for details)
*Removed explicit register locations in compute shaders as these do not have
any practival meaning AFAIK (only C# graphics API references direct bindings)
*Replace legacy 'COLOR' PS output semantic with 'SV_Target' in cloud shader
Dredhog added a commit to Unity-Technologies/Graphics that referenced this issue Jan 11, 2021
…DXC MSL backend.

*HLSLcc outputs MSL buffer bindings for HLSL's (RW)Buffer type, whereas DXC+SPIRV-Cross outputs
 texture bindings (see KhronosGroup/SPIRV-Cross#1362 for details)
sebastienlagarde added a commit to Unity-Technologies/Graphics that referenced this issue Mar 8, 2021
* Replace (RW)Buffer usages with (RW)StructuredBuffer to work with our DXC MSL backend.

*HLSLcc outputs MSL buffer bindings for HLSL's (RW)Buffer type, whereas DXC+SPIRV-Cross outputs
 texture bindings (see KhronosGroup/SPIRV-Cross#1362 for details)

* Avoid using DXC for tessellation shaders when targeting Metal

* Add sufficient DXC requirements for SubsurfaceScattering.compute

* Add tmeporary DXC bug workaround for using 'min' in array length declarations

* Fix DXC error about missing return values when _DETAIL is undefined.

* Avoid write to global constant to fix DXC compilation error.

* Replace Buffer<> with StructuredBuffer<> to work around DXC MSL limitation

* Add missed Buffer<> to StructuredBuffer<> conversion

* Make sure workaround for 'min()' in array declarations is only used for DXC

* Disable DXC for URP particle shaders which rely on UBO writes.

* Fix usage of raw VFACE semantic which won't link for DXC's MSL backend

* Add link for relevant DXC bug next to min() in array size declaration workaround

* Remove '#pragma require wavebasic' from SSR shader used for DXC testing

* Undo (RW)Buffer to (RW)StructuredBuffer changes in shaders.

* Formatting

Co-authored-by: lukas.taparauskas <lukas.taparauskas@unity3d.com>
Co-authored-by: Sebastien Lagarde <sebastien@unity3d.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further progress depends on answer from issue creator. rejected Feature does not belong in SPIRV-Cross workaround Workaround needed for bad driver implementations or C++ compilers
Projects
None yet
Development

No branches or pull requests

4 participants