Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Turnip] Dark Souls 3 and some other games do OOB access to array in uniforms causing geometry corruption #3861

Closed
werman opened this issue Feb 20, 2024 · 11 comments
Labels

Comments

@werman
Copy link
Contributor

werman commented Feb 20, 2024

In Dark Souls 3 and a few other games (probably at least "Sekiro: Shadows Die Twice" and "Final Fantasy Type-0 HD") geometry is corrupted on Turnip. The cause is the out-of-bounds access to constant buffer.

layout(set = 2, binding = 0, std140) uniform cb4_t
{
    vec4 m[34];
} cb4;

is dynamically access e.g. at cb4.m[39].x, which is beyond the array length.

However the bound descriptor size is much larger and game expects such access to result in a correct values from that descriptor. This is out of scope for robustBufferAccess2, since it deals with accesses beyond descriptor size, not OOB for arrays.

There was a discussion at https://gitlab.khronos.org/Tracker/vk-gl-cts/-/issues/2649 and the result is that the case above will yield undefined behavior since SPIR-V spec says:

If indexing into a vector, array, or matrix, with the result type being a logical pointer type, causes undefined behavior if not in bounds.

I looked at d3d11 draw calls the game makes and DXVK did translate what game does correctly.

As for why Turnip works the way it works: we lower UBOs into shader consts for faster access, if shader tells us there is 34*vec4 data in UBO we may upload all that data to consts and for indirect access - do the indirect access into consts file.

What I want from this issue? To confirm that I'm not missing anything. And whether there could be a reasonable workaround outside of Turnip. In Turnip we could disable UBO optimization for indirect access to UBOs (I verified that it resolves the issue for Dark Souls 3 and a user checked that it helps Final Fantasy Type-0 HD, but I haven't checked other presumably affected games). In DXVK, I'm not sure it could have nicer workaround in DXVK...

Software information

Dark Souls 3

System information

  • GPU: Adreno a7xx
  • Driver: Turnip
  • Wine version:
  • DXVK version: 2.3

Apitrace file(s)

RenderDoc D3D11 capture: https://drive.proton.me/urls/XNXXYWQZMG#CYZmf1LaXFee

You need draw call EID 2411 DrawIndexedInstanced(7794, 61), there in VS see how cb4 is accessed.

@turol
Copy link

turol commented Feb 20, 2024

#405

@werman
Copy link
Contributor Author

werman commented Feb 20, 2024

#405

Upon carefully re-reading the issue again, yeah that's it

doitsujin: The main issue here is that D3D11 has defined out-of-bounds access behaviour for constant buffer reads, whereas Vulkan just assumes that you stay within the bounds. My previous workaround implemented bound-checking using the constant buffer length specified in the shader, but you can actually read past that as long as you stay within the bound buffer range. Dishonored 2 relies on that and breaks as a result with that workaround enabled.

Though from d3d11 spec "7.5 Constant Buffers" I see:

If the constant buffer bound to a slot is larger than the size declared in the shader for that slot, implementations are allowed to return incorrect data (not necessarily 0) for indices that are larger than the declared size but smaller than the buffer size.

However, games rely on constants being valid even when access beyond declared in shader range...

@K0bin K0bin added the d3d11 label Feb 20, 2024
@werman
Copy link
Contributor Author

werman commented Feb 20, 2024

Would it be fair to say that the issue is widespread enough for per-game workarounds to be infeasible? At the moment I at least know of Dark Souls 3, Sekiro, Final Fantasy Type-0 HD, Ultrakill, Dishonored 2. Such list tells me there are much more of them.

@K0bin
Copy link
Collaborator

K0bin commented Feb 20, 2024

if shader tells us there is 34*vec4 data in UBO we may upload all that data to consts and for indirect access - do the indirect access into consts file.

Would it be possible to get the size to copy from the bound buffer descriptor?

@werman
Copy link
Contributor Author

werman commented Feb 20, 2024

if shader tells us there is 34*vec4 data in UBO we may upload all that data to consts and for indirect access - do the indirect access into consts file.

Would it be possible to get the size to copy from the bound buffer descriptor?

The bound descriptor size is only known at draw call time. But the lowering and calculations of const space is done during shader compilation. So no.

The best workaround we think of is to not to optimize indirect access into UBOs and go with proper load instruction. It's not catastrophic, just sad.

@doitsujin
Copy link
Owner

doitsujin commented Feb 22, 2024

Would it be fair to say that the issue is widespread enough for per-game workarounds to be infeasible? At the moment I at least know of Dark Souls 3, Sekiro, Final Fantasy Type-0 HD, Ultrakill, Dishonored 2. Such list tells me there are much more of them.

The shader-defined array size is indeed entirely meaningless in D3D, regardless of what the spec says. All desktop drivers do bound checking on a descriptor level, and games tend to rely on this because declaring UBO arrays with a true upper bound can dramatically increase FXC compile times compared to declaring a size of 1 (or at least something small).

Similarly, some games do provide an upper bound in the shader but will not actually bind the entire range.

The best we could do on our end is declare all dynamically UBOs with a size of 4096 vec4s (i.e. 65536 bytes).

@werman
Copy link
Contributor Author

werman commented Feb 22, 2024

Would it be fair to say that the issue is widespread enough for per-game workarounds to be infeasible? At the moment I at least know of Dark Souls 3, Sekiro, Final Fantasy Type-0 HD, Ultrakill, Dishonored 2. Such list tells me there are much more of them.

The shader-defined array size is indeed entirely meaningless in D3D, regardless of what the spec says. All desktop drivers do bound checking on a descriptor level, and games tend to rely on this because declaring UBO arrays with a true upper bound can dramatically increase FXC compile times compared to declaring a size of 1 (or at least something small).

Similarly, some games do provide an upper bound in the shader but will not actually bind the entire range.

Could games statically access uniform out of declared in the shader bounds? Or only dynamically? Because all instances of the issue I saw were dynamically indexed OOB accesses and my current workaround prevents optimization only for them.

Also, D3D9 is no different here, right, the workaround is needed for D3D9-11?

What about D3D12?

@doitsujin
Copy link
Owner

D3D12 is basically the same as D3D11 in this regard, but the binding model itself is very different in that it is bindless only. There is no way to know the bound descriptor at draw time.

As for D3D9, shader constants in that API are all sorts of special: UBOs don't exist in that API, instead we have a fixed-size pool for float constants, another fixed-size pool for integer constants, and a small number of bool constants which we map to a bit mask. At least the float pool can be dynamically indexed. I think we declare an upper bound for the size, but we heavily rely on descriptor-level robustness in order to keep the amount of data we need to upload for each draw manageable. So you might see something like vec4 float_constants[1024]; in the shader, but we'll only bind the first 256 bytes or something if we know that all the other data is zero anyway. This is essentially the only way to get any sort of real-time performance out of games that use dynamic indexing in D3D9.

Could games statically access uniform out of declared in the shader bounds?

Technically no, FXC errors out when trying to do that, but that will not stop people (esp. certain mods, cough ENB) from patching DXBC binaries in extremely broken ways.

That said, if we do ever run into OOB access with fully static indices, we can fix that up in DXVK as well.

In case you missed it since I just edited it into my comment roughly one second before your reply popped up:

The best we could do on our end is declare all dynamically indexed UBOs with a size of 4096 vec4s (i.e. 65536 bytes).

@K0bin
Copy link
Collaborator

K0bin commented Feb 22, 2024

As for D3D9, shader constants in that API are all sorts of special: UBOs don't exist in that API, instead we have a fixed-size pool for float constants, another fixed-size pool for integer constants, and a small number of bool constants which we map to a bit mask. At least the float pool can be dynamically indexed. I think we declare an upper bound for the size, but we heavily rely on descriptor-level robustness in order to keep the amount of data we need to upload for each draw manageable. So you might see something like vec4 float_constants[1024]; in the shader, but we'll only bind the first 256 bytes or something if we know that all the other data is zero anyway. This is essentially the only way to get any sort of real-time performance out of games that use dynamic indexing in D3D9.

Exactly. D3D9 devices can either be created with hardware vertex processing or software vertex processing.
DXVK implements both the same way and runs the vertex shader on the GPU. However, software vertex processing dramatically increases the amount of float constants you have available. In hardware mode, you're limited to 256 vec4s. In software mode, this is increased to 8192. So in the early days of D9VK we used to copy 8192 * 16 bytes for every draw (+ some more integer & boolean constants).

Nowadays we only copy the range of constants that the game actually changes and rely on robustness2 to return 0 for the others. This is basically necessary for playable performance in SWVP games but it's a nice optimization for all other games too.
The arrays inside the shader do get the upper bound of 8192 or 256 as their length.

@cwabbott0
Copy link
Contributor

I think that declaring all dynamically indexed UBOs with a size of 4096 vec4s in DXVK seems like the right thing to do. It'll do basically the same thing as the workaround we have in turnip, it'll only apply to DX11 and not DX9 (we can actually prefetch all 256 vec4s to constants in hardware mode in some cases and the workaround in turnip disables that), and other drivers mostly ignore the size AFAIK.

@werman
Copy link
Contributor Author

werman commented Feb 23, 2024

@cwabbott0 Agree, declaring upper bound for indirectly accessed uniforms would be a good solution and additionally a spec conformant one, so in the future other driver for non-desktop HW won't run into the same issue.

temeraire-cx pushed a commit to chaotic-cx/mesa-mirror that referenced this issue Feb 23, 2024
Some D3D11 games rely on out-of-bounds indirect UBO loads to return
real values from underlying bound descriptor. This workaround would
prevent us from lowering indirectly accessed UBOs to consts.

Later DXVK would declare dynamically indexed uniforms with upper
size bound, to make the accesses spec compliant. But for now
we need our own workaround.

Known affected games:
- Dark Souls 3
- Sekiro: Shadows Die Twice
- Final Fantasy Type-0 HD
- Ultrakill
- Dishonored 2

DXVK discussions:
- doitsujin/dxvk#405
- doitsujin/dxvk#3861

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27727>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants