Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

D3D12 Bindless Compute PSO replay crash #2981

Closed
yunmnmn opened this issue Jul 14, 2023 · 2 comments
Closed

D3D12 Bindless Compute PSO replay crash #2981

yunmnmn opened this issue Jul 14, 2023 · 2 comments
Labels
Bug A crash, misbehaviour, or other problem Unresolved Waiting for a fix or implementation

Comments

@yunmnmn
Copy link

yunmnmn commented Jul 14, 2023

Description

Since yesterday, I've been getting a crash when I click on a ComputePSO Dispatch in D3D12. The shader uses ResourceDescriptorHeap[] to access resources. If I don't access resources through ResourceDescriptorHeap[], no crash occurs.

renderdoc_crash

I happen to have an old capture (captured prior to when the error occurred), and tried to view the same Dispatch pass, but the crash also occurred there. It seems that the crash isn't dependent on the RenderDoc version or capture.

On my secondary machine everything works fine with the latest stable RenderDoc builds (v1.26 & v1.27)

I dipped my toe into RenderDoc's source to hopefully identify the issue(all code references are from the latest v1.x branch):

  • IDxcValidator outputs validation warnings and errors in ~ProgramEditor() (dxil_bytecode_editor.cpp, line 103)
dxil_bytecode_editor.cpp( 188) - Warning - DXIL validation failed: error: Validator version in metadata (1.7) is not supported; maximum: (1.6).
dxil_bytecode_editor.cpp( 188) - Warning - error: Named metadata '<null>' is unknown.
dxil_bytecode_editor.cpp( 188) - Warning - Function: dx.op.binary.i32: error: External function 'dx.op.binary.i32' is unused.
dxil_bytecode_editor.cpp( 188) - Warning - Validation failed.
  • The "error: Named metadata '<null>' is unknown" caught my attention. When looking into which functions add entries to m_NamedMeta, there is a helper function (ProgramEditor::CreateNamedMetadata()) that adds entries without setting the name
  • Setting the name resolved the issue for me locally, and the validation error is also gone:
(dxil_bytecode_editor.cpp, line 352)
NamedMetadata *ProgramEditor::CreateNamedMetadata(const rdcstr &name)
{
  for(NamedMetadata *m : m_NamedMeta)
    if(m->name == name)
      return m;

  m_NamedMeta.push_back(new(alloc) NamedMetadata);

  // <edit
  m_NamedMeta.back()->name = name;
  // edit>
  return m_NamedMeta.back();
}

I'm unsure if this is a proper fix, or just luck. The opcode injection is pretty involved.

Steps to reproduce

  1. Create a bindless Compute PSO with a shader that accesses resources through ResourceDescriptorHeap[]
  2. Take a capture and replay
  3. Click on the Compute PSO Dispatch

I can provide a capture if necessary

Environment

  • RenderDoc version: v1.24, v1.26, v1.27 and local build
  • Operating System: Windows 11 (Nvidia driver version: 536.40, latest as of writing)
  • Graphics API: D3D12 (with DX12 Agility (tried with 600 & 610), Nvidia RTX 3070 and AMD RX 580)
@Zorro666 Zorro666 added Bug A crash, misbehaviour, or other problem Unresolved Waiting for a fix or implementation labels Jul 14, 2023
@baldurk
Copy link
Owner

baldurk commented Jul 17, 2023

I do have a test that accesses resources through ResourceDescriptorHeap[] which doesn't hit this issue, but looking at it I suspect that's because there are some normal cbuffers used so the dx.resources metadata exists already.

Your investigation looks bang on to me and the fix is easily correct, the bug definitely present before. Since this works for you I'm happy to commit it without a direct repro. In future I might need to expand the test cases to include a "pure" direct heap access shader with no normal resources at all to stress this path.

@yunmnmn
Copy link
Author

yunmnmn commented Jul 18, 2023

That makes a lot of sense. Thanks for the quick response!

thisisjimmyfb pushed a commit to thisisjimmyfb/renderdoc_public that referenced this issue Jul 21, 2023
CamilliaX pushed a commit to CamilliaX/renderdoc that referenced this issue Aug 5, 2023
Allow a little latitude on expected line numbers in callstacks test

Protect against invalid socket data potentially causing crashes

Add test of Set*Root32BitConstants with 0 constants being set

* This crashes on nvidia, on compute only strangely, if we pass in a NULL
  pointer along with a 0 number of constants.

Expand NULL-protection for Set*Root32BitConstants

Don't allow tooltip to display if context menu is shown in shader viewer

Add "used" attribute to prevent exported symbol dead-stripping

Ideally would use "retain" but that would require upgrading to recent compilers in CI and the project.

__attribute__((retain)) function/variable to prevent linker garbage
collection.

Ensure image layout is up to date mid-command buffer in pixel history

* Previously we were obtaining the image layout from before the current command
  buffer was recorded, but this is out of date if the image layout has changed
  during the command buffer. Worst case it could be e.g. UNDEFINED if the image
  was created and not used until the current command buffer.

Add documentation for return type

Add missing documentation reference

Remove unimplemented function declarations

Add extra pings to reduce problems with unreasonably slow adb commands

Don't sanitise selected paths from remote hosts. Closes baldurk#3006

* If the remote host disconnects during the selection process we will no longer
  have a valid connection, we shouldn't sanitise the resulting path according to
  local filenames.

Tidy up and implement support for self capture on linux

Handle OpExecutionModeId in SPIR-V reflection

Bump version to v1.29

Handle non-square checkboxes : adjust to make them square

Extension of 7452e29

Change RenderDoc style checkboxes to use checks for low contrast themes

Include InidicatorCheckBox style to match style change in 7452e29

Vulkan PostVS ignore primmitive restart indexes when restart is active

Compute primitive restart using VulkanRenderState not pipeline, to account for dynamic state.
Ignore primitive restart indexes, handles primitive restart indexes being used when there are no active vertex buffer bindings

Fix branch specified in CI badge in README

Make gcc happier with escaped \\ in comments

Hide invalid shader resources

Fixed the problem that showing invalid shader resource in pipeline state viewer since v1.24

Revert "Hide invalid shader resources"

This reverts commit e62b6fa.

Hide invalid shader resources on D3D12

Change RenderDoc style checkboxes to use checks for low contrast themes

* Deliberately low contrast themes like the dark theme can have issues showing a
  box that is filled vs just a plain rectangle which is harder to read at a
  glance. Changing the fill to an X icon gives extra readability on the dark
  theme.

Fix capture comments styling of links

Remove explicit styling span in settings dialog, links style themselves

Ensure D3D12 barrier API is kept happy by matching access to layout

Fix disassembly for arrays of pointers having wrong type declaration

Account for arrays of pointers when calculating buffer offsets

Use correct size for basic types when calculating cbuffer sizes

avoid get color_read_* with depth format

Fix Android Vulkan capturing

Add __attribute__((visibility("default"))) to VK_LAYER_EXPORT on Android.
It used to be defined in the Vulkan headers and changed when the update to the latest Vulkan headers was integrated in 5118f08

GL Pixel History small code clean up in CalculateFragmentDepthTests

Move constant out of inner loop.
Remove if test for depth test being enabled and set the depth function to GL_ALWAYS if the depth test is disabled.

GL Pixel History set preMod.depth before call to QueryPostModPerFragment

Copy the postMod depth to next history's preMod depth.
The preMode depth is used to prime the depth buffer in QueryPostModPerFragment.

Use GL_ANY_SAMPLES_PASSED instead of GL_SAMPLES_PASSED Closes baldurk#2972

GL_SAMPLES_PASSED is not supported by GLES.
The pixel history test is already using a 1x1 scissor test, GL_ANY_SAMPLES_PASSED should be the same as GL_SAMPLES_PASSED.
The pixel history logic is only testing for non-zero passing samples, the precise value is not used.

Workaround nVidia crash in SetGraphicsRoot32BitConstants replay

When Num32BitValuesToSet = 0, set valid dummy pointer for pSrcData.
nVidia driver crashes if pSrcData is NULL.

Add Qt:WindowStaysOnTopHint to resource preview Closes baldurk#2971

Qt on Ubuntu 23.04 includes qt/qtbase@f9e4402ffe, which during show() under certain scenarios (for example Qt::ToolTip windows) would destroy and recreate the xcb window.

Adding Qt:WindowStaysOnTopHint to the window flags for ToolTip windows prevents the xcb window from being destroyed and recreated during show().

Vk tests ignore default pipeline in headless mode

Fixes crash trying to launch VK_Compute_Only

Use innertype id to look up image type. Closes baldurk#2970

Fixes incorrect shader debugging for ImageFetch operation on arrays on buffers (and certain scenarios of arrays of textures).

Set name on newly created named metadata. Closes baldurk#2981 (credit Yun)

Add missing ")" R11G11B10 GetBufferFormatString()

"[[packed(r11g11b10]] float3" -> "[[packed(r11g11b10)]] float3"

There was a missing ")"

Test null DSV in D3D12::GetRenderOutputSubresource

Prevents a crash when refreshing the overlay for am unbound Texture resource and the pipeline state does not have a bound DSV.

Related to 742e3de which changed the internal behaviour of FillResourceView().
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug A crash, misbehaviour, or other problem Unresolved Waiting for a fix or implementation
Projects
None yet
Development

No branches or pull requests

3 participants