Developer guidelines

Philip Rebohle edited this page Aug 26, 2022 · 12 revisions

This page describes the behaviour of DXVK in various D3D11 workloads, and may be useful for developers targeting the Steam Deck. In general, most IHV recommendations and good practices apply to our implementation as well, but performance characteristics may differ in practice.

Note that this page is written for DXVK 2.0 and later. Older versions may behave differently.

Feature support

DXVK supports D3D11 up to Feature Level 12_0.

Optional feature support

DXVK supports the following optional features, provided that the Vulkan driver and GPU support the corresponding features:

  • Tiled Resources: Up to Tier 3.
    • Multisampled tiled resources are not supported on many Vulkan drivers.
    • Sharing tiled resources or tile pools between multiple D3D devices is not supported.
    • Shader feedback on raw or structured buffer loads may be inaccurate if a single load straddles 64k pages.
  • Conservative Rasterization: Up to Tier 2.
    • The corresponding Vulkan functionality is provided by VK_EXT_conservative_rasterization.
  • Exporting Viewport Index and Render Target Array Index from vertex shaders or domain shaders.
    • This is expressed by the VPAndRTArrayIndexFromAnyShaderFeedingRasterizer feature in D3D11.
    • The corresponding Vulkan features are shaderOutputViewportIndex and shaderOutputLayer.
  • Exporting the Stencil Reference from pixel shaders.
    • This is expressed by the PSSpecifiedStencilRefSupported feature in D3D11.
    • The corresponding Vulkan functionality is provided by VK_EXT_shader_stencil_export.
    • Not all hardware can support this.

Unsupported features

The following core API features are unsupported:

  • Class linkage. Compiling shaders that rely on this feature will fail.
  • Predication. Calls to SetPredication are ignored.
    • Support for predicated draws and dispatches may be implemented in the future, but clears and copy operations will remain unsupported.
    • Prefer using indirect draws or dispatches instead.
  • Target-independent rasterization. If a render target is bound, the ForcedSampleCount member of the current rasterizer state is ignored.
    • ForcedSampleCount is supported when performing UAV rendering, i.e. rendering without any bound render targets.
    • Most Vulkan drivers do not support 16x MSAA.
  • Video APIs.
    • There is rudimentary support for ID3D11VideoContext::VideoProcessorBlt, and related functionality required for setup.

The following optional features are unsupported:

  • Rasterizer Ordered Views.
    • There is no corresponding Vulkan feature with good driver support.

Shared resources

Shared Resources are partially supported, with the following restrictions:

  • Proton patches are required.
  • IDXGIKeyedMutex is unsupported due to extremely poor documentation.
  • Shared resources do not work on Windows, and never will.

Shaders

Do:

  • Create all shaders that will be used during gameplay (i.e. call ID3D11Device::Create*Shader) during loading screens or in the game menu. DXVK will then start creating Vulkan pipeline libraries or complete Vulkan pipelines in the background.
  • Prefer StructuredBuffer<...> and RWStructuredBuffer<...> over typed buffer views when format conversion is not needed. Structured buffers are more efficient in our implementation, and easier for Vulkan drivers to optimize.
  • Calling ID3D11Device::Create*Shader from a single thread is usually sufficient. DXVK will perform the time-consuming parts of shader compilation on dedicated worker threads.

Avoid:

  • Do not compile shaders "on demand", i.e. just before they are first used in a draw. Doing so will cause stutter that is more severe than on native D3D11 drivers.
  • Do not use stream output. DXVK supports this feature, but it is usually less efficient than compute shaders on modern hardware.
  • Do not use class linkage. DXVK does not support this feature.
  • Avoid writing to UAVs from pixel shaders in a large number of draw calls. Our implementation does not handle this efficiently, and doing so will lead to reduced GPU and CPU performance. In particular, the following situations are bad:
    • Mixing draws that perform UAV writes with draws that do not. Switching between these modes is inefficient in our implementation and requires us to insert a barrier.
    • Writing to the same UAV in consecutive draws. D3D11 requires us to insert a barrier between those draws.
  • In shader code, avoid unrolling large loops. Doing so makes it harder for Vulkan drivers to optimize.

Constant buffers

  • Use *SetConstantBuffers1 to bind sub-ranges of a larger constant buffer. This is by far the fastest path on our implementation. Ideally, map the buffer with MAP_WRITE_DISCARD only a few times per frame and write as much data as possible at once; if that is not viable, using MAP_WRITE_NO_OVERWRITE between draws is still good.
  • When updating constant buffers, MAP_WRITE_DISCARD and UpdateSubresource have similar performance characteristics if the entire buffer is written in both cases.
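
The sub-range approach above can be sketched as a small linear allocator. D3D11.1 expresses constant buffer ranges in 16-byte shader constants and requires both the first constant and the constant count to be multiples of 16 (256 bytes). The ConstantRange and ConstantRingAllocator names are hypothetical helpers written for this illustration, not part of D3D11 or DXVK.

```cpp
#include <cassert>
#include <cstdint>

// D3D11.1 measures constant buffer ranges in 16-byte shader constants, and
// both FirstConstant and NumConstants must be multiples of 16 (256 bytes).
constexpr uint32_t kBytesPerConstant  = 16;
constexpr uint32_t kConstantAlignment = 16;

// Hypothetical helper type: one sub-range of a large constant buffer.
struct ConstantRange {
  uint32_t byteOffset;     // where to write the data in the mapped buffer
  uint32_t firstConstant;  // value to pass as pFirstConstant
  uint32_t numConstants;   // value to pass as pNumConstants
};

// Hypothetical linear allocator. Call reset() whenever the buffer has been
// mapped with MAP_WRITE_DISCARD, then hand out ranges until it is full.
class ConstantRingAllocator {
public:
  explicit ConstantRingAllocator(uint32_t capacityBytes)
  : m_capacity(capacityBytes / kBytesPerConstant) { }

  void reset() { m_next = 0; }

  // Reserves room for sizeBytes of data, rounded up to 16 constants.
  // Returns false if the buffer has no room left for this range.
  bool allocate(uint32_t sizeBytes, ConstantRange& range) {
    assert(sizeBytes > 0);
    uint32_t constants = (sizeBytes + kBytesPerConstant - 1) / kBytesPerConstant;
    uint32_t aligned = (constants + kConstantAlignment - 1)
                     / kConstantAlignment * kConstantAlignment;

    if (aligned > m_capacity - m_next)
      return false;

    range = { m_next * kBytesPerConstant, m_next, aligned };
    m_next += aligned;
    return true;
  }

private:
  uint32_t m_capacity;   // buffer size in constants
  uint32_t m_next = 0;   // next free constant, always a multiple of 16
};
```

In a renderer, the data would be written at byteOffset into the mapped pointer, and firstConstant and numConstants would be passed as the last two arguments of a call such as VSSetConstantBuffers1. When allocate returns false, the application would map with MAP_WRITE_DISCARD again and call reset().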

Resources

Do:

  • Prefer strongly typed image formats for render targets and other high-bandwidth images. Using _TYPELESS formats may negatively affect GPU performance, especially if D3D11_BIND_UNORDERED_ACCESS is set.
  • Prefer initializing read-only textures during creation via D3D11_SUBRESOURCE_DATA rather than using UpdateSubresource after creation. This may avoid a redundant clear and allows us to execute the upload on a dedicated transfer or compute queue.
  • Create render targets and any other high-bandwidth resources early. This makes it more likely for those resources to end up in video memory even when video memory is exhausted.
  • Suballocate from large index and vertex buffers and use the StartIndexLocation and BaseVertexLocation draw parameters to avoid overhead from frequent re-binding.
  • For mapped resources created with D3D11_USAGE_DYNAMIC, always write full cache lines for optimal CPU performance. DXVK will generally allocate these resources in host-visible video memory, especially on systems with Resizable BAR enabled.
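
As an illustration of the suballocation advice in the list above, the following sketch tracks where each mesh lives inside one shared vertex buffer and one shared index buffer. MeshRange and MeshPool are hypothetical names introduced here; a real allocator would also need to handle freeing and differing vertex strides.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record describing where one mesh lives in the shared buffers.
struct MeshRange {
  uint32_t indexCount;          // IndexCount argument of DrawIndexed
  uint32_t startIndexLocation;  // first index, counted in indices, not bytes
  int32_t  baseVertexLocation;  // value added to every index, in vertices
};

// Hypothetical bump allocator over one large vertex buffer and one large
// index buffer. Both buffers stay bound while many meshes are drawn.
class MeshPool {
public:
  MeshPool(uint32_t maxVertices, uint32_t maxIndices)
  : m_maxVertices(maxVertices), m_maxIndices(maxIndices) { }

  // Reserves space for one mesh. Returns false, leaving the pool unchanged,
  // if either buffer does not have enough room left.
  bool allocate(uint32_t vertexCount, uint32_t indexCount, MeshRange& range) {
    assert(vertexCount > 0 && indexCount > 0);

    if (vertexCount > m_maxVertices - m_vertexCursor)
      return false;
    if (indexCount > m_maxIndices - m_indexCursor)
      return false;

    range = { indexCount, m_indexCursor, static_cast<int32_t>(m_vertexCursor) };
    m_vertexCursor += vertexCount;
    m_indexCursor  += indexCount;
    return true;
  }

private:
  uint32_t m_maxVertices;
  uint32_t m_maxIndices;
  uint32_t m_vertexCursor = 0;
  uint32_t m_indexCursor  = 0;
};
```

Each mesh can then be drawn with DrawIndexed(range.indexCount, range.startIndexLocation, range.baseVertexLocation) without touching the vertex or index buffer bindings.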

Avoid:

  • Do not use large textures with D3D11_USAGE_DYNAMIC and non-zero bind flags. Partial writes to such resources are inefficient on our implementation.
  • Try not to create many tiny buffers. Doing so may cause memory fragmentation and overhead.
  • Avoid MAP_WRITE on staging resources that are still in use by the GPU. DXVK will try to avoid stalls, but this comes at the cost of both CPU performance and memory usage.
  • NEVER perform a CPU read from a resource that was created without the D3D11_CPU_ACCESS_READ flag. In the worst case, these resources are allocated in VRAM and have to be read back over PCIe, which can very quickly become a major bottleneck.

Commands

Do:

  • Use GenerateMips rather than a custom render pass to generate mipmaps if linear filtering is sufficient for your application.

  • Clear or discard (using DiscardView) render targets when binding them for rendering for the first time within a frame, if the previous contents are no longer needed. This saves CPU work and may enable some driver optimizations compared to clearing at a different time within the frame. Prefer the following pattern:

    // Bind the targets first, then clear them, then start drawing.
    context->OMSetRenderTargets(n, rtvs, dsv);
    context->ClearDepthStencilView(dsv, ...);
    context->ClearRenderTargetView(rtvs[0], ...);
    context->ClearRenderTargetView(rtvs[1], ...);
    ...
    context->Draw(...);

  • When using indirect draws with no state changes in between, keep the stride between multiple draw arguments consistent:

    • For DrawIndexedInstancedIndirect, consecutive draw arguments should be 20 bytes apart.
    • For DrawInstancedIndirect, consecutive draw arguments should be 16 bytes apart.

    This way, we can merge consecutive indirect draws into a single vkCmdDrawIndexedIndirect or vkCmdDrawIndirect call and hit fast paths on all hardware.
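
The strides above follow from the argument layouts that D3D11 reads from the indirect buffer. The sketch below spells out both layouts and adds a simplified model of the merging. The countMergedDraws function is a hypothetical illustration of the idea, not DXVK's actual logic, and it assumes all draws read from the same argument buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Arguments read by DrawIndexedInstancedIndirect: five 32-bit values.
struct DrawIndexedInstancedArgs {
  uint32_t indexCountPerInstance;
  uint32_t instanceCount;
  uint32_t startIndexLocation;
  int32_t  baseVertexLocation;
  uint32_t startInstanceLocation;
};

// Arguments read by DrawInstancedIndirect: four 32-bit values.
struct DrawInstancedArgs {
  uint32_t vertexCountPerInstance;
  uint32_t instanceCount;
  uint32_t startVertexLocation;
  uint32_t startInstanceLocation;
};

static_assert(sizeof(DrawIndexedInstancedArgs) == 20, "indexed stride is 20 bytes");
static_assert(sizeof(DrawInstancedArgs) == 16, "non-indexed stride is 16 bytes");

// Simplified model (an assumption made for illustration): a draw extends the
// current run if its byte offset is exactly `stride` past the previous one.
// Returns the number of runs, i.e. the number of merged draw calls.
inline std::size_t countMergedDraws(const std::vector<uint32_t>& byteOffsets,
                                    uint32_t stride) {
  assert(stride > 0);
  std::size_t runs = 0;

  for (std::size_t i = 0; i < byteOffsets.size(); i++) {
    bool continuesRun = i > 0 && byteOffsets[i] == byteOffsets[i - 1] + stride;
    if (!continuesRun)
      runs++;
  }

  return runs;
}
```

Under this model, tightly packed arguments collapse into a single run, while padding each entry to 32 bytes leaves every draw on its own.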

Avoid:

  • Do not bind a new set of vertex buffers on every draw call. Doing so can become the primary CPU bottleneck.
  • Do not redundantly set state or bind resources multiple times before a draw call. The following example is inefficient on our implementation:
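    // First draw: the texture is bound once.
    context->PSSetShaderResources(0, 1, &texture);
    context->Draw(...);
    
    // Redundant: the same texture is unbound and re-bound before the next draw.
    context->PSSetShaderResources(0, 1, nullptr);
    context->PSSetShaderResources(0, 1, &texture);
    context->Draw(...);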
  • Do not rely on vendor-specific extensions provided by NVAPI or AMD AGS. These are generally not supported.
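
One way to avoid redundant binds like those shown above is to filter them on the application side. The following is a minimal sketch, assuming a hypothetical ShaderResourceCache class: View stands in for ID3D11ShaderResourceView, and the bind callable stands in for whatever forwards the call to PSSetShaderResources. The default of 128 slots matches the D3D11 per-stage shader resource slot count.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical application-side cache for the resource slots of one shader
// stage. It remembers what is bound and skips binds that change nothing.
template<typename View, std::size_t SlotCount = 128>
class ShaderResourceCache {
public:
  // Invokes bind(slot, view) only if the slot does not already hold `view`.
  // Returns true if the bind was forwarded, false if it was filtered out.
  template<typename BindFn>
  bool set(std::size_t slot, View* view, BindFn&& bind) {
    assert(slot < SlotCount);

    if (m_bound[slot] == view)
      return false;

    m_bound[slot] = view;
    bind(slot, view);
    return true;
  }

private:
  std::array<View*, SlotCount> m_bound = { };  // all slots start out empty
};
```

The cache is only valid as long as every bind goes through it. Calls that reset bindings behind its back, such as ClearState, would require clearing the cached slots as well.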

Render State

Do:

  • Batch draw calls that use the same set of shaders and render state. Switching Vulkan pipelines is expensive and may affect CPU and GPU performance. The following methods can trigger a pipeline swap:
    • Set*Shader.
    • IASetInputLayout.
    • IASetPrimitiveTopology.
    • OMSetBlendState if the blend state object or the sample mask change. Changing the blend factor is cheaper.
    • OMSetDepthStencilState if the depth-stencil state object changes. Changing the stencil reference is cheaper.
    • OMSetRenderTargets and OMSetRenderTargetsAndUnorderedAccessViews.
    • RSSetState if the FillMode or DepthClipEnable members differ from the currently bound rasterizer state. Changing only CullMode or FrontCounterClockwise is cheaper.
      • Depth bias is more complicated: Given two rasterizer states, if all depth bias members are zero in one state but not in the other, then a pipeline swap is necessary. If depth bias members are non-zero in both, changing the depth bias values is cheaper.
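
The rasterizer state rules above can be read as a simple predicate. The sketch below is one interpretation of those rules, not DXVK's code. RasterState is a hypothetical struct that mirrors only the D3D11_RASTERIZER_DESC members mentioned in the text, and members the text does not discuss are left out.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical mirror of the rasterizer state members discussed above.
// CullMode and FrontCounterClockwise are omitted because the text describes
// changing them as the cheap case.
struct RasterState {
  bool  wireframe            = false;  // FillMode == D3D11_FILL_WIREFRAME
  bool  depthClipEnable      = true;
  int   depthBias            = 0;
  float depthBiasClamp       = 0.0f;
  float slopeScaledDepthBias = 0.0f;
};

// True if any depth bias member is non-zero.
inline bool hasDepthBias(const RasterState& s) {
  return s.depthBias != 0
      || s.depthBiasClamp != 0.0f
      || s.slopeScaledDepthBias != 0.0f;
}

// Applies the stated rules: a swap is needed if FillMode or DepthClipEnable
// differ, or if depth bias is in use in one state but not in the other.
inline bool needsPipelineSwap(const RasterState& a, const RasterState& b) {
  return a.wireframe != b.wireframe
      || a.depthClipEnable != b.depthClipEnable
      || hasDepthBias(a) != hasDepthBias(b);
}

// Counts the swaps caused by rasterizer state alone over a draw sequence.
inline std::size_t countPipelineSwaps(const std::vector<RasterState>& draws) {
  std::size_t swaps = 0;

  for (std::size_t i = 1; i < draws.size(); i++)
    swaps += needsPipelineSwap(draws[i - 1], draws[i]) ? 1 : 0;

  return swaps;
}
```

A helper like countPipelineSwaps can show how much a draw order matters: alternating between biased and unbiased states swaps on every draw, while grouping them swaps once.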

Avoid:

  • For rasterizer states, avoid setting FillMode to D3D11_FILL_WIREFRAME, and avoid setting DepthClipEnable to FALSE. Doing so forces us to compile Vulkan pipelines at draw time, which may cause stutter.

Deferred contexts

Do:

  • Use Deferred Contexts for multithreaded rendering if your application is otherwise bound by its own rendering thread.
  • Keep the number of ID3D11CommandList objects used in a frame reasonably small (roughly a few dozen), and record at least 50-100 draws into each command list.
  • Set the RestoreContextState parameter to FALSE in both ExecuteCommandList and FinishCommandList.

Avoid:

  • Do not map the same resource (e.g. a constant buffer) with MAP_WRITE_DISCARD on multiple deferred contexts at the same time, as this will cause lock contention. Prefer using one dedicated constant buffer per thread.
  • Do not execute the same ID3D11CommandList multiple times per frame. DXVK does not support this by default.
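
The command list sizing advice above can be turned into a small planning step that runs before recording starts. This is a minimal sketch: DrawSpan and planCommandLists are hypothetical names, and the caps passed in are the application's own choice within the ranges suggested in this section.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of the draws recorded into one command list.
struct DrawSpan {
  uint32_t first;  // index of the first draw in the frame's draw list
  uint32_t count;  // number of consecutive draws in this command list
};

// Splits drawCount draws into contiguous spans, using at most maxLists
// command lists and never more lists than would keep each one at or above
// minDrawsPerList. Small frames therefore end up in a single list.
inline std::vector<DrawSpan> planCommandLists(uint32_t drawCount,
                                              uint32_t maxLists,
                                              uint32_t minDrawsPerList) {
  assert(maxLists > 0 && minDrawsPerList > 0);
  std::vector<DrawSpan> spans;

  if (drawCount == 0)
    return spans;

  uint32_t listCount = std::max<uint32_t>(1,
    std::min<uint32_t>(drawCount / minDrawsPerList, maxLists));

  // Distribute the draws evenly; the first `extra` lists get one more draw.
  uint32_t base  = drawCount / listCount;
  uint32_t extra = drawCount % listCount;
  uint32_t first = 0;

  for (uint32_t i = 0; i < listCount; i++) {
    uint32_t count = base + (i < extra ? 1u : 0u);
    spans.push_back({ first, count });
    first += count;
  }

  return spans;
}
```

Each span would then be recorded on its own deferred context, ideally with its own constant buffer, and the resulting command lists executed once each in span order.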