refactor: resource management, dependency tracking & encoder reordering by 3Shain · Pull Request #51 · 3Shain/dxmt

3Shain · 2024-12-04T02:31:27Z

This PR is a complete refactoring of resource management and command translation with one motivation: letting DXMT issue Metal commands that are more efficient on Apple GPUs (which have a TBDR architecture).

Apple provides some optimization guidelines, and one major aspect is encoder coalescing. Typically, when a render encoder that stores some attachments is followed by another render encoder that loads the same set of attachments, the two can be merged into one encoder, saving the memory bandwidth of one store and one load. With this PR, DXMT tries to identify encoders that can be coalesced, even in non-trivial cases (e.g. two coalesce-able encoders separated by another encoder that has no data dependency on either of them). That means DXMT will sometimes change the order of encoders when doing so is beneficial and the final result is the same.
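To make the coalescing condition concrete, here is a minimal sketch of the "store followed by load of the same attachments" check. It is not DXMT's code: `Attachment`, `RenderPass`, `CanCoalesce` and `Coalesce` are hypothetical names, and a real implementation would also have to compare render target sizes, sample counts and other pass state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class LoadAction { Load, Clear, DontCare };
enum class StoreAction { Store, DontCare };

// Hypothetical description of one attachment slot of a render pass.
struct Attachment {
  std::uint64_t texture;  // identity of the attached texture (assumed unique)
  LoadAction load;
  StoreAction store;
};

struct RenderPass {
  std::vector<Attachment> attachments;  // indexed by attachment slot
};

// Two passes are mergeable here when the second loads exactly what the
// first stored, slot by slot.
bool CanCoalesce(const RenderPass& first, const RenderPass& second) {
  if (first.attachments.size() != second.attachments.size()) return false;
  for (std::size_t i = 0; i < first.attachments.size(); ++i) {
    const Attachment& a = first.attachments[i];
    const Attachment& b = second.attachments[i];
    if (a.texture != b.texture) return false;
    if (a.store != StoreAction::Store) return false;
    if (b.load != LoadAction::Load) return false;
  }
  return true;
}

// The merged pass keeps the first pass's load actions and the second
// pass's store actions; the intermediate store + load pair disappears.
RenderPass Coalesce(const RenderPass& first, const RenderPass& second) {
  assert(CanCoalesce(first, second));
  RenderPass merged = first;
  for (std::size_t i = 0; i < merged.attachments.size(); ++i)
    merged.attachments[i].store = second.attachments[i].store;
  return merged;
}
```

In this sketch a second pass that clears an attachment is deliberately not treated as mergeable, since only the store-then-load pattern is described above.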

Before this PR, all D3D11 commands were first written to DXMT's internal command ring buffer, and when Present(), Flush() or any synchronization happened, the ring buffer was executed by a dedicated thread that encodes the actual Metal commands. This PR introduces a secondary internal command buffer. Commands written to the original (primary) command buffer still have to be executed in their original order, and they populate the secondary command buffer, which can be executed out of order. Meanwhile, all dependency and residency information is collected and eventually fed into an algorithm that reorders encoders and identifies coalesce-able ones. (By the way, although the current optimization algorithm is already very powerful, it is not in its final form and there is still room for improvement, so the details are omitted here.)
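The two-stage flow described above can be pictured roughly as follows. This is only an illustrative sketch under simplifying assumptions: `Command`, `EncoderRecord` and `Replay` are hypothetical names, each command touches a single resource, and exact `std::set`s stand in for whatever dependency representation DXMT actually uses.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

enum class Access { Read, Write };

// Hypothetical primary-buffer command: it either starts a new encoder or
// continues the current one, and it touches one resource.
struct Command {
  bool begins_encoder;
  std::uint64_t resource;
  Access access;
};

// Hypothetical secondary-buffer entry: one encoder plus the resources it
// reads and writes, collected while the primary buffer is replayed.
struct EncoderRecord {
  std::set<std::uint64_t> reads;
  std::set<std::uint64_t> writes;
};

// Replays the primary buffer strictly in order and builds encoder records
// that a later pass is free to reorder or merge.
std::vector<EncoderRecord> Replay(const std::vector<Command>& primary) {
  std::vector<EncoderRecord> secondary;
  for (const Command& c : primary) {
    // In this sketch a command outside any encoder is a recording bug.
    assert(c.begins_encoder || !secondary.empty());
    if (c.begins_encoder) secondary.emplace_back();
    EncoderRecord& encoder = secondary.back();
    if (c.access == Access::Write)
      encoder.writes.insert(c.resource);
    else
      encoder.reads.insert(c.resource);
  }
  return secondary;
}
```

The point of the split is that order-sensitive work stays in `Replay`, while everything downstream only sees per-encoder summaries.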

But how do we know whether two encoders have a data dependency, and hence whether changing their order is possible? We need to know whether the lists of resources they read and write share a common element. However, enumerating and comparing resource lists sounds like a no-go, since the time complexity is $O(n^2)$. Thankfully we have a very powerful tool: the Partitioned Bloom Filter. You may have heard of the Bloom filter, which can tell you immediately whether an element is in a set, with a small probability of false positives. A Partitioned Bloom Filter is a variant that can also be used to test whether two sets intersect in constant time. False positives are still possible, but their probability can be controlled. And remember that what we are doing is an optimization: a false positive only means a possible reordering is skipped, so correctness is not affected. The fact that taking the union of two Partitioned Bloom Filters is also a constant-time operation makes it even better.

Maybe there is an even better solution that uses a bitset to represent fences, although that is outside the scope of this PR.
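As an illustration of the constant-time disjointness test, here is a small partitioned Bloom filter sketch. The sizes (4 partitions of 64 bits), the hash mixing and the names `PartitionedBloomFilter`, `EncoderDeps` and `CanSwap` are all assumptions made for this example, not DXMT's actual parameters or API. Each element sets exactly one bit per partition, so two sets that share an element overlap in every partition; if any partition has no common bit, the sets are definitely disjoint.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

class PartitionedBloomFilter {
public:
  static constexpr std::size_t kPartitions = 4;  // assumed size for the sketch

  void Add(std::uint64_t id) {
    for (std::size_t p = 0; p < kPartitions; ++p)
      bits_[p] |= std::uint64_t{1} << BitIndex(id, p);
  }

  // False means "definitely disjoint"; true means "may share an element".
  bool MayIntersect(const PartitionedBloomFilter& other) const {
    for (std::size_t p = 0; p < kPartitions; ++p)
      if ((bits_[p] & other.bits_[p]) == 0) return false;
    return true;
  }

  // Union of two sets is a bitwise OR per partition.
  void Merge(const PartitionedBloomFilter& other) {
    for (std::size_t p = 0; p < kPartitions; ++p) bits_[p] |= other.bits_[p];
  }

private:
  // One bit position (0..63) per partition, from a seeded 64-bit mixer.
  static unsigned BitIndex(std::uint64_t id, std::size_t partition) {
    assert(partition < kPartitions);
    std::uint64_t x = id + 0x9E3779B97F4A7C15ull * (partition + 1);
    x ^= x >> 30; x *= 0xBF58476D1CE4E5B9ull;
    x ^= x >> 27; x *= 0x94D049BB133111EBull;
    x ^= x >> 31;
    return static_cast<unsigned>(x & 63);
  }

  std::array<std::uint64_t, kPartitions> bits_{};
};

// Hypothetical per-encoder summary of read and written resources.
struct EncoderDeps {
  PartitionedBloomFilter reads;
  PartitionedBloomFilter writes;
};

// Two encoders may be swapped only if there is no read-after-write,
// write-after-read or write-after-write hazard between them.
bool CanSwap(const EncoderDeps& a, const EncoderDeps& b) {
  return !a.writes.MayIntersect(b.reads) &&
         !a.reads.MayIntersect(b.writes) &&
         !a.writes.MayIntersect(b.writes);
}
```

A false positive from `MayIntersect` makes `CanSwap` return false for a pair that could have been swapped, which costs an optimization opportunity but never produces an invalid order.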

Encoder reordering is not the only benefit of the secondary command buffer. It also simplifies the implementation of Deferred Context and in fact eliminates one of its drawbacks: a game that uses Deferred Context tends to render different parts of the same scene on different threads, which creates a lot of coalesce-able encoders, but that is no longer a problem. Another optimization unlocked by the secondary command buffer is resource renaming: when a resource is fully cleared or discarded, a fresh resource with the same descriptor is created or allocated from a pool instead. This simplifies the dependencies between encoders, which is not only good for GPU parallelism but also helps the reordering algorithm detect more potential optimizations.
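Resource renaming can be sketched as swapping the backing allocation behind a stable handle. The types below (`TextureDesc`, `Allocation`, `AllocationPool`, `RenamableTexture`) are hypothetical and only model the bookkeeping; in a real renderer the previous allocation could be returned to the pool only once the GPU has finished using it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical descriptor; only allocations with an equal descriptor are
// interchangeable.
struct TextureDesc {
  std::uint32_t width;
  std::uint32_t height;
  std::uint32_t format;
  bool operator==(const TextureDesc& o) const {
    return width == o.width && height == o.height && format == o.format;
  }
};

struct Allocation {
  TextureDesc desc;
  std::uint64_t id;  // 0 is reserved as "no allocation" in this sketch
};

class AllocationPool {
public:
  // Reuses a free allocation with a matching descriptor, else creates one.
  Allocation Acquire(const TextureDesc& desc) {
    for (auto it = free_.begin(); it != free_.end(); ++it) {
      if (it->desc == desc) {
        Allocation reused = *it;
        free_.erase(it);
        return reused;
      }
    }
    return Allocation{desc, next_id_++};
  }

  // The caller must ensure no pending GPU work still uses the allocation.
  void Release(const Allocation& allocation) {
    assert(allocation.id != 0);
    free_.push_back(allocation);
  }

private:
  std::vector<Allocation> free_;
  std::uint64_t next_id_ = 1;
};

// A resource handle whose backing allocation can be replaced on a full
// clear or discard, so later encoders no longer depend on earlier ones.
struct RenamableTexture {
  Allocation current;

  // Returns the previous allocation so the caller can release it later.
  Allocation Rename(AllocationPool& pool) {
    Allocation previous = current;
    current = pool.Acquire(current.desc);
    return previous;
  }
};
```

Because the renamed texture has a new identity, an encoder that writes it after the clear shares no resource with encoders that used the old contents, which is what removes the dependency.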

Overall, this PR provides a performance boost for devices with limited memory bandwidth (typically non-Max/Ultra chips) and solid ground for further optimization.

3Shain added 6 commits November 24, 2024 23:04

refactor: unify non-staging buffer implementations

727e1c7

refactor: dynamic texture implementation

afecc58

refactor: device texture implementation

d937c8c

refactor: unify device texture view implementation

23beabb

!refactor: swapchain & backbuffer implementation

c608e62

MetalFX spatial upscaling is broken in some cases, particularly if the swapchain is created with an sRGB format. [no ci]

refactor: unify command list implementation

2ad56f1

3Shain changed the title ~~WIP~~ WIP: refactor: resource management, dependency tracking & encoder reordering Dec 4, 2024

!refactor: move argument encoding to command encoding thread

6a91256

This commit temporarily breaks deferred context

3Shain force-pushed the refactor/resource-management branch from 42f3ae1 to 6a91256 Compare December 5, 2024 04:42

3Shain added 19 commits December 5, 2024 15:33

fix: corner cases of visibility result readback

42e0c49

refactor: deferred context

bd78d23

fix(d3d11): dirty vertex buffer when vertex shader is set

963f2cf

chore(winemetal): expose objc_msgSend_stret

13c5755

fix(dxmt): update d24s8 format flags

a704124

refactor: prepare for dependency collection

91d63c9

chore(d3d11): remove unused swapchain resource

f7c3dcd

chore: collect render target count in render encoder data

d393e41

refactor(dxmt): encoder reordering & merging algorithm

b4b8ce9

feat(dxmt): dependency tracking

9cb859a

fix(d3d11): incorrect resolve destination

b1d6362

fix(dxmt): ensure 0-th element is visited in a reversed for-loop

be13e69

fix(dxmt): avoid coalescing store-clear operation on depth-stencil attachment

9b5c401

feat: render target renaming

9a57b7d

fix(d3d11): don't get window size when it's explicitly provided in swapchain description

dcabcc9

fix(dxmt): don't check empty attachment load/store op

86fabf6

fix(d3d11): rename a render target only when all planars are cleared

f591b19

chore: remove some unused code

56e18c7

[no ci]

refactor: deprecate IMTLBindable

b942867

Replaced a bunch of com_cast with static_cast or reinterpret_cast

3Shain marked this pull request as ready for review January 2, 2025 09:27

refactor: manage dynamic buffer explicitly on immediate context

f42ed1c

3Shain added 11 commits January 9, 2025 22:04

refactor: improve dynamic buffer performance on deferred context

c7a1194

refactor: improve query handling on deferred context

8cd5209

refactor: deprecate IMTLDynamicBuffer

27fe260

chore: allocate smaller staging buffer on deferred context

5ca2ddb

refactor(d3d11): improve Map() implementation consistency

b1a5f71

Merge branch 'main' into refactor/resource-management

7c90f28

fix(dxmt): fix a potential memory leak

f67edc3

refactor: deprecate IMTLD3D11Staging

2357ed5

refactor: support metalfx spatial upscaler

6c5461e

refactor: implement event query via MTLSharedEvent

bcab7f1

chore(d3d11): respect command list flush promotion

2e4db4d

3Shain changed the title ~~WIP: refactor: resource management, dependency tracking & encoder reordering~~ refactor: resource management, dependency tracking & encoder reordering Jan 13, 2025

refactor: support sampler lod bias

9ab39ef

3Shain merged commit 1e9b9e9 into main Jan 15, 2025

3Shain mentioned this pull request Apr 23, 2025

Unix Call #65

Merged

3Shain mentioned this pull request Jun 25, 2025

Resource Synchronization #73

Merged

3Shain merged 39 commits into main from refactor/resource-management