
refactor: resource management, dependency tracking & encoder reordering#51

Merged
3Shain merged 39 commits into main from refactor/resource-management
Jan 15, 2025


@3Shain 3Shain commented Dec 4, 2024

This PR is a complete refactoring of resource management and command translation with one motivation: letting DXMT issue Metal commands that are more efficient on Apple GPUs, which use a TBDR (tile-based deferred rendering) architecture.

Apple provides some optimization guidelines, and one major aspect is encoder coalescing. Typically, when a render encoder that stores some attachments is followed by another render encoder that loads the same set of attachments, the two can be merged into one encoder, saving the memory bandwidth of one store plus one load. This PR makes DXMT try to identify encoders that can be coalesced, even in non-trivial cases (e.g. two coalescable encoders separated by another encoder that has no data dependency on either of them). That means DXMT will sometimes change the order of encoders, when doing so is beneficial and the end result is the same.
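As a rough illustration of the "store followed by load of the same attachments" condition described above, here is a minimal sketch. All names (`RenderPass`, `Attachment`, `canCoalesce`, `coalesce`) are hypothetical and not taken from DXMT; the real criteria are likely stricter (sample counts, render area, intervening work, and so on).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types for illustration only; not DXMT's actual data structures.
enum class LoadAction { DontCare, Load, Clear };
enum class StoreAction { DontCare, Store };

struct Attachment {
  std::uint64_t texture; // assumed opaque identifier of the attached texture
  LoadAction load;
  StoreAction store;
};

struct RenderPass {
  std::vector<Attachment> attachments;
};

// Two adjacent passes are candidates for merging when the second one loads
// exactly the attachments that the first one stores.
bool canCoalesce(const RenderPass &first, const RenderPass &second) {
  if (first.attachments.size() != second.attachments.size())
    return false;
  for (std::size_t i = 0; i < first.attachments.size(); ++i) {
    const Attachment &a = first.attachments[i];
    const Attachment &b = second.attachments[i];
    if (a.texture != b.texture)
      return false;
    if (a.store != StoreAction::Store || b.load != LoadAction::Load)
      return false;
  }
  return true;
}

// The merged pass keeps the first pass's load actions and the second pass's
// store actions; the intermediate store + load pair disappears.
RenderPass coalesce(const RenderPass &first, const RenderPass &second) {
  RenderPass merged = first;
  for (std::size_t i = 0; i < merged.attachments.size(); ++i)
    merged.attachments[i].store = second.attachments[i].store;
  return merged;
}
```

In this sketch the bandwidth saving comes from the merged pass never writing the attachments out to memory and reading them back between the two halves.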

Before this PR, all D3D11 commands were first written to DXMT's internal command ring buffer, and when Present(), Flush() or any synchronization happened, the ring buffer was executed by a dedicated thread that encodes the actual Metal commands. This PR introduces a secondary internal command buffer. Commands written to the original (primary) command buffer still have to be executed in their original order, but they now populate the secondary command buffer, which can be executed out of order. Meanwhile, all dependency and residency information is collected and eventually fed into an algorithm that reorders encoders and identifies coalescable ones. (By the way, although the current optimization algorithm is already very powerful, it is not in its final form yet and there is still room for improvement, so the details are omitted here.)

But how do we know whether two encoders have a data dependency, which determines whether changing their order is allowed? We need to know whether their lists of read and written resources have an element in common. However, enumerating and comparing resource lists sounds like a no-go, since the time complexity is $O(n^2)$. Thankfully we have a very powerful tool: the partitioned Bloom filter. You may have heard of the Bloom filter, which can tell you immediately whether an element is in a set, with a small probability of a false positive. A partitioned Bloom filter is an enhanced version that can be used to test whether two sets have any intersection in constant time. Although false positives are still possible, their probability can be controlled. And remember that what we are doing is an optimization: a false positive only makes us skip a reordering that would have been valid, so correctness is never affected. The fact that taking the union of two partitioned Bloom filters is also a constant-time operation makes it even better.
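To make the idea concrete, here is a minimal sketch of a partitioned Bloom filter used for a conservative dependency check. It is an illustration under stated assumptions, not DXMT's actual implementation: the class and function names, the choice of 4 partitions of 64 bits each, and the hash mixer are all hypothetical. Each inserted id sets exactly one bit in every partition, so if two sets share an element, their filters share a bit in every partition; a single partition with an empty AND therefore proves the sets are disjoint.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch; sizes and names are assumptions, not DXMT's code.
class PartitionedBloomFilter {
public:
  static constexpr std::size_t kPartitions = 4; // assumed partition count

  // Sets exactly one bit in each 64-bit partition.
  void insert(std::uint64_t id) {
    const std::uint64_t h = mix(id);
    for (std::size_t i = 0; i < kPartitions; ++i) {
      // Use 6 different hash bits per partition to pick one of 64 positions.
      parts_[i] |= std::uint64_t{1} << ((h >> (6 * i)) & 63);
    }
  }

  // Returns false only if the two sets are definitely disjoint.
  // A true result may be a false positive.
  bool mayIntersect(const PartitionedBloomFilter &other) const {
    for (std::size_t i = 0; i < kPartitions; ++i) {
      if ((parts_[i] & other.parts_[i]) == 0)
        return false;
    }
    return true;
  }

  // Set union: a fixed number of OR operations.
  void merge(const PartitionedBloomFilter &other) {
    for (std::size_t i = 0; i < kPartitions; ++i)
      parts_[i] |= other.parts_[i];
  }

private:
  // 64-bit mixer (splitmix64 finalizer) to spread resource ids over all bits.
  static std::uint64_t mix(std::uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
  }

  std::array<std::uint64_t, kPartitions> parts_{};
};

// Hypothetical per-encoder summary of accessed resources.
struct EncoderDeps {
  PartitionedBloomFilter reads;
  PartitionedBloomFilter writes;
};

// Two encoders may be reordered only if this returns false:
// read-after-write, write-after-read and write-after-write are all checked.
bool mayDepend(const EncoderDeps &a, const EncoderDeps &b) {
  return a.writes.mayIntersect(b.reads) || a.reads.mayIntersect(b.writes) ||
         a.writes.mayIntersect(b.writes);
}
```

Both `mayIntersect` and `merge` touch a fixed number of words regardless of how many resources were inserted, which is what makes the check constant-time. Two encoders that only read the same resource are not reported as dependent, since no writes set is involved.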

There may be an even better solution that uses a bitset to represent fences, but that is outside the scope of this PR.

Encoder reordering is not the only benefit of a secondary command buffer. It also simplifies the implementation of Deferred Context and in fact eliminates one of its drawbacks: a game that uses Deferred Context tends to render different parts of the same scene on different threads, which creates a lot of coalescable encoders, but that is no longer a problem. Another optimization unlocked by the secondary command buffer is resource renaming: when a resource is fully cleared or discarded, a fresh resource with the same descriptor is created, or allocated from a pool, instead. This simplifies the dependencies between encoders, which is not only good for GPU parallelism but also helps the reordering algorithm detect more potential optimizations.
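The renaming idea can be sketched as follows. This is a simplified illustration with hypothetical names (`RenamableResource`, `AllocationId`, `rename`, `recycle`); it models allocations as plain ids and assumes the caller knows when the GPU has finished with an old allocation, which a real implementation would have to track.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical: an id standing in for one backing allocation of a resource.
using AllocationId = std::uint64_t;

class RenamableResource {
public:
  // The allocation that new commands should reference.
  AllocationId current() const { return current_; }

  // Called when the resource is fully cleared or discarded: later commands
  // get a different backing allocation, so they no longer depend on
  // earlier commands that touched the old one.
  AllocationId rename() {
    if (!free_.empty()) {
      current_ = free_.back(); // reuse a pooled allocation
      free_.pop_back();
    } else {
      current_ = next_id_++; // otherwise create a fresh one
    }
    return current_;
  }

  // Called once the GPU no longer uses an old allocation (assumed to be
  // signalled by the caller in this sketch).
  void recycle(AllocationId id) { free_.push_back(id); }

private:
  AllocationId next_id_ = 1;
  AllocationId current_ = 0;
  std::vector<AllocationId> free_;
};
```

Because commands recorded after `rename()` refer to a different allocation, a dependency check between an encoder that used the old allocation and one that uses the new allocation finds no shared resource.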

Overall, this PR provides a performance boost on devices with limited memory bandwidth (typically non-Max/Ultra chips) and a solid foundation for further optimization.

@3Shain 3Shain changed the title WIP WIP: refactor: resource management, dependency tracking & encoder reordering Dec 4, 2024
This commit temporarily breaks deferred context
@3Shain 3Shain force-pushed the refactor/resource-management branch from 42f3ae1 to 6a91256 Compare December 5, 2024 04:42
@3Shain 3Shain marked this pull request as ready for review January 2, 2025 09:27
@3Shain 3Shain changed the title WIP: refactor: resource management, dependency tracking & encoder reordering refactor: resource management, dependency tracking & encoder reordering Jan 13, 2025
@3Shain 3Shain merged commit 1e9b9e9 into main Jan 15, 2025
@3Shain 3Shain mentioned this pull request Apr 23, 2025
@3Shain 3Shain mentioned this pull request Jun 25, 2025