
GPU Memory Pools in D3D12 #11

Open
utterances-bot opened this issue Aug 19, 2022 · 6 comments
Comments

@utterances-bot

GPU Memory Pools in D3D12

https://therealmjp.github.io/posts/gpu-memory-pool/


dwulive commented Aug 19, 2022

One thing worth mentioning is that, when copying memory around, the cache is not always your friend.
The cache only helps if the memory is read or written more than once while it is resident in the cache.
When uploading textures, there is a good chance that they will be completely evicted from the cache before they are used.
So, in a roundabout way, what I wanted to point out is that write-combined memory is still your friend for one-time transfers. It might even help to use the non-temporal SSE/AVX instructions to avoid pulling the data into the full cache hierarchy on the read side, and for good karma on the uncached writes.
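As an illustration of this point, here is a minimal sketch of a one-shot copy using SSE2 non-temporal stores (the function name and the alignment/size assumptions are ours, not from the post; x86-64 only):

```cpp
#include <emmintrin.h>  // SSE2: _mm_stream_si128, _mm_loadu_si128
#include <cstring>      // size_t

// Copy into write-combined / upload memory using non-temporal stores,
// so the written data does not pollute the cache hierarchy.
// Assumes dst is 16-byte aligned and size is a multiple of 16.
void stream_copy(void* dst, const void* src, size_t size) {
    auto* d = static_cast<__m128i*>(dst);
    auto* s = static_cast<const __m128i*>(src);
    for (size_t i = 0; i < size / 16; ++i) {
        // The load goes through the cache as usual; the streaming store
        // write-combines and bypasses the cache on the way out.
        _mm_stream_si128(d + i, _mm_loadu_si128(s + i));
    }
    _mm_sfence();  // make the streaming stores globally visible
}
```

For large uploads you would typically also prefetch or use non-temporal loads on the source side, but the store side is where write-combined memory matters most.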


Vinluo commented Aug 26, 2022

Sorry, this might be a silly question, but what exactly does L0 mean here? In my understanding, an "L0 cache" usually refers to register-level storage. The article mentions that demoting from L1 to L0 can destroy performance; how should I understand that?

Owner

Hey @Vinluo! I'm honestly not sure where that L0/L1 terminology comes from in the case of D3D12 memory pools. It seems completely unrelated to cache hierarchies, so it's a bit unfortunate that it re-uses that same terminology. It's possible that the naming comes from something internal to the Windows OS, or something along those lines.

When you're dealing with D3D12, it's just important to know that in that particular context L0 is SysRAM and L1 is VRAM, where VRAM can potentially have much higher bandwidth for the GPU. That's why demotion can hurt so badly: you may go from having 500 GB/s of bandwidth down to only 10-12 GB/s after demotion.
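For context, these pools show up directly in `D3D12_HEAP_PROPERTIES` when you use custom heaps. A rough sketch of the two custom-heap equivalents on a discrete GPU (field values only, no allocation code):

```cpp
// Custom-heap equivalent of DEFAULT on a discrete GPU: L1 = VRAM.
D3D12_HEAP_PROPERTIES vramHeap = {};
vramHeap.Type = D3D12_HEAP_TYPE_CUSTOM;
vramHeap.CPUPageProperty = D3D12_CPU_PAGE_PROPERTY_NOT_AVAILABLE;
vramHeap.MemoryPoolPreference = D3D12_MEMORY_POOL_L1;

// Custom-heap equivalent of UPLOAD: L0 = SysRAM, write-combined for the CPU.
D3D12_HEAP_PROPERTIES sysramHeap = {};
sysramHeap.Type = D3D12_HEAP_TYPE_CUSTOM;
sysramHeap.CPUPageProperty = D3D12_CPU_PAGE_PROPERTY_WRITE_COMBINE;
sysramHeap.MemoryPoolPreference = D3D12_MEMORY_POOL_L0;
```

On a UMA/integrated adapter everything lives in `D3D12_MEMORY_POOL_L0`, which is part of why the terminology is confusing.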


Hello, I wonder why the CPU write performance for SysMem is lower than the 56 GB/s theoretical upper limit?


Microsoft has added a feature called GPU Upload Heaps in the Agility SDK that allows you to allocate CPU-visible VRAM in a manner similar to the vendor-specific extension that you mention.
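Roughly, assuming the Agility SDK headers are in use, checking for and requesting such a heap looks something like this sketch (error handling omitted):

```cpp
// GPU Upload Heaps require a recent Agility SDK plus hardware/driver support
// (typically resizable BAR). Query support before using the new heap type.
D3D12_FEATURE_DATA_D3D12_OPTIONS16 options16 = {};
device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS16,
                            &options16, sizeof(options16));
if (options16.GPUUploadHeapSupported) {
    D3D12_HEAP_PROPERTIES heapProps = {};
    heapProps.Type = D3D12_HEAP_TYPE_GPU_UPLOAD;  // CPU-visible VRAM
    // ...pass heapProps to CreateCommittedResource as with UPLOAD heaps,
    // then Map() the resource and write directly into VRAM.
}
```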


(Hah, and of course I just noticed the update at the bottom of the post)
