Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DX12 video backend #3364

Merged
merged 4 commits into from Feb 15, 2016
Merged

DX12 video backend #3364

merged 4 commits into from Feb 15, 2016

Conversation

hdcmeta
Copy link
Contributor

@hdcmeta hdcmeta commented Dec 20, 2015

This pull request contains a new DX12 video backend, as well as two minor additions to the shared Render interfaces that were needed for performance/cleanliness reasons.

This backend provides decent performance gains on graphics-intensive titles, and is fairly conformant to the existing D3D11 backend (no known issues). Performance results from various people are posted here: https://forums.dolphin-emu.org/Thread-unofficial-dolphin-dx12-backend

This does not yet implement the GPU bounding box, XFB rendering, or performance query features. The only known correctness issue is with multisampling on AMD hardware, this doesn't repro on other hardware or the software Basic Display Driver.

This backend is based off the existing D3D11 backend, and porting changes between them is fairly trivial (this pull request contains all of the recent changes to the D3D11 backend). The main areas of divergence are resource binding and state management. All shaders are identical, maybe these could be factored into D3DCommon?

The backend also contains its own background thread to perform graphics processing. This is actually done automatically by the D3D11 and OpenGL graphics drivers, but is needed to be done manually on D3D12. This is done by writing a class that implements the ID3D12GraphicsCommandList interface (the equivalent of ID3D11DeviceContext in D3D12), so it doesn't 'pollute' any of the renderer code - its use can be disabled via a #define in d3d12base.h.

Interested in any feedback on the code or architecture, thanks. Also please let me know if the pull request should be factored any differently.

Review on Reviewable

HINSTANCE hD3DDll12 = nullptr;
int d3d12_dll_ref = 0;

namespace D3D

This comment was marked as off-topic.

@MayImilae
Copy link
Contributor

With the feature freeze planned for January 7th, whether this is approved for merging or not, I think merging should be off the table until after 5.0... A new experimental backend would be impossible to iron out and test enough to get to stable standards in time. :/ It would be a great post-5.0 feature though!

D3DBlob::D3DBlob(unsigned int blob_size, const u8* init_data) : ref(1), size(blob_size), blob(nullptr)
{
data = new u8[blob_size];
if (init_data) memcpy(data, init_data, size);

This comment was marked as off-topic.

@hdcmeta
Copy link
Contributor Author

hdcmeta commented Feb 4, 2016

Review status: 21 of 66 files reviewed at latest revision, 57 unresolved discussions.


Source/Core/VideoBackends/D3D12/D3D12.vcxproj, line 104 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/D3DBase.cpp, line 921 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/D3DBase.cpp, line 923 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/D3DBase.cpp, line 955 [r7] (raw file):
If the swap chain happens occupies the entire screen (e.g. windowed fullscreen), I think the swap chain is scanned out directly instead of a composition pass. If it is windowed, composition is needed of course. This is described here, in the second bullet: https://msdn.microsoft.com/en-us/library/windows/hardware/dn653329(v=vs.85).aspx


Source/Core/VideoBackends/D3D12/D3DCommandListManager.cpp, line 19 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/D3DCommandListManager.cpp, line 63 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/D3DCommandListManager.cpp, line 68 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/D3DCommandListManager.cpp, line 267 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/D3DCommandListManager.cpp, line 301 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/D3DCommandListManager.h, line 49 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/D3DCommandListManager.h, line 56 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/D3DCommandListManager.h, line 82 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/D3DCommandListManager.h, line 90 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/D3DDescriptorHeapManager.cpp, line 44 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/D3DDescriptorHeapManager.cpp, line 171 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/D3DQueuedCommandList.cpp, line 34 [r7] (raw file):
I see, that is much cleaner, thanks


Source/Core/VideoBackends/D3D12/D3DQueuedCommandList.cpp, line 497 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/D3DShader.cpp, line 28 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/D3DShader.h, line 9 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/D3DState.cpp, line 29 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/D3DState.cpp, line 30 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/D3DState.cpp, line 466 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/D3DState.cpp, line 472 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/D3DStreamBuffer.cpp, line 95 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/D3DTexture.cpp, line 116 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/D3DUtil.cpp, line 680 [r7] (raw file):
It is certainly possible to get rid of this lookup, but the calling functions would need to know which PSO they wanted to use, so we'd have to hardcode that in. The current design is a little more flexible, the calling functions only need to pass parameters they care about.

There is neglible perf/memory overhead of this approach, and I think it is cleaner code. If there is a cleaner way to do it then I'm in favor, but I think the number of permutations is too high to justify.


Source/Core/VideoBackends/D3D12/D3DUtil.h, line 24 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/FramebufferManager.cpp, line 64 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/FramebufferManager.cpp, line 80 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/Render.cpp, line 558 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/Render.cpp, line 652 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/ShaderCache.cpp, line 363 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/ShaderCache.cpp, line 372 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/ShaderCache.cpp, line 381 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/ShaderConstantsManager.cpp, line 38 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/Television.cpp, line 15 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/TextureCache.cpp, line 24 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/TextureEncoder.h, line 15 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/TextureEncoder.h, line 22 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/VertexManager.h, line 43 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/VideoBackend.h, line 28 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/XFBEncoder.cpp, line 16 [r7] (raw file):
Done.


Comments from the review on Reviewable.io

@stenzek
Copy link
Contributor

stenzek commented Feb 4, 2016

Reviewed 3 of 34 files at r8, 5 of 22 files at r9.
Review status: 29 of 66 files reviewed at latest revision, 49 unresolved discussions.


Source/Core/VideoBackends/D3D12/D3DTexture.cpp, line 134 [r9] (raw file):
There is still a potential race on the branch. You would want to do if (m_ref.fetch_sub(1) == 0), or, if ((--m_ref) == 0)


Comments from the review on Reviewable.io

@hdcmeta
Copy link
Contributor Author

hdcmeta commented Feb 4, 2016

Review status: 29 of 66 files reviewed at latest revision, 48 unresolved discussions.


Source/Core/VideoBackends/D3D12/D3DQueuedCommandList.cpp, line 34 [r7] (raw file):
Done.


Comments from the review on Reviewable.io

@hdcmeta
Copy link
Contributor Author

hdcmeta commented Feb 4, 2016

Review status: 29 of 66 files reviewed at latest revision, 48 unresolved discussions.


Source/Core/VideoBackends/D3D12/TextureCache.cpp, line 352 [r7] (raw file):
Didn't understand this at first, but makes perfect sense - switched it over to a Stream Buffer and it's much cleaner - thanks.


Source/Core/VideoBackends/D3D12/TextureCache.cpp, line 662 [r8] (raw file):
Ended up removing due to moving to StreamBuffer


Source/Core/VideoBackends/D3D12/TextureCache.cpp, line 672 [r8] (raw file):
Done.


Comments from the review on Reviewable.io

@hdcmeta
Copy link
Contributor Author

hdcmeta commented Feb 4, 2016

Review status: 29 of 66 files reviewed at latest revision, 48 unresolved discussions.


Source/Core/VideoCommon/TextureCacheBase.cpp, line 355 [r7] (raw file):
That would be a better/more simple approach - I'd prefer to avoid modifying the other backends too much at least in this PR. Maybe as a follow-up PR? Just trying to scope down the risk here..

Did notice the parameter is 'lastTexture' in some places instead of 'last_texture', so fixed that locally, will include in next commit.


Comments from the review on Reviewable.io

@hdcmeta
Copy link
Contributor Author

hdcmeta commented Feb 5, 2016

Review status: 25 of 66 files reviewed at latest revision, 49 unresolved discussions, some commit checks failed.


Source/Core/VideoBackends/D3D12/D3DCommandListManager.h, line 37 [r7] (raw file):
Done.


Source/Core/VideoBackends/D3D12/D3DStreamBuffer.cpp, line 136 [r7] (raw file):
Broke out the largest section into it's own function


Source/Core/VideoBackends/D3D12/D3DUtil.cpp, line 24 [r7] (raw file):
Makes sense, done


Comments from the review on Reviewable.io

@hdcmeta hdcmeta force-pushed the d3d12merge branch 5 times, most recently from a1f63e8 to 387d3f8 Compare February 9, 2016 07:15
@stenzek
Copy link
Contributor

stenzek commented Feb 9, 2016

Reviewed 1 of 34 files at r8, 1 of 22 files at r9.
Review status: 27 of 61 files reviewed at latest revision, 39 unresolved discussions.


Source/Core/VideoBackends/D3D12/D3DBase.cpp, line 928 [r7] (raw file):
Calling for other opinions on this. The way I see it:

  • The timing check here is not accurate due to the present being performed at a later time on the background thread
  • Doesn't work for multimonitor setups, where the refresh rate may differ between two monitors, which refresh rate is being used for this test?
  • DXGI_PRESENT_TEST is meant to be used for occlusion testing or testing if the device still exists, not to drop frames. May as well just skip the present call entirely.
  • This behavior is driver and vendor dependent. It could potentially change in the future. Present used to block in windowed mode a few months ago when I was writing D3D12 code, it doesn't anymore on more recent drivers.

NVIDIA's D3D12 guide (https://developer.nvidia.com/dx12-dos-and-donts) says to use GetFrameLatencyWaitableObject() for this type of thing, which due to the video thread requiring the backbuffer index won't work here (for the same reason the time check is not accurate, ideally the video thread would not have to worry about this state at all)

After testing, it seems the only times the framerate will be limited is when Direct Flip is being used, (i.e. the compositor being bypassed, so fullscreen modes where there isn't another window on top), limiting FPS to 90. Windowed does not have this issue.

I think it's better to remove the hack, given this is going into a stable release, and drivers may change in the future. If users want to have the game running at above 90fps in current environments, use windowed mode (unless there's another way to bypass direct flip), after all it is an experimental backend. Alternatively, at least bypass it when running in windowed mode.

But I'm happy to be convinced otherwise.


Source/Core/VideoBackends/D3D12/D3DUtil.cpp, line 24 [r7] (raw file):
Do you still need the globals here? (replace the references with a call?)


Comments from the review on Reviewable.io

@hdcmeta
Copy link
Contributor Author

hdcmeta commented Feb 9, 2016

Review status: 27 of 61 files reviewed at latest revision, 39 unresolved discussions.


Source/Core/VideoBackends/D3D12/D3DBase.cpp, line 928 [r7] (raw file):
No strong opinions here, just to add some points on the functionality:

  • This code works around the problem that if the app calls 'Present' too fast, Windows can sometimes throttle your presentation rate, and start blocking inside Present.
  • As pointed out, this only seems to be a problem in the full-screen case, windowed mode doesn't need this work-around.
  • If the full-screen case is 'fixed' in the OS in the future and gets an unlimited presentation rate, this code has no adverse side-effects. In fact, it works great in windowed mode today (which already has an unlimited presentation rate).

I'd vote for keeping it for these reasons:

  • I'd like to optimize for user functionality here, instead of adding a corner case ("Please don't use full-screen if you want to use 'tab'").
  • I have confidence in the stability/lack of side effects in the code, it has been around since the first release on the forums.
  • The 'hack' is extremely localized to the 'Present' method (around six lines), and is well commented. I don't think this makes the code less clear, or introduces maintainability concerns.

Once this is fixed in the OS/drivers, I'm happy to remove it asap when it's no longer needed..


Source/Core/VideoBackends/D3D12/D3DUtil.cpp, line 24 [r7] (raw file):
Done.


Comments from the review on Reviewable.io

@JMC47
Copy link
Contributor

JMC47 commented Feb 15, 2016

Touching the texture cache slider causes a lot of textures to flicker/disappear for one frame. Changing IR as well.

Turning on XFB (I know it's not supported) causes the screen to black out. When you consider we have INIs that do that, maybe it'd be best to just ignore XFB setting for now?

@JMC47
Copy link
Contributor

JMC47 commented Feb 15, 2016

At lower IRs, DX12's performance is superior, but it really flies at 4x - 8x IR.

At 4K resolutions, Super Mario Galaxy 2 was 48% faster on D3D12 vs OpenGL/D3D11 (GTX 760)

Stability wise, I (driver) crashed once early on while tabbing. Haven't had a crash since in two days of use.

@ppmeis
Copy link

ppmeis commented Feb 15, 2016

I noticed under DirectX12 there is more shader stuttering , it tates more time to create shaders...is this a known issue?

I'm really excited to test this on my GTX970 SLI. Great job ;). I will try 4K resolutions as @JMC47 said. Keep working.

@JMC47
Copy link
Contributor

JMC47 commented Feb 15, 2016

Shader Generation is about the same for me. It's just that we're usually more aware of it when we're looking for it.

@delroth delroth changed the title RFC: DX12 video backend DX12 video backend Feb 15, 2016
@delroth
Copy link
Member

delroth commented Feb 15, 2016

Reviewed 1 of 34 files at r8, 2 of 22 files at r9, 4 of 21 files at r11, 3 of 4 files at r12, 2 of 2 files at r13.
Review status: 39 of 61 files reviewed at latest revision, 1 unresolved discussion.


Comments from the review on Reviewable.io

@delroth
Copy link
Member

delroth commented Feb 15, 2016

Review status: 39 of 61 files reviewed at latest revision, 1 unresolved discussion.


Source/Core/VideoBackends/D3D12/D3DBase.cpp, line 928 [r7] (raw file):
I'm fine with that hack given the explanation here and the ample commenting in the source code.


Comments from the review on Reviewable.io

delroth added a commit that referenced this pull request Feb 15, 2016
@delroth delroth merged commit 61ee799 into dolphin-emu:master Feb 15, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
RFC Request for comments WIP / do not merge Work in progress (do not merge)