Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

D3D11: Implement non-blocking CPU EFB access #3652

Closed
wants to merge 1 commit into from
Closed

D3D11: Implement non-blocking CPU EFB access #3652

wants to merge 1 commit into from

Conversation

stenzek
Copy link
Contributor

@stenzek stenzek commented Feb 18, 2016

Trying implementing the EFB cache as a staging texture that is kept mapped until it is invalidated. Will be interesting to see what the overheads are here compared to the GL implementation which copies blocks at a time.

Should also work for MSAA and >1xIR cases.

Review on Reviewable

@delroth
Copy link
Member

delroth commented Feb 18, 2016

I'd really prefer if we could have one implementation of that, however it's done.

@lioncash lioncash added the WIP / do not merge Work in progress (do not merge) label Feb 18, 2016
@JMC47
Copy link
Contributor

JMC47 commented Feb 19, 2016

Some performance numbers:

Monster Hunter Tri: Before - 4 FPS, After - 71 FPS, OGL - 3 FPS
F-Zero GX Sand Ocean: Before - 57 FPS, After - 94 FPS, OGL - 64 FPS

Fixes issue 8427 and issue 8247, one of which are regression. Thus, this should be considered for 5.0. Also, look at those perf numbers.

@degasus
Copy link
Member

degasus commented Feb 19, 2016

I don't think this should go into 5.0. It's a new feature, not a fix. As we're an emulator, of course most new features are fixes for games, but not fixes for the emulator itself.

But the byte order fix might go to 5.0, indeed.

@stenzek
Copy link
Contributor Author

stenzek commented Feb 20, 2016

no need to invalidate here swap does not imply efb changes

Correct, however this is more a precaution against drivers being silly. Keeping a texture mapped while calling Present (or, submitting commands) seems like something that would never pop up during "normal" workloads and is probably an untested path.

CD3D11_RECT src_rect(0, 0, m_target_width, m_target_height);
CD3D11_VIEWPORT vp(0.0f, 0.0f, EFB_WIDTH, EFB_HEIGHT);
D3D::context->RSSetViewports(1, &vp);
D3D::context->OMSetRenderTargets(1, &m_efb.depth_read_texture->GetRTV(), nullptr);

This comment was marked as off-topic.

@Tinob
Copy link
Contributor

Tinob commented Feb 20, 2016

if you are interested in make it work for dx12 here you have it implemented.
Tinob/Ishiiruka@d3ab000

@CrossVR
Copy link
Contributor

CrossVR commented Feb 21, 2016

Review status: 0 of 9 files reviewed at latest revision, 3 unresolved discussions.


Source/Core/VideoBackends/D3D/FramebufferManager.cpp, line 124 [r1] (raw file):
I take it you chose to remove the m_efb.slices because peeks are only done on the first layer anyway?


Comments from the review on Reviewable.io

@stenzek stenzek changed the title WIP: D3D11: Implement EFB peek cache (and some other minor fixes) D3D11: Implement EFB peek cache (and some other minor fixes) Feb 21, 2016
@stenzek
Copy link
Contributor Author

stenzek commented Feb 21, 2016

Review status: 0 of 9 files reviewed at latest revision, 3 unresolved discussions.


Source/Core/VideoBackends/D3D/FramebufferManager.cpp, line 124 [r1] (raw file):
Correct. It's only used as a copy destination, never drawn to. I'm not how to best handle the stereo case here, which is why I just used the left eye image.


Comments from the review on Reviewable.io

@JMC47
Copy link
Contributor

JMC47 commented Feb 21, 2016

Huge speedup, more accurate, what's not to like?

Verified to work at > 1x IR now.

@delroth
Copy link
Member

delroth commented Feb 21, 2016

Code duplication and feature freeze? :)

@lioncash
Copy link
Member

Review status: 0 of 9 files reviewed at latest revision, 7 unresolved discussions.


Source/Core/VideoBackends/D3D/FramebufferManager.cpp, line 265 [r2] (raw file):
This should be memcpying the value out of memory instead of using reinterpret_cast. The initial reinterpret_cast on pData (which is void* iirc), should be to its assumed type (first non-void type it's casted to, assuming the variable is used for memory allocation or something), or to char/unsigned char.


Source/Core/VideoBackends/D3D/FramebufferManager.cpp, line 282 [r2] (raw file):
Ditto


Source/Core/VideoBackends/D3D/FramebufferManager.cpp, line 292 [r2] (raw file):
Ditto, but into memory


Source/Core/VideoBackends/D3D/FramebufferManager.cpp, line 302 [r2] (raw file):
Ditto, but into memory


Comments from the review on Reviewable.io

@lioncash
Copy link
Member

@Tinob It's not safe to typecast across pointer types like what is in your code. Consider using memcpy instead.

@stenzek
Copy link
Contributor Author

stenzek commented Feb 21, 2016

I'm tempted to not call this an "EFB cache" anymore, since we're just accessing a texture that's potentially in vram, not making a copy and storing it in system memory.

A note on accuracy: For EFB pokes, should we be applying the same logic and handling 16-bit Z/16-bit color in the same way as peeks?

IRT to sharing code with GL, I'm not sure there is much that can actually be shared. The majority of the work is in FramebufferManager::MapEFB{Color,Depth}Copy, and this is all D3D-specific code. The only stuff that could be shared is in Renderer::AccessEFB, and that's just a few lines now.

I've implemented the same idea for GL using pixel buffer objects, but it seems slower when only reading a few pixels (but faster when reading pixels further apart, or a larger count than the block sizes). D3D has to go through a copy/map path always, so there isn't really a comparison here. So I'm not sure what the best option is?

@RisingFog
Copy link
Member

Any update on this?

@RisingFog
Copy link
Member

What's holding this PR back?

@BhaaLseN
Copy link
Member

"Code duplication and feature freeze? :)"

@bb010g
Copy link
Contributor

bb010g commented Mar 15, 2016

@BhaaLseN It fixes a regression.

This is done through a copy of the buffers that is invalidated once the
next draw occurs. If multiple accesses occur inbetween draw calls, the
read won't block.
@stenzek stenzek changed the title D3D11: Implement EFB peek cache (and some other minor fixes) D3D11: Implement non-blocking CPU EFB access Mar 25, 2016
@bb010g
Copy link
Contributor

bb010g commented Mar 27, 2016

@RisingFog It looks like just fixing what lioncash and possibly Tinob have mentioned on Reviewable.

@emko
Copy link

emko commented Jun 23, 2016

for me this causes performance problems
Mario Galaxy with this i get max fps in a exact same spot 45-50FPS vs without 80fps

@degasus
Copy link
Member

degasus commented Jun 23, 2016

@emko What is "for me"? Which GPU do you have?

@JMC47
Copy link
Contributor

JMC47 commented Jun 23, 2016

Are you sure it isn't because D3D EFB Access got fixed recently?

@Tinob
Copy link
Contributor

Tinob commented Jun 23, 2016

just to clarify emko report: he was testing performance betwen master and ishiiruka and find out that there is a big performance diff betwen dx11 master vs dx11 ishiruka, we started to track down the source and i finish discovering that the difference was that ishiirukka have this pr already merged, i send him a link to this version to be sure and yes the speed lost for him was the same for this pr with master code. for his profile i see that he use a nvidia 970 with a intel 2600k, it seems to be a problem in that specific card because the rest of the configurations i tested have huge improvements.

@emko
Copy link

emko commented Jun 23, 2016

@degasus for me because i see others reporting improved performance, could it be my GPU GTX 970? i will test out the games JMC47 seen performance improvements in maybe its just this game

@emko
Copy link

emko commented Jun 23, 2016

High IR 4X native what i use most of the time
F-Zero GX Sand Ocean: Before - 18 FPS, After - 18 FPS

lower the resolution to 3X native i get Before - 27 FPS, After - 27FPS
no performance difference in this game for me, maybe its my hardware

@RisingFog
Copy link
Member

Now that we've released 5.0 and are out of feature freeze, is there anything holding this PR back? (Other than a rebase)

@stenzek stenzek closed this Jun 10, 2017
@stenzek stenzek deleted the d3d11-efb-cache branch February 11, 2019 14:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
WIP / do not merge Work in progress (do not merge)