-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
D3D: Implemented cache for dynamic render states #475
Conversation
i also replace the d3d description structure for lighter version and replace the GetCrc32 with a template magic function that unroll the loop for optimal code :) Not yet pushed on my fork still :) |
As far as I can tell, we don't need to ever calculate hash of state descriptions. Everything can be packed into u32, except for aniso level. |
If you strip everything that is not useful yet from the description, but i On Sun, Jun 8, 2014 at 3:22 PM, kayru notifications@github.com wrote:
|
Good point about collisions. Though, if this is identified as a real problem, members of the bitfield can be re-arranged to reduce collisions. As for keeping things general, I'm not sure if it's needed in case of Dolphin. If generalized version is as fast and as simple, then it would make sense to use it, of course. |
For example, i made use temporary of the zbias stuff in the rasterizer state, that's an int and two float, not possible to reduce it without reducing the authorized range. And that' by far not a bottleneck anymore anyway |
{ | ||
if (state.mag_filter) // linear mag filter | ||
{ | ||
if (mip == TEXF_NONE) sampdc.Filter = D3D11_FILTER_MIN_MAG_LINEAR_MIP_POINT; |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
@lioncash fixed the if/else new line thing |
If the description ever does grow too big to fit into a u64, it isn't too hard to change this patch to accommodate that: it's just a matter of implementing a custom hash function. The trick of packing the values into a u32/u64 doesn't really affect most of the patch. Other than that, the approach in this patch seems generally reasonable. |
} gx_state; | ||
|
||
class GXStateCache |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
Is there a reason not to have something similar in VideoCommon, so that OpenGL could benefit from it, too? |
Does OpenGL have the same performance issue, though? |
Probably not because GL is not object based and the state machine is lazier. Still, the reason to have something in VideoCommon exists, but at a large scale, Dolphin lacks an lightweight abstract layer to write rendering code once and not have all the copy/past that exist between gl and d3d at the moment. |
OpenGL doesn't have any equivalent to the ID3D11RasterizerState, ID3D11DepthStencilState, or ID3D11BlendState objects. A shared sampler cache implementation probably makes sense; OpenGL already has one, though, so it would just be code refactoring. |
Thanks :) |
I've addressed most of the comments. One thing that I did not do is switch to BitField template. There are other unions/bitfields in the codebase that follow the same explicit style, such as TexMode0 in BPMemory.h. |
The BitField class is new, so not all existing code uses it. New code which depends on bitfield layout (e.g. a union with u64) should. Also, lint is failing with the latest patch. |
@magumagu there's another problem with BitField. Some of the packed members represent D3D11 enums. Using BitField requires casts like this: (D3D11_COLOR_WRITE_ENABLE)(u32)state.write_mask. |
I think BitField supports using an enum as the field type; does that not work for you? |
Yes it does, but if we're not in control over enum storage type, it may not always work. |
I don't follow. The enum is part of the DirectX SDK, therefore the representation can't change without breaking compatibility with existing projects. |
Consider a situation where D3D enum is part of a union that's over 32 bits. |
Err, oh, I see what you mean. Might make sense to modify the BitField class to allow specifying the storage type separately from the type of the value. |
I think modifying BitField class is beyond the scope of this patch. |
Okay. I guess it's not really such a big deal here, given that all you're doing with the u64 is using it as the key for a hashtable. |
I just encountered a pretty scary issue with BitField in unions. |
LGTM. |
|
||
union RasterizerState | ||
{ | ||
BitField<0, 2, D3D11_CULL_MODE> cull_mode; |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
Ok, I'll try to repro it. Does it happen immediately at startup? |
Yes, it happens immediately on D3D(of course D3D) |
I should say that this provides a 20 - 30% performance boost in D3D in pretty much every game. I would get more exact numbers, but there's no benefit to knowing them. |
It seems that I actually have the exact same problem as the user BadKarma above. |
Launch another game first, end emulation and then launch mutant turtles. It
|
Yeah that's what I did and it made the game work. So it's the same issue we're seeing. |
I am able to repro the issue in Wind Waker. Expect fix soon :) |
Wind waker, brawl and a few others I can't remember.
|
6ddadbb
to
e6e7516
Compare
Refactored StateCache::Get() to early out for narrower indentation. Added comments to clarify ownership of objects returned by StateCache::Get().
e6e7516
to
bea68c9
Compare
Please try bea68c9. It fixes the WW issue. I'm going to assume that TMNT hits the same bug. |
Yep. The latest patch fixes all the games in my library that had that issue. |
@dolphin-emu-bot rebuild |
LGTM, however I'd be more happy to see this (mostly all of those state setters currently in Render.cpp) in videocommon. As gx_state rarely depends on D3D enums, it's a step in the right direction :) I guess it's worth to implement some general state hacks there, eg the current logicOp -> blending mappings. Do you think it's worth to reduce the identical state space? eg disabled blending but different blending parameters? |
Yeah, I think reducing the state space a little bit would be nice. Unlikely to improve performance much, but is a good practice. I'll try it out tonight. |
…llapsing some states that have blending disabled
gx_state.blenddc.RenderTarget[0].DestBlendAlpha = val; | ||
|
||
gx_state.blenddc.RenderTarget[0].DestBlend = val; | ||
gx_state.blend.dst_blend = val; | ||
} | ||
|
||
void SetBlendOp(D3D11_BLEND_OP val) |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
On NVIDIA, I haven't found any problems with this yet; it's just seems to be faster. It's still not as fast as OpenGL in most situations, but it's an excellent step in the right direction, and doesn't seem to bring any instabilities. On my Radeon HD5850 OpenGL and D3D seems to be the fastest backend no question. There are some exceptions (Melee is pretty close,) but in most games with this Pull Request, I see a healthy boost in speed. I didn't notice any graphical regressions in either of my graphics cards. @kayru I know this is slightly off topic, but you're the only one I know actively working on D3D and I dunno where else to find you. There are some pretty infamous viewport issues in D3D that make it different from OpenGL and Software Renderer in behavior. Mario Tennis's stadiums blocking the view, Luigi's Mansion's Mirrors not working, Silent Hill's flashlight being blocked by the guy's hand. There's a patch from Galop1n's fork that fixes this, and shouldn't affect anything else. I was wondering if you'd be willing to look at it, and possibly clean it up if necessary and make a Pull Request for it. I feel bad bugging you about it, but, it'd be great to have the backends all giving the same results. Patch location: https://dl.dropboxusercontent.com/u/484730/SH%20fix.patch |
Still nothing reach te main branch from my amazing perf boost and accurate My goods were stuck at the custom and even if i still sleep on the floor up I will come again, but i send this email with my only internet access, my
|
@galop1n: Trust me, I'm trying to get your stuff to master! It's hard because separating out your stuff from the mega commit is tough, and going through to isolate features is difficult because a lot of the builds on the fork don't compile. Your work was amazing. |
Nothing remaining to criticize. LGTM |
D3D: Implemented cache for dynamic render states
Most render states are now cached, rather than created every time, saving roughly 50% of ApplyState().
This is similar to #365, except only limited to render state caching. Additionally, pending render state is kept in a compact structure, which avoids significant cost of hashing and comparing D3D state descriptions in #365.