Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

D3D: Implemented cache for dynamic render states #475

Merged
merged 7 commits into from
Oct 18, 2014

Conversation

kayru
Copy link
Contributor

@kayru kayru commented Jun 8, 2014

Most render states are now cached, rather than created every time, saving roughly 50% of ApplyState().
This is similar to #365, except only limited to render state caching. Additionally, pending render state is kept in a compact structure, which avoids significant cost of hashing and comparing D3D state descriptions in #365.

@galop1n
Copy link
Contributor

galop1n commented Jun 8, 2014

i also replace the d3d description structure for lighter version and replace the GetCrc32 with a template magic function that unroll the loop for optimal code :) Not yet pushed on my fork still :)

@kayru
Copy link
Contributor Author

kayru commented Jun 8, 2014

As far as I can tell, we don't need to ever calculate hash of state descriptions. Everything can be packed into u32, except for aniso level.

@galop1n
Copy link
Contributor

galop1n commented Jun 8, 2014

If you strip everything that is not useful yet from the description, but i
prefer to keep them as they may be needed at some point. And even a 32bits
value need to be hashed to be converted to something that will reduce
collision in the hashmap :)

On Sun, Jun 8, 2014 at 3:22 PM, kayru notifications@github.com wrote:

As far as I can tell, we don't need to ever calculate hash of state
descriptions. Everything can be packed into u32, except for aniso level.


Reply to this email directly or view it on GitHub
#475 (comment).

@kayru
Copy link
Contributor Author

kayru commented Jun 8, 2014

Good point about collisions. Though, if this is identified as a real problem, members of the bitfield can be re-arranged to reduce collisions.

As for keeping things general, I'm not sure if it's needed in case of Dolphin. If generalized version is as fast and as simple, then it would make sense to use it, of course.

@galop1n
Copy link
Contributor

galop1n commented Jun 8, 2014

For example, i made use temporary of the zbias stuff in the rasterizer state, that's an int and two float, not possible to reduce it without reducing the authorized range. And that' by far not a bottleneck anymore anyway

{
if (state.mag_filter) // linear mag filter
{
if (mip == TEXF_NONE) sampdc.Filter = D3D11_FILTER_MIN_MAG_LINEAR_MIP_POINT;

This comment was marked as off-topic.

@kayru
Copy link
Contributor Author

kayru commented Jun 8, 2014

@lioncash fixed the if/else new line thing

@magumagu
Copy link
Contributor

magumagu commented Jun 8, 2014

If the description ever does grow too big to fit into a u64, it isn't too hard to change this patch to accommodate that: it's just a matter of implementing a custom hash function. The trick of packing the values into a u32/u64 doesn't really affect most of the patch.

Other than that, the approach in this patch seems generally reasonable.

} gx_state;

class GXStateCache

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

@neobrain
Copy link
Member

neobrain commented Jun 8, 2014

Is there a reason not to have something similar in VideoCommon, so that OpenGL could benefit from it, too?

@kayru
Copy link
Contributor Author

kayru commented Jun 8, 2014

Does OpenGL have the same performance issue, though?

@galop1n
Copy link
Contributor

galop1n commented Jun 8, 2014

Probably not because GL is not object based and the state machine is lazier. Still, the reason to have something in VideoCommon exists, but at a large scale, Dolphin lacks an lightweight abstract layer to write rendering code once and not have all the copy/past that exist between gl and d3d at the moment.

@magumagu
Copy link
Contributor

magumagu commented Jun 8, 2014

OpenGL doesn't have any equivalent to the ID3D11RasterizerState, ID3D11DepthStencilState, or ID3D11BlendState objects.

A shared sampler cache implementation probably makes sense; OpenGL already has one, though, so it would just be code refactoring.

@lioncash
Copy link
Member

lioncash commented Jun 8, 2014

@kayru

fixed the if/else new line thing

Thanks :)

@kayru
Copy link
Contributor Author

kayru commented Jun 8, 2014

I've addressed most of the comments. One thing that I did not do is switch to BitField template. There are other unions/bitfields in the codebase that follow the same explicit style, such as TexMode0 in BPMemory.h.

@magumagu
Copy link
Contributor

magumagu commented Jun 8, 2014

The BitField class is new, so not all existing code uses it. New code which depends on bitfield layout (e.g. a union with u64) should.

Also, lint is failing with the latest patch.

@kayru
Copy link
Contributor Author

kayru commented Jun 9, 2014

@magumagu there's another problem with BitField. Some of the packed members represent D3D11 enums. Using BitField requires casts like this: (D3D11_COLOR_WRITE_ENABLE)(u32)state.write_mask.
Is there a better way? If it was a custom enum, then appropriate storage type could be specified and everything would be fine.

@magumagu
Copy link
Contributor

magumagu commented Jun 9, 2014

I think BitField supports using an enum as the field type; does that not work for you?

@kayru
Copy link
Contributor Author

kayru commented Jun 9, 2014

Yes it does, but if we're not in control over enum storage type, it may not always work.

@magumagu
Copy link
Contributor

magumagu commented Jun 9, 2014

I don't follow. The enum is part of the DirectX SDK, therefore the representation can't change without breaking compatibility with existing projects.

@kayru
Copy link
Contributor Author

kayru commented Jun 9, 2014

Consider a situation where D3D enum is part of a union that's over 32 bits.

@magumagu
Copy link
Contributor

magumagu commented Jun 9, 2014

Err, oh, I see what you mean.

Might make sense to modify the BitField class to allow specifying the storage type separately from the type of the value.

@kayru
Copy link
Contributor Author

kayru commented Jun 9, 2014

I think modifying BitField class is beyond the scope of this patch.

@magumagu
Copy link
Contributor

magumagu commented Jun 9, 2014

Okay. I guess it's not really such a big deal here, given that all you're doing with the u64 is using it as the key for a hashtable.

@kayru
Copy link
Contributor Author

kayru commented Jun 9, 2014

I just encountered a pretty scary issue with BitField in unions.
When BitField is copied, then the entire storage is copied with it, not just the affected bits. This is a problem if you want to partially copy a union with BitField-s.
Example: https://gist.github.com/kayru/327ad61b698f09642ab6

@magumagu
Copy link
Contributor

LGTM.


union RasterizerState
{
BitField<0, 2, D3D11_CULL_MODE> cull_mode;

This comment was marked as off-topic.

This comment was marked as off-topic.

@kayru
Copy link
Contributor Author

kayru commented Oct 15, 2014

Ok, I'll try to repro it. Does it happen immediately at startup?

@PatrickFerry
Copy link
Member

Yes, it happens immediately on D3D(of course D3D)

@JMC47
Copy link
Contributor

JMC47 commented Oct 15, 2014

I should say that this provides a 20 - 30% performance boost in D3D in pretty much every game. I would get more exact numbers, but there's no benefit to knowing them.

@PatrickFerry
Copy link
Member

It seems that I actually have the exact same problem as the user BadKarma above.

@badkarma12
Copy link

Launch another game first, end emulation and then launch mutant turtles. It
will work then.
On Oct 15, 2014 2:27 AM, "Patrick" notifications@github.com wrote:

This PR breaks Teenage Mutant Ninja Turtles(2003)
It gives the error "Failed to create rasterizer state at GfxState.cpp 262"


Reply to this email directly or view it on GitHub
#475 (comment).

@PatrickFerry
Copy link
Member

Yeah that's what I did and it made the game work. So it's the same issue we're seeing.

@kayru
Copy link
Contributor Author

kayru commented Oct 15, 2014

I am able to repro the issue in Wind Waker. Expect fix soon :)

@badkarma12
Copy link

Wind waker, brawl and a few others I can't remember.
On Oct 15, 2014 12:56 PM, "kayru" notifications@github.com wrote:

@badkarma12 https://github.com/badkarma12 what game did you test this
on?


Reply to this email directly or view it on GitHub
#475 (comment).

@kayru
Copy link
Contributor Author

kayru commented Oct 15, 2014

Please try bea68c9. It fixes the WW issue. I'm going to assume that TMNT hits the same bug.

@PatrickFerry
Copy link
Member

Yep. The latest patch fixes all the games in my library that had that issue.

@Parlane
Copy link
Member

Parlane commented Oct 16, 2014

@dolphin-emu-bot rebuild

@degasus
Copy link
Member

degasus commented Oct 16, 2014

LGTM, however I'd be more happy to see this (mostly all of those state setters currently in Render.cpp) in videocommon. As gx_state rarely depends on D3D enums, it's a step in the right direction :) I guess it's worth to implement some general state hacks there, eg the current logicOp -> blending mappings.

Do you think it's worth to reduce the identical state space? eg disabled blending but different blending parameters?

@kayru
Copy link
Contributor Author

kayru commented Oct 16, 2014

Yeah, I think reducing the state space a little bit would be nice. Unlikely to improve performance much, but is a good practice. I'll try it out tonight.

…llapsing some states that have blending disabled
gx_state.blenddc.RenderTarget[0].DestBlendAlpha = val;

gx_state.blenddc.RenderTarget[0].DestBlend = val;
gx_state.blend.dst_blend = val;
}

void SetBlendOp(D3D11_BLEND_OP val)

This comment was marked as off-topic.

This comment was marked as off-topic.

@JMC47
Copy link
Contributor

JMC47 commented Oct 17, 2014

On NVIDIA, I haven't found any problems with this yet; it's just seems to be faster. It's still not as fast as OpenGL in most situations, but it's an excellent step in the right direction, and doesn't seem to bring any instabilities.

On my Radeon HD5850 OpenGL and D3D seems to be the fastest backend no question. There are some exceptions (Melee is pretty close,) but in most games with this Pull Request, I see a healthy boost in speed.

I didn't notice any graphical regressions in either of my graphics cards.

@kayru I know this is slightly off topic, but you're the only one I know actively working on D3D and I dunno where else to find you. There are some pretty infamous viewport issues in D3D that make it different from OpenGL and Software Renderer in behavior. Mario Tennis's stadiums blocking the view, Luigi's Mansion's Mirrors not working, Silent Hill's flashlight being blocked by the guy's hand. There's a patch from Galop1n's fork that fixes this, and shouldn't affect anything else. I was wondering if you'd be willing to look at it, and possibly clean it up if necessary and make a Pull Request for it. I feel bad bugging you about it, but, it'd be great to have the backends all giving the same results.

Patch location: https://dl.dropboxusercontent.com/u/484730/SH%20fix.patch

@galop1n
Copy link
Contributor

galop1n commented Oct 17, 2014

Still nothing reach te main branch from my amazing perf boost and accurate
fixes !!

My goods were stuck at the custom and even if i still sleep on the floor up
to wenesday, i am living the California dream :)

I will come again, but i send this email with my only internet access, my
phone...
On Oct 17, 2014 3:12 AM, "JMC47" notifications@github.com wrote:

On NVIDIA, I haven't found any problems with this yet; it's just seems to
be faster. It's still not as fast as OpenGL in most situations, but it's an
excellent step in the right direction, and doesn't seem to bring any
instabilities.

On my Radeon HD5850 OpenGL and D3D seems to be the fastest backend no
question. There are some exceptions (Melee is pretty close,) but in most
games with this Pull Request, I see a healthy boost in speed.

I didn't notice any graphical regressions in either of my graphics cards.

@kayru https://github.com/Kayru I know this is slightly off topic, but
you're the only one I know actively working on D3D and I dunno where else
to find you. There are some pretty infamous viewport issues in D3D that
make it different from OpenGL and Software Renderer in behavior. Mario
Tennis's stadiums blocking the view, Luigi's Mansion's Mirrors not working,
Silent Hill's flashlight being blocked by the guy's hand. There's a patch
from Galop1n's fork that fixes this, and shouldn't affect anything else. I
was wondering if you'd be willing to look at it, and possibly clean it up
if necessary and make a Pull Request for it. I feel bad bugging you about
it, but, it'd be great to have the backends all giving the same results.

Patch location:
http://dolphin-emu.googlecode.com/issues/attachment?aid=77550002000&name=SH+fix.patch&token=ABZ6GAeoyloo47isooaFX1qp4xQRvmS2vg%3A1413540538364


Reply to this email directly or view it on GitHub
#475 (comment).

@JMC47
Copy link
Contributor

JMC47 commented Oct 17, 2014

@galop1n: Trust me, I'm trying to get your stuff to master! It's hard because separating out your stuff from the mega commit is tough, and going through to isolate features is difficult because a lot of the builds on the fork don't compile. Your work was amazing.

@degasus
Copy link
Member

degasus commented Oct 17, 2014

Nothing remaining to criticize. LGTM

skidau added a commit that referenced this pull request Oct 18, 2014
D3D: Implemented cache for dynamic render states
@skidau skidau merged commit a5674bb into dolphin-emu:master Oct 18, 2014
@kayru kayru deleted the d3d_state_cache branch October 18, 2014 17:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.