Initial port of zfreeze branch (3.5-1729) #1767

Closed
wants to merge 7 commits into from

Conversation

NanoByte011
Contributor

Initial port of original zfreeze branch (3.5-1729) by NeoBrainX into
most recent build of Dolphin.

Makes Rogue Squadron 2 very playable at full speed thanks to recent core
speedups made to Dolphin. Works on DirectX Video plugin only for now.

These are the Game Settings I have for GSWE64.ini under edit config

[Core]
DSPHLE = False
[DSP]
EnableJIT
[Video_Settings]
AspectRatio = 1
FastDepthCalc = False

Enjoy! and Merry Xmas!!

@MayImilae
Contributor

Ooooh @phire!

@JMC47
Contributor

JMC47 commented Dec 25, 2014

I'm pretty shocked.

Mario Power Tennis - The three regular courts work. The other courts work partially; some objects are wrongly culled. It's more playable than master.
RS2 - Seems to work okay. The ending cutscene of the first mission has some definite zfighting though!
RS3 - Some wrong culling in cutscenes; in-game seems fine. The opening cutscene of the first level has some skybox leaking over the stars; easy to see.
Mario Golf Toadstool Tour - Main menu better than master, same as phire's hacks.
NHL 2003 - Shadows work correctly with no zfighting.
NBA Street V3 - No shadows.
Need For Speed Hot Pursuit 2 - No zfreeze shadows (it draws multiple types of shadows; the zfreeze ones are missing).
NBA Live 2005 - Zfighting on shadows.

@neobrain
Member

Nice work! Just a minor note, please spell my name as "neobrain" in the commit message ;)

I'll try to give this a full review once I'm back from vacation.

@neobrain
Member

@dolphin-emu-bot rebuild


PrepareDrawBuffers(stride);

if (!bpmem.genMode.zfreeze && indices >= 3)
{


@degasus
Member

degasus commented Dec 25, 2014

Nice work to get this ported.
For OGL, I see a bigger issue. We load our vertices directly into an OGL-mapped write-only buffer, so we are not allowed to read them back afterwards.
I see two possible solutions: always extract the last three vertices within the vertex loader (very ugly), or decode into a temporary buffer and memcpy afterwards (likely slower).
But IMO it's fine to just revert all OGL changes for now and call this base function in the next PR.
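
To illustrate the second option (decode into a temporary buffer, then memcpy), here is a minimal sketch. It is not Dolphin code: `StagingVertexBuffer` and its methods are hypothetical names, and the "mapped" destination is just a plain pointer standing in for a write-only GPU mapping. The point is only that the last three vertices stay readable on the CPU side before the copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical staging helper (not part of Dolphin): vertices are decoded into
// ordinary CPU memory first, so the last triangle can still be read back
// before the data is copied into a write-only mapped buffer.
class StagingVertexBuffer
{
public:
  explicit StagingVertexBuffer(std::size_t capacity) : m_data(capacity) {}

  // Append one decoded vertex of `stride` bytes.
  void Push(const void* vertex, std::size_t stride)
  {
    assert(m_used + stride <= m_data.size());
    std::memcpy(m_data.data() + m_used, vertex, stride);
    m_used += stride;
  }

  // Pointer to the first of the last three vertices, or nullptr if there are
  // fewer than three vertices in the buffer.
  const std::uint8_t* LastTriangle(std::size_t stride) const
  {
    if (m_used < 3 * stride)
      return nullptr;
    return m_data.data() + m_used - 3 * stride;
  }

  // Copy everything into the (write-only) destination and reset.
  // Returns the number of bytes copied.
  std::size_t Flush(void* mapped_dst)
  {
    std::memcpy(mapped_dst, m_data.data(), m_used);
    const std::size_t copied = m_used;
    m_used = 0;
    return copied;
  }

private:
  std::vector<std::uint8_t> m_data;
  std::size_t m_used = 0;
};
```

The extra copy is the cost degasus refers to as "likely slower"; the benefit is that the mapped buffer is never read.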

@NanoByte011
Contributor Author

Thanks for the code reviews guys... I can review and improve the code based on the feedback. It should be noted that I'm brand new to the Dolphin code base and my C++ is pretty rusty, but I'll continue to get up to speed here so I can contribute as much as I can!

@JMC47
Contributor

JMC47 commented Dec 25, 2014

If you ever need any help, feel free to pop into IRC on Freenode on the channels #dolphin-emu (for general discussion) or #dolphin-dev (where you'll find more developers/development topics, like zfreeze)

Based on the feedback from pull request dolphin-emu#1767 I have put in most of
degasus's suggestions in here now.

I think we have a real winner here as moving the code to
VertexManagerBase for a function has allowed OGL to utilize zfreeze now
:)

Correct use of the vertex pointer has also corrected most of the issues
found in pull request dolphin-emu#1767 that JMC47 reported. For me, Mario
Tennis now works with no polygon spikes on the characters anymore!
Shadows are still an issue, probably in the other games with shadow
problems as well. Rebel Strike also seems better, but random skybox
glitches can show up.
@NanoByte011
Contributor Author

Ok I've updated and cleaned up the code based on feedback and I think we have a good start on zfreeze now.

@@ -179,9 +179,14 @@ void VertexManager::vFlush(bool useDstAlpha)
}

u32 stride = VertexLoaderManager::GetCurrentVertexFormat()->GetVertexStride();


@JMC47
Contributor

JMC47 commented Dec 26, 2014

A lot of the bugs I ran into in the earlier build are now fixed. Like the broken cutscenes in RS2/3, especially in OpenGL.

On the other hand, D3D and OpenGL produce different results with zfreeze. Mario Tennis in particular is weird: the basic courts work in D3D, while OpenGL renders them black but still shows the lines. Both backends appear to produce the same results on the gimmick courts.

I mean, it's not like Mario Power Tennis is a great experience with all the zfreeze problems, so it's not horrible to accept some changes in behavior to add a new feature, but it's still not great to see stuff broken on OGL that works on D3D, and stuff working on OGL that is broken on D3D.

The other titles, as far as I can tell, are pretty much identical to before.

As per lioncash request
@NanoByte011
Contributor Author

Ok removed extra whitespace lioncash found.

@NanoByte011
Contributor Author

Wow JMC47 you are quick at testing things, you really impress me on this! If this zfreeze fix is causing differences between OGL and D3D in Mario Tennis as an example, then it probably has to do with what degasus said about the OGL vertex buffer being loaded into a write-only buffer and not being able to read them afterwards. Though if this is the case, then I would have expected that zfreeze wouldn't work at all in OGL right now, yet it does.

@delroth
Member

delroth commented Dec 26, 2014

I have to say, this looks surprisingly good. It's not perfect, but the implementation is clean/isolated and IMO we should really consider getting that merged before branching stable.

@neobrain when are you back from vacation?

@JMC47
Contributor

JMC47 commented Dec 26, 2014

My main issue, and what I'd like you to look at if possible, is getting the other courts in Mario Tennis to work. They're the main regression; if those could work, I'd have no qualms giving my support toward merging this. Even with that regression, I don't see it as a big deal since it was partially broken before anyway. I can't speak for anyone else though.

@@ -22,6 +22,7 @@ class VertexManager : public ::VertexManager
protected:
virtual void ResetBuffer(u32 stride) override;
u16* GetIndexBuffer() { return &LocalIBuffer[0]; }
u8* GetVertexBuffer() { return &LocalVBuffer[0]; }


@degasus
Member

degasus commented Dec 26, 2014

@NanoByte011 Yeah, this implementation looks very good right now. We still violate the OGL spec, but I think it will likely work on all drivers. x86 cache modes don't allow prohibiting reads; the memory may just be uncached. So OGL may be slower, but it will likely work as expected.
Z-fighting issues can't be resolved this way. They are affected by everything (OGL vs D3D, GPU, driver version, ...), so don't expect this PR to fix such issues. Maybe with some hacks:
depth += 0.5 * exp2(-16); // half gx depth epsilon to always overwrite the last value.

2 items pointed out by degasus
@MayImilae
Contributor

Would this affect games that don't use zfreeze?

@JMC47
Contributor

JMC47 commented Dec 27, 2014

It hasn't in my testing.

@degasus
Member

degasus commented Dec 27, 2014

@MaJoRoesch Maybe a bit, performance-wise. Otherwise everything should be fine.

@MayImilae
Contributor

JMC47 can you measure this before it's merged? See if there are any performance regressions?

@JMC47
Contributor

JMC47 commented Dec 27, 2014

I can measure the performance impact. I'd like to see this merged for the Progress Report if possible, so I'm going to be selfish and stuff and try to push my agenda on everyone. Sorry in advance.

@MayImilae
Contributor

Just test for any performance impacts first. :)

@JMC47
Contributor

JMC47 commented Dec 27, 2014

In my Melee benchmark it appears to be about 1.5% slower in OGL. Other games showed no performance regression (RS2/3???), and for others I'm not 100% sure, since I need to take a harder look (SMG1/2).

The performance regression disappears at 3x IR.

D3D had no performance regression in any game. If anything it benchmarked higher in 3 concurrent runs of Melee and RS2, and ended up even in Super Mario Galaxy.

@degasus
Member

degasus commented Dec 27, 2014

OGL has a much bigger slowdown because of the uncached memory access. This slowdown is only on the gpu thread, so higher IR which moves the bottleneck to the gpu itself will reduce this effect.

@@ -22,7 +22,7 @@ class VertexManager : public ::VertexManager
protected:
virtual void ResetBuffer(u32 stride) override;
u16* GetIndexBuffer() { return &LocalIBuffer[0]; }


@phire
Member

phire commented Dec 28, 2014

Yeah, you can't get away with that kind of trickery.

Uses for zfreeze

The original intention

Used by: Mario Power Tennis, Super Mario Strikers

zfreeze was designed as a way to eliminate zfighting when rendering decals, instead of other hacks like OpenGL's glPolygonOffset(), but developers never really used it for that. I suspect it's just too expensive, requiring a new drawcall for every set of decals on a different triangle, so developers just manually bias vertices instead.

Going through the list of fifologs which jmc47 collected, there are exactly two games (Mario Tennis and Mario Strikers) which use zfreeze in its intended decal rendering mode. Mario Strikers uses it for rendering the shadows onto the field and Mario Tennis uses it for rendering the tennis court lines. But Mario Tennis uses other zfreeze-based tricks for its shadows (which I'll cover below), so Super Mario Strikers is the only game which can be fixed with that kind of trickery.

Depth override

Used by: Rogue Squadron 2/3, Mario Golf: Toadstool Tour, Blood Omen 2

Most famously used by Rogue Squadron's skyspheres, which are rendered close to the player; zfreeze is used to override the depth and project it out behind all other objects to the zfar plane. This is essentially the same as putting depth = 1.0 in a fragment shader (which is what my hack did), except that on the gamecube this is done at triangle setup and early z culling still happens. Factor 5 used this method because putting the skysphere in the distance would take up a huge chunk of the zbuffer range (since Factor 5 used hardware anti-aliasing, they were limited to a 16-bit zbuffer) and rendering the skysphere first with the zbuffer disabled would cause too much overdraw.

I'm not exactly sure why the other games use zfreeze for doing depth overrides, but they both lock different objects to both the zfar and znear planes.

EA shadows

Used by: Most EA sports games, Mario Power Tennis, Need For Speed: Hot Pursuit 2

Shadows are one of the harder things in 3D graphics. Many methods have been developed for dynamic shadows over the years and they all have various tradeoffs. Selection of a shadowing technique depends a lot on the capabilities/performance of the hardware. Doom 3's famous stencil volume shadows produce the best-looking results for sharp shadows, but modern hardware isn't optimised for its excessive stencil operations, so most modern games use shadow maps, which modern hardware is really good at doing (but the resolution is generally limited, resulting in pixelated shadows).

The gamecube doesn't have a stencil buffer so it can't do stencil volume shadows. It can kind of do shadow mapping (self-shadowing in Rogue Squadron, shadows in Luigi's Mansion) but most games use other methods.

Most games that I've looked into appear to use a hybrid between planar projection shadows and shadow mapping. Taking advantage of the cheap hardware vertex transformations and cheap framebuffer-to-RAM copies, they render a character or object from the perspective of the light into a framebuffer with all-black polygons. The resulting black and white shadow mask is copied to a texture which is carefully stretched across the level geometry with alpha blending to create the illusion of a shadow.

But EA sports games use the older method of pure projection shadows, where the shadow object is projected onto the floor in software (which is easy because the floor of sports games is completely flat) and rendered on the floor. This works fine if you want a pure black shadow, but generally you want an alpha-blended shadow, which causes issues when polygons overlap. Either you get parts of the shadow which are blended twice, or you get zfighting. Normally the correct solution is to render the shadow to the stencil buffer and blend each shadow pixel just once.

But the gamecube doesn't have a stencil buffer. Instead these games enable zfreeze, which ensures that each pixel on the screen will always have an identical depth in the zbuffer if rendered to twice. Then they change the depth compare method from the usual "less than or equal" to "less than", so each pixel of the shadow can only possibly be drawn once. This essentially creates a 1-bit stencil buffer in the depth buffer.
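
The trick described above can be sketched with a toy software depth buffer. This is an illustration only, not emulator or game code: `Framebuffer`, `DepthFunc` and `DrawShadowFragment` are made-up names, the "smaller depth is closer" convention is an assumption of the sketch, and it also assumes the frozen reference depth lies in front of whatever is already stored at that pixel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class DepthFunc { LessEqual, Less };

// Toy framebuffer (hypothetical): one depth value and one light factor per pixel.
struct Framebuffer
{
  int width;
  int height;
  std::vector<std::uint32_t> depth;  // smaller = closer (assumption of this sketch)
  std::vector<float> light;          // 1.0 = unshadowed

  Framebuffer(int w, int h)
      : width(w), height(h),
        depth(static_cast<std::size_t>(w) * h, 0xFFFFFFu),
        light(static_cast<std::size_t>(w) * h, 1.0f)
  {
  }
};

// Draw one alpha-blended shadow fragment. With zfreeze, every shadow polygon
// covering this pixel produces the same `frozen_depth`, because the depth comes
// from the reference plane rather than from the polygon itself.
bool DrawShadowFragment(Framebuffer& fb, int x, int y, std::uint32_t frozen_depth,
                        DepthFunc func, float alpha)
{
  assert(x >= 0 && x < fb.width && y >= 0 && y < fb.height);
  const std::size_t i = static_cast<std::size_t>(y) * fb.width + x;
  const bool pass = (func == DepthFunc::Less) ? frozen_depth < fb.depth[i] :
                                                frozen_depth <= fb.depth[i];
  if (!pass)
    return false;
  fb.depth[i] = frozen_depth;   // depth write marks the pixel as "already shadowed"
  fb.light[i] *= 1.0f - alpha;  // darken the pixel
  return true;
}
```

With `DepthFunc::Less`, a second overlapping fragment at the same frozen depth fails the test, so the pixel is darkened exactly once; with `DepthFunc::LessEqual` it would pass again and be blended twice.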

I thought Factor 5's use of zfreeze to preserve their limited zbuffer precision was pretty cool, but this shadow method used by EA is absolutely genius.

Edit: On second thought, stencil volume shadows might actually be possible. The alpha buffer with blend logic operations can also be used to emulate a stencil buffer. It supports xor, which is technically enough to implement stencil volume shadows.

@phire
Member

phire commented Dec 28, 2014

Bad news.

This PR only works on some GPUs. On my AMD 5770, the shadows in NHL 2003 don't render at all (I assume they are rendering just below the ground)

@neobrain
Member

Aside from possible conceptual issues (as pointed out by @phire), the code in this branch looks good to me. However, I stand by my original assertion that I would rather have some solid ground work (hwtests, software renderer implementation) done before prematurely merging this implementation (which has nontrivial effect on VideoCommon's code architecture and hence might make restructuring the affected code harder in the future).

@degasus
Member

degasus commented Dec 28, 2014

@phire Thanks for the summary of the rendering usages. So this implementation will only work with the middle usage. The other ones will still have z-fighting. IMO it's not possible to fix the z-fighting at all this way, not even with hacks. So don't expect fixes for any game that doesn't use z-freeze like the second method.

@neobrain I don't think so. This implementation is well separated from everything else, so removing or rewriting it won't be harder if we merge it now. But of course the psychological strain will be lower if it's already working in some games... But I see, the correct way to implement z-freeze will likely not share any code with this one. So it's just about whether we want this half-broken implementation for now...

@NanoByte011 Do you want to fix the remaining white-space issues?

@JMC47
Contributor

JMC47 commented Dec 28, 2014

Not to undermine actual developers, but I'm totally okay with a half broken implementation that works sometimes vs no implementation that never works.

@phire
Member

phire commented Dec 28, 2014

@JMC47 Are you sure NHL 2003 has working shadows and you aren't just confusing the reflections as shadows?

@JMC47
Contributor

JMC47 commented Dec 28, 2014

I am sure, but only on D3D. Seems to be the same as Mario Tennis' courts. Unfortunately it seems the reflections draw over the shadows, so they aren't perfect.

[Screenshots: gh3e69-2, gh3e69-3]

@phire
Member

phire commented Dec 29, 2014

Ah ha. I see the issue.

Apparently NHL 2003 is extremely resilient to depth planes at the wrong depth. In OpenGL I can force the reference plane to depth=znear and everything will render correctly (the shadows will leak over the edge of the court, but I couldn't get any of the characters over there.) The game must do a depth clear after rendering the shadows.

Apparently it's even more resilient in DirectX (and the software renderer), where it accepts a depth plane below the court (so it appears behind the reflections). There must be a bug in OpenGL causing depth to be written somewhere above where the reference plane is.

@NanoByte011 Anyway, there is a bug in CalculateZSlope. It's almost the same bug as in the software renderer (I'm assuming @neobrain tested this mostly on games using method two, where the reference planes are perpendicular to the screen): it calculates the zslope relative to the triangle instead of relative to the screen.

slope_dfdx and slope_dfdy end up correct, but slope_f0 ends up very wrong, unless the reference triangle is perpendicular to the screen, in which case slope_dfdx and slope_dfdy are zero and slope_f0 is correct.

The pixel shader uses screen-relative coordinates to look up the zplane, so your zslope needs to be generated in screen-relative coordinates too.
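
As a worked illustration of what "screen-relative" means here, the sketch below fits the plane z = f0 + dfdx*x + dfdy*y through three vertices given in screen coordinates, so that f0 is the depth at the screen origin rather than at one of the triangle's vertices. This is not the PR's CalculateZSlope; `ScreenVertex`, `ZSlope`, `ComputeZSlope` and `DepthAt` are hypothetical names, and the field names only mirror the slope_dfdx / slope_dfdy / slope_f0 terms from the comment above.

```cpp
#include <cassert>

// Hypothetical types for this sketch (not Dolphin's).
struct ScreenVertex
{
  float x, y, z;  // x, y in screen (window) coordinates, z = depth
};

struct ZSlope
{
  float dfdx;  // depth change per screen pixel in x
  float dfdy;  // depth change per screen pixel in y
  float f0;    // depth at screen position (0, 0)
};

// Fit the plane z = f0 + dfdx * x + dfdy * y through three screen-space vertices.
// Returns false for a degenerate triangle (zero area on screen).
bool ComputeZSlope(const ScreenVertex& a, const ScreenVertex& b, const ScreenVertex& c,
                   ZSlope* out)
{
  assert(out != nullptr);
  const float dx1 = b.x - a.x, dy1 = b.y - a.y, dz1 = b.z - a.z;
  const float dx2 = c.x - a.x, dy2 = c.y - a.y, dz2 = c.z - a.z;

  const float det = dx1 * dy2 - dx2 * dy1;
  if (det == 0.0f)
    return false;

  // Cramer's rule on the 2x2 system formed by the two edge vectors.
  out->dfdx = (dz1 * dy2 - dz2 * dy1) / det;
  out->dfdy = (dx1 * dz2 - dx2 * dz1) / det;

  // Anchor the plane at the screen origin, not at vertex a. Using a.z directly
  // as f0 would be the "triangle-relative" mistake.
  out->f0 = a.z - out->dfdx * a.x - out->dfdy * a.y;
  return true;
}

// Depth of the reference plane at an arbitrary screen pixel.
float DepthAt(const ZSlope& s, float x, float y)
{
  return s.f0 + s.dfdx * x + s.dfdy * y;
}
```

When both slopes are zero, f0 equals the depth of any vertex, which matches the observation that the triangle-relative version only gives the right answer in that special case.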

@phire
Member

phire commented Dec 29, 2014

See #1780 for my attempt at fixing the same bug in software renderer.

Software renderer was using triangle relative coordinates to generate the slope and then triangle relative coordinates to retrieve the depth. But the triangle coordinate space changes for each triangle, giving you the wrong result when zfreeze is enabled.

@NanoByte011
Contributor Author

@phire Is that not what we are doing with TransformToClipSpace, or are you referring to something else?

@phire
Member

phire commented Dec 29, 2014

TransformToClipSpace gets you close. But you need to go the rest of the way to screen space.

@NanoByte011
Contributor Author

Yes, which is finished off in PixelShaderGen for the final depth value ;)

@phire
Member

phire commented Jan 2, 2015

I spent the day debugging this and still haven't worked out all the issues, but here are my notes.

Known issues:

  • Despite the name, ClipPos.xy doesn't contain x and y in clipspace coordinates. It's actually x and y in worldspace coordinates. Possible solution: use rawpos instead and update CalculateZSlope to work in window coordinates.
  • CalculateZSlope generates a zslope from depth values which range from 2^24 (znear) to zero (zfar). PixelShaderGen doesn't divide by 0xffffff, so the depth value gets clamped to the range 0.0 to 1.0.
  • Most games render their reference planes with cullmode set to cull all. The vertex manager currently skips these vertices before loading, which means this PR is using random reference planes.

Mario Golf appears to be the only game which doesn't use cull-all mode for its reference planes.

So it's really surprising that this PR works at all. The clamping to the 0.0 to 1.0 range appears to cancel out a number of other bugs.
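
The second bullet can be made concrete with a small sketch. It only shows the arithmetic, under the assumption that the raw depth is a 24-bit value in the range 0 to 0xFFFFFF; `DepthClampOnly` and `DepthNormalized` are hypothetical helper names, not functions from PixelShaderGen.

```cpp
#include <algorithm>
#include <cassert>

constexpr float kMaxDepth24 = 16777215.0f;  // 0xFFFFFF, largest 24-bit depth value

// Clamp a value to the [0, 1] range, as a normalized depth output would be.
inline float Saturate(float v)
{
  return std::min(std::max(v, 0.0f), 1.0f);
}

// What happens if the raw 24-bit depth is written without dividing:
// every raw value of 1 or more saturates to 1.0.
float DepthClampOnly(float raw_depth)
{
  return Saturate(raw_depth);
}

// Dividing by 0xFFFFFF first maps the 24-bit range onto [0, 1].
float DepthNormalized(float raw_depth)
{
  assert(raw_depth >= 0.0f);
  return Saturate(raw_depth / kMaxDepth24);
}
```

For example, a raw depth of 8388608 (half the range) comes out as 1.0 when only clamped, but roughly 0.5 when normalized first, which is one way the missing division could hide other errors.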

phire pushed a commit to phire/dolphin that referenced this pull request Jan 2, 2015
@phire
Member

phire commented Jan 2, 2015

I've made some improvements to this branch in PR #1812

phire pushed a commit to phire/dolphin that referenced this pull request Jan 2, 2015
@phire
Member

phire commented Jan 2, 2015

So PR #1812 is now a pretty much complete zfreeze implementation.

@lioncash
Member

lioncash commented Jan 2, 2015

Closing because of phire's branch. Thanks for porting this over by the way :)

@lioncash lioncash closed this Jan 2, 2015
phire pushed a commit to phire/dolphin that referenced this pull request Jan 15, 2015
@phire phire mentioned this pull request Jan 15, 2015
7 tasks
NanoByte011 added a commit to phire/dolphin that referenced this pull request Jan 22, 2015