Initial port of zfreeze branch (3.5-1729) #1767
Initial port of original zfreeze branch (3.5-1729) by NeoBrainX into most recent build of Dolphin. Makes Rogue Squadron 2 very playable at full speed thanks to recent core speedups made to Dolphin. Works on DirectX Video plugin only for now.

These are the Game Settings I have for GSWE64.ini under edit config
===============
[Core]
DSPHLE = False
[DSP]
EnableJIT
[Video_Settings]
AspectRatio = 1
FastDepthCalc = False
===============
Enjoy! and Merry Xmas!!
Ooooh @phire!
I'm pretty shocked. Mario Power Tennis: the three regular courts work. The other courts work partially; some objects are wrongly culled. It's more playable than master.
Nice work! Just a minor note, please spell my name as "neobrain" in the commit message ;) I'll try to give this a full review once I'm back from vacation.
@dolphin-emu-bot rebuild
PrepareDrawBuffers(stride);

if (!bpmem.genMode.zfreeze && indices >= 3)
{
Nice work getting this ported.
Thanks for the code reviews guys... I can review and improve the code based on the feedback. It should be noted that I'm brand new to the Dolphin code base and my C++ is pretty rusty, but I'll continue to get up to speed here so I can contribute as much as I can!
If you ever need any help, feel free to pop into IRC on Freenode on the channels #dolphin-emu (for general discussion) or #dolphin-dev (where you'll find more developers/development topics, like zfreeze)
Based on the feedback from pull request dolphin-emu#1767 I have now put in most of degasus's suggestions. I think we have a real winner here, as moving the code into a function in VertexManagerBase has allowed OGL to use zfreeze now :) Correct use of the vertex pointer has also fixed most of the issues JMC47 reported in pull request dolphin-emu#1767. For me, Mario Tennis now works with no polygon spikes on the characters anymore! Shadows are still an issue there, and probably in the other games with shadow problems. Rebel Strike also seems better, but random skybox glitches can show up.
Ok, I've updated and cleaned up the code based on feedback and I think we have a good start on zfreeze now.
@@ -179,9 +179,14 @@ void VertexManager::vFlush(bool useDstAlpha)
}

u32 stride = VertexLoaderManager::GetCurrentVertexFormat()->GetVertexStride();
A lot of the bugs I ran into in the earlier build are now fixed, like the broken cutscenes in RS2/3, especially in OpenGL. On the other hand, D3D and OpenGL produce different results with zfreeze. Mario Tennis is weird: the basic courts work in D3D, while in OpenGL they render black but still show the lines. Both backends appear to produce the same results in the gimmick courts. It's not as if Mario Power Tennis is a great experience with all its zfreeze problems, so it's not horrible to accept some changes in behavior in exchange for a new feature, but it's still not great to see stuff broken on OGL that works on D3D, and stuff working on OGL that is broken on D3D. The other titles, as far as I can tell, behave pretty much the same as before.
As per lioncash request
Ok, removed the extra whitespace lioncash found.
Wow JMC47 you are quick at testing things, you really impress me on this! If this zfreeze fix is causing differences between OGL and D3D in Mario Tennis as an example, then it probably has to do with what degasus said about the OGL vertex buffer being loaded into a write-only buffer and not being able to read them afterwards. Though if this is the case, then I would have expected that zfreeze wouldn't work at all in OGL right now, yet it does.
I have to say, this looks surprisingly good. It's not perfect, but the implementation is clean/isolated and IMO we should really consider getting that merged before branching stable. @neobrain when are you back from vacation?
My main issue, and what I'd like you to look at if possible, would be trying to get the other courts in Mario Tennis to work. They're the main regression; if those could work, I'd have no qualms giving my support toward merging this. Even without that, I don't see it as a big deal since it was partially broken before, anyway. I can't speak for anyone else though.
@@ -22,6 +22,7 @@ class VertexManager : public ::VertexManager
protected:
virtual void ResetBuffer(u32 stride) override;
u16* GetIndexBuffer() { return &LocalIBuffer[0]; }
u8* GetVertexBuffer() { return &LocalVBuffer[0]; }
@NanoByte011 Yeah, this implementation looks very good right now. We still violate the OGL spec, but I think it will likely work on all drivers. x86 cache modes don't allow reads to be prohibited; the memory may just be uncached. So OGL may be slower, but it will likely work as expected.
2 items pointed out by degasus
Would this affect games that don't use zfreeze?
It hasn't in my testing.
@MaJoRoesch Maybe a bit performance wise. Otherwise everything should be fine.
JMC47 can you measure this before it's merged? See if there are any performance regressions?
I can measure the performance impact. I'd like to see this merged for the Progress Report if possible, so I'm going to be selfish and stuff and try to push my agenda on everyone. Sorry in advance.
Just test for any performance impacts first. :)
In my Melee benchmark it appears to be about 1.5% slower in OGL. Other games showed no performance regression (RS2/3???), and others I'm not 100% sure on due to needing to take a harder look (SMG1/2). The performance regression disappears at 3x IR. D3D had no performance regression in any game. If anything it benchmarked higher in 3 concurrent runs of Melee and RS2, and ended up even in Super Mario Galaxy.
OGL has a much bigger slowdown because of the uncached memory access. This slowdown is only on the GPU thread, so a higher IR, which moves the bottleneck to the GPU itself, will reduce this effect.
@@ -22,7 +22,7 @@ class VertexManager : public ::VertexManager
protected:
virtual void ResetBuffer(u32 stride) override;
u16* GetIndexBuffer() { return &LocalIBuffer[0]; }
This reverts commit 84861b3.
Yeah, you can't get away with that kind of trickery.

Uses for zfreeze

The original intention
Used by: Mario Power Tennis, Super Mario Strikers

zfreeze was designed as a way to eliminate zfighting when rendering decals, instead of other hacks like OpenGL's glPolygonOffset(), but developers never really used it for that. I suspect it's just too expensive, requiring a new drawcall for every set of decals on a different triangle, so developers just manually bias vertices instead.

Going through the list of fifologs which jmc47 collected, there are exactly two games (Mario Tennis and Mario Strikers) which use zfreeze in its intended decal rendering mode. Mario Strikers uses it for rendering the shadows onto the field and Mario Tennis uses it for rendering the tennis court lines. But Mario Tennis uses other zfreeze-based tricks for its shadows (which I'll cover below), so Super Mario Strikers is the only game which can be fixed with that kind of trickery.

Depth override
Used by: Rogue Squadron 2/3, Mario Golf: Toadstool Tour, Blood Omen 2

Most famously used by Rogue Squadron's skyspheres, which are rendered close to the player; zfreeze is used to override the depth and project it out behind all other objects to the zfar plane. This is essentially the same as putting depth = 1.0 in a fragment shader (which is what my hack did), except that on the GameCube this is done at triangle setup and early z culling still happens.

Factor 5 used this method because putting the skysphere in the distance would take up a huge chunk of the zbuffer range (because Factor 5 used hardware anti-aliasing, they were limited to a 16-bit zbuffer) and rendering the skysphere first with the zbuffer disabled would cause too much overdraw.

I'm not exactly sure why the other games use zfreeze for doing depth overrides, but they both lock different objects to both the zfar and znear planes.
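The depth override described above can be sketched roughly as follows. This is only an illustration under assumptions: `ZSlope`, `SelectDepth` and `DepthTestLessEqual` are hypothetical names, not Dolphin's actual code, and the reference plane is modelled as depth(x, y) = f0 + dfdx*x + dfdy*y in screen coordinates with the result clamped to 0.0..1.0.

```cpp
#include <algorithm>

// Hypothetical reference plane: depth(x, y) = f0 + dfdx * x + dfdy * y,
// with x and y in screen pixels. Not Dolphin's real data layout.
struct ZSlope
{
  float dfdx;
  float dfdy;
  float f0;
};

// Pick the depth used for one pixel. With zfreeze enabled, the depth that
// was interpolated across the triangle is ignored and the frozen reference
// plane is evaluated at the pixel position instead.
float SelectDepth(bool zfreeze, const ZSlope& ref, float interpolated_z,
                  float px, float py)
{
  float z = interpolated_z;
  if (zfreeze)
    z = ref.f0 + ref.dfdx * px + ref.dfdy * py;
  // Clamp to the normalized depth range (assumed to be 0.0 .. 1.0 here).
  return std::min(std::max(z, 0.0f), 1.0f);
}

// Usual "less than or equal" depth test: smaller values are nearer.
bool DepthTestLessEqual(float incoming, float stored)
{
  return incoming <= stored;
}
```

With a reference plane of `{0, 0, 1.0}`, a skysphere triangle modelled close to the camera still ends up at depth 1.0, so it fails the depth test against anything nearer that is already in the zbuffer.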
EA shadows
Used by: Most EA sports games, Mario Power Tennis, Need For Speed: Hot Pursuit 2

Shadows are one of the harder things in 3D graphics. Many methods have been developed for dynamic shadows over the years and they all have various tradeoffs. Selection of a shadowing technique depends a lot on the capabilities and performance of the hardware. Doom 3's famous stencil volume shadows produce the best looking results for sharp shadows, but modern hardware isn't optimised for their excessive stencil operations, so most modern games use shadow maps, which modern hardware is really good at doing (but the resolution is generally limited, resulting in pixelated shadows).

The GameCube doesn't have a stencil buffer, so it can't do stencil volume shadows. It can kind of do shadow mapping (self-shadowing in Rogue Squadron, shadows in Luigi's Mansion) but most games use other methods.

Most games that I've looked into appear to use a hybrid between planar projection shadows and shadow mapping. Taking advantage of the cheap hardware vertex transformations and cheap framebuffer-to-RAM copies, they render a character or object from the perspective of the light into a framebuffer with all black polygons. The resulting black and white shadow mask is copied to a texture which is carefully stretched across the level geometry with alpha blending to create the illusion of a shadow.

But EA sports games use the older method of pure projection shadows, where the shadow object is projected onto the floor in software (which is easy because the floor of sports games is completely flat) and rendered on the floor. This works fine if you want a pure black shadow, but generally you want an alpha blended shadow, which causes issues when polygons overlap. Either you get parts of the shadow which are blended twice, or you get zfighting.

Normally the correct solution is to render the shadow to the stencil buffer and blend each shadow pixel just once. But the GameCube doesn't have a stencil buffer.
Instead these games enable zfreeze, which ensures that each pixel on the screen will always have an identical depth in the zbuffer if rendered to twice. Then they change the depth compare method from the usual less-than-or-equal to less-than, so each pixel of the shadow can only possibly be drawn once. This essentially creates a 1-bit stencil buffer in the depth buffer.

I thought Factor 5's use of zfreeze to preserve their limited zbuffer precision was pretty cool, but this shadow method used by EA is absolutely genius.

Edit: On second thought, stencil volume shadows might actually be possible. The alpha buffer with blend logic operations can also be used to emulate a stencil buffer. It supports xor, which is technically enough to implement stencil volume shadows.
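The "1-bit stencil in the depth buffer" idea can be illustrated with a tiny one-dimensional simulation. It is a sketch, not how the hardware or Dolphin implements it: `Framebuffer`, `MakeFloor` and `DrawShadowSpan` are made-up names, the frozen depth is a constant (a reference plane with zero slope), and that depth is assumed to be slightly nearer than the floor so the first shadow polygon passes the test.

```cpp
#include <cstddef>
#include <vector>

enum class DepthFunc { LessEqual, Less };

// Hypothetical 1-D framebuffer: one row of pixels.
struct Framebuffer
{
  std::vector<float> depth;  // smaller = nearer
  std::vector<float> color;  // brightness, 1.0 = fully lit
};

// A floor already rendered at a constant depth, fully lit.
Framebuffer MakeFloor(std::size_t width, float floor_depth)
{
  return Framebuffer{std::vector<float>(width, floor_depth),
                     std::vector<float>(width, 1.0f)};
}

// Draw one alpha-blended shadow polygon covering pixels [first, last).
// frozen_depth comes from the zfreeze reference plane, so every shadow
// polygon produces exactly the same depth at a given pixel.
void DrawShadowSpan(Framebuffer& fb, std::size_t first, std::size_t last,
                    float frozen_depth, DepthFunc func, float alpha)
{
  for (std::size_t i = first; i < last && i < fb.depth.size(); ++i)
  {
    const bool pass = (func == DepthFunc::Less) ? frozen_depth < fb.depth[i]
                                                : frozen_depth <= fb.depth[i];
    if (!pass)
      continue;
    fb.color[i] *= (1.0f - alpha);  // darken the floor
    fb.depth[i] = frozen_depth;     // depth write marks the pixel as shadowed
  }
}
```

Drawing two overlapping spans with `DepthFunc::Less` darkens the overlap once, because the second polygon's depth equals the stored depth and fails the test; with `DepthFunc::LessEqual` the overlap is blended twice.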
Bad news. This PR only works on some GPUs. On my AMD 5770, the shadows in NHL 2003 don't render at all (I assume they are rendering just below the ground)
Aside from possible conceptual issues (as pointed out by @phire), the code in this branch looks good to me. However, I stand by my original assertion that I would rather have some solid ground work (hwtests, software renderer implementation) done before prematurely merging this implementation (which has nontrivial effect on VideoCommon's code architecture and hence might make restructuring the affected code harder in the future).
@phire Thanks for the summary of the rendering usages. So this implementation will only work with the middle usage; the other ones will still have z-fighting. IMO it's not possible to fix the z-fighting this way at all, not even with hacks. So don't expect fixes for any game that doesn't use zfreeze like the second method.
@neobrain I don't think so. This implementation is well separated from everything else, so removing or rewriting it won't be harder if we merge it now. But of course the psychological strain will be lower if it's already working in some games... But I see, the correct way to implement zfreeze will likely not share any code with this one. So it's just about whether we want this half-broken implementation for now...
@NanoByte011 Do you want to fix the remaining whitespace issues?
Not to undermine actual developers, but I'm totally okay with a half broken implementation that works sometimes vs no implementation that never works.
@JMC47 Are you sure NHL 2003 has working shadows and you aren't just confusing the reflections as shadows?
Ah ha, I see the issue. Apparently NHL 2003 is extremely resilient to depth planes at the wrong depth. In OpenGL I can force the reference plane to depth = znear and everything will render correctly (the shadows will leak over the edge of the court, but I couldn't get any of the characters over there). The game must do a depth clear after rendering the shadows. Apparently it's even more resilient in DirectX (and the software renderer), where it accepts a depth plane below the court (which then appears behind the reflections). There must be a bug in OpenGL causing depth to be written somewhere above where the reference plane is.
@NanoByte011 Anyway, there is a bug in CalculateZSlope. It's almost the same bug as in the software renderer (I'm assuming @neobrain tested this mostly on games using method two, where the reference planes are parallel to the screen, i.e. at constant depth): it calculates the zslope relative to the triangle instead of relative to the screen. slope_dfdx and slope_dfdy end up correct, but slope_f0 ends up very wrong, unless the reference triangle is parallel to the screen, in which case slope_dfdx and slope_dfdy are zero and slope_f0 is correct. The pixel shader uses screen-relative coordinates to look up the zplane, so your zslope needs to be generated in screen-relative coordinates too.
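A screen-relative slope calculation could look roughly like the sketch below. It is not the PR's CalculateZSlope: the types and function names are hypothetical, and the vertices are assumed to be already transformed to screen space (x and y in pixels, z as depth). The key step is the last one, which anchors f0 at the screen origin rather than at the triangle's first vertex.

```cpp
#include <cmath>

// Hypothetical types for illustration; not Dolphin's actual declarations.
struct ScreenVertex
{
  float x, y, z;  // x, y in screen pixels, z = depth
};

struct ZSlope
{
  float dfdx, dfdy, f0;  // depth(x, y) = f0 + dfdx * x + dfdy * y
};

// Fit a depth plane through three screen-space vertices.
// Returns false for a degenerate (zero-area) triangle.
bool CalculateScreenSpaceZSlope(const ScreenVertex& a, const ScreenVertex& b,
                                const ScreenVertex& c, ZSlope* out)
{
  const float dx1 = b.x - a.x, dy1 = b.y - a.y, dz1 = b.z - a.z;
  const float dx2 = c.x - a.x, dy2 = c.y - a.y, dz2 = c.z - a.z;

  // Solve the 2x2 system for the two slopes using Cramer's rule.
  const float det = dx1 * dy2 - dx2 * dy1;
  if (std::fabs(det) < 1e-12f)
    return false;

  out->dfdx = (dz1 * dy2 - dz2 * dy1) / det;
  out->dfdy = (dx1 * dz2 - dx2 * dz1) / det;

  // Anchor the plane at the screen origin (0, 0). Storing a.z here instead
  // would make f0 triangle-relative, which is only right when both slopes
  // are zero.
  out->f0 = a.z - out->dfdx * a.x - out->dfdy * a.y;
  return true;
}

// Look up the frozen depth at a screen position, as a pixel shader would.
float EvaluateDepth(const ZSlope& s, float x, float y)
{
  return s.f0 + s.dfdx * x + s.dfdy * y;
}
```

Evaluating the resulting plane at any of the three input vertices returns that vertex's own depth, which is a quick sanity check that the slope and the lookup use the same coordinate space.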
See #1780 for my attempt at fixing the same bug in the software renderer. The software renderer was using triangle-relative coordinates to generate the slope and then triangle-relative coordinates to retrieve the depth. But the triangle coordinate space changes for each triangle, giving you the wrong result when zfreeze is enabled.
@phire Is that not what we are doing with TransformToClipSpace, or are you referring to something else?
TransformToClipSpace gets you close. But you need to go the rest of the way to screen space.
Yes, which is finished off in PixelShaderGen for the final depth value ;)
I spent the day debugging this and still haven't worked out all the issues, but here are my notes. Known issues:
Mario Golf appears to be the only game which doesn't use cullall mode for its reference planes. So it's really surprising that this PR works at all. The clamping of 0.0 to 1.0 appears to cancel out a number of other bugs.
I've made some improvements to this branch in PR #1812
So PR #1812 is now a pretty much complete zfreeze implementation.
Closing because of phire's branch. Thanks for porting this over by the way :)