[GSdx HW Bug] Crash of the Titans/ Tag Team Racing #1920

Closed
FahrentH8t opened this issue Apr 27, 2017 · 34 comments

@FahrentH8t

commented Apr 27, 2017

PCSX2 version:
Stable build [v1.4.0] and v1.5.0-dev-2016-g201c9cd

PCSX2 options:
Defaults, no settings modifications affect the issue

Plugins used:
Bound GS: GSdx32-SSE2.dll [GSdx 20170423160019 (MSVC 19.00 SSE2/SSSE3) 1.1.0]
Bound PAD: LilyPad.dll [LilyPad (20170423160019) 0.12.1]
Bound SPU2: Spu2-X.dll [SPU2-X 20170423160019 2.0.0]
Bound CDVD: cdvdGigaherz.dll [cdvdGigaherz 20170423160019 0.11.0]
Bound USB: USBnull.dll [USBnull Driver 20170423160019 0.7.0]
Bound FW: FWnull.dll [FWnull Driver 20170423160019 0.7.0]
Bound DEV9: DEV9null.dll [DEV9null Driver 20170423160019 0.5.0]

-Plugin settings: Tested the GSdx DX9/11 hardware and software renderers. The issue does not occur with the DX9/11 software renderer of GSdx.

Description of the issue:
Every character, in-game and in cutscenes, has some kind of shadow glitch around the body.
DX9/11 [HW]
gsdx_20170427213709
gsdx_20170427213416
DX9/11 [SW]
gsdx_20170427213957

Dump: Here

How to reproduce the issue:
Load a character in-game or in a cutscene.

Last known version to work:
In v1.4.0, OGL [HW] is fine,
but DX9/11/OGL [HW] in v1.5.0-dev-2016-g201c9cd all have the problem.

PC specifications:
CPU: Pentium(R) Dual-Core E5700, GPU: AMD Radeon HD 5570, OS: Windows 7 64-bit

@FlatOutPS2

Member

commented Apr 27, 2017

Set the renderer to OpenGL (Hardware)

@FahrentH8t

Author

commented Apr 27, 2017

@FlatOutPS2 the issue does not occur in v1.4.0 with OGL [HW],
but v1.5.0-dev-2016-g201c9cd has the problem in OGL [HW].

@gregory38

Contributor

commented Apr 27, 2017

I'm surprised that 1.4 is working. Do you have shadows or nothing?

@gregory38

Contributor

commented Apr 28, 2017

I need to check the dump, but it might be missing a CRC hack, like the Jak/R&C series.

@FahrentH8t

Author

commented Apr 28, 2017

I'm surprised that 1.4 is working. Do you have shadows or nothing?

Nah, everything is default, and the issue also happens in Crash Tag Team Racing.
DX9/11/OGL [HW]
gsdx_20170428215401
DX9/11/OGL [SW]
gsdx_20170428214034
The issue does not occur with OGL [HW] in v1.4.0, like Crash of the Titans.

@FlatOutPS2

Member

commented Apr 28, 2017

and issue also happen in Crash Tag Team Racing

That's what I thought, and at least CTTR should not have this issue with OGL HW in recent v1.5.0 git builds. Can you post the log right after playing the game? Go to Config > Plugin/BIOS Selector > Folders tab, click "Open in Explorer" at Logs/Dumps and upload the emuLog.txt here.

@FahrentH8t

Author

commented Apr 28, 2017

@gregory38

Contributor

commented Apr 30, 2017

I don't see anything on the first post dump. I barely see any shadows actually. It would be better to have a dump with big shadows.

@FlatOutPS2

Member

commented Apr 30, 2017

I don't see anything on the first post dump. I barely see any shadows actually. It would be better to have a dump with big shadows.

Yes, I just checked the dump and got the same results. It seems to show the chapter title and some characters far away in the background. There's nothing in the log indicating that something else is wrong, so a good GS dump will be necessary to find out what's wrong.

@lightningterror

Member

commented Dec 1, 2017

@FahrentH8t Any update on this ? A new gs dump or block dump with a savestate would be nice.

@lightningterror

Member

commented Jan 22, 2018

OpenGL HW seems to be fine, or my mind is playing tricks on me. I think I saw it happen once but wasn't able to reproduce it, or maybe I had D3D set, I dunno.

Crash Of The Titans.zip

@FlatOutPS2

Member

commented Jan 22, 2018

I couldn't reproduce it in Crash Tag Team Racing back then either. I guess something was wrong on the side of the user (disable depth emulation hack enabled?).

@lightningterror

Member

commented Jan 22, 2018

Nah, same goes for blend. Maybe I had D3D selected, I can't remember xD

@lightningterror

Member

commented Jan 23, 2018

I narrowed down the fix. It was introduced in #688 so I guess this information will help when porting the code to D3D someday 😛

Note: I'm not able to get the bug to appear on ogl even when disabling blending and depth.

@lightningterror lightningterror changed the title [GSdx HW Bug] Crash of the Titans [GSdx HW Bug] Crash of the Titans/ Tag Team Racing May 1, 2019

@hibye8313

Contributor

commented May 22, 2019

So I did some investigating. This problem is caused by incorrect formatting when drawing to the PSMCT16/PSMCT16S format. The game does some offscreen drawing in a 16-bit framebuffer to create Crash's shadow (in Tag Team Racing). This is a standard shadow algorithm where the framebuffer is used as a counter to decide which areas are in shadow and which are not. Basically it sets COLCLAMP=0 so that colors are wrapped and not clamped (i.e. instead of a blended color of 0x123 becoming 0xff it becomes 0x23) and sets the blending mode to a plain additive blend (Cd + Cs). Then it draws the shadow volume with color 0xf8f8f8 (basically -8) for back-facing polys to subtract and 0x080808 for front-facing polys to add. (I think similar methods are used in other games like Castlevania, according to comments in the code.) The problem is that instead of actually using color 0xf8 for subtracting it uses 0xff. This is OK on the actual GS because the format is 16 bits, so the lower 3 bits of each RGB channel are thrown away after the blend and it becomes the same as adding 0xf8. The problem is that the render targets we use are always 32 bits even when the game is using 16 bits, so those extra 3 garbage bits start accumulating errors after multiple overlapping draws.
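
A small C++ sketch of that arithmetic (illustration only, not GSdx code; the starting value and the number of overlapping draws are made up):

#include <cstdint>
#include <cstdio>

// COLCLAMP=0: the per-channel blend result wraps modulo 256 instead of clamping.
uint8_t wrap_add(uint8_t cd, uint8_t cs) { return uint8_t(cd + cs); }

// Storing to a PSMCT16 framebuffer keeps only the upper 5 bits of each channel.
uint8_t truncate_to_16bit(uint8_t c) { return c & 0xF8; }

int main() {
    uint8_t gs = 0x40, hw = 0x40; // one channel of the shadow counter
    for (int draw = 0; draw < 4; draw++) {
        // Real GS: add 0xff, then the 16-bit store drops the low 3 bits, net effect -8.
        gs = truncate_to_16bit(wrap_add(gs, 0xFF));
        // 32-bit HW render target: the low 3 bits are kept, so each draw only adds -1.
        hw = wrap_add(hw, 0xFF);
    }
    // Prints 0x20 for the GS-style target but 0x3C for the 32-bit target.
    printf("16-bit target: 0x%02X, 32-bit target: 0x%02X\n", (unsigned)gs, (unsigned)hw);
    return 0;
}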

That's the problem, but unfortunately I don't have a good solution for it. First I thought this was easily solved by changing the render target to 16 bits to match the GS. DX11 does have a B5G5R5A1 format where the RGB channels are stored in reverse order (why, IDK). This by itself might not have been so bad, we could have reversed the byte order when needed after drawing, but unfortunately there is a worse problem. After a primitive is drawn, DX11 converts colors between different formats in a completely different way from the GS. The GS just truncates the leftover bits (i.e. 0xff becomes 0xf8, 0x5 becomes 0x0) but DX11 rounds to the nearest value in the new format (i.e. 0x3 becomes 0x0, 0x5 becomes 0x8; I think it does it this way so that full white 0xff can be represented in lower-bit formats, which cannot be done in 16 bits on the GS). So unfortunately there's no way to make this work, unless someone knows how to change the rounding behavior in DX11 (or OpenGL for that matter, where I think the rounding method is unspecified)?
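
A rough sketch of the two conversion behaviors (assuming standard round-to-nearest UNORM quantization on the D3D side, which may not match every driver exactly; the 5-bit result is expanded back to 8 bits only for comparison):

#include <cstdint>
#include <cstdio>
#include <cmath>

// GS-style conversion: simply drop the low 3 bits.
uint8_t gs_convert(uint8_t c)  { return c & 0xF8; }

// D3D-style conversion: quantize to the nearest 5-bit UNORM value.
uint8_t d3d_convert(uint8_t c) {
    int v = (int)std::lround(c / 255.0 * 31.0);
    return (uint8_t)std::lround(v * 255.0 / 31.0); // expand back to 8 bits for display
}

int main() {
    const uint8_t samples[] = {0x03, 0x05, 0xFF};
    for (uint8_t c : samples)
        printf("0x%02X -> GS 0x%02X, D3D 0x%02X\n",
               (unsigned)c, (unsigned)gs_convert(c), (unsigned)d3d_convert(c));
    return 0; // 0x05 truncates to 0x00 on the GS but rounds up to 0x08 on D3D
}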

I think that the only way to do this correctly is to do the 32-to-16-bit conversion in the shader, which also requires SW blending to be done in the shader, since the blending must occur before the format conversion. This requires sampling from the render target and breaking up the vertex list into separate prims, drawing each prim separately (like in the OGL renderer with m_require_full_barrier = true, see GSRendererOGL.cpp). There are probably some exceptions/hacks that can be used in certain situations, like accumulation blend, where it doesn't matter whether we blend before or after the format conversion. In the Tag Team Racing case it is an accumulation blend, so it would be easy to put in a fix that tests for accumulation blend and, if so, does the formatting in the shader (I attached a hack in the shader that works in this situation, just for testing, but it may not work in general, e.g. if a game uses fbmask, see below). Currently DX11 does not have SW blending implemented, so it can't be done fully accurately. Side note: on OGL I think the behavior is correct by accident, because SW blending is enabled for the color wrapping (COLCLAMP=0) and it is an accumulation blend, although this will not work in all cases with 16-bit colors.

TLDR
I don't think there currently is fully accurate 16-bit color handling in either OGL or DX11. On OGL it should be fairly easy to implement because SW blending and other things are already implemented. On DX11 it will probably be a bit harder, but a lot of it can be copied over from OGL (I don't know for sure, but DX11 will probably also be slower since it doesn't allow using the render target as a texture; it must be copied to a new texture every time it's used). I wanted to get input from some of the devs: is it a good idea to implement SW blending in DX11 and fix the 16-bit accuracy in both OGL/DX11? If anyone thinks it is a good idea, I could start working on SW blending for DX11 to see if it works. I don't know how frequently 16-bit is used in games and in how many cases the accuracy really matters. Any help/suggestions are greatly appreciated.

Side note: I don't think fbmask is handled 100% accurately in OGL (and on DX11 not at all). When the HDR algorithm is used for accumulation blend and fbmask is enabled, fbmask is done in SW, but half the blending is done in SW and half in HW, so fbmask actually happens halfway through the blend, which is incorrect (it should be done after all blending). I think this would be easy to fix by just checking whether fbmask is needed and disabling the HDR algo if so. However, I'm guessing it is very rare for this to actually happen.

The changes are at lines 917-919 in the attached file (tfx.fx):
hack-fix-tag-team-racing-shadows.zip

@gregory38

Contributor

commented May 22, 2019

GL side

The current code is quite messy, but I think we should use SW blending for colclamp when it is already enabled for fbmask.
Accu blend should work too; we should mask the 3 lower bits.
I.e. the shader should output "alpha * color & 0xF8". The addition in the blending unit won't change the lower bits. The only issue is the initial value, which might not be 0 in the LSBs.
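
A tiny sketch of that property (assuming 8-bit per-channel wrap-around addition; the sample values are made up): once both the shader output and the stored value have their low 3 bits cleared, the HW addition can never set them again, so the only risk is a destination whose low bits were not already clean.

#include <cstdint>
#include <cstdio>

// COLCLAMP=0 behavior: the per-channel addition wraps modulo 256.
uint8_t add_wrap(uint8_t a, uint8_t b) { return uint8_t(a + b); }

int main() {
    uint8_t src   = 0x9B & 0xF8; // shader output with the low 3 bits masked: 0x98
    uint8_t clean = 0x42 & 0xF8; // destination already in "16-bit" form: 0x40
    uint8_t dirty = 0x43;        // destination with stale low bits

    // Prints low 3 bits = 0 for the clean destination, 3 for the dirty one.
    printf("clean dst: 0x%02X (low 3 bits = %u)\n",
           (unsigned)add_wrap(clean, src), add_wrap(clean, src) & 7u);
    printf("dirty dst: 0x%02X (low 3 bits = %u)\n",
           (unsigned)add_wrap(dirty, src), add_wrap(dirty, src) & 7u);
    return 0;
}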

@hibye8313

Contributor

commented May 22, 2019

I'm not sure, because I haven't tested it (my hardware doesn't support the OGL renderer), but it looks like the accu blend/HDR algo is used even if fbmask is enabled; see GSRendererOGL::EmulateBlending() lines 530-535:

else if (accumulation_blend) {
    // A fast algo that requires 2 passes
    GL_INS("COLCLIP Fast HDR mode ENABLED");
    m_ps_sel.hdr = 1;
    sw_blending = true; // Enable sw blending for the HDR algo
} else

There is no check for whether fbmask is enabled. Later in the function (lines 563-574) there is:

if (accumulation_blend) {
    // Keep HW blending to do the addition/subtraction
    dev->OMSetBlendState(blend_index, 0, false, true);
    if (ALPHA.A == 2) {
        // The blend unit does a reverse subtraction so it means
        // the shader must output a positive value.
        // Replace 0 - Cs by Cs - 0
        m_ps_sel.blend_a = ALPHA.B;
        m_ps_sel.blend_b = 2;
    }
    // Remove the addition/substraction from the SW blending
    m_ps_sel.blend_d = 2;

Again, no check for fbmask, so I think accu blend/HDR will be enabled even when there is fbmask. Won't this give incorrect results?

I was also worried about the initial value of the LSBs at first, but I think it should be alright, because when the framebuffer is changed from 32 bits to 16 bits its contents should be updated from GS local memory, so the formatting should be correct (in theory...). If all the draws from then on are formatted correctly then there should be no problems.

@gregory38

Contributor

commented May 22, 2019

I think it was something like full_barrier (because of fbmask) sets sw blending. But you're right, it seems to check accu before sw blending.
What is the value of fbmask for the bad draw?

@hibye8313

Contributor

commented May 22, 2019

Unfortunately I'm not able to actually test the OGL renderer because my computer does not support it, so I don't actually know if there are games where it actually fails (how often does a game use fbmask, COLCLAMP=0 wrapping, and accu blend at the same time?). But if it works the way I think it does, it should fail even for simple cases like fbmask = 0x00000002, blend eq = Cs + Cd, accu blend enabled. Then for example take Cs = 0x00000001, Cd = 0x00000001. The current split SW/HW blend will give:

  1. ps_blend() outputs: C = Cs = 0x1
  2. ps_fbmask() outputs: C = C & ~0x2 = 0x1 //incorrect order
  3. Hardware blend outputs: C = C + Cd = 0x1 + 0x1 = 0x2
  4. 0x2 gets written to the frame buffer, incorrectly

The correct order with full SW blend should be:

  1. ps_blend() outputs: C = Cs + Cd = 0x1 + 0x1 = 0x2
  2. ps_fbmask() outputs: C = C & ~0x2 = 0x2 & ~0x2 = 0x0
  3. Hardware does nothing, outputs: C = 0x0
  4. 0x0 written to frame buffer

I think it should be an easy fix in EmulateBlending(): just add a check for fbmask and set accumulation_blend = false if it is enabled. But since I cannot test it I don't want to make any changes to OGL. Also, I think an easy way to fix the 16-bit handling would be to just set fbmask = (fbmask | 0x7f070707) (mask of the lower 3 bits of RGB and the lower 7 bits of A), since it should give essentially the same behavior. The full_barrier etc. will be needed for 16 bits just like for fbmask. I think it can just be added to EmulateTextureShuffleAndFbmask() at line 288, right before the "if (m_ps_sel.fbmask) {":

if ((m_context->FRAME.PSM == PSM_PSMCT16) ||
    (m_context->FRAME.PSM == PSM_PSMCT16S))
{
    m_ps_sel.fbmask = true;
    fbmask_v = fbmask_v | GSVector4i(0x7f070707);
}

and in EmulateBlending() at line 520, just before "if (m_env.COLCLAMP.CLAMP == 0) {", I think this should be added:
if (m_ps_sel.fbmask) accumulation_blend = false;
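
For what it's worth, here is a tiny C++ restatement of the ordering example above (using the same simplified fbmask semantics as in the step-by-step lists, not the real shader code):

#include <cstdint>
#include <cstdio>

int main() {
    uint32_t Cs = 0x1, Cd = 0x1, fbmask = 0x2;

    // Current split path: fbmask in the shader, the Cd addition in the HW blend unit.
    uint32_t split = (Cs & ~fbmask) + Cd;   // 0x2 reaches the framebuffer (wrong)

    // Full SW blend: add first, then apply fbmask.
    uint32_t full  = (Cs + Cd) & ~fbmask;   // 0x0 reaches the framebuffer (expected)

    printf("split path: 0x%X, full SW blend: 0x%X\n", (unsigned)split, (unsigned)full);
    return 0;
}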

@gregory38

Contributor

commented May 22, 2019

I think you're taking the issue the wrong way.

GSdx is full of these "bugs". But it is just too slow to handle them like that.
If I were you, I would port the accu_blend part of GL to DX.
Fbmask means reading the frame buffer. Slow!
Accu blend allows rounding/truncating in the shader without needing to read the framebuffer. Fast :)
I'm not saying games won't use fbmask + colclamp + .... But there is a high probability they won't.

@hibye8313

Contributor

commented May 22, 2019

Well, you're right, of course; the HW renderers are a compromise between speed and accuracy, so it makes sense that we have to take shortcuts sometimes for speed, otherwise we might as well use the SW renderer.

Ok, I will try to start porting the SW blending over to DX11 using OGL as a reference. It shouldn't be too hard since many of the features map 1-to-1 between them, but I am by no means an advanced DX11 coder, so it could take a while. Thanks for the feedback.

@gregory38

Contributor

commented May 22, 2019

I think the important part is accu blend. SW blending might be impossible on DX as you can't read the framebuffer, but accu blend is doable, and it could be enough. Even if it doesn't fix this game, I can tell you it will fix other games.

@hibye8313

Contributor

commented May 23, 2019

Ok, I see what you mean, so doing SW blend on DX11 would require copying the framebuffer for every single prim, which I'm guessing would be very slow (much slower than using glTextureBarrier()?). Ok, I'll work on accu blend then. Also, I had an idea for an algorithm for drawing overlapping prims, let me know what you think:

Set up the stencil buffer so that the operation is increment, so it keeps track of the number of overlapping draws on each pixel. Stage 1: clear the stencil to 0 and draw the prims with stencil op ALWAYS and color/depth update disabled. Read back the stencil buffer on the CPU and, within the bounding box of the prims, find the maximum overlap value n. This is the number of passes to do in the next stage. Stage 2: do a pass for each i = 1 .. n to draw the i'th layer. On the i'th pass, clear the stencil to 0, set the stencil ref value to i-1, make the operation EQUALS and the update method NO_UPDATE. Then draw all the prims like normal. So only the i'th overlapping pixel is drawn on this pass, and since all layers 1 to i-1 have already been drawn the order is correct; also, none of the pixels on this pass overlap since they are all at the same layer.

The problem with this approach is that we draw all the prims (n+1)*(# of prims) times (and also have to read the stencil buffer back on the CPU once) vs. just drawing each prim once. (We could also compute a separate bounding/scissor box for each i'th layer when we read back the stencil buffer, which might be a bit faster.) But the # of barriers/passes is n+1 instead of the # of prims. It also works for any kind of prim, not just sprites. Do you think this would give better or worse perf, or is it completely wrong?

Edit: I didn't think of it before, but the advantage of this method is that it might give much better perf in SW blending for a long list of non-overlapping (or only slightly overlapping) tris (the current implementation only checks for non-overlapping sprites, and always draws tris with a full barrier). Also, after the first pass over the stencil buffer we could check whether the # of overlaps is too large and fall back on the original method with full barriers if so. However, this still has the overhead of drawing to the stencil, reading the stencil back to the CPU, and iterating over all pixels of the stencil (in the bounding box) on every draw. Might be worth it though, IDK.
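
To make the idea concrete, here is a small CPU-only toy model of the two stages (no GPU calls; hypothetical box-shaped prims, nothing from GSdx):

#include <algorithm>
#include <cstdio>
#include <vector>

// Hypothetical prim: its pixel coverage is just an axis-aligned box, for simplicity.
struct Prim { int x0, y0, x1, y1; };

int main() {
    const int W = 8, H = 8;
    std::vector<Prim> prims = {{0, 0, 4, 4}, {2, 2, 6, 6}, {3, 3, 7, 7}};

    // Stage 1 (the stencil pass): count how many prims touch each pixel, take the max.
    int count[H][W] = {};
    for (const Prim& p : prims)
        for (int y = p.y0; y < p.y1; ++y)
            for (int x = p.x0; x < p.x1; ++x)
                ++count[y][x];
    int layers = 0;
    for (auto& row : count)
        layers = std::max(layers, *std::max_element(row, row + W));

    // Stage 2: one pass per layer. A pixel is "drawn" on pass i only when the current
    // prim is the i'th one touching it, so within a pass no pixel is blended twice.
    for (int i = 0; i < layers; ++i) {
        int seen[H][W] = {};
        int drawn = 0;
        for (const Prim& p : prims)
            for (int y = p.y0; y < p.y1; ++y)
                for (int x = p.x0; x < p.x1; ++x)
                    if (seen[y][x]++ == i)
                        ++drawn;
        printf("pass %d blends %d pixels\n", i + 1, drawn);
    }
    return 0;
}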

(Unrelated) random question: the DATE_GL42 test on OGL uses imageLoad/imageStore to keep track of data shared between all invocations of the pixel shader. If perf for this is good, why can't we just emulate the framebuffer as a texture, use imageLoad/imageStore to write to it and do full SW blending, instead of using the HW blend unit and binding an actual framebuffer? (I'm guessing either perf is not so good, or the order of pixel shaders is not guaranteed to match the input poly order.)

@gregory38

Contributor

commented May 23, 2019

I stopped reading at "read stencil with CPU". Obviously you're missing some knowledge about GPUs. A readback to the CPU is the slowest operation. Never do that.

Fragment shaders run out of order, so obviously you can't do blending there. However, recent GPUs give you the possibility to force ordering. But we would need recent GPUs and to recode everything, and the perf impact is unknown. Why do you think GPU makers spend transistors on a fixed blending unit?

Current DX is a dead end.

@gregory38

Contributor

commented May 23, 2019

To conclude: accu blend is a special case that can be done easily. Otherwise the solution is to get an Nvidia card and use OpenGL.

@hibye8313

Contributor

commented May 23, 2019

I see, thanks for the explanation. I don't know much about GPU coding, so I wasn't sure how costly it actually is to read from the stencil buffer. But do you think a similar approach would work on the CPU for checking prim overlap and breaking the draw into non-overlapping groups? I.e. we rasterize tris on the CPU (just to check pixel coverage, not color, texture, etc.) to find whether a prim overlaps the previous group, and if so start a new group, similar to the bounding-box algorithm for sprites but working for all prims at the expense of speed (see the rough sketch below). It might allow detecting when we can skip the full barrier. Do you think it would be worth the effort at all?
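
A rough sketch of the grouping idea (using bounding boxes instead of a full rasterizer, which is coarser but cheap; hypothetical types, not GSdx code):

#include <cstdio>
#include <vector>

struct Box { int x0, y0, x1, y1; };

bool overlaps(const Box& a, const Box& b) {
    return a.x0 < b.x1 && b.x0 < a.x1 && a.y0 < b.y1 && b.y0 < a.y1;
}

// Returns one "barrier group" per entry: consecutive prims whose boxes don't touch
// each other, so a single barrier before each group would be enough.
std::vector<std::vector<Box>> group_prims(const std::vector<Box>& prims) {
    std::vector<std::vector<Box>> groups;
    std::vector<Box> current;
    for (const Box& p : prims) {
        for (const Box& q : current)
            if (overlaps(p, q)) { groups.push_back(current); current.clear(); break; }
        current.push_back(p);
    }
    if (!current.empty()) groups.push_back(current);
    return groups;
}

int main() {
    std::vector<Box> prims = {{0, 0, 2, 2}, {3, 0, 5, 2}, {1, 1, 4, 3}, {6, 6, 8, 8}};
    auto groups = group_prims(prims);
    printf("%zu draws need %zu barriers instead of %zu\n",
           prims.size(), groups.size(), prims.size());
    return 0;
}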

@lightningterror

Member

commented May 24, 2019

Unfortunately I'm not able to actually test the OGL renderer because my computer does not support it.

@hibye8313 maybe you can use Mesa3D for Windows.
https://fdossena.com/?p=mesa/index.frag

No Texture Barrier support but for accu blend testing it could be enough. I'm seeing good results on my end on Castlevania.

Edit: Also, what is the issue with OpenGL not being able to run on your system? A Skylake iGPU should be fully compatible.
Can you add debug_opengl = 1 in your gsdx ini and see what the logs say?

@gregory38

Contributor

commented May 24, 2019

A rasterizer isn't free (but it's better than reading the stencil). My personal feeling is that there are 3 kinds of draws:

  • Post-processing. Based on sprites. Often uses a paving pattern. The current detection could be improved. Those draws are often the bad ones.
  • Random primitives a bit everywhere. Some overlapping.
  • Lots of primitives in the same area (models, shadows, ...). Overlaps like hell.

Eventually more hardware will support ROV/shader interlock (i.e. ordering guarantees in the shader). It needs to be benchmarked, but I'm pretty sure it would be better in various cases.

So we can spend time trying to find a way to improve SW blending on old HW, but there is no such thing as a good solution; they all come with various trade-offs. Or screw that, we ditch it and do a new renderer based on those future features.

(Old HW) SW blending on DX is useless. Better to give GL multi-threading to non-Nvidia users (speed). On GL maybe it can be improved, but my feeling is that it will be barely better.

@hibye8313

Contributor

commented May 24, 2019

@lightningterror Ok, I feel stupid now. I just tested again with OpenGL and everything works fine now. I updated drivers some time back but maybe I forgot to restart the computer or something, IDK. But yeah, no shadow glitches with OpenGL at all.

3.x GL context successfully created
Failed to find glGetnCompressedTexImage
Failed to find glGetnTexImage
Failed to find glGetnUniformdv
OpenGL information. GPU: Intel(R) HD Graphics 520. Vendor: Intel. Driver: - Build 22.20.16.4771
INFO: GL_ARB_sparse_texture is NOT SUPPORTED
INFO: GL_ARB_sparse_texture2 is NOT SUPPORTED
INFO: GL_ARB_gpu_shader5 is available
INFO: GL_ARB_shader_image_load_store is available
INFO: GL_ARB_compute_shader is available
INFO: GL_ARB_shader_storage_buffer_object is available
INFO: GL_ARB_texture_view is available
INFO: GL_ARB_vertex_attrib_binding is available
INFO: GL_ARB_clear_texture is available
INFO: GL_ARB_multi_bind is available
INFO: GL_ARB_direct_state_access is available
INFO: GL_ARB_texture_barrier is available
INFO: GL_ARB_get_texture_sub_image is available

Current Renderer: OpenGL (Hardware renderer)

@gregory38 Ok, I think I see what you are saying. So I guess it would not really be worth the effort to invest much time into the DX11 plugin, since the API is becoming outdated and is really only being used to support older HW (and SW blend would not be used enough to really affect perf all that much). I didn't realize it earlier, but I see now that most of the active development is on the OGL side, which I'm guessing is what most users are using nowadays. I really don't know enough to attempt porting the renderer to DX12 (I'm guessing that would be a big task), so I will just try to work on porting the accu blend over to DX11.

Thanks for answering stupid questions :)

@gregory38

Contributor

commented May 24, 2019

OpenGL is a superset of DX11. Besides, I'm a Linux user. Hence the big improvements on GL.

Recently, Lt managed to port most of the GL goodies to DX. It's missing a few things (such as accu blend), but otherwise the API doesn't have any more juice.

@hibye8313

Contributor

commented May 27, 2019

Ok, I ported over accu blend to DX11, which seems to fix the problem with the shadows in Tag Team Racing, but I haven't tested it thoroughly. I'm new to GitHub so I'm not sure of the process for submitting pull requests (I'm guessing I can't until someone gives me permissions). So I made a fork at https://github.com/hibye8313/pcsx2. It's mostly a copy of the OGL side, and I tried to make a few changes to the structure so that DX11 more closely mirrors OGL. There are some small changes to the code structure on OGL as well, I hope this is ok. Can one of the devs please look over the changes and see if they are acceptable for merging?

@tadanokojin

Member

commented May 27, 2019

Didn't look at the code.
I suggest you create a new branch, push that to your fork and then send us the PR from that branch.

The process is: you submit a pull request -> we review the code.
If it's acceptable, we will merge it.
If changes need to be made, we will ask you to make them.

@hibye8313

Contributor

commented May 27, 2019

Ah, so that's how you do it. Thank you, I made the separate branch and submitted the PR correctly this time, I think.

@lightningterror

Member

commented Jun 13, 2019

Closing, fixed in #2973 on d3d11.
