Vulkan: Project Diva F broken graphics (AMD specific) #2129

Closed
Nezarn opened this issue Sep 7, 2016 · 73 comments

Comments

@Nezarn

@Nezarn Nezarn commented Sep 7, 2016

rpcs3_2016-09-07_13-23-08
rpcs3_2016-09-07_13-23-24

Using AMD RX 480 with latest Crimson Driver. (16.8.3 Hotfix)
OGL\DX12 works fine.

shaders: https://www.dropbox.com/s/zj44rrm6w4de8gd/shaderlog_amd_rx480.zip?dl=0

@raven02
Contributor

@raven02 raven02 commented Sep 7, 2016

Looks like it is AMD specific. Nvidia looks fine using the Vulkan backend.

untitled

@raven02
Contributor

@raven02 raven02 commented Sep 7, 2016

Could it be an issue with the latest driver?

@raven02 raven02 changed the title Vulkan: Project Diva F broken graphics Vulkan: Project Diva F broken graphics (AMD specific) Sep 7, 2016
@Nezarn
Author

@Nezarn Nezarn commented Sep 7, 2016

@raven02 same with 16.7.3

@raven02
Contributor

@raven02 raven02 commented Sep 7, 2016

I see. Which commit was the last one that worked for you?

@Nezarn
Author

@Nezarn Nezarn commented Sep 7, 2016

idk, I'm still testing; so far I haven't found any working commit (and I just got this card, so it will take a while to find one xD)

edit: I've tested as far back as the 0.0.0.9 merge commit~~, so I guess this is some kind of driver issue~~
edit2: it looks like it only affects Project Diva F and F 2nd; I just tried Ar Tonelico Qoga and it works fine

@RainKikyou

@RainKikyou RainKikyou commented Sep 7, 2016

@raven02 http://tieba.baidu.com/p/4572281750
Take a look at this post. I ran into this problem on an AMD GPU a few months ago, while others who were not on AMD GPUs did not. Is it related to this GPU?
a2f6452309f790529a38c2e50bf3d7ca7acbd59c
888361d9f2d3572cebd60b078d13632763d0c38c

@Nezarn
Author

@Nezarn Nezarn commented Sep 9, 2016

@raven02 after testing a lot (starting from when Vulkan was added to rpcs3), finally found what breaks it. #1630

@raven02
Contributor

@raven02 raven02 commented Sep 9, 2016

@kd-11, did you get the same corruption on your AMD system?

@kd-11
Contributor

@kd-11 kd-11 commented Sep 10, 2016

This game has always worked okay for me. I can't test current drivers/master for the next few days, but it could be a new driver bug.

@kd-11
Contributor

@kd-11 kd-11 commented Sep 10, 2016

@Nezarn Could be an RX 480 issue. I've submitted several Vulkan bugfixes for Project Diva F (including the invisible model fix), always using a 270X, and it was fine. I guess the 480 driver is quite different from the one for older GCNs. If someone can confirm that it behaves differently on older GPUs, it could be worth reporting to AMD.

@Nezarn
Author

@Nezarn Nezarn commented Sep 10, 2016

@kd-11 so in short, it's broken on GCN newer than GCN2. (The R9 290 is GCN2 and works fine, as my friend tested; the R9 380, which is GCN3, is broken (as you can see above); and my card is GCN4.)
my friend's video + GPU-Z:
https://www.youtube.com/watch?v=wVdmVJINn-M
http://i.imgur.com/v9uilQX.png

@RainKikyou

@RainKikyou RainKikyou commented Sep 11, 2016

Is that really so? In the post I linked above, the card on which it looks fine is an R9 200 series (260X).

@kd-11
Contributor

@kd-11 kd-11 commented Sep 11, 2016

@Nezarn Yes. It seems to be a change in how textures are handled on newer GCN. GCN 3 and newer have the new delta color compression that started with Tonga (R9 285). It might be a driver bug, or we might be violating the spec in a subtle manner. Is there any output if you enable debug reporting? If not, then it is probably a driver bug.

@Nezarn
Author

@Nezarn Nezarn commented Sep 11, 2016

@kd-11 what do I need to make debug output work? I'm getting this (or is this expected if it's a driver bug?)

F LDR: class std::runtime_error thrown: Assertion failed! Result is FFFFFFFAh

@kd-11
Contributor

@kd-11 kd-11 commented Sep 11, 2016

You have to install the latest Vulkan SDK. Download it from the LunarG site (you don't need to sign up; the link is at the bottom of the page).
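
For what it's worth, the `FFFFFFFAh` in the assertion above can be decoded by hand: read as a signed 32-bit value it is -6, which the Vulkan headers define as `VK_ERROR_LAYER_NOT_PRESENT`. That would fit the advice to install the SDK, since the validation layers ship with it, although the thread does not confirm which call returned the value. A minimal sketch of the decoding follows; `decode_vk_result` and `vk_result_name` are hypothetical helper names, and the table covers only the first nine core error codes.

```cpp
#include <cstdint>
#include <string>

// Reinterpret the 32-bit value printed in the log as a signed result code.
// 64-bit arithmetic keeps the conversion well defined on every compiler.
inline std::int32_t decode_vk_result(std::uint32_t raw) {
    const std::int64_t wide = static_cast<std::int64_t>(raw);
    return static_cast<std::int32_t>(raw > 0x7FFFFFFFu ? wide - 0x100000000LL : wide);
}

// Names for a subset of VkResult values (Vulkan 1.0 core).
inline std::string vk_result_name(std::int32_t code) {
    switch (code) {
        case 0:  return "VK_SUCCESS";
        case -1: return "VK_ERROR_OUT_OF_HOST_MEMORY";
        case -2: return "VK_ERROR_OUT_OF_DEVICE_MEMORY";
        case -3: return "VK_ERROR_INITIALIZATION_FAILED";
        case -4: return "VK_ERROR_DEVICE_LOST";
        case -5: return "VK_ERROR_MEMORY_MAP_FAILED";
        case -6: return "VK_ERROR_LAYER_NOT_PRESENT";
        case -7: return "VK_ERROR_EXTENSION_NOT_PRESENT";
        case -8: return "VK_ERROR_FEATURE_NOT_PRESENT";
        case -9: return "VK_ERROR_INCOMPATIBLE_DRIVER";
        default: return "unknown";
    }
}
```

Calling `vk_result_name(decode_vk_result(0xFFFFFFFAu))` yields the layer-not-present name, which is what one would expect when debug reporting is enabled without the SDK's layers installed.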

@Nezarn
Author

@Nezarn Nezarn commented Sep 11, 2016

@kd-11

E {rsx::thread} RSX: ERROR: [SC] Code 15 : Shader requires VkPhysicalDeviceFeatures::shaderClipDistance but is not enabled on the device

and from the log

·W {rsx::thread} RSX: WARNING: [SC] Code 2 : FS writes to output location 1 with no matching attachment
·W {rsx::thread} RSX: WARNING: [SC] Code 2 : FS writes to output location 2 with no matching attachment
·W {rsx::thread} RSX: WARNING: [SC] Code 2 : FS writes to output location 3 with no matching attachment

full log+shaders: https://www.dropbox.com/s/nz7vc0b8m167oeg/rpcs3_amdbug.zip?dl=0

edit: running vkinfo shows that shaderClipDistance = 1 so it should be enabled, right?
http://i.imgur.com/O3a2XUb.png also after some quick google, found this KhronosGroup/Vulkan-LoaderAndValidationLayers#298
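
The validation message and the `shaderClipDistance = 1` reading are not contradictory: the tool output reports what the physical device supports, while the error is about what was enabled when the logical device was created (in Vulkan, through `VkDeviceCreateInfo::pEnabledFeatures`). A minimal sketch of that distinction is below. It uses a hypothetical stand-in struct `Features` instead of the real `VkPhysicalDeviceFeatures` so that it builds without the SDK, and `select_enabled` is an illustrative helper, not RPCS3 code.

```cpp
// Stand-in for a few of the VkPhysicalDeviceFeatures fields (assumption:
// the real struct has many more members, all VkBool32).
struct Features {
    bool shaderClipDistance = false;
    bool shaderCullDistance = false;
    bool samplerAnisotropy  = false;
};

// A feature ends up enabled only if the application asks for it AND the
// physical device reports support for it.
inline Features select_enabled(const Features& supported, const Features& wanted) {
    Features enabled;
    enabled.shaderClipDistance = supported.shaderClipDistance && wanted.shaderClipDistance;
    enabled.shaderCullDistance = supported.shaderCullDistance && wanted.shaderCullDistance;
    enabled.samplerAnisotropy  = supported.samplerAnisotropy  && wanted.samplerAnisotropy;
    return enabled;
}
```

If the application never sets `wanted.shaderClipDistance`, the feature stays disabled even on a device that supports it, which is the situation the validation layer is describing.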

@kd-11
Contributor

@kd-11 kd-11 commented Sep 11, 2016

That's a bug in the game itself or in the renderpass selection code. I'll look into it, although I have a feeling that it might not be the cause of the problems we are having here. We should still disable writing to non-existent attachments though.
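
The "no matching attachment" warnings correspond to fragment shader outputs whose location has no color attachment in the current subpass. A minimal sketch of that check, assuming the shader's output locations and the color attachment count are already known; `unmatched_outputs` is a hypothetical helper, and it ignores the case where an attachment slot exists but is marked unused.

```cpp
#include <cstdint>
#include <vector>

// Return the fragment output locations that have no color attachment to
// write into, i.e. the locations a validation layer would warn about.
inline std::vector<std::uint32_t> unmatched_outputs(
        const std::vector<std::uint32_t>& shader_output_locations,
        std::uint32_t color_attachment_count) {
    std::vector<std::uint32_t> unmatched;
    for (std::uint32_t location : shader_output_locations) {
        if (location >= color_attachment_count) {
            unmatched.push_back(location);
        }
    }
    return unmatched;
}
```

With outputs at locations 0 to 3 and a single color attachment, the helper reports locations 1, 2 and 3, the same three locations named in the log. A backend could use such a list to mask off or strip those writes.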

@Nezarn
Author

@Nezarn Nezarn commented Sep 12, 2016

@kd-11 idk if you have read the log, but yesterday I forgot to mention 2 additional errors:

·E RSX: ERROR: [Swapchain] Code 31 : vkGetPhysicalDeviceSurfaceSupportKHR() called before calling the vkGetPhysicalDeviceQueueFamilyProperties function.
·E RSX: ERROR: [ParameterValidation] Code 5 : vkCreateDevice: parameter pCreateInfo->flags must be 0
·E RSX: ERROR: [ParameterValidation] Code 5 : vkCreateDevice: parameter pCreateInfo->pQueueCreateInfos[0].flags must be 0

@kd-11
Contributor

@kd-11 kd-11 commented Sep 12, 2016

Well, those shouldn't affect the output like this. If a field can only be one value, it is reserved for future use. I'll add that to the other fix and submit soon.
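
One plausible cause of the two `ParameterValidation` errors (not confirmed in this thread) is a create-info struct that was only partly initialized, leaving garbage in the reserved `flags` members. A minimal sketch of the usual remedy, value-initializing the struct before filling it in, is below. `DeviceQueueCreateInfo` and `DeviceCreateInfo` are hypothetical, trimmed-down stand-ins for the Vulkan structs so the example builds without the SDK.

```cpp
#include <cstdint>

// Trimmed-down stand-ins for VkDeviceQueueCreateInfo / VkDeviceCreateInfo.
struct DeviceQueueCreateInfo {
    std::uint32_t flags;              // reserved, must be 0
    std::uint32_t queueFamilyIndex;
    std::uint32_t queueCount;
};

struct DeviceCreateInfo {
    std::uint32_t flags;              // reserved, must be 0
    std::uint32_t queueCreateInfoCount;
    const DeviceQueueCreateInfo* pQueueCreateInfos;
};

inline DeviceQueueCreateInfo make_queue_info(std::uint32_t family) {
    DeviceQueueCreateInfo info{};     // value-initialization zeroes every member
    info.queueFamilyIndex = family;
    info.queueCount = 1;
    return info;
}

inline DeviceCreateInfo make_device_info(const DeviceQueueCreateInfo* queues,
                                         std::uint32_t count) {
    DeviceCreateInfo info{};          // flags stays 0 unless set explicitly
    info.queueCreateInfoCount = count;
    info.pQueueCreateInfos = queues;
    return info;
}
```

The `{}` initializer is the important part: any member that is not assigned afterwards, including the reserved flags, is guaranteed to be zero.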

@Nezarn
Author

@Nezarn Nezarn commented Sep 15, 2016

@kd-11 aside from the Project Diva games, I have this same graphical issue in 1942: Joint Strike too.
rpcs3_2016-09-15_17-25-13

And this game only shows the errors from my last comment (error codes 31 and 5). Also, only in-game is broken like that; the menu looks fine.

What else can I do to help track down this issue? Or should we just ask AMD what kind of weird stuff they are doing on GCN3 and 4 in Vulkan?

@kd-11
Contributor

@kd-11 kd-11 commented Sep 15, 2016

I'll update the backend to strictly comply with the spec before we get to filing bugs with AMD. Although, the swapchain issue not being present on other GPUs means a different path is being used with newer GCN.

@mirh

@mirh mirh commented Sep 15, 2016

CodeXL should be able to analyze frames.

@kd-11
Contributor

@kd-11 kd-11 commented Sep 15, 2016

Tracing driver inconsistency doesn't work. The framedump will show up correctly on my PC and wrong on another one.
The issue here, I suspect, is a bad (non-linear?) image format. I should have some free time in a few days, and I'll look into this as well as finish up the vertex texture stuff.

@mirh

@mirh mirh commented Sep 15, 2016

Tracing driver inconsistency doesn't work. The framedump will show up correctly on my PC and wrong on another one.

It's not like there aren't the right tools.

@kd-11
Contributor

@kd-11 kd-11 commented Sep 15, 2016

You misunderstand what I meant in my comment. I do have both CodeXL and PerfStudio installed; I'm just pointing out that there are cases where they aren't optimal.

@mirh

@mirh mirh commented Sep 15, 2016

Sorry, but can't you see how shaders are going to run on different architecture cards in Analyze mode?

@kd-11
Contributor

@kd-11 kd-11 commented Sep 15, 2016

It's tricky since it assumes drivers behave in a consistent manner. I haven't tried newer builds (the last version I used was 1.9.x), but that doesn't usually work as expected. The problem with common frame analysis tools is that they use your GPU+driver to replay the dump; if the driver or GPU is buggy, the results are affected. It could catch an application-side bug though. The only way to debug something like this across different GPU archs is to emulate the affected GPU+driver using software rendering, which AMD will not do, I'm almost certain.

@Nezarn
Author

@Nezarn Nezarn commented Sep 21, 2016

@kd-11 can't you check #1630 and guess what's breaking on newer AMD? Vulkan worked fine before that PR.

@kd-11
Contributor

@kd-11 kd-11 commented Sep 21, 2016

@Nezarn Thanks! I missed your earlier comment on the regression. In that case, I think I know what might be causing this.
I'll look into a few possible scenarios, failing which I'll set up a testing branch on my fork and update with further instructions. Sync issues are a real pain to debug.

@Nezarn
Author

@Nezarn Nezarn commented Sep 21, 2016

@kd-11 okay, just comment here when you have something that needs testing 👍

@mirh

@mirh mirh commented Sep 26, 2016

https://community.amd.com/thread/205798
Could this be of any relevance?

@kd-11
Contributor

@kd-11 kd-11 commented Sep 26, 2016

Probably; not sure at the moment. It does seem related though, since we are touching multiple renderpasses with poor synchronization. I have noticed minor flickering and macro-blocking artefacts on my AMD GPU (GCN 1) with some games, although rarely, so this might not be limited to Polaris.

@Nezarn
Author

@Nezarn Nezarn commented Sep 26, 2016

@kd-11 why not just ask on the AMD forum for advice? BTW it's also strange that not every game has this issue, and so far this issue doesn't happen in any of the GCM samples I've tried.

edit: also a question: does 1942: Joint Strike, for example, have any kind of AA effect? (Project Diva F has something, since it's so smooth in Vulkan, whereas Ar Tonelico Qoga doesn't have any smoothing "AA" effect, and that game works flawlessly with Vulkan on my card.)

edit2: also, this is what I get if I switch rpcs3 to fullscreen and back to windowed:
https://gist.github.com/Nezarn/e7086fa569008de9463ea3e5457f56ef

And after pressing stop:
https://gist.github.com/Nezarn/180cb80af038808221b4edbd0d3575e0

edit3: here are some new RenderDoc captures, one from the rpcs3 version where it worked and one from the current version (they are from the Project Diva F demo, since the full game didn't work on old rpcs3): https://www.dropbox.com/s/ujvqqm8jn2tkxfa/oldvscurrent.zip?dl=0

@Nezarn
Author

@Nezarn Nezarn commented Sep 28, 2016

@kd-11 it looks like this guy's code semi fixes the issue for rpcs3 too. https://community.amd.com/thread/205798

VkSubpassDependency dependencies[2];
// (in addition)
dependencies[0].srcSubpass = VK_SUBPASS_EXTERNAL;
dependencies[0].srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT|VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;
dependencies[0].srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
dependencies[0].dstSubpass = 0;
dependencies[0].dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT|VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;
dependencies[0].dstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
dependencies[0].dependencyFlags = VK_DEPENDENCY_BY_REGION_BIT;
// (this was already part of the code)
dependencies[1].srcSubpass = 0;
dependencies[1].srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT|VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT;
dependencies[1].srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT;
dependencies[1].dstSubpass = 1;
dependencies[1].dstAccessMask = VK_ACCESS_INPUT_ATTACHMENT_READ_BIT;
dependencies[1].dstStageMask = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT;
dependencies[1].dependencyFlags = VK_DEPENDENCY_BY_REGION_BIT;

and put these two to where they should go:

renderpassinfo.dependencyCount = VK_ARRAY_SIZE(dependencies);
renderpassinfo.pDependencies = dependencies;

#define VK_ARRAY_SIZE(x) (sizeof(x) / sizeof(x[0]))

(if I use the code 1:1 (with some minor corrections), games don't have the flashing colors, just minor graphical issues; for example, in 1942: Joint Strike your basic bullets are invisible and the pickups and enemy bullets are broken. Project Diva F has some minor broken stuff too, but at least now they won't give you a seizure)

rpcs3_2016-09-28_10-24-53

@kd-11
Contributor

@kd-11 kd-11 commented Sep 28, 2016

Thanks. That explains a lot actually. I was right, we were missing synchronization between draw calls. Once the backend was optimized to collect all commands together and submit them, a problem was born: several commands can execute in parallel, especially on Radeon with its asynchronous hardware schedulers, without considering that the output of one call is the input of another.
I'll review the suggested code and we can see about having a proper solution. By the way, in this case, I think both srcSubpass and dstSubpass should be equal to VK_SUBPASS_EXTERNAL and not 0 or 1. Please check that it works correctly.

@Nezarn
Author

@Nezarn Nezarn commented Sep 28, 2016

@kd-11 if i change them to VK_SUBPASS_EXTERNAL then the issue is back

@kd-11
Contributor

@kd-11 kd-11 commented Sep 28, 2016

That's odd. Unfortunately we re-use our subpasses very heavily and in random order since they are precomputed. According to the spec, we cannot set them both to VK_SUBPASS_EXTERNAL, but for our case we only need to set srcSubpass to VK_SUBPASS_EXTERNAL. Set dstSubpass to 0 for both then and hope for the best, since by the time subpass X is executed, 0 is guaranteed to have also been executed.
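
The spec constraints mentioned here can be written down as a small check. The sketch below covers only a subset of the valid-usage rules for subpass dependencies (both ends must not be external, non-external indices must refer to an existing subpass, and the source must not come after the destination). `SubpassDependency` is a hypothetical stand-in for the two relevant fields of `VkSubpassDependency`, and `kSubpassExternal` mirrors the `~0U` value that the Vulkan headers give `VK_SUBPASS_EXTERNAL`.

```cpp
#include <cstdint>

// Same value the Vulkan headers define for VK_SUBPASS_EXTERNAL.
constexpr std::uint32_t kSubpassExternal = ~0u;

// Stand-in holding only the two fields this check needs.
struct SubpassDependency {
    std::uint32_t srcSubpass;
    std::uint32_t dstSubpass;
};

// Check a dependency against a render pass with `subpass_count` subpasses.
inline bool dependency_is_valid(const SubpassDependency& dep,
                                std::uint32_t subpass_count) {
    const bool src_external = dep.srcSubpass == kSubpassExternal;
    const bool dst_external = dep.dstSubpass == kSubpassExternal;
    if (src_external && dst_external) return false;   // both external is not allowed
    if (!src_external && dep.srcSubpass >= subpass_count) return false;
    if (!dst_external && dep.dstSubpass >= subpass_count) return false;
    if (!src_external && !dst_external && dep.srcSubpass > dep.dstSubpass) {
        return false;                                 // would form a cycle
    }
    return true;
}
```

Under the assumption that the render pass has a single subpass, an external-to-0 dependency passes this check, while the 0-to-1 dependency from the forum snippet does not, because subpass 1 does not exist. That would be consistent with dropping the second dependency when no input attachments are used.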

@kd-11
Contributor

@kd-11 kd-11 commented Sep 28, 2016

Let me open a pull request and we can move the discussion there.

@kd-11
Contributor

@kd-11 kd-11 commented Sep 28, 2016

Unfortunately, that code breaks some demos for me, but it's a starting point. I'll add memory barriers before sampling RTT textures and we can see if that makes a difference.
EDIT: Problem is elsewhere.

@kd-11
Contributor

@kd-11 kd-11 commented Sep 28, 2016

@Nezarn Since we aren't using input attachments, does changing the count to 1 (ignoring the second dependency) still work?
i.e. renderpassinfo.dependencyCount = 1;

@Nezarn
Author

@Nezarn Nezarn commented Sep 28, 2016

@kd-11 looks like this issue can still happen in some games (not as bad as before)

rpcs3_2016-09-28_21-39-30

@Nezarn
Author

@Nezarn Nezarn commented Oct 4, 2016

@kd-11 there's another issue on newer AMD: Project Diva F crashes the driver or BSODs in certain songs, always at the same place (100% reproduction rate so far on my card). log+shaders: https://www.dropbox.com/s/7u0l9f4y52s6wvt/rpcs3.zip?dl=0

@mirh

@mirh mirh commented Oct 4, 2016

( ͡° ͜ʖ ͡°)
Does rpcs3 support some way of graphics command queue output?

@kd-11
Contributor

@kd-11 kd-11 commented Oct 4, 2016

Log and shaders won't really help debugging this one, I'm afraid. If it actually BSODs for real, you need to notify AMD about it. That should not happen on modern Windows afaik. The expected result is a driver crash, and even that is usually hell to debug without being hands-on with the hardware.
That said, when I have some free time, I'll set up that debug branch for you to test.

@kd-11
Contributor

@kd-11 kd-11 commented Oct 4, 2016

@mirh You can view the RSX command log from the RSX debugger (it's very glitchy though).

@Nezarn
Author

@Nezarn Nezarn commented Oct 4, 2016

@kd-11 well, it only BSODed 2 out of 5 times, with the error "PAGE_FAULT_IN_NONPAGED_AREA" both times. Also, I don't think this is really a driver issue, since real Vulkan games (like DOOM) work perfectly fine.

@kd-11
Contributor

@kd-11 kd-11 commented Oct 4, 2016

It's probably a driver fault. While the emulator may be at fault (we may be hitting a guard page while working with pinned memory, or some other form of access violation), I don't think that's what is supposed to happen. The driver should manage its non-paged pool allocation either way.
The BSOD message does give me an idea of where to start though.

@mirh

@mirh mirh commented Oct 4, 2016

Regardless of everything, a BSOD is always a driver problem.
User space stuff doesn't panic kernel.

@Nezarn
Author

@Nezarn Nezarn commented Oct 4, 2016

Managed to BSOD again; the bluescreen says it is from atikmdag.sys

while BlueScreenView says otherwise (but I guess that's OK?), and it is always at the same address

Caused By Driver : ntoskrnl.exe
Caused By Address : ntoskrnl.exe+14a2b0

@kd-11
Contributor

@kd-11 kd-11 commented Oct 4, 2016

Drivers are complicated like that. The crash may have happened during a call from the AMD driver into the kernel executable (it can access it the same way normal programs access DLLs), or during a queued APC/work queue request.

@Nezarn
Author

@Nezarn Nezarn commented Oct 4, 2016

btw in issues like this, it would be nice to have a "dump" feature like what pcsx2 gsdx has

edit: also made a thread on AMD forum.

@paulsapps

@paulsapps paulsapps commented Oct 4, 2016

Kernel dumps are pointless though, all you can do is try to prevent the user code from crashing the driver, but ultimately you can't fix the 3rd party driver bug yourself.

@Nezarn
Author

@Nezarn Nezarn commented Oct 5, 2016

https://community.amd.com/thread/206464

now the waiting game begins

@mirh

@mirh mirh commented Oct 5, 2016

Kernel dumps are pointless though, all you can do is try to prevent the user code from crashing the driver, but ultimately you can't fix the 3rd party driver bug yourself.

They are "pointfull" if you give them to the driver dev.

Of course it would be even better if you could give them a proper test case to be "tested live".
It took them more than 6 months to fix a problem with OGL blending, even though they were given the actual 10 lines of code and I reached the engineers directly. I dread to think what happens with "more obscure stuff".

Still, I hope a BSOD gets a higher priority (and who knows, this crash may have the same cause as the one with pcsx2).

@Nezarn
Author

@Nezarn Nezarn commented Oct 17, 2016

I'm closing this issue because the graphical issue is fixed in Project Diva F, and I've opened a new issue for the AMD-specific issues on GCN3+ (the crash/BSOD, the remaining graphical issue(s), and the 100% GPU usage): #2201

@Nezarn Nezarn closed this Oct 17, 2016