Vulkan: multiple issues on newer AMD (GCN3+) cards #2201

Closed
Nezarn opened this issue Oct 11, 2016 · 46 comments

Comments

@Nezarn Nezarn commented Oct 11, 2016

I'm opening this issue so it's easier to track when things get fixed, driver-wise or rpcs3-wise. (It's easier to have these issues in one place, since if I post them in some game's issue where they happen, they get buried if there are a lot of comments :P)

Current issues:
- BSOD/driver crash in certain games (100% reproducible in Project Diva F, reported to AMD)
- Always 100% GPU usage (http://i.imgur.com/Kg0hxul.png vs. http://i.imgur.com/wxEM8bU.png)
- Unique graphical issue(s) (http://i.imgur.com/a8LTaz0.png, look at the bottom of the pic)

If this issue isn't needed, feel free to close/delete it.

Contributor

@kd-11 kd-11 commented Oct 12, 2016

I have some idea why there are issues and will update this ticket with testing information when time allows.

Contributor

@kd-11 kd-11 commented Oct 15, 2016

https://ci.appveyor.com/project/kd-11/rpcs3/build/1.0.342

Our pipeline barriers are crap, but there's a lot of confusion about how the pairings are supposed to work. However, using TOP_OF_PIPE/BOTTOM_OF_PIPE to flush writes just makes no sense. I'll review this part of the spec when I have time and come up with a better solution rather than something off the top of my head.

This fixes a lot of flickering visuals on the AMD R9 200 series as well, so it's a step in the right direction, I hope.

Author

@Nezarn Nezarn commented Oct 15, 2016

@kd-11 Just tried this build; sadly it didn't fix anything for me yet. Also, if you don't know something, why don't you just ask for advice on the AMD forums? The devs could help.

Contributor

@kd-11 kd-11 commented Oct 15, 2016

The confusion includes people at the GPU vendors themselves as well as at Khronos.
See KhronosGroup/Vulkan-Docs#128

After going through the spec for a few minutes, I believe I understand what the spec implies, but even from the thread above, you can see that the examples given are often wrong.

BTW, if you are still crashing, something else is probably wrong. The remaining visual corruption should (in theory) be fixed by that build, though. On my 200 series, I was getting flickering textures and a green color in some games, and that is gone. The crash may have nothing to do with synchronization in that case. I've been experiencing crashes on my GPU when replaying Vulkan captures in RenderDoc, and that raises a red flag, since inspecting the code shows nothing suspicious except that we always crash during a vkCmdPipelineBarrier call.

Contributor

@kd-11 kd-11 commented Oct 15, 2016

If the graphical issue is unchanged, it might be because the stages in the barrier do not account for transitions to and from LAYOUT_GENERAL, which we use during buffer clears. I'll update and we can try again when I find the time.

Contributor

@kd-11 kd-11 commented Oct 15, 2016

I also realized that I failed to implement part of the spec dealing with presentable images, so it's likely my fault here.

Author

@Nezarn Nezarn commented Oct 15, 2016

@kd-11 it's a little bit funny when the devs themselves don't know how to use their stuff XD

Also yes, the driver crash is still there (well, I tried it once and it crashed at the same place, but at least this time it wasn't a BSOD; the BSOD is a bit random, like a 40% chance of a BSOD and 60% of a driver crash), and the GPU usage and graphical issue are the same. (Also, the games that are affected by the graphical issue have a "flash" when the Vulkan window opens.)
http://i.imgur.com/5AlNU76.png
https://www.youtube.com/watch?v=lBUKw9_uKlY

Contributor

@kd-11 kd-11 commented Oct 16, 2016

Looks like a collision on presentable images in your case.
Try https://ci.appveyor.com/project/kd-11/rpcs3/build/1.0.354

If that does not work, I'll have to write a barrier manually for the transition just before present. This approach fixed corrupted overlays before. By the way, try enabling the debug overlay and see if it helps.
It is known that AMD GPUs rasterize from top left to bottom right (you can find the research online), so it's no surprise that corruption happens just when the frame is about to finish copying the image and artefacts become visible at the bottom.

Author

@Nezarn Nezarn commented Oct 16, 2016

@kd-11 yep, it still happens. Here is the log with debug output (there are a lot of warnings):
https://www.dropbox.com/s/epxqxvm1km6540b/vulkanlog.zip?dl=0

Contributor

@kd-11 kd-11 commented Oct 16, 2016

What if you enable the debug overlay? It adds another stall before presenting.

Author

@Nezarn Nezarn commented Oct 16, 2016

Author

@Nezarn Nezarn commented Oct 17, 2016

@kd-11 AMD replied on the forum https://community.amd.com/thread/206464

So basically they can't do anything, since he would need to own the game (and he says he can only look into it if no third-party download is needed...).

Contributor

@kd-11 kd-11 commented Oct 17, 2016

That sucks, but it's kinda expected. Until rpcs3 can replay renderer state, this will be a difficult one to debug. However, the artifacting issue is visible even in RenderDoc, and they should be able to help with that one. GPU PerfStudio also supports API tracing, just with no visual output, so they can help with that at the very least.

Author

@Nezarn Nezarn commented Oct 17, 2016

@kd-11 yep, oh well, I hope they can at least help with the graphics issue and with the 100% GPU usage...
Just posted the RenderDoc capture + video on their forum.

Author

@Nezarn Nezarn commented Oct 18, 2016

@kd-11 so the AMD guy responded:

It appears that the glitches are introduced in colour passes 7-11, where the app appears to be doing a blur.
The biggest problem that's jarring from the trace you provided is that your application is executing a lot of renderpasses without defining external dependencies. This can lead to corruptions as the ones we're seeing because of the fact the GPU is free to run the commands in an overlapping manner which may lead to RAW hazards.
For performance reasons, also please consider coalescing the huge number of renderpasses your application is using right now, so that the draw calls are embedded in subpasses.

Contributor

@kd-11 kd-11 commented Oct 18, 2016

The subpasses cannot be done, since we have no way of knowing beforehand how the calls will be submitted to the RSX, and the RAW hazard is a known issue, which is why I've been working on the memory barriers. Subpass dependencies are memory-barrier-type ops, so at least we were on the right track.
There is an external dependency that we added, flushing previous color output before the current color output stage, but I guess we can add one to block memory reads in the fragment shader as well.

Contributor

@kd-11 kd-11 commented Oct 18, 2016

https://ci.appveyor.com/project/kd-11/rpcs3/build/1.0.357 Adds a dependency on the fragment shader stage. If it doesn't work, I'll have to reach out to AMD for assistance there, as I may have completely misunderstood that part of the spec.

Author

@Nezarn Nezarn commented Oct 18, 2016

@kd-11 still the same :(

Contributor

@kd-11 kd-11 commented Oct 18, 2016

Following the AMD guy's advice, I've removed the dependency_by_region bit.
https://ci.appveyor.com/project/kd-11/rpcs3/build/1.0.359

Author

@Nezarn Nezarn commented Oct 18, 2016

@kd-11 still the same.

Also, the AMD guy posted about the 100% GPU usage:

Oh, and reg. 100% GPU utilization: assuming you do not use any kind of a CPU-side-based frame limiting solution, that's absolutely fine. After all, wasn't the idea behind Vulkan to squeeze as much juice from the GPU as it's only possible?

So that's not an issue? Then why does it happen only on GCN3 and GCN4 cards? (Even nvidia cards don't have 100% GPU usage.)
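
For context, a minimal sketch of what a "CPU-side frame limiting solution" means in the quoted reply: the render loop sleeps until a per-frame deadline, so the GPU is not fed new frames as fast as it can take them. The `FrameLimiter` class is a hypothetical illustration using only the C++ standard library; it does not describe how rpcs3, DOOM or the SDK samples pace their frames.

```cpp
#include <chrono>
#include <thread>

// Hypothetical illustration of a CPU-side frame limiter.
class FrameLimiter
{
public:
    using clock = std::chrono::steady_clock;

    explicit FrameLimiter(double target_fps)
        : m_frame_time(std::chrono::duration_cast<clock::duration>(
              std::chrono::duration<double>(1.0 / target_fps)))
        , m_next_deadline(clock::now() + m_frame_time)
    {
    }

    // Call once per frame, after the frame has been submitted for present.
    void wait()
    {
        std::this_thread::sleep_until(m_next_deadline);

        const auto now = clock::now();
        if (now > m_next_deadline + m_frame_time)
        {
            // Fell behind by more than a frame: restart the schedule from now
            // instead of rendering a burst of frames to catch up.
            m_next_deadline = now + m_frame_time;
        }
        else
        {
            m_next_deadline += m_frame_time;
        }
    }

    clock::duration frame_time() const { return m_frame_time; }

private:
    clock::duration   m_frame_time;    // time budget per frame
    clock::time_point m_next_deadline; // when the current frame may end
};
```

A loop of the form `FrameLimiter limiter(60.0); while (running) { render(); limiter.wait(); }` (with `render` and `running` as placeholders) leaves the GPU idle for the rest of each frame when rendering is cheap, which is the situation the reply contrasts with an uncapped loop.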

Contributor

@kd-11 kd-11 commented Oct 18, 2016

Check if running vulkan demos causes this as well. I suspect something is up with their driver.

Author

@Nezarn Nezarn commented Oct 18, 2016

@kd-11 yep, looks like it happens with anything that uses Vulkan, tried the samples from the SDK, 100% usage, tried running DOOM, 100% usage (even in menu)

Contributor

@kd-11 kd-11 commented Oct 18, 2016

Their driver is obviously having issues. I'll clean up the vulkan-wip branch until we have no validation issues, then we can continue with AMD support since they insist we must do validation first.

@mirh mirh commented Oct 18, 2016

I honestly don't see when utilizing all resources started to become an issue.
Aren't both DOOM and the demos supposed to push as many frames as possible?

Assuming you don't have a slow CPU, that seems totally fine.

@RaulDJ RaulDJ commented Oct 18, 2016

@mirh You realize that the emulator runs with almost any card at "idle", right? I don't even get past ~7% of the TDP of my 1060 with absolutely any game, so something similar should happen with the RXs. The emulator uses almost no GPU at all, so this situation right here is obviously not OK.

Author

@Nezarn Nezarn commented Oct 18, 2016

@mirh then how come GPU usage is never at 100% on simple stuff (like the hello world sample on rpcs3) on GCN2 cards and nvidia cards?

Another example of normal GPU usage: http://i.imgur.com/hAU98jm.jpg

Author

@Nezarn Nezarn commented Oct 18, 2016

@kd-11 looks like the DX12 renderer is affected by the driver crash too (no BSOD so far); it crashes at exactly the same place as Vulkan. (So maybe the offending stuff is in the common code that both renderers use?)

Author

@Nezarn Nezarn commented Nov 1, 2016

@kd-11 looks like the graphical issue is a driver issue too. What are the chances that it affects 3 emulators? (It affects rpcs3 in Vulkan, and Cemu and PCSX2 in OpenGL, though not as badly.)

@mirh mirh commented Nov 2, 2016

OGL in pcsx2 is fine (graphically at least) afaik.

Author

@Nezarn Nezarn commented Nov 2, 2016

@mirh In pcsx2, using any kind of Blending Unit Accuracy (aside from none) brings out the issue (http://i.imgur.com/x0rl8C3.png), and a similar issue occurs in Cemu too, so something is very broken driver-wise. https://www.youtube.com/watch?v=3iHrUSbE8J8

@mirh mirh commented Nov 2, 2016

Uh.. Is this reproduced here?

Author

@Nezarn Nezarn commented Nov 2, 2016

@mirh I don't see anything wrong in that dump; the problem I have is only visible at 2x native or higher. (Maybe at native it's so small that it can't be seen; if I set at least 2x on your dump, it's visible.)

edit: also, I think we should move our pcsx2 discussion somewhere else :P (you can contact me on the forum too)

@mirh mirh commented Nov 2, 2016

I hope this is fine then.

Author

@Nezarn Nezarn commented Nov 2, 2016

@mirh sure, I hope they fix their drivers, since more and more emulators are getting affected xD (for example, Cemu 1.6.2 is unusable for me, it crashes the driver :( )

@mirh mirh commented Nov 2, 2016

That's "hopefully" PCSX2/pcsx2#1552

Author

@Nezarn Nezarn commented Nov 2, 2016

@mirh yep, I hope so (tried that dump and it does crash the driver pretty hard, no BSOD though). Also, it would be nice to find something that reproduces the Vulkan/DX12 crash in rpcs3, so they would work on that too...

@mirh mirh commented Nov 2, 2016

Hopefully again whatever OGL fucks with is the same thing Vulkan triggers.

@RainKikyou RainKikyou commented Nov 12, 2016

This game also has the same problem (RX 470).

Author

@Nezarn Nezarn commented Nov 12, 2016

@rdeleonp it was already reported multiple times, even the similar OpenGL issue that occurs in other emulators, but this Vulkan issue won't be fixed until there's a method to reproduce it without third-party stuff (in this case, LLE modules and the game itself).

And even if AMD starts to work on it, it will take at least half a year. (Just search the AMD forum for how long it took to fix an OpenGL issue.)

@mirh mirh commented Nov 14, 2016

Well, seems like it took them only two months now.

Author

@Nezarn Nezarn commented Nov 14, 2016

@mirh lol at "next driver release", which next? xD (I remember once they fixed something internally, and then it took half a year until the fix arrived lol.) Hopefully this will fix Cemu too :D (Cemu has been unusable since 1.6.1), and it would be nice if Vulkan had the same issue or idk

@mirh mirh commented Nov 14, 2016

With the blending issue they had actually claimed the fix was to ship in _a_ future release, not the next one.

Member

@AniLeo AniLeo commented Jun 9, 2017

Can someone retest the specified issues with the latest drivers and the latest RPCS3 version?
I have a GCN2 card, so I can't verify.

Author

@Nezarn Nezarn commented Jun 11, 2017

@AniLeo from a quick test, it looks like only the BSOD/driver crash remains (happens 100% of the time in Project Diva F, in the Black★Rock Shooter song)

edit: looking at a YouTube video https://www.youtube.com/watch?v=a1XF0kswre0, it happens at 0:44, when the camera would look at the lamp (it crashes/BSODs right before that)

@mirh mirh commented Dec 22, 2017

Today AMD open-sourced their Linux Vulkan driver.
Since the Windows driver is this one with some glue, I think bugs could be fixed at the source if someone wanted to.

https://github.com/GPUOpen-Drivers/xgl
