Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DriverDetails: Add broken discard with early-Z bug on Apple Silicon GPUs #10290

Merged
merged 5 commits into from Apr 24, 2022

Conversation

OatmealDome
Copy link
Member

The drivers for newer Apple Silicon GPUs appear to have some sort of issue with discard when early-Z testing is enabled. Fragments that are discarded can appear corrupted (glitchy textures, being rendered as all black, completely invisible, etc).

I workaround this issue by emulating discard using framebuffer fetch. Instead of using the built-in discard, I read in the value of the framebuffer and use that as the fragment output. Unfortunately, I couldn't figure out a good solution that works with hardware blending (what do I write out to ocol1?), so shader blending is enforced. This may decrease performance a bit.

I also solve a pre-existing bug with OpenGL where using shader blending along with ubershaders may result in double blending due to the fact that the OpenGL backend uses a different check for shader blending versus the Ubershader generator.

Fixes Super Mario Sunshine, Sonic Adventure 2: Battle, Phantasy Star Online Episodes 1 & 2, Chibi Robo, and probably a bunch more.

@andev092
Copy link

andev092 commented Jan 5, 2022

Hi @OatmealDome ,

Is there a certain build that this is already fixed in? Or is it not implemented into a build yet? I'm running build dev version 5.0-15801 on an M1 Mac mini and having the black textures issue with Vulcan backend. Thanks!

@OatmealDome
Copy link
Member Author

@andevode68 There is no beta or development build with this fix in it at this time. Before that happens, core Dolphin developers must review this pull request to ensure the code quality is acceptable, then choose to "merge" it. Once that happens, a new development build will be automatically made with this fix.

That being said, you can download a testing version of this pull request here. However, I don't recommend staying on a pull request build for very long, as you may miss other fixes and improvements that get added over time to the development / beta builds.

@andev092
Copy link

andev092 commented Jan 5, 2022

@OatmealDome FANTASTIC! Thank you for the quick response! That definitely fixed it, and after applying 16:9 codes, it looks just as good as the Steam version! Now I just wish the frame times were consistent, but I guess this game is just hard to emulate. But I'm not going to complain because now the game is playable! This is the only game I couldn't play, and now I can! And I'll be on the lookout for a proper development build. Thank you again!!

@JosJuice
Copy link
Member

Please squash the lint fix.

@OatmealDome
Copy link
Member Author

Squashed.

@dvessel
Copy link
Contributor

dvessel commented Apr 12, 2022

This also fixes Ikaruga. The only way to make it render correctly was to enable exclusive ubershaders before this fix.

edit: broken example: https://imgur.com/eVlsKg3

@nolrinale
Copy link
Contributor

With this build PSO isn't even starting anymore as I'm getting these errors while trying to get into the game. The video output completely freezes (but the audio continues playing) and I have no choice but to close the emulator.

image

image

Macbook Air M1 16/8GPU
Big Sur 11.6.5 (20G527)

@OatmealDome
Copy link
Member Author

@Neirene This might just be a dual core bug. Try turning off dual core (uncheck Config -> General -> Enable Dual Core).

@nolrinale
Copy link
Contributor

nolrinale commented Apr 18, 2022

@OatmealDome It was indeed a dual core bug, by disabling it I was able to proceed to the game as usual.

Now i'm posting here 2 short clips with the results and can confirm it fixes the graphics rendering issues on M1 for Phantasy Star Online Episode 1 & 2 Plus (GPOJ8P)

Before:
ccabefore

After:
ccaafter

Macbook Air M1 16/8GPU
Big Sur 11.6.5 (20G527)

@dvessel
Copy link
Contributor

dvessel commented Apr 19, 2022

Something changed after the rebase. I get flickering textures in F-zero, a game that didn’t have problems before these changes. Ikaruga which was affected now has it worse. What bypassed the problem before (enabling exclusive ubershaders) prevented the problem for the two I just tested.

Screen Shot 2022-04-19 at 11 55 55 AM

Screen Shot 2022-04-19 at 11 57 19 AM

I also got a ton of these dumps before I trashed the shader cache.

bad_ps_Vulkan_348.txt

@OatmealDome OatmealDome force-pushed the m1-earlyz-bug branch 2 times, most recently from cbcf6bd to a17c76e Compare April 19, 2022 21:22
@OatmealDome
Copy link
Member Author

@dvessel Please try again - I made a mistake while rebasing.

@dvessel
Copy link
Contributor

dvessel commented Apr 20, 2022

It’s not as severe but I still get broken textures. I got a bunch of render time logs from dev but I stopped after running f-zero from this pull.

Disabled audio, native resolution and emulation speed set to 5x to put the pressure on the CPU.

f-zero single attract loop, average frame time, over 10k samples:

  • master: 7.2256 ms
  • 10290: 7.4069 ms

@OatmealDome
Copy link
Member Author

@dvessel Would it be possible to send a FIFO log of the flickering textures? I wrote instructions here: https://bugs.dolphin-emu.org/issues/12705#note-1

Also, this change only really affects the GPU side of things, so increasing IR slowly and seeing when performance starts to dip is what I think would be the best way to benchmark this.

@dvessel
Copy link
Contributor

dvessel commented Apr 20, 2022

Okay, I’ll get to it later today. I thought I was looking for performance regressions due to instructions normally done on the GPU being done in the CPU instead due to the bug.

@nolrinale
Copy link
Contributor

nolrinale commented Apr 20, 2022

Running a similar test, whatever was changed, it really got broken.... the menus included

image

Fifo Menu
fifopsomenu.dff.zip

Pioneer 2
image

Fifo Pioneer 2

fifopioneer2.dff.zip

Central Control Area (The area shown in the previous reply)
image

Fifo CCA
fifocca.dff.zip

CCA Seaside Area
image

Fifo CCA Seaside
fifoccaseaside.dff.zip

I've increased the IR to up to 4x and the performance and FPSs are not affected only the graphics

@dvessel
Copy link
Contributor

dvessel commented Apr 20, 2022

@OatmealDome
Copy link
Member Author

@dvessel I fixed another issue with the rebase, please try again.

@nolrinale
Copy link
Contributor

nolrinale commented Apr 20, 2022

Testing once again with the latest build after the push, seems to be the issues are fixed again.

image

image

image

image

@dvessel
Copy link
Contributor

dvessel commented Apr 20, 2022

  • hybrid ubershaders.
  • 1x anistrophic filtering.
  • fast texture cache accuracy.
  • efb and xfb copies to texture only.
  • fast depth calculation.
  • disable bounding box.
  • backend multithreading.
  • defer efb cache invalidation.
fzero 1x 10290	frames: 11133	 average: 7.4198149645198956 ms
fzero 1x master	frames: 11443	 average: 7.2256925631390372 ms
fzero 2x 10290	frames: 11133	 average: 7.2740411389562558 ms
fzero 2x master	frames: 16300	 average: 7 ms
fzero 3x 10290	frames: 11131	 average: 7.4293953822657448 ms
fzero 3x master	frames: 11181	 average: 7.4691172524818885 ms
fzero 4x 10290	frames: 11158	 average: 7.4325327119555471 ms
fzero 4x master	frames: 11160	 average: 7.455313620071685 ms
fzero 5x 10290	frames: 11133	 average: 7.601464115692087 ms
fzero 5x master	frames: 11142	 average: 7.6092802010411065 ms
fzero 6x 10290	frames: 11159	 average: 7.7873734205573975 ms
fzero 6x master	frames: 11205	 average: 7.7745292280232041 ms
fzero 7x 10290	frames: 11132	 average: 7.9130614444843692 ms
fzero 7x master	frames: 11127	 average: 8.3176597465624162 ms
fzero 8x 10290	frames: 11152	 average: 8.1549049497847914 ms
fzero 8x master	frames: 11127	 average: 8.6827626494113428 ms

My usual settings (includes the other settings):

  • 6x (3840x3168) upscaled.
  • 16x anistrophic filtering.
  • FXAA post-processing.
  • 4x upscaled DDS textures loaded.
fzero 6x 10290	frames: 11197	 average: 7.86904527998571 ms
fzero 6x master	frames: 11200	 average: 8.6192232142857144 ms

What’s odd here is that my usual settings perform faster than master. ¯_(ツ)_/¯

@OatmealDome mentioned to me that this was held up because of the potential performance regression for games that weren’t affected. I could test Ikaruga but there’s no point. Using dedicated ubershaders to work around the bug was a massive performance drain and this fixes it.

@dvessel
Copy link
Contributor

dvessel commented Apr 20, 2022

Oh, and here is Ikaruga. No more texture problems.
Screen Shot 2022-04-20 at 4 16 12 PM

@dvessel
Copy link
Contributor

dvessel commented Apr 20, 2022

Quick Rogue Squadron test.. Removed 3 frame spikes (in excess of 1k ms) for each run. They are pretty much the same.

rs 1x 10290	frames: 6609	average: 8.1331971553941607 ms
rs 1x master	frames: 6610	average: 7.9330711043872917 ms
rs 3x 10290	frames: 6612	average: 8.4785541439806416 ms
rs 3x master	frames: 6600	average: 8.3203181818181822 ms
rs 6x 10290	frames: 6606	average: 9.0858764759309718 ms
rs 6x master	frames: 6604	average: 8.7511053906723202 ms
rs 8x 10290	frames: 6609	average: 9.8672870328340139 ms
rs 8x master	frames: 6625	average: 9.9530867924528295 ms

@dvessel
Copy link
Contributor

dvessel commented Apr 24, 2022

I’ve been keeping this up to date locally and it works great. What else is needed to get this merged? Please don’t let it go stale, thanks!

@JMC47 JMC47 merged commit c42392c into dolphin-emu:master Apr 24, 2022
10 checks passed
t895 added a commit to t895/dolphin that referenced this pull request Apr 24, 2022
commit c42392c
Merge: 61edcf7 259a5fc
Author: JMC47 <JMC4789@gmail.com>
Date:   Sun Apr 24 13:30:04 2022 -0400

    Merge pull request dolphin-emu#10290 from OatmealDome/m1-earlyz-bug

    DriverDetails: Add broken discard with early-Z bug on Apple Silicon GPUs

commit 61edcf7
Merge: 6abf367 787e3ef
Author: Admiral H. Curtiss <pikachu025@gmail.com>
Date:   Sun Apr 24 18:02:19 2022 +0200

    Merge pull request dolphin-emu#10606 from AdmiralCurtiss/memory-widget-refactoring-1

    Qt/MemoryWidget: Light refactoring and quality of life features.

commit 787e3ef
Author: Admiral H. Curtiss <pikachu025@gmail.com>
Date:   Sun Apr 24 06:49:10 2022 +0200

    Qt/MemoryViewWidget: Detect row breakpoint cell by cell data instead of cell position.

commit 6920a24
Author: Admiral H. Curtiss <pikachu025@gmail.com>
Date:   Sun Apr 24 06:30:40 2022 +0200

    Qt/MemoryViewWidget: Add option to copy the actually displayed cell value to clipboard.

commit 54ec0bd
Author: Admiral H. Curtiss <pikachu025@gmail.com>
Date:   Sun Apr 24 06:25:42 2022 +0200

    Qt/MemoryViewWidget: Don't use a member variable to hold information about the current mouse click.

commit 6abf367
Merge: 13e2dda 14f9ffe
Author: JosJuice <josjuice@gmail.com>
Date:   Sun Apr 24 10:37:45 2022 +0200

    Merge pull request dolphin-emu#10588 from JosJuice/jitarm64-psq-stxx-q0

    JitArm64: Always lock Q0 in psq_stXX

commit 13e2dda
Author: JosJuice <josjuice@gmail.com>
Date:   Sun Apr 24 10:06:44 2022 +0200

    Translation resources sync with Transifex

commit 26f9c8b
Author: Admiral H. Curtiss <pikachu025@gmail.com>
Date:   Sun Apr 24 05:24:20 2022 +0200

    Qt/MemoryWidget: Don't force a fixed size for the sidebar.

commit 4c080b8
Merge: e0afcb3 a7111e3
Author: Admiral H. Curtiss <pikachu025@gmail.com>
Date:   Sat Apr 23 22:23:31 2022 +0200

    Merge pull request dolphin-emu#10578 from TryTwo/PR_MemoryWidget_Dual_Views

    Debugger MemoryWidget: Add dual views

commit e0afcb3
Merge: cb5e967 b5a7ae5
Author: JosJuice <josjuice@gmail.com>
Date:   Sat Apr 23 22:04:10 2022 +0200

    Merge pull request dolphin-emu#10540 from nyanpasu64/fix-gcadapter-atomics

    Remove atomic usage and fix mutex locking in GCAdapter code

commit cb5e967
Merge: 8b5a61b 235f729
Author: Admiral H. Curtiss <pikachu025@gmail.com>
Date:   Sat Apr 23 21:07:35 2022 +0200

    Merge pull request dolphin-emu#10596 from richarm4/patch-3

    Added space in comment

commit 235f729
Author: Matthew Richards-Wells <91291346+richarm4@users.noreply.github.com>
Date:   Wed Apr 20 12:49:17 2022 -0700

    GameSettings: Add missing space in comment.

commit 8b5a61b
Merge: 19c71db 12cd81b
Author: Admiral H. Curtiss <pikachu025@gmail.com>
Date:   Sat Apr 23 20:32:47 2022 +0200

    Merge pull request dolphin-emu#10599 from shuffle2/libusb

    Libusb fixups

commit 19c71db
Merge: 69ca38d f5f5262
Author: Mai M <mathew1800@gmail.com>
Date:   Sat Apr 23 06:10:20 2022 -0400

    Merge pull request dolphin-emu#10597 from Simonx22/fix-ingame-menu-design

    Android: Fix in game menu rippleColor and colorEdgeEffect

commit a7111e3
Author: TryTwo <taolas@gmail.com>
Date:   Sun Apr 17 00:47:05 2022 -0700

    Dual View any size.

commit 14f9ffe
Author: JosJuice <josjuice@gmail.com>
Date:   Sat Apr 23 11:37:52 2022 +0200

    JitArm64: Add documentation comment for EmitBackpatchRoutine

commit 69ca38d
Merge: 56bb965 6eb9111
Author: JosJuice <josjuice@gmail.com>
Date:   Sat Apr 23 10:25:48 2022 +0200

    Merge pull request dolphin-emu#10600 from t895/modern-card

    Android: Modernize game card

commit 56bb965
Merge: 2e01dc0 7840798
Author: JMC47 <JMC4789@gmail.com>
Date:   Fri Apr 22 23:24:22 2022 -0400

    Merge pull request dolphin-emu#10584 from Pokechu22/emboss-single-normal-v2

    VideoCommon: Handle emboss texgen with only a single normal

commit 6eb9111
Author: Charles Lombardo <clombardo169@gmail.com>
Date:   Fri Apr 22 12:56:58 2022 -0400

    Modernize game card

    +Remove background on card
    +Increase max # of lines for game title
    +Root layout is now a linear layout with the card view rounding the corners on the box art

commit 7840798
Author: Pokechu22 <Pokechu022@gmail.com>
Date:   Tue Apr 19 17:46:20 2022 -0700

    VideoCommon: Add comment explaining why only the first normal gets normalized

    Co-authored-by: Scott Mansell <phiren@gmail.com>

commit 2a5c77f
Author: Pokechu22 <Pokechu022@gmail.com>
Date:   Wed Apr 13 22:03:34 2022 -0700

    VideoCommon: Handle emboss texgen with only a single normal

    Fixes a large number of effects in Rogue Squadron 2 and 3.

commit 39b2854
Author: Pokechu22 <Pokechu022@gmail.com>
Date:   Thu Apr 14 12:01:57 2022 -0700

    VertexLoader: Convert count register to remaining register

    This more accurately represents what's going on, and also ends at 0 instead of 1, making some indexing operations easier.  This also changes it so that position_matrix_index_cache actually starts from index 0 instead of index 1.

commit 97d0ff5
Author: Pokechu22 <Pokechu022@gmail.com>
Date:   Wed Apr 13 16:12:53 2022 -0700

    Convert vertex loader position cache to std::array

commit f722bdf
Author: Pokechu22 <Pokechu022@gmail.com>
Date:   Wed Apr 13 20:57:38 2022 -0700

    VertexLoaderX64: Refactor so that zfreeze is only in one place

    (Specifically, the copy for VertexLoaderManager::position_cache.  The position matrix index happens elsewhere, and the float path still has special logic to copy to scratch3.)

commit 6f1350a
Author: Pokechu22 <Pokechu022@gmail.com>
Date:   Wed Apr 13 17:03:53 2022 -0700

    VertexLoaderARM64: Fix z-freeze position matrix index

    Before, it would always write to index 0 (which is unused).  Now it writes to the correct index.

commit 04fdadd
Author: Pokechu22 <Pokechu022@gmail.com>
Date:   Fri Apr 22 12:50:44 2022 -0700

    VideoCommon: Rename norm0/norm1/norm2 to normal/tangent/binormal

commit 88134a6
Author: Pokechu22 <Pokechu022@gmail.com>
Date:   Tue Dec 28 13:01:57 2021 -0800

    VertexShaderGen: Simplify normal calculation

    This is a readability change; there should be no functional or performance differences.

commit 12cd81b
Author: Shawn Hoffman <godisgovernment@gmail.com>
Date:   Fri Apr 22 08:58:38 2022 -0700

    GCAdapter: don't call libusb_detach_kernel_driver on apple

commit 5cd3cf9
Author: Shawn Hoffman <godisgovernment@gmail.com>
Date:   Fri Apr 22 08:48:28 2022 -0700

    GCAdapter: fix retval check of libusb_detach_kernel_driver

commit 978c908
Author: Shawn Hoffman <godisgovernment@gmail.com>
Date:   Fri Apr 22 07:37:56 2022 -0700

    GCAdapter: move libusb context teardown last

commit 1c9dfb7
Author: Shawn Hoffman <godisgovernment@gmail.com>
Date:   Fri Apr 22 07:36:56 2022 -0700

    GCAdapter: some macro cleanup

commit f52d948
Author: Shawn Hoffman <godisgovernment@gmail.com>
Date:   Fri Apr 22 07:12:09 2022 -0700

    GCAdapter: set read/write thread names

commit 0a07c76
Author: Shawn Hoffman <godisgovernment@gmail.com>
Date:   Fri Apr 22 07:07:20 2022 -0700

    update libusb submodule to latest

commit af930bc
Author: Shawn Hoffman <godisgovernment@gmail.com>
Date:   Fri Apr 22 07:05:41 2022 -0700

    make libusb submodule shallow

commit f5f5262
Author: Simonx22 <simon@oatmealdome.me>
Date:   Wed Apr 20 16:22:06 2022 -0400

    Android: Fix in game menu rippleColor and colorEdgeEffect

commit 259a5fc
Author: OatmealDome <julian@oatmealdome.me>
Date:   Sat Dec 25 00:27:43 2021 -0500

    DriverDetails: Add broken discard with early-Z bug on Apple Silicon GPUs

commit e7f5e51
Author: OatmealDome <julian@oatmealdome.me>
Date:   Fri Jul 30 04:02:51 2021 -0400

    DriverDetails: Introduce new VENDOR_APPLE for Apple GPUs

commit 80dfefb
Author: OatmealDome <julian@oatmealdome.me>
Date:   Thu Jan 6 04:15:21 2022 -0500

    UberShaderPixel: Add support for non-dual source shader blending

commit c1d87db
Author: OatmealDome <julian@oatmealdome.me>
Date:   Thu Jan 6 04:15:07 2022 -0500

    PixelShaderGen: Add support for non-dual source shader blending

commit bad0283
Author: OatmealDome <julian@oatmealdome.me>
Date:   Sat Dec 25 00:27:53 2021 -0500

    VKPipeline: Add shader blending support

commit cc22f1a
Author: TryTwo <taolas@gmail.com>
Date:   Wed Apr 6 22:50:05 2022 -0700

    MemoryWidget add dual views for two separate column types. Force first column to be Hex32.

commit 2ef2d47
Author: JosJuice <josjuice@gmail.com>
Date:   Sat Apr 16 13:22:36 2022 +0200

    JitArm64: Always lock Q0 in psq_stXX

    Q0 is used as a scratch register by EmitBackpatchRoutine.

    Fixes a vertex explosion in Spider-Man 2 that was uncovered by 20b2300.

commit b5a7ae5
Author: nyanpasu64 <nyanpasu64@tuta.io>
Date:   Sun Mar 27 22:37:43 2022 -0700

    Fix locking the wrong mutex in GCAdapter_Android.cpp ResetRumble()

    I am not confident there are no race conditions between s_write_mutex,
    s_controller_write_payload_size, and s_controller_write_payload. But
    this code should be safer than before.

commit 7616027
Author: nyanpasu64 <nyanpasu64@tuta.io>
Date:   Sun Mar 27 22:27:44 2022 -0700

    Remove unnecessary atomic usage in GCAdapter_Android.cpp

    s_controller_write_payload_size needs to remain an atomic because Read()
    loads and stores without holding a mutex, Output() stores while holding
    s_write_mutex, and ResetRumble() stores while holding s_read_mutex! I'm
    pretty sure this code is wrong, specifically ResetRumble().

commit 871b01a
Author: nyanpasu64 <nyanpasu64@tuta.io>
Date:   Sun Mar 27 22:25:40 2022 -0700

    Remove unnecessary atomic usage in GCAdapter.cpp

    You can safely read or write non-atomic integers on multiple threads,
    as long as every thread reading or writing it holds the same mutex
    while doing so (here, s_mutex).

    Removing the atomic accesses makes the code faster, but the actual
    performance difference is probably negligible.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
6 participants