Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] rsx: Plug leaky texture cache section invalidate #9107

Closed
wants to merge 5 commits into from

Conversation

kd-11
Copy link
Contributor

@kd-11 kd-11 commented Oct 20, 2020

This is a particularly annoying issue to solve. When the texture cache was rewritten with ruipin's optimizations, we failed to plug one data path that can lead to corruption. For a sequence of overlapping memory sections A-B, an optimization was added that avoids flushing section B if section A is invalidated but the "dirty" region doesn't touch section B. However, it is valid for A and B to overlap which then causes flushing A to write into part of B in super memory. This shouldn't be a big deal since B is still protected, but the surface cache regularly checks if surfaces are corrupted and the mismatching contents will trigger B to get removed from the surface cache. Subsequent access of this (still valid in texture cache because it is protected) region will cause all manner of havoc.
The fix is two-fold:

  1. Resync B if A is flushed to fix the crash.
  2. If A is newer than B, flushing A then B will persist stale data as we write the newer data before the older one. In such cases, B shall inherit memory sections from A, like we have done by surface cache when creating new sections. This should preserve memory order.

What to test:
Fix 1 is fairly 'cheap' so no harm there.
Fix 2 is risky. If implemented incorrectly performance and stability can take a big hit. Test as many games as you can and keep an eye on VRAM usage to catch any leakage.

Fixes #9049

@xddxd
Copy link
Contributor

xddxd commented Oct 20, 2020

Madden 16 still crashes.
RPCS3.zip

@munchnbeaver
Copy link

munchnbeaver commented Oct 20, 2020

As @xddxd stated, the crashing still exists. I tested it with Madden 17 and the same thing happened.

{PPU[0x1000019] Thread (FGGameRender) [0x01399ea8]} SIG: Thread terminated due to fatal error: Verification failed (e=0x57) (0000000000000000):
(in file D:\a\1\s\rpcs3\Emu\RSX\VK\VKRenderTargets.h:602)

Untitled

RPCS3.log

@kd-11
Copy link
Contributor Author

kd-11 commented Oct 21, 2020

I am no longer able to reproduce the crash with madden 17. If I cannnot reproduce it within reasonable time the PR will get merged regardless and the broken games shall be revisited at a later date.

@xddxd
Copy link
Contributor

xddxd commented Oct 21, 2020

@Satan86
Copy link

Satan86 commented Oct 21, 2020

even worse on my 1050 Ti - as soon as a track loads in the entire PC crashes and reboots.
RPCS3.zip

@munchnbeaver
Copy link

munchnbeaver commented Oct 21, 2020

I am no longer able to reproduce the crash with madden 17. If I cannnot reproduce it within reasonable time the PR will get merged regardless and the broken games shall be revisited at a later date.

The same exact thing that originally happened, happens still. The same method, the same error, etc.

Just tested another time and it crashed and gave me this error:

{PPU[0x1000019] Thread (FGGameSim) [0x00b1004c]} VM: Access violation reading location 0x7367c190 (unmapped memory) [type=u32]

@kd-11 Try these steps as I mentioned in the original thread:

Start the game and go straight to practice mode (skipping skills trainer). On loading into practice it will crash.
If for some reason that doesn't crash and if practice mode works, exit practice mode and go to the play now mode. Start a match in play now. It should crash, and one more time if it does not, exit play now and go to connected franchise mode. start a new franchise using a created coach. Go to the first week, play the 1st week matchup, and upon loading into the match, it should crash.

@kd-11
Copy link
Contributor Author

kd-11 commented Oct 21, 2020

Crashing and corruption regression was because of a typo and is fixed. I'll keep trying to reproduce the madden issue, I have gone over the steps multiple times without issue, while on master I get it consistently every time if WCB is enabled.

@munchnbeaver
Copy link

Crashing and corruption regression was because of a typo and is fixed. I'll keep trying to reproduce the madden issue, I have gone over the steps multiple times without issue, while on master I get it consistently every time if WCB is enabled.

That's so odd. I'm using all default settings except for WCB being enabled and am still crashing exactly like I was before. @xddxd crashed on madden 16. Maybe try producing the issue on madden 16?

@kd-11
Copy link
Contributor Author

kd-11 commented Oct 21, 2020

@munchnbeaver Are you getting the purple verification failure or just generic crash/freezing? I am observing random freezing without errors in some cases, but this isn't new.

@munchnbeaver
Copy link

munchnbeaver commented Oct 21, 2020

@munchnbeaver Are you getting the purple verification failure or just generic crash/freezing? I am observing random freezing without errors in some cases, but this isn't new.

@kd-11 yes, I still get the exact error that I got before. Earlier I got passed the loading screen one time by fiddling with the settings but I still got the freeze and crash. One time I got this error code:

{PPU[0x1000019] Thread (FGGameSim) [0x00b1004c]} VM: Access violation reading location 0x7367c190 (unmapped memory) [type=u32]

But yes, I still get the crash with default settings and WCB being enabled, as well as the same error code.

@munchnbeaver
Copy link

munchnbeaver commented Oct 21, 2020

@kd-11 For some reason it works if I go to advanced and enable read color buffers however if I do that, the text in game is messed up:

1

3

RPCS3.log.gz

Why would it be working with read color buffers enabled and not disabled?

@kd-11
Copy link
Contributor Author

kd-11 commented Oct 21, 2020

Hmm, interesting. I always have RCB enabled if WCB is enabled for memory coherency (ideally you should always enable them in pairs), that explains why I cannot observe the crashing anymore. I'll try and figure out why it works ok with RCB now vs without.

@munchnbeaver
Copy link

Hmm, interesting. I always have RCB enabled if WCB is enabled for memory coherency (ideally you should always enable them in pairs), that explains why I cannot observe the crashing anymore. I'll try and figure out why it works ok with RCB now vs without.

I updated my post with screenshots of the text being messed up if RCB is enabled. The text appears like that only if RCB is enabled. It seems like there is a problem after problem haha

@kd-11
Copy link
Contributor Author

kd-11 commented Oct 21, 2020

As long as it was always broken, that is another problem that will not be fixed in this changeset. Once the crashing is gone a new issue should be opened with the RCB problem.

@Lagahan
Copy link

Lagahan commented Oct 21, 2020

Hmm, interesting. I always have RCB enabled if WCB is enabled for memory coherency (ideally you should always enable them in pairs), that explains why I cannot observe the crashing anymore. I'll try and figure out why it works ok with RCB now vs without.

Should the UI be updated with a suggestion to do this or move RCB from advanced next to WCB in GPU and link them? Just checked some of my games are configured with WCB only.

@munchnbeaver
Copy link

I still freeze and crash with this error:

VM: Access violation writing location 0x40 (unmapped memory) [type=u32]

RPCS3.log

@kd-11 kd-11 changed the title [TESTERS NEEDED] rsx: Plug leaky texture cache section invalidate [TESTERS NEEDED][WIP] rsx: Plug leaky texture cache section invalidate Oct 21, 2020
@kd-11
Copy link
Contributor Author

kd-11 commented Oct 21, 2020

VM crashes are outside scope. That is not RSX-related unless everything is fine with video output disabled.
Anyway, I figured out exactly why this other error is happening, but its a deep-rooted issue that will require a lot of changes. For one, the primary cause of the bug is because WCB and WDB are separate options which is a terrible idea. Enabling both options even on the previous build should fix the crashes for example.
Unifying them is not so simple however due to pre-existing workarounds in place to handle the separation. I've rewritten the memory testing rules for vulkan to avoid triggering the crash as a temporary measure and it works well enough for now. OpenGL does not have memory testing constraints and is unaffected by the particular bug.
I'm considering just cancelling this PR and handling all this as a separate sequence of tasks to avoid the code bloat and minimize risk of introducing nasty regressions.

@munchnbeaver
Copy link

Well I haven't crashed (other than the VM thing I posted), however it seems the fix(?) added another graphical issue. Half of the screen is blurry. But only in certain parts of the game, like cut scenes.

98

RPCS3.log

@munchnbeaver
Copy link

Just an update: I've been playing for a while and haven't had any crashes, however, as I mentioned above, there's a blur on half of the screen on different cut scenes that is caused by whatever you have done to fix it.

Untitled

RPCS3.log.gz

@kd-11 kd-11 changed the title [TESTERS NEEDED][WIP] rsx: Plug leaky texture cache section invalidate [WIP] rsx: Plug leaky texture cache section invalidate Oct 25, 2020
@kd-11
Copy link
Contributor Author

kd-11 commented Oct 25, 2020

This implementation is valid, but there are other underlying issues that need investigation before this can be continued.
Blocked by #9133

@MSuih MSuih added the RSX label Nov 18, 2020
@illusion0001
Copy link
Contributor

Fixes Battlefield 4, no longer needs RCB to go ingame.

@kd-11 kd-11 added the Blocked This issue's cause is known, but another ticket is blocking further progress. label Apr 13, 2021
…er flush events

- This is a mostly correct fix, but a corner case exists that can leak old data to the surface cache
- Avoid double-inheritance in case discarded section was yet to be synced
- Sync memory after all sections have flushed to handle overlaps correctly
- Consume soon-to-fail-memory-test sections immediately
- Rewrite inheritance rules a bit:
-- Allow block to inherit memory if its own memory is intact
@kd-11 kd-11 linked an issue Apr 18, 2021 that may be closed by this pull request
10 tasks
@kd-11
Copy link
Contributor Author

kd-11 commented Jun 9, 2021

I will start adding these commits to master along with some restructuring. I'm unblocking the ticket itself for the moment.

@kd-11 kd-11 removed the Blocked This issue's cause is known, but another ticket is blocking further progress. label Jun 9, 2021
@kd-11
Copy link
Contributor Author

kd-11 commented Jan 2, 2022

A lot of what is discussed here has been already committed into master. The RCB/RDB problems are acknowledged and solutions are in development for some time now.

@kd-11 kd-11 closed this Jan 2, 2022
@kd-11 kd-11 deleted the memleak_fix branch August 9, 2024 20:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Surface configuration works inconsistently Madden 17 Freezing Issue (BLUS31589)
8 participants