-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Texture Cache cleanup/refactoring & fixes #5115
Conversation
NHL 15 - Access violations and infinite loading to ingame |
Battlefield: Bad Company crashes on start up |
No problems with InFamous or Serious Sam 3. |
GCC build error on Ubuntu 18.04.
|
Utilities/types.h
Outdated
@@ -56,6 +70,16 @@ | |||
|
|||
#define STR_CASE(...) case __VA_ARGS__: return #__VA_ARGS__ | |||
|
|||
|
|||
#define ASSERT(x) do { if(!(bool)(x)) fmt::raw_error("Assertion failed: " STRINGIZE(x) HERE); } while(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use C style casts. if (!(x)) is much more correct.
Utilities/types.h
Outdated
@@ -21,20 +21,34 @@ | |||
#define IS_BE_MACHINE 0 | |||
|
|||
#ifdef _MSC_VER | |||
#define ASSUME(cond) __assume(cond) | |||
|
|||
#define ASSUME(cond) __assume(cond) // MSVC __assume ignores side-effects |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Old problem, but most macro should use ... and VA_ARGS instead of a single named arg.
public: | ||
static constexpr u32 block_size = 0x100'0000; | ||
static_assert(block_size % 4096u == 0, "block_size must be a multiple of the page size"); | ||
static constexpr u32 num_blocks = (u32)(0x1'0000'0000ull / block_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C style cast damages.
static constexpr u32 num_blocks = u32{0x1'0000'0000ull / block_size};
This PR does not touch the shader compilation, so you might be confusing those improvements with the fact that the shader cache exists and was collected the first time you ran the game on master.
What gcc version are you using, for reference? Although I believe @Nekotekina has already pointed out my mistake @Nekotekina @ Everyone Else |
Nvidia drivers also have their own cache which tries to speed up shader operations. It can sometimes activate or deactivate itself randomly, speeding up or slowing down shader-related operations. |
Hyperdimension Neptunia performance drops by ~60% in comparison to master. Beside that I don't see any difference (P5, Project Diva f2nd, Infamous, Ethernal Sonata, Drakengard 3). |
@ruipin - Ubuntu 18.04 with repository packages:
|
Oh, i guess i just readed the title wrong, and i wasn't checking last merged pr's. |
Since there are known bugs right now, no more testing is needed until further notice. Thanks to everyone who reported regressions. I will let you know once this is ready for further testing. @Kravickas @stride21 @raveskirza |
@ruipin Your right, I don't know how I missed that master my mistake. |
b9db0d3
to
2040269
Compare
All reported regressions should be fixed, plus a significant number of extra changes (I have added an updated list to the PR description above). Please (re-)test. As always, expect things to go wrong. ( |
getting this with pretty much every game: {rsx::thread} class std::runtime_error thrown: VirtualProtect failed |
Killzone 3 crashes just before going ingame, throws " {rsx::thread} class std::runtime_error thrown: Assertion failed: region.matches_dimensions(width, height, 1, 1)" |
The gcc 7.3.0 build is confirmed fixed on Ubuntu 18.04. BLUS31535, BLUS30727, BLUS31597, BLUS30701, All running - No texture issues observed - Roughly ~27 FPS on BLUS30701 (Hyperdimension Neptunia) compared to Master's ~28 FPS in the Evil Cave (First Dungeon) / All other games running at 30 FPS. |
No problems on my side, no regressions compared to current master. What's more, I observed significant performance improvement (5~10FPS depending on game and locations in game). |
@elhaya Thanks for the report, but I do not see that error with "pretty much every game" otherwise I wouldn't have requested people to retest. I'm not omniscient, so please list at least one or two games you have tried where you got that error, explain when/how you got that error, and if possible submit a log so I can actually understand what is causing it and fix it. @stride21 That is indeed an assertion I was afraid might fail. Will take a look, thanks for the report! @vietnam13231 That is good to hear, although I am honestly quite surprised at a 5-10FPS improvement. Out of curiosity, can you give one or two examples of games/locations where you see such a drastic change? I'd like to take a look myself. Thanks. |
THPS HD: The whole emulator locks up when trying to load any level. Log doesn't seem to contain anything useful. Master loads levels without any issues. Red Dead Redemption: Throws |
@MSuih As for RDR, thanks for the report. |
In that assertion build THPS HD still freezes when loading levels but it doesn't lock up rest of the emulator anymore. I didn't see anything different in log, no errors are being printed to it. |
I'll keep retesting master to see if I can reproduce the black flash but your PR does fix graphic issues with the game so it might not even be a regression. |
This PR fixes the blue tint that'd appear in some situations in P5 NPEB02436 introduced by #5146 |
0d581f0
to
ba0b5ae
Compare
I have done some last changes to fix some style issues pointed out by @kd-11. I believe this PR is ready to be reviewed/merged whenever. Meanwhile, I took a quick look at the TLOU segfault with WCB on. After some discussions with kd-11, I also took the chance to add the single-line fix for this issue. This should allow games that segfaulted with WCB enabled to now go in-game. Debugging this segfault raised some issues/inneficiencies with the way |
3806 lines of code added. This is a big boi |
Little Big Planet 2 crashes on vulkan with |
@ScorchEmber256 does that not happen on master? I did not touch the surface cache (which is where these verifications are failing) |
nevermind, yes it is happening on master |
I've been testing Motorstorm on this PR and it crashes when trying to go ingame and it does not throw an error when it does. I've tested it on master too and it gets ingame but gets a Verification failed error shortly after getting ingame. Edit: Should I just wait for this and #5163 to be merge to see if the issue is fixed? |
- ASSUME now uses __builtin_assume in clang - ASSERT defined as a wrapper around verify - AUDIT aliases ASSERT when _DEBUG or _AUDIT are set, otherwise empty
This allows for further flexibility on the RSX side, allowing us to fix some bugs and crashes in later commits.
6660995
to
3b3189f
Compare
@stride21 Since master also doesn't work, this does not seem like a regression. Have you tried the latest build with the WCB fix? |
I'm going to merge this as-is since there are no major reported regressions. Open a regression tracker if you find any more issues. |
Best to post in the issue since one is already made. Then just tag this pr if it fixes it. And, there's no real reason to notify people you're about to test something, we like to keep GitHub comments clean. (But I understand you were just asking where to post). |
@ruipin I've done a lot of retesting and the verification failed error isn't happening in previous builds any more. I'm not sure what caused it but I'm actually able to complete races now even though the game can still randomly crash. With your build it black screens and crashes when trying to go ingame and it doesn't throw an error when it does. |
This PR is the result of over 7 weeks of significant effort to debug, cleanup and optimize the RSX Texture Cache.
(Note that the texture cache is mostly a "backend" part of the emulator, so don't expect these changes to have a large impact on the number of bugs)
There are a significant amount of changes, which means this PR is high-risk. I have tried to verify this as best as possible (having introduced the
AUDIT
macro for this exact purpose), however I am a single person and cannot test every single game. Things will break and performance could suffer.Although in my tests this PR shows a small performance improvement depending on the game, this might just be a coincidence. Please report any significant peformance differences (both improvements and losses), as well as any regressions you find.
As a final note, I'd like to thank @kd-11 for his invaluable help in the last couple of months.
Now for the (long) changelist:
General changes:
ASSERT
andAUDIT
macros as wrappers around verify.AUDIT
can be enabled selectively by defining_DEBUG
or_AUDIT
before includingtypes.h
.Justification:
__builtin_unreachable
: https://gcc.godbolt.org/z/xYAtu6__builtin_assume
: https://gcc.godbolt.org/z/UIpAzB__assume
): https://gcc.godbolt.org/z/vGP2I4RSX/Texture Cache changes:
AUDIT
assertions, as well as texture cache sanity checks gated behindTEXTURE_CACHE_DEBUG
std::pair<u32,u32>
for address ranges into a dedicatedutils::address_range
structget_intersecting_set
andunprotect_set
to avoid chaining into neighboring or unsynchronized sections unless necessary. Re-enable full-range protection by default even without SRM.memory_protect
calls when invalidating sectionsstd::vector<section_storage_type*>
in the invalidation algorithm into a dedicatedaddress_range_vector
structranged_storage
reference counting through callbacks, getting rid of the very commontexture_cache.h:301
verification failures foreverreset()
is called and a section is going to be reused for something elsem_flush_always_cache
tracking through callbacks to avoid forgetting to remove sections (e.g. when they get reset/replaced)std::unordered_map
into a dedicated classranged_storage
class ranged_storage_block_list
) instead of usingstd::vector
. This prevents constant reallocations when the vector grows in size, while avoiding the slow (uncacheable) iterations common to a list of objects. After some testing I have chosen 32 sections per list element, as this seemed to show no performance loss compared tostd::vector
.range_iterator
for ranged storage. Given an address range, it will iterate automatically through all overlapping sections, using an optimized algorithm taking advantage the new "unowned overlapping sections" list mentioned above.ranged_storage
,cached_texture_section
, etc) fromRSX/Common/texture_cache.h
toRSX/Common/texture_cache_utils.h
.Other than the above changes, I underwent a general review of the texture cache code, which resulted in various small bug fixes and tweaks I did not mention above (too many to list), some of which have already been merged in past PRs.
A second batch of changes fixes known regressions, and implements the following changes:
section.discard()
now resets confirmed_range appropriatelyTEXTURE_CACHE_DEBUG
) will keep track of the current protection of each page, as well as a refcount of the sections that require it to be protected. This allows us to quickly check whether something went wrong with the invalidation process, speeding up debug and developmentaddress_range
instead of just the base address, in order to avoid as many collisionsm_flush_always_cache
These changes are confirmed to bring the Transformer games to the menu, whereas before they would get stuck after the intro logos.