New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Texture Sync, incompatible overlap handling, data flush improvements. #2971
Conversation
Fixes a lot of issues with Unreal Engine games. Still a few minor issues (some caused by dma fast path?) Needs docs and cleanup.
Improve rules for fast DMA
…extures. Fixes the new Life is Strange game.
Note: nosy people can no longer merge this with Vulkan. (unless they are nosy enough to implement the new backend methods)
…ressed tex too This one's for metro
Download the artifacts for this pull request: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, great work, nice to see some of those texture issues fixed!
with the new build Crash4 not starting :( |
Disable guest internet access. |
Won't help, there might come a PR that allows the game to boot again tho. |
Seems that the last build before merging have some issues (i hve post some of them on general channel) |
This fixes some regressions caused by Ryujinx#2971 which caused rendered 3D texture data to be lost for most slices. Fixes issues with Xenoblade 2's colour grading, probably a ton of other games. This also removes the check from TextureCache, making it the tiniest bit smaller (any win is a win here).
#2993) This fixes some regressions caused by #2971 which caused rendered 3D texture data to be lost for most slices. Fixes issues with Xenoblade 2's colour grading, probably a ton of other games. This also removes the check from TextureCache, making it the tiniest bit smaller (any win is a win here).
This PR aims to solve a bunch of issues caused by texture modification and data flushing, by primarily handling data flush on a per-handle instead of per-view basis, and synchronizing flushes with syncpoint increments.
This introduces a new backend method, so it's not currently compatible with Vulkan (though the benefits are more important there)
Part 1: Syncing Texture Flush
This one has been a while coming. When texture flush via memory tracking was first added, our CPU and various other components were considerably slower. The GPU was so tight with the CPU as a result of both this and the lack of backend multithreading, so when the game tried to access the data it was almost certain that the draw would have completed and the data would be there, just by chance.
This did not last. Over time people started encountering various issues where textures would flush before their data was fully ready, causing white water in BOTW and rainbow lighting in Splatoon. This is compounded a lot by the addition of Background Multithreading and Vulkan, so much that people thought there was a regression. There wasn't, the texture flush was just behaving as it always did.
The new flushing mechanism is similar to the method we use for buffers, which has proven speedy and always gets the data in place by the time the CPU wants it. There is one key difference - texture data is only synced at syncpoint increment, rather than every WaitForIdle. This is mostly a limitation of buffer flush that can be fixed later, but it means that adding these syncpoints for texture flush should not cost anything at all comparatively.
But things are not as simple as shoving the buffer syncaction logic onto textures, as there is another hurdle that could affect games that write into complex texture view layouts.
Modified Flags move to the Texture Group
Texture groups are a useful abstraction for keeping track of each subimage's memory tracking within a storage texture, and is shared across all views of a texture. Before now, this was not used for texture flush, and each texture view could have its own modified flag and flush action... these views could also overlap. While this has the benefit of fetching the data for modified views in one chunk, it doesn't bode well for tracking rules that use the gpu modified state of each subimage, as they are decoupled from the texture group and might flush at the wrong time.
Now, the modified flag and flush action have been moved to the texture group, and are tracked for each handle individually. Each handle can also have its own sync numbers for determining which sync object to wait on, or if it should even wait at all (more on this later)
Flushing by slice, rather than by view
Since modified flags are now owned by texture group handles, rather than texture views, they must be able to flush at a granularity smaller than a view. I've added methods on the backend to allow this, though cases which would flush the whole storage texture will still try to do that. These methods will also use the OGL 4.5 methods for flushing texture sub data, rather than flushing an entire level (all layers) and picking out of that. This has potential to be a little faster or slower depending on the game. Hopefully no drivers complain about this.
Flushing data that isn't ready
When a game flushes texture data, it's a reasonable assumption that the guest must have waited on some synchronization so that it could be sure that the data was actually there. If not, then they're leaving things up to chance, and they've essentially caused a race. This is undefined behaviour, so if texture data is accessed before its syncpoint is incremented, we flush the data regardless of whether it is ready or not (but wait on the last syncpoint that the guest did acknowledge, if possible).
This is the best of both worlds - we ensure data that's appropriately waited on becomes available, don't waste time waiting on accidental flushes that happen after an unrelated gpu write or other textures that share pages.
When one of these undefined flushes happens, the Modified flag is not cleared. The tracking action is re-added when the sync for the texture is actually reached, in case the game wants to access the up to date data later on. CPU writes during this state are considered to be racing with the GPU, so this is also treated specially.
When the modified flag remains set on a texture handle, data is ignored from the CPU. One possible case is that data is written to one mip level of a texture, but another mip level of the same texture remains untouched and contains data modified by the GPU (and is not flushed due to sync rules). Both are on the same page, so both recieve the same dirty flag from write tracking... but one contains data that is unflushed or flushed out of date. It would be invalid to overwrite that unflushed data from CPU if it truly was not touched, so tracking dirty flags are consumed and ignored in this state. Given a race, we therefore say that the GPU always wins.
Finally, it's important to note that accesses from the GPU thread do not follow these rules at all. The GPU's own engines and methods must have the texture data at the latest point in time. Thankfully, flushing from the main GPU thread in our backends guarantees this, so sync is skipped and the data is accessed "immediately". Buffers, conversely, must wait right now.
Part 2: Flush ordering for incompatible data
As of right now, Ryujinx follows a core assumption with the texture cache, where a use of any given texture should be valid in its own layout+format. If this is the case, then any previously cached textures overlapping the same memory that do not share the layout and format are tharefore invalid - they will contain garbage data. This core assumption keeps important texture data alive while saving time by not flushing or loading data that would be "garbage" in the new format/layout.
This rule wasn't fully established before. Incompatible textures only have the potential to be deleted when a new overlap appears over them, and any checks only happened when the texture was created. If both textures existed in the cache at the same point, they would be disjoint, and could flush separately... in any order. This meant that old texture data could flush over new texture data - not ideal for games with complex streaming behaviour.
This is where the new incompatible overlap rules come into play. To put it simply, if a texture is written to and one or more other incompatible textures overlap its memory, their data is considered invalid and their modified flags are immediately cleared. This means that only one can live at a time, and therefore when data flushes, it will always be the latest. Problem solved! Note that these rules were already in place before, they were just enforced on creation, rather than on each use.
This is a large sweeping change that affects every game, so testing would be appreciated to see if everything is still holding up.
Part 3: Flushing host incompatible formats
Occasionally, a texture format or relationship is used which isn't fully supported on the host. Two such examples are:
Ryujinx supports these formats by decompressing them to a supported, uncompressed format on the CPU. But, this means that data cannot be copied to/from these textures on the gpu due to using the replacement format, so they are ineligible for texture copy dependencies and views. This causes problems when certain games draw or copy data to these formats by aliasing them as some other data format:
The changes to add incompatible overlaps allowed me to add an additional case for these textures - the ability to force overlaps of these host incompatible format textures to flush their data back to the CPU before it can be used. This is not the fastest, but it is fully compatible and allows us to cover cases which were completely broken before.
This path is essentially the "slow path" for this kind of relationship, and will allow us to support rendered compressed textures on platforms that don't support BCn like mobile hardware. In time, we can introduce compute shader paths that will allow copy dependencies between data textures and "emulated" compressed textures to be done without any data flushing at all, which will further improve performance.
Potential issues with buffer/texture flush as it stands
One tracking structure is shared between all users of the same address space - they all see and signal the same tracked handles. However, there might be times where we want to sync on GPU, but the sync should not be visible on CPU yet. On top of this, a sync from CPU could remove tracking without sync flushing, but the GPU might try to use that handle immediately after for something like a UBO/I2M write and expect to get the latest data.
This last case can happen with buffers and textures. Not that it couldn't happen before, but it's important to bring up as potential future work. I have not seen it happen in practice.
Fixes
1:
2:
3:
I also fixed an issue with partial 3D texture DMA/load/flush that caused the gob size in z to be incorrect. Not sure what this fixes, but it might improve some issues with lighting in UE4 games a little.
Some things have been fixed regarding the DmaClass fast path used for videos. It won't try to copy to compressed textures directly now, and is aware of its 1 layer limitation.
Testing would be greately appreciated, to ensure that nothing is broken and that performance is similar.
Specifically, AMD/Intel Windows and Linux should be tested to make sure they don't have any weird issues with
glGetTextureSubImage
, as it's technially a DSA method and these drivers are pretty terrible at that.