Return mapped buffer pointer directly for flush, WriteableRegion for textures #2494

Merged
merged 6 commits into from Jul 19, 2021

@riperiperi (Member) commented Jul 18, 2021

A few changes here to generally improve performance, even for platforms not using the persistent buffer flush.

  • Texture and buffer flush now return a ReadOnlySpan<byte>. The span is guaranteed to be pinned in memory, but it will be overwritten on the next flush from that thread, so the data must be used before flushing again.
  • As a result, persistent mappings no longer copy to a new array - rather the persistent map is returned directly as a Span<>. A similar host array is used for the glGet flushes instead of allocating new arrays each time.
  • Texture flushes now do their layout conversion into a WritableRegion when the texture is not MultiRange, which allows the flush to happen directly into guest memory rather than into a temporary span that is then copied over. This avoids another copy when doing layout conversion.
  • The WritableRegion can now be created with or without triggering tracking. The IWritableBlock interface has a method implementation for an untracked write that will just call the tracked write if no unique implementation is provided.
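As a rough illustration of the last point, the fallback can be expressed with a C# default interface method. This is only a sketch: the member names `Write` and `WriteUntracked`, the `SimpleBlock` and `FastBlock` classes, and the `TrackedWrites` counter are assumptions made for the example, not the actual code in this PR.

```csharp
using System;

// Hypothetical, simplified stand-in for the interface described above.
public interface IWritableBlock
{
    // Tracked write: implementations signal memory tracking before writing.
    void Write(ulong va, ReadOnlySpan<byte> data);

    // Default implementation: falls back to the tracked write when a block
    // does not provide its own untracked path.
    void WriteUntracked(ulong va, ReadOnlySpan<byte> data) => Write(va, data);
}

// A block with no special untracked path; it inherits the fallback.
public sealed class SimpleBlock : IWritableBlock
{
    private readonly byte[] _memory = new byte[64];

    public int TrackedWrites { get; private set; }

    public void Write(ulong va, ReadOnlySpan<byte> data)
    {
        TrackedWrites++; // stands in for signalling memory tracking
        data.CopyTo(_memory.AsSpan((int)va));
    }
}

// A block that provides its own untracked path and skips the tracking signal.
public sealed class FastBlock : IWritableBlock
{
    private readonly byte[] _memory = new byte[64];

    public int TrackedWrites { get; private set; }

    public void Write(ulong va, ReadOnlySpan<byte> data)
    {
        TrackedWrites++; // stands in for signalling memory tracking
        data.CopyTo(_memory.AsSpan((int)va));
    }

    public void WriteUntracked(ulong va, ReadOnlySpan<byte> data)
    {
        data.CopyTo(_memory.AsSpan((int)va)); // no tracking signal
    }
}
```

Note that the default method is only reachable through the interface type, so callers would hold an `IWritableBlock` reference rather than the concrete class.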

Overall, this saves 1 data copy for buffer flush, 1 copy for linear textures with matching source/target stride, and 2 copies for block textures or linear textures with mismatching strides.

On platforms that don't flush data with the Persistent Buffers, this will still save on layout conversion, and there will be no allocations for buffer/texture flush. Everyone's a winner.

Comparisons

I totally forgot to do these the last time, so let's get a running comparison of how some affected games are doing:

Skyward Sword HD

Master 1.0.6969 (unfortunately Not Nice as it was before all the persistent buffer stuff)

image

Master (current)

image

This PR

image

Pokemon Sword

Master 1.0.6969

image

Master (current)

image

This PR

image

@riperiperi riperiperi added gpu Related to Ryujinx.Graphics performance Performance issue or improvement labels Jul 18, 2021
@gdkchan (Member) left a comment

> Texture and buffer flush now return a ReadOnlySpan. It's guaranteed that this span is pinned in memory, but it will be overwritten on the next flush from that thread, so it is expected that the data is used before calling again.

The restriction is a bit unfortunate; it doesn't seem that unlikely that someone would inadvertently try to call the methods for 2 different textures sequentially. It is not a problem right now though, as it is always called, then written back and done with.
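To make the hazard concrete, here is a minimal sketch of what happens when a second flush is issued while the first span is still held. `FlushSource`, `Flush` and `FlushDemo` are hypothetical names; the sketch uses a plain reused array and does not model pinning, persistent mappings or per-thread buffers.

```csharp
using System;

// Hypothetical stand-in for a backend whose flush reuses one host buffer.
public sealed class FlushSource
{
    private byte[] _scratch = new byte[16]; // reused instead of allocating per flush

    // Returns a view over the shared buffer; the next call overwrites it.
    public ReadOnlySpan<byte> Flush(byte fill, int size)
    {
        if (_scratch.Length < size)
        {
            _scratch = new byte[size];
        }

        Span<byte> view = _scratch.AsSpan(0, size);
        view.Fill(fill); // stands in for reading the data back from the GPU

        return view;
    }
}

public static class FlushDemo
{
    public static (byte Stale, byte Copied) Run()
    {
        var source = new FlushSource();

        ReadOnlySpan<byte> first = source.Flush(0xAA, 4);
        byte[] safeCopy = first.ToArray(); // consume (or copy) before flushing again

        source.Flush(0xBB, 4); // second flush reuses the same storage

        // 'first' still points at the shared buffer, so it now shows the second flush.
        return (first[0], safeCopy[0]);
    }
}
```

In this sketch `Run()` returns `(0xBB, 0xAA)`: the span from the first flush has silently changed, while the explicit copy kept the original data.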

> The WritableRegion can now be created with or without triggering tracking. The IWritableBlock interface has a method implementation for an untracked write that will just call the tracked write if no unique implementation is provided.

Cool, I actually commented briefly about this on #2473 (comment)
The copies in the DMA class could use it, instead of the span that is copied into a new array and then written back. Eventually, I2M should be able to use it too. I believe the SignalMemoryTracking call when the WritableRegion is created is not necessary if the span is fully written (that is, every single byte is overwritten without caring about the current value); it is only needed if the span is partially written. Whether or not this little optimization is profitable depends on the SignalMemoryTracking cost though. If it's cheap, I guess it's not worth the hassle.

I think VIC copies could use that too, I will give it a try after this is merged.

{
MemoryRange subrange = Range.GetSubRange(0);

WritableRegion region = _physicalMemory.GetWritableRegion(subrange.Address, (int)subrange.Size, tracked);
Member

Why not `using WritableRegion region` without the explicit dispose?
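For reference, the difference between the two forms looks roughly like this. `Region` below is a hypothetical stand-in that writes its data back when disposed; it is assumed for the example and is not the real `WritableRegion`.

```csharp
using System;

// Hypothetical stand-in for a region that writes back when disposed.
public sealed class Region : IDisposable
{
    private readonly byte[] _backing;

    public byte[] Data { get; }

    public Region(byte[] backing)
    {
        _backing = backing;
        Data = new byte[backing.Length];
    }

    // Write-back happens on dispose.
    public void Dispose() => Data.CopyTo(_backing, 0);
}

public static class RegionDemo
{
    // Explicit dispose: skipped if an exception is thrown before it runs.
    public static void WriteExplicit(byte[] memory, byte value)
    {
        Region region = new Region(memory);
        Array.Fill(region.Data, value);
        region.Dispose();
    }

    // Using declaration (C# 8): disposed automatically at the end of the scope.
    public static void WriteWithUsing(byte[] memory, byte value)
    {
        using Region region = new Region(memory);
        Array.Fill(region.Data, value);
    }
}
```

Both methods leave `memory` filled with `value`; the using declaration just ties the dispose to the scope, including on early returns and exceptions.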

@@ -359,6 +359,7 @@ public static class LayoutConverter
}

public static ReadOnlySpan<byte> ConvertLinearToBlockLinear(
Member

Can the return be removed on those methods now that it's possible to specify the output storage?

Member

I guess the callers would need to be changed to allocate themselves rather than passing an empty span, but imo it's better than making the function either use the supplied storage or allocate its own if empty.

Member Author

There is a small problem with that and the linear -> linear strided conversion, where it does not allocate storage but instead uses the input storage when no conversion is needed. The caller would need to check this case.

Member

I think it's better to check this case in the caller anyway, since you can avoid doing the call.
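A rough sketch of what the caller-side check could look like for the linear strided case. `StrideDemo`, `ConvertStride` and `GetLinear` are hypothetical names chosen for illustration; they are not the `LayoutConverter` methods under review.

```csharp
using System;

public static class StrideDemo
{
    // Copies rows from a source stride to a destination stride into caller-provided storage.
    public static void ConvertStride(ReadOnlySpan<byte> src, Span<byte> dst, int srcStride, int dstStride, int height)
    {
        int rowBytes = Math.Min(srcStride, dstStride);

        for (int y = 0; y < height; y++)
        {
            src.Slice(y * srcStride, rowBytes).CopyTo(dst.Slice(y * dstStride, rowBytes));
        }
    }

    // Caller-side check: when the strides match, no conversion call and no extra storage are needed.
    public static ReadOnlySpan<byte> GetLinear(ReadOnlySpan<byte> src, int srcStride, int dstStride, int height)
    {
        if (srcStride == dstStride)
        {
            return src; // reuse the input directly
        }

        byte[] output = new byte[dstStride * height];
        ConvertStride(src, output, srcStride, dstStride, height);

        return output;
    }
}
```

With this shape the conversion routine always writes into storage it is given, and the decision to skip the work entirely stays with the caller.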

@marysaka marysaka requested a review from gdkchan July 19, 2021 17:11
@gdkchan (Member) left a comment

lgtm, thanks.

@gdkchan gdkchan merged commit 4b60371 into Ryujinx:master Jul 19, 2021
dtlnor added a commit to dtlnor/Ryujinx that referenced this pull request Oct 1, 2021
commit d92fff541bf6fddadabf6ab628ddf8fec41cd52e
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Wed Sep 29 01:27:03 2021 +0100

    Replace CacheResourceWrite with more general "precise" write (#2684)

    * Replace CacheResourceWrite with more general "precise" write

    The goal of CacheResourceWrite was to notify GPU resources when they were modified directly, by looking up the modified address/size in a structure and calling a method on each resource. The downside of this is that each resource cache has to be queried individually, they all have to implement their own way to do this, and it can only signal to resources using the same PhysicalMemory instance.

    This PR adds the ability to signal a write as "precise" on the tracking, which signals a special handler (if present) which can be used to avoid unnecessary flush actions, or maybe even more. For buffers, precise writes specifically do not flush, and instead punch a hole in the modified range list to indicate that the data on GPU has been replaced.

    The downside is that precise actions must ignore the page protection bits and always signal - as they need to notify the target resource to ignore the sequence number optimization.

    I had to reintroduce the sequence number increment after I2M, as removing it was causing issues in rabbids kingdom battle. However - all resources modified by I2M are notified directly to lower their sequence number, so the problem is likely that another unrelated resource is not being properly updated. Thankfully, doing this does not affect performance in the games I tested.

    This should fix regressions from #2624. Test any games that were broken by that. (RF4, rabbids kingdom battle)

    I've also added a sequence number increment to ThreedClass.IncrementSyncpoint, as it seems to fix buffer corruption in OpenGL homebrew. (this was a regression from removing sequence number increment from constant buffer update - another unrelated resource thing)

    * Add tests.

    * Add XML docs for GpuRegionHandle

    * Skip UpdateProtection if only precise actions were called

    This allows precise actions to skip reprotection costs.

commit b6e093b0fce3dc4fe607a84f57e6406b5ab8e387
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Wed Sep 29 01:11:05 2021 +0100

    Force copy when auto-deleting a texture with dependencies (#2687)

    When a texture is deleted by falling to the bottom of the AutoDeleteCache, its data is flushed to preserve any GPU writes that occurred. This ensures that the data appears in any textures recreated in the future, but didn't account for a texture that already existed with a copy dependency.

    This change forces copy dependencies to complete if a texture falls out from the AutoDeleteCache. (not removed via overlap, as that would be wasted effort)

    Fixes broken lighting caused by pausing in SMO's Metro Kingdom. May fix some other issues.

commit fd7567a6b56fcb82a52b85097582fc0a67038457
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Tue Sep 28 20:55:12 2021 -0300

    Only make render target 2D textures layered if needed (#2646)

    * Only make render target 2D textures layered if needed

    * Shader cache version bump

    * Ensure topology is updated on channel swap

commit 312be74861dae16311f4376e32195f8a4fd372c6
Author: FICTURE7 <FICTURE7@gmail.com>
Date:   Wed Sep 29 03:38:37 2021 +0400

    Optimize `HybridAllocator` (#2637)

    * Store constant `Operand`s in the `LocalInfo`

    Since the spill slot and register assigned is fixed, we can just store
    the `Operand` reference in the `LocalInfo` struct. This allows skipping
    hitting the intern-table for a look up.

    * Skip `Uses`/`Assignments` management

    Since the `HybridAllocator` is the last pass and we do not care about
    uses/assignments we can skip managing that when setting destinations or
    sources.

    * Make `GetLocalInfo` inlineable

    Also fix a possible issue with numbered locals. See or-assignment
    operator in `SetVisited(local)` before patch.

    * Do not run `BlockPlacement` in LCQ

    With the host mapped memory manager, there is a lot less cold code to
    split from hot code. So disabling this in LCQ gives some extra
    throughput - where we need it.

    * Address Mou-Ikkai's feedback

    * Apply suggestions from code review

    Co-authored-by: VocalFan <45863583+Mou-Ikkai@users.noreply.github.com>

    * Move check to an assert

    Co-authored-by: VocalFan <45863583+Mou-Ikkai@users.noreply.github.com>

commit 1ae690ba2f407042456207d40e425f8b1f900863
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Wed Sep 29 00:21:30 2021 +0100

    Use normal memory store path for DC ZVA (#2693)

    Seems like this is used as an optimized way to clear memory in homebrew applications. Unfortunately, calling the software fallback method every 8 bytes was not very optimal.

    The existing EmitStore is used by passing in ZR as the register to get a 0 write.

commit 33dc4c9ce40165795da884eaa684f16e8b643799
Author: Ac_K <Acoustik666@gmail.com>
Date:   Wed Sep 29 01:03:35 2021 +0200

    clkrst: Stub/Implement IClkrstManager and IClkrstSession calls (#2692)

    This PR stubs and implements some clkrst calls, because they are used to overclock the Switch hardware, which is pointless in our case as we emulate the system.
    Everything was checked by RE.

    Fixes #2686

commit f4f496cb48a59aae36e3252baa90396e1bfadd2e
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Tue Sep 28 19:43:40 2021 -0300

    NVDEC (H264): Use separate contexts per channel and decode frames in DTS order (#2671)

    * Use separate NVDEC contexts per channel (for FFMPEG)

    * Remove NVDEC -> VIC frame override hack

    * Add missing bottom_field_pic_order_in_frame_present_flag

    * Make FFMPEG logging static

    * nit: Remove empty lines

    * New FFMPEG decoding approach -- call h264_decode_frame directly, trim surface cache to reduce memory usage

    * Fix case

    * Silence warnings

    * PR feedback

    * Per-decoder rather than per-codec ownership of surfaces on the cache

commit 0d23504e30395ba20d1704da464b41f3fe539062
Author: FICTURE7 <FICTURE7@gmail.com>
Date:   Wed Sep 29 02:28:34 2021 +0400

    Fix PTC count table relocation patching (#2666)

    Fix an issue introduced in #2190 whereby 2 different count table entry
    addresses were used for LCQ functions. E.g:

    ```asm
     .L1:
       mov rbp,COUNT_TABLE_0   ;; This gets an address.
       mov ebp,[rbp]
       lea esi,[rbp+1]
       mov rdi,COUNT_TABLE_1   ;; This gets another address.
       mov [rdi],esi
       cmp ebp,64h
       je near .L34
    ```

    This caused LCQ functions to not tier up when they're loaded from the
    PTC cache. This does not happen when they're freshly compiled.

    This PR fixes the issue by ensuring only a single counter is created per
    translation.

commit 79c854dd2e68f96f802bbb42568e4c52e31fc80e
Author: Ac_K <Acoustik666@gmail.com>
Date:   Wed Sep 29 00:10:10 2021 +0200

    irs: Stub some service calls (#2665)

    This PR stubs some irs service calls which are needed to get some games playable or at least bootable, since we don't support IR data through a real JoyCon for now.

    - Stubs `IIrSensorServer` `StopImageProcessor`, `RunMomentProcessor`, `RunClusteringProcessor`, `RunImageTransferProcessor`, `GetImageTransferProcessorState`, `RunTeraPluginProcessor`. All calls are a bit checked by RE.

    Closes #2267, #2248, #2126

    Night Vision and SpyAlarm are now bootable (but still unplayable due to the lack of the IR data).

commit 83bdafccdab01322af8503e9ad21f52981e646c1
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Tue Sep 28 18:52:27 2021 -0300

    Share scales array for graphics and compute (#2653)

commit 405840a24b75ecd7cab52c1960464071e2bc9b81
Author: VocalFan <45863583+Mou-Ikkai@users.noreply.github.com>
Date:   Tue Sep 28 17:26:45 2021 -0400

    Quick README update for game compatibility. (#2694)

commit 7c5ead1c196d597384085cc9a609afdc89a43774
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Sun Sep 19 14:09:53 2021 +0100

    Fast path for Inline2Memory buffer write that skips write tracking (#2624)

    * Fast path for Inline2Memory buffer write

    This PR adds a method to PhysicalMemory that attempts to write all cached resources directly, so that memory tracking can be avoided. The goal of this is both to avoid flushing buffer data, and to avoid raising the sequence number when data is written, which causes buffer and texture handles to be re-checked.

    This currently only targets buffers, with a side check on textures that falls back to a tracked write if any exist within the target range. It's not expected to write textures from here - this is just a mechanism to protect us if someone does decide to do that. It's possible to add a fast path for this in future (and for ShaderCache, once that starts using tracking)

    The forced read before inline2memory begins has been skipped, as the data is fully written when the transfer is completed anyways. This allows us to flush on read in emergency situations, but still write the new data over the flushed data.

    Improves performance on Xenoblade 2 and DE, which was flushing buffer data on the GPU thread when trying to write compute data. May improve performance in other games that write SSBOs from compute, and update data in the same/nearby pages often.

    Super Smash Bros Ultimate should probably be tested to make sure the vertex explosions haven't returned, as I think that's what this AdvanceSequence was for.

    * ForceDirty before write, to make sure data does not flush over the new write

commit db97b1d7d20715f60647c422cca5ec769a5e8223
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Sun Sep 19 13:55:07 2021 +0100

    Implement and use an Interval Tree for the MultiRangeList (#2641)

    * Implement and use an Interval Tree for the MultiRangeList

    * Feedback

    * Address Feedback

    * Missed this somehow

commit f08a280adef015e9a9a0e9273b4edffeb1157f3a
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Sun Sep 19 09:38:39 2021 -0300

    Use shader subgroup extensions if shader ballot is not supported (#2627)

    * Use shader subgroup extensions if shader ballot is not supported

    * Shader cache version bump + cleanup

    * The type is still required on the table

commit 7379bc2f39557929f283a423fe7f4b7390d08261
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Sun Sep 19 13:22:26 2021 +0100

    Array based RangeList that caches Address/EndAddress (#2642)

    * Array based RangeList that caches Address/EndAddress

    In isolation, this was more than 2x faster than the RangeList that checks using the interface. In practice I'm seeing much better results than I expected. The array is used because checking it is slightly faster than using a list, which loses time to struct copies, but I still want that data locality.

    A method has been added to the list to update the cached end address, as some users of the RangeList currently modify it dynamically.

    Greatly improves performance in Super Mario Odyssey, Xenoblade and any other GPU limited games.

    * Address Feedback

commit b0af010247a2bc1d9af1fb1068d4fad0319ad216
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Sun Sep 19 13:03:05 2021 +0100

    Set texture/image bindings in place rather than allocating and passing an array (#2647)

    * Remove allocations for texture bindings and state

    * Rent rather than stackalloc + copy

    A bit faster.

commit 32c09af71a5bebdb711b175627e1e26370275d96
Author: Mary <mary@mary.zone>
Date:   Sun Sep 19 13:42:16 2021 +0200

    amadeus: Fix regression from #2654 on ListAudioDeviceName

commit 40d1acd1982705224413bc882f6ae25d4bf8ee1a
Author: Ac_K <Acoustik666@gmail.com>
Date:   Sun Sep 19 12:57:39 2021 +0200

    vi: Unify resolutions values and accurate implementation of them. (#2640)

    * vi: Unify resolutions values and accurate implementation of them.

    To continue what was made in #2618, I've REd `vi` service a bit. Now values and checks related to displays are more accurate.

    - `am` GetDefaultDisplayResolution / GetDefaultDisplayResolutionChangeEvent have more information on what the service does.
    - `vi:u/vi:m/vi:s` GetDisplayService are now accurate.
    - `IApplicationDisplay` GetRelayService, GetSystemDisplayService, GetManagerDisplayService, GetIndirectDisplayTransactionService, ListDisplays, OpenDisplay, OpenDefaultDisplay, CloseDisplay, GetDisplayResolution are now properly implemented.
    - Some other calls are cleaned up or have extra checks according to RE.

    Additionally, `IFriendService` had some wrongly aligned things, and the `pm:info` service placeholder was missing.

    * just use _openedDisplayInfo.Remove()

    * use context.Memory.Fill()

    * fix some casting

    * remove unneeded comment

    * cleanup

    * uses TryAdd

    * displayId > ulong

    * GetDisplayResolution > ulong

    * UL

commit e17eb7bfafdd95084baea8e9f3dc77ee3f755347
Author: Mary <me@thog.eu>
Date:   Sun Sep 19 12:29:19 2021 +0200

    amadeus: Update to REV10 (#2654)

    * amadeus: Update to REV10

    This implements all the changes made with REV10 on 13.0.0.

    * Address Ack's comment

    * Address gdkchan's comment

commit fe9d5a1981cfe43c4535b7473064c9858addb3b5
Author: mpnico <mpnico@gmail.com>
Date:   Sat Sep 18 14:31:44 2021 +0200

    Fix problems added by Pause (#2645)

    * Disable Pause/Resume menu instead of trying to hide them

    * Fix Resume menu being active before renderer starts

    * Fix emulator not being able to close properly

commit d327e809c9c9d1f4c035c50bf6315eea83ce0147
Author: Ac_K <Acoustik666@gmail.com>
Date:   Thu Sep 16 00:09:48 2021 +0200

    gui: Hotfix for FileChooserNative during section extraction (#2644)

    Fix a regression introduced in #2633, FileChooserNative parent can't be set to null because it's running in modal.

commit 843401635acb08686c745351666f86146de841b7
Author: MutantAura <44103205+MutantAura@users.noreply.github.com>
Date:   Wed Sep 15 01:26:10 2021 +0100

    Adjustments to framerate metric and addition of frametime (#2638)

    * Adjust framerate data and add frametime

    * Update PerformanceStatistics.cs

    * Revert deletions of average framerate

    * Update Ryujinx.csproj

    * Remove separate GTK column

    * Increase FPS precision

    * general cleanup

    * even generaler cleanup

    * fix dumb

    * Remove legacy code

    * Update PerformanceStatistics.cs

    * Update PerformanceStatistics.cs

commit fb2e61a435f6a9d2201d521df47c9bfd12031df3
Author: Michael Gielda <mgielda@antmicro.com>
Date:   Wed Sep 15 01:47:10 2021 +0200

    Add Linux Unicorn patch + desc. (#2609)

commit 5d08e9b495a2315e8a4758a8123466665085d044
Author: Ac_K <Acoustik666@gmail.com>
Date:   Wed Sep 15 01:24:49 2021 +0200

    hos: Cleanup the project (#2634)

    * hos: Cleanup the project

    Since a lot of changes have been made to the HOS project, there are some leftovers here and there: classes used in just one service, things in the wrong places, and more.
    This PR fixes that. In addition, I've realigned some vars because I thought it made the code more readable.

    * Address gdkchan feedback

    * addresses Thog feedback

    * Revert ElfSymbol

commit 3f2486342b3ef4610b6af6a52624614d2a7ad8ae
Author: Ac_K <Acoustik666@gmail.com>
Date:   Tue Sep 14 23:52:08 2021 +0200

    gui: Replace FileChooserDialog by FileChooserNative (#2633)

    We currently use the FileChooser from GTK, which is a bit of a mess. Instead, we could use the native FileChooser of each specific OS. This is what this PR attempts to do.

    It could be nice to get a test under linux since I've only tested it under Windows without any issues.

    Fixes #2584

commit a9343c9364246d3288b4e7f20919ca1ad2e1fd3e
Author: FICTURE7 <FICTURE7@gmail.com>
Date:   Tue Sep 14 03:23:37 2021 +0400

    Refactor `PtcInfo` (#2625)

    * Refactor `PtcInfo`

    This change reduces the coupling of `PtcInfo` by moving relocation
    tracking to the backend. `RelocEntry`s remain as `RelocEntry`s throughout
    the pipeline until they actually need to be written to the PTC
    streams. Keeping this representation makes inspecting and manipulating
    relocations after compilations less painful. This is something I needed
    to do to patch relocations to 0 to diff dumps.

    Contributes to #1125.

    * Turn `Symbol` & `RelocInfo` into readonly structs

    * Add documentation to `CompiledFunction`

    * Remove `Compiler.Compile<T>`

    Remove `Compiler.Compile<T>` and replace it by `Map<T>` of the
    `CompiledFunction` returned.

commit ac4ec1a0151fd958d7ec58146169763b446836fe
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Sat Sep 11 17:54:18 2021 -0300

    Account for negative strides on DMA copy (#2623)

    * Account for negative strides on DMA copy

    * Should account for non-zero Y

commit 016fc64b3df8e039e62f3022139244061a00ec30
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Sat Sep 11 17:39:02 2021 -0300

    Implement GetVaRegions on nvservices (#2621)

    * Implement GetVaRegions on nvservices

    * This would just result in 0

commit a4089fc87871d795dea2723b8869ab22cf970085
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Sat Sep 11 17:24:10 2021 -0300

    Report 1080p resolution when in docked mode (#2618)

commit 117e32a6fffc30cdb895aa98483af7df353a8dd1
Author: mpnico <mpnico@gmail.com>
Date:   Sat Sep 11 22:08:25 2021 +0200

    Implement a "Pause Emulation" option & hotkey (#2428)

    * Add a "Pause Emulation" option and hotkey

    Closes Ryujinx#1604

    * Refactoring how pause is handled

    * Applied suggested changes from review

    * Applied suggested fixes

    * Pass correct suspend type to threads for suspend/resume

    * Fix NRE after stopping emulation

    * Removing SimulateWakeUpMessage call after resuming emulation

    * Skip suspending non game process

    * Pause the tickCounter in the ExecutionContext

    * Refactoring tickCounter pause/resume as suggested

    * Fix Config migration to add pause hotkey

    * Fixed pausing only application threads

    * Fix exiting emulator while paused

    * Avoid pause/resume while already paused/resumed

    * Cleanup unused code

    * Avoid restarting audio if stopping emulation while in pause.

    * Added suggested changes

    * Fix ConfigurationState

commit b0e410a828fd37bf0d9021fc2f6b630e3944a861
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Sat Sep 11 20:52:54 2021 +0100

    Lift textures in the AutoDeleteCache for all modifications. (#2615)

    * Lift textures in the AutoDeleteCache for all modifications.

    Before, this would only apply to render targets and texture blit. Now it applies to image stores, the fast dma copy path and any other type of modification.

    Image store always at least has one reference in the texture pool, so the function of the AutoDeleteCache keeping textures _alive_ is not useful, but a very important function for a while has been its use to flush textures in order of modification when they are dereferenced, so that their data is not lost.

    Before, textures populated using image stores were being dereferenced and reloaded as garbage. Now, when these textures are dereferenced, their data will be put back into memory, and everything stays intact.

    Fixes lighting breaking when switching levels in THPS1+2, and potentially some more UE4 games. I've tested a bunch more games for regressions and performance impact, but they all seem fine.

    * Lift copy srcTexture so that it doesn't remain referenceless

    * Perform lift before reference count change on unbind.

    It's important to lift on unbind as that is the moment the texture was truly last modified, but definitely not after releasing every single reference.

commit 197f5878027c5bbb89962217e061cfc1d1993b11
Author: Agustin Insua <Nistenf@users.noreply.github.com>
Date:   Sat Sep 11 16:32:36 2021 -0300

    Fix GTK3 mapping for single quote key (#2612)

commit bcbe6ef6cdbd22d22566cb17d5cd80f9cd15bbac
Author: Agustin Insua <Nistenf@users.noreply.github.com>
Date:   Sat Sep 11 16:16:48 2021 -0300

    Update game metadata when stopping emulation (#2610)

    * Update game metadata when stopping emulation

    * Fix formatting

commit 830d1f097d62ea4888d2d405f3c74b54948889aa
Author: bobhope <william.reindl01@gmail.com>
Date:   Sat Sep 11 14:59:11 2021 -0400

    Remove file error popup (#2547)

    * Added check to detect if application file is more than zero bytes long

    * Removed file error popup

    * Removed unnecessary usings

    * Added empty lines

commit f0b00c1ae92a2e8fdff9c532e7f2f79ad708b184
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Thu Sep 2 04:17:43 2021 +0100

    Fix TXQ for 3D textures. (#2613)

    * Fix TXQ for 3D textures.

    Assumes the texture is 3D if the component mask contains Z.

    This fixes a bug in UE4 games where parts of the map had garbage pointers to lighting voxels, as the lookup 3D texture was not being initialized. Most notable game is THPS1+2.

    May need another PR to keep image store data alive and properly flush it in order using the AutoDeleteCache.

    * Get sampler type for TextureSize from bound textures.

commit 142cededd4db2ff4f83a4833580d343a4f0a8cde
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Tue Aug 31 06:51:57 2021 +0100

    Implement Shader Instructions SUATOM and SURED (#2090)

    * Initial Implementation

    * Further improvements (no support for float/64-bit types)

    * Merge atomic and reduce instructions, add missing format switch

    * Fix rebase issues.

    * Not used.

    * Whoops. Fixed.

    * Partial implementation of inc/dec, cleanup and TODOs

    * Remove testing path

    * Address Feedback

commit 416dc8fde49f8eb42d47b1ab606028a5cabe8f90
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Mon Aug 30 14:02:40 2021 -0300

    Fix out-of-bounds shader thread shuffle (#2605)

    * Fix out-of-bounds shader thread shuffle

    * Shader cache version bump

commit 82cefc8dd3babb781d4b7229435e26911fb083dd
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Sun Aug 29 16:52:38 2021 -0300

    Handle indirect draw counts with non-zero draw starts properly (#2593)

commit 15e7fe3ac940a1768a25326e66683ad0f23127e0
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Sun Aug 29 20:22:13 2021 +0100

    Avoid deleting textures when their data does not overlap. (#2601)

    * Avoid deleting textures when their data does not overlap.

    It's possible that while two textures' start and end addresses indicate an overlap, the actual data contained within them is sparse due to a layer stride. One such possibility is array slices of a cubemap at different mip levels - they overlap as a whole, but the actual texture data fills the gaps between each other's layers rather than actually overlapping.

    This fixes issues with UE4 games having incorrect lighting (solid white screen or really dark shadows). There are still remaining issues with games that use the 3D texture prebaked lighting, such as THPS1+2.

    This PR also fixes a bug with TexturePool's resized texture handling where the base level in the descriptor was not considered.

    * AllRegions granularity for 3d textures is now by level rather than by slice.

    * Address feedback

commit 54adc5f9fb65f4b03bc28da5899d2413a84f66c2
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Sun Aug 29 20:03:41 2021 +0100

    Ensure that all threads wait for a read tracking action to complete. (#2597)

    * Lock around tracking action consume + execute. Not particularly fast.

    * Lock around preaction registration and use

    * Create a lock object

    * Nit

commit 76e8f9ac87c0164f4f09600dcf8b6a5b4d062bf5
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Fri Aug 27 21:08:30 2021 +0100

    Only reupload the texture scale array if it changes. (#2595)

    * Only reupload the texture scale array if it changes.

    Before, this would be called all the time if any shader needed a scale value. The cost of doing this has increased with threaded-gal, as the scale array is copied to a span pool, and it was called on pretty much every draw sometimes.

    This improves GPU performance in games, scaled or not. Most affected game seems to be Xenoblade Chronicles: Definitive Edition.

    * Just use = instead of |=

commit ee1038e54255797a94b89091f4d59b77daad1a7b
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Thu Aug 26 20:44:47 2021 -0300

    Initial support for shader attribute indexing (#2546)

    * Initial support for shader attribute indexing

    * Support output indexing too, other improvements

    * Fix order

    * Address feedback

commit ec3e848d7998038ce22c41acdbf81032bf47991f
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Thu Aug 26 23:31:29 2021 +0100

    Add a Multithreading layer for the GAL, multi-thread shader compilation at runtime (#2501)

    * Initial Implementation

    About as fast as nvidia GL multithreading, can be improved with faster command queuing.

    * Struct based command list

    Speeds up a bit. Still a lot of time lost to resource copy.

    * Do shader init while the render thread is active.

    * Introduce circular span pool V1

    Ideally should be able to use structs instead of references for storing these spans on commands. Will try that next.

    * Refactor SpanRef some more

    Use a struct to represent SpanRef, rather than a reference.

    * Flush buffers on background thread

    * Use a span for UpdateRenderScale.

    Much faster than copying the array.

    * Calculate command size using reflection

    * WIP parallel shaders

    * Some minor optimisation

    * Only 2 max refs per command now.

    The command with 3 refs is gone. :relieved:

    * Don't cast on the GPU side

    * Remove redundant casts, force sync on window present

    * Fix Shader Cache

    * Fix host shader save.

    * Fixup to work with new renderer stuff

    * Make command Run static, use array of delegates as lookup

    Profile says this takes less time than the previous way.

    * Bring up to date

    * Add settings toggle. Fix Multithreading Off mode.

    * Fix warning.

    * Release tracking lock for flushes

    * Fix Conditional Render fast path with threaded gal

    * Make handle iteration safe when releasing the lock

    This is mostly temporary.

    * Attempt to set backend threading on driver

    Only really works on nvidia before launching a game.

    * Fix race condition with BufferModifiedRangeList, exceptions in tracking actions

    * Update buffer set commands

    * Some cleanup

    * Only use stutter workaround when using opengl renderer non-threaded

    * Add host-conditional reservation of counter events

    There has always been the possibility that conditional rendering could use a query object just as it is disposed by the counter queue. This change makes it so that when the host decides to use host conditional rendering, the query object is reserved so that it cannot be deleted. Counter events can optionally start reserved, as the threaded implementation can reserve them before the backend creates them, and there would otherwise be a short amount of time where the counter queue could dispose the event before a call to reserve it could be made.

    * Address Feedback

    * Make counter flush tracked again.

    Hopefully does not cause any issues this time.

    * Wait for FlushTo on the main queue thread.

    Currently assumes only one thread will want to FlushTo (in this case, the GPU thread)

    * Add SDL2 headless integration

    * Add HLE macro commands.

    Co-authored-by: Mary <mary@mary.zone>

commit 501c3d5cea6b96f991453cc6f8d395d358d0d4c3
Author: Mary <me@thog.eu>
Date:   Fri Aug 27 00:07:44 2021 +0200

    Implement MSR instruction for A32 (#2585)

    * Implement MSR instruction

    Fix #1342.

    Now Pocket Rumble is playable.

    * Address gdkchan's comments

    * Address gdkchan's comments

    * Address gdkchan's comment

commit 8e1adb95cf7f67b976f105f4cac26d3ff2986057
Author: mpnico <mpnico@gmail.com>
Date:   Thu Aug 26 23:50:28 2021 +0200

    Add support for HLE macros and accelerate MultiDrawElementsIndirectCount #2 (#2557)

    * Add support for HLE macros and accelerate MultiDrawElementsIndirectCount

    * Add missing barrier

    * Fix index buffer count

    * Add support check for each macro hle before use

    * Add missing xml doc

    Co-authored-by: gdkchan <gab.dark.100@gmail.com>

commit 5cab8ea4ad2388bd035150e79f241ae5df95ab3b
Author: VocalFan <45863583+Mou-Ikkai@users.noreply.github.com>
Date:   Thu Aug 26 17:34:24 2021 -0400

    Fix Unicorn Warnings (#2575)

commit 32cad88cc60793f9ee6c11d15e8b5b71c3d725a2
Author: Alex Barney <thealexbarney@gmail.com>
Date:   Thu Aug 26 14:18:49 2021 -0700

    Bugfix LibHac update to 0.13.3 and remove SD card workaround (#2579)

commit 686b63e4794b975f8bb3cc5e03b2c9063c4d045f
Author: VocalFan <45863583+Mou-Ikkai@users.noreply.github.com>
Date:   Thu Aug 26 17:03:19 2021 -0400

    Added fallbacks for all Audio Backends (#2582)

    * Added fallbacks for all Audio Backends

    * Commit Suggestion

    Co-authored-by: gdkchan <gab.dark.100@gmail.com>

    * Resolve elses.

    * Revert "Resolve elses."

    This reverts commit 9aec3e9e7750803e5626da3d01d7e57fabe19f65.

    * Suggestions completed

    Co-authored-by: gdkchan <gab.dark.100@gmail.com>

commit 5b8ceb917308378c535fb4cd2288b8f19524bea0
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Thu Aug 26 17:47:21 2021 -0300

    Swap BGR565 components by changing the format (#2577)

commit 6d9bc7cf90e8016feea97eedb3cdd562c4628026
Author: Mary <me@thog.eu>
Date:   Thu Aug 26 22:26:28 2021 +0200

    sdl2: Update to Ryujinx.SDL2-CS 2.0.17 (#2553)

    * sdl2: Update to Ryujinx.SDL2-CS 2.0.17

    Update to latest SDL2 commit

    * Update to Ryujinx.SDL2-CS 2.0.17-build18

commit 5e99bff7deb51ad6d69d2393bc267a8a9428058c
Author: Alex Barney <thealexbarney@gmail.com>
Date:   Fri Aug 20 16:03:17 2021 -0700

    Ignore exceptions when cleaning the SD card saves (#2576)

commit d753de6d5de17cfaf36bb5ecfeff0f0d60846171
Author: VocalFan <45863583+Mou-Ikkai@users.noreply.github.com>
Date:   Fri Aug 20 17:48:00 2021 -0400

    Seeing if there are any other spelling errors to correct. (#2572)

    * "Informations" -> "Information"

    * Your -> You

    * will use -> using (Plus more detailed Appveyor error msg.)

    * Did a dumb thing, fixed it.

commit c702943af3c7e9396b8fa86e3c1be3cb9339addc
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Fri Aug 20 18:26:25 2021 -0300

    Swap BGR components for 16-bit BGR texture formats (#2567)

commit 6c76bc3bc0ecd1d3a86cf4e8c396c71370274ba1
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Fri Aug 20 22:09:30 2021 +0100

    Change disabled vertex attribute value to (0, 0, 0, 1) (#2573)

    This seems to be the default value when the vertex attribute is disabled, or components aren't defined.

    This fixes a regression from #2307 in SMO where a plant in the Wooded Kingdom would draw slightly differently in the depth prepass, leading to depth test failing later on.

    GDK has stated that the specific case in Gundam only expects x and y to be 0, and Vulkan's undefined value for z does appear to be 0 when a vertex attribute type does not have that component, hence the value (0, 0, 0, 1).

    This worked in Vulkan despite also providing an all 0s attribute due to the vertex attribute binding being R32Float, so the other values were undefined. It should be changed there separately.
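
As a small illustration of the (0, 0, 0, 1) default, the Python sketch below fills in whichever components an attribute does not define. The helper name `expand_attribute` is hypothetical and the code only restates the rule given in the commit message; it is not the renderer's implementation.

```python
# Default taken from the commit message: x, y and z are 0, w is 1.
DEFAULT_ATTRIBUTE = (0.0, 0.0, 0.0, 1.0)


def expand_attribute(components):
    """Pad a 0..4 component attribute to 4 components using the defaults."""
    values = tuple(float(c) for c in components)
    if len(values) > 4:
        raise ValueError("a vertex attribute has at most 4 components")
    # Keep the defined components, take the rest from the default value.
    return values + DEFAULT_ATTRIBUTE[len(values):]


disabled = expand_attribute(())               # no components defined at all
two_component = expand_attribute((2.0, 3.0))  # only x and y defined
```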

commit bdc1f91a5b459a25cb74de9895d0136cf29d220d
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Fri Aug 20 21:52:09 2021 +0100

    Remove pool cache entries for incompatible overlapping textures (#2568)

    This greatly reduces memory usage in games that aggressively reuse memory without removing dead textures from the pool, such as the Xenoblade games, UE3 games, and to a lesser extent, UE4/Unity games.

    This change stops memory usage from ballooning in Xenoblade and some other games. It will also reduce texture view/dependency complexity in some games - for example in MK8D it will reduce the number of surface copies between lighting cubemaps generated for actors.

    There shouldn't be any performance impact from doing this, though the deletion and creation of textures could be improved by improving the OpenGL texture storage cache, which is very simple and limited right now. This will be improved in future.

    Another potential error in the texture cache has been fixed, which could prevent data loss when data is interchangeably written to textures from both the GPU and CPU. It was possible that the dirty flag for a texture would be consumed without the data being synchronized on next use, due to the old overlap check. This check no longer consumes the dirty flag.

    Please test a bunch of games to make sure they still work, and there are no performance regressions.

commit e0af248e6f96efe7009915935407fc809eb774a9
Author: Alex Barney <thealexbarney@gmail.com>
Date:   Fri Aug 20 13:36:14 2021 -0700

    Clean the SD card save directory when opening the emulator (#2564)

    Cleans "sdcard:/Nintendo/save" and deletes "sdcard:/save" when opening the emulator.

    Works around invalid encryption when keys or the SD card encryption seed are changed.

commit 97aedc030d24bc5e32fa95a297155f2df2ecfcc2
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Fri Aug 20 18:59:39 2021 +0100

    Fix GetHandleInformation for mipmapped 3d textures (#2569)

    Got this the wrong way round - was causing games to try to synchronize mipmap levels of like 52 on a 3d texture with 6 levels. Also, corrected the variable name in the method that _was_ working.

commit f2a7b300c471ee7ad9a925f1085ad86537f68154
Author: FICTURE7 <FICTURE7@gmail.com>
Date:   Fri Aug 20 21:42:00 2021 +0400

    Fix type mismatch in `BitwiseAnd` simplification (#2571)

    * Fix type mismatch in `BitwiseAnd` simplification

    `TryEliminateBitwiseAnd` would turn the `BitwiseAnd` operation into a
    copy of the wrong type. E.g:

    Before `Simplification`:
    ```llvm
    i64 %0 = BitwiseAnd i64 0x0, %1
    ```

    After `Simplification`:
    ```llvm
    i64 %0 = Copy i32 0x0
    ```

    Since, with the changes in #2515, we iterate in reverse order and
    `Simplification` and `ConstantFolding` do not indicate whether they
    modified the CFG, the second pass to "retype" the copy into the proper
    destination type does not happen.

    This also blocked copy propagation since its destination type did not
    match with its source type. But in the cases I've seen, the
    `PreAllocator` would insert a copy for the propagated constant, which
    results in no diffs.

    Since the copy remained as is, asserts are fired when generating it.

    * Set PPTC version
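
The type mismatch described in this commit can be sketched on a toy IR. The Python below is not the JIT's code: `Const`, `Local`, `Operation` and `try_eliminate_bitwise_and` are hypothetical stand-ins, and the assumed fix is simply to build the replacement constant with the destination's type instead of reusing the source constant's type. The actual change may differ in detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    type: str   # "i32" or "i64" in this toy IR
    value: int


@dataclass(frozen=True)
class Local:
    type: str
    name: str


@dataclass(frozen=True)
class Operation:
    op: str
    dest: Local
    sources: tuple


def try_eliminate_bitwise_and(operation: Operation) -> Operation:
    """Rewrite `x & 0` into a copy of 0 that carries the destination's type."""
    if operation.op != "BitwiseAnd":
        return operation
    if any(isinstance(s, Const) and s.value == 0 for s in operation.sources):
        # Build the zero with the destination type so the copy's source and
        # destination types agree without needing a later "retype" pass.
        zero = Const(operation.dest.type, 0)
        return Operation("Copy", operation.dest, (zero,))
    return operation


before = Operation("BitwiseAnd", Local("i64", "%0"),
                   (Const("i32", 0), Local("i64", "%1")))
after = try_eliminate_bitwise_and(before)
```

Here `after` is a `Copy` whose constant source is typed `i64`, matching `%0`.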

commit 22b2cb39af00fb8881e908fd671fbf57a6e2db2a
Author: FICTURE7 <FICTURE7@gmail.com>
Date:   Tue Aug 17 22:08:34 2021 +0400

    Reduce JIT GC allocations (#2515)

    * Turn `MemoryOperand` into a struct

    * Remove `IntrinsicOperation`

    * Remove `PhiNode`

    * Remove `Node`

    * Turn `Operand` into a struct

    * Turn `Operation` into a struct

    * Clean up pool management methods

    * Add `Arena` allocator

    * Move `OperationHelper` to `Operation.Factory`

    * Move `OperandHelper` to `Operand.Factory`

    * Optimize `Operation` a bit

    * Fix `Arena` initialization

    * Rename `NativeList<T>` to `ArenaList<T>`

    * Reduce `Operand` size from 88 to 56 bytes

    * Reduce `Operation` size from 56 to 40 bytes

    * Add optimistic interning of Register & Constant operands

    * Optimize `RegisterUsage` pass a bit

    * Optimize `RemoveUnusedNodes` pass a bit

    Iterating in reverse-order allows killing dependency chains in a single
    pass.

    * Fix PPTC symbols

    * Optimize `BasicBlock` a bit

    Reduce allocations from `_successor` & `DominanceFrontiers`

    * Fix `Operation` resize

    * Make `Arena` expandable

    Change the arena allocator to be expandable by allocating in pages, with
    some of them being pooled. Currently 32 pages are pooled. An LRU removal
    mechanism should probably be added to it.

    Apparently MHR can allocate bitmaps large enough to exceed the 16MB
    limit for the type.

    * Move `Arena` & `ArenaList` to `Common`

    * Remove `ThreadStaticPool` & co

    * Add `PhiOperation`

    * Reduce `Operand` size from 56 to 48 bytes

    * Add linear-probing to `Operand` intern table

    * Optimize `HybridAllocator` a bit

    * Add `Allocators` class

    * Tune `ArenaAllocator` sizes

    * Add page removal mechanism to `ArenaAllocator`

    Remove pages which have not been used for more than 5s after each reset.

    I am on the fence if this would be better using a Gen2 callback object like
    the one in System.Buffers.ArrayPool<T>, to trim the pool. Because right
    now if a large translation happens, the pages will be freed only after a
    reset. This reset may not happen for a while because no new translation
    is hit, but the arena base sizes are rather small.

    * Fix `OOM` when allocating larger than page size in `ArenaAllocator`

    Tweak resizing mechanism for Operand.Uses and Assignments.

    * Optimize `Optimizer` a bit

    * Optimize `Operand.Add<T>/Remove<T>` a bit

    * Clean up `PreAllocator`

    * Fix phi insertion order

    Reduce codegen diffs.

    * Fix code alignment

    * Use new heuristics for degree of parallelism

    * Suppress warnings

    * Address gdkchan's feedback

    Renamed `GetValue()` to `GetValueUnsafe()` to make it more clear that
    `Operand.Value` should usually not be modified directly.

    * Add fast path to `ArenaAllocator`

    * Assembly for `ArenaAllocator.Allocate(ulong)`:

      .L0:
        mov rax, [rcx+0x18]
        lea r8, [rax+rdx]
        cmp r8, [rcx+0x10]
        ja short .L2
      .L1:
        mov rdx, [rcx+8]
        add rax, [rdx+8]
        mov [rcx+0x18], r8
        ret
      .L2:
        jmp ArenaAllocator.AllocateSlow(UInt64)

      A few variables/fields had to be changed to ulong so that RyuJIT avoids
      emitting zero-extends.

    * Implement a new heuristic to free pooled pages.

      If an arena is used often, it is more likely that its pages will be
      needed, so the pages are kept for longer (e.g: during PPTC rebuild or
      bursts of compilations). If it is not used often, then it is more likely
      that its pages will not be needed (e.g: after PPTC rebuild or bursts
      of compilations).

    * Address riperiperi's feedback

    * Use `EqualityComparer<T>` in `IntrusiveList<T>`

    Avoids a potential GC hole in `Equals(T, T)`.
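
The arena allocator bullets in this commit (page-based growth, a fast path, and a slow path for requests that do not fit) follow the usual bump-allocator shape. The Python sketch below shows only that shape under stated assumptions: `ArenaSketch`, its `(page, offset)` return value and the 4096-byte default page size are illustrative choices, and page pooling and timed page removal are left out.

```python
class ArenaSketch:
    """Hypothetical page-based bump allocator; returns (page index, offset)."""

    def __init__(self, page_size=4096):
        self.page_size = page_size
        self.pages = [bytearray(page_size)]
        self.index = 0    # page currently being filled
        self.offset = 0   # next free byte within that page

    def allocate(self, size):
        # Fast path: one add and one compare, as in the assembly listing above.
        end = self.offset + size
        if end > len(self.pages[self.index]):
            return self._allocate_slow(size)
        start = self.offset
        self.offset = end
        return self.index, start

    def _allocate_slow(self, size):
        # Advance to the next kept page that can hold the request, or add one.
        self.index += 1
        while self.index < len(self.pages) and len(self.pages[self.index]) < size:
            self.index += 1
        if self.index == len(self.pages):
            # Requests larger than a page get a page of their own size.
            self.pages.append(bytearray(max(self.page_size, size)))
        self.offset = size
        return self.index, 0

    def reset(self):
        # Freeing is wholesale: rewind to the first page and keep the pages.
        self.index = 0
        self.offset = 0


arena = ArenaSketch(page_size=16)
first = arena.allocate(8)
second = arena.allocate(8)
spill = arena.allocate(1)    # does not fit in page 0, goes to a new page
large = arena.allocate(32)   # larger than a page, gets its own page
```

Individual allocations are never freed; `reset` rewinds the arena so later allocations reuse the pages that were kept.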

commit cd4530f29c6a4ffd1b023105350b0440fa63f47b
Author: Alex Barney <thealexbarney@gmail.com>
Date:   Tue Aug 17 10:46:52 2021 -0700

    Always use an all-zeros key for AES-XTS file systems (#2561)

commit 680d3ed198ba6211d8357e370f0d29f1b5e95c74
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Tue Aug 17 14:09:27 2021 -0300

    Enable transform feedback buffer flush (#2552)

commit dadc0e59daa89c4dd7f0c3356f302481a4e75e6d
Author: Alex Barney <thealexbarney@gmail.com>
Date:   Thu Aug 12 14:56:24 2021 -0700

    Update to LibHac 0.13.1 (#2475)

    * Update to LibHac 0.13.1

    * Recreate directories for indexed saves if they're missing on emulator start

commit 3977d1f72b8f091443018b68277044a840931054
Author: ooa113y <13thSlayer@gmail.com>
Date:   Thu Aug 12 23:48:15 2021 +0300

    Improve firmware install error due to outdated keys (#2541)

commit eb181425b16567eea5c67d696f9236f868b40e92
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Thu Aug 12 15:59:24 2021 -0300

    Fix size of cached compute shaders (#2548)

    * Fix size of cached compute shaders

    * Missed one

commit 8196086f7a61f39f6177e7988a371136c7301870
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Aug 11 22:13:48 2021 -0300

    Revert "Calculate vertex buffer sizes from index buffer (#1663)" (#2544)

    This reverts commit 10d649e6d3ad3e4af32d2b41e718bb0a2924da67.

commit 0ba4ade8f1fa64e2bda5c4b1e0c5b37e10d51c80
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Aug 11 19:44:41 2021 -0300

    Ensure render scale is initialized to 1 on the backend (#2543)

commit 3148c0c21cb45a92ff77344027757fb4808bb3cb
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Aug 11 18:56:59 2021 -0300

    Unify GpuAccessorBase and TextureDescriptorCapableGpuAccessor (#2542)

    * Unify GpuAccessorBase and TextureDescriptorCapableGpuAccessor

    * Shader cache version bump

commit d44d8f2eb6bb97f185add50e61443e79e8581123
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Aug 11 18:19:28 2021 -0300

    Workaround for cubemap view data upload bug on Intel (#2539)

    * Workaround for cubemap view data upload bug on Intel

    * Trigger CI

commit c3e2646f9e330633b0ed5e0038a976e33054a819
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Aug 11 18:01:06 2021 -0300

    Workaround for Intel FrontFacing built-in variable bug (#2540)

commit 0a80a837cb30402cad1f41293134edbaeeec6451
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Wed Aug 11 21:44:51 2021 +0100

    Use "Undesired" scale mode for certain textures rather than blacklisting (#2537)

    * Use "Undesired" scale mode for certain textures rather than blacklisting

    * Nit

    Co-authored-by: gdkchan <gab.dark.100@gmail.com>

commit ed754af8d5046d2fd7218c742521e38ab17cbcfe
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Aug 11 17:27:00 2021 -0300

    Make sure attributes used on subsequent shader stages are initialized (#2538)

commit 10d649e6d3ad3e4af32d2b41e718bb0a2924da67
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Aug 11 17:06:09 2021 -0300

    Calculate vertex buffer sizes from index buffer (#1663)

    * Calculate vertex buffer size from maximum index buffer index

    * Increase maximum index buffer count for it to be considered profitable for counting

commit bb8a920b63d6d287dba8ec42e298329b933f9654
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Aug 11 16:50:33 2021 -0300

    Do not dirty memory tracking region handles if they are partially unmapped (#2536)

commit 0f6ec446ea3be41b1c22aa5c3870bd7a6c595d1f
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Aug 11 16:33:43 2021 -0300

    Replace BGRA and scale uniforms with a uniform block (#2496)

    * Replace BGRA and scale uniforms with a uniform block

    * Setting the data again on program change is no longer needed

    * Optimize and resolve some warnings

    * Avoid redundant support buffer updates

    * Some optimizations to BindBuffers (now inlined)

    * Unify render scale arrays

commit b5b7e23fc41e7045f9e803d6926e98ec7d049f0c
Author: jduncanator <1518948+jduncanator@users.noreply.github.com>
Date:   Thu Aug 12 05:16:42 2021 +1000

    hle: Tidy-up ServiceNotImplementedException (#2535)

    * hle: Simplify ServiceNotImplementedException

    This removes the need to pass in whether the command is a Tipc command or a Hipc command to the exception constructor.

    * hle: Use the IPC Message type to determine command type

    This allows differentiating between Tipc and Hipc commands when invoking a handler that supports handling both Tipc and Hipc commands.

commit d9d18439f6900fd9f05bde41998526281f7638c5
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Aug 11 15:59:42 2021 -0300

    Use a new approach for shader BRX targets (#2532)

    * Use a new approach for shader BRX targets

    * Make shader cache actually work

    * Improve the shader pattern matching a bit

    * Extend LDC search to predecessor blocks, catches more cases

    * Nit

    * Only save the amount of constant buffer data actually used. Avoids crashes on partially mapped buffers

    * Ignore Rd on predicate instructions, as they do not have a Rd register (catches more cases)

commit 70f79e689bc947313aab11c41e59928ce43be517
Author: mpnico <mpnico@gmail.com>
Date:   Thu Aug 5 00:39:40 2021 +0200

    Implement vibrations (#2468)

    * First working vibration implementation

    * Fix Infinite Rumble in SDL2Mouse

    * Stop ignoring one vibValues every 2

    * Remove RumbleInfinity as suggested

    * Reworked all the vibration handle / calculation

    * Revert HidVibrationDevicePosition changes

    * Add UI to enable and tune rumble

    * Remove some stub logs

    * Add PlayerIndex in rumble debug log

    * Fix all requested changes

    * Implements hid::GetVibrationDeviceInfo

    * Better implements HidVibrationValue.Equals/GetHashCode

    * Added requested changes from code review

    * Last fixes from review

    * Update configuration file version for rebase

commit 46ffc81d90bd3a8f2d24c2997166d22f12ecbbb6
Author: ooa113y <13thSlayer@gmail.com>
Date:   Thu Aug 5 00:28:19 2021 +0300

    Hide UI rework/arrow key fix (#2504)

    * Unbreak arrow keys

    * Use bitshift for Flags instead of literal

commit 5ceaf344ce02931da897c943048b5e653050038b
Author: emmauss <emmausssss@gmail.com>
Date:   Wed Aug 4 21:08:33 2021 +0000

    Clamp controller sticks to circle, instead of square (#2493)

    * clamp controller sticks to circle, instead of square

    * fix deadzone

    * addressed comments
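
Clamping a stick to a circle instead of a square means limiting the magnitude of the (x, y) vector rather than limiting each axis separately. A minimal Python sketch, assuming axes normalised to [-1, 1] and a unit radius; `clamp_to_circle` is a hypothetical helper and the deadzone handling mentioned in the bullets above is not covered.

```python
import math


def clamp_to_circle(x, y, radius=1.0):
    """Scale (x, y) back onto the circle if it lies outside of it."""
    magnitude = math.hypot(x, y)
    if magnitude <= radius:
        return x, y
    scale = radius / magnitude
    # Scaling both axes by the same factor preserves the stick direction.
    return x * scale, y * scale


inside = clamp_to_circle(0.5, 0.5)   # already inside the circle, unchanged
corner = clamp_to_circle(1.0, 1.0)   # square corner, pulled onto the circle
```

A per-axis clamp would leave the corner input at (1.0, 1.0), which has a magnitude of about 1.41.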

commit ff5df5d8a1fec6947f7feed3ec3ca0889cd892a5
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Aug 4 17:20:58 2021 -0300

    Support non-contiguous copies on I2M and DMA engines (#2473)

    * Support non-contiguous copies on I2M and DMA engines

    * Vector copy should start aligned on I2M

    * Nits

    * Zero extend the offset

commit ff8849671af5ac14fc9cc9d37da30f53d3f13d89
Author: Caian Benedicto <caianbene@gmail.com>
Date:   Wed Aug 4 17:05:17 2021 -0300

    Update TamperMachine and disable write-to-code prevention (#2506)

    * Enable write to memory and improve logging

    * Update tamper machine opcodes and improve reporting

    * Add Else support

    * Add missing private statement

commit a27986c31167d8ce60efcee7e901da241f63ed08
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Aug 4 15:28:33 2021 -0300

    Make audio disposal thread safe on all 3 backends (#2527)

    * Make audio disposal thread safe on all 3 backends

    * Make OpenAL more consistent with the other backends

    * Remove Window.Cursor = null, and change dummy TValue to byte

commit 06cd3abe6c5a8d86bf2473089c489415ce8c4573
Author: ooa113y <13thSlayer@gmail.com>
Date:   Sat Jul 24 21:48:00 2021 +0300

    Implement "hide UI" option (#2411)

    * Implement jduncanator method

    * Rename function/button ID

    * Move option to Actions menu (makes no sense while emulation is inactive...)

commit 8c7986eb58ec7130c1e3698cae02eb20ac52ab11
Author: emmauss <emmausssss@gmail.com>
Date:   Fri Jul 23 23:01:36 2021 +0000

    Ensure right joycon motion data is set (#2488)

    * motion fix

    * mirror motion data on right joycon in pair mode when using native motion source

    * fix

    * addressed comments

commit 4b60371e64601dba46387f8b7260b3deb770e097
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Mon Jul 19 23:10:54 2021 +0100

    Return mapped buffer pointer directly for flush, WriteableRegion for textures (#2494)

    * Return mapped buffer pointer directly for flush, WriteableRegion for textures

    A few changes here to generally improve performance, even for platforms not using the persistent buffer flush.

    - Texture and buffer flush now return a ReadOnlySpan<byte>. It's guaranteed that this span is pinned in memory, but it will be overwritten on the next flush from that thread, so it is expected that the data is used before calling again.
    - As a result, persistent mappings no longer copy to a new array - rather the persistent map is returned directly as a Span<>. A similar host array is used for the glGet flushes instead of allocating new arrays each time.
    - Texture flushes now do their layout conversion into a WriteableRegion when the texture is not MultiRange, which allows the flush to happen directly into guest memory rather than into a temporary span, then copied over. This avoids another copy when doing layout conversion.

    Overall, this saves 1 data copy for buffer flush, 1 copy for linear textures with matching source/target stride, and 2 copies for block textures or linear textures with mismatching strides.

    * Fix tests

    * Fix array pointer for Mesa/Intel path

    * Address some feedback

    * Update method for getting array pointer.
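
The contract described in this commit (a flush returns a read-only view that stays valid only until the next flush from the same thread) can be modelled with a reused per-thread scratch buffer. The Python sketch below is an analogy only: `memoryview` stands in for `ReadOnlySpan<byte>`, the `flush` function and `_scratch` storage are hypothetical, and nothing here deals with pinning or GPU mappings.

```python
import threading

# One scratch buffer per thread, reused across flushes instead of reallocated.
_scratch = threading.local()


def flush(source: bytes) -> memoryview:
    """Copy `source` into this thread's scratch buffer and return a read-only view.

    The view is overwritten by the next flush on the same thread, so callers
    are expected to consume it before flushing again.
    """
    buf = getattr(_scratch, "buffer", None)
    if buf is None or len(buf) < len(source):
        # Grow by replacing the buffer; views of the old one keep their data.
        buf = bytearray(len(source))
        _scratch.buffer = buf
    buf[:len(source)] = source
    return memoryview(buf)[:len(source)].toreadonly()


first = flush(b"abcd")
copy_of_first = bytes(first)   # consume the data before the next flush
second = flush(b"wxyz")        # reuses the buffer, so `first` now shows new data
```

The trade-off is the one the commit message names: no allocation per flush, in exchange for a lifetime rule the caller has to respect.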

commit 10e17ab423ba84856eb7411ce04d38989ac70a58
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Sun Jul 18 15:45:50 2021 +0100

    Only use persistent buffers to flush on NVIDIA and Windows+AMD (#2489)

    It seems like this method of flushing data is much slower on Mesa drivers, and slightly slower on Intel Windows. Have not tested Intel Mesa, but I'm assuming it is the same as AMD.

    This also adds vendor detection for AMD on Unix, which counted as "Unknown" before.

commit b8ad676fb8cbe0a43617df41daaf284ab4421c75
Author: Mary <me@thog.eu>
Date:   Sun Jul 18 13:05:11 2021 +0200

    Amadeus: DSP code generation improvements (#2460)

    This improves RyuJIT codegen drastically on the DSP side.
    This may reduce CPU usage of the DSP thread quite a lot.

commit 97a21332071aceeef6f5035178a3523177570448
Author: Mary <me@thog.eu>
Date:   Sun Jul 18 12:49:39 2021 +0200

    shadertools: Prepare for new target Languages and APIs (#2465)

    * shadertools: Prepare for new target Languages and APIs

    This improves shader tools command line by adding support for target
    language and api.

    * Address gdkchan's comments

commit ca5ac37cd638222e7475ac8f632b878126f3462d
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Fri Jul 16 22:10:20 2021 +0100

    Flush buffers and texture data through a persistent mapped buffer. (#2481)

    * Use persistent buffers to flush texture data

    * Flush buffers via copy to persistent buffers.

    * Log error when timing out, small refactoring.

commit bb6fab200969531ff858de399879779de5aaeac0
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Jul 14 14:48:57 2021 -0300

    Ensure that DMA copy target textures are kept alive or flushed (#2478)

commit 96a070a9a76ee5809f1ed9e78c75606c6f803c6a
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Jul 14 14:27:22 2021 -0300

    Do not require texture and sampler pools being initialized (#2476)

commit 9d688e37d68dd88770ce0e4c7b133645ef7d0eec
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Jul 14 14:09:00 2021 -0300

    Close transfer memory properly on nvservices (#2477)

commit 208ba1dde2b9a4d31446ace2bba8f0d641d2e300
Author: Mary <1760003+Thog@users.noreply.github.com>
Date:   Tue Jul 13 16:48:54 2021 +0200

    Revert LibHac update

    Users are facing save destruction on failing extra data update apparently

commit 997380d48cb3b74e2438cee7fc3b017d6b59b714
Author: Alex Barney <thealexbarney@gmail.com>
Date:   Tue Jul 13 02:23:32 2021 -0700

    Fix the headless build since previous commit

commit 19afb3209c48db5f8e4b5f48f0faee925cd20d9f
Author: Alex Barney <thealexbarney@gmail.com>
Date:   Tue Jul 13 01:19:28 2021 -0700

    Update to LibHac 0.13.1 (#2328)

    Update the LibHac dependency to version 0.13.1. This brings a ton of improvements and changes such as:
    - Refactor `FsSrv` to match the official refactoring done in FS.
    - Change how the `Horizon` and `HorizonClient` classes are handled. Each client created represents a different process with its own process ID and client state.
    - Add FS access control to handle permissions for FS service method calls.
    - Add FS program registry to keep track of the program ID, location and permissions of each process.
    - Add FS program index map info manager to track the program IDs and indexes of multi-application programs.
    - Add all FS IPC interfaces.
    - Rewrite `Fs.Fsa` code to be more accurate.
    - Rewrite a lot of `FsSrv` code to be more accurate.
    - Extend directory save data to store `SaveDataExtraData`
    - Extend directory save data to lock the save directory to allow only one accessor at a time.
    - Improve waiting and retrying when encountering access issues in `LocalFileSystem` and `DirectorySaveDataFileSystem`.
    - More `IFileSystemProxy` methods should work now.
    - Probably a bunch more stuff.

    On the Ryujinx side:
    - Forward most `IFileSystemProxy` methods to LibHac.
    - Register programs and program index map info when launching an application.
    - Remove hacks and workarounds for missing LibHac functionality.
    - Recreate missing save data extra data found on emulator startup.
    - Create system save data that wasn't indexed correctly on an older LibHac version.

    `FsSrv` now enforces access control for each process. When a process tries to open a save data file system, FS reads the save's extra data to determine who the save owner is and if the caller has permission to open the save data. Previously-created save data did not have extra data created when the save was created.
    With access control checks in place, this means that processes with no permissions (most games) wouldn't be able to access their own save data. The extra data can be partially created from data in the save data indexer, which should be enough for access control purposes.

commit 04dce402ac94679c5439038be1c8ce090e7ad4cb
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Mon Jul 12 16:48:57 2021 -0300

    Implement a fast path for I2M transfers (#2467)

commit 9b08abc644c4afcb1b4eb59bfbe8057727ad9d70
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Mon Jul 12 16:20:33 2021 -0300

    Fix shader compilation on shaders that uses rectangle textures (#2471)

commit 40b21cc3c4d2622bbd4f88d43073341854d9a671
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Sun Jul 11 17:20:40 2021 -0300

    Separate GPU engines (part 2/2) (#2440)

    * 3D engine now uses DeviceState too, plus new state modification tracking

    * Remove old methods code

    * Remove GpuState and friends

    * Optimize DeviceState, force inline some functions

    * This change was not supposed to go in

    * Proper channel initialization

    * Optimize state read/write methods even more

    * Fix debug build

    * Do not dirty state if the write is redundant

    * The YControl register should dirty either the viewport or front face state too, to update the host origin

    * Avoid redundant vertex buffer updates

    * Move state and get rid of the Ryujinx.Graphics.Gpu.State namespace

    * Comments and nits

    * Fix rebase

    * PR feedback

    * Move changed = false to improve codegen

    * PR feedback

    * Carry RyuJIT a bit more

commit b5190f16810eb77388c861d1d1773e19644808db
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Sun Jul 11 16:24:31 2021 -0300

    Fix virtual memory allocation being out of range (#2464)

commit 0d841c8d5104a09d2733c0e78f6d5b7ebc8fee3e
Author: Ac_K <Acoustik666@gmail.com>
Date:   Sat Jul 10 23:37:29 2021 +0200

    am: Implement CreateApplicationAndRequestToStart (#2448)

    This PR implements the `CreateApplicationAndRequestToStart` call; the result code is checked by RE.
    Now a guest program can restart itself. This is needed by SSBU when you change the game language.

    NOTE: This currently doesn't work with the OpenAL backend due to another issue.

    Closes #2108

commit b1a9d17cf85ab8322aeb700ad28d58f0edf63d08
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Sat Jul 10 16:50:10 2021 -0300

    Fix GetWritableRegion write-back (#2456)

commit 59900d7f00b14681acfc7ef5e8d1e18d53664e1c
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Fri Jul 9 00:09:07 2021 -0300

    Unscale textureSize when resolution scaling is used (#2441)

    * Unscale textureSize when resolution scaling is used

    * Fix textureSize on compute

    * Flag texture size as needing res scale values too

commit b02719cf4173c0ca26e6d562424eba68965ce59c
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Jul 7 21:20:52 2021 -0300

    Flush UBO updates more frequently (#2407)

commit 8b44eb1c981d7106be37107755c7c71c3c3c0ce4
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Jul 7 20:56:06 2021 -0300

    Separate GPU engines and make state follow official docs (part 1/2) (#2422)

    * Use DeviceState for compute and i2m

    * Migrate 2D class, more comments

    * Migrate DMA copy engine

    * Remove now unused code

    * Replace GpuState by GpuAccessorState on GpuAccessor, since compute no longer has a GpuState

    * More comments

    * Add logging (disabled)

    * Add back i2m on 3D engine

commit 31cbd09a75a9d5f4814c3907a060e0961eb2bb15
Author: Mary <me@thog.eu>
Date:   Tue Jul 6 22:08:44 2021 +0200

    frontend: Add an SDL2 headless window (#2310)

commit d125fce3e8c780c042040ac8064155cd6751d353
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Tue Jul 6 16:20:06 2021 -0300

    Allow shader language and target API to be specified on the shader translator (#2402)

commit b0ac1ade7fcde04f15384a329a0eca5ae9ed5065
Author: emmauss <emmausssss@gmail.com>
Date:   Tue Jul 6 19:07:23 2021 +0000

    Add portable screenshot folder (#2447)

    * add portable screenshot folder

    * fix style

    Co-authored-by: Ac_K <Acoustik666@gmail.com>

commit a6c2b5d6ec6d205a421e23b767ed9157c8296656
Author: Ac_K <Acoustik666@gmail.com>
Date:   Tue Jul 6 20:55:03 2021 +0200

    ui: Fixes GetShrinkedGamepadName (#2444)

    There was a wrong condition in `GetShrinkedGamepadName` which caused an out-of-bounds error if the controller name was equal to the checked value. It's now fixed and should close #2442.

commit 242e51c7f5da7f1bb044400332383c89ff379121
Author: Ac_K <Acoustik666@gmail.com>
Date:   Tue Jul 6 20:41:11 2021 +0200

    nifm: Fixes IsDynamicDnsEnabled not supported (#2443)

    For some reason, `IPInterfaceProperties.IsDynamicDnsEnabled` throws a `PlatformNotSupportedException` on Linux.
    This PR works around the issue with a `try/catch` that sets the value to false. Closes #2415.

commit b72f7de4057b3dee8581e58295f2db5fc563d50c
Author: Ac_K <Acoustik666@gmail.com>
Date:   Tue Jul 6 20:17:06 2021 +0200

    aoc: Fixes some inconsistencies (#2434)

    * aoc: Fixes some inconsistencies

    This PR fixes a wrong return value (introduced in #2414) which caused some DLC to not be recognized in some games, such as Super Robot Wars T.
    Additionally, I removed the EventHandle check because it could cause some issues, but sadly that didn't do the job, so I reverted the change. It should fix Diablo III: Eternal Collection.

    * Fix loop

    * Revert TitleLanguage change

    * write only available ids

commit 091edcebb4492eb8f666ee561661f4b46026c0c9
Author: mpnico <mpnico@gmail.com>
Date:   Tue Jul 6 20:04:21 2021 +0200

    Command line argument -f doesn't toggle 'Start games in fullscreen mode' (#2424)

    Closes Ryujinx#2308

commit ddb8351375fe58e37072a9577e1341a2e7f437d2
Author: Billy Laws <blaws05@gmail.com>
Date:   Tue Jul 6 18:49:51 2021 +0100

    Implement 12.0.0 hwopus functions (#2410)

    Based off of my RE of 12.0.2 audio services, the newly added parameter can be safely ignored due to ryu not using fixed-size I/O buffers.

commit 94cc365b635b0c42f6443af724ff0cdcb7ab00a3
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Sat Jul 3 05:55:04 2021 +0100

    Honour copy dependencies when switching render target (#2433)

    * Honour copy dependencies when switching render target

    When switching from one render target to another, when both have a copy dependency to each other, a copy can be deferred on the second target when unbinding the first.

    Before, this would not be honoured before binding the new texture, so the copy would stay deferred until the render targets change again, at which point it would copy in old data and essentially clear all the draws done during that time.

    This change runs synchronize memory to make sure that copies are honoured. This can cause a redundant copy, but it's better than it breaking for now.

    This should fix miiedit on AMD/Intel GPUs on Windows. May fix other games, or perhaps rare copy dependency bugs on NVIDIA too.

    * Address feedback

commit f4078ae2670bf98ce0e9e41359a3614ac802946f
Author: Ac_K <Acoustik666@gmail.com>
Date:   Tue Jun 29 22:52:17 2021 +0200

    aoc: Fix wrong check (#2427)

    This PR fixes a wrong check added in #2414 which made Pokémon crash.

commit 00ce9eea620652b97b4d3e8cd9218c6fccff8b1c
Author: Mary <me@thog.eu>
Date:   Tue Jun 29 19:37:13 2021 +0200

    Fix disposing of IPC sessions server at emulation stop (#2334)

commit fbb4019ed5c12c4a888c7b09db648ac595366896
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Tue Jun 29 14:32:02 2021 -0300

    Initial support for separate GPU address spaces (#2394)

    * Make GPU memory manager a member of GPU channel

    * Move physical memory instance to the memory manager, and the caches to the physical memory

    * PR feedback

commit 8cc872fb60ec1b825655ba8dba06cc978fcd7e66
Author: Ac_K <Acoustik666@gmail.com>
Date:   Tue Jun 29 18:57:06 2021 +0200

    aoc/am: Cleanup aoc service and stub am calls (#2414)

    * aoc/am: Cleanup aoc service and stub am calls

    This PR implements the aoc call `GetAddOnContentListChangedEventWithProcessId` (Closes #2408) and `CreateContentsServiceManager`. Additionally, a big cleanup (checked by RE on the latest firmware) was made across the whole service. I've also added `CountAddOnContent`, `ListAddOnContent` and `GetAddonContentBaseId` for games which require version `1.0.0-6.2.0`.

    Am service call `ReportUserIsActive` is stubbed (checked by RE, closes #2413).

    Since some of the logic in the aoc service that handles DLC has changed, it would be nice to have some testing to be sure there is no regression.

    * Remove wrong check

    * Addresses gdkchan feedback

    * Fix GetAddOnContentLostErrorCode

    * fix null pid in services

    * Add missing comment

    * remove leftover comment

commit 28618c58d7ee1ae63fc57deca791a64ab38b57af
Author: emmauss <emmausssss@gmail.com>
Date:   Mon Jun 28 20:09:43 2021 +0000

    Add Screenshot Feature (#2354)

    * Add internal screenshot capabilities

    * update version notice

commit a79b39b91347816ea14677b58af738b70df03e9c
Author: Ac_K <Acoustik666@gmail.com>
Date:   Mon Jun 28 20:54:45 2021 +0200

    no name: Mii Editor applet support (#2419)

    * no name: Mii Editor applet support

    * addresses gdkchan feedback

    * Fix comment

    * Bypass MountCounter of MiiDatabaseManager

    * Fix GetSettingsPlatformRegion

    * Disable Applet Menu for unsupported firmwares

commit fefd4619a5347b4ef86314a4e17e1d6e63ced297
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Fri Jun 25 20:11:54 2021 -0300

    Add support for custom line widths (#2406)

commit 493648df312b7501b0560a3c94b2deffab2e99cf
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Fri Jun 25 19:56:03 2021 -0300

    Fix default value for unwritten shader outputs (#2412)

    * Fix shader default output values

    * Shader cache version bump

commit ed2f5ede0f8d8f58390745f5e237bbfea36397fe
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Thu Jun 24 19:54:50 2021 -0300

    Fix texture sampling with depth compare and LOD level or bias (#2404)

    * Fix texture sampling with depth compare and LOD level or bias

    * Shader cache version bump

    * nit: Sorting

commit eac659e37bf2ad8398a959c91f7b30017e4ad7f3
Author: Ac_K <Acoustik666@gmail.com>
Date:   Fri Jun 25 00:37:48 2021 +0200

    caps: Stubs GetAlbumFileList0AafeAruidDeprecated and GetAlbumFileList3AaeAruid (#2403)

    This PR stubs the caps service calls `GetAlbumFileList0AafeAruidDeprecated` and `GetAlbumFileList3AaeAruid` (Closes #2035, Closes #2401); both are checked by RE.
    This avoids having to use "ignore missing services" when you want to play World of Light in Super Smash Bros Ultimate.

commit 3359b0fd975fc2b53631dd7110254f4e2df02a12
Author: ooa113y <13thSlayer@gmail.com>
Date:   Thu Jun 24 03:21:52 2021 +0300

    Improve gameTable search (#2398)

    * Improve gameTable search

    * Remove useless split

    * Remove unneeded brackets

    * Simplify searchEqualFunc

    Co-authored-by: Ac_K <Acoustik666@gmail.com>

    * Remove leftovers (oops)

    Co-authored-by: Ac_K <Acoustik666@gmail.com>

commit 77aab9aca302bbe635d94750f57fb9a1ad910b74
Author: emmauss <emmausssss@gmail.com>
Date:   Thu Jun 24 00:09:08 2021 +0000

    Add Direct Mouse Support (#2374)

    * add direct mouse support

    * hide cursor if mouse enabled

    * add config

    * update docs

    * sorted usings

commit a10b2c5ff26886e9ffc6f19e3f0fe9505a503b2f
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Jun 23 20:51:41 2021 -0300

    Initial support for GPU channels (#2372)

    * Ground work for separate GPU channels

    * Rename TextureManager to TextureCache

    * Decouple texture bindings management from the texture cache

    * Rename BufferManager to BufferCache

    * Decouple buffer bindings management from the buffer cache

    * More comments and proper disposal

    * PR feedback

    * Force host state update on channel switch

    * Typo

    * PR feedback

    * Missing us…
dtlnor added a commit to dtlnor/Ryujinx that referenced this pull request Oct 1, 2021
commit d92fff541bf6fddadabf6ab628ddf8fec41cd52e
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Wed Sep 29 01:27:03 2021 +0100

    Replace CacheResourceWrite with more general "precise" write (#2684)

    * Replace CacheResourceWrite with more general "precise" write

    The goal of CacheResourceWrite was to notify GPU resources when they were modified directly, by looking up the modified address/size in a structure and calling a method on each resource. The downside of this is that each resource cache has to be queried individually, they all have to implement their own way to do this, and it can only signal to resources using the same PhysicalMemory instance.

    This PR adds the ability to signal a write as "precise" on the tracking, which signals a special handler (if present) which can be used to avoid unnecessary flush actions, or maybe even more. For buffers, precise writes specifically do not flush, and instead punch a hole in the modified range list to indicate that the data on GPU has been replaced.

    The downside is that precise actions must ignore the page protection bits and always signal - as they need to notify the target resource to ignore the sequence number optimization.

    I had to reintroduce the sequence number increment after I2M, as removing it was causing issues in Rabbids Kingdom Battle. However, all resources modified by I2M are notified directly to lower their sequence number, so the problem is likely that another unrelated resource is not being properly updated. Thankfully, doing this does not affect performance in the games I tested.

    This should fix regressions from #2624. Test any games that were broken by that. (RF4, Rabbids Kingdom Battle)

    I've also added a sequence number increment to ThreedClass.IncrementSyncpoint, as it seems to fix buffer corruption in OpenGL homebrew. (this was a regression from removing sequence number increment from constant buffer update - another unrelated resource thing)

    * Add tests.

    * Add XML docs for GpuRegionHandle

    * Skip UpdateProtection if only precise actions were called

    This allows precise actions to skip reprotection costs.

commit b6e093b0fce3dc4fe607a84f57e6406b5ab8e387
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Wed Sep 29 01:11:05 2021 +0100

    Force copy when auto-deleting a texture with dependencies (#2687)

    When a texture is deleted by falling to the bottom of the AutoDeleteCache, its data is flushed to preserve any GPU writes that occurred. This ensures that the data appears in any textures recreated in the future, but didn't account for a texture that already existed with a copy dependency.

    This change forces copy dependencies to complete if a texture falls out of the AutoDeleteCache. (not removed via overlap, as that would be wasted effort)

    Fixes broken lighting caused by pausing in SMO's Metro Kingdom. May fix some other issues.

commit fd7567a6b56fcb82a52b85097582fc0a67038457
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Tue Sep 28 20:55:12 2021 -0300

    Only make render target 2D textures layered if needed (#2646)

    * Only make render target 2D textures layered if needed

    * Shader cache version bump

    * Ensure topology is updated on channel swap

commit 312be74861dae16311f4376e32195f8a4fd372c6
Author: FICTURE7 <FICTURE7@gmail.com>
Date:   Wed Sep 29 03:38:37 2021 +0400

    Optimize `HybridAllocator` (#2637)

    * Store constant `Operand`s in the `LocalInfo`

    Since the spill slot and register assigned is fixed, we can just store
    the `Operand` reference in the `LocalInfo` struct. This allows skipping
    hitting the intern-table for a look up.

    * Skip `Uses`/`Assignments` management

    Since the `HybridAllocator` is the last pass and we do not care about
    uses/assignments we can skip managing that when setting destinations or
    sources.

    * Make `GetLocalInfo` inlineable

    Also fix a possible issue with numbered locals. See or-assignment
    operator in `SetVisited(local)` before patch.

    * Do not run `BlockPlacement` in LCQ

    With the host mapped memory manager, there is a lot less cold code to
    split from hot code. So disabling this in LCQ gives some extra
    throughput - where we need it.

    * Address Mou-Ikkai's feedback

    * Apply suggestions from code review

    Co-authored-by: VocalFan <45863583+Mou-Ikkai@users.noreply.github.com>

    * Move check to an assert

    Co-authored-by: VocalFan <45863583+Mou-Ikkai@users.noreply.github.com>

commit 1ae690ba2f407042456207d40e425f8b1f900863
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Wed Sep 29 00:21:30 2021 +0100

    Use normal memory store path for DC ZVA (#2693)

    Seems like this is used as an optimized way to clear memory in homebrew applications. Unfortunately, calling the software fallback method every 8 bytes was not very optimal.

    The existing EmitStore is used by passing in ZR as the register to get a 0 write.
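
As a rough illustration of the approach described above (a Python sketch, not the actual JIT code), the block targeted by `DC ZVA` can be cleared with ordinary fixed-size stores of zero instead of a software fallback call per store. The names `dc_zva` and `ZVA_BLOCK_SIZE` and the 64-byte block size are assumptions made for the example; on real hardware the block size is reported by `DCZID_EL0`.

```python
ZVA_BLOCK_SIZE = 64  # assumed block size for this sketch


def dc_zva(memory: bytearray, address: int, store_size: int = 8) -> None:
    """Zero the aligned block containing `address` using plain stores."""
    # DC ZVA operates on the naturally aligned block that contains the address.
    base = address & ~(ZVA_BLOCK_SIZE - 1)
    zero = bytes(store_size)  # stands in for storing the zero register
    for offset in range(0, ZVA_BLOCK_SIZE, store_size):
        memory[base + offset : base + offset + store_size] = zero


memory = bytearray(b"\xff" * 256)
dc_zva(memory, 70)  # clears bytes 64..127, leaves the rest untouched
```

The point of the sketch is only that each store goes through the same path as any other memory write, so no special handler is invoked per 8 bytes.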

commit 33dc4c9ce40165795da884eaa684f16e8b643799
Author: Ac_K <Acoustik666@gmail.com>
Date:   Wed Sep 29 01:03:35 2021 +0200

    clkrst: Stub/Implement IClkrstManager and IClkrstSession calls (#2692)

    This PR stubs and implements some clkrst calls. They are used to overclock the Switch hardware, which is pointless in our case as we emulate the system.
    Everything was checked by RE.

    Fixes #2686

commit f4f496cb48a59aae36e3252baa90396e1bfadd2e
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Tue Sep 28 19:43:40 2021 -0300

    NVDEC (H264): Use separate contexts per channel and decode frames in DTS order (#2671)

    * Use separate NVDEC contexts per channel (for FFMPEG)

    * Remove NVDEC -> VIC frame override hack

    * Add missing bottom_field_pic_order_in_frame_present_flag

    * Make FFMPEG logging static

    * nit: Remove empty lines

    * New FFMPEG decoding approach -- call h264_decode_frame directly, trim surface cache to reduce memory usage

    * Fix case

    * Silence warnings

    * PR feedback

    * Per-decoder rather than per-codec ownership of surfaces on the cache

commit 0d23504e30395ba20d1704da464b41f3fe539062
Author: FICTURE7 <FICTURE7@gmail.com>
Date:   Wed Sep 29 02:28:34 2021 +0400

    Fix PTC count table relocation patching (#2666)

    Fix an issue introduced in #2190 whereby two different count table entry
    addresses were used for LCQ functions. E.g.:

    ```asm
     .L1:
       mov rbp,COUNT_TABLE_0   ;; This gets an address.
       mov ebp,[rbp]
       lea esi,[rbp+1]
       mov rdi,COUNT_TABLE_1   ;; This gets another address.
       mov [rdi],esi
       cmp ebp,64h
       je near .L34
    ```

    This caused LCQ functions to not tier up when they're loaded from the
    PTC cache. This does not happen when they're freshly compiled.

    This PR fixes the issue by ensuring only a single counter is created per
    translation.
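
The effect of the bug can be modelled with a small Python sketch (hypothetical names, not the PTC code). It mirrors the listing above: the old count is loaded from one slot, the incremented value is stored to a slot, and the function tiers up when the old count equals `64h` (100). If the two slots differ, the loaded value never changes.

```python
TIER_UP_THRESHOLD = 0x64  # the constant compared against in the listing


def call_lcq(count_table: list, read_slot: int, write_slot: int) -> bool:
    """Model one call: load the count, store count + 1, report tier-up."""
    old_count = count_table[read_slot]
    count_table[write_slot] = old_count + 1
    return old_count == TIER_UP_THRESHOLD


# One counter per translation: the function tiers up on call number 101.
fixed_table = [0]
fixed_results = [call_lcq(fixed_table, 0, 0) for _ in range(101)]

# Two different addresses: the slot that is read is never incremented.
broken_table = [0, 0]
broken_results = [call_lcq(broken_table, 0, 1) for _ in range(1000)]
```

Here `fixed_results` ends with a tier-up, while `broken_results` never contains one.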

commit 79c854dd2e68f96f802bbb42568e4c52e31fc80e
Author: Ac_K <Acoustik666@gmail.com>
Date:   Wed Sep 29 00:10:10 2021 +0200

    irs: Stub some service calls (#2665)

    This PR stubs some irs service calls which are needed to get some games playable, or at least bootable, since we don't support IR data through real Joy-Cons for now.

    - Stubs `IIrSensorServer` `StopImageProcessor`, `RunMomentProcessor`, `RunClusteringProcessor`, `RunImageTransferProcessor`, `GetImageTransferProcessorState`, `RunTeraPluginProcessor`. All calls are partially checked by RE.

    Closes #2267, #2248, #2126

    Night Vision and SpyAlarm are now bootable (but still unplayable due to the lack of the IR data).

commit 83bdafccdab01322af8503e9ad21f52981e646c1
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Tue Sep 28 18:52:27 2021 -0300

    Share scales array for graphics and compute (#2653)

commit 405840a24b75ecd7cab52c1960464071e2bc9b81
Author: VocalFan <45863583+Mou-Ikkai@users.noreply.github.com>
Date:   Tue Sep 28 17:26:45 2021 -0400

    Quick README update for game compatibility. (#2694)

commit 7c5ead1c196d597384085cc9a609afdc89a43774
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Sun Sep 19 14:09:53 2021 +0100

    Fast path for Inline2Memory buffer write that skips write tracking (#2624)

    * Fast path for Inline2Memory buffer write

    This PR adds a method to PhysicalMemory that attempts to write all cached resources directly, so that memory tracking can be avoided. The goal of this is both to avoid flushing buffer data, and to avoid raising the sequence number when data is written, which causes buffer and texture handles to be re-checked.

    This currently only targets buffers, with a side check on textures that falls back to a tracked write if any exist within the target range. It's not expected to write textures from here - this is just a mechanism to protect us if someone does decide to do that. It's possible to add a fast path for this in future (and for ShaderCache, once that starts using tracking)

    The forced read before inline2memory begins has been skipped, as the data is fully written when the transfer is completed anyways. This allows us to flush on read in emergency situations, but still write the new data over the flushed data.

    Improves performance on Xenoblade 2 and DE, which was flushing buffer data on the GPU thread when trying to write compute data. May improve performance in other games that write SSBOs from compute, and update data in the same/nearby pages often.

    Super Smash Bros Ultimate should probably be tested to make sure the vertex explosions haven't returned, as I think that's what this AdvanceSequence was for.

    * ForceDirty before write, to make sure data does not flush over the new write

commit db97b1d7d20715f60647c422cca5ec769a5e8223
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Sun Sep 19 13:55:07 2021 +0100

    Implement and use an Interval Tree for the MultiRangeList (#2641)

    * Implement and use an Interval Tree for the MultiRangeList

    * Feedback

    * Address Feedback

    * Missed this somehow

commit f08a280adef015e9a9a0e9273b4edffeb1157f3a
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Sun Sep 19 09:38:39 2021 -0300

    Use shader subgroup extensions if shader ballot is not supported (#2627)

    * Use shader subgroup extensions if shader ballot is not supported

    * Shader cache version bump + cleanup

    * The type is still required on the table

commit 7379bc2f39557929f283a423fe7f4b7390d08261
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Sun Sep 19 13:22:26 2021 +0100

    Array based RangeList that caches Address/EndAddress (#2642)

    * Array based RangeList that caches Address/EndAddress

    In isolation, this was more than 2x faster than the RangeList that checks using the interface. In practice I'm seeing much better results than I expected. The array is used because checking it is slightly faster than using a list, which loses time to struct copies, but I still want that data locality.

    A method has been added to the list to update the cached end address, as some users of the RangeList currently modify it dynamically.

    Greatly improves performance in Super Mario Odyssey, Xenoblade and any other GPU limited games.

    * Address Feedback
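
A minimal Python sketch of the caching idea, under stated assumptions: the class and method names (`CachedRangeList`, `update_end_address`, `find_overlaps`) are hypothetical and the real C# structure differs in detail. Start and end addresses are copied into flat arrays next to the items, so an overlap query compares plain integers and never has to ask each item for its bounds.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Buffer:
    address: int
    size: int

    @property
    def end_address(self) -> int:
        return self.address + self.size


class CachedRangeList:
    def __init__(self):
        self._starts = []  # cached start addresses, kept sorted
        self._ends = []    # cached end addresses, parallel to _starts
        self._items = []

    def add(self, item) -> None:
        index = bisect.bisect_right(self._starts, item.address)
        self._starts.insert(index, item.address)
        self._ends.insert(index, item.end_address)
        self._items.insert(index, item)

    def update_end_address(self, item) -> None:
        # Needed when a user resizes an item that is already in the list.
        self._ends[self._items.index(item)] = item.end_address

    def find_overlaps(self, address: int, size: int) -> list:
        end = address + size
        # Items starting at or after `end` cannot overlap the query.
        limit = bisect.bisect_left(self._starts, end)
        return [self._items[i] for i in range(limit) if self._ends[i] > address]
```

The query only touches the cached integer arrays until a match is found. The sketch scans every candidate below `limit`, which is simple rather than optimal.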

commit b0af010247a2bc1d9af1fb1068d4fad0319ad216
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Sun Sep 19 13:03:05 2021 +0100

    Set texture/image bindings in place rather than allocating and passing an array (#2647)

    * Remove allocations for texture bindings and state

    * Rent rather than stackalloc + copy

    A bit faster.

commit 32c09af71a5bebdb711b175627e1e26370275d96
Author: Mary <mary@mary.zone>
Date:   Sun Sep 19 13:42:16 2021 +0200

    amadeus: Fix regression from #2654 on ListAudioDeviceName

commit 40d1acd1982705224413bc882f6ae25d4bf8ee1a
Author: Ac_K <Acoustik666@gmail.com>
Date:   Sun Sep 19 12:57:39 2021 +0200

    vi: Unify resolutions values and accurate implementation of them. (#2640)

    * vi: Unify resolutions values and accurate implementation of them.

    To continue what was done in #2618, I've REd the `vi` service a bit. Values and checks related to displays are now more accurate.

    - `am`  GetDefaultDisplayResolution / GetDefaultDisplayResolutionChangeEvent have more information on what the service does.
    - `vi:u/vi:m/vi:s` GetDisplayService are now accurate.
    - `IApplicationDisplay` GetRelayService, GetSystemDisplayService, GetManagerDisplayService, GetIndirectDisplayTransactionService, ListDisplays, OpenDisplay, OpenDefaultDisplay, CloseDisplay, GetDisplayResolution are now properly implemented.
    - Some other calls are cleaned up or have extra checks according to RE.

    Additionally, `IFriendService` had some misaligned things, and the `pm:info` service placeholder was missing.

    * just use _openedDisplayInfo.Remove()

    * use context.Memory.Fill()

    * fix some casting

    * remove unneeded comment

    * cleanup

    * uses TryAdd

    * displayId > ulong

    * GetDisplayResolution > ulong

    * UL

commit e17eb7bfafdd95084baea8e9f3dc77ee3f755347
Author: Mary <me@thog.eu>
Date:   Sun Sep 19 12:29:19 2021 +0200

    amadeus: Update to REV10 (#2654)

    * amadeus: Update to REV10

    This implements all the changes made with REV10 on 13.0.0.

    * Address Ack's comment

    * Address gdkchan's comment

commit fe9d5a1981cfe43c4535b7473064c9858addb3b5
Author: mpnico <mpnico@gmail.com>
Date:   Sat Sep 18 14:31:44 2021 +0200

    Fix problems added by Pause (#2645)

    * Disable Pause/Resume menu instead of trying to hide them

    * Fix Resume menu being active before renderer starts

    * Fix emulator not being able to close properly

commit d327e809c9c9d1f4c035c50bf6315eea83ce0147
Author: Ac_K <Acoustik666@gmail.com>
Date:   Thu Sep 16 00:09:48 2021 +0200

    gui: Hotfix for FileChooserNative during section extraction (#2644)

    Fix a regression introduced in #2633, FileChooserNative parent can't be set to null because it's running in modal.

commit 843401635acb08686c745351666f86146de841b7
Author: MutantAura <44103205+MutantAura@users.noreply.github.com>
Date:   Wed Sep 15 01:26:10 2021 +0100

    Adjustments to framerate metric and addition of frametime (#2638)

    * Adjust framerate data and add frametime

    * Update PerformanceStatistics.cs

    * Revert deletions of average framerate

    * Update Ryujinx.csproj

    * Remove separate GTK column

    * Increase FPS precision

    * general cleanup

    * even generaler cleanup

    * fix dumb

    * Remove legacy code

    * Update PerformanceStatistics.cs

    * Update PerformanceStatistics.cs

commit fb2e61a435f6a9d2201d521df47c9bfd12031df3
Author: Michael Gielda <mgielda@antmicro.com>
Date:   Wed Sep 15 01:47:10 2021 +0200

    Add Linux Unicorn patch + desc. (#2609)

commit 5d08e9b495a2315e8a4758a8123466665085d044
Author: Ac_K <Acoustik666@gmail.com>
Date:   Wed Sep 15 01:24:49 2021 +0200

    hos: Cleanup the project (#2634)

    * hos: Cleanup the project

    Since a lot of changes have been made to the HOS project, there are some leftovers here and there: classes used by only one service, things in the wrong places, and more.
    This PR fixes that. Additionally, I've realigned some vars because I thought it made the code more readable.

    * Address gdkchan feedback

    * addresses Thog feedback

    * Revert ElfSymbol

commit 3f2486342b3ef4610b6af6a52624614d2a7ad8ae
Author: Ac_K <Acoustik666@gmail.com>
Date:   Tue Sep 14 23:52:08 2021 +0200

    gui: Replace FileChooserDialog by FileChooserNative (#2633)

    We currently use the FileChooser from GTK, which is a bit of a mess. Instead, we could use the native FileChooser of each OS. This is what this PR attempts to do.

    It would be nice to get a test under Linux, since I've only tested it under Windows (without any issues).

    Fixes #2584

commit a9343c9364246d3288b4e7f20919ca1ad2e1fd3e
Author: FICTURE7 <FICTURE7@gmail.com>
Date:   Tue Sep 14 03:23:37 2021 +0400

    Refactor `PtcInfo` (#2625)

    * Refactor `PtcInfo`

    This change reduces the coupling of `PtcInfo` by moving relocation
    tracking to the backend. `RelocEntry`s remain `RelocEntry`s throughout
    the pipeline until they actually need to be written to the PTC
    streams. Keeping this representation makes inspecting and manipulating
    relocations after compilations less painful. This is something I needed
    to do to patch relocations to 0 to diff dumps.

    Contributes to #1125.

    * Turn `Symbol` & `RelocInfo` into readonly structs

    * Add documentation to `CompiledFunction`

    * Remove `Compiler.Compile<T>`

    Remove `Compiler.Compile<T>` and replace it by `Map<T>` of the
    `CompiledFunction` returned.

commit ac4ec1a0151fd958d7ec58146169763b446836fe
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Sat Sep 11 17:54:18 2021 -0300

    Account for negative strides on DMA copy (#2623)

    * Account for negative strides on DMA copy

    * Should account for non-zero Y
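
A small Python sketch of the address arithmetic involved, assuming a linear copy described by a base address, a signed stride, a line length in bytes and a line count (the function name and parameters are hypothetical). With a negative stride the last line sits at the lowest address, so the touched range has to start there rather than at the base.

```python
def linear_copy_range(base: int, stride: int, line_bytes: int, line_count: int):
    """Return (start, size) of the memory touched by a strided linear copy.

    Assumes line_count >= 1. The stride may be negative.
    """
    first_line = base
    last_line = base + (line_count - 1) * stride
    start = min(first_line, last_line)
    end = max(first_line, last_line) + line_bytes
    return start, end - start


forward = linear_copy_range(0x1000, 0x100, 0x80, 4)
backward = linear_copy_range(0x1300, -0x100, 0x80, 4)
```

Both calls describe the same four lines, walked in opposite directions, so they yield the same range.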

commit 016fc64b3df8e039e62f3022139244061a00ec30
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Sat Sep 11 17:39:02 2021 -0300

    Implement GetVaRegions on nvservices (#2621)

    * Implement GetVaRegions on nvservices

    * This would just result in 0

commit a4089fc87871d795dea2723b8869ab22cf970085
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Sat Sep 11 17:24:10 2021 -0300

    Report 1080p resolution when in docked mode (#2618)

commit 117e32a6fffc30cdb895aa98483af7df353a8dd1
Author: mpnico <mpnico@gmail.com>
Date:   Sat Sep 11 22:08:25 2021 +0200

    Implement a "Pause Emulation" option & hotkey (#2428)

    * Add a "Pause Emulation" option and hotkey

    Closes Ryujinx#1604

    * Refactoring how pause is handled

    * Applied suggested changes from review

    * Applied suggested fixes

    * Pass correct suspend type to threads for suspend/resume

    * Fix NRE after stopping emulation

    * Removing SimulateWakeUpMessage call after resuming emulation

    * Skip suspending non game process

    * Pause the tickCounter in the ExecutionContext

    * Refactoring tickCounter pause/resume as suggested

    * Fix Config migration to add pause hotkey

    * Fixed pausing only application threads

    * Fix exiting emulator while paused

    * Avoid pause/resume while already paused/resumed

    * Cleanup unused code

    * Avoid restarting audio if stopping emulation while in pause.

    * Added suggested changes

    * Fix ConfigurationState

commit b0e410a828fd37bf0d9021fc2f6b630e3944a861
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Sat Sep 11 20:52:54 2021 +0100

    Lift textures in the AutoDeleteCache for all modifications. (#2615)

    * Lift textures in the AutoDeleteCache for all modifications.

    Before, this would only apply to render targets and texture blit. Now it applies to image stores, the fast dma copy path and any other type of modification.

    Image store always at least has one reference in the texture pool, so the function of the AutoDeleteCache keeping textures _alive_ is not useful, but a very important function for a while has been its use to flush textures in order of modification when they are dereferenced, so that their data is not lost.

    Before, textures populated using image stores were being dereferenced and reloaded as garbage. Now, when these textures are dereferenced, their data will be put back into memory, and everything stays intact.

    Fixes lighting breaking when switching levels in THPS1+2, and potentially some more UE4 games. I've tested a bunch more games for regressions and performance impact, but they all seem fine.

    * Lift copy srcTexture so that it doesn't remain referenceless

    * Perform lift before reference count change on unbind.

    It's important to lift on unbind as that is the moment the texture was truly last modified, but definitely not after releasing every single reference.
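
The behaviour described above can be sketched in Python as a bounded, modification-ordered cache (hypothetical names; the real AutoDeleteCache also manages reference counts, which are left out here). Lifting a texture moves it to the top, and a texture that falls off the bottom is flushed so that its data is written back before it goes away.

```python
from collections import OrderedDict


class AutoDeleteCacheSketch:
    def __init__(self, capacity: int, flush):
        self._capacity = capacity
        self._flush = flush            # callback that writes texture data back
        self._entries = OrderedDict()  # oldest modification first

    def lift(self, texture) -> None:
        """Record that `texture` was just modified."""
        if texture in self._entries:
            self._entries.move_to_end(texture)
            return
        self._entries[texture] = None
        if len(self._entries) > self._capacity:
            oldest, _ = self._entries.popitem(last=False)
            self._flush(oldest)


flushed = []
cache = AutoDeleteCacheSketch(capacity=2, flush=flushed.append)
for name in ["a", "b", "a", "c", "d"]:
    cache.lift(name)
```

Because "a" was lifted again before "c" arrived, "b" is flushed first and "a" second, which is the order of last modification.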

commit 197f5878027c5bbb89962217e061cfc1d1993b11
Author: Agustin Insua <Nistenf@users.noreply.github.com>
Date:   Sat Sep 11 16:32:36 2021 -0300

    Fix GTK3 mapping for single quote key (#2612)

commit bcbe6ef6cdbd22d22566cb17d5cd80f9cd15bbac
Author: Agustin Insua <Nistenf@users.noreply.github.com>
Date:   Sat Sep 11 16:16:48 2021 -0300

    Update game metadata when stopping emulation (#2610)

    * Update game metadata when stopping emulation

    * Fix formatting

commit 830d1f097d62ea4888d2d405f3c74b54948889aa
Author: bobhope <william.reindl01@gmail.com>
Date:   Sat Sep 11 14:59:11 2021 -0400

    Remove file error popup (#2547)

    * Added check to detect if application file is more than zero bytes long

    * Removed file error popup

    * Removed unnecessary usings

    * Added empty lines

commit f0b00c1ae92a2e8fdff9c532e7f2f79ad708b184
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Thu Sep 2 04:17:43 2021 +0100

    Fix TXQ for 3D textures. (#2613)

    * Fix TXQ for 3D textures.

    Assumes the texture is 3D if the component mask contains Z.

    This fixes a bug in UE4 games where parts of the map had garbage pointers to lighting voxels, as the lookup 3D texture was not being initialized. Most notable game is THPS1+2.

    May need another PR to keep image store data alive and properly flush it in order using the AutoDeleteCache.

    * Get sampler type for TextureSize from bound textures.

commit 142cededd4db2ff4f83a4833580d343a4f0a8cde
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Tue Aug 31 06:51:57 2021 +0100

    Implement Shader Instructions SUATOM and SURED (#2090)

    * Initial Implementation

    * Further improvements (no support for float/64-bit types)

    * Merge atomic and reduce instructions, add missing format switch

    * Fix rebase issues.

    * Not used.

    * Whoops. Fixed.

    * Partial implementation of inc/dec, cleanup and TODOs

    * Remove testing path

    * Address Feedback

commit 416dc8fde49f8eb42d47b1ab606028a5cabe8f90
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Mon Aug 30 14:02:40 2021 -0300

    Fix out-of-bounds shader thread shuffle (#2605)

    * Fix out-of-bounds shader thread shuffle

    * Shader cache version bump

commit 82cefc8dd3babb781d4b7229435e26911fb083dd
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Sun Aug 29 16:52:38 2021 -0300

    Handle indirect draw counts with non-zero draw starts properly (#2593)

commit 15e7fe3ac940a1768a25326e66683ad0f23127e0
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Sun Aug 29 20:22:13 2021 +0100

    Avoid deleting textures when their data does not overlap. (#2601)

    * Avoid deleting textures when their data does not overlap.

    It's possible that, while two textures' start and end addresses indicate an overlap, the actual data contained within them is sparse due to a layer stride. One such possibility is array slices of a cubemap at different mip levels - they overlap as a whole, but the actual texture data fills the gaps between each other's layers rather than actually overlapping.

    This fixes issues with UE4 games having incorrect lighting (solid white screen or really dark shadows). There are still remaining issues with games that use the 3D texture prebaked lighting, such as THPS1+2.

    This PR also fixes a bug with TexturePool's resized texture handling where the base level in the descriptor was not considered.

    * AllRegions granularity for 3d textures is now by level rather than by slice.

    * Address feedback

commit 54adc5f9fb65f4b03bc28da5899d2413a84f66c2
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Sun Aug 29 20:03:41 2021 +0100

    Ensure that all threads wait for a read tracking action to complete. (#2597)

    * Lock around tracking action consume + execute. Not particularly fast.

    * Lock around preaction registration and use

    * Create a lock object

    * Nit

commit 76e8f9ac87c0164f4f09600dcf8b6a5b4d062bf5
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Fri Aug 27 21:08:30 2021 +0100

    Only reupload the texture scale array if it changes. (#2595)

    * Only reupload the texture scale array if it changes.

    Before, this would be called all the time if any shader needed a scale value. The cost of doing this has increased with threaded-gal, as the scale array is copied to a span pool, and it was sometimes called on pretty much every draw.

    This improves GPU performance in games, scaled or not. Most affected game seems to be Xenoblade Chronicles: Definitive Edition.
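
    The general pattern is to remember the last uploaded values and skip the upload when nothing changed. A minimal Python sketch of that pattern follows; the class name and the `upload` callback are hypothetical stand-ins, not the actual renderer API:

    ```python
    class ScaleArrayUploader:
        """Uploads a scale array only when its contents differ from the last upload."""

        def __init__(self, upload):
            self._upload = upload  # hypothetical callback standing in for the backend call
            self._last = None

        def update(self, scales):
            scales = tuple(scales)
            if scales == self._last:
                return False  # unchanged: skip the copy and the upload
            self._last = scales
            self._upload(scales)
            return True
    ```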

    * Just use = instead of |=

commit ee1038e54255797a94b89091f4d59b77daad1a7b
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Thu Aug 26 20:44:47 2021 -0300

    Initial support for shader attribute indexing (#2546)

    * Initial support for shader attribute indexing

    * Support output indexing too, other improvements

    * Fix order

    * Address feedback

commit ec3e848d7998038ce22c41acdbf81032bf47991f
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Thu Aug 26 23:31:29 2021 +0100

    Add a Multithreading layer for the GAL, multi-thread shader compilation at runtime (#2501)

    * Initial Implementation

    About as fast as nvidia GL multithreading, can be improved with faster command queuing.

    * Struct based command list

    Speeds up a bit. Still a lot of time lost to resource copy.

    * Do shader init while the render thread is active.

    * Introduce circular span pool V1

    Ideally should be able to use structs instead of references for storing these spans on commands. Will try that next.

    * Refactor SpanRef some more

    Use a struct to represent SpanRef, rather than a reference.

    * Flush buffers on background thread

    * Use a span for UpdateRenderScale.

    Much faster than copying the array.

    * Calculate command size using reflection

    * WIP parallel shaders

    * Some minor optimisation

    * Only 2 max refs per command now.

    The command with 3 refs is gone. :relieved:

    * Don't cast on the GPU side

    * Remove redundant casts, force sync on window present

    * Fix Shader Cache

    * Fix host shader save.

    * Fixup to work with new renderer stuff

    * Make command Run static, use array of delegates as lookup

    Profile says this takes less time than the previous way.

    * Bring up to date

    * Add settings toggle. Fix Multithreading Off mode.

    * Fix warning.

    * Release tracking lock for flushes

    * Fix Conditional Render fast path with threaded gal

    * Make handle iteration safe when releasing the lock

    This is mostly temporary.

    * Attempt to set backend threading on driver

    Only really works on nvidia before launching a game.

    * Fix race condition with BufferModifiedRangeList, exceptions in tracking actions

    * Update buffer set commands

    * Some cleanup

    * Only use stutter workaround when using opengl renderer non-threaded

    * Add host-conditional reservation of counter events

    There has always been the possibility that conditional rendering could use a query object just as it is disposed by the counter queue. This change makes it so that when the host decides to use host conditional rendering, the query object is reserved so that it cannot be deleted. Counter events can optionally start reserved, as the threaded implementation can reserve them before the backend creates them, and there would otherwise be a short amount of time where the counter queue could dispose the event before a call to reserve it could be made.

    * Address Feedback

    * Make counter flush tracked again.

    Hopefully does not cause any issues this time.

    * Wait for FlushTo on the main queue thread.

    Currently assumes only one thread will want to FlushTo (in this case, the GPU thread)

    * Add SDL2 headless integration

    * Add HLE macro commands.

    Co-authored-by: Mary <mary@mary.zone>

commit 501c3d5cea6b96f991453cc6f8d395d358d0d4c3
Author: Mary <me@thog.eu>
Date:   Fri Aug 27 00:07:44 2021 +0200

    Implement MSR instruction for A32 (#2585)

    * Implement MSR instruction

    Fix #1342.

    Now Pocket Rumble is playable.

    * Address gdkchan's comments

    * Address gdkchan's comments

    * Address gdkchan's comment

commit 8e1adb95cf7f67b976f105f4cac26d3ff2986057
Author: mpnico <mpnico@gmail.com>
Date:   Thu Aug 26 23:50:28 2021 +0200

    Add support for HLE macros and accelerate MultiDrawElementsIndirectCount #2 (#2557)

    * Add support for HLE macros and accelerate MultiDrawElementsIndirectCount

    * Add missing barrier

    * Fix index buffer count

    * Add support check for each macro hle before use

    * Add missing xml doc

    Co-authored-by: gdkchan <gab.dark.100@gmail.com>

commit 5cab8ea4ad2388bd035150e79f241ae5df95ab3b
Author: VocalFan <45863583+Mou-Ikkai@users.noreply.github.com>
Date:   Thu Aug 26 17:34:24 2021 -0400

    Fix Unicorn Warnings (#2575)

commit 32cad88cc60793f9ee6c11d15e8b5b71c3d725a2
Author: Alex Barney <thealexbarney@gmail.com>
Date:   Thu Aug 26 14:18:49 2021 -0700

    Bugfix LibHac update to 0.13.3 and remove SD card workaround (#2579)

commit 686b63e4794b975f8bb3cc5e03b2c9063c4d045f
Author: VocalFan <45863583+Mou-Ikkai@users.noreply.github.com>
Date:   Thu Aug 26 17:03:19 2021 -0400

    Added fallbacks for all Audio Backends (#2582)

    * Added fallbacks for all Audio Backends

    * Commit Suggestion

    Co-authored-by: gdkchan <gab.dark.100@gmail.com>

    * Resolve elses.

    * Revert "Resolve elses."

    This reverts commit 9aec3e9e7750803e5626da3d01d7e57fabe19f65.

    * Suggestions completed

    Co-authored-by: gdkchan <gab.dark.100@gmail.com>

commit 5b8ceb917308378c535fb4cd2288b8f19524bea0
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Thu Aug 26 17:47:21 2021 -0300

    Swap BGR565 components by changing the format (#2577)

commit 6d9bc7cf90e8016feea97eedb3cdd562c4628026
Author: Mary <me@thog.eu>
Date:   Thu Aug 26 22:26:28 2021 +0200

    sdl2: Update to Ryujinx.SDL2-CS 2.0.17 (#2553)

    * sdl2: Update to Ryujinx.SDL2-CS 2.0.17

    Update to latest SDL2 commit

    * Update to Ryujinx.SDL2-CS 2.0.17-build18

commit 5e99bff7deb51ad6d69d2393bc267a8a9428058c
Author: Alex Barney <thealexbarney@gmail.com>
Date:   Fri Aug 20 16:03:17 2021 -0700

    Ignore exceptions when cleaning the SD card saves (#2576)

commit d753de6d5de17cfaf36bb5ecfeff0f0d60846171
Author: VocalFan <45863583+Mou-Ikkai@users.noreply.github.com>
Date:   Fri Aug 20 17:48:00 2021 -0400

    Seeing if there are any other spelling errors to correct. (#2572)

    * "Informations" -> "Information"

    * Your -> You

    * will use -> using (Plus more detailed Appveyor error msg.)

    * Did a dumb thing, fixed it.

commit c702943af3c7e9396b8fa86e3c1be3cb9339addc
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Fri Aug 20 18:26:25 2021 -0300

    Swap BGR components for 16-bit BGR texture formats (#2567)

commit 6c76bc3bc0ecd1d3a86cf4e8c396c71370274ba1
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Fri Aug 20 22:09:30 2021 +0100

    Change disabled vertex attribute value to (0, 0, 0, 1) (#2573)

    This seems to be the default value when the vertex attribute is disabled, or components aren't defined.

    This fixes a regression from #2307 in SMO where a plant in the Wooded Kingdom would draw slightly differently in the depth prepass, leading to depth test failing later on.

    GDK has stated that the specific case in Gundam only expects x and y to be 0, and Vulkan's undefined value for z does appear to be 0 when a vertex attribute type does not have that component, hence the value (0, 0, 0, 1).

    This worked in Vulkan despite also providing an all 0s attribute due to the vertex attribute binding being R32Float, so the other values were undefined. It should be changed there separately.

commit bdc1f91a5b459a25cb74de9895d0136cf29d220d
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Fri Aug 20 21:52:09 2021 +0100

    Remove pool cache entries for incompatible overlapping textures (#2568)

    This greatly reduces memory usage in games that aggressively reuse memory without removing dead textures from the pool, such as the Xenoblade games, UE3 games, and to a lesser extent, UE4/Unity games.

    This change stops memory usage from ballooning in Xenoblade and some other games. It will also reduce texture view/dependency complexity in some games - for example in MK8D it will reduce the number of surface copies between lighting cubemaps generated for actors.

    There shouldn't be any performance impact from doing this, though the deletion and creation of textures could be improved by improving the OpenGL texture storage cache, which is very simple and limited right now. This will be improved in future.

    Another potential error in the texture cache has been fixed, preventing data loss when data is interchangeably written to textures from both the GPU and CPU. It was possible that the dirty flag for a texture would be consumed without the data being synchronized on next use, due to the old overlap check. This check no longer consumes the dirty flag.

    Please test a bunch of games to make sure they still work, and there are no performance regressions.

commit e0af248e6f96efe7009915935407fc809eb774a9
Author: Alex Barney <thealexbarney@gmail.com>
Date:   Fri Aug 20 13:36:14 2021 -0700

    Clean the SD card save directory when opening the emulator (#2564)

    Cleans "sdcard:/Nintendo/save" and deletes "sdcard:/save" when opening the emulator.

    Works around invalid encryption when keys or the SD card encryption seed are changed.

commit 97aedc030d24bc5e32fa95a297155f2df2ecfcc2
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Fri Aug 20 18:59:39 2021 +0100

    Fix GetHandleInformation for mipmapped 3d textures (#2569)

    Got this the wrong way round - was causing games to try to synchronize mipmap levels of like 52 on a 3d texture with 6 levels. Also, corrected the variable name in the method that _was_ working.

commit f2a7b300c471ee7ad9a925f1085ad86537f68154
Author: FICTURE7 <FICTURE7@gmail.com>
Date:   Fri Aug 20 21:42:00 2021 +0400

    Fix type mismatch in `BitwiseAnd` simplification (#2571)

    * Fix type mismatch in `BitwiseAnd` simplification

    `TryEliminateBitwiseAnd` would turn the `BitwiseAnd` operation into a
    copy of the wrong type. E.g:

    Before `Simplification`:
    ```llvm
    i64 %0 = BitwiseAnd i64 0x0, %1
    ```

    After `Simplification`:
    ```llvm
    i64 %0 = Copy i32 0x0
    ```

    Since, with the changes in #2515, we iterate in reverse order and
    `Simplification` and `ConstantFolding` do not indicate if they modified
    the CFG, the second pass to "retype" the copy into the proper
    destination type does not happen.

    This also blocked copy propagation since its destination type did not
    match with its source type. But in the cases I've seen, the
    `PreAllocator` would insert a copy for the propagated constant, which
    results in no diffs.

    Since the copy remained as is, asserts are fired when generating it.
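
    One way to avoid this kind of mismatch is to build the replacement constant with the destination's type in the first place. The Python sketch below only illustrates that idea; `Const` and `simplify_and_zero` are hypothetical names, and this is not the actual `TryEliminateBitwiseAnd` code:

    ```python
    from dataclasses import dataclass


    @dataclass(frozen=True)
    class Const:
        type: str
        value: int


    def simplify_and_zero(dest_type, lhs, rhs):
        """Rewrite `dest = lhs & rhs` as a copy of zero when either operand is a zero constant.

        The zero is typed like the destination, so no later "retype" pass is needed.
        Returns None when the simplification does not apply.
        """
        for operand in (lhs, rhs):
            if isinstance(operand, Const) and operand.value == 0:
                return ("Copy", dest_type, Const(dest_type, 0))
        return None
    ```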

    * Set PPTC version

commit 22b2cb39af00fb8881e908fd671fbf57a6e2db2a
Author: FICTURE7 <FICTURE7@gmail.com>
Date:   Tue Aug 17 22:08:34 2021 +0400

    Reduce JIT GC allocations (#2515)

    * Turn `MemoryOperand` into a struct

    * Remove `IntrinsicOperation`

    * Remove `PhiNode`

    * Remove `Node`

    * Turn `Operand` into a struct

    * Turn `Operation` into a struct

    * Clean up pool management methods

    * Add `Arena` allocator

    * Move `OperationHelper` to `Operation.Factory`

    * Move `OperandHelper` to `Operand.Factory`

    * Optimize `Operation` a bit

    * Fix `Arena` initialization

    * Rename `NativeList<T>` to `ArenaList<T>`

    * Reduce `Operand` size from 88 to 56 bytes

    * Reduce `Operation` size from 56 to 40 bytes

    * Add optimistic interning of Register & Constant operands

    * Optimize `RegisterUsage` pass a bit

    * Optimize `RemoveUnusedNodes` pass a bit

    Iterating in reverse-order allows killing dependency chains in a single
    pass.

    * Fix PPTC symbols

    * Optimize `BasicBlock` a bit

    Reduce allocations from `_successor` & `DominanceFrontiers`

    * Fix `Operation` resize

    * Make `Arena` expandable

    Change the arena allocator to be expandable by allocating in pages, with
    some of them being pooled. Currently 32 pages are pooled. An LRU removal
    mechanism should probably be added to it.

    Apparently MHR can allocate bitmaps large enough to exceed the 16MB
    limit for the type.

    * Move `Arena` & `ArenaList` to `Common`

    * Remove `ThreadStaticPool` & co

    * Add `PhiOperation`

    * Reduce `Operand` size from 56 to 48 bytes

    * Add linear-probing to `Operand` intern table

    * Optimize `HybridAllocator` a bit

    * Add `Allocators` class

    * Tune `ArenaAllocator` sizes

    * Add page removal mechanism to `ArenaAllocator`

    Remove pages which have not been used for more than 5s after each reset.

    I am on the fence about whether this would be better using a Gen2 callback object like
    the one in System.Buffers.ArrayPool<T>, to trim the pool. Because right
    now if a large translation happens, the pages will be freed only after a
    reset. This reset may not happen for a while because no new translation
    is hit, but the arena base sizes are rather small.

    * Fix `OOM` when allocating larger than page size in `ArenaAllocator`

    Tweak resizing mechanism for Operand.Uses and Assignments.

    * Optimize `Optimizer` a bit

    * Optimize `Operand.Add<T>/Remove<T>` a bit

    * Clean up `PreAllocator`

    * Fix phi insertion order

    Reduce codegen diffs.

    * Fix code alignment

    * Use new heuristics for degree of parallelism

    * Suppress warnings

    * Address gdkchan's feedback

    Renamed `GetValue()` to `GetValueUnsafe()` to make it more clear that
    `Operand.Value` should usually not be modified directly.

    * Add fast path to `ArenaAllocator`

    * Assembly for `ArenaAllocator.Allocate(ulong)`:

      .L0:
        mov rax, [rcx+0x18]
        lea r8, [rax+rdx]
        cmp r8, [rcx+0x10]
        ja short .L2
      .L1:
        mov rdx, [rcx+8]
        add rax, [rdx+8]
        mov [rcx+0x18], r8
        ret
      .L2:
        jmp ArenaAllocator.AllocateSlow(UInt64)

      A few variables/fields had to be changed to ulong so that RyuJIT avoids
      emitting zero-extends.
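
    Read as a bump allocator (an interpretation of the listing, not something the commit states), the fast path checks whether the request fits in the current page, bumps the index, and otherwise jumps to a slow path. A Python sketch of that shape, where the class name, the page handling and the `(page, offset)` return value are all assumptions made for illustration:

    ```python
    class ArenaSketch:
        """Toy bump allocator: fast path bumps an index, slow path adds a page."""

        def __init__(self, page_size):
            self.page_size = page_size
            self.pages = [bytearray(page_size)]
            self.index = 0  # next free offset in the current page

        def allocate(self, size):
            offset = self.index
            end = offset + size
            if end > self.page_size:  # corresponds to the `ja` branch in the listing
                return self._allocate_slow(size)
            self.index = end
            return (len(self.pages) - 1, offset)

        def _allocate_slow(self, size):
            # Assumed behaviour: start a fresh page, oversized if the request needs it.
            self.pages.append(bytearray(max(self.page_size, size)))
            self.index = size
            return (len(self.pages) - 1, 0)
    ```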

    * Implement a new heuristic to free pooled pages.

      If an arena is used often, it is more likely that its pages will be
      needed, so the pages are kept for longer (e.g: during PPTC rebuild or
      bursts of compilations). If it is not used often, then it is more likely
      that its pages will not be needed (e.g: after PPTC rebuild or bursts
      of compilations).

    * Address riperiperi's feedback

    * Use `EqualityComparer<T>` in `IntrusiveList<T>`

    Avoids a potential GC hole in `Equals(T, T)`.

commit cd4530f29c6a4ffd1b023105350b0440fa63f47b
Author: Alex Barney <thealexbarney@gmail.com>
Date:   Tue Aug 17 10:46:52 2021 -0700

    Always use an all-zeros key for AES-XTS file systems (#2561)

commit 680d3ed198ba6211d8357e370f0d29f1b5e95c74
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Tue Aug 17 14:09:27 2021 -0300

    Enable transform feedback buffer flush (#2552)

commit dadc0e59daa89c4dd7f0c3356f302481a4e75e6d
Author: Alex Barney <thealexbarney@gmail.com>
Date:   Thu Aug 12 14:56:24 2021 -0700

    Update to LibHac 0.13.1 (#2475)

    * Update to LibHac 0.13.1

    * Recreate directories for indexed saves if they're missing on emulator start

commit 3977d1f72b8f091443018b68277044a840931054
Author: ooa113y <13thSlayer@gmail.com>
Date:   Thu Aug 12 23:48:15 2021 +0300

    Improve firmware install error due to outdated keys (#2541)

commit eb181425b16567eea5c67d696f9236f868b40e92
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Thu Aug 12 15:59:24 2021 -0300

    Fix size of cached compute shaders (#2548)

    * Fix size of cached compute shaders

    * Missed one

commit 8196086f7a61f39f6177e7988a371136c7301870
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Aug 11 22:13:48 2021 -0300

    Revert "Calculate vertex buffer sizes from index buffer (#1663)" (#2544)

    This reverts commit 10d649e6d3ad3e4af32d2b41e718bb0a2924da67.

commit 0ba4ade8f1fa64e2bda5c4b1e0c5b37e10d51c80
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Aug 11 19:44:41 2021 -0300

    Ensure render scale is initialized to 1 on the backend (#2543)

commit 3148c0c21cb45a92ff77344027757fb4808bb3cb
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Aug 11 18:56:59 2021 -0300

    Unify GpuAccessorBase and TextureDescriptorCapableGpuAccessor (#2542)

    * Unify GpuAccessorBase and TextureDescriptorCapableGpuAccessor

    * Shader cache version bump

commit d44d8f2eb6bb97f185add50e61443e79e8581123
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Aug 11 18:19:28 2021 -0300

    Workaround for cubemap view data upload bug on Intel (#2539)

    * Workaround for cubemap view data upload bug on Intel

    * Trigger CI

commit c3e2646f9e330633b0ed5e0038a976e33054a819
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Aug 11 18:01:06 2021 -0300

    Workaround for Intel FrontFacing built-in variable bug (#2540)

commit 0a80a837cb30402cad1f41293134edbaeeec6451
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Wed Aug 11 21:44:51 2021 +0100

    Use "Undesired" scale mode for certain textures rather than blacklisting (#2537)

    * Use "Undesired" scale mode for certain textures rather than blacklisting

    * Nit

    Co-authored-by: gdkchan <gab.dark.100@gmail.com>

commit ed754af8d5046d2fd7218c742521e38ab17cbcfe
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Aug 11 17:27:00 2021 -0300

    Make sure attributes used on subsequent shader stages are initialized (#2538)

commit 10d649e6d3ad3e4af32d2b41e718bb0a2924da67
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Aug 11 17:06:09 2021 -0300

    Calculate vertex buffer sizes from index buffer (#1663)

    * Calculate vertex buffer size from maximum index buffer index

    * Increase maximum index buffer count for it to be considered profitable for counting

commit bb8a920b63d6d287dba8ec42e298329b933f9654
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Aug 11 16:50:33 2021 -0300

    Do not dirty memory tracking region handles if they are partially unmapped (#2536)

commit 0f6ec446ea3be41b1c22aa5c3870bd7a6c595d1f
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Aug 11 16:33:43 2021 -0300

    Replace BGRA and scale uniforms with a uniform block (#2496)

    * Replace BGRA and scale uniforms with a uniform block

    * Setting the data again on program change is no longer needed

    * Optimize and resolve some warnings

    * Avoid redundant support buffer updates

    * Some optimizations to BindBuffers (now inlined)

    * Unify render scale arrays

commit b5b7e23fc41e7045f9e803d6926e98ec7d049f0c
Author: jduncanator <1518948+jduncanator@users.noreply.github.com>
Date:   Thu Aug 12 05:16:42 2021 +1000

    hle: Tidy-up ServiceNotImplementedException (#2535)

    * hle: Simplify ServiceNotImplementedException

    This removes the need to pass in whether the command is a Tipc command or a Hipc command to the exception constructor.

    * hle: Use the IPC Message type to determine command type

    This allows differentiating between Tipc and Hipc commands when invoking a handler that supports handling both Tipc and Hipc commands.

commit d9d18439f6900fd9f05bde41998526281f7638c5
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Aug 11 15:59:42 2021 -0300

    Use a new approach for shader BRX targets (#2532)

    * Use a new approach for shader BRX targets

    * Make shader cache actually work

    * Improve the shader pattern matching a bit

    * Extend LDC search to predecessor blocks, catches more cases

    * Nit

    * Only save the amount of constant buffer data actually used. Avoids crashes on partially mapped buffers

    * Ignore Rd on predicate instructions, as they do not have a Rd register (catches more cases)

commit 70f79e689bc947313aab11c41e59928ce43be517
Author: mpnico <mpnico@gmail.com>
Date:   Thu Aug 5 00:39:40 2021 +0200

    Implement vibrations (#2468)

    * First working vibration implementation

    * Fix Infinite Rumble in SDL2Mouse

    * Stop ignoring one vibValues every 2

    * Remove RumbleInfinity as suggested

    * Reworked all the vibration handle / calculation

    * Revert HidVibrationDevicePosition changes

    * Add UI to enable and tune rumble

    * Remove some stub logs

    * Add PlayerIndex in rumble debug log

    * Fix all requested changes

    * Implements hid::GetVibrationDeviceInfo

    * Better implements HidVibrationValue.Equals/GetHashCode

    * Added requested changes from code review

    * Last fixes from review

    * Update configuration file version for rebase

commit 46ffc81d90bd3a8f2d24c2997166d22f12ecbbb6
Author: ooa113y <13thSlayer@gmail.com>
Date:   Thu Aug 5 00:28:19 2021 +0300

    Hide UI rework/arrow key fix (#2504)

    * Unbreak arrow keys

    * Use bitshift for Flags instead of literal

commit 5ceaf344ce02931da897c943048b5e653050038b
Author: emmauss <emmausssss@gmail.com>
Date:   Wed Aug 4 21:08:33 2021 +0000

    Clamp controller sticks to circle, instead of square (#2493)

    * clamp controller sticks to circle, instead of square

    * fix deadzone

    * addressed comments

commit ff5df5d8a1fec6947f7feed3ec3ca0889cd892a5
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Aug 4 17:20:58 2021 -0300

    Support non-contiguous copies on I2M and DMA engines (#2473)

    * Support non-contiguous copies on I2M and DMA engines

    * Vector copy should start aligned on I2M

    * Nits

    * Zero extend the offset

commit ff8849671af5ac14fc9cc9d37da30f53d3f13d89
Author: Caian Benedicto <caianbene@gmail.com>
Date:   Wed Aug 4 17:05:17 2021 -0300

    Update TamperMachine and disable write-to-code prevention (#2506)

    * Enable write to memory and improve logging

    * Update tamper machine opcodes and improve reporting

    * Add Else support

    * Add missing private statement

commit a27986c31167d8ce60efcee7e901da241f63ed08
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Aug 4 15:28:33 2021 -0300

    Make audio disposal thread safe on all 3 backends (#2527)

    * Make audio disposal thread safe on all 3 backends

    * Make OpenAL more consistent with the other backends

    * Remove Window.Cursor = null, and change dummy TValue to byte

commit 06cd3abe6c5a8d86bf2473089c489415ce8c4573
Author: ooa113y <13thSlayer@gmail.com>
Date:   Sat Jul 24 21:48:00 2021 +0300

    Implement "hide UI" option (#2411)

    * Implement jduncanator method

    * Rename function/button ID

    * Move option to Actions menu (makes no sense while emulation is inactive...)

commit 8c7986eb58ec7130c1e3698cae02eb20ac52ab11
Author: emmauss <emmausssss@gmail.com>
Date:   Fri Jul 23 23:01:36 2021 +0000

    Ensure right joycon motion data is set (#2488)

    * motion fix

    * mirror motion data on right joycon in pair mode when using native motion source

    * fix

    * addressed comments

commit 4b60371e64601dba46387f8b7260b3deb770e097
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Mon Jul 19 23:10:54 2021 +0100

    Return mapped buffer pointer directly for flush, WriteableRegion for textures (#2494)

    * Return mapped buffer pointer directly for flush, WriteableRegion for textures

    A few changes here to generally improve performance, even for platforms not using the persistent buffer flush.

    - Texture and buffer flush now return a ReadOnlySpan<byte>. It's guaranteed that this span is pinned in memory, but it will be overwritten on the next flush from that thread, so it is expected that the data is used before calling again.
    - As a result, persistent mappings no longer copy to a new array - rather the persistent map is returned directly as a Span<>. A similar host array is used for the glGet flushes instead of allocating new arrays each time.
    - Texture flushes now do their layout conversion into a WriteableRegion when the texture is not MultiRange, which allows the flush to happen directly into guest memory rather than into a temporary span, then copied over. This avoids another copy when doing layout conversion.

    Overall, this saves 1 data copy for buffer flush, 1 copy for linear textures with matching source/target stride, and 2 copies for block textures or linear textures with mismatching strides.
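
    The contract described above (the returned data is a view into a reused buffer and is overwritten by the next flush) can be illustrated with a small Python sketch. `FlushSketch` is a hypothetical stand-in; the real code returns a `ReadOnlySpan<byte>` over a persistent mapping or host array:

    ```python
    class FlushSketch:
        """Returns read-only views into one reused host buffer instead of new arrays."""

        def __init__(self, capacity):
            self._host = bytearray(capacity)

        def flush(self, data):
            n = len(data)
            view = memoryview(self._host)
            view[:n] = data  # stands in for the GPU writing into the mapping
            return view[:n].toreadonly()  # no copy: caller sees the shared buffer
    ```

    A caller that needs the data to outlive the next flush has to copy it first (for example with `bytes(view)`), since a second `flush` call changes what the earlier view shows.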

    * Fix tests

    * Fix array pointer for Mesa/Intel path

    * Address some feedback

    * Update method for getting array pointer.

commit 10e17ab423ba84856eb7411ce04d38989ac70a58
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Sun Jul 18 15:45:50 2021 +0100

    Only use persistent buffers to flush on NVIDIA and Windows+AMD (#2489)

    It seems like this method of flushing data is much slower on Mesa drivers, and slightly slower on Intel Windows. Have not tested Intel Mesa, but I'm assuming it is the same as AMD.

    This also adds vendor detection for AMD on Unix, which counted as "Unknown" before.

commit b8ad676fb8cbe0a43617df41daaf284ab4421c75
Author: Mary <me@thog.eu>
Date:   Sun Jul 18 13:05:11 2021 +0200

    Amadeus: DSP code generation improvements (#2460)

    This improves RyuJIT codegen drastically on the DSP side.
    This may reduce CPU usage of the DSP thread quite a lot.

commit 97a21332071aceeef6f5035178a3523177570448
Author: Mary <me@thog.eu>
Date:   Sun Jul 18 12:49:39 2021 +0200

    shadertools: Prepare for new target Languages and APIs (#2465)

    * shadertools: Prepare for new target Languages and APIs

    This improves shader tools command line by adding support for target
    language and api.

    * Address gdkchan's comments

commit ca5ac37cd638222e7475ac8f632b878126f3462d
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Fri Jul 16 22:10:20 2021 +0100

    Flush buffers and texture data through a persistent mapped buffer. (#2481)

    * Use persistent buffers to flush texture data

    * Flush buffers via copy to persistent buffers.

    * Log error when timing out, small refactoring.

commit bb6fab200969531ff858de399879779de5aaeac0
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Jul 14 14:48:57 2021 -0300

    Ensure that DMA copy target textures are kept alive or flushed (#2478)

commit 96a070a9a76ee5809f1ed9e78c75606c6f803c6a
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Jul 14 14:27:22 2021 -0300

    Do not require texture and sampler pools being initialized (#2476)

commit 9d688e37d68dd88770ce0e4c7b133645ef7d0eec
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Jul 14 14:09:00 2021 -0300

    Close transfer memory properly on nvservices (#2477)

commit 208ba1dde2b9a4d31446ace2bba8f0d641d2e300
Author: Mary <1760003+Thog@users.noreply.github.com>
Date:   Tue Jul 13 16:48:54 2021 +0200

    Revert LibHac update

    Users are apparently facing save destruction when the extra data update fails

commit 997380d48cb3b74e2438cee7fc3b017d6b59b714
Author: Alex Barney <thealexbarney@gmail.com>
Date:   Tue Jul 13 02:23:32 2021 -0700

    Fix the headless build since previous commit

commit 19afb3209c48db5f8e4b5f48f0faee925cd20d9f
Author: Alex Barney <thealexbarney@gmail.com>
Date:   Tue Jul 13 01:19:28 2021 -0700

    Update to LibHac 0.13.1 (#2328)

    Update the LibHac dependency to version 0.13.1. This brings a ton of improvements and changes such as:
    - Refactor `FsSrv` to match the official refactoring done in FS.
    - Change how the `Horizon` and `HorizonClient` classes are handled. Each client created represents a different process with its own process ID and client state.
    - Add FS access control to handle permissions for FS service method calls.
    - Add FS program registry to keep track of the program ID, location and permissions of each process.
    - Add FS program index map info manager to track the program IDs and indexes of multi-application programs.
    - Add all FS IPC interfaces.
    - Rewrite `Fs.Fsa` code to be more accurate.
    - Rewrite a lot of `FsSrv` code to be more accurate.
    - Extend directory save data to store `SaveDataExtraData`
    - Extend directory save data to lock the save directory to allow only one accessor at a time.
    - Improve waiting and retrying when encountering access issues in `LocalFileSystem` and `DirectorySaveDataFileSystem`.
    - More `IFileSystemProxy` methods should work now.
    - Probably a bunch more stuff.

    On the Ryujinx side:
    - Forward most `IFileSystemProxy` methods to LibHac.
    - Register programs and program index map info when launching an application.
    - Remove hacks and workarounds for missing LibHac functionality.
    - Recreate missing save data extra data found on emulator startup.
    - Create system save data that wasn't indexed correctly on an older LibHac version.

    `FsSrv` now enforces access control for each process. When a process tries to open a save data file system, FS reads the save's extra data to determine who the save owner is and if the caller has permission to open the save data. Previously-created save data did not have extra data created when the save was created.
    With access control checks in place, this means that processes with no permissions (most games) wouldn't be able to access their own save data. The extra data can be partially created from data in the save data indexer, which should be enough for access control purposes.

commit 04dce402ac94679c5439038be1c8ce090e7ad4cb
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Mon Jul 12 16:48:57 2021 -0300

    Implement a fast path for I2M transfers (#2467)

commit 9b08abc644c4afcb1b4eb59bfbe8057727ad9d70
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Mon Jul 12 16:20:33 2021 -0300

    Fix shader compilation on shaders that use rectangle textures (#2471)

commit 40b21cc3c4d2622bbd4f88d43073341854d9a671
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Sun Jul 11 17:20:40 2021 -0300

    Separate GPU engines (part 2/2) (#2440)

    * 3D engine now uses DeviceState too, plus new state modification tracking

    * Remove old methods code

    * Remove GpuState and friends

    * Optimize DeviceState, force inline some functions

    * This change was not supposed to go in

    * Proper channel initialization

    * Optimize state read/write methods even more

    * Fix debug build

    * Do not dirty state if the write is redundant

    * The YControl register should dirty either the viewport or front face state too, to update the host origin

    * Avoid redundant vertex buffer updates

    * Move state and get rid of the Ryujinx.Graphics.Gpu.State namespace

    * Comments and nits

    * Fix rebase

    * PR feedback

    * Move changed = false to improve codegen

    * PR feedback

    * Carry RyuJIT a bit more

commit b5190f16810eb77388c861d1d1773e19644808db
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Sun Jul 11 16:24:31 2021 -0300

    Fix virtual memory allocation being out of range (#2464)

commit 0d841c8d5104a09d2733c0e78f6d5b7ebc8fee3e
Author: Ac_K <Acoustik666@gmail.com>
Date:   Sat Jul 10 23:37:29 2021 +0200

    am: Implement CreateApplicationAndRequestToStart (#2448)

    This PR implements the `CreateApplicationAndRequestToStart` call; the result code is checked by RE.
    A guest program can now restart itself. This is needed by SSBU when you change the game language.

    NOTE: This currently doesn't work with the OpenAL backend due to another issue.

    Closes #2108

commit b1a9d17cf85ab8322aeb700ad28d58f0edf63d08
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Sat Jul 10 16:50:10 2021 -0300

    Fix GetWritableRegion write-back (#2456)

commit 59900d7f00b14681acfc7ef5e8d1e18d53664e1c
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Fri Jul 9 00:09:07 2021 -0300

    Unscale textureSize when resolution scaling is used (#2441)

    * Unscale textureSize when resolution scaling is used

    * Fix textureSize on compute

    * Flag texture size as needing res scale values too

commit b02719cf4173c0ca26e6d562424eba68965ce59c
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Jul 7 21:20:52 2021 -0300

    Flush UBO updates more frequently (#2407)

commit 8b44eb1c981d7106be37107755c7c71c3c3c0ce4
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Jul 7 20:56:06 2021 -0300

    Separate GPU engines and make state follow official docs (part 1/2) (#2422)

    * Use DeviceState for compute and i2m

    * Migrate 2D class, more comments

    * Migrate DMA copy engine

    * Remove now unused code

    * Replace GpuState by GpuAccessorState on GpuAccessor, since compute no longer has a GpuState

    * More comments

    * Add logging (disabled)

    * Add back i2m on 3D engine

commit 31cbd09a75a9d5f4814c3907a060e0961eb2bb15
Author: Mary <me@thog.eu>
Date:   Tue Jul 6 22:08:44 2021 +0200

    frontend: Add an SDL2 headless window (#2310)

commit d125fce3e8c780c042040ac8064155cd6751d353
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Tue Jul 6 16:20:06 2021 -0300

    Allow shader language and target API to be specified on the shader translator (#2402)

commit b0ac1ade7fcde04f15384a329a0eca5ae9ed5065
Author: emmauss <emmausssss@gmail.com>
Date:   Tue Jul 6 19:07:23 2021 +0000

    Add portable screenshot folder (#2447)

    * add portable screenshot folder

    * fix style

    Co-authored-by: Ac_K <Acoustik666@gmail.com>

commit a6c2b5d6ec6d205a421e23b767ed9157c8296656
Author: Ac_K <Acoustik666@gmail.com>
Date:   Tue Jul 6 20:55:03 2021 +0200

    ui: Fixes GetShrinkedGamepadName (#2444)

    There is a wrong condition in `GetShrinkedGamepadName` which throws an out-of-bounds exception if the controller name is equal to the checked value. It's now fixed and should close #2442.

commit 242e51c7f5da7f1bb044400332383c89ff379121
Author: Ac_K <Acoustik666@gmail.com>
Date:   Tue Jul 6 20:41:11 2021 +0200

    nifm: Fixes IsDynamicDnsEnabled not supported (#2443)

    For some reason, `IPInterfaceProperties.IsDynamicDnsEnabled` throws a `PlatformNotSupported` exception on Linux.
    This PR fixes the issue by wrapping the access in a `try/catch` and setting the value to false. Closes #2415.
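
The fallback described above can be sketched as follows. This is an illustration in Python, not the project's C# code: `UnsupportedPlatformProps`, `SupportedProps` and `read_dynamic_dns_enabled` are hypothetical names, and Python's `NotImplementedError` stands in for the platform exception mentioned in the commit message.

```python
class UnsupportedPlatformProps:
    """Hypothetical stand-in for a properties object whose getter fails on some platforms."""

    @property
    def is_dynamic_dns_enabled(self):
        raise NotImplementedError("not supported on this platform")


class SupportedProps:
    """Hypothetical stand-in for a platform where the property can be read."""

    is_dynamic_dns_enabled = True


def read_dynamic_dns_enabled(props):
    # Try the platform query first; if the platform cannot answer,
    # fall back to False instead of letting the exception propagate.
    try:
        return props.is_dynamic_dns_enabled
    except NotImplementedError:
        return False
```

Catching only the specific "unsupported" exception keeps unrelated failures visible instead of silently turning them into `False`.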

commit b72f7de4057b3dee8581e58295f2db5fc563d50c
Author: Ac_K <Acoustik666@gmail.com>
Date:   Tue Jul 6 20:17:06 2021 +0200

    aoc: Fixes some inconsistencies (#2434)

    * aoc: Fixes some inconsistencies

    This PR fixes a wrong return value (introduced in #2414) which caused some DLC not to be recognized in some games like Super Robot War T.
    Additionally, I removed the EventHandle check too, because it could cause some issues, but sadly that didn't do the job, so I reverted the changes. It should fix Diablo III: Eternal Collection.

    * Fix loop

    * Revert TitleLanguage change

    * write only available ids

commit 091edcebb4492eb8f666ee561661f4b46026c0c9
Author: mpnico <mpnico@gmail.com>
Date:   Tue Jul 6 20:04:21 2021 +0200

    Command line argument -f doesn't toggle 'Start games in fullscreen mode' (#2424)

    Closes Ryujinx#2308

commit ddb8351375fe58e37072a9577e1341a2e7f437d2
Author: Billy Laws <blaws05@gmail.com>
Date:   Tue Jul 6 18:49:51 2021 +0100

    Implement 12.0.0 hwopus functions (#2410)

    Based off of my RE of 12.0.2 audio services, the newly added parameter can be safely ignored due to ryu not using fixed-size I/O buffers.

commit 94cc365b635b0c42f6443af724ff0cdcb7ab00a3
Author: riperiperi <rhy3756547@hotmail.com>
Date:   Sat Jul 3 05:55:04 2021 +0100

    Honour copy dependencies when switching render target (#2433)

    * Honour copy dependencies when switching render target

    When switching from one render target to another, when both have a copy dependency to each other, a copy can be deferred on the second target when unbinding the first.

    Before, this would not be honoured before binding the new texture, so the copy would stay deferred until the render targets change again, at which point it would copy in old data and essentially clear all the draws done during that time.

    This change runs synchronize memory to make sure that copies are honoured. This can cause a redundant copy, but it's better than it breaking for now.

    This should fix miiedit on AMD/Intel GPUs on Windows. May fix other games, or perhaps rare copy dependency bugs on NVIDIA too.

    * Address feedback
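
The deferred-copy problem described in this commit can be illustrated with a small sketch. It is written in Python, not the project's C#, and all names (`TextureSketch`, `defer_copy_from`, `synchronize`, `bind_render_target`) are hypothetical; it assumes a simplified model where a copy dependency is a single pending copy from one texture to another.

```python
class TextureSketch:
    """Toy texture with at most one pending (deferred) copy (hypothetical model)."""

    def __init__(self, name, data):
        self.name = name
        self.data = data
        self.deferred_from = None

    def defer_copy_from(self, other):
        # Record that `other` holds newer data without copying it yet.
        self.deferred_from = other

    def synchronize(self):
        # Honour the pending copy, if any, exactly once.
        if self.deferred_from is not None:
            self.data = self.deferred_from.data
            self.deferred_from = None


def bind_render_target(texture):
    # Resolve deferred copies before binding, so draws made while bound
    # are not overwritten later by a stale copy.
    texture.synchronize()
    return texture
```

If the `synchronize()` call in `bind_render_target` were missing, the pending copy would run at a later point and replace whatever had been drawn in the meantime, which is the failure mode the commit message describes.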

commit f4078ae2670bf98ce0e9e41359a3614ac802946f
Author: Ac_K <Acoustik666@gmail.com>
Date:   Tue Jun 29 22:52:17 2021 +0200

    aoc: Fix wrong check (#2427)

    This PR fixes a wrong check added in #2414 which made Pokémon crash.

commit 00ce9eea620652b97b4d3e8cd9218c6fccff8b1c
Author: Mary <me@thog.eu>
Date:   Tue Jun 29 19:37:13 2021 +0200

    Fix disposing of IPC sessions server at emulation stop (#2334)

commit fbb4019ed5c12c4a888c7b09db648ac595366896
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Tue Jun 29 14:32:02 2021 -0300

    Initial support for separate GPU address spaces (#2394)

    * Make GPU memory manager a member of GPU channel

    * Move physical memory instance to the memory manager, and the caches to the physical memory

    * PR feedback

commit 8cc872fb60ec1b825655ba8dba06cc978fcd7e66
Author: Ac_K <Acoustik666@gmail.com>
Date:   Tue Jun 29 18:57:06 2021 +0200

    aoc/am: Cleanup aoc service and stub am calls (#2414)

    * aoc/am: Cleanup aoc service and stub am calls

    This PR implements the aoc calls `GetAddOnContentListChangedEventWithProcessId` (Closes #2408) and `CreateContentsServiceManager`. Additionally, a big cleanup (checked by RE on the latest firmware) is made on the whole service. I've also added `CountAddOnContent`, `ListAddOnContent` and `GetAddonContentBaseId` for games which require version `1.0.0-6.2.0`.

    The am service call `ReportUserIsActive` is stubbed (checked by RE, closes #2413).

    Since some of the logic in the aoc service which handles DLC has changed, it would be nice to have some testing to be sure there is no regression.

    * Remove wrong check

    * Addresses gdkchan feedback

    * Fix GetAddOnContentLostErrorCode

    * fix null pid in services

    * Add missing comment

    * remove leftover comment

commit 28618c58d7ee1ae63fc57deca791a64ab38b57af
Author: emmauss <emmausssss@gmail.com>
Date:   Mon Jun 28 20:09:43 2021 +0000

    Add Screenshot Feature (#2354)

    * Add internal screenshot capabilities

    * update version notice

commit a79b39b91347816ea14677b58af738b70df03e9c
Author: Ac_K <Acoustik666@gmail.com>
Date:   Mon Jun 28 20:54:45 2021 +0200

    no name: Mii Editor applet support (#2419)

    * no name: Mii Editor applet support

    * addresses gdkchan feedback

    * Fix comment

    * Bypass MountCounter of MiiDatabaseManager

    * Fix GetSettingsPlatformRegion

    * Disable Applet Menu for unsupported firmwares

commit fefd4619a5347b4ef86314a4e17e1d6e63ced297
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Fri Jun 25 20:11:54 2021 -0300

    Add support for custom line widths (#2406)

commit 493648df312b7501b0560a3c94b2deffab2e99cf
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Fri Jun 25 19:56:03 2021 -0300

    Fix default value for unwritten shader outputs (#2412)

    * Fix shader default output values

    * Shader cache version bump

commit ed2f5ede0f8d8f58390745f5e237bbfea36397fe
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Thu Jun 24 19:54:50 2021 -0300

    Fix texture sampling with depth compare and LOD level or bias (#2404)

    * Fix texture sampling with depth compare and LOD level or bias

    * Shader cache version bump

    * nit: Sorting

commit eac659e37bf2ad8398a959c91f7b30017e4ad7f3
Author: Ac_K <Acoustik666@gmail.com>
Date:   Fri Jun 25 00:37:48 2021 +0200

    caps: Stubs GetAlbumFileList0AafeAruidDeprecated and GetAlbumFileList3AaeAruid (#2403)

    This PR stubs the caps service calls `GetAlbumFileList0AafeAruidDeprecated` and `GetAlbumFileList3AaeAruid` (Closes #2035, Closes #2401); both are checked by RE.
    This avoids having to use "ignore missing services" when you want to play World of Light in Super Smash Bros Ultimate.

commit 3359b0fd975fc2b53631dd7110254f4e2df02a12
Author: ooa113y <13thSlayer@gmail.com>
Date:   Thu Jun 24 03:21:52 2021 +0300

    Improve gameTable search (#2398)

    * Improve gameTable search

    * Remove useless split

    * Remove unneeded brackets

    * Simplify searchEqualFunc

    * Remove leftovers (oops)

    Co-authored-by: Ac_K <Acoustik666@gmail.com>

commit 77aab9aca302bbe635d94750f57fb9a1ad910b74
Author: emmauss <emmausssss@gmail.com>
Date:   Thu Jun 24 00:09:08 2021 +0000

    Add Direct Mouse Support (#2374)

    * add direct mouse support

    * hide cursor if mouse enabled

    * add config

    * update docs

    * sorted usings

commit a10b2c5ff26886e9ffc6f19e3f0fe9505a503b2f
Author: gdkchan <gab.dark.100@gmail.com>
Date:   Wed Jun 23 20:51:41 2021 -0300

    Initial support for GPU channels (#2372)

    * Ground work for separate GPU channels

    * Rename TextureManager to TextureCache

    * Decouple texture bindings management from the texture cache

    * Rename BufferManager to BufferCache

    * Decouple buffer bindings management from the buffer cache

    * More comments and proper disposal

    * PR feedback

    * Force host state update on channel switch

    * Typo

    * PR feedback

    * Missing us…
Labels
gpu Related to Ryujinx.Graphics performance Performance issue or improvement

3 participants