FEX-2607

Sonicadvance1 tagged this 03 Jul 00:47
- AVX
  - Remove unnecessary moves from VPSHUF{D, HW, LW} (a4f89b79a)
  - Reduce codegen for 256-bit VMOVMSKPD/VMOVMSKPS (394a6f28d)
  - Handle two field insertions in VPERMQ (83a989c6d)
  - Slightly trim codegen for VPSADBW 256-bit case (9740f488c)
  - Handle trivial UZP/ZIP operations in VPERMQ (c85e426cc)
  - Handle trivial cases better for VDPPS (3470dd1e7)
  - Handle easily broadcastable permutations in VPERMQ (7ae55d73c)
  - Skip identity insertions in VPERMQ (d5be15c90)
  - Handle transpose cases in VPERMQ (ec1b24d05)
  - Remove unnecessary dup if sources are the same in SHUFOpImpl (d93997c1c)
  - Shave some moves off 256-bit VPALIGNR (ab4e0f653)
  - Reduce moves in 256-bit VPSHUFB (fda023e7d)
  - Wire up helper to VPERMILPD (588126625)
  - Wire up lane helper for VPERMILPS imm variant (b846cb6c2)
  - Wire up lane helper for VSHUFPD/VSHUFPS (62ecdd650)
  - Wire up lane helper for VPSHUFD/VPSHUFLW/VPSHUFHW (9f195ff37)
  - Reduce inserts in VBLENDPS/VPBLENDD/VPBLENDW (4eb569487)
  - Remove unnecessary moves from PINSRX ops (5ad06b025)

- Allocator
  - Fix and optimize VA range detection (af4da43bb)

- Arm64
  - Fix byte-size handling in unaligned STLXR emulation (f66368b19)

- Arm64Emitter
  - Tidy up load/stores in Push/PopCalleeSavedRegisters (500d2374a)

- Build
  - Enable ccache sloppiness for time macros (cae5da577)

- CodeCache
  - Support targeting WOW64/ARM64EC in FEXOfflineCompiler (08ed4fb98)
  - Resolve comment https://github.com/FEX-Emu/FEX/pull/5441#discussion_r3255384588 (df73e8472)
  - Implement lazy code loading (bb0d142a6)

- Config
  - Use more sensible default for portable config location (33f3b8659)

- Core
  - Start a new block for the next op in full SMC check (b23fa3009)

- FEXBash
  - Drop implicit -c and add colored PS1 (9be7d6d11)

- FEXCore
  - Pass host type that changes codegen to FEXCore (3e60aa573)
  - Ensure LOCK prefix instructions are handled correctly (d848cbbc0)
  - Allow InterruptFaultPage to be significantly further away (e4daea406)
  - Add support for developer single stepping and read/write watching (d4c80d909)

- FEXGetConfig
  - Even more correctness changes for X2E (ab9a8c62a)
  - Showcase RMW versus loadstore atomic differences (50f449487)

- FEXOfflineCompiler
  - Fixes HostFeature detection under Win32 (7fa4d7826)
  - Add "process-all" verb (cb018257c)

- FEXServerClient
  - Workaround sun_path 108 byte limit (1db45e2a7)

- Format
  - Fix missed clang-format (5fd917ec2)

- Frontend
  - Fix vsyscall page tracking (e02953dc1)

- HostFeatures
  - Put SVE support querying into single function (b9aeccf13)
  - Don't capture CTR/MIDR under simulator (07f7aa3c8)
  - Only enable `dc zva` optimization on Ampere CPUs (c98cef0da)

- IR
  - Add constant for swapping midsections of 256-bit vectors around (110313e7d)

- ImageTracker
  - Support using image IDs as an extended volatile metadata key (ed724a61a)

- IntrusiveIRList
  - Amend signature for PostRA() (b87ff1e2d)

- JIT
  - Arm64: fix the loop in CacheLineClear/Clean (27a5f0918)
  - Amend op typos in implementations (55c90cfc3)

- LibraryForwarding
  - Implement support for CUDA (e12bd2710)
  - Add annotation for snd_htimestamp_t (a5c3fc475)

  - cuda
    - Convert constexpr to const (9ac608ca4)

- LinuxSyscalls
  - add missing thread header (a1071ec01)

- OpcodeDispatcher
  - Fix missing zeroing for vcvtps2ph (9f2e98294)
  - Fixes CRC32 with high 8-bit register (43bd24345)
  - Fix incorrect comment for BTOp. ZF must be preserved. (e19aa975c)
  - Fix typo in comment (7dc2dc874)
  - Sanitize selectors for VBLENDPD/VPBLENDD (681c5e809)
  - Eliminate redundant moves in VMOVHPOp (dd44bc8d0)
  - Make use of Bind consistently (110c7cb62)
  - Move a few stray literal accesses to Literal() (09aa5abbd)
  - Fixes 64-bit LODs with address size override (ed6a178ae)

- Passes
  - Trim unnecessary forward declarations (280568df2)

- Proton
  - Fixes Mafia 3 (34344e576)

- Vector
  - Only signify 128-bit vector loads in UCOMISxOp (e26a792b7)
  - Fix typo in VPERMQOp (70fe9a440)
  - Remove unused OpcodeArgs parameter from SHUFOpImpl (ad618be97)

- VectorOps
  - Eliminate unnecessary moves in VMov if applicable (6bcadde65)
  - Avoid temporary if able in 256-bit VFRecp (5f1c8efe0)
  - Reduce temporary usage in 64-bit AdvSIMD min max paths (16f90b33f)
  - Avoid move in 256-bit VAddP/VFAddP if possible (5d8d052a7)
  - Avoid dup if able in VInsElement 128-bit element path (01b0b4e65)

- WOW64
  - Support disabling DEP (f5efdac2e)

- Windows
  - Fixes SHM stats reallocation (b4e2f5118)
  - Load unixlib if possible (78832cc0d)
  - Adds empty Linux side unix library (fe4d2bc6c)
  - Trace interrupt translation and prototype INT 0x29 fail-fast mapping (f5fafa5b9)

  - UnixLib
    - Fix loading with new MemoryWineLoadUnixLibByName mechanism (44e24c9e6)
    - Adds remaining helpers (32b96c259)
    - Adds support for hardware TSO (24da43f82)

- Misc
  - Re-optimize FYL2X for reduced precision x87 path (3d593ce87)
  - instcountci/VEX_map3: Add missing third param to VPBLENDD (d138c854f)
  - instcountci/VEX_map1: Remove obsolete comments (7e2d3b07c)
  - Do not exec FEX if it is a folder in FEXBash (848c4b2d6)
  - Fix UAF of PS1 (4f995cbc1)
  - [SVE256] Add fast paths for trivial VSHUFPD flags (0b0000 and 0b1111) (161937425)
  - [SVE256] Remove unnecessary move in VCVTPS2PD (c13064e20)
  - [SVE256] Remove heavy handed moves from scalar compares (37e32fbcb)
  - [SVE256] More comprehensively test SSE insertions for scalar comparisons (27acbba52)
  - [SVE256] Add more SSE scalar variant unit tests (6f33d2b4c)
  - [SVE256] Handle SSE insertions for PCMPESTRM/PCMPISTRM ops (c5880e761)
  - [SVE256] Handle SSE insertions for MOVQ2DQ (9d0c05d9c)
  - [SVE256] Handle SSE insertions for EXTRQ/INSERTQ (f6a68cb7f)
  - [SVE256] Handle SSE insertions for CVTPI2PD (462c78541)
  - Fix incorrect RSP update for 16bit leave (f5477039f)
  - [SVE256] Handle SSE insertions for PMADDWD (ee4794c99)
  - [SVE256] Handle SSE insertions for CMPSD/CMPSS (3a23bb4b7)
  - [SVE256] Handle SSE insertions for MOVSHDUP/MOVSLDUP (46ffb25f8)
  - [SVE256] Handle SSE insertion for aligned/unaligned loads and non-temporal loads (069a4025c)
  - [SVE256] Handle SSE insertions for MOVH(PD, PS, LPS) and MOVL(PD, PS, HPS) (3929d25dc)
  - [SVE256] Handle SSE insertions for XOR special case (f98ac7f26)
  - [SVE256] Handle SSE insertions for INSERTPS, PSIGN, PINSR, and shuffles (23099100b)
  - [SVE256] Handle SSE insertions for shifts (adad3c27d)
  - (470aeab21)
  - [SVE256] Handle SSE insertions for more misc ops (99662b70f)
  - [SVE256] Handle SSE insertions for PHMINPOSUW, DPPD, and DPPS (1a606de29)
  - [SVE256] Handle SSE insertions for blends (223e0f4e5)
  - (0b1f336e0)
  - [SVE256] Handle more SSE insertions for some one-off instructions (d9ea6651f)
  - [SVE256] Vector: Handle SSE insertion properly for various ALU operations (9d5494d9f)
  - Drop unused CMakeSettings.json (97f1f47fa)
  - set sysroot to X86_DEV_ROOTFS for guest toolchain (53c269ee2)
  - New CPL0 instructions from #5510 but with unittests (1240a00fa)
  - code-format-helper: More dependabot changes (0b871bf54)
  - Cherry-pick #5508 with instcountci changes (268081e5d)
  - Windows additions for code caching (27324ded8)
  - Library Forwarding: Various build system improvements (b4fe65f2c)
  - JIT-inline FPREM/FPREM1 for reduced precision x87 path (0d7289048)

- arm64ec
  - Single instruction optimization in EC map lookup (65b05fa8c)
  - Fixes some FEX allocations that were missing TOP_DOWN (7ff2069e6)

- instcountci
  - Add a few more cases for VPBLENDW/VPBLENDD and VSHUFPD/VSHUFPS (72e01274c)
  - Add 16-bit pcmpxstrx variants (3d66be9e5)

- meta
  - Add CONTRIBUTING.md (83f325de0)

- unittests
  - Add selector tests for VPSHUF{D, HW, LW} (a79c471c3)
  - Add test for stress-testing VPBLENDD selectors (417bd8604)