Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jit64: Optimize fsel a bit more #9127

Merged
merged 2 commits into from Oct 19, 2020
Merged

Conversation

Sintendo
Copy link
Member

@Sintendo Sintendo commented Oct 3, 2020

This PR builds upon optimizations introduced in pull request #8992, which were only applied to the packed variant of this instruction. As it turns out, these optimizations can also be applied to the non-packed variant when the destination register and a specific input register match. Luckily, according to my testing, this happens to be the case most of the time in games that use this instruction.


fsel (AVX)
Before:

66 0F 57 C0          xorpd       xmm0,xmm0
F2 41 0F C2 C6 06    cmpnlesd    xmm0,xmm14
C4 C3 09 4B CA 00    vblendvpd   xmm1,xmm14,xmm10,xmm0
F2 44 0F 10 F1       movsd       xmm14,xmm1

After:

66 0F 57 C0          xorpd       xmm0,xmm0
F2 41 0F C2 C6 06    cmpnlesd    xmm0,xmm14
C4 43 09 4B F2 00    vblendvpd   xmm14,xmm14,xmm10,xmm0

fsel (SSE4.1)
Before:

66 0F 57 C0          xorpd       xmm0,xmm0
F2 41 0F C2 C6 06    cmpnlesd    xmm0,xmm14
41 0F 28 CE          movaps      xmm1,xmm14
66 41 0F 38 15 CA    blendvpd    xmm1,xmm10,xmm0
F2 44 0F 10 F1       movsd       xmm14,xmm1

After:

66 0F 57 C0          xorpd       xmm0,xmm0
F2 41 0F C2 C6 06    cmpnlesd    xmm0,xmm14
66 45 0F 38 15 F2    blendvpd    xmm14,xmm10,xmm0

For the non-packed variant of this instruction, a MOVSD instruction was
generated to copy only the lower 64 bits of XMM1 to the destination
register. This was done in order to keep the destination register's
upper half intact.

However, when register c and the destination register are the same,
there is no need for this copy. Because the registers match and due to
the way the mask is generated, VBLENDVPD will end up taking the upper
half from the destination register, as intended.

Before:
66 0F 57 C0          xorpd       xmm0,xmm0
F2 41 0F C2 C6 06    cmpnlesd    xmm0,xmm14
C4 C3 09 4B CA 00    vblendvpd   xmm1,xmm14,xmm10,xmm0
F2 44 0F 10 F1       movsd       xmm14,xmm1

After:
66 0F 57 C0          xorpd       xmm0,xmm0
F2 41 0F C2 C6 06    cmpnlesd    xmm0,xmm14
C4 43 09 4B F2 00    vblendvpd   xmm14,xmm14,xmm10,xmm0
For the non-packed variant of this instruction, a MOVSD instruction was
generated to copy only the lower 64 bits of XMM1 to the destination
register. This was done in order to keep the destination register's
upper half intact.

However, when register c and the destination register are the same,
there is no need for this copy. Because the registers match and due to
the way the mask is generated, BLENDVPD will end up taking the upper
half from the destination register, as intended.

Additionally, the MOVAPS to copy Rc into XMM1 can also be skipped.

Before:
66 0F 57 C0          xorpd       xmm0,xmm0
F2 41 0F C2 C6 06    cmpnlesd    xmm0,xmm14
41 0F 28 CE          movaps      xmm1,xmm14
66 41 0F 38 15 CA    blendvpd    xmm1,xmm10,xmm0
F2 44 0F 10 F1       movsd       xmm14,xmm1

After:
66 0F 57 C0          xorpd       xmm0,xmm0
F2 41 0F C2 C6 06    cmpnlesd    xmm0,xmm14
66 45 0F 38 15 F2    blendvpd    xmm14,xmm10,xmm0
@lioncash lioncash merged commit fc5fbf5 into dolphin-emu:master Oct 19, 2020
10 checks passed
@Sintendo Sintendo deleted the fselx-movsd branch November 1, 2020 09:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
2 participants