JIT: Optimize JitAsmCommon, Float, and PS implementations #686
|
@dolphin-emu-bot rebuild |
|
Haven't yet looked much at the paired-single commit, but the other two commits lgtm. |
|
I pushed another commit to this branch with some similar optimizations for the regular, non-paired float functions. I also added MOVLPD/HPD, which should help dealing with the unpaired floats a little bit, since they can read/write to/from the top/bottom halves of an SSE register. |
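For readers unfamiliar with MOVLPD/MOVHPD, here is a minimal sketch of what "read/write the top/bottom halves" means, using the standard SSE2 intrinsics that compilers typically lower to those instructions. The helper names (`LoadLow`, `LoadHigh`, `StoreLow`, `StoreHigh`) are hypothetical and are not Dolphin's emitter API.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (x86 / x86-64 only)

// Replace only the low 64-bit lane with *src; the high lane is left untouched.
// Typically compiles to MOVLPD xmm, m64.
inline __m128d LoadLow(__m128d reg, const double* src)
{
  return _mm_loadl_pd(reg, src);
}

// Replace only the high 64-bit lane with *src; the low lane is left untouched.
// Typically compiles to MOVHPD xmm, m64.
inline __m128d LoadHigh(__m128d reg, const double* src)
{
  return _mm_loadh_pd(reg, src);
}

// Store just the low lane to memory (MOVLPD m64, xmm).
inline void StoreLow(double* dst, __m128d reg)
{
  _mm_storel_pd(dst, reg);
}

// Store just the high lane to memory (MOVHPD m64, xmm).
inline void StoreHigh(double* dst, __m128d reg)
{
  _mm_storeh_pd(dst, reg);
}
```

The point for unpaired floats is that one half of a register can be updated or written out without a separate shuffle or a full 128-bit load/store.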
|
@dolphin-emu-bot rebuild |
|
Fixed a bug with my blendvpd implementation of ps_sel. |
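The thread does not say what the bug was. As general background only: BLENDVPD selects each lane based on the sign bit of its mask operand, while ps_sel selects on the comparison `a >= 0.0`. The two disagree for -0.0 and for NaN, which is one reason an implementation would usually build the mask with a compare rather than feed `a` in directly. The scalar model below is a sketch of that difference for a single lane; the function names are hypothetical.

```cpp
#include <cmath>

// Per-lane reference semantics of ps_sel: pick c when a >= 0.0, else b.
// -0.0 compares >= 0.0 (picks c); NaN fails the comparison (picks b).
inline double SelByCompare(double a, double b, double c)
{
  return a >= 0.0 ? c : b;
}

// What a blend keyed directly on a's sign bit would do for one lane:
// sign bit set picks b, sign bit clear picks c.
inline double SelBySignBit(double a, double b, double c)
{
  return std::signbit(a) ? b : c;
}
```

For ordinary positive and negative values both functions agree; they differ only in the -0.0 and NaN cases noted in the comments.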
|
@dolphin-emu-bot rebuild |
|
I'm highly sceptical about this. SSE optimizations have historically been prone to introducing regressions, often hard-to-find ones (e.g. due to alignment requirements on OS X, which is a less tested platform). Given that this patch is purely an optimization, I'd say it needs some cputests in https://github.com/dolphin-emu/hwtests to make sure behavior is correct. |
|
Not sure if relevant, but Clang gives me an "LLVM ERROR" when compiling with -sse4 or -march=native anyway. I'm not sure if that implies that OS X builds compile without any SSE4.x opts at all (including Dolphin's) or if it just does ours and doesn't do extra opts. |
|
@FioraAeterna Maybe you could extract the 3-byte opcode commit into a separate PR to get it merged more quickly? Also, what do you think about adding some wrapper functions for the CPU checks? Maybe even a macro that stringifies the instruction function names so you don't have to provide them every time? |
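One possible shape for the suggested wrapper plus stringifying macro is sketched below. Everything here is hypothetical: `CpuFeatures`, `RequireFeature`, and `REQUIRE_SSE41` are illustrative stand-ins, not names from Dolphin's code, and a real emitter might log or assert instead of throwing.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the emulator's CPU feature flags.
struct CpuFeatures
{
  bool sse4_1 = false;
};

// Wrapper for the CPU check: fails loudly, naming the offending instruction.
inline void RequireFeature(bool supported, const char* instruction)
{
  if (!supported)
  {
    throw std::runtime_error(std::string("Trying to use ") + instruction +
                             " on a CPU that does not support it");
  }
}

// The preprocessor's # operator turns the emitter function name into a
// string, so callers do not have to spell the name out a second time.
#define REQUIRE_SSE41(features, emitter_fn) \
  RequireFeature((features).sse4_1, #emitter_fn)
```

An emitter function would then start with something like `REQUIRE_SSE41(cpu, BLENDVPD);` before writing the opcode bytes.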
1f8b58b to bb0852d
|
I stripped out a bunch of parts of this patch because I do not trust any of this code I wrote weeks ago <_<;;; |
b9bc411 to d918316
Use some SSE4 instructions on CPUs that support them. Use float instructions instead of int where appropriate (it's a cycle faster on CPUs with arithmetic unit forwarding penalties).
Based on a patch by Tilka.
@@ -77,16 +77,7 @@ void Jit64::fp_tri_op(int d, int a, int b, bool reversible, bool single, void (X
if (single)
{
ForceSinglePrecisionS(fpr.RX(d));
if (cpu_info.bSSE3)
JIT: Optimize JitAsmCommon, Float, and PS implementations
I did various optimizations to JitAsmCommon to improve the quantization/dequantization routines and save a few instructions, plus take advantage of the SSE4 instructions I added a bit back.