JIT: faster PPC_FP code #856
I had a friend test this on the first stage of Starfox Assault, and she was able to complete the level without issues, but that's all the testing I've done.
Some stats (first ~30 seconds of Sonic Colors):
Double to Single: 378,150,740 floats sent to fast path, 18,092 to slow path
// A sneaky hack: floating-point zero is rather common and we don't want to confuse it for denormals and
// needlessly send it through the slow path. If we subtract 1 from the input, it turns float-zero into
// 0xffffffff (skipping the slow path). This results in a single non-denormal being sent through the
// slow path (0x00800000), but the effect of that should be negligible.
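To make the trick in that comment concrete, here is a minimal C++ sketch of the check on the raw 32-bit pattern. It is not the emitted JIT code: the helper name `SingleNeedsSlowPath` is hypothetical, and the exponent mask `0x7f800000` is my assumption about how "looks like a denormal" is tested after the subtraction.

```cpp
#include <cstdint>

// Hypothetical helper, not taken from the PR: returns true when a raw
// single-precision bit pattern would be diverted to the slow path.
bool SingleNeedsSlowPath(uint32_t bits)
{
    // Subtracting 1 wraps 0x00000000 around to 0xffffffff, so float-zero no
    // longer appears to have an all-zero exponent field.
    uint32_t adjusted = bits - 1;

    // Assumed test: the 8 exponent bits of an IEEE 754 single are all zero.
    return (adjusted & 0x7f800000u) == 0;
}
```

With this sketch, `0x00000000` takes the fast path, every positive denormal (`0x00000001` through `0x007fffff`) takes the slow path, and `0x00800000` is the normal value that gets diverted along with them. Because the mask ignores the sign bit, the same holds for the negative counterparts in this sketch.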
Nice PR. Just let me test this on one of the games which the original PPC_FP merge broke.
The PPC_FP conversion code can be made a lot simpler with the observation that the only values that need to be sent through the slow x87 path are denormals. A whole bunch faster: 708->678 seconds on POV-RAY.
Tested Tak and the Power of Juju (GJUE78); works fine.
Wow, super pro. This sounds awesome.
A 4% boost in my CPU-limited Virtual Console games. It improves Melee's Fountain of Dreams by about 4-5% as well. I tested a ton more games; everything seems to be working, including the games that were broken by the original PPC_FP merge. This looks good to me! Looking forward to the performance boosts.
LGTM. I think this is ready to be merged.
// if it is.
MOVQ_xmm(R(RAX), src);
SHR(64, R(RAX), Imm8(55));
// Exponents 0x369 <= x <= 0x380 are denormal. This code accepts the range 0x368 <= x <= 0x387
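For readers following the range in that comment, the check can be sketched in plain C++ as below. Only the shift by 55 comes from the snippet; the function name `DoubleBitsNeedSlowPath` and the subtract-and-compare step are assumptions on my part, chosen because `0x368 >> 3 == 0x6D` and `0x387 >> 3 == 0x70`.

```cpp
#include <cstdint>

// Hypothetical helper, not taken from the PR: returns true when a raw
// double-precision bit pattern falls in the widened exponent range
// 0x368 <= exponent <= 0x387 and would take the slow path.
bool DoubleBitsNeedSlowPath(uint64_t bits)
{
    // Shifting right by 55 leaves the sign bit plus the top 8 of the 11
    // exponent bits. Truncating to 8 bits drops the sign, so `top` holds
    // (exponent >> 3).
    uint8_t top = static_cast<uint8_t>(bits >> 55);

    // Assumed comparison: a single unsigned subtract-and-compare accepts
    // 0x6D..0x70, which corresponds to exponents 0x368..0x387.
    return static_cast<uint8_t>(top - 0x6D) <= 3;
}
```

Dropping the low three exponent bits is what widens the range from 0x369..0x380 to 0x368..0x387: the test only sees the exponent in groups of eight.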
}

SetJumpTarget(dont_reset_qnan_bit);
MOVDDUP(dst, R(XMM0));