JIT: faster PPC_FP code #856

Merged (1 commit) on Aug 24, 2014

FioraAeterna
Contributor

The PPC_FP conversion code can be made a lot simpler with the observation
that the only values that need to be sent through the slow x87 path are
denormals.

Measurably faster: the POV-Ray benchmark drops from 708 to 678 seconds (about 4%).

@FioraAeterna
Contributor Author

A friend tested this on the first stage of Star Fox: Assault and completed the level without issues, but that's all the testing I've done so far.

@FioraAeterna force-pushed the ppcfpopt branch 7 times, most recently from 3486830 to 49976c6 (August 23, 2014 05:19)
@FioraAeterna
Contributor Author

Some stats (first ~30 seconds of Sonic Colors)

Double to Single: 378,150,740 floats sent to fast path, 18,092 to slow path
Single to Double: 148,950,282 floats sent to fast path, 17,949 to slow path
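
To put those counters in perspective, here is a minimal sketch that turns a fast/slow pair into a slow-path fraction. The function name is made up for illustration and is not part of the PR.

```cpp
#include <cstdint>

// Fraction of conversions that took the slow path, given the two counters
// reported above. The function name is illustrative, not from the PR.
double slow_path_fraction(std::uint64_t fast, std::uint64_t slow)
{
    return static_cast<double>(slow) / static_cast<double>(fast + slow);
}
```

With the numbers above this works out to roughly 0.005% of double-to-single conversions and roughly 0.012% of single-to-double conversions, so the slow path is rare in this sample.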

// A sneaky hack: floating-point zero is rather common and we don't want to confuse it for denormals and
// needlessly send it through the slow path. If we subtract 1 from the input, it turns float-zero into
// 0xffffffff (skipping the slow path). This results in a single non-denormal being sent through the
// slow path (0x00800000), but the effect of that should be negligible.
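
The subtract-1 trick in that comment can be modelled in plain C++. This is only a sketch: the comment does not show how the result is tested, so the exponent-mask check below is an assumption, and the function name is hypothetical.

```cpp
#include <cstdint>

// Returns true when a single-precision bit pattern would be diverted to the
// slow path. The mask-and-test form is an assumption made for illustration.
bool single_takes_slow_path(std::uint32_t bits)
{
    // Subtracting 1 wraps +0.0 (0x00000000) around to 0xffffffff, whose
    // exponent field is non-zero, so zero stays on the fast path.
    std::uint32_t shifted = bits - 1u;
    // An all-zero exponent field after the subtraction means slow path.
    return (shifted & 0x7f800000u) == 0u;
}
```

Under this assumed form, 0x00000000 and 0x80000000 stay on the fast path, every denormal is diverted, and the smallest normal 0x00800000 is diverted too, as the comment says. Its negative counterpart 0x80800000 would also be diverted by this particular check.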

@phire
Member

phire commented Aug 23, 2014

Nice PR.

Just let me test this on one of the games which the original PPC_FP merge broke.

@autofire372
Contributor

Tested Tak and the Power of Juju (GJUE78); works fine.

@Sonicadvance1
Contributor

Wow, super pro. This sounds awesome.

@JMC47
Contributor

JMC47 commented Aug 23, 2014

About a 4% boost in my CPU-limited Virtual Console games. Melee's Fountain of Dreams stage improves by about 4-5% as well.

Tested a ton more games. Everything seems to be working, including the games that were broken by the original PPC_FP merge. This looks good to me! Looking forward to the performance boosts.

@phire
Member

phire commented Aug 23, 2014

LGTM.

I think this is ready to be merged.

// if it is.
MOVQ_xmm(R(RAX), src);
SHR(64, R(RAX), Imm8(55));
// Exponents 0x369 <= x <= 0x380 are denormal. This code accepts the range 0x368 <= x <= 0x387
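
The widened range follows from the shift by 55: it discards the 52 mantissa bits plus the low 3 exponent bits, so the check can only distinguish groups of 8 exponents. A rough C++ model is below. Only the shift comes from the snippet above; the subtract-and-compare step and the function name are assumptions for illustration.

```cpp
#include <cstdint>

// Returns true when a double's bit pattern falls in the widened exponent
// range 0x368..0x387 described above. Only the shift by 55 comes from the
// snippet; the subtract-and-compare step is an assumed way to finish it.
bool double_takes_slow_path(std::uint64_t bits)
{
    // Shifting by 55 drops the 52 mantissa bits and the low 3 exponent bits.
    // Truncating to 8 bits then discards the sign bit.
    std::uint8_t top_exponent = static_cast<std::uint8_t>(bits >> 55);
    // 0x368 >> 3 == 0x6D and 0x387 >> 3 == 0x70, so one unsigned compare
    // covers the four buckets 0x6D..0x70.
    return static_cast<std::uint8_t>(top_exponent - 0x6D) <= 3;
}
```

For example, a double with exponent field 0x380 (2^-127, a denormal once narrowed to single) is diverted, while 1.0 (exponent 0x3FF) and 0.0 (exponent 0) stay on the fast path.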

delroth added a commit that referenced this pull request Aug 24, 2014
@delroth delroth merged commit aaff5a0 into dolphin-emu:master Aug 24, 2014
}

SetJumpTarget(dont_reset_qnan_bit);
MOVDDUP(dst, R(XMM0));
