Improve performance of CR functions in JIT64 #746
|
Well, it's smaller and neater. But look up tables are known to have surprises on modern CPUs. I suspect this one is small enough, but I'd like to see benchmarks. |
|
We already use lookup tables in a bunch of other places (like the quantization/dequantization), so it doesn't seem like anything new, unless I'm missing something? |
|
@phire what surprises would you be talking about? Fwiw, if there's even a tiny chance of this PR regressing, I'd prefer if a hwtest covering the whole instruction input range was written (I'll work on a feature to ease writing such tests soon-ish, so maybe wait until that's published). |
|
A table lookup is 3-4 cycles if it's an L1 hit, and all the other instructions in the new code depend on each other. I think it's 7 cycles best case, with a worst case in the hundreds of cycles on a cache miss. The old code has very few dependencies between the blocks; only the ORs depend on each other. I suspect it can do 10 cycles consistently. Like I said, I'd be interested in benchmarks. |
|
We use lookup tables in a ton of other cases where constants could be generated on the fly (for cheaper than here), so I don't think this could possibly be any worse... do we have any games that make heavy enough use of these instructions to bench? |
|
@phire Is this lookup table bigger than the reduction in code size? You've overlooked code cache misses. |
|
749 to 736 seconds on POV-RAY benchmark, so 1.7% faster overall. |
|
LGTM. I'd be interested in a benchmark comparison of this PR with and without the lookup tables. |
|
@FioraAeterna: This comment grants you the permission to merge this pull request whenever you think it is ready. After addressing the remaining comments, click this link to merge. @dolphin-emu-bot allowmerge |
|
I'm very slightly nervous about this since some of the variants of these instructions seem incredibly rare, but if something does go wrong, I have a blanket fort prepared to defend myself. |
Improve performance of CR functions in JIT64
Use a lookup table instead of calculating it on the fly.
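
For readers following the trade-off discussed above, here is a minimal sketch of the general technique: replacing a value computed from a 4-bit input with a 16-entry table, plus an exhaustive check over the whole input range. The bit layout and the names `ExpandOnTheFly`, `ExpandViaTable` and `CheckAllInputs` are hypothetical and chosen only for illustration; this is not the encoding or the code used by this PR.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical "on the fly" version: each of the 4 input bits sets one byte
// lane of the result. The four blocks are independent of each other; only
// the ORs into `out` form a dependency chain.
// ASSUMPTION: this layout is made up; it is not Dolphin's internal CR format.
inline std::uint64_t ExpandOnTheFly(unsigned nibble)
{
  std::uint64_t out = 0;
  if (nibble & 0x8) out |= std::uint64_t{0xFF} << 24;
  if (nibble & 0x4) out |= std::uint64_t{0xFF} << 16;
  if (nibble & 0x2) out |= std::uint64_t{0xFF} << 8;
  if (nibble & 0x1) out |= std::uint64_t{0xFF};
  return out;
}

// Build the 16-entry table once from the reference computation.
inline std::array<std::uint64_t, 16> MakeTable()
{
  std::array<std::uint64_t, 16> table{};
  for (unsigned i = 0; i < 16; ++i)
    table[i] = ExpandOnTheFly(i);
  return table;
}

// Table version: a single indexed load (128 bytes of data in total).
inline std::uint64_t ExpandViaTable(unsigned nibble)
{
  static const std::array<std::uint64_t, 16> table = MakeTable();
  return table[nibble & 0xF];
}

// With only 16 possible inputs, both versions can be compared exhaustively.
inline void CheckAllInputs()
{
  for (unsigned i = 0; i < 16; ++i)
    assert(ExpandViaTable(i) == ExpandOnTheFly(i));
}
```

Calling `CheckAllInputs()` once at startup or from a unit test covers the full input range of this toy function. Whether the single load beats the branchy version in practice depends on cache behaviour, which is why the thread asks for benchmarks.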