JIT: optimize for the common case of unquantized psq_l/st #1830
Idea seems fine... but it'll need to be benchmarked, and I'm not really happy with all the duplicated code.
Definitely needs a benchmark. Regarding the duplicated code, I think at some point we could simply split off all the address calculation for loads and stores (of all types) into a separate function, and then remove it from lXX, stx, stxx, lfxx, stfxx, psq_lxx and psq_stxx.
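A rough sketch of what a shared address-calculation helper could compute. It models only the guest-side PowerPC arithmetic ((rA|0) + d for D-form, (rA|0) + rB for X-form, with write-back for update forms), not the x86 the JIT would emit. `GuestRegs`, `AddrForm`, and both functions are hypothetical names, not Dolphin's API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical guest register file for the sketch.
struct GuestRegs
{
  uint32_t gpr[32];
};

enum class AddrForm { DForm, XForm };

// D-form: imm_or_rb is the sign-extended displacement.
// X-form: imm_or_rb is the index of rB.
// rA == 0 means literal 0, per the PowerPC (rA|0) convention.
uint32_t ComputeEffectiveAddress(const GuestRegs& regs, AddrForm form, int ra, int32_t imm_or_rb)
{
  const uint32_t base = (ra == 0) ? 0u : regs.gpr[ra];
  if (form == AddrForm::DForm)
    return base + static_cast<uint32_t>(imm_or_rb);  // wraps mod 2^32
  return base + regs.gpr[imm_or_rb];
}

// Update forms (lwzu, stfsu, psq_lu, ...) always use rA itself
// (rA == 0 is an invalid form) and write the EA back to rA.
uint32_t ComputeEffectiveAddressUpdate(GuestRegs& regs, AddrForm form, int ra, int32_t imm_or_rb)
{
  assert(ra != 0);
  const uint32_t offset =
      (form == AddrForm::DForm) ? static_cast<uint32_t>(imm_or_rb) : regs.gpr[imm_or_rb];
  const uint32_t ea = regs.gpr[ra] + offset;
  regs.gpr[ra] = ea;
  return ea;
}
```

Keeping load and store addressing in one place like this is what would let lXX, stXX, lfXX, stfXX, and the psq variants drop their copies.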
int padding = totalSize - BACKPATCH_SIZE;
u8* returnPtr = codePtr + 5 + padding;
const u8 *trampoline = trampolines.GenerateReadTrampoline(info, registersInUse, exceptionHandler, returnPtr);
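One reading of the quoted lines: the faulting fastmem access (totalSize bytes) is overwritten with a 5-byte x86 `JMP rel32` to a trampoline, the remainder is NOP-padded, and the trampoline jumps back to `codePtr + 5 + padding`, the end of the patched region. The sketch below assumes BACKPATCH_SIZE is 5 (`kBackpatchSize` is a stand-in) and uses buffer offsets in place of host addresses. `PatchWithJump` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumption: stands in for BACKPATCH_SIZE (size of JMP rel32 on x86).
constexpr size_t kBackpatchSize = 5;

// Offsets into `code` stand in for host addresses. Returns the offset the
// trampoline should jump back to (mirrors `codePtr + 5 + padding`).
size_t PatchWithJump(std::vector<uint8_t>& code, size_t code_off, size_t total_size,
                     size_t trampoline_off)
{
  assert(total_size >= kBackpatchSize);
  assert(code_off + total_size <= code.size());
  const size_t padding = total_size - kBackpatchSize;

  // rel32 is relative to the next instruction, i.e. code_off + 5.
  const int64_t rel =
      static_cast<int64_t>(trampoline_off) - static_cast<int64_t>(code_off + 5);
  const int32_t rel32 = static_cast<int32_t>(rel);
  code[code_off] = 0xE9;  // JMP rel32 opcode
  std::memcpy(code.data() + code_off + 1, &rel32, sizeof(rel32));  // little-endian host assumed
  std::memset(code.data() + code_off + 5, 0x90, padding);          // single-byte NOPs

  return code_off + 5 + padding;
}
```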
Branch updated from 9c5a7d9 to 7b34b44.
This added about 12% performance (roughly 49 fps to 55 fps) in the dancing scene in Rogue Squadron 3 on bootup.
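For reference, the two reported frame rates convert to a relative speedup and a per-frame time saving as follows. This is plain arithmetic on the quoted numbers, not an additional measurement:

```latex
\frac{55\ \text{fps}}{49\ \text{fps}} \approx 1.122 \;\Rightarrow\; \approx 12\%\ \text{higher frame rate}

\frac{1000\ \text{ms}}{49} - \frac{1000\ \text{ms}}{55} \approx 20.41\ \text{ms} - 18.18\ \text{ms} \approx 2.2\ \text{ms saved per frame}
```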
@@ -163,5 +171,6 @@ class BitSet

}

typedef BS::BitSet<u8> BitSet8;
int gqr;
for (int i = 0; i < 8; i++)
  if (code_block.m_gqr_used[i])
    gqr = i;
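The loop above leaves `gqr` uninitialized if no bit is set, and silently picks the last one if several are. Since the fast path only applies when exactly one GQR is used, a guarded version could look like this. `std::bitset<8>` stands in for the PR's `BitSet8`, and `SingleUsedGqr` is a hypothetical helper.

```cpp
#include <bitset>
#include <optional>

// Returns the index of the single used GQR, or std::nullopt if zero or
// more than one GQR is used (in which case the fast path must not apply).
std::optional<int> SingleUsedGqr(const std::bitset<8>& gqr_used)
{
  if (gqr_used.count() != 1)
    return std::nullopt;
  for (int i = 0; i < 8; i++)
  {
    if (gqr_used[i])
      return i;
  }
  return std::nullopt;  // unreachable when count() == 1
}
```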
@@ -154,6 +154,9 @@ struct CodeBlock

// Did we have a memory_exception?
bool m_memory_exception;

// Which GQRs this block uses, if any.
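A minimal sketch of how the per-block GQR set might be filled while scanning instructions. It handles only the D-form paired loads and stores (psq_l = 56, psq_lu = 57, psq_st = 60, psq_stu = 61), where the 3-bit I field (the GQR index) sits at bits 12-14 counting from the least-significant bit. The indexed forms (primary opcode 4, I at bits 7-9) are left out. `BlockGqrInfo` and `RecordGqrUse` are hypothetical.

```cpp
#include <bitset>
#include <cstdint>

// Eight GQRs, so eight bits are enough (hence BitSet8 in the PR).
struct BlockGqrInfo
{
  std::bitset<8> gqr_used;
};

void RecordGqrUse(BlockGqrInfo& info, uint32_t inst)
{
  const uint32_t opcode = inst >> 26;  // primary opcode, bits 26-31
  if (opcode == 56 || opcode == 57 || opcode == 60 || opcode == 61)
    info.gqr_used.set((inst >> 12) & 7);  // I field
}
```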
Optimistically assume that the GQR is 0 (plain, unquantized float loads and stores) in blocks that use only one GQR; check that assumption at the start of the block, and bail out and recompile if it fails. Many games use almost entirely unquantized stores (e.g. Rebel Strike, Sonic Colors), so this will likely be a big performance improvement across the board for games that make heavy use of paired singles.
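The guard-and-recompile idea can be modeled in a few lines. This is an illustrative model, not Dolphin's code: `CompiledBlock`, `EnterBlock`, and `Recompile` are hypothetical. The recompiled block drops the assumption so the guard cannot fail again on the same block.

```cpp
#include <cstdint>
#include <optional>

struct CompiledBlock
{
  std::optional<int> assumed_zero_gqr;  // set only when compiled with the fast path
  bool needs_recompile = false;
};

// Block entry: returns true if the specialized (GQR == 0) code may run.
bool EnterBlock(CompiledBlock& block, const uint32_t (&gqr)[8])
{
  if (block.assumed_zero_gqr && gqr[*block.assumed_zero_gqr] != 0)
  {
    block.needs_recompile = true;  // assumption failed: bail out
    return false;
  }
  return true;
}

// Recompile with the general (quantizing) path and no assumption.
void Recompile(CompiledBlock& block)
{
  block.assumed_zero_gqr.reset();
  block.needs_recompile = false;
}
```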
Not completely happy with the copy-pasted addressing code, but I'll let it pass for now if you have some idea how it should be refactored in the future.
I think we should completely eliminate all the addressing code in Jit_LoadStore/Jit_LoadStoreFloating and write a single address-calculation function, or maybe one for loads and one for stores.
@FioraAeterna: This comment grants you permission to merge this pull request whenever you think it is ready. After addressing the remaining comments, you can merge it. @dolphin-emu-bot allowmerge
JIT: optimize for the common case of unquantized psq_l/st
It's definitely possible to do this for cases other than unquantized, and this patch could probably be extended to handle them quite easily, but I didn't feel like writing it. I went for the low-hanging fruit.
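For context on what the non-zero cases involve, here is a sketch of the dequantization a paired load must perform, using the Gekko GQR load fields (LD_TYPE in bits 16-18; LD_SCALE, a signed 6-bit value, in bits 24-29). GQR == 0 selects type 0 (float) with scale 0, so the value passes through untouched, which is exactly what the fast path exploits. `DequantizeLoad` is hypothetical, and treating reserved types 1-3 as float is this sketch's own choice.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// `raw` holds the zero-extended value read from memory at the type's width.
float DequantizeLoad(uint32_t gqr, uint32_t raw)
{
  const uint32_t ld_type = (gqr >> 16) & 7;
  int ld_scale = static_cast<int>((gqr >> 24) & 0x3F);
  if (ld_scale >= 32)
    ld_scale -= 64;  // sign-extend the 6-bit scale

  float value = 0.0f;
  switch (ld_type)
  {
  case 4: value = static_cast<float>(static_cast<uint8_t>(raw)); break;   // u8
  case 5: value = static_cast<float>(static_cast<uint16_t>(raw)); break;  // u16
  case 6: value = static_cast<float>(static_cast<int8_t>(raw)); break;    // s8
  case 7: value = static_cast<float>(static_cast<int16_t>(raw)); break;   // s16
  default:
  {
    // Type 0 (float): reinterpret the bits; the scale is not applied.
    float f;
    std::memcpy(&f, &raw, sizeof(f));
    return f;
  }
  }
  return std::ldexp(value, -ld_scale);  // value * 2^-scale
}
```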
One extra benefit of this approach (compared with calling into the JitAsmCommon routines) is that we can now use fastmem for paired loads and stores.
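A sketch of why the unquantized case fits fastmem so well: with GQR == 0, psq_l is just one or two big-endian 32-bit float loads from `base + address`, with no per-access lookup. With W = 1 only ps0 is loaded and ps1 is set to 1.0. This models the semantics in plain C++, not the emitted x86. `PairedSingle`, `UnquantizedPsqLoad`, and the flat `fastmem_base` buffer are hypothetical, and a real fastmem setup relies on a fault handler (and backpatching) for addresses that aren't plain RAM.

```cpp
#include <cstdint>
#include <cstring>

struct PairedSingle
{
  double ps0;
  double ps1;
};

// Gekko memory is big-endian, so assemble each word byte by byte.
static uint32_t LoadBigEndian32(const uint8_t* p)
{
  return (uint32_t{p[0]} << 24) | (uint32_t{p[1]} << 16) | (uint32_t{p[2]} << 8) |
         uint32_t{p[3]};
}

static float BitsToFloat(uint32_t bits)
{
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}

PairedSingle UnquantizedPsqLoad(const uint8_t* fastmem_base, uint32_t ea, bool w)
{
  const uint8_t* p = fastmem_base + ea;  // direct access, no address lookup
  PairedSingle ps;
  ps.ps0 = BitsToFloat(LoadBigEndian32(p));
  ps.ps1 = w ? 1.0 : BitsToFloat(LoadBigEndian32(p + 4));
  return ps;
}
```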