EE JIT: Implement SDR/SDL, LDR/LDL instructions #4739

Merged Sep 17, 2021 (2 commits)


refractionpcsx2
Member

@refractionpcsx2 refractionpcsx2 commented Sep 7, 2021

Description of Changes

Added SDR/SDL, LDR/LDL instructions, in all their evilness (most of the code is just reading/writing the value in memory)
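For readers unfamiliar with the store half of these instructions, here is a minimal sketch of what SDL and SDR do on a little-endian MIPS core, written as plain C++ rather than as the x86 code the JIT emits. `Mem`, `sdl`, `sdr` and `store_unaligned64` are hypothetical names for illustration only, not PCSX2 identifiers. Each instruction is a read-modify-write of one aligned doubleword, which is why most of the work is just reading and writing memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flat little-endian memory, for illustration only.
struct Mem {
    std::vector<uint8_t> bytes;
    explicit Mem(std::size_t n) : bytes(n, 0) {}

    // Read the aligned 64-bit doubleword starting at addr.
    uint64_t read64(uint32_t addr) const {
        assert(addr % 8 == 0);
        uint64_t v = 0;
        for (int i = 7; i >= 0; --i) v = (v << 8) | bytes[addr + i];
        return v;
    }

    // Write the aligned 64-bit doubleword starting at addr.
    void write64(uint32_t addr, uint64_t v) {
        assert(addr % 8 == 0);
        for (int i = 0; i < 8; ++i) bytes[addr + i] = uint8_t(v >> (8 * i));
    }
};

// SDL: store the high (shift + 1) bytes of reg into the bytes from the
// aligned base up to and including addr. Bytes above addr are preserved.
void sdl(Mem& m, uint32_t addr, uint64_t reg) {
    const uint32_t shift = addr & 7;
    const uint32_t base = addr & ~7u;
    const uint64_t keep = (shift == 7) ? 0 : (~0ULL << (8 * (shift + 1)));
    m.write64(base, (reg >> (56 - 8 * shift)) | (m.read64(base) & keep));
}

// SDR: store the low (8 - shift) bytes of reg into the bytes from addr to
// the end of the aligned doubleword. Bytes below addr are preserved.
void sdr(Mem& m, uint32_t addr, uint64_t reg) {
    const uint32_t shift = addr & 7;
    const uint32_t base = addr & ~7u;
    const uint64_t keep = (1ULL << (8 * shift)) - 1;
    m.write64(base, (reg << (8 * shift)) | (m.read64(base) & keep));
}

// The usual pairing for an unaligned 64-bit store on a little-endian core.
void store_unaligned64(Mem& m, uint32_t addr, uint64_t value) {
    sdl(m, addr + 7, value);
    sdr(m, addr, value);
}
```

Calling `store_unaligned64(m, 3, value)` touches two aligned doublewords: SDL fills bytes 8 to 10 and SDR fills bytes 3 to 7, leaving the neighbouring bytes untouched.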

Rationale behind Changes

Stops the EE JIT dropping down to the interpreter for these instructions, theoretically faster

Suggested Testing Steps

Play games, make sure no new bad graphics or crashes happen.

@F0bes
Member

F0bes commented Sep 16, 2021

Okay, I wrote a program that can be used to bench this PR.
There are two variants: one is quite unrealistic and just has a bunch of SDR and SDL instructions lined up (no-rec-flush), and the other does some stuff in between the SDR and SDL instructions (rec-flush).

Here are my results on my i7-9750H running 64-bit Linux, averaged with my eyeballs:

| | Register Flushing | No Register Flushing |
|---|---|---|
| Master | 205fps | 100fps |
| PR | 235fps | 94fps |

Now I ask you to please not scream "30 FPS INCREASE", as this program is definitely a best-case scenario. However, this PR is faster :)

Refer to the comment below for an updated benching.
sdl-sdr-benches.zip

@lightningterror lightningterror added this to the Release 1.8 milestone Sep 16, 2021
@refractionpcsx2
Member Author

Yeah, I imagine unless there's a game that really spams them, there is a negligible change with this PR, but hey, at least it's not dropping back to the interpreter, I guess xD

@refractionpcsx2
Member Author

Did some optimisation and got it to these results

32bit Flush:

Master: 130fps
Pr: 217fps

32bit No-Flush:

Master: 68fps
PR: 94fps

64bit Flush:

Master: 161fps
Pr: 244fps

64bit No-Flush:

Master: 88fps
PR: 106fps

The move in the middle that brings in dummyValue is super slow. If I remove that line, my FPS in F0bes' tests almost doubles, but I don't know why it's slow to begin with; there seems to be some contention with the read done before it. MOVQ (64-bit) was faster than MOVDQA (128-bit) though, so that helps it suck less.

@RedDevilus
Contributor

RedDevilus commented Sep 16, 2021

Version Flex:
SDL_SDR2

| | 32-bit flush | 32-bit no-flush | 64-bit flush | 64-bit no-flush |
|---|---|---|---|---|
| Master VPS | 130 VPS | 68 VPS | 161 VPS | 88 VPS |
| PR VPS | 217 VPS (+87) | 94 VPS (+26) | 244 VPS (+83) | 106 VPS (+18) |
| PR performance uplift | 66.9% | 38.2% | 51.6% | 20.5% |
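The uplift figures are consistent with (PR - Master) / Master, expressed as a percentage and rounded to one decimal place. A small sketch of that arithmetic, assuming this is how the row was derived (`uplift_percent` is a hypothetical helper name):

```cpp
#include <cassert>
#include <cmath>

// Relative gain of the PR build over master, in percent,
// rounded to one decimal place.
double uplift_percent(double master_vps, double pr_vps) {
    assert(master_vps > 0.0);
    return std::round((pr_vps - master_vps) / master_vps * 1000.0) / 10.0;
}
```

For example, the 32-bit flush column works out as (217 - 130) / 130, which is roughly 0.669, or 66.9%.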

Also fixed slight optimisation bug in SDL
@refractionpcsx2 refractionpcsx2 changed the title EE JIT: Implement SDR/SDL instructions EE JIT: Implement SDR/SDL, LDR/LDL instructions Sep 16, 2021
@refractionpcsx2
Member Author

From this point on in the PR, LDL and LDR have also been added.
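As a rough illustration of what the load pair computes, here is a minimal C++ sketch of LDL and LDR on a little-endian MIPS core. It models the architectural behaviour only and is not the JIT code from this PR; `Mem`, `ldl`, `ldr` and `load_unaligned64` are hypothetical names. Each instruction reads one aligned doubleword and merges part of it into the existing register value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flat little-endian memory, for illustration only.
struct Mem {
    std::vector<uint8_t> bytes;
    explicit Mem(std::size_t n) : bytes(n, 0) {}

    // Read the aligned 64-bit doubleword starting at addr.
    uint64_t read64(uint32_t addr) const {
        assert(addr % 8 == 0);
        uint64_t v = 0;
        for (int i = 7; i >= 0; --i) v = (v << 8) | bytes[addr + i];
        return v;
    }
};

// LDL: load the bytes from the aligned base up to and including addr into
// the high end of the register. The low (7 - shift) bytes of reg are kept.
uint64_t ldl(const Mem& m, uint32_t addr, uint64_t reg) {
    const uint32_t shift = addr & 7;
    const uint64_t mem = m.read64(addr & ~7u);
    const uint64_t keep = (1ULL << (56 - 8 * shift)) - 1;
    return (reg & keep) | (mem << (56 - 8 * shift));
}

// LDR: load the bytes from addr to the end of the aligned doubleword into
// the low end of the register. The high `shift` bytes of reg are kept.
uint64_t ldr(const Mem& m, uint32_t addr, uint64_t reg) {
    const uint32_t shift = addr & 7;
    const uint64_t mem = m.read64(addr & ~7u);
    const uint64_t keep = (shift == 0) ? 0 : (~0ULL << (64 - 8 * shift));
    return (reg & keep) | (mem >> (8 * shift));
}

// The usual pairing for an unaligned 64-bit load on a little-endian core.
uint64_t load_unaligned64(const Mem& m, uint32_t addr) {
    uint64_t reg = 0;
    reg = ldl(m, addr + 7, reg);
    reg = ldr(m, addr, reg);
    return reg;
}
```

With memory holding the byte values 0, 1, 2, ... at increasing addresses, `load_unaligned64(m, 3)` combines bytes 3 to 7 from the first doubleword with bytes 8 to 10 from the second.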

@refractionpcsx2
Member Author

refractionpcsx2 commented Sep 16, 2021

LDL/LDR-only benchmarks. Remember this is a BEST CASE SCENARIO with custom-made ELFs.

LDR/L Benches:

32Bit No Flush:

Master: 66fps
Pr: 120fps

32Bit with Flush:

Master: 169fps
PR: 304fps

@refractionpcsx2
Member Author

refractionpcsx2 commented Sep 16, 2021

Combined LD/SD L/R test ELFs results:

LDR/L Benches:

32Bit No Flush:

Master: 48fps
Pr: 69fps

32Bit with Flush:

Master: 108fps
PR: 185fps

ldX-and-sdX-benches.zip

@refractionpcsx2 refractionpcsx2 merged commit 862d606 into master Sep 17, 2021
@refractionpcsx2 refractionpcsx2 deleted the sdr_sdl branch September 17, 2021 12:06
@RedDevilus
Contributor

SDL/SDR PAL

| Version | 32 Bit Flush | 32 Bit No-Flush |
|---|---|---|
| Stable 1.6 | Speed 115% / 57 VPS | Speed 220% / 110 VPS |
| Dev1.7-1745 | Speed 120% / 60 VPS | Speed 230% / 115 VPS |
| Dev1.7-1762 | Speed 200% / 100 VPS | Speed 460% / 230 VPS |
