Greg/gsdx 64b #1664
Do you run the core in the interpreter? How do you benchmark? In the GS replayer?
Yes, I benchmarked through the replayer, without the core. The interpreter would be too slow, and the replayer gives me additional timing info. Here is an example on the HW renderer (but it is the same).
Edit: I benchmarked with turbo off to avoid variation.
Interesting that the histogram shows two maxima. Maybe one can improve the bottleneck of the slower one. Have you tried a different compiler? I guess the compiler can be important here. Also, do you know that the bottleneck is in GSdx and not in the GPU driver? Maybe calling the driver from 64-bit code results in differences?
Above was an example on the HW renderer (32 bits). I didn't test the HW renderer, but it could be interesting. It is less critical on the HW renderer (often GPU limited). Some frames are nearly empty (internal 30 fps), so they are quick to render.
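For readers who want to reproduce this kind of frame-time histogram, here is a minimal sketch of the bucketing step. It is not the replayer's actual code: the function name `frame_time_histogram`, the fixed bin width, and the clamping of outliers into the last bin are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: bucket per-frame render times (in ms) into
// fixed-width bins and return the number of frames in each bin.
inline std::vector<std::size_t> frame_time_histogram(
    const std::vector<double>& times_ms, double bin_ms, std::size_t bins) {
    std::vector<std::size_t> counts(bins, 0);
    if (bins == 0 || bin_ms <= 0.0) return counts;
    for (double t : times_ms) {
        if (t < 0.0) continue;  // ignore invalid samples
        double pos = t / bin_ms;
        // Clamp outliers into the last bin so they stay visible.
        std::size_t idx = (pos >= static_cast<double>(bins))
                              ? bins - 1
                              : static_cast<std::size_t>(pos);
        ++counts[idx];
    }
    return counts;
}
```

If nearly empty frames alternate with full ones, as described above for an internal 30 fps game, the counts would cluster in two separate bins, which is one plausible source of the two maxima.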
By the way, it would help if someone could test the SW renderer on 32 bits with all ISAs. I want to avoid regressions due to code factorization.
I pushed an additional change to select the ISA (SSE2/SSSE3/SSE41/AVX1) at runtime. It only impacts the self-generated code (AKA the SW renderer). Nevertheless (if I manage to compile it on VS), it make
So in the future, we could limit it to SSE2/SSE4.1/AVX2.
@turtleli Edit: it seems there is some black magic in ./GSdx.vcxproj
Yeah, it's the exclude build stuff. Fix for 32-bit Windows build (64 bits doesn't compile because
Thanks for the patch.
1/ Check all "levels"
2/ Requires AVX for 64 bits
Very useful to stop the JIT
Allows comparing 32/64 bits (and all ISAs too)
Allows setting a breakpoint (int3)
Prints selector info
Prints the size and start of the buffer (disabled by default)
Based on Gabest's work.
* Mipmap is missing
Note: dithering info
It is a bit tricky, as a2 on Linux was the rdx register, which overlaps with fzm (dh/dl). It might require dedicated Windows code.
mov with the stack pointer requires less bytecode
It will require a generic (register naming) linear interpolation to use it properly. The gather instruction requires an extra mask register, therefore all register names will be shuffled. Perf wise, the initial Haswell implementation seems to be microcode emulated.
…scanline) It won't give the full SSE41 speed boost but it is better than nothing
The JIT will automatically select the best ISA (only AVX1 so far)
Thanks for the patch :)
2260a4f to ef25502
Let's go. It is the best way to get some test coverage 👍
Did I understand correctly that this PR solved #796? Ah, I see. It only solves it partially.
No. It is only a step forward. You have the C code, and you have the code that the C code generates at runtime and then executes. The latter is used by the SW renderer. However, I think the SSSE3 and AVX1 builds are useless now (I hope I won't have an AVX penalty).
Mhh.. is this affecting #357 then?
Well, it was only a stub (on GSdx SW) to allow the compilation (it used to crash). The new code is working.
Tbh, AVX2 will be done. But I don't think pure SSE will be done. It is too slow to be worth it.
AVX/x64 implementation of the GSdx JIT compiler. So far, perf sucks: 10% to 20% slower than 32 bits.
With hard work, it might be possible to reach the 32-bit performance level (or close enough), but it is really a low priority. At least we can run, test and dev on a 64-bit GSdx.