Greg/gsdx 64b #1664

gregory38 · 2016-11-19T10:01:19Z

AVX/x64 implementation of the GSdx JIT compiler. So far, perf is poor: 10% to 20% slower than the 32-bit build.

With hard work, it might be possible to reach 32-bit performance (or get close enough), but it is a low priority. At least we can now run, test and develop on a 64-bit GSdx.

willkuer · 2016-11-19T11:26:34Z

Do you run the core in the interpreter? How do you benchmark? With the GS replayer?

gregory38 · 2016-11-19T11:30:17Z

Yes, I benchmarked without the core, through the replayer. The interpreter would be too slow, and the replayer gives me additional timing info.

Here is an example on the HW renderer (the output is the same for SW).

Performance Profile for 53 frames:
Min  1.40 ms    (711.80 fps)
Mean 7.50 ms    (133.31 fps)
Max  13.54 ms   (73.88 fps)
SD   3.72 ms

Frame Repartition
  0 ms =>   2 ms       1
  2 ms =>   4 ms      16
  4 ms =>   6 ms       9
  6 ms =>   8 ms       0
  8 ms =>  10 ms       7
 10 ms =>  12 ms      15
 12 ms =>  14 ms       5
 14 ms =>  16 ms       0

Edit: I benchmarked with turbo boost off to avoid variation.
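A profile like the one above can be rebuilt from a list of per-frame times. This is a minimal sketch, not the replayer's code: `FrameProfile`, `profile` and `to_fps` are hypothetical names, the 2 ms bucket width matches the histogram above, and the SD is assumed to be the population standard deviation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

struct FrameProfile {
    double min_ms, mean_ms, max_ms, sd_ms;
    std::vector<int> buckets; // buckets[i] counts frames in [i*bucket_ms, (i+1)*bucket_ms)
};

// Hypothetical helper: summarize per-frame times (in ms) like the replayer output.
FrameProfile profile(const std::vector<double>& frame_ms, double bucket_ms = 2.0) {
    assert(!frame_ms.empty() && bucket_ms > 0.0);
    FrameProfile p{};
    p.min_ms = *std::min_element(frame_ms.begin(), frame_ms.end());
    p.max_ms = *std::max_element(frame_ms.begin(), frame_ms.end());
    p.mean_ms = std::accumulate(frame_ms.begin(), frame_ms.end(), 0.0) / frame_ms.size();

    double var = 0.0;
    for (double t : frame_ms) var += (t - p.mean_ms) * (t - p.mean_ms);
    p.sd_ms = std::sqrt(var / frame_ms.size()); // population SD (assumption)

    // One bucket per bucket_ms slice, up to and including the slowest frame.
    p.buckets.assign(static_cast<std::size_t>(p.max_ms / bucket_ms) + 1, 0);
    for (double t : frame_ms) {
        assert(t >= 0.0);
        ++p.buckets[static_cast<std::size_t>(t / bucket_ms)];
    }
    return p;
}

// Frame time in ms converted to frames per second.
double to_fps(double ms) { return 1000.0 / ms; }
```

For instance, the 7.50 ms mean corresponds to 1000 / 7.50 ≈ 133.3 fps, consistent with the 133.31 fps shown (the displayed ms values are rounded).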

willkuer · 2016-11-19T12:22:28Z

Interesting that the histogram shows two peaks. Maybe the bottleneck of the slower group can be improved.

Have you tried a different compiler? I guess the compiler matters a lot here.

Also, do you know that the bottleneck is in GSdx and not in the GPU driver? Maybe calling the driver from 64-bit code makes a difference?

gregory38 · 2016-11-19T12:29:20Z

The example above was on the HW renderer (32 bits). I didn't test the HW renderer in 64 bits, but it could be interesting. It is less critical there, since the HW renderer is often GPU-limited.

Some frames are nearly empty (the game runs at 30 fps internally), so they are quick to render.

gregory38 · 2016-11-19T12:45:48Z

By the way, could someone test the SW renderer in 32 bits with every ISA? I want to avoid regressions due to the code factorization.

gregory38 · 2016-11-19T14:20:45Z

I pushed an additional change to select the ISA (SSE2/SSSE3/SSE4.1/AVX1) at runtime.

It only impacts the JIT-generated code (AKA the SW renderer). Nevertheless (if I manage to compile it on VS), it makes:

the SSSE3 build useless
the AVX1 build mostly useless

So in the future, we could limit the builds to SSE2/SSE4.1/AVX2.
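Runtime ISA selection for the JIT can be sketched as follows, assuming the CPU features have already been reduced to a few flags. `Isa`, `CpuCaps`, `select_isa` and `detect_caps` are hypothetical names, not GSdx's actual API, and `__builtin_cpu_supports` is a GCC/Clang builtin, so other compilers fall back to SSE2 in this sketch.

```cpp
#include <cassert>

// ISA levels the JIT can target, in increasing order (names are illustrative).
enum class Isa { SSE2 = 0, SSSE3, SSE41, AVX1 };

struct CpuCaps {
    bool ssse3 = false;
    bool sse41 = false;
    bool avx = false;
};

// Pick the highest level the CPU supports; SSE2 is the minimum level in this sketch.
Isa select_isa(const CpuCaps& c) {
    if (c.avx) return Isa::AVX1;
    if (c.sse41) return Isa::SSE41;
    if (c.ssse3) return Isa::SSSE3;
    return Isa::SSE2;
}

// Query the running CPU with GCC/Clang builtins; other compilers (e.g. MSVC,
// which would need __cpuid) just report no extensions here.
CpuCaps detect_caps() {
    CpuCaps c;
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    c.ssse3 = __builtin_cpu_supports("ssse3");
    c.sse41 = __builtin_cpu_supports("sse4.1");
    c.avx = __builtin_cpu_supports("avx");
#endif
    return c;
}
```

The returned level would then decide which generator variant emits the scanline and SetupPrim code; the rest of the compiled C++ is unaffected by this choice.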

gregory38 · 2016-11-19T14:51:40Z

@turtleli
Could you help me fix the compilation issue on Windows? Are the AVX files (aka *.x86.avx.cpp) compiled in the SSE Windows build?

Edit: it seems there is some black magic in ./GSdx.vcxproj

turtleli · 2016-11-19T15:12:23Z

Yeah, it's the excluded-from-build stuff.

Fix for the 32-bit Windows build (the 64-bit build doesn't compile because __x86_64__, checked in stdafx.h, isn't defined on Windows): https://gist.github.com/turtleli/a05906730b9b885d5dde2e862440238f
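A portable way to detect a 64-bit x86 target is to test both compilers' macros, since `__x86_64__` is a GCC/Clang convention and MSVC defines `_M_X64` instead. This is a sketch of the idea, not necessarily what the linked patch does; `GSDX_X64` and `is_x64_build` are hypothetical names.

```cpp
#include <cassert>

// GCC and Clang define __x86_64__ on 64-bit x86; MSVC does not, and defines _M_X64 instead.
#if defined(__x86_64__) || defined(_M_X64)
#define GSDX_X64 1 // hypothetical project-wide macro
#else
#define GSDX_X64 0
#endif

// Compile-time query usable in ordinary code instead of repeating the #if.
constexpr bool is_x64_build() { return GSDX_X64 == 1; }
```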

gregory38 · 2016-11-19T15:49:55Z

Thanks for the patch.


It will require a generic (register-naming) linear interpolation to use it properly. The gather instruction requires an extra mask register, therefore all register names will be shuffled. Performance-wise, the initial Haswell implementation seems to be microcode-emulated.
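The mask-register remark can be illustrated with a scalar model of AVX2 VPGATHERDD: a lane is loaded only when the sign bit of its mask element is set, the whole mask is zeroed when the instruction completes, and the destination, index and mask must be three distinct registers. That is why a generator using it has to dedicate, and reload, an extra register. This is plain C++ for explanation, not GSdx code; `Vec8` and `gather_dd` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Eight 32-bit lanes, standing in for one YMM register.
using Vec8 = std::array<int32_t, 8>;

// Scalar model of VPGATHERDD ymm_dst, [base + ymm_index*4], ymm_mask.
// base is treated as an int32_t array, so the *4 scale is implicit in the indexing.
void gather_dd(Vec8& dst, const int32_t* base, const Vec8& index, Vec8& mask) {
    assert(base != nullptr);
    for (std::size_t i = 0; i < dst.size(); ++i) {
        if (mask[i] < 0) {          // lane selected when the mask element's sign bit is set
            dst[i] = base[index[i]];
        }                           // unselected lanes keep their previous dst value
    }
    mask.fill(0);                   // the instruction zeroes the whole mask on completion
}
```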


gregory38 · 2016-11-19T17:13:17Z

Let's go. It's the best way to get some test coverage 👍

willkuer · 2016-11-19T18:05:53Z

Did I understand correctly that this PR solves #796?

Ah, I see. It only solves it partially.

gregory38 · 2016-11-19T18:14:38Z

No, it is only a step forward. There is the compiled C++ code, and there is the code that this C++ code generates itself and then executes. The latter is used by the SW renderer.
For the SSE2 up to AVX1 builds, the generated code will be based on your CPU capabilities. However, all the compiled C++ code still depends on the compilation flags.
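The compile-time half of that split can be made visible by checking the macros the compiler sets from its ISA flags. This is a hypothetical helper for illustration: GCC and Clang define `__SSSE3__`, `__SSE4_1__`, `__AVX__` and `__AVX2__` from `-m`/`-march` flags, while MSVC only defines `__AVX__`/`__AVX2__` (for `/arch:AVX`/`/arch:AVX2`), so this check is only accurate for GCC/Clang.

```cpp
#include <cassert>
#include <cstring>

// ISA baked in by the compiler flags: ordinary compiled code is limited to this
// level no matter which CPU it later runs on.
const char* compiled_isa() {
#if defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_1__)
    return "SSE4.1";
#elif defined(__SSSE3__)
    return "SSSE3";
#else
    return "SSE2 (or lower)";
#endif
}
```

With no `-m`/`-march` flags, x86-64 GCC only enables the SSE2 baseline, so the compiled code stays at SSE2 even on an AVX CPU, while the JIT can still emit AVX code after a runtime check.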

However, I think the SSSE3 and AVX1 builds are now useless (I hope there won't be an SSE/AVX transition penalty).

mirh · 2016-11-19T19:52:58Z

Hmm, does this affect #357 then?

gregory38 · 2016-11-19T20:01:02Z

Well, it was only a stub (in GSdx SW) to allow compilation (it used to crash). The new code works.
The limitations:

mipmapping isn't implemented
no AVX2
no SSE, but any CPU that supports AVX will pick up the AVX version (~70% of users)

Tbh, AVX2 will be done, but I don't think pure SSE will be. It is too slow to be worth it.

gregory38 added the GS: Software label Nov 19, 2016

gregory38 and others added 20 commits November 19, 2016 17:00

cmake: always define avx on 64 bits build

8b4da69

gsdx: properly check SSE support

82d1269

1/ Check all "levels"
2/ Require AVX for 64 bits

gsdx: separate dump directory for 32/64 bits

43b4cfc

xbyak: add int3 instruction

633f7a1

Very useful to stop the JIT
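int3 is the one-byte x86 breakpoint instruction (opcode 0xCC): when the JIT emits it at the start of a generated function, an attached debugger stops there, so the generated code can be stepped through. The class below is a hypothetical stand-in, not Xbyak's interface, showing that such a helper only needs to append that byte.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical minimal emitter, only to show what an int3() helper boils down to.
class CodeBuffer {
public:
    void db(uint8_t b) { bytes_.push_back(b); }  // emit one raw byte
    void int3() { db(0xCC); }                    // INT3: one-byte breakpoint trap
    void ret() { db(0xC3); }                     // near RET
    const std::vector<uint8_t>& bytes() const { return bytes_; }

private:
    std::vector<uint8_t> bytes_;
};
```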

gsdx: SW JIT debug helper

e31ce87

Allows comparing 32/64 bits (and all ISAs too)
Allows breakpoints (int3)
Prints selector info
Prints size of buffer and start (disabled by default)

gsdx: define the linux x64 ABI

4a47224

gsdx sw x64: update setup prim generator x64 SSE&AVX

8e29e09

gsdx sw x64: port the scanline generator on AVX

a281bda

Based on Gabest's work.
* Missing mipmap
Note: dithering info. It is a bit tricky, as a2 on Linux was the rdx register, which overlaps with fzm (dh/dl). It might require dedicated Windows code.
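The rdx/fzm note follows from the two x86-64 calling conventions: System V (Linux) passes the first integer arguments in rdi, rsi, rdx, rcx, r8, r9, while Windows x64 uses rcx, rdx, r8, r9. So the third argument (a2) lands in rdx on Linux, whose low bytes are dl and dh, but in r8 on Windows. The sketch below only encodes those register lists; `arg_reg` and `aliases_dl_dh` are hypothetical helpers, and the fact that fzm lives in dl/dh is taken from the commit note above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// Integer/pointer argument registers, in order (a0, a1, a2, ...).
constexpr std::array<const char*, 6> kSysVArgs = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"}; // Linux, System V AMD64
constexpr std::array<const char*, 4> kWin64Args = {"rcx", "rdx", "r8", "r9"};              // Windows x64

// Hypothetical helper: register holding integer argument n, or nullptr if it is passed on the stack.
const char* arg_reg(bool windows, std::size_t n) {
    if (windows) return n < kWin64Args.size() ? kWin64Args[n] : nullptr;
    return n < kSysVArgs.size() ? kSysVArgs[n] : nullptr;
}

// dl and dh are the two low bytes of rdx, so an argument in rdx overlaps anything kept in dl/dh.
bool aliases_dl_dh(const char* reg) {
    return reg != nullptr && std::strcmp(reg, "rdx") == 0;
}
```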

gsdx sw x64: prefer faster 32 bits operation when possible

141c9e9

gsdx sw x64: small stack optimization on linux

051c5c4

mov with the stack pointer requires fewer bytes of code

gsdx: small x64 printf warning fixes

8abf242

gsdx linux: plug vtune as Windows

d58e43e

gsdx sw: factorize color split in split16_2x8

e728a14

gsdx sw x64: restore read texel optimization

2e20693

gsdx sw JIT: dynamically select SSE41 at runtime even on SSE2 build (…

6b78b8f

…scanline) It won't give the full SSE41 speed boost, but it is better than nothing

gsdx sw JIT: dynamically select between AVX1 and SSE code path (scanl…

574a2c7

…ine)

gsdx sw JIT: dynamically select ISA for SetupPrim

8fd46e9

gsdx: Relax SSE/AVX constraint on 64 bits

cc6d193

The JIT will automatically select the best ISA (only AVX1 so far)

gsdx build: don't exclude AVX files.

ef25502

Thanks for the patch :)

gregory38 force-pushed the greg/gsdx-64b branch from 2260a4f to ef25502 November 19, 2016 16:09

gregory38 merged commit 58c3794 into master Nov 19, 2016

gregory38 deleted the greg/gsdx-64b branch November 19, 2016 17:12

lightningterror mentioned this pull request Mar 25, 2018

64 bit : interpreter & GSdx Hardware working #357

Closed
