New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Work on X86_64 support #3451
Work on X86_64 support #3451
Conversation
pcsx2/x86/ix86-32/iR5900-32.cpp
Outdated
|
|
||
| #warning "JC: modified" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#warning isn't supported by Visual Studio.
|
If you could make the IMO the easiest way to get x86Emitter generating the x86-64 instructions you ask it to is with unit tests (rather then running it, letting it crash, and then looking at the results in a debugger), which I've started work on here |
|
Hmm, you shouldn't be changing the addressing form used in the source. This should remain as SiB addressing, instead of being shifted to require multiple separate operations. Shifting it like this is a massive performance cost, as not only does it increase the instruction count, it also requires register renaming and creates artificial instruction dependences. Instead, I'd replace the constant four in the original source with a definition POINTER_WIDTH, which is defined to four on x86 and eight on x64. |
|
Also, the s32 -> sptr changes were a mistake. Notice the comment:
The maximum value, even on x64, is four bytes. |
|
@arcum42 I will remove the warning directives. That's probably why the check failed. Thanks! is a bit goofy with recLUT being an 8-byte pointer. Maybe change to ? (And yes, above cast would be uptr. All I needed was a 64-bit pointer there. Will be changed) Ebx is loaded with the stack pointer 0xbfc0 0000, then the value retrieved from recLUT causes ecx+ebx to overflow. So the s32 displacement topic is still giving me a headache. So far, it did fix the two "mov 64bit-offset" / "mov 64-bit immediate" in _DynGen_DispatcherReg, see above byte code. When you guys say "there are only 32bit displacements", is that referring to the EE? I guess this comes back to my question about what to do with the LUTs... I'm not so much worried about performance at the moment. I noticed when I printf many thousands of debug lines to the terminal (I had 10 or 20 in there for each pseudo assembly code so that the buffer of the terminal wasn't enough), the load time for the game becomes long, but the game itself was fine. Also, I understood you wanted to replace the entire part, so I won't focus too much on optimizations "in the old code". |
|
@beaumanvienna I mean, there is literally no mechanism for a 64-bit displacement on x64. This has nothing to do with PCSX2 or the EmotionEngine. All displacements must be 32-bit. (There are absolute 64-bit forms, but nothing relative 64-bit). |
This makes Next Generation Tennis 2003 (Roland Garros French Open 2003) and Spongebob Battle for Bikini Bottom (PAL) work.
|
It works now! The jitted code from _DynGen_DispatcherReg retrieves the correct pointer from recLUT, which is recLutReserve_RAM + 64MB - cpuReg.pc, reads the function pointer from there, which is JITCompile(), and jumps to this function accordingly. If you're wondering why the code is subtracting 0x800000000 from the pointer retrieved from recLUT, this has to do with the overflow under i386 I mentioned earlier. On i386 the equation is whereby the truncation corresponds to subtracting 0x1 0000 0000. This overflow doesn't happen on x64. I think there's another overflow happening in recLUT_SetPage, that happens on both architectures: Under i386 they cancel each other out. Here's the code: |
|
Nicely done :) |
…hanged according to previous commit; merged with tellowkrinkle's branch 64bitInterpreterUnix
|
Personally I think we should be looking at what the old code was supposed to do to decide which things need to change and how. (And since we're going through all this, maybe add comments to the old code saying what it's trying to do so the next person doesn't have to figure it out for themselves) xMOV( eax, ptr[&psxRegs.pc] ); // Load the PS2 PC into eax
xMOV( ebx, eax ); // ... and ebx
xSHR( eax, 16 ); // psxRecLUT is a 2-level page table so first get the top 16 bits
xMOV( ecx, ptr[psxRecLUT + (eax*4)] ); // ... and use them to index psxRecLUT
// Normally we would want to jump to ((uptr*)ecx)[(ebx & 0xFFFF)/4]
// but as an optimization ecx already has the top bits of ebx subtracted from it
// so we can just add and jump
xJMP( ptr32[ecx+ebx] );Based on these, a few things with the new code:
Personally, here's what I think the code should look like (using e** as 32-bit registers and r** as 64-bit registers): xMOV( eax, ptr[&psxRegs.pc] ); // Load the PS2 PC into eax
xMOV( ebx, eax ); // ... and ebx
xSHR( eax, 16 ); // psxRecLUT is a 2-level page table so first get the top 16 bits
xMOV( rcx, ptr[psxRecLUT + (rax*sizeof(uptr))] ); // ... and use them to index psxRecLUT
// Normally we would want to jump to ((uptr*)ecx)[(ebx & 0xFFFF)/4]
// but as an optimization rcx already has the top bits of ebx subtracted from it
// so we can just add and jump
xJMP( ptrNative[rcx+rbx*(sizeof(uptr)/4)] );Note: We don't currently have a |
|
psx (aka iop) optimization are useless.honestly iop should be a subset of ee. |
|
@gregory38 We are trying to port the recompilers to x86. Optimizations are not the primary focus, we're still struggling to make it work in general. We fixed a bunch of broken emitted code, some 4-byte/8-byte pointer issues, and partially the entries in the LUT, but it is still work in progress. recLUT and recLUTReserve_RAM are now 8-byte wide. We are discussing the pseudo assembler and C-code on how to address those changes. There's a lot discussion about eXx registers vs the rXx registers. Also, the convenient SIB 32 addressing cannot always be used as before. But we got this :-) I noticed you are a reviewer on tellowkrinkle's PR about making the interpreter work on x64/Unix. My branch here is already synced up with most of tellowkrinkle's changes. Those changes did fix for example the xFastcall instruction in JITCompile. The current status is that we can
PC_GETBLOCK is still broken, not only because it assumes 4-byte table elements, but also because we have a weird offset of 0x800000000 in the recLUT elements. A last thing I wanted to mention, this PR will have zero-performance impact on the i386 "production" branch. We use either the C compiler to change from four to eight bytes or __M_X86_64, where not possible. This way you guys can probably pull it in more easily. |
Yes, rbx has to be adjusted, too. I fully agree. This is still missing in my suggested code. The base offset retrieved from recLUT needs also to be changed then. That was very helpful, thank you! |
|
For the pc. It depends if you need sign extension. But it would be more cache efficient to use only 4B. |
|
Thanks, Gregory! From my understanding, we have a 4-byte PS2 pc and 8-byte look-up table elements. The code in Baseblock.h should be ok. It's basically the C version of the pseudo assembler code above. In my code, I overlooked the pc*sizeof(BASEBLOC)/4 part.So the access to recLUTReserve_RAM (via PC_GETBLOCK) should advance in steps of eight bytes: But what do the values 0x2000, 0x000, 0x8000, and 0xa000 in iR5900-32.cpp stand for? I did understand that recROM is 4MB, corresponding to 64 pages (0x2000-0x40=0x1fc0), each used to jump into a 64k page in recLUTReserve_RAM |
|
Ram is mirrored. |
|
Top of my head mips memory is 8 segments of 512 MB |
…3456) Fixes hang after EA intro (IPU).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like copy-paste error - it should not be cpuRegs.PERF.n.pccr.b.U1 ???
…about jump tables added
|
@arcum42 / @tellowkrinkle I changed the format of the documentation to markup and revised it per your feedback. I also added comments to the code as discussed and fixed the bug in the pseudo assembler code about the missing multiplication
It runs now until where the recompile process is started and fails in recompileNextInstruction (here) with "SIB target is too far away, needs an indirect register" (x86emiter.cpp). |
Move Additional colclip info, dithering, FixedTEX0 to extra debug logs. It will allow to keep track of more important stuff going on and they can be enabled with ENABLE_EXTRA_LOG if needed. Change context creation log type from stderr to stdout.
Hope this is okay. ... # some stuff right before the crash
loadmodule: id 28, ret 0
loadmodule: fname cdrom0:¥M¥PADMAN.IRX;1 args 0 arg
sifcmd sceSifRegisterRpc: rpc_id 80000400
RegisterLibraryEntries: padman version 2.03
sifcmd sceSifRegisterRpc: rpc_id 80000100
sifcmd sceSifRegisterRpc: rpc_id 80000101
loadmodule: id 29, ret 0
loadmodule: fname cdrom0:¥M¥LIBSD.IRX;1 args 0 arg
RegisterLibraryEntries: libsd version 1.04
loadmodule: id 30, ret 0
loadmodule: fname cdrom0:¥M¥SDRDRV.IRX;1 args 0 arg
SDR driver version 2.0.0 (C)SCEI
Exit rsd_main
sifcmd sceSifRegisterRpc: rpc_id 80000701
loadmodule: id 31, ret 0
loadmodule: fname cdrom0:¥M¥KL2.IRX;1 args 0 arg
sifcmd sceSifRegisterRpc: rpc_id 12346
loadmodule: id 32, ret 0
(UpdateVSyncRate) Mode Changed to NTSC.
(UpdateVSyncRate) FPS Limit Changed : 59.94 fps
Set GS CRTC configuration. NTSC 640x448 @ 59.940 (59.82) Interlaced (FRAME)
RegisterIntrHandler: intr INT_dmaSPU, handler 7d5a4
RegisterIntrHandler: intr INT_dmaSPU2, handler 7d5a4
RegisterIntrHandler: intr INT_SPU, handler 7cc14
Set GS CRTC configuration. NTSC 640x448 @ 59.940 (59.82) Interlaced (FRAME)
j8 greater than 0x7f!! [1]
PCSX2: /tmp/makepkg/pcsx2/src/pcsx2/common/src/x86emitter/legacy.cpp:103: void x86SetJ8(u8*): Assertion `0' failed.
Thread 9 "EE Core" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe0c7d640 (LWP 268678)]
0x00007ffff6665615 in raise () from /usr/lib/libc.so.6
(gdb) bt
#0 0x00007ffff6665615 in raise () at /usr/lib/libc.so.6
#1 0x00007ffff664e862 in abort () at /usr/lib/libc.so.6
#2 0x00007ffff664e747 in _nl_load_domain.cold () at /usr/lib/libc.so.6
#3 0x00007ffff665dbf6 in () at /usr/lib/libc.so.6
#4 0x00005555559bd557 in x86SetJ8(unsigned char*) (j8=j8@entry=0x5555a007b82c "")
at /tmp/makepkg/pcsx2/src/pcsx2/common/src/x86emitter/legacy.cpp:103
#5 0x00005555558cd276 in R5900::Dynarec::OpcodeImpl::COP1::DOUBLE::ToPS2FPU_Full(int, bool, int, bool, bool)
(reg=reg@entry=9, flags=flags@entry=true, absreg=absreg@entry=10, acc=acc@entry=false, addsub=addsub@entry=true) at /tmp/makepkg/pcsx2/src/pcsx2/pcsx2/x86/iFPUd.cpp:249
#6 0x00005555558ce3d8 in R5900::Dynarec::OpcodeImpl::COP1::DOUBLE::ToPS2FPU(int, bool, int, bool, bool)
(reg=reg@entry=9, flags=flags@entry=true, absreg=absreg@entry=10, acc=acc@entry=false, addsub=addsub@entry=true) at /tmp/makepkg/pcsx2/src/pcsx2/pcsx2/x86/iFPUd.cpp:259
#7 0x00005555558d789d in R5900::Dynarec::OpcodeImpl::COP1::DOUBLE::recFPUOp(int, int, int, bool)
(info=info@entry=32798, regd=regd@entry=0, op=op@entry=0, acc=acc@entry=false)
at /tmp/makepkg/pcsx2/src/pcsx2/pcsx2/x86/iFPUd.cpp:437
#8 0x00005555558d81d5 in R5900::Dynarec::OpcodeImpl::COP1::DOUBLE::recADD_S_xmm(int) (info=info@entry=32798)
at /tmp/makepkg/pcsx2/src/pcsx2/pcsx2/x86/iFPUd.cpp:451
#9 0x0000555555946778 in eeFPURecompileCode(void (*)(int), void (*)(), int)
(xmmcode=0x5555558d81ba <R5900::Dynarec::OpcodeImpl::COP1::DOUBLE::recADD_S_xmm(int)>, fpucode=<optimized out>, xmminfo=xmminfo@entry=208) at /tmp/makepkg/pcsx2/src/pcsx2/pcsx2/x86/ix86-32/iR5900Templates.cpp:722
#10 0x00005555558b1d3e in R5900::Dynarec::OpcodeImpl::COP1::recADD_S() ()
at /tmp/makepkg/pcsx2/src/pcsx2/pcsx2/x86/iFPU.cpp:610
#11 0x0000555555916ce1 in recompileNextInstruction(int) (delayslot=delayslot@entry=0)
at /tmp/makepkg/pcsx2/src/pcsx2/pcsx2/x86/ix86-32/iR5900-32.cpp:1394
#12 0x0000555555920c0b in recRecompile(u32) (startpc=3198120)
at /tmp/makepkg/pcsx2/src/pcsx2/pcsx2/x86/ix86-32/iR5900-32.cpp:2051
#13 0x00005555611b002b in eeRecDispatchers ()
#14 0x0000000000000000 in ()
(gdb) |
|
When you say
you mean the latest upstream version, right? EDIT:It works for me with the latest upstream. I tried on a Core-i7/Nvidia GTX machine and on a Ryzen 5/Radeon 5700 machine, both with Arch. Works. Actually, only kind of. 2 minutes into the game at the 2nd ledge (after the block you have to smash and the cave) the little helper thingies that are supposed to boost you up are misplaced. They're at the exit of the cave, one is even half in the ground, not at the ledge like here |
|
@rien333 I also tested this on my two laptops with Ubuntu and Mint. I cannot reproduce the issue on any of my 5 machines. Apparently, Dobie had the same issue with j8 boundaries broken. If you could let us know your compile instructions, please. Are you using any options for PCSX2? Hardware/OpenGL or SoftwareOpen/GL? This is the bug I get. The green elevator thingies are misplaced. Also a known & fixed bug from Dobie: |
When I said that I tried your x86_64 branch, I meant that I cloned the code from this github page. The 1.7.0 version(s) I tried corresponds to the last two iterations of the PKGBUILD in the arch community repo. There's is a line in each PKGBUILD that specifies the commit that was used.
Just the options listed in aforementioned PKGBUILDs. I have actually also build pcsx2 with slightly different, more vanilla options, but I haven't messed with every single configuration option used in the official PKGBUILDs. I'm pretty sure I've also tried to produce the most vanilla build possible (basically removing everything, except for
No special options, no. I've made sure to reset all my options to "factory settings" a few times (moving my config, and hitting "Clear all settings..." in the menu). I just tried softwareOpen/GL, but I get the same |
|
Do you know for sure that your iso works with the 32-bit version? Maybe you copied from a USB stick and forgot to sync or something like that. Ok, I will feed the build scripts you mentioned into extra-x86_64-build tonight, let's see if I can reproduce it then. I have included PCSX2 in an emulator bundle I provide. The version there was rebased last week, so it's pretty fresh. There is a PKGBUILD script available, if you want to try that. Yesterday I actually tested mainly with that version, maybe you noticed at the end of the video I posted. |
Nope, totally sure it worked with the 32bit flatpak version of PCSX2 I installed. Same BIOS as well. Perhaps there is a subtle, hidden problem with my OS environment that flatpak somehow avoids though (i.e. the new x86_64 code might not be at fault, but some other change in e.g. a system lib is). Then again, 3/4 of the other games I tried work perfectly fine.
I will give that a try now. Don't sweat it if you have no idea what's causing my issue. Perhaps a fix will come along eventually, or other users may run into similar problems when the x86_64 version of this emulator becomes more popular. Thanks for investigating this problem though! |
After a pretty long build process, I got an error somewhere along the final stages of trying to build your "marley" emulator frontend (all emulators compiled fine, I think). I could open an issue over at marley's github if you want. Here is the last part of the compilation log (I just ran Compilation log |
Yes, that would be great. It will build in a clean chroot. Try the extra-x86_64-build script from https://wiki.archlinux.org/index.php/DeveloperWiki:Building_in_a_clean_chroot Regarding the compile time, there are some notes in the readme.md about setting MAKEFLAGS. |
|
The compile options from the PKBUILD script you mentioned I tried first with the extra-x86_64-build script and then manually with Could you do a manual build per the above, but with |
|
It seems this branch have quite a few conflicts with master and it seems many of them are with Tellow's commits. What's the status of this PR w.r.t. to master at this point? I have a local commit related to cdvd but that part was recently largely changed in master, so I may need to rebase the branch to master to test it, but seemingly it's not possible at the moment with conflicts in codegen part as I don't have related knowledge to locally merge them. |
(I assume you mean Okay, I thought I tried this, but when following your instructions and calling cmake without any config options except |
The changes for the x64 port have been merged into master. So, basically, we are done here. The port seems to be stable. This thread here was more a means to bring everybody together and make the x64 bit port happen. The only actual changes that I'm aware of are some opcode define to replace magic numbers. Long story short, you need to rebase with master. It's all in there. @rien333 Nice to hear it starts now. Do you have the bug with the misplaced object? |
yes |
|
BTW: even though the cmake step in the arch linux build process did not change (which we established seemed to be part of the reason why Klonoa didn't launch), I got the game to work in the official Arch community build by disabling "Automatic game fixes" everywhere. Perhaps there is a problem in one of those fixes? And perhaps the reason why you had difficulty reproducing my issue was because your "automatic game fixes" settings differed? |
I left those on default. I actually tested Klonoa just yesterday, coincidence :-) It runs but the green elevator thingies are still misplaced. Can you confirm this bug? The version I tested with is from Sunday (0448b49). I'm rebasing PCSX2 for Marley, the cmake settings are: If you posted your cmake settings please one more time, I can try to reproduce this bug. |
|
@beaumanvienna |
Ah bummer. Still odd that the game launches fine if I turn them off.
Yeah, I also noticed that. Couldn't really progress beyond the first level. I checked out Klonoa again because I found someone with the exact same issue.. They were also unable to progress beyond the first level (with automatic game fixes off), and report the same
I'm running the precompiled binary from the arch repos. I think you also tried that binary at one point. @kattjevfel already linked you the PKGBUILD, I see (which contains the cmake settings that were used). I'll retry running Klonoa in Marley over the weekend. |
|
Let me finish the rebase first. I will let you know. Normally, WeirdBeard is maintaining the AUR. I had tried to connect @kattjevfel and WeirdBeard before, not sure if they ever met or not. Maybe there are two different PKGBUILDs, one from WeirdBeard (official) and the one you use? @tadanokojin @arcum42 I can submit the above changes for |
I did join the discord and talk with them, yes. But let's just say I don't agree with a lot of things there and I ended up leaving. I now only use the official package from the community repo, maintained by alucryd. |
|
Just curious. What didn't you agree with him about? |
I'm not using the package from the AUR. I also never have, because that builds a 32bit binary. As I said, I've been using the binary from the official community repo (the PKGBUILD used to create said binary was previously linked). The AUR is never official, I guess. As mentioned, there is indeed a difference between the PKGBUILD in the community repo and the one the AUR, though that doesn't matter much here, because the AUR package has always been configured to build a 32bit binary (which works fine for Klonoa, hence this issue). |
|
Ah, I'd heard the opposite. I heard it was the AUR repo that was the official package for pcsx2. Also I retesting that the other builds 64 bit. Seems a bit unstable for a package that's supposed to maintain from my understanding 1.6 only |
Not so much about him in person, but the community in general seemingly seeing 64-bit as a bother and would rather keep 32-bit forever. Perhaps a lot of it is sysadmin vs developer, but still.
Official in the eyes of arch, but unofficial in the eyes of pcsx2 devs. |
I have to say that community package indeed enables "experimental" options one would normally expect to find in an AUR package, but so far, I'm actually really happy with how it works. The options also make sense in the context of the rest of the arch eco system. Except that Klonoa has been a bit tricky to get running. |
|
@rien333 @arcum42 @kattjevfel @gregory38 I found a bug in the Debian rules file I use to package PCSX2 for Marley. It was adding compiler flags that would crash PCSX2 with "Illegal instruction". I could confirm this bug on four machines. I had to override/reset the MAKEFLAGS option with It might be the same with other packaging tools such as |
|
I wasn't following this in a long time but why did you close this @beaumanvienna? Is everything important merged? |
|
Everything is merged. This is mentioned in the first post of this thread all the way up. We kept this thread here open to collect feedback. It should be fine now. |
|
for the record I added a discussion about this: might be an idea to move there. |

Give it a try, guys! We would like to know if it runs on your 64-bit OS as well. Please checkout this here branch or @tellowkrinkle's JIT64 branch.
We are preparing Tellow's branch to be pulled in. The changes will be split into four PRs. Three out of four have been issued so far: #3512 #3523 #3524