Question: Have you taken a look at HQEMU? #95

Open
roybaer opened this issue Nov 29, 2020 · 7 comments

roybaer commented Nov 29, 2020

Given that Hangover's performance is reportedly mostly limited by QEMU, I would like to ask whether you have heard of HQEMU.

To quote from HQEMU's webpage:

> HQEMU is a retargetable and multi-threaded dynamic binary translator on multicores. It integrates QEMU and LLVM as its building blocks. The translator in the enhanced QEMU acts as a fast translator with low translation overhead. The optimization-intensive LLVM optimizer running on separate threads dynamically improves code for higher performance. With the hybrid QEMU+LLVM approach, HQEMU can achieve low translation overhead and good translated code quality.
>
> HQEMU supports process-level emulation and full-system virtualization. It provides translation modes of running the QEMU translator and LLVM optimizer in one process, or running the LLVM optimizer as a stand-alone optimization server (version 0.13.0).

I have not had a chance to try it out myself but the description sounds promising.

Owner

AndreRH commented Dec 5, 2020

See the discussions at #77 (comment) and #20 (comment).

Author

roybaer commented Dec 6, 2020

Interesting read.

In the meantime I have been able to rebase HQEMU's LLVM patches onto more recent LLVM versions with some manual intervention.
The full LLVM build process succeeds for the patched versions 7, 9, 10 and 11, while version 8 fails to build for unrelated reasons.
A successful build obviously does not mean that it still works, but I cannot really test it right now, because I do not have the relevant AArch64 hardware handy.

When it comes to HQEMU's additions and modifications to QEMU, it is probably easier to manually reapply them to a new QEMU.

Owner

AndreRH commented Dec 8, 2020

Could you please try to apply the QEMU changes onto our QEMU?

Author

roybaer commented Dec 9, 2020

I can try, but it's going to take a while.
Right now, HQEMU does not even compile with the updated patched LLVM, because of API changes.
Even if we rely on LLVM 6 only, the changes to the QEMU code base still amount to 2454 insertions and 331 deletions, not counting newly added files. We'd have to see how much QEMU has changed from version 2.5 to version 5.

Collaborator

stefand commented Dec 9, 2020

One conceptual problem with optimizing the generated ARM code is exception handling: it is difficult, if not impossible, to merge two x86 instructions into one ARM instruction (or to do any other less-than-1:1 matching). If there's an exception in an ARM instruction that doesn't clearly match an x86 instruction, QEMU can't properly report the exception location.

I don't know if HQEMU attempts to do an n:m optimization or if it attempts to do anything about signal handling in this case.
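
To make the concern concrete, here is a minimal Python sketch of the kind of lookup a translator needs when a host instruction faults. It is not QEMU's or HQEMU's actual data structure: the table layout, the `guest_pc_for` helper, and all offsets and addresses are made up for illustration. With a 1:1 translation every guest instruction owns its own host range; once two guest instructions are fused, only one guest PC can be recorded for the fused range.

```python
from bisect import bisect_right

def guest_pc_for(host_map, host_offset):
    """Return the guest PC recorded for the host range containing host_offset.

    host_map is a hypothetical list of (host_start_offset, guest_pc) pairs,
    sorted by host_start_offset.
    """
    starts = [start for start, _ in host_map]
    index = bisect_right(starts, host_offset) - 1
    if index < 0:
        raise ValueError("host offset precedes the translated block")
    return host_map[index][1]

# 1:1 translation: each guest instruction starts its own host range.
one_to_one = [(0, 0x401000), (8, 0x401003), (20, 0x401005)]

# Assumed n:m optimization: the first two guest instructions were fused,
# so host offsets 0..19 map back to a single guest PC.
fused = [(0, 0x401000), (20, 0x401005)]

fault_offset = 12  # hypothetical host offset of a faulting instruction
print(hex(guest_pc_for(one_to_one, fault_offset)))
print(hex(guest_pc_for(fused, fault_offset)))
```

In the 1:1 table the fault is attributed to the second guest instruction. In the fused table the same host offset can only be attributed to the first one, which is the wrong location if the second guest instruction caused the fault.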

Author

roybaer commented Dec 9, 2020

I somehow doubt that LLVM's optimizer is going to pay any attention to that.
It's probably going to be the typical speed vs. accuracy trade-off.
I get the impression, though, that the byte-exact location of an exception only really matters in combination with anti-debugger code.
HQEMU is apparently at least good enough to run Windows XP in full system emulation mode and the speedup is very desirable.

owlshrimp commented Jul 2, 2021

> I somehow doubt that LLVM's optimizer is going to pay any attention to that.
> It's probably going to be the typical speed vs. accuracy trade-off.
> I get the impression, though, that the byte-exact location of an exception only really matters in combination with anti-debugger code.
> HQEMU is apparently at least good enough to run Windows XP in full system emulation mode and the speedup is very desirable.

This could be highly problematic. A strong driving force behind WINE these days seems to be VALVe's Proton fork and its use in gaming on Linux, which has been quite technically successful. The games on their Steam platform were produced by various publishers for Windows, often several years ago. Many of them contain a large number of DRM measures over which VALVe has no control. If the emulation of x86 isn't accurate enough, particularly against anti-debugger code, then it would block the emulation of these games on non-x86 platforms.

I could see VALVe wanting to pursue this in the future (they have supposedly been working on a Nintendo Switch competitor, but have been forced to use a less power-efficient x86 mobile chip from AMD instead of an ARM chip from NVIDIA) so some future way of mitigating this is probably worth consideration.

Perhaps in the future regular checkpointing could be employed and more instruction-accurate emulation selected to roll forwards in the event of a (rare) exception?
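
A toy Python sketch of that idea, under heavy assumptions: the "guest" is a made-up two-instruction machine, the state is a small dict that is cheap to copy, and `run_fast` merely stands in for optimized code that cannot report which instruction faulted. None of these names come from QEMU or HQEMU, and a real emulator would also have to deal with memory and I/O side effects between checkpoints.

```python
import copy

class GuestFault(Exception):
    """Precise fault: carries the exact guest PC."""
    def __init__(self, pc):
        super().__init__(f"fault at guest pc {pc}")
        self.pc = pc

class ImpreciseFault(Exception):
    """Fault raised by the fast path, with no usable guest PC."""

def step(state, program):
    """Precise path: execute exactly one toy guest instruction."""
    op, arg = program[state["pc"]]
    if op == "add":
        state["acc"] += arg
    elif op == "div":
        if arg == 0:
            raise GuestFault(state["pc"])
        state["acc"] //= arg
    state["pc"] += 1

def run_fast(state, program, end):
    """Stand-in for optimized code: same results, but fault location is lost."""
    try:
        while state["pc"] < end:
            step(state, program)
    except GuestFault:
        raise ImpreciseFault("fault somewhere in optimized block") from None

def run_with_checkpoints(state, program, block_size=4):
    while state["pc"] < len(program):
        checkpoint = copy.deepcopy(state)  # snapshot before the fast block
        end = min(state["pc"] + block_size, len(program))
        try:
            run_fast(state, program, end)
        except ImpreciseFault:
            # Roll back, then replay one instruction at a time so the
            # fault is raised again with the exact guest PC.
            state.clear()
            state.update(checkpoint)
            while state["pc"] < end:
                step(state, program)
    return state

program = [("add", 10), ("div", 2), ("add", 1), ("div", 0), ("add", 5)]
state = {"pc": 0, "acc": 0}
try:
    run_with_checkpoints(state, program)
except GuestFault as fault:
    print("precise fault at guest pc", fault.pc, "with acc =", state["acc"])
```

The fast path pays only for one snapshot per block, and the slow, instruction-accurate path runs only after a fault. Whether snapshots can be made cheap enough in a real translator is an open question that this sketch does not answer.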


4 participants