Improve performance of the Ingenic SoCs. Help needed. #81
With these patches GCC is able to use the XBurst-specific instructions LXW, LXH[U] and LXB[U] to improve performance and reduce code size. WARNING: the patches don't introduce a new machine type as they should, so using them outside of this repo is risky. However, this avoids having to change any compilation flags in dependent packages to achieve the goal. By my observations, performance improves by about 4% when the whole system and all apps are rebuilt with the improved GCC, and binary size is reduced by about 1%. Signed-off-by: Siarhei Volkau <lis8215@gmail.com>
That's interesting. A 4% performance increase sounds very nice! One comment, though: the MXU instruction set needs to be enabled first by setting some bits in register xr16. Is this not done here? What binutils / GCC versions are these patches for? I'm going to bump to binutils 2.38 / GCC 12 soon, so I hope it won't be a problem.
The MXU register set isn't used here. XBurst has 5 instructions that extend the general MIPS32 instruction set: LXW, LXH, LXHU, LXB and LXBU. The goal is to have the code generator use them.
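To make the goal concrete, here is a small C model of what these loads compute. It assumes the semantics rd = Mem[rs + (rt << strd2)] with strd2 in 0..2, as I read the MXU documentation linked later in this thread; please double-check there. The model_lx* helpers and get_elem are hypothetical names for illustration only: real code never calls such helpers, it just indexes arrays and lets the patched GCC pick the instruction.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Effective address of an XBurst indexed load, ASSUMING
 * addr = rs + (rt << strd2) with strd2 in 0..2. */
static const void *lx_addr(const void *rs, uint32_t rt, unsigned strd2)
{
    return (const uint8_t *)rs + ((size_t)rt << strd2);
}

/* LXW: 32-bit load (hypothetical model, not a real API). */
int32_t model_lxw(const void *rs, uint32_t rt, unsigned strd2)
{
    int32_t v;
    memcpy(&v, lx_addr(rs, rt, strd2), sizeof v);
    return v;
}

/* LXH: 16-bit load, sign-extended to 32 bits. */
int32_t model_lxh(const void *rs, uint32_t rt, unsigned strd2)
{
    int16_t v;
    memcpy(&v, lx_addr(rs, rt, strd2), sizeof v);
    return v;
}

/* LXHU: 16-bit load, zero-extended to 32 bits. */
uint32_t model_lxhu(const void *rs, uint32_t rt, unsigned strd2)
{
    uint16_t v;
    memcpy(&v, lx_addr(rs, rt, strd2), sizeof v);
    return v;
}

/* LXB: 8-bit load, sign-extended. */
int32_t model_lxb(const void *rs, uint32_t rt, unsigned strd2)
{
    int8_t v;
    memcpy(&v, lx_addr(rs, rt, strd2), sizeof v);
    return v;
}

/* LXBU: 8-bit load, zero-extended. */
uint32_t model_lxbu(const void *rs, uint32_t rt, unsigned strd2)
{
    uint8_t v;
    memcpy(&v, lx_addr(rs, rt, strd2), sizeof v);
    return v;
}

/* The pattern the patches target: a scaled array index. Plain MIPS32
 * typically needs a shift, an add and an LW here; with the patches GCC
 * can presumably fold this into one LXW with strd2 = 2. */
int32_t get_elem(const int32_t *a, uint32_t i)
{
    return a[i];
}
```

Saving the shift and add on every indexed access is a plausible source of both the speedup and the roughly 1% code-size reduction reported above.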
The Lepus failure is not related to the PR. 2022-09-03T00:38:35.3436665Z >>> umtprd 1.6.2 Downloading. It looks like a network fault or similar: 4.5 hours spent downloading umtprd...
These are MXU instructions in my book; they are described in my MXU PDF document among the regular MXU instructions. But it's true that they don't access any MXU register.
It all builds now. What did you use to measure the performance difference?
Well, actually I just watched the FPS in fceux :) I don't think that's very good for precise measurement, since the benefit was only +1-2 FPS. For now, checking stability is the most important thing, I think, because I tested it on only one hardware platform, based on the JZ4725B.
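If FPS counters are too coarse, a small micro-benchmark built once with the stock toolchain and once with the patched one gives a more repeatable number. The sketch below times a loop of scaled-index loads, which is the pattern the patches target. run_bench and its parameters are hypothetical, and whether GCC actually emits LXW for this loop is an assumption worth confirming with objdump -d on the resulting binary.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Sum a table through an index array: both table[...] and idx[i] are
 * scaled-index loads, i.e. candidates for LXW/LXHU (assumption). */
static uint32_t indexed_sum(const uint32_t *table, const uint16_t *idx,
                            size_t n, unsigned rounds)
{
    uint32_t sum = 0;
    for (unsigned r = 0; r < rounds; r++)
        for (size_t i = 0; i < n; i++)
            sum += table[idx[i]];
    return sum;
}

/* Monotonic wall-clock time in seconds. */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Time the kernel once. n must be at most 65536 because indices are
 * stored as uint16_t. Returns a checksum so that builds from both
 * toolchains can also be compared for correctness. */
uint32_t run_bench(size_t n, unsigned rounds, double *seconds)
{
    *seconds = 0.0;
    uint32_t *table = malloc(n * sizeof *table);
    uint16_t *idx = malloc(n * sizeof *idx);
    if (!table || !idx) {
        free(table);
        free(idx);
        return 0;
    }
    for (size_t i = 0; i < n; i++) {
        table[i] = (uint32_t)i;
        idx[i] = (uint16_t)((i * 7u) % n); /* scattered access order */
    }
    double t0 = now_sec();
    uint32_t sum = indexed_sum(table, idx, n, rounds);
    *seconds = now_sec() - t0;
    free(table);
    free(idx);
    return sum;
}
```

Call run_bench from a small main on the device and compare the reported seconds between the two builds; the checksum should be identical for both. Repeating each run a few times helps separate a few-percent difference from noise.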
As long as there are no regressions, I'm fine with it. Are you going to attempt to upstream those patches? I can confirm it works fine on the JZ4725B, JZ4760B and JZ4770 SoCs.
Do you mean upstream to GCC/Binutils or to OpenDingux?
Upstream to GCC / Binutils.
Well, I don't have time to make it suitable for upstream right now; maybe in the future.
Benchmarks are looking very good so far: https://docs.google.com/spreadsheets/d/1OoYdmaKMBIcRDYXHztjPSX-ZO_ed37rY3uku2Q35AIk/edit?usp=sharing
@SiarheiVolkau Looking at the binutils patch right now, I wonder if the opcode formats could be made better. Opcodes that do I/O generally use parentheses. Edit: actually, that might not be a good idea. The instruction formats are specified in the official documentation, so we kind of need to follow it.
I tried to keep these instructions looking the way they do in the official papers, because modifying them would confuse people who want to dig into this. It would also create incompatibilities with Ingenic's toolchain and with the assembler macro header you already use.
According to this document: https://www.sccs.swarthmore.edu/users/16/mmcconv1/jz-simd-docs.pdf I have a version of your binutils patch that allows enabling MXU conditionally by passing -mmxu. I'm trying to update it with the rest of the MXU instructions.
@SiarheiVolkau if you mark the PR "ready for review" I can merge it. I still want to have this merged upstream at some point, but in the meantime we can have a local patch in our buildroot repo.
What about JZ4740? Do you support it in OD?
@SiarheiVolkau no, JZ4740 is not supported right now in the current OpenDingux.
Merging, thanks!
@SiarheiVolkau this is amazing at the OS level: a 27% reduction in the time needed to sha512sum a 475 MB file. Thank you!
@SiarheiVolkau this seems to break a few OPKs. Notably UMG (which is closed-source, so not recompiled), and I think it affects ScummVM as well. It gets an "illegal instruction" error, which happens within libmodplug. Here's what GDB says, with some context:
The offending opcode ( However, as you can see
What I don't know, is where the bug is.
It might be because mips32r2 allows unaligned access and that spread to the LX instructions too, although it seems the LX instructions still require alignment. I will check how to keep the accesses aligned. As a workaround, try compiling it as mips32r1, just to make sure this is the cause.
It's weird, though, that it results in "illegal instruction" rather than "bus error", as the latter is the usual error you get on unaligned memory access. Is it possible that kernel-mode emulation of unaligned reads is enabled and it doesn't know how to emulate this new instruction?
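One way to reason about the alignment hypothesis: scaling the index by the access size keeps the effective address aligned only if the base is already aligned. The sketch below encodes that check, assuming (as suggested above, not confirmed) that LXW and LXH[U] require natural alignment and compute rs + (rt << strd2). lx_access_aligned is a hypothetical helper, not part of any kernel or toolchain API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: LXW / LXH[U] trap unless the effective address
 * rs + (rt << strd2) is a multiple of the access size.
 * access_size must be 1, 2 or 4 (LXB[U], LXH[U], LXW). */
bool lx_access_aligned(uintptr_t rs, uint32_t rt, unsigned strd2,
                       size_t access_size)
{
    uintptr_t ea = rs + ((uintptr_t)rt << strd2);
    /* Power-of-two size: aligned iff the low bits are all zero. */
    return (ea & (uintptr_t)(access_size - 1)) == 0;
}
```

For example, a 32-bit read from an odd offset in a byte buffer stays misaligned for every index, so under this assumption it would trap no matter how the compiler scales the index, while byte loads (access_size 1) are always aligned.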
That's very possible, yes.
@pcercuei @citral23 could you please review and check OpenDingux/linux#14
With this PR GCC is able to use the XBurst-specific instructions LXW,
LXH[U] and LXB[U] to improve performance and reduce code size.
NOTE: the PR doesn't introduce a new machine type as it should, so
using the patches in this PR outside of OpenDingux and Ingenic JZ SoCs
is risky. However, this avoids having to change any compilation flags
in dependent packages to achieve the goal.
By my observations, performance improves by about 4% when the whole
system and all apps are rebuilt with the improved GCC. Binary size is
reduced by about 1%.
This PR is a WIP; the aim is to test it on all devices supported by OD.
So if someone can help test OD on the supported devices, feel free to
post your questions and results here.