Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of the Ingenic SoCs. Help needed. #81

Merged
merged 1 commit into from
Sep 21, 2022

Conversation

SiarheiVolkau
Copy link

With this PR GCC is able to use XBurst specific instructions LXW,
LXH[U] and LXB[U] to improve performance and reduce code size.

NOTE: the PR doesn't introduce new machine as it shall, thus
using patches in this PR outside the OpenDingux and Ingenic JZ SoCs.
However, it allows to not change any compilation flags in dependent
packages to achieve the goal.

By my observations it improves performance for about 4% when all
system and apps are rebuilt with improved GCC. Binaries size reduced for
about 1%.

This PR is WIP and aims to check it on all supported by OD devices.
So if someone can help to test OD on the supported devices, feel free to
post your questions and results here.

With these patches GCC is able to use XBurst specific instructions LXW,
LXH[U] and LXB[U] to improve performance and reduce code size.

WARNING: the patches doesn't introduce new machine as they shall, thus
using these patches outside of this repo is a risk. However, it allows
to not change any compilation flags in dependent packages to achieve the
goal.

By my observations they can improve performance for about 4% when all
system and apps are rebuilt with improved GCC. Binaries size reduced for
about 1%.

Signed-off-by: Siarhei Volkau <lis8215@gmail.com>
@pcercuei
Copy link
Member

pcercuei commented Sep 2, 2022

That's interesting. 4% perf increase sounds very nice!

One comment though. MXU instruction set needs to be enabled first by setting some bits in register xr16. But this is not done here?

What binutils / GCC versions are these patches for? I'm going to bump to binutils 2.38 / GCC 12 soon., so I hope it won't be a problem.

@SiarheiVolkau
Copy link
Author

MXU register set ins't used here, XBurst have 5 instructions to extend general MIPS32 instructions set: LXW, LXH, LXHU, LXB, LXBU. Using these ones by code generation that's the goal.
It based on GCC 11.1 Binutils 2.37, however the patches shall apply to newer GCC/Binutils because there were no changes in MIPS subtrees for a decade :) .

@SiarheiVolkau
Copy link
Author

Lepus fault is not related to the PR.
This is the log last lines:

2022-09-03T00:38:35.3436665Z >>> umtprd 1.6.2 Downloading
2022-09-03T00:38:35.3553942Z wget --passive-ftp -nd -t 3 -O '/__w/buildroot/buildroot/output/lepus/build/.umtprd-1.6.2.tar.gz.d02LKj/output' 'https://github.com/viveris/uMTP-Responder/archive/umtprd-1.6.2.tar.gz'
2022-09-03T00:38:35.3579097Z --2022-09-03 00:38:35-- https://github.com/viveris/uMTP-Responder/archive/umtprd-1.6.2.tar.gz
2022-09-03T00:38:35.3632893Z Resolving github.com (github.com)... 140.82.114.4
2022-09-03T00:38:35.4001867Z Connecting to github.com (github.com)|140.82.114.4|:443... connected.
2022-09-03T05:12:48.9524049Z HTTP request sent, awaiting response...
2022-09-03T05:12:48.9574192Z ##[error]The operation was canceled.

Seems like network fault or so: 4.5 hours of downloading umtprd ...

@pcercuei
Copy link
Member

pcercuei commented Sep 3, 2022

These are MXU instructions in my book, they are described in my MXU PDF document among regular MXU instructions. But it's true that they don't access any MXU register.

@pcercuei
Copy link
Member

pcercuei commented Sep 3, 2022

It all builds now.

What did you use to measure the performance difference?

@SiarheiVolkau
Copy link
Author

Well, actually I just observe the FPS in fceux :) , I think its not very good for precise measuring because there were +1-2 FPS benefit.

As of now, checking stability is most important I think, because I tested it only on one hardware platform, jz4725b based.
It will be good to test on all devices supported by OpenDingux.

@pcercuei
Copy link
Member

pcercuei commented Sep 3, 2022

As long as there are no regressions, I'm fine with it.

Are you going to attempt to upstream those patches?

I can confirm it works fine on the JZ4725B, JZ4760B and JZ4770 SoCs.

@SiarheiVolkau
Copy link
Author

Dou you mean upstream to GCC/Binutils or OpenDingux?

@pcercuei
Copy link
Member

pcercuei commented Sep 3, 2022

Upstream to GCC / Binutils.

@SiarheiVolkau
Copy link
Author

Well, actually I have no time to make it suitable for upstream, maybe in future.

@citral23
Copy link

citral23 commented Sep 6, 2022

@pcercuei
Copy link
Member

pcercuei commented Sep 6, 2022

@SiarheiVolkau Looking at the binutils patch right now, I wonder if the opcode formats could be made better. Opcodes which do I/O generally use parentheses.
What about: lxw $t0, ($t1, $t2, 1)?

edit: Actually might not be a good idea - the instruction formats are specified in the official documentation, so we kind of need to follow it.

@SiarheiVolkau
Copy link
Author

SiarheiVolkau commented Sep 6, 2022

I/O generally use parentheses.

I tried to keep these commands to look like in official papers, because it we modify them it will lead to confusion of people who want to dig into this. Also it produce incompatibilities with Ingenic's toolchain and assembler macro header you use already.

@pcercuei
Copy link
Member

pcercuei commented Sep 6, 2022

According to this document: https://www.sccs.swarthmore.edu/users/16/mmcconv1/jz-simd-docs.pdf
The LX* instructions are MXU rev.2 only, which means they won't work on anything older than the JZ4725B, including the JZ4740.

I have a version of your binutils patch that allows enabling MXU conditionally by passing -mmxu. I'm trying to update it with the rest of the MXU instructions.

@pcercuei
Copy link
Member

@SiarheiVolkau if you make the PR "ready for review" I can merge it.

I still want to have this merged upstream at some point, but in the meantime we can have a local patch in our buildroot repo.

@SiarheiVolkau
Copy link
Author

What about JZ4740? Do you support it in OD?

@pcercuei
Copy link
Member

@SiarheiVolkau no, JZ4740 is not supported right now in the current OpenDingux.

@SiarheiVolkau SiarheiVolkau marked this pull request as ready for review September 20, 2022 12:26
@pcercuei
Copy link
Member

Merging, thanks!

@pcercuei pcercuei merged commit b7b803b into OpenDingux:opendingux Sep 21, 2022
@citral23
Copy link

citral23 commented Oct 5, 2022

@SiarheiVolkau this is amazing at os level, 27% reduction in the time needed to sha512sum a 475MB file. Thank you!

@pcercuei
Copy link
Member

@SiarheiVolkau this seems to break a few OPKs. Notably, UMG (which is closed-source, so not recompiled), and I think it affects ScummVM as well.

It gets a "illegal instruction" error, which happens within libmodplug. Here's what GDB says, with some context:

   0x77c5734c <+1256>:	lw	v0,3560(sp)
   0x77c57350 <+1260>:	addu	v1,v0,s5
   0x77c57354 <+1264>:	addiu	v0,v1,2
   0x77c57358 <+1268>:	sltu	a0,v0,s1
   0x77c5735c <+1272>:	beqz	a0,0x77c57378 <_ZN10CSoundFile6ReadITEPKhj+1300>
   0x77c57360 <+1276>:	lw	gp,24(sp)
=> 0x77c57364 <+1280>:	0x70701968
   0x77c57368 <+1284>:	sll	v1,v1,0x3
   0x77c5736c <+1288>:	addu	v1,v1,v0
   0x77c57370 <+1292>:	sltu	a0,v1,s1
   0x77c57374 <+1296>:	movz	v1,v0,a0
   0x77c57378 <+1300>:	lui	v0,0x1

The offending opcode (0x70701968) seems to be lxhu v1, v1, s0, 0.
The opcode itself is fine, I verified it with a test program.

However, as you can see v1 + s0 is not half-word aligned:

(gdb) i r
          zero       at       v0       v1       a0       a1       a2       a3
 R0   00000000 00000001 00000249 00000247 00000001 7758824f 00000000 7f904e98 
            t0       t1       t2       t3       t4       t5       t6       t7
 R8   00005174 000052ea 0000549f 00005651 00005745 00005839 00005936 00005b15 
            s0       s1       s2       s3       s4       s5       s6       s7
 R16  77588008 0007726a 0053ddd0 7f905188 7f904dc8 00000177 0000004d 00000011 
            t8       t9       k0       k1       gp       sp       s8       ra
 R24  00000010 7f904d78 77beb680 00000000 77c52890 7f904da8 00000007 77c1034c 
        status       lo       hi badvaddr    cause       pc
      00000413 0001d23e 0000022b 7758824f 00800010 77c10364 
          fcsr      fir  restart
      00800004 00330000 00000000 

What I don't know, is where the bug is.

  • GCC should only use this instruction if it knows that s0 + v1 result in an aligned read, is it the case?
  • Maybe libmodplug does an unaligned read, but then I assume it would have crashed without this patch as well (as unaligned accesses are done with LWL/LWR on MIPS)

@SiarheiVolkau
Copy link
Author

It might be because mips32r2 has unaligned access which spread to LX commands too, although seems like LX still have to be aligned. Will check how to keep alignment.

Try to compile it as mips32r1, as a workaround, just to make sure it is.

@mthuurne
Copy link
Member

It's weird though that it results in "illegal instruction" rather than "bus error", as that is the usual error you get on unaligned memory access. Is it possible the kernel-mode emulation of unaligned reads is enabled and it doesn't know how to emulate this new instruction?

@pcercuei
Copy link
Member

That's very possible, yes.

@SiarheiVolkau
Copy link
Author

@pcercuei @citral23 could you please review and check OpenDingux/linux#14
it might resolve the issue with unaligned access.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants