Turn on MICROPY_OPT_COMPUTED_GOTO for 5x CPU-bound speedup #1934

Merged: 5 commits, Jun 12, 2019

dhalbert
Collaborator

Turning this on gives about a 5x speedup (!) for CPU-bound code.

I expect the first submission of this PR to fail because it won't fit on some boards. I'll adjust with further commits.
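
For readers unfamiliar with the option: MICROPY_OPT_COMPUTED_GOTO makes the bytecode VM dispatch through a table of label addresses (a GCC extension) instead of a central switch statement. Below is a minimal sketch of the two dispatch styles. It uses a hypothetical four-opcode stack machine, not MicroPython's real bytecode, and every name in it is an illustrative assumption. It shows the shape of the technique only and does not reproduce the 5x measurement.

```c
#include <stdint.h>

/* Hypothetical toy opcodes; these are NOT MicroPython's bytecodes. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Baseline: every opcode returns to the single switch at the top of
 * the loop, so all dispatches share one indirect branch. */
int32_t run_switch(const uint8_t *ip) {
    int32_t stack[16];
    int32_t *sp = stack;
    for (;;) {
        switch (*ip++) {
            case OP_PUSH: *sp++ = (int8_t)*ip++; break;  /* operand follows opcode */
            case OP_ADD:  sp--; sp[-1] += sp[0]; break;
            case OP_MUL:  sp--; sp[-1] *= sp[0]; break;
            case OP_HALT: return sp[-1];
            default:      return -1;                     /* unknown opcode */
        }
    }
}

/* Computed goto: each handler ends by jumping directly to the next
 * handler via a table of label addresses (GCC "labels as values").
 * This sketch does no bounds check and assumes valid opcodes. */
int32_t run_goto(const uint8_t *ip) {
    static const void *const table[] = {
        &&do_push, &&do_add, &&do_mul, &&do_halt
    };
    int32_t stack[16];
    int32_t *sp = stack;

#define DISPATCH() goto *table[*ip++]
    DISPATCH();
do_push: *sp++ = (int8_t)*ip++; DISPATCH();
do_add:  sp--; sp[-1] += sp[0]; DISPATCH();
do_mul:  sp--; sp[-1] *= sp[0]; DISPATCH();
do_halt: return sp[-1];
#undef DISPATCH
}
```

Both functions compute the same result for the same program; only the dispatch differs. The jump table and the dispatch code repeated in every handler cost extra flash, which is consistent with the size concerns raised in this PR.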

@shazz

shazz commented Jun 12, 2019

Tested, works on the pygamer.

Member

@tannewt tannewt left a comment

I'm concerned that the inline knob (CFLAGS_INLINE_LIMIT) is breaking the code.

Here is a sample disassembly with the limit at 25:

(gdb) disassemble audioio_mixer_obj_get_sample_rate
Dump of assembler code for function audioio_mixer_obj_get_sample_rate:
   0x0000ead0 <+0>:	push	{r4, lr}
   0x0000ead2 <+2>:	movs	r4, r0
   0x0000ead4 <+4>:	ldr	r0, [r0, #4]
   0x0000ead6 <+6>:	negs	r3, r0
   0x0000ead8 <+8>:	adcs	r0, r3
   0x0000eada <+10>:	uxtb	r0, r0
   0x0000eadc <+12>:	bl	0xe364 <raise_error_if_deinited>
   0x0000eae0 <+16>:	movs	r0, #1
   0x0000eae2 <+18>:	ldr	r3, [r4, #20]
   0x0000eae4 <+20>:	lsls	r3, r3, #1
   0x0000eae6 <+22>:	orrs	r0, r3
   0x0000eae8 <+24>:	pop	{r4, pc}

With 23:

(gdb) disassemble audioio_mixer_obj_get_sample_rate
Dump of assembler code for function audioio_mixer_obj_get_sample_rate:
   0x0000e85e <+0>:	ldr	r3, [r0, #20]
   0x0000e860 <+2>:	movs	r0, #1
   0x0000e862 <+4>:	lsls	r3, r3, #1
   0x0000e864 <+6>:	orrs	r0, r3
   0x0000e866 <+8>:	bx	lr
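
Reading the two listings side by side: the 25 build calls raise_error_if_deinited before loading the value, while the 23 build contains no such call at all. Both end with the same sequence (shift left by one, OR with 1), which matches MicroPython's small-integer tagging of the return value. A minimal sketch of that tagging follows. The helper names are hypothetical stand-ins (the real macro is MP_OBJ_NEW_SMALL_INT), and treating the field at offset 20 as the sample rate is an assumption based on the function name.

```c
#include <stdint.h>

/* Hypothetical helpers illustrating small-int tagging; they are not
 * the actual MicroPython macros. */

/* Tag an integer as an object word: shift left one bit, set bit 0.
 * This corresponds to the lsls/orrs pair in the disassembly. */
uintptr_t new_small_int(intptr_t value) {
    return ((uintptr_t)value << 1) | 1;
}

/* Bit 0 set marks the word as a small int rather than a pointer. */
int is_small_int(uintptr_t obj) {
    return (obj & 1) != 0;
}

/* Recover the integer. Assumes arithmetic right shift for negative
 * values, which GCC provides. */
intptr_t small_int_value(uintptr_t obj) {
    return (intptr_t)obj >> 1;
}
```

For example, a sample rate of 22050 would be returned as the word 44101.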

@tannewt
Member

tannewt commented Jun 12, 2019

I did some size improvements here: tannewt@5399f02

@dhalbert
Collaborator Author

dhalbert commented Jun 12, 2019

That jogged my memory: I saw breakage a long time ago when I set the inline limit really low (#322). I tested briefly with 23, but it was really just a smoke test.

@dhalbert
Collaborator Author

USB_MIDI is turned on for all builds. We could turn it off for small builds. Saves 728 bytes.

I looked at the pinyin translation in hopes of trimming it a bit. It's not that verbose in terms of character count: it's the double-byte characters that really push it up.

@dhalbert
Collaborator Author

Using regular -Os instead of -O3 (SUPEROPT) for gc.c saves 2144 bytes.

Using regular -Os instead of -O3 (SUPEROPT) for vm.c only saves 524 bytes.

My inclination is to slow down gc for CIRCUITPY_SMALL_BUILD and/or for the overflowing builds, as you suggested. Then we can dial way back towards normal on CFLAGS_INLINE_LIMIT. I'll work on setting up compilation flags for this.

Member

@tannewt tannewt left a comment

Looks great! Thank you so much!

@tannewt tannewt merged commit a06a97e into adafruit:master Jun 12, 2019
@dhalbert dhalbert deleted the vm-computed-goto branch June 12, 2019 22:16