Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GCC: remove `-fno-builtin` option #10322

Merged
merged 1 commit into from Apr 26, 2019

Conversation

Projects
None yet
10 participants
@kjbracey-arm
Copy link
Contributor

commented Apr 5, 2019

Description

Since the year dot GCC has been passed the -fno-builtin option, which eliminates all compiler knowledge of the C library, even down to basic stuff like memcpy or memset, potentially inhibiting quite a lot of
optimisations.

Remove the option to re-enable the optimisations.

There is no record in the source as to why the option is present - maybe we'll find out by trying to remove it. If necessary, it could be selectively turned back on for particular functions.

My attention was drawn to this when I saw the horrible compiler output for small structure atomics in #10274 - that really wants small memcpy to be easy.

Pull request type

[ ] Fix
[X] Refactor
[ ] Target update
[ ] Functionality change
[ ] Docs update
[ ] Test update
[ ] Breaking change
GCC: remove `-fno-builtin` option
Since the year dot GCC has been passed the `-fno-builtin` option, which
eliminates all compiler knowledge of the C library, even down to basic
stuff like `memcpy` or `memset`, potentially inhibiting quite a lot of
optimisations.

Remove the option to re-enable the optimisations.

There is no record in the source as to why the option is present - maybe
we'll find out by trying to remove it. If necessary, it could be
selectively turned back on for particular functions.

@kjbracey-arm kjbracey-arm force-pushed the kjbracey-arm:gcc_builtins branch from d74d668 to 3caa480 Apr 5, 2019

@ciarmcom

This comment has been minimized.

Copy link
Member

commented Apr 5, 2019

@kjbracey-arm, thank you for your changes.
@ARMmbed/mbed-os-tools @ARMmbed/mbed-os-maintainers please review.

@ciarmcom ciarmcom requested review from ARMmbed/mbed-os-maintainers Apr 5, 2019

@bridadan

This comment has been minimized.

Copy link
Contributor

commented Apr 5, 2019

I'll be honest, I don't really know the consequences of removing this (though I understood what you wrote @kjbracey-arm 😄). I think running this by @geky or @theotherjimmy would be worthwhile.

@TeroJaasko

This comment has been minimized.

Copy link
Contributor

commented Apr 10, 2019

On some codebases, where a memcpy() length parameter are known in build time, this saves quite a bit:

Source:

    if (data->available.pmin) {
        memcpy(&parameters->pmin, obs_data, sizeof(uint32_t));
        obs_data += sizeof(uint32_t);
    }
    if (data->available.pmax) {
        memcpy(&parameters->pmax, obs_data, sizeof(uint32_t));
        obs_data += sizeof(uint32_t);
    }
    if (data->available.gt) {
        memcpy(&parameters->gt, obs_data, sizeof(float));
        obs_data += sizeof(float);
    }

GCC, with "-fno-builtin"

  4c:	782b      	ldrb	r3, [r5, #0]
  4e:	0799      	lsls	r1, r3, #30
  50:	d505      	bpl.n	5e <read_obs_data+0x5e>
  52:	2204      	movs	r2, #4
  54:	4621      	mov	r1, r4
  56:	18b0      	adds	r0, r6, r2
  58:	f7ff fffe 	bl	0 <memcpy>
  5c:	3404      	adds	r4, #4
  5e:	782b      	ldrb	r3, [r5, #0]
  60:	075a      	lsls	r2, r3, #29
  62:	d506      	bpl.n	72 <read_obs_data+0x72>
  64:	4621      	mov	r1, r4
  66:	2204      	movs	r2, #4
  68:	f106 0008 	add.w	r0, r6, #8
  6c:	f7ff fffe 	bl	0 <memcpy>
  70:	3404      	adds	r4, #4
  72:	782b      	ldrb	r3, [r5, #0]
  74:	071b      	lsls	r3, r3, #28
  76:	d506      	bpl.n	86 <read_obs_data+0x86>
  78:	4621      	mov	r1, r4
  7a:	2204      	movs	r2, #4
  7c:	f106 000c 	add.w	r0, r6, #12
  80:	f7ff fffe 	bl	0 <memcpy>

vs

GCC, without "-fno-builtin"

  3e:	780a      	ldrb	r2, [r1, #0]
  40:	bf4c      	ite	mi
  42:	1d8b      	addmi	r3, r1, #6
  44:	1c8b      	addpl	r3, r1, #2
  46:	0792      	lsls	r2, r2, #30
  48:	bf44      	itt	mi
  4a:	f853 2b04 	ldrmi.w	r2, [r3], #4
  4e:	6042      	strmi	r2, [r0, #4]
  50:	780a      	ldrb	r2, [r1, #0]
  52:	0752      	lsls	r2, r2, #29
  54:	bf44      	itt	mi
  56:	f853 2b04 	ldrmi.w	r2, [r3], #4
  5a:	6082      	strmi	r2, [r0, #8]
  5c:	780a      	ldrb	r2, [r1, #0]
  5e:	0712      	lsls	r2, r2, #28
  60:	bf44      	itt	mi
  62:	f853 2b04 	ldrmi.w	r2, [r3], #4
  66:	60c2      	strmi	r2, [r0, #12]
@kjbracey-arm

This comment has been minimized.

Copy link
Contributor Author

commented Apr 10, 2019

I have seen the occasional pathological case, eg a 120(?)-byte memcpy of the error context in mbed_error ending up inlined as a dozen LDM/STM pairs. Perhaps it needs a bit more tuning for -Os.

In one case I was amused by the fact that GCC doesn't seem to think too hard about 16-bit versus 32-bit instructions, so its inline copy was something like

xxxx        LDM src!,{regs}    
xxxx        STM dst!,{regs}
       .. repeat multiple times
xxxx        LDM src!,{regs}    
xxxx        STM dst!,{regs}
xxxxxxxx    LDM src,{regs}
xxxxxxxx    STM dst,{regs}

With the last multiple load/store omitting the ! which writes back the incremented pointer - cos it doesn't need to increment any more, right? Except that omitting the ! makes the instruction a complicated 32-bit Thumb-2 one - 4 bytes wasted.

@kjbracey-arm

This comment has been minimized.

Copy link
Contributor Author

commented Apr 10, 2019

I guess omitting the '!' is a potentially benefit because it frees up a register - you could use the "src" register for data in that last transfer. To some extent you want the compiler and/or assembler to have a "don't care" at various stages for things like "update register" or "update flags", allowing it to maybe not decide until it matters, so that if you ended up not using the flags or extra register, the final generation could choose whichever was the smaller form.

Sadly no assembler has that on final output. While doing the assembler atomics I was annoyed I couldn't write ADD{S} %[newVal],%[oldVal],%[arg] with the S optional so it can choose the best encoding depending on registers and argument, on the basis that I don't care whether or not it sets the flags. I had to commit to either not putting the S on which would rule out 16-bit instructions, or putting the S on which rules out the 12-bit constant 32-bit ones.

@0xc0170 0xc0170 requested a review from ARMmbed/mbed-os-core Apr 11, 2019

@0xc0170

This comment has been minimized.

Copy link
Member

commented Apr 11, 2019

I recall this flag has been there always, no questions about its presence. Reading the docs, I would remove it as well. Looks fine to me

@ARMmbed/mbed-os-core Please review

@deepikabhavnani

This comment has been minimized.

Copy link
Contributor

commented Apr 11, 2019

  1. Should we keep -fno-builtin in debug profile? As per GCC docs, built-in code will be smaller and faster, but all builtin functions will replaced and not present so we cannot debug or set breakpoint in case of issues.

  2. I am not 100% sure of this flag's impact-on/relation-to re-target layer changes. List of builtin function is here: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#Other-Builtins. It will be good to cross check once

@kjbracey-arm

This comment has been minimized.

Copy link
Contributor Author

commented Apr 16, 2019

  1. Should we keep -fno-builtin in debug profile? As per GCC docs, built-in code will be smaller and faster, but all builtin functions will replaced and not present so we cannot debug or set breakpoint in case of issues.

Maybe? Personally, I've always assumed that the C library code is effectively inaccessible, so don't rely on breakpointing it. There's never any debugging info anyway for the C library anyway, so it's not easy to even look at parameters, and you definitely can't step through the C library at source level.

  1. I am not 100% sure of this flag's impact-on/relation-to re-target layer changes. List of builtin function is here: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#Other-Builtins. It will be good to cross check once

I've had a look, and I believe we're clear. The "built-in" stuff in the compiler is either (a) knowledge about what the function does, so it can infer things about the arguments it sees us passing, or (b) knowledge about how to do the function itself without help.

All the retargetting stuff is things that the compiler inherently can't do itself - we're the OS support, so it can't be having them as builtins.

There is a separate consideration about "wrapper" functions for things like malloc, but that's handled by separate options.

@cmonr cmonr added needs: CI and removed needs: review labels Apr 16, 2019

@mbed-ci

This comment has been minimized.

Copy link

commented Apr 17, 2019

Test run: FAILED

Summary: 3 of 7 test jobs failed
Build number : 1
Build artifacts

Failed test jobs:

  • jenkins-ci/mbed-os-ci_build-GCC_ARM
  • jenkins-ci/mbed-os-ci_mbed2-build-IAR
  • jenkins-ci/mbed-os-ci_build-IAR
@0xc0170

This comment has been minimized.

Copy link
Member

commented Apr 17, 2019

Aborted build, will restart after 5.12.2 jobs

@bulislaw
Copy link
Member

left a comment

Does anyone know why the memory diff check is not run? I'd like to confirm the "theory"

@adbridge

This comment has been minimized.

Copy link
Contributor

commented Apr 25, 2019

ci started

@mbed-ci

This comment has been minimized.

Copy link

commented Apr 25, 2019

Test run: SUCCESS

Summary: 11 of 11 test jobs passed
Build number : 2
Build artifacts

@adbridge adbridge added ready for merge and removed needs: CI labels Apr 26, 2019

@adbridge adbridge merged commit 1cd709f into ARMmbed:master Apr 26, 2019

26 checks passed

continuous-integration/jenkins/pr-head This commit looks good
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
jenkins-ci/build-ARM Success
Details
jenkins-ci/build-GCC_ARM Success
Details
jenkins-ci/build-IAR Success
Details
jenkins-ci/cloud-client-test Success
Details
jenkins-ci/dynamic-memory-usage Success
Details
jenkins-ci/exporter Success
Details
jenkins-ci/greentea-test Success
Details
jenkins-ci/mbed2-build-ARM Success
Details
jenkins-ci/mbed2-build-GCC_ARM Success
Details
jenkins-ci/mbed2-build-IAR Success
Details
jenkins-ci/unittests Success
Details
travis-ci/astyle Local astyle testing has passed
Details
travis-ci/docs Local docs testing has passed
Details
travis-ci/doxy-spellcheck Local doxy-spellcheck testing has passed
Details
travis-ci/events Passed, runtime is 9333 cycles (+164 cycles)
Details
travis-ci/gitattributestest Local gitattributestest testing has passed
Details
travis-ci/include_check Local include_check testing has passed
Details
travis-ci/licence_check Local licence_check testing has passed
Details
travis-ci/littlefs Passed, code size is 8408B (+0.00%)
Details
travis-ci/psa-autogen Local psa-autogen testing has passed
Details
travis-ci/tools-py2.7 Local tools-py2.7 testing has passed
Details
travis-ci/tools-py3.5 Local tools-py3.5 testing has passed
Details
travis-ci/tools-py3.6 Local tools-py3.6 testing has passed
Details
travis-ci/tools-py3.7 Local tools-py3.7 testing has passed
Details

@kjbracey-arm kjbracey-arm deleted the kjbracey-arm:gcc_builtins branch Jul 16, 2019

@kjbracey-arm kjbracey-arm referenced this pull request Jul 16, 2019

Open

Minimal printf #11051

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.