Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Force inline Timer::attach() to get rid of floating-point instructions #11236

Merged

Conversation

@hugueskamba
Copy link
Contributor

commented Aug 15, 2019

Description

Calls to Timer::attach() are inlined in order not to use floating-point
library functions calls given Timer::attach_us() expects an integer
for the callback interval.

Pull request type

[ ] Fix
[X] Refactor
[ ] Target update
[ ] Functionality change
[ ] Docs update
[ ] Test update
[ ] Breaking change

Reviewers

@evedon @gpsimenos @jamesbeyond

Release Notes

@ciarmcom ciarmcom requested review from evedon, gpsimenos, jamesbeyond and ARMmbed/mbed-os-maintainers Aug 15, 2019
@ciarmcom

This comment has been minimized.

drivers/Ticker.h Outdated Show resolved Hide resolved
hal/ticker_api.h Outdated Show resolved Hide resolved
@kjbracey-arm

This comment has been minimized.

Copy link
Contributor

commented Aug 15, 2019

Personally I'm not that strongly opposed to the float API existing, but there should be an integer one. We do have attach_us, but that's clunky if you just want 1s.

I'm a bit wary about making changes to any time-related APIs at this point, because we have just shifted to C++14.

That opens the door to doing all time-related APIs better using <chrono> - the type-safe time system with automatic resolution conversion. You could redefine (or overload) the APIs to take std::duration, so users can do

 ticker->attach(1s);
 ticker->attach(40ms);

That could work now, with ARMC6, GCC and IAR, but we don't have chrono for ARMC5 yet, and I don't want to totally break it.

But if we are making a change, I'd want the change to be to that, rather than some intermediate.

But given that attach(float) is actually inline anyway, and almost always used with a constant, is it actually ever really generating actual floating point instructions in practice?

attach(0.5) should be getting inlined to attach_us(500000) anyway.

@hugueskamba

This comment has been minimized.

Copy link
Contributor Author

commented Aug 15, 2019

But given that attach(float) is actually inline anyway, and almost always used with a constant, is it actually ever really generating actual floating point instructions in practice?

attach(0.5) should be getting inlined to attach_us(500000) anyway.

Build the example project mbed-os-example-tls/benchmark with no floating point in minimal printf and have a look at the ELF file. There are floating point function calls prefixed with __aeabi_. The example project calls attach in main.cpp. With this change the floating point function calls disappear.

I'll let @evedon comment on the rest of your comment.
Thanks

@kjbracey-arm

This comment has been minimized.

Copy link
Contributor

commented Aug 15, 2019

Which toolchain(s) and profiles were you looking at for the ELF?

@hugueskamba

This comment has been minimized.

Copy link
Contributor Author

commented Aug 15, 2019

Which toolchain(s) were you looking at for the ELF?

GCC_ARM

@evedon

This comment has been minimized.

Copy link
Contributor

commented Aug 15, 2019

Personally I'm not that strongly opposed to the float API existing, but there should be an integer one. We do have attach_us, but that's clunky if you just want 1s.

We are trying to remove usage of floats In mbed-os where it is not necessary. I think that this attach float API could be removed without losing functionality so it is a good candidate.

TESTS/mbed_drivers/lp_ticker/main.cpp Outdated Show resolved Hide resolved
TESTS/mbed_drivers/ticker/main.cpp Outdated Show resolved Hide resolved
@kjbracey-arm

This comment has been minimized.

Copy link
Contributor

commented Aug 16, 2019

We are trying to remove usage of floats In mbed-os where it is not necessary.

Removing unnecessary usage is good, but I'm a bit wary about outright yanking APIs.

In this case, the problem does seem to be arising due to reluctance of the compiler to inline attach. I see both GCC and ARMC6 being a bit reluctant. That's possibly due to the complexity of the Callback bit.

But if you add MBED_FORCEINLINE to attach, then that seems to make it go away - an attach(X, 0.5) does get inlined to attach_us(X, 500000), dropping the call to the FP library.

I'd be happy with something like that as an interim measure (pending a <chrono> API extension) , given that the vast majority of such calls take constants, thus that will be sufficient to eliminate the vast majority of FP.

For other non-time-related unnecessarily-float driver APIs, like ADC, I'd be a bit less reticent to make changes - it's just that time is an overall system, and I'm not keen on seeing piecemeal changes that are not in the direction I ultimately expect it to go in (<chrono>).

@kjbracey-arm

This comment has been minimized.

Copy link
Contributor

commented Aug 16, 2019

Still, I can live with attach_s. Neither that nor the deprecation of attach(float) gets in the way of a future attach(std::duration) overload.

(But I would like to deprecate all non-chrono C++ time APIs, includingattach_s and attach_us, at that time).

But do consider whether just adding the MBED_FORCEINLINE achieves a good enough effect without any user disruption.

@hugueskamba

This comment has been minimized.

Copy link
Contributor Author

commented Aug 16, 2019

Still, I can live with attach_s. Neither that nor the deprecation of attach(float) gets in the way of a future attach(std::duration) overload.

(But I would like to deprecate all non-chrono C++ time APIs, includingattach_s and attach_us, at that time).

But do consider whether just adding the MBED_FORCEINLINE achieves a good enough effect without any user disruption.

If indeed MBED_FORCEINLINE removes the FP calls I am in favour of this as it less code changes.

@hugueskamba hugueskamba changed the title Require integer to specify callback interval with Ticker API Force inline Timer::attach() to get rid of floating-point instruction Aug 16, 2019
@hugueskamba hugueskamba changed the title Force inline Timer::attach() to get rid of floating-point instruction Force inline Timer::attach() to get rid of floating-point instructions Aug 16, 2019
@evedon
evedon approved these changes Aug 16, 2019
Copy link
Contributor

left a comment

Nice solution indeed !

@kjbracey-arm

This comment has been minimized.

Copy link
Contributor

commented Aug 16, 2019

It is possible this has a negative effect of inlining an extra call to the Callback copy operator, making it a bit bigger than just calling attach_us directly. (Which may have been the reason for the compiler choosing not to inline).

A possible alternative would be to turn it into perfect-forwarding template:

template <typename F>
void attach(F&& func, float t)
{
    attach_us(std::forward<F>(func), t * 1000000.0f);
}

(That's the C++11 for "forward any F, preserving type/lifetime info by using an appropriate reference").

That version then might not actually need a MBED_FORCEINLINE, because the generated reference-using code should be simpler than the copy of a callback. It's definitely a better way of doing it than copying a Callback (which is not a trivial operation). It should be the case that the template version produces smaller code.

Only downside is binary compatibility - I don't think you could easily preserve the original attach for any pre-existing binaries that didn't inline it - the overloading with the template would go wrong. If you cared about that you'd have to have some macro magic to make the old method only visible for Ticker.cpp or something, so it could define it but no external code would see it.

@evedon

This comment has been minimized.

Copy link
Contributor

commented Aug 16, 2019

In light of the last comment, I would rather go back to the previous solution i.e. introduce a new attach_s API and deprecate attach(float).

@kjbracey-arm

This comment has been minimized.

Copy link
Contributor

commented Aug 16, 2019

Actually, just realised there is no binary compatibility issue - this is C++ inlining, so anyone using the old version non-inlined would have generated their own common out-of-line definition in their object file, they wouldn't be relying on us giving them one.

I was thinking in C99.

@hugueskamba

This comment has been minimized.

Copy link
Contributor Author

commented Aug 16, 2019

It is possible this has a negative effect of inlining an extra call to the Callback copy operator, making it a bit bigger than just calling attach_us directly. (Which may have been the reason for the compiler choosing not to inline).

A possible alternative would be to turn it into perfect-forwarding template:

template <typename F>
void attach(F&& func, float t)
{
    attach_us(std::forward<F>(func), t * 1000000.0f);
}

(That's the C++11 for "forward any F, preserving type/lifetime info by using an appropriate reference").

That version then might not actually need a MBED_FORCEINLINE, because the generated reference-using code should be simpler than the copy of a callback. It's definitely a better way of doing it than copying a Callback (which is not a trivial operation). It should be the case that the template version produces smaller code.

Only downside is binary compatibility - I don't think you could easily preserve the original attach for any pre-existing binaries that didn't inline it - the overloading with the template would go wrong. If you cared about that you'd have to have some macro magic to make the old method only visible for Ticker.cpp or something, so it could define it but no external code would see it.

@evedon @kjbracey-arm
Given the above, I used Perfect forwarding to avoid the possible heavy callback copy. The forced inlining is still needed as the compiler chose not to inline.
7df6160

@kjbracey-arm

This comment has been minimized.

Copy link
Contributor

commented Aug 16, 2019

Did you observe a size difference between those last two versions, at least?

drivers/Ticker.h Outdated Show resolved Hide resolved
@hugueskamba hugueskamba force-pushed the hugueskamba:hk-iotcore-1315-remove-floating-point-ticker branch from 1f5b4db to 3373d78 Aug 19, 2019
@hugueskamba

This comment has been minimized.

Copy link
Contributor Author

commented Aug 19, 2019

Force pushed after rebasing from master because of intermittent CI failure.

@mbed-ci

This comment has been minimized.

Copy link

commented Aug 19, 2019

Test run: FAILED

Summary: 2 of 4 test jobs failed
Build number : 2
Build artifacts

Failed test jobs:

  • jenkins-ci/mbed-os-ci_build-IAR
  • jenkins-ci/mbed-os-ci_build-GCC_ARM
@hugueskamba

This comment has been minimized.

Copy link
Contributor Author

commented Aug 19, 2019

CI re-started

@mbed-ci

This comment has been minimized.

Copy link

commented Aug 20, 2019

Test run: SUCCESS

Summary: 11 of 11 test jobs passed
Build number : 3
Build artifacts

@0xc0170

This comment has been minimized.

Copy link
Member

commented Aug 20, 2019

Add release notes, will be merged afterwards

@kjbracey-arm

This comment has been minimized.

Copy link
Contributor

commented Aug 20, 2019

I would say this isn't actually a functionality change - it's a refactor.

@hugueskamba

This comment has been minimized.

Copy link
Contributor Author

commented Aug 20, 2019

I would say this isn't actually a functionality change - it's a refactor.

@0xc0170 Indeed it is a refactor (description updated).

@0xc0170 0xc0170 merged commit 398515a into ARMmbed:master Aug 20, 2019
25 checks passed
25 checks passed
continuous-integration/jenkins/pr-head This commit looks good
Details
jenkins-ci/build-ARM Success
Details
jenkins-ci/build-GCC_ARM Success
Details
jenkins-ci/build-IAR Success
Details
jenkins-ci/cloud-client-test Success
Details
jenkins-ci/dynamic-memory-usage RTOS ROM(+0 bytes) RAM(+0 bytes)
Details
jenkins-ci/exporter Success
Details
jenkins-ci/greentea-test Success
Details
jenkins-ci/mbed2-build-ARM Success
Details
jenkins-ci/mbed2-build-GCC_ARM Success
Details
jenkins-ci/mbed2-build-IAR Success
Details
jenkins-ci/unittests Success
Details
travis-ci/astyle Success!
Details
travis-ci/docs Success!
Details
travis-ci/doxy-spellcheck Success!
Details
travis-ci/events Success! Runtime is 8629 cycles.
Details
travis-ci/gitattributestest Success!
Details
travis-ci/include_check Success!
Details
travis-ci/licence_check Success!
Details
travis-ci/littlefs Success! Code size is 8464B.
Details
travis-ci/psa-autogen Success!
Details
travis-ci/tools-py2.7 Success!
Details
travis-ci/tools-py3.5 Success!
Details
travis-ci/tools-py3.6 Success!
Details
travis-ci/tools-py3.7 Success!
Details
@hugueskamba hugueskamba deleted the hugueskamba:hk-iotcore-1315-remove-floating-point-ticker branch Sep 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.