Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BLE API devirtualization #9727

Merged
merged 26 commits into from Mar 1, 2019

Conversation

@pan-
Copy link
Member

commented Feb 14, 2019

Description

This submission replace runtime polymorphism by compile time polymorphism in BLE main APIs, generic and pal layer.

Removing virtual functions allows the linker to remove functions that are not used by application code. Here are some numbers for the BLE layer on NRF52840 compiled with GCC in release mode.

Application Original Devirtualized Gain
BLE_BatteryLevel 106447 78307 28140
BLE_Beacon 105975 75369 30606
BLE_Button 106443 78303 28140
BLE_GAP 107112 79456 27656
BLE_GAPButton 106673 76067 30606
BLE_GattClient 106523 81433 25090
BLE_GattServer 106089 77433 28656
BLE_HeartRate 106477 78335 28142
BLE_LED 106443 77491 28952
BLE_LEDBlinker 105800 81524 24276
BLE_PeriodicAdvertising 106794 81774 25020
BLE_Privacy 106162 83000 23162
BLE_SM 106202 83364 22838
BLE_Thermometer 106513 78373 28140
TestApplication 109151 101335 7816

Even in our test application that uses all functions defined by the BLE API, we save around 7% of code.

More importantly this construct is the foundation for efficient compile time configuration (With both optimisation, the beacon example shrink to ~30K of code).

Principle

Compile time polymorphism is achieved with the Curiously Recurring Template Pattern (CRTP). In a nutshell, every interface definition is now a class template that accepts its implementation as parameter. Interface APIs call the (hidden) implementation defined in the derived class. If the function is not defined in the derived class then it fallbacks to a stub defined in the interface.

template <class Impl>
struct Interface {
    void foo() {
        // call to the implementation of foo
        static_cast<Impl*>(this)->foo_();
    }

protected:
    // convention is to use the suffix _ to name an implementation 
    void foo_() {
        printf("foo() default implementation");
    }
};

class Implementation : public Interface<Implementation> {
protected:
    void foo_() {
        printf("foo() derived implementation");
    }
};


class Stub : public Interface<Stub> { };

Implementation imp; 
Stub stub; 

imp.foo(); // call Implementation::foo_
stub.foo(); // call Interface::foo_

Backward compatibility

User facing API prototypes are now templates defined in the namespace ble::interface:

  • ::Gap -> ble::interface::LegacyGap
  • ::ble::Gap -> ble::interface::Gap
  • ::GattClient -> ble::interface::GattClient
  • ::GattServer -> ble::interface::GattServer
  • ::SecurityManager -> ble::interface::SecurityManager

It is expected that an implementation of ble forward the final implementation types in a header named BleImplementationForward.h and in the namespace ble::impl:

  • ble::impl::LegacyGap -> ble::interface::LegacyGap<LegacyGapImpl>
  • ble::impl::Gap -> ble::interface::Gap<GapImpl>
  • ble::impl::GattClient -> ble::interface::GattClient<GattClientImpl>
  • ble::impl::GattServer -> ble::interface::GattServer<GattServerImpl>
  • ble::impl::SecurityManager -> ble::interface::SecurityManager<SecurityManagerImpl>

These final types are then imported into the global namespace to expose types similar to the one we've been using for years. Note that to not leak information from the implementation class it is important to export the interface and not the implementation:

namespace ble {
namespace impl {

// Correct
typedef ble::interface::SecurityManager<SecurityManagerImpl> SecurityManager;

// Incorrect
//typedef SecurityManagerImpl SecurityManager;

} // impl  
} // ble 
Instantiation

To prevent expensive instantiation of template types (and horrible inclusion pattern for the generic layer ... ), function body of the template interfaces are not defined in headers files. Instead these are defined in .tpp files. It is expected from an implementation to instantiate the types it export in a cpp file:

#include "ble/GattClient.h"
#include "source/GattClient.tpp"
#include "GattClientImplementation.h" // include vendor implementation. 

// instantiate GattClient 
template class ble::interface::GattClient<GattClientImplementation>;
Out of scope

The following items would improve this submission but are out of scope for now:

  • Devirtualization of ::BLEInstanceBase.
  • Extract non dependent types in interfaces.
TODO
  • Squash history

Pull request type

[ ] Fix
[X] Refactor
[ ] Target update
[ ] Functionality change
[ ] Docs update
[ ] Test update
[ ] Breaking change

Reviewers

@kjbracey-arm @paul-szczepanek-arm @donatieng

@pan-

This comment has been minimized.

Copy link
Member Author

commented Feb 14, 2019

@mattbrown015 You might be interested by this.

@ciarmcom ciarmcom requested review from donatieng, kjbracey-arm, paul-szczepanek-arm and ARMmbed/mbed-os-maintainers Feb 14, 2019

@ciarmcom

This comment has been minimized.

Copy link
Member

commented Feb 14, 2019

@bridadan
Copy link
Contributor

left a comment

Boy, that was an easy tools review!

Be sure to add this to the list of supported file types in the docs (you can follow my similar pr here: ARMmbed/mbed-os-5-docs#952).

@@ -413,6 +413,7 @@ def add_directory(
".h": FileType.HEADER,
".hh": FileType.HEADER,
".hpp": FileType.HEADER,
".tpp": FileType.HEADER,

This comment has been minimized.

Copy link
@cmonr

cmonr Feb 15, 2019

Contributor

I remember a day when header files ended with a .h, and we liked it.

This comment has been minimized.

Copy link
@pan-

pan- Feb 15, 2019

Author Member

Do you ? Headers of the C++ library, a 35 years old language, don't have extensions. In C, X macros and friends have been traditionally put in files that don't end with .h too.
If the .h extension as not been standardised at the language level it is because other extensions predate standardisation and these use cases are legitimate.

This comment has been minimized.

Copy link
@kjbracey-arm

kjbracey-arm Feb 15, 2019

Contributor

Counter-pedant alert, and primed for counter-counter-pedantry.

Technically, the C++ standard doesn't say anything about file extensions. The standard headers are included via #include <atomic> or whatever in the C++ source, but they could perfectly well be stored on disc as atomic.hh or iso_atomic.hpp or just built in to the compiler. Mechanism could be totally separate to whatever the #include "xxx" does for user headers. (Similarly #include "myheader.h" might not actually find the file myheader.h. On RISC OS it would find h.myheader, with . being the directory separator).

I actually find it a bit annoying that vendors chose to drop suffixes from the C++ std filenames. Having no extension at all is quite unhelpful for most other tools.

Anyway, no strong feelings about the separate prefix here. I also prefer to have special prefixes for things that you only want #included with proper care, but it's up to tools folks to object. This one's a bit borderline, because nothing would actually break if randomly included, right? Would just slow down the build.

This comment has been minimized.

Copy link
@0xc0170

0xc0170 Feb 15, 2019

Member

@pan- There's PR to add .inc files, wouldnt this extension cover the use case for .tpp ?

This comment has been minimized.

Copy link
@pan-

pan- Feb 15, 2019

Author Member

This one's a bit borderline, because nothing would actually break if randomly included, right?

@kjbracey-arm There's no header guards .tpp files at the moment. With header guards nothing would break.

@0xc0170 I choose tpp as an extension because it is more descriptive than h or hpp (and avoid clashes with actual header files). Any extension can be use. What is expected in inc files ?

This comment has been minimized.

Copy link
@0xc0170

0xc0170 Feb 15, 2019

Member

I don't think we documented this - only extensions and purpose, without "what should be there".

inc are used currently for some autogenerated headers in TFM code.

It might be better if this extension addition is sent via separate PR where we can discuss this and alos update docs (see #9715)

This comment has been minimized.

Copy link
@bridadan

bridadan Feb 15, 2019

Contributor

From a tools perspective, I have no preference on the extensions. We can adapt to what you need.

The .inc extension is being added because the TF-M work is importing an external project into Mbed OS and the external project use .inc files. So we're simplifying the import process for that team.

This comment has been minimized.

Copy link
@sg-

sg- Feb 27, 2019

Member

Does *.tpp work with other builders ie: MDK, DS-5, IAR, etc... or will this break our exporters? How is wrapping this in pragma once or ifdef... not good enough to keep the code base consistent? Apologies if this was already considered, the thread is quite long and I didn't read it all.

This comment has been minimized.

Copy link
@pan-

pan- Feb 27, 2019

Author Member

I haven't tried for all exporters (but I can!). In this case, *.tpp files embed template class implementations and these are instantiated only once in a cpp file dedicated to the instantiation of the implementation type that contains all the required implementations.

These files can't be named .cpp as these are not compilation unit and cannot used the sole suffix .h as there already is a header containing the declaration so we would end up with two headers with the same name.

Suffix for template implementation really is just a convention as long as it respects the two requirements mentioned above. I started out with the suffix .impl.h and am happy to change the suffix to whatever is decided.


Note that it would be possible to include everything in header files (I tried) at the expense of weird include patterns, longer compilation time and random breakage requiring reordering if things get moved around. So instead I tried to keep template implementation files consistent with the way we write source implementation files (.cpp) beside the name.

This comment has been minimized.

Copy link
@sg-

sg- Mar 1, 2019

Member

Thanks @pan- it would be good to confirm the exporters are working as well as extend the tests for exporters to exercise the file type support to match what the os is capable of building. @bridadan

typedef generic::GenericSecurityManager<
pal::vendor::cordio::CordioSecurityManager,
vendor::cordio::SigningEventMonitor
> GenericSecurityManagerImpl;

This comment has been minimized.

Copy link
@paul-szczepanek-arm

paul-szczepanek-arm Feb 15, 2019

Member

why both namespace "impl" and "Impl" suffix

This comment has been minimized.

Copy link
@pan-

pan- Feb 15, 2019

Author Member

I've added some information in the description; basically there is implementation classes and implementation interfaces. In that case GenericSecurityManagerImpl is the implementation class. The interface imported in the global namespace will be ble::interface::SecurityManager<GenericSecurityManagerImpl>. We don't want to import GenericSecurityManagerImpl as it requires visibility of GenericSecurityManagerImpl and may export thing differently than ble::interface::SecurityManager (visibility of function can change as an example).

@paul-szczepanek-arm

This comment has been minimized.

Copy link
Member

commented Feb 15, 2019

I can tell that the next request will be for release notes and SPDX identifiers

SecurityManager &sm = createBLEInstance()->getSecurityManager();
ble_error_t status = sm.getLinkEncryption(connection_handle, &encryption);
if (status == BLE_ERROR_NONE &&
(encryption == link_encryption_t::ENCRYPTED ||
encryption == link_encryption_t::ENCRYPTED_WITH_MITM ||
encryption == link_encryption_t::ENCRYPTED_WITH_SC_AND_MITM)
) {
cmd = GattClient::GATT_OP_WRITE_CMD;
cmd = Base::GATT_OP_WRITE_CMD;

This comment has been minimized.

Copy link
@paul-szczepanek-arm

paul-szczepanek-arm Feb 15, 2019

Member

someone still hasn't configured their new IDE ;)

bool bondable,
bool mitm,
SecurityIOCapabilities_t iocaps,
const Passkey_t passkey,
const uint8_t* passkey,

This comment has been minimized.

Copy link
@paul-szczepanek-arm
@kjbracey-arm
Copy link
Contributor

left a comment

Approve of the general concept. Makes sense as an optimisation on the assumption that you want to optimise for the "1 BLE implementation in the system" case.

Would presumably be a pessimisation with two implementations, but anyone putting two implementations in one image in these sorts of systems probably needs to rethink their ROM-saving approach. (Or am I wrong?)

Here are some numbers for the BLE layer on NRF52840 compiled with GCC in release mode.

This is a great chance to do some performance comparison - I'd love to see the same numbers for all the other toolchains. I believe GCC (and ARMC6?) stand to gain the most by this, as they can't eliminate unused virtuals. Presumably IAR and ARMC5 gain much less, as they should be eliminating unused virtuals.

I believe the same CRTP approach may be worthwhile on a smaller scale for a bunch of our HAL driver classes, with their virtual void lock() functions. Probably not many images are trying to simultaneously use SPI and UnlockedSPI : public SPI, and they certainly don't need run-time polymorphism.

(Although some key things do need it, like FileHandle or Socket, so we should still be looking at how to get compilers to optimise virtuals when actually required - eg spotting that UARTSerial is the only FileHandle in the build as LTO?).

@pan-

This comment has been minimized.

Copy link
Member Author

commented Feb 15, 2019

Would presumably be a pessimisation with two implementations, but anyone putting two implementations in one image in these sorts of systems probably needs to rethink their ROM-saving approach. (Or am I wrong?)

If the two ports are not using the generic layer (and pal layer) then there is virtually no pessimisation as BLE interfaces are (almost) pure interfaces. If the two ports use the pal layer then it is possible to reuse the generic layer with a virtual pal without changing existing code:

// note: simplified for demonstration purposes 
template<class Pal>
struct GenericGattClient : ble::interface::GattClient<GenericGattClient<Pal> > { 
    GenericGattClient(Pal* pal)  { }
};  

template<class Impl> 
struct PalGattClient { 
    void foo() { 
         static_cast<Impl*>(this)->foo_();
    }
};

struct PalGattClientFoo : PalGattClient<PalGattClientFoo> { 
    void foo_() { }
};

struct PalGattClientBar : PalGattClient<PalGattClientBar> { 
    void foo_() { }
};

struct RuntimePalGattClient : PalGattClient<RuntimePalGattClient> { 
    virtual void foo_() = 0;
};

template<class Impl>
struct RuntimePalClientBridge : RuntimePalGattClient {
    RuntimePalClientBridge(Impl* impl) : _impl(impl) { }

    virtual void foo_() { 
        _impl->foo_();
    }

private:
    Impl* _impl;
};

namespace ble { 
namespace impl { 

typedef GenericGattClient<RuntimePalGattClient> GenericGattClientImpl;
typedef ble::interface::GattClient<GenericGattClientImpl> GattClient;

}
}

typedef ble::impl::GattClient GattClient;


class BleInstanceBaseFoo { 

GattClient &client() { 
    static PalGattClientFoo pal;
    static RuntimePalClientBridge<PalGattClientFoo> bridge(&pal);
    static ble::impl::GenericGattClientImpl generic(&bridge);
    return generic;
}

};

class BleInstanceBaseBar { 

GattClient &client() { 
    static PalGattClientBar pal;
    static RuntimePalClientBridge<PalGattClientBar> bridge(&pal);
    static ble::impl::GenericGattClientImpl generic(&bridge);
    return generic;
}

};

Note that in practice, two BLE radio are faaaar from being common (Unlike WiFi, I never seen any in real world product for BLE) and if you can afford two identical radio you can usually afford a device with a beefier flash.

Presumably IAR and ARMC5 gain much less, as they should be eliminating unused virtuals.

I haven't tried yet (but I will shortly 😄 ), however in my first try the implementation wasn't instantiated separately and the average gain from full inlining was ~4K. Effect of LTO where also reduced to noise (~1% gain).

@pan-

This comment has been minimized.

Copy link
Member Author

commented Feb 15, 2019

@kjbracey-arm So I made some benchmark with the various compiler we use. Note that unlike the previous numbers exposed I removed the cordio stack as it is a fixed cost and unrelated to this PR.

Compiler Original (no stack) Devirtualized (no stack) Gain
GCC 55447 27307 28140
ARMC5 28926 25931 2995
IAR 28814 26347 2467
ARMC6 57897 28332 29565
IAR8 27510 25099 2411

There's several things to notice in this number:

  • Gains with ARMC5 and IAR are small but still about 10%. ARMC5 and IAR linker are able to rewrite vtables and remove unused virtual functions.
  • ARMC6 and GCC benefits the most from this optimisation: indeed their linker is not able to rewrite vtable and remove unused virtual functions.
  • The optimisation stabilise code size across compilers.
@0xc0170

This comment has been minimized.

Copy link
Member

commented Feb 18, 2019

@pan- This PR is targeting 5.12?

@pan-

This comment has been minimized.

Copy link
Member Author

commented Feb 18, 2019

@0xc0170 I think that would be the best.

@0xc0170

This comment has been minimized.

Copy link
Member

commented Feb 20, 2019

This needs a rebase now

@0xc0170 0xc0170 added needs: work and removed needs: review labels Feb 20, 2019

@pan-

This comment has been minimized.

Copy link
Member Author

commented Feb 20, 2019

@0xc0170 I believe it will get (partially) squashed as well.

@cmonr cmonr added the risk: A label Feb 25, 2019

@pan- pan- force-pushed the pan-:optimize-size branch Feb 25, 2019

@pan-

This comment has been minimized.

Copy link
Member Author

commented Feb 25, 2019

@0xc0170 I made a rebase and cleaned up the history as well.

@bulislaw

This comment has been minimized.

Copy link
Member

commented Feb 26, 2019

This PR is at risk of missing 5.12 release as it's marked as "needs: work". Code freeze is coming! On Friday 1st. Please made necessary updates ASAP and make sure the reviewers are aligned for prompt code inspection.

@cmonr

This comment has been minimized.

Copy link
Contributor

commented Feb 26, 2019

@0xc0170 I fixed the failing test. Would it be possible to relaunch tests ?

Not until IAR8 is in.

@cmonr cmonr added needs: CI and removed needs: work labels Feb 26, 2019

@cmonr

This comment has been minimized.

Copy link
Contributor

commented Feb 26, 2019

CI started

@mbed-ci

This comment has been minimized.

Copy link

commented Feb 27, 2019

Test run: SUCCESS

Summary: 12 of 12 test jobs passed
Build number : 2
Build artifacts

@cmonr

This comment has been minimized.

Copy link
Contributor

commented Feb 27, 2019

CI passed.

Reviewers, please review:
@0xc0170 @donatieng @kjbracey-arm @paul-szczepanek-arm

@0xc0170 0xc0170 requested a review from bulislaw Feb 27, 2019

@donatieng
Copy link
Member

left a comment

Looks really good - very excited about seeing this one coming in. I think we should discuss how we could apply the same solution to other areas of the OS.

@paul-szczepanek-arm paul-szczepanek-arm referenced this pull request Feb 27, 2019

Merged

Ble conf #9790

@pan- pan- force-pushed the pan-:optimize-size branch to 73f29e7 Feb 27, 2019

@cmonr cmonr added needs: CI and removed needs: review labels Feb 28, 2019

@0xc0170

This comment has been minimized.

Copy link
Member

commented Feb 28, 2019

CI started

@mbed-ci

This comment has been minimized.

Copy link

commented Feb 28, 2019

Test run: FAILED

Summary: 1 of 13 test jobs failed
Build number : 3
Build artifacts

Failed test jobs:

  • jenkins-ci/mbed-os-ci_exporter
@alekla01

This comment has been minimized.

Copy link
Contributor

commented Feb 28, 2019

Ci restarted (iar8 exporter issue we will resolve separately).

@cmonr cmonr added ready for merge and removed needs: CI labels Feb 28, 2019

@cmonr cmonr merged commit aaf3ce4 into ARMmbed:master Mar 1, 2019

28 checks passed

continuous-integration/jenkins/pr-head This commit looks good
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
jenkins-ci/build-ARMC5 Success
Details
jenkins-ci/build-ARMC6 Success
Details
jenkins-ci/build-GCC_ARM Success
Details
jenkins-ci/build-IAR8 Success
Details
jenkins-ci/cloud-client-test Success
Details
jenkins-ci/dynamic-memory-usage RTOS ROM(+0 bytes) RAM(+0 bytes)
Details
jenkins-ci/exporter Success
Details
jenkins-ci/greentea-test Success
Details
jenkins-ci/mbed2-build-ARMC5 Success
Details
jenkins-ci/mbed2-build-ARMC6 Success
Details
jenkins-ci/mbed2-build-GCC_ARM Success
Details
jenkins-ci/mbed2-build-IAR8 Success
Details
jenkins-ci/unittests Success
Details
travis-ci/astyle Local astyle testing has passed
Details
travis-ci/docs Local docs testing has passed
Details
travis-ci/doxy-spellcheck Local doxy-spellcheck testing has passed
Details
travis-ci/events Passed, runtime is 10410 cycles
Details
travis-ci/gitattributestest Local gitattributestest testing has passed
Details
travis-ci/include_check Local include_check testing has passed
Details
travis-ci/licence_check Local licence_check testing has passed
Details
travis-ci/littlefs Passed, code size is 8408B (+0.00%)
Details
travis-ci/psa-autogen Local psa-autogen testing has passed
Details
travis-ci/tools-py2.7 Local tools-py2.7 testing has passed
Details
travis-ci/tools-py3.5 Local tools-py3.5 testing has passed
Details
travis-ci/tools-py3.6 Local tools-py3.6 testing has passed
Details
travis-ci/tools-py3.7 Local tools-py3.7 testing has passed
Details

@cmonr cmonr removed the ready for merge label Mar 1, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.